Writing a feed is easy once the Channel object is built. Since RSS 2.0 is the most popular format, all IO classes in Yarfraw default to RSS 2.0. The following code writes a channel to the XML file "rss20.xml" in RSS 2.0 format. Note that since every feed has only one channel, the top-level 'rss' element is generated for you automatically; all you need to do is pass in the channel object:
Channel c = new Channel();
// ... build up the channel ...
FeedWriter w = new FeedWriter("rss20.xml");
w.writeChannel(c);
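To make the point about the generated top-level element concrete, here is a minimal sketch, using only the JDK's StAX API (javax.xml.stream), of the document shape an RSS 2.0 writer produces around a single channel. It is not Yarfraw's implementation; the class name RssEnvelopeSketch and the three-field channel are hypothetical simplifications.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration, not Yarfraw code: wraps one channel in an
// RSS 2.0 <rss> root element, the part the caller never has to write.
class RssEnvelopeSketch {
    static String writeRss20(String title, String link, String description)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("rss");          // generated top-level element
        xml.writeAttribute("version", "2.0");
        xml.writeStartElement("channel");      // the feed's one and only channel
        writeText(xml, "title", title);
        writeText(xml, "link", link);
        writeText(xml, "description", description);
        xml.writeEndElement();                 // </channel>
        xml.writeEndElement();                 // </rss>
        xml.writeEndDocument();
        xml.flush();
        xml.close();
        return out.toString();
    }

    // Writes <name>text</name>; writeCharacters escapes '&' and '<' for us.
    private static void writeText(XMLStreamWriter xml, String name, String text)
            throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text);
        xml.writeEndElement();
    }
}
```

Calling writeRss20("Example", "http://example.com/", "A & B") returns a string containing <rss version="2.0"><channel><title>Example</title>, with the ampersand in the description escaped as &amp;.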
Writing other formats is just as easy; you only need to tell the writer which format to use. The following code demonstrates how to write in Atom 1.0 and RSS 1.0/RDF format.
FeedWriter writer = new FeedWriter("atom10.xml");
writer.setFormat(FeedFormat.ATOM10);
writer.writeChannel(c);
File f = new File("rss10.xml");
FeedWriter writer2 = new FeedWriter(f, FeedFormat.RSS10);
writer2.writeChannel(c);
Sometimes you may simply want to add a couple of items to an existing feed without modifying the top-level channel. The FeedAppender class does exactly that. Its methods work much like the insert methods in java.util.List: you can either specify the position at which to insert new items, or simply append them to the beginning or the end.
Internally, the appender reads in the feed, modifies it, and then writes it back to the file. So if performance is a concern, instead of doing a read/write cycle for every modification, you might want to write your own appender that writes the channel to the file only when it needs to.
FeedAppender a = new FeedAppender(new File("rss20.xml"));
a.appendAllItemsToBeginning(item);
a.removeItem(2);
a.appendAllItemsAt(2, item);
a.appendAllItemsToEnd(item);
FeedAppender a2 = new FeedAppender(new File("atom10.xml"));
a2.setFormat(FeedFormat.ATOM10); // the format must be set explicitly because RSS 2.0 is the default
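The custom appender suggested above could look roughly like the following sketch. It is hypothetical and deliberately simplified: items are plain strings stored one per line rather than a real feed, and the names BatchingAppender and flush are made up for illustration. The point is the design: read the file once, apply any number of List-style inserts and removals in memory, and write back only when flush() is called and something actually changed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of Yarfraw: the "feed" is one item per line.
// It only illustrates deferring the file write until it is really needed.
class BatchingAppender {
    private final Path file;
    private final List<String> items = new ArrayList<>();
    private boolean dirty = false; // true when memory differs from the file
    private int writes = 0;        // number of real file writes, for illustration

    BatchingAppender(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            items.addAll(Files.readAllLines(file, StandardCharsets.UTF_8)); // read once
        }
    }

    // Like List.addAll(0, ...): new items go to the top of the feed.
    void appendToBeginning(String... newItems) {
        items.addAll(0, Arrays.asList(newItems));
        dirty = true;
    }

    // Like List.addAll(index, ...).
    void appendAt(int index, String... newItems) {
        items.addAll(index, Arrays.asList(newItems));
        dirty = true;
    }

    // Like List.addAll(...): new items go to the bottom of the feed.
    void appendToEnd(String... newItems) {
        items.addAll(Arrays.asList(newItems));
        dirty = true;
    }

    void removeItem(int index) {
        items.remove(index); // remove(int), i.e. by position
        dirty = true;
    }

    // Writes the file only if something changed since the last flush.
    void flush() throws IOException {
        if (!dirty) {
            return;
        }
        Files.write(file, items, StandardCharsets.UTF_8);
        writes++;
        dirty = false;
    }

    int writeCount() {
        return writes;
    }
}
```

However many appends and removals happen between two flushes, the file is read once and written at most once.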
If you simply want to keep adding new items to your feed while keeping the number of items in the feed at a fixed limit (according to the specs, a feed should have no more than 15 items), you can tell the appender how many items you would like to keep. If the actual number of items in the feed after an append is greater than the number you specified, the appender removes items from the END of the feed so that there are at most numItemToKeep items in the feed.
Note: the items in the input channel are assumed to be in descending order from newest to oldest (i.e., the newest item is at index 0 of the item list). So when the appender needs to trim the feed, it always removes items from the end of the item list (the items at the bottom of the feed).
FeedAppender a = new FeedAppender(new File("yarfraw.xml"));
a.setNumItemToKeep(15); // this call does NOT trim the list; trimming only occurs after an append
a.appendAllItemsAt(0, bigItemList);
...
a.appendAllItemsAt(0, moreItems);
a.appendAllItemsToBeginning(item1, item2, item3);
// "yarfraw.xml" has at most 15 items, no matter how many items were appended using the appender
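The trimming rule can be sketched on a plain java.util.List where index 0 holds the newest item. This illustrates the behavior described above and is not Yarfraw's code; FeedTrimmer is a hypothetical name, and treating a non-positive numItemToKeep as "no limit" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "append to the top, then trim from the bottom".
class FeedTrimmer {
    // Returns a new list: newItems first (newest), then the existing feed,
    // cut down to at most numItemToKeep entries by dropping from the END.
    // Assumption for this sketch: numItemToKeep <= 0 means "no limit".
    static <T> List<T> appendToBeginningAndTrim(List<T> feed, List<T> newItems, int numItemToKeep) {
        List<T> result = new ArrayList<>(newItems);
        result.addAll(feed);
        if (numItemToKeep > 0 && result.size() > numItemToKeep) {
            // The tail holds the oldest items, so that is what gets removed.
            result.subList(numItemToKeep, result.size()).clear();
        }
        return result;
    }
}
```

For a feed [3, 2, 1] (newest first), appending [5, 4] with a limit of 4 gives [5, 4, 3, 2]: item 1, the oldest, is dropped from the end.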
You can use the reader class to read any of the supported feed formats. Like the other IO classes, the reader uses RSS 2.0 by default (except when reading remotely; see the examples below), so you need to set the appropriate format if the input is not in RSS 2.0 format.
FeedReader r = new FeedReader("rss20.xml");
Channel c = r.readChannel();
FeedReader r2 = new FeedReader("atom10.xml", FeedFormat.ATOM10);
Channel c2 = r2.readChannel();
The reader can also read a feed from a remote HTTP URL. In this case, the format of the remote feed is detected automatically. Format detection is performed only once, when the reader object is constructed.
FeedReader reader = new FeedReader(file); // defaults to RSS 2.0
reader.setFormat(FeedFormat.RSS10);  // change it to another format
reader.setFormat(FeedFormat.ATOM10); // change it to another format
FeedReader reader2 = new FeedReader(new HttpURL("http://somewhere.com/rssfeed.xml"));
Channel c = reader2.readChannel();
// check whether a reader is reading from a remote server
reader2.isRemoteRead();
// this is a remote feed...
reader2 = new FeedReader(new HttpURL("http://somewhere.com/atom10feed.xml"));
// ...so the format is detected automatically
assert(reader2.getFormat() == FeedFormat.ATOM10);
// you can still change the format of the reader, but it's not recommended
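How FeedReader detects the format of a remote feed is not described here. One common approach is to look at the root element, which differs between the three formats: rss with version="2.0" for RSS 2.0, feed in the Atom namespace for Atom 1.0, and rdf:RDF for RSS 1.0. The sketch below does this with the JDK's DOM parser; it is an assumption about one reasonable technique, not a description of what FeedReader actually does, and FormatSniffer is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical root-element sniffing; FeedReader's real detection logic may differ.
class FormatSniffer {
    enum Guess { RSS20, RSS10, ATOM10, UNKNOWN }

    static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    static Guess detect(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getNamespaceURI()/getLocalName()
        Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        String ns = root.getNamespaceURI(); // null when the root has no namespace
        String name = root.getLocalName();
        if ("feed".equals(name) && ATOM_NS.equals(ns)) {
            return Guess.ATOM10;             // <feed xmlns="http://www.w3.org/2005/Atom">
        }
        if ("RDF".equals(name) && RDF_NS.equals(ns)) {
            return Guess.RSS10;              // <rdf:RDF ...>, the RSS 1.0 root
        }
        if ("rss".equals(name) && ns == null && "2.0".equals(root.getAttribute("version"))) {
            return Guess.RSS20;              // <rss version="2.0">
        }
        return Guess.UNKNOWN;
    }
}
```

Since sniffing requires parsing the document, doing it once at construction time, as the reader does, avoids repeating that cost on every read.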
(This feature is new in version 0.9.)
If you are using Yarfraw to build an aggregator, you should definitely consider supporting conditional GET to reduce network traffic. In Yarfraw, conditional GET is supported by a special feed reader called CachedFeedReader, which extends the FeedReader class. As the name suggests, this class also caches the previously parsed feed, so it only needs to parse when the feed it is reading has changed.
FeedReader cr = new CachedFeedReader(new HttpURL("http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml"));
ChannelFeed first = cr.readChannel();
// issue another read immediately afterwards; since nothing should have changed since the last read,
// the reader returns the cached ChannelFeed object from the previous read
ChannelFeed second = cr.readChannel();
assertTrue(first == second);
The CachedFeedReader keeps a cached reference to the previously read and parsed ChannelFeed object and performs a conditional GET against the remote server based on two HTTP headers: Last-Modified and ETag. If the server responds with a 304 Not Modified status code, the reader returns the cached feed; otherwise it reads and parses the response as normal. Because no parsing is needed when the remote feed has not been modified since the last read, this class performs much better than the plain FeedReader class. Note that this reader caches a reference to the original ChannelFeed and returns that same reference to the caller, so if the ChannelFeed object is modified, the cached version is modified too.
CachedFeedReader cr = new CachedFeedReader(new HttpURL("http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml"));
ChannelFeed first = cr.readChannel();
// if I modify the returned feed object...
first.setTitle("blah");
// ...the cached feed object is also modified
assertEquals(first.getTitle(), cr.getCachedChannelFeed().getTitle());
With this caching behavior, an aggregator can skip all aggregation work if it finds that the returned ChannelFeed object is the same as the previously cached object. For example:
CachedFeedReader cr = new CachedFeedReader(new HttpURL("http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml"));
ChannelFeed first = cr.readChannel();
// ... some time later, issue another read to check for new changes
// first, get the cached reference
ChannelFeed previous = cr.getCachedChannelFeed();
// issue a read
ChannelFeed current = cr.readChannel();
if (current == previous) {
    // nothing has changed
} else {
    // there are new changes, so aggregation is needed
}
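For readers curious how the conditional GET described above works at the HTTP level, here is a simplified sketch using java.net.http (Java 11+). It is not CachedFeedReader's source: the class name ConditionalGetCache and the caller-supplied parser function are hypothetical. On a 200 response it parses the body and remembers the ETag and Last-Modified values; on the next read it sends them back as If-None-Match and If-Modified-Since, and on a 304 Not Modified it returns the same cached reference, which is what makes an identity check like current == previous meaningful.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.Function;

// Hypothetical sketch of conditional GET with a cached, already-parsed result.
// Not Yarfraw's CachedFeedReader; T is whatever the caller's parser produces.
class ConditionalGetCache<T> {
    private final HttpClient client = HttpClient.newHttpClient();
    private final URI uri;
    private final Function<String, T> parser; // caller-supplied, e.g. a feed parser
    private String etag;         // ETag from the last 200 response, if any
    private String lastModified; // Last-Modified from the last 200 response, if any
    private T cached;            // result of the last successful parse

    ConditionalGetCache(URI uri, Function<String, T> parser) {
        this.uri = uri;
        this.parser = parser;
    }

    // Performs the request, sending validators only when something is cached.
    T read() throws IOException, InterruptedException {
        HttpRequest.Builder req = HttpRequest.newBuilder(uri).GET();
        if (cached != null && etag != null) {
            req.header("If-None-Match", etag);
        }
        if (cached != null && lastModified != null) {
            req.header("If-Modified-Since", lastModified);
        }
        HttpResponse<String> resp = client.send(req.build(), HttpResponse.BodyHandlers.ofString());
        return handle(resp.statusCode(), resp.body(),
                resp.headers().firstValue("ETag").orElse(null),
                resp.headers().firstValue("Last-Modified").orElse(null));
    }

    // Decision logic, kept separate from I/O so it can be exercised offline.
    T handle(int status, String body, String newEtag, String newLastModified) {
        if (status == 304 && cached != null) {
            return cached; // not modified: no parsing, same reference as last time
        }
        if (status != 200) {
            throw new IllegalStateException("unexpected HTTP status " + status);
        }
        cached = parser.apply(body); // modified (or first read): parse again
        etag = newEtag;
        lastModified = newLastModified;
        return cached;
    }

    String currentEtag() {
        return etag;
    }
}
```

Keeping handle() separate from read() means the caching rules can be checked without a network connection.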
If the extensibility elements are supported by one of the extension modules, you should consider using that module instead of reading the elements as plain DOM elements.
If the feed contains extensibility elements that are not in the official specs, those elements are read into the Channel's (or Item's) otherElements list. You can get all of these elements by calling Channel.getElements, or you can use a convenience method to quickly get a specific element from the list. For example, the feed from digg.com has an element called <digg:diggCount>, which you can get with the following code:
FeedReader r = new FeedReader(new HttpURL("http://www.digg.com/rss/index.xml"));
Channel c = r.readChannel();
Element e = c.getItems().get(0).getElementByNS("http://digg.com/docs/diggrss/", "diggCount");
System.out.println(e.getTextContent());
Extensibility elements are very common in feeds from popular sites. For example, YouTube's feed has 'media' elements that look like this:
<channel>
  ...
  <item>
    ...
    <media:player url="http://youtube.com/?v=BIraBleRLck" />
    <media:thumbnail url="http://img.youtube.com/vi/BIraBleRLck/default.jpg" width="120" height="90" />
    <media:title>Salto nel melgot</media:title>
    <media:category label="Tags">crazy mad follia covo bergamo arma pol andrev video divertenti funny videos porn sex lesbians hot sexy nude girls</media:category>
    <media:credit>c45e6</media:credit>
    ...
  </item>
  ...
</channel>
To get information about those elements, you would write:
FeedReader r = new FeedReader(new HttpURL("http://youtube.com/rss/global/recently_added.rss"));
ChannelFeed c = r.readChannel();
System.out.println(c.getItems().get(0).getElementByLocalName("player"));
System.out.println(c.getItems().get(0).getElementByNS("http://search.yahoo.com/mrss/", "title"));
Element thumbnail = c.getItems().get(0).getElementByNS("http://search.yahoo.com/mrss/", "thumbnail");
System.out.println(thumbnail.getAttribute("url"));
System.out.println(thumbnail.getAttribute("width"));
System.out.println(thumbnail.getAttribute("height"));
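To see what a namespace-based lookup like getElementByNS boils down to, here is a plain-DOM sketch against the YouTube sample above, using only the JDK. MediaLookup and its methods are hypothetical helpers, not Yarfraw's API. The parser must be namespace-aware, and the snippet above omits the xmlns:media declaration that a real feed carries; the namespace URI http://search.yahoo.com/mrss/ is the one used in the example code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical plain-DOM helpers illustrating namespace vs local-name lookups.
class MediaLookup {
    static final String MRSS_NS = "http://search.yahoo.com/mrss/";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // otherwise namespace lookups find nothing
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // First descendant of 'item' with this namespace URI and local name, or null.
    static Element byNS(Element item, String ns, String localName) {
        NodeList found = item.getElementsByTagNameNS(ns, localName);
        return found.getLength() == 0 ? null : (Element) found.item(0);
    }

    // Local-name-only lookup: "*" matches any namespace, so the prefix is ignored.
    static Element byLocalName(Element item, String localName) {
        return byNS(item, "*", localName);
    }
}
```

Matching on namespace URI plus local name is more robust than matching on the "media:" prefix, because a feed is free to bind that namespace to any prefix.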
There are many more interesting things you can do with the IO API; take a look at the Advanced Examples.
For more details, I encourage you to take a look at the Javadoc. Also check out the FAQ section for more insights into this API.