Note: if you’re looking for code, there’s a link at the bottom of this article where you can get a copy. This article just explains what the code does.
About a year ago, Google casually mentioned that they would be getting rid of YouTube subscription feeds[1]. Earlier this year they finally made good on their threat and turned the feed off for good. This wasn’t a big problem for the many people who don’t use RSS in their daily lives[2], but for me it was a damn nuisance, as I like all my updates in one place.
Google courteously explained that although the main YouTube feed was going away, you can still access individual channel feeds. No problem, right?
My preferred feed reader, the Old Reader, limits free accounts to 100 subscriptions, and I have 435 subscriptions on YouTube alone. Please understand that I don’t have a TV licence, as most of my video entertainment comes via YouTube and other online outlets; it’s like being subscribed to 400+ channels that I actually want, rather than paying Sky or Virgin for 900+ channels that I don’t!
I refuse to pay money for extra subscriptions just because Google want to drop a service, so I decided to just build a subscription feed myself. How hard can it be, right?
Google did one good thing, at least: at the bottom of everyone’s YouTube subscription management page is a button that exports all your channel subscriptions to a single OPML file. This file would form the basis of my new feed.
The file itself just contains the location of each individual feed, plus some metadata (titles, channels, etc.).
My plan was to have a URL[3] that the Old Reader could call to pull in all the new items. I knew the Old Reader was clever enough to remember which links it had already presented, so all I needed to do was make sure that fresh links from the last day or two were showing, and the reader would take care of the rest.
My website is built (mostly) in CodeIgniter, so I created a new controller class devoted to processing all my feeds. I hard-coded a “key” that must be passed in, to stop anyone other than my feed reader from accessing my subscriptions, and pointed the script at the folder where I keep my current OPML file. Note that I am hard-coding a lot of things here, which isn’t 100% in the spirit of “flexible code”, but the script serves a specific purpose for me rather than being open to the public, so it will do!
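The key check itself amounts to a string comparison. Below is a minimal sketch of the idea, written in Python rather than the script’s PHP. `FEED_KEY` and `key_is_valid()` are hypothetical names. The constant-time comparison via `hmac.compare_digest` is an extra precaution, not something the original script is described as doing.

```python
import hmac

# Hypothetical placeholder: the article only ever shows the real key as [KEY].
FEED_KEY = "change-me"


def key_is_valid(supplied: str) -> bool:
    """Return True only if the supplied key matches the hard-coded one.

    hmac.compare_digest avoids short-circuiting on the first mismatched
    character, so response timing doesn't hint at how close a guess was.
    """
    return hmac.compare_digest(supplied.encode("utf-8"), FEED_KEY.encode("utf-8"))
```

The controller would then refuse to do any work unless `key_is_valid()` returns `True` for the key in the URL.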
Unlike the horrendous non-standard markup I encountered when trying to process bookmark files, Google’s OPML is well-structured, which means we can use the proper XML parsing tools built into PHP! We load the contents of the file into a SimpleXML object and loop through it to collect the two most important items we need for each YouTube channel: the title of the channel and the URL of its feed. Note that we have to “cast” each value to a string with (string). Otherwise, PHP assigns the SimpleXMLElement object itself to the variable instead of its text.
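Here is the same parsing step in Python, using the standard library’s ElementTree. It assumes the export follows the usual OPML layout, where each channel is an `<outline>` element carrying `title` and `xmlUrl` attributes and sits inside a folder-style `<outline>` that has no `xmlUrl`. `read_opml_feeds()` is a hypothetical name. ElementTree returns attribute values as plain strings, so no cast is needed here.

```python
import xml.etree.ElementTree as ET


def read_opml_feeds(opml_text: str) -> list[dict[str, str]]:
    """Collect the title and feed URL of every <outline> that has an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if not url:
            continue  # container outlines (e.g. the top-level folder) have no feed URL
        feeds.append({
            # Fall back to the "text" attribute if "title" is missing.
            "title": outline.get("title") or outline.get("text", ""),
            "url": url,
        })
    return feeds
```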
We now have an array of feed URLs. I pass this into a function called _process_feeds() and inside that function we will now need to do the following:
If _process_feeds() returns an array with some items in it, we have some feed material! The best part was that I already had an RSS template from when I built a feed that amalgamates some WordPress feeds and a Twitter feed into one. I simply load this existing view, pass in the collected feed data and, hey presto! One YouTube subscriptions feed.
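As a rough illustration of the filtering part of _process_feeds(), here is a Python sketch. It is not the script itself and rests on a few assumptions. Each channel feed is treated as an Atom document whose entries carry `title`, `link` and `published` elements. Only entries from the last two days are kept, matching the “last day or two” window, and every channel’s entries are then merged newest first. `recent_entries()` and `merge_feeds()` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace prefix for ElementTree tag names


def _parse_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a missing offset as UTC."""
    when = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def recent_entries(atom_text: str, now: datetime, days: int = 2) -> list[dict]:
    """Return entries of one Atom feed published within `days` days of `now` (timezone-aware)."""
    cutoff = now - timedelta(days=days)
    items = []
    for entry in ET.fromstring(atom_text).iter(ATOM + "entry"):
        published = entry.findtext(ATOM + "published")
        if not published:
            continue  # no publication date, so it can't be judged as recent
        when = _parse_timestamp(published)
        if when < cutoff:
            continue
        # Use the rel="alternate" link (Atom treats a missing rel as "alternate").
        href = ""
        for link in entry.findall(ATOM + "link"):
            if link.get("rel", "alternate") == "alternate":
                href = link.get("href", "")
                break
        items.append({
            "title": entry.findtext(ATOM + "title", default=""),
            "link": href,
            "published": when,
        })
    return items


def merge_feeds(atom_texts: list[str], now: datetime, days: int = 2) -> list[dict]:
    """Combine the recent entries of every channel feed, newest first."""
    items = []
    for text in atom_texts:
        items.extend(recent_entries(text, now, days))
    items.sort(key=lambda item: item["published"], reverse=True)
    return items
```

Passing `now` in explicitly, for example `datetime.now(timezone.utc)`, keeps the cutoff predictable and makes the function easy to check against fixed dates.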
It looked good to go, so I uploaded all the files to my web server and plugged the URL into the Old Reader…
The Old Reader refused to accept the URL as a valid RSS feed. After a bit of tinkering, I managed to figure out why: Firebug revealed that the Old Reader would query an RSS feed for 30 seconds before timing out and presenting an error.
The sheer number of web calls it was making (one per channel) meant my script took about a minute and a half to run, well over that limit. Some investigation into optimising the calls suggested I should replace PHP’s file_get_contents() function with cURL if it was installed on my server. After a quick test to confirm I could use cURL, I wrote a small function to fetch the feeds with it and ran the script to see what difference it made.
To be honest, there was no noticeable difference in speed with cURL. So, short of learning a leaner machine language to process the feeds with[4], what could I do?
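Swapping one HTTP client for another probably changed little because the script still waited on each request in turn. Fetching the feeds concurrently, so that the network waits overlap, is a different technique and not the one this article goes on to use. The Python sketch below does this with a thread pool. `fetch()`, `fetch_all()`, the ten-second timeout and the worker count are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one feed; the timeout stops a slow channel from stalling the run."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_all(urls: list[str],
              fetcher: Callable[[str], str] = fetch,
              workers: int = 16) -> dict[str, Optional[str]]:
    """Fetch every URL on a pool of threads so the network waits overlap.

    A failed fetch is recorded as None, so one broken feed doesn't sink the rest.
    """
    results: dict[str, Optional[str]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {url: pool.submit(fetcher, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception:
                results[url] = None
    return results
```

The `fetcher` parameter exists mainly so the pool can be exercised with a fake downloader instead of real network calls.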
Rather than trying to pull all the feed items live every time the URL is requested, why not dump the RSS out to a file and give that file’s address to my feed reader? The feed reader would query the static file and, as long as the contents were updated regularly, it should pick up new items.
A quick Google search revealed that I could load the rendered contents of a CodeIgniter view into a variable (passing TRUE as the third argument to $this->load->view() returns the output as a string instead of sending it to the browser). I would then write this out to a file on my server and let the feed reader do the rest.
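The render-then-save idea can be sketched in Python as well, as an illustration rather than the CodeIgniter view itself. `render_rss()` builds a minimal RSS 2.0 document from items shaped like `{"title", "link", "published"}`, which is an assumed structure. `write_atomically()` writes to a temporary file and renames it into place, so a feed reader that polls mid-refresh should see either the old file or the new one, never half of each. Both names are hypothetical.

```python
import os
import tempfile
from datetime import datetime
from email.utils import format_datetime
from xml.sax.saxutils import escape


def render_rss(title: str, link: str, items: list[dict]) -> str:
    """Build a minimal RSS 2.0 document; each item needs title, link and an aware published datetime."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<rss version="2.0"><channel>',
        f"<title>{escape(title)}</title>",
        f"<link>{escape(link)}</link>",
        f"<description>{escape(title)}</description>",
    ]
    for item in items:
        published: datetime = item["published"]
        parts.append(
            "<item>"
            f"<title>{escape(item['title'])}</title>"
            f"<link>{escape(item['link'])}</link>"
            f"<guid>{escape(item['link'])}</guid>"
            # RSS 2.0 uses RFC 822-style dates, e.g. "Wed, 10 Jun 2015 12:00:00 +0000".
            f"<pubDate>{format_datetime(published)}</pubDate>"
            "</item>"
        )
    parts.append("</channel></rss>")
    return "\n".join(parts)


def write_atomically(path: str, text: str) -> None:
    """Write text to a temp file in the same directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # mkstemp creates the file as 0600; make it readable by the web server.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # an atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind
        raise
```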
I then set up a cron job to run the main fetch_feed() controller function once every six hours to refresh the contents of my RSS file, which is fairly simple as CodeIgniter has rudimentary CLI functionality[5].
As an aside, for some bizarre reason my hosting provider’s cron panel just didn’t like the supposed CodeIgniter CLI invocation format at all. A line like this…
0 */6 * * * php -q [FILE PATH]/index.php youtube fetch_feed [KEY] >/dev/null 2>&1
…which should have worked fine, just returned “content type: text/html” and refused to output anything else. It ran fine on a Linux box via the terminal, and I spent a lot of time commenting out bits and pieces to see whether my unusual setup of CodeIgniter integrated with WordPress was causing the problem. In the end, I found this excellent article and set up a separate cli.php file, which worked just fine when invoked with the following cron line:
0 */6 * * * php -q [FILE PATH]/cli.php "youtube/fetch_feed/[KEY]" >/dev/null 2>&1
If you can get away with using CodeIgniter’s “built-in” CLI interface, then bully for you!
To recap, I now have a static .rss file that updates every six hours with new videos from the last two days. However, there’s a problem: beyond the initial import, the Old Reader refuses to register any new videos. This is particularly frustrating, since by this point I am regularly checking the .rss file and it definitely has new videos in it.
After some research, it appears that my previous assumption[6], that feed readers would simply scan for RSS items with a recent date, was incorrect. On reflection, it’s a little bit obvious: an aggregator could scan every feed for items newer than its last update, but that’s a lot of processing to fetch a feed and check the videos, only to find there are no new ones.
No: instead, a lot of services apparently send an HTTP[7] request and look at the headers saying when the page was last updated (such as Last-Modified), and only then fetch the new content.
I’ll admit it now: my understanding of headers is fuzzy at best. I know they are invisible metadata sent along with a web page request and with the server’s reply (which also carries a response code), and that they can contain a lot of information about how a page should behave; beyond that, they are currently a bit of a mystery to me. I have previously set headers when I wanted certain files to trigger a “download now” prompt (see the HTML bookmark sorter article for an example), when redirecting (PHP can send a redirect header to go to a new page), and even for something as simple as a gesture towards your favourite recently-deceased author, by placing his name in your website’s headers via .htaccess[8].
I wrote a new function called show_feed() as part of the class, which sends a header giving the date of the latest video in the feed and then outputs the contents of the RSS file I generated. It took a bit of trial and error and a few online resources to fathom it out, which I have referenced in the final script.
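Here is a rough Python equivalent of that idea, not the script’s PHP. It sends a `Last-Modified` header with the date of the newest video. As an extra step not described above, it answers a conditional `If-Modified-Since` request with `304 Not Modified` when nothing newer exists. `last_modified_header()`, `feed_response()` and the `(status, headers, body)` return shape are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_modified_header(latest: datetime) -> str:
    """Format an aware datetime as an HTTP date, which is always given in GMT."""
    return format_datetime(latest.astimezone(timezone.utc), usegmt=True)


def feed_response(latest: datetime, body: bytes,
                  if_modified_since: Optional[str] = None) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a request for the generated feed.

    `latest` is the timezone-aware publish time of the newest video in the feed.
    """
    headers = {
        "Last-Modified": last_modified_header(latest),
        "Content-Type": "application/rss+xml; charset=utf-8",
    }
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: just send the full feed
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds before comparing.
            if latest.replace(microsecond=0) <= since:
                return 304, headers, b""
    return 200, headers, body
```

A reader that stores the `Last-Modified` value and sends it back as `If-Modified-Since` would then get a cheap 304 until a newer video appears.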
At first my feed reader did not want to update at all. Then I forced a refresh and, lo and behold, the next day I found this in my feed reader…
There are a few benefits to my feed script:
However, there are also quite a few flaws in this script:
I would love to offer this YouTube subscription feed as a service on my website (like the bookmark sorter), but it might prove costly in terms of bandwidth. Instead, I have put all the code in a repository on GitHub. Please help yourself to it if it is of any value to you. It was designed to run in CodeIgniter, but I imagine it would be quite simple to convert it to another MVC framework or to a set of plain procedural functions.
If you have any suggestions on how to improve my script, please let me know with a comment below (you can log in using most popular social media accounts) or get in touch on GitHub (in which case please bear with me while I learn the process: I have used GitHub for versioning but not so much for collaboration!).
This doesn’t have anything to do with the subscription feed, but I updated my CodeIgniter framework to 3.0 this week and, lo and behold, the script stopped running via the command line. My “hacked” cli.php file didn’t work, and using other methods proved fruitless. It seems that a few people have the same problem with using CI via CLI on my hosting provider.
Luckily, I found a forum post where a clever sod named “ZoomIt” suggested using Wget instead. I’m just trying to load the controller function fetch_feed(), after all! I added options to suppress the output (dumping it into the black hole of /dev/null) and, hey presto! One working feed generator, and this cron issue didn’t take a week to resolve like last time. The resulting cron job looked like this:
wget -qO- http://www.payneful.co.uk/youtube/fetch_feed/[KEY] >/dev/null 2>&1
It’s not as neat as invoking the PHP on the command line, and a few forum posts recommended cURL instead, since Wget is designed more for downloading files, but Wget works, so I’m happy again.