Some time ago, I answered another user's question on Stack Overflow about database design for a multi-user feed aggregator. I also received an email from a developer asking for additional input, which I shared. But I thought I should put my response here, as well, for posterity's sake if nothing else. Note that my comments here assume MySQL as the database but should apply to any SQL database.
Basically, my emailer asked what to do about the fact that the posts table will get huge very quickly if we have multiple users and a row for each post for each user. It's actually a pretty basic relational database scenario, but if you start your project as a single-user application and later decide it's going to be multi-user, you may not realize that you probably need to completely redesign your database.
So I've posted the schema I use for my multi-user feed aggregator (a private project):
I've also posted a SQL command that you can run in a cron job to remove read posts that are more than 14 days old:
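To make the general idea concrete, here is a minimal sketch. It is not the schema or command I posted. The table and column names (`feeds`, `posts`, `user_posts`, `is_read`) and the `purge_read_posts` function are hypothetical, and it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere. The point is that each post is stored once, while a narrow link table holds the per-user read state.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: names are illustrative only.
# Each post is stored once; user_posts holds one small row per user per post.
SCHEMA = """
CREATE TABLE feeds (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    feed_id   INTEGER NOT NULL REFERENCES feeds(id),
    title     TEXT NOT NULL,
    published TEXT NOT NULL          -- ISO 8601 timestamp
);
CREATE TABLE user_posts (
    user_id INTEGER NOT NULL,
    post_id INTEGER NOT NULL REFERENCES posts(id),
    is_read INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, post_id)
);
"""


def purge_read_posts(conn, now, max_age_days=14):
    """Drop read per-user rows for old posts, then old posts nobody still holds.

    Returns the number of rows deleted from the posts table.
    """
    cutoff = (now - timedelta(days=max_age_days)).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back on error
        # Step 1: remove read entries that point at posts older than the cutoff.
        conn.execute(
            "DELETE FROM user_posts WHERE is_read = 1 "
            "AND post_id IN (SELECT id FROM posts WHERE published < ?)",
            (cutoff,),
        )
        # Step 2: remove old posts that no user has a row for any more.
        cur = conn.execute(
            "DELETE FROM posts WHERE published < ? "
            "AND id NOT IN (SELECT post_id FROM user_posts)",
            (cutoff,),
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(purge_read_posts(conn, datetime.now()))
```

In this sketch an old post survives as long as at least one user still has it unread, which is one reasonable policy among several. A cron job would simply call something like `purge_read_posts` once a day.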
As an aside, I'm amazed by how many people are writing new aggregators. Is it a common programming class exercise to write a feed aggregator or something?
The New York Times is doing a lot of great things with its website and RSS feeds. But somewhere along the way, they've introduced a bug in their code that generates the RSS feed for the home page.
The bug is that the channel title switches back and forth between "NYT > NYTimes.com" and "NYT > Home Page". This alternates at least once an hour, all day long (as near as I can tell). This constant switching causes one of my feed readers (FeedDemon) to alert me of the change every time it occurs. Of course this latter point is not directly the Times's fault, but it is driving me insane.
Screenshots to prove that I'm not already insane:
If you look closely at the raw RSS feeds, you will notice that they appear to be using two different tools to generate the same feed. So I guess the two tools are not configured exactly in sync with one another.
Mike Arrington recently reignited some discussion about the definition of a blog.
I have a related question: What is a podcast?
I'm really only concerned with two technical questions:
- Does each feed item need an enclosure? If not, what ratio of items with enclosures to total items makes a feed a podcast?
- Does the enclosure need to be in a dedicated feed element, or is it okay to just put a link to the enclosure somewhere in the feed description? For example, is this a podcast? The publisher, a big RSS technology company, seems to think so.
There are other debatable points, too, such as whether Public Radio feeds are really podcasts, but I'm more interested in the technical points.
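For anyone who wants to put numbers on those two questions, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The function name `enclosure_stats`, the sample feed, and the regular expression for spotting audio links in a description are all my own assumptions for illustration. It counts how many items carry a real `<enclosure>` element and how many only mention an audio URL in the description text.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: an "audio link" is any http(s) URL ending in a common audio extension.
AUDIO_LINK = re.compile(r"https?://\S+\.(?:mp3|m4a|ogg)\b", re.IGNORECASE)


def enclosure_stats(rss_xml):
    """Return (total_items, items_with_enclosure, items_with_link_only)."""
    root = ET.fromstring(rss_xml)
    items = root.findall("./channel/item")
    with_enclosure = 0
    link_only = 0
    for item in items:
        if item.find("enclosure") is not None:
            # A dedicated <enclosure> element is present.
            with_enclosure += 1
        elif AUDIO_LINK.search(item.findtext("description", default="") or ""):
            # No enclosure, but the description text links to an audio file.
            link_only += 1
    return len(items), with_enclosure, link_only


# Made-up sample feed with one item of each kind.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>A</title>
  <enclosure url="http://example.com/a.mp3" length="123" type="audio/mpeg"/></item>
<item><title>B</title>
  <description>Listen here: http://example.com/b.mp3</description></item>
<item><title>C</title><description>Just text</description></item>
</channel></rss>"""

if __name__ == "__main__":
    total, enclosed, linked = enclosure_stats(SAMPLE)
    ratio = enclosed / total if total else 0.0
    print(total, enclosed, linked, round(ratio, 2))
```

Where to draw the line on that ratio is still a judgment call; the sketch only makes the ratio easy to measure.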
Not that I'll ever use it, but it looks like Microsoft is supporting the use of OPML subscription lists in Outlook 2007. I tried NewsGator's Outlook extension, and I think Outlook is the worst conceivable way to consume the large volume of information I have in my modestly sized subscription list. But at least Microsoft is going to let Outlook users get their subscription lists out of Outlook (and into a better aggregator). They even have a step-by-step guide for exporting your RSS feeds from Outlook.
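Once you have an exported OPML file, pulling the feed URLs out of it takes only a few lines. This is a minimal sketch that assumes the export follows the usual OPML convention of `<outline>` elements with an `xmlUrl` attribute. I have not checked exactly what Outlook writes, and the function name `feed_urls_from_opml` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_xml):
    """Return every xmlUrl found on an <outline> element, in document order."""
    root = ET.fromstring(opml_xml)
    # iter() walks nested outlines too, so feeds inside folders are included.
    return [
        outline.attrib["xmlUrl"]
        for outline in root.iter("outline")
        if "xmlUrl" in outline.attrib
    ]


# Made-up subscription list: one top-level feed and one feed inside a folder.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Example Blog" type="rss" xmlUrl="http://example.com/feed.xml"/>
    <outline text="News">
      <outline text="Example News" type="rss" xmlUrl="http://example.org/rss"/>
    </outline>
  </body>
</opml>"""

if __name__ == "__main__":
    for url in feed_urls_from_opml(SAMPLE_OPML):
        print(url)
```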