Link from Edith Frost
Edith Frost’s website, one of the sites that 348North News checks each hour, has a comprehensive list of the good and bad bots (in her opinion) that visit the site. I am relieved to see that 348North is on the good side.
When I set out to build this aggregator, the last thing I wanted to do was to add significantly to anyone’s bandwidth bill. That’s why, instead of scraping, I limited the requests to just an RSS/XML summary file. I also implemented an ETag subsystem that sends conditional requests, so that sites that support it can answer with a 304 Not Modified response instead of resending the whole feed.
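For anyone curious what an ETag check looks like in practice, here is a minimal sketch in Python. It is not the code 348North actually runs: the function names, the User-Agent string, and the shape of the returned dictionary are all made up for illustration. The idea is to send back the ETag (and Last-Modified value) saved from the previous visit, and to treat a 304 reply as "nothing new, nothing downloaded".

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers for a conditional GET (hypothetical helper)."""
    # The User-Agent value is a placeholder, not the aggregator's real one.
    headers = {"User-Agent": "example-aggregator/0.1"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None):
    """Fetch a feed, skipping the download if the server says it is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read()
            return {
                "changed": True,
                "body": body,
                # Save these validators for the next cycle.
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
                "bytes": len(body),
            }
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return {
                "changed": False,
                "body": None,
                "etag": etag,
                "last_modified": last_modified,
                "bytes": 0,
            }
        raise
```

A site that does not support ETags simply ignores the extra headers and sends the full feed, so the check costs nothing when it does not apply.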
I’ve been toying with recording bandwidth usage per site (each cycle’s usage is already kept) so that site owners could decide for themselves whether they wanted 348North to continue to collect their feed. I need to work on that aspect. I don’t want to inconvenience others in my quest for selfish convenience.
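A per-site tally would not need much. The sketch below is only one possible shape for it, again in Python and again hypothetical: the `BandwidthLog` class and the example host names are invented here, and it assumes each fetch can report how many bytes it received.

```python
from collections import defaultdict


class BandwidthLog:
    """Running totals of bytes received and requests made, keyed by site."""

    def __init__(self):
        self._bytes_by_site = defaultdict(int)
        self._requests_by_site = defaultdict(int)

    def record(self, site, bytes_received):
        # A 304 reply counts as a request with zero bytes of feed data.
        self._bytes_by_site[site] += bytes_received
        self._requests_by_site[site] += 1

    def report(self):
        """Return (site, total_bytes, requests) tuples, heaviest site first."""
        rows = [
            (site, total, self._requests_by_site[site])
            for site, total in self._bytes_by_site.items()
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)


log = BandwidthLog()
log.record("a.example", 100)
log.record("a.example", 0)
log.record("b.example", 500)
usage = log.report()
```

A report like this could be published per site, which is all an owner would need to decide whether the hourly check is worth it to them.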
written by Kevin in web stuff