Craigslist Blocks Yahoo Pipes

Craigslist has no love for Yahoo Pipes

Craigslist is one of the greatest sites in the world, and the entire Bay Area seems to revolve around it. Sadly, Craigslist's search facility is extremely bad, seemingly only capable of searching within a price range and neighborhood. Craigslist supplies RSS feeds, but this still means I have to sift through a lot of information in order to find what I'm looking for.

Yahoo Pipes provides a way to filter and manipulate RSS feeds. It's very visual, and relatively easy to use. This would be an excellent tool to prune down my Craigslist RSS feeds.

Unfortunately, Craigslist recently began blocking Yahoo Pipes. Perhaps someone wrote an overly popular pipe which put a tremendous load on Craigslist's servers, or perhaps Craigslist thinks they'll somehow lose income by allowing Pipes. Either way, it sucks.

The work-around which I've employed is to mirror the base Craigslist search on my own server, then feed the Yahoo Pipe from that.

This requires you to have a server which:

  1. Is HTTP accessible.
  2. Provides cron, or some other method of running a script at regular intervals.
  3. Has curl, wget, or another HTTP-content-fetching utility.
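A quick way to check the last two requirements from a shell on the server is sketched below. The helper name first_available is my own invention for illustration; it only relies on the standard `command -v` builtin.

```shell
#!/bin/sh
# Print the first command from the argument list that exists on this machine.
# Returns non-zero (and prints nothing) if none of them is installed.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Example use (left commented out so the sketch has no side effects):
# first_available curl wget || echo "no HTTP fetcher found" >&2
# first_available crontab   || echo "no cron found" >&2
```

If the first call prints curl or wget, and the second prints crontab, the server should be able to handle everything described here.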

Mirroring the RSS Feed

First, create an appropriate directory structure. For example:

mkdir ~/public_html/feeds

Next, test out curl or a similar content-fetching application on a Craigslist RSS feed URL. Don't forget that quotes are usually needed around the URL, because characters such as & and ? in the query string are otherwise interpreted by the shell:

curl "http://feedUrl" --output ~/public_html/feeds/yourFile.xml

Examine the content of the file and make sure that it's the expected XML. If the file is very small, and contains text to the effect of, "this URL has moved", then you may have forgotten to surround the URL with double quotes.
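If you'd rather not eyeball the file every time, the check can be roughed out in shell. This is only a sketch: looks_like_feed is a made-up helper name, and it assumes the feed's root element is one of rss, rdf:RDF, or feed (the usual RSS 2.0, RSS 1.0, and Atom roots), which I haven't verified against every Craigslist feed.

```shell
#!/bin/sh
# Return 0 if the file is non-empty and contains what looks like the root
# element of an RSS 2.0 (<rss>), RSS 1.0 (<rdf:RDF>), or Atom (<feed>) document.
# Assumption: one of these three roots is what the mirrored feed uses.
looks_like_feed() {
    [ -s "$1" ] && grep -q -E '<(rss|rdf:RDF|feed)([[:space:]>]|$)' "$1"
}

# Example use (hypothetical file name from the text above):
# looks_like_feed ~/public_html/feeds/yourFile.xml || echo "not a feed" >&2
```

A "this URL has moved" HTML page fails this check because it has none of those root elements, and so does an empty file.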

Creating the Yahoo Pipe

To fetch this mirrored RSS feed, use the "Fetch Data" source and provide it the URL to your freshly-fetched file.

If the pipe can't read the feed, verify the permissions on the containing folder hierarchy on your server. For *nix boxes, make sure the execute bit is set on each directory (chmod a+x ~/public_html/feeds) and that the feed file itself is world-readable.
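To find which directory in the hierarchy is the culprit, something like the following sketch may help. Both function names are hypothetical, and it assumes the web server reaches your files as an "other" user; on hosts where it runs as you or as a member of your group, different bits matter.

```shell
#!/bin/sh
# Return 0 if "others" have the execute (search) bit on the given directory.
# Reads the 10th character of the `ls -ld` mode string; x or t means searchable.
world_searchable() {
    mode=$(ls -ld "$1" 2>/dev/null | cut -c10)
    case "$mode" in
        x|t) return 0 ;;
        *)   return 1 ;;
    esac
}

# Walk from an absolute directory path up towards /, reporting every
# directory that other users cannot traverse.
check_path() {
    dir=$1
    while [ -n "$dir" ] && [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        world_searchable "$dir" || echo "not searchable by others: $dir"
        dir=$(dirname "$dir")
    done
}

# Example use (commented out; substitute your own feed directory):
# check_path "$HOME/public_html/feeds"
```

Any directory it reports can be fixed with chmod a+x on that directory.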

Automating the Update

Create a script file which will retrieve any and all feeds you wish to mirror. I place my scripts in ~/bin, so I placed the following into ~/bin/fetch-feeds:


#!/bin/sh
rm -f ~/public_html/feeds/yourFile.xml
curl "http://feedUrl" --output ~/public_html/feeds/yourFile.xml

Note that I delete the existing feed mirror before fetching the new one so that any retrieval error will be obvious.
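If you mirror several feeds, the same delete-then-fetch idea can be wrapped in a function. This is a sketch rather than the script I actually run: mirror_feed is a made-up name, and the extra curl flags (--fail, --silent, --show-error) are my additions so that an HTTP error page is never saved in place of the feed.

```shell
#!/bin/sh
# Mirror one feed: remove the stale copy, fetch a fresh one, and leave
# nothing behind if the fetch fails, so a missing file signals an error.
mirror_feed() {
    url=$1
    dest=$2
    rm -f "$dest"
    # --fail: exit non-zero on HTTP errors (4xx/5xx) instead of saving the error page.
    # --silent --show-error: no progress meter, but still print error messages.
    if ! curl --fail --silent --show-error --output "$dest" "$url"; then
        rm -f "$dest"
        return 1
    fi
}

# Hypothetical feed list; substitute your own URLs and file names.
# mirror_feed "http://feedUrl"  "$HOME/public_html/feeds/yourFile.xml"
# mirror_feed "http://feedUrl2" "$HOME/public_html/feeds/otherFile.xml"
```

Because each call returns non-zero on failure, the script can also log or email which feed went missing instead of leaving you to notice the gap.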

Now make the script executable (chmod +x ~/bin/fetch-feeds) and call it from inside your crontab (Scheduled Tasks on Windows servers):

crontab -e

I update my mirror at 7am and 2pm with the following:

# Fetch Craigslist feeds at 7am and 2pm:
0 7,14 * * * ~/bin/fetch-feeds
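The five fields are minute, hour, day of month, month, and day of week, so this entry runs at minute 0 of hours 7 and 14 every day. If you'd rather not open an editor, the entry can be added from the command line. The helper below is a hypothetical sketch, and the usage line assumes your crontab accepts a new table on standard input (`crontab -`), which is common but worth checking on your system since it replaces the whole table.

```shell
#!/bin/sh
# Read an existing crontab on stdin and print it with the given entry
# appended, unless an identical line is already present.
add_cron_line() {
    entry=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -x: match whole lines only; -F: treat the entry as a literal string.
    if ! printf '%s\n' "$existing" | grep -q -x -F -e "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Example use (commented out because it rewrites your crontab):
# crontab -l 2>/dev/null | add_cron_line "0 7,14 * * * $HOME/bin/fetch-feeds" | crontab -
```

Running it twice is harmless, since the entry is only appended when it isn't already there.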

About Jeff Fitzsimons

Jeff Fitzsimons is a software engineer in the California Bay Area. Technical specialties include C++, Win32, and multithreading. Personal interests include rock climbing, cycling, motorcycles, and photography.

1 Response to Craigslist Blocks Yahoo Pipes

  1. Evan says:

    I was trying to make a Pipe and kept getting errors and I’m glad I found your site saying CL shut off the flow to the Pipes. I’m not even sure I can use my server to do what you suggest…. I just wish that craigslist would allow this Yahoo Pipes to access their RSS feeds again.
