
Build a Robot Journalist Assistant in 3 Easy Steps

Too much information, too little time to sift through it — who has time to find the few relevant stories that dozens or hundreds of beat-related blogs and company and government sites produce every day?

How about a digital assistant? There’s a way to automate that filtering process in just a few steps using Yahoo Pipes. One of the best parts about Pipes is that you don’t need to do any heavy lifting on your own to create powerful tools. You are free to copy publicly available Pipes and alter them however you need.

That’s what we’ll do to a “robot assistant” that I built. I use it to take more than 80 RSS feeds from a wide spectrum of political sites, bloggers, analysts, lobbyists and pollsters who I think are interesting and filter the hundreds-plus posts they generate each day with just a few specific keywords. The result? I end up with 10-15 posts every day that I know are likely to interest me. It’s completely automatic; I never have to think about it.

Here’s how to take that Pipe and make it your own.


1: The easiest step: Copy the Pipe I’ve already built

First off, you’ll need a Yahoo account. Once you’ve set that up and have logged in to Yahoo, visit this Pipe: Robot and click on “View source.”

Here’s how Yahoo Pipes works. The blue bubbles in the left column are commands. You drag them to the canvas, input the needed information, and then connect them together using “pipes” that you drag between the modules. It’s visual programming.

But you don’t need to worry about any of that.

Notice up in the right-hand corner where it says “Save a copy”? Click on that. The screen will grey out for a minute, and then, apart from the words “Copy of” in front of the Pipe title, the whole thing will look like it did before. But now it’s your own. Click on “Back to My Pipes” at the top of the page and you’ll see where your Pipe is listed. We’ll come back to this page later on.

You’ll need to do the same copying process for this Pipe called subRobot as well. These two pipes need each other to run. But aside from copying subRobot, we won’t have to do anything to it. When you’re done, close that browser window.

2: Create a spreadsheet in Google Docs

This spreadsheet will contain the online sources you want to feed into your assistant. You can title the document whatever you want, but it needs to have two specific characteristics:

1) The first line of the first column must be called “feeds” (lowercase).
2) That first column is where all of your feed URLs should go, one on each line. Website URLs won’t work; they need to be RSS feed URLs.
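To make the layout concrete, here is a minimal Python sketch of how a program could read a sheet shaped like this once it is exported as CSV. The function name `read_feed_urls` and the example.com addresses are made up for illustration; Pipes handles this step with its own module, so this only shows the idea.

```python
import csv
import io


def read_feed_urls(csv_text):
    """Return the non-empty values found in the 'feeds' column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header cell has to be exactly "feeds" (lowercase), as described above.
    if reader.fieldnames is None or "feeds" not in reader.fieldnames:
        raise ValueError("the first cell of the first column must be 'feeds'")
    urls = []
    for row in reader:
        value = (row["feeds"] or "").strip()
        if value:
            urls.append(value)
    return urls


# Hypothetical sheet contents: a header line, then one feed URL per line.
sample = "feeds\nhttp://example.com/rss.xml\nhttp://example.org/feed\n"
print(read_feed_urls(sample))
```

A sheet whose header reads “Feeds” with a capital F would raise the error here, which is why the lowercase spelling matters.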

Here’s what my spreadsheet looks like.

Think of your spreadsheet as a working document. Even after the Pipe is set up you can always add more feeds or remove feeds. The more the better — up to several hundred.

Now that you’ve entered all your feeds into the spreadsheet, look up in the right hand corner to where there’s a drop-down button next to the word “Share”. Click on that and select “Publish as a web page”.

A small window will open. Change “All sheets” to “Sheet1”. Click the “Start publishing” button. When you do that, the options on the bottom half of the window will now be accessible. Change the drop-down menu that says “Web page” to “CSV (comma-separated values)”. Then select and copy the link in the box below. Your window should look something like this, although with a different link. What’s most important is that your link ends in “output=csv”.

3: Copy and paste the Docs link, add your keywords, and you’re done

Now it’s time to make this Pipe your own. Go to pipes.yahoo.com and click on My Pipes. Mouseover the Robot copy pipe and click “Edit source”. The very top module is called “Fetch CSV”. Paste that URL you copied from the spreadsheet into the top field in that box. Very important: If your link starts with “https” change it to “http”.
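The two checks on the link (it ends in “output=csv”, and it starts with “http” rather than “https”) are easy to get wrong by hand. As a rough sketch, they could be written in Python like this. The function name `prepare_csv_link` and the sample link are invented for the example and are not a real published sheet.

```python
def prepare_csv_link(link):
    """Check a published-sheet link and swap https for http."""
    link = link.strip()
    # The published link should end in output=csv, as noted in step 2.
    if not link.endswith("output=csv"):
        raise ValueError("link should end in 'output=csv'")
    # Change a leading https to http before pasting into Fetch CSV.
    if link.startswith("https://"):
        link = "http://" + link[len("https://"):]
    return link


# Hypothetical link, for illustration only.
print(prepare_csv_link("https://docs.example.com/pub?key=HYPOTHETICAL&output=csv"))
```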

The Pipe is now pulling in all of the posts from all of your RSS feeds. That was easy!

Now let’s do some customization. Scroll down to the box called “Filter”. It’s pretty self-explanatory: As the information “flows” down through the pipes to this step, we’re going to use the “Permit” option to allow only some of it to get through. If you want to keep things easy, just change those keywords to whatever you want. If you want to get a little more detailed in your filtering, I have instructions at the bottom of this page.

One final thing. It’s a quick drag-and-drop change and then we’re done. In the module called “Loop” there’s a smaller module inside of it called “[open] subRobot”. In that smaller module, click the red box in the right corner. Poof — it disappeared.

Now go to the left hand column on your screen. Scroll down and click on “My pipes”. There will be two bubbles there — drag the one called “subRobot copy” into the hole left by the module that you deleted. Once it’s there, look for where it says “Change this >” and use that menu to select “item.feeds”.

Hit the save button and you’re done! When it’s finished saving, click on the “Run Pipe” link at the very top of the page.
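If it helps to see the Loop and sub pipe arrangement outside the visual editor, here is a loose Python sketch of the same idea: go through the feed URLs one at a time, pull the items out of each feed, and pool them. This is my own approximation, not how Pipes is implemented. The names `parse_rss`, `run_robot`, and `fake_fetch` are hypothetical, the feeds are canned strings rather than real downloads, and the parser only understands plain RSS `<item>` elements.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Turn RSS text into a list of dicts (plays the role of the sub pipe)."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.iter("item"):
        items.append({
            "title": node.findtext("title", default=""),
            "description": node.findtext("description", default=""),
            "link": node.findtext("link", default=""),
        })
    return items


def run_robot(feed_urls, fetch):
    """Loop over the feed URLs and pool every item that can be parsed."""
    results = []
    for url in feed_urls:
        try:
            results.extend(parse_rss(fetch(url)))
        except (OSError, ET.ParseError):
            # Skip feeds that fail to download or are not well-formed XML.
            continue
    return results


# Canned feeds standing in for real downloads (all URLs are made up).
CANNED = {
    "http://example.com/a.xml": (
        "<rss><channel><item><title>Poll numbers</title>"
        "<description>New poll on twitter use</description>"
        "<link>http://example.com/1</link></item></channel></rss>"
    ),
    "http://example.com/broken.xml": "<rss><channel>",
    "http://example.com/b.xml": (
        "<rss><channel><item><title>Hearing moved</title>"
        "<description>Committee hearing rescheduled</description>"
        "<link>http://example.com/2</link></item></channel></rss>"
    ),
}


def fake_fetch(url):
    return CANNED[url]


items = run_robot(list(CANNED), fake_fetch)
print([item["title"] for item in items])
```

Passing the fetch function in as an argument keeps the sketch runnable without a network connection. A real version would swap `fake_fetch` for something that downloads each URL.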


More filtering info

The field called “item.description” means “everything in the RSS entry (i.e. the full or partial blog post or news story available in the RSS feed)”. In my example, I’m letting every entry that contains the words “facebook”, “social media” or “twitter” get through. If I switched “Contains” to “Does not contain” I would get the opposite result. Change those keywords to whatever you want. Use the “+” button to add more fields if you need them.
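For readers who think better in code, the keyword rule described above can be sketched in Python roughly as follows. The function name `filter_items` and the sample posts are invented. The sketch also assumes matching ignores upper and lower case, and it treats “Does not contain” as the exact opposite of “Contains”, which is a simplification of whatever Pipes does internally.

```python
def filter_items(items, keywords, field="description", contains=True):
    """Keep items whose field contains (or lacks) any of the keywords."""
    lowered = [keyword.lower() for keyword in keywords]
    kept = []
    for item in items:
        text = item.get(field, "").lower()
        hit = any(keyword in text for keyword in lowered)
        # contains=True keeps the matches; contains=False keeps the rest.
        if hit == contains:
            kept.append(item)
    return kept


# Made-up entries for illustration.
posts = [
    {"title": "A", "description": "Campaign spends big on Facebook ads"},
    {"title": "B", "description": "Committee hearing rescheduled"},
    {"title": "C", "description": "Senator joins Twitter"},
]
keywords = ["facebook", "social media", "twitter"]

print([p["title"] for p in filter_items(posts, keywords)])
print([p["title"] for p in filter_items(posts, keywords, contains=False)])
```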

If you’re interested, you have a few more options you can mix in. Click on “item.description”. In the drop-down menu, “item.Pubdate” and “item.title” are both useful. For example, use item.title to limit the flow of information to only blog posts with a certain keyword in the title. Use item.Pubdate to limit to a day or a date range.
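Here is a small Python sketch of what combining a title keyword with a date range amounts to. The helper names `title_has` and `published_between`, the sample posts, and their dates are all made up. It assumes each entry carries its publication date in the usual RSS date format, with the field spelled `pubDate` here.

```python
from datetime import date
from email.utils import parsedate_to_datetime


def title_has(item, keyword):
    """True when the keyword appears in the title, ignoring case."""
    return keyword.lower() in item.get("title", "").lower()


def published_between(item, start, end):
    """True when the entry's publication date falls in [start, end]."""
    published = parsedate_to_datetime(item["pubDate"]).date()
    return start <= published <= end


# Made-up entries for illustration.
posts = [
    {"title": "Budget vote delayed", "pubDate": "Mon, 06 Sep 2010 14:30:00 +0000"},
    {"title": "Budget primer", "pubDate": "Wed, 01 Sep 2010 09:00:00 +0000"},
    {"title": "Poll roundup", "pubDate": "Mon, 06 Sep 2010 16:00:00 +0000"},
]
start, end = date(2010, 9, 5), date(2010, 9, 7)

recent_budget = [
    p["title"] for p in posts
    if title_has(p, "budget") and published_between(p, start, end)
]
print(recent_budget)
```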

The results

Once you’ve clicked on “Run Pipe”, it will take you to the results page. Depending on the type of information you’re getting, to see your results you may need to click on the “List” tab instead of the “Image” tab (which is just a slideshow of all the various images in your results).

There are two ways to get back to this page. You can bookmark it, or you can go to your “My Pipes” page via pipes.yahoo.com. When you’re there, you’ll see both of your Pipes. Mouseover the Robot copy pipe and click “View results”.

The third and best option is to just grab the RSS feed that’s available right above the results.

Feel free to change the name from Robot to something else. But do not change the name of the sub pipe.

One last thing to note: Pipes is not real time. There can sometimes be an hour or more delay between when a site posts something and when it shows up in a Pipe. If you’re getting your Pipe results via RSS, that delay can be even longer.

If you have any questions or run into any problems, please feel free to email me: abraham@abrahamhyatt.com.

The pipes in this post are based on a method developed by the awesome Pipes guru, hapdaniel. If you start to build pipes on your own, he’s an invaluable resource in the forums.


Troubleshooting

If you’re not seeing results, double check that you haven’t accidentally made your filters too restrictive. At the bottom of the Robot pipe, click on “Pipe Output”. Use your mouse to pull up the frame at the bottom of your screen with the debugger results in it. (It will only list a few results, even though your actual output may be much larger.) Don’t like the headlines you see? Play with the keywords in the filter, and use the “Refresh” link in the debugger pane to see what your changes result in. You may have to hit refresh several times, especially if you’re dealing with a large amount of info.

If you’re still having problems, the issue may be that you’re trying to push too much information through the Pipe. The problem isn’t how many RSS feeds you have; that should be OK up to several hundred. But if your keywords aren’t specific enough and there are hundreds or thousands of results, the Pipe will likely break.
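One way to spot a keyword that is too broad is to count how many entries each one lets through on its own. The Python sketch below shows the idea on a handful of made-up entries. The function name `matches_per_keyword` is hypothetical, and Pipes itself has no such report, so you would run something like this on a sample of your own feed data.

```python
def matches_per_keyword(items, keywords, field="description"):
    """Count, for each keyword, how many items mention it (ignoring case)."""
    counts = {}
    for keyword in keywords:
        needle = keyword.lower()
        counts[keyword] = sum(
            1 for item in items if needle in item.get(field, "").lower()
        )
    return counts


# Made-up entries for illustration.
entries = [
    {"description": "Obama speech on jobs"},
    {"description": "Jobs report due Friday"},
    {"description": "Twitter adds new jobs page"},
]
print(matches_per_keyword(entries, ["jobs", "twitter"]))
```

A keyword whose count is close to the total number of entries is a good candidate to replace with something more specific.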

