Yahoo Term Extractor
August 1st, 2007
A recent project I was working on caused me to stumble over the Yahoo Term Extractor. I had never heard of it before, and it is a very underrated tool.
The Term Extraction Web Service provides a list of significant words or phrases extracted from a larger content.
So give it a paragraph of text, for example an article, a blog entry, or in my case a converted document (MS Word or PDF), and it will give you the most common and significant words or phrases.
It can help with “tagging” or other metadata-reliant applications. You can never trust users to submit decent tags.
Go on, give it a try. You’ll need an Application Key, mind.
Create a simple HTML form and give it a go.
<form id="form1" name="form1" method="post" action="http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction">
<input name="appid" type="text" id="appid" value="YOUR-APPLICATION-ID" size="80" />
<br />
<textarea name="context" id="context" cols="45" rows="5"></textarea>
<br />
<input type="submit" name="button" id="button" value="Submit" />
</form>
You can even add an optional query to help with the extraction process.
<input name="query" type="text" id="query" size="80" />
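If you would rather build the same request from PHP than from a form, something like the following might do. It is only a sketch: `build_yte_fields` is a made-up helper name, and the only thing taken from the service is the three field names used in the forms (`appid`, `context` and the optional `query`).

```php
<?php
// Hypothetical helper: builds the POST fields for a term extraction request.
// The field names 'appid', 'context' and 'query' mirror the HTML form inputs.
function build_yte_fields($appid, $context, $query = null)
{
    $fields = array('appid' => $appid, 'context' => $context);

    // Only send 'query' when one was actually supplied.
    if ($query !== null && $query !== '') {
        $fields['query'] = $query;
    }

    return $fields;
}

// http_build_query() URL-encodes the fields into a form-style POST body.
echo http_build_query(build_yte_fields('YOUR-APPLICATION-ID', 'Some text to analyse', 'text'));
```

The resulting array (or the encoded string) can be handed to `CURLOPT_POSTFIELDS` when posting with cURL.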
Once I had figured it out, I decided to loop over my existing data and populate the metadata. Damn users.
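// Both values are placeholders: your application ID and the text to analyse.
$data = array('appid' => "YOUR-APPLICATION-ID", 'context' => 'TABLE-DATA');

// POST the data to the Term Extraction service and capture the response.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$yte_input = curl_exec($ch);
curl_close($ch);

// Parse the XML response; simplexml_load_string() returns false on failure.
$phpobject = simplexml_load_string($yte_input);
if ($phpobject !== false) {
    foreach ($phpobject as $term) {
        // $term is one extracted term; $conn is your existing database connection.
        $rs_yte = $conn->Execute('YOUR INSERT QUERY');
    }
}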
Simple as SimpleXML!
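To try the parsing step without calling the service (or burning through your application ID), you can feed SimpleXML a hand-written sample. The sketch below assumes the response is a `ResultSet` root element with one `Result` child per term; the sample string is made up rather than captured from the service, and `yte_terms` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the extracted terms as a plain array of strings.
// Assumes a <ResultSet> root with one <Result> element per term.
function yte_terms($xml_string)
{
    // Collect libxml errors internally so bad input does not emit warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xml_string);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $terms = array();
    if ($xml === false) {
        // Unparseable input: return an empty list.
        return $terms;
    }

    foreach ($xml->Result as $result) {
        // Casting a SimpleXMLElement to string gives its text content.
        $terms[] = trim((string) $result);
    }

    return $terms;
}

// Hand-written sample, not a real response from the service.
$sample = '<ResultSet><Result>term extraction</Result><Result>yahoo</Result></ResultSet>';
print_r(yte_terms($sample));
```

Returning a plain array keeps the database loop separate from the XML handling, so each term can be passed to whatever INSERT query you already have.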