Tags: nutch

    1. based on an open source program called Nutch
    2. 2006-02-16, by MarkTag.com
    3. "It would be nice if there were an open-source search engine owned by the world." http://nutch.sourceforge.net/blog/cutting.html
    4. Spelling: http://www.nabble.com/Spelling-t569437.html#a1547313 I have it loaded on mozdex.com and it works fairly well. The only thing I noticed is that it seems to favor longer versions of a matching word over the obvious correction of a common mistake. For example, "diat pill" (a very common query) comes up as "diatribe pill" instead of "diet pill" :) But as my index grows, perhaps these will trickle out.
    5. Use a PHP front end with Nutch: run Tomcat on the same box as PHP, then write a PHP search page that makes an HTTP call to "http://localhost/opensearch" for the actual search operation. Parse the resulting XML (RSS 2.0) with xml_parse_into_struct() and display it. I'm using this setup on http://www.busytonight.com and it works great. --Matt
      http://www.nabble.com/Replace-Tomcat-and-JSP-with-PHP-in-Nutch-How-Hard-is-It--t515663.html#a1398738
      Here's an example of a Nutch-based site that has both /search.jsp and /opensearch interfaces available. AFAIK, it accepts all of the same parameters that the stock Nutch setup accepts:
      http://www.mozdex.com/search.jsp?query=miserable&failure
      http://www.mozdex.com/opensearch?query=miserable&failure
      Nutch 0.7 supports A9 OpenSearch RSS.
      http://fisher.osu.edu/resources/search.html http://jon.shoberg.net
      2005-12-08, by MarkTag.com
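    6. Heritrix Crawler vs. Nutch Crawler: the two have different primary goals. Heritrix is an "archival crawler", used to obtain a complete, accurate, deep copy of a site's content, including images and other non-text content. It fetches and stores all related content, rejects nothing, and does not modify page content. Re-crawling the same URL does not replace the previous copy. The crawler is started, monitored, and adjusted through a web user interface, which allows flexible definition of the URLs to fetch.
      Differences between the two: Nutch fetches and saves only indexable content, while Heritrix takes everything and strives to preserve each page as it was. Nutch can trim content or convert its format. Nutch saves content in a database-optimized format for later indexing, and a refresh replaces the old content; Heritrix appends the new content. Nutch is run and controlled from the command line; Heritrix has a web-based control and management interface. Nutch is not very customizable, though it has improved somewhat; Heritrix exposes more controllable parameters.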
      2005-11-25, by MarkTag.com
    7. Mozdex: an open-source search engine based on Nutch. http://www.mozdex.com/ http://www.nutch.org/ Interesting reading: http://www.linux.org/news/2004/04/09/0002.html http://www.technewsworld.com/story/31653.html
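
Item 5 describes a PHP page that calls Nutch's /opensearch endpoint and parses the RSS 2.0 reply with xml_parse_into_struct(). The following Python sketch, using only the standard library, shows the same flow. Several details are assumptions. The endpoint URL comes from the post, but a default Tomcat install listens on port 8080, so adjust it to your setup. The `query` parameter name is taken from the mozdex URLs. Each hit is assumed to be an RSS `<item>` with `<title>`, `<link>` and `<description>` children. The helper names `nutch_search` and `parse_rss_results` are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint, taken from the forum post; adjust host/port to your Tomcat setup.
OPENSEARCH_URL = "http://localhost/opensearch"

def parse_rss_results(xml_text):
    """Extract (title, link, description) tuples from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    results = []
    # RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>
    for item in root.iter("item"):
        results.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        ))
    return results

def nutch_search(query, base_url=OPENSEARCH_URL, timeout=10):
    """Hypothetical helper: send the query to the OpenSearch endpoint and parse the RSS reply."""
    url = base_url + "?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_rss_results(resp.read())

if __name__ == "__main__":
    # Offline demo on a hand-written RSS snippet (no running Nutch needed).
    sample = """<rss version="2.0"><channel><title>Nutch: miserable failure</title>
    <item><title>Example page</title><link>http://example.com/</link>
    <description>...a miserable failure...</description></item>
    </channel></rss>"""
    for title, link, desc in parse_rss_results(sample):
        print(title, link)
```

With a live Nutch instance you would call `nutch_search("miserable failure")` and render the returned tuples in your own page template, which is the role the PHP page plays in the original setup.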
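
Item 4 complains that the suggester proposes a longer word ("diatribe") instead of the one-letter fix ("diet"). Plain Levenshtein edit distance makes the expected ranking concrete. This is a generic illustration, not the algorithm Nutch's spell-check plugin actually uses. The tiny vocabulary and the `suggest` helper are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = cur
    return prev[-1]

def suggest(word, vocabulary):
    """Hypothetical helper: return the vocabulary word with the smallest edit distance."""
    return min(vocabulary, key=lambda v: levenshtein(word, v))

vocab = ["diatribe", "diet", "pill"]
print(levenshtein("diat", "diet"))      # 1
print(levenshtein("diat", "diatribe"))  # 4
print(suggest("diat", vocab))           # diet
```

By this measure "diet" is one substitution away while "diatribe" needs four insertions. A suggester that instead rewards long shared prefixes can end up preferring "diatribe", as the poster observed.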
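
Item 6 says Nutch replaces a page's stored copy when it is refreshed, while Heritrix appends each new capture. The toy model below isolates only that difference in update semantics. It is not the actual storage layout of either project, and the class names are hypothetical.

```python
from collections import defaultdict

class ReplacingStore:
    """Nutch-style, per the comparison above: a re-fetch overwrites the previous copy."""
    def __init__(self):
        self.pages = {}

    def save(self, url, content):
        self.pages[url] = content

    def history(self, url):
        return [self.pages[url]] if url in self.pages else []

class AppendingArchive:
    """Heritrix-style, per the comparison above: every capture is kept, in fetch order."""
    def __init__(self):
        self.captures = defaultdict(list)

    def save(self, url, content):
        self.captures[url].append(content)

    def history(self, url):
        # .get() avoids creating an empty entry for unknown URLs
        return list(self.captures.get(url, []))

for store in (ReplacingStore(), AppendingArchive()):
    store.save("http://example.com/", "version 1")
    store.save("http://example.com/", "version 2")
    print(type(store).__name__, store.history("http://example.com/"))
# ReplacingStore ['version 2']
# AppendingArchive ['version 1', 'version 2']
```

The replacing store suits a search index, which only needs the current text of each page. The appending archive suits preservation, where every past version matters.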
