<?xml version="1.0" encoding="UTF-8" ?>

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
    <title>marktag.com bookmarks: nutch</title> 
    <link>/</link> 
    <description>Recent bookmarks posted to marktag.com</description>
    <ttl>60</ttl>

    <item>
        <title>Acrosscan.ca - Canadian Search Engine</title>
        <link>http://www.acrosscan.com/</link>
        <description>Based on Nutch, an open-source search program.</description>
        <dc:creator>MarkTag.com</dc:creator>
        <pubDate>Wed, 14 Jun 2006 10:44:10 +0000</pubDate>

            <category>acrosscan.ca</category>
            <category>canadian</category>
            <category>engine</category>
            <category>nutch</category>
            <category>search</category>
    
    </item>
    <item>
        <title>Creative Commons: search results</title>
        <link>http://search.creativecommons.org/</link>
        <description></description>
        <dc:creator>MarkTag.com</dc:creator>
        <pubDate>Wed, 22 Feb 2006 17:51:44 +0000</pubDate>

            <category>creativecommons search</category>
            <category>nutch</category>
    
    </item>
    <item>
        <title>Nutch - SWiK</title>
        <link>http://swik.net/nutch</link>
        <description></description>
        <dc:creator>MarkTag.com</dc:creator>
        <pubDate>Thu, 16 Feb 2006 17:02:20 +0000</pubDate>

            <category>nutch</category>
            <category>swik</category>
    
    </item>
    <item>
        <title>Enjoy Nutch-ing</title>
        <link>http://www.nutch.cn/</link>
        <description>&quot;It would be nice if there were an open-source search engine owned by the world.&quot; 
http://nutch.sourceforge.net/blog/cutting.html</description>
        <dc:creator>MarkTag.com</dc:creator>
        <pubDate>Wed, 25 Jan 2006 18:53:33 +0000</pubDate>

            <category>free search</category>
            <category>nutch</category>
            <category>nutch.cn</category>
            <category>open-source search engine</category>
    
    </item>
    <item>
        <title>PublicServers - Nutch Wiki</title>
        <link>http://wiki.apache.org/nutch/PublicServers</link>
        <description></description>
        <dc:creator>MarkTag.com</dc:creator>
        <pubDate>Wed, 25 Jan 2006 18:35:05 +0000</pubDate>

            <category>based on nutch</category>
            <category>nutch</category>
            <category>publicservers</category>
    
    </item>
    <item>
        <title>[#NUTCH-48]</title>
        <link>http://issues.apache.org/jira/browse/NUTCH-48</link>
        <description>Spelling
http://www.nabble.com/Spelling-t569437.html#a1547313
I have it loaded on mozdex.com and it works fairly well.

The only thing I noticed is that it seems to look for longer versions of a matching phrase rather than immediate common mistakes.

For example, &quot;diat pill&quot; (which is a very common query) comes up as &quot;diatribe pill&quot; instead of &quot;diet pill&quot; :)

But as my index grows, perhaps these will trickle out.</description>
        <dc:creator>MarkTag.com</dc:creator>
        <pubDate>Thu, 08 Dec 2005 10:33:49 +0000</pubDate>

            <category>nutch</category>
            <category>patch</category>
            <category>spelling</category>
            <category>spelling suggestions</category>
    
    </item>
    <item>
        <title>nutch, opensearch, a9, rss</title>
        <link>http://www.mozdex.com/search.jsp?query=miserable&amp;failure</link>
        <description>Use a PHP front-end with Nutch. Run Tomcat on the same box as PHP. Then write a PHP search page that makes an HTTP call to &quot;http://localhost/opensearch&quot; for the actual search operation. Parse the resulting XML (RSS 2.0) with xml_parse_into_struct() and display it. I'm using this setup on http://www.busytonight.com and it works great. --Matt
http://www.nabble.com/Replace-Tomcat-and-JSP-with-PHP-in-Nutch-How-Hard-is-It--t515663.html#a1398738
Here's an example of a Nutch-based site that has both /search.jsp and /opensearch interfaces available. AFAIK, it accepts all of the same parameters that the stock Nutch setup accepts.
http://www.mozdex.com/search.jsp?query=miserable&amp;failure
http://www.mozdex.com/opensearch?query=miserable&amp;failure

Nutch 0.7 supports A9 OpenSearch RSS.
http://fisher.osu.edu/resources/search.html
http://jon.shoberg.net</description>
        <dc:creator>MarkTag.com</dc:creator>
        <pubDate>Thu, 08 Dec 2005 09:55:59 +0000</pubDate>

            <category>a9</category>
            <category>nutch</category>
            <category>opensearch</category>
            <category>rss</category>
    
    </item>
    <item>
        <title>Heritrix Crawler vs. Nutch Crawler</title>
        <link>http://www.dbanotes.net/web/heritrix_crawler_vs_nutch_crawler.html</link>
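        <description>Heritrix Crawler vs. Nutch Crawler

The two have different main goals. Heritrix is an &quot;archival crawler&quot;: it is designed to make complete, accurate, deep copies of site content, including images and other non-text content. It fetches and stores the relevant content, accepts everything it finds, and does not modify page content. When the same URL is re-crawled, the new copy does not replace the previous one. The crawler is started, monitored, and tuned through a web user interface, which allows flexible definition of the URLs to fetch.
Differences between the two:

Nutch fetches and stores only indexable content. Heritrix takes everything, aiming to preserve each page exactly as it was.
Nutch can trim content or convert its format.
Nutch stores content in a database-optimized format for later indexing, and a refresh replaces the old content. Heritrix instead adds (appends) the new content.
Nutch is run and controlled from the command line. Heritrix has a web-based management interface.
Nutch is less customizable, although this has improved somewhat. Heritrix exposes more configurable parameters.</description>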
        <dc:creator>MarkTag.com</dc:creator>
        <pubDate>Fri, 25 Nov 2005 16:40:55 +0000</pubDate>

            <category>crawler</category>
            <category>heritrix</category>
            <category>nutch</category>
            <category>vs</category>
    
    </item>
    <item>
        <title>Mozdex -- open source search engine based on Nutch</title>
        <link>http://www.mozdex.com/</link>
        <description>Mozdex -- open source search engine based on Nutch
http://www.mozdex.com/ 
http://www.nutch.org/ 

Interesting reading:
http://www.linux.org/news/2004/04/09/0002.html 
http://www.technewsworld.com/story/31653.html</description>
        <dc:creator>MarkTag.com</dc:creator>
        <pubDate>Fri, 25 Nov 2005 15:32:53 +0000</pubDate>

            <category>based on</category>
            <category>engine</category>
            <category>mozdex</category>
            <category>nutch</category>
            <category>open source</category>
            <category>search</category>
    
    </item>
    <item>
        <title>未知都是已知的: First Impressions of Nutch</title>
        <link>http://www.dbanotes.net/archives/2005/01/nutch_aee.html</link>
        <description>Personal or topic-specific search engines powered by Nutch</description>
        <dc:creator>MarkTag.com</dc:creator>
        <pubDate>Fri, 25 Nov 2005 12:28:57 +0000</pubDate>

            <category>nutch</category>
            <category>first impressions</category>
            <category>未知都是已知的</category>
            <category>personal search engine</category>
            <category>topic search engine</category>
    
    </item>

</channel>
</rss>