Crawling in Open Source, Part 1
Introduction
In this article I will give you a short introduction to crawling in general and then move on to Apache Nutch: its history, its architecture, and its core processing steps and MapReduce functions, explained at a very technical level. Aft......
How The Yahoo! Directory Really Ranks Its Director
Last Update: Tuesday, January 04, 2005.
In this article, I show you how the Yahoo! Directory ranks its listings, so you can use this knowledge to get higher rankings.
First you must understand that Yahoo! is a web directory, NOT a search engine.
......
The Basic Parts of a Search Engine
While there are different ways to organize web content, every crawling search engine has the same basic parts:
Crawler (or Spider)
The crawler does just what its name implies. It scours the web following links, updating pages, and adding new pages when......
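
The link-following behaviour described above can be illustrated with a minimal sketch. It is not taken from the article or from any particular search engine; it assumes a breadth-first strategy, and the names `crawl`, `fetch_links`, and `TOY_WEB` are hypothetical. A small in-memory dictionary stands in for real HTTP fetching and HTML parsing.

```python
from collections import deque


def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl: follow links from the seed, visiting each page once."""
    frontier = deque([seed])  # URLs waiting to be visited
    seen = {seed}             # URLs already queued, so no page is visited twice
    order = []                # pages in the order they were visited
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:  # a newly discovered page
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical toy "web": each key is a page, each value is its outgoing links.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

pages = crawl("a", lambda url: TOY_WEB.get(url, []))
print(pages)
```

A real crawler would also need politeness delays, robots.txt handling, and a policy for revisiting pages that may have changed; those are left out here to keep the link-following loop visible.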