caltcasc posted on 2007-5-6 18:52:54

Search Engines

Internet search tools fall into two camps: search engines, such as HotBot and AltaVista, and online directories, such as Yahoo and Lycos. The difference between the two lies in how they compile their site listings. Of course, there are exceptions to every rule. Some search utilities, such as Ask Jeeves, combine the search engine and directory approaches into a single package, hoping to give users the best of both worlds.

  In directory-based search services, the Web site listings are compiled manually. For example, the ever-popular Yahoo dedicates staff to accept site suggestions from users, review and categorize them, and add them to a specific directory on the Yahoo site.

  You can usually submit your Web site simply by filling out an online form. On Yahoo, for example, you'll find submission information at www.yahoo.com/docs/info/include.html. Because human intervention is necessary to process, verify, and review submission requests, expect a delay before your site secures a spot in a directory-based search service.

  On the flip side, search engines completely automate the compilation process, removing the human component entirely.

  A software robot, called a spider or crawler, automatically fetches sites all over the Web, reading pages and following associated links. By design, a spider will return to a site periodically to check for new pages and changes to existing pages.
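  The fetch-read-follow loop described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine's spider works: the names `LinkExtractor`, `crawl`, and `TOY_WEB` are made up for this example, and the "Web" is a small in-memory dictionary so that nothing is fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, read it, queue the links it contains."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page is missing or unreachable, skip it
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical three-page site held in memory (assumption for the example).
TOY_WEB = {
    "/home": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/home">Home</a>',
    "/news": '<a href="/missing">Old story</a>',
}

pages = crawl("/home", TOY_WEB.get)
print(sorted(pages))  # ['/about', '/home', '/news']
```

  A real spider would replace `TOY_WEB.get` with an HTTP fetch and would re-run the crawl on a schedule to pick up new and changed pages.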

  Results from spidering are recorded in the search engine's index or catalog. Given the wealth of information available on the Internet, it is not surprising that indexes grow very large. For example, the AltaVista index has recently been increased to top out at 350 million pages. This may seem like a mammoth number, but by all estimates it still represents less than 35 percent of all pages on the Web.
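  One common way to picture such an index is as an inverted index: a table that maps each word to the pages containing it. The sketch below assumes that structure purely for illustration; the post does not say how any engine stores its index, and `tokenize`, `build_index`, and the sample `PAGES` are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


# Hypothetical page texts standing in for what a spider brought back.
PAGES = {
    "/home": "Welcome to our search engine guide",
    "/about": "About this guide to online directories",
    "/news": "Search engine index grows again",
}

index = build_index(PAGES)
print(sorted(index["guide"]))   # ['/about', '/home']
print(sorted(index["search"]))  # ['/home', '/news']
```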

  Because of the depth and breadth of information being indexed, there is usually a delay, sometimes up to several weeks, between the time a site has been “spidered” and when it appears in a search index. Until this two-step process has been completed, a site remains unavailable to search queries.

  Finally, the heart of each search engine is an algorithm that matches keyword queries against the information in the index, ranking results in the order the algorithm deems most relevant.
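  As a rough illustration of matching and ranking, the sketch below scores each page by how many times the query words occur in it and lists the best score first. This scoring rule is an assumption chosen for simplicity; real engines use their own, far more elaborate formulas. The names `rank` and `PAGES` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, pages):
    """Return the URLs that match the query, highest score first."""
    terms = tokenize(query)
    scores = {}
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:  # pages with no matching word are left out
            scores[url] = score
    # Sort by score (descending), then by URL so ties have a fixed order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical page texts (assumption for the example).
PAGES = {
    "/a": "search engines index the web",
    "/b": "a search engine ranks search results for each search",
    "/c": "online directories are compiled by hand",
}

results = rank("search engine", PAGES)
print(results)  # ['/b', '/a']
```

  Note that "/a" scores only on "search": this simple matcher treats "engines" and "engine" as different words.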

  Because the spiders, resulting indexes, and search algorithms of each search engine differ, so do the search results and rankings across the various search engines. This explains why a top 10 site in HotBot may not appear near the top of AltaVista when the same keyword search criterion is entered.
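  To see how the same query can produce different rankings, here is a small sketch with two made-up scoring rules applied to the same two pages. Neither rule is claimed to be what HotBot or AltaVista actually uses; `count_score`, `density_score`, and `PAGES` are hypothetical.

```python
def count_score(words, term):
    """Score by how many times the term occurs."""
    return words.count(term)


def density_score(words, term):
    """Score by the share of the page's words that are the term."""
    return words.count(term) / len(words) if words else 0.0


def rank(term, pages, score):
    """Rank matching pages by the given scoring rule, best first."""
    scored = {url: score(text.lower().split(), term) for url, text in pages.items()}
    matches = [url for url in scored if scored[url] > 0]
    return sorted(matches, key=lambda url: (-scored[url], url))


# Hypothetical page texts (assumption for the example).
PAGES = {
    "/long": "search tips and search tricks for every search tool on the web today",
    "/short": "search search basics",
}

engine_a = rank("search", PAGES, count_score)
engine_b = rank("search", PAGES, density_score)
print(engine_a)  # ['/long', '/short']
print(engine_b)  # ['/short', '/long']
```

  The long page wins on raw count (3 occurrences against 2), while the short page wins on density (2 of 3 words against 3 of 13).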

  In addition, many, but not all, search utilities also reference metatags (invisible HTML tags within documents that describe their content) as a way to control how content is indexed. As a result, proper use of metatags throughout a site can also boost search engine ranking.
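  For readers who have not seen metatags, the sketch below shows a page carrying the widely used `description` and `keywords` metatags and reads them back with Python's standard `html.parser` module. The page content and the `MetaTagReader` class are invented for the example; whether and how a given search utility uses these tags varies.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.meta[name.lower()] = content


# Hypothetical page (assumption for the example).
PAGE = """
<html>
<head>
<title>Garden Tools</title>
<meta name="description" content="Hand tools for small gardens.">
<meta name="keywords" content="garden, tools, spade, trowel">
</head>
<body><p>Welcome.</p></body>
</html>
"""

reader = MetaTagReader()
reader.feed(PAGE)
keywords = [word.strip() for word in reader.meta["keywords"].split(",")]
print(reader.meta["description"])  # Hand tools for small gardens.
print(keywords)                    # ['garden', 'tools', 'spade', 'trowel']
```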
