The biggest obstacle to effective Web searches is sheer volume. With an estimated 2 billion Web pages now online, and 40 new ones posted every minute, traditional search engines like Lycos and HotBot are drowning in data, able to index only a fraction of published sites.
Because Internet users spend 50 percent of their online time doing research (according to a Business Week/Harris poll), a new wave of search engine technology can’t come soon enough. Here’s the rundown on some current second-generation offerings.
Google (http://www.google.com/) is based on the concept that the importance of a page increases with the number of links referencing it. For example, because many more pages link to Harvard University’s home page than to its football team’s page, a Google search for Harvard would display the home page first. Google founder Larry Page says, “Accuracy increases with the number of pages indexed. We’re currently ranking over 100 million pages by solving an equation containing 500 million variables and a billion terms.”
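To make the link-counting idea concrete, here is a minimal sketch in Python. It is not Google's actual equation, which the article does not give; it assumes a simple iterative scheme in which each page passes a share of its score to the pages it links to. The function name `rank_by_links`, the `damping` value, and the example hostnames are all illustrative assumptions.

```python
def rank_by_links(links, damping=0.85, iterations=50):
    """Order pages by a link-based score (illustrative, not Google's formula).

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with an equal score for every page.
    score = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for other in pages:
                    new_score[other] += share
        score = new_score

    return sorted(pages, key=lambda page: score[page], reverse=True)


# Hypothetical miniature Web: four pages link to the home page,
# but only two link to the football page.
links = {
    "harvard.edu": ["harvard.edu/football"],
    "harvard.edu/football": ["harvard.edu"],
    "news.example": ["harvard.edu"],
    "blog.example": ["harvard.edu"],
    "fan.example": ["harvard.edu", "harvard.edu/football"],
}
ranking = rank_by_links(links)
print(ranking[:2])
```

In this toy graph the home page comes out ahead of the football page because more pages point to it, which is the behavior the Harvard example describes.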
Similar approaches come from Direct Hit (http://www.directhit.com/shopping-answers), which ranks results by looking at sites selected by users who’ve previously made the same query, and Clever, a research project developed by IBM. Clever analyzes the hyperlink structure between pages and identifies pages as authoritative sources based on the linkage patterns. It follows links pointing to other pages, taking into account the text surrounding the links. Andrew Ould, a spokesperson for IBM’s Almaden Research Center, says Clever may eventually “help automate the process of building customized directories,” but is focused on “particular topics of interest to vertical portals.”
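One well-known way to turn linkage patterns into "authoritative sources" is to give each page two scores: an authority score that rises when good hubs link to it, and a hub score that rises when it links to good authorities. The Python sketch below illustrates that idea only. Whether Clever computes its scores exactly this way is an assumption, and the sketch ignores the text surrounding each link, which the article says Clever also considers. The function name and example hostnames are hypothetical.

```python
def hubs_and_authorities(links, iterations=20):
    """Return (hub, authority) score dicts for a link graph (illustrative).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]

        # A page's hub score is the sum of the authority scores it links to.
        hub = {page: sum(auth[t] for t in links.get(page, [])) for page in pages}

        # Normalize so the scores do not grow without bound.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {page: v / auth_norm for page, v in auth.items()}
        hub = {page: v / hub_norm for page, v in hub.items()}

    return hub, auth


# Hypothetical topic cluster: three link pages pointing at three car sites.
links = {
    "directory.example": ["cars-a.example", "cars-b.example", "cars-c.example"],
    "portal.example": ["cars-a.example", "cars-b.example"],
    "homepage.example": ["cars-a.example"],
}
hub, auth = hubs_and_authorities(links)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
print(best_authority, best_hub)
```

Here the most-linked car site ends up as the top authority and the page linking to the most car sites ends up as the top hub, which is the kind of signal a topic-specific directory could be built from.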
Fast (formerly at alltheweb.com), a system being developed in Norway, lives up to its name with blazing speed, particularly when displaying images and video clips. Claiming access to about 200 million searchable documents, Fast hopes to eventually index a billion pages.
Meanwhile, existing engines such as Ask Jeeves (http://www.ask.com/) and AltaVista are going beyond Boolean-style searches and adding natural-language capabilities. Ask Jeeves’s editors use traditional search engines to help answer plain-English questions. Vice president of marketing David Helliard says Jeeves is developing a broad range of answer templates that produce keenly targeted results; editors check user logs daily for the types of questions being asked.
Among software search tools, KCSL’s X-Portal Findware uses a two-pronged approach to deliver the right answers. State your query in natural language or Boolean terms. To find the answer, Findware first mines data from 22 electronic references (“over a gigabyte” of data from encyclopedias, almanacs, dictionaries, and the like, says Markus Gunn, vice president of sales and marketing). The engine also uses proprietary technology to rate the quality and relevance of pages found by 30 traditional search engines.
Last, the KnowAll engine uses a sophisticated conceptual knowledge base to dissect your question (which can actually be longer than a sentence) into both an idea and a pattern. It then selects the search engines most likely to provide a meaningful answer, prioritizes the information returned, and stores the searches in a database you can query offline.