ADVANCED SEARCH STRATEGY/SEARCHING FULL-TEXT

Truncation and the logical operators AND and OR are the key elements needed to develop a basic search strategy for most research topics. Sometimes, however, additional procedures are needed for more precise and focused searches. Such additional precision is especially important when searching the full text of articles or other records in full-text databases.

Since truncation and the OR logical operator both broaden a search, the AND operator is the primary method discussed so far for narrowing one. When used in a full-text database, the AND logical operator often does not limit searches precisely enough, since it retrieves any record in which the ANDed search terms appear anywhere within the same record. For example, let's say we were looking for articles on how the internet is used in the field of chemistry. If we entered the search: internet AND chemistry in a full-text database, we would probably retrieve many articles that mentioned the words internet and chemistry in separate parts of the article, but which did not really deal with the relationship between the two terms in any way.

So how can you limit a search more precisely than just with an AND operator? There are two basic methods for more accurately limiting searches that are commonly used in full-text searching: "proximity operators" and "field searching." Proximity operators are used to search for records in which terms are relatively close to each other, while field searching retrieves records in which terms are in a specific part of the record. In addition, the third logical operator, NOT, may be used to exclude records that contain a given term or terms.

Proximity Operators

The "proximity operator" is another type of operator that can be used in addition to logical operators to refine searches. Although different databases use slightly different types, proximity operators are basically used to specify how near one term must be to another and, sometimes, in what word order those terms should appear. One of the most common proximity operators is W/n (the "n" stands for any number). For example, in some databases (including the Proquest Newspapers database), abortion w/20 legislation would retrieve all records containing the words "abortion" and "legislation" within 20 words of each other. Other proximity operators, available in some database programs, limit search terms to the same sentence or to the same paragraph. In the Alta Vista World Wide Web search engine, the NEAR operator (for example, abortion NEAR legislation) retrieves documents containing terms within 10 words of each other.

When using proximity operators, you should keep in mind the general principle that the closer two words are to each other in a document, the more likely they are related to each other in some way. This is basically why proximity operators provide more search precision than an AND operator. For example, an AND search would retrieve a record in which one search term appeared near the beginning of an article while another term appeared only at the very end of the article. In such a case, the two words are probably not significantly related to each other and the search is not very effective. If, on the other hand, a proximity operator is used to retrieve articles in which two search terms are within the same sentence, it is quite likely that the terms are related to each other in those records.

A proximity operator could also be effective in a search in which you initially think of using a phrase (multiple-word term). For example, the search: abortion legislation would not retrieve an article with the sentence, "This legislation has limited the availability of abortion..." This article would be retrieved, however, using a proximity operator as in the previous examples, abortion w/20 legislation or abortion NEAR legislation.
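The idea behind a W/n style operator can be illustrated with a short Python sketch. This is only an approximation written for this page: the function names tokenize and within are hypothetical, and real database programs may count words, stop words, and word order differently.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within(text, term_a, term_b, n):
    """Return True if term_a and term_b occur within n words of each other.

    Word order is ignored here; this is an assumption, since some
    databases offer separate ordered and unordered proximity operators.
    """
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

sentence = "This legislation has limited the availability of abortion"

# The two terms are 6 words apart, so w/20 matches and w/3 does not.
print(within(sentence, "abortion", "legislation", 20))  # prints True
print(within(sentence, "abortion", "legislation", 3))   # prints False
```

Note that a plain AND search corresponds to making n as large as the whole document, which is why it is so much less precise.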

Adjacency Operators for Search Phrases

In some database search programs, a special kind of proximity operator must be used when doing a keyword search for search phrases (search terms with multiple words, such as "information services" or "college students"). Although many databases will search for phrases exactly as they are entered, some databases require that you indicate that a set of words must be searched as a phrase. In most Web search engines, for example, to search for the phrase: information services, you must put quotes around the phrase, like this: "information services". In databases on the Dialog online service, you must enter (w) between the words in a phrase, e.g. information(w)services.
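An adjacency (phrase) search can be thought of as a proximity search in which the words must be directly next to each other and in the order entered. The following Python sketch shows that idea under simple assumptions; the function name contains_phrase and the sample titles are hypothetical, and it does not reproduce how any particular search engine or the Dialog service actually works.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_phrase(text, phrase):
    """Return True if the words of phrase appear adjacent and in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    if not target:
        return False
    span = len(target)
    # Slide a window the length of the phrase across the document words.
    return any(words[i:i + span] == target
               for i in range(len(words) - span + 1))

# Hypothetical titles, invented for illustration.
print(contains_phrase("New Information Services for College Students",
                      "information services"))   # prints True
print(contains_phrase("Services That Supply Information",
                      "information services"))   # prints False
```

The second title contains both words, so an AND search would retrieve it, but the phrase search does not because the words are neither adjacent nor in the right order.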

Field Searching

"Field searching", sometimes referred to as "field delimiting", is an additional method of refining searches that can allow even more precise searching than using just logical operators and proximity operators. Field searching allows you to focus your search on specific fields in all records and to bypass irrelevant information. Different databases include different fields and use different methods for field searching. Some common fields include subject, title, author, journal name, date and language. When searching for a specific subject, the most common field to search is the subject or descriptor field. It is important to remember, however, that the terms included in the subject or descriptor field of most databases are usually limited to those terms included in the controlled vocabulary for that database. To search in the subject or descriptor field, therefore, it is most effective to use terms from the controlled vocabulary for the specific database being searched. In some databases, however, proper names and other exceptional terms may be included in the subject field even though they are not listed in the controlled vocabulary list.

When searching full-text databases, it is often effective to begin by limiting a search to the subject or descriptor field, if the database is adequately indexed. If the result of a subject index search is too limited, the search can be extended to other fields that are broader than the subject terms but still more limited than searching the full text. The abstract field, when available, can be a very effective field to search, since it includes more words than just those in the controlled vocabulary but is limited to words that summarize the key ideas of the full article or other document. Since many full-text databases do not include abstracts, the lead paragraph field is another field that is often used for a degree of search precision and breadth relatively similar to abstracts. The lead paragraph field, which is included in most newspaper databases as well as many other full-text databases, generally includes the first paragraph (or particular number of words comparable to a long paragraph) at the beginning of each document. In some database search programs the headline (or article title) and the lead paragraph can be searched at the same time. (In the Nexis service, for example, the designation for the headline-lead paragraph field is "HLEAD.") When this is available, adding the headline can be somewhat more effective than searching the lead paragraph alone.
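The effect of limiting a search to certain fields can be sketched in Python as follows. The records, the field names (headline, lead, body) and the function search_fields are all invented for illustration; actual databases define their own fields and field-searching syntax.

```python
import re

def words(text):
    """Return the set of lowercase word tokens in text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def search_fields(records, terms, fields):
    """Return ids of records in which every term (ANDed) occurs
    somewhere in the named fields."""
    hits = []
    for rec in records:
        searchable = set()
        for field in fields:
            searchable |= words(rec.get(field, ""))
        if all(term.lower() in searchable for term in terms):
            hits.append(rec["id"])
    return hits

# Two hypothetical newspaper records.
RECORDS = [
    {"id": 1,
     "headline": "Chemistry Teachers Turn to the Internet",
     "lead": "Instructors are posting lab data and simulations online.",
     "body": "The internet has changed how chemistry courses share results."},
    {"id": 2,
     "headline": "City Budget Approved",
     "lead": "The council passed the budget after a long debate.",
     "body": "One member cited an internet poll. Another, a chemistry teacher, objected."},
]

terms = ["internet", "chemistry"]
# Searching every field behaves like a full-text AND search.
print(search_fields(RECORDS, terms, ["headline", "lead", "body"]))  # prints [1, 2]
# Searching only headline and lead paragraph drops the incidental mention.
print(search_fields(RECORDS, terms, ["headline", "lead"]))          # prints [1]
```

The second search is comparable in spirit to a combined headline and lead paragraph search: record 2 mentions both words only in passing, deep in the body, so it is no longer retrieved.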

The "NOT" Operator

The NOT logical operator excludes or eliminates records that contain a given term or terms. Placing a NOT between two terms instructs the computer to search for all records that contain the first term but that do NOT contain the second term. For example, if you were looking for articles on information services that were not provided by libraries, you could enter the search: information services NOT libraries. The search would retrieve articles dealing with information services, but any article containing the word libraries would be eliminated. The NOT operator should be used very cautiously because it can unexpectedly eliminate records that the searcher would actually want to retrieve. For example, in the search: information services NOT libraries, if an article was titled "Information Services Outside of Libraries" and the search included the title field, this article would be eliminated even though it is exactly on topic.
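The risk of the NOT operator can be seen in a small Python sketch. The function phrase_not_term is hypothetical, and only the first title comes from the example above; the other two titles are invented for illustration.

```python
import re

def phrase_not_term(titles, phrase, excluded):
    """Return titles that contain phrase but do NOT contain the excluded word."""
    results = []
    for title in titles:
        lowered = title.lower()
        has_phrase = phrase.lower() in lowered
        has_excluded = excluded.lower() in re.findall(r"[a-z]+", lowered)
        if has_phrase and not has_excluded:
            results.append(title)
    return results

TITLES = [
    "Information Services Outside of Libraries",   # on topic, but contains "libraries"
    "Commercial Information Services on the Web",  # hypothetical title
    "Information Services in Public Libraries",    # hypothetical title
]

print(phrase_not_term(TITLES, "information services", "libraries"))
# prints ['Commercial Information Services on the Web']
```

The first title is dropped along with the third, because the program only checks whether the word is present, not what the article says about it.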

Relevance Ranking and Natural Language Searching

For many years computer scientists have been attempting to develop innovations in database search programs (called "search engines") that could improve on traditional Boolean searching capabilities. Various new search features that can potentially help less skilled searchers achieve more accurate search results--especially in full-text databases--are now commonly available in World Wide Web search engines and in some online database services. These features apply new database program capabilities most commonly referred to as "relevance ranking" and "natural language searching." Essentially, relevance ranking scores documents primarily according to the frequency with which key terms appear in individual documents as compared to the frequency of those terms in the entire database. (Some programs apply similar or additional relevancy criteria.) Based on statistical analyses of these criteria, the program ranks documents according to those considered to be most relevant and displays them in that order. Natural language searching allows you to enter search descriptions in plain English; the program then uses various linguistic and other techniques to identify significant words and phrases to be searched.
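One common way to compare a term's frequency in a document with its frequency in the whole database is a weighting often called TF-IDF. The Python sketch below uses a simple variant of it purely as an illustration; it is an assumption, not a description of how any particular search engine ranks results, and the documents and the function name rank are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def rank(documents, query):
    """Return (document index, score) pairs, highest score first."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in the "database" that contain each word.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(tokens)                # frequency within this document
                idf = math.log(n_docs / doc_freq[term]) + 1.0  # rarer in the database scores higher
                score += tf * idf
        scores.append((index, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

DOCS = [
    "Chemistry students use the internet to share chemistry lab data",
    "The council debated the budget and one member mentioned the internet",
    "The library added new chemistry journals",
]

for index, score in rank(DOCS, "internet chemistry"):
    print(index, round(score, 3))
```

In this small example the first document ranks highest because the query terms make up a larger share of its words, while the council article, which mentions the internet only once in a longer text, ranks last.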

Search features such as relevance ranking can help achieve more accurate searches than Boolean techniques alone, especially with certain types of searches. These features tend to be more useful when researching conceptual and complex issues than when the topic is highly specific, such as a search involving names of people or organizations. The quality of search results has commonly been measured by two standards: precision and recall. Precision refers to the proportion of the documents retrieved in the search that are relevant to the research question, while recall is the proportion of all the documents in the database relevant to the research question that are actually retrieved by the search. Relevance ranking and natural language searching may provide better recall in certain searches, but more precision still usually requires human judgment. Although some people believe that these new types of search techniques will eventually replace Boolean searching, these features should generally be used as additional tools that can supplement Boolean strategies rather than substitute for them.
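The two measures defined above can be worked through with a small Python example. The document numbers are made up, and the function name precision_recall is hypothetical; in practice, deciding which documents are "relevant" requires human judgment.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids,
    given the set of ids that are actually relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 8 documents retrieved, 6 relevant documents
# exist in the database, and 4 of those were among the 8 retrieved.
retrieved = {1, 2, 3, 4, 5, 6, 7, 8}
relevant = {1, 2, 3, 4, 9, 10}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints precision = 0.50, recall = 0.67
```

Here half of what was retrieved is relevant (precision), and two thirds of the relevant documents were found (recall). Broadening the search would tend to raise recall at the cost of precision, and narrowing it would tend to do the reverse.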


last revised: 2-29-00 by Eric Brenner, Skyline College, San Bruno, CA

These materials may be used for educational purposes if you inform and credit the author and cite the source as: LSCI 105: Online Research. All commercial rights are reserved. To contact the author, send comments or suggestions to: Eric Brenner at brenner@smcccd.cc.ca.us