WHAT IS A DATABASE?

When you use an automated teller machine to withdraw money from your bank account, you are using a database. When a travel agent makes an airline reservation for you, she is using a database. When a telephone operator gives you a phone number, he is using a database. Virtually any significant collection of information stored on a computer is organized as a database. Databases dealing with published information usually found in libraries, such as books, articles and other types of documents, are commonly called bibliographic databases. Until recently, bibliographic databases have been primarily access tools. Access tools--such as catalogs and indexes--do not give you the actual book, article or other material with the full information you'll need for your research; they give you enough information to find that material. As you'll learn later in this chapter, it is increasingly common to find databases that go beyond access tools and include the full text or other information from articles, books or other documents. Understanding the elements of databases and how they work is a key to accessing information in the information age.

Earlier we defined a database simply as a collection of information organized in a systematic form so that specific pieces of the information can be easily accessed. If we take the books in a library and list the title, author and subject(s) of each book, we have a collection of information that can be organized as a database. The trick is to use a form of organization that can make it easy to find all of the books on a specific subject or by a particular author.

Records and Fields

All databases are organized by two basic elements: records and fields.

Records are the units of information that can be retrieved. In our library example, all of the information describing each book is a record.

Fields are the different parts of a record by which the record is retrievable. The fields in our example are title, author and subject. Let's look at how records and fields are arranged in a very limited library database.

          (FIELD 1)           (FIELD 2)                             (FIELD 3)
          AUTHOR              TITLE                                 SUBJECT(S)

RECORD 1  HYMAN, RICHARD      INFORMATION ACCESS                    ONLINE BIBLIOGRAPHIC SEARCHING; BIBLIOGRAPHICAL SERVICES; CATALOGING--HISTORY
RECORD 2  MOSCO, VINCENT      THE POLITICAL ECONOMY OF INFORMATION  COMMUNICATION; INFORMATION SCIENCE--ECONOMIC ASPECTS; MASS MEDIA
RECORD 3  SAXBY, STEPHEN      THE AGE OF INFORMATION                COMPUTERS AND CIVILIZATION; INFORMATION TECHNOLOGY--SOCIAL ASPECTS
RECORD 4  TECHRANIAN, MAJID   TECHNOLOGIES OF POWER                 INFORMATION TECHNOLOGY--SOCIAL ASPECTS; DEMOCRACY
RECORD 5  TUCKER, FRANK       THE FRONTIER SPIRIT AND PROGRESS      CIVILIZATION, MODERN--1950- ; MASS MEDIA; PROGRESS
RECORD 6  WOODWARD, KATHLEEN  THE MYTHS OF INFORMATION              TECHNOLOGY AND CIVILIZATION; INFORMATION THEORY; MASS MEDIA

You can see that in this simple database, each line of information is a record and each column of information represents a field. When information is organized in this way in a computerized database, the process of retrieving information can be automated.
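The same six records can be sketched in a program. The following is only an illustration, assuming each record is stored as a Python dictionary whose keys are the field names; the names RECORDS and column are made up for this example and do not come from any real catalog software.

```python
# Each record is one dictionary; each key names a field.
# (The dictionary layout and all names here are illustrative assumptions.)
RECORDS = [
    {"author": "HYMAN, RICHARD",
     "title": "INFORMATION ACCESS",
     "subjects": ["ONLINE BIBLIOGRAPHIC SEARCHING", "BIBLIOGRAPHICAL SERVICES",
                  "CATALOGING--HISTORY"]},
    {"author": "MOSCO, VINCENT",
     "title": "THE POLITICAL ECONOMY OF INFORMATION",
     "subjects": ["COMMUNICATION", "INFORMATION SCIENCE--ECONOMIC ASPECTS",
                  "MASS MEDIA"]},
    {"author": "SAXBY, STEPHEN",
     "title": "THE AGE OF INFORMATION",
     "subjects": ["COMPUTERS AND CIVILIZATION",
                  "INFORMATION TECHNOLOGY--SOCIAL ASPECTS"]},
    {"author": "TECHRANIAN, MAJID",
     "title": "TECHNOLOGIES OF POWER",
     "subjects": ["INFORMATION TECHNOLOGY--SOCIAL ASPECTS", "DEMOCRACY"]},
    {"author": "TUCKER, FRANK",
     "title": "THE FRONTIER SPIRIT AND PROGRESS",
     "subjects": ["CIVILIZATION, MODERN--1950-", "MASS MEDIA", "PROGRESS"]},
    {"author": "WOODWARD, KATHLEEN",
     "title": "THE MYTHS OF INFORMATION",
     "subjects": ["TECHNOLOGY AND CIVILIZATION", "INFORMATION THEORY",
                  "MASS MEDIA"]},
]


def column(records, field):
    """Return one column of the table: the given field from every record."""
    return [record[field] for record in records]


# One line per record, as in the table above.
for number, record in enumerate(RECORDS, start=1):
    print(f"RECORD {number}: {record['author']} / {record['title']}")

# One field taken from every record.
print(column(RECORDS, "author"))
```

Because every record has the same fields, a program can pick out any one field from all of the records, which is what makes automated retrieval possible.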

Accessing Database Information by "Browsing" an Index

Let's see what actually happens when we try to access information in this computerized database. There are two basic methods commonly used to retrieve records from a database. The first method involves looking through an alphabetical list of all of the words or phrases in a particular field. In database terminology, such a list is called an "index." In our simplified database, the subject "index" would look like this:

BIBLIOGRAPHICAL SERVICES

CATALOGING--HISTORY

CIVILIZATION, MODERN--1950-

COMMUNICATION

COMPUTERS AND CIVILIZATION

DEMOCRACY

INFORMATION SCIENCE--ECONOMIC ASPECTS

INFORMATION TECHNOLOGY--SOCIAL ASPECTS

INFORMATION THEORY

MASS MEDIA

ONLINE BIBLIOGRAPHIC SEARCHING

PROGRESS

TECHNOLOGY AND CIVILIZATION

In order to find all of the records on a specific subject, such as "mass media", we could simply look through the subject index to see if that subject is listed. If we find a subject in the index, we know that there are records on this subject in the database and we can simply select that subject in order to view all of the records on the subject. If we select MASS MEDIA from this index, the database program displays the following records:

          (FIELD 1)           (FIELD 2)                             (FIELD 3)
          AUTHOR              TITLE                                 SUBJECT(S)

RECORD 1  MOSCO, VINCENT      THE POLITICAL ECONOMY OF INFORMATION  COMMUNICATION; INFORMATION SCIENCE--ECONOMIC ASPECTS; MASS MEDIA
RECORD 2  TUCKER, FRANK       THE FRONTIER SPIRIT AND PROGRESS      CIVILIZATION, MODERN--1950- ; MASS MEDIA; PROGRESS
RECORD 3  WOODWARD, KATHLEEN  THE MYTHS OF INFORMATION              TECHNOLOGY AND CIVILIZATION; INFORMATION THEORY; MASS MEDIA

This process of looking through an index to find the right word(s) to retrieve specific records is commonly called "browsing" an index. Browsing an index is especially useful when you are not sure what word(s) to use for a search.
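Browsing can be sketched in a few lines of Python. This is a minimal illustration, not how any particular catalog program is written: it assumes the records are held as dictionaries, and the names RECORDS, build_index and subject_index are invented for the example.

```python
# The six sample records; the dictionary layout is an illustrative assumption.
RECORDS = [
    {"author": "HYMAN, RICHARD", "title": "INFORMATION ACCESS",
     "subjects": ["ONLINE BIBLIOGRAPHIC SEARCHING", "BIBLIOGRAPHICAL SERVICES",
                  "CATALOGING--HISTORY"]},
    {"author": "MOSCO, VINCENT", "title": "THE POLITICAL ECONOMY OF INFORMATION",
     "subjects": ["COMMUNICATION", "INFORMATION SCIENCE--ECONOMIC ASPECTS",
                  "MASS MEDIA"]},
    {"author": "SAXBY, STEPHEN", "title": "THE AGE OF INFORMATION",
     "subjects": ["COMPUTERS AND CIVILIZATION",
                  "INFORMATION TECHNOLOGY--SOCIAL ASPECTS"]},
    {"author": "TECHRANIAN, MAJID", "title": "TECHNOLOGIES OF POWER",
     "subjects": ["INFORMATION TECHNOLOGY--SOCIAL ASPECTS", "DEMOCRACY"]},
    {"author": "TUCKER, FRANK", "title": "THE FRONTIER SPIRIT AND PROGRESS",
     "subjects": ["CIVILIZATION, MODERN--1950-", "MASS MEDIA", "PROGRESS"]},
    {"author": "WOODWARD, KATHLEEN", "title": "THE MYTHS OF INFORMATION",
     "subjects": ["TECHNOLOGY AND CIVILIZATION", "INFORMATION THEORY",
                  "MASS MEDIA"]},
]


def build_index(records, field):
    """Map each distinct entry in a multi-valued field to the records containing it."""
    index = {}
    for record in records:
        for entry in record[field]:
            index.setdefault(entry, []).append(record)
    return index


subject_index = build_index(RECORDS, "subjects")

# "Browsing": display the index entries in alphabetical order.
for heading in sorted(subject_index):
    print(heading)

# "Selecting" an entry retrieves every record filed under it.
for record in subject_index.get("MASS MEDIA", []):
    print(record["author"], "/", record["title"])
```

The index is built once from the records, so looking up a subject afterward does not require reading through every record again.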

 

Accessing Database Information by "Searching" for "Keywords"

The second method of accessing records--commonly called "keyword searching"--does not display an index. A keyword search simply looks for all of the records that contain the given word or words, sometimes called "keywords" or "search terms". A keyword search may or may not be limited to a specified field or group of fields. If we do a keyword search for "information" in all of the fields in our database, the computer looks for any appearance of the word "information" anywhere in any record. All of the records containing "information" are then copied into a new list, or "set", of records. The resulting set is shown below:

          (FIELD 1)           (FIELD 2)                             (FIELD 3)
          AUTHOR              TITLE                                 SUBJECT(S)

RECORD 1  HYMAN, RICHARD      INFORMATION ACCESS                    ONLINE BIBLIOGRAPHIC SEARCHING; BIBLIOGRAPHICAL SERVICES; CATALOGING--HISTORY
RECORD 2  MOSCO, VINCENT      THE POLITICAL ECONOMY OF INFORMATION  COMMUNICATION; INFORMATION SCIENCE--ECONOMIC ASPECTS; MASS MEDIA
RECORD 3  SAXBY, STEPHEN      THE AGE OF INFORMATION                COMPUTERS AND CIVILIZATION; INFORMATION TECHNOLOGY--SOCIAL ASPECTS
RECORD 4  TECHRANIAN, MAJID   TECHNOLOGIES OF POWER                 INFORMATION TECHNOLOGY--SOCIAL ASPECTS; DEMOCRACY
RECORD 5  WOODWARD, KATHLEEN  THE MYTHS OF INFORMATION              TECHNOLOGY AND CIVILIZATION; INFORMATION THEORY; MASS MEDIA

This illustrates the most basic process of "database searching"--that is, looking up information using a computerized database. Notice that it is essentially a system of matching words. The searcher must come up with the right word (or combination of words) that exactly matches the same word(s) within the records that are needed. The "browse" method uses an index in a database to help identify the desired search term in a specific field. The "keyword search" method looks directly for records containing the given search terms without displaying an index and without necessarily limiting the search to a specific field.
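A keyword search over the sample records might be sketched as follows. This is a simplified illustration under stated assumptions: records are Python dictionaries, a "word" is any run of letters or digits, and the names keyword_search and words_in are invented here rather than taken from a real database program.

```python
import re

# The six sample records; subjects are kept as one text string per record.
RECORDS = [
    {"author": "HYMAN, RICHARD", "title": "INFORMATION ACCESS",
     "subjects": "ONLINE BIBLIOGRAPHIC SEARCHING; BIBLIOGRAPHICAL SERVICES; CATALOGING--HISTORY"},
    {"author": "MOSCO, VINCENT", "title": "THE POLITICAL ECONOMY OF INFORMATION",
     "subjects": "COMMUNICATION; INFORMATION SCIENCE--ECONOMIC ASPECTS; MASS MEDIA"},
    {"author": "SAXBY, STEPHEN", "title": "THE AGE OF INFORMATION",
     "subjects": "COMPUTERS AND CIVILIZATION; INFORMATION TECHNOLOGY--SOCIAL ASPECTS"},
    {"author": "TECHRANIAN, MAJID", "title": "TECHNOLOGIES OF POWER",
     "subjects": "INFORMATION TECHNOLOGY--SOCIAL ASPECTS; DEMOCRACY"},
    {"author": "TUCKER, FRANK", "title": "THE FRONTIER SPIRIT AND PROGRESS",
     "subjects": "CIVILIZATION, MODERN--1950- ; MASS MEDIA; PROGRESS"},
    {"author": "WOODWARD, KATHLEEN", "title": "THE MYTHS OF INFORMATION",
     "subjects": "TECHNOLOGY AND CIVILIZATION; INFORMATION THEORY; MASS MEDIA"},
]

ALL_FIELDS = ("author", "title", "subjects")


def words_in(text):
    """Split text into a set of upper-case words (runs of letters or digits)."""
    return set(re.findall(r"[A-Z0-9]+", text.upper()))


def keyword_search(records, keyword, fields=ALL_FIELDS):
    """Return the set (here, a list) of records containing the keyword in the given fields."""
    target = keyword.upper()
    result_set = []
    for record in records:
        text = " ".join(record[field] for field in fields)
        if target in words_in(text):
            result_set.append(record)
    return result_set


# Search every field, then limit the same search to the title field.
everywhere = keyword_search(RECORDS, "information")
titles_only = keyword_search(RECORDS, "information", fields=("title",))

print([record["author"] for record in everywhere])
print([record["author"] for record in titles_only])
```

Searching all fields retrieves the five records shown above. Limiting the search to the title field drops TECHNOLOGIES OF POWER, because in that record the word appears only in a subject heading.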

It is important to realize that computers do not "understand" anything you type into them. They simply do very quick matching of the characters you enter. You can't expect the computer to automatically understand exactly what you are looking for and then give you just the information you need. In order to be effective in accessing information from computers, you need to keep in mind the basic way databases function. It is very important to 1) first choose the best word(s) to use for a search and 2) then analyze the data that the computer provides in order to refine your search.
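The point about character matching can be seen in a short experiment. The sketch below assumes a simple whole-word match against the subject fields of the six sample records; the function name count_matches is made up for this illustration, and real database programs may offer extra features (such as truncation) that are not shown.

```python
import re

# Subject fields of the six sample records, one string per record.
SUBJECT_FIELDS = [
    "ONLINE BIBLIOGRAPHIC SEARCHING; BIBLIOGRAPHICAL SERVICES; CATALOGING--HISTORY",
    "COMMUNICATION; INFORMATION SCIENCE--ECONOMIC ASPECTS; MASS MEDIA",
    "COMPUTERS AND CIVILIZATION; INFORMATION TECHNOLOGY--SOCIAL ASPECTS",
    "INFORMATION TECHNOLOGY--SOCIAL ASPECTS; DEMOCRACY",
    "CIVILIZATION, MODERN--1950- ; MASS MEDIA; PROGRESS",
    "TECHNOLOGY AND CIVILIZATION; INFORMATION THEORY; MASS MEDIA",
]


def count_matches(keyword):
    """Count the records whose subject field contains the keyword as a whole word."""
    target = keyword.upper()
    return sum(target in re.findall(r"[A-Z0-9]+", field) for field in SUBJECT_FIELDS)


# The program compares characters only; it knows nothing about meaning.
for term in ("computers", "computer", "media", "television"):
    print(term, count_matches(term))
```

In this sketch "computers" retrieves one record while "computer" retrieves none, and "television" retrieves nothing even though three records are about mass media. Choosing the search term carefully, and trying alternatives after looking at the results, is the searcher's job.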

 

Database Types

Access Tools

Our database example is actually a reduced and simplified version of one common type of library database-- a computerized catalog, usually called an online catalog. A catalog is also a type of access tool. Other access tools include periodical indexes and abstracts. An access tool is a computerized database or print source that helps you find the book, article, document, audiovisual or other material that you need. It does not include the actual material you'll read or view, but it gives you the information you'll need to access that material. After you use an access tool, you must then retrieve the actual book, article or other material by either finding it in a library (or bookstore) or ordering it to be sent to you--typically through a library interlibrary loan service or through a commercial document delivery service. (See for more information on interlibrary loan and document delivery services.)

Online Catalogs

Online catalogs generally list all of the books in a particular library or group of libraries. They are computerized versions of card catalogs, which have traditionally been used for the same purpose. Each record in our simple book database--just author, title and subject for each book--contains considerably less information than a record in most actual catalogs. In a typical catalog, each record usually contains not only author, title and subject information, but also other information such as the publisher, place of publication, publication date and the book's call number. The call number is the set of letters and numbers on the spine of the book that indicates where the book is located on the library shelves. By including the call number, the online catalog gives you the information you need to access the book--to find the book on the shelf. When a record consists of this type of basic information about a book or article, it is commonly called a citation. Databases made up exclusively of citations are generally the simplest types of bibliographic databases, since each citation includes only the most essential information about a book or other document. The following is an example of a citation from the University of California online catalog, called Melvyl:

Author: Bender, David R.

Title: National information policies : strategies for the future / David R. Bender, Sarah T. Kadec, Sandy I. Morton. Washington, DC : Special Libraries Association, c1991.

Description: iv, 62 p. ; 28 cm.

Series: SLA occasional papers series ; no. 2.

Notes: Includes bibliographical references (p. 48-58).

Subjects: Information services and state -- United States. Information science -- Government policy -- United States.

Other entries: Kadec, Sarah T. Morton-Schwalb, Sandy I.

Call numbers: UCB LibSchLib Z678.2 .B45 1991

Periodical Indexes and Abstracts

Indexes to articles in magazines, newspapers or journals are another common access tool made up primarily of citations. Many of these indexes are available in computerized form. Since magazines, newspapers and journals are all referred to in libraries as "periodicals" (publications that are published "periodically"), databases that list articles from these types of publications are commonly referred to as periodical indexes. (Be sure not to confuse the different uses of the word "index." "Index" is used here to describe a general type of database, as distinguished from the use of "index" to refer to the part of a database described previously, i.e., an alphabetical list of all of the words or phrases in a particular field.) A typical citation in an index to magazine articles consists of the author, title and page numbers of the article and the title, date and volume number of the magazine. Occasionally, if the title does not identify the content of the article, a brief description will be added (usually in parentheses after the title), as in the following example of a citation from the Academic Index database (published by Information Access):

Bewildering the herd. (interview with Noam Chomsky on the mass media industry) by Rick Szykowny il v50 The Humanist Nov-Dec '90 p8(10) 57E3601

In addition to the basic citation data for each article, more and more periodical databases include abstracts--short summaries of the articles. Abstracts generally range from a couple of sentences to a few paragraphs in length and are included in each record following the basic citation. Databases that include abstracts may be referred to as abstracting services, abstract databases or simply abstracts. In other words, "abstracts" may refer to either a part of a record (a short summary of the document) or a type of database (one that includes abstracts in its records). In databases that include abstracts, the text of the abstract usually makes up a separate field. When searching for articles on a particular subject in an abstract database, a researcher can search for a particular keyword or words, not only in the subject or title fields, but anywhere in the entire abstract. Although periodical databases are the most common type of database that includes abstracts, other types of databases may also include abstracts. One well-known abstract database that is not a periodical database is Dissertation Abstracts, which includes summaries of doctoral dissertations and master's theses. The following example is a record from the Periodical Abstracts database (published by UMI):

92239317

Title: Why the Old Media's Losing Ground

Authors: Alter, Jonathan

Journal: Newsweek Vol: 119 Iss: 23 Date: Jun 8, 1992 pp: 28

Jrnl Code: GNEW ISSN: 0028-9604 Jrnl Group: News

Abstract: The mainstream media seems to have lost control of the 1992 presidential election and is splitting into two parts: Old Media, which consists of network TV, big newspapers and magazines, public TV and elite journalists, and New Media, which includes the less elitist and more democratic CNN, C-Span, infotainment talk shows, computer bulletin boards and satellite hookups. Photograph

Subjects: Presidential elections; Mass media

Type: Commentary

Length: Medium (10-30 col inches)
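To see why a searchable abstract field matters, the record above can be tried in a small Python sketch. It assumes the record is stored as a dictionary with one key per field and that words are matched whole; the names RECORD and matches are invented for the illustration.

```python
import re

# Three fields copied from the Periodical Abstracts record above.
RECORD = {
    "title": "Why the Old Media's Losing Ground",
    "subjects": "Presidential elections; Mass media",
    "abstract": (
        "The mainstream media seems to have lost control of the 1992 "
        "presidential election and is splitting into two parts: Old Media, "
        "which consists of network TV, big newspapers and magazines, public "
        "TV and elite journalists, and New Media, which includes the less "
        "elitist and more democratic CNN, C-Span, infotainment talk shows, "
        "computer bulletin boards and satellite hookups."
    ),
}


def matches(record, keyword, fields):
    """Return True if the keyword appears as a whole word in any of the given fields."""
    target = keyword.upper()
    text = " ".join(record[field] for field in fields).upper()
    return target in re.findall(r"[A-Z0-9]+", text)


# The word "satellite" appears only in the abstract.
print(matches(RECORD, "satellite", ("title", "subjects")))
print(matches(RECORD, "satellite", ("title", "subjects", "abstract")))
```

A search limited to the title and subject fields misses this article for the keyword "satellite", while a search that also covers the abstract field finds it.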

The following is another example of a record from an abstract database. This record is from Sociological Abstracts (from Sociological Abstracts, Inc.), a major academic database:

358913 93Z4003 Democratizing the Data Banks: Getting Government Online

Love, James Packard The American Prospect 1992, 9, spring, 48-50.

CODEN: APROEY PUB. YEAR: 1992 COUNTRY OF PUBLICATION: United States LANGUAGE: English DOCUMENT TYPE: Abstract of Journal Article (aja)

The development of online access to large-scale information systems can change the relation of citizens to the state, if advocates of electronic access to government data banks succeed in establishing a new principle of democratic access. Under a bill introduced by Representative Charlie Rose of NC, the Wide Information Network for Data Online (WINDO) would make appropriate online government services available to all. Potential benefits of WINDO are discussed, & it is argued that making government services electronically accessible to the public is the first step in making the public sector itself more "user-friendly." W. Howard (Copyright 1993, Sociological Abstracts, Inc., all rights reserved.)

DESCRIPTORS: Data Banks (D196200); Government Policy (D333900); Information Technology (D397175); Access (D004000); Public Sector (D682800)

IDENTIFIERS: public access, online government data banks, legislation/potential benefits; SECTION HEADINGS: methodology and research technology- computer methods, media, & applications (0188)

Access Tools vs. Full-Text Databases

Catalogs and index and abstract databases are considered access tools since they just provide citation or summary information--but not the full text of the document. As computer technology has rapidly increased in speed and storage capacity over the last decade, it is increasingly common to find databases that include the full text of documents as part of the database itself. Three general types of databases that do include the full documents are directory, full-text and image databases.

Directory Databases

Directory-type information includes lists of people, organizations, products or services, with brief data associated with each item. Telephone and zip code directories are some of the best-known types of directories, but other common types include city directories, government directories and trade and industry directories. When categorizing types of databases, the definition of directory databases is often broadened to include not only these most "pure" forms of directories, but also other reference-type databases that do not include much extended textual data. Some other types of databases that may be included in this category are biographical and statistical databases. The following example is a record from a directory database called Public Opinion Online (POLL). This database, produced by the Roper Center for Public Opinion Research, is a directory of questions and responses from surveys conducted by major U.S. polling firms and the media:

00059970 QUESTION ID: USHARRIS.081081 R05

005 (Favor or oppose)...Cutting back on the access people have to government records about themselves and public officials under the Freedom of Information Act.

Favor 33%

Oppose 63

Not sure 4

ORGANIZATION CONDUCTING SURVEY: LOUIS HARRIS & ASSOCIATES (HARRIS)

SOURCE: HARRIS SURVEY

SURVEY BEGINNING DATE: 07/08/81

SURVEY ENDING DATE: 07/12/81

SURVEY RELEASE DATE: 08/10/81

INTERVIEW METHOD: Telephone

NO. OF RESPONDENTS: 1252

SURVEY POPULATION: National adult

DESCRIPTORS: RIGHTS

(c) Roper Center for Public Opinion Research, U. of Connecticut

 

The next example is a record from the biographical directory database, Marquis Who's Who (File 234 on Dialog):

00062198 Record provided by: Biographee

Gore, Albert, Jr.

OCCUPATION(S): Vice President of the United States

BORN: Mar. 31, 1948 Washington, DC

PARENTS: Albert and Pauline (LaFon) G.

SEX: Male

FAMILY: married Mary Elizabeth Aitcheson, May 19, 1970; children: Karenna, Kristin, Sarah, Albert III.

EDUCATION:

postgrad., Law Sch., 1974-76

postgrad., Grad. Sch. of Religion, Vander, 1971-72

BA cum laude (Univ. schol, Harvard U., 1969

CAREER:

V.P. of U.S., 1993-

livestock and tobacco farmer, from 1973

homebuilder and land developer, Tanglewood Home Builders Co., 1971-76

U.S. senator from Tenn., 1985-93

mem., 95th-98th Congresses from Tenn., 1977-85

investigative reporter, editorial writer, The Tennessean, 1971-76

CREATIVE WORKS:

Author: Earth in the Balance: Ecology and the Human Spirit, 1992.

MILITARY:

Served with U.S. Army, 1969-71, Vietnam.

MEMBERSHIPS:

Mem. Farm Bur., Tenn. Jaycees.

POLITICAL/RELIGIOUS AFFILIATION: Democrat. Baptist.

CLUBS AND LODGES: Am. Legion, VFW.

Mailing Address:

Office:

The White House

Office of Vice President

Old Executive Office Bldg N

Washington DC 20503

Home:

RR 2

Carthage TN 37030-9802

Full-Text Databases

An increasingly popular type of database is the full-text database. As the name implies, records in full-text databases include the complete text of the articles or other types of documents included in the database. The most common examples of full-text databases are computerized encyclopedias and newspaper databases. Many general periodical databases now include the full text of a growing number of the periodicals they cover. Scholarly and academic journals have been slower to appear in full-text format. Computerized encyclopedias on CD-ROM often include photographs and other graphics, and a small number of graphic materials are beginning to be included in some other CD-ROM full-text databases, but most full-text periodical databases do not yet include any graphics from the print version of the publication. A big advantage of most full-text databases is that all of the words in the entire text of the articles can be searched to find just those articles containing the specified keywords. The following example is a relatively short full-text record from Grolier's Multimedia Encyclopedia:

central processing unit

The central processing unit (CPU) is that part of a DIGITAL COMPUTER where the instructions are interpreted and the specified arithmetic operations and data manipulations are carried out. Ordinarily the CPU has connected to it one or more COMPUTER MEMORY units and INPUT-OUTPUT DEVICES. These associated units usually connect to the CPU via interface terminals and are addressed and commanded by the CPU to respond cooperatively in accordance with the needs of the programmed process. Thus, in terms of function, the CPU contains both the main calculational workshop and the governing control of the computer system. Accordingly, an entire computer system is usually referred to by the name and type of its CPU.

During the 1960s a CPU usually consisted of at least the arithmetic unit and associated registers, instruction registers, decoders, counters, and enactment controls, and other memory registers and controls (see COMPUTER REGISTER). At that time the CPU was usually physically the largest and most complex unit in a computer system and was sometimes also called the mainframe. In the 1970s and 1980s great technical advances in miniaturization made physical size insignificant as a criterion for identifying the CPU. Moreover, the trend toward large computer systems consisting of several interconnected computers has made identification of the CPUs in such systems more complex. In modern microcomputers (see COMPUTER, PERSONAL), extremely small silicon chips constitute CPUs that are actually dwarfed by their larger input and output devices.

Julian Bigelow

Bibliography: Bartee, Thomas C., Digital Computer Fundamentals, 6th ed. (1985); Mims, Forrest, III, Understanding Digital Computers (1986); Nashelsky, Louis, Introduction to Digital Computer Technology, 4th ed. (1988).

Image Databases

A relatively new type of database that will probably become more popular in the future is the image database. Image databases include an exact copy of each page of the articles (or other documents) that they index. Articles that are printed out from image databases are exact reproductions of the original articles, including all photographs, drawings or other graphics, just as if they were photocopied from the periodical. There is one important limitation of most image databases as compared to full-text databases: most image databases currently available do not allow the user to search the entire text of articles. The image databases currently on the market provide searching of typical citations and abstracts and then display the full images of articles after the search has been completed. Image-based databases are becoming more common as computer technology continues to improve. Ultimately, databases will probably allow searching of the full text of all documents and will provide retrieval of printed images that are exact reproductions of the original. A variation on image databases is the online database that includes a feature for the immediate online ordering of photocopies of articles in the database. The copy of the article may be quickly faxed or sent electronically to the user.

Multimedia Databases

Multimedia databases are a new form of database based on the new area of multimedia computing, which combines traditional computer text and graphics with sounds, photographs, animation and full-motion video. Multimedia databases generally link various audio and video elements (brief sound recordings or video segments) to database records. This new form of database allows different media (print, audio and video) that have traditionally been read, listened to, viewed and organized separately to be combined into a single new integrated form. Multimedia databases that provide access to audio and video archives through computerized searches can greatly facilitate the inclusion of these newer forms of information in scholarly research. Multimedia elements (especially video) require large computer storage capacity, and the development of multimedia databases (especially on the World Wide Web) has accelerated in the last few years as the technology for increased storage capacity, transfer rates and data compression has improved and become less expensive.

The distinctions between different bibliographic database types are not always very precise, because it is increasingly common to find databases that include different types of records. Many periodical databases that originally included only citations now include abstracts in most newer records, and many databases with abstracts are adding the full text of articles from a portion of the indexed periodicals. Some full-text CD-ROM and World Wide Web databases now include graphic images, sounds, animation and full-motion video in selected records.



last revised: 1-15-98 by Eric Brenner, Skyline College, San Bruno, CA

These materials may be used for educational purposes if you inform and credit the author and cite the source as: LSCI 105 Computerized Research. All commercial rights are reserved. To contact the author, send comments or suggestions, email: Eric Brenner at brenner@smcccd.cc.ca.us