Database and Index Types

From Rootsweb

General References and Guides

This article is part of a series.
Introduction to the General References and Guides
Overview of Databases and Indexes
Database and Index Types
List of Specific Databases and Indexes
Library Catalogs
List of Useful Finding Aid References

This article originally appeared in "General References and Guides" by Kory L. Meyerink, MLS, AG, FUGA in The Source: A Guidebook to American Genealogy

This article lists a number of database and index types.

Biography Indexes

At least 12 million Americans have been the subject of a biographical sketch in collective biography volumes. While many of these sketches are in local histories, more than 5 million appear in books with a nationwide scope, including biographical dictionaries and encyclopedias such as Who's Who in America and Men and Women of Science. Not only do biographical sketches provide information about the subject's birth, marriage, death, and family, they also usually provide biographical information seldom available in other sources. This may include occupations, political and religious affiliations, military service, educational achievements, lifetime accomplishments, and much more.

Fortunately, locating such sketches has become much easier in the past two decades. To determine if someone has published a brief biography about an ancestor or relative, you must use biographical sources. They are best accessed by a growing number of indexes, many of which are introduced in this chapter. For more information on the following indexes, and many more, as well as the kinds of records they index and how to evaluate them, see chapter 18, 'Biographies,' in Printed Sources.

Census Indexes

After more than three decades of work by hundreds of individuals and dozens of organizations in the genealogical field, statewide census indexes now exist for all extant federal censuses taken from 1790 through 1930. These indexes exist in many forms, including book, microfilm, microfiche, CD-ROM, and Internet databases. Furthermore, for most censuses, more than one index exists, permitting researchers to overcome the inherent problems with census indexes. The following discussion can only provide an overview of the nature and use of census indexes and the cautions they require. For more information on census indexes, how to evaluate them, and how to overcome their limitations, see chapter 9, 'Censuses and Tax Lists,' in Printed Sources.

Observe caution when using census indexes. For example, the indexers may not have been well-trained in early American handwriting; most census indexes have been made from microfilm copies, and the writing may have been faded or difficult to read. Most indexes for the 1850 and later censuses contain only the heads of households and persons in the households with different surnames. Often, two or more indexes exist for the same census; if possible, use them all. However, do not depend on the index alone. If an ancestor was known to have lived in a county when a census was taken but does not appear in the index, search the entire township or county. In larger cities for the post-1850 period, city directories may be helpful as a type of index.

Electronic Family Trees

One of the most significant types of genealogical databases spawned by the arrival of personal computers in genealogy is the growing collection of electronic family trees. Often these computer files are wrongly called GEDCOM files, after the file format (GEDCOM stands for Genealogical Data Communication) that allows sharing of genealogical data between software programs. Regardless of their designation, with more than one billion names circulating in such files, all readily searchable, in part or in full, via the Internet, they deserve the attention of all serious family historians. A solid understanding of these increasingly important tools includes knowing the nature and type of such trees, as well as how to search and evaluate them.

Electronic family trees are computerized databases of family history information compiled by individual genealogists that represent the core findings of the genealogist's research. Today, virtually every genealogist uses one or more computer programs to file, store, and manage the information they find about their family during their research. These 'database management programs,' such as Family Tree Maker (FTM), Legacy, The Master Genealogist (TMG), Personal Ancestral File (PAF), Ancestral Quest, and RootsMagic, are discussed elsewhere. It is the publicly distributed output of these programs that is of current interest.

The hallmarks of an electronic family tree include the following elements of the database which are used to manage the data:

  • It describes an individual in terms of genealogical identifiers: names with at least some key dates and places (including birth, marriage, and death).
  • It links those individuals to other individuals in the database by birth or marriage.
  • The database uses standard data fields to permit consistency in searching and presentation (it is not just electronic text).
  • It may be published on the Internet and/or on CD-ROM or other electronic media.
  • It identifies the name and contact information for the submitter.
  • It can usually be shared (downloaded) using the GEDCOM format.

Some electronic family trees also include the following elements (at the discretion of the submitter):

  • Source citations
  • Notes made by the compiler

These individual electronic family tree databases reside on the creator's computer, or may be published on a personal website, copied to a CD-ROM, or contributed to any of a number of large online collections.
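As a rough illustration of the hallmarks listed above, the following Python sketch models a single submitted tree. The class and field names (Event, Individual, FamilyTree, and so on) are hypothetical and chosen only for this example; they do not reproduce the GEDCOM format or the internal layout of any particular program.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """A dated, placed event such as a birth or death."""
    date: Optional[str] = None    # free text, e.g. "about 1870"
    place: Optional[str] = None   # e.g. "Hamilton, Ravalli, Montana"


@dataclass
class Individual:
    """One person, described with standard data fields (hypothetical names)."""
    person_id: str
    name: str
    birth: Event = field(default_factory=Event)
    death: Event = field(default_factory=Event)
    parent_ids: list = field(default_factory=list)   # links by birth
    spouse_ids: list = field(default_factory=list)   # links by marriage
    sources: list = field(default_factory=list)      # optional source citations
    notes: list = field(default_factory=list)        # optional compiler notes


@dataclass
class FamilyTree:
    """A submitted tree: submitter contact details plus linked individuals."""
    submitter_name: str
    submitter_contact: str
    individuals: dict = field(default_factory=dict)

    def add(self, person: Individual) -> None:
        self.individuals[person.person_id] = person

    def children_of(self, person_id: str) -> list:
        """Return everyone who lists person_id as a parent."""
        return [p for p in self.individuals.values() if person_id in p.parent_ids]


# Invented sample data for illustration only.
tree = FamilyTree("Example Submitter", "submitter@example.org")
tree.add(Individual("I1", "John Jones", birth=Event("about 1870", "Montana")))
tree.add(Individual("I2", "Mary Jones", parent_ids=["I1"]))
print([p.name for p in tree.children_of("I1")])
```

Because every person is stored in the same fields, a search can target a name, date, or relationship directly, which is what distinguishes such a database from plain electronic text.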

Types of Electronic Collections

There are two major types of electronic family tree collections: merged and unmerged. In a merged database, all new information submitted by an individual is merged with the existing information, resulting in a list with no duplicate entries. While merging, in theory, keeps the database organized, it opens the door to wrong information being merged into the database to exist alongside or, in some cases, to the exclusion of correct information.

To alleviate this concern, most collections of electronic family trees today consist of unmerged databases. Thousands of individual researchers submit their own, self-researched, electronic family tree to a collection. There it resides side-by-side with the contributions of other researchers. Often the same person is included on several family trees. It is then left up to each researcher to evaluate the various entries for the same person and decide which (if any) is correct. Of course, a relative may appear ten, twenty, or even thirty or more times in the same database, if many different people have included him or her in their contribution. For early American ancestors, this is not uncommon.
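The difference between the two approaches can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the submissions are invented, people are matched on name alone, and the merged version simply lets a later submission overwrite an earlier one, which is only one of many possible merge rules.

```python
from collections import defaultdict

# Hypothetical submissions: (submitter, name, birth year)
submissions = [
    ("Researcher A", "John Jones", 1870),
    ("Researcher C", "John Jones", 1870),
    ("Researcher B", "John Jones", 1873),
]


def build_unmerged(entries):
    """Keep every submission; the same person may appear many times."""
    collection = defaultdict(list)
    for submitter, name, birth_year in entries:
        collection[name].append({"submitter": submitter, "birth_year": birth_year})
    return dict(collection)


def build_merged(entries):
    """Keep one entry per name; a later submission overwrites an earlier one."""
    collection = {}
    for _submitter, name, birth_year in entries:
        collection[name] = {"birth_year": birth_year}
    return collection


unmerged = build_unmerged(submissions)
merged = build_merged(submissions)
print(len(unmerged["John Jones"]))   # every submitted version is preserved
print(merged["John Jones"])          # only the last submitted version survives
```

In the unmerged collection the researcher sees all three entries and must weigh them; in the merged one the conflicting earlier entries are no longer visible at all.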

Searching Electronic Family Trees

While anyone can post their own genealogical database to their personal Web page, the vast majority are published at one (or more) of a number of websites that specifically collect electronic family trees. These sites have each developed a separate name for their collection, along with a search engine that generally allows powerful searches, beyond just first and last names.

Searching electronic family trees can be challenging. The problem with searching such collections is matching the researcher's knowledge to the information in the database. If the submitter only included the individual's name, then including a birth date on the search form will not locate the submitted entry. If the data file reports that John Jones was born 'about 1870' then a search for John Jones born 1873 will not identify him in the database.

When searching such collections, the usual rule of thumb is to provide less information, rather than more. If the collection returns too many 'hits' or matches, then refine the search with more detail. Depending on the search engine, it is usually better to add a child or parent to the search, rather than a place or date. While other researchers may have input a slightly different place or date, the name of a close relative is more likely to be the same.

However, even this often does not work. If the searcher uses the individual's child as a close relative, but the database has the individual as the most recent generation, listing no children, and linked only to parents, such a search request will not find the person being sought. Therefore, try a number of different searches, including different information each time. This may mean other versions of the name, alternate birth or death dates, or places. Try broadening the place. Using 'Montana' for the place may be more successful than the more specific 'Hamilton, Ravalli, Montana.'
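The matching problems described above can be seen in a minimal Python sketch. It assumes a hypothetical search engine that matches names and dates exactly and treats a place as matching when the search term appears anywhere in the recorded place; real collections use their own, usually more forgiving, rules.

```python
# Invented sample records for illustration only.
records = [
    {"name": "John Jones", "birth": "about 1870", "place": "Hamilton, Ravalli, Montana"},
    {"name": "John Jones", "birth": None, "place": "Ohio"},
]


def search(records, name, birth=None, place=None):
    """Return the records that satisfy every criterion supplied.

    Name and birth must match exactly; place matches when the search
    term appears anywhere in the recorded place.
    """
    hits = []
    for rec in records:
        if rec["name"] != name:
            continue
        if birth is not None and rec["birth"] != birth:
            continue
        if place is not None and (rec["place"] is None or place not in rec["place"]):
            continue
        hits.append(rec)
    return hits


# A precise birth year finds nothing, because one file says "about 1870"
# and the other has no birth date at all.
print(len(search(records, "John Jones", birth="1873")))

# Searching on the name alone returns both entries.
print(len(search(records, "John Jones")))

# A broad place succeeds where a differently worded specific place fails.
print(len(search(records, "John Jones", place="Montana")))
print(len(search(records, "John Jones", place="Hamilton, Ravalli County, Montana")))
```

Each added criterion is another chance for the search to disagree with what the submitter typed, which is why starting with less information and refining afterward tends to work better.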

The great benefit of electronic family trees, of course, is that the researcher can gain some key clues about entire new branches for his or her own family tree. Often a previous researcher has had access to information and sources another researcher has no knowledge of, or has not yet taken the time to find.

Conversely, one of the great limitations of most electronic family trees is the failure of the submitter to include source citations, or even notes indicating where the information came from. Without such information, it is very difficult to verify the data.

Evaluation of Electronic Family Trees

Evaluating the information in electronic family trees is made more difficult by the typical lack of source citation or notes in most such trees. Therefore, it is crucial to examine the new information and compare it to known information. Due to the rampant copying of family trees, it is not satisfactory to simply determine the number of similar (or identical) entries for the persons in question.

The first step in evaluating the information found is to understand how it was probably gathered. The most common sources for the information in electronic family trees include the following:

  • Personal knowledge of some family member who provided it to the compiler by way of letter, e-mail, phone call, or other means
  • Other electronic family trees published previously
  • Published genealogies or family history books
  • Journal articles about the family
  • Published lineages, such as descent from a Revolutionary War patriot
  • Previous research findings by an earlier member of the family

Following are a few guidelines to use when evaluating information in electronic family trees:

First, examine the individual's record for any source citations or notes, and read them carefully. Make sure they pertain to the individual, and not just to the family tree in general, or a different family in the tree. For example, if the 1850 census is cited, but the person was born in 1856, that source does not pertain to that person, although it may well pertain to the parents.

Second, precise dates (day, month, year) and places (town, county, country) suggest there was some credible source behind that data, even if it is not cited. Most researchers don't just make up the data. Determine what sources could have existed for that place, time period, and ethnic group from which the information may have been obtained. For example, since Ohio did not keep any kind of birth records before 1867, an exact birth date and place in 1842 in Ohio usually suggests either a church record, cemetery inscription, death record, or some family record (such as a Bible).

Third, examine the whole tree where the individual appears. Are sources given for some persons? Are there several generations with minimal information, such as just names, with no places or dates (or vague places and dates)? Are there just one or two children in many families? These are clues regarding how carefully (or sloppily) the tree was compiled.

Fourth, seek original records (censuses, death certificates, probate files, and so forth) for the persons of interest which can provide some substantiation of the facts given in the file. You may never find a corroborating document for every fact stated in such databases. True facts sometimes come from sources no longer extant or accessible. Careful research will permit you to judge the validity of the information as a whole.
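Two of the guidelines above lend themselves to simple mechanical checks, sketched here in Python. The function names and the dictionary layout are assumptions made for this example, and such checks can only flag entries for closer reading; they cannot replace the researcher's judgment.

```python
def census_can_mention(census_year, birth_year, death_year=None):
    """A census can only enumerate someone who was alive when it was taken."""
    if birth_year > census_year:
        return False
    if death_year is not None and death_year < census_year:
        return False
    return True


def share_with_sources(individuals):
    """Fraction of individuals in a tree that carry at least one source citation."""
    if not individuals:
        return 0.0
    cited = sum(1 for person in individuals if person.get("sources"))
    return cited / len(individuals)


# First guideline: the 1850 census cannot pertain to a person born in 1856,
# although it may pertain to a parent born in 1820.
print(census_can_mention(1850, 1856))
print(census_can_mention(1850, 1820))

# Third guideline: how much of an (invented) tree is documented at all?
sample_tree = [
    {"name": "John Jones", "sources": ["1850 census"]},
    {"name": "Mary Jones", "sources": []},
    {"name": "Ann Jones"},
    {"name": "Levi Jones", "sources": ["family Bible"]},
]
print(share_with_sources(sample_tree))
```

A low share of cited individuals does not prove a tree is wrong, but it indicates how much independent verification will be needed.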

Ethnic Societies

Societies devoted to researching particular ethnic groups can provide information from data submitted by their members. Two examples of databases that pertain to specific ethnic groups are P.O.I.N.T. (Pursuing Our Italian Names Together), and the Jewish Genealogical People Finder.

As with local societies, ethnic societies generally have an Internet presence and are readily found through general search engines. The Directories section identifies additional means of identifying and locating such societies.

Genealogy Indexes

Before conducting extensive research, find out if a genealogy (or family history) for the surname of interest has already been published. Such a find can be of great value, allowing you to build on previous research instead of redoing the same work. Tens of thousands of such works have been published. Many trace the descendants of one person through several generations; others trace the ancestors of a couple. Various combinations also exist but, generally, such books are based on one surname. Indexes, catalogs, and bibliographies, many of which are introduced in this chapter, are the critical tools for determining if someone has compiled a history or genealogy that includes your family.

Immigration Indexes

The topic of immigration to the United States is also discussed in chapter 9, 'Immigration Records.' Immigration records are generally available in two forms: printed lists taken from manuscripts or compilations and unpublished manuscript lists. A mammoth, ongoing work seeking to index all printed immigration records is P. William Filby's Passenger and Immigration Lists Index. This work contains more than five million personal names filed alphabetically and includes age (if given), destination, and source citations for approximately 4,000 printed immigration lists. Supplements add approximately 150,000 names every year and are cumulated every five years. This index has been published on CD-ROM by Family Tree Maker and is also available online as part of an immigration database. Of course, as with all ongoing indexes, the CD-ROM becomes less complete as new entries are added to the index, necessitating updates every few years. Two excellent articles by P. William Filby in the Genealogical Journal explain this project: 'Published Passenger Lists' (1979) and 'Published Passenger Lists' (1983). Several other projects and indexes are mentioned in the latter article.

Of the more than 20 million people who came to the United States in the nineteenth century, most are now included in published (book or especially electronic) immigration lists. Many arrival lists compiled by ports of entry survived and have been microfilmed by the National Archives. Indexes exist for arrivals at the following ports for the years indicated:

Baltimore: 1820-74, 1852-97, 1853-66
Boston: 1848-91
Mobile, Alabama: 1820-62
New Bedford, Massachusetts: 1823-74
New Orleans: 1820-50 and 1853-99
New York City: 1820-46 and 1897-1902
Philadelphia: 1800-1906, 1820-74

These indexes are available at the National Archives and its regional branches and at the Family History Library, as well as larger genealogical libraries. Most of these lists are now also indexed on the Internet. Also see Genealogical Research in the National Archives, and Loretto Dennis Szucs's and Sandra Hargreaves Luebking's The Archives: A Guide to the National Archives Field Branches.

Indexes to Original Records

Most indexes to original records exist on a state, county, or other local level. However, some indexes, or types of indexes, pertain to research in an entire country, and deserve brief mention here.

Vital Records

The major databases for vital records are discussed throughout this chapter (see International Genealogical Index, Vital Records Index, and U.S. Social Security Death Index). However, since indexes to vital records differ from the databases, some important aspects of those indexes should be mentioned here. As indicated, databases include most of the significant information about an event, for example, the exact date and place of a birth or death, and the parents' names. Many indexes provide much less information. Sometimes only the name of the subject and the year of the event are given, along with a reference number. In large indexes, this may make it difficult to determine which entry pertains to the person a researcher is seeking.

Large indexes may also make it difficult to recognize the subject of the search, if the name was recorded with a much different spelling. The 'wrong' spelling of the name may be many pages away from the spelling being searched. Electronic indexes (on the Internet or CD-ROM), may not permit easy searching for variant spellings. Therefore, where possible, search indexes at the smallest locality possible, such as the county for vital records, or the specific church or cemetery. This will help eliminate 'noise,' or false positives, in the search.
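When an electronic index does not search variant spellings on its own, a researcher who has a list of the surnames it contains can look for near matches. The Python sketch below uses the standard library's difflib for this purpose. It is only an illustration: the surname list is invented, the 0.8 cutoff is an arbitrary assumption, and character similarity is a stand-in for whatever name-matching rules a given index actually applies.

```python
from difflib import get_close_matches

# Hypothetical surnames as they appear in an index.
index_surnames = ["Meyerinck", "Myerink", "Mayer", "Smith", "Jones"]


def spelling_candidates(surname, indexed, cutoff=0.8):
    """Return indexed surnames that closely resemble the one being sought."""
    return get_close_matches(surname, indexed, n=10, cutoff=cutoff)


# Variants that would be filed pages apart in a printed index are
# gathered together, while unrelated surnames are left out.
print(spelling_candidates("Meyerink", index_surnames))
```

Lowering the cutoff catches more distant spellings at the cost of more false positives, the same trade-off described above for large indexes.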

Each state should have an index to the births, marriages, and deaths it has recorded since it required registration at the statewide level. However, those indexes have at least two drawbacks. First, they generally will not include such vital records as were recorded at the local (usually county) level prior to state requirements. Second, many of those indexes are not readily searchable by genealogists. Rather, the researcher must submit a request, with payment, to have whatever index exists searched by a government employee. Because researchers cannot search the index for themselves, they cannot know how thoroughly it was searched.

While privacy rights continue to keep some records closed, and sometimes close records that were previously public, some states have made efforts to provide publicly available indexes of their vital records. Most commonly this is done with death records, although some statewide marriage and birth records exist. Usually birth or marriage indexes are only public long after the events occurred, and the participants are likely to be deceased.

In conjunction with the Social Security Death Index (see the section later in this chapter), the growing number of statewide death indexes provides increasing coverage for twentieth century deaths (and some from earlier dates). Most of the states have made indexes to some or all of their death records available. Most are posted on the Internet, while others are available only on microfilm. Often the Internet indexes are simply another edition of an earlier microfilm index.

Local History Indexes

Many printed genealogies and biographies are buried in the thousands of local histories that have been published throughout the United States in the last century and a half. Once found, they can produce added insight on a particular ancestor (a father's name or place of origin, for example) or add several generations to a lineage. Most local histories are not yet available on the Internet, and must be accessed at the libraries where the books are housed. The proper use of available indexes can greatly assist in identifying and finding these sources, as well as learning if an ancestor may be discussed within their pages.

For more information on the following indexes, and many more, as well as the kinds of records they index and how to evaluate them, see chapter 17, 'County and Local Histories,' in Printed Sources. Those that are generally nationwide in scope are described in the Specific Databases and Indexes section that follows.

Statewide Indexes

A relatively recent development is the creation of statewide indexes to local histories. Many every-name indexes have been published for individual local histories; now works are appearing that include many histories in one alphabetical index. These are personal-name indexes to those for whom a sketch or important information is available. They are not every-name indexes to all the books included. When using these indexes, check every sketch in an area of interest (county, city) for all people having the surname of interest. In this way the ancestor may be found, even if not the subject of a sketch. Statewide biographical sketch indexes are available (usually published as books, but sometimes existing only as card files) for Alabama, Alaska, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Montana, Nevada, New Hampshire, New Jersey, New York, North Dakota, Ohio, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Virginia, Washington, West Virginia, Wisconsin, and Wyoming. A complete discussion of both published and manuscript indexes, including bibliographic citations, is in chapter 17, 'County and Local Histories,' of Printed Sources.

Local Societies

Many county historical and genealogical societies maintain files of their members' interests. Such files usually include families from the locality served by the society. The information in them reflects the findings of society members as they have researched families of their areas. Since the societies' focus is on serving their members, they are usually quite helpful in connecting inquiries to members with the same family or surname. Most have websites that reflect their area of coverage, and many have posted databases and indexes to local collections.

Locating genealogical societies has become easier in recent years. Searching for them by name in an Internet search engine will usually locate their site. If the name is unknown, or uncertain, use generic terms in the search engine, such as the name of the county or city and the words genealogical and society. The Federation of Genealogical Societies (FGS) is an umbrella organization to which many (but not all) genealogical societies belong. The FGS website includes links to the home pages of their member societies.

Military Indexes

Almost every United States genealogist has one or more ancestors who served in the military, thus creating great interest in military records. In fact, many lineage societies have been formed around service in a particular war. A few select military indexes are mentioned in the Specific Databases and Indexes section that follows.

Many books have been compiled on those who served in other U.S. wars, as described in Military Records. Also see the discussion of lineage societies in Hereditary and Lineage Organizations.

Online Academic Research Sources

Many libraries have integrated computer technology, CD-ROM databases, and online searching into their reference services. Library personnel can obtain information from hundreds of subscription databases, including America: History and Life, Historical Abstracts, Comprehensive Dissertation Abstracts, Encyclopedia of Associations, National Newspaper Abstracts, Standard and Poor's News Service, Social Science Citation Index, ERIC (Educational Resources Information Center), and many others.

Typically these databases and indexes have been developed over the last thirty to forty years by major corporations who provide the information, for a fee, to libraries (and sometimes, independent researchers). One such corporation is ProQuest, which vends its genealogical sources under the name Heritage Quest Online. Another is Gale Research Co., which provides libraries with online subscription access to biographies and other genealogically useful material.

However, some indexes and databases, such as those listed above, have genealogical value but are not part of typical genealogical packages from these library vendors. Generally their genealogical value is limited to certain situations and circumstances, but the diligent researcher should be aware of them. Such subscriptions are popular at academic and many major public libraries. Depending on the subscription arrangements, library patrons can access some of these research tools from their homes via the Internet. Others can only be used onsite at the library.

Such research tools require little time to search. They can provide bibliographies of books and articles of interest, including locations of specific reference materials that can be borrowed through interlibrary loan. As an example, one search required just seconds to search 9,000 periodicals covering a ten-year period, yielding thirty-two entries of specific interest. It would have taken at least four months to physically search those periodicals, even with the best of indexes.

Most of these databases are now being distributed on the Internet, although CD-ROM versions are also offered for many. They are available at local research libraries. Usually there is no charge for the researcher to use them because the library pays an annual subscription fee as part of its operating budget. Sources of interest to the genealogist now on CD-ROM (and also available online) include Biography Index; America: History and Life; Biography and Genealogy Master Index; Congressional Masterfile; and others.

Many experienced genealogists conduct research in academic libraries. Although large and organized for academic studies, these facilities have the funds to purchase some of the most important research tools available, many of them discussed throughout this book. Almost every academic library has a website where you can explore the potential for research before signing up. It may be necessary to pay a fee to become a 'Friend of the Library' or an Associate in order to gain borrowing privileges and a library card.

Query Files and Online Forums

A useful but often overlooked form of index is the query file. A query is a kind of 'want ad' that is published in genealogical periodicals. Individual researchers write brief descriptions of the family they are seeking in the hopes of locating others who know more about the family. The query is a popular approach in genealogy for learning if others are researching a particular family. Almost every genealogical society or periodical maintains or prints a query file for its members or subscribers. In addition, some periodicals exist specifically to publish queries. In a very real sense, queries are indexes to ongoing research. The query seldom contains significant genealogical material, but it does, like other indexes, refer to a source for more information. If a researcher is not a member of the society or does not subscribe to the periodical, there is usually a small fee to place a query. The files or publications, however, are usually available for research at no cost. Over time, the addresses associated with queries often become out of date as researchers move. However, the society that maintains the file may have the researcher's present address.

Periodical Indexes

For more than 150 years, genealogists and genealogical societies have been printing periodicals (serials, journals, or magazines) that include a large variety of original sources, abstracts, transcripts, how-to articles, and compiled family histories. Periodicals spawn periodical indexes, which, even though not every-name indexes, are very helpful. Some of the indexes mentioned above include some genealogical periodicals, but the following focus exclusively on periodicals. For more information on the following indexes, and others, as well as the kinds of records they index and how to evaluate them, see chapter 19, 'Genealogical Periodicals,' in Printed Sources.

Vital Records Databases

Vital records are the building blocks of genealogy. With these records of births, marriages, and deaths, and the locations of those events, the family historian builds a family tree. Hence, they are among the most popular of records. Therefore, some of the most significant databases used in genealogy deal with vital records. Although the term 'Vital Records' in the United States tends to refer to government-created records of births, deaths, or marriages, this chapter uses the broader definition, understood in Europe and elsewhere, as any collection of births, marriages, and/or deaths. The chief difference is that under such a definition, information from church registers and some other sources is also considered vital records.

