This article is aimed at anthropologists with little computer experience who wish to know more about what can be done with a computer. Conversations with anthropologists have indicated that many would like to know more about database systems. The form of the article comes from teaching classes and individual students, as well as from dissatisfaction with introductory books and user manuals filled with business or natural science examples. I have tried to concentrate on the principles of using this type of application rather than on any specific product. For which keys to press or menu items to choose, refer to the product manual: each product is different in detail but shares common elements with others.
This introduction describes simple examples of the kind of information a database can hold, how best to represent it and what can be done with it. These examples are deliberately restricted to a few pieces of information, whereas in practice databases usually contain many.
Three fundamental units of information in a database system are the field, the record and the table.
A field contains information of a specific type. In a collection of information about persons, one item recorded might be age. When this information is stored in a database, age would be a field containing the age of each person. It is useful to think of fields as the columns in a printed table, each holding information about a specific aspect of the matter being represented.
A record is an instance of the entity being represented. In the above example, a record would contain information about a single person. In a printed table this might be a row. For every person the database might contain name, age and sex.
A table is the combination of rows and columns. Again the analogy of the printed table in a book or a report is useful. It contains information about a number of instances of the same entity type, for example persons, farms or villages.
There is a problem with database system terminology. Many aspects
of a database system can be referred to by a number of terms. Product
suppliers have an unfortunate tendency to use terms which are inconsistent
with, and occasionally in conflict with, those used in other products
and in standard languages such as SQL. The manufacturer
of one widely used microcomputer database system uses the term database to
refer to what I, following more common usage, have called a table. This
usage is not only confusing but also reflects operational inadequacies
in the particular product. At the end of this article is a glossary
of terms used to refer to aspects of a database system.
Whether or not to use a database system depends in part on the characteristics of the information and its purpose.
If the data is largely numerical and the purpose is some form of numerical analysis, a spreadsheet or a statistical analysis package such as SPSSx may be more appropriate. Most DBMS have built-in numerical functions which allow simple calculations, but they are not really suitable for this type of analysis on their own.
Where data consists for the most part of large pieces of text and examining the information depends on its textual nature, a DBMS is again probably inappropriate. Special products, often called Textbases or Text Databases, are designed for working with this type of information. Where such software is not available, a DBMS might prove useful for storing and accessing an index to textual information.
The most common type of information stored in a database system consists of categorisations of data and sometimes of numeric data as well. The purpose is usually to select or sort this information on some criteria internal to the data. Numeric data can be extracted or calculated and transferred to a spreadsheet or statistical package as a secondary operation.
A few database systems are able to store pictures. Those which are at present commercially available tend to be highly specialised towards retrieval and display, and are limited in more general facilities. DBMS able to store pictures as part of a full range of facilities exist at present only in experimental form but will become more commonplace during this decade.
It is in fact possible to make most DBMS perform all of the above tasks, but this requires great expertise, which is rarely available, and the task is usually better achieved with a more appropriate type of package.

The first field, HHid, is a unique identifier given to each household. In this case an integer number is used but an alphabetic or alphanumeric code could be employed instead. Identifiers of this type are important in distinguishing different occurrences of the same entity. This is because other possible identifiers, such as location, may not have unique values.
Before information is put into a database, the DBMS must be told what the data in each field is like: that is, whether it consists of numbers or strings of characters (letters or digits), how many characters can be expected, whether the numbers are integers (eg. 67) or real (eg. 6.7) and the number of decimal places. Some systems have special types for dates, time, currency units and blocks of text. There are limitations with such types. Dates must usually be after a specified day and year, which can sometimes be quite recent (post 1900 or even within the last twenty or so years). They must also be specified in an exact format, such as 8/5/88. Blocks of text can be included in some systems but cannot be searched. Currency units usually have to be of a decimal type.
In the example the fields are defined as follows:
HHid integer number
location string - maximum 30 characters
members integer number
type string - maximum 5 characters
income currency to 2 decimal places
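As a sketch only, the field definitions above might be declared in SQL as follows. The example assumes SQLite, run through Python's standard sqlite3 module because it needs no installation; the article itself names no product, and the type names VARCHAR and NUMERIC are this sketch's choice.

```python
import sqlite3

# An in-memory SQLite database stands in for whichever DBMS is actually used.
conn = sqlite3.connect(":memory:")

# One column per field in the example; the comments repeat the definitions.
conn.execute("""
    CREATE TABLE household (
        HHid     INTEGER PRIMARY KEY,  -- integer number, unique identifier
        location VARCHAR(30),          -- string, maximum 30 characters
        members  INTEGER,              -- integer number
        type     VARCHAR(5),           -- string, maximum 5 characters
        income   NUMERIC(10, 2)        -- currency to 2 decimal places
    )
""")

def describe(conn, table):
    """Return (field name, declared type) for each column of a trusted table name."""
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

for name, declared in describe(conn, "household"):
    print(f"{name:<10}{declared}")
```

SQLite records the declared types but does not enforce the 30- and 5-character limits or the two decimal places, whereas many other systems do; this is one of the product differences the text warns about.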
A DBMS will usually have a facility for entering data into a table record by record, and will allow some data validation. Validation means that only a specified range of values is allowed in a field. For example, an error would be reported if a letter were typed into the field for number of members, as this must be an integer number. Many systems also allow ranges to be set; for instance, known upper and lower income limits might prevent some typing errors.
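Validation can also be placed in the table definition itself. This minimal sketch, again assuming SQLite through Python's sqlite3 module, uses CHECK constraints; typeof() is specific to SQLite, and the income limits of 0 and 100000 are hypothetical values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CHECK constraints make the database itself reject values outside the rules.
# The income limits 0 and 100000 are hypothetical.
conn.execute("""
    CREATE TABLE household (
        HHid     INTEGER PRIMARY KEY,
        location TEXT CHECK (length(location) <= 30),
        members  INTEGER CHECK (typeof(members) = 'integer' AND members >= 1),
        type     TEXT CHECK (length(type) <= 5),
        income   NUMERIC CHECK (income BETWEEN 0 AND 100000)
    )
""")

def try_insert(row):
    """Insert one household record; return None on success, else the error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO household VALUES (?, ?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)

print(try_insert((1, "Hilltop", 4, "2a", 2300)))       # prints None: accepted
print(try_insert((2, "Riverside", "x", "1b", 150)))    # letter typed for members
print(try_insert((3, "Riverside", 3, "1b", 999999)))   # income above the upper limit
```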
When all records are in the database, information can be extracted on various criteria, records with certain values can be counted and the information can be presented sorted into a different order. Sorting can usually be done on any field and calculations on any numerical field. With large databases, it is possible to select particular fields to be displayed. To answer a question about which household types are found among the poor (defined as having an income of less than 100 currency units), households with less than 100 units of income could be selected and the number of each type counted.
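The poverty question might be answered with two queries, sketched here with SQLite via Python's sqlite3 module and a handful of invented household records (the locations, sizes and incomes are hypothetical, not data from the article). The first query displays only two selected fields; the second counts the poor households of each type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE household (HHid INTEGER PRIMARY KEY, location TEXT,
                                        members INTEGER, type TEXT, income NUMERIC)""")
# Hypothetical records, invented for illustration only.
conn.executemany("INSERT INTO household VALUES (?, ?, ?, ?, ?)", [
    (1, "Hilltop", 4, "2a", 80),
    (2, "Riverside", 2, "1b", 95),
    (3, "Riverside", 6, "2b", 250),
    (4, "Hilltop", 4, "2a", 60),
    (5, "Market", 2, "2b", 140),
])

# Only two fields are displayed for the selected (poor) households.
poor = conn.execute(
    "SELECT HHid, location FROM household WHERE income < 100 ORDER BY HHid").fetchall()

# Count the poor households of each type.
poor_by_type = conn.execute("""
    SELECT type, COUNT(*) AS households
    FROM household
    WHERE income < 100
    GROUP BY type
    ORDER BY type
""").fetchall()

print(poor)          # [(1, 'Hilltop'), (2, 'Riverside'), (4, 'Hilltop')]
print(poor_by_type)  # [('1b', 1), ('2a', 2)]
```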
Alternatively all of the records could be presented but sorted by number of members, grouping together households of the same size. Average income could be calculated for the whole set of records or for groups of records selected on certain criteria, such as residence in a particular area.
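Sorting and averaging can be sketched on the same kind of hypothetical records, again assuming SQLite through Python's sqlite3 module. ORDER BY members brings households of the same size together, and GROUP BY location gives an average income per area; the secondary sort on HHid is only there to make the order of equal-sized households predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE household (HHid INTEGER PRIMARY KEY, location TEXT,
                                        members INTEGER, type TEXT, income NUMERIC)""")
# Hypothetical records, invented for illustration only.
conn.executemany("INSERT INTO household VALUES (?, ?, ?, ?, ?)", [
    (1, "Hilltop", 4, "2a", 80),
    (2, "Riverside", 2, "1b", 95),
    (3, "Riverside", 6, "2b", 250),
    (4, "Hilltop", 4, "2a", 60),
    (5, "Market", 2, "2b", 140),
])

# All records, sorted so that households of the same size sit together.
by_size = conn.execute(
    "SELECT HHid, members, type FROM household ORDER BY members, HHid").fetchall()

# Average income for the whole set, then for groups selected by area.
overall = conn.execute("SELECT AVG(income) FROM household").fetchone()[0]
by_area = conn.execute("""
    SELECT location, AVG(income) FROM household
    GROUP BY location ORDER BY location
""").fetchall()

print([hh for hh, _, _ in by_size])  # [2, 5, 1, 4, 3]
print(overall)                       # 125.0
print(by_area)  # [('Hilltop', 70.0), ('Market', 140.0), ('Riverside', 172.5)]
```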
This has described some of the possibilities of using a DBMS to record
a few pieces of information about an entity of a particular type.
The data is in the form of a flat file database.
Some DBMS can only use data in this form. The example is unrealistically
simple but illustrates the relevant points.

It would be much better to have two different tables - one for people and one for households - each with the information recorded only once. The two tables would have to be related together in some way. One way of doing this would be to use the unique identifier HHid. This would be a field in the household table and also in the people table. Each person record would contain the number or code of their household. This common field could then be used to join the information in the household table to that in the people table. In database terminology a unique identifier field is known as a primary key. When such a field is placed in another table in order to create a link (HHid in the person table) it is known as a foreign key. These two tables are a very simple relational database.
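The two-table design might be declared as follows, assuming SQLite through Python's sqlite3 module. The REFERENCES clause marks HHid in the person table as a foreign key; SQLite enforces it only after PRAGMA foreign_keys = ON, a detail specific to that product. All names and values in the records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.executescript("""
    CREATE TABLE household (
        HHid     INTEGER PRIMARY KEY,              -- primary key
        location TEXT,
        type     TEXT,
        income   NUMERIC
    );
    CREATE TABLE person (
        Pid  INTEGER PRIMARY KEY,                  -- primary key of the person table
        name TEXT,
        age  INTEGER,
        sex  TEXT,
        HHid INTEGER REFERENCES household (HHid)   -- foreign key
    );
""")

# Hypothetical records: each person carries the identifier of their household.
conn.executemany("INSERT INTO household VALUES (?, ?, ?, ?)",
                 [(1, "Hilltop", "2a", 2300), (2, "Riverside", "1b", 3050)])
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?)",
                 [(1, "Ana", 40, "f", 1), (2, "Ben", 42, "m", 1), (3, "Dev", 67, "m", 2)])

# The common field HHid joins each person to the details of their household.
joined = conn.execute("""
    SELECT p.name, p.age, h.location, h.type
    FROM person p JOIN household h ON h.HHid = p.HHid
    ORDER BY p.Pid
""").fetchall()

def add_person(row):
    """Try to add a person; return False if their HHid matches no household."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(joined)
print(add_person((4, "Eve", 30, "f", 99)))  # no household 99, so prints False
```

Rejecting a person whose household does not exist is one practical benefit of declaring the key: the link between the tables cannot silently point at nothing.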
Pid, name, age, sex, HHid

Because information on individuals within the household has been added, the members field is no longer needed. The number of people in each household can be calculated by counting the number of people with each household identifier. Type is a more difficult concept and may rely on complex rules of classification. Although it may be possible to write database queries that will work out types from information in the people table, it could be a complex and time consuming task and could require extra data not recorded in the database (for instance the kin relations of the members).
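Counting members from the person table might look like this, under the same assumption of SQLite through Python's sqlite3 module and with invented records. The LEFT JOIN is a design choice so that a household with no person records yet is reported with 0 members rather than dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (HHid INTEGER PRIMARY KEY, location TEXT, type TEXT, income NUMERIC);
    CREATE TABLE person (Pid INTEGER PRIMARY KEY, name TEXT, age INTEGER, sex TEXT,
                         HHid INTEGER REFERENCES household (HHid));
    -- Hypothetical records for illustration only.
    INSERT INTO household VALUES (1, 'Hilltop', '2a', 2300), (2, 'Riverside', '1b', 3050),
                                 (3, 'Market', '2b', 4000);
    INSERT INTO person VALUES (1, 'Ana', 40, 'f', 1), (2, 'Ben', 42, 'm', 1),
                              (3, 'Cara', 19, 'f', 1), (4, 'Dev', 67, 'm', 2);
""")

# Count the people recorded under each household identifier.
members = conn.execute("""
    SELECT h.HHid, COUNT(p.Pid) AS members
    FROM household h LEFT JOIN person p ON p.HHid = h.HHid
    GROUP BY h.HHid
    ORDER BY h.HHid
""").fetchall()

print(members)  # [(1, 3), (2, 1), (3, 0)]
```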
This simple database can be altered to allow other possibilities. If information is available on the income of each household member, this could be recorded in a field within the people table. Total household income could then be calculated from this field, which would also allow breakdowns of income by age and sex. If, however, there appeared to be a separate category of general household income, distinct from personal incomes, a field in the household table would be needed to record this data. Most DBMS allow the same name to be used for fields in different tables, though not in the same one, but for clarity it is usually better to have unique names. The two tables now look like this:
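With a pincome field in the person table, household totals and breakdowns by age and sex become simple aggregate queries. This sketch assumes SQLite through Python's sqlite3 module; the records and the 25-year age band are hypothetical, and only the person table is created because these queries need nothing else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# pincome is the personal income field added to the person table.
conn.execute("""CREATE TABLE person (Pid INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                                     sex TEXT, pincome NUMERIC, HHid INTEGER)""")
# Hypothetical records for illustration only.
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Ana", 40, "f", 1200, 1), (2, "Ben", 42, "m", 1500, 1),
    (3, "Cara", 19, "f", 300, 1), (4, "Dev", 67, "m", 900, 2),
    (5, "Eli", 30, "f", 2000, 3), (6, "Fred", 22, "m", 400, 3),
])

# Total household income built up from the personal incomes of its members.
totals = conn.execute(
    "SELECT HHid, SUM(pincome) FROM person GROUP BY HHid ORDER BY HHid").fetchall()

# Average personal income broken down by sex and by a hypothetical age band.
breakdown = conn.execute("""
    SELECT sex,
           CASE WHEN age >= 25 THEN '25 and over' ELSE 'under 25' END AS age_band,
           AVG(pincome)
    FROM person
    GROUP BY sex, age_band
    ORDER BY sex, age_band
""").fetchall()

print(totals)     # [(1, 3000), (2, 900), (3, 2400)]
print(breakdown)
```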
Pid, name, age, sex, pincome, HHid
This is a very simple relational database consisting of the two tables, household and people linked by the household identifier (HHid). The types within which the information is classified for this example have been used uncritically. To represent information in a more complex way requires more tables but the method of creating and linking them is similar. The objective in designing a relational database is to store each discrete piece of information once and once only (except of course for the keys used for linking tables eg. HHid). If a piece of information has to be changed at any time - if perhaps an erroneous location has been entered - this is done in one place and nowhere else.
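The "once and once only" principle shows up clearly when a mistake is corrected. In this sketch, assuming SQLite through Python's sqlite3 module and invented records, household 1 was entered with a misspelt location; a single UPDATE to the household table fixes it for every member, because no person record holds its own copy of the location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (HHid INTEGER PRIMARY KEY, location TEXT, type TEXT, income NUMERIC);
    CREATE TABLE person (Pid INTEGER PRIMARY KEY, name TEXT, age INTEGER, sex TEXT,
                         pincome NUMERIC, HHid INTEGER REFERENCES household (HHid));
    -- Hypothetical records; household 1 has been given a misspelt location.
    INSERT INTO household VALUES (1, 'Hiltop', '2a', 2300), (2, 'Riverside', '1b', 3050);
    INSERT INTO person VALUES (1, 'Ana', 40, 'f', 1200, 1), (2, 'Ben', 42, 'm', 1500, 1),
                              (3, 'Cara', 19, 'f', 300, 1), (4, 'Dev', 67, 'm', 900, 2);
""")

# The location is stored once, in the household table, so one UPDATE corrects it.
with conn:
    conn.execute("UPDATE household SET location = 'Hilltop' WHERE HHid = 1")

# Every person in that household now appears with the corrected value.
rows = conn.execute("""
    SELECT p.name, h.location
    FROM person p JOIN household h ON h.HHid = p.HHid
    WHERE h.HHid = 1
    ORDER BY p.Pid
""").fetchall()
print(rows)  # [('Ana', 'Hilltop'), ('Ben', 'Hilltop'), ('Cara', 'Hilltop')]
```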
Similar operations to those described for one table can be performed on each table alone, on items selected from one table on the basis of criteria from the other, or on items selected on the basis of values in fields in both tables. The personal income of females over 25 years old in households of a given type, for instance, can be examined in this way. As well as calculating figures, the database can be used to retrieve details that meet selected criteria, such as the locations of households which contain persons over 60 years of age living alone.
Many DBMS have facilities for creating output screens, often called forms or layouts. These are very useful if you are going to be asking for exactly the same information on many occasions, as perhaps with an administrative database of students' details. With a research database, questions are perhaps more likely to be unique and varied. For this purpose a query language is usually better than a pre-set report. It is hoped that this section may provide a rough guide to general principles. It is not intended as a manual.
"What is the average age of members in households of each type?"
select h.type, avg(p.age) as avr
from household h, person p
where h.HHid = p.HHid   -- link each person to their household
group by h.type
type  avr
1b    23
2a    29
2b    45
"What is the personal income of females over 25 years old in households of each type?"
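The article gives no query for this question. One possible version, assuming SQLite through Python's sqlite3 module and hypothetical records, reads "personal income" as the average personal income of such women in each household type; to list the individual women instead, one would select p.name and p.pincome and drop the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (HHid INTEGER PRIMARY KEY, location TEXT, type TEXT, income NUMERIC);
    CREATE TABLE person (Pid INTEGER PRIMARY KEY, name TEXT, age INTEGER, sex TEXT,
                         pincome NUMERIC, HHid INTEGER REFERENCES household (HHid));
    -- Hypothetical records for illustration only.
    INSERT INTO household VALUES (1, 'Hilltop', '2a', 2300), (2, 'Riverside', '1b', 3050),
                                 (3, 'Market', '2a', 4000);
    INSERT INTO person VALUES (1, 'Ana', 40, 'f', 1200, 1), (2, 'Ben', 42, 'm', 1500, 1),
                              (3, 'Cara', 19, 'f', 300, 1), (4, 'Gita', 26, 'f', 800, 2),
                              (5, 'Eli', 30, 'f', 2000, 3);
""")

# Criteria on the person table (sex, age) combined with grouping on a field
# of the household table (type).
result = conn.execute("""
    SELECT h.type, COUNT(*) AS women, AVG(p.pincome) AS avg_pincome
    FROM household h JOIN person p ON h.HHid = p.HHid
    WHERE p.sex = 'f' AND p.age > 25
    GROUP BY h.type
    ORDER BY h.type
""").fetchall()
print(result)  # [('1b', 1, 800.0), ('2a', 2, 1600.0)]
```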
"Where are households which contain persons over 60 years of age living alone?"
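No query is given for this question either. A possible version, assuming SQLite through Python's sqlite3 module and invented records, treats a household with exactly one person record as someone living alone; that reading is an assumption, since a real survey might record residence differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (HHid INTEGER PRIMARY KEY, location TEXT, type TEXT, income NUMERIC);
    CREATE TABLE person (Pid INTEGER PRIMARY KEY, name TEXT, age INTEGER, sex TEXT,
                         pincome NUMERIC, HHid INTEGER REFERENCES household (HHid));
    -- Hypothetical records for illustration only.
    INSERT INTO household VALUES (1, 'Hilltop', '2a', 2300), (2, 'Riverside', '1b', 3050),
                                 (3, 'Market', '1b', 1800), (4, 'Hilltop', '2b', 2600);
    INSERT INTO person VALUES (1, 'Ana', 40, 'f', 1200, 1), (2, 'Ben', 42, 'm', 1500, 1),
                              (3, 'Dev', 67, 'm', 900, 2), (4, 'Eli', 30, 'f', 2000, 3),
                              (5, 'Hal', 72, 'm', 700, 4), (6, 'Ida', 70, 'f', 650, 4);
""")

# "Living alone" is read here as: the household has exactly one person record.
# With a single member, MAX(p.age) is simply that person's age.
alone_over_60 = conn.execute("""
    SELECT h.HHid, h.location
    FROM household h JOIN person p ON p.HHid = h.HHid
    GROUP BY h.HHid, h.location
    HAVING COUNT(*) = 1 AND MAX(p.age) > 60
    ORDER BY h.HHid
""").fetchall()
print(alone_over_60)  # [(2, 'Riverside')]
```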
"Compare the proportion of total income contributed by individuals and by the household as a unit."
-- multiplying by 100.0 avoids integer division
select h.HHid, h.income, sum(p.pincome) as totp,
       100.0 * h.income / (h.income + sum(p.pincome)) as hprop,
       100.0 * sum(p.pincome) / (h.income + sum(p.pincome)) as pprop
from household h, person p
where h.HHid = p.HHid
group by h.HHid, h.income
This query demonstrates how a more complex calculation can be built up in the select statement. The results - pprop for personal income and hprop for household income - are percentages of the total income from household and members.
HHid  income  totp  hprop  pprop
1     2300    5700  28.75  71.25
2     3050    1750  63.54  36.46
3     4000    3000  57.14  42.86
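To check the arithmetic, the proportion query can be run in SQLite through Python's sqlite3 module. This version assumes the table names household and person, replaces float4() with multiplication by 100.0 to force non-integer division, and adds ROUND() only to match the two decimal places shown. The household incomes and personal-income totals come from the results table; how each total is split between members is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (HHid INTEGER PRIMARY KEY, income INTEGER);
    CREATE TABLE person (Pid INTEGER PRIMARY KEY, pincome INTEGER, HHid INTEGER);
    -- Household incomes and personal-income totals match the results table;
    -- the split of each total between members is hypothetical.
    INSERT INTO household VALUES (1, 2300), (2, 3050), (3, 4000);
    INSERT INTO person VALUES (1, 3000, 1), (2, 2700, 1), (3, 1750, 2),
                              (4, 1800, 3), (5, 1200, 3);
""")

# 100.0 forces floating-point division, doing the job of float4() in the original.
rows = conn.execute("""
    SELECT h.HHid, h.income, SUM(p.pincome) AS totp,
           ROUND(100.0 * h.income / (h.income + SUM(p.pincome)), 2) AS hprop,
           ROUND(100.0 * SUM(p.pincome) / (h.income + SUM(p.pincome)), 2) AS pprop
    FROM household h JOIN person p ON h.HHid = p.HHid
    GROUP BY h.HHid, h.income
    ORDER BY h.HHid
""").fetchall()

for row in rows:
    print(*row)
```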
The simplest type of database, the flat file, consists of a single table. For those used to statistical packages such as SPSSx, this is similar to the way such a package stores data in rows and columns.
A relational database consists of two or more tables, each linked to at least one other table by a field common to both. The fields must be of the same type in each table to be joined. Ideally each piece of information should occur only once in the database.
A DBMS is most suitable for data consisting of discrete pieces of information, for instance a census or survey. It is much less suitable for large pieces of text and for mainly numeric data where the sole purpose is to calculate numeric results.
This paper has introduced a few of the basic concepts associated with database systems. It is hoped that a further paper will explore how more complex modelling can be achieved with a relational database system and look at how to design a database.
Field type
Fields can consist of character strings (letters or digits), integer
numbers, real numbers and special types such as dates or blocks of
text. The type of a field determines to some extent the operations
that may be performed upon it. Numeric fields may be used in calculation,
character fields can be counted but not added together.