This sample Environmental and Occupational Epidemiology Research Paper is published for educational and informational purposes only.
- Environmental Epidemiology
- Occupational Epidemiology
- Types of Epidemiologic Studies
- Incidence Studies
- Incidence Case–Control Studies
- Prevalence Studies
- Prevalence Case–Control Studies
- Measurement of Exposure
- Subjective Measures of Exposure
- Exposure Monitoring
- Personal Versus Area Sampling
- Sampling: When and How Often?
- Exposure Grouping
- Exposure Modeling
- Selection Bias
- Information Bias
- Interpretation of Environmental and Occupational Epidemiology Studies
In this research paper, we describe the key features of environmental and occupational epidemiology studies, the types of study designs, measurement of exposure, issues of bias, and interpretation of environmental and occupational epidemiology studies. We do not discuss methods of data analysis, and readers are referred to more detailed epidemiologic texts for more information (Rothman and Greenland, 1998).
Environmental epidemiology and occupational epidemiology are separate fields that are usually covered by separate textbooks (Steenland and Savitz, 1997; Checkoway et al., 2004). However, the two fields often involve common exposures and there is therefore inevitably some overlap in the epidemiological methods that are used. For example, exposure to pesticides can be studied in the context of community exposures or exposures to pesticide production workers or commercial sprayers. These two different contexts of exposure provide the basis for the difference between environmental and occupational epidemiology. Each context has its own advantages and disadvantages in terms of conducting research, although in most instances studying such an exposure in the occupational context is easier and more valid scientifically than studying it in the environmental context.
Environmental Epidemiology
Environmental epidemiology is the use of epidemiology to investigate causes of disease that are found in the environment (Pearce and Woodward, 2004). A recent World Health Report estimated that 24% of the global disease burden and 23% of all deaths can be attributed to environmental factors (World Health Organization, 2006). Among children 0–14 years of age, the proportion of deaths attributed to the environment was estimated to be as high as 36%, with the largest proportion in developing countries. Diseases with the largest absolute burden included diarrhea, lower respiratory infections, other unintentional injuries including occupational injuries, and malaria. The most important risk factors contributing to disease and mortality are unsafe drinking water and poor sanitation and hygiene (diarrhea), and indoor air pollution related largely to household solid fuel use and possibly second-hand smoke as well as to outdoor air pollution (lower respiratory infections) (World Health Organization, 2006). In most developed countries, these risks are now largely controlled, with the exception of outdoor air pollution, through the provision of safe drinking water, adequate food, waste disposal, immunizations, and adequate health care. However, other diseases with suspected environmental causes such as cancer, cardiovascular disease, asthma, chronic obstructive pulmonary disease, and diabetes are still common and are in fact increasing in prevalence in many developed countries.
The term environment is very broad and includes epidemiological studies at the molecular, individual, population, and ecosystem levels. Analyses at the ecosystem level are unique to environmental epidemiology and often require research methods that are quite different from those used in other areas of epidemiologic research, including systems-based approaches such as complexity theory.
There are also some features of environmental epidemiology that provide particular challenges to researchers (Pearce and Woodward, 2004).
Firstly, environmental epidemiology is concerned generally with exposures that are, by definition, characteristics of the environment, not the individuals who live in that environment. Examples include infectious organisms in the water supply, features of the legislative environment (restrictions on smoking in bars for example), and air pollutants both indoors and outdoors. This means that the exposures that are being studied are typically widespread and not readily controlled by the individuals who are directly affected. What are the consequences for epidemiology? The fact that the exposures are widespread means that it may be difficult to find individuals who can act as an unexposed comparison group (for example, persons who are not exposed to air pollution). Sometimes the exposures are not only widespread, but also vary little within a given population (for example, air pollution levels in a neighborhood) compared with the differences between populations. In these circumstances, ecological studies – in which the unit of comparison is the group rather than the individual – may be particularly useful. An example of an ecological study of air pollution would be a study that compared the frequency of respiratory illnesses in different neighborhoods with the average levels of nitrogen oxide and ozone in those locations.
Secondly, environmental epidemiology often involves studying exposures at low levels. An example is environmental dioxin exposure. Exposures in the general environment from sources such as incinerators are usually orders of magnitude less than those that may be experienced in some occupational settings (such as workers in the incinerator, or workers producing chemicals contaminated with dioxin). One consequence is that environmental epidemiology is frequently searching for risks on the margin of detectability. However, this does not mean that the risks presented by low-level exposures in the general environment are necessarily unimportant. First, these exposures are typically involuntary (people who live close to incinerators, for example, have little choice over whether they are exposed to dioxin from the sites), and the public is far more sensitive to potential dangers of this kind than exposures that are seen to have an element of discretion about them. Second, the increase in risk for an individual may be small, but if exposures affect large numbers of people, then the overall burden of illness attributable to the exposure will be substantial. Relatively few people are exposed to dioxin at work, so although this may be an important personal health issue, the impact on the health of the population overall will be relatively small. On the other hand, if low-level environmental dioxin exposures do have health effects, then this would be a significant public health issue since the number of people at risk would be very large.
Thirdly, the measurement of exposure may be particularly difficult in environmental epidemiology studies. For example, if someone experiences spray drift from pesticides being sprayed on a farm near their home, it may be very difficult to determine how much exposure they received, if any.
Finally, studies of environmental causes of disease and injury tend to involve dispersed, heterogeneous populations. This provides particular challenges in recruitment of study participants and in the analysis of findings. It may be difficult to define the exposed population, weakening the confidence with which results can be extrapolated to other groups. The very mixed nature of the general population (in terms of age, health status, and co-exposures) means that an overall average risk estimate may mask considerable variations in the strength of effect in different subgroups. Consider for example a study of hospitalization rates in relation to ambient levels of air pollution in a major city. The population of the city will include a number of groups that are likely to be more susceptible than the average city inhabitant to the effects of pollution (such as the elderly, people with preexisting chest disease, outdoor workers). Typically, the numbers of susceptible individuals and the exposures they receive are not known, and caution must be applied in interpreting the relation observed between pollution levels and health outcome (e.g., numbers of hospital admissions per day). Exposure guidelines based on studies of this kind may not provide adequate protection to the most sensitive groups in the population, and this must be taken into account when epidemiological results are translated into public health policy (Woodward et al., 1995).
Occupational Epidemiology
The major occupational health problems include cancer, heart disease, respiratory disease, musculoskeletal disease, neurological disease, hearing loss, and injury. Worldwide, 6000 people die each day as a result of their job, and of these deaths 15% are due to accidents and 85% to work-related disease. The picture is quite different when occupational morbidity is considered, with accidents accounting for about 90% of cases and nonfatal disease only about 10% (Driscoll et al., 2004).
Studying exposures in the occupational context has many scientific advantages in comparison to studies of environmental exposures (Checkoway et al., 2004).
Firstly, the exposures are generally well defined in time and space, rather than being ubiquitous in the environment. For example, in a study of occupational dioxin exposure, the exposure may be restricted to just a few departments within a factory, and workers in the other departments may serve as a nonexposed comparison group. Even if all workers in a particular factory receive some exposure, it is relatively straightforward to find a nonexposed comparison group from another factory or industry.
Secondly, as noted above, occupational exposures are typically at much higher levels than environmental exposures, and study power is therefore correspondingly greater.
Thirdly, the estimation of exposure is generally more straightforward in occupational studies. For example, in a study of pesticide production workers, even if individual exposure measurements were not available, it would be relatively straightforward to classify workers in categories of exposure on the basis of their work history (job titles and departments) and a job-exposure matrix (JEM) (Checkoway et al., 2004).
Finally, occupational populations are generally less heterogeneous than the communities that are studied in environmental epidemiology. For example, in studies of blue collar workers, there are usually few differences in lifestyle between exposed and nonexposed workers, so confounding by factors such as tobacco smoking and alcohol is usually weak. In addition, occupational populations do not usually include children or the elderly, two groups that may be particularly susceptible to some exposures. Furthermore, for many occupational exposures, it may be rare that the workforce includes pregnant women.
These differences between exposures in the occupational and environmental context mean that it is generally more straightforward, and more valid scientifically, to study the occupational context. On the other hand, there may be difficulties in extrapolating findings from occupational studies to more heterogeneous populations with lower levels of exposure. Furthermore, there are some environmental exposures (e.g., air pollution, pollen exposure) for which it is relatively difficult to find suitable occupational populations. Thus, for many exposures environmental studies provide a useful complement to occupational studies.
Types of Epidemiologic Studies
All epidemiologic studies are (or should be) based on a particular population (the study population, source population, or base population) followed over a particular period of time (the study period or risk period). The different epidemiological study designs differ only in the manner in which the source population is defined, and the manner in which information is drawn from this population (Checkoway et al., 2004).
Incidence Studies
The most complete approach involves utilizing all of the information from the source population in a cohort study (follow-up study, longitudinal study) of disease incidence. Follow-up may be prospective (which is more expensive and time-consuming but may enable better quality data to be collected), or it may be based on historical records. The most common measure of disease occurrence is the incidence rate, which is a measure of the disease occurrence per unit time, and is the number of new cases of the outcome under study divided by the total person-time at risk. The usual approach is to compare the incidence rate in those exposed and those not exposed to a particular factor (e.g., air pollution) and to estimate the rate ratio. This may involve comparing the disease incidence in the exposed group (e.g., people living next to a factory which emits high levels of air pollution) with some nonexposed external reference population, such as another geographic area or the national population. Alternatively, if the source population involves both exposed and nonexposed persons (e.g., if some people are exposed and some are not exposed within the same geographical area), then a direct comparison can be made within this population.
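The incidence rate and rate ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any study cited here: the counts are hypothetical, and the confidence interval uses a common large-sample approximation on the log scale.

```python
import math

def incidence_rate(cases, person_years):
    """New cases divided by total person-time at risk."""
    return cases / person_years

def rate_ratio(cases_exp, py_exp, cases_unexp, py_unexp, z=1.96):
    """Rate ratio (exposed vs. nonexposed) with an approximate 95% CI.

    The interval uses the large-sample standard error of the log rate
    ratio, sqrt(1/cases_exp + 1/cases_unexp).
    """
    rr = incidence_rate(cases_exp, py_exp) / incidence_rate(cases_unexp, py_unexp)
    se_log = math.sqrt(1 / cases_exp + 1 / cases_unexp)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts, for illustration only
rr, lower, upper = rate_ratio(cases_exp=30, py_exp=10_000,
                              cases_unexp=20, py_unexp=20_000)
print(f"rate ratio = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The same function applies whether the nonexposed rate comes from an internal comparison group or from an external reference population.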
Table 1 shows an example of an environmental epidemiology cohort study of the population exposed to dioxin after the 1976 accident in Seveso, Italy (Bertazzi et al., 2001). The accident took place in summer 1976 and exposed several thousand people in the neighboring area to substantial quantities of tetrachlorodibenzo-p-dioxin (TCDD). Three contaminated zones (A, B, and R) were defined based on dioxin soil measurements along the direction of the prevailing winds. The study included all people living in these three zones at the time of the accident, or entering in the 10-year period after the accident. Vital status over the following 20 years was determined by contacting the vital statistics offices of the 11 study towns and of thousands of municipalities throughout the country to reach those subjects who had migrated (Bertazzi et al., 2001). The expected numbers of deaths were estimated based on the age, calendar period, and gender distribution of the population over the 20-year follow-up period. Table 1 shows the findings for all-cause mortality and cancer mortality in the two most heavily exposed zones (A and B) in the 20 years following the accident. It shows that there was little evidence of an elevation in all-cause mortality, but there was a significant increase in cancer mortality, particularly for the period 15 years or more after the accident.
Table 1 Cohort study of the population in zones A and B combined exposed to dioxin after the 1976 accident in Seveso, Italy
Modified from Bertazzi PA, Consonni D, Bachetti S, et al. (2001) Health effects of dioxin exposure: A 20-year mortality study. American Journal of Epidemiology 153: 1031–1044.
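The comparison of observed with expected deaths can be illustrated as follows. This is a sketch under stated assumptions: the strata, person-years, reference rates, and observed count are all hypothetical and are not taken from the Seveso study, and the observed-to-expected ratio shown is the generic form often reported as a standardized mortality ratio.

```python
# Hypothetical strata: (person-years in the cohort, reference mortality
# rate per person-year). In practice strata would be defined by age,
# calendar period, and gender.
strata = [
    (12_000, 0.001),  # hypothetical younger stratum
    (8_000, 0.005),   # hypothetical middle stratum
    (3_000, 0.020),   # hypothetical older stratum
]
observed_deaths = 130  # hypothetical

# Expected deaths: apply the reference rates to the cohort's person-time
expected_deaths = sum(person_years * rate for person_years, rate in strata)

# Ratio of observed to expected deaths
observed_to_expected = observed_deaths / expected_deaths
print(f"expected = {expected_deaths:.1f}, O/E = {observed_to_expected:.2f}")
```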
Incidence Case–Control Studies
Cohort studies are the most complete and definitive approaches to studying the occupational causes of disease, since they utilize all of the information in the source population. However, they often require large numbers and may be very expensive in terms of time and resources. The same findings can often be achieved more efficiently by using a case–control design. The key feature of case–control studies is that they involve studying all of the cases from the source population over the defined risk period (e.g., all cases of lung cancer in Rome during 2002), but only a sample of the non-cases are studied (e.g., a general population sample of people who do not currently have lung cancer). Exposure information is then collected for both groups. The aim is to obtain the same findings that would have been obtained with a full cohort study, but in a more efficient manner, because exposure information is collected only on the cases and a sample of controls, rather than on the entire population. For example, the earliest studies of smoking and lung cancer used the case–control design and the findings were subsequently confirmed in cohort studies.
In case–control studies, the relative risk measure is the odds ratio, which is the ratio of the odds of exposure in the cases (i.e., the number exposed divided by the number not exposed) to the odds of exposure in the controls. Gaertner et al. (2004) conducted a case–control study of occupational risk factors for bladder cancer in Canada. They identified incident cases of histologically confirmed bladder cancer in adults aged 20–74 years through the provincial cancer registries in seven Canadian provinces, and selected 2847 controls from the general population of these provinces matched for age and gender. Cases and controls were sent postal questionnaires with telephone follow-up when necessary. Table 2 shows the findings for auto mechanics, an occupation which involves exposure to exhaust fumes and lubricating oils, both of which can contribute to bladder cancer risk (Gaertner et al., 2004). A higher proportion of cases than controls had worked as an auto mechanic (OR = 1.69, 95% CI 1.02–2.82) and there was a statistically significant association with duration of employment (Table 2).
Table 2 Case–control study of occupational risk factors for bladder cancer in Canada
P-value for trend = 0.01.
- a Adjusted for age, province, race, smoking, ex-smoking, and consumption of fruit, fried food and coffee, as well as for employment in nine suspect occupations.
- b Reference category.
Modified from Gaertner RRW, Trpeski L, and Johnson KC (2004) A case–control study of occupational risk factors for bladder cancer in Canada. Cancer Causes & Control 15: 1007–1019.
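The odds ratio defined above can be computed directly from a 2 × 2 table. The sketch below uses hypothetical counts (not the Gaertner et al. data) and gives a crude, unadjusted estimate with a Woolf-type approximate confidence interval; the published estimates were adjusted for confounders, which this example does not attempt.

```python
import math

def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Crude odds ratio with an approximate 95% CI (Woolf's method)."""
    odds_cases = exp_cases / unexp_cases
    odds_controls = exp_controls / unexp_controls
    or_ = odds_cases / odds_controls
    se_log = math.sqrt(1 / exp_cases + 1 / unexp_cases
                       + 1 / exp_controls + 1 / unexp_controls)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical counts, for illustration only
or_, or_lower, or_upper = odds_ratio(exp_cases=40, unexp_cases=60,
                                     exp_controls=20, unexp_controls=80)
print(f"OR = {or_:.2f} (95% CI {or_lower:.2f}-{or_upper:.2f})")
```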
Prevalence Studies
Incidence studies are usually conducted when studying fatal diseases such as cancer, since cases can be identified through death registrations or cancer registrations. However, when studying nonfatal chronic disease such as asthma, it is difficult to detect incident cases without very intensive follow-up. Thus, it is more common to study prevalence rather than incidence. This can be defined as point prevalence estimated at one point in time, or period prevalence which denotes the number of cases that existed at any time during some time interval (e.g., 1 year). Prevalence studies represent a considerable saving in resources compared with incidence studies, since it is only necessary to evaluate disease prevalence at one point in time, rather than continually searching for incident cases over an extended period of time. On the other hand, this gain in efficiency is achieved at the cost of some loss of information, since it may be much more difficult to understand the temporal relationship between various exposures and the occurrence of disease. In particular, it is usually difficult to ascertain, in a prevalence study, at what age disease first occurred, and it is therefore difficult to determine which exposures preceded the development of disease, even when accurate historical exposure information is available.
Table 3 shows an example of a prevalence study. Ehrlich et al. (1998) conducted a cross-sectional study of kidney function abnormalities among 382 South African lead battery factory workers. Data on current and historical blood lead concentrations were available to categorize workers by exposure level. The prevalence of abnormalities of serum creatinine, serum uric acid, and urinary N-acetyl-β-D-glucosaminidase increased with both current and historical cumulative blood lead levels.
Table 3 Prevalence (%) of renal dysfunction in South African lead/acid battery production workers
- a N-acetyl-β-D-glucosaminidase.
Modified from Ehrlich R, Robins T, Jordaan E, et al. (1998) Lead absorption and renal dysfunction in a South African battery factory. Occupational and Environmental Medicine 55: 453–460; Checkoway H, Pearce N, and Kriebel D (2004) Research Methods in Occupational Epidemiology. New York: Oxford University Press.
Prevalence Case–Control Studies
Just as an incidence case–control study can be used to obtain the same findings as a full cohort study, a prevalence case–control study can be used to obtain the same findings as a full prevalence study in a more efficient manner. For example, if obtaining exposure information is difficult or costly (e.g., if it involves lengthy interviews or collection of serum samples), then it may be more efficient to conduct a prevalence case–control study by obtaining exposure information on all of the prevalent cases of disease and a sample of controls selected at random from the non-cases.
Table 4 shows an example of a prevalence case–control study. Studies of congenital malformations usually involve estimating the prevalence of malformations at birth (i.e., this is a prevalence rather than an incidence measure). Garcia et al. (1999) conducted a (prevalence) case–control study of occupational exposure to pesticides and congenital malformations in Comunidad Valenciana, Spain. A total of 261 cases and 261 controls were selected from those infants born in eight public hospitals during 1993–1994. For mothers who were involved in agricultural activities in the risk period (the month before conception and the first trimester of pregnancy), the adjusted prevalence odds ratio for congenital malformations was 3.2 (95% CI 1.1–9.0). There was no such association with exposure outside of this period, or with paternal agricultural work.
Table 4 Case–control study of parental agricultural work and congenital malformations
The risk period was defined as the month before conception and the first trimester of pregnancy.
- a Adjusted for maternal and paternal confounders: spontaneous abortion (month), twins (index pregnancy), drug use during pregnancy (mother), heavy smoking during pregnancy (mother), education (mother), industrial work (father), and age >40 years (father).
- b Reference category.
Modified from Garcia AM, Fletcher T, Benavides FG, et al. (1999) Parental agricultural work and selected congenital malformations. American Journal of Epidemiology 149: 64–74.
Measurement of Exposure
In studies of environmental and occupational causes of disease, the distinction must be made between exposure and dose. The term exposure refers to the presence of a substance (e.g., environmental pesticide exposure) in the external environment. The term dose refers to the amount of substance that reaches susceptible targets within the body (e.g., concentration of a specific pesticide metabolite in the liver) (Checkoway et al., 2004).
Epidemiological studies rarely have optimal exposure/dose data and often rely on relatively crude measures of exposure. The key issue is not that the exposure data must be perfect, but that they must be of similar quality for the various groups being compared. Provided that this principle is followed, any bias from misclassification of exposure will be nondifferential, and will tend to produce false-negative findings. Thus, if positive findings do occur, one can be confident that these are not due to inaccuracies in the exposure data; on the other hand, if no association (or only a weak association) is found between exposure and disease, then the possibility of nondifferential information bias should be considered. In general, the aim of exposure assessment is to: (1) ensure that the exposure data are of equal quality in the groups being compared and (2) ensure that the data are of the best possible quality given the former restriction.
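A small worked example may help show why nondifferential misclassification tends to pull results toward the null. The counts, sensitivity, and specificity below are hypothetical; the key assumption is that the same sensitivity and specificity of exposure classification apply to cases and controls.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return observed_exposed, observed_unexposed

def crude_or(exp_cases, unexp_cases, exp_controls, unexp_controls):
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)

# Hypothetical true exposure distribution
true_cases = (100, 100)      # exposed, unexposed
true_controls = (50, 150)

# Same (hypothetical) measurement quality in both groups: nondifferential
sens, spec = 0.8, 0.9
obs_cases = misclassify(*true_cases, sens, spec)
obs_controls = misclassify(*true_controls, sens, spec)

true_or = crude_or(*true_cases, *true_controls)
observed_or = crude_or(*obs_cases, *obs_controls)
print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

In this example the observed odds ratio lies between 1 and the true value, which is the attenuation the text describes.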
Subjective Measures of Exposure
More often than not exposure or dose cannot be measured directly; instead researchers have to rely on subjective methods of exposure assessment. This is particularly the case in historical cohort studies and in case–control studies focusing on diseases with a long latency period.
Traditionally, exposure to risk factors such as environmental tobacco smoke has been measured with questionnaires, and this approach has a long history of successful use in epidemiology. More recently, it has been argued that the major problem in epidemiology is the lack of adequate exposure data, and that this situation can be rectified by increasing use of molecular markers of exposure (Schulte and Perera, 1993). In fact, there are a number of major limitations of currently available biomarkers of exposures such as cigarette smoking, particularly with regard to historical exposures. Questionnaires have good validity and reproducibility with regard to current exposures and are likely to be superior to biological markers with respect to historical exposures.
In occupational epidemiology, exposure is often estimated simply on the basis of occupation and industry and is typically dichotomized as never/ever exposed. More recently there has been increased use of semi-quantitative exposure assessment covering the whole exposure period using the full work history, and applying quantitative job-exposure matrices (JEMs) and expert assessment (Checkoway et al., 2004). In the absence of more sophisticated methods, these approaches may provide an efficient and low-cost method of assessing exposure, but they may result in considerable (nondifferential) misclassification.
Exposure Monitoring
In addition to questionnaires, JEMs, and biological measurements, personal or environmental monitoring is commonly used to measure environmental or occupational exposures. Although this has the potential to provide a more valid and accurate exposure assessment, this may not always be the case and is strongly dependent on the chosen sampling strategy, which in turn is dependent on a large number of factors, including:
- type of exposure and disease or symptoms of interest;
- acute versus chronic health outcomes (e.g., disease exacerbation versus disease development);
- population versus patient-based approaches;
- suspected exposure variation both in time and space, and between the diseased and reference populations;
- available methods to measure exposure;
- costs of sampling and analyses.
Data collected for environmental or occupational monitoring purposes may be of limited value in epidemiological studies. For example, monitoring is often done in areas where exposures are likely to be highest, in order to ensure compliance with exposure limits. Epidemiological studies, by contrast, require information on average levels of exposure and it may therefore be necessary to conduct a special survey involving random sampling, rather than relying on data collected for monitoring purposes.
Personal Versus Area Sampling
In general, personal measurements best represent the etiologically relevant current exposures, and personal sampling is therefore preferred over area sampling. Modern sampling equipment is now sufficiently light and small to allow it to be used for personal sampling purposes, and several studies focusing on chemical air pollution, for example, have demonstrated its feasibility in both the indoor and outdoor environments (Checkoway et al., 2004). However, personal sampling may not always be possible due to practical constraints, i.e., it is too cumbersome for the study subjects, or there is no portable equipment to make the desired measurements (measurements of viable microorganisms, for example).
In situations where personal sampling is not possible, area sampling can be applied to reconstruct personal exposure using the microenvironmental model approach. In this model, exposure of an individual to an airborne agent is defined as the time-weighted average of agent concentrations encountered as the individual passes through a series of microenvironments. However, some exposures only occur episodically, and these patterns are not likely to be accurately captured by environmental area samplers. In addition, it is practically impossible to measure all the relevant microenvironments.
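The time-weighted average used in the microenvironmental model can be written out as a short calculation. The microenvironments, durations, and concentrations below are hypothetical and serve only to illustrate the arithmetic.

```python
def time_weighted_average(microenvironments):
    """Time-weighted average concentration.

    `microenvironments` is a list of (hours_spent, concentration) pairs,
    one per microenvironment the individual passes through.
    """
    total_time = sum(hours for hours, _ in microenvironments)
    weighted_sum = sum(hours * conc for hours, conc in microenvironments)
    return weighted_sum / total_time

# Hypothetical day: (hours, concentration in µg/m³ from area samplers)
day = [
    (8, 50.0),   # workplace
    (14, 10.0),  # home
    (2, 80.0),   # commuting
]
twa = time_weighted_average(day)
print(f"time-weighted average = {twa:.1f} µg/m³")
```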
Sampling: When and How Often?
To the extent to which this is possible, samples should be taken such that they represent the true exposure at the appropriate time window. In the case of acute effects, exposure measurements taken shortly before the effects occurred would be most useful. For chronic effects, the situation is more complicated since exposure should ideally be assessed prior to the occurrence of health effects and preferably in the time window that is biologically most relevant, i.e., when the exposure is thought to be the most problematic or when subjects are most susceptible for these exposures. This is only possible in longitudinal cohort studies (or historical cohort studies where historical exposure information is available). Even then it is often not clear when people are most susceptible to the exposures of interest. In cross-sectional studies, exposure measurements can also be valuable in assessing retrospective exposures, particularly when the environment in which people live or work has not changed significantly.
Measures of exposure should be sufficiently accurate and precise, so that the effect of exposure on disease can be estimated with minimal bias and maximum efficiency. Precision can be gained (that is, measurement error can be reduced) by increasing the number of samples taken either by: (1) increasing the number of subjects in whom exposure is measured or (2) increasing the number of exposure measurements per subject. In population studies, repeated sampling within subjects is particularly effective with exposures that are known to vary largely over time within subjects relative to the variation observed between subjects with the same job title or in the same work force. If the within-subject variability is small compared to the variation between subjects, however, repeated measures will not significantly reduce measurement error. If within- and between-subject variation is known (from previous surveys or pilot studies, for example) the number of samples required to obtain a given reduction in bias of the risk estimate can be computed in the manner described by Boleij et al. (1995). For instance, in studies that involve airborne sampling of viable micro-organisms in the indoor environment a within- versus between-home variance ratio of 3–4 in concentration is not uncommon, due to high temporal variation in microbial concentrations, combined with very short sampling times. In this particular situation, 27–36 samples per home would be required to estimate the average exposure reliably for an epidemiological study with less than 10% bias in the relationship between some health endpoint and the exposure. For most other exposure situations, the within- versus between-subject variation is, however, substantially lower, and far fewer repeated samples are therefore required.
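The figures quoted above are consistent with the standard attenuation formula for classical measurement error, in which the observed regression coefficient equals the true coefficient multiplied by 1 / (1 + λ/n), where λ is the within- to between-subject variance ratio and n is the number of repeated measurements averaged per subject. The sketch below assumes this is the calculation intended; the function names are illustrative.

```python
import math
from fractions import Fraction

def attenuation(variance_ratio, n_samples):
    """Factor by which the exposure-response coefficient is attenuated."""
    return 1 / (1 + variance_ratio / n_samples)

def samples_needed(variance_ratio, max_bias):
    """Smallest n such that the attenuation bias is at most `max_bias`.

    Solving 1 - 1 / (1 + lam / n) <= b for n gives n >= lam * (1 - b) / b.
    Exact fractions avoid floating-point rounding pushing the result up.
    """
    lam = Fraction(str(variance_ratio))
    b = Fraction(str(max_bias))
    return math.ceil(lam * (1 - b) / b)

for ratio in (3, 4):
    n = samples_needed(ratio, 0.1)
    print(f"variance ratio {ratio}: {n} samples, "
          f"attenuation factor {attenuation(ratio, n):.2f}")
```

With a variance ratio of 3 or 4 and a 10% bias threshold, this reproduces the 27 and 36 samples per home mentioned in the text.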
Exposure Grouping
In occupational epidemiology, a significant increase in validity may be achieved by using group mean exposure levels rather than individual levels since group-based exposure levels often (but not always!) vary less within job titles than within individuals. Exposure groups may be based on occupational categories, job title, work area, etc. Intragroup and intergroup variances and the pooled standard error of the mean can be calculated to evaluate the relative efficiency for various grouping procedures. Provided that reasonably homogeneous exposure groups can be defined with sufficient contrast between them, these same groups can be used to predict exposure levels of subjects for whom no exposure measurements are available, making this a very attractive option when limited resources are available to assess exposure. A similar approach may be employed for environmental exposures, but defining exposure groups with sufficient contrast is often not feasible because exposures often vary little within a given population. Ecological analyses may in those circumstances be more efficient.
Exposure Modeling
If the main factors that explain the variation in personal exposure are known, then mathematical models can be developed to predict individual exposure levels for those subjects where no or only limited exposure measurements are available (provided that valid information on determinants of exposure is available). Multiple regression models are most commonly employed, and can include variables such as tasks performed, type of production, environmental or climate characteristics, use of personal protective equipment, personal behavior, time spent in exposed areas, etc. Although these models can be very useful, they have limitations. In particular, the prediction model is generalizable only for the particular situation in which the data were collected. Extrapolation to other environments with the same exposure, or to the same environment at a different time point, may not be valid, and collection of new exposure data to update and/or validate the old model may be necessary (Boleij et al., 1995). Although exposure models to predict individual exposures have been used in environmental epidemiology, their use is more widespread (and perhaps more successful) in occupational epidemiology. Some examples of empirical exposure modeling include models to assess cadmium levels in blood in the general population, inhalation exposure to hydrocarbons among commercial painters, exposure to inhalable dust in bakery workers, and chemical and mutagenic exposure in the rubber industry. These types of exposure models have been shown to explain 50–80% of the variability in exposure, but models with poorer performance have also been described. For example, Van Strien et al. (1994) assessed the association between home characteristics and house dust mite allergen levels in mattress dust using multiple regression analyses, and this model explained ‘only’ 26% of the variance.
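As a simplified illustration of an empirical exposure model, the sketch below fits an ordinary least-squares line with a single determinant and reports the proportion of variance explained (R²). Real exposure models typically use multiple regression with several determinants; the single predictor, the variable names, and the data here are all hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns R²."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2
                   for xi, yi in zip(x, y))
    return intercept, slope, 1 - ss_resid / ss_total

# Hypothetical data: hours per shift in the exposed area vs. measured exposure
hours_in_area = [1, 2, 3, 4, 5]
measured_exposure = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, r_squared = fit_line(hours_in_area, measured_exposure)

# Predict exposure for a subject with no measurement but known determinant
predicted = intercept + slope * 3.5
print(f"R² = {r_squared:.2f}, predicted exposure = {predicted:.2f}")
```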
Although presented here as separate strategies, exposure assessment in environmental and occupational epidemiology studies often involves a combination of approaches, for example a combination of subjective and objective measurements, or a combination of current personal sampling and historical exposure data collected for monitoring purposes.
Systematic error, or bias, occurs if there is a difference between what the study is actually estimating and what it is intended to estimate. Systematic error is thus distinguished from random error in that it would be present even with an infinitely large study, whereas random error can be reduced by increasing the study size.
There are many different types of bias, but three general forms have been distinguished (Rothman and Greenland, 1998): Confounding, selection bias, and information bias. In general terms, these refer to biases inherent in the source population because of differences in disease risk between the groups being compared (confounding), biases resulting from the manner in which study subjects are selected from the source population (selection bias), and biases resulting from the misclassification of these study subjects with respect to exposure or disease (information bias).
Confounding occurs when the exposed and nonexposed groups (in the source population) are not comparable due to inherent differences in background disease risk, usually due to exposure to other risk factors. Similar problems can occur in randomized trials in that randomization is not always successful and the groups to be compared may have different characteristics (and different baseline disease risk) at the time that they enter the study. However, there is more concern about noncomparability in epidemiological studies because of the absence of randomization.
Confounding can be controlled in the study design, or in the analysis, or both. Control in the analysis involves stratifying the data into subgroups according to the levels of the confounder(s) and calculating a summary effect estimate that summarizes the information across strata. For example, in a study of environmental tobacco smoke (ETS) exposure and lung cancer, we might compare the risk of lung cancer in people exposed and people not exposed to ETS. We might make this comparison within five different age groups and in men and women, yielding ten (5 × 2) different comparisons; for each stratum, we would calculate the relative risk of lung cancer in those exposed to ETS, compared with those not exposed, and we would then average these relative risks across the strata, giving more weight to strata with larger numbers of people (and lung cancer cases).
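One common way of forming such a weighted summary is the Mantel–Haenszel risk ratio. The sketch below uses it as an illustrative choice, since the text does not name a particular estimator, and applies it to two hypothetical strata (rather than ten) with invented counts.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary risk ratio.

    strata: list of tuples
    (exposed_cases, exposed_total, unexposed_cases, unexposed_total).
    """
    numerator = 0.0
    denominator = 0.0
    for a, n1, b, n0 in strata:
        total = n1 + n0
        numerator += a * n0 / total
        denominator += b * n1 / total
    return numerator / denominator


def crude_rr(strata):
    """Risk ratio ignoring the stratification variable."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    b = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (b / n0)


# Hypothetical counts; the relative risk is 2.0 within each stratum,
# but exposure is more common in the higher-risk (older) stratum.
strata = [
    (20, 1000, 10, 1000),   # younger: risk 0.02 vs 0.01
    (120, 1000, 30, 500),   # older:   risk 0.12 vs 0.06
]

print(crude_rr(strata))            # distorted by the age imbalance
print(mantel_haenszel_rr(strata))  # summary estimate adjusted for age
```

In this invented example the crude relative risk exceeds the stratum-specific value because age confounds the comparison, whereas the stratified summary recovers it.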
Whereas confounding generally involves biases inherent in the source population, selection bias involves biases arising from the procedures by which the study subjects are chosen from the source population. Thus, selection bias is not usually an issue in a cohort study involving an internal reference population and with complete followup, since this incorporates all of the available information from the source population. Selection bias is of more concern in case–control studies since these involve sampling from the source population. In particular, selection bias can occur in a case–control study if controls are chosen in a nonrepresentative manner, e.g., if exposed people were more likely to be selected as controls than nonexposed people.
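The effect of nonrepresentative control selection can be illustrated with a small calculation. In the sketch below all numbers are hypothetical: 20% of the noncases in the source population are exposed, and in the biased scenario exposed people are assumed to be twice as likely as nonexposed people to be selected as controls.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds ratio from a case-control 2 x 2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def expected_controls(n_controls, exposure_prevalence, selection_ratio=1.0):
    """Expected numbers of exposed and unexposed controls.

    selection_ratio is the probability of selecting an exposed person relative
    to a nonexposed person (1.0 means representative sampling).
    """
    weight_exposed = exposure_prevalence * selection_ratio
    weight_unexposed = 1.0 - exposure_prevalence
    share_exposed = weight_exposed / (weight_exposed + weight_unexposed)
    return n_controls * share_exposed, n_controls * (1.0 - share_exposed)


# Hypothetical case series: 40 exposed and 60 unexposed cases.
exposed_cases, unexposed_cases = 40, 60

representative = expected_controls(100, 0.2)
biased = expected_controls(100, 0.2, selection_ratio=2.0)

or_representative = odds_ratio(exposed_cases, unexposed_cases, *representative)
or_biased = odds_ratio(exposed_cases, unexposed_cases, *biased)
print(or_representative, or_biased)
```

Under these assumptions, oversampling exposed controls by a factor of two divides the odds ratio by two, so a real association would be understated.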
Information bias involves misclassification of the study subjects with respect to disease or exposure status. Thus, the concept of information bias refers to those people actually included in the study (whereas selection bias refers to the selection of the study subjects from the source population, and confounding generally refers to noncomparability within the source population).
Nondifferential information bias occurs when the likelihood of misclassification of exposure is the same for diseased and nondiseased persons (or when the likelihood of misclassification of disease is the same for exposed and nonexposed persons). Nondifferential misclassification of exposure generally biases the effect estimate toward the null value. Thus, it tends to produce false-negative findings and is of particular concern in studies that find no association between exposure and disease (although it should be emphasized that nondifferential misclassification of a confounder can lead to bias away from the null if the confounder produces confounding towards the null). Differential information bias occurs when the likelihood of misclassification of exposure is different in diseased and nondiseased persons (or the likelihood of misclassification of disease is different in exposed and nonexposed persons). This can bias the observed effect estimate in either direction, either toward or away from the null value. For example, in a lung cancer case–control study, the recall of exposures (such as pesticide exposure) in healthy controls might be different from that of cases with lung cancer. In this situation, differential information bias would occur, and it could bias the odds ratio toward or away from the null, depending on whether cases were more or less likely to recall previous exposures than controls.
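The direction of these biases can be illustrated with expected counts. The sketch below takes a hypothetical true case–control table and applies assumed sensitivities and specificities of exposure classification; none of the values comes from a real study.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds ratio from a case-control 2 x 2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts after exposure misclassification."""
    observed_exposed = exposed * sensitivity + unexposed * (1.0 - specificity)
    observed_unexposed = exposed * (1.0 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed


# Hypothetical true table: cases 40 exposed / 60 unexposed, controls 20 / 80.
cases = (40, 60)
controls = (20, 80)
true_or = odds_ratio(*cases, *controls)

# Nondifferential: the same sensitivity (0.7) and specificity (1.0) in both groups.
nondifferential_or = odds_ratio(*misclassify(*cases, 0.7, 1.0),
                                *misclassify(*controls, 0.7, 1.0))

# Differential: cases recall exposure more completely than controls.
cases_recall_more_or = odds_ratio(*misclassify(*cases, 0.95, 1.0),
                                  *misclassify(*controls, 0.7, 1.0))

# Differential: cases recall exposure less completely than controls.
cases_recall_less_or = odds_ratio(*misclassify(*cases, 0.5, 1.0),
                                  *misclassify(*controls, 0.9, 1.0))

print(true_or, nondifferential_or, cases_recall_more_or, cases_recall_less_or)
```

With these assumed values the nondifferential scenario moves the odds ratio toward the null, whereas the two differential scenarios move it in opposite directions.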
As a general principle, it is important to ensure that any misclassification is nondifferential, by collecting exposure information in an identical manner in diseased and nondiseased persons (and by collecting disease information in an identical manner in the exposed and nonexposed groups). In this situation, the bias is in a known direction (toward the null), and although there may be concern that negative findings may be due to nondifferential information bias, at least one can be confident that any positive findings are not due to information bias.
Interpretation of Environmental and Occupational Epidemiology Studies
The first task in interpreting the findings of an epidemiological study is to assess the likelihood that the study findings represent a real association, or whether they may be due to various biases (confounding, selection bias, information bias) or chance. If it is concluded that the observed associations are likely to be real, then attention shifts to more general causal inference, which should be based on all available information, rather than on the findings of a single study. A systematic approach to causal inference was elaborated by Bradford Hill (1965) and has since been widely used and adapted.
The temporal relationship is crucial; the cause must precede the effect. This is usually self-evident, but difficulties may arise in studies (usually case–control or cross-sectional studies) when measurements of exposure and effect are made at the same time (e.g., by questionnaire, blood tests, etc.).
An association is plausible if it is consistent with other knowledge. For instance, laboratory experiments may have shown that a particular environmental exposure can cause cancer in laboratory animals, and this would make more plausible the hypothesis that this exposure could cause cancer in humans. However, biological plausibility is a relative concept; many epidemiological associations were considered implausible when they were first discovered but were subsequently confirmed in experimental studies. Lack of plausibility may simply reflect current lack of medical knowledge.
Consistency is demonstrated by several studies giving the same result. This is particularly important when a variety of designs are used in different settings, since the likelihood that all studies are making the same mistake is thereby minimized. However, a lack of consistency does not exclude a causal association, because different exposure levels and other conditions may reduce the impact of exposure in certain studies.
The strength of association is important in that a strongly elevated relative risk is more likely to be causal than a weak association, which could be influenced by confounding or other biases. However, the fact that an association is weak does not preclude it from being causal; rather it means that it is more difficult to exclude alternative explanations.
A dose–response relationship occurs when changes in the level of exposure are associated with changes in the prevalence or incidence of the effect. The demonstration of a clear dose–response relationship provides strong evidence for a causal relationship, since it is unlikely that a consistent dose–response relationship would be produced by confounding.
Reversibility is also relevant in that when the removal of a possible cause results in a reduced disease risk, the likelihood of the association being causal is strengthened.
Of these criteria for causal inference, only temporality is a necessary criterion for establishing causality, in that if the cause does not precede the effect then the association cannot be causal. Furthermore, none of these criteria, either individually or collectively, is sufficient to establish causality with certainty, but causality may be assumed to have been established beyond reasonable doubt if these criteria are substantially met.
- Bertazzi PA, Consonni D, Bachetti S, et al. (2001) Health effects of dioxin exposure: A 20-year mortality study. American Journal of Epidemiology 153: 1031–1044.
- Boleij JSM, Buringh E, Heederik D, et al. (1995) Occupational Hygiene of Chemical and Biological Agents. Amsterdam, the Netherlands: Elsevier.
- Checkoway H, Pearce N, and Kriebel D (2004) Research Methods in Occupational Epidemiology. New York: Oxford University Press.
- Driscoll T, Mannetje A, Dryson E, et al. (2004) The Burden of Occupational Disease and Injury in New Zealand: Technical Report. Wellington, New Zealand: NOHSAC.
- Ehrlich R, Robins T, Jordaan E, et al. (1998) Lead absorption and renal dysfunction in a South African battery factory. Occupational and Environmental Medicine 55: 453–460.
- Gaertner RRW, Trpeski L, and Johnson KC (2004) A case–control study of occupational risk factors for bladder cancer in Canada. Cancer Causes & Control 15: 1007–1019.
- Garcia AM, Fletcher T, Benavides FG, et al. (1999) Parental agricultural work and selected congenital malformations. American Journal of Epidemiology 149: 64–74.
- Hill AB (1965) The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine 58: 295–300.
- Pearce N and Woodward A (2004) Environmental epidemiology. In: Cameron S, Cromar N, and Fallowfield H (eds.) Environmental Health in Australia and New Zealand, pp. 3–19. Sydney, Australia: Oxford University Press.
- Rothman KJ and Greenland S (1998) Modern Epidemiology. Philadelphia, PA: Lippincott-Raven.
- Schulte P and Perera F (1993) Molecular Epidemiology: Principles and Practices. New York: Academic Press.
- Steenland K and Savitz DA (eds.) (1997) Topics in Environmental Epidemiology. New York: Oxford University Press.
- Teschke K, Olshan AF, Daniels JL, et al. (2002) Occupational exposure assessment in case–control studies: Opportunities for improvement. Occupational and Environmental Medicine 59: 575–593.
- Van Strien RT, Verhoeff AP, Brunekreef B, et al. (1994) Mite antigen in house dust: Relationship with different housing characteristics in the Netherlands. Clinical and Experimental Allergy 24: 843–853.
- Woodward A, Guest C, Steer K, et al. (1995) Tropospheric ozone: Respiratory effects and Australian air quality goals. Journal of Epidemiology and Community Health 49: 401–407.
- World Health Organisation (2006) Preventing Disease Through Healthy Environments: Towards an Estimate of the Environmental Burden of Disease. Geneva, Switzerland: WHO.