Integrating the Demographic and Health Surveys, IPUMS-I, and TerraPopulus to Explore Mortality and Health Outcomes at the District Level in Ghana, Malawi, and Tanzania

In this paper, we first show how the Demographic and Health Surveys (DHS) can be integrated with other data sources to expand the types of variables available for analysis of population and health outcomes. Second, we demonstrate one particular example of such integration by modelling the social, physical, and built environment determinants of health outcomes at the district level in Ghana, Malawi, and Tanzania. To do so, we created district-level measures of a number of variables from the DHS, and then merged them with district-level data from IPUMS-I, an environmental data set called TerraPopulus, and other sources. We find that it is feasible to combine the DHS with other data sources, and that many health and environment indicators are heterogeneous within countries, justifying further analysis at low levels of geography and suggesting benefits to using such techniques to design fine-grained programmatic interventions.


Introduction
The Demographic and Health Surveys (DHS) are the preeminent data source for studying a broad range of health outcomes across sub-Saharan Africa. One way to enhance this rich data source is to combine it with other datasets in order to conduct spatially-based analyses, or to provide contextual variables for individual-level analyses. In this paper, we first demonstrate the feasibility of such integration by merging DHS data from Ghana, Malawi, and Tanzania, aggregated at the district level, with data from a variety of sources. The resulting data set captures mortality and health outcomes (e.g., child mortality, infant mortality, and the proportion of children born underweight) and key indicators of the social, physical, and built environment, such as breastfeeding duration, rainfall, and access to water/sanitation. We then use these integrated data to carry out an exploratory spatial analysis of the relationship between the social, physical, and built environment and health and mortality outcomes.
We focus on sub-Saharan Africa, the region featuring the highest disease burden in the world, much of it due to HIV/AIDS, maternal causes, and diseases afflicting infants and children (Murray et al., 2012). There are, however, outliers: healthy and unhealthy localities that can offer important insights for researchers and policy makers. Our focus on district-level data applies a distinguished insight from the fertility literature to the study of population health: understanding differences within, not just between, countries will increase knowledge about the processes that produce good or poor health as well as the influences of local contexts on these processes.
Through our exploratory spatial analysis using DHS data integrated with other sources, we find that many health and environment indicators are heterogeneous within countries. Such heterogeneity justifies analysis at smaller geographic levels than are normally examined with DHS data and suggests that merged data such as ours may be used to design more effective and efficient interventions. In particular, knowing exactly how the social, physical, and built environments matter to health outcomes in specific localities can greatly aid health program design, given that social and built environmental factors may be changed, but physical factors can only be managed.
We summarize literature below as it relates to our application of merged DHS data to understand health and mortality outcomes at the district level. We then describe how we merged the DHS with other data sources, present the results of our exploratory spatial analysis, and finish with conclusions about integrating data sources and conducting spatial analyses.

Literature Review and Theoretical Framework
A wealth of cross-national research on health inequalities focuses on nation states, asking why some have better population health than others (Lopez et al., 2006; Murray and Lopez, 1999; Firebaugh and Goesling, 2004; Caldwell, 1986; Kuhn, 2010; Edwards, 2011). These studies find that with the exception of sub-Saharan Africa (Firebaugh and Goesling, 2004; Wilson, 2001), health inequality is declining on a global scale, leading to demographic convergence.
In sub-Saharan Africa, life expectancy declined in the 1990s (Timaeus and Jasseh, 2004). Coupled with the continent's faster-than-average rate of population growth, this arrested an otherwise positive global trend toward lower levels of health inequality. Despite these troubling trends, Kuhn (2010) recently showed that four of 11 countries with the best infant mortality trajectories over the past 25 years were in sub-Saharan Africa. So while the world's poor health is concentrated in the subcontinent, healthy pockets exist, warranting a thorough investigation of mortality and broader health outcomes.
Scholars increasingly recognize that within-country variation in health accounts for most of world health inequality, substantially more than the oft-discussed national differences (Pradhan et al., 2003). The literature on health inequality has followed the scholarship on income inequality, but with some very different conclusions: two-thirds of global health inequality exists within countries, while only one-quarter of global income inequality exists within countries (Firebaugh, 2000; Milanovic, 1999). Notably, Pradhan and colleagues (2003) report that four of the six countries with the greatest intracountry health inequality are in sub-Saharan Africa (Chad, Zimbabwe, Nigeria, and Mali), but they neither discuss the spatial dimensions of within-country variation nor specify the contexts that produce it.
Scholarship on health outcomes within countries relies heavily on individual-level data and on situating individuals within their particular contexts via multilevel models. The exploratory spatial analysis we present is distinct in conceptualizing health primarily as an attribute of place, one that is not necessarily mediated by individual behaviour. Other scholars have taken a similar approach with promising results. Examining the influence of context on HIV outcomes, Feldacker et al. (2011) found that characteristics of place (specifically income inequality and distance from a health clinic within census enumeration areas) substantially and significantly affected individuals' likelihood of being HIV-positive. Notably, vulnerability to HIV infection was independent of individual indicators of risk behaviour such as multiple partnerships and condom use, especially for women. Similarly, scholars have noted that features of the physical environment like temperature (Patz et al., 2005) and rainfall (Maccini and Yang, 2009) are important determinants of health outcomes. These and similar micro-level studies represent an important shift in strategies to conceptualize and analyse population health: not exclusively as a characteristic of individuals but also of places.
Our emphasis on place as a determinant of health reflects that of previous research on fertility. In particular, it hearkens back to insights generated by the European Fertility Project, which demonstrated how the diffusion of knowledge, ideas, and values within countries (i.e., rather than across national boundaries), facilitated fertility declines in Europe during the 19th century (Coale and Watkins, 1986). Further place-based insights from fertility research have come from sub-national studies in India (Guilmoto and Rajan, 2001), Brazil, and Mexico (Potter et al., 2010). We expect that a spatial perspective can yield similar insights into patterns of health and mortality.

Data and Methods
We merged existing data from the DHS, the Integrated Public Use Microdata Series International (IPUMS-I), and a new data resource, Terra Populus (TerraPop). We then created and integrated new measures to proxy the presence of nongovernmental organizations (NGOs) and media coverage of health issues. In cases where the same variable existed in multiple data sets, we prioritized diversity in sources in order to demonstrate the potential for data integration. We focus on three countries (Ghana, Malawi, and Tanzania) as illustrative cases; our analyses are not comparative, but designed to show that the integration techniques are applicable across locales and that the conclusions of the exploratory spatial analysis hold regardless of context.
We selected these countries as representative of the three major regions of the subcontinent (West, Southern, and East, respectively), capturing some of the diversity in social, physical, and built environments that characterizes sub-Saharan Africa. The integration of data sources allows for analysis at the district level, arguably the ideal level for studying health outcomes because districts are small enough to reflect fine-grained heterogeneity in health, and are also the administrative level at which most health programming is targeted. The DHS has provided data from nationally representative samples on population and health in over 90 countries for over 20 years. IPUMS-I, with a goal to "inventory, preserve, harmonize, and disseminate census microdata from around the world," provides a tremendous amount of basic demographic information (Minnesota Population Center, 2013a). TerraPop is a newly developing global population and environmental data resource that combines population data with environmental data such as land use, elevation, and temperature (Minnesota Population Center, 2013b).
In both IPUMS-I and TerraPop, district-level measures can be easily generated by aggregating microdata, as all respondents are linked to the districts in which they reside. With DHS data, however, districts have to be generated by combining available geospatial data with district maps from Global Administrative Areas (GADM). Each DHS cluster is associated with latitude and longitude coordinates, which can be merged with district boundaries as defined by the GADM database. Using ArcGIS, we matched the cluster codes with districts, which then allowed us to produce reliable district-level aggregate measures from DHS microdata (Burgert et al., 2012).
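The cluster-to-district matching described above is a point-in-polygon assignment. The sketch below illustrates the idea in plain Python with a ray-casting test; it is not the ArcGIS workflow used in the paper. The district squares, cluster codes, and coordinates are hypothetical, and real GADM boundaries would need handling for multi-part polygons and holes, which this sketch omits.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray heading east crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_clusters(
    clusters: Dict[int, Point], districts: Dict[str, List[Point]]
) -> Dict[int, Optional[str]]:
    """Map each cluster code to the first district containing it, or None."""
    assigned: Dict[int, Optional[str]] = {}
    for cluster_id, location in clusters.items():
        assigned[cluster_id] = next(
            (name for name, boundary in districts.items()
             if point_in_polygon(location, boundary)),
            None,
        )
    return assigned


# Hypothetical boundaries (simple squares) and cluster coordinates.
districts = {
    "District A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "District B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
clusters = {101: (0.25, 0.5), 102: (1.5, 0.25), 103: (3.0, 3.0)}
cluster_to_district = assign_clusters(clusters, districts)
```

Clusters that fall outside every boundary map to `None`, which makes unmatched clusters easy to flag for manual review.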
In the interest of harmonizing comparable measures across all data sources, we restricted the time frame for this analysis to 1998-2002. Publicly available data from DHS (including geospatial data), IPUMS-I, and TerraPop exist for this period for all three countries. The end result is data for a total of 110 districts in Ghana, 28 in Malawi, and 134 in Tanzania.

Health Indicators
We focus on three key infant and child health outcomes (infant mortality, child mortality, and low birth weight) using DHS data. We aggregated mortality measures by district as proportions of the relevant age group. The proportion of infant deaths for each district is thus the number of those under 12 months who died in the year preceding the survey divided by the total number of those born in the previous 12 months, taken from the birth histories. The proportion of child deaths for each district is similarly the number of those between zero and five years old who died divided by the total number in that same age category. Finally, the proportion low birth weight is the number of children in a district who were born below 2.5 kilograms, the World Health Organization cut-off for low birth weight, divided by the total number of children born.
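The district proportions defined above can be sketched as a simple aggregation over birth records. This is a minimal illustration, assuming each record has already been restricted to births in the previous 12 months and linked to a district; the field names (`district`, `died_before_12_months`, `birth_weight_kg`) and the sample records are hypothetical, not DHS variable names.

```python
from collections import defaultdict
from typing import Dict, Iterable

LOW_BIRTH_WEIGHT_KG = 2.5  # WHO cut-off cited in the text


def district_proportions(births: Iterable[dict]) -> Dict[str, Dict[str, float]]:
    """Collapse birth records into district-level proportions."""
    totals = defaultdict(lambda: {"births": 0, "infant_deaths": 0, "low_weight": 0})
    for record in births:
        tally = totals[record["district"]]
        tally["births"] += 1
        tally["infant_deaths"] += int(record["died_before_12_months"])
        tally["low_weight"] += int(record["birth_weight_kg"] < LOW_BIRTH_WEIGHT_KG)
    return {
        district: {
            "infant_mortality": tally["infant_deaths"] / tally["births"],
            "low_birth_weight": tally["low_weight"] / tally["births"],
        }
        for district, tally in totals.items()
    }


# Hypothetical records for two districts.
births = [
    {"district": "District A", "died_before_12_months": True, "birth_weight_kg": 2.25},
    {"district": "District A", "died_before_12_months": False, "birth_weight_kg": 3.0},
    {"district": "District A", "died_before_12_months": False, "birth_weight_kg": 3.5},
    {"district": "District A", "died_before_12_months": False, "birth_weight_kg": 2.75},
    {"district": "District B", "died_before_12_months": False, "birth_weight_kg": 2.0},
    {"district": "District B", "died_before_12_months": False, "birth_weight_kg": 3.25},
]
proportions = district_proportions(births)
```

With few births per district, a single death moves the proportion substantially, which is one reason small districts can show values of exactly zero.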
Across the districts examined, approximately eight per cent of infants died during their first year. But 20% of districts experienced no infant deaths in the previous year, while in one district close to a third of children born in the previous year died. In terms of child mortality, a similar pattern emerges: 14% of districts experienced no child deaths, and the maximum was a third of children. The low birth weight variable has the highest percentage of zeroes (close to half), with a mean of three per cent and a maximum of slightly more than a fifth of babies.
The literature indicates that the factors relevant to infant and child mortality include fertility behaviour, infant feeding/nutrition, access to health services, environmental health conditions, and socioeconomic status (Rutstein, 2000; Mosley and Chen, 1984). The main predictors of low birth weight are malnutrition, smoking, and in the case of sub-Saharan Africa, malaria (Guyatt and Snow, 2004; Kramer, 1987). Because low birth weight is a major predictor of infant mortality (McCormick, 1985; Mosley and Chen, 1984) and because our independent variables include characteristics of the physical environment that closely correlate with malaria, we examine the same set of predictors for low birth weight as we do for infant and child mortality. Focusing on the characteristics of place, rather than individuals, we categorize the independent variables as pertaining to social, physical, or built environments.

Social Environment
To capture the social environment, we use five measures. First, because of the relevance of contraception to birth spacing, overall fertility, and thus infant and child mortality (Cleland et al., 2012), we leverage DHS data to calculate the proportion of women in a district who have heard of family planning on the radio. On average, 50% of women in each district had heard a family planning message on the radio in the months preceding the survey. Second, because of the key role that breastfeeding plays in infant nutrition and mortality (Rutstein, 2000), we operationalized breastfeeding norms by calculating the average months of breastfeeding for the most recent, weaned child of DHS women, which was slightly more than 16 months on average. Third, as a proxy for socioeconomic status (Rutstein, 2000), we obtained from IPUMS-I the proportion of district residents who are literate. Literacy is self-reported: a person is literate if he or she can read and write in any language. On average, 58% of district residents are literate.
Our fourth measure of the social environment reflects the potential of the media to raise awareness about health issues (Krieger et al., 2013), particularly before social programs fully address them (Hilliard et al., 2007). The media coverage of health issues is a measure of how frequently the written press mentions a particular district in conjunction with health outcomes. This variable is based on newspaper articles catalogued at AllAfrica.com, which contains articles from publications throughout Africa. Using the site's premium search engine, we identified all articles published between October 17, 1996 (the earliest date available) and December 31, 2001 (representing the end of our study period) that were tagged as relating to both (1) health and (2) the country in question (Ghana, Malawi, or Tanzania). The "health" category in AllAfrica.com covers a broad array of topics, ranging from HIV/AIDS to tuberculosis to maternal mortality. Then, using Provalis Research's WordStat program, we searched the body of each article for mentions of district and major city names. We coded this variable into four categories: zero mentions for a district; 1 mention; 2-5 mentions; and 6 or more mentions. On average, districts referenced in articles about health were mentioned 2.7 times, and slightly more than half of districts in our sample were not mentioned at all.
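The mention counting and four-category coding can be sketched as follows. This is an illustration only: it uses Python regular expressions rather than WordStat, it assumes every occurrence of a name counts as one mention (the text does not say whether mentions were counted per occurrence or per article), and the place names and article snippets are hypothetical.

```python
import re
from typing import Dict, List


def count_mentions(articles: List[str], place_names: Dict[str, List[str]]) -> Dict[str, int]:
    """Count whole-word, case-insensitive occurrences of each district's names."""
    counts = {}
    for district, names in place_names.items():
        # Longer names first so "Gamma City" is matched as one mention, not split.
        ordered = sorted(names, key=len, reverse=True)
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b",
            re.IGNORECASE,
        )
        counts[district] = sum(len(pattern.findall(text)) for text in articles)
    return counts


def mention_category(n: int) -> str:
    """Code a raw count into the four categories described in the text."""
    if n == 0:
        return "0"
    if n == 1:
        return "1"
    if n <= 5:
        return "2-5"
    return "6+"


# Hypothetical article bodies and district/major-city names.
articles = [
    "Clinics in Alpha reported new malaria cases; Alpha officials responded.",
    "A vaccination drive reached Beta this week.",
    "Health workers met in the capital.",
]
place_names = {
    "Alpha District": ["Alpha"],
    "Beta District": ["Beta"],
    "Gamma District": ["Gamma", "Gamma City"],
}
counts = count_mentions(articles, place_names)
categories = {district: mention_category(n) for district, n in counts.items()}
```

Whole-word matching avoids counting a district name that merely appears inside a longer word, but it cannot separate a district from a person or organization with the same name; that would need manual checking.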
Our final measure of the social environment, NGO presence by district, represents an element of access to care. Building on the insight that countries with family planning NGOs have better health outcomes (Robinson, 2011) and that NGOs have been associated with positive health outcomes overall (Leonard, 2002; DeJong, 1991) [...]

Physical Environment
[...] temperature (Patz et al., 2005) and rainfall (Maccini and Yang, 2009) are important determinants of health outcomes, particularly as they relate to malaria and low birth weight (Guyatt and Snow, 2004). To measure the physical environment, we leveraged data from TerraPop. We use three measures: average temperature, average rainfall, and average elevation. Temperature and average rainfall are based on records from 1950-2000.
Month-by-month temperatures for each grid cell (12 temperature values per grid cell) capture seasonality, and the monthly values of each cell are then averaged to obtain the average annual temperature for the cell. The cells' annual average temperatures were then averaged within each district. The mean across districts is 24.3 °C. The rainfall measure represents the average total annual precipitation across the district in millimetres, and averages close to 1,200 millimetres (120 centimetres). The average elevation for the districts is 581 metres.
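The two-stage temperature averaging (months within a cell, then cells within a district) can be sketched as below. The grid-cell values and district names are hypothetical, and the sketch assumes an unweighted mean across cells; the text does not say whether cells were weighted by the share of their area inside the district.

```python
from statistics import mean
from typing import Dict, List


def district_mean_temperature(cells_by_district: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Average 12 monthly values per grid cell, then average the cell means per district."""
    result = {}
    for district, cells in cells_by_district.items():
        for monthly in cells:
            if len(monthly) != 12:
                raise ValueError(f"{district}: expected 12 monthly values, got {len(monthly)}")
        # Stage 1: annual mean for each cell. Stage 2: mean of those across the district.
        result[district] = mean(mean(monthly) for monthly in cells)
    return result


# Hypothetical monthly temperatures (degrees Celsius) for grid cells in two districts.
cells_by_district = {
    "District A": [[24.0] * 12, [26.0] * 12],
    "District B": [[20.0] * 6 + [22.0] * 6],
}
district_temperature = district_mean_temperature(cells_by_district)
```

Because every cell contributes exactly 12 values, the two-stage mean equals the mean of all monthly values pooled across the district's cells; the staged form simply mirrors the description in the text.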

Built Environment
The ability to safely dispose of human waste and to keep living spaces free of contaminants are major determinants of infant and child mortality (Rutstein, 2000; Ikamari, 2013; Kibet, 2010). We use two measures to index related household and community infrastructure: the presence of a toilet in the home from IPUMS-I and the average number of minutes to a potable water source from DHS. The presence of a toilet in the home is a binary variable at the individual level that we indexed as a proportion of households in the district with a toilet. On average, approximately two thirds of district residents have a toilet in their home. Number of minutes to a potable water source is averaged over all DHS respondents in a district, and is 34 minutes on average (but ranging from five to 300 minutes).
We begin with descriptive spatial analysis based on maps of the three countries. We then present a bivariate analysis at the district level for Ghana, the country with the most complete data. Table 1 extends the comparisons made between the maps to show bivariate correlations (Pearson's r) between all independent variables and the three measures of infant and child mortality and health for Ghana.
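For readers unfamiliar with the statistic, Pearson's r between two district-level variables can be computed as in the sketch below. The literacy and infant mortality values are hypothetical and chosen to be perfectly linear, so they say nothing about the actual Ghana results in Table 1.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable has no variation")
    return covariation / (spread_x * spread_y)


# Hypothetical district-level values: one entry per district.
literacy = [0.2, 0.4, 0.6, 0.8]
infant_mortality = [0.12, 0.09, 0.06, 0.03]
r = pearson_r(literacy, infant_mortality)  # close to -1 for this perfectly linear example
```

The explicit check for zero variation matters here because a variable that takes the same value in every district has no defined correlation with any outcome.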

Discussion
Panels 1 and 2 of Figure 1 [...] More important than the inverse relationships between variables is the extensive heterogeneity observed across all three pairs of maps. In Ghana there is striking variation in both literacy and infant mortality. The same can be said of elevation and child mortality in Malawi and of sanitary infrastructure and low birth weight in Tanzania. Such heterogeneity in environment suggests that the same type of health program is unlikely to be equally effective across districts. It also shows that there are "positive deviants" to be explored further: districts with disadvantageous social, physical, or built environments that nonetheless perform well on health outcomes.
Comparing the relative strength of relationships between the three types of environment in Ghana and the three health outcomes (Table 1) shows that each type of environment matters, but the social and physical appear to matter more than the built. Specifically, literacy has a significant relationship with infant mortality, while both literacy and family planning awareness have significant relationships with child mortality, all in the expected directions. The relationship between media mentions and low birth weight is positive, suggesting that rather than helping to convey information, the media may report on worse-off districts precisely because of the (negative) health stories.
There are significant relationships between at least one of the physical environment variables and each of the three dependent variables. Elevation is the only physical environment variable that has a significant relationship with all three dependent variables: it is negatively associated with infant and child mortality, but positively associated with low birth weight. This relationship, like the marginally significant negative relationship between rainfall and child mortality, is somewhat contrary to expectations, and may be the result of other peculiarities of the districts. If so, this suggests the potential of additional district-level measures to capture factors that could influence child health and mortality.
Finally, the only built environment variable with a significant relationship to the health outcomes we examine here is sanitation, which has the expected negative relationship with child mortality.

Conclusions
Our analysis makes two main contributions. First, we demonstrate the feasibility of combining DHS data with a variety of other sources, ranging from those that are readily available (IPUMS-I, TerraPop) to some requiring much greater prior manipulation (e.g., our media and NGO variables). Second, our exploratory spatial analysis demonstrates high levels of within-country heterogeneity in mortality and health outcomes, as well as high variability in the social, physical, and built environmental characteristics that drive those outcomes. The analysis indicates specifically that knowledge of family planning, literacy, and media mentions are the most important features of the social environment (that we were able to measure) that matter to infant and child health outcomes in Ghana.
The relationships between the physical environment and infant and child health outcomes in Ghana are largely as expected, and as strong as the social environment relationships. Variables representing the built environment in Ghana have the weakest relationship with health outcomes. Programming to improve infant and child health outcomes in Ghana should thus take into account both those factors that can be changed, even if with difficulty, such as the social and built environments, and those that can only be managed, such as the physical environment.
These analyses illustrate the potential utility of combining the abundance of variables in the DHS with yet other data sources in order to account for factors that may be critical for health outcomes but fall outside the scope of the DHS itself. Such an approach significantly expands the number of variables whose relationships with health outcomes can be tested. Where relevant, spatial approaches may be especially well-suited for aiding in the development of programs tailored specifically to these localities, thus leading to greater efficiencies. For example, the most innovative, and thus potentially effective, interventions are simply too expensive to implement across entire countries (e.g., antiretrovirals for HIV prevention). But, if the neediest districts could be identified, funding could be more accurately targeted.
Of course, our analyses are basic and primarily illustrative. In particular, because of small sample size, we were unable to explore as wide an array of variables as would be necessary to design an effective intervention. Small sample size also prevented multivariate analysis. In addition, even though the DHS is nationally representative, not all districts have sufficient observations to generate reliable estimates, and the labour associated with creating variables like those measuring media coverage and NGO presence can become prohibitive. Nonetheless, these challenges hint at exciting areas for future research. In particular, because of the spatial and temporal reach of the DHS, our analysis could be expanded to include more countries as well as a longitudinal perspective, in addition to many more variables from the DHS and other sources. More advanced spatial analysis techniques, such as change detection, could be applied to the data, as could sophisticated multivariate modelling techniques. Many of the current challenges of using the DHS as in our example will be eased substantially by the Integrated Demographic and Health Series, a project to harmonize and integrate DHS surveys across countries and years. Future research seeking to conduct comparative or longitudinal place-based analyses should look to this valuable resource.