Analysis of demographic and health survey to measure poverty of household in Rwanda

The use of the asset index in poverty targeting is a modern technique. We used principal component analysis (PCA) to create the asset index, which was then used to assess the socio-economic status (SES) of households. The reliability of the index was tested in two ways: first, by ascertaining whether the index was internally coherent; second, by testing its robustness using sub-indices such as housing infrastructure and ownership. The methodology is applied and demonstrated using household survey data from Rwanda. The analysis showed that the age, education level and gender of the household head, the place of residence, the province, and the size of the household (number of household members) were significant predictors of household poverty in Rwanda.


Introduction
A measure of the socio-economic status of households is an important element in most economic and demographic analyses. This measure is useful not only in estimating poverty and inequality within a society but also as a control variable in assessing the effect of other variables associated with wealth (Filmer and Pritchett, 2001). Most measurements and analyses of poverty have been based on income in developed countries and on consumption or expenditure in developing countries (Sahn and Stifel, 2003). However, collecting data on income and expenditure can be costly and time-consuming (Vyas and Kumaranayake, 2006). In addition, in low-income countries, measurement of consumption and expenditure is fraught with difficulties such as problems of recall and reluctance to divulge information. Additionally, prices are likely to differ substantially across times and areas, necessitating complex adjustment of the expenditure figures to reflect these price differences (Deaton and Zaidi, 1999). Sahn and Stifel (2003) studied the theoretical framework underpinning household income or expenditure as a tool for classifying socio-economic status in developing countries. Their theoretical framework gave rise to five problems. Firstly, the quality of income and expenditure data is most likely to be poor, particularly in middle and low income countries. Secondly, these data are collected on the basis of recall, and recall data are prone to measurement errors. Thirdly, prices of goods, nominal interest rates and depreciation rates for semi-durable or durable goods are difficult to discern when constructing consumption aggregates. Fourthly, consumer price indices in developing countries are unavailable or unreliable, especially when inflation tends to be high or variable. In addition, regional and seasonal price indices in most developing countries are highly variable and rarely available.
Problems of sampling bias, under-reporting of income and difficulties of converting household products into money terms are also raised. For these reasons, an asset-based approach has been adopted as an alternative tool for classifying households by socio-economic status. Filmer and Pritchett (1998, 2001), Montgomery et al. (2000), Sahn and Stifel (2000) and McKenzie (2005) created an asset index using demographic and health survey variables such as durable goods, source of drinking water, toilet facility and housing quality to describe household welfare instead of using a household's income or expenditure. Methods other than PCA are also used in the literature to compute the weights of the asset index. For instance, multiple correspondence analysis (MCA) is analogous to PCA but is used for discrete data (Bartholomew et al., 2002; Booysen et al., 2005). Factor analysis was used by Sahn and Stifel (2003) and has a similar aim to PCA, namely expressing a set of variables as a smaller number of indices or factors. The key difference between PCA and factor analysis is that PCA imposes no underlying model, whereas the factors derived from factor analysis are assumed to represent the underlying processes that result in the correlation between the variables. The main problem of the factor analysis method is that not all the assets show a linear relationship with living standards. PCA is the most frequently used method because it is computationally easier, it can use the type of data that can be easily collected in household surveys (Vyas and Kumaranayake, 2006), and it uses all of the variables in reducing the dimensionality of the data (Jobson, 1992).
However, interpreting the asset index as a poverty measure is not straightforward, since its effectiveness depends on the choice of assets used. PCA, like other statistical methods, has advantages and disadvantages. The main challenge of PCA-based indices is to ensure that the range of asset variables used is broad enough to avoid problems of clumping and truncation. Once clumping and truncation are identified, one solution is to include additional variables that capture inequalities between households (McKenzie, 2003). The World Bank, in its series on socio-economic differences in health, nutrition and population, has also constructed PCA-based asset indices using DHS data. Filmer and Pritchett (2001) and Achia et al. (2010) created the asset index but did not consider land ownership and livestock. In addition, Achia et al. (2010) and the National Institute of Statistics of Rwanda et al. (2012) computed the asset index; however, their reports do not include a test of the reliability of the asset index. Therefore, the current study included a test of the reliability of the asset index. Moreover, we explored how to quantify land ownership and other latent variables. There are many studies on the determinants of poverty, but most of them are based on consumption and expenditure data (Rodriguez and Smith, 1994; Mok et al., 2007; Saidatulakman and Riaz, 2012). These authors use logistic regression as the primary tool of analysis. Achia et al. (2010) used logistic regression to identify the key determinants of poverty in Kenya based on DHS data, but they did not consider the gender of the household head or possible interaction effects. Hence this study focuses on a joint application of PCA and a regression model in the study of the determinants of poverty. The joint effect of two or more variables (i.e., interaction effects) is also discussed.
We used the Rwanda Demographic Health Survey (2010) data as an application of the above-mentioned methodologies. The findings of this study are intended to help identify the key factors of household poverty in Rwanda and hence contribute to the effort of the Economic Development and Poverty Reduction Strategy 2 of Rwanda. The Government of Rwanda has put in place many plans in its efforts to alleviate poverty and has created policies and strategies such as Vision 2020, which is implemented through the Economic Development and Poverty Reduction Strategy (EDPRS), and Vision 2020 Umurenge (a highly decentralized integrated rural development programme designed to accelerate the reduction of extreme poverty in Rwanda).

Data Source
The Rwanda Demographic Health Survey (RDHS) (2010) was conducted in two stages. In the first stage, 492 villages (known as clusters or enumeration areas) were selected, comprising 12540 households, of which 2009 were urban and 10531 rural. In the second stage, systematic sampling was used to select households within the selected villages. All women aged 15-49 and all men aged 15-59 were eligible to be interviewed. The survey used separate questionnaires for households, men and women. We used only the household data to identify the factors determining poverty among households in Rwanda. The household questionnaire covers ownership of durable goods, school attendance, source of drinking water, sanitation facilities, washing places, and housing characteristics such as building material and number of rooms used for sleeping.

Computation of a principal components analysis and poverty index
We used principal component analysis (PCA) to compute the weights of the asset index. PCA is a multivariate statistical technique that linearly transforms an original set of variables into a substantially smaller set of uncorrelated variables that represents most of the information in the original set (Jackson, 1991; Jolliffe, 2002). The basic idea is to represent a set of variables by a smaller number of variables called principal components. A small set of uncorrelated variables is much easier to understand and use in further analysis than a larger set of correlated variables. The principal components are chosen such that the first component accounts for as much of the variation in the original data as possible, subject to the constraint that the sum of the squares of the scoring factors (or weights) is equal to 1. The second component is completely uncorrelated with the first component and explains additional but less variation than the first, subject to the same constraint. The subsequent components are computed in similar fashion: each is uncorrelated with the previous components and therefore captures an additional dimension in the data, while explaining smaller and smaller proportions of the variation of the original variables. The cut-off point for the number of principal components is based on the magnitude of the variances of the principal components. The graphical method called the scree diagram uses the point at which the steepness of the graph changes as the cut-off. The first principal component is used as the household's wealth index (Córdova, 2009; Filmer and Pritchett, 2001; Manly, 2005). The scoring factors for each indicator from this first principal component are used to generate a household score.
The assets that are more unequally distributed across the sample will have a higher weight in the first principal component.
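The construction of the index described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the software used in the study; the function name asset_index, the toy ownership matrix and the sign convention for the weights are assumptions made for the example. The indicators are standardized so that the decomposition runs on the correlation matrix, which is a common choice for asset indices but is not stated in the text.

```python
import numpy as np


def asset_index(X):
    """Return (weights, scores, explained) from the first principal component.

    X is an (n_households, n_assets) array of asset indicators.
    Assumes every column has non-zero variance.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each indicator so that PCA runs on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = (Z.T @ Z) / Z.shape[0]
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(corr)
    weights = eigvecs[:, -1]  # scoring factors of the first component
    # The sign of an eigenvector is arbitrary; orient it so that the
    # weights sum to a non-negative value (an illustrative convention).
    if weights.sum() < 0:
        weights = -weights
    scores = Z @ weights  # one index value per household
    explained = float(eigvals[-1] / eigvals.sum())
    return weights, scores, explained


# Hypothetical ownership data: 6 households, 3 binary asset indicators.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
    [0, 1, 0],
]
weights, scores, explained = asset_index(toy)
print("weights:", weights)
print("scores:", scores)
print("share of variance explained:", explained)
```

By construction the weights have a unit sum of squares and the scores have mean zero; the value of explained is the share of total variance captured by the first component.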

Figure 1: Scree plot.
Asset indexes derived from DHS data can be subjected to a number of tests (Filmer and Pritchett, 1998). For instance, a good index has to be internally coherent, which means that it has to consistently produce a clear separation across poor, middle and rich households for each asset included in the index. In practice, each of the variables included in the index is compared across households that fall into the poorest 40%, middle 40% and richest 20% of the population based on the asset index. The index must also be robust, that is, it must produce similar classifications of households or individuals when it is constructed from different subsets of variables (Filmer and Pritchett, 2001; Booysen, 2002).
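The two reliability checks described above can be illustrated with a short sketch. The function names, the ten hypothetical households and the use of the overlap of the poorest 40% as a robustness summary are assumptions made for the example; they are not taken from the study's tables.

```python
def wealth_groups(scores):
    """Label each household 'poor' (bottom 40%), 'middle' (next 40%)
    or 'rich' (top 20%) according to its rank on the index."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # poorest first
    labels = [None] * n
    for rank, i in enumerate(order, start=1):
        if 10 * rank <= 4 * n:
            labels[i] = "poor"
        elif 10 * rank <= 8 * n:
            labels[i] = "middle"
        else:
            labels[i] = "rich"
    return labels


def ownership_by_group(owns, labels):
    """Internal coherence: share of households owning an asset in each group."""
    shares = {}
    for group in ("poor", "middle", "rich"):
        members = [o for o, g in zip(owns, labels) if g == group]
        shares[group] = sum(members) / len(members)
    return shares


def poor_overlap(full_scores, sub_scores):
    """Robustness: share of the poorest 40% under the full index that is
    also in the poorest 40% under an index built from a subset of variables."""
    full = wealth_groups(full_scores)
    sub = wealth_groups(sub_scores)
    poor_full = [i for i, g in enumerate(full) if g == "poor"]
    both = [i for i in poor_full if sub[i] == "poor"]
    return len(both) / len(poor_full)


# Hypothetical index values for 10 households.
full_index = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.3, 0.6, 0.9, 1.5, 2.5]
housing_index = [-1.8, -1.0, -0.2, -0.9, -0.4, 0.5, 0.2, 1.0, 1.2, 2.0]
owns_phone = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]

groups = wealth_groups(full_index)
phone_shares = ownership_by_group(owns_phone, groups)
overlap = poor_overlap(full_index, housing_index)
print(phone_shares)
print(overlap)
```

An asset whose ownership share rises (or falls) steadily from the poor to the rich group is consistent with internal coherence, and an overlap close to 1 indicates that the sub-index classifies the poorest households in much the same way as the full index.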

Poverty index
For the Rwanda household questionnaire data, which has 53 variables, the PCA scree plot (see Figure 1) suggests a cut-off at two principal components. The reliability of the asset index was tested as follows. Internal coherence is tested in Table 1, where the last three columns compare the average ownership of each asset across the poor, middle and richest households. Robustness is tested in Table 2 by comparing the ranking of the poorest 40% of households under the original asset index with their ranking under indexes constructed from different subsets of variables. We used 12 variable indicators of durable goods and seven variable indicators of housing infrastructure (toilet facility, wall material, floor material, roof material, source of drinking water, source of cooking fuel; Table 2). The asset index produced a similar classification when different subsets of variables were used (Table 2). Therefore, this asset index is robust. Table 1 reports the scoring factors of the 53 variables and their corresponding percentages in the wealth quintiles. Generally, a variable with a positive factor score (weight) contributes to higher SES, and conversely a variable with a negative factor score contributes to lower SES. Usually, the richest households (20% or fifth quintile) have assets with higher factor scores.
For instance, 8.1% of richest households have flush toilets whereas the figure for poorest and middle households is 0%; 85.2% of richest households have a cement floor against 0% of poorest households and 1.7% of middle households; 81.0% of richest households have a metal roof against 53.2% of middle households and 34.4% of poor households; 53.5% of the fifth quintile have electricity against 0.8% of the third and fourth quintiles and 0% of the first and second quintiles; 86.6% of richest households own a mobile phone against 56.6% of middle and 3.3% of poor households; 9.5% of the fifth quintile own a personal computer against 0% of middle and 0% of poor households (see Table 1). Conversely, poor households (40% or first and second quintiles) tend to hold the assets with lower scores. For instance, 98.9% of poor households have a latrine toilet against 87.3% of richest households; 100% of poor households have earth/sand floors against 94.3% of middle households and 10.0% of richest households; 7.7% of poor households have a thatch roof against 0.0% of richest households; 82.1% of poor households use wood as cooking fuel whereas 44.6% of richest households use wood for cooking; and 97.7% of poor households own land usable for agriculture against 53.3% of the fifth quintile (Table 1). DHS data are more reflective of longer-run household wealth or living standards (Filmer and Pritchett, 2001). Therefore, if we are interested in the current resources available to Rwandan households, an asset-based index may not be the right measure. According to Falkingham and Namazie (2002), ownership does not always capture the quality of an asset. Some variables may have a different relationship with SES across sub-groups. For instance, farmland ownership may be more reflective of wealth in rural areas than in urban areas. In our analysis we excluded ethnicity and religion because they are not included in the household data set of Rwanda; religion, in any case, seems to be more an individual than a household characteristic.
To assess the demographic and spatial profiles of the poor, households were ranked on their principal component scores into five quintiles from the poorest to the richest, where the first two quintiles are commonly classified as poorer and poor (40%), the third and fourth quintiles as middle (40%) and the fifth quintile as richest (20%).
Therefore, in the current study, we took the first two quintiles as the cut-off (40%) and computed a dichotomous variable (socio-economic index) indicating whether the household is poor or not (Filmer and Pritchett, 1998; Vyas and Kumaranayake, 2006). The 40th percentile was used as the poverty line (Achia, Wangombe and Khadioli, 2010; Vyas and Kumaranayake, 2006; Booysen, 2002). We classified the socio-economic status as poor if the household poverty index is below the 40th percentile; otherwise it was classified as not poor. From Table 3, we observe that there is a very strong association between socio-economic status (SES) and province, gender, age and education level of the household head, size of household and place of residence.
We fitted a logistic regression with socio-economic status (SES) as the response variable and the demographic characteristics of the household as explanatory variables.
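The dichotomous response variable can be derived from the index as in the sketch below. The function name and the toy index values are hypothetical, and the percentile is computed with the default interpolation rule of Python's statistics.quantiles, which is an assumption; the study does not state which percentile definition was used.

```python
import statistics


def classify_poor(scores):
    """Return (cutoff, flags) where flags[i] is 1 if household i lies
    below the 40th percentile of the index and 0 otherwise."""
    # quantiles(n=5) returns the 20th, 40th, 60th and 80th percentile cut points.
    cut_points = statistics.quantiles(scores, n=5)
    cutoff = cut_points[1]  # 40th percentile, used as the poverty line
    flags = [1 if s < cutoff else 0 for s in scores]
    return cutoff, flags


# Hypothetical index values for 10 households.
index_scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.3, 0.6, 0.9, 1.5, 2.5]
cutoff, poor = classify_poor(index_scores)
print("poverty line:", cutoff)
print("poor indicator:", poor)
```

The resulting 0/1 indicator is the kind of variable that would enter the logistic regression as the response.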

Model analysis
The deviance was used for model selection (Agresti, 2002; Collett, 2003). The main effects and possible combinations of two-way interaction terms were fitted, with attention given to the hierarchical principle in statistical modelling. The goodness-of-fit of the selected model was tested using the Hosmer and Lemeshow test (Hosmer and Lemeshow, 2000), and influential observations were checked by plotting the Cook's distance against the observation number (Collett, 2003). The appropriateness of the link function was tested by refitting the model with the linear predictor and its square as explanatory variables. The index plot of the Cook's distance statistic indicated that there was no influential observation, since none of the Cook's distances was greater than 1 (Figure 2).
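As an illustration of the goodness-of-fit check, the Hosmer and Lemeshow statistic can be computed from the observed outcomes and the fitted probabilities roughly as follows. This is a sketch under stated assumptions: the function name and toy data are hypothetical, groups are formed by sorting on the fitted probability and splitting into equal-sized bins, and fitted probabilities are assumed to lie strictly between 0 and 1. With g groups the statistic is usually referred to a chi-square distribution with g - 2 degrees of freedom.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square statistic from 0/1 outcomes y and
    fitted probabilities p, using equal-sized groups ordered by p."""
    # Sort on the fitted probability only, so ties are not ordered by outcome.
    pairs = sorted(zip(p, y), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed number of poor
        expected = sum(pi for pi, _ in chunk)   # expected number of poor
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return stat


# Hypothetical fitted probabilities for 8 households, two risk groups.
fitted = [0.25] * 4 + [0.75] * 4
well_calibrated = [0, 0, 0, 1, 1, 1, 1, 0]   # 1 of 4 and 3 of 4 are poor
poorly_calibrated = [1, 1, 0, 0, 1, 1, 1, 1]  # 2 of 4 and 4 of 4 are poor

stat_good = hosmer_lemeshow(well_calibrated, fitted, groups=2)
stat_bad = hosmer_lemeshow(poorly_calibrated, fitted, groups=2)
print(stat_good, stat_bad)
```

A small statistic (and hence a large p-value) indicates that observed and expected counts agree within groups, which is the sense in which the test shows no lack of fit.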

Result from logistic regression
The deviance of the model with all main effects was 7256.980, and the deviance of the model with all main effects and three interactions was reduced to 7181.518. This deviance is smaller than that of all other nested models. Therefore, the model with all main effects and three interactions was selected. The Hosmer and Lemeshow chi-square was 7.3263 with 8 degrees of freedom and a p-value of 0.5019. This large p-value shows that there was no evidence of lack of fit when the model was fitted to the data. The link function was appropriate because the linear predictor was significant (p-value <.0001) while the square of the linear predictor was not significant (p-value=0.1821) (Table 4). The characteristics of the household head are important to the living conditions of all household members. From Table 5, the logistic regression results show that poverty increases as the level of education of the household head decreases. A household whose head has secondary education is 6.481 (p-value=0.0017) times more likely to be poor than a household whose head has higher education. A household whose head has primary education is 24.416 (p-value <.0001) times more likely to be poor than a household whose head has higher education, and a household whose head has no education is 41.971 (p-value <.0001) times more likely to be poor than a household whose head has higher education.
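The reported figures can be checked with a few lines of arithmetic. The sketch below uses the closed form of the chi-square upper tail for an even number of degrees of freedom to recompute the Hosmer and Lemeshow p-value, and converts the education ratios to the log scale. The helper name is hypothetical, and treating the reported ratios as odds ratios (exponentiated logistic regression coefficients) is an assumption, since the text describes them as "times more likely".

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees
    of freedom: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total


# Hosmer and Lemeshow statistic and degrees of freedom as reported.
hl_p_value = chi2_upper_tail_even_df(7.3263, 8)

# Reduction in deviance from adding the three interaction terms.
# No p-value is computed because the degrees of freedom are not reported.
deviance_drop = 7256.980 - 7181.518

# Reported ratios relative to higher education, assumed to be odds ratios.
odds_ratios = {"secondary": 6.481, "primary": 24.416, "none": 41.971}
log_odds = {level: math.log(ratio) for level, ratio in odds_ratios.items()}

print(round(hl_p_value, 4))
print(round(deviance_drop, 3))
print({level: round(value, 3) for level, value in log_odds.items()})
```

The recomputed p-value agrees with the reported 0.5019 to rounding, and the log-scale values increase from secondary to primary to no education, in line with the stated gradient.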
We are interested in investigating the joint effect of gender and age of the household head. The results are presented in Figure 3.a. From Figure 3.a, we observe that a household headed by a female is more likely to be poor than a household headed by a male when the head is between 21 and 72 years old. Beyond 72 years, a household headed by a female is less likely to be poor than a household headed by a male. It is also interesting to note the relationship between the age of the household head and the size of the household. Figure 3.c shows that poverty decreases with the increasing age of the household head regardless of the size of the household. Furthermore, for a household headed by a young person of 21 years, poverty increases as the size of the household increases. This result suggests that old people should not live alone and that households headed by a young household head should be monitored by experienced household members. The relationship between province (Kigali city, Southern, Western, Northern and Eastern) and place of residence (urban or rural) is presented in Figure 3.b. Each province of Rwanda has urban and rural places. As Figure 3.b indicates, an urban household is less likely to be poor than a rural household in all provinces. These results revealed that a rural household from Southern province is the poorest (Figure 3.b), while rural households from Western and Northern provinces are almost the same but more likely to be poor than a rural household from Eastern province. A rural household from Kigali is less likely to be poor than a rural household from Eastern province.

Conclusion and recommendations
This paper used PCA to create a socio-economic index. The main advantage of this method over the classical methods based on income and consumption expenditure is that it avoids many of the measurement problems associated with the classical methods, such as recall bias and seasonality. This method may be very important for poor countries, which not only lack the requisite household survey data to design policies and evaluate program effectiveness, but also do not have the financial or human resources to generate such information. Based on the RDHS (2010) data, this study revealed that the demographic and spatial factors that characterise poor households in Rwanda are: education of the household head, gender of the household head, age of the household head, place of residence (rural or urban), region (province), and size of the household. This study found that the majority of poor households have low standards of education. This suggests that there is a need to improve existing access to higher education in order to speed up poverty reduction. It also found that rural households are more likely to be poor than urban households: this supports the existing policy of grouped settlement, where people are advised to build their houses in a township known as Imidugudu. It also suggests the need for a special policy targeting poverty reduction in rural households. A rural household in Southern province is more likely to be poor than households from other provinces; this suggests provincial targeting in poverty reduction. It was found that poverty in households headed by younger people increases with the size of the household; this suggests strengthening family planning, either by raising awareness of the existing policy or by improving it. The findings of this study are consistent with those of Achia, Wangombe and Khadioli (2010), Vyas and Kumaranayake (2006), Booysen (2002) and Filmer and Pritchett (2001).
Furthermore, the paper identified interaction effects between the age of the household head and the size of the household, between the age and the gender of the household head, and between place of residence (urban or rural) and province, which previous studies did not examine. The use of the asset index has some limitations. DHS data sets are more reflective of longer-run household wealth or living standards (Filmer and Pritchett, 2001). Therefore, if we are interested in the current resources available to Rwandan households, an asset-based index may not be the right measure. Ownership may not capture the quality and quantity of the asset (Falkingham and Namazie, 2002). One must also be aware that DHS data are cross-sectional and may not be able to address causality; hence longitudinal studies, which could address the problem of causality, are recommended for future research.