This is a short example of how to use the data provided by GISD. We will download the index, along with a shapefile of German administrative boundaries as of 31 December 2014 from www.geodatenzentrum.de, to show how a simple analysis can be performed. All analysis is done in R.
To perform the analysis, we need to utilize some packages.
# Libraries
require(leaflet) # interactive maps
require(rgdal) # Read Shapefile
require(readr) # Read CSV
require(dplyr) # Datamanagement
require(ggplot2) # Plots
require(ggmap) # Mapping tools
require(stargazer) # Nice Tables
require(RColorBrewer) # Color palettes
require(raster) # area() for polygon areas
require(classInt) # Class intervals for map legends
As a sample use case for the German Deprivation Index, we are going to download the values for 2014 and match them to another dataset containing rates of high-speed internet access at the district (NUTS-3) level. These rates represent the fraction of households in a district that have access to high-speed internet.
To plot the data as a map, we also need a spatial boundary file containing the district boundaries in 2014. This can be obtained easily from geodatenzentrum.de (not shown).
GISD is stored on GitHub; you can access this tutorial and all relevant datasets on the GitHub page for this project. Here we use data from the 2018 revision of the score; the release notes can be found here. We then go to Files of the 2018 Update -> Bund -> Kreise to get the download link for the values of the German Kreise (NUTS-3) for 2014.
# Deprivation 2014
temp <- tempfile(fileext = ".csv") # temporary file that receives the downloads
download.file("https://raw.githubusercontent.com/lekroll/GISD/master/Revisions/2018/Bund/Kreis/Kreis_2014.csv", temp)
Deprivation2014 <- read_csv(temp, locale = locale(encoding = "WINDOWS-1252"))
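District keys are identifiers rather than quantities, so it can matter how they are parsed when a file is read. The following is a minimal base-R sketch of the temporary-file pattern used above, with made-up values; whether the real GISD files contain keys with leading zeros is an assumption here, not something this tutorial confirms.

```r
# Hypothetical toy data standing in for a downloaded district file
toy <- data.frame(
  Kreiskennziffer = c("01001", "05315", "09162"),
  GISD_Score = c(0.61, 0.42, 0.18),
  stringsAsFactors = FALSE
)

# Write the toy data to a temporary file, where download.file() would put a real one
temp <- tempfile(fileext = ".csv")
write.csv(toy, temp, row.names = FALSE)

# Default parsing guesses an integer column and drops the leading zero
as_number <- read.csv(temp)

# Forcing the key to character keeps it exactly as stored
as_text <- read.csv(temp, colClasses = c(Kreiskennziffer = "character"))

print(as_number$Kreiskennziffer)
print(as_text$Kreiskennziffer)

# Remove the temporary file again
unlink(temp)
```

Whichever representation is chosen, both datasets need the same key type before they are matched.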
No analysis is complete without an outcome. We chose freely available data from the report https://www.gut-leben-in-deutschland.de, which is also hosted on GitHub. The indicator we use shows the proportion of households that have access to high-speed internet (>50 MBit/s).
# High Speed Internet Access
download.file("https://raw.githubusercontent.com/gut-leben-in-deutschland/bericht/master/content/07/03/districts.csv", temp)
HighSpeedInternet <- read_csv(temp)
# Keep the 2015 values and give the indicator column an explicit name
HighSpeedInternet <- HighSpeedInternet %>% filter(year == 2015) %>% dplyr::select(krs, HighSpeedInternet = value)
# Match both datasets on the district key
PlotDaten <- merge(Deprivation2014, HighSpeedInternet, by.x = "Kreiskennziffer", by.y = "krs")
# SHPKreise is the district boundary object read from the shapefile (not shown)
SHPKreise@data <- left_join(SHPKreise@data, PlotDaten, by = "Kreiskennziffer")
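Because `merge()` performs an inner join by default, districts that appear in only one of the two datasets drop out silently. The sketch below uses made-up toy data frames (the names `deprivation`, `internet` and all values are hypothetical) to show one way such unmatched keys could be checked.

```r
# Toy stand-ins for the two datasets; all values are made up
deprivation <- data.frame(
  Kreiskennziffer = c(1001, 5315, 9162, 11000),
  GISD_Score = c(0.61, 0.42, 0.18, 0.55)
)
internet <- data.frame(
  krs = c(1001, 5315, 9162),
  value = c(0.70, 0.93, 0.99)
)

# Inner join (the default): only districts present in both datasets remain
inner <- merge(deprivation, internet, by.x = "Kreiskennziffer", by.y = "krs")

# Left join: every deprivation row is kept, missing outcomes become NA
left <- merge(deprivation, internet, by.x = "Kreiskennziffer", by.y = "krs", all.x = TRUE)

# Keys that have no partner in the outcome data
unmatched <- setdiff(deprivation$Kreiskennziffer, internet$krs)

print(nrow(inner))
print(nrow(left))
print(unmatched)
```

Comparing the row counts before and after the join is a cheap way to notice lost districts.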
Now that we have all the data imported, we can start plotting them. First, a map of deprivation:
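The plotting code for the maps is not part of this tutorial. As a rough illustration of the classification step that usually precedes a choropleth map, the following base-R sketch bins made-up scores into quintiles and assigns one color per class. The palette and the number of classes are assumptions, not the settings used for the original maps.

```r
# Hypothetical deprivation scores for 20 districts
set.seed(1)
score <- runif(20)

# Quintile breaks, so that each class holds the same number of districts
breaks <- quantile(score, probs = seq(0, 1, by = 0.2))

# Class index (1 to 5) for every district
klass <- cut(score, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# One color per class, from light (low score) to dark (high score)
palette5 <- colorRampPalette(c("lightyellow", "darkred"))(5)
fill <- palette5[klass]

print(table(klass))
```

The resulting `fill` vector has one color per district and could be passed to a plotting function as the polygon fill.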
Next, a map of high-speed internet access:
So, this looks like there’s an association. Let’s check this out graphically:
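The original figure is not reproduced here. A minimal sketch of such a graphical check, using simulated data and base R graphics to stay free of dependencies, could look like this; the variable names and the simulated relationship are purely illustrative.

```r
# Simulated data: access falls with rising deprivation, plus noise
set.seed(42)
n <- 100
gisd <- runif(n)
internet <- 0.85 - 0.35 * gisd + rnorm(n, sd = 0.18)
internet <- pmin(pmax(internet, 0), 1)  # keep the fraction inside [0, 1]

# Pearson correlation as a first numeric summary
r <- cor(gisd, internet)
print(r)

# Scatter plot with the fitted regression line
plot(gisd, internet,
     xlab = "Deprivation score",
     ylab = "Fraction of households with high-speed internet")
abline(lm(internet ~ gisd), col = "red")
```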
Well, there is one, but is it significant?
linmodel <- lm(HighSpeedInternet ~ GISD_Score, data = SHPKreise@data)
stargazer(linmodel, type = "html")
| | Dependent variable: HighSpeedInternet |
| --- | --- |
| GISD_Score | -0.347*** (0.052) |
| Constant | 0.864*** (0.033) |
| Observations | 428 |
| R² | 0.094 |
| Adjusted R² | 0.092 |
| Residual Std. Error | 0.184 (df = 426) |
| F Statistic | 44.254*** (df = 1; 426) |
| Note: | *p<0.1; **p<0.05; ***p<0.01; standard errors in parentheses |
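To make the table easier to read, the reported numbers can be turned into a test statistic by hand. The sketch below uses only the rounded coefficient, standard error and degrees of freedom from the table; the interpretation of a 0.1 step assumes that the score is scaled roughly between 0 and 1.

```r
# Values copied from the regression table above (rounded)
beta <- -0.347   # coefficient of GISD_Score
se <- 0.052      # its standard error
df <- 426        # residual degrees of freedom

# t statistic and two-sided p-value
t_stat <- beta / se
p_value <- 2 * pt(-abs(t_stat), df)

# Expected change in the outcome for a 0.1 increase in the score
effect_01 <- beta * 0.1

print(t_stat)
print(p_value)
print(effect_01)
```

With a single predictor, the squared t statistic should roughly match the reported F statistic; small differences come from the rounding in the table.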
It might be that we missed some covariates in our model. What about the size and density of the population in the regions? Maybe socioeconomic deprivation and urbanity are too closely associated? Let’s check this out in a second model:
# District area in km² (area() is assumed to return m² for these boundaries)
SHPKreise$ShapeArea <- area(SHPKreise) / 1000000
# Population density in inhabitants per km²
SHPKreise$PopDensity <- SHPKreise$Bevölkerung / SHPKreise$ShapeArea
SHPKreise$Population <- SHPKreise$Bevölkerung
linmodel2 <- lm(SHPKreise$HighSpeedInternet ~ SHPKreise$GISD_Score + SHPKreise$Population + SHPKreise$PopDensity + SHPKreise$ShapeArea)
stargazer(linmodel, linmodel2, type = "html")
Dependent variable: HighSpeedInternet

| | (1) | (2) |
| --- | --- | --- |
| GISD_Score | -0.347*** (0.052) | -0.138*** (0.045) |
| Population | | 0.00000*** (0.00000) |
| PopDensity | | -0.00000*** (0.00000) |
| ShapeArea | | -0.0002*** (0.00001) |
| Constant | 0.864*** (0.033) | 0.832*** (0.029) |
| Observations | 428 | 428 |
| R² | 0.094 | 0.411 |
| Adjusted R² | 0.092 | 0.405 |
| Residual Std. Error | 0.184 (df = 426) | 0.149 (df = 423) |
| F Statistic | 44.254*** (df = 1; 426) | 73.691*** (df = 4; 423) |
| Note: | *p<0.1; **p<0.05; ***p<0.01; standard errors in parentheses | |
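Coefficients printed as 0.00000 are a matter of units: a change of one inhabitant is far too small a step to show up in five decimals. One common remedy is to rescale the predictor before fitting. The sketch below demonstrates this on simulated data; the variable names and the per-100,000 scale are illustrative choices, not part of the original analysis.

```r
# Simulated districts: outcome rises slightly with population size
set.seed(7)
n <- 200
population <- round(runif(n, min = 5e4, max = 1e6))
y <- 0.6 + 2e-7 * population + rnorm(n, sd = 0.05)

# Model on the raw scale: the coefficient is tiny and hard to read
m_raw <- lm(y ~ population)

# Same model with population expressed in units of 100,000 inhabitants
pop_100k <- population / 1e5
m_scaled <- lm(y ~ pop_100k)

print(coef(m_raw)[["population"]])
print(coef(m_scaled)[["pop_100k"]])

# The fit itself is unchanged by the rescaling
print(summary(m_raw)$r.squared)
print(summary(m_scaled)$r.squared)
```

Rescaling multiplies the coefficient and its standard error by the same factor, so significance and model fit stay the same while the table becomes readable.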
This example has shown that it is comparatively simple and straightforward to use GISD for the analysis of regional inequalities in Germany. It can be used for analyses of the whole population as well as for state-specific purposes. If you need help with GISD, please contact us.
Lars Eric Kroll, Maria Schumann, Jens Hoebel, Thomas Lampert. Regional health differences – developing a socioeconomic deprivation index for Germany. Journal of Health Monitoring 2017 2(2). Robert Koch Institute, Berlin.