
Overview

This is a short example of how to use the data provided by GISD (German Index of Socioeconomic Deprivation). We download the index, together with a shapefile of German administrative boundaries as of 31 December 2014 from www.geodatenzentrum.de, and show how a simple analysis can be performed. All analyses are done in R.

Preparations

To perform the analysis, we need several R packages.

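# Libraries
library(leaflet)      # interactive maps
library(sf)           # read shapefiles (the original used rgdal, retired from CRAN in 2023)
library(readr)        # read CSV files
library(dplyr)        # data management
library(ggplot2)      # plots (the package is ggplot2, not ggplot)
library(ggmap)        # mapping tools
library(stargazer)    # regression tables
library(RColorBrewer) # colour palettes for maps
library(raster)       # area() for polygon areas; note: masks dplyr::select
library(classInt)     # class intervals for choropleth maps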

As a sample use case for the German Deprivation Index, we download the values for 2014 and match them to another dataset containing rates of high-speed internet access at the district (NUTS-3) level. These rates represent the fraction of households in a district that have access to high-speed internet.

Spatial Polygons

To plot the data as a map, we also need a spatial boundary file containing the district boundaries in 2014. It can be obtained from geodatenzentrum.de (not shown).
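Later steps expect the boundaries as an sp object named SHPKreise, whose attribute table (SHPKreise@data) receives the GISD values. Because rgdal has been retired from CRAN, here is a minimal sketch of one alternative using sf. To run without the real download, it writes two toy square polygons to a temporary shapefile. The ID column AGS and its conversion to a numeric Kreiskennziffer are assumptions for illustration and are not taken from this tutorial. The sketch also assumes the GISD file stores the district key as a number.

```r
library(sf)
library(sp)

# Two hypothetical 1 km x 1 km districts in ETRS89 / UTM zone 32N (EPSG:25832)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1000, 0), c(x0 + 1000, 1000),
                        c(x0, 1000), c(x0, 0))))
}
toy <- st_sf(AGS = c("01001", "01002"),
             geometry = st_sfc(square(0), square(2000), crs = 25832))

# Write a temporary shapefile so the sketch runs without the real download
shp_path <- tempfile(fileext = ".shp")
st_write(toy, shp_path, quiet = TRUE)

# Read the shapefile (with real data: the path to the downloaded file)
kreise_sf <- st_read(shp_path, quiet = TRUE)

# Assumed ID handling: a numeric key named like the GISD column
kreise_sf$Kreiskennziffer <- as.numeric(as.character(kreise_sf$AGS))

# Convert to sp so that SHPKreise@data works as in the rest of this tutorial
SHPKreise <- as(kreise_sf, "Spatial")
```

The conversion to sp keeps the later steps unchanged. These steps are left_join() on SHPKreise@data and raster::area() on the polygons.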

Data on Socioeconomic Deprivation (GISD)

GISD is stored on GitHub; this tutorial and all relevant datasets are available on the project's GitHub page. Here we use data from the 2018 revision of the score, for which release notes are available. To get the download link for the values of the German Kreise (NUTS-3) for 2014, we go to Files of the 2018 Update -> Bund -> Kreise.

# Deprivation 2014
temp <- tempfile(fileext = ".csv")  # temporary file for the downloads
download.file("https://raw.githubusercontent.com/lekroll/GISD/master/Revisions/2018/Bund/Kreis/Kreis_2014.csv", temp)
Deprivation2014 <- read_csv(temp, locale = locale(encoding = "WINDOWS-1252"))  # file is read as Windows-1252

Outcome: High Speed Internet Access

No analysis is complete without an outcome. We use freely available data from the report https://www.gut-leben-in-deutschland.de, which is also hosted on GitHub. The indicator we use is the proportion of households with access to high-speed internet (>50 Mbit/s).

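# High Speed Internet Access
download.file("https://raw.githubusercontent.com/gut-leben-in-deutschland/bericht/master/content/07/03/districts.csv", temp)
HighSpeedInternet <- read_csv(temp)
# Keep 2015 values; dplyr:: avoids clashes with stats::filter and raster::select
HighSpeedInternet <- HighSpeedInternet %>% dplyr::filter(year == 2015) %>% dplyr::select(krs, value)
# Match deprivation scores and internet access rates by district ID
PlotDaten <- merge(Deprivation2014, HighSpeedInternet, by.x = "Kreiskennziffer", by.y = "krs")
PlotDaten <- dplyr::rename(PlotDaten, HighSpeedInternet = value)  # rename by name, not position
# Attach the merged values to the district polygons (SHPKreise loaded beforehand)
SHPKreise@data <- left_join(SHPKreise@data, PlotDaten, by = "Kreiskennziffer")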
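merge() performs an inner join, so it silently drops any district that appears in only one of the two tables. This can happen, for example, if district codes differ between the sources. The small sketch below uses hypothetical toy data to show how dplyr::anti_join() can list such districts before the values are attached to the map. The column names follow this tutorial; the values are invented.

```r
library(dplyr)

# Hypothetical toy tables: three districts with GISD scores,
# but internet access rates for only two of them
deprivation <- data.frame(Kreiskennziffer = c(1001, 1002, 1003),
                          GISD_Score = c(0.40, 0.55, 0.70))
internet <- data.frame(krs = c(1001, 1003), value = c(0.90, 0.75))

# merge() is an inner join: districts missing from either table are dropped
matched <- merge(deprivation, internet, by.x = "Kreiskennziffer", by.y = "krs")

# Rename by column name rather than by position
matched <- rename(matched, HighSpeedInternet = value)

# anti_join() lists the districts that were lost in the merge
unmatched <- anti_join(deprivation, internet, by = c("Kreiskennziffer" = "krs"))
unmatched
```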

Results

Now that all the data are imported, we can start plotting. First, a map of deprivation:

Map: Deprivation 2014

 
 

Next, a map of high-speed internet access:

   

High Speed Internet Access 2015
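The code behind these maps is not reproduced in this tutorial. As a sketch, the loaded packages classInt and RColorBrewer can assign each district a fill colour. This example assumes a quintile classification and the "YlOrRd" palette; the original class breaks and palette are not known. The scores are hypothetical.

```r
library(classInt)
library(RColorBrewer)

# Hypothetical GISD scores for ten districts (illustrative values only)
gisd_score <- c(0.12, 0.25, 0.31, 0.40, 0.48, 0.55, 0.61, 0.72, 0.80, 0.93)

# Quintile class breaks: each class holds roughly one fifth of the districts
breaks <- classIntervals(gisd_score, n = 5, style = "quantile")

# Sequential ColorBrewer palette with one colour per class
palette <- brewer.pal(5, "YlOrRd")

# findColours() maps each value to the colour of its class
district_colours <- findColours(breaks, palette)
```

With the real data, the same colours could be computed from SHPKreise$GISD_Score and passed to sp's plot method, e.g. plot(SHPKreise, col = ...).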

   

The two maps suggest an association. Let's examine it graphically:

 
 

Plot: Association of Deprivation and Internet Access
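The plot itself is not reproduced here. Below is a minimal ggplot2 sketch of such a scatterplot with a fitted least-squares line. It uses simulated stand-in data rather than the real district values.

```r
library(ggplot2)

# Simulated stand-in for the district data (hypothetical values, not GISD)
set.seed(1)
districts <- data.frame(GISD_Score = runif(100))
districts$HighSpeedInternet <- pmin(1, pmax(0, 0.85 - 0.35 * districts$GISD_Score +
                                              rnorm(100, sd = 0.15)))

# Scatterplot with a fitted least-squares line and its confidence band
p <- ggplot(districts, aes(x = GISD_Score, y = HighSpeedInternet)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "GISD score", y = "Share of households with high-speed internet")
```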

 

There is an association, but is it statistically significant?

 

Model 1: Linear Regression

linmodel <- lm(HighSpeedInternet ~ GISD_Score, data = SHPKreise@data)
stargazer(linmodel, type = "html")
Dependent variable: HighSpeedInternet

                          (1)
GISD_Score             -0.347***
                       (0.052)
Constant                0.864***
                       (0.033)
Observations             428
R²                      0.094
Adjusted R²             0.092
Residual Std. Error     0.184 (df = 426)
F Statistic            44.254*** (df = 1; 426)
Note: *p<0.1; **p<0.05; ***p<0.01
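Significance can be checked from the numbers in the table. The t statistic is the estimate divided by its standard error, and it is compared against a t distribution with the residual degrees of freedom. Here is a short worked check using the rounded values reported for Model 1.

```r
# Rounded values from the Model 1 table
estimate  <- -0.347
std_error <- 0.052
df_resid  <- 426

# t statistic: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 426 residual degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With a single predictor, the model F statistic equals t squared
# (up to rounding of the reported estimate and standard error)
f_approx <- t_value^2
```

The t value of about -6.67 gives a p-value far below 0.01, which matches the three stars. Its square (about 44.5) is close to the reported F statistic of 44.254.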

 

Perhaps we have omitted relevant covariates. What about the population size and density of the districts? Socioeconomic deprivation and urbanity may be closely associated, so part of the apparent effect of deprivation could reflect urbanity. Let's check this in a second model:
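Simulated data can show why a confounder such as urbanity matters. In the sketch below, a hypothetical urbanity variable lowers deprivation and raises internet access, while deprivation itself has only a small direct effect. Omitting urbanity makes the deprivation slope look steeper. All variables and effect sizes are invented for illustration.

```r
set.seed(42)
n <- 400

# Hypothetical urbanity measure (standardized)
urbanity <- rnorm(n)

# More urban districts are assumed to be less deprived...
gisd <- -0.6 * urbanity + rnorm(n, sd = 0.8)

# ...and to have better internet access; deprivation's direct effect is -0.05
internet <- 0.5 - 0.05 * gisd + 0.15 * urbanity + rnorm(n, sd = 0.1)

# Without urbanity, the slope absorbs the confounding (about -0.14 in expectation)
naive <- lm(internet ~ gisd)

# Controlling for urbanity recovers the direct effect (about -0.05 in expectation)
adjusted <- lm(internet ~ gisd + urbanity)

coef(naive)[["gisd"]]
coef(adjusted)[["gisd"]]
```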

 

Model 2: Linear Regression of High Speed Internet Access, Controlling for Population Size, Density and Area

# District area in km²: raster::area() returns m² (lon/lat or metre-based projection)
SHPKreise$ShapeArea  <- area(SHPKreise) / 1000000
SHPKreise$PopDensity <- SHPKreise$Bevölkerung / SHPKreise$ShapeArea  # inhabitants per km²
SHPKreise$Population <- SHPKreise$Bevölkerung
linmodel2 <- lm(HighSpeedInternet ~ GISD_Score + Population + PopDensity + ShapeArea,
                data = SHPKreise@data)
stargazer(linmodel, linmodel2, type = "html")
Dependent variable: HighSpeedInternet

                          (1)                       (2)
GISD_Score             -0.347***                 -0.138***
                       (0.052)                   (0.045)
Population                                        0.00000***
                                                 (0.00000)
PopDensity                                       -0.00000***
                                                 (0.00000)
ShapeArea                                        -0.0002***
                                                 (0.00001)
Constant                0.864***                  0.832***
                       (0.033)                   (0.029)
Observations             428                       428
R²                      0.094                     0.411
Adjusted R²             0.092                     0.405
Residual Std. Error     0.184 (df = 426)          0.149 (df = 423)
F Statistic            44.254*** (df = 1; 426)   73.691*** (df = 4; 423)
Note: *p<0.1; **p<0.05; ***p<0.01
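Population is measured in persons and PopDensity in persons per km². Their coefficients are therefore too small to display with five decimals (0.00000). Rescaling a predictor, for example to units of 100,000 persons, changes only the units of its coefficient, not the model fit. Below is a sketch with simulated data; the variable names follow this tutorial and the values are invented.

```r
# Simulated district data (hypothetical values, not the GISD data)
set.seed(7)
n <- 400
sim <- data.frame(GISD_Score = runif(n),
                  Population = round(runif(n, 3e4, 1.5e6)))
sim$HighSpeedInternet <- 0.8 - 0.15 * sim$GISD_Score +
  1e-7 * sim$Population + rnorm(n, sd = 0.1)

# Population in persons: the coefficient is on the order of 1e-7
fit_raw <- lm(HighSpeedInternet ~ GISD_Score + Population, data = sim)

# Population in units of 100,000 persons: the coefficient is 100,000 times larger
sim$Population100k <- sim$Population / 1e5
fit_scaled <- lm(HighSpeedInternet ~ GISD_Score + Population100k, data = sim)

# Rescaling changes only the units of that coefficient, not the fit
all.equal(coef(fit_raw)[["Population"]] * 1e5, coef(fit_scaled)[["Population100k"]])
all.equal(summary(fit_raw)$r.squared, summary(fit_scaled)$r.squared)
```

Alternatively, stargazer's digits argument can show more decimals. Rescaling usually gives more readable coefficients, though.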

 

Conclusion

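After controlling for population size, density and area, the GISD coefficient shrinks from -0.347 (Model 1) to -0.138 (Model 2) but remains statistically significant (p<0.01), while R² rises from 0.094 to 0.411. This example has shown that GISD makes it simple and straightforward to analyse regional inequalities in Germany. It can be used for analyses covering the whole population as well as for state-specific purposes. If you need help with GISD, please contact us.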

 

How to cite GISD

Lars Eric Kroll, Maria Schumann, Jens Hoebel, Thomas Lampert. Regional health differences – developing a socioeconomic deprivation index for Germany. Journal of Health Monitoring 2017 2(2). Robert Koch Institute, Berlin.
