Spatial Data Science Introduction

Spatial data science deals with large-scale location-based data-intensive datasets. These data sets should be visualized and anylised in a way that there is a large gain of information. The challenging task is to find the correct method related to the individuell problem which gives you reasonable results.

Import of relevant packages.

import pandas as pd
import geopandas as gpd
import matplotlib

Vectordata and rasterdata

There are two common spatial data formats, Vectordata and rasterdata.

  • vectordata: spatial data, polygones, line, points with additional information about the points, lines… the information is combined in a table, name coordinates, …
  • rasterdata: e.g. satellite data

The data is in general stored in databases. There a special GIS file formats for spatial data e.g. GeoJSON files, Esri shapefiles or PostGIS (PostgreSQL) databases.

In this tutorial we focus on vectordata.

Import of geospatial data

The GeoPandas library is able to read many of the file formats.

Some helpful commands are:

  • with .head() we get the first rows of the dataset (same as in pandas).
  • it exists a ‘geometry’ collumn with information of the geometry type of out data
  • with .plot() we can visualize the data in a very easy way

Download from https://www.govdata.de/web/guest/daten/-/searchresult/q//f/format%3Ageojson%2Ctype%3Adataset%2C/s/relevance_desc your data sets. For example choose the Ortsteile, Straßenbahnlinien and Sehenswürdigkeiten of the Hanse- und Universitätsstadt Rostock.

Load the following data sets and try out the commands above.

! wget https://geo.sv.rostock.de/download/opendata/ortsteile/ortsteile.json ! wget https://geo.sv.rostock.de/download/opendata/sehenswuerdigkeiten/sehenswuerdigkeiten.json ! wget https://geo.sv.rostock.de/download/opendata/strassenbahnlinien/strassenbahnlinien.json ! wget https://geo.sv.rostock.de/download/opendata/bodenrichtwerte_2019/bodenrichtwerte_2019.json

ortsteile = gpd.read_file('ortsteile.json')
sehensw = gpd.read_file('sehenswuerdigkeiten.json')
strassenbahn = gpd.read_file('strassenbahnlinien.json')
# TODO: Add your code here

The GeoPandas libray returns a GeoDataFrame. This is helpful because we can use .geometry to get information about the geometry of our objects. It also contains some good methods for working with spatial data e.g. area, distance, buffer or intersection.

Geometries: points, lines, polygons

Vector Data is represended by the following objects.

  • ponit data: represents a single point in space
  • line data or line string: a sequence of points, tha form a line
  • polygon data: filled area, described by finite closed set of lines
  • Also cominations of multiple point, lines or polygons e.g. multilines are possible.

alt text Source: https://automating-gis-processes.github.io/2016/Lesson1-Geometric-Objects.html

Create own points, lines and polygons

Use the shapely library to create a Point, a Polygon or a Line.

Try to create Points, Polygons and Lines.

from shapely.geometry import Point, LineString, Polygon
Point(1,1)

svg

Polygon([(1,1),(2,2),(3,1),(1,1)])

svg

# TODO: Add your code here

Coordinates reference systems

We have to be careful with different coordinate reference systems (crs). There exists many different projections of the 3D world to 2D maps. For example is the WGS84 with the coordinates in longitude and latitude one of the most used ones. This coordinate system has epsg:4326.

If your data has different sourses it is very important to compare the crs and if the data has different crs to change at least one of the data sets to the other or both to a common coordinate reference system.

With .crs you can see the coordinate reference system of your data.

sehensw.crs
{'init': 'epsg:4326'}

All our data is from the same provider. And all the selected data sets have the same coordinate reference system.

To change from one coordinatesystem to another coordinatesystem use: new_crs = data_set.to_crs(eps_code)

Try to change the coordinate reference system to another one.

# TODO: Add your code here

Plot all data/layers together

To get a common plot of all different data (in the same crs) use ax as a name for the plot.

ax = ortsteile.plot(edgecolor='k',facecolor='none')
strassenbahn.plot(ax=ax)
sehensw.plot(ax=ax, color='red')
<matplotlib.axes._subplots.AxesSubplot at 0x7facad81f518>

png

After this first short introduction to spatial data and how to load and visualize spatial data we continue with some spatial relations and operations.

Spatial relationships and operations

Spatial relations

A very important tool of geospatial data is that we can deal with spatial relations. In GIS the relations are typically based on the DE-9IM model. Here are some examples for relations between spatial objects.

alt text Source: https://en.wikipedia.org/wiki/Spatial_relation

The figure shows examples of relations between objects. For example can a Polygon a touch a polygon b or a line a can cross a line b.

A whole list can be found on Wikipedia.

Test the operations on yout data. Choose two polygons and check if they touch or are within eachother. Create a Line between two point and repeat the operations from before. (Hint: use LineString from shapely.geometry)

from shapely.geometry import Point, LineString
Strom = sehensw.loc[sehensw['bezeichnung'] == 'Alter Strom','geometry'].squeeze()
Rathaus = sehensw.loc[sehensw['bezeichnung'] == 'Rostocker Rathaus','geometry'].squeeze()
line = LineString([Strom, Rathaus])
print(line)
LINESTRING (12.08811258836566 54.17860304250995, 12.14101254619718 54.08855961651935)
Stadtmitte = ortsteile.loc[ortsteile['gemeindeteil_name'] == 'Stadtmitte','geometry'].squeeze()
Hansa = ortsteile.loc[ortsteile['gemeindeteil_name'] == 'Hansaviertel','geometry'].squeeze()
Stadtmitte.within(Rathaus)
False
Rathaus.within(Stadtmitte)
True
gpd.GeoSeries([Rathaus,Stadtmitte]).plot(cmap='tab10')
<matplotlib.axes._subplots.AxesSubplot at 0x7facaaeb4048>

png

# TODO: Add your code here

Spatial relationships with GeoDataFrames

The same methods which are described in Spatial relations can be applied to GeoDataFrames.

To easily work with different objects in one dataset. We can form a GeoSeries and plot our data directly in one figure. Use the example to create a similar plot with other objects.

gpd.GeoSeries([Rathaus,Stadtmitte, line, Strom]).plot(cmap='tab10')
<matplotlib.axes._subplots.AxesSubplot at 0x7facaae32668>

png

We can also search for some object in the whole data set of points. For example

ortsteile.contains(Hansa)
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16     True
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
30    False
dtype: bool
# TODO: Add your code here

This is an overview of possible spatial relations:

  • equals
  • contains
  • crosses
  • disjoint
  • intersects
  • overlaps
  • touches
  • within
  • covers

More information can be found on [https://shapely.readthedocs.io/en/stable/manual.html#predicates-and-relationships]{https://shapely.readthedocs.io/en/stable/manual.html#predicates-and-relationships}.

Spatial operations

Additional to the spatial relations which returns only boolean values GeoPandas has operations to return new geometric objects. For example we can create a buffer around a point. This will return then a polygon.

More information can be found on [https://shapely.readthedocs.io/en/stable/manual.html#spatial-analysis-methods]{https://shapely.readthedocs.io/en/stable/manual.html#spatial-analysis-methods}.

To add a buffer around an object use .buffer()

s = Strom.buffer(0.1)
r = Rathaus.buffer(0.1)
s

svg

gpd.GeoSeries([s,r]).plot(cmap='tab10')
<matplotlib.axes._subplots.AxesSubplot at 0x7facaada18d0>

png

With this new objects it is possible to use the spatial relations introduces before e.g. intersection, union, difference

Be careful with the unit of the buffer. It depends on the coordinate reference system. If the buffer should be in meters the related coordinate reference system should be also in meters. Otherwise .buffer(1) can be also 1 degree.

Create a new object with a buffer of 100 m and apply some spatial relations to the new object. For example difference, union or intersection.

r.intersection(s)

svg

# TODO: Add your code here

There is a special union: unary_union create a multipolygon from a several polygons. Thest the uniary_union operation to get a multipolygon.

# TODO: Add your code here

Spatial join / spatial overlay

The concept of the spatial join operator is to combine information of geospatial datasets based on their spatial relationship. Based on polygons and points we determine for each point in which polygon they are located. The procedure is the same as in panda.

A non geometrical example would be to join two data sets based on a feature.

The operation .merge() is not doing something spatially in generall. If the two data sets have a common feature the tables are just merged together dependent on that feature.

In this tutorial we learned about spatial relations. The operation .within() checks if a point, polygon or line is within another polygon. But this is only for one point and not a whole data set. In GeoPandas this operation is sjoin.

If we have our data set from before. We have the Sehendswürdigkeiten and our Ortsteile. We can join both data sets together. In the attribute table are now information from both data sets combined.

joined = gpd.sjoin(sehensw, ortsteile,op='within', how='left')

joined.head()
uuid_left kreis_name_left kreis_schluessel_left gemeindeverband_name_left gemeindeverband_schluessel_left gemeinde_name_left gemeinde_schluessel_left gemeindeteil_name_left gemeindeteil_schluessel_left strasse_name ... uuid_right kreis_name_right kreis_schluessel_right gemeindeverband_name_right gemeindeverband_schluessel_right gemeinde_name_right gemeinde_schluessel_right gemeindeteil_name_right gemeindeteil_schluessel_right abkuerzung
0 4b2c73f4-9a6c-11e5-a047-0050569b7e95 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 Rostock, Hanse- und Universitätsstadt 130030000000 Seebad Warnemünde 0001 Alexandrinenstr. ... 06743318-15c0-11e5-9ba8-0050569b7e95 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 Rostock, Hanse- und Universitätsstadt 130030000000 Seebad Warnemünde 0001 Wmd
1 4b2f8b20-9a6c-11e5-a097-0050569b7e95 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 Rostock, Hanse- und Universitätsstadt 130030000000 Stadtmitte 0019 Wokrenterstr. ... 06742846-15c0-11e5-9ba6-0050569b7e95 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 Rostock, Hanse- und Universitätsstadt 130030000000 Stadtmitte 0019 Stm
2 4adb8dd6-9a6c-11e5-9fbd-0050569b7e95 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 Rostock, Hanse- und Universitätsstadt 130030000000 Stadtmitte 0019 Beim Waisenhaus ... 06742846-15c0-11e5-9ba6-0050569b7e95 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 Rostock, Hanse- und Universitätsstadt 130030000000 Stadtmitte 0019 Stm
3 4b2e750a-9a6c-11e5-a067-0050569b7e95 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 Rostock, Hanse- und Universitätsstadt 130030000000 Stadtmitte 0019 Lagerstr. ... 06742846-15c0-11e5-9ba6-0050569b7e95 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 Rostock, Hanse- und Universitätsstadt 130030000000 Stadtmitte 0019 Stm
4 4b2f199c-9a6c-11e5-a07d-0050569b7e95 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 Rostock, Hanse- und Universitätsstadt 130030000000 Stadtmitte 0019 Wokrenterstr. ... 06742846-15c0-11e5-9ba6-0050569b7e95 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 Rostock, Hanse- und Universitätsstadt 130030000000 Stadtmitte 0019 Stm

5 rows × 28 columns

In this example the point get information from the polygons. This operator just adds data to the dataset but it is not changeing the geometry.

Sometimes a changes of the geometry is the aim. This is possible with an overlay. If you want to create new geometries based on an intersection this is possible in the following way. For example find the areas which are not directly reachable by tram.

strassenbahn['geometry'] = strassenbahn.buffer(0.01)
diff = gpd.overlay(ortsteile, strassenbahn, how='difference')
diff.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7facaadc7860>

png

Remember:

spatial join: transfer attributes from one dataframe to another based on the spatial relationship spatial overlay: construct new geometries based on spatial operation between both dataframes (and combining attributes of both dataframes)

Visualization of spatial data

The following part is about the visualization of spatial data, finding instinct patterns and the modelling of spatial data.

Choropleth Maps

Choro means region, dealing with areal unit data (polygons) which have a spatial attribute that we are interested in studying. Most of the research fields are in social sciencces for example market sequentation or income levels.

df = gpd.read_file('bodenrichtwerte_2019.json')

We can plot our data as a choropleth map with the plot command. Our Data is in the GEoDataFrame format.

df.head()
uuid brw_id kreisname kreisschl geverna geverschl gesl geteilschl gena gasl ... weer bem frei brzname umdart lumnum typ guelt_von guelt_bis geometry
0 cf176aa2-d2a4-11e8-a1e3-0050569946ac 13003_0000060 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 130030000000 0009 Rostock, Hanse- und Universitätsstadt 13003 ... None 135 60 Plattenbaugebiet Groß Klein None None Wohnbauflächen 01-01-2019 31-12-2020 MULTIPOLYGON (((12.07260 54.16016, 12.07231 54...
1 cf17ea72-d2a4-11e8-a21a-0050569946ac 13003_0000177 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 130030000000 0019 Rostock, Hanse- und Universitätsstadt 13003 ... None 740 177 San.- Faule Grube/Marienkirche None None Sanierungsgebiet 01-01-2019 31-12-2020 MULTIPOLYGON (((12.13853 54.09008, 12.13831 54...
2 cf16b1d4-d2a4-11e8-a19e-0050569946ac 13003_0000135 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 130030000000 0014 Rostock, Hanse- und Universitätsstadt 13003 ... None 80 135 Tschaikowskistr. Kaserne None None Sonderbauflächen 01-01-2019 31-12-2020 MULTIPOLYGON (((12.09295 54.08641, 12.09379 54...
3 cf175c92-d2a4-11e8-a1dc-0050569946ac 13003_0000281 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 130030000000 0014 Rostock, Hanse- und Universitätsstadt 13003 ... None 8 281 Botanischer Garten None None sonstige Flächen 01-01-2019 31-12-2020 MULTIPOLYGON (((12.08918 54.09019, 12.08908 54...
4 cf161bca-d2a4-11e8-a158-0050569946ac 13003_0000138 Rostock 13003 Rostock, Hanse- und Universitätsstadt 130030000 130030000000 0014 Rostock, Hanse- und Universitätsstadt 13003 ... None 20 138 südl. Am Waldessaum/Sportplätze None None sonstige Flächen 01-01-2019 31-12-2020 MULTIPOLYGON (((12.08376 54.08299, 12.08141 54...

5 rows × 52 columns

df.plot(column='bem')
<matplotlib.axes._subplots.AxesSubplot at 0x7facaac06278>

png


Author: Dietlinde Dierks
Last modified: 15.10.2019