Spatial Data Science Introduction
Spatial data science deals with large-scale location-based data-intensive datasets. These data sets should be visualized and anylised in a way that there is a large gain of information. The challenging task is to find the correct method related to the individuell problem which gives you reasonable results.
Import of relevant packages.
import pandas as pd
import geopandas as gpd
import matplotlib
Vectordata and rasterdata
There are two common spatial data formats, Vectordata and rasterdata.
- vectordata: spatial data, polygones, line, points with additional information about the points, lines… the information is combined in a table, name coordinates, …
- rasterdata: e.g. satellite data
The data is in general stored in databases. There a special GIS file formats for spatial data e.g. GeoJSON files, Esri shapefiles or PostGIS (PostgreSQL) databases.
In this tutorial we focus on vectordata.
Import of geospatial data
The GeoPandas library is able to read many of the file formats.
Some helpful commands are:
- with .head() we get the first rows of the dataset (same as in pandas).
- it exists a ‘geometry’ collumn with information of the geometry type of out data
- with .plot() we can visualize the data in a very easy way
Download from https://www.govdata.de/web/guest/daten/-/searchresult/q//f/format%3Ageojson%2Ctype%3Adataset%2C/s/relevance_desc your data sets. For example choose the Ortsteile, Straßenbahnlinien and Sehenswürdigkeiten of the Hanse- und Universitätsstadt Rostock.
Load the following data sets and try out the commands above.
! wget https://geo.sv.rostock.de/download/opendata/ortsteile/ortsteile.json ! wget https://geo.sv.rostock.de/download/opendata/sehenswuerdigkeiten/sehenswuerdigkeiten.json ! wget https://geo.sv.rostock.de/download/opendata/strassenbahnlinien/strassenbahnlinien.json ! wget https://geo.sv.rostock.de/download/opendata/bodenrichtwerte_2019/bodenrichtwerte_2019.json
ortsteile = gpd.read_file('ortsteile.json')
sehensw = gpd.read_file('sehenswuerdigkeiten.json')
strassenbahn = gpd.read_file('strassenbahnlinien.json')
# TODO: Add your code here
The GeoPandas libray returns a GeoDataFrame. This is helpful because we can use .geometry to get information about the geometry of our objects. It also contains some good methods for working with spatial data e.g. area, distance, buffer or intersection.
Geometries: points, lines, polygons
Vector Data is represended by the following objects.
- ponit data: represents a single point in space
- line data or line string: a sequence of points, tha form a line
- polygon data: filled area, described by finite closed set of lines
- Also cominations of multiple point, lines or polygons e.g. multilines are possible.
Source: https://automating-gis-processes.github.io/2016/Lesson1-Geometric-Objects.html
Create own points, lines and polygons
Use the shapely library to create a Point, a Polygon or a Line.
Try to create Points, Polygons and Lines.
from shapely.geometry import Point, LineString, Polygon
Point(1,1)
Polygon([(1,1),(2,2),(3,1),(1,1)])
# TODO: Add your code here
Coordinates reference systems
We have to be careful with different coordinate reference systems (crs). There exists many different projections of the 3D world to 2D maps. For example is the WGS84 with the coordinates in longitude and latitude one of the most used ones. This coordinate system has epsg:4326.
If your data has different sourses it is very important to compare the crs and if the data has different crs to change at least one of the data sets to the other or both to a common coordinate reference system.
With .crs you can see the coordinate reference system of your data.
sehensw.crs
{'init': 'epsg:4326'}
All our data is from the same provider. And all the selected data sets have the same coordinate reference system.
To change from one coordinatesystem to another coordinatesystem use: new_crs = data_set.to_crs(eps_code)
Try to change the coordinate reference system to another one.
# TODO: Add your code here
Plot all data/layers together
To get a common plot of all different data (in the same crs) use ax as a name for the plot.
ax = ortsteile.plot(edgecolor='k',facecolor='none')
strassenbahn.plot(ax=ax)
sehensw.plot(ax=ax, color='red')
<matplotlib.axes._subplots.AxesSubplot at 0x7facad81f518>
After this first short introduction to spatial data and how to load and visualize spatial data we continue with some spatial relations and operations.
Spatial relationships and operations
Spatial relations
A very important tool of geospatial data is that we can deal with spatial relations. In GIS the relations are typically based on the DE-9IM model. Here are some examples for relations between spatial objects.
Source: https://en.wikipedia.org/wiki/Spatial_relation
The figure shows examples of relations between objects. For example can a Polygon a touch a polygon b or a line a can cross a line b.
A whole list can be found on Wikipedia.
Test the operations on yout data. Choose two polygons and check if they touch or are within eachother. Create a Line between two point and repeat the operations from before. (Hint: use LineString from shapely.geometry)
from shapely.geometry import Point, LineString
Strom = sehensw.loc[sehensw['bezeichnung'] == 'Alter Strom','geometry'].squeeze()
Rathaus = sehensw.loc[sehensw['bezeichnung'] == 'Rostocker Rathaus','geometry'].squeeze()
line = LineString([Strom, Rathaus])
print(line)
LINESTRING (12.08811258836566 54.17860304250995, 12.14101254619718 54.08855961651935)
Stadtmitte = ortsteile.loc[ortsteile['gemeindeteil_name'] == 'Stadtmitte','geometry'].squeeze()
Hansa = ortsteile.loc[ortsteile['gemeindeteil_name'] == 'Hansaviertel','geometry'].squeeze()
Stadtmitte.within(Rathaus)
False
Rathaus.within(Stadtmitte)
True
gpd.GeoSeries([Rathaus,Stadtmitte]).plot(cmap='tab10')
<matplotlib.axes._subplots.AxesSubplot at 0x7facaaeb4048>
# TODO: Add your code here
Spatial relationships with GeoDataFrames
The same methods which are described in Spatial relations can be applied to GeoDataFrames.
To easily work with different objects in one dataset. We can form a GeoSeries and plot our data directly in one figure. Use the example to create a similar plot with other objects.
gpd.GeoSeries([Rathaus,Stadtmitte, line, Strom]).plot(cmap='tab10')
<matplotlib.axes._subplots.AxesSubplot at 0x7facaae32668>
We can also search for some object in the whole data set of points. For example
ortsteile.contains(Hansa)
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 True
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
30 False
dtype: bool
# TODO: Add your code here
This is an overview of possible spatial relations:
- equals
- contains
- crosses
- disjoint
- intersects
- overlaps
- touches
- within
- covers
More information can be found on [https://shapely.readthedocs.io/en/stable/manual.html#predicates-and-relationships]{https://shapely.readthedocs.io/en/stable/manual.html#predicates-and-relationships}.
Spatial operations
Additional to the spatial relations which returns only boolean values GeoPandas has operations to return new geometric objects. For example we can create a buffer around a point. This will return then a polygon.
More information can be found on [https://shapely.readthedocs.io/en/stable/manual.html#spatial-analysis-methods]{https://shapely.readthedocs.io/en/stable/manual.html#spatial-analysis-methods}.
To add a buffer around an object use .buffer()
s = Strom.buffer(0.1)
r = Rathaus.buffer(0.1)
s
gpd.GeoSeries([s,r]).plot(cmap='tab10')
<matplotlib.axes._subplots.AxesSubplot at 0x7facaada18d0>
With this new objects it is possible to use the spatial relations introduces before e.g. intersection, union, difference
Be careful with the unit of the buffer. It depends on the coordinate reference system. If the buffer should be in meters the related coordinate reference system should be also in meters. Otherwise .buffer(1) can be also 1 degree.
Create a new object with a buffer of 100 m and apply some spatial relations to the new object. For example difference, union or intersection.
r.intersection(s)
# TODO: Add your code here
There is a special union: unary_union create a multipolygon from a several polygons. Thest the uniary_union operation to get a multipolygon.
# TODO: Add your code here
Spatial join / spatial overlay
The concept of the spatial join operator is to combine information of geospatial datasets based on their spatial relationship. Based on polygons and points we determine for each point in which polygon they are located. The procedure is the same as in panda.
A non geometrical example would be to join two data sets based on a feature.
The operation .merge() is not doing something spatially in generall. If the two data sets have a common feature the tables are just merged together dependent on that feature.
In this tutorial we learned about spatial relations. The operation .within() checks if a point, polygon or line is within another polygon. But this is only for one point and not a whole data set. In GeoPandas this operation is sjoin.
If we have our data set from before. We have the Sehendswürdigkeiten and our Ortsteile. We can join both data sets together. In the attribute table are now information from both data sets combined.
joined = gpd.sjoin(sehensw, ortsteile,op='within', how='left')
joined.head()
uuid_left | kreis_name_left | kreis_schluessel_left | gemeindeverband_name_left | gemeindeverband_schluessel_left | gemeinde_name_left | gemeinde_schluessel_left | gemeindeteil_name_left | gemeindeteil_schluessel_left | strasse_name | ... | uuid_right | kreis_name_right | kreis_schluessel_right | gemeindeverband_name_right | gemeindeverband_schluessel_right | gemeinde_name_right | gemeinde_schluessel_right | gemeindeteil_name_right | gemeindeteil_schluessel_right | abkuerzung | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4b2c73f4-9a6c-11e5-a047-0050569b7e95 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | Rostock, Hanse- und Universitätsstadt | 130030000000 | Seebad Warnemünde | 0001 | Alexandrinenstr. | ... | 06743318-15c0-11e5-9ba8-0050569b7e95 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | Rostock, Hanse- und Universitätsstadt | 130030000000 | Seebad Warnemünde | 0001 | Wmd |
1 | 4b2f8b20-9a6c-11e5-a097-0050569b7e95 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | Rostock, Hanse- und Universitätsstadt | 130030000000 | Stadtmitte | 0019 | Wokrenterstr. | ... | 06742846-15c0-11e5-9ba6-0050569b7e95 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | Rostock, Hanse- und Universitätsstadt | 130030000000 | Stadtmitte | 0019 | Stm |
2 | 4adb8dd6-9a6c-11e5-9fbd-0050569b7e95 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | Rostock, Hanse- und Universitätsstadt | 130030000000 | Stadtmitte | 0019 | Beim Waisenhaus | ... | 06742846-15c0-11e5-9ba6-0050569b7e95 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | Rostock, Hanse- und Universitätsstadt | 130030000000 | Stadtmitte | 0019 | Stm |
3 | 4b2e750a-9a6c-11e5-a067-0050569b7e95 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | Rostock, Hanse- und Universitätsstadt | 130030000000 | Stadtmitte | 0019 | Lagerstr. | ... | 06742846-15c0-11e5-9ba6-0050569b7e95 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | Rostock, Hanse- und Universitätsstadt | 130030000000 | Stadtmitte | 0019 | Stm |
4 | 4b2f199c-9a6c-11e5-a07d-0050569b7e95 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | Rostock, Hanse- und Universitätsstadt | 130030000000 | Stadtmitte | 0019 | Wokrenterstr. | ... | 06742846-15c0-11e5-9ba6-0050569b7e95 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | Rostock, Hanse- und Universitätsstadt | 130030000000 | Stadtmitte | 0019 | Stm |
5 rows × 28 columns
In this example the point get information from the polygons. This operator just adds data to the dataset but it is not changeing the geometry.
Sometimes a changes of the geometry is the aim. This is possible with an overlay. If you want to create new geometries based on an intersection this is possible in the following way. For example find the areas which are not directly reachable by tram.
strassenbahn['geometry'] = strassenbahn.buffer(0.01)
diff = gpd.overlay(ortsteile, strassenbahn, how='difference')
diff.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7facaadc7860>
Remember:
spatial join: transfer attributes from one dataframe to another based on the spatial relationship spatial overlay: construct new geometries based on spatial operation between both dataframes (and combining attributes of both dataframes)
Visualization of spatial data
The following part is about the visualization of spatial data, finding instinct patterns and the modelling of spatial data.
Choropleth Maps
Choro means region, dealing with areal unit data (polygons) which have a spatial attribute that we are interested in studying. Most of the research fields are in social sciencces for example market sequentation or income levels.
df = gpd.read_file('bodenrichtwerte_2019.json')
We can plot our data as a choropleth map with the plot command. Our Data is in the GEoDataFrame format.
df.head()
uuid | brw_id | kreisname | kreisschl | geverna | geverschl | gesl | geteilschl | gena | gasl | ... | weer | bem | frei | brzname | umdart | lumnum | typ | guelt_von | guelt_bis | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | cf176aa2-d2a4-11e8-a1e3-0050569946ac | 13003_0000060 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | 130030000000 | 0009 | Rostock, Hanse- und Universitätsstadt | 13003 | ... | None | 135 | 60 | Plattenbaugebiet Groß Klein | None | None | Wohnbauflächen | 01-01-2019 | 31-12-2020 | MULTIPOLYGON (((12.07260 54.16016, 12.07231 54... |
1 | cf17ea72-d2a4-11e8-a21a-0050569946ac | 13003_0000177 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | 130030000000 | 0019 | Rostock, Hanse- und Universitätsstadt | 13003 | ... | None | 740 | 177 | San.- Faule Grube/Marienkirche | None | None | Sanierungsgebiet | 01-01-2019 | 31-12-2020 | MULTIPOLYGON (((12.13853 54.09008, 12.13831 54... |
2 | cf16b1d4-d2a4-11e8-a19e-0050569946ac | 13003_0000135 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | 130030000000 | 0014 | Rostock, Hanse- und Universitätsstadt | 13003 | ... | None | 80 | 135 | Tschaikowskistr. Kaserne | None | None | Sonderbauflächen | 01-01-2019 | 31-12-2020 | MULTIPOLYGON (((12.09295 54.08641, 12.09379 54... |
3 | cf175c92-d2a4-11e8-a1dc-0050569946ac | 13003_0000281 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | 130030000000 | 0014 | Rostock, Hanse- und Universitätsstadt | 13003 | ... | None | 8 | 281 | Botanischer Garten | None | None | sonstige Flächen | 01-01-2019 | 31-12-2020 | MULTIPOLYGON (((12.08918 54.09019, 12.08908 54... |
4 | cf161bca-d2a4-11e8-a158-0050569946ac | 13003_0000138 | Rostock | 13003 | Rostock, Hanse- und Universitätsstadt | 130030000 | 130030000000 | 0014 | Rostock, Hanse- und Universitätsstadt | 13003 | ... | None | 20 | 138 | südl. Am Waldessaum/Sportplätze | None | None | sonstige Flächen | 01-01-2019 | 31-12-2020 | MULTIPOLYGON (((12.08376 54.08299, 12.08141 54... |
5 rows × 52 columns
df.plot(column='bem')
<matplotlib.axes._subplots.AxesSubplot at 0x7facaac06278>
Author: | Dietlinde Dierks |
Last modified: | 15.10.2019 |