Geo Engine is a cloud-ready geo-spatial data processing platform. This documentation presents the foundations of the system and how to use it.
Geo Engine is a cloud-ready geospatial data processing platform. Here, we give an overview of its architecture and describe the main components.
Geo Engine consists of the backend and several frontends. The backend is subdivided into three subcomponents: services, operators, and data types. Data types specify primitives like feature collections for vector or gridded raster data. Moreover, it defines plots and basic operations, e.g., projections. The Operators block contains the processing engine and operators, i.e., source operators, raster- and vector time series processing. Furthermore, there are raster time series stream adapters, which can be used as building blocks for operators. The Services block contains protocols, e.g., OGC standard interfaces, as well as Geo Engine specific interfaces. These can be workflow registration, plot queries, and data upload. Each of the subcomponents can have additions in Geo Engine Pro, for instance, User Management, which is only available in Geo Engine Pro.
Operators
Services
Geo Engine has two frontends: geoengine-ui, for building web applications on top of Geo Engine, and geoengine-python, a Python library that can be used in Jupyter Notebooks. Third-party applications like QGIS can access Geo Engine via its OGC interfaces.
geoengine-ui
geoengine-python
All components of Geo Engine are fully containerized and Docker-ready. Geo Engine builds upon several technologies, including GDAL, arrow, Angular, and OpenLayers.
A dataset is a loadable unit in Geo Engine. It is a parameter of a source operator (e.g., a GdalSource) and identifies the data that is loaded. Geo Engine supports different types of data, reflected by a DataId, which refers to internal datasets and external data.
GdalSource
DataId
An internal dataset is a dataset that is stored in the Geo Engine. Thus, it is efficiently accessible and can be used in workflows. The dataset is identified by a DatasetName and contains a DatasetDefinition that describes the data.
DatasetName
DatasetDefinition
The DatasetName is a string that consists of a namespace (optional) and a name, separated by a colon. For instance, namespace:name or name refer to datasets. The name can consist of characters (a-z & A-Z), numbers (0-9), dashes (-) and underscores (_).
namespace:name
name
a-z
A-Z
0-9
-
_
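As a rough illustration of the naming rule above, the following Python sketch splits a dataset name into its parts and rejects invalid characters. The function name `parse_dataset_name` is hypothetical, and the sketch assumes that the namespace follows the same character rules as the name, which the text does not state.

```python
import re

# Assumption: the optional namespace uses the same character set as the name.
_PART = r"[A-Za-z0-9_-]+"
_DATASET_NAME = re.compile(rf"(?:(?P<namespace>{_PART}):)?(?P<name>{_PART})")


def parse_dataset_name(text):
    """Return (namespace, name); namespace is None when it is omitted."""
    match = _DATASET_NAME.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid dataset name: {text!r}")
    return match.group("namespace"), match.group("name")


print(parse_dataset_name("namespace:name"))
print(parse_dataset_name("ndvi"))
```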
An external dataset is a dataset that is not stored in the Geo Engine. Geo Engine accesses it from a foreign location. The dataset is identified by an ExternalDataId that consists of a DataProviderId and a LayerId. While the DataProviderId is usually a UUID that identifies the data provider for Geo Engine itself, the LayerId is a string that identifies the layer in the data provider.
ExternalDataId
DataProviderId
LayerId
The ExternalDataId is a string that consists of a namespace, the DataProviderId and a name, separated by a colon. The namespace cannot be omitted and is _ for the global namespace. For instance, _:{uuid}:name or namespace:{uuid}:name refer to datasets. If the name is a complex string, it can be enclosed by backticks, e.g., namespace:{uuid}:`name with spaces`.
_:{uuid}:name
namespace:{uuid}:name
namespace:{uuid}:`name with spaces`
A layer is a browsable unit in Geo Engine. In general, it is a named Workflow with additional meta information like a description and a default Colorizer. Layers are identified by a LayerId, which is usually a UUID. Every layer can be part of one or more Layer collections.
Workflow
Colorizer
Layer collections
Layer collections are groups of Layers. The collections themselves can be grouped inside other collections. Every layer collection has a name and a description. Layer collections, just like layers, can be part of one or more other layer collections.
Layers
Inside Geo Engine's web interface, you can browse the available layers and layer collections when adding data.
Inside Python, you can use the ge.layer_collection() function to get a listing of the root collection, which contains paths to all underlying layers.
While much of Geo Engine's functionality is Open Source and freely usable, some parts are only available in the Pro version. To use the Pro version, you need to purchase a Pro license. You may, however, be eligible for a free academic license. Please contact us at info@geoengine.de to request one.
The Pro version of Geo Engine includes a user management system. Users can either be anonymous or registered. On the first startup, an admin user will be created.
Geo Engine has a Role Based Access Control (RBAC) system. Users can have different roles, and permissions on resources are granted to these roles. By default, each user has a unique role for themselves and either the role anonymous or registered. The admin user has the role admin.
anonymous
registered
admin
Geo Engine allows defining permissions for resources like Datasets, Workflows, Layers and Projects. When a resource is created, the creator gets the Owner permission. This means they can do everything with the resource, including deleting it and permitting others to use it. For read-only access, the Read permission is available. The management of the permissions is done via the Permissions API. Admin users, i.e. users with the role admin assigned to them, can create new roles and assign them to users. The management of roles is also done via the Permissions API. Please refer to the API documentation (TODO: link) for more information. Alternatively, you can also use our Python library to manage permissions. Please refer to the Python library documentation for more information.
Owner
Read
Let's say Alice creates a project P. She automatically gets the Owner permission assigned on the project to her user role. Then, she adds a Read permission for User Bob. Before the permission is added, the system checks for the Owner permission on project P. As Alice is the owner, this operation succeeds. When Bob tries to access the project P the system checks for the Read permission which again succeeds.
Alice now wants to grant Charly and Dave the Read permission as well. Both Charly and Dave have the role Friends of Alice. She decides to give the permission to the role instead of both users individually. Both Charly and Dave can now access project P, but Mallory, who does not have the role, gets a PermissionDenied error. When Erin is later assigned the role Friends of Alice, she automatically gains access to project P as well.
Friends of Alice
PermissionDenied
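The checks in this scenario can be sketched in a few lines of Python. This is a toy model and not the Permissions API: the class `PermissionStore`, its methods, and the rule that Owner implies Read are assumptions made for illustration.

```python
class PermissionDenied(Exception):
    """Raised when none of the caller's roles holds the required permission."""


class PermissionStore:
    def __init__(self):
        # Each grant is a (role, resource, permission) triple.
        self._grants = set()

    def create(self, creator_role, resource):
        # The creator's own role receives the Owner permission.
        self._grants.add((creator_role, resource, "Owner"))

    def allows(self, roles, resource, permission):
        # Assumption of this sketch: Owner implies every other permission.
        return any(
            (role, resource, granted) in self._grants
            for role in roles
            for granted in (permission, "Owner")
        )

    def grant(self, acting_roles, target_role, resource, permission):
        # Only an owner may pass permissions on to other roles.
        if not self.allows(acting_roles, resource, "Owner"):
            raise PermissionDenied(f"Owner permission required on {resource}")
        self._grants.add((target_role, resource, permission))

    def read(self, roles, resource):
        if not self.allows(roles, resource, "Read"):
            raise PermissionDenied(f"Read permission required on {resource}")
        return resource


# Every user is modelled as the set of roles they hold.
alice = {"alice"}
bob = {"bob"}
charly = {"charly", "Friends of Alice"}
mallory = {"mallory"}

store = PermissionStore()
store.create("alice", "P")
store.grant(alice, "bob", "P", "Read")
store.grant(alice, "Friends of Alice", "P", "Read")
```

Because the grant is attached to the role rather than to a user, anyone who later receives the role Friends of Alice passes the same check without a further grant.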
The complete permission scenario looks like this
This chapter introduces the API of Geo Engine.
This section introduces the workflow API of Geo Engine.
Call /workflow/{workflowId}/metadata to get the result descriptor of the workflow. It describes the result of the workflow by data type, spatial reference, temporal and spatial extent and some more information that is specific to raster and vector results.
/workflow/{workflowId}/metadata
{ "type": "raster", "dataType": "U8", "spatialReference": "EPSG:4326", "measurement": { "type": "unitless" }, "time": { "start": "2014-01-01T00:00:00.000Z", "end": "2014-07-01T00:00:00.000Z" }, "bbox": { "upperLeftCoordinate": [-180.0, 90.0], "lowerRightCoordinate": [180.0, -90.0] } }
{ "type": "vector", "dataType": "MultiPoint", "spatialReference": "EPSG:4326", "columns": { "id": "int", "name": "text", "value": "float" }, "time": { "start": "2014-04-01T00:00:00.000Z", "end": "2014-07-01T00:00:00.000Z" }, "bbox": { "lowerLeftCoordinate": [3.9662060000000001, 45.9030360000000002], "upperRightCoordinate": [19.171284, 51.8473430000000022] } }
This chapter introduces the datatypes of Geo Engine.
A colorizer specifies a mapping between values and pixels/objects of an output image. Different variants of colorizers perform different kinds of mapping. In general, there are two families of colorizers: gradient and palette. Gradients are used to interpolate a continuous spectrum of colors between explicitly stated tuples (breakpoints) of a value and a color. A palette colorizer on the other hand, is used to generate a discrete set of colors, each mapped to a specific value.
breakpoints
palette
There are three miscellaneous fields in both of the gradient colorizers, namely noDataColor, overColor and underColor. The field noDataColor is used for all missing, NaN or no data values. The fields overColor and underColor are used for all overflowing values. For instance, if there are breakpoints defined from 0 to 10, but a value of -5 or 11 is mapped to a color, the respective field will be chosen instead. This way, you can specifically highlight values that lie outside of a given range.
noDataColor
overColor
underColor
NaN
0
10
-5
11
For a palette colorizer, there are no overColor and underColor fields. If a given value does not match any entry in the palette's definition, it is mapped to the defaultColor. The noDataColor works in the same manner as in the gradient variants.
defaultColor
Colors are defined as RGBA arrays, where the first three values refer to red, green and blue and the fourth one to alpha, which controls transparency (0 is fully transparent, 255 is fully opaque). The values range from 0 to 255. For instance, [255, 255, 255, 255] is opaque white and [0, 0, 0, 127] is semi-transparent black.
255
[255, 255, 255, 255]
[0, 0, 0, 127]
A linear gradient linearly interpolates values within breakpoints of a color table. For instance, the example below is showing a gradient representing the physical conditions of water at different temperatures. The gradient is defined between 0.0 and 99.99, where 0.0 is shown as a light blue and 99.99 as blue. Any value less than 0.0, hence being ice, is shown as white. Values above 99.99 are shown as a light gray.
0.0
99.99
{ "type": "linearGradient", "breakpoints": [ { "value": 0.0, "color": [204, 229, 255, 255] }, { "value": 99.99, "color": [0, 0, 255, 255] } ], "noDataColor": [0, 0, 0, 0], "overColor": [224, 224, 224, 255], "underColor": [255, 255, 255, 255] }
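To make the mapping concrete, here is a minimal Python sketch of how a value could be turned into a color with the water temperature definition. It assumes that each RGBA channel is interpolated linearly and independently and that breakpoints are sorted by strictly increasing value. The function `linear_gradient` is hypothetical and does not mirror Geo Engine's implementation.

```python
import math


def linear_gradient(value, breakpoints, no_data_color, over_color, under_color):
    """Map a value to an RGBA color; breakpoints are (value, color) pairs."""
    if value is None or math.isnan(value):
        return no_data_color
    if value < breakpoints[0][0]:
        return under_color
    if value > breakpoints[-1][0]:
        return over_color
    for (v0, c0), (v1, c1) in zip(breakpoints, breakpoints[1:]):
        if v0 <= value <= v1:
            t = (value - v0) / (v1 - v0)  # relative position between the two breakpoints
            return [round(a + t * (b - a)) for a, b in zip(c0, c1)]
    return breakpoints[0][1]  # only reached with a single breakpoint


water = [(0.0, [204, 229, 255, 255]), (99.99, [0, 0, 255, 255])]
colors = dict(
    no_data_color=[0, 0, 0, 0],
    over_color=[224, 224, 224, 255],
    under_color=[255, 255, 255, 255],
)

print(linear_gradient(-5.0, water, **colors))   # ice: underColor
print(linear_gradient(50.0, water, **colors))   # a blue between the two breakpoints
print(linear_gradient(120.0, water, **colors))  # above the range: overColor
```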
A logarithmic gradient logarithmically interpolates values within breakpoints of a color table and allows only positive values. This colorizer is particularly useful in situations where the data values increase exponentially and minor changes in the lower numbers would not be recognizable anymore.
Services report an error when a logarithmic gradient specification contains breakpoints where value <= 0.
value <= 0
{ "type": "logarithmicGradient", "breakpoints": [ { "value": 1.0, "color": [255, 255, 255, 255] }, { "value": 100.0, "color": [0, 0, 0, 255] } ], "noDataColor": [0, 0, 0, 0], "overColor": [0, 0, 0, 255], "underColor": [255, 255, 255, 255] }
A palette maps values as classes to a certain color. Unmapped values result in the defaultColor.
{ "type": "palette", "colors": { "1": [255, 255, 255, 255], "2": [0, 0, 0, 255] }, "noDataColor": [0, 0, 0, 0], "defaultColor": [0, 0, 0, 0] }
The RGBA colorizer maps U32 values "as is" to RGBA colors. 8 and 16 bit values are interpreted as grayscale colors. 64 bit values are interpreted as RGBA colors (but lose precision).
U32
{ "type": "rgba" }
Measurements describe stored data, i.e. what is measured and in which unit.
Some values do not have an associated measurement or no information is present.
{ "type": "unitless" }
The type continuous specifies a continuous variable that is measured in a certain unit.
continuous
{ "type": "continuous", "measurement": "Reflectance", "unit": "%" }
A classification maps numbers to named classes.
{ "type": "classification", "measurement": "Land Cover", "classes": { "0": "Grassland", "1": "Forest", "2": "Water" } }
A query rectangle defines a multi-dimensional spatial query in Geo Engine. It consists of three parts: the spatial bounds, the time interval, and the spatial resolution.
The spatial bounds behave differently for raster, vector, or plot queries. For raster queries, the spatial bounds define a spatial partition. This means the lower right corner of the spatial bounds is not included in the query. For vector queries, the spatial bounds define a bounding box, i.e., a rectangle where all bounds are included. Plot queries behave like vector queries.
{ "spatial_bounds": { "upper_left_coordinate": { "x": 10.0, "y": 20.0 }, "lower_right_coordinate": { "x": 70.0, "y": 80.0 } }, "time_interval": { "start": "2010-01-01T00:00:00Z", "end": "2011-01-01T00:00:00Z" }, "spatial_resolution": { "x": 1.0, "y": 1.0 } }
Rasters can have the following data types:
"U8"
A time instance is a single point in time. It is specified in UTC (time zone offset 0) and has a maximum resolution of milliseconds.
Specifying in ISO 8601:
"2010-01-01T00:00:00Z"
Using the same date as a UNIX timestamp in milliseconds:
1262304000000
A time interval consists of two TimeInstances. Please be aware that the interval is defined in closed-open semantics. This means that the start time is inclusive and the end time of the interval is exclusive. In mathematical notation, the interval is defined as [start, end).
TimeInstance
[start, end)
{ "start": "2010-01-01T00:00:00Z", "end": "2011-01-01T00:00:00Z" }
Using the same date as UNIX timestamps in milliseconds:
{ "start": 1262304000000, "end": 1293840000000 }
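The relation between the two notations and the closed-open semantics can be checked with a short Python sketch. The helper names `to_millis` and `contains` are made up for this example, and the parser only handles the exact ISO 8601 form used above (UTC with a trailing Z and no fractional seconds).

```python
from datetime import datetime, timezone


def to_millis(iso):
    """Convert an ISO 8601 UTC string like '2010-01-01T00:00:00Z' to UNIX milliseconds."""
    parsed = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp()) * 1000


def contains(interval, instant_ms):
    """Closed-open check: start is inclusive, end is exclusive."""
    return to_millis(interval["start"]) <= instant_ms < to_millis(interval["end"])


interval = {"start": "2010-01-01T00:00:00Z", "end": "2011-01-01T00:00:00Z"}

print(to_millis(interval["start"]))                     # 1262304000000
print(contains(interval, to_millis(interval["start"])))  # True
print(contains(interval, to_millis(interval["end"])))    # False
```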
A time step consists of a granularity and the number of steps. For instance, you can specify yearly steps by setting the granularity to Years and the number of steps to 1. Half-yearly steps can be specified by setting the granularity to Months and the number of steps to 6.
Years
Months
granularity
TimeGranularity
months
step
integer
The granularity of the time steps can take one of the following values.
millis
seconds
minutes
hours
days
years
{ "granularity": "months", "step": 1 }
This chapter introduces the operators of Geo Engine.
The ColumnRangeFilter operator allows filtering FeatureCollections. Users can define one or more data ranges for a column in the data table that is then filtered. The filter can be used for numerical as well as textual columns. Each range is inclusive, i.e., [start, end] includes both the start and the end.
ColumnRangeFilter
FeatureCollection
[start, end]
start
end
For instance, you can filter a collection to only include column values that are either in the range 0-10 or 20-30. Moreover, you can specify the range a to k to dismiss all column values that start with larger letters in the alphabet.
a
k
column
"precipitation"
ranges
[[42,43]]
keepNulls
true
The ColumnRangeFilter operator expects exactly one vector input.
vector
SingleVectorSource
If the value in the column parameter is not a column of the feature collection, an error is thrown.
{ "type": "ColumnRangeFilter", "params": { "column": "population", "ranges": [[1000, 10000]], "keepNulls": false }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "places", "attributeProjection": ["name", "population"] } } } }
{ "type": "ColumnRangeFilter", "params": { "column": "name", "ranges": [ ["a", "k"], ["v", "z"] ], "keepNulls": false }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "places", "attributeProjection": ["name", "population"] } } } }
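The filter semantics described in this section (inclusive ranges, several ranges combined with "or", optional retention of nulls) can be sketched in plain Python. The function `column_range_filter` and the representation of features as dictionaries are assumptions for illustration only. The example stays with a numerical column, since the exact comparison rules for text are not spelled out here.

```python
def column_range_filter(rows, column, ranges, keep_nulls):
    """Keep rows whose column value lies in at least one inclusive range."""
    kept = []
    for row in rows:
        if column not in row:
            raise KeyError(f"column {column!r} is not part of the collection")
        value = row[column]
        if value is None:
            if keep_nulls:
                kept.append(row)
            continue
        if any(start <= value <= end for start, end in ranges):
            kept.append(row)
    return kept


places = [
    {"name": "Hamlet", "population": 500},
    {"name": "Village", "population": 1000},
    {"name": "Town", "population": 10000},
    {"name": "Unknown", "population": None},
]

filtered = column_range_filter(places, "population", [[1000, 10000]], keep_nulls=False)
print([row["name"] for row in filtered])  # ['Village', 'Town']
```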
The Expression operator performs a pixel-wise mathematical expression on one or more raster sources. The expression is specified as a user-defined script in a very simple language. The output is a raster time series with the result of the expression and with time intervals that are the same as for the inputs. Users can specify an output data type. Internally, the expression is evaluated using floating-point numbers.
Expression
An example usage scenario is to calculate NDVI for a red and a near-infrared raster channel. The expression uses two raster sources, referred to as A and B, and calculates the formula (A - B) / (A + B). When the temporal resolution is months, our output NDVI will also be a monthly time series.
(A - B) / (A + B)
expression
outputType
RasterDataType
U8
outputMeasurement
Measurement
{ "type": "continuous", "measurement": "NDVI"}
mapNoData
bool
false
The following describes the types used in the parameters.
Expressions are simple scripts to perform pixel-wise computations. One can refer to the raster inputs as A for the first raster, B for the second, and so on. Furthermore, expressions can check with A IS NODATA, B IS NODATA, etc. for NO DATA values. This is important if mapNoData is set to true. Otherwise, NO DATA values are mapped automatically to the output NO DATA value. Finally, the value NODATA can be used to output NO DATA.
A
B
A IS NODATA
B IS NODATA
NODATA
Users can think of this implicit function signature for, e.g., two inputs:
fn (A: f64, B: f64) -> f64
As a start, expressions contain algebraic operations and mathematical functions.
(A + B) / 2
In addition, branches can be used to check for conditions.
if A IS NODATA { B } else { A }
Function calls can be used to access utility functions.
max(A, 0)
Currently, the following functions are available:
abs(a)
min(a, b)
min(a, b, c)
max(a, b)
max(a, b, c)
sqrt(a)
ln(a)
log10(a)
cos(a)
sin(a)
tan(a)
acos(a)
asin(a)
atan(a)
pi()
e()
round(a)
ceil(a)
floor(a)
mod(a, b)
to_degrees(a)
to_radians(a)
To generate more complex expressions, it is possible to have variable assignments.
let mean = (A + B) / 2; let coefficient = 0.357; mean * coefficient
Note that all assignments are separated by semicolons. However, the last expression must be without a semicolon.
The Expression operator expects one to eight raster inputs.
SingleRasterSource
C
The parsing of the expression can fail if there are, e.g., syntax errors.
{ "type": "Expression", "params": { "expression": "(A - B) / (A + B)", "outputType": "F32", "mapNoData": false }, "sources": { "A": { "type": "GdalSource", "params": { "data": "sentinel2-b8" } }, "B": { "type": "GdalSource", "params": { "data": "sentinel2-b4" } } } }
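To illustrate the pixel-wise evaluation and the role of mapNoData, the following Python sketch mimics the behavior on tiny in-memory rasters. It uses None as a stand-in for NO DATA and plain lists as rasters. The names `evaluate` and `apply_expression` are hypothetical, and the sketch does not cover how Geo Engine treats a zero denominator.

```python
NODATA = None  # stand-in for the NO DATA value in this sketch


def evaluate(expression, pixel_values, map_no_data=False):
    """Evaluate the expression for one pixel position."""
    # Without mapNoData, any NO DATA input directly yields NO DATA.
    if not map_no_data and any(value is NODATA for value in pixel_values):
        return NODATA
    return expression(*pixel_values)


def apply_expression(expression, rasters, map_no_data=False):
    """Apply the expression to equally sized rasters given as flat lists."""
    return [evaluate(expression, pixel, map_no_data) for pixel in zip(*rasters)]


def ndvi(a, b):
    # (A - B) / (A + B); raises ZeroDivisionError if a + b == 0
    return (a - b) / (a + b)


def first_valid(a, b):
    # Equivalent of: if A IS NODATA { B } else { A }
    return b if a is NODATA else a


raster_a = [0.8, NODATA]
raster_b = [0.2, 0.1]

print(apply_expression(ndvi, [raster_a, raster_b]))
print(apply_expression(first_valid, [raster_a, raster_b], map_no_data=True))
```

The second call only works with map_no_data enabled, because otherwise the NO DATA input never reaches the expression.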
The GdalSource is a source operator that reads raster data using GDAL. The counterpart for vector data is the OgrSource.
OgrSource
data
"ndvi"
None
If the given dataset does not exist or is not readable, an error is thrown.
{ "type": "GdalSource", "params": { "data": "ndvi" } }
The Interpolation operator artificially increases the resolution of a raster by interpolating the values of the input raster. If the operator is queried with a resolution that is coarser than the input resolution, the interpolation is not applied but the input raster is returned unchanged. Unless a particular input resolution is specified, the resolution of the input raster is used, if it is known.
Interpolation
interpolation
InterpolationMethod
inputResolution
InputResolution
The operator supports the following interpolation methods:
nearestNeighbor
biLinear
The operator supports the following input resolutions:
{"type": "source"}
{"type": "value", "x": 0.1, "y": 0.1}
The Interpolation operator expects exactly one raster input.
source
If the input resolution is set as "source" but the resolution of the input raster is not known, an error will be thrown.
{ "type": "Raster", "operator": { "type": "Interpolation", "params": { "interpolation": "biLinear", "inputResolution": { "type": "source" } }, "sources": { "raster": { "type": "GdalSource", "params": { "data": "ndvi" } } } } }
The LineSimplification operator allows simplifying FeatureCollections of (multi-)lines or (multi-)polygons by removing vertices. Users can select a simplification algorithm and specify an epsilon for parametrization. Alternatively, they can omit the epsilon, which results in the epsilon being automatically determined by the query's spatial resolution.
LineSimplification
epsilon
For instance, you can remove the vertices of a large country polygon for drawing it on a small map. This results in a simpler polygon that is easier to draw and reduces the amount of data that needs to be transferred.
1.0
algorithm
douglasPeucker
visvalingam
"douglasPeucker"
The LineSimplification operator expects exactly one vector input.
MultiPolygon
MultiLineString
{ "type": "LineSimplification", "params": { "algorithm": "douglasPeucker", "epsilon": 1.0 }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "ne_10m_admin_0_countries" } } } }
{ "type": "LineSimplification", "params": { "algorithm": "visvalingam" }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "ne_10m_admin_0_countries" } } } }
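For readers unfamiliar with the douglasPeucker option, here is a compact Python sketch of the textbook Douglas-Peucker algorithm on a single line. It is not Geo Engine's implementation. It measures the distance to the segment between the end points and assumes that epsilon is given in the same units as the coordinates.

```python
import math


def _distance_to_segment(point, start, end):
    """Distance from a point to the segment between start and end."""
    (px, py), (ax, ay), (bx, by) = point, start, end
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Remove vertices that deviate at most epsilon from the simplified line."""
    if len(points) < 3:
        return list(points)
    distances = [_distance_to_segment(p, points[0], points[-1]) for p in points[1:-1]]
    farthest = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[farthest - 1] <= epsilon:
        return [points[0], points[-1]]
    # Keep the farthest vertex and simplify both halves recursively.
    left = douglas_peucker(points[: farthest + 1], epsilon)
    right = douglas_peucker(points[farthest:], epsilon)
    return left[:-1] + right


line = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
print(douglas_peucker(line, 1.0))  # [(0, 0), (2, 0), (3, 3), (4, 0)]
print(douglas_peucker(line, 5.0))  # [(0, 0), (4, 0)]
```

A larger epsilon removes more vertices, which matches the use case of drawing a detailed polygon on a small map.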
The NeighborhoodAggregate operator computes an aggregate function for a pixel and its neighborhood. The operator can be defined as a neighborhood matrix with either weights or predefined shapes and an aggregate function.
NeighborhoodAggregate
An example usage scenario is to calculate a Gaussian filter to smoothen or blur an image. For each time step in the raster time series, the operator computes the aggregate for each pixel and its neighborhood.
The output data type is the same as the input data type. As the matrix and the aggregate in- and outputs are defined as floating point values, the internal computation is done as floating point calculations.
neighborhood
Neighborhood
{ "type": "weightsMatrix", "weights": [ [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0] ]}
aggregateFunction
AggregateFunction
"sum"
There are several types of neighborhoods. They define a matrix of weights. The number of rows and the number of columns of this matrix must be odd.
The weights matrix is defined as an \( n \times m \) matrix of floating point values. It is applied to the pixel and its neighborhood to serve as the input for the aggregate function.
For instance, a vertical derivative filter (a component of a Sobel filter) can be defined like this:
{ "type": "weightsMatrix", "weights": [ [1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0] ] }
The aggregate function should be sum in this case.
sum
The rectangle neighborhood is defined by its shape \( n \times m \). The result is a weights matrix with all weights set to 1.0.
{ "type": "rectangle", "dimensions": [3, 3] }
The aggregate function computes a single value from a set of values. The following aggregate functions are supported:
standardDeviation
NO DATA
The NeighborhoodAggregate operator expects exactly one raster input.
If the neighborhood rows or columns are not positive or odd, an error will be thrown.
{ "type": "NeighborhoodAggregate", "params": { "neighborhood": { "type": "weightsMatrix", "weights": [ [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0] ] }, "aggregateFunction": "sum" }, "sources": { "raster": { "type": "GdalSource", "params": { "data": "ndvi" } } } }
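The combination of a weights matrix with the sum aggregate can be sketched as follows. This Python example works on a small in-memory grid. Skipping neighbors that fall outside the grid is an assumption of the sketch and not a statement about how Geo Engine treats raster borders. The function name `neighborhood_sum` is hypothetical.

```python
def neighborhood_sum(grid, weights):
    """Weighted sum over each pixel's neighborhood (weights has odd dimensions)."""
    rows, cols = len(weights), len(weights[0])
    if rows % 2 == 0 or cols % 2 == 0:
        raise ValueError("neighborhood rows and columns must be odd")
    half_rows, half_cols = rows // 2, cols // 2
    height, width = len(grid), len(grid[0])
    result = []
    for y in range(height):
        result_row = []
        for x in range(width):
            total = 0.0
            for j in range(rows):
                for i in range(cols):
                    yy, xx = y + j - half_rows, x + i - half_cols
                    # Assumption: neighbors outside the grid are skipped.
                    if 0 <= yy < height and 0 <= xx < width:
                        total += weights[j][i] * grid[yy][xx]
            result_row.append(total)
        result.append(result_row)
    return result


ones = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
rectangle_3x3 = [[1.0] * 3 for _ in range(3)]  # same weights as a 3x3 rectangle neighborhood

print(neighborhood_sum(ones, rectangle_3x3))
# [[4.0, 6.0, 4.0], [6.0, 9.0, 6.0], [4.0, 6.0, 4.0]]
```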
The OgrSource is a source operator that reads vector data using OGR. The counterpart for raster data is the GdalSource.
"places"
attributeProjection
Array<String>
["name", "population"]
attributeFilters
Array<AttributeFilter>
[{"attribute": "population", "ranges": [[1000, 10000]]}]
The AttributeFilter defines one or more ranges on the values of an attribute. The ranges include the lower and upper bounds of the range.
AttributeFilter
attribute
String
Array<Array<String \| Number>>
{ "type": "OgrSource", "params": { "data": "places", "attributeProjection": ["name", "population"], "attributeFilters": [ { "attribute": "population", "ranges": [[1000, 10000]], "keepNulls": false } ] } }
The PointInPolygon operator filters point features of a (multi-)point collection with polygons. In more detail, the points of each feature are checked against the polygons of the other collection. If one or more points are contained in any of the polygons, the feature is included in the output.
PointInPolygon
For instance, you can filter tree features inside the polygons of a forest. All features that are not inside any forest polygon are considered either part of another forest or outliers and are thus removed.
The operator is parameterless.
The PointInPolygon operator expects two vector inputs.
points
polygons
If the points vector input is not a (multi-)point feature collection, an error is thrown.
If the polygons vector input is not a (multi-)polygon feature collection, an error is thrown.
{ "type": "PointInPolygon", "params": {}, "sources": { "points": { "type": "OgrSource", "params": { "data": "places", "attributeProjection": ["name", "population"] } }, "polygons": { "type": "OgrSource", "params": { "data": "germany_outline" } } } }
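The filtering rule can be illustrated with the classic ray casting test. The Python sketch below treats every polygon as a single outer ring without holes and does not define the result for points exactly on a boundary. Both simplifications, as well as the function names, are assumptions of this example.

```python
def point_in_ring(point, ring):
    """Ray casting: count how often a ray to the right crosses the ring's edges."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon_filter(features, rings):
    """Keep every (multi-)point feature with at least one point inside any ring."""
    return [
        feature
        for feature in features
        if any(point_in_ring(point, ring) for point in feature for ring in rings)
    ]


forest = [(0, 0), (10, 0), (10, 10), (0, 10)]
trees = [[(5, 5)], [(20, 20)], [(30, 30), (1, 1)]]

print(point_in_polygon_filter(trees, [forest]))
# [[(5, 5)], [(30, 30), (1, 1)]]
```

The third feature is kept although one of its points lies outside, because a single contained point is sufficient.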
The Rasterization operator creates a raster from a point vector source. It offers two options for rasterization: A grid rasterization and a (gaussian) density rasterization (heatmap).
Rasterization
The Rasterization operator expects exactly one vector input.
params
GridOrDensity
{"type": "grid", ...}
GridOrDensity contains a field type which can have the value grid or density for a grid rasterization or density rasterization, respectively.
type
grid
density
GridOrDensity has additional fields which are parameters specific to the type of the rasterization. These are described below separately.
spatialResolution
SpatialResolution
{"x": 10.0, "y": 10.0}
originCoordinate
Coordinate2D
{"x": 0.0, "y": 0.0}
gridSizeMode
fixed
relative
"fixed"
The following describes the types used in the grid rasterization parameters.
The parameters spatialResolution and originCoordinate consist of two fields x and y which describe a resolution/position in x/y direction.
x
y
For gridSizeMode the two options fixed and relative are available. Fixed means the spatialResolution is interpreted as a constant grid cell size. Relative means the spatialResolution is used as a multiplier for a query's spatial resolution, making the resulting grid size adaptive to the query resolution.
Fixed
Relative
cutoff
number
0.01
stddev
The cutoff percentage (must be in [0, 1)) is treated as a hard cutoff point. A larger cutoff percentage leads to faster processing, however it also introduces inaccuracies in the result since points further than the derived radius away from a pixel do not influence its value. It is meant to be set such that the ignored density values are small enough to not make a visible difference in the resulting raster.
If the cutoff is not in [0, 1) or the stddev is negative, an error will be thrown.
{ "type": "Raster", "operator": { "type": "Rasterization", "params": { "type": "grid", "spatialResolution": { "x": 10, "y": 10 }, "gridSizeMode": "fixed", "originCoordinate": { "x": 0, "y": 0 } }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "ne_10m_ports", "attributeProjection": null, "attributeFilters": null } } } } }
{ "type": "Raster", "operator": { "type": "Rasterization", "params": { "type": "density", "cutoff": 0.01, "stddev": 1 }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "ne_10m_ports", "attributeProjection": null, "attributeFilters": null } } } } }
The raster scaling operator scales/unscales the values of a raster by a given slope factor and offset. This allows shrinking and expanding the value range of the pixel values needed to store a raster. It also allows shifting values to all-positive values and back. We use the GDAL terms of scale and unscale. Raster data is often scaled to reduce memory/storage consumption. To get the "real" raster values, the unscale operation is applied. Keep in mind that scaling might reduce the precision of the pixel values. (To actually reduce the size of the raster, use the raster type conversion operator and transform to a smaller datatype after scaling.)
The operator applies the following formulas to every pixel.
For unscaling the formula is: p_new = p_old * slope + offset. The key for this mode is mulSlopeAddOffset.
p_new = p_old * slope + offset
mulSlopeAddOffset
For scaling the formula is: p_new = (p_old - offset) / slope. The key for this mode is subOffsetDivSlope.
p_new = (p_old - offset) / slope
subOffsetDivSlope
p_old and p_new refer to the old and new pixel value. The slope and offset values are either properties attached to the input raster or a fixed value.
p_old
p_new
An example for Meteosat Second Generation properties is:
msg.calibration_offset
msg.calibration_slope
slope
SlopeOffsetSelection
{"type": "metadataKey", "domain": "", "key": "scale" }
offset
{"type": "constant", "value": 0.1 }
scalingMode
"mulSlopeAddOffset"
{"type": "continuous", "measurement": "Reflectance","unit": "%"}
* if no outputMeasurement is given, the measurement of the input raster is used.
The RasterScaling operator expects exactly one raster input.
RasterScaling
The SlopeOffsetSelection type is used to specify a metadata key or a constant value.
{"type": "auto"}
{"type": "constant", "value": number}
{"type": "metadataKey", "domain": string, "key": string}
* if set to "auto", the operator will use the values from the dedicated (GDAL) raster properties for scale and offset.
"auto"
{ "type": "RasterScaling", "params": { "slope": { "type": "metadataKey", "domain": "", "key": "scale" }, "offset": { "type": "constant", "value": 1.0 }, "outputMeasurement": null, "scalingMode": "mulSlopeAddOffset" }, "sources": { "raster": { "type": "GdalSource", "params": { "data": "modis-b6" } } } }
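The two formulas of this operator are inverses of each other, which the following Python sketch demonstrates for a single pixel value. The function names are chosen for this example, and the slope and offset are arbitrary constants.

```python
def unscale(p_old, slope, offset):
    """mulSlopeAddOffset: p_new = p_old * slope + offset"""
    return p_old * slope + offset


def scale(p_old, slope, offset):
    """subOffsetDivSlope: p_new = (p_old - offset) / slope"""
    return (p_old - offset) / slope


stored = 100            # value as stored in the raster
slope, offset = 0.5, 10.0

real = unscale(stored, slope, offset)
print(real)                        # 60.0
print(scale(real, slope, offset))  # 100.0
```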
The RasterTypeConversion operator allows changing the data type of raster data. It transforms all pixels into the new data type.
RasterTypeConversion
Applying the operator could lead to a loss of precision, e.g., converting a F32 value of 3.1 to a U8 will return a value of 3.
F32
3.1
3
If the old value is not valid in the new type it will clip at the value range of the new type. E.g., converting a F32 value of 300.0 to a U8 will return a value of 255.
300.0
outputDataType
The RasterTypeConversion operator expects exactly one raster input.
{ "type": "RasterTypeConversion", "params": { "outputDataType": "U8" }, "sources": { "raster": { "type": "GdalSource", "params": { "data": "ndvi" } } } }
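The two effects described in this section, loss of precision and clipping, can be reproduced with a small Python sketch for the U8 case. Dropping the fractional part by truncation is an assumption that is consistent with the 3.1 example but not confirmed for other values. The function `to_u8` is hypothetical.

```python
def to_u8(value):
    """Convert a float to the U8 range [0, 255] by clipping, then truncating."""
    clipped = min(max(value, 0.0), 255.0)
    return int(clipped)  # assumption: the fractional part is dropped


print(to_u8(3.1))    # 3
print(to_u8(300.0))  # 255
print(to_u8(-4.2))   # 0
```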
The RasterVectorJoin operator allows combining a single vector input and multiple raster inputs. For each raster input, a new column is added to the collection from the vector input. The new column contains the value of the raster at the location of the vector feature. For features covering multiple pixels like MultiPoints or MultiPolygons, the value is calculated using an aggregation function selected by the user. The same is true if the temporal extent of a vector feature covers multiple raster time steps. More details are described below.
RasterVectorJoin
MultiPoints
MultiPolygons
Example: You have a collection of agricultural fields (Polygons) and a collection of raster images containing each pixel's monthly NDVI value. For your application, you want to know the NDVI value of each field. The RasterVectorJoin operator allows you to combine the vector and raster data and offers multiple spatial and temporal aggregation strategies. For example, you can use the first aggregation function to get the NDVI value of the first pixel that intersects with each field. This is useful for exploratory analysis since the computation is very fast. To calculate the mean NDVI value of all pixels that intersect with the field you should use the mean aggregation function. Since the NDVI data is a monthly time series, you have to specify the temporal aggregation function as well. The default is none which will create a new feature for each month. Other options are first and mean which will calculate the first or mean NDVI value for each field over time.
Polygons
first
mean
none
The RasterVectorJoin operator expects one vector input and one or more raster inputs.
sources
SingleVectorMultipleRasterSources
The RasterVectorJoin operator has the following parameters:
names
"["NDVI", "Elevation"]"
featureAggregation
"first"
featureAggregationIgnoreNoData
boolean
temporalAggregation
"none"
temporalAggregationIgnoreNoData
If the length of names is not equal to the number of raster inputs, an error is thrown.
{ "type": "RasterVectorJoin", "params": { "names": ["NDVI"], "featureAggregation": "first", "temporalAggregation": "mean", "temporalAggregationIgnoreNoData": true }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "places" } }, "rasters": [ { "type": "GdalSource", "params": { "data": "ndvi" } } ] } }
The Reprojection operator reprojects data from one spatial reference system to another. It accepts exactly one input which can either be a raster or a vector data stream. The operator produces all data that, after reprojection, is contained in the query rectangle.
Reprojection
The concrete behavior depends on the data type.
The reprojection operator reprojects all coordinates of the features individually. The result contains all features that, after reprojection, are intersected by the query rectangle. If not all coordinates of the vector data stream could be projected, the operator returns an error.
For raster data, the operator creates tiles in the target projection by first loading the corresponding tiles in the source projection. Note that creating one reprojected output tile may require loading multiple source tiles. For each output pixel, the operator takes the value of the input pixel nearest to its upper left corner.
In order to obtain precise results but avoid loading too much data, the operator estimates the resolution at which it loads the input raster stream. The estimate is based on the target resolution defined by the query rectangle and the relationship between the lengths of the diagonal of the query rectangle in both projections. Please refer to the source code for details.
In case a tile, or part of a tile, is not available in the source projection because it is outside of the defined extent, the operator will produce pixels with no data values. If the input raster stream has no no data value defined, the value 0 will be used instead.
targetSpatialReference
EPSG:4326
The Reprojection operator expects exactly one raster or vector input.
RasterOrVectorOperator
The operator returns an error if the target projection is unknown or if the input data cannot be reprojected.
{ "type": "Reprojection", "params": { "targetSpatialReference": "EPSG:4326" }, "sources": { "source": { "type": "GdalSource", "params": { "data": "ndvi" } } } }
The RGB composite operator computes pixel-wise RGBA values from three raster sources, referred to as red, green, and blue. These sources fill the red, green, and blue channels of the output, respectively. The output pixels are of type U32: internally, the four bytes of the (unitless) U32 are filled with red, green, blue, and alpha information. The special rgba colorizer symbology treats the values "as is" and maps them to the RGB output.
RGB
rgba
redMin
redMax
redScale
[0, 1]
1
greenMax
greenMin
greenScale
0.5
blueMin
blueMax
blueScale
0.75
The RGB composite operator expects three raster inputs.
red
green
blue
The parsing of the parameters can fail if, e.g., scale values are not in the range [0, 1].
{ "type": "Rgb", "params": { "redMin": 0, "redMax": 2000, "redScale": 1, "greenMin": 0, "greenMax": 2000, "greenScale": 1, "blueMin": 0, "blueMax": 2000, "blueScale": 1 }, "sources": { "red": { "type": "GdalSource", "params": { "data": "sentinel2-b2" } }, "green": { "type": "GdalSource", "params": { "data": "sentinel2-b3" } }, "blue": { "type": "GdalSource", "params": { "data": "sentinel2-b4" } } } }
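The following Python sketch illustrates how three scaled channel values could end up in a single U32 pixel. The byte order (red in the highest byte, alpha in the lowest) and the exact scaling formula are assumptions made for this example and are not taken from the Geo Engine source; `scale_channel` and `pack_rgba` are hypothetical helpers.

```python
def scale_channel(value, vmin, vmax, scale):
    """Map `value` from [vmin, vmax] to a byte in [0, 255], weighted by `scale`.

    The formula is an assumption for illustration purposes.
    """
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must be in the range [0, 1]")
    clamped = min(max(value, vmin), vmax)
    normalized = (clamped - vmin) / (vmax - vmin)
    return round(normalized * scale * 255)


def pack_rgba(red, green, blue, alpha=255):
    """Pack four bytes into one unsigned 32-bit integer (assumed order: R, G, B, A)."""
    return (red << 24) | (green << 16) | (blue << 8) | alpha


r = scale_channel(2000, 0, 2000, 1.0)
g = scale_channel(0, 0, 2000, 1.0)
b = scale_channel(400, 0, 2000, 1.0)

pixel = pack_rgba(r, g, b)
print(hex(pixel))
```

A scale value outside `[0, 1]` raises an error in this sketch, mirroring the parameter check described above.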
The TemporalRasterAggregation aggregates a raster time series into uniform time intervals (windows). The output is a time series that begins with the first window that contains the start of the query time. Each time slice has the same length, defined by the window parameter. The pixel values are computed by aggregating all rasters that are contained in the input and that are valid in the current window using the defined aggregation method. The operator produces all output slices that are contained in the query time interval. The optional windowReference parameter allows specifying a custom anchor point for the windows. This is the notional starting point from which the timeline is divided into uniform aggregation windows. By default, it is 1970-01-01T00:00:00Z, which means that windows of, e.g., 1 hour or 1 month will begin at the full hour or the start of the month.
TemporalRasterAggregation
window
aggregation
windowReference
1970-01-01T00:00:00Z
An example usage scenario is to transform a daily raster time series into monthly aggregates. Here, the query should start at the beginning of the month and the window should be 1 month. The aggregation method allows calculating, e.g., the maximum or mean value for each pixel. If we perform a query with time [2021-01-01, 2021-04-01), we would get a time series with three time steps. If we perform a query with an instant like [2021-01-01, 2021-01-01), we will get a single time step containing the aggregated values for January 2021.
Aggregation
{ "type": "max", "ignoreNoData": false}
TimeStep
{ "granularity": "Months", "step": 1}
There are different methods that can be used to aggregate the raster time series. Encountering a no data value makes the aggregation value of a pixel also no data unless the ignoreNoData parameter is set to true.
ignoreNoData
min
max
last
count
Attention: For the variants sum and count, a saturating addition is used. This means that if the sum of two values exceeds the maximum value of the data type, the result will be the maximum value of the data type. Thus, users should choose a data type that is large enough to hold the result of the aggregation.
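The effect of saturating addition can be shown with a small Python sketch. Python integers do not overflow, so the maximum of the data type is passed explicitly; `saturating_add` and `saturating_sum` are hypothetical helpers, not Geo Engine functions.

```python
U8_MAX = 255
U16_MAX = 65535


def saturating_add(a, b, max_value):
    """Add two non-negative integers, clamping the result at `max_value`."""
    return min(a + b, max_value)


def saturating_sum(values, max_value):
    """Sum a sequence with saturating addition."""
    total = 0
    for value in values:
        total = saturating_add(total, value, max_value)
    return total


pixel_series = [200, 100]

print(saturating_sum(pixel_series, U8_MAX))   # clamped at the U8 maximum
print(saturating_sum(pixel_series, U16_MAX))  # fits into U16
```

With an 8-bit data type the true sum of 300 is lost, whereas a 16-bit data type holds it without clamping.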
The TemporalRasterAggregation operator expects exactly one raster input.
raster
If the aggregation method is first, last, or mean and the input raster has no no data value, an error is thrown.
{ "type": "TemporalRasterAggregation", "params": { "aggregation": { "type": "max", "ignoreNoData": false }, "window": { "granularity": "Months", "step": 1 }, "windowReference": "1970-01-01T00:00:00Z" }, "sources": { "raster": { "type": "GdalSource", "params": { "data": "ndvi" } } } }
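The window layout described above can be sketched in Python for the special case of windows measured in months. This is a simplified illustration under the assumption that the anchor is the default 1970-01-01; `monthly_windows` is a hypothetical helper and not the operator's implementation.

```python
from datetime import date

ANCHOR = date(1970, 1, 1)  # default windowReference


def month_index(d):
    """Number of months since year 0, ignoring the day of month."""
    return d.year * 12 + (d.month - 1)


def month_start(index):
    """First day of the month with the given month index."""
    return date(index // 12, index % 12 + 1, 1)


def monthly_windows(query_start, query_end, step=1):
    """Return the [start, end) windows of `step` months that a query touches."""
    anchor = month_index(ANCHOR)
    # First window: the one that contains the start of the query time.
    current = (month_index(query_start) - anchor) // step * step + anchor
    windows = []
    while True:
        windows.append((month_start(current), month_start(current + step)))
        current += step
        if month_start(current) >= query_end:
            break
    return windows


quarter = monthly_windows(date(2021, 1, 1), date(2021, 4, 1))
instant = monthly_windows(date(2021, 1, 1), date(2021, 1, 1))
print(len(quarter), len(instant))
```

The two calls correspond to the queries from the usage scenario: the first yields three monthly windows, the second a single window for January 2021.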
The TimeProjection projects vector dataset timestamps to new granularities and ranges. The output is a new vector dataset with the same geometry and attributes as the input. However, each time step is projected to a new time range. Moreover, the QueryRectangle's temporal extent is enlarged as well to include the projected time range.
TimeProjection
QueryRectangle
An example usage scenario is to transform snapshot observations into yearly time slices. For instance, animal occurrences are observed at a daily granularity. If you want to aggregate the data to a yearly granularity, you can use the TimeProjection operator. This will change the validity of each element in the dataset to the full year where it was observed. This is, for instance, useful when you want to combine it with raster time series and use different temporal semantics than the originally recorded validities.
{ "granularity": "years", "step": 1}
stepReference
The TimeProjection operator expects exactly one vector input.
If the step is negative, an error is thrown.
{ "type": "TimeProjection", "params": { "step": { "granularity": "years", "step": 1 } }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "ndvi" } } } }
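For the yearly scenario, the projection of a single observation can be sketched as follows. The sketch assumes a step of one year whose reference is the start of the calendar year in UTC; `project_to_year` is a hypothetical helper used only for illustration.

```python
from datetime import datetime, timezone


def project_to_year(timestamp):
    """Return the [start, end) interval of the calendar year containing `timestamp`."""
    start = datetime(timestamp.year, 1, 1, tzinfo=timezone.utc)
    end = datetime(timestamp.year + 1, 1, 1, tzinfo=timezone.utc)
    return start, end


# A daily animal observation becomes valid for the whole year 2014.
observation = datetime(2014, 4, 15, 12, 30, tzinfo=timezone.utc)
validity = project_to_year(observation)

print(validity[0].isoformat(), validity[1].isoformat())
```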
The TimeShift operator allows retrieving data temporally relative to the actual QueryRectangle. It shifts the query rectangle by a given amount of time and modifies the result data accordingly. Users have two options for specifying the time shift: a relative shift, given as a number of steps of a time granularity, or an absolute shift to a fixed time interval.
TimeShift
The output is either a stream of raster data or a stream of vector data depending on the input.
An example usage scenario is to compare the current time with the previous time of the same raster data. For instance, a raster source outputs monthly data aggregates of mean temperatures. If you want to compute the difference between the current month and the previous month, you can use the TimeShift operator. You will have two workflows. One is the unmodified temperature raster source. The other is the same source, shifted by one month. Then, you can use both workflows as sources of an Expression operator.
Note: This operator modifies the time values of the returned data. For rasters and vector data, it shifts the time intervals opposite to the time shift specified in the operator. This is necessary to have only data inside the result that is part of the QueryRectangle's time interval. As an example, we shift monthly data by one month to the past. Our query rectangle points to February. Then, the operator shifts the query rectangle to January. The data, originally valid for January, is shifted forward to February again, to fit into the original query rectangle, which is February.
absolute
"relative"
If type is relative, you need to specify the following parameters:
"months"
value
-1
If the type is absolute, you need to specify the following parameters:
timeInterval
TimeInterval
{ "start": "2010-01-01T00:00:00Z", "end": "2010-02-01T00:00:00Z"}
The TimeShift operator expects either one vector input or one raster input.
SingleRasterOrVectorSource
{ "type": "TimeShift", "params": { "type": "relative", "granularity": "months", "value": -1 }, "sources": { "source": { "type": "GdalSource", "params": { "data": "ndvi" } } } }
{ "type": "TimeShift", "params": { "type": "absolute", "time_interval": { "start": "2010-01-01T00:00:00Z", "end": "2010-02-01T00:00:00Z" } }, "sources": { "source": { "type": "GdalSource", "params": { "data": "ndvi" } } } }
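The February example from the note above can be traced with a short Python sketch. It only handles intervals that start and end on the first day of a month; `add_months` and `shift_interval` are hypothetical helpers, not part of Geo Engine.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def shift_interval(interval, months):
    """Shift both bounds of a [start, end) interval."""
    start, end = interval
    return add_months(start, months), add_months(end, months)


shift = -1  # relative shift: one month to the past
query = (date(2021, 2, 1), date(2021, 3, 1))  # the query rectangle points to February

# 1. The operator shifts the query, so the source is asked for January.
source_query = shift_interval(query, shift)

# 2. The January data is shifted in the opposite direction to fit the original query.
result_interval = shift_interval(source_query, -shift)

print(source_query, result_interval)
```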
The VectorJoin operator allows combining multiple vector inputs into a single feature collection. There are multiple join variants defined, which are described below.
VectorJoin
For instance, you want to join tabular data to a point collection of buildings. The point collection contains the geolocation of the buildings and their id. The attribute data collection has the building id and the height information. Combining the two feature collections leads to a single point collection with geolocation and height information.
EquiGeoToData
"EquiGeoToData"
leftColumn
"id"
rightColumn
rightColumnSuffix
right
"right"
The VectorJoin operator expects two vector inputs.
left
If the value of the leftColumn parameter is not a column of the left feature collection, an error is thrown.
If the value of the rightColumn parameter is not a column of the right feature collection, an error is thrown.
If the left input is not a geo data collection, an error is thrown.
If the right input is not a (non-geo) data collection, an error is thrown.
{ "type": "VectorJoin", "params": { "type": "EquiGeoToData", "leftColumn": "id", "rightColumn": "id", "rightColumnSuffix": "_other" }, "sources": { "left": { "type": "OgrSource", "params": { "data": "places", "attributeProjection": ["name", "population"] } }, "right": { "type": "OgrSource", "params": { "data": "germany_outline" } } } }
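The building scenario can be sketched as an equi-join over plain Python dictionaries. Several details are assumptions made for this illustration and are not confirmed by the text above: only matching features are kept, and every right column whose name collides with a left column (including the join column) receives the suffix. `equi_geo_to_data` is a hypothetical helper.

```python
def equi_geo_to_data(left, right, left_column, right_column, right_suffix):
    """Join geo features (left) with tabular rows (right) on equal column values."""
    # Index the right rows by their join value for constant-time lookups.
    index = {}
    for row in right:
        index.setdefault(row[right_column], []).append(row)

    joined = []
    for feature in left:
        for row in index.get(feature[left_column], []):
            merged = dict(feature)
            for key, value in row.items():
                # Assumption: colliding column names get the suffix appended.
                name = key + right_suffix if key in merged else key
                merged[name] = value
            joined.append(merged)
    return joined


buildings = [
    {"id": 1, "geom": (8.77, 50.81)},
    {"id": 2, "geom": (8.68, 50.11)},
]
heights = [
    {"id": 1, "height": 12.5},
    {"id": 3, "height": 30.0},
]

result = equi_geo_to_data(buildings, heights, "id", "id", "_other")
print(result)
```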
The VisualPointClustering is a clustering operator for point collections that removes clutter and preserves the spatial structure of the input. The output is a point collection with a count and radius attribute. The operator utilizes the input resolution of the query to determine when points, being displayed as circles, would overlap. Moreover, it allows aggregating non-geo attributes to preserve the other columns of the input. For more information on the algorithm, cf. the paper Beilschmidt, C. et al.: A Linear-Time Algorithm for the Aggregation and Visualization of Big Spatial Point Data. SIGSPATIAL/GIS 2017: 73:1-73:4.
VisualPointClustering
An exemplary use case for this operator is the visualization of point data in an online map application. There, you can use this operator as the final step of the workflow to cluster the points and display them as circles. Served via, e.g., a WFS endpoint, these circles provide a decluttered view of the data.
minRadiusPx
deltaPx
radiusColumn
"__radius"
countColumn
"__count"
columnAggregates
MeanNumber
StringSample
Null
{ "foo": { "columnName": "numericColumn", "aggregateType": "MeanNumber", "measurement": { "type": "unitless" } }, "bar": { "columnName": "textColumn", "aggregateType": "StringSample" }}
The VisualPointClustering operator expects exactly one vector input that must be a point collection.
If the source value vector is not a point collection, an error is thrown.
If multiple columns in columnAggregates have the same names, an error is thrown.
{ "type": "VisualPointClustering", "params": { "minRadiusPx": 8.0, "deltaPx": 1.0, "radiusColumn": "__radius", "countColumn": "__count", "columnAggregates": { "mean_population": { "columnName": "population", "aggregateType": "MeanNumber", "measurement": { "type": "unitless" } }, "sample_names": { "columnName": "name", "aggregateType": "StringSample" } } }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "places", "attributeProjection": ["name", "population"] } } } }
Plots are special kinds of operators that generate visualizations.
Geo Engine supports three output types:
jsonPlain
jsonVega
imagePng
Thus, plots can contain statistics, visualizations, and images.
The BoxPlot is a plot operator that computes a box plot over the selected numerical attributes of a single vector input or over multiple raster inputs.
BoxPlot
Thereby, the operator considers all data in the given query rectangle.
The boxes of the plot span the 1st and 3rd quartile and highlight the median. The whiskers indicate the minimum and maximum values of the corresponding attribute or raster.
In the case of vector data, the operator generates one box for each of the selected numerical attributes. The operator returns an error if one of the selected attributes is not numeric.
columnNames
Vec<String>
["x","y"]
For raster data, the operator generates one box for each input raster.
Raster-1
Raster-2
["A","B"]
The operator consumes exactly one vector or multiple raster operators.
MultipleRasterOrSingleVectorSource
The operator returns an error in the following cases.
If your dataset contains infinite or NAN values, they are ignored for the computation. Moreover, if your dataset contains more than 10.000 values (which is likely for rasters), the median and quartiles are estimated using the P^2 algorithm described in:
infinite
NAN
10.000
R. Jain and I. Chlamtac, The P^2 algorithm for dynamic calculation of quantiles and histograms without storing observations, Communications of the ACM, Volume 28 (October), Number 10, 1985, p. 1076-1085. https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
{ "type": "BoxPlot", "params": { "columnNames": ["x", "y"] }, "sources": { "source": { "type": "OgrSource", "params": { "data": "ndvi" } } } }
{ "type": "BoxPlot", "params": { "columnNames": ["A", "B"] }, "sources": { "source": [ { "type": "GdalSource", "params": { "data": "ndvi" } }, { "type": "GdalSource", "params": { "data": "temperature" } } ] } }
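The values behind a single box can be sketched in Python for a small input. This sketch computes exact quartiles with the standard library and does not implement the P^2 estimation used for large inputs; the interpolation method (`inclusive`) is an assumption, and `box_stats` is a hypothetical helper.

```python
import math
from statistics import median, quantiles


def box_stats(values):
    """Compute whiskers, quartiles, and median, ignoring infinite and NaN values."""
    finite = sorted(v for v in values if math.isfinite(v))
    q1, _, q3 = quantiles(finite, n=4, method="inclusive")
    return {
        "min": finite[0],          # lower whisker
        "q1": q1,                  # lower edge of the box
        "median": median(finite),  # highlighted line
        "q3": q3,                  # upper edge of the box
        "max": finite[-1],         # upper whisker
    }


stats = box_stats([1.0, 2.0, float("nan"), 3.0, 4.0, float("inf"), 5.0])
print(stats)
```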
The ClassHistogram is a plot operator that computes a histogram plot either over categorical attributes of a vector dataset or categorical values of a raster source. The output is a plot in Vega-Lite specification.
ClassHistogram
For instance, you want to plot the frequencies of the classes of a categorical attribute of a feature collection. Then you can use a class histogram to visualize and assess this.
columnName
string
"temperature"
The operator consumes either one vector or one raster operator.
The operator returns an error if…
The operator returns an error if
The operator only uses values of the categorical Measurement. It ignores missing or no-data values and values that are not covered by the Measurement.
{ "type": "ClassHistogram", "params": { "columnName": "foobar" }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "ndvi" } } } }
The FeatureAttributeValuesOverTime is a plot operator that computes a multi-line plot for feature attribute values over time. For distinguishing features, the data requires an id column. The output is a plot in Vega-Lite specification.
FeatureAttributeValuesOverTime
For instance, you want to plot the NDVI values of a feature collection of trees. Then, you can use a multi-line plot to visualize the trees by their id.
idColumn
id
valueColumn
The operator consumes exactly one vector operator.
The operator returns an error if the selected columns (idColumn and valueColumn) do not exist or valueColumn is not numeric.
The operator processes a maximum of 20 different ids. After recognizing more than 20 different ids, the operator ignores the rest.
20
{ "type": "FeatureAttributeValuesOverTime", "params": { "idColumn": "id", "valueColumn": "temperature" }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "ndvi" } } } }
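The limit of 20 ids can be illustrated with a minimal grouping sketch in Python. The row layout (a `time` key next to the id and value columns) is made up for this example, and `group_by_id` is a hypothetical helper rather than the operator's implementation.

```python
def group_by_id(rows, id_column, value_column, max_ids=20):
    """Collect (time, value) pairs per id and ignore ids beyond the first `max_ids`."""
    series = {}
    for row in rows:
        key = row[id_column]
        if key not in series:
            if len(series) >= max_ids:
                continue  # more than `max_ids` ids seen: ignore the rest
            series[key] = []
        series[key].append((row["time"], row[value_column]))
    return series


# 25 different ids with two observations each (hypothetical data).
rows = [{"id": i % 25, "time": i, "temperature": float(i)} for i in range(50)]
series = group_by_id(rows, "id", "temperature")

print(len(series))
```

Each entry of `series` corresponds to one line of the multi-line plot.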
The Histogram is a plot operator that computes a histogram plot either over attributes of a vector dataset or values of a raster source. The output is a plot in Vega-Lite specification.
Histogram
For instance, you want to plot the data distribution of numeric attributes of a feature collection. Then you can use a histogram with a suitable number of buckets to visualize and assess this.
bounds
HistogramBounds
values
{ "min": 0.0, "max": 20.0}
"data"
buckets
Number
SquareRootChoiceRule
{ "type": "number", "value": 20}
interactive
The operator returns an error if the selected column (columnName) does not exist or is not numeric.
If bounds or buckets are not defined, the operator will determine these values by itself, which requires processing the data twice.
If the buckets parameter is set to squareRootChoiceRule, the operator estimates the number of buckets using the square root of the number of elements in the data.
squareRootChoiceRule
{ "type": "Histogram", "params": { "columnName": "foobar", "bounds": { "min": 5.0, "max": 10.0 }, "buckets": { "type": "number", "value": 15 }, "interactive": false }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "ndvi" } } } }
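A minimal Python sketch of the bucket estimation and counting is given below. Rounding the square root up and treating the upper bound as inclusive are assumptions made for this example; `square_root_choice` and `histogram` are hypothetical helpers.

```python
import math


def square_root_choice(element_count):
    """Estimate the number of buckets as the (rounded-up) square root of the count."""
    return max(1, math.ceil(math.sqrt(element_count)))


def histogram(values, vmin, vmax, buckets):
    """Count values per equally sized bucket within [vmin, vmax]."""
    counts = [0] * buckets
    width = (vmax - vmin) / buckets
    for value in values:
        if not math.isfinite(value) or value < vmin or value > vmax:
            continue  # skip values outside the bounds
        # The upper bound falls into the last bucket.
        position = min(int((value - vmin) / width), buckets - 1)
        counts[position] += 1
    return counts


print(square_root_choice(100))
print(histogram([5.0, 6.0, 7.5, 10.0, 11.0], 5.0, 10.0, 5))
```

If the bounds were unknown, a first pass over the data would be needed to find the minimum and maximum before the counting pass, which matches the remark about processing the data twice.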
The MeanRasterPixelValuesOverTime is a plot operator that computes a time series plot of mean raster values. For each time step in the raster time series, it computes one mean value. The output is a plot in Vega-Lite specification.
MeanRasterPixelValuesOverTime
For instance, you want to plot the mean temperature of a monthly raster time series. Then, you can use this operator to generate a time series plot.
timePosition
center
"start"
area
The operator consumes exactly one raster operator.
{ "type": "MeanRasterPixelValuesOverTime", "params": { "timePosition": "start", "area": true }, "sources": { "raster": { "type": "GdalSource", "params": { "data": "ndvi" } } } }
The PieChart is a plot operator that computes a pie chart for a given vector dataset. Thereby, the operator considers all data in the given query rectangle.
PieChart
There are multiple variants on how to compute the slices of the pie chart. In addition, it is possible to compute a donut chart instead of a standard pie chart.
"count"
"name"
The type parameter can be one of the following values:
32
If the attribute has a Measurement of type Classification, the operator uses the class name instead of the raw value.
Classification
{ "type": "PieChart", "params": { "type": "count", "columnName": "name", "donut": false }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "ndvi" } } } }
The ScatterPlot is a plot operator that computes a scatter plot over two attributes of a vector dataset. Thereby, the operator considers all data in the given query rectangle.
ScatterPlot
In case of more than 500 points to plot, the representation changes from a regular scatter plot to a 2D Histogram with buckets determined from the underlying data.
500
columnX
"width"
columnY
"height"
The operator returns an error if one of the selected columns does not exist or is not numeric.
If your dataset contains infinite or NAN values, they are ignored for the computation. Moreover, if your dataset contains more than 10.000 values, the buckets of the histogram are generated based on those 10.000 values. Later values outside those bounds are ignored.
{ "type": "ScatterPlot", "params": { "columnX": "width", "columnY": "height" }, "sources": { "vector": { "type": "OgrSource", "params": { "data": "ndvi" } } } }
The Statistics operator is a plot operator that computes count statistics over the selected numerical attributes of a single vector input or over multiple raster inputs.
Statistics
The output is a JSON description.
For instance, you want to get an overview of a raster data source. Then, you can use this operator to get basic count statistics.
In the case of vector data, the operator generates one statistic for each of the selected numerical attributes. The operator returns an error if one of the selected attributes is not numeric.
For raster data, the operator generates one statistic for each input raster.
{ "type": "Statistics", "params": { "columnNames": ["A"] }, "sources": { "source": [ { "type": "GdalSource", "params": { "data": "ndvi" } } ] } }
{ "A": { "valueCount": 6, "validCount": 6, "min": 1.0, "max": 6.0, "mean": 3.5, "stddev": 1.707 } }
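The fields of the sample output can be reproduced with a short Python sketch. It assumes that valid values are those that are not NaN and that `stddev` is the population standard deviation; both assumptions were chosen because they agree with the sample output above for the values 1 to 6. `count_statistics` is a hypothetical helper.

```python
import math
from statistics import mean, pstdev


def count_statistics(values):
    """Compute basic count statistics, treating NaN as invalid (assumption)."""
    valid = [v for v in values if not math.isnan(v)]
    return {
        "valueCount": len(values),
        "validCount": len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": mean(valid),
        "stddev": pstdev(valid),  # population standard deviation
    }


stats = count_statistics([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(stats)
```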