Postgres / PostGIS – Of Rasters and GeoJSONs

Over the past few weeks I have been looking into Neo4j Spatial and PostGIS, the spatial plugin for Postgres. The starting point was, that I needed to store and access naturally tessellated tiles or cells, each with an extent of 100m X 100m in an area of around 370km X 240km, equalling a total of 8’880’000 individual cells. The goal is to store information into these tiles regarding some attributes which correspond to the spatial extent of the tiles. For example if you are interested in the number of animals living in a certain area or land cover collection, a spatial resolution of 100m X 100m is already quite sufficient.

My goal is for users to be able to contribute to the land cover of a cell. This means that cells will be empty until a user specifies a land cover type of such a cell. This means that I must be able to add values to the cells as time goes by. First I looked at storing all cells in the Neo4j Spatial r-Tree. I played around a bit and found that Neo4j excels at answering routing related questions, but is not very efficient when handling large amounts of naturally tessellated spatial data. I then thought, well naturally tessellated tiles in a predefined extent is nothing more than a raster, so I started to look into PostGIS.

In PostGIS the workflow consisted of creating an empty tiled raster of the extent I needed, adding bands to the raster where every band corresponded to a specific piece of information I wanted to store and then querying and updating the values in a specific extent.

First of all I created a table which will hold my raster tiles:

CREATE TABLE gameField (rid SERIAL PRIMARY KEY, rast raster);

Then I created an empty tiled raster:

            ST_MakeEmptyRaster( 3700, 2400, 2483800, 1301000, 100, -100, 0, 0, 2056)
        , 1, '32BUI'::text, 0, NULL)
    , 10,10, TRUE, NULL)

This query first creates a raster with ST_MakeEmptyRaster() with an extent of 3700 X 2400 cells, with the top left corner being at coordinates [2483800, 1301000] in the coordinate system with the SRID = 2056. It also tells the function to make every cell with an extent of 100 X 100 units of the coordinate system, in this coordinate system this means 100m X 100m. CAUTION: you might be wondering what the minus sign means before the second “100”. I found out that PostGIS needs the top-left corner to be specified (as the documentation states) but the y-values of the cells must be minus as to create the raster from the top to bottom. This I do not quite understand yet but after a long time trying to figure out why my raster was not created correctly, this was the solution.

The query then adds a band to the created raster with ST_AddBand() and specifies the band to take unsigned 32 bit integers as possible values and have the default value be 0.

The query then takes the created raster which now has 1 band and tiles this into tiles with the command ST_Tile(). The tiles are set to have an extent of 10 X 10 cells. This means, that every tile contains 10 X 10 cells of the original raster, meaning each tile has an extent of (10 X 100)m X (10 X 100)m = 1km X 1km. I tried various different tile sizes and found that 10 X 10 cells per tile proved to have the best performance with 100 X 100 cells per tile only being slightly slower. More on performance below.

All the created tiles are finally stored into the “rast” column of the gameField table, which we created in the first step.

Now I have a tiled raster of the extend I want at the location I want with 1 band allowing unsigned 32bit integer values to be stored. This is already great, but we can boost the performance considerably by creating a spatial index on the tiles:

CREATE INDEX gameField_rast_gist_idx ON gameField USING GIST (ST_ConvexHull(rast));

This creates a new spatial intex using the convex hull (which in this case equals the bounding box) of the individual tiles.

Now I have 1 band, but I really need another band, this is no problem as bands can be added to existing rasters:

    rast = ST_AddBand(rast, 2, '32BUI'::text, 0, NULL);

Now I have a raster with an index and 2 bands. You might want to change values of cells in specific locations. This is no problem and can be achieved with the query:

    rast = ST_SetValue(rast, 1, ST_Transform(ST_SetSRID(ST_MakePoint(6.25,45.25),4326),2056),987654321) 
    ST_Intersects(rast, ST_Transform(ST_SetSRID(ST_MakePoint(6.25,45.25),4326),2056));

Lets go through the query. I first state that I would like to update the raster column of the table gameField. I then say that the rast should equal some function. The function ST_SetValue() allows values of cells at a specific location to be changed. I declare that I would like to set a new value in the rast column, in band 1 (second parameter of the function) at the location of the point (third parameter). The point is created in the function with ST_MakePoint(). A new point does not have a coordinate system and so one has to be assigned. I define that the point has the coordinate system of WGS84 (SRID = 4326) and then transform it to my specific coordinate system (SRID = 2056) with the function ST_Transform(). I deliberately added this step so that the code could be used with GPS-data, which is mostly in WGS84.

This would already suffice to change the value, but as you might remember, I created an index and using ST_Intersects() greatly increases performance! The function says, only look at the raster which has an intersection with the point specified as the second parameter.

Last but not least, I wanted to get data from the database. As I am interested in efficiently visualizing the results, I only want cells whose centroids are within a radius of 400m of a specific position. In addition, I would like a GeoJSON to be returned with the coordinates being the coordinates of the centroids and the properties containing the values of the two bands along with a buffered bounding box of the cells. The GeoJSON can then easily be integrated into mapping libraries such as Leaflet or OpenLayers.

    row_to_json(fc) as GeoJSON
        'FeatureCollection' as type, 
        array_to_json(array_agg(f)) as features
    FROM (
        SELECT 'Feature' as type,
        (ST_AsGeoJSON(ST_Transform(ST_SetSRID((b.UID).geom,2056),4326))::json ) as geometry,
        SELECT row_to_json(t) 
            FROM (select ST_X(b.BBOX_BL) as blx, ST_Y(b.BBOX_BL) as bly, ST_X(b.BBOX_TR) as trx, ST_Y(b.BBOX_TR) as try, (b.UID).val as UID, (b.BID).val as BID ) t
        ) as properties
        FROM (
                c.UID as UID,
                c.BID as BID,
                ST_Transform(ST_SetSRID(ST_MakePoint(ST_X((c.UID).geom)-49,ST_Y((c.UID).geom)-49),2056),4326) AS BBOX_BL,
                ST_Transform(ST_SetSRID(ST_MakePoint(ST_X((c.UID).geom)+49,ST_Y((c.UID).geom)+49),2056),4326) AS BBOX_TR
            FROM (                        
                    (ST_PixelAsCentroids(u.r,1,false)) As UID,
                    (ST_PixelAsCentroids(u.r,2,false)) As BID
                    SELECT ST_Union(rast) as r
                    FROM gameField
                    WHERE ST_Intersects(rast, ST_Buffer(ST_Transform(ST_SetSRID(ST_MakePoint(".$lng.",".$lat.") ,4326),2056), 400))
                ) as u
            ) as c
                    ST_Intersects((c.UID).geom, ST_Buffer(ST_Transform(ST_SetSRID(ST_MakePoint(".$lng.",".$lat.") ,4326),2056), 400))
                AND ST_Intersects((c.BID).geom, ST_Buffer(ST_Transform(ST_SetSRID(ST_MakePoint(".$lng.",".$lat.") ,4326),2056), 400))

        ) as b   
    ) as f 
)  as fc;

The first thing that happens in this query is that the tiles which are within 400m of a specific point (".$lng.",".$lat.") are returned with ST_Intersects() and ST_Buffer() and then joined to a new raster with ST_Union(). The coordinates of the point are not hard coded as this query is part of an API which takes a location as an input and returns all cells whose centroid are within 400m of mentioned location. From this new raster, the centroids and values of the cells are returned with ST_PixelAsCentroids(), where the first parameter is a reference to the new raster containing all tiles within 400m of the point and the second parameter defines the band out of which the values should be taken. In addition, the WHERE clause defines that only the cells within 400m are to be returned. The rest of the query returns the values of the cells along with the respective bounding box of each cell (BBOX_BL, BBOX_TR) as properties and the coordinates of the centroid as the geometry.coordinates of the GeoJSON with the function ST_AsGeoJSON().

The resulting GeoJSON looks like this:


Where the coordinates in the “geometry” key of the GeoJSON are the coordinates of the centroid of each cell. The properties blx,bly,trx,try correspond to the bottom left X, bottom left Y, top right X and top right Y coordinates of the bounding box. The property “uid” corresponds to the cell value of BAND 1 in the raster at the feature location and the property “bid” corresponds to the cell value of BAND 2 in the raster at the location of the feature.

This pipeline can be used to efficiently store and retrieve values from a large raster. For example, if you wanted to make an app where users can contribute to the land cover mapping efforts, you could make 2 bands. The first band would store the ID of the user who contributed and the second band would hold the ID of the land cover type. This allows you to query not only the different land cover types, but also which user contributed in which location. If you need to store additional information or attributes, you could add bands or you could use the raster as a spacial index where each cell has a unique ID, which is used as a reference to an additional attribute table with the format: Cell_ID | Attribute 1 | Attribute 2 | ... | Attribute n.
Land cover is just one case, you could easily apply the same workflow to a plethora of different cases dealing with the storing and querying of spatial data in large areas.

The image below shows the result in Leaflet:

The image shows the position of the user (red dot), which acts as the location in our query. The green circle shows the area which is in a radius of 400m of the user. The rectangles show all the tiles whose centroids are within the green circle, where the grey rectangles have a value of “0” and the orange rectangle (middle right) has a value of “>0”. in addition, the blue circles show the locations of the centroids of each cell. This is nothing more than a visualisation of the GeoJSON returned by our query discussed above.

A quick word on performance. As I have already stated on stackoverflow the performance of queries is highly dependent on tiling and indexing:

Test Raster Extent: 5000 X 2000 = 10’000’000 cells

Querying the raster no tiling, no index: 522ms
Querying the raster with tiling, no index: 227ms
Querying the raster with tiling, with index: 84ms

Share this post