oshdb icon indicating copy to clipboard operation
oshdb copied to clipboard

Update ETL docu

Open SlowMo24 opened this issue 4 years ago • 10 comments

Bug Using the current ETL docu creates an H2-file that cannot be directly used in the Ohsome-API ("extract.region" not parsable -> it is empty).

To Reproduce Run the ETL as documented here: https://github.com/GIScience/oshdb/tree/master/oshdb-tool/etl then start a local ohsome-api as docuemented here: https://gitlab.gistools.geog.uni-heidelberg.de/giscience/big-data/ohsome/ohsome-api/tree/master

SlowMo24 avatar Apr 15 '20 11:04 SlowMo24

@FabiKo117 the problem here is that the extract.region is not an mandatory parameter (--ploy or --bbox) of the Extract step.

rtroilo avatar Apr 15 '20 13:04 rtroilo

But I think the ohsome API would always need an info about the extract region to react accordingly if a request is partially/fully outside of the region where data is available.

FabiKo117 avatar Apr 15 '20 14:04 FabiKo117

the ohsome API would always need an info about the extract region

yeah, but the ohsome API is not the only thing in the world potentially using an OSHDB extract. :sweat_smile:

There are also theoretically some situations where the region is unknown (e.g. when transforming an OSM history file from an unknown source). Therefore it IMO makes sense that this is an optional parameter in the ETL tool.

--

I think this is simply "just" a missing documentation problem, not really a bug.

tyrasd avatar Apr 16 '20 10:04 tyrasd

There are 2 more infos in the extract,

  • header.bbox (which comes from the pbf header itself, not always set!) and
  • data.bbox (which is computed from the node locations)

I'm not sure if we want this but the ohsome-api could fall back to one of those with a warning.

rtroilo avatar Apr 16 '20 10:04 rtroilo

I think our tools should work seamlessly together with default config. The Ohsome-api handles missing metadata just fine i guess. So in my opinion ETL could also either place a parsable dummy value for extract.region if it is not given or omit the information. ATM it creates an invalid geojson which crashes the ohsome api.

SlowMo24 avatar Apr 16 '20 11:04 SlowMo24

yeah, but the ohsome API is not the only thing in the world potentially using an OSHDB extract.

I did not say that it should become a mandatory parameter :P

I am totally fine with just adding this info to the docu, so people that want to use it for the ohsome API know what they have to do.

FabiKo117 avatar Apr 16 '20 11:04 FabiKo117

When documenting it also has to be mentioned that the ohsome-API only accepts a GeoJSON-geometry, not a Feature or FeatureCollection. I just faced that problem with an extract downloaded from our download server.

SlowMo24 avatar Apr 16 '20 12:04 SlowMo24

data.bbox (which is computed from the node locations) […] the ohsome-api could fall back to one of those with a warning.

@rtroilo In general, we don't want to fall back to this in the ohsome-api. This is because the computed bbox of an OSM (history) extract is not suitable to determine the "usable coverage area" of the extract. A user of the ohsome-api would not see the warning, if there was one while starting up the API.

tyrasd avatar Apr 16 '20 15:04 tyrasd

So in my opinion ETL could also either place a parsable dummy value for extract.region if it is not given or omit the information.

@SlowMo24 What dummy value should be included? something like {"type": "Polygon", "coordinates": [[[0,0],[0,0],[0,0],[0,0],[0,0]]]} would be the only "not incorrect" thing to put in as a dummy value, but isn't very useful: Instead of crashing the ohsome-api's startup routine, it would kind of crash every incoming request (which is worse in my opinion).

IMHO the current situation is almost fine: The ohsome-api requires special metadata field to be set, which the ETL tool cannot determine and set automatically. Maybe instead of crashing the ohsome api should print a meaningful error message, for example:

Required field "extract.region" in metadata table of oshdb source not found. If you work with a self-produced OSHDB H2 extract, please don't forget to specify the necessary extract region information as documented in .

tyrasd avatar Apr 16 '20 15:04 tyrasd

This will be addressed by @rtroilo when re-integrating the importer.

Hagellach37 avatar Oct 20 '22 13:10 Hagellach37