PBF error: illegal blob size
What version of osmium-tool are you using?
osmium version 1.16.0 (v1.16.0)
libosmium version 2.20.0
What operating system version are you using?
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
Tell us something about your system
- https://cloudprice.net/vm/Standard_E64ds_v5
- 512 GB RAM
- 64 vCPUs
What did you do exactly?
My intention is to extract Delaware out of USA. These are the steps that I take to achieve this:
- Get boundaries of USA:
osmium tags-filter --output USA.boundary.pbf --overwrite --no-progress USA.osm.pbf r/ISO3166-1:alpha3=USA
- Get boundaries of Delaware out of USA admin boundaries file:
osmium tags-filter --output US-DE.boundary.pbf --overwrite --no-progress USA.boundary.pbf r/ISO3166-2=US-DE
- Create a config.json file for the extract command: config.json
- Extract Delaware out of USA using its boundaries:
osmium extract --config config.json --option complete-partial-relations=65 --strategy smart --verbose --overwrite --no-progress USA.osm.pbf
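The config.json is only attached to the report, not shown inline, so its exact contents are unknown. As a rough sketch, a config for this step could be generated as follows. The key names (extracts, output, multipolygon, file_name, file_type) follow the osmium-extract manual as I understand it; the helper build_extract_config and the file names passed to it are assumptions made for illustration.

```python
import json


def build_extract_config(output, boundary_file):
    """Return a config dict for `osmium extract --config` with one extract.

    Hypothetical helper: the boundary is read from an OSM file that is
    assumed to contain the boundary relation together with its members.
    """
    return {
        "extracts": [
            {
                "output": output,
                "multipolygon": {
                    "file_name": boundary_file,
                    "file_type": "osm",
                },
            }
        ]
    }


if __name__ == "__main__":
    # File names are assumptions based on the steps described above.
    config = build_extract_config("Delaware.osm.pbf", "US-DE.boundary.pbf")
    print(json.dumps(config, indent=4))
```

Redirecting the printed JSON into config.json gives a file that can be passed to the extract command above.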
These commands go through successfully and create the intended Delaware file, but the Delaware file seems to be corrupted. When I run osmium fileinfo -e Delaware.osm.pbf, it crashes showing the following error:
PBF error: illegal blob size
File:
Name: Delaware.osm.pbf
Format: PBF
Compression: none
Size: 650160326
Header:
Bounding boxes:
With history: no
Options:
generator=osmium/1.16.0
pbf_dense_nodes=true
pbf_optional_feature_0=Sort.Type_then_ID
sorting=Type_then_ID
What did you expect to happen?
Running osmium fileinfo -e Delaware.osm.pbf should not crash saying PBF error: illegal blob size.
What did you do to try analyzing the problem?
At the beginning I suspected that the input data is corrupted, so I executed the following commands on USA.osm.pbf:
- osmium fileinfo -e USA.osm.pbf
- osmium check-refs USA.osm.pbf
Both of the above commands show the statistics properly and report that there are no referential integrity issues. It is also worth noting that I am able to cut all the other US states, and all of them pass the fileinfo and check-refs commands.
To summarize, my input PBF is valid (it passes both commands above), but the output PBF produced by extract is corrupted (neither command completes on it). Given this, would you agree that there might be an issue in osmium-tool? Do you need me to send you some pieces of the data?
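To narrow down which blob is affected, one option is to walk the blob framing of the output file and print each blob's size. The sketch below is not libosmium code. It assumes the layout from the OSM PBF format description: a 4-byte big-endian BlobHeader length, then a BlobHeader with type in field 1 and datasize in field 3, then a Blob with raw in field 1 and raw_size in field 2. It also assumes the documented limits of 64 KiB for the header and 32 MiB for the uncompressed blob. The function names are hypothetical.

```python
import struct

MAX_BLOB_HEADER_SIZE = 64 * 1024                # header must be smaller than this
MAX_UNCOMPRESSED_BLOB_SIZE = 32 * 1024 * 1024   # blob must be smaller than this


def read_varint(buf, pos):
    """Decode one protobuf varint at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, value) for the top-level fields of a message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 7
        if wire_type == 0:       # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:     # length-delimited (bytes or string)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire_type}")
        yield field, value


def scan_blobs(stream):
    """Yield (index, type, datasize, raw_size) for each blob in a PBF stream."""
    index = 0
    while True:
        prefix = stream.read(4)
        if not prefix:
            return
        if len(prefix) < 4:
            raise ValueError("truncated BlobHeader length")
        (header_len,) = struct.unpack(">I", prefix)
        if header_len >= MAX_BLOB_HEADER_SIZE:
            raise ValueError(f"blob {index}: BlobHeader too large ({header_len})")
        header = dict(iter_fields(stream.read(header_len)))
        blob_type = header[1].decode("utf-8")
        datasize = header[3]
        blob = dict(iter_fields(stream.read(datasize)))
        # raw_size is only present for compressed blobs; fall back to len(raw).
        raw_size = blob.get(2, len(blob[1]) if 1 in blob else None)
        yield index, blob_type, datasize, raw_size
        index += 1


def oversized_blobs(stream):
    """Return the blobs whose uncompressed size reaches the assumed limit."""
    return [entry for entry in scan_blobs(stream)
            if entry[3] is not None and entry[3] >= MAX_UNCOMPRESSED_BLOB_SIZE]
```

Opening Delaware.osm.pbf in binary mode and passing the file object to oversized_blobs would list any blob whose declared uncompressed size is at or above the limit, which may indicate where the writer produced a block that is too large.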
I can not reproduce this. I used the USA file from Geofabrik and did the steps you describe and the resulting file is fine.
I had to repair the config file you provided; it does not work otherwise. Are you sure the osmium extract command even ran? Maybe the broken Delaware.osm.pbf is left over from an earlier attempt?
The other thing: The Delaware.osm.pbf you seem to have is rather large, 650 MB. Mine is only 19 MB.
The input PBF file that I have for USA is indeed enriched with other data sources. That's why it's bigger. Would it be ok for you if I upload a piece of USA that I'm working with somewhere, and send it to you to reproduce the issue? Shall I send the link privately to [email protected] ? Otherwise I'd violate my company's data privacy.
Also what was the repair that you did on the config file?
Are you sure the osmium extract command even ran? Maybe the broken Delaware.osm.pbf is left over from an earlier attempt?
Yes, I'm pretty sure that osmium extract ran, since I see osmium's output for the extract command.
Ah, you should have mentioned that you are working with proprietary data. If you want support for that, please contact me by email and I'll send you my consulting rates.
@joto thank you for your answer.
If you want support for that, please contact me by email and I'll send you my consulting rates.
Ok. If we decide that we definitely must send our proprietary data to you, I'll hand over your consulting fee to my manager.
My suspicion is about the can_add function in libosmium. Please see here:
https://github.com/osmcode/libosmium/blob/f88048769c13210ca81efca17668dc57ea64c632/include/osmium/io/detail/pbf_output_format.hpp#L362
In this line, return size() < max_used_blob_size;, we check whether the current size of the blob is less than the maximum allowed blob size, and then we add the entity to the blob without checking whether the entity itself is large enough to push the blob over the limit.
In other words, we don't check size() + sizeOfEntityToBeAdded < max_used_blob_size.
Is my understanding of the can_add function correct?
Side note: as a workaround, I changed max_entities_per_block to 1000 (instead of the default 8000) and recompiled osmium and libosmium. The PBFs that osmium extract creates with this custom build are valid (no more illegal blob size).
Would you agree that this is a sign that the can_add function does not always return the correct value, as I described above?
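The argument in this comment can be illustrated with a toy model. This is not libosmium code: pack_blocks, its parameters, and the sizes used are hypothetical and only mimic the two checks being compared, plus the per-block entity cap used in the workaround.

```python
def pack_blocks(entity_sizes, max_block_size, max_entities, check_incoming=False):
    """Group entities into blocks and return the resulting block sizes.

    check_incoming=False models `size() < max`: only the current block size
    is compared against the limit before the next entity is appended.
    check_incoming=True models `size() + incoming < max`.
    """
    blocks = []
    size = 0
    count = 0
    for entity in entity_sizes:
        if check_incoming:
            fits = size + entity < max_block_size
        else:
            fits = size < max_block_size
        can_add = count < max_entities and fits
        # An empty block always accepts one entity, otherwise nothing
        # could ever be written for an entity larger than the limit.
        if count > 0 and not can_add:
            blocks.append(size)
            size = 0
            count = 0
        size += entity
        count += 1
    if count:
        blocks.append(size)
    return blocks


if __name__ == "__main__":
    sizes = [40, 40, 40]
    print(pack_blocks(sizes, max_block_size=100, max_entities=8000))
    print(pack_blocks(sizes, max_block_size=100, max_entities=8000, check_incoming=True))
    print(pack_blocks(sizes, max_block_size=100, max_entities=2))
```

In this model the first call produces one block of 120 units, which is over the limit of 100, while the other two calls keep every block under the limit. Whether real data can hit this case depends on how large individual entities are compared to the headroom between the threshold and the hard limit, which this toy model does not capture.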
The can_add function returns what it is supposed to return. Libosmium works as intended for OSM data. Making libosmium work for non-OSM data is out of scope and I'll only work on that if somebody pays me for it. So unless somebody can demonstrate that this is a problem for OSM data or pays me for it, this will not change.