osm-read
Why not store IDs as BigInt?
Once I obtain the records containing string ids (including the referenced nodes in the ways) I create new BigInt objects to replace their string representations.
Has any consideration been made of parsing them into BigInt values within osm-read?
Perhaps it would be a useful option to have if it were not to be done by default. Making it an option would avoid breaking changes for those who expect string values.
I assume you create the BigInt by invoking it using the string id? For example: BigInt(id)
If this is the case, I'm not sure that adding this behavior to osm-read behind a feature flag is worth the effort. People who need the id in a number representation can easily do the conversion themselves.
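For reference, the caller-side conversion being discussed could look roughly like the sketch below. The helper name `idsToBigInt` is made up for illustration, and the `nodeRefs` field name is an assumption about the shape of the way records; adjust it to whatever the callback actually receives.

```javascript
// Hypothetical helper: returns a copy of a record with its string ids
// replaced by BigInt values. `nodeRefs` is an assumed field name for the
// node references carried by a way.
function idsToBigInt(record) {
  const converted = { ...record, id: BigInt(record.id) };
  if (Array.isArray(record.nodeRefs)) {
    converted.nodeRefs = record.nodeRefs.map((ref) => BigInt(ref));
  }
  return converted;
}

// Made-up sample records, not real OSM data.
const sampleNode = idsToBigInt({ id: '1234567890123', lat: 51.5, lon: -0.1 });
const sampleWay = idsToBigInt({ id: '42', nodeRefs: ['1234567890123', '7'] });
```

The original record is left untouched, so code that still expects string ids keeps working.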
Are there any more benefits of parsing the id within osm-read which I have missed @metabench ?
The earlier it's represented as BigInt, the less time strings longer than 8 bytes need to be stored. It's not a big efficiency difference.
Getting the data from osm-read in the most appropriate type is the largest advantage as far as I can tell. Would make programming it easier and maybe a bit more performant.
There would likely be less processing to do between the data that's stored in the protobuf and having usable output if it were parsed as BigInt. I don't know whether or not there is anything in the osm-read codebase that would make it difficult to do, such as relying on a schema or dependency which already parses them into strings.
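To illustrate what "less processing" could mean, here is a minimal sketch of decoding a protobuf base-128 varint straight into a BigInt. It is not taken from osm-read and says nothing about how its parser or dependencies work; the function name `readVarintAsBigInt` is hypothetical, and zigzag and delta decoding are left out.

```javascript
// Hypothetical sketch: decode one unsigned base-128 varint from a byte array
// directly into a BigInt, without going through a string.
function readVarintAsBigInt(bytes, offset = 0) {
  let result = 0n;
  let shift = 0n;
  let pos = offset;
  while (pos < bytes.length) {
    const byte = bytes[pos++];
    // The low 7 bits carry payload, least significant group first.
    result |= BigInt(byte & 0x7f) << shift;
    // A clear high bit marks the last byte of the varint.
    if ((byte & 0x80) === 0) {
      return { value: result, next: pos };
    }
    shift += 7n;
  }
  throw new RangeError('truncated varint');
}

const decoded = readVarintAsBigInt(Uint8Array.of(0xac, 0x02));
```

Here `decoded.value` is `300n` and `decoded.next` is `2`, the offset of the byte after the varint.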
Looking at various TODOs such as https://github.com/marook/osm-read/blob/411aba24bc0e413d29d60e0249453c11ff1b8a52/lib/pbfParser.js#L335 , there is no problem with integers of the size we get in OSM PBF files, such as for high node IDs. File positions beyond 2^32 are also fine.
"The Number.MAX_SAFE_INTEGER constant represents the maximum safe integer in JavaScript (2^53 – 1)." - MDN Web Docs.
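A quick way to see where that limit bites is sketched below. The id string is a made-up value just past 2^53, not a real OSM id, and `fitsInNumber` is a hypothetical helper name.

```javascript
// 2^53 - 1 is the largest integer a Number can hold without gaps.
const maxSafe = Number.MAX_SAFE_INTEGER;

// Beyond that, neighbouring integers collapse onto the same double.
const collides = 2 ** 53 + 1 === 2 ** 53;

// BigInt keeps the exact value; converting back to Number rounds it.
const exact = BigInt('9007199254740993');
const rounded = Number(exact);

// Hypothetical check: can this id (string or BigInt) be held as a Number?
const fitsInNumber = (id) => Number.isSafeInteger(Number(id));
```

With these definitions `collides` is `true`, `fitsInNumber('123')` is `true`, and `fitsInNumber('9007199254740993')` is `false`.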
It's worth noting that the bits beyond the low 32 are lost when doing bitwise operations such as '>>>', because those operators convert their operands to 32-bit integers first.
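A small sketch of that truncation, along with two ways around it (plain arithmetic on Numbers, or BigInt shifts). The value used is arbitrary.

```javascript
// A value above 2^32 that is still a safe integer.
const position = 2 ** 32 + 5; // 4294967301

// '>>>' converts the operand to an unsigned 32-bit integer first,
// so everything above bit 31 is dropped.
const truncated = position >>> 0;

// Arithmetic keeps the full value: split into high and low 32-bit halves.
const highWord = Math.floor(position / 2 ** 32);
const lowWord = position % 2 ** 32;

// BigInt supports '>>' on the full value (it has no '>>>' operator).
const highViaBigInt = BigInt(position) >> 32n;
```

Here `truncated` is `5`, while `highWord` is `1` and `lowWord` is `5`, so the two halves together still describe the original value.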
When representing these numbers in a TypedArray, 64-bit integer types should be used, i.e. BigInt64Array or BigUint64Array (signed or unsigned will work, but I go for unsigned when I am only supporting unsigned numbers).
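As a minimal sketch of that, string ids can be packed into a `BigUint64Array`, which stores and returns BigInt values. The ids below are made up to exercise the range, not real OSM ids.

```javascript
// Made-up ids, including the largest unsigned 64-bit value.
const idStrings = ['1', '9007199254740993', '18446744073709551615'];

// 64-bit typed arrays only accept BigInt elements, so convert first.
const packedIds = BigUint64Array.from(idStrings.map((s) => BigInt(s)));

// The same bytes viewed as signed: the top value reads back as -1n,
// which is why unsigned is the safer choice for unsigned-only data.
const signedView = new BigInt64Array(packedIds.buffer);
```

Each id takes 8 bytes, so `packedIds.byteLength` is `24` here, and `packedIds[1]` reads back exactly as `9007199254740993n`.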