osrm-backend icon indicating copy to clipboard operation
osrm-backend copied to clipboard

Do not generate intermediate .osrm file in osrm-extract.

Open SiarheiFedartsou opened this issue 3 years ago • 1 comments

Issue

closes https://github.com/Project-OSRM/osrm-backend/issues/6329

Tasklist

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

SiarheiFedartsou avatar Aug 31 '22 18:08 SiarheiFedartsou

Some things to discuss:

  1. Now in order to generate .osrm file user has to pass --dump-nbg-graph flag to osrm-extract. Is it a good naming? Should we may be make it default and allow to disable by CLI option? I can imagine that someone somehow relies on this file and uses osrm-components with it(it is really low chance IMO, but still).
  2. The only place where this .osrm can be used is osrm-components tool which is as I understand purely for debug, so IMO .osrm feels a bit "loud" name for such kind of files. May be rename it to something?
  3. We suggest to pass this .osrm file everywhere in the docs as if it is really used as input file(e.g. ./osrm-customize ./berlin-latest.osrm). Actually it is just a base name. So I suggest to just remove this extension from everywhere: like ./osrm-partition ./berlin-latest. But for backward compatibility leave the ability to work if it was passed in the old way(we may emit warning if it used this way). I can imagine that a lot of people have working pipelines with this .osrm in them.

@mjjbell WDYT?

SiarheiFedartsou avatar Sep 01 '22 16:09 SiarheiFedartsou

@mjjbell can you take a look again?

And also let's discuss these questions:

Some things to discuss:

  1. Now in order to generate .osrm file user has to pass --dump-nbg-graph flag to osrm-extract. Is it a good naming? Should we may be make it default and allow to disable by CLI option? I can imagine that someone somehow relies on this file and uses osrm-components with it(it is really low chance IMO, but still).
  2. The only place where this .osrm can be used is osrm-components tool which is as I understand purely for debug, so IMO .osrm feels a bit "loud" name for such kind of files. May be rename it to something?
  3. We suggest to pass this .osrm file everywhere in the docs as if it is really used as input file(e.g. ./osrm-customize ./berlin-latest.osrm). Actually it is just a base name. So I suggest to just remove this extension from everywhere: like ./osrm-partition ./berlin-latest. But for backward compatibility leave the ability to work if it was passed in the old way(we may emit warning if it used this way). I can imagine that a lot of people have working pipelines with this .osrm in them.

@mjjbell WDYT?

For 1 I doubt someone really relies on it and it is super straightforward to just pass this additional flag if it is really needed.

For 3 I now more inclined to leave this .osrm extension everywhere with a note that it is not a specific file anymore, but base path for all other files we need(all of them have .osrm.* pattern in name)

SiarheiFedartsou avatar Sep 28 '22 14:09 SiarheiFedartsou

Now in order to generate .osrm file user has to pass --dump-nbg-graph flag to osrm-extract. Is it a good naming? Should we may be make it default and allow to disable by CLI option? I can imagine that someone somehow relies on this file and uses osrm-components with it(it is really low chance IMO, but still).

The only place where this .osrm can be used is osrm-components tool which is as I understand purely for debug, so IMO .osrm feels a bit "loud" name for such kind of files. May be rename it to something?

It's the internal node-based-graph representation prior to compression, creation of dummy edges and other stuff that doesn't exist in the OSM data. So I would say it's the correct arg name, but I agree that it's not obvious that this should generate an .osrm file.

Renaming it to .nbg seems like a good fit with the other toolchain output, given we have already have the OSM IDs and coordinates in .nbg_nodes, and the compressed graph in .cnbg.

We suggest to pass this .osrm file everywhere in the docs as if it is really used as input file(e.g. ./osrm-customize ./berlin-latest.osrm). Actually it is just a base name. So I suggest to just remove this extension from everywhere: like ./osrm-partition ./berlin-latest. But for backward compatibility leave the ability to work if it was passed in the old way(we may emit warning if it used this way). I can imagine that a lot of people have working pipelines with this .osrm in them.

Thinking longer-term, I can see an approach where we pass around a metadata/config file, that includes information about the settings used during processing, and detailing which datasets have been generated (e.g. if we wanted to support optional loading), which maybe could be a good use for an .osrm file? In which case it would be beneficial to keep the instructions for passing that as the argument. What do you think?

mjjbell avatar Sep 29 '22 19:09 mjjbell

For 3 I now more inclined to leave this .osrm extension everywhere with a note that it is not a specific file anymore, but base path for all other files we need(all of them have .osrm.* pattern in name)

Sorry I'm reading everything in the wrong order. If that's the case, then I agree, and it also aligns with the potential of a config/metadata file in the future.

mjjbell avatar Sep 29 '22 20:09 mjjbell

For 1 I doubt someone really relies on it and it is super straightforward to just pass this additional flag if it is really needed.

Agreed. Also, given the .osrm file is marked as temporary in the toolchain, I don't think users can have expectations of stability.

mjjbell avatar Sep 29 '22 20:09 mjjbell