Large memory consumption (10 GB RAM), option to reduce it?
First of all, thank you so much for making this tool, I can't wait for v0.3! That said, would it be possible to add an option to reduce memory consumption (e.g. streaming data from the CSV files, or caching data on disk; see the sketch below)? It doesn't matter if processing takes longer, but larger GTFS feeds can currently take up to 10 GB of memory, which can hit limits on cloud servers.
For example: https://transitfeeds.com/p/ov/814/latest/download
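By streaming I mean something along these lines (a rough sketch of the idea only, not gtfsparser's actual code; the file name and row handling are placeholders):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
)

// Read stops.txt row by row instead of loading the whole file,
// so only one record is held in memory at a time.
func main() {
	f, err := os.Open("stops.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r := csv.NewReader(f)

	// Print the header immediately; with ReuseRecord set below,
	// the returned slice is overwritten by later Read calls.
	header, err := r.Read()
	if err != nil {
		panic(err)
	}
	fmt.Println("columns:", header)

	r.ReuseRecord = true // reuse the backing slice to avoid per-row allocations

	for {
		rec, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		_ = rec // process the record here, then let it go
	}
}
```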
Unfortunately, no. I have been planning for a while now to add a mode to gtfsparser which stores all data on disk and only holds ID->disk references in memory, though.
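Very roughly, the idea is something like this (a minimal sketch, not an actual gtfsparser API; the length-prefixed record format is made up for illustration):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// diskStore appends length-prefixed records to a file and keeps
// only an ID -> file-offset map in memory.
type diskStore struct {
	f       *os.File
	offsets map[string]int64
}

func (s *diskStore) put(id string, data []byte) error {
	off, err := s.f.Seek(0, io.SeekEnd)
	if err != nil {
		return err
	}
	// 4-byte length prefix, then the payload
	var lenBuf [4]byte
	binary.LittleEndian.PutUint32(lenBuf[:], uint32(len(data)))
	if _, err := s.f.Write(lenBuf[:]); err != nil {
		return err
	}
	if _, err := s.f.Write(data); err != nil {
		return err
	}
	s.offsets[id] = off
	return nil
}

func (s *diskStore) get(id string) ([]byte, error) {
	off, ok := s.offsets[id]
	if !ok {
		return nil, fmt.Errorf("unknown ID %q", id)
	}
	var lenBuf [4]byte
	if _, err := s.f.ReadAt(lenBuf[:], off); err != nil {
		return nil, err
	}
	data := make([]byte, binary.LittleEndian.Uint32(lenBuf[:]))
	_, err := s.f.ReadAt(data, off+4)
	return data, err
}

func main() {
	f, err := os.CreateTemp("", "gtfs-store-")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	s := &diskStore{f: f, offsets: make(map[string]int64)}
	if err := s.put("trip1", []byte("stop_a,08:00:00")); err != nil {
		panic(err)
	}
	b, err := s.get("trip1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // -> stop_a,08:00:00
}
```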
I'm not very well-versed in the technique, but would an (optional) mode that uses memory mapping work? That way, not all data would have to be stored in memory at once, but (presumably) it wouldn't affect gtfstidy's architecture as much as other solutions would.
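Something like this, perhaps (a minimal sketch using golang.org/x/exp/mmap; whether this would fit gtfsparser's internals is an open question):

```go
package main

import (
	"fmt"

	"golang.org/x/exp/mmap"
)

// Map stop_times.txt into the address space instead of reading it
// into a []byte; the OS pages data in and out on demand, so the
// process's resident memory stays small even for large files.
func main() {
	r, err := mmap.Open("stop_times.txt")
	if err != nil {
		panic(err)
	}
	defer r.Close()

	// Read an arbitrary slice of the file without loading all of it.
	n := r.Len()
	if n > 64 {
		n = 64
	}
	buf := make([]byte, n)
	if _, err := r.ReadAt(buf, 0); err != nil {
		panic(err)
	}
	fmt.Printf("first bytes: %q (file size: %d)\n", buf, r.Len())
}
```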
@patrickbr @derhuerst That would be great! For now, increasing swap space seems to work, but that's obviously not the best option. Memory mapping could also be worth looking into!
I noticed another thing with stop time minimization (-T). The trip IDs are deleted (obviously a result of converting the trips to frequencies), but when I tried using keep-trip-ids I couldn't find the deleted trip IDs in the output. An additional column in frequencies.txt (e.g. minimized_trip_ids) that maps all deleted trips to the trip_id of the frequency entry would be great; see the example below. The use case would be keeping all trip IDs so they can be matched against GTFS-realtime data.
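For example, frequencies.txt could look like this (trip_id, start_time, end_time, and headway_secs are the standard GTFS columns; minimized_trip_ids is the hypothetical new column, here with space-separated IDs inside a quoted field):

```csv
trip_id,start_time,end_time,headway_secs,minimized_trip_ids
trip_1,06:00:00,10:00:00,600,"trip_1 trip_2 trip_3"
trip_4,10:00:00,16:00:00,900,"trip_4 trip_5"
```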