xg-old
xg-old copied to clipboard
Indexing small graphs blows up when including many paths
I have a relatively small (34MB) graph that when attempting to index uses over 16GB of RAM and is subsequently killed on my machine. @ekg thinks this may be caused by inefficiencies in storing small paths. As paths are stored as bit vectors over the entire graph (rather than storing them with a start and end position), short paths still take up O(length of graph) space, so having many of them in one graph can invite chaos.
How many paths are in your tiny graph? Is there a test case we could attach?
@adamnovak's implementation of succinct threads is a preferable way to do this. Construction is still linear in the number of threads, which isn't ideal, but it will help a lot in terms of memory usage. What needs to be done is to specify a set of positional paths which should be indexed using the O(1) technique, which is expensive and not compressible. The rest can still be queried or converted into positional paths as needed.