pollen
pollen copied to clipboard
`calyx_depth.py`: use the eDSL, stop using odgi's Python bindings
Currently, calyx_depth.py generates Calyx code
- Without the use of the Calyx eDSL (aka the builder library).
-
With the use of the odgi Python bindings (it's not obvious to see, but it uses
parse_data
which in turn uses the Python bindings).
I propose to change both of these.
- Using the eDSL will lighten the code significantly.
- The odgi Python bindings have proved hard to use. Besides, the current code uses the bindings only to infer the dimensions of the graph, and we have in-house ways of doing that now.
These relatively minor gains aside, the true win is that, after this, we will truly be using odgi only as a source of inspiration and oracle-testing, not as part of our workflow.
We currently do something like:
-
graph.gfa --odgi build--> graph.og --parse_data.py--> graph.data
-
calyx_depth.py (with one call to parse_data.py to learn the graph's dimensions) --> calyx_depth.futil
-
calyx_depth.futil
tangoes withgraph.data
in the usual Calyx way.
My proposal is something more like:
-
graph.gfa --pollen_data_gen--> graph.data
-
calyx_depth.py (with one call to mygfa to learn the graph's dimensions, and with calls to the Calyx eDSL for lightening) --> calyx_depth.futil
-
calyx_depth.futil
tangoes withgraph.data
in the usual Calyx way.
The first piece of this is already there: I already generate the same graph.data
using pollen_data_gen
as one would have using the usual route. The third piece should work if the first two work.
The second piece decomposes into two unrelated parts:
- [ ] #107
- [ ] #106
Wonderful; thank you for summarizing this so clearly! Indeed, both of these would be great advances that would "modernize" our hardware generation.
One small operational note: the eDSL adoption can happen incrementally, if it's a lot of work. That is, it's not necessary to flip everything over all at once; it suffices to convert individual pieces to use the builder
library one at a time since it interoperates with the "plain" AST library pretty smoothly.
Thanks Adrian! Dropping the "triage required" tag. This issue remains up for grabs.
This looks good to me! I agree that switching over the builder library isn't a top priority, but it would certainly be nice to at some point.
I don't necessarily think it is a bad thing to use odgi
in our hardware generation if it turns out that it's faster to transform a .gfa
graph to an .og
graph and then use odgi
to get its dimensions, rather than just obtaining the dimensions from the .gfa
file directly, but since .gfa
files seem to be sort of a "standard representation" for pangenomic graphs, it would be good to at least transition exine
to taking in .gfa
files as input rather than .og
files, which it currently expects. I also think it would be a win just to avoid making pollen users install odgi
.
Indeed. Sticking with .gfa
files is the right thing to do here because (a) they are the "source of truth," (b) odgi is a difficult-to-install dependency (to put it lightly), and (c) performance at hardware-generation time is not very important.