buttery-eel
Model convention and availability of the aligner in buttery-eel
@SBurnard Moving the conversation from https://github.com/hasindu2008/nci-scripts/issues/1 to here, as these questions are about buttery-eel.
It is a great suggestion about the model conversions. @Psy-Fer, we should maintain a page that maps server model names to standalone model names in https://github.com/Psy-Fer/buttery-eel/blob/main/docs/. The tricky thing is that these models keep changing from version to version, so perhaps we can document the use of the following command instead.
cd /path/to/ont-dorado-server/data/
grep "model" *.cfg | tr ':' '\t' | tr '=' '\t' | awk '{print $1"\t"$2"\t"$3}' | sort -k1,1
@SBurnard Buttery-eel relies on the dorado-server from ONT (which does the live basecalling in MinKNOW) to do the basecalling. So this model configuration convention comes from ONT's dorado-server and, for some reason, they use a different convention in standalone Dorado. This is how I find the models on Gadi:
cd /g/data/if89/apps/buttery-eel/0.5.1+dorado7.4.12/ont-dorado-server/data/
grep "model" *.cfg | tr ':' '\t' | tr '=' '\t' | awk '{print $1"\t"$2"\t"$3}' | sort -k1,1
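To illustrate what the pipeline above prints, here is a minimal, self-contained sketch that wraps it in a function and runs it on two made-up config files. The file names, the `model_file` key and its values are hypothetical stand-ins, not real dorado-server content; the actual keys and values depend on the dorado-server version you have installed.

```shell
#!/bin/sh
# List every line containing "model" from the .cfg files in a directory,
# as three tab-separated columns: config file, key, value.
list_models() {
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$1" || exit 1
        grep "model" *.cfg | tr ':' '\t' | tr '=' '\t' \
            | awk '{print $1"\t"$2"\t"$3}' | sort -k1,1
    )
}

# Hypothetical demo data: names and contents are invented for illustration only.
demo_dir=$(mktemp -d)
printf 'model_file = example_hac_model\nother_key = 1\n' > "$demo_dir/example_hac.cfg"
printf 'model_file = example_sup_model\n' > "$demo_dir/example_sup.cfg"

list_models "$demo_dir"
rm -rf "$demo_dir"
```

On real data you would call `list_models /path/to/ont-dorado-server/data/`. One caveat: grep only prefixes each match with the file name when it is given more than one file, so the first column is only meaningful when the directory holds several .cfg files, which is the expected case here.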
About the second question, slow5-dorado is a fork of the standalone Dorado, so all the extra features in Dorado such as alignment are there. But we have not made a release recently:
- Dorado has a zillion dependencies and it takes a few days to get things compiled
- The codebase gets turned upside down from version to version, making it hard to keep adding the slow5 support
The good thing with the dorado-server is we can simply get the binary from ONT and use the client-server approach (implemented in buttery-eel) to access BLOW5 files.
I am not sure if dorado-server supports alignment. @Psy-Fer Does it? However, even if it supports alignment, I personally believe that keeping basecalling and alignment modular has greater benefits:
- The user can transparently know which minimap version and parameters they use, can tune parameters for their needs and even change to a different aligner if they wish
- Having it separate means that users will likely cite those aligners they use, which would otherwise be just buried under "Dorado"
- I would rather trust standalone minimap2 than a modified version coming from ONT. In fact, in f5c, several issues that arose were finally traced back to some weird behaviour in Dorado alignment
- ONT has a track record of NOT honouring backward compatibility, so there is a chance that the API for getting the alignment information would keep changing (thus we would have one more thing to rewrite every time)
- Having separate modules improves maintainability. The "one tool does all the things" approach leads to complex systems that have their own set of problems, and would create a dependency and maintenance nightmare.
I can go on .....
Let me cite the following extract from Heng Li, Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences, Bioinformatics "We hope in this process, the community could standardize the input and output formats of various tools, so that a developer could focus on a component he or she understands best. Such a modular approach has been proved to be fruitful in the development of short-read tools—in fact, the best short-read pipelines all consist of components developed by different groups—and will be equally beneficial to the future development of long-read mappers and assemblers."
I understand that having a single command that runs everything could be convenient, but I am not sure it is really worth it considering the above factors. What do you think?