q2-qemistree
Updating Qemistree so it runs from SIRIUS workspace and generic input files
Hi @anupriyatripathi
This is to initiate a discussion to resolve the current issue users are having running Qemistree.
Context:
- SIRIUS v.4.4 introduced new parameters, new tools (like CANOPUS/COSMIC), a new folder structure, and other changes.
- SIRIUS v.4.0.1 (which Qemistree requires) has not been supported since April 2021, and it is no longer possible to run the CSIFingerID server-side computation (https://bio.informatik.uni-jena.de/2021/04/sirius-4-0-1-end-of-life/).
- As a result, it is presently impossible to run Qemistree, as seen from the users trying to use it and running into related issues: https://groups.google.com/g/molecular_networking_bug_reports/search?q=Qemistree
A sustainable solution would be to modify the Qemistree workflow by externalizing the SIRIUS computation part.
The user would provide the SIRIUS workspace as input to run qiime qemistree make-hierarchy. Of course, the user would be instructed to first compute a minimal set of steps (SIRIUS/CSIFINGERID), with ZODIAC/CANOPUS as optional.
For even greater flexibility in the long run, and to offer wider support for other similarity functions, such as the basic cosine score and those that are being developed (like MS2DeepScore, https://www.biorxiv.org/content/10.1101/2021.04.18.440324v1), the best option would be to make it possible to build the hierarchy from generic input files.
They would be:
- a similarity matrix.
- feature annotation metadata table(s).
- the feature quantification file.
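As a rough illustration of the first generic input, here is a minimal sketch of turning a precomputed similarity matrix into a hierarchy with scipy. This is not the script mentioned in this thread and not what q2-qemistree does; the function name `similarity_to_newick`, the `1 - similarity` conversion (which assumes scores in [0, 1], as for cosine or Tanimoto), and the choice of average linkage are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform


def similarity_to_newick(similarity, feature_ids):
    """Build an average-linkage tree from a symmetric similarity matrix.

    Hypothetical helper: assumes similarities lie in [0, 1].
    """
    sim = np.asarray(similarity, dtype=float)
    # Assumption: 1 - similarity is an acceptable dissimilarity.
    dist = 1.0 - sim
    # Symmetrize and zero the diagonal so squareform accepts the matrix.
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist)  # square matrix -> condensed vector
    root = to_tree(linkage(condensed, method="average"))

    def render(node, parent_height):
        # Branch length is the height difference to the parent node.
        length = parent_height - node.dist
        if node.is_leaf():
            return f"{feature_ids[node.id]}:{length:.4f}"
        children = ",".join(
            render(child, node.dist) for child in (node.left, node.right)
        )
        return f"({children}):{length:.4f}"

    top = ",".join(render(child, root.dist) for child in (root.left, root.right))
    return f"({top});"


if __name__ == "__main__":
    sim = [[1.0, 0.9, 0.1],
           [0.9, 1.0, 0.2],
           [0.1, 0.2, 1.0]]
    print(similarity_to_newick(sim, ["a", "b", "c"]))
```

The Newick string could then be imported as a tree artifact, while the annotation and quantification tables would be handled separately and matched on the same feature IDs.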
Actually @ElDeveloper, with my support, wrote a python script for the Earth Microbiome Project that generates a tree/hierarchy from a new SIRIUS workspace. The script only uses the scipy, scikit, and qiime2 libraries. Maybe we should release that very soon to help the users who are struggling?
Is anyone interested in testing that solution ?
Hi @lfnothias, thanks for raising these important issues regarding:
a) support for making hierarchies with other similarity/dissimilarity matrices such as cosine scores, Tanimoto scores, etc.
b) the incompatibility of q2-qemistree with the latest Sirius version
It is good to know that @ElDeveloper has a prototype to generate a chemical hierarchy from a new Sirius workspace, which means that this can be done.
Would you or @ElDeveloper or someone else be interested in working on adding some of these functionalities to q2-qemistree? I would be able to support the development process by discussing how to best implement this and providing code reviews.
Yes, this is absolutely a great idea. I think probably the best way is to create a new directory format (SiriusWorkspacev440 or something like that). Then we can write two transformers, one to extract the fingerprints and one to extract the feature metadata. The commands you would run are something along the lines of:
To get the fingerprints (in a matrix form)
qiime tools import \
--input-path emp-sirius-workspace \
--output-path emp-fingerprints.qza \
--input-format SiriusWorkspacev440 \
--type FeatureTable[Frequency]
To get the feature metadata (for use with other plugins)
qiime tools import \
--input-path emp-sirius-workspace \
--output-path emp-feature-metadata.qza \
--input-format SiriusWorkspacev440 \
--type FeatureData[Molecules]
Once the user has done this, they would use the fingerprints to build the tree (we can add a new action).
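For illustration only, the tree-building step could look roughly like the sketch below, which clusters features from a fingerprint matrix with scipy. The function name `fingerprints_to_linkage`, the 0.5 binarization threshold, the Jaccard (1 - Tanimoto) distance, and average linkage are assumptions chosen for the example, not a description of the existing or planned q2-qemistree action.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def fingerprints_to_linkage(fingerprints, threshold=0.5):
    """Cluster features by Jaccard distance between binarized fingerprints.

    Hypothetical helper: rows are features, columns are fingerprint
    positions holding predicted probabilities.
    """
    probs = np.asarray(fingerprints, dtype=float)
    # Assumption: a position counts as "present" at or above the threshold.
    bits = probs >= threshold
    # Jaccard distance on boolean vectors equals 1 - Tanimoto similarity.
    condensed = pdist(bits, metric="jaccard")
    # Average linkage; each row of the result describes one merge.
    return linkage(condensed, method="average")


if __name__ == "__main__":
    fps = [[0.9, 0.8, 0.1, 0.0],
           [0.9, 0.7, 0.2, 0.1],
           [0.1, 0.0, 0.9, 0.8],
           [0.0, 0.2, 0.8, 0.9]]
    print(fingerprints_to_linkage(fps))
```

Because the function only needs a feature-by-position matrix, it would not care which Sirius version produced the workspace; that dependency stays in the import step.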
The biggest change from this is that we would leave running Sirius up to the end users, and we would mostly be handling the tree construction, QC, etc. by parsing the Sirius workspaces. I kinda like this idea because when Sirius changes its outputs in the future, we'll only need to write a new directory format, for example SiriusWorkspacev666. The artifact outputs would remain the same (a feature table and the corresponding feature metadata).
Nice. That seems a very practical way to deal with SIRIUS in the long run! If we could also support a similarity matrix as input, that would give maximum flexibility for incorporating other tools/similarity functions.
Great, thanks @lfnothias. Any thoughts @anupriyatripathi?
Hey dear all @lfnothias @ElDeveloper @anupriyatripathi , do you guys have an updated way for this Qemistree workflow?
Hi @amcaraballor, we have worked on updating Qemistree with @helenamrusso. Her GitHub branch has the latest version, which is compatible with the latest version of Sirius. It will be merged into the main workflow soon, but you can use the branch if needed. @helenamrusso has been using it and helping other users as well.