GetOrganelle
Make database paths configurable - add GetOrganelle to Galaxy
Seems to me that the databases are currently stored next to the library. This does not work for read-only installations (e.g. containers and multi-user installations) and is considered bad practice for conda installations (even if writable).
Ideally this could be done via an environment variable or a command line parameter.
How about storing the databases, or a file containing their path, under "~/.GetOrganelle" by default?
Thanks for the feedback.
I would prefer a command line parameter (~/.GetOrganelle might be a good default). Only allowing ~/.GetOrganelle would not be helpful for multi-user installations, where an admin might want to provide the databases at a central location for all users.
In the latest update (version 1.7.3), the database location is resolved in the following order of priority:
- The database for each single run can be customized using a command line parameter following the flag "--config-dir".
- If "--config-dir" is not set, GetOrganelle looks for the shell environment variable GETORG_PATH, so the admin can set a global default for all users.
- If GETORG_PATH is not set, the default is "~/.GetOrganelle".
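The priority order described above can be sketched in a few lines of Python. This is only an illustration of the lookup logic, not GetOrganelle's actual code: the function name `resolve_config_dir` and its parameters are hypothetical, and only the "--config-dir" flag, the GETORG_PATH variable, and the "~/.GetOrganelle" default come from this thread.

```python
import os
from typing import Mapping, Optional

# Default taken from the thread; everything else here is an illustrative assumption.
DEFAULT_DIR = "~/.GetOrganelle"


def resolve_config_dir(cli_value: Optional[str] = None,
                       env: Optional[Mapping[str, str]] = None) -> str:
    """Return the database directory: CLI value, then GETORG_PATH, then the default."""
    env = os.environ if env is None else env
    if cli_value:                      # 1. value passed after --config-dir
        chosen = cli_value
    elif env.get("GETORG_PATH"):       # 2. admin-wide default from the environment
        chosen = env["GETORG_PATH"]
    else:                              # 3. per-user fallback
        chosen = DEFAULT_DIR
    return os.path.abspath(os.path.expanduser(chosen))
```

Passing the environment as a parameter keeps the function easy to test without touching the real shell environment.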
I just noticed that this is for the Galaxy project. My collaborator @wbyu has mentioned adding GetOrganelle to Galaxy many times. We are very interested in contributing. Please let me know if there's anything else we can help with.
Good to know :) Indeed some "small" test data and an example command line would simplify creating such a tool a bit.
We have a simulated mini dataset along with the command: Example 1/2 (https://github.com/Kinggerm/GetOrganelle/wiki/Examples)
and a few real test datasets (also very small ones): Example 3/4/5 (https://github.com/Kinggerm/GetOrganelle/wiki/Examples)
Downloaded reference data with python get_organelle_config.py -a all --config-dir config
Trying to get the examples running: python get_organelle_from_reads.py ... --config-dir config/. This gives me:
############################################################################
ERROR: /home/berntm/.GetOrganelle/SeedDatabase/embplant_pt.fasta not found!
I'm also wondering if you could switch from optparse (which is deprecated) to argparse? For argparse I could auto-generate Galaxy wrappers. For get_organelle_from_reads I started with the conversion to argparse - if you like I could open a PR.
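For readers unfamiliar with the conversion, here is a minimal sketch of how an option such as "--config-dir" might be declared with argparse. It is not taken from GetOrganelle's source: the "-o" option, the `dest` names, and the help texts are hypothetical and only illustrate the pattern.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser; only --config-dir is named in this thread."""
    parser = argparse.ArgumentParser(
        description="Sketch of an argparse-based command line (illustrative only)."
    )
    # optparse's parser.add_option(...) becomes parser.add_argument(...)
    parser.add_argument(
        "--config-dir", dest="config_dir", default=None,
        help="Directory holding the seed and label databases.",
    )
    # Hypothetical output option, included only to show a required argument.
    parser.add_argument(
        "-o", dest="output_base", required=True,
        help="Output directory (hypothetical option for this sketch).",
    )
    return parser


# parse_args returns a single namespace, unlike optparse's (options, args) tuple.
args = build_parser().parse_args(["-o", "out", "--config-dir", "config/"])
```

Because argparse parsers can be introspected, tools that generate wrappers can read the declared arguments directly from the parser object.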
Thanks for the feedback. Sorry about the remaining issues - I believe I have now fixed them and tested the fix in different ways. Please find the latest version on GitHub.
Sure. Please branch out from the latest master if you haven't started the conversion. Thanks again!
@bernt-matthias I just made several updates and fixes to a new GetOrganelle version, in which I switched from optparse to argparse. It's currently on a different branch from the master: https://github.com/Kinggerm/GetOrganelle/tree/update_assembly_with_variable_overlaps
@Kinggerm this looks great :)
1.7.4 is now formally released with all of the above requirements fulfilled.
Excellent. I hope to find the time to wrap this for Galaxy soon.
@bernt-matthias Hi, do you have further updates?
https://github.com/galaxyproject/tools-iuc/pull/4455
Thanks for the reference to the small test data. Is there also a small seed (and label) database? For the IUC Galaxy tool repo we are aiming at <1MB per test file.
Just back from traveling. Sure, I can prepare a small seed and label database for it ASAP. I will keep you updated.
Hi @Kinggerm any news on small seed and label databases?
Hi @bernt-matthias, I created a minimal dataset derived from 0.0.1, named it 0.0.1.minima, and uploaded it to https://github.com/Kinggerm/GetOrganelleDB. It is downloadable through https://github.com/Kinggerm/GetOrganelleDB/releases/download/0.0.1.minima/v0.0.1.minima.tar.gz.
Please also update GetOrganelle to 1.7.6.1 to use it smoothly; with earlier GetOrganelle versions you have to download the files manually and use get_organelle_config.py --use-local to add the database.
This 0.0.1.minima is, in total, 1.1MB uncompressed with the seed-label pair of each organelle type as follows:
type | size | note |
---|---|---|
animal_mt | 36KB | |
embplant_pt+embplant_mt | 684KB | These two types must be used together |
embplant_nr | 16KB | |
fungus_mt | 84KB | |
fungus_nr | 8KB | |
other_pt | 328KB | |
However, please note that the local files will inflate a lot as you format them into bowtie2 & blastn indices.
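As a quick sanity check, the sizes listed in the table can be summed and compared against the <1MB per test file target mentioned earlier in the thread. The sketch below uses only the figures from the table; treating 1MB as 1024KB and checking each seed-label pair separately are assumptions made for illustration.

```python
# Uncompressed sizes (KB) of each seed-label pair, copied from the table.
SIZES_KB = {
    "animal_mt": 36,
    "embplant_pt+embplant_mt": 684,
    "embplant_nr": 16,
    "fungus_mt": 84,
    "fungus_nr": 8,
    "other_pt": 328,
}

LIMIT_KB = 1024  # assumption: the "<1MB per test file" target, with 1MB = 1024KB

total_kb = sum(SIZES_KB.values())  # 1156 KB, i.e. roughly 1.1MB
oversized = {name: kb for name, kb in SIZES_KB.items() if kb >= LIMIT_KB}

print(f"total: {total_kb} KB")
print(f"pairs at or above the limit: {sorted(oversized)}")
```

Each pair stays under the assumed limit on its own, although the dataset as a whole is slightly above 1MB, and the formatted bowtie2 and blastn indices are not included in these figures.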
Please let me know if these make sense.
Thanks a lot. We will try to use those for the galaxy tool wrapper tests.
Seems that we can close this issue. Thanks for the help.