Make database paths configurable - add GetOrganelle to Galaxy

Open bernt-matthias opened this issue 3 years ago • 19 comments

It seems that the databases are currently stored next to the library. This does not work for read-only installations (e.g. containers and multi-user installations) and is considered bad practice for conda installations (even if writable).

Ideally the path could be set via an environment variable or a command line parameter.

bernt-matthias avatar Dec 26 '20 10:12 bernt-matthias

How about storing the databases, or a file recording their path, under "~/.GetOrganelle" by default?

Kinggerm avatar Dec 28 '20 04:12 Kinggerm

Thanks for the feedback.

I would prefer a command line parameter (~/.GetOrganelle might be a good default). Only allowing ~/.GetOrganelle would not be helpful for multi-user installations, where an admin might want to provide the databases at a central location for all users.

bernt-matthias avatar Dec 28 '20 12:12 bernt-matthias

In the latest update (version 1.7.3), the database location is resolved in the following order of priority:

  1. The database for each single run can be customized using the command line flag "--config-dir".
  2. If "--config-dir" is not set, GetOrganelle looks for the environment variable GETORG_PATH, so an admin can set a global default for all users.
  3. If GETORG_PATH is not set, the default is "~/.GetOrganelle".
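
The priority order above can be sketched in a few lines of Python. This is only an illustration of the lookup order described here, not GetOrganelle's actual code; the function name `resolve_config_dir` and its parameters are made up for the example.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DIR = "~/.GetOrganelle"  # default location named in the list above


def resolve_config_dir(cli_value: Optional[str] = None,
                       env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: choose the database directory by priority."""
    env = os.environ if env is None else env
    if cli_value:                     # 1. explicit --config-dir value
        chosen = cli_value
    elif env.get("GETORG_PATH"):      # 2. GETORG_PATH environment variable
        chosen = env["GETORG_PATH"]
    else:                             # 3. per-user default
        chosen = DEFAULT_DIR
    return Path(chosen).expanduser()
```

Passing the environment as an argument keeps the helper easy to test without modifying `os.environ`.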

Kinggerm avatar Jan 20 '21 04:01 Kinggerm

I just noticed that this is for the Galaxy project. My collaborator @wbyu has mentioned adding GetOrganelle to Galaxy many times, and we are very interested in contributing. Please let me know if there is anything else we can help with.

Kinggerm avatar Jan 20 '21 08:01 Kinggerm

Good to know :) Indeed some "small" test data and an example command line would simplify creating such a tool a bit.

bernt-matthias avatar Jan 20 '21 09:01 bernt-matthias

We have a small simulated dataset along with the commands to run it: Examples 1 and 2 (https://github.com/Kinggerm/GetOrganelle/wiki/Examples)

and a few real test datasets (also very small): Examples 3, 4 and 5 (https://github.com/Kinggerm/GetOrganelle/wiki/Examples)

Kinggerm avatar Jan 20 '21 09:01 Kinggerm

Downloaded reference data with python get_organelle_config.py -a all --config-dir config

Trying to get the examples running: python get_organelle_from_reads.py ... --config-dir config/. This gives me:

############################################################################
ERROR: /home/berntm/.GetOrganelle/SeedDatabase/embplant_pt.fasta not found!

I'm also wondering if you could switch from optparse (which is deprecated) to argparse? For argparse I could auto-generate Galaxy wrappers. For get_organelle_from_reads I started with the conversion to argparse - if you like I could open a PR.
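
To illustrate what such a conversion looks like, here is a minimal argparse sketch. It is not taken from GetOrganelle: the program name and the `--threads` option are hypothetical, and only the `--config-dir` flag comes from this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing the argparse form of optparse options."""
    parser = argparse.ArgumentParser(prog="example_tool")  # made-up name
    # optparse form: parser.add_option("--config-dir", dest="config_dir")
    parser.add_argument("--config-dir", dest="config_dir", default=None,
                        help="Directory holding the databases.")
    # Hypothetical option; argparse takes a callable for type (int),
    # where optparse takes a string (type="int").
    parser.add_argument("--threads", dest="threads", type=int, default=1,
                        help="Number of threads.")
    return parser


args = build_parser().parse_args(["--config-dir", "config/"])
print(args.config_dir, args.threads)
```

In simple cases like this the change is mostly mechanical: `add_option` becomes `add_argument`, and `parse_args` returns a single namespace instead of an `(options, args)` pair.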

bernt-matthias avatar Feb 10 '21 18:02 bernt-matthias

Thanks for the feedback, and sorry about the remaining issues. I believe I have now fixed them and tested the fix in several ways. Please find the latest version on GitHub.

Sure. Please branch out from the latest master if you haven't started the conversion. Thanks again!

Kinggerm avatar Feb 11 '21 18:02 Kinggerm

@bernt-matthias I just made several updates and fixes to a new GetOrganelle version, in which I switched from optparse to argparse. It's currently on a different branch from the master: https://github.com/Kinggerm/GetOrganelle/tree/update_assembly_with_variable_overlaps

Kinggerm avatar Mar 31 '21 09:03 Kinggerm

@Kinggerm this looks great :)

bernt-matthias avatar Mar 31 '21 09:03 bernt-matthias

1.7.4 is now formally released with all of the above requirements fulfilled.

Kinggerm avatar Apr 15 '21 16:04 Kinggerm

Excellent. I hope to find the time to wrap this for Galaxy soon.

bernt-matthias avatar Apr 15 '21 19:04 bernt-matthias

@bernt-matthias Hi, do you have further updates?

Kinggerm avatar Oct 08 '21 02:10 Kinggerm

https://github.com/galaxyproject/tools-iuc/pull/4455

bernt-matthias avatar Mar 19 '22 10:03 bernt-matthias

Thanks for the reference to the small test data. Is there also a small seed (and label) database? For the IUC Galaxy tool repo we are aiming at <1MB per test file.
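
A quick way to check a directory of candidate test files against such a limit is sketched below. The helper name `oversized_files` is made up for this example, and reading "1MB" as 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import List, Tuple

LIMIT_BYTES = 1024 * 1024  # assumption: "1MB" interpreted as 1 MiB


def oversized_files(root: str, limit: int = LIMIT_BYTES) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for files larger than `limit`."""
    base = Path(root)
    found = []
    for path in sorted(base.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                found.append((str(path.relative_to(base)), size))
    return found
```

An empty result means every file under the directory is within the limit.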

bernt-matthias avatar Mar 20 '22 10:03 bernt-matthias

Just back from traveling. Sure, I can prepare a small seed and label database for it ASAP. I will keep you updated.

Kinggerm avatar Mar 22 '22 23:03 Kinggerm

Hi @Kinggerm any news on small seed and label databases?

bernt-matthias avatar May 05 '22 07:05 bernt-matthias

Hi @bernt-matthias, I created a minimal dataset derived from 0.0.1, named it 0.0.1.minima, and uploaded it to https://github.com/Kinggerm/GetOrganelleDB. It is downloadable through https://github.com/Kinggerm/GetOrganelleDB/releases/download/0.0.1.minima/v0.0.1.minima.tar.gz. Please also update GetOrganelle to 1.7.6.1 to use it smoothly. With earlier GetOrganelle versions, you have to download the files manually and use get_organelle_config.py --use-local to add the database.

In total, 0.0.1.minima is 1.1MB uncompressed. The size of the seed-label pair for each organelle type is as follows:

| type | size | note |
|---|---|---|
| animal_mt | 36KB | |
| embplant_pt+embplant_mt | 684KB | These two types must be used together |
| embplant_nr | 16KB | |
| fungus_mt | 84KB | |
| fungus_nr | 8KB | |
| other_pt | 328KB | |

However, please note that the local files will grow considerably once you format them into bowtie2 and blastn indices.

Please let me know if these make sense.

Kinggerm avatar May 06 '22 23:05 Kinggerm

Thanks a lot. We will try to use those for the galaxy tool wrapper tests.

bernt-matthias avatar May 09 '22 08:05 bernt-matthias

Seems that we can close this issue. Thanks for the help.

bernt-matthias avatar Sep 07 '23 18:09 bernt-matthias