Adding plasmid MLST schemes
Hi Torsten,
Would it be possible to add the plasmid MLST schemes from PubMLST [https://pubmlst.org/plasmid/] to your database?
Cheers,
Duy
@mdphan I could probably add them, but I would need a consistent place to download the alleles and scheme files from, that doesn't require any logins.
Do you know how to do that?
Hi Torsten,
I forwarded your question to Keith Jolley and I am copying his reply below:
The RESTful API is the preferred method – documented at http://bigsdb.readthedocs.io/en/latest/rest.html.
See the download_alleles scripts at https://github.com/kjolley/BIGSdb/tree/develop/scripts/rest_examples (Perl or Python versions available).
For example, you could download all alleles from the IncHI1 scheme to the current directory with the following command:
download_alleles.pl --database pubmlst_plasmid_seqdef --scheme_id 5
The corresponding profiles file can be retrieved with:
wget rest.pubmlst.org/db/pubmlst_plasmid_seqdef/schemes/5/profiles_csv
Hope his answer contains all the info you need.
Cheers,
Duy
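For anyone who would rather script the profile download than call wget by hand, here is a minimal Python sketch of the same request. It only uses the URL pattern quoted above; the function names profiles_url and fetch_profiles are made up for illustration, and the http:// prefix and the UTF-8 decoding are assumptions.

```python
import urllib.request

# Base of the REST API, taken from the wget example in the quoted reply.
# The "http://" prefix is an assumption.
BASE_URL = "http://rest.pubmlst.org/db"


def profiles_url(database, scheme_id):
    """Build the profiles_csv URL for one scheme (hypothetical helper)."""
    return f"{BASE_URL}/{database}/schemes/{scheme_id}/profiles_csv"


def fetch_profiles(database, scheme_id, timeout=60):
    """Download the profile table for one scheme and return it as text."""
    url = profiles_url(database, scheme_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Assumes the server sends UTF-8 text.
        return response.read().decode("utf-8")
```

Calling fetch_profiles("pubmlst_plasmid_seqdef", 5) should then return the IncHI1 profile table, since scheme 5 is IncHI1 according to the reply above. No login is involved, which was the requirement mentioned earlier.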
Having pMLST would be great! Here are some things I found out after trying to include the schemes into the tool:
Scheme IncA/C cgPMLST
- has a column named cgST, and not ST as expected by the tool
- has ST numbers formatted as "int.int", which seems to cause problems when running mlst (version 2.10, error in perl5/MLST/Scheme.pm line 65). mlst worked only after replacing the STs with normal numeric values.
~~Edit: http://rest.pubmlst.org/db/pubmlst_plasmid_seqdef/classification_schemes gives also an empty list of scheme IDs~~
Correction: The URL to get the available schemes is http://rest.pubmlst.org/db/pubmlst_plasmid_isolates/schemes.
Edit: Also scheme IncF does not have any profiles and creating an empty profile with only a header does not work: Can't use an undefined value as a HASH reference at perl5/MLST/Scheme.pm line 65.
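The two workarounds for the IncA/C cgPMLST profile file (renaming the cgST column to ST and replacing the "int.int" identifiers) could be scripted roughly like this. It is only a sketch under assumptions: it treats the profile table as tab-delimited, the function name normalise_profiles is hypothetical, and renumbering the STs sequentially is just one arbitrary way to get plain integers, not necessarily the replacement that was used in this thread.

```python
import csv
import io


def normalise_profiles(text, st_column="cgST"):
    """Rename the ST column and renumber STs as plain integers.

    Returns (rewritten_text, mapping), where mapping goes from the
    original identifier (e.g. "1.2") to the new integer, as a string.
    Assumes a tab-delimited table with a header row.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    header, body = rows[0], rows[1:]

    # Rename the identifier column to "ST".
    header = ["ST" if name == st_column else name for name in header]
    st_index = header.index("ST")

    # Renumber in order of first appearance: 1, 2, 3, ...
    # (an arbitrary choice; the mapping is kept so nothing is lost).
    mapping = {}
    for row in body:
        original = row[st_index]
        if original not in mapping:
            mapping[original] = str(len(mapping) + 1)
        row[st_index] = mapping[original]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    return out.getvalue(), mapping
```

Keeping the returned mapping around makes it possible to translate an ST reported by mlst back to the original cgST afterwards.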
@VGalata Did you manage to add scheme IncA/C cgPMLST to the tool? I have
- changed column cgST to ST
- replaced the "int.int" ST numbers with normal numeric values.
I can see that the sequences from the scheme were added to the BLAST database, but the scheme wasn't added to the scheme list (i.e. it is not found when running mlst --list).
@mdphan Yes, I managed to do that. mlst --list prints for me all added schemes: IncA_C__PMLST IncHI1 IncHI2 IncN IncI1 IncF IncA_C__cgPMLST. Probably there is a problem with the names of your directories?
You can find my code here: https://github.com/VGalata/plsdb
Look into README.md, and the snakemake pipeline and config files (pipeline.*). In pipeline.json there is an entry for pmlst, and in pipeline.snake there are rules to build the DB (look for the rules below # pmlst). The tool mlst is installed using (Mini)conda, and I remove all "normal" MLST schemes before downloading and adding the pMLST schemes.
Edit: If it helps I can also give you all required files.
I'm interested in this feature. Could I help out?
If you can write a stand-alone script that uses only core libraries and creates a compatible MLST output folder, then yes.
If using Python libraries is okay I would use my code to create a stand-alone script.
Based on the current mlst codebase I assume Torsten means a stand-alone perl (or bash) script. The 'core' perl libraries are described here:
https://www.perl.com/article/what-is-the-perl-core-/
...and listed here:
https://perldoc.perl.org/index-modules-A.html
...but there seems to be some ambiguity about whether the HTTP::Tiny module is part of the core modules. I think that module might be useful here. Would it be ok to use? I gather that it's included in perl 5.14 and later.
I've got the beginnings of a pMLST download script here:
https://github.com/dfornika/mlst/blob/pmlst-script/scripts/pmlst-download_pub_mlst.pl
...based heavily on @VGalata 's scripts:
https://github.com/VGalata/plsdb/blob/83e0682fe18741afb950517dd07bdc3725c9acbe/pipeline.snake#L324
https://github.com/VGalata/plsdb/blob/83e0682fe18741afb950517dd07bdc3725c9acbe/utils.py#L233
...and also the BIGSdb download_alleles.pl script:
https://github.com/kjolley/BIGSdb/blob/develop/scripts/rest_examples/perl/download_alleles.pl
It's a work-in-progress, I'm open to comments/suggestions. Once it's in a more functional state I can submit a pull-request if welcome.
Another option that has worked well for me is to set up the plasmid MLST scheme of interest using ARIBA: https://github.com/sanger-pathogens/ariba/wiki/MLST-calling-with-ARIBA