mlst Adding plasmid MLST schemes

Hi Torsten,

Would it be possible to add the plasmid MLST schemes from PubMLST (https://pubmlst.org/plasmid/) to your database?

Cheers,

Duy

Jul 06 '18 04:07 mdphan

@mdphan I could probably add them, but I would need a consistent place to download the alleles and scheme files from, one that doesn't require a login.

Do you know how to do that?

Jul 06 '18 04:07 tseemann

Hi Torsten,

I forwarded your question to Keith Jolley and I am copying his reply below:

The RESTful API is the preferred method – documented at http://bigsdb.readthedocs.io/en/latest/rest.html.

See the download_alleles scripts at https://github.com/kjolley/BIGSdb/tree/develop/scripts/rest_examples (Perl or Python versions available).

For example, you could download all alleles from the IncHI1 scheme to the current directory with the following command:

download_alleles.pl --database pubmlst_plasmid_seqdef --scheme_id 5

The corresponding profiles file can be got by calling:

wget rest.pubmlst.org/db/pubmlst_plasmid_seqdef/schemes/5/profiles_csv
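For anyone who prefers Python over wget, the same profiles request can be sketched as below. This is only a minimal sketch: the URL pattern is copied from the wget example above, while the function names and the output file name are hypothetical and not part of BIGSdb or mlst.

```python
from urllib.request import urlopen

# Host and path layout are taken from the wget example above.
BASE_URL = "http://rest.pubmlst.org/db"


def profiles_csv_url(database: str, scheme_id: int) -> str:
    """Build the profiles_csv URL for one scheme (hypothetical helper)."""
    return f"{BASE_URL}/{database}/schemes/{scheme_id}/profiles_csv"


def download_profiles(database: str, scheme_id: int, out_path: str) -> None:
    """Fetch the profiles file for one scheme and write it to out_path."""
    url = profiles_csv_url(database, scheme_id)
    with urlopen(url) as response, open(out_path, "wb") as out:
        out.write(response.read())


# Example call (needs network access; the output name is an assumption):
# download_profiles("pubmlst_plasmid_seqdef", 5, "IncHI1.txt")
```

Keeping the URL construction separate from the download makes it easy to loop over several scheme IDs once you know which ones you want.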

Hope his answer contains all the info you need.

Cheers,

Duy

Jul 06 '18 06:07 mdphan

Having pMLST would be great! Here are some things I found out while trying to add the schemes to the tool:

Scheme IncA/C cgPMLST

has a column named cgST rather than the ST column the tool expects
has ST numbers formatted as "int.int", which seems to cause problems when running mlst (version 2.10; error in perl5/MLST/Scheme.pm line 65). mlst worked only after I replaced the STs with normal numeric values.
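The two workarounds above could be scripted roughly as follows. This is a sketch under several assumptions: the profiles file is tab-delimited, the STs are simply renumbered 1, 2, 3, ... in file order (the thread does not say how the replacement values were chosen), and the function name and the locus names in the usage example are made up for illustration.

```python
import csv
import io


def normalise_profiles(text: str, st_column: str = "cgST"):
    """Rename st_column to ST and renumber the STs as plain integers.

    Returns the rewritten table and a mapping from old ST to new ST,
    so the original "int.int" identifiers are not lost.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    st_index = header.index(st_column)
    header[st_index] = "ST"

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)

    mapping = {}
    next_st = 1
    for row in reader:
        if not row:  # skip blank lines
            continue
        old_st = row[st_index]
        mapping[old_st] = str(next_st)
        row[st_index] = str(next_st)
        writer.writerow(row)
        next_st += 1
    return out.getvalue(), mapping
```

For example, with two made-up loci:

```python
table = "cgST\tlocusA\tlocusB\n1.1\t1\t2\n1.2\t1\t3\n"
fixed, mapping = normalise_profiles(table)
print(fixed)    # header now starts with ST, STs are 1 and 2
print(mapping)  # {'1.1': '1', '1.2': '2'}
```

Keeping the mapping around matters because a renumbered ST reported by mlst would otherwise not be traceable back to the PubMLST cgST.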

~~Edit: http://rest.pubmlst.org/db/pubmlst_plasmid_seqdef/classification_schemes also gives an empty list of scheme IDs~~ Correction: The URL to get the available schemes is http://rest.pubmlst.org/db/pubmlst_plasmid_isolates/schemes.

Edit: Also, the IncF scheme does not have any profiles, and creating an empty profile file with only a header does not work: Can't use an undefined value as a HASH reference at perl5/MLST/Scheme.pm line 65.

Aug 03 '18 07:08 VGalata

@VGalata Did you manage to add the IncA/C cgPMLST scheme to the tool? I have:

changed the cgST column to ST
replaced the "int.int" ST numbers with normal numeric values

I can see that the sequences from the scheme were added to the BLAST database, but the scheme wasn't added to the scheme list (i.e. it is not found when I run mlst --list).

Aug 29 '18 06:08 mdphan

@mdphan Yes, I managed to do that. For me, mlst --list prints all the added schemes: IncA_C__PMLST IncHI1 IncHI2 IncN IncI1 IncF IncA_C__cgPMLST. Perhaps there is a problem with the names of your directories?

You can find my code here: https://github.com/VGalata/plsdb. Look at README.md, the snakemake pipeline, and the config files (pipeline.*). pipeline.json has an entry for pmlst, and pipeline.snake contains the rules to build the DB (look for the rules below # pmlst). The mlst tool is installed using (Mini)conda, and I remove all "normal" MLST schemes before downloading and adding the pMLST schemes.

Edit: If it helps I can also give you all required files.

Aug 29 '18 06:08 VGalata

I'm interested in this feature. Could I help out?

May 11 '19 00:05 dfornika

If you can write a stand-alone script that uses no non-core libraries and creates a compatible MLST output folder, then yes.

May 16 '19 01:05 tseemann

If using Python libraries is okay, I could use my code to create a stand-alone script.

May 16 '19 11:05 VGalata

Based on the current mlst codebase, I assume Torsten means a stand-alone Perl (or bash) script. The 'core' Perl libraries are described here:

https://www.perl.com/article/what-is-the-perl-core-/

...and listed here:

https://perldoc.perl.org/index-modules-A.html

...but there seems to be some ambiguity about whether the HTTP::Tiny module is one of the core modules. I think that module might be useful here. Would it be OK to use? I gather that it's included in Perl 5.14 and later.

May 21 '19 19:05 dfornika

I've got the beginnings of a pMLST download script here:

https://github.com/dfornika/mlst/blob/pmlst-script/scripts/pmlst-download_pub_mlst.pl

...based heavily on @VGalata 's scripts:

https://github.com/VGalata/plsdb/blob/83e0682fe18741afb950517dd07bdc3725c9acbe/pipeline.snake#L324

https://github.com/VGalata/plsdb/blob/83e0682fe18741afb950517dd07bdc3725c9acbe/utils.py#L233

...and also the BIGSdb download_alleles.pl script:

https://github.com/kjolley/BIGSdb/blob/develop/scripts/rest_examples/perl/download_alleles.pl

It's a work in progress, and I'm open to comments and suggestions. Once it's in a more functional state I can submit a pull request, if that's welcome.

May 23 '19 00:05 dfornika

Another option that has worked well for me is to set up the plasmid MLST scheme of interest using ARIBA: https://github.com/sanger-pathogens/ariba/wiki/MLST-calling-with-ARIBA

Jul 23 '19 03:07 danielleingle