Trouble testing on chromosomes

Open Liqueurdefehling opened this issue 2 years ago • 9 comments

Hi,

I am testing Platon 1.6 on the E. coli chromosome (accession number CP027572.1) as well as on the bacterial chromosomes CP045233.1 and CP011509.1:

platon [-c] --db /env/ig/biobank/by-soft/platon/1.6/db/ --output …/test_ecoli_c/ --verbose …/ecoli.fasta

There is no output when running in accuracy mode. When launched in -c mode, I get a table with one row, the ID being the sequence ID and the RDS being negative; the chromosome.fasta file is empty, whereas the sequence is in the plasmid.fasta file. The same thing happens with an input file containing both chromosome and plasmid sequences: all sequences end up in the plasmid.fasta file.

Any idea what I might be missing?

Best regards

Liqueurdefehling avatar Apr 15 '22 13:04 Liqueurdefehling

Hi @Liqueurdefehling, though this might sound confusing at first, it is actually the expected behavior. Platon was designed to classify draft contigs and thus extract plasmid-borne contigs. To do so, one can adjust the sensitivity/specificity trade-off by running Platon in sensitivity, accuracy or specificity mode via the --mode parameter.

Besides the normal operation described above, one can also use Platon to characterize (NOT classify) all plasmids via --characterize.

In that context, the above behavior is expected since in characterization mode, Platon executes the full characterization pipeline which is why all contigs are handled as plasmid-borne. I agree that in this case the output might be misleading and this might deserve a little bit of improvement.
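
As a rough illustration of the two ways of running Platon described above, the following sketch only assembles the argument lists; it does not run anything. It uses only flags that appear in this thread (--db, --output, --verbose, --mode, --characterize). The helper name `platon_cmd` and all paths are hypothetical placeholders.

```python
def platon_cmd(fasta, db, output, mode=None, characterize=False):
    """Build a Platon argument list from flags mentioned in this thread.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["platon", "--db", db, "--output", output, "--verbose"]
    if characterize:
        # Characterization: every input contig is described, none is classified.
        cmd.append("--characterize")
    elif mode is not None:
        # Classification: sensitivity, accuracy or specificity.
        cmd += ["--mode", mode]
    cmd.append(fasta)
    return cmd


# Placeholder paths, not taken from the original report.
classify = platon_cmd("genome.fasta", "db/", "out_classify/", mode="accuracy")
describe = platon_cmd("genome.fasta", "db/", "out_describe/", characterize=True)

print(" ".join(classify))
print(" ".join(describe))
```

Writing the two runs to separate output directories keeps the classification result and the characterization table from being confused with each other.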

oschwengers avatar Apr 19 '22 08:04 oschwengers

Thank you for the explanation, it makes sense now. I had similar results using the --characterize option: all contigs were written to the <prefix>.plasmid.fasta file, while the <prefix>.chromosome.fasta file was empty. Can this be changed so that the plasmids that had hits are automatically written to a file for further use? Great tool anyway, thanks a lot! G

Gian77 avatar Jul 20 '22 17:07 Gian77

I have tried to compare the two outputs with (bottom, second cat) and without (top, first cat) the --characterize option, and I am not sure how to interpret the result. Why are the 2 contigs NODE_5 and NODE_11, which were included in the <prefix>.plasmid.fasta, not marked as having any plasmid hits, even when using the --characterize option? Thanks a lot. G

[Screenshot from 2022-07-20 15-46-53]

Gian77 avatar Jul 20 '22 19:07 Gian77

Hi @Gian77 , the --characterize option simply conducts all characterization tasks without filtering for or predicting any plasmid/chromosome inference. It's just a convenience option to characterize all contigs.

If you'd like to predict plasmid-borne contigs, then you should use Platon in the default mode w/o --characterize. In your example NODE_5 and NODE_11 are predicted to be plasmid-borne.

oschwengers avatar Aug 02 '22 14:08 oschwengers

Hey @oschwengers,

thanks for the explanation, very useful. I am still confused, though, about what the # Plasmid Hits field means in the --characterize mode of Platon. I have several contigs that have 1 in that field in characterize mode; shouldn't they match what is predicted in the default mode?

Thanks much! Gian

Gian77 avatar Sep 16 '22 15:09 Gian77

Hi @Gian77, well, it depends. Sure, a small contig can have a BLAST+ hit against a reference plasmid. But this might also be a small part of a mobile element or a fragment thereof, for example an IS, a transposon or even just a transposase. To filter out these potential false positives, Platon screens for contigs with a sufficiently high RDS. Only after this initial screening step are the remaining contigs characterized. This also speeds up the entire process significantly.
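
The screen-then-characterize order described above can be illustrated with a deliberately simplified sketch. This is not Platon's implementation: the `Contig` class, the threshold value and all numbers below are made up for illustration, and the real classification involves more than a single cutoff.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    id: str
    rds: float          # replicon distribution score
    plasmid_hits: int   # number of hits against reference plasmids


# Hypothetical cutoff, chosen only to make the example concrete.
RDS_THRESHOLD = 0.0


def default_mode(contigs, threshold=RDS_THRESHOLD):
    """Screen by RDS first; only the survivors are characterized and reported."""
    return [c.id for c in contigs if c.rds >= threshold]


def characterize_mode(contigs):
    """No screening: every contig is characterized and reported."""
    return [c.id for c in contigs]


# Made-up contigs: contig_b has a plasmid hit but a low RDS,
# e.g. a short fragment carrying only a transposase.
contigs = [
    Contig("contig_a", rds=5.2, plasmid_hits=3),
    Contig("contig_b", rds=-12.4, plasmid_hits=1),
    Contig("contig_c", rds=-30.1, plasmid_hits=0),
]

print(default_mode(contigs))
print(characterize_mode(contigs))
```

In this toy setup, contig_b shows a plasmid hit when everything is characterized, yet it never reaches the characterization step in the default mode because it is removed by the RDS screen.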

oschwengers avatar Sep 21 '22 07:09 oschwengers

Hello @oschwengers, thanks for the explanation. So this means that the characterization may not be 100% correct, for the reasons you mention above, while the default mode is correct since it is performed after screening for possible false positives. In the end I should trust the default mode results, correct? Thanks a lot, Gian

Gian77 avatar Sep 21 '22 18:09 Gian77

Well, not exactly. The characterization is correct in terms of the descriptions. This step does not classify by any means, it merely provides all information on all contigs. For an actual classification (chromosome/plasmid), you should use Platon in the default (accuracy) mode.
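
To see which contigs carry plasmid hits in the characterization table but are not reported by the default mode, the two tables can be compared with a small script. This is a sketch under assumptions: it assumes tab-separated tables whose header contains the ID and # Plasmid Hits fields mentioned in this thread, the function names are hypothetical, and the inline example data is made up.

```python
import csv
import io


def read_table(handle):
    """Read a tab-separated table into a dict keyed by the ID column."""
    return {row["ID"]: row for row in csv.DictReader(handle, delimiter="\t")}


def hits_not_classified(default_tsv, characterize_tsv):
    """Contigs with at least one plasmid hit that the default run did not report."""
    default_ids = set(read_table(default_tsv))
    char_rows = read_table(characterize_tsv)
    return sorted(
        cid
        for cid, row in char_rows.items()
        if int(row["# Plasmid Hits"]) > 0 and cid not in default_ids
    )


# Made-up tables standing in for the two output files.
default_tsv = io.StringIO("ID\tRDS\t# Plasmid Hits\ncontig_a\t5.2\t3\n")
characterize_tsv = io.StringIO(
    "ID\tRDS\t# Plasmid Hits\n"
    "contig_a\t5.2\t3\n"
    "contig_b\t-12.4\t1\n"
    "contig_c\t-30.1\t0\n"
)

print(hits_not_classified(default_tsv, characterize_tsv))
```

For real files, the two `io.StringIO` objects would be replaced by open file handles. Contigs listed by this comparison are described but not classified as plasmid-borne, which is consistent with the distinction made above.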

oschwengers avatar Sep 23 '22 06:09 oschwengers

OK @oschwengers, I will look into the manual. I think I did not specify accuracy mode when I ran it. Thanks a lot, Gian

Gian77 avatar Sep 24 '22 14:09 Gian77