checkm.out strange 0 completeness and marker genes for large bins

Open gbouras13 opened this issue 2 years ago • 10 comments

Hi Rhys,

I was successfully able to run aviary very easily on a test sample (it was much easier to install, lorikeet issue notwithstanding, and runs a lot smoother for me than atlas for what it is worth).

It's awesome and I love the output from rosella so thanks for that too - the UMAPs make the bins very clear!

I am parsing the output files now and I noticed possible issues with the checkm.out output file. I have uploaded it, along with the equivalent from atlas for the same sample. (fwiw, aviary found 3 extra small bins vs atlas, which is nice!)

Essentially, I think there must be some issue with checkm because I am getting 0 or near 0 completeness for bins that I am sure are quite complete (based off the atlas output).

Nine of the bins are over 1MB, with eight around 2MB, and I have also BLASTed large chunks of the contigs to confirm that they are indeed the correct species/genera. But they all show 0 completeness and 0 marker genes found in the checkm.out file, which seems very wrong to me. So I am thinking it is likely an issue with checkm.
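A minimal sketch of how the symptom described above could be checked automatically: flag bins that are large but report near-zero completeness. It assumes the CheckM results are available as a tab-separated table with `Bin Id` and `Completeness` columns (the real checkm.out layout may differ) and that bin sizes are supplied separately. The function name, thresholds, and example values are hypothetical.

```python
import csv
import io


def suspicious_bins(checkm_table, bin_sizes_bp, min_size_bp=1_000_000, max_completeness=1.0):
    """Return IDs of bins that are large yet report (near) zero completeness.

    checkm_table: tab-separated text with 'Bin Id' and 'Completeness' columns
                  (an assumed layout, not necessarily what aviary writes).
    bin_sizes_bp: dict mapping bin ID to total bin length in base pairs.
    """
    reader = csv.DictReader(io.StringIO(checkm_table), delimiter="\t")
    flagged = []
    for row in reader:
        bin_id = row["Bin Id"]
        completeness = float(row["Completeness"])
        size = bin_sizes_bp.get(bin_id, 0)
        # A multi-megabase bin with no marker genes is more likely a failed
        # CheckM run than a genuinely empty bin.
        if size >= min_size_bp and completeness <= max_completeness:
            flagged.append(bin_id)
    return flagged


# Hypothetical example data, not taken from the attached files.
example_table = (
    "Bin Id\tCompleteness\tContamination\n"
    "bin_1\t0.00\t0.00\n"
    "bin_2\t95.20\t1.10\n"
    "bin_3\t0.00\t0.00\n"
)
example_sizes = {"bin_1": 2_000_000, "bin_2": 2_100_000, "bin_3": 50_000}
print(suspicious_bins(example_table, example_sizes))
```

Small bins with zero completeness are left alone here, since those can be legitimate low-quality bins.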

checkm.out.txt

atlas_completeness.txt

George

gbouras13 avatar Jul 20 '22 06:07 gbouras13

sample_2_checkm.out.txt

And same issue on another sample (for what it is worth atlas only found 5 bins for this sample).

gbouras13 avatar Jul 20 '22 06:07 gbouras13

Hi George,

I have some theories and it might be normal/expected behaviour from aviary, but first just need to confirm a few things:

  • Does that checkm.out file contain all of the bins that aviary recovered or have you trimmed it down to only contain the strange bins?
  • Also, have you tried running checkm directly on those bins to see if the results are the same?

Cheers, Rhys

rhysnewell avatar Jul 20 '22 22:07 rhysnewell

Turns out I was getting an issue like this https://github.com/metagenome-atlas/atlas/issues/216

It was caused by the TMPDIR set on my cluster not being in my home directory.

Setting TMPDIR to my home directory in my Slurm submission script solved the issue.
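For anyone launching the pipeline from a Python wrapper rather than a shell script, the same workaround might look roughly like the sketch below. It builds an environment with TMPDIR pointing at a chosen directory and fails early if that directory is not writable. The function name and the directory choice are assumptions for illustration, not part of aviary.

```python
import os
import tempfile
from pathlib import Path


def env_with_tmpdir(tmp_root):
    """Return a copy of the current environment with TMPDIR set to tmp_root.

    The directory is created if needed, and a throwaway file is written to it
    so that a non-writable location fails here rather than deep inside a
    pipeline step.
    """
    tmp_root = Path(tmp_root)
    tmp_root.mkdir(parents=True, exist_ok=True)
    # Raises an OSError if tmp_root cannot be written to.
    with tempfile.NamedTemporaryFile(dir=tmp_root):
        pass
    env = os.environ.copy()
    env["TMPDIR"] = str(tmp_root)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` for whatever command is being launched, for example with `env_with_tmpdir(Path.home() / "tmp")`.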

George

gbouras13 avatar Jul 21 '22 00:07 gbouras13

So the bin information produced by aviary was all as expected?

rhysnewell avatar Jul 21 '22 00:07 rhysnewell

No, not at all. Most of the bins had 90+% completeness when I re-ran it - the output looks good now.

The error must still have resulted in some output being written. I'm not exactly sure how.

George

gbouras13 avatar Jul 21 '22 00:07 gbouras13

Okay, I'm going to re-open this then. I'll have to figure out whether this is an aviary issue or not.

rhysnewell avatar Jul 21 '22 00:07 rhysnewell

I'll upload an example "correct" output later if you would like. The issue seems to be related to how Snakemake sets the tmpdir resource.

gbouras13 avatar Jul 21 '22 00:07 gbouras13

I haven't been able to reproduce this; all of the checkm results I've been testing seem to be correct. I've now added a kind of verification step where the completeness and contamination scores are reviewed by CheckM2 at the final stage and merged into the bin_info.tsv file. But yeah, I haven't seen anything weird.

Aviary does output low completeness/high contamination bins that would generally be ignored by other binning algorithms, in case you were noticing some of them.

rhysnewell avatar Jul 24 '22 22:07 rhysnewell

I haven't had the issue since I set TMPDIR="..." before running aviary.

Also, as an aside, I am not sure that the full pipeline is running to completion for me yet. I don't have 'bin_info.tsv', 'coverm_abundances.tsv' or 'checkm_minimal.tsv' in my output bins directory, only 'checkm.out' and a symlink to the bins - so when I run aviary cluster it does not work.
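A quick way to confirm which of those files are missing is sketched below. The file names come from this comment; the function name is hypothetical, and the complete set of files aviary is expected to write may be larger.

```python
from pathlib import Path

# File names mentioned in this thread; not necessarily the full set of outputs.
EXPECTED_OUTPUTS = (
    "bin_info.tsv",
    "coverm_abundances.tsv",
    "checkm_minimal.tsv",
    "checkm.out",
)


def missing_outputs(bins_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are absent from bins_dir."""
    bins_dir = Path(bins_dir)
    # is_file() follows symlinks, so a symlinked output still counts as present.
    return [name for name in expected if not (bins_dir / name).is_file()]
```

An empty list means every file named above is present in the directory.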

My application is a little bit unusual, I guess, in that I'm more interested in the bins themselves than the abundances/de-replication, so it is enough for me for now - I will wait until you're done implementing checkm2 and other fixes before I hassle you some more!

gbouras13 avatar Jul 25 '22 01:07 gbouras13

Oh, that's odd. The full output should definitely be there. If you have time, it would be helpful if you could search your log files for any errors towards the end of the pipeline that might be causing that. If not, I'll see if I can replicate that behaviour.
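One possible way to do that search is sketched below. It assumes the logs are plain-text files ending in `.log` somewhere under a single directory, and that errors contain words such as "error", "exception" or "traceback". Both the log location and the pattern are assumptions, so adjust them to your run.

```python
import re
from pathlib import Path

# Assumed markers of a failure; real log messages may use different wording.
ERROR_PATTERN = re.compile(r"error|exception|traceback", re.IGNORECASE)


def find_error_lines(log_dir, pattern=ERROR_PATTERN, glob="*.log"):
    """Return (file name, line number, line text) for each matching log line."""
    hits = []
    for log_path in sorted(Path(log_dir).rglob(glob)):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(log_path, errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if pattern.search(line):
                    hits.append((log_path.name, line_no, line.rstrip("\n")))
    return hits
```

Since the missing files are produced late in the pipeline, the matches in the most recently written logs are probably the ones worth reading first.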

rhysnewell avatar Jul 25 '22 01:07 rhysnewell

I might go ahead and close this issue, as it does not seem to be reproducible, at least with newer versions. Please reopen if it is still an issue.

rhysnewell avatar Sep 13 '23 05:09 rhysnewell