
SampComp error

Open rania-o opened this issue 2 years ago • 5 comments

Hello,

I'm using Nanocompore to compare a modified sample with an IVT sample. I've already run the Nanopolish eventalign collapse step, and I got this in the log file (for the IVT sample; the modified one gives similar results):

2022-03-25T10:41:46.337153+0100 WARNING - MainProcess | Running Eventalign_collapse
2022-03-25T10:41:46.337736+0100 INFO - MainProcess | Checking and initialising Eventalign_collapse
2022-03-25T10:41:46.339649+0100 INFO - MainProcess | Starting data processing
2022-03-25T10:54:48.308272+0100 INFO - Process-6 | Output reads written:21561


Written Reads:21561 Kmers:6887018

When I grep for the valid kmers in the collapsed output file, I get 6078205 valid kmers out of 6887018 kmers.

After this, I tried to run SampComp (even though I don't have any replicates):

nanocompore sampcomp --file_list1 psi0_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv  --file_list2 psi2_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv --fasta ../transcript_oligo.fasta  --outpath ./samp_comp_results

2022-03-25T14:32:16.222012+0100 WARNING - MainProcess | Running SampComp
2022-03-25T14:32:16.222857+0100 INFO - MainProcess | Checking and initialising SampComp
2022-03-25T14:32:16.226479+0100 INFO - MainProcess | Only 1 replicate found for condition Condition1
2022-03-25T14:32:16.226733+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:32:16.227296+0100 INFO - MainProcess | Only 1 replicate found for condition Condition2
2022-03-25T14:32:16.227704+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:32:16.230122+0100 INFO - MainProcess | Reading eventalign index files
2022-03-25T14:32:18.253073+0100 INFO - MainProcess | 	References found in index: 1
2022-03-25T14:32:18.253414+0100 INFO - MainProcess | Filtering out references with low coverage
2022-03-25T14:32:18.254686+0100 INFO - MainProcess | 	References remaining after reference coverage filtering: 0
2022-03-25T14:32:18.255010+0100 INFO - MainProcess | Starting data processing
2022-03-25T14:32:18.301037+0100 INFO - Process-3 | All Done. Transcripts processed: 0
2022-03-25T14:32:18.309365+0100 INFO - MainProcess | Loading SampCompDB
2022-03-25T14:32:18.317105+0100 INFO - MainProcess | The result database is empty
2022-03-25T14:32:18.318381+0100 INFO - MainProcess | Saving results

So I ran it again with --min_coverage set to 0:

nanocompore sampcomp --file_list1 psi0_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv  --file_list2 psi2_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv --fasta ../transcript_oligo.fasta  --outpath ./samp_comp_results_2 --min_coverage 0

Condition:Condition1 Sample:Condition1_1 	High fraction of invalid kmers: 21,555	valid reads: 6
Condition:Condition2 Sample:Condition2_1 	High fraction of invalid kmers: 20,243	valid reads: 2

But there are almost 6 million valid kmers; isn't that enough? Or does it mean that my data is not suitable for Nanocompore? (I used other tools to detect modifications, and they worked well.)

This is the error message I got:

2022-03-25T14:59:18.933552+0100 WARNING - MainProcess | Running SampComp
2022-03-25T14:59:18.934119+0100 INFO - MainProcess | Checking and initialising SampComp
2022-03-25T14:59:18.937440+0100 INFO - MainProcess | Only 1 replicate found for condition Condition1
2022-03-25T14:59:18.937670+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:59:18.938098+0100 INFO - MainProcess | Only 1 replicate found for condition Condition2
2022-03-25T14:59:18.938339+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:59:18.940320+0100 INFO - MainProcess | Reading eventalign index files
2022-03-25T14:59:20.513673+0100 INFO - MainProcess | 	References found in index: 1
2022-03-25T14:59:20.514114+0100 INFO - MainProcess | Filtering out references with low coverage
2022-03-25T14:59:20.515235+0100 INFO - MainProcess | 	References remaining after reference coverage filtering: 1
2022-03-25T14:59:20.515533+0100 INFO - MainProcess | Starting data processing
2022-03-25T14:59:20.637782+0100 ERROR - Process-2 | Error doing GMM test on reference dystro-oligo
2022-03-25T14:59:20.638123+0100 ERROR - Process-2 | Error in Worker
nanocompore.common.NanocomporeError: Error doing GMM test on reference dystro-oligo
ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I don't know if this is clear; I look forward to your help. Thank you.

rania-o avatar Mar 25 '22 15:03 rania-o

I get the same error. Did you find a solution? I guess you need at least two replicates per condition?

JannesSP avatar Oct 11 '22 10:10 JannesSP

No, I didn't. I just used other tools.

rania-o avatar Oct 11 '22 11:10 rania-o

Hi rania-o and JannesSP, I apologise for the lack of activity here last year. How long is your reference sequence? If it is close to 100 nt long, you may need to lower the minimum reference length. You may also want to look at the --max_invalid_kmers_freq option and set it higher than 0.1 (the default).

I know you've likely moved on from using nanocompore, but if you try these settings and it works for you, let me know.

Thanks, Logan

lmulroney avatar Jan 05 '23 13:01 lmulroney
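
For anyone landing here with the same symptoms, a rough back-of-the-envelope check using only the numbers reported in the original post may help explain why so few reads survive. The sketch below assumes the invalid-kmer filter is applied per read and that a read is kept when its invalid-kmer share is at or below the threshold; `passes_filter` is a hypothetical helper written for illustration, not Nanocompore's actual code, and the exact comparison Nanocompore uses is not confirmed in this thread.

```python
# Figures reported in the original post (IVT sample).
TOTAL_KMERS = 6_887_018
VALID_KMERS = 6_078_205
READS_WRITTEN = 21_561
READS_KEPT = 6  # "valid reads: 6" for Condition1

# Default of --max_invalid_kmers_freq, as stated in the reply above.
MAX_INVALID_KMERS_FREQ = 0.1


def passes_filter(invalid, total, threshold=MAX_INVALID_KMERS_FREQ):
    """Hypothetical per-read check (assumption): keep a read only if its
    share of invalid kmers does not exceed the threshold."""
    return total > 0 and invalid / total <= threshold


invalid_kmers = TOTAL_KMERS - VALID_KMERS
overall_invalid_fraction = invalid_kmers / TOTAL_KMERS
kept_read_fraction = READS_KEPT / READS_WRITTEN

print(f"overall invalid kmer fraction: {overall_invalid_fraction:.4f}")
print(f"fraction of reads kept:        {kept_read_fraction:.6f}")
# Treating the whole sample as one "average" read:
print("average read passes default threshold:",
      passes_filter(invalid_kmers, TOTAL_KMERS))
```

Under these assumptions the overall invalid fraction comes out a little above 0.1, so an average read would sit just over the default threshold even though most kmers are valid. That is consistent with almost all reads being discarded, and with the suggestion above to raise --max_invalid_kmers_freq.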

@rania-o What other tools have you tried?

keenhl avatar Mar 11 '24 18:03 keenhl

@keenhl DRUMMER, EpiNano, ELIGOS, xPore ...

rania-o avatar Mar 16 '24 17:03 rania-o