
[abpoa_gen_cons] "Not enough sequences to perform msa"

Open: santiago-es opened this issue (10 comments)

So after successfully compiling TideHunter and abPOA, I'm running into this error when attempting to generate a consensus sequence:

[abpoa_gen_cons] "Not enough sequences to perform msa"

However, my sequence file is 195,588 lines long (as measured by zcat < *.fq.gz | wc -l), i.e. about 48,897 reads at four lines per FASTQ record, so I feel like that should be more than enough sequence?
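Note that wc -l counts lines, not reads. A minimal Python sketch for counting records directly is below. It assumes the plain four-line FASTQ layout (header, sequence, '+', quality) with no wrapped sequence lines. It also checks the gzip magic bytes instead of trusting the .gz extension. The helper names open_fastq and count_fastq_records are hypothetical and are not part of TideHunter.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def open_fastq(path):
    """Open a FASTQ file as text, decompressing only if it really is gzip."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def count_fastq_records(path):
    """Count records, assuming the four-line FASTQ layout."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        # A line count that is not a multiple of 4 means the file is not
        # plain four-line FASTQ (truncated, wrapped, or otherwise malformed).
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

For example, count_fastq_records("FAR63237_pass_barcode01_9dc2df5e_12.fastq.gz") returns the number of reads in the attached file.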

I am attempting to use a large gzipped FASTQ file as TideHunter input. As a minimal reproducible example, I will include one of the constituent FASTQ files, which is much smaller (FAR....12.fastq.gz):

FAR63237_pass_barcode01_9dc2df5e_12.fastq.gz

These FASTQ files were generated by the default MinKNOW basecaller from FAST5 files produced by nanopore sequencing on a MinION 9.3.4 flow cell.

My execute command for tidehunter was:

./TideHunter-v1.5.3.2/bin/TideHunter -f 3 *.fastq.gz

and I also attempted

./TideHunter-v1.5.3.2/bin/TideHunter -f 3 $(zcat < *.fastq.gz)

which has not yet failed with this error but also seems to be taking much longer than anticipated to run.

Thanks for your help

santiago-es, Feb 25 '22 16:02

Hi,

With the command

~/program/TideHunter/bin/TideHunter -f3 ./FAR63237_pass_barcode01_9dc2df5e_12.fastq.gz > out

It works normally on my machine.

Can you also provide the data that causes the error [abpoa_gen_cons] "Not enough sequences to perform msa"?

Yan

yangao07, Feb 27 '22 11:02

I get the error with this file as well, but here is the file I am actually trying to run this on (a Google Drive link, since the file is too large to attach, ~780 MB):

https://drive.google.com/file/d/1r7nMrOjSGGxFJrVlDaqwXpkYzHffcJsH/view?usp=sharing

santiago-es, Feb 28 '22 18:02

Also just to be precise, the error specifically says "[abpoa_gen_cons] No enough sequences to perform msa."

santiago-es, Feb 28 '22 18:02

If it works on your machine, I think the issue might once again be with compilation...

I re-downloaded the TideHunter repo (git clone --recursive), including the updated abPOA within it (also --recursive), and tried to rebuild after including <arm_neon.h> in the SIMDe instructions and changing -march=native to -mcpu=apple-m1, as I had done previously. I needed to change three instances of "%ld" in src/main.c to "%lld" to address several warnings, but then make fails again with the same error as last time:

Undefined symbols for architecture arm64:
  "_ksw_extz2_sse", referenced from:
      _ksw2_global in ksw2_align.o
      _ksw2_global_with_cigar in ksw2_align.o
      _ksw2_right_ext in ksw2_align.o
      _ksw2_left_ext in ksw2_align.o
      _ksw2_right_extend in ksw2_align.o
      _ksw2_left_extend in ksw2_align.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [bin/TideHunter] Error 1

after running make clean_all all armv8=1 aarch64=1 (which I believe are the correct flags for my Apple M1 Pro MacBook Pro).

santiago-es, Feb 28 '22 19:02

Hi, with your 780 MB data I was able to reproduce the error. Let me look into it; this may take some time.

yangao07, Mar 01 '22 10:03

Ok, thanks!

The larger file was produced by using zcat to combine several smaller fq.gz files, like so:

zcat *fastq.gz > newFile.fq.gz
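zcat writes decompressed text to standard output. A file built this way therefore holds plain FASTQ despite its .gz name. Gzip files can also be concatenated byte for byte (as cat *.fastq.gz > merged.fastq.gz does), and the result is a valid multi-member gzip stream that zcat decompresses member by member. A minimal Python sketch of that byte-level merge is below. It first checks that every input really starts with the gzip magic bytes. The helpers is_gzip and merge_gzip_files are hypothetical, and whether either kind of merged file affects TideHunter is not established here.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # every gzip member starts with these two bytes


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as raw:
        return raw.read(2) == GZIP_MAGIC


def merge_gzip_files(inputs, output):
    """Concatenate gzip files byte for byte, like `cat a.gz b.gz > out.gz`.

    The output is a multi-member gzip stream; gzip readers such as zcat and
    Python's gzip module decompress the members one after another.
    """
    with open(output, "wb") as out:
        for path in inputs:
            if not is_gzip(path):
                raise ValueError(f"{path} is not gzip-compressed")
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

Merging the compressed bytes avoids a decompress/recompress round trip and keeps the .gz extension honest.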

Hope that is helpful!

Santiago

santiago-es, Mar 01 '22 14:03

Thanks for the bug-fix push. I've re-downloaded and compiled the repo (successfully, with no alterations) on my Ubuntu 20.x distro. TideHunter now runs successfully on the individual FASTQ files, but not on the merged FASTQ, which reproduces the error in the original issue comment. Odd behavior I don't quite understand. Hope this helps!
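Since the individual files work and the merged one does not, one thing that may be worth ruling out is a malformed record in the merged file, for example at a boundary between joined files. This is an assumption, not a diagnosis from the thread. A minimal Python sketch that scans for records breaking the four-line FASTQ layout is below. It assumes unwrapped four-line records. Once one record is misaligned, later records will misalign too, so the output is capped. The helper names open_fastq and find_malformed_records are hypothetical.

```python
import gzip
from itertools import islice

GZIP_MAGIC = b"\x1f\x8b"  # gzip magic number, checked instead of the extension


def open_fastq(path):
    """Open a FASTQ file as text, decompressing only if it really is gzip."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def find_malformed_records(path, limit=10):
    """Return up to `limit` (record_number, reason) pairs for bad records."""
    problems = []
    with open_fastq(path) as handle:
        record = 0
        while len(problems) < limit:
            lines = list(islice(handle, 4))  # next four-line record
            if not lines:
                break  # clean end of file
            record += 1
            lines = [line.rstrip("\r\n") for line in lines]
            if len(lines) < 4:
                problems.append((record, "truncated record at end of file"))
                break
            header, seq, plus, qual = lines
            if not header.startswith("@"):
                problems.append((record, "header does not start with '@'"))
            elif not plus.startswith("+"):
                problems.append((record, "third line does not start with '+'"))
            elif len(seq) != len(qual):
                problems.append((record, "sequence and quality lengths differ"))
    return problems
```

An empty list means every record has the expected shape. It says nothing about whether TideHunter will find tandem repeats in them.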

santiago-es, Mar 03 '22 19:03

I did not see the error using mcf10a.fq.gz. Can you upload the merged data?

yangao07, Mar 04 '22 03:03

The mcf10a file is a merged fastq.gz of 12 fq.gz files with the same barcode. The other merged file is 2.1 GB. Both produce the same error on my machine. Perhaps it's because I installed with wget instead of git clone on my Linux distro?


santiago-es, Mar 04 '22 04:03

Maybe. Just try re-running it with a copy installed via git clone.

yangao07, Mar 06 '22 07:03