pbbioconda
Lima failing to detect CCS data after Skera de-concatenation & bam2fastq conversion
Operating system Amazon Linux 2
Package name Lima v2.9, Skera v1.2
Conda environment
# packages in environment at /home/ec2-user/miniconda3/envs/pbtk:
#
# Name      Version   Build        Channel
lima        2.9.0     h9ee0642_1   bioconda
pbskera     1.2.0     hdfd78af_0   bioconda
pbtk        3.1.0     h9ee0642_0   bioconda
Describe the bug Lima fails to identify CCS reads after pbskera is used to de-concatenate the reads into S-reads (both through SMRT Link and via the command line) and the result is converted with bam2fastq. This was not an issue before Kinnex datasets: the same workflow on native FASTQ-converted reads did not cause this error. I found only one other report of this bug, from 2021, and it does not appear to be relevant.
Error message 20240430 04:09:25.288 | WARN | Attention! You are trying to demultiplex non CCS data. CLR demultiplexing is only supported with BAM/XML input! Will proceed to demultiplex each sequence individually, not grouped by ZMW!
To Reproduce
- De-concat raw reads with skera
- bam2fastq -u -o reads skera.bam
- lima --hifi-preset ASYMMETRIC --biosample-csv barcode-sample-16S.csv --split-named --output-missing-pairs input.fastq kinnex16S.fasta demux.fastq
Expected behavior Our workflow is the same for pre-Kinnex and post-Kinnex data, and this error occurs only with Kinnex datasets. It is perhaps due to a header change: the new FASTQ header contains an extra field compared to the old datasets.
Old: @m84073_240328_065715_s1/133239718/ccs
New: @m84073_240426_082659_s4/250483516/ccs/16_1598
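To check the header hypothesis on a given file, a small tally of read-name shapes may help. The sketch below is only a diagnostic: `count_header_shapes` is a hypothetical helper name, and it assumes standard four-line FASTQ records with unwrapped sequence lines. It does not reproduce Lima's own CCS detection logic, which is not described in this report.

```shell
# Hypothetical helper: tally FASTQ read-name shapes from stdin.
# Assumption: 4-line FASTQ records, so header lines are lines 1, 5, 9, ...
count_header_shapes() {
  awk 'NR % 4 == 1 {
         name = $1   # read name only, ignoring any description after whitespace
         if (name ~ /\/ccs$/)        shape = "ends_with_ccs"    # old-style name
         else if (name ~ /\/ccs\//)  shape = "ccs_plus_suffix"  # extra field after /ccs
         else                        shape = "other"
         n[shape]++
       }
       END {
         printf "ends_with_ccs=%d ccs_plus_suffix=%d other=%d\n",
                n["ends_with_ccs"], n["ccs_plus_suffix"], n["other"]
       }'
}
```

Running `count_header_shapes < reads.fastq` on the pre-Kinnex and post-Kinnex files would show whether every post-Skera read name carries the extra field after `/ccs`, which is the difference suspected above.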