BBTools
Problem running shred with certain values of `length`
We're using shred to split a large fish chromosome into chunks. We've found that setting length=550000000 gives us an error, but length=500000000 works.
```
minlen=1, maxlen=-1
Input is being processed as unpaired
java.lang.Exception:
An input file appears to be misformatted:
The character with ASCII code 0 appeared where a base was expected.
Sequence #3
Sequence ID: 'CM057449.1 Neoceratodus forsteri isolate LF-2020 chromosome 1_1, whole genome shotgun sequence_1647000000--2097967297'
Sequence: '[84, 84, 67, 65, 65, 71, 71, 65, 65, 84, 84,
...
This can be bypassed with the flag 'tossjunk', 'fixjunk', or 'ignorejunk'
at shared.KillSwitch.kill(KillSwitch.java:97)
at stream.Read.validateCommonCase_branchless(Read.java:413)
at stream.Read.validate(Read.java:116)
at stream.Read.<init>(Read.java:78)
at stream.Read.<init>(Read.java:51)
at synth.Shred.processUnevenly(Shred.java:360)
at synth.Shred.processInner(Shred.java:286)
at synth.Shred.process(Shred.java:226)
at synth.Shred.main(Shred.java:43)
```
The input is the assembled chromosome file for chr1 from the GCA_016271365.2_neoFor_v3.1 assembly on NCBI.
These two commands work:
```
apptainer exec \
docker://quay.io/biocontainers/bbmap:39.37--he5f24ec_0 \
reformat.sh \
-Xmx16g \
in=chr1_1.fna.gz \
out=chr1_1.reformat.fa

apptainer exec \
docker://quay.io/biocontainers/bbmap:39.37--he5f24ec_0 \
shred.sh \
-Xmx16g \
length=500000000 \
overlap=1000000 \
in=chr1_1.reformat.fa \
out=chr1_1.reformat.shred.fa
```
But changing `length` to 550000000 in the second command gives us the error:
```
apptainer exec \
docker://quay.io/biocontainers/bbmap:39.37--he5f24ec_0 \
shred.sh \
-Xmx16g \
length=550000000 \
overlap=1000000 \
in=chr1_1.reformat.fa \
out=chr1_1.reformat.shred.fa
```
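
In case it helps locate the failing chunk, here is a rough sketch of the chunk arithmetic. It assumes that consecutive shreds advance by `length - overlap` and that the suffix of the sequence ID is `start--end`; both are our reading of the output, not something we confirmed in the shred source. `chunk_start` is a hypothetical helper, not part of BBTools.

```shell
# Hypothetical helper (not part of BBTools): 0-based start coordinate of a chunk,
# assuming each chunk begins (length - overlap) bases after the previous one.
# $1 = chunk index (0-based), $2 = length, $3 = overlap
chunk_start() {
  echo $(( $1 * ($2 - $3) ))
}

# Failing run: length=550000000, overlap=1000000
for i in 0 1 2 3; do
  echo "chunk $i starts at $(chunk_start "$i" 550000000 1000000)"
done
```

Under those assumptions, chunk 3 starts at 1647000000, which matches the start coordinate in the ID of the sequence named in the error (`Sequence #3`). The end coordinate in that ID, 2097967297, is smaller than 1647000000 + 550000000, so the failing read appears to be the final, shorter-than-`length` chunk.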