
Problem running shred with certain values of `length`

Open TomHarrop opened this issue 2 months ago • 0 comments

We're using `shred.sh` to split a large fish chromosome into chunks. Setting `length=550000000` produces the error below, but `length=500000000` works.

minlen=1, maxlen=-1
Input is being processed as unpaired
java.lang.Exception: 
An input file appears to be misformatted:
The character with ASCII code 0 appeared where a base was expected.
Sequence #3
Sequence ID: 'CM057449.1 Neoceratodus forsteri isolate LF-2020 chromosome 1_1, whole genome shotgun sequence_1647000000--2097967297'
Sequence: '[84, 84, 67, 65, 65, 71, 71, 65, 65, 84, 84,

...

This can be bypassed with the flag 'tossjunk', 'fixjunk', or 'ignorejunk'
	at shared.KillSwitch.kill(KillSwitch.java:97)
	at stream.Read.validateCommonCase_branchless(Read.java:413)
	at stream.Read.validate(Read.java:116)
	at stream.Read.<init>(Read.java:78)
	at stream.Read.<init>(Read.java:51)
	at synth.Shred.processUnevenly(Shred.java:360)
	at synth.Shred.processInner(Shred.java:286)
	at synth.Shred.process(Shred.java:226)
	at synth.Shred.main(Shred.java:43)


The input is the assembled chromosome file for chr1 from the GCA_016271365.2_neoFor_v3.1 assembly on NCBI.

These two commands work:

apptainer exec \
docker://quay.io/biocontainers/bbmap:39.37--he5f24ec_0 \
    reformat.sh \
    -Xmx16g \
    in=chr1_1.fna.gz \
    out=chr1_1.reformat.fa

apptainer exec \
docker://quay.io/biocontainers/bbmap:39.37--he5f24ec_0 \
    shred.sh \
    -Xmx16g \
    length=500000000 \
    overlap=1000000 \
    in=chr1_1.reformat.fa \
    out=chr1_1.reformat.shred.fa 

But changing the second command to this gives us the error.

apptainer exec \
docker://quay.io/biocontainers/bbmap:39.37--he5f24ec_0 \
    shred.sh \
    -Xmx16g \
    length=550000000 \
    overlap=1000000 \
    in=chr1_1.reformat.fa \
    out=chr1_1.reformat.shred.fa 
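
One possible reading of the sequence ID in the error, offered as an unverified sketch rather than a diagnosis: if consecutive shreds start `length - overlap` bases apart (an assumption about how shred steps through the sequence), the window with index 3 starts at 1647000000, which matches the ID. Its unclipped end, start plus length, is 2197000000, which does not fit in a signed 32-bit integer. The arithmetic below emulates that wrap-around in bash. The variable names are ours and nothing here is taken from the BBTools source.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes window starts are spaced (length - overlap) apart
# and that the end coordinate is computed as start + length in a signed
# 32-bit integer. Neither assumption has been checked against Shred.java.

overlap=1000000
index=3   # window whose start matches the ID in the error message

for length in 500000000 550000000; do
    step=$((length - overlap))
    start=$((index * step))
    raw_end=$((start + length))
    # Emulate signed 32-bit wrap-around (bash arithmetic is 64-bit).
    wrapped_end=$(( (raw_end + 2147483648) % 4294967296 - 2147483648 ))
    echo "length=$length start=$start raw_end=$raw_end wrapped_end=$wrapped_end"
done
# length=500000000 start=1497000000 raw_end=1997000000 wrapped_end=1997000000
# length=550000000 start=1647000000 raw_end=2197000000 wrapped_end=-2097967296
```

For `length=550000000` the wrapped value is -2097967296, one off from the `-2097967297` that appears after the first hyphen in the sequence ID (`1647000000--2097967297`). The difference of one might be an inclusive end coordinate, but that is a guess. This sketch only looks at the window with index 3 and says nothing about later windows, since we have not stated the chromosome length here.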

TomHarrop, Nov 15 '25 10:11