blobtools bamfilter arguments

Open tomwells-oxf opened this issue 4 years ago • 1 comments

Hi, the documentation lists options for bamfilter that don't seem to exist in the version I downloaded and installed following the instructions (Option A: Conda). These include --sort, --keep and --threads, and the documented -u is actually -U. Are the docs wrong, or am I downloading the wrong version? Cheers

tomwells-oxf, Feb 16 '21 10:02

@DRL I have noticed the same thing, as well as other puzzling behavior from blobtools bamfilter. The arguments that blobtools bamfilter reports don't match the documentation (https://blobtools.readme.io/docs/bamfilter).

The arguments from blobtools bamfilter

(blobtools) [user@server blobtools]$ blobtools -v
1.1.1
(blobtools) [user@server blobtools]$ blobtools bamfilter
usage: blobtools bamfilter  -b FILE [-i FILE] [-e FILE] [-U] [-n] [-o PREFIX] [-f FORMAT]
                                [-h|--help]
                                  
(blobtools) [user@server blobtools]$ blobtools bamfilter -h
usage: blobtools bamfilter  -b FILE [-i FILE] [-e FILE] [-U] [-n] [-o PREFIX] [-f FORMAT]
                                [-h|--help]

    Options:
        -h --help                   show this
        -b, --bam FILE              BAM file (sorted by name)
        -i, --include FILE          List of contigs whose reads are included
                                    - writes FASTAs of pairs where at least
                                        one read maps sequences in list
                                        (InUn.fq, InIn.fq, ExIn.fq)
        -e, --exclude FILE          List of contigs whose reads are excluded (outputs reads that do not map to sequences in list)
                                    - writes FASTAs of pairs where at least
                                        one read does not maps to sequences in list
                                        (InUn.fq, InIn.fq, ExIn.fq)
        -U, --exclude_unmapped      Exclude pairs where both reads are unmapped
        -n, --noninterleaved        Use if fw and rev reads should be in separate files
        -f, --read_format FORMAT    FASTQ = fq, FASTA = fa [default: fa]
        -o, --out PREFIX            Output prefix

I also noticed a few other discrepancies when running. When I run:

#blobtools bamfilter -b ${gIllumina_unmappedspades_bam} -i pullreads_Actinomycetota_contigs.txt -o pullreads_Actinomycetota_contigs

I get the following files.

pullreads_Actinomycetota_contigs.data.bam.ExIn.fa
pullreads_Actinomycetota_contigs.data.bam.info.txt
pullreads_Actinomycetota_contigs.data.bam.InIn.fa
pullreads_Actinomycetota_contigs.data.bam.UnUn.fa

(base) [user@server pullreads]$ less pullreads_Actinomycetota_contigs.data.bam.info.txt
| Total pairs | 82,485,421 | 100.0% |
| InUn pairs | 0 | 0.0% |
| InIn pairs | 151,618 | 0.2% |
| ExIn pairs | 719 | 0.0% |
| UnUn pairs | 28,073,930 | 34.0% |
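
As a quick sanity check on the info.txt counts above, the short sketch below recomputes the percentages and works out how many pairs are not listed under any of the four labels. The counts are copied from the output above; the variable names are only illustrative, and the idea that the unlisted pairs are those in which neither read maps to a listed contig is an assumption, not something the output states.

```python
# Counts copied from the info.txt output above.
total_pairs = 82_485_421
reported = {
    "InUn": 0,
    "InIn": 151_618,
    "ExIn": 719,
    "UnUn": 28_073_930,
}


def percent(count, total):
    """Percentage of total, rounded to one decimal place as in info.txt."""
    return round(100.0 * count / total, 1)


# Recompute the percentage column for each reported label.
percentages = {label: percent(n, total_pairs) for label, n in reported.items()}

# Pairs that do not appear under any reported label.
# Assumption: these are pairs where neither read maps to a listed contig.
unlisted = total_pairs - sum(reported.values())

print(percentages)
print(unlisted, percent(unlisted, total_pairs))
```

The recomputed percentages agree with the reported column, and roughly two thirds of all pairs fall outside the four listed labels.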

I get no InUn pairs in the FASTA output, but I do get ExIn pairs, even though I did not pass -e (exclude) to blobtools bamfilter. I have run this with several other contig lists from the same BAM/assembly, and I always get ExIn pairs and no InUn pairs, which seems suspicious to me. Shouldn't there be pairs in which one read maps and the other does not? Could ExIn and InUn have been accidentally switched?
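
To make the question concrete, here is a minimal sketch of one possible reading of the pair labels. It is an assumption based only on the label names and the help text, not taken from the blobtools source: "In" means a read maps to a contig in the -i list, "Ex" means it maps to a contig outside the list, and "Un" means it is unmapped. The functions `read_state` and `pair_label` are hypothetical helpers written for this illustration.

```python
def read_state(contig, include):
    """Classify one read: 'Un' if unmapped (contig is None),
    'In' if it maps to a listed contig, otherwise 'Ex'.
    Assumed meaning of the labels, not confirmed against blobtools."""
    if contig is None:
        return "Un"
    return "In" if contig in include else "Ex"


def pair_label(contig1, contig2, include):
    """Label a read pair from the states of its two reads."""
    states = {read_state(contig1, include), read_state(contig2, include)}
    if states == {"In"}:
        return "InIn"
    if states == {"In", "Un"}:
        return "InUn"
    if states == {"In", "Ex"}:
        return "ExIn"
    if states == {"Un"}:
        return "UnUn"
    return "other"  # ExEx or ExUn: neither read maps to a listed contig


include = {"contig_1"}  # hypothetical include list
print(pair_label("contig_1", "contig_1", include))
print(pair_label("contig_1", "contig_2", include))
print(pair_label("contig_1", None, include))
print(pair_label(None, None, include))
```

Under this reading, ExIn pairs can occur with only -i, because a mate may map to a contig that is not in the list. It would not by itself explain a count of zero InUn pairs, so the possibility of swapped labels remains open.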

Thanks for taking a look!

margaretc-ho, May 22 '23 17:05