Hostile with no options classifying different than --invert

Open jannikseidelQBiC opened this issue 1 year ago • 3 comments

Hi and first, thanks for the great work.

I ran Hostile twice on the same Illumina paired-end data: once to get the filtered result files and once to get the removed read pairs. What caught my eye is that the two results do not match: reads_removed in the first output should equal reads_out in the second, and vice versa.

| Mode | reads_removed | reads_out |
| --- | --- | --- |
| no option | 19870638 | 42475288 |
| --invert | 42896358 | 19449568 |
| Difference to 'no option' | 421070 | -421070 |
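
The mismatch can be checked directly from the counts in the two logs. The following is a minimal sketch; the variable names are illustrative and not part of Hostile. It confirms that each run accounts for all input reads on its own, while a fixed number of reads ends up in neither output file.

```python
# Counts copied from the two Hostile logs in this issue.
reads_in = 62345926
default = {"reads_out": 42475288, "reads_removed": 19870638}
invert = {"reads_out": 19449568, "reads_removed": 42896358}

# Each run on its own accounts for every input read.
assert default["reads_out"] + default["reads_removed"] == reads_in
assert invert["reads_out"] + invert["reads_removed"] == reads_in

# If --invert were an exact complement, both gaps would be zero.
gap_removed = invert["reads_removed"] - default["reads_out"]
gap_out = invert["reads_out"] - default["reads_removed"]

# Reads written to neither the default output nor the --invert output.
unclaimed = reads_in - default["reads_out"] - invert["reads_out"]

print(gap_removed, gap_out, unclaimed)
```

The two gaps have the same magnitude and opposite sign, and that magnitude equals the number of reads missing from both outputs.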

The commands I used (installation of Hostile 1.1.0 via conda):

hostile clean --fastq1 <file_forward>.fq.gz --fastq2 <file_reverse>.fq.gz --out-dir filtered_1 > log1_filtered.log
hostile clean --fastq1 <file_forward>.fq.gz --fastq2 <file_reverse>.fq.gz --out-dir removed_1 --invert > log1_removed.log

It seems that running with the --invert flag classifies reads differently than running without it. Am I missing an option that I need to set to get matching results?

Thanks in advance!

PS: Here are the log files.

[
    {
        "version": "1.1.0",
        "aligner": "bowtie2",
        "index": "human-t2t-hla",
        "options": [],
        "fastq1_in_name": "<file_forward>.fq.gz",
        "fastq1_in_path": "<path_to_files>/<file_forward>.fq.gz",
        "fastq1_out_name": "<file_forward>.clean_1.fastq.gz",
        "fastq1_out_path": "filtered_1/<file_forward>.clean_1.fastq.gz",
        "reads_in": 62345926,
        "reads_out": 42475288,
        "reads_removed": 19870638,
        "reads_removed_proportion": 0.31872,
        "fastq2_in_name": "<file_reverse>.fq.gz",
        "fastq2_in_path": "<path_to_files>/<file_reverse>.fq.gz",
        "fastq2_out_name": "<file_reverse>.clean_2.fastq.gz",
        "fastq2_out_path": "filtered_1/<file_reverse>.clean_2.fastq.gz"
    }
]
[
    {
        "version": "1.1.0",
        "aligner": "bowtie2",
        "index": "human-t2t-hla",
        "options": [
            "invert"
        ],
        "fastq1_in_name": "<file_forward>.fq.gz",
        "fastq1_in_path": "<path_to_files>/<file_forward>.fq.gz",
        "fastq1_out_name": "<file_forward>.clean_1.fastq.gz",
        "fastq1_out_path": "removed_1/<file_forward>.clean_1.fastq.gz",
        "reads_in": 62345926,
        "reads_out": 19449568,
        "reads_removed": 42896358,
        "reads_removed_proportion": 0.68804,
        "fastq2_in_name": "<file_reverse>.fq.gz",
        "fastq2_in_path": "<path_to_files>/<file_reverse>.fq.gz",
        "fastq2_out_name": "<file_reverse>.clean_2.fastq.gz",
        "fastq2_out_path": "removed_1/<file_reverse>.clean_2.fastq.gz"
    }
]

jannikseidelQBiC avatar Sep 09 '24 06:09 jannikseidelQBiC

Hi Jannik, thank you, this is interesting. From your data there certainly appears to be a problem with how --invert is implemented. By any chance are you able to send me some (or all) of your test data?

Bede

bede avatar Sep 09 '24 19:09 bede

Hi Bede, unfortunately I cannot share the dataset. Could you try to reproduce the behavior with another dataset? If it only occurs with this dataset, that would also be very interesting.

Best, Jannik

jannikseidelQBiC avatar Sep 11 '24 06:09 jannikseidelQBiC

Thank you – that's understandable. I will investigate using other data.

bede avatar Sep 11 '24 06:09 bede

Please accept my apologies for the delay. I've reproduced the problem and pushed a fix, to be released in the coming days. I had mistakenly assumed that, for paired reads, samtools view -F 12 outputs the inverse of samtools view -f 12. It does not: -f 12 keeps only reads with both flag bits 4 (read unmapped) and 8 (mate unmapped) set, whereas -F 12 drops reads with either bit set, so reads from pairs in which exactly one mate aligned appear in neither output. For the inverted paired scenario we now use a Samtools filter expression that combines the bitwise flags 4 and 8 with a logical OR rather than the AND that was previously used incorrectly. This issue only affected --invert mode in the paired read case. A test case has been written. Thank you very much for catching this.

https://github.com/bede/hostile/commit/cc8a1010ac9e7b1f0a80042e0bb3cbbf05d1e30d
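
To illustrate why the two samtools options are not complements, here is a minimal Python sketch of the flag logic only. It does not call samtools and does not reproduce the filter expression from the commit; the function names are hypothetical. It relies on the documented behavior that -f requires all listed bits to be set and -F rejects a read if any listed bit is set.

```python
# SAM flag bits: 0x4 = read unmapped, 0x8 = mate unmapped.
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
BOTH = READ_UNMAPPED | MATE_UNMAPPED  # 12


def keep_f12(flag: int) -> bool:
    """Mimics `samtools view -f 12`: keep only if both bits are set."""
    return (flag & BOTH) == BOTH


def keep_F12(flag: int) -> bool:
    """Mimics `samtools view -F 12`: keep only if neither bit is set."""
    return (flag & BOTH) == 0


def keep_true_inverse(flag: int) -> bool:
    """Exact complement of -f 12: keep if at least one of the bits is unset."""
    return not keep_f12(flag)


# The four combinations of the two bits (other flag bits ignored here).
for flag in (0, READ_UNMAPPED, MATE_UNMAPPED, BOTH):
    print(flag, keep_f12(flag), keep_F12(flag), keep_true_inverse(flag))
```

Flags 4 and 8 alone are rejected by both `keep_f12` and `keep_F12`, which would be consistent with the 421070 reads that appeared in neither output in the original report.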

bede avatar Dec 13 '24 20:12 bede

Released in 2.0.0

bede avatar Dec 19 '24 17:12 bede