Hostile with no options classifies reads differently than with --invert
Hi, and first of all, thanks for the great work.
I tried to run Hostile to get the filtered result files and the removed read-pairs (Illumina paired-end data as input). What caught my eye is that the two results do not match:
reads_removed in the first output should be the same as reads_out in the second, and vice versa.
| Mode | reads_removed | reads_out |
|---|---|---|
| no option | 19870638 | 42475288 |
| --invert | 42896358 | 19449568 |
| Difference to 'no option' | 421070 | -421070 |
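As a quick sanity check on the numbers in the table, the following Python sketch confirms that both runs account for the same total number of input reads and that the mismatch has the same size in both directions. The variable names are illustrative only; the counts are copied from the table.

```python
# Counts copied from the table above (individual reads, not pairs).
default_run = {"reads_removed": 19870638, "reads_out": 42475288}
invert_run = {"reads_removed": 42896358, "reads_out": 19449568}

# Both runs should account for every input read.
total_default = default_run["reads_removed"] + default_run["reads_out"]
total_invert = invert_run["reads_removed"] + invert_run["reads_out"]
assert total_default == total_invert

# If --invert were an exact complement, both differences would be zero.
diff_removed = invert_run["reads_removed"] - default_run["reads_out"]
diff_out = invert_run["reads_out"] - default_run["reads_removed"]

print(total_default, diff_removed, diff_out)
# prints: 62345926 421070 -421070
```

The totals agree, so no reads are lost; 421070 reads are simply assigned to the other side in the second run.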
The commands I used (installation of Hostile 1.1.0 via conda):
hostile clean --fastq1 <file_forward>.fq.gz --fastq2 <file_reverse>.fq.gz --out-dir filtered_1 > log1_filtered.log
hostile clean --fastq1 <file_forward>.fq.gz --fastq2 <file_reverse>.fq.gz --out-dir removed_1 --invert > log1_removed.log
It seems that running with the --invert flag classifies reads differently than running without it. Am I missing an option that I need to set to get the same results?
Thanks in advance!
PS: Here are the log files.
[
{
"version": "1.1.0",
"aligner": "bowtie2",
"index": "human-t2t-hla",
"options": [],
"fastq1_in_name": "<file_forward>.fq.gz",
"fastq1_in_path": "<path_to_files>/<file_forward>.fq.gz",
"fastq1_out_name": "<file_forward>.clean_1.fastq.gz",
"fastq1_out_path": "filtered_1/<file_forward>.clean_1.fastq.gz",
"reads_in": 62345926,
"reads_out": 42475288,
"reads_removed": 19870638,
"reads_removed_proportion": 0.31872,
"fastq2_in_name": "<file_reverse>.fq.gz",
"fastq2_in_path": "<path_to_files>/<file_reverse>.fq.gz",
"fastq2_out_name": "<file_reverse>.clean_2.fastq.gz",
"fastq2_out_path": "filtered_1/<file_reverse>.clean_2.fastq.gz"
}
]
[
{
"version": "1.1.0",
"aligner": "bowtie2",
"index": "human-t2t-hla",
"options": [
"invert"
],
"fastq1_in_name": "<file_forward>.fq.gz",
"fastq1_in_path": "<path_to_files>/<file_forward>.fq.gz",
"fastq1_out_name": "<file_forward>.clean_1.fastq.gz",
"fastq1_out_path": "removed_1/<file_forward>.clean_1.fastq.gz",
"reads_in": 62345926,
"reads_out": 19449568,
"reads_removed": 42896358,
"reads_removed_proportion": 0.68804,
"fastq2_in_name": "<file_reverse>.fq.gz",
"fastq2_in_path": "<path_to_files>/<file_reverse>.fq.gz",
"fastq2_out_name": "<file_reverse>.clean_2.fastq.gz",
"fastq2_out_path": "removed_1/<file_reverse>.clean_2.fastq.gz"
}
]
Hi Jannik, thank you, this is interesting. From your data there certainly appears to be a problem with how --invert is implemented. By any chance are you able to send me some (or all) of your test data?
Bede
Hi Bede, unfortunately I cannot share the dataset. Could you try to reproduce the behavior with another dataset? If it only occurs with this dataset, that would also be highly interesting.
Best, Jannik
Thank you – that's understandable. I will investigate using other data.
Please accept my apologies for the delay. I've reproduced the problem and pushed a fix, to be released in the coming days. I had mistakenly assumed that, for paired reads, samtools view -F 12 outputs the inverse of samtools view -f 12. For the inverted paired case we now use a Samtools filter expression that combines the bitwise flags 4 and 8 with logical OR, rather than the AND that was previously used incorrectly. This issue only affected --invert mode with paired reads. A test case has been written. Thank you very much for catching this.
https://github.com/bede/hostile/commit/cc8a1010ac9e7b1f0a80042e0bb3cbbf05d1e30d
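To illustrate why -F 12 is not the inverse of -f 12, here is a minimal Python sketch that models the flag tests on plain integers. It is not Hostile's code and does not call Samtools; the function names are hypothetical, and the third function is only one reading of the "logical OR" fix described above (keep a read if the read or its mate is mapped). It relies on the documented behaviour that -f requires all given bits to be set, while -F rejects a read if any of them is set.

```python
READ_UNMAPPED = 0x4   # SAM flag bit: this read is unmapped
MATE_UNMAPPED = 0x8   # SAM flag bit: the mate is unmapped
BOTH = READ_UNMAPPED | MATE_UNMAPPED  # 12
PAIRED = 0x1          # SAM flag bit: read is part of a pair


def passes_f12(flag: int) -> bool:
    """Model of `-f 12`: keep only if bits 4 AND 8 are both set."""
    return (flag & BOTH) == BOTH


def passes_F12(flag: int) -> bool:
    """Model of `-F 12`: keep only if neither bit 4 nor bit 8 is set."""
    return (flag & BOTH) == 0


def passes_inverse_of_f12(flag: int) -> bool:
    """Assumed fix: keep if the read OR its mate is mapped."""
    return (flag & READ_UNMAPPED) == 0 or (flag & MATE_UNMAPPED) == 0


examples = {
    "both mapped": PAIRED,
    "read unmapped only": PAIRED | READ_UNMAPPED,
    "mate unmapped only": PAIRED | MATE_UNMAPPED,
    "both unmapped": PAIRED | BOTH,
}

for name, flag in examples.items():
    print(name, passes_f12(flag), passes_F12(flag), passes_inverse_of_f12(flag))
```

Reads from pairs in which exactly one mate aligned pass neither the -f 12 nor the -F 12 model, so they would be missing from the --invert output; that is consistent with the 421070-read discrepancy reported above. Only the OR-based test is the exact complement of the -f 12 test.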
Released in 2.0.0