ComputationalGenomicsManual

Filtering host reads

Open chrissy005 opened this issue 6 months ago • 1 comments

Hello, I was trying the following commands, as you described, to filter out host sequences:

host sequences:

    mkdir host not_host
    samtools fastq -F 3588 -f 65 output.bam | gzip -c > host/output_S_R1.fastq.gz
    echo "R2 matching host genome:"
    samtools fastq -F 3588 -f 129 output.bam | gzip -c > host/output_S_R2.fastq.gz

sequences that are not host:

    samtools fastq -F 3584 -f 77 output.bam | gzip -c > not_host/output_S_R1.fastq.gz
    samtools fastq -F 3584 -f 141 output.bam | gzip -c > not_host/output_S_R2.fastq.gz
    samtools fastq -f 4 -F 1 output.bam | gzip -c > not_host/output_S_Singletons.fastq.gz

I am new to samtools and do not understand the -F and -f flags or the integers that follow them. Do these determine which sequences are host and non-host?

chrissy005, Aug 08 '24 07:08
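
For readers with the same question: the integers are SAM FLAG bitmasks. In samtools, `-f INT` keeps only reads that have all of the bits in INT set, and `-F INT` drops reads that have any of the bits in INT set. The sketch below decodes the integers used in the commands above into the standard SAM flag names. `decode_flag` is a hypothetical helper written for this illustration and is not part of samtools; the bit names follow the SAM specification. If your samtools build includes the `samtools flags` subcommand, it should report the same names.

```shell
#!/bin/sh
# Decode a SAM FLAG integer into the names of the bits it sets.
# decode_flag is a hypothetical helper for illustration only.
decode_flag() {
    value=$1
    bit=1          # value of the current bit: 1, 2, 4, 8, ...
    result=""
    # Names are listed in bit order, lowest bit (1) first.
    for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE \
                READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
        # Append the name if this bit is set in the flag value.
        if [ $(( value & bit )) -ne 0 ]; then
            result="${result:+$result,}$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "$result"
}

# Decode every integer that appears in the commands above.
for flag in 3588 3584 65 129 77 141; do
    printf '%s\t%s\n' "$flag" "$(decode_flag "$flag")"
done
```

Read this way, `-F 3588 -f 65` keeps reads that are paired and first in pair (65) while dropping anything unmapped, QC-failed, duplicate, or supplementary (3588), so these are R1 reads that aligned to the host. `-F 3584 -f 77` keeps first-in-pair reads where both the read and its mate are unmapped (77), so neither end matched the host. `-f 4 -F 1` keeps unmapped reads that are not flagged as paired, which is where the singletons come from. So yes, together with the alignment itself, these flags decide which reads are written to `host` and which to `not_host`.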