
Add option to make per-cell bam files

Open olgabot opened this issue 4 years ago • 2 comments

Currently, to count reads with differential hashes in genes, one has to grep the ENTIRE 22-gigabyte channel bam file for a single cell (out of ~700,000), which is extremely inefficient. So let's do this work up front: after filtering for the good barcodes, add the option to create per-cell bam files, which are useful for nf-predictorthologs.

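script:
// Match reads carrying this cell's barcode, either as a CB tag (with a "-1" suffix) or as an XC tag
barcode_pattern = "CB:Z:${cell_barcode}-1|XC:Z:${cell_barcode}"
// samtools view without -h drops the header, so prepend ${header_sam} before converting back to bam
"""
samtools view ${channel_bam} \\
  | rg --threads ${task.cpus} '${barcode_pattern}' - \\
  | cat ${header_sam} - \\
  | samtools view -Sb > ${cell_barcode_bam}
"""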
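
The script above still scans the whole channel bam once per cell. As a rough illustration of doing the split in a single pass instead, here is a minimal Python sketch that groups SAM text lines by barcode. It is not part of kmermaid: the function name `split_sam_by_barcode`, the assumption that barcodes consist only of A/C/G/T/N, and the in-memory grouping are all hypothetical choices for illustration. A real version would stream each cell's records to its own file rather than hold them in memory.

```python
import re
from collections import defaultdict

# Same two tags as barcode_pattern above: CB:Z:<barcode>-1 or XC:Z:<barcode>.
# Assumption: barcodes contain only A/C/G/T/N.
BARCODE_TAG = re.compile(r"\t(?:CB:Z:([ACGTN]+)-1|XC:Z:([ACGTN]+))(?:\t|$)")


def split_sam_by_barcode(sam_lines, good_barcodes):
    """Group SAM records by cell barcode in one pass over the input.

    sam_lines: iterable of SAM text lines, header included
               (e.g. the output of `samtools view -h`).
    good_barcodes: set of barcodes that passed filtering.
    Returns a dict mapping barcode -> list of lines (header first, then records).
    """
    header = []
    per_cell = defaultdict(list)
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header lines are shared by every per-cell output
            header.append(line)
            continue
        match = BARCODE_TAG.search(line)
        if match is None:
            continue  # read has no recognizable barcode tag
        barcode = match.group(1) or match.group(2)
        if barcode in good_barcodes:
            per_cell[barcode].append(line)
    return {bc: header + records for bc, records in per_cell.items()}
```

Each value in the returned dict is a complete SAM text for one cell, which could then be piped through `samtools view -Sb` as in the script above.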

@lekhakaranam this may be a good feature to add after the template merge (#93)

olgabot, Aug 17 '20 18:08

Looks like a PR for this was opened but then closed a while ago: https://github.com/nf-core/kmermaid/pull/97

pranathivemuri, Oct 20 '20 21:10

Oh yeah, I think there were some merge/rebase issues.

olgabot, Oct 21 '20 18:10