buttery-eel
demultiplexing remora SAMs
Please add the following instructions somewhere:
For FASTQ output from buttery-eel, we call guppy_barcoder, which ships with the ONT Guppy package, for demuxing. guppy_barcoder does not seem to accept uSAM (unaligned SAM) as input. However, converting the uSAM to FASTQ, keeping the methylation information as MM/ML tags in the read headers, makes demuxing possible as follows.
#convert
samtools fastq -TMM,ML remora.mod.sam > remora.mods.fastq
#demux
guppy_barcoder <kit options> -i /dir/containing/remora.mods.fastq -s demuxed_out/ -x cuda:all
#then align these FASTQs with minimap2, using -y to copy the header tags into the output
minimap2 -ax map-ont -y ref.fa demuxed_out/barcodex.mods.fastq | samtools sort - > barcodex.mods.bam
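Before running guppy_barcoder on remora.mods.fastq, it may be worth checking that the MM/ML tags actually reached the FASTQ headers. The following is a minimal sketch, assuming strict 4-line FASTQ records and tab-separated tags in the header line; the helper name count_mm_headers is hypothetical and not part of buttery-eel or samtools.

```shell
# Hypothetical helper: count reads whose header line carries an MM:Z: tag.
# Assumes strict 4-line FASTQ records, so headers are lines 1, 5, 9, ...
count_mm_headers() {
    awk 'NR % 4 == 1 && /\tMM:Z:/ { n++ } END { print n + 0 }' "$1"
}
```

Running `count_mm_headers remora.mods.fastq` should then print a non-zero number if the reads carry modification calls.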
Note that if remora.mods.fastq is large and your RAM is smaller than the file, guppy_barcoder will run out of memory, as it seems to load the whole FASTQ file into memory. To avoid this, split the big FASTQ file into smaller files as below:
#split the large FASTQ into smaller FASTQs of 4000 reads each (16000 lines, 4 lines per read)
mkdir split_fastq/
split -l 16000 remora.mods.fastq --additional-suffix=.fastq split_fastq/
#call the barcoder on the split FASTQ dir so it does not run out of RAM
guppy_barcoder <kit options> -i split_fastq/ -s demuxed_out/ -x cuda:all
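Splitting by line count only works if every read occupies exactly 4 lines, so a quick read-count comparison can confirm that nothing was lost or cut mid-record. This is a rough sketch under that assumption; fastq_reads is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: print the total number of reads across the given
# FASTQ files, assuming strict 4-line records. Fails if any file has a
# line count that is not a multiple of 4.
fastq_reads() {
    total=0
    for f in "$@"; do
        lines=$(wc -l < "$f" | tr -d ' ')
        if [ $((lines % 4)) -ne 0 ]; then
            echo "error: $f has $lines lines, not a multiple of 4" >&2
            return 1
        fi
        total=$((total + lines / 4))
    done
    echo "$total"
}
```

For example, `[ "$(fastq_reads remora.mods.fastq)" -eq "$(fastq_reads split_fastq/*.fastq)" ] && echo "split OK"` compares the original file against the chunks.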
Some versions of guppy_barcoder seem to incorrectly use a space instead of a tab to separate the runid and barcode tags, which causes issues in downstream processing. So please fix your barcoded FASTQs as below before using them with tools such as minimap2.
sed -e 's/ runid/\trunid/g' -e 's/ barcode/\tbarcode/g' barcode04.test.fastq > barcode04_fixed.test.fastq
minimap2 -ax map-ont /mnt/d/genome/hg38noAlt/hg38noAlt.idx barcode04_fixed.test.fastq -y | samtools sort - > barcode04.bam
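The sed fix above relies on sed interpreting `\t` as a tab, which GNU sed does but some other sed implementations may not. A possible portable alternative is sketched below with awk, restricted to header lines only. It assumes strict 4-line FASTQ records; the helper name fix_header_tags is hypothetical, and the exact header layout written by guppy_barcoder may differ between versions.

```shell
# Hypothetical helper: replace the space before the runid and barcode tags
# with a tab, on header lines only (lines 1, 5, 9, ... of a 4-line FASTQ).
# Reads the files given as arguments, or stdin if none are given.
fix_header_tags() {
    awk 'NR % 4 == 1 { gsub(/ runid/, "\trunid"); gsub(/ barcode/, "\tbarcode") }
         { print }' "$@"
}
```

It could be used as `fix_header_tags barcode04.test.fastq > barcode04_fixed.test.fastq`.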