
gen_raw_mask.pl truncating fasta file

Open NicMAlexandre opened this issue 3 years ago • 1 comments

Hello,

  1. I have a chromosome (Chr1) that is 196345723 bp in length.

  2. psmc/utils/./splitfa Sept22Assembly.fasta 35 | split -l 20000000
     #This step produced three files, and I confirmed the correct end of chromosome 1 in xaa.
     #The k-mer FASTA headers look like this:

>Chr1_1
AGAGTGGTGGGGACAAGGCTCAGAGCCTGAACTGA

cat xaa xab xac > xxaa
#I also confirmed the correct end of chromosome 1 in xxaa.
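For anyone trying to reproduce this, the check on the last k-mer per chromosome could be scripted roughly as below. This is only a sketch: the function name `last_kmer_start` is made up for illustration, and it assumes every header has the form `>NAME_POSITION`, as in the example above.

```python
from typing import Dict, Iterable


def last_kmer_start(lines: Iterable[str]) -> Dict[str, int]:
    """Return the largest k-mer start position seen for each sequence name.

    Assumption: headers look like ">Chr1_5609878", i.e. the sequence name,
    an underscore, then the start position of the k-mer.
    """
    last: Dict[str, int] = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # skip the k-mer sequence lines
        fields = line[1:].split()
        if not fields:
            continue  # bare ">" with no name
        # Split on the LAST underscore so names containing "_" still work.
        name, _, pos = fields[0].rpartition("_")
        if not name or not pos.isdigit():
            continue  # header does not match the assumed NAME_POSITION form
        last[name] = max(last.get(name, 0), int(pos))
    return last
```

Calling it on an open file handle, for example `last_kmer_start(open("xxaa"))`, gives one entry per chromosome. A value far below the chromosome length would point at the k-mer file rather than at the alignment or mask step.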

  3. bwa aln -R 1000000 -O 3 -E 3 Sept22Assembly.fasta xxaa > xxaa.sai
     bwa samse Sept22Assembly.fasta xxaa.sai xxaa > aln-se.sam

#I viewed the mapped reads for Chr1 and found sites mapping well up into the 196000000 range.

  4. perl seqbility-20091110/gen_raw_mask.pl aln-se.sam > rawMask_35.fa

All errors propagate from this step, after which my chromosome size is only about 5 million bp.

I also find that all representative sites for Chr1 stop at 5609912 which is eerily similar to the last kmer fasta header for Chr1 (>Chr1_5609878).
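A quick way to confirm the truncation is to compare per-sequence lengths between the assembly and the mask. The sketch below assumes the raw mask is FASTA-formatted and uses the assembly's sequence names as the first token of each header. `fasta_lengths` and `truncated_sequences` are hypothetical helper names, not part of seqbility.

```python
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (first token of its header) to its length."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def truncated_sequences(
    reference: Dict[str, int], mask: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """List (name, reference_length, mask_length) where the mask is shorter.

    A sequence missing from the mask is reported with a mask length of 0.
    """
    return [
        (name, ref_len, mask.get(name, 0))
        for name, ref_len in reference.items()
        if mask.get(name, 0) < ref_len
    ]
```

With both files opened as text, `truncated_sequences(fasta_lengths(open("Sept22Assembly.fasta")), fasta_lengths(open("rawMask_35.fa")))` would list every chromosome whose mask is shorter than the reference, which should include Chr1 if the problem described here is present.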

Any help here would be greatly appreciated,

Nicolas

NicMAlexandre, Sep 29 '21 21:09

I should note that I'm working with two other groups that have been reporting the same errors with truncation.

NicMAlexandre, Sep 29 '21 21:09