gen_raw_mask.pl truncating fasta file
Hello,
- I have a chromosome (Chr1) that is 196345723 bp in length.
- psmc/utils/./splitfa Sept22Assembly.fasta 35 | split -l 20000000
  #This step produced three files, and I confirmed the correct end of chromosome 1 in xaa.
  #The k-mer FASTA headers look like this:
  >Chr1_1
  AGAGTGGTGGGGACAAGGCTCAGAGCCTGAACTGA
  cat xaa xab xac > xxaa
  #I also confirmed the correct end of chromosome 1 in xxaa.
- bwa aln -R 1000000 -O 3 -E 3 Sept22Assembly.fasta xxaa > xxaa.sai
  bwa samse Sept22Assembly.fasta xxaa.sai xxaa > aln-se.sam
  #I viewed the mapped reads for Chr1 and found that there are sites mapping well up into 196000000.
- perl seqbility-20091110/gen_raw_mask.pl aln-se.sam > rawMask_35.fa
All errors propagate after this step, where my chromosome size is now only 5 million bp.
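For anyone wanting to reproduce the size check, here is a minimal Python sketch that reports the length of each sequence in a FASTA file such as rawMask_35.fa, so it can be compared against the expected 196345723 bp. The function name `fasta_lengths` and the demo records are made up for illustration, and the sketch assumes only that the mask is ordinary (possibly line-wrapped) FASTA.

```python
import io


def fasta_lengths(handle):
    """Return {sequence_name: total_length} for a FASTA stream.

    Handles line-wrapped records; the name is the first
    whitespace-delimited token after '>'.
    """
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    # Tiny in-memory stand-in (invented data).
    # For the real file, use: open("rawMask_35.fa")
    demo = io.StringIO(">Chr1\nACGT\nAC\n>Chr2\nTTT\n")
    for seq_name, n in fasta_lengths(demo).items():
        print(seq_name, n)
```

Running the same function on Sept22Assembly.fasta and on rawMask_35.fa would show, per chromosome, whether the two lengths agree.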
I also find that all representative sites for Chr1 stop at 5609912, which is eerily close to the last k-mer FASTA header for Chr1 (>Chr1_5609878).
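The two numbers may be more than similar. If the number in the header is the 1-based start of the k-mer (an assumption about the splitfa naming, not something confirmed here), then a 35-mer starting at 5609878 ends at 5609878 + 35 - 1 = 5609912, exactly where the mask stops. A minimal Python sketch for finding the last k-mer start per chromosome in a k-mer FASTA such as xxaa follows; `last_kmer_per_chrom` and the header pattern are hypothetical helpers written for this check.

```python
import io
import re

# Assumed header layout: ">" + chromosome name + "_" + start position.
HEADER_RE = re.compile(r"^>(\S+)_(\d+)(?:\s|$)")


def last_kmer_per_chrom(handle, k=35):
    """Return {chrom: (largest_start, implied_end)} from k-mer FASTA headers.

    implied_end assumes the start is 1-based and the k-mer has length k.
    """
    last = {}
    for line in handle:
        match = HEADER_RE.match(line)
        if not match:
            continue  # sequence line or unrecognized header
        chrom, start = match.group(1), int(match.group(2))
        if chrom not in last or start > last[chrom]:
            last[chrom] = start
    return {chrom: (start, start + k - 1) for chrom, start in last.items()}


if __name__ == "__main__":
    # Invented records; for the real file, use: open("xxaa")
    demo = io.StringIO(">Chr1_1\nAAA\n>Chr1_5609878\nAAA\n")
    print(last_kmer_per_chrom(demo))
```

If the largest Chr1 start in xxaa is far below 196345723 - 35 + 1, the k-mer file itself is already short before bwa or gen_raw_mask.pl ever sees it.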
Any help here would be greatly appreciated,
Nicolas
I should note that I'm working with two other groups that have been reporting the same errors with truncation.