
Potential mapping error filter

Open hammer opened this issue 9 years ago • 0 comments

From the methods section of "Recurrent somatic mutations in regulatory regions of human cancer genomes":

Filtering out false positives from mapping errors and SNPs. SNPs from dbSNP Build 141 were downloaded from the UCSC Genome Browser Table Browser. Called mutations that had the same chromosomal position and variant allele as a common SNP were filtered out. Predicted mapping errors were determined by querying BLAT [36] with a 201-bp region centered on the genomic position of the variant. Notably, the variant allele was used in place of the reference allele for this analysis. A score between 0 and 100 was given on the basis of the length of the longest aligned region for a given BLAT result that included a match of up to 100 bp in length to the reference genome such that the reference allele for the matched genomic region matched the called variant allele. A 201-bp window was chosen because it should be sufficiently long to cover all potential overlapping reads, as mapped read sizes are typically smaller than 100 bp. For the analysis using 10-bp windows, a regional score was generated by averaging the scores of all the mutations contained within the region. Regions with an average score of greater than 50 were filtered out as potential false positives. The 1000 Genomes Project hs37d5 reference was used for BLAT searches. The analysis in Supplementary Figure 1 of overlap between the filtered-out regions and difficult-to-align regions of the genome was performed using 50-mer alignability tracks. Any mutation with a score of 0.5 or less was considered difficult to align.
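A rough sketch of how the non-BLAT parts of this filter might look in Python. This is not the paper's code. It assumes single-base substitutions, 0-based positions, mutations held as plain dicts, and a per-mutation `blat_score` (0 to 100) that was already computed elsewhere by running BLAT on the query window. The function names and the fixed, non-overlapping 10-bp binning are hypothetical; the quoted methods do not say how the 10-bp windows were laid out.

```python
from collections import defaultdict


def build_query(chrom_seq, pos, alt, flank=100):
    """Return the (2 * flank + 1)-bp window centered on 0-based `pos`,
    with the variant allele substituted for the reference base.
    Assumes a single-base substitution."""
    start, end = pos - flank, pos + flank + 1
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window runs off the end of the sequence")
    return chrom_seq[start:pos] + alt + chrom_seq[pos + 1:end]


def filter_common_snps(mutations, common_snps):
    """Drop calls whose (chrom, pos, alt) matches a common SNP.
    `common_snps` is a set of (chrom, pos, alt) tuples."""
    return [m for m in mutations
            if (m["chrom"], m["pos"], m["alt"]) not in common_snps]


def regional_scores(mutations, window=10):
    """Average the precomputed BLAT scores of the mutations in each window.
    Assumption: windows are fixed, non-overlapping bins of `window` bp."""
    bins = defaultdict(list)
    for m in mutations:
        bins[(m["chrom"], m["pos"] // window)].append(m["blat_score"])
    return {region: sum(s) / len(s) for region, s in bins.items()}


def flagged_regions(mutations, window=10, threshold=50):
    """Regions whose average score is strictly greater than `threshold`."""
    return {region
            for region, score in regional_scores(mutations, window).items()
            if score > threshold}
```

With `flank=100` the query is 201 bp, matching the window size in the quoted text. The strict `>` comparison follows the wording "greater than 50", so a region averaging exactly 50 is kept.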

hammer, Jun 10 '15 04:06