Update Bioconductor converters to handle multi-allelic data and data with non-variant segments.

Open deflaux opened this issue 10 years ago • 0 comments

The variant converters currently have trouble with multi-allelic data and non-variant segments. We can work around this by filtering and reshaping the data before sending it to the converters (example below), but it would be better to build correct handling for both cases into the package.

See also https://github.com/Bioconductor/GoogleGenomics/issues/32 for another example of how non-variant segments can be expressed in the data.

  variants <- getVariants(datasetId="10473108253681171589", chromosome="22",
              start=50300077, end=50301500)

  # Remove non-variant segments (records with no alternate bases)
  only_variants <- Filter(function(v) length(v$alternateBases) >= 1, variants)

  # Convert to biallelic data by truncating alternateBases to the first allele
  # (this isn't how it should be fixed, it's just an example)
  biallelic_variants <- lapply(only_variants, function(v) {
    if (length(v$alternateBases) > 1) {
      v$alternateBases <- v$alternateBases[[1]]
    }
    v
  })

  granges <- variantsToGRanges(biallelic_variants)
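
For comparison, a less lossy workaround might split each multi-allelic record into one record per alternate allele instead of truncating. This is only a sketch: `splitMultiAllelic` is a hypothetical helper, not part of the package. It assumes each variant is a plain list with an `alternateBases` field, as in the example above, and it leaves any genotype indices in the call data untouched, so those would still need remapping.

```r
# Hypothetical helper (not part of GoogleGenomics): expand each multi-allelic
# variant into one record per alternate allele. Records with zero or one
# alternate allele are passed through unchanged.
splitMultiAllelic <- function(variants) {
  expanded <- lapply(variants, function(v) {
    alts <- v$alternateBases
    if (length(alts) <= 1) {
      return(list(v))
    }
    lapply(seq_along(alts), function(i) {
      # Single-bracket indexing keeps the container type (list or vector),
      # so each new record looks like an ordinary biallelic record.
      v$alternateBases <- alts[i]
      v
    })
  })
  # Flatten one level: list of per-variant lists -> flat list of variants
  unlist(expanded, recursive = FALSE)
}
```

It would slot in where the truncation step is, e.g. `variantsToGRanges(splitMultiAllelic(only_variants))`. Whether the converters accept the result has not been tested.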

deflaux, Jul 28 '15 23:07