GoogleGenomics
Update Bioconductor converters to handle multi-allelic data and data with non-variant segments.
The variant converters currently have trouble with multi-allelic data and non-variant segments. We can work around this by filtering and reshaping the data before sending it to the converters (example below), but it would be better to handle these cases correctly in the package itself.
See also https://github.com/Bioconductor/GoogleGenomics/issues/32 for another example of how non-variant segments can also be expressed in the data.
library(GoogleGenomics)

variants <- getVariants(datasetId="10473108253681171589", chromosome="22",
                        start=50300077, end=50301500)
# Remove non-variant segments (records with no alternate alleles)
only_variants <- Filter(function(v) { 1 <= length(v$alternateBases) }, variants)
# Convert to biallelic data by truncating alternateBases
# (this isn't how it should be fixed, it's just an example)
biallelic_variants <- lapply(only_variants, function(v) {
  if (1 < length(v$alternateBases)) {
    # Keep only the first alternate allele; the others are discarded
    v$alternateBases <- v$alternateBases[[1]]
  }
  v
})
granges <- variantsToGRanges(biallelic_variants)
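For comparison, here is a rough sketch of one alternative to truncation: expanding each multi-allelic record into one record per alternate allele, so no alleles are dropped. This is only an illustration, not a proposal for how the package should implement it. The helper names `is_variant` and `split_alternates` are hypothetical, the records are mock lists standing in for `getVariants()` output, and the sketch assumes `alternateBases` is a list or vector of strings. It does not remap genotype calls, which a real fix would also need to do.

```r
# Hypothetical helper: TRUE when a record has at least one alternate allele.
is_variant <- function(v) length(v$alternateBases) >= 1

# Hypothetical helper: expand one record into one record per alternate allele.
split_alternates <- function(v) {
  lapply(v$alternateBases, function(alt) {
    v$alternateBases <- list(alt)  # modifies a local copy of the record
    v
  })
}

# Mock records (assumed shape) standing in for the output of getVariants().
variants <- list(
  list(referenceName = "22", start = 50300077, referenceBases = "A",
       alternateBases = list()),            # non-variant segment
  list(referenceName = "22", start = 50300100, referenceBases = "C",
       alternateBases = list("T")),         # biallelic
  list(referenceName = "22", start = 50300200, referenceBases = "G",
       alternateBases = list("A", "T"))     # multi-allelic
)

# Drop non-variant segments, then flatten the per-record lists by one level.
only_variants <- Filter(is_variant, variants)
expanded <- unlist(lapply(only_variants, split_alternates), recursive = FALSE)
```

With these mock inputs, `expanded` holds one record per alternate allele, each with a single entry in `alternateBases`. Whether the converters accept records in this shape has not been checked here.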