gosling.js
API change (CsvData): A single way to define genomic fields
Background
Genomic positions are often defined in terms of a chromosome and a chromosome position (chrom, chromPos).
To show multiple chromosomes on the same linear or circular axis, however, these chromosome positions need to be converted to "absolute" positions, based on some fixed ordering of the chromosomes. For the human genome, there is a conventional ordering of the chromosomes.
To convert a relative genomic position to an absolute genomic position, you need to know the order of the chromosomes and the size of each chromosome. This information is used to compute the absolute position from a given (chrom, chromPos) pair. For example, the absolute position of position 200 on chromosome 2 is the length of chromosome 1 plus 200: len(chrom1) + 200.
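The conversion described above can be sketched as follows. This is only an illustration, not Gosling's implementation: the chromosome sizes are toy numbers, and the names toyChromSizes, buildOffsets, and toAbsolutePosition are hypothetical.

```typescript
// Hypothetical chromosome sizes, listed in the fixed order used for the shared axis.
// Real assemblies have far larger values; these toy numbers keep the arithmetic visible.
const toyChromSizes: [string, number][] = [
  ["chr1", 1000],
  ["chr2", 800],
  ["chr3", 600],
];

// Precompute the absolute offset at which each chromosome starts:
// the sum of the sizes of all chromosomes that precede it.
function buildOffsets(sizes: [string, number][]): Record<string, number> {
  const offsets: Record<string, number> = {};
  let total = 0;
  for (const [chrom, size] of sizes) {
    offsets[chrom] = total;
    total += size;
  }
  return offsets;
}

// Convert a (chrom, chromPos) pair to an absolute position.
function toAbsolutePosition(
  chrom: string,
  chromPos: number,
  offsets: Record<string, number>
): number {
  const offset: number | undefined = offsets[chrom];
  if (offset === undefined) {
    throw new Error(`Unknown chromosome: ${chrom}`);
  }
  return offset + chromPos;
}

const toyOffsets = buildOffsets(toyChromSizes);
console.log(toAbsolutePosition("chr2", 200, toyOffsets)); // 1200, i.e. len(chr1) + 200
```

Precomputing the offsets once means each row only needs a lookup and an addition.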
Given that a CSV file contains chromosome fields and chromosome position fields, there needs to be some way of associating the right pairs together such that Gosling can calculate the correct absolute position.
Current API
Currently there are two different ways to associate chromosome fields with chromosome position fields, depending on the number of chromosome fields.
Single chromosome field
Most CSV files have a single chromosome field and one or more position fields. In the example below, we want Gosling to use the CHROM and POS columns to determine the absolute position.
CHROM POS
chr2 100
chr2 200
chr3 150
data: {
  url: 'my_csv.csv',
  chromosomeField: 'CHROM',
  genomicFields: ['POS']
}
Multiple chromosome fields
Some more complex CSV files have multiple chromosome fields. genomicFieldsToConvert is a way to associate different position fields with different chromosome fields.
CHROMa POSa CHROMb POSb
chr2 100 chr3 120
chr2 200 chr1 700
chr3 150 chr2 200
data: {
  url: 'my_csv.csv',
  genomicFieldsToConvert: [
    {
      chromosomeField: "CHROMa",
      genomicFields: ["POSa"]
    },
    {
      chromosomeField: "CHROMb",
      genomicFields: ["POSb"]
    }
  ]
}
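To make the overlap between the two current forms concrete, here is a minimal sketch of how both could be reduced to one list of (chromosome field, position fields) groups. The interfaces ChromPositionGroup and CsvDataSketch and the function collectGroups are hypothetical names that only mirror the snippets above; they are not Gosling's actual types.

```typescript
// Hypothetical shape for one chromosome field and its position fields.
interface ChromPositionGroup {
  chromosomeField: string;
  genomicFields: string[];
}

// Hypothetical shape covering both forms of the current API.
interface CsvDataSketch {
  url: string;
  chromosomeField?: string; // single-chromosome form
  genomicFields?: string[]; // single-chromosome form
  genomicFieldsToConvert?: ChromPositionGroup[]; // multi-chromosome form
}

// Reduce either form to the same list of groups.
function collectGroups(data: CsvDataSketch): ChromPositionGroup[] {
  const groups: ChromPositionGroup[] = [];
  if (data.chromosomeField !== undefined && data.genomicFields !== undefined) {
    groups.push({
      chromosomeField: data.chromosomeField,
      genomicFields: data.genomicFields,
    });
  }
  if (data.genomicFieldsToConvert !== undefined) {
    groups.push(...data.genomicFieldsToConvert);
  }
  return groups;
}

const singleForm = collectGroups({
  url: "my_csv.csv",
  chromosomeField: "CHROM",
  genomicFields: ["POS"],
});
console.log(singleForm.length); // 1
```

Since the single-field form is just a one-element case of the list form, both use cases carry the same information.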
Proposed change: A single way to define genomic fields
Rather than having different ways to define these two use cases, we would like to have a single way to associate the chromosome fields with the chromosome position fields.
Option 1: Keep current way to define multiple chromosomes together
@sehilyi
The explicit use of key names (e.g., chromosomeField), while it can result in an error, makes it clear to users what each value is for and is a little more consistent with other parts of the grammar.
"genomicFieldsToConvert": [
  {
    "chromosomeField": "CHROMa",
    "genomicFields": ["POSa"]
  },
  {
    "chromosomeField": "CHROMb",
    "genomicFields": ["POSb"]
  }
]
Option 2: Represent chromosome name and positions as key:value pairs
Proposed by @manzt
"genomicFieldsToConvert": {
  "CHROMa": ["POSa"],
  "CHROMb": ["POSb"]
}
A side effect of this design: if each chromosomeField must be unique when there are multiple, a map makes the invalid state unrepresentable, whereas with a list we would need to handle duplicate entries.
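The difference between the list and map forms can be sketched as below. The type and function names (GroupEntry, GroupMap, findDuplicateChromFields, mapToGroups) are hypothetical and only illustrate the argument: a list needs an explicit duplicate check, while an object keyed by chromosome field cannot contain the same key twice.

```typescript
// Hypothetical types for the two candidate shapes.
type GroupEntry = { chromosomeField: string; genomicFields: string[] };
type GroupMap = Record<string, string[]>;

// List form (Option 1): the same chromosome field can appear more than once,
// so a validation step like this would be needed.
function findDuplicateChromFields(groups: GroupEntry[]): string[] {
  const counts: Record<string, number> = {};
  for (const group of groups) {
    counts[group.chromosomeField] = (counts[group.chromosomeField] || 0) + 1;
  }
  return Object.keys(counts).filter((key) => (counts[key] ?? 0) > 1);
}

// Map form (Option 2): keys are unique by construction, so expanding the map
// into a list can never produce a duplicate chromosome field.
function mapToGroups(map: GroupMap): GroupEntry[] {
  return Object.keys(map).map((chrom) => ({
    chromosomeField: chrom,
    genomicFields: map[chrom] ?? [],
  }));
}

const listWithDuplicate: GroupEntry[] = [
  { chromosomeField: "CHROMa", genomicFields: ["POSa"] },
  { chromosomeField: "CHROMa", genomicFields: ["POSb"] },
];
console.log(findDuplicateChromFields(listWithDuplicate)); // [ 'CHROMa' ]

const fromMap = mapToGroups({ CHROMa: ["POSa"], CHROMb: ["POSb"] });
console.log(findDuplicateChromFields(fromMap)); // []
```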
Option 3: A data transform
Proposed by @sehilyi
Rather than keeping the relative-to-absolute conversion implicit inside Gosling, it could be made explicit to the user. The user could configure a data transform that creates a new field holding the absolute genomic position. This option is probably the most verbose but also the most flexible.
dataTransform: [
  { "type": "relToAbsCoordinates", "chromosomeField": "CHROMa", "genomicField": "POSa", "newField": "POSa_absolute" },
  { "type": "relToAbsCoordinates", "chromosomeField": "CHROMb", "genomicField": "POSb", "newField": "POSb_absolute" }
]
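As a rough sketch of what such a transform might do to each row, assuming the proposed relToAbsCoordinates shape above: the function applyRelToAbs, the Row type, and the toy offsets are hypothetical and not part of Gosling.

```typescript
// Shape of the proposed transform, taken from the example above.
interface RelToAbsTransform {
  type: "relToAbsCoordinates";
  chromosomeField: string;
  genomicField: string;
  newField: string;
}

type Row = Record<string, string | number>;

// Hypothetical helper: returns copies of the rows with one new absolute-position
// field per transform. `offsets` maps each chromosome to its absolute start.
function applyRelToAbs(
  rows: Row[],
  transforms: RelToAbsTransform[],
  offsets: Record<string, number>
): Row[] {
  return rows.map((row) => {
    const out: Row = { ...row };
    for (const t of transforms) {
      const offset: number | undefined = offsets[String(row[t.chromosomeField])];
      if (offset === undefined) continue; // unknown chromosome: skip this field
      out[t.newField] = offset + Number(row[t.genomicField]);
    }
    return out;
  });
}

// Toy offsets (assumed sizes: chr1 = 1000, chr2 = 800), not real assembly values.
const sketchOffsets: Record<string, number> = { chr1: 0, chr2: 1000, chr3: 1800 };

const transformed = applyRelToAbs(
  [{ CHROMa: "chr2", POSa: 100, CHROMb: "chr3", POSb: 120 }],
  [
    { type: "relToAbsCoordinates", chromosomeField: "CHROMa", genomicField: "POSa", newField: "POSa_absolute" },
    { type: "relToAbsCoordinates", chromosomeField: "CHROMb", genomicField: "POSb", newField: "POSb_absolute" },
  ],
  sketchOffsets
);
console.log(transformed); // adds POSa_absolute: 1100 and POSb_absolute: 1920
```

Because the original columns are left untouched and the output goes into a new field, the same position column could be paired with any chromosome column without ambiguity.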