gosling.js
feat: Make BED v1 a primitive data format
Motivation
BED (Browser Extensible Data) format provides a flexible way to define the data lines that are displayed in an annotation track. It has recently been formalized in the v1 specification.
Gosling currently supports BED via CSV, but the resulting specification is quite verbose, and users can choose arbitrary names for the standard BED fields:
Specifying BED12+1 in Gosling as CSV
{
"type": "csv",
"url": "https://localhost:8080/data.bed",
"headerNames": ["chrom", "chromStart", "chromEnd", "name", "score", "strand", "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts", "myField"],
"chromosomeField": "chrom",
"genomicFields": ["chromStart", "chromEnd"],
"quantitativeFields": ["score", "thickStart", "thickEnd", "blockCount"],
"separator": "\t"
}
Proposal
Add BED as a new data type in Gosling. BED is designed for this exact use case and should be the preferred format for representing text-based genomic annotation data (over a custom CSV capturing identical information). Using BED will make specifications less verbose and more reusable. It also has the side effect of making the datasets behind a Gosling visualization more likely to be interoperable with other genomics tools.
interface BED {
type: "bed";
url: string;
customFields?: string[];
separator?: string;
}
Specifying BED12+1 in Gosling as BED
{
"type": "bed",
"url": "https://localhost:8080/data.bed",
"customFields": ["myField"]
}
Thank you for creating this issue! This will be a helpful update to make our grammar more genomic-specific.
One quick clarification: from the length of customFields, we will infer the number of standard and custom fields, i.e., if the length is 1, then we consider the last column to be the custom one and the other columns to be standard ones.
Yes, exactly. We can determine BEDn+m from the custom fields alone (n = total # of columns - m). Custom fields can only follow standard fields, so the order of customFields matters, and the number of custom fields tells us how many standard fields are present.
e.g.
For a TSV with 4 columns
{
"type": "bed",
"url": "https://localhost:8080/data.bed"
}
Interpretation is BED4 (chrom, chromStart, chromEnd, name)
{
"type": "bed",
"url": "https://localhost:8080/data.bed",
"customFields": ["custom"]
}
Interpretation is BED3+1 (chrom, chromStart, chromEnd, custom)
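The n = total columns - m rule could be sketched roughly as follows. This is only an illustration, not Gosling's implementation: resolveBedColumns is a hypothetical helper name, and the only validation it assumes is that n falls between 3 and 12 (the BED specification may impose further restrictions that are not checked here).

```typescript
// Standard BED column names, in the order defined by the BED format.
const BED_STANDARD_FIELDS = [
  "chrom", "chromStart", "chromEnd", "name", "score", "strand",
  "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
] as const;

// Hypothetical helper (not part of Gosling's API): derive the column names
// of a BEDn+m file from the total column count and the custom field names.
function resolveBedColumns(totalColumns: number, customFields: string[] = []): string[] {
  const m = customFields.length; // number of custom fields
  const n = totalColumns - m;    // number of standard fields
  if (n < 3 || n > BED_STANDARD_FIELDS.length) {
    throw new Error(
      `Cannot interpret ${totalColumns} column(s) with ${m} custom field(s) as BED${n}+${m}`
    );
  }
  // Standard fields come first; custom fields follow in the order given.
  return [...BED_STANDARD_FIELDS.slice(0, n), ...customFields];
}
```

For example, resolveBedColumns(4, ["custom"]) yields the BED3+1 column names chrom, chromStart, chromEnd, custom, and resolveBedColumns(13, ["myField"]) yields the twelve standard names followed by myField.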
The final thing here is whether types need to be defined for the custom fields. This is similar to part of the discussion in #579, and I'd argue for a similar reason they are not necessary.
I assume the custom fields will be either nominal or quantitative. If so, I agree with not requiring users to specify the field types.
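If field types are not declared, one possible way to tell nominal from quantitative custom fields is to inspect the parsed values. The sketch below is an assumption about how that could work, not a description of what Gosling does; inferFieldType and the "all non-empty values are finite numbers" rule are hypothetical.

```typescript
type FieldType = "nominal" | "quantitative";

// Hypothetical helper: treat a column as quantitative only if every
// non-empty value parses as a finite number; otherwise treat it as nominal.
function inferFieldType(values: string[]): FieldType {
  const nonEmpty = values.filter((v) => v.trim() !== "");
  // With no data to inspect, fall back to the less restrictive type.
  if (nonEmpty.length === 0) return "nominal";
  return nonEmpty.every((v) => Number.isFinite(Number(v))) ? "quantitative" : "nominal";
}
```

A rule like this would misclassify numeric identifiers as quantitative, so an explicit override might still be worth considering.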