jbrowse-components
UCSC data loading
UCSC is a great data resource and does a lot of curation, but the data is encoded in such a way that "not all the interesting data will be in a simple GTF".
One example of a non-GTF file that contains interesting curated information is their "kgXref" table, which holds cross-references to other databases: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/kgXref.txt.gz
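As a rough idea of what consuming kgXref involves, here is a minimal Python sketch that streams a locally downloaded kgXref.txt.gz and indexes it by kgID. It assumes the dump is plain tab-separated text with the column order listed in KGXREF_COLUMNS (taken from the hg38 kgXref.sql schema; check the .sql file before relying on it). The function names are hypothetical, not part of any existing tool.

```python
import csv
import gzip
from typing import Dict, Iterable, List

# Assumed column order for hg38 kgXref; verify against kgXref.sql, which sits
# next to kgXref.txt.gz in the same download directory.
KGXREF_COLUMNS: List[str] = [
    "kgID", "mRNA", "spID", "spDisplayID", "geneSymbol",
    "refseq", "protAcc", "description", "rfamAcc", "tRnaName",
]


def read_kgxref(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map each kgID to a dict of its cross-reference fields."""
    xrefs: Dict[str, Dict[str, str]] = {}
    # Tab-delimited with no quoting, so commas and quotes in descriptions are kept as-is
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        record = dict(zip(KGXREF_COLUMNS, row))
        xrefs[record["kgID"]] = record
    return xrefs


def load_kgxref_gz(path: str) -> Dict[str, Dict[str, str]]:
    """Stream a local kgXref.txt.gz without decompressing it to disk first."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return read_kgxref(handle)
```

For the full hg38 table this keeps every row in memory, which is fine for a table of this size but worth revisiting for the larger linked tables.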
It may be useful to continue work on UCSC compatibility (the ucsc-to-json script in JBrowse 1 was quite powerful), downloading the data either statically in bulk or dynamically from their REST API.
This was inspired by their blog post here: http://genome.ucsc.edu/goldenPath/newsarch.html#090723
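For the dynamic route, a minimal sketch of querying the UCSC REST API is below. It assumes the public endpoint https://api.genome.ucsc.edu/getData/track with genome, track, chrom, start and end parameters separated by ";" as in UCSC's API documentation; the helper names are hypothetical, and the parameter details should be checked against the current docs.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

API_BASE = "https://api.genome.ucsc.edu"


def track_url(genome: str, track: str, chrom: Optional[str] = None,
              start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Build a /getData/track URL for one genome, track and optional region."""
    params = {"genome": genome, "track": track}
    if chrom is not None:
        params["chrom"] = chrom
    if start is not None and end is not None:
        params["start"] = str(start)
        params["end"] = str(end)
    # Percent-encode each value; ';' separates parameters as in UCSC's examples
    query = ";".join(
        f"{key}={urllib.parse.quote(value, safe='')}" for key, value in params.items()
    )
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(url: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Fetch one region and parse the JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Usage would look like `fetch_track(track_url("hg38", "knownGene", "chr1", 0, 1000000))`. Fetching region by region like this means one HTTP round trip per query, which is one reason a bulk download may be faster.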
There is even more data in the "linked tables" beyond kgXref, as shown in a screenshot from their Table Browser.
The key feature is the "select fields from primary and related tables" option.
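Conceptually, "primary and related tables" is a join. A minimal sketch follows, assuming the primary rows come from knownGene and are linked to kgXref via knownGene.name = kgXref.kgID; the function and default field names are illustrative, not taken from the Table Browser or any existing script.

```python
from typing import Dict, Iterable, Iterator


def join_related(
    primary_rows: Iterable[Dict[str, str]],
    related: Dict[str, Dict[str, str]],
    primary_key: str = "name",
    fields: Iterable[str] = ("geneSymbol", "description"),
) -> Iterator[Dict[str, str]]:
    """Left join: keep every primary row, copying selected related fields onto it.

    Related fields are set to '' when no matching row exists. If a related field
    has the same name as a primary column, the related value overwrites it.
    """
    wanted = list(fields)
    for row in primary_rows:
        match = related.get(row[primary_key], {})
        merged = dict(row)  # copy so the caller's rows are not mutated
        for field in wanted:
            merged[field] = match.get(field, "")
        yield merged
```

A left join was chosen so that features without a cross-reference are still kept, just without the extra metadata.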
I might try to create a new ucsc-to-json script following yesterday's discussions. The alternative is using their REST API, but I think it will be slower and less reliable than bulk loading. I will try to get a handle on how much data (gigabytes, etc.) the process uses.
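To estimate the bulk download size before fetching anything, one option is to send HTTP HEAD requests and sum the Content-Length headers. This is a sketch that assumes the server reports Content-Length for the .txt.gz files; the figures are compressed sizes, so disk use after decompression and conversion will be larger.

```python
import urllib.request
from typing import Iterable, Optional


def remote_size(url: str, timeout: float = 30.0) -> Optional[int]:
    """Return the Content-Length from a HEAD request, or None if it is missing."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def total_gigabytes(sizes: Iterable[Optional[int]]) -> float:
    """Sum known sizes in bytes and convert to GB (10**9 bytes); skip unknowns."""
    return sum(size for size in sizes if size is not None) / 10**9
```

For example, with a hypothetical list of tables: `urls = [f"http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/{t}.txt.gz" for t in ("knownGene", "kgXref")]` followed by `total_gigabytes(remote_size(u) for u in urls)`.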
We now have a UCSC browser at http://s3.amazonaws.com/jbrowse.org/code/jb2/main/index.html?config=%2Fjbrowse.org%2Fdemos%2Fucsc%2Fconfig.json
It uses https://github.com/cmdcolin/ucsc2jbrowse to bulk-load the files.
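A typical bulk-load step is turning a UCSC genePred table dump into a format JBrowse can read directly, such as BED12. The sketch below is a generic illustration, not a description of how ucsc2jbrowse works. It assumes knownGene-style columns (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, then extras); for tables whose dumps begin with a "bin" column, that column would need to be dropped first.

```python
from typing import List


def genepred_to_bed12(fields: List[str]) -> str:
    """Convert one knownGene-style genePred row (already split on tabs) to BED12."""
    name, chrom, strand = fields[0], fields[1], fields[2]
    tx_start, tx_end = int(fields[3]), int(fields[4])
    cds_start, cds_end = int(fields[5]), int(fields[6])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    sizes = [end - start for start, end in zip(starts, ends)]
    # BED12 block starts are relative to chromStart (here txStart)
    rel_starts = [start - tx_start for start in starts]
    bed = [
        chrom, str(tx_start), str(tx_end), name,
        "0",  # score: genePred has none
        strand,
        str(cds_start), str(cds_end),  # thickStart/thickEnd from the CDS
        "0",  # itemRgb
        str(len(starts)),
        ",".join(map(str, sizes)) + ",",
        ",".join(map(str, rel_starts)) + ",",
    ]
    return "\t".join(bed)
```

Both formats use 0-based, half-open coordinates, so no coordinate shift is needed; only the exon starts become relative offsets.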
It can be improved further, potentially by using tables like kgXref (mentioned above) to access a lot of extra feature metadata.