owmeta

Incorporate data from gene transcription atlas

Open slarson opened this issue 8 years ago • 4 comments

http://cole-trapnell-lab.github.io/projects/worm-cell-atlas/

There are many ways this could help. In particular, it should improve the accuracy of several of the data sources, including neuropeptides, neuroreceptors, and ion channels.

Direct link to the getting started page: http://atlas.gs.washington.edu/worm-rna/docs/

Tasks

Data types

  • [ ] add a Gene class with, at least, a "description" and a "name" field
  • [ ] add a means to provide non-DataSource parameters to a DataTranslator's translate method
  • [ ] add a means to generate distinct identifiers based on non-DataSource arguments to a DataTranslator's translate method
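
As a rough illustration of the first and third tasks, here is a minimal sketch in plain Python rather than the owmeta API. `Gene` and `make_identifier` are hypothetical names, and hashing a canonical JSON form of the parameters is just one assumed way to get distinct identifiers; it also assumes the parameters are JSON-serializable.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical stand-in for the proposed Gene class."""
    name: str
    description: str = ""


def make_identifier(source_id: str, **params) -> str:
    """Derive a stable identifier from a source ID plus translation parameters.

    Parameters are serialized with sorted keys, so the order in which
    keyword arguments are passed does not change the result.
    """
    canonical = json.dumps({"source": source_id, "params": params}, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Keep the source ID readable and append a short hash of the parameters.
    return f"{source_id}#{digest[:16]}"
```

The same source translated with different parameter values then gets a different identifier, while repeating a translation with identical parameters yields the same one.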

Make a DataTranslator that pulls information from the Cao et al. data set which

  • [ ] produces Gene instances for genes in the data set
  • [ ] produces relationships indicating expression of gene products for the neurons included in the data set. Simple "Neuron X expresses
  • [ ] accepts parameters specific to the kinds of analysis done in Cao et al., which influence the reported patterns of expression
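
The translator tasks above could be sketched roughly as follows. This is not the owmeta `DataTranslator` interface: `translate_expression`, the input layout (a mapping from neuron type to per-gene expression values), and the `min_expression` cutoff are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def translate_expression(
    expression: Dict[str, Dict[str, float]],
    min_expression: float = 1.0,
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Collect gene names and (neuron, gene) "expresses" pairs.

    Every gene in the table is reported, but a pair is only emitted when
    the expression level reaches min_expression (a hypothetical parameter
    standing in for the analysis-specific settings).
    """
    genes: Set[str] = set()
    expresses: List[Tuple[str, str]] = []
    for neuron in sorted(expression):
        for gene, level in sorted(expression[neuron].items()):
            genes.add(gene)
            if level >= min_expression:
                expresses.append((neuron, gene))
    return genes, expresses


# Placeholder gene names and made-up levels, only to show the call shape.
table = {"AFD": {"gene-a": 5.0, "gene-b": 0.0}, "RIA": {"gene-a": 0.5}}
genes, pairs = translate_expression(table, min_expression=1.0)
```

Changing `min_expression` changes which pairs are reported, which is why such parameters would also need to feed into the identifiers of the results.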

slarson, Aug 18 '17 15:08

Downloaded the raw data.

The version of monocle referenced in the link is old -- I'm not sure how that'll affect usage with later versions, but I've downloaded the latest using instructions here: http://cole-trapnell-lab.github.io/monocle-release/docs/#installing-monocle

I made a cursory read over the docs. As a first cut, we could query this dataset for channel or neuron data. We would likely generate a sub-class of Neuron and of Channel to indicate that the data is available.

In doing this work, Contexts should be kept in mind, since this data should sit in its own context.

mwatts15, Oct 18 '17 02:10

Couldn't install the latest 'monocle' due to a failure to build VGAM.

mwatts15, Oct 18 '17 11:10

I installed everything in an AWS EC2 instance and had a look at some of the raw data, although you can get most of this from their 'vignette' as well.

The types of neurons included in the data set

> neuron.types
 [1] "AFD"              "ASEL"             "ASER"             "ASG"             
 [5] "ASI/ASJ"          "ASK"              "AWA"              "AWB/AWC"         
 [9] "BAG"              "CAN"              "Cholinergic (11)" "Cholinergic (15)"
[13] "Cholinergic (23)" "Cholinergic (24)" "Cholinergic (26)" "Cholinergic (29)"
[17] "Cholinergic (3)"  "Cholinergic (35)" "Cholinergic (36)" "Cluster 10"      
[21] "Cluster 13"       "Cluster 16"       "Cluster 17"       "Cluster 21"      
[25] "Cluster 25"       "Cluster 27"       "Cluster 40"       "Cluster 5"       
[29] "Dopaminergic"     "DVA"              "flp-1(+)"         "GABAergic"       
[33] "Pharyngeal (33)"  "Pharyngeal (37)"  "PVC/PVD"          "RIA"             
[37] "RIC"              "SDQ/ALN/PLN"      "Touch receptor"   "URX/AQR/PQR"  

Most of these need to be mapped to the standard names we typically use, like RIAL. I'm not sure yet what the indexes in the "Cholinergic (...)" labels mean.
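
A first pass at that mapping might look like the sketch below. It only separates the label parts and pulls out the numeric index; `parse_label`, `standard_names`, and the small `CLASS_TO_NEURONS` table are hypothetical, and the table holds just a few hand-written entries rather than a checked, complete mapping.

```python
import re
from typing import Dict, List, Optional, Tuple

# Illustrative entries only; a real table would need to be checked
# class by class against the standard neuron names.
CLASS_TO_NEURONS: Dict[str, List[str]] = {
    "RIA": ["RIAL", "RIAR"],
    "AFD": ["AFDL", "AFDR"],
    "ASEL": ["ASEL"],
    "ASER": ["ASER"],
}

# Matches labels such as "Cholinergic (11)" or "Pharyngeal (33)".
_INDEXED = re.compile(r"^(?P<label>.+?) \((?P<index>\d+)\)$")


def parse_label(label: str) -> Tuple[List[str], Optional[int]]:
    """Split a data-set label into its parts and an optional numeric index."""
    match = _INDEXED.match(label)
    if match:
        return [match.group("label")], int(match.group("index"))
    # Combined labels such as "SDQ/ALN/PLN" list several classes.
    return label.split("/"), None


def standard_names(label: str) -> List[str]:
    """Return the known standard names for a label; unknown parts yield nothing."""
    parts, _ = parse_label(label)
    names: List[str] = []
    for part in parts:
        names.extend(CLASS_TO_NEURONS.get(part, []))
    return names
```

Labels such as "Cluster 10" or "flp-1(+)" pass through unsplit and map to no standard names, so they would still need a decision by hand.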

mwatts15, Jun 30 '18 20:06

This may be useful in deciding how to implement the translator: https://stackoverflow.com/questions/5630441/how-do-rpy2-pyrserve-and-pyper-compare

It may also be an option to use R in batch mode and just run a script that produces the data (or maybe littler).
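
The batch-mode option could look something like this sketch, which shells out to `Rscript` and reads CSV from its standard output. The helper names are hypothetical, and it assumes an R script (called `export.R` in the usage note) that prints a CSV table with a header row; no such script exists yet.

```python
import csv
import io
import subprocess
from typing import Dict, List


def rscript_command(script_path: str, *args: str) -> List[str]:
    """Build the command line for running an R script non-interactively."""
    return ["Rscript", "--vanilla", script_path, *args]


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def run_r_script(script_path: str, *args: str) -> List[Dict[str, str]]:
    """Run the script and parse whatever it prints to stdout as CSV.

    Raises subprocess.CalledProcessError if R exits with a non-zero status.
    """
    result = subprocess.run(
        rscript_command(script_path, *args),
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_csv(result.stdout)
```

A call such as `run_r_script("export.R", "neurons")` would then return one dictionary per row. Compared with an in-process bridge, this keeps the R dependencies out of the Python environment at the cost of starting a new R process for each run.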

mwatts15, Jun 30 '18 20:06