ga4gh-schemas

Biosample tissue site

Open david4096 opened this issue 8 years ago • 23 comments

Biosamples are normally from a tissue sample that can be named. This data should be represented as a named field of a Biosample message, i.e. tissue_type. I suppose one might use something like the Foundational Model of Anatomy to restrict the vocabulary. @mbaudis @nishill

david4096 avatar Sep 07 '16 21:09 david4096

I am for solving this through an OntologyTerm object, as part of a general BioFeatures collection object which contains information private to the BioSample. Others would be histology, anatomic location, associated disease (of the ->tissue<- as a somatic phenotypic variation...). We seem to address this in discussions happening at the moment in the MTT, and on documents such as BioFeature and use cases documents.

mbaudis avatar Sep 08 '16 07:09 mbaudis

+1, adding suggestion to use Uberon for anatomical parts (it does map to FMA amongst others)

mcourtot avatar Sep 08 '16 09:09 mcourtot

here's a snippet of how TCGA classifies tissues. the total number of combinations is 188:

mysql> select sampletype, tissue_anatomic_site, tissue_anatomic_site_description, count(*) ct 
from metadata_biospecimen 
where tissue_anatomic_site in ('Head/Neck', 'Eye', 'Colon') 
group by sampletype, tissue_anatomic_site, tissue_anatomic_site_description;
+---------------------+----------------------+----------------------------------+-----+
| sampletype          | tissue_anatomic_site | tissue_anatomic_site_description | ct  |
+---------------------+----------------------+----------------------------------+-----+
| Primary solid Tumor | Colon                | NULL                             |   2 |
| Primary solid Tumor | Colon                | Ascending Colon                  |  98 |
| Primary solid Tumor | Colon                | Cecum                            | 110 |
| Primary solid Tumor | Colon                | Descending Colon                 |  22 |
| Primary solid Tumor | Colon                | Hepatic Flexure                  |  24 |
| Primary solid Tumor | Colon                | Splenic Flexure                  |   7 |
| Primary solid Tumor | Colon                | Transverse Colon                 |  38 |
| Primary solid Tumor | Eye                  | Choroid                          |  64 |
| Primary solid Tumor | Eye                  | Ciliary body                     |  15 |
| Primary solid Tumor | Eye                  | Iris                             |   1 |
| Primary solid Tumor | Head/Neck            | NULL                             |   4 |
| Primary solid Tumor | Head/Neck            | Alveolar                         |  17 |
| Primary solid Tumor | Head/Neck            | Base of the Tongue               |  27 |
| Primary solid Tumor | Head/Neck            | Buccal Mucosa                    |  23 |
| Primary solid Tumor | Head/Neck            | Floor of Mouth                   |  62 |
| Primary solid Tumor | Head/Neck            | Hypopharynx                      |  10 |
| Primary solid Tumor | Head/Neck            | Larynx, NOS                      | 116 |
| Primary solid Tumor | Head/Neck            | Lip                              |   3 |
| Primary solid Tumor | Head/Neck            | Oral Cavity, NOS                 |  74 |
| Primary solid Tumor | Head/Neck            | Oral Tongue                      |  83 |
| Primary solid Tumor | Head/Neck            | Oropharynx                       |   9 |
| Primary solid Tumor | Head/Neck            | Palate, Hard                     |   7 |
| Primary solid Tumor | Head/Neck            | Tongue, NOS                      |  50 |
| Primary solid Tumor | Head/Neck            | Tonsil                           |  45 |
+---------------------+----------------------+----------------------------------+-----+
24 rows in set (0.02 sec)

mdmiller53 avatar Sep 08 '16 15:09 mdmiller53

Thanks! This thread was started specifically to address how to model the TCGA data! Is there a minimum change to the biosample message that would allow this? Perhaps we might add tissue type as an ontology term from Uberon?

david4096 avatar Sep 08 '16 23:09 david4096

There seems to be consensus among the MTT to

  • rename the diagnosis attribute in BioSample to something more neutral, a.k.a. SampleCharacteristics, SampleFeatures, SampleOntologies...
  • change the value to be a wrapper around multiple OntologyTerm objects, or just a list of OTs

This would then address tissue type etc, since this could just be an Uberon OT.
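A minimal Python sketch of the two bullet points above, for illustration only. The class names, the `characteristics` attribute and the `term_id` / `term_label` fields are assumptions, not the schema proposed in the PR; the term identifiers are the ones quoted later in this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class OntologyTerm:
    term_id: str     # CURIE-style identifier, e.g. "UBERON:0000310"
    term_label: str  # human-readable label


@dataclass
class Biosample:
    id: str
    # Hypothetical neutral replacement for the former `diagnosis` attribute:
    # a flat list of ontology terms describing the sample.
    characteristics: List[OntologyTerm] = field(default_factory=list)


sample = Biosample(
    id="sample-1",
    characteristics=[
        OntologyTerm("UBERON:0000310", "breast"),
        OntologyTerm("EFO:0000305", "breast carcinoma"),
    ],
)

# The tissue type is whichever term comes from the Uberon namespace.
tissue_terms = [t for t in sample.characteristics if t.term_id.startswith("UBERON:")]
```

With this shape, tissue type needs no dedicated field; it is one more term in the list.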

There is discussion of this in the BioFeature document - see esp. page 2. In the case of BioSample, we can go a less nested route than the abstraction discussed there since the time attributes & description are already there and we probably can live with a single set of terms. So: Implement the bullet points above = practical solution.

(We have drafted a large list of use cases to identify what else is needed esp. for BioSample / Individual; this will soon be addressed piecemeal...).

mbaudis avatar Sep 09 '16 06:09 mbaudis

@david4096 Please have a look at https://github.com/ga4gh/schemas/pull/710 (sorry for the commit mess...).

mbaudis avatar Sep 09 '16 13:09 mbaudis

It works for me, I think if we go with a tag-bag approach a logical OR would satisfy most use cases.
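A rough sketch of what a logical OR over a tag bag could look like, assuming each sample is reduced to a plain list of ontology term ids. The function name and the sample data are hypothetical and not part of any GA4GH API.

```python
from typing import Iterable, List, Set


def matches_any(sample_term_ids: Iterable[str], query_term_ids: Set[str]) -> bool:
    """Return True if the sample carries at least one queried term (logical OR)."""
    return any(term_id in query_term_ids for term_id in sample_term_ids)


# Hypothetical samples, each reduced to its bag of ontology term ids.
samples = {
    "sample-1": ["UBERON:0000310", "EFO:0000305"],
    "sample-2": ["CLO:0009468"],
    "sample-3": [],
}

query = {"UBERON:0000310", "CLO:0009468"}
hits: List[str] = [sid for sid, terms in samples.items() if matches_any(terms, query)]
```

A sample with no terms never matches, and a sample matches as soon as one of its terms is in the query set.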

david4096 avatar Sep 09 '16 18:09 david4096

Thanks @david4096. Other comments/votes, please; @mcourtot, @mdmiller53, @sarahhunt ...?

mbaudis avatar Sep 09 '16 18:09 mbaudis

yes, the changes in #710 look fine to me. still minorly concerned whether OntologyTerm needs to be richer (i.e. allow qualifying OntologyTerms and OntologyValues) but that's a different discussion and will undoubtedly be driven by use cases

mdmiller53 avatar Sep 09 '16 19:09 mdmiller53

@mdmiller53 Thanks; and regarding the qualifiers, the point is certainly well taken. But the options are either in the OT, or in the wrapper object. The structure there is not immediately obvious; a wrapper seems more sane (e.g. you qualify your diagnostic call, and the OTs are only abstractions of this); but this wouldn't fit very well here, where the wrapper is basically a first level object and the characteristics are heterogeneous. Still, the best solution IMO would be akin to the "Biofeatures" (any name allowed) list containing "Biofeature" wrappers etc.; see the document. But this will be a separate issue.

mbaudis avatar Sep 09 '16 20:09 mbaudis

So pls. vote/comment on https://github.com/ga4gh/schemas/issues/711 now.

mbaudis avatar Sep 09 '16 20:09 mbaudis

#711 got closed in favor of #725 so pls. vote/comment on that one now :-)

kozbo avatar Nov 14 '16 22:11 kozbo

so minorly confused here, are we considering this issue or #725 or both for voting?

mdmiller53 avatar Nov 14 '16 23:11 mdmiller53

@mdmiller53 comment from #711 : "Following the discussions at Vancouver: Closing this in favour #725." #725 implements this issue.

kozbo avatar Nov 15 '16 00:11 kozbo

@mdmiller53 I'll close this. https://github.com/ga4gh/schemas/pull/725 (which was merged into metadata-integration branch) defines this as being covered through better definition of OntologyTerms (termId + termLabel, and URI provided through a service), and these being represented through Biocharacteristic-type phenotypes and diseases.

mbaudis avatar Nov 15 '16 19:11 mbaudis

Reopening this as we have no way to track that this fix isn't merged to master yet without this issue. So will close once the metadata-integration branch is merged into master.

kozbo avatar Nov 17 '16 01:11 kozbo

@mbaudis with the characteristics we have improved the granularity of describing phenotypes. However, following on from our conversations on Monday, I believe we need to add a tissue_site that allows one to state, using an ontology term, where a sample was taken from.

When both a tumor and healthy sample have been derived from an individual, there should be a clear field that states where the sample was taken from in either case.

david4096 avatar Feb 21 '17 23:02 david4096

@david4096 My take here:

  • a biosample will have only one value for the site; however
  • tissue_site is overly verbose; consider e.g. environmental samples, metagenomic (i.e. from some anatomical site but not representing human tissue)
  • so site is just one item to characterise the sample's provenance

In principle, one could go with the way we discussed - you can have everything in a list of Biocharacteristics. However, it may be better to have a similar Provenance collection, which could contain multiple Characteristic objects describing different aspects of the sample's origin.

mbaudis avatar Feb 22 '17 01:02 mbaudis

Maybe something like sample_source? I think having a collection would be helpful, as we could have cell lines derived from specific tissues for example, and we'd want to capture both in a characteristic object, e.g.

sample_source: [
    {
        description: "breast carcinoma cell line",
        repeated OntologyTerm ontologyTerms: [
            {
                term_id: "CLO:0009468",
                term_label: "UACC-893 cell",
            },
            {
                term_id: "EFO:0000305",
                term_label: "breast carcinoma",
            },
            {
                term_id: "UBERON:0000310",
                term_label: "breast",
            },
        ],
    }
]

I'm not sure if we'd want to name nested attributes, for example in this case "cell_line", "disease", "organism_part". Not naming them makes for an easier schema, but slightly less precise query.
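To make the trade-off concrete, here is a small Python sketch that contrasts the two variants using the terms from the example above. The role names ("cell_line", "disease", "organism_part") are the ones floated in this comment; the function names and the prefix heuristic are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Unnamed variant: a flat list, as in the example above.
unnamed: List[Dict[str, str]] = [
    {"term_id": "CLO:0009468", "term_label": "UACC-893 cell"},
    {"term_id": "EFO:0000305", "term_label": "breast carcinoma"},
    {"term_id": "UBERON:0000310", "term_label": "breast"},
]

# Named variant: each term is keyed by its role (key names are hypothetical).
named: Dict[str, Dict[str, str]] = {
    "cell_line": unnamed[0],
    "disease": unnamed[1],
    "organism_part": unnamed[2],
}


def organism_part_unnamed(terms: List[Dict[str, str]]) -> Optional[str]:
    """Guess the anatomical term from the ontology prefix; this is a heuristic."""
    for term in terms:
        if term["term_id"].startswith("UBERON:"):
            return term["term_label"]
    return None


def organism_part_named(terms: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Read the anatomical term directly by its role name."""
    term = terms.get("organism_part")
    return term["term_label"] if term else None
```

The unnamed variant has to infer a term's role from its ontology prefix, which breaks down when one ontology covers several roles; the named variant answers the same question with a direct lookup at the cost of a more rigid schema.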

mcourtot avatar Feb 22 '17 09:02 mcourtot

@david4096 @mcourtot Yes, and in fact in arrayMap we use "SAMPLESOURCE" for similar purposes (cell line, metastasis::liver ...).

So if somebody wants to craft a PR for this ...

mbaudis avatar Feb 22 '17 16:02 mbaudis

PR created - I think changes required are fairly minimal as we already have the Biocharacteristic objects, but please review!

mcourtot avatar Feb 22 '17 16:02 mcourtot

for consistency, it might be good to have companion best-practice documentation for the different types. for instance 'for cell lines, these are the recommended ontology fields, for metagenomics ..., for human ..., etc.', perhaps based on minimum information standards where appropriate
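One way such recommendations could be made checkable is sketched below in Python. The sample types, the recommended ontology prefixes and the function name are all hypothetical placeholders; real values would come from the best-practice documentation itself.

```python
from typing import Dict, List, Set

# Hypothetical recommendations; real ones would come from the best-practice document.
RECOMMENDED_PREFIXES: Dict[str, Set[str]] = {
    "cell_line": {"CLO", "UBERON"},
    "human_tissue": {"UBERON"},
}


def missing_prefixes(sample_type: str, term_ids: List[str]) -> Set[str]:
    """Return recommended ontology prefixes not covered by the given term ids."""
    present = {term_id.split(":", 1)[0] for term_id in term_ids}
    return RECOMMENDED_PREFIXES.get(sample_type, set()) - present
```

For example, a cell line annotated only with CLO and EFO terms would be reported as missing an UBERON term, while an unknown sample type yields no recommendations.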

mdmiller53 avatar Feb 22 '17 17:02 mdmiller53

@mdmiller53 Yes, exactly. There are many things where documentation will be a very important element of efficient use.

mbaudis avatar Feb 22 '17 17:02 mbaudis