airr-standards
Question around study_group_description
Again, from Emily, sparked by our discussion at the Vocab/Ontology meeting...
We have been treating it solely as case and control, but never actively defined "case"; generally the studies we work with have a single case that can be described in diagnosis, etc. Essentially, should we change this and define case and simply refer to a control as a control? Or should we alter it so that we define the two more effectively? E.g. in a study on SLE, case is patients with flares and control is patients without flares. Would we want this defined specifically in study_group_description?
Maybe this is a discussion for the Vocab/Ontology diagnosis group?
I vote for the more detailed definitions, because I think that is the only way the samples can be used effectively for a variety of meta-analyses.
@bcorrie what is the current status of this issue? We are planning to represent the study_group_description in our backend DB, but we are a bit puzzled, as we considered it to be a property of the Subject, not of the Diagnosis (where it is currently located).
@bussec this has not progressed. There seem to be two separate questions here:
- Can we come up with a "controlled vocabulary" or an ontology to describe "Designation of study arm to which the subject is assigned to". That is the question that this issue is related to.
- What is the correct AIRR object for this information to be associated with? Currently it is part of diagnosis.
We have come up with our own "semi-controlled vocabulary" for this field that we use in our curation process, so it is possible to find "Case/Control" as well as "Healthy" subjects if you know the controlled vocabulary. This is unsatisfactory. 8-)
I have little experience in study design, so I am not sure which AIRR object this belongs to... Subject seems a bit limiting to me, which is why I think it may belong in diagnosis. Can you conceive of a study where you had two different samples from the same subject, one a "Control" and the other a "Case"? For example, healthy tissue being a Control and diseased tissue (e.g. a tumor receiving some treatment) a Case? Or two diseased tissue samples from a subject, with one receiving some sort of intervention and the other not?
@bcorrie Some thoughts on this:
- There are situations conceivable in which two samples taken from the same subject at the same time point would belong to different "case"/"control" groups, e.g., radiation or embolization protocols. But for immunology these things will be rare, if they exist at all. Also, this is what we have Sample.disease_state_sample for :wink:.
- It seems to me like this is a "relative-absolute" problem and in the end we need both types of information: disease_diagnosis describes the absolute state while study_group_description defines the relative position within the cohort.
- Therefore, as you already wrote, study_group_description = Control is not helpful when you are looking for healthy controls. This can only be captured in disease_diagnosis.
- As DOID does not seem to contain a concept for "no apparent disease", we could either:
  - make a term request (maybe it just never occurred to the maintainers) or
  - introduce a boolean property Subject.healthy
I've asked IEDB how they handle this as it might provide some guidance.
Should a healthy field be at the Subject level? What about a sample from healthy tissue versus disease tissue? Should this be subject.diagnosis.healthy instead?
And I wouldn't think that subject.diagnosis.study_group_description == Control and subject.healthy == true (or even subject.diagnosis.healthy == true) would necessarily mean a healthy control, would it? You could certainly have that state when the study did not have a Control (Healthy) study group.
It kind of feels to me like study_group_description could use some refinement. Almost like we need an additional field (or two) that describes the details of the study groups. study_group_description could be a controlled vocabulary (Case, Control), but then maybe we need a field (e.g. in subject.diagnosis) that adds a qualifier/keyword to Case/Control stating explicitly which study design subgroup the sample belongs to. For example, subject.diagnosis.study_design_keywords = [Healthy] or subject.diagnosis.study_design_keywords = [Healthy, Vaccinated]
If we have some controlled vocabulary terms (e.g. Healthy) for the keywords, but allow researchers to add their own, that would cover most of the bases and in particular allow us to look for healthy controls (subject.diagnosis.study_group_description == Control and subject.diagnosis.study_design_keywords = [Healthy])
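As a rough illustration of the proposed query, the Python sketch below filters subjects on study_group_description plus the study_design_keywords field suggested above. study_design_keywords does not exist in the AIRR schema; that field, the literal "Case"/"Control"/"Healthy" strings, and the helper names are all assumptions for illustration. The check is "keywords contain Healthy" rather than "keywords equal [Healthy]", so a subject tagged [Healthy, Vaccinated] would also match, and diagnosis is accepted either as a single object or as a list.

```python
from typing import Any, Dict, List

# Hypothetical field proposed in this thread; NOT part of the current AIRR schema.
KEYWORDS_FIELD = "study_design_keywords"


def _diagnoses(subject: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the subject's diagnosis entries as a list (accepts a dict or a list)."""
    diagnosis = subject.get("diagnosis") or []
    return [diagnosis] if isinstance(diagnosis, dict) else list(diagnosis)


def is_healthy_control(subject: Dict[str, Any]) -> bool:
    """True if any diagnosis entry is a Control group tagged with the Healthy keyword."""
    for diag in _diagnoses(subject):
        group = diag.get("study_group_description")
        keywords = diag.get(KEYWORDS_FIELD) or []
        if group == "Control" and "Healthy" in keywords:
            return True
    return False


def find_healthy_controls(subjects: List[Dict[str, Any]]) -> List[str]:
    """Return the subject_id of every subject that qualifies as a healthy control."""
    return [s["subject_id"] for s in subjects if is_healthy_control(s)]


subjects = [
    # Healthy control group.
    {"subject_id": "S1", "diagnosis": [{"study_group_description": "Control",
                                        KEYWORDS_FIELD: ["Healthy"]}]},
    # Control arm that is not healthy (e.g. untreated patients).
    {"subject_id": "S2", "diagnosis": [{"study_group_description": "Control",
                                        KEYWORDS_FIELD: ["Untreated"]}]},
    # Healthy but vaccinated subjects forming the Case group.
    {"subject_id": "S3", "diagnosis": [{"study_group_description": "Case",
                                        KEYWORDS_FIELD: ["Healthy", "Vaccinated"]}]},
]

print(find_healthy_controls(subjects))  # ['S1']
```

Only S1 matches: S2 is a Control without the Healthy keyword, and S3 is healthy but assigned to the Case group, which is exactly the distinction a bare Case/Control value cannot express.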
> As DOID does not seem to contain a concept for "no apparent disease", we could either:
> - make a term request (maybe it just never occurred to the maintainers) or
> - introduce a boolean property Subject.healthy
>
> I've asked IEDB how they handle this as it might provide some guidance.
From Randi @ IEDB:
> we use an internal identifier that we coined, healthy ONTIE [ONTIE:0003423]; we use "host health status" as the highest node and integrate disease ontology terms, healthy, infection without disease, and animal models of disease into a single OWL file/tree view
> Should a healthy field be at the Subject level? What about a sample from healthy tissue versus disease tissue? Should this be subject.diagnosis.healthy instead?
Yes, exactly ;-D It all depends on what you mean by "healthy control", which is ambiguous and may differ depending on the analysis being performed. It is certainly reasonable that a subject designated as "healthy" would let you consider all samples from that subject as potential healthy controls.
However, it's become very common in cancer studies to collect a tumor sample but also collect an adjacent healthy tissue sample for comparative analysis. In this case, the subject is not healthy (as they have cancer), but that adjacent tissue is considered a healthy control for analysis purposes.
This is all quite different from a clinical trial with one set of subjects designated as "Case" and given a treatment, and another set designated as "Control" without treatment, where the subjects in neither set are "healthy".
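To make the three cases above concrete, here is a minimal Python sketch that keeps the absolute health state separate from the relative study group. It assumes the boolean Subject.healthy proposed earlier in this thread (hypothetical, not in the current spec), stored here as a "healthy" key, and treats the literal string "healthy" in Sample.disease_state_sample as a sample-level healthy marker, which is also an assumption since that field is free text. The helper names sample_is_healthy and healthy_samples are made up for illustration.

```python
from typing import Any, Dict, List

# Assumed sample-level term. disease_state_sample is free text, so matching on
# this exact string is an assumption made for illustration only.
HEALTHY_TERM = "healthy"


def sample_is_healthy(subject: Dict[str, Any], sample: Dict[str, Any]) -> bool:
    """Return True if this sample could serve as a healthy reference.

    subject["healthy"] stands in for the hypothetical Subject.healthy flag;
    when it is True, every sample from the subject qualifies. Otherwise the
    decision falls back to the sample's own disease_state_sample value.
    study_group_description is deliberately never consulted.
    """
    if subject.get("healthy") is True:
        return True
    state = (sample.get("disease_state_sample") or "").strip().lower()
    return state == HEALTHY_TERM


def healthy_samples(subject: Dict[str, Any], samples: List[Dict[str, Any]]) -> List[str]:
    """Return the sample_id of every sample that qualifies as a healthy reference."""
    return [s["sample_id"] for s in samples if sample_is_healthy(subject, s)]


# 1. Healthy donor: the subject-level flag covers all of their samples.
donor = {"subject_id": "D1", "healthy": True,
         "diagnosis": [{"study_group_description": "Control"}]}
donor_samples = [{"sample_id": "D1-PBMC", "disease_state_sample": None}]

# 2. Cancer patient: not healthy, but the adjacent tissue sample is.
cancer = {"subject_id": "C1", "healthy": False,
          "diagnosis": [{"study_group_description": "Case"}]}
cancer_samples = [
    {"sample_id": "C1-tumor", "disease_state_sample": "tumor"},
    {"sample_id": "C1-adjacent", "disease_state_sample": "healthy"},
]

# 3. Untreated trial arm: "Control" group, yet neither subject nor sample is healthy.
trial_control = {"subject_id": "T1", "healthy": False,
                 "diagnosis": [{"study_group_description": "Control"}]}
trial_samples = [{"sample_id": "T1-PBMC", "disease_state_sample": "SLE"}]

print(healthy_samples(donor, donor_samples))          # ['D1-PBMC']
print(healthy_samples(cancer, cancer_samples))        # ['C1-adjacent']
print(healthy_samples(trial_control, trial_samples))  # []
```

Because study_group_description is never consulted, the untreated "Control" arm yields no healthy samples, while the cancer patient still contributes one: the subject-level and sample-level answers can legitimately differ.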
Link to ONTIE: https://ontology.iedb.org/ontology
Note: high overlap with #516
@javh I think this should be an AIRR 2.0 issue, no? The limitation is that there is no mechanism in the AIRR Spec to designate a healthy control.