
Modelling of p-values and re-use from STATO (OBI)

Open cmaumet opened this issue 9 years ago • 5 comments

We initially planned to re-use STATO terms for:

  • p-value
  • FWE-corrected p-value
  • FDR p-value

But looking more closely at STATO (OBI), those terms are defined as classes, while we use them as properties, so we cannot re-use STATO terms directly in our current model (cf. ISA-tools/stato#38 for more details).

So... I think we have two options:

  1. Keep our current model and create nidm terms for p-values (i.e. do not re-use STATO terms), e.g. for a nidm:HeightThreshold:

      niiri:my_height_threshold a nidm:HeightThreshold ;
                  nidm:pValueFWER "0.05"^^xsd:float ;
                  nidm:pValueUncorrected "7.62e-07"^^xsd:float ;
                  prov:value "5.23"^^xsd:float . # corresponding statistic value
    
  2. Modify our model so that the STATO p-value classes can be used, e.g. for a nidm:HeightThreshold we could have (OBI classes written with their labels for readability):

    niiri:my_height_threshold_1 a nidm:HeightThreshold ;
                nidm:hasPValue niiri:my_fwer_p_value ;
                nidm:hasPValue niiri:my_uncorrected_p_value ;
                prov:value "5.23"^^xsd:float . # corresponding statistic value
    
    niiri:my_fwer_p_value a obi:'FWER p-value' ;
                prov:value "0.05"^^xsd:float . 
    
    niiri:my_uncorrected_p_value a obi:'uncorrected p-value' ;
                prov:value "7.62e-07"^^xsd:float . 
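To make the structural difference between the two options concrete, here is a minimal Python sketch (standard library only) that holds option 1 as a set of (subject, predicate, object) tuples and rewrites it into the option 2 shape. It assumes a hypothetical mapping `PVALUE_PROPERTY_TO_CLASS` and a hypothetical naming scheme for the new p-value nodes, uses readable labels instead of real IRIs, and keeps literals as plain strings (no `xsd:float` typing). It is an illustration, not part of any NIDM exporter.

```python
# Triples are (subject, predicate, object) tuples; terms use readable labels,
# not real IRIs, and literals are plain strings for simplicity.
OPTION_1 = {
    ("niiri:my_height_threshold", "rdf:type", "nidm:HeightThreshold"),
    ("niiri:my_height_threshold", "nidm:pValueFWER", "0.05"),
    ("niiri:my_height_threshold", "nidm:pValueUncorrected", "7.62e-07"),
    ("niiri:my_height_threshold", "prov:value", "5.23"),
}

# Hypothetical mapping from option 1 properties to STATO/OBI p-value classes.
PVALUE_PROPERTY_TO_CLASS = {
    "nidm:pValueFWER": "obi:'FWER p-value'",
    "nidm:pValueUncorrected": "obi:'uncorrected p-value'",
}


def to_option_2(triples):
    """Replace each p-value property with a typed p-value entity (option 2)."""
    result = set()
    for s, p, o in triples:
        cls = PVALUE_PROPERTY_TO_CLASS.get(p)
        if cls is None:
            result.add((s, p, o))  # not a p-value: keep the triple unchanged
            continue
        # Hypothetical naming scheme for the new p-value entity.
        node = f"{s}/{p.split(':')[1]}"
        result.add((s, "nidm:hasPValue", node))
        result.add((node, "rdf:type", cls))
        result.add((node, "prov:value", o))
    return result


OPTION_2 = to_option_2(OPTION_1)
print(len(OPTION_1), len(OPTION_2))  # 4 8
```

In this sketch each p-value costs one triple in option 1 and three in option 2; the extra node is what allows the p-value to carry an `rdf:type` taken from STATO.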
    
Pros and cons
  • Option 1 provides a more condensed representation, so the corresponding queries are shorter (cf. the example queries below).

But:

  • With option 1, we are unable to re-use the STATO terms, which align very closely with our needs. This also means that we will have to come up with our own definitions (of the same concepts...).

Queries

This is how a query searching for an FWER p-value height threshold would look (written with semantic identifiers just for readability):

  • Option 1:
    SELECT ?p_corr_fwe WHERE { 
    ?height a nidm:HeightThreshold .
    ?height nidm:pValueFWER ?p_corr_fwe .
    }
    
  • Option 2:
    SELECT ?p_corr_fwe WHERE { 
    ?height a nidm:HeightThreshold .
    ?height nidm:hasPValue ?pfwer_entity .
    ?pfwer_entity a obi:'FWER p-value' .
    ?pfwer_entity prov:value ?p_corr_fwe .
    }
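The two queries can be compared on toy data with a minimal Python sketch of SPARQL basic graph pattern matching (standard library only). The matcher below is a simplified stand-in written for illustration: terms starting with `?` are variables, `rdf:type` plays the role of SPARQL's `a`, and there is no FILTER, OPTIONAL, or datatype handling. The graphs reuse the identifiers from the examples above, with OBI classes written as readable labels.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all (s, p, o) patterns."""
    binding = dict(binding or {})
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match this triple
        else:
            yield from match(rest, triples, candidate)


def select(var, patterns, triples):
    """Return the values bound to `var`, like a one-variable SELECT."""
    return [b[var] for b in match(patterns, triples)]


OPTION_1 = {
    ("niiri:my_height_threshold", "rdf:type", "nidm:HeightThreshold"),
    ("niiri:my_height_threshold", "nidm:pValueFWER", "0.05"),
    ("niiri:my_height_threshold", "nidm:pValueUncorrected", "7.62e-07"),
}
OPTION_2 = {
    ("niiri:my_height_threshold_1", "rdf:type", "nidm:HeightThreshold"),
    ("niiri:my_height_threshold_1", "nidm:hasPValue", "niiri:my_fwer_p_value"),
    ("niiri:my_height_threshold_1", "nidm:hasPValue", "niiri:my_uncorrected_p_value"),
    ("niiri:my_fwer_p_value", "rdf:type", "obi:'FWER p-value'"),
    ("niiri:my_fwer_p_value", "prov:value", "0.05"),
    ("niiri:my_uncorrected_p_value", "rdf:type", "obi:'uncorrected p-value'"),
    ("niiri:my_uncorrected_p_value", "prov:value", "7.62e-07"),
}

QUERY_1 = [
    ("?height", "rdf:type", "nidm:HeightThreshold"),
    ("?height", "nidm:pValueFWER", "?p_corr_fwe"),
]
QUERY_2 = [
    ("?height", "rdf:type", "nidm:HeightThreshold"),
    ("?height", "nidm:hasPValue", "?pfwer_entity"),
    ("?pfwer_entity", "rdf:type", "obi:'FWER p-value'"),
    ("?pfwer_entity", "prov:value", "?p_corr_fwe"),
]

print(select("?p_corr_fwe", QUERY_1, OPTION_1))  # ['0.05']
print(select("?p_corr_fwe", QUERY_2, OPTION_2))  # ['0.05']
```

Both queries return the same value; option 2 simply needs two more join patterns. This toy matcher says nothing about how much that costs in a real SPARQL engine, which is the performance question raised further down the thread.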
    
Discussion

This is quite an important point, as it directly affects our data model. Could you let me know whether you prefer option 1 or option 2?

I tend to think that option 2 is the right way to go (as we agreed to re-use as many STATO terms as we can), but this means a substantial update to the structure of the model.

cmaumet, Apr 16 '15 08:04

I like option 2, and defining p-values as classes in general, so that the correction method (for FWE and FDR) can also be modelled as an attribute.

tiborauer, Apr 16 '15 09:04
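The suggestion of modelling the correction method as an attribute of a p-value class can be sketched in the same toy style (Python, standard library only). This assumes a single generic p-value class plus a correction-method attribute; `nidm:correctionMethod`, `niiri:fwe_correction` and the `obi:'p-value'` label are hypothetical, illustrative names, not existing NIDM or STATO terms. The sketch only shows that, once a p-value is a node, its correction method becomes data that can be read like any other attribute.

```python
# Hypothetical option-2 style graph: each p-value entity may carry its
# correction method as an attribute. "nidm:correctionMethod" and
# "niiri:fwe_correction" are illustrative names, not existing terms.
GRAPH = {
    ("niiri:my_height_threshold_1", "nidm:hasPValue", "niiri:my_fwer_p_value"),
    ("niiri:my_fwer_p_value", "rdf:type", "obi:'p-value'"),
    ("niiri:my_fwer_p_value", "nidm:correctionMethod", "niiri:fwe_correction"),
    ("niiri:my_fwer_p_value", "prov:value", "0.05"),
    ("niiri:my_height_threshold_1", "nidm:hasPValue", "niiri:my_uncorrected_p_value"),
    ("niiri:my_uncorrected_p_value", "rdf:type", "obi:'p-value'"),
    ("niiri:my_uncorrected_p_value", "prov:value", "7.62e-07"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def p_values_with_correction(graph, threshold):
    """Map each p-value node of `threshold` to (value, correction method or None)."""
    result = {}
    for node in objects(graph, threshold, "nidm:hasPValue"):
        values = objects(graph, node, "prov:value")
        methods = objects(graph, node, "nidm:correctionMethod")
        result[node] = (values[0], methods[0] if methods else None)
    return result


print(p_values_with_correction(GRAPH, "niiri:my_height_threshold_1"))
```

Under this layout, supporting FDR would mean adding another method individual rather than another property; it is only one possible reading of the suggestion, not a settled design.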

I see the conceptual elegance of option 2, but am wary about the implementation effort and the added query complexity. I hope @gllmflndn will weigh in on this, as he'll have to implement it for the SPM NIDM export; also, I wonder if @satra and/or @chrisfilo have any insight on the speed of SPARQL queries, and whether option 2 represents a negligible or appreciable increase in query complexity over option 1.

About re-use of STATO terms: with option 1 we indeed can't re-use them directly, but when the concept is the same, we can directly reference the STATO term in our definition.

nicholst, Apr 16 '15 09:04

I also like option 2, and think that referencing the STATO term in our definition basically defeats the purpose of creating a model like this, since that information is then an "orphan": you don't get any semantic benefit from it.

khelm, Apr 17 '15 15:04

It makes export implementation and SPARQL queries slightly more complicated, but if option 2 is the way to go, so be it...

gllmflndn, Apr 17 '15 16:04

It does sound like a rather big change. I suppose there is no way to turn the class into a property; if that's indeed the case, I agree that we have to go all the way and do it. Nolan, Satra, Dave, Jessica... any thoughts or comments? I know that most researchers will not write queries themselves, but it does look a little heavy, so some more brainstorming would be good!

jbpoline, Apr 17 '15 16:04