GEO sample characteristic format that evades our parser

Open ppavlidis opened this issue 1 year ago • 2 comments

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3443346

In this study, one of the annotations is like:

strain, sex: C57BL/6, male

(Why data submitters do this, in 2019, is a good question)

We seem to be misparsing this, so the data set ends up with sex=C57BL/6: https://gemma.msl.ubc.ca/expressionExperiment/showExpressionExperiment.html?id=25256

I'm not sure we should really be trying to parse things like this: it's not a supported format, and who knows what else would collide with it. Still, I wanted to note it as something to look at, to see if we can at least avoid mangling it.
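For illustration only, here is a minimal sketch of one conservative way to handle a compound characteristic such as "strain, sex: C57BL/6, male". It is not Gemma's actual parser (which is not shown in this thread); the function name `split_characteristic` and the pairing rule are assumptions. The idea is to split into separate tag/value pairs only when the number of tags matches the number of values, and otherwise to leave the text as a single characteristic rather than guess.

```python
def split_characteristic(line: str) -> list[tuple[str, str]]:
    """Split a GEO-style 'tag: value' characteristic into (tag, value) pairs.

    Hypothetical helper: compound tags such as 'strain, sex' are paired with
    their values only when the counts match; otherwise the input is kept as
    one pair so nothing is silently mis-assigned.
    """
    tag, sep, value = line.partition(":")
    if not sep:
        # No colon at all: keep the text as an untagged value.
        return [("", line.strip())]

    tags = [t.strip() for t in tag.split(",")]
    values = [v.strip() for v in value.split(",")]

    # Pair up only when it is unambiguous: several tags, same number of values,
    # and no empty pieces on either side.
    if len(tags) > 1 and len(tags) == len(values) and all(tags) and all(values):
        return list(zip(tags, values))

    # Fallback: a single pair, with tag and value left intact.
    return [(tag.strip(), value.strip())]


if __name__ == "__main__":
    print(split_characteristic("strain, sex: C57BL/6, male"))
    print(split_characteristic("strain, sex: C57BL/6"))
```

With this rule, a mismatched line such as "strain, sex: C57BL/6" stays as one characteristic tagged "strain, sex" instead of producing sex=C57BL/6. Values that legitimately contain commas would still need separate handling.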

ppavlidis, Apr 12 '24 18:04

I fixed that in the experiment, so it won't be visible any more.

ppavlidis, Apr 12 '24 18:04

Here's another similar situation: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM604766

This is not really our fault: the submitter recorded it as "gender" (do mice have genders?) with the value "ovarectomized female".

Just another example of how data submitters are not aware of how this information should be presented to make it actually useful for any computational work.

I'm going to leave that the way it is, for now.

ppavlidis, Apr 12 '24 18:04