Add functionality for changing metadata column types

Open justinshaffer opened this issue 4 years ago • 12 comments

Hello,

I need to update metadata column types in some of my analyses. For example, I need to change a variable that is interpreted as 'numeric' to 'categorical'.

Specifically, I am testing for the influence of 'host_subject_id' on beta diversity using the group significance test, which requires a categorical metadata column as input. However, because all values of that variable are numeric identifiers, the column is interpreted as numeric, which prevents the test. I am not sure whether the correlation test would be appropriate, so I would rather change the column type from numeric to categorical so that I can use the group significance test.
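As a rough illustration of why the column is rejected, the sketch below approximates the inference rule (a column is numeric when every non-empty value parses as a number) and adds an explicit per-column override of the kind requested here. It is a simplification, not QIIME 2's actual parser, and `infer_column_type`, `resolve_column_type`, and the `overrides` mapping are hypothetical names.

```python
def is_number(value):
    """True if the string parses as a float (e.g. '1001', '6.5')."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def infer_column_type(values):
    """Simplified inference: numeric only if every non-empty value is a number."""
    present = [v.strip() for v in values if v.strip()]
    if present and all(is_number(v) for v in present):
        return "numeric"
    return "categorical"


def resolve_column_type(name, values, overrides):
    """A user-supplied type for this column wins; otherwise fall back to inference."""
    return overrides.get(name, infer_column_type(values))


host_subject_id = ["1001", "1002", "1003"]
print(infer_column_type(host_subject_id))  # numeric: the IDs look like numbers
print(resolve_column_type("host_subject_id", host_subject_id,
                          {"host_subject_id": "categorical"}))  # categorical
```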

I do not yet see a way to do this, but please correct me if I am wrong.

Thanks in advance.

justinshaffer, Sep 13 '19

You are right, and this is an interesting issue because we will need to add support in Qiita for the Q2 column types (https://docs.qiime2.org/2019.7/tutorials/metadata/#column-types) or a way to force/pass this information to Q2.
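One way to pass the column type to Q2 is the `#q2:types` directive row that QIIME 2 metadata files accept: it declares each column as `categorical` or `numeric` instead of leaving the choice to inference. A minimal sketch, assuming the metadata is held as a dict of per-sample dicts; `to_q2_metadata_tsv` is a hypothetical helper, not part of Qiita, and `sample_name` is used as the ID column header, which QIIME 2 accepts.

```python
import csv
import io


def to_q2_metadata_tsv(columns, samples, column_types):
    """Render per-sample metadata as QIIME 2 metadata TSV text.

    The second row is a '#q2:types' directive, so QIIME 2 uses the declared
    type for each column instead of inferring it from the values.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample_name"] + columns)
    writer.writerow(["#q2:types"] + [column_types[c] for c in columns])
    for sample_id, values in samples.items():
        writer.writerow([sample_id] + [values[c] for c in columns])
    return buf.getvalue()


samples = {
    "s1": {"host_subject_id": "1001", "ph": "6.5"},
    "s2": {"host_subject_id": "1002", "ph": "7.1"},
}
print(to_q2_metadata_tsv(["host_subject_id", "ph"], samples,
                         {"host_subject_id": "categorical", "ph": "numeric"}))
```

With the directive in place, `host_subject_id` would be read as categorical even though every value is a number.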

antgonza, Sep 13 '19

Sounds good to me.

I wonder, though, if there is a quicker solution that allows Python data-type changes in the Qiita parameter settings. For example, could we allow "as.numeric(factor)" in the field for the metadata column, or something similar?

Best,

Justin

justinshaffer, Sep 13 '19

Yesterday we got a report about this issue, but going the other way around: numeric fields that are not considered numeric because they contain other values from blanks, controls, or other samples.

Anyway, thinking about this, a possible "easy" solution is to filter the mapping file before running any given command so that it only contains the samples in the input artifacts. Validation/casting would then only see those samples, so numeric columns that currently contain categorical values could be recognized as numeric. However, this will not solve the issue originally opened here: categorical data represented by numbers. On the other hand, a counterpoint is that we should support categorical data represented by numbers, like sex: 1/2 or treatment: 0/1.

Thoughts?

antgonza, Mar 17 '21

Hi Antonio and Qiita aficionados,

I have come across a similar issue as well. During an analysis workflow, even if I filter out the categorical values somewhere in the workflow tree, I cannot run a numeric-based analysis on metadata that is numeric in nature. For example, I can't use a 'disease_severity_index' column that is entirely numeric, even after filtering out the blanks, whose value is 'not applicable'. This is unfortunate, because it means that all Qiita workflows based on numeric metadata values (alpha-correlation, beta-correlation, custom PCoA axis, etc.) fail, and these are some super awesome workflows that I would love to use. If I'm correct, this seems to be an unavoidable problem, because one would have to code the blanks in a metadata column as some number, which is nonsensical; as we all know, the use of blanks is crucial for good microbiome analysis. I'm not sure what you mean by "only contain samples in the input artifacts" or "those samples during the validation/casting". I'd love to provide more input on your suggestion, but I don't totally understand it. I think it addresses what I've described above, but it sounds like a complex problem.

Rob Quinn

quinnrob, Mar 17 '21

@quinnrob, thank you for your feedback.

Just to clarify what I mean by "only contain samples in the input artifacts", let's use an example. A study has samples from treatment A or B, and the metadata has two columns, treatment and ph: every treatment value is either high-fat or low-fat, ph values are between 1 and 2 for A and between 3 and 4 for B, and the control samples (blanks) have 'blanks' in every column. Currently, the full mapping file is passed to an analysis, so even if an artifact (input file) contains only treatment samples, the parser treats both columns (treatment and ph) as categorical because it is using all the samples. The proposed solution is to filter the mapping file internally, completely hidden from the user, before actually running the command. Then, if a user has filtered out all the blanks, ph will be numeric and treatment will be categorical, because the 'blanks' values are gone.
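A minimal sketch of the proposal using the treatment/ph example: the mapping file is filtered to the samples in the input artifact before column types are inferred. The dict-of-dicts layout, sample IDs, and function names are hypothetical rather than Qiita internals, and the type rule (numeric only if every non-empty value is a number) is a simplification of what QIIME 2 does.

```python
def is_number(value):
    try:
        float(value)
    except ValueError:
        return False
    return True


def infer_column_type(values):
    """Simplified rule: numeric only if every non-empty value is a number."""
    present = [v for v in values if v.strip()]
    if present and all(is_number(v) for v in present):
        return "numeric"
    return "categorical"


def filter_to_samples(metadata, sample_ids):
    """Keep only the rows for samples present in the input artifact."""
    return {sid: row for sid, row in metadata.items() if sid in sample_ids}


def column_types(metadata):
    """Infer a type for every column from the rows that are present."""
    columns = sorted({col for row in metadata.values() for col in row})
    return {col: infer_column_type([row.get(col, "") for row in metadata.values()])
            for col in columns}


metadata = {
    "a1": {"treatment": "high-fat", "ph": "1.5"},
    "a2": {"treatment": "high-fat", "ph": "2"},
    "b1": {"treatment": "low-fat", "ph": "3.5"},
    "b2": {"treatment": "low-fat", "ph": "4"},
    "blank1": {"treatment": "blanks", "ph": "blanks"},
}
artifact_samples = {"a1", "a2", "b1", "b2"}  # the user filtered out the blank

print(column_types(metadata))  # {'ph': 'categorical', 'treatment': 'categorical'}
print(column_types(filter_to_samples(metadata, artifact_samples)))
# {'ph': 'numeric', 'treatment': 'categorical'}
```

The same rule also shows the limitation mentioned earlier in the thread: a column such as sex coded 1/2 is still inferred as numeric after filtering, so numeric-coded categories would still need an explicit type.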

Hope this helps.

antgonza, Mar 17 '21

I see. So when you say 'hidden from the user', does that mean I cannot do this filtering myself? Does this have anything to do with the upcoming Qiita system update?

quinnrob, Mar 18 '21

Yes, as you can imagine, this is something that needs to be programmed/automated. No deployment is scheduled for this, as we do not yet have a solution planned and implemented.

antgonza, Mar 18 '21

Ok, thanks. Perhaps a workaround would be to update the metadata by adding a column that forces the blanks to be numeric, for example by using 0. I'll play around and see what I can come up with.

quinnrob, Mar 18 '21

Hi Antonio,

I think one way to do this would be to somehow drop the strings, leaving empty values that QIIME 2 will interpret as NAs rather than as states?
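A minimal sketch of that idea for a single column meant to be numeric: any value that does not parse as a number is replaced with an empty string, which QIIME 2 reads as a missing value, so the remaining values can be inferred as numeric. `blank_non_numeric` is a hypothetical helper; because it discards every non-numeric value in the column, it should only be applied to columns known to be numeric.

```python
def blank_non_numeric(values):
    """Replace every value that is not a number with '' (a missing value)."""
    cleaned = []
    for value in values:
        try:
            float(value)
        except ValueError:
            cleaned.append("")  # e.g. 'not applicable' from a blank
        else:
            cleaned.append(value)
    return cleaned


disease_severity_index = ["2", "3.5", "not applicable", "10"]
print(blank_non_numeric(disease_severity_index))  # ['2', '3.5', '', '10']
```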

Justin

Justin Shaffer, PhD, IRACDA Postdoctoral Fellow, Rob Knight Group, Department of Pediatrics, School of Medicine, University of California, San Diego (justinshafferbio.wordpress.com)

justinshaffer, Mar 24 '21

@justinshaffer, good point. An option might be to "transform" all official EBI null values into actual empty values so the Q2 parsers work as expected, or to add these values to the Q2 blank declaration.
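A minimal sketch of the first option: only values that exactly match a null-value term (case-insensitively) are emptied, so genuine categorical values survive. The `NULL_TERMS` set is an assumption based on commonly used EBI/INSDC missing-value terms and should be checked against Qiita's documentation; `nulls_to_empty` is a hypothetical helper.

```python
# Assumed null vocabulary; verify against Qiita's current list before use.
NULL_TERMS = {
    "not applicable",
    "missing: not collected",
    "missing: not provided",
    "missing: restricted access",
}


def nulls_to_empty(metadata, null_terms=NULL_TERMS):
    """Return a copy of per-sample metadata with null terms replaced by ''."""
    lowered = {term.lower() for term in null_terms}
    return {
        sample_id: {
            col: "" if value.strip().lower() in lowered else value
            for col, value in row.items()
        }
        for sample_id, row in metadata.items()
    }


metadata = {
    "s1": {"ph": "6.5", "host": "human"},
    "blank1": {"ph": "Not applicable", "host": "not applicable"},
}
print(nulls_to_empty(metadata))
# {'s1': {'ph': '6.5', 'host': 'human'}, 'blank1': {'ph': '', 'host': ''}}
```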

BTW, in case you missed it, there is a Q2 discussion about this topic: https://github.com/qiime2/qiime2/issues/568

antgonza, Mar 25 '21

Interesting thread. I look forward to hearing how you guys resolve it. Please do let me know if I can help. RAQ

quinnrob, Mar 25 '21

While working with a user through the help account, we might have come up with a solution for columns that have numeric values. It is something like this: cast(column_of_integer as integer) >= value1 AND cast(column_of_integer as integer) <= value2
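To see why the cast matters when metadata values are stored as text, the sketch below runs that kind of range filter against an in-memory SQLite table, once without and once with `CAST`. SQLite is used only because it ships with Python; the `samples` table, the `severity` column, and the helper names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """In-memory table whose metadata values are stored as TEXT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (sample_id TEXT, severity TEXT)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [("s1", "2"), ("s2", "3"), ("s3", "10"), ("s4", "not applicable"), ("s5", "25")],
    )
    return conn


def in_range_text(conn, low, high):
    # TEXT values are compared lexicographically, so '10' sorts before '2'.
    rows = conn.execute(
        "SELECT sample_id FROM samples WHERE severity >= ? AND severity <= ? "
        "ORDER BY sample_id",
        (str(low), str(high)),
    )
    return [r[0] for r in rows]


def in_range_cast(conn, low, high):
    # Casting both sides of the comparison to INTEGER makes it numeric.
    rows = conn.execute(
        "SELECT sample_id FROM samples "
        "WHERE CAST(severity AS INTEGER) >= ? AND CAST(severity AS INTEGER) <= ? "
        "ORDER BY sample_id",
        (low, high),
    )
    return [r[0] for r in rows]


conn = make_demo_db()
print(in_range_text(conn, 2, 10))  # []
print(in_range_cast(conn, 2, 10))  # ['s1', 's2', 's3']
```

Two caveats: in SQLite, `CAST('not applicable' AS INTEGER)` evaluates to 0, so a range that includes 0 also matches the blanks, whereas PostgreSQL raises an error when casting non-numeric text to integer. Casting to INTEGER also cannot handle decimal values such as pH, for which a cast to a floating-point type would be needed.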

antgonza, Feb 24 '23