Mechanism for modifying metadata in (meta)analysis

Open adswafford opened this issue 7 years ago • 11 comments

Putting an issue here to collect ideas and track.

I'd advocate still for being able to download, modify, upload, and update metadata to a specific analysis, leaving the original study metadata untouched. This will enable flexibility for meta-analysis and prevent users from the data-duplication model of downloading all the raw data for studies of interest and then uploading as a new merged study with fixed metadata.

adswafford avatar Nov 29 '17 18:11 adswafford

This is actually not difficult in principle: analyses already support multiple metadata files, so we could simply add it. The issues (opportunities?) are:

  1. should we allow this, given that any user can make any analysis public and there is no actual curation of the metadata?
  2. should we have a validator? what should it validate? That all samples match? Anything else?
  3. once you upload the new mapping file, should all steps be redone? should it be a new analysis?
  4. should we keep track of all of the new/old mapping files?

Perhaps a way to do this is that you can reuse a biom by creating a new analysis with your own mapping file, which can't be made public until admin revision - or with a big banner that says: different from metadata in original studies.

antgonza avatar Nov 29 '17 18:11 antgonza

I like the idea of allowing users to reuse a biom with their own mapping file and a warning banner that says the metadata is different from the original study. Or to Jose's point, we could only allow them to add additional columns to the metadata (though ideally not with an in-Qiita GUI) and then append/prepend the added metadata categories with something to flag that they are not present in the original study, e.g. "custom_" or "_12345" (where 12345 = the analysis ID).

For the specific issues:

  1. with 1 or more of the above safety mechs, I think it's fine for it to be made public
  2. Just all samples matching and no db violations should be fine right?
  3. this is a good point. it would be great if there was an option for users to choose to rerun all of the analyses with the new metadata, but perhaps the new analysis should start branching from a fresh node so that the older analyses are maintained? i'm wondering if the older analyses should be marked read-only once the metadata is updated?
  4. yes, but just using the same tracking we now use for metadata and prep files, unless we need to hang onto them in some other way to prevent the older analyses from breaking.
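
A minimal sketch of the two safety mechanisms discussed above (all samples must match, and user-added columns get a flag prefix). It is not Qiita code: the function names, the `custom_` default, and the assumption that the first column of the mapping file holds the sample IDs are all hypothetical and only for illustration.

```python
import csv
import io


def read_mapping(tsv_text):
    """Parse a tab-separated mapping file into {sample_id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]  # assumption: first column holds sample IDs
    return {
        row[id_column]: {k: v for k, v in row.items() if k != id_column}
        for row in reader
    }


def merge_custom_metadata(original, custom, prefix="custom_"):
    """Check that sample IDs match, then add only the new columns, flagged."""
    if set(original) != set(custom):
        missing = sorted(set(original) - set(custom))
        extra = sorted(set(custom) - set(original))
        raise ValueError(f"sample mismatch: missing={missing}, extra={extra}")
    merged = {}
    for sample_id, values in original.items():
        row = dict(values)  # original study columns are left untouched
        for column, value in custom[sample_id].items():
            if column not in values:
                # flag columns that are not present in the original study
                row[prefix + column] = value
        merged[sample_id] = row
    return merged


# Hypothetical example data
original_tsv = "sample_name\tph\nS1\t7.0\nS2\t6.5\n"
custom_tsv = "sample_name\tph\tsite\nS1\t7.0\tgut\nS2\t6.5\tskin\n"
merged = merge_custom_metadata(read_mapping(original_tsv), read_mapping(custom_tsv))
```

Passing `prefix=""` and appending `_12345` instead would give the analysis-ID variant; either way the original study columns are never overwritten in this sketch.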

adswafford avatar Nov 30 '17 01:11 adswafford

Thanks. Perhaps the less intrusive and "easiest" implementation will be for the user to generate/copy an analysis with their own mapping file, with these steps:

  • The user selects an analysis and clicks to reprocess with own mapping file.
  • The mapping file is uploaded, validated (only sample names), and a new analysis is created - the user has the option to rerun all steps from the copied analysis. Note that the mapping file info is not actually stored in the DB and it's only used in the analysis so makes things easier.

antgonza avatar Nov 30 '17 01:11 antgonza

Is this enabled now? I would like to upload an edited version of the original metadata to my analysis.

NeginValizadegan avatar Nov 07 '20 19:11 NeginValizadegan

Hi @negin1986, thank you for your interest. Not yet; hopefully we will add it during the January deployment. Currently, you would need to create a new analysis with your updated studies and recreate the steps ... at least this is a point and click operation.

antgonza avatar Nov 09 '20 14:11 antgonza

Hi @antgonza, thank you for your response. I am not interested in removing studies. I am interested in using edited metadata: for example, removing specific samples, relabeling variables, making the metadata look nicer outside qiita and bringing it back, things like that.

NeginValizadegan avatar Nov 09 '20 18:11 NeginValizadegan

@negin1986, for reference here: let's imagine that you have selected X studies or preparations for your meta-analysis, then after multiple commands/steps (alpha, rarefaction, beta, etc), you realize that your analysis has Y columns with wrong values and that perhaps it would be nice to also merge 2 columns. Ideally, and this is what this issue is about, a user would update the sample or prep information files of each study individually; then they would hit some kind of refresh, and a new merged metadata file for that analysis would be available. The spirit behind this is that, if there is a column that is helpful for your analysis, it will help other users in the future, so adding it to the study makes that possible. Obviously, this is only possible for your own studies; however, if you have system-wide suggestions I encourage you to send them to the qiita.help account.

Now, if you need to remove samples from an existing meta-analysis, I suggest using the filter feature-table or distance-matrix commands available in the analysis section.

Anyway, if you have further questions please send them to the qiita.help account.

antgonza avatar Nov 09 '20 20:11 antgonza

Extracting the main points from the discussion linked above here, in case it helps. A possible "easy" solution for this issue is to have some "button" that restarts the analysis: it (1) updates the metadata and (2) restarts all jobs in the analysis. If this is an agreeable solution, some outstanding questions are:

  • do we need to keep a backup of the old artifact?
  • what should happen if a job fails (for example because a metadata column doesn't exist or the values are now wrong)? Do we keep all further down steps but mark them as errored? what about the previously created artifacts from that job/step?
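
To make the failure case in the second question concrete, here is a small sketch that lists which jobs would break because a metadata column they use no longer exists after the update. It is only an illustration: the job names, the `job_columns` mapping, and the function name are hypothetical, and it does not cover the "values are now wrong" case.

```python
def find_missing_columns(job_columns, metadata_columns):
    """Map each job to the metadata columns it needs that no longer exist."""
    available = set(metadata_columns)
    problems = {}
    for job, needed in job_columns.items():
        missing = sorted(set(needed) - available)
        if missing:
            problems[job] = missing
    return problems


# Hypothetical jobs and the metadata columns their parameters refer to
job_columns = {
    "alpha_group_significance": ["body_site"],
    "beta_group_significance": ["body_site", "sex"],
}
updated_metadata_columns = ["body_site", "host_age"]
problems = find_missing_columns(job_columns, updated_metadata_columns)
```

An empty result would not guarantee that every job succeeds; it only rules out the missing-column failure.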

antgonza avatar Mar 31 '21 21:03 antgonza

Hi Antonio,

Cool!

  • by the old artifact do you mean all of the stuff in the analysis network prior to restarting?
  • I think if the job fails there's a few options. One could be to somehow force a 'check' upfront - I guess between (1) and (2) that 'makes sure' that the downstream analyses can complete strictly based on checking changes to the metadata. Alternatively and more aligned with your suggestion - you could have the 'failed' steps turn red - for this it would be great to somehow keep all of the downstream ones as well - turning them red or similar - rather than having them disappear as usually happens when a step fails. A third option could be to 'duplicate' the analysis within the same network but as a separate tree - that way this one could behave normally with respect to 'failed' steps causing all downstream steps to disappear as normal - with the idea that the user can cross-reference the original network for 'what they did'. How the second tree is treated with respect to 'failed' steps could also be altered as above.
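
The second option above (keep the downstream steps and turn them red instead of letting them disappear) can be sketched as a walk over the analysis network. This is a toy model, not Qiita's data model: the `children` mapping, the status strings, and the step names are assumptions made for the example.

```python
from collections import deque


def mark_downstream(children, statuses, failed_step):
    """Return a copy of statuses with failed_step and all its descendants flagged."""
    updated = dict(statuses)
    updated[failed_step] = "error"
    queue = deque(children.get(failed_step, []))
    while queue:
        step = queue.popleft()
        if updated.get(step) == "blocked":
            continue  # already visited through another parent
        updated[step] = "blocked"  # kept in the network, shown as not runnable
        queue.extend(children.get(step, []))
    return updated


# Hypothetical analysis network: each step maps to the steps that consume its output
children = {"biom": ["rarefy"], "rarefy": ["alpha", "beta"], "beta": ["pcoa"]}
statuses = {step: "success" for step in ["biom", "rarefy", "alpha", "beta", "pcoa"]}
result = mark_downstream(children, statuses, "beta")
```

Here only `beta` and its descendant `pcoa` change status; the sibling `alpha` and everything upstream stay as they were, so the user can still see what was done.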

Justin

-- Justin Shaffer, PhD IRACDA Postdoctoral Fellow Rob Knight Group Department of Pediatrics, School of Medicine University of California, San Diego justinshafferbio.wordpress.com

justinshaffer avatar Apr 02 '21 07:04 justinshaffer

  • yup
  • a command/parameters/metadata check basically means running the commands, or at least traversing all commands and checking all inputs/outputs - a full job by itself. Now, I like the option of "copying" the analysis to create a new one; I think this will make it even easier but will not solve the #3088 feature of keeping the analysis id static. We could start "simple" by copying a full analysis and when the time comes to solve #3088 we can add a checkbox for new/same (copy or not) analysis.

antgonza avatar Apr 02 '21 11:04 antgonza

Hi Antonio,

Yeah that sounds good. One thing that might help with the ID issue in the meantime would be to add a 'Related analyses' list to the analysis page, similar to the 'Analyses included in' currently displayed on the study pages - that way people can track the copies if they have unique IDs.

Justin

justinshaffer avatar Apr 02 '21 16:04 justinshaffer