pyam

track provenance and the operations recipe

Open danielhuppmann opened this issue 4 years ago • 5 comments

At a workshop of IPCC TG-Data this week, @aspinuso presented on "Data-Intensive and Reproducible Science", using a pyam tutorial as an example for a workflow with detailed provenance tracking: https://github.com/aspinuso/pyam-binder/blob/master/pyam.ipynb

Looking at the more advanced workflows being discussed for AR6, it might be useful to include some basic support or integration for dispel4py or another package in pyam.

danielhuppmann avatar Nov 09 '19 08:11 danielhuppmann

@gidden @znicholls @Rlamboll, what do you think?

danielhuppmann avatar Nov 09 '19 08:11 danielhuppmann

Looks very cool. Looks way more complex than we can include and test in our first AR6 workflow draft though haha. If we just want it for reproducibility, I think I'd prefer to make the iiasa-climate-assessment public instead (with appropriate tags) as reproducing from the output of dispel4py looks non-trivial.

znicholls avatar Nov 09 '19 11:11 znicholls

Dear all

Thanks for the interest.

Just as a clarification, the most up-to-date repo of the dispel4py processing library, which includes support for provenance configuration and traceability, is currently developed in the context of the DARE platform and is accessible at:

https://gitlab.com/project-dare/dispel4py

Sorry for the confusion. We are in the process of migrating versions and repositories.

I agree that adopting the whole library could, at the moment, be too complex for basic traceability needs. Usually, libraries are used through dispel4py rather than having it integrated within the library itself. However, it could help with the realisation of generic and more complex traceable workflows, especially when these require different analysis libraries, custom metadata, and larger computational resources. Let me mention @rosafilgueira in this thread, who is one of the main designers and developers of the tool.

Cheers, Alessandro

aspinuso avatar Nov 09 '19 12:11 aspinuso

Hi all - is there a good example to look at of this tool being used in the wild?

gidden avatar Nov 10 '19 09:11 gidden

I think it's important to identify the use cases and the extent of what you want to cover in terms of traceability of the results and the processes involved. If the aim is reproducing and/or tracing (these are two different problems) exclusively what pyam generates, then combining notebooks and Binder repositories with pyam custom lineage should be good enough. Note that if I had used exclusively a pyam-lineage-aware implementation in my Binder example, I would have lost the information about the storage part of the workflow, which includes the location and the ID assigned to the produced image within a repository.

If you want to scale to wider use cases that involve more tools and software libraries, workflow systems are usually a better way to discretise tasks and to describe and trace methods. In that case, lineage usually comes for free. Also have a look at CWL Tool (https://github.com/common-workflow-language/cwltool) and PROV-ONE (http://jenkins-1.dataone.org/jenkins/view/Documentation%20Projects/job/ProvONE-Documentation-trunk/ws/provenance/ProvONE/v1/provone.html) as generic tools and models for workflow and provenance description in that context.
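
To illustrate the point about lineage coming for free once tasks are discretised, here is a minimal Python sketch. It does not use dispel4py, cwltool, or pyam, and it is not how any of them record provenance. `traced`, `LINEAGE`, and the two example tasks are hypothetical names, and the records are only kept in memory.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical in-memory lineage log (assumption: a real workflow
# system would persist records and use a richer model such as PROV).
LINEAGE = []


def _fingerprint(value):
    """Return a short, stable hash of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def traced(func):
    """Wrap a task so that every call appends one lineage record."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        LINEAGE.append(
            {
                "task": func.__name__,
                # One fingerprint per argument, so that an output of one
                # task can be matched to an input of the next.
                "inputs": [_fingerprint(a) for a in args]
                + [_fingerprint(v) for _, v in sorted(kwargs.items())],
                "output": _fingerprint(result),
                "finished_at": datetime.now(timezone.utc).isoformat(),
            }
        )
        return result

    return wrapper


@traced
def filter_positive(values):
    """Example task: keep only the positive numbers."""
    return [v for v in values if v > 0]


@traced
def total(values):
    """Example task: sum the remaining numbers."""
    return sum(values)


result = total(filter_positive([3, -1, 4]))

for record in LINEAGE:
    print(record["task"], record["inputs"], "->", record["output"])
```

Because each task is a separate, wrapped step, the output fingerprint of `filter_positive` matches the input fingerprint of `total`, which is what lets the chain be reconstructed afterwards. Steps outside the wrapped tasks (for example, storing an image in a repository) would still need their own records.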

aspinuso avatar Nov 13 '19 14:11 aspinuso