Add author and tools details in RO-Crate
Here we go again ! I am launching this PR on a bit of work I did to ameliorate the RO-Crate plugin.
I updated the _add_worfklow function. I added the information on the author of the workflow:
- Name
- Orcid The update checks if the workflow attribute 'creator_metadata' exist. If it exists, it writes into the ro-crate the name of the creator (if there is no name it leaves a "") same for the idenifier (by default i put the ORCID maybe it's not a good Idea and if there is no Orcid it leaves a ""). If there is no 'creator_metadata' nothing is written on the creator.
I created a function _add_tools that checks the different info in workflow.steps to add some tools details:
- Id of the tool
- tool's name (label if there is one or id if there is no label)
- tools' version
- tool's description: the update checks if the tool has an annotations attribute if it exist it is written as the description if it does not the description remains "")
- url to the the toolshed (not sure this part is relevant as I did not yet solve how to have the exact url to the tool in the toolshed and I just let the global url to the toolshed)
I am not sure if I wrote that the correct way corresponding to the run crate profile (this is a bit obscure to me)
@pauldg What do you think of this ? Thanks !!
Thanks a lot for the review and help @mvdbeek !
@OliverWoolland could you review this? I think this solves many of the issues you had about the Workflow Run Crate from Galaxy
TO DO in a 2nd time (maybe in a later PR but I'm just putting that here to keep it in mind):
- [ ] Add xrefs of tools
- [ ] Add citations of tools
- [ ] Add edam ontologies of tools
Quick update from the RO-Crate side (with apologies from the long delay) - the metadata that's being added looks fine to me as I read the code, but I would like to look at an example RO-Crate generated using this PR to make sure the JSON-LD output looks like it should. I'm waiting for @pauldg or @OliverWoolland to provide me with such a crate, whoever gets there first
Hi all, I've been trying to test these changes on a local 24.1.3 instance I have (base commit 71f5048e52502b6506c6eb4f7aef16dc3dccd15f)
I've not been able to get the revisions to run for me, with errors being thrown relating to the use of the toolbox.
I think I've tracked the problem down to the origin of this PR being usegalaxy-eu rather than galaxyproject. The two forks seem to be handling the toolbox a little differently.
I'd be curious if others have found the same? Or if there is an obvious solution?
Apologies if I have misunderstood anything!
@OliverWoolland what kind of errors are you seeing? There should only be minor differences on .eu compared to the upstream (https://github.com/galaxyproject/galaxy/compare/release_24.1...usegalaxy-eu:galaxy:release_24.1_europe) Plus the tests are running fine in CI, and those are run against the merge of this to galaxyproject/galaxy afaik.
Thanks @martenson! This is the error I see
[2024-11-15 10:43:15,368: ERROR/main] Task galaxy.prepare_invocation_download[2d38756c-b893-4a20-8ca7-80e243608cc4] raised unexpected: AttributeError("'GalaxyManagerApplication' object has no attribute '_toolbox'")
Traceback (most recent call last):
File "/galaxy/server/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 453, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 736, in __protected_call__
return self.run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/celery/__init__.py", line 184, in wrapper
rval = app.magic_partial(func)(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/.venv/lib/python3.12/site-packages/lagom/wrapping.py", line 28, in _bound_func
return inner_func(*bound_args, **bound_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/.venv/lib/python3.12/site-packages/lagom/wrapping.py", line 45, in _error_handling_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/celery/tasks.py", line 392, in prepare_invocation_download
model_store_manager.prepare_invocation_download(request)
File "/galaxy/server/lib/galaxy/managers/model_stores.py", line 156, in prepare_invocation_download
with model.store.get_export_store_factory(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/model/store/__init__.py", line 2481, in __exit__
self._finalize()
File "/galaxy/server/lib/galaxy/model/store/__init__.py", line 2850, in _finalize
ro_crate = self._init_crate()
^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/model/store/__init__.py", line 2518, in _init_crate
invocation_crate_builder = WorkflowRunCrateProfileBuilder(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/model/store/ro_crate_utils.py", line 49, in __init__
self.toolbox = self.model_store.app.toolbox
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/app.py", line 373, in toolbox
return self._toolbox
^^^^^^^^^^^^^
AttributeError: 'GalaxyManagerApplication' object has no attribute '_toolbox'. Did you mean: 'toolbox'?
I'm not sure if that is the issue, but note that these changes are on the dev branch not release_24.1...
Fair enough! I'll try and cross-check myself
I don't have a local development Galaxy set up already, so I cloned Galaxy dev, checked out this PR, then installed Galaxy from scratch. When I created a simple workflow, invoked it, and tried to export as RO-Crate, I got the same error as @OliverWoolland above.
[2024-11-21 13:14:58,757: ERROR/main] Task galaxy.prepare_invocation_download[a4bab0c4-8c68-422a-bff2-e2a51b5c44a6] raised unexpected: AttributeError("'GalaxyManagerApplication' object has no attribute '_toolbox'")
Traceback (most recent call last):
File "/home/eli/galaxy/galaxy/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 453, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/home/eli/galaxy/galaxy/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 736, in __protected_call__
return self.run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/eli/galaxy/galaxy/lib/galaxy/celery/__init__.py", line 184, in wrapper
rval = app.magic_partial(func)(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/eli/galaxy/galaxy/.venv/lib/python3.12/site-packages/lagom/wrapping.py", line 28, in _bound_func
return inner_func(*bound_args, **bound_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/eli/galaxy/galaxy/.venv/lib/python3.12/site-packages/lagom/wrapping.py", line 45, in _error_handling_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/eli/galaxy/galaxy/lib/galaxy/celery/tasks.py", line 390, in prepare_invocation_download
model_store_manager.prepare_invocation_download(request)
File "/home/eli/galaxy/galaxy/lib/galaxy/managers/model_stores.py", line 156, in prepare_invocation_download
with model.store.get_export_store_factory(
File "/home/eli/galaxy/galaxy/lib/galaxy/model/store/__init__.py", line 2481, in __exit__
self._finalize()
File "/home/eli/galaxy/galaxy/lib/galaxy/model/store/__init__.py", line 2850, in _finalize
ro_crate = self._init_crate()
^^^^^^^^^^^^^^^^^^
File "/home/eli/galaxy/galaxy/lib/galaxy/model/store/__init__.py", line 2518, in _init_crate
invocation_crate_builder = WorkflowRunCrateProfileBuilder(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/eli/galaxy/galaxy/lib/galaxy/model/store/ro_crate_utils.py", line 48, in __init__
self.toolbox = self.model_store.app.toolbox
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/eli/galaxy/galaxy/lib/galaxy/app.py", line 397, in toolbox
return self._toolbox
^^^^^^^^^^^^^
AttributeError: 'GalaxyManagerApplication' object has no attribute '_toolbox'. Did you mean: 'toolbox'?
I checked out galaxyproject:dev and the export works as expected.
Finally, I merged Marie59:ro-crate2 into galaxyproject:dev on a local branch and the error appeared again.
So the error appears to be somewhere in this PR, but I don't have the expertise required to understand or fix it.
It's visible in CI too - https://github.com/galaxyproject/galaxy/actions/runs/11610465172/job/32407941424?pr=18820 has some 500 errors which I think come from this problem.
I had a closer look and this is what I found:
The export process happens in the context of a Celery task - a background task that runs out of the main galaxy process. The "galaxy app" in this context is a GalaxyManagerApplication. This is a "reduced" version of the UniverseApplication that strips out some functionality and services not required outside the main Galaxy process).
Here we can see that the toolbox is set up only in the UniverseApplication (full-fledged Galaxy app):
https://github.com/galaxyproject/galaxy/blob/455da5c0bb543bb5f3c8ba7869732e0c13217a26/lib/galaxy/app.py#L775
Moving this call (along with possibly other toolbox-related calls) to the parent class GalaxyManagerApplication will probably fix it but I don't know if the toolbox was explicitly left out of the GalaxyManagerApplication for strong reasons (like performance, etc.) or if it can be safely set up at that level to make it available in the Celery context.
Maybe someone else from the @galaxyproject/wg-backend can answer this question?
Edit: lol, Marius was answering at the same time here https://github.com/galaxyproject/galaxy/pull/18820#discussion_r1852257210 so, that would be the answer.
Hello all, Thanks for all your feedback and reviews ! So if I understand correctly, for now, I leave this PR on the side 😢 (hopefully will be useful one day) I can then open a PR for the things added that does not rely on the toolbox. Does this sounds good ? Did I got everything right ?
Thanks again for all your help !!
Yes, or drop the things which require tool access from this PR.
@Marie59 please don't be discouraged, this is good work on the whole and I'm happy to see the RO-Crate outputs being improved :)
As I see it, only a small amount of lines need to be removed from this PR to take out the toolbox-dependent bits (namely the _get_tool_metadata function and the few lines that call it or its output), so that would be the easiest thing to do.
If you decide to make a new PR instead, so that the toolbox code is still visible here if someone wants to find it again, that's fine as well. In that case, please tag me in the new one so I can keep track of it!
@Marie59 please don't be discouraged, this is good work on the whole and I'm happy to see the RO-Crate outputs being improved :)
As I see it, only a small amount of lines need to be removed from this PR to take out the toolbox-dependent bits (namely the
_get_tool_metadatafunction and the few lines that call it or its output), so that would be the easiest thing to do.If you decide to make a new PR instead, so that the toolbox code is still visible here if someone wants to find it again, that's fine as well. In that case, please tag me in the new one so I can keep track of it!
Not discouraged no worries ! Thanks for the encouragements ! I'll remove what need to be removed here (and keep a back up).
Hello all ! So, normally I removed what needed to be removed. Locally I succeed to generate a Ro-crate with my galaxy test instance. I tried to add some tests (they are working too). Let me know if something is not good and if I can improve anything.
Can you resolve the conflict in test/unit/data/model/test_model_store.py ?
I don't understand the last errors I get ...
I don't understand the last errors I get ...
diff --git a/lib/galaxy/model/__init__.py b/lib/galaxy/model/__init__.py
index bd5d59384e..385969d846 100644
--- a/lib/galaxy/model/__init__.py
+++ b/lib/galaxy/model/__init__.py
@@ -7893,7 +7893,7 @@ class Workflow(Base, Dictifiable, RepresentById):
has_cycles: Mapped[Optional[bool]]
has_errors: Mapped[Optional[bool]]
reports_config: Mapped[Optional[bytes]] = mapped_column(JSONType)
- creator_metadata: Mapped[Optional[bytes]] = mapped_column(JSONType)
+ creator_metadata: Mapped[Optional[List[Dict[str, Any]]]] = mapped_column(JSONType)
license: Mapped[Optional[str]] = mapped_column(TEXT)
source_metadata: Mapped[Optional[Dict[str, str]]] = mapped_column(JSONType)
uuid: Mapped[Optional[Union[UUID, str]]] = mapped_column(UUIDType)
should fix the error: "int" has no attribute "get" [attr-defined] mypy errors.
@nsoranzo thank you for the help ! I was a bit lost !!
I took a look at the output JSON-LD and I've made some suggestions based on the Provenance Run Crate profile - Galaxy RO-Crates don't strictly conform to this, but it offers some best practices that are applicable here.
From a Workflow Run Crate profile perspective (the less-strict one we actually intend to conform to), the metadata added in this PR looks good.
Thanks a lot for the review ! I tried to add as much as I could your suggestions !
I don't understand the last comment though on formalParameters what should I do ?
Is everything else good ?
So can we merged this ? or is there some improvements to be done still ?
Thank you so much @Marie59! And sorry for the delay.
I will rebase the branch with the current dev changes since there seem to be some conflicts due to the base branch being based on a pretty old version of dev, but somehow GitHub doesn't detect those conflicts :sweat_smile:
Everything else looks good :+1: