galaxy icon indicating copy to clipboard operation
galaxy copied to clipboard

Add author and tools details in RO-Crate

Open Marie59 opened this issue 1 year ago • 3 comments

Here we go again ! I am launching this PR on a bit of work I did to ameliorate the RO-Crate plugin.

I updated the _add_worfklow function. I added the information on the author of the workflow:

  • Name
  • Orcid The update checks if the workflow attribute 'creator_metadata' exist. If it exists, it writes into the ro-crate the name of the creator (if there is no name it leaves a "") same for the idenifier (by default i put the ORCID maybe it's not a good Idea and if there is no Orcid it leaves a ""). If there is no 'creator_metadata' nothing is written on the creator.

I created a function _add_tools that checks the different info in workflow.steps to add some tools details:

  • Id of the tool
  • tool's name (label if there is one or id if there is no label)
  • tools' version
  • tool's description: the update checks if the tool has an annotations attribute if it exist it is written as the description if it does not the description remains "")
  • url to the the toolshed (not sure this part is relevant as I did not yet solve how to have the exact url to the tool in the toolshed and I just let the global url to the toolshed)

I am not sure if I wrote that the correct way corresponding to the run crate profile (this is a bit obscure to me)

@pauldg What do you think of this ? Thanks !!

Marie59 avatar Sep 16 '24 17:09 Marie59

Thanks a lot for the review and help @mvdbeek !

Marie59 avatar Sep 18 '24 07:09 Marie59

@OliverWoolland could you review this? I think this solves many of the issues you had about the Workflow Run Crate from Galaxy

stain avatar Sep 19 '24 08:09 stain

TO DO in a 2nd time (maybe in a later PR but I'm just putting that here to keep it in mind):

  • [ ] Add xrefs of tools
  • [ ] Add citations of tools
  • [ ] Add edam ontologies of tools

Marie59 avatar Sep 19 '24 09:09 Marie59

Quick update from the RO-Crate side (with apologies from the long delay) - the metadata that's being added looks fine to me as I read the code, but I would like to look at an example RO-Crate generated using this PR to make sure the JSON-LD output looks like it should. I'm waiting for @pauldg or @OliverWoolland to provide me with such a crate, whoever gets there first

elichad avatar Nov 12 '24 14:11 elichad

Hi all, I've been trying to test these changes on a local 24.1.3 instance I have (base commit 71f5048e52502b6506c6eb4f7aef16dc3dccd15f)

I've not been able to get the revisions to run for me, with errors being thrown relating to the use of the toolbox.

I think I've tracked the problem down to the origin of this PR being usegalaxy-eu rather than galaxyproject. The two forks seem to be handling the toolbox a little differently.

I'd be curious if others have found the same? Or if there is an obvious solution?

Apologies if I have misunderstood anything!

OliverWoolland avatar Nov 15 '24 08:11 OliverWoolland

@OliverWoolland what kind of errors are you seeing? There should only be minor differences on .eu compared to the upstream (https://github.com/galaxyproject/galaxy/compare/release_24.1...usegalaxy-eu:galaxy:release_24.1_europe) Plus the tests are running fine in CI, and those are run against the merge of this to galaxyproject/galaxy afaik.

martenson avatar Nov 15 '24 10:11 martenson

Thanks @martenson! This is the error I see

[2024-11-15 10:43:15,368: ERROR/main] Task galaxy.prepare_invocation_download[2d38756c-b893-4a20-8ca7-80e243608cc4] raised unexpected: AttributeError("'GalaxyManagerApplication' object has no attribute '_toolbox'")
Traceback (most recent call last):
  File "/galaxy/server/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 453, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/galaxy/server/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 736, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/galaxy/server/lib/galaxy/celery/__init__.py", line 184, in wrapper
    rval = app.magic_partial(func)(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/galaxy/server/.venv/lib/python3.12/site-packages/lagom/wrapping.py", line 28, in _bound_func
    return inner_func(*bound_args, **bound_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/galaxy/server/.venv/lib/python3.12/site-packages/lagom/wrapping.py", line 45, in _error_handling_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/galaxy/server/lib/galaxy/celery/tasks.py", line 392, in prepare_invocation_download
    model_store_manager.prepare_invocation_download(request)
  File "/galaxy/server/lib/galaxy/managers/model_stores.py", line 156, in prepare_invocation_download
    with model.store.get_export_store_factory(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/galaxy/server/lib/galaxy/model/store/__init__.py", line 2481, in __exit__
    self._finalize()
  File "/galaxy/server/lib/galaxy/model/store/__init__.py", line 2850, in _finalize
    ro_crate = self._init_crate()
               ^^^^^^^^^^^^^^^^^^
  File "/galaxy/server/lib/galaxy/model/store/__init__.py", line 2518, in _init_crate
    invocation_crate_builder = WorkflowRunCrateProfileBuilder(self)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/galaxy/server/lib/galaxy/model/store/ro_crate_utils.py", line 49, in __init__
    self.toolbox = self.model_store.app.toolbox
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/galaxy/server/lib/galaxy/app.py", line 373, in toolbox
    return self._toolbox
           ^^^^^^^^^^^^^
AttributeError: 'GalaxyManagerApplication' object has no attribute '_toolbox'. Did you mean: 'toolbox'?

OliverWoolland avatar Nov 15 '24 10:11 OliverWoolland

I'm not sure if that is the issue, but note that these changes are on the dev branch not release_24.1...

davelopez avatar Nov 15 '24 10:11 davelopez

Fair enough! I'll try and cross-check myself

OliverWoolland avatar Nov 15 '24 11:11 OliverWoolland

I don't have a local development Galaxy set up already, so I cloned Galaxy dev, checked out this PR, then installed Galaxy from scratch. When I created a simple workflow, invoked it, and tried to export as RO-Crate, I got the same error as @OliverWoolland above.

[2024-11-21 13:14:58,757: ERROR/main] Task galaxy.prepare_invocation_download[a4bab0c4-8c68-422a-bff2-e2a51b5c44a6] raised unexpected: AttributeError("'GalaxyManagerApplication' object has no attribute '_toolbox'")
Traceback (most recent call last):
  File "/home/eli/galaxy/galaxy/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 453, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/eli/galaxy/galaxy/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 736, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eli/galaxy/galaxy/lib/galaxy/celery/__init__.py", line 184, in wrapper
    rval = app.magic_partial(func)(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eli/galaxy/galaxy/.venv/lib/python3.12/site-packages/lagom/wrapping.py", line 28, in _bound_func
    return inner_func(*bound_args, **bound_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eli/galaxy/galaxy/.venv/lib/python3.12/site-packages/lagom/wrapping.py", line 45, in _error_handling_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/eli/galaxy/galaxy/lib/galaxy/celery/tasks.py", line 390, in prepare_invocation_download
    model_store_manager.prepare_invocation_download(request)
  File "/home/eli/galaxy/galaxy/lib/galaxy/managers/model_stores.py", line 156, in prepare_invocation_download
    with model.store.get_export_store_factory(
  File "/home/eli/galaxy/galaxy/lib/galaxy/model/store/__init__.py", line 2481, in __exit__
    self._finalize()
  File "/home/eli/galaxy/galaxy/lib/galaxy/model/store/__init__.py", line 2850, in _finalize
    ro_crate = self._init_crate()
               ^^^^^^^^^^^^^^^^^^
  File "/home/eli/galaxy/galaxy/lib/galaxy/model/store/__init__.py", line 2518, in _init_crate
    invocation_crate_builder = WorkflowRunCrateProfileBuilder(self)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eli/galaxy/galaxy/lib/galaxy/model/store/ro_crate_utils.py", line 48, in __init__
    self.toolbox = self.model_store.app.toolbox
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eli/galaxy/galaxy/lib/galaxy/app.py", line 397, in toolbox
    return self._toolbox
           ^^^^^^^^^^^^^
AttributeError: 'GalaxyManagerApplication' object has no attribute '_toolbox'. Did you mean: 'toolbox'?

I checked out galaxyproject:dev and the export works as expected.

Finally, I merged Marie59:ro-crate2 into galaxyproject:dev on a local branch and the error appeared again.

So the error appears to be somewhere in this PR, but I don't have the expertise required to understand or fix it.

It's visible in CI too - https://github.com/galaxyproject/galaxy/actions/runs/11610465172/job/32407941424?pr=18820 has some 500 errors which I think come from this problem.

elichad avatar Nov 21 '24 13:11 elichad

I had a closer look and this is what I found:

The export process happens in the context of a Celery task - a background task that runs out of the main galaxy process. The "galaxy app" in this context is a GalaxyManagerApplication. This is a "reduced" version of the UniverseApplication that strips out some functionality and services not required outside the main Galaxy process).

Here we can see that the toolbox is set up only in the UniverseApplication (full-fledged Galaxy app):

https://github.com/galaxyproject/galaxy/blob/455da5c0bb543bb5f3c8ba7869732e0c13217a26/lib/galaxy/app.py#L775

Moving this call (along with possibly other toolbox-related calls) to the parent class GalaxyManagerApplication will probably fix it but I don't know if the toolbox was explicitly left out of the GalaxyManagerApplication for strong reasons (like performance, etc.) or if it can be safely set up at that level to make it available in the Celery context.

Maybe someone else from the @galaxyproject/wg-backend can answer this question?

Edit: lol, Marius was answering at the same time here https://github.com/galaxyproject/galaxy/pull/18820#discussion_r1852257210 so, that would be the answer.

davelopez avatar Nov 21 '24 14:11 davelopez

Hello all, Thanks for all your feedback and reviews ! So if I understand correctly, for now, I leave this PR on the side 😢 (hopefully will be useful one day) I can then open a PR for the things added that does not rely on the toolbox. Does this sounds good ? Did I got everything right ?

Thanks again for all your help !!

Marie59 avatar Nov 22 '24 14:11 Marie59

Yes, or drop the things which require tool access from this PR.

mvdbeek avatar Nov 22 '24 15:11 mvdbeek

@Marie59 please don't be discouraged, this is good work on the whole and I'm happy to see the RO-Crate outputs being improved :)

As I see it, only a small amount of lines need to be removed from this PR to take out the toolbox-dependent bits (namely the _get_tool_metadata function and the few lines that call it or its output), so that would be the easiest thing to do.

If you decide to make a new PR instead, so that the toolbox code is still visible here if someone wants to find it again, that's fine as well. In that case, please tag me in the new one so I can keep track of it!

elichad avatar Nov 22 '24 16:11 elichad

@Marie59 please don't be discouraged, this is good work on the whole and I'm happy to see the RO-Crate outputs being improved :)

As I see it, only a small amount of lines need to be removed from this PR to take out the toolbox-dependent bits (namely the _get_tool_metadata function and the few lines that call it or its output), so that would be the easiest thing to do.

If you decide to make a new PR instead, so that the toolbox code is still visible here if someone wants to find it again, that's fine as well. In that case, please tag me in the new one so I can keep track of it!

Not discouraged no worries ! Thanks for the encouragements ! I'll remove what need to be removed here (and keep a back up).

Marie59 avatar Nov 23 '24 18:11 Marie59

Hello all ! So, normally I removed what needed to be removed. Locally I succeed to generate a Ro-crate with my galaxy test instance. I tried to add some tests (they are working too). Let me know if something is not good and if I can improve anything.

Marie59 avatar Nov 25 '24 14:11 Marie59

Can you resolve the conflict in test/unit/data/model/test_model_store.py ?

mvdbeek avatar Nov 25 '24 14:11 mvdbeek

I don't understand the last errors I get ...

Marie59 avatar Nov 27 '24 08:11 Marie59

I don't understand the last errors I get ...

diff --git a/lib/galaxy/model/__init__.py b/lib/galaxy/model/__init__.py
index bd5d59384e..385969d846 100644
--- a/lib/galaxy/model/__init__.py
+++ b/lib/galaxy/model/__init__.py
@@ -7893,7 +7893,7 @@ class Workflow(Base, Dictifiable, RepresentById):
     has_cycles: Mapped[Optional[bool]]
     has_errors: Mapped[Optional[bool]]
     reports_config: Mapped[Optional[bytes]] = mapped_column(JSONType)
-    creator_metadata: Mapped[Optional[bytes]] = mapped_column(JSONType)
+    creator_metadata: Mapped[Optional[List[Dict[str, Any]]]] = mapped_column(JSONType)
     license: Mapped[Optional[str]] = mapped_column(TEXT)
     source_metadata: Mapped[Optional[Dict[str, str]]] = mapped_column(JSONType)
     uuid: Mapped[Optional[Union[UUID, str]]] = mapped_column(UUIDType)

should fix the error: "int" has no attribute "get" [attr-defined] mypy errors.

nsoranzo avatar Nov 27 '24 10:11 nsoranzo

@nsoranzo thank you for the help ! I was a bit lost !!

Marie59 avatar Nov 27 '24 15:11 Marie59

I took a look at the output JSON-LD and I've made some suggestions based on the Provenance Run Crate profile - Galaxy RO-Crates don't strictly conform to this, but it offers some best practices that are applicable here.

From a Workflow Run Crate profile perspective (the less-strict one we actually intend to conform to), the metadata added in this PR looks good.

Thanks a lot for the review ! I tried to add as much as I could your suggestions !

I don't understand the last comment though on formalParameters what should I do ?

Is everything else good ?

Marie59 avatar Dec 23 '24 09:12 Marie59

So can we merged this ? or is there some improvements to be done still ?

Marie59 avatar Jan 15 '25 07:01 Marie59

Thank you so much @Marie59! And sorry for the delay.

I will rebase the branch with the current dev changes since there seem to be some conflicts due to the base branch being based on a pretty old version of dev, but somehow GitHub doesn't detect those conflicts :sweat_smile:

Everything else looks good :+1:

davelopez avatar Jan 30 '25 15:01 davelopez