workflow-execution-service-schemas icon indicating copy to clipboard operation
workflow-execution-service-schemas copied to clipboard

Remove workflow_attachment

Open geoffjentry opened this issue 6 years ago • 15 comments

Discussed in Hinxton, but looking for general feedback. Last year there was a debate which eventually led to the addition of the workflow_attachment feature, which sits alongside the workflow_url. I'd like to propose removing workflow_attachment and require all workflows to be submitted via workflow_url.

This does remove potentially useful functionality, however my understanding is that workflow_attachment is not typically used and it is definitely not on the golden path (i.e. submiiting a TRS URL, etc). The driving use case would typically be in rapid development mode, and presumably one could just not use a WES endpoint for that.

geoffjentry avatar Apr 30 '19 12:04 geoffjentry

Breaking change would require new round of GA4GH approval process — earmarking for 2.0.

jaeddy avatar May 03 '19 00:05 jaeddy

FYI @wshands

denis-yuen avatar May 03 '19 11:05 denis-yuen

-1 from me, but it could be made an optional feature. In addition to development, it provides a solution for the case where workflow data and scripts are kept somewhere that isn't available to the WES endpoint (perhaps because of authentication). At minimum, we would first need to have a workable solution for passing along credentials to access protected resources via workflow_url.

tetron avatar May 03 '19 13:05 tetron

@patmagee @pditommaso -- any thoughts on this from an implementer standpoint? it seems like it might be too soon to take away workflow_attachment from a WES implementer standpoint, in case its a common mode to pass long workflow descriptor files & dependencies.

ruchim avatar Dec 09 '20 01:12 ruchim

@ruchim I certainly wouldn't be overtly sad to see this feature go. We have never used this field to supply input files for the workflow, however we have been using it (unfortunately) for several applications

  • supplying workflow descriptors, in lieu of an inline json parameter

Something like the following would honestly work work well, and even allow resolution of imports. If we did this we could deprecated the attachment:

{
  "workflow_url": "foo.wdl",
  "descriptors": {
     "foo.wdl": "CONTENT"'
     "fizz/buzz.wdl":"CONTENT"
  }
}
  • supplying options.json. This is a convenience method, only really needed because the engine params are only string,string values , Instead of arbitrary strings
  • supply tokens in a map for file localization. This has been deprecated and we will be using something closer to token exchange with DRS in the future

My ultimate dream would be for the WES request to be represented as a single JSON Post request. Why? Because the run log has a single JSON request property that is "supposed" to represent what the user submitted, but it in no way is able to achieve that with workflow attachments. Once you add an attachment, its completely lost as far as the spec is concerned

patmagee avatar Dec 09 '20 02:12 patmagee

I had assumed the reason attachment existed was so one doesn't have a huge RunRequest with a bunch of descriptors squeezed in (because at some point the sheer size of the request could be too big?) -- though I'm not sure how often that has been true for most. So I'm not sure if the idea has already been discussed to insert imported descriptors directly into the workflow request. The alternative is what Jeff suggested earlier -- descriptor files hosted and accessed through a URL -- like a TRS URL.

ruchim avatar Dec 15 '20 00:12 ruchim

@ruchim The motivating use case came from our group as WDL and/or Cromwell at the time could not handle HTTP based imports so one needed to pass in all of the imports (remember the zip bundle?). At the Hinxton meetings we discussed removing it as at the time to my knowledge we were the only ones who wanted it in the first place, but as you see above others had come around to enjoying its presence

geoffjentry avatar Dec 15 '20 02:12 geoffjentry

The problem with this requirement is that hardly the backend (implementing the point) has access to the storage used by the process (read container) that will run the workflow, especially when using a cloud-native environment. IMO the complexity it adds it's not balanced by the advantages it provides, also considering all workflow engine should be able to pull the workflow assets from a URL.

pditommaso avatar Dec 15 '20 15:12 pditommaso

Backend-end access to the docker's process shared-storage can be quite easy and fast with compressed rsync/mirrordir/etc and bind mounts, especially since it is a flow where the delta file-changes are not on average 100% (different from the original), and are also temporally stretched out.

pgrosu avatar Dec 15 '20 17:12 pgrosu

IIRC the intention was that if you had a workflow that required multiple files and the WES client couldn't host the files itself and didn't have a convenient place to upload those files, then you use the attachment to upload a file bundle. This was also to avoid the requirement for DRS (which wasn't standardized at the time) or teaching a WES client separately how to upload files via N different protocols.

An alternate solution would be if the WES endpoint published some broadly-supported endpoint for the client to upload its files, and there was a WES client that supported uploading files via N different protocols.

tetron avatar Dec 15 '20 17:12 tetron

I feel like we are getting to a point of API maturity across the GA4GH board where we can start to rely on the other standards at least to some extent. I would shy away from a universal endpoint to upload files... In my experience file uploading is better left to dedicated services designed to handle the throughput.

In this case we could probably piggy back off of DRS, and add a priorty to focus on solving the core issues around how a user can access authenticated DRS records.

patmagee avatar Jan 04 '21 19:01 patmagee

To summarize a bit from what I'm reading above -- the intention was to solve the problem for providing many descriptor files, though we ended up using it as a way to provide info to the engine about file uploads.

I agree @patmagee that with DRS getting more mature and more drivers -- we'd want to rely on files being uploaded to a DRS service or a raw cloud bucket (but the questions around managing credentials needs more discussion on another ticket: )

I certainly have a bias here -- the advantage of having descriptors being accessible through URLs forces better integration with TRS implementations (Dockstore) and Github -- which are the two most common ways I've seen descriptor files stored. And both those systems allow for excellent provenance/history of updates, versioning/tagging and documentation (these advantages is where my bias kicks in). I think if we can clearly make a case for why URLs add value to a WES user, and that the burden of converting URLs to raw text is an acceptable one in sight of the user gains -- would WES implementations endorse & adopt this change?

ruchim avatar Jan 07 '21 13:01 ruchim

Reopening this can of worms, and looking for some implementor feedback here (@uniqueg, @wleepang, @vsmalladi, and any one else)

patmagee avatar Sep 22 '22 09:09 patmagee

Agree with @tetron here to the extent that I feel we shouldn't deprecate it without having a solution for the use case he described (and perhaps others).

Also, workflow attachments were easy for us to implement and we never ran into issues of having too many or too large files in practice.

That being said, I'm all for explicitly allowing, recommending or even mandating WES implementations to support (versioned) TRS URIs and/or other types of URIs (e.g., GitHub repos).

So, in summary: happy to deprecate if and only if a more elegant and functionally equivalent solution has proven its worth in the real world.

uniqueg avatar Sep 22 '22 12:09 uniqueg

I think we will need attachments. There are reasons that people want to upload particular workflows for testing or local that they don't want upload to a repository (Github). Also can the WES spec work with private git repositories and urls? @patmagee

vsmalladi avatar Sep 22 '22 14:09 vsmalladi