save cwl.input.json to keep collection if large

Open jrandall opened this issue 7 years ago • 4 comments

If cwl.input.json is larger than 1MB, save it to Keep rather than inlining it in the container_request mounts entry.

jrandall avatar Nov 14 '18 15:11 jrandall

This is a good idea, although Workbench won't know how to show the workflow inputs (they are very large, so that's probably okay). Changing the literal form from "json" to "text" is also likely to cause regressions.

tetron avatar Nov 14 '18 21:11 tetron

What's the status of this PR?

mr-c avatar Sep 24 '19 08:09 mr-c

I think the status is that it potentially breaks the ability to display workflow inputs in Workbench, so in order to accept the fix it would need to be paired with an update to Workbench (or at least some investigation of the side effects of this change on Workbench).

tetron avatar Sep 24 '19 13:09 tetron

@tetron I've lost track of exactly where this ended up, but it looks like I did implement some fixes on the Workbench side to support "kind":"text". It also looks like I did not PR them, and they may have bitrotted by now: https://github.com/wtsi-hgi/arvados/commit/bf61e04cae19ac003af8b9734a42c622d860c281

Also see: https://dev.arvados.org/issues/13685

My recollection is that there are two related performance fixes for handling CWL workflows with large numbers of inputs. This PR touches both of them.

One is to implement "kind": "text" as an alternative to "kind": "json", so that various parts of the code stop repeatedly parsing the (large and complex) inputs as JSON and just treat them as opaque text instead. Even JSON as small as 1MB can, for example, make Workbench very slow as it processes the JSON and then renders DOM elements to display every element it contains. This did not only affect Workbench: other parts of the system also became much faster with this change when handling large numbers of container requests with very large inputs. I believe that includes the API server.

The other fix is to store very large content (defined here as >1MB, although that should probably have a config knob) in Keep rather than inlining it into the mounts structure (which then gets passed around and repeatedly parsed as part of the container / container_request objects). This further speeds up processing of containers / container requests because they are much smaller.
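
To make the size-based decision concrete, here is a minimal sketch in Python. It is not the code from the PR. `build_input_mount`, `save_to_keep`, and `INLINE_LIMIT_BYTES` are hypothetical names, the 1MB threshold is read as 1 MiB of UTF-8 bytes, and the shape of the returned mount entries is only illustrative of the idea, so check it against the actual mounts documentation before relying on it.

```python
import json
from typing import Any, Callable, Dict

# Assumption: "1MB" from the discussion, interpreted as 1 MiB of UTF-8 bytes.
# The thread suggests this should eventually be a configuration knob.
INLINE_LIMIT_BYTES = 1024 * 1024


def build_input_mount(
    job_order: Dict[str, Any],
    save_to_keep: Callable[[str, bytes], str],
    limit: int = INLINE_LIMIT_BYTES,
) -> Dict[str, Any]:
    """Return a mount entry for cwl.input.json (hypothetical helper).

    save_to_keep is a caller-supplied function (an assumption of this
    sketch) that stores the bytes under the given file name and returns
    an identifier for the resulting collection.
    """
    data = json.dumps(job_order, sort_keys=True).encode("utf-8")

    if len(data) <= limit:
        # Small input: inline it in the mounts structure as before.
        return {"kind": "json", "content": job_order}

    # Large input: store it once and reference it, so the container
    # request itself stays small.
    collection_id = save_to_keep("cwl.input.json", data)
    return {
        "kind": "collection",
        "portable_data_hash": collection_id,
        "path": "cwl.input.json",
    }
```

Passing the storage step in as a function keeps the sketch independent of any particular client library; in practice it would be whatever the runner already uses to write a collection.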

jrandall avatar Sep 24 '19 14:09 jrandall