cwltool icon indicating copy to clipboard operation
cwltool copied to clipboard

17054 custom name

Open lijiayong opened this issue 3 years ago • 8 comments

lijiayong avatar Aug 04 '21 16:08 lijiayong

Codecov Report

Merging #1477 (8393482) into main (462377e) will decrease coverage by 9.79%. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1477      +/-   ##
==========================================
- Coverage   65.82%   56.03%   -9.80%     
==========================================
  Files          89       45      -44     
  Lines       15624     7845    -7779     
  Branches     3935     1975    -1960     
==========================================
- Hits        10285     4396    -5889     
+ Misses       4251     2927    -1324     
+ Partials     1088      522     -566     
Impacted Files Coverage Δ
workflow_job.py 80.00% <0.00%> (-0.50%) :arrow_down:
process.py 69.11% <0.00%> (ø)
cwltool/argparser.py
cwltool/__main__.py
cwltool/stdfsaccess.py
cwltool/udocker.py
cwltool/pathmapper.py
cwltool/__init__.py
cwltool/expression.py
cwltool/validate_js.py
... and 35 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 462377e...8393482. Read the comment docs.

codecov[bot] avatar Aug 04 '21 16:08 codecov[bot]

The use case here is that if you have something like a wide scatter over samples, the name of the step that is visible to the user can incorporate useful information such as the sample id. It is a label for a specific instance of an execution of a step, not the description of the step (which is what id or label represent).

This was specifically requested by a user for a large pharma company who is known to run gigantic workflows.

tetron avatar Aug 04 '21 19:08 tetron

The use case here is that if you have something like a wide scatter over samples, the name of the step that is visible to the user can incorporate useful information such as the sample id. It is a label for a specific instance of an execution of a step, not the description of the step (which is what id or label represent).

Then should it be called a "job name" to better disambiguate? As steps already have an id, label, and description field.

Alternatively, logs could be improved to always include the name of the scattered variable and its current value when scattering. Thus hand-specifying this field would be unnecessary.

mr-c avatar Aug 05 '21 15:08 mr-c

Then should it be called a "job name" to better disambiguate? As steps already have an id, label, and description field.

Yes. That is a good idea. @lijiayong could you rename it "JobName" instead of "StepName" ?

Alternatively, logs could be improved to always include the name of the scattered variable and its current value when scattering. Thus hand-specifying this field would be unnecessary.

Logging the name of the scattered variable doesn't generalize terribly well, we don't know for sure what parameters are most relevant to the user to identify a job, and users may want to do some string munging to display only the most useful part. So I'm pretty users would still ask to be able to control the job names.

tetron avatar Aug 05 '21 16:08 tetron

Alternatively, logs could be improved to always include the name of the scattered variable and its current value when scattering. Thus hand-specifying this field would be unnecessary.

Logging the name of the scattered variable doesn't generalize terribly well, we don't know for sure what parameters are most relevant to the user to identify a job, and users may want to do some string munging to display only the most useful part. So I'm pretty users would still ask to be able to control the job names.

Sure, but we should implement the improved auto naming of scattered jobs as that will enhance the logging in the vast majority of situations.

mr-c avatar Aug 05 '21 16:08 mr-c

Is this intended to be part of a future version of the CWL standards?

Then there should be an issue opened to discuss the syntax.

I'm a bit uncomfortable with the number of cwltool extensions that have yet to be included in the standards.

If this needs to be fast tracked for Arvados and it's customer, then we can merge the supporting parts without the cwltool:JobName and Arvados can implement arv:JobName

mr-c avatar Aug 05 '21 17:08 mr-c

Is this intended to be part of a future version of the CWL standards?

Then there should be an issue opened to discuss the syntax.

I'm a bit uncomfortable with the number of cwltool extensions that have yet to be included in the standards.

I don't understand this, there are precisely 3 cwltool extensions right now, this would only be a 4th one. One is highly experimental (ProcessGenerator) and the others (including this one) could be standards track, but that's a separate discussion. The development strategy in the past was generally to implement something and then iterate?

If this needs to be fast tracked for Arvados and it's customer, then we can merge the supporting parts without the cwltool:JobName and Arvados can implement arv:JobName

I'm not sure I understand, it needs to check for the hint type:

self.step.get_requirement("http://commonwl.org/cwltool#StepName")

Are you suggesting that it would not be added to the extensions yml file, or that the code in cwltool would look for the extension with the arvados namespace instead of the cwltool namespace?

tetron avatar Aug 05 '21 18:08 tetron

I'd prefer there was a callback that Arvados uses for their own arv prefixed extension; and the default without that uses the name of the scattered variable(s) and their current value(s) when scattering.

mr-c avatar Aug 18 '21 16:08 mr-c