nextflow
Print process dictionary in logs (#5940)
What
Update the logs in the following manner:
- Replace the workflow process name list, which mixes defined process names and aliased process names:
  ```
  [main] DEBUG nextflow.Session - Workflow process names [dsl2]: a
  ```
  with two lists: one of the process definitions from each file, and one of the aliases used for each process:
  ```
  [main] DEBUG nextflow.Session - Workflow process definitions [dsl2]: main.nf [a, other_process, ...], sub.nf [a, ...]
  [main] DEBUG nextflow.Session - Workflow resolved process names: a[main.nf:a], sub:a[sub.nf:a], x[sub.nf:a]
  ```
- Complement the following log line:
  ```
  [main] DEBUG nextflow.processor.TaskProcessor - Starting process > x
  ```
  with the typed list of inputs and outputs of the process:
  ```
  Starting process > x (type:ArgName, ..., default:$) -> (type:OutName, ...)
  ```
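For illustration, a hypothetical two-file layout along these lines (the file names, process bodies, and the alias `x` are assumptions chosen to mirror the log excerpts above, not code from this PR) would yield separate definition entries for each file plus a resolved-name entry per alias:

```nextflow
// sub.nf — hypothetical sketch
process a {
    input:
    val sample
    output:
    stdout
    script:
    "echo $sample"
}

// main.nf — hypothetical sketch: defines its own `a` and
// aliases sub.nf's `a` as `x`
include { a as x } from './sub.nf'   // resolved as x[sub.nf:a]

process a {                          // resolved as a[main.nf:a]
    script:
    'true'
}
```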
Motivations
- Makes the process information printed in the logs more complete. Human parsing of Nextflow files is no longer needed to identify where the code of a process originates from.
- Corrects the ambiguous merging of "conflicting" resolved names in the currently printed list: see #5940 for more details.
Oops... I thought I'd been very cautious not to alter any existing behavior when modifying the code, but the unit tests seem broken.
After a quick look, I think one issue is that in the ProcessDef unit test, a ProcessDef instance is cloned without originating from a script already registered in ScriptMeta. Because of this, the cloneWithName() function I implemented fails. Of course, this scenario never occurred in the real workflows I tested my code with, as a cloned process always originates from a script.
Unfortunately, I'm off for two weeks, so I can't look into why the tests are failing at the moment. (Also, I'm not using a proper IDE, only building and running, so I probably need to set up a proper dev environment to debug these issues.)
The PR is ready to be merged.
I noticed that the "resolved" name of an included process may be ambiguous when the same alias is used in different workflows for different original processes.
This is because when an include occurs in a workflow sub, say with process p being included as toto, the process is first cloned with the name toto, before being re-cloned with the unique name sub:toto. If the main workflow then clones process q with the name toto, the resolved name toto will appear twice, once for each workflow. The aliased process toto included in the main file will not be re-cloned with a different name. Hence, the process toto will be executed, but it will be ambiguous whether it is the clone of p or of q.
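The collision can be sketched as follows (a hypothetical three-file layout; the file names, process bodies, and the alias toto are illustrative assumptions, not code from this PR):

```nextflow
// procs.nf — hypothetical sketch
process p {
    script:
    'true'
}
process q {
    script:
    'true'
}

// wf.nf — hypothetical sub-workflow aliasing p as toto
include { p as toto } from './procs.nf'
workflow sub {
    toto()   // first cloned as toto, then re-cloned as sub:toto
}

// main.nf — the same alias toto, bound to a different process
include { q as toto } from './procs.nf'
include { sub } from './wf.nf'
workflow {
    toto()   // clone of q, resolved name: toto
    sub()    // runs the clone of p, which was also first cloned as toto
}
```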
I'll fix this issue before reopening the PR.
To better motivate the novelty brought by this PR, here is a link to a video where I explain how I study Nextflow in my research project: https://youtu.be/YbiVhuC3jx8?feature=shared&t=2197 The most interesting/relevant part of the video starts at 36 min 37 sec. Don't hesitate to contact me for more info :)
Argh. @bentsherman I'm actually counting on this feature (and the associated PR) to build a post-execution trace analysis mechanism. Early results demonstrate that I can automatically generate configuration files for future executions of a pipeline, where the parameterized resource allocations are closer to actual needs than developers' guesstimates. On a cluster managed with Slurm, this generally enables more efficient scheduling of the computations.
Early open-source results are presented here: https://youtu.be/YbiVhuC3jx8?si=z-z2JR5X-MDRm7dk&t=2204 and, after consolidating them with additional pipelines, a publication is envisioned.
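As a sketch of the kind of output such a trace analysis could emit (the process name and resource values below are illustrative assumptions, not actual results), a generated configuration fragment might look like:

```nextflow
// nextflow.config fragment — hypothetical, values are illustrative
process {
    withName: 'sub:toto' {
        cpus   = 2
        memory = '3.2 GB'   // e.g. observed peak_rss plus a safety margin
        time   = '15m'      // e.g. observed realtime plus a safety margin
    }
}
```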
Although I understand this might not be strictly needed in Nextflow's current roadmap, I believe the PR is harmless and relatively lightweight to merge...
That's great, but then we should pursue that feature directly rather than piggy-backing on a debug message. Consider submitting a new issue with your proposal and we can discuss how to best implement it in the runtime.