
Print process dictionary in logs (#5940)

Open kdesnos opened this issue 7 months ago • 4 comments

What

Update the logs in the following manner:

  1. Replace workflow process name list, which mixes defined process names and aliased process names

    [main] DEBUG nextflow.Session - Workflow process names [dsl2]: a
    

    with two lists: one of process definitions from each file, one of aliases used for each process.

    [main] DEBUG nextflow.Session - Workflow process definitions [dsl2]: main.nf [a, other_process, ...], sub.nf [a, ...]
    [main] DEBUG nextflow.Session - Workflow resolved process names: a[main.nf:a], sub:a[sub.nf:a], x[sub.nf:a]  
    
  2. Complement the following log line:

    [main] DEBUG nextflow.processor.TaskProcessor - Starting process > x
    

    with a typed list of the inputs and outputs of the process.

    Starting process > x (type:ArgName, ..., default:$) -> (type:OutName, ...)
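As an illustration of why the second list is convenient for tooling, here is a minimal Python sketch that parses the proposed "Workflow resolved process names" line into a mapping from resolved name to (file, definition). It assumes the exact format shown in item 1 above, which is only the format proposed in this PR and not confirmed Nextflow output; the names `parse_resolved_names`, `ENTRY`, and `LINE` are hypothetical.

```python
import re

# Sample line in the format proposed in this PR (assumed, not confirmed Nextflow output).
LINE = (
    "[main] DEBUG nextflow.Session - Workflow resolved process names: "
    "a[main.nf:a], sub:a[sub.nf:a], x[sub.nf:a]"
)

# One entry looks like: resolved_name[file:definition]
ENTRY = re.compile(
    r"(?P<resolved>[^\s,\[\]]+)\[(?P<file>[^:\[\]]+):(?P<definition>[^\[\]]+)\]"
)


def parse_resolved_names(line):
    """Map each resolved process name to its (file, definition) pair."""
    marker = "Workflow resolved process names:"
    _, found, payload = line.partition(marker)
    if not found:
        # Not a "resolved process names" log line.
        return {}
    return {
        m["resolved"]: (m["file"], m["definition"])
        for m in ENTRY.finditer(payload)
    }


resolved = parse_resolved_names(LINE)
for name, (source_file, definition) in resolved.items():
    print(f"{name} -> {definition} defined in {source_file}")
```

With the sample line, the aliases `sub:a` and `x` both resolve to process `a` from `sub.nf`, which is the information the current single list does not expose.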
    

Motivations

  • Makes the process information printed in the logs more complete. Readers no longer need to parse the Nextflow files by hand to identify where the code of a process originates.
  • Corrects the ambiguous merge of "conflicting" resolved names in the currently printed list. See #5940 for more details.

kdesnos avatar Apr 04 '25 04:04 kdesnos

Deploy Preview for nextflow-docs-staging canceled.

Name Link
Latest commit e3d30ad306a42a33bdab00bd4b93618cd7903200
Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/68143c57b06fdd0008c4fb21

netlify[bot] avatar Apr 04 '25 04:04 netlify[bot]

Oops.. I thought I'd been very cautious not to alter any existing behavior when modifying the code.. but the unit tests seem broken.

After a quick look, I think one issue is that in the ProcessDef unit test, a ProcessDef instance is cloned without originating from a script already registered in ScriptMeta. Because of this, the cloneWithName() function I implemented fails. Of course, this scenario never occurred in the real workflows I tested my code with, as a cloned process always originates from a script.

Unfortunately, I'm off for two weeks so I can't have a look at why the tests are failing at the moment. (Also, I'm not using a proper IDE, only building and running, so I probably need to set up a proper dev environment to debug these issues.)

kdesnos avatar Apr 04 '25 05:04 kdesnos

The PR is ready to be merged.

kdesnos avatar Apr 28 '25 03:04 kdesnos

I noticed that the "resolved" name of an included process may be ambiguous when the same alias is used in different workflows for different original processes.

This is because when an include occurs in a workflow sub, say with process p included as toto, the process is first cloned with the name toto before being re-cloned with the unique name sub:toto. If the main workflow then clones process q with the name toto, the resolved name toto appears twice, once for each workflow. The aliased process toto included in the main file is not re-cloned with a different name. Hence, the process toto will be executed, but it will be ambiguous whether it is the clone of p or of q.
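The collision can be reproduced with a toy model. The Python sketch below is not Nextflow's implementation; it only mimics the two-step cloning described in this comment (bare alias first, then a workflow-prefixed name outside the main workflow), and the `include` helper and `registry` structure are hypothetical.

```python
def include(registry, workflow, original, alias):
    """Toy model of an include: clone `original` as `alias`, then re-clone
    it as `workflow:alias` unless the include happens in the main workflow."""
    # First clone: the bare alias, whatever workflow the include occurs in.
    registry.setdefault(alias, []).append(original)
    if workflow == "main":
        # In the main workflow the alias is kept as the resolved name.
        return alias
    # Second clone: a name made unique with the workflow prefix.
    resolved = f"{workflow}:{alias}"
    registry.setdefault(resolved, []).append(original)
    return resolved


registry = {}
include(registry, "sub", original="p", alias="toto")   # toto, then sub:toto
include(registry, "main", original="q", alias="toto")  # toto again

# A resolved name is ambiguous when more than one original process maps to it.
ambiguous = {name: origs for name, origs in registry.items() if len(set(origs)) > 1}
print(ambiguous)
```

In this model the name `toto` ends up associated with both `p` and `q`, while `sub:toto` stays unambiguous.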

I'll fix this issue before reopening the PR.

kdesnos avatar May 01 '25 07:05 kdesnos

To better motivate the novelty brought by this PR, here is a link to a video where I explain how I study Nextflow in my research project: https://youtu.be/YbiVhuC3jx8?feature=shared&t=2197 The most relevant part of the video starts at 36 min 37 sec. Don't hesitate to contact me for more info :)

kdesnos avatar Jul 07 '25 14:07 kdesnos

Argh. @bentsherman I'm actually counting on this feature (and the associated PR) to build a post-execution trace analysis mechanism. Early results demonstrate that I can automatically generate configuration files for future executions of the pipeline, in which parameterized resource allocations are closer to actual needs than developers' guesstimates. On a cluster managed with Slurm, this generally enables more efficient scheduling of the computations.

Early open source results are presented here: https://youtu.be/YbiVhuC3jx8?si=z-z2JR5X-MDRm7dk&t=2204 and, after consolidating them with additional pipelines, a publication is envisioned.

Although I understand this might not be strictly needed in Nextflow's current roadmap, I believe the PR to be harmless and relatively lightweight to merge...

kdesnos avatar Nov 19 '25 07:11 kdesnos

That's great, but then we should pursue that feature directly rather than piggy-backing on a debug message. Consider submitting a new issue with your proposal and we can discuss how to best implement it in the runtime.

bentsherman avatar Nov 19 '25 14:11 bentsherman