pypsa-eur Add user friendly messages to snakemake workflow

Closes #1844 .

Changes proposed in this Pull Request

Adds user-readable messages describing what each rule does in understandable language.

Instead of:

[Fri Oct  3 17:11:26 2025]
Job 0: plot_power_networks_clustered
Reason: Input files updated by another job: resources/test-elec-tyndp/maps/power-network-s-all.pdf
[Fri Oct  3 17:11:26 2025]
Finished jobid: 0 (Rule: plot_power_networks_clustered)

we see

[Fri Oct  3 17:11:26 2025]
Job 0: Plotting clustered power network topology
Reason: Input files updated by another job: resources/test-elec-tyndp/maps/power-network-s-all.pdf
[Fri Oct  3 17:11:26 2025]
Finished jobid: 0 (Rule: plot_power_networks_clustered)

Checklist

[x] I tested my contribution locally and it works as intended.
[x] Code and workflow changes are sufficiently documented.
[x] Changed dependencies are added to envs/environment.yaml.
[x] Changes in configuration options are added in config/config.default.yaml.
[x] Changes in configuration options are documented in doc/configtables/*.csv.
[x] Sources of newly added data are documented in doc/data_sources.rst.
[x] A release note doc/release_notes.rst is added.

TODO

[ ] Integrate with PyPSA-Eur documentation

Oct 03 '25 15:10 willu47

Hi @willu47, Thanks for this PR! It will certainly improve the readability of the logs. However, I'm concerned that removing the rule name from the logs when the rule starts cause inconvenience. I'm used to searching for rule names in the logs when debugging. Using the example in the description, I would only find finished jobs. This makes it more difficult to determine when the rule started, and where to look for logs from a specific rule. This is particularly important when multiple rules are logging at the same time.

Oct 05 '25 12:10 tgilon

NB this could be mitigated by adding the rule name into the message. There is a placeholder for it as well:

rule my_cool_rule:
    message:
        "This is a cool message for {rule}."

gives

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job             count
------------  -------
my_cool_rule        1
total               1

Select jobs to execute...
Execute 1 jobs...

[Mon Oct  6 09:31:38 2025]
Job 0: This is a cool message for my_cool_rule.
Reason: Rules with neither input nor output files are always executed.
[Mon Oct  6 09:31:38 2025]
Finished jobid: 0 (Rule: my_cool_rule)

edit: Or also dynamically, could just prepend it after all rules:

rule my_cool_rule:
    message:
        "This is a cool message for {rule}."

# Make sure not to use `rule` as a variable name
for name, r in rules._rules.items():
    print(r.rule.message)
    r.rule.message += " And this is an extra message for {rule}."

output:

Building DAG of jobs...
Job stats:
job             count
------------  -------
my_cool_rule        1
total               1


[Mon Oct  6 09:40:40 2025]
Job 0: This is a cool message for my_cool_rule. And this is an extra message for my_cool_rule.
Reason: Rules with neither input nor output files are always executed.
Job stats:
job             count
------------  -------
my_cool_rule        1
total               1

Reasons:
    (check individual jobs above for details)
    neither input nor output:
        my_cool_rule
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Oct 06 '25 07:10 euronion

On the advice of @euronion, I've appended the rule names to the end of the messages. Here's what the job start and end points look like in the logs now:

[Mon Oct  6 09:59:23 2025]
Job 30: Retrieving synthetic electricity demand data (Rule: retrieve_synthetic_electricity_demand)
Reason: Missing output files: data/load_synthetic_raw.csv
[Mon Oct  6 09:59:28 2025]
Finished jobid: 30 (Rule: retrieve_synthetic_electricity_demand)
3 of 45 steps (7%) done

[Mon Oct  6 09:59:23 2025]
Job 38: Retrieving shipping raster data (Rule: retrieve_ship_raster)
Reason: Missing output files: data/shipdensity_global.zip
[Mon Oct  6 09:59:28 2025]
Finished jobid: 38 (Rule: retrieve_ship_raster)
4 of 45 steps (9%) done

[Mon Oct  6 09:59:28 2025]
Job 8: Retrieving Marine Regions data (Rule: retrieve_eez)
Reason: Missing output files: data/eez/World_EEZ_v12_20231025_LR/eez_v12_lowres.gpkg
[Mon Oct  6 09:59:28 2025]
Finished jobid: 8 (Rule: retrieve_eez)
5 of 45 steps (11%) done

Oct 06 '25 09:10 willu47

On the advice of @euronion, I've appended the rule names to the end of the messages. Here's what the job start and end points look like in the logs now:

That's definitely more practical for me, at least. Thank you, @willu47!

Oct 06 '25 09:10 tgilon

Thanks @willu47 ! I would rather vote for putting it at the front:
[Mon Oct  6 09:59:23 2025]
Job 30: retrieve_synthetic_electricity_demand - Retrieving synthetic electricity demand data.
Reason: Missing output files: data/load_synthetic_raw.csv
For some rules, the messages contain useful additional information (e.g. when wildcards are used). But for others, there is no added value (e.g. simplify_network vs "Simplifying network"). The trade off is fine for me, but I think the rule name, which is the most important information, should still be at the beginning

I think this is a valid point by @lkstrp . Overwriting the message also hides information not only about the wildcards, but also about the resources and outputs. I'd prefer it if we could keep the standard output message from snakemake and with the additional messages you have prepared @willu47 .

Although looking at the code in snakemake it seems more like a bug that the additional information is hidden. Can I summon @coroa on this briefly?

Oct 08 '25 12:10 euronion

I very much support the idea of descriptive messages in the console output!

At the same time, I agree with the concerns about rulename, wildcard, input, output, and resource information, which should not be lost.

Let's try to have both?

Oct 08 '25 16:10 fneum

Although looking at the code in snakemake it seems more like a bug that the additional information is hidden. Can I summon @coroa on this briefly?

No, not a bug. It's just quite complicated, the logging system shuttles this extra struct down to finally: https://github.com/snakemake/snakemake/blob/2eb9079dae29f271aeacff75e59f6c2f6a0d352d/src/snakemake/logging.py#L187-L201 where if a message is given, it overrides the default job info, that's sad, but not really a good way around it except for writing a custom logger plugin for snakemake. See https://github.com/snakemake/snakemake-interface-logger-plugins for details.

This actually sent me on an exploration spree which resulted in the following experiment: https://github.com/coroa/snakemake-logger-plugin-tui

You can install that with pip install -e https://github.com/coroa/snakemake-logger-plugin-tui in your pypsa-eur env.

And then trial run it with: snakemake --logger tui -c4. It is neat. unfortunately it does not receive any information about the downloads, so while everything downloads it just looks stuck, error handling is also not there yet, and i think the approach is limited and it would be good to move the tui out of the logging handler actually. We can set up a quick session to discuss if someone is interested.

It integrates with the messages here, so feel free to use this branch.

Oct 10 '25 00:10 coroa

Thanks all for the comments. If I try to summarise the discussion so far:

We would like to see more descriptive messages for each of the PyPSA-Eur rules, so that it is easier to understand the purpose of each rule, particularly for new users. Some of the rule names are quite cryptic while others are straightforward.
[Power/all] users need to see information on inputs, outputs, rule names and other key rule parameters to rules. This information is by default printed to the console as well as to the log.
Using the message attribute overwrites all existing console output, essentially hiding the key metadata associated with a rule, including the rule name. This makes debugging harder.

I for one am surprised that snakemake does not seem to use a two tier logging structure out of the box. Normally INFO level messages are much less verbose than what snakemake prints out, and would correspond to a more user-friendly description level. A DEBUG level would then include more verbose input and output information corresponding to what a power user might want. Personally, I would also appreciate a less verbose INFO level print out to the console, and a more verbose DEBUG output streamed to the filesystem.

Way forward

Option 1

Incorporate all information needed for logging into the descriptive message. Messages can include wildcards (this is already implemented); and we could add a custom function reproducing the existing logging behaviour of snakemake, only adding the descriptive message.

Option 2

A custom logging plugin (TUI), enabling different logging levels, the redirection of verbose DEBUG messages to the filesystem and descriptive INFO level messages to the console.

Option 3

Leave it as it is! Close this pull request and save this conversation for posterity!

Oct 10 '25 09:10 willu47

Thanks all for the comments. If I try to summarise the discussion so far:

We would like to see more descriptive messages for each of the PyPSA-Eur rules, so that it is easier to understand the purpose of each rule, particularly for new users. Some of the rule names are quite cryptic while others are straightforward.

[Power/all] users need to see information on inputs, outputs, rule names and other key rule parameters to rules. This information is by default printed to the console as well as to the log.

Using the message attribute overwrites all existing console output, essentially hiding the key metadata associated with a rule, including the rule name. This makes debugging harder.

I for one am surprised that snakemake does not seem to use a two tier logging structure out of the box. Normally INFO level messages are much less verbose than what snakemake prints out, and would correspond to a more user-friendly description level. A DEBUG level would then include more verbose input and output information corresponding to what a power user might want. Personally, I would also appreciate a less verbose INFO level print out to the console, and a more verbose DEBUG output streamed to the filesystem.

Way forward

Option 1

Incorporate all information needed for logging into the descriptive message. Messages can include wildcards (this is already implemented); and we could add a custom function reproducing the existing logging behaviour of snakemake, only adding the descriptive message.

Option 2

A custom logging plugin (TUI), enabling different logging levels, the redirection of verbose DEBUG messages to the filesystem and descriptive INFO level messages to the console.

Option 3

Leave it as it is! Close this pull request and save this conversation for posterity!

I would favour Option 1: Reproducing the original snakemake messages should not be to difficult, since all placeholders like {resources}, {input}, {output}, ... work in a rule's message.

Oct 10 '25 09:10 euronion

The minimal variant of Option 2, is not difficult, demonstration in #1855 .

Oct 10 '25 16:10 coroa

@coroa I have merged/cherry-picked your commits implementing the plugin logger and then renamed it to PyPSA. You can close #1855 without merging. I noticed that the log file being produced does not contain sufficient information for debugging - but I think that is a separate issue. For now, I believe the PR is ready for final review and merge.

Oct 22 '25 11:10 willu47

Glad, you like it. It will be a small nightmare or maybe a good claude job to align with #1869 .

Oct 27 '25 21:10 coroa

@coroa I have merged/cherry-picked your commits implementing the plugin logger and then renamed it to PyPSA. You can close #1855 without merging. I noticed that the log file being produced does not contain sufficient information for debugging - but I think that is a separate issue. For now, I believe the PR is ready for final review and merge.

Thanks for that @willu47 . It looks good, i would say. Yes, it is a known issue that the logfile does not contain all relevant output, because snakemake.log only contains logging output from the main snakemake executor not the launched job subprocesses. It's a thing that has been long on my list to try to improve the job log output. But agreed not something for this PR.

I would like to hold off with merging this one until we get the retrieval updates in.

Oct 27 '25 21:10 coroa