snakemake icon indicating copy to clipboard operation
snakemake copied to clipboard

Bring --draft-notebook to basic python scripts

Open moritzschaefer opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe. I like to write scripts interactively (as is nicely supported via --edit-notebook), but I dislike the jupyter notebook ecosystem.

Describe the solution you'd like It would be great to have an option --draft-script or --print-snakemake-object-python that prints the python code to populate the snakemake object, as in the first cell of the notebook launched by --edit-notebook.

Describe alternatives you've considered

  • Stick to the notebook
  • Copy the code line from the notebook However: This feature is about convenience and iterating without barriers (such as the bulky jupyter server), so it would be great to have this slim feature.

Additional context

  • I would actually like the most to work in org-mode (via org-babel), thus an equivalent --draft-org of --draft-notebook would be exactly what I need. But understand that it's probably too niche to support it. The proposed feature above would serve sufficiently though.

moritzschaefer avatar Mar 09 '24 15:03 moritzschaefer

Hi,

I would also be interested in this feature and took some time to look into it.

As a temporary workaround, I saw that there is a mock_snakemake() function in the PyPSA/pypsa-eur repository that does something along those line (link to original PR. I'm attaching below a simplified version of their helpers.py file with only that function and a small wrapper I created to do a try/except so that scripts would work both with snakemake and with a regular IPython or Python session.

@johanneskoester would there be interest for a feature like this? If so I would be happy to try to contribute. I guess that within Snakemake the mock_snakemake() would not be an optimal implementation? I can see at least two paths to achieve this:

  • Use the functionality already implemented by PythonScript, but instead of execute_script() after write_script(), snakemake could:
    • drop in a text editor, or
    • Add a preamble to the top of the script and save it permanently, then the user can remove this preamble when no longer needed or update it if the snakefile changes, or
    • write the file in an easily discoverable location (maybe the current script directory)"hang", let the user edit the draft script, and resume at some prompt.
  • Use something closer to mock_snakemake (or my slightly modified mockwrap() below) so that users who want to leverage this feature have to explicitly indicate it. The advantage of this would be that the preamble is dynamically re-generated every time the script is run.

Let me know if I should explore one of the following options. I'm new to Snakemake so there might very well be something I overlooked that makes this impossible or more complicated than I thought.

Thank you!

(Fixing this might also help with #247 and #2932 re debugging snakemake script with any editor's debugger)

The modified helpers.py with only mock_snakemake
# -*- coding: utf-8 -*-
# SPDX-FileCopyrightText: : 2017-2024 The PyPSA-Eur Authors
#
# SPDX-License-Identifier: MIT

import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def mockwrap(
    rulename,
    root_dir=None,
    configfiles=None,
    submodule_dir="workflow/submodules/pypsa-eur",
    **wildcards,
):
    try:
        from snakemake.script import snakemake
    except ImportError:
        from helpers import mock_snakemake

        snakemake = mock_snakemake(
            rulename,
            root_dir=root_dir,
            configfiles=configfiles,
            submodule_dir=submodule_dir,
            **wildcards,
        )

    return snakemake


def mock_snakemake(
    rulename,
    root_dir=None,
    configfiles=None,
    submodule_dir="workflow/submodules/pypsa-eur",
    **wildcards,
):
    """
    This function is expected to be executed from the 'scripts'-directory of '
    the snakemake project. It returns a snakemake.script.Snakemake object,
    based on the Snakefile.

    If a rule has wildcards, you have to specify them in **wildcards.

    Parameters
    ----------
    rulename: str
        name of the rule for which the snakemake object should be generated
    root_dir: str/path-like
        path to the root directory of the snakemake project
    configfiles: list, str
        list of configfiles to be used to update the config
    submodule_dir: str, Path
        in case PyPSA-Eur is used as a submodule, submodule_dir is
        the path of pypsa-eur relative to the project directory.
    **wildcards:
        keyword arguments fixing the wildcards. Only necessary if wildcards are
        needed.
    """
    import os

    import snakemake as sm
    from snakemake.api import Workflow
    from snakemake.common import SNAKEFILE_CHOICES
    from snakemake.script import Snakemake
    from snakemake.settings.types import (
        ConfigSettings,
        DAGSettings,
        ResourceSettings,
        StorageSettings,
        WorkflowSettings,
    )

    script_dir = Path(__file__).parent.resolve()
    if root_dir is None:
        root_dir = script_dir.parent
    else:
        root_dir = Path(root_dir).resolve()

    user_in_script_dir = Path.cwd().resolve() == script_dir
    if str(submodule_dir) in __file__:
        # the submodule_dir path is only need to locate the project dir
        os.chdir(Path(__file__[: __file__.find(str(submodule_dir))]))
    elif user_in_script_dir:
        os.chdir(root_dir)
    elif Path.cwd().resolve() != root_dir:
        raise RuntimeError(
            "mock_snakemake has to be run from the repository root"
            f" {root_dir} or scripts directory {script_dir}"
        )
    try:
        for p in SNAKEFILE_CHOICES:
            if os.path.exists(p):
                snakefile = p
                break
        if configfiles is None:
            configfiles = []
        elif isinstance(configfiles, str):
            configfiles = [configfiles]

        resource_settings = ResourceSettings()
        config_settings = ConfigSettings(configfiles=map(Path, configfiles))
        workflow_settings = WorkflowSettings()
        storage_settings = StorageSettings()
        dag_settings = DAGSettings(rerun_triggers=[])
        workflow = Workflow(
            config_settings,
            resource_settings,
            workflow_settings,
            storage_settings,
            dag_settings,
            storage_provider_settings=dict(),
        )
        workflow.include(snakefile)

        if configfiles:
            for f in configfiles:
                if not os.path.exists(f):
                    raise FileNotFoundError(f"Config file {f} does not exist.")
                workflow.configfile(f)

        workflow.global_resources = {}
        rule = workflow.get_rule(rulename)
        dag = sm.dag.DAG(workflow, rules=[rule])
        wc = dict(wildcards)
        job = sm.jobs.Job(rule, dag, wc)

        def make_accessable(*ios):
            for io in ios:
                for i, _ in enumerate(io):
                    io[i] = os.path.abspath(io[i])

        make_accessable(job.input, job.output, job.log)
        snakemake = Snakemake(
            job.input,
            job.output,
            job.params,
            job.wildcards,
            job.threads,
            job.resources,
            job.log,
            job.dag.workflow.config,
            job.rule.name,
            None,
        )
        # create log and output dir if not existent
        for path in list(snakemake.log) + list(snakemake.output):
            Path(path).parent.mkdir(parents=True, exist_ok=True)

    finally:
        if user_in_script_dir:
            os.chdir(script_dir)
    return snakemake

vandalt avatar Aug 27 '24 21:08 vandalt