grass.experimental: Add object to access tools as functions
This adds a Tools class which allows GRASS tools (modules) to be accessed using methods. Once an instance is created, calling a tool means calling a function (method), similarly to grass.jupyter.Map. Unlike grass.script, this does not require a generic function name, and unlike the grass.pygrass module shortcuts, it does not require special objects to mimic the module families.
Outputs are handled through a returned object which is the result of automatic capture of outputs and which can convert known formats through properties.
A usage example is in the _test() function in the file.
The code is included under a new grass.experimental package, which allows merging the code even when further breaking changes are anticipated.
It seems to be a useful addition. On the other hand, we already have two APIs to run GRASS modules: grass.script.*_command() and grass.pygrass.modules, which is already confusing for the user. What is the benefit of a third one? It would be useful to merge the existing APIs into a single one instead of introducing another one.
It seems to be a useful addition.
I still need to provide more context for this, but do you see some benefits already?
On the other hand we have already two APIs to run GRASS modules: grass.script.*_command() and grass.pygrass.modules which is already confusing for the user.
The intro to this is obviously xkcd Standards.
I'm not happy with the two competing interfaces. It's almost three, because we have Module and then also the shortcuts.
As far as I understand, grass.script.*_command() was written to closely mimic the Bash experience with minimal involvement of Python. The Python layer mostly just avoids the need to pass all parameters as strings.
grass.pygrass.modules was written to mimic the grass.script.*_command() API and to manipulate the module calls themselves.
What is a benefit of the third one?
The design idea is 1) to make the module (tool) calls as close to Python function calls as possible and 2) to access the results conveniently. To access the (text) results, it tries to mimic subprocess.run.
Additionally, it tries to 1) provide consistent access to all modules and 2) allow for extensibility, e.g., associating session parameters or computational region with a Tools object rather than passing it to every method.
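As an illustration of the function-call syntax and of object-level settings, here is a minimal sketch. It assumes that attribute names map to tool names by replacing underscores with dots, which matches the examples in this thread. ToolsSketch, build_command, and the handling of boolean parameters as double-dash flags are hypothetical simplifications, not the code in this PR. The sketch only builds the command line and does not run anything.

```python
class ToolsSketch:
    """Hypothetical stand-in, not the real grass.experimental.tools.Tools."""

    def __init__(self, known_tools, **defaults):
        self._known_tools = set(known_tools)
        # Settings shared by all calls, e.g. overwrite=True.
        self._defaults = defaults

    def build_command(self, name, **kwargs):
        # Per-call parameters win over the object-level defaults.
        params = {**self._defaults, **kwargs}
        args = [name]
        for key, value in params.items():
            if isinstance(value, bool):
                # Simplification: True booleans become double-dash flags.
                if value:
                    args.append(f"--{key}")
            else:
                args.append(f"{key}={value}")
        return args

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails.
        name = attr.replace("_", ".")
        if attr.startswith("_") or name not in self._known_tools:
            raise AttributeError(f"Tool {name} not found")

        def wrapper(**kwargs):
            return self.build_command(name, **kwargs)

        return wrapper


tools = ToolsSketch(["r.random.surface"], overwrite=True)
command = tools.r_random_surface(output="surface", seed=42)
# ['r.random.surface', '--overwrite', 'output=surface', 'seed=42']
```

Keeping the defaults on the object is what lets a session setting be stated once instead of in every call.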
The existing APIs are more general in some ways, especially because they make no assumptions about the output or its size. This API makes the assumption that you want the text output in Python or that it is something small and you can just ignore it. If not, you need to use a more general API. After all, Tools itself uses pipe_command to do the job.
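The result object can be pictured as the return value of subprocess.run plus a few parsing properties. The sketch below is an assumption about the general shape, not the class from this PR: ResultSketch and _to_number are hypothetical names, while returncode, stdout, text, json, and keyval follow the attribute names used in the examples in this thread. The whole output is held in memory as a string, which is the assumption described above.

```python
import json
from dataclasses import dataclass


def _to_number(value):
    """Return int or float when the string converts cleanly, else the string."""
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            continue
    return value


@dataclass
class ResultSketch:
    """Hypothetical result object holding captured output as text."""

    returncode: int
    stdout: str
    stderr: str = ""

    @property
    def text(self):
        # Same content as stdout, without surrounding whitespace.
        return self.stdout.strip()

    @property
    def json(self):
        # Raises json.JSONDecodeError when the output is not JSON.
        return json.loads(self.stdout)

    @property
    def keyval(self):
        # Parse shell-style key=value lines and convert numeric values.
        pairs = {}
        for line in self.stdout.splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                pairs[key.strip()] = _to_number(value.strip())
        return pairs

    def __getitem__(self, key):
        # Index straight into the parsed JSON output.
        return self.json[key]
```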
It would be useful to merge the existing APIs into a single one instead of introducing another one.
Given the different goals of the two APIs, I was not able to figure out how these can be merged. For example, the Module class from grass.pygrass was supposed to be a drop-in replacement for run_command, but it was not used that way much (maybe because it forces you to use a class as a function). Any suggestions? What would be the features and aspects of each API worth keeping? For example, the Tools object might be able to create instances of the Module class.
I can also see that some parts of the new API could be part of the old ones, like output-parsing related properties for the Module class, but there are some existing issues which the new API is trying to fix, such as the r.slope_aspect spelling in PyGRASS shortcuts and the Python function name plus tool name as a string in grass.script.
Finally, the subprocess module changed too over the years, introducing new functions with run being the latest addition, so a reevaluation of our APIs seems prudent even if it involves adding functions as subprocess did.
Anyway, I think some unification would be an ideal scenario.
This is what exceptions currently look like in this PR: The error (whole stderr) is part of the exception, i.e., it is always printed with the traceback, not elsewhere, and it is under the traceback, not above it like now (or even somewhere else in the case of notebooks and GUI).
Traceback (most recent call last):
  File "experimental/tools.py", line 252, in <module>
    _test()
  File "experimental/tools.py", line 241, in _test
    tools_pro.feed_input_to("13.45,29.96,200").v_in_ascii(
  File "experimental/tools.py", line 185, in wrapper
    return self.run(grass_module, **kwargs)
  File "experimental/tools.py", line 148, in run
    raise gs.CalledModuleError(
grass.exceptions.CalledModuleError: Module run `v.in.ascii input=- output=point format=xstandard` ended with an error.
The subprocess ended with a non-zero return code: 1. See the following errors:
ERROR: Value <xstandard> out of range for parameter <format>
Legal range: point,standard

Traceback (most recent call last):
  File "experimental/tools.py", line 252, in <module>
    _test()
  File "experimental/tools.py", line 241, in _test
    tools_pro.feed_input_to("13.45,29.96,200").v_in_ascii(
  File "experimental/tools.py", line 185, in wrapper
    return self.run(grass_module, **kwargs)
  File "experimental/tools.py", line 148, in run
    raise gs.CalledModuleError(
grass.exceptions.CalledModuleError: Module run `v.in.ascii input=- output=point format=standard` ended with an error.
The subprocess ended with a non-zero return code: 1. See the following errors:
WARNING: Vector map <point> already exists and will be overwritten
WARNING: Unexpected data in vector header:
[13.45,29.96,200]
ERROR: Import failed
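The pattern behind these messages can be sketched with subprocess alone: capture stderr and put it into the exception text so that it travels with the traceback. ToolErrorSketch and run_checked are hypothetical names. The PR raises grass.exceptions.CalledModuleError instead, and here a child Python process stands in for a failing tool.

```python
import subprocess
import sys


class ToolErrorSketch(RuntimeError):
    """Hypothetical exception type used only in this sketch."""


def run_checked(args):
    """Run a command, capture its output, and raise with stderr on failure."""
    process = subprocess.run(args, capture_output=True, text=True, check=False)
    if process.returncode != 0:
        raise ToolErrorSketch(
            f"Run `{' '.join(args)}` ended with an error.\n"
            f"The subprocess ended with a non-zero return code: "
            f"{process.returncode}. See the following errors:\n"
            f"{process.stderr}"
        )
    return process


# A child Python process that prints an error and exits with code 1.
failing_tool = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('ERROR: Import failed\\n'); sys.exit(1)",
]
try:
    run_checked(failing_tool)
except ToolErrorSketch as error:
    message = str(error)
```

Because the captured stderr is part of the message, anything that displays the exception also displays the tool's own error text.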
Solved conflicts
While the code quality is down because my node version is too old and I can't run pre-commit, this can now be used to access JSON output directly (and naturally fails to parse if there is no JSON output):
mean_value = tools.r_univar(map="surface", format="json")[0]["mean"]
It can also suggest (one or more) tools in case of misspelling:
tools.r_sloppy_respect(elevation="surface", slope="slope")[0]["mean"]
AttributeError: Tool r.sloppy.respect not found. Did you mean: r.slope.aspect?
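How the suggestion is computed is not stated here. One way to get this behavior is difflib from the standard library. The following is a sketch under that assumption, with suggest_tools, missing_tool_message, and the list of known tools being hypothetical.

```python
import difflib


def suggest_tools(name, known_tools, limit=3):
    """Return up to `limit` known tool names that resemble `name`."""
    return difflib.get_close_matches(name, known_tools, n=limit, cutoff=0.6)


def missing_tool_message(name, known_tools):
    """Build an error message, adding suggestions when there are any."""
    message = f"Tool {name} not found."
    matches = suggest_tools(name, known_tools)
    if matches:
        message += f" Did you mean: {', '.join(matches)}?"
    return message


# Illustrative subset of tool names, not a full list.
known = ["r.slope.aspect", "r.univar", "g.region", "v.in.ascii"]
hint = missing_tool_message("r.sloppy.respect", known)
```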
While the code quality is down because my node version is too old and I can't run pre-commit,
Sometime during the code sprint, in the PR that updated markdown lint in pre-commit, I had to fix this usability problem: the Ubuntu 22.04 runners default to node 18 (other versions are installed but not active), which is already EOL or very close to it, and the tool has since dropped support for it. I configured it so that most users won't need to do or think about anything. Pre-commit uses the system node if available; otherwise it sets one up in an environment similar to a Python venv.
If you would like to try again today (with the pre-commit file from the main branch, or with the branch updated), it should work as expected.
Experimental CLI
This is not really that important for this PR specifically, but similarly to what I did when adding mapset locking to the Python library, I added a CLI for testing.
Setup needed without FHS:
export PYTHONPATH=$(./bin.x86_64-pc-linux-gnu/grass --config python-path)
There is only a temporary XY project set up, so it is not really useful at this point, except for things like help or g.extension:
python -m grass.app run g.extension -l
python -m grass.app run g.region --help
python -m grass.app run r.slope.aspect --interface-description
I'm using the CLI as an unusual use case for the Tools API, so it helps me understand that standard input and output work as expected, and they do:
# stdin
echo "a = 6" | python -m grass.app run r.mapcalc file=-
# stdout
python -m grass.app run g.region -p | grep res
The CLI will become more interesting over time with changes such as #5877 and #5843.
Comparing to and migrating from run_command family of functions
Here are examples of what the different use cases of run_command and friends look like with the Tools API, organized by the grass.script counterparts to Tools API calls.
You can just thumbs up this if you find that reasonable, but feel free to comment, too. The new API keeps the focus on the tools themselves rather than having the user go through different functions to call the tool with different inputs and outputs (run_command vs parse_command vs read_command vs write_command) or even through dedicated wrappers to get the output of the tool in a form reasonable in a Python context (g.region as region, g.list as list_strings, etc.).
Imports
# original:
import grass.script as gs
# replacement
from grass.experimental.tools import Tools
import io # only needed when stdin is used
run_command - just run the tool
# original:
gs.run_command(
"r.random.surface", output="surface", seed=42
)
# replacement using the run function which is syntactically close to run_command:
tools = Tools() # same for one or multiple calls
tools.run("r.random.surface", output="surface2", seed=42) # name as a string
# assuming we already have tools and using the function syntax:
tools.r_random_surface(output="surface3", seed=42) # name as a function
write_command - provide standard input (text)
# original:
gs.write_command(
"v.in.ascii",
input="-",
output="point1",
separator=",",
stdin="13.45,29.96,200\n",
)
# replacement:
tools.run(
"v.in.ascii",
input=io.StringIO("13.45,29.96,200\n"),
output="point2",
separator=",",
)
# or with function name syntax:
tools.v_in_ascii(
input=io.StringIO("13.45,29.96,200\n"),
output="point3",
separator=",",
)
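Conceptually, a StringIO parameter is turned into the - value plus text sent to standard input. The sketch below shows that idea with plain subprocess. It is an assumption about the mechanism, not the code in this PR: run_with_stream_inputs is a hypothetical helper, the key=value argument building is simplified, and a child Python process that echoes what it receives stands in for the tool.

```python
import io
import subprocess
import sys


def run_with_stream_inputs(command, **params):
    """Build key=value arguments and feed a StringIO parameter through stdin."""
    stdin_text = None
    args = list(command)
    for key, value in params.items():
        if isinstance(value, io.StringIO):
            stdin_text = value.getvalue()
            value = "-"  # marker for reading from standard input
        args.append(f"{key}={value}")
    return subprocess.run(
        args, input=stdin_text, capture_output=True, text=True, check=True
    )


# Stand-in for a tool: prints its arguments, then whatever arrives on stdin.
echo_tool = [
    sys.executable,
    "-c",
    "import sys; print(sys.argv[1:]); print(sys.stdin.read(), end='')",
]
result = run_with_stream_inputs(
    echo_tool,
    input=io.StringIO("13.45,29.96,200\n"),
    output="point",
    separator=",",
)
```

The caller never writes input="-" or a separate stdin argument; the type of the value carries that information.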
read_command - get standard output (text)
# original:
assert (
gs.read_command("g.region", flags="c")
== "center easting: 0.500000\ncenter northing: 0.500000\n"
)
# replacement:
assert (
tools.run("g.region", flags="c").stdout
== "center easting: 0.500000\ncenter northing: 0.500000\n"
)
# or with function name syntax:
assert (
tools.g_region(flags="c").text
== "center easting: 0.500000\ncenter northing: 0.500000"
)
parse_command - get machine readable standard output
# original (numbers are strings):
assert gs.parse_command(
"g.region", flags="c", format="shell"
) == {
"center_easting": "0.500000",
"center_northing": "0.500000",
}
# numbers are always numbers with JSON:
assert gs.parse_command(
"g.region", flags="c", format="json"
) == {
"center_easting": 0.5,
"center_northing": 0.5,
}
# replacement with format=shell (numbers are not strings, but actual numbers as in JSON
# if they convert to Python int or float):
assert tools.run("g.region", flags="c", format="shell").keyval == {
"center_easting": 0.5,
"center_northing": 0.5,
}
# parse_command with JSON and the function call syntax:
assert tools.g_region(flags="c", format="json").json == {
"center_easting": 0.5,
"center_northing": 0.5,
}
parse_command storing JSON output in a variable and accessing individual values
# original:
data = gs.parse_command(
"g.region", flags="c", format="json"
)
assert data["center_easting"] == 0.5
assert data["center_northing"] == 0.5
# replacement:
data = tools.g_region(flags="c", format="json")
assert data["center_easting"] == 0.5
assert data["center_northing"] == 0.5
Dedicated wrappers: r.mapcalc
# mapcalc wrapper of r.mapcalc
# original:
gs.mapcalc("a = 1")
# replacement for short expressions:
tools.r_mapcalc(expression="b = 1")
# replacement for long expressions:
tools.r_mapcalc(file=io.StringIO("c = 1"))
Dedicated wrappers: g.list
# test data preparation (for comparison of the results):
names = ["a", "b", "c", "surface", "surface2", "surface3"]
# original:
assert gs.list_grouped("raster")["PERMANENT"] == names
# replacement (using the JSON output of g.list):
assert [
item["name"]
for item in tools.g_list(type="raster", format="json")
if item["mapset"] == "PERMANENT"
] == names
# original and replacement (directly comparing the results):
assert gs.list_strings("raster") == [
item["fullname"] for item in tools.g_list(type="raster", format="json")
]
# original and replacement (directly comparing the results):
assert gs.list_pairs("raster") == [
(item["name"], item["mapset"])
for item in tools.g_list(type="raster", format="json")
]
Dedicated wrappers: all other tools
# Wrappers in grass.script usually parse shell-script style key-value pairs,
# and convert values from strings to numbers, e.g. g.region:
assert gs.region()["rows"] == 1
# Conversion is done automatically in Tools and/or with JSON, and the basic tool
# call syntax is more lightweight, so the direct tool call is not that different
# from a wrapper. Direct tool calling also benefits from better defaults (e.g.,
# printing more in JSON) and more consistent tool behavior (e.g., tools accepting
# format="json"). So, direct call of g.region to obtain the number of rows:
assert tools.g_region(flags="p", format="json")["rows"] == 1
run_command with returncode
# original:
assert (
gs.run_command(
"r.mask.status", flags="t", errors="status"
)
== 1
)
# replacement:
tools = Tools(errors="ignore")
assert tools.run("r.mask.status", flags="t").returncode == 1
assert tools.r_mask_status(flags="t").returncode == 1
run_command with overwrite
# original:
gs.run_command(
"r.random.surface",
output="surface",
seed=42,
overwrite=True,
)
# replacement:
tools = Tools()
tools.r_random_surface(output="surface", seed=42, overwrite=True)
# or with global overwrite:
tools = Tools(overwrite=True)
tools.r_random_surface(output="surface", seed=42)
I updated the documentation of the Tools class and of the test functions. I also updated the PR description to reflect the latest state.
Does anyone have any unanswered questions about this PR or the Tools API in general? I'm leaning towards moving it from grass.experimental.tools to grass.tools.
This is now ready for review. The original description is updated. A follow-up PR #6015 will add this to the generated tool documentation.
Thanks for looking at the coverage @echoix. I still struggle with evaluating it. This should be pretty well covered and my goal indeed is to be at 100% coverage or very close to it.
Indeed, 100% is not always desirable. I just took a quick look, and indeed, it is almost 100% patch coverage, except for two cases, one of which I think is more easily testable. I'll write a little comment in these places.
Not only is this ready, but #6111 (replacing grass.script by grass.tools in written documentation) and #6015 (adding grass.tools to generated tools documentation) are close to going in after this.
I'll make sure this gets merged for the weekend
You made it! 100% patch coverage:
@wenzeslaus Do you want to prepare the commit message?