grass.experimental: Add API and CLI to access tools without a session
Building on top of #2923 (not merged), this adds functionality that allows "packed" native GRASS rasters to be used as tool parameters on the command line:
grass run r.slope.aspect elevation=~/data/elevation.pack slope=~/data/slope.pack
grass run r.univar map=~/data/slope.pack
The above syntax is not actually implemented, but the code below works:
PYTHONPATH=$(grass --config python-path)
python -m grass.app run r.slope.aspect elevation=.../elevation.pack slope=.../slope.pack
The same functionality is also available from Python, where it mirrors the syntax of the plain Tools from #2923:
from grass.tools import StandaloneTools
tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.pack", slope="slope.pack", aspect="aspect.pack")
print(f"Mean slope: {tools.r_univar(map='slope.pack')['mean']}")
The above syntax does not fully work, but the following one does:
tools.run("r.slope.aspect", elevation="elevation.pack", slope="slope.pack")
This PR is not meant to be merged as is; it currently represents the combination of all the different features proposed. See discussion #5830 for details.
I know that this is just a draft so far, but quick tests with the prefix parameter look fine. Thanks for all the work! The whisperer (completion) in the Python console does not whisper what I would expect (see below), but otherwise everything I tried went smoothly.
>>> from grass.experimental import tools
>>> v = tools.Tools(prefix='v')
>>> v.random(output='test', npoints=5)
<grass.experimental.tools.ExecutedTool object at 0x7f5e34d8b8c0>
>>> # let's check if it is there
>>> g = tools.Tools(prefix='g')
>>> g.list(type='vector').text
'test'
>>> # whispering test
>>> v.
v.env v.feed_input_to( v.ignore_errors_of() v.levenshtein_distance( v.no_nonsense_run_from_list( v.parse_command( v.run( v.run_command( v.run_from_list( v.suggest_tools(
The whispering is not implemented yet, but the bulk of the underlying code for it is already done as part of the error handling.
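For illustration, Python consoles build their completion candidates from `dir()`, so overriding `__dir__` is the natural hook. A minimal sketch, where the `_tool_names` helper and its hardcoded return value are assumptions for illustration (real code would query the installed tools):

```python
class Tools:
    """Sketch: expose tool names to tab completion via __dir__."""

    def __init__(self, prefix=None):
        self._prefix = prefix

    def _tool_names(self):
        # Hypothetical helper: would list installed tools for the prefix,
        # e.g., ["random", "info", "buffer"] for prefix "v".
        return ["random", "info", "buffer"]

    def __dir__(self):
        # Python consoles use dir() to build completion candidates,
        # so tool names returned here would be whispered as v.random, etc.
        return sorted(set(super().__dir__()) | set(self._tool_names()))
```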
## File handling behavior

My question is whether we value complete consistency between the CLI and the Python API more, or whether we prefer the best possible behavior in each given context.
The API naturally supports the following workflow where some imported data is reused between calls and some data is never exported:
tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.grr", slope="slope.grr", aspect="aspect")
tools.r_flow(elevation="elevation", aspect="aspect", flowaccumulation="accumulation.grr", flags="3")
The elevation.grr raster is imported for the r.slope.aspect call and then it sits in the project, so the r.flow call can just use it. Similarly, aspect is created only within the project, so it is available for r.flow, but not exported. Here is the overview:
| Data | I/O | Handling |
|---|---|---|
| elevation.grr | input | imported, reused |
| slope.grr | output | created, exported |
| aspect | temporary | created, used, not exported |
| accumulation.grr | output | created, exported |
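For illustration, a minimal sketch of how the import side of this dispatch could work. The helper name, the bookkeeping dict, and treating the `.grr` suffix as a packed raster are assumptions; the export side would be symmetric, calling `r.pack` after the tool call:

```python
import os

def handle_file_inputs(tools, kwargs, imported):
    """Import file-based inputs once; bare names are assumed to be in the project.

    A sketch only: real handling would consult the tool's interface
    description to know which parameters are raster inputs.
    """
    for value in kwargs.values():
        if isinstance(value, str) and value.endswith(".grr") and value not in imported:
            name = os.path.splitext(os.path.basename(value))[0]
            # r.unpack imports a packed raster file as a native raster.
            tools.run("r.unpack", input=value, output=name)
            imported[value] = name
```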
This behavior is great, but slightly inconsistent with the command line behavior. There is no relation between command line calls, so each call is separate and always gets a fresh new project:
grass run r.slope.aspect elevation="elevation.grr" slope="slope.grr" aspect="aspect.grr"
grass run r.flow elevation="elevation.grr" aspect="aspect.grr" flowaccumulation="accumulation.grr" -3
The elevation.grr raster is now imported once for each call, and aspect needs to be exported and imported to be used in the next call.
We could make the Python API consistent by reducing the "state" aspect of StandaloneTools and having each function call use a separate session with a fresh project. Then you would always write this:
tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.grr", slope="slope.grr", aspect="aspect.grr")
tools.r_flow(elevation="elevation.grr", aspect="aspect.grr", flowaccumulation="accumulation.grr", flags="3")
In Python, we can make it configurable with parameters of the class (or different classes), for example:
tools1 = StandaloneTools(use_one_project=False) # behaves exactly like CLI
tools2 = StandaloneTools(use_one_project=True) # allows for rasters to be reused
This approach can allow control over other behaviors, for example, reduce_reimports=True may allow for elevation to be imported only once and aspect not at all in the following example:
tools = StandaloneTools()
tools.r_slope_aspect(elevation="elevation.grr", slope="slope.grr", aspect="aspect.grr")
# When we see elevation.grr and aspect.grr as inputs in the following function call,
# we will just use the ones we already have.
tools.r_flow(elevation="elevation.grr", aspect="aspect.grr", flowaccumulation="accumulation.grr", flags="3")
Even with the behavior being potentially configurable, I prefer going here with the best possible behavior for the context as opposed to complete consistency between the CLI and Python API. So, my choice at this point is to have one session (and one project) for all function calls with one StandaloneTools object.
Additionally, the different behavior in the Python API does not mean that a user cannot achieve the same with the CLI. For the CLI to have feature parity, despite the different defaults, we could implement something like:
grass -c "elevation.grr" "project1"
grass --project "project1" run r.slope.aspect elevation="elevation.grr" slope="slope.grr" aspect="aspect"
grass --project "project1" run r.flow elevation="elevation" aspect="aspect" flowaccumulation="accumulation.grr" -3
rm -r "project1"
Do you agree or disagree with having the different default behavior in CLI and in Python API?
> Even with the behavior being potentially configurable, I prefer going here with the best possible behavior for the context as opposed to complete consistency between the CLI and Python API.
I fully agree with this approach. Exports and imports with each function call would add undesired overhead.
When it comes to the options of configuration, the only one I would find useful is something like `tools1 = StandaloneTools(project_id=project1)` with some default `tmp_project`. Then you could have two different projects in the same script and potentially compare their results (e.g., how do the results differ if I run the workflow in two different projections?) or do some other tricks.
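A hypothetical usage sketch of that idea (`project_id` is not an existing parameter, and the project names are illustrative):

```python
from grass.experimental.standalone_tools import StandaloneTools

# Hypothetical project_id parameter: each object would keep its own project,
# e.g., to compare results across projections.
tools_a = StandaloneTools(project_id="project_in_projection_a")
tools_b = StandaloneTools(project_id="project_in_projection_b")
for name, tools in (("a", tools_a), ("b", tools_b)):
    tools.r_slope_aspect(elevation="elevation.grr", slope=f"slope_{name}.grr")
    print(name, tools.r_univar(map=f"slope_{name}.grr")["mean"])
```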
So basically it is creating a fluent interface in Python? Like what is often seen in JavaScript, but in other languages too (like C#).
## Computational region behavior

There is more than one way for the computational region to behave when calling multiple tools with the same StandaloneTools object in Python:
- First input in the first function call determines the computational region for the call and all subsequent calls.
- First input of each function call determines the computational region for the given function call (only).
- Computational region is never set automatically and user always needs to explicitly set it manually.
### 1. First input of the first call

The first input of the first call of a tool (function) determines the computational region. Subsequent calls use that region.
tools = StandaloneTools()
# The following will take the first input and use it for the computational region.
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# Now the computational region is set to whatever raster_file_3x3.grr is.
# The following applies the standard GRASS resampling and extent rules.
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
# file2.grr now has size 3x3 and, if the extents do not overlap, contains only nulls.
This is the behavior currently implemented. The nice thing is that it allows for not using g.region at all (above) or using it at any point:
tools = StandaloneTools()
tools.g_region(raster="raster_file_4x5.grr")
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# The output is now 4x5.
tools.g_region(raster="raster_file_3x3.grr")
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
# The output is now 3x3.
The raster parameter of g.region simply works as expected. I also added a check for any computational region modification, based on the modification time of the underlying WIND file. This way, the current code also supports any g.region parameter or, theoretically, any other tool that changes the region:
tools = StandaloneTools()
tools.g_region(n=..., ..., res=...)
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
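The check itself can be as simple as comparing modification times. A minimal sketch, where `wind_path` and the bookkeeping are assumptions for illustration:

```python
import os

def region_changed(wind_path, recorded_mtime):
    """Return True if the WIND file was modified since recorded_mtime.

    wind_path would point to the current mapset's WIND file.
    """
    return os.path.getmtime(wind_path) > recorded_mtime
```

The object records the time after it sets the region automatically; a later mismatch means g.region (or another tool) changed the region, so the automatic setting is skipped.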
I like this option because g.region works as expected at any place, but you can also leave it out completely. So, you can focus on the tools and your data, with some API-specific risks related to not handling extent and resolution explicitly; but if you know about the computational region and want to tap into its power, you can. I'm a little less comfortable with inheriting the region from the first call in all subsequent calls, but I expect this not to be an issue for most workflows.
### 2. First input of each call

The first input of each call of a tool (function) determines the computational region for that call (only). Subsequent calls are not influenced by previous calls.
tools = StandaloneTools()
# The following will take the first input and use it for the computational region.
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# Now the computational region is set to whatever raster_file_3x3.grr is.
# The following will take the first input and use it for the computational region.
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
# Now the computational region is set to whatever raster_file_4x5.grr is.
# file2.grr now has size 4x5 and its extent overlaps with raster_file_4x5.grr.
This makes the calls completely independent in terms of region. It also means that any g.region calls are ignored. A variation of this could change the behavior based on computational region changes: if a change is detected (based on the file modification time, as in option 1), the computational region is respected; otherwise, each call gets its own computational region.
What I like about this option is that it is clear how each call behaves regardless of order, and independent calls (with different StandaloneTools objects) give the same result as a series of calls on the same object. It also aligns well with the CLI (see below). However, it does not work with g.region, or it would have to switch behavior on the fly to accommodate it.
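A sketch of that variation, combining the per-call region with the mtime check from option 1 (the function and its bookkeeping are illustrative, not existing API):

```python
import os

def region_for_call(tools, wind_path, recorded_mtime, first_raster):
    """Set a per-call region unless the user changed the region in between."""
    if os.path.getmtime(wind_path) == recorded_mtime:
        # No user change detected: derive the region from this call's input.
        tools.g_region(raster=first_raster)
    # else: g.region (or another tool) modified the region; respect it.
    return os.path.getmtime(wind_path)
```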
### 3. Manual-only explicitly set region

The computational region is not set automatically; it defaults to whatever the default is (currently 1x1 at 0,0). The user needs to explicitly call g.region. Subsequent calls use that region.
tools = StandaloneTools()
# Set the computational region explicitly.
tools.g_region(raster="raster_file_3x3.grr")
tools.r_slope_aspect(
    elevation="raster_file_3x3.grr",
    aspect="file.grr",
    flags="a",
)
# Now, set the computational region explicitly again if a different one is needed.
tools.g_region(raster="raster_file_4x5.grr")
tools.r_slope_aspect(
    elevation="raster_file_4x5.grr",
    aspect="file2.grr",
    flags="a",
)
This works just the way things work now, so any experienced GRASS user will be right at home, but every user still needs to know about the computational region, and each workflow will have at least two steps: computational region setup and the actual tool call.
### Configuration

We need to decide what the default behavior is, but we can also provide configuration for all behaviors, for example:
tools = StandaloneTools(region_from_first_call=True) # option 1
tools = StandaloneTools(region_for_each_call=True) # option 2
tools = StandaloneTools(explicit_region_only=True) # option 3
or:
tools = StandaloneTools(use_region=False, refresh_region=False) # option 1
tools = StandaloneTools(use_region=False, refresh_region=True) # option 2
tools = StandaloneTools(use_region=True, refresh_region=None) # option 3
Notably, the use of use_region: bool is similar to grass.jupyter.Map where, by default, the first added raster, or the first added vector (possibly combined with a subsequently added raster), determines the computational region used internally for display, while use_region=True turns off the automatic region setting and simply follows the current computational region.
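For comparison, grass.jupyter usage looks like this (assuming an existing session with an elevation raster):

```python
import grass.jupyter as gj

# Default (use_region=False): the region for rendering is derived from the
# added layers. With use_region=True, the current computational region is used.
m = gj.Map(use_region=True)
m.d_rast(map="elevation")
m.show()
```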
### CLI

Similarly to the issue with data files, the CLI needs to behave a certain way because the individual calls do not share one object the way the Python API does. The CLI follows option 2: each call has its own computational region.
# The following will take the first input and use it for the computational region.
grass run r.slope.aspect elevation="raster_file_3x3.grr" aspect="file.grr" -a
# The following will again take the first input and use it for the computational region.
grass run r.slope.aspect elevation="raster_file_4x5.grr" aspect="file.grr" -a
Using an existing project (similarly to the feature parity for data file handling), we could provide the CLI with a project parameter and a set of parameters related to the computational region:
grass -c "elevation.grr" "project1"
grass --project "project1" --use-region run r.slope.aspect elevation="raster_file_3x3.grr" aspect="file.grr" -a
grass --project "project1" --use-region run r.slope.aspect elevation="raster_file_4x5.grr" aspect="file.grr" -a
rm -r "project1"
### Bonus: Tracking state of computational region

While I used the last modified time to track user edits to the computational region, we could support this tracking in the computational region itself. The current trouble is that the computational region is tracked in a file called WIND, and this file is created with each mapset; in fact, the presence of the WIND file is the check used to recognize valid mapsets. While this is nice for tools, because they can simply rely on the computational region being set (this happens in the library code, not in the tool code itself), the computational region needs to be set before any tool runs, so possibly before there is any input data that could provide a reasonable computational region. Later, there is no way of telling whether the values in the computational region come from a user or are simply the default. The default is 1x1 at 0,0; should we simply assume a legitimate user case for that extent and resolution and behave differently based on that? We don't do that now.
To help the system know what the status is, we could save the status, or rather the provenance, of the computational region in the computational region itself. The WIND file would have a new key source with values default and user. While the creation of the default WIND file would store default, g.region would store user. Possibly, a system or auto state could show that an automated system set the computational region through some means, but the user has not touched it yet.
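An illustrative excerpt of a WIND file with the proposed key (the existing key-value format is real; the source key and the values shown are the proposal):

```text
north:      228500
south:      215000
east:       645000
west:       630000
rows:       1350
cols:       1500
source:     user
```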
We could take a different approach and have states based on what the region was determined from, namely determined from vector and raster (plus default and user for all the other states). If the system (that can be StandaloneTools or the GUI) sees a computational region which is default, but has a raster as a tool parameter, it would call g.region with raster, which would then store raster as the state. Subsequent calls would see raster and would not touch the region. If the first tool call has only a vector as a parameter, the system can call g.region with vector, which would then store vector as the state. A subsequent call with a raster as a parameter would supply the resolution and alignment, and store raster. Again, later calls would see raster and would not touch the region.
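A sketch of that decision logic; the function, its parameters, and reading the proposed source value from the WIND file are all illustrative:

```python
def maybe_set_region(tools, region_source, raster_input=None, vector_input=None):
    """Sketch of the provenance-based decision described above."""
    if region_source in ("user", "raster"):
        return  # Set explicitly or from a raster already; leave it alone.
    if raster_input:
        if region_source == "vector":
            # Keep the vector-derived extent, add resolution and alignment.
            tools.g_region(align=raster_input)
        else:
            tools.g_region(raster=raster_input)
        # Either call would now store source=raster.
    elif vector_input and region_source == "default":
        tools.g_region(vector=vector_input)  # would store source=vector
```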
StandaloneTools doesn't need this. Even generally, this can be done by checking the timestamp or the content. However, it would be a way to implement the same behavior in different places, possibly without using the same API.
> When it comes to the options of configuration - the only option I would see useful is having something like `tools1 = StandaloneTools(project_id=project1)` with some default `tmp_project`.
If I understand you correctly, the feature makes sense to me, and is already there. StandaloneTools can use an existing session:
import grass.script as gs
import grass.experimental
from grass.experimental.standalone_tools import StandaloneTools

# No project and no session exist yet.
gs.create_project("project1")
with gs.setup.init("project1") as session:
    tools = StandaloneTools(session=session)

# With an existing session, but in a separate mapset.
with grass.experimental.TemporaryMapsetSession() as session:
    tools = StandaloneTools(session=session)
> So basically it is creating a fluent interface in Python?
It seems to me that method chaining is a big part of a fluent interface. I don't use method chaining here because the point is to return data when appropriate, which method chaining prevents. Also, the methods here don't really modify the object, which is what the methods in a fluent interface do (here they modify the project, so that's more of a side effect). Here, the tools object is an interface to functionality. In OOP terms, this could be a facade, providing a front face to complex underlying code consisting of multiple components. Especially the NumPy piece is trying to follow functional programming ideas more than OOP, with the object being a necessary vehicle for providing a good interface (tools as function names), cutting some overhead (at minimum the session setup), and possibly allowing for configuration.
> When it comes to the options of configuration - the only option I would see useful is having something like `tools1 = StandaloneTools(project_id=project1)` with some default `tmp_project`.
>
> If I understand you correctly, the feature makes sense to me, and is already there. StandaloneTools can use an existing session:
>
>     # No project and no session
>     gs.create_project("project1")
>     with gs.setup.init("project1") as session:
>         tools = StandaloneTools(session=session)
>
>     # With an existing session, but in a separate mapset.
>     with grass.experimental.TemporaryMapsetSession() as session:
>         tools = StandaloneTools(session=session)
Thanks, it is the latter one. The first one looks terrible to me as it uses both gscript and tools.
## Use of NumPy array IO with the standalone tools API

The combination of NumPy array IO (from #5878) with the standalone tools API (from #5843, this PR) allows using tools with NumPy arrays without a project:
import numpy as np

from grass.experimental.standalone_tools import StandaloneTools
tools = StandaloneTools()
slope = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
### Complications with computational region

With how StandaloneTools is implemented now, the following will fail because the initially set region will be incompatible with the array size in the second call (see option 1 in the region comment above):
import numpy as np

from grass.experimental.standalone_tools import StandaloneTools
tools = StandaloneTools()
slope1 = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
slope2 = tools.r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)
One way to avoid this is to provide a parameter to StandaloneTools, like StandaloneTools(refresh_region=True). Another way is to use multiple instances:
import numpy as np

from grass.experimental.standalone_tools import StandaloneTools
slope1 = StandaloneTools().r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
slope2 = StandaloneTools().r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)
Creating an instance for each call and having that instance immediately forgotten does not look that great.
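For comparison, this is what the hypothetical refresh_region parameter mentioned above would look like in the failing example (the parameter is proposed, not implemented):

```python
import numpy as np

from grass.experimental.standalone_tools import StandaloneTools

# refresh_region is the proposed parameter: the region would be
# re-derived from the inputs of each call.
tools = StandaloneTools(refresh_region=True)
slope1 = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
slope2 = tools.r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)
```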
### Evaluating length of user code

One could also argue that, in the case of NumPy arrays, plain functions are preferable to calling a tool as a method of an object, because even a single call still requires creating the object beforehand or in the same statement, as in these two examples:
import numpy as np

from grass.experimental.standalone_tools import StandaloneTools
tools = StandaloneTools()
slope = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
import numpy as np

from grass.experimental.standalone_tools import StandaloneTools
slope = StandaloneTools().r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)
### Shortcut object in the library

We could create a StandaloneTools object at the Python module level so that users can import it. This would be similar to grass.pygrass.modules.shortcuts (hence calling it a shortcut here). In the library, we would have:
# grass/experimental/standalone_tools.py
tools = StandaloneTools(refresh_region=True, keep_data=False, use_one_project=False)
And then the user code would be:
# myscript.py
import numpy as np

from grass.experimental.standalone_tools import tools
slope = tools.r_slope_aspect(elevation=np.ones((5, 5)), slope=np.ndarray)
This would exist alongside the option to create one or more StandaloneTools objects, and it would likely have a different configuration (independent region, no data preserved, for truly standalone runs). The result would be possible confusion due to yet another option and some inconsistency, but it might be the best way to provide such an API because it results in the simplest user code.
Note: the dependency #2923 has since been merged.