
Support binding a callable with a baked command

supriyo-biswas opened this issue 3 years ago · 11 comments

First of all, I'd like to say thanks for this module, it's a lifesaver when working with stuff that only provides command line tools.

Sometimes, I'd like to bind a callable so that I can easily process the output of a command without repetitive boilerplate.

As an example, I'm currently working with the gcloud CLI, where I do stuff like this:

iam = sh.gcloud.iam.bake(format='json')
list_of_roles = json.loads(str(iam.roles.list(show_deleted=True, project='my-project')))

Having to do stuff like json.loads(str(...)) for every invocation gets repetitive quickly, and it also makes stuff a little more difficult to reason about.

Therefore, I'd like to have a feature to bind a callable such that callable(RunningCommand()) is returned. Usage of the API might look like this:

iam = sh.gcloud.iam.bake(format='json', _callable=lambda x: json.loads(str(x)))
list_of_roles = iam.roles.list(show_deleted=True, project='my-project') # returns a list
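Until something like _callable exists, the repetition can at least be factored into a small wrapper. A minimal sketch, where the jsonify helper and the stand-in command are hypothetical and not part of sh:

```python
import json

def jsonify(cmd):
    """Wrap a command-like callable so every call is JSON-decoded."""
    def run(*args, **kwargs):
        return json.loads(str(cmd(*args, **kwargs)))
    return run

# Stand-in for a baked sh command such as sh.gcloud.iam.roles.list
def fake_roles_list(**kwargs):
    return '[{"name": "roles/viewer"}]'

list_roles = jsonify(fake_roles_list)
print(list_roles(show_deleted=True))  # [{'name': 'roles/viewer'}]
```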

supriyo-biswas avatar May 27 '22 17:05 supriyo-biswas

Thanks for bringing it up. I can relate to your issue, and have had a similar experience with the repetition.

There is another feature I have been thinking about, called "shims" that would do something similar to what you describe. Basically, anyone in the community could write a "shim package" plugin, named something like sh-shim-gcloud. It would contain input and output shims that could be hooked into the underlying commands.

So for example, if you had the sh-shim-gcloud package installed, you would be able to call sh.gcloud.iam.roles.list(...) and if the format='json' arg was detected by the shim, it would automatically deserialize to python objects. Similarly, it could detect format='yaml'.

It's a very general solution, but it would allow community members to add their own shim packages, and sh would pick them up and use them automatically.
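To make the shim idea concrete, here is a purely illustrative sketch; sh has no such plugin API today, and every name below is hypothetical:

```python
import json

OUTPUT_SHIMS = {}  # command name -> output shim function

def register_output_shim(command, shim):
    OUTPUT_SHIMS[command] = shim

def gcloud_shim(argv, raw_output):
    # Detect the requested format and deserialize accordingly.
    if "--format=json" in argv:
        return json.loads(raw_output)
    return raw_output

register_output_shim("gcloud", gcloud_shim)

def run_with_shims(command, argv, raw_output):
    """What a host like sh could do after a command finishes."""
    shim = OUTPUT_SHIMS.get(command)
    return shim(argv, raw_output) if shim else raw_output

print(run_with_shims("gcloud", ["--format=json"], '[{"name": "roles/viewer"}]'))
# [{'name': 'roles/viewer'}]
```

A real shim package would presumably register itself via an entry point so sh could discover it automatically.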

We can leave this issue open to invite more discussion around this topic.

amoffat avatar May 27 '22 19:05 amoffat

I have a feeling we're special-casing something that could be solved with standard Python customization patterns. With a small patch to sh you could do:

import json
import sh

class MyRunningCommand(sh.RunningCommand):
    def json(self):
        return json.loads(str(self))

sh.Command.RunningCommandCls = MyRunningCommand

echo = sh.Command("echo")
res = echo('{"a": 123}').json()
print(res, type(res))  # {'a': 123} <class 'dict'>

There's a small caveat in that you need to use the sh.Command() indirection, due to the way SelfWrapper handles imports.

The nice thing is that it allows customization of other aspects of RunningCommand, including things we haven't thought of yet.
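As an illustration of "other aspects", a subclass could add further convenience accessors that rely only on __str__. With sh installed one could write `class MyRunningCommand(JsonMixin, sh.RunningCommand): pass`; the FakeRunningCommand below is a stand-in so the sketch runs anywhere:

```python
import json

class JsonMixin:
    def json(self):
        return json.loads(str(self))

    def lines(self):
        return str(self).splitlines()

# Stand-in for sh.RunningCommand: only __str__ matters to the mixin.
class FakeRunningCommand:
    def __init__(self, out):
        self._out = out

    def __str__(self):
        return self._out

class MyRunningCommand(JsonMixin, FakeRunningCommand):
    pass

res = MyRunningCommand('{"a": 123}')
print(res.json())   # {'a': 123}
print(res.lines())  # ['{"a": 123}']
```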

The patch, for reference:

--- a/sh.py
+++ b/sh.py
@@ -1157,6 +1157,7 @@ class Command(object):
     RunningCommand object, which represents the Command put into an execution
     state. """
     thread_local = threading.local()
+    RunningCommandCls = RunningCommand
 
     _call_args = {
         "fg": False,  # run command in foreground
@@ -1517,7 +1518,7 @@ class Command(object):
         if output_redirect_is_filename(stderr):
             stderr = open(str(stderr), "wb")
 
-        return RunningCommand(cmd, call_args, stdin, stdout, stderr)
+        return self.__class__.RunningCommandCls(cmd, call_args, stdin, stdout, stderr)

ecederstrand avatar May 31 '22 08:05 ecederstrand

@supriyo-biswas I just committed the above patch. Would my suggested solution work for you?

ecederstrand avatar Aug 04 '22 11:08 ecederstrand

With the second PR, the standard case without the sh.Command() indirection also works:

import json
from sh import RunningCommand, Command, echo


class MyRunningCommand(RunningCommand):
    def json(self):
        return json.loads(str(self))


Command.RunningCommandCls = MyRunningCommand

res = echo('{"a": 123}').json()
print(res, type(res))  # {'a': 123} <class 'dict'>

ecederstrand avatar Aug 04 '22 14:08 ecederstrand

There are other commands for which I'd still like to get the raw output. If I understand correctly, the current proposal is an all-or-nothing deal?

supriyo-biswas avatar Aug 04 '22 15:08 supriyo-biswas

Nope, this will work just like before in the normal case. But if you want the output transformed to JSON, you call the .json() method.

ecederstrand avatar Aug 05 '22 05:08 ecederstrand

In my opinion, the .json() method is still repetitive boilerplate; my aim for this feature is something that feels natural, as if the externally called command were part of Python itself.

However, I would understand if you want to take the project in a different direction. The PRs that I made continue to work for me :)

supriyo-biswas avatar Aug 05 '22 16:08 supriyo-biswas

Coming from C# and Java, I'm not sure I would call the 7 characters in .json() "boilerplate", compared to the alternative _callable=.... Anyway, if you really want to avoid the extra method call, you could have your subclass autodetect the output format. Note that __str__ must return a string, so the detection lives in a property here:

import json
from sh import RunningCommand, Command, some_command

class MyRunningCommand(RunningCommand):
    @property
    def data(self):
        res = super().__str__()
        # self.cmd is the argv list sh actually ran, as bytes
        if b'--format=json' in self.cmd:
            return json.loads(res)
        return res


Command.RunningCommandCls = MyRunningCommand

res = some_command('{"a": 123}', format='json').data
print(res, type(res))

ecederstrand avatar Aug 08 '22 05:08 ecederstrand

@ecederstrand one concern I have is manipulating the global Command.RunningCommandCls from potentially different places. Not only as a race condition, but with different submodules competing and overwriting that class type. There would need to be a way to scope that change locally. We have at least one mechanism for scoping sh objects (execution contexts), but it will add to the complexity.
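One way to keep such a change locally scoped is a context manager that swaps the attribute and restores it on exit. A hypothetical sketch (running_command_cls and FakeCommand are not part of sh, and this is still not thread-safe, which is part of the concern above):

```python
from contextlib import contextmanager

@contextmanager
def running_command_cls(command_cls, new_cls):
    # Temporarily replace the RunningCommand class, restoring the old one
    # even if the body raises.
    old = command_cls.RunningCommandCls
    command_cls.RunningCommandCls = new_cls
    try:
        yield
    finally:
        command_cls.RunningCommandCls = old

# Demonstrated with a stand-in for sh.Command:
class FakeCommand:
    RunningCommandCls = object

with running_command_cls(FakeCommand, dict):
    print(FakeCommand.RunningCommandCls)  # <class 'dict'>
print(FakeCommand.RunningCommandCls)      # <class 'object'>
```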

Revisiting this thread again, @supriyo-biswas your original suggestion is ergonomic but the complication with it is that output from a subprocess is not produced at once... it is streamed in asynchronously, in different chunk sizes, and put into different buffers. The pseudocode from your original suggestion implies that all of the data has been received and the process has ended, but it would need to account for the streaming cases somehow. Do you have some thoughts on how this post processor should handle the streaming case?
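To make the streaming question concrete, here is one naive, hypothetical post-processor shape: buffer every chunk and apply the transform only at process exit, which of course gives up incremental processing entirely:

```python
import json

class BufferedPostProcessor:
    def __init__(self, transform):
        self.transform = transform
        self.chunks = []

    def feed(self, chunk):
        # Called once per output chunk, in arrival order.
        self.chunks.append(chunk)

    def done(self):
        # Called when the process has exited and output is complete.
        return self.transform("".join(self.chunks))

pp = BufferedPostProcessor(json.loads)
for chunk in ['[{"a": ', '1}]']:
    pp.feed(chunk)
print(pp.done())  # [{'a': 1}]
```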

amoffat avatar Aug 10 '22 05:08 amoffat

@amoffat I understand your concern and agree that messing with global state is problematic.

It is actually possible to change the RunningCommandCls of just the execution context:

import sh

sh2 = sh()

class A(sh.RunningCommand):
    def __str__(self):
        return f"In {self.__class__}"

class B(A):
    pass

sh.Command.RunningCommandCls = A
sh2.Command.RunningCommandCls = B
print(sh.echo())
print(sh2.echo())

which outputs:

In <class '__main__.A'>
In <class '__main__.B'>

ecederstrand avatar Aug 10 '22 07:08 ecederstrand

@ecederstrand Oh, great! That solves my concern. It sounds like this issue is solved then, yeah? @supriyo-biswas if you'd like to work on your original suggestion, feel free to start a PR. I think there is no reason why both solutions can't exist.

amoffat avatar Aug 12 '22 17:08 amoffat