Re-thinking the command substitution operators
Abstract
This is a proposal for redesigning the command substitution (a.k.a. command pipeline) mechanisms in xonsh.
A simple example for command substitution in bash:
echo hi $(whoami)
NOTE: This example is very basic and naïve. Please look at the usage examples below.
Motivation
Command substitutions have a range of usage scenarios, and we want them all to be easy and concise. These usages differ by two factors - stripping and splitting.
- Stripping: Most Unix and Windows CLI commands append newlines to their output, which we might want to strip.
- Splitting: A program might output multiple values separated by spaces, newlines, tabs or other tokens, which we might want to split to multiple arguments.
Some of the more common combinations aren't easily expressed in xonsh currently.
Examples for each scenario (commands with the corresponding output format):
| | Strip trailing newline | No stripping | Strip all whitespace |
|---|---|---|---|
| No splitting | whoami, uname, hostname, nproc | echo -n ... \| wc -l | |
| Splitting by newline | find, git branch/log/diff --name-only | | |
| Splitting by whitespace | Meaningless | Meaningless | groups, yay -Qdtq, pkg-config |
| Splitting by other token | parsing /etc/passwd | | |
Current xonsh behavior
| Operator | Strip | Split | Return value | Mode |
|---|---|---|---|---|
| $() | No | No | String | Both |
| !() | No | No | CommandPipeline object | Python |
| @$() | Whitespace | Whitespace | List of args | Command |
| $[] | No | No | None | Both |
| ![] | No | No | CommandPipeline object | Both |
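As a rough illustration of the table above, the following plain-Python sketch (not xonsh itself) mimics what $() and @$() currently hand back. The helper name `capture` and the stand-in child command are assumptions made for this example only.

```python
import subprocess
import sys


def capture(argv):
    """Return the child's stdout untouched, similar in spirit to xonsh's $()."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# A stand-in command that prints two words followed by a newline, like most CLIs.
cmd = [sys.executable, "-c", "print('hello world')"]

raw = capture(cmd)   # no strip, no split: the trailing newline is kept
args = raw.split()   # whitespace strip and split, comparable to @$()

print(repr(raw))
print(args)
```

The only difference between the two results is the post-processing applied to the same captured string, which is what the rest of this proposal is about.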
Current state and comparisons with other shells
The first three options are the more common ones according to our current examples.
| Strip | Split | Xonsh | Bash | Zsh | Fish |
|---|---|---|---|---|---|
| Trailing newlines | No | @($(CMD).rstrip('\n')) | - | "$(CMD)" | - |
| Trailing newlines | Newlines | @($(CMD).splitlines()) | - | - | (CMD) |
| Whitespace | Whitespace | @$(CMD) | $(CMD) | $(CMD) | (CMD \| string split ' ' --no-empty) |
| No | No | $(CMD) | - | - | - |
| Whitespace | No | @($(CMD).strip()) | "$(CMD)" | - | - |
| No | Newlines | @($(CMD).split('\n')) | - | - | - |
| Whitespace | Newlines | @($(CMD).strip().split('\n')) | - | - | (CMD \| string trim \| string split ' ' --no-empty) |
| Trailing newlines | Token | @($(CMD).rstrip('\n').split(TOKEN)) | - | - | (CMD \| string split TOKEN --no-empty) |
| No | Token | @($(CMD).split(TOKEN)) | - | - | - |
| Whitespace | Token | @($(CMD).strip().split(TOKEN)) | - | - | (CMD \| string split TOKEN --no-empty \| string trim) |
NOTE: As far as I can tell, none of the shells has an option to only strip the single last newline.
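To make the strip/split combinations in the table concrete, here is a small plain-Python sketch. The function `substitute` and its policy names are hypothetical and exist only for this example; they are not part of xonsh.

```python
def substitute(output, strip="no", split="no", token=None):
    """Apply one strip policy and one split policy to captured output."""
    strippers = {
        "no": lambda s: s,
        "newlines": lambda s: s.rstrip("\n"),  # trailing newlines only
        "whitespace": str.strip,               # leading and trailing whitespace
    }
    splitters = {
        "no": lambda s: s,
        "newlines": lambda s: s.split("\n"),
        "whitespace": str.split,
        "token": lambda s: s.split(token),
    }
    return splitters[split](strippers[strip](output))


sample = "root:x:0\nuser:x:1000\n"

print(repr(substitute(sample, strip="newlines")))
print(substitute(sample, strip="newlines", split="newlines"))
# Splitting without stripping leaves an empty last element behind.
print(substitute(sample, split="newlines"))
print(substitute("root:x:0\n", strip="newlines", split="token", token=":"))
```

Keeping the two policies as separate arguments mirrors the two axes of the table, so every row corresponds to one call.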
Current proposals
XEP-2 (by @anki-code)
See details and examples in the XEP - https://github.com/anki-code/xonsh-operators-proposal/blob/main/XEP-2.rst. In short:
| Mode | Current | Proposed |
|---|---|---|
| Python | $() returns output string. | $() returns CommandPipeline object. |
| Subproc | $() returns output string. | $() returns list of lines from output without trailing new line symbol. |
| Python | $[] returns None. | $[] returns HiddenCommandPipeline object. |
| Both | !() exists. | !() removed. |
| Both | ![] exists. | ![] removed. |
CommandPipeline (CP) class changes:
- Add a str representation that is the same as CP.out.
- Remove trailing new lines in CP.out, CP.lines and CP.__iter__.
- Add all string methods, i.e. $().split() will return CP.out.split(), which is in effect an analogue of IFS.
- Add all string methods for lines, i.e. $().lines_find(txt) will return [l.find(txt) for l in CP.lines].
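The class changes listed above could look roughly like the following sketch. `PipelineSketch` is a hypothetical stand-in written for this example; it is not the real CommandPipeline class and only models the four bullet points.

```python
class PipelineSketch:
    """Hypothetical model of the CommandPipeline changes listed in XEP-2."""

    def __init__(self, raw):
        self.raw = raw

    @property
    def out(self):
        # Trailing newlines are removed from the captured output.
        return self.raw.rstrip("\n")

    @property
    def lines(self):
        return self.out.splitlines()

    def __iter__(self):
        return iter(self.lines)

    def __str__(self):
        return self.out

    def __getattr__(self, name):
        # Only called for attributes that are not defined above.
        if name.startswith("lines_"):
            method = name[len("lines_"):]
            if hasattr(str, method):
                # Apply the string method to every line.
                return lambda *a, **kw: [getattr(line, method)(*a, **kw) for line in self.lines]
        elif not name.startswith("_") and hasattr(str, name):
            # Delegate plain string methods to the stripped output.
            return getattr(self.out, name)
        raise AttributeError(name)


cp = PipelineSketch("alpha beta\ngamma\n")
print(str(cp) == cp.out)
print(cp.split())
print(cp.lines_find("a"))
```

Delegating through `__getattr__` is just one way to expose every string method; the XEP itself does not prescribe an implementation.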
This issue's initial proposal with influence from @adqm
- Strip trailing newline in $().
- Strip trailing newlines in !().__iter__ (i.e. use .splitlines()).
Pros:
- No type-changing back-compatibility issues.
- !() is consistently iterated through.
- Splitting by whitespace remains unchanged (@$()).
- The complete output is still available easily via !().out.
Cons:
- Python code that uses $() might need to change if the trailing newline is important.
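The two changes in this proposal can be modelled with plain string operations. This is only a sketch of the intended results on a sample string; it does not touch xonsh internals, and the variable names are illustrative.

```python
raw = "main\ndev\n"   # stand-in for the complete captured output

# Iterating while keeping line endings: every item still carries its newline.
with_endings = raw.splitlines(keepends=True)

# Iterating as proposed for !().__iter__: line endings are removed.
without_endings = raw.splitlines()

# $() as proposed: still one string, with the trailing newline removed.
substituted = raw.rstrip("\n")

print(with_endings)
print(without_endings)
print(repr(substituted))
```

The complete output (`raw` here) stays available unchanged, which corresponds to the !().out escape hatch listed under the pros.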
https://github.com/laloch/xonsh/commit/98363df (by @laloch)
Add @$[] operator to split by newlines.
Pros:
- Completely backwards-compatible.
Cons:
- More arbitrary syntax.
- Doesn't really extend @$() and $[] in a meaningful way.
- Still doesn't solve the most common No split, Strip newline scenario.
Rejected proposals
Initial proposal in this issue
Run .rstrip('\n') in $().
Cons:
- No easy way to use the complete output if a user needs it.
Add a config option to strip $()
Cons:
- This is a case where configuration would be bad, because xonsh scripts would either run or fail based on how the environment was configured.
For community
⬇️ Please click the 👍 reaction instead of leaving a +1 or 👍 comment
Hi @daniel-shimon, this has been discussed (and rejected) several times already. The reasoning, if I remember correctly, was that the trailing newline is part of the captured output and therefore should be kept as is. What if the terminating newline character is significant to your pipeline? For instance echo "foo" | wc -l vs. echo "foo\n" | wc -l? What about outputs terminated by several newline characters? How many of them would you want to strip? rstrip(...) strips all of them...
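The concern about significant newlines can be checked with plain Python strings. The helper `strip_one_newline` is a hypothetical name used only here; it shows the "remove at most one newline" alternative next to rstrip.

```python
one = "foo\n"     # output ending with a single newline
two = "foo\n\n"   # output ending with an extra empty line

# Counting newline characters, as wc -l does, tells the two apart.
print(one.count("\n"), two.count("\n"))

# rstrip("\n") removes every trailing newline, so both collapse to the same string.
print(one.rstrip("\n") == two.rstrip("\n"))


def strip_one_newline(text):
    """Remove at most one trailing newline and keep any others."""
    return text[:-1] if text.endswith("\n") else text


# Stripping a single newline keeps the difference visible.
print(repr(strip_one_newline(one)), repr(strip_one_newline(two)))
```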
Btw, I use git rebase -i @$(git merge-base HEAD master) several times a day and it does its thing just fine :wink:
There's a little trick using abbrevs: You can define abbrevs["$$"] = '@($(<edit>).rstrip("\\n"))'.
I understand what you're saying but it still doesn't feel very helpful. I'm talking of course only about subprocesses which are substituted inside another command, in which case I can't really see how that's helpful.
In any case I guess this issue is a pretty opinionated thing if it has been discussed so many times. Do you think adding a config option like $XONSH_STRIP_SUBPROC would be acceptable?
> In any case I guess this issue is a pretty opinionated thing if it has been discussed so many times.
Yeah, it really is. The fundamental issue is that many CLIs add a trailing newline without detecting if they are connected to a TTY or not. Bash and other shells effectively add an rstrip operation when capturing (or really a split operation), effectively deciding what the output of subprocesses should be.
I personally think that a lot of Bash's syntax and magic is particularly confusing to new users.
> Do you think adding a config option like $XONSH_STRIP_SUBPROC would be acceptable?
I wish it was. I think this is a case where configuration would be bad, because xonsh scripts would either run or fail based on how the environment was configured. It seems like it could cause a lot of downstream bugs. So I think that this is a scenario where a strong decision is needed.
> I'm talking of course only about subprocesses which are substituted inside another command
Now this is an interesting idea. I am open to considering it. To date, all operators work exactly the same independent of their calling context. Things that would need to be fleshed out for this proposal:
- What happens with the other subprocess operators depending on their calling modes: ![], !(), $[]
- What do we do with the @$() operator?
I do like to keep an open mind, though! So I can be convinced away from the above opinions.
@scopatz what if we create a distinct operator for this?
> what if we create a distinct operator for this?
That is what @$() was supposed to be :wink: I am in favor of removing syntax at this point, rather than adding. I really want to avoid the issues that Bash has where there are so many specialized syntax elements that no one can remember them. Xonsh should be easy for people to learn and use and consistent in its implementation.
If anything I think we should either fully implement @$() or remove it. The initial idea for @$() was that you could register transformation functions (like a decorator) that would modify output. For example, @upper$() would uppercase the output. Or you could apply many of them, like @split@upper$(). Then what we have now would just be the default value: @split$() == @$().
However, this was never fully done, so maybe it is better to drop the syntax entirely.
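The registered-transformation idea described above was never implemented, so the following is only a guess at how it might behave. The `TRANSFORMS` registry, the `apply_operator` function, and the right-to-left application order (borrowed from stacked Python decorators) are all assumptions made for this sketch.

```python
import re

# Hypothetical registry of named output transformers.
TRANSFORMS = {
    "split": str.split,
    "upper": str.upper,
    "strip": str.strip,
}


def apply_operator(operator, output):
    """Apply an operator prefix such as '@split@upper$()' to captured output.

    Names are applied right to left, like stacked decorators, and a bare
    '@$()' falls back to 'split'.
    """
    prefix = operator[: operator.index("$")]
    names = [name or "split" for name in re.findall(r"@(\w*)", prefix)]
    result = output
    for name in reversed(names):
        result = TRANSFORMS[name](result)
    return result


print(apply_operator("@$()", "a b\n"))
print(repr(apply_operator("@upper$()", "hi\n")))
print(apply_operator("@split@upper$()", "a b\n"))
```

With such a registry, the current @$() would simply be the default entry, which matches the equivalence @split$() == @$() mentioned in the comment.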
> I do like to keep an open mind, though! So I can be convinced away from the above opinions.
Good to hear! I was fearing this is a dead end.
- I agree with @anki-code that @$() should remain as-is since it's very xonsh-specific in nature and the ability to spread arguments is pretty cool
- I think the other subprocess operators shouldn't change as well, since if we're using them we're probably doing something more advanced than just command substitution.
The basic gist is that $() is used daily for command substitution, and keeping the trailing newlines isn't robust enough for day-to-day use (e.g. when dealing with paths you have to use @$(), which is cumbersome and won't work for paths with spaces).
If we're using $() in more complex python code, we probably want to be accurate, and the command line(s) will probably be bigger, such that adding a .rstrip() is more legitimate
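The path problem mentioned above is easy to reproduce with plain strings. The path below is made up for the example.

```python
raw = "/home/me/My Documents\n"   # stand-in for a command that prints one path

as_is = raw                   # the trailing newline becomes part of the path
by_whitespace = raw.split()   # the path is broken into two arguments
stripped = raw.rstrip("\n")   # one intact argument

print(repr(as_is))
print(by_whitespace)
print(repr(stripped))
```

Only the last form yields a single usable argument, which is the No split, Strip newline case.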
I already have @$[] operator here long forgotten, that splits the captured output on newlines only, yielding
$ showcmd @$[echo "space here\nbreak here"]
['space here', 'break here']
instead of
$ showcmd @$(echo "space here\nbreak here")
['space', 'here', 'break', 'here']
I can open a PR if there's general interest in the functionality. I do, however, agree with @scopatz in that we should not add more syntax for no good reason.
@laloch could you please share your PR? I want to understand the process of creating additional operators =)
Another option is to only have $() strip the trailing newlines, while !() doesn't.
This way $() is robust for day-to-day command substitution but for full accuracy one can use !().out.
This resonates with $() being intended for simple usages (you can't use it to get the exit code etc), so I think adding the rstrip to make it more robust and useful is pretty sound
Yeah I agree about not adding more syntax in this area
> could you please share your PR? I want to understand the process of creating additional operators =)
Yeah, of course. Here you go: laloch/xonsh@98363df. It's pretty trivial, since it copies almost everything from @$().
I like this logic when I think about users coming from the bash world and diving quickly into xonsh. It looks consistent.
It's not just for users coming from a different shell, it's really more useful and intuitive since you want information from some standard unix command (e.g. whoami, uname, nproc etc), and they all append newlines to their outputs.
Here's another example from a classical daily shell use - make -j $(nproc) fails. Yeah, you could use @$() but it doesn't really express what you want (stripping, not splitting), and I don't think this kind of simple classic shell usage should cost a lot of syntax overhead.
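To see what the called program actually receives in such a case, this sketch passes a value with a trailing newline to a child process that echoes its first argument. The child is a small Python one-liner standing in for make, and the value "8\n" stands in for the raw output of nproc; both are assumptions for the example.

```python
import subprocess
import sys

# Child process that reports exactly what it received as its first argument.
child = [sys.executable, "-c", "import sys; print(repr(sys.argv[1]))"]


def seen_by_child(arg):
    """Run the child with one argument and return its report."""
    result = subprocess.run(child + [arg], capture_output=True, text=True, check=True)
    return result.stdout.strip()


count = "8\n"   # stand-in for unstripped command output

print(seen_by_child(count))               # the newline arrives as part of the argument
print(seen_by_child(count.rstrip("\n")))  # the stripped value arrives clean
```

Whether a program tolerates the extra newline is up to that program, which is why an unstripped substitution can fail for some commands and pass unnoticed for others.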
I have only one concern #3394 :)
#3394 is a bug - sometimes you need to call !(cmd).end() in order to get the resulting CommandPipeline attributes populated.
My main point here is that if we are going to make one subprocess operator modal (ie Python vs Subprocess), it is worth thinking about how all of them should behave in a modal fashion.
Also, as part of this, we should decide whether we can deprecate @$() or whether we should fully implement it, and how this operator would behave modally too.
@anki-code's suggestion here gets close to that, but is probably not the complete answer.
I guess what I want to see is a table with all 5 subprocess operators and how they would behave (ie what their return values would be) in both Python and subprocess mode. I think this would be a good exercise because such a tutorial would need to make it into the docs eventually anyway, if we proceeded down this path.
The PR - https://github.com/xonsh/xonsh/pull/3926
@anki-code if I understand correctly with your suggestion $() isn't modal?
Meaning $() always returns the rstripped output, and !() returns the complete output in subprocess mode and a CommandPipeline object in python mode?
This sounds like a pretty good idea to me - !() is used for fine-grained usages and $() just works for substitutions.
@scopatz In this suggestion, the !() operator doesn't really act differently with respect to the current mode, it just returns a helpful thing in subprocess mode and returns the complete CommandPipeline object in python mode.
I've edited the description of the solution.
@anki-code very nice!
Can you clarify that the behavior should be .rstrip('\n') like in other shells?
Also can you add the make -j $(nproc) example?
In my opinion, $() should return a string in both Python & subprocess mode, not a list. The other operators are free to return various other objects
@scopatz could you give the arguments why?
Anyway, on the technical side, the original design was for $ to indicate that a built-in Python object was returned (str or None), while ! indicated a more rich object, now CommandPipeline. () indicated capturing, ie nothing would be printed to stdout. [] means that output flows through to the screen.
So in this light, $() means capture the output and return a basic Python object. It is supposed to be the same as the Bash operator $().
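The naming scheme described above can be summarised as a tiny lookup. The `describe` function is hypothetical and only encodes the design as stated in this comment ($ for a basic Python object, ! for a rich object, () for captured output, [] for output that flows to the screen).

```python
def describe(operator):
    """Decode a two-part operator such as '$()' or '![]'."""
    sigil, brackets = operator[0], operator[1:]
    captured = brackets == "()"   # [] lets the output flow through to the screen
    if sigil == "$":
        returns = "str" if captured else "None"
    else:
        returns = "CommandPipeline"
    return {"captured": captured, "returns": returns}


for op in ("$()", "$[]", "!()", "![]"):
    print(op, describe(op))
```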
Now, I think there might be some confusion about what Bash does. In Bash, because everything is a string*, the $() just returns a string of stdout. I am not even sure that Bash strips the trailing newline. However, in most Bash usage, the Internal Field Separator $IFS variable actually determines how text strings are split by the Bash parser. By default this is set to do whitespace stripping. So in effect, $() in Bash works like @($(cmd).split()) would in xonsh.
In my opinion, $IFS is a bad idea because it can lead to all sorts of strange behavior, like 3rd party scripts not running because the calling process has set this variable incorrectly. $IFS also makes $() seem like it is doing more than it is.
I think using $() in xonsh to split into a list of arguments is a neat idea, but it would necessitate the addition of some default or configurable way to split those arguments. For example, should $() be split by lines or by whitespace (like effectively what Bash does)?
Like with Bash, splitting $() into a list of str and being able to configure how this split happens would have the following issues:
- Some scripts would not be able to run with each other or
- Every script would need to be very aware of how $() was being split.
This is why I think that $ simple return operators like $() and $[] should pick one uniform way of returning and stick to that. It builds a consistent ecosystem of code that can be run with each other. The approach in xonsh so far has been to thus allow the users / authors to explicitly split however they want. @($(cmd).split()) may seem like more work to write, but it is less work to inspect and debug later on.
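The fragility of a configurable separator can be illustrated with a simplified model. The function `split_like_ifs` is a hypothetical helper that splits on any character of a separator string; it deliberately ignores the finer points of how Bash treats whitespace and non-whitespace characters in $IFS.

```python
import re


def split_like_ifs(output, ifs=" \t\n"):
    """Split output on any character in ifs and drop empty fields."""
    pattern = "[" + re.escape(ifs) + "]+"
    return [field for field in re.split(pattern, output) if field]


output = "My Documents:notes.txt\n"

# The same captured output yields different argument lists
# depending on a setting the script author may not control.
print(split_like_ifs(output))
print(split_like_ifs(output, ifs=":\n"))
```

An explicit call such as .split(":") in the script itself gives the same result regardless of how the caller's environment is set up.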
So while I keep an open mind on what these various operators should return, I think we should also try to learn from some of the bad design of other shells.
I hope this makes sense!
*Yes, you can make arrays, but it is hard and annoying to do so.
On the less technical side, I think there are a lot of things that we all want fixed before 1.0. This is another problem that probably should be resolved (one way or the other).
Speaking only for myself, xonsh is something that I can only code on during the nights and weekends. I have other things to do in my free time as well. Because I am trying to support all parts of xonsh (review all PRs, put out releases, fix fundamental issues with the project, etc), I don't have as much time as even I would like to spend on each issue.
There are other forks of xonsh out there. Which is great! Xonsh has made the process of creating high quality shells so much easier, that I am happy that people have made improvements in the direction that they want to go. Many years worth of effort have gone into xonsh from a fairly big group of people.
I am sorry if you feel development is too slow. It is going as fast as it possibly can. A single person working on their own is always able to work faster than a group trying to make decisions.
On the other hand, working to improve xonsh as a group, even though it is harder and slower, does improve the shell for everyone in the long run. If a change is important and valuable, it will make it in. But the bigger the effects of the change, like with subprocess syntax, the more people are going to need to weigh in on the pros/cons and the more forethought will need to be put into it.
I agree we should improve the efficiency of the discussion, and document all the arguments in a centralized way.
Following what you guys said, since this is such a major design decision in xonsh I suggest we take a step back from implementation details, gather concise examples, compare to other shells and document the pros-and-cons for each proposal.
I'll try to make this issue a mini-PEP for the subject. @scopatz if you prefer some other medium, a separate repo could be cool.
I've updated the description to express all the things we've talked about. @anki-code @laloch please let me know if I missed stuff and give us more examples so that I'll update the description.
Let's take a few days to get more examples, discuss more options and let more people from the community express their opinions :smile:
Chiming in since I was one of those who was spam-mentioned in the other thread and because I was involved in the discussion and implementation of these operators the first time around, though I've not had a horse in this race for several years...
(BTW, hi again, @scopatz et al! It's been a long time; I hope all is well!)
Years ago when this conversation started, I was a proponent of stripping a trailing newline from the output of $(...) but leaving the output as a string; and leaving !(...) as-is. And I'm still a proponent of that change, FWIW (was that the original proposal in this thread? I haven't been following the conversation).
I'm not crazy about the implementation in #3926, since $(...) returning a list just doesn't feel right to me. You can always do list(!(...)) if you want something like that. I'd rather see fewer keystrokes for common operations, and more keystrokes for less common things, so I think it makes the most sense for $(...) to return a string with trailing newline stripped, and !(...) to return whatever fancy object xonsh returns nowadays for that kind of operation. So I would like to see:
$ user = $(whoami)
# easy, and common, and (i'm pretty sure) matches bash in terms of what is captured...
# if not, it at least matches my expectation and doesn't surprise me when i switch between shells
$ user = !(whoami).out
# if you want the trailing newline for some reason
$ files = list(!(ls))
# if you want a list instead of a generator for some reason
This is the behavior I have in my fork, and I'm happy with it; I've never been surprised by what I get out of either of these operators, and examples like the make -j $(nproc) work just fine. I have never used the middle example (with the trailing newline), though I do loop over !(...) directly regularly, and I occasionally make a list out of it (as above) for later use.
I feel like this small change is also all that one needs to provide a relatively easy way to get just about anything you would want from a command. In Python mode:
- If you want the output as a string, you can do $(...), no need for any indexing, stripping, joining, etc
- If that deleted some whitespace that you wanted, you have to type a bit more, but you can do !(...).out
- If you want something to loop over the lines, you can do !(...)
Then, in terms of interpreting as arguments to a subprocess-mode command:
- If you want the whole result as a single argument, you can do command $(...)
- If you want the result as multiple arguments split by whitespace, you can do command @$(...)
- If you want the result as multiple arguments split by newlines, you can do command @(!(...))
I feel like this structure, on the whole, is less work for the more common operations than the current behavior, and also when compared to the structure where $(...) returns a list-like thing. And if you want something more complicated than the above, I feel like you should be able to build it from those primitives.
Basically, I feel like the current behavior is mostly consistent, and that the single small change of r-stripping newlines by default in $(...) reduces end-user surprise. I'm not sure the additional changes of the structure proposed in #3926 provide much additional benefit, particularly when weighed against the cost of such a big (and backwards-incompatible) change.
I don't have the time or the energy to jump into the fray here, though, other than to say that I am probably -1 on #3926 if I still get a vote, to agree with @daniel-shimon that these operators are indeed worth thinking about more, and to present my view of how I would want things to work.
@scopatz I am in favor of removing @$. We could find some pythonic ways to do it instead. Having too many operators leads to confusion. Python is as simple as one can get. We could use/abuse the pipe operator https://pypi.org/project/pipe/ 😄. We can discuss that in another issue I think.
We can blame bash all we want, but it is everywhere, even in Dockerfiles, CI configs, etc. Copying the syntax but not the behaviour leads to confusion and frustration. I myself copy-pasted code and expected it to run (my bad, it was in bash 😄). I understand that xonsh wants to establish high standards. But come on, bash is not going to be replaced anytime soon, and xonsh is primarily for interactive usage (though scripts can be in xonsh, I find myself using plain python+sh; it is way more maintainable and has a complete toolchain to lint, format, etc.). So why not make it more usable and less surprising?
@anki-code Having said that, it is not fair to add such disruptive behaviour to an old project and have it by default. Many people would have scripts that rely on the current behaviour. The best bet would be to make this a xontrib. Since this is core code, maybe it can be achieved by making it modular.
> Having said that, it is not fair to add such disruptive behaviour to an old project
@jnoortheen thanks for your attention and time! Backwards compatibility is not a strong argument, and xonsh is not so old (5 years of slow xonsh development vs 15 years of fast fish development vs 30 years of zsh/bash). Bash became ugly because it thinks about backwards compatibility every day.
Hi @adqm, thanks for the detailed explanation! A few questions:
> I think it makes the most sense for $(...) to return a string with trailing newline stripped
- Do you mean to exclusively strip one trailing newline or all of them?
> If you want the result as multiple arguments split by newlines, you can do command @(!(...))
- Currently each line returned here contains its trailing newline. In light of this discussion, don't you think these newlines should be stripped as well (like str.splitlines())?
> If that deleted some whitespace that you wanted, you have to type a bit more, but you can do !(...).out
- What about having !() return the entire output in command mode? I personally think it's inconsistent with @(!@()), but I want to hear your opinion.
I've added this proposal with the changes I think are necessary (strip newlines in iteration), feel free to improve on it
@anki-code I updated the description 👍
On another note:
> > Returning a list in python mode beats the simple intuitive meaning of $().
>
> Absolutely no. The $() is a substitution operator in other shells and it's absolutely expected that $() will substitute arguments list.
What I mean is that the splitting by lines is unintuitive and unfamiliar for the default $() implementation.
Most of our examples come from the upper-left corner of the usage scenario - No split, Strip newline.
In this regard, I feel that your proposal does too much magic for these cases. You're splitting by lines even though that's not what a user intends to do when using commands like uname.
The PR assumes that these commands will always contain one line, so the user's intent will align with OutputLines behavior.
This is the reason as of now I agree with @adqm the most.
We already have line splitting with !().__iter__.
If it strips the trailing newlines (e.g. uses .splitlines()), then in conjunction with $() stripping the trailing newline we will easily have all three major use cases:
- Strip trailing newline, no split: $()
- Strip and split by newline: @(!())
- Split by whitespace: @$()
@daniel-shimon I reviewed the description and found that the table "Strip | Split | Xonsh" is unclear. It just hides the examples and use cases. Examples added to XEP-2.