Don’t parse the pipeline as text when it is directed from an EXE to another EXE or file. Keep the bytes as-is.
Currently PowerShell parses STDOUT as a string when piping from an EXE, while in some cases it should be preserved as a byte stream, as in this scenario:
curl.exe http://whatever/a.png > a.png
or
node a.js | gzip -c > out.gz
Affected patterns include: `native | native`, `native > file`, and (maybe) `cat file | native`.
@vors @lzybkr
The current NativeCommandProcessor breaks:
- LF line endings.
- Non-ASCII text in UTF-8 output without a BOM.
- Binary file redirects (like curl.exe's output).
- `>` wraps text into 80 columns by default.
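To make the breakage concrete, here is a small round-trip illustration (in Python, since the byte-level mechanics are language-agnostic; the sample bytes are arbitrary) of what happens when a binary stream is decoded to text, split into lines, and re-encoded, which is effectively what a string-based pipeline does:

```python
import hashlib

# Arbitrary "binary" payload: a PNG-style header plus every byte value.
raw = b"\x89PNG\r\n\x1a\n" + bytes(range(256))

# What a text-based pipeline effectively does: decode lossily, split into
# lines, then re-encode. Invalid UTF-8 bytes become U+FFFD and CR/LF
# endings are normalized, so the bytes that come out are not the bytes
# that went in.
text = raw.decode("utf-8", errors="replace")
rejoined = "\n".join(text.splitlines()).encode("utf-8")

print(hashlib.sha1(raw).hexdigest() == hashlib.sha1(rejoined).hexdigest())  # False
```

Any binary consumer downstream (a file, gzip, an image decoder) sees the corrupted bytes.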
Maybe add a cmdlet/operator to call native command and get its raw output (as a byte array / stream?), something like this:
# Suppose the ^& operator is an alias for Get-CommandRawOutputStream; this is just example syntax
$output = ^& curl.exe http://whatever/a.png # $output now is a byte array or stream
$output > C:\Temp\file.png # file.png now is a valid image file
# This should be valid, too:
^& curl.exe http://whatever/a.png > C:\Temp\file.png
This opens an opportunity for some additional usage patterns (you can put this raw content into variables, and pipe raw content from native commands to managed cmdlets).
But maybe we could add a special kind of redirection operator (like 2>&1, 3>&1, *>&1 we already have), something like this (where %>&1 is a new redirection operator that redirects command "raw output" without processing it as a string):
$output = curl.exe http://whatever/a.png %>&1
$output > C:\Temp\file.png
# Or even this:
curl.exe http://whatever/a.png %> C:\Temp\file.png # which is just awesome
Overall: I don't think that this kind of redirection should be tied to only native commands or some limited list of usage patterns (e.g. native | native).
@ForNeVeR My proposal is:
- For `native | native`, keep the bytes as-is. This is already proposed by @vors.
- For `ps | native`, add a set of cmdlets that encode PS objects into bytes, perhaps `ps | encode-text utf-8 | native`.
- For `native | ps`, we can use the type system to identify whether a cmdlet accepts "raw input". Cmdlets like `out-file` or maybe `decode-text` would keep the bytes from `native`, while other cmdlets would use the parsed string as their input.
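The hypothetical encode-text stage above would amount to an explicit, user-chosen serialization from pipeline strings to bytes, roughly like this Python sketch (the function name and behavior are assumptions, not an existing API):

```python
def encode_text(lines, encoding="utf-8", newline=b"\n"):
    """Hypothetical encode-text stage: serialize pipeline strings into a
    single byte stream with an explicit encoding and line terminator,
    instead of relying on an implicit Out-String conversion."""
    return newline.join(s.encode(encoding) for s in lines) + newline

print(encode_text(["hello", "world"]))  # b'hello\nworld\n'
```

The point is that the user picks the encoding and terminator explicitly, rather than the shell guessing.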
@be5invis okay, it seems like this proposal also supports all the relevant use cases I can imagine.
Shouldn't this go through an RFC, since it is a breaking change (it changes the observed behaviour)?
A workaround for this is to provide a cmdlet that stores the content in a temporary file. A working example is Use-RawPipeline in PowerShell Gallery. The current implementation is to store the file, but it could also be streamlined so that the file doesn't have to be stored.
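The idea behind that workaround can be sketched as follows (Python for illustration; the one-liner producer stands in for an arbitrary native command): redirect the producer's stdout straight to a file handle, so the bytes never pass through a string layer:

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for a native command that emits raw binary on stdout.
producer = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(bytes(range(256)))"]

# Bind the producer's stdout directly to a temp file: no shell-level
# decoding happens, so the bytes arrive intact.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    subprocess.run(producer, stdout=tmp, check=True)

with open(tmp.name, "rb") as f:
    data = f.read()
os.unlink(tmp.name)

print(data == bytes(range(256)))  # True: nothing was re-encoded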
See also #559, where this appears to be actively discussed and worked on by @vors on the PowerShell team.
Great discussion! Thank you all for the feedback.
I'd like to share my plans about this work:
- In the scope of this issue we will address only `native | native` and `native > file` behavior. Note that although this could be seen as a breaking change, it isn't one for text output: that behavior is preserved. Byte output becomes much more reliable without wrapping bytes in PS strings. We agreed with @lzybkr that it's not breaking, hence no RFC process is needed.
- I don't see an immediate need to enhance the `native | ps` case, since PS can only consume strings from native commands. Although somebody may want to write a function like

  function foo
  {
      param([byte[]]$rawBytes)
  }

  they can achieve it with a temp file or some other technique, as @GeeLaw pointed out.
- Similarly, the `ps | native` case has a well-established pattern: when PS objects need to be passed to a native command, we apply an implicit `Out-String` and pass everything as text. Because PS doesn't use byte streams as a pipeline primitive, I don't think we should develop special sugar to support this in the language directly. If there is a case where it needs to be done, similar workarounds can be used.
We can revisit the last two parts later, but I'd like to set expectations about the scope of this issue.
@vors However, the current `>` is identical to `Out-File`, so you would have to add a special version of `Out-File` that takes raw bytes. So why not give that ability to everyone?
@vors The change in #2450 greatly improves the experience, but the design still feels a little awkward and inconsistent. It seems to be based on arbitrary patterns rather than consistent behavior of operators and cmdlets with respect to input arguments of particular .NET types.
As a user I would expect binary operators like | to behave consistently given a LHS expression that evaluates to a byte stream (or some appropriate choice of byte stream-ish object), regardless of whether it is produced by invocation of a native executable, piping from a file, invocation of a PowerShell function/script/cmdlet, or .NET FFI.
Similarly, I would expect the | operator to behave consistently given a RHS that "can accept stdin", for some meaning of accepting stdin appropriate to the RHS expression in question. For native executables and files this is just sending the bytes to the correct file descriptor, for PowerShell invocables perhaps it would be param([Stream]$rawBytes) as you suggested.
If there is no way to overload |'s behavior so that this is not a breaking change, then we should have a different piping operator for raw streams, and cmdlets for converting between byte streams and guessed-encoding-decoded lines (similar to $ and ~ from @GeeLaw's Use-RawPipeline project).
Oh, https://psguy.me/modules/Use-RawPipeline/ is very interesting, thank you for the link.
There are 2 conversations going on here:
1. just `native | native` (or `native > file`)
2. `native | ps` and `ps | native`
They are highly related, and it's true that solving (2) in a general way would buy us (1) automatically. However, the scope of the work for (2) is much broader and includes an RFC and whatnot, while (1) is low-hanging fruit: it can be done in a non-breaking manner, greatly improves perf for common cases, and the changes themselves are very modest. Note that to achieve perf parity with bash, we would need (1) in one form or another.
That's why I think it makes sense to separate these two tasks.
@vors What I mean is that IMHO native, ps etc. shouldn't be distinct, first class concepts in the first place. It makes the language conceptually simpler if it consists only of expressions that can produce and consume .NET values and operators that can wire such expressions together. PowerShell is after all strongly typed, if not statically typed.
Changing the language so that (ping.exe 1.2.3.4).GetType() is Stream (or some similar type with suitable metadata about the process) would be a way to synthesize native commands with the rest of Powershell in a less inconsistent way.
> That's why I think it makes sense to separate these two tasks.

Do you know if there's already an RFC or issue for the latter task?
FYI Use-RawPipeline has been reworked to allow streamlined experience instead of having to store the content in a file and wait before the previous process ends to perform the next piped process. Its source code is available from https://github.com/GeeLaw/Use-RawPipeline
There seems to be an assumption throughout this that PowerShell's philosophy that "everything is a pipeline" is OK. However, I think there might be some value in thinking of the use case of "PowerShell as a legacy native command launcher" as distinct. Would it be possible to allow the redirection operators to have their traditional meaning of redirecting the native command's stdout directly to the raw file instead of piping its output back to PowerShell? Even if all of the encoding issues are resolved for general piping, giving a command a pipe when it expects a physical file is still a semantic change.
Requiring a user to know that they need to specify special obscure options to say "don't change the output of this command" seems error-prone at best. I'm arguing that redirecting a simple command directly to a file should be data-preserving by default.
Or, is the philosophy that if someone wants to use non-text native commands that they should just switch back to a traditional cmd window?
@SteveL-MSFT
The following command is broken due to this issue: `docker save microsoft/windowsservercore:ltsc2016 > msft_wsc_ltsc2016.tar`
You have to use `docker save -o` instead.
I have a hard time understanding what #2450 actually fixes. Even though a command like `ping.exe github.com | grep.exe Reply` works, the command:
curl.exe "https://i.redd.it/dntes9fqy3x11.jpg" > test1.jpg
still works only in cmd/WSL's bash/git bash.
I was trying to use git show ref:path/to/file.png > file.png and it looks like it's still not possible in PowerShell. Are there any serious plans to fix it?
@mpawelski
#2450 fixes the problem of requiring the upstream native command to finish before piping the output downstream. It does not address PowerShell parsing the byte stream from native output and reserializing objects back into a byte stream. Please take some time to learn how object-oriented pipes work and you will see that the problem is really hard to solve consistently. For that purpose, please use a native redirection/piping utility, e.g., Command Prompt, Start-Process, or Use-RawPipeline.
@GeeLaw So you think PS should break simple, native-exe piping, on purpose??????
@be5invis which post are you replying to?
If you are replying to a post 2+ months ago, I pointed out the necessity of RFC and developed a workaround.
If you are replying to a post ~1 month ago, I was explaining to @mpawelski that #2450 does not address this specific issue. The additional point (this issue is hard to solve consistently) reinforces the necessity of an RFC (and probably additional documentation explaining the new/old behaviors), and provides pointers to workarounds in current versions of PowerShell.
As for breaking exe piping, I interpret the current implementation as mistakenly breaking it, with some intention behind the scenes (for other use cases). Improvement (coming up with a more intuitive conversion rule, implementing it, and documenting it, which is the RFC part) and education (making people aware of the nuances) are both important; one should know, and currently cannot choose to ignore, the difference between native utilities and cmdlets.
Edited by @joeyaiello: As a reminder, please be respectful and follow our Code of Conduct when commenting on issues or PRs.
Re-requesting eval due to this being a blocker for Linux users who want to make pwsh their default shell in Linux.
One of the main reasons why I haven't swapped from bash over to pwsh is the pipelining issues. Pwsh doesn't work well when you pipe raw bytes between native apps, which is a fairly common scenario.
Re-assigned to @JamesWTruher, who will work on this in 7.3 development cycle.
I am interested to know how many people who want the "as-is" bytes are also in the camp of people who are using LF instead of CR LF on their Windows PowerShell scripts...
I've proposed a mode that guides behavior based on detection of LF vs. CR LF of the script containing the pipe/redirect:
Option for LF vs CR LF Piping To Match Line Endings of Running Script #16511
It may be (?) that someone's feelings about the importance of "as-is" redirection vs CR LF is effectively captured by their git autocrlf setting. So this would piggyback on that.
Perhaps a useful implementation would recognise the historical significance of a `[byte]` stream as it pertains to native applications. Three scenarios would need to be considered:
1. Whenever a stream of actual `[byte]`s is passed into a pipeline to a native program (e.g. `Get-Content -AsByteStream somefile | native`), no `Out-String` conversion is applied; the bytes are streamed as-is. It seems unlikely that many native apps would expect a sequence of decimal representations of each byte's value, one per line, and that could easily be created by simply converting the `[byte]` stream into an `[int]` stream (which would then still use `Out-String`). Of course, there would be no practical way for PS to know that a stream consists entirely of `[byte]`s unless the (single) object passed were a `[byte[]]` (which would then be enumerated into a `[byte]` stream). It would be a matter for debate whether only this particular case would trigger the special behaviour, or whether any stream with an initial `[byte]` would trigger it but then throw a terminating error if a non-`[byte]` were passed (similar to `Set-Content -AsByteStream`, which has the same issue when taking input `[object]`s from the pipeline).
2. For the case where a native program is the pipeline input to a cmdlet (e.g. `native | Sort-Object`), the current behaviour is retained, since often (if not mostly) the native program's output will be (typically ASCII) text. Bytes are converted to Unicode characters by whatever mapping is appropriate and then collected into `[string]`s which are passed to the cmdlet. However, similar to the way command-line arguments are parsed into objects but the original text is retained in case a native command is being invoked, each `[string]` object would be wrapped in a `[psobject]` (or just have an added property) containing the original `[byte]`s received from the native program (maybe a `[byte[]]`, or a `[string]` with the original `[byte]`s collected but not mapped to Unicode equivalents). This would be invisible to existing cmdlets (possibly directly accessible if desired via a public property), but a new filter cmdlet (`ConvertTo-ByteStream` or `cbs`, perhaps) would be provided to restore the original output into a `[byte]` stream, which could then be passed into the remaining pipeline. This would also cover the case of saving native output into a file by simply doing `native | ConvertTo-ByteStream | Set-Content -AsByteStream` (the behaviour of `>` would be unaltered, to minimise potential breaking changes). Of course, for native programs that produce raw binary data (i.e. not character strings of any form), very big (but unused) `[string]`s could be produced, potentially impacting both processor and memory usage. Whether some limit should be imposed on the size of `[string]`s built from native output is a topic for debate. Any such limit would not affect true `[byte]` streams, just the form of `[string]`s built from the stream, which might terminate before an "end of line" (a meaningless term for raw binary data).
3. For `native | native ( | native ...)`, I would suggest that the basic ethos of the PS pipeline (passing objects) is totally inappropriate, as such programs are not suited for (nor even aware of) the PS pipeline. For this case (only), the original `cmd` (and Unix™) behaviour should be restored. Each program would be started in its own process (as currently) with a (Windows/Unix™) anonymous byte-stream pipe connecting the StdOut of each to the StdIn of the next (if any). Any sequence of `native | native (| native ...)` within a larger pipeline would be treated as a single native program with one input stream from the PS pipeline and one output stream back to the PS pipeline, as per items 1 and 2 above; e.g. `Get-Content -AsByteStream somefile | native | native | native | Sort-Object | more` would be treated as `Get-Content -AsByteStream somefile | ( one native doing 3 things ) | Sort-Object | more`. While this might seem the breakingest change possible, I suspect that the vast majority of native programs (with input/output suitable for piping) were designed around this behaviour (byte-stream pipes, the type obtained from kernel APIs). Certainly, anything intended for invocation by `cmd` would expect it. In particular, `find.exe`, `findstr.exe` and `sort.exe` do (and also don't like Unicode). Other programs that can process Unicode would either utilise a command-line option, a BOM (LE/BE) as the first two bytes read, or (less predictably) DBCS heuristics, e.g. expecting a lot of alternating 0 bytes for mostly-ASCII characters, or maybe an LE/BE-Unicode space, TAB or newline within the first 100 byte pairs (I'm looking at you, `Scripting.FileSystemObject`), but they would still read a byte stream (as paired bytes). Further on the "make it work like `cmd`/Unix™" theme, any redirections of file handles within a native pipeline grouping (`native | native | ...`) would need to operate on the actual process handles, as there would be no "PS pipeline" within the grouping, e.g. `Get-Content -AsByteStream infile | native1 2> native1.err | native2 2> native2.err | ConvertTo-ByteStream | Set-Content -AsByteStream outfile`. Here, native1.err and native2.err are connected directly to file handle 2 of native1 and native2, respectively. Alternately, `Get-Content -AsByteStream infile | native1 2>&1 | native2 2>&1 | ConvertTo-ByteStream | Set-Content -AsByteStream outfile`. In this case, streams are merged, with file handle 2 of both native1 and native2 being duplicated from each one's file handle 1 (their respective anonymous output pipes). Whether (on Unix™-like systems) being able to open/redirect/merge file handles other than 2 (à la sh(1) and derivatives) would be desirable (feasible?) within a native pipeline grouping is less clear. There would be few, if any, native programs (on any system) that expect anything beyond StdIn, StdOut and StdErr.
I believe the preceding could go a long way toward resolving the apparent discontinuity between legacy programs' simple byte-oriented CLI I/O and the more powerful but more complicated object-oriented PS pipeline. While some potentially breaking changes are involved, I suspect these would mostly affect strategies (kludges?) used to work around the incompatibility issues addressed here. Further, the proposed changes allow the original (incompatible) behaviour to be retained at the PS script level. For native output, no change occurs unless the new ConvertTo-ByteStream cmdlet is used. For input, simply changing a [byte] stream to an [int] stream (of the same values) restores previous behaviour. For native-to-native pipes, interposing an explicit Out-String between each native in the pipeline should restore previous behaviour by passing the data through PS, with the associated conversions (not 100% sure about this one).
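The plumbing scenario 3 asks for is the classic anonymous-pipe wiring that traditional shells perform; a minimal sketch in Python, where the two one-liners stand in for native executables:

```python
import subprocess
import sys

# Two stand-ins for native commands: one emits all 256 byte values,
# the other counts the bytes it receives on stdin.
emit = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(bytes(range(256)))"]
count = [sys.executable, "-c",
         "import sys; print(len(sys.stdin.buffer.read()))"]

# Wire stdout of the first directly into stdin of the second as an
# anonymous byte pipe, with no string layer in between.
p1 = subprocess.Popen(emit, stdout=subprocess.PIPE)
p2 = subprocess.Popen(count, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # hand the read end to p2
out, _ = p2.communicate()
p1.wait()

print(out.decode().strip())  # 256: every byte value survived
```

Both processes run concurrently and the kernel moves the bytes; the host shell never touches them.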
One way to avoid any magic would be that PowerShell inspects a list of exceptions that the user can configure themselves. It could be as simple as a JSON file stored at a specific location in your user profile:
{
"exceptions": [
{
"path": "C:\\Python39\\python.exe",
"stdin": "native",
"stdout": "native",
"stderr": "native"
}
]
}
Obviously, the exact format and location of the configuration isn't so relevant, but this would make it an easy to configure opt-in feature that does not disrupt the way PowerShell works by default.
> One way to avoid any magic would be that PowerShell inspects a list of exceptions
General proposal is in #13428
Any updates? It seems that without this we cannot use gzip to compress a data stream:
cat -AsByteStream a.js | gzip > a.gz
The data stream will be corrupted by the `>`.
This command works on every shell except PowerShell.
@JamesWTruher is this still on your radar?
Can we expect this to be committed in 7.3? Whenever I want to write bytes to native executables via a pipeline, I have to launch a cmd or bash shell to achieve it, which makes PowerShell really hard to use, and useless in scenarios where binary data processing with native executables is common in a script.
This has annoyed me for a long time. 💢 I believe many more PS users are confused by it. However, it's so frustrating that PR #15861 was closed. It's 2022 now, and unfortunately we still cannot get rid of the object-based stream redirection for native executables. 😭
lewis-yeung, it can be quite the pain. The workaround of https://github.com/GeeLaw/PowerShellThingies/tree/master/modules/Use-RawPipeline requires rewriting your commands, but in my experience it can deliver good performance in many situations.
@mitchcapper Thank you. I noticed this amazing work by GeeLaw. Hope that a built-in implementation can be applied to the syntax (>/>> operator) in future PowerShell releases. 🙏
I guess this reinforces my general frustration with PowerShell - we need an intrusive work-around to actually use it as a shell (i.e., running command lines consisting only of EXEs with the same input/output semantics as cmd).
Can anybody summarize what the hold-up is? Are there some remaining issues that I've overlooked in the discussion, or is it some difficulty in the implementation details?
> Hope that a built-in implementation can be applied to the syntax (>/>> operator) in future PowerShell releases.
(+1) I don't know if it's true for everyone, but if it just required using a different operator to get the effect, that would be fine with me!
e.g. My concern isn't so much "the specific | and > characters don't have the right behavior" as it is "can't seem to do it at all".
Everything in PowerShell has (to me) a very foreign syntax, so one more adjustment would not be a problem.
> However it's so frustrating that PR https://github.com/PowerShell/PowerShell/pull/15861 was closed.
Thank you for sharing this PR! I made a build to see how well it works, and it hit a problem for me immediately:
PS /Users/vors/src/PowerShell> cat ./README.md | grep CODE
ResourceUnavailable: Program 'cat' failed to run: StandardError has not been redirected.At line:1 char:1
+ cat ./README.md | grep CODE
+ ~~~~~~~~~~~~~~~.
Maybe this is because I'm using macOS, not sure.
It's sad that this issue is not prioritized more. 🙁 I think the team was planning to improve the "interactive shell experience" on the last roadmap. Having the pipeline operator work "as expected" for native executables is definitely in the spirit of improving the shell experience. I understand it might be hard to add this and have it coexist with PowerShell's standard object pipeline, but to be honest, handling native commands and pipes is basic functionality for a shell.
As an external user I feel the progress of improving PowerShell has slowed down recently. When PowerShell was open-sourced I was excited that it would get improvements for its quirks and warts. We got some (like the && operator and some work on PSNativeCommandArgumentPassing), but the progress is much slower than I hoped for, and the communication with the community is not as good as it could be, as others have already noted elsewhere.
I got a proof of concept of this working a few days ago. There's a lot more left to be done, it still needs some abstraction, error handling, redirections in general, and a lot of testing. You would be correct to call this a bug fix, but from a planning and expectations standpoint (i.e. the amount of time it'll take me to finish) it's more in the realm of a feature.
That said, so far so good!

On the topic of passing bytes as is to native commands, can anyone think of a real world scenario where you would be doing $varWithRawBytesInIt | someNativeCommand and actually want each byte to be stringified and passed one by one?
Nope. Excellent question though.
My reasoning (in case others can spot a flaw): In .NET, char and byte are unambiguously distinct types (not even described on the same page in Microsoft's reference). So, if I even managed to create a "raw bytes" variable in the first place, I could assume I got that as the output from a previous native command and/or by reading a file in "raw" mode. In both of those cases, I'd want the semantics to be the same as if I had redirected/piped that original source directly into my native command in a single command line. Sort of a mathematical "transitive property" I suppose.
Even in my past C# programming, if I ever resorted to creating some sort of "collection of byte", the reason was always to indicate "don't touch/modify this data in any way".
If someone were to even present a hypothetical case where they would want it processed, I'd lean towards saying the user should explicitly pipe it through some sort of "stringify" PowerShell command such as Select-Object, since I would expect that to be such a rarely desired semantic.
Tangent, slightly off-topic:
it would be more intuitive to me to use Get-Content to do the explicit data conversion, since it has the concept of splitting raw data into lines based on a delimiter and an encoding, but apparently it is prohibited from reading from ~~stdin~~ the input pipeline for some reason.
Get-Content: The input object cannot be bound to any parameters for the command either because the command does not take pipeline input or the input and its properties do not match any of the parameters that take pipeline input.
> If someone were to ... want it processed, ... the user should explicitly pipe it through some sort of "stringify" PowerShell command
^ Agreed.
> apparently it is prohibited from reading from stdin for some reason
PowerShell itself does not support reading from stdin. From the help topic about_Redirection:
> [!IMPORTANT]
> The SUCCESS and ERROR streams are similar to the stdout and stderr streams of other shells. However, stdin is not connected to the PowerShell pipeline for input.
Get-Content is tied up with the architectural idea of providers which represent access to structured data like a file system, the registry, certificate store, etc. Not all providers support the interface (IContentCmdletProvider) used by Get-Content. But Get-Content does work with providers other than the file system provider. Try these commands some time:
Get-Content env:\PATH
Get-Content function:\prompt
Ah, my apologies @rkeithhill, I'm guessing I should have said "input pipeline" instead of "stdin". Still struggling with the lingo.
Anyway, I was looking for any PowerShell command that satisfies "read a byte stream from the pipeline and convert it to an array of strings based on any -Delimiter and -Encoding parameters". Get-Content was tantalizingly close.
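The conversion being wished for (split raw pipeline bytes into strings given an explicit delimiter and encoding) is small in itself; here is a hypothetical sketch in Python, with an illustrative function name rather than an existing cmdlet:

```python
def split_byte_stream(raw: bytes, delimiter: bytes = b"\n",
                      encoding: str = "utf-8") -> list[str]:
    """Split a raw byte stream into strings on an explicit delimiter,
    decoding each chunk with an explicit encoding. This is the conversion
    a Get-Content-like command could perform if it accepted raw bytes
    from the pipeline."""
    return [chunk.decode(encoding) for chunk in raw.split(delimiter)]

print(split_byte_stream(b"alpha\nbeta\ngamma"))  # ['alpha', 'beta', 'gamma']
```

The key property is that the user names the delimiter and encoding explicitly, so nothing is guessed.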
Wow... just wow. I found another scenario that bit me for a few hours tonight. I was trying to compare the AWS "fingerprint" for a private key I had, to determine which key went with which file. Long story short, due to this unexpected behavior the following commands produced different results between Git Bash and PS. (It's more of an issue now that I'm trying to embrace PowerShell more, and it's becoming the default terminal in many IDEs.)
Incorrect value using Powershell pipe:
PS D:\dev\AWS Keys> openssl pkcs8 -in .\private.pem -nocrypt -topk8 -outform DER | openssl sha1 -c
(stdin)= 8a:cb:c5:84:4d:a6:a3:5f:ed:03:67:f6:f6:88:a0:bd:02:8e:b5:dc
Correct value using Git bash:
$ openssl pkcs8 -in .\private.pem -nocrypt -topk8 -outform DER | openssl sha1 -c
(stdin)= 89:e3:a5:d2:84:33:ba:ac:64:55:38:e5:5b:52:1f:9e:42:bf:e5:c2
Any seasoned programmer would expect the output stream of bytes to flow between processes. This has me baffled, but I'm glad I've been made aware of this oddity!
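The mechanism behind the mismatch can be shown in a few lines (Python for illustration; the sample bytes are arbitrary, not a real key): once binary output passes through any decode-to-string and re-encode step, the bytes change, so the digest computed downstream changes too:

```python
import hashlib

# Arbitrary DER-flavored bytes (not a real key): includes a CR LF pair and
# a byte that is invalid as UTF-8, both typical of binary output.
der_like = bytes([0x30, 0x82, 0x01, 0x22]) + b"\r\n" + bytes([0x89, 0x00])

# What an object/text pipeline effectively does before the data reaches
# the next process: decode, normalize line endings, re-encode.
roundtripped = (der_like.decode("utf-8", errors="replace")
                        .replace("\r\n", "\n")
                        .encode("utf-8"))

print(hashlib.sha1(der_like).hexdigest() ==
      hashlib.sha1(roundtripped).hexdigest())  # False
```

The second `openssl sha1` therefore hashes different bytes than the first one produced, which is exactly the Git Bash vs PS discrepancy above.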
Any progress or a fix on this issue? It is about time to give us the option to pipe a byte stream...
Yep plenty of progress, just a large change with a lot of design considerations.
Speaking of, anyone know of a command that emits binary data to stderr? My current thinking is to just not touch stderr and leave it to its current string-reading ways.
And in the same vein, I'm thinking that redirecting stderr to stdout (e.g. 2>&1) should fall back to current string reading behavior for both stdout and stderr. Can anyone think of scenarios where that would not be desirable?
@SeeminglyScience I think it's reasonable to assume stderr always emits text. Can you maybe create a draft PR with your changes so far?
> Speaking of, anyone know of a command that emits binary data to stderr?
My issue would not be emitting binary data, but wanting to preserve the sense of CR/LF line endings.
If the program writing to stderr did not put CR in its output, I don't want to see it in the receiving programs or if written to a file.
> Speaking of, anyone know of a command that emits binary data to stderr? My current thinking is to just not touch stderr and leave it to its current string-reading ways.
>
> And in the same vein, I'm thinking that redirecting stderr to stdout (e.g. 2>&1) should fall back to current string reading behavior for both stdout and stderr. Can anyone think of scenarios where that would not be desirable?
IMO it's not that critical to avoid the current "string reading behavior" in this case, because when you use 2>&1 you most likely deal with text output, not binary.
Though I agree with @AE1020 that it's not just about binary data: even when a program outputs text, I want to pipe it or redirect it "as is", without any changes by PowerShell.
I think this would be an unnecessary exception to the "we don't parse native executables' output as text in pipelines and redirections" feature. It would be another PowerShell gotcha, where it behaves slightly differently in some corner case.
I agree with the recent comments. If I write a command line that consists only of native executables, all piping and redirection should be as-is, no parsing of anything as text. If a command is ending lines with newlines only on Windows, that should be preserved when I redirect that output to a file or another executable.
I know this goes against the historical architecture of PowerShell, but my expectation would be that parsing things as text would be only done as a "last resort" if/when the output of a binary executable is being read by a cmdlet, rather than thinking of it as something to decide when the output is written.
And the idea of merging stderr into stdout and having that "taint" how stdout behaves sounds particularly confusing.
Draft PR is up #17857