Input file is removed without being compressed with redirecting STDOUT together with --rm
Describe the bug
Input file is removed without being compressed when --rm is given and stdout is redirected.
Expected behavior
Input file stays untouched and is not removed.
To Reproduce
Steps to reproduce the behavior:
-
lz4 --rm inputfile >/dev/null - Warning appears:
Warning : using stdout as default output. Do not rely on this behavior: use explicit '-c' instead ! -
inputfilewas deleted (check withls)

System (please complete the following information):
❯ inxi
CPU: quad core Intel Core i5-6500 (-MCP-) speed/min/max: 3477/800/3600 MHz
Kernel: 5.15.78-1-MANJARO x86_64 Up: 8d 20h 7m Mem: 5281.8/7804.1 MiB (67.7%)
Storage: 1.14 TiB (57.6% used) Procs: 332 Shell: fish inxi: 3.3.23
❯ lz4 --version
*** LZ4 command line interface 64-bits v1.9.4, by Yann Collet ***
Additional context Add any other context about the problem here.
Noted this while developing a compression algorithm benchmarker: https://gitlab.com/nobodyinperson/compression-algorithm-benchmark
https://fosstodon.org/@nobodyinperson/109404289543643442
Hi @nobodyinperson,
Could you try to avoid to use --rm in this case?
I think lz4, gzip and xz are able to work in similar manner with the following commands:
# lz4
cd
mkdir lz4-issue-1193
cd lz4-issue-1193/
touch inputfile
lz4 -1 -c inputfile > /dev/null
ls
# inputfile
# gzip
cd
mkdir lz4-issue-1193-gz
cd lz4-issue-1193-gz/
touch inputfile
gzip -1 -c inputfile > /dev/null
ls
# inputfile
# xz
cd
mkdir lz4-issue-1193-xz
cd lz4-issue-1193-xz/
touch inputfile
xz -z -0 -c inputfile > /dev/null
ls
# inputfile
Sure I can and now that I know my data is being removed otherwise I definitely will 😉
Still, I think preventing data loss should be top priority so an accidental misuse of options and redirection should absolutely not delete data.
In the above case, a user that just doesn't want to see the output of their lz4 --rm filename command will then append the common >/dev/null that works for pretty much every other command and then find that their data was deleted without a warning or confirmation.
gzip, zstd and xz all behave as expected (still saving the compressed output file before deleting the source file when redirecting STDOUT), lz4 behaves differently here:
❯ touch bla
❯ gzip bla > /dev/null
❯ ls
bla.gz
❯ rm bla.gz
❯ touch bla
❯ zstd bla --rm > /dev/null
bla : ( 0 B => 13 B, bla.zst)
❯ ls
bla.zst
❯ rm bla.zst
❯ touch bla
❯ xz bla >/dev/null
❯ ls
bla.xz
❯ touch bla
❯ lz4 bla --rm > /dev/null
Warning : using stdout as default output. Do not rely on this behavior: use explicit `-c` instead !
❯ ls
❯ # lz4 removes input file without writing compressed output file when redirecting STDOUT, gzip, xz and zstd don't
Commenting this block results in the intuitive behaviour (not writing compressed data to STDOUT even if STDOUT is not a TTY):
https://github.com/lz4/lz4/blob/8a31e6402df11c1bf8fbb1db3b29ec2c76fe6f26/programs/lz4cli.c#L680-L688
According to the warning this apparently seems to be an unstable default. If it's undecided anyway, how about making it behave the same way as other compression clis?
Looking back at this issue,
I think the problem was that the CLI was doing exactly what it was being told,
aka it removes the input file (--rm) and sends the compressed output to stdout (> /dev/null),
but what was being told wasn't correctly understood, because lz4 uses a syntax which is not identical to gzip.
Historically, lz4 did not initially behave the same as gzip at all. Over time though, the behavior and capability set have been progressively fleshed out to mimic gzip whenever it's not directly incompatible.
An important example of incompatibility is lz4 file1 file2, which actually means "compress file1 and write the compressed result into file2". Another difference is the implicit stdout policy.
To address these last incompatibilities, and make lz4 behave a lot closer to gzip, the official work-around is to use lz4 -m. Now lz4 -m file1 file2 means "compress file1 and file2 into file1.lz4 and file2.lz4", like it would with gzip. And now, when applied to this current issue:
❯ touch bla
❯ lz4 -m --rm bla > /dev/null
❯ ls
bla.lz4
lz4 -m exists specifically to make lz4 behave closer to gzip command line. So I think that's the solution to this issue.
On a longer term perspective, we can still consider evolutions to lz4, but impact on the ecosystem must be carefully considered.
One topic that has been already discussed is to make lz4 -m the default behavior.
But I'm concerned about the risk of breaking compatibility with existing lz4 users and scripts.
Another one is to ditch the "implicit stdout" policy even when invoking lz4 without -m.
That one has been considered for a while, and this is why there is currently a warning message:
Warning : using stdout as default output. Do not rely on this behavior: use explicit `-c` instead !
Since this warning has been set in 2019, almost 5 years ago, we could consider that enough time has passed to warn most lz4 users about this impending change, and we could now proceed on the second step of the plan, which is to remove the "implicit stdout" policy from lz4, even when -m is not specified.
I would still expect some usages to be altered by this change, but the vast majority should no longer rely on this behavior, so that might be preferable to the risk of new users inadvertently triggering this behavior because it's different from gzip.
Since it seems this should be triaged, I set a milestone v.1.9.5 as a mark of it. We may leave it as is and will revisit in future release(s).