Feature: in-place replacement

Open rrthomas opened this issue 7 months ago • 2 comments

I am the maintainer of rpl: https://github.com/rrthomas/rpl , a utility to find and replace regexp patterns in files.

The currently-released version is written in Python, and has performance problems. I have just spent a while rewriting it in Vala, and (as you can see), I have a release candidate.

Imagine my excitement when I discovered that ugrep (which I have used as my grep for some years now) has a --replace option. But alas! It cannot replace in place.

I would love to be able to use ugrep, and avoid releasing my new version of rpl. I would be happy to simply suggest that users of the current Python version who require better performance use ugrep instead.

It would seem that "all" that is required is some option that modifies --replace to allow in-place replacement, e.g. --in-place or maybe --replace-in-place as an alternative to --replace.

Otherwise, the coding required seems fairly straightforward. If an implementation strategy similar to rpl's were used, ugrep --replace --in-place would need to:

  • Create a temporary file in the same directory as the original (so it can simply be renamed on completion) for the output.
  • Write output to the temporary file instead of standard output.
  • In case of some error that stops the run of ugrep, delete the temporary file; otherwise, move it over the original.
  • Deal suitably with the output encoding: if recoding is done on input, reverse that on output, and again deal with any errors.
  • Deal suitably with file metadata (here I would copy GNU cp's --preserve flag and default behaviour rather than rpl's --force and --keep-times options).
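The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not ugrep's or rpl's actual code: the function name `replace_in_place` and its parameters are hypothetical, the whole file is read into memory, and only permission bits are preserved (ownership and timestamps are left out, as the MVP would allow).

```python
import os
import re
import shutil
import tempfile


def replace_in_place(path, pattern, replacement, encoding="utf-8"):
    """Replace regex matches in `path` via a temporary file; return the match count.

    Hypothetical helper for illustration only.
    """
    # newline="" keeps the original line endings untouched on read and write.
    with open(path, "r", encoding=encoding, newline="") as src:
        text = src.read()
    new_text, count = re.subn(pattern, replacement, text)
    if count == 0:
        return 0  # nothing matched, so the original is left untouched

    # Temporary file in the same directory, so the final rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".replace-")
    try:
        # Write with the same encoding that was used to decode the input.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as tmp:
            tmp.write(new_text)
        shutil.copymode(path, tmp_path)  # preserve permission bits only
        os.replace(tmp_path, path)       # move the result over the original
    except BaseException:
        # Any failure: delete the temporary file and leave the original as it was.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return count
```

A call such as `replace_in_place("notes.txt", r"old", "new")` would rewrite `notes.txt` only if at least one match was found; an encoding error on read or write propagates to the caller and leaves the original file unchanged.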

The other functionality that rpl offers is case-matching (--match-case) and interactive overwriting (--prompt, but again the GNU cp names -i, --interactive seem better to me).

An MVP could omit handling permissions, interactive overwriting, and case-matching.

There are obvious interesting things that ugrep -Q could offer (AFAICT, it doesn't yet allow in-place replacement) too.

Thanks for ugrep!

rrthomas avatar May 25 '25 23:05 rrthomas

Thank you for your feedback and suggestions! I will consider your request. However, ugrep (or any grep tool) should be safe and thus never modify the input files. IMO this feature would be best in a clone of ugrep under a different tool name. Its behavior would be similar to sed with s/old/new/g to replace old with new. Note that sed has an option -I to edit files in place. So you could use sed, no?

genivia-inc avatar Jun 16 '25 16:06 genivia-inc

Thanks for the reply—it's a good point about expectation of "grep"; I think this would be possible to achieve by requiring a different executable name.

sed is good in a tight spot, but I have put a lot of effort into maintaining and improving rpl over the years, including a complete rewrite, precisely because it has various ergonomic advantages, not least that you know what you are getting with it, rather than wondering which version of sed you have; but also, it has a clean syntax and various ready-to-hand features that are useful and combine easily.

rrthomas avatar Jun 16 '25 16:06 rrthomas

I'm leaving this for later and closing for now. This really should be a separate tool IMHO. If others find this useful, then I may give this a go.

genivia-inc avatar Jun 17 '25 16:06 genivia-inc

Here is one reason "why not sed?": https://blog.robenkleene.com/2023/12/26/introducing-rep-ren/. If there is a way to have ug output in the format rep expects, then ug can be a replacement for rg.

gcflymoto avatar Jun 18 '25 06:06 gcflymoto

Thanks for this, @gcflymoto. I also maintain the file renamer mmv, and I had wondered whether there could be a uniform approach to replacing text in files and renaming files. rep/ren shows there can!

Separating the matching from the replacement is a great improvement, and I very much like the idea of being able to use ug here.
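To make the idea of separating matching from replacement concrete, here is a rough Python sketch. It assumes the search tool prints `path:line:text` records (the shape that grep -n style tools print when searching several files) and that the text part has already been edited by some other filter. The function name `apply_edits` is hypothetical, paths are assumed to contain no colons, and this is not how rep itself is implemented.

```python
import re
from collections import defaultdict

# Assumed record shape: path:line:text (paths containing ":" are not supported).
RECORD = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+):(?P<text>.*)$")


def apply_edits(records):
    """Write edited `path:line:text` records back; return the number of lines changed."""
    edits = defaultdict(dict)
    for record in records:
        m = RECORD.match(record.rstrip("\n"))
        if m:  # skip anything that is not a path:line:text record
            edits[m["path"]][int(m["line"])] = m["text"]

    changed = 0
    for path, by_line in edits.items():
        # newline="\n" splits on "\n" only and leaves "\r\n" endings untranslated.
        with open(path, encoding="utf-8", newline="\n") as f:
            lines = f.readlines()
        file_changed = False
        for number, text in by_line.items():
            if not 1 <= number <= len(lines):
                continue  # line number out of range, ignore the record
            old = lines[number - 1]
            ending = old[len(old.rstrip("\r\n")):]  # keep the original line ending
            if old != text + ending:
                lines[number - 1] = text + ending
                file_changed = True
                changed += 1
        if file_changed:
            # Simplification: writes directly, without a temporary file.
            with open(path, "w", encoding="utf-8", newline="\n") as f:
                f.writelines(lines)
    return changed
```

With this split, the search tool only has to report where the matches are, and any filter that rewrites the text part of each record can decide what the replacement looks like.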

rrthomas avatar Jun 18 '25 09:06 rrthomas

I agree with all comments, except that IMO it should be a separate tool under a different name. What if by accident --in-place is used (e.g. backing up over history)? Or in an alias that isn't clear what it does? Or in a .ugrep config file? Then your file system will be garbled by just running ug. Very bad.

It also would not work when searching compressed files or filtered files like PDF. Again, that's not an issue when it is a different in-place replacement tool that has these expected limitations.

I don't understand why rg does this. It is bad practice and goes against the Unix motto that tools should do only what they are supposed to do. Use tools for their designed purpose, i.e. grep to search (safe) and sed to replace, including in-place (not so safe).

genivia-inc avatar Jun 18 '25 12:06 genivia-inc

My understanding of reading about rep and ren is that they are separate tools, which understand the output of rg, which sounds like exactly what you're after, @genivia-inc?

rrthomas avatar Jun 18 '25 19:06 rrthomas

> My understanding of reading about rep and ren is that they are separate tools, which understand the output of rg, which sounds like exactly what you're after, @genivia-inc?

I understand, but what are the details you want me to look at to support this with ug? What would be different or specific for ug to output to pass to rep and ren? I've never heard of these tools (Googling redirects me to completely unrelated things). I can do whatever I want on any system with (u)grep, sed and (g)awk. Or Python if I have to.

genivia-inc avatar Jun 19 '25 14:06 genivia-inc

@genivia-inc, they are not mainstream. In the Linux world, with sed being the mainstream, there is no "spec" or convention for a structured output format of search tools. I don't necessarily like that rep and ren use the line-oriented text output of rg instead of the JSON format.

PS. For reference https://github.com/robenkleene/rep-grep https://github.com/robenkleene/ren-find

gcflymoto avatar Jun 19 '25 17:06 gcflymoto

Got it. Looks like the Rust community keeps on looking for "killer apps" to (re)write in Rust. Rust is just OK. It is pushed by some (largely anonymous) big tech stakeholders to exert control over the tech ecosphere, for better or worse. It has pros and cons compared to C++, as you know. Memory management in C++ is getting better anyway and I really like C++ much better than Rust, which is terrible to write and takes forever to compile. Nope, no longer interesting to me. I've written a lot of code in many programming languages in my career, among them Fortran, BASIC (old and VS), Pascal, Modula-3, Prolog, Java, Haskell, Lisp/Scheme, and assembly. Rust to me is like Ada, which tried to convince everybody to use it as a safer alternative. It eventually failed, just like PL/I and Modula failed when mainstream languages evolved to keep up with ideas.

Or am I too critical (or cynical?)

genivia-inc avatar Jun 19 '25 19:06 genivia-inc

I don't see that the implementation language has anything to do with the idea of rep/ren. (FWIW, I find Rust a pain to write and don't use it, but I appreciate the quality of the developer experience and documentation culture.)

I think the author of rep/ren makes a good case for doing things differently; in particular, addressing "why not just use sed?", and has come up with a set of tools that neatly factor out a problem space in the classic UNIX tradition. It has to invent a format for specifying replacements precisely because it's not standardized, and it does so simply and elegantly; hooking into diff is another nice touch.

That this problem space is of interest to many users is evidenced by the many tools that have been invented over the years, none of which has gained significant traction, as far as I'm aware. The two I maintain are obvious examples. Both have survived in Debian for over 20 years (so someone's finding them useful), I have ended up being the upstream maintainer for both (because there were problems that needed fixing), and both are rather inelegant "all-in-one" solutions to their respective problem areas; "mmv" even has something like a "regex replacements for glob" format that appears in no other tool. No wonder not many people want to use them! I use them both all the time, but it is only because I find them so useful that I am happy to put up with their individual quirks.

rrthomas avatar Jun 19 '25 21:06 rrthomas

A comment from another angle: rpl, the file replacement tool I maintain, came to me written in Python. I recently found it had fairly fundamental performance problems which I was unable to fix in Python. I therefore rewrote it in Vala. Despite adding a lot of tests at the same time, I have found, now that I try to test its performance, that I have performance regressions compared to the Python version, and new bugs. This is a lot of effort that would be unnecessary if I were able to use something like ug, with existing well-tuned performance, but with added replacement ability. (You could just say "well, duh, you wrote it in Vala, you should have written it in Rust/C++/D/OCaml/whatever…", but I think that would be missing the point: the problem is that I had to write this stuff at all.)

rrthomas avatar Jun 19 '25 21:06 rrthomas