feat: allow streaming, line-buffered input and output
The major shortcoming of sd right now is that it doesn't support streaming stdin to stdout. Instead, all input is read into memory, which means sd can't be used for bigger-than-memory tasks.
This PR is based on @vtronko's work in https://github.com/chmln/sd/issues/100 but adapted for current main.
It's a bit of a graft and rough around the edges (e.g. no validation that multi-line mode is switched off), but it's a proof of concept and already useful (roughly 3x faster than sed).
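For readers following along, here is a minimal sketch of what line-by-line streaming looks like, so that only the current line sits in memory. It is not the code from this PR: `stream_replace` is a hypothetical helper, and the literal `str::replace` stands in for sd's regex replacement. Note that `read_line` into a `String` returns an error on invalid UTF-8.

```rust
use std::io::{self, BufRead, Write};

/// Copy `reader` to `writer` one line at a time, replacing `from` with `to`.
/// Only the current line is held in memory.
/// NOTE: `str::replace` is a literal stand-in for a regex replacement.
fn stream_replace<R: BufRead, W: Write>(
    mut reader: R,
    mut writer: W,
    from: &str,
    to: &str,
) -> io::Result<()> {
    let mut line = String::new();
    loop {
        // Reuse the same buffer for every line.
        line.clear();
        // read_line keeps the trailing '\n', so line endings pass through.
        if reader.read_line(&mut line)? == 0 {
            break; // EOF
        }
        writer.write_all(line.replace(from, to).as_bytes())?;
        // Flush per line so output appears as soon as input arrives.
        writer.flush()?;
    }
    Ok(())
}
```

Calling it as `stream_replace(io::stdin().lock(), io::stdout().lock(), "foo", "bar")` would give the stdin-to-stdout behavior described above. Flushing after every line favors interactive use over raw throughput.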
Resolves:
- #100
- #154
2 cents:
- I'd argue there's no need for a `--line-buffered` flag. Anyone relying on the full read behavior?
- I "unified" stdin and file scenarios into mmaps. They don't have `read_line()`. We can maybe implement a `BufRead` container for it? Or split into separate logic again?
> I'd argue there's no need for a `--line-buffered` flag. Anyone relying on the full read behavior?

Very likely people are, yes. Or anyone who wants to match over multiple lines.
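
To illustrate the multi-line concern, here is a small sketch comparing the two modes. Both functions are hypothetical, and literal `str::replace` again stands in for regex matching. A pattern that spans a line break is found in the full-read version but never in the per-line version, because no single line contains it.

```rust
/// Replace `from` with `to` over the whole input at once (full-read mode).
fn replace_whole(input: &str, from: &str, to: &str) -> String {
    input.replace(from, to)
}

/// Replace `from` with `to` independently on each line (line-buffered mode).
/// Each chunk keeps its trailing '\n', as `read_line` would.
fn replace_per_line(input: &str, from: &str, to: &str) -> String {
    input
        .split_inclusive('\n')
        .map(|line| line.replace(from, to))
        .collect()
}
```

With the input `"one\ntwo\n"` and the pattern `"one\ntwo"`, only `replace_whole` performs a replacement.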
> I "unified" stdin and file scenarios into mmaps. They don't have `read_line()`. We can maybe implement a `BufRead` container for it? Or split into separate logic again?

You can get a slice from it and wrap it in an `std::io::Cursor`. That being said, I'm planning on splitting up at least some of the logic again.
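
A minimal sketch of the `Cursor` suggestion, using only the standard library. `first_line` is a hypothetical helper, and a `Vec<u8>` or byte slice stands in for the memory map here; the assumption is that the map type exposes its contents via `AsRef<[u8]>`, which is what `Cursor` needs in order to implement `BufRead`.

```rust
use std::io::{self, BufRead, Cursor};

/// Read the first line from any in-memory byte buffer.
/// Wrapping the buffer in a `Cursor` provides `BufRead`, and with it
/// `read_line()`. `buf` stands in for a memory-mapped file.
fn first_line<T: AsRef<[u8]>>(buf: T) -> io::Result<String> {
    let mut cursor = Cursor::new(buf);
    let mut line = String::new();
    // read_line keeps the trailing '\n' if one is present.
    cursor.read_line(&mut line)?;
    Ok(line)
}
```

A plain `&[u8]` already implements `BufRead` on its own, so the wrapper mostly matters for owned buffer types.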
@CosmicHorrorDev:
> > I'd argue there's no need for a `--line-buffered` flag. Anyone relying on the full read behavior?
>
> Very likely people are, yes. Or anyone who wants to match over multiple lines

Multi-line matching isn't tied to full reads, at least theoretically. Practically it might be, yes. But "line buffered" should be an implementation detail. Something like `--multiline` might be better.
I'm curious if a temp file would be useful here.
> You can get a slice from it and wrap it in an `std::io::Cursor`. That being said I'm planning on splitting up at least some of the logic again

TIL `std::io::Cursor` ;) The reason for splitting being?
> Multi-line matching isn't tied to full reads, at least theoretically. Practically it might be, yes. But "line buffered" should be an implementation detail. Something like `--multiline` might be better.
>
> I'm curious if a temp file would be useful here.

Allowing for line buffering inputs opens the door for streaming stdin and stdout (e.g. someone running just sd can type lines and see the live output after each line). It's more of a conceptual thing, since a lot of people think of text files by line.
> TIL `std::io::Cursor` ;) The reason of splitting being?

To support streaming reads from stdin. I was looking through ripgrep's source and it looks like they still special-case streaming stdin, so we'll likely have to too if we want that behavior (which I do).
Did a little test on `Cursor<Mmap>::lines()`; it works. We can unify the logic (again?) by iterating through `.lines()`.
(Excuse my "unificationism" XD)
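
A rough sketch of what unifying on `.lines()` could look like, assuming a generic function over `BufRead`. `replace_lines` is a hypothetical name, and literal `str::replace` stands in for the regex replacement. One caveat worth keeping in mind: `BufRead::lines()` strips the trailing `\n` or `\r\n` from each line, so the original line endings would have to be restored separately when writing output.

```rust
use std::io::{self, BufRead};

/// Apply a literal replacement to every line of any `BufRead` source.
/// The same function accepts `io::stdin().lock()` or a `Cursor` over a buffer.
fn replace_lines<R: BufRead>(reader: R, from: &str, to: &str) -> io::Result<Vec<String>> {
    reader
        .lines() // yields io::Result<String> with "\n" / "\r\n" stripped
        .map(|line| line.map(|l| l.replace(from, to)))
        .collect() // stops at the first I/O or UTF-8 error
}
```

Collecting into a `Vec` is only for demonstration. A streaming version would write each line out as it is produced instead of holding them all.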