sd feat: allow streaming, line-buffered input and output

The major shortcoming of sd right now is that it doesn't support streaming stdin to stdout. Instead, all input is read into memory, which means sd can't be used for bigger-than-memory tasks.

This PR is based on @vtronko's work in https://github.com/chmln/sd/issues/100 but adapted for current main.

It's a bit of a graft and rough around the edges (for example, there's no validation that multi-line mode is switched off), but it's a proof of concept and already useful (roughly 3x faster than sed).

Resolves:

#100
#154

Dec 23 '23 23:12 corneliusroemer

2 cents:

I'd argue there's no need for a --line-buffered flag. Anyone relying on the full read behavior?
I "unified" stdin and file scenarios into mmaps. They don't have read_line(). We can maybe implement a BufRead container for it? Or split into separate logic again?

Dec 25 '23 11:12 nc7s

I'd argue there's no need for a --line-buffered flag. Anyone relying on the full read behavior?

Very likely people are, yes. Or anyone who wants to match over multiple lines

I "unified" stdin and file scenarios into mmaps. They don't have read_line(). We can maybe implement a BufRead container for it? Or split into separate logic again?

You can get a slice from it and wrap it in an std::io::Cursor. That being said I'm planning on splitting up at least some of the logic again
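A minimal sketch of the Cursor suggestion, using only the standard library. A plain byte slice stands in for the mmap here (an assumption for illustration), and `read_lines_from_slice` is a hypothetical helper name, not something from sd:

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: reads a byte buffer line by line via `read_line()`.
/// The `&[u8]` argument stands in for the memory-mapped contents.
fn read_lines_from_slice(bytes: &[u8]) -> io::Result<Vec<String>> {
    // Cursor<&[u8]> implements BufRead, so read_line() becomes available.
    let mut cursor = Cursor::new(bytes);
    let mut lines = Vec::new();
    loop {
        let mut line = String::new();
        // read_line() returns Ok(0) at end of input and keeps the trailing '\n'.
        if cursor.read_line(&mut line)? == 0 {
            break;
        }
        lines.push(line);
    }
    Ok(lines)
}
```

Calling `read_lines_from_slice(b"foo\nbar")` yields the two lines with the first still carrying its newline, so output can be written back without guessing line endings.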

Dec 25 '23 14:12 CosmicHorrorDev

@CosmicHorrorDev:

I'd argue there's no need for a --line-buffered flag. Anyone relying on the full read behavior?

Very likely people are, yes. Or anyone who wants to match over multiple lines

Multi-line matching isn't tied to full reads, at least theoretically. Practically it might be, yes. But "line buffered" should be an implementation detail. Something like --multiline might be better.

I'm curious if a temp file would be useful here.

You can get a slice from it and wrap it in an std::io::Cursor. That being said I'm planning on splitting up at least some of the logic again

TIL std::io::Cursor ;) The reason for splitting being?

Dec 25 '23 14:12 nc7s

Multi-line matching isn't tied to full reads, at least theoretically. Practically it might be, yes. But "line buffered" should be an implementation detail. Something like --multiline might be better.

I'm curious if a temp file would be useful here.

Allowing for line buffering inputs opens the door for streaming stdin and stdout (e.g. someone running just sd can type lines and see the live output after each line). It's more of a conceptual thing since a lot of people think of text files by line

TIL std::io::Cursor ;) The reason for splitting being?

To support streaming reads from stdin. I was looking through ripgrep's source and it looks like they still special-case streaming stdin, so we'll likely have to too if we want that behavior (which I do)
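For illustration, a rough sketch of what a streaming, line-at-a-time path could look like. It assumes a literal find/replace via `str::replace` instead of sd's regex engine, and `stream_replace` is a hypothetical name, not the PR's actual code:

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical sketch: read one line, replace, write, flush, repeat.
/// Only the current line is held in memory, so input size is not bounded by RAM.
fn stream_replace<R: BufRead, W: Write>(
    mut input: R,
    mut output: W,
    find: &str,
    replace_with: &str,
) -> io::Result<()> {
    let mut line = String::new();
    loop {
        line.clear();
        // Ok(0) signals end of input.
        if input.read_line(&mut line)? == 0 {
            return Ok(());
        }
        output.write_all(line.replace(find, replace_with).as_bytes())?;
        // Flushing after every line is what lets an interactive user
        // see the result as soon as they press Enter.
        output.flush()?;
    }
}
```

Passing `io::stdin().lock()` and `io::stdout().lock()` would give the interactive behavior described above. A match spanning several lines cannot be found this way, which is the trade-off discussed in this thread.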

Dec 25 '23 14:12 CosmicHorrorDev

Did a little test on Cursor<Mmap>::lines(), it works. We can unify the logic (again?) on iterating through .lines().
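A small sketch of unifying on `.lines()`, assuming any `BufRead` source (a Cursor over mapped bytes or a locked stdin) and a literal replacement in place of regex matching. `replace_per_line` is a hypothetical name:

```rust
use std::io::{self, BufRead};

/// Hypothetical sketch: one code path for every BufRead source.
/// Note that lines() strips the trailing "\n" or "\r\n" from each line.
fn replace_per_line<R: BufRead>(
    reader: R,
    find: &str,
    replace_with: &str,
) -> io::Result<Vec<String>> {
    reader
        .lines()
        .map(|line| line.map(|l| l.replace(find, replace_with)))
        .collect()
}
```

One caveat with this approach: because `.lines()` drops line terminators, the caller can no longer tell whether the input used CRLF or whether the last line ended with a newline, so byte-exact output would need `read_line()` or `read_until()` instead.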

(Excuse my "unificationism" XD)

Dec 25 '23 16:12 nc7s
