Should IO ports be first class?

Open tesujimath opened this issue 11 months ago • 6 comments

What new feature should Elvish have?

I am wondering whether IO ports should be first class, so they could be assigned to variables, passed to functions, etc.

My use-case motivating this is processing stream input, where there is a header to be processed first. So I envisage being able to do from-lines | take 1 for the header, but still subsequently have access to the IO port to read the rest of the stream.

I haven't thought at all about syntax yet; it's the concept I would like to understand first.

This would also seem to be an answer to #741.

Thoughts?

Output of "elvish -version"

0.21.0

tesujimath avatar Jan 10 '25 20:01 tesujimath

Here's a bit more about what this could look like.

Let's suppose we have a new builtin function make-io-port. And here, obviously, echo is standing in for an external program.

var p = (make-io-port)
echo "1\n2\n3\n4\n5" | from-lines | $p
$p | take 2
▶ 1
▶ 2
$p | take 1
▶ 3
$p
▶ 4
▶ 5

Perhaps we could also or instead have a builtin function capture, which creates an IO port and attaches its input.

var p = (echo "1\n2\n3\n4\n5" | from-lines | capture)
$p | take 2
▶ 1
▶ 2
$p | take 1
▶ 3
$p
▶ 4
▶ 5

This should work for streaming input, so imagine instead of 5 lines, an unending stream from some external source.

I hope that gives the flavour of what I am envisaging. I am sure there is an interesting design space for such a feature!

tesujimath avatar Jan 12 '25 01:01 tesujimath

I started looking into the implementation for this.

There's a challenge here in the way forms are expected to consume all their input, even if they don't use it. For example, the take builtin looks like this:

func take(fm *Frame, n int, inputs Inputs) error {
	out := fm.ValueOutput()
	var errOut error
	i := 0
	inputs(func(v any) {
		if errOut != nil {
			return
		}
		if i < n {
			errOut = out.Put(v)
		}
		i++
	})
	return errOut
}

That is, it reads all the input and discards anything beyond the first n.

So to do what I envisage would require reworking the way Inputs are handled, so take would be able to decline to consume input beyond what it wanted to pass on, to leave it to a later consumer.

tesujimath avatar Apr 03 '25 01:04 tesujimath

So if @xiaq doesn't find this feature interesting I think I should give up.

tesujimath avatar Apr 03 '25 01:04 tesujimath

Oh yeah IO ports should definitely be supported as a first class data type. I haven't been able to spend much time on Elvish recently unfortunately.

xiaq avatar Apr 06 '25 12:04 xiaq

Wait actually there's more going on semantically than making IO ports first class in your example:

var p = (echo "1\n2\n3\n4\n5" | from-lines | capture)
$p | take 2
▶ 1
▶ 2
$p | take 1
▶ 3
$p
▶ 4
▶ 5

It seems that the idea is that the original pipeline would be left running in the background while $p still holds some data?

That's not how pipeline termination semantics work in Elvish: a pipeline terminates when all of its constituent commands terminate. Changing this would lead to the problem of having a lot of background processes left running, I suppose?

For this example in particular, it seems that what we need is to have a value-channel counterpart to file:pipe, and use run-parallel for structured concurrency:

var p = (make-value-chan)
run-parallel {
  echo "..." | from-lines > $p
} {
  take 2 < $p
  take 1 < $p
}

Of course the semantics of take is another problem as you pointed out.

The reason it currently works this way is that it tries to work for both value and byte inputs (the latter treated as newline-separated string records), and there's no way of reading just the first N inputs from either without the risk of over-reading. So I made take always consume all inputs.

Over the years, though, I've come to think it's probably better to make take work only on value inputs. But changing this risks hanging pipelines when the command before take only produces byte output: if take never consumes byte input and the pipe buffer fills up before the previous command terminates, you get a deadlock. I haven't quite thought through the consequences of that yet.

xiaq avatar Apr 06 '25 13:04 xiaq

Thanks for the thoughtful response @xiaq!

Yes, I was considering leaving a pipeline active to enable multiple bites at it, so to speak. I am aware that here be dragons, and changing the way that the pipelines work would be major. I am also aware that I am pretty new to Elvish, so appreciate your insight here.

I think the way take currently works, discarding the remainder of the input without providing a way to get at the rest, is not ideal. Really, this is the problem I was trying to address.

When you come to have some more time for Elvish, I am interested in collaborating around this, but until then I don't think there's anything I can usefully do.

I have been really enjoying Elvish, and really appreciate your labour of love here. ❤

tesujimath avatar Apr 06 '25 21:04 tesujimath