Iterators that know their size after start has been called
Another issue that has come up during the Query.jl design: I have a whole bunch of iterators that know their length after the start method has been called.
Would it be possible to add another return value to iteratorsize that is HasLengthAfterStart(), and if a type returns that it has to implement length(source, state)?
See https://github.com/JuliaLang/julia/issues/8149 and https://github.com/JuliaLang/julia/issues/18823
I assume this was just meant for cross-reference? Neither issue proposes something that would address the issue here.
Related: #16708
@davidanthoff can't this be handled with implementation of a stateful iterator? In the new protocol iterate(x) could then have your desired side effect.
I think the new iteration protocol handles this yes.
Hm, I might be missing something, but I don't think this is addressed with the new protocol? How would a source indicate to a client that length can be called after the first call to iterate?
If a source returns Base.HasLength from IteratorSize, then length has to work without a call to iterate. If it returns SizeUnknown then a client really has to assume that length can never be called. Neither case seems to cover what I'm after.
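To make the two existing cases concrete, here is a minimal sketch of a client that picks a strategy from the size trait before it ever calls iterate. `LazySource` and `materialize` are hypothetical names made up for this illustration; only `Base.IteratorSize`, `Base.HasLength`, `Base.HasShape`, `Base.SizeUnknown`, `iterate`, `length` and `eltype` come from Base.

```julia
# Hypothetical source that opts out of `length` entirely.
struct LazySource
    data::Vector{Int}
end

Base.IteratorSize(::Type{LazySource}) = Base.SizeUnknown()
Base.eltype(::Type{LazySource}) = Int
Base.iterate(s::LazySource, i=1) = i > length(s.data) ? nothing : (s.data[i], i + 1)

# Hypothetical client: dispatch on the size trait before iterating.
materialize(iter) = materialize(Base.IteratorSize(iter), iter)

function materialize(::Union{Base.HasLength,Base.HasShape}, iter)
    # `length` is called here before any call to `iterate`.
    out = Vector{eltype(iter)}(undef, length(iter))
    for (i, x) in enumerate(iter)
        out[i] = x
    end
    return out
end

function materialize(::Base.SizeUnknown, iter)
    # `length` is never called on this path, so the result has to grow.
    out = eltype(iter)[]
    for x in iter
        push!(out, x)
    end
    return out
end
```

With only these two branches, a source that learns its size during the first iterate call has to take the SizeUnknown path, which is the gap described above.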
https://github.com/queryverse/IteratorInterfaceExtensions.jl#iteratorsize2 has an implementation of what I'm suggesting. That works OK for now, as this is essentially just used to trigger a performance optimization, but I still think it would make sense to have this in base itself.
The contract for length is that it does not change during iterate, so it seems odd that calling iterate would make it available when it was not before
FWIW though, I think the iteration protocol already expects that iterate has been called at least once before length is called for Base.HasLength, so that is already the expected definition for it
> The contract for length is that it does not change during iterate, so it seems odd that calling iterate would make it available when it was not before
The proposal here is not that length returns something different after iterate is called or that the return value would change during iteration. The proposal is that a source can signal to a client that length should not be called until iterate has been called once, i.e. it really is more of a signal that length is undefined behavior until a certain point in time.
> FWIW though, I think the iteration protocol already expects that iterate has been called at least once before length is called for Base.HasLength, so that is already the expected definition for it
Really? I certainly would not have guessed that at all from looking at the docs. Also, just very briefly looking through Julia base code, that does not seem to be how iterators are used, for example the code here would then be an incorrect consumption of an iterator, right?
Wouldn't that also be a really odd interpretation with the current stateless design of iterators? If length(iter) was only valid after a call to iterate(iter), then that would bake a mutating design into the iteration protocol, which seems a bit odd. That is why in IteratorInterfaceExtensions.jl I added a new method signature for length that is length(iter, x), where x is the iteration state, for this scenario.
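For illustration, a rough sketch of how such a state-based length could look. This is an assumption-laden example, not the actual IteratorInterfaceExtensions.jl code: the `size_trait` function, the `QueryResult` type and the `collect_presized` client are hypothetical names, and the `HasLengthAfterStart` trait is defined locally rather than taken from any package. The source keeps reporting `SizeUnknown` to Base, so ordinary clients such as `collect` still work.

```julia
# Hypothetical trait value, defined here for the sketch; not part of Base.
struct HasLengthAfterStart <: Base.IteratorSize end

# Hypothetical extended trait function that falls back to the Base trait.
size_trait(x) = Base.IteratorSize(x)

# Hypothetical source that only learns its row count when iteration starts.
struct QueryResult
    run::Function   # stands in for executing a query
end

Base.IteratorSize(::Type{QueryResult}) = Base.SizeUnknown()  # what Base clients see
size_trait(::QueryResult) = HasLengthAfterStart()
Base.eltype(::Type{QueryResult}) = Int

# The state is (rows, index); the source object itself is never mutated.
Base.iterate(q::QueryResult) = iterate(q, (q.run(), 1))
function Base.iterate(q::QueryResult, state)
    rows, i = state
    return i > length(rows) ? nothing : (rows[i], (rows, i + 1))
end

# Two-argument length: the size is read from the state, not from the source.
Base.length(q::QueryResult, state) = length(state[1])

# Hypothetical client that uses the size as an optimization once a state exists.
function collect_presized(iter)
    nxt = iterate(iter)
    nxt === nothing && return eltype(iter)[]
    x, state = nxt
    out = eltype(iter)[x]
    if size_trait(iter) isa HasLengthAfterStart
        sizehint!(out, length(iter, state))  # only valid after the first iterate
    end
    nxt = iterate(iter, state)
    while nxt !== nothing
        x, state = nxt
        push!(out, x)
        nxt = iterate(iter, state)
    end
    return out
end
```

Because the size is read from the state rather than from the source, length(iter) never changes meaning and the protocol stays stateless.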