more idiomatic iterator use
NOTE: this PR is targeted at the stdlib-iterator branch as a potential improved v2 API, not current tip.
nerdsnipe alert: I'm still in the relatively early phases of exploring the iterator idiom, so I couldn't resist having a go at simplifying SplitTokensIntoLines, which is now fully streaming (even though none of the call sites look like they can take advantage of that currently).
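To make "fully streaming" concrete, here's a minimal sketch of the shape I mean - not the actual diff, and the signature is my assumption: tokens arrive as an iter.Seq[Token] (inside package chroma, importing iter and strings) and a line is a []Token ending at each newline:

```go
// SplitTokensIntoLines consumes tokens lazily and yields one line at a
// time, so at most one line of tokens is ever buffered.
func SplitTokensIntoLines(tokens iter.Seq[Token]) iter.Seq[[]Token] {
	return func(yield func([]Token) bool) {
		var line []Token
		for t := range tokens {
			// A token may span several lines; peel off one line at a time,
			// keeping the newline attached to the last token of the line.
			for {
				i := strings.IndexByte(t.Value, '\n')
				if i < 0 {
					break
				}
				line = append(line, Token{Type: t.Type, Value: t.Value[:i+1]})
				if !yield(line) {
					return
				}
				line = nil
				t.Value = t.Value[i+1:]
			}
			if t.Value != "" {
				line = append(line, t)
			}
		}
		// Flush a trailing line that has no final newline.
		if len(line) > 0 {
			yield(line)
		}
	}
}
```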
I removed the EOF variable, replacing it with an IsZero method that can be used to check whether a Token is valid. (I don't see anywhere that relies on yielding an EOF value: all the tests pass.)
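For reference, the replacement is tiny; a sketch, assuming Token stays a comparable struct:

```go
// IsZero reports whether t is the zero Token, which callers previously
// had to detect by comparing against the package-level EOF variable.
func (t Token) IsZero() bool {
	return t == Token{}
}
```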
Aside: one convention from the stdlib that I realise I've absorbed but never actually expressed: iterators in Go are generally named as "sequences". So perhaps use chroma.Seq rather than chroma.Iterator (it also has the advantage of being shorter), or just inline iter.Seq[chroma.Token] everywhere - it's not that long and it's super-clear - and remove Iterator entirely.
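If a named type is kept, an alias would keep it interchangeable with the stdlib form. A sketch (assuming Go 1.23+ for package iter):

```go
import "iter"

// Seq is the stdlib-style name for a token iterator. Being an alias
// rather than a defined type, it converts freely to and from
// iter.Seq[Token] at call sites.
type Seq = iter.Seq[Token]
```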
One other thought: a possibly simpler way to write delegatingLexer.Tokenise:
```go
func (d *delegatingLexer) Tokenise(options *TokeniseOptions, text string) (Iterator, error) {
	tokens, err := Coalesce(d.language).Tokenise(options, text)
	if err != nil {
		return nil, err
	}
	// Because we can't yield an error from inside the iterator, try
	// the root lexer on empty input up front, on the supposition that
	// any error it returns is more likely to do with the lexer or the
	// options than with the text we'll pass it later.
	if _, err := d.root.Tokenise(options, ""); err != nil {
		return nil, err
	}
	return func(yield func(Token) bool) {
		for t := range tokens {
			if t.Type != Other {
				if !yield(t) {
					return
				}
				continue
			}
			rootTokens, err := d.root.Tokenise(options, t.Value)
			if err != nil {
				continue // What else could we do? Panic?
			}
			for t := range rootTokens {
				if !yield(t) {
					return
				}
			}
		}
	}, nil
}
```
I don't think I've got the semantics quite right (the tests fail), but it seems conceptually simpler than merging two token streams. Also, with the original approach, ISTM there was no way for the root lexer to distinguish between two independent sections, because all the Other text is mashed together into a single string before the root lexer sees it.
Aside: idiomatic Go tends to use American spelling, so I'm used to typing Tokenize rather than Tokenise, even though I'm a Brit. Maybe consider aligning the spelling with that convention to reduce API surprises?