Discussion: how to implement `Source` for lazy readers, like `T: Read` or `T: BufRead`
Hello all,
I'd like to open this discussion because it would be great if Logos supported `Source` types other than `str` and `[u8]`, especially lazy readers such as types that implement `Read` or `BufRead`.
`impl<T: Read> Source for T`, or `impl<T: Read + Seek> Source for T`, would be a game changer for me, as it would allow lexing input without loading all of it into memory first.
I have tried a few different implementations, and I already see some shortcomings that need to be addressed or discussed:
- `Source::len() -> usize` should maybe be `Source::len_hint() -> Option<usize>`.
- `Source::read_*` methods should take a mutable reference to the reader (but this may reduce performance for types like `str` or `[u8]` that do not need mutability).
- The unsafe methods do not really make sense here, so I don't know how to deal with them (except by copying and pasting the safe equivalent).
- Reading at an `offset` position may not be a good fit, especially since this may require using `Seek::seek`. If backtracking is never allowed, then using only `read` methods should be fine, no?
- Tokens take a reference to the original source, so I don't know if implementing for `Read` is enough, because we may lose every reference to the original source. Implementing `Source for Bytes` may be a solution.
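To make the points above more concrete, here is a minimal, stdlib-only sketch of what a lazy source could look like. `LazySource`, `read_at`, and `Buffered` are hypothetical names invented for illustration and are not part of Logos. The sketch assumes the lexer never needs bytes that have not been read yet, and it keeps everything read so far in a buffer so that earlier offsets stay addressable without `Seek`.

```rust
use std::io::{self, Read};

/// Hypothetical trait (NOT the Logos `Source` trait): a source that may not
/// know its length and needs `&mut self` to fetch more bytes.
pub trait LazySource {
    /// Total length in bytes, if it is known.
    fn len_hint(&self) -> Option<usize>;
    /// Returns up to `len` bytes starting at absolute `offset`, pulling more
    /// data from the underlying reader when needed.
    fn read_at(&mut self, offset: usize, len: usize) -> io::Result<&[u8]>;
}

/// Keeps every byte read so far, so offsets already seen can be served again
/// without seeking in the underlying reader.
pub struct Buffered<R> {
    reader: R,
    buf: Vec<u8>,
    eof: bool,
}

impl<R: Read> Buffered<R> {
    pub fn new(reader: R) -> Self {
        Buffered { reader, buf: Vec::new(), eof: false }
    }
}

impl<R: Read> LazySource for Buffered<R> {
    fn len_hint(&self) -> Option<usize> {
        // The length is only known once the reader has reported end of input.
        if self.eof { Some(self.buf.len()) } else { None }
    }

    fn read_at(&mut self, offset: usize, len: usize) -> io::Result<&[u8]> {
        let end = offset.saturating_add(len);
        let mut chunk = [0u8; 4096];
        // Fill the buffer until it covers the requested range or input ends.
        while !self.eof && self.buf.len() < end {
            let n = self.reader.read(&mut chunk)?;
            if n == 0 {
                self.eof = true;
            } else {
                self.buf.extend_from_slice(&chunk[..n]);
            }
        }
        // Clamp the range so a short read yields a short (possibly empty) slice.
        let start = offset.min(self.buf.len());
        let stop = end.min(self.buf.len());
        Ok(&self.buf[start..stop])
    }
}
```

Note that this version still ends up holding the whole input once it has been consumed. A real implementation would probably discard bytes that precede the last emitted token, which is exactly where the question of tokens borrowing from the source comes back in.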
My question is then: has anyone already thought about this problem? Does anyone have ideas or suggestions?
- Shouldn't be too much of a blocker. Even when reading from disk, a simple `fstat` or equivalent shouldn't be too expensive, especially if the result is cached.
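As a rough illustration of the cached-length idea, the sketch below wraps a `File` and asks the OS for its size only once. `SizedFile` is a hypothetical helper, not something Logos provides; `File::metadata` is the standard-library call that plays the role of `fstat` here.

```rust
use std::fs::File;
use std::io;

/// Hypothetical wrapper: queries the file size once and caches the result.
pub struct SizedFile {
    file: File,
    len: Option<u64>,
}

impl SizedFile {
    pub fn new(file: File) -> Self {
        SizedFile { file, len: None }
    }

    /// Returns the cached length, calling `File::metadata` only on first use.
    pub fn len(&mut self) -> io::Result<u64> {
        if let Some(len) = self.len {
            return Ok(len);
        }
        let len = self.file.metadata()?.len();
        self.len = Some(len);
        Ok(len)
    }
}
```

The cached value goes stale if the file grows or shrinks while it is being lexed, so this only fits inputs that are not modified concurrently.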
The changes required to make Logos itself accept a mutable source are non-trivial, but this might be of interest to you if you're just looking for a way to leverage a Logos lexer with a `Read`/`BufRead`:
https://github.com/cliffeh/logos-genawaiter/blob/main/src/main.rs
It's not exactly efficient and in its current form only works with line-wise input, but it should give some idea of the possibilities.
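For readers who want the general shape of a line-wise approach without opening the link, here is a stdlib-only sketch. It is not the code from that repository: `lex_line` is a hypothetical stand-in for running a Logos-derived lexer over one line, and `Token` is an invented example type. Tokens are owned, which sidesteps the problem of borrowing from a line buffer that is dropped after each iteration.

```rust
use std::io::{self, BufRead};

/// Example token type with owned data, so tokens can outlive the line buffer.
#[derive(Debug, PartialEq)]
pub enum Token {
    Number(u64),
    Word(String),
}

/// Stand-in for a per-line lexer. With Logos, this is where a lexer derived
/// for your own token type would run over `line`.
fn lex_line(line: &str) -> Vec<Token> {
    line.split_whitespace()
        .map(|s| match s.parse::<u64>() {
            Ok(n) => Token::Number(n),
            Err(_) => Token::Word(s.to_string()),
        })
        .collect()
}

/// Reads one line at a time and yields owned tokens, so only a single line
/// is held in memory at once. I/O errors are passed through as `Err` items.
pub fn lex_reader<R: BufRead>(reader: R) -> impl Iterator<Item = io::Result<Token>> {
    reader.lines().flat_map(|line| -> Vec<io::Result<Token>> {
        match line {
            Ok(text) => lex_line(&text).into_iter().map(Ok).collect(),
            Err(e) => vec![Err(e)],
        }
    })
}
```

The obvious limitation is the same one mentioned above: a token can never span a line break, so this only suits grammars whose tokens are confined to a single line.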