Automa.jl
Revamp BioGenerics.Automa.State: Recoverable readers
BioGenerics.jl currently has an object BioGenerics.Automa.State, which is used to track the state of Readers. It has several problems:
- It's in BioGenerics. Why? It's clearly an Automa thing, and it even relies on Automa internals.
- It contains linenum, despite offering no guarantee that downstream users keep track of it correctly. I believe that if readers keep track of this, it should live in the reader objects themselves.
- It contains an unneeded filled bool value, which I would rather get rid of if possible.
- It does NOT contain the stream position, which is arguably the most important state of all! That means readers are unrecoverable: if you reach a bad FASTA record, you can't reset the reader or have it tell you its position. Ideally, when encountering malformed data, a reader should report the error, then reset itself to the last correct position.
So: Fix these issues. This is a breaking change, so it should go into the upcoming breaking release, and it will also require a breaking release of BioGenerics.
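To make the last point in the list concrete, here is a minimal sketch of what a recoverable reader could look like. It is not Automa or BioGenerics code: the names RecoverableReader, MalformedRecord and read_record! are hypothetical, and the "format" is a simplified two-line FASTA (a ">name" line followed by one sequence line) chosen only to keep the example short. The reader stores the stream position and line number itself, and on malformed input it seeks back to the last good position before reporting the error.

```julia
# Hypothetical sketch; none of these names exist in Automa.jl or BioGenerics.jl.
mutable struct RecoverableReader{T<:IO}
    io::T
    last_good::Int  # stream position just after the last valid record
    linenum::Int    # line number of the next record, tracked by the reader itself
end

RecoverableReader(io::IO) = RecoverableReader(io, position(io), 1)

# Error that carries enough state for the caller to locate the problem.
struct MalformedRecord <: Exception
    linenum::Int
    pos::Int
end

# Read one simplified record: a ">name" line followed by one sequence line.
# Returns `nothing` at end of input.
function read_record!(r::RecoverableReader)
    eof(r.io) && return nothing
    header = readline(r.io)
    if !startswith(header, ">") || eof(r.io)
        # Reset to the last correct position, then report the error.
        seek(r.io, r.last_good)
        throw(MalformedRecord(r.linenum, r.last_good))
    end
    seq = readline(r.io)
    # Only advance the saved state once the whole record was read successfully.
    r.last_good = position(r.io)
    r.linenum += 2
    return (name = header[2:end], seq = seq)
end
```

With this shape, a caller that catches MalformedRecord knows both the line number and the byte offset of the bad record, and the underlying stream is left at a well-defined position instead of somewhere in the middle of the broken data.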
I agree. I'm also not opposed to just breaking BioGenerics or doing without it full stop. As time has gone on, I've increasingly come to think that trying to anticipate what readers should look like in advance, defining that in BioGenerics, and then making every format reader adhere to it is a bad idea, when we can just let each format package define its own reader, records, and functions. If these are well defined and have good interfaces such as iteration, it should be trivial to write generic code on top of them anyway, and they don't need to care about BioGenerics.
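As a rough illustration of that idea, the sketch below shows a format package defining its own reader and implementing only Julia's standard iteration interface. LineRecordReader and count_records are made-up names for this example, and the "records" are plain lines rather than a real format. The generic function at the end knows nothing about the reader type, only that it can be iterated.

```julia
# Hypothetical format-specific reader; nothing here depends on BioGenerics.
struct LineRecordReader{T<:IO}
    io::T
end

# The standard iteration interface is the only contract the reader fulfils.
function Base.iterate(r::LineRecordReader, state=nothing)
    eof(r.io) && return nothing
    return (readline(r.io), nothing)
end

# The number of records is not known without reading the stream.
Base.IteratorSize(::Type{<:LineRecordReader}) = Base.SizeUnknown()
Base.eltype(::Type{<:LineRecordReader}) = String

# Generic code: works for any iterable reader, whatever package defines it.
count_records(reader) = count(_ -> true, reader)
```

Functions such as collect, Iterators.filter, or a plain for loop then work on the reader without any shared abstract type.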