Streamline file reading to use io streams
I've recently needed to read files from buffers rather than files (streamed from file selection pickers). I've opened up a Pull Request with a quick fix allowing a buffer override: #491
However, a simple buffer override is suboptimal. Ideally, there are as few diverging paths in the code as possible. Additionally, it makes sense to separate concerns of parsing and I/O, because being agnostic to the buffer source enables flexibility and decreases complexity on both sides. Hence, merging this PR is much preferred to #491.
This Pull Request implements my recommendation for a restructure of rdrecord, rdheader and rdann. rdann was easy to modify, but rdrecord and rdheader's ability to read multi-segment records does not easily translate to reads using streams. This results in code that isn't as nice as I was hoping - so someone needing to read data from buffers must therefore implement surrounding functionality, such as pre-reading the header or acting on multi-segment records, themselves. Still, streamlining the functions addressed in this PR already facilitate accomplishing this goal at all, without needing to re-write all of the parsing functionality of wfdb.
Before Merging
This pull request needs:
- [x] Unit Tests passing
- [x] Updated documentation
Concerns and observations
- Previously, the header parser differentiated between remote and local headers. The new implementation makes it impossible to distinguish between the two. I would appreciate input on whether this differentiation was intentional. If so, we can allow an argument to
_rdheaderspecifying the encoding.- Remote headers were decoded using
iso-8859-1 - Local headers were decoded using
ascii - I'm also open to
_rdheaderaccepting a pre-decoded string, since the file is plaintext. I advocate a binary input though to avoid inconsistent coding behavior.
- Remote headers were decoded using
rdheaderpreviously ignored file read errors for local files. I would like input on whether that was intentional, because a file read error could result in corrupted file reads.rdheaderpreviously allowed passing a directory even ifpn_dirwas supplied. In this case, thefile_namedirectory was cut from the record name and silently ignored. I don't think this is the right call, so I cut this functionality. Let me know if there's a reason to keep it.- Some
bufferingarguments are, as of now, removed. It should be easy to re-introduce them into the new enclosing functions, if needed. - This PR removes a lot of now unneeded private functions. While it's unlikely (and discouraged due to internality) they were used before, it could break dependents relying on them.
- The
no_fileparameter could be deprecated, instead using aNonetest against sig_data. This removes a potential failure case because it's unlikely someone passingsig_datadoesn't intend it to be used.