htslib icon indicating copy to clipboard operation
htslib copied to clipboard

Equivalent of int feof(FILE *stream) for hFILE in htslib/hfile.h

Open freeseek opened this issue 6 years ago • 5 comments

How do you test whether you are at the end of an hFILE?

The simplest way I could come up is the following:

if ( hgetc(fp) != EOF ) ...

However, this is not ideal because it increases fp->begin by one. What is the most appropriate way? I thought of a couple of options:

int heof(hFILE *fp)
{
    return (fp->end==fp->begin) && (refill_buffer(fp)==0);
}

Or the following:

int heof(hFILE *fp)
{
    if ( hgetc(fp) == EOF ) return 1;
    fp->begin--;
    return 0;
}

Either way, it feels weird that hFILE does not have an equivalent for int feof(FILE *stream).

freeseek avatar Jun 25 '18 21:06 freeseek

I was pondering this only recently. It appears currently the way the code works if you keep reading until you get a negative result and then check if errno is non-zero via herrno(fp), indicating a genuine error instead of EOF. This isn't particularly ideal.

Note hFILE struct has an at_eof field which is ideal for this, but while it is in the public header files it has a statement that the structure should be considered as opaque. I'm not sure it is the same thing though, as it looks to be an indicator that the underlying stream is at eof and not necessary our buffer is at eof. However the two combined (we're at end of buffer plus underlying buffer is at eof) would suffice. This essentially matches your first implementation.

@jmarshall wrote this code so he'll know the best route forward.

jkbonfield avatar Jun 26 '18 07:06 jkbonfield

hopen/hread/hwrite/hclose are pretty obviously based on the open/read/write/close system calls, and you test whether you're at the end of the file in the same way: You try to read, and if it gets 0 bytes, you're at EOF (and if it returns negative, you got an error).

It has some extra convenience reading functions, at least one of which (hgetc) doesn't distinguish EOF and errors in its return value. The normal case is hgetc returning EOF means end-of-file, and you would use herrno to check for the abnormal case.

Apart from needing it if your code particularly uses hgetc (but that would be unusual as most calling code will want performance, which usually means reading whole structures at once), the herrno() function is mostly just for convenience, so you don't have to manually preserve errno quite so compulsively after encountering an error from a hFILE function.

If you want to check ahead of time whether you're at EOF… usually this is an indication that you've organised your code wrong, but the obvious way to do it would be to try to hpeek one character.

The hFILE struct is to be considered opaque because to do otherwise would be insane. There is a statement in the public header to that effect for the benefit of third-party developer users, but it ought to be obvious to HTSlib maintainers :smile:.

The at_eof documentation (in hfile.c) is itself a little opaque, but looking at the code you'll soon see it records whether EOF has been seen on the backend. Usually this is just an optimisation to avoid calling hFILE_backend::read() unnecessarily, but when you're typing on the terminal it's what means you don't have to press ^D an indeterminate number of times. So that's not directly relevant.

Having said all that, I would not be averse to adding an heof() function to the API, even though using it would be an indication that you've organised your reading loop badly. The implementation could be similar to your first one but without omitting the error checking, or the moral equivalent of hpeek(1)

The proposed semantics for heof() has it trying to read from the file itself — so it is NOT an equivalent for feof(), which merely tests whether previous reading has set an indicator. So I don't recommend we add that implementation to hFILE as it will encourage people to use feof()wrongly. We may be able to implement an heof() that just tests an EOF indicator (possibly the existing begin/end and backend at_eof suffice, or we may need to add another bespoke indicator bit), but I'm not sure it would be a particular improvement.

jmarshall avatar Jun 26 '18 09:06 jmarshall

For clarification, it was obvious to me why hFILE is opaque. I was simply pointing out that while it exists, do read the comments about the opaqueness.

jkbonfield avatar Jun 26 '18 09:06 jkbonfield

I know you know; I was pointing out that it has a statement that the structure should be considered as opaque mixes up cause and effect. A philosophical point perhaps.

Anyway, noting that:

  • hFILE mostly models low-level read/write system calls, which don't have any associated eof indicator or eof function;

  • to the extent that hFILE has some aspects of high-level stdio reader functions, herrno() suffices to distinguish the cases after hgetc() returns EOF (note that after fgetc() returns EOF, feof() is basically equivalent to !ferror());

  • herrno() is otherwise simply a convenient way of preserving errno when it is set by hFILE functions;

  • for the test-at-the-top-of-the-read-loop use case, the existing hpeek(1) is a better alternative to heof() as it clearly signals that some reading is occurring

do any of the OP or maintainers continue to feel that it is weird that there is no heof() function?

jmarshall avatar Jun 26 '18 09:06 jmarshall

I'm pretty ambivalent about it really.

On one hand it's mirroring UNIX read/write/seek functionality and it's not a big deal either to check via errno. On the other hand that doesn't necessarily mean it has to lack a eof checking function.

Fundamentally I see the bigger problem to be lacking documentation and usage examples, but that's one of those uneviable tasks!

jkbonfield avatar Jun 26 '18 12:06 jkbonfield