libStatGen icon indicating copy to clipboard operation
libStatGen copied to clipboard

Incorrect behavior with ifeof() when file length matches default buffer size

Open welchr opened this issue 1 year ago • 0 comments

I believe this library is deprecated, but wanted to leave this info for future reference in case someone else runs into this problem.

It seems as though ifeof() does not return true when the last line in the file fills the buffer exactly (to length DEFAULT_BUFFER_SIZE).

If you read through a file using String::readLine():

https://github.com/statgen/libStatGen/blob/fae4fca874b3b78bf9b61c0eae080c15edd976a4/general/StringBasics.cpp#L744-L776

It calls ifgetc() repeatedly:

https://github.com/statgen/libStatGen/blob/fae4fca874b3b78bf9b61c0eae080c15edd976a4/general/InputFile.h#L324-L341

When the last line fills the buffer exactly, myBufferIndex == myCurrentBufferSize == DEFAULT_BUFFER_SIZE.

But when ifeof() checks to see if we have arrived at EOF:

https://github.com/statgen/libStatGen/blob/fae4fca874b3b78bf9b61c0eae080c15edd976a4/general/InputFile.h#L386-L404

It checks if myBufferIndex < myCurrentBufferSize, which will be false, and it thinks the file should continue being read from.

This can result in a seg fault depending on what you try to do with the buffer at that point, after you've tried to read another line (that doesn't exist) from the file.

Code sample:

#include "InputFile.h"
#include "StringBasics.h"
#include "StringArray.h"
#include <iostream>
#include <string>

int main(int argc, char *argv[]) {
  std::string bad_filepath = argv[1];
  IFILE test = ifopen(bad_filepath.c_str(), "r");

  // This loop is roughly what RAREMETAL does when reading in a covariance file
  while (!ifeof(test)) {
    String buffer;
    buffer.ReadLine(test);

    StringArray tokens;
    tokens.AddTokens(buffer, "\t ");

    int p = tokens[0].Find("chr");
  }

  std::cout << "Finished";
  return 0;
}

Attached file, run with above code, will cause a seg fault.

The simple solution to this (if you're a user looking to get around this bug) is just add a zero somewhere it doesn't matter at the end of the file, or an extra space, and the file will be read successfully.

welchr avatar Feb 28 '23 23:02 welchr