csv-parser icon indicating copy to clipboard operation
csv-parser copied to clipboard

Parser can fail for large files (>128 KiB) due to stale reference

Open SydMontague opened this issue 2 years ago • 3 comments

Description

When constructing a parser a std::istream referenced is used and stored. However, the lifecycle of the std::istream can be shorter than the parser, leaving it with a stale reference.

This won't affect small files (<= INPUTBUF_CAP), since those will be buffered on construction. However, with larger files it will continuously attempt to read from the stream. If the reference has become stale at that point it will result in an error.

This can for example happen when a Parser is constructed and stored as a class member with a locally scoped std::ifstream.

Sample Code

struct Test {
  std::unique_ptr<aria::csv::Parser> parser;
  
  Test(std::string path) {
    std::ifstream stream(path, std::ios::in);
    parser = std::make_unique<area::csv::Parser>(stream);
  }
};

Solutions/Workarounds

The problem can be avoided by making sure the std::istream doesn't go out of scope before the parser, making this error mostly a user error.

However, I think for a future revision it might be worthwhile to have the parser take possession of the passed stream, for example in form of a smart pointer.

Until then this issue might serve as help for the next poor fellow who falls into that trap. ;)

SydMontague avatar Aug 14 '22 12:08 SydMontague

@SydMontague Do you know if your suggestions have been applied to the current codebase? If not, can you please show me what changes I need to make at my end to address this? Thank you. 🙏

ajtruckle avatar Mar 31 '24 11:03 ajtruckle

The issue hasn't been directly addressed in the codebase yet. Doing so would require an API breakage of some kind.

If you're just using the library, simply make sure the istream you're passing to the CsvParser class never goes out of scope before you're done using the parser. If you want to create a patch for the library itself then you'd have to pick one of the ways the class can take possession of the argument passed to it. The easiest way would be changing the member and constructor argument from std::istream& to std::shared_ptr<std::istream>.

There might be other solutions, but I'd need to invest a bit more brain power to think about their viability.

SydMontague avatar Mar 31 '24 11:03 SydMontague

@SydMontague That's fine then. In this case I have a simple need for reading a CSV and it is all done in the one function.

I did manage to fork a new branch with my static DetectUtf8_BOM method for the benefit of others (#22). I created a pull request.

ajtruckle avatar Mar 31 '24 12:03 ajtruckle