Trying to parse a 1.7GB text file throws ArgumentOutOfRangeException

Open atlemann opened this issue 4 years ago • 3 comments

Is there a limit on how large a file FParsec supports? From the code, I can see that it reads the input in chunks, but I can't find which StringBuilder.Append call is failing.

System.ArgumentOutOfRangeException: The length cannot be greater than the capacity. (Parameter 'valueCount')
   at System.Text.StringBuilder.Append(Char* value, Int32 valueCount)
   at System.Text.StringBuilder.Append(Char[] value, Int32 startIndex, Int32 charCount)
   at FParsec.CharStream.StreamConstructorContinue(Stream stream, Boolean leaveOpen, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 byteBufferLength)
   at FParsec.CharStream..ctor(String path, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 byteBufferLength)
   at FParsec.CharStream..ctor(String path, Encoding encoding)
   at FParsec.CharStream`1..ctor(String path, Encoding encoding)
   at FParsec.CharParsers.runParserOnFile[a,u](FSharpFunc`2 parser, u ustate, String path, Encoding encoding)

File.ReadAllText on the same file throws System.OutOfMemoryException ("Insufficient memory to continue the execution of the program."), so I have to parse it in chunks.

atlemann, Oct 31 '19 12:10
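
For readers who end up with the same "parse it in chunks" workaround, here is a minimal sketch of one way to do it. It assumes the file is line-oriented, so each line is a complete record that can be parsed on its own; that is an assumption, not something stated in the issue. `recordParser` and `parseLines` are hypothetical names, and the comma-separated float grammar is only a stand-in for the real format. Only standard FParsec functions are used (`run`, `sepBy1`, `pfloat`, `pchar`, `eof`).

```fsharp
// Sketch only: assumes one self-contained record per line.
// Requires the FParsec NuGet package.
open System.IO
open FParsec

// Hypothetical record parser: a comma-separated list of floats
// that must consume the whole line.
let recordParser : Parser<float list, unit> =
    sepBy1 pfloat (pchar ',') .>> eof

// Parse the file lazily, one line at a time, so that no single string
// (and no single CharStream) ever has to hold the whole file.
// Result.Ok / Result.Error are qualified because FParsec also defines
// Ok and Error (as ReplyStatus literals).
let parseLines (path: string) : seq<Result<float list, string>> =
    File.ReadLines path
    |> Seq.mapi (fun i line ->
        match run recordParser line with
        | Success (value, _, _) -> Result.Ok value
        | Failure (message, _, _) ->
            Result.Error (sprintf "line %d: %s" (i + 1) message))
```

Because `File.ReadLines` is lazy, memory use is bounded by the longest line rather than by the file size. This does not help if a single record spans the whole file or if the grammar needs context across lines; in that case the input would have to be split at some other boundary that is known to be safe.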

The version of FParsec shipped in the FParsec NuGet package can't parse arbitrarily long streams; see http://www.quanttec.com/fparsec/download-and-installation.html#nuget-packages. The FParsec-Big-Data-Edition version can, but unfortunately it hasn't been ported to .NET Core yet.

stephan-tolksdorf, Nov 10 '19 15:11

Ok, thanks! Would it require a lot of code changes to make it target netstandard2.0, or is it more or less a matter of updating the project files? I could probably contribute that, although it seems Enrico may already have done that job?

atlemann, Nov 11 '19 08:11

AFAIK, the biggest issue is that the encoding decoders in .NET Core are not serializable, which breaks the non-low-trust implementation of CharStream.

stephan-tolksdorf, Nov 13 '19 19:11