Trying to parse a 1.7GB text file throws ArgumentOutOfRangeException
Is there a limit on how large a file FParsec supports? From the code I can see that it reads the file in chunks, but I cannot find which StringBuilder.Append call is failing.
System.ArgumentOutOfRangeException: The length cannot be greater than the capacity. (Parameter 'valueCount')
at System.Text.StringBuilder.Append(Char* value, Int32 valueCount)
at System.Text.StringBuilder.Append(Char[] value, Int32 startIndex, Int32 charCount)
at FParsec.CharStream.StreamConstructorContinue(Stream stream, Boolean leaveOpen, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 byteBufferLength)
at FParsec.CharStream..ctor(String path, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 byteBufferLength)
at FParsec.CharStream..ctor(String path, Encoding encoding)
at FParsec.CharStream`1..ctor(String path, Encoding encoding)
at FParsec.CharParsers.runParserOnFile[a,u](FSharpFunc`2 parser, u ustate, String path, Encoding encoding)
File.ReadAllText on the same file throws System.OutOfMemoryException ("Insufficient memory to continue the execution of the program."), so I have to parse the file in chunks.
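For reference, a minimal sketch of what chunked parsing could look like. It assumes the input is line-oriented, so that each line can be parsed independently; that is an assumption about the file, not something FParsec requires. `recordParser` and `parseLines` are hypothetical names, and the comma-separated-integers grammar is only a placeholder.

```fsharp
// Sketch only: assumes every line of the file is an independent record.
// `recordParser` and `parseLines` are hypothetical names, not part of FParsec.
open System.IO
open FParsec

// Placeholder per-line grammar: one or more comma-separated integers.
let recordParser : Parser<int32 list, unit> =
    sepBy1 pint32 (pchar ',') .>> eof

// File.ReadLines enumerates the file lazily, so only the current line
// is held in memory while the parser runs on it.
let parseLines (path: string) : seq<ParserResult<int32 list, unit>> =
    File.ReadLines path
    |> Seq.map (run recordParser)
```

Each element of the returned sequence is an ordinary `ParserResult`, so it can be matched on `Success` and `Failure` as usual. Note that positions in error messages are then relative to the individual line, not to the whole file.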
The version of FParsec that is shipped in the FParsec NuGet package can't parse arbitrarily long streams, see http://www.quanttec.com/fparsec/download-and-installation.html#nuget-packages
The FParsec-Big-Data-Edition version does, but unfortunately it hasn't yet been ported to .NET Core.
Ok, thanks! Would porting it to netstandard2.0 require a lot of code changes, or is it more or less a matter of updating the project files? I could probably contribute that, although it seems Enrico may have done that job already?
AFAIK, the biggest issue is that the encoding decoders in .NET Core are not serializable, which breaks the non-low-trust implementation of CharStream.