Unclear documentation for `getstring`
Here is a part of the docs for getstring
If the actual parsed String is needed, however, you can pass your source and the res.val::PosLen to Parsers.getstring to get the actual parsed String value.
But getstring takes 3 arguments, the last one which is called e. I don't see any description of e in the docstring and how it should be used. Looking at the tests, it seems that 0x00 is passed in.
I think it is the "escapechar", so usually you'd pass opts::Options.e
here's how Options documents that:
https://github.com/JuliaData/Parsers.jl/blob/b1da070ccfe7c93013a408c8509d0fd605754917/src/Parsers.jl#L46
As you say, we should improve the docs!
I think it is the "escapechar", so usually you'd pass opts::Options.e
Hm, I have no Options struct. I just parsed as:
julia> str = "\"foo\" # comment"
"\"foo\" # comment"
julia> res = Parsers.xparse(String, str)
Parsers.Result{PosLen}(-32603, 15, PosLen(0x0000000000200003))
julia> Parsers.getstring(str, res.val, 0x00) # what should go as third arg?
"foo"
ah, i see. I'd strongly recommend explicitly creating an Options and passing it around if calling xparse multiple times, otherwise a new Options object gets constructed on each call.
Anyway, I believe for getstring to return the expected result (if an escapechar was parsed) you need to pass the same e (third arg) as was used by the xparse call. I suppose tests use 0x00 since any value is fine if you know no escapechar was parsed.
The default for xparse/Options is e=UInt8('"'), so when not specifying a different escapechar, i'd say pass UInt8('"') to getstring.
Arguably the e argument of getstring should have a default matching the xparse/Options default. But i don't think the 2-arg xparse function is really recommended, which is probably why no corresponding 2-arg getstring function exists.
Also, is "foo" the result you expected here? That xparse call is using quoted=true. If you wanted "\"foo\" # comment" you may want to use quoted=false
Also, is "foo" the result you expected here? That xparse call is using quoted=true. If you wanted ""foo" # comment" you may want to use quoted=false
Actually, that is not correct, the default is quoted=false. So this parsed until the end of line and returns an error code.
julia> Parsers.Options().quoted
false
julia> res = Parsers.xparse(String, str)
Parsers.Result{PosLen}(-32603, 15, PosLen(0x0000000000200003))
julia> Parsers.text(res.code)
"encountered an opening quote character, initial value parsing succeeded, reached EOF, invalid delimiter"
julia> res = Parsers.xparse(String, str, 1, sizeof(str), Parsers.Options(; quoted=true))
Parsers.Result{PosLen}(5, 6, PosLen(0x0000000000200003))
julia> Parsers.text(res.code)
"encountered an opening quote character, initial value parsing succeeded"
yikes, we have too many different xparse methods that set different defaults.
julia> Parsers.xparse(String, str) # == Parsers.xparse(String, str; quoted=true)
options.quoted = true
Parsers.Result{PosLen}(-32603, 15, PosLen(0x0000000000200003))
julia> Parsers.xparse(String, str, 1, sizeof(str))
options.quoted = true
Parsers.Result{PosLen}(-32603, 15, PosLen(0x0000000000200003))
julia> Parsers.xparse(String, str, 1, sizeof(str), Parsers.Options())
options.quoted = false
Parsers.Result{PosLen}(33, 15, PosLen(0x000000000010000f))
julia> Parsers.xparse(String, str, 1, sizeof(str), Parsers.Options(; quoted=true))
options.quoted = true
Parsers.Result{PosLen}(5, 6, PosLen(0x0000000000200003))
-
The first hits this, which passes
quoted::Bool=true: https://github.com/JuliaData/Parsers.jl/blob/e2259a6e10452cf0d5c541183d455295415d1dbe/src/Parsers.jl#L211-L212 -
The second hits this, which uses
Parsers.XOPTIONShttps://github.com/JuliaData/Parsers.jl/blob/e2259a6e10452cf0d5c541183d455295415d1dbe/src/Parsers.jl#L217-L218- and
XOPTIONShasquoted=truehttps://github.com/JuliaData/Parsers.jl/blob/e2259a6e10452cf0d5c541183d455295415d1dbe/src/Parsers.jl#L164 - (aside: Why does
XOPTIONSexist?)
- and
-
The third hits the same method as 2, but passing in
Parsers.Options()which hasquoted=false: https://github.com/JuliaData/Parsers.jl/blob/e2259a6e10452cf0d5c541183d455295415d1dbe/src/Parsers.jl#L157 -
The fourth hits the same method as 2/3, but passes in
Parsers.Options(; quoted=true)to set that explicitly ...but returns a different answer to 1 (xparse(String, str; quoted=true)), becauseOptions()defaults todelim=nothingwhereas 1 setsdelim=UInt8(',')https://github.com/JuliaData/Parsers.jl/blob/e2259a6e10452cf0d5c541183d455295415d1dbe/src/Parsers.jl#L149
This is now very off-topic from your original issue (sorry), so can move it to a new issue, but i think perhaps we could simplify the xparse interface to make this whole thing a little less confusing / more explicit.
I think i'd be in favour of requiring a user-given ::Options argument.
cc @quinnj
EDIT: opened #124
0x00 is fine to use as the 3rd argument if you're not dealing with quoted/escaped characters at all. Basically any byte that shouldn't occur in your string. Parsers.getstring just checks if the PosLen is escaped and, if so, will do the unescaping for you based on that 3rd argument, so if you know the PosLen will never be escaped (i.e. you're not using quote or escape chars), then it's fine to just ignore all of that.
@nickrobinson251, yeah, the xparse methods have kind of grown into a bit of a spaghetti mess; they were each born from different use-cases/motivations, but I agree it's gotten a bit intractable.