deser-hjson

Fix multiline and quoteless

Open pamaury opened this issue 2 years ago • 8 comments

The goal of this pull request is to address two areas of the specification that are unclear and not handled well by the parser. This is all documented in the code, but in summary:

  • The parser does not handle multiline strings where text immediately follows ''', as in
desc: '''blabla'''

or

desc: '''blabla
           hello'''

There is also some unclear behaviour related to spaces in this case.

  • The parser does not handle quoteless strings that look like numbers well, such as
x: 0x32

because it thinks 0x32 is a number. This can happen if the parser has no information on the type and is just trying to skip an entry of a map.

  • This also raises some interesting questions that are (again) very unclear in the specification:
sameline1_number: 10, sameline1_string: abc
sameline2_string1: hello, sameline2_string2: abc
sameline2_string: 30 19, sameline2_string2: abc
string_with_space: 10 apples

The first line should probably yield two fields (that's what the official parser does), but what about the second and third? Since 30 19 is not a number, AND quoteless strings can contain commas, the third line should map sameline2_string to 30 19, sameline2_string2: abc. I think the same should apply to hello on the second line. This is what the official parser does anyway.

This PR addresses the first two points. It happens that sameline1_number: 10, sameline1_string: abc and sameline2_string1: hello, sameline2_string2: abc are parsed "correctly". The last point is not fully addressed: even though I have added a test for it, that test does not pass with the code in this PR. For example, string_with_space: 10 apples fails to parse.
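To illustrate the second point, here is a minimal sketch of the kind of check that tells a number apart from a quoteless string such as 0x32. It is not the code from this PR or from the crate: is_json_number is a hypothetical helper, and it assumes that a value counts as a number only if the whole token matches JSON number syntax.

```rust
/// Hypothetical helper (not part of deser-hjson): returns true only if the
/// whole of `s` is a number in JSON syntax:
///   -? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?
fn is_json_number(s: &str) -> bool {
    let b = s.as_bytes();
    let mut i = 0;

    // Optional leading minus sign.
    if b.get(i).copied() == Some(b'-') {
        i += 1;
    }

    // Integer part: a single '0', or a non-zero digit followed by digits.
    match b.get(i).copied() {
        Some(b'0') => i += 1,
        Some(c) if c.is_ascii_digit() => {
            while i < b.len() && b[i].is_ascii_digit() {
                i += 1;
            }
        }
        _ => return false,
    }

    // Optional fraction: '.' followed by at least one digit.
    if b.get(i).copied() == Some(b'.') {
        i += 1;
        let start = i;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        if i == start {
            return false;
        }
    }

    // Optional exponent: 'e' or 'E', optional sign, at least one digit.
    if matches!(b.get(i).copied(), Some(b'e') | Some(b'E')) {
        i += 1;
        if matches!(b.get(i).copied(), Some(b'+') | Some(b'-')) {
            i += 1;
        }
        let start = i;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        if i == start {
            return false;
        }
    }

    // Anything left over (such as the "x32" in "0x32") means "not a number".
    i == b.len()
}
```

With a check like this, 0x32, 30 19 and 10 apples are all rejected as numbers and can fall back to being read as quoteless strings, while 10 is still accepted as a number.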

pamaury avatar Aug 01 '23 14:08 pamaury

@pamaury Just letting you know I saw your PR, added it to my TODO list, but probably won't be able to handle it soon (maybe not this month) due to other emergencies.

Canop avatar Aug 01 '23 15:08 Canop

@Canop Thank you for your quick reply, there is no rush. I might try to fix the remaining issue with x: 10 apples in the meantime.

pamaury avatar Aug 01 '23 16:08 pamaury

According to the specification, a quoteless string always goes until the LF, commas or not.

sameline2_string1: hello, sameline2_string2: abc is only one key (sameline2_string1) with value being hello, sameline2_string2: abc

sameline2_string: 30 19, sameline2_string2: abc depends on whether sameline2_string requires a number or a string:

  • if it requires a number, then it's an error
  • if it requires a string, then the value is 30 19, sameline2_string2: abc
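The rule above can be sketched as follows. This is only an illustration under simplifying assumptions, not the crate's parser: split_quoteless_member is a hypothetical helper that handles a single unquoted key, splits at the first colon, and trims surrounding whitespace from the value.

```rust
/// Hypothetical helper (not part of deser-hjson): splits a `key: value`
/// member and reads the value as a quoteless string, i.e. everything up to
/// the first LF, commas included. Returns None if there is no colon.
fn split_quoteless_member(input: &str) -> Option<(&str, &str)> {
    // The key ends at the first colon.
    let (key, rest) = input.split_once(':')?;
    // The quoteless value runs to the end of the line, whatever it contains.
    let value = rest.split('\n').next().unwrap_or("");
    // Trimming surrounding whitespace is a simplification made for this sketch.
    Some((key.trim(), value.trim()))
}
```

Applied to sameline2_string1: hello, sameline2_string2: abc, this returns the single key sameline2_string1 with the value hello, sameline2_string2: abc, matching the reading described above.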

The multiline string problem is comparatively simpler, and I'd probably accept a PR for just this part.

Canop avatar Oct 26 '23 14:10 Canop

@pamaury can you cut this PR in parts and make one for just the multiline problem ? See https://github.com/Canop/deser-hjson/issues/19

If you don't, I'll do it.

Canop avatar Nov 09 '23 06:11 Canop

Thank you for the answer @Canop, yes I can split the PR, I'll try to do it tomorrow and resubmit just the part about multilines.

pamaury avatar Nov 09 '23 07:11 pamaury

> I'll try to do it tomorrow

@pamaury Does it work out or should I just take this part ?

Canop avatar Nov 24 '23 15:11 Canop

Sorry @Canop, I got sidetracked with something else at work. Please feel free to just take the part about the multiline.

pamaury avatar Nov 24 '23 15:11 pamaury

I ended up doing the multiline differently, because your code, while better than my previous one, wasn't correctly handling some (weird) cases of multiline strings (see https://github.com/Canop/deser-hjson/blob/main/tests/multiline_strings.rs)

Canop avatar Nov 26 '23 21:11 Canop