libucl icon indicating copy to clipboard operation
libucl copied to clipboard

Fix excessive escaping when using `ucl_object_fromstring()`

Open flowerysong opened this issue 2 years ago • 5 comments

UCL_STRING_ESCAPE might be useful for something (though I'm not sure what) but it's not good default behaviour. Strings are escaped during output to meet the needs of the output format, so adding escape sequences while building the object results in things being escaped twice.

flowerysong avatar Mar 20 '22 04:03 flowerysong

This is clearly an example of breaking POLA policy. If one wants a behaviour to skip escaping, it is always possible to use ucl_object_fromstring_common with UCL_STRING_RAW flag. I don't want to switch the default behaviour of ucl_object_fromstring as it is clearly a major breaking change.

vstakhov avatar Mar 20 '22 12:03 vstakhov

I think it's far more astonishing that ucl_object_fromstring() is unsafe to use (I was certainly surprised.) Ultimately it's up to you, but POLA doesn't mean "never fix major bugs."

flowerysong avatar Mar 20 '22 15:03 flowerysong

I understand and share your frustration: the original decision of the default behaviour of ucl_object_fromstring was definitely not right as it led to double escaping in emitter/parser chains. I would not say that this behaviour is unsafe, as it is actually too safe. The reason why I don't like the idea to merge it is that it might break some use cases when an API user expects that all data imported by this function is perfetly JSON safe. With your change this assumption is broken. And whilst I also think that it is a right thing I don't think that it is a good reason to break the (potentially) existing code after all.

vstakhov avatar Mar 21 '22 20:03 vstakhov

I called it "unsafe" because it destroys the integrity of the data that you hand it. We discovered this behaviour in a part of our program that reads in data from a JSON file, does some processing, then writes out the JSON file. Without knowing the number of times the file has gone through this process we cannot retrieve the original value.

Take as an example:

{
  "foo": "\\\\n"
}

This value could be \\n that has been processed once or \n that has been processed twice. There's no way to tell just by looking at the data.

It's simple to avoid this behaviour once you know it exists, but I believe that fixing the broken interface is better than leaving it there as a trap for the unwary.

flowerysong avatar May 05 '22 04:05 flowerysong

Please don't close this issue, I'm still thinking of a better choice here. I think you are right but I just need to sit and check the use cases more carefully. Thank you!

vstakhov avatar May 17 '22 18:05 vstakhov

changing existing API is hard, but perhaps its possible to add an additional API that addresses this.

dch avatar Jul 15 '23 20:07 dch

I think this should be really done

vstakhov avatar Feb 01 '24 08:02 vstakhov