yojson icon indicating copy to clipboard operation
yojson copied to clipboard

Extension for handling Yojson data with code positions

Open gfngfn opened this issue 7 years ago • 1 comments

This PR proposes an extension that enables us to handle Yojson data equipped with code positions the data come from. It adds

  • a submodule Yojson.SafePos and
  • a type Yojson.position for code positions,

and thereby maintains backward compatibility with the latest version (i.e. yojson 1.4.1). It probably be useful in situations like the following: where one expects the value corresponding to a certain name (such as "title") in objects to be a specific kind of data (such as string) but some data in a given (possibly handwritten) Yojson do not meet the expectation. In such case, one can report the code position where the unexpected data occur by using Yojson.SafePos in order to ask for correction in an easy-to-understand manner.

For an example of usage, see examples/filtering_pos.ml (which can be run by invoking the updated run-examples.sh).

gfngfn avatar Oct 22 '18 10:10 gfngfn

Hi! Thanks for looking into this.

It seems that keeping track of locations would be useful to people who use yojson directly i.e. not from atdgen and don't have strict performance requirements. I imagine this fits the scenarios where json is used as a simple config file. Note that there's a bit of tension between achieving the best performance when working with large amounts of machine-generated json data and with human-written json which benefits from better error reporting.

A thing I never particularly like about yojson is that we have already 3 modules to pick from (Basic, Safe, and Raw) and that they're generated by cppo. Are you confident that this fourth module is the right one? Is there anything else we should add to it or remove while we're at it?

Other than that, my concern is about the performance loss introduced by creating and discarding location objects. Did you run performance benchmarks? I haven't done that recently, but based on results I got in the past, it would be significantly faster (10% or more) to call get_raw_position only if the position is not discarded. Can you try getting rid of these calls by using macros or preprocessor conditionals?

mjambon avatar Nov 19 '18 20:11 mjambon