
Get line and column number on valid JSON?

Open oOBoomberOo opened this issue 5 years ago • 3 comments

I'm currently parsing JSON data and validating the input; if the input is invalid, I display an error message to the user. I can't just use the Deserializer trait, because validating this input requires populating external data first.

oOBoomberOo avatar Mar 27 '20 05:03 oOBoomberOo

I recently wrote json-spanned-value = "0.2" (github, crates.io, docs.rs) to tackle this. Caveats:

  • Currently requires the JSON be loaded into memory first
  • Byte offsets only, you'll need to create line/column information yourself
  • (Ab)uses the fact that serde_json only reads one byte at a time from std::io::Read implementations
  • Might not play nicely with #[serde(untagged)] enums, or anything else that might make multiple attempts to parse data
  • Requires you to use json_spanned_value::from_* instead of serde_json::from_*
  • Only one user to find all the bugs, missing features, and edge cases so far - myself

That said, it seems to be working OK for my use cases - maybe give it a whirl? Basically, you just wrap anything you want a span of with json_spanned_value::Spanned<...>. You can stick them in a deserializable struct, or use a json_spanned_value::Value, which recursively wraps a whole tree in Spanned sections.

I've got an examples/demo.rs using codespan-reporting = "0.9.5" that can be matched via .vscode/tasks.json problemMatcher:

image
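
On the "byte offsets only" caveat: turning a byte offset into a line/column pair can be done with the standard library alone. The following is a minimal sketch, not part of json-spanned-value; line_col is a hypothetical helper name, and it assumes the whole source is available as a &str, with columns counted in chars rather than bytes or grapheme clusters.

```rust
/// Hypothetical helper (not from any crate): convert a byte offset into a
/// 1-based (line, column) pair. Returns None if the offset is past the end
/// of `src` or falls inside a multi-byte UTF-8 sequence.
fn line_col(src: &str, offset: usize) -> Option<(usize, usize)> {
    // is_char_boundary is false for offsets greater than src.len().
    if !src.is_char_boundary(offset) {
        return None;
    }
    let before = &src[..offset];
    // Each '\n' before the offset starts a new line, so "\r\n" counts once.
    let line = before.bytes().filter(|&b| b == b'\n').count() + 1;
    // The current line starts just after the last '\n', or at byte 0.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    // Column is counted in chars (Unicode scalar values), 1-based.
    let col = before[line_start..].chars().count() + 1;
    Some((line, col))
}
```

This rescans the source on every call. If many spans need converting, it is probably better to collect the line-start offsets once and binary search them.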

MaulingMonkey avatar Oct 01 '20 00:10 MaulingMonkey

Since serde::Deserializer (and impl) are already visiting each byte of a string or stream, wouldn't it be more efficient to optionally - via feature - parse any combination of \r and \n (pairs treated as a single newline) and pass them to an optionally-implemented visit_newline or something? An implementation such as serde_json could keep track of line numbers at least so it could opt to attach it to serde_json::Value - again, probably as an optional feature to reduce allocations.

toml::Spanned does indeed keep track of bytes, but in many cases a byte offset and length is not useful to end users, and without doubly consuming and parsing newlines from a string or stream, an implementation can't efficiently determine line numbers.
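
To illustrate the single-pass idea, here is a rough sketch of counting newlines while the bytes are being read anyway. LineCounter is a hypothetical wrapper around std::io::Read, not an existing serde or serde_json API, and there is no visit_newline hook today. It treats "\r\n", a lone "\r", and a lone "\n" each as one line break.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper (not part of serde or serde_json) that counts
/// line breaks in the bytes passing through it.
struct LineCounter<R> {
    inner: R,
    newlines: usize,
    // True if the previous byte was '\r', so a following '\n' is not recounted.
    last_was_cr: bool,
}

impl<R: Read> LineCounter<R> {
    fn new(inner: R) -> Self {
        LineCounter { inner, newlines: 0, last_was_cr: false }
    }

    /// 1-based line number of the next byte to be read.
    fn line(&self) -> usize {
        self.newlines + 1
    }
}

impl<R: Read> Read for LineCounter<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        for &b in &buf[..n] {
            match b {
                b'\r' => {
                    self.newlines += 1;
                    self.last_was_cr = true;
                }
                b'\n' => {
                    // Skip the '\n' of a "\r\n" pair; the '\r' already counted.
                    if !self.last_was_cr {
                        self.newlines += 1;
                    }
                    self.last_was_cr = false;
                }
                _ => self.last_was_cr = false,
            }
        }
        Ok(n)
    }
}
```

The count reflects bytes handed to the consumer, so it only matches the parser's position if the consumer reads without buffering ahead, which is the one-byte-at-a-time behavior described earlier in this thread.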

heaths avatar Nov 02 '23 21:11 heaths

I'm encountering a similar issue where I want to parse a JSON document while keeping track of source locations. serde_spanned from the toml crate is probably the best start, but I 100% agree that a (line, column) pair would help a lot for UI, on top of the byte offset.

demurgos avatar Nov 13 '23 00:11 demurgos