edn-rs
edn-rs copied to clipboard
EdnError rework (new parse impl)
Resolves https://github.com/edn-rs/edn-rs/issues/119
Performance
Note this is using the benchmark in main
branch. The performance increase should be much better on larger strings, as there is now no clone()
in parse. The only allocations are for constructing the EDN struct and for parsing numbers. Note that the walker is now all pointer based. Peeking was a big reason previously for the clones, you can see the new performance characteristics (utf-8 complexity but no allocations) https://godbolt.org/z/zK9dEzrYn
Before
Running benches/parse.rs (target/release/deps/parse-5de3e714a9fe6fe3)
parse time: [23.316 µs 24.270 µs 25.380 µs]
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
Running benches/tagged_parse.rs (target/release/deps/tagged_parse-041b63e457dffa70)
tagged_parse time: [4.4875 µs 4.5609 µs 4.6395 µs]
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
After
Running benches/parse.rs (target/release/deps/parse-5de3e714a9fe6fe3)
parse time: [4.6219 µs 4.6649 µs 4.7142 µs]
change: [-81.813% -80.967% -80.093%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low mild
4 (4.00%) high mild
8 (8.00%) high severe
Running benches/tagged_parse.rs (target/release/deps/tagged_parse-041b63e457dffa70)
tagged_parse time: [1.6429 µs 1.6504 µs 1.6588 µs]
change: [-65.166% -64.416% -63.697%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
Debug
given
let edn = "#_ ,, #{hello, this will be discarded} #_{so will this} #{this is invalid";
let e = Edn::from_str(edn);
println!("{e:?}");
Before
Err(ParseEdn("None could not be parsed at char count 58"))
After
Err(EdnError { code: UnexpectedEOF, line: Some(1), column: Some(74), index: Some(73) })
Examples/tests can be seen in https://github.com/Grinkers/edn-rs/blob/parse/tests/error_messages.rs
Errors
In this rework, the following strings were also previously being parsed as valid EDN. Fuzzing invalid EDN is still a TODO (separate issue/PR).
":"
"5011227E71367421E12" ; https://github.com/edn-format/edn#symbols
"{ :[0x42] 42 }"
"\\cats"
"{ :foo 42 :foo 43 }"
https://github.com/edn-rs/edn-rs/pull/150/commits/3965bc3221d79b883a7059d7ce72c4c5c2eee366
I ended up adding a CustomError
fallback for crates like edn-derive
, std-only.
@evaporei Matching PR for compatibility over at (this is a breaking change). Please tell me if you have any better ideas. I believe derive will always rely on std, so it's easy enough to add std stuff without messing with the base layer https://github.com/edn-rs/edn-derive/pull/44
Added a new Error for duplicate in sets, matching clojure.
Sorry for the delay, I'm moving to another city and have had limited time. I can take a look as soon as the branch is updated
Codecov Report
Attention: Patch coverage is 77.03927%
with 76 lines
in your changes are missing coverage. Please review.
Project coverage is 71.37%. Comparing base (
c60d05b
) to head (f658a21
).
Files | Patch % | Lines |
---|---|---|
src/deserialize/parse.rs | 81.59% | 53 Missing :warning: |
src/deserialize/mod.rs | 40.00% | 21 Missing :warning: |
src/edn/error.rs | 75.00% | 2 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #150 +/- ##
==========================================
- Coverage 73.43% 71.37% -2.06%
==========================================
Files 8 9 +1
Lines 862 877 +15
==========================================
- Hits 633 626 -7
- Misses 229 251 +22
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Sorry for the delay, I'm moving to another city and have had limited time. I can take a look as soon as the branch is updated
No worries, I don't think it's too urgent. It would be nice to be merged in so I can tackle the other issues though. Rebased, which is what I assume you meant. No other changes. Tell me if you want this broken up/squashed in a different way.
I think https://github.com/edn-rs/edn-rs/issues/138 should be a higher priority (github release action -> crates.io publishing). I think we should do a final 0.17 before this PR gets merged too.
Code coverage bot seems really bad? It's marking tons of things as untested.
To compare, here's the source based coverage generated by https://github.com/Grinkers/edn-rs/blob/5ad0427304dc9a05276ef2de36013c82e5ab22e8/bb.edn#L38