ion-rust icon indicating copy to clipboard operation
ion-rust copied to clipboard

Adds location support with `LazyValue` API

Open desaikd opened this issue 9 months ago • 3 comments

Issue #909

Description of changes: This PR works on adding support for location metadata in LazyValue API.

List of changes:

  • Modifies top level matching logic in buffer to preserve row and prev_newline_offset count.
  • Adds a way to store row and prev_newline_offset value in the RawReaderState. This helps in preserving saved state whenever resume is used on the streaming reader.
  • Since streaming reader uses readers for each encoding get these values for encoding Reader types.
  • Add location (usize, usize) to LazyValue
  • Adds unit tests for location support

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

desaikd avatar Mar 03 '25 17:03 desaikd

Codecov Report

Attention: Patch coverage is 85.54913% with 25 lines in your changes missing coverage. Please review.

Project coverage is 77.91%. Comparing base (edf2321) to head (671cd75). Report is 17 commits behind head on main.

Files with missing lines Patch % Lines
src/lazy/any_encoding.rs 60.71% 11 Missing :warning:
src/lazy/binary/raw/reader.rs 76.92% 3 Missing :warning:
src/lazy/binary/raw/v1_1/reader.rs 76.92% 3 Missing :warning:
src/lazy/binary/raw/v1_1/value.rs 0.00% 3 Missing :warning:
src/lazy/binary/raw/value.rs 0.00% 3 Missing :warning:
src/lazy/expanded/mod.rs 66.66% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #922      +/-   ##
==========================================
+ Coverage   77.87%   77.91%   +0.03%     
==========================================
  Files         138      138              
  Lines       34818    34979     +161     
  Branches    34818    34979     +161     
==========================================
+ Hits        27115    27253     +138     
- Misses       5701     5724      +23     
  Partials     2002     2002              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Mar 03 '25 20:03 codecov[bot]

Benchmarks by disabling and enabling location metadata for LazyValue

LazyValue support and with source-location feature disabled:

Gnuplot not found, using plotters backend
Text Ion 1.0 data size: 3690000 bytes
Bin  Ion 1.0 data size: 2670089 bytes
text 1.0/scan all       time:   [28.825 ms 28.988 ms 29.147 ms]
                        change: [+58.126% +58.957% +59.878%] (p = 0.00 < 0.05)
                        Performance has regressed.
text 1.0/read all       time:   [26.999 ms 27.078 ms 27.163 ms]
                        change: [+2.0400% +3.2512% +4.1874%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
  14 (14.00%) high mild
  5 (5.00%) high severe
text 1.0/read 'format' field
                        time:   [22.133 ms 22.194 ms 22.261 ms]
                        change: [+2.9679% +3.3273% +3.6755%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

=== v1.1: maximally compact ===
Binary data size: 620000 bytes
Text   data size: 890000 bytes
maximally compact text 1.1/scan all
                        time:   [8.1125 ms 8.1636 ms 8.2191 ms]
                        change: [-12.529% -11.686% -10.873%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
maximally compact text 1.1/read all
                        time:   [19.497 ms 19.558 ms 19.620 ms]
                        change: [-7.2783% -6.8479% -6.4477%] (p = 0.00 < 0.05)
                        Performance has improved.
maximally compact text 1.1/read 'format' field
                        time:   [9.4405 ms 9.4808 ms 9.5267 ms]
                        change: [-11.395% -10.733% -10.121%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

=== v1.1: moderately compact ===
Binary data size: 1120000 bytes
Text   data size: 1360000 bytes
moderately compact text 1.1/scan all
                        time:   [8.0763 ms 8.0853 ms 8.0948 ms]
                        change: [-13.695% -13.311% -12.976%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
moderately compact text 1.1/read all
                        time:   [16.860 ms 17.047 ms 17.357 ms]
                        change: [-10.334% -9.0061% -7.2801%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
moderately compact text 1.1/read 'format' field
                        time:   [9.7043 ms 9.7375 ms 9.7724 ms]
                        change: [-9.3995% -8.8850% -8.4146%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

=== v1.1: moderately compact w/length-prefixed top level ===
Binary data size: 1140000 bytes
Text   data size: 1360000 bytes
moderately compact w/length-prefixed top level text 1.1/scan all
                        time:   [8.1667 ms 8.1939 ms 8.2284 ms]
                        change: [-14.549% -13.700% -12.941%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  7 (7.00%) high mild
  9 (9.00%) high severe
moderately compact w/length-prefixed top level text 1.1/read all
                        time:   [16.795 ms 16.833 ms 16.872 ms]
                        change: [-9.2602% -8.7515% -8.3216%] (p = 0.00 < 0.05)
                        Performance has improved.
moderately compact w/length-prefixed top level text 1.1/read 'format' field
                        time:   [9.5801 ms 9.5935 ms 9.6092 ms]
                        change: [-11.235% -10.807% -10.464%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

LazyValue support and with source-location feature enabled:

Gnuplot not found, using plotters backend
Text Ion 1.0 data size: 3690000 bytes
Bin  Ion 1.0 data size: 2670089 bytes
text 1.0/scan all       time:   [24.931 ms 25.045 ms 25.183 ms]
                        change: [-14.226% -13.603% -12.917%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
text 1.0/read all       time:   [31.497 ms 31.548 ms 31.598 ms]
                        change: [+16.092% +16.505% +16.896%] (p = 0.00 < 0.05)
                        Performance has regressed.
text 1.0/read 'format' field
                        time:   [26.873 ms 26.964 ms 27.076 ms]
                        change: [+20.944% +21.496% +22.129%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  8 (8.00%) high severe

=== v1.1: maximally compact ===
Binary data size: 620000 bytes
Text   data size: 890000 bytes
maximally compact text 1.1/scan all
                        time:   [9.1104 ms 9.1191 ms 9.1289 ms]
                        change: [+10.946% +11.705% +12.418%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
maximally compact text 1.1/read all
                        time:   [20.958 ms 20.993 ms 21.028 ms]
                        change: [+6.9559% +7.3383% +7.7246%] (p = 0.00 < 0.05)
                        Performance has regressed.
maximally compact text 1.1/read 'format' field
                        time:   [10.495 ms 10.529 ms 10.573 ms]
                        change: [+10.389% +11.061% +11.729%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

=== v1.1: moderately compact ===
Binary data size: 1120000 bytes
Text   data size: 1360000 bytes
moderately compact text 1.1/scan all
                        time:   [9.2199 ms 9.2506 ms 9.2918 ms]
                        change: [+13.998% +14.412% +14.860%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe
moderately compact text 1.1/read all
                        time:   [18.236 ms 18.269 ms 18.305 ms]
                        change: [+5.2620% +7.1710% +8.3892%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe
moderately compact text 1.1/read 'format' field
                        time:   [10.620 ms 10.632 ms 10.645 ms]
                        change: [+8.7749% +9.1862% +9.5837%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

=== v1.1: moderately compact w/length-prefixed top level ===
Binary data size: 1140000 bytes
Text   data size: 1360000 bytes
moderately compact w/length-prefixed top level text 1.1/scan all
                        time:   [9.2430 ms 9.2693 ms 9.2992 ms]
                        change: [+12.563% +13.125% +13.659%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
moderately compact w/length-prefixed top level text 1.1/read all
                        time:   [18.280 ms 18.322 ms 18.366 ms]
                        change: [+8.4830% +8.8496% +9.1932%] (p = 0.00 < 0.05)
                        Performance has regressed.
moderately compact w/length-prefixed top level text 1.1/read 'format' field
                        time:   [10.673 ms 10.699 ms 10.725 ms]
                        change: [+11.189% +11.524% +11.860%] (p = 0.00 < 0.05)
                        Performance has regressed.

Above results show 16% regression for this implementation to support location in LazyValue.

desaikd avatar Mar 04 '25 21:03 desaikd

As I've been reading the code, I think that we could simplify things by storing the data as something like this:

#[cfg(not(feature = "source-location"))]
pub type LocationInternalState = PhantomData<()>;
#[cfg(feature = "source-location")]
pub type LocationInternalState = Option<(NonZero<usize>, usize)>;

Then, we don't need to have #[cfg(feature = "source-location")] all over the codebase. We can have these fields always present with no performance penalty because when the feature is not enabled, it's a zero-sized type.

I think having it with a LocationInternbalState would give the same performance results as this current implementation. But yes, for code readability this suggested solution would be better.

desaikd avatar Mar 06 '25 18:03 desaikd

@desaikd This PR was replaced by #941, right? Can we close this?

popematt avatar Apr 10 '25 23:04 popematt

Yes, this has been replaced. Closing it with completion of #941

desaikd avatar Apr 10 '25 23:04 desaikd