OpenSearch
Split the remote global metadata file into metadata attribute files
Description
We now upload the global metadata of the cluster state as a separate file for each metadata attribute: coordination metadata, settings, templates, and each custom metadata attribute. The remote global state directory looks like this:
base folder/
    |
    |--> index/
    |     | --> index_UUID/
    |              | --> metadata__<inverted_index_metadata_version>__<inverted_codec_version>__<timestamp>.dat
    |              | --> metadata__<inverted_index_metadata_version>__<inverted_codec_version>__<timestamp>.dat  
    |
    |--> global-metadata/
    |       | --> coordination__<inverted_metadata_version>__<inverted_codec_version>__<timestamp>.dat
    |       | --> settings__<inverted_metadata_version>__<inverted_codec_version>__<timestamp>.dat
    |       | --> templates__<inverted_metadata_version>__<inverted_codec_version>__<timestamp>.dat
    |       | --> custom__<type>__<inverted_metadata_version>__<inverted_codec_version>__<timestamp>.dat
    |
    |
    |--> manifest/
    |       | --> manifest__<inverted_term>__<inverted_version>__<inverted_codec_version>__<timestamp>
    |       | --> manifest__<inverted_term>__<inverted_version>__<inverted_codec_version>__<timestamp>
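The file names above encode inverted version numbers. The sketch below shows one way to build names that follow this pattern. It assumes that "inverted" means Long.MAX_VALUE minus the value, zero-padded to a fixed width, so that a lexicographic blob listing returns the newest version first. The class and method names (GlobalMetadataFileNames, invert, fileName) are hypothetical and are not the actual OpenSearch code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for names shaped like
// <attribute>__<inverted_metadata_version>__<inverted_codec_version>__<timestamp>.dat
// Assumption: inverting a value means Long.MAX_VALUE - value, left-padded with zeros,
// so that newer versions sort first in a lexicographic listing.
final class GlobalMetadataFileNames {
    private static final String DELIMITER = "__";
    // Long.MAX_VALUE has 19 decimal digits, so every inverted value fits in this width.
    private static final int WIDTH = String.valueOf(Long.MAX_VALUE).length();

    static String invert(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative: " + value);
        }
        String inverted = String.valueOf(Long.MAX_VALUE - value);
        // Zero-pad so that lexicographic order matches numeric order.
        return "0".repeat(WIDTH - inverted.length()) + inverted;
    }

    static String fileName(String attribute, long metadataVersion, long codecVersion, long timestamp) {
        return String.join(DELIMITER, attribute, invert(metadataVersion), invert(codecVersion),
            String.valueOf(timestamp)) + ".dat";
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add(fileName("settings", 41, 1, 1_700_000_000_000L));
        names.add(fileName("settings", 42, 1, 1_700_000_000_500L));
        Collections.sort(names);
        // After sorting, the file for metadata version 42 comes before version 41.
        names.forEach(System.out::println);
    }
}
```

For a custom attribute, the caller would pass a prefix such as "custom__" followed by the custom type as the attribute. The result then matches the custom__<type>__... entries in the tree.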
Splitting the global metadata into multiple files reduced the time to upload an incremental coordination, settings or templates change to S3 by 50-70%. It also reduced full metadata upload time by up to 5%, because the global metadata attribute files are uploaded in parallel with the index metadata files. These benchmarks were run with a microbenchmark on main (shiv0408/OpenSearch@fe5fad8c4d4684182f7286ed8545819b13b387dd) and on top of the PR branch (shiv0408/OpenSearch@88ab1aca1743718c94c2a7753b999dd9e78a36e3).
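One way the per-attribute split can speed up incremental uploads is sketched below. Only the attributes whose content changed since the previous upload are written, and they are uploaded concurrently. Unchanged attributes reuse the file names recorded from the last upload. This is a minimal illustration under stated assumptions, not the PR's implementation. The Uploader interface, the string-valued attribute contents and the file-name bookkeeping are all hypothetical stand-ins for the real remote store and manifest.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: re-upload only changed global metadata attributes, in parallel,
// and keep the previously uploaded file name for every unchanged attribute.
final class IncrementalGlobalMetadataUpload {

    /** Hypothetical stand-in for a blob store write; returns the uploaded file name. */
    interface Uploader {
        String upload(String attribute, String content);
    }

    static Map<String, String> upload(Map<String, String> previousContent,
                                      Map<String, String> previousFiles,
                                      Map<String, String> currentContent,
                                      Uploader uploader,
                                      ExecutorService executor) {
        Map<String, CompletableFuture<String>> pending = new HashMap<>();
        Map<String, String> files = new HashMap<>();
        for (Map.Entry<String, String> e : currentContent.entrySet()) {
            String attribute = e.getKey();
            String previousFile = previousFiles.get(attribute);
            if (previousFile != null && Objects.equals(previousContent.get(attribute), e.getValue())) {
                files.put(attribute, previousFile); // unchanged: reuse the existing file
            } else {
                // changed or new: upload asynchronously on the executor
                pending.put(attribute, CompletableFuture.supplyAsync(
                    () -> uploader.upload(attribute, e.getValue()), executor));
            }
        }
        // Wait for every changed attribute and record its new file name.
        pending.forEach((name, future) -> files.put(name, future.join()));
        return files;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            Map<String, String> before = Map.of("coordination", "c1", "settings", "s1", "templates", "t1");
            Map<String, String> beforeFiles = Map.of(
                "coordination", "coordination-v1", "settings", "settings-v1", "templates", "templates-v1");
            Map<String, String> after = Map.of("coordination", "c1", "settings", "s2", "templates", "t1");
            // Only "settings" changed, so it is the only attribute passed to the uploader.
            Map<String, String> files = upload(before, beforeFiles, after,
                (attribute, content) -> attribute + "-" + content, executor);
            System.out.println(files);
        } finally {
            executor.shutdown();
        }
    }
}
```

With a single metadata file, any change would require re-uploading every attribute. With per-attribute files, the work in this sketch scales with the number of changed attributes.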
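The benchmark results are below. Scores are the average time per operation (JMH avgt mode, ms/op; lower is better). The indicesAliasesTemplates parameter gives the number of indices, aliases and templates in the cluster state.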
Benchmark on main
Benchmark                                                                            (indicesAliasesTemplates)  Mode  Cnt     Score    Error  Units
RemoteClusterStateBenchmark.measureFullMetadataUpload                                1000|     100|       100|  avgt   30    60.832 ±  0.642  ms/op
RemoteClusterStateBenchmark.measureFullMetadataUpload                               10000|    1000|      1000|  avgt   30   615.146 ±  1.765  ms/op
RemoteClusterStateBenchmark.measureFullMetadataUpload                               20000|    2000|      2000|  avgt   30  1227.299 ±  4.178  ms/op
RemoteClusterStateBenchmark.measureFullMetadataUpload                               50000|    5000|      5000|  avgt   30  3031.392 ± 19.117  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Coordination        1000|     100|       100|  avgt   30     2.440 ±  0.014  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Coordination       10000|    1000|      1000|  avgt   30    25.849 ±  0.105  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Coordination       20000|    2000|      2000|  avgt   30    52.243 ±  0.476  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Coordination       50000|    5000|      5000|  avgt   30   139.867 ±  1.062  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_IndexMetadata       1000|     100|       100|  avgt   30    32.722 ±  0.541  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_IndexMetadata      10000|    1000|      1000|  avgt   30   311.668 ±  2.694  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_IndexMetadata      20000|    2000|      2000|  avgt   30   622.160 ±  3.091  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_IndexMetadata      50000|    5000|      5000|  avgt   30  1578.523 ±  2.661  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Settings            1000|     100|       100|  avgt   30     2.470 ±  0.007  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Settings           10000|    1000|      1000|  avgt   30    26.391 ±  0.250  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Settings           20000|    2000|      2000|  avgt   30    53.320 ±  0.876  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Settings           50000|    5000|      5000|  avgt   30   144.819 ±  1.237  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Templates           1000|     100|       100|  avgt   30     2.814 ±  0.024  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Templates          10000|    1000|      1000|  avgt   30    29.080 ±  0.160  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Templates          20000|    2000|      2000|  avgt   30    60.032 ±  0.397  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Templates          50000|    5000|      5000|  avgt   30   155.970 ±  2.390  ms/op
Benchmark after splitting the global metadata
Benchmark                                                                            (indicesAliasesTemplates)  Mode  Cnt     Score    Error  Units
RemoteClusterStateBenchmark.measureFullMetadataUpload                                1000|     100|       100|  avgt   30    59.594 ±  0.323  ms/op
RemoteClusterStateBenchmark.measureFullMetadataUpload                               10000|    1000|      1000|  avgt   30   599.334 ±  2.941  ms/op
RemoteClusterStateBenchmark.measureFullMetadataUpload                               20000|    2000|      2000|  avgt   30  1198.450 ±  5.466  ms/op
RemoteClusterStateBenchmark.measureFullMetadataUpload                               50000|    5000|      5000|  avgt   30  2990.730 ± 15.318  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Coordination        1000|     100|       100|  avgt   30     0.800 ±  0.019  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Coordination       10000|    1000|      1000|  avgt   30     8.483 ±  0.059  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Coordination       20000|    2000|      2000|  avgt   30    17.231 ±  0.271  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Coordination       50000|    5000|      5000|  avgt   30    65.734 ±  1.375  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_IndexMetadata       1000|     100|       100|  avgt   30    31.890 ±  0.295  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_IndexMetadata      10000|    1000|      1000|  avgt   30   304.154 ±  0.994  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_IndexMetadata      20000|    2000|      2000|  avgt   30   606.649 ±  1.042  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_IndexMetadata      50000|    5000|      5000|  avgt   30  1530.920 ± 14.235  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Settings            1000|     100|       100|  avgt   30     0.832 ±  0.008  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Settings           10000|    1000|      1000|  avgt   30     8.253 ±  0.226  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Settings           20000|    2000|      2000|  avgt   30    20.208 ±  0.280  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Settings           50000|    5000|      5000|  avgt   30    65.269 ±  0.439  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Templates           1000|     100|       100|  avgt   30     1.166 ±  0.005  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Templates          10000|    1000|      1000|  avgt   30    12.657 ±  0.245  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Templates          20000|    2000|      2000|  avgt   30    26.883 ±  0.419  ms/op
RemoteClusterStateBenchmark.measureIncrementalClusterStateUpdate_Templates          50000|    5000|      5000|  avgt   30    88.283 ±  0.754  ms/op
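The relative change between the two runs can be recomputed directly from the tables as (main - split) / main. The small program below does this for a few rows, with scores copied verbatim from the tables above; the class name BenchmarkDelta is only for illustration. For example, the coordination update with 1000 indices goes from 2.440 to 0.800 ms/op, a 67.2% reduction.

```java
// Recomputes the relative reduction in average time (ms/op) between the "main" and
// "after splitting" tables above: reduction = (main - split) / main * 100.
final class BenchmarkDelta {
    static double reductionPercent(double mainMs, double splitMs) {
        return (mainMs - splitMs) / mainMs * 100.0;
    }

    public static void main(String[] args) {
        String[] rows = {
            "Coordination 1000", "Coordination 50000", "Settings 50000",
            "Templates 50000", "IndexMetadata 50000", "FullUpload 50000"
        };
        // {score on main, score after splitting}, in ms/op, copied from the tables
        double[][] scores = {
            {2.440, 0.800}, {139.867, 65.734}, {144.819, 65.269},
            {155.970, 88.283}, {1578.523, 1530.920}, {3031.392, 2990.730}
        };
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%-22s %5.1f%%%n", rows[i], reductionPercent(scores[i][0], scores[i][1]));
        }
    }
}
```

The printed reductions are 67.2%, 53.0%, 54.9%, 43.4%, 3.0% and 1.3%, in row order.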
Related Issues
Resolves #12468 Resolves #10645
Check List
- [x] New functionality includes testing.
- [x] All tests pass
 
- [x] New functionality has been documented.
- [x] New functionality has javadoc added
 
- [x] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
- [x] Commits are signed per the DCO using --signoff
- [x] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
- [x] Public documentation issue/PR created
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Compatibility status:
Checks if related components are compatible with change 8244c6d
:x: Gradle check result for 6bf7bc963b0dd9b7a6153b675ae46801430dd1b3: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for d0875f97cd77fb5eaeec4254471a0f08743f3fc4: FAILURE
:x: Gradle check result for f6a243119d33cdd69c1701666897ce749ff4f29d: FAILURE
:x: Gradle check result for f3e853b23411842a71343acbdcc21ba47df1fa48: FAILURE
:x: Gradle check result for 5d6a0ad25b77c2ba69e8480559b428893df671b6: FAILURE
Looks good at a high level. Can you move it out of draft?
:x: Gradle check result for 279dbbe24de4283e7a93565a7e2f6483f90d6c88: FAILURE
@shiv0408 It seems some context is missing from this pull request: why break this apart? Can you articulate which use cases this change improves?
Tagging @sachinpkale for review
:x: Gradle check result for fc270d16a9dc770ebe747a4d8da5ce248c5911a5: FAILURE
:x: Gradle check result for adb4cf2d8bc1ecf11a3a565595b96868c8abc849: FAILURE
:x: Gradle check result for 0b3873655e5a18b20a98605a4473d7c0a3b02365: FAILURE
:x: Gradle check result for c86c0f1e7ec60de6ceb388b5937e5c7a32918bc9: FAILURE
:x: Gradle check result for 3ed92e5f458585017a77235cd60c461aacb8345b: FAILURE
:x: Gradle check result for cd5c9a5b256119fe2dbe71c6832343e20d3b9ee6: FAILURE
:x: Gradle check result for 2bd97d7fa1739252bd5cfc7c6906f41eef71f20c: FAILURE
:white_check_mark: Gradle check result for 2bd97d7fa1739252bd5cfc7c6906f41eef71f20c: SUCCESS
:x: Gradle check result for 8244c6db0914b2d596c3b68d4e051f80fbebcfb5: FAILURE
:x: Gradle check result for 425cf2097a52d292bff2f532ea183030e3e737ce: FAILURE
:grey_exclamation: Gradle check result for 8efc1edd9bc96ad279d171e6e7dbbe637395fd34: UNSTABLE
- TEST FAILURES:
      1 org.opensearch.gateway.RecoveryFromGatewayIT.testShardStoreFetchMultiNodeMultiIndexesUsingBatchAction
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
:white_check_mark: Gradle check result for 6c637c038adcec6acbcd1c8d2d0e866abbae3118: SUCCESS
Codecov Report
Attention: Patch coverage is 78.85835% with 100 lines in your changes missing coverage. Please review.
Project coverage is 71.55%. Comparing base (b15cb0c) to head (4f8a64e). Report is 285 commits behind head on main.
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12190      +/-   ##
============================================
+ Coverage     71.42%   71.55%   +0.13%     
- Complexity    59978    61237    +1259     
============================================
  Files          4985     5060      +75     
  Lines        282275   287854    +5579     
  Branches      40946    41689     +743     
============================================
+ Hits         201603   205965    +4362     
- Misses        63999    64928     +929     
- Partials      16673    16961     +288     
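The coverage figures can be cross-checked from the diff above. The sketch assumes Codecov's definition coverage = hits / (hits + misses + partials), so partially covered lines count against coverage. It also assumes that the "100 lines missing coverage" in the patch includes both missed and partial lines. CoverageCheck and its methods are illustrative names only.

```java
// Hypothetical cross-check of the Codecov report above.
// Assumption: coverage = hits / lines, where lines = hits + misses + partials.
final class CoverageCheck {
    static double coveragePercent(long hits, long lines) {
        return 100.0 * hits / lines;
    }

    // Total changed lines implied by a patch coverage percentage and a count of uncovered lines.
    static long impliedPatchLines(long uncoveredLines, double patchCoveragePercent) {
        return Math.round(uncoveredLines / (1.0 - patchCoveragePercent / 100.0));
    }

    public static void main(String[] args) {
        // {hits, misses, partials, lines} for main (base) and for this PR (head)
        long[] base = {201_603, 63_999, 16_673, 282_275};
        long[] head = {205_965, 64_928, 16_961, 287_854};
        for (long[] r : new long[][] {base, head}) {
            boolean consistent = r[0] + r[1] + r[2] == r[3];
            System.out.printf("coverage %.2f%%, hits+misses+partials == lines: %b%n",
                coveragePercent(r[0], r[3]), consistent);
        }
        long patchLines = impliedPatchLines(100, 78.85835);
        System.out.printf("patch: %d changed lines, %d covered%n", patchLines, patchLines - 100);
    }
}
```

This reproduces the 71.42% and 71.55% project coverage figures. It also shows that 78.85835% patch coverage with 100 uncovered lines corresponds to 373 covered lines out of 473.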
:white_check_mark: Gradle check result for 494aacc7c73671d76e298284fcbcee1a3072636f: SUCCESS
:x: Gradle check result for 928b65036d200c15865f33994a88349875b7f6f0: FAILURE
:white_check_mark: Gradle check result for 2ebfc6de25614dc30604e469ed3f3d37df85ce9b: SUCCESS
[Storage Triage - attendees 1 2 3 4 5 6 7 8 9 10 11 12 13]
@shiv0408 Looking forward to seeing this improvement merged. Please add the updated release target version.
:x: Gradle check result for fb0b6aaaed7a53f4c6811db6f78c0f0e7f42e644: FAILURE