[Bulk] Add _index, _id, status to ERROR object

Open aswath86 opened this issue 2 years ago • 7 comments

Description

One of the Bulk API best practices is to reduce the response size using filter_path. The AWS OpenSearch documentation says:

This response size might seem minimal, but if you index 1,000,000 documents per day—approximately 11.5 documents per second—339 bytes per response works out to 10.17 GB of download traffic per month.
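The figures in that quote can be checked with a few lines of arithmetic. This is only a sketch: it assumes one 339-byte response per indexed document, a 30-day month, and decimal gigabytes (10^9 bytes), none of which the quote states explicitly.

```python
# Reproduce the traffic estimate from the quoted documentation.
# Assumptions (not stated in the quote): one response per document,
# a 30-day month, and 1 GB = 10**9 bytes.
DOCS_PER_DAY = 1_000_000
BYTES_PER_RESPONSE = 339
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 24 * 60 * 60

docs_per_second = DOCS_PER_DAY / SECONDS_PER_DAY
monthly_bytes = DOCS_PER_DAY * BYTES_PER_RESPONSE * DAYS_PER_MONTH
monthly_gb = monthly_bytes / 10**9

print(f"{docs_per_second:.2f} documents per second")
print(f"{monthly_gb:.2f} GB of response traffic per month")
```

Under these assumptions the monthly total matches the 10.17 GB quoted above.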

Also, the response code of a Bulk request often cannot be trusted on its own, because document-level failures are reported only in the bulk response body.

For example, consider the failed document below:

{
  "index": {
    "_index": "bulk_response",
    "_id": "2",
    "status": 400,
    "error": {
      "type": "strict_dynamic_mapping_exception",
      "reason": "mapping set to strict, dynamic introduction of [field2x] within [_doc] is not allowed"
    }
  }
}

A filter_path such as filter_path=items.index.error returns the following, which gives no clue about which document in which index failed.

  {
    "index": {
      "error": {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [field2x] within [_doc] is not allowed"
      }
    }
  }
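The loss of identity can be reproduced on the client side with a simplified projection. The sketch below is not the server's filter_path implementation: project is a hypothetical helper that supports a single dotted path without wildcards, and the took value and the successful item with _id "1" are made-up sample data.

```python
def project(node, path):
    """Keep only the parts of `node` reachable through the dotted `path`.

    Simplified stand-in for filter_path: one path, no wildcards.
    Lists are traversed element by element; branches that do not
    match are dropped.
    """
    if not path:
        return node
    if isinstance(node, list):
        kept = [project(child, path) for child in node]
        kept = [k for k in kept if k is not None]
        return kept or None
    if isinstance(node, dict) and path[0] in node:
        sub = project(node[path[0]], path[1:])
        return None if sub is None else {path[0]: sub}
    return None


# Hypothetical full bulk response: one success, one failure.
full_response = {
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_index": "bulk_response", "_id": "1", "status": 201}},
        {
            "index": {
                "_index": "bulk_response",
                "_id": "2",
                "status": 400,
                "error": {
                    "type": "strict_dynamic_mapping_exception",
                    "reason": "mapping set to strict, dynamic introduction "
                    "of [field2x] within [_doc] is not allowed",
                },
            }
        },
    ],
}

filtered = project(full_response, "items.index.error".split("."))
remaining = filtered["items"][0]["index"]
print(sorted(remaining))             # only the "error" key survives
print("_id" in remaining["error"])   # the error object carries no document id
```

After filtering, the only remaining item holds the error type and reason, and nothing in it ties the failure back to _id "2".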

One cannot both reduce the response size and identify the failed documents. The idea is to also add _index, _id and status to the error object, which gives us this:

  {
    "index" : {
      "error" : {
        "_index" : "bulk_response",
        "_id" : "3",
        "status" : 400,
        "type" : "strict_dynamic_mapping_exception",
        "reason" : "mapping set to strict, dynamic introduction of [field2x] within [_doc] is not allowed"
      }
    }
  }
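With that shape, a client could recover the failed documents from a filtered response alone. The following is a minimal sketch that assumes the error object proposed in this PR, which is not existing API behavior. The function name failed_documents and the items wrapper around the sample are illustrative.

```python
def failed_documents(filtered_response):
    """Collect (action, index, id, status, error type) for each failed item.

    Assumes the error object carries _index, _id and status as proposed
    in this PR; with the current response format these would be None.
    """
    failures = []
    for item in filtered_response.get("items", []):
        for action, result in item.items():
            error = result.get("error")
            if error is None:
                continue
            failures.append(
                (
                    action,
                    error.get("_index"),
                    error.get("_id"),
                    error.get("status"),
                    error.get("type"),
                )
            )
    return failures


# Response filtered with filter_path=items.index.error, using the
# proposed error object from the sample above.
proposed = {
    "items": [
        {
            "index": {
                "error": {
                    "_index": "bulk_response",
                    "_id": "3",
                    "status": 400,
                    "type": "strict_dynamic_mapping_exception",
                    "reason": "mapping set to strict, dynamic introduction "
                    "of [field2x] within [_doc] is not allowed",
                }
            }
        }
    ]
}

for failure in failed_documents(proposed):
    print(failure)
```

The client can then retry or log exactly the documents that failed without downloading the per-item metadata for successful operations.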

_index, _id and status would then appear twice (at the item level and inside the error object) for those responses that end in an error. Are we OK with that?

This may not be very useful when _id is auto-generated, but it is useful when _id is client-generated.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [ ] New functionality includes testing.
    • [ ] All tests pass
  • [ ] New functionality has been documented.
    • [ ] New functionality has javadoc added
  • [x] Commits are signed per the DCO using --signoff
  • [ ] Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

aswath86 avatar Sep 13 '23 11:09 aswath86

Gradle Check (Jenkins) Run Completed with:

  • RESULT: FAILURE :x:
  • URL: https://build.ci.opensearch.org/job/gradle-check/25479/
  • CommitID: cbc0a9019e7ba33bd50821e0120397443e2ce6cc

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Sep 13 '23 11:09 github-actions[bot]

Compatibility status:

Checks if related components are compatible with change cbc0a90

Incompatible components

Skipped components

Compatible components

  • https://github.com/opensearch-project/security-analytics.git
  • https://github.com/opensearch-project/security.git
  • https://github.com/opensearch-project/custom-codecs.git
  • https://github.com/opensearch-project/opensearch-oci-object-storage.git
  • https://github.com/opensearch-project/index-management.git
  • https://github.com/opensearch-project/geospatial.git
  • https://github.com/opensearch-project/sql.git
  • https://github.com/opensearch-project/notifications.git
  • https://github.com/opensearch-project/job-scheduler.git
  • https://github.com/opensearch-project/observability.git
  • https://github.com/opensearch-project/neural-search.git
  • https://github.com/opensearch-project/k-nn.git
  • https://github.com/opensearch-project/cross-cluster-replication.git
  • https://github.com/opensearch-project/alerting.git
  • https://github.com/opensearch-project/anomaly-detection.git
  • https://github.com/opensearch-project/performance-analyzer.git
  • https://github.com/opensearch-project/asynchronous-search.git
  • https://github.com/opensearch-project/performance-analyzer-rca.git
  • https://github.com/opensearch-project/ml-commons.git
  • https://github.com/opensearch-project/common-utils.git
  • https://github.com/opensearch-project/reporting.git

github-actions[bot] avatar Sep 13 '23 11:09 github-actions[bot]

This PR is stalled because it has been open for 30 days with no activity.

Hi @aswath86, the PR is stalled. Is this still being worked on? Feel free to reach out to the maintainers for further reviews.

ticheng-aws avatar Jan 06 '24 00:01 ticheng-aws

:x: Gradle check result for 693cea5a52369866c2b0d85f30cfe11af8e2601d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 28 '24 15:06 github-actions[bot]

:x: Gradle check result for 140d25dbfb68d78804e80707a881d76bab60981c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 01 '24 08:07 github-actions[bot]

:x: Gradle check result for 620165f3a8fec53914c2d3ebbef6ed969ec5ea83: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 01 '24 12:07 github-actions[bot]

@aswath86 Are you planning to continue on this change?

mgodwan avatar Jul 22 '24 15:07 mgodwan