opensearch-hadoop

Performance Improvement compared to Elasticsearch

Open · susasidharan opened this issue 1 year ago · 2 comments

What is the bug?

Performed A/B testing comparing OpenSearch index data ingestion from Databricks using elasticsearch-spark-30_2.12-8.6.0.jar versus opensearch-spark-30_2.12-1.0.1.jar. With the OpenSearch Spark connector, ingestion took 2-3 times longer than with the Elasticsearch Spark connector.

How can one reproduce the bug?

Test 1: Create 10 separate OpenSearch indices (same schema) with parent/child records. From Databricks, run insert or update operations against all 10 indices in parallel, first with the Elasticsearch Spark connector and then with the OpenSearch Spark connector, recording the timings for each.

Test 2: Create a single OpenSearch index. From Databricks, run insert/update operations first with the Elasticsearch Spark connector and then with the OpenSearch Spark connector, and compare the timings.
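For anyone reproducing this, the sketch below shows one way to drive both connectors with identical settings so that the A/B timings compare the connectors rather than their defaults. It is a minimal sketch, assuming that the two connectors' options differ only in their prefix (`es.` for elasticsearch-hadoop, `opensearch.` for opensearch-hadoop) and that their Spark SQL data sources are `org.elasticsearch.spark.sql` and `org.opensearch.spark.sql`; verify both against each connector's documentation. The endpoint, the `id` column, and the helper names (`SHARED`, `connector_options`, `timed_write`) are hypothetical, and the values in `SHARED` are illustrative, not tuning recommendations.

```python
import time
from statistics import median

# Spark SQL data source names for each connector (assumed; verify per connector docs).
FORMATS = {
    "elasticsearch": "org.elasticsearch.spark.sql",  # elasticsearch-spark-30_2.12
    "opensearch": "org.opensearch.spark.sql",        # opensearch-spark-30_2.12
}

# Assumed configuration key prefixes: "es.*" vs "opensearch.*".
PREFIXES = {"elasticsearch": "es.", "opensearch": "opensearch."}

# Hypothetical connector-neutral settings shared by both runs; values are illustrative only.
SHARED = {
    "nodes": "https://opensearch.example.com",  # hypothetical endpoint
    "port": 443,
    "nodes.wan.only": True,
    "batch.size.entries": 1000,
    "batch.size.bytes": "1mb",
    "batch.write.refresh": False,
    "write.operation": "upsert",
    "mapping.id": "id",  # hypothetical document-id column
}


def _to_option_value(value):
    """Render a setting as a connector option string (booleans as "true"/"false")."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def connector_options(connector, settings):
    """Prefix each shared setting with the given connector's configuration prefix."""
    prefix = PREFIXES[connector]
    return {prefix + key: _to_option_value(value) for key, value in settings.items()}


def timed_write(df, connector, settings, index, runs=3):
    """Write df to index `runs` times with one connector; return the median wall time in seconds."""
    options = connector_options(connector, settings)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        (
            df.write.format(FORMATS[connector])
            .options(**options)
            .mode("append")
            .save(index)
        )
        timings.append(time.perf_counter() - start)
    return median(timings)
```

On Databricks, `df` would be the Spark DataFrame being ingested. For Test 2, a ratio such as `timed_write(df, "opensearch", SHARED, "my-index") / timed_write(df, "elasticsearch", SHARED, "my-index")` (with a hypothetical index name) summarizes the slowdown. Pinning both connectors to the same option values helps rule out differing defaults as the source of the gap. Taking the median of several runs also reduces the effect of one-off cluster noise.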

What is the expected behavior?

The insert/update timings with the two connectors should be the same or similar.

What is your host/environment?

OpenSearch 2.11, Databricks 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12). Both jars below are hosted in S3 buckets:
elasticsearch-spark-30_2.12-8.6.0.jar
opensearch-spark-30_2.12-1.0.1.jar

Do you have any screenshots?

Yes, see the attached Test Timings and configs.docx.

susasidharan, Jul 31 '24 17:07

Catch All Triage - 1, 2, 3

dblock, Aug 19 '24 16:08

@anirudha will you be able to help out on this? Thanks.

Pallavi-AWS, Aug 19 '24 20:08