airbyte [EPIC] E2E testing tool

Tell us about the problem you're trying to solve

Our main goal is to implement an E2E testing tool that tests Airbyte connections in CI. The tool will exercise different connector versions in a way that is close to the real user experience. This kind of E2E testing will help us detect integration issues before a version release. Potential issues:

Performance degradation (Benchmark)
Critical changes (backward compatibility check)
Incompatibility with other connectors (integration compatibility check)
Incompatibility with Airbyte core (core compatibility check)

Note! This solution is inspired by a similar tool that was designed previously: #8243

Describe the solution you’d like

Stage 1. POC - Done :heavy_check_mark:

The initial stage provides the fundamental functionality. It is also enough to start integrating with potential benchmark frameworks.

[x] Configure the new project in a separate repository - Repository
[x] Fill in the README
[x] Implement common core
- [x] Scenario model
- [x] Scenario consistency validation
- [x] Scenario config parser
- [x] Scenario executor
- [x] Credential model
- [x] Credential config parser (local)
- [x] Mapper between credentials and scenarios
- [x] Make log formatting similar to Airbyte
[x] Implement basic sync runner
- [x] Connect to existing Airbyte instance
- [x] Create Source
- [x] Create Destination
- [x] Create Connection
- [x] Run sync
- [x] Return sync result (is successful)
[x] Prepare test Airbyte instance (before we automate the tool)
[x] Prepare test source instances with test data (before we automate the tool)
[x] Prepare test destination instances (before we automate the tool)

Main flow diagram

Scenario example

{
  "scenarioName" : "Poc Scenario",
  "usedInstances" : [
    {
      "instanceName" : "airbyte_1",
      "instanceType" : "AIRBYTE"
    },
    {
      "instanceName" : "source_1",
      "instanceType" : "SOURCE"
    },
    {
      "instanceName": "destination_1",
      "instanceType": "DESTINATION"
    },
    {
      "instanceName": "connection_1",
      "instanceType": "CONNECTION"
    }
  ],
  "preparationActions" : [
    {
      "action" : "CONNECT_AIRBYTE_API",
      "resultInstance" : "airbyte_1"
    },
    {
      "action" : "CREATE_SOURCE",
      "requiredInstances" : ["airbyte_1"],
      "resultInstance" : "source_1"
    },
    {
      "action": "CREATE_DESTINATION",
      "requiredInstances" : ["airbyte_1"],
      "resultInstance": "destination_1"
    },
    {
      "action" : "CREATE_CONNECTION",
      "requiredInstances" : ["airbyte_1", "source_1", "destination_1"],
      "resultInstance" : "connection_1"
    }
  ],
  "scenarioActions" : [
    {
      "action" : "SYNC_CONNECTION",
      "requiredInstances" : ["airbyte_1", "connection_1"]
    }
  ]
}
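
The "Scenario consistency validation" item can be illustrated with a minimal sketch. It is not the tool's actual implementation; it only assumes the JSON structure shown above, and the function name `validate_scenario` and the two rules it checks (every referenced instance is declared in `usedInstances`, and every required instance is produced by an earlier action) are assumptions made for illustration.

```python
def validate_scenario(scenario: dict) -> list[str]:
    """Return a list of consistency problems; an empty list means none were found.

    Hypothetical rules (assumptions, not taken from the tool itself):
      1. Every instance an action references must be declared in usedInstances.
      2. A required instance must be the result of an earlier action.
    """
    errors = []
    declared = {inst["instanceName"] for inst in scenario.get("usedInstances", [])}
    initialized = set()

    # Preparation actions run first, then the scenario actions.
    actions = scenario.get("preparationActions", []) + scenario.get("scenarioActions", [])
    for action in actions:
        name = action["action"]
        for required in action.get("requiredInstances", []):
            if required not in declared:
                errors.append(f"{name}: instance '{required}' is not declared in usedInstances")
            elif required not in initialized:
                errors.append(f"{name}: instance '{required}' is required before any action produces it")

        result = action.get("resultInstance")
        if result is not None:
            if result not in declared:
                errors.append(f"{name}: result instance '{result}' is not declared in usedInstances")
            initialized.add(result)
    return errors


# Usage: this scenario creates the source before connecting to Airbyte,
# so the validator reports a problem for CREATE_SOURCE.
broken = {
    "scenarioName": "Broken Scenario",
    "usedInstances": [
        {"instanceName": "airbyte_1", "instanceType": "AIRBYTE"},
        {"instanceName": "source_1", "instanceType": "SOURCE"},
    ],
    "preparationActions": [
        {"action": "CREATE_SOURCE", "requiredInstances": ["airbyte_1"], "resultInstance": "source_1"},
        {"action": "CONNECT_AIRBYTE_API", "resultInstance": "airbyte_1"},
    ],
    "scenarioActions": [],
}
for problem in validate_scenario(broken):
    print(problem)
```

Running the same check on the "Poc Scenario" above would return an empty list under these assumed rules, because each instance is declared and produced before it is required.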

Tasks

#15550
#15580
#15659
#15717

Stage 2. Credential customization - Done :heavy_check_mark:

This stage allows specifying the Airbyte instance, source, and destination credentials.

[x] Parsing incoming args
[x] Extend the README with a section on scenarios and call examples (with args)
[x] Retrieve credentials
- [x] Implement reading credentials from local files
- [x] Implement reading credentials from secret storage
- [x] Handle incoming Airbyte instance credentials
- [x] Handle source/destination credentials
[x] Extend the scenario model to provide customizations for Actions
[x] Implement Update version scenario action
[x] Implement Scenario helper
[x] Add the possibility to call the helper for a scenario

Tasks

#15836
#15834
#15883
#15875
#15953

Stage 3. Run configuration - In progress :building_construction:

[x] Extend the scenario structure with a description
[x] Show description and validation results in the help and list commands
[x] New credential type source_with_connector_settings
[x] Implement actions which can provide credentials
[ ] Implement new action create_custom_connector
[ ] New scenario for incremental sync
[ ] Add result parameter to the Scenario Action model
[ ] Implement new action get_source_version
[ ] Implement new action get_destination_version
[ ] Improve the version update scenarios so that they restore the original version after a run

Stage 4. Docker & CI - In progress :building_construction:

[ ] Configure docker
[x] Configure CI commands
[ ] Provide a summary result class
[ ] Store the result class in a file
[ ] Read the result class in the GitHub Action and put it into the comment
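
The last three items (a summary result class, storing it in a file, and reading it back to build a comment) could look roughly like the sketch below. All names here (`ScenarioResult`, its fields, `store_result`, `load_result`, `render_comment`) and the JSON file format are hypothetical; the issue does not define the result class, so this only illustrates one possible shape of the flow.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ScenarioResult:
    """Hypothetical summary of one scenario run (the fields are assumptions)."""
    scenario_name: str
    is_successful: bool
    duration_seconds: float
    error_message: str = ""


def store_result(result: ScenarioResult, path: Path) -> None:
    """Serialize the result to a JSON file that a later CI step can pick up."""
    path.write_text(json.dumps(asdict(result), indent=2), encoding="utf-8")


def load_result(path: Path) -> ScenarioResult:
    """Read a result that was written by store_result."""
    return ScenarioResult(**json.loads(path.read_text(encoding="utf-8")))


def render_comment(result: ScenarioResult) -> str:
    """Build the text a CI step could post as a comment."""
    status = "succeeded" if result.is_successful else "failed"
    lines = [f"E2E scenario '{result.scenario_name}' {status} in {result.duration_seconds:.1f}s."]
    if result.error_message:
        lines.append(f"Error: {result.error_message}")
    return "\n".join(lines)
```

In this sketch the testing tool would call `store_result` at the end of a run, and a separate CI step would call `load_result` and `render_comment`; how the comment is actually posted is left out because the issue does not specify it.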

Tasks

#16071
#16124

Checkpoint

We have a fully operational E2E test tool that can interact with an existing Airbyte instance and running sources or destinations. The CI commands and predefined configs allow us to run integration tests for specific source-destination combinations. At this point, we can already cover the following cases:

Incompatibility with other connectors (integration compatibility check)
Incompatibility with Airbyte core (core compatibility check)

Stage 5. Result comparison

To detect possible issues in a new version, we should compare its results with the results of the current version. If we don't expect any changes, the structure and data should be equal. Note! Some changes (such as fixes) legitimately lead to different results. In this case, we will accept a flag like diff_is_expected.

[ ] Add the possibility to run a few different versions and collect their results
[ ] Implement common comparison logic
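
A minimal sketch of what the common comparison logic might look like, assuming each version's sync result is available as a list of flat records (dicts). The function name `compare_results`, the report keys, and the order-insensitive comparison are assumptions; only the `diff_is_expected` flag comes from the description above.

```python
import json
from collections import Counter


def _canonical(record: dict) -> str:
    """Serialize a record with sorted keys so equal records compare equal."""
    return json.dumps(record, sort_keys=True, default=str)


def _field_names(records: list[dict]) -> set[str]:
    """Collect every field name that appears in any record."""
    names = set()
    for record in records:
        names.update(record.keys())
    return names


def compare_results(baseline: list[dict], candidate: list[dict],
                    diff_is_expected: bool = False) -> dict:
    """Compare the structure and data of two result sets, ignoring record order.

    Assumption: a difference fails the check unless diff_is_expected is set.
    """
    structure_equal = _field_names(baseline) == _field_names(candidate)

    base_counts = Counter(_canonical(r) for r in baseline)
    cand_counts = Counter(_canonical(r) for r in candidate)
    missing = sum((base_counts - cand_counts).values())      # only in the current version
    unexpected = sum((cand_counts - base_counts).values())   # only in the new version
    data_equal = missing == 0 and unexpected == 0

    has_diff = not (structure_equal and data_equal)
    return {
        "structure_equal": structure_equal,
        "data_equal": data_equal,
        "missing_records": missing,
        "unexpected_records": unexpected,
        "passed": (not has_diff) or diff_is_expected,
    }
```

Treating the records as a multiset is a design choice of this sketch: it tolerates reordering between versions but still catches dropped or duplicated records.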

Stage 6. Benchmark

[ ] Integrate the benchmark framework with the testing tool

Stage 7. Autonomous run

[ ] Add the possibility to bring up a local Airbyte instance
[ ] Add the possibility to bring up source/destination instances (common logic, plus implementations for a few of the most popular source/destination connectors)

Stage 8. Test data generation on the fly

[ ] Extend the source/destination handler with test data population methods
[ ] Design test data config files
[ ] Implement

Checkpoint

Here we have an automated testing tool that can be scheduled as a CI task or run on demand with different configurations and data sets. The main advantage of the tool is that it is true E2E: such testing validates the whole system before a version release.

Aug 01 '22 09:08 DoNotPanicUA

@alexandr-shegeda Please review

Aug 02 '22 18:08 DoNotPanicUA

@DoNotPanicUA all looks good, the only suggestion is to move Stage 8. Test data population closer to 1-2 stages

Aug 02 '22 19:08 alexandr-shegeda

@DoNotPanicUA all looks good, the only suggestion is to move Stage 8. Test data population closer to 1-2 stages

This step means populating test data from config files: the tool will generate the data on the fly. Until automation and local runs are in place, we will prepare test data manually and reuse it. I will rephrase it a bit to make it clearer.

Aug 02 '22 20:08 DoNotPanicUA

Tagging @bleonard, @sherifnada and @davinchia for review

Aug 04 '22 18:08 grishick

I like the approach. Please file Github issues for the first stage and include @davinchia and me as reviewers when creating PRs.

Aug 05 '22 15:08 grishick

Some suggestions:

Use the Octavia CLI! We have a CLI tool for setting up sources, destinations, and syncs. It might be helpful. This repo (https://github.com/airbytehq/airflow-summit-airbyte-2022) has some examples of automating the octavia CLI within Github Actions CI.
For setting up sample data, maybe source-faker can help - This source produces N "user", "purchase", and "product" records. They can be randomly seeded or with a fixed seed to always produce the same data.

Aug 05 '22 15:08 evantahler

Use the Octavia CLI!

+1, using the CLI will reduce the maintenance burden as the Airbyte API evolves: the CLI is responsible for adapting to Airbyte API changes.

Oct 03 '22 14:10 alafanechere

I've inspected the possibility of using Octavia CLI as part of the solution. I don't see a good integration between the E2E testing tool and Octavia CLI. But I assume that when I finish the original architecture and list the main use cases, we can decrease the tool's flexibility and reuse some other modules to improve the nonfunctional aspects of the tool.

Oct 03 '22 15:10 DoNotPanicUA
