[EPIC] E2E testing tool
Tell us about the problem you're trying to solve
Our main goal is to implement an E2E testing tool that tests Airbyte connections via CI. This tool will exercise different connector versions in a way that is close to the real user experience. Such E2E testing will help us detect possible integration issues before a version release. Potential issues:
- Performance degradation (Benchmark)
- Critical changes (backward compatibility check)
- Incompatible with other connectors (integration compatibility check)
- Incompatible with Airbyte core (core compatibility check)
Note! This solution is inspired by a similar, previously designed tool; see #8243.
Describe the solution you’d like
Stage 1. POC - Done :heavy_check_mark:
The initial stage provides us with fundamental functionality. It is also enough to start integrating with potential benchmark frameworks.
- [x] Configure the new project in a separate repository - Repository
- [x] Fill in the readme
- [x] Implement common core
  - [x] Scenario model
  - [x] Scenario consistency validation
  - [x] Scenario config parser
  - [x] Scenario executor
  - [x] Credential model
  - [x] Credential config parser (local)
  - [x] Map credentials to scenarios
  - [x] Make log formatting similar to Airbyte
- [x] Implement basic sync runner
  - [x] Connect to existing Airbyte instance
  - [x] Create Source
  - [x] Create Destination
  - [x] Create Connection
  - [x] Run sync
  - [x] Return sync result (is successful)
- [x] Prepare test Airbyte instance (before we automate the tool)
- [x] Prepare test source instances with test data (before we automate the tool)
- [x] Prepare test destination instances (before we automate the tool)
Main flow diagram
Scenario example
```json
{
  "scenarioName": "Poc Scenario",
  "usedInstances": [
    { "instanceName": "airbyte_1", "instanceType": "AIRBYTE" },
    { "instanceName": "source_1", "instanceType": "SOURCE" },
    { "instanceName": "destination_1", "instanceType": "DESTINATION" },
    { "instanceName": "connection_1", "instanceType": "CONNECTION" }
  ],
  "preparationActions": [
    { "action": "CONNECT_AIRBYTE_API", "resultInstance": "airbyte_1" },
    { "action": "CREATE_SOURCE", "requiredInstances": ["airbyte_1"], "resultInstance": "source_1" },
    { "action": "CREATE_DESTINATION", "requiredInstances": ["airbyte_1"], "resultInstance": "destination_1" },
    { "action": "CREATE_CONNECTION", "requiredInstances": ["airbyte_1", "source_1", "destination_1"], "resultInstance": "connection_1" }
  ],
  "scenarioActions": [
    { "action": "SYNC_CONNECTION", "requiredInstances": ["airbyte_1", "connection_1"] }
  ]
}
```
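The "scenario consistency validation" item from Stage 1 can be sketched as follows. This is a minimal illustration under assumed names, not the tool's real code; the class and field names simply mirror the JSON example above (`usedInstances`, `requiredInstances`, `resultInstance`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Minimal sketch of scenario consistency validation: every instance an action
// references or produces must be declared in usedInstances. Names are
// assumptions mirroring the JSON scenario example, not the tool's actual API.
public class ScenarioValidator {

    // One action entry: the instances it needs and the instance it produces (may be null).
    public record Action(List<String> requiredInstances, String resultInstance) {}

    // Returns validation errors; an empty list means the scenario is consistent.
    public static List<String> validate(Set<String> usedInstances, List<Action> actions) {
        List<String> errors = new ArrayList<>();
        for (Action action : actions) {
            for (String required : action.requiredInstances()) {
                if (!usedInstances.contains(required)) {
                    errors.add("Unknown required instance: " + required);
                }
            }
            String result = action.resultInstance();
            if (result != null && !usedInstances.contains(result)) {
                errors.add("Unknown result instance: " + result);
            }
        }
        return errors;
    }
}
```

With a check like this, a scenario that references an undeclared instance fails fast at parse time instead of mid-run.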
Tasks
- #15550
- #15580
- #15659
- #15717
Stage 2. Credential customization - Done :heavy_check_mark:
This stage allows specifying the Airbyte instance, source, and destination credentials.
- [x] Parse incoming args
- [x] Extend the readme with a section on scenarios and call examples (with args)
- [x] Retrieve credentials
  - [x] Implement reading credentials from local files
  - [x] Implement reading credentials from secret storage
  - [x] Handle incoming Airbyte instance credentials
  - [x] Handle source/destination credentials
- [x] Extend the scenario model to provide customizations for Actions
- [x] Implement `Update version` scenario action
- [x] Implement Scenario helper
- [x] Add the possibility to call a helper for a scenario
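The two credential sources listed above (local files and secret storage) can sit behind one lookup interface, sketched below. All names here are illustrative assumptions, not the tool's actual classes, and the secret store is stubbed with a map where a real implementation would call a secret manager service.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.NoSuchElementException;

// Hedged sketch of Stage 2 credential retrieval: the same interface backed
// either by local files (for local runs) or by a secret store (for CI).
public class CredentialProviders {

    public interface CredentialProvider {
        // Returns the raw credential payload for an instance name, e.g. "source_1".
        String fetch(String instanceName);
    }

    // Local mode: one JSON file per instance under a credentials directory.
    public static class LocalFileProvider implements CredentialProvider {
        private final Path dir;
        public LocalFileProvider(Path dir) { this.dir = dir; }
        @Override public String fetch(String instanceName) {
            try {
                return Files.readString(dir.resolve(instanceName + ".json"));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Secret-storage mode: stubbed with an in-memory map for the sketch; a real
    // implementation would query a secret manager instead.
    public static class SecretStoreProvider implements CredentialProvider {
        private final Map<String, String> store;
        public SecretStoreProvider(Map<String, String> store) { this.store = store; }
        @Override public String fetch(String instanceName) {
            String secret = store.get(instanceName);
            if (secret == null) throw new NoSuchElementException("No secret for " + instanceName);
            return secret;
        }
    }
}
```

The rest of the tool only depends on `CredentialProvider`, so switching between local and CI runs is a matter of which implementation is wired in.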
Tasks
- #15836
- #15834
- #15883
- #15875
- #15953
Stage 3. Run configuration - In progress :building_construction:
- [x] Extend the scenario structure with a description
- [x] Show the description and validation results in the help and list commands
- [x] New credential type `source_with_connector_settings`
- [x] Implement actions that can provide credentials
- [ ] Implement new action `create_custom_connector`
- [ ] New scenario for incremental sync
- [ ] Add a result parameter to the Scenario Action model
- [ ] Implement new action `get_source_version`
- [ ] Implement new action `get_destination_version`
- [ ] Upgrade the version update scenarios to restore the original version after a run
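The "result parameter" item above implies some shared state between actions: an action like `get_source_version` publishes a value that a later action reads, which is what lets a version-update scenario restore the original version after the run. A minimal sketch, with all names being illustrative assumptions:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Sketch of a shared context for action results. An action such as
// get_source_version stores its result here; a later action (e.g. one that
// restores the original version after the run) reads it back by key.
public class ScenarioContext {
    private final Map<String, String> results = new HashMap<>();

    // Called by a producing action to publish its result under a key.
    public void putResult(String key, String value) {
        results.put(key, value);
    }

    // Called by a consuming action; failing loudly on a missing key surfaces
    // scenario wiring mistakes immediately.
    public String getResult(String key) {
        String value = results.get(key);
        if (value == null) throw new NoSuchElementException("No result stored under: " + key);
        return value;
    }
}
```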
Stage 4. Docker & CI - In progress :building_construction:
- [ ] Configure Docker
- [x] Configure CI commands
- [ ] Provide a summary result class
- [ ] Store the result class in a file
- [ ] Read the result class in the GitHub Action and post it as a comment
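The summary-result handoff between the tool and the GitHub Action could look like the sketch below: the tool serializes a small summary object to a file, and a later CI step reads the file and posts it as a comment. The field names and file format are assumptions; a real version would likely use Jackson (which Airbyte already depends on) rather than hand-rolled JSON.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a summary result class written to a file so that a later
// GitHub Actions step can read it and post it as a PR comment.
public class SummaryResult {
    public final String scenarioName;
    public final boolean success;
    public final long durationMs;

    public SummaryResult(String scenarioName, boolean success, long durationMs) {
        this.scenarioName = scenarioName;
        this.success = success;
        this.durationMs = durationMs;
    }

    // Hand-rolled JSON keeps the sketch dependency-free; field names are assumed.
    public String toJson() {
        return String.format(
            "{\"scenarioName\":\"%s\",\"success\":%b,\"durationMs\":%d}",
            scenarioName, success, durationMs);
    }

    // The CI step then reads this file and turns it into a comment body.
    public void writeTo(Path path) throws IOException {
        Files.writeString(path, toJson());
    }
}
```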
Tasks
- #16071
- #16124
Checkpoint
We have a fully operational E2E test tool that can interact with an existing Airbyte instance and running sources or destinations. The CI commands and predefined configs allow us to run integration tests for specific source-destination combinations. In this state, we can already cover the following cases:
- Incompatible with other connectors (integration compatibility check)
- Incompatible with Airbyte core (core compatibility check)
Stage 5. Result comparison
To detect possible issues in a new version, we should compare the results produced by the current version with those produced by the new version. If we don't expect any changes, the structure and data should be identical.
Note! Some changes (such as fixes) legitimately lead to different results. In that case, we will accept a flag like `diff_is_expected`.
- [ ] Add the possibility to run a few different versions and collect their results
- [ ] Implement common comparison logic
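The comparison logic described above, including the `diff_is_expected` flag, can be sketched like this. Representing records as plain strings and the method names are simplifying assumptions; the real tool would compare structured records.

```java
import java.util.HashSet;
import java.util.List;

// Sketch of Stage 5 result comparison: run the same scenario on two connector
// versions, collect the emitted records, and compare. diff_is_expected
// downgrades a mismatch from a failure to an accepted difference.
public class ResultComparator {

    public enum Outcome { MATCH, EXPECTED_DIFF, UNEXPECTED_DIFF }

    public static Outcome compare(List<String> currentVersion,
                                  List<String> newVersion,
                                  boolean diffIsExpected) {
        // Order-insensitive comparison, since sync output ordering is not
        // guaranteed; the size check also catches duplicated records.
        boolean equal = new HashSet<>(currentVersion).equals(new HashSet<>(newVersion))
                && currentVersion.size() == newVersion.size();
        if (equal) return Outcome.MATCH;
        return diffIsExpected ? Outcome.EXPECTED_DIFF : Outcome.UNEXPECTED_DIFF;
    }
}
```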
Stage 6. Benchmark
- [ ] Integrate the benchmark framework with the testing tool
Stage 7. Autonomous run
- [ ] Add the possibility to spin up a local Airbyte instance
- [ ] Add the possibility to spin up source/destination instances (common logic, with implementations for a few of the most popular source/destination connectors)
Stage 8. Test data generation on the fly
- [ ] Extend the source/destination handlers with test data population methods
- [ ] Design test data config files
- [ ] Implement
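For on-the-fly test data generation, a seeded generator makes runs reproducible: the same config always produces the same records, so result comparison across versions stays meaningful. A minimal sketch; the record shape and names are assumptions, not the tool's design:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of deterministic test data generation: the same seed and count
// always produce the same records, which keeps cross-version comparisons stable.
public class TestDataGenerator {

    // Generates `count` user-like records deterministically from `seed`.
    public static List<Map<String, Object>> generateUsers(long seed, int count) {
        Random random = new Random(seed);
        List<Map<String, Object>> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            records.add(Map.of(
                "id", i,
                "name", "user_" + random.nextInt(100_000),
                "balance", random.nextInt(10_000)));
        }
        return records;
    }
}
```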
Checkpoint
Here we have an automated testing tool that can be scheduled as a CI task or run on demand with different configurations and data sets. The main advantage of the tool is that it is truly end-to-end: such testing guarantees that we validate the whole system before a version release.
@alexandr-shegeda Please review
@DoNotPanicUA all looks good, the only suggestion is to move Stage 8. Test data population closer to stages 1-2
> @DoNotPanicUA all looks good, the only suggestion is to move Stage 8. Test data population closer to stages 1-2
This step means populating test data using config files; the tool will generate the data on the fly. Before automation and local runs, we will prepare test data manually and reuse it. I will rephrase it a bit to make it clearer.
Tagging @bleonard, @sherifnada and @davinchia for review
I like the approach. Please file Github issues for the first stage and include @davinchia and me as reviewers when creating PRs.
Some suggestions:
- Use the Octavia CLI! We have a CLI tool for setting up sources, destinations, and syncs. It might be helpful. This repo (https://github.com/airbytehq/airflow-summit-airbyte-2022) has some examples of automating the Octavia CLI within GitHub Actions CI.
- For setting up sample data, maybe `source-faker` can help. This source produces N "user", "purchase", and "product" records. They can be randomly seeded or given a fixed seed to always produce the same data.
> Use the Octavia CLI!

+1, using the CLI will reduce the maintenance burden as the Airbyte API evolves: the CLI is responsible for adapting to Airbyte API changes.
I've looked into using the Octavia CLI as part of the solution, and I don't see a good integration path between the E2E testing tool and the CLI. But I assume that once I finish the original architecture and list the main use cases, we can trade some of the tool's flexibility for reuse of other modules to improve its nonfunctional aspects.