hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
Bumps org.apache.commons:commons-configuration2 from 2.7 to 2.10.1. Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a...
Sometimes, Docker tests using Testcontainers fail with an error message like `org.testcontainers.containers.ContainerLaunchException: Timed out waiting for URL to be accessible`. This may be fixed by adding a startup delay, e.g....
When changing https://github.com/AbsaOSS/hyperdrive/blob/develop/driver/src/test/scala/za/co/absa/hyperdrive/driver/drivers/KafkaToKafkaDeduplicationAfterRetryDockerTest.scala#L73-L74 to
```
"transformer.[kafka.deduplicator].source.id.columns" -> "value.record_id",
"transformer.[kafka.deduplicator].destination.id.columns" -> "value.record_id"
```
the test fails with the following exception:
```
org.apache.spark.SparkException: Malformed records are detected in record parsing.
Caused by:...
```
**Description** Currently, component factories are loaded in `ClassLoaderUtils` by their fully qualified class names. The class name is passed via the configuration (e.g. `component.writer`). That means that components don't have the possibility...
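As a rough illustration of loading a factory by its fully qualified class name, the sketch below uses plain JVM reflection. It is not the code in `ClassLoaderUtils`: `WriterFactory`, `ConsoleWriterFactory` and `FactoryLoader` are hypothetical names, and the sketch assumes the factory is a class with a public no-argument constructor.

```scala
// Hypothetical stand-ins for illustration; not Hyperdrive's actual types.
trait WriterFactory {
  def name: String
}

class ConsoleWriterFactory extends WriterFactory {
  override def name: String = "console"
}

object FactoryLoader {
  // Instantiates a factory from its fully qualified class name via reflection.
  // Assumes the class has a public no-argument constructor.
  def load(className: String): WriterFactory =
    Class.forName(className)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[WriterFactory]
}
```

In this sketch, a call such as `FactoryLoader.load(classOf[ConsoleWriterFactory].getName)` stands in for reading the class name from a configuration key like `component.writer`.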
**Problem description** Spark does not provide an exactly-once behaviour for the Kafka sink, but only at-least-once, and will probably never do so (https://github.com/apache/spark/pull/25618). Under certain assumptions (no concurrent producers, only...
With #227, the generated Avro schema is duplicated in any case, even if it doesn't need to be updated. The exact same schema should be created if no default values...
Investigate how `Dataset.observe` could be used to integrate with Atum
Currently, some tests, e.g. `TestKafkaStreamReader`, rely too heavily on mocks. The component's code almost has to be replicated in the mocks of the test. Furthermore, the tests are too rigid, e.g. it...
Currently, constants are sometimes named in the Java convention (all uppercase) and sometimes like variables (lower camel case). Per the Scala style guide, they should be in upper camel case, i.e. camel case starting with a capital letter.
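A small sketch of the three naming styles side by side may help here. The object and constant names are hypothetical and not taken from the Hyperdrive code base. The `describe` method shows one practical consequence of the convention: an identifier starting with a capital letter is matched as a constant in a pattern, whereas a lower-case name would be treated as a new variable binding.

```scala
object NamingExample {
  // Java convention: all uppercase with underscores.
  val DEFAULT_BATCH_SIZE: Int = 100

  // Variable-like naming: lower camel case.
  val defaultTimeoutSeconds: Int = 30

  // Scala convention for constants: upper camel case.
  val DefaultBatchSize: Int = 100
  val DefaultTimeoutSeconds: Int = 30

  // A capitalized identifier in a pattern refers to the existing constant.
  def describe(batchSize: Int): String = batchSize match {
    case DefaultBatchSize => "default"
    case _                => "custom"
  }
}
```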