
Extensible streaming ingestion pipeline on top of Apache Spark

11 hyperdrive issues

Bumps `org.apache.commons:commons-configuration2` from 2.7 to 2.10.1. Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a...

dependencies

Sometimes, Docker tests that use Testcontainers fail with an error message like `org.testcontainers.containers.ContainerLaunchException: Timed out waiting for URL to be accessible`. This may be fixed by adding a startup delay, e.g....
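Instead of a fixed delay, a test can poll a readiness check until it succeeds or a timeout expires. The helper below is a minimal sketch of that polling approach; `StartupWait` and `waitUntil` are hypothetical names, not part of hyperdrive or Testcontainers. Testcontainers' own wait strategies also accept a configurable startup timeout, which may be the more direct fix.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

object StartupWait {
  // Hypothetical helper: re-evaluates `isReady` every `interval` until it
  // returns true or `timeout` has elapsed. Returns whether the check succeeded.
  def waitUntil(timeout: FiniteDuration, interval: FiniteDuration)(isReady: => Boolean): Boolean = {
    val deadline = timeout.fromNow

    @tailrec
    def poll(): Boolean =
      if (isReady) true
      else if (deadline.isOverdue()) false
      else {
        Thread.sleep(interval.toMillis)
        poll()
      }

    poll()
  }
}
```

In a Docker test, `isReady` could wrap an HTTP request against the container's mapped port (an assumption about how the test would probe readiness).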

When changing https://github.com/AbsaOSS/hyperdrive/blob/develop/driver/src/test/scala/za/co/absa/hyperdrive/driver/drivers/KafkaToKafkaDeduplicationAfterRetryDockerTest.scala#L73-L74 to

```
"transformer.[kafka.deduplicator].source.id.columns" -> "value.record_id",
"transformer.[kafka.deduplicator].destination.id.columns" -> "value.record_id"
```

the test fails with the following exception:

```
org.apache.spark.SparkException: Malformed records are detected in record parsing. Caused by:...
```

enhancement

**Description** Currently, component factories are loaded in `ClassLoaderUtils` by their fully qualified class names, which are passed in the configuration (e.g. `component.writer`). That means that components don't have the possibility...

enhancement
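The current mechanism, resolving a factory from a class name read out of the configuration, can be sketched with plain JVM reflection. This is a minimal illustration, not hyperdrive's `ClassLoaderUtils`: `WriterFactory`, `ConsoleWriterFactory` and `ComponentLoader` are hypothetical, and the sketch instantiates a class through its no-arg constructor, whereas hyperdrive's actual loading may differ (for example, it may resolve Scala singleton objects).

```scala
import scala.util.Try

// Hypothetical factory trait standing in for hyperdrive's component factories.
trait WriterFactory {
  def name: String
}

// Hypothetical concrete factory, referenced only by its class name in the config.
class ConsoleWriterFactory extends WriterFactory {
  override def name: String = "console"
}

object ComponentLoader {
  // Loads a class by its fully qualified name, checks that it implements the
  // expected type, and instantiates it through its no-arg constructor.
  def load[T](fqcn: String, expected: Class[T]): Try[T] = Try {
    val clazz = Class.forName(fqcn)
    require(expected.isAssignableFrom(clazz), s"$fqcn does not implement ${expected.getName}")
    expected.cast(clazz.getDeclaredConstructor().newInstance())
  }

  // Reads the factory class name from the configuration key mentioned in the issue.
  def loadWriter(config: Map[String, String]): Try[WriterFactory] =
    Try(config("component.writer")).flatMap(load(_, classOf[WriterFactory]))
}
```

Because only a class name travels through the configuration, every component must be reachable by name on the classpath, which is the constraint the issue describes.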

**Problem description** Spark does not provide exactly-once guarantees for the Kafka sink, only at-least-once, and will probably never do so (https://github.com/apache/spark/pull/25618). Under certain assumptions (no concurrent producers, only...

enhancement
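One way to approximate exactly-once output under such assumptions is to deduplicate after a retry: records that a failed attempt already wrote to the destination are skipped, matched by ID columns (as with the `kafka.deduplicator` transformer's `source.id.columns` and `destination.id.columns` mentioned above). The sketch below shows only that general idea on in-memory collections, assuming unique IDs and no concurrent producers; `Record` and `RetryDeduplication` are hypothetical and do not reflect hyperdrive's implementation.

```scala
// Hypothetical record: `id` stands in for the configured ID column(s).
final case class Record(id: String, payload: String)

object RetryDeduplication {
  // Drops source records whose ID already appears among the records that a
  // previous, failed attempt managed to write to the destination.
  // Assumes IDs are unique and no other producer writes to the destination.
  def deduplicate(source: Seq[Record], destination: Seq[Record]): Seq[Record] = {
    val alreadyWritten = destination.iterator.map(_.id).toSet
    source.filterNot(record => alreadyWritten.contains(record.id))
  }
}
```

If another producer could write the same IDs concurrently, the set read from the destination would be stale, which is why the no-concurrent-producers assumption matters.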

Since #227, the generated Avro schema is duplicated in every case, even when it doesn't need to be updated. The exact same schema should be created if no default values...

bug

Investigate how `Dataset.observe` could be used to integrate with Atum.

Currently, some tests, e.g. `TestKafkaStreamReader`, rely too heavily on mocks: the component's code almost has to be replicated in the test's mocks. Furthermore, the tests are too rigid, e.g. it...

internal-task

Currently, constants are sometimes named in Java convention (all uppercase) and sometimes like variables. Per the Scala style convention, they should be in camel case starting with a capital letter (upper camel case).

internal-task
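The naming rule can be illustrated as follows; all names are hypothetical examples, not constants from the hyperdrive codebase. Beyond style, the capital letter has a practical effect: in a pattern, an identifier starting with an uppercase letter is matched against the constant's value, while a lowercase one would bind a fresh variable that matches anything.

```scala
// Before: mixed conventions (hypothetical names)
object LegacyConstants {
  val MAX_RETRIES: Int = 3            // Java convention: all uppercase
  val defaultTopic: String = "events" // named like a variable
}

// After: Scala convention, upper camel case
object Constants {
  val MaxRetries: Int = 3
  val DefaultTopic: String = "events"

  // Because MaxRetries starts with a capital letter, this pattern compares
  // against its value instead of binding a new variable.
  def isMaxRetries(n: Int): Boolean = n match {
    case MaxRetries => true
    case _          => false
  }
}
```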