Damon P. Cortesi
With the recent `--show-logs` flag, we switch the deploy mode to `client` so that EMR steps can capture the driver `stdout`. Unfortunately, `--client` mode doesn't work with additional archives provided...
Not everybody wants to use Docker to build artifacts, and in some cases, such as in a CI pipeline, it may be undesirable. We should add support for some sort of...
Currently, the `deploy` command assumes that the project has already been built or packaged. We should either (or both):

- Add some safeguards to prevent deploying if the artifacts don't...
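A minimal sketch of what such a safeguard could look like, assuming the build step writes a known set of files into a single output directory. The function names, the `dist` directory, and the artifact file names are all hypothetical and are not taken from the project.

```python
from pathlib import Path
from typing import Iterable, List


def missing_artifacts(dist_dir: str, expected_names: Iterable[str]) -> List[str]:
    """Return the expected artifact names that are not present in dist_dir."""
    base = Path(dist_dir)
    return [name for name in expected_names if not (base / name).is_file()]


def check_before_deploy(dist_dir: str, expected_names: Iterable[str]) -> None:
    """Raise before deploying if any expected artifact has not been built."""
    missing = missing_artifacts(dist_dir, expected_names)
    if missing:
        raise FileNotFoundError(
            f"Missing build artifacts in {dist_dir}: {', '.join(missing)}. "
            "Build or package the project before deploying."
        )
```

A deploy command could call `check_before_deploy("dist", ["entrypoint.py", "pyspark_deps.tar.gz"])` first, so the failure happens locally instead of after an upload.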
`configuration_overrides` is a required field in `EmrServerlessStartJobOperator`, but [it's not a required field](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html#API_StartJobRun_RequestBody) in the `StartJobRun` API. Error message when trying to start a job without it: ``` E airflow.exceptions.AirflowException: Argument ['configuration_overrides'] is required...
## Description When running the `spark-k8s-operator` example on Terraform Enterprise, the apply fails with the following error. ``` Error: Kubernetes cluster unreachable: Get "https://.sk1.us-east-1.eks.amazonaws.com/version": getting credentials: exec: executable aws not...
By default, we do a `SELECT *` to query Athena. DuckDB can send the projected columns to the extension, which is what the [postgres scanner does](https://github.com/duckdblabs/postgres_scanner/blob/6745dd5f9e6f80cce68babdaf21ce719d3279c1d/postgres_scanner.cpp#L48). We should do something similar.
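To illustrate the idea, here is a small Python sketch of building the Athena query from a list of projected columns instead of always issuing `SELECT *`. The extension itself is not written in Python, and `build_select` and `quote_identifier` are hypothetical helpers, so treat this as a description of the query shape only, not of the actual implementation.

```python
from typing import Optional, Sequence


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table: str, columns: Optional[Sequence[str]] = None) -> str:
    """Build a SELECT that names only the projected columns.

    Falls back to SELECT * when no projection is supplied.
    """
    if columns:
        column_list = ", ".join(quote_identifier(c) for c in columns)
    else:
        column_list = "*"
    return f"SELECT {column_list} FROM {quote_identifier(table)}"
```

With a projection of `id` and `ts`, Athena would only need to read and return those two columns, which matters most for wide tables stored in columnar formats.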
Currently this extension scans an entire Athena table, which is extremely inefficient. 😁 DuckDB can send the table filters to the extension, which is what the [postgres scanner does](https://github.com/duckdblabs/postgres_scanner/blob/6745dd5f9e6f80cce68babdaf21ce719d3279c1d/postgres_scanner.cpp#L421). We...
`athena_scan` is the most basic thing to implement, but scans an entire table. Unfortunately, the way Athena works, it will be difficult to optimize that for large tables. And in...
Currently only the `default` database is supported. We should add support for selecting data from different databases. Maybe simply as part of the first `athena_scan` parameter? So instead of `athena_scan("table_name")`...
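One way the first parameter could carry a database name is a dotted `database.table` form. That syntax is an assumption made here for illustration and may not be what the issue goes on to propose; `split_table_reference` is a hypothetical helper, sketched in Python instead of the extension's own language.

```python
from typing import Tuple


def split_table_reference(reference: str, default_database: str = "default") -> Tuple[str, str]:
    """Split 'database.table' into its parts.

    A bare 'table' is resolved against default_database, which matches
    the current behaviour of only supporting the default database.
    """
    database, separator, table = reference.partition(".")
    if not separator:
        # No dot present: the whole reference is the table name.
        return default_database, reference
    if not database or not table:
        raise ValueError(f"Invalid table reference: {reference!r}")
    return database, table
```

Under this assumption, existing calls with a bare table name keep working unchanged, while a dotted name selects a different database.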