Damon P. Cortesi issues

Results: 46 issues of Damon P. Cortesi

Add support for `--show-logs` in cluster mode on EMR on EC2

With the recent `--show-logs` flag, we switch the deploy mode to `client` so that EMR steps can capture the driver `stdout`. Unfortunately, `client` mode doesn't work with additional archives provided...

enhancement

Add support for local builds

Not everybody wants to use Docker to build artifacts, and in some cases, such as a CI pipeline, it may be undesirable. We should add support for some sort of...

enhancement

Add a build flag on deploy

Currently, the `deploy` command assumes that the project has already been built or packaged. We should do one or both of the following:

- Add some safeguards to prevent deploying if the artifacts don't...

enhancement

configuration_overrides shouldn't be required

`configuration_overrides` is a required field in `EmrServerlessStartJobOperator`, but [it's not a required field](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html#API_StartJobRun_RequestBody) in the API. Error message when trying to start a job without it:

```
E airflow.exceptions.AirflowException: Argument ['configuration_overrides'] is required...
```
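
One way to treat the field as optional is to leave the key out of the request entirely when the caller does not supply it. The following is a minimal sketch, not the operator's actual code: `build_start_job_run_kwargs` is a hypothetical helper, and the dictionary keys follow the StartJobRun request body linked above.

```python
from typing import Any, Dict, Optional


def build_start_job_run_kwargs(
    application_id: str,
    execution_role_arn: str,
    job_driver: Dict[str, Any],
    configuration_overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a StartJobRun request (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": job_driver,
    }
    # Only send the optional key when the caller actually supplied a value.
    if configuration_overrides is not None:
        kwargs["configurationOverrides"] = configuration_overrides
    return kwargs
```

The resulting dictionary could then be unpacked into whatever client call starts the job, so a job without overrides never sends the key at all.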

spark-k8s-operator requires the awscli, which doesn't work on Terraform Enterprise

## Description

When running the `spark-k8s-operator` example on Terraform Enterprise, the apply fails with the following error.

```
Error: Kubernetes cluster unreachable: Get "https://.sk1.us-east-1.eks.amazonaws.com/version": getting credentials: exec: executable aws not...
```

Bump a bunch of actions versions and format workflow files

Add column selection

By default, we do a `SELECT *` to query Athena. DuckDB can send the projected columns to the extension, which is what the [postgres scanner does](https://github.com/duckdblabs/postgres_scanner/blob/6745dd5f9e6f80cce68babdaf21ce719d3279c1d/postgres_scanner.cpp#L48). We should do something similar.
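
To illustrate the idea, here is a rough Python sketch of how a list of projected columns could be turned into the query sent to Athena. The extension itself is not written in Python, and `build_scan_query` and `quote_identifier` are hypothetical names; the sketch also assumes that double-quoted identifiers are acceptable in the generated `SELECT`.

```python
from typing import Optional, Sequence


def quote_identifier(name: str) -> str:
    """Double-quote an identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_scan_query(table: str, columns: Optional[Sequence[str]] = None) -> str:
    """Build the scan query, falling back to SELECT * when no projection is given."""
    if not columns:
        return f"SELECT * FROM {quote_identifier(table)}"
    projection = ", ".join(quote_identifier(column) for column in columns)
    return f"SELECT {projection} FROM {quote_identifier(table)}"
```

With a projection of `["id", "name"]`, only those two columns would be requested instead of every column in the table.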

Add pushdown support

Currently this extension scans an entire Athena table, which is extremely inefficient. 😁 DuckDB can send the table filters to the extension, which is what the [postgres scanner does](https://github.com/duckdblabs/postgres_scanner/blob/6745dd5f9e6f80cce68babdaf21ce719d3279c1d/postgres_scanner.cpp#L421). We...
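
As a rough illustration of filter pushdown, the sketch below renders a list of simple comparison filters as a `WHERE` clause that could be appended to the scan query. It is written in Python purely for illustration: the `(column, operator, value)` tuple shape, `build_where_clause`, and `render_literal` are all assumptions, and DuckDB's real table filter structures look different.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[int, float, str]
Filter = Tuple[str, str, Value]  # (column, operator, value), a hypothetical shape

ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def render_literal(value: Value) -> str:
    """Render a Python value as a SQL literal, escaping single quotes in strings."""
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + value.replace("'", "''") + "'"


def build_where_clause(filters: Sequence[Filter]) -> str:
    """Combine the filters with AND; return an empty string when there are none."""
    parts: List[str] = []
    for column, op, value in filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        quoted = '"' + column.replace('"', '""') + '"'
        parts.append(f"{quoted} {op} {render_literal(value)}")
    if not parts:
        return ""
    return " WHERE " + " AND ".join(parts)
```

Any filter that cannot be expressed this way would still have to be applied by DuckDB after the rows come back.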

[design] Decide on other functions to implement

`athena_scan` is the most basic thing to implement, but scans an entire table. Unfortunately, the way Athena works, it will be difficult to optimize that for large tables. And in...

Add support for other databases

Currently only the `default` database is supported. We should add support for selecting data from different databases. Maybe simply as part of the first `athena_scan` parameter? So instead of `athena_scan("table_name")`...