
[Feature] dbt on EMR on EKS

Open jaehyeon-kim opened this issue 2 years ago • 11 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

What is the outcome that you are trying to reach?

Describe the solution you would like

  • Extend the existing Spark Thrift Server class to run indefinitely and deploy it as a Spark job
  • Deploy a service with a load balancer for external connection
  • Set up a dbt project with the dbt-spark adapter where connection is made via the Spark Thrift Server.
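
As a rough illustration of the load balancer step, here is a minimal sketch that builds a Kubernetes `Service` manifest of type `LoadBalancer` in Python and prints it as JSON. The service name, namespace, and port are hypothetical, and the `spark-role: driver` selector assumes the Thrift Server runs inside the Spark driver pod; a real setup would likely need a more specific label so that only the Thrift Server job is selected.

```python
import json


def thrift_service_manifest(name, namespace, port=10000):
    """Build a Kubernetes Service of type LoadBalancer as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "LoadBalancer",
            # Assumption: the Thrift Server lives in the Spark driver pod,
            # which Spark on Kubernetes labels with spark-role=driver.
            # This selector matches every driver pod in the namespace, so a
            # job-specific label should be added in practice.
            "selector": {"spark-role": "driver"},
            "ports": [
                {
                    "name": "thrift",
                    "protocol": "TCP",
                    "port": port,        # port exposed by the load balancer
                    "targetPort": port,  # port the Thrift Server listens on
                }
            ],
        },
    }


# Hypothetical service name and namespace, for illustration only.
manifest = thrift_service_manifest("spark-thrift-server", "emr")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.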

Describe alternatives you have considered

  • Possibly using Apache Kyuubi to externalize the Spark Thrift Server, but it'd be too much.

Additional context

  • I'm happy to contribute to this feature. I just need a bit of help.

jaehyeon-kim · Jan 23 '23 05:01

@jaehyeon-kim Thanks, dbt will be very useful for the community. Would this be possible with EMR on EKS? If not, you could show it with OSS Spark.

Let us know if you need any help.

vara-bonthu · Jan 25 '23 03:01

@vara-bonthu

The dbt-spark adapter supports the odbc, thrift and http connection methods. Only the thrift method is supported for OSS Spark. On EMR on EC2, the Spark Thrift Server can easily be started on the master node. However, a long-running Thrift Server is not supported by EMR on EKS (or by Spark on Kubernetes in general), so we need a tweak: we can extend the existing Spark Thrift Server class to run indefinitely. It works in my POC, and I need someone who can help check the build configuration, as I'm not a Java developer and it should be updated. Let me update it shortly.
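
To make the thrift method concrete, here is a minimal sketch of what a dbt profile for it might look like, generated from Python. The profile name, host, schema, and user are hypothetical placeholders, and port 10000 is assumed because it is the Spark Thrift Server default; the actual values depend on how the server and load balancer are deployed.

```python
import json


def spark_thrift_profile(host, port=10000, schema="default", user="dbt"):
    """Build a dbt profile dict that uses the dbt-spark thrift method."""
    return {
        "dbt_on_emr_eks": {  # hypothetical profile name
            "target": "dev",
            "outputs": {
                "dev": {
                    "type": "spark",
                    "method": "thrift",  # connect through the Spark Thrift Server
                    "host": host,        # e.g. the load balancer DNS name
                    "port": port,        # assumed Thrift Server port
                    "schema": schema,
                    "user": user,
                }
            },
        }
    }


# Hypothetical host name, for illustration only.
profile = spark_thrift_profile("thrift.example.internal")
print(json.dumps(profile, indent=2))
```

Because JSON is, for practical purposes, a subset of YAML, the printed output could be saved as `profiles.yml`, although a hand-written YAML file would be the more usual choice.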

jaehyeon-kim · Jan 30 '23 10:01

Hi @vara-bonthu

The menu bar and main page include existing sections. Which place would be good for dbt? Could you please create a skeleton for it if necessary? Or please inform me where to put the dbt contents.

jaehyeon-kim · Feb 07 '23 23:02

Could you please provide full details of your implementation so that I can guide you accordingly?

If you are building a new Terraform blueprint for deploying dbt then you can place the code under https://github.com/awslabs/data-on-eks/tree/main/analytics/terraform/dbt-on-eks and the docs can go here -> https://github.com/awslabs/data-on-eks/tree/main/website/docs/spark-on-eks.

Please feel free to raise a PR so that I can suggest location changes after reviewing it.

vara-bonthu · Feb 10 '23 21:02

Hi @vara-bonthu

How are you?

Sorry for the late reply. These days I find it hard to make time for this, as my wife has a knee injury and I need to support her. I also have a 2-year-old who needs my care. I'll try to come back shortly with an example as per your comment.

Cheers, Jaehyeon

jaehyeon-kim · Mar 20 '23 05:03

Hey @jaehyeon-kim, thanks for the response. No worries. Take your time; it's not an urgent task.

vara-bonthu · Mar 21 '23 19:03

Hey, any update on this enhancement?

sunny-fffff · Jan 03 '24 09:01

I haven't had time, as I've been working more on real-time processing. I'm now returning to dbt a bit and should be able to update this. I'll keep you posted.

jaehyeon-kim · Jan 03 '24 22:01

I am working on Kyuubi with EMR on EKS, which supports JDBC/Thrift/HTTP connections. Will that meet dbt's needs?

melodyyangaws · Mar 28 '24 01:03

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] · May 15 '24 00:05

Hello, any update on this enhancement?

beobest2 · May 31 '24 15:05