SQL execution

Open utdemir opened this issue 6 years ago • 0 comments

This is one of the more exciting features.

Apache Spark supports running SQL queries at runtime in an untyped fashion. This is quite useful for exploring data and for ad-hoc queries. See: https://spark.apache.org/docs/latest/sql-programming-guide.html

We should be able to implement a function like runSQL :: String -> Dataset Row -> Dataset Row, where Row is an untyped data structure that can represent arbitrary products, similar to aeson's Value.
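To make the idea concrete, here is a rough sketch of what the untyped side could look like. Everything in it is an assumption for illustration only: Field, Query and runQuery are hypothetical names, Dataset is modelled as a plain list rather than the real distributed-dataset type, and the step that parses a SQL String into a Query is left out entirely.

```haskell
import qualified Data.Map.Strict as M

-- Untyped cell value, loosely modelled on aeson's Value (hypothetical type).
data Field
  = FNull
  | FInt Integer
  | FText String
  deriving (Eq, Ord, Show)

-- A row is an untyped product: column name mapped to a value.
type Row = M.Map String Field

-- Stand-in for the library's Dataset; a plain list keeps the sketch runnable.
type Dataset a = [a]

-- Tiny query AST that a SQL parser (not shown here) would produce.
data Query = Query
  { qSelect :: [String]               -- columns to project
  , qWhere  :: Maybe (String, Field)  -- optional equality filter
  }

-- Filter the rows first, then keep only the selected columns.
-- Columns missing from a row are filled with FNull.
runQuery :: Query -> Dataset Row -> Dataset Row
runQuery (Query cols cond) = map project . filter matches
  where
    matches row = case cond of
      Nothing     -> True
      Just (c, v) -> M.lookup c row == Just v
    project row = M.fromList [(c, M.findWithDefault FNull c row) | c <- cols]

main :: IO ()
main = do
  let rows =
        [ M.fromList [("name", FText "a"), ("age", FInt 30)]
        , M.fromList [("name", FText "b"), ("age", FInt 41)]
        ]
      -- Roughly: SELECT name WHERE age = 30
      q = Query ["name"] (Just ("age", FInt 30))
  print (runQuery q rows)
```

A real runSQL would put a parser in front of something like runQuery and would need to express the filter and projection through the library's own Dataset operations, which this sketch does not attempt.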

If we implement this in distributed-dataset, with some modifications we might even be able to use ghci or IHaskell to run queries.

utdemir, Jun 27 '19 04:06