distributed-dataset
SQL execution
This is one of the more exciting features.
Apache Spark has support for running SQL queries at runtime in an untyped fashion. It is quite useful when exploring the data or for ad-hoc queries. See: https://spark.apache.org/docs/latest/sql-programming-guide.html
We should be able to implement a function like runSQL :: String -> Dataset Row -> Dataset Row, where Row is an untyped data structure that can represent arbitrary products, similar to aeson's Value.
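To make the idea of an untyped Row more concrete, here is a minimal sketch using only base. Everything in it is an assumption made for illustration: `Value`, `Row`, `Query` and `runQuery` are hypothetical names, a plain list stands in for the library's `Dataset`, and a pre-parsed `Query` record stands in for the SQL string that a real `runSQL` would have to parse first.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical untyped cell value, loosely modelled on aeson's Value.
data Value
  = VNull
  | VInt Integer
  | VText String
  | VBool Bool
  deriving (Eq, Show)

-- Hypothetical untyped row: column names paired with values.
type Row = [(String, Value)]

-- Stand-in for the library's Dataset; an assumption for this sketch only.
type Dataset a = [a]

-- A pre-parsed query. A real runSQL would parse a String into
-- something like this before executing it.
data Query = Query
  { selectCols :: [String]               -- columns to project
  , whereEq    :: Maybe (String, Value)  -- optional "column = literal" filter
  }

-- Filter rows by the optional equality condition, then project the
-- requested columns. Columns missing from a row become VNull.
runQuery :: Query -> Dataset Row -> Dataset Row
runQuery q = map project . filter keep
  where
    keep row = case whereEq q of
      Nothing     -> True
      Just (c, v) -> lookup c row == Just v
    project row = [ (c, fromMaybe VNull (lookup c row)) | c <- selectCols q ]

-- Sample data for trying the sketch out.
people :: Dataset Row
people =
  [ [("name", VText "Ada"),  ("age", VInt 36)]
  , [("name", VText "Alan"), ("age", VInt 41)]
  ]
```

With these definitions, `runQuery (Query ["name"] (Just ("age", VInt 41))) people` corresponds roughly to `SELECT name FROM people WHERE age = 41` and yields a single row containing only Alan's name. Because column lookups happen at runtime, a misspelled column is not a type error; in this sketch it simply produces `VNull`, which is the trade-off of the untyped approach.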
If we implement this in distributed-dataset, then with some modifications we might even be able to run queries interactively from ghci or IHaskell.