Oliver Kennedy
Should be evaluated by Spark.
```
mimir> select row_number(), cast(h as float)*60*60+cast(m as float)*60+cast(s as float)-75359.239 from extract_warm_start;
java.lang.RuntimeException: Error Decoding ROW_NUMBER (int)
    at mimir.exec.result.LazyRow.apply(LazyRow.scala:22)
    at mimir.exec.PrettyOutputFormat$$anonfun$print$4$$anonfun$apply$1.apply$mcVI$sp(OutputFormat.scala:81)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at...
```
Challenge: Ordering matters for storage providers. We need a way to specify a priority order for the Spark provider. MimirVizier should use a command-line parameter to figure out whether...
As of right now, [CTExplainer](https://github.com/UBOdin/mimir/blob/master/src/main/scala/mimir/ctables/CTExplainer.scala)'s explainRow and explainCell methods rely on a hack to compute statistical metrics for values. Now that we have TupleBundler and compileForSamples() (at least in the...
Would be nice if the parser could read in JSON... would help with inline testing, as well as for passing configuration parameters to Lenses.
The shape watcher lens currently runs a Count Distinct query during the training phase to discover categorical attributes. This is not great for large datasets. Fortunately, we don't care about...
It would be nifty if we could have some way to easily define data validation expressions. For example:
```
ASSERT A + B + C = TOTAL IN grades
```
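To make the request concrete, here is a minimal Python sketch of what evaluating such an assertion could amount to: apply a row-level predicate to every row and report the rows that fail. The helper `check_assertion`, the in-memory `grades` rows, and their values are all hypothetical and only for illustration; nothing here reflects how Mimir would parse or execute an `ASSERT` statement.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def check_assertion(rows: List[Row], predicate: Callable[[Row], bool]) -> List[int]:
    """Return the indices of rows that violate the predicate."""
    return [i for i, row in enumerate(rows) if not predicate(row)]


# Hypothetical stand-in for the `grades` table (made-up values).
grades = [
    {"A": 30, "B": 30, "C": 35, "TOTAL": 95},
    {"A": 20, "B": 25, "C": 30, "TOTAL": 80},  # violates: 20 + 25 + 30 = 75
]

# Equivalent of: ASSERT A + B + C = TOTAL IN grades
violations = check_assertion(
    grades, lambda r: r["A"] + r["B"] + r["C"] == r["TOTAL"]
)
print(violations)  # [1]
```

Returning the offending row indices rather than a single boolean is a design choice in this sketch: it lets a caller point at the rows that broke the assertion.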
Possibly required for #361. Would subsume #333. Mimir has a much cruder internal type system than [Spark](https://github.com/apache/spark/tree/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types). In addition to lacking collection types, there are a lot of capabilities (e.g., integer...
```
[9] | LOAD DATASET deposits FROM https://odin.cse.buffalo.edu/public_data/Deposits_2018.csv
```
Detect headers is enabled, and indeed, the column headers are all extracted correctly. However, the first row remains included in the...
The following query behaves differently depending on whether `CAST` is evaluated in Mimir or in Spark.
```
SELECT CAST('.' AS int)
```
In Mimir it returns `NULL`, while in Spark...
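As a reference point for the Mimir side of this comparison, the following Python sketch models a cast that yields a null instead of failing when the input is not a valid integer. The function name `cast_int_or_null` is hypothetical, and this is not Mimir's actual implementation; it only mirrors the `NULL` result reported above. The Spark behavior is not modeled here.

```python
from typing import Optional


def cast_int_or_null(value: str) -> Optional[int]:
    """Parse value as an integer, returning None (SQL NULL) if it is malformed."""
    try:
        return int(value.strip())
    except ValueError:
        # Malformed input such as '.' maps to NULL rather than raising.
        return None


print(cast_int_or_null("."))   # None
print(cast_int_or_null("42"))  # 42
```

Whichever semantics is chosen, the same function would need to back both evaluation paths for the query to return consistent results.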