empirical-lang

Roadmap to Production

Open · chrisaycock opened this issue 5 years ago · 2 comments

Major

Each of these is expected to take multiple weeks' worth of work. This list is roughly ordered by how critical each feature is.

  • ~~Software engineering (CI, more regression tests)~~
  • ~~Generically-typed functions~~
  • ~~Load with user-provided types~~
  • Streaming evaluation (chunked, continuous)
  • Memory management (reference counting, copy-on-write)
  • JIT
  • Keyword arguments and default arguments
  • Loaders for Parquet (Arrow), JSON, and others?
  • Compile to SQL
  • Time-series features (window join, timezones, Year/Month, custom business days?)
  • Indexing (boolean, slice, multi, on DF)
  • Modules
  • Error messages (AST needs filename, lineno, colno, length)
  • REPL (multiline support, tab completion)
  • Numbers (arbitrary precision, fixed sizes other than 64 bits, endianness)
  • Installers for package managers (e.g., Homebrew, yum/apt)
  • Nested types and arrays
  • Dictionaries
  • FFI
  • SIMD
  • Data munging (string handling routines, PCRE)
  • Anonymous functions and closures
  • Categoricals
  • Resampling and pivot tables
  • Unicode
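The error-message item says each AST node needs a filename, line number, column number, and length. A minimal Python sketch follows. It assumes a hypothetical `Span` record and `format_error` helper (neither exists in the codebase) and shows that those four fields are enough to print a diagnostic that underlines the offending expression:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical source location attached to every AST node."""
    filename: str
    lineno: int   # 1-based line number
    colno: int    # 1-based column number
    length: int   # number of characters the node covers


def format_error(source: str, span: Span, message: str) -> str:
    """Render a diagnostic that underlines the node's span with carets."""
    line = source.splitlines()[span.lineno - 1]
    # Indent by colno - 1 so the first caret sits under the node's first character.
    underline = " " * (span.colno - 1) + "^" * max(span.length, 1)
    return (
        f"{span.filename}:{span.lineno}:{span.colno}: error: {message}\n"
        f"{line}\n"
        f"{underline}"
    )


if __name__ == "__main__":
    src = 'let x = 1\nlet y = x + "a"\n'
    # The span covers `x + "a"` on line 2, starting at column 9.
    print(format_error(src, Span("<stdin>", 2, 9, 7), "type mismatch"))
```

Storing `length` alongside the start position lets the reporter underline a whole expression rather than a single point. It does not need to re-parse the source.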

Minor

These are expected to take only a few days each. They are in no particular order.

  • ~~type_of~~
  • ~~Enforce mutability (let vs var)~~
  • Locale
  • Deallocate ASDL nodes
  • Prohibit del outside a value's immediate scope of declaration
  • Flexible join parameters (left_on/right_on?)
  • Check that columns assigned to a DF are all the same length
  • Short-circuit operators (andthen, orelse)
  • Moving and cumulative functions
  • map and reduce
  • Random numbers (normal and uniform distribution)
  • for loops
  • update for Dataframes
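"Moving and cumulative functions" leaves the handling of the first few windows open. The following Python sketch assumes q-style partial windows, where the first `window - 1` results sum over whatever prefix is available instead of returning a missing value. The names `cumsum` and `msum` are illustrative and are not confirmed Empirical builtins:

```python
from collections import deque
from itertools import accumulate
from typing import Deque, Iterable, List


def cumsum(xs: Iterable[float]) -> List[float]:
    """Cumulative sum: element i is the sum of xs[0..i]."""
    return list(accumulate(xs))


def msum(window: int, xs: Iterable[float]) -> List[float]:
    """Moving sum over the most recent `window` elements.

    Early results use partial windows (an assumed, q-like convention).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buf: Deque[float] = deque()
    total = 0
    out: List[float] = []
    for x in xs:
        buf.append(x)
        total += x
        # Drop the element that just fell out of the window.
        if len(buf) > window:
            total -= buf.popleft()
        out.append(total)
    return out


if __name__ == "__main__":
    print(cumsum([1, 2, 3, 4]))
    print(msum(3, [1, 2, 3, 5, 7, 11]))
```

The running total makes `msum` O(n) regardless of window size. On long floating-point series this can accumulate rounding drift, which a production version might mitigate with periodic recomputation or compensated summation.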

Trivial

Most of these are expected to take only a few hours. They are intentionally left open to be friendly for first-time contributors.

  • ~~columns~~
  • ~~trig functions~~
  • ~~mean~~, ~~variance~~, ~~stddev~~
  • ~~reverse~~, distinct
  • append
  • pow, exp, log, sqrt
  • prev, next, rotate, shift
  • abs
  • first, last, take, repeat
  • max, min (one and two parameters)
  • percentile, median
  • cov, corr
  • wavg
  • floor, ceil
  • any, all
  • fill, ffill, bfill, is_nil
  • deltas, differ
  • except
  • join, split
  • upper, lower
  • flatten
  • additional sorting (descending) and searching
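The trivial items are meant as starting points for first-time contributors. The Python sketch below gives one plausible set of semantics for four of them (`deltas`, `differ`, `ffill`, `wavg`). It is modeled on the q functions of the same names, which is an assumption about the intended behavior and not Empirical's specification. Nil is represented here as `None`:

```python
from typing import List, Optional, Sequence


def deltas(xs: Sequence[float]) -> List[float]:
    """Difference from the previous element; the first element is kept as-is."""
    return [x if i == 0 else x - xs[i - 1] for i, x in enumerate(xs)]


def differ(xs: Sequence) -> List[bool]:
    """True where an element differs from its predecessor; the first is always True."""
    return [i == 0 or x != xs[i - 1] for i, x in enumerate(xs)]


def ffill(xs: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each nil (None) with the most recent non-nil value.

    Leading nils have no earlier value and stay nil.
    """
    out: List[Optional[float]] = []
    last: Optional[float] = None
    for x in xs:
        if x is not None:
            last = x
        out.append(last)
    return out


def wavg(weights: Sequence[float], values: Sequence[float]) -> float:
    """Weighted average sum(w * v) / sum(w); weights come first, as in q."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ZeroDivisionError("weights sum to zero")
    return sum(w * v for w, v in zip(weights, values)) / total_weight


if __name__ == "__main__":
    print(deltas([1, 4, 9, 16]))
    print(differ([1, 1, 2, 2, 1]))
    print(ffill([None, 1, None, 3]))
    print(wavg([1, 3], [10, 20]))
```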

chrisaycock · May 20 '19 00:05

> Loaders for Parquet (Arrow) and others?

I'm partial to the fst format.

bobjansen · May 25 '19 20:05

It might be helpful to read the Brown Benchmark for Table Types to see how some of the items in this Roadmap can be represented. See also the Dataframe Algebra.

Additionally, the H2O.ai DB Benchmarks can give us an idea of performance.

chrisaycock · Dec 16 '21 17:12