
feat: support fenl types in `Collection`

Open jordanrfrazier opened this issue 1 year ago • 1 comment

FenlType::Collection currently only supports type variables, meaning we can't do map<i32, V> type signatures. We should refactor the Collection type to accept FenlTypes instead of just type vars, then clean up the parsing and inference code around that.
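A minimal sketch of the shape this refactor is aiming for, assuming a simplified type representation. The `FenlType` and `Collection` definitions below are hypothetical stand-ins written for illustration, not Kaskada's actual definitions. The point is only that collection arguments become full types, so a concrete type and a type variable can sit side by side:

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for a Fenl type (not Kaskada's real enum).
#[derive(Debug, Clone, PartialEq)]
pub enum FenlType {
    /// A concrete type such as `i32` or `string`.
    Concrete(String),
    /// A type variable such as `K` or `V`.
    TypeVar(String),
    /// A collection whose arguments are full `FenlType`s, so concrete
    /// types and type variables can be mixed freely.
    Collection(Collection, Vec<FenlType>),
}

/// The kinds of collection assumed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Collection {
    List,
    Map,
}

impl fmt::Display for FenlType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FenlType::Concrete(name) | FenlType::TypeVar(name) => write!(f, "{}", name),
            FenlType::Collection(kind, args) => {
                let kind = match kind {
                    Collection::List => "list",
                    Collection::Map => "map",
                };
                // Render each argument recursively, so nested collections work.
                let args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
                write!(f, "{}<{}>", kind, args.join(", "))
            }
        }
    }
}

/// Builds the `map<i32, V>` signature mentioned above.
pub fn map_i32_v() -> FenlType {
    FenlType::Collection(
        Collection::Map,
        vec![
            FenlType::Concrete("i32".to_string()),
            FenlType::TypeVar("V".to_string()),
        ],
    )
}
```

With this shape, `map_i32_v().to_string()` renders as `map<i32, V>`, and nesting such as `list<map<i32, V>>` needs no extra cases.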

Tasks:

  • [x] Support map<k,v> in type inference
  • [ ] Support DataType::Map in merge (spread operation)
    • [x] Unlatched (https://github.com/kaskada-ai/kaskada/pull/529)
    • [ ] Latched
  • [ ] Support DataType::Map in first/last aggregations
    • [x] non-windowed https://github.com/kaskada-ai/kaskada/pull/540
    • [ ] since window
    • [ ] sliding window
  • [x] Support list<e> in type inference https://github.com/kaskada-ai/kaskada/pull/562
  • [ ] Support DataType::List in merge (spread operation)
  • [ ] Support DataType::List in aggregations
    • [x] (non-windowed) First/Last https://github.com/kaskada-ai/kaskada/pull/592
  • [ ] List construction using collect function
    • [x] Doc: https://docs.google.com/document/d/1P6sogDYfq31n3LXe3WViDcHWGBfHRxNZwd1znjmagQk/edit
    • [x] Record Types https://github.com/kaskada-ai/kaskada/pull/602
    • [x] Primitive types https://github.com/kaskada-ai/kaskada/pull/569
    • [ ] Complex/Nested types
    • [x] Non-windowed https://github.com/kaskada-ai/kaskada/pull/569
    • [x] Since Window https://github.com/kaskada-ai/kaskada/pull/583
    • [ ] Sliding Windows
  • [ ] Map construction
    • [ ] TODO: Doc
    • [ ] Primitive Types
    • [ ] Complex/Nested Types
  • [ ] Brackets for access: map[key] list[index]
  • [ ] Add get by key function for maps (get(key, map)) (https://github.com/kaskada-ai/kaskada/pull/532)
    • [x] Support boolean/string/primitives
    • [ ] Complex/Nested types
  • [x] Add get by index for lists https://github.com/kaskada-ai/kaskada/pull/562
  • [ ] Verify all normal functions interact with collection types as expected
  • [ ] Support LargeUtf8
    • Not done in multiple aggregations yet
  • [ ] Support Map Equality in Arrow eq_dyn kernel: https://docs.rs/arrow-ord/44.0.0/src/arrow_ord/comparison.rs
    • Easily done if the map is ordered, but trickier to do (efficiently) if not: [{f1: _, f2: _}] must still equal [{f2: _, f1: _}].
  • [x] Improve map evaluators -- use GetIndex and take kernel to reduce duplication of evaluators/improve iteration
  • [ ] Test on other sample datasets
  • [ ] Using Dictionary types for efficient List (and Map?) aggregation
  • [ ] Other functions
    • keys()
    • values()
    • entries()
    • getEntryFromKey/Value()
    • [x] list_len() https://github.com/kaskada-ai/kaskada/pull/609
    • reverse()
    • column<T: Struct, N: any>(input: list<T>, field: string) -> list<N>
      • takes a column of lists of structs, gets the value of the given field from each struct in each list, then collects those values into a list<N>
    • Union Lists together
      • Having lists from multiple expressions and wanting to union them together to operate on a single list.
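Two of the open items above can be sketched on plain Rust values: the order-insensitive map equality and the proposed `column` function. This is a rough illustration only. It does not use Arrow arrays or kernels, the function names and signatures are hypothetical, and the map comparison assumes keys are unique within each map:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Order-insensitive equality for maps stored as lists of (key, value)
/// entries. Assumes keys are unique within each map.
pub fn map_entries_eq<K, V>(left: &[(K, V)], right: &[(K, V)]) -> bool
where
    K: Eq + Hash,
    V: PartialEq,
{
    if left.len() != right.len() {
        return false;
    }
    // Index one side by key so each lookup is O(1) on average,
    // instead of scanning the other map for every entry.
    let index: HashMap<&K, &V> = left.iter().map(|(k, v)| (k, v)).collect();
    right
        .iter()
        .all(|(k, v)| index.get(k).map_or(false, |lv| *lv == v))
}

/// Sketch of `column`: each row holds a list of structs (modelled here as
/// field-name -> value maps). For every row, the value of `field` is taken
/// from each struct and collected into a list. Structs that lack the field
/// contribute `None`.
pub fn column<N: Clone>(
    input: &[Vec<HashMap<String, N>>],
    field: &str,
) -> Vec<Vec<Option<N>>> {
    input
        .iter()
        .map(|list| list.iter().map(|s| s.get(field).cloned()).collect())
        .collect()
}
```

Hashing one side avoids the quadratic scan that an unordered comparison would otherwise need. An Arrow-based version would have to do something comparable per row over the map's entries.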

jordanrfrazier • Jul 10 '23 22:07

Map and list types are not displayed correctly in the schema (see readings).

[Screenshot: Screen Shot 2023-07-28 at 11 11 34 AM]

jordanrfrazier • Jul 28 '23 18:07