kaskada
feat: support fenl types in `Collection`
`FenlType::Collection` currently only supports type variables, meaning we can't do `map<i32, V>` type signatures. We should refactor the `Collection` type to accept `FenlType`s instead of just type vars, then clean up the parsing and inference code around that.
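To make the proposed shape concrete, here is a minimal sketch of a collection type whose arguments are full types rather than only type variables. The enum layout, the `Concrete`/`TypeVar` variants, and the `map_i32_to_v` helper are hypothetical and simplified for illustration; they are not the actual definitions in the kaskada codebase.

```rust
use std::fmt;

/// Hypothetical stand-in for the kinds of collection Fenl knows about.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Collection {
    List,
    Map,
}

/// Hypothetical, simplified version of `FenlType`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum FenlType {
    /// A concrete type, named by a string here for brevity.
    Concrete(&'static str),
    /// A type variable such as `K` or `V`.
    TypeVar(&'static str),
    /// A collection whose arguments are full `FenlType`s, not just type variables.
    Collection(Collection, Vec<FenlType>),
}

impl fmt::Display for FenlType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FenlType::Concrete(name) | FenlType::TypeVar(name) => write!(f, "{name}"),
            FenlType::Collection(kind, args) => {
                let kind = match kind {
                    Collection::List => "list",
                    Collection::Map => "map",
                };
                // Render each argument recursively so nested collections work.
                let args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
                write!(f, "{kind}<{}>", args.join(", "))
            }
        }
    }
}

/// Builds the signature type `map<i32, V>`: a concrete key with a type-variable value.
fn map_i32_to_v() -> FenlType {
    FenlType::Collection(
        Collection::Map,
        vec![FenlType::Concrete("i32"), FenlType::TypeVar("V")],
    )
}
```

With this layout, `map_i32_to_v().to_string()` renders the mixed signature, and wrapping the result in `FenlType::Collection(Collection::List, vec![...])` expresses a nested type without any special casing.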
Tasks:
- [x] Support `map<k,v>` in type inference
- [ ] Support `DataType::Map` in `merge` (spread operation)
  - [x] Unlatched (https://github.com/kaskada-ai/kaskada/pull/529)
  - [ ] Latched
- [ ] Support `DataType::Map` in `first/last` aggregations
  - [x] non-windowed https://github.com/kaskada-ai/kaskada/pull/540
  - [ ] since window
  - [ ] sliding window
- [x] Support `list<e>` in type inference https://github.com/kaskada-ai/kaskada/pull/562
- [ ] Support `DataType::List` in `merge` (spread operation)
- [ ] Support `DataType::List` in aggregations
  - [x] (non-windowed) First/Last https://github.com/kaskada-ai/kaskada/pull/592
- [ ] `List` construction using `collect` function
  - [x] Doc: https://docs.google.com/document/d/1P6sogDYfq31n3LXe3WViDcHWGBfHRxNZwd1znjmagQk/edit
  - [x] Record Types https://github.com/kaskada-ai/kaskada/pull/602
  - [x] Primitive types https://github.com/kaskada-ai/kaskada/pull/569
  - [ ] Complex/Nested types
  - [x] Non-windowed https://github.com/kaskada-ai/kaskada/pull/569
  - [x] Since Window https://github.com/kaskada-ai/kaskada/pull/583
  - [ ] Sliding Windows
- [ ] `Map` construction
  - [ ] TODO: Doc
  - [ ] Primitive Types
  - [ ] Complex/Nested Types
- [ ] Brackets for access: `map[key]`, `list[index]`
- [ ] Add `get` by key function for maps (`get(key, map)`) (https://github.com/kaskada-ai/kaskada/pull/532)
  - [x] Support boolean/string/primitives
  - [ ] Complex/Nested types
- [x] Add `get` by index for lists https://github.com/kaskada-ai/kaskada/pull/562
- [ ] Verify all normal functions interact with collection types as expected
- [ ] Support `LargeUtf8`
  - Not done in multiple aggregations yet
- [ ] Support Map Equality in Arrow `eq_dyn` kernel: https://docs.rs/arrow-ord/44.0.0/src/arrow_ord/comparison.rs
  - Easily done if the `map` is ordered, but trickier to do (efficiently) if not: `[{f1: _, f2: _}]` must still equal `[{f2: _, f1: _}]`.
- [x] Improve map evaluators -- use `GetIndex` and `take` kernel to reduce duplication of evaluators/improve iteration
- [ ] Test on other sample datasets
- [ ] Using `Dictionary` types for efficient `List` (and `Map`?) aggregation
- [ ] Other functions
  - keys()
  - values()
  - entries()
  - getEntryFromKey/Value()
  - [x] list_len() https://github.com/kaskada-ai/kaskada/pull/609
  - reverse()
  - `column<T: Struct, N: any>(input: list<T>, field: string) -> list<N>`
    - Takes a list of structs, gets the value of `field` from each struct in each list, then collects the values into a `list<N>`.
  - Union Lists together
    - Having lists from multiple expressions and wanting to union them together to operate on a single list.
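As a rough illustration of the proposed `column` semantics, the sketch below extracts one field from every struct in each row's list. It uses plain `HashMap`s in place of Arrow struct arrays, and the `Record` alias, the `column_batch` helper, and the choice to return `None` for a missing field are all assumptions made for the example, not decisions recorded in this issue.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a struct value: field name -> value of type `N`.
type Record<N> = HashMap<String, N>;

/// For one row's `list<T>`, pulls `field` out of every struct and collects the results.
/// A struct that lacks the field yields `None` (an assumption; the issue does not say).
fn column<N: Clone>(input: &[Record<N>], field: &str) -> Vec<Option<N>> {
    input.iter().map(|record| record.get(field).cloned()).collect()
}

/// Applies `column` to every row of a batch, one output list per input list.
fn column_batch<N: Clone>(rows: &[Vec<Record<N>>], field: &str) -> Vec<Vec<Option<N>>> {
    rows.iter().map(|list| column(list, field)).collect()
}
```

An empty input list produces an empty output list, so the number of rows is preserved. A real implementation would presumably work on Arrow arrays directly rather than materializing per-row maps.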
Map and list types are not displayed correctly in the schema (see `readings`).
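On the map-equality task in the list above: one possible approach for unordered maps is to index one side by key and probe it with the other, which avoids sorting. The sketch below works on plain slices of key/value pairs rather than Arrow's map arrays, and `map_entries_eq` is a hypothetical name; it only illustrates the comparison logic, under the assumption that keys are unique within each map.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Compares two maps given as entry lists, ignoring entry order.
/// Assumes keys are unique within each side.
fn map_entries_eq<K: Eq + Hash, V: PartialEq>(left: &[(K, V)], right: &[(K, V)]) -> bool {
    // Different entry counts can never be equal.
    if left.len() != right.len() {
        return false;
    }
    // Index the left side by key so each lookup is O(1) on average.
    let index: HashMap<&K, &V> = left.iter().map(|(k, v)| (k, v)).collect();
    // Every right entry must exist on the left with an equal value.
    right
        .iter()
        .all(|(k, v)| index.get(k).map_or(false, |lv| *lv == v))
}
```

The alternative is to sort both sides by key and compare pairwise, which needs `K: Ord` instead of `K: Hash`. If the maps are already known to be ordered, a direct element-wise comparison is enough.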