cuallee
Bump getdaft from 0.2.19 to 0.2.24
Bumps getdaft from 0.2.19 to 0.2.24.
Release notes
Sourced from getdaft's releases.
v0.2.24
Changes
✨ New Features
- [FEAT] Allow returning of pyarrow arrays from UDFs @jaychia (#2252)
- [FEAT] Add left, right, and outer joins @kevinzwang (#2166)
- [FEAT] Add rpad and lpad expressions @murex971 (#2157)
- [FEAT] AWS Profile override in S3Config @samster25 (#2243)
- [FEAT] Add unpivot @kevinzwang (#2204)
- [FEAT] Add string repeat functionality @murex971 (#2198)
- [FEAT] Approximate quantile aggregation (pulled into main) @jaychia (#2179)
- [FEAT] pivot @colin-ho (#2183)

🚀 Performance Improvements
- [PERF] Adaptive Query Execution @samster25 (#2176)
- [PERF] Swap out json_deserializer for simd_json @universalmind303 (#2228)
- [PERF] Evaluate only true/false side of if_else if predicate is boolean @colin-ho (#2222)
- [PERF] Enable metadata preservation across materialization points @samster25 (#2216)

👾 Bug Fixes
- [BUG] Fix tab completion on expression namespaced accessors @jaychia (#2251)
- [BUG] Route abfss to AzureBlob @samster25 (#2244)

📖 Documentation
- [CHORE] Skip demo notebook @jaychia (#2248)
- [FEAT] Add rpad and lpad expressions @murex971 (#2157)
- [DOCS] Add user guide for read_sql @colin-ho (#2226)
- [FEAT] Add unpivot @kevinzwang (#2204)
- [DOCS] Add read_hudi in the api docs @xushiyan (#2225)
- [FEAT] Add string repeat functionality @murex971 (#2198)
- [DOCS] LinkedIn Big Data meetup tutorial @jaychia (#2223)
- [FEAT] Approximate quantile aggregation (pulled into main) @jaychia (#2179)
- [DOCS] Add read_lance docs @jaychia (#2218)
- [FEAT] pivot @colin-ho (#2183)

🧰 Maintenance
- [CHORE] Drop Python 3.7 @samster25 (#2250)
- [CHORE] Improve timestamp repr @colin-ho (#2245)
- [CHORE] Allow multiple group_bys for pivot @colin-ho (#2242)
- [CHORE] Skip demo notebook @jaychia (#2248)
- [CHORE] Return &str for expression name @colin-ho (#2224)
- [CHORE] Mount provision.py for iceberg integration tests @jaychia (#2232)
- [CHORE] Remove trait aliases @universalmind303 (#2229)

⬆️ Dependencies
... (truncated)
Commits
- 62f9dd6 [FEAT] Allow returning of pyarrow arrays from UDFs (#2252)
- f1d6570 [CHORE] Drop Python 3.7 (#2250)
- 3b1a5ca [BUG] Fix tab completion on expression namespaced accessors (#2251)
- 0541774 [CHORE] Improve timestamp repr (#2245)
- b61461f [PERF] Adaptive Query Execution (#2176)
- 89e3916 [FEAT] Add left, right, and outer joins (#2166)
- e47b48a [CHORE] Allow multiple group_bys for pivot (#2242)
- 252721e [CHORE] Skip demo notebook (#2248)
- d035454 [FEAT] Add rpad and lpad expressions (#2157)
- 3e9dcd4 [BUG] Route abfss to AzureBlob (#2244)
- Additional commits viewable in compare view
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
@canimus
I have tried to bump daft from 0.2.19 to 0.2.24 on Python 3.11.9, but I cannot get any test to pass.
For example, the current implementation of the has_min check:
def has_min(self, rule: Rule, dataframe: daft.DataFrame) -> Union[bool, int]:
    perdicate = daft.col(rule.column).min()
    return dataframe.select(perdicate).to_pandas().iloc[0, 0] == rule.value
It will raise this error:
cuallee\__init__.py:1199: in validate
return self.compute_engine.summary(self, dataframe)
cuallee\daft_validation.py:513: in summary
unified_results = {
cuallee\daft_validation.py:514: in <dictcomp>
rule.key: [operator.methodcaller(rule.method, rule, dataframe)(compute)]
cuallee\daft_validation.py:98: in has_min
return dataframe.select(perdicate).to_pandas().iloc[0, 0] == rule.value
.venv\Lib\site-packages\daft\api_annotations.py:26: in _wrap
return timed_method(*args, **kwargs)
.venv\Lib\site-packages\daft\analytics.py:189: in tracked_method
result = method(*args, **kwargs)
.venv\Lib\site-packages\daft\dataframe\dataframe.py:662: in select
builder = self._builder.select(self.__column_input_to_expression(columns))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = * Source:
| Number of partitions = 1
| Output schema = id#Int32
to_select = [min(col(id))]
def select(
self,
to_select: list[Expression],
) -> LogicalPlanBuilder:
to_select_pyexprs = [expr._expr for expr in to_select]
> builder = self._builder.select(to_select_pyexprs)
E daft.exceptions.DaftCoreException: DaftError::ValueError Aggregation expressions are not currently supported in project: min(col(id))
E If you would like to have this feature, please see https://github.com/Eventual-Inc/Daft/issues/1979#issue-2170913383
There are some internal changes in the Daft package that might cause this; there is more information here. I have also commented about this there.
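For reference, the failing pattern can be reduced to something like the sketch below. This is only an assumed minimal reproduction on a toy DataFrame, not the actual cuallee test:

import daft

# Hypothetical toy data standing in for the real input
df = daft.from_pydict({"id": [1, 2, 3]})

# Putting an aggregation expression inside select() is the pattern that
# daft 0.2.24 rejects with "Aggregation expressions are not currently
# supported in project":
df.select(daft.col("id").min()).to_pandas()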
Suggestion
This method (and others) should be updated like this:
def has_min(self, rule: Rule, dataframe: daft.DataFrame) -> Union[bool, int]:
    col = daft.col(rule.column)
    return dataframe.select(col).min(col).to_pandas().iloc[0, 0] == rule.value
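A quick way to sanity-check the suggested pattern, as a sketch on a hypothetical toy DataFrame rather than the real validation dataframe and rule:

import daft

# Hypothetical data; the minimum of "id" is 1
df = daft.from_pydict({"id": [3, 1, 2]})
col = daft.col("id")

# Project the column first, then run the global min aggregation on it
result = df.select(col).min(col).to_pandas().iloc[0, 0]
assert result == 1

The comparison against rule.value stays unchanged; only the place where the aggregation happens moves from the projection into the DataFrame-level min.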
@canimus Based on the information provided by @kevinzwang here, I need to modify the methods in daft_validation.py to solve this problem.
It seems the suggested workaround above will be backward compatible.
Looks like getdaft is up-to-date now, so this is no longer needed.