cuallee
Bump getdaft from 0.2.19 to 0.2.24
Bumps getdaft from 0.2.19 to 0.2.24.
Release notes
Sourced from getdaft's releases.
v0.2.24
Changes
✨ New Features
- [FEAT] Allow returning of pyarrow arrays from UDFs @jaychia (#2252)
- [FEAT] Add left, right, and outer joins @kevinzwang (#2166)
- [FEAT] Add rpad and lpad expressions @murex971 (#2157)
- [FEAT] AWS Profile override in S3Config @samster25 (#2243)
- [FEAT] Add unpivot @kevinzwang (#2204)
- [FEAT] Add string repeat functionality @murex971 (#2198)
- [FEAT] Approximate quantile aggregation (pulled into main) @jaychia (#2179)
- [FEAT] pivot @colin-ho (#2183)

🚀 Performance Improvements
- [PERF] Adaptive Query Execution @samster25 (#2176)
- [PERF] Swap out json_deserializer for simd_json @universalmind303 (#2228)
- [PERF] Evaluate only true/false side of if_else if predicate is boolean @colin-ho (#2222)
- [PERF] Enable metadata preservation across materialization points @samster25 (#2216)

👾 Bug Fixes
- [BUG] Fix tab completion on expression namespaced accessors @jaychia (#2251)
- [BUG] Route abfss to AzureBlob @samster25 (#2244)

📖 Documentation
- [CHORE] Skip demo notebook @jaychia (#2248)
- [FEAT] Add rpad and lpad expressions @murex971 (#2157)
- [DOCS] Add user guide for read_sql @colin-ho (#2226)
- [FEAT] Add unpivot @kevinzwang (#2204)
- [DOCS] Add read_hudi in the api docs @xushiyan (#2225)
- [FEAT] Add string repeat functionality @murex971 (#2198)
- [DOCS] LinkedIn Big Data meetup tutorial @jaychia (#2223)
- [FEAT] Approximate quantile aggregation (pulled into main) @jaychia (#2179)
- [DOCS] Add read_lance docs @jaychia (#2218)
- [FEAT] pivot @colin-ho (#2183)

🧰 Maintenance
- [CHORE] Drop Python 3.7 @samster25 (#2250)
- [CHORE] Improve timestamp repr @colin-ho (#2245)
- [CHORE] Allow multiple group_bys for pivot @colin-ho (#2242)
- [CHORE] Skip demo notebook @jaychia (#2248)
- [CHORE] Return &str for expression name @colin-ho (#2224)
- [CHORE] Mount provision.py for iceberg integration tests @jaychia (#2232)
- [CHORE] Remove trait aliases @universalmind303 (#2229)

⬆️ Dependencies
... (truncated)
Commits
- 62f9dd6 [FEAT] Allow returning of pyarrow arrays from UDFs (#2252)
- f1d6570 [CHORE] Drop Python 3.7 (#2250)
- 3b1a5ca [BUG] Fix tab completion on expression namespaced accessors (#2251)
- 0541774 [CHORE] Improve timestamp repr (#2245)
- b61461f [PERF] Adaptive Query Execution (#2176)
- 89e3916 [FEAT] Add left, right, and outer joins (#2166)
- e47b48a [CHORE] Allow multiple group_bys for pivot (#2242)
- 252721e [CHORE] Skip demo notebook (#2248)
- d035454 [FEAT] Add rpad and lpad expressions (#2157)
- 3e9dcd4 [BUG] Route abfss to AzureBlob (#2244)
- Additional commits viewable in compare view
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
@canimus
I have tried to bump daft from 0.2.19 to 0.2.24 on Python 3.11.9, but I cannot get any test to pass.
For example, the current implementation of the has_min check:
def has_min(self, rule: Rule, dataframe: daft.DataFrame) -> Union[bool, int]:
    perdicate = daft.col(rule.column).min()
    return dataframe.select(perdicate).to_pandas().iloc[0, 0] == rule.value
It will raise this error:
cuallee\__init__.py:1199: in validate
return self.compute_engine.summary(self, dataframe)
cuallee\daft_validation.py:513: in summary
unified_results = {
cuallee\daft_validation.py:514: in <dictcomp>
rule.key: [operator.methodcaller(rule.method, rule, dataframe)(compute)]
cuallee\daft_validation.py:98: in has_min
return dataframe.select(perdicate).to_pandas().iloc[0, 0] == rule.value
.venv\Lib\site-packages\daft\api_annotations.py:26: in _wrap
return timed_method(*args, **kwargs)
.venv\Lib\site-packages\daft\analytics.py:189: in tracked_method
result = method(*args, **kwargs)
.venv\Lib\site-packages\daft\dataframe\dataframe.py:662: in select
builder = self._builder.select(self.__column_input_to_expression(columns))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = * Source:
| Number of partitions = 1
| Output schema = id#Int32
to_select = [min(col(id))]
def select(
self,
to_select: list[Expression],
) -> LogicalPlanBuilder:
to_select_pyexprs = [expr._expr for expr in to_select]
> builder = self._builder.select(to_select_pyexprs)
E daft.exceptions.DaftCoreException: DaftError::ValueError Aggregation expressions are not currently supported in project: min(col(id))
E If you would like to have this feature, please see https://github.com/Eventual-Inc/Daft/issues/1979#issue-2170913383
There are some internal changes in the Daft package that might cause this; there is more information here. I have also commented about this there.
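For reference, the failing pattern can be reduced to something like the sketch below. This is only an assumed minimal reproduction on a toy DataFrame, not the actual cuallee test:

import daft

# Hypothetical toy data standing in for the real input
df = daft.from_pydict({"id": [1, 2, 3]})

# Putting an aggregation expression inside select() is the pattern that
# daft 0.2.24 rejects with "Aggregation expressions are not currently
# supported in project":
df.select(daft.col("id").min()).to_pandas()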
Suggestion
This method (and others) should be updated like this:
def has_min(self, rule: Rule, dataframe: daft.DataFrame) -> Union[bool, int]:
    col = daft.col(rule.column)
    return dataframe.select(col).min(col).to_pandas().iloc[0, 0] == rule.value
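A quick way to sanity-check the suggested pattern, as a sketch on a hypothetical toy DataFrame rather than the real validation dataframe and rule:

import daft

# Hypothetical data; the minimum of "id" is 1
df = daft.from_pydict({"id": [3, 1, 2]})
col = daft.col("id")

# Project the column first, then run the global min aggregation on it
result = df.select(col).min(col).to_pandas().iloc[0, 0]
assert result == 1

The comparison against rule.value stays unchanged; only the place where the aggregation happens moves from the projection into the DataFrame-level min.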
@canimus Based on the information provided by @kevinzwang here, I need to modify the methods in daft_validation.py to solve this problem.
It seems the suggested workaround above will be backward compatible.
Looks like getdaft is up-to-date now, so this is no longer needed.