Would return only the unique elements of each list, like Python's `set()` functionality.

Example:

```
df = daft.from_pydict({"a": [[1, 2, 2, 3, 3, 3], [1, 3, 5, 5]]})
df.with_column("b",...
```
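As a plain-Python sketch of the expected per-list semantics (order-preserving deduplication rather than an unordered `set()`; the list accessor itself is the feature being requested, not existing API):

```
# Order-preserving dedup per list, matching the expected behavior above.
def list_unique(values):
    return list(dict.fromkeys(values))

data = [[1, 2, 2, 3, 3, 3], [1, 3, 5, 5]]
print([list_unique(row) for row in data])
# [[1, 2, 3], [1, 3, 5]]
```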
Currently, `agg_concat` simply concatenates the strings without a delimiter, so the alternative would be to first collect the values with `agg_list` and then do `list.join` with a delimiter, but it would be...
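A sketch of that two-step workaround, assuming Daft's expression-based aggregation API accepts this shape (column names and the delimiter are illustrative):

```
import daft
from daft import col

df = daft.from_pydict({"group": ["x", "x", "y"], "text": ["a", "b", "c"]})

# Step 1: collect the strings per group into a list column.
# Step 2: join each list with an explicit delimiter.
(
    df.groupby("group")
    .agg(col("text").agg_list())
    .with_column("joined", col("text").list.join(", "))
    .show()
)
```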
I would like to get a MinHash with alternative hash algorithms, such as the first four bytes of SHA-1 as implemented in https://github.com/bigcode-project/bigcode-dataset/blob/main/near_deduplication/minhash_deduplication_spark.py. The deduplication rate is empirically much better...
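A minimal sketch of the requested hash primitive: the first four bytes of a SHA-1 digest read as a 32-bit integer (the little-endian byte order is an assumption here; the linked bigcode script is the reference implementation):

```
import hashlib

def sha1_hash32(data: bytes) -> int:
    # First 4 bytes of the SHA-1 digest as an unsigned 32-bit integer
    # (little-endian byte order is an assumption).
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "little")

print(sha1_hash32(b"hello world"))
```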
When PySpark saves Parquet files partitioned on a column, it creates Hive-style folders of the form `partition=some_value`. When I use Daft to `read_parquet` the parent folder, I would like to get...
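A sketch of the layout in question, with the PySpark write producing the `partition=some_value` folders and the Daft read being the call under discussion (paths and column names are illustrative, and the glob pattern is an assumption):

```
from pyspark.sql import SparkSession
import daft

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "part"])

# Writes Hive-style folders: /tmp/out/part=a/..., /tmp/out/part=b/...
sdf.write.partitionBy("part").parquet("/tmp/out")

# Reading the parent folder back; the request is for the `part`
# column to be reconstructed from the folder names.
daft.read_parquet("/tmp/out/**/*.parquet").show()
```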
Would apply a function to each element in the list.

Example:

```
df = daft.from_pydict({"a": [["HeLLo WoRlD", "Hi", "WelCoMe"], ["tO", "a New WoRlD"]]})
df.with_column("b", col("a").list.apply(element().str.lower())).select("b").show()
```

Expected output:

```
╭─────────────╮...
```
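A plain-Python sketch of the expected semantics, applying the function elementwise within each list (the `list.apply`/`element()` API above is the proposal, not existing behavior):

```
data = [["HeLLo WoRlD", "Hi", "WelCoMe"], ["tO", "a New WoRlD"]]

# Expected per-row result: the same list with `str.lower` applied to each element.
print([[s.lower() for s in row] for row in data])
# [['hello world', 'hi', 'welcome'], ['to', 'a new world']]
```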
Would return the counts of each element in the lists, like the pandas `.value_counts()` or NumPy `.unique(return_counts=True)` functionality.

**Example:**

```
df = daft.from_pydict({"a": [[1, 2, 2, 3, 3, 3], [1,...
```
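A plain-Python sketch of the expected per-list semantics using `collections.Counter` (how Daft would represent the result, e.g. as a map or struct column, is left open):

```
from collections import Counter

data = [[1, 2, 2, 3, 3, 3], [1, 3, 5, 5]]

# Element -> count per list.
print([dict(Counter(row)) for row in data])
# [{1: 1, 2: 2, 3: 3}, {1: 1, 3: 1, 5: 2}]
```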
### Search before asking

- [X] I searched the [issues](https://github.com/IBM/data-prep-lab/issues) and found no similar issues.

### Component

Transforms/Other, Other

### What happened + What you expected to happen

Hi IBM...