Desmond Cheong
### What changes were proposed in this pull request? The Variant datatype was added in https://github.com/apache/spark/pull/43707 but the equivalent PySpark type was not added. In this PR we add Variant...
Adds a parallel CSV reader to speed up ingestion of CSV. The approach adapts some ideas laid out in [1], but the majority of performance gains came from the use...
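A common way to parallelize CSV parsing is to split the input into byte ranges and move each boundary forward to the next newline, so that every worker parses only whole records. The sketch below illustrates only that chunking step. It is not the PR's implementation, it assumes no quoted field contains an embedded newline, and `split_ranges`, `parse_range`, and `read_csv_parallel` are hypothetical names.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def split_ranges(data: bytes, num_chunks: int) -> list:
    """Split `data` into contiguous byte ranges that each end on a newline.

    Assumes no quoted CSV field contains an embedded newline.
    """
    size = len(data)
    step = max(1, size // num_chunks)
    ranges = []
    start = 0
    while start < size:
        end = start + step
        if end >= size:
            end = size
        else:
            # Snap the boundary forward to just past the next newline.
            nl = data.find(b"\n", end - 1)
            end = size if nl == -1 else nl + 1
        ranges.append((start, end))
        start = end
    return ranges


def parse_range(data: bytes, byte_range: tuple) -> list:
    """Parse one byte range into rows; each range holds only whole records."""
    start, end = byte_range
    return list(csv.reader(io.StringIO(data[start:end].decode("utf-8"))))


def read_csv_parallel(data: bytes, num_chunks: int = 4):
    """Parse the header serially, then parse the body chunks concurrently."""
    nl = data.find(b"\n")
    header_end = len(data) if nl == -1 else nl + 1
    header = next(csv.reader(io.StringIO(data[:header_end].decode("utf-8"))))
    body = data[header_end:]
    with ThreadPoolExecutor() as pool:
        chunks = pool.map(lambda r: parse_range(body, r), split_ranges(body, num_chunks))
        # pool.map yields results in input order, so rows keep file order.
        rows = [row for chunk in chunks for row in chunk]
    return header, rows
```

Splitting on the `\n` byte is safe for UTF-8 input because that byte never occurs inside a multi-byte sequence. In CPython the GIL limits how much a thread pool can speed up this kind of parsing, so a production reader would do the work in native code; the sketch only shows how the ranges are chosen.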
## Changes Made When generating thumbnails to display in notebooks, we encoded them as JPEG by default. This does not work if images have an alpha channel. This PR...
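One way to avoid the alpha-channel problem is to pick the encoding from the image's channel layout: keep JPEG for opaque images and fall back to PNG, which supports transparency, when an alpha channel is present. This is a hypothetical sketch (the truncated description does not say which fix the PR chose). It assumes Pillow-style mode strings such as `"RGB"` and `"RGBA"`, and it does not handle palette images that carry a transparency key.

```python
# Pillow-style mode strings that include an alpha channel (assumed, not exhaustive).
ALPHA_MODES = {"RGBA", "RGBa", "LA", "La", "PA"}

# MIME types for embedding the encoded bytes, e.g. in an HTML <img> data URI.
MIME_TYPES = {"JPEG": "image/jpeg", "PNG": "image/png"}


def thumbnail_format(mode: str) -> str:
    """Pick an encoding that can represent `mode`.

    JPEG cannot store transparency, so images with an alpha channel use PNG.
    """
    return "PNG" if mode in ALPHA_MODES else "JPEG"


def thumbnail_mime_type(mode: str) -> str:
    """Return the MIME type matching the format chosen for `mode`."""
    return MIME_TYPES[thumbnail_format(mode)]
```

With Pillow, the chosen string can be passed as the `format` argument of `Image.save`, for example `img.save(buf, format=thumbnail_format(img.mode))`.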
## Changes Made 1. Add a generic `WriteSink` interface. Users can use this to write custom write sinks that have optional `.start()`, `.write()`, `.finish()` methods. 2. Add `DataFrame.write_to_sink()` that takes...
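The "optional methods" design can be sketched as a base class whose hooks default to no-ops, plus a driver that calls them in a fixed order. This is a minimal illustration assuming a simple per-batch protocol; the PR's actual `WriteSink` signatures are not shown in the description, so `MinimalSink`, `write_batches_to_sink`, `CountingSink`, and all method parameters are hypothetical.

```python
from typing import Any, Iterable, List


class MinimalSink:
    """Hypothetical sink base class: every hook is optional and defaults to a no-op."""

    def start(self) -> None:
        """Called once before any data is written (e.g. to open a connection)."""

    def write(self, batch: List[Any]) -> Any:
        """Called once per batch; may return a per-batch result such as a row count."""
        return None

    def finish(self, results: List[Any]) -> Any:
        """Called once after all batches; can aggregate the per-batch results."""
        return results


def write_batches_to_sink(batches: Iterable[List[Any]], sink: MinimalSink) -> Any:
    """Drive a sink through its lifecycle: start, then write per batch, then finish."""
    sink.start()
    results = [sink.write(batch) for batch in batches]
    return sink.finish(results)


class CountingSink(MinimalSink):
    """Example user sink that overrides only write() and finish()."""

    def write(self, batch):
        return len(batch)

    def finish(self, results):
        return sum(results)
```

For example, `write_batches_to_sink([[1, 2], [3]], CountingSink())` returns `3`. Because the base class supplies defaults, a user sink only has to implement the hooks it actually needs.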
## Summary There are currently three cases where we split projections: - when extracting actor pool projects - when extracting monotonically increasing ids - when extracting window functions In these...
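The three cases share one shape: expressions that must be evaluated by a special operator are lifted out of the projection into a lower stage and replaced by references to generated column names. Below is a toy sketch of that rewrite, assuming a tiny tuple-based expression tree; `split_projection`, the `"special"` node kind, and the `__split_N` column names are hypothetical and are not Daft's planner types.

```python
from itertools import count

# Toy expression nodes (hypothetical):
#   ("col", name) | ("lit", value) | ("call", fn, *args) | ("special", fn, *args)
# "special" marks expressions that need their own operator, e.g. a window
# function, a monotonically increasing id, or a UDF run on an actor pool.


def split_projection(projection):
    """Split a projection of (alias, expr) pairs into (extracted, rewritten).

    extracted: [(generated_name, special_expr)] for a lower operator that
               appends these columns to its input.
    rewritten: the original projection with each special expr replaced by a
               reference to its generated column.
    """
    extracted = []
    ids = count()

    def rewrite(expr):
        kind = expr[0]
        if kind == "special":
            name = f"__split_{next(ids)}"
            extracted.append((name, expr))
            return ("col", name)
        if kind == "call":
            # Recurse into arguments so nested special exprs are lifted too.
            return (kind, expr[1], *(rewrite(arg) for arg in expr[2:]))
        return expr  # columns and literals are left unchanged

    rewritten = [(alias, rewrite(e)) for alias, e in projection]
    return extracted, rewritten
```

In this sketch the lower operator evaluates `extracted` and a plain projection above it evaluates `rewritten`; the arguments of a special expression are kept as-is rather than split further.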
## Changes Made Was curious as to why our APIs weren't working too hot with UC and took a look. It seems that the `daft.unity_catalog.UnityCatalog` object we pass into `from_unity`...
### Describe the bug ```py import daft df = daft.from_pydict({ "person": [ {"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}, {"name": "Charlie", "age": 35} ] }) daft.sql("SELECT person.'*' FROM df").show()...
### Is your feature request related to a problem? NA ### Describe the solution you'd like The join reordering optimizer rule currently does a projection pushup and filter pushup. We...
## Changes Made Sometimes critical CI fixes are blocked by... CI. Let's force merge our way through.
## Changes Made Adds vLLM as a provider for text embedding. ``` import daft from daft.ai.provider import load_provider from daft.functions.ai import embed_text provider = load_provider("vllm") model = "Qwen/Qwen3-Embedding-0.6B" ( daft.read_huggingface("Open-Orca/OpenOrca")...