Colin Ho
`repartition(8)` means 8 partitions will run the URL download and UDF in parallel, which is likely more memory-intensive than running 4 partitions in parallel.
One way to find out is to run just `into_partitions` followed by `collect`, without any URL download or UDFs. If that OOMs, then it's a problem with using...
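To make that intuition concrete, here is a rough back-of-the-envelope sketch. It is not how Daft accounts for memory; the function name, the 4096 MiB dataset size, and the 512 MiB per-task overhead (think download buffers plus UDF state) are all made-up assumptions for illustration.

```python
def estimated_peak_mib(parallel_partitions: int,
                       total_data_mib: float,
                       per_task_overhead_mib: float) -> float:
    """Toy model: every partition runs at once, and each running task
    holds its share of the data plus a fixed working-set overhead."""
    data_per_partition = total_data_mib / parallel_partitions
    return parallel_partitions * (data_per_partition + per_task_overhead_mib)


# Hypothetical numbers, chosen only to show the trend.
peak_8 = estimated_peak_mib(8, total_data_mib=4096, per_task_overhead_mib=512)
peak_4 = estimated_peak_mib(4, total_data_mib=4096, per_task_overhead_mib=512)

print(f"8 partitions in parallel: ~{peak_8:.0f} MiB")
print(f"4 partitions in parallel: ~{peak_4:.0f} MiB")
```

Under this toy model the data in flight is the same either way, so the difference comes entirely from the fixed per-task overhead being paid 8 times instead of 4.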
Hey @jzz0930, unfortunately this issue is on our backlog, and I'm not able to give an estimate on when we'll be able to tackle it. However, if you are curious,...
Are you running on a single node or in a cluster? If single node, I'd suggest using the native runner first, i.e. `set_runner_native`. I'd also recommend specifying `with_concurrency` and `batch_size`...
Ah, that's a bug. It should be fixed in version 0.4.11. The issue was that extension types like embedding could not be pickled and sent between processes.
Closing as the extension types should work on the native runner now.
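For anyone curious what "not able to be pickled" looks like in practice, here is a minimal stdlib-only sketch. `EmbeddingLike` is a hypothetical stand-in, not Daft's actual extension type; it just holds something `pickle` refuses to serialize, which is the same class of failure you hit when a value has to be shipped to another process.

```python
import pickle
import threading


class EmbeddingLike:
    """Hypothetical stand-in for a value that cannot be pickled."""

    def __init__(self, values):
        self.values = values
        # Lock objects cannot be pickled, so neither can this instance.
        self._lock = threading.Lock()


def can_cross_process_boundary(obj) -> bool:
    """Return True if obj survives pickling, as multiprocessing requires."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


plain = [0.1, 0.2, 0.3]
wrapped = EmbeddingLike(plain)

print(can_cross_process_boundary(plain))
print(can_cross_process_boundary(wrapped))
```

A plain list of floats pickles fine, while the wrapped object does not, so any runner that moves data between worker processes via pickle would fail on it.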
Hi @djouallah, sorry for the delay. I'm currently finalizing the PR for this and will let you know once it is ready.
Hey @djouallah, this feature should be ready in the next release!
This feature is ready in v0.3.9, closing the issue.
See: https://github.com/Eventual-Inc/Daft/blob/02806c4c27153300f688426775672ca8e292cf72/daft/runners/ray_runner.py#L455. We currently set the max to 1.0 to prevent worker thrashing.