Jay Chia
Hey @apostolos-geyer any progress on this? Otherwise we might have to find someone on our end to fix it :)
Thanks @Xuanwo! I believe @plotor is putting together some initial benchmarks that we can work off of when prioritizing these features
From an offline conversation today:

1. Fixing #2493 should help a lot with Daft being overly aggressive with scheduling when streaming data out of multiple dataframes.
2. Adding a new multi-stream...
Also, no unit tests on this yet but would love pointers on how to best test this change. Should I add a new column in the files generated by `parquet_integration/write_parquet.py`?
Indeed. This is something @kevinzwang is working on stabilizing. You can actually try it out with the environment variable `DAFT_ENABLE_ACTOR_POOL_PROJECTIONS=1`. The fix is currently available for...
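For reference, a minimal sketch of how you might flip that flag from Python (assuming the variable is read from the environment when the query runs; exporting it in your shell before launching the script works just as well):

```python
# Sketch: enable the beta actor-pool projections via the environment variable.
# Assumption: the flag is picked up from the environment at execution time,
# so we set it before importing/using daft.
import os

os.environ["DAFT_ENABLE_ACTOR_POOL_PROJECTIONS"] = "1"

import daft

df = daft.from_pydict({"x": [1, 2, 3]})
```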
I see you're also using a GPU in your UDF. We will probably want to correctly assign `CUDA_VISIBLE_DEVICES` for each instance of your UDF, which isn't yet...
Made a PR for an initial attempt at doing `CUDA_VISIBLE_DEVICES`: https://github.com/Eventual-Inc/Daft/pull/2882 You'll likely need that if running multi-GPU on a single node + PyRunner!
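Until that PR lands, here's a rough workaround sketch (hypothetical, not Daft's built-in behavior): pin each UDF worker process to one GPU by setting `CUDA_VISIBLE_DEVICES` before any CUDA initialization, e.g. keyed off the process PID. `NUM_GPUS` and `pin_gpu_for_this_process` are names I'm making up here.

```python
# Hypothetical workaround: pin each UDF worker process to a single GPU.
# CUDA_VISIBLE_DEVICES must be set before torch/CUDA is initialized in the process.
import os

NUM_GPUS = 4  # assumption: 4 GPUs on the node


def pin_gpu_for_this_process() -> None:
    # Round-robin assignment keyed off the worker PID; good enough for a
    # single-node PyRunner setup where each UDF instance runs in its own process.
    gpu_id = os.getpid() % NUM_GPUS
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
```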
Oops yes, I think you've stumbled upon one of the TODOs in this beta feature. Let me work on something here
> Hi @jaychia,
>
> I am now able to run with `EmbeddingUDF.with_init_args(model_name).with_concurrency(1).override_options(num_gpus=1)` using the main branch.
>
> But now hit an error for `RuntimeError: No CUDA GPUs are...
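For context, a trimmed-down sketch of the kind of stateful UDF in question; `EmbeddingUDF`, `model_name`, and the chained options come from the report above, while the decorator usage and the body are illustrative, based on Daft's documented class-based UDF pattern:

```python
import daft
import torch


@daft.udf(return_dtype=daft.DataType.python())
class EmbeddingUDF:
    def __init__(self, model_name: str):
        # The reported "RuntimeError: No CUDA GPUs are available" comes from torch
        # when the actor process cannot see any GPU; this check makes that
        # failure mode explicit in the sketch.
        if not torch.cuda.is_available():
            raise RuntimeError("No CUDA GPUs are available")
        self.device = torch.device("cuda")
        self.model_name = model_name

    def __call__(self, texts: daft.Series):
        # Illustrative body only: pass the inputs through unchanged.
        return texts


# Usage pattern from the report: one concurrent instance, one GPU requested.
embed = (
    EmbeddingUDF.with_init_args("some-model-name")
    .with_concurrency(1)
    .override_options(num_gpus=1)
)
```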
Hello! This might be a pyhudi error -- cc @xushiyan from the Hudi team for any thoughts. We are currently awaiting the Hudi team's implementation of Hudi-rs, which would give...