jay
jay
@colin-ho This can be retried, but retrying will have an impact on performance. Because I'm writing a test case with ray data during the test, comparing similar scenarios with daft...
@jaychia @colin-ho Can a rebase button be added to the pipeline? I'm not sure if the failure of the use case here is due to my PR or the lack...
> You should be able to rebase locally and do a force push! > > Also thanks for the PR -- is Daft not already installed in editable mode for...
@desmondcheongzx Thank you very much for your reply. I have debugged the coredump file. It seems to be related to the fact that I am reading lance. I'm already trying...
reproduce script ``` tm0 = time.time() default_scan_options={ "with_row_id": True } dataset = daft.read_lance(url="s3://xxx/ds_4coewnmpfe59tgkklbol.lance/", io_config=io_config, default_scan_options=default_scan_options) df = daft.sql("select * from dataset limit 20") df.show(6) tm1 = time.time() print('cost:', tm1 -...
fix it : https://github.com/lancedb/lance/pull/4153
What is relatively magical here is that the utilization rate of the GPU is actually higher for daft. However, the end-to-end time of daft is even longer. I am speculating...
> Are you running on a single node or in a cluster? If single node, I'd suggest using the native runner first, i.e. `set_runner_native`. I'd also recommend specifying `with_concurrency` and...
> To try the native runner (do not initialize Ray for this): > > # This is the default behavior if Ray is not initialized > daft.set_runner_native() > > #...