Mads R. B. Kristensen

162 comments by Mads R. B. Kristensen

> Is it possible to preallocate a chunked dataset, then query hdf5 for the file name, file offset and extent corresponding to each chunk?

Yes, this is exactly what I...

This is definitely something KvikIO could support; however, it will not use GPUDirect Storage (GDS), since [CUDA's `cuFile`](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html) only supports files.

I have been planning to implement this for JIT unspilling for some time, but now that we are introducing [spilling in cuDF](https://github.com/rapidsai/cudf/pull/10746), it might be sufficient to include spill logging...

> Also most of the tests in https://github.com/rapidsai/dask-cuda/blob/branch-23.02/dask_cuda/tests/test_explicit_comms.py are executed in a child process but a quick glance makes me think that's not necessary. Could you remind me why that...

Could you try with some large buffers like:

```
KVIKIO_COMPAT_MODE=ON python python/benchmarks/single-node-io.py --nruns 5 --nbytes 100MB
KVIKIO_COMPAT_MODE=OFF python python/benchmarks/single-node-io.py --nruns 5 --nbytes 100MB
```

> @madsbk The benchmark/single-node-io.py file seems to output reasonably. But why doesn't it work in the example I showed above?

I guess it is because of the initialization overhead. Try...
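One way to check whether initialization overhead is skewing a measurement is to time several runs and report the first run separately from the rest. The sketch below is only an illustration of that idea: it uses plain Python file reads rather than KvikIO, and the helper names `time_reads` and `summarize` are made up for this example.

```python
import os
import tempfile
import time


def time_reads(path, nruns=5):
    """Read `path` `nruns` times; return (per-run durations in seconds, bytes read)."""
    durations = []
    nbytes = 0
    for _ in range(nruns):
        start = time.perf_counter()
        with open(path, "rb") as f:
            nbytes = len(f.read())
        durations.append(time.perf_counter() - start)
    return durations, nbytes


def summarize(durations):
    """Separate the first run (which pays any one-time setup cost) from later runs."""
    first, rest = durations[0], durations[1:]
    warm_mean = sum(rest) / len(rest) if rest else float("nan")
    return {"first_run_s": first, "warm_mean_s": warm_mean}


if __name__ == "__main__":
    # Write a 1 MiB scratch file so the example is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1 << 20))
        scratch = tmp.name
    try:
        runs, nbytes = time_reads(scratch, nruns=5)
        print(nbytes, summarize(runs))
    finally:
        os.remove(scratch)
```

If the first run is much slower than the mean of the later runs, a single-shot measurement is probably dominated by setup cost rather than by I/O throughput.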

> 1. If I want to use multi-gpu CUDA_VISIBLE_DEVICES='3,4', in this scenario can I control the flow of read operation to which device in the python script using this library?...
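As general CUDA background for this question: when `CUDA_VISIBLE_DEVICES` is set, the process sees only the listed devices, renumbered from 0 in the order given. With `CUDA_VISIBLE_DEVICES='3,4'`, device 0 inside the script is physical GPU 3 and device 1 is physical GPU 4. The sketch below only parses the environment variable to show that mapping; `visible_device_map` is a hypothetical helper, not part of any library, and it does not validate the entries the way the CUDA runtime does.

```python
import os


def visible_device_map(env=None):
    """Map in-process device indices to the IDs listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (all devices visible, no remapping).
    IDs are kept as strings because the variable may also hold GPU UUIDs.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    ids = [tok.strip() for tok in raw.split(",") if tok.strip()]
    return dict(enumerate(ids))


if __name__ == "__main__":
    print(visible_device_map({"CUDA_VISIBLE_DEVICES": "3,4"}))
```

Which of the visible devices a given read targets then depends on which device is current, or which device owns the destination buffer, when the read is issued.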

CuPy arrays are ignored by jit-unspill by default, see https://github.com/rapidsai/dask-cuda/pull/568#issuecomment-824730557:

> > Out of curiosity, why can't `cupy.ndarray` be proxified just like other objects? I must have missed the explanation...

> More specifically, when Dask stores data on a worker, the data could be packed and compressed with nvComp in the `zict` (`__setitem__`) and when Dask needs to use that...

> @madsbk do we have docs or maybe an example of JIT spilling? Maybe this would help @benjha in the near term :)

Yes, we have some info here: https://docs.rapids.ai/api/dask-cuda/nightly/spilling.html#jit-unspill