[QST] Running cudf terribly slow
What is your question? I have Python code that calculates lots of numbers for various custom dataclass objects. In the past I switched to polars in order to speed things up. Now I need to go even faster, so I am trying to implement a solution on a GPU. The code runs without any error in PyCharm, but when I try to run it in the terminal I get an error. Any help please?
python3 -m cudf.pandas main.py
Batch_id: 17
Process ForkProcess-1:
Traceback (most recent call last):
  File "/home/hakan/miniconda3/envs/bbx_gpu_env/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/hakan/miniconda3/envs/bbx_gpu_env/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/hakan/miniconda3/envs/bbx_gpu_env/lib/python3.10/concurrent/futures/process.py", line 240, in _process_worker
    call_item = call_queue.get(block=True)
  File "/home/hakan/miniconda3/envs/bbx_gpu_env/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/home/hakan/miniconda3/envs/bbx_gpu_env/lib/python3.10/site-packages/cudf/pandas/fast_slow_proxy.py", line 602, in __setstate__
    unpickled_wrapped_obj = pickle.loads(state)
  File "/home/hakan/miniconda3/envs/bbx_gpu_env/lib/python3.10/site-packages/cudf/core/abc.py", line 178, in host_deserialize
    frames = [
  File "/home/hakan/miniconda3/envs/bbx_gpu_env/lib/python3.10/site-packages/cudf/core/abc.py", line 179, in <listcomp>
    cudf.core.buffer.as_buffer(f) if c else f
  File "/home/hakan/miniconda3/envs/bbx_gpu_env/lib/python3.10/site-packages/cudf/core/buffer/utils.py", line 136, in as_buffer
    return buffer_class(owner=owner_class.from_host_memory(data))
  File "/home/hakan/miniconda3/envs/bbx_gpu_env/lib/python3.10/site-packages/cudf/core/buffer/buffer.py", line 216, in from_host_memory
    buf = rmm.DeviceBuffer(ptr=ptr, size=size)
  File "device_buffer.pyx", line 88, in rmm._lib.device_buffer.DeviceBuffer.__cinit__
  File "memory_resource.pyx", line 1087, in rmm._lib.memory_resource.get_current_device_resource
  File "/home/hakan/miniconda3/envs/bbx_gpu_env/lib/python3.10/site-packages/rmm/_cuda/gpu.py", line 58, in getDevice
    raise CUDARuntimeError(status)
rmm._cuda.gpu.CUDARuntimeError: cudaErrorInitializationError: initialization error
Hey @Hakan439, thanks for raising this issue! Could you tell me the output of running nvidia-smi in your terminal?
Hi,
here it is:
@Hakan439 Since you say the code runs in PyCharm, can you confirm whether the terminal where you are getting the error and PyCharm are using the same environment?
You could run which python and share the output from both the terminal and PyCharm.
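For example, running a small snippet like this in both the terminal and the PyCharm run configuration would show whether they point at the same interpreter and whether cudf is importable there (just a minimal sketch; the import check is optional):

# Run this in both the terminal and PyCharm to compare environments.
import sys
print("interpreter:", sys.executable)

# Optional: confirm cudf is importable from this environment.
try:
    import cudf
    print("cudf version:", cudf.__version__)
except ImportError as exc:
    print("cudf not importable:", exc)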
which python:
cuda, cupy, cudf:
GPU enabled:
Sample code:
Results:
Somehow the GPU now seems to be enabled, but it runs very slowly. In the sample code above, I reduced the number of rows in the dataframe to 1000 for the test. With the CPU dataframe it took 0.12 seconds, but on the GPU it took 8 seconds. Mine is running as an eGPU, by the way; I do not know whether that makes a difference or not. What am I missing?
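For reference, a minimal sketch of the kind of CPU-vs-GPU comparison described above; this is not the original sample code, and the column names and the groupby operation are made up for illustration:

import time

import numpy as np
import pandas as pd
import cudf

n = 1_000  # small frames like this rarely benefit from a GPU
data = {"key": np.random.randint(0, 10, n), "val": np.random.rand(n)}

# Time the same groupby-mean on the CPU (pandas) and on the GPU (cudf).
t0 = time.perf_counter()
pd.DataFrame(data).groupby("key")["val"].mean()
print("pandas:", time.perf_counter() - t0, "s")

t0 = time.perf_counter()
cudf.DataFrame(data).groupby("key")["val"].mean()
print("cudf:  ", time.perf_counter() - t0, "s")

Note that the first cudf operation in a process also pays one-time CUDA initialization costs (context creation, memory pool setup), so a single timing of a tiny frame can look far worse than steady-state GPU performance.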
In the pycharm console output I see a different environment from everything else, gputest vs bbx_gpu_env. Not sure if that is significant or intentional. Good to know that it is running now, though. What do you mean by "eGPU"? Regarding performance, what does your data look like? How many columns does it have? Do you observe similar issues if you have a single column?
I tried with several virtual environments; the initial env was gputest. In my single-column benchmark it also ran slowly. When I try to run my original Python code via cudf, it gives the error in the first message.
I have Python code that calculates lots of numbers for various custom dataclass objects
If cudf is working now but it is still slow, it is possible that your code is using custom dataclasses in a way that cudf simply doesn't support and so you end up falling back to running everything on the CPU. The relative slowdown you mentioned (0.12 vs 8 seconds) is pretty huge though. Have you tried running your code through the cudf.pandas profiler?
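For reference, recent cudf releases ship a profiler for cudf.pandas that reports which operations ran on the GPU and which fell back to CPU pandas. A typical invocation looks like the sketch below; the exact flag names may differ between versions:

python3 -m cudf.pandas --profile main.py

In Jupyter/IPython the same information is available via the %%cudf.pandas.profile cell magic (after %load_ext cudf.pandas). The per-operation report should make it clear whether the custom dataclass handling is forcing everything back onto the CPU.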