
[BUG] cudf.pandas dataframe.__repr__ slow in jupyterlab for large datasets

Open AjayThorve opened this issue 9 months ago • 1 comment

Describe the bug
Calling dataframe.__repr__ in a notebook cell (displaying the dataframe as the cell's last expression) either takes a very long time or causes a kernel failure for large datasets.

Steps/Code to reproduce bug
In a JupyterLab environment, run the following cells:


# [cell 1]
%load_ext cudf.pandas

# [cell 2]
import pandas as pd
import numpy as np

# Define the number of rows and columns
num_rows = 25_000_000
num_columns = 12

# Create a DataFrame with random data
df = pd.DataFrame(np.random.randint(0, 100, size=(num_rows, num_columns)),
                  columns=[f'Column_{i}' for i in range(1, num_columns + 1)])


# [cell 3]
df


Expected behavior
The dataframe should render quickly, as it does when working directly with cudf or pandas.

Note
This works as expected in a Python interactive shell, or when calling print(df) in a notebook.
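Since print(df) is fast but displaying df as a cell result is not, one way to narrow this down might be to time the display paths separately. When a notebook renders a cell result, it can request the HTML representation through _repr_html_ in addition to the plain-text repr, whereas print(df) only uses the text one. The following is a minimal sketch, not a confirmed diagnosis: time_call is a hypothetical helper, the row count is reduced so it runs quickly, and it uses plain pandas. To exercise the reported case, run %load_ext cudf.pandas first and raise num_rows.

```python
import time

import numpy as np
import pandas as pd


def time_call(label, fn):
    """Hypothetical helper: run fn once and return (label, elapsed seconds)."""
    start = time.perf_counter()
    fn()
    return label, time.perf_counter() - start


# Much smaller than the 25_000_000 rows in the report so the sketch runs
# quickly; raise num_rows to approach the reported scenario.
num_rows = 100_000
num_columns = 12

df = pd.DataFrame(
    np.random.randint(0, 100, size=(num_rows, num_columns)),
    columns=[f"Column_{i}" for i in range(1, num_columns + 1)],
)

timings = [
    # Plain-text path, also used by print(df)
    time_call("repr(df)", lambda: repr(df)),
    # HTML path, which a notebook front end can request for rich display
    time_call("df._repr_html_()", lambda: df._repr_html_()),
]

for label, seconds in timings:
    print(f"{label}: {seconds:.4f} s")
```

If one of the two calls is much slower than the other under cudf.pandas, that would suggest which representation is responsible for the slow rendering.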

AjayThorve, May 14 '24 19:05