cudf
[BUG] cudf.pandas dataframe.__repr__ slow in jupyterlab for large datasets
Describe the bug
Evaluating a dataframe's __repr__ in a notebook cell either takes a very long time or ends in a kernel failure for large datasets.
Steps/Code to reproduce bug
In a JupyterLab environment, run the following cells:
# [cell 1]
%load_ext cudf.pandas
# [cell 2]
import pandas as pd
import numpy as np
# Define the number of rows and columns
num_rows = 25_000_000
num_columns = 12
# Create a DataFrame with random data
df = pd.DataFrame(
    np.random.randint(0, 100, size=(num_rows, num_columns)),
    columns=[f'Column_{i}' for i in range(1, num_columns + 1)],
)
# [cell 3]
df
Expected behavior
The dataframe should render quickly, as it does when working directly with cudf or pandas.
Note
This works as expected in a Python interactive shell, or when calling print(df) in a notebook.
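Since print(df) is fast but the notebook display is not, one way to narrow this down might be to time the two code paths separately. The sketch below is only a diagnostic idea, not a confirmed root cause: it assumes the slowdown sits in the rich HTML path (Jupyter asks a dataframe for `_repr_html_()` when rendering a cell result, while print uses the plain-text repr). The helper names `time_call` and `time_display_paths` are made up for illustration, and the frame is kept small so the snippet runs anywhere. To reproduce the report, load cudf.pandas first and use the 25_000_000-row dataframe from the cells above.

```python
import time

import numpy as np
import pandas as pd


def time_call(label, fn):
    """Run fn once and return (label, elapsed seconds)."""
    start = time.perf_counter()
    fn()
    return label, time.perf_counter() - start


def time_display_paths(df):
    """Time the plain-text repr and the HTML repr of df separately."""
    return [
        # Path used by print(df) and the plain Python shell
        time_call("repr(df)", lambda: repr(df)),
        # Path Jupyter uses for rich display of a cell result
        time_call("df._repr_html_()", lambda: df._repr_html_()),
    ]


# Small stand-in frame (assumption: same shape of data as the report, far fewer rows)
small_df = pd.DataFrame(
    np.random.randint(0, 100, size=(1_000, 12)),
    columns=[f"Column_{i}" for i in range(1, 13)],
)

timings = time_display_paths(small_df)
for label, seconds in timings:
    print(f"{label}: {seconds:.4f} s")
```

If `df._repr_html_()` turns out to be much slower than `repr(df)` on the large dataframe, that would point at the HTML rendering path rather than at the text repr.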