ibis bug - Regression in `.sample()`

Open rapatel0 opened this issue 1 year ago • 2 comments

What happened?

It appears that `.sample()` returns the whole table and ignores the fraction parameter.

I've also tested with a `to_parquet('test_data.parquet')` call on the sampled table, and that saves a full, unsampled copy of the dataset.

ipython test:

In [4]: ibis.__version__
Out[4]: '9.5.0'

In [5]: df = con.read_parquet("s3://sdm-threat-mlflow/data_more.parquet")

In [6]: df.count().execute()
Out[6]: 53214

In [7]: df.sample(0.1).count().execute()
Out[7]: 53214
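
To make "ignoring the fraction parameter" concrete, here is a minimal sketch of a sanity check on the two counts above. It assumes row-level sampling where each row is kept independently with probability `fraction`, so the sampled count should be roughly binomial. The helper name `sample_count_is_plausible` and the `n_sigma` tolerance are hypothetical and not part of ibis.

```python
import math


def sample_count_is_plausible(
    total: int, sampled: int, fraction: float, n_sigma: float = 6.0
) -> bool:
    """Check whether `sampled` is consistent with keeping each of `total`
    rows independently with probability `fraction` (an assumption about
    how row-level sampling behaves, not a statement about ibis internals).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Mean and standard deviation of a Binomial(total, fraction) count.
    expected = total * fraction
    sigma = math.sqrt(total * fraction * (1.0 - fraction))
    return abs(sampled - expected) <= n_sigma * sigma


# Counts reported in this issue: 53214 rows before and after sample(0.1).
print(sample_count_is_plausible(53214, 53214, 0.1))
```

With the counts from the session above, the check fails by a wide margin: about 5321 rows would be expected, and 53214 came back. In a live session the two integers would come from `df.count().execute()` and `df.sample(0.1).count().execute()`.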

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

DuckDB with an s3fs filesystem.

The environment is set up with this pre-script (S3 credentials are stored in environment variables):

import s3fs
import ibis
import numpy as np
import pandas as pd

fs = s3fs.S3FileSystem(anon=False)

con = ibis.duckdb.connect(":memory:")
con.register_filesystem(fs)

print("available variables: ")
print("`fs` - S3FS Object initialized from environmental variables")
print("`con` - Ibis duckdb connection with s3fs initialized")

Relevant log output

The command returns with no errors or log output.

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Oct 09 '24 20:10 rapatel0
