bug - Regression in `.sample()`
What happened?
It appears that `.sample()` returns the whole table and ignores the `fraction` parameter.
I also tested with a `to_parquet('test_data.parquet')` call, which saves a full, unsampled copy of the dataset.
ipython test:
In [4]: ibis.__version__
Out[4]: '9.5.0'
In [5]: df = con.read_parquet("s3://sdm-threat-mlflow/data_more.parquet")
In [6]: df.count().execute()
Out[6]: 53214
In [7]: df.sample(0.1).count().execute()
Out[7]: 53214
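For comparison, here is roughly what the count should look like if `sample(0.1)` keeps each row independently with probability 0.1. This is a plain-Python sketch of row-level (Bernoulli) sampling, not ibis's or DuckDB's implementation; the helper name `bernoulli_sample_count` and the seed are made up for illustration. The result should land somewhere near 5,321 rather than 53,214.

```python
import random


def bernoulli_sample_count(n_rows: int, fraction: float, seed: int = 0) -> int:
    """Count rows kept when each row is kept independently with probability `fraction`.

    Hypothetical helper for illustration only; it does not call ibis.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_rows) if rng.random() < fraction)


total = 53214  # row count reported by df.count().execute() above
sampled = bernoulli_sample_count(total, 0.1)

# A working 10% sample should be far smaller than the full table.
print(f"total={total} sampled={sampled} ratio={sampled / total:.3f}")
```

The exact count varies with the seed, but a result equal to the full row count is not a plausible outcome of a 10% sample.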
What version of ibis are you using?
9.5.0
What backend(s) are you using, if any?
DuckDB with an s3fs filesystem.
The environment is set up with this pre-script (environment variables store the S3 settings):
import s3fs
import ibis
import numpy as np
import pandas as pd
fs = s3fs.S3FileSystem(anon=False)
con = ibis.duckdb.connect(":memory:")
con.register_filesystem(fs)
print("available variables: ")
print("`fs` - S3FS Object initialized from environmental variables")
print("`con` - Ibis duckdb connection with s3fs initialized")
Relevant log output
The command returns with no errors or log output.
Code of Conduct
- [X] I agree to follow this project's Code of Conduct