map() function removes columns when input_columns is not None
Describe the bug
The map function removes every column that is not listed in input_columns, even though those columns are not named in the remove_columns argument.
Steps to reproduce the bug
from datasets import Dataset

ds = Dataset.from_dict({"a": [1, 2, 3], "b": [0, 1, 0], "c": [2, 4, 5]})

def double(x, y):
    x = x * 2
    y = y * 2
    return {"d": x, "e": y}

ds.map(double, input_columns=["a", "c"])
Expected results
Dataset({
    features: ['a', 'b', 'c', 'd', 'e'],
    num_rows: 3
})
Actual results
Dataset({
    features: ['a', 'c', 'd', 'e'],
    num_rows: 3
})
In this specific example, column b should not be removed: it is not in input_columns, but it is not in remove_columns either.
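To make the expected behavior concrete, here is a minimal pure-Python sketch of the semantics I would expect. It is only a toy model and not how datasets implements map: the helper name map_rows and the dict-of-lists representation are assumptions made for illustration. The idea is that input_columns only selects what gets passed to the function, while remove_columns alone decides what gets dropped.

```python
def map_rows(columns, fn, input_columns=None, remove_columns=()):
    """Toy model of the expected behaviour; NOT the datasets implementation."""
    num_rows = len(next(iter(columns.values()))) if columns else 0
    # Start from every original column except those explicitly removed.
    out = {name: list(values) for name, values in columns.items()
           if name not in remove_columns}
    for i in range(num_rows):
        if input_columns is None:
            # Without input_columns, fn receives the whole row as a dict.
            result = fn({name: values[i] for name, values in columns.items()})
        else:
            # With input_columns, fn receives only those values, positionally.
            result = fn(*(columns[name][i] for name in input_columns))
        # New keys become new columns; existing keys are overwritten.
        for key, value in result.items():
            out.setdefault(key, [None] * num_rows)[i] = value
    return out


def double(x, y):
    return {"d": x * 2, "e": y * 2}


data = {"a": [1, 2, 3], "b": [0, 1, 0], "c": [2, 4, 5]}
result = map_rows(data, double, input_columns=["a", "c"])
print(list(result))  # ['a', 'b', 'c', 'd', 'e']
```

In this model, column b survives because it is never listed in remove_columns, which matches the expected results above.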
Environment info
- datasets version: 2.4.0
- Platform: linux (colab)
- Python version: 3.7.13
- PyArrow version: 6.0.1
Hi! Thanks for reporting! This looks like a bug. I've just opened a PR with the fix.
Awesome! Thank you. I'll close the issue once the PR gets merged. :-)
I guess we should reopen after the revert by:
- #5006