pyradigm

add_samplet: feature_names allows dimension mismatch, order isn't paired -- will overwrite

Open WillForan opened this issue 4 years ago • 1 comments

I had a few bugs (using wrong variable name), and realized I never got yelled at for providing bad feature names.

A few observations:

  1. feature name length doesn't have to match features.

there can be too many (x, y, z and an additional "DNE" name)

from pyradigm import RegressionDataset as RegrDataset

ds = RegrDataset()
ds.description="extra of feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[4,5,6], feature_names=['x','y','z','DNE'])
(x, _, _) = ds.data_and_targets()
print(ds.feature_names)
print(x)

['x' 'y' 'z' 'DNE']
[[1. 2. 3.]
 [4. 5. 6.]]

or too few (only 'x' is named, but there are three features: x, y, and z)

ds = RegrDataset()
ds.description="extra of feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x'])
ds.add_samplet('id2', target=200, features=[6,5,4], feature_names=['x'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['x']
[[1. 2. 3.]
 [6. 5. 4.]]

  2. specifying feature names for one samplet changes names everywhere?

ds = RegrDataset()
ds.description="extra of feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[4,5,6], feature_names=['y','y','z'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['y' 'y' 'z']
[[1. 2. 3.]
 [4. 5. 6.]]

This is potentially surprising when features are given to add_samplet in a different order -- even if features and feature_names are paired correctly, the names are overwritten but the values are not reordered (@raamana -- a thing you warned me to check. good eye!)

ds = RegrDataset()
ds.description="extra of feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[6,5,4], feature_names=['z','y','x'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['z' 'y' 'x']
[[1. 2. 3.]
 [6. 5. 4.]]
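
A possible stopgap on the caller side, until add_samplet checks names itself: a minimal sketch of validating and reordering each samplet before it is added. The helper name `align_features` and its `reference_names` argument are made up for illustration and are not part of pyradigm; the sketch assumes the first samplet's name order is the one to keep.

```python
from typing import List, Optional, Sequence


def align_features(features: Sequence[float],
                   feature_names: Sequence[str],
                   reference_names: Optional[Sequence[str]] = None) -> List[float]:
    """Hypothetical helper (not pyradigm API): check one samplet's names
    and return its values in the order given by reference_names."""
    # observation 1: too many or too few names
    if len(features) != len(feature_names):
        raise ValueError(
            f"{len(features)} features but {len(feature_names)} feature names")
    # duplicated names (e.g. ['y', 'y', 'z']) cannot be paired unambiguously
    if len(set(feature_names)) != len(feature_names):
        raise ValueError(f"duplicate feature names: {list(feature_names)}")
    if reference_names is None:
        # first samplet: nothing to align against yet
        return list(features)
    if set(feature_names) != set(reference_names):
        raise ValueError(
            f"names {list(feature_names)} do not match {list(reference_names)}")
    # observation 2: same names, different order -> reorder the values
    by_name = dict(zip(feature_names, features))
    return [by_name[name] for name in reference_names]


reference = ['x', 'y', 'z']
print(align_features([1, 2, 3], ['x', 'y', 'z'], reference))  # [1, 2, 3]
print(align_features([6, 5, 4], ['z', 'y', 'x'], reference))  # [4, 5, 6]
```

The aligned list and `reference` could then be passed on as `ds.add_samplet('id2', target=200, features=aligned, feature_names=reference)`, so every row ends up in the same column order no matter how the caller listed it.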

WillForan, Dec 10 '20 21:12

Thanks a lot, Will, for putting pyradigm to the test and reporting these bugs!

Let me look into them and see why they happened. But these bugs hopefully haven't prevented you from running comparisons? I am on Zoom and we can discuss this more if you want -- and to prepare for the "progress report", so to say.

raamana, Dec 10 '20 21:12