
ENH: .isin() method should use __contains__ rather than __iter__ for user-defined classes to determine presence.

Open f3ss1 opened this issue 8 months ago • 4 comments

Feature Type

  • [X] Adding new functionality to pandas

  • [X] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

Currently, if you define a user class like this:

class MyClass:
    def __init__(self):
        self.collection = [1, 2, 3]
        self.another_collection = [4, 5, 6]
    
    def __contains__(self, item):
        return item in self.collection
    
    def __iter__(self):
        yield from self.another_collection

and then create a pandas DataFrame like this:

import pandas as pd

example_dataframe = pd.DataFrame(
    {
        'column_name': [3, 1, 4, 6, 13],
        'another_column_name': ['tolly', 'trolly', 'telly', 'belly', 'nelly']
    }
)

and then call the .isin() method like this:

class_instance = MyClass()
example_dataframe['column_name'].isin(class_instance)

you get this output:

False
False
True
True
False

That is, the column values are checked against self.another_collection, which __iter__ yields, rather than against self.collection, which __contains__ consults. I realize this may stem from compatibility with other libraries, but it seems counter-intuitive.
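
A minimal sketch of the difference, reusing the class from above. It shows what iterating the instance yields next to an element-wise `in` check done with `Series.map`, which can serve as a workaround today. The variable names `column`, `iterated_values`, `isin_mask` and `contains_mask` are only illustrative.

```python
import pandas as pd


class MyClass:
    def __init__(self):
        self.collection = [1, 2, 3]
        self.another_collection = [4, 5, 6]

    def __contains__(self, item):
        return item in self.collection

    def __iter__(self):
        yield from self.another_collection


class_instance = MyClass()
column = pd.Series([3, 1, 4, 6, 13], name="column_name")

# Iterating the instance goes through __iter__, so it yields 4, 5, 6.
iterated_values = list(class_instance)

# .isin() matches against the iterated values (the output reported above).
isin_mask = column.isin(class_instance)

# Workaround: apply the `in` operator per element, which calls __contains__.
contains_mask = column.map(lambda value: value in class_instance)

print(iterated_values)
print(isin_mask.tolist())
print(contains_mask.tolist())
```

The `Series.map` version calls `__contains__` once per element in Python, so it will likely be slower than `.isin()` on large columns.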

Feature Description

I suggest either changing the behavior (which might break some people's existing code) or adding a flag (which would add some complexity).
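
To make the flag option concrete, here is a rough sketch written as a standalone helper. Both the function `isin_with_contains` and its `use_contains` parameter are hypothetical names invented for illustration; nothing like them exists in pandas, and a real implementation inside `.isin()` might look quite different.

```python
import pandas as pd


def isin_with_contains(series, values, use_contains=False):
    """Hypothetical helper (not a pandas API) sketching the proposed flag."""
    if use_contains:
        # Opt-in path: ask the container itself via the `in` operator.
        mask = [item in values for item in series]
        return pd.Series(mask, index=series.index, name=series.name, dtype=bool)
    # Default path: keep today's behavior, so existing code is unaffected.
    return series.isin(values)


class MyClass:
    def __init__(self):
        self.collection = [1, 2, 3]
        self.another_collection = [4, 5, 6]

    def __contains__(self, item):
        return item in self.collection

    def __iter__(self):
        yield from self.another_collection


column = pd.Series([3, 1, 4, 6, 13], name="column_name")
default_mask = isin_with_contains(column, MyClass())
flagged_mask = isin_with_contains(column, MyClass(), use_contains=True)
```

Keeping the flag off by default is what would avoid breaking existing code, at the cost of one more parameter to document.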

Alternative Solutions

See above.

Additional Context

No response

f3ss1, Jun 18 '24 05:06