InvertedIndices.jl

Add some sort of `Strict` option

Open: pdeffebach opened this issue 5 years ago • 1 comment

This issue is motivated by a recent PR to DataFrames here. We would like to add the following functionality:

julia> df = DataFrame(a = rand(2), b = rand(2));

julia> select(df, Not(:c))
2×2 DataFrame
│ Row │ a        │ b         │
│     │ Float64  │ Float64   │
├─────┼──────────┼───────────┤
│ 1   │ 0.916099 │ 0.0552436 │
│ 2   │ 0.998861 │ 0.310562  │

This currently errors. It would be nice if it didn't, since you often want to drop columns automatically just to "clean things up" without worrying about whether the column really exists.

Obviously, this would make the behavior inconsistent with other uses of InvertedIndices: indexing the columns of a DataFrame would behave differently from indexing its rows.

One solution is to have an option in InvertedIndices that lets the user specify whether they care about selecting things that don't exist in the DataFrame. Perhaps a constructor:

Not(:c, strict = false)

The option would then be stored in a field somehow, so that we can specialize behavior based on it.
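To make the idea concrete, here is a minimal sketch of what storing such a flag could look like. It uses only Base Julia, and everything in it is an assumption for illustration: `LenientNot` and `kept_names` are hypothetical names, not part of InvertedIndices.jl or DataFrames.jl, and the real `Not` type is left untouched.

```julia
# Hypothetical type for illustration only; InvertedIndices.jl does not define it.
struct LenientNot
    skip::Vector{Symbol}  # names to exclude
    strict::Bool          # if true, excluding a name that does not exist is an error
end

# Keyword constructor mirroring the proposed `Not(:c, strict = false)` call.
LenientNot(name::Symbol; strict::Bool = true) = LenientNot([name], strict)

# Hypothetical helper: resolve the index against the available names.
function kept_names(all_names::Vector{Symbol}, idx::LenientNot)
    absent = setdiff(idx.skip, all_names)
    if idx.strict && !isempty(absent)
        throw(ArgumentError("column(s) not found: $(join(absent, ", "))"))
    end
    return setdiff(all_names, idx.skip)
end

cols = [:a, :b]
# Non-strict: :c is absent, so nothing is dropped and no error is raised.
lenient = kept_names(cols, LenientNot(:c, strict = false))
```

With the default `strict = true`, the same call would throw an `ArgumentError` instead, which matches the current erroring behavior described above.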

Let me know what you think. It's certainly not the only path to the behavior we want, but it might be fruitful.

pdeffebach, Jun 08 '20 14:06

I think this is more a property of the downstream use of the inverted index than of the index itself. For example, how would you perform a non-strict select using a regular (i.e. non-`Not`) index, like `select(df, :c)` but allowing `:c` to be ignored if it doesn't exist in `df`? Clearly that functionality can't be baked into the object used to index, since you can't make a non-strict `Symbol`, `String`, `Int`, etc. Baking it into `Not` but not having it similarly available to other index types feels a bit odd.

I agree with the desire for the functionality you describe, as I've run into that myself. However, given the above, I think the solution should be implemented in DataFrames as part of its API rather than in InvertedIndices as part of Not.
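As a rough illustration of handling this on the DataFrames side, the sketch below filters the requested names against `names(df)` before calling `select`. The helpers `drop_existing` and `keep_existing` are hypothetical names made up for this example; they are not part of the DataFrames.jl API, and this is only one possible shape for such a feature.

```julia
using DataFrames

# Hypothetical helpers, not part of the DataFrames.jl API.

# Drop the listed columns, silently ignoring any that are not in `df`.
drop_existing(df::AbstractDataFrame, cols::AbstractVector) =
    select(df, setdiff(names(df), string.(cols)))

# Keep the listed columns, silently ignoring any that are not in `df`.
keep_existing(df::AbstractDataFrame, cols::AbstractVector) =
    select(df, intersect(names(df), string.(cols)))

df = DataFrame(a = rand(2), b = rand(2))

dropped = drop_existing(df, [:c])      # :c is absent, so both columns remain
kept = keep_existing(df, [:a, :c])     # only column a is kept
```

Because the filtering happens before indexing, the same approach covers both the inverted and the regular case without changing any index type.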

ararslan, Dec 03 '24 02:12