InvertedIndices.jl
Add some sort of `Strict` option
This issue is motivated by a recent PR to DataFrames here. We would like to add the following functionality:
```julia
julia> using DataFrames

julia> df = DataFrame(a = rand(2), b = rand(2));

julia> select(df, Not(:c))  # desired result; column :c does not exist
2×2 DataFrame
│ Row │ a        │ b         │
│     │ Float64  │ Float64   │
├─────┼──────────┼───────────┤
│ 1   │ 0.916099 │ 0.0552436 │
│ 2   │ 0.998861 │ 0.310562  │
```
This currently errors. It would be nice if it didn't, since you often want to drop columns just to "clean things up" without worrying about whether each column actually exists.
Obviously, this would be inconsistent with other uses of InvertedIndices: indexing the columns of a DataFrame would behave differently from indexing its rows.
One solution is an option in InvertedIndices that lets the user specify whether excluding names that don't exist in the DataFrame should be an error. Perhaps a constructor like
```julia
Not(:c, strict = false)
```
The option would then be stored in a field of the `Not` object so that downstream code can specialize its behavior on it.
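A minimal sketch of what storing the flag in the index object could look like, assuming a hypothetical `StrictNot` type and a hypothetical consumer `remaining_names`; neither is part of InvertedIndices.jl or DataFrames.jl, and the real `Not` has no `strict` option. It only illustrates the proposal: the consumer reads `idx.strict` to decide whether a missing column is an error.

```julia
# Hypothetical sketch: `StrictNot` is NOT part of InvertedIndices.jl; it only
# illustrates storing a `strict` flag inside the inverted-index object.
struct StrictNot{T}
    skip::T        # the column name(s) to exclude
    strict::Bool   # whether excluding a nonexistent name is an error
end

# Keyword constructor mirroring the proposed `Not(:c, strict = false)`.
StrictNot(skip; strict::Bool = true) = StrictNot(skip, strict)

# Hypothetical downstream consumer: compute which column names remain,
# specializing its behavior on the flag stored in the index object.
function remaining_names(colnames::Vector{Symbol}, idx::StrictNot)
    excluded = idx.skip isa Symbol ? [idx.skip] : collect(Symbol, idx.skip)
    absent = setdiff(excluded, colnames)
    if idx.strict && !isempty(absent)
        throw(ArgumentError("columns not found: $(absent)"))
    end
    return setdiff(colnames, excluded)  # preserves the order of `colnames`
end

cols = [:a, :b]
remaining_names(cols, StrictNot(:c; strict = false))  # [:a, :b]
```

With the default `strict = true`, `remaining_names(cols, StrictNot(:c))` throws an `ArgumentError`, matching the current erroring behavior.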
Let me know what you think. It's certainly not the only path to the behavior we want, but it might be worth exploring.
I think this is more a property of how the inverted index is used downstream than of the index itself. For example, how would you perform a non-strict select using a regular (i.e., non-`Not`) index, like `select(df, :c)` but with `:c` ignored if it doesn't exist in `df`? Clearly that functionality can't be baked into the object used to index, since you can't make a non-strict `Symbol`, `String`, `Int`, etc. Baking it into `Not` without making it similarly available to other index types feels a bit odd.
I agree with the desire for the functionality you describe, as I've run into that myself. However, given the above, I think the solution should be implemented in DataFrames as part of its API rather than in InvertedIndices as part of Not.
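To illustrate the alternative, here is a minimal sketch in which strictness is a keyword on the consuming function rather than a field of the index, so it applies equally to plain `Symbol`s and to exclusions. The functions `keep_columns`, `drop_columns`, and `check_exist` are hypothetical stand-ins for a DataFrames-level API; they are not existing DataFrames.jl functions, and they operate on a plain vector of column names to stay self-contained.

```julia
# Hypothetical sketch: strictness lives on the consumer as a keyword argument.
# None of these functions are DataFrames.jl API; they are illustrative only.

# Shared check: error on absent names only when `strict` is true.
function check_exist(colnames::Vector{Symbol}, requested::Vector{Symbol}, strict::Bool)
    absent = setdiff(requested, colnames)
    if strict && !isempty(absent)
        throw(ArgumentError("columns not found: $(absent)"))
    end
    return nothing
end

# Keep the requested columns (analogous to `select(df, :c)`), in `colnames` order.
function keep_columns(colnames::Vector{Symbol}, keep::Vector{Symbol}; strict::Bool = true)
    check_exist(colnames, keep, strict)
    return intersect(colnames, keep)
end

# Drop the given columns (analogous to `select(df, Not(:c))`).
function drop_columns(colnames::Vector{Symbol}, drop::Vector{Symbol}; strict::Bool = true)
    check_exist(colnames, drop, strict)
    return setdiff(colnames, drop)
end

cols = [:a, :b]
drop_columns(cols, [:c]; strict = false)      # [:a, :b]
keep_columns(cols, [:a, :c]; strict = false)  # [:a]
```

Because the flag belongs to the operation, the same `strict` keyword covers both the positive and the inverted selection, which is the consistency argued for above.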