Conjunction operators validation

Open dreadatour opened this issue 7 months ago • 3 comments

In the datachain filter method, we can use conjunction operators as follows:

  • AND operation:

    .filter(C("foo") == "bar", C("baz") == "qux")
    
    .filter((C("foo") == "bar") & (C("baz") == "qux"))
    
    .filter(datachain.func.and_(C("foo") == "bar", C("baz") == "qux"))
    
  • OR operation:

    .filter((C("foo") == "bar") | (C("baz") == "qux"))
    
    .filter(datachain.func.or_(C("foo") == "bar", C("baz") == "qux"))
    
  • NOT operation:

    .filter(~(C("foo") == "bar"))
    

    NOTE: datachain.func.not_ is not implemented yet; we need to implement it.
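
For context, here is a minimal sketch of how &, | and ~ can be made to build a combined condition instead of evaluating it right away. The Expr class and its text field are hypothetical stand-ins for illustration only, not datachain's actual implementation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a column condition (not datachain's class)."""

    text: str

    def __and__(self, other: "Expr") -> "Expr":
        # Called for `a & b`; returns a new combined condition.
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other: "Expr") -> "Expr":
        # Called for `a | b`.
        return Expr(f"({self.text} OR {other.text})")

    def __invert__(self) -> "Expr":
        # Called for `~a`.
        return Expr(f"(NOT {self.text})")


foo = Expr("foo = 'bar'")
baz = Expr("baz = 'qux'")

print((foo & baz).text)
print((foo | baz).text)
print((~foo).text)
```

Because each operator returns a new Expr, both conditions survive into the final query. The parentheses around each comparison in the examples above matter for the same reason: & and | bind more tightly than == in Python.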

Since we are using Python, it sometimes feels natural to use Python's logical operators:

.filter(C("foo") == "bar" and C("baz") == "qux")
.filter(C("foo") == "bar" or C("baz") == "qux")
.filter(not C("foo"))

but it isn’t realistically possible to support this syntax, because Python does not let classes overload the and, or, and not keywords.

At the same time, we allow this syntax, and in the end the user's query does not work as expected. Sometimes (for complex chains) it is hard to notice this and to figure out why the results are wrong.
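
To illustrate the kind of silent failure described above, here is a small pure-Python sketch. The Expr class is hypothetical and relies on Python's default truthiness. What datachain's real column objects do depends on how they define __bool__, so treat this only as a demonstration of the language behavior:

```python
class Expr:
    """Hypothetical condition object; it defines no __bool__, so it is truthy."""

    def __init__(self, text: str) -> None:
        self.text = text


foo = Expr("foo = 'bar'")
baz = Expr("baz = 'qux'")

# `and` / `or` evaluate the truthiness of the left operand and then
# return one of the two operands unchanged; nothing is combined.
and_result = foo and baz  # foo is truthy, so the result is baz
or_result = foo or baz    # foo is truthy, so the result is foo

# `not` always produces a plain bool.
not_result = not foo

print(and_result.text)
print(or_result.text)
print(not_result)
```

In this sketch, one of the two conditions is dropped without any error, which matches the "works not as expected" symptom: the call still receives a valid-looking argument.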

Suggestion

Should we maybe check the input parameter types in the filter chain method (we would also need to check all other methods, like mutate), and fail or emit a warning if an argument is of type bool?
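
A rough sketch of what such a check could look like. The helper name check_filter_args and its strict flag are made up for illustration and are not part of datachain:

```python
import warnings


def check_filter_args(*args: object, strict: bool = True) -> None:
    """Hypothetical helper: reject plain bool arguments passed to filter()."""
    for position, arg in enumerate(args):
        if isinstance(arg, bool):
            message = (
                f"filter() argument {position} is a plain bool ({arg!r}); "
                "use &, | and ~ instead of and, or and not"
            )
            if strict:
                # Fail fast so the mistake cannot reach the query.
                raise TypeError(message)
            # Softer mode: report the problem but keep going.
            warnings.warn(message, stacklevel=2)


# `not <object>` always yields a bool, so this call is rejected.
try:
    check_filter_args(not object())
except TypeError as exc:
    print(exc)
```

One limitation to keep in mind: not x always yields a bool, but a and b evaluates to one of its operands, so a type check like this would miss and / or unless the operands themselves are already bools.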

Are there other better options?

Jun 10 '25 02:06 dreadatour

As a first step let's put this summary into the filter docs. Thanks @dreadatour for creating the ticket.

Jun 10 '25 17:06 shcheklein

> As a first step let's put this summary into the filter docs.

I'll do this later 👍

Jun 11 '25 02:06 dreadatour

> As a first step let's put this summary into the filter docs.

https://github.com/iterative/datachain/pull/1151

Jun 12 '25 10:06 dreadatour