datachain logical operators validation

In the datachain filter method, conditions can be combined with logical operators as follows:

AND operation:

.filter(C("foo") == "bar", C("baz") == "qux")

.filter((C("foo") == "bar") & (C("baz") == "qux"))

.filter(datachain.func.and_(C("foo") == "bar", C("baz") == "qux"))

OR operation:

.filter((C("foo") == "bar") | (C("baz") == "qux"))

.filter(datachain.func.or_(C("foo") == "bar", C("baz") == "qux"))

NOT operation:
```
.filter(~(C("foo") == "bar"))
```
NOTE: datachain.func.not_ is not implemented yet and needs to be added.

Since this is Python, it can feel natural to use Python's own logical operators instead:

.filter(C("foo") == "bar" and C("baz") == "qux")

.filter(C("foo") == "bar" or C("baz") == "qux")

.filter(not C("foo"))

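but it isn't realistically possible to support this syntax: Python does not let objects overload and, or, and not, which operate on the truth value of their operands instead of calling a method we could hook into.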

At the same time, we allow this syntax, and the user's query ends up not working as expected. For complex chains it can be hard to notice this and to figure out why the results are wrong.
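To illustrate the failure mode without depending on datachain itself, here is a minimal sketch using a toy `Expr` class. It is a hypothetical stand-in for a column expression, not datachain's `C`, and it keeps Python's default truthiness (every instance is truthy); a real expression class may define `__bool__` differently, so which operand survives can vary.

```python
class Expr:
    """Toy stand-in for a column expression (hypothetical, not datachain's C)."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Build a new expression instead of returning a bool.
        return Expr(f"{self.text} = {other!r}")

    # Defining __eq__ disables the default hash, so restore identity hashing.
    __hash__ = object.__hash__

    def __and__(self, other):
        return Expr(f"({self.text}) AND ({other.text})")

    def __or__(self, other):
        return Expr(f"({self.text}) OR ({other.text})")

    def __invert__(self):
        return Expr(f"NOT ({self.text})")


foo = Expr("foo") == "bar"
baz = Expr("baz") == "qux"

combined = foo & baz   # calls __and__, so both conditions are kept
print(combined.text)   # (foo = 'bar') AND (baz = 'qux')

dropped = foo and baz  # no hook: evaluates to one operand based on bool(foo)
print(dropped.text)    # baz = 'qux'

negated = not foo      # no hook: evaluates to a plain bool
print(negated)         # False
```

With `and`, one of the two conditions silently disappears before the filter method ever sees it, which matches the hard-to-notice wrong results described above.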

Suggestion

Should we check the type of the input parameters to the filter chain method (and to other methods such as mutate), and fail or emit a warning if a parameter is a bool?

Are there other better options?
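A rough sketch of the type check suggested above, for discussion only. The function name `check_filter_args` and the `strict` flag are hypothetical and not part of datachain; the idea is that filter (and mutate, etc.) would call something like this on its positional arguments.

```python
import warnings


def check_filter_args(*args, strict=True):
    """Hypothetical guard: reject plain bool values passed as filter conditions."""
    for position, arg in enumerate(args):
        if isinstance(arg, bool):
            message = (
                f"filter argument {position} is a bool ({arg!r}); "
                "use &, | and ~ instead of Python's and, or and not"
            )
            if strict:
                raise TypeError(message)
            # Non-strict mode: warn and let the caller continue.
            warnings.warn(message, stacklevel=2)
    return args


# A plain bool, such as the result of `not <expression>`, is rejected.
try:
    check_filter_args(not object())
except TypeError as exc:
    print(exc)
```

One caveat: a type check like this only catches arguments that are actual bool values. `x and y` and `x or y` evaluate to one of their operands, so if the operands are expression objects the result is still an expression and would pass this check.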

Jun 10 '25 02:06 dreadatour

As a first step let's put this summary into the filter docs. Thanks @dreadatour for creating the ticket.

Jun 10 '25 17:06 shcheklein

> As a first step let's put this summary into the filter docs.

I'll do this later 👍

Jun 11 '25 02:06 dreadatour

> As a first step let's put this summary into the filter docs.

https://github.com/iterative/datachain/pull/1151

Jun 12 '25 10:06 dreadatour