datachain
`filter` doesn't work on top of group by results
A query like this fails on SQLite unless `.persist()` is called before the `filter`:
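from datachain import C, func, read_dataset

(
    read_dataset("test")
    .distinct("file.path")
    .group_by(cnt=func.count(), files=func.collect("file.path"), partition_by=("session_id", "position"))
    .persist()  # workaround: without this call, the filter below raises
    .filter(C("cnt") > 1)
)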
It raises:
in/data_storage/sqlite.py", line 242, in execute
    result = self.db.execute(*self.compile_to_args(query))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: misuse of aggregate: count()
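
A possible explanation, not verified against the datachain code: the `filter` may end up as a `WHERE` condition on the same `SELECT` that computes `count()`, and SQLite rejects aggregates in `WHERE`. The sketch below reproduces that class of error with plain `sqlite3` and shows that the same condition works once the grouped query is wrapped in a subquery, which is roughly what `.persist()` achieves by materializing the result first. The `files` table, its rows, and the `demo` function are hypothetical, and the exact error text may differ from the one in the traceback above.

```python
import sqlite3


def demo():
    """Return (error message from the bad query, rows from the wrapped query)."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the dataset: two files share (session_id, position).
    con.execute(
        "CREATE TABLE files (session_id INTEGER, position INTEGER, path TEXT)"
    )
    con.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [(1, 0, "a.jpg"), (1, 0, "b.jpg"), (2, 0, "c.jpg")],
    )

    # Aggregate used directly in WHERE: SQLite refuses to compile this.
    try:
        con.execute(
            "SELECT session_id, position FROM files "
            "WHERE count(*) > 1 GROUP BY session_id, position"
        ).fetchall()
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)

    # Same condition applied to a subquery: cnt is an ordinary column here.
    rows = con.execute(
        "SELECT session_id, position, cnt FROM ("
        "  SELECT session_id, position, count(*) AS cnt"
        "  FROM files GROUP BY session_id, position"
        ") WHERE cnt > 1"
    ).fetchall()

    con.close()
    return error, rows


if __name__ == "__main__":
    error, rows = demo()
    print("bad query error:", error)
    print("wrapped query rows:", rows)
```

If this is indeed the cause, a fix on the datachain side would be to either wrap the `group_by` result in a subquery before applying later filters or to emit the condition as `HAVING`.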