Non-positive multiplicity in DistinctBy
Sentry Issue: DATABASE-BACKEND-2TY
This error has now occurred in a production environment. The previous issues are closed and only covered occurrences in CI:
- SQLsmith in https://github.com/MaterializeInc/materialize/issues/18657
- TPC-H load generator in https://github.com/MaterializeInc/materialize/issues/20405
@vmarcos: Is there anything we can or should do about this, or are we waiting on https://github.com/MaterializeInc/materialize/issues/20442?
I have just seen that it is not a customer environment but the production sandbox.
Indeed the case we are seeing looks pretty much in line with the explanation in https://github.com/MaterializeInc/materialize/issues/20405#issuecomment-1658221268.
Sentry issue: DATABASE-BACKEND-398
We got this today in Nightly in QA Canary Environment Base Load. Let me see if it is reproducible...
Yes, it is reliably reproducible!
```sql
SELECT COUNT(DISTINCT c_name) FROM qa_canary_environment.public_tpch.tpch_q18 WHERE o_orderdate >= '2023-01-01';
```
Error: Evaluation error: internal error: Non-positive multiplicity in DistinctBy
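For anyone else triaging this, a hedged debugging sketch: since the error says the collection feeding the DISTINCT carries records with non-positive multiplicities, a plain GROUP BY count over the same filter might surface the offending keys as zero or negative counts rather than erroring. Whether it actually does depends on how the plan handles the reduction, so treat this as an experiment, not a diagnosis:

```sql
-- Hypothetical narrowing query: if the underlying collection has records
-- with non-positive multiplicities, their per-key counts may show up as
-- <= 0 here instead of tripping the DistinctBy assertion.
SELECT c_name, count(*) AS n
FROM qa_canary_environment.public_tpch.tpch_q18
WHERE o_orderdate >= '2023-01-01'
GROUP BY c_name
HAVING count(*) <= 0;
```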
On https://console.materialize.com/regions/aws-us-east-1/shell. @petrosagg, is there anything we need to provide? This is a TPC-H loadgen source. Since this is the environment we use to verify releases, I'm worried about this failure.
I've spent some time testing the loadgen sources for correctness and closely reading the code. Nothing stood out, and the correctness tests passed, so the most likely theory is that these long-running sources suffered a permanent correctness violation due to the shutdown bugs we had at some point. Given that those bugs have since been fixed, I'll pause looking into this until it happens again (which, under the current theory, will not happen).
> I'll pause looking into this until it happens again.
I'm going to optimistically close this to reflect that we don't plan to put further time on this right now. If this happens again, we can reopen!
@def- A few questions:
- How long has the affected environment been active?
- Where is the source's DB hosted?
- What version of PG is it running?
This is the QA Release Verification env: https://www.notion.so/QA-Release-Verification-107e03aaf4fe4029a35c21c46009f8fb
- The Mz environment was recreated about two weeks ago and had no problems since then; with the upgrade to v0.96.2, this issue started popping up a lot.
- The PG instance is in the “Materialize Scratch” AWS account, region eu-west-1, named qa-canary-environment. It has existed since July 2023: https://eu-west-1.console.aws.amazon.com/rds/home?region=eu-west-1#database:id=qa-canary-environment;is-cluster=false
- The PG version is 15.3; it was also upgraded ~2 weeks ago.
PG 15.6 is the first v15 release that fixes the invalid retractions issue we've seen. PG 16.2 should also have the fix. It seems like it would be best/easiest to move the upstream DB to one of those versions to rule out this being the same issue. Is that an option for us?
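For completeness, confirming the upstream version after the upgrade is a one-liner against the PG instance (standard PostgreSQL, nothing Materialize-specific):

```sql
-- Run against the upstream RDS instance; should report 15.6 or later
-- (or 16.2+ on the v16 line) once the retraction fix is in.
SHOW server_version;
```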
Sure, will try. Edit: Updated to 15.6 now. I'll remove the filter in Sentry and we'll see if this reoccurs. Edit2: So far so good.
Still occurring after all. Do I need to recreate the source?
> do I need to recreate the source?
Yessir.
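For the record, a minimal sketch of what recreating it looks like, assuming the source in question is the Postgres source discussed above. The connection, publication, and source names here are hypothetical placeholders, not the actual objects in the canary environment:

```sql
-- Hypothetical object names; substitute the real ones from the canary env.
DROP SOURCE qa_canary_pg_source CASCADE;

CREATE SOURCE qa_canary_pg_source
  FROM POSTGRES CONNECTION qa_canary_pg_connection (PUBLICATION 'mz_source')
  FOR ALL TABLES;
```

Note that `DROP ... CASCADE` also drops anything depending on the source, so downstream views and indexes need to be recreated afterwards.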
Seems good now for this source. I removed the filter in Sentry and no new occurrences so far.
This is still observed in PROD with 0.97.3 in 88f4bc01-67b2-4ae7-b53e-6c264993152c-0 (production sandbox):
https://materializeinc.sentry.io/issues/4299987926