materialize icon indicating copy to clipboard operation
materialize copied to clipboard

Non-positive multiplicity in DistinctBy

Open sentry-io[bot] opened this issue 2 years ago • 19 comments

Sentry Issue: DATABASE-BACKEND-2TY

Non-positive multiplicity in DistinctBy

This error has now occurred in a production environment. Previous issues are closed and are only about occurrences in the CI:

  • SQLsmith in https://github.com/MaterializeInc/materialize/issues/18657
  • TPC-H load generator in https://github.com/MaterializeInc/materialize/issues/20405

sentry-io[bot] avatar Oct 23 '23 07:10 sentry-io[bot]

@vmarcos: Is there anything we can and need to do about this or are we awaiting https://github.com/MaterializeInc/materialize/issues/20442?

nrainer-materialize avatar Oct 23 '23 07:10 nrainer-materialize

I have just seen that it is not a customer environment but the production sandbox.

nrainer-materialize avatar Oct 23 '23 07:10 nrainer-materialize

Indeed the case we are seeing looks pretty much in line with the explanation in https://github.com/MaterializeInc/materialize/issues/20405#issuecomment-1658221268.

vmarcos avatar Oct 23 '23 11:10 vmarcos

Sentry issue: DATABASE-BACKEND-398

sentry-io[bot] avatar Feb 02 '24 08:02 sentry-io[bot]

We got this today in Nightly in QA Canary Environment Base Load. Let me see if it is reproducible...

nrainer-materialize avatar Feb 02 '24 12:02 nrainer-materialize

We got this today in Nightly in QA Canary Environment Base Load. Let me see if it is reproducible...

Yes, it is reliably reproducible!

nrainer-materialize avatar Feb 02 '24 13:02 nrainer-materialize

SELECT COUNT(DISTINCT c_name) FROM qa_canary_environment.public_tpch.tpch_q18 WHERE o_orderdate >= '2023-01-01';
Error: Evaluation error: internal error: Non-positive multiplicity in DistinctBy

on https://console.materialize.com/regions/aws-us-east-1/shell, @petrosagg anything we need to provide? This is a tpch loadgen source. Since this is the environment we use to verify releases I'm worried by this failure.

def- avatar Feb 06 '24 14:02 def-

I've spend some time testing the loadgen sources for correctness and closely reading the code. Nothing stood out and the correctness tests were successful so the most likely theory is that these long running sources suffered from a permanent correctness violation due to the shutdown bugs we had at some point. Given that those bugs have since been fixed I'll pause looking into this until it happens again (which under the current theory will not happen)

petrosagg avatar Feb 09 '24 14:02 petrosagg

I'll pause looking into this until it happens again.

I'm going to optimistically close this to reflect that we don't plan to put further time on this right now. If this happens again, we can reopen!

benesch avatar Feb 14 '24 16:02 benesch

@def- Few questions:

  • How long has the affected environment been active?
  • Where is the source's DB hosted?
  • What version of PG is it running?

sploiselle avatar Apr 23 '24 02:04 sploiselle

This is the QA Release Verification env: https://www.notion.so/QA-Release-Verification-107e03aaf4fe4029a35c21c46009f8fb The Mz environment was recreated about two weeks ago and had no problems since then. With the upgrade to v0.96.2 this issue started popping up a lot. The PG instance is located in the “Materialize Scratch” account of AWS, region eu-west-1, name qa-canary-environment. It has existed since July 2023: https://eu-west-1.console.aws.amazon.com/rds/home?region=eu-west-1#database:id=qa-canary-environment;is-cluster=false The PG version is 15.3 but was upgraded ~2 weeks ago too.

def- avatar Apr 23 '24 05:04 def-

PG 15.6 is the first v15 release which fixes the invalid retractions issue we've seen. PG 16.2 should also have the fix. It seems like it would be best/easiest if we could move the upstream DB to that version to ensure that it isn't the same issue. Is that something available to us?

sploiselle avatar Apr 23 '24 11:04 sploiselle

Sure, will try. Edit: Updated to 15.6 now. Will try to remove the filter in Sentry and let's see if this reoccurs. Edit2: so far so good

def- avatar Apr 23 '24 12:04 def-

Still occurring after all, do I need to recreate the source?

def- avatar Apr 23 '24 13:04 def-

do I need to recreate the source?

Yessir.

benesch avatar Apr 23 '24 14:04 benesch

Seems good now for this source. I removed the filter in Sentry and no new occurrences so far.

def- avatar Apr 23 '24 15:04 def-

This is still observed in PROD with 0.97.3 in 88f4bc01-67b2-4ae7-b53e-6c264993152c-0 (production sandbox): https://materializeinc.sentry.io/issues/4299987926

nrainer-materialize avatar Apr 29 '24 21:04 nrainer-materialize