crons: same timeline is shown twice

Open JeremiaAu opened this issue 1 year ago • 22 comments

Self-Hosted Version

24.5.0

CPU Architecture

x86_64

Docker Version

26.1.3

Docker Compose Version

2.27.0

Steps to Reproduce

  1. Upgrade from version 24.4.1 to 24.5.0
  2. (Wait for new check-ins)
  3. Open Crons

Expected Result

Only one row per environment

Actual Result

image

Event ID

No response

JeremiaAu avatar May 27 '24 13:05 JeremiaAu

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] avatar Jun 21 '24 22:06 getsantry[bot]

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] avatar May 29 '24 22:05 getsantry[bot]

Routing to @getsentry/product-owners-crons for triage ⏲️

getsantry[bot] avatar May 29 '24 22:05 getsantry[bot]

Would you be able to share the API response for the monitors/ API request?

Specifically, I'm interested in whether the [].environments key has two environments in it, or if this is a UI bug.
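A minimal sketch of that check, assuming the monitors/ response is a JSON list in which each monitor carries an environments list. The slug and name fields and the sample payload are assumptions for illustration, not the confirmed response schema:

```python
import json
from collections import Counter


def duplicated_environments(monitors):
    """Map each monitor slug to the environment names that appear more than once."""
    result = {}
    for monitor in monitors:
        names = [env["name"] for env in monitor.get("environments", [])]
        repeated = sorted(name for name, count in Counter(names).items() if count > 1)
        if repeated:
            result[monitor["slug"]] = repeated
    return result


# Hypothetical payload shaped like the assumption described above.
sample = json.loads("""
[
  {"slug": "nightly-backup",
   "environments": [{"name": "production"}, {"name": "production"}]},
  {"slug": "hourly-sync",
   "environments": [{"name": "production"}, {"name": "staging"}]}
]
""")

print(duplicated_environments(sample))
```

If a monitor shows up in the result, the duplication is already present in the API data rather than being introduced by the UI.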

evanpurkhiser avatar May 30 '24 16:05 evanpurkhiser

Hey, sorry it took me so long. There really are two production environments in the API response: image

JeremiaAu avatar Jun 03 '24 15:06 JeremiaAu

I just created another issue that might be related: https://github.com/getsentry/self-hosted/issues/3104

JeremiaAu avatar Jun 03 '24 15:06 JeremiaAu

Hi @JeremiaAu

It's pretty unusual that you have duplicated environments here. We have unique constraints to prevent this type of duplication. The main way I could think of this failing is that you have environments with the same name in multiple organizations in your self-hosted instance, and that you migrated the monitor over and something went wrong with the migration.

Does that sound like something that would be possible in your setup?

wedamija avatar Jun 03 '24 18:06 wedamija

Hey, @wedamija,

We have only one organization on our server.

It may also be worth noting that this issue affects all three crons we currently have running in our organization.

JeremiaAu avatar Jun 03 '24 18:06 JeremiaAu

Hi @JeremiaAu, sorry for the delay in response here.

It might be most helpful to have a look at some of your data here. If you're comfortable running some SQL queries, you can post the results here or email them to [email protected] if you would prefer they not be public.

Firstly, I'd like to see what is in your environments table: select * from sentry_environment

I'd also like to see the environments associated with one of the crons with the duplication problem

select sme.* 
from sentry_monitorenvironment sme
inner join sentry_monitor sm on sm.id = sme.monitor_id
where sm.slug in (<monitor_slug>)

wedamija avatar Jun 04 '24 23:06 wedamija

Hey @wedamija,

I have sent you an e-mail.

I have also gotten around to applying the fix from the other issue (https://github.com/getsentry/self-hosted/issues/3104). The updated screenshot now looks like this: image

JeremiaAu avatar Jun 05 '24 12:06 JeremiaAu

Ok, your problem here is quite weird - you have duplicate environments in your environment table, which is likely causing this problem. There should be a unique constraint in place to prevent this, so possibly something has gone wrong and removed the constraint.

Could you run \d sentry_environment and post/email the description?

I would expect to see an index like "sentry_environment_organization_id_name_95a37dc7_uniq" UNIQUE CONSTRAINT, btree (organization_id, name) on your table, possibly the name might be slightly different.

Could you also run select organization_id, name, count(*) from sentry_environment group by organization_id, name and email me through the results? I want to confirm that the strings are also identical.
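As a rough illustration of what that grouping query surfaces, here is a sketch that runs it against an in-memory SQLite table. The table is a simplified stand-in for sentry_environment with made-up rows, and the HAVING clause, MIN(id), and MAX(id) are additions to the query above that narrow the output to the duplicated pairs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for sentry_environment. It has no unique constraint,
# which mimics the broken state described in this thread.
conn.execute(
    "CREATE TABLE sentry_environment ("
    "id INTEGER PRIMARY KEY, organization_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO sentry_environment (id, organization_id, name) VALUES (?, ?, ?)",
    [(1, 1, "production"), (2, 1, "staging"), (8, 1, "production")],
)

# Same grouping as the query above, restricted to pairs that occur more than once.
duplicates = conn.execute(
    """
    SELECT organization_id, name, COUNT(*) AS n, MIN(id) AS lowest_id, MAX(id) AS highest_id
    FROM sentry_environment
    GROUP BY organization_id, name
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicates)
```

Rows that look identical but land in separate groups would suggest the strings differ (for example by trailing whitespace), while a count above 1 means the names really are the same.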

wedamija avatar Jun 05 '24 19:06 wedamija

Based on the data in your system, it looks like there must be some kind of corruption with the unique constraint on (organization_id, name) that is causing it to not enforce the constraint. I'm not sure what caused it, but basically we need to figure out how to clean up your environment data to correct these duplicates. I'm going to discuss this internally and figure out the best person to help with this.

wedamija avatar Jun 06 '24 18:06 wedamija

I wonder if the reason behind this corruption of the unique constraint is that between 24.4.1 and 24.5.0 we upgraded postgres to 14 and started using the alpine image instead. Changing the OS might have led to some issues here. I wonder if cleaning the duplicates up and then perhaps using the postgres:14 image instead might work?

hubertdeng123 avatar Jun 06 '24 20:06 hubertdeng123

I also think that the postgres issue mentioned in https://github.com/getsentry/self-hosted/issues/3107 is to blame.

I ran the following command, meant to identify broken indices, and the sentry_environment_organization_id_name_95a37dc7_uniq showed up.

SELECT DISTINCT indrelid::regclass::text, indexrelid::regclass::text, collname, pg_get_indexdef(indexrelid) 
FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index, generate_subscripts(indcollation, 1) g(i)) s 
  JOIN pg_collation c ON coll=c.oid
WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX');

source: https://wiki.postgresql.org/wiki/Locale_data_changes#What_to_do
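To illustrate why a collation change can make a unique index miss existing rows, here is a toy model only: a sorted Python list stands in for the index, and the two orderings are hypothetical stand-ins, not the actual glibc or musl collation rules. The list is built under one ordering and then searched with a binary search that assumes another:

```python
import bisect


def locale_like_key(name):
    # Hypothetical "old" collation: case-insensitive first, then exact string.
    return (name.lower(), name)


def index_contains(index, name):
    # Binary search that assumes the list is sorted by plain code point order
    # (the hypothetical "new" collation).
    position = bisect.bisect_left(index, name)
    return position < len(index) and index[position] == name


names = ["production", "Staging", "dev", "QA"]

# Index built before the OS change, sorted under the old ordering.
stale_index = sorted(names, key=locale_like_key)
found_before = index_contains(stale_index, "production")

# Rebuilding the index under the new ordering (the toy equivalent of REINDEX).
rebuilt_index = sorted(names)
found_after = index_contains(rebuilt_index, "production")

print(stale_index, found_before)
print(rebuilt_index, found_after)
```

In this model the stale index reports that "production" is absent even though it is stored, so a uniqueness check would let a second "production" row in. After the rebuild, the lookup finds it again.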

JeremiaAu avatar Jun 07 '24 07:06 JeremiaAu

Yep, I have a feeling that is the case too. We've changed the postgres image used back to a debian based image here. Do you happen to have a backup of your postgres data before you upgraded? If so, depending on your needs it may be better to restore that data.

Otherwise, I think there are a few roads forward from here. It may be a good idea to perform a backup before proceeding.

  1. Delete all data in the duplicate environment. This may include legitimate data, and if there is legitimate data we'd want to set data there to the original environment id. This may prove to be a manual process, since we don't have foreign keys for everything that references the environment.
  2. Afterwards, reindex the broken indices.
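The two steps could be sketched roughly as follows. This runs against an in-memory SQLite database with a heavily simplified, hypothetical schema. merge_duplicate_environment, the index name, and the sample rows are illustrative, and real Sentry tables have more columns and constraints that may get in the way of a plain UPDATE:

```python
import sqlite3


def merge_duplicate_environment(conn, keep_id, duplicate_id, child_tables):
    """Re-point child rows at keep_id, then delete the duplicate environment row."""
    with conn:  # single transaction: commit on success, roll back on error
        for table in child_tables:
            # Table names come from a fixed list in code, never from user input.
            conn.execute(
                f"UPDATE {table} SET environment_id = ? WHERE environment_id = ?",
                (keep_id, duplicate_id),
            )
        conn.execute("DELETE FROM sentry_environment WHERE id = ?", (duplicate_id,))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentry_environment (id INTEGER PRIMARY KEY, organization_id INTEGER, name TEXT);
    CREATE TABLE sentry_deploy (id INTEGER PRIMARY KEY, environment_id INTEGER);
    INSERT INTO sentry_environment VALUES (1, 1, 'production'), (8, 1, 'production');
    INSERT INTO sentry_deploy VALUES (10, 1), (11, 8);
""")

merge_duplicate_environment(conn, keep_id=1, duplicate_id=8, child_tables=["sentry_deploy"])

# With the duplicate gone, a unique index on (organization_id, name) can be built again.
conn.execute(
    "CREATE UNIQUE INDEX env_org_name_uniq ON sentry_environment (organization_id, name)"
)

print(conn.execute("SELECT id, environment_id FROM sentry_deploy ORDER BY id").fetchall())
```

Re-pointing rows instead of deleting them keeps any legitimate data that was written against the duplicate id.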

hubertdeng123 avatar Jun 10 '24 23:06 hubertdeng123

Sadly, I do not have a Sentry backup that old. But I would not really mind losing the data generated since the upgrade.

Can you provide postgresql commands for deleting the duplicate environments and associated data, or point me to the relevant docs?

JeremiaAu avatar Jun 11 '24 11:06 JeremiaAu

Note: We do not have an official guide for this, and I am not sure whether the instructions I'm giving you are completely comprehensive. This is not guaranteed to work and could result in data loss!

It looks like these models in Sentry are the ones that reference an environment by environment_id:

  • sentry_deploy
  • sentry_latestrelease
  • sentry_rule
  • sentry_userreport

So, I'd probably try something like

-- Pick the duplicate environment with the higher id. Use single quotes: double quotes denote identifiers in PostgreSQL.
DELETE FROM sentry_environment WHERE id = '$duplicate_environment_id';
DELETE FROM sentry_deploy WHERE environment_id = '$duplicate_environment_id';
DELETE FROM sentry_latestrelease WHERE environment_id = '$duplicate_environment_id';
DELETE FROM sentry_rule WHERE environment_id = '$duplicate_environment_id';
DELETE FROM sentry_userreport WHERE environment_id = '$duplicate_environment_id';
REINDEX INDEX sentry_environment_organization_id_name_95a37dc7_uniq;

I believe the models with foreign key relations should be cleaned up automatically.

hubertdeng123 avatar Jun 12 '24 21:06 hubertdeng123

Hey @hubertdeng123, I just got around to applying your suggested fix.

The models with foreign key relations were not cleaned up automatically, so I had to expand the delete commands:

DELETE from sentry_deploy WHERE environment_id='8';
DELETE from sentry_latestrelease WHERE environment_id='8';
DELETE from sentry_rule WHERE environment_id='8';
DELETE from sentry_userreport WHERE environment_id='8';

DELETE from sentry_environmentproject WHERE environment_id='8';
DELETE from sentry_releaseprojectenvironment WHERE environment_id='8';
DELETE from sentry_environment WHERE id='8';
REINDEX INDEX sentry_environment_organization_id_name_95a37dc7_uniq;
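Since the list of referencing tables had to be extended by hand, one way to look for further candidates is to search the schema for every table that has an environment_id column. The sketch below does this against an in-memory SQLite database with made-up, simplified tables. tables_with_column is a hypothetical helper, and on PostgreSQL the analogous lookup would go through information_schema.columns rather than SQLite's PRAGMA:

```python
import sqlite3


def tables_with_column(conn, column_name):
    """Return the names of all tables that have a column called column_name."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    matches = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        if column_name in columns:
            matches.append(table)
    return matches


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentry_environment (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sentry_project (id INTEGER PRIMARY KEY);
    CREATE TABLE sentry_environmentproject (id INTEGER PRIMARY KEY, environment_id INTEGER);
    CREATE TABLE sentry_releaseprojectenvironment (id INTEGER PRIMARY KEY, environment_id INTEGER);
""")

referencing_tables = tables_with_column(conn, "environment_id")
print(referencing_tables)
```

Any table in the result that is missing from the delete list above may still hold rows pointing at the removed environment id.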

Unfortunately Sentry is behaving weirdly now.

The cron overview does not show check-ins across environments (but the check mark or fire symbol is displayed correctly): image

I also can't access individual crons without specifying an environment; the page only returns the error message "The monitor you were looking for was not found".

Viewing a single environment does work: image image

Thanks for your help so far! Should I create a new issue?

JeremiaAu avatar Jun 19 '24 08:06 JeremiaAu

Routing to @getsentry/product-owners-crons for triage ⏲️

getsantry[bot] avatar Jun 21 '24 22:06 getsantry[bot]

Are there any logs that may give us a clue to why you're getting The monitor you were looking for was not found? I suspect there is something we're missing but I'm not sure what it is.

hubertdeng123 avatar Jun 24 '24 23:06 hubertdeng123

Where can I find the log you are mentioning?

JeremiaAu avatar Jun 25 '24 05:06 JeremiaAu

I would be curious what the command docker compose logs web shows. Hopefully that should give some information?

hubertdeng123 avatar Jun 25 '24 20:06 hubertdeng123

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

getsantry[bot] avatar Jul 17 '24 07:07 getsantry[bot]