core icon indicating copy to clipboard operation
core copied to clipboard

Running Integrity Checker "fix" can break content

Open wezell opened this issue 7 months ago • 0 comments

Parent Issue

No response

Problem Statement

If there is a content type ID mismatch between two environments and you run the integrity checker, we update the structure.inode and the contentlet.structure_inode in the structure and contentlet tables but we do not update the contentlet.contentlet_as_json['contentType']

This causes an error when dotCMS tries to load the content type from the information stored in the json - that content type id no longer exists. See here where we are trying to load the content type from the information in the json - kaboom! :

https://github.com/dotCMS/core/blob/master/dotCMS/src/main/java/com/dotcms/content/business/json/ContentletJsonAPIImpl.java#L289

We should alway be loading the content type from the column in the db - which should be seen as the source of truth. That said, there are a few options and maybe we need to do a few of them

  1. Pass in the preloaded content type (or the expected field list) into the to the ContentletJsonAPI method and basically ignoring the contentType value in the JSON
  2. Instead of storing the contentType Id in the json, we store the content type variable. Regardless if we fix #1, I think we should do this as our default as it is way less fragile and works out of the box by the way. If you run this on a piece of FileAsset content, the content will still load.
UPDATE contentlet SET contentlet_as_json['contentType'] = '"FileAsset"' WHERE inode = '6e51f36c-8555-497a-9a0d-51923be9a269';

Steps to Reproduce

  1. Set up a push publishing environment that has conflicts - empty starter vs the new starter for example
  2. Run the integrity checker. Make note of the content types that have conflicts and fix them at one of the ends.
  3. On the "fixed" instance, try to load one of the content of the "fixed" types.

To reproduce it quickly, just update a piece of content's json with a content type that does not exist, e.g.

UPDATE contentlet SET contentlet_as_json['contentType'] = '"NOT_FOUND_NOPE"' WHERE inode = '6e51f36c-8555-497a-9a0d-51923be9a269';

Acceptance Criteria

The content type in the database field should be primary - we should be able to update that column in the content type and not blow up the content. I would like to avoid a "fix" that makes the integrity checker run an update on all the contentlet_as_json values as on large datasets that could take a very long time.

If we do choose to store the contentType.variable in the json rather than the id, then I would say that could happen lazily

dotCMS Version

23.10

Maybe 23.01 - wherever we added support for the contentlet as json

Proposed Objective

Reliability

Proposed Priority

Priority 1 - Show Stopper

External Links... Slack Conversations, Support Tickets, Figma Designs, etc.

No response

Assumptions & Initiation Needs

No response

Quality Assurance Notes & Workarounds

No response

Sub-Tasks & Estimates

No response

wezell avatar Jun 28 '24 17:06 wezell