core
core copied to clipboard
Running Integrity Checker "fix" can break content
Parent Issue
No response
Problem Statement
If there is a content type ID mismatch between two environments and you run the integrity checker, we update the structure.inode
and the contentlet.structure_inode
in the structure and contentlet tables but we do not update the contentlet.contentlet_as_json['contentType']
This causes an error when dotCMS tries to load the content type from the information stored in the json - that content type id no longer exists. See here where we are trying to load the content type from the information in the json - kaboom! :
https://github.com/dotCMS/core/blob/master/dotCMS/src/main/java/com/dotcms/content/business/json/ContentletJsonAPIImpl.java#L289
We should alway be loading the content type from the column in the db - which should be seen as the source of truth. That said, there are a few options and maybe we need to do a few of them
- Pass in the preloaded content type (or the expected field list) into the to the ContentletJsonAPI method and basically ignoring the
contentType
value in the JSON - Instead of storing the contentType Id in the json, we store the content type variable. Regardless if we fix #1, I think we should do this as our default as it is way less fragile and works out of the box by the way. If you run this on a piece of FileAsset content, the content will still load.
UPDATE contentlet SET contentlet_as_json['contentType'] = '"FileAsset"' WHERE inode = '6e51f36c-8555-497a-9a0d-51923be9a269';
Steps to Reproduce
- Set up a push publishing environment that has conflicts - empty starter vs the new starter for example
- Run the integrity checker. Make note of the content types that have conflicts and fix them at one of the ends.
- On the "fixed" instance, try to load one of the content of the "fixed" types.
To reproduce it quickly, just update a piece of content's json with a content type that does not exist, e.g.
UPDATE contentlet SET contentlet_as_json['contentType'] = '"NOT_FOUND_NOPE"' WHERE inode = '6e51f36c-8555-497a-9a0d-51923be9a269';
Acceptance Criteria
The content type in the database field should be primary - we should be able to update that column in the content type and not blow up the content. I would like to avoid a "fix" that makes the integrity checker run an update on all the contentlet_as_json values as on large datasets that could take a very long time.
If we do choose to store the contentType.variable in the json rather than the id, then I would say that could happen lazily
dotCMS Version
23.10
Maybe 23.01 - wherever we added support for the contentlet as json
Proposed Objective
Reliability
Proposed Priority
Priority 1 - Show Stopper
External Links... Slack Conversations, Support Tickets, Figma Designs, etc.
No response
Assumptions & Initiation Needs
No response
Quality Assurance Notes & Workarounds
No response
Sub-Tasks & Estimates
No response