Copy Site does not copy all content when a site has more than 10000 contentlets
Describe the bug
Copy Site does not copy all content when a site has more than 10000 contentlets.
Reproduced on: 5.3.8.10/22.06
Related ticket: https://dotcms.zendesk.com/agent/tickets/106874
To Reproduce
From a site with more than 10000 total contentlets:
- Attempt to copy the site
- The copy completes successfully, but not all content has been copied
Expected behavior
Copy Site should copy all content. Currently it is hitting the limit for our Elasticsearch queries:
https://github.com/dotCMS/core/blob/88aef64a9b92923f1c2089af9ceed262198a8f9e/dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ESContentletAPIImpl.java#L178-L178
This is because the search we do to pull in a list of content to copy is this:
+conHost:{HostIdentifier} +working:true
We pass a limit of -1, which gets converted to the MAX_LIMIT of 10000, so only 10k contentlets are pulled in. This is related to the index.max_result_window setting on our indexes, which is capped at 10k.
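To illustrate the truncation, here is a minimal sketch of how a limit of -1 ends up capping the result set. The class and method names are hypothetical and are not the actual ESContentletAPIImpl code; only the 10000 value comes from this issue.

```java
class LimitClampSketch {
    // Assumed value, taken from the 10k figure quoted in this issue.
    static final int MAX_LIMIT = 10000;

    // Hypothetical helper: a non-positive or oversized limit falls back to MAX_LIMIT.
    static int effectiveLimit(int requested) {
        return (requested <= 0 || requested > MAX_LIMIT) ? MAX_LIMIT : requested;
    }

    // How many contentlets a single query can return for a site of the given size.
    static int pulledIn(int totalContentlets, int requestedLimit) {
        return Math.min(totalContentlets, effectiveLimit(requestedLimit));
    }
}
```

Under these assumptions, `LimitClampSketch.pulledIn(12500, -1)` returns 10000, so the remaining 2500 contentlets are silently skipped.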
Correcting this by raising limits would require increasing both MAX_LIMIT and index.max_result_window, so it makes more sense to paginate the query.
We do have the ability to use the scroll API to paginate the query: https://github.com/dotCMS/core/blob/2e80d1dd0c19c6499ca165979cac8cf307c6098f/dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ESContentletAPIImpl.java#L1097
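A rough sketch of what batch-wise copying could look like follows. `BatchCursor` is a hypothetical stand-in for a scroll-style cursor and `inMemoryCursor` is a test double; neither is the dotCMS scroll API. The sketch uses a cursor rather than offsets because Elasticsearch rejects requests where from + size exceeds index.max_result_window, so offset paging would hit the same 10k ceiling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ScrollCopySketch {
    // Hypothetical cursor: returns the next batch of contentlet ids,
    // or an empty list once the result set is exhausted.
    interface BatchCursor {
        List<String> nextBatch();
    }

    // In-memory stand-in for a real scroll, used only to demonstrate the loop.
    static BatchCursor inMemoryCursor(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        final int[] position = {0};
        return () -> {
            int start = position[0];
            int end = Math.min(ids.size(), start + batchSize);
            position[0] = end;
            return new ArrayList<>(ids.subList(start, end));
        };
    }

    // Drains the cursor, handing each batch to the copy step.
    // Returns the total number of ids processed.
    static int copyAll(BatchCursor cursor, Consumer<List<String>> copyBatch) {
        int copied = 0;
        List<String> batch = cursor.nextBatch();
        while (!batch.isEmpty()) {
            copyBatch.accept(batch);
            copied += batch.size();
            batch = cursor.nextBatch();
        }
        return copied;
    }
}
```

With a cursor over 12500 ids and a batch size of 1000, `copyAll` processes all 12500, since no single request needs to exceed the 10k window.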
A simpler alternative might be to have Copy Site send a much larger explicit limit instead of -1 (which gets converted to MAX_LIMIT), perhaps as a configurable value. As noted above, index.max_result_window would also need to be raised for that to take effect.