Copy Site does not copy all content when a site has more than 10000 contentlets

Open swicken-dotcms opened this issue 2 years ago • 0 comments

Describe the bug: Copy Site does not copy all content when a site has more than 10,000 contentlets.

Reproduced on: 5.3.8.10/22.06

Related ticket: https://dotcms.zendesk.com/agent/tickets/106874

To Reproduce: Starting from a site with more than 10,000 total contentlets:

  1. Attempt to copy the site.
  2. The copy completes successfully; however, not all content is copied.

Expected behavior: Copy Site should copy all content. Currently, the copy hits the result limit on our Elasticsearch queries:

https://github.com/dotCMS/core/blob/88aef64a9b92923f1c2089af9ceed262198a8f9e/dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ESContentletAPIImpl.java#L178-L178

This happens because the search we use to pull in the list of content to copy is the query "+conHost:{HostIdentifier} +working:true", and we pass it a limit of -1, which is converted to MAX_LIMIT (10,000). As a result, only 10,000 contentlets are pulled in. This cap corresponds to the index.max_result_window setting on our indexes, which is limited to 10,000.
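The truncation can be illustrated with a small, self-contained sketch. It assumes only what is described above (a limit of -1 is replaced by a MAX_LIMIT of 10,000); the class and method names are hypothetical and do not come from ESContentletAPIImpl.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LimitTruncationDemo {
    // Assumed to mirror the MAX_LIMIT of 10,000 described in this issue.
    static final int MAX_LIMIT = 10_000;

    // Hypothetical helper: a limit of -1 (or any negative value) becomes MAX_LIMIT;
    // explicit positive limits are passed through unchanged.
    static int effectiveLimit(int requested) {
        return requested < 0 ? MAX_LIMIT : requested;
    }

    // Simulates one search call that returns at most effectiveLimit(limit) hits.
    static List<Integer> search(List<Integer> allHits, int limit) {
        int n = Math.min(allHits.size(), effectiveLimit(limit));
        return allHits.subList(0, n);
    }

    public static void main(String[] args) {
        // A site with 12,000 contentlets, represented by their positions.
        List<Integer> siteContent = IntStream.range(0, 12_000)
                .boxed()
                .collect(Collectors.toList());
        List<Integer> copied = search(siteContent, -1);
        System.out.println(copied.size() + " of " + siteContent.size() + " contentlets copied");
    }
}
```

Running main prints "10000 of 12000 contentlets copied": the copy reports success but silently drops the remaining 2,000 contentlets.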

To fix this by raising limits, we would need to increase both the limit and index.max_result_window, so it makes more sense to paginate the query instead.
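One caveat when paginating: plain from/size paging does not escape the cap, because Elasticsearch rejects any from/size request where from + size exceeds index.max_result_window. The following sketch models that check with hypothetical names, assuming the 10,000 window described above; it shows that no matter how the pages are sized, offset paging cannot reach past hit 10,000, so a cursor-style mechanism is needed.

```java
public class ResultWindowDemo {
    // Assumed window size, matching the 10,000 cap on our indexes described above.
    static final int MAX_RESULT_WINDOW = 10_000;

    // Models Elasticsearch's rule for from/size requests:
    // from + size must be <= index.max_result_window.
    static boolean isAllowed(int from, int size) {
        return (long) from + size <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        int pageSize = 1_000;
        int reachable = 0;
        // Walk forward page by page until the next request would be rejected.
        for (int from = 0; isAllowed(from, pageSize); from += pageSize) {
            reachable = from + pageSize;
        }
        System.out.println("Hits reachable with from/size paging: " + reachable);
    }
}
```

Running main prints "Hits reachable with from/size paging: 10000".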

We do have the ability to use the scroll API to paginate the query: https://github.com/dotCMS/core/blob/2e80d1dd0c19c6499ca165979cac8cf307c6098f/dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ESContentletAPIImpl.java#L1097
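A scroll-based loop avoids the window because each request continues from a cursor (the scroll id) rather than from an offset. The following is a minimal sketch of that loop shape only; ScrollSearch, Page, fetchAll, and the in-memory fake are hypothetical names, not the dotCMS or Elasticsearch client API, and the fake ignores the query entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ScrollCopyDemo {
    // Hypothetical batch type: the hits in this batch plus the cursor for the next one.
    static final class Page {
        final List<String> hits;
        final String scrollId;

        Page(List<String> hits, String scrollId) {
            this.hits = hits;
            this.scrollId = scrollId;
        }
    }

    // Hypothetical scroll-style search: open once, then request the next batch
    // with the scroll id returned by the previous call.
    interface ScrollSearch {
        Page open(String query, int batchSize);
        Page next(String scrollId);
    }

    // Drains the scroll until an empty batch comes back.
    static List<String> fetchAll(ScrollSearch search, String query, int batchSize) {
        List<String> all = new ArrayList<>();
        Page page = search.open(query, batchSize);
        while (!page.hits.isEmpty()) {
            all.addAll(page.hits);
            page = search.next(page.scrollId);
        }
        return all;
    }

    // In-memory fake for illustration: the scroll id is the offset of the next batch.
    // This is NOT how Elasticsearch encodes scroll ids.
    static ScrollSearch fake(List<String> data) {
        return new ScrollSearch() {
            private int batchSize;

            @Override
            public Page open(String query, int batchSize) {
                this.batchSize = batchSize;
                return next("0");
            }

            @Override
            public Page next(String scrollId) {
                int from = Integer.parseInt(scrollId);
                int to = Math.min(data.size(), from + batchSize);
                return new Page(data.subList(from, to), String.valueOf(to));
            }
        };
    }

    public static void main(String[] args) {
        List<String> site = IntStream.range(0, 25_000)
                .mapToObj(i -> "contentlet-" + i)
                .collect(Collectors.toList());
        List<String> copied = fetchAll(fake(site), "+conHost:{HostIdentifier} +working:true", 1_000);
        System.out.println(copied.size() + " of " + site.size() + " contentlets fetched");
    }
}
```

Running main prints "25000 of 25000 contentlets fetched". With a real scroll, the scroll context should also be cleared once the loop finishes so the cluster can release it.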

The simplest solution might be to have Copy Site send a much larger limit, since sending -1 causes it to be set to MAX_LIMIT. This limit could perhaps be made configurable.
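If the limit were made configurable, the lookup might look like the following sketch. The property name COPY_SITE_CONTENT_LIMIT and the fallback rules are hypothetical, and a plain Map stands in for the real configuration source. As noted above, a limit above 10,000 only helps if index.max_result_window is raised to match.

```java
import java.util.Map;

public class CopySiteLimitConfig {
    // Hypothetical property name; not an existing dotCMS setting.
    static final String LIMIT_PROPERTY = "COPY_SITE_CONTENT_LIMIT";
    // Assumed default, matching the current MAX_LIMIT of 10,000.
    static final int DEFAULT_LIMIT = 10_000;

    // Reads the limit from a key/value config, falling back to the default when
    // the key is missing, unparsable, or not positive.
    static int copySiteLimit(Map<String, String> config) {
        String raw = config.get(LIMIT_PROPERTY);
        if (raw == null) {
            return DEFAULT_LIMIT;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_LIMIT;
        } catch (NumberFormatException e) {
            return DEFAULT_LIMIT;
        }
    }

    public static void main(String[] args) {
        System.out.println(copySiteLimit(Map.of()));
        System.out.println(copySiteLimit(Map.of(LIMIT_PROPERTY, "50000")));
    }
}
```

Running main prints 10000 and then 50000.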

swicken-dotcms, Jul 05 '22 20:07