cms-job taking too long to finish

Open • vunguyen-dmt opened this issue 10 months ago • 5 comments

My instance has about 10k courses. Every time I run "tutor k8s launch", the process gets stuck at "job.batch/cms-job-20250207040323 created" for hours. I checked the logs and Meilisearch was indexing courses; eventually the job got past this step. This did not happen with Elasticsearch before. While this is not strictly a bug, I think it is definitely a serious problem for larger instances, so I created this issue so we can improve it.

vunguyen-dmt • Feb 07 '25 04:02

Can you identify which part of the CMS initialization is taking too long? Is it ./manage.py cms reindex_studio --experimental --init or ./manage.py cms reindex_course --active? (ref: https://github.com/overhangio/tutor/blob/release/tutor/templates/jobs/init/cms.sh)
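One way to answer this is to time each init command separately. The helper below is a minimal sketch, not part of Tutor: "time_step" is a hypothetical name, and it assumes the commands are run by hand inside the CMS container, where ./manage.py is available. The shell's built-in "time" would work just as well; the helper only adds a label and the exit status to the report.

```shell
#!/bin/sh
# Sketch: run a command, then print how long it took and its exit status.
# time_step is a hypothetical helper name, not something Tutor provides.
time_step() {
  label="$1"
  shift
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "[timing] $label: $((end - start))s (exit $status)"
  return "$status"
}

# Example usage inside the CMS container (assumption: ./manage.py is present):
#   time_step reindex_studio ./manage.py cms reindex_studio --experimental --init
#   time_step reindex_course ./manage.py cms reindex_course --active
```

Whichever step reports the larger elapsed time is the one worth investigating.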

regisb • Feb 07 '25 10:02

@regisb I'm not sure which one it was, but the log lines start with "[search.meilisearch] [user None] [ip None] meilisearch.py:144 - Index request: index=tutor_course_info sources=", and the last time I launched, the job ended with "Killed".

vunguyen-dmt • Feb 11 '25 07:02

Then it is ./manage.py cms reindex_course --active. This comment has come from the past to bite us: https://github.com/overhangio/tutor/pull/1174#pullrequestreview-2488381483

Basically, we are asking the CMS to reindex all active courses. This change was introduced in Sumac. We should probably remove that line from cms.sh...

regisb • Feb 11 '25 10:02

@regisb I have some suggestions here for how to improve this upgrade/init step and would love your input: https://github.com/openedx/edx-platform/issues/36868

bradenmacdonald • Jun 06 '25 17:06

My team also struggled with slow deployment times due to the reindexing. We have around 650 courses, and the job did eventually complete, but it still delayed our deployment workflows. Since we do not use any of the search functionality, we decided that a reasonable solution was to remove the reindex commands from the deployment workflow entirely, which has saved us that time.

Since others are struggling with the same issue, I wonder if it would be worthwhile to add a dynamic condition, driven by a configuration setting, that determines whether to run or skip these commands. Could that be included in the Sumac and Teak patch releases of Tutor? That would mitigate some of the concerns I listed in my post about our current implementation.
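A rough sketch of such a switch at the shell level is shown below. SKIP_SEARCH_REINDEX and maybe_reindex are hypothetical names, not existing Tutor settings or helpers; in Tutor itself the condition would more likely be a config value checked when the cms.sh template is rendered. The default keeps the current behavior, so only operators who opt in skip the reindex.

```shell
#!/bin/sh
# Sketch only: SKIP_SEARCH_REINDEX is a hypothetical variable, not a Tutor setting.
# maybe_reindex runs the given command unless skipping has been requested.
maybe_reindex() {
  case "${SKIP_SEARCH_REINDEX:-false}" in
    true|1|yes)
      echo "Skipping: $*"
      return 0
      ;;
  esac
  "$@"
}

# In the CMS init script, the reindex commands would then become:
#   maybe_reindex ./manage.py cms reindex_studio --experimental --init
#   maybe_reindex ./manage.py cms reindex_course --active
```

Defaulting to "run" avoids silently changing behavior for instances that rely on search, while large instances could skip the step and reindex separately.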

jcohen28 • Jul 10 '25 21:07