moodle-format_topcoll icon indicating copy to clipboard operation
moodle-format_topcoll copied to clipboard

Severe performance issues

Open aspark21 opened this issue 4 years ago • 34 comments

Hi Gareth,

This capability checks is extremely problematic - it's occurring 100s of times per page load and taking 120s for that to load https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/activity.php#L129-L173

In particular this method is probable issue: private static function calulatecoursemodules($courseid, $students, $modid = null) {

especially this Line 757 - if (((method_exists($usermod, 'is_visible_on_course_page')) && ($usermod->is_visible_on_course_page()))

We're in CI mode as whole site went down earlier today, it seems other universities are also affected with performance issues on earlier versions of topcoll. We've had a long running item around coursemodinfo cache being problematic for some users for the last year or so. And had been pursuing https://tracker.moodle.org/browse/MDL-55231 - which is not quite ready but close. Though there's thinking needed about wether there is something about topcoll (or topcoll + adaptable) which could make this worse.

We should have Cat AU looped in overnight who may have some recommendations.

Al P.S. will email you to bring you into the loop on this

aspark21 avatar Oct 12 '21 19:10 aspark21

Actually there's no data in this so posting the xhprof traces here: course-view

course-editbulkcompletion

aspark21 avatar Oct 12 '21 19:10 aspark21

Dear @aspark21,

I have just about had enough of this code (sick to the back teeth) that I've inherited from Adaptable and indeed within Adaptable itself. If its problematic, then please do turn it off.

The code in question, only follows what core wrote. Before that the answer was inaccurate and confusing. I see no other way of writing it (bar removing the method exists call).

Also, please could you test the M3.11 version (https://moodle.org/mod/forum/discuss.php?d=425903) as that now employs the use of the Moodle cache, and should be much faster after initial calculations.

Gareth

gjb2048 avatar Oct 12 '21 20:10 gjb2048

How do we turn it off?

we're on cd442546e241b344daf47047ba975e9ef8ad19ca which includes all the cache improvements of the 3.9.1.6 release on Moodle 3.9.10

aspark21 avatar Oct 12 '21 20:10 aspark21

Settings can be used: https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/settings.php#L505-L589 - but that would mean resetting the default layout, probably needs rethinking that bit as a separation / better switch. So... comment out the code:

https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/course_renderer.php#L220-L234

and the events:

https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/observer.php

which could be better and react to the settings. EDIT, does react to the settings: https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/activity.php#L1230-L1243, so just need a better 'reset' to default for that setting as a single element / have the course setting follow the default as the other's now do. EDIT: Which it does https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/lib.php#L774-L791, so now only need to separate into its own reset category so that courses not following the default can be reset to do so. i.e. when its changed to off.

gjb2048 avatar Oct 12 '21 21:10 gjb2048

Why does applying activity completion trigger a 'course_module_updated' event? I would have thought that would only happen when the teacher actually changed it.

gjb2048 avatar Oct 12 '21 22:10 gjb2048

Ok, improvements:

Separated out the reset so that its separate and easier to disable on a / all courses: https://github.com/gjb2048/moodle-format_topcoll/commit/a9fa4de5ffb178bc46100629f0b20822095b7c0b (Don't apply this: https://github.com/gjb2048/moodle-format_topcoll/commit/a9fa4de5ffb178bc46100629f0b20822095b7c0b#diff-8f5ff72ef1cf59199cfc4bbf16f2475d09e448601242f0d2c51bb94bd4aa1395R1572).

Removed 'method_exists': https://github.com/gjb2048/moodle-format_topcoll/commit/8a0dfea27dc9fc57afa47d60c113db5e381ff24d

Only calculate the students once per course: https://github.com/gjb2048/moodle-format_topcoll/commit/9a1afdcaf0f49bf4d9b3ea22dfdb78b6df419ccc.

gjb2048 avatar Oct 13 '21 16:10 gjb2048

Small optimisation: https://github.com/gjb2048/moodle-format_topcoll/commit/59aaff91ab2ae00767cdfd07c93a107483071ede

gjb2048 avatar Oct 13 '21 16:10 gjb2048

@aspark21 With the second flow, why is '\core\event\course_module_completion_updated' not used instead?

gjb2048 avatar Oct 13 '21 17:10 gjb2048

@aspark21 Ok, would a setting that prevented the activity information being calculated / shown if there was more than 'x' students on the course help?

gjb2048 avatar Oct 13 '21 17:10 gjb2048

We turned off those settings yesterday morning and site performed ok. Did not comment out any code yet.

We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features.

The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day).

aspark21 avatar Oct 14 '21 07:10 aspark21

What events are causing those invalidations and what is generating them?

gjb2048 avatar Oct 14 '21 07:10 gjb2048

I need stack traces please.

gjb2048 avatar Oct 14 '21 07:10 gjb2048

@aspark21 Any news on this please? At the moment I'm a bit in the dark, so can only operate a little on guesswork, I need more concrete facts / evidence.

gjb2048 avatar Oct 18 '21 15:10 gjb2048

Hi Gareth,

Yes, the team will provide the additional info shortly

aspark21 avatar Oct 18 '21 15:10 aspark21

Hi Alistair,

Thanks. The UCL team?

Gareth

gjb2048 avatar Oct 18 '21 15:10 gjb2048

@aspark21 Please note that the MOODLE_311 branch is being updated / improved in this area.

gjb2048 avatar Oct 18 '21 18:10 gjb2048

@aspark21 Been 7 days, any news on this please? I feel like a complete mushroom at the moment and working on pure guess work. I really need the events / stack traces for:

We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features.

The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day).

Then I might have a hope at fixing this for you!

gjb2048 avatar Oct 25 '21 11:10 gjb2048

Hi Gareth,

Sorry for the slow replies, been pretty tied up in the incident and looking for other causes.

We found the root cause for our incident to actually be https://tracker.moodle.org/browse/MDL-72837 - so I assume

I will rerun some stack traces this week now we've fixed the underlying flaw to see how this code is now performing.

Al

aspark21 avatar Oct 25 '21 12:10 aspark21

Dear @aspark21,

Thank you for getting back to me. I've now improved the code with a new admin setting 'courseadditionalmoddatamaxstudents' (you'll find this on the MOODLE_311 and master (for M4.0) branches), whereby if not zero, then additional information will be turned off if the number of students is above the number set. Of course, the number of students is still calculated, but there should be a performance increase as the rest isn't. The course settings will inform an editing teacher of the situation when this happens.

This is on top of all the other settings to turn elements on / off for this.

G

gjb2048 avatar Oct 25 '21 12:10 gjb2048

Dear @aspark21,

Any news on this please as its driving me nuts that I cannot make progress on this. I'm have a 'get the job done' attitude and this is stalled, which is very frustrating. If there is a bug to fix, then it needs fixing as I'll only then burn more duplicate time making more releases as other things make progress and this doesn't.

Gareth

gjb2048 avatar Nov 03 '21 11:11 gjb2048

Hi Gareth,

We won't be able to feedback on your improvements, as you only made the change in MOODLE_311 and not in MOODLE_39.

I've go an xhprof here of course/view.php now that we've fixed our underlying infrastructure issue. And you can see this is performing adequately. This is with the Additional Info switched off at Site level. The slow thing is calendar / course overview related, so nothing collapsed topics related & overall page load is not that slow. callgraph-courseview-fixed-noadditional

I have re-enabled the Additional Info at site level just now, so will provide similar trace towards the end of the day, for comparison.

aspark21 avatar Nov 10 '21 12:11 aspark21

Dear Alistair,

Thanks for the information. To be honest, I'm really embarrassed by all of this and had not anticipated the agro that porting this code from Adaptable would bring - in fact I'm very annoyed about it all. It is my intention to improve the 'kill switch' code and port that element along with the caching mechanism as far back as I can.

Gareth

gjb2048 avatar Nov 10 '21 12:11 gjb2048

@aspark21 But I have added the caching to the M3.9 version and released it! Please see point '2' for 'Version 3.9.1.6' on https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/Changes.md.

gjb2048 avatar Nov 10 '21 13:11 gjb2048

That wasn't a change for us - we were on MOODLE_39 @ cd442546e241b344daf47047ba975e9ef8ad19ca which already had that change in. We've just updated to MOODLE_39 @ 29450650698b49a2f748c24c9d06bd0a3fa1a2c7 which only had trivial changes (#105 and #111)

I was referring to the new 'courseadditionalmoddatamaxstudents' setting you mentionned https://github.com/gjb2048/moodle-format_topcoll/issues/114#issuecomment-950880746

aspark21 avatar Nov 11 '21 12:11 aspark21

@aspark21 Ah, ok, but what about this:

You:

"We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features."

"The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day)."

Me:

"Why does applying activity completion trigger a 'course_module_updated' event? I would have thought that would only happen when the teacher actually changed it."

"With the second flow, why is '\core\event\course_module_completion_updated' not used instead?"

"What events are causing those invalidations and what is generating them?"

You:

"Yes, the team will provide the additional info shortly"

Thus the 'courseadditionalmoddatamaxstudents' is outside of the event driven cache code that was already in the M3.9 version that I thought you were testing and I could possibly identify if I needed to raise a tracker issue for the 'completion' when it updates and generates the wrong event. The 'courseadditionalmoddatamaxstudents' setting only mitigates the issue but won't identify the "The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day)." issue I thought you were getting for me, please :).

gjb2048 avatar Nov 11 '21 13:11 gjb2048

For "We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features."

The reason for that was https://tracker.moodle.org/browse/MDL-72837 - An infrastructure & cache config combo that had a bug in core. So not Collapsed Topics related. The human way I explain this bug is that: "stacked should rebuild cache once & then copy from shared to local 90 times (number of frontends) but it isn't because bug means it rebuilds unnecessarily. single layer rebuilds once, stacked was rebuilding 900 times."

What we are still aware of is that editing any activity results in the full rebuild of the coursemodinfo cache which is a concern for large/very large courses (users x content) - there is improvements in https://tracker.moodle.org/browse/MDL-55231 which were going to make it into 4.0 but are now pushed back to 4.1; we're looking at backporting & reworking it for 3.9.

The events thing we assumed was students doing the manual ticking of activity completions. We didn't explore further as the root cause of the CI was elsewhere.

aspark21 avatar Nov 11 '21 13:11 aspark21

@aspark21 Ah ok, so its a distributed enterprise computing locking issue. Course editing won't result in a full rebuild of a cache, only the part pertaining to a given course.

gjb2048 avatar Nov 11 '21 13:11 gjb2048

Note to self: Investigate the event type generated by course completion.

gjb2048 avatar Nov 11 '21 13:11 gjb2048

@aspark21 Are you also saying that I've spent lots of time investigating this and it's transpired not to be a bug in my code but in fact to do with core code in relation to your enterprise structured Moodle system?

gjb2048 avatar Nov 11 '21 13:11 gjb2048

Any course editing does currently result in a full rebuild for a particular course. This will hopefully change in Moodle 4.1 to only updating the particular activity within the cache for that course.

I did say so earlier but apologies for the brevity which I guess didn't fully convey the meaning - https://github.com/gjb2048/moodle-format_topcoll/issues/114#issuecomment-950859595

So while I don't think there is a mysterious unidentified performance issue with other aspects of Collapsed Topics, there is definitely some issues with the additional activity info.

Now that we have a setup that isn't affected by the unnecessary rebuilds, I re-enabled all of the additional activity info on Wednesday afternoon. This lead to one of the courses refusing to load. It has 7000 users (mostly students) in it. So the looping stuff is an issue. I will re-enable tomorrow now that we've updated. but I was hoping the 'courseadditionalmoddatamaxstudents' was going to help with that, in case we wanted to re-enable the additional info

aspark21 avatar Nov 11 '21 23:11 aspark21