typo3-realurl

Garbage collection of urldata table should not be done during FE requests

Open liayn opened this issue 7 years ago • 8 comments

Is there a chance to completely disable garbage collection of the urldata table and move that code into a scheduler task? The gc runs quite often on one of our sites, and the delete statement takes more than 2s on the DB and shows up in our slow query log. In our opinion, having such heavy operations run randomly during FE requests is a really bad idea: you see spikes in the response time monitoring but have a hard time finding the reason for them.

liayn, Nov 23 '16 17:11

The only reason why I did not do it that way is that people tend to forget to add those tasks.

However, I tried to make it run less often.

First, it can run ONLY when a new URL is inserted, not when one is read. Second, it does not run on every insert but on only 20% of insertions.
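
The trigger described above can be sketched roughly as follows. This is a minimal illustration in Python, not the extension's actual PHP code: `UrlCache`, `put`, `get`, `collect_garbage`, and `GC_PROBABILITY` are hypothetical names, and the cleanup itself is only a placeholder.

```python
import random

# Assumed from the "20% of insertions" figure above.
GC_PROBABILITY = 0.2


class UrlCache:
    """Hypothetical stand-in for the URL cache; not the extension's real class."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.entries = {}
        self.gc_runs = 0

    def get(self, key):
        # Reads never trigger garbage collection.
        return self.entries.get(key)

    def put(self, key, url):
        self.entries[key] = url
        # Each insert independently triggers gc with probability GC_PROBABILITY.
        if self.rng.random() < GC_PROBABILITY:
            self.collect_garbage()

    def collect_garbage(self):
        # Placeholder for the expensive DELETE on the urldata table.
        self.gc_runs += 1
```

Because the decision is made per insert, a site that inserts many URLs pays the cleanup cost proportionally more often.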

You can improve performance by pre-caching URLs with any crawler (like wget on Linux or Integrity on Mac). This should be done for a production site anyway.
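
As a rough idea of what such a warm-up could look like without a full crawler, here is a minimal Python sketch. It assumes you already have a list of page URLs (for example from a sitemap) and does not follow links the way wget or Integrity would; `warm_up` and its `fetch` parameter are hypothetical names introduced for illustration.

```python
from urllib.request import urlopen


def warm_up(urls, fetch=None, timeout=10):
    """Request each URL once so its cache entry exists before visitors arrive."""

    def default_fetch(url):
        # Plain GET; returns the HTTP status code of the response.
        with urlopen(url, timeout=timeout) as response:
            return response.status

    fetch = fetch or default_fetch
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:
            # URLError and HTTPError are OSError subclasses; record and continue.
            results[url] = exc
    return results
```

The `fetch` argument only exists so the loop can be tried out without network access.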

dmitryd, Nov 24 '16 14:11

I know that reasoning.

The 20% figure is rather misleading: you draw a random number between 0 and max_random (something like MAX_INT), and the gc runs for 20% of those values. So this is not 20% of all write operations in any guaranteed sense; it could potentially happen on every write operation.

I'm not asking to remove it completely, but to move that code into a dedicated class and to add an option that disables the automatic gc in favor of a scheduler task. This setup is meant for people who know what they are doing. (counting myself in that group now ;-))
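
To illustrate the point, here is a small simulation in Python. It assumes each write independently triggers the gc with probability 0.2; `longest_gc_streak` is a hypothetical helper, not code from the extension. The overall share of writes that run the gc comes out near 20%, but nothing prevents several consecutive writes from all running it.

```python
import random


def longest_gc_streak(num_writes, probability=0.2, seed=None):
    """Return (gc_runs, longest run of consecutive writes that triggered gc)."""
    rng = random.Random(seed)
    runs = longest = current = 0
    for _ in range(num_writes):
        if rng.random() < probability:
            runs += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return runs, longest


runs, longest = longest_gc_streak(100_000, seed=1)
print(f"gc ran on {runs / 100_000:.1%} of writes; longest back-to-back streak: {longest}")
```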

liayn, Nov 24 '16 15:11

Would you accept a change if I propose a PR?

liayn, Dec 12 '16 09:12

Depends on the change.

dmitryd, Dec 12 '16 10:12

As written above, I would extract the code into a separate class, add a scheduler task around it, and add an option to disable the automatic gc. With the automatic gc left on by default, this would stay fully backwards compatible.

liayn, Dec 12 '16 11:12

"and would add an option to disable the automatic gc"

Currently, configuration is not passed directly to the cache classes, so there would have to be a setter for that.
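
The proposed split could look roughly like this. It is a minimal sketch in Python rather than the extension's PHP, and every name in it (`UrlDataGarbageCollector`, `set_auto_gc`, `GarbageCollectionTask`) is hypothetical. The expiry-based cleanup and the `(url, expires_at)` layout are also assumptions made only to keep the example self-contained.

```python
import random
import time


class UrlDataGarbageCollector:
    """Hypothetical home for the extracted cleanup logic."""

    def __init__(self, entries):
        self.entries = entries  # maps key -> (url, expires_at); assumed layout

    def run(self, now=None):
        now = time.time() if now is None else now
        expired = [k for k, (_, expires_at) in self.entries.items() if expires_at <= now]
        for key in expired:
            del self.entries[key]
        return len(expired)


class UrlCache:
    def __init__(self, rng=None):
        self.entries = {}
        self.collector = UrlDataGarbageCollector(self.entries)
        self.rng = rng or random.Random()
        self.auto_gc = True  # default keeps the current behavior

    def set_auto_gc(self, enabled):
        # The setter through which configuration would reach the cache class.
        self.auto_gc = bool(enabled)

    def put(self, key, url, expires_at):
        self.entries[key] = (url, expires_at)
        if self.auto_gc and self.rng.random() < 0.2:
            self.collector.run()


class GarbageCollectionTask:
    """What a scheduler task might wrap; runs outside FE requests."""

    def __init__(self, collector):
        self.collector = collector

    def execute(self):
        self.collector.run()
        return True
```

With `set_auto_gc(False)`, inserts never pay for cleanup, and the task becomes the only place where entries are deleted.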

dmitryd, Dec 12 '16 15:12

@dmitryd @liayn - I've had massive performance problems because of the garbage collection (it ran very, very frequently, since I run a site with more than 500,000 URLs). I've opened my own ticket for that: issue #371.

If we have to have garbage collection, the maximum number of URLs should be configurable, and it should not be executed on FE requests, since that would happen rather often if perfectly valid URLs are deleted and then regenerated.

zechendorf, Jan 11 '17 10:01

@zechendorf I still have this on my to-do list: making this a scheduler task and running the gc asynchronously.

liayn, Jan 11 '17 17:01