integreat-cms icon indicating copy to clipboard operation
integreat-cms copied to clipboard

Add deferrable Page tree constraints

Open svenseeberg opened this issue 7 months ago • 27 comments

Short description

Add deferrable constraints for the page tree that do not obstruct the update operation of the Treebeard tree.

We tested this and it works :tada:. If the constraints are not deferrable, move operations break. If the constraints are deferrable, updates on trees work as expected.

Proposed changes

  • Add constraints to PageTreeNode
    • Add unique constraint for (tree_id, lft)
    • Add unique constraint for (tree_id, rgt)
    • Add check constraint to ensure smaller lft than rgt values
  • Split core functionality of repair_tree command into cms app utils
    • Add MPTT Fixer class that calculates new lft/rgt/depth values for all tree at once
    • Add ShadowInstance class that can shadow a model, such that any changes to it have to be explicitly applied and can be diffed
  • Add @tree_mutex() decorator to replace @django.db.transactions.atomic (still uses it inside in addition to the mutex implementation)
    • Monkeypatch treebeards Node class to use djangos database cursor for its raw SQL queries so that changes get rolled back on failure
  • Automatically attempt to fix tree if a page move operation fails and try again
  • Add tests for @tree_mutex() and the repair_tree utility and management command

Missing

  • Front end error messages
  • Apply constraints and migration to AbstractTreeNode instead

Side effects

Resolved issues

Fixes: #


Pull Request Review Guidelines

svenseeberg avatar Dec 29 '23 00:12 svenseeberg

Less restrictive working version:

CREATE FUNCTION public.page_tree_limit_children() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF (OLD.tree_id=NEW.tree_id) AND (SELECT (COUNT(*)+1)*2 FROM cms_page WHERE tree_id=NEW.tree_id AND lft>NEW.lft AND rgt<NEW.rgt) < (NEW.rgt-NEW.lft-1)
   THEN
      RAISE EXCEPTION 'The node has fewer children than indicated by the difference between lft and rgt values (one missing node is allowed!).';
   END IF;
   RETURN NEW;
END;$$;
 
CREATE CONSTRAINT TRIGGER trigger_page_tree_limit_children AFTER INSERT OR UPDATE ON public.cms_page DEFERRABLE INITIALLY DEFERRED
   FOR EACH ROW EXECUTE PROCEDURE public.page_tree_limit_children();
CREATE FUNCTION public.unique_combined_lft_rgt() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF ((OLD.tree_id=NEW.tree_id) AND
      (
       SELECT COUNT(*) FROM 
       (
        SELECT COUNT(*) AS count FROM (
         SELECT id,lft AS combined FROM cms_page
         WHERE tree_id=NEW.tree_id
         UNION
         SELECT id,rgt AS combined FROM cms_page
         WHERE tree_id=NEW.tree_id
        ) lr GROUP BY combined
       ) counts WHERE count>1) > 2)
   THEN
      RAISE EXCEPTION 'lft or rgt value not unique enough.';
   END IF;
   RETURN NEW;
END;$$;
 
CREATE CONSTRAINT TRIGGER trigger_unique_combined_lft_rgt AFTER UPDATE ON public.cms_page DEFERRABLE INITIALLY DEFERRED
   FOR EACH ROW EXECUTE PROCEDURE public.unique_combined_lft_rgt();

Final verdict: while the constraint triggers can catch some errors, they are relatively toothless as we need to have a comparatively large margin of error. We decided to continue with the (lft, tree_id) and (rgt, tree_id) unique constraints only.

svenseeberg avatar Jan 20 '24 11:01 svenseeberg

stress test to reproduce collisions:

tools/integreat-cms-cli shell

from integreat_cms.cms.models import Page
from integreat_cms.cms.utils.tree_mutex import tree_mutex

@tree_mutex
def forth():
    alltag = Page.objects.get(id=21)
    sprachkurse = Page.objects.get(id=20)
    alltag.move(sprachkurse, 'last-child')

@tree_mutex
def back():
    alltag = Page.objects.get(id=21)
    sprachkurse = Page.objects.get(id=20)
    alltag.move(sprachkurse, 'right')

i=0

while True:
    forth()
    back()
    print(f"i = {i}")
    i += 1

…and proceed to happily move pages about in the CMS

PeterNerlich avatar Jan 20 '24 17:01 PeterNerlich

The constraints introduce issues with the repair_tree command as it is not yet build for updating all trees at once in one large transaction. And as Treebeard it will temporarily break the tree structure.

svenseeberg avatar Jan 20 '24 23:01 svenseeberg

mock treebeard.models.Node._get_database_cursor() within a custom atomic decorator (?) / context manager to always return the cursor of the connection of the currently wrapped function

EDIT: attempted here: https://github.com/digitalfabrik/integreat-cms/compare/7bc796c...bugfix/treebeard_transaction

PeterNerlich avatar Jan 20 '24 23:01 PeterNerlich

I had a crack at making tree_mutex be a transaction and inject its db cursor into treebeard, but it doesn't seem fully working yet: even though everything seems to work correctly now as far as I can tell, attempting a concurrent move operation causes one of the constraints to fail. Maybe this could be explained by the cache/db disparity we read about, I'm really not sure. I pushed my changes to a separate branch for now: https://github.com/digitalfabrik/integreat-cms/compare/b40e283...bugfix/treebeard_transaction

When you test this, note that I updated my comment about the test commands above, so that it actually also uses tree_mutex

PeterNerlich avatar Jan 21 '24 20:01 PeterNerlich

Maybe also risk a look at https://github.com/digitalfabrik/integreat-cms/compare/b40e283...bugfix/treebeard_transaction – I rebased it just now onto these latest changes. I didn't test it though. Have a look through the mutex at least, I think I saw a few improvements there (e.g. I moved the uuid for some reason, I think it was critical for the mutex to work correctly)

PeterNerlich avatar Mar 01 '24 19:03 PeterNerlich

Maybe also risk a look at b40e283...bugfix/treebeard_transaction – I rebased it just now onto these latest changes. I didn't test it though. Have a look through the mutex at least, I think I saw a few improvements there (e.g. I moved the uuid for some reason, I think it was critical for the mutex to work correctly)

Do you want to open a PR for that? In general this seems solid. .... edit: https://github.com/digitalfabrik/integreat-cms/pull/2680

svenseeberg avatar Mar 01 '24 20:03 svenseeberg

Ah never mind, I will just include the commit :D

svenseeberg avatar Mar 01 '24 20:03 svenseeberg

@timobrembeck I cannot figure out how to capture the command output in tests/core/management/commands/test_repair_tree.py (btw. ignore tests/cms/models/pages/tree_mutex.py, it is a stub that we just copied, as you can see) – the management command is run successfully and I see its output in pytests Captured stdout call, but apparently using the following to capture it doesn't work. I don't think it makes much sense to use the log fixture for it as we do in other tests, because logically we don't care which way the messages take, only that they eventually end up in the terminal stdout when we call the command.

def get_command_output(command: str, *args: str, **kwargs: Any) -> tuple[str, str]:
    out = StringIO()
    err = StringIO()
    call_command(
        command,
        *args,
        **kwargs,
        stdout=out,
        stderr=err,
    )
    return out.getvalue(), err.getvalue()

Do you have any idea?

PeterNerlich avatar Mar 03 '24 15:03 PeterNerlich

@PeterNerlich No sorry, at the first glance I don't see the reason for this, but I assume it has something to do with the custom Printer class. In my opinion, we should remove it to reduce complexity and use consistent logging behavior in all management commands.

timobrembeck avatar Mar 03 '24 17:03 timobrembeck

Then I guess I'll try that next then. Though if it works as I expect it to, it should be rather transparent. It just hands the strings to the desired function, be it from logging or otherwise.

PeterNerlich avatar Mar 03 '24 17:03 PeterNerlich

Unfortunately, it's still the same

PeterNerlich avatar Mar 03 '24 18:03 PeterNerlich

Unfortunately, it's still the same

As described here, we have to log to the logger integreat_cms.core.management.commands in order to get the logging to stdout because it's defined here: https://github.com/digitalfabrik/integreat-cms/blob/41b995e1304285c2d6853813a3168b672af840db/integreat_cms/core/settings.py#L685-L692

timobrembeck avatar Mar 03 '24 18:03 timobrembeck

That was what I was forgetting.

Forgive me, I was almost done, but then I discovered that the repair tree command would report the tree_id with its already fixed value, which I deemed confusing especially if only run in check mode, and I proceeded to build a fully model instance shadowing helper to record the diff… It is a rather big addition that I hope might be helpful elsewhere, but for now it contains a bit of unneeded code because it just felt like it belonged. Please do give me honest feedback on this.

PeterNerlich avatar Mar 05 '24 16:03 PeterNerlich

Code Climate has analyzed commit 15e9b647 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 86.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 82.0% (0.2% change).

View more on Code Climate.

codeclimate[bot] avatar Mar 06 '24 13:03 codeclimate[bot]

Okay, I had to re-run the CI pipeline a few times until it went through. At least twice I had a nonsensical failure (not finding the region Augsburg for example) – I fear that the introduced mutex might not be completely innocent here, though I'd strongly expect a different result than the object being treated as not existing in the database.

PeterNerlich avatar Mar 06 '24 13:03 PeterNerlich

could this maybe solve #1716?

PeterNerlich avatar Mar 12 '24 14:03 PeterNerlich

@timobrembeck Weird stuff is happening making tests fail and I have no clue why. tests/api/test_api_push_page.py::test_api_push_page_content - assert 404 == 403 Can you have a look at whether this PR is at fault or something else is going on?

PeterNerlich avatar Mar 25 '24 12:03 PeterNerlich

I'd still like to add a slightly more complex test to tests/cms/utils/test_repair_tree.py, but with the two tests for @tree_mutex() I finally think that this is ready to leave the draft status! :tada:

PeterNerlich avatar Mar 29 '24 15:03 PeterNerlich

Tests seemingly were a success this time, ~~but I'm not confident that my last commit regarding that is~~ are not 100% effective. It also needs cleanup for sure and likely should be refactored, even for such a band-aid solution.

PeterNerlich avatar Apr 02 '24 01:04 PeterNerlich

Even if I'm still investigating why the test don't get along as they should, I think the following log from test_tree_mutex is already very indicative of the effectiveness of tree_mutex (especially considering that test_rule_out_false_positive which does exactly the same thing but not using the mutex provokes a collision immediately in all of my observation so far)

test stdout of log messages from the mutex and patched cursor function
starting threads…
running 5-10-5 on contestant #21 with tree_mutex
    [#21] 0
running 5-10-5 on contestant #19 with tree_mutex
    [#19] 0
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (e04f1088-6fd0-492c-bd48-3048ef06bf6c). Waiting 0.1s…
   moving contestant #19…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (e04f1088-6fd0-492c-bd48-3048ef06bf6c). Waiting 0.1s…
OK moving contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.1849055290222168s (e04f1088-6fd0-492c-bd48-3048ef06bf6c)
   moving back contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (7ab36bba-e55a-4892-b50b-9ead0fdf26a7). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.11504745483398438s (7ab36bba-e55a-4892-b50b-9ead0fdf26a7)
    [#19] 1
   moving contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (1b4e1e43-b8f3-4323-89c7-d4887ba566e9). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.10081100463867188s (1b4e1e43-b8f3-4323-89c7-d4887ba566e9)
   moving back contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (13bf03cf-20c2-4e6e-8d53-d443419e9ac3). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (13bf03cf-20c2-4e6e-8d53-d443419e9ac3). Waiting 0.1s…
OK moving back contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.14224028587341309s (13bf03cf-20c2-4e6e-8d53-d443419e9ac3)
    [#19] 2
   moving contestant #19…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (0bc106e0-d09a-40ca-95bc-d7ff90e027cd). Waiting 0.1s…
OK moving contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.15320730209350586s (0bc106e0-d09a-40ca-95bc-d7ff90e027cd)
   moving back contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (062407b2-4e86-4d72-9ea8-987059b19628). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.12149930000305176s (062407b2-4e86-4d72-9ea8-987059b19628)
    [#19] 3
   moving contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (893a6f38-82ff-449f-a3f8-676d901cca7b). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.09466361999511719s (893a6f38-82ff-449f-a3f8-676d901cca7b)
   moving back contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (217eea57-26d7-4a00-ada7-ac5ad51fdbe9). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.11383414268493652s (217eea57-26d7-4a00-ada7-ac5ad51fdbe9)
    [#19] 4
   moving contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (883f0766-1eac-45ad-a5ec-70b68a052280). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (883f0766-1eac-45ad-a5ec-70b68a052280). Waiting 0.1s…
OK moving contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.21114397048950195s (883f0766-1eac-45ad-a5ec-70b68a052280)
   moving back contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (78c19334-ad1c-4b61-b45b-02139e74ea89). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #19!
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (78c19334-ad1c-4b61-b45b-02139e74ea89). Waiting 0.1s…
  Releasing MUTEX_PAGE_TREE after 0.11609148979187012s (78c19334-ad1c-4b61-b45b-02139e74ea89)
finished 5-10-5 of contestant #19 with tree_mutex
   moving contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #21!
  Releasing MUTEX_PAGE_TREE after 1.5792932510375977s (20570cbf-2cd3-43d5-aecf-c1b7092c99c8)
   moving back contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.11186051368713379s (eaef70a7-b8f1-4a84-9a59-ad880b7f2c21)
    [#21] 1
   moving contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.14497041702270508s (f7c1c7c6-1422-4b10-95ce-a21af933090d)
   moving back contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.20409369468688965s (77765b0e-4d76-4a83-a29e-5a2dfe91beba)
    [#21] 2
   moving contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.15691709518432617s (f4ea6dd6-62ea-4857-b0cf-8ff401d9a794)
   moving back contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.1440751552581787s (ab41fc5e-95fd-4ad4-9809-d5edba714965)
    [#21] 3
   moving contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.1037137508392334s (3e7a11a3-3c02-45f9-bcc4-d2a6d532be3f)
   moving back contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.17049074172973633s (53225650-6062-4861-a995-79b5e137f58d)
    [#21] 4
   moving contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.13613367080688477s (de91fa77-6826-4026-8f23-2f28dd279c5a)
   moving back contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.11805582046508789s (109eb81d-eb78-4129-98bb-83bcc71b2647)
finished 5-10-5 of contestant #21 with tree_mutex
joined threads!

In this case, it seems as soon as one thread released the lock, it acquired it again such that the threads executed after another. This makes sense as we wait a pretty long time (0.1s) before re-trying and the time window between releasing and trying to re-acquiring the mutex is as short as it can be on purpose.

PeterNerlich avatar Apr 04 '24 13:04 PeterNerlich

Wait, we decorated a few functions with @tree_mutex(). But since e.g. to move a page we catch a few exceptions. Doesn't that mean that these never reach @tree_mutex(), meaning that the transaction is committed instead of being rolled back?

PeterNerlich avatar Apr 09 '24 10:04 PeterNerlich