
Add deferrable Page tree constraints

Open svenseeberg opened this issue 1 year ago • 29 comments

Short description

Add deferrable constraints for the page tree that do not obstruct the update operation of the Treebeard tree.

We tested this and it works :tada:. If the constraints are not deferrable, move operations break. If the constraints are deferrable, updates on trees work as expected.

Proposed changes

  • Add constraints to AbstractTreeNode
    • Add unique constraint for (tree_id, lft)
    • Add unique constraint for (tree_id, rgt)
    • Add check constraint to ensure smaller lft than rgt values
  • Split core functionality of repair_tree command into cms app utils
    • Add MPTT Fixer class that calculates new lft/rgt/depth values for all trees at once
    • Add ShadowInstance class that can shadow a model, such that any changes to it have to be explicitly applied and can be diffed
  • Add @tree_mutex() decorator to replace @django.db.transaction.atomic (which it still uses internally in addition to the mutex implementation)
    • Monkeypatch treebeard's Node class to use Django's database cursor for its raw SQL queries so that changes get rolled back on failure
  • Automatically attempt to fix tree if a page move operation fails and try again
  • Add tests for @tree_mutex() and the repair_tree utility and management command

Missing

  • Front end error messages

Side effects

Resolved issues

Potentially fixes: #1518, #1716


Pull Request Review Guidelines

Note to reviewers: You don't need to read the comments in this thread, except maybe the very most recent ones. The earliest possibly helpful one is this one outlining a few assumptions at the core of the architecture of @tree_mutex().

svenseeberg avatar Dec 29 '23 00:12 svenseeberg

Less restrictive working version:

CREATE FUNCTION public.page_tree_limit_children() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF (OLD.tree_id=NEW.tree_id) AND (SELECT (COUNT(*)+1)*2 FROM cms_page WHERE tree_id=NEW.tree_id AND lft>NEW.lft AND rgt<NEW.rgt) < (NEW.rgt-NEW.lft-1)
   THEN
      RAISE EXCEPTION 'The node has fewer children than indicated by the difference between lft and rgt values (one missing node is allowed!).';
   END IF;
   RETURN NEW;
END;$$;
 
CREATE CONSTRAINT TRIGGER trigger_page_tree_limit_children AFTER INSERT OR UPDATE ON public.cms_page DEFERRABLE INITIALLY DEFERRED
   FOR EACH ROW EXECUTE PROCEDURE public.page_tree_limit_children();
CREATE FUNCTION public.unique_combined_lft_rgt() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF ((OLD.tree_id=NEW.tree_id) AND
      (
       SELECT COUNT(*) FROM 
       (
        SELECT COUNT(*) AS count FROM (
         SELECT id,lft AS combined FROM cms_page
         WHERE tree_id=NEW.tree_id
         UNION
         SELECT id,rgt AS combined FROM cms_page
         WHERE tree_id=NEW.tree_id
        ) lr GROUP BY combined
       ) counts WHERE count>1) > 2)
   THEN
      RAISE EXCEPTION 'lft or rgt value not unique enough.';
   END IF;
   RETURN NEW;
END;$$;
 
CREATE CONSTRAINT TRIGGER trigger_unique_combined_lft_rgt AFTER UPDATE ON public.cms_page DEFERRABLE INITIALLY DEFERRED
   FOR EACH ROW EXECUTE PROCEDURE public.unique_combined_lft_rgt();

Final verdict: while the constraint triggers can catch some errors, they are relatively toothless, as we need to allow a comparatively large margin of error. We decided to continue with only the (lft, tree_id) and (rgt, tree_id) unique constraints.
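
For reference, the two unique constraints can be declared deferrable directly in Django (3.1+) on the abstract tree model; a sketch, with illustrative constraint names:

```python
from django.db import models


class AbstractTreeNode(models.Model):
    # ... existing treebeard fields: tree_id, lft, rgt, depth ...

    class Meta:
        abstract = True
        constraints = [
            # Deferred until commit, so treebeard may temporarily
            # violate them mid-move within a transaction.
            models.UniqueConstraint(
                fields=["tree_id", "lft"],
                name="tree_id_lft_unique",
                deferrable=models.Deferrable.DEFERRED,
            ),
            models.UniqueConstraint(
                fields=["tree_id", "rgt"],
                name="tree_id_rgt_unique",
                deferrable=models.Deferrable.DEFERRED,
            ),
        ]
```

Note that PostgreSQL only allows UNIQUE, PRIMARY KEY, FOREIGN KEY and EXCLUDE constraints to be deferrable; a plain CHECK constraint for lft < rgt would be enforced per statement, which is why the trigger-based approach above was explored for the remaining checks.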

svenseeberg avatar Jan 20 '24 11:01 svenseeberg

stress test to reproduce collisions:

tools/integreat-cms-cli shell

from integreat_cms.cms.models import Page
from integreat_cms.cms.utils.tree_mutex import tree_mutex

@tree_mutex
def forth():
    alltag = Page.objects.get(id=21)
    sprachkurse = Page.objects.get(id=20)
    alltag.move(sprachkurse, 'last-child')

@tree_mutex
def back():
    alltag = Page.objects.get(id=21)
    sprachkurse = Page.objects.get(id=20)
    alltag.move(sprachkurse, 'right')

i=0

while True:
    forth()
    back()
    print(f"i = {i}")
    i += 1

…and proceed to happily move pages about in the CMS

PeterNerlich avatar Jan 20 '24 17:01 PeterNerlich

The constraints introduce issues with the repair_tree command, as it is not yet built to update all trees at once in one large transaction. And, like Treebeard, it will temporarily break the tree structure.

svenseeberg avatar Jan 20 '24 23:01 svenseeberg

mock treebeard.models.Node._get_database_cursor() within a custom atomic decorator (?) / context manager to always return the cursor of the connection of the currently wrapped function

EDIT: attempted here: https://github.com/digitalfabrik/integreat-cms/compare/7bc796c...bugfix/treebeard_transaction
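
The mocking idea can be sketched like this. FakeNode and the string cursors are self-contained stand-ins; in the real code the target is treebeard.models.Node._get_database_cursor and the injected cursor comes from the connection of the surrounding Django transaction:

```python
import contextlib


# Stand-in for treebeard.models.Node, which fetches its own raw-SQL
# cursor via a classmethod.
class FakeNode:
    @classmethod
    def _get_database_cursor(cls, action):
        return "autocommit-cursor"


@contextlib.contextmanager
def patched_cursor(cursor):
    """Temporarily make every raw-SQL call on FakeNode use `cursor`,
    so its writes join our transaction and roll back on failure."""
    original = FakeNode.__dict__["_get_database_cursor"]
    FakeNode._get_database_cursor = classmethod(lambda cls, action: cursor)
    try:
        yield
    finally:
        FakeNode._get_database_cursor = original


with patched_cursor("transaction-cursor"):
    inside = FakeNode._get_database_cursor("write")
after = FakeNode._get_database_cursor("write")
```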

PeterNerlich avatar Jan 20 '24 23:01 PeterNerlich

I had a crack at making tree_mutex be a transaction and inject its db cursor into treebeard, but it doesn't seem to be fully working yet: even though everything now seems to work correctly as far as I can tell, attempting a concurrent move operation causes one of the constraints to fail. Maybe this could be explained by the cache/db disparity we read about, I'm really not sure. I pushed my changes to a separate branch for now: https://github.com/digitalfabrik/integreat-cms/compare/b40e283...bugfix/treebeard_transaction

When you test this, note that I updated my comment about the test commands above, so that it actually also uses tree_mutex

PeterNerlich avatar Jan 21 '24 20:01 PeterNerlich

Maybe also risk a look at https://github.com/digitalfabrik/integreat-cms/compare/b40e283...bugfix/treebeard_transaction – I rebased it just now onto these latest changes. I didn't test it though. Have a look through the mutex at least, I think I saw a few improvements there (e.g. I moved the uuid for some reason, I think it was critical for the mutex to work correctly)

PeterNerlich avatar Mar 01 '24 19:03 PeterNerlich

Maybe also risk a look at b40e283...bugfix/treebeard_transaction – I rebased it just now onto these latest changes. I didn't test it though. Have a look through the mutex at least, I think I saw a few improvements there (e.g. I moved the uuid for some reason, I think it was critical for the mutex to work correctly)

Do you want to open a PR for that? In general this seems solid. .... edit: https://github.com/digitalfabrik/integreat-cms/pull/2680

svenseeberg avatar Mar 01 '24 20:03 svenseeberg

Ah never mind, I will just include the commit :D

svenseeberg avatar Mar 01 '24 20:03 svenseeberg

@timobrembeck I cannot figure out how to capture the command output in tests/core/management/commands/test_repair_tree.py (btw. ignore tests/cms/models/pages/tree_mutex.py, it is a stub that we just copied, as you can see) – the management command runs successfully and I see its output in pytest's Captured stdout call, but apparently using the following to capture it doesn't work. I don't think it makes much sense to use the log fixture for it as we do in other tests, because logically we don't care which route the messages take, only that they eventually end up in the terminal stdout when we call the command.

def get_command_output(command: str, *args: str, **kwargs: Any) -> tuple[str, str]:
    out = StringIO()
    err = StringIO()
    call_command(
        command,
        *args,
        **kwargs,
        stdout=out,
        stderr=err,
    )
    return out.getvalue(), err.getvalue()

Do you have any idea?

PeterNerlich avatar Mar 03 '24 15:03 PeterNerlich

@PeterNerlich No, sorry, at first glance I don't see the reason for this, but I assume it has something to do with the custom Printer class. In my opinion, we should remove it to reduce complexity and use consistent logging behavior in all management commands.

timobrembeck avatar Mar 03 '24 17:03 timobrembeck

Then I guess I'll try that next. Though if it works as I expect, it should be rather transparent: it just hands the strings to the desired function, be it from logging or otherwise.

PeterNerlich avatar Mar 03 '24 17:03 PeterNerlich

Unfortunately, it's still the same

PeterNerlich avatar Mar 03 '24 18:03 PeterNerlich

Unfortunately, it's still the same

As described here, we have to log to the logger integreat_cms.core.management.commands in order to get the logging to stdout because it's defined here: https://github.com/digitalfabrik/integreat-cms/blob/41b995e1304285c2d6853813a3168b672af840db/integreat_cms/core/settings.py#L685-L692
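
A self-contained illustration of why the StringIO approach captures nothing: the messages go through the logging framework, not self.stdout, so call_command(stdout=out) never sees them. They have to be captured on the logger itself (or via pytest's caplog fixture); the logger name follows the settings excerpt linked above:

```python
import io
import logging

# Logger name taken from the settings excerpt linked above.
logger = logging.getLogger("integreat_cms.core.management.commands")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep this demo's output out of the root logger

# call_command(..., stdout=StringIO()) cannot see these messages because
# they bypass self.stdout entirely; attach a handler to the logger instead.
buffer = io.StringIO()
logger.addHandler(logging.StreamHandler(buffer))

logger.info("1 tree checked")  # illustrative message, not real output
captured = buffer.getvalue()
```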

timobrembeck avatar Mar 03 '24 18:03 timobrembeck

That was what I was forgetting.

Forgive me, I was almost done, but then I discovered that the repair tree command would report the tree_id with its already fixed value, which I deemed confusing, especially if only run in check mode, and I proceeded to build a full model-instance shadowing helper to record the diff… It is a rather big addition that I hope might be helpful elsewhere, but for now it contains a bit of unneeded code because it just felt like it belonged. Please do give me honest feedback on this.

PeterNerlich avatar Mar 05 '24 16:03 PeterNerlich

Code Climate has analyzed commit ac14338c and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 87.9% (50% is the threshold).

This pull request will bring the total coverage in the repository to 83.4% (0.2% change).

View more on Code Climate.

codeclimate[bot] avatar Mar 06 '24 13:03 codeclimate[bot]

Okay, I had to re-run the CI pipeline a few times until it went through. At least twice I had a nonsensical failure (not finding the region Augsburg for example) – I fear that the introduced mutex might not be completely innocent here, though I'd strongly expect a different result than the object being treated as not existing in the database.

PeterNerlich avatar Mar 06 '24 13:03 PeterNerlich

could this maybe solve #1716?

PeterNerlich avatar Mar 12 '24 14:03 PeterNerlich

@timobrembeck Weird stuff is happening making tests fail and I have no clue why. tests/api/test_api_push_page.py::test_api_push_page_content - assert 404 == 403 Can you have a look at whether this PR is at fault or something else is going on?

PeterNerlich avatar Mar 25 '24 12:03 PeterNerlich

I'd still like to add a slightly more complex test to tests/cms/utils/test_repair_tree.py, but with the two tests for @tree_mutex() I finally think that this is ready to leave the draft status! :tada:

PeterNerlich avatar Mar 29 '24 15:03 PeterNerlich

Tests seemingly were a success this time, but my last commits regarding that are not 100% effective. It also needs cleanup for sure and likely should be refactored, even for such a band-aid solution.

PeterNerlich avatar Apr 02 '24 01:04 PeterNerlich

Even if I'm still investigating why the tests don't get along as they should, I think the following log from test_tree_mutex is already very indicative of the effectiveness of tree_mutex (especially considering that test_rule_out_false_positive, which does exactly the same thing but without the mutex, provokes a collision immediately in all of my observations so far):

test stdout of log messages from the mutex and patched cursor function
starting threads…
running 5-10-5 on contestant #21 with tree_mutex
    [#21] 0
running 5-10-5 on contestant #19 with tree_mutex
    [#19] 0
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (e04f1088-6fd0-492c-bd48-3048ef06bf6c). Waiting 0.1s…
   moving contestant #19…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (e04f1088-6fd0-492c-bd48-3048ef06bf6c). Waiting 0.1s…
OK moving contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.1849055290222168s (e04f1088-6fd0-492c-bd48-3048ef06bf6c)
   moving back contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (7ab36bba-e55a-4892-b50b-9ead0fdf26a7). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.11504745483398438s (7ab36bba-e55a-4892-b50b-9ead0fdf26a7)
    [#19] 1
   moving contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (1b4e1e43-b8f3-4323-89c7-d4887ba566e9). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.10081100463867188s (1b4e1e43-b8f3-4323-89c7-d4887ba566e9)
   moving back contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (13bf03cf-20c2-4e6e-8d53-d443419e9ac3). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (13bf03cf-20c2-4e6e-8d53-d443419e9ac3). Waiting 0.1s…
OK moving back contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.14224028587341309s (13bf03cf-20c2-4e6e-8d53-d443419e9ac3)
    [#19] 2
   moving contestant #19…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (0bc106e0-d09a-40ca-95bc-d7ff90e027cd). Waiting 0.1s…
OK moving contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.15320730209350586s (0bc106e0-d09a-40ca-95bc-d7ff90e027cd)
   moving back contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (062407b2-4e86-4d72-9ea8-987059b19628). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.12149930000305176s (062407b2-4e86-4d72-9ea8-987059b19628)
    [#19] 3
   moving contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (893a6f38-82ff-449f-a3f8-676d901cca7b). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.09466361999511719s (893a6f38-82ff-449f-a3f8-676d901cca7b)
   moving back contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (217eea57-26d7-4a00-ada7-ac5ad51fdbe9). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.11383414268493652s (217eea57-26d7-4a00-ada7-ac5ad51fdbe9)
    [#19] 4
   moving contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (883f0766-1eac-45ad-a5ec-70b68a052280). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (883f0766-1eac-45ad-a5ec-70b68a052280). Waiting 0.1s…
OK moving contestant #19!
  Releasing MUTEX_PAGE_TREE after 0.21114397048950195s (883f0766-1eac-45ad-a5ec-70b68a052280)
   moving back contestant #19…
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (78c19334-ad1c-4b61-b45b-02139e74ea89). Waiting 0.1s…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #19!
  Failed to acquire MUTEX_PAGE_TREE as 20570cbf-2cd3-43d5-aecf-c1b7092c99c8: MUTEX_page_TREE present (78c19334-ad1c-4b61-b45b-02139e74ea89). Waiting 0.1s…
  Releasing MUTEX_PAGE_TREE after 0.11609148979187012s (78c19334-ad1c-4b61-b45b-02139e74ea89)
finished 5-10-5 of contestant #19 with tree_mutex
   moving contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #21!
  Releasing MUTEX_PAGE_TREE after 1.5792932510375977s (20570cbf-2cd3-43d5-aecf-c1b7092c99c8)
   moving back contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.11186051368713379s (eaef70a7-b8f1-4a84-9a59-ad880b7f2c21)
    [#21] 1
   moving contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.14497041702270508s (f7c1c7c6-1422-4b10-95ce-a21af933090d)
   moving back contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.20409369468688965s (77765b0e-4d76-4a83-a29e-5a2dfe91beba)
    [#21] 2
   moving contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.15691709518432617s (f4ea6dd6-62ea-4857-b0cf-8ff401d9a794)
   moving back contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.1440751552581787s (ab41fc5e-95fd-4ad4-9809-d5edba714965)
    [#21] 3
   moving contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.1037137508392334s (3e7a11a3-3c02-45f9-bcc4-d2a6d532be3f)
   moving back contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.17049074172973633s (53225650-6062-4861-a995-79b5e137f58d)
    [#21] 4
   moving contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.13613367080688477s (de91fa77-6826-4026-8f23-2f28dd279c5a)
   moving back contestant #21…
someone is getting our monkeypatched db cursor (default)! <class 'integreat_cms.cms.models.pages.page.Page'>, write
OK moving back contestant #21!
  Releasing MUTEX_PAGE_TREE after 0.11805582046508789s (109eb81d-eb78-4129-98bb-83bcc71b2647)
finished 5-10-5 of contestant #21 with tree_mutex
joined threads!

In this case, it seems that as soon as one thread released the lock, it acquired it again, such that the threads executed one after another. This makes sense, as we wait a pretty long time (0.1s) before retrying, while the time window between releasing and re-acquiring the mutex is deliberately as short as possible.

PeterNerlich avatar Apr 04 '24 13:04 PeterNerlich

Wait, we decorated a few functions with @tree_mutex(). But to move a page, for example, we catch a few exceptions. Doesn't that mean that these exceptions never reach @tree_mutex(), so the transaction is committed instead of being rolled back?
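
A self-contained illustration of the concern, using sqlite3 in place of the Django transaction (table and names are made up): if the exception is swallowed inside the decorated function, the wrapping transaction never sees it and commits the partial write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, lft INTEGER)")


def move(connection):
    connection.execute("INSERT INTO page (lft) VALUES (1)")  # partial write
    try:
        raise ValueError("tree broken mid-move")
    except ValueError:
        pass  # caught here -- never propagates to the wrapping transaction


with conn:  # commits unless an exception escapes this block
    move(conn)

count = conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]
# count == 1: the partial write was committed, not rolled back
```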

PeterNerlich avatar Apr 09 '24 10:04 PeterNerlich

Wait, we decorated a few functions with @tree_mutex(). But to move a page, for example, we catch a few exceptions. Doesn't that mean that these exceptions never reach @tree_mutex(), so the transaction is committed instead of being rolled back?

AFAICT no? Should be easy to test though :D

svenseeberg avatar Jun 24 '24 13:06 svenseeberg

New thought – I think I found a flaw in my thinking which could solve this. Please correct my assumptions:

  1. We can successfully force treebeard to use transactions by overriding its internal database-cursor getter to return a prepared cursor that has opened a transaction
  2. As long as no other treebeard function is called from other contexts during that call, everything works fine. But if there are concurrent calls to treebeard from contexts that share the memory for the imports, they wrongly also use the transaction – we have to either prevent concurrent calls with a lock or hand out a different db cursor depending on the detected context
  3. Because WSGI spawns individual python processes, we cannot use locks to fence the code or shared data structures to govern which db cursor to hand out
  4. The attempt to use traceback.extract_stack() to discern contexts from each other seems insufficient, as two concurrent calls of the function could possess indistinguishable call stacks
  5. The attempt to use cache.get_or_set() to implement a custom mutex using the cache seems insufficient, as the assumption of the get-if-exists-else-set action being atomic proves false – there are race conditions where the entry is overwritten because the cache seemed empty while another function wrote its value in the meantime
  6. New: To isolate treebeard calls, we do not need a mutex that works across processes. It should be sufficient to rely on locks within contexts with shared memory, where they are useful, while different processes will have a separate state of treebeard and thus a separate db cursor anyway.

PeterNerlich avatar Jun 24 '24 17:06 PeterNerlich

New thought – I think I found a flaw in my thinking which could solve this. Please correct my assumptions:

  1. We can successfully force treebeard to use transactions by overriding its internal database-cursor getter to return a prepared cursor that has opened a transaction

:+1:

  2. As long as no other treebeard function is called from other contexts during that call, everything works fine. But if there are concurrent calls to treebeard from contexts that share the memory for the imports, they wrongly also use the transaction – we have to either prevent concurrent calls with a lock or hand out a different db cursor depending on the detected context

When would this happen? It does not sound very likely to me at first glance that this would actually happen. We do not have threading enabled. https://modwsgi.readthedocs.io/en/master/user-guides/processes-and-threading.html states that "concurrent requests [are] being handled in distinct threads executing within those processes".

  3. Because WSGI spawns individual python processes, we cannot use locks to fence the code or shared data structures to govern which db cursor to hand out

:+1:

  4. The attempt to use traceback.extract_stack() to discern contexts from each other seems insufficient, as two concurrent calls of the function could possess indistinguishable call stacks

:+1:

  5. The attempt to use cache.get_or_set() to implement a custom mutex using the cache seems insufficient, as the assumption of the get-if-exists-else-set action being atomic proves false – there are race conditions where the entry is overwritten because the cache seemed empty while another function wrote its value in the meantime

:+1:

  6. New: To isolate treebeard calls, we do not need a mutex that works across processes. It should be sufficient to rely on locks within contexts with shared memory, where they are useful, while different processes will have a separate state of treebeard and thus a separate db cursor anyway.

This is a totally different root cause than the one we initially wanted to address? We assumed that we have an issue with multiple processes continually rewriting the table while others are doing the same.

svenseeberg avatar Jun 24 '24 17:06 svenseeberg

  2. […] But if there are concurrent calls to treebeard from contexts that share the memory for the imports, they wrongly also use the transaction

When would this happen? […] We do not have threading enabled. […]

Maybe it's my fault for trying to consider these cases at all and making it even more complicated. 2. mainly reflects my mental model of what modifying properties of imports means for us, and I used threading in the tests to provoke and prevent race conditions.

We assumed that have an issue with multiple processes continually rewriting the table while others are doing the same.

Yes, this was the original issue, before all the issues that arose while trying to fix it. The solution is to let the database handle this in a transaction. Due to our hacky fix it's just difficult to keep everything else from using that very transaction in the meantime as well.

PeterNerlich avatar Jun 24 '24 18:06 PeterNerlich

Yes, this was the original issue, before all the issues that arose while trying to fix it. The solution is to let the database handle this in a transaction. Due to our hacky fix it's just difficult to keep everything else from using that very transaction in the meantime as well.

Agreed. But AFAICT this is not an issue, as long as it is the same process that would create the same modifications to the database anyways. So maybe this is even "wanted"?

svenseeberg avatar Jun 24 '24 18:06 svenseeberg

I think I more or less got it to a reasonable state. Again. Except for more git history cleanup and one failing test (→details).

PeterNerlich avatar Jul 21 '24 20:07 PeterNerlich

@svenseeberg We originally started with finding constraints that break when the tree is broken in multiple degrees of freedom, giving just enough leeway for treebeard to do its efficient operations that temporarily break the tree a little. If we managed to make treebeard use transactions, shouldn't we be able to create strict constraints now, because they will be deferred until the treebeard operations are done? (= when our transaction is closed at the end of our context)

PeterNerlich avatar Jul 22 '24 08:07 PeterNerlich