immer icon indicating copy to clipboard operation
immer copied to clipboard

slightly tweak implementation of empty collection nodes

Open dwightguth opened this issue 4 years ago • 12 comments

This change removes the const qualifier from the static objects containing empty rrbtree and champ nodes, and also modifies the methods slightly so that they return a reference. This change makes it possible for these types to be used as part of a garbage-collected runtime which makes use of a copying collector, because if the user specifies a heap_policy which allocates these nodes into a heap that needs to be relocated during garbage collection, the garbage collector will need to update the static pointer inside these methods in order to correctly relocate these nodes.

dwightguth avatar Oct 15 '21 19:10 dwightguth

Codecov Report

Merging #189 (2e754b9) into master (b58466e) will increase coverage by 0.00%. The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #189   +/-   ##
=======================================
  Coverage   90.89%   90.90%           
=======================================
  Files         104      104           
  Lines        9164     9167    +3     
=======================================
+ Hits         8330     8333    +3     
  Misses        834      834           
Impacted Files Coverage Δ
immer/detail/hamts/champ.hpp 90.59% <100.00%> (+0.04%) :arrow_up:
immer/detail/rbts/rrbtree.hpp 88.57% <100.00%> (+0.03%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update b58466e...2e754b9. Read the comment docs.

codecov-commenter avatar Oct 15 '21 19:10 codecov-commenter

Interesting. Do you have an example of such runtime?

arximboldi avatar Oct 15 '21 20:10 arximboldi

We make use of immer in our compiler's runtime. The compiler is for a programming language called K (https://github.com/kframework/k) and its backend translates K code into LLVM code (https://github.com/kframework/llvm-backend). The built-in implementation of maps, lists, and sets in K are implemented using immer::map, immer::flex_vector, and immer::set respectively. The runtime uses a generational copying collector, so we need to be able to relocate anything that is allocated into the garbage-collected heap, which includes internal nodes to immer collections because we make use of a custom heap_policy. Immer uses this heap policy to allocate empty nodes, so we need to be able to relocate these nodes when garbage collection happens, otherwise they will point into the wrong semispace of the garbage collector and thus be pointing to invalid memory.

As a result, we need to be able to modify the static pointer inside the rrbtree::empty_root, rrbtree::empty_tail, and champ::empty functions. We were using an older version of immer before (circa June 2020) in which the corresponding functions returned references, so we just took the address of the reference and then const_casted them, and that suited our purposes. However, in the most recent version of immer, these functions don't return references, so we are unable to get the actual pointer to the static variable, which means that we're unable to adjust that pointer to point into the correct semispace, which means that future calls to these functions after a collection cycle will introduce corrupted memory.

In general, any relocating garbage-collected runtime needs to be able to adjust pointers stored in static variables to point to the new location of those objects on the heap whenever the objects they point to are relocated. This PR will make that possible in a way that it's not currently.

dwightguth avatar Oct 18 '21 19:10 dwightguth

Feel free to take a look at the associated PR (https://github.com/kframework/llvm-backend/pull/438) to see how this is used in practice. Right now we have solved the issue by making a fork of your repository, but for obvious reasons this is not ideal long-term.

dwightguth avatar Oct 18 '21 19:10 dwightguth

I'm not entirely sure why the tests that failed failed. They don't seem to relate to the change that I made. Is there anything you want me to do about them?

dwightguth avatar Oct 18 '21 19:10 dwightguth

It's failing because of this: https://github.com/arximboldi/immer/pull/191 I will merge in a sec and things should become green again.

arximboldi avatar Oct 19 '21 10:10 arximboldi

@arximboldi looks like you still have another issue with the GitHub action for gnu-9 Release. Something about an ssh key being missing it looks like. That appears to be the only check still currently failing, although llvm-10 Release was canceled early for some reason.

dwightguth avatar Oct 23 '21 19:10 dwightguth

@arximboldi any update on this?

dwightguth avatar Nov 03 '21 16:11 dwightguth

Hi @dwightguth!

I think this failed because the "pull request" check does not expose my SSH key inside the action, which is needed to upload those benchmark results. This is for security reasons (you could make a malicious PR that just sends the ssh key to you :smile: )Then the other check was simily cancelled as soon as the main one failed. I am gonna make a new PR to fix this, so that part of the action, which is superfluous in this context, does not attempt to run at all for PR's.

On the other hand: is it ok if I wait to merge this until you clean up #190 along the lines outlined in the comment there? Maybe in the process of introducing the new policies you find an abstraction that would cover this case as well?

Also: thanks again for all this effort in integrating the library in your system. It is definitelly a super interesting use-case!

arximboldi avatar Nov 05 '21 11:11 arximboldi

Sure, I'm in no rush to get these merged as I already have a working version in our fork of the repository that will suffice for the time being. I will go ahead and work on creating the memory policy in the other PR and we can definitely merge that one first.

dwightguth avatar Nov 05 '21 17:11 dwightguth

@arximboldi I originally thought I could modify the PR to take advantage of the template parameter introduced in #190 in order to only expose a reference to these objects if the template parameter was enabled. However, there seems to be a problem. Namely, right now, the pointer itself is a static variable at block scope. In order to do what I just described, I would have to make it be a private static data member. When I tried this, though, it ended up creating a regression in test/flex_vector-issue-47, presumably because it reintroduced the static initialization order woes that you had had previously.

I think the best solution for now is simply to leave this PR as is, even though it does expose a reference to the static allocation for the empty nodes. I think this is probably fine since it was exposed before issue #47 was fixed anyway. Let me know if you want me to keep trying though. I can dig into the initialization order stuff and see if I can work around it if you would like.

dwightguth avatar Nov 08 '21 22:11 dwightguth

btw, as I commented on the other PR, it seems that your SSH key issue is not completely resolved yet.

dwightguth avatar Nov 08 '21 22:11 dwightguth