Duplicate edges in Marksweep
I was trying to test PR #310 (mainly src/util/side_metadata/sanity.rs) with marksweep when I detected random panics due to duplicate edges in both JikesRVM and OpenJDK bindings.
This is the assertion which fails:
https://github.com/mmtk/mmtk-core/blob/1fc73b360e128e3ce072538f213b4013cae91b8b/src/util/edge_logger.rs#L26
Note:
After solving this issue, we need to add CI checks for the side metadata sanity checker (under extreme_assertions and with marksweep plan)
I found a source of duplicate edges in marksweep. In the following method in MallocSpace:
pub fn trace_object<T: TransitiveClosure>(
&self,
trace: &mut T,
object: ObjectReference,
) -> ObjectReference {
if object.is_null() {
return object;
}
let address = object.to_address();
assert!(
self.in_space(object),
"Cannot mark an object {} that was not alloced by malloc.",
address,
);
if !is_marked::<VM>(object) {
set_mark_bit::<VM>(object);
trace.process_node(object);
}
object
}
multiple workers may run if !is_marked::<VM>(object) { and stop just before the set_mark_bit::<VM>(object); call.
In that case, all of them will mark the object and call process_node which will ultimately create duplicate edges.
To avoid this, we should replace the if !is_marked with something like the following pseudo-code:
loop {
if is_marked(object) {
break;
} else if cmp_exchange_mark_bit(object, old: 0, new: 1) {
trace.process_node(object);
break;
}
}
However, I'm not sure this is really what we want here.
In the current implementation, I see no reason for using atomic operations in is_marked and set_mark_bit.
Changing these to none atomic operations will still produce duplicate edges, but I don't think it can lead to concurrency bugs, because it only means we may mark some objects multiple times.
In conclusion, it seems to me that duplicate edges are probably Ok here and we can even change the atomic ops to none-atomic ones. However, I'm not 100% sure about this. @qinsoon @steveblackburn , what do you think?
Duplicate edges are fine in a normal non-copying tracing mark.
I think the best thing may be to have an option that controls for this. By default, duplicates are allowed and atomic operations are not necessary on the mark. But with the flag set, duplicates would not exist, at the cost of the atomic operations. This approach serves two purposes: 1. it makes it explicit that we expect duplicates, and why, and 2. it makes the code more general, tolerating the use case where someone can't deal with duplicates (not sure what that use case is, but there may be one, I suppose).
Currently we added a plan-specific flag may_trace_duplicate_edges. For marksweep, it is set to true, and for other plans, it is false. When a plan allows tracing duplicate edges, we simply do not run duplicate edge check for the plan. Though this is not exactly what Steve suggested, this is pretty simple and allowed us to run the binding tests with extreme assertions easily.
We allow duplicate edges for certain plans.