HVM
Simple program sometimes freezes, sometimes runs correctly
I made a program that calculates Fibonacci numbers. When running it with the interpreter, I can't get it to work on any nontrivial inputs (N > 2).
On the other hand, when running the compiled version, it sometimes works, sometimes freezes. When it works, it gives the correct answer (up to integer overflow, anyway). The larger I make N, the more likely it is to freeze.
When it freezes, CPU usage seems to be pinned at 200%. When it works, it uses just above 100% CPU. Example:
❯ time ./main.out 1000
Rewrites: 17048 (0.45 MR/s).
Mem.Size: 12747 words.
(Some 1556111435)
./main.out 1000 0.02s user 0.04s system 101% cpu 0.060 total
❯ time ./main.out 1000
^C
./main.out 1000 8.78s user 0.02s system 199% cpu 4.417 total
(I quit early the second time, since it was frozen)
Source code is provided below.
// List type
(Head (Cons x xs)) = (Some x)
(Head Nil) = None
// Something like Rust's Option.unwrap_or
(UnwrapOr (Some x) y) = x
(UnwrapOr None y) = y
(SumFirstTwo Nil) = 0
(SumFirstTwo (Cons x xs)) = (+ x (UnwrapOr (Head xs) 0))
// Iterate a function n times on an input
(Iter f x 0) = x
(Iter f x n) = (f (Iter f x (- n 1)))
(Fib n) = (Head (Iter
λxs (Cons (SumFirstTwo xs) xs)
(Cons 0 (Cons 1 Nil))
n
))
(Main n) = (Fib n)
Seems similar to issue #116
This is the 1000th Fibonacci number (which you asked HVM to compute in your question):
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875
I seriously doubt it can fit in a normal int, so perhaps that has something to do with the compiled version freezing and giving incorrect results. Does it also freeze for smaller inputs? As for the interpreted version not even calculating the third Fibonacci number, yeah, that is clearly a bug.
Re:
I seriously doubt it can fit in a normal int, so perhaps that has something to do with the compiled version freezing and giving incorrect results.
If I run it a large number of times, it freezes on inputs as small as 20. Also, running it in batches of 100 revealed some other weird things, including segfaults as well as whatever this is (N=47):
./a.out 47 0.06s user 0.04s system 97% cpu 0.097 total
Rewrites: 111 (0.00 MR/s).
Mem.Size: 136 words.
(Some ((((((((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615))+(SumFirstTwo x18446744073709551615))+((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615)))+(((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615))+(SumFirstTwo x18446744073709551615)))+((((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615))+(SumFirstTwo x18446744073709551615))+((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615))))+(((((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615))+(SumFirstTwo x18446744073709551615))+((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615)))+(((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615))+(SumFirstTwo x18446744073709551615))))+((((((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615))+(SumFirstTwo x18446744073709551615))+((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615)))+(((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615))+(SumFirstTwo x18446744073709551615)))+((((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615))+(SumFirstTwo x18446744073709551615))+((SumFirstTwo x18446744073709551615)+(SumFirstTwo x18446744073709551615))))))
Does this happen on a single thread? hvm c file.hvm --single-thread
I compiled the code with --single-thread as you said, and I ran it as many times as I could in 10 minutes with arguments of varying size. As far as I can tell, the freezing, segfaults, and wrong answers seem to be fixed in the single-threaded version.
@VictorTaelin Do you still have another approach in mind?
A lambda calculus for the HVM https://github.com/Kindelia/HVM/discussions/97
Reduced deadlock example:
λx
dup q0 q1 = x
dup p0 p1 = q1
(Pair p0 (+ p1 q0))
[Interaction net diagram; alignment was lost in transcription. Structure, from the code above: A is wired to one aux port of dup node P; B is wired to the (+) node, whose arguments are P's other aux port (p1) and one aux port of dup node Q (q0); P's principal port is wired to Q's remaining aux port (q1), and Q's principal port is wired to x.]
Thread A starts, locks P.
Thread B starts, passes by (+), locks Q.
- Nothing to reduce, so goes back to (+).
- Goes to P, which is locked. Waits.
Thread A continues, reaches Q, which is locked. Waits.
Deadlock.
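For readers less familiar with this failure mode: the trace above is the classic lock-ordering cycle, where each thread holds one of the two locks and waits for the other. Below is a minimal analogue in plain C with pthreads; it is only a sketch, and lock_p / lock_q merely stand in for the locks on the P and Q fan nodes (this is not HVM's runtime code).

#include <pthread.h>
#include <stdio.h>

// Stand-ins for the locks guarding the P and Q fan nodes.
static pthread_mutex_t lock_p = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_q = PTHREAD_MUTEX_INITIALIZER;

// Thread A: locks P first, then needs Q (mirrors "A locks P ... reaches Q").
static void *thread_a(void *arg) {
  (void)arg;
  pthread_mutex_lock(&lock_p);
  pthread_mutex_lock(&lock_q);   // blocks forever if B already holds Q
  pthread_mutex_unlock(&lock_q);
  pthread_mutex_unlock(&lock_p);
  return NULL;
}

// Thread B: locks Q first, then needs P (mirrors "B locks Q ... goes to P").
static void *thread_b(void *arg) {
  (void)arg;
  pthread_mutex_lock(&lock_q);
  pthread_mutex_lock(&lock_p);   // blocks forever if A already holds P
  pthread_mutex_unlock(&lock_p);
  pthread_mutex_unlock(&lock_q);
  return NULL;
}

int main(void) {
  pthread_t a, b;
  pthread_create(&a, NULL, thread_a, NULL);
  pthread_create(&b, NULL, thread_b, NULL);
  pthread_join(a, NULL);   // under the unlucky interleaving, never returns
  pthread_join(b, NULL);
  puts("no deadlock this time");
  return 0;
}

Like the HVM freeze, this only hangs under the unlucky interleaving, which is why the bug shows up intermittently.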
A solution may be to release the lock when a thread leaves a node. The problem is that another thread could then reduce a fan node which this thread has passed through. As such, when it goes back to that position, it would be in an invalid state. To solve that, we could, rather than collecting the fan node upon reduction, leave a leftover signaling that this position was altered. When the other thread reaches it, it clears up the leftover and travels back 1 step.
Edit: actually, this could be avoided by not letting the thread "teleport back" to the (+) operator after one of the sides fails to reduce. That is, we should change how strictness works to match inets. Right now, HVM will try to reduce all strict arguments, in an arbitrary order. For example, it will try to reduce the second argument of λx (+ x y), even though x is a variable and is stuck. That breaks interaction net confluence and, it seems, complicates the parallelism here. If we instead just reduce x first, and only reduce y once x is a number, then it will reduce exactly like inets, and I believe this issue would be solved. To test that hypothesis, I've attempted to fix @suremarc's code by replacing + with an Add function that is strict on the first argument only, and it seems to work:
// List type
(Head (Cons x xs)) = (Some x)
(Head Nil) = None
// Something like Rust's Option.unwrap_or
(UnwrapOr (Some x) y) = x
(UnwrapOr None y) = y
// Like (+), but strict on the first argument
(Add 0 b) = b
(Add n b) = (+ n b)
(SumFirstTwo Nil) = 0
(SumFirstTwo (Cons x xs)) = (Add x (UnwrapOr (Head xs) 0))
// Iterate a function n times on an input
(Iter f x 0) = x
(Iter f x n) = (f (Iter f x (- n 1)))
(Fib n) = (Head (Iter
λxs (Cons (SumFirstTwo xs) xs)
(Cons 0 (Cons 1 Nil))
n
))
(Main n) = (Fib 100000)
So, if I'm right, there is no need for leftovers or anything complex: just stop the walk as soon as there is a stuck redex. The only issue is that doing this would be a little laborious, as it'd require splitting the function compiler into N phases, one for each strict argument. I'll leave that for another day.
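For illustration only, here is a rough sketch, in plain C, of what "one phase per strict argument" could look like; every name in it (Term, whnf, add_phase0, add_phase1, add_step) is hypothetical, and this is not HVM's actual compiler output. The point is just the control flow: stop at the first stuck argument instead of eagerly visiting the next one.

#include <stdbool.h>
#include <stdio.h>

// Toy term: either a number or "stuck" (e.g., a variable owned elsewhere).
typedef struct {
  bool is_num;
  long num;
} Term;

// Toy stand-in for "reduce to weak head normal form".
static bool whnf(Term *t) { return t->is_num; }

// Phase 0: look only at the first strict argument.
static bool add_phase0(Term *a) { return whnf(a); }

// Phase 1: only reached once phase 0 has succeeded.
static bool add_phase1(Term *b) { return whnf(b); }

// One reduction step of a function strict in two arguments.
static bool add_step(Term *a, Term *b, long *out) {
  if (!add_phase0(a)) return false;   // stuck on argument 0: yield here
  if (!add_phase1(b)) return false;   // stuck on argument 1: yield here
  *out = a->num + b->num;
  return true;
}

int main(void) {
  Term stuck = { false, 0 }, two = { true, 2 }, three = { true, 3 };
  long r;
  printf("stuck case reduced? %s\n", add_step(&stuck, &three, &r) ? "yes" : "no");
  if (add_step(&two, &three, &r)) printf("2 + 3 = %ld\n", r);
  return 0;
}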
This was finally fixed on the https://github.com/Kindelia/HVM/tree/parallel_fix branch.
The issue was that, while it is true that parallelism on HVM is greatly simplified by the fact that structures can only be referenced by one thread at a time, allowing threads to perform parallel reductions with no need for synchronization, there are two exceptions to that:
- Dup nodes: these can be entered by two threads at the same time, which would cause both threads to access the same redexes. That can be avoided with a lock, which was already present.
- Substitution: the [x <- val] operation, i.e., substitution, could overwrite data owned by another thread, since the variable does not necessarily occur in a term owned by this thread. That is the issue I had overlooked. To fix this, subst needs to happen in two steps and be implemented with atomic loads and stores.
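As a very rough sketch of what "substitution in two steps, with atomic loads and stores" can look like (illustrative C only; Ptr, heap, subst_publish, and subst_consume are hypothetical names, not the parallel_fix API): the thread performing the rewrite atomically publishes the value into the variable's slot instead of writing through memory another thread may own, and whichever thread later reaches the variable atomically loads what was published.

#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t Ptr;               // a packed term/pointer word (hypothetical)
static _Atomic Ptr heap[1 << 16];   // toy heap of atomic words

// Step 1, by the thread that owns the redex: publish `val` into the variable's
// slot with one atomic store, rather than chasing the variable and rewriting a
// term that may currently be owned by another thread.
static void subst_publish(uint64_t var_loc, Ptr val) {
  atomic_store_explicit(&heap[var_loc], val, memory_order_release);
}

// Step 2, by whichever thread later reaches the variable: atomically load the
// published value and link it into the term that thread owns.
static Ptr subst_consume(uint64_t var_loc) {
  return atomic_load_explicit(&heap[var_loc], memory_order_acquire);
}

int main(void) {
  subst_publish(42, (Ptr)0xABCD);                   // writer side
  return subst_consume(42) == (Ptr)0xABCD ? 0 : 1;  // reader side
}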
I've spent a lot of time debugging it. The solution on the parallel_fix branch does seem to reduce arbitrarily large inputs soundly. It does reduce performance by ~4x, though, possibly due to a spinlock and the double-step subst change. There are ways to optimize this, which I'll work on later, before merging into master.
Fixes merged on master, together with a greatly improved parallel runtime! It should work fine now. Let me know otherwise.