Logan Attwood

55 comments by Logan Attwood

in trying to kick the broken allocs, I've managed to get Nomad to try sending an interrupt to a pending allocation! edit: had to `killall -9 nomad` on ca11.

I figured out how to make it worse! If I drain the node and mark it as ineligible, then re-enable eligibility, all of the system jobs end up with additional...

Just adding for additional flavour/I found this hilarious-

Pending alloc example with logs from the Nomad Agent. Times are all accurate to each other. ``` root@HOSTNAME:~# TZ=America/Halifax journalctl --unit nomad --since '14 hours ago' | grep -v '\(runner\)'...

More log spelunking- after this shows up in the logs on an agent/client, no further alloc updates occur, and the drain issue with the pending allocs starts occurring too....

grabbed a goroutine stack dump and found a clue. the same node is blocked here, and was for 1489 minutes, which ends up being just after the "error performing RPC...

@jrasell I found the bug, it was in yamux. PR: https://github.com/hashicorp/yamux/pull/127

I suspect this bug is caused by the whole pending allocations issue

@jrasell good stuff. i'm likely going to be cutting a 1.7.7-dev or 1.7.8 build with the yamux change and rolling it out on our side today, once i change the...

Just added some more bug/correctness fixes to the PR.