jepsen
Is it normal for the PostgreSQL test to get stuck and never finish?
I ran the first version of the PostgreSQL test, i.e. the snapshot from the "Add stolon/postgres test" commit. After a series of read and append operations, it had already reported a "could not serialize" error in the terminal. After that, the process got stuck and didn't finish for a long time.
Here is a screenshot of where it got stuck:
Is this normal? If not, I suspect there is something in my setup that I haven't handled correctly.
What does "a long time" mean here? Longer than the time limit you set for the test?
Yes. The time limit is 120 seconds, and the program was stuck for over an hour.
Huh. That shouldn't happen! It looks like Jepsen's in the middle of making some requests to the Postgres node, and presumably it's not answering. There should be timeouts here, but maybe they're not working correctly?
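To make the timeout idea concrete: a client waiting on a request to an unresponsive node only moves on if something bounds the wait. Below is a minimal, generic Python sketch of a per-operation timeout. It is not Jepsen's actual (Clojure) client code; `call_with_timeout` is a hypothetical helper, and the `"ok"`/`"info"` labels are assumed to mirror Jepsen's convention of recording an operation with an unknown outcome as `:info` rather than `:fail`.

```python
import threading
import time


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread and wait at most timeout_s seconds.

    Returns ("ok", value) if fn finished in time, or ("info", "timeout") if it
    did not. A timed-out call is indeterminate: the server may or may not have
    applied it, so it must not be recorded as a definite failure.
    Exceptions raised by fn are re-raised in the caller.
    """
    result = {}

    def worker():
        try:
            result["value"] = fn()
        except Exception as e:  # hand the error back to the calling thread
            result["error"] = e

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)  # stop waiting after timeout_s, whether or not fn is done
    if t.is_alive():
        # The worker is still blocked; we abandon it (daemon threads do not
        # keep the process alive) and report an indeterminate result.
        return ("info", "timeout")
    if "error" in result:
        raise result["error"]
    return ("ok", result["value"])


if __name__ == "__main__":
    print(call_with_timeout(lambda: 42, 1.0))             # ('ok', 42)
    print(call_with_timeout(lambda: time.sleep(5), 0.1))  # ('info', 'timeout')
```

In this sketch the timeout only stops the caller from waiting; the worker thread, and anything it holds (such as a database connection), keeps running until `fn` returns.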
I read stolon's keeper.log on the DB node. It says "too many clients already". I'm not sure whether this is what causes the hang. I used the same command-line arguments as the README.md of the stolon test.
I also read sentinel.log, which says the master DB has failed, just as you suspected. Is it possible that too many clients caused the master DB node to fail?
Oh! Yeah, that could be a thing. I haven't actually finished the Stolon test--I found bugs in single-node Postgres deployments and worked on those instead, so this test is very much unfinished. I mean, it ran for me once, but that clearly doesn't mean much haha. ;-)