jepsen

Is it normal for the PostgreSQL test to get stuck and never finish?

Open winddd opened this issue 4 years ago • 5 comments

I ran the first version of the PostgreSQL test, i.e. the snapshot at the commit "Add stolon/postgres test". After a series of read and append operations, the terminal had already reported "could not serialize" errors. After that, the process got stuck and didn't finish for a long time.

Here is a screenshot of where it got stuck:

Is this normal? If not, I suspect there is something in my setup that I haven't handled correctly.

winddd, Jun 29 '20 17:06

What does "a long time" mean here? Longer than the time limit you set for the test?

aphyr, Jun 29 '20 17:06

Yes. The time limit is 120 seconds, but the program was stuck for over an hour.

winddd, Jun 29 '20 17:06

Huh. That shouldn't happen! It looks like Jepsen's in the middle of making some requests to the Postgres node, and presumably it's not answering. There should be timeouts here, but maybe they're not working correctly?
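To illustrate the kind of timeout meant here: a client request that never returns should be cut off after a bounded wait and recorded as indeterminate, rather than blocking the worker forever. Jepsen itself is written in Clojure, so the following is only a minimal Python sketch of the idea, not Jepsen's code. The function `invoke_with_timeout` and the dict-based op shape (`"type": "ok"` / `"info"`) are hypothetical stand-ins for a Jepsen-style client invocation.

```python
import concurrent.futures
import time

# Shared worker pool (hypothetical). Python cannot kill a thread, so a hung
# call keeps occupying its worker even after we stop waiting for it; real
# clients should also set socket/statement timeouts on the connection itself.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def invoke_with_timeout(op, fn, timeout_s):
    """Run fn() on behalf of op. If it returns within timeout_s seconds,
    complete the op as "ok"; otherwise complete it as "info"."""
    future = _executor.submit(fn)
    try:
        value = future.result(timeout=timeout_s)
        return {**op, "type": "ok", "value": value}
    except concurrent.futures.TimeoutError:
        # We don't know whether the request took effect on the server, so the
        # outcome is indeterminate ("info"), not a definite failure.
        return {**op, "type": "info", "error": "timeout"}


if __name__ == "__main__":
    # A responsive "node".
    print(invoke_with_timeout({"f": "read"}, lambda: 42, 1.0))
    # -> {'f': 'read', 'type': 'ok', 'value': 42}

    # A "node" that doesn't answer in time.
    print(invoke_with_timeout({"f": "read"}, lambda: time.sleep(0.5), 0.05))
    # -> {'f': 'read', 'type': 'info', 'error': 'timeout'}
```

Recording a timed-out request as indeterminate rather than failed matters for the checker: a request that hung may still have been applied on the server.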

aphyr, Jun 29 '20 17:06

I checked Stolon's keeper.log on the DB node. It reports "too many clients already". I'm not sure whether that is what caused the hang. I used the same command-line arguments as in the stolon test's README.md.

I also read sentinel.log; it says the master DB has failed, just as you suspected. Is it possible that too many clients caused the master DB node to fail?

winddd, Jun 29 '20 20:06

Oh! Yeah, that could be a thing. I haven't actually finished the Stolon test: I found bugs in single-node Postgres deployments and worked on those instead, so this test is very much unfinished. I mean, it ran for me once, but that clearly doesn't mean much haha. ;-)

aphyr, Jun 30 '20 15:06