scylla-bench
panic: runtime error: invalid memory address or nil pointer dereference
Installation details
Kernel Version: 5.15.0-1019-aws
Scylla version (or git commit hash): 5.0.3-20220907.b9a61c8e9
with build-id 7be266d2954825cdf843c744de04a0443a8f156c
Relocatable Package: http://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-5.0/scylla-x86_64-package-5.0.3.0.20220907.b9a61c8e9.tar.gz
Cluster size: 4 nodes (i3en.3xlarge)
Scylla Nodes used in this run:
- longevity-large-partitions-4d-scyll-db-node-e83f2364-4 (13.51.47.122 | 10.0.3.49) (shards: 12)
- longevity-large-partitions-4d-scyll-db-node-e83f2364-3 (16.16.25.148 | 10.0.3.35) (shards: 12)
- longevity-large-partitions-4d-scyll-db-node-e83f2364-2 (16.170.221.203 | 10.0.2.203) (shards: 12)
- longevity-large-partitions-4d-scyll-db-node-e83f2364-1 (13.49.225.5 | 10.0.0.73) (shards: 12)
OS / Image: ami-03fc0de751a0b3314
(aws: eu-north-1)
Test: longevity-large-partition-4days-test-rq
Test id: e83f2364-eaf5-4e99-8c28-433f76c2a24e
Test name: scylla-staging/Longevity_yaron/longevity-large-partition-4days-test-rq
Test config file(s):
Issue description
>>>>>>>
- Started 3 read stress commands (ASC, DESC, ASC/DESC/None) that ran "ok" for 10 hours.
- Throughput was relatively low: 5k
- After ~8.5 hours the stress commands started getting quorum timeouts.
- After 10 hours one of the stress commands (ASC) failed because of the timeouts, and the loader panicked as well:
invalid memory address or nil pointer dereference
error from s-b log:
yarongilor@yarongilor:~/Downloads/logs/loader-set-e83f2364$ tail scylla-bench-l0-34379347-f65c-4327-808c-554732ff449c.log -n 40
10h32m19.7s 7718 77180 0 1.9s 1.5s 706ms 495ms 260ms 3.8ms 65ms
panic: runtime error: invalid memory address or nil pointer dereference
10h32m20.8s 7616 76160 0 1.9s 1.6s 703ms 502ms 287ms 3.9ms 67ms
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x5fb33c]
goroutine 1 [running]:
github.com/HdrHistogram/hdrhistogram-go.(*iterator).next(0xc000171448)
/go/pkg/mod/github.com/!hdr!histogram/[email protected]/hdr.go:670 +0x1c
github.com/HdrHistogram/hdrhistogram-go.(*rIterator).next(...)
/go/pkg/mod/github.com/!hdr!histogram/[email protected]/hdr.go:683
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).Merge(0xf0000000e?, 0x4000000000a?)
/go/pkg/mod/github.com/!hdr!histogram/[email protected]/hdr.go:177 +0x8d
github.com/scylladb/scylla-bench/pkg/results.(*MergedResult).AddResult(0xc2abffef60, {0x0, 0x0, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0x0, ...})
/go/scylla-bench-0.1.11/pkg/results/merged_result.go:53 +0x1b0
github.com/scylladb/scylla-bench/pkg/results.(*TestResults).GetResultsFromThreadsAndMerge(0xc000691380)
/go/scylla-bench-0.1.11/pkg/results/thread_results.go:60 +0x89
github.com/scylladb/scylla-bench/pkg/results.(*TestResults).GetTotalResults(0xc000691380)
/go/scylla-bench-0.1.11/pkg/results/thread_results.go:82 +0xcc
main.main()
/go/scylla-bench-0.1.11/main.go:596 +0x355d
<<<<<<<
- Restore Monitor Stack command:
$ hydra investigate show-monitor e83f2364-eaf5-4e99-8c28-433f76c2a24e
- Restore monitor on AWS instance using Jenkins job
- Show all stored logs command:
$ hydra investigate show-logs e83f2364-eaf5-4e99-8c28-433f76c2a24e
Logs:
No logs captured during this run.
@vponomaryov do you see what's causing this issue?
General info: such errors appear when we try to use an uninitialized Go object. It may be caused either by a race condition or by an unhandled error.
I haven't investigated it yet, so I don't know the cause.
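To illustrate the general point above, here is a minimal Go sketch of guarding a merge step against an uninitialized result. It is not the scylla-bench code and not the change proposed in any PR: `latencyHist`, `threadResult`, and `mergeResults` are hypothetical names, and the histogram is a simplified stand-in rather than the hdrhistogram-go type. The only assumption taken from this report is that a zero-valued result with a nil histogram pointer can reach the merge, which is what the all-zero argument to `AddResult` in the stack trace suggests.

```go
package main

import (
	"errors"
	"fmt"
)

// latencyHist is a hypothetical, simplified histogram (bucket counts only).
// It stands in for a real histogram type purely for illustration.
type latencyHist struct {
	counts []int64
}

// threadResult is a hypothetical per-thread result. Its zero value has a nil
// hist pointer, which is the "uninitialized object" case discussed above.
type threadResult struct {
	ops  int64
	hist *latencyHist
}

var errNilHistogram = errors.New("histogram is not initialized")

// merge adds src's bucket counts into dst. It returns an error instead of
// dereferencing a nil pointer when either side is uninitialized.
func (dst *latencyHist) merge(src *latencyHist) error {
	if dst == nil || src == nil {
		return errNilHistogram
	}
	for i, c := range src.counts {
		if i < len(dst.counts) {
			dst.counts[i] += c
		} else {
			// src has more buckets than dst: grow dst one bucket at a time.
			dst.counts = append(dst.counts, c)
		}
	}
	return nil
}

// mergeResults folds per-thread results into one histogram and reports how
// many results were skipped because they carried no histogram.
func mergeResults(results []threadResult) (total *latencyHist, ops int64, skipped int) {
	total = &latencyHist{}
	for _, r := range results {
		if err := total.merge(r.hist); err != nil {
			skipped++
			continue
		}
		ops += r.ops
	}
	return total, ops, skipped
}

func main() {
	results := []threadResult{
		{ops: 2, hist: &latencyHist{counts: []int64{1, 1}}},
		{}, // zero value: nil histogram, would panic without the guard
		{ops: 3, hist: &latencyHist{counts: []int64{0, 2, 1}}},
	}
	total, ops, skipped := mergeResults(results)
	fmt.Printf("ops=%d skipped=%d counts=%v\n", ops, skipped, total.counts)
}
```

Skipping and counting the bad result is one possible policy; failing the run with a clear error message would be another. Either way the guard only hides the symptom, and the reason an empty result is produced in the first place (race condition or unhandled error) would still need to be found.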
It seems that when working with the reverse-query feature, we hit this issue after a few hours of running.
@yarongilor I created a PR with a possible fix here: https://github.com/scylladb/scylla-bench/pull/109 It may fix this issue, but that is not guaranteed; it needs to be tested.
Upd:
Created a Docker image with it here: vponomarovatscylladb/hydra-loaders:scylla-bench-v0.1.12--fix-issue-107-candidate
So, just update your configuration to use it.
Hit it once again using the same scylla-bench config file (changed to some extent since then).
Installation details
Kernel Version: 5.15.0-1030-aws
Scylla version (or git commit hash): 5.2.0~rc2-20230228.908a82bea064
with build-id 2d8e1ab089ec69c36323037d66b1a72accfae399
Cluster size: 4 nodes (is4gen.4xlarge)
Scylla Nodes used in this run:
- longevity-large-partitions-4d-5-2-db-node-c3260702-4 (34.245.121.241 | 10.4.1.65) (shards: 15)
- longevity-large-partitions-4d-5-2-db-node-c3260702-3 (52.211.199.21 | 10.4.3.139) (shards: 15)
- longevity-large-partitions-4d-5-2-db-node-c3260702-2 (52.48.75.186 | 10.4.0.214) (shards: 15)
- longevity-large-partitions-4d-5-2-db-node-c3260702-1 (34.244.173.7 | 10.4.2.232) (shards: 15)
OS / Image: ami-074d26a74b8f73dba
(aws: eu-west-1)
Test: longevity-large-partition-4days-arm-test
Test id: c3260702-5b50-4389-8303-7464c8d5e384
Test name: scylla-5.2/longevity/longevity-large-partition-4days-arm-test
Test config file(s):
Details:
It had 3 loaders.
Pre-load finished without errors.
Then, the main read stress commands failed on 2 of the 3 loaders. One of the loader failures is the same as in this bug report:
2023/03/02 22:13:31 Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
2023/03/02 22:13:31 Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x5fc7dc]
goroutine 1 [running]:
github.com/HdrHistogram/hdrhistogram-go.(*iterator).next(0xc000119038)
/go/pkg/mod/github.com/!hdr!histogram/[email protected]/hdr.go:670 +0x1c
github.com/HdrHistogram/hdrhistogram-go.(*rIterator).next(...)
/go/pkg/mod/github.com/!hdr!histogram/[email protected]/hdr.go:683
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).Merge(0xf0000000e?, 0x4000000000a?)
/go/pkg/mod/github.com/!hdr!histogram/[email protected]/hdr.go:177 +0x8d
github.com/scylladb/scylla-bench/pkg/results.(*MergedResult).AddResult(0xc1e0defb60, {0x0, 0x0, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0x0, ...})
/go/scylla-bench-0.1.15/pkg/results/merged_result.go:53 +0x1b0
github.com/scylladb/scylla-bench/pkg/results.(*TestResults).GetResultsFromThreadsAndMerge(0xc000413b80)
/go/scylla-bench-0.1.15/pkg/results/thread_results.go:60 +0x89
github.com/scylladb/scylla-bench/pkg/results.(*TestResults).GetTotalResults(0xc000413b80)
/go/scylla-bench-0.1.15/pkg/results/thread_results.go:82 +0xcc
main.main()
/go/scylla-bench-0.1.15/main.go:631 +0x39bd
It failed after 35 minutes of running.
- Restore Monitor Stack command:
$ hydra investigate show-monitor c3260702-5b50-4389-8303-7464c8d5e384
- Restore monitor on AWS instance using Jenkins job
- Show all stored logs command:
$ hydra investigate show-logs c3260702-5b50-4389-8303-7464c8d5e384
Logs:
- db-cluster-c3260702.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/c3260702-5b50-4389-8303-7464c8d5e384/20230302_225712/db-cluster-c3260702.tar.gz
- sct-runner-c3260702.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/c3260702-5b50-4389-8303-7464c8d5e384/20230302_225712/sct-runner-c3260702.tar.gz
- monitor-set-c3260702.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/c3260702-5b50-4389-8303-7464c8d5e384/20230302_225712/monitor-set-c3260702.tar.gz
- loader-set-c3260702.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/c3260702-5b50-4389-8303-7464c8d5e384/20230302_225712/loader-set-c3260702.tar.gz
@roydahan @fgelcer @fruch JFYI: the proposed fix in https://github.com/scylladb/scylla-bench/pull/109 hasn't had any attention since October 2022.
I assumed it was a side effect of running out of memory, but we can get it merged regardless.