Manticore server crashes with signal 11
Bug Description:
Hi Manticore folks, we run a Manticore pod in k8s with about 2.5M records, on version 6.3.8. The server crashes and k8s restarts it roughly once a week. The setup is not a cluster: it is a single pod with generous resources, usually using only about 5-10% CPU and less than 10% RAM.
cat /etc/manticoresearch/manticore.conf
searchd {
listen = 127.0.0.1:9312
listen = 127.0.0.1:9306:mysql
listen = 127.0.0.1:9308:http
log = /var/log/manticore/searchd.log
query_log = /var/log/manticore/query.log
pid_file = /var/run/manticore/searchd.pid
data_dir = /var/lib/manticore
}
rt: table qmdocs: ramchunk saved ok (mode=periodic, last TID=6288347, current TID=6311108, ram=50.427 Mb, time delta=36000 sec, took=0.036 sec)
[BUDDY] Fatal error: Uncaught Manticoresearch\Buddy\Core\Error\ManticoreSearchClientError: Error while async request: 104: Connection reset by peer in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php:175
[BUDDY] Stack trace:
[BUDDY] #0 /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php(142): Manticoresearch\Buddy\Core\ManticoreSearch\Client->runAsyncRequest('sql?mode=raw', 'query=SHOW+VARI...', Array)
[BUDDY] #1 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(551): Manticoresearch\Buddy\Core\ManticoreSearch\Client->sendRequest('query=SHOW+VARI...')
[BUDDY] #2 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(488): Manticoresearch\Buddy\Base\Lib\Metric->sendManticoreRequest('SHOW VARIABLES')
[BUDDY] #3 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(250): Manticoresearch\Buddy\Base\Lib\Metric->getVariableLabels()
[BUDDY] #4 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(203): Manticoresearch\Buddy\Base\Lib\Metric->snapshot()
[BUDDY] #5 /usr/share/manticore/modules/manticore-buddy/src/Lib/MetricThread.php(103): Manticoresearch\Buddy\Base\Lib\Metric->checkAndSnapshot(300)
[BUDDY] #6 [internal function]: Manticoresearch\Buddy\Base\Lib\MetricThread::Manticoresearch\Buddy\Base\Lib\{closure}(Object(Swoole\Process))
[BUDDY] #7 {main}
[BUDDY] thrown in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php on line 175
[BUDDY] [2025-04-07 03:18:07 $142.0] WARNING Server::check_worker_exit_status(): worker(pid=228, id=4) abnormal exit, status=255, signal=0
rt: table qmdocs: ramchunk saved ok (mode=periodic, last TID=6311108, current TID=6326095, ram=53.151 Mb, time delta=36000 sec, took=0.036 sec)
[BUDDY] Fatal error: Uncaught Manticoresearch\Buddy\Core\Error\ManticoreSearchClientError: Error while async request: 104: Connection reset by peer in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php:175
[BUDDY] Stack trace:
[BUDDY] #0 /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php(142): Manticoresearch\Buddy\Core\ManticoreSearch\Client->runAsyncRequest('sql?mode=raw', 'query=SHOW+VARI...', Array)
[BUDDY] #1 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(551): Manticoresearch\Buddy\Core\ManticoreSearch\Client->sendRequest('query=SHOW+VARI...')
[BUDDY] #2 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(488): Manticoresearch\Buddy\Base\Lib\Metric->sendManticoreRequest('SHOW VARIABLES')
[BUDDY] #3 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(250): Manticoresearch\Buddy\Base\Lib\Metric->getVariableLabels()
[BUDDY] #4 /usr/share/manticore/modules/manticore-buddy/src/Lib/Metric.php(203): Manticoresearch\Buddy\Base\Lib\Metric->snapshot()
[BUDDY] #5 /usr/share/manticore/modules/manticore-buddy/src/Lib/MetricThread.php(103): Manticoresearch\Buddy\Base\Lib\Metric->checkAndSnapshot(300)
[BUDDY] #6 [internal function]: Manticoresearch\Buddy\Base\Lib\MetricThread::Manticoresearch\Buddy\Base\Lib\{closure}(Object(Swoole\Process))
[BUDDY] #7 {main}
[BUDDY] thrown in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php on line 175
[BUDDY] [2025-04-07 12:23:17 $142.0] WARNING Server::check_worker_exit_status(): worker(pid=235, id=4) abnormal exit, status=255, signal=0
Crash!!! Handling signal 11
I see in the log:
Connection reset by peer in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/ManticoreSearch/Client.php:175
We also get this error when using the search engine through an HTTP client:
Post "http://qm-search-service.default.svc.cluster.local:9308/delete": read tcp 192.168.108.11:58500->10.100.87.79:9308: read: connection reset by peer
We see it once in a while on 6.3. When I upgraded to 7.xxx because of the crash in 6.3 shown above, the connection resets were everywhere, so I was forced to downgrade. The crash itself is still giving us a hard time: we cannot diagnose or fix it.
Attached the server log: manticore-log.txt
Manticore Search Version:
6.3.8
Operating System Version:
Linux amz2
Have you tried the latest development version?
No
Internal Checklist:
To be completed by the assignee. Check off tasks that have been completed or are not applicable.
- [ ] Implementation completed
- [ ] Tests developed
- [ ] Documentation updated
- [ ] Documentation reviewed
- [ ] Changelog updated
It is hard to determine the root cause of the crash from this line alone:
Crash!!! Handling signal 11
It would be better to reproduce the issue outside Kubernetes, or to let the daemon save its crash log before the container is restarted.
This is the crash I got, taken from the stdout log.
Could you start the daemon with the --coredump CLI option, as described in the manual? https://manual.manticoresearch.com/dev/Reporting_bugs#How-to-enable-saving-coredumps-on-crash?
I am not sure I can do that: this runs in a k8s environment and is started as a pod. Maybe I can pass a CLI option when I start Manticore from its Docker image? What would the command line look like?
You could set an env var in your pod, as the manual shows: https://manual.manticoresearch.com/dev/Reporting_bugs#What-do-I-do-when-Manticore-Search-hangs?
[root@srv lib]# systemctl set-environment _ADDITIONAL_SEARCHD_PARAMS='--coredump'
[root@srv lib]# systemctl restart manticore
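The systemctl commands above target a systemd host, which a container typically does not have. As a rough sketch only, the same --coredump option could be appended to the container's start command instead. The image name and tag, and the assumption that the container is started with `searchd --nodetach`, are not taken from this thread and should be checked against your own deployment. The helper name `print_coredump_run` is hypothetical, and the script only prints the command so nothing is started by accident.

```shell
#!/bin/sh
# Sketch only: prints a docker command line; it does not start anything.
# ASSUMPTION: image name/tag and the "searchd --nodetach" start command
# are illustrative and must be verified for your deployment.
IMAGE="${IMAGE:-manticoresearch/manticore:6.3.8}"

print_coredump_run() {
    # --ulimit core=-1 is a docker run flag that lifts the core file size limit.
    # --coredump is the searchd option named in the manual section linked above.
    printf 'docker run --rm --ulimit core=-1 %s searchd --nodetach --coredump\n' "$IMAGE"
}

print_coredump_run
```

In a pod spec, the equivalent would be to set the container's `command`/`args` to the same searchd invocation; where the core file ends up depends on the node's core_pattern setting.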
The dev team also said that a container should be left after the crash; you can get into it and provide searchd.log from the /var/log/manticore path.
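A minimal sketch of how those logs might be collected with kubectl, assuming a placeholder pod name (`manticore-0` is hypothetical, not from this report). `kubectl logs --previous` returns the stdout of the previous, crashed container instance while Kubernetes still retains it. The `exec` variant reads the file from the currently running container, so it only contains pre-crash entries if /var/log/manticore is on a volume that survives restarts. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: prints kubectl commands for collecting logs after a restart.
# ASSUMPTION: POD is a placeholder pod name, not taken from the report.
POD="${POD:-manticore-0}"

print_log_commands() {
    # Output of the previous (crashed) container instance, if still retained.
    printf 'kubectl logs %s --previous\n' "$POD"
    # searchd.log from the log path mentioned above, read from the running container.
    printf 'kubectl exec %s -- cat /var/log/manticore/searchd.log\n' "$POD"
}

print_log_commands
```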