
compactor in CrashLoopBackOff

zetaab opened this issue 3 years ago · 3 comments

Describe the bug

We are testing Cortex, installed with the cortex helm chart. However, the compactor is in CrashLoopBackOff and exits with code 137.

logs:

% kubectl logs cortex-compactor-0
level=info ts=2021-12-08T08:10:41.393284605Z caller=main.go:193 msg="Starting Cortex" version="(version=1.10.0, branch=HEAD, revision=3b9f1c3)"
level=info ts=2021-12-08T08:10:41.393837067Z caller=server.go:239 http=[::]:8080 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2021-12-08T08:10:41.396343772Z caller=module_service.go:59 msg=initialising module=server
level=info ts=2021-12-08T08:10:41.396484906Z caller=module_service.go:59 msg=initialising module=runtime-config
level=info ts=2021-12-08T08:10:41.396582914Z caller=module_service.go:59 msg=initialising module=memberlist-kv
level=info ts=2021-12-08T08:10:41.397384967Z caller=module_service.go:59 msg=initialising module=compactor
level=info ts=2021-12-08T08:10:41.397969376Z caller=blocks_cleaner.go:139 component=cleaner msg="started blocks cleanup and maintenance"
level=info ts=2021-12-08T08:10:41.806359136Z caller=blocks_cleaner.go:294 component=cleaner org_id=rofa.txt msg="started blocks cleanup and maintenance"
level=info ts=2021-12-08T08:10:41.807032084Z caller=blocks_cleaner.go:294 component=cleaner org_id=hypa msg="started blocks cleanup and maintenance"
level=info ts=2021-12-08T08:10:41.807060498Z caller=blocks_cleaner.go:294 component=cleaner org_id=oskunsecretorg msg="started blocks cleanup and maintenance"
level=info ts=2021-12-08T08:10:41.807324551Z caller=blocks_cleaner.go:294 component=cleaner org_id=rofa.txt msg="started blocks cleanup and maintenance"
level=info ts=2021-12-08T08:10:41.807870307Z caller=blocks_cleaner.go:294 component=cleaner org_id=rofa3.k8s.local msg="started blocks cleanup and maintenance"
level=info ts=2021-12-08T08:10:41.807884017Z caller=blocks_cleaner.go:294 component=cleaner org_id=rofa33 msg="started blocks cleanup and maintenance"
level=info ts=2021-12-08T08:10:41.80815235Z caller=blocks_cleaner.go:294 component=cleaner org_id=spilo msg="started blocks cleanup and maintenance"
level=info ts=2021-12-08T08:10:41.808196681Z caller=blocks_cleaner.go:294 component=cleaner org_id=sre-sandbox.k8s.local msg="started blocks cleanup and maintenance"
level=info ts=2021-12-08T08:10:41.808459107Z caller=blocks_cleaner.go:294 component=cleaner org_id=sre-sandbox msg="started blocks cleanup and maintenance"
level=info ts=2021-12-08T08:10:41.880312152Z caller=blocks_cleaner.go:308 component=cleaner org_id=hypa msg="migrated block deletion marks to the global markers location"
level=info ts=2021-12-08T08:10:41.908743333Z caller=blocks_cleaner.go:308 component=cleaner org_id=rofa.txt msg="migrated block deletion marks to the global markers location"
level=info ts=2021-12-08T08:10:41.911249199Z caller=blocks_cleaner.go:308 component=cleaner org_id=spilo msg="migrated block deletion marks to the global markers location"
level=warn ts=2021-12-08T08:10:41.911944552Z caller=s3.go:447 msg="could not guess file size for multipart upload; upload might be not optimized" name=rofa3.k8s.local/markers/01FPAMR790SWP2DFT5TWQJR1TD-deletion-mark.json err="unsupported type of io.Reader: *objstore.timingReadCloser"
level=warn ts=2021-12-08T08:10:41.917676375Z caller=s3.go:447 msg="could not guess file size for multipart upload; upload might be not optimized" name=sre-sandbox/markers/01FPAMR790STZ71D2RM26WG9VW-deletion-mark.json err="unsupported type of io.Reader: *objstore.timingReadCloser"
level=info ts=2021-12-08T08:10:41.920692739Z caller=blocks_cleaner.go:308 component=cleaner org_id=rofa.txt msg="migrated block deletion marks to the global markers location"
level=warn ts=2021-12-08T08:10:41.979575584Z caller=s3.go:447 msg="could not guess file size for multipart upload; upload might be not optimized" name=sre-sandbox.k8s.local/markers/01FPAMR7922PZ8YZ92K0C518GQ-deletion-mark.json err="unsupported type of io.Reader: *objstore.timingReadCloser"
level=warn ts=2021-12-08T08:10:42.388213901Z caller=s3.go:447 msg="could not guess file size for multipart upload; upload might be not optimized" name=oskunsecretorg/markers/01FPAMR792FHYNY9ANAGSSR2C8-deletion-mark.json err="unsupported type of io.Reader: *objstore.timingReadCloser"
level=warn ts=2021-12-08T08:10:42.388619705Z caller=s3.go:447 msg="could not guess file size for multipart upload; upload might be not optimized" name=rofa33/markers/01FPAMR790YTF7KMH2HSVMGQW4-deletion-mark.json err="unsupported type of io.Reader: *objstore.timingReadCloser"

Any tips on how to make this work? Should I delete the broken files, or is there another fix?

To Reproduce

Steps to reproduce the behavior:

  1. Start Cortex (version 1.10.0, revision 3b9f1c3; see logs above)
  2. Perform operations (read/write/others)

Expected behavior

Environment:

  • Infrastructure: kubernetes
  • Deployment tool: helm

Storage Engine

  • [x] Blocks
  • [ ] Chunks

Additional Context

zetaab · Dec 08 '21 08:12

You'd need to identify the cause of the compactor crash. In my case it was OOMKilled when the pod reached its memory limit. Increasing the memory limit is not practical unless you have lots of headroom on the host. Instead, lower the compactor's cleanup_concurrency to 1 (the default is 20). That keeps the memory footprint low by loading one tenant's blocks at a time, as sketched below.
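For reference, a minimal sketch of that override, assuming the helm chart exposes the Cortex configuration under a `config` key (the exact nesting in your chart's values may differ):

```yaml
# Assumed helm values snippet: lower the blocks cleaner's per-tenant
# concurrency so fewer tenants' blocks are loaded into memory at once.
config:
  compactor:
    cleanup_concurrency: 1   # default is 20
```

The equivalent CLI flag is `-compactor.cleanup-concurrency=1`.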

danfromtitan · Mar 11 '22 14:03

Also, this might be another setting to look at: https://cortexmetrics.io/docs/blocks-storage/production-tips/#ensure-deletion-marks-migration-is-disabled-after-first-run
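A sketch of that tip, assuming the same `config` nesting as above and that the option is still present in the Cortex version you run: once the one-time deletion-marks migration has completed (see the "migrated block deletion marks to the global markers location" log lines in the report above), the linked production tip suggests disabling it for subsequent runs.

```yaml
# Assumed helm values snippet: disable the one-time block deletion marks
# migration after it has already run, per the production tips page.
config:
  compactor:
    block_deletion_marks_migration_enabled: false
```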

danfromtitan · Mar 11 '22 15:03

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] · Jun 12 '22 11:06