Explore capturing librbd logs when using go-ceph APIs
Describe the feature you'd like to have
We need to explore how we can capture librbd logs when using go-ceph APIs to execute Ceph RBD commands. This would let us gather more detail about RBD operations and make debugging smoother when issues come up.
What is the value to the end user? (why is it a priority?)
With the librbd logs available we could pinpoint issues in Ceph RBD more easily by backtracking through the logs, and it would help the Ceph RBD team identify problems faster.
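A minimal sketch of what this could look like with plain go-ceph is below. It is only an illustration, not ceph-csi code: it assumes a reachable cluster config at /etc/ceph/ceph.conf, borrows the pool name `replicapool` from the log snippet further down in this thread, and relies on librados/librbd honouring the same `debug_rbd` / `debug_rados` / `log_to_stderr` options discussed below when they are set on the connection before `Connect()`.

```go
package main

import (
	"fmt"
	"log"

	"github.com/ceph/go-ceph/rados"
	"github.com/ceph/go-ceph/rbd"
)

func main() {
	conn, err := rados.NewConn()
	if err != nil {
		log.Fatalf("creating rados connection: %v", err)
	}
	defer conn.Shutdown()

	// Start from the usual /etc/ceph/ceph.conf.
	if err := conn.ReadDefaultConfigFile(); err != nil {
		log.Fatalf("reading ceph.conf: %v", err)
	}

	// Raise librbd/librados verbosity and route the logs to stderr so the
	// container runtime captures them alongside the caller's own output.
	for opt, val := range map[string]string{
		"debug_rbd":     "20",
		"debug_rados":   "20",
		"log_to_stderr": "true",
	} {
		if err := conn.SetConfigOption(opt, val); err != nil {
			log.Fatalf("setting %s=%s: %v", opt, val, err)
		}
	}

	if err := conn.Connect(); err != nil {
		log.Fatalf("connecting to cluster: %v", err)
	}

	// Pool name taken from the log snippet below; adjust as needed.
	ioctx, err := conn.OpenIOContext("replicapool")
	if err != nil {
		log.Fatalf("opening IO context: %v", err)
	}
	defer ioctx.Destroy()

	// Any librbd call made from here on emits debug lines on stderr.
	names, err := rbd.GetImageNames(ioctx)
	if err != nil {
		log.Fatalf("listing rbd images: %v", err)
	}
	fmt.Println("rbd images:", names)
}
```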
It's possible today if we create the ConfigMap like below:
```yaml
ceph.conf: |
  [global]
  auth_cluster_required = cephx
  auth_service_required = cephx
  auth_client_required = cephx
  rbd_validate_pool = false
  log_to_stderr = true
  debug_rbd = 20
  debug_rados = 20
```
```
2025-05-22T07:11:09.140+0000 7f606d7fa640 20 librbd::io::FlushTracker: 0x7f608c0587f0 shut_down:
2025-05-22T07:11:09.140+0000 7f606d7fa640 20 librbd::io::AsyncOperation: 0x7f604c001d70 finish_op
2025-05-22T07:11:09.140+0000 7f606d7fa640 10 librbd::image::CloseRequest: 0x7f608c059700 handle_shut_down_image_dispatcher: r=0
2025-05-22T07:11:09.140+0000 7f606d7fa640 10 librbd::image::CloseRequest: 0x7f608c059700 send_shut_down_object_dispatcher
2025-05-22T07:11:09.140+0000 7f606d7fa640 5 librbd::io::Dispatcher: 0x7f608c051b70 shut_down:
2025-05-22T07:11:09.140+0000 7f606d7fa640 5 librbd::io::ObjectDispatch: 0x7f608c057930 shut_down:
2025-05-22T07:11:09.140+0000 7f606d7fa640 5 librbd::io::SimpleSchedulerObjectDispatch: 0x7f608c054960 shut_down:
2025-05-22T07:11:09.140+0000 7f606d7fa640 20 librbd::io::FlushTracker: 0x7f6050019320 shut_down:
2025-05-22T07:11:09.140+0000 7f606d7fa640 5 librbd::cache::WriteAroundObjectDispatch: 0x7f60480033e0 shut_down:
2025-05-22T07:11:09.140+0000 7f606d7fa640 10 librbd::image::CloseRequest: 0x7f608c059700 handle_shut_down_object_dispatcher: r=0
2025-05-22T07:11:09.140+0000 7f606d7fa640 10 librbd::image::CloseRequest: 0x7f608c059700 send_flush_op_work_queue
2025-05-22T07:11:09.140+0000 7f606d7fa640 10 librbd::image::CloseRequest: 0x7f608c059700 handle_flush_op_work_queue: r=0
2025-05-22T07:11:09.140+0000 7f606d7fa640 10 librbd::image::CloseRequest: 0x7f608c059700 handle_flush_image_watcher: r=0
2025-05-22T07:11:09.140+0000 7f606d7fa640 10 librbd::ImageState: 0x7f608c051af0 0x7f608c051af0 handle_close: r=0
2025-05-22T07:11:09.140+0000 7f6058ff9640 10 librbd::ImageCtx: 0x7f608c0089a0 ~ImageCtx
2025-05-22T07:11:09.140+0000 7f6058ff9640 20 librados: flush_aio_writes
2025-05-22T07:11:09.140+0000 7f6058ff9640 20 librados: flush_aio_writes
2025-05-22T07:11:09.140+0000 7f6058ff9640 20 librbd::AsioEngine: 0x7f608c02e240 ~AsioEngine:
2025-05-22T07:11:09.140+0000 7f6058ff9640 20 librbd::asio::ContextWQ: 0x7f608c02f4e0 ~ContextWQ:
2025-05-22T07:11:09.140+0000 7f6058ff9640 20 librbd::asio::ContextWQ: 0x7f608c02f4e0 drain:
2025-05-22T07:11:09.140+0000 7f60a4b95640 10 librados: omap-set-vals oid=csi.volume.f525be9e-41b3-4bc6-bce9-4c23779eca25 nspace=
2025-05-22T07:11:09.147+0000 7f60a4b95640 10 librados: Objecter returned from omap-set-vals r=0
I0522 07:11:09.148996 1 omap.go:159] ID: 18 Req-ID: pvc-e374f529-3756-4f9e-845a-8b1da311e572 set omap keys (pool="replicapool", namespace="", name="csi.volume.f525be9e-41b3-4bc6-bce9-4c23779eca25"): map[csi.imageid:6c70af4dc7ef])
```
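For reference, the `ceph.conf` snippet above is just the data section of a Kubernetes ConfigMap, typically mounted into the CSI pods at /etc/ceph/ceph.conf. A complete manifest might look like the sketch below; the name `ceph-config` and the namespace `rook-ceph` are assumptions here, so match them to whatever your deployment actually uses.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Assumed name/namespace; match them to your deployment.
  name: ceph-config
  namespace: rook-ceph
data:
  ceph.conf: |
    [global]
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
    rbd_validate_pool = false
    log_to_stderr = true
    debug_rbd = 20
    debug_rados = 20
```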
@Nikhil-Ladha sorry, I forgot about it. Let's close this one.
I think this is important enough to document somewhere. Maybe in a troubleshooting guide, or else in the developer guide.
@Madhu-1 what would be the steps if ceph-csi-operator is managing the deployment? Is it the same?
@Nikhil-Ladha csi-operator doesn't support it directly, but it can be achieved. That's because we have an open item in ceph-csi to support a ceph.conf per Ceph cluster instead of a single ceph.conf for all Ceph clusters.
We can also enable the librbd logs in the csi-rbdplugin container by executing the commands below in the toolbox. This works for both the rook+csi and ceph-csi-op+csi setups.

```sh
ceph config set global debug_rbd 30
ceph config set global log_to_stderr true
```
P.S.: It doesn't require a restart either 😉
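A quick way to confirm the overrides and to undo them later, again from the toolbox (these are stock ceph CLI commands, nothing specific to rook or the operator):

```sh
# confirm the overrides are in place
ceph config dump | grep -e debug_rbd -e log_to_stderr

# drop them again once the debugging session is over
ceph config rm global debug_rbd
ceph config rm global log_to_stderr
```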