Failed to attach volume
Describe the bug
We use Mayastor in a three-node cluster running MicroK8s. After a reboot, we noticed that one of our pods, the MinIO tenant, was unable to attach a PVC, while many other PVCs provisioned by Mayastor continued to work as expected.
In the events of the MinIO pod, we observed warnings indicating failures in volume attachment and mounting due to no online replicas being available and timeouts waiting for the condition.
Events:
  Type     Reason              Age                    From                     Message
  ----     ------              ----                   ----                     -------
  Warning  FailedAttachVolume  2m40s (x690 over 23h)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-aa014c14-f7b5-4b6f-8dfc-cfdc7a812661" : rpc error: code = Internal desc = Operation failed: PreconditionFailed("error in response: status code '412 Precondition Failed', content: 'RestJsonError { details: \"\", message: \"SvcError :: NoOnlineReplicas: No online replicas are available for Volume 'aa014c14-f7b5-4b6f-8dfc-cfdc7a812661'\", kind: FailedPrecondition }'")
  Warning  FailedMount         29s (x26 over 6h5m)    kubelet                  Unable to attach or mount volumes: unmounted volumes=[0], unattached volumes=[kube-api-access-7spmt 0 cfg-vol configuration]: timed out waiting for the condition
We also noticed discrepancies in the kubectl-mayastor get pools output, where two of the three pools showed a status of 'Unknown' with zero capacity (we expected 'Online' with roughly 7TiB):
ID DISKS MANAGED NODE STATUS CAPACITY ALLOCATED AVAILABLE COMMITTED
pool-on-spacekisste02 aio:///dev/nvme0n1 true spacekisste02 Online 7TiB 1.5TiB 5.4TiB 623GiB
pool-on-spacekisste01 /dev/nvme0n1 true spacekisste01 Unknown 0 B 0 B 0 B <none>
pool-on-spacekisste03 /dev/nvme0n1 true spacekisste03 Unknown 0 B 0 B 0 B <none>
Further investigation into the logs of the mayastor-io-engine pods on the affected nodes revealed pool import failures caused by an illegal byte sequence:
io-engine [2024-06-27T15:16:13.710438575+00:00 INFO io_engine::grpc::v1::pool:pool.rs:285] ImportPoolRequest { name: "pool-on-spacekisste01", uuid: None, disks: ["/dev/nvme0n1"], pooltype: Lvs }
io-engine [2024-06-27T15:16:13.718721274+00:00 ERROR io_engine::lvs::lvs_store:lvs_store.rs:306] error=EILSEQ: Illegal byte sequence, failed to import pool pool-on-spacekisste01
metrics-exporter-pool 2024-06-27T15:16:16.795647Z INFO actix_web::middleware::logger: 139.11.222.212 "GET /metrics HTTP/1.1" 200 0 "-" "Prometheus/2.47.0" 0.000118
metrics-exporter-pool at /build/cargo-vendor-dir/actix-web-4.3.1/src/middleware/logger.rs:421
Additionally, dmesg logs on these nodes displayed NVMe connection issues such as invalid data parameters and failed connection queues:
[23161.652647] nvme nvme3: Connect Invalid Data Parameter, subsysnqn "nqn.2019-05.io.openebs:595e6507-8be0-4a43-b2ce-1e141d894700"
[23161.654479] nvme nvme3: failed to connect queue: 0 ret=386
To Reproduce
Unfortunately, we have no idea how to reproduce this issue, as it appeared spontaneously after several system reboots that had previously completed without incident.
Expected behavior
We expected the pools to come back online after the reboot and the volume to attach, with none of the issues described above.
**OS info:**
- node OS: Ubuntu 20.04.6 LTS
- k8s distro: MicroK8s v1.26.15 revision 6673
- Kernel version: 5.4.0-187-generic
- Mayastor image: docker.io/openebs/mayastor-io-engine:v2.4.0
Additional context
This is currently really blocking us, so we'd be glad for any advice or suggestions. Thanks!
It seems the pool may not be present on the disk where we expect it. I noticed you created the pool with a plain device URI rather than a /dev/disk/by-id path. Could it be that the order of your disks changed?
Thanks a lot for your answer. Unfortunately, that wasn't the issue. But we'll definitely use the /dev/disk/by-id next time.
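For anyone hitting this later, here is a minimal sketch (assuming a standard udev setup; the link names depend on your drive's model and serial) for finding the stable /dev/disk/by-id alias of the device currently at /dev/nvme0n1:

```shell
# /dev/nvme0n1 can point at a different physical disk after a reboot;
# the symlinks under /dev/disk/by-id/ encode the model/serial and stay stable.
# Print the by-id alias for whatever is currently /dev/nvme0n1:
for link in /dev/disk/by-id/*; do
    if [ "$(readlink -f "$link")" = "/dev/nvme0n1" ]; then
        echo "$link"
    fi
done
```

Using the printed by-id path in the DiskPool spec avoids the device-reordering failure mode entirely.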
Unfortunately, that wasn't the issue
Alright, then something else may be at play here.
Would you be able to upload a support bundle? Example:
kubectl mayastor -n mayastor dump system
Hi @tiagolobocastro, sure, I have attached the bundle to this comment. Thanks for taking a look! mayastor-support-bundle.tar.gz
@hsteude would you be able to run a container on the node to run some metadata dump utility on the disk?
@tiagolobocastro: Yes, I can run containers on the nodes. Could you please provide the specific steps or commands you would like me to use? Unfortunately, we'll need to reset the cluster and reinstall everything tomorrow afternoon, so I would need to do this by tomorrow morning (Berlin time).
Hey, hope this is still in time. Simply run this and capture the output, thank you:
docker run --rm --name mayastor-spdk -v /dev/hugepages:/dev/hugepages --privileged -it tiagolobocastro/mayastor-spdk-blobcli:v23.05 -b nvme -D
@tiagolobocastro Thanks, is the source of that image public?
It is now: https://github.com/openebs/mayastor/tree/blobcli-23.05. It just builds blobcli and puts it in a container with a config file. If you want to build it yourself:
❯ docker load <$(nix-build -A images.mayastor-spdk)
warning: 'https://mayastor-cp.cachix.org' does not appear to be a binary cache
building '/nix/store/x2m9hr9xy0jg6qsv750vfi0vp4jd8kdw-repo_org.drv'...
patching script interpreter paths in ./git-org-name.sh
./git-org-name.sh: interpreter directive changed from "#!/usr/bin/env bash" to "/nix/store/dsd5gz46hdbdk2rfdimqddhq6m8m8fqs-bash-5.1-p16/bin/bash"
building '/nix/store/wdsqyjws8vmgd1m029bkzkl8ha2vzhmr-io-engine-version.drv'...
this derivation will be built:
/nix/store/0v5ysdn7qd65q8v4yaawik7ldc2rdj7c-docker-image-mayastor-spdk.tar.gz.drv
building '/nix/store/0v5ysdn7qd65q8v4yaawik7ldc2rdj7c-docker-image-mayastor-spdk.tar.gz.drv'...
Adding layer...
tar: Removing leading `/' from member names
Adding meta...
Cooking the image...
Finished.
Loaded image: openebs/mayastor-spdk:703fff02c8ff
Then rename/tag as necessary.
Thanks a lot @tiagolobocastro! I tried it but ran into this:
~$ docker run --rm --name mayastor-spdk -v /dev/hugepages:/dev/hugepages --privileged -it tiagolobocastro/mayastor-spdk-blobcli:v23.05 -b nvme0n1 -D
[2024-07-10 12:21:36.036047] Starting SPDK v23.05 / DPDK 22.11.1 initialization...
[2024-07-10 12:21:36.036115] [ DPDK EAL parameters: blobcli --no-shconf -c 0x1 --huge-unlink --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid1 ]
EAL: Error creating '/var/run/dpdk': No such file or directory
EAL: No DPDK runtime directory created
TELEMETRY: No legacy callbacks, legacy socket not created
[2024-07-10 12:21:36.157624] app.c: 767:spdk_app_start: *NOTICE*: Total cores available: 1
[2024-07-10 12:21:36.210992] reactor.c: 937:reactor_run: *NOTICE*: Reactor started on core 0
[2024-07-10 12:21:36.254003] accel_sw.c: 603:sw_accel_module_init: *NOTICE*: Accel framework software module initialized.
[2024-07-10 12:21:36.261860] bdev_aio.c: 110:bdev_aio_open: *ERROR*: open() failed (file:/dev/sda), errno 2: No such file or directory
[2024-07-10 12:21:36.261893] bdev_aio.c: 725:create_aio_bdev: *ERROR*: Unable to open file /dev/sda. fd: -1 errno: 2
[2024-07-10 12:21:36.265842] json_config.c: 194:rpc_client_poller: *ERROR*: error response:
{
"code": -2,
"message": "No such file or directory"
}
[2024-07-10 12:21:36.265974] app.c: 873:spdk_app_stop: *WARNING*: spdk_app_stop'd on non-zero
ERROR!
I don't know why it's looking for /dev/sda.
Sorry, I had left an entry for sda from my testing; I've pushed updates to both the branch and the image. Could you please try again?
That worked, this is the output on one of the broken nodes: mayastor-spdk-output-spaceksisste01.txt.zip
Great thanks, could you also run:
docker run --rm --name mayastor-spdk -v /dev/hugepages:/dev/hugepages --privileged -it tiagolobocastro/mayastor-spdk-blobcli:v23.05 -b nvme0n1 -s bs
@tiagolobocastro Thanks a lot for your help. I'm getting an error running that:
~$ docker run --rm --name mayastor-spdk -v /dev/hugepages:/dev/hugepages --privileged -it tiagolobocastro/mayastor-spdk-blobcli:v23.05 -b nvme0n1 -s bs
[2024-07-10 14:13:46.235060] Starting SPDK v23.05 / DPDK 22.11.1 initialization...
[2024-07-10 14:13:46.235137] [ DPDK EAL parameters: blobcli --no-shconf -c 0x1 --huge-unlink --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid1 ]
EAL: Error creating '/var/run/dpdk': No such file or directory
EAL: No DPDK runtime directory created
TELEMETRY: No legacy callbacks, legacy socket not created
[2024-07-10 14:13:46.355574] app.c: 767:spdk_app_start: *NOTICE*: Total cores available: 1
[2024-07-10 14:13:46.399475] reactor.c: 937:reactor_run: *NOTICE*: Reactor started on core 0
[2024-07-10 14:13:46.446966] accel_sw.c: 603:sw_accel_module_init: *NOTICE*: Accel framework software module initialized.
[2024-07-10 14:13:46.575475] blobstore.c:4696:bs_recover: *NOTICE*: Performing recovery on blobstore
[2024-07-10 14:13:46.575630] blobstore.c:4643:bs_load_replay_md_cpl: *NOTICE*: Recover: blob 0x0
[2024-07-10 14:13:46.575720] blobstore.c:4643:bs_load_replay_md_cpl: *NOTICE*: Recover: blob 0x1
Error in load callback (err -84)
[2024-07-10 14:13:46.584808] app.c: 873:spdk_app_stop: *WARNING*: spdk_app_stop'd on non-zero
ERROR!
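(Aside, not from the original thread: the err -84 from the load callback looks like a raw errno value. On Linux, errno 84 is EILSEQ, i.e. the same "Illegal byte sequence" the io-engine reported when the pool import failed. A quick way to confirm the mapping:)

```python
import errno
import os

# On Linux, errno 84 maps to EILSEQ ("Illegal byte sequence"),
# matching the earlier failed pool import in the io-engine logs.
print(errno.errorcode[84])  # -> 'EILSEQ' on Linux
print(os.strerror(84))
```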
On the node that works, the output looks slightly different:
~$ docker run --rm --name mayastor-spdk -v /dev/hugepages:/dev/hugepages --privileged -it tiagolobocastro/mayastor-spdk-blobcli:v23.05 -b nvme0n1 -s bs
Unable to find image 'tiagolobocastro/mayastor-spdk-blobcli:v23.05' locally
v23.05: Pulling from tiagolobocastro/mayastor-spdk-blobcli
1b4133f42d6a: Pull complete
Digest: sha256:b9db3205d095393d8b2b597cd5db9b03e5132e6e5bdfb0e0e97906a8f79d0897
Status: Downloaded newer image for tiagolobocastro/mayastor-spdk-blobcli:v23.05
[2024-07-10 14:45:27.875845] Starting SPDK v23.05 / DPDK 22.11.1 initialization...
[2024-07-10 14:45:27.875917] [ DPDK EAL parameters: blobcli --no-shconf -c 0x1 --huge-unlink --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid1 ]
EAL: Error creating '/var/run/dpdk': No such file or directory
EAL: No DPDK runtime directory created
TELEMETRY: No legacy callbacks, legacy socket not created
[2024-07-10 14:45:27.998015] app.c: 767:spdk_app_start: *NOTICE*: Total cores available: 1
[2024-07-10 14:45:28.055415] reactor.c: 937:reactor_run: *NOTICE*: Reactor started on core 0
[2024-07-10 14:45:28.098014] accel_sw.c: 603:sw_accel_module_init: *NOTICE*: Accel framework software module initialized.
[2024-07-10 14:45:28.271874] blobstore.c:1659:blob_load_cpl: *ERROR*: Metadata page 2227 crc mismatch for blobid 0x100000001
Blobstore Public Info:
Using bdev Product Name: AIO disk
API Version: 3
super blob ID: 0x100000000
page size: 4096
io unit size: 512
cluster size: 4194304
# free clusters: 1426423
blobstore type:
00000000 4c 56 4f 4c 53 54 4f 52 45 00 00 00 00 00 00 00 LVOLSTORE.......
Blobstore Private Info:
Metadata start (pages): 169
Metadata length (pages): 183141
hmm this might require a lot more debugging of the data structures to find the issue.
If you could send us a dump of the start of the disk, say 200MiB, so we can diagnose this better, that would also help. This is only metadata: by default you get 4MiB for the super block and then 1 page (4KiB) for every cluster. On your 7T disk you have 183141 metadata pages, which means the metadata lives within roughly the first 754MiB, before your data starts.
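As a back-of-the-envelope check against the blobstore output above (a sketch only; the exact extent depends on the super block and any reserved pages, so treat the numbers as approximations):

```python
# Figures taken from the "Blobstore Info" dump above.
PAGE = 4096                 # "page size: 4096"
CLUSTER = 4 * 1024 * 1024   # "cluster size: 4194304"

md_start_pages = 169        # "Metadata start (pages): 169"
md_len_pages = 183141       # "Metadata length (pages): 183141"
free_clusters = 1426423     # "# free clusters: 1426423"

# Approximate end of the metadata region, ignoring reserved pages.
md_end = (md_start_pages + md_len_pages) * PAGE
print(f"metadata region ends near {md_end / 2**20:.0f} MiB into the disk")

# Free capacity implied by the cluster count.
free = free_clusters * CLUSTER
print(f"free space: {free / 2**40:.2f} TiB")
```

The free-cluster figure works out to about 5.4TiB, which matches the AVAILABLE column of the working pool in the earlier kubectl-mayastor output.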
Alright, thanks a lot for your investigation. I'll create the dump and then proceed to nuke the drives, as we really need to get the deployment up and running again. Do you have any idea how I can easily verify that no sensitive data is included in the dump? BTW: we're really looking forward to seeing this being released: https://github.com/openebs/mayastor/issues/1064
Do you have any idea how I can easily verify that no sensitive data is included in the dump?
Create the dump like so:
> sudo dd if=/dev/nvme0n1 of=dump.io bs=1M count=200
You can run strings against the dump, example:
> strings dump.io
SPDKBLOB
LVOLSTORE
TUUP
UQ@UUAU
...................
@@Q@A
uuid54c14dab-8461-49a3-ac7d-dbf1287d47c2
namepool-nvme0n1
name08b67cce-b785-4c3c-921e-39a660167d39
uuid08b67cce-b785-4c3c-921e-39a660167d39
sharedtrue
.......................
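(If strings(1) isn't available on the node, here is a minimal Python equivalent as an illustrative sketch; the sample bytes below are hypothetical, loosely resembling the dump header:)

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return printable-ASCII runs of at least min_len bytes, like strings(1)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]

# Hypothetical sample resembling the start of the metadata dump:
sample = b"\x00\x01SPDKBLOB\xff\xffLVOLSTORE\x00namepool-nvme0n1\x00"
print(extract_strings(sample))  # ['SPDKBLOB', 'LVOLSTORE', 'namepool-nvme0n1']
```

Running this over the 200MiB dump.io file would surface any readable text (pool names, UUIDs) so you can review it before sharing.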
BTW: we're really looking forward to seeing this being released: #1064
It's released :)
Hey @hsteude did you ever collect the metadata dump?
Please reopen if required, thank you.
Hi @tiagolobocastro ,
We're facing the same problem: two out of three nodes can't bring their disks back into the pool.
Our disks are referenced by-id and there were no changes to their structure; they simply failed after a major network outage at the provider followed by a reboot.
We've tried to build your mayastor-spdk image but it fails due to some dependency missing:
https://github.com/openebs/mayastor/tree/blobcli-23.05
We've tried modifying nixos/nix to not use the latest version, trying versions 2.24.11 and 2.24.4, with exactly the same error.
Any help would be appreciated.
Best regards,
@tiagolobocastro Do we need to reopen this?
Hey there, we started facing the same issue a couple of days ago when our Kubernetes hosts restarted. On the surface, it looks to us like the hosts restarted because OpenEBS caused the kernel to crash. Since then, it has been unable to create disk pools.
The openebs-io-engine pods throw this:
2025-04-15T15:32:34.158261Z ERROR metrics_exporter_io_engine::cache::pool: Error getting pools data, invalidating pools cache, error: GrpcResponseError("status: Unavailable, message: \"error trying to connect: tcp connect error: Connection refused (os error 111)\", details: [], metadata: MetadataMap { headers: {} }")
at metrics-exporter/src/bin/io_engine/cache/pool.rs:38
2025-04-15T15:32:34.159194Z ERROR metrics_exporter_io_engine::cache::pool_stat: Error getting pools data, invalidating pools cache, error: GrpcResponseError("status: Unavailable, message: \"error trying to connect: tcp connect error: Connection refused (os error 111)\", details: [], metadata: MetadataMap { headers: {} }")
at metrics-exporter/src/bin/io_engine/cache/pool_stat.rs:38
2025-04-15T15:32:34.160005Z ERROR metrics_exporter_io_engine::cache::nexus_stat: Error getting nexus stats data, invalidating nexus stats cache, error: GrpcResponseError("status: Unavailable, message: \"error trying to connect: tcp connect error: Connection refused (os error 111)\", details: [], metadata: MetadataMap { headers: {} }")
at metrics-exporter/src/bin/io_engine/cache/nexus_stat.rs:37
2025-04-15T15:32:34.160671Z ERROR metrics_exporter_io_engine::cache::replica_stat: Error getting replica stats data, invalidating replica cache, error: GrpcResponseError("status: Unavailable, message: \"error trying to connect: tcp connect error: Connection refused (os error 111)\", details: [], metadata: MetadataMap { headers: {} }")
at metrics-exporter/src/bin/io_engine/cache/replica_stat.rs:40
2025-04-15T15:32:34.161062Z INFO actix_web::middleware::logger: 10.200.0.18 "GET /metrics HTTP/1.1" 200 0 "-" "Prometheus/2.51.2" 0.003858
at /build/cargo-vendor-dir/actix-web-4.4.0/src/middleware/logger.rs:420
Current versions:
- OS release: Ubuntu 22.04.5 LTS
- Kernel version: 5.15.0-136-generic
- OpenEBS version: 4.1.3
Our plan is to upgrade to the latest Ubuntu version and also update the kernel version too
Planned versions:
- OS release: Ubuntu 24.04.2 LTS
- Kernel version: >=5.19
- OpenEBS version: 4.1.3
If you have any idea on how this issue happens or what the solution could be to avoid this in the future, let me know.
Could you please share the logs of the io-engine container within the io-engine pod?
Here are the logs. Sorry for the color codes.
Those are the logs of the exporter container.
In the io-engine pod there are 2 containers, io-engine and the metrics-exporter-io-engine.
Please share the logs of the io-engine container, e.g.: kubectl logs io-engine-xxxxxx -c io-engine
Right, sorry about that. Here you go:
[2025-04-16T08:21:20.327356753+00:00 INFO io_engine:io-engine.rs:253] Engine responsible for managing I/Os version 1.0.0, revision 02e79b6205f4 (v2.7.3+0)
[2025-04-16T08:21:20.327637869+00:00 INFO io_engine:io-engine.rs:232] free_pages 2MB: 1024 nr_pages 2MB: 1024
[2025-04-16T08:21:20.327652901+00:00 INFO io_engine:io-engine.rs:233] free_pages 1GB: 0 nr_pages 1GB: 0
[2025-04-16T08:21:20.327932358+00:00 INFO io_engine:io-engine.rs:285] kernel io_uring support: yes
[2025-04-16T08:21:20.327941818+00:00 INFO io_engine:io-engine.rs:289] kernel nvme initiator multipath support: yes
[2025-04-16T08:21:20.327998278+00:00 INFO io_engine::core::env:env.rs:945] loading mayastor config YAML file /var/local/openebs/io-engine/config.yaml
[2025-04-16T08:21:20.328036793+00:00 INFO io_engine::subsys::config:mod.rs:189] Config file /var/local/openebs/io-engine/config.yaml is empty, reverting to default config
[2025-04-16T08:21:20.328059826+00:00 INFO io_engine::subsys::config::opts:opts.rs:169] Overriding NVMF_TCP_MAX_QUEUE_DEPTH value to '32'
[2025-04-16T08:21:20.328069884+00:00 INFO io_engine::subsys::config::opts:opts.rs:169] Overriding NVMF_TCP_MAX_QPAIRS_PER_CTRL value to '32'
[2025-04-16T08:21:20.328087066+00:00 INFO io_engine::subsys::config::opts:opts.rs:236] Overriding NVME_TIMEOUT value to '300s'
[2025-04-16T08:21:20.328096352+00:00 INFO io_engine::subsys::config::opts:opts.rs:236] Overriding NVME_TIMEOUT_ADMIN value to '30s'
[2025-04-16T08:21:20.328103404+00:00 INFO io_engine::subsys::config::opts:opts.rs:236] Overriding NVME_KATO value to '10s'
[2025-04-16T08:21:20.328133968+00:00 INFO io_engine::subsys::config:mod.rs:240] Applying Mayastor configuration settings
[2025-04-16T08:21:20.328144734+00:00 INFO io_engine::subsys::config::opts:opts.rs:363] NVMe Bdev options successfully applied
[2025-04-16T08:21:20.328152226+00:00 INFO io_engine::subsys::config::opts:opts.rs:500] Bdev options successfully applied
[2025-04-16T08:21:20.328161201+00:00 INFO io_engine::subsys::config::opts:opts.rs:637] Socket options successfully applied
[2025-04-16T08:21:20.328167961+00:00 INFO io_engine::subsys::config::opts:opts.rs:668] I/O buffer options successfully applied
[2025-04-16T08:21:20.328174644+00:00 INFO io_engine::subsys::config:mod.rs:246] Config {
source: Some(
"/var/local/openebs/io-engine/config.yaml",
),
nvmf_tcp_tgt_conf: NvmfTgtConfig {
name: "mayastor_target",
max_namespaces: 2048,
crdt: [
30,
0,
0,
],
opts: NvmfTcpTransportOpts {
max_queue_depth: 32,
max_qpairs_per_ctrl: 32,
in_capsule_data_size: 4096,
max_io_size: 131072,
io_unit_size: 131072,
max_aq_depth: 32,
num_shared_buf: 2047,
buf_cache_size: 64,
dif_insert_or_strip: false,
abort_timeout_sec: 1,
acceptor_poll_rate: 10000,
zcopy: true,
},
interface: None,
},
nvme_bdev_opts: NvmeBdevOpts {
action_on_timeout: 4,
timeout_us: 300000000,
timeout_admin_us: 30000000,
keep_alive_timeout_ms: 10000,
transport_retry_count: 0,
arbitration_burst: 0,
low_priority_weight: 0,
medium_priority_weight: 0,
high_priority_weight: 0,
nvme_adminq_poll_period_us: 1000,
nvme_ioq_poll_period_us: 0,
io_queue_requests: 0,
delay_cmd_submit: true,
bdev_retry_count: 0,
transport_ack_timeout: 0,
ctrlr_loss_timeout_sec: 0,
reconnect_delay_sec: 0,
fast_io_fail_timeout_sec: 0,
disable_auto_failback: false,
generate_uuids: true,
},
bdev_opts: BdevOpts {
bdev_io_pool_size: 65535,
bdev_io_cache_size: 512,
},
nexus_opts: NexusOpts {
nvmf_enable: true,
nvmf_discovery_enable: true,
nvmf_nexus_port: 4421,
nvmf_replica_port: 8420,
},
socket_opts: PosixSocketOpts {
recv_buf_size: 2097152,
send_buf_size: 2097152,
enable_recv_pipe: true,
enable_zero_copy_send: true,
enable_quickack: true,
enable_placement_id: 0,
enable_zerocopy_send_server: true,
enable_zerocopy_send_client: false,
zerocopy_threshold: 0,
},
iobuf_opts: IoBufOpts {
small_pool_count: 8192,
large_pool_count: 2048,
small_bufsize: 8192,
large_bufsize: 135168,
},
eal_opts: EalOpts {
reactor_mask: None,
core_list: None,
developer_delay: None,
},
}
TELEMETRY: No legacy callbacks, legacy socket not created
[2025-04-16T08:21:20.447643985+00:00 INFO io_engine::core::mempool:mempool.rs:53] Memory pool 'bdev_io_ctx' with 65535 elements (24 bytes size) successfully created
[2025-04-16T08:21:20.454508780+00:00 INFO io_engine::core::mempool:mempool.rs:53] Memory pool 'nvme_ctrl_io_ctx' with 65535 elements (72 bytes size) successfully created
[2025-04-16T08:21:20.454531207+00:00 INFO io_engine::core::env:env.rs:1017] Total number of cores available: 2
[2025-04-16T08:21:20.513842111+00:00 INFO io_engine::core::reactor:reactor.rs:189] Scheduled SPDK thread 'init_thread' (0x56042349da10) on core #4
[2025-04-16T08:21:20.513914625+00:00 INFO io_engine::core::reactor:reactor.rs:165] Init thread ID 1
[2025-04-16T08:21:20.514132917+00:00 INFO io_engine::core::reactor:reactor.rs:309] Starting reactor polling loop core=5 tid=12
[2025-04-16T08:21:20.515149207+00:00 INFO io_engine::core::env:env.rs:1052] All cores locked and loaded!
[2025-04-16T08:21:20.650573846+00:00 INFO io_engine::bdev::nexus::nexus_module:nexus_module.rs:41] Initializing Nexus CAS Module
[2025-04-16T08:21:20.650760105+00:00 INFO io_engine::core::reactor:reactor.rs:189] Scheduled SPDK thread 'mayastor_nvmf_tcp_pg_core_4' (0x56042349f580) on core #4
[2025-04-16T08:21:20.650844191+00:00 INFO io_engine::core::reactor:reactor.rs:189] Scheduled SPDK thread 'mayastor_nvmf_tcp_pg_core_5' (0x56042349fbd0) on core #5
[2025-04-16T08:21:20.653499550+00:00 INFO io_engine::core::env:env.rs:904] Using 'MY_POD_IP' environment variable for IP address for NVMF target network interface
[2025-04-16T08:21:20.653613467+00:00 INFO io_engine::subsys::nvmf::target:target.rs:263] nvmf target listening on 10.200.0.17:(4421,8420)
[2025-04-16T08:21:20.653657295+00:00 INFO io_engine::subsys::nvmf::subsystem:subsystem.rs:842] Subsystem start in progress... self=NvmfSubsystem { id: 0, subtype: "Discovery", subnqn: "nqn.2014-08.org.nvmexpress.discovery", sn: "00000000000000000000", mn: "Mayastor NVMe controller", allow_any_host: 1, ana_reporting: 0, listeners: Some([Transport ID { trtype: 3, trstring: "TCP", traddr: "10.200.0.17", trsvcid: "8420" }]) }
[2025-04-16T08:21:20.653721950+00:00 INFO io_engine::subsys::nvmf::subsystem:subsystem.rs:893] Subsystem start completed: Ok self=NvmfSubsystem { id: 0, subtype: "Discovery", subnqn: "nqn.2014-08.org.nvmexpress.discovery", sn: "00000000000000000000", mn: "Mayastor NVMe controller", allow_any_host: 1, ana_reporting: 0, listeners: Some([Transport ID { trtype: 3, trstring: "TCP", traddr: "10.200.0.17", trsvcid: "8420" }]) }
[2025-04-16T08:21:20.653745732+00:00 INFO io_engine::subsys::nvmf::target:target.rs:361] nvmf target accepting new connections and is ready to roll..💃
[2025-04-16T08:21:20.751289020+00:00 INFO io_engine::core::reactor:reactor.rs:189] Scheduled SPDK thread 'iscsi_poll_group_4' (0x56042364b6b0) on core #4
[2025-04-16T08:21:20.751545991+00:00 INFO io_engine::core::reactor:reactor.rs:189] Scheduled SPDK thread 'iscsi_poll_group_5' (0x56042364ba00) on core #5
[2025-04-16T08:21:20.751603929+00:00 INFO io_engine::core::env:env.rs:929] RPC server listening at: /var/tmp/mayastor.sock
[2025-04-16T08:21:20.751989428+00:00 WARN io_engine:io-engine.rs:90] Nexus reset is disabled
[2025-04-16T08:21:20.751997806+00:00 INFO io_engine:io-engine.rs:109] Async QPair connection feature ('spdk-async-qpair-connect') is enabled
[2025-04-16T08:21:20.752001792+00:00 INFO io_engine:io-engine.rs:110] Fault injection feature ('fault-injection') is disabled
[2025-04-16T08:21:20.773455708+00:00 INFO io_engine::persistent_store:persistent_store.rs:149] Connected to etcd on endpoint openebs-etcd:2379
[2025-04-16T08:21:20.773541862+00:00 INFO io_engine::persistent_store:persistent_store.rs:125] Persistent store operation timeout: 10s, number of retries: 300
[2025-04-16T08:21:20.773624335+00:00 INFO io_engine::grpc::server:server.rs:79] [V1] gRPC server configured at address 10.200.0.17:10124
[2025-04-16T08:21:20.773927548+00:00 INFO io_engine::subsys::registration::registration_grpc:registration_grpc.rs:211] Registering '"rpc-platform-prod-use1-1"' with grpc server 10.200.0.17:10124 ...
[2025-04-16T08:21:20.788066827+00:00 INFO io_engine::grpc::v1::pool:pool.rs:571] ImportPoolRequest { name: "diskpool-rpc-platform-prod-use1-1", uuid: None, disks: ["/dev/mayastor_vg/mayastor_lv"], pooltype: Lvs }
hmm pool import seems stuck.
Perchance your logical volume under Mayastor is suspended? Could you check with:
sudo -E dmsetup info -C -o info_all /dev/mayastor_vg/mayastor_lv
@tiagolobocastro I ran the command on our Kubernetes hosts and the logical volumes all seem to be active. The outputs are almost identical:
BlkDevName Stat Tables Suspended Read-only DevNo Maj Min Open Targ Event
dm-0 L--w Live Active Writeable 253:0 253 0 1 1 0
How odd, are the logs still the same or any further messages?