Trident fails to properly remove the multipath device for a PV on a Docker Swarm worker
Describe the bug When a persistent volume is mounted by a service on a Swarm worker node and we modify the Swarm service to mount another persistent volume, this can cause high I/O wait on the first Swarm worker when we remove the first persistent volume.
Environment We use Trident + ONTAP Select iSCSI to consume persistent volumes for our services on Docker Swarm clusters.
- Trident version: 23.04.0
- Docker version : 20.10.16
- Trident installation flags used: [e.g. -d -n trident --use-custom-yaml]
- Container runtime: Docker version 25.0.3, build 4debf41
- Docker Swarm mode
- OS: Rocky Linux release 9.3 (Blue Onyx)
- NetApp backend types: NetApp Release 9.8P6
- Other:
To Reproduce Steps to reproduce the behavior: start.sh creates a Docker Swarm service with a persistent volume
# Volumes
export SERVICE_TEST_VOLUME=TestVolume1
export SERVICE_TEST_VOLUME_SIZE='1gb'
# 'docker volume inspect' prints just "[]" (3 bytes) when the volume does not exist,
# so more than 3 bytes of output means the volume is already there.
vol1=$(docker volume inspect "$SERVICE_TEST_VOLUME" | wc -c)
if [ "$vol1" -gt 3 ]
then
    echo "$SERVICE_TEST_VOLUME exists"
else
    echo "Creating volume $SERVICE_TEST_VOLUME"
    docker volume create --driver=netapp --name="$SERVICE_TEST_VOLUME" -o size="$SERVICE_TEST_VOLUME_SIZE" -o fileSystemType=ext4 -o spaceReserve=volume
    docker run --rm -v "$SERVICE_TEST_VOLUME":/data busybox rmdir /data/lost+found
fi
docker stack deploy -c docker-compose.yml --resolve-image=always --prune --with-registry-auth SERVICE_TEST
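The docker-compose.yml consumed by the stack deploy above is not included in the report; a minimal hypothetical equivalent (my assumption, not the reporter's actual file) could be generated from start.sh as a heredoc so the exported volume name is reused:
# Hypothetical compose file (assumption: not the reporter's actual docker-compose.yml).
# The service mounts the pre-created netapp volume; "external: true" keeps the stack
# from trying to create or rename it.
cat > docker-compose.yml <<EOF
version: "3.8"
services:
  test:
    image: busybox
    command: sleep 3600
    volumes:
      - ${SERVICE_TEST_VOLUME}:/data
volumes:
  ${SERVICE_TEST_VOLUME}:
    external: true
EOF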
We deploy this service on our Swarm cluster. The Swarm manager starts this service on worker node A:
[root@nodeA:~]# mount |grep testv
/dev/mapper/3600a098056303030313f526b682f4279 on /local/docker-data/plugins/b8fe688a4fd41d4af97f5de3ce33dee1f7f862d89ba982eec79bf5c785b93c9c/propagated-mount/netappdvp_testvolume type ext4 (rw,relatime,stripe=16)
/dev/mapper/3600a098056303030313f526b682f4279 on /local/docker-data/plugins/b8fe688a4fd41d4af97f5de3ce33dee1f7f862d89ba982eec79bf5c785b93c9c/propagated-mount/netappdvp_testvolume type ext4 (rw,relatime,stripe=16)
[root@nodeA:~]# multipath -ll
3600a098056303030313f526b682f4279 dm-8 NETAPP,LUN C-Mode
size=954M features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 4:0:0:227 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
`- 3:0:0:227 sdb 8:16 active ready running
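As a side check (not part of the original report), the mapper device seen in the mount output can be correlated with the SCSI paths listed by multipath:
# Sketch: walk from the dm map down to its member paths (sdb/sdc above).
lsblk -s -o NAME,TYPE,SIZE,MOUNTPOINT /dev/mapper/3600a098056303030313f526b682f4279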
Then we modify the volume name to TestVolume2 and redeploy the service
export SERVICE_TEST_VOLUME=TestVolume2
The service is stopped on node A. NetApp Trident creates a new volume, TestVolume2. The service is started on another Swarm worker node: node B.
On node A we can no longer see TestVolume1 with "mount | grep TestVolume1", but there is still multipath information for it on node A:
[root@nodeA:~]# mount |grep testv
[root@nodeA:~]# multipath -ll
3600a098056303030313f526b682f4279 dm-8 NETAPP,LUN C-Mode
size=954M features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 4:0:0:227 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
`- 3:0:0:227 sdb 8:16 active ready running
Then, on one of the Swarm managers, we run "docker volume rm TestVolume1":
[root@nodeA:~]# multipath -ll
3600a098056303030313f526b682f4279 dm-8 NETAPP,LUN C-Mode
size=954M features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 4:0:0:227 sdc 8:32 **failed faulty running**
`-+- policy='service-time 0' prio=0 status=enabled
`- 3:0:0:227 sdb 8:16 **failed faulty running**
[root@nodeA:~]# top
top - 18:28:57 up 1 day, 2:02, 2 users, load average: 0.80, 0.30, 0.10
Tasks: 310 total, 1 running, 309 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.3 sy, 0.0 ni, 82.9 id, **16.6 wa**, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7656.0 total, 5421.9 free, 1101.3 used, 1402.1 buff/cache
MiB Swap: 6144.0 total, 6144.0 free, 0.0 used. 6554.7 avail Mem
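A diagnostic sketch (not from the original report) to confirm that the remaining I/O wait comes from the now-dead paths: the path checker keeps probing the deleted LUN, which shows up in the kernel log and in multipathd's journal.
# Diagnostic sketch: look for I/O errors and failing-path messages after the volume removal.
dmesg --ctime | grep -iE 'I/O error|multipath' | tail -n 20
journalctl -u multipathd --since "10 minutes ago" | tail -n 20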
To remove the high I/O wait we have to use the dmsetup command (a cleanup sketch follows the terminal output below):
[root@nodeA:~]# dmsetup -f remove 3600a098056303030313f526b682f4279
[root@nodeA:~]# multipath -ll
[root@nodeA:~]# top
top - 18:29:50 up 1 day, 2:03, 2 users, load average: 0.97, 0.43, 0.16
Tasks: 306 total, 1 running, 305 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.0 us, 1.9 sy, 0.0 ni, 97.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7656.0 total, 5454.4 free, 1070.0 used, 1400.7 buff/cache
MiB Swap: 6144.0 total, 6144.0 free, 0.0 used. 6586.0 avail Mem
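For reference, a cleanup sketch along the same lines (an assumption, not from the original report): flush the stale map with the multipath tooling first, fall back to dmsetup if that fails, and delete the orphaned SCSI paths so the checker stops retrying them.
# Cleanup sketch (assumptions: the map below is no longer mounted anywhere and its only
# paths are sdb and sdc, as shown in the multipath -ll output above).
MAP=3600a098056303030313f526b682f4279
multipath -f "$MAP" || dmsetup remove -f "$MAP"
for dev in sdb sdc; do
    echo 1 > "/sys/block/$dev/device/delete"
done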
Expected behavior Trident should clear the multipath device for an unused persistent volume before deleting the volume on the ONTAP backend. It is not clear to me whether Docker Swarm must call the Trident plugin on each Swarm worker to do this, or whether Docker Swarm only has to call the Trident plugin on a Swarm manager, which would then have to call the Trident plugins on every Swarm worker node.
Additional context
I dug into the Trident source code, especially func Unmount() in the Trident plugin.go file.
There is a comment there which says something different from the Docker volume plugin specification (link to the Unmount function in plugin.go); here are the two comment lines:
// No longer detaching and removing iSCSI session here because it was causing issues with 'docker cp'.
// See https://github.com/moby/moby/issues/34665
Comments in the moby issue explain that the storage plugin should count each time a volume is mounted into a container, so that it unmounts the volume from the system only when no container is using it any more, rather than never detaching and removing the iSCSI session as this comment implies.
The Docker documentation for volume plugins also states that the plugin has to count the number of times it is called to mount one particular volume: https://docs.docker.com/engine/extend/plugins_volume/#volumedrivermount
The more I look into the code and activate the debug logs, the more I think there is an issue with the Trident plugin: it should remove the iSCSI device mount when no container is using a particular volume on a node any more (see the observational sketch below).
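To make the expected reference-counting behaviour concrete, here is an observational sketch (my own test idea, assuming the netapp plugin and TestVolume1 from the reproduction steps): the host mount should survive while at least one container still uses the volume, and both the mount and the multipath map should disappear once the last container is gone.
# Observational sketch (assumption: the netapp plugin and TestVolume1 already exist).
docker run -d --name ref1 -v TestVolume1:/data busybox sleep 3600
docker run -d --name ref2 -v TestVolume1:/data busybox sleep 3600
docker rm -f ref1
mount | grep -i testvolume        # still mounted: ref2 keeps a reference
docker rm -f ref2
mount | grep -i testvolume        # expected: gone once no container uses the volume
multipath -ll                     # expected: no stale map left behind either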
Could someone tell me if I am right?
I managed to activate Trident plugin logs with
docker plugin set netapp:latest debug=true
and by adding to /etc/netappdvp/config.json
"debugTraceFlags": {"api":true, "method":true}
I can see all calls to the NetApp API, but I still cannot see some of the debug logs that I see in the source code, for instance https://github.com/NetApp/trident/blob/master/utils/mount_linux.go#L338-L339 or https://github.com/NetApp/trident/blob/master/frontend/docker/plugin.go#L487C2-L495C58. Is there any way to activate these logs, and where could I see them?
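(Editorial note, an assumption rather than part of the original exchange: on a systemd host, dockerd normally forwards a managed plugin's stdout/stderr to its own journal, tagged with the plugin ID, so that is one place to look for the trace output.)
# Sketch, assuming a systemd host where dockerd forwards managed-plugin output to journald.
PLUGIN_ID=$(docker plugin inspect --format '{{.Id}}' netapp:latest)
journalctl -u docker.service --since "10 minutes ago" | grep "$PLUGIN_ID"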
Hi @vlgenf71, Thanks for the detailed analysis and the references. From the moby issue I see that the conclusion is that Docker has to handle plugin failure scenarios. Another comment says this workflow (not removing iSCSI connections until volume deletion) causes challenges in a Swarm environment. Nevertheless, it appears the current workflow resolves one issue but may have challenges with Swarm. It may require more debate and prioritization to address in the Trident plugin implementation.
And, regarding the debug logs at https://github.com/NetApp/trident/blob/master/utils/mount_linux.go#L338-L339: this is in the CSI workflow and not in Docker. I can see the logs with my workflows.
https://github.com/NetApp/trident/blob/master/frontend/docker/plugin.go#L487C2-L495C58 For these logs to be visible, you might want to change --disable_audit_log to false while installing Trident, or you can edit daemonset.apps/trident-node-linux to set it to false. By default this flag is set to true, and hence you could not see those logs in the Docker workflow.
And I see you have been using a very old Trident image, 20.10.16; would you mind updating to the latest, since there have been changes in Trident 23.10 and later in the iSCSI strengthening and multipath device removal areas.
Thanks.
Hi @mravi-na, Thank you for your answer.
How can I set "--disable_audit_log" to false?
I deploy the Trident plugin with this command:
docker plugin install --grant-all-permissions --alias netapp netapp/trident-plugin:23.07.1 config=config.json
I made a typo in my post: 20.10.16 is the Docker version I use; I deployed Trident plugin version 23.07.1 :-)
Hi @mravi-na,
Once again, thank you for the time you spent giving me an answer.
It's not clear to me why the Umount() func in mount_linux.go would not be called in Docker plugin mode:
https://github.com/NetApp/trident/blob/master/utils/mount_linux.go#L338-L339
> This is in the CSI workflow and not in Docker. I can see the logs with my workflows.
I understand that the entry point of the Trident plugin unmount path is this Unmount() function: https://github.com/NetApp/trident/blob/master/frontend/docker/plugin.go#L484-L515
This function calls p.orchestrator.DetachVolume.
Then I can see that utils.Umount(ctx, mountpoint) is called: https://github.com/NetApp/trident/blob/master/core/orchestrator_core.go#L3626
I can see only 3 implementations of this utils.Umount function; the one in the mount_linux.go file seems the most likely to me:
utils/mount_darwin.go: func Umount(ctx context.Context, mountpoint string) (err error) {
utils/mount_linux.go: func Umount(ctx context.Context, mountpoint string) (err error) {
utils/mount_windows.go: func Umount(ctx context.Context, mountpoint
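For anyone who wants to reproduce this search, a sketch against a local clone of the repository:
# Sketch: list the Umount implementations in a local checkout of the Trident repository.
git clone --depth 1 https://github.com/NetApp/trident.git
grep -rn --include='*.go' 'func Umount(ctx' trident/utils/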
Hi @vlgenf71 Sorry for the confusion :( I meant to say that I tested in CSI workflows and I could see the debug logs. I have not tested in a Docker setup yet.
Hi @vlgenf71 Could you please confirm if this is still an issue with Trident 25.10?