Storage: Fix PowerFlex apparmor warnings
While working on the SDC mode in PowerFlex, I was closely following the dmesg output throughout the entire lxd-ci tests/storage-vm test suite (when executed in NVMe/TCP mode) and found some apparmor warnings that, interestingly, do not cause any errors on the LXD side but do indicate some missing access.
They only appear from time to time and do not happen consistently. The IPs/ports listed in the error messages correspond to the NVMe/TCP connections to the PowerFlex SDTs:
This PR adds additional rules to both the rsync and qemu-img profiles to mitigate the following errors. At least one of the warnings was reported, unrelated to PowerFlex, in https://github.com/canonical/lxd/issues/13585.
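The description doesn't include the actual diff; as a rough illustration only (the profile name and the coarse rule granularity here are assumptions, not the PR's actual rules), allowing the confined helpers to open outbound TCP sockets could look something like:

```
# Hypothetical AppArmor fragment: permit the confined rsync/qemu-img
# tasks to use TCP sockets, since the kernel attributes the NVMe/TCP
# traffic to the calling task rather than to the block layer.
profile lxd_rsync flags=(attach_disconnected) {
  # ... existing file/exec rules ...
  network inet stream,   # IPv4 TCP (NVMe/TCP connections to the SDTs)
  network inet6 stream,  # IPv6 TCP
}
```

Note this grants all TCP connections, not just the SDT endpoints, which is part of the security concern raised later in the thread.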
Great observation, Julian!
> found some apparmor warnings that interestingly do not cause any errors on LXD side but indicate some missing access.
This is surprising, because these errors come from NVMe over TCP internals. It means the block device driver fails to perform some network requests, which should prevent any IO (except when the data is cached).
It is also a bit odd that an LSM controls and checks network access for a kernel block device. This is something we should look into from the kernel side, I believe.
@roosterfish I'm a bit confused why rsync is being blocked from sending data directly to an IP; normally we wrap rsync's network traffic inside a websocket, so the rsync process itself isn't sending to a remote IP.
In what situations are you getting these warnings?
> In what situations are you getting these warnings?
I have updated the PR's description to clarify this. The IPs/ports are from the target systems (PowerFlex SDTs) the host connects to via NVMe/TCP.
So is rsync directly communicating with PowerFlex, or is Linux interpreting a write to a locally mapped NVMe over TCP block device as a message being sent to the remote server?
> So is rsync directly communicating with powerflex or is Linux interpreting a write to a locally mapped nvme over TCP block device as a message being sent to the remote server?
It's presumably the latter as the PowerFlex driver doesn't have any specific logic when doing the rsync. It also uses the same generic functions we have in LXD for volume transfer.
static checks not happy
> It's presumably the latter as the PowerFlex driver doesn't have any specific logic when doing the rsync. It also uses the same generic functions we have in LXD for volume transfer.
Thanks. That feels like a layering violation to me: every program that accesses the device shouldn't need to be explicitly allowed to send packets to the mapped device's endpoint. After all, the programs themselves are not sending the packets, the underlying OS is. What do you think @mihalicyn ?
> That feels like a layering violation to me, as it shouldn't be needed for every program that accesses the device to need to be explicitly allowed to send packets to the mapped device's endpoint, after all the programs themselves are not sending the packets, but the underlying OS. What do you think @mihalicyn ?
This is what I've said above ;-)
This is something we should investigate from the kernel side.
> This is something we should investigate from the kernel side.
OK so we should not add it to the apparmor policy yet then?
> OK so we should not add it to the apparmor policy yet then?
I think we are left with no choice and have to add this. What I cannot understand is why nothing is failing on the LXD side.
If rsync/qemu-img are failing on the network requests produced by the NVMe block device, that should make the device faulty and cause EIO. But for some reason it works. Why? Page cache? I would try to put something like `echo 3 > /proc/sys/vm/drop_caches` just before each qemu-img/rsync call and retest whether it breaks things (it should!). If not, I guess we must dive in and understand how it keeps working with such critical network errors, and why.
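The suggested check can be sketched as a small shell wrapper (a sketch only; the function name and the plain `echo` messages are illustrative, not part of any LXD tooling):

```shell
# Drop the page cache (plus dentries/inodes) before a copy operation,
# so a failing NVMe/TCP request cannot be hidden by cached data and
# should surface as an EIO on the next read.
drop_caches() {
    # Value 3 = free page cache + dentries + inodes.
    # One-shot: caching resumes immediately afterwards.
    if [ -w /proc/sys/vm/drop_caches ]; then
        sync                               # flush dirty pages first
        echo 3 > /proc/sys/vm/drop_caches
        echo "caches dropped"
    else
        echo "need root to drop caches"
    fi
}

drop_caches
# ...the qemu-img convert / rsync invocation under test would follow here
```

Writing to `/proc/sys/vm/drop_caches` requires root and a writable procfs, hence the guard.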
> I would try to put something like `echo 3 > /proc/sys/vm/drop_caches` just before each `qemu-img`/`rsync` call and retest
I will test this, thanks for the suggestion. Which value do I have to echo into `/proc/sys/vm/drop_caches` afterwards to reset this?
> Which value do I have to echo into `/proc/sys/vm/drop_caches` afterwards to reset this?
Ah, good question: you don't need to echo anything afterwards. `echo 3 > /proc/sys/vm/drop_caches` is a one-shot operation. It drops all the caches once but does not disable caching.
I think we need to confirm whether anything is actually breaking before we proceed with this change. Otherwise we end up weakening the apparmor profile we use when calling rsync for "local" copies, allowing it to make network connections that should be unnecessary, and this applies to all storage drivers, not just PowerFlex.
If it's a kernel bug and/or not causing any actual problems, then we shouldn't need to work around it in LXD at the expense of reduced security.
@mihalicyn I have put this right in front of the qemu-img and rsync operations. The errors in the kernel's log do not look any different, and none of the errors get propagated to the caller of qemu-img/rsync:
@roosterfish shall we close this?
> @roosterfish shall we close this?
As it's clearly not causing any issues/errors on the LXD side, I will close it for now.