cc-oci-runtime
cc-oci-runtime copied to clipboard
networkstatic/iperf3 containers fails to run with cors
If you try to run iperf3 with runc as the runtime it works. However if the same container is run with cor as a runtime it will fail with the error
iperf3: error - unable to create a new stream: No such file or directory
This may have something to do with the setup of /tmp or an environment variable in cor.
This is looking like a 9p bug. I created a test program (attached) that makes the following calls (similar to what iperf_new_stream() in iperf3 is doing. The program works fine in a normal environment, but fails in a CC on the call to ftruncate(). Here's some sample strace output:
execve("./test_iperf3-manohar", ["./test_iperf3-manohar"], [/* 9 vars */]) = 0
uname({sysname="Linux", nodename="7e55c4113e73", ...}) = 0
brk(NULL) = 0x186b000
brk(0x186c1c0) = 0x186c1c0
arch_prctl(ARCH_SET_FS, 0x186b880) = 0
readlink("/proc/self/exe", "/shared/test_iperf3-manohar", 4096) = 27
brk(0x188d1c0) = 0x188d1c0
brk(0x188e000) = 0x188e000
fstat(1, {st_mode=S_IFREG|0644, st_size=468, ...}) = 0
gettimeofday({1471958259, 200678}, NULL) = 0
getpid() = 360
open("/tmp/foo.O2LCdw", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlink("/tmp/foo.O2LCdw") = 0
ftruncate(3, 0)
[test_iperf3-manohar.c.txt](https://github.com/01org/cc-oci-runtime/files/432478/test_iperf3-manohar.c.txt)
= -1 ENOENT (No such file or directory)
write(2, "ERROR: ftruncate(fd=3, 0) failed"..., 42ERROR: ftruncate(fd=3, 0) failed: errno=2
) = 42
write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 115DEBUG: template='/tmp/foo.XXXXXX'
SUCCESS: mkstemp(/tmp/foo.O2LCdw) worked
SUCCESS: unlink(/tmp/foo.O2LCdw) worked
) = 115
exit_group(1) = ?
+++ exited with 1 +++
A workaround is to run the following in the container before starting the app:
# mount -t ramfs -o size=20M ramfs /tmp
Problem is similar to #47.
just an update. Yes, I believe the problem is similar to #47, but different as #47 did unlink-fstat, and this one is doing unlink-ftruncate . Same fundamental issue likely (that at some point in the code path the lookup is done by the pathname and not the fd/handle and thus fails to find the unlinked file), but through a slightly different code path. This is reasonably hard to track down, and then if in the VFS layer on the client it may be reasonably hard to fix. It could be over in the qemu host side though, which is probably the first place to look (if only as it is probably a bit easier to debug)
@anthonyzxu do you have any comments on this issue?
For the record. After chasing this down for https://github.com/coreos/rkt/pull/3475, there was a patch set in test that would fix this: https://sourceforge.net/p/v9fs/mailman/v9fs-developer/thread/146659832556.15781.17414806975641516683.stgit%40bahia.lan/#msg35175775
but a race condition still exists, and those patches are still in flight and not upstreamed yet. If this issue does rise in importance then we could 'update' to those latest patches and re-evaluate.
As this is well understood, and I believe we should be able to use a ramfs /tmp as a workaround, can we drop the priority of this down from a P1 do we think @jodh-intel @dlespiau
@grahamwhaley - I concur.
The problem is that the /tmp workaround is only going to work for /tmp: iperf3 seems well-behaved in using a temporary directory, but if any other apps try to use ftruncate(2) on, say, /, they will see the problem.
As such, we'll have to document this limitation carefully. In fact, I almost wonder if we should generate some sort of warning message if these problematic 9p calls (via a patch). That way, we could audit the scope of the problem in say the top 100 images (#226). As and when a full solution is available, we can drop the 9p audit patch.
For completeness, here's some sample strace output from a successful run of the program attached in comment https://github.com/01org/cc-oci-runtime/issues/152#issuecomment-241728833:
execve("./test_iperf3-manohar", ["./test_iperf3-manohar"], [/* 111 vars */]) = 0
brk(NULL) = 0xa63000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974cb8000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=143417, ...}) = 0
mmap(NULL, 143417, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4974c94000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\10\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2089496, ...}) = 0
mmap(NULL, 3938656, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f49746d4000
mprotect(0x7f497488d000, 2093056, PROT_NONE) = 0
mmap(0x7f4974a8c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b8000) = 0x7f4974a8c000
mmap(0x7f4974a92000, 14688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4974a92000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974c93000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974c92000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974c91000
arch_prctl(ARCH_SET_FS, 0x7f4974c92700) = 0
mprotect(0x7f4974a8c000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ) = 0
mprotect(0x7f4974cb9000, 4096, PROT_READ) = 0
munmap(0x7f4974c94000, 143417) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
brk(NULL) = 0xa63000
brk(0xa84000) = 0xa84000
brk(NULL) = 0xa84000
write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 34DEBUG: template='/tmp/foo.XXXXXX'
) = 34
getpid() = 438
open("/tmp/foo.dewhr0", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
write(1, "SUCCESS: mkstemp(/tmp/foo.dewhr0"..., 41SUCCESS: mkstemp(/tmp/foo.dewhr0) worked
) = 41
unlink("/tmp/foo.dewhr0") = 0
write(1, "SUCCESS: unlink(/tmp/foo.dewhr0)"..., 40SUCCESS: unlink(/tmp/foo.dewhr0) worked
) = 40
ftruncate(3, 0) = 0
write(1, "SUCCESS: ftruncate(fd=3, 0) work"..., 35SUCCESS: ftruncate(fd=3, 0) worked
) = 35
close(3) = 0
exit_group(0) = ?
+++ exited with 0 +++
@jcvenegas are any updates about this?
@GabyCT not so far, I did not take a look, I think the last update is @grahamwhaley research.
Right, I peeked at this issue last week and thought we should be able to 'fix' it by taking the latest kernel and QEMU patchsets for the 9pfs unlink fixes. @GabyCT , if you want to develop the tests then I think you could use the ramfs tmp mount 'trick' listed early in the ticket. I'll at least open an Issue to note we need to check and probably update the 9pfs code.
This sounds like a gnarly one to fix. But how about we downgrade to P2 given that there is a workaround of sorts?
I could agree to downgrade to P2. I'm not going to get to look at the patch upgrades in the immediate future. Worth pulling in @anthonyzxu for his thoughts in case it is a work in progress already?
Can you test with the following changes in /usr/share/defaults/cc-oci-runtime/hypervisor.args:
In the qemu command line, following option is being used:
fsdev local,id=test_dev,path=
Instead please use the following:
-fsdev handle,id=test_dev,path=
Manohar saw the issue being fixed with the above change.
@rarindam no, this issue still exists after your suggested method is used.
-fsdev local,id=extra-9p-hyperShared,path=/tmp/hyper/shared/pods/7cefd7d5746cef74f9fb04c9f246b05bf6ac7e0bc38c0457115f4d443307546e,security_model=none
Hi @wuzhy , -fsdev local should be -fsdev handle. That is what fixed the issue for Manohar.
@rarindam Sorry for the late reply, Just i tried, and found that Clear Container failed when "-fsdev handle" was used. Below is the log:
2017-08-01 01:38:22.896147955 +0800 CST:7266:cc-runtime:error:Container still running, should be stopped
@wuzhy, it has be some other issue other than the option I provided. You have to remove security_model option if using handle, instead of local. Please try it out. I am providing this update from my end:
- -fsdev handle certainly is helping me run the test-iperf code with success: DEBUG: template='/tmp/foo.XXXXXX' SUCCESS: mkstemp(/tmp/foo.mqbCsO) worked SUCCESS: unlink(/tmp/foo.mqbCsO) worked SUCCESS: ftruncate(fd=3, 0) worked
@mcastelino 2. This issue seems to be actively tracked here: https://bugs.launchpad.net/qemu/+bug/1336794 Here instead of ftruncate they are using fstat, hitting the same issue. I tried the solutions provided, but that does not fix out issue. In fact this solution itself is not committed hence the bug is still open.
-
Using 9pVhost: The same test also passes if I use 9pvhost. The issue is not seen there.
-
In my testing, I could see the following: After UNLINK is executed, ftruncate is implemented using setattr (with set size flag). In the case when 9p-local is used as server, this results in error:
[ 26.551849] 9pnet: (00000550) >>> TSETATTR fid 4
[ 26.551854] 9pnet: (00000550) valid=68 mode=0 uid=2 gid=0 size=0
atime_sec=1501796060 atime_nsec=0
mtime_sec=1501796060 mtime_nsec=0
[ 26.551872] 9pnet: -- p9_client_prepare_req (550): client ffff9359f473f000 op 26
[ 26.552121] 9pnet: -- p9_idpool_get (550): id 1 pool ffff9359f2c1ac00
[ 26.552127] 9pnet: (00000550) >>> size=67 type: 26 tag: 1
[ 26.552132] 9pnet: -- p9_virtio_request (550): 9p debug: virtio request
[ 26.552149] 9pnet: -- p9_virtio_request (550): virtio request kicked
[ 26.552424] 9pnet: -- req_done (0): : request done
[ 26.552438] 9pnet: -- p9_client_cb (0): tag 1
[ 26.552454] 9pnet: -- p9_client_cb (0): wakeup: 1
[ 26.552505] 9pnet: (00000550) <<< size=11 type: 7 tag: 1
[ 26.552507] 9pnet: (00000550) <<< RLERROR (-2)
[ 26.552510] 9pnet: -- p9_free_req (550): clnt ffff9359f473f000 req ffff9359f2c0c070 tag: 1
[ 26.552512] 9pnet: -- p9_idpool_put (550): id 1 pool ffff9359f2c1ac00
[ 26.555121] 9pnet: -- v9fs_dir_release (550): inode: ffff9359f18319e0 filp: ffff9359f6300200 fid: 4
[ 26.555124] 9pnet: (00000550) >>> TCLUNK fid 4 (try 0)
But if we use 9p-handle as server, everything works fine: [ 74.256036] 9pnet: -- v9fs_fid_find (554): dentry: foo.fk2Whc (ffff905ff4db4300) uid 0 any 0 [ 74.256037] 9pnet: -- v9fs_fid_find_inode (554): inode: ffff905ff2025750 [ 74.256038] 9pnet: (00000554) >>> TSETATTR fid 4 [ 74.256040] 9pnet: (00000554) valid=68 mode=0 uid=2 gid=0 size=0 atime_sec=1501795991 atime_nsec=0 mtime_sec=1501795991 mtime_nsec=0 [ 74.256041] 9pnet: -- p9_client_prepare_req (554): client ffff905ff42e5000 op 26 [ 74.256043] 9pnet: -- p9_idpool_get (554): id 1 pool ffff905ff5251800 [ 74.256045] 9pnet: (00000554) >>> size=67 type: 26 tag: 1 [ 74.256045] 9pnet: -- p9_virtio_request (554): 9p debug: virtio request [ 74.256053] 9pnet: -- p9_virtio_request (554): virtio request kicked [ 74.256275] 9pnet: -- req_done (0): : request done [ 74.256277] 9pnet: -- p9_client_cb (0): tag 1 [ 74.256283] 9pnet: -- p9_client_cb (0): wakeup: 1 [ 74.256315] 9pnet: (00000554) <<< size=7 type: 27 tag: 1 [ 74.256316] 9pnet: (00000554) <<< RSETATTR fid 4
execve("./test_iperf3-manohar", ["./test_iperf3-manohar"], [/* 25 vars */]) = 0 brk(NULL) = 0x55909c273000 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1d50817000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=13660, ...}) = 0 mmap(NULL, 13660, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f1d50813000 close(3) = 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\5\2\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=1856752, ...}) = 0 mmap(NULL, 3959200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f1d5022e000 mprotect(0x7f1d503eb000, 2097152, PROT_NONE) = 0 mmap(0x7f1d505eb000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7f1d505eb000 mmap(0x7f1d505f1000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f1d505f1000 close(3) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1d50811000 arch_prctl(ARCH_SET_FS, 0x7f1d50811700) = 0 mprotect(0x7f1d505eb000, 16384, PROT_READ) = 0 mprotect(0x55909be50000, 4096, PROT_READ) = 0 mprotect(0x7f1d5081a000, 4096, PROT_READ) = 0 munmap(0x7f1d50813000, 13660) = 0 fstat(1, {st_mode=S_IFREG|0644, st_size=1633, ...}) = 0 brk(NULL) = 0x55909c273000 brk(0x55909c295000) = 0x55909c295000 getpid() = 560 open("/tmp/foo.F4F4ul", O_RDWR|O_CREAT|O_EXCL, 0600) = 3 unlink("/tmp/foo.F4F4ul") = 0 ftruncate(3, 0) = 0 close(3) = 0 write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 150DEBUG: template='/tmp/foo.XXXXXX' SUCCESS: mkstemp(/tmp/foo.F4F4ul) worked SUCCESS: unlink(/tmp/foo.F4F4ul) worked SUCCESS: ftruncate(fd=3, 0) worked ) = 150 exit_group(0) = ? +++ exited with 0 +++
@gorozco1 @devimc we need to also run DPDK pmd to verify that all the issues have been fixed.
still occurs using device driver overlay2
@devimc , have you tested this with 9p-handle instead of 9p-local?
@rarindam nop, but I can try I tested with qemu 2.10.1 to see if it was fixed, but still occurs 😢
./test_iperf3-manohar DEBUG: template='/tmp/foo.XXXXXX' SUCCESS: mkstemp(/tmp/foo.mDEbT7) worked SUCCESS: unlink(/tmp/foo.mDEbT7) worked SUCCESS: ftruncate(fd=3, 0) worked
Kernel 4.14, Qemu 2.10: -fsdev local,id=fsdev0,security_model=none,path=/tmp/9p_shared -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=qtest
execve("./test_iperf3-manohar",
["./test_iperf3-manohar"], [/* 28 vars */]) = 0 brk(NULL) = 0x55892f9fe000 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fab98d1e000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=13660, ...}) = 0 mmap(NULL, 13660, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fab98d1a000 close(3) = 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\5\2\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=1856752, ...}) = 0 mmap(NULL, 3959200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fab98735000 mprotect(0x7fab988f2000, 2097152, PROT_NONE) = 0 mmap(0x7fab98af2000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7fab98af2000 mmap(0x7fab98af8000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fab98af8000 close(3) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fab98d18000 arch_prctl(ARCH_SET_FS, 0x7fab98d18700) = 0 mprotect(0x7fab98af2000, 16384, PROT_READ) = 0 mprotect(0x55892dec7000, 4096, PROT_READ) = 0 mprotect(0x7fab98d21000, 4096, PROT_READ) = 0 munmap(0x7fab98d1a000, 13660) = 0 fstat(1, {st_mode=S_IFREG|0644, st_size=1633, ...}) = 0 brk(NULL) = 0x55892f9fe000 brk(0x55892fa20000) = 0x55892fa20000 getpid() = 426 open("/tmp/foo.1zEZ5T", O_RDWR|O_CREAT|O_EXCL, 0600) = 3 unlink("/tmp/foo.1zEZ5T") = 0 ftruncate(3, 0) = 0 close(3) = 0 write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 150DEBUG: template='/tmp/foo.XXXXXX' SUCCESS: mkstemp(/tmp/foo.1zEZ5T) worked SUCCESS: unlink(/tmp/foo.1zEZ5T) worked SUCCESS: ftruncate(fd=3, 0) worked ) = 150 exit_group(0) = ? +++ exited with 0 +++
@rarindam what kernel did you use? would you mind running the iperf test?
in clear containers this is the cmdline:
-device virtio-9p-pci,fsdev=ctr-9p-0,mount_tag=ctr-rootfs-0 -fsdev local,id=ctr-9p-0,path=/var/lib/docker/overlay2/232a11afb1dd6db15ef9a79d039c15ffdf85b09fea336420f5eb31eb186db2af/merged,security_model=none
@devimc Kernel 4.14, Qemu 2.10: -fsdev local,id=fsdev0,security_model=none,path=/tmp/9p_shared -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=qtest