cc-oci-runtime icon indicating copy to clipboard operation
cc-oci-runtime copied to clipboard

networkstatic/iperf3 containers fails to run with cors

Open mcastelino opened this issue 9 years ago • 27 comments

If you try to run iperf3 with runc as the runtime it works. However if the same container is run with cor as a runtime it will fail with the error

iperf3: error - unable to create a new stream: No such file or directory

This may have something to do with the setup of /tmp or an environment variable in cor.

mcastelino avatar Aug 22 '16 22:08 mcastelino

This is looking like a 9p bug. I created a test program (attached) that makes the following calls (similar to what iperf_new_stream() in iperf3 is doing. The program works fine in a normal environment, but fails in a CC on the call to ftruncate(). Here's some sample strace output:

execve("./test_iperf3-manohar", ["./test_iperf3-manohar"], [/* 9 vars */]) = 0
uname({sysname="Linux", nodename="7e55c4113e73", ...}) = 0
brk(NULL)                               = 0x186b000
brk(0x186c1c0)                          = 0x186c1c0
arch_prctl(ARCH_SET_FS, 0x186b880)      = 0
readlink("/proc/self/exe", "/shared/test_iperf3-manohar", 4096) = 27
brk(0x188d1c0)                          = 0x188d1c0
brk(0x188e000)                          = 0x188e000
fstat(1, {st_mode=S_IFREG|0644, st_size=468, ...}) = 0
gettimeofday({1471958259, 200678}, NULL) = 0
getpid()                                = 360
open("/tmp/foo.O2LCdw", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlink("/tmp/foo.O2LCdw")               = 0
ftruncate(3, 0)                       
[test_iperf3-manohar.c.txt](https://github.com/01org/cc-oci-runtime/files/432478/test_iperf3-manohar.c.txt)
  = -1 ENOENT (No such file or directory)
write(2, "ERROR: ftruncate(fd=3, 0) failed"..., 42ERROR: ftruncate(fd=3, 0) failed: errno=2
) = 42
write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 115DEBUG: template='/tmp/foo.XXXXXX'
SUCCESS: mkstemp(/tmp/foo.O2LCdw) worked
SUCCESS: unlink(/tmp/foo.O2LCdw) worked
) = 115
exit_group(1)                           = ?
+++ exited with 1 +++

A workaround is to run the following in the container before starting the app:

# mount -t ramfs -o size=20M ramfs /tmp

jodh-intel avatar Aug 23 '16 13:08 jodh-intel

Problem is similar to #47.

jodh-intel avatar Aug 23 '16 13:08 jodh-intel

just an update. Yes, I believe the problem is similar to #47, but different as #47 did unlink-fstat, and this one is doing unlink-ftruncate . Same fundamental issue likely (that at some point in the code path the lookup is done by the pathname and not the fd/handle and thus fails to find the unlinked file), but through a slightly different code path. This is reasonably hard to track down, and then if in the VFS layer on the client it may be reasonably hard to fix. It could be over in the qemu host side though, which is probably the first place to look (if only as it is probably a bit easier to debug)

grahamwhaley avatar Sep 08 '16 17:09 grahamwhaley

@anthonyzxu do you have any comments on this issue?

gorozco1 avatar Sep 27 '16 22:09 gorozco1

For the record. After chasing this down for https://github.com/coreos/rkt/pull/3475, there was a patch set in test that would fix this: https://sourceforge.net/p/v9fs/mailman/v9fs-developer/thread/146659832556.15781.17414806975641516683.stgit%40bahia.lan/#msg35175775

but a race condition still exists, and those patches are still in flight and not upstreamed yet. If this issue does rise in importance then we could 'update' to those latest patches and re-evaluate.

grahamwhaley avatar Dec 20 '16 10:12 grahamwhaley

As this is well understood, and I believe we should be able to use a ramfs /tmp as a workaround, can we drop the priority of this down from a P1 do we think @jodh-intel @dlespiau

grahamwhaley avatar Dec 20 '16 11:12 grahamwhaley

@grahamwhaley - I concur.

The problem is that the /tmp workaround is only going to work for /tmp: iperf3 seems well-behaved in using a temporary directory, but if any other apps try to use ftruncate(2) on, say, /, they will see the problem.

As such, we'll have to document this limitation carefully. In fact, I almost wonder if we should generate some sort of warning message if these problematic 9p calls (via a patch). That way, we could audit the scope of the problem in say the top 100 images (#226). As and when a full solution is available, we can drop the 9p audit patch.

jodh-intel avatar Dec 20 '16 11:12 jodh-intel

For completeness, here's some sample strace output from a successful run of the program attached in comment https://github.com/01org/cc-oci-runtime/issues/152#issuecomment-241728833:

execve("./test_iperf3-manohar", ["./test_iperf3-manohar"], [/* 111 vars */]) = 0
brk(NULL)                               = 0xa63000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974cb8000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=143417, ...}) = 0
mmap(NULL, 143417, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4974c94000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\10\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2089496, ...}) = 0
mmap(NULL, 3938656, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f49746d4000
mprotect(0x7f497488d000, 2093056, PROT_NONE) = 0
mmap(0x7f4974a8c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b8000) = 0x7f4974a8c000
mmap(0x7f4974a92000, 14688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4974a92000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974c93000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974c92000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974c91000
arch_prctl(ARCH_SET_FS, 0x7f4974c92700) = 0
mprotect(0x7f4974a8c000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7f4974cb9000, 4096, PROT_READ) = 0
munmap(0x7f4974c94000, 143417)          = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
brk(NULL)                               = 0xa63000
brk(0xa84000)                           = 0xa84000
brk(NULL)                               = 0xa84000
write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 34DEBUG: template='/tmp/foo.XXXXXX'
) = 34
getpid()                                = 438
open("/tmp/foo.dewhr0", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
write(1, "SUCCESS: mkstemp(/tmp/foo.dewhr0"..., 41SUCCESS: mkstemp(/tmp/foo.dewhr0) worked
) = 41
unlink("/tmp/foo.dewhr0")               = 0
write(1, "SUCCESS: unlink(/tmp/foo.dewhr0)"..., 40SUCCESS: unlink(/tmp/foo.dewhr0) worked
) = 40
ftruncate(3, 0)                         = 0
write(1, "SUCCESS: ftruncate(fd=3, 0) work"..., 35SUCCESS: ftruncate(fd=3, 0) worked
) = 35
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

jodh-intel avatar Dec 20 '16 11:12 jodh-intel

@jcvenegas are any updates about this?

GabyCT avatar Apr 03 '17 21:04 GabyCT

@GabyCT not so far, I did not take a look, I think the last update is @grahamwhaley research.

jcvenegas avatar Apr 03 '17 21:04 jcvenegas

Right, I peeked at this issue last week and thought we should be able to 'fix' it by taking the latest kernel and QEMU patchsets for the 9pfs unlink fixes. @GabyCT , if you want to develop the tests then I think you could use the ramfs tmp mount 'trick' listed early in the ticket. I'll at least open an Issue to note we need to check and probably update the 9pfs code.

grahamwhaley avatar Apr 04 '17 08:04 grahamwhaley

This sounds like a gnarly one to fix. But how about we downgrade to P2 given that there is a workaround of sorts?

jodh-intel avatar Apr 06 '17 13:04 jodh-intel

I could agree to downgrade to P2. I'm not going to get to look at the patch upgrades in the immediate future. Worth pulling in @anthonyzxu for his thoughts in case it is a work in progress already?

grahamwhaley avatar Apr 06 '17 13:04 grahamwhaley

Can you test with the following changes in /usr/share/defaults/cc-oci-runtime/hypervisor.args:

In the qemu command line, following option is being used: fsdev local,id=test_dev,path=,security_model=none

Instead please use the following: -fsdev handle,id=test_dev,path=

Manohar saw the issue being fixed with the above change.

rarindam avatar Jun 06 '17 18:06 rarindam

@rarindam no, this issue still exists after your suggested method is used.

-fsdev local,id=extra-9p-hyperShared,path=/tmp/hyper/shared/pods/7cefd7d5746cef74f9fb04c9f246b05bf6ac7e0bc38c0457115f4d443307546e,security_model=none

wuzhy avatar Jul 20 '17 03:07 wuzhy

Hi @wuzhy , -fsdev local should be -fsdev handle. That is what fixed the issue for Manohar.

rarindam avatar Jul 20 '17 05:07 rarindam

@rarindam Sorry for the late reply, Just i tried, and found that Clear Container failed when "-fsdev handle" was used. Below is the log:

2017-08-01 01:38:22.896147955 +0800 CST:7266:cc-runtime:error:Container still running, should be stopped

wuzhy avatar Jul 31 '17 17:07 wuzhy

@wuzhy, it has be some other issue other than the option I provided. You have to remove security_model option if using handle, instead of local. Please try it out. I am providing this update from my end:

  1. -fsdev handle certainly is helping me run the test-iperf code with success: DEBUG: template='/tmp/foo.XXXXXX' SUCCESS: mkstemp(/tmp/foo.mqbCsO) worked SUCCESS: unlink(/tmp/foo.mqbCsO) worked SUCCESS: ftruncate(fd=3, 0) worked

@mcastelino 2. This issue seems to be actively tracked here: https://bugs.launchpad.net/qemu/+bug/1336794 Here instead of ftruncate they are using fstat, hitting the same issue. I tried the solutions provided, but that does not fix out issue. In fact this solution itself is not committed hence the bug is still open.

  1. Using 9pVhost: The same test also passes if I use 9pvhost. The issue is not seen there.

  2. In my testing, I could see the following: After UNLINK is executed, ftruncate is implemented using setattr (with set size flag). In the case when 9p-local is used as server, this results in error:

[ 26.551849] 9pnet: (00000550) >>> TSETATTR fid 4 [ 26.551854] 9pnet: (00000550) valid=68 mode=0 uid=2 gid=0 size=0 atime_sec=1501796060 atime_nsec=0 mtime_sec=1501796060 mtime_nsec=0 [ 26.551872] 9pnet: -- p9_client_prepare_req (550): client ffff9359f473f000 op 26 [ 26.552121] 9pnet: -- p9_idpool_get (550): id 1 pool ffff9359f2c1ac00 [ 26.552127] 9pnet: (00000550) >>> size=67 type: 26 tag: 1 [ 26.552132] 9pnet: -- p9_virtio_request (550): 9p debug: virtio request
[ 26.552149] 9pnet: -- p9_virtio_request (550): virtio request kicked [ 26.552424] 9pnet: -- req_done (0): : request done
[ 26.552438] 9pnet: -- p9_client_cb (0): tag 1 [ 26.552454] 9pnet: -- p9_client_cb (0): wakeup: 1 [ 26.552505] 9pnet: (00000550) <<< size=11 type: 7 tag: 1 [ 26.552507] 9pnet: (00000550) <<< RLERROR (-2) [ 26.552510] 9pnet: -- p9_free_req (550): clnt ffff9359f473f000 req ffff9359f2c0c070 tag: 1 [ 26.552512] 9pnet: -- p9_idpool_put (550): id 1 pool ffff9359f2c1ac00
[ 26.555121] 9pnet: -- v9fs_dir_release (550): inode: ffff9359f18319e0 filp: ffff9359f6300200 fid: 4 [ 26.555124] 9pnet: (00000550) >>> TCLUNK fid 4 (try 0)

But if we use 9p-handle as server, everything works fine: [ 74.256036] 9pnet: -- v9fs_fid_find (554): dentry: foo.fk2Whc (ffff905ff4db4300) uid 0 any 0 [ 74.256037] 9pnet: -- v9fs_fid_find_inode (554): inode: ffff905ff2025750 [ 74.256038] 9pnet: (00000554) >>> TSETATTR fid 4 [ 74.256040] 9pnet: (00000554) valid=68 mode=0 uid=2 gid=0 size=0 atime_sec=1501795991 atime_nsec=0 mtime_sec=1501795991 mtime_nsec=0 [ 74.256041] 9pnet: -- p9_client_prepare_req (554): client ffff905ff42e5000 op 26 [ 74.256043] 9pnet: -- p9_idpool_get (554): id 1 pool ffff905ff5251800 [ 74.256045] 9pnet: (00000554) >>> size=67 type: 26 tag: 1 [ 74.256045] 9pnet: -- p9_virtio_request (554): 9p debug: virtio request [ 74.256053] 9pnet: -- p9_virtio_request (554): virtio request kicked [ 74.256275] 9pnet: -- req_done (0): : request done [ 74.256277] 9pnet: -- p9_client_cb (0): tag 1 [ 74.256283] 9pnet: -- p9_client_cb (0): wakeup: 1 [ 74.256315] 9pnet: (00000554) <<< size=7 type: 27 tag: 1 [ 74.256316] 9pnet: (00000554) <<< RSETATTR fid 4

rarindam avatar Aug 04 '17 21:08 rarindam

execve("./test_iperf3-manohar", ["./test_iperf3-manohar"], [/* 25 vars */]) = 0 brk(NULL) = 0x55909c273000 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1d50817000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=13660, ...}) = 0 mmap(NULL, 13660, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f1d50813000 close(3) = 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\5\2\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=1856752, ...}) = 0 mmap(NULL, 3959200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f1d5022e000 mprotect(0x7f1d503eb000, 2097152, PROT_NONE) = 0 mmap(0x7f1d505eb000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7f1d505eb000 mmap(0x7f1d505f1000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f1d505f1000 close(3) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1d50811000 arch_prctl(ARCH_SET_FS, 0x7f1d50811700) = 0 mprotect(0x7f1d505eb000, 16384, PROT_READ) = 0 mprotect(0x55909be50000, 4096, PROT_READ) = 0 mprotect(0x7f1d5081a000, 4096, PROT_READ) = 0 munmap(0x7f1d50813000, 13660) = 0 fstat(1, {st_mode=S_IFREG|0644, st_size=1633, ...}) = 0 brk(NULL) = 0x55909c273000 brk(0x55909c295000) = 0x55909c295000 getpid() = 560 open("/tmp/foo.F4F4ul", O_RDWR|O_CREAT|O_EXCL, 0600) = 3 unlink("/tmp/foo.F4F4ul") = 0 ftruncate(3, 0) = 0 close(3) = 0 write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 150DEBUG: template='/tmp/foo.XXXXXX' SUCCESS: mkstemp(/tmp/foo.F4F4ul) worked SUCCESS: unlink(/tmp/foo.F4F4ul) worked SUCCESS: ftruncate(fd=3, 0) worked ) = 150 exit_group(0) = ? +++ exited with 0 +++

rarindam avatar Aug 04 '17 21:08 rarindam

@gorozco1 @devimc we need to also run DPDK pmd to verify that all the issues have been fixed.

mcastelino avatar Oct 09 '17 15:10 mcastelino

still occurs using device driver overlay2

devimc avatar Nov 17 '17 00:11 devimc

@devimc , have you tested this with 9p-handle instead of 9p-local?

rarindam avatar Nov 17 '17 18:11 rarindam

@rarindam nop, but I can try I tested with qemu 2.10.1 to see if it was fixed, but still occurs 😢

devimc avatar Nov 17 '17 20:11 devimc

./test_iperf3-manohar DEBUG: template='/tmp/foo.XXXXXX' SUCCESS: mkstemp(/tmp/foo.mDEbT7) worked SUCCESS: unlink(/tmp/foo.mDEbT7) worked SUCCESS: ftruncate(fd=3, 0) worked

Kernel 4.14, Qemu 2.10: -fsdev local,id=fsdev0,security_model=none,path=/tmp/9p_shared -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=qtest

execve("./test_iperf3-manohar",

["./test_iperf3-manohar"], [/* 28 vars */]) = 0 brk(NULL) = 0x55892f9fe000 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fab98d1e000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=13660, ...}) = 0 mmap(NULL, 13660, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fab98d1a000 close(3) = 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\5\2\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=1856752, ...}) = 0 mmap(NULL, 3959200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fab98735000 mprotect(0x7fab988f2000, 2097152, PROT_NONE) = 0 mmap(0x7fab98af2000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7fab98af2000 mmap(0x7fab98af8000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fab98af8000 close(3) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fab98d18000 arch_prctl(ARCH_SET_FS, 0x7fab98d18700) = 0 mprotect(0x7fab98af2000, 16384, PROT_READ) = 0 mprotect(0x55892dec7000, 4096, PROT_READ) = 0 mprotect(0x7fab98d21000, 4096, PROT_READ) = 0 munmap(0x7fab98d1a000, 13660) = 0 fstat(1, {st_mode=S_IFREG|0644, st_size=1633, ...}) = 0 brk(NULL) = 0x55892f9fe000 brk(0x55892fa20000) = 0x55892fa20000 getpid() = 426 open("/tmp/foo.1zEZ5T", O_RDWR|O_CREAT|O_EXCL, 0600) = 3 unlink("/tmp/foo.1zEZ5T") = 0 ftruncate(3, 0) = 0 close(3) = 0 write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 150DEBUG: template='/tmp/foo.XXXXXX' SUCCESS: mkstemp(/tmp/foo.1zEZ5T) worked SUCCESS: unlink(/tmp/foo.1zEZ5T) worked SUCCESS: ftruncate(fd=3, 0) worked ) = 150 exit_group(0) = ? +++ exited with 0 +++

rarindam avatar Nov 22 '17 23:11 rarindam

@rarindam what kernel did you use? would you mind running the iperf test?

in clear containers this is the cmdline:

-device virtio-9p-pci,fsdev=ctr-9p-0,mount_tag=ctr-rootfs-0 -fsdev local,id=ctr-9p-0,path=/var/lib/docker/overlay2/232a11afb1dd6db15ef9a79d039c15ffdf85b09fea336420f5eb31eb186db2af/merged,security_model=none

devimc avatar Nov 23 '17 17:11 devimc

@devimc Kernel 4.14, Qemu 2.10: -fsdev local,id=fsdev0,security_model=none,path=/tmp/9p_shared -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=qtest

rarindam avatar Dec 01 '17 21:12 rarindam