snabb icon indicating copy to clipboard operation
snabb copied to clipboard

nfvconfig selftest hanging on raptorjit branch

Open takikawa opened this issue 7 years ago • 5 comments

Recently some CI runs on the raptorjit branch seem to be failing because the nfvconfig selftest hangs. The most recent runs don't show this because they are erroring with a merge issue, but AFAIK the issue is still there.

Here's an example SnabbBot log from #1332 with the issue: https://gist.github.com/SnabbBot/1a57acb59416bd84bafe173985fb6895

The test hangs with some output like this:

engine: new_link id11_Virtio.tx -> id11_NIC.input
engine: link_output id11_Virtio
engine: link_input id11_NIC
load: time: 1.85s  fps: 0         fpGbps: 0.000 fpb: 0   bpp: -    sleep: 100 us
testing: program/snabbnfv/test_fixtures/nfvconfig/scale_change/x

I minimized the test a bit to make the issue easier to reproduce:

module(...,package.seeall)

local virtual_ether_mux = require("lib.io.virtual_ether_mux")
local lib = require("core.lib")

function load (file, pciaddr)
   local ports = lib.load_conf(file)
   local c = config.new()
   virtual_ether_mux.configure(c, ports, {pci = "0000:82:00.0"})
   return c
end

function selftest ()
   print("start test")
   local path = "program/snabbnfv/test_fixtures/nfvconfig/scale_up/x"
   for i=1,10 do
      print(i)

      print("start popen")
      -- will get stuck here, filepath doesn't matter
      local f = io.popen("cat /proc/cpuinfo")
      f:read("*l")
      f:close()
      print("end popen")

      engine.configure(load(path))
      engine.main({duration = 0.1})
   end
   print("done")
end

The test hangs at the popen call. In the actual test, the popen happens due to a use of firstfile while getting some PCI device information for virtual_ether_mux.configure.

One thing I noticed is that the reason this only hangs on the raptorjit branch is that it only happens when vmprofile is on.

If you do an strace, you can see the reason for this:

write(1, "start popen\n", 12)           = 12
write(67, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 607) = 607
pipe2([68, 69], O_CLOEXEC)              = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe3135289d0) = ? ERESTARTNOINTR (To be restarted)
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 56
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe3135289d0) = ? ERESTARTNOINTR (To be restarted)
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 56
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe3135289d0) = ? ERESTARTNOINTR (To be restarted)
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---

From this strace output, you can see that after a popen does a clone to make a child process the syscall is constantly interrupted by the 1ms SIGPROF timer for vmprofile. This causes the syscall to restart, but it gets interrupted again. And so it gets stuck in an infinite loop.

It looks like this has been an issue in other software like Chromium (https://bugs.chromium.org/p/chromium/issues/detail?id=83521) or with JVM profiling (https://github.com/jvm-profiling-tools/async-profiler/issues/97).

There seem to be a few potential ways to solve this:

  • Set the profiling timer to 10ms or more (this seems to fix the issue for my minimized test)
  • Avoid the use of fork, clone, etc if possible. For firstfile, it could use ljsyscall to do the filesystem operations.
  • Disable vmprofile for unit tests so CI will keep working.

The first one sounds unappealing since it would reduce the profiling granularity, but maybe it's acceptable. The second seems pretty do-able for this particular case.

Any thoughts on a preferred solution?

takikawa avatar May 02 '18 00:05 takikawa

AFAIK, at the moment it's not possible to disable vmprofiling in Raptorjit, as it's always on. By that I mean there's not flag or environment variable to disable profiling. So in case of going for 3) it would be necessary as well to implement this mechanism.

As for 2) initially it sounds good to me, I have one question though. As the issue is with system calls fork, clone, etc won't ljsyscall functions will translate to these system calls too?

dpino avatar May 02 '18 08:05 dpino

The PC-losering problem just won't die eh :).

I have a vague feeling that Linux signals can be setup to not interrupt system calls i.e. to defer the signal until after the system call completes. This could be a nice solution since we are not that interested in profiling system calls? I can't find how to do that right now so it is possible that I dreamed it.

lukego avatar May 02 '18 12:05 lukego

Some info. on blocking system calls here:

https://stackoverflow.com/questions/2853653/deferring-signal-handling-in-linux

On Wed, May 2, 2018, 08:14 Luke Gorrie, [email protected] wrote:

The PC-losering https://www.dreamsongs.com/WIB.html problem just won't die eh :).

I have a vague feeling that Linux signals can be setup to not interrupt system calls i.e. to defer the signal until after the system call completes. This could be a nice solution since we are not that interested in profiling system calls? I can't find how to do that right now so it is possible that I dreamed it.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/snabbco/snabb/issues/1337#issuecomment-385957441, or mute the thread https://github.com/notifications/unsubscribe-auth/AE7ethP_ObXezPOkodRn9Mb5ZmLMaCeJks5tuaMbgaJpZM4TusHx .

tobyriddell avatar May 02 '18 12:05 tobyriddell

Would it solve the problem to switch our setitimer(2) call from ITIMER_PROF to ITIMER_VIRTUAL?

This should only profile the time spent running in userspace and a reasonable person might expect (...) that this would guarantee that system calls are never interrupted.

This would make vmprofile less useful for profiling code that spends time in kernel space but that is not especially relevant to Snabb (and reasonably well covered by perf externally too.)

lukego avatar May 03 '18 09:05 lukego

@lukego Thanks, that sounds like a good idea! I made a PR for that at #1340. It seems to get rid of the selftest hang nicely. Not sure if it's made much difference in vmprofile results, but some basic sanity checks comparing the results with/without ITIMER_PROF seemed ok anyway.

takikawa avatar May 03 '18 23:05 takikawa