snabblab-nixos
Running out of hugepages on Murren
Some builds on Murren failed because Snabb was unable to satisfy its thirst for hugepages. This might be related to the unusually high hugepage demand and inter-process hugepage traffic of this test case.
Notably, Snabb attempts to raise vm.nr_hugepages beyond Murren's limit of 4096, yet the test case does succeed on lugano-1, which has only ~3000 hugepages allocated.
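For reference, a quick way to eyeball the hugepage state on a node before and after a failing run is to read the standard Linux procfs counters. This is just a generic sketch in plain Lua (not Snabb code); the paths are the usual /proc ones:

-- Print vm.nr_hugepages and the HugePages_* counters from /proc/meminfo.
local function first_line (path)
   local f = io.open(path, "r")
   if not f then return nil end
   local line = f:read("*l")
   f:close()
   return line
end

print("vm.nr_hugepages =", first_line("/proc/sys/vm/nr_hugepages"))
for line in io.lines("/proc/meminfo") do
   if line:match("^HugePages") then print(line) end
end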
The error looks like this:
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4096 -> 4097]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4097 -> 4098]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4098 -> 4099]
core/main.lua:28: Failed to allocate a huge page for DMA
Stack Traceback
===============
(1) Lua function 'handler' at file 'core/main.lua:177' (best guess)
Local variables:
reason = string: "core/main.lua:28: Failed to allocate a huge page for DMA"
(*temporary) = C function: print
(2) global C function 'error'
(3) Lua global 'assert' at file 'core/main.lua:28'
Local variables:
v = nil
(4) Lua global 'allocate_next_chunk' at file 'core/memory.lua:47'
(5) Lua field 'dma_alloc' at file 'core/memory.lua:30'
Local variables:
bytes = number: 10754
align = number: 512
(6) Lua global 'new_packet' at file 'core/packet.lua:93'
(7) Lua global 'preallocate_step' at file 'core/packet.lua:198'
Local variables:
(for index) = number: 1
(for limit) = number: 1000
(for step) = number: 1
i = number: 1
(*temporary) = Lua function 'free' (defined at line 169 of chunk core/packet.lua)
(8) Lua field 'allocate' at file 'core/packet.lua:86'
(9) Lua global 'test_packets' at file 'program/vita/test.lua:24'
Local variables:
pktsize = number: 1000
sizes = table: 0x41994d08 {1:1000}
packets = table: 0x41994d48 {}
(for generator) = C function: builtin#6
(for state) = table: 0x41994d08 {1:1000}
(for control) = number: 1
_ = number: 1
size = number: 1000
payload_size = number: 966
(*temporary) = Lua function 'datagram' (defined at line 104 of chunk lib/protocol/datagram.lua)
(*temporary) = table: 0x41034ee8 {packet:function: 0x41035470, _freelist:table: 0x41034f10, pop_raw:function: 0x41035198 (more...)}
(*temporary) = Lua function 'resize' (defined at line 186 of chunk core/packet.lua)
(10) main chunk of file 'program/vita/test.lua' at line 51
(11) global C function 'dofile'
(12) Lua global 'run_script' at file 'program/snsh/snsh.lua:87'
Local variables:
parameters = table: 0x416dad38 {1:1000, 2:100e6}
command = string: "program/vita/test.lua"
(13) Lua field 'run' at file 'program/snsh/snsh.lua:71'
Local variables:
parameters = table: 0x416dad38 {1:1000, 2:100e6}
profiling = boolean: false
traceprofiling = boolean: false
start_repl = boolean: false
noop = boolean: true
program = nil
opt = table: 0x416dabf8 {t:function: 0x416dacb8, q:function: 0x416dacd8, P:function: 0x41cdbbd8 (more...)}
(14) Lua function 'main' at file 'core/main.lua:73' (best guess)
Local variables:
program = string: "snsh"
args = table: 0x416cfcd0 {1:program/vita/test.lua, 2:1000, 3:100e6}
(15) global C function 'xpcall'
(16) main chunk of file 'core/main.lua' at line 239
(17) C function 'require'
(18) global C function 'pcall'
(19) main chunk of file 'core/startup.lua' at line 3
(20) global C function 'require'
(21) main chunk of [string "require "core.startup""] at line 1
nil
I reproduced these errors in a second run; notably, vm.nr_hugepages increases across runs:
(Both runs were on murren-7, I assume in succession.)
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4099 -> 4100]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4100 -> 4101]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4101 -> 4102]
core/main.lua:28: Failed to allocate a huge page for DMA
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4102 -> 4103]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4103 -> 4104]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4104 -> 4105]
core/main.lua:28: Failed to allocate a huge page for DMA
How many huge pages do you need for this test?
If it's more than ~4100 x 2MB pages (enough for around one million packets) then the problem is probably that the server cannot provide that many.
If it's less than this, then the problem is probably not running out of huge pages but something else. The memory allocation code is a bit simple: on a failure it will always just ask the kernel to provision a new page and try again. This only actually helps when the problem is that no huge pages are available to allocate and the kernel can then successfully provision a new one. (This can be the normal situation on a freshly booted Linux box that has plenty of RAM but has not yet allocated any of it to huge pages.)
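In pseudocode, that retry behaviour is roughly the following; this is a sketch with stand-in functions to show the shape, not the actual core/memory.lua code:

-- Stand-ins for the real primitives, just to make the retry shape visible:
local function allocate_huge_page ()
   error("no free huge pages")   -- or whatever the real failure reason is
end
local function provision_huge_page ()
   -- in Snabb this bumps vm.nr_hugepages via sysctl
end

local function allocate_next_chunk ()
   for attempt = 1, 3 do
      local ok, page = pcall(allocate_huge_page)
      if ok then return page end
      -- On any failure, ask the kernel for one more page and retry.
      -- This only helps if the failure really was "no free huge pages".
      provision_huge_page()
   end
   error("Failed to allocate a huge page for DMA")
end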
Can you see more details somewhere, e.g. how many allocations you are successfully making before the failure, what the error reason is, and why there are so many attempts to mount the hugetlbfs?
(If you set SNABB_SHM_ROOT somewhere else then maybe the problem is /var/run/snabb not existing and so the mount failing? Is there anything special/different about how you setup the env? Seems like other tests are running successfully still.)
This is it! It must be. On a side note, is there an existing idiom to save the shm directory (including VMProfile data in my case) as a build output? I have naively set SNABB_SHM_ROOT=$out/shm.
ensure_hugetlbfs does syscall.mkdir("/var/run/snabb/hugetlbfs") before attempting to mount though.
https://github.com/snabbco/snabb/blob/master/src/core/memory.lua#L188
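For context, that part of memory.lua is roughly doing the following; this is a paraphrase using ljsyscall (the syscall module Snabb uses), not the exact code, see the link above for the real thing:

local S = require("syscall")

local function ensure_hugetlbfs (root)
   local path = root .. "/hugetlbfs"
   S.mkdir(path, tonumber("755", 8))   -- error ignored if it already exists
   local ok, err = S.mount("none", path, "hugetlbfs")
   if not ok then
      error("failed to (re)mount " .. path .. ": " .. tostring(err))
   end
end

ensure_hugetlbfs("/var/run/snabb")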
The test needs no more than 3000 hugepages.
I added a line of code to print the error message from allocate_huge_page, which confirms that it's the mount failing:
[mounting /var/run/snabb/hugetlbfs]
[memory: failed to allocate hugepage: core/main.lua:28: failed to (re)mount /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4103 -> 4104]
[mounting /var/run/snabb/hugetlbfs]
[memory: failed to allocate hugepage: core/main.lua:28: failed to (re)mount /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4104 -> 4105]
[mounting /var/run/snabb/hugetlbfs]
[memory: failed to allocate hugepage: core/main.lua:28: failed to (re)mount /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4105 -> 4106]
core/main.lua:28: Failed to allocate a huge page for DMA
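(The kind of change meant above would be roughly this, a guess at the shape rather than the actual patch, around the allocate_huge_page call in core/memory.lua:)

local ok, err = pcall(allocate_huge_page)
if not ok then
   -- print the underlying reason instead of silently retrying
   io.write("[memory: failed to allocate hugepage: ", tostring(err), "]\n")
end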
Now I just have to find out why.
@eugeneia seems like you're on the path of figuring this out - do you need some help from me? :)
@eugeneia Could be that we should always put the huge pages in /var/run/snabb/hugetlbfs rather than under SNABB_SHM_ROOT. The breakage may be that SNABB_SHM_ROOT is pointing to a directory that is not allowed to be a mount point due to the way its own filesystem is mounted.
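One way to test that hypothesis would be to try mounting a hugetlbfs both under /var/run/snabb and under whatever SNABB_SHM_ROOT points to and compare. A sketch using ljsyscall (run as root; hugetlbfs-probe is just a throwaway directory name):

local S = require("syscall")

local function can_mount_hugetlbfs (dir)
   local probe = dir .. "/hugetlbfs-probe"
   S.mkdir(probe, tonumber("755", 8))
   local ok, err = S.mount("none", probe, "hugetlbfs")
   if ok then S.umount(probe) end
   S.rmdir(probe)
   return ok and true or false, err
end

print("/var/run/snabb:", can_mount_hugetlbfs("/var/run/snabb"))
print("SNABB_SHM_ROOT:", can_mount_hugetlbfs(os.getenv("SNABB_SHM_ROOT") or "/var/run/snabb"))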
Here is a draft commit of mine to add a keepShm option to Hydra builds to archive the shm files. Could you adapt this? Please be conservative about pushing changes like this to the snabblab-nixos master branch, since it will invalidate all builds (better to make your Hydra job point to a dev branch of snabblab-nixos).
Oops: "Here is a draft commit of mine" => https://github.com/lukego/snabblab-nixos/commit/78ea36e09f03336b6c237e15f4b4bdbf810ebb0b.
@eugeneia btw the next big step for Studio will be to add scripts like
snabb.inspect-hydra-build 1234
snabb.compare-hydra-builds 1234 5678
and so on to make it really easy to pull up traces / vmprofile / timeline / etc for a given test and apply all the R code, interactive visualizations, etc. Itching to get this from the future into the present......
(re: my patch I think it would be simpler to make one big tarball with a predictable name and make it into a Hydra build product that's visible on the web UI.)