snabblab-nixos
Running out of hugepages on Murren
Some builds on Murren failed because Snabb was unable to satisfy its thirst for hugepages. This might be related to the unusually high hugepage demand and inter-process hugepage traffic of this test case.
Notably, Snabb attempts to raise vm.nr_hugepages beyond Murren's limit of 4096, yet the test case does succeed on lugano-1, which has only ~3000 hugepages allocated.
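For reference, a quick way to eyeball the hugepage state on a node before and after a failing run is to read the standard Linux procfs counters. This is just a generic sketch in plain Lua (not Snabb code); the paths are the usual /proc ones:

-- Print vm.nr_hugepages and the HugePages_* counters from /proc/meminfo.
local function first_line (path)
   local f = io.open(path, "r")
   if not f then return nil end
   local line = f:read("*l")
   f:close()
   return line
end

print("vm.nr_hugepages =", first_line("/proc/sys/vm/nr_hugepages"))
for line in io.lines("/proc/meminfo") do
   if line:match("^HugePages") then print(line) end
end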
The error looks like this:
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4096 -> 4097]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4097 -> 4098]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4098 -> 4099]
core/main.lua:28: Failed to allocate a huge page for DMA
Stack Traceback
===============
(1) Lua function 'handler' at file 'core/main.lua:177' (best guess)
Local variables:
reason = string: "core/main.lua:28: Failed to allocate a huge page for DMA"
(*temporary) = C function: print
(2) global C function 'error'
(3) Lua global 'assert' at file 'core/main.lua:28'
Local variables:
v = nil
(4) Lua global 'allocate_next_chunk' at file 'core/memory.lua:47'
(5) Lua field 'dma_alloc' at file 'core/memory.lua:30'
Local variables:
bytes = number: 10754
align = number: 512
(6) Lua global 'new_packet' at file 'core/packet.lua:93'
(7) Lua global 'preallocate_step' at file 'core/packet.lua:198'
Local variables:
(for index) = number: 1
(for limit) = number: 1000
(for step) = number: 1
i = number: 1
(*temporary) = Lua function 'free' (defined at line 169 of chunk core/packet.lua)
(8) Lua field 'allocate' at file 'core/packet.lua:86'
(9) Lua global 'test_packets' at file 'program/vita/test.lua:24'
Local variables:
pktsize = number: 1000
sizes = table: 0x41994d08 {1:1000}
packets = table: 0x41994d48 {}
(for generator) = C function: builtin#6
(for state) = table: 0x41994d08 {1:1000}
(for control) = number: 1
_ = number: 1
size = number: 1000
payload_size = number: 966
(*temporary) = Lua function 'datagram' (defined at line 104 of chunk lib/protocol/datagram.lua)
(*temporary) = table: 0x41034ee8 {packet:function: 0x41035470, _freelist:table: 0x41034f10, pop_raw:function: 0x41035198 (more...)}
(*temporary) = Lua function 'resize' (defined at line 186 of chunk core/packet.lua)
(10) main chunk of file 'program/vita/test.lua' at line 51
(11) global C function 'dofile'
(12) Lua global 'run_script' at file 'program/snsh/snsh.lua:87'
Local variables:
parameters = table: 0x416dad38 {1:1000, 2:100e6}
command = string: "program/vita/test.lua"
(13) Lua field 'run' at file 'program/snsh/snsh.lua:71'
Local variables:
parameters = table: 0x416dad38 {1:1000, 2:100e6}
profiling = boolean: false
traceprofiling = boolean: false
start_repl = boolean: false
noop = boolean: true
program = nil
opt = table: 0x416dabf8 {t:function: 0x416dacb8, q:function: 0x416dacd8, P:function: 0x41cdbbd8 (more...)}
(14) Lua function 'main' at file 'core/main.lua:73' (best guess)
Local variables:
program = string: "snsh"
args = table: 0x416cfcd0 {1:program/vita/test.lua, 2:1000, 3:100e6}
(15) global C function 'xpcall'
(16) main chunk of file 'core/main.lua' at line 239
(17) C function 'require'
(18) global C function 'pcall'
(19) main chunk of file 'core/startup.lua' at line 3
(20) global C function 'require'
(21) main chunk of [string "require "core.startup""] at line 1
nil
I reproduced these errors in a second run; notably, vm.nr_hugepages increases across runs:
(Both runs were on murren-7, I assume in succession.)
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4099 -> 4100]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4100 -> 4101]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4101 -> 4102]
core/main.lua:28: Failed to allocate a huge page for DMA
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4102 -> 4103]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4103 -> 4104]
[mounting /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4104 -> 4105]
core/main.lua:28: Failed to allocate a huge page for DMA
How many huge pages do you need for this test?
If it's more than ~4100 x 2MB pages (enough for around one million packets) then the problem is probably that the server cannot provide that many.
If it's less than this, then the problem is probably not running out of huge pages but something else. The memory allocation code is a bit simple: on a failure it will always just ask the kernel to provision a new page and try again. This only actually helps when the problem is that no huge pages are available to allocate and the kernel can then successfully provision a new one. (This can be the normal situation on a freshly booted Linux box that has plenty of RAM but has not yet allocated any of it to huge pages.)
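In pseudocode, that retry behaviour is roughly the following; this is a sketch with stand-in functions to show the shape, not the actual core/memory.lua code:

-- Stand-ins for the real primitives, just to make the retry shape visible:
local function allocate_huge_page ()
   error("no free huge pages")   -- or whatever the real failure reason is
end
local function provision_huge_page ()
   -- in Snabb this bumps vm.nr_hugepages via sysctl
end

local function allocate_next_chunk ()
   for attempt = 1, 3 do
      local ok, page = pcall(allocate_huge_page)
      if ok then return page end
      -- On any failure, ask the kernel for one more page and retry.
      -- This only helps if the failure really was "no free huge pages".
      provision_huge_page()
   end
   error("Failed to allocate a huge page for DMA")
end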
Can you see more details somewhere, e.g. how many allocations you are successfully making before the failure, what the error reason is, and why there are so many attempts to mount the hugetlbfs?
(If you set SNABB_SHM_ROOT somewhere else then maybe the problem is /var/run/snabb not existing and so the mount failing? Is there anything special/different about how you setup the env? Seems like other tests are running successfully still.)
This is it! It must be. On a side note, is there an existing idiom to save the shm directory (including VMProfile data in my case) as a build output? I have naively set SNABB_SHM_ROOT=$out/shm.
ensure_hugetlbfs does syscall.mkdir("/var/run/snabb/hugetlbfs") before attempting to mount though.
https://github.com/snabbco/snabb/blob/master/src/core/memory.lua#L188
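For context, that part of memory.lua is roughly doing the following; this is a paraphrase using ljsyscall (the syscall module Snabb uses), not the exact code, see the link above for the real thing:

local S = require("syscall")

local function ensure_hugetlbfs (root)
   local path = root .. "/hugetlbfs"
   S.mkdir(path, tonumber("755", 8))   -- error ignored if it already exists
   local ok, err = S.mount("none", path, "hugetlbfs")
   if not ok then
      error("failed to (re)mount " .. path .. ": " .. tostring(err))
   end
end

ensure_hugetlbfs("/var/run/snabb")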
The test needs no more than 3000 hugepages.
I added a line of code to print the error message from allocate_huge_page, which confirms that it's the mount failing:
[mounting /var/run/snabb/hugetlbfs]
[memory: failed to allocate hugepage: core/main.lua:28: failed to (re)mount /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4103 -> 4104]
[mounting /var/run/snabb/hugetlbfs]
[memory: failed to allocate hugepage: core/main.lua:28: failed to (re)mount /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4104 -> 4105]
[mounting /var/run/snabb/hugetlbfs]
[memory: failed to allocate hugepage: core/main.lua:28: failed to (re)mount /var/run/snabb/hugetlbfs]
[memory: Provisioned a huge page: sysctl vm.nr_hugepages 4105 -> 4106]
core/main.lua:28: Failed to allocate a huge page for DMA
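(The kind of change meant above would be roughly this, a guess at the shape rather than the actual patch, around the allocate_huge_page call in core/memory.lua:)

local ok, err = pcall(allocate_huge_page)
if not ok then
   -- print the underlying reason instead of silently retrying
   io.write("[memory: failed to allocate hugepage: ", tostring(err), "]\n")
end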
Now I just have to find out why.
@eugeneia seems like you're on the path of figuring this out - do you need some help from me? :)
@eugeneia Could be that we should always put the huge pages in /var/run/snabb/hugetlbfs rather than under SNABB_SHM_ROOT. The breakage may be that SNABB_SHM_ROOT is pointing to a directory that is not allowed to be a mount point due to the way its own filesystem is mounted.
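One way to test that hypothesis would be to try mounting a hugetlbfs both under /var/run/snabb and under whatever SNABB_SHM_ROOT points to and compare. A sketch using ljsyscall (run as root; hugetlbfs-probe is just a throwaway directory name):

local S = require("syscall")

local function can_mount_hugetlbfs (dir)
   local probe = dir .. "/hugetlbfs-probe"
   S.mkdir(probe, tonumber("755", 8))
   local ok, err = S.mount("none", probe, "hugetlbfs")
   if ok then S.umount(probe) end
   S.rmdir(probe)
   return ok and true or false, err
end

print("/var/run/snabb:", can_mount_hugetlbfs("/var/run/snabb"))
print("SNABB_SHM_ROOT:", can_mount_hugetlbfs(os.getenv("SNABB_SHM_ROOT") or "/var/run/snabb"))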
Here is a draft commit of mine to add a keepShm option to Hydra builds to archive the shm files. Could you adapt this? Please be conservative about pushing changes like this to the snabblab-nixos master branch, since it will invalidate all builds (better to make your Hydra job point to a dev branch of snabblab-nixos).
Oops: "Here is a draft commit of mine" => https://github.com/lukego/snabblab-nixos/commit/78ea36e09f03336b6c237e15f4b4bdbf810ebb0b.
@eugeneia btw the next big step for Studio will be to add scripts like
snabb.inspect-hydra-build 1234
snabb.compare-hydra-builds 1234 5678
and so on to make it really easy to pull up traces / vmprofile / timeline / etc for a given test and apply all the R code, interactive visualizations, etc. Itching to get this from the future into the present......
(re: my patch I think it would be simpler to make one big tarball with a predictable name and make it into a Hydra build product that's visible on the web UI.)