
Remove TSAN and ASAN runs

Open csegarragonz opened this issue 3 years ago • 5 comments

ASAN and TSAN builds of the tests consume upwards of 15 GB and 20 GB of memory, respectively. This is bricking the GitHub-hosted runners.

If we run a self-hosted runner on one of our readily available servers, two factors severely limit concurrency: (i) we have to deploy one runner per concurrent job, and (ii) we run out of the server's 32 GB of memory. This means we would have to run the sanitised tests serially, which takes over an hour. As a consequence, we are disabling them altogether.
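To make the memory constraint concrete, here is a back-of-the-envelope check. It assumes peak usage of roughly 15 GB per ASAN job and 20 GB per TSAN job (reading the figures above as ASAN and TSAN respectively) on a 32 GB server, and it ignores what the OS and the runner processes themselves need, so the real headroom is smaller.

```python
# Back-of-the-envelope check: how many sanitised test jobs fit on one
# self-hosted server at once? Peak figures are the rough numbers quoted
# above; OS and runner overhead are deliberately ignored (an assumption).
SERVER_MEMORY_GB = 32
PEAK_MEMORY_GB = {"asan": 15, "tsan": 20}


def max_concurrent_jobs(per_job_gb: int, total_gb: int = SERVER_MEMORY_GB) -> int:
    """Number of identical jobs whose peak memory fits in total_gb."""
    return total_gb // per_job_gb


def fit_together(jobs: list, total_gb: int = SERVER_MEMORY_GB) -> bool:
    """True if the listed jobs can run concurrently within total_gb."""
    return sum(PEAK_MEMORY_GB[job] for job in jobs) <= total_gb


if __name__ == "__main__":
    for name, gb in PEAK_MEMORY_GB.items():
        print(f"{name}: at most {max_concurrent_jobs(gb)} concurrent job(s)")
    print("asan + tsan together:", fit_together(["asan", "tsan"]))
```

Two ASAN jobs only just fit on paper (30 of 32 GB, before any overhead), and an ASAN job cannot run alongside a TSAN job, so in practice the sanitised suites end up running one at a time.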

For reference, TSAN was already removed in #650. Also for future reference, here are the TSAN runtime flags and the ASAN runtime flags.

csegarragonz, Jun 20 '22 16:06

Pinging @eigenraven to pick their brains. Is there something we are completely overlooking here?

csegarragonz, Jun 21 '22 16:06

Hi! For TSAN, have you tried reducing the history size (history_size)? The memory used grows exponentially with that parameter. As for ASAN, I'm not sure what caused the sudden jump: did you change anything in memory management that could have caused more allocations? ASAN usually doesn't have that high an overhead; it mostly consumes virtual memory, which is almost free.
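For a sense of scale: according to the ThreadSanitizer flags documentation of that period, history_size takes values 0 to 7, history_size=0 keeps about 32K memory accesses per thread, and each increment doubles that, so the default of 2 is 128K and 7 is 4M. The sketch below only tabulates that rule; reading "K" as 1024 is an assumption, and the exact semantics may differ between LLVM releases, so check the documentation for the runtime in use.

```python
# Per-thread TSAN history as a function of the history_size runtime flag,
# following the documented rule: 32K accesses at 0, doubling per step, max 7.
K = 1024  # assumption: "32K" read as 32 * 1024


def tsan_history_accesses(history_size: int) -> int:
    """Memory accesses remembered per thread for a given history_size."""
    if not 0 <= history_size <= 7:
        raise ValueError("history_size must be in [0, 7]")
    return 32 * K * 2 ** history_size


if __name__ == "__main__":
    for hs in range(8):
        print(f"history_size={hs}: {tsan_history_accesses(hs) // K}K accesses/thread")
```

Going from the default of 2 down to 0 shrinks the per-thread history by a factor of four, which is why a noticeable drop in memory was expected.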

eigenraven, Jun 21 '22 16:06

@eigenraven thanks for the tips! For ASAN, as you say, I think there might be another problem. For TSAN, changing history_size to 0 does not help. I am a bit puzzled by that, as I was expecting a big reduction in memory consumption. However, it seems that TSAN's runtime flags need to be provided as a space-separated (not colon-separated) list, so we may have been using the default value of 2 all this time.
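A minimal sketch of building the options string as described in the comment above, i.e. as space-separated key=value pairs, and placing it in the environment of the process that runs the tests. The helper names are hypothetical; history_size and halt_on_error are real TSAN runtime flags, but the values are placeholders, not what Faasm used. TSAN reads TSAN_OPTIONS when the instrumented binary starts, so the variable has to be set before launching it.

```python
import os


def sanitizer_options(flags: dict, sep: str = " ") -> str:
    """Join flags into 'key=value' pairs; space-separated by default,
    following the observation in the thread (hypothetical helper)."""
    return sep.join(f"{key}={value}" for key, value in flags.items())


def env_with_options(var: str, flags: dict, base=None) -> dict:
    """Copy of the environment (or of base) with var, e.g. TSAN_OPTIONS, set."""
    env = dict(os.environ if base is None else base)
    env[var] = sanitizer_options(flags)
    return env


if __name__ == "__main__":
    # Placeholder values for illustration only.
    env = env_with_options("TSAN_OPTIONS", {"history_size": 0, "halt_on_error": 1})
    print(env["TSAN_OPTIONS"])  # history_size=0 halt_on_error=1
```

The returned dict can be passed as env= to subprocess.run along with the path to the sanitised test binary (not shown here), which keeps the flags scoped to that one run.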

csegarragonz, Jun 22 '22 10:06

@csegarragonz looking at the current failed run, there seem to be (1) a lot of useless TLS access logging and (2) a lot of warnings about unreclaimed memory: https://github.com/faasm/faasm/runs/7000258660?check_suite_focus=true#step%3A23%3A5773= . These look like Faasm logs rather than TSAN logs.

eigenraven, Jun 22 '22 10:06

It looks like this check is failing repeatedly: https://github.com/faasm/faasm/blob/main/src/wasm/WasmModule.cpp#L826-L833= . I remember having to modify the brk mechanism locally because it didn't work properly when the Wasm module allocated memory itself via memory.grow, and it led to a lot of memory leaks whenever that happened. Maybe that's also what's happening now?
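The suspected interaction can be illustrated with a toy model. This is hypothetical Python, not Faasm's WasmModule code: LinearMemory mimics memory.grow (which returns the previous size in 64 KiB pages) and Brk is a host-side break pointer. If the module grows memory on its own and the host keeps using its cached break, the next sbrk hands out bytes the module already owns; re-syncing the break with the current memory size first avoids that. It only shows how the two can drift apart, not the exact leak described above.

```python
# Toy model (not Faasm's code) of a host-side brk sharing a Wasm linear
# memory with a module that may also call memory.grow on its own.
WASM_PAGE_SIZE = 65536  # Wasm pages are 64 KiB


class LinearMemory:
    def __init__(self, pages: int):
        self.pages = pages

    def grow(self, delta_pages: int) -> int:
        """Mimics memory.grow: returns the previous size in pages."""
        old = self.pages
        self.pages += delta_pages
        return old

    @property
    def size_bytes(self) -> int:
        return self.pages * WASM_PAGE_SIZE


class Brk:
    def __init__(self, memory: LinearMemory):
        self.memory = memory
        self.brk = memory.size_bytes  # cached break, in bytes

    def sbrk(self, increment: int, resync: bool = True) -> int:
        """Returns the old break, growing memory if the new break needs it."""
        if increment < 0:
            raise ValueError("toy model only handles growth")
        if resync:
            # Start from the real end of memory if the module grew it itself,
            # so the host never hands out bytes the module already uses.
            self.brk = max(self.brk, self.memory.size_bytes)
        old_brk = self.brk
        new_brk = old_brk + increment
        if new_brk > self.memory.size_bytes:
            missing = new_brk - self.memory.size_bytes
            self.memory.grow(-(-missing // WASM_PAGE_SIZE))  # ceiling division
        self.brk = new_brk
        return old_brk


if __name__ == "__main__":
    stale_mem = LinearMemory(pages=10)
    stale = Brk(stale_mem)
    stale_mem.grow(5)  # the module grows memory by itself
    # Stale break: returns an address inside the module's new pages.
    print(stale.sbrk(100, resync=False) // WASM_PAGE_SIZE)  # 10

    synced_mem = LinearMemory(pages=10)
    synced = Brk(synced_mem)
    synced_mem.grow(5)
    print(synced.sbrk(100) // WASM_PAGE_SIZE)  # 15
```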

eigenraven, Jun 22 '22 10:06

@eigenraven thanks for pointing out the typos. FYI, the flush_memory_ms flag really made the difference for TSAN. Is there anything else you think needs amending?

csegarragonz, Oct 03 '22 08:10

Good question. I removed it and the tests pass as well. I can't remember why I included it in the first place...

csegarragonz, Oct 03 '22 09:10