core icon indicating copy to clipboard operation
core copied to clipboard

Segmentation fault in TypeScript loader tests related to logs

Open viferga opened this issue 3 years ago • 2 comments

🐛 Bug Report

This bug gets shown with logs but probably it's a race condition related to threading, specially in destruction of TypeScript loader. This has to be reviewed carefully because the behavior is not deterministic. I have achieved to remove all the logs, making a no-op the log_write macro, and then the error happened in a different place, but I do not have those logs right now.

I am leaving here ./docker-compose.sh test logs in order to provide basic info about the bug. This should be reviewed with either helgrind or thread sanitizer. docker-compose.sh.test.output.txt

viferga avatar Sep 13 '22 14:09 viferga

@rxbryan check it out.

viferga avatar Sep 13 '22 14:09 viferga

Here's more: output-node-segfault.txt

viferga avatar Sep 13 '22 22:09 viferga

I have been reviewing this attached file output-node-segfault.txt and I can see that metacall-callback-complex-test fails here

 (/usr/local/metacall/build/node_modules/acorn/dist/acorn.js:817:23)
    at Espree.parseTopLevel (/usr/local/metacall/build/node_modules/espree/dist/espree.cjs:663:26)
    at Espree.parse (/usr/local/metacall/build/node_modules/acorn/dist/acorn.js:589:17)
    at Espree.parse (/usr/local/metacall/build/node_modules/espree/dist/espree.cjs:585:35) {
  index: 8,
  lineNumber: 1,
  column: 9
}
Exception while parsing 'function(x) {
	return function(n) {
		console.log('------------------ js factorial', n);
		if (n == 0) {
			console.log('------------------ js factorial case base');
			return 1;
		} else {
			console.log('------------------ js factorial pre x() call', x.toString().slice(0, 12), '...');
			const result = n * x(x)(n - 1);
			console.log('------------------ js factorial post x() call', x.toString().slice(0, 12), '...');
			return result;
		}
	};
}' in node_loader_trampoline_discover_function SyntaxError: Unexpected token (
    at Espree.raise (/usr/local/metacall/build/node_modules/espree/dist/espree.cjs:675:25)
    at Espree.unexpected (/usr/local/metacall/build/node_modules/espree/dist/espree.cjs:720:18)
    at Espree.pp$5.parseIdent (/usr/local/metacall/build/node_modules/acorn/dist/acorn.js:3390:12)
    at Espree.pp$8.parseFunction (/usr/local/metacall/build/node_modules/acorn/dist/acorn.js:1339:92)
    at Espree.pp$8.parseFunctionStatement (/usr/local/metacall/build/node_modules/acorn/dist/acorn.js:1058:17)
    at Espree.pp$8.parseStatement (/usr/local/metacall/build/node_modules/acorn/dist/acorn.js:903:19)
    at Espree.pp$8.parseTopLevel (/usr/local/metacall/build/node_modules/acorn/dist/acorn.js:817:23)
    at Espree.parseTopLevel (/usr/local/metacall/build/node_modules/espree/dist/espree.cjs:663:26)
    at Espree.parse (/usr/local/metacall/build/node_modules/acorn/dist/acorn.js:589:17)
    at Espree.parse (/usr/local/metacall/build/node_modules/espree/dist/espree.cjs:585:35) {
  index: 8,
  lineNumber: 1,
  column: 9
}
/usr/local/metacall/source/tests/metacall_callback_complex_test/source/metacall_callback_complex_test.cpp:102: Failure
Expected: ((void *)__null) != ((void *)js_function_factorial), actual: NULL vs NULL
[  FAILED  ] metacall_callback_complex_test.DefaultConstructor (284 ms)
[----------] 1 test from metacall_callback_complex_test (284 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (284 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] metacall_callback_complex_test.DefaultConstructor

have this been fixed? if not updating factcallback.js should fix it

#!/usr/bin/env node

const js_function_chain = function js_function_chain (x) {
	return function(n) {
		console.log('------------------ js chain', n);
		console.log('------------------ js chain pre x() call', x.toString().slice(0, 12), '...');
		const result = x(x)(n);
		console.log('------------------ js chain post x() call', x.toString().slice(0, 12), '...');
		return result;
	};
};
const js_function_factorial = function  js_function_factorial (x) {
	return function(n) {
		console.log('------------------ js factorial', n);
		if (n == 0) {
			console.log('------------------ js factorial case base');
			return 1;
		} else {
			console.log('------------------ js factorial pre x() call', x.toString().slice(0, 12), '...');
			const result = n * x(x)(n - 1);
			console.log('------------------ js factorial post x() call', x.toString().slice(0, 12), '...');
			return result;
		}
	};
}

module.exports = {
	js_function_chain,
	js_function_factorial,
};

rxbryan avatar Dec 01 '22 13:12 rxbryan

espree won't parse function calls written this way

function(x) {	};

rxbryan avatar Dec 01 '22 13:12 rxbryan

The output from running metacall-py-call-bench with gdb. This segfault doesn't occur when running with valgrind. @viferga do you think its related to this https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx

Bugs and caveats: The destruction of modules and objects in modules is done in random order; this may cause destructors (__del__() methods) to fail when they depend on other objects (even functions) or modules.
(gdb) run
Starting program: /home/lord-bryan/projects/core/build/metacall-py-call-bench 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
2022-12-01T17:21:40+01:00
Running /home/lord-bryan/projects/core/build/metacall-py-call-bench
Run on (4 X 2000 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 256 KiB (x2)
  L3 Unified 3072 KiB (x1)
Load Average: 0.60, 0.66, 0.80
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
[New Thread 0x7ffff5cdf700 (LWP 42551)]
[Thread 0x7ffff5cdf700 (LWP 42551) exited]
[New Thread 0x7ffff5cdf700 (LWP 42552)]
[Thread 0x7ffff5cdf700 (LWP 42552) exited]

Thread 1 "metacall-py-cal" received signal SIGSEGV, Segmentation fault.
0x00007ffff6e78f9c in PyType_Modified ()
   from /lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb) bt
#0  0x00007ffff6e78f9c in PyType_Modified ()
   from /lib/x86_64-linux-gnu/libpython3.9.so.1.0
#1  0x00007ffff6e78fb4 in PyType_Modified ()
   from /lib/x86_64-linux-gnu/libpython3.9.so.1.0
#2  0x00007ffff6e7b163 in PyType_ClearCache ()
   from /lib/x86_64-linux-gnu/libpython3.9.so.1.0
#3  0x00007ffff6db65ba in Py_FinalizeEx ()
   from /lib/x86_64-linux-gnu/libpython3.9.so.1.0
#4  0x00007ffff79e43e0 in py_loader_impl_finalize (py_impl=0x5555555e53f0)
    at /home/lord-bryan/projects/core/source/loaders/py_loader/source/py_loader_impl.c:4207
#5  py_loader_impl_finalize (py_impl=0x5555555e53f0)
    at /home/lord-bryan/projects/core/source/loaders/py_loader/source/py_loader_impl.c:4196
#6  py_loader_impl_destroy (impl=<optimized out>)
    at /home/lord-bryan/projects/core/source/loaders/py_loader/source/py_loader_impl.c:4285
#7  0x00007ffff7fb0550 in loader_impl_destroy (p=<optimized out>, 
    impl=0x555555683530)
    at /home/lord-bryan/projects/core/source/loader/source/loader_impl.c:1478
#8  0x00007ffff7fa5cc8 in plugin_destroy_delayed (p=0x5555556ccc30)
    at /home/lord-bryan/projects/core/source/plugin/source/plugin_impl.c:110
#9  0x00007ffff7fae834 in loader_unload_children (impl=impl@entry=0x0)
--Type <RET> for more, q to quit, c to continue without paging--
    at /home/lord-bryan/projects/core/source/loader/source/loader.c:696
#10 0x00007ffff7fae8d7 in loader_destroy ()
    at /home/lord-bryan/projects/core/source/loader/source/loader.c:742
#11 0x00007ffff7fb46e1 in metacall_destroy ()
    at /home/lord-bryan/projects/core/source/metacall/source/metacall.c:2213
#12 0x000055555556055d in metacall_py_call_bench::TearDown (
    this=<optimized out>, state=...)
    at /home/lord-bryan/projects/core/source/benchmarks/metacall_py_call_bench/source/metacall_py_call_bench.cpp:60
#13 0x00005555555a7748 in benchmark::internal::BenchmarkInstance::Run(unsigned long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const ()
#14 0x00005555555892de in benchmark::internal::(anonymous namespace)::RunInThread(benchmark::internal::BenchmarkInstance const*, unsigned long, int, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) ()
#15 0x00005555555899c0 in benchmark::internal::BenchmarkRunner::DoNIterations()
    ()
#16 0x000055555558a0b9 in benchmark::internal::BenchmarkRunner::DoOneRepetition() ()
#17 0x0000555555562aaa in benchmark::internal::(anonymous namespace)::RunBenchmarks(std::vector<benchmark::internal::BenchmarkInstance, std::allocator<benchmark::internal::BenchmarkInstance> > const&, benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) ()
--Type <RET> for more, q to quit, c to continue without paging--
#18 0x0000555555563d80 in benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#19 0x0000555555563fee in benchmark::RunSpecifiedBenchmarks() ()
#20 0x000055555555ecc8 in main (argc=<optimized out>, argv=0x7fffffffe038)
    at /home/lord-bryan/projects/core/source/benchmarks/metacall_py_call_bench/source/metacall_py_call_bench.cpp:177

rxbryan avatar Dec 01 '22 16:12 rxbryan

The output from running metacall-py-call-bench with gdb. This segfault doesn't occur when running with valgrind. @viferga do you think its related to this https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx

Bugs and caveats: The destruction of modules and objects in modules is done in random order; this may cause destructors (__del__() methods) to fail when they depend on other objects (even functions) or modules.
(gdb) run
Starting program: /home/lord-bryan/projects/core/build/metacall-py-call-bench 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
2022-12-01T17:21:40+01:00
Running /home/lord-bryan/projects/core/build/metacall-py-call-bench
Run on (4 X 2000 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 256 KiB (x2)
  L3 Unified 3072 KiB (x1)
Load Average: 0.60, 0.66, 0.80
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
[New Thread 0x7ffff5cdf700 (LWP 42551)]
[Thread 0x7ffff5cdf700 (LWP 42551) exited]
[New Thread 0x7ffff5cdf700 (LWP 42552)]
[Thread 0x7ffff5cdf700 (LWP 42552) exited]

Thread 1 "metacall-py-cal" received signal SIGSEGV, Segmentation fault.
0x00007ffff6e78f9c in PyType_Modified ()
   from /lib/x86_64-linux-gnu/libpython3.9.so.1.0
(gdb) bt
#0  0x00007ffff6e78f9c in PyType_Modified ()
   from /lib/x86_64-linux-gnu/libpython3.9.so.1.0
#1  0x00007ffff6e78fb4 in PyType_Modified ()
   from /lib/x86_64-linux-gnu/libpython3.9.so.1.0
#2  0x00007ffff6e7b163 in PyType_ClearCache ()
   from /lib/x86_64-linux-gnu/libpython3.9.so.1.0
#3  0x00007ffff6db65ba in Py_FinalizeEx ()
   from /lib/x86_64-linux-gnu/libpython3.9.so.1.0
#4  0x00007ffff79e43e0 in py_loader_impl_finalize (py_impl=0x5555555e53f0)
    at /home/lord-bryan/projects/core/source/loaders/py_loader/source/py_loader_impl.c:4207
#5  py_loader_impl_finalize (py_impl=0x5555555e53f0)
    at /home/lord-bryan/projects/core/source/loaders/py_loader/source/py_loader_impl.c:4196
#6  py_loader_impl_destroy (impl=<optimized out>)
    at /home/lord-bryan/projects/core/source/loaders/py_loader/source/py_loader_impl.c:4285
#7  0x00007ffff7fb0550 in loader_impl_destroy (p=<optimized out>, 
    impl=0x555555683530)
    at /home/lord-bryan/projects/core/source/loader/source/loader_impl.c:1478
#8  0x00007ffff7fa5cc8 in plugin_destroy_delayed (p=0x5555556ccc30)
    at /home/lord-bryan/projects/core/source/plugin/source/plugin_impl.c:110
#9  0x00007ffff7fae834 in loader_unload_children (impl=impl@entry=0x0)
--Type <RET> for more, q to quit, c to continue without paging--
    at /home/lord-bryan/projects/core/source/loader/source/loader.c:696
#10 0x00007ffff7fae8d7 in loader_destroy ()
    at /home/lord-bryan/projects/core/source/loader/source/loader.c:742
#11 0x00007ffff7fb46e1 in metacall_destroy ()
    at /home/lord-bryan/projects/core/source/metacall/source/metacall.c:2213
#12 0x000055555556055d in metacall_py_call_bench::TearDown (
    this=<optimized out>, state=...)
    at /home/lord-bryan/projects/core/source/benchmarks/metacall_py_call_bench/source/metacall_py_call_bench.cpp:60
#13 0x00005555555a7748 in benchmark::internal::BenchmarkInstance::Run(unsigned long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const ()
#14 0x00005555555892de in benchmark::internal::(anonymous namespace)::RunInThread(benchmark::internal::BenchmarkInstance const*, unsigned long, int, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) ()
#15 0x00005555555899c0 in benchmark::internal::BenchmarkRunner::DoNIterations()
    ()
#16 0x000055555558a0b9 in benchmark::internal::BenchmarkRunner::DoOneRepetition() ()
#17 0x0000555555562aaa in benchmark::internal::(anonymous namespace)::RunBenchmarks(std::vector<benchmark::internal::BenchmarkInstance, std::allocator<benchmark::internal::BenchmarkInstance> > const&, benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) ()
--Type <RET> for more, q to quit, c to continue without paging--
#18 0x0000555555563d80 in benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#19 0x0000555555563fee in benchmark::RunSpecifiedBenchmarks() ()
#20 0x000055555555ecc8 in main (argc=<optimized out>, argv=0x7fffffffe038)
    at /home/lord-bryan/projects/core/source/benchmarks/metacall_py_call_bench/source/metacall_py_call_bench.cpp:177

This seems to have been solved in master. Respect to the main bug, @rxbryan has noticed that it does not appear with NodeJS 18, we should test it.

viferga avatar Dec 09 '22 12:12 viferga

espree won't parse function calls written this way

function(x) {	};

How we should handle this? Can you implement a fix? Do you mean this won't work?:

module.exports = {
   sum: function (x, y) { return x + y },
}

viferga avatar Dec 09 '22 12:12 viferga

How we should handle this? Can you implement a fix? Do you mean this won't work?:

module.exports = {
   sum: function (x, y) { return x + y },
}

Yes this wouldn't have worked at the time we replaced cherow with espree. But it seems like it's been fixed in espree.

rxbryan avatar Dec 09 '22 15:12 rxbryan

This bug has been completely mitigated, all Linux tests are passing now. It requires to implement fully thread safe (lock free and signal async safe) logs, but for now all logs which are thread unsafe have been commented so we can avoid these errors. It's not something crucial for the project yet, but it's a TODO we can do soon.

viferga avatar Jan 17 '23 19:01 viferga