lpython Some CI tests are flaky or real failures

Some CI tests fail intermittently; they may be flaky, or they may be real failures.

Failure in ctest.

Test project /home/runner/work/lpython/lpython
    Start 1: test_stacktrace
1/2 Test #1: test_stacktrace ..................   Passed    0.00 sec
    Start 2: test_lpython
2/2 Test #2: test_lpython .....................Subprocess aborted***Exception:   0.86 sec
[doctest] doctest version is "2.4.8"
[doctest] run with "--help" for options
0 0 0 0 0 0 0 0 0 0 
test_lpython: /home/runner/micromamba/envs/lp/include/llvm/IR/DataLayout.h:656: uint64_t llvm::StructLayout::getElementOffset(unsigned int) const: Assertion `Idx < NumElements && "Invalid element idx!"' failed.
===============================================================================
/home/runner/work/lpython/lpython/src/lpython/tests/test_llvm.cpp:1556:
TEST CASE:  PythonCompiler classes

/home/runner/work/lpython/lpython/src/lpython/tests/test_llvm.cpp:1556: FATAL ERROR: test case CRASHED: SIGABRT - Abort (abnormal termination) signal

===============================================================================
[doctest] test cases:  55 |  54 passed | 1 failed | 17 skipped
[doctest] assertions: 486 | 486 passed | 0 failed |
[doctest] Status: FAILURE!


50% tests passed, 1 tests failed out of 2

Total Test time (real) =   0.87 sec

The following tests FAILED:
	  2 - test_lpython (Subprocess aborted)
Errors while running CTest
Error: Process completed with exit code 8.

Failure in reference tests

compiler_tester.tester.RunException: Testing with reference output failed.
runtime_errors/test_assert_01.py * run_dbg
The JSON metadata differs against reference results
Reference JSON: tests/reference/run_dbg-test_assert_01-2f34744.json
Output JSON:    tests/output/run_dbg-test_assert_01-2f34744.json
Omitting 9 identical items
Differing items:
{'stderr_hash': '32b0a24f111e577fe4fc5b3f4a5994b951e34dde7986b3fb750c5f5e'} != {'stderr_hash': '4811af471c73572b285e9ea01c8689abdd3cb32c717b3cd4876d2669'}
{'returncode': 134} != {'returncode': 1}
Diff against: tests/reference/run_dbg-test_assert_01-2f34744.stderr
1,7c1,2
<   File "tests/runtime_errors/test_assert_01.py", line 1
<     def test():
<   File "tests/runtime_errors/test_assert_01.py", line 4
<     test()
<   File "tests/runtime_errors/test_assert_01.py", line 2
<     assert False
< AssertionError
---
> *** buffer overflow detected ***: terminated
> Aborted (core dumped)

Error: Process completed with exit code 1.
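
A note on reading the `returncode` mismatch above: a status of 134 is what a shell reports when a process is killed by SIGABRT (128 + 6), which matches the "Aborted (core dumped)" line, whereas 1 is an ordinary non-zero exit. Below is a minimal Python sketch of that decoding. The helper name `describe_returncode` is hypothetical and not part of the LPython test runner, and the sketch assumes a POSIX system where statuses above 128 follow the shell's 128 + signal-number convention.

```python
import signal


def describe_returncode(code: int) -> str:
    """Describe a shell-style exit status (hypothetical helper, POSIX only)."""
    # Shells report 128 + N when the child process was killed by signal N.
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not correspond to a signal known on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


# The two return codes from the reference-test diff above.
for code in (134, 1):
    print(code, "->", describe_returncode(code))
```

Read this way, the two return codes describe different kinds of termination (an abort versus a normal error exit), not just different error values.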

Feb 17 '25 20:02 ubaidsk

The tests above fail at random. I will probably comment out these two tests for now to get the CI passing (which will unblock merging PRs). We can then fix them iteratively in subsequent PRs.

Feb 17 '25 21:02 ubaidsk

Failure in reference tests

It seems that all the run_with_dbg reference tests are currently failing or flaky. When I follow the CI steps locally, they pass for me.

Feb 17 '25 22:02 ubaidsk

Also, integration_tests/test_str_01.py fails with the same error as above (> *** buffer overflow detected ***: terminated).

I think these are real failures that we need to fix.

Feb 17 '25 22:02 ubaidsk

I think the fix for this buffer overflow error might be the same as the one in https://github.com/lfortran/lfortran/pull/6003.

The CI failures started occurring when the GCC compiler was updated on the CI. The PRs which fixed them incrementally were:

https://github.com/lfortran/lfortran/pull/5983
https://github.com/lfortran/lfortran/pull/6003
https://github.com/lfortran/lfortran/pull/6004

Feb 18 '25 09:02 kmr-srbh

https://github.com/lcompilers/lpython/blob/7eb2bea75234ee7a99158871175bc0bb7df63fb1/src/libasr/runtime/lfortran_intrinsics.c#L2306-L2308

It does look like the same issue for test_str_01.

Feb 20 '25 08:02 swamishiju