[ci] New script to generate test reports as Buildkite Annotations

Open DavidSpickett opened this issue 1 year ago • 10 comments

The CI builds now send the results of every lit run to a unique file. This means we can read them all to make a combined report for all tests.

This report will be shown as an "annotation" in the build results: https://buildkite.com/docs/agent/v3/cli-annotate#creating-an-annotation

Here is an example: https://buildkite.com/llvm-project/github-pull-requests/builds/112660 (make sure it is showing "All" instead of "Failures")

This is an alternative to using the existing Buildkite plugin: https://github.com/buildkite-plugins/junit-annotate-buildkite-plugin

The plugin:

  • Is specific to Buildkite, and we may move away from Buildkite.
  • Requires Docker, unless we were to fork it ourselves.
  • Does not let you customise the report format unless, again, we make our own fork.

Annotations use GitHub's flavour of Markdown, so the main code in the script generates that text. There is an extra "style" argument, generated to make the formatting nicer in Buildkite.

"context" is the name of the annotation that will be created. By using different context names for Linux and Windows results we get 2 separate annotations.

The script also handles calling the buildkite-agent. This makes passing extra arguments to the agent easier, rather than piping the output of this script into the agent.

In the future we can remove the agent part of it and simply use the report content, either printed to stdout or posted as a comment on the GitHub PR.

DavidSpickett avatar Oct 23 '24 11:10 DavidSpickett

:white_check_mark: With the latest revision this PR passed the Python code formatter.

github-actions[bot] avatar Oct 23 '24 13:10 github-actions[bot]

Notes to reviewers:

  • This is based on https://github.com/llvm/llvm-project/pull/113160.
  • The first commit is that previous PR.
  • The second commit is the new script.
  • The third commit is some example changes so I can show you what the reports look like; they will be removed if this is approved.

This is an alternative to forking the existing Buildkite plugin. I do not have examples of those results because it can't run without Docker, but I do have WIP changes to the scripting to get that far (https://github.com/llvm/llvm-project/pull/113290) if you want to compare.

There are a couple of other things we could do:

  • Print the report to stdout as well, so it is part of the logfile for download. However, it will contain a bunch of Markdown formatting.
  • Or, write the report to test_results.md and make it an artifact of the job. The annotation can then say "you can download this report from the artifacts...".

I think the current state of this PR is a big improvement as it is, and I am able to commit more time to improving it as time goes on, and to porting it to GitHub infrastructure later if that is needed.

DavidSpickett avatar Oct 24 '24 09:10 DavidSpickett

@llvm/pr-subscribers-testing-tools @llvm/pr-subscribers-github-workflow @llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: David Spickett (DavidSpickett)

Changes

The CI builds now send the results of every lit run to a unique file. This means we can read them all to make a combined report for all tests.

This report will be shown as an "annotation" in the build results: https://buildkite.com/docs/agent/v3/cli-annotate#creating-an-annotation

Here is an example: https://buildkite.com/llvm-project/github-pull-requests/builds/112660#_ (make sure it is showing "All" instead of "Failures")

This is an alternative to using the existing Buildkite plugin: https://github.com/buildkite-plugins/junit-annotate-buildkite-plugin

The plugin:

  • Is specific to Buildkite, and we may move away from Buildkite.
  • Requires Docker, unless we were to fork it ourselves.
  • Does not let you customise the report format unless, again, we make our own fork.

Annotations use GitHub's flavour of Markdown, so the main code in the script generates that text. There is an extra "style" argument, generated to make the formatting nicer in Buildkite.

"context" is the name of the annotation that will be created. By using different context names for Linux and Windows results we get 2 separate annotations.

The script also handles calling the buildkite-agent. This makes passing extra arguments to the agent easier, rather than piping the output of this script into the agent.

In the future we can remove the agent part of it and simply use the report content, either printed to stdout or posted as a comment on the GitHub PR.


Full diff: https://github.com/llvm/llvm-project/pull/113447.diff

7 Files Affected:

  • (modified) .ci/generate-buildkite-pipeline-premerge (+2-2)
  • (added) .ci/generate_test_report.py (+328)
  • (modified) .ci/monolithic-linux.sh (+15-6)
  • (modified) .ci/monolithic-windows.sh (+7-3)
  • (modified) clang/README.md (+2)
  • (modified) clang/test/Driver/cl-pch.c (+1-1)
  • (modified) llvm/README.txt (+2)
diff --git a/.ci/generate-buildkite-pipeline-premerge b/.ci/generate-buildkite-pipeline-premerge
index 7676ff716c4185..e52133751f09b1 100755
--- a/.ci/generate-buildkite-pipeline-premerge
+++ b/.ci/generate-buildkite-pipeline-premerge
@@ -272,7 +272,7 @@ if [[ "${linux_projects}" != "" ]]; then
   artifact_paths:
   - 'artifacts/**/*'
   - '*_result.json'
-  - 'build/test-results.xml'
+  - 'build/test-results*.xml'
   agents: ${LINUX_AGENTS}
   retry:
     automatic:
@@ -295,7 +295,7 @@ if [[ "${windows_projects}" != "" ]]; then
   artifact_paths:
   - 'artifacts/**/*'
   - '*_result.json'
-  - 'build/test-results.xml'
+  - 'build/test-results*.xml'
   agents: ${WINDOWS_AGENTS}
   retry:
     automatic:
diff --git a/.ci/generate_test_report.py b/.ci/generate_test_report.py
new file mode 100644
index 00000000000000..f2ae116ace99af
--- /dev/null
+++ b/.ci/generate_test_report.py
@@ -0,0 +1,328 @@
+# Script to parse many JUnit XML result files and send a report to the buildkite
+# agent as an annotation.
+#
+# To run the unittests:
+# python3 -m unittest discover -p generate_test_report.py
+
+import argparse
+import unittest
+from io import StringIO
+from junitparser import JUnitXml, Failure
+from textwrap import dedent
+from subprocess import check_call
+
+
+def junit_from_xml(xml):
+    return JUnitXml.fromfile(StringIO(xml))
+
+
+class TestReports(unittest.TestCase):
+    def test_title_only(self):
+        self.assertEqual(_generate_report("Foo", []), ("", None))
+
+    def test_no_tests_in_testsuite(self):
+        self.assertEqual(
+            _generate_report(
+                "Foo",
+                [
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="0.00">
+          <testsuite name="Empty" tests="0" failures="0" skipped="0" time="0.00">
+          </testsuite>
+          </testsuites>"""
+                        )
+                    )
+                ],
+            ),
+            ("", None),
+        )
+
+    def test_no_failures(self):
+        self.assertEqual(
+            _generate_report(
+                "Foo",
+                [
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="0.00">
+          <testsuite name="Passed" tests="1" failures="0" skipped="0" time="0.00">
+          <testcase classname="Bar/test_1" name="test_1" time="0.00"/>
+          </testsuite>
+          </testsuites>"""
+                        )
+                    )
+                ],
+            ),
+            (
+                dedent(
+                    """\
+              # Foo
+
+              * 1 test passed"""
+                ),
+                "success",
+            ),
+        )
+
+    def test_report_single_file_single_testsuite(self):
+        self.assertEqual(
+            _generate_report(
+                "Foo",
+                [
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="8.89">
+          <testsuite name="Bar" tests="4" failures="2" skipped="1" time="410.63">
+          <testcase classname="Bar/test_1" name="test_1" time="0.02"/>
+          <testcase classname="Bar/test_2" name="test_2" time="0.02">
+            <skipped message="Reason"/>
+          </testcase>
+          <testcase classname="Bar/test_3" name="test_3" time="0.02">
+            <failure><![CDATA[Output goes here]]></failure>
+          </testcase>
+          <testcase classname="Bar/test_4" name="test_4" time="0.02">
+            <failure><![CDATA[Other output goes here]]></failure>
+          </testcase>
+          </testsuite>
+          </testsuites>"""
+                        )
+                    )
+                ],
+            ),
+            (
+                dedent(
+                    """\
+          # Foo
+
+          * 1 test passed
+          * 1 test skipped
+          * 2 tests failed
+
+          ## Failed tests
+          (click to see output)
+
+          ### Bar
+          <details>
+          <summary>Bar/test_3/test_3</summary>
+
+          ```
+          Output goes here
+          ```
+          </details>
+          <details>
+          <summary>Bar/test_4/test_4</summary>
+
+          ```
+          Other output goes here
+          ```
+          </details>"""
+                ),
+                "error",
+            ),
+        )
+
+    MULTI_SUITE_OUTPUT = (
+        dedent(
+            """\
+        # ABC and DEF
+
+        * 1 test passed
+        * 1 test skipped
+        * 2 tests failed
+
+        ## Failed tests
+        (click to see output)
+
+        ### ABC
+        <details>
+        <summary>ABC/test_2/test_2</summary>
+
+        ```
+        ABC/test_2 output goes here
+        ```
+        </details>
+
+        ### DEF
+        <details>
+        <summary>DEF/test_2/test_2</summary>
+
+        ```
+        DEF/test_2 output goes here
+        ```
+        </details>"""
+        ),
+        "error",
+    )
+
+    def test_report_single_file_multiple_testsuites(self):
+        self.assertEqual(
+            _generate_report(
+                "ABC and DEF",
+                [
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="8.89">
+          <testsuite name="ABC" tests="2" failures="1" skipped="0" time="410.63">
+          <testcase classname="ABC/test_1" name="test_1" time="0.02"/>
+          <testcase classname="ABC/test_2" name="test_2" time="0.02">
+            <failure><![CDATA[ABC/test_2 output goes here]]></failure>
+          </testcase>
+          </testsuite>
+          <testsuite name="DEF" tests="2" failures="1" skipped="1" time="410.63">
+          <testcase classname="DEF/test_1" name="test_1" time="0.02">
+            <skipped message="reason"/>
+          </testcase>
+          <testcase classname="DEF/test_2" name="test_2" time="0.02">
+            <failure><![CDATA[DEF/test_2 output goes here]]></failure>
+          </testcase>
+          </testsuite>
+          </testsuites>"""
+                        )
+                    )
+                ],
+            ),
+            self.MULTI_SUITE_OUTPUT,
+        )
+
+    def test_report_multiple_files_multiple_testsuites(self):
+        self.assertEqual(
+            _generate_report(
+                "ABC and DEF",
+                [
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="8.89">
+          <testsuite name="ABC" tests="2" failures="1" skipped="0" time="410.63">
+          <testcase classname="ABC/test_1" name="test_1" time="0.02"/>
+          <testcase classname="ABC/test_2" name="test_2" time="0.02">
+            <failure><![CDATA[ABC/test_2 output goes here]]></failure>
+          </testcase>
+          </testsuite>
+          </testsuites>"""
+                        )
+                    ),
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="8.89">
+          <testsuite name="DEF" tests="2" failures="1" skipped="1" time="410.63">
+          <testcase classname="DEF/test_1" name="test_1" time="0.02">
+            <skipped message="reason"/>
+          </testcase>
+          <testcase classname="DEF/test_2" name="test_2" time="0.02">
+            <failure><![CDATA[DEF/test_2 output goes here]]></failure>
+          </testcase>
+          </testsuite>
+          </testsuites>"""
+                        )
+                    ),
+                ],
+            ),
+            self.MULTI_SUITE_OUTPUT,
+        )
+
+
+def _generate_report(title, junit_objects):
+    style = None
+
+    if not junit_objects:
+        return ("", style)
+
+    failures = {}
+    tests_run = 0
+    tests_skipped = 0
+    tests_failed = 0
+
+    for results in junit_objects:
+        for testsuite in results:
+            tests_run += testsuite.tests
+            tests_skipped += testsuite.skipped
+            tests_failed += testsuite.failures
+
+            for test in testsuite:
+                if (
+                    not test.is_passed
+                    and test.result
+                    and isinstance(test.result[0], Failure)
+                ):
+                    if failures.get(testsuite.name) is None:
+                        failures[testsuite.name] = []
+                    failures[testsuite.name].append(
+                        (test.classname + "/" + test.name, test.result[0].text)
+                    )
+
+    if not tests_run:
+        return ("", style)
+
+    style = "error" if tests_failed else "success"
+    report = [f"# {title}", ""]
+
+    tests_passed = tests_run - tests_skipped - tests_failed
+
+    def plural(num_tests):
+        return "test" if num_tests == 1 else "tests"
+
+    if tests_passed:
+        report.append(f"* {tests_passed} {plural(tests_passed)} passed")
+    if tests_skipped:
+        report.append(f"* {tests_skipped} {plural(tests_skipped)} skipped")
+    if tests_failed:
+        report.append(f"* {tests_failed} {plural(tests_failed)} failed")
+
+    if failures:
+        report.extend(["", "## Failed tests", "(click to see output)"])
+        for testsuite_name, failures in failures.items():
+            report.extend(["", f"### {testsuite_name}"])
+            for name, output in failures:
+                report.extend(
+                    [
+                        "<details>",
+                        f"<summary>{name}</summary>",
+                        "",
+                        "```",
+                        output,
+                        "```",
+                        "</details>",
+                    ]
+                )
+
+    return "\n".join(report), style
+
+
+def generate_report(title, junit_files):
+    return _generate_report(title, [JUnitXml.fromfile(p) for p in junit_files])
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "title", help="Title of the test report, without Markdown formatting."
+    )
+    parser.add_argument("context", help="Annotation context to write to.")
+    parser.add_argument("junit_files", help="Paths to JUnit report files.", nargs="*")
+    args = parser.parse_args()
+
+    report, style = generate_report(args.title, args.junit_files)
+    check_call(
+        [
+            "buildkite-agent",
+            "annotate",
+            "--context",
+            args.context,
+            "--style",
+            style,
+            report,
+        ]
+    )
diff --git a/.ci/monolithic-linux.sh b/.ci/monolithic-linux.sh
index b78dc59432b65c..4c4c1c756ba98f 100755
--- a/.ci/monolithic-linux.sh
+++ b/.ci/monolithic-linux.sh
@@ -28,18 +28,24 @@ if [[ -n "${CLEAR_CACHE:-}" ]]; then
   ccache --clear
 fi
 
-function show-stats {
+function at-exit {
+  python3 "${MONOREPO_ROOT}"/.ci/generate_test_report.py ":linux: Linux x64 Test Results" \
+    "linux-x64-test-results" "${BUILD_DIR}"/test-results*.xml
+
   mkdir -p artifacts
   ccache --print-stats > artifacts/ccache_stats.txt
 }
-trap show-stats EXIT
+trap at-exit EXIT
 
 projects="${1}"
 targets="${2}"
 
+lit_args="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --use-unique-output-file-name --timeout=1200 --time-tests"
+
 echo "--- cmake"
 pip install -q -r "${MONOREPO_ROOT}"/mlir/python/requirements.txt
 pip install -q -r "${MONOREPO_ROOT}"/lldb/test/requirements.txt
+pip install -q junitparser==3.2.0
 cmake -S "${MONOREPO_ROOT}"/llvm -B "${BUILD_DIR}" \
       -D LLVM_ENABLE_PROJECTS="${projects}" \
       -G Ninja \
@@ -47,7 +53,7 @@ cmake -S "${MONOREPO_ROOT}"/llvm -B "${BUILD_DIR}" \
       -D LLVM_ENABLE_ASSERTIONS=ON \
       -D LLVM_BUILD_EXAMPLES=ON \
       -D COMPILER_RT_BUILD_LIBFUZZER=OFF \
-      -D LLVM_LIT_ARGS="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --timeout=1200 --time-tests" \
+      -D LLVM_LIT_ARGS="${lit_args}" \
       -D LLVM_ENABLE_LLD=ON \
       -D CMAKE_CXX_FLAGS=-gmlt \
       -D LLVM_CCACHE_BUILD=ON \
@@ -87,7 +93,8 @@ if [[ "${runtimes}" != "" ]]; then
       -D CMAKE_BUILD_TYPE=RelWithDebInfo \
       -D CMAKE_INSTALL_PREFIX="${INSTALL_DIR}" \
       -D LIBCXX_TEST_PARAMS="std=c++03" \
-      -D LIBCXXABI_TEST_PARAMS="std=c++03"
+      -D LIBCXXABI_TEST_PARAMS="std=c++03" \
+      -D LLVM_LIT_ARGS="${lit_args}"
 
   echo "--- ninja runtimes C++03"
 
@@ -104,7 +111,8 @@ if [[ "${runtimes}" != "" ]]; then
       -D CMAKE_BUILD_TYPE=RelWithDebInfo \
       -D CMAKE_INSTALL_PREFIX="${INSTALL_DIR}" \
       -D LIBCXX_TEST_PARAMS="std=c++26" \
-      -D LIBCXXABI_TEST_PARAMS="std=c++26"
+      -D LIBCXXABI_TEST_PARAMS="std=c++26" \
+      -D LLVM_LIT_ARGS="${lit_args}"
 
   echo "--- ninja runtimes C++26"
 
@@ -121,7 +129,8 @@ if [[ "${runtimes}" != "" ]]; then
       -D CMAKE_BUILD_TYPE=RelWithDebInfo \
       -D CMAKE_INSTALL_PREFIX="${INSTALL_DIR}" \
       -D LIBCXX_TEST_PARAMS="enable_modules=clang" \
-      -D LIBCXXABI_TEST_PARAMS="enable_modules=clang"
+      -D LIBCXXABI_TEST_PARAMS="enable_modules=clang" \
+      -D LLVM_LIT_ARGS="${lit_args}"
 
   echo "--- ninja runtimes clang modules"
   
diff --git a/.ci/monolithic-windows.sh b/.ci/monolithic-windows.sh
index 91e719c52d4363..303cfadfa917cc 100755
--- a/.ci/monolithic-windows.sh
+++ b/.ci/monolithic-windows.sh
@@ -27,16 +27,20 @@ if [[ -n "${CLEAR_CACHE:-}" ]]; then
 fi
 
 sccache --zero-stats
-function show-stats {
+function at-exit {
+  python "${MONOREPO_ROOT}"/.ci/generate_test_report.py ":windows: Windows x64 Test Results" \
+    "windows-x64-test-results" "${BUILD_DIR}"/test-results*.xml
+
   mkdir -p artifacts
   sccache --show-stats >> artifacts/sccache_stats.txt
 }
-trap show-stats EXIT
+trap at-exit EXIT
 
 projects="${1}"
 targets="${2}"
 
 echo "--- cmake"
+pip install junitparser==3.2.0
 pip install -q -r "${MONOREPO_ROOT}"/mlir/python/requirements.txt
 
 # The CMAKE_*_LINKER_FLAGS to disable the manifest come from research
@@ -53,7 +57,7 @@ cmake -S "${MONOREPO_ROOT}"/llvm -B "${BUILD_DIR}" \
       -D LLVM_ENABLE_ASSERTIONS=ON \
       -D LLVM_BUILD_EXAMPLES=ON \
       -D COMPILER_RT_BUILD_LIBFUZZER=OFF \
-      -D LLVM_LIT_ARGS="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --timeout=1200 --time-tests" \
+      -D LLVM_LIT_ARGS="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --use-unique-output-file-name --timeout=1200 --time-tests" \
       -D COMPILER_RT_BUILD_ORC=OFF \
       -D CMAKE_C_COMPILER_LAUNCHER=sccache \
       -D CMAKE_CXX_COMPILER_LAUNCHER=sccache \
diff --git a/clang/README.md b/clang/README.md
index b98182d8a3f684..94b1e1a7a07433 100644
--- a/clang/README.md
+++ b/clang/README.md
@@ -23,3 +23,5 @@ If you're interested in more (including how to build Clang) it is best to read t
 * If you find a bug in Clang, please file it in the LLVM bug tracker:
   
     https://github.com/llvm/llvm-project/issues
+
+test change
\ No newline at end of file
diff --git a/clang/test/Driver/cl-pch.c b/clang/test/Driver/cl-pch.c
index 36d83a11242dc6..1a9527458ebf4f 100644
--- a/clang/test/Driver/cl-pch.c
+++ b/clang/test/Driver/cl-pch.c
@@ -25,7 +25,7 @@
 // CHECK-YCTP-SAME: -o
 // CHECK-YCTP-SAME: pchfile.pch
 // CHECK-YCTP-SAME: -x
-// CHECK-YCTP-SAME: c++-header
+// CHECK-YCTP-SAME: c++-header -- this will fail tests on windows!
 
 // Except if a later /TC changes it back.
 // RUN: %clang_cl -Werror /Yc%S/Inputs/pchfile.h /FI%S/Inputs/pchfile.h /c /Fo%t/pchfile.obj /Fp%t/pchfile.pch -v -- %s 2>&1 \
diff --git a/llvm/README.txt b/llvm/README.txt
index b9b71a3b6daff1..ba60b8ffdd072c 100644
--- a/llvm/README.txt
+++ b/llvm/README.txt
@@ -15,3 +15,5 @@ documentation setup.
 
 If you are writing a package for LLVM, see docs/Packaging.rst for our
 suggestions.
+
+test change
\ No newline at end of file

llvmbot avatar Oct 24 '24 09:10 llvmbot

Example reports can be seen here - https://buildkite.com/llvm-project/github-pull-requests/builds/112660

(make sure to switch to "All" instead of "Failures")

DavidSpickett avatar Oct 24 '24 09:10 DavidSpickett

Trying to understand what this patch does: basically, it adds a summary when looking at the results of a BuildKite job?

Previously: (screenshot, 2024-10-24 13:00:24)

After this patch: (screenshot, 2024-10-24 13:00:47)

Basically, the sort of "plain text" report that gets added is the difference. Is that correct? Are there other differences (such as how test results get reported to Github actions)?

ldionne avatar Oct 24 '24 17:10 ldionne

Basically, the sort of "plain text" report that gets added is the difference. Is that correct?

Yes, there is also a Linux report there that shows that it succeeded, but you have to select "All" on the "All | Failures" button. That is maybe redundant, but folks like seeing green boxes; I know I do.

This overall report is much more useful than seeing only the last check-whatever that was run at the end of the log, which might not be the one that failed.

Are there other differences (such as how test results get reported to Github actions)?

The result sent to GitHub is unchanged. For example the result on this PR is a failure because one test failed on Windows, because of the deliberate failure I put in.

If we wanted to send the report as a comment on GitHub, we can certainly look into that as mentioned here https://discourse.llvm.org/t/using-plugins-in-buildkite-ci-that-require-docker/82701/6?u=davidspickett. The Buildkite specific parts of the script are easy to remove.

DavidSpickett avatar Oct 24 '24 17:10 DavidSpickett

Ack, thanks for the confirmation.

If we wanted to send the report as a comment on GitHub, we can certainly look into that as mentioned here https://discourse.llvm.org/t/using-plugins-in-buildkite-ci-that-require-docker/82701/6?u=davidspickett. The Buildkite specific parts of the script are easy to remove.

I'd rather not do that. Commenting on the PR should be done very rarely, especially for CI failures, since it clutters the PR and Github already has a builtin system for presenting CI failures.

Overall, I'm fine with this patch, however I wonder how much effort it is worth putting into the BuildKite infrastructure given that we plan on moving as much as possible to Github Actions. With Github actions, the .ci/generate-buildkite-pipeline-premerge script would probably go away entirely in favour of individual workflows or something along those lines, and then the issue of figuring out what failed would probably be a non-issue. I am especially unsure about adding functionality to fundamental utilities like Lit in that context: if we e.g. end up not needing to output test results in individual files a few weeks/months from now, then there may not be any users of --use-unique-output-file-name anymore and then we'll have increased the complexity of Lit for a temporary benefit. Just my .02. I really appreciate efforts like this to make the CI infra easier to use, I just wonder if the specific approach taken by this patch series is in line with the near future direction of the project w.r.t. CI.

ldionne avatar Oct 24 '24 17:10 ldionne

I'd rather not do that. Commenting on the PR should be done very rarely, especially for CI failures, since it clutters the PR and Github already has a builtin system for presenting CI failures.

I think this is something that needs to be discussed further when we actually get there. Comments, I think, can still be useful in certain circumstances, and I definitely agree that adding a comment for every push would clutter things, but that's not the only way to do things.

Overall, I'm fine with this patch, however I wonder how much effort it is worth putting into the BuildKite infrastructure given that we plan on moving as much as possible to Github Actions.

The script is already written and works, and there still isn't a concrete timeline for when we are going to move things over to GitHub Actions. This solves a valid complaint I hear somewhat often. For libc++ at least, from what I understand, there are definitely platforms that will not be able to move over to GitHub precommit, and something like this might still be useful for them.

With Github actions, the .ci/generate-buildkite-pipeline-premerge script would probably go away entirely in favour of individual workflows or something along those lines, and then the issue of figuring out what failed would probably be a non-issue. I am especially unsure about adding functionality to fundamental utilities like Lit in that context: if we e.g. end up not needing to output test results in individual files a few weeks/months from now, then there may not be any users of --use-unique-output-file-name anymore and then we'll have increased the complexity of Lit for a temporary benefit.

Do we make any guarantees about supporting lit flags in the future? It's mostly an internal tool, and I would personally be in support of removing flags that are no longer used in tree, unless there are some extremely compelling downstream use cases and there is an appropriate maintenance story.

boomanaiden154 avatar Oct 24 '24 17:10 boomanaiden154

The script is already written and works, and there still isn't a concrete timeline for when we are going to move things over to Github actions. This solves a valid complaint I hear somewhat often. For libc++ at least, there are definitely platforms that will not be able to move over to Github precommit where something like this might still be useful from what I understand.

True, however for libc++ the benefits are smaller because we run only the libc++-specific job in BuildKite. I've never had trouble figuring out what part of a job has caused failures. Either way, like I said, I'm fine with the patch; it seems like an improvement.

Do we make any guarantees about supporting lit flags in the future? It's mostly an internal tool, and I would personally be in support of removing flags that are no longer used in tree, unless there are some extremely compelling downstream use cases and there is an appropriate maintenance story.

No, but someone has to remember and actually take the time to do the cleanup. I also 100% support the removal of unused options and I've done things like that in the past, but it doesn't happen unless someone decides to make it their task.

ldionne avatar Oct 24 '24 18:10 ldionne

True, however for libc++ the benefits are smaller because we run only the libc++-specific job in BuildKite. I've never had trouble figuring out what part of a job has caused failures. Either way, like I said, I'm fine with the patch; it seems like an improvement.

Ah, okay. That would make sense.

No, but someone has to remember and actually take the time to do the cleanup. I also 100% support the removal of unused options and I've done things like that in the past, but it doesn't happen unless someone decides to make it their task.

That's true. It is definitely tech debt when things like that are suddenly not used, but it is theoretically not too difficult to pull out at that point.

boomanaiden154 avatar Oct 24 '24 20:10 boomanaiden154

Do we make any guarantees about supporting lit flags in the future? It's mostly an internal tool, and I would personally be in support of removing flags that are no longer used in tree, unless there are some extremely compelling downstream use cases and there is an appropriate maintenance story.

Overall, I'm fine with this patch, however I wonder how much effort it is worth putting into the BuildKite infrastructure given that we plan on moving as much as possible to Github Actions...

Louis' point is a good one given that lit is released externally (https://pypi.org/project/lit/). We probably don't have a contract as such there, but if we don't add anything, there's no chance of annoying anyone when we remove it. The next release would be based on the 19 branch, so there is still time to remove the feature I added.

The alternative is to go back to changing the build scripts to do:

status=0
for target in ${targets}; do
  ninja -C "${BUILD_DIR}" "${target}" || status=1
  # Move the results so the next target does not overwrite them.
  mv "${BUILD_DIR}/test-results.xml" "${BUILD_DIR}/test-results-${target}.xml"
done
exit ${status}

That does not require a lit feature but is more invasive. If we're going to throw away those scripts anyway, perhaps folks will be ok with a temporary increase in complexity.

I agree that a future GitHub actions pipeline should be splitting check targets into steps anyway, effectively doing the for loop I showed above, but using the pipeline structure. And if the folks doing the transition don't want to worry about that on top of everything else I'm sure they have to do, I will commit my time to doing the extra work required for that.

If that means we get useful reporting of test results by default and we can delete every change I've made towards this so far, I'll be very happy!

I will make another version of this PR that modifies the build scripts instead of using the lit feature. If that PR is preferable, I will revert the lit feature.

DavidSpickett avatar Oct 25 '24 07:10 DavidSpickett

The version that does not use the lit feature: https://github.com/llvm/llvm-project/pull/113660

It's not complete but it's good enough to compare approaches I think.

Another, probably objectionable, choice is to write a wrapper script that replaces lit, looks for the result filename argument, runs the original lit, then moves the file to a new location. Even then we would have to ninja the targets first so that the lit we replace is the complete one. So it's probably even more change to the build scripts to split building and checking.
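
For concreteness, such a wrapper might look roughly like the sketch below. This is hypothetical: the location of the real lit and the timestamp-based naming are placeholders, and it is not the code from https://github.com/llvm/llvm-project/pull/113896.

#!/usr/bin/env python3
# Hypothetical wrapper sketch: forward all arguments to the real lit, then move
# the --xunit-xml-output file to a unique name so later runs cannot overwrite it.
import subprocess
import sys
import time
from pathlib import Path

REAL_LIT = "/path/to/the/real/lit.py"  # placeholder for wherever the original lit lives

args = sys.argv[1:]
result = subprocess.run([sys.executable, REAL_LIT] + args)

# Find the value passed to --xunit-xml-output, if any.
xml_path = None
for i, arg in enumerate(args):
    if arg == "--xunit-xml-output" and i + 1 < len(args):
        xml_path = Path(args[i + 1])
    elif arg.startswith("--xunit-xml-output="):
        xml_path = Path(arg.split("=", 1)[1])

# Move the results to a unique name, e.g. test-results-1730000000000.xml.
if xml_path is not None and xml_path.exists():
    unique = xml_path.with_name(f"{xml_path.stem}-{int(time.time() * 1000)}{xml_path.suffix}")
    xml_path.rename(unique)

sys.exit(result.returncode)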

I wondered about some sort of virtual file where we could accumulate versions of it but doubted this would work on Windows if there were such a thing. Maybe we could give lit the path to some kind of special Unix pipe and have a listener pull data from that so each EOF makes a new file?

Edit: It was suggested to me that since I can split the output in the report generator, I could use tail -f --retry <results file name> to get all the data each lit writes out. I will try this out next week.

DavidSpickett avatar Oct 25 '24 13:10 DavidSpickett

I tried a few more things, so I will summarise everything up until this point. There are two layers here:

  1. Making test reports.
  2. Producing the XML test results.

Using the buildkite plugin is not an option (https://github.com/llvm/llvm-project/pull/113290) because we'd need docker or a fork of the plugin. Our own test reporting script is the only way to generate the reports. This decides #1, unless we want to leave it as is until the move to GitHub actions.

For #2 the following ideas don't work:

  • tail --follow --retry results_file > combined_results_file, then splitting it up in the reporting script

    • Does not work because tail does not print the whole file on truncation. We would miss data.
  • mkfifo and a listener script to write it out to unique files (https://github.com/llvm/llvm-project/pull/113703)

    • Works great on Linux but -
    • mkfifo fails silently in bash for Windows. There are no plans to fix that any time soon.
    • I could try https://learn.microsoft.com/en-us/windows/win32/ipc/named-pipes, but I suspect this would be even more code, and folks are already wary of adding too much to this system as it is.
  • Using ninja check-all

    • Does not allow us to disable testing of a project while still building it.
    • Does not help with runtimes tests that are separate builds.

These could work but have tradeoffs:

  • Using the lit feature --unique-output... (this PR)

    • Built into lit (for now).
    • Minimal changes to scripts and pipeline.
      • New arguments for LLVM_LIT_ARGS.
      • Run the report generator on exit.
    • Risk of exposing an option publicly that we no longer need when the pipeline is re-written.
      • I could RFC that option and drum up interest in it from other users and external projects, if that would help. I have not done so up to now because it might have looked like I was trying to bias this discussion.
  • Changing the build scripts to ninja target mv result_file unique_name in a loop (https://github.com/llvm/llvm-project/pull/113660)

    • The most script changes of any solution.
    • No changes to report generator.
    • Small bonus - result files are named after their check target (though I wonder if some check targets run lit multiple times, in which case this is actually a bug not a bonus).
  • Wrapping lit in a Python script that will catch the output file name argument, run lit, then move the file to a unique name (https://github.com/llvm/llvm-project/pull/113896)

    • It works, albeit in a fiddly way, because of differences between Windows and Linux over whether the script name needs a .py extension.
    • We have to re-wrap lit every time cmake is run (what I have implemented) or -
    • We configure in another directory, patch that lit, and configure with -DLLVM_EXTERNAL_LIT=<path to the other lit> (which I have yet to test)
    • We're not using the actual lit, though the chances of a regression because of that are pretty low. ninja check-lit does still work, so though we aren't testing what we ship, it's pretty close.

Are any of these solutions acceptable? Is there anything else I could do to address your concerns with any of these?

If folks think it's just too close to a move to GitHub Actions and a full pipeline rewrite (which I fully support doing), then I will shelve all this. However, personally I'd like better test reporting even if it is only for 2-3 months until the new system appears.

DavidSpickett avatar Oct 28 '24 16:10 DavidSpickett

I also looked through GitHub code search to see if anyone else was hitting this same issue.

ROCm avoids it by splitting each check target into its own step: https://github.com/ROCm/ROCm/blob/bce439ecacc86b7f48934f5cc6be7ce2044412cf/.azuredevops/components/llvm-project.yml#L87

As I'm sure we will in a future pipeline.

The DirectX Shader Compiler uses one file, but with check-all instead: https://github.com/microsoft/DirectXShaderCompiler/blob/6a7eae1bbe36d24471cb542fda12ec6a69f49572/azure-pipelines.yml#L116

Microsoft STL has a failure reporting script which has a subset of the output my proposed script produces: https://github.com/microsoft/STL/blob/ca1553d38480b1224cb5b36ebd755af17aec275b/tools/scripts/print_failures.py

At one point the CHERI project called lit directly with all the test paths to avoid overwriting the results: https://github.com/SchrodingerZhu/cheribuild/commit/36dbfbfb228afde8c2f0007575fe5f582fa4a652

In another place they used to use the ninja / mv / ninja / mv ... strategy: https://github.com/CTSRD-CHERI/llvm-project/commit/cd8cdda1ede103d8cc9e0a2696e90cac429a7bc5#diff-8d76cd41051bde3e75bbe95280fde32df0252c12ee3e01b707a7c31c8f641320R32

The rest of the instances were Swift which seems to split checks into steps.

So there is evidence that this has been a problem, but that the solution has been to refactor the pipeline itself. This gives you finer control over what builds, and you can have a report per step.

This supports reverting the lit option and choosing either ninja / mv or the lit wrapper script (assuming we don't shelve this until the GitHub Actions move).

DavidSpickett avatar Oct 29 '24 09:10 DavidSpickett

FWIW, I didn't mean to derail this effort or create significant additional complexity by saying what I said above. My intention was only to point out the tradeoff of adding new functionality for a temporary benefit, which wasn't clear to me had been thought about in those terms.

I'd also be fine with proceeding with this approach as-is and then cleaning up the Lit flag in the future if we don't need it and if we think it won't break people (which is unlikely since that's a fairly niche flag). But @DavidSpickett you seem to have a firm understanding of what needs to happen and the various constraints, and you're clearly cognizant about adding tech debt, so at this point I'm personally very comfortable with deferring to your judgement for what's the best way to move forward.

ldionne avatar Oct 29 '24 14:10 ldionne

FWIW, I didn't mean to derail this effort or create significant additional complexity by saying what I said above. My intention was only to point out the tradeoff of adding new functionality for a temporary benefit, which wasn't clear to me had been thought about in those terms.

It didn't derail it; in fact it was good you brought it up, because I was too hasty landing that part. I could and should have left it approved and included it as part of larger PRs like this before making any decisions. I don't consider working on any of the alternatives to be wasted time; I needed to think through them anyway.

Not adding the lit option avoids any dependency on the GitHub Actions move, which would be when we could remove the option. We could take the risk there, but we have a lot of holidays coming up, so who knows whether we'll get there before the 20 branch.

We configure in another directory, patch that lit, and configure with -DLLVM_EXTERNAL_LIT=<path to the other lit> (which I have yet to test)

So I think this is the best of the remaining methods to do this.

Or the one that just occurred to me, which is hiding the lit option: https://docs.python.org/3/library/argparse.html#help

(as usual, I'm not sure why I didn't think of this earlier)
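
For illustration, this is the argparse mechanism that link describes, as a minimal standalone sketch (not lit's actual option definition):

import argparse

parser = argparse.ArgumentParser()
# help=argparse.SUPPRESS keeps the option out of --help output, so it still
# works but is not advertised as a supported public flag.
parser.add_argument(
    "--use-unique-output-file-name",
    action="store_true",
    help=argparse.SUPPRESS,
)

args = parser.parse_args(["--use-unique-output-file-name"])
print(args.use_unique_output_file_name)  # True, but the flag is absent from --help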

DavidSpickett avatar Oct 29 '24 15:10 DavidSpickett

I have updated this PR to include marking the option as hidden. If folks would be ok with that approach, I will break this up into individual parts for review and landing separately.

If not, I will work more on the lit wrapper approach.

@boomanaiden154 what do you think of those plans?

DavidSpickett avatar Oct 29 '24 15:10 DavidSpickett

I've split up the code:

  • Hide the lit option - https://github.com/llvm/llvm-project/pull/114812
  • Write results to unique file names - https://github.com/llvm/llvm-project/pull/113160
  • This PR, which adds the reporting script.

DavidSpickett avatar Nov 04 '24 15:11 DavidSpickett

Apologies, I merged this without an approval; I got mixed up with the other 2 patches in the stack. But I think we had a decent consensus anyway? It's an easy revert if you feel that I'm jumping ahead too far.

The only outstanding comment on the reporting script was the testing of it, but I think we're ok with using unittest for now since it is temporary.

Again, I hope a future system makes all this unnecessary, so I look forward to deleting this and putting my efforts into that instead. Thanks for all the feedback; it was really good to have to think it all through thoroughly.

DavidSpickett avatar Nov 12 '24 13:11 DavidSpickett

Since this relanded, I've been getting pre-commit breakages on a libc change: https://buildkite.com/llvm-project/github-pull-requests/builds/120029#01933169-a0ae-443c-9de3-0827c512a223

I can't find anything in the logs that indicates that a build or test failure occurred. Does anything jump out to you as to why this might be happening?

It also looks like a coworker's libc change went through without issue: https://buildkite.com/llvm-project/github-pull-requests/builds/119274#_

+ runtimes=
+ runtime_targets=
+ [[ '' != '' ]]
+ at-exit
+ mkdir -p artifacts
+ ccache --print-stats
+ shopt -s nullglob
+ python3 /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-5wwkv-1/llvm-project/github-pull-requests/.ci/generate_test_report.py ':linux: Linux x64 Test Results' linux-x64-test-results
Traceback (most recent call last):
  File "/var/lib/buildkite-agent/builds/linux-56-59b8f5d88-5wwkv-1/llvm-project/github-pull-requests/.ci/generate_test_report.py", line 406, in <module>
    p = subprocess.Popen(
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1796, in _execute_child
    self.pid = _posixsubprocess.fork_exec(
TypeError: expected str, bytes or os.PathLike object, not NoneType
🚨 Error: The command exited with status 1
user command error: exit status 1

mysterymath avatar Nov 15 '24 20:11 mysterymath

Fixed by https://github.com/llvm/llvm-project/commit/6a12b43ac00096976a886bd6d3a1b70a804d09ca.
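
Going by the traceback, the likely mechanism was that no test-results*.xml files were found, so the report generator returned a None style, and passing None inside the argument list raises exactly this TypeError before the agent even runs. A minimal illustration of that failure mode (not the actual fix):

import subprocess

style = None  # what the report generator returns when there are no result files
try:
    # "echo" stands in for buildkite-agent here; any None in the argument list
    # fails during argument encoding, before the command is executed.
    subprocess.check_call(["echo", "annotate", "--style", style])
except TypeError as err:
    print(err)  # expected str, bytes or os.PathLike object, not NoneType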

DavidSpickett avatar Nov 18 '24 09:11 DavidSpickett

It also looks like a coworker's libc change went through without issue:

This build must have been done before the first land or while this was reverted; I don't see it attempting to use the script in the log.

DavidSpickett avatar Nov 18 '24 09:11 DavidSpickett

Thanks for the quick fix; my PR is green now!

mysterymath avatar Nov 18 '24 20:11 mysterymath