Can no longer run XLA lit tests

Open trevor-m opened this issue 2 years ago • 5 comments

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

TF 2.15

Custom code

No

Current behavior?

Previously, I could run the XLA unit tests via `bazel test //tensorflow/compiler/xla/...:all`. However, in TF 2.15, after XLA was moved to `third_party/xla`, I am encountering issues. I updated my command to `bazel test @local_xla//xla/...:all`. While most tests run successfully, some hardcoded paths appear to prevent the LLVM lit tests from running correctly (see the logs below). The lit configs probably need to be updated.

Standalone code to reproduce the issue

Check out TensorFlow, run the configure step, then run `bazel test @local_xla//xla/...:all`.

Relevant log output

================================================================================
FAIL: @local_xla//xla/mlir/backends/gpu/transforms/tests:gpu_memcpy.mlir.test (see /root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/external/local_xla/xla/mlir/backends/gpu/transforms/tests/gpu_memcpy.mlir.test/test.log)
[27,548 / 27,604] 348 / 447 tests, 216 failed; [Sched] Testing @local_xla//xla/mlir_hlo/tests:Dialect/mhlo/hlo-collapse-elementwise-map.mlir.test; 45s ... (55 actions, 2 running)
INFO: From Testing @local_xla//xla/mlir/backends/gpu/transforms/tests:gpu_memcpy.mlir.test:
==================== Test output for @local_xla//xla/mlir/backends/gpu/transforms/tests:gpu_memcpy.mlir.test:
Running test /root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/execroot/org_tensorflow/bazel-out/k8-opt/bin/external/local_xla/xla/mlir/backends/gpu/transforms/tests/gpu_memcpy.mlir.test.runfiles/org_tensorflow/../local_xla/xla/mlir/backends/gpu/transforms/tests/gpu_memcpy.mlir.test xla/gpu_memcpy.mlir --config-prefix=runlit -v on GPU 0
lit.py: /root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/external/llvm-raw/llvm/utils/lit/lit/discovery.py:137: warning: unable to find test suite for 'xla/gpu_memcpy.mlir'
lit.py: /root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/external/llvm-raw/llvm/utils/lit/lit/discovery.py:276: warning: input 'xla/gpu_memcpy.mlir' contained no tests
error: did not discover any tests for provided path(s)
================================================================================
FAIL: @local_xla//xla/mlir_hlo/tests:Dialect/mhlo/hlo-collapse-elementwise-map.mlir.test (see /root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/execroot/org_tensorflow/bazel-out/k8-opt/testlogs/external/local_xla/xla/mlir_hlo/tests/Dialect/mhlo/hlo-collapse-elementwise-map.mlir.test/test.log)
INFO: From Testing @local_xla//xla/mlir_hlo/tests:Dialect/mhlo/hlo-collapse-elementwise-map.mlir.test:
==================== Test output for @local_xla//xla/mlir_hlo/tests:Dialect/mhlo/hlo-collapse-elementwise-map.mlir.test:
Running test /root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/execroot/org_tensorflow/bazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/tests/Dialect/mhlo/hlo-collapse-elementwise-map.mlir.test.runfiles/org_tensorflow/../local_xla/xla/mlir_hlo/tests/Dialect/mhlo/hlo-collapse-elementwise-map.mlir.test -v external/local_xla/xla/mlir_hlo/tests/Dialect/mhlo/hlo-collapse-elementwise-map.mlir on GPU 0
lit.py: /root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/external/llvm-raw/llvm/utils/lit/lit/TestingConfig.py:151: fatal: unable to parse config file '/root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/execroot/org_tensorflow/bazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/tests/Dialect/mhlo/hlo-collapse-elementwise-map.mlir.test.runfiles/org_tensorflow/external/local_xla/xla/mlir_hlo/tests/lit.site.cfg.py', traceback: Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/external/llvm-raw/llvm/utils/lit/lit/TestingConfig.py", line 139, in load_from_path
    exec(compile(data, path, "exec"), cfg_globals, None)
  File "/root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/execroot/org_tensorflow/bazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/tests/Dialect/mhlo/hlo-collapse-elementwise-map.mlir.test.runfiles/org_tensorflow/external/local_xla/xla/mlir_hlo/tests/lit.site.cfg.py", line 44, in <module>
    lit_config.load_config(config, "xla/mlir_hlo/tests/lit.cfg.py")
  File "/root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/external/llvm-raw/llvm/utils/lit/lit/LitConfig.py", line 152, in load_config
    config.load_from_path(path, self)
  File "/root/.cache/bazel/_bazel_root/a8fc6d0749b4f3c43761726a36e8ec4c/external/llvm-raw/llvm/utils/lit/lit/TestingConfig.py", line 126, in load_from_path
    f = open(path)
FileNotFoundError: [Errno 2] No such file or directory: 'xla/mlir_hlo/tests/lit.cfg.py'

trevor-m avatar Jan 03 '24 22:01 trevor-m

@ddunl It seems the lit configs need to be updated in order to run the XLA tests via TF. I see a few hardcoded paths like this one: https://github.com/tensorflow/tensorflow/blob/fc347ca5597a0a2a58d4f0f344d1210afede2cc5/third_party/xla/xla/glob_lit_test.bzl#L54

trevor-m avatar Jan 03 '24 22:01 trevor-m

I see. I think I probably deleted the transformations that kept this working, since these tests are no longer run on CI from the TF side. I'll try to fix this in the next two weeks or so (I'll be on vacation for a bit soon, so I won't get to this as quickly as I normally would).

ddunl avatar Jan 04 '24 02:01 ddunl

> I see. I think I probably deleted the transformations that kept this working, since these tests are no longer run on CI from the TF side. I'll try to fix this in the next two weeks or so (I'll be on vacation for a bit soon, so I won't get to this as quickly as I normally would).

Thank you!

trevor-m avatar Jan 04 '24 16:01 trevor-m

Hi @ddunl, I was wondering whether you have had a chance to take a look at this issue yet. Thanks!

trevor-m avatar Jan 17 '24 21:01 trevor-m

@ddunl I managed to get these working with a combination of:

  1. My changes here: https://github.com/trevor-m/tensorflow/commit/c2fabfcb0e67df4f269483f61f1a443b853dded7
     A few paths need to be modified to reflect the TF runfiles structure: `MLIR_HLO_TOOLS_DIR`, used for the lit config template, and `XlaSrcRoot()`, used by the tests. Also, some string substitution appears to go awry in some of the .mlir files during the automated transfer from XLA to TF (Copybara?).
  2. This commit: https://github.com/tensorflow/tensorflow/commit/767225e0d1acdb2ac5f478baba9a158f7c4b5ea0
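
For readers following along, the path mismatch described in item 1 can be illustrated with a small, self-contained Python sketch. It is not taken from either commit: `resolve_in_runfiles` is a hypothetical helper, and the directory layout is an assumption modeled on the log above, where `lit.site.cfg.py` sits under `external/local_xla/` inside the runfiles tree while the config tries to open the workspace-relative path `xla/mlir_hlo/tests/lit.cfg.py`.

```python
import os
import tempfile


def resolve_in_runfiles(runfiles_root, rel_path, external_repo="local_xla"):
    """Hypothetical helper (not part of XLA, TF, or lit).

    Returns the first existing location of a workspace-relative path,
    trying the main-workspace layout first and the external-repository
    layout second. Returns None if neither exists.
    """
    candidates = [
        # Layout when XLA is the main workspace.
        os.path.join(runfiles_root, rel_path),
        # Layout when XLA is an external repo of TF (assumed from the log).
        os.path.join(runfiles_root, "external", external_repo, rel_path),
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    rel_path = "xla/mlir_hlo/tests/lit.cfg.py"
    with tempfile.TemporaryDirectory() as root:
        # Build a fake runfiles tree that mimics the TF layout from the log.
        real = os.path.join(root, "external", "local_xla", rel_path)
        os.makedirs(os.path.dirname(real))
        with open(real, "w") as f:
            f.write("# placeholder lit config\n")

        # The hardcoded workspace-relative path does not exist in this layout.
        plain_exists = os.path.isfile(os.path.join(root, rel_path))
        resolved = resolve_in_runfiles(root, rel_path)
        return plain_exists, os.path.relpath(resolved, root)


if __name__ == "__main__":
    plain_exists, resolved = demo()
    print("hardcoded path exists:", plain_exists)
    print("resolved to:", resolved)
```

In this mock layout the hardcoded path is missing and only the `external/local_xla/`-prefixed one resolves, which matches the `FileNotFoundError` in the log. The actual fix in the linked commits may differ in its details.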

trevor-m avatar Feb 02 '24 01:02 trevor-m