rules_scala
rules_scala copied to clipboard
Scalafmt fails with Build without the bytes
When running a target like this:
# Important to clean, as the files can stay from a previous bazel build with a lower Bazel version.
bazel clean
bazel run //utils/smth:library.format-test
I receive:
utils/smth/src/main/scala/com/dummy/smth/Smth.scala
diff: /home/gege/.cache/bazel/_bazel_user/7f83a0b1f0940f88aec1f4530f927169/execroot/repo/bazel-out/k8-fastbuild/bin/utils/smth/utils/smth/src/main/scala/com/dummy/smth/Smth.scala.fmt.output: No such file or directory
It seems the action responsible for creating this file executes (I have tested disabling writing the output there, but then obviously it fails due to not generating an output), but at a later step the action running scalafmt from the manifest cannot see the scalafmt outputs.
One difference I can see in the bazel-bin folder between Bazel 6.4.0 and Bazel 7.0rc2 (after running bazel run //utils/smth:library.format-test
):
# 6.4.0:
ls bazel-bin/utils/smth/utils/smth/src/main/scala/com/dummy/smth/
Smth.scala.fmt.output
# 7.0rc2:
ls bazel-bin/utils/smth/utils/smth/src/main/scala/com/dummy/smth/
[empty]
Is this a Bazel bug or a rules_scala one?
Added reproduction in https://github.com/gergelyfabian/bazel-scala-example/ (master branch):
# Change .bazelversion to 7.0.0rc2 and then run:
bazel run //example-maven:example-maven.format-test
Checking whether this is a bazel issue:
export BAZELISK_CLEAN=true
bazel --bisect=6.4.0..release-7.0.0rc2 run //example-maven:example-maven.format-test
Checking whether this is a bazel issue:
export BAZELISK_CLEAN=true bazel --bisect=6.4.0..release-7.0.0rc2 run //example-maven:example-maven.format-test
Trick is that I had to turn off using a java toolchain and remotejdk_17 first, as remotejdk_17 does not work for some commits as bisecting.
--- Bisect Result
first bad commit is https://github.com/bazelbuild/bazel/commit/9c96529fca4a135c162e35ce2882834b879438fb
That is:
Build without the bytes by default
by changing the default value of `--remote_download_outputs` to `toplevel`.
Thank you, bazelisk, indeed this seems to be the answer. This works:
bazel run --remote_download_all //example-maven:example-maven.format-test
Forming the question again, scalafmt internal outputs not working with "build without the bytes" is a bug of Bazel or rules_scala?
Starting from Bazel 7, the default download mode will be changed from --remote_download_all to --remote_download_toplevel. That is, Bazel will no longer download intermediate outputs of remote actions into the local output base. In return, your remote builds will be faster.
"Using a remote cache or even a disk cache will make actions "remote" from this point of view." (thanks @fmeum)
In my case, disk cache is enabled.
The information I received on the Bazel slack was that if rules_scala emitted correct information (DefaultInfo(runfiles = ...)
) for the outputs it generates (the .fmt.output
files), then those should be visible also when using Build without the bytes.
Unfortunately I cannot find yet whether the scalafmt phase is generating proper DefaultInfo.
The information I received on the Bazel slack was that if rules_scala emitted correct information (
DefaultInfo(runfiles = ...)
) for the outputs it generates (the.fmt.output
files), then those should be visible also when using Build without the bytes. Unfortunately I cannot find yet whether the scalafmt phase is generating proper DefaultInfo.
@gergelyfabian could you check if anything might be missing here: https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/phase_scalafmt.bzl#L18?
The current state where I'm at is this. phase_scalafmt
seems to be adding the proper information (on the line you quoted):
# Return a depset containing all the relevant files, so a wrapping `sh_test` can successfully access them.
return struct(runfiles = depset([manifest] + files + srcs))
Then, we go to the phase_default_info, and runfiles properly includes the .fmt.output
files there (added a print before that):
https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/phase_default_info.bzl#L46
But, when api.bzl runs (https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/api.bzl#L85) it does not get the DefaultInfo
that phase_default_info was supposed to return to it. I don't get why.
Debug output like this:
DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/phase_default_info.bzl:38:10: Returning DefaultInfo with runfiles: depset([<generated file example-maven/example-maven.jar>, <generated file external/maven/com/twitter/algebird-core_2.13/0.13.7/processed_algebird-core_2.13-0.13.7.jar>, <generated file external/maven/com/googlecode/javaewah/JavaEWAH/1.1.7/processed_JavaEWAH-1.1.7.jar>, <generated file external/maven/org/scala-lang/modules/scala-collection-compat_2.13/2.1.6/processed_scala-collection-compat_2.13-2.1.6.jar>, <generated file external/maven/org/scala-lang/scala-library/2.13.12/processed_scala-library-2.13.12.jar>, <generated file external/maven/org/scala-lang/scala-reflect/2.13.12/processed_scala-reflect-2.13.12.jar>, <generated file external/maven/org/typelevel/algebra_2.13/2.0.0/processed_algebra_2.13-2.0.0.jar>, <generated file external/maven/org/typelevel/cats-kernel_2.13/2.0.0/processed_cats-kernel_2.13-2.0.0.jar>, <generated file example-maven/format/example-maven/manifest.txt>, <generated file example-maven/example-maven/src/main/scala/mypackage/Maven.scala.fmt.output>, <source file example-maven/src/main/scala/mypackage/Maven.scala>])
DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/api.bzl:85:22: Detected external providers: {"DefaultInfo": struct(data_runfiles = None, default_runfiles = <unknown object com.google.devtools.build.lib.analysis.Runfiles>, files = depset([<generated file example-maven/example-maven.jar>]), files_to_run = None)}
Debugging code is here: https://github.com/gergelyfabian/rules_scala/tree/bazel-7.0-scalafmt-bug
Btw. the following command also fails for Bazel 6.4.0: bazel run --remote_download_toplevel //example-maven:example-maven.format-test
Maybe the necessary information is lost here:
print("DefaultInfo with runfiles: %s" % depset(direct = direct, transitive = runfiles))
print("DefaultInfo runfiles after wrapping: %s" % ctx.runfiles(transitive_files = depset(direct = direct, transitive = runfiles), collect_data = True))
Added these debug lines just before the return statement in phase_default_info.bzl.
Prints:
DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/phase_default_info.bzl:38:10: DefaultInfo with runfiles: depset([<generated file example-maven/example-maven.jar>, <generated file external/maven/com/twitter/algebird-core_2.13/0.13.7/processed_algebird-core_2.13-0.13.7.jar>, <generated file external/maven/com/googlecode/javaewah/JavaEWAH/1.1.7/processed_JavaEWAH-1.1.7.jar>, <generated file external/maven/org/scala-lang/modules/scala-collection-compat_2.13/2.1.6/processed_scala-collection-compat_2.13-2.1.6.jar>, <generated file external/maven/org/scala-lang/scala-library/2.13.12/processed_scala-library-2.13.12.jar>, <generated file external/maven/org/scala-lang/scala-reflect/2.13.12/processed_scala-reflect-2.13.12.jar>, <generated file external/maven/org/typelevel/algebra_2.13/2.0.0/processed_algebra_2.13-2.0.0.jar>, <generated file external/maven/org/typelevel/cats-kernel_2.13/2.0.0/processed_cats-kernel_2.13-2.0.0.jar>, <generated file example-maven/format/example-maven/manifest.txt>, <generated file example-maven/example-maven/src/main/scala/mypackage/Maven.scala.fmt.output>, <source file example-maven/src/main/scala/mypackage/Maven.scala>])
DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/phase_default_info.bzl:39:10: DefaultInfo runfiles after wrapping: <unknown object com.google.devtools.build.lib.analysis.Runfiles>
I'm wondering whether collect_data
does what we expect it to do, especially if I read the comment:
https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/phase_default_info.bzl#L43-L46
I've added some more debugging and I think api.bzl is returning proper DefaultInfo for the rules.
Having https://github.com/bazelbuild/rules_scala/blob/master/scala/private/phases/api.bzl#L91 (that sets up the final rule output):
I have added a debug line just before that:
print("DefaultInfo before the end: %s" % acculmulated_external_providers["DefaultInfo"].default_runfiles.files)
This prints for me:
DEBUG: /home/user/.cache/bazel/_bazel_user/64527517dacf7170f7d914889f83481c/external/io_bazel_rules_scala/scala/private/phases/api.bzl:93:10: DefaultInfo before the end: depset([<generated file example-maven/example-maven.jar>, <generated file external/maven/com/twitter/algebird-core_2.13/0.13.7/processed_algebird-core_2.13-0.13.7.jar>, <generated file external/maven/com/googlecode/javaewah/JavaEWAH/1.1.7/processed_JavaEWAH-1.1.7.jar>, <generated file external/maven/org/scala-lang/modules/scala-collection-compat_2.13/2.1.6/processed_scala-collection-compat_2.13-2.1.6.jar>, <generated file external/maven/org/scala-lang/scala-library/2.13.12/processed_scala-library-2.13.12.jar>, <generated file external/maven/org/scala-lang/scala-reflect/2.13.12/processed_scala-reflect-2.13.12.jar>, <generated file external/maven/org/typelevel/algebra_2.13/2.0.0/processed_algebra_2.13-2.0.0.jar>, <generated file external/maven/org/typelevel/cats-kernel_2.13/2.0.0/processed_cats-kernel_2.13-2.0.0.jar>, <generated file example-maven/format/example-maven/manifest.txt>, <generated file example-maven/example-maven/src/main/scala/mypackage/Maven.scala.fmt.output>, <source file example-maven/src/main/scala/mypackage/Maven.scala>], order = "postorder")
So, this includes the files that I think should be there, specifically: Maven.scala.fmt.output
So I think the rule definition returns a DefaultInfo that contains what is necessary. Even though it's under default_runfiles.files. When setting up ctx.runfiles (that ends up in DefaultInfo) I tried adding these files into ctx.runfiles(files = ...) , but still it did not help, in either way those files end up in .default_runfiles.files
Is the DefaultInfo constructed by rules_scala faulty, or it's a Bazel bug?
Updated debugging code in https://github.com/gergelyfabian/rules_scala/tree/bazel-7.0-scalafmt-bug.
Opened Bazel bug: https://github.com/bazelbuild/bazel/issues/20099
Quoting a response from the bazel issue:
There are a couple of odd things going on here, and I'm not sure that any of them are Bazel's fault.
It appears that you're invoking
bazel run
on a generated file, not an executable rule:$ bazel query --output=label_kind //example-maven:example-maven.format-test generated file //example-maven:example-maven.format-test
I don't know why
bazel run
allows a plain file as an argument, but it seems reasonable to me that running a file won't give access to the runfiles; files have no associated runfiles, rules do.That brings us to another problem, which is that
scala_library
isn't an executable rule, so it's not possible tobazel run
it either:$ bazel query --output=label_kind //example-maven:example-maven scala_library rule //example-maven:example-maven $ bazel run //example-maven:example-maven ERROR: Cannot run target //example-maven:example-maven: Not executable
So my suggestion would be to (1) mark
scala_library
as an executable rule, (2) return the formatter in theDefaultInfo.executable
field alongside the runfiles, (3) make sure the formatter looks for the runfiles in the right location (perhaps by using a runfiles library for your implementation language), and finally (4)bazel run
thescala_library
directly.As to why you're only running into this now: as it currently stands, your
bazel run
command is effectively non-hermetic, butbazel run
(unlike build actions) doesn't provide a safeguard against non-hermeticity; if Bazel downloads all build outputs, your program will find them anyway, but Bazel has no idea that the dependency is there, and won't realize that they must be downloaded when building without the bytes.
Originally posted by @tjgq in https://github.com/bazelbuild/bazel/issues/20099#issuecomment-1802513312
Hey @gergelyfabian I'm curious if you've tried this with https://github.com/aspect-build/rules_lint We support scalafmt and shouldn't have any of these bugs that Tiago pointed out there.
I experienced some issues when trying to apply the latest version of rules_lint, and also it seems it's ~3 times slower on our monorepo than running rules_scala's scalafmt (with parallelized targets), so I think we'll be staying with that.