rules_scala
rules_scala copied to clipboard
FileAlreadyExistsException during proto compilation
Hello!
I've tried to use latest rules_scala version (8092d5f6165a8d9c4797d5f089c1ba4eee3326b1) and faced an interesting problem: during proto compilation of independent rules bazel if both are depending on some common protobuf (like timestamp.proto), bazel crashes with FileAlreadyExistsException. If you just rerun the previous command, everything will pass successfully. It looks like there is some kind of conflict (or just forgot to check that file already exists) during creation of _virtual_imports:
sholokhov$ bazel build //shared-libs/protocols:service_events_scala_proto
INFO: Invocation ID: 216a834e-0e37-4dac-8306-3e6f0f61386a
INFO: Analyzed target //shared-libs/protocols:service_events_scala_proto (96 packages loaded, 1058 targets configured).
INFO: Found 1 target...
INFO: From Linking external/com_google_protobuf/libprotobuf_lite.a [for host]:
/Library/Developer/CommandLineTools/usr/bin/libtool: file: bazel-out/host/bin/external/com_google_protobuf/_objs/protobuf_lite/io_win32.o has no symbols
INFO: From Linking external/com_google_protobuf/libprotobuf.a [for host]:
/Library/Developer/CommandLineTools/usr/bin/libtool: file: bazel-out/host/bin/external/com_google_protobuf/_objs/protobuf/error_listener.o has no symbols
INFO: From Building external/com_google_protobuf/libprotobuf_java.jar (122 source files, 1 source jar) [for host]:
warning: -parameters is not supported for target value 1.7. Use 1.8 or later.
INFO: From scala @io_bazel_rules_scala//src/scala/scripts:scalapb_generator_lib [for host]:
warning: there was one deprecation warning
warning: there were two deprecation warnings (since 2.11.0)
warning: there were three deprecation warnings in total; re-run with -deprecation for details
three warnings found
ERROR: /code/shared-libs/protocols/BUILD.bazel:34:1: creating scalapb files //shared-libs/protocols:iam_events_proto failed (Exit 1)
java.nio.file.FileAlreadyExistsException: external/com_google_protobuf/google/protobuf/timestamp.proto
at java.base/sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:573)
at java.base/sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:254)
at java.base/java.nio.file.Files.copy(Files.java:1294)
at scripts.ScalaPBGenerator.$anonfun$setupIncludedProto$1(ScalaPBGenerator.scala:42)
at scala.collection.immutable.List.foreach(List.scala:392)
at scripts.ScalaPBGenerator.setupIncludedProto(ScalaPBGenerator.scala:37)
at scripts.ScalaPBGenerator.processRequest(ScalaPBGenerator.scala:53)
at io.bazel.rulesscala.worker.GenericWorker.runPersistentWorker(GenericWorker.java:45)
at io.bazel.rulesscala.worker.GenericWorker.run(GenericWorker.java:111)
at scripts.ScalaPBWorker$.main(ScalaPBGenerator.scala:26)
at scripts.ScalaPBWorker.main(ScalaPBGenerator.scala)
Target //shared-libs/protocols:service_events_scala_proto failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 56.293s, Critical Path: 32.41s
INFO: 206 processes: 1 remote cache hit, 193 darwin-sandbox, 12 worker.
FAILED: Build did NOT complete successfully
# just rerun previous command
sholokhov$ bazel build //shared-libs/protocols:service_events_scala_proto
INFO: Invocation ID: 57ca90aa-87b8-445c-b170-98689f1aa259
INFO: Analyzed target //shared-libs/protocols:service_events_scala_proto (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //shared-libs/protocols:service_events_scala_proto up-to-date:
bazel-bin/shared-libs/protocols/billing_events_proto_scalapb.jar
bazel-bin/shared-libs/protocols/iam_events_proto_scalapb.jar
INFO: Elapsed time: 12.149s, Critical Path: 11.99s
INFO: 8 processes: 8 worker.
INFO: Build completed successfully, 9 total actions
Bazel 0.27.0. Just in case, dependency graph:

I have a quick fix for that problem, but I'm not 100% sure if it's a valid fix, because it basically fights against symptoms, not the root cause. But it works for me though:
Index: src/scala/scripts/ScalaPBGenerator.scala
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/scala/scripts/ScalaPBGenerator.scala (revision 8092d5f6165a8d9c4797d5f089c1ba4eee3326b1)
+++ src/scala/scripts/ScalaPBGenerator.scala (date 1562061165000)
@@ -1,7 +1,7 @@
package scripts
import java.io.PrintStream
-import java.nio.file.Path
+import java.nio.file.{Path, FileAlreadyExistsException}
import io.bazel.rulesscala.io_utils.DeleteRecursively
import io.bazel.rulesscala.jar.JarCreator
@@ -12,6 +12,7 @@
import java.nio.file.{Files, Paths}
import scalapb.{ScalaPBC, ScalaPbCodeGenerator, ScalaPbcException}
import java.net.URLClassLoader
+import scala.util.{Try, Failure}
object ScalaPBWorker extends GenericWorker(new ScalaPBGenerator) {
@@ -39,7 +40,13 @@
val relativePath = root.relativize(fullPath)
relativePath.toFile.getParentFile.mkdirs
- Files.copy(fullPath, relativePath)
+ Try(Files.copy(fullPath, relativePath)) match {
+ case Failure(err) if err.isInstanceOf[FileAlreadyExistsException] =>
+ Console.println(s"File already exists, skipping: ${err.getMessage}")
+ case Failure(err) =>
+ sys.error(err.getMessage)
+ case _ => ()
+ }
}
}
def deleteDir(path: Path): Unit =
Do you possibly have duplicate classes or protos in different targets somehow?
No, only timestamp and wrappers which are shared between two independent protos:
proto_library(
name = "A",
srcs = [ "event/A.proto" ],
deps = [
"@com_google_protobuf//:wrappers_proto",
"@com_google_protobuf//:timestamp_proto",
],
)
proto_library(
name = "B",
srcs = [ "event/B.proto" ],
deps = [
"@com_google_protobuf//:timestamp_proto",
"@com_google_protobuf//:wrappers_proto",
],
)
proto_library(
name = "service_events_proto",
visibility = [ "//visibility:public" ],
srcs = [ "event/ServiceEvent.proto" ],
deps = [
":A",
":B",
],
)
scalapb_proto_library(
name = "service_events_scala_proto",
visibility = ["//visibility:public"],
deps = [ ":service_events_proto" ],
)
Sorry for pushing you guys, but is there any thoughts or updates on it?
Now we have to use forked version of this rules with quick fix for the problem, but I would prefer to depend on official rules rather than supporting forked version.
Could you make a PR with a test that fails? That would be very helpful if we are going to fix it. Thank you for your help!
Cc @ianoc
I tried quickly to repo this and couldn't, so would love a PR showing the issue
After upgrading from bazel 0.26 to 0.27 this issue started to happen very frequently on our repo
I've prepared small example which reproduces this issue: https://github.com/sholokhov/bazel-rules-scala-issue
Just try to build everything from scratch (bazel build //...) and it fails with FileAlreadyExistsException.
Thanks for the repo case @sholokhov, its quite interesting -- so this was something related to a bazel change in 0.27, i tried 0.27 and it fails, pretty handily. But 0.26 (clean expunge + build) in 2 cycle attempts didn't fail. (Not very scientific, overall, but rebuilding proto takes a few mins so its not so much fun)
Question here will be seeing if the approach in the code is actually still safe now or if the virtual imports in 0.28(edit -- actually looks like this change is on master but not in 0.28) maybe make this no longer required at all. https://github.com/bazelbuild/bazel/issues/7157
It sounds like maybe actions here:
- try see if the change on bazel master would make this go away a) if so we should put a fix in until that release comes out, but that might be testing to see if target file exists, if it does asserting the contents are identical? , also removing any files added when we are done
- If it doesn't, maybe look closer to see if this is safe going forward. (concern is really if the changes around symlinks could mean its possible to write to the external shared folder and not some scratch space?)
@ianoc yay! Thanks for the breakdown. Is that something you're going to work on?
At some point i will, have a lot on this week and @sholokhov has a workaround that sounds like it'll survive for another few days anyway.
@ianoc ok, thank you! I'll switch our repo to point to his fork so I can get our bazel upgrade in then. Does that sound reasonable?
@ptarjan thats really an internal stripe discussion, but i don't believe generally speaking we'd want our main repos pointing to forks on the internet.
so, @sholokhov posted a work around:
https://github.com/bazelbuild/rules_scala/issues/779#issuecomment-507608794
@ptarjan can you make a PR with that diff and we can merge as a workaround and keep this issue open to root cause it?
Thanks @johnynek. Done here: https://github.com/bazelbuild/rules_scala/pull/787
Thanks @ptarjan ! Also @sholokhov can you tell the bot on the PR that you don't mind us pulling this in?
It's interesting that this fix seemed to still result in a similar error on windows:
https://travis-ci.org/bazelbuild/rules_scala/jobs/560109357#L310
ERROR: C:/users/travis/_bazel_travis/xsbyteln/external/com_google_protobuf/BUILD:284:2: Generating Descriptor Set proto_library @com_google_protobuf//:wrappers_proto failed (Exit 1): protoc.exe failed: error executing command
cd C:/users/travis/_bazel_travis/xsbyteln/execroot/io_bazel_rules_scala
SET PATH=C:\program files\git\usr\bin;C:\program files\git\bin;C:\program files\git\usr\bin;C:\program files\git\usr\bin;C:\tools\ruby25\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\ProgramData\GooGet;C:\Program Files\Google\Compute Engine\metadata_scripts;C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin;C:\Program Files\Google\Compute Engine\sysprep;C:\Program Files\Docker;C:\ProgramData\chocolatey\bin;C:\Program Files\CMake\bin;C:\Program Files\Git\cmd;C:\Program Files\LLVM\bin;C:\Program Files\dotnet;C:\Users\travis\AppData\Local\Microsoft\WindowsApps;C:\ProgramData\chocolatey\lib\mingw\tools\install\mingw64\bin
bazel-out/host/bin/external/com_google_protobuf/protoc.exe --proto_path=external/com_google_protobuf --descriptor_set_out=bazel-out/x64_windows-fastbuild/genfiles/external/com_google_protobuf/wrappers_proto-descriptor-set.proto.bin -Igoogle/protobuf/wrappers.proto=bazel-out/x64_windows-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/wrappers.proto --direct_dependencies google/protobuf/wrappers.proto --direct_dependencies_violation_msg=%s is imported, but @com_google_protobuf//:wrappers_proto doesn't directly depend on a proto_library that 'srcs' it. bazel-out/x64_windows-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/wrappers.proto
Execution platform: @bazel_tools//platforms:host_platform
bazel-out/x64_windows-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/wrappers.proto: Input is shadowed in the --proto_path by "external/com_google_protobuf/google/protobuf/wrappers.proto". Either use the latter file as your input or reorder the --proto_path so that the former file's location comes first.
so, there is some definite duplication somewhere.
We experiencing a variation of this on all our Windows machine (it runs just fine on MacOS/Linux). It's happens on a specific project, and this specific project is unique because it's the only project that use the strip_import_prefixoption in proto_library. Here's the stack-trace:
ERROR: D:/projects/depo/legacy/kit/protobuf/BUILD:78:1: scala //legacy/kit/protobuf:pill_scala failed (Exit 1)
java.nio.file.FileAlreadyExistsException: \kit\protobuf\pill.proto
at sun.nio.fs.WindowsFileCopy.copy(WindowsFileCopy.java:124)
at sun.nio.fs.WindowsFileSystemProvider.copy(WindowsFileSystemProvider.java:278)
at java.nio.file.Files.copy(Files.java:1274)
at io.bazel.rulesscala.scalac.ScalacProcessor.copyResources(ScalacProcessor.java:309)
at io.bazel.rulesscala.scalac.ScalacProcessor.processRequest(ScalacProcessor.java:73)
at io.bazel.rulesscala.worker.GenericWorker.runPersistentWorker(GenericWorker.java:45)
at io.bazel.rulesscala.worker.GenericWorker.run(GenericWorker.java:111)
at io.bazel.rulesscala.scalac.ScalaCInvoker.main(ScalaCInvoker.java:41)
INFO: Elapsed time: 1542.308s, Critical Path: 146.22s
INFO: 395 processes: 389 local, 6 worker.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
I tried to check for the workaround mentioned above - catching the FileAlreadyExistsException and handling it, but I couldn't find it. Was it removed deliberately? Or perhaps that's not a variation but a new bug?
I don’t know and honestly our windows support is not awesome since we don’t run our CI builds with it. We don’t run our CI builds with it because there’s no one to own the windows problems (we had windows CI for 6 months maybe and the flakiness and issues killed our velocity).
Can you take a look at git history to see when the workaround was removed? Was it intentional or by mistake?
Cc @simuons as someone who I think touched this area
The file no longer exists. I'm assuming it happened during the phases refactor. If you are find with the old workaround, I can try to find the new spot where this happens, and solve it the same way I was solved previously - by catching the Exception and continue if the file is already there.
But let's wait to hear @simuons opinion.
I think so but I’d really appreciate it if you could find the exact point where it was removed. I’d love to see if indeed it was an oversight and maybe ping whoever did it
On Sun, 24 May 2020 at 16:16 Amit Zarfati [email protected] wrote:
The file no longer exists. I'm assuming it happened during the phases refactor. If you are find with the old workaround, I can try to find the new spot where this happens, and solve it the same way I was solved previously - by catching the Exception and continue if the file is already there.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/779#issuecomment-633229510, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKQQFYJMFS32GKQO7ANMU3RTEM3ZANCNFSM4H4TGEQA .
--
Ittai Zeidman
40 Hanamal street, Tel Aviv, Israel
Sure. Here you go. It's completely possible that while It's the same error, it's not the same bug - and another place might need that exception handling.
Ok. I’m ok with the old workaround but the thing I’d like is a test. This regression happened because no test told the OP of #850 that they need to keep the workaround. Do you think you can build such a test? I’ll probably accept a fix without a test but I’d be much more concerned about it
On Mon, 25 May 2020 at 9:46 Amit Zarfati [email protected] wrote:
Sure. Here https://github.com/bazelbuild/rules_scala/commit/548bce937978530394be2b118cfd983194b94632#diff-a5d868bf47a1ca0e7449cfc151f2cc05L37 you go. It's completely possible that while It's the same error, it's not the same bug - and another place might need that exception handling.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/779#issuecomment-633407744, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKQQF7MFMV4CDICH2EA6ILRTIH4LANCNFSM4H4TGEQA .
--
Ittai Zeidman
40 Hanamal street, Tel Aviv, Israel
It seems like the origin of the problem was never found. So the work-around was there to bypass it. If we want a test to verify it, we need to understand what's the problem is really is, (and a Windows setup for CI). We also need to find a way to always reproduce it (right now it happens around 80% of the times).
I will ask one of the Windows user here to try figure out why we get this exception to start with.
Unless you meant a Unit-test in Scala for this specific method?
The best is indeed a Windows CI (we can have it with one specific test for the time being) and E2E. If your windows user can find the root cause that would be really beneficial.
If you invest significant time and come up empty I think a unit test in scala would at least help us know we’ve messed it up.
Sounds good. I'll ask the a Window user to spend some time researching it and will update when I have a better understanding of what's up.
This issue isn't related to the other protobuf one I believe -- the last change on the protobuf thing removed the call to copy. So all of that code is gone. The call stack here is quite different.
This looks related I think to https://github.com/bazelbuild/rules_scala/pull/949 , the proto resources are not unique I'd say in their paths which causes scalac to barf.
But wouldn't that happen in all system and not just Windows?
I agree with @ianoc it's related to https://github.com/bazelbuild/rules_scala/pull/949 though it's interesting why proto paths are not unique and we see this only in windows.
@amitz could you share a test-case/setup for proto_library/scala_proto_library which reproduces the issue so I could investigate.
@ittaiz I think we could do two things to unblock windows users:
- Change
Files.copy(source, target);toFiles.copy(source, target, StandardCopyOption.REPLACE_EXISTING);inScalacWorker#copyResources - Revert proto bundling to generates sources jar