rules_scala icon indicating copy to clipboard operation
rules_scala copied to clipboard

FileAlreadyExistsException during proto compilation

Open sholokhov opened this issue 6 years ago • 37 comments

Hello!

I've tried to use latest rules_scala version (8092d5f6165a8d9c4797d5f089c1ba4eee3326b1) and faced an interesting problem: during proto compilation of independent rules bazel if both are depending on some common protobuf (like timestamp.proto), bazel crashes with FileAlreadyExistsException. If you just rerun the previous command, everything will pass successfully. It looks like there is some kind of conflict (or just forgot to check that file already exists) during creation of _virtual_imports:

sholokhov$ bazel build //shared-libs/protocols:service_events_scala_proto 
INFO: Invocation ID: 216a834e-0e37-4dac-8306-3e6f0f61386a
INFO: Analyzed target //shared-libs/protocols:service_events_scala_proto (96 packages loaded, 1058 targets configured).
INFO: Found 1 target...
INFO: From Linking external/com_google_protobuf/libprotobuf_lite.a [for host]:
/Library/Developer/CommandLineTools/usr/bin/libtool: file: bazel-out/host/bin/external/com_google_protobuf/_objs/protobuf_lite/io_win32.o has no symbols
INFO: From Linking external/com_google_protobuf/libprotobuf.a [for host]:
/Library/Developer/CommandLineTools/usr/bin/libtool: file: bazel-out/host/bin/external/com_google_protobuf/_objs/protobuf/error_listener.o has no symbols
INFO: From Building external/com_google_protobuf/libprotobuf_java.jar (122 source files, 1 source jar) [for host]:
warning: -parameters is not supported for target value 1.7. Use 1.8 or later.
INFO: From scala @io_bazel_rules_scala//src/scala/scripts:scalapb_generator_lib [for host]:
warning: there was one deprecation warning
warning: there were two deprecation warnings (since 2.11.0)
warning: there were three deprecation warnings in total; re-run with -deprecation for details
three warnings found
ERROR: /code/shared-libs/protocols/BUILD.bazel:34:1: creating scalapb files //shared-libs/protocols:iam_events_proto failed (Exit 1)
java.nio.file.FileAlreadyExistsException: external/com_google_protobuf/google/protobuf/timestamp.proto
        at java.base/sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:573)
        at java.base/sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:254)
        at java.base/java.nio.file.Files.copy(Files.java:1294)
        at scripts.ScalaPBGenerator.$anonfun$setupIncludedProto$1(ScalaPBGenerator.scala:42)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scripts.ScalaPBGenerator.setupIncludedProto(ScalaPBGenerator.scala:37)
        at scripts.ScalaPBGenerator.processRequest(ScalaPBGenerator.scala:53)
        at io.bazel.rulesscala.worker.GenericWorker.runPersistentWorker(GenericWorker.java:45)
        at io.bazel.rulesscala.worker.GenericWorker.run(GenericWorker.java:111)
        at scripts.ScalaPBWorker$.main(ScalaPBGenerator.scala:26)
        at scripts.ScalaPBWorker.main(ScalaPBGenerator.scala)
Target //shared-libs/protocols:service_events_scala_proto failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 56.293s, Critical Path: 32.41s
INFO: 206 processes: 1 remote cache hit, 193 darwin-sandbox, 12 worker.
FAILED: Build did NOT complete successfully

# just rerun previous command

sholokhov$ bazel build //shared-libs/protocols:service_events_scala_proto 
INFO: Invocation ID: 57ca90aa-87b8-445c-b170-98689f1aa259
INFO: Analyzed target //shared-libs/protocols:service_events_scala_proto (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //shared-libs/protocols:service_events_scala_proto up-to-date:
  bazel-bin/shared-libs/protocols/billing_events_proto_scalapb.jar
  bazel-bin/shared-libs/protocols/iam_events_proto_scalapb.jar
INFO: Elapsed time: 12.149s, Critical Path: 11.99s
INFO: 8 processes: 8 worker.
INFO: Build completed successfully, 9 total actions

Bazel 0.27.0. Just in case, dependency graph:

dep

sholokhov avatar Jul 01 '19 14:07 sholokhov

I have a quick fix for that problem, but I'm not 100% sure if it's a valid fix, because it basically fights against symptoms, not the root cause. But it works for me though:

Index: src/scala/scripts/ScalaPBGenerator.scala
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/scala/scripts/ScalaPBGenerator.scala	(revision 8092d5f6165a8d9c4797d5f089c1ba4eee3326b1)
+++ src/scala/scripts/ScalaPBGenerator.scala	(date 1562061165000)
@@ -1,7 +1,7 @@
 package scripts
 
 import java.io.PrintStream
-import java.nio.file.Path
+import java.nio.file.{Path, FileAlreadyExistsException}
 
 import io.bazel.rulesscala.io_utils.DeleteRecursively
 import io.bazel.rulesscala.jar.JarCreator
@@ -12,6 +12,7 @@
 import java.nio.file.{Files, Paths}
 import scalapb.{ScalaPBC, ScalaPbCodeGenerator, ScalaPbcException}
 import java.net.URLClassLoader
+import scala.util.{Try, Failure}
 
 object ScalaPBWorker extends GenericWorker(new ScalaPBGenerator) {
 
@@ -39,7 +40,13 @@
       val relativePath = root.relativize(fullPath)
 
       relativePath.toFile.getParentFile.mkdirs
-      Files.copy(fullPath, relativePath)
+      Try(Files.copy(fullPath, relativePath)) match {
+        case Failure(err) if err.isInstanceOf[FileAlreadyExistsException] =>
+          Console.println(s"File already exists, skipping: ${err.getMessage}")
+        case Failure(err) =>
+          sys.error(err.getMessage)
+        case _ => ()
+      }
     }
   }
   def deleteDir(path: Path): Unit =

sholokhov avatar Jul 02 '19 10:07 sholokhov

Do you possibly have duplicate classes or protos in different targets somehow?

johnynek avatar Jul 02 '19 16:07 johnynek

No, only timestamp and wrappers which are shared between two independent protos:

proto_library(
    name = "A",
    srcs = [ "event/A.proto" ],
    deps = [
        "@com_google_protobuf//:wrappers_proto",
        "@com_google_protobuf//:timestamp_proto",
    ],
)

proto_library(
    name = "B",
    srcs = [ "event/B.proto" ],
    deps = [
        "@com_google_protobuf//:timestamp_proto",
        "@com_google_protobuf//:wrappers_proto",
    ],
)

proto_library(
    name = "service_events_proto",
    visibility = [ "//visibility:public" ],
    srcs = [ "event/ServiceEvent.proto" ],
    deps = [
        ":A",
        ":B",
    ],
)

scalapb_proto_library(
    name = "service_events_scala_proto",
    visibility = ["//visibility:public"],
    deps = [ ":service_events_proto" ],
)

sholokhov avatar Jul 02 '19 16:07 sholokhov

Sorry for pushing you guys, but is there any thoughts or updates on it?

Now we have to use forked version of this rules with quick fix for the problem, but I would prefer to depend on official rules rather than supporting forked version.

sholokhov avatar Jul 16 '19 09:07 sholokhov

Could you make a PR with a test that fails? That would be very helpful if we are going to fix it. Thank you for your help!

Cc @ianoc

johnynek avatar Jul 16 '19 16:07 johnynek

I tried quickly to repo this and couldn't, so would love a PR showing the issue

ianoc avatar Jul 16 '19 17:07 ianoc

After upgrading from bazel 0.26 to 0.27 this issue started to happen very frequently on our repo

ptarjan avatar Jul 16 '19 22:07 ptarjan

I've prepared small example which reproduces this issue: https://github.com/sholokhov/bazel-rules-scala-issue

Just try to build everything from scratch (bazel build //...) and it fails with FileAlreadyExistsException.

sholokhov avatar Jul 17 '19 08:07 sholokhov

Thanks for the repo case @sholokhov, its quite interesting -- so this was something related to a bazel change in 0.27, i tried 0.27 and it fails, pretty handily. But 0.26 (clean expunge + build) in 2 cycle attempts didn't fail. (Not very scientific, overall, but rebuilding proto takes a few mins so its not so much fun)

Question here will be seeing if the approach in the code is actually still safe now or if the virtual imports in 0.28(edit -- actually looks like this change is on master but not in 0.28) maybe make this no longer required at all. https://github.com/bazelbuild/bazel/issues/7157

ianoc avatar Jul 17 '19 17:07 ianoc

It sounds like maybe actions here:

  1. try see if the change on bazel master would make this go away a) if so we should put a fix in until that release comes out, but that might be testing to see if target file exists, if it does asserting the contents are identical? , also removing any files added when we are done
  2. If it doesn't, maybe look closer to see if this is safe going forward. (concern is really if the changes around symlinks could mean its possible to write to the external shared folder and not some scratch space?)

ianoc avatar Jul 17 '19 17:07 ianoc

@ianoc yay! Thanks for the breakdown. Is that something you're going to work on?

ptarjan avatar Jul 17 '19 17:07 ptarjan

At some point i will, have a lot on this week and @sholokhov has a workaround that sounds like it'll survive for another few days anyway.

ianoc avatar Jul 17 '19 17:07 ianoc

@ianoc ok, thank you! I'll switch our repo to point to his fork so I can get our bazel upgrade in then. Does that sound reasonable?

ptarjan avatar Jul 17 '19 17:07 ptarjan

@ptarjan thats really an internal stripe discussion, but i don't believe generally speaking we'd want our main repos pointing to forks on the internet.

ianoc avatar Jul 17 '19 18:07 ianoc

so, @sholokhov posted a work around:

https://github.com/bazelbuild/rules_scala/issues/779#issuecomment-507608794

@ptarjan can you make a PR with that diff and we can merge as a workaround and keep this issue open to root cause it?

johnynek avatar Jul 17 '19 18:07 johnynek

Thanks @johnynek. Done here: https://github.com/bazelbuild/rules_scala/pull/787

ptarjan avatar Jul 17 '19 18:07 ptarjan

Thanks @ptarjan ! Also @sholokhov can you tell the bot on the PR that you don't mind us pulling this in?

johnynek avatar Jul 17 '19 18:07 johnynek

It's interesting that this fix seemed to still result in a similar error on windows:

https://travis-ci.org/bazelbuild/rules_scala/jobs/560109357#L310

ERROR: C:/users/travis/_bazel_travis/xsbyteln/external/com_google_protobuf/BUILD:284:2: Generating Descriptor Set proto_library @com_google_protobuf//:wrappers_proto failed (Exit 1): protoc.exe failed: error executing command 
  cd C:/users/travis/_bazel_travis/xsbyteln/execroot/io_bazel_rules_scala
  SET PATH=C:\program files\git\usr\bin;C:\program files\git\bin;C:\program files\git\usr\bin;C:\program files\git\usr\bin;C:\tools\ruby25\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\ProgramData\GooGet;C:\Program Files\Google\Compute Engine\metadata_scripts;C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin;C:\Program Files\Google\Compute Engine\sysprep;C:\Program Files\Docker;C:\ProgramData\chocolatey\bin;C:\Program Files\CMake\bin;C:\Program Files\Git\cmd;C:\Program Files\LLVM\bin;C:\Program Files\dotnet;C:\Users\travis\AppData\Local\Microsoft\WindowsApps;C:\ProgramData\chocolatey\lib\mingw\tools\install\mingw64\bin
  bazel-out/host/bin/external/com_google_protobuf/protoc.exe --proto_path=external/com_google_protobuf --descriptor_set_out=bazel-out/x64_windows-fastbuild/genfiles/external/com_google_protobuf/wrappers_proto-descriptor-set.proto.bin -Igoogle/protobuf/wrappers.proto=bazel-out/x64_windows-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/wrappers.proto --direct_dependencies google/protobuf/wrappers.proto --direct_dependencies_violation_msg=%s is imported, but @com_google_protobuf//:wrappers_proto doesn't directly depend on a proto_library that 'srcs' it. bazel-out/x64_windows-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/wrappers.proto
Execution platform: @bazel_tools//platforms:host_platform
bazel-out/x64_windows-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/wrappers.proto: Input is shadowed in the --proto_path by "external/com_google_protobuf/google/protobuf/wrappers.proto".  Either use the latter file as your input or reorder the --proto_path so that the former file's location comes first.

so, there is some definite duplication somewhere.

johnynek avatar Jul 17 '19 18:07 johnynek

We experiencing a variation of this on all our Windows machine (it runs just fine on MacOS/Linux). It's happens on a specific project, and this specific project is unique because it's the only project that use the strip_import_prefixoption in proto_library. Here's the stack-trace:

ERROR: D:/projects/depo/legacy/kit/protobuf/BUILD:78:1: scala //legacy/kit/protobuf:pill_scala failed (Exit 1)
java.nio.file.FileAlreadyExistsException: \kit\protobuf\pill.proto
        at sun.nio.fs.WindowsFileCopy.copy(WindowsFileCopy.java:124)
        at sun.nio.fs.WindowsFileSystemProvider.copy(WindowsFileSystemProvider.java:278)
        at java.nio.file.Files.copy(Files.java:1274)
        at io.bazel.rulesscala.scalac.ScalacProcessor.copyResources(ScalacProcessor.java:309)
        at io.bazel.rulesscala.scalac.ScalacProcessor.processRequest(ScalacProcessor.java:73)
        at io.bazel.rulesscala.worker.GenericWorker.runPersistentWorker(GenericWorker.java:45)
        at io.bazel.rulesscala.worker.GenericWorker.run(GenericWorker.java:111)
        at io.bazel.rulesscala.scalac.ScalaCInvoker.main(ScalaCInvoker.java:41)
INFO: Elapsed time: 1542.308s, Critical Path: 146.22s
INFO: 395 processes: 389 local, 6 worker.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully

I tried to check for the workaround mentioned above - catching the FileAlreadyExistsException and handling it, but I couldn't find it. Was it removed deliberately? Or perhaps that's not a variation but a new bug?

amitz avatar May 21 '20 14:05 amitz

I don’t know and honestly our windows support is not awesome since we don’t run our CI builds with it. We don’t run our CI builds with it because there’s no one to own the windows problems (we had windows CI for 6 months maybe and the flakiness and issues killed our velocity).

Can you take a look at git history to see when the workaround was removed? Was it intentional or by mistake?

Cc @simuons as someone who I think touched this area

ittaiz avatar May 22 '20 18:05 ittaiz

The file no longer exists. I'm assuming it happened during the phases refactor. If you are find with the old workaround, I can try to find the new spot where this happens, and solve it the same way I was solved previously - by catching the Exception and continue if the file is already there.

But let's wait to hear @simuons opinion.

amitz avatar May 24 '20 13:05 amitz

I think so but I’d really appreciate it if you could find the exact point where it was removed. I’d love to see if indeed it was an oversight and maybe ping whoever did it

On Sun, 24 May 2020 at 16:16 Amit Zarfati [email protected] wrote:

The file no longer exists. I'm assuming it happened during the phases refactor. If you are find with the old workaround, I can try to find the new spot where this happens, and solve it the same way I was solved previously - by catching the Exception and continue if the file is already there.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/779#issuecomment-633229510, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKQQFYJMFS32GKQO7ANMU3RTEM3ZANCNFSM4H4TGEQA .

--

Ittai Zeidman

40 Hanamal street, Tel Aviv, Israel

http://www.wix.com

ittaiz avatar May 24 '20 17:05 ittaiz

Sure. Here you go. It's completely possible that while It's the same error, it's not the same bug - and another place might need that exception handling.

amitz avatar May 25 '20 06:05 amitz

Ok. I’m ok with the old workaround but the thing I’d like is a test. This regression happened because no test told the OP of #850 that they need to keep the workaround. Do you think you can build such a test? I’ll probably accept a fix without a test but I’d be much more concerned about it

On Mon, 25 May 2020 at 9:46 Amit Zarfati [email protected] wrote:

Sure. Here https://github.com/bazelbuild/rules_scala/commit/548bce937978530394be2b118cfd983194b94632#diff-a5d868bf47a1ca0e7449cfc151f2cc05L37 you go. It's completely possible that while It's the same error, it's not the same bug - and another place might need that exception handling.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/779#issuecomment-633407744, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKQQF7MFMV4CDICH2EA6ILRTIH4LANCNFSM4H4TGEQA .

--

Ittai Zeidman

40 Hanamal street, Tel Aviv, Israel

http://www.wix.com

ittaiz avatar May 26 '20 05:05 ittaiz

It seems like the origin of the problem was never found. So the work-around was there to bypass it. If we want a test to verify it, we need to understand what's the problem is really is, (and a Windows setup for CI). We also need to find a way to always reproduce it (right now it happens around 80% of the times).

I will ask one of the Windows user here to try figure out why we get this exception to start with.

Unless you meant a Unit-test in Scala for this specific method?

amitz avatar May 26 '20 05:05 amitz

The best is indeed a Windows CI (we can have it with one specific test for the time being) and E2E. If your windows user can find the root cause that would be really beneficial.

If you invest significant time and come up empty I think a unit test in scala would at least help us know we’ve messed it up.

ittaiz avatar May 26 '20 05:05 ittaiz

Sounds good. I'll ask the a Window user to spend some time researching it and will update when I have a better understanding of what's up.

amitz avatar May 26 '20 06:05 amitz

This issue isn't related to the other protobuf one I believe -- the last change on the protobuf thing removed the call to copy. So all of that code is gone. The call stack here is quite different.

This looks related I think to https://github.com/bazelbuild/rules_scala/pull/949 , the proto resources are not unique I'd say in their paths which causes scalac to barf.

ianoc avatar May 27 '20 03:05 ianoc

But wouldn't that happen in all system and not just Windows?

amitz avatar May 31 '20 09:05 amitz

I agree with @ianoc it's related to https://github.com/bazelbuild/rules_scala/pull/949 though it's interesting why proto paths are not unique and we see this only in windows. @amitz could you share a test-case/setup for proto_library/scala_proto_library which reproduces the issue so I could investigate. @ittaiz I think we could do two things to unblock windows users:

  1. Change Files.copy(source, target); to Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING); in ScalacWorker#copyResources
  2. Revert proto bundling to generates sources jar

simuons avatar Jun 01 '20 07:06 simuons