bazel icon indicating copy to clipboard operation
bazel copied to clipboard

Export ProGuard specs from java_import.

Open cpsauer opened this issue 2 years ago • 17 comments

Hi, wonderful Bazel folks. Here's another PR for your consideration :)

I happened to notice that Bazel was failing to use ProGuards bundled in java_imported JARs while I was writing #14910, so I figured I'd create a fix. Didn't immediately need it for myself, I just figured I'd be a good ex-Googler and fix it when I saw it.

As before, please feel free to modify as you see fit, especially if that'd make things faster overall! And it's my first time touching a lot of these parts of the codebase--so I'm hoping you'll be patient with me.

More details from the commit message:

Background: JAR files can bundle ProGuard specs under META-INF/proguard/ [See https://developer.android.com/studio/build/shrink-code]

Problem: Bazel previously erroniously ignored these ProGuard specs, leading to failures with, for example, androidx.annotation.Keep. Bad times.

There was previously a parallel issue with aar_import. [Fixed in #12749]

Solution: This change causes the previously ignored, embedded proguards to be extracted, validated, and then bubbled up correctly via the ProguardSpecProvider.

There's also a minor fix to aar_import, adding proguard validation and slightly simplifying the resulting code. For reasoning behind why library proguards should be validated, see the module docstring of proguard_whitelister.py

Remaining issues: JAR files brought down from Maven via rules_jvm_external bypass java_import in favor of rolling their own jvm_import, since java_import apparently been broken for Kotlin for years. That'll need a subsequent fix, since this only fixes java_import. For context on the Kotlin breakage, see https://github.com/bazelbuild/bazel/issues/4549. For the status on fixes in rules_jvm_external, see https://github.com/bazelbuild/rules_jvm_external/issues/672


And here, as opposed to the commit message, I'd like to lobby for higher prioritization of #4549. (not because I need it or use Kotlin) but because since last prioritization, (1) Kotlin has become the preferred language for Android and (2) it looks like the breakage is severe enough that we're getting multiple semi-problematic re-implementations of java_import, including by Bazel team members. Better to just fix get java_import back in working order, I'd think, even if just temporarily by providing a simple option to disable ijar per-target. ^ As an update, I tossed up a PR with a temporary fix. See #14967. It doesn't fix ijars issue, but I think it should unblock things for people as a good 80/20 fix.

Thanks for your consideration! Chris (ex-Googler)

cpsauer avatar Mar 05 '22 00:03 cpsauer

cc-ing also @ahumesky and @benjaminRomano, since they were involved with the (mostly parallel) #12749

cpsauer avatar Mar 05 '22 00:03 cpsauer

cc @kevin1e100 for prioritization request cc @cushon

This looks like a useful feature. Importing it will require some internal setup. My intuition says it shouldn't cause a regression - but a benchmark will tell us more.

I will delay the decision to @cushon, because it's a new feature and I'm only a mainteiner.

@cpsauer did you consider adding tests for this feature also to JavaImportConfiguredTargetTest.java?

comius avatar Mar 08 '22 08:03 comius

Thanks for taking a look @comius! Glad the functionality looks good to you on a quick inspection

I don't quite know how the import/benchmark process works these days, but I'd love it if you'd enlighten me! Looking forward to hearing from you, @cushon.

[Sounds like maybe I should hear what he thinks and how things go with your internal benchmarking and regression testing--and then we'll talk tests?]

cpsauer avatar Mar 09 '22 00:03 cpsauer

Howdy, all :)

cpsauer avatar Mar 18 '22 21:03 cpsauer

Friendly bump!

cpsauer avatar Mar 29 '22 03:03 cpsauer

So internally I think we're relying on tools like Proguard processing their inputs to discover any META-INF/proguard entries, we don't rely on the build system to unpack them and pass them in as separate inputs. I think this is the relevant logic in r8: https://r8.googlesource.com/r8/+/refs/heads/main/src/main/java/com/android/tools/r8/R8Command.java#532

https://github.com/bazelbuild/bazel/pull/14741 is related, IIUC Bazel doesn't support r8 out of the box.

I don't think this is the right change for the internal java_import, because we wouldn't use this functionality, and there are some costs associated with adding additional actions to java_import and to adding an implicit dep on a py_binary here. (If adding a dep on a tool like this was necessary, I'd probably look at putting it in the java_toolchain instead of directly on java_import, and also supporting prebuilts to avoid having java_import depend on py_binary.)

Adding it might be the right choice for Bazel, so I'll punt back to @comius for that.

@cpsauer if Bazel supported r8, do you have a sense of whether that would also solve this issue, or do you specifically want proguard support for the META-INF/proguard specs? Do you have a sense of how common these configs are in the Java ecosystem?

cushon avatar Apr 04 '22 17:04 cushon

Hey @cushon! Thanks for your thoughtful reply and some sweet xrefs.

You're way ahead of me on R8; I didn't realize R8 itself automatically pulled ProGuard specs from JARs. Do you know if it also does that for proguard.txt in AARs, too? I'd guess so, but didn't see it in that file and couldn't find it quickly. (Is there external code search for R8?)

I knew only that ProGuard (proper) doesn't pull specs from JARs or AARs, causing issues, already fixed for AARs in aar_import in a way parallel to this (as above). And I knew that Bazel didn't support R8, only ProGuard. But I didn't know about the R8 support coming soon. Hooray! The sooner the better.

To answer your question: Personally, I'm only using ProGuard for Android, and only because R8 isn't (yet) supported by Bazel. Happy to switch and eager for that to land. It isn't even clear to me that Bazel supports ProGuard for Java more generally than Android, since I don't see anything about it on java_binary. But Android is the limit of my ProGuard experience.

One caveat: I would need to be able to use R8 to output JARs--as a drop-in replacement for Proguard, without dexing. (Needed to produce AARs/JARs for distribution. But also if anyone wanted to use it for non-Android Java.) I assume that's possible, since R8 was marketed as a drop-in replacement for ProGuard. However, the R8 docs are certainly thin and it looks like R8 being used in dexing mode in that PR. (I do see --classfile and OutputMode.ClassFile in this old but searchable mirror.) Do you know if R8 can produce JARs?

If all that checks out, that leaves the question: Does Bazel even want to continue to support using ProGuard, as opposed to just R8 everywhere? If R8 everywhere, then I'd imagine there's a lot of ProGuard logic we could eliminate in Bazel, including this, the equivalent for AARs, and maybe all of the ProguardSpecProvider logic if the ProGuard specs are bundled into JARs. If Bazel wants to support ProGuard in addition to R8, that's where we'd want this code for correctness.

Thanks! Chris

cpsauer avatar Apr 05 '22 07:04 cpsauer

cc @ahumesky for plans on r8 support

Do you know if it also does that for proguard.txt in AARs, too? I'd guess so, but didn't see it in that file and couldn't find it quickly.

It doesn't, but this is now tracked by https://issuetracker.google.com/issues/228319861

Is there external code search for R8?

There's https://cs.android.com/, but I can't find R8 in the index. I just cloned it and used grep :)

Do you know if R8 can produce JARs?

I believe it has a bytecode backend, the help text mentions:

  --classfile             # Compile program to Java classfile format.

Does Bazel even want to continue to support using ProGuard, as opposed to just R8 everywhere? If R8 everywhere, then I'd imagine there's a lot of ProGuard logic we could eliminate in Bazel, including this, the equivalent for AARs, and maybe all of the ProguardSpecProvider logic if the ProGuard specs are bundled into JARs. If Bazel wants to support ProGuard in addition to R8, that's where we'd want this code for correctness.

Good question. We're still using Proguard for some stuff internally, so I think there are parts of the code we aren't in a rush to turn down, but for Bazel I think it might make sense to consider removing some of the stuff for extracting ProguardSpecProviders from jars and converging on R8. @ahumesky should have a better sense of the landscape here than me, though :)

cushon avatar Apr 06 '22 16:04 cushon

Sounds good. Thanks for another great reply, @cushon!

Really appreciate your getting that issue tracked and assigned. Bummer that the behavior is inconsistent within R8--and between R8 and ProGuard.

Looking forward to @ahumesky's analysis!

Thanks, all! Chris

cpsauer avatar Apr 06 '22 22:04 cpsauer

Adding it might be the right choice for Bazel, so I'll punt back to @comius for that.

I'd prefer if the "baseline" rules are used in Bazel for now - the simplest and fastest rules that can then be extended with extra features for whomever needs them. This is because it seems extending the rules is far easier than "unextending"/reducing them.

Once java_import is in Starlark, I don't mind it extended to something like android_java_import. Or perhaps this can already be done outside of the Bazel codebase?

Mixing Android features into Java rules also seems like a bad design decision.

comius avatar Apr 08 '22 05:04 comius

Okay, yeah. I guess the larger issue here, as per Liam, still comes down to whether you all want to have ProGuard (vs R8) work correctly for these Java rules, right? Esp given they're already returning ProguardSpecProviders.

While there's a lot of ProGuard usage in Android, my understanding is that it's originally and more generally a Java tool. I feel like I should raise that because I was maybe too self-centered in my answer above. I may personally only be using Java for Android (&Bazel), but you all may have more general goals around supporting Java&ProGuard. I assume ProGuard support might be why the Java rules return ProguardSpecProviders, but maybe that's just to weave Android features through the Java rules. You all would know better than I.

From the above, it sounds like no, doesn't matter if ProGuard works correctly for Java generally; that's a non-goal. Is that right? (Totally fine! I just don't want to assume.)

Conversely, if we do want that ProguardSpecProvider already returned to be right for ProGuard for Java, then the current behavior is definitely problematic, I'd think, because java_import is returning a ProguardSpecProvider missing a key entry. Lots of opportunities for subtle breakage with, e.g. serialization interfaces that require reflection.

cpsauer avatar Apr 08 '22 06:04 cpsauer

Proposal: Maybe we wait and see what @ahumesky says about R8, and then, iff indeed you guys confirm that ProGuard support for general Java is a non-goal despite the ProguardSpecProviders, then we close this down? (And if not, not.)

Decision tree from there would be, I think: @ahumesky says, "full switch to R8 soon, no point in better ProGuard support" -> close this, merging nothing.

@ahumesky says, "still want to support ProGuard for Android" -> Still salvage the aar_import-validation-fix content of this PR. Fixes parallel to the main fix go into a new Android rule, android_java_import, and into all the other places people import JARs for Android: kt_jvm_import and rules_jvm_external's kotlin-friendly java_import replacement.

cpsauer avatar Apr 08 '22 07:04 cpsauer

We do plan to add R8 support, but first we're removing the last bits of dx, updating bazel's D8 dependnecy, and moving to D8-based desugaring. So we probably won't get to R8 support until near the end of the quarter, and there's a bit of legacy stuff around proguarding so that could complicate things (in particular, proguarding happens in android_binary, which is currently still mostly in native (i.e. java). Ideally we would add this to the Starlark rules to avoid duplicate work, but migrating to the Starlark rules is a task in itself). Given the potential performance costs of additional actions in the java rules, if we could hold of a little while longer, we can get R8 support in and solve this in one go.

ahumesky avatar Apr 08 '22 21:04 ahumesky

Excited to hear R8 is coming as soon as the end of the quarter!

@ahumesky, to confirm: The plan is for a full switch to R8, dropping ProGuard like dx and happening soon enough that we shouldn't bother with the fix to java_import, right? And re aar_import, is the plan to continue to unpack AARs and consume their JARs in android_binary--in which case you might want the AAR validation fix in this PR--or will you be switching to consuming them as AARs and we should just close this wholesale?

cpsauer avatar Apr 11 '22 23:04 cpsauer

Friendly bump to @ahumesky (or @ted-xie?) since it's been a month

cpsauer avatar May 17 '22 07:05 cpsauer

Friendly bump x2 to @ahumesky. Any updates on R8--or what you'd like around the AAR proguard validation content of this PR?

cpsauer avatar Jun 24 '22 04:06 cpsauer

Unfortunately we had to deprioritize R8 integration, so it might be still some time before that's available, so we should look at the alternatives here.

One option is a macro that extracts the proguard specs and pairs that with the jar in an android_library:

def java_import_with_proguard_specs(name, jar):
  import_name = "_" + name + "_import"
  native.java_import(
    name = import_name,
    jars = [jar],
  )

  proguard_specs_name = "_" + name + "_proguard_specs"
  native.genrule(
    name = "_" + name + "_extract_proguard_specs",
    srcs = [jar],
    outs = [proguard_specs_name],
    cmd = "unzip -p $< 'META-INF/proguard/*' > $@",
  )
  
  native.java_library(
    name = name,
    exports = [import_name],
    proguard_specs = [proguard_specs_name],
  )

(it's not quite the same as a java_import because this macro takes only 1 jar, but that could be made to work)

This wouldn't work so well for dependencies you don't control, or for java_imports that are generated in workspace rules like our own rules_jvm_external (unless they have some injection mechanism I'm not aware of). This macro could readily be applied though.

Otherwise, I've talked with a few folks, and we can go ahead with this PR if the workaround above is insufficient. A few things we'd need to adjust though:

  • Per cushon@'s comments above, let's make this bazel-only by moving the added logic in JavaImport to BazelJavaSemantics, with a default no-op implementation in a no-op method in JavaSemantics, and JavaImport calls the method on JavaSemantics
  • Similarly, move the implicit dependency from JavaImportBaseRule to BazelJavaImportRule

ahumesky avatar Aug 10 '22 00:08 ahumesky

Thanks for your reply, @ahumesky! I'll work on scoping things out. (I think people would probably want the fix to also solve the large subset of these that come from rules_jvm_external, and I'd think rules_jvm_external would likely want to get the ijar benefits of java_import now that cushion has fixed it for kotlin, but we'll see what they say.)

Sorry to hear R8 got deprioritized. Is it still on the agenda to happen at some point? Any rough sense when? (Days? Months? Quarters? Years?) Apologies if there's a published roadmap somewhere that I've missed.

cpsauer avatar Aug 10 '22 23:08 cpsauer

Hi @cpsauer, is this something you're still looking at?

(btw JavaImport and is now in Starlark: https://github.com/bazelbuild/bazel/issues/15196#issuecomment-1486891278)

hvadehra avatar Apr 13 '23 08:04 hvadehra

Hey, @hvadehra! Good to meet you. Thanks for checking in and reorienting me around things having moved to Starlark.

I'd paused because it seemed like it'd need more plan clarity from you guys, I'd need buy in to merge, and there's was a lot of other valuable, unblocked to do (like building out good cross-editor c-language-family support :). I still think there's a real issue behind this and plenty of value to fixing. It just needs clear calls about when/whether the R8 switch is happening and if soon, whether y'all want to maintain ProGuard compatibility for other use cases, like pure Java. Decision trees above around what to do remain relevant, I think.

If you want my quick take, I think a quick move to R8 would be awesome and worthwhile, both for this and other issues, perhaps releasing what you've already got inside. Failing that, I (still) think it's well worth getting this into java_import, now in Starlark, rather than further blocking on R8, to unbreak things, including upstreaming use into rules_jvm_external to fix the common cases that are currently broken.

cpsauer avatar Apr 13 '23 20:04 cpsauer

So internally I think we're relying on tools like Proguard processing their inputs to discover any META-INF/proguard entries, we don't rely on the build system to unpack them and pass them in as separate inputs. I think this is the relevant logic in r8: https://r8.googlesource.com/r8/+/refs/heads/main/src/main/java/com/android/tools/r8/R8Command.java#532

#14741 is related, IIUC Bazel doesn't support r8 out of the box.

@cushon I think some work will still need to happen on bazel side to support this even with R8 support. AFAICT the only jar that is provided to R8 is the deploy jar which sort of works assuming proguard files have unique names. Some libraries simply declare something like proguard.pro or consumer-proguard-rules.pro so if there are 2 or more jars with such file only one will be considered by r8. Probably the simplest thing is to simply have bazel rename to something unique. Singlejar already has some custom logic to check desugar deps, maybe it can also have some logic to rename anything under META-INF/proguard/ to something unique

mauriciogg avatar May 25 '23 20:05 mauriciogg

So internally I think we're relying on tools like Proguard processing their inputs to discover any META-INF/proguard entries, we don't rely on the build system to unpack them and pass them in as separate inputs. I think this is the relevant logic in r8: https://r8.googlesource.com/r8/+/refs/heads/main/src/main/java/com/android/tools/r8/R8Command.java#532 #14741 is related, IIUC Bazel doesn't support r8 out of the box.

@cushon I think some work will still need to happen on bazel side to support this even with R8 support. AFAICT the only jar that is provided to R8 is the deploy jar which sort of works assuming proguard files have unique names. Some libraries simply declare something like proguard.pro or consumer-proguard-rules.pro so if there are 2 or more jars with such file only one will be considered by r8. Probably the simplest thing is to simply have bazel rename to something unique. Singlejar already has some custom logic to check desugar deps, maybe it can also have some logic to rename anything under META-INF/proguard/ to something unique

I just combed through all my external deps and everything seems to name the file something unique, so maybe its not an actual issue. Not sure if its a proper convention though.

mauriciogg avatar May 25 '23 20:05 mauriciogg

This bit will cause issues with deploy jars https://r8.googlesource.com/r8/+/8c36ac82385e907df2e3166e0534d997a8a418f5/src/main/java/com/android/tools/r8/R8Command.java#1358. If the deploy jar includes artifacts under com.android.tools/ and proguard/ it will ignore stuff under proguard

mauriciogg avatar May 26 '23 20:05 mauriciogg

This bit will cause issues with deploy jars https://r8.googlesource.com/r8/+/8c36ac82385e907df2e3166e0534d997a8a418f5/src/main/java/com/android/tools/r8/R8Command.java#1358. If the deploy jar includes artifacts under com.android.tools/ and proguard/ it will ignore stuff under proguard

@ahumesky I think the starlark implementation can take the full transitive jars instead of the deploy jar in order to avoid this

mauriciogg avatar May 27 '23 00:05 mauriciogg