mxnet icon indicating copy to clipboard operation
mxnet copied to clipboard

[FEATURE] Add Java build using JavaCPP to map the C API

Open saudet opened this issue 4 years ago • 10 comments

Description

Based on the discussion in issue #17783, I am proposing here as base for a new Java API to use JavaCPP, which takes care of building MXNet from source, generating appropriate low-level wrapper classes and JNI code for the C API, as well as bundle the resulting binaries in artifacts. This build was adapted, ported, and upgraded to Gradle from the JavaCPP Presets for MXNet 1.x available here:

  • https://github.com/bytedeco/javacpp-presets/tree/master/mxnet

This PR also contains a new "java" workflow for GitHub Actions that builds and deploys binaries for Linux (CentOS 7), Mac, and Windows, with and without CUDA (although the build currently fails with CUDA 11.2 on Windows), which produces files as per those previously deployed here:

  • https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/2.0-SNAPSHOT/

(To have this deploy artifacts as part of org.apache, we need to enter credentials under the CI_DEPLOY_USERNAME and CI_DEPLOY_PASSWORD secrets for GitHub Actions.)

Checklist

Essentials

  • [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • [x] Changes are complete (i.e. I finished coding on this PR)
  • [x] All changes have test coverage
  • [x] Code is well-documented

Changes

  • [x] Configurations to build and map the C API to Java with Gradle and JavaCPP
  • [x] An initial simple test that runs as part of the CI on GitHub Actions under the "java" workflow
  • [x] API documentation that gets deployed automatically from Javadoc

Comments

  • The generated wrapper classes are not currently being pushed as source to the repository. Should they be? We can see what that looks like here: https://github.com/bytedeco/javacpp-presets/tree/master/mxnet/src/gen/java/org/bytedeco/mxnet
  • I believe the code is generally straightforward to follow, so I have not added a lot of comments, but please let me know which parts may need to be better documented, if any.
  • There probably should be some small example somewhere that developers of the high-level API could use as starting point. But where should we put that? What should it contain? I'm thinking something like this: https://github.com/bytedeco/javacpp-presets/tree/master/mxnet#sample-usage

saudet avatar Jan 27 '21 10:01 saudet

Hey @saudet , Thanks for submitting the PR All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [unix-cpu, unix-gpu, miscellaneous, website, windows-cpu, edge, sanity, centos-cpu, clang, centos-gpu, windows-gpu]


Note: Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.

mxnet-bot avatar Jan 27 '21 10:01 mxnet-bot

It looks like we're not allowed to use almost all third-party actions: https://github.com/apache/incubator-mxnet/actions/runs/514790755 Does this mean we have to copy/paste essentially everything in this repository?

saudet avatar Jan 27 '21 10:01 saudet

Does this mean we have to copy/paste essentially everything in this repository?

Would it be too complicated to copy the scripts from https://github.com/bytedeco/javacpp-presets/blob/master/.github/actions/deploy-centos/action.yml to this repo? Otherwise, it might be required that apache infra approves the bytedeco/javacpp-presets github action?

leezu avatar Jan 27 '21 12:01 leezu

Does this mean we have to copy/paste essentially everything in this repository?

Would it be too complicated to copy the scripts from https://github.com/bytedeco/javacpp-presets/blob/master/.github/actions/deploy-centos/action.yml to this repo? Otherwise, it might be required that apache infra approves the bytedeco/javacpp-presets github action?

I could copy those scripts here, yes, and we could also copy the source code of all the other dependencies, such as OpenBLAS, GFortran, etc in this repository, but that's not being done. What's the rationale for allowing third-party software that isn't related to GitHub Actions, but not allowing third-party software that is related to GitHub Actions? For example, what about https://github.com/al-cheb/configure-pagefile-action/? I didn't write that one, and although we could also copy all the files here, it's not a contribution from me. How does Apache's IP team deal with that?

saudet avatar Jan 28 '21 08:01 saudet

What's the rationale for allowing third-party software that isn't related to GitHub Actions, but not allowing third-party software that is related to GitHub Actions?

If it doesn't work, it's may just be due some default setting. You can advise what Apache Infra needs to do and we can ask them to do it

leezu avatar Jan 28 '21 13:01 leezu

Hum, I'm assuming it has something to do with this, whose details are being kept secret for the moment for obvious reasons:

Notice: December 27, 2020: We only allow Actions that are official "Made by GitHub" or local to the Apache org on GitHub, to address a potential security vulnerability. This is an incident-related policy change. We are researching the situation, and the policy may evolve based on what we learn.

It looks like we can find out about the details of this on the internal infra mailing list: http://mail-archives.apache.org/mod_mbox/www-builds/202012.mbox/%3cCAOtwSKFnB6POd2N=YpbDBJdeRgPHZDX8ZKE11rBgUW8eSYBrig@mail.gmail.com%3e If you could take a look at that and see what this is all about and maybe how we could work around this without importing everything in the Apache domain, it would be helpful.

saudet avatar Jan 29 '21 06:01 saudet

The last message on the thread provides some details: http://mail-archives.apache.org/mod_mbox/www-builds/202012.mbox/%3cCAMdOYcJaprG2Wn6nDygx3vdDKpeKVczGbRSypSHQ7hjBOCT_uQ@mail.gmail.com%3e I'm assuming that using a SHA instead of a TAG solves this, so why would infra block everything? Is it just blocked by default, but specific repositories can still be enabled by the maintainers?

saudet avatar Jan 29 '21 06:01 saudet

Thank you looking this up! The message refers to a serious security vulnerability in Github Actions that gives the job complete read and write access to the underlying git repository (if GITHUB_TOKEN is present). I'm aware of two possible problems with that: 1) Executing untrusted code via pull_request_target that can then use GITHUB_TOKEN and do damage 2) The third-party action get's updated maliciously and starts to do damage.

Is it possible to hardcode the third-party actions to a particular commit? If so, we can manually review their current state and make sure they are not malicious. Once done, I'm confident that Apache Infra would be happy to help enable the respective third-party extensions. On the other hand, if any updates of the third-party action are automatically applied, I'm not sure if Apache Infra would be happy to take the risk. WDYT?

leezu avatar Jan 29 '21 18:01 leezu

Yes, that's what I understood from the thread there, but it sounded like there was more to it than that. In the last commit, I've pinned the actions in java.yml to full length commit SHA, as per these recommendations: https://docs.github.com/en/actions/learn-github-actions/security-hardening-for-github-actions

saudet avatar Jan 29 '21 20:01 saudet

Thank you. I opened https://issues.apache.org/jira/browse/INFRA-21361

leezu avatar Jan 29 '21 20:01 leezu