mxnet
mxnet copied to clipboard
[FEATURE] Add Java build using JavaCPP to map the C API
Description
Based on the discussion in issue #17783, I am proposing here as base for a new Java API to use JavaCPP, which takes care of building MXNet from source, generating appropriate low-level wrapper classes and JNI code for the C API, as well as bundle the resulting binaries in artifacts. This build was adapted, ported, and upgraded to Gradle from the JavaCPP Presets for MXNet 1.x available here:
- https://github.com/bytedeco/javacpp-presets/tree/master/mxnet
This PR also contains a new "java" workflow for GitHub Actions that builds and deploys binaries for Linux (CentOS 7), Mac, and Windows, with and without CUDA (although the build currently fails with CUDA 11.2 on Windows), which produces files as per those previously deployed here:
- https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/2.0-SNAPSHOT/
(To have this deploy artifacts as part of org.apache, we need to enter credentials under the CI_DEPLOY_USERNAME and CI_DEPLOY_PASSWORD secrets for GitHub Actions.)
Checklist
Essentials
- [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage
- [x] Code is well-documented
Changes
- [x] Configurations to build and map the C API to Java with Gradle and JavaCPP
- [x] An initial simple test that runs as part of the CI on GitHub Actions under the "java" workflow
- [x] API documentation that gets deployed automatically from Javadoc
Comments
- The generated wrapper classes are not currently being pushed as source to the repository. Should they be? We can see what that looks like here: https://github.com/bytedeco/javacpp-presets/tree/master/mxnet/src/gen/java/org/bytedeco/mxnet
- I believe the code is generally straightforward to follow, so I have not added a lot of comments, but please let me know which parts may need to be better documented, if any.
- There probably should be some small example somewhere that developers of the high-level API could use as starting point. But where should we put that? What should it contain? I'm thinking something like this: https://github.com/bytedeco/javacpp-presets/tree/master/mxnet#sample-usage
Hey @saudet , Thanks for submitting the PR All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]
CI supported jobs: [unix-cpu, unix-gpu, miscellaneous, website, windows-cpu, edge, sanity, centos-cpu, clang, centos-gpu, windows-gpu]
Note: Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
It looks like we're not allowed to use almost all third-party actions: https://github.com/apache/incubator-mxnet/actions/runs/514790755 Does this mean we have to copy/paste essentially everything in this repository?
Does this mean we have to copy/paste essentially everything in this repository?
Would it be too complicated to copy the scripts from https://github.com/bytedeco/javacpp-presets/blob/master/.github/actions/deploy-centos/action.yml to this repo? Otherwise, it might be required that apache infra approves the bytedeco/javacpp-presets github action?
Does this mean we have to copy/paste essentially everything in this repository?
Would it be too complicated to copy the scripts from https://github.com/bytedeco/javacpp-presets/blob/master/.github/actions/deploy-centos/action.yml to this repo? Otherwise, it might be required that apache infra approves the
bytedeco/javacpp-presetsgithub action?
I could copy those scripts here, yes, and we could also copy the source code of all the other dependencies, such as OpenBLAS, GFortran, etc in this repository, but that's not being done. What's the rationale for allowing third-party software that isn't related to GitHub Actions, but not allowing third-party software that is related to GitHub Actions? For example, what about https://github.com/al-cheb/configure-pagefile-action/? I didn't write that one, and although we could also copy all the files here, it's not a contribution from me. How does Apache's IP team deal with that?
What's the rationale for allowing third-party software that isn't related to GitHub Actions, but not allowing third-party software that is related to GitHub Actions?
If it doesn't work, it's may just be due some default setting. You can advise what Apache Infra needs to do and we can ask them to do it
Hum, I'm assuming it has something to do with this, whose details are being kept secret for the moment for obvious reasons:
Notice: December 27, 2020: We only allow Actions that are official "Made by GitHub" or local to the Apache org on GitHub, to address a potential security vulnerability. This is an incident-related policy change. We are researching the situation, and the policy may evolve based on what we learn.
It looks like we can find out about the details of this on the internal infra mailing list: http://mail-archives.apache.org/mod_mbox/www-builds/202012.mbox/%3cCAOtwSKFnB6POd2N=YpbDBJdeRgPHZDX8ZKE11rBgUW8eSYBrig@mail.gmail.com%3e If you could take a look at that and see what this is all about and maybe how we could work around this without importing everything in the Apache domain, it would be helpful.
The last message on the thread provides some details: http://mail-archives.apache.org/mod_mbox/www-builds/202012.mbox/%3cCAMdOYcJaprG2Wn6nDygx3vdDKpeKVczGbRSypSHQ7hjBOCT_uQ@mail.gmail.com%3e I'm assuming that using a SHA instead of a TAG solves this, so why would infra block everything? Is it just blocked by default, but specific repositories can still be enabled by the maintainers?
Thank you looking this up! The message refers to a serious security vulnerability in Github Actions that gives the job complete read and write access to the underlying git repository (if GITHUB_TOKEN is present). I'm aware of two possible problems with that: 1) Executing untrusted code via pull_request_target that can then use GITHUB_TOKEN and do damage 2) The third-party action get's updated maliciously and starts to do damage.
Is it possible to hardcode the third-party actions to a particular commit? If so, we can manually review their current state and make sure they are not malicious. Once done, I'm confident that Apache Infra would be happy to help enable the respective third-party extensions. On the other hand, if any updates of the third-party action are automatically applied, I'm not sure if Apache Infra would be happy to take the risk. WDYT?
Yes, that's what I understood from the thread there, but it sounded like there was more to it than that. In the last commit, I've pinned the actions in java.yml to full length commit SHA, as per these recommendations: https://docs.github.com/en/actions/learn-github-actions/security-hardening-for-github-actions
Thank you. I opened https://issues.apache.org/jira/browse/INFRA-21361