Is it possible to train a PyTorch SSD model on an M1 Mac - or is this not yet implemented? PtNDArrayEx.multiBoxPrior(PtNDArrayEx.java:697) UnsupportedOperationException: Not implemented
Description
When running TrainPikachuTest on an M1 Mac, I get the error UnsupportedOperationException: Not implemented
Expected Behavior
The TrainPikachuTest runs as expected and a model is produced.
Error Message
Exception in thread "main" java.lang.UnsupportedOperationException: Not implemented
at ai.djl.pytorch.engine.PtNDArrayEx.multiBoxPrior(PtNDArrayEx.java:697)
at ai.djl.modality.cv.MultiBoxPrior.generateAnchorBoxes(MultiBoxPrior.java:68)
at ai.djl.basicmodelzoo.cv.object_detection.ssd.SingleShotDetection.forwardInternal(SingleShotDetection.java:84)
at ai.djl.nn.AbstractBaseBlock.forwardInternal(AbstractBaseBlock.java:128)
at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:93)
at ai.djl.training.Trainer.forward(Trainer.java:189)
at ai.djl.training.EasyTrain.trainSplit(EasyTrain.java:122)
at ai.djl.training.EasyTrain.trainBatch(EasyTrain.java:110)
at ai.djl.training.EasyTrain.fit(EasyTrain.java:58)
at ai.djl.examples.training.TrainPikachu.runExample(TrainPikachu.java:93)
at ai.djl.examples.training.TrainPikachuTest.testDetection(TrainPikachuTest.java:52)
at ai.djl.examples.training.TrainPikachuTest.main(TrainPikachuTest.java:30)
How to Reproduce?
Run the class TrainPikachuTest on an M1 Mac
Steps to reproduce
- Run the TrainPikachuTest class with DJL_DEFAULT_ENGINE=PyTorch
What have you tried to solve it?
- Debugging through the code and looking at the implementation of the class.
- Looking for other examples of training with SingleShotDetection (didn't find any).
Environment Info
DJL_DEFAULT_ENGINE=PyTorch
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home
MXNet has several helper operators specific to SSD and they were used as part of the DJL SSD model you are using. Unfortunately, MXNet doesn't support M1 and the model doesn't run on PyTorch.
If you are interested in contributing here, you could build an implementation of SSD that does not rely on those operators or you could add the missing implementations as part of PtNDArrayEx.
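For anyone sizing up the missing piece, here is a rough plain-Java sketch of what an engine-independent anchor-box generator could look like. It follows my reading of MXNet's contrib MultiBoxPrior operator: every size is paired with the first ratio, then the first size is paired with each remaining ratio. The class name AnchorBoxSketch and the method signature are hypothetical and are not DJL API. It assumes the default steps (1/height, 1/width) and offsets (0.5, 0.5) and leaves out clipping, so it should be checked against the MXNet source before being relied on.

```java
/** Hypothetical, engine-independent sketch of SSD anchor-box generation (not DJL API). */
final class AnchorBoxSketch {

    /**
     * Returns one row {xmin, ymin, xmax, ymax} per anchor, normalized to the image size.
     * Assumes default steps (1/height, 1/width) and offsets (0.5, 0.5); no clipping.
     * Both sizes and ratios must contain at least one value.
     */
    static float[][] multiBoxPrior(int height, int width, float[] sizes, float[] ratios) {
        int perCell = sizes.length + ratios.length - 1;
        float[][] anchors = new float[height * width * perCell][];
        float stepY = 1f / height;
        float stepX = 1f / width;
        int count = 0;
        for (int r = 0; r < height; r++) {
            float cy = (r + 0.5f) * stepY;
            for (int c = 0; c < width; c++) {
                float cx = (c + 0.5f) * stepX;
                // Every size paired with the first ratio.
                for (float size : sizes) {
                    anchors[count++] = box(cx, cy, size, ratios[0], height, width);
                }
                // The first size paired with each remaining ratio.
                for (int j = 1; j < ratios.length; j++) {
                    anchors[count++] = box(cx, cy, sizes[0], ratios[j], height, width);
                }
            }
        }
        return anchors;
    }

    /** Builds one corner-format box around (cx, cy) for the given size and aspect ratio. */
    private static float[] box(float cx, float cy, float size, float ratio, int height, int width) {
        float sqrtRatio = (float) Math.sqrt(ratio);
        // The height/width factor keeps boxes square in pixel space on non-square feature maps.
        float halfW = size * height / width * sqrtRatio / 2f;
        float halfH = size / sqrtRatio / 2f;
        return new float[] {cx - halfW, cy - halfH, cx + halfW, cy + halfH};
    }
}
```

A real PtNDArrayEx implementation would have to produce an NDArray rather than a float[][], and would need to honor the steps, offsets, and clip options that this sketch skips.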
@zachgk thanks for getting back to me. Thanks for creating an opportunity to contribute.
I'm sizing it up - and working out a specification and way to measure if it is working. In terms of a specification - it seems to be this class here: https://github.com/apache/mxnet/blob/master/src/operator/contrib/multibox_prior.cc Please help me out if you know a better one.
In terms of measuring if it is working - I'm looking in here - and not finding anything that corresponds: https://github.com/apache/mxnet/tree/master/tests/cpp/operator
Can you help me out with how you would measure a working implementation?
Probably the easiest way to test whether it is working is to use a hard-coded value for inputs and outputs. We have some examples in OptimizerTest.
So, find some known sample data and put it into the integration suite so that it runs on all engines. This ensures that all engines have matching behavior (including between the MXNet version and your new implementation). It also ensures that the behavior won't change silently, because any change would also require changing the values in the test.
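As an illustration of the hard-coded approach, a minimal plain-Java sketch might look like the following. The class name HardCodedAnchorCheck, the EXPECTED values, and the tolerance are all assumptions made for this example. The expected box was derived by hand for a 1x1 feature map with size 0.5 and ratio 1; in a real test the values would be recorded from the MXNet engine, and the comparison would use the helpers the DJL integration suite already provides.

```java
/** Hypothetical sketch: compare an engine's output against hard-coded expected values. */
final class HardCodedAnchorCheck {

    // Assumed expected anchor {xmin, ymin, xmax, ymax} for a 1x1 feature map,
    // size 0.5, ratio 1. Hand-derived here; record real values from MXNet instead.
    static final float[] EXPECTED = {0.25f, 0.25f, 0.75f, 0.75f};

    // Assumed tolerance to absorb small floating-point differences between engines.
    static final float TOLERANCE = 1e-5f;

    /** Returns true when both arrays have the same length and agree element-wise within tol. */
    static boolean allClose(float[] actual, float[] expected, float tol) {
        if (actual.length != expected.length) {
            return false;
        }
        for (int i = 0; i < actual.length; i++) {
            if (Math.abs(actual[i] - expected[i]) > tol) {
                return false;
            }
        }
        return true;
    }

    /** Fails with a readable message if the values produced by an engine drift from EXPECTED. */
    static void check(String engineName, float[] actual) {
        if (!allClose(actual, EXPECTED, TOLERANCE)) {
            throw new AssertionError(
                    engineName + " produced " + java.util.Arrays.toString(actual)
                            + ", expected " + java.util.Arrays.toString(EXPECTED));
        }
    }
}
```

Because the expected numbers live in the test itself, every engine is held to the same values, and any later change in behavior shows up as a test failure.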
I'll get back to you - I'm writing a test.
I've done a pull request on this. https://github.com/deepjavalibrary/djl/pull/2715 The two different unit tests nearly match up, but not quite - so I'm asking for some help on this.