djl
Tensorflow & MxNet Sparse bugs
Description
- MxNet shape uses `int` and then casts results into `long`, which crashes when the data has a shape larger than `int`. This does not happen for the `PyTorch` variant of sparse matrices. It's a simple fix.
- The `PyTorch` `COO` matrix does not support `sum()`/`sum(axis)`, which are supported according to the PyTorch documentation.
Expected Behavior
All should work.
Error Message
It's easy to reproduce
How to Reproduce?
Can't shorten it down as it's in my pipeline with my data.
Steps to reproduce
MxNet
- Have a sparse matrix with more columns than `int` can cover.
- Ask for `getShape()`
- Crash
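The failure mode described in the MxNet steps can be illustrated without any engine at all. The following is a minimal sketch, not the actual DJL or MXNet code: the class `ShapeOverflowDemo` and both methods are hypothetical names, and the sketch only assumes (as the description states) that a dimension passes through a 32-bit `int` before being widened to `long`.

```java
// Hypothetical illustration only; this is not code from DJL or MXNet.
class ShapeOverflowDemo {

    // Mimics a layer that stores a dimension as a 32-bit int and then
    // widens it to long. The widening cannot recover the lost high bits.
    static long readDimViaInt(long actualDim) {
        int narrowed = (int) actualDim; // truncates to the low 32 bits
        return narrowed;                // implicit int -> long widening
    }

    // Keeps the dimension as a 64-bit value end to end.
    static long readDimViaLong(long actualDim) {
        return actualDim;
    }

    public static void main(String[] args) {
        // One column more than an int can represent.
        long columns = Integer.MAX_VALUE + 1L;

        System.out.println("via int:  " + readDimViaInt(columns));
        System.out.println("via long: " + readDimViaLong(columns));
    }
}
```

With a column count of `Integer.MAX_VALUE + 1L`, the `int` path yields a negative dimension, which is the kind of value that would plausibly make a later shape check or allocation fail.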
PyTorch
- Try to call `.sum()` on a `COO` matrix
- Fails
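For reference, this is what `sum()` and `sum(axis)` would be expected to compute on a COO matrix. It is a plain-Java sketch under stated assumptions, not DJL's or PyTorch's implementation: `CooSumSketch` and its fields are hypothetical, and the matrix is assumed to be two-dimensional with each stored entry given as a (row, column, value) triple.

```java
// Hypothetical minimal COO container; not part of DJL or PyTorch.
class CooSumSketch {
    final int[] rows;      // row index of each stored entry
    final int[] cols;      // column index of each stored entry
    final double[] values; // value of each stored entry
    final int numRows;
    final int numCols;

    CooSumSketch(int[] rows, int[] cols, double[] values, int numRows, int numCols) {
        this.rows = rows;
        this.cols = cols;
        this.values = values;
        this.numRows = numRows;
        this.numCols = numCols;
    }

    // Sum of all entries: implicit zeros contribute nothing,
    // so only the stored values need to be visited.
    double sum() {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // axis 0 collapses the rows (one total per column);
    // axis 1 collapses the columns (one total per row).
    double[] sum(int axis) {
        if (axis != 0 && axis != 1) {
            throw new IllegalArgumentException("axis must be 0 or 1");
        }
        double[] out = new double[axis == 0 ? numCols : numRows];
        for (int k = 0; k < values.length; k++) {
            out[axis == 0 ? cols[k] : rows[k]] += values[k];
        }
        return out;
    }

    public static void main(String[] args) {
        // 2x3 matrix with entries (0,0)=1, (0,2)=2, (1,1)=3.
        CooSumSketch m = new CooSumSketch(
                new int[] {0, 0, 1}, new int[] {0, 2, 1},
                new double[] {1.0, 2.0, 3.0}, 2, 3);
        System.out.println(m.sum());
        System.out.println(java.util.Arrays.toString(m.sum(0)));
        System.out.println(java.util.Arrays.toString(m.sum(1)));
    }
}
```

Both reductions only touch the stored entries, so neither requires converting the matrix to a dense layout.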
What have you tried to solve it?
- Tried using MxNet, which exposed a new crash
- Tried using Tensorflow, which does not support sparsity (even though TF has `SparseTensor`)
Environment Info
Using Windows 10
@Lundez Thanks for reporting this issue, will take a look.
@Lundez I created a PR to address the MXNet large tensor issue: #1183. Unfortunately, `getShape()` will still cause a crash. By default, MXNet is compiled without large tensor support for performance reasons. You have to manually compile MXNet with the `USE_INT64_TENSOR_SIZE=1` flag. You can then set the `MXNET_LIBRARY_PATH` environment variable to load your customized `libmxnet.so`. See: http://docs.djl.ai/docs/development/troubleshooting.html#4-how-to-run-djl-using-other-versions-of-apache-mxnet
@frankfliu I see. Thank you for the assistance! 🤗
@frankfliu did you ever get around to validating the PyTorch `COO` matrix `.sum()` issue?