Anders Boesen Lindbo Larsen comments

Results: 110 comments of Anders Boesen Lindbo Larsen

Unable to produce images > 800²px on cards with more than 4GB RAM

Ah, great job finding it finally! I admire your persistence. :) I will try to look into `kernel_win2img()` at a later point. It seems like this is the problem.

Unable to produce images > 800²px on cards with more than 4GB RAM

If you install with `sudo` you need to export the environment variables as well. This is done with `sudo -E`.

Unable to produce images > 800²px on cards with more than 4GB RAM

@mirzman: what is the output of `ldd `? I think you need to update the environment variable `LD_LIBRARY_PATH` to point to the correct libraries.

Feature request for `predict`

I would prefer not to check array dimensionality in convnet layers for every call to `fprop()`. How about raising an error in `_setup()` if the array is not 4D?
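
A minimal sketch of that suggestion, assuming a layer whose `_setup()` receives the input shape once before any call to `fprop()`. The class name `ConvLayerSketch`, the `input_shape` argument, and the pass-through `fprop()` body are hypothetical and only illustrate where the check would live; the actual layer interface may differ.

```python
class ConvLayerSketch:
    """Hypothetical convnet layer; only _setup() and fprop() are names from the discussion."""

    def _setup(self, input_shape):
        # Validate the dimensionality once, when the layer is configured.
        if len(input_shape) != 4:
            raise ValueError(
                "convnet layers expect 4D input, got shape %r" % (tuple(input_shape),)
            )
        self.input_shape = tuple(input_shape)

    def fprop(self, x):
        # No per-call dimensionality check here; _setup() has already done it.
        # Placeholder: a real layer would run the convolution on x.
        return x
```

With this layout, a 2D array is rejected when the network is set up rather than on every forward pass.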

Update math.py to include tanh elementwise op

Hey, thank you for the proposal! I have intentionally left out `cudarray.nnet.tanh()` because NumPy exposes it at the root level. Therefore, you can find it as `cudarray.tanh()`. The implementation is...

Update math.py to include tanh elementwise op

Hey, I have just pushed a commit. In this new version of CUDArray you can check the backend by printing `cudarray._backend`. If this variable is `'numpy'` instead of `'cuda'`, something...
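
The backend check described above could look roughly like this. The helper name `report_backend` is hypothetical, and the guarded import and `getattr` fallback are assumptions added so the snippet also runs on machines where CUDArray is not installed or predates the `_backend` attribute.

```python
def report_backend():
    """Return the CUDArray backend name, or None if it cannot be determined."""
    try:
        import cudarray
    except ImportError:
        # CUDArray is not installed in this environment.
        return None
    # `_backend` is the attribute mentioned in the comment; older versions may lack it.
    return getattr(cudarray, "_backend", None)


backend = report_backend()
print("CUDArray backend:", backend)
```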

fp16

Good question! FP16 is not supported at the moment. I expect it would be fairly straightforward (but tedious) to add FP16 to all the CUDA kernels. Maybe it's easier to...

fp16

I think all kernels have types as template parameters. Thus, you can add float16 support by instantiating template functions with float16. Alternatively you can try to change all occurrences of...

Any way to use MKL speedup in numpy?

For fully-connected architectures, you should make sure that MKL parallelizes the matrix multiplications across multiple cores. Maybe it isn't configured correctly? For convolutional architectures you are out of luck. The...
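
One quick thing to rule out is a thread limit set through the environment. The sketch below only inspects `MKL_NUM_THREADS` and `OMP_NUM_THREADS`, which MKL reads at startup; a value of `1` restricts it to a single thread. The helper name `thread_settings` is hypothetical, and this does not verify that NumPy is actually linked against MKL.

```python
import os

# Environment variables that can cap the number of threads MKL uses.
THREAD_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS")


def thread_settings(environ=os.environ):
    """Return the thread-related variables and their values (None if unset)."""
    return {name: environ.get(name) for name in THREAD_VARS}


print("CPU cores visible to Python:", os.cpu_count())
for name, value in thread_settings().items():
    print(name, "=", value if value is not None else "(unset)")
```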

Way to run operations on CPU, thusly RAM instead of GPU/VRAM?

The CPU implementation of the convolution operation is too slow for any practical use. You would have to implement something faster first.