Ralph Kube
POWER9 support would be very helpful. Many HPC clusters are POWER9-based, and it would be great to use VS Code to edit files on these systems.
I've ported this pullback to CUDA.jl: https://gist.github.com/rkube/b17ef683409d76a3f01bcc590b85de6e Where would be a good place for that code?
Cool, that works. Here is a gist that incorporates the code from https://github.com/JuliaDiff/ChainRules.jl/pull/469 into a minimal working example: https://gist.github.com/rkube/b965267944115af7d13b3f00e7533572 This code gives the same results as comparable PyTorch code. But...
Those four reduction operators are all dispatched in a similar manner (see https://github.com/JuliaLang/julia/blob/bb2d8630f3aeb99c38c659a35ee9aa57bb71012a/base/reducedim.jl#L885), so I thought it made sense to handle them all the same way. Feel free to close this if it's not needed.
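For illustration only (the names below are hypothetical and not the actual Base internals), this is the kind of shared-dispatch pattern that link points at: a single loop generates all four reductions on top of the same `mapreduce` machinery, which is why treating them uniformly seemed natural.

```julia
# Illustrative sketch of the shared-dispatch pattern: one @eval loop
# defines all four reductions via the same mapreduce call, each with its
# own operator and neutral element. Names are made up for this example.
for (fname, op, init) in ((:mysum, :+, 0.0), (:myprod, :*, 1.0),
                          (:mymaximum, :max, -Inf), (:myminimum, :min, Inf))
    @eval $(fname)(x::AbstractArray; dims=:) =
        mapreduce(identity, $(op), x; dims=dims, init=$(init))
end

mysum([1.0 2.0; 3.0 4.0]; dims=1)   # 1×2 Matrix: [4.0 6.0]
mymaximum([1.0 2.0; 3.0 4.0])       # 4.0
```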
Thanks @niklasschmitz for providing the updated code. The runtime of the gradient is now about 100x that of gmres, whereas @antoine-levitt reported 20x. Do you know what could explain the difference? Also the number...
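For reference, here is a rough sketch of how one might measure that ratio. To keep the snippet self-contained it uses the dense `\` solve as a stand-in for gmres (Zygote differentiates `\` out of the box); with the gmres rule from the gist the same pattern applies, and the matrix size and names here are arbitrary, so the actual numbers will differ.

```julia
# Hedged sketch: time the forward solve and the reverse-mode gradient
# through it, then report their ratio. `\` stands in for gmres here.
using BenchmarkTools, LinearAlgebra, Zygote

n = 512
A = I + 0.1 * randn(n, n)        # well-conditioned test matrix
b = randn(n)

loss(b) = sum(abs2, A \ b)       # scalar loss through the linear solve

t_solve = @belapsed $A \ $b
t_grad  = @belapsed Zygote.gradient($loss, $b)
println("gradient / solve runtime ratio ≈ ", round(t_grad / t_solve, digits=1))
```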
I've replaced pathos with [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) and I'm getting the same errors.
To preprocess the dataset on traverse I need to limit the number of threads used for preprocessing; see https://github.com/PPPLDeepLearning/plasma-python/issues/82
Each traverse node has 2 processors, 16 cores per processor, and 4 threads per core, i.e. 128 hardware threads in total. When I run pre-processing with 126 threads it starts off well but throws errors after...
Summit and traverse are very similar, but not 100% identical.
```
(frnn) [rkube@traverse examples]$ lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    ...
```
They can be merged.