creg
use OpenMP to parallelize learning
During learning, computing the loss and its gradient with respect to the parameters can be quite expensive, especially with large numbers of training instances or features. OpenMP (http://openmp.org/wp/), which is supported by default with g++, could easily be used to parallelize this computation. Basically, all loops of the form "for (unsigned i = 0; i < training.size(); ++i)" are good candidates for parallelization. From reading about OpenMP, such "reductions" will have to be implemented by giving each thread its own gradient buffer and then summing the buffers at the end (although this summing could also be parallelized).
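A rough, self-contained sketch of the per-thread-buffer approach follows; it is not creg's actual code. The TrainingInstance struct, the squared-error objective, and the function names are illustrative stand-ins only, and creg's real loss and feature representation would replace them. Compile with g++ -std=c++11 -fopenmp.

#include <omp.h>
#include <cstdio>
#include <vector>

struct TrainingInstance {       // hypothetical dense instance
  std::vector<double> x;        // feature values
  double y;                     // target
};

// Returns the total loss and fills *grad, parallelizing over instances.
double LossAndGradient(const std::vector<TrainingInstance>& training,
                       const std::vector<double>& w,
                       std::vector<double>* grad) {
  const size_t d = w.size();
  grad->assign(d, 0.0);
  double loss = 0.0;
  #pragma omp parallel
  {
    std::vector<double> local_grad(d, 0.0);  // per-thread gradient buffer
    double local_loss = 0.0;
    #pragma omp for nowait
    for (long i = 0; i < (long)training.size(); ++i) {
      const TrainingInstance& inst = training[i];
      double pred = 0.0;
      for (size_t f = 0; f < d; ++f) pred += w[f] * inst.x[f];
      const double err = pred - inst.y;
      local_loss += 0.5 * err * err;          // placeholder squared-error loss
      for (size_t f = 0; f < d; ++f) local_grad[f] += err * inst.x[f];
    }
    #pragma omp critical                      // sum the per-thread buffers
    {
      loss += local_loss;
      for (size_t f = 0; f < d; ++f) (*grad)[f] += local_grad[f];
    }
  }
  return loss;
}

int main() {
  std::vector<TrainingInstance> training = {{{1.0, 2.0}, 3.0},
                                            {{2.0, 0.5}, 1.0}};
  std::vector<double> w = {0.1, -0.2};
  std::vector<double> grad;
  const double loss = LossAndGradient(training, w, &grad);
  std::printf("loss=%g grad=[%g, %g]\n", loss, grad[0], grad[1]);
  return 0;
}

The final summation is done inside an omp critical section for simplicity; with many threads and very high-dimensional gradients, that merge step could itself be parallelized (e.g., a tree reduction or parallelizing over feature indices), as noted above.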