Implement multi-threaded Trainer.
As discussed offline, a multi-threaded Trainer should be implemented. The Trainer threads share the training data to compute on, and they run until all of the current batch has been processed.
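For illustration, here is a minimal sketch of how the shared batch could be split across Trainer threads, assuming a Java implementation. The `Instance` and `Model` types, the `MultiThreadedTrainer` class, and the `runBatch` method are hypothetical names for this sketch, not existing classes in the codebase.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical types: Instance is one training example, Model exposes computeGradient().
interface Instance { }
interface Model { void computeGradient(Instance instance); }

final class MultiThreadedTrainer {
  private final int numThreads;

  MultiThreadedTrainer(final int numThreads) {
    this.numThreads = numThreads;
  }

  /** Processes one mini-batch: each thread works on its own slice of the shared data. */
  void runBatch(final List<Instance> batch, final Model model) throws InterruptedException {
    final ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    final int sliceSize = (batch.size() + numThreads - 1) / numThreads;
    for (int t = 0; t < numThreads; t++) {
      final int begin = t * sliceSize;
      final int end = Math.min(begin + sliceSize, batch.size());
      if (begin >= end) {
        break;
      }
      final List<Instance> slice = batch.subList(begin, end);
      pool.submit(() -> {
        for (final Instance instance : slice) {
          model.computeGradient(instance); // gradient write policy is discussed below
        }
      });
    }
    pool.shutdown();
    // The batch is finished only when every thread has processed its slice.
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}
```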
When implementing the multi-threaded Trainer, we can consider two ways for threads to write their gradient updates:
- Synchronized fashion
- Hogwild-style (lock-free)
We should build both versions and compare their performance (a rough sketch of both write policies is below).
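As a rough comparison point, the two write policies could look like the following. This is only a sketch: the class names, the dense `double[]` parameter vector, and the sparse index/value update format are illustrative assumptions.

```java
// Two ways for Trainer threads to write gradient updates into a shared parameter vector.
final class GradientWriters {

  /** Synchronized version: a single lock guards the shared parameter vector. */
  static final class SynchronizedWriter {
    private final double[] params;

    SynchronizedWriter(final int dim) {
      this.params = new double[dim];
    }

    void apply(final int[] indices, final double[] values, final double stepSize) {
      synchronized (params) { // all threads serialize on this lock
        for (int i = 0; i < indices.length; i++) {
          params[indices[i]] -= stepSize * values[i];
        }
      }
    }
  }

  /** Hogwild-style version: lock-free writes; concurrent updates may race. */
  static final class HogwildWriter {
    private final double[] params;

    HogwildWriter(final int dim) {
      this.params = new double[dim];
    }

    void apply(final int[] indices, final double[] values, final double stepSize) {
      for (int i = 0; i < indices.length; i++) {
        params[indices[i]] -= stepSize * values[i]; // no lock; races are tolerated, as in Hogwild!
      }
    }
  }
}
```

The trade-off is the usual one: the synchronized version keeps every update consistent but serializes the writers, while the Hogwild-style version scales better and relies on updates being sparse enough that concurrent writes rarely touch the same entries.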
I'll start by sending a PR that enables multi-threading in MLR with the consistent (i.e., non-Hogwild) version first.