Yunseong Lee
The main reason for this failure is that our code depends on REEF's latest `SNAPSHOT`. Whenever we need new features in REEF, we have to manually build REEF and include...
IMHO, we should find a way to rebuild REEF automatically. I think we can add another project to Jenkins that tests and builds Cay periodically (e.g., daily), and there...
Thanks for the report! We are probably creating too many threads; I think it's time to look into the problem and come up with better thread management. I'll...
When we implement the multi-threaded Trainer, we can consider two ways for threads to write their gradient updates: 1) synchronized and 2) Hogwild-style (lock-free). We should build both versions and compare...
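To make the two options concrete, here is a minimal sketch in Python (not the project's actual Trainer code). The `train` function, its parameters, and the constant unit gradient are all hypothetical stand-ins chosen for illustration; a real Trainer would compute gradients from data.

```python
import threading


def train(num_threads, steps, dim, use_lock):
    """Run worker threads that each add `steps` unit gradients to every weight."""
    weights = [0.0] * dim
    lock = threading.Lock()

    def worker():
        for _ in range(steps):
            gradient = [1.0] * dim  # stand-in for a real gradient computation
            if use_lock:
                # Synchronized: the whole update is applied under one lock.
                with lock:
                    for i, g in enumerate(gradient):
                        weights[i] += g
            else:
                # Hogwild-style: no lock, so concurrent read-modify-write
                # steps may overwrite each other and lose updates.
                for i, g in enumerate(gradient):
                    weights[i] += g

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return weights


synchronized = train(num_threads=4, steps=1000, dim=8, use_lock=True)
hogwild = train(num_threads=4, steps=1000, dim=8, use_lock=False)
print("synchronized:", synchronized[0])
print("hogwild:", hogwild[0])
```

The synchronized run always applies every update, while the lock-free run may end up below the full total if writes collide. Whether that loss matters in practice is exactly what a comparison of the two versions would need to measure.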
I'll start by sending a PR that enables multi-threading in MLR, with the consistent (i.e., non-Hogwild) version first.
@gyeongin Yes, both results were very similar.
Totally agreed! I'll prepare a draft and share it in this thread. Thanks for the great suggestion!
I'm sharing the draft. You can find the original file [here](https://docs.google.com/presentation/d/1j3X9bWRjzarhjwlOI1S7mHkAhUQ1uhQCORKyioRYdfc/edit#slide=id.p), and I would appreciate any comments or feedback. Thanks!
We can first implement the simplest version, which keeps all the updates until aggregation is requested. Then we can improve it by aggregating beforehand; for example, when the number...
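A rough sketch of the two stages, in Python for brevity. The class names are hypothetical, and the `threshold` on the number of buffered updates is only my assumption about when to aggregate early; the actual trigger can be whatever we decide on.

```python
class LazyAggregator:
    """Keeps every update and sums them only when aggregate() is called."""

    def __init__(self):
        self.pending = []

    def add(self, update):
        self.pending.append(update)

    def aggregate(self):
        total = sum(self.pending)
        self.pending.clear()
        return total


class EagerAggregator:
    """Folds buffered updates into a running sum once `threshold` are pending."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical trigger, an assumption
        self.pending = []
        self.partial = 0.0

    def add(self, update):
        self.pending.append(update)
        if len(self.pending) >= self.threshold:
            # Aggregate beforehand so memory use stays bounded.
            self.partial += sum(self.pending)
            self.pending.clear()

    def aggregate(self):
        total = self.partial + sum(self.pending)
        self.partial = 0.0
        self.pending.clear()
        return total


updates = [1.0, 2.0, 3.0, 4.0, 5.0]
lazy = LazyAggregator()
eager = EagerAggregator(threshold=2)
for u in updates:
    lazy.add(u)
    eager.add(u)
lazy_total = lazy.aggregate()
eager_total = eager.aggregate()
print(lazy_total, eager_total)
```

Both variants return the same total; the eager one just never holds more than `threshold` raw updates at a time.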
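Two strategies are possible for checkpointing: 1) stop-the-world and 2) asynchronous. There is a clear trade-off between performance and correctness: stopping the world favors a consistent checkpoint, while checkpointing asynchronously favors performance.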
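To illustrate the trade-off, here is a small Python sketch under my own assumptions: a hypothetical `Model` whose two entries are always updated together, a stop-the-world checkpoint that blocks updates while copying, and an asynchronous one that copies without blocking. This is only one way to read "asynchronous"; a real design could instead copy a snapshot and write it out in the background.

```python
import copy
import threading


class Model:
    def __init__(self):
        self.lock = threading.Lock()
        # Invariant maintained by update(): both entries are always equal.
        self.state = {"a": 0, "b": 0}

    def update(self):
        with self.lock:
            self.state["a"] += 1
            self.state["b"] += 1

    def checkpoint_stop_the_world(self):
        # Blocks all updates while copying, so the copy is consistent.
        with self.lock:
            return copy.deepcopy(self.state)

    def checkpoint_async(self):
        # Copies without blocking updates; the copy may mix old and new values.
        return {"a": self.state["a"], "b": self.state["b"]}


model = Model()
updater = threading.Thread(
    target=lambda: [model.update() for _ in range(10000)]
)
updater.start()
stw_snapshots = [model.checkpoint_stop_the_world() for _ in range(100)]
async_snapshots = [model.checkpoint_async() for _ in range(100)]
updater.join()
print("stop-the-world consistent:", all(s["a"] == s["b"] for s in stw_snapshots))
```

The stop-the-world snapshots always satisfy the invariant but stall the updater each time; the asynchronous ones never stall it but carry no such guarantee.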