glint
Yarn support
Hi rjagerman,
Could you please review this pull request?
All of the code has been tested live in my cluster environment.
Any questions are welcome, and I appreciate your previous work.
Thanks
@rjagerman Could you please review the change? Thanks.
Hi @batizty,
Thanks! This looks really nice! I haven't had the time yet to review it due to several projects and deadlines at work. I hope to review it some time next week.
Hi @rjagerman,
Understood.
The YARN support feature is used at weibo.com (you may or may not have heard of it; it is a top-5 website in China, similar to Twitter but with more users in China), and it works well.
I have also developed some other features for Glint, including additional operations such as Save and Load, which can be used to quickly store and read models in HDFS. I believe they would be useful for most Glint users working on machine learning with large vectors and matrices.
If possible, I would like to become a contributor to Glint, because it is very simple and stable for large-scale machine learning.
Thank you for your work on Glint.
Still haven't found the time to do it, too many deadlines unfortunately :-( I'll let you know when I get around to it.
Got it.
Later I will send out another patch for Glint that lets the parameter nodes store all parameters into HDFS independently. In my earlier tests, pulling an entire weight vector/matrix whose size is over 100m took more than 30 minutes. After I added a 'Save' operation that stores the weights directly from the parameter nodes, it took less than 1 minute. I believe it will be useful for others who work on huge models.
Thanks.
Hi @batizty, I want to use Glint to store weights for machine learning algorithms, but it is too difficult to save the weights to a local file or an HDFS file. Fortunately, I found that you had run into this problem and solved it. Could you please publish your branch? Thanks.
Hi @baukloze, sorry, I forgot about this issue.
Could you please wait one or two days? I will send out my modifications ASAP. I hope you like them.
By the way, @rjagerman, my colleagues and I have implemented basic ML algorithms on top of Glint, but it is not stable enough yet. When our data size reaches 1000B and the matrix/vector width reaches 500B, the heavy traffic load causes some of the Akka nodes to enter the Quarantined state. Do you have any suggestions or methods for fixing this problem?
@batizty OK, thanks.