wormhole
Readable model dump.
Hi,
Currently all learning methods in wormhole save the resulting models in binary format. This works well when solving machine learning competitions, i.e., when both training and prediction use wormhole components. However, in more general cases where we train the models offline and want to apply them in an online component (in our case a server running on the JVM), the binary format is inconvenient. So a readable model output in text format (or another exchangeable format such as protobuf) would be very welcome.
Thanks, Gang
I produce a readable dump of the DiFacto model by parsing the binary file saved via SaveModel, i.e., the output of Save in KVStore and of IVal AdaGradEntry in DiFacto.
Ideally we could abstract the Entry data and the internal storage in KVStore using protobuf. This would keep the IO implementations clean and make our model results exchangeable across languages and platforms.
So my proposal above is mainly related to ps-lite. I'll try it out and make a WIP pull request there.
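To make the idea above a bit more concrete, here is a rough sketch of what such a schema could look like. This is purely illustrative: the message name ModelEntry and the fields key, weight, sqr_grad, and V are hypothetical placeholders and would have to match whatever the DiFacto AdaGradEntry actually stores.

  syntax = "proto2";

  // Hypothetical description of one model entry as stored in KVStore.
  message ModelEntry {
    optional uint64 key      = 1;  // feature id
    optional float  weight   = 2;  // learned weight w
    optional float  sqr_grad = 3;  // AdaGrad squared-gradient accumulator
    repeated float  V        = 4;  // FM embedding vector, if any
  }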
yeah, that's a good suggestion.
i'll add a tool to convert the binary model into an ascii format.
at the same time, i'm trying to refactor fm into a separate repo called dmlc/difacto, with two major changes
- having a single-machine multi-threaded implementation, which should easily process <100GB of data on a single machine, and will also make it easy to add python/R bindings
- switching to the dev branch of ps-lite, which is a simplified version of the master branch. mxnet is using it now and it works well
i hope to get it done in a week.
Very nice, looking forward to the changes :)
Thanks and looking forward to the changes. : )
Any update on this?
I'm also interested in the refactoring of ps-lite. It hasn't been updated for two months. Is it finalized?
@BaiGang "I address the readable dump of DiFacto model by parsing the binary file saved via SaveModel". Can you share me the parsing method? Thanks.
see dump.cc
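In case it helps others, here is a minimal standalone sketch of the kind of conversion such a tool performs. It assumes, purely for illustration, that the binary file is a flat sequence of (uint64 key, float weight) records; the real on-disk layout is whatever SaveModel writes, so treat dump.cc as the authoritative reference.

  #include <cstdint>
  #include <cstdio>
  #include <fstream>
  #include <iostream>

  // Convert an assumed binary dump of (uint64 key, float weight) records
  // into tab-separated ASCII lines "key<TAB>weight".
  int main(int argc, char** argv) {
    if (argc != 2) {
      std::cerr << "usage: " << argv[0] << " model.bin\n";
      return 1;
    }
    std::ifstream in(argv[1], std::ios::binary);
    uint64_t key;
    float weight;
    while (in.read(reinterpret_cast<char*>(&key), sizeof(key)) &&
           in.read(reinterpret_cast<char*>(&weight), sizeof(weight))) {
      std::printf("%llu\t%g\n", static_cast<unsigned long long>(key), weight);
    }
    return 0;
  }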
@BaiGang @mli
When I dump the model to text format, I found that the original feature ids are converted into new ids (large numbers). If I want to keep the original feature ids in the model, how do I make that work?
Thanks!
There is a revert-key-id function; I guess it is called in the data reader.
@toughJack Maybe you should change the code in localizer.h like this:
else if (sizeof(I) == 8) {
#pragma omp parallel for num_threads(nt_)
  for (size_t i = 0; i < idx_size; ++i) {
    // keep the original feature id instead of the byte-reversed one
    // pair_[i].k = ReverseBytes(blk.index[i]);
    pair_[i].k = blk.index[i];
    pair_[i].i = i;
  }
}
@formath @toughJack see issues/8
just comment out the line pair_[i].k = ReverseBytes(blk.index[i]);
that will make the servers' key ranges imbalanced if your max key is small
if you manually set max_key, the servers will only partition that key range
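For anyone wondering why this matters: with the byte reversal disabled, small dense feature ids all fall into the low end of the uint64 key space, so only the server owning that range gets load unless max_key is set accordingly. Reversing the byte order spreads consecutive small keys across the whole range. A sketch of such a reversal (not necessarily the exact ReverseBytes used in wormhole/ps-lite):

  #include <cstdint>

  // Reverse the byte order of a 64-bit key so that small, dense keys are
  // spread across the full uint64 range, keeping server key ranges balanced.
  inline uint64_t ReverseBytesSketch(uint64_t k) {
    k = ((k & 0x00ff00ff00ff00ffULL) << 8)  | ((k >> 8)  & 0x00ff00ff00ff00ffULL);
    k = ((k & 0x0000ffff0000ffffULL) << 16) | ((k >> 16) & 0x0000ffff0000ffffULL);
    k = (k << 32) | (k >> 32);
    return k;
  }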
@mli yes:)
@CNevd Good suggestion. I always generate balanced uint64 feature ids offline, so I missed that. If the max key is small, setting max_key is indeed the right approach.
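(For reference, one common way to get such balanced ids, though not necessarily what is done here, is to hash the raw feature string into the full uint64 range, e.g. with FNV-1a:)

  #include <cstdint>
  #include <string>

  // FNV-1a hash: maps raw feature strings to ids spread roughly uniformly
  // over the uint64 range, so server key ranges stay balanced.
  inline uint64_t FeatureId(const std::string& raw) {
    uint64_t h = 1469598103934665603ULL;   // FNV offset basis
    for (unsigned char c : raw) {
      h ^= c;
      h *= 1099511628211ULL;               // FNV prime
    }
    return h;
  }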
@mli I noticed that you mentioned a single-machine multi-threaded implementation of FM: "having a single-machine multi-threaded implementation, which should easily process <100GB of data on a single machine, and will also make it easy to add python/R bindings". I did not find any manual for the single-machine multi-threaded version. Does it work? If so, how do I set the relevant parameters and run it? Thanks
- just run multiple workers on the same machine (an example invocation is sketched below)
- try the lbfgs implemented in dmlc/difacto
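For the first point, an invocation along these lines should work, assuming the dmlc-core local tracker with -n workers and -s servers; double-check the flags against your checkout, and note that the binary and config names below are placeholders for your own build.

  # run 4 workers and 2 servers on one machine
  ../dmlc-core/tracker/dmlc_local.py -n 4 -s 2 bin/difacto.dmlc difacto.conf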