Jianfei Chen

10 comments by Jianfei Chen

Hi kyhhdm, The log likelihood is reported per token, i.e., it is divided by the number of tokens. This open-sourced version cannot be run in distributed mode. We do have...
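As a minimal sketch of what "per token" means here: the corpus-level log likelihood is divided by the total number of tokens. The function name and the numbers below are hypothetical and are not taken from the tool's code or logs.

```python
def per_token_log_likelihood(total_log_likelihood: float, num_tokens: int) -> float:
    """Normalize a corpus-level log likelihood by the token count."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return total_log_likelihood / num_tokens


# Hypothetical numbers, for illustration only.
total_ll = -1_500_000.0
n_tokens = 200_000
per_token = per_token_log_likelihood(total_ll, n_tokens)
print(per_token)
```

Normalizing this way makes values comparable across corpora of different sizes.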

The topic id is the id of the topic. You can find the corresponding topic in the corresponding line in .model, which is a row of counts, each of which...

It means the topic on line 4132 has only one word, word #49, assigned to it once. Note, however, that the topic id may start from 0 (I forgot whether it...

Sorry, I forgot that train.model is a sparse matrix of size (vocabulary size) × (number of topics). So you should look at the 4132nd column of train.model instead of the 4132nd row. Each...
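To illustrate the row-versus-column point, here is a rough sketch of pulling one topic's column out of a (vocabulary size) × (number of topics) sparse matrix. The in-memory layout (a dict mapping word id to a dict of topic counts) and the function name are assumptions made for this example; the on-disk format of train.model is not reproduced here, and whether ids start at 0 or 1 would need to be checked separately.

```python
from typing import Dict

# Assumed layout: word id -> {topic id: count}. Hypothetical, not the file format.
SparseMatrix = Dict[int, Dict[int, int]]


def topic_column(matrix: SparseMatrix, topic_id: int) -> Dict[int, int]:
    """Collect the nonzero entries of one column, i.e. the words of one topic."""
    return {
        word_id: row[topic_id]
        for word_id, row in matrix.items()
        if row.get(topic_id, 0) != 0
    }


# Toy data: word 49 is assigned once to topic 4132.
toy = {
    49: {4132: 1, 7: 3},
    50: {7: 2},
}
words_in_topic = topic_column(toy, 4132)
print(words_in_topic)
```

Reading a column means scanning every word's row for the topic id, which is why looking at a single row gives the wrong answer.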

ActNN is a lossy algorithm, so it may not work with 2 bits for some models. Please try using more warmup iterations and calling actnn.set_optimization_level("L2").

ActNN L0 does exactly the same thing as full-precision training. Is the 2% accuracy loss within random error?
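One rough way to answer the "within random error" question is to compare the gap against the seed-to-seed spread of the full-precision baseline. The sketch below is only a rule of thumb: the two-standard-deviation threshold, the function name, and all accuracy values are assumptions for illustration, not measurements from this thread.

```python
import statistics


def within_seed_noise(baseline_accs, candidate_acc, num_std=2.0):
    """True if candidate_acc is within num_std sample std devs of the baseline mean."""
    mean = statistics.mean(baseline_accs)
    std = statistics.stdev(baseline_accs)
    return abs(candidate_acc - mean) <= num_std * std


# Hypothetical full-precision accuracies (%) from runs with different seeds.
baseline = [76.1, 75.8, 76.4, 75.9, 76.3]
l0_accuracy = 74.0  # hypothetical L0 result, about 2 points lower
result = within_seed_noise(baseline, l0_accuracy)
print(result)
```

If the gap falls well outside the baseline spread, as in this toy data, it is unlikely to be seed noise alone and points to a real difference in behavior.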

Could you call print(model) before the training loop and check whether the model was converted correctly? ActNN replaces nn.Modules with its own modules, and I noticed there are additional model converters...

That's strange. ActNN L0 and full-precision training should behave identically. Could you try debugging with the following steps: 1. prepare a model checkpoint, and a batch of (data,...

This might be a bug in the converter function. @merrymercy could you take a look? A quicker fix is to disable the bias in conv layers. (set bias=None)

I use Graphviz to generate the correlation graph based on model.cov.json and model.phi.json.
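A sketch of how such a graph could be produced, under stated assumptions: the covariance is taken to be a square nested list, each topic has a short text label (for example, top words derived from the topic-word matrix), and an edge is drawn when the correlation exceeds a threshold. The actual schemas of model.cov.json and model.phi.json are not shown here, and the function name, labels, and threshold are hypothetical. The code only emits DOT text, which Graphviz can then render.

```python
import math


def correlation_dot(cov, labels, threshold=0.3):
    """Build Graphviz DOT text linking topics whose correlation exceeds threshold."""
    lines = ["graph topics {"]
    for i, label in enumerate(labels):
        lines.append(f'  t{i} [label="{label}"];')
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # Convert covariance to correlation using the diagonal variances.
            corr = cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
            if corr > threshold:
                lines.append(f'  t{i} -- t{j} [label="{corr:.2f}"];')
    lines.append("}")
    return "\n".join(lines)


# Toy 3-topic covariance matrix and labels, for illustration only.
cov = [
    [1.0, 0.8, -0.1],
    [0.8, 1.0, 0.05],
    [-0.1, 0.05, 1.0],
]
labels = ["sports", "games", "finance"]
dot_text = correlation_dot(cov, labels)
print(dot_text)
```

Saving the output to a file such as topics.dot and running `dot -Tpng topics.dot -o topics.png` renders the graph.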