Wenhu Chen

Results: 8 issues by Wenhu Chen

I found that the splitting step is done after the shuffle, according to https://github.com/karpathy/neuraltalk2/blob/master/prepro.py (line 159), which means we get a different val/test split every time. Even after I removed...
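One way to make the val/test assignment stable is to seed the RNG before shuffling, so the permutation is identical on every run. A minimal sketch with a hypothetical `assign_splits` helper (the seed value and split sizes are assumptions, not neuraltalk2's actual defaults):

```python
import random

def assign_splits(image_ids, num_val=5000, num_test=5000, seed=123):
    """Shuffle with a fixed seed so the val/test split is the same on every run."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle: same permutation every time
    return {i: 'val' if k < num_val
               else 'test' if k < num_val + num_test
               else 'train'
            for k, i in enumerate(ids)}
```

Because the permutation depends only on the seed, rerunning the preprocessing script reproduces the same split.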

Hi, I tried to reload the whole model by adding saver.restore(sess, "model/*") after sess.run(tf.global_variables_initializer()), and I noticed that every time it restores the model, the performance is much lower. Could...
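Two things are worth checking with this pattern. First, TF1's `saver.restore` expects a checkpoint prefix (e.g. the result of `tf.train.latest_checkpoint`), not a glob like `"model/*"`. Second, a common cause of degraded performance is ordering: any initializer that runs after the restore silently overwrites the restored weights. A toy illustration of the ordering pitfall (`ToyModel` and the checkpoint dict are hypothetical stand-ins, not the TensorFlow API):

```python
class ToyModel:
    """Minimal stand-in for a model with a single weight."""
    def __init__(self):
        self.w = None

    def initialize(self):
        self.w = 0.0              # fresh (untrained) init

    def restore(self, checkpoint):
        self.w = checkpoint['w']  # load trained weights

checkpoint = {'w': 3.14}          # pretend this was saved after training

# Correct order: initialize first, then restore (restore wins).
model = ToyModel()
model.initialize()
model.restore(checkpoint)
good_w = model.w                  # trained value survives

# Buggy order: re-initializing after restore wipes the trained weights.
model = ToyModel()
model.restore(checkpoint)
model.initialize()
bad_w = model.w                   # trained weights lost
```

The same logic applies to the TF1 session: run the initializer before `saver.restore`, never after.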

Do you have any plan to implement the large-vocabulary approach in Blocks? I would appreciate it if you could share it.

```
def init_extra_parameters(model, state):
    # May want to add skip_init later
    model.large_W_0_enc_approx_embdr = eval(state['weight_init_fn'])(
        state['large_vocab_source'], state['rank_n_approx'], -1,
        state['weight_scale'], model.rng)
    model.large_W_0_dec_approx_embdr = eval(state['weight_init_fn'])(
        state['large_vocab_target'], state['rank_n_approx'], -1,
        state['weight_scale'], model.rng)
    model.large_W2_dec_deep_softmax = eval(state['weight_init_fn'])(
        state['rank_n_approx'], state['large_vocab_target'], -1, ...
```
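The `eval`-based dispatch above resolves `state['weight_init_fn']` to a function by name, then calls it with (rows, cols, sparsity, scale, rng). A minimal self-contained sketch of that mechanism, with a hypothetical `sample_weights` initializer standing in for the real one:

```python
import random

def sample_weights(nrows, ncols, _sparsity, scale, rng):
    """Hypothetical initializer matching the dispatch signature above:
    returns an nrows x ncols matrix with entries uniform in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(ncols)]
            for _ in range(nrows)]

state = {
    'weight_init_fn': 'sample_weights',  # function is looked up by name
    'large_vocab_source': 4,
    'rank_n_approx': 3,
    'weight_scale': 0.01,
}

rng = random.Random(0)
W = eval(state['weight_init_fn'])(state['large_vocab_source'],
                                  state['rank_n_approx'], -1,
                                  state['weight_scale'], rng)
```

This is only a sketch of the config-driven dispatch; the sizes and initializer here are assumptions, not values from the original code.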

I found that the test/val split is different from the one given in karpathy/neuraltalk2. According to their script https://github.com/karpathy/neuraltalk2/blob/master/coco/coco_preprocess.ipynb, they take the first 5000 images as val and images 5000-10000 as test...

Hi, thanks for sharing your code. I'm curious whether this is the official code for the EMNLP best paper; I managed to get a BLEU of ~52, which is lower than...

Hi there, have you tested the model on lower-end GPUs? I tried the code and it doesn't seem to work on an A6000, let alone an A10, etc. Is...

Hi there, thanks for sharing Gemma. It seems that I can't reproduce the 24% 4-shot accuracy on MATH; I'm only getting 20%. Has anyone managed to reproduce it?...
