Zhiting Hu

30 comments by Zhiting Hu

The transformer beam search is adapted from the official implementation ([tensor2tensor](https://github.com/tensorflow/tensor2tensor)). I'm not sure how it could be sped up. One possible way would be to use a more efficient variant of the transformer decoder...

To "produce a tensor with shape [bs, sl]" from `logits` and `sample_id`, you may use [`sequence_sparse_softmax_cross_entropy`](https://texar.readthedocs.io/en/latest/code/losses.html#texar.tf.losses.sequence_sparse_softmax_cross_entropy) and set
```
average_across_batch=False,
average_across_timesteps=False,
sum_over_batch=False,
sum_over_timesteps=False
```
-- Another way of doing RL...
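To make the "[bs, sl]" shape concrete, here is a rough NumPy sketch of what an unreduced per-token cross-entropy computes. It is not Texar's implementation; the function name `per_token_cross_entropy` and the toy shapes are made up for illustration, and sequence-length masking is left out.

```python
import numpy as np

def per_token_cross_entropy(logits, sample_id):
    """Return unreduced cross-entropy with shape [batch_size, seq_len].

    logits:    float array of shape [batch_size, seq_len, vocab_size]
    sample_id: int array of shape [batch_size, seq_len]
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of each sampled token.
    picked = np.take_along_axis(log_probs, sample_id[..., None], axis=-1)
    # No averaging or summing: one loss value per (batch, time step).
    return -picked.squeeze(-1)

# Toy inputs (hypothetical sizes): batch of 2, 3 time steps, vocabulary of 5.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 3, 5))
sample_id = rng.integers(0, 5, size=(2, 3))

losses = per_token_cross_entropy(logits, sample_id)
print(losses.shape)  # (2, 3)
```

Leaving all four reduction flags off corresponds to skipping any `mean` or `sum` at the end, so each sampled token keeps its own loss value and can be weighted individually (e.g., by a reward).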

The code looks good. Reference code is here (basically the same as what you wrote): https://github.com/asyml/texar/issues/147#issuecomment-489442414

2- It's not really necessary because you'd do the masking with `reduce_with_weights`
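As a rough illustration of masking through weights, the NumPy sketch below zeroes out padded time steps with a 0/1 weight matrix before summing. This only mimics the idea; it is not Texar's `reduce_with_weights`, and the helper name `masked_sum_per_sequence` and the toy values are assumptions.

```python
import numpy as np

def masked_sum_per_sequence(values, sequence_length):
    """Zero out positions past each sequence's length, then sum over time.

    values:          float array of shape [batch_size, max_time]
    sequence_length: int array of shape [batch_size]
    """
    max_time = values.shape[1]
    # mask[b, t] is 1.0 while t < sequence_length[b], else 0.0.
    mask = (np.arange(max_time)[None, :] < sequence_length[:, None])
    mask = mask.astype(values.dtype)
    return (values * mask).sum(axis=1)

# Toy per-token values: the first sequence has one padded position.
values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
sequence_length = np.array([2, 3])

per_sequence = masked_sum_per_sequence(values, sequence_length)
print(per_sequence)  # sums are 3.0 and 15.0
```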

I can't see the cause here. What's in `fetches`?
```
File "roc_rl_main_refacored.py", line 724, in _train_epoch
    rets = sess.run(fetches, feed_dict, options=run_opts)
```
If optimization (e.g., `train_op`) is included:...

Running `train_op` (in `fetches`) consumes GPU memory for gradient tensors. A quick test is to remove `train_op` from `fetches` and see whether the OOM goes away. If so, it means...
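The quick test can be wired in as a switch, sketched below in plain Python. The helper `build_fetches` and the dictionary keys are hypothetical, and strings stand in for the real graph tensors and ops so the snippet runs without TensorFlow.

```python
def build_fetches(tensors, include_train_op=True):
    """Return a copy of the fetches dict, optionally without the train op."""
    fetches = dict(tensors)  # copy, so the original dict is left untouched
    if not include_train_op:
        # Without the train op, no backward pass is requested.
        fetches.pop("train_op", None)
    return fetches

# Placeholders: in real code these would be the loss tensor and optimizer op.
tensors = {"loss": "loss_tensor", "train_op": "train_op"}

debug_fetches = build_fetches(tensors, include_train_op=False)
print(sorted(debug_fetches))  # ['loss']
```

In a real training loop the resulting dict would be passed to `sess.run(fetches, feed_dict, ...)`, once with and once without the train op, to compare memory behavior.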

Removing `train_op` or using `tf.stop_gradient` is for debugging purposes -- to locate which portion of the code causes the OOM. Once it's located and fixed, you do need to add back...

Hmm... The OOM is caused by the optimization (backward pass). The gradients of `rl_loss_fine` and `loss_mle` should each consume the same amount of memory. To verify this -- since you've tried...

> @ZhitingHu I really appreciate your help.
> Yeah, that is a good test and actually I tried with just `loss==rl_loss_fine` and it threw the same error. Note that, I...

Glad to hear that! :) Could you briefly explain the cause of OOM, for future reference? Thanks

> Hi,
> I tried running the code for the text style transfer example after reading the related paper (Unsupervised Text Style Transfer using Language Models as Discriminators) and I have...