Vipula Rawte issues

Results: 10 issues by Vipula Rawte

Visualize word and sentence attention weights, color-coded as in the paper

Hi, the code runs fine, thanks, and gives the accuracy results, but I am trying to visualize the attention weights as shown in the paper (color-coded). Any suggestion on its...
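One possible way to get a color-coded view is to render each token as an HTML span whose background opacity follows its attention weight. The sketch below is only an illustration under assumptions: `attention_to_html` is a hypothetical helper, the tokens and weights are made-up values rather than output from the repository's model, and scaling by the largest weight is one choice among several.

```python
import html


def attention_to_html(tokens, weights, rgb=(255, 0, 0)):
    """Render tokens as HTML spans shaded by attention weight.

    Hypothetical helper: the opacity of each span is the token's weight
    divided by the largest weight in the sequence.
    """
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    peak = max(weights, default=0.0)
    spans = []
    for token, weight in zip(tokens, weights):
        # Scale to [0, 1] so the most attended token is fully opaque.
        alpha = weight / peak if peak > 0 else 0.0
        style = (
            f"background-color: rgba({rgb[0]}, {rgb[1]}, {rgb[2]}, {alpha:.2f})"
        )
        # Escape the token so characters like '<' do not break the markup.
        spans.append(f'<span style="{style}">{html.escape(token)}</span>')
    return " ".join(spans)


# Made-up example values, not taken from any model run.
tokens = ["the", "food", "was", "delicious"]
weights = [0.05, 0.30, 0.05, 0.60]
page = attention_to_html(tokens, weights)
```

The resulting string can be written to an `.html` file and opened in a browser. For a hierarchical model, the same helper could be called once per sentence, with a second color used for sentence-level weights.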

Does DocBERT freeze all BERT layers and add a fully connected layer on top for classification?

@Ashutosh-Adhikari @achyudh (just tagged for quick reply) Thanks!

ImportError: cannot import name 'amp'

Hi, can you please take a look at this [issue](https://github.com/NVIDIA/apex/issues/621)? Thanks!

utils.py [output_mode = regression]

Hi, I am trying to predict a score (float) using regression, and I was wondering what changes need to be made in lines 254 and 258 in [utils.py](https://github.com/ThilinaRajapakse/pytorch-transformers-classification/blob/master/utils.py). Thanks!
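For regression, the usual difference in feature conversion is that the label is kept as a float instead of being mapped to a class index. The sketch below is not the code at lines 254 and 258 of the linked `utils.py`, whose contents are not shown here; `convert_label` and its arguments are hypothetical names used only to illustrate the idea.

```python
def convert_label(raw_label, output_mode, label_map=None):
    """Convert a raw label string into a training target.

    Hypothetical helper, not taken from the linked repository.
    """
    if output_mode == "classification":
        if label_map is None:
            raise ValueError("classification needs a label_map")
        # Class labels become integer indices.
        return label_map[raw_label]
    if output_mode == "regression":
        # Scores stay continuous, so no label map is involved.
        return float(raw_label)
    raise KeyError(f"unknown output_mode: {output_mode}")


# Made-up example values.
class_target = convert_label("1", "classification", {"0": 0, "1": 1})
score_target = convert_label("3.75", "regression")
```

A regression setup of this kind is typically paired with a single output unit and a mean squared error loss, so the model configuration may need a matching change as well.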

ImportError: cannot import name 'amp'

Hi, I can import amp from the /apex directory but not from any other location. I installed it using the following command: ```pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./``` but I...

mean vs identity pooling?

Hi, the paper describes four pooling functions: 1. Mean, 2. Identity, 3. Transformer, and 4. LSTM. I am confused about the difference between ```mean``` and ```identity```. I follow that ```mean``` means simply average...

backpropagation on chunks?

Hi, when the document chunks are fed to the data parallel model, how is the loss backpropagated? Is it for every chunk? Also, do you unfreeze and fine-tune for the...

Why is max_seq_length = 512 for XLNet?

Hi, just a conceptual question: the paper mentions that XLNet derives some parts from Transformer-XL, which isn't limited to a fixed context, but the hyperparameters section says...

Running train.py

I am using TensorFlow 0.12.1 and Python 2.7.12 (as mentioned) but I am still running into the following issue:
:~/Downloads/NARRE-master/model$ python train.py
Parameters:
ALLOW_SOFT_PLACEMENT=True
BATCH_SIZE=100
DROPOUT_KEEP_PROB=0.5
EMBEDDING_DIM=300
FILTER_SIZES=3
L2_REG_LAMBDA=0.001
LOG_DEVICE_PLACEMENT=False...

fine-tune text classification?

Hi, can you provide sample code for fine-tuning Transformer-XL for a classification task (just like BERT)? Thanks!