SentimentVAE
Why is the output from the reconstruction always a single word?
I tried to run your code on a small sample. Here is one line of the file train.csv:
0,"if you enjoy service by someone who is as competent as he is personable , i would recommend corey kaplan highly . the time he has spent here has been very productive and working with him educational and enjoyable . i hope not to need him again though this is highly unlikely but knowing he is there if i do is very nice . by the way , i m not from el centro , ca . but scottsdale , az . "
which should be in the right format. I set display_every and print_every to 2; the other hyper-parameters remain the same. The output of the reconstruction is as follows:
Sentences generated from encodings
Sentence 0: deal
Sentence 1: very
Sentence 2: Service,
Sentence 3: Mint,
Sentence 4: I
Sentence 5: Your
Sentence 6: people
Sentence 7: eyes.
Sentence 8: that
Sentence 9: Chinatown?
Sentence 10: said.
Sentence 11: I
Sentence 12: that
Sentence 13: atmosphere
Sentence 14: I
Sentence 15: First
Sentence 16: possible,
Sentence 17: recap.
Sentence 18: I
Sentence 19: I
Sentence 20: account.
Sentence 21: you
Sentence 22: I
Sentence 23: filthy
Sentence 24: very
Sentence 25: Will
Sentence 26: I
Sentence 27: yellow
I wonder whether I did something wrong or whether it is supposed to be like this. Do you have any idea? Thanks in advance!
Here is the detailed information:
Config:
anneal_bias 6500
anneal_max 1.0
autoencoder True
batch_size 28
beam_size 16
conv_width 5,5,3
convolutional False
data_path /data/yelp
debug False
decoder_inputs True
decoding_noise 0.1
display_every 2
dropout_finish 13000
dropout_start 4000
encoder_birnn True
encoder_summary mean
gpu_id -1
group_length True
hidden_size 512
init_dropout 0.95
keep_fraction 1
label_emb_size 3
latent_size 64
learning_rate 0.001
length_penalty 100.0
load_file
max_epoch 10000
max_gen_length 50
max_grad_norm 5.0
max_length None
max_steps 9999999
mutinfo_weight 0.0
mutual_info True
num_layers 1
optimizer adam
print_every 2
save_every -1
save_file models/recent.dat
save_overwrite True
softmax_samples 1000
test_ll_samples 6
test_validation True
training True
use_labels False
val_ll_samples 3
validate_every 1
variational True
vocab_file vocab
word_dropout 0.5
word_emb_size 224
Loading vocabulary from pickle...
Vocabulary loaded, size: 3305
Loading train csv
Loading train data from pickle...
Training samples = 100
Loading validation csv
Loading validation data from pickle...
Validation samples = 100
Loading test csv
Loading test data from pickle...
Testing samples = 100
Thanks!
Hey, we never really managed to get this to work. If you would like to read the report on this, we can send it to you via email.
Actually, here it is; I just realized it is public: http://www.cs.nyu.edu/~akv245/inf/writeup.pdf
@vighneshbirodkar Thanks for the reply. But in the Experiments section of the report, it seems that you generated plausible results with both the VAE with KL-divergence annealing and the variational autoencoder with mutual information, and I would like to reproduce those results.
Indeed, I had read the report before I found this code, and I like your idea regardless of whether it works in practice.
The results there were from the Yelp Review dataset.
There is already code to pre-process the Yelp review dataset and train with it. Let us know if you have any trouble.
I just noticed that your dataset size is 100. You should really be training with the whole Yelp dataset.
@vighneshbirodkar Yes, I did use part of the Yelp dataset for training, and I did use the scripts to preprocess it. But I don't understand why the model can generate complete sentences with the whole dataset but not with part of it.
Here are the files I have tried to run the code with. Could you please give them a try to see whether you can generate complete sentences? I am afraid I may have done something wrong. Archive.zip
Because it just hasn't seen enough data to learn anything useful.
@vighneshbirodkar I cannot run the whole dataset, with more than four million lines, on my MacBook. This time I used the first 100,000 lines, but it still generates one word per sentence. I don't think this is a matter of dataset size: with a small dataset the model should overfit, yet it should still be able to generate a complete sentence. I wonder whether you got the results in your report with exactly this repo? If there is a newer version, could you please provide it to me? Thanks a lot!
The repo is exactly what we used in the report. Over-fitting won't work like that in this case, because generation is done using beam search. Let me see if I can re-run this with the whole dataset.
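To illustrate the beam-search point, here is a toy sketch (not the decoder from this repo): when a weakly trained model assigns non-trivial probability to the end-of-sequence token at every step, beam search prefers very short hypotheses unless the scoring explicitly rewards length, which is what a length-penalty term such as the `length_penalty` setting above is for.

```python
# Minimal beam-search sketch (not the repo's implementation) showing why decoding
# can collapse to very short outputs: once the end-of-sequence token scores well,
# short hypotheses dominate unless the scoring rewards length.
import math

EOS = "</s>"

def next_token_logprobs(prefix):
    # Toy stand-in for a trained decoder: it assigns a fairly high probability
    # to EOS at every step, which a poorly trained model often does.
    vocab = {EOS: 0.4, "the": 0.3, "service": 0.2, "great": 0.1}
    return {w: math.log(p) for w, p in vocab.items()}

def beam_search(beam_size=4, max_len=10, length_bonus=0.0):
    # Each hypothesis is (tokens, summed log-prob); finished ones end in EOS.
    beams = [([], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for word, lp in next_token_logprobs(tokens).items():
                new = (tokens + [word], score + lp)
                (finished if word == EOS else candidates).append(new)
        if not candidates:
            break
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = candidates[:beam_size]
    finished.extend(beams)
    # Rank with an optional per-token bonus, playing the role of a length penalty.
    finished.sort(key=lambda h: h[1] + length_bonus * len(h[0]), reverse=True)
    return finished[0][0]

print(beam_search(length_bonus=0.0))  # stops almost immediately
print(beam_search(length_bonus=1.5))  # produces a longer output once length is rewarded
```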
@vighneshbirodkar Could you please also try using, say, the first 100,000 lines of data, which should be enough to train a not-so-bad model? In addition, may I ask how many GPUs you used?
When I run the code, it fails with IOError: [Errno 2] No such file or directory: 'data/yelp/vocab.0.970.pk'.
Where can I find or generate this file?
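In case it helps anyone hitting the same error: the filename suggests a vocabulary pickle built from train.csv with a keep fraction of 0.97, which the repo's own pre-processing code should normally produce. Below is a rough, hypothetical sketch of what generating such a file could look like; the file layout and naming here are guesses, not the repo's actual format.

```python
# Hypothetical sketch of building a vocabulary pickle like 'vocab.0.970.pk'
# from train.csv (one "label,text" row per line). The repo's own
# pre-processing should be preferred; paths and pickle contents are guesses.
import csv
import pickle
from collections import Counter

KEEP_FRACTION = 0.970  # matches the 0.970 in the missing filename

counts = Counter()
with open("data/yelp/train.csv", newline="") as f:
    for label, text in csv.reader(f):
        counts.update(text.split())

# Keep the most frequent words until they cover KEEP_FRACTION of all tokens.
total = sum(counts.values())
vocab, covered = [], 0
for word, c in counts.most_common():
    if covered / total >= KEEP_FRACTION:
        break
    vocab.append(word)
    covered += c

with open("data/yelp/vocab.%.3f.pk" % KEEP_FRACTION, "wb") as f:
    pickle.dump(vocab, f)
print("Vocabulary size:", len(vocab))
```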
@vighneshbirodkar Do you have a paper associated with this repo?
@rylanchiu, did you run into the problem of NaN loss during training? How did you solve it?
@wangyong1122 It seems so, and I have already given up on this repo. I don't think I can reproduce the results with this code.
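For anyone else hitting the NaN loss mentioned above: this is a general hazard of VAE training rather than something confirmed to be specific to this repo. The usual first steps are to lower the learning rate (the config above uses 0.001), keep gradient clipping on (max_grad_norm is 5.0), and make sure the KL term cannot overflow. Here is a rough, generic sketch of a numerically safer Gaussian KL term, not taken from this repo:

```python
# Generic sketch (not this repo's code) of the KL term for a Gaussian VAE,
# with the log-variance clamped so exp() cannot overflow into inf/NaN.
import numpy as np

def gaussian_kl(mu, logvar, clamp=10.0):
    # KL( N(mu, exp(logvar)) || N(0, 1) ), summed over the latent dimension.
    logvar = np.clip(logvar, -clamp, clamp)
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

# Without clamping, a log-variance this large would give exp() = inf and
# poison the loss; with clamping the result stays finite.
mu = np.zeros((2, 64))
logvar = np.full((2, 64), 1000.0)
print(gaussian_kl(mu, logvar))
```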