beyondguo
I think you can use the validation set as a local test set, and construct your own validation set for experiments. If a method is truly effective, it should still be effective when you submit it to the system for testing.
The test labels are withheld precisely so the leaderboard stays fair.
Reading through the paper, I couldn't find which w2v embeddings the other models (such as LSTM, CNN) are using. It is amazing that SWEM-ave can achieve better results than LSTM or CNN...
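For context, the SWEM-ave operation discussed above is just average pooling over word embeddings, with no recurrence or convolution. A minimal sketch follows; the embedding table here is a random stand-in for pretrained vectors such as GloVe or word2vec (an assumption — which embeddings the paper actually used is exactly what the comment is asking):

```python
import numpy as np

# Random stand-in for a pretrained embedding table (assumption:
# real SWEM uses GloVe/word2vec vectors, not random ones).
rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "terrible": 4}
emb_dim = 8
embeddings = rng.standard_normal((len(vocab), emb_dim))

def swem_ave(tokens):
    """SWEM-ave: average-pool the embeddings of in-vocabulary tokens."""
    ids = [vocab[t] for t in tokens if t in vocab]
    return embeddings[ids].mean(axis=0)

sent_vec = swem_ave("the movie was great".split())
print(sent_vec.shape)  # (8,)
```

The resulting sentence vector is then fed to a simple classifier; the surprising finding is that this parameter-free pooling can be competitive with LSTM/CNN encoders.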
Thanks for replying so fast. In the paper [Character-level Convolutional Networks for Text Classification], where the results of LSTM/bag-of-means/CNNs are reported, I couldn't find evidence that they were using GloVe. Actually,...
Update: I found that using `bert-base-multilingual-uncased` will be fine:

```python
text = '咋就不行了?'
context_aug = naw.ContextualWordEmbsAug(
    model_path='bert-base-multilingual-uncased',
    action="substitute")
augmented_text = context_aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

>>>>>>>>
Original:
咋就不行了?
Augmented...
```
For ppl, I strongly suggest you rewrite their perplexity.py:
- for Llama, you should use LlamaTokenizer
- rewrite the init function of the Perplexity class. The current version will...
## Usage:

```
from ppl import Perplexity

ppl = Perplexity(model_id='your_model_path')
texts = ['asdfasdf', 'apxl wndo aslewr sdf', 'hello world']
ppl._compute(texts)
```

`ppl.py` as follows:

```python
import datasets
import numpy as np
...
```
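For reference, the quantity `_compute` returns boils down to the exponential of the average per-token negative log-likelihood. A minimal sketch with toy probabilities (an assumption — the real metric runs a causal LM such as Llama with its matching tokenizer, e.g. LlamaTokenizer, to obtain these per-token probabilities):

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood).

    token_probs: the probability the model assigned to each gold
    token (toy stand-in for a language model's outputs).
    """
    nll = -np.log(np.asarray(token_probs))
    return float(np.exp(nll.mean()))

# With uniform probability 1/4 on every token, perplexity is exactly 4.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # 4.0
```

This is why tokenizer choice matters: a mismatched tokenizer changes the token sequence and hence every per-token probability, silently inflating or deflating the score.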
I don't understand why we should split train and test sets in unsupervised mode?
