PSSAttention
An error in utils.py
I observed an error in the "read" function for TNet(+AS). Specifically, lines 51-52 in utils.py:

words.append(t.strip(end))
target_words.append(t.strip(end))

Because str.strip(chars) removes any of the given characters from both ends rather than a suffix, t.strip(end) produces wrong results: for example, 'nicki/n'.strip('/n') outputs 'icki' rather than 'nicki'. When I replace t.strip(end) with t[:-2]:

words.append(t[:-2])
target_words.append(t[:-2])

the best accuracy I get on the Twitter dataset is only 72%-74%.
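For reference, a minimal Python sketch of the strip pitfall described above, with suffix-safe alternatives (the token value is illustrative):

```python
# str.strip(chars) removes any run of the listed CHARACTERS from both
# ends of the string; it does NOT remove the string as a suffix.
token = "nicki/n"   # word "nicki" with tag suffix "/n"
end = "/n"

# The buggy call: leading 'n' is also in the character set {'/', 'n'},
# so it gets stripped along with the trailing "/n".
assert token.strip(end) == "icki"

# Safe alternative 1: slice off exactly len(end) characters.
assert token[:-len(end)] == "nicki"

# Safe alternative 2: split on the tag separator from the right.
word, _, tag = token.rpartition("/")
assert (word, tag) == ("nicki", "n")
```

Slicing by len(end) is more robust than the hard-coded t[:-2] whenever the tag suffix length can vary.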
Thank you for finding this error in utils.py. This part of the code is taken directly from TNet (https://github.com/lixin4ever/TNet), so I did not check it very carefully. However, I think the dropped performance originates in the small data size. Because the dataset is small, the random seed matters a great deal. In our experiments, all random seeds are also taken directly from TNet. If you want to obtain comparable performance on the Twitter data, I suggest adjusting the random seed.
Thank you again for your careful inspection. I will fix this problem in the future. :-P
Hello, I contacted the author of TNet. This preprocessing error does indeed affect the final performance (a 1-2% accuracy drop). However, after discussing it, and considering both this preprocessing error and the fact that Theano is no longer maintained, he highly recommends using ABSA-pytorch, a PyTorch-based implementation of many ABSA models including TNet, for reproduction. You can refer to Issue4 in TNet for more information.
Thanks for your reply. I am surprised that this preprocessing error has such an impact on performance. Maybe it is caused by the small data size.