PSSAttention

An error in utils.py

Open ipp123456 opened this issue 5 years ago • 4 comments

I observed an error in the "read" function for TNet(+AS). Specifically, lines 51-52 in utils.py:

words.append(t.strip(end))
target_words.append(t.strip(end))

Using t.strip(end) causes an error, because str.strip removes any of the given characters from both ends of the string rather than the exact suffix: for example, 'nicki/n'.strip('/n') outputs 'icki' rather than 'nicki'. I replaced t.strip(end) with t[:-2]:

words.append(t[:-2])
target_words.append(t[:-2])
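A minimal sketch of the behaviour described above (the drop_tag helper is a hypothetical alternative, not code from the repository):

```python
# str.strip(chars) removes *any* of the given characters from both ends,
# not the exact suffix, so the tag characters can eat into the word itself.
token = "nicki/n"

# Buggy: strips every leading/trailing '/' or 'n', mangling the word.
assert token.strip("/n") == "icki"

# Fix from this issue: drop the last two characters (the "/x" tag).
assert token[:-2] == "nicki"

# A more explicit alternative (hypothetical helper, not from the repo):
def drop_tag(t, tag):
    return t[: -len(tag)] if t.endswith(tag) else t

assert drop_tag(token, "/n") == "nicki"
```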

I find that the best accuracy on the Twitter dataset is only 72%-74%.

ipp123456 avatar Sep 08 '19 03:09 ipp123456

Thank you for finding this error in utils.py. This part of the code is taken directly from TNet (https://github.com/lixin4ever/TNet), and I did not check it very carefully. However, I think the dropped performance originates in the small data size. Because the dataset is small, the random seed really matters. In our experiments, all random seeds are taken directly from TNet as well. If you want to obtain comparable performance on the Twitter data, I suggest adjusting the random seed.
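Adjusting the seed could look like the sketch below (NumPy and Python RNGs shown; the seed values are hypothetical, and the deep-learning framework's own seed would also need to be set):

```python
import random

import numpy as np


def set_seed(seed):
    # Fix the Python and NumPy RNGs so runs are reproducible; the
    # framework-level seed (e.g. TensorFlow/Theano) must be set separately.
    random.seed(seed)
    np.random.seed(seed)


# With small datasets like the Twitter set, results can vary noticeably
# across seeds, so sweeping several candidate seeds is reasonable.
for seed in (1234, 5678, 42):
    set_seed(seed)
    # ... train and evaluate the model here, keeping the best seed ...
```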

tangjialong avatar Sep 27 '19 07:09 tangjialong

Thank you again for your careful inspection. I will fix this problem in the future. :-P

tangjialong avatar Sep 27 '19 07:09 tangjialong

Hello, I have contacted the author of TNet. This pre-processing error does indeed affect the final performance (a 1-2% accuracy drop). However, after our discussion, considering this pre-processing error and the fact that Theano is no longer maintained, he highly recommends using ABSA-PyTorch, a PyTorch-based implementation of many ABSA models including TNet, for reproduction. You can refer to Issue 4 in the TNet repository for more information.

tangjialong avatar Nov 19 '19 03:11 tangjialong

Thanks for your reply. I am surprised that this preprocessing error has such an impact on performance. Maybe it's caused by the small data size.

ipp123456 avatar Nov 20 '19 06:11 ipp123456