ABSA-PyTorch

Question about data processing

Open zhoucz97 opened this issue 3 years ago • 2 comments

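Hello, I noticed that the build_tokenizer() function in your data_utils.py contains the line text_left, _, text_right = [s.lower().strip() for s in lines[i].partition("$T$")].

However, a sentence may contain two or more '$T$' placeholders. For example, the first line of test.raw in the ACL-14 dataset is: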

$T$ to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... .

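In that case the result is text_left = ''; text_right = 'to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... . '

text_right still contains a $T$ that was not filtered out. Is this intentional, or is it a compromise made to simplify processing?

I hope the author can clarify.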

zhoucz97 Mar 29 '21 14:03
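
The behavior described above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the sentence quoted in the issue (elisions kept as-is) and the quoted list comprehension. The key point is that str.partition splits at the first occurrence of the separator only. Note that because of .lower() and .strip(), the leftover placeholder shows up as '$t$' and the surrounding whitespace is trimmed.

```python
# First line of ACL-14 test.raw, as quoted in the issue (elisions kept as-is).
line = "$T$ to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... ."

# str.partition splits only at the FIRST occurrence of the separator,
# so any later "$T$" stays inside the right-hand part.
text_left, _, text_right = [s.lower().strip() for s in line.partition("$T$")]

print(repr(text_left))        # empty string: the sentence starts with the placeholder
print(repr(text_right))       # still contains a (lowercased) placeholder
print(line.count("$T$"))      # number of placeholders in the raw line
```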

This is a bug...

songyouwei Mar 30 '21 07:03

Haha, all right then, thanks for the reply!

zhoucz97 Mar 30 '21 08:03
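
No fix is given in this thread. As one possible workaround for the vocabulary-building step only, every placeholder could be replaced with the aspect term instead of splitting on the first one. The sketch below is an assumption, not the maintainer's fix: the helper name fill_aspect and the example aspect string are hypothetical, and it assumes the aspect term for each sentence is available separately. It does not address which occurrence should be treated as the target when a model needs distinct left and right contexts.

```python
PLACEHOLDER = "$T$"


def fill_aspect(sentence: str, aspect: str) -> str:
    """Replace every placeholder with the aspect term (hypothetical helper).

    The result is lowercased and whitespace-normalised, roughly mirroring
    the .lower().strip() calls in the line quoted in the issue.
    """
    # str.replace substitutes ALL occurrences, unlike str.partition.
    filled = sentence.replace(PLACEHOLDER, f" {aspect} ")
    return " ".join(filled.lower().split())


sentence = "$T$ to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... ."
aspect = "some player"  # hypothetical aspect term; the issue does not quote the real one

text_raw = fill_aspect(sentence, aspect)
print(text_raw)
```

With this approach no placeholder text leaks into the tokenizer vocabulary, at the cost of treating all occurrences identically.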