ABSA-PyTorch: A question about data processing


Open zhoucz97 opened this issue 3 years ago • 2 comments

Hello, I noticed that in your data_utils.py, the build_tokenizer() function contains this line: text_left, _, text_right = [s.lower().strip() for s in lines[i].partition("$T$")]

However, a sentence may contain two or more '$T$' placeholders. For example, the first line of test.raw in the ACL-14 dataset is:

$T$ to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... .

In that case the result is text_left = ''; text_right = 'to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... . '

text_right still contains a $T$ that was never split out. Is this intentional, or is it a compromise made to keep the processing simple?

I hope the author can clarify this.

Mar 29 '21 14:03 zhoucz97
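
The behaviour described in the question can be reproduced with a minimal sketch. It uses only the quoted line from test.raw (elisions kept as they appear) and the same list comprehension as build_tokenizer(); nothing else from the repository is assumed. Python's str.partition splits only at the first occurrence of the separator, so any later '$T$' stays in the right-hand part. Because the comprehension also applies .lower(), the leftover placeholder shows up as '$t$' in text_right.

```python
# First line of ACL-14 test.raw as quoted in the question (elisions kept as-is).
line = "$T$ to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... ."

# str.partition splits only at the FIRST occurrence of the separator,
# returning (before, separator, after).
text_left, sep, text_right = [s.lower().strip() for s in line.partition("$T$")]

print(repr(text_left))    # empty string: nothing precedes the first placeholder
print(repr(text_right))   # still contains the second placeholder, lowercased
print(line.count("$T$"))  # number of placeholders in the raw line
```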

This is a bug...

Mar 30 '21 07:03 songyouwei
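
The thread does not say how the bug should be fixed. As a rough sketch of one possible workaround for the vocabulary-building step only, every placeholder could be substituted with the aspect term before lowercasing. This assumes that all '$T$' occurrences in a line stand for the same aspect; fill_placeholders is a hypothetical helper, not part of the repository, and "ASPECT" is a stand-in for the real term stored alongside the sentence.

```python
def fill_placeholders(sentence: str, aspect: str, placeholder: str = "$T$") -> str:
    """Hypothetical helper: replace EVERY placeholder with the aspect term.

    Assumption: all placeholders in the sentence refer to the same aspect.
    """
    return sentence.replace(placeholder, aspect).lower().strip()


# Quoted line from the question (elisions kept as-is).
line = "$T$ to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... ."
aspect = "ASPECT"  # hypothetical stand-in for the real aspect term

text_raw = fill_placeholders(line, aspect)
print(text_raw)  # no placeholder remains in the text used to build the vocabulary
```

This only keeps the literal placeholder out of the tokenizer's vocabulary. Code that needs separate left and right contexts would still have to decide which occurrence is the target.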

Haha, all right then. Thanks for the reply!

Mar 30 '21 08:03 zhoucz97