It should handle punctuation

Open yingshaoxo opened this issue 1 year ago • 2 comments

https://huggingface.co/raynardj/classical-chinese-punctuation-guwen-biaodian?text=%E6%88%91%E7%AB%99%E8%B5%B7%E8%BA%AB%E5%91%A8%E5%9B%B4%E9%BB%91%E6%BC%86%E6%BC%86%E7%9A%84%E5%9B%A0%E4%B8%BA%E7%8E%B0%E5%9C%A8%E6%98%AF%E6%B7%B1%E5%A4%9C%E6%8D%AE%E8%90%A8%E5%85%8B%E9%82%A3%E6%B7%B7%E8%9B%8B%E8%AF%B4%E7%9A%84%E5%A5%BD%E5%83%8F%E6%98%AF%E5%B7%B2%E7%BB%8F%E8%BF%87%E5%8E%BB3%E5%A4%A9%E4%BA%86%E5%90%A7%E9%83%BD%E4%B8%8D%E7%9F%A5%E9%81%93%E8%BF%99%E9%87%8C%E6%98%AF%E5%93%AA%E9%87%8C%E5%95%8A%E4%B8%8D%E4%BC%9A%E8%A2%AB%E8%90%A8%E5%85%8B%E8%BF%99%E6%B7%B7%E8%9B%8B%E5%BC%84%E5%88%B0%E5%A4%96%E5%9B%BD%E4%BA%86%E5%90%A7%E9%98%BF%E5%BC%A5%E9%99%80%E4%BD%9B%E4%B8%8A%E5%B8%9D%E4%BF%9D%E4%BD%91%E5%95%8A%E5%8F%AF%E5%88%AB%E7%9C%9F%E5%BC%84%E5%88%B0%E5%A4%96%E5%9B%BD%E6%9D%A5%E4%BA%86%E5%95%8A%E7%AE%97%E4%BA%86%E9%9D%A0%E9%82%A3%E4%BA%9B%E8%BF%98%E4%B8%8D%E7%9F%A5%E5%AD%98%E4%B8%8D%E5%AD%98%E5%9C%A8%E7%9A%84%E4%B8%9C%E8%A5%BF%E4%BF%9D%E4%BD%91%E8%BF%98%E4%B8%8D%E5%A6%82%E9%9D%A0%E8%87%AA%E5%B7%B1%E5%91%A2%E7%A5%9E%E8%84%91%E4%BD%A0%E5%B8%AE%E6%88%91%E6%9F%A5%E6%9F%A5%E6%88%91%E7%8E%B0%E5%9C%A8%E6%89%80%E5%9C%A8%E7%9A%84%E5%9C%B0%E6%96%B9%E5%A5%BD%E7%9A%84%E4%B8%BB%E4%BA%BA%E8%AF%B7%E7%A8%8D%E7%AD%89%E4%B8%BB%E4%BA%BA%E7%8E%B0%E5%9C%A8%E6%89%80%E5%9C%A8%E7%9A%84%E5%9C%B0%E6%96%B9%E6%98%AFW%E5%B8%82%E7%9A%84%E7%A7%83%E8%A7%92%E5%B1%B1%E4%B8%8A

yingshaoxo avatar Jun 29 '23 18:06 yingshaoxo

https://github.com/raynardj/yuan/issues/8

yingshaoxo avatar Jun 29 '23 18:06 yingshaoxo

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. (Due to prolonged inactivity, the bot has automatically closed this issue; feel free to ask again if needed.)

stale[bot] avatar Dec 27 '23 07:12 stale[bot]

You can refer to the following:

from transformers import AutoTokenizer, BertForTokenClassification
from transformers import pipeline

TAG = "raynardj/classical-chinese-punctuation-guwen-biaodian"

# Load the model and tokenizer first; the pipeline below needs both.
model = BertForTokenClassification.from_pretrained(TAG)
tokenizer = AutoTokenizer.from_pretrained(TAG)
ner = pipeline("ner", model=model, tokenizer=tokenizer)

def mark_sentence(x: str):
    # Each prediction's 'entity' is inserted at its 'end' character offset.
    outputs = ner(x)
    x_list = list(x)
    for i, output in enumerate(outputs):
        # Add i because each earlier insertion shifted the list by one slot.
        x_list.insert(output['end']+i, output['entity'])
    return "".join(x_list)
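
The insertion loop in mark_sentence can be checked without downloading the model. The sketch below is only an illustration: insert_punctuation is a hypothetical helper name, and fake_outputs is hand-written stand-in data (not real model predictions), assuming the pipeline returns a list of dicts with 'end' and 'entity' keys, as the snippet above relies on.

```python
def insert_punctuation(text, outputs):
    """Insert each predicted mark at the 'end' offset of its prediction."""
    chars = list(text)
    # Sort by position so the running shift below stays valid
    # even if the predictions arrive out of order.
    ordered = sorted(outputs, key=lambda o: o["end"])
    for shift, out in enumerate(ordered):
        # list.insert adds exactly one element, so each earlier
        # insertion moved the later characters right by one slot.
        chars.insert(out["end"] + shift, out["entity"])
    return "".join(chars)


# Hand-written stand-in for the pipeline output (assumed shape, not model output).
fake_outputs = [
    {"end": 5, "entity": "，"},
    {"end": 9, "entity": "？"},
]
print(insert_punctuation("学而时习之不亦说乎", fake_outputs))
```

Sorting first is a defensive choice of this sketch; the original loop relies on the pipeline returning predictions in text order.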

shibing624 avatar Mar 20 '24 05:03 shibing624