
How can I apply your code to Chinese?

Open shaofengzeng opened this issue 4 years ago • 20 comments

Excuse me, can I use your code for Chinese...

shaofengzeng avatar Apr 16 '20 05:04 shaofengzeng

The only limitation right now for Chinese is that you would need a Bert Model and tokenizer that uses Chinese. If you have both the tokenizer and model, you can easily pass it in for summarization.

dmmiller612 avatar Apr 17 '20 00:04 dmmiller612

OK, thanks

shaofengzeng avatar Apr 17 '20 02:04 shaofengzeng


Hello, has the project applying 'bert-extractive-summarizer' to Chinese summarization been successful? I don't know how to modify it and would like to ask.

1615070057 avatar May 22 '20 06:05 1615070057

OK, thanks

Have you ever tested this model on a Chinese dataset? It didn't work on my dataset and output nothing.

BIRlz avatar May 22 '20 08:05 BIRlz

It would need a Chinese-based BERT model. I am not sure whether the bert-multilingual model supports Chinese or not. It would need to be in the form of a Hugging Face transformer.

dmmiller612 avatar May 22 '20 13:05 dmmiller612

I have tried using the bert-base-chinese model, but it outputs nothing. This is my code:

from transformers import AutoConfig, AutoModel, AutoTokenizer
from summarizer import Summarizer

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('bert-base-chinese')
custom_config.output_hidden_states = True
custom_tokenizer = AutoTokenizer.from_pretrained('bert-base-chinese')
custom_model = AutoModel.from_pretrained('bert-base-chinese', config=custom_config)

body = '这是一个测试句子'
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
model(body)

ttxs69 avatar Jun 29 '20 03:06 ttxs69

I have solved the problem: the default spaCy pipeline uses English for sentence segmentation; just change it to Chinese and it works well. Thanks @dmmiller612

ttxs69 avatar Jul 01 '20 02:07 ttxs69

I have solved the problem: the default spaCy pipeline uses English for sentence segmentation; just change it to Chinese and it works well. Thanks @dmmiller612

Excuse me, where did you change it to use jieba rather than spaCy? I can't find it, thank you.

Bibabo-BUPT avatar Aug 07 '20 16:08 Bibabo-BUPT

I have solved the problem: the default spaCy pipeline uses English for sentence segmentation; just change it to Chinese and it works well. Thanks @dmmiller612

Excuse me, where did you change it to use jieba rather than spaCy? I can't find it, thank you.

Sorry to reply so late. Just change two lines of code in sentence_handler.py: https://github.com/dmmiller612/bert-extractive-summarizer/blob/f94c0243954171b2e5233d2624a8d2fcad1ea9ba/summarizer/sentence_handler.py#L3 change to

from spacy.lang.zh import Chinese

and https://github.com/dmmiller612/bert-extractive-summarizer/blob/f94c0243954171b2e5233d2624a8d2fcad1ea9ba/summarizer/sentence_handler.py#L8

change to

def __init__(self, language=Chinese):

and this code https://github.com/dmmiller612/bert-extractive-summarizer/issues/45#issuecomment-650879240 works well.
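For anyone who prefers not to patch the library, the same idea (segmenting on Chinese sentence boundaries instead of English ones) can be reproduced with a small regex-based splitter and no spaCy dependency at all. This is only an illustrative sketch, not code from the library; the helper name `split_chinese_sentences` and the length defaults are made up for this example. Also worth noting: the library's default sentence handler filters sentences by character length (roughly a 40–600 character window by default), which silently drops most short Chinese sentences and is another reason the summarizer can appear to output nothing.

```python
import re

def split_chinese_sentences(text, min_length=2, max_length=600):
    """Split Chinese text into sentences on full-width terminators.

    Keeps each terminator attached to its sentence and drops fragments
    whose stripped length falls outside [min_length, max_length).
    """
    # Lookbehind split: cut *after* each 。！？； without consuming it.
    parts = re.split(r'(?<=[。！？；])', text)
    return [p.strip() for p in parts
            if min_length <= len(p.strip()) < max_length]
```

A sentence handler built around a splitter like this could then be passed to the summarizer, or used to pre-segment text before scoring.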

ttxs69 avatar Sep 25 '20 01:09 ttxs69

@ttxs69 Why is the final output just the original Chinese text after I modified the model following your steps? I urgently want to know; I hope you can reply!

lmq990417 avatar Dec 16 '20 03:12 lmq990417

@ttxs69 Why is the final output just the original Chinese text after I modified the model following your steps? I urgently want to know; I hope you can reply!

I just tried it, and it works after following the steps to change the two lines of code. You can step into model(body) to debug.

jnkr36 avatar Dec 16 '20 03:12 jnkr36

@ttxs69 OK, thanks, I will try. If it is convenient, could you please send me a copy of the code you run? My email address is [email protected].

lmq990417 avatar Dec 16 '20 03:12 lmq990417

@jnkr36 I'm sorry that I read the wrong name this morning. First of all, thank you very much for replying to me. I'm in a bit of a hurry and can't find the mistake, so I will try the method you described. If it is convenient, could you please send me a copy of the code you run? My email address is [email protected]! Thank you very much again.

lmq990417 avatar Dec 16 '20 08:12 lmq990417

@jnkr36 Me again! One more question: had you downloaded zh_core_web_sm beforehand?

lmq990417 avatar Dec 17 '20 08:12 lmq990417

@jnkr36 Me again! One more question: had you downloaded zh_core_web_sm beforehand?

@jnkr36 I'm sorry that I read the wrong name this morning. First of all, thank you very much for replying to me. I'm in a bit of a hurry and can't find the mistake, so I will try the method you described. If it is convenient, could you please send me a copy of the code you run? My email address is [email protected]! Thank you very much again.

Sorry for the late response. I have sent you my project; please check your email. If you have any other questions, we can talk again.

jnkr36 avatar Dec 17 '20 14:12 jnkr36

Just for convenience, I forked the repo and modified it as suggested above; it works nicely.

pip install git+https://github.com/FrontMage/bert-extractive-summarizer.git

FrontMage avatar May 13 '21 07:05 FrontMage

@FrontMage Hello! I've installed your modified fork, transformers, and spacy 3.0.0, and downloaded zh_core_web_sm, then tried to run the model as in ttxs69's snippet, but the model generates empty output on Chinese sentences. Could you please provide more details on your setup?

tuzcsap avatar May 29 '21 16:05 tuzcsap


If it is convenient, could you please send me a copy of the code you run? My email address is [email protected] thanks

zhangsirf avatar Nov 30 '21 09:11 zhangsirf


Regarding the output being the original text: I just found out that you need to end every sentence in your long text with a Chinese full stop (。).

ilingen avatar Mar 27 '22 08:03 ilingen
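That last fix can be sketched as a tiny preprocessing step: converting ASCII sentence terminators to their full-width Chinese equivalents so the Chinese sentencizer can find sentence boundaries. The helper name below is invented for this example, and the naive replacement is deliberately simple; note the caveat in the comment.

```python
def normalize_sentence_enders(text):
    """Swap ASCII sentence terminators for full-width Chinese ones.

    Caution: this blindly replaces every '.', which would also rewrite
    decimals and URLs; restrict the replacement for real-world text.
    """
    for ascii_ch, cjk_ch in (('.', '。'), ('!', '！'), ('?', '？'), (';', '；')):
        text = text.replace(ascii_ch, cjk_ch)
    return text
```

Running the text through a normalizer like this before calling the summarizer should let the sentence splitter produce more than one candidate sentence.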