
How can I apply your code to Chinese?

Open shaofengzeng opened this issue 4 years ago • 20 comments

Excuse me, can I use your code for Chinese...

shaofengzeng avatar Apr 16 '20 05:04 shaofengzeng

The only limitation right now for Chinese is that you would need a Bert Model and tokenizer that uses Chinese. If you have both the tokenizer and model, you can easily pass it in for summarization.

dmmiller612 avatar Apr 17 '20 00:04 dmmiller612

OK, thanks

shaofengzeng avatar Apr 17 '20 02:04 shaofengzeng


Hello, has the project applying 'bert-extractive-summarizer' to Chinese summarization been successful? I don't know how to modify it and would like to ask.

1615070057 avatar May 22 '20 06:05 1615070057

OK, thanks

Have you ever tested this model on a Chinese dataset? It didn't work on my dataset and output nothing.

BIRlz avatar May 22 '20 08:05 BIRlz

It would need a Chinese-based BERT model. I am not sure whether the bert-multilingual model supports Chinese or not. It would need to be in the form of a Hugging Face transformer.

dmmiller612 avatar May 22 '20 13:05 dmmiller612

I have tried using the bert-base-chinese model, but it outputs nothing. This is my code:

from transformers import AutoConfig, AutoModel, AutoTokenizer
from summarizer import Summarizer

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('bert-base-chinese')
custom_config.output_hidden_states = True
custom_tokenizer = AutoTokenizer.from_pretrained('bert-base-chinese')
custom_model = AutoModel.from_pretrained('bert-base-chinese', config=custom_config)

body = '这是一个测试句子'
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
model(body)

ttxs69 avatar Jun 29 '20 03:06 ttxs69

I have solved the problem: the default spaCy pipeline uses English for sentence segmentation; just change it to Chinese and it works well. Thanks @dmmiller612

ttxs69 avatar Jul 01 '20 02:07 ttxs69

I have solved the problem: the default spaCy pipeline uses English for sentence segmentation; just change it to Chinese and it works well. Thanks @dmmiller612

Excuse me, where did you change it to use jieba rather than spaCy? I can't find it, thank you.

Bibabo-BUPT avatar Aug 07 '20 16:08 Bibabo-BUPT

I have solved the problem: the default spaCy pipeline uses English for sentence segmentation; just change it to Chinese and it works well. Thanks @dmmiller612

Excuse me, where did you change it to use jieba rather than spaCy? I can't find it, thank you.

Sorry to reply so late. Just change two lines of code in sentence_handler.py: https://github.com/dmmiller612/bert-extractive-summarizer/blob/f94c0243954171b2e5233d2624a8d2fcad1ea9ba/summarizer/sentence_handler.py#L3 change to

from spacy.lang.zh import Chinese

and https://github.com/dmmiller612/bert-extractive-summarizer/blob/f94c0243954171b2e5233d2624a8d2fcad1ea9ba/summarizer/sentence_handler.py#L8

change to

def __init__(self, language=Chinese):

and this code https://github.com/dmmiller612/bert-extractive-summarizer/issues/45#issuecomment-650879240 works well.
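For anyone who prefers not to patch the library, the same idea (segmenting on Chinese sentence boundaries instead of English ones) can be reproduced with a small regex-based splitter and no spaCy dependency at all. This is only an illustrative sketch, not code from the library; the helper name `split_chinese_sentences` and the length defaults are made up for this example. Also worth noting: the library's default sentence handler filters sentences by character length (roughly a 40–600 character window by default), which silently drops most short Chinese sentences and is another reason the summarizer can appear to output nothing.

```python
import re

def split_chinese_sentences(text, min_length=2, max_length=600):
    """Split Chinese text into sentences on full-width terminators.

    Keeps each terminator attached to its sentence and drops fragments
    whose stripped length falls outside [min_length, max_length).
    """
    # Lookbehind split: cut *after* each 。！？； without consuming it.
    parts = re.split(r'(?<=[。！？；])', text)
    return [p.strip() for p in parts
            if min_length <= len(p.strip()) < max_length]
```

A sentence handler built around a splitter like this could then be passed to the summarizer, or used to pre-segment text before scoring.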

ttxs69 avatar Sep 25 '20 01:09 ttxs69

@ttxs69 Why is the final output just the original Chinese text after I modified the model following your steps? I urgently want to know; I hope you can reply!

lmq990417 avatar Dec 16 '20 03:12 lmq990417

@ttxs69 Why is the final output just the original Chinese text after I modified the model following your steps? I urgently want to know; I hope you can reply!

I just tried it, and it works after following the steps to change the two lines of code. You can step into model(body) to debug.

jnkr36 avatar Dec 16 '20 03:12 jnkr36

@ttxs69 OK, thanks, I will try. If it is convenient, could you please send me a copy of the code you run? My email address is [email protected].

lmq990417 avatar Dec 16 '20 03:12 lmq990417

@jnkr36 I'm sorry that I read the wrong name this morning. First of all, thank you very much for replying to me. I'm in a bit of a hurry and can't find the mistake, so I will try the method you described. If it is convenient, could you please send me a copy of the code you run? My email address is [email protected]! Thank you very much again.

lmq990417 avatar Dec 16 '20 08:12 lmq990417

@jnkr36 Me again! One more question: had you downloaded zh_core_web_sm beforehand?

lmq990417 avatar Dec 17 '20 08:12 lmq990417

@jnkr36 Me again! One more question: had you downloaded zh_core_web_sm beforehand?

@jnkr36 I'm sorry that I read the wrong name this morning. First of all, thank you very much for replying to me. I'm in a bit of a hurry and can't find the mistake, so I will try the method you described. If it is convenient, could you please send me a copy of the code you run? My email address is [email protected]! Thank you very much again.

Sorry for the late response. I have sent you my project; please check your email. If you have any other questions, we can talk again.

jnkr36 avatar Dec 17 '20 14:12 jnkr36

Just for convenience, I forked the repo and modified it as suggested above; it works nicely.

pip install git+https://github.com/FrontMage/bert-extractive-summarizer.git

FrontMage avatar May 13 '21 07:05 FrontMage

@FrontMage Hello! I've installed your modified fork, transformers, and spacy 3.0.0, and downloaded zh_core_web_sm, then tried to run the model as in ttxs69's snippet, but the model generates empty output on Chinese sentences. Could you please provide more details on your setup?

tuzcsap avatar May 29 '21 16:05 tuzcsap


If it is convenient, could you please send me a copy of the code you run? My email address is [email protected] thanks

zhangsirf avatar Nov 30 '21 09:11 zhangsirf


Regarding the output being the original text: I just found out that you need to end every sentence in your long text with a Chinese full stop (。).

ilingen avatar Mar 27 '22 08:03 ilingen
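That last fix can be sketched as a tiny preprocessing step: converting ASCII sentence terminators to their full-width Chinese equivalents so the Chinese sentencizer can find sentence boundaries. The helper name below is invented for this example, and the naive replacement is deliberately simple; note the caveat in the comment.

```python
def normalize_sentence_enders(text):
    """Swap ASCII sentence terminators for full-width Chinese ones.

    Caution: this blindly replaces every '.', which would also rewrite
    decimals and URLs; restrict the replacement for real-world text.
    """
    for ascii_ch, cjk_ch in (('.', '。'), ('!', '！'), ('?', '？'), (';', '；')):
        text = text.replace(ascii_ch, cjk_ch)
    return text
```

Running the text through a normalizer like this before calling the summarizer should let the sentence splitter produce more than one candidate sentence.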