bert_for_corrector

File encoding problem

Open Godlikemandyy opened this issue 5 years ago • 8 comments

Hello, I tried running bert_corrector.py and hit a file encoding error. The details are as follows:

Traceback (most recent call last):
  File "D:/soft/bert_for_corrector/bert_corrector.py", line 73, in <module>
    d = BertCorrector()
  File "D:/soft/bert_for_corrector/bert_corrector.py", line 23, in __init__
    tokenizer=bert_model_dir)
  File "D:\Anaconda3\Lib\site-packages\transformers\pipelines.py", line 2727, in pipeline
    framework = framework or get_framework(model)
  File "D:\Anaconda3\Lib\site-packages\transformers\pipelines.py", line 110, in get_framework
    model = AutoModel.from_pretrained(model)
  File "D:\Anaconda3\Lib\site-packages\transformers\modeling_auto.py", line 624, in from_pretrained
    pretrained_model_name_or_path, return_unused_kwargs=True, **kwargs
  File "D:\Anaconda3\Lib\site-packages\transformers\configuration_auto.py", line 330, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Anaconda3\Lib\site-packages\transformers\configuration_utils.py", line 374, in get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "D:\Anaconda3\Lib\site-packages\transformers\configuration_utils.py", line 456, in _dict_from_json_file
    text = reader.read()
  File "D:\Anaconda3\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Is this error caused by the encoding of the model file?

Godlikemandyy avatar Oct 23 '20 07:10 Godlikemandyy

You can see the model file location here, or get the model file path by reading the README file.

Thanks.

tongchangD avatar Oct 23 '20 09:10 tongchangD

Can the language model be retrained, or trained incrementally?

Godlikemandyy avatar Oct 29 '20 07:10 Godlikemandyy

I ran into this problem too. How exactly did you solve it?

guofengming11 avatar Jan 06 '21 07:01 guofengming11

In the initialization code in bert_corrector.py, keep only the "bert_model_dir" variable; the others are not needed.

Godlikemandyy avatar Jan 06 '21 08:01 Godlikemandyy

So did the model actually get loaded?

xin-w8023 avatar Jan 12 '21 07:01 xin-w8023

I ran into the same problem. How did you solve it?

prozyworld avatar May 21 '21 02:05 prozyworld

I'm on Ubuntu, so the file format may be inconsistent. You could try changing the file format or modifying the code.
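
One way to test the file-format guess is to look at the first bytes of whatever file ends up being read as the config. The sketch below is only an illustration; `describe_config_candidate` is a hypothetical helper, not part of bert_for_corrector or transformers. It relies on two general facts: a pickle stream (protocol 2 or later) begins with byte 0x80, and a zip archive begins with `PK`. If the file reported in the traceback starts with 0x80, it is likely a binary weights file being parsed as JSON rather than a text file saved in the wrong encoding.

```python
import json
from pathlib import Path


def describe_config_candidate(path):
    """Classify a file as UTF-8 JSON, binary, or neither (hypothetical helper)."""
    data = Path(path).read_bytes()
    # Pickle protocol 2+ streams begin with the PROTO opcode, byte 0x80.
    if data[:1] == b"\x80":
        return "binary: starts with 0x80 (looks like a pickle stream)"
    # Zip archives begin with the two bytes "PK".
    if data[:2] == b"PK":
        return "binary: zip archive"
    try:
        json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return "text or binary, but not valid UTF-8 JSON"
    return "utf-8 json"
```

Calling it on each file in the model directory shows which ones can safely be treated as a JSON config and which are weights.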

tongchangD avatar May 21 '21 03:05 tongchangD

Change the statement to self.model = pipeline('fill-mask', model=bert_model_dir, tokenizer=bert_model_dir) and it works. It is probably a version issue.
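
As a rough sketch of that change, assuming `bert_model_dir` is a directory that contains `config.json` alongside the vocabulary and weights: the helper names `check_model_dir` and `build_fill_mask` are made up for illustration and are not from the repository. The directory check is an extra step, not something the fix above requires; it just fails early with a clearer message when a single file is passed instead of the model directory.

```python
import json
from pathlib import Path


def check_model_dir(bert_model_dir):
    """Return the directory as a string if it holds a readable config.json."""
    model_dir = Path(bert_model_dir)
    if not model_dir.is_dir():
        raise ValueError(f"{bert_model_dir!r} is not a directory")
    # Raises if config.json is missing or is not UTF-8 encoded JSON.
    json.loads((model_dir / "config.json").read_text(encoding="utf-8"))
    return str(model_dir)


def build_fill_mask(bert_model_dir):
    """Build the fill-mask pipeline from one directory (hypothetical wrapper)."""
    # Imported lazily so check_model_dir can be used without transformers installed.
    from transformers import pipeline

    checked = check_model_dir(bert_model_dir)
    # Same call as in the comment above: model and tokenizer share the directory.
    return pipeline("fill-mask", model=checked, tokenizer=checked)
```

In `BertCorrector.__init__` this would be used as `self.model = build_fill_mask(bert_model_dir)`.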

DDzzxiaohongdou avatar Oct 13 '21 03:10 DDzzxiaohongdou