SpamMessage
Error during training
When I run the script to try training, it fails with the following error:

Traceback (most recent call last):
  File "C:/work/py/78stars_SpamMessage-master/token_and_save_to_file.py", line 38, in
    data = Pool().map(jieba.lcut, data)
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\pool.py", line 431, in _handle_tasks
    put(task)
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects

I believe the problem is this line: data = Pool().map(jieba.lcut, data)

To work around it, I replaced that line with: data = [d for d in map(jieba.cut, data)], but then running test.py fails with ValueError: dimension mismatch. What is wrong with the multithreaded part (the Pool call), and how would I write it single-threaded instead?
Depending on your needs, uncomment the relevant statements in the main function and then run it.
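Since Python 3.6, the function handed to the pool needs to be an external (module-level) function; you cannot run jieba.lcut directly inside a class, so the code needs a small change. You can of course also use the single-threaded version.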
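from multiprocessing import Pool

import jieba


def cut_words(data):
    # Module-level wrapper around jieba.lcut, so the pool can pickle it by name
    return jieba.lcut(data)


if __name__ == '__main__':
    # data, target and save_tokenlization_result come from the original script
    # Parallel version (process pool)
    pool = Pool(processes=6)
    data = pool.map(cut_words, data)
    save_tokenlization_result(data, target)
    # Single-threaded version
    # data2words = []
    # for words in data:
    #     temp = jieba.lcut(words)  # lcut returns a list; jieba.cut returns a one-shot generator
    #     data2words.append(temp)
    # save_tokenlization_result(data2words, target)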