Error during training

Open yahuuu opened this issue 5 years ago • 2 comments

When I run token_and_save_to_file.py to try training, I get the following error:

Traceback (most recent call last):
  File "C:/work/py/78stars_SpamMessage-master/token_and_save_to_file.py", line 38, in <module>
    data = Pool().map(jieba.lcut, data)
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\pool.py", line 431, in _handle_tasks
    put(task)
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects

My analysis is that the problem is this line: data = Pool().map(jieba.lcut, data)

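To work around it, I replaced that line with: data = [d for d in map(jieba.cut, data)]. With that change, running test.py raises ValueError: dimension mismatch. What is wrong with the multiprocessing part, and how should it be written if I switch to a single process?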

yahuuu, Jan 06 '20 09:01

Depending on what you need, uncomment the relevant statements in the main function and then run it.

zhangmin4215, May 19 '20 07:05

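Since Python 3.6, the function given to the pool has to be a standalone function defined outside any class, so jieba.lcut cannot be run from inside a class directly and the code needs a small change. Of course, you can also run it in a single process.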

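import jieba
from multiprocessing import Pool

# Module-level wrapper, so the pool can pickle the function it sends to workers
def cut_words(data):
    return jieba.lcut(data)

if __name__ == '__main__':
    # data, target and save_tokenlization_result are defined elsewhere in the script
    # Multiprocessing
    pool = Pool(processes=6)
    data = pool.map(cut_words, data)
    save_tokenlization_result(data, target)
    # Single process (lcut returns a list like the pool version; cut returns a generator)
    # data2words = []
    # for words in data:
    #     temp = jieba.lcut(words)
    #     data2words.append(temp)
    # save_tokenlization_result(data2words, target)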
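
For anyone wondering why the wrapper helps, here is a minimal sketch that reproduces the pickling error without jieba. It assumes, as the traceback suggests, that jieba.lcut is a bound method of an object holding a threading.RLock. The Tokenizer class and the can_pickle helper below are hypothetical stand-ins for illustration, not jieba's actual code.

```python
import pickle
import threading


class Tokenizer:
    """Hypothetical stand-in for an object that keeps a lock as an attribute."""

    def __init__(self):
        self.lock = threading.RLock()

    def lcut(self, sentence):
        # Whitespace split stands in for real word segmentation
        with self.lock:
            return sentence.split()


dt = Tokenizer()
lcut = dt.lcut  # bound method: pickling it also pickles dt, including the lock


def cut_words(sentence):
    # Module-level function: pickled by its name, so the lock is never serialized
    return lcut(sentence)


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False
```

When this is run as a script, can_pickle(lcut) should come back False and can_pickle(cut_words) True. Pool.map has to pickle the function before sending it to the worker processes, which would explain why passing the bound method fails while passing the wrapper works.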

hsipeng, Jan 15 '21 02:01