HeterSumGraph

Hello, a memory problem

Open • Sniper970119 opened this issue 4 years ago • 6 comments

Hi, since there are so many files to read, my 16 GB of RAM can't keep up. What can I modify to reduce memory consumption? (RAM, not GPU memory.)

Sniper970119 commented on Jun 06 '20

Hi! We strongly recommend running our code on a GPU server with larger CPU memory (at least 32 GB). If that is not possible, you can split the original dataset by line into several smaller files and change the ExampleSet in https://github.com/brxx122/HeterSumGraph/blob/master/module/dataloader.py to lazily load the dataset on demand.
To be specific, move readJson() from __init__() to get_example() and read files by id. You can also implement a batch-like mechanism yourself. Don't forget to split the processed feature files in the cache directory as well, so that they correspond to the dataset files.
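
For illustration, here is a minimal sketch of what such a lazy-loading dataset could look like. The per-chunk file naming (test.0.jsonl, test.1.jsonl, ...) and the chunk_size bookkeeping are assumptions for the example, not the repo's actual layout:

```python
# Hypothetical sketch: read one example from disk on demand instead of
# loading the whole dataset in __init__. Assumes the original JSONL file
# has been split into chunk files named <prefix>.<chunk_id>.jsonl.
import json
import os

from torch.utils.data import Dataset


class LazyExampleSet(Dataset):
    def __init__(self, data_dir, prefix, num_examples, chunk_size):
        self.data_dir = data_dir
        self.prefix = prefix
        self.num_examples = num_examples
        self.chunk_size = chunk_size  # examples per split file

    def __len__(self):
        return self.num_examples

    def get_example(self, index):
        # Map the global example id to (chunk file, line offset within it).
        chunk_id, offset = divmod(index, self.chunk_size)
        path = os.path.join(self.data_dir, "%s.%d.jsonl" % (self.prefix, chunk_id))
        with open(path, encoding="utf-8") as f:
            for i, line in enumerate(f):
                if i == offset:
                    return json.loads(line)
        raise IndexError(index)

    def __getitem__(self, index):
        return self.get_example(index)
```

The same index-to-chunk mapping can then be applied to the processed feature files in the cache directory so that each chunk stays paired with its features.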

dqwang122 commented on Jun 23 '20

Thanks. I tried splitting the test dataset and running evaluation.py, but after it reads all the files, an error appears and it stops.

Log:

```
2020-06-23 15:29:10,197 INFO : Pytorch 1.5.0
2020-06-23 15:29:10,198 INFO : [INFO] Create Vocab, vocab path is cache/CNNDM\vocab
2020-06-23 15:29:10,425 INFO : [INFO] max_size of vocab was specified as 50000; we now have 50000 words. Stopping reading.
2020-06-23 15:29:10,425 INFO : [INFO] Finished constructing vocabulary of 50000 total words. Last word added: chaudhary
2020-06-23 15:29:10,698 INFO : [INFO] Loading external word embedding...
2020-06-23 15:30:43,961 INFO : [INFO] External Word Embedding iov count: 49699, oov count: 301
2020-06-23 15:30:44,299 INFO : Namespace(atten_dropout_prob=0.1, batch_size=8, bidirectional=True, blocking=False, cache_dir='cache/CNNDM', cuda=True, data_dir='data/CNNDM', doc_max_timesteps=50, embed_train=False, embedding_path='/remote/glove.txt', feat_embed_size=50, ffn_dropout_prob=0.1, ffn_inner_hidden_size=512, gcn_hidden_size=128, gpu='0', hidden_size=64, limited=False, log_root='log/', lstm_hidden_state=128, lstm_layers=2, m=3, model='HSG', n_feature_size=128, n_head=8, n_iter=1, n_layers=1, recurrent_dropout_prob=0.1, save_label=False, save_root='save/', sent_max_len=100, test_model='evalcnndm.ckpt', use_orthnormal_init=True, use_pyrouge=False, vocab_size=50000, word_emb_dim=300, word_embedding=True)
2020-06-23 15:30:44,511 INFO : [MODEL] HeterSumGraph
2020-06-23 15:30:44,511 INFO : [INFO] Start reading ExampleSet
2020-06-23 15:30:44,609 INFO : [INFO] Finish reading ExampleSet. Total time is 0.097944, Total size is 1
2020-06-23 15:30:44,609 INFO : [INFO] Loading filter word File cache/CNNDM\filter_word.txt
2020-06-23 15:30:45,045 INFO : [INFO] Loading word2sent TFIDF file from cache/CNNDM\test.w2s.tfidf_small.jsonl!
2020-06-23 15:30:56,121 INFO : [INFO] Use cuda
2020-06-23 15:30:56,121 INFO : [INFO] Decoding...
2020-06-23 15:30:56,127 INFO : [INFO] Restoring evalcnndm.ckpt for testing...The path is save/eval\cnndm.ckpt
2020-06-23 15:30:56,830 INFO : [Model] Sequence Labeling!
```

Error (the worker process's traceback, followed by the main process's traceback):

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "\torch\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "\torch\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "\torch\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "torch\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "\torch\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "\torch\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "\torch\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "\HeterSumGraph-master\evaluation.py", line 26, in <module>
    import torch
  File "\torch\lib\site-packages\torch\__init__.py", line 81, in <module>
    ctypes.CDLL(dll)
  File "\torch\lib\ctypes\__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 1455] 页面文件太小,无法完成操作。 (The paging file is too small for this operation to complete.)

Traceback (most recent call last):
  File "/HeterSumGraph-master/evaluation.py", line 242, in <module>
    main()
  File "/HeterSumGraph-master/evaluation.py", line 239, in main
    run_test(model, dataset, loader, hps.test_model, hps)
  File "/HeterSumGraph-master/evaluation.py", line 85, in run_test
    for i, (G, index) in enumerate(loader):
  File "torch\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "\torch\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "\torch\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "\torch\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "\torch\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "\torch\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "\torch\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Process finished with exit code 1
```

What should I do? Does this mean I am short on memory? But it only uses about 60% of RAM and maybe 10% of CUDA memory.

Sniper970119 commented on Jun 23 '20

According to the error message, the problem is likely caused by storing each example as its own file, which is too small for the multiprocessing workers. I think a chunk of examples (say, 2000) per file would be more suitable. You can refer to others' preprocessing code here: https://github.com/nlpyang/PreSumm/blob/master/src/prepro/data_builder.py
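
For reference, a simple chunking sketch in that spirit (the file paths below are placeholders for this example, not the repo's actual paths):

```python
# Hypothetical helper: split one large JSONL dataset file into chunk files
# of 2000 examples each, e.g. test.0.jsonl, test.1.jsonl, ...
CHUNK_SIZE = 2000  # examples per chunk, as suggested above

def chunk_jsonl(src_path, dst_pattern):
    chunk, chunk_id = [], 0
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            chunk.append(line)
            if len(chunk) == CHUNK_SIZE:
                with open(dst_pattern % chunk_id, "w", encoding="utf-8") as dst:
                    dst.writelines(chunk)
                chunk, chunk_id = [], chunk_id + 1
    if chunk:  # flush the final partial chunk
        with open(dst_pattern % chunk_id, "w", encoding="utf-8") as dst:
            dst.writelines(chunk)

chunk_jsonl("data/CNNDM/test.label.jsonl", "data/CNNDM/test.%d.jsonl")
```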

dqwang122 commented on Jul 31 '20

But I tried on a machine with 32 GB of RAM (without splitting the dataset) and still got a similar error. Can the code only be run on Linux?

Sniper970119 commented on Aug 01 '20

Hi, have you solved the dataset-splitting problem? Could I take a look at your code?

GongShuai8210 commented on Mar 13 '21


See my repo ext-sum

JainitBITW commented on Mar 21 '24