neural-editor
Why is training always killed without any error information?
Hi,
My training is always killed without any error message, as shown below.
uncomitted changes being stored as patches
New TrainingRun created at: /data/edit_runs/7
Optimized batches: reduced cost from 45709568 (naive) to 20758016 (0.545871533942% reduction).
Optimal (batch_size=1) would be 20741962.
Passed batching test
Streaming training examples: 6%|5 | 399/7032 [48:47<12:31:31, 6.80s/it]Killed
I am encountering this issue as well. Running with the edit_logp
config, the process is consistently killed at the same point with the following output:
[localhost] local: wc -l /data/yelp_dataset_large_split/train.tsv
Reading data file.: 20%|#############4
Reading data file.: 26%|#################3
Killed
The same issue is occurring with other configs as well.
I have the same issue. Training is consistently killed.
[localhost] local: wc -l /data/onebillion_split/train.tsv
Reading data file.: 17%|##############1 | 582582/3506331 [02:43<19:10:00, 42.37it/s]
Reading data file.: 17%|##############5 | 594704/3506331 [02:44<39:10, 1238.52it/s]
Killed
Looks like this is a memory issue. I ran it on my cluster and it ran fine.
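For anyone who wants to confirm that on their own machine: a bare "Killed" with no traceback is typically the kernel OOM killer. Below is a minimal diagnostic sketch, not part of neural-editor, that assumes Linux and that memory is indeed the cause. With a PID argument it polls that process's resident memory via /proc; with no arguments (after a killed run) it greps the kernel log for OOM entries. dmesg may need elevated permissions on some systems.

```python
# Minimal diagnostic sketch (not part of neural-editor), assuming Linux.
# With a PID argument it polls that process's resident memory via /proc;
# with no arguments it greps the kernel log for OOM-killer entries.
import subprocess
import sys
import time

def oom_messages():
    # The OOM killer logs lines such as "Out of memory: Kill process ...".
    # dmesg may require elevated permissions on some systems.
    log = subprocess.check_output(["dmesg"]).decode("utf-8", errors="replace")
    return [l for l in log.splitlines()
            if "Out of memory" in l or "oom-killer" in l]

def rss_mb(pid):
    # VmRSS in /proc/<pid>/status is reported in kilobytes.
    with open("/proc/{}/status".format(pid)) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0
    return 0.0

if __name__ == "__main__":
    if len(sys.argv) > 1:
        pid = int(sys.argv[1])  # PID of the running training process
        while True:
            try:
                print("RSS: {:.0f} MB".format(rss_mb(pid)))
            except IOError:
                print("Process {} has exited.".format(pid))
                break
            time.sleep(30)
    else:
        # Run this after the process was killed.
        for msg in oom_messages():
            print(msg)
```

If the kernel log shows an OOM entry for the training process, the fix is more RAM (or a machine/cluster node with more memory), not a code change.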
@yamsgithub Hello, did you set up this project by running "run_docker.py"? Because of network issues I could not run it successfully, so I installed all the packages one by one, and now I am encountering a git-related issue like this:
Traceback (most recent call last):
File "textmorph/edit_model/main.py", line 34, in
It seems like a path problem. However, the issue still exists after I create the master folder in refs/heads/.
@Vonzpf Yes. I am following the instructions in the README and didn't have any issues. However, without a GPU the training has been running for 3 days now and is only about 36% complete (so a full run would take over a week on CPU), so I would recommend using GPUs; hopefully that is faster. This is on the one-billion-word dataset.
@yamsgithub Did you load any other modules besides PyTorch and Python when you ran the code on the cluster?
@luciay I just used the Docker image, which sets up all the dependencies. I didn't have to install anything else on my machine except Docker.
@luciay If you are running on a cluster, I would recommend creating a virtual environment and letting the Docker setup install all the packages in that environment.
@yamsgithub Thank you! I have luckily solved that problem. This project needs git to record the code's state. I had initialized the repo in my folder "/neural-editor/", but I forgot to add and commit the code, so I just needed to run "git add ." and "git commit" in "/neural-editor/" to solve the problem.
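For anyone else who hits the same traceback, a quick way to check whether the repository is in the state the run-tracking code expects (an initialized repo with at least one commit, so that HEAD resolves) is something like the sketch below. This is just an illustration, not project code; the repo path is the example folder from this thread.

```python
# Minimal check (not part of neural-editor): verify the repository has a
# commit that HEAD resolves to, which is what recording the code's state needs.
import subprocess

REPO = "/neural-editor/"  # example path from this thread; adjust as needed

def head_commit(repo):
    try:
        out = subprocess.check_output(["git", "-C", repo, "rev-parse", "HEAD"])
        return out.decode().strip()
    except subprocess.CalledProcessError:
        return None

if __name__ == "__main__":
    commit = head_commit(REPO)
    if commit is None:
        print("No commit found; run 'git add .' and 'git commit' in {}".format(REPO))
    else:
        print("HEAD is at commit {}".format(commit))
```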
@yamsgithub I spoke with @luciay and she shared her batch script, which runs on the Prince cluster with Singularity instead of Docker, on CPU. I then made some modifications so it runs with a GPU on the Prince cluster. You can see my fork here -> https://github.com/JackLangerman/neural-editor
Hope this helps people!