
Memory Explosion

Open pedrocolon93 opened this issue 6 years ago • 10 comments

Hi there,

I have been trying to run this for the past couple of weeks, and it seems you need a fairly beefy machine to get through the process. In the reader.py code, where multiprocessing is used, the planner at some point either recurses very deeply or explores too many branches, and the system eventually runs out of memory. I've tried this on a machine with 64 GB of RAM and it eats that up too. The problem occurs around iteration 890 in the create_plans method of the DataReader class. I'm going to try to debug this and see if I can limit the depth, size, or running time, treating a timeout as the lack of a feasible plan. A small memory optimization is to replace Pool with ThreadPool, as sketched below.
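For reference, here is a minimal sketch of what I mean; the actual Pool usage in reader.py may look different, and plan_single_graph is just a placeholder for whatever worker function gets mapped over the graphs:

```python
# Minimal sketch only -- the real reader.py differs; plan_single_graph is a
# placeholder for whatever function is mapped over the graphs.
from multiprocessing.pool import ThreadPool  # instead of multiprocessing.Pool

def create_plans_parallel(graphs, plan_single_graph, workers=4):
    # Threads share the parent process's memory instead of forking full
    # copies of it, so large shared structures are not duplicated per worker.
    with ThreadPool(workers) as pool:
        return pool.map(plan_single_graph, graphs)
```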

pedrocolon93 avatar Apr 29 '19 15:04 pedrocolon93

So sorry about that! You are correct, I am running this on a beefy server...

Going over the plans with an iterator is possible, but it shouldn't be needed. Here are 2 possible solutions (a rough sketch follows the list):

  • In the following line, you can change is_parallel to False. This will plan only one graph at a time, so if your graphs are not too large it should work, but it will take a long time (about 1.5 hours for the WebNLG test set): https://github.com/AmitMY/chimera/blob/8f6df059868b52245267a327ad296f9bb1b72d67/planner/naive_planner.py#L10
  • Another possible solution, which works for any graph size and takes roughly 0 seconds per graph, is to use the NeuralPlanner instead: https://github.com/AmitMY/chimera/blob/master/planner/neural_planner.py This planner is part of ongoing research on online planning and on avoiding the need for "experts" to score plans. On WebNLG, the current version performs as well in terms of automatic metrics, but it hasn't been tested with human evaluation. There will be updates to it in the weeks to come! To switch to the Neural Planner, uncomment this line: https://github.com/AmitMY/chimera/blob/8f6df059868b52245267a327ad296f9bb1b72d67/main.py#L45 and change the config in this line: https://github.com/AmitMY/chimera/blob/8f6df059868b52245267a327ad296f9bb1b72d67/main.py#L48 Also, you need to remove the directory cache/WebNLG/planner if it exists, so the new planner can initialize.
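For concreteness, here is a rough sketch of what the two options amount to; the names and constructor arguments are illustrative, and the real lines are the ones linked above:

```python
# Illustrative sketch only -- see the linked lines in main.py for the real code.
# Option 1: keep the NaivePlanner but plan one graph at a time
#   (in planner/naive_planner.py, set is_parallel = False).
# Option 2: switch to the NeuralPlanner:
from planner.neural_planner import NeuralPlanner

planner = NeuralPlanner()  # constructor arguments omitted; see main.py's config

# Then delete the stale planner cache (if it exists) so the new planner
# initializes from scratch:
#   rm -rf cache/WebNLG/planner
```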

The planner takes around 15 seconds to train on my machine. The inference time for 1,900 graphs of various sizes is 7 seconds on my machine.

The Neural Planner is built in such a way it can't create a wrong plan (it has to follow a structure, and has to include every fact from the graph).


Feel free to suggest any improvements to the naive planner / neural planner, or let me know if anything is not working for you, and I'll sort it out.

AmitMY avatar Apr 29 '19 19:04 AmitMY

Awesome! I'm testing it out with the neural planner and I'll get back to you on what happens!

pedrocolon93 avatar Apr 29 '19 20:04 pedrocolon93

Just to make sure you know: you don't need to restart the entire thing for the change to take effect, only remove cache/WebNLG/planner, so your translation model and pre-processing are still cached.

AmitMY avatar Apr 29 '19 21:04 AmitMY

Yup, I noticed! I tried to put it into the server to see if I could visualize it, but the neural planner is missing the score method implementation. I'm guessing it's missing the score method because it just generates the plan directly.

pedrocolon93 avatar Apr 30 '19 20:04 pedrocolon93

Yeah... I still need to do that part. That is part of why it is not documented - it's ongoing research. And in the neural planner's case, getting the "best" plan is very fast, but scoring all plans would probably be slow.


AmitMY avatar Apr 30 '19 21:04 AmitMY

If it gets the best plan, then that's the highest score (and if there were a way to vectorize the plans, you could scale the score according to the closest plan vectors). Another thought: maybe something like the discriminator in a GAN could work (just some shower thoughts).

pedrocolon93 avatar Apr 30 '19 21:04 pedrocolon93

Also, even if you set parallel to False, you still have the memory explosion at some point in the process, so I guess filtering the graph somehow to remove redundancies or make it smaller will have to be a thing. I'll see if I can come up with something.

pedrocolon93 avatar Apr 30 '19 23:04 pedrocolon93

Ok, so I'll explain what I think a solution can be. I did implement it, but I won't have time to test it until the end of NAACL (mid-June).

The "best_plan" planning is done in 3 stages:

  1. Create a tree of possible traversals over the graph. While possibly large, this shouldn't be a problem in terms of memory.
  2. Linearize the tree into every traversal it contains, and create a string for each one. This is very expensive.
  3. Score all plans and choose the best one.

Step 1 is not a problem. Step 2, however, can be changed to yield instead of return. In the following code: https://github.com/AmitMY/chimera/blob/8f6df059868b52245267a327ad296f9bb1b72d67/utils/graph.py#L74-L85 lines 76, 80, 83, and 85 build full lists; they can instead produce values lazily (generators with parentheses instead of list brackets), as in this version: https://github.com/AmitMY/chimera/blob/e939eac253b8a418356fd49a4d2bddcfdf358139/utils/graph.py#L74-L85 I have done the same for StructuredNode, which is a tiny bit more complex, but it's the same idea.
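To make the pattern concrete, here is a toy illustration of the eager-list vs. generator versions; this is not the actual utils/graph.py code, just the shape of the change:

```python
# Toy illustration of the change, not the actual utils/graph.py code.

def traversals_eager(node, children):
    # Original pattern: materialize the full list of traversal strings.
    if not children(node):
        return [str(node)]
    return [str(node) + " " + rest
            for child in children(node)
            for rest in traversals_eager(child, children)]

def traversals_lazy(node, children):
    # Generator pattern: yield traversals one at a time, so memory stays
    # proportional to the recursion depth rather than the number of plans.
    if not children(node):
        yield str(node)
        return
    for child in children(node):
        for rest in traversals_lazy(child, children):
            yield str(node) + " " + rest
```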

Step 3 now becomes the problem, as it does not work with generators: https://github.com/AmitMY/chimera/blob/8f6df059868b52245267a327ad296f9bb1b72d67/planner/naive_planner.py#L31-L36

So I changed it as well, to the most basic implementation of an O(1)-memory MAX function: https://github.com/AmitMY/chimera/blob/b69194ee70be9f9977ea415fee1b59f1f3e71bbf/planner/naive_planner.py#L34-L41
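In other words, something along these lines; this is a minimal sketch, assuming a score function that maps a plan string to a number, while the real code is in the commit linked above:

```python
# Minimal sketch of an O(1)-memory max over a generator of plans;
# `score` is assumed to map a plan string to a number.
def best_plan(plans, score):
    best, best_score = None, float("-inf")
    for plan in plans:  # `plans` can be a generator; nothing is materialized
        plan_score = score(plan)
        if plan_score > best_score:
            best, best_score = plan, plan_score
    return best
```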

If you want to test this, it is on the development branch, but it is not stable (results will be lower than what you expect, or it just won't work). If this is not pressing for you, I would wait, but if it is, you can also just take the last 3 commits to that branch, which are only these changes.

AmitMY avatar May 01 '19 12:05 AmitMY

I have implemented a fast scorer for the neural planner: around 500 plans per second on average, depending on how long they are. I will push it in the next few days, so you don't need to copy all of the code above :)

AmitMY avatar May 03 '19 17:05 AmitMY

I haven't had a chance to test it yet, but awesome!

pedrocolon93 avatar May 07 '19 16:05 pedrocolon93