bi-att-flow

How to handle a large context with Machine Comprehension (bi-att-flow) & ResourceExhaustedError

Open · rubby33 opened this issue on Apr 7, 2017 · 6 comments

I have a test file named mytest1.json. Its context is a large text containing a great many words.

When I run the following: `basic/run_single.sh $HOME/data/squad/mytest1.json single.json`

the errors below occur. Do you have any idea how to solve this? Thanks so much.

```
File "/home/weijiang/bi-att-flow/inference/main.py", line 29, in main
  eval_data = _forward(config, data, shared)
File "/home/weijiang/bi-att-flow/inference/main.py", line 88, in _forward
  models = get_multi_gpu_models(config)
File "/home/weijiang/bi-att-flow/inference/model.py", line 19, in get_multi_gpu_models
  model = Model(config, scope, rep=gpu_idx == 0)
File "/home/weijiang/bi-att-flow/inference/model.py", line 58, in __init__
  self._build_forward()
File "/home/weijiang/bi-att-flow/inference/model.py", line 164, in _build_forward
  p0 = attention_layer(config, self.is_train, h, u, h_mask=self.x_mask, u_mask=self.q_mask, scope="p0", tensor_dict=self.tensor_dict)
File "/home/weijiang/bi-att-flow/inference/model.py", line 421, in attention_layer
  u_a, h_a = bi_attention(config, is_train, h, u, h_mask=h_mask, u_mask=u_mask, tensor_dict=tensor_dict)
File "/home/weijiang/bi-att-flow/inference/model.py", line 398, in bi_attention
  is_train=is_train, func=config.logit_func, scope='u_logits')  # [N, M, JX, JQ]
File "/home/weijiang/bi-att-flow/my/tensorflow/nn.py", line 127, in get_logits
  new_arg = args[0] * args[1]
File "/home/weijiang/anaconda2/envs/tensorflow-0.11-py3.5/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 751, in binary_op_wrapper
  return func(x, y, name=name)
File "/home/weijiang/anaconda2/envs/tensorflow-0.11-py3.5/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 910, in _mul_dispatch
  return gen_math_ops.mul(x, y, name=name)
File "/home/weijiang/anaconda2/envs/tensorflow-0.11-py3.5/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1519, in mul
  result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
File "/home/weijiang/anaconda2/envs/tensorflow-0.11-py3.5/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 749, in apply_op
  op_def=op_def)
File "/home/weijiang/anaconda2/envs/tensorflow-0.11-py3.5/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2380, in create_op
  original_op=self._default_original_op, op_def=op_def)
File "/home/weijiang/anaconda2/envs/tensorflow-0.11-py3.5/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1298, in __init__
  self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,1,34680,7,200]
  [[Node: model_0/main/p0/bi_attention/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](model_0/main/p0/bi_attention/Tile, model_0/main/p0/bi_attention/Tile_1)]]
  [[Node: model_0/main/g2/BW/BW/Assert/AssertGuard/Assert/Switch/_333 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_4307_model_0/main/g2/BW/BW/Assert/AssertGuard/Assert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
```

rubby33 · Apr 07 '17 08:04

In fact, my question is: "how to handle a large context with Machine Comprehension (bi-att-flow)?" I think this is a very interesting problem. Thanks.

rubby33 · Apr 07 '17 09:04

Interesting question,

Does your setup work with smaller chunks of text, like the ones used for training? I just want to make sure the code itself works.

Recurrent layers tend to run into memory problems when dealing with very long sequences. Here are a few possible solutions:

  • Split your text into smaller chunks like the ones used for training; then you won't get OOM (out of memory) errors. The model was trained with the following flags to control input size and memory usage (you can find them in cli.py):
# Thresholds for speed and less memory usage
flags.DEFINE_integer("word_count_th", 10, "word count th [100]")
flags.DEFINE_integer("char_count_th", 50, "char count th [500]")
flags.DEFINE_integer("sent_size_th", 400, "sent size th [64]")
flags.DEFINE_integer("num_sents_th", 8, "num sents th [8]")
flags.DEFINE_integer("ques_size_th", 30, "ques size th [32]")
flags.DEFINE_integer("word_size_th", 16, "word size th [16]")
flags.DEFINE_integer("para_size_th", 256, "para size th [256]")
  • Get a machine with more memory.
  • Reduce memory usage by changing the floating-point precision. I think the model was trained in FP32, so you could try FP16, but this depends on the hardware as well; see the sketch after this list.
  • Change the architecture to use an external memory and overcome the memory bottleneck.
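To illustrate the precision idea from the third bullet: casting activations to half precision halves the per-element memory, but whether the FP32-trained checkpoint and your GPU tolerate it is something you would have to verify. The snippet below is only a sketch of the cast itself (written against the TF 1.x-style API this repo uses), not a drop-in change to the model:

```python
import tensorflow as tf

# A float32 activation like the encoder output: 4 bytes per element.
h = tf.placeholder(tf.float32, shape=[None, None, 200], name="h")

# The same tensor in float16 uses 2 bytes per element, roughly halving the
# memory of every downstream tensor built from it. Ops that do not support
# float16 (or a checkpoint saved in float32) would need extra handling.
h_fp16 = tf.cast(h, tf.float16)
```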

Do you have any other ideas @seominjoon ?

webeng · Apr 07 '17 10:04

@webeng You are right that you can control those params to avoid OOM during training, but I think he is using the pre-trained model and only running inference.

Looking at the error log, I am assuming that you have a single example and your context is 34,680 words. Unfortunately, you can only fit around 60*500 words, so it is a little over the limit.
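For a rough sense of where the memory goes: the tensor in the OOM message has shape [1, 1, 34680, 7, 200], which corresponds to (batch, sentences, context length, question length, encoder output size) in the bi-attention layer. A back-of-the-envelope calculation (just a sketch, not from the repo) shows why this blows up:

```python
# Size of one tiled bi-attention tensor from the error log, in float32 (4 bytes each).
N, M, JX, JQ, d = 1, 1, 34680, 7, 200            # shape reported in the OOM message
bytes_per_tensor = N * M * JX * JQ * d * 4
print(bytes_per_tensor / 1024.0 ** 2)            # ~185 MB per tensor

# bi_attention tiles both inputs to this shape and multiplies them, so several
# such tensors (plus the LSTM activations over 34,680 steps) have to coexist on
# the GPU at once. A 60 * 500 = 30,000-word context is roughly where everything
# still fits on this particular card.
print(60 * 500)                                   # 30000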

The easiest thing you can do is run this on the CPU. Then you won't have a memory error (though it will take longer).
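One way to force CPU execution without touching the model code is to hide the GPUs from TensorFlow before it initializes; where exactly you put this is up to you, so treat the snippet as a sketch:

```python
import os

# Hiding all GPUs makes TensorFlow place every op on the CPU, trading speed for
# the much larger host RAM. This must run before TensorFlow creates its first
# session, so put it at the very top of the script you launch (or export
# CUDA_VISIBLE_DEVICES="" in the shell before calling run_single.sh).
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```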

Another way is to split the context into a few chunks (for this example, splitting in two seems to be enough) and copy the question for each chunk, so that every chunk gets the same question. Then you can treat each (chunk, question) pair as an independent example and run inference on it (batch_size 1).
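A minimal sketch of that splitting step, assuming mytest1.json follows the SQuAD JSON layout (data → paragraphs → context / qas); the chunk count, id suffixes, and helper names here are all hypothetical:

```python
import json

def split_context(context, num_chunks=2):
    """Split a context string into roughly equal word chunks."""
    words = context.split()
    size = max(1, (len(words) + num_chunks - 1) // num_chunks)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def make_chunked_dataset(in_path, out_path, num_chunks=2):
    with open(in_path) as f:
        data = json.load(f)
    paragraphs = []
    for article in data["data"]:
        for para in article["paragraphs"]:
            for k, chunk in enumerate(split_context(para["context"], num_chunks)):
                # Copy the same questions into every chunk, but give each copy a
                # distinct id so the per-chunk answers can be told apart later.
                qas = [dict(qa, id="%s_chunk%d" % (qa["id"], k)) for qa in para["qas"]]
                paragraphs.append({"context": chunk, "qas": qas})
    with open(out_path, "w") as f:
        json.dump({"version": "1.1",
                   "data": [{"title": "chunked", "paragraphs": paragraphs}]}, f)

make_chunked_dataset("mytest1.json", "mytest1_chunked.json", num_chunks=2)
```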

Then, in your output, you will have two answers with confidence levels. You can compare them and take the more confident one. The only caveat is that, for the confidence score, you shouldn't use the probability output in the answer folder, because it is locally normalized via softmax. Instead, you will need to use the logits, which are unnormalized; these are written to the eval folder.
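Picking the more confident answer across chunks could then look roughly like this; the exact layout of what gets dumped to the eval folder differs between versions of the code, so the per-chunk tuples below are placeholders you would fill from those files:

```python
def chunk_confidence(start_logit, end_logit):
    """Unnormalized confidence of a chunk's best span: the sum of its start and
    end logits. Unlike the softmax probabilities in the answer folder, these are
    comparable across chunks because they are not normalized within each chunk."""
    return start_logit + end_logit

# Hypothetical per-chunk results: (answer_text, best start logit, best end logit),
# read from whatever the code writes to the eval folder for each chunk.
chunks = [
    ("in 1876", 7.3, 6.9),
    ("at the main campus", 4.1, 3.8),
]

best = max(chunks, key=lambda c: chunk_confidence(c[1], c[2]))
print(best[0])  # answer from the most confident chunk
```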

Of course, these are the easiest ways that don't require modifying the code. You could modify it to use multiple GPUs, etc., but I think that would be more difficult.

I will leave this issue open as a possible feature for the future.

seominjoon · Apr 07 '17 18:04

Good idea to use the unnormalised logits before the softmax 👍

webeng · Apr 07 '17 19:04

Thanks. @webeng The code works well with smaller chunks of text like the ones used for training. @seominjoon Yes, the context has 34,680 words.

Thanks for the good idea.

By the way, how can I know that it fits around 60*500 words? From the parameters defined in cli.py?

rubby33 · Apr 09 '17 11:04

@rubby33 It is a trial-and-error thing; I usually figure it out by watching the memory usage and seeing whether it gives OOM. 30k is a rough estimate.

seominjoon · Apr 10 '17 20:04