OpenNMT-py
Calculating number of train steps with batch type tokens -- potential perf issues?
Hi,
I am using batch_type tokens with max_generator_batches = 2.
I am confused about how to compute the number of train steps when batch_type is tokens.
Can someone guide me through it?
Not sure what you mean. Do you want to know the relation between steps and epochs? If so, that's not necessarily straightforward, especially with batch_type tokens.
Thanks for your quick reply.
Right, I meant the relation between steps and epochs.
I have a corpus with 3M sentence pairs and an average sentence length of ~50 tokens. I am trying a batch_size of 12k with batch_type tokens. Does that mean ~240 sentences are processed in each step?
Currently, a 1-layer LSTM is training very slowly on this data. Can you suggest some optimizations?
> I have a corpus with 3M sentence pairs and an average sentence length of ~50 tokens. I am trying a batch_size of 12k with batch_type tokens. Does that mean ~240 sentences are processed in each step?
That sounds about right, especially with the pooling mechanism, which makes the batches very homogeneous (little padding). You can check against the logs whether it makes sense. (In the legacy version you'll see which shards are loading, and in 2.0 you'll see explicitly which corpus is loaded.)
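As a rough back-of-the-envelope check (just a sketch using the numbers quoted in this thread; it assumes "12k" means 12,288 tokens and ignores padding and any gradient accumulation):

```python
# Back-of-the-envelope estimate for batch_type "tokens".
# All numbers are the ones quoted in this thread, not measured values.

tokens_per_batch = 12_288      # batch_size 12k with batch_type: tokens
avg_sentence_len = 50          # reported average sentence length
corpus_sentences = 3_000_000   # 3M sentence pairs

sentences_per_step = tokens_per_batch / avg_sentence_len   # ≈ 245.76
steps_per_epoch = corpus_sentences / sentences_per_step    # ≈ 12,207

print(f"~{sentences_per_step:.0f} sentences per step")
print(f"~{steps_per_epoch:,.0f} steps per epoch")
```

With accum_count > 1 each optimizer step would cover proportionally more sentences, and padding shifts the effective count a bit, which is why the relation between steps and epochs is only approximate.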
> Currently, a 1-layer LSTM is training very slowly on this data. Can you suggest some optimizations?
You'll have to give more details here. What is your task? Your command line? What machine are you using (CPU/GPU/memory)? What usage do you see (CPU utilization/memory, GPU utilization/memory)?
Sure. I am trying to run a seq2seq autoencoder on a recommendation dataset, where each token is a user activity.
I am using 1 Tesla V100 GPU with 32 GB of memory. The config is:
"overwrite: True
src_vocab_size : 500000
tgt_vocab_size : 500000
share_vocab: True
share_embeddings: True
max_generator_batches: 2
word_vec_size : 128
batch_size: 24576
pool_factor : 1000
batch_type: "tokens"
valid_batch_size: 16
rnn_size: 100
learning_rate : 0.15
layers: 1
dropout: 0
optim: "adam"
normalization: "tokens"
#adagrad_accumulator_init: 0.1
max_grad_norm: 2
copy_attn: 'true'
global_attention: 'mlp'
reuse_copy_attn: 'true'
bridge: 'true'
seed: 42
# Where to save the checkpoints
save_model: user_modelling_data/run_bart/model
save_checkpoint_steps: 1000
train_steps: 100000
valid_steps: 1000
"
Currently, with this config, 26 GB out of the 32 GB of GPU memory are being used.
Not sure it'll change a lot, but you probably want to set bucket_size to a big value, like 200,000.
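Concretely (assuming the YAML config format shown above, and taking 200,000 only as the ballpark value suggested here, not a tuned setting), that would be one extra line in the config: `bucket_size: 200000`.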
What is the approximate GPU utilization? CPU utilization? (If there are two threads constantly at 100%, that might be the bottleneck.) What token/s speed do you see in the logs? What's the approximate wall time for 100 or 1000 steps?
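If it helps for gathering those numbers, here is a minimal sketch (my own suggestion, not part of OpenNMT-py) that samples GPU utilization and memory via nvidia-smi while training runs; plain `watch nvidia-smi` for the GPU and `top`/`htop` for the CPU work just as well:

```python
# Periodically print GPU utilization and memory usage by shelling out
# to nvidia-smi. Adjust the interval and number of samples as needed.
import subprocess
import time

QUERY = "utilization.gpu,utilization.memory,memory.used"

def sample_gpu(interval_s: float = 5.0, samples: int = 12) -> None:
    """Print one CSV line of GPU stats every `interval_s` seconds."""
    for _ in range(samples):
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        print(out.stdout.strip())
        time.sleep(interval_s)

if __name__ == "__main__":
    sample_gpu()
```

The token/s speed and the wall time per 100 or 1000 steps can be read directly from the OpenNMT-py training log lines.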