Using XLNetModel class for inference
Hi, I want to use the XLNetModel class just like in the example on the main page of this repo to get the embedding vectors (i.e., perform inference). I don't want to train or fine-tune, just run inference. However, if I set the run configuration with is_training=False and is_finetune=False (to my understanding this should stop the model from training), I am told that I'm missing the flags mem_len, reuse_len, bi_data, clamp_len, and same_length. I have not seen an example with the proper values for these variables, nor do they seem relevant for prediction only. What is the correct way to run inference with this model?
I thought it would work by setting is_training=False, but I'm still getting different results for the same input on each run, which made me suspect that the model is training (or at least fine-tuning).
Thanks!
To answer your first question from a non-interpretive stance, see the code below that is run whenever you create a config:
```python
def create_run_config(is_training, is_finetune, FLAGS):
  kwargs = dict(
      is_training=is_training,
      use_tpu=FLAGS.use_tpu,
      use_bfloat16=FLAGS.use_bfloat16,
      dropout=FLAGS.dropout,
      dropatt=FLAGS.dropatt,
      init=FLAGS.init,
      init_range=FLAGS.init_range,
      init_std=FLAGS.init_std,
      clamp_len=FLAGS.clamp_len)

  if not is_finetune:
    kwargs.update(dict(
        mem_len=FLAGS.mem_len,
        reuse_len=FLAGS.reuse_len,
        bi_data=FLAGS.bi_data,
        clamp_len=FLAGS.clamp_len,
        same_length=FLAGS.same_length))

  return RunConfig(**kwargs)
```
The mem_len, reuse_len, bi_data, clamp_len, and same_length flags are read whenever is_finetune is false, regardless of whether is_training is true. That is why the code requires them in your configuration.
I am not sure why this is the case, though. We should wait for an author to answer this, I suppose.
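In the meantime, one way to satisfy the requirement is to construct the RunConfig directly with inference-friendly values, since create_run_config above just forwards these keyword arguments to it. This is only a sketch; the last five values are my assumptions for plain single-segment inference, not documented recommendations:

```python
import xlnet

run_config = xlnet.RunConfig(
    is_training=False,
    use_tpu=False,
    use_bfloat16=False,
    dropout=0.0,        # no dropout at inference time
    dropatt=0.0,        # no attention dropout either
    init="normal",
    init_range=0.1,
    init_std=0.02,
    mem_len=0,          # assumption: no cached memory from previous segments
    reuse_len=0,        # assumption: no tokens reused across segments
    bi_data=False,      # assumption: no bidirectional data pipeline
    clamp_len=-1,       # assumption: -1 disables relative-position clamping
    same_length=False)  # assumption: no fixed attention length per token
```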
Thanks @kmeng01, I had already noticed that in the code. However, I fail to understand the purpose of these flags and their connection to running inference. They seem more related to training than to inference, which is why I don't understand why they're required... You're right, let's wait for an author to respond on this...
EDIT: OK, I found out that I can get the same vector consistently if I set TensorFlow's random seed with tf.compat.v1.random.set_random_seed(42). This is quite strange, though: where is randomization used in inference, and why? The output of prediction should usually be deterministic, AFAIK.
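For anyone hitting the same thing, a minimal sketch of the workaround (my assumption: the seed has to be set before the model graph is built, so that variable initializers pick it up):

```python
import tensorflow as tf

# Fix the graph-level seed before constructing XLNetModel, so that any
# randomly initialized variables get identical values on every run.
tf.compat.v1.random.set_random_seed(42)
```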
Don't forget that the get_pooled_out method applies a linear projection at the end of the pooling process by default:

```python
get_pooled_out(summary_type, use_summ_proj=True)
```

This adds an additional Dense layer. If you are using the default run configuration, its weights are randomly initialized from a normal distribution. That is why you see different outputs each time. You can turn this setting off.
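For illustration, a sketch of both variants (assuming xlnet_model is an xlnet.XLNetModel instance built from your run config, and "last" is the summary type you want):

```python
# With the projection: an extra Dense layer, randomly initialized
# unless trained, is applied on top of the pooled hidden state.
pooled_proj = xlnet_model.get_pooled_out(summary_type="last",
                                         use_summ_proj=True)

# Without the projection: the raw pooled hidden state, deterministic
# given fixed checkpoint weights.
pooled_raw = xlnet_model.get_pooled_out(summary_type="last",
                                        use_summ_proj=False)
```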
@alexpnt Thank you!! This is exactly what I was missing! Could you please explain the purpose of this linear projection and what difference it makes whether I use it or not? I see the output vector has the same dimensions regardless of the projection, so I don't fully understand its meaning... Also, which run configuration should I use so that the weights aren't initialized randomly? Thanks!
You can set the seed to a fixed value to avoid the randomness. The initializer is controlled by the FLAGS.init flag, which can be 'uniform' or 'normal'.
That additional layer is used to build rich features that you can use later.
I was using seq_out, which does not contain any extra dense layer, but I still got different outputs for the same input when I loaded weights from the given checkpoint. Are there any layers that are randomly initialized even if I load weights from the given checkpoint?
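A sketch of how one could check which graph variables are actually present in the checkpoint (not from this repo; the path below is a placeholder):

```python
import tensorflow as tf

ckpt_path = "/path/to/xlnet_model.ckpt"  # placeholder for your checkpoint

# Variable names stored in the checkpoint file.
ckpt_vars = {name for name, _ in tf.train.list_variables(ckpt_path)}

# Any graph variable missing from the checkpoint keeps its random
# initialization even after the restore, which would explain the
# non-deterministic outputs.
for var in tf.compat.v1.global_variables():
    name = var.name.split(":")[0]
    if name not in ckpt_vars:
        print("not in checkpoint (randomly initialized):", name)
```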