
Using XLNetModel class for inference

Open OmriPi opened this issue 5 years ago • 6 comments

Hi, I want to use the XLNetModel class, as in the example on the main page of this repo, to get embedding vectors (i.e. perform inference). I don't want to train or fine-tune, just run inference. However, if I create the run configuration with is_training=False and is_finetune=False (which, to my understanding, should stop the model from training), I am told that I'm missing the flags mem_len, reuse_len, bi_data, clamp_len, and same_length. I have not seen an example with proper values for these flags, nor do they seem relevant to prediction. What is the correct way to do inference with this model? I thought setting is_training=False would be enough, but I'm still getting different results for the same input on each run, which made me suspect that the model is training (or at least fine-tuning).

Thanks!

OmriPi avatar Jul 22 '19 12:07 OmriPi

To answer your first question from a non-interpretive stance, see the code below, which runs whenever you create a config:

def create_run_config(is_training, is_finetune, FLAGS):
  kwargs = dict(
      is_training=is_training,
      use_tpu=FLAGS.use_tpu,
      use_bfloat16=FLAGS.use_bfloat16,
      dropout=FLAGS.dropout,
      dropatt=FLAGS.dropatt,
      init=FLAGS.init,
      init_range=FLAGS.init_range,
      init_std=FLAGS.init_std,
      clamp_len=FLAGS.clamp_len)

  if not is_finetune:
    kwargs.update(dict(
        mem_len=FLAGS.mem_len,
        reuse_len=FLAGS.reuse_len,
        bi_data=FLAGS.bi_data,
        clamp_len=FLAGS.clamp_len,
        same_length=FLAGS.same_length))

  return RunConfig(**kwargs)

The mem_len, reuse_len, bi_data, clamp_len, and same_length flags are read regardless of whether is_training is true; they are required whenever is_finetune is false. That is why the code demands them in your configuration.
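To make the branching concrete, here is a minimal, runnable sketch of the same kwarg-gathering logic with a stand-in FLAGS object. The flag values below are illustrative placeholders for inference, not authoritative defaults from the XLNet repo:

```python
from types import SimpleNamespace

# Hypothetical stand-in for the absl FLAGS object; the values are
# illustrative inference-style settings, not the repo's defaults.
FLAGS = SimpleNamespace(
    use_tpu=False, use_bfloat16=False,
    dropout=0.0, dropatt=0.0,          # dropout off for inference
    init='normal', init_range=0.1, init_std=0.02,
    clamp_len=-1,
    # Required whenever is_finetune=False, even for pure inference:
    mem_len=0, reuse_len=0, bi_data=False, same_length=False)

def collect_kwargs(is_training, is_finetune, FLAGS):
    # Mirrors the branching in create_run_config above.
    kwargs = dict(
        is_training=is_training, use_tpu=FLAGS.use_tpu,
        use_bfloat16=FLAGS.use_bfloat16, dropout=FLAGS.dropout,
        dropatt=FLAGS.dropatt, init=FLAGS.init,
        init_range=FLAGS.init_range, init_std=FLAGS.init_std,
        clamp_len=FLAGS.clamp_len)
    if not is_finetune:
        # This block is what makes the extra flags mandatory.
        kwargs.update(dict(
            mem_len=FLAGS.mem_len, reuse_len=FLAGS.reuse_len,
            bi_data=FLAGS.bi_data, clamp_len=FLAGS.clamp_len,
            same_length=FLAGS.same_length))
    return kwargs

inference_kwargs = collect_kwargs(is_training=False, is_finetune=False, FLAGS=FLAGS)
finetune_kwargs = collect_kwargs(is_training=False, is_finetune=True, FLAGS=FLAGS)
print('mem_len' in inference_kwargs)  # True: required when not fine-tuning
print('mem_len' in finetune_kwargs)   # False: skipped when fine-tuning
```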

I am not sure as to why this is the case, however. We should wait for an author to answer this, I suppose.

kmeng01 avatar Jul 22 '19 20:07 kmeng01

Thanks @kmeng01, I had already noticed this in the code. However, I fail to understand the purpose of these flags and their connection to running inference. They seem more related to training than inference, which is why I don't understand why they're required. You're right, let's wait for an author to respond on this.

EDIT: OK, I found out that I can get the same vector consistently if I set TensorFlow's random seed with tf.compat.v1.random.set_random_seed(42). This is quite strange: where is randomization used in inference, and why? As far as I know, the output of prediction should be deterministic.
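As a toy illustration (plain Python, not XLNet code): fixing the RNG seed makes a "randomly initialized" weight vector reproducible across runs, which is the same effect tf.compat.v1.random.set_random_seed(42) has on TensorFlow's variable initializers:

```python
import random

def init_weights(n, seed):
    # Draw n "weights" from a normal distribution, as a TF
    # initializer would; a fixed seed fixes the sequence.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.02) for _ in range(n)]

run1 = init_weights(8, seed=42)
run2 = init_weights(8, seed=42)
print(run1 == run2)  # True: identical "initializations" with a fixed seed
```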

OmriPi avatar Jul 23 '19 09:07 OmriPi

Don't forget that the get_pooled_out method uses a linear projection at the end of the pooling process, by default.

get_pooled_out(summary_type, use_summ_proj=True)

This adds an additional Dense layer. If you are using the default run configuration, it will randomly initialize the layer weights using a normal distribution. That is why you see different outputs each time.

You can turn this setting off.

alexpnt avatar Jul 26 '19 11:07 alexpnt

@alexpnt Thank you!! This is exactly what I was missing! Could you please explain to me what is the use of this linear projection and what's the difference whether I use it or not? I see the output vector has the same dimensions regardless of the projection, so I don't fully understand its meaning... Also, what run configuration I should use so that the weights aren't initialized randomly? Thanks!

OmriPi avatar Jul 28 '19 10:07 OmriPi

You can set the seed to a fixed value to avoid the randomness. The initializer is controlled by the FLAGS.init flag, which can be 'uniform' or 'normal'.

That additional layer is used to build rich features that you can use later.

alexpnt avatar Aug 02 '19 13:08 alexpnt

I was using seq_out, which does not contain any extra dense layer, but I still got different outputs for the same input when loading weights from the given checkpoint. Are there any layers that are randomly initialized even when I load weights from the checkpoint?

Syrup274 avatar Oct 19 '19 06:10 Syrup274