xlnet

TPU num_shards and num_replicas error

Open huiwudiyi opened this issue 5 years ago • 2 comments

When I use a TPU v3-32 with TF 1.13 to train XLNet, I get the error below. How can I fix it?

Found TPU system:
tpu_system_metadata.py:121] *** Num TPU Cores: 8
tpu_system_metadata.py:122] *** Num TPU Workers: 1
tpu_system_metadata.py:124] *** Num TPU Cores Per Worker: 8
...
ValueError: TPUConfig.num_shards is not set correctly. According to TPU system metadata for Tensorflow master: num_replicas should be (8), got (32). For non-model-parallelism, num_replicas should be the total num of TPU cores in the system. For model-parallelism, the total number of TPU cores should be num_cores_per_replica * num_replicas. Please set it accordingly or leave it as None

huiwudiyi, Aug 12 '19 10:08

I have the same issue as well.

TianrenWang, Sep 16 '19 21:09

You may refer to this one: https://github.com/zihangdai/xlnet/pull/239/files#diff-0

csarron, Dec 09 '19 22:12
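
For anyone reading the error message in the original post, the rule it quotes can be written out as a small check. This is only a sketch of the arithmetic the message describes, not TensorFlow's actual implementation; the function names `expected_num_replicas` and `check_num_shards` are made up for illustration.

```python
def expected_num_replicas(total_cores, num_cores_per_replica=None):
    """Replica count implied by the rule quoted in the error message.

    Hypothetical helper, not part of TensorFlow.
    """
    if num_cores_per_replica is None:
        # Non-model-parallelism: one replica per TPU core.
        return total_cores
    if total_cores % num_cores_per_replica != 0:
        raise ValueError(
            "total cores (%d) is not a multiple of num_cores_per_replica (%d)"
            % (total_cores, num_cores_per_replica))
    # Model-parallelism: total cores = num_cores_per_replica * num_replicas.
    return total_cores // num_cores_per_replica


def check_num_shards(num_shards, total_cores, num_cores_per_replica=None):
    """Return the replica count to use, or raise if num_shards is inconsistent."""
    expected = expected_num_replicas(total_cores, num_cores_per_replica)
    if num_shards is None:
        # "leave it as None": fall back to the value derived from the metadata.
        return expected
    if num_shards != expected:
        raise ValueError(
            "num_replicas should be (%d), got (%d)" % (expected, num_shards))
    return num_shards


if __name__ == "__main__":
    # Values from the log above: 8 cores detected, 32 shards requested.
    try:
        check_num_shards(num_shards=32, total_cores=8)
    except ValueError as err:
        print(err)
    # Leaving num_shards unset follows the detected core count instead.
    print(check_num_shards(num_shards=None, total_cores=8))
```

With the numbers from the log (8 cores reported on 1 worker, 32 shards requested) the check fails. Either the requested shard count or the number of cores visible to the TensorFlow master has to change before the two agree.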