xlnet
num_predict flag
What is the significance of num_predict in terms of the number of tokens to be predicted? My data consists mostly of short snippets of social media text, plus a few longer comprehension passages. Should I set num_predict to less than 85 for pre-training?
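
To make the arithmetic behind the question concrete: the value 85 appears to correspond to roughly one sixth of a 512-token sequence. That pairing is an assumption here, not something stated above. If it holds, a shorter seq_len would call for a proportionally smaller num_predict. The helper name scaled_num_predict and the 1/6 ratio are hypothetical, used only for illustration.

```python
def scaled_num_predict(seq_len: int, ratio_denominator: int = 6) -> int:
    """Hypothetical helper: estimate num_predict as about seq_len / ratio_denominator.

    Assumption (not confirmed in this thread): the default num_predict of 85
    was chosen as roughly one sixth of a 512-token sequence.
    """
    if seq_len <= 0 or ratio_denominator <= 0:
        raise ValueError("seq_len and ratio_denominator must be positive")
    # Integer division keeps the result a whole number of tokens; never go below 1.
    return max(1, seq_len // ratio_denominator)


if __name__ == "__main__":
    # Print the estimate for a few candidate sequence lengths.
    for seq_len in (512, 256, 128, 64):
        print(seq_len, scaled_num_predict(seq_len))
```

Under this assumption, short social media snippets packed into a shorter seq_len would suggest a num_predict well below 85, but the right value depends on how the data is actually packed into sequences.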