sequence_tagging icon indicating copy to clipboard operation
sequence_tagging copied to clipboard

Why use training, validation and test set to build vocabulary in build_data.py ? Isn't it information leak ?

Open zhihuacc opened this issue 3 years ago • 0 comments

I'm not sure but I think including test set to build vocabulary is spilling information from test set to training set. If using test set was OK, then why using only training set to build character set in build_data.py ?

zhihuacc avatar Mar 22 '21 16:03 zhihuacc