albert
Incorrect English alphabet in line 402 in create_pretraining_data.py
# Note(mingdachen):
# For foreign characters, we always treat them as a whole piece.
english_chars = set(list("abcdefghijklmnopqrstuvwhyz"))
The character h is listed twice, and x is missing.
Hi @FruVirus, I think the second 'h' should be replaced by 'x', as the code suggests. I have submitted a PR on ALBERT.
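
For anyone who wants to confirm this quickly, here is a minimal sketch that compares the quoted string against Python's `string.ascii_lowercase`. The variable names `quoted`, `duplicated`, and `missing` are made up for illustration, and the last line is only one possible way to write the fix, not necessarily what the PR does.

```python
import string
from collections import Counter

# The string literal exactly as quoted from create_pretraining_data.py.
quoted = "abcdefghijklmnopqrstuvwhyz"

# Letters that appear more than once in the quoted literal.
counts = Counter(quoted)
duplicated = sorted(c for c, n in counts.items() if n > 1)

# Lowercase ASCII letters that never appear in the quoted literal.
missing = sorted(set(string.ascii_lowercase) - set(quoted))

print(duplicated)  # ['h']
print(missing)     # ['x']

# Assumption: one way to avoid hand-typing the alphabet at all.
english_chars = set(string.ascii_lowercase)
```

Because the literal is passed to `set()`, the duplicate 'h' is harmless on its own; the practical effect is that 'x' is not in `english_chars`.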