albert

Incorrect English alphabet in line 402 in create_pretraining_data.py

Open tonyzhao6 opened this issue 6 years ago • 1 comment

  # Note(mingdachen):
  # For foreign characters, we always treat them as a whole piece.
  english_chars = set(list("abcdefghijklmnopqrstuvwhyz"))

The character h is listed twice, and x is missing.

tonyzhao6 avatar Nov 16 '19 17:11 tonyzhao6

Hi @FruVirus, I think the second 'h' should be replaced by 'x', as the code suggests. I have opened a PR on ALBERT.

abhilash1910 avatar Dec 12 '19 18:12 abhilash1910
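
For anyone who wants to confirm the report, here is a minimal sketch that checks the quoted string against the standard lowercase alphabet. The variable names `reported`, `duplicates`, and `missing` are illustrative and not from the repository, and building the set from `string.ascii_lowercase` is only one possible fix, not necessarily what the PR does.

```python
import string
from collections import Counter

# The string exactly as quoted in the issue.
reported = "abcdefghijklmnopqrstuvwhyz"

# Letters that appear more than once in the reported string.
counts = Counter(reported)
duplicates = sorted(ch for ch, n in counts.items() if n > 1)

# Lowercase ASCII letters that never appear in the reported string.
missing = sorted(set(string.ascii_lowercase) - set(reported))

print("duplicated:", duplicates)
print("missing:", missing)

# One way to avoid this class of typo (an assumption, not the project's
# confirmed fix): build the set from the standard library constant.
english_chars = set(string.ascii_lowercase)
```

Because the original code wraps the string in `set(...)`, the duplicate `h` is harmless on its own; the practical effect of the typo is that `x` is not treated as an English character.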