Probable error on line 306 in `create_pretraining_data.py` for albert

Open wjdghks950 opened this issue 3 years ago • 0 comments

https://github.com/google-research/albert/blob/932b41f0319fbef7efd069d5ff545e3358574e19/create_pretraining_data.py#L306

There appears to be an issue on line 306.

`random.randint(start, end)` is end-inclusive: it can return `end` itself.

So, when `len(current_chunk) == 2`, the loop on line 309 stops after a single iteration.
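
To illustrate the end-inclusive behavior, here is a minimal sketch. It assumes the draw on line 306 has the shape `rng.randint(1, len(current_chunk) - 1)`, which is not quoted in this issue, so please check it against the linked line. `sample_a_end` is a hypothetical helper written for this example, not a function from the repository.

```python
import random


def sample_a_end(current_chunk, rng):
    """Hypothetical stand-in for the split-point draw discussed in this issue.

    ASSUMPTION: the real code draws with randint(1, len(current_chunk) - 1).
    """
    a_end = 1
    if len(current_chunk) >= 2:
        # random.randint includes BOTH endpoints, so for two segments
        # this is randint(1, 1), which can only return 1.
        a_end = rng.randint(1, len(current_chunk) - 1)
    return a_end


rng = random.Random(12345)

# Two segments: the split point is always 1, so a loop over range(a_end)
# runs exactly once.
two_segments = [["tok_a"], ["tok_b"]]
draws_two = {sample_a_end(two_segments, rng) for _ in range(1000)}
print(draws_two)  # {1}

# Three segments: the draw is randint(1, 2), so only 1 or 2 can appear.
three_segments = [["tok_a"], ["tok_b"], ["tok_c"]]
draws_three = {sample_a_end(three_segments, rng) for _ in range(1000)}
print(sorted(draws_three))
```

With two segments the draw is deterministic, which is the single-iteration case described above.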

While this may allow the model to incorporate the single leftover chunk (if it were to enter the first `elif` statement on line 339), it will leave that single chunk out of the training instances.

Please address this issue.

wjdghks950, Jan 18 '22 14:01