I changed the learning rate to lr=1e-3 (the paper uses 1e-4) and found that convergence is much faster. Other reproduction repositories, e.g., cr_1 and cr_2, show similar behavior.
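
For reference, here is a toy sketch (not the repository's training code) of why a 10x larger step size would be expected to converge faster on a simple, well-conditioned problem. The quadratic objective, the function name `steps_to_converge`, and the tolerance are all assumptions made up for illustration; the actual model may behave differently, and a larger lr can also destabilize training or change the final result.

```python
def steps_to_converge(lr, tol=1e-3, max_steps=1_000_000):
    """Run plain gradient descent on f(w) = 0.5 * w**2, starting at w = 1.0.

    Returns the number of updates needed until |w| < tol
    (or max_steps if the tolerance is never reached).
    """
    w = 1.0
    for step in range(1, max_steps + 1):
        grad = w              # derivative of 0.5 * w**2 with respect to w
        w -= lr * grad        # vanilla gradient descent update
        if abs(w) < tol:
            return step
    return max_steps


# Hypothetical comparison of the two learning rates discussed above.
steps_fast = steps_to_converge(1e-3)
steps_slow = steps_to_converge(1e-4)
print(f"lr=1e-3: {steps_fast} steps, lr=1e-4: {steps_slow} steps")
```

On this toy objective the step count scales roughly inversely with the learning rate, which matches the faster convergence I observed, though it says nothing about final accuracy.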