Chiheon Kim

Results: 4 comments by Chiheon Kim

Unfortunately, torchlars is only supported on machines with CUDA GPUs. It is specifically designed to speed up the LARS optimizer by fusing its computations into a single GPU kernel.
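For intuition, the per-layer quantity that the fused kernel computes can be sketched in plain Python. This is a simplified scalar version of the LARS trust ratio, not torchlars's actual CUDA implementation; `trust_coef`, `weight_decay`, and `eps` are assumed hyperparameter names:

```python
import math

def lars_local_lr(weights, grads, trust_coef=1e-3, weight_decay=0.0, eps=1e-8):
    """Per-layer LARS trust ratio: trust_coef * ||w|| / (||g|| + wd * ||w|| + eps)."""
    w_norm = math.sqrt(sum(w * w for w in weights))  # L2 norm of the layer's weights
    g_norm = math.sqrt(sum(g * g for g in grads))    # L2 norm of its gradients
    return trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)

ratio = lars_local_lr([3.0, 4.0], [0.0, 1.0])  # ||w|| = 5, ||g|| = 1 -> ~0.005
```

In torchlars, this norm computation and the subsequent rescaling are fused into one kernel per layer, which is why the package cannot fall back to a CPU path.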

In the case of the released 1.3B model, more than 5 GB of memory (1.3B parameters × 4 bytes per parameter) is required just to place the model on the GPU. Hence, I suspect that the model...

Hi xshaun, 1. Indeed, copy kernels can overlap with execution kernels (on reasonably new GPUs). We used a non-default stream only for the copy kernels (since the execution kernels do not need to...
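The pattern described above can be sketched in PyTorch. This is a minimal illustration, not the project's actual code; it issues the host-to-device copy on a dedicated non-default stream so it can overlap with compute kernels on the default stream (true overlap additionally requires pinned host memory), and falls back to a plain copy on CPU-only machines:

```python
import torch

def staged_copy(src):
    """Copy `src` to the GPU on a non-default stream; plain copy without CUDA."""
    if torch.cuda.is_available():
        copy_stream = torch.cuda.Stream()  # non-default stream used only for copies
        with torch.cuda.stream(copy_stream):
            # non_blocking lets the copy overlap with default-stream kernels
            # (src should be in pinned memory for the overlap to actually happen)
            dst = src.cuda(non_blocking=True)
        # make the default stream wait for the copy before consuming dst
        torch.cuda.current_stream().wait_stream(copy_stream)
        return dst
    return src.clone()  # CPU fallback: synchronous copy

out = staged_copy(torch.tensor([1.0, 2.0]))
```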

This is indeed better and LGTM, but it will break the scripts that parse topk and topp from the config, in particular: https://github.com/kakaobrain/rq-vae-transformer/blob/c3de514d7c832f63eff333833b456c53a342e8c0/main_sampling_fid.py#L72-L76 https://github.com/kakaobrain/rq-vae-transformer/blob/c3de514d7c832f63eff333833b456c53a342e8c0/main_sampling_txt2img.py#L70-L74 This is because `args.topk`,...
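One way to keep those scripts working across both config layouts is to resolve the fields defensively. This is a hypothetical sketch (the key names `topk`/`topp` come from the comment; the alternative names `top_k`/`top_p` are assumptions about the renamed config), not the repository's code:

```python
def resolve_sampling_params(cfg: dict):
    """Read top-k / top-p from a config, accepting old or new key names."""
    topk = cfg.get("top_k", cfg.get("topk"))  # prefer the new name, fall back
    topp = cfg.get("top_p", cfg.get("topp"))
    return topk, topp

params = resolve_sampling_params({"topk": 16, "topp": 0.9})  # old-style config
```

A small shim like this would let the sampling scripts accept checkpoints saved under either config schema instead of crashing on a missing attribute.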