Chiheon Kim

Results: 4 comments by Chiheon Kim

Unfortunately, torchlars is only supported on machines with CUDA GPUs. It is specifically designed to speed up the LARS optimizer by fusing its computations into a single GPU kernel.
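For intuition, the per-layer quantity that the fused kernel computes can be sketched in plain Python. This is a simplified scalar version of the LARS trust ratio, not torchlars's actual CUDA implementation; `trust_coef`, `weight_decay`, and `eps` are assumed hyperparameter names:

```python
import math

def lars_local_lr(weights, grads, trust_coef=1e-3, weight_decay=0.0, eps=1e-8):
    """Per-layer LARS trust ratio: trust_coef * ||w|| / (||g|| + wd * ||w|| + eps)."""
    w_norm = math.sqrt(sum(w * w for w in weights))  # L2 norm of the layer's weights
    g_norm = math.sqrt(sum(g * g for g in grads))    # L2 norm of its gradients
    return trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)

ratio = lars_local_lr([3.0, 4.0], [0.0, 1.0])  # ||w|| = 5, ||g|| = 1 -> ~0.005
```

In torchlars, this norm computation and the subsequent rescaling are fused into one kernel per layer, which is why the package cannot fall back to a CPU path.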

In the case of the released 1.3B model, more than 5 GB of memory (1.3B parameters × 4 bytes per parameter) is required just to place the model on the GPU. Hence, I suspect that the model...

Hi xshaun, 1. Indeed, copy kernels can overlap with execution kernels (on reasonably new GPUs). We used a non-default stream only for the copy kernels (since the execution kernels do not need to...
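The pattern described above can be sketched in PyTorch. This is a minimal illustration, not the project's actual code; it issues the host-to-device copy on a dedicated non-default stream so it can overlap with compute kernels on the default stream (true overlap additionally requires pinned host memory), and falls back to a plain copy on CPU-only machines:

```python
import torch

def staged_copy(src):
    """Copy `src` to the GPU on a non-default stream; plain copy without CUDA."""
    if torch.cuda.is_available():
        copy_stream = torch.cuda.Stream()  # non-default stream used only for copies
        with torch.cuda.stream(copy_stream):
            # non_blocking lets the copy overlap with default-stream kernels
            # (src should be in pinned memory for the overlap to actually happen)
            dst = src.cuda(non_blocking=True)
        # make the default stream wait for the copy before consuming dst
        torch.cuda.current_stream().wait_stream(copy_stream)
        return dst
    return src.clone()  # CPU fallback: synchronous copy

out = staged_copy(torch.tensor([1.0, 2.0]))
```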

This is indeed better and LGTM, but it will break the scripts that parse topk and topp from the config, in particular: https://github.com/kakaobrain/rq-vae-transformer/blob/c3de514d7c832f63eff333833b456c53a342e8c0/main_sampling_fid.py#L72-L76 https://github.com/kakaobrain/rq-vae-transformer/blob/c3de514d7c832f63eff333833b456c53a342e8c0/main_sampling_txt2img.py#L70-L74 This is because `args.topk`,...
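One way to keep those scripts working across both config layouts is to resolve the fields defensively. This is a hypothetical sketch (the key names `topk`/`topp` come from the comment; the alternative names `top_k`/`top_p` are assumptions about the renamed config), not the repository's code:

```python
def resolve_sampling_params(cfg: dict):
    """Read top-k / top-p from a config, accepting old or new key names."""
    topk = cfg.get("top_k", cfg.get("topk"))  # prefer the new name, fall back
    topp = cfg.get("top_p", cfg.get("topp"))
    return topk, topp

params = resolve_sampling_params({"topk": 16, "topp": 0.9})  # old-style config
```

A small shim like this would let the sampling scripts accept checkpoints saved under either config schema instead of crashing on a missing attribute.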