linnan wang

Results: 43 comments of linnan wang

```
get_configs(batch=1, latency="1.25ms-D", gpuType="GV100")
```

```
GPUNet(
  (network): Sequential(
    (stem: 2): Prologue(
      (net): Sequential(
        (0): Conv2d(3, 33, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(33, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        ...
```

```
get_configs(batch=1, latency="2.25ms-D", gpuType="GV100")
```

```
GPUNet(
  (network): Sequential(
    (stem: 2): PrologueLargeD(
      (net): Sequential(
        (0): Conv2d(3, 48, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
        ...
```

That's correct. A 30000x30000 double-precision matrix should exceed 6GB. I don't recommend testing double precision on the TITAN; we normally use a K40. You can try single precision, and it should deliver...
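As a quick sanity check on those numbers (a sketch; the 6GB figure is the TITAN's device memory):

```python
# Memory footprint of a dense N x N matrix, in GiB.
def matrix_gib(n, bytes_per_element):
    return n * n * bytes_per_element / 1024**3

double_gib = matrix_gib(30000, 8)  # double precision: 8 bytes/element
single_gib = matrix_gib(30000, 4)  # single precision: 4 bytes/element

print(f"30000x30000 double: {double_gib:.2f} GiB")  # ~6.71 GiB, over a 6 GB card
print(f"30000x30000 single: {single_gib:.2f} GiB")  # ~3.35 GiB, fits
```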

It can handle any matrix size as long as you have enough host RAM. (The current memory issue is due to the way I'm marking finished tasks; however, it...

That code has not yet been cleaned, and it involves a lot of manual operations to get it working. It would be a lot easier if you implement RNN...

Can you try a smaller case, e.g., 20000 x 20000?

Okay. It is already very impressive to see the 2*10^4 case working on the GPU. The purpose of this library is to demonstrate a new system design to support large-scale matrix...

Sorry, I just double-checked our experiments. Yes, we used NASBench for the design validation of the meta-DNN. I think the discrepancy in evaluating the val_correlation lies in how you split the dataset....
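To illustrate the split-dependence point on purely synthetic numbers (nothing here comes from NASBench; the accuracies and noise level are made up), the rank correlation you measure between predicted and true accuracies shifts with the evaluation subset:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
true_acc = rng.uniform(0.85, 0.95, size=1000)            # synthetic "ground-truth" accuracies
pred_acc = true_acc + rng.normal(0, 0.01, size=1000)     # synthetic noisy predictions

# The measured val_correlation depends on which subset you evaluate on:
for seed in (1, 2):
    idx = np.random.default_rng(seed).choice(1000, size=100, replace=False)
    print(f"split seed {seed}: spearman = {spearman(true_acc[idx], pred_acc[idx]):.3f}")
```

Different seeds pick different validation subsets, so two groups splitting the same benchmark differently can honestly report different val_correlation numbers.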

For CPU multi-threading, it depends on which CPU BLAS you link and how you configure it. Please don't pay too much attention to the CPU; this is a multi-GPU BLAS. For...
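For what it's worth, a common way to configure CPU BLAS threading from Python is via environment variables, set before the BLAS-backed library is first imported (a sketch; which variable takes effect depends on the BLAS build you actually link):

```python
import os

# Thread caps must be set before the BLAS-backed library is imported.
os.environ["OMP_NUM_THREADS"] = "4"       # OpenMP-based builds (e.g. OpenBLAS-OpenMP, BLIS)
os.environ["OPENBLAS_NUM_THREADS"] = "4"  # OpenBLAS pthread build
os.environ["MKL_NUM_THREADS"] = "4"       # Intel MKL

import numpy as np  # inherits the thread settings above

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
c = a @ b  # DGEMM dispatched to the linked CPU BLAS, capped at 4 threads
```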

Good catch. I just merged ZGEMM a few seconds ago.
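For reference, ZGEMM is the BLAS routine for complex double-precision matrix multiply. A NumPy sketch of the contract any ZGEMM kernel has to satisfy (the alpha/beta values and sizes here are arbitrary, for illustration only):

```python
import numpy as np

# ZGEMM computes C <- alpha*A@B + beta*C in complex double precision (complex128).
rng = np.random.default_rng(0)
m, k, n = 4, 5, 3
A = rng.normal(size=(m, k)) + 1j * rng.normal(size=(m, k))
B = rng.normal(size=(k, n)) + 1j * rng.normal(size=(k, n))
C = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
alpha, beta = 2.0 + 1.0j, 0.5 - 0.25j

ref = alpha * (A @ B) + beta * C  # the result a ZGEMM implementation must reproduce
print(ref.dtype)  # complex128
```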