Shaojie Bai

Results 11 comments of Shaojie Bai

Hi @polo5 , Thanks for your interest in our paper! Yes, MDEQ-Large does take a few hours to finish all epochs, so your calculation is correct. However, I should note...

Also, re: the error: I haven't encountered it in this repo before, but I'll make sure to check with PyTorch 1.10!

Right. You can also preprocess the sequences so that those of similar length are grouped together, which reduces the amount of padding.
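As a rough illustration of the length-grouping idea, here is a minimal pure-Python sketch that counts how many padding tokens are needed with and without sorting by length before batching. The names `PAD`, `pad_batch`, and `count_padding` are hypothetical helpers made up for this example, not part of any repo, and the toy lengths are arbitrary.

```python
from typing import List

PAD = 0  # hypothetical padding token id (an assumption for this sketch)


def pad_batch(batch: List[List[int]], pad: int = PAD) -> List[List[int]]:
    """Right-pad every sequence in the batch to the longest length in that batch."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad] * (max_len - len(seq)) for seq in batch]


def count_padding(sequences: List[List[int]], batch_size: int, sort_by_length: bool) -> int:
    """Return the total number of padding tokens added across all batches."""
    # Sorting by length places sequences of similar length in the same batch.
    order = sorted(sequences, key=len) if sort_by_length else list(sequences)
    total = 0
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        padded = pad_batch(batch)
        total += sum(len(row) for row in padded) - sum(len(seq) for seq in batch)
    return total


# Toy data: three short and three long sequences, interleaved.
lengths = [3, 50, 4, 48, 5, 49]
sequences = [[1] * n for n in lengths]

unsorted_pad = count_padding(sequences, batch_size=3, sort_by_length=False)
sorted_pad = count_padding(sequences, batch_size=3, sort_by_length=True)
print(unsorted_pad, sorted_pad)
```

In practice, batches built this way are often shuffled at the batch level so that training does not always see sequences in order of length; that step is omitted here.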

Hello @polo5, Thanks for your interest in our repo and DEQ! To begin with, we want to caution that "constant memory cost" is constant w.r.t. the number of layers. That...

Also, I want to add that in Anderson we usually keep `m=5` or `m=6`, which is usually significantly smaller than the number of solver iterations (e.g., 25 in DEQ-Transformer).

Hi @polo5, 1. Interesting observation on MDEQ...! I didn't know that you can achieve the same accuracy on CIFAR-10 with just 1 iteration but it's likely closely related to the...

Thanks for the pointers. Where can I find the documentation for `THCudaTensor` in PyTorch? Also, why don't we need `im2col` anymore? While the input tensor is `L x C` now...

Simply changing it to 0 worked for me!

Oh, we flatten each image to 1D. For example, a 28x28 image is converted to a 784x1 sequence (i.e., length 784). So each "time step" to the TCN is essentially...
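The flattening described above can be sketched in plain Python as follows. The function name `image_to_sequence` is hypothetical, and the row-major pixel order is an assumption of this sketch; the comment only states that a 28x28 image becomes a sequence of length 784.

```python
from typing import List


def image_to_sequence(image: List[List[int]]) -> List[List[int]]:
    """Flatten an H x W image into an (H*W) x 1 sequence.

    Pixels are read row by row (row-major order, an assumption here),
    and each pixel becomes one single-feature "time step".
    """
    return [[pixel] for row in image for pixel in row]


# Toy 28x28 "image" whose pixel value encodes its own position.
image = [[r * 28 + c for c in range(28)] for r in range(28)]
sequence = image_to_sequence(image)

print(len(sequence), len(sequence[0]))
```

With a tensor library the same step is typically a single reshape of the image to shape `(784, 1)`.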