Katherine Crowson comments

Results: 61 comments of Katherine Crowson

OOM eventually when using create_graph=True with BatchL2Grad

Apparently if I tell ESGD-M to do a Hessian-vector product *every step* instead of every ten for compute efficiency, I don't OOM anymore. Normally the graphs made with create_graph=True are...

About GumbelQuantization training

I would also like some clarity on the best KL weight for training from scratch (and whether it should be warmed up over time).

About GumbelQuantization training

> @borisdayma I personally don't think so. In the image reconstruction example from `usage.ipynb`, the discretion method of DALL-E is the `argmax` function... Here's one thought, if we keep every...

About GumbelQuantization training

> Hi @TomoshibiAkira , it is really a valuable discussion! May I know if you validate the performance of f=8 without Gumbel? Actually, I just want to see the effect...

The torchvision pretrained VGG-16 requires normalization of inputs and you do not do this

You should be using ImageNet statistics for any input because that's what VGG-16 was trained on. You should only use different statistics if you trained or fine-tuned VGG-16 on a...
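As a rough illustration of the normalization mentioned above, here is a minimal pure-Python sketch. The mean and standard deviation are the standard ImageNet values used with torchvision's pretrained models (the same numbers usually passed to `torchvision.transforms.Normalize`). The helper names `normalize_pixel` and `normalize_image` and the list-of-rows image layout are made up for this example and are not part of any library.

```python
# ImageNet per-channel statistics in RGB order, for inputs scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channel values are already in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


def normalize_image(image):
    """Normalize an image stored as rows of RGB tuples (hypothetical layout)."""
    return [[normalize_pixel(px) for px in row] for row in image]


# A pixel equal to the channel means maps to zero in every channel.
mean_pixel = normalize_pixel(IMAGENET_MEAN)

# A tiny 1x2 "image": one black pixel and one white pixel.
tiny = normalize_image([[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]])
```

With a real tensor pipeline the same subtraction and division would be applied per channel before the image is passed to VGG-16.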

the google drive link for pretrained model is invalid

> I accidentally wiped my google drive. Also a bit busy lately so going to take a while to regenerate these

I still have the pretrained model downloaded, if you...

bug: exploding gradient?

It looks like L-BFGS took a bad step and was unable to recover. Unfortunately my L-BFGS implementation does not include a line search to guard against and reject bad steps....
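For readers unfamiliar with the idea, the following is a minimal sketch of one common kind of line search, a backtracking search with the Armijo sufficient-decrease test. It is not the implementation discussed in the comment (which, as stated, has no line search). The function name `backtracking_step` and all parameter names and defaults are assumptions chosen for this example.

```python
def backtracking_step(f, x, grad, direction, step=1.0, shrink=0.5, c1=1e-4, max_tries=20):
    """Return an accepted step size, or 0.0 if every trial step is rejected.

    A step t is accepted only if it satisfies the Armijo condition:
        f(x + t * d) <= f(x) + c1 * t * <grad, d>
    """
    fx = f(x)
    slope = sum(g * d for g, d in zip(grad, direction))
    t = step
    for _ in range(max_tries):
        trial = [xi + t * di for xi, di in zip(x, direction)]
        if f(trial) <= fx + c1 * t * slope:
            return t  # sufficient decrease: accept this step
        t *= shrink  # bad step: reject it and try a shorter one
    return 0.0  # no acceptable step found; caller should not move


def quartic(x):
    """Toy objective where a full gradient step overshoots badly."""
    return sum(xi ** 4 for xi in x)


x0 = [1.0, -1.0]
g0 = [4.0 * xi ** 3 for xi in x0]
d0 = [-g for g in g0]  # steepest-descent direction

t = backtracking_step(quartic, x0, g0, d0)
x1 = [xi + t * di for xi, di in zip(x0, d0)]
```

Here the full step would raise the objective from 2 to 162, so the search shrinks the step until the objective decreases. Passing an ascent direction instead makes every trial fail, and the function returns 0.0 rather than taking a bad step.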

Colab Port

Hi. I tried this and I'm not able to upload content/style images. I haven't used Colab before and don't really know how to start troubleshooting the issue.

Colab Port

Apparently the solution to the problem I encountered is to use Chrome instead of Safari.

What does --aux-image stand for?

The auxiliary image allows you to specify an image which the rendering process is "drawn back to" during iteration. Technically it imposes an L2 penalty on the difference between the...
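To make the idea of an L2 penalty concrete, here is a small generic sketch. The excerpt above is cut off before it names both terms of the difference, so the example assumes, purely for illustration, that the penalty compares the current image with the auxiliary image. The functions `l2_penalty` and `l2_penalty_grad`, the `weight` parameter, and the flat-list image representation are all hypothetical and are not taken from the tool itself.

```python
def l2_penalty(current, target, weight=1.0):
    """Weighted sum of squared differences between two equal-length value lists."""
    if len(current) != len(target):
        raise ValueError("inputs must have the same length")
    return weight * sum((c - t) ** 2 for c, t in zip(current, target))


def l2_penalty_grad(current, target, weight=1.0):
    """Gradient of l2_penalty with respect to `current`."""
    return [2.0 * weight * (c - t) for c, t in zip(current, target)]


# Assumed setup: flattened pixel values of the current image and the auxiliary image.
current = [0.5, 0.5, 0.0]
aux = [0.0, 0.5, 1.0]

penalty = l2_penalty(current, aux, weight=2.0)
grad = l2_penalty_grad(current, aux, weight=2.0)

# One gradient-descent step on the penalty alone pulls `current` toward `aux`.
lr = 0.1
pulled = [c - lr * g for c, g in zip(current, grad)]
```

Under these assumptions, adding such a term to the loss means each optimization step nudges the image back toward the auxiliary image, which matches the "drawn back to" description.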