sschoenholz

Results: 18 comments by sschoenholz

Great question! Unfortunately, at the moment we don't have a mechanism for weight sharing. Right now, the best you can do is, as you describe, use the kernel function compute...

Hi there! Sorry for the delay. I'm not totally familiar with transductive learning in the GP setting. I will note that after the `stax.Aggregate` layer the kernel will be of...

Thanks for raising this question and for the clear repro. I haven't yet looked into the [0,1] issue, but I have investigated the NaNs. Note that for deep Erf networks...
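As background on why deep Erf stacks are prone to NaNs: the Erf NNGP kernel has a closed-form layer-to-layer map (Williams, 1997), and `np.arcsin` returns NaN the moment rounding pushes its argument past 1. A minimal numpy sketch of that recursion — unit weight variance and zero bias are my assumptions here, not details taken from the thread:

```python
import numpy as np

def erf_kernel_step(K):
    # Closed-form NNGP kernel map for the Erf nonlinearity (Williams, 1997):
    #   K'(x, x') = (2/pi) * arcsin( 2 K(x,x')
    #                 / sqrt((1 + 2 K(x,x)) (1 + 2 K(x',x'))) )
    # Assumes unit weight variance and zero bias.
    diag = 1.0 + 2.0 * np.diag(K)
    denom = np.sqrt(np.outer(diag, diag))
    return (2.0 / np.pi) * np.arcsin(2.0 * K / denom)

K = np.array([[1.0, 0.5],
              [0.5, 1.0]])
for _ in range(100):          # a 100-layer Erf stack
    K = erf_kernel_step(K)    # converges to a finite fixed point here

# The NaN failure mode: in exact arithmetic the arcsin argument stays
# strictly below 1, but if floating-point rounding ever pushes it past 1,
# arcsin leaves its domain and returns NaN.
with np.errstate(invalid="ignore"):
    bad = np.arcsin(1.0 + 1e-12)   # nan
```

With these settings the recursion itself stays finite; the point is that the arcsin argument is the quantity to watch when hunting the NaNs.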

Hey Rylan, I'm not totally sure, but let's see if we can work something out. I wrote up a short note on my interpretation of your problem [here](https://drive.google.com/file/d/1Z7n11SLRzDDgShIj80pXmMRPnG65HWkN/view?usp=sharing), let me...

Just to add to Jaehoon's reply, one thing we are interested in testing out is integration with the excellent GPyTorch package (https://gpytorch.ai/), which can scale GP inference to 1M+ datapoints...
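For context on what such an integration would accelerate: exact GP regression reduces to a kernel solve, and the Cholesky factorization in it is the O(n^3) bottleneck that GPyTorch replaces with iterative, GPU-friendly solvers. A minimal numpy sketch of the exact posterior mean (the RBF kernel and helper names here are illustrative, not from either library):

```python
import numpy as np

def rbf(X1, X2, ell=0.5):
    # Squared-exponential (RBF) kernel matrix.
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ell**2)

def gp_posterior_mean(X, y, X_star, noise=1e-8):
    # Exact GP regression mean: k_* (K + sigma^2 I)^{-1} y.
    # The Cholesky factorization below is the O(n^3) step that
    # scalable GP packages replace with iterative solvers.
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(X_star, X) @ alpha

X = np.array([[-1.0], [0.0], [1.0]])
y = np.array([0.0, 1.0, 0.0])
mean = gp_posterior_mean(X, y, X)  # near-noiseless GP interpolates the data
```

Swapping the NNGP/NTK kernel in for `rbf` is exactly the kind of plug-in such an integration would enable.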

Great question! A few points. 1. You were on the right track with setting `is_gaussian=True`. Notice that `post_half` doesn't have any dense layers and so if the inputs to it...

Thanks for taking the time to try out NT and raise this issue! I think it is likely that a layer-wise scheme for computing the NTK will be more memory...
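For context on what a layer-wise scheme buys: the empirical NTK is a sum over layers of per-layer parameter-Jacobian inner products, so each layer's contribution can be formed and discarded in turn instead of materializing the full Jacobian at once. A toy numpy sketch on a two-layer linear net — the network and helper names are illustrative, not NT's internals:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 3, 4                      # input dim, hidden width
W1 = rng.normal(size=(h, d))
W2 = rng.normal(size=(1, h))

def jac_layers(x):
    # For the scalar linear net f(x) = W2 @ W1 @ x, the parameter
    # Jacobians have closed forms:
    #   df/dW1 = W2^T x^T   (shape h x d)
    #   df/dW2 = (W1 x)^T   (shape 1 x h)
    return [np.outer(W2.ravel(), x), (W1 @ x)[None, :]]

def ntk(x1, x2):
    # Layer-wise accumulation: sum over layers of <J_l(x1), J_l(x2)>.
    # Each layer's Jacobian can be freed after its contribution is added,
    # so peak memory is one layer's Jacobian, not all of them.
    return sum(np.sum(a * b) for a, b in zip(jac_layers(x1), jac_layers(x2)))

# Reference computation: concatenate every per-layer Jacobian into one
# flat vector (this is the memory-hungry path the layer-wise sum avoids).
def full_jac(x):
    return np.concatenate([j.ravel() for j in jac_layers(x)])

x1, x2 = rng.normal(size=d), rng.normal(size=d)
```

Both paths give the same kernel value; the layer-wise one just never holds the concatenated Jacobian in memory.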

Thanks for following up! I've been digging into the code and profiling. While I don't have a solution yet, here are some comments on your investigations: 1. I think this...

Ok! So I think I may have made some progress. I would like to understand why NT is slower than the sample you provided and then, separately, think about other...

Thanks for adding the check! You're clearly correct and you've come up with a super clever method! For my own sanity, I'll have to do some digging to figure out...