Roman Novak

Results: 80 comments by Roman Novak

FYI, we've just added `FanInConcat` support in c48505230f70fa01f820661a9e9827bae2caec50! Two caveats: 1) When concatenating along the channel/feature axis, a Dense or Convolutional layer is required afterwards (so you can't have [assuming...
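
For concreteness, here is a minimal sketch of the supported pattern (the widths, nonlinearities, and overall architecture are made up for illustration): two branches are concatenated along the channel/feature axis and then followed by the required `Dense` layer.

```python
from neural_tangents import stax

# Two parallel branches whose outputs are concatenated along the
# channel/feature axis; a Dense (or Conv) layer must follow the concatenation.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.FanOut(2),
    stax.parallel(
        stax.serial(stax.Dense(512), stax.Relu()),
        stax.serial(stax.Dense(512), stax.Erf()),
    ),
    stax.FanInConcat(axis=-1),
    stax.Dense(1),  # required after channel-axis concatenation
)
```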

FYI, 100aface242be5cee01352e59a122f99680d65b8 altered the tensor layout, and this _might_ reduce the TPU memory footprint; in general, there's still work to do to fully eliminate padding, and GPUs are strongly recommended (see...

FYI, we have finally added `ConvTranspose` in https://github.com/google/neural-tangents/commit/780ad0ce22d482bcefd12f4d3390090de7206da5!
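
A hypothetical usage sketch (the filter sizes, strides, and widths here are illustrative, not from the commit): a downsampling `Conv` followed by an upsampling `ConvTranspose` inside a `stax.serial` model.

```python
from neural_tangents import stax

# Downsampling convolution followed by the new transposed convolution.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Conv(64, (3, 3), strides=(2, 2), padding='SAME'),
    stax.Relu(),
    stax.ConvTranspose(64, (3, 3), strides=(2, 2), padding='SAME'),
    stax.Relu(),
    stax.Conv(1, (1, 1)),
)
```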

Welcome! This is a good channel - we also have https://github.com/google/neural-tangents/discussions; either place is OK. `emp_ntk_kernel_fn` is the finite-width kernel function; it returns the outer product of Jacobians of `apply_fn`...
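
As a minimal sketch of how such a finite-width (empirical) kernel function is built with `nt.empirical_ntk_fn` (the architecture and input shapes below are made up):

```python
import jax.random as random
import neural_tangents as nt
from neural_tangents import stax

init_fn, apply_fn, _ = stax.serial(stax.Dense(128), stax.Relu(), stax.Dense(1))

key1, key2, key3 = random.split(random.PRNGKey(0), 3)
x1 = random.normal(key1, (4, 8))
x2 = random.normal(key2, (6, 8))
_, params = init_fn(key3, x1.shape)

# Finite-width empirical NTK: the outer product of Jacobians of apply_fn,
# traced over the output axis by default.
emp_ntk_kernel_fn = nt.empirical_ntk_fn(apply_fn)
ntk = emp_ntk_kernel_fn(x1, x2, params)  # shape (4, 6)
```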

Unfortunately, there are several ways in which softmax doesn't play nicely with the infinite-width settings that we consider in Neural Tangents: 1) Suppose you have a softmax over the infinite-width...

> And what I was doing before basically gives the traces of the matrices that I expect. Actually, judging by the results I'm getting, does it return the "normalized" traces,...

Hey Eduard, 1) To not have a bias term, you could simply set `stax.Dense(..., b_std=0.)` when constructing the dense layer. 2) For custom weight init, I suggest adapting the `stax.Dense`...
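
For example, a minimal sketch of point 1 (the widths, depth, and `W_std` value are arbitrary):

```python
from neural_tangents import stax

# Dense layers with no bias term: set the bias standard deviation to 0.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512, W_std=1.5, b_std=0.),
    stax.Relu(),
    stax.Dense(1, W_std=1.5, b_std=0.),
)
```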

It looks OK to me, but again, without seeing a fully runnable code snippet it's hard to say for sure. A few comments: 1) you could try `jax.nn.log_sigmoid` for better numerical...
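
To illustrate point 1, a small sketch of why `jax.nn.log_sigmoid` is preferable to composing `log` with `sigmoid` (the inputs are made up):

```python
import jax.numpy as jnp
from jax.nn import log_sigmoid, sigmoid

x = jnp.array([-200., 0., 200.])

# Unstable: sigmoid(-200.) underflows to 0 in float32, so the log is -inf.
unstable = jnp.log(sigmoid(x))

# Stable: log_sigmoid computes the same quantity without the intermediate
# underflow, returning roughly [-200., -0.693, 0.].
stable = log_sigmoid(x)
```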

1) AFAIK `nt.linearize` should work with any function, so if you have a sparse model with signature `f(params, x)`, it should work (a small sketch below). I don't know, however, if JAX has good...
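
A minimal sketch of `nt.linearize` applied to an arbitrary `f(params, x)` (the toy dense model here is made up; I haven't checked how it interacts with sparse parameter structures):

```python
import jax.numpy as jnp
import neural_tangents as nt

def f(params, x):
  w, b = params
  return jnp.dot(x, w) + b

params = (jnp.ones((3, 1)), jnp.zeros((1,)))

# First-order Taylor expansion of f in params, around the given params.
f_lin = nt.linearize(f, params)
y = f_lin(params, jnp.ones((2, 3)))  # equals f(params, x) at the expansion point
```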

(Sorry for the late reply, I was on vacation.) Ah, I see! If you have a layer that eventually has an infinite number of inputs per node (e.g. a dense layer at...