Patrick Kidger comments

Results: 1451 comments of Patrick Kidger

Using optimistix to solve optimization problems in parallel?

The overall computational cost will be `batch size × greatest number of steps for any batch element`. For example, if there is a batch of 2 elements, the first batch...
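The cost rule above can be illustrated with a minimal sketch in plain JAX. This is not optimistix code: `count_halvings` is a hypothetical stand-in for an iterative solver, and the assumption is that a batched solve behaves like a `jax.vmap` over a `jax.lax.while_loop`, which keeps stepping every batch element until the slowest one has finished.

```python
import jax
import jax.numpy as jnp


def count_halvings(x):
    """Hypothetical 'solver': halve x until it drops below 1, counting steps."""

    def cond(carry):
        value, _ = carry
        return value >= 1.0

    def body(carry):
        value, steps = carry
        return value / 2.0, steps + 1

    _, steps = jax.lax.while_loop(cond, body, (x, jnp.asarray(0)))
    return steps


# Two batch elements that need very different numbers of steps.
batch = jnp.array([2.0, 1024.0])
steps = jax.vmap(count_halvings)(batch)

# Under vmap the loop runs until every element's condition is false, so the
# work done is roughly batch size times the largest step count.
effective_cost = batch.shape[0] * int(steps.max())
print(steps, effective_cost)
```

The per-element step counts still differ in the output, but the fast element is carried along (its updates are masked out) while the slow one finishes, which is where the `batch size × greatest number of steps` cost comes from.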

Question about autoparallelism

So `filter_shard` is just a thin wrapper around wsc (`lax.with_sharding_constraint`): https://github.com/patrick-kidger/equinox/blob/8191b113df5d985720e86c0d6292bceb711cbe94/equinox/_sharding.py#L40. It's designed to (a) make it easier to handle static arguments and (b) unify device/sharding information. So indeed it's...
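For readers unfamiliar with the underlying call, here is a minimal sketch of `jax.lax.with_sharding_constraint` used inside a jitted function. It uses only JAX itself, not Equinox; the mesh axis name `"batch"`, the function `scale`, and the array shape are assumptions chosen for illustration.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Build a 1-D mesh over whatever devices are available (possibly just one CPU).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("batch",))

# Shard the leading axis across the mesh; remaining axes are replicated.
sharding = NamedSharding(mesh, PartitionSpec("batch"))


@jax.jit
def scale(x):
    # Ask the compiler to lay out x according to `sharding` at this point.
    x = jax.lax.with_sharding_constraint(x, sharding)
    return 2.0 * x


# The leading dimension is a multiple of the device count so it divides evenly.
x = jnp.ones((4 * devices.size, 3))
y = scale(x)
print(y.shape, y.sharding)
```

On a single device this is effectively a no-op, but the same code applies unchanged when more devices are present.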

Question about autoparallelism

> Any idea why XLA complains when using filter_shard or lax.with_sharding_constraint inside a JIT'd function with PRNG keys? Maybe it's more of a question/issue for JAX or XLA folks... That...

LoRA that doesn't require memory for zero gradients of the underlying matrices

Actually, I think JAX is exactly that clever :) Optimizing `x+0` to just `x` is a simple optimization that XLA should perform for us. That said I'd be happy to...

LoRA that doesn't require memory for zero gradients of the underlying matrices

Hmm, that's unfortunate if so. Quax is still a fairly experimental library, so I'd be happy to take suggestions on how we might adjust the internals to work around this....

evaluating at different time points per batch

I don't believe this is possible, unfortunately. I think for this I would recommend using JAX, and in particular the interpolation routines in [Diffrax](https://github.com/patrick-kidger/diffrax), as a better, more featureful option.
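As a rough illustration of why JAX suits this use case, the sketch below evaluates each batch element at its own query times by mapping an interpolation routine over the batch axis. It deliberately uses `jnp.interp` (piecewise linear interpolation) rather than the Diffrax routines mentioned above, and the arrays `ts`, `ys`, and `query` are made-up data.

```python
import jax
import jax.numpy as jnp

# Shared observation times, and one row of observed values per batch element.
ts = jnp.array([0.0, 1.0, 2.0])
ys = jnp.array([[0.0, 10.0, 20.0],
                [0.0, 1.0, 4.0]])

# Each batch element has its own, different query times.
query = jnp.array([[0.5, 1.5],
                   [0.25, 2.0]])

# Map over the batch axis of `query` and `ys`; `ts` is shared (in_axes=None).
interp_batched = jax.vmap(jnp.interp, in_axes=(0, None, 0))
out = interp_batched(query, ts, ys)
print(out)
```

If the observation times also differ per batch element, changing `in_axes` to `(0, 0, 0)` and passing a 2-D `ts` handles that case in the same way.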

mypy type checking seems to break in strict mode -- a mypy bug?

I can't replicate your issue I'm afraid. Running:

```python
import torch
from jaxtyping import Float

def simple_test_a(x: Float[torch.Tensor, "dim1"]) -> torch.Tensor:
    reveal_type(x)
    return x

def simple_test_b(x: Float[torch.Tensor, "dim1"]) -> float:...
```

TST: Add pytest-codspeed and benchmarking suite

ImportError: cannot import name 'Array' from 'jaxtyping'

ImportError: cannot import name 'Array' from 'jaxtyping'