Peter
Is that argument `x` needed? It seems we could just define `within_gradient() = false`, since we don't (or can't?) detect whether the function is differentiated w.r.t. `x`.
I would say `is_deriving(x)`/`is_differentiating(x)` is kind of weird for non-tracker AD. It sounds like you are checking whether the pullback gets a `NoTangent`, treating that as non-differentiable. Actually, that means this...
In fact it looks like Yota is smart enough to do that:

```
julia> Yota.grad(x -> within_gradient(x) ? x^2 : x, 2.0)
(2.0, (ZeroTangent(), 1))
```

I am a little...
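For reference, the argument-free version only needs a single `ChainRulesCore.rrule`, so any rule-based AD (Zygote, Yota, Diffractor) sees `true` inside a gradient call and `false` outside. A minimal sketch of that idea (roughly the pattern, not NNlib's exact code):

```
using ChainRulesCore

# Argument-free check: outside of AD it is just a constant `false`.
within_gradient() = false

# Rule-based ADs pick up this rrule instead of the plain method, so the
# primal result becomes `true` whenever the call is being differentiated.
function ChainRulesCore.rrule(::typeof(within_gradient))
    within_gradient_pullback(_) = (NoTangent(),)
    return true, within_gradient_pullback
end
```

With that, `Zygote.gradient(x -> within_gradient() ? x^2 : x, 2.0)` takes the `x^2` branch while a plain call takes the other one. A tracker-style AD would still need the `x` argument to inspect the tracked value, which is presumably why the argument exists.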
Sorry for barging in, but I'm quite curious about the idea of serving datasets with the package server system. Wouldn't that be too much for the package server to cache? I...
I have [one](https://github.com/chengchingwen/Transformers.jl/tree/master/example/AttentionIsAllYouNeed) in [Transformers.jl](https://github.com/chengchingwen/Transformers.jl)
Sure. I'm also thinking about opening a model zoo for Transformers.jl itself, since there are other models like GPT or BERT.
@ToucheSir Could you try running the layer norm gradient on the GPU? I have tried that manual broadcast fusion before, but `CUDA.@time` said it actually allocated more GPU memory.
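For anyone who wants to reproduce the measurement, a rough sketch of the kind of comparison I mean; the layer norm below is a stand-in written from scratch, not the actual kernel under discussion, and the shapes are arbitrary:

```
using CUDA, Zygote

# Plain layer norm over dims = 1; stand-in for the real implementation.
function layernorm(x; ϵ = 1f-5)
    μ  = sum(x; dims = 1) ./ size(x, 1)
    σ² = sum((x .- μ) .^ 2; dims = 1) ./ size(x, 1)
    return (x .- μ) ./ sqrt.(σ² .+ ϵ)
end

x = CUDA.randn(Float32, 512, 128)

# Warm up once, then let CUDA.@time report GPU time and GPU allocations,
# which is the number to compare between the fused and unfused variants.
Zygote.gradient(x -> sum(layernorm(x)), x)
CUDA.@time Zygote.gradient(x -> sum(layernorm(x)), x)
```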
I think something that needs to be mentioned together with Embedding is the one-hot encoding implementation. The problem for Embedding/OneHotEncoding is maintaining semantics and composability without hurting the performance on...
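To make the trade-off concrete, here is a minimal sketch of the lazy one-hot idea; `OneHotCols` is a made-up name for illustration, not the actual Flux/OneHotArrays type, but the principle is the same: keep the `AbstractMatrix` semantics so generic code composes, and specialize only the operations that matter for speed.

```
# Lazy one-hot columns: semantically a Bool matrix, stored as indices.
struct OneHotCols{T<:Integer} <: AbstractMatrix{Bool}
    indices::Vector{T}
    nlabels::Int
end

Base.size(o::OneHotCols) = (o.nlabels, length(o.indices))
Base.getindex(o::OneHotCols, i::Integer, j::Integer) = o.indices[j] == i

# Performance hook: an embedding lookup becomes a column gather instead of
# a dense matmul against a mostly-zero matrix.
Base.:*(W::AbstractMatrix, o::OneHotCols) = W[:, o.indices]

W = randn(Float32, 8, 100)            # embedding table: 8-dim vectors, 100 labels
x = OneHotCols([3, 7, 7, 42], 100)
W * x                                  # 8×4 gather, no dense one-hot multiply
```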
@CarloLucibello I would like to add Einstein summation and tensor products to the discussion list. They are quite useful in some novel model designs.
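As a concrete example of where they show up, a hedged sketch with OMEinsum.jl (TensorOperations.jl or Tullio.jl would work just as well); the shapes and the einsum spec here are only illustrative:

```
using OMEinsum

Q = randn(Float32, 64, 10, 8)   # (head_dim, query_len, batch)
K = randn(Float32, 64, 12, 8)   # (head_dim, key_len, batch)

# Batched attention scores QᵀK without any permutedims/reshape gymnastics.
scores = ein"dqb,dkb->qkb"(Q, K)
size(scores)                     # (10, 12, 8)
```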
Sounds good to have! HF handles it in the forward method of the hf-models (equiv. `Layers.Transformer`). I'm not sure `Checkpointed` as an `AbstractTransformerBlock` is the best place to add the checkpoint functionality....
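For reference, one way the wrapper could look; `Checkpointed` and `AbstractTransformerBlock` here mirror the names in the discussion, but the code is only a sketch, not Transformers.jl's actual implementation, and it leans on `Zygote.checkpointed`, which recomputes the wrapped call in the pullback instead of storing its intermediates:

```
using Zygote

abstract type AbstractTransformerBlock end

# Wrapper that marks a block for gradient checkpointing.
struct Checkpointed{B} <: AbstractTransformerBlock
    block::B
end

# Plain call outside AD; under Zygote the intermediates of `c.block(x)` are
# thrown away and recomputed during the backward pass.
(c::Checkpointed)(x) = Zygote.checkpointed(c.block, x)

block = Checkpointed(x -> tanh.(x) .+ x)   # stand-in for a transformer block
Zygote.gradient(x -> sum(block(x)), randn(Float32, 4, 3))
```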