Results: 34 comments by Jonathan Tow

I've attempted an implementation of an Embedding layer but am running into problems with the Layer protocol's input type requirements. Given that an Embedding layer consumes tensors of indices (UInt/Int)...

Hey @Shashi456. Yup. It just wouldn't compile, as it relied on the `Raw.gather(params:, atIndices:)` function, which requires a BinaryInteger for the second argument. Thanks, @rxwei, I'll give it a try.

Richard's advice resolved the compiler issues I had before regarding input types. Thanks for the suggestion @eaplatanios. The only issue left seems to be differentiating `gathering`. I'll keep an eye...

@Shashi456 Categorical Cross-Entropy seems to already be implemented through Softmax Cross-Entropy with Logits. Maybe we can cross it off the list?

**Layer Gradient Tests**

- [ ] Sequential
- [ ] Conv1D
- [x] Conv2D
- [x] Conv3D
- [x] DepthConv2D
- [ ] SeparableConv1D
- [x] SeparableConv2D
- [x] ZeroPadding1D...

Hi, @ArjunSubramonian! The Hindi version of the prompt also results in that same error. I'll merge the PR on the eval-harness side so that we can await the `promptsource`...

These models were only intentionally trained on English data, but some sources within the dataset are known to contain text from other languages. Therefore, you may be able to interact...

@ethankim00 This looks great! I've made a few changes based on some testing on our cluster. Here's the summary:

* Updates `"gpt_neo"` model type name in the modifier map.
*...

Hello, @James4Ever0! We do not plan on incorporating reward modeling into this repository. If you want to get a better idea of such fine-tuning (SFT + RMs) in practice,...

Note: #43 removes the `flake8` F82 undefined-name check. This needs to be caught by `mypy`.
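For context on why a static check still matters here, a minimal sketch follows. The function and the misspelled name `totl` are hypothetical and not taken from the repository. The snippet shows that Python compiles code containing an undefined name without complaint and fails only when the faulty line runs. That is the class of bug that flake8's F82x codes (for example F821, undefined name) report, and that `mypy` reports under its `name-defined` error code.

```python
# Hypothetical module source: `totl` is a typo for `total`.
SOURCE = '''
def mean(values):
    total = sum(values)
    return totl / len(values)
'''

# Compiling and defining the function succeed, because Python
# resolves names at run time rather than at compile time.
namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)

# The bug surfaces only when the faulty line actually executes.
try:
    namespace["mean"]([1, 2, 3])
except NameError as err:
    error_message = str(err)
    print(error_message)
```

Without a linter or type checker in CI, a typo like this stays hidden until a test or a user reaches that code path.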