Results: 34 issues by Peter

Currently it takes a very long time to compile the model; we should add some precompile statements.

help wanted

To make Julia objects loadable from Python, there are three issues: 1. For simple Julia structs (no functions): utilize the stack machine to build a similar Python class and load...

Since a pickle contains no function or class bodies, only module and function names, we cannot reconstruct the function or object in Julia. Proposed solution: read the names and maintain...

This is an initial attempt to adopt the technique used in [FlashAttention](https://github.com/Dao-AILab/flash-attention). The implementation largely follows the pseudocode in the [FlashAttention-2 paper](https://arxiv.org/abs/2307.08691). The code is done in a...
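For readers unfamiliar with the technique, the core of FlashAttention-2 is an "online softmax": keys and values are processed block by block while a running row maximum `m` and row sum `l` are kept, so the full score matrix is never materialized. Below is a minimal, CPU-only sketch of that recurrence for a single attention head, assuming row-major token layout (each row of `Q`, `K`, `V` is a token). It is not the code from this PR; `naive_attention`, `tiled_attention`, and the `block` keyword are hypothetical names chosen for illustration.

```julia
# Reference: softmax(Q*K'/sqrt(d)) * V, computed with the full score matrix.
function naive_attention(Q, K, V)
    S = (Q * K') ./ sqrt(size(Q, 2))
    P = exp.(S .- maximum(S; dims=2))   # subtract row max for stability
    P ./= sum(P; dims=2)
    return P * V
end

# Hypothetical tiled forward pass using the FlashAttention-2 style
# online-softmax recurrence over key/value blocks.
function tiled_attention(Q, K, V; block=4)
    n, d = size(Q)
    T = eltype(Q)
    O = zeros(T, n, size(V, 2))         # unnormalized output accumulator
    m = fill(T(-Inf), n)                # running row maximum
    l = zeros(T, n)                     # running row sum of exp-scores
    scale = inv(sqrt(T(d)))
    for j in 1:block:size(K, 1)
        r = j:min(j + block - 1, size(K, 1))
        S = (Q * K[r, :]') .* scale     # n x length(r) block of scores
        m_new = max.(m, vec(maximum(S; dims=2)))
        P = exp.(S .- m_new)            # block probabilities, not yet normalized
        α = exp.(m .- m_new)            # rescales statistics from earlier blocks
        l = α .* l .+ vec(sum(P; dims=2))
        O = α .* O .+ P * V[r, :]
        m = m_new
    end
    return O ./ l                       # FA-2 normalizes once, after the loop
end
```

A quick check that the tiled result matches the reference for a few block sizes:

```julia
Q, K, V = randn(5, 8), randn(10, 8), randn(10, 3)
tiled_attention(Q, K, V; block=3) ≈ naive_attention(Q, K, V)
```

Deferring the division by `l` to the end is one of the changes FlashAttention-2 makes relative to the first version, which rescaled the output on every block.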

The goal of the redesign is to support: 1. A better type hierarchy for dispatch. This lets functions that only accept sequence masks also work with combined masks. https://github.com/chengchingwen/Transformers.jl/blob/91a3fe00bad5bb9ebff35b61356c3d52ad3efba3/src/loss.jl#L29-L31...

There are many different designs for how the update function/API should be implemented, but the way each optimizer instance is applied is the same. I wonder if we can separate...
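To make the idea of separating the two concerns concrete, here is a hypothetical sketch: each optimizer only defines how to initialize its per-parameter state and how to compute a step (`init`, `apply`), while a single generic `update!` applies any rule to a collection of parameters. All names here (`AbstractRule`, `Descent`, `Momentum`, `init`, `apply`, `update!`) are illustrative assumptions, and parameters are assumed to be stored in a `Dict` of arrays; this is not the API of any existing package.

```julia
# Hypothetical design: rules only describe *what* step to take.
abstract type AbstractRule end

struct Descent <: AbstractRule
    η::Float64
end

struct Momentum <: AbstractRule
    η::Float64
    ρ::Float64
end

# Rule-specific part: initial state and (step, new_state) for one parameter.
init(::Descent, x) = nothing
apply(r::Descent, state, x, dx) = (r.η .* dx, state)

init(::Momentum, x) = zero(x)
function apply(r::Momentum, v, x, dx)
    v = r.ρ .* v .+ r.η .* dx          # velocity update
    return (v, v)                      # the step is the new velocity
end

# Shared part: the same application logic works for every rule.
function update!(rule::AbstractRule, states::Dict, params, grads)
    for (k, x) in params
        st = get!(() -> init(rule, x), states, k)   # lazily create state
        Δ, newst = apply(rule, st, x, grads[k])
        states[k] = newst
        x .-= Δ                                     # mutate the parameter in place
    end
    return params
end
```

Usage, with a state dictionary that persists across steps:

```julia
params = Dict(:w => [1.0, 2.0])
grads  = Dict(:w => [0.5, 0.5])
update!(Descent(0.1), Dict{Symbol,Any}(), params, grads)   # params[:w] ≈ [0.95, 1.95]
```

With this split, adding a new optimizer means writing only `init` and `apply`; the traversal and in-place update logic is written once.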

I tried to run the trained BiLSTM Max-out model but got the following error message: ``` Traceback (most recent call last): File "arc_solvers/run.py", line 10, in <module> from arc_solvers.commands import...

Is there a predictor for the Question to Choices Max Attention model? And where does the model get its supporting sentences?

MWE: ```julia julia> x = SizedVector{30000}(randn(30000)); julia> 0.5 in x ERROR: syntax: invalid syntax (memory-error out of gc handles) Stacktrace: [1] top-level scope @ REPL[4]:1 [2] _mapreduce @ ~/.julia/packages/StaticArrays/58yy1/src/mapreduce.jl:113 [inlined]...

MWE: ```julia julia> using ChainRulesCore, ChainRulesTestUtils julia> f(x) = x .* fill(2, size(x)) f (generic function with 1 method) julia> f1(x) = f(x) f1 (generic function with 1 method) julia>...