Federico Belotti
Hi, I've read in the [official repo](https://github.com/facebookresearch/xcit#getting-started) that the minimum supported PyTorch version is 1.7.0, but your port uses `torch.div` with the keyword argument `rounding_mode='floor'`, which is available from...
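For context, `rounding_mode='floor'` rounds toward negative infinity, while code relying on truncating division rounds toward zero; the two only differ on negative operands, which is why the rounding mode matters for index arithmetic. A minimal pure-Python sketch of the distinction (helper names hypothetical):

```python
import math

def div_floor(a, b):
    """Floor division: rounds toward negative infinity,
    matching torch.div(..., rounding_mode='floor')."""
    return math.floor(a / b)

def div_trunc(a, b):
    """Truncating division: rounds toward zero,
    matching torch.div(..., rounding_mode='trunc')."""
    return math.trunc(a / b)

# The two agree on positive operands...
print(div_floor(7, 2), div_trunc(7, 2))    # 3 3
# ...but diverge when the signs differ:
print(div_floor(-7, 2), div_trunc(-7, 2))  # -4 -3
```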
We should add some tests function to test the trained model. Given some results, we should also update the readme @DavideTr8
[LongLora](https://arxiv.org/abs/2309.12307) is "an efficient fine-tuning approach that extends the context sizes of pre-trained large language models". They propose to fine-tune a model with sparse local attention while maintaining dense...
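As a rough sketch of the shifted sparse attention idea (not the paper's implementation): tokens are partitioned into local groups that attend only within themselves, and in half the attention heads the positions are cyclically shifted by half a group so information flows between neighboring groups. The helper below is hypothetical and only illustrates the index bookkeeping:

```python
def group_indices(seq_len, group_size, shift=False):
    """Partition token positions into local attention groups.

    With shift=True, positions are cyclically rolled by half a group
    before grouping, so each shifted group straddles two unshifted ones
    (the core trick of LongLoRA's shifted sparse attention).
    """
    pos = list(range(seq_len))
    if shift:
        s = group_size // 2
        pos = pos[s:] + pos[:s]  # cyclic shift by half a group
    return [pos[i:i + group_size] for i in range(0, seq_len, group_size)]

# Unshifted heads attend within [0..3] and [4..7]; shifted heads
# attend within groups that bridge the boundary between them.
print(group_indices(8, 4))             # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(group_indices(8, 4, shift=True)) # [[2, 3, 4, 5], [6, 7, 0, 1]]
```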
The XSLT that translates MathML to LaTeX is very outdated, so it would be nice to refactor it against a current LaTeX math standard.
It would also be useful to translate to MathML or to the Mathematica language.
Follow-up of #1346. This PR introduces LongLora as in https://github.com/Lightning-AI/litgpt/issues/1237 for both LoRA and full fine-tuning, while also enabling it during generation. cc @rasbt
Hi everyone, [it has recently been proposed](https://arxiv.org/abs/2404.09610v1) to apply dropout directly to the LoRA weight matrices A and B: this favors sparsity, which improves generalization and reduces overfitting. The dropout...
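A minimal pure-Python sketch of the idea (all names hypothetical, plain nested lists standing in for tensors): dropout is applied element-wise to A and B themselves, rather than to the layer input as in standard LoRA dropout, so entire entries of the low-rank update are zeroed out:

```python
import random

def dropout_matrix(m, p, training=True, seed=None):
    """Element-wise dropout on a weight matrix (list of lists):
    zero each entry with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return [row[:] for row in m]
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - p)
    return [[x * scale if rng.random() >= p else 0.0 for x in row]
            for row in m]

def lora_delta(A, B, p, seed=0):
    """Effective low-rank update B @ A, with dropout applied to the
    A and B matrices instead of the layer input."""
    Ad = dropout_matrix(A, p, seed=seed)       # (r, in)
    Bd = dropout_matrix(B, p, seed=seed + 1)   # (out, r)
    r, n = len(Ad), len(Ad[0])
    return [[sum(Bd[i][k] * Ad[k][j] for k in range(r)) for j in range(n)]
            for i in range(len(Bd))]
```

With `p=0.0` this reduces to the plain product `B @ A`; with `p>0` individual entries of A and B are dropped independently, which sparsifies the update itself.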
[Nucleus sampling](https://arxiv.org/abs/1904.09751) (top-p sampling in HF) is a dynamic sampling strategy that "truncat[es] the unreliable tail of the probability distribution, sampling from the dynamic nucleus of tokens containing the vast...
I was also about to open an issue regarding Dreamer on feature-vector-based (partially observable) environments where no CNN is needed (and, as a matter of fact, to also handle...
Hi everyone, in [this branch](https://github.com/Eclectic-Sheep/sheeprl/tree/feature/compile) one can use `torch.compile` to compile the Dreamer-V3 agent. In particular: * in `sheeprl/configs/algo/dreamer_v3.yaml` one can decide what to compile and which arguments to...