Yang
I found that I cannot make NeMo report multiple metrics (ROUGE, loss, etc.) together at the end of each evaluation loop. I believe it is...
# One word for all
As reported in the technical report, the batch size (bs: 256) seems large for a max sequence length of 2048. I wonder what hardware environment...
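For scale, the two figures quoted above imply an upper bound on the number of tokens processed per optimizer step. This is only a back-of-the-envelope sketch: it assumes `bs: 256` is the global batch size and that every sequence is filled to the full 2048 tokens, neither of which is confirmed by the excerpt. The function name is hypothetical.

```python
def max_tokens_per_step(batch_size: int, max_seq_len: int) -> int:
    """Upper bound on tokens per optimizer step.

    Assumes every sequence in the batch reaches max_seq_len
    (an assumption; real batches are often shorter).
    """
    return batch_size * max_seq_len


# Figures taken from the question: bs = 256, max sequence length = 2048.
tokens = max_tokens_per_step(256, 2048)
print(f"at most {tokens} tokens per step")
```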
### System Info
- OS: Ubuntu 22.04.3 LTS
- GPU count and types: one machine with 4 x NVIDIA H100 PCIe
- Python version: 3.10.12
- Any other relevant info...
From a design perspective, should we consider adding the original `x` to the hidden states of `down_proj` [here](https://github.com/Leeroo-AI/mergoo/blob/main/mergoo/models/modeling_llama.py#L242)?
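To make the question concrete, here is a minimal sketch of what "adding the original `x`" would mean for a LLaMA-style gated MLP, written in plain Python so it runs without dependencies. It does not reproduce the mergoo code at the linked line; the function names, the `add_residual` flag, and the toy weights are all hypothetical, and the only thing taken as given is the usual LLaMA MLP form `down_proj(act(gate_proj(x)) * up_proj(x))`.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def silu(v: float) -> float:
    # SiLU activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def matvec(w: Matrix, x: Vector) -> Vector:
    # Each row of w produces one output element.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gated_mlp(x: Vector, gate_w: Matrix, up_w: Matrix, down_w: Matrix,
              add_residual: bool = False) -> Vector:
    """LLaMA-style gated MLP; add_residual is a hypothetical switch."""
    gate = [silu(g) for g in matvec(gate_w, x)]
    up = matvec(up_w, x)
    hidden = [g * u for g, u in zip(gate, up)]
    out = matvec(down_w, hidden)
    if add_residual:
        # The variant asked about: add the original input to the down_proj output.
        out = [o + xi for o, xi in zip(out, x)]
    return out


# Toy shapes: model dim 2, intermediate dim 3 (illustrative values only).
x = [1.0, 2.0]
gate_w = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]
up_w = [[1.0, 1.0], [0.5, 0.5], [-1.0, 1.0]]
down_w = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]]

plain = gated_mlp(x, gate_w, up_w, down_w)
with_residual = gated_mlp(x, gate_w, up_w, down_w, add_residual=True)
```

The two outputs differ by exactly `x`, so the design choice comes down to whether the caller already adds that residual around the MLP.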
Hi there, thanks for mergoo, an amazing codebase for MoE model construction. A crucial feature that may need to be implemented is that mergoo should let the user select the...
Please consider supporting the recently released [Olmo3](https://allenai.org/blog/olmo3) models in verl. Thank you so much!
It seems that only the prompt and dataset for SummEval are provided. Could you also share the ones for TopicalChat used in the original paper? :) Thanks!