Monohydroxides

7 comments by Monohydroxides

@tjruwase > Can you describe how you observed that optimizer state is in reduced precision? The ZeRO design is to keep the master weights and optimizer states in fp32 precision....

The pretrain data and the SFT data follow different organization formats. Simply put, the pretrain data does not include user, assistant, or their associated special tokens. You may refer to...

> @karamavusibrahim any progress here? We are facing similar issues while combining Webdataloader with Accelerate, and ended up using Webdataset + torch dataloader. @muellerzr Any more insights here? Could...

Hello, thank you for your interest in LLaDA. We plan to open-source the evaluation metrics for the LLaDA Base model using the lm-evaluation-harness library. This may take some time to...

No, these two special tokens are not used during pre-training, so they do not have any effect.

> Thanks for your reply. Is there a suggested special token for us to handle numbers specifically? Are and used in the pre-training? How are "role" tokens used in pre-training?...