One


Besides, it would be greatly appreciated if the pre-training details, including hyperparameters, were also open-sourced (see https://github.com/xai-org/grok-1/issues/23), as they're extremely important for conducting a full fine-tune.

I've found a temporary solution by using `wandb.init(reinit=True, ...)` and not calling `wandb.finish()` after a run.

```python
def run_multiple_times():
    while True:
        wandb.init(reinit=True, ...)
        # training code...
        # wandb.finish()
```

My...
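For reference, a minimal runnable sketch of this pattern; the project name, run names, and logged metric are placeholders, not taken from the original setup:

```python
import wandb

def run_multiple_times(num_runs=3):
    for i in range(num_runs):
        # Start a fresh run each iteration; reinit=True allows wandb.init
        # to be called repeatedly in the same process.
        wandb.init(project="example-project", name=f"run-{i}", reinit=True)
        # ... training code for this run ...
        wandb.log({"loss": 0.0})  # placeholder metric
        # wandb.finish()  # intentionally omitted as part of the workaround
```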

Figured this out. This is because NCCL cannot use the memory in the PyTorch memory pool, and a CUDA OOM occurs during an NCCL collective operation. Set `NCCL_DEBUG=INFO` to see NCCL...
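A minimal sketch of enabling that logging from Python, assuming the variable is set before any NCCL initialization (it can equally be set on the launcher's command line):

```python
import os

# Must be set before NCCL is initialized, i.e. before init_process_group
# or the first collective; the verbose logs will show the failing allocation.
os.environ["NCCL_DEBUG"] = "INFO"

import torch.distributed as dist

# dist.init_process_group(backend="nccl")  # launched via torchrun or similar
```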


Have you fixed the issue? We released a new version and tested the following setup:

```
conda create -y --name openchat
conda activate openchat
conda install -y python
conda install...
```

Is there any follow-up to this issue? As @lgeiger did, I also observed that the performance is ~3x slower than float32; it would be great if we could speed up...

OpenChat adds prompt templates during the dataset generation phase, so openchat.train.json already contains the template. Please have a look at openchat.[train/eval].text.json for a better explanation. We haven't conducted an ablation on...

Closing this issue now. If you have further questions, please re-open.

1. The conversation template involves concatenating tokens and cannot be expressed as plain text. You can try the following code:

```python
def tokenize_single_input(tokenizer, prompt):
    human_prefix = "Human: "
    ai_prefix =...
```
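To illustrate the idea, here is a minimal sketch of building the input by concatenating token ids directly; the `<|end_of_turn|>` marker, prefixes, and template below are illustrative assumptions, not necessarily the project's exact format:

```python
def tokenize_single_input(tokenizer, prompt):
    # Illustrative prefixes and end-of-turn marker; the real template may differ.
    human_prefix = "Human: "
    ai_prefix = "Assistant: "
    eot_id = tokenizer.convert_tokens_to_ids("<|end_of_turn|>")

    # Encode the text pieces without special tokens, then splice in the
    # end-of-turn token id directly. Working at the token level like this is
    # what makes the template impossible to express as plain text.
    input_ids = tokenizer(human_prefix + prompt, add_special_tokens=False)["input_ids"]
    input_ids.append(eot_id)
    input_ids += tokenizer(ai_prefix, add_special_tokens=False)["input_ids"]
    return input_ids
```

Calling this with a Hugging Face tokenizer that defines such a special token returns a list of ids ready to feed to generation.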

Closing this issue now. If you have further questions, please re-open.