H-Simpson123

Results: 2 issues by H-Simpson123

### Describe the bug

I'm trying to create embeddings for some documents with LangChain and OpenLLM. With each request, the GPU RAM consumption increases by a few hundred MB until OpenLLM...

Finetuning Qwen2-57B-A14B-Instruct is extremely slow compared to finetuning Qwen2-72B-Instruct. Here are the runtimes:

**Qwen/Qwen2-7B-Instruct:** {'train_runtime': 100.8509, 'train_samples_per_second': 5.652, 'train_steps_per_second': 0.099, 'train_loss': 0.751581035554409, 'epoch': 10.0}

**Qwen/Qwen2-72B-Instruct:** {'train_runtime': 483.8572,...