During the evaluation phase, once the GPUs have created the merge and finished running lm_eval on the task, they sit idle. Running this on 8xMI300X:
```
mergekit-evolve --task-search-path task_dir --max-fevals...
```
For this part:
```
# Crop the padded images to the desired resolution and number of frames
(pad_left, pad_right, pad_top, pad_bottom) = padding
pad_bottom = -pad_bottom
pad_right = -pad_right
if...
```
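A minimal sketch of how this cropping logic plausibly continues: negating the pad widths turns them into end indices for slicing, and the truncated `if` most likely guards the zero-pad case, since a slice ending at `-0` selects nothing. The `crop_padding` helper and the (frames, height, width, channels) layout are assumptions for illustration, not part of the original code.
```
import numpy as np

def crop_padding(images, padding):
    """Hypothetical helper: undo spatial padding on a stack of frames.

    Assumes `images` has shape (frames, height, width, channels) and
    `padding` is (pad_left, pad_right, pad_top, pad_bottom).
    """
    (pad_left, pad_right, pad_top, pad_bottom) = padding
    # Negative end indices crop from the bottom/right edges. Zero pads
    # must map to None: images[:, :-0] would select an empty slice.
    pad_bottom = -pad_bottom if pad_bottom > 0 else None
    pad_right = -pad_right if pad_right > 0 else None
    return images[:, pad_top:pad_bottom, pad_left:pad_right, :]

# Example: strip 2px of padding on every side of 8 RGB frames.
frames = np.zeros((8, 68, 68, 3))
assert crop_padding(frames, (2, 2, 2, 2)).shape == (8, 64, 64, 3)
```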
Performance of 40 steps vs 100 steps.
```
Step 1/40
  2%|██▏       | 1/40 [00:01
```
Flash Attention 3 now works with these platforms; would it be easy for the LMDeploy team to implement this? @lvhan028 https://github.com/Dao-AILab/flash-attention/issues/1049#issuecomment-2695283567
```
  File "/home/kojoe/miniconda3/envs/vllm/lib/python3.12/site-packages/gemma/gm/text/_sampler.py", line 311, in sample
    init_state = _prefill.prefill(
                 ^^^^^^^^^^^^^^^^^
  File "/home/kojoe/miniconda3/envs/vllm/lib/python3.12/site-packages/gemma/gm/text/_prefill.py", line 110, in prefill
    out = model.apply(
          ^^^^^^^^^^^^
  File "/home/kojoe/miniconda3/envs/vllm/lib/python3.12/site-packages/kauldron/utils/train_property.py", line 141, in decorated
    return fn(*args, **kwargs)
...
```
```
# Common imports
import os

import jax.numpy as jnp
import tensorflow_datasets as tfds

# Gemma imports
from gemma import gm

os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00"

ds = tfds.data_source("oxford_flowers102", split="train")
image1 =...
```
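For reference, a minimal sketch of how this setup typically continues, following the sampling API shown in the gemma README; the model class, checkpoint name, prompt, and dataset index below are assumptions for illustration, not part of the original snippet.
```
import numpy as np
import tensorflow_datasets as tfds

from gemma import gm

ds = tfds.data_source("oxford_flowers102", split="train")
image1 = np.asarray(ds[0]["image"])  # assumed: image of the first record

# Assumed model/checkpoint pair, per the gemma README examples.
model = gm.nn.Gemma3_4B()
params = gm.ckpts.load_params(gm.ckpts.CheckpointPath.GEMMA3_4B_IT)

sampler = gm.text.ChatSampler(model=model, params=params)

# One <start_of_image> token per image passed via `images=`.
out = sampler.chat("What flower is this? <start_of_image>", images=[image1])
print(out)
```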