Lzhang-hub issues


A single GPU card may be oversold during scheduling

Through `ip:5678/metric`, I get the number of containers on each GPU card and the amount of resources remaining on each card, and I found an oversold situation. Container info on...
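One way to make "oversold" concrete: a card is oversold when the resources reserved by the containers scheduled onto it exceed the card's capacity, which shows up as a negative remaining amount. The following is a minimal sketch, assuming the per-card numbers have already been parsed from the metrics output and that the resource in question is GPU memory in MiB. `CardUsage`, its fields, and `find_oversold` are hypothetical names for illustration, not the scheduler's actual metric format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CardUsage:
    """Hypothetical per-GPU record (assumed to be parsed from the metrics endpoint)."""
    card_id: str
    total_mem_mib: int
    # Memory (MiB) reserved by each container scheduled onto this card (assumption).
    container_mem_mib: List[int] = field(default_factory=list)

    @property
    def remaining_mib(self) -> int:
        # Capacity minus everything already reserved; negative means oversold.
        return self.total_mem_mib - sum(self.container_mem_mib)


def find_oversold(cards: List[CardUsage]) -> Dict[str, int]:
    """Return {card_id: overshoot_mib} for every card whose reservations exceed capacity."""
    return {c.card_id: -c.remaining_mib for c in cards if c.remaining_mib < 0}


if __name__ == "__main__":
    cards = [
        CardUsage("gpu-0", 32768, [16384, 16384]),        # exactly full, not oversold
        CardUsage("gpu-1", 32768, [16384, 16384, 8192]),  # reserved 40960 of 32768 MiB
    ]
    print(find_oversold(cards))  # {'gpu-1': 8192}
```

A card that is exactly full is not flagged, because its remaining amount is zero rather than negative.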

Running 05_stable_diffusion scripts/compile.py fails

Steps to reproduce:

```
# build docker image
./docker/build.sh cuda
# run docker
docker run -it --gpus=all ait:latest bash
# run scripts
cd /AITemplate/examples/05_stable_diffusion
python3 scripts/download_pipeline.py
python3 scripts/compile.py
```

Error log:...

Qwen14B model output for a long prompt differs from the HF result

### System Info GPU: rtx8000 Driver version: 525.85.05 CUDA version: 12.0 System: ubuntu20.04 ### Who can help? _No response_ ### Information - [ ] The official example scripts - [...

bug

Any plans to support Volta?

### Feature request Support Volta GPUs ### Motivation Support Volta GPUs ### Your contribution ...

Support Yi-VL

Is there any plan to support Yi-VL? https://huggingface.co/01-ai/Yi-VL-34B

First GPU uses more memory, leading to OOM

## Condition: GPU: A100 40G × 8, batch size = 2 ## Error: CUDA out of memory. ## Some confusion We found that GPU 0 uses more memory than the other GPUs ![image](https://github.com/justinpinkney/stable-diffusion/assets/57925599/53e94f81-2a8d-4ba5-b49b-22996eda9607) And...

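batch_response() latency scales linearly with the length of the prompt list

```python
import time

import pyfastllm

# `model` (an already-loaded pyfastllm model) and `text` (the prompt) are defined earlier.

# Time a batch containing one prompt.
st = time.time()
prompts = [text]
config = pyfastllm.GenerationConfig()
res = model.batch_response(prompts, None, config)
one_time = time.time() - st
print(one_time)

# Time a batch containing four copies of the same prompt.
multi_st = time.time()
prompts = [text, text, text, text]
config = pyfastllm.GenerationConfig()
res = model.batch_response(prompts, None, config)
multi_time = time.time() - multi_st
print(multi_time)
```

multi_time is almost four times one_time. Could this be caused by an unreasonable parameter configuration?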

sglang

I tested yi-vl-6B with `srt_example_yi_vl.py` and got this error:

```
AttributeError: 'TokenizerManager' object has no attribute 'executor'
```

Flash attention supports softcap.

# Description Flash attention added softcap support in commit [8f873cc6](https://github.com/Dao-AILab/flash-attention/commit/8f873cc6acac2933d757b2ed6069518d619b341b); softcap is used in [gemma2](https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf). Fixes # (issue) ## Type of change - [ ] New feature (non-breaking change which...
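For context, logit soft-capping (as described in the Gemma 2 report) passes each raw attention score through `softcap * tanh(score / softcap)`. Large scores are squashed into the open interval (-softcap, softcap), while small scores are left nearly unchanged. The following is a minimal pure-Python sketch of that transform applied to one row of scores before a softmax. It illustrates the math only; it is not the flash-attention kernel. The function names are hypothetical, and treating `softcap <= 0` as "disabled" is an assumption of this sketch.

```python
import math
from typing import List


def softcap_scores(scores: List[float], softcap: float) -> List[float]:
    """Apply logit soft-capping: softcap * tanh(s / softcap).

    Assumption for this sketch: softcap <= 0 disables capping.
    """
    if softcap <= 0:
        return list(scores)
    return [softcap * math.tanh(s / softcap) for s in scores]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def capped_attention_weights(scores: List[float], softcap: float) -> List[float]:
    """Soft-cap the raw scores, then normalize them into attention weights."""
    return softmax(softcap_scores(scores, softcap))


if __name__ == "__main__":
    raw = [0.5, 10.0, 100.0]
    # Illustrative cap value; every capped score stays strictly below it.
    print(softcap_scores(raw, 50.0))
    print(capped_attention_weights(raw, 50.0))
```

Because `tanh` is monotonic, capping preserves the ordering of the scores. It only limits how far a single very large score can dominate the softmax.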

AttributeError: module 'transformer_engine' has no attribute 'pytorch'

I reinstalled flash-attn with `pip install flash-attn==2.6.1` in the NGC PyTorch Docker image 24.06. When I ran a training job, I got the following error: ``` Traceback (most recent call last): File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/pretrain_gpt.py", line 8,...