> Use HMM! But HMM produces new (unseen) words, so that won't work.
@mrwyattii First, thanks for your reply :) What is very strange is that I used your example above with tp=1 and tp=2 for testing. tp=2 takes a lot of time....
@Fazziekey @FrankLeeeee Same OOM issue. Same setup: A100 40GB, 1 GPU running the LLaMA-7B model, batch=1, max_seq_len=512, colossalai_zero2 with placement_policy='cuda'. I used torch.cuda.memory_allocated() to analyze memory usage; in SFTTrainer, self.optimizer = strategy.setup_optimizer...
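For anyone trying to reproduce this kind of memory analysis: a minimal sketch of how one might log allocated CUDA memory around the optimizer setup, using `torch.cuda.memory_allocated()` as the comment above mentions. The helper name `log_gpu_memory` and the call sites are illustrative, not part of ColossalAI.

```python
import torch

def log_gpu_memory(tag: str) -> int:
    """Print and return currently allocated CUDA memory in bytes (0 if no GPU)."""
    if not torch.cuda.is_available():
        print(f"[{tag}] no CUDA device available")
        return 0
    allocated = torch.cuda.memory_allocated()
    print(f"[{tag}] allocated = {allocated / 1024**2:.1f} MiB")
    return allocated

# Illustrative call sites, e.g. around optimizer setup in a trainer:
before = log_gpu_memory("before setup_optimizer")
# self.optimizer = strategy.setup_optimizer(...)  # the line under investigation
after = log_gpu_memory("after setup_optimizer")
```

Comparing the two readings makes it easy to see how much extra memory the optimizer states consume, which is usually where ZeRO-2 OOMs come from on a single 40GB card.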
> Hi all,
>
> I was trying to load the model onto two 3090 Ti GPUs (24 GB VRAM each). Loading in 4-bit, however, I got the error message "ValueError: You can't...
> Thanks. I will take a look at it. It looks like there are some issues when multi_block_mode and sliding_window_attention are used together.

@PerkzZheng Is there any progress on this issue?
@Superjomn @byshiue Same question here. Could you give an example of a successful use of stop_words or stop_words_list? Thank you. I am currently using the service started by tensorrtllm_backend, the commit number...
Same issue here. Could you help look into it? Thanks @kaiyux
@byshiue Any progress? At present, I still have this problem with the latest version of the code (tensorrtllm: a96ccca).
> > @byshiue Any progress? At present, I still have this problem with the latest version of the code (tensorrtllm: [a96ccca](https://github.com/NVIDIA/TensorRT-LLM/commit/a96cccafcf6365c128f004f779160951f8c0801c)).
>
> it's fixed for my case, after...
> We have added it here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/quantization/quantize_by_modelopt.py#L474-L478. Do you still encounter the same issue on the latest main branch? If so, could you try printing some debug messages there to make sure...