Puyuan Liu

Results 8 issues of Puyuan Liu

Is there a way to log experiments to wandb, e.g., loss, learning rate, and custom metrics?

I was training a GPT-Neo (2.8B) model using the step1 script on 4 A10G GPUs. I used the default parameters in the example script, but zero_stage=2 is consuming more GPU...

deespeed chat

Is FasterTransformer built on top of TensorRT? Is FasterTransformer more efficient than TensorRT when performing inference with Transformer models (e.g., llama)? And what is the difference between FasterTransformer and Huggingface/betterTransformers?

### Feature request

Currently, bnb only supports block sizes of 64 or above. It would be great if it could support a block size of 32, like llama.cpp.

### Motivation

Better...

I am running a loop to insert sparse embeddings into an on-disk client. The loop becomes progressively slower—it starts taking less than 1 second at the beginning but slows down...

Is it possible to create a block that determines which flow to execute based on conditions? For example, execute different flows based on the user's intent.

question

In create_hf_model, what's the purpose of resizing the model embedding?

    model.config.end_token_id = tokenizer.eos_token_id
    model.config.pad_token_id = model.config.eos_token_id
    model.resize_token_embeddings(int(
        8 *
        math.ceil(len(tokenizer) /...
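The excerpt is cut off before the divisor, so the exact expression is unknown. Assuming it rounds `len(tokenizer)` up to the next multiple of 8 (a common choice, since embedding dimensions that are multiples of 8 tend to map better onto GPU tensor cores), the arithmetic can be sketched as follows. The helper name `round_up_to_multiple` and the sample vocabulary sizes are hypothetical and used only for illustration.

```python
import math


def round_up_to_multiple(n: int, multiple: int = 8) -> int:
    """Return the smallest multiple of `multiple` that is >= n.

    Hypothetical helper; the divisor of 8 is an assumption, because the
    original snippet is truncated before it.
    """
    return int(multiple * math.ceil(n / multiple))


# Illustrative vocabulary sizes, not taken from the original issue.
for vocab_size in (50257, 50264, 32001):
    padded = round_up_to_multiple(vocab_size)
    print(f"{vocab_size} -> {padded} ({padded - vocab_size} padding rows)")
```

Under this assumption, the extra rows in the embedding matrix are never indexed by real token ids; they only pad the table to a hardware-friendly size.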

It looks like the log probabilities of both the chosen and rejected trajectories keep decreasing, which is strange. ![image](https://github.com/eric-mitchell/direct-preference-optimization/assets/121119211/a2f635a1-0874-4f90-bc19-5083a494e14b)