Puyuan Liu issues

Results: 8 issues



Logging experiments to wandb

Is there a way to log experiments to wandb, e.g., loss, lr, and custom metrics?
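The issue does not say which training framework is involved, so here is a minimal, framework-agnostic sketch. The helper takes any `log_fn` with the same call shape as wandb's `Run.log` (a dict of metrics plus an optional `step`). The metric key names and the helper itself are hypothetical; only `wandb.init`, `run.log`, and `run.finish` in the usage comment are real wandb calls.

```python
from typing import Any, Callable, Dict, Optional

# Any callable shaped like wandb's Run.log(data, step=...)
LogFn = Callable[..., None]


def log_training_step(
    log_fn: LogFn,
    step: int,
    loss: float,
    lr: float,
    custom: Optional[Dict[str, float]] = None,
) -> Dict[str, Any]:
    """Bundle loss, learning rate, and custom metrics into one dict and log it."""
    metrics: Dict[str, Any] = {"train/loss": loss, "train/lr": lr}
    # Prefix user-defined metrics so they group together in the wandb UI
    for name, value in (custom or {}).items():
        metrics[f"custom/{name}"] = value
    log_fn(metrics, step=step)
    return metrics


# Typical use with wandb (requires `pip install wandb` and a logged-in account):
#   import wandb
#   run = wandb.init(project="my-project")   # placeholder project name
#   ...inside a hypothetical training loop:
#   log_training_step(run.log, step, loss.item(), scheduler.get_last_lr()[0],
#                     {"reward_mean": reward_mean})
#   run.finish()
```

Passing `run.log` as a plain callable keeps the helper testable without a wandb account and makes it easy to swap in another logger.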

ZeRO Stage 2 consumes more GPU memory than Stage 1

I was training a GPT-Neo (2.8B) model using the step1 script on 4 A10G GPUs. I used the default parameters in the example script, but zero_stage=2 is consuming more GPU...
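For reference, the ZeRO paper's accounting for mixed-precision Adam predicts that stage 2 should need less memory for model states than stage 1, which is why the observation above is surprising. The sketch below reproduces that accounting (2 bytes/param for fp16 weights, 2 for fp16 gradients, K = 12 for fp32 optimizer states). It is a back-of-the-envelope estimate under those assumptions: it ignores activations, temporary buffers, and DeepSpeed's communication buckets (e.g. `reduce_bucket_size` and `allgather_bucket_size` in the `zero_optimization` config), so it is only a baseline to compare measured usage against, not a diagnosis.

```python
def zero_model_state_bytes(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Per-GPU bytes for model states under mixed-precision Adam (ZeRO-paper accounting).

    fp16 parameters: 2 bytes/param, fp16 gradients: 2 bytes/param,
    optimizer states (fp32 master weights, momentum, variance): k bytes/param.
    """
    params = 2 * num_params
    grads = 2 * num_params
    optim = k * num_params
    if stage >= 1:
        optim /= num_gpus   # stage 1 partitions optimizer states
    if stage >= 2:
        grads /= num_gpus   # stage 2 also partitions gradients
    if stage >= 3:
        params /= num_gpus  # stage 3 also partitions parameters
    return params + grads + optim


for stage in (1, 2):
    gb = zero_model_state_bytes(2.8e9, num_gpus=4, stage=stage) / 1e9
    print(f"stage {stage}: {gb:.1f} GB")
```

For 2.8B parameters on 4 GPUs this prints about 19.6 GB for stage 1 and 15.4 GB for stage 2, so if stage 2 measures higher, the extra memory comes from something outside this model-state estimate.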

deespeed chat

What's the difference between FasterTransformer and TensorRT

Is FasterTransformer built on top of TensorRT? Is FasterTransformer more efficient than TensorRT when performing inference with Transformer models (e.g., llama)? And what's the difference between FasterTransformer and Huggingface/betterTransformers?

Support block size of 32

Feature request: Currently, bnb only supports block sizes of 64 or above. It would be great if it could support a block size of 32, like llama.cpp. Motivation: Better...
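To illustrate why block size matters, here is a simplified blockwise quantizer: each block of `block_size` values gets its own absmax scale, so an outlier only degrades precision inside its own block. This is an assumption-laden sketch using plain linear int8 absmax quantization, not bitsandbytes' actual scheme (which uses its own quantization maps); the function names are hypothetical. The trade-off is overhead: if each scale is stored as a 32-bit float, a block size of 32 costs 1 extra bit per value versus 0.5 bit for 64.

```python
from typing import List, Tuple


def quantize_blockwise(values: List[float], block_size: int = 32) -> Tuple[List[int], List[float]]:
    """Quantize to int8 codes with one absmax scale per block of `block_size` values."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / 127 if absmax > 0 else 1.0  # avoid dividing by zero for all-zero blocks
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block_size: int = 32) -> List[float]:
    """Map each int8 code back to a float using the scale of the block it belongs to."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def total_abs_error(values: List[float], block_size: int) -> float:
    codes, scales = quantize_blockwise(values, block_size)
    restored = dequantize_blockwise(codes, scales, block_size)
    return sum(abs(a - b) for a, b in zip(values, restored))


# Small values with a single outlier in the first block
data = [0.01 * (i % 7 - 3) for i in range(64)]
data[0] = 10.0
print(total_abs_error(data, 32), total_abs_error(data, 64))
```

With block size 64 the outlier's scale is shared by all 64 values, so the small values in the second half round to zero; with block size 32 the second block gets its own small scale and is reconstructed almost exactly.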

upsert is slow for sparse embeddings

I am running a loop to insert sparse embeddings into an on-disk client. The loop becomes progressively slower: it starts out taking less than 1 second but slows down...

Conditional Flow

Is it possible to create a block that determines which flow to execute based on conditions? For example, execute different flows based on the user's intent.
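The issue does not name the framework, so the following is not any framework's block API. It is a plain-Python sketch of the idea being asked about: a router classifies the user's intent and dispatches to one of several flows. The keyword classifier, the intent labels, and every flow function are hypothetical placeholders; a real system might use an LLM or a trained classifier for `classify_intent`.

```python
from typing import Callable, Dict

Flow = Callable[[str], str]


def refund_flow(message: str) -> str:
    return "refund flow handled: " + message


def order_status_flow(message: str) -> str:
    return "order-status flow handled: " + message


def fallback_flow(message: str) -> str:
    return "fallback flow handled: " + message


def classify_intent(message: str) -> str:
    """Hypothetical keyword-based intent classifier."""
    text = message.lower()
    if "refund" in text:
        return "refund"
    if "order" in text:
        return "order_status"
    return "other"


# Map each intent label to the flow that should run for it
FLOWS: Dict[str, Flow] = {
    "refund": refund_flow,
    "order_status": order_status_flow,
}


def route(message: str) -> str:
    """Pick a flow based on the detected intent, falling back when no flow matches."""
    intent = classify_intent(message)
    flow = FLOWS.get(intent, fallback_flow)
    return flow(message)
```

Keeping the intent-to-flow mapping in a dictionary means adding a new branch is a one-line change rather than another `if` clause in the router.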

question

Resizing model embedding when loading the model

In create_hf_model, what's the purpose of resizing the model embedding?

    model.config.end_token_id = tokenizer.eos_token_id
    model.config.pad_token_id = model.config.eos_token_id
    model.resize_token_embeddings(int(
        8 *
        math.ceil(len(tokenizer) /...
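Assuming the truncated expression rounds the tokenizer length up to a multiple of 8 (which is what `8 * math.ceil(...)` suggests), the resize does two things: it makes the embedding matrix at least as large as the tokenizer's vocabulary, which matters if tokens were added, and it pads the vocabulary dimension to a multiple of 8. A common reason for the latter is that NVIDIA's performance guidance recommends dimensions that are multiples of 8 for efficient fp16 Tensor Core matrix multiplies. The helper name below is hypothetical; `resize_token_embeddings` in the comment is the Hugging Face Transformers method used in the snippet.

```python
import math


def padded_vocab_size(num_tokens: int, multiple: int = 8) -> int:
    """Round num_tokens up to the nearest multiple of `multiple`."""
    return multiple * math.ceil(num_tokens / multiple)


# Usage with Hugging Face Transformers:
#   model.resize_token_embeddings(padded_vocab_size(len(tokenizer)))
print(padded_vocab_size(50257))  # GPT-2-style vocabulary size
```

For a 50,257-token vocabulary this prints 50264, so seven unused embedding rows are added; a vocabulary already divisible by 8 is left unchanged.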

Strange loss pattern

It looks like the log probabilities of both the chosen and rejected trajectories keep decreasing, which is strange. ![image](https://github.com/eric-mitchell/direct-preference-optimization/assets/121119211/a2f635a1-0874-4f90-bc19-5083a494e14b)
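One point worth checking against the plot: the standard (sigmoid) DPO loss from the DPO paper depends only on the *difference* between the chosen and rejected log-ratios relative to the reference model, $-\log\sigma\big(\beta[(\log\pi_\theta(y_w) - \log\pi_{\text{ref}}(y_w)) - (\log\pi_\theta(y_l) - \log\pi_{\text{ref}}(y_l))]\big)$. Nothing in the objective forces the chosen log probability to rise, so both can fall while the loss still improves, as long as the rejected one falls faster. The sketch below demonstrates this with made-up log probabilities and an assumed `beta = 0.1`; it shows the pattern is possible, not that it explains this particular run.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair, given summed sequence log probs."""
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    return neg_log_sigmoid(beta * (chosen_logratio - rejected_logratio))


# Hypothetical reference-model log probs for one pair
ref_chosen, ref_rejected = -50.0, -55.0

# Start of training: policy equals the reference, so the loss is log(2)
loss_before = dpo_loss(-50.0, -55.0, ref_chosen, ref_rejected)
# Later: BOTH policy log probs have dropped, but the rejected one dropped more
loss_after = dpo_loss(-52.0, -65.0, ref_chosen, ref_rejected)
print(loss_before, loss_after)
```

Here the chosen log probability falls by 2 and the rejected one by 10, yet the loss drops from log 2 (about 0.693) to about 0.371 because the margin between the two log-ratios widens.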