DoLa
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
Hello, I'm following your work and trying to run your code. When I set up the environment in my anaconda env with `pip install -e transformers-4.28.1`, I run into the following problem:...
Hi Team, great work on the project! I noticed something in `dola.py` line 217 that might need attention. The code is: ```python log_probs = diff_logits[range(diff_logits.shape[0]), continue_ids].sum().item() ``` I am wondering...
https://github.com/voidism/DoLa/blob/dc88907406f9744f748f3c779f2353efd5bdc824/transformers-4.28.1/src/transformers/models/llama/modeling_llama.py#L703 I think you should apply the `model.norm` layer to `hidden_states[early_exit_layer]`, because only the last hidden state has `model.norm` applied to it. See https://github.com/voidism/DoLa/blob/dc88907406f9744f748f3c779f2353efd5bdc824/transformers-4.28.1/src/transformers/models/llama/modeling_llama.py#L594
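A minimal sketch of the point raised above: in a LLaMA-style decoder, only the final layer's hidden state passes through `model.norm` (an RMSNorm) before the `lm_head`, so an early-exit hidden state should arguably be normalized the same way before projecting it to logits. All names and shapes here are illustrative, not the repo's actual API.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Simplified LLaMA-style RMSNorm (no bias, variance over the last dim)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)

torch.manual_seed(0)
dim, vocab = 8, 16
norm = RMSNorm(dim)
lm_head = nn.Linear(dim, vocab, bias=False)

# Hypothetical per-layer hidden states, keyed by layer index.
hidden_states = {0: torch.randn(1, dim), 2: torch.randn(1, dim)}
early_exit_layer = 0

# Projecting the raw early-exit hidden state skips the final norm...
raw_logits = lm_head(hidden_states[early_exit_layer])
# ...whereas the suggested fix applies model.norm first, matching what
# the last layer's hidden state receives before the lm_head.
normed_logits = lm_head(norm(hidden_states[early_exit_layer]))
```

The two projections generally disagree, which is why normalizing before the head matters for comparing early-exit logits against final-layer logits.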
automatcially -> automatically
Hi, since the model's output at each token position represents the probabilities of the *next* token, shouldn't the calculation of log_probs be misaligned? I mean `diff_logits[range(diff_logits.shape[0]-1), continue_ids[1:]].sum().item()` instead of...
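The alignment question above can be illustrated with toy tensors: the logits at position i are the model's prediction for token i+1, so scoring a continuation means pairing shifted logits with shifted targets. `continue_ids` here is a stand-in for the answer-token ids, not the repo's actual variable; whether the shift is needed depends on how the inputs were sliced upstream.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, vocab = 5, 10
logits = torch.randn(seq_len, vocab)            # one row of logits per position
log_probs_all = F.log_softmax(logits, dim=-1)
continue_ids = torch.tensor([3, 7, 1, 4, 2])    # token ids at positions 0..4

# Unshifted indexing: pairs position i's logits with token i.
unshifted = log_probs_all[range(seq_len), continue_ids].sum().item()

# Shifted indexing, as the question suggests: position i's logits
# score the token that actually follows it, token i+1.
shifted = log_probs_all[range(seq_len - 1), continue_ids[1:]].sum().item()
```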
GPT-3 has been deprecated. What type of model should I use to fine-tune into a GPT-judge? Also, due to the change in the fine-tuning format, what changes should I make...
I think there is a problem with the implementation of the Jensen-Shannon divergence in DoLa and in ReDeep, a new hallucination detection method. I described the problem here: https://github.com/Jeryi-Sun/ReDEeP-ICLR/issues/2 The code...
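For reference, a generic implementation of the Jensen-Shannon divergence between two next-token distributions, as used for premature-layer selection in DoLa-style contrastive decoding: JSD(p, q) = ½·KL(p‖m) + ½·KL(q‖m) with m = (p + q)/2, which is symmetric and bounded by log 2 in nats. This is a sketch of the standard definition, not the repo's code.

```python
import torch
import torch.nn.functional as F

def jsd(p_logits, q_logits):
    """Jensen-Shannon divergence between softmax(p_logits) and softmax(q_logits)."""
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    # F.kl_div expects log-probabilities as input and probabilities as target,
    # so kl_div(m.log(), p) computes KL(p || m).
    kl_pm = F.kl_div(m.log(), p, reduction="batchmean")
    kl_qm = F.kl_div(m.log(), q, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)

torch.manual_seed(0)
x = torch.randn(2, 32)
assert jsd(x, x).item() < 1e-6           # identical distributions -> 0
assert jsd(x, torch.randn(2, 32)) >= 0   # JSD is always non-negative
```

A common bug in JSD implementations is computing KL(m‖p) instead of KL(p‖m), or skipping the mixture m entirely, so checking the argument order of `F.kl_div` is a good first step when auditing either codebase.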
Thanks for your great work! Can DoLa support more LLM models, such as Llama 3.1, Llama 2, Qwen2, or the Mistral series?
Thank you for your excellent work. 1. I think [**"scores_normalized = scores.log_softmax(dim=-1)"**](https://github.com/voidism/DoLa/blob/11b73b74ec1a72216e3c97c587177d72d8288f8f/dola.py#L113C8-L113C56) is redundant, because `scores` has already been passed through log_softmax (i.e., `final_logits = final_logits.log_softmax(dim=-1)`). 2. When you fix...
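A quick check of the redundancy claim above: `log_softmax` is idempotent, because for already-normalized log-probabilities the logsumexp term it subtracts is log(1) = 0. So the second pass is a no-op numerically (redundant but harmless). Toy tensor only, not the repo's code.

```python
import torch

torch.manual_seed(0)
scores = torch.randn(3, 10).log_softmax(dim=-1)  # already log-probabilities
again = scores.log_softmax(dim=-1)               # redundant second pass

# The log-probs already sum (in probability space) to 1, so the
# subtracted logsumexp is ~0 and nothing changes.
print(torch.allclose(scores, again, atol=1e-6))  # True
```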