SEV

Results: 11 issues of SEV

Editing tasks fall into three categories: binary classification, QA, and generation. For batched edits, why was QA chosen? And what is the result under fine-tuning (FT)? Thanks!

Hi, in your paper, must the model edited with the MEND method be fine-tuned first? How can MEND be applied to a model without fine-tuning? When I try to do...

Hi, in your paper, must the model edited with ROME or MEND be fine-tuned first? How can ROME be applied to a model without fine-tuning? For example, the...
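For reference, a minimal sketch of what "no fine-tuning" usually means here: the editor receives the stock pretrained checkpoint directly. The `apply_edit` call and the request fields below are placeholders, not the repo's actual API; check the ROME/MEND code for the exact entry point and signature.

```python
# Sketch (hypothetical helper name `apply_edit`): the model handed to an editing
# method can be the stock pretrained checkpoint; no task fine-tuning beforehand.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-xl"  # assumption: any causal LM checkpoint the editor supports
model = AutoModelForCausalLM.from_pretrained(model_name)
tok = AutoTokenizer.from_pretrained(model_name)

# One edit request in the subject / prompt / new-target style used by the
# ROME/MEND reference code; field names may differ between repo versions.
request = {
    "prompt": "{} plays the sport of",
    "subject": "LeBron James",
    "target_new": {"str": "football"},
}

# `apply_edit` is a stand-in for the repo's entry point; consult the repo docs.
# edited_model, _ = apply_edit(model, tok, [request], hparams)
```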

https://github.com/kssteven418/LTP/blob/f1d5ec88aba913de5e2b4aa502af9cf0ab7bb13f/src/transformers/models/ltp/modeling_ltp.py#L247

```python
if self.training and not self.hard_masking:
    if pruner_outputs is not None:
        threshold, pruning_scores = pruner_outputs['threshold'], pruner_outputs['scores']
        self.mask = torch.sigmoid((pruning_scores - threshold) / self.temperature)
        layer_output = layer_output * self.mask.unsqueeze(-1)
```
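The soft mask above is a sigmoid of (score − threshold) / temperature. A toy sketch of the values it produces (the scores, threshold, and temperature below are made up for illustration):

```python
import torch

# Tokens whose pruning score is well below the threshold get a mask near 0,
# those well above it get a mask near 1; temperature controls the sharpness.
scores = torch.tensor([0.1, 0.5, 0.9])  # hypothetical per-token importance scores
threshold = 0.5
temperature = 0.05
mask = torch.sigmoid((scores - threshold) / temperature)
print(mask)  # approximately [0.0003, 0.5000, 0.9997]
```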

And what GPU memory would I need, at a minimum, to train two 7B models?
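For a rough sense of scale, here is a back-of-envelope estimate under common assumptions (bf16 weights and gradients, Adam with fp32 master weights and moments, activations ignored); the numbers are illustrative, not a hardware recommendation:

```python
# Back-of-envelope memory estimate for full fine-tuning of a 7B-parameter model.
params = 7e9

weights_gb   = params * 2 / 1e9            # bf16 weights              ~14 GB
grads_gb     = params * 2 / 1e9            # bf16 gradients            ~14 GB
optimizer_gb = params * (4 + 4 + 4) / 1e9  # fp32 master + 2 Adam moments ~84 GB

per_model_gb = weights_gb + grads_gb + optimizer_gb  # ~112 GB before activations
print(f"one 7B model: ~{per_model_gb:.0f} GB, two models: ~{2 * per_model_gb:.0f} GB")
# LoRA-style parameter-efficient tuning or ZeRO/FSDP sharding reduces this substantially.
```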

Hi, I found this line in rlattention.py: `from thumt.layers.gumbel import gumbel_softmax`. But in the layers folder there is no gumbel module, is there?
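If the gumbel module really is missing, a standard straight-through Gumbel-Softmax (not necessarily the authors' version) could serve as a drop-in; PyTorch also ships `torch.nn.functional.gumbel_softmax` with the same semantics:

```python
import torch
import torch.nn.functional as F

def gumbel_softmax(logits, tau=1.0, hard=False, dim=-1):
    """Standard (straight-through) Gumbel-Softmax; a stand-in for the missing
    thumt.layers.gumbel helper, not necessarily the original implementation."""
    gumbels = -torch.empty_like(logits).exponential_().log()  # samples ~ Gumbel(0, 1)
    y_soft = F.softmax((logits + gumbels) / tau, dim=dim)
    if hard:
        index = y_soft.argmax(dim, keepdim=True)
        y_hard = torch.zeros_like(logits).scatter_(dim, index, 1.0)
        return y_hard - y_soft.detach() + y_soft  # straight-through estimator
    return y_soft
```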

The Vicuna model generates some unrelated output, so how should I control the max_length in `model.generate(inputs.input_ids.cuda(), max_length=??)`?
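One common way to keep generations short is to bound the number of newly generated tokens rather than the total sequence length. A minimal sketch, assuming a Hugging Face causal LM and tokenizer are already loaded as `model` and `tokenizer` (the prompt is made up):

```python
# Cap generated tokens with max_new_tokens and stop at the EOS token.
prompt = "Tell me about Vicuna."  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=256,                  # limits only the newly generated tokens
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the continuation, dropping the echoed prompt.
text = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(text)
```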

Can simple-knn and diff-gaussian-rasterization be installed with CUDA 12?

```
./RWKU/KnowledgeCircuits-main/KnowledgeCircuits-main/transformer_lens/components.py:625, in AbstractAttention.forward(self, query_input, key_input, value_input, past_kv_cache_entry, additive_attention_mask, attention_mask)
    616     result = self.hook_result(
    617         bnb.matmul_4bit(
    618             z.reshape(z.shape[0], z.shape[1], self.cfg.d_model),
        (...)
    622         )
    623     )
    624 else:
--> 625     result...
```
