LM training data attribution question for individual sentences
Hi - thank you for making this great library! I am trying to use it to identify the training data responsible for differences between minimal-pair sentences of the form:
the keys to the cabinet are on the table
the keys to the cabinet is on the table
where I just want to look at what factors affect "is" vs. "are". This would clearly require changes to the wikitext example, where the eval/dev set was simply grouped into fixed-length chunks rather than one sentence per row. Could I simply pad all my queries to some fixed max length and then proceed as normal, or is there something else I should do?
I tried the padding idea but got some weird matmul dimension errors (I was only trying with 4 examples in my dev set):
the toys on the table are
the toys on the table is
i think the toy on the table is
i think the toy on the table are
and then:
def tokenize_function(examples):
    # Pad every sentence to a fixed length so each row is one query sentence.
    return tokenizer(examples["text"], padding="max_length", max_length=128)

def add_labels(examples):
    examples["labels"] = examples["input_ids"].copy()
    return examples

tokenized_test_dataset = test_dataset.map(
    tokenize_function,
    batched=True,
    num_proc=None,
    remove_columns=test_dataset["test"].column_names,
    load_from_cache_file=True,
    desc="Running tokenizer on dataset",
    batch_size=4,
)
tokenized_test_dataset = tokenized_test_dataset.map(
    add_labels,
    batched=True,
    num_proc=None,
    load_from_cache_file=True,
    batch_size=4,
)
When I then run the pairwise score computation, this is the error I get:
RuntimeError: The size of tensor a (4) must match the size of tensor b (512) at non-singleton dimension 1
Any assistance would be much appreciated - please let me know if I should share more details!
Sorry for the late reply! I am unsure what the issue might be here - if you are still stuck, please feel free to share the full code, and I can take a look at where the dimension mismatch happens.
Thanks for the reply! I am collaborating with @kanishkamisra, and I think the issue lies in summing up the factor tensors per module in compute_dot_products_with_loader when compute_per_module_scores=False.
In the above example, we are trying to compute the influence scores for a simple model (a fine-tuned OPTForCausalLM) with attention layers k_proj, v_proj, q_proj, and out_proj and two other linear layers. The linear layers have different dimensions than the attention layers, i.e.,
(k_proj/v_proj/q_proj/out_proj): Linear(in_features=256, out_features=256, bias=True)
versus
(fc1): Linear(in_features=256, out_features=1024, bias=True)
(fc2): Linear(in_features=1024, out_features=256, bias=True)
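(For reference, these shapes can be double-checked by printing the relevant submodules of the fine-tuned model; the checkpoint path below is just a placeholder.)

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("path/to/finetuned-opt")  # placeholder path
for name, module in model.named_modules():
    # Print only the attention projections and the two MLP linear layers.
    if any(key in name for key in ("k_proj", "v_proj", "q_proj", "out_proj", "fc1", "fc2")):
        print(name, module)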
After fitting all factors, the line
module.get_factor(factor_name=PAIRWISE_SCORE_MATRIX_NAME) in dot_product.py returns different factor shapes for the attention layers ([per_device_query_batch_size, train_batch_size]) versus the linear layers ([per_device_query_batch_size x 100, train_batch_size x 100]).
We are unsure why the linear layers' factor tensor shapes are so different. Inspecting the tensors shows that they are mostly sparse but contain more non-zero elements than the attention factor tensors.
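To illustrate the failure mode we think we are hitting: summing per-module tensors with mismatched shapes like these reproduces the same kind of broadcasting error (toy shapes below, not the actual batch sizes).

import torch

attn_scores = torch.randn(4, 8)                 # attention-layer shape: [query_batch, train_batch]
linear_scores = torch.randn(4 * 100, 8 * 100)   # the much larger linear-layer shape we observed
attn_scores + linear_scores
# RuntimeError: The size of tensor a (8) must match the size of tensor b (800)
# at non-singleton dimension 1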
Any comments or suggestions are greatly appreciated! We can compute pairwise scores with compute_per_module_scores=True, but it would be helpful to know how to sensibly aggregate across modules whose factor tensors have different shapes.
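For context, the naive aggregation we had in mind is just an elementwise sum over modules. The sketch below assumes per_module_scores is a dict mapping module name to a [num_queries, num_train] tensor (however it is obtained from the library), and it only works once every module's tensor shares that shape:

import torch

def aggregate_scores(per_module_scores):
    # Elementwise sum of per-module pairwise scores; requires every module's
    # tensor to have the same [num_queries, num_train] shape.
    shapes = {name: tuple(s.shape) for name, s in per_module_scores.items()}
    assert len(set(shapes.values())) == 1, f"Mismatched per-module shapes: {shapes}"
    return torch.stack(list(per_module_scores.values()), dim=0).sum(dim=0)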