LM training data attribution question for individual sentences
Hi - thank you for making this great library! I am trying to use it to identify the training data responsible for differences between minimal-pair sentences of the form:
the keys to the cabinet are on the table
the keys to the cabinet is on the table
where I just want to look at what factors affect "is" vs. "are". This would clearly require changes to the wikitext example, where the eval/dev set was simply grouped into fixed-length chunks rather than one sentence per row. Could I simply pad all my queries to some fixed max length and then proceed as normal, or is there something else I should do?
I tried the padding idea but got some weird matmul dimension errors (I was only trying with 4 examples in my dev set):
the toys on the table are
the toys on the table is
i think the toy on the table is
i think the toy on the table are
and then:
def tokenize_function(examples):
    # Pad every sentence to a fixed length so each row is one query sentence.
    return tokenizer(examples["text"], padding="max_length", max_length=128)

def add_labels(examples):
    examples["labels"] = examples["input_ids"].copy()
    return examples

tokenized_test_dataset = test_dataset.map(
    tokenize_function,
    batched=True,
    num_proc=None,
    remove_columns=test_dataset["test"].column_names,
    load_from_cache_file=True,
    desc="Running tokenizer on dataset",
    batch_size=4,
)
tokenized_test_dataset = tokenized_test_dataset.map(
    add_labels,
    batched=True,
    num_proc=None,
    load_from_cache_file=True,
    batch_size=4,
)
When I then run the pairwise score computation, this is the error I get:
RuntimeError: The size of tensor a (4) must match the size of tensor b (512) at non-singleton dimension 1
Any assistance would be much appreciated - please let me know if I should share more details!
Sorry for the late reply! I am unsure what the issue might be here - if you are still stuck, please feel free to share the full code, and I can take a look at where the dimension mismatch happens.
Thanks for the reply! I am collaborating with @kanishkamisra, and I think the issue lies in summing up the factor tensors per module in compute_dot_products_with_loader when compute_per_module_scores=False.
In the above example, we are trying to compute the influence scores for a simple model (a fine-tuned OPTForCausalLM) with attention layers k_proj, v_proj, q_proj, and out_proj and two other linear layers. The linear layers have different dimensions than the attention layers, i.e.,
(k_proj/v_proj/q_proj/out_proj): Linear(in_features=256, out_features=256, bias=True)
versus
(fc1): Linear(in_features=256, out_features=1024, bias=True)
(fc2): Linear(in_features=1024, out_features=256, bias=True)
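(For reference, these shapes can be double-checked by printing the relevant submodules of the fine-tuned model; the checkpoint path below is just a placeholder.)

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("path/to/finetuned-opt")  # placeholder path
for name, module in model.named_modules():
    # Print only the attention projections and the two MLP linear layers.
    if any(key in name for key in ("k_proj", "v_proj", "q_proj", "out_proj", "fc1", "fc2")):
        print(name, module)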
After fitting all factors, the line
module.get_factor(factor_name=PAIRWISE_SCORE_MATRIX_NAME) in dot_product.py returns different factor shapes for the attention layers ([per_device_query_batch_size, train_batch_size]) versus the linear layers ([per_device_query_batch_size x 100, train_batch_size x 100]).
We are unsure why the linear layers' factor tensor shapes are so different. Inspecting the tensors shows that they are mostly sparse but contain more non-zero elements than the attention factor tensors.
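To illustrate the failure mode we think we are hitting: summing per-module tensors with mismatched shapes like these reproduces the same kind of broadcasting error (toy shapes below, not the actual batch sizes).

import torch

attn_scores = torch.randn(4, 8)                 # attention-layer shape: [query_batch, train_batch]
linear_scores = torch.randn(4 * 100, 8 * 100)   # the much larger linear-layer shape we observed
attn_scores + linear_scores
# RuntimeError: The size of tensor a (8) must match the size of tensor b (800)
# at non-singleton dimension 1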
Any comments or suggestions are greatly appreciated! We can compute pairwise scores with compute_per_module_scores=True, but it would be helpful to know how to sensibly aggregate across modules whose factor tensors have different shapes.
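For context, the naive aggregation we had in mind is just an elementwise sum over modules. The sketch below assumes per_module_scores is a dict mapping module name to a [num_queries, num_train] tensor (however it is obtained from the library), and it only works once every module's tensor shares that shape:

import torch

def aggregate_scores(per_module_scores):
    # Elementwise sum of per-module pairwise scores; requires every module's
    # tensor to have the same [num_queries, num_train] shape.
    shapes = {name: tuple(s.shape) for name, s in per_module_scores.items()}
    assert len(set(shapes.values())) == 1, f"Mismatched per-module shapes: {shapes}"
    return torch.stack(list(per_module_scores.values()), dim=0).sum(dim=0)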