How to use PolyFuzz with T5 model?

Open • bhishanpdl opened this issue 1 year ago • 1 comment

I am trying to use PolyFuzz with T5 embeddings, but I get an error when running the code below. My environment:

polyfuzz: 0.4.0
transformers: 4.26.1
torch: 1.13.1+cu117
tensorflow: 2.11.0
tensorflow_hub: 0.12.0

MWE

import torch
import numpy as np
import polyfuzz
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the T5 model and tokenizer
model_name = 't5-small'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

# Define your target and candidate strings
target_strings = ['The quick brown fox jumps over the lazy dog', 'The sky is blue']
candidate_strings = ['The fox is quick and the dog is lazy', 'The ocean is blue']

# Tokenize the strings and convert them to T5 embeddings
target_tokens = tokenizer.batch_encode_plus(target_strings, padding=True, truncation=True, return_tensors='pt')
candidate_tokens = tokenizer.batch_encode_plus(candidate_strings, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    target_embeddings = model.encoder(input_ids=target_tokens['input_ids']).last_hidden_state.detach().numpy()
    candidate_embeddings = model.encoder(input_ids=candidate_tokens['input_ids']).last_hidden_state.detach().numpy()

# Create a PolyFuzz object with default settings
model = polyfuzz.PolyFuzz()

# Fit the model with the T5 embeddings
model.fit(target_embeddings, candidate_embeddings)

# Get the matches between the target and candidate strings
matches = model.get_matches()
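
One likely source of trouble in the snippet above is the shape of the embeddings: `last_hidden_state` holds one vector per token, so each array has shape (batch, sequence length, hidden size) rather than one vector per string. PolyFuzz's matching methods, as far as I understand them, also expect lists of strings rather than arrays. The following is a minimal sketch, not PolyFuzz's own API, of how per-token states could be pooled into one vector per string and compared with cosine similarity. The helper names `mean_pool` and `cosine_similarity_matrix` are hypothetical, and random arrays stand in for the real T5 encoder output and the tokenizer's `attention_mask`.

```python
import numpy as np


def mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    hidden_states:  (batch, seq_len, hidden) array, e.g. last_hidden_state
    attention_mask: (batch, seq_len) array with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


# Stand-in data (assumption): random values in place of real encoder output.
rng = np.random.default_rng(0)
target_hidden = rng.normal(size=(2, 5, 8))
target_mask = np.array([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])
candidate_hidden = rng.normal(size=(2, 4, 8))
candidate_mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])

# One fixed-size vector per string.
target_vectors = mean_pool(target_hidden, target_mask)
candidate_vectors = mean_pool(candidate_hidden, candidate_mask)

# Rows are targets, columns are candidates.
similarity = cosine_similarity_matrix(target_vectors, candidate_vectors)
best_candidate = similarity.argmax(axis=1)
```

With the real model, `hidden_states` would be the encoder output converted to NumPy and `attention_mask` would come from the tokenizer output. Passing `attention_mask` to the encoder as well keeps padding tokens from influencing the hidden states.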

bhishanpdl • Feb 16 '23 21:02