How to update the labels of a dataset loaded from Flair?

Open DonaldFeuz opened this issue 1 year ago • 1 comments

Question

Hi, I am loading the JNLPBA dataset from the Flair library and would like to keep only the protein mentions, renaming them to "Gene". I also want to remove every label other than 'protein' from the dataset so that I can train my gene NER model. However, I can't find a way to do this in the Flair documentation, and all my attempts have failed. Here is an example of the code I wrote.

```python
from flair.data import Sentence


def rename_and_remove_labels(sentence: Sentence):
    new_labels = []

    for label in sentence.get_labels():
        if label.value == 'protein':
            # Add a new 'Gene' label for each 'protein' label
            new_labels.append((label.data_point.start_position, label.data_point.end_position, 'Gene'))

    sentence.remove_labels([label.value for label in sentence.get_labels()])

    for start_pos, end_pos, new_label in new_labels:
        span = sentence[start_pos:end_pos]
        span.add_label(new_label)

    return sentence


sentence = Sentence("IL-2 gene expression and NF-kappa B activation through CD28 requires reactive oxygen production by 5-lipoxygenase.")
sentence[0:2].add_label('ner', 'DNA')
sentence[4:6].add_label('ner', 'protein')
sentence[8:9].add_label('ner', 'protein')
sentence[14:15].add_label('ner', 'protein')

print("Before:")
for label in sentence.get_labels():
    print(label)
print(sentence)

sentence = rename_and_remove_labels(sentence)

print("\nAfter:")
for label in sentence.get_labels():
    print(label)
```

DonaldFeuz avatar Jun 15 '24 17:06 DonaldFeuz

Hello @DonaldFeuz what Flair version are you on? How are you loading the JNLPBA dataset?

alanakbik avatar Jun 21 '24 18:06 alanakbik

Closing, due to lack of activity.

alanakbik avatar Mar 11 '25 04:03 alanakbik
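
The thread ends without an answer, so here is a minimal sketch of one way to get the relabeling asked for in the question, independent of the Flair version. It assumes the annotations are available as per-token BIO (or BIOES) tags in a CoNLL-style column file with one "token tag" pair per line and a blank line between sentences; that layout is an assumption, since the thread never says how the dataset was loaded. The names `remap_tag` and `remap_column_lines` are hypothetical helpers, not part of Flair.

```python
def remap_tag(tag, keep="protein", rename="Gene"):
    """Rename tags of the kept entity type; map every other tag to 'O'."""
    # Split "B-protein" into ("B", "-", "protein"); "O" has no separator.
    prefix, sep, entity = tag.partition("-")
    if sep and entity == keep:
        return f"{prefix}-{rename}"
    return "O"


def remap_column_lines(lines, tag_column=1, keep="protein", rename="Gene"):
    """Apply remap_tag to the tag column of whitespace-separated lines.

    Blank lines (sentence boundaries) are preserved as empty strings.
    Note: columns are re-joined with single spaces (an assumption about
    the desired output format).
    """
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            out.append("")
            continue
        fields = stripped.split()
        fields[tag_column] = remap_tag(fields[tag_column], keep, rename)
        out.append(" ".join(fields))
    return out


# Small illustrative input, hand-written from the sentence in the question.
example = [
    "IL-2 B-DNA",
    "gene I-DNA",
    "expression O",
    "and O",
    "NF-kappa B-protein",
    "B I-protein",
    "",
    "CD28 B-protein",
]

for line in remap_column_lines(example):
    print(line)
```

Rewriting the tags before loading avoids mutating `Sentence` objects at all. The resulting file could then be read with a column-based loader such as Flair's `ColumnCorpus`, though whether that fits depends on how the corpus was obtained in the first place.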