Entity-Recognition-In-Resumes-SpaCy

ValueError: [E103]

Open Huzmorgoth opened this issue 5 years ago • 21 comments

I get the error below while training, even though I am using the same code.

ValueError: [E103] Trying to set conflicting doc.ents: '(6861, 6870, 'Companies worked at')' and '(6305, 7258, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Huzmorgoth avatar Nov 10 '19 21:11 Huzmorgoth
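Before changing anything, it can help to list which annotations actually collide. The following is a minimal sketch, not part of the repository: it assumes each entity is a `(start, end, label)` tuple with an exclusive end offset, and `find_overlaps` is a hypothetical helper name. It only checks character overlap, so it may miss conflicts that arise purely at the token level.

```python
from itertools import combinations

def find_overlaps(entities):
    """Return pairs of (start, end, label) spans that share at least one character.

    Assumes end offsets are exclusive.
    """
    conflicts = []
    ordered = sorted(entities, key=lambda e: (e[0], e[1]))
    for a, b in combinations(ordered, 2):
        # Two half-open intervals overlap when each starts before the other ends
        if a[0] < b[1] and b[0] < a[1]:
            conflicts.append((tuple(a), tuple(b)))
    return conflicts

# The first two spans come from the error message above;
# the third is a made-up, non-conflicting span for illustration.
entities = [
    (6861, 6870, 'Companies worked at'),
    (6305, 7258, 'Skills'),
    (100, 120, 'Name'),
]
for a, b in find_overlaps(entities):
    print(a, 'overlaps', b)
```

Running this over every `(text, annotations)` pair in the training data should show how many records need attention before training starts.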

@Huzmorgoth paste this code:

import re

def trim_entity_spans(data: list) -> list:
    """Trim leading and trailing whitespace from entity spans."""
    invalid_span_tokens = re.compile(r'\s')
    cleaned_data = []
    for text, annotations in data:
        entities = annotations['entities']
        valid_entities = []
        for start, end, label in entities:
            valid_start = start
            valid_end = end
            # Move the start forward past any leading whitespace
            while valid_start < len(text) and invalid_span_tokens.match(
                    text[valid_start]):
                valid_start += 1
            # Move the end backward past any trailing whitespace
            while valid_end > 1 and invalid_span_tokens.match(
                    text[valid_end - 1]):
                valid_end -= 1
            valid_entities.append([valid_start, valid_end, label])
        cleaned_data.append([text, {'entities': valid_entities}])
    return cleaned_data

Abhimanyu100 avatar Nov 11 '19 09:11 Abhimanyu100

@Abhimanyu100 Hi, I tried it, but it's not working; the same issue occurs.


Statring iteration 0
Traceback (most recent call last):

File "", line 1, in . . . _format_docs_and_golds gold = GoldParse(doc, **gold)

File "gold.pyx", line 715, in spacy.gold.GoldParse.__init__

File "gold.pyx", line 925, in spacy.gold.biluo_tags_from_offsets

ValueError: [E103] Trying to set conflicting doc.ents: '(3385, 3391, 'Companies worked at')' and '(3345, 3896, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Huzmorgoth avatar Nov 11 '19 12:11 Huzmorgoth

I also have this error:

ValueError: [E103] Trying to set conflicting doc.ents: '(370, 392, 'Designation')' and '(370, 391, 'Designation')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Nisit007 avatar Nov 19 '19 12:11 Nisit007

[Edit] Which spaCy version are you using? I was able to resolve this issue.

Abhimanyu100 avatar Nov 19 '19 12:11 Abhimanyu100

Python 3

Huzmorgoth avatar Nov 19 '19 12:11 Huzmorgoth

I'm sorry. I was asking for the spaCy version.

Abhimanyu100 avatar Nov 19 '19 13:11 Abhimanyu100

Oh damn, it's 2.2.2

Huzmorgoth avatar Nov 19 '19 13:11 Huzmorgoth
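For anyone unsure which spaCy version is installed, here is a small sketch that reads it from the package metadata. It assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print('spaCy version:', installed_version('spacy'))
```

This reports the version of the package in the current environment, which is what matters when comparing results across the versions mentioned in this thread.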

Use spaCy version 2.1.4. I was able to get results with this version. Let me know if this works for you.

Abhimanyu100 avatar Nov 21 '19 11:11 Abhimanyu100

I am using spacy 2.2.3. In the older version of spacy, there was a bug which messed up the model after loading from disk. So, I had to update spacy and when I updated, I came across this issue. Sadly, I couldn't find a workaround and had to manually remove all conflicting entities. I have both testdata.json and traindata.json with cleaned data which will not raise this error. But I am not able to attach json format here.

sayalraza avatar Dec 10 '19 09:12 sayalraza

> I am using spacy 2.2.3. In the older version of spacy, there was a bug which messed up the model after loading from disk. So, I had to update spacy and when I updated, I came across this issue. Sadly, I couldn't find a workaround and had to manually remove all conflicting entities. I have both testdata.json and traindata.json with cleaned data which will not raise this error. But I am not able to attach json format here.

Hey, could you post it in your own Git repository and share the file?

vverman avatar Mar 02 '20 06:03 vverman

I got the same error as well:

ValueError: [E103] Trying to set conflicting doc.ents: '(6861, 6870, 'Companies worked at')' and '(6305, 7258, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

It would be very helpful if someone could help out.

Srijha09 avatar Aug 17 '20 05:08 Srijha09

> I am using spacy 2.2.3. In the older version of spacy, there was a bug which messed up the model after loading from disk. So, I had to update spacy and when I updated, I came across this issue. Sadly, I couldn't find a workaround and had to manually remove all conflicting entities. I have both testdata.json and traindata.json with cleaned data which will not raise this error. But I am not able to attach json format here.

Hi, could you share testdata.json and traindata.json? Thank you.

JasonLing95 avatar Sep 07 '20 04:09 JasonLing95

I am encountering the same problem:

ValueError: [E103] Trying to set conflicting doc.ents: '(1155, 1199, 'Email Address')' and '(1143, 1240, 'Links')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Did you guys figure out a way to resolve it?

B-Yassine avatar Dec 02 '20 09:12 B-Yassine

@sayalraza Can you share the cleaned dataset you mentioned?

aditya-malte avatar Feb 05 '21 13:02 aditya-malte

> I am using spacy 2.2.3. In the older version of spacy, there was a bug which messed up the model after loading from disk. So, I had to update spacy and when I updated, I came across this issue. Sadly, I couldn't find a workaround and had to manually remove all conflicting entities. I have both testdata.json and traindata.json with cleaned data which will not raise this error. But I am not able to attach json format here.

@sayalraza Hey, can you please share the clean dataset? Thanks in advance!

udara-kw avatar Jul 20 '21 16:07 udara-kw

Try installing this version:

pip install spacy==2.0.18

harshgeek4coder avatar Aug 02 '21 05:08 harshgeek4coder

> try installing this version :
>
> pip install spacy==2.0.18

@harshgeek4coder Were you able to solve it?

siddharth271101 avatar Aug 15 '21 09:08 siddharth271101

spaCy v3 gives a new error, so try pip install spacy==2.2.4 (the version preinstalled on Colab as of Feb '22).

gamingflexer avatar Feb 06 '22 22:02 gamingflexer

I'm also getting the same error while training:

[E103] Trying to set conflicting doc.ents: '(402, 818, 'Skills')' and '(817, 1118, 'worked at')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Could anyone please help me run the code? I'm not very familiar with machine learning.

Seemz246 avatar Aug 19 '22 03:08 Seemz246

I'm using spaCy version 2.3.5 with Python version 3.9.10.

Seemz246 avatar Aug 19 '22 03:08 Seemz246

I have found this code that fixes the overlapping issue.

def clean_entities(training_data):
  """Drop overlapping entities, keeping the longer span of each overlapping pair."""
  clean_data = []
  for text, annotation in training_data:
    entities = annotation.get('entities', [])
    kept = []
    for i, entity in enumerate(entities):
      e_start, e_end = entity[0], entity[1]
      e_len = e_end - e_start
      drop = False
      for j, overlapping_entity in enumerate(entities):
        # Skip self
        if i == j:
          continue
        oe_start, oe_end = overlapping_entity[0], overlapping_entity[1]
        oe_len = oe_end - oe_start
        overlaps = (oe_start <= e_start <= oe_end) or (oe_start <= e_end <= oe_end)
        # Drop the shorter span; on a tie in length, drop the later one
        # so that exactly one of the two survives
        if overlaps and (e_len < oe_len or (e_len == oe_len and j < i)):
          drop = True
          break
      if not drop:
        kept.append(entity)
    # Build a new list instead of removing items while iterating
    clean_data.append((text, {'entities': kept}))

  return clean_data

BillelBenoudjit avatar Jan 17 '23 08:01 BillelBenoudjit
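An alternative way to resolve overlaps is a greedy longest-first filter, similar in spirit to what `spacy.util.filter_spans` does for `Span` objects. The sketch below is a pure-Python illustration rather than the repository's method: it assumes `(start, end, label)` tuples with exclusive end offsets, and `filter_overlapping` is a hypothetical name. The sample list combines offsets from two different error messages in this thread purely for demonstration.

```python
def filter_overlapping(entities):
    """Greedily keep the longest spans, dropping any span that overlaps one already kept.

    Assumes (start, end, label) entries with exclusive end offsets.
    """
    # Longest first; ties go to the span that starts earlier
    ordered = sorted(entities, key=lambda e: (-(e[1] - e[0]), e[0]))
    kept = []
    for start, end, label in ordered:
        # Keep the span only if it lies entirely before or after every kept span
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)

# Offsets borrowed from error messages in this thread, merged into one
# hypothetical annotation list for illustration.
sample = [
    (402, 818, 'Skills'),
    (817, 1118, 'worked at'),
    (370, 392, 'Designation'),
    (370, 391, 'Designation'),
]
print(filter_overlapping(sample))
```

Keeping the longest span is only one possible policy. Depending on the data, it may be preferable to keep the more specific label instead, so it is worth inspecting what gets dropped before training.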