spacy-llm
Inconsistent output on Dolly NER
Here is a block of example.yml:
```yaml
- text: Jack and Jill went up the hill.
  spans:
    - text: Jack
      is_entity: true
      label: PERSON
      reason: is the name of a person
    - text: Jill
      is_entity: true
      label: PERSON
      reason: is the name of a person
    - text: went up
      is_entity: false
      label: ==NONE==
      reason: is a verb
    - text: hill
      is_entity: true
      label: LOCATION
      reason: is a location
```
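To rule out problems in the few-shot file itself, each example can be checked mechanically. The sketch below is hypothetical (`check_example` is an illustrative name, not part of spacy-llm). It mirrors the `example.yml` content as Python dicts so it runs without a YAML parser, and assumes that parsing the file would yield this same structure. It checks that every span occurs in the sentence and that `is_entity` agrees with the `==NONE==` label.

```python
from typing import Any

# The example.yml content above, mirrored as plain Python data
# (assumption: loading the YAML with a parser gives this same structure).
EXAMPLES: list[dict[str, Any]] = [
    {
        "text": "Jack and Jill went up the hill.",
        "spans": [
            {"text": "Jack", "is_entity": True, "label": "PERSON", "reason": "is the name of a person"},
            {"text": "Jill", "is_entity": True, "label": "PERSON", "reason": "is the name of a person"},
            {"text": "went up", "is_entity": False, "label": "==NONE==", "reason": "is a verb"},
            {"text": "hill", "is_entity": True, "label": "LOCATION", "reason": "is a location"},
        ],
    }
]


def check_example(example: dict[str, Any]) -> list[str]:
    """Return the problems found in one few-shot example (hypothetical helper)."""
    problems = []
    for span in example["spans"]:
        # Every span should be a substring of the example sentence.
        if span["text"] not in example["text"]:
            problems.append(f"span {span['text']!r} not found in text")
        # Non-entities are labelled ==NONE==; entities must carry a real label.
        if span["is_entity"] == (span["label"] == "==NONE=="):
            problems.append(f"span {span['text']!r}: is_entity disagrees with label")
    return problems


for ex in EXAMPLES:
    print(ex["text"], "->", check_example(ex) or "OK")
```

For this file every check passes, so the example data itself is unlikely to explain why the output changes between runs.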
Block of fewshot.cfg:
```ini
[paths]

[nlp]
lang = "en"
pipeline = ["llm"]
batch_size = 128

[components]

[components.llm]
factory = "llm"

[components.llm.model]
@llm_models = "spacy.Dolly.v1"
name = "dolly-v2-3b"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = PERSON,ORGANISATION,LOCATION

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "example.yml"

[components.llm.task.normalizer]
@misc = "spacy.LowercaseNormalizer.v1"
```
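The `labels` value above is a comma-separated string. As a hedged sanity check, the sketch below reads the labels from an abridged copy of the config and confirms that every label used in `example.yml` is among them. It uses the standard-library `configparser`, which is not how spaCy itself loads configs but is enough for this simple section; only the task section is included.

```python
import configparser

# Abridged copy of fewshot.cfg; only the task section matters for this check.
CONFIG_TEXT = """
[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = PERSON,ORGANISATION,LOCATION
"""

# Labels used by the spans in example.yml (==NONE== marks non-entities).
EXAMPLE_LABELS = {"PERSON", "LOCATION", "==NONE=="}

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)

raw = parser["components.llm.task"]["labels"]
configured = {label.strip() for label in raw.split(",")}

# ==NONE== is not a real label, so drop it before comparing.
unknown = (EXAMPLE_LABELS - {"==NONE=="}) - configured
print("configured:", sorted(configured))
print("unknown labels in examples:", sorted(unknown) or "none")
```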
Block of pipeline run:
```python
from spacy_llm.util import assemble

nlp = assemble("fewshot.cfg")
doc = nlp("Jack and Jill went up the hill.")
print(f"Text: {doc.text}")
print(doc.ents)
print(f"Entities: {[(ent.text, ent.label_) for ent in doc.ents]}")
```
The output is inconsistent between runs. How do I resolve this?
```
$ python spacyllmtry.py
Text: Jack and Jill went up the hill.
(Jack, Jill, hill)
Entities: [('Jack', 'PERSON'), ('Jill', 'PERSON'), ('hill', 'LOCATION')]
$ python spacyllmtry.py
Text: Jack and Jill went up the hill.
()
Entities: []
```
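One way to make the inconsistency concrete is to run the same text several times and count how many distinct entity lists come back. The sketch below only does the bookkeeping: `tally_runs` is a hypothetical helper, and the two runs are hard-coded from the output above. With the real pipeline, each run would be `[(ent.text, ent.label_) for ent in nlp(text).ents]`.

```python
from collections import Counter


def tally_runs(runs: list[list[tuple[str, str]]]) -> Counter:
    """Count how often each distinct entity list appears across runs (hypothetical helper)."""
    # Lists are unhashable, so convert each run to a tuple before counting.
    return Counter(tuple(run) for run in runs)


# The two runs reported above for "Jack and Jill went up the hill."
runs = [
    [("Jack", "PERSON"), ("Jill", "PERSON"), ("hill", "LOCATION")],
    [],
]

counts = tally_runs(runs)
for entities, n in counts.items():
    print(n, "x", list(entities))
print("consistent" if len(counts) == 1 else f"{len(counts)} distinct outputs")
```

Running more than two iterations gives a rough picture of how often the model returns no entities at all.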
Hi @nxitik, it seems that Dolly doesn't return the correct output. You can debug this further by setting `save_io = True` in your config:
```ini
[components.llm]
factory = "llm"
save_io = True
```
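With `save_io` enabled, the raw prompt and response are stored on the `Doc`. The sketch below assumes the layout `doc.user_data["llm_io"][<component name>]` with `"prompt"` and `"response"` keys; verify this against your installed spacy-llm version. `raw_io` is a hypothetical helper, and the stand-in dict only mimics that assumed layout so the snippet runs without loading a model. With the real pipeline you would pass `doc.user_data` from `doc = nlp(...)`.

```python
from typing import Any


def raw_io(user_data: dict[str, Any], component: str = "llm") -> tuple[str, str]:
    """Return (prompt, response) saved by save_io for one component (hypothetical helper).

    Assumes the layout user_data["llm_io"][component] = {"prompt": ..., "response": ...}.
    """
    saved = user_data.get("llm_io", {}).get(component)
    if saved is None:
        raise KeyError(f"no saved I/O for component {component!r}; is save_io enabled?")
    # str() also covers versions that might store the prompt as a list.
    return str(saved["prompt"]), str(saved["response"])


# Stand-in for doc.user_data; with a real run, use the Doc returned by nlp(...).
fake_user_data = {
    "llm_io": {
        "llm": {"prompt": "<prompt text>", "response": "<model output>"},
    }
}

prompt, response = raw_io(fake_user_data)
print("PROMPT:\n", prompt)
print("RESPONSE:\n", response)
```

Printing the response for a run that returns no entities shows what Dolly actually generated for that prompt.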
I recommend a larger and newer model than `dolly-v2-3b`; smaller models often struggle with more complex tasks like this one.
Does it support quantized models, for example those from TheBloke?
```ini
save_io = True
```
It still doesn't work for me. I agree that it's better to use larger models.