FileNotFoundError: [Errno 2] No such file or directory: 'local-ner-cache/9963044417883968883.spacy'
I changed spacy.NER.v2 to spacy.NER.v3 and got the following error:
ValueError: Prompt template in cache directory (local-ner-cache/prompt_template.txt) is not equal with current prompt template. Reset your cache if you are using a new prompt template.
After deleting the folder local-ner-cache, I encountered the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'local-ner-cache/9963044417883968883.spacy'
What is the right way to "Reset your cache if you are using a new prompt template"?
After deleting the local-ner-cache folder, I'm no longer able to annotate the same dataset with:
dotenv run -- prodigy ner.llm.correct
There are still around 1k samples left to annotate.
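For clarity, this is everything I did to "reset" the cache — my assumption being that removing the directory is enough, since spacy-llm recreates it on the next run:

```shell
# Assumption: a cache "reset" just means removing the directory;
# spacy-llm should recreate it (and write a fresh prompt_template.txt)
# on the next annotation run.
rm -rf local-ner-cache
```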
Hi @nikolaysm! Thanks for reporting this. We can't tell right away why this would happen, but we'll look into it.
Can you provide your spacy-llm config?
Hi @rmitsch,
spacy-llm-config.cfg:
[paths]
examples = "./assets/examples.json"
template = "./assets/prompt_template.txt/"

[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"
save_io = true

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["ORG", "PERSON"]
description = "Entities are the names of companies, associated brands, and people. Adjectives, verbs, and adverbs are not entities. Pronouns are not entities."

[components.llm.task.template]
@misc = "spacy.FileReader.v1"
path = "${paths.template}"

[components.llm.task.label_definitions]
ORG = "Extract the names of the companies and associated brands, e.g. ...."
PERSON = "Extract the people's names, e.g. ...."

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "${paths.examples}"

[components.llm.model]
# Also tried "spacy.GPT-3-5.v3", "spacy.GPT-3-5.v1", "spacy.GPT-4.v1", "spacy.GPT-4.v3"
@llm_models = "spacy.GPT-4.v1"
# Also tried "gpt-3.5-turbo", "gpt-4"
name = "gpt-4"
config = {"temperature": 0.3}

[components.llm.cache]
@llm_misc = "spacy.BatchCache.v1"
path = "local-ner-cache"
batch_size = 4
max_batches_in_mem = 10
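For reference, my understanding of the cache layout under this config (an assumption based on the BatchCache settings above, not confirmed behavior): one hashed .spacy file such as 9963044417883968883.spacy is written per batch, so with batch_size = 4 the ~370 annotated docs would correspond to roughly:

```python
import math

batch_size = 4        # from [components.llm.cache] above
annotated_docs = 370  # samples annotated so far

# One cache file per batch of documents.
print(math.ceil(annotated_docs / batch_size))  # → 93 batch files
```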
Also, I see that the number of API requests is 567, which is much higher than the total number of samples I have annotated so far (370).
I'm not sure, but the issue may be related to a timeout during the API call to OpenAI.
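To illustrate the timeout hypothesis (a hypothetical sketch, not how spacy-llm is actually implemented): if the client retries a timed-out call, every retry is billed as a separate API request, so the request count can exceed the document count.

```python
def call_with_retries(send, max_retries=3):
    """Call send(), retrying on TimeoutError; return (result, requests_made)."""
    for attempt in range(1, max_retries + 1):
        try:
            return send(), attempt
        except TimeoutError:
            if attempt == max_retries:
                raise

# Simulate a request that times out twice before succeeding:
# one document ends up costing three billed API requests.
calls = []
def flaky_send():
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError
    return "response"

result, requests_made = call_with_retries(flaky_send)
print(result, requests_made)  # → response 3
```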