FileNotFoundError: [Errno 2] No such file or directory: 'local-ner-cache/9963044417883968883.spacy'
I changed spacy.NER.v2 to spacy.NER.v3 and got the following error:
ValueError: Prompt template in cache directory (local-ner-cache/prompt_template.txt) is not equal with current prompt template. Reset your cache if you are using a new prompt template.
After deleting the folder local-ner-cache, I encountered the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'local-ner-cache/9963044417883968883.spacy'
What is the right way to "Reset your cache if you are using a new prompt template"?
After deleting the local-ner-cache folder, I'm no longer able to annotate the same dataset with:
dotenv run -- prodigy ner.llm.correct
There are still around 1k samples left to annotate.
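For clarity, this is everything I did to "reset" the cache — my assumption being that removing the directory is enough, since spacy-llm recreates it on the next run:

```shell
# Assumption: a cache "reset" just means removing the directory;
# spacy-llm should recreate it (and write a fresh prompt_template.txt)
# on the next annotation run.
rm -rf local-ner-cache
```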
Hi @nikolaysm! Thanks for reporting this. We can't tell right away why this would happen, but we'll look into it.
Can you provide your spacy-llm config?
Hi @rmitsch,
spacy-llm-config.cfg:
[paths]
examples = "./assets/examples.json"
template = "./assets/prompt_template.txt/"

[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"
save_io = true

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["ORG", "PERSON"]
description = "Entities are the names of companies, associated brands, and people. Adjectives, verbs, and adverbs are not entities. Pronouns are not entities."

[components.llm.task.template]
@misc = "spacy.FileReader.v1"
path = "${paths.template}"

[components.llm.task.label_definitions]
ORG = "Extract the names of the companies and associated brands, e.g. ...."
PERSON = "Extract the people's names, e.g. ...."

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "${paths.examples}"

[components.llm.model]
# Also tried "spacy.GPT-3-5.v3", "spacy.GPT-3-5.v1", "spacy.GPT-4.v1", "spacy.GPT-4.v3"
@llm_models = "spacy.GPT-4.v1"
# Also tried "gpt-3.5-turbo", "gpt-4"
name = "gpt-4"
config = {"temperature": 0.3}

[components.llm.cache]
@llm_misc = "spacy.BatchCache.v1"
path = "local-ner-cache"
batch_size = 4
max_batches_in_mem = 10
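For reference, my understanding of the cache layout under this config (an assumption based on the BatchCache settings above, not confirmed behavior): one hashed .spacy file such as 9963044417883968883.spacy is written per batch, so with batch_size = 4 the ~370 annotated docs would correspond to roughly:

```python
import math

batch_size = 4        # from [components.llm.cache] above
annotated_docs = 370  # samples annotated so far

# One cache file per batch of documents.
print(math.ceil(annotated_docs / batch_size))  # → 93 batch files
```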
Also, I see that the number of API requests is 567, which is much higher than the total number of samples I have annotated so far (370).
I'm not sure, but the issue may be related to a timeout during the API call to OpenAI.
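To illustrate the timeout hypothesis (a hypothetical sketch, not how spacy-llm is actually implemented): if the client retries a timed-out call, every retry is billed as a separate API request, so the request count can exceed the document count.

```python
def call_with_retries(send, max_retries=3):
    """Call send(), retrying on TimeoutError; return (result, requests_made)."""
    for attempt in range(1, max_retries + 1):
        try:
            return send(), attempt
        except TimeoutError:
            if attempt == max_retries:
                raise

# Simulate a request that times out twice before succeeding:
# one document ends up costing three billed API requests.
calls = []
def flaky_send():
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError
    return "response"

result, requests_made = call_with_retries(flaky_send)
print(result, requests_made)  # → response 3
```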