Would it make sense to reduce the complexity of the main entity extraction prompt, to potentially improve extraction quality for both larger and smaller models?
Do you need to ask a question?
- [ ] I have searched the existing questions and discussions, and this question has not already been answered.
- [x] I believe this is a legitimate question, not just a bug or feature request.
Your Question
The main entity extraction prompt currently looks at a chunk and extracts entities with descriptions and scores, as well as relationships between those entities, also with weights etc.
It then has to formulate everything in the right format for the extraction process. And if the chunk's language doesn't match the language of the prompt, the model may start listing entities like `{Entity: Hombres, desc: Spanish word for men}` instead of describing what the word means in context.
For the language problem there are two approaches: either write the prompt instructions in exactly the same language as the incoming file, or add a translation step, though translation risks the model losing information.
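The first approach could be sketched roughly like this. Everything here is hypothetical: the `PROMPTS` templates and the trivial `detect_language` stub are illustrations only (a real system would use a proper detector such as the `langdetect` package):

```python
# Sketch of the "prompt in the document's language" idea.
# PROMPTS and detect_language are illustrative stand-ins, not real project code.
PROMPTS = {
    "en": "Extract entities and relationships from the text below:\n",
    "es": "Extrae las entidades y relaciones del texto siguiente:\n",
}

def detect_language(chunk: str) -> str:
    # Toy stub for self-containment; a real pipeline would call an
    # actual language detector (e.g. langdetect.detect(chunk)).
    return "es" if any(w in chunk.lower() for w in ("el ", "los ", "de ")) else "en"

def build_prompt(chunk: str) -> str:
    # Pick the instruction template that matches the chunk's language,
    # falling back to English when the language is not covered.
    lang = detect_language(chunk)
    return PROMPTS.get(lang, PROMPTS["en"]) + chunk
```

The upside is that the model never has to reason across languages, so it describes entities in context instead of translating them.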
Then, to reduce complexity, it might be better to first have a model extract entities and relationships in plain text, with weights and everything else that is needed, and only in the final call have a model reformat that output according to the delimiters.
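A minimal sketch of that two-pass split, assuming an abstract `llm(prompt) -> str` callable and made-up delimiter constants (the real delimiters and prompt wording would come from the project's own prompt templates):

```python
from typing import Callable

# Assumed delimiters for illustration; substitute the project's actual ones.
TUPLE_DELIM = "<|>"
RECORD_DELIM = "##"

def two_pass_extract(chunk: str, llm: Callable[[str], str]) -> str:
    # Pass 1: the cognitively hard part, with no format constraints.
    # The model only has to find entities/relations and describe them.
    notes = llm(
        "List every entity in the text with a short description, and every "
        "relationship between entities with a strength from 1 to 10. "
        "Answer in plain sentences, in the same language as the text.\n\n"
        + chunk
    )
    # Pass 2: pure reformatting, no reasoning about the source text needed.
    return llm(
        "Rewrite these extraction notes as structured records. Join the fields "
        f"of each record with '{TUPLE_DELIM}' and the records with "
        f"'{RECORD_DELIM}'. Output nothing else.\n\n" + notes
    )
```

Splitting the task this way is the standard decomposition trick: each call asks for less "intelligence" at once, at the cost of an extra round trip.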
This is just an idea and I haven't validated it yet, but this is usually how you get a smaller model to perform at the level of a larger one: by reducing the intelligence the task requires.
Let me know what you think of this idea. It will definitely mean more calls and more tokens, but I think it would help the models considerably.
Currently, I have mostly been testing with "gpt-4.1-mini".
Additional Context
No response