llm-archetype-batch-use-case
💬 LLM batch
General solution to an archetype batch use case for LLMs.
For a given set of input documents (pdf or txt), we apply an LLM to extract the relevant information and store it in a structured format (json). The outputs are validated with Pydantic.
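The validation step might look roughly like the sketch below, assuming Pydantic v2. The field names (`title`, `summary`, `keywords`) are hypothetical examples, not the repo's actual schema:

```python
# Hypothetical sketch of validating an LLM's json output with Pydantic v2.
# The fields below are illustrative; the real ones come from schema.yaml.
from pydantic import BaseModel, ValidationError


class ExtractedDoc(BaseModel):
    title: str
    summary: str
    keywords: list[str]


# A well-formed LLM response parses into a typed object.
raw = '{"title": "Q3 report", "summary": "Revenue grew.", "keywords": ["finance"]}'
doc = ExtractedDoc.model_validate_json(raw)
print(doc.title)  # Q3 report

# A malformed response (missing fields) is rejected instead of silently stored.
try:
    ExtractedDoc.model_validate_json('{"title": "Q3 report"}')
except ValidationError:
    print("invalid output rejected")
```

This is the usual payoff of schema validation in a batch pipeline: bad LLM outputs fail loudly per document rather than corrupting the structured results.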
Getting started 💻
- Clone this repo.
- Install the dependencies with Poetry: `poetry install`
- Configure your `.env`:
  - Copy the `.env.example` file to `.env`
  - If you're from Xebia, use `OPENAI_API_BASE="https://xebia-openai-us.openai.azure.com"`
  - Copy the OpenAI key from Azure
- Update the `schema.yaml` file to your needs. This file defines the output structure of the LLM.
- Upload your documents to `data/input_txt` or `data/input_pdf`.
- If you uploaded pdf documents, run `poetry run llm-batch preprocess`; otherwise skip this step.
- Run `poetry run llm-batch run` to process your documents.
- Check the results in the `output` folder! 🎉
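The exact format of `schema.yaml` is defined by this repo and not shown here; as a purely hypothetical illustration, a schema listing the fields to extract could look something like:

```yaml
# Hypothetical schema.yaml sketch -- field names, types, and layout are
# invented for illustration; consult the schema.yaml shipped with the repo.
fields:
  title:
    type: string
    description: Title of the document
  summary:
    type: string
    description: One-paragraph summary of the content
  keywords:
    type: list
    description: Keywords describing the document
```

Whatever the real format, this file is what determines the json structure of each processed document.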
To configure input parameters such as paths, run `poetry run llm-batch run --help` or `poetry run llm-batch preprocess --help` to see what's possible.