paper-qa
Performance with Scientific Spec Sheets
Has anyone tried this tool with scientific spec sheets, like those provided for manufacturing tools, probes, measurement devices, etc.?
Is there anything in the way the models are prompted or the way the data is processed that would make that not work well?
Hey there, @istranic! I'm a bot here to help you with bug fixes, answer questions, and guide you on becoming a contributor. Let's tackle this issue together while we wait for a human maintainer.
The tool has specific configuration options for processing scientific or technical documents, which can help in handling scientific spec sheets effectively (a configuration sketch follows this list):
- **Parsing Configuration**:
  - `use_doc_details`: Set to `true` to utilize document details during parsing.
  - `chunk_size`: Set to `7000` to define the size of each chunk of the document.
  - `overlap`: Set to `250` to specify the overlap between chunks.
- **Answer Configuration**:
  - `evidence_k`: Set to `8` to determine the number of evidence pieces to consider.
  - `answer_max_sources`: Set to `3` to limit the maximum number of sources for an answer.
  - `max_concurrent_requests`: Set to `8` to control the number of concurrent requests.
- **Prompts Configuration**:
  - `use_json`: Set to `true` to enable JSON format for prompts.
- **LLM Configuration**:
  - Rate limits are specified per model, such as `gpt-4o`, `gpt-4-turbo`, and `gpt-3.5-turbo`, with limits ranging from `40,000` to `2,000,000` per minute.
- **Embedding Configuration**:
  - `rate_limit`: Set to `1,000,000` per minute.
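As a rough illustration (not an official recipe), the sketch below shows how the parsing, answer, and prompts options above could be set through paper-qa's `Settings` object. The nested field names mirror the list above but may differ between versions, so verify them against your installed release; the question string and `paper_directory` value are placeholders.

```python
from paperqa import Settings, ask

# Hypothetical configuration for querying a folder of spec sheets.
# Nested dicts are validated into paper-qa's sub-settings (parsing,
# answer, prompts); field names follow the options listed above.
settings = Settings(
    llm="gpt-4o",                      # model used to summarize evidence and answer
    paper_directory="spec_sheets",     # placeholder path to your documents
    parsing={
        "use_doc_details": True,       # extract document metadata while parsing
        "chunk_size": 7000,            # characters per chunk
        "overlap": 250,                # overlap between consecutive chunks
    },
    answer={
        "evidence_k": 8,               # evidence pieces retrieved per query
        "answer_max_sources": 3,       # max sources cited in the final answer
        "max_concurrent_requests": 8,  # concurrency for evidence-gathering calls
    },
    prompts={
        "use_json": True,              # ask the LLM for JSON-formatted evidence
    },
)

# Ask a question against the configured document collection.
response = ask(
    "What is the rated measurement accuracy of probe model X?",  # placeholder question
    settings=settings,
)
print(response)  # the response object carries the answer and its cited contexts
```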
These configurations are designed to manage resources and performance effectively, which should help in processing scientific spec sheets. However, the repository does not document any specific limitations or issues with highly technical or domain-specific language [1][2].
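For the LLM and embedding rate limits mentioned above, recent paper-qa versions accept rate-limit strings inside `llm_config` and `embedding_config`. The exact shape of these dicts, the model names, and the limit values below are assumptions to adapt to your provider quotas and to check against your version's documentation.

```python
from paperqa import Settings

# Sketch of per-model rate limiting; the "<N> per 1 minute" strings and the
# model names are illustrative values, not recommendations.
rate_limited = Settings(
    llm="gpt-4o",
    llm_config={"rate_limit": {"gpt-4o": "30000 per 1 minute"}},
    summary_llm="gpt-4o",
    summary_llm_config={"rate_limit": {"gpt-4o": "30000 per 1 minute"}},
    embedding="text-embedding-3-small",
    embedding_config={"rate_limit": "1000000 per 1 minute"},
)
```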
To continue talking to Dosu, mention @dosu.