uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering
LLM-based text extraction from unstructured data such as PDF, Word, and HTML files. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!
splitter="fads" in pipeline_pdf.ipynb
Written by Huarong Zhang
In `example/rater/generated_answer.ipynb`, for an input whose true label is `equivalent`, the model sometimes generates `accept` or `reject`, so majority voting can return the wrong label. Input: ``` ("Vitamin C (also known...
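A minimal sketch of why plain majority voting can fail here, and one possible mitigation (averaging ordinal scores instead of counting labels). The score mapping `accept = +1`, `equivalent = 0`, `reject = -1` and the function names are illustrative assumptions, not uniflow's rater code. When the model's mistakes fall on both sides of `equivalent`, the wrong labels can tie or outnumber the correct one, while their scores cancel out.

```python
from collections import Counter

# Illustrative assumption: an ordinal score per label, with "equivalent"
# sitting between "accept" and "reject". Not taken from uniflow's code.
LABEL_TO_SCORE = {"accept": 1.0, "equivalent": 0.0, "reject": -1.0}


def majority_vote(labels):
    """Return the most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def mean_score_vote(labels, label_to_score=LABEL_TO_SCORE):
    """Average the label scores, then map the mean back to the nearest label."""
    mean = sum(label_to_score[label] for label in labels) / len(labels)
    return min(label_to_score, key=lambda label: abs(label_to_score[label] - mean))


# True label is "equivalent", but the sampled outputs drift to both sides.
samples = ["accept", "equivalent", "reject", "accept", "reject"]
print(majority_vote(samples))    # accept (2-2 tie with "reject", first seen wins)
print(mean_score_vote(samples))  # equivalent (mean score is 0.0)
```

Score averaging only helps when the labels really are ordinal; for unordered labels, sampling more responses or constraining the output format are other options.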
WIP for now - added `TransformQuestionExtractionOpenAIFlow` to generate questions from previous reports - added `FeedOpenAIFlow` to use the questions from the previous flow to generate responses for the news feed - added corresponding...
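A minimal sketch of how data might pass between the two stages described above, assuming the question-extraction model returns a numbered list of questions. The functions `parse_questions` and `build_feed_prompts` and the prompt template are hypothetical, not part of uniflow's API, and the OpenAI calls themselves are omitted.

```python
import re
from typing import List


def parse_questions(llm_output: str) -> List[str]:
    """Stage 1 post-processing (hypothetical): keep lines like '1. ...?' or '2) ...?'."""
    questions = []
    for line in llm_output.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+\?)\s*$", line)
        if match:
            questions.append(match.group(1))
    return questions


def build_feed_prompts(questions: List[str], news_feed: List[str]) -> List[str]:
    """Stage 2 input (hypothetical): one prompt per (question, news item) pair."""
    return [
        f"News: {item}\nQuestion: {question}\nAnswer:"
        for question in questions
        for item in news_feed
    ]


raw = "Here are some questions:\n1. Did revenue grow?\n2) Which risks were flagged?\nThanks"
questions = parse_questions(raw)
prompts = build_feed_prompts(questions, ["Company X beat estimates.", "Regulator opened a probe."])
print(questions)     # ['Did revenue grow?', 'Which risks were flagged?']
print(len(prompts))  # 4
```

Keeping the parsing step separate from the model call makes it easy to test the hand-off between flows without calling an API.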
### 🚀 The feature, motivation and pitch Is it possible to define the number of question-answer pairs? Also, is there an option to load the models in q4 or q8...
### 🐛 Describe the bug I am using the base [example extract pdf](https://github.com/CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering/blob/main/example/transform/nougat_huggingface_QAs.ipynb) - I am using the nvcr.io/nvidia/pytorch:24.07-py3 docker container - I have installed the latest Anaconda version - I have a...
### 🚀 The feature, motivation and pitch @CallmeNafiy Per our discussion, we would love to support multi-flow configuration in the future. ### Alternatives _No response_ ### Additional context _No response_