open-parse 🚀 Roadmap

Description

This is a tentative roadmap; I will update it as things evolve.

Roadmap

High Priority:

[x] Implement unitable
[x] Enable OCR support
[ ] Different embedding providers [in-progress]
[ ] Better table detection
[x] LlamaIndex integration

Long Term:

[ ] Create a Docker image with FastAPI for non-Python users
[ ] Add support for ImageElements
[ ] More automated eval suite
[ ] Better OCR provider
[ ] Speed up parsing. Because of the way we construct TextSpan, this can be quite slow, especially on documents with tons of tables
[ ] Add embed_text property, useful on tables where embedding the contents performs poorly

Mar 27 '24 23:03 Filimoa

Hey @Filimoa do you plan to add support for unitable anytime soon? Seems like the doc mentions it but the notebook does not have an example for it. Thanks for creating this project.

Mar 31 '24 01:03 shekhars-li

As soon as the pre-trained weights are released I'll be adding it. I talked with ShengYun earlier this week, and it sounds like they'll be released ASAP.

Mar 31 '24 04:03 Filimoa

@Filimoa Looks like pretrained weights are available now! :)

Apr 04 '24 19:04 shekhars-li

In progress! Should be merged in by the end of the week.

Apr 04 '24 22:04 Filimoa

Just merged. Try it out; it requires downloading weights, which you can read about here. We need to find a better model for table detection, but this performs incredibly well otherwise.

Apr 05 '24 04:04 Filimoa

Hey @Filimoa! Really great project!! Have you thought about using open-source models for the semantic processing? You can find even better embedding models here: https://huggingface.co/spaces/mteb/leaderboard This one is especially promising (only 0.67GB & better than text-embedding-3-large): https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1 There are also ONNX models, which run pretty fast on CPUs.

Apr 08 '24 09:04 Ulipenitz

Added to the roadmap! Will ship very soon @Ulipenitz

Apr 08 '24 14:04 Filimoa

Would be great to support Azure OpenAI as well.

Apr 09 '24 22:04 cthompson-insight

Hey @Filimoa! Have you tried PaddleOCR? In my experience, it performs well for layout analysis and table recognition.

Jul 22 '24 09:07 zishengwu
