Ashish Sam T George
I had to store the `Content` object in my database. A quick hack I did was to use [jsonpickle](https://pypi.org/project/jsonpickle/) to serialize the object into JSON.
@tylermaran Please have a look at this PR.
Hi @tylermaran! I wanted to check in on my PR #44. If you have any feedback, I’d love to hear it—just making sure it hasn’t gotten lost in the shuffle!...
Hey @tylermaran This seems exciting. Would love to work on it. I think tweaking the system prompt would do the trick.
Hey @tylermaran I played around with the prompts for some time. I was able to get the bounding boxes back, but they are not 100% accurate. Some boxes are off by...
It is able to identify the sections: heading, paragraph, paragraph, table. But the bounding boxes become less accurate as the amount of content on the page grows.
Seems like we need to go with a different approach. This is the flow that I have in mind: 1. Get the different sections and the corresponding markdown, using GPT...
> Hey @getwithashish! This is really promising. Can you share the prompts you were using to get the bounding boxes returned? @tylermaran **System Prompt 1:** Convert the following PDF page...
Hey @pradhyumna85, You’re right—there’s no guarantee that the sections identified through OCR will align perfectly with those derived from markdown. Moreover, using vision models for each bounding box crop is...
> @getwithashish This would be something interesting. On the similarity part, have a look at this research paper, which interestingly uses DTW for the same: [Measuring text similarity with dynamic...
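
For anyone following along, here is a rough sketch of how dynamic time warping (DTW) could score the similarity between an OCR section and a markdown section. It is not taken from the linked paper: the word-level tokens, the 0/1 mismatch cost, and the function names `dtw_distance` and `closest_section` are all assumptions made for illustration.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Classic DTW over two token sequences.

    Assumption: local cost is 0.0 for identical tokens and 1.0 otherwise.
    Returns infinity if exactly one of the sequences is empty.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # dp[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            # Extend the cheapest of the three allowed predecessor alignments
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def closest_section(markdown_text: str, ocr_sections: List[str]) -> int:
    """Index of the OCR section with the smallest DTW distance (hypothetical helper)."""
    md_tokens = markdown_text.lower().split()
    distances = [dtw_distance(md_tokens, s.lower().split()) for s in ocr_sections]
    return distances.index(min(distances))
```

Because DTW lets one token align with several neighbours, repeated or dropped words from OCR are penalised less than they would be under a strict position-by-position comparison. A real pipeline would likely need a softer token cost (for example, character-level edit distance) to tolerate OCR misspellings.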