AdvancedLiterateMachinery

Question about extracting labels of PDF Elements

Open Samyssmile opened this issue 5 months ago • 0 comments

First of all, thank you for making these models available—great work!

I have tried several AI models that extract content from PDFs and identify its type—e.g.,

  • text
  • title
  • list
  • table
  • figure.

The problem is that I haven’t yet found a model that correctly recognizes the hierarchy of headings, such as H1, H2, and H3. Can any of your models do that? What I am looking for is a way to detect:

  • text
  • title
  • list
  • table
  • figure
  • H1
  • H2
  • H3
  • H4

Is this possible with one of your models?

Samyssmile, Jun 20 '25 15:06