
Title is missing; results not matching between Web Service and Web Application

Open rajeshkumargp opened this issue 7 years ago • 4 comments

Hi, I tried to convert a PDF to a full-text document.
The title of the document is missing in the extracted XML. Here are my trials.

Trial 1:
In the Webapp, the title element is empty:

<titleStmt>
    <title level="a" type="main"></title>
</titleStmt>

In the Webapp, with the "Consolidate header" option enabled, the title is present:

<titleStmt>
    <title level="a" type="main">  Document Title    </title>
</titleStmt>

Trial 2: From the command line with curl:

  1. curl -v --form input=@./ASample.pdf http://172.16.28.52:8900/api/processFulltextDocument

  2. curl -v --form input=@./ASample.pdf consolidateHeader=1 http://localhost:8900/api/processFulltextDocument

  3. curl -v --form input=@./ASample.pdf --form consolidateHeader=1 http://localhost:8900/api/processFulltextDocument
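One detail worth checking in Trial 2: in the second command, `consolidateHeader=1` is passed as a bare argument, and curl treats bare (non-option) arguments as additional URLs, so that request does not send the option at all; only the third form (`--form consolidateHeader=1`) sends it as a form field. A minimal sketch of a wrapper that always sends the option as its own multipart field, assuming the server address used above; the function name `grobid_fulltext` and the `GROBID_URL` variable are hypothetical, not part of GROBID:

```shell
#!/bin/sh
# Hypothetical wrapper (not part of GROBID): send one PDF to processFulltextDocument
# with header consolidation enabled and write the returned TEI XML to a file.
# Usage: grobid_fulltext input.pdf output.tei.xml
grobid_fulltext() {
  pdf="$1"
  out="$2"
  # GROBID_URL is an assumed variable; defaults to the address used in the issue
  base="${GROBID_URL:-http://localhost:8900}"
  # Each parameter is its own --form field; a bare "consolidateHeader=1"
  # would be read by curl as another URL, not as form data
  curl -s \
    --form "input=@${pdf}" \
    --form consolidateHeader=1 \
    "${base}/api/processFulltextDocument" \
    -o "$out"
}
```

For example, `grobid_fulltext ./ASample.pdf ./ASample.tei.xml` is equivalent to the third command above, minus `-v`, with the response saved to disk.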

For all three, I got:

<titleStmt>
    <title level="a" type="main"></title>
</titleStmt>

Title is missing.
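To spot this failure across many outputs (for example after a batch run), a small check on the returned TEI can help. A minimal sketch, assuming the outputs are saved as `*.tei.xml` files and use the same `<title level="a" type="main">` element shown above; `has_empty_title` and `list_missing_titles` are hypothetical helper names, and the match is a plain text search, not an XML parser:

```shell
#!/bin/sh
# Hypothetical helper (not part of GROBID): exit status 0 if the TEI file's
# main title is empty, assuming the <title level="a" type="main"> layout shown above.
has_empty_title() {
  # Remove newlines so a title element spread over several lines still matches
  tr -d '\n' < "$1" |
    grep -Eq '<title level="a" type="main"([[:space:]]*/>|>[[:space:]]*</title>)'
}

# Print every *.tei.xml file in a directory (default: current) whose main title is empty
list_missing_titles() {
  dir="${1:-.}"
  for f in "$dir"/*.tei.xml; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, so skip it
    if has_empty_title "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

Running `list_missing_titles ./output` would then print only the files that came back with an empty title, such as the result reported here.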

Please suggest steps to get the title field populated in the results. Batch mode is also fine.

rajeshkumargp, Aug 01 '18 05:08

Hello @rajeshkumargp !

Thanks for reporting the problem, could you add the PDF (or send it to me by email if it is not public) so that we can reproduce the error?

kermitt2, Aug 01 '18 05:08

Please refer to the PDF at the link below: http://www.jpma.org.pk/PdfDownload/8618.pdf

rajeshkumargp, Aug 01 '18 06:08

This is a reading order issue: the title comes at the end of the page in the PDF stream, and for some obscure reason it gets lost along the way.

kermitt2, Jul 05 '19 21:07

@rajeshkumargp any chance you still have the PDF document?

lfoppiano, Nov 25 '24 08:11