PDF header signature not found (Error happen when conversion toc.json to Pdf)

Open marco-bertschi opened this issue 6 years ago • 16 comments

Operating System: Windows 10

DocFX Version Used: 2.43.2

Template used: default

Steps to Reproduce:

  1. Take artifacts from documentation
  2. Run DocFX
  3. Error:[PdfCommand.PDF]Error happen when converting pdf/toc.json to Pdf. Details: System.AggregateException: One or more errors occurred. ---> iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.

Expected Behavior: No error

Actual Behavior: Error

I've tried to add PDF generation to one of my own DocFX repos, but the build failed with the error above. After that I downloaded the artifacts mentioned above, yet to no avail: the build still fails, even with the standard documentation.

marco-bertschi avatar Aug 16 '19 10:08 marco-bertschi

@superyyrrzz is there any progress on this? Would be nice to have at least a working example

marco-bertschi avatar Sep 27 '19 19:09 marco-bertschi

Which version of wkhtmltox are you using?

I tested it with wkhtmltox v0.12.5-1.msvc2015-win64 and it worked without issues.

Try passing --logLevel Verbose to docfx.exe and share the output please.

Thank you.

icnocop avatar Mar 25 '20 23:03 icnocop

I have received this error when I pass in an html file with invalid references to the wkhtmltopdf cover parameter.

For example,

    "wkhtmltopdf": {
      "additionalArguments": "--quiet cover \"C:\\contains invalid references.html\""
    }

If the html contains a relative file path like <img src="../test.png" /> and test.png can't be found, then the error occurs.
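To check a cover page for this before running the build, something like the following might help. It is only a rough sketch in Python: the function name `missing_local_images` is made up for illustration, and it only looks at `<img src>` attributes, so stylesheets, scripts and other references are not covered.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse, unquote


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def missing_local_images(html_path):
    """Return the local <img> sources that do not resolve to a file.

    Hypothetical helper: paths are resolved relative to the folder that
    contains the HTML file, which is an assumption about how the cover
    page is loaded.
    """
    html_path = Path(html_path)
    collector = ImgSrcCollector()
    collector.feed(html_path.read_text(encoding="utf-8", errors="replace"))
    missing = []
    for src in collector.sources:
        parsed = urlparse(src)
        # Skip http(s):, data: and other references that carry a scheme or host.
        if parsed.scheme or parsed.netloc:
            continue
        target = html_path.parent / unquote(parsed.path)
        if not target.is_file():
            missing.append(src)
    return missing
```

Calling `missing_local_images` on the cover file returns a list of the `src` values that could not be found; an empty list means every local image reference resolved.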

icnocop avatar Mar 31 '20 06:03 icnocop

https://github.com/dotnet/docfx/issues/4488 seems related.

icnocop avatar Mar 31 '20 08:03 icnocop

@icnocop I was using the version available from chocolatey.org, which was at the time of testing 0.12.5.

marco-bertschi avatar Mar 31 '20 08:03 marco-bertschi

Sorry, I couldn't reproduce.

Make sure your current directory is the folder that contains docfx.json.

Steps I took:

  1. Download walkthrough3.zip and extract to c:\walkthrough3

  2. Download wkhtmltox-0.12.5-1.msvc2015-win64.exe and extract to c:\wkhtmltox

  3. Copy c:\wkhtmltox\bin\wkhtmltopdf.exe to c:\walkthrough3\wkhtmltopdf.exe

  4. Download docfx.zip and extract to c:\docfx

  5. Open a command prompt and run the following commands:

    cd C:\walkthrough3
    c:\docfx\docfx.exe docfx.json --logLevel Verbose
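If it helps, the two preconditions from the steps above (docfx.json in the current folder, wkhtmltopdf reachable) can be checked with a small script before running docfx. This is only a sketch: `check_pdf_prerequisites` is a made-up name, and it assumes wkhtmltopdf is picked up either from the project folder (as in step 3) or from PATH.

```python
import shutil
from pathlib import Path


def check_pdf_prerequisites(project_dir="."):
    """Return a list of human-readable problems; an empty list means OK."""
    project_dir = Path(project_dir)
    problems = []
    if not (project_dir / "docfx.json").is_file():
        problems.append(f"docfx.json not found in {project_dir.resolve()}")
    # Assumption: wkhtmltopdf is found either next to docfx.json
    # (as in step 3 above) or somewhere on PATH.
    local_names = ("wkhtmltopdf.exe", "wkhtmltopdf")
    has_local = any((project_dir / name).is_file() for name in local_names)
    if not has_local and shutil.which("wkhtmltopdf") is None:
        problems.append("wkhtmltopdf not found in the project folder or on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_pdf_prerequisites():
        print(problem)
```

Run it from the folder you intend to build in; no output means both checks passed.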

icnocop avatar May 02 '20 22:05 icnocop

I'm unable to spend any more time on this as per my employer, please close the issue.

marco-bertschi avatar May 04 '20 07:05 marco-bertschi

I've run into this issue as well. It only happens on a Windows runner on GitHub Actions; it works fine locally. All versions are the same between the runner and my local machine (DocFX 2.56.2, wkhtmltopdf 0.12.6).

alexrp avatar Sep 07 '20 07:09 alexrp

@alexrp can you provide a repository that can reproduce the issue?

Thank you.

icnocop avatar Sep 07 '20 08:09 icnocop

@icnocop over here: https://github.com/flare-lang/flare-lang.github.io

Note that I removed PDF generation from CI in https://github.com/flare-lang/flare-lang.github.io/commit/1d905f487328f578062b08c41f7da15c16f9f085. You'll need to revert that commit to reproduce.

Here's an example of a run where the problem occurred: https://github.com/flare-lang/flare-lang.github.io/runs/1080331536?check_suite_focus=true#step:7:45

alexrp avatar Sep 07 '20 09:09 alexrp

Thank you, @alexrp.

I was able to reproduce the issue using your repo.

I was able to work around the issue by specifying "noStdin": true in docfx.json as follows:

    {
        ...
        "pdf": {
            ...
            "noStdin": true,
            ...
        }
    }
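For CI setups where editing docfx.json by hand is inconvenient, the same change could be scripted. The following is a minimal sketch, not part of docfx: `enable_no_stdin` is a hypothetical helper, and it assumes docfx.json is plain JSON without comments and already has a "pdf" section.

```python
import json
from pathlib import Path


def enable_no_stdin(docfx_json_path="docfx.json"):
    """Set pdf.noStdin to true in docfx.json; return True if the file changed."""
    path = Path(docfx_json_path)
    # utf-8-sig tolerates a byte order mark, which some Windows editors add.
    config = json.loads(path.read_text(encoding="utf-8-sig"))
    pdf = config.get("pdf")
    if not isinstance(pdf, dict):
        raise ValueError("docfx.json has no 'pdf' section to update")
    if pdf.get("noStdin") is True:
        return False  # already set, leave the file untouched
    pdf["noStdin"] = True
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Note that the file is rewritten with two-space indentation, so the formatting of the original docfx.json is not preserved.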

Example commit: https://github.com/icnocop/flare-lang.github.io/commit/d34621a4720ccda53c153a1232956a815ad590be

Example build: https://github.com/icnocop/flare-lang.github.io/runs/1372135172

For reference, see: https://github.com/dotnet/docfx/issues/4488

icnocop avatar Nov 09 '20 04:11 icnocop

Hello @icnocop!!

I just started using this for documentation for my library and so far it is great, but ran into this issue.

The versions that I used when I ran into the issue are below:

  1. docfx 👉🏼 v2.59.2.0
  2. wkhtmltopdf 👉🏼 v0.12.6 (with patched qt)

I did indeed get it working by adding noStdin: true to the pdf section of the docfx.json. My questions are these:

  1. Is this an "issue" that is in the works on getting fixed and this is just a workaround?
  2. I did not see anything about noStdin in the walkthrough and was stuck on this issue for hours. If this is not a workaround and it is meant to be used like this, is the documentation/tutorial on the website going to be updated?
  3. Is this a Windows-only thing? I did notice that somebody in the comments mentioned that they only ran into the issue with a Windows runner on GitHub Actions.

Just for clarity and to hopefully help with the issue, below is the error I got in Windows Terminal.

[22-05-12 03:40:51.420]Error:[PdfCommand.PDF]Error happen when converting pdf/toc.json to Pdf. Details: System.AggregateException: One or more errors occurred. ---> iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.GetPartialPdfModels(IList`1 htmlFilePaths)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.ConvertOutlines()
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.GetOutlines()
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.SaveCore(Stream stream)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Save(String outputFileName)
   at Microsoft.DocAsCode.HtmlToPdf.ConvertWrapper.<>c__DisplayClass7_0.<ConvertCore>b__1(ManifestItem tocFile)
---> (Inner Exception #0) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

(Inner Exceptions #1 through #7 repeat the stack trace of Inner Exception #0 verbatim.)

Cheers!!

CalvinWilkinson avatar May 12 '22 15:05 CalvinWilkinson

Hi @CalvinWilkinson.

  1. Is this an "issue" that is in the works on getting fixed and this is just a workaround?

I'm not exactly sure if this is an issue in docfx, wkhtmltopdf, or iTextSharp for example.

  2. I did not see anything about noStdin in the walkthrough or anything and stumbled on this issue for hours, if this is not a workaround and it is meant to be used like this, is the documentation/tutorial on the website going to be updated?

The tutorial seems to work without issues for some users, so I'm not exactly sure the tutorial is the actual issue. I'm sure the docfx project maintainers will provide feedback to a pull request to update the tutorial if this is an issue.

  3. Is this a windows only thing? I did notice that somebody in the comments mentioned that they only ran into the issue with a windows runner with GitHub actions.

It could be a Windows-only and/or a GitHub Actions-only thing; sorry, I'm not exactly sure what the underlying issue is. I've personally only used docfx on Windows.

I'm interested to know if the same error occurs when wkhtmltopdf is replaced with another compatible exe and noStdin: true is removed.

For example, I'm using https://github.com/icnocop/HtmlToPdf instead of wkhtmltopdf and it meets my requirements. HtmlToPdf is not 100% compatible with wkhtmltopdf, and that's okay because I don't use all the features of wkhtmltopdf with docfx anyway. Disclaimer: I'm the creator of https://github.com/icnocop/HtmlToPdf.

If HtmlToPdf works instead of wkhtmltopdf, then the issue seems to be in wkhtmltopdf or iTextSharp.
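Another way to narrow it down: a PDF file begins with the signature %PDF-, and the exception suggests that whatever the reader was given did not contain it. If you can get the converter to write its output to a file (for example by running it manually on one of the generated HTML pages), a quick check like the sketch below shows whether that file looks like a PDF at all. The function names and the 1024-byte search window are my own choices for illustration, not anything taken from docfx or iTextSharp.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def has_pdf_header(path, search_window=1024):
    """Return True if the PDF signature appears near the start of the file."""
    with Path(path).open("rb") as handle:
        head = handle.read(search_window)
    return PDF_MAGIC in head


def describe_output(path):
    """Summarize a candidate PDF for quick triage (hypothetical helper)."""
    path = Path(path)
    if not path.is_file():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    return "pdf" if has_pdf_header(path) else "not a pdf"
```

A result of "empty" or "not a pdf" would point at the converter step producing no usable output, whereas "pdf" would suggest the problem lies in how the output is handed over to the reader.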

Thank you.

icnocop avatar May 15 '22 20:05 icnocop

Ok. Sounds good.

Thanks for your response!!

CalvinWilkinson avatar May 16 '22 12:05 CalvinWilkinson

I have the same issue. Everything is OK locally, but I have the problem within Azure Pipelines.

melanchall avatar May 17 '22 12:05 melanchall

I have the same issue. Everything is OK locally, but I have the problem within Azure Pipelines.

I have the issue locally and in GitHub Actions.

CalvinWilkinson avatar May 18 '22 22:05 CalvinWilkinson

Addressed in v2.73.0 with a new PDF engine.

yufeih avatar Nov 02 '23 14:11 yufeih