gotenberg-js-client

How to convert multiple Office files to PDF and merge them at the same time?

Open william-gense opened this issue 2 years ago • 4 comments

With curl, we can do something like this:

curl --request POST http://localhost:3000/forms/libreoffice/convert --form [email protected] --form [email protected] --form merge=true -o out.pdf

Is it possible to do the equivalent with this library? If I try using merge and office together, I get the following message: Error: Cannot set "Merge" conversion, already set to "Office"

In addition, passing multiple Office files (docx only here) together is also problematic. Here is an example:

import * as fs from 'fs'
import * as got from 'gotenberg-js-client'
const toPDF = got.pipe(
  got.gotenberg(''),
  got.convert,
  got.office,
  got.adjust({
    //adjust due to v7
    url: 'http://localhost:3000/forms/libreoffice/convert',
  }),
  got.please,
);
const pdf = await toPDF([
  'file://in1.docx', 
  'file://in2.docx', 
  'file://in3.docx'
]);
pdf.pipe(fs.createWriteStream('out.pdf'))

There is no problem with just 1 file.

With exactly 2 files, it complains with Error: Source name "file://in1.docx" doesn't look like file name

With 3 or more, it runs, but the saved file is not a valid PDF.

I believe I am not converting multiple files the right way, but I cannot find any documentation on converting multiple docx files. The HTML example also seems to cover a single web page only (with additional asset files to render, which is why it takes multiple files).

Currently, one workaround is to convert each file to PDF one by one and then trigger a PDF merge. However, this costs extra bandwidth, because each separate PDF has to be downloaded and then uploaded again for merging.

william-gense, Sep 28 '22 04:09

Hello! Try something like this (I can test it myself in about 5-6 hours; I'm away from my laptop right now):

const toPDF = got.pipe(
  got.gotenberg(''),
  got.merge,
  got.adjust({
    // manually adjust endpoint, because
    // gotenberg:7 has different one
    url: 'http://localhost:3000/forms/libreoffice/convert',

    // manually adjust for fields
    fields: {
      merge: true,
      // `merge` is not valid field for gotenberg:6
      // so we have to cast to any, otherwise typescript will complain
    } as any, // if you don't use typescript, just remove `as any` casting
  }),
  got.please,
);

const pdf = await toPDF([
  ['in1.docx', 'file://in1.docx'], 
  ['in2.docx', 'file://in2.docx'], 
  ['in3.docx', 'file://in3.docx']
]);

yumauri, Sep 28 '22 05:09

Strange though, multiple files should work like this:

const pdf = await toPDF([
  'file://in1.docx', 
  'file://in2.docx', 
  'file://in3.docx'
]);

I'll check this explicitly later. With two filenames, I guess it mistakes the array for a tuple, where the first item should be the filename and the second one the file URI.

yumauri, Sep 28 '22 05:09

> Hello! Try something like this (I could try it in about 5-6 hours, now away from my laptop):

I tried the solution. It results in a 500 Internal Server Error. On the server side, the log shows a segmentation fault:

{"bytes_in":456334, "bytes_out":21, "host":"127.0.0.1", "latency":1.517660359E9, "latency_human":"1.517660359s", "level":"error", "logger":"api", "method":"POST", "msg":"convert to PDF: unoconv PDF: unix process error: wait for unix process: signal: segmentation fault", "path":"/forms/libreoffice/convert", "referer":"", "remote_ip":"xx.xxx.xxx.xxx", "status":500, "trace":"xxxx-xxxxx-xxxxx-xxx", "ts":1.664347126993072E9, "uri":"/forms/libreoffice/convert", "user_agent":""}

> Strange though, multiple files should work like this
>
> const pdf = await toPDF([
>   'file://in1.docx',
>   'file://in2.docx',
>   'file://in3.docx'
> ]);
>
> I'll check this explicitly later. With two filenames, I guess, it mistakes this with tuple, where first item should be filename, and the second one is file URI...

However, if I change the last part (the toPDF call) to the plain array quoted above, it works now! Thank you very much.

Still, passing only two files may result in an error. I also checked the documentation; one example is:

// tuple with FileURI
const pdf = await toPDF(['index.html', 'file://index.html'])

Perhaps when exactly two files are passed, the array gets confused with this tuple case.
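To make the suspected ambiguity concrete, here is a minimal sketch of how a naive tuple check could misread a two-file array. It is purely illustrative: `looksLikeTuple` and `isFileUri` are hypothetical names and are not taken from the gotenberg-js-client source, whose actual detection logic may differ.

```typescript
// Hypothetical illustration only: NOT the actual gotenberg-js-client code.

// True when the string starts with the file:// scheme.
function isFileUri(s: string): boolean {
  return s.indexOf('file://') === 0;
}

// A naive check: any array of exactly two strings whose second item is a
// file URI is treated as a single [filename, fileUri] tuple.
function looksLikeTuple(input: unknown): input is [string, string] {
  return (
    Array.isArray(input) &&
    input.length === 2 &&
    typeof input[0] === 'string' &&
    typeof input[1] === 'string' &&
    isFileUri(input[1])
  );
}

// A documented tuple and a plain list of two files have the same shape,
// so this check cannot tell them apart.
const realTuple = ['index.html', 'file://index.html'];
const twoFiles = ['file://in1.docx', 'file://in2.docx'];
const threeFiles = ['file://in1.docx', 'file://in2.docx', 'file://in3.docx'];

console.log(looksLikeTuple(realTuple));
console.log(looksLikeTuple(twoFiles));
console.log(looksLikeTuple(threeFiles));
```

Under such a check, 'file://in1.docx' would be taken as the file name, which would be consistent with the "doesn't look like file name" error above, while a three-item array would not match the tuple shape at all.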

william-gense, Sep 28 '22 06:09

Hm 🤔 I've tried locally, and in my case all three variants are working the same:

Plain array:

toPDF([
  `file://${__dirname}/in1.docx`,
  `file://${__dirname}/in2.docx`,
  `file://${__dirname}/in3.docx`,
])

Array of tuples:

toPDF([
  ['in1.docx', `file://${__dirname}/in1.docx`],
  ['in2.docx', `file://${__dirname}/in2.docx`],
  ['in3.docx', `file://${__dirname}/in3.docx`],
])

Object:

toPDF({
  'in1.docx': `file://${__dirname}/in1.docx`,
  'in2.docx': `file://${__dirname}/in2.docx`,
  'in3.docx': `file://${__dirname}/in3.docx`,
})

Maybe the last one, with an object, is the most robust: it cannot be confused with a tuple when just two files are used. Note that Gotenberg merges files alphabetically, so you can control the ordering through the filename keys of the object.
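As a rough sketch of that ordering trick, the helper below builds such an object from a list of file URIs given in the desired merge order. The name `orderedSources` and the zero-padded numeric prefix scheme are assumptions made for illustration; they are not part of the library.

```typescript
// Hypothetical helper, not part of gotenberg-js-client.
// Builds an object source whose keys sort alphabetically in the same
// order as the input array, by prefixing each file name with a
// zero-padded index (an assumed naming scheme).
function orderedSources(uris: string[]): Record<string, string> {
  const width = String(uris.length).length;
  const sources: Record<string, string> = {};

  uris.forEach((uri, i) => {
    // Keep the original base name so the file extension is preserved.
    const base = uri.slice(uri.lastIndexOf('/') + 1);

    // Left-pad the 1-based index with zeros so that "02" sorts before "10".
    let prefix = String(i + 1);
    while (prefix.length < width) {
      prefix = '0' + prefix;
    }

    sources[prefix + '_' + base] = uri;
  });

  return sources;
}

// The desired merge order differs from the alphabetical order of the names.
const sources = orderedSources([
  'file:///docs/summary.docx',
  'file:///docs/appendix.docx',
]);

console.log(sources);
```

The resulting object could then be passed to `toPDF` in place of the array. Because the keys carry the ordering, the original file names no longer have to sort the way the merged PDF should read.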

yumauri, Sep 28 '22 10:09

I'll close this issue, feel free to reopen it, if you still encounter this problem!

yumauri, Jun 06 '23 20:06