Cannot extract image from pptx, docx
go-fitz version: v1.24.15
file:
Code
package main

import (
	"image/jpeg"
	"os"

	"github.com/gen2brain/go-fitz"
)

func main() {
	doc, err := fitz.New("example.docx")
	if err != nil {
		panic(err)
	}
	defer doc.Close()

	// Render the first page (index 0) as an image
	img, err := doc.Image(0)
	if err != nil {
		panic(err)
	}

	f, err := os.Create("example.jpg")
	if err != nil {
		panic(err)
	}

	// Encode the rendered page as JPEG at the default quality
	err = jpeg.Encode(f, img, &jpeg.Options{Quality: jpeg.DefaultQuality})
	if err != nil {
		panic(err)
	}

	// Check the close error so a failed write is not silently ignored
	if err := f.Close(); err != nil {
		panic(err)
	}
}
Stdout output:
warning: dropping unclosed output
The resulting image is a blank JPEG: example.jpg
I misunderstood; I thought go-fitz could extract the images embedded in a document. Closing this.
@amikai Yes, that is precisely what it can do; it's stated in the README. For broken files, you probably need a later version (I recall a changelog mentioning this).
@gen2brain Do you mean I can extract images from .docx, .xlsx, and .pptx files using go-fitz? In my case, I created the file using Microsoft Word on a Mac. I just want to understand why these files are broken.
Yes, but it does not extract embedded images, if that is what you have in mind. It renders the document, and you can export it page by page as images. I have no idea why the file is reported as broken or what is wrong with it; go-fitz is a wrapper for MuPDF, and I don't know the internals. The problem may already have been solved upstream, so you can try building against an external library (a newer version). In any case, the fix will not happen in this repo; I only update the bundled libraries from time to time.