pypandoc
Ignoring alt text when converting from docx to txt
Currently, when I convert from docx to txt, the alt text of each image is written out along with the paragraphs as something like "[ALT TEXT]". How do I exclude the alt text? Here is my code: pypandoc.convert_file(docx_path, 'plain', extra_args=['--wrap=none'], outputfile='output.txt')
From the pandoc user guide:
A link immediately preceded by a ! will be treated as an image. The link text will be used as the image’s alt text:

![movie reel]
[movie reel]: movie.gif
Extension: implicit_figures
An image with nonempty alt text, occurring by itself in a paragraph, will be rendered as a figure with a caption. The image’s alt text will be used as the caption.

[...]
If you just want a regular inline image, just make sure it is not the only thing in the paragraph. One way to do this is to insert a nonbreaking space after the image:
\
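The guide excerpt explains where the bracketed text comes from, but it does not by itself remove it. One possible approach, offered as a sketch and not as something the excerpt prescribes, is a pandoc Lua filter that deletes every Image element before the plain writer runs, passed to pypandoc through extra_args. The filter file name drop_images.lua and the helper names write_filter, build_extra_args and docx_to_plain_without_alt are hypothetical, chosen only for illustration. It assumes a pandoc version with Lua filter support (--lua-filter), and the result has not been checked against your document.

```python
import tempfile
from pathlib import Path

# Lua filter source: returning an empty list from the Image handler
# removes the image (and therefore its alt text) from the document.
LUA_DROP_IMAGES = """\
function Image(el)
  return {}
end
"""


def write_filter(directory):
    """Write the Lua filter into `directory` and return its path."""
    path = Path(directory) / "drop_images.lua"  # hypothetical file name
    path.write_text(LUA_DROP_IMAGES, encoding="utf-8")
    return path


def build_extra_args(filter_path):
    """Keep the original --wrap=none option and add the Lua filter."""
    return ["--wrap=none", f"--lua-filter={filter_path}"]


def docx_to_plain_without_alt(docx_path, out_path):
    """Convert a docx file to plain text with images removed."""
    # Imported here so the helpers above can be used without pypandoc installed.
    import pypandoc

    with tempfile.TemporaryDirectory() as tmp:
        filter_path = write_filter(tmp)
        pypandoc.convert_file(
            str(docx_path),
            "plain",
            extra_args=build_extra_args(filter_path),
            outputfile=str(out_path),
        )
```

Usage would look like docx_to_plain_without_alt(docx_path, 'output.txt'). If an image sits alone in a paragraph, removing it may leave an empty paragraph or a caption behind, so the output is worth inspecting on a sample file first.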