PHPDocumentParser
[Question] About end of file
Hi. Thank you for creating PHPDocumentParser.
I am using it in my project and the content is extracted correctly.
But there are always some strange characters at the end (for .doc files):
Do you have any idea how to fix it?
I will take a look as soon as I can. Could you send through the original doc, please?
@JCarlosR This is going to take a little bit longer to investigate than I anticipated! I'll try to get back to you tomorrow! Thanks for the kind words by the way!
@JCarlosR btw, as these docs are public, are you sure that you still want them up here, or are you okay with that?
Don't worry, I have no problem with that. And thank you for your time.
@JCarlosR In your environment, do you have antiword installed? If you are able to install it, please do so whilst I try to fix this the vanilla way.
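For anyone who wants to run this check from a script rather than by hand, here is a minimal sketch in Python. It is not part of PHPDocumentParser, and the names find_tool and extract_doc_text are hypothetical. It looks for antiword on the PATH and, if it is found, runs it with the -t (plain text) switch. The UTF-8 decoding is an assumption, since antiword's output encoding depends on its character mapping.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str = "antiword") -> Optional[str]:
    """Return the full path of the executable `name`, or None if it is not on PATH."""
    return shutil.which(name)


def extract_doc_text(doc_path: str, tool: str = "antiword") -> Optional[str]:
    """Extract plain text from a .doc file with antiword.

    Returns None when the tool is not installed, so the caller can fall
    back to another extraction method.
    """
    exe = find_tool(tool)
    if exe is None:
        return None
    # -t asks antiword for plain text output (its default mode).
    result = subprocess.run([exe, "-t", doc_path], capture_output=True, check=True)
    # Assumption: the output is UTF-8; undecodable bytes become U+FFFD.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_doc_text("example.doc") returns None when antiword is missing, which makes it easy to tell "not installed" apart from "installed but produced odd output".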
@LukeMadhanga Yes. I typed antiword in cmd (I am using Windows) and it shows:
Name: antiword
Purpose: Display MS-Word files
Author: (C) 1998-2005 Adri van Os
Version: 0.37 (21 Oct 2005)
Status: GNU General Public License
Usage: antiword [switches] wordfile1 [wordfile2 ...]
Switches: [-f|-t|-a papersize|-p papersize|-x dtd][-m mapping][-w #][-i #][-Ls]
    -f formatted text output
    -t text output (default)
    -a <paper size name> Adobe PDF output
    -p <paper size name> PostScript output
       paper size like: a4, letter or legal
    -x <dtd> XML output
       like: db (DocBook)
    -m <mapping> character mapping file
    -w <width> in characters of text output
    -i <level> image level (PostScript only)
    -L use landscape mode (PostScript only)
    -r Show removed text
    -s Show hidden (by Word) text
I also have an Ubuntu VPS, and there the extracted content has the same strange characters at the end. Thank you.
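Until the root cause is found, one possible stopgap is to trim the junk after extraction. Below is a minimal sketch in Python, assuming the stray characters are non-printable control characters or U+FFFD replacement characters sitting at the very end of the extracted string. That is only a guess, as nothing in this thread establishes what the characters actually are. strip_trailing_garbage is a hypothetical helper, not part of PHPDocumentParser.

```python
import unicodedata


def strip_trailing_garbage(text: str) -> str:
    """Remove trailing control-like characters, U+FFFD, and whitespace from text.

    Only the end of the string is touched; anything in the middle is kept.
    """
    end = len(text)
    while end > 0:
        ch = text[end - 1]
        # Unicode categories starting with "C" cover control (Cc), format (Cf),
        # surrogate (Cs), private use (Co), and unassigned (Cn) code points.
        is_junk = (
            unicodedata.category(ch).startswith("C")
            or ch == "\ufffd"
            or ch.isspace()
        )
        if not is_junk:
            break
        end -= 1
    return text[:end]
```

Note that this also trims ordinary trailing whitespace and newlines, and it would hide the symptom rather than fix whatever the parser is reading past the end of the document text.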