List item values are missing when converting a docx file to HTML

Open vschavala opened this issue 7 years ago • 25 comments

/* Here is my code */

$PHPWord = new \PhpOffice\PhpWord\PhpWord();

// Load the .docx file
$PHPWordLoad = \PhpOffice\PhpWord\IOFactory::load($file);

// Create an HTML writer for the loaded document
$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($PHPWordLoad, 'HTML');

// public_path() is a Laravel helper
$tmpfname = public_path('doczipfiles/temp.html');

// Write the HTML output
$objWriter->save($tmpfname);

vschavala avatar Sep 10 '18 06:09 vschavala

Bumping this. I encountered the same issue as well.

akosipau avatar Oct 10 '18 07:10 akosipau

same here

evomer avatar Oct 29 '18 12:10 evomer

I also faced the same issue and found that "ListItemRun.php" is missing from the path "src\PhpWord\Writer\HTML\Element", which is what causes it.

I added the file and made the changes. The list values now come through, but the bullet icons are still missing. I am currently trying to fix that. In the meantime, if anyone wants the file, please let me know.

vineetagarwal1981 avatar Nov 02 '18 11:11 vineetagarwal1981

+1

lubobill1990 avatar Nov 06 '18 07:11 lubobill1990

@vineetagarwal1981 I could use the ListItemRun.php file if you could make it available, please. Just the data without the bullets is sufficient for my needs. Thanks.

bozzit avatar Nov 13 '18 17:11 bozzit

@bozzit Below is the file you need. Place it in the path "src\PhpWord\Writer\HTML\Element".

Let me know if it works for you.

ListItemRun.zip

vineetagarwal1981 avatar Nov 14 '18 06:11 vineetagarwal1981

@vschavala If you just want to convert to HTML, I find it's better to use more specialized tools. This is my solution, and it works better than PHPWord for this: https://gist.github.com/lubobill1990/701df4becce20af43e9122a26dc52a05

The main purpose of PHPWord is to compose Word documents with PHP, not to convert between formats.

lubobill1990 avatar Nov 14 '18 15:11 lubobill1990

Hi, yes, thank you. If I have time I will attempt to make it output unordered lists instead of the text within <p> elements.

At least I'm not losing the text within the lists by adding this file.

bozzit avatar Nov 15 '18 11:11 bozzit

Same here.

If I generate a Word document like here, I get this file structure:

  • _rels
  • theme
  • document.xml
  • fontTable.xml
  • footnotes.xml
  • numbering.xml
  • settings.xml
  • styles.xml
  • webSettings.xml

With lists generated from HTML, I get this:

  • _rels
  • theme
  • endnotes.xml
  • fontTable.xml
  • footer1.xml
  • header1.xml
  • settings.xml
  • styles.xml
  • stylesWithEffects.xml
  • webSettings.xml

So you can see there is no numbering.xml. Also, if you try to use LibreOffice to generate a PDF, all the lists are empty.
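One way to confirm the missing part described here is to look inside the .docx package directly. The following is a minimal sketch, assuming PHP's zip extension (`ZipArchive`) is available and that the numbering part sits at the conventional `word/numbering.xml` path. The function name `docxHasNumberingPart` is made up for illustration and is not part of PHPWord.

```php
<?php

/**
 * Report whether a .docx package contains a numbering part.
 * Assumption: the part sits at the conventional path word/numbering.xml.
 */
function docxHasNumberingPart(string $docxPath): bool
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open {$docxPath} as a zip archive");
    }

    // locateName() returns the entry index, or false when the entry is absent.
    $found = $zip->locateName('word/numbering.xml') !== false;
    $zip->close();

    return $found;
}
```

Running this against both generated files would show quickly whether the list definitions were written at all, before LibreOffice is involved.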

kristl78 avatar Mar 28 '19 12:03 kristl78

@kristl78 https://github.com/PHPOffice/PHPWord/issues/1462#issuecomment-438691752 I'm using this solution and it works well. Hope it can help.

lubobill1990 avatar Apr 03 '19 04:04 lubobill1990

@lubobill1990 Thank you, but unfortunately this is not enough.

kristl78 avatar Apr 05 '19 11:04 kristl78

I've used the solution from @vineetagarwal1981 but modified it a bit. The list items were not parsed as li tags, which I need for my project.

public function write()
{
    if (!$this->element instanceof \PhpOffice\PhpWord\Element\ListItemRun) {
        return '';
    }
    $content = '<li>';
    $content .= $this->element->getElement(0)->getText();
    $content .= '</li>';
    return $content;
}

    tikumo avatar Aug 01 '19 08:08 tikumo

    I just installed PHPWord using Composer (6/8/2020), and I also see the problem above (loss of list text) when attempting to convert a .docx to .html.
    The version of PHPWord I installed did have the ListItemRun.php file mentioned above in the proper directory, but I still had the error. I also tried copying the ListItemRun.php provided by @vineetagarwal1981 above into the Element directory, overwriting the installed copy, and that generated several exceptions. Therefore I backed that change out.
    Has there been any resolution on how to convert .docx lists to HTML without losing the text?

    PhoenixRising2015 avatar Jun 14 '20 11:06 PhoenixRising2015

    @tikumo Hi, could you provide a few more specifics? For example, in which PHP file did you place this code?

    PhoenixRising2015 avatar Jun 14 '20 12:06 PhoenixRising2015

    Hello @PhoenixRising2015! I have the same problem. Did you find a solution?

    Hector1567XD avatar Apr 30 '21 22:04 Hector1567XD

    @Hector1567XD

    Just create a file called "ListItemRun.php" in the path "src\PhpWord\Writer\HTML\Element" with that code in it, or look further up in this thread, where there is a link to a ZIP file containing "ListItemRun.php".

    bozzit avatar May 01 '21 14:05 bozzit

    Same here. I have the following numbered list in a docx file:

    1. a
    2. b
    3. c
    

    After converting to an HTML file:

     a
     b
     c
    

    ryanzzeng avatar Dec 19 '21 00:12 ryanzzeng

    For anyone having this issue, the solution by @tikumo works:

    public function write()
    {
        if (!$this->element instanceof \PhpOffice\PhpWord\Element\ListItemRun) {
            return '';
        }
        $content = '';
        $content .= '<ul><li>';
        $content .= $this->element->getElement(0)->getText();
        $content .= '</li></ul>';
        $content .= "\n";
        return $content;
    }
    

    Replace the function write() in src\PhpWord\Writer\HTML\Element\ListItemRun.php with the code above and it will transform every ListItemRun into an li element. However, as far as I know there is no way to create the parent ul for the list, so I modified the function to make every list item a separate list as a temporary solution. If anyone has a way to generate the ul elements, please let me know.
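One possible stopgap for the separate-lists output is to post-process the generated HTML and merge lists that sit directly next to each other. This is only a sketch: `mergeAdjacentLists` is a hypothetical helper, not part of PHPWord, and it assumes the writer emits exactly the `<ul><li>...</li></ul>` shape shown above, with nothing but whitespace between consecutive items.

```php
<?php

/**
 * Merge runs of adjacent single-item lists into one list.
 * Only whitespace may separate the lists being merged.
 */
function mergeAdjacentLists(string $html): string
{
    // A closing </ul> followed directly by an opening <ul> is just a seam
    // between two items of what was one list in the document.
    return preg_replace('#</ul>\s*<ul>#', "\n", $html) ?? $html;
}

$html = "<ul><li>a</li></ul>\n<ul><li>b</li></ul>\n<p>text</p>\n<ul><li>c</li></ul>\n";
echo mergeAdjacentLists($html);
```

The trade-off is that two genuinely distinct lists with no paragraph between them would also be merged, and nesting is not handled at all.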

    Lurtz963 avatar Feb 09 '22 13:02 Lurtz963

    What I ended up doing is: I modified phpoffice/phpword/vendor/phpoffice/phpword/src/PhpWord/Writer/HTML/Element/ListItemRun.php

    protected function writeOpening()
    {
        $content = sprintf(
            '<li data-depth="%s" data-liststyle="%s" data-numId="%s">',
            $this->element->getDepth(),
            $this->element->getListFormat($this->element->getDepth()),
            $this->element->getListId()
        );

        return $content;
    }
    

    Then I created my own writer that extends AbstractWriter:

    class MyHtmlWriter extends AbstractWriter implements WriterInterface
    {
    .
    .
    .
    
        /**
         * Get content
         *
         * @return string
         */
    
        public function getContent()
        {
            $content = $this->getWriterPart('Body')->write();
    
            $lines = explode(PHP_EOL, $content);
     
            $newcontent = '';
            foreach ($lines as $line)
            {
                if (preg_match('/( |^)<li data-depth/', $line))
                {
                    /* use the data-depth, data-liststyle and data-numId attributes to add
                     * <ul></ul> and <ol></ol> where needed
                     */
               }
               else
               {
                    $newcontent .= $line;
               }
            }
    
            $content = $newcontent;
    .
    .
    .
            return $content;
        }
    

    Hope this points you @Lurtz963 in the right direction.
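The step left open in the comment inside getContent() could look roughly like the sketch below. It is not the code from this thread: `wrapListItems` is a hypothetical standalone function, it assumes one `<li data-depth="..." data-liststyle="...">` item per line with each item ending in `</li>`, and it treats any list style other than `bullet` (or empty) as a numbered list. The data-numId attribute is ignored here.

```php
<?php

/**
 * Wrap flat <li data-depth="..."> lines in <ul>/<ol> containers.
 * Assumes one list item per line, each ending in </li>.
 */
function wrapListItems(string $html): string
{
    $open = [];   // stack of currently open list tags ('ul' or 'ol')
    $out = '';

    // Close open lists (and their last <li>) until $level lists remain.
    $closeTo = function (int $level) use (&$open, &$out): void {
        while (count($open) > $level) {
            $out .= "</li>\n</" . array_pop($open) . ">\n";
        }
    };

    foreach (preg_split('/\R/', $html) as $line) {
        if (!preg_match('/^\s*<li data-depth="(\d+)" data-liststyle="([^"]*)"/', $line, $m)) {
            // Any non-item line ends every open list.
            $closeTo(0);
            if ($line !== '') {
                $out .= $line . "\n";
            }
            continue;
        }

        $depth = (int) $m[1];
        // Assumption: anything other than 'bullet' (or empty) is a numbered style.
        $tag = ($m[2] === '' || $m[2] === 'bullet') ? 'ul' : 'ol';

        $closeTo($depth + 1);
        if (count($open) === $depth + 1) {
            $out .= "</li>\n";                              // finish the previous sibling
        }
        while (count($open) < $depth + 1) {
            $out .= ($open ? "\n" : '') . "<{$tag}>\n";     // open a (possibly nested) list
            $open[] = $tag;
        }

        // Drop the item's own </li>; it is emitted later so that a nested
        // list ends up inside its parent item.
        $out .= preg_replace('#</li>\s*$#', '', trim($line));
    }
    $closeTo(0);

    return $out;
}
```

Deferring each closing `</li>` is what lets a deeper list nest inside its parent item instead of sitting next to it. A list that jumps more than one depth level at once would still produce a list directly inside a list, which this sketch does not try to repair.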

    bozzit avatar Feb 09 '22 14:02 bozzit

    I tried this solution, but data-depth is always 0 and the rest of the attributes are empty.

    Lurtz963 avatar Feb 09 '22 14:02 Lurtz963

    My bad, I forgot that I had to implement some of those methods for the other attributes. And 0 is normal for depth if you don't have nested lists; the top-level list is always 0.

    index 6e48a69..ed83162 100644
    --- a/3rdparty/phpoffice/phpword/vendor/phpoffice/phpword/src/PhpWord/Element/ListItemRun.php
    +++ b/3rdparty/phpoffice/phpword/vendor/phpoffice/phpword/src/PhpWord/Element/ListItemRun.php
    @@ -73,6 +73,24 @@ class ListItemRun extends TextRun
             return $this->style;
         }
     
    +    public function getListFormat($depth)
    +    {
    +        if (isset($this->style->bulletListType[$depth]->format))
    +        {
    +            return $this->style->bulletListType[$depth]->format;
    +        }
    +        else
    +        {
    +            return 'bullet';
    +        }
    +
    +    }
    +
    +    public function getListId()
    +    {
    +        return $this->style->numId;
    +    }
    +
    

    bozzit avatar Feb 09 '22 15:02 bozzit

    After a bit of a struggle I was able to implement a similar solution, @bozzit. For some reason I couldn't use a custom writer (it throws an error saying it is not a valid writer), so I modified the HTML writer instead. I'm leaving the files here in case someone wants to use them or make a better version. ListItemRun.php goes in phpoffice/phpword/src/PhpWord/Writer/HTML/Element and HTML.php goes in phpoffice/phpword/src/PhpWord/Writer.

    files.zip

    Lurtz963 avatar Feb 09 '22 16:02 Lurtz963

    Thanks! Your code helped me. I just added a loop to the write() function in ListItemRun.php:

    public function write()
    {
        if (!$this->element instanceof \PhpOffice\PhpWord\Element\ListItemRun) {
            return '';
        }
        $content = '';
        $content .= sprintf(
            '<li data-depth="%s" data-liststyle="%s" data-numId="%s">',
            $this->element->getDepth(),
            $this->getListFormat($this->element->getDepth()),
            $this->getListId()
        );

        // Concatenate the text of every child element, not just the first one.
        $size_content = $this->element->countElements();
        for ($i = 0; $i < $size_content; $i++) {
            $content .= $this->element->getElement($i)->getText();
        }

        $content .= '</li>';
        $content .= "\n";
        return $content;
    }
    

    CaptBarbarossa avatar Apr 15 '22 08:04 CaptBarbarossa

    It's been 6 years and ListItemRun.php is still not implemented. Pretty crazy.

    In any case, I took @CaptBarbarossa's code and extended it to handle all types of elements in the li, since there is no guarantee that an li contains only text:

    <?php
    
    namespace PhpOffice\PhpWord\Writer\HTML\Element;
    
    /**
     * ListItemRun element HTML writer
     *
     * @since 0.10.0
     */
    class ListItemRun extends TextRun
    {
        public function write()
        {
            if (!$this->element instanceof \PhpOffice\PhpWord\Element\ListItemRun) {
                return '';
            }
            $content = '';
            $content .= sprintf(
                '<li data-depth="%s" data-liststyle="%s" data-numId="%s">',
                $this->element->getDepth(),
                $this->getListFormat($this->element->getDepth()),
                $this->getListId()
            );
    
            $namespace = 'PhpOffice\\PhpWord\\Writer\\HTML\\Element';
            $container = $this->element;
    
            $elements = $container->getElements();
            foreach ($elements as $element) {
                $elementClass = get_class($element);
                $writerClass = str_replace('PhpOffice\\PhpWord\\Element', $namespace, $elementClass);
                if (class_exists($writerClass)) {
                    /** @var \PhpOffice\PhpWord\Writer\HTML\Element\AbstractElement $writer Type hint */
                    $writer = new $writerClass($this->parentWriter, $element, true);
                    $content .= $writer->write();
                }
            }
    
            $content .= '</li>';
            $content .= "\n";
            return $content;
        }
    
        public function getListFormat($depth)
        {
            return $this->element->getStyle()->getNumStyle();
        }
    
        public function getListId()
        {
            return $this->element->getStyle()->getNumId();
        }
    }
    

    The true as the last argument to new $writerClass($this->parentWriter, $element, true); prevents text from being wrapped in <p> tags so that everything inside the li is displayed inline.

    If you're installing this package with Composer (like I am), you can use the post-install-cmd hook in your composer.json file to copy this file into ./vendor/phpoffice/phpword/src/PhpWord/Writer/HTML/Element/ListItemRun.php every time the package is installed.
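The copy step could be done by a small script like the sketch below. The script name (`patch-phpword.php`), the `patches/` directory and the function `installPatchedWriter` are all assumptions made for illustration. Composer could then run it with a scripts entry along the lines of `"post-install-cmd": ["@php patch-phpword.php patches/ListItemRun.php vendor/phpoffice/phpword/src/PhpWord/Writer/HTML/Element/ListItemRun.php"]`.

```php
<?php

/**
 * Copy a patched writer file over the one shipped in vendor/.
 * Returns true when the copy succeeded, false otherwise.
 */
function installPatchedWriter(string $source, string $target): bool
{
    // Refuse to run if the patch file is missing or the package layout
    // is not what we expect (for example, the package is not installed).
    if (!is_file($source) || !is_dir(dirname($target))) {
        return false;
    }

    return copy($source, $target);
}

// When called from the command line with two arguments, treat them as
// <source> <target> and report the result through the exit code.
$args = $_SERVER['argv'] ?? [];
if (PHP_SAPI === 'cli' && count($args) === 3) {
    exit(installPatchedWriter($args[1], $args[2]) ? 0 : 1);
}
```

A non-zero exit code makes Composer report the script as failed, so a changed directory layout in a future PHPWord release would be noticed rather than silently skipped. The same entry would likely be wanted under post-update-cmd, since updates also replace the vendor copy.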

    EvanShaw avatar Mar 12 '24 14:03 EvanShaw