Order of paragraphs and tables

Open Prigin opened this issue 4 years ago • 4 comments

Problem

I need to get all paragraphs and tables in the order they appear in the docx file. Is there any way I can do this?

Solution

Maybe a single shared index across paragraph objects and table objects would be enough.

Prigin avatar Nov 17 '21 15:11 Prigin

You mean that in the latest version of this gem, Document#paragraphs returns paragraphs in the wrong order, right? Could you give us a docx file that reproduces this behavior, if you have one? The file would help us investigate what happens.

Thanks

satoryu avatar Nov 18 '21 02:11 satoryu

Not exactly. :) Sorry for not being clear. Let's say I have a docx that I want to convert to txt:

image

I need to know the position of each element (paragraphs and tables). How can I get the elements in the same order they have in the DOCX? Or maybe there is already a method for that (one that returns each element's position in the document). I can't actually find it :(

Prigin avatar Nov 18 '21 07:11 Prigin

I was able to do this as follows. I'm using private vars/methods, but if they open up more APIs in the future, we won't have to.

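    require 'docx'

    doc = Docx::Document.open(file)
    # @doc is a private instance variable holding the parsed document XML;
    # the children of w:body come back in document order.
    doc.instance_variable_get("@doc").xpath('//w:document//w:body').children.each do |c|
      if c.name == 'p' # paragraph
        paragraph = doc.send(:parse_paragraph_from, c)
      elsif c.name == 'tbl' # table
        table = doc.send(:parse_table_from, c)
      else # other types?
      end
    end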

aunghtain avatar Mar 10 '22 21:03 aunghtain

If you just want the text, you don't need to parse the nodes as paragraphs/tables. You can just read it with "c.content".

aunghtain avatar Mar 10 '22 22:03 aunghtain
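
For anyone who wants to try the idea from the comments above without a real file or the gem's private methods, here is a minimal sketch. It is not the gem's API: it swaps in REXML for the XML parsing, and `SAMPLE_XML`, `text_of`, and `ordered_blocks` are hypothetical names made up for illustration. The sample XML is a hand-written stand-in for the `word/document.xml` part of a docx file. The only thing it demonstrates is that walking the children of `w:body` yields paragraphs and tables in document order.

```ruby
require 'rexml/document'

# Hypothetical, hand-written stand-in for word/document.xml (assumption:
# a real file has many more attributes and child elements).
SAMPLE_XML = <<~XML
  <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:body>
      <w:p><w:r><w:t>Intro</w:t></w:r></w:p>
      <w:tbl>
        <w:tr>
          <w:tc><w:p><w:r><w:t>A1</w:t></w:r></w:p></w:tc>
          <w:tc><w:p><w:r><w:t>B1</w:t></w:r></w:p></w:tc>
        </w:tr>
      </w:tbl>
      <w:p><w:r><w:t>Outro</w:t></w:r></w:p>
    </w:body>
  </w:document>
XML

# Collect the text of every w:t element nested anywhere under +node+.
def text_of(node)
  texts = []
  node.each_recursive { |e| texts << e.text.to_s if e.name == 't' }
  texts
end

# Return [kind, text] pairs for the direct children of w:body, in document order.
# Children that are neither paragraphs nor tables are skipped.
def ordered_blocks(xml)
  body = REXML::Document.new(xml).root.elements.find { |e| e.name == 'body' }
  body.elements.filter_map do |child|
    case child.name
    when 'p'   then [:paragraph, text_of(child).join]
    when 'tbl' then [:table, text_of(child).join(' | ')]
    end
  end
end

ordered_blocks(SAMPLE_XML).each_with_index do |(kind, text), index|
  puts "#{index}: #{kind}: #{text}"
end
```

The index printed for each block is the "single index for paragraphs and tables" asked for at the top of the thread. With the gem itself, the same loop would run over the Nokogiri nodes shown earlier instead of REXML elements.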