robotframework-doctestlibrary

Question: is it feasible to compare two MS Office documents?

Open · fengnex opened this issue 4 years ago · 4 comments

As the title suggests, I wonder whether it is feasible to compare the contents of two MS Office documents, such as Word or PowerPoint files, get the locations of the differences, and then output a comparison image highlighting the differences found.

Any response would be appreciated. Thanks in advance.

fengnex · Oct 20 '21 11:10

Yes, this should be possible. But as the library is focused on PDF and image comparisons, the .pptx or .docx files would need to be converted to PDF (or PNG) first. I would recommend PDF.

To try it yourself:

  • Export the .pptx file in PowerPoint as .pdf (FILE > Export > Create PDF)
  • Add a small change and export again as .pdf
  • Compare both .pdf files using the library
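The manual export steps can also be scripted. The sketch below swaps in LibreOffice's headless converter instead of PowerPoint's export dialog; it assumes LibreOffice is installed with `soffice` on the PATH, and `build_convert_command` and `convert_to_pdf` are hypothetical helper names, not part of this library.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_convert_command(source: Path, outdir: Path, soffice: str = "soffice") -> list[str]:
    """Build a LibreOffice command line that exports `source` to PDF in `outdir`."""
    return [
        soffice,
        "--headless",             # run without a GUI
        "--convert-to", "pdf",    # target format
        "--outdir", str(outdir),  # directory the .pdf is written to
        str(source),
    ]


def convert_to_pdf(source: str, outdir: str = ".") -> Path:
    """Convert a .docx/.pptx file to PDF and return the expected output path.

    Assumption: LibreOffice is installed and `soffice` is on the PATH.
    """
    soffice = shutil.which("soffice")
    if soffice is None:
        raise RuntimeError("LibreOffice 'soffice' executable not found on PATH")
    src = Path(source).resolve()
    out = Path(outdir).resolve()
    subprocess.run(build_convert_command(src, out, soffice), check=True)
    # LibreOffice keeps the base name and replaces the extension with .pdf
    return out / (src.stem + ".pdf")
```

Both resulting files could then be compared as PDFs, e.g. with the library's `Compare Images` keyword from `DocTest.VisualTest`. LibreOffice's rendering may differ slightly from PowerPoint's, so the reference and the candidate should be converted with the same tool.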

manykarim · Oct 20 '21 14:10

Thank you very much, @manykarim! Although there are probably ways to automatically convert Office documents into PDF files, which would then let us use the library, it might be helpful and enrich the library's functionality if it could compare two Office documents directly.

That said, drawing a rectangle directly onto a Word document could be tricky, since it may change the layout of the content, so PDF may be the better option in that case.
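Comparing rendered pages avoids that layout problem: differences are located on a pixel grid, and the highlight rectangle is drawn on the image, never inside the document. As a rough illustration (not the library's actual algorithm), here is a pure-Python sketch that finds the bounding box of the differing pixels on two same-sized grayscale pages and outlines it. `diff_bounding_box` and `draw_box` are hypothetical names, and a real page would first have to be rendered to pixels by some PDF renderer.

```python
from typing import List, Optional, Sequence, Tuple

Grid = Sequence[Sequence[int]]   # a rendered page as rows of grayscale values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def diff_bounding_box(reference: Grid, candidate: Grid, tolerance: int = 0) -> Optional[Box]:
    """Return the smallest box enclosing all pixels that differ by more than `tolerance`."""
    if len(reference) != len(candidate) or any(
        len(r) != len(c) for r, c in zip(reference, candidate)
    ):
        raise ValueError("pages must be rendered at the same size")
    xs: List[int] = []
    ys: List[int] = []
    for y, (ref_row, cand_row) in enumerate(zip(reference, candidate)):
        for x, (a, b) in enumerate(zip(ref_row, cand_row)):
            if abs(a - b) > tolerance:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # pages are identical within the tolerance
    return (min(xs), min(ys), max(xs), max(ys))


def draw_box(page: Grid, box: Box, value: int = 0) -> List[List[int]]:
    """Return a copy of `page` with the outline of `box` drawn in `value`."""
    out = [list(row) for row in page]
    left, top, right, bottom = box
    for x in range(left, right + 1):  # top and bottom edges
        out[top][x] = value
        out[bottom][x] = value
    for y in range(top, bottom + 1):  # left and right edges
        out[y][left] = value
        out[y][right] = value
    return out


# Usage: a blank 5x4 "page" and a copy with two changed pixels
ref = [[255] * 5 for _ in range(4)]
cand = [row[:] for row in ref]
cand[1][2] = 0
cand[2][3] = 0
box = diff_bounding_box(ref, cand)
print(box)  # (2, 1, 3, 2)
highlighted = draw_box(ref, box)
```

The box is drawn on a copy, so the reference page itself is left untouched, which mirrors why marking a rendered image is safer than editing the Word file.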

fengnex · Oct 21 '21 14:10

I could think about it. However, there are already libraries out there that handle the conversion from, e.g., Word to PDF, for example https://rpaframework.org/libraries/word_application/. Maybe it's worth checking those out first; I want to avoid parallel or duplicate development.

manykarim · Oct 21 '21 14:10

This pure-Python approach also looks simple: https://stackoverflow.com/questions/6011115/doc-to-pdf-using-python
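A minimal sketch of that approach, based on the `comtypes` Word automation shown in the linked Stack Overflow answer (`FileFormat=17` is Word's `wdFormatPDF` constant). It only runs on Windows with Microsoft Word and the `comtypes` package installed, and it only covers Word documents; PowerPoint files would need PowerPoint's own automation object. `word_to_pdf` and its `create_word` hook are hypothetical additions so the function can be exercised with a stub.

```python
import os

WD_FORMAT_PDF = 17  # Word's wdFormatPDF constant


def word_to_pdf(in_file: str, out_file: str, create_word=None) -> str:
    """Convert a Word document to PDF by automating Microsoft Word (Windows only).

    `create_word` is a hypothetical hook for injecting a Word application object
    (e.g. a stub in tests); by default Word is started through comtypes.
    """
    in_path = os.path.abspath(in_file)
    out_path = os.path.abspath(out_file)
    if create_word is None:
        import comtypes.client  # requires `pip install comtypes` and an installed MS Word
        word = comtypes.client.CreateObject("Word.Application")
    else:
        word = create_word()
    try:
        doc = word.Documents.Open(in_path)
        try:
            doc.SaveAs(out_path, FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close()
    finally:
        word.Quit()  # always shut down the Word process, even on errors
    return out_path
```

The `try`/`finally` blocks are a small addition over the linked answer so that a failed export does not leave a Word process running in the background.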

manykarim · Oct 21 '21 14:10