Open-XML-SDK

Bug with Open XML SDK

Open kkurhekar10 opened this issue 2 years ago • 4 comments

We are doing some string manipulation in our code using Open XML and are facing an issue.

Text in input file: “Figure 4. Word and other agencies like IRCC look this for strategy and technical solutions.”
Text in output file after manipulation: “Figure 4. Word and other agencies l<> IRCC look this for strategy and technical solutions.”

Here, IRCC should be replaced with <<Client>>, but “ike” is getting replaced instead.

Here, “Figure 4” is a link (cross-reference) to a figure. When read with Open XML, it appears as the field code REF _Ref12123123.
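To make the REF _Ref12123123 observation concrete, here is a minimal C# sketch using only System.Xml.Linq. It assumes the cross-reference is stored as a complex field (w:fldChar begin/separate/end around a w:instrText); the paragraph markup is a hypothetical, simplified reconstruction, not taken from the attached File.docx, and the class and method names are illustrative only. It shows that the field instruction sits in w:instrText, separate from the w:t runs that make up the visible text.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch: a "Figure 4" cross-reference stored as a complex field.
// The markup is a simplified assumption, not copied from File.docx.
public static class FieldCodeDemo
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public const string ParagraphXml =
        "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:r><w:fldChar w:fldCharType=\"begin\"/></w:r>" +
        "<w:r><w:instrText xml:space=\"preserve\"> REF _Ref12123123 \\h </w:instrText></w:r>" +
        "<w:r><w:fldChar w:fldCharType=\"separate\"/></w:r>" +
        "<w:r><w:t>Figure 4</w:t></w:r>" +
        "<w:r><w:fldChar w:fldCharType=\"end\"/></w:r>" +
        "<w:r><w:t xml:space=\"preserve\">. Word and other agencies like IRCC look this for strategy and technical solutions.</w:t></w:r>" +
        "</w:p>";

    // Visible text: only the w:t elements (cached field result plus ordinary text).
    public static string DisplayedText(XElement p) =>
        string.Concat(p.Descendants(W + "t").Select(t => t.Value));

    // Field instruction: only the w:instrText elements.
    public static string InstructionText(XElement p) =>
        string.Concat(p.Descendants(W + "instrText").Select(t => t.Value));

    public static void Main()
    {
        XElement p = XElement.Parse(ParagraphXml);
        // prints: Figure 4. Word and other agencies like IRCC look this for strategy and technical solutions.
        Console.WriteLine(DisplayedText(p));
        // prints: REF _Ref12123123 \h
        Console.WriteLine(InstructionText(p).Trim());
    }
}
```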

kkurhekar10, Feb 08 '23 15:02

@kkurhekar10, please provide the relevant parts of your code (e.g., in the form of a unit test) and the Open XML markup (e.g., a Word document) so that we can reproduce the behavior.

ThomasBarnekow, Feb 08 '23 17:02

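Attached are the sample source file (File.docx), the output file (File_Output.docx), and the code snippet:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools; // OpenXmlRegex, W, GetXDocument/PutXDocument

class Program
{
    static void Main(string[] args)
    {
        string filepath = "C:/Sample/";
        string Filename = "File.docx";
        string SrcFilename = filepath + Filename;
        // Path.Combine adds the separator that @"C:\Sample" + "File" was missing.
        string DstFilename = Path.Combine(@"C:\Sample",
            "File" + "_" + DateTime.Now.ToString("yyyyMMdd_HH_mm_ss") + ".docx");
        File.Copy(SrcFilename, DstFilename, true);

        if (File.Exists(DstFilename))
        {
            using (WordprocessingDocument wDoc = WordprocessingDocument.Open(DstFilename, true))
            {
                XDocument xDoc = wDoc.MainDocumentPart.GetXDocument();
                if (xDoc != null)
                {
                    string User = "user1";
                    // Only the first paragraph, which contains the "Figure 4" reference.
                    IEnumerable<XElement> content = xDoc.Descendants(W.p).Take(1);
                    string inputText = "IRCC";
                    string replacedText = "<>";
                    Regex regex = new Regex(inputText);

                    // Replace with tracked revisions (trackRevisions = true), attributed to User.
                    int count = OpenXmlRegex.Replace(content, regex, replacedText, null, true, User);

                    // Write the modified XDocument back to the part;
                    // disposing wDoc at the end of the using block saves and closes the package.
                    wDoc.MainDocumentPart.PutXDocument();
                }
            }
            byte[] byteArray = File.ReadAllBytes(DstFilename);
        }
    }
}
```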

kkurhekar10, Feb 09 '23 11:02

A couple of thoughts and questions:

  • Your issue is not related to the Open XML SDK but to the Open XML PowerTools. You are using the OpenXmlRegex class to do the replacement.
  • Did you test your regular expression (e.g., using the "normal" .NET classes or a regex tester) to confirm that it works?
  • Did you look at the Open XML markup before and after replacement to confirm that it looks as expected? For example, the characters < and > are reserved in XML and must be escaped.
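The regex and escaping checks above can be tried without Open XML at all. Below is a minimal C# sketch, assuming the intended replacement is <<Client>> as stated in the original report; the helper names PlainReplace and SerializeAsText are hypothetical. PlainReplace runs the pattern with System.Text.RegularExpressions on the plain paragraph text, and SerializeAsText shows how LINQ to XML escapes the replacement when it is written into a w:t element.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helpers for checking the pattern and the replacement text in isolation.
public static class ReplacementChecks
{
    // Check 1: run the same pattern with plain .NET Regex, outside Open XML,
    // to confirm that it only matches "IRCC".
    public static string PlainReplace(string input, string pattern, string replacement) =>
        new Regex(pattern).Replace(input, replacement);

    // Check 2: serialize the replacement inside a w:t element.
    // LINQ to XML escapes the reserved character '<' (and '>') in text content.
    public static string SerializeAsText(string text)
    {
        XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        return new XElement(w + "t", text).ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        const string input = "Figure 4. Word and other agencies like IRCC look this for strategy and technical solutions.";
        Console.WriteLine(PlainReplace(input, "IRCC", "<<Client>>"));
        Console.WriteLine(SerializeAsText("<<Client>>"));
    }
}
```

If the plain Regex output is correct but the document output is not, the problem lies in how the replacement is applied to the markup rather than in the pattern itself.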

ThomasBarnekow, Feb 09 '23 11:02

@kkurhekar10, could you please test your code with a different replacement text that does not contain any characters that are reserved in XML? For example, use "TEST".

ThomasBarnekow, Feb 09 '23 13:02