Unable to convert HTML to JSON using PowerShell when the HTML contains nested tables
I have a drift report from Microsoft 365 DSC in HTML format that I want to convert to JSON. I have written a PowerShell function to do the conversion, but it cannot parse nested tables: it works fine when there are no nested tables, but fails as soon as one HTML table contains another. I am using a regular expression like this:
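```powershell
# -Raw reads the whole file as one string so the match can span line breaks
$rawdata = Get-Content "$htmlpath\DriftReport.html" -Raw
# (?s) lets '.' match newlines; note the lazy (.*?) ends at the FIRST </table>
[regex]$regex = '(?s)<table.*?>(.*?)</table>'
$_tables = [regex]::Matches($rawdata, $regex).Groups.Value | Where-Object -Property Length -GT 1000
$_tables = $_tables | Where-Object { $_ -match "Property" }
# Get-Unique only drops adjacent duplicates; Select-Object -Unique works on unsorted input
$_tables = $_tables | Select-Object -Unique
```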
How can this be fixed? Any suggestions are appreciated.
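A likely reason for the failure: a lazy pattern such as `<table.*?>(.*?)</table>` stops at the first `</table>` it reaches, and for a nested table that is the inner table's closing tag, so the outer table is cut in half. Regular expressions in .NET (which PowerShell uses) do offer balancing groups, which can count nested `<table>`/`</table>` pairs. The sketch below assumes the markup is simple enough that `<table` and `</table>` only ever appear as real tags (not inside comments or attribute values); the sample HTML and variable names are invented for illustration.

```powershell
# Hypothetical sample: an outer table with a nested table, followed by a second table.
# In practice the string would come from Get-Content "$htmlpath\DriftReport.html" -Raw.
$outer  = '<table id="outer"><tr><td>A</td><td><table><tr><td>B</td></tr></table></td></tr></table>'
$second = '<table><tr><td>C</td></tr></table>'
$html   = $outer + $second

# Lazy matching ends at the first </table>, i.e. the nested table's closing tag
$lazy = [regex]::Match($html, '(?s)<table.*?>.*?</table>').Value

# .NET balancing groups: (?<open>...) pushes a capture for each nested <table>,
# (?<-open>...) pops one for each </table> (and fails if nothing is open),
# and (?(open)(?!)) rejects the match while a nested table is still unclosed.
$pattern = '(?s)<table\b[^>]*>(?>(?<open><table\b[^>]*>)|(?<-open></table>)|(?!</?table\b).)*(?(open)(?!))</table>'

$topLevel = @([regex]::Matches($html, $pattern) | ForEach-Object { $_.Value })
$topLevel.Count          # 2: one string per top-level table
$topLevel[0] -eq $outer  # True: the nested table stays inside the outer match
$lazy -eq $outer         # False: the lazy match is truncated
```

Each string in `$topLevel` still contains its nested tables as raw markup, so turning them into nested JSON still requires walking into each cell.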
If you have the original files from which the drift report was created, you can simply generate the export in JSON format instead of HTML. Otherwise, you'd need to go recursively over the content of each table and check whether an element itself contains another table.
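A minimal sketch of that idea, assuming the report's tables are reasonably well-formed (every `<table>`, `<tr>`, `<td>` and `<th>` is explicitly closed). It uses an explicit stack instead of recursion, which has the same effect: a `<table>` that opens while a cell is open becomes a child of that cell. `ConvertFrom-HtmlTable` and the output shape (`Kind`, `Rows`, `Cells`, `Text`, `Tables`) are hypothetical names, not part of Microsoft 365 DSC, and the sample HTML is invented.

```powershell
# Hypothetical helper, not part of Microsoft365DSC. Assumes every <table>, <tr>,
# <td> and <th> is explicitly closed; <thead>/<tbody>, attributes and
# rowspan/colspan are ignored.
function ConvertFrom-HtmlTable {
    param([Parameter(Mandatory)][string]$Html)

    $tagPattern = '(?i)<(/?)(table|tr|td|th)\b[^>]*>'
    $stack  = [System.Collections.Generic.Stack[object]]::new()
    $tables = [System.Collections.Generic.List[object]]::new()  # top-level tables only
    $pos = 0

    foreach ($m in [regex]::Matches($Html, $tagPattern)) {
        # Text since the previous tag belongs to the innermost open cell, if any
        if ($stack.Count -gt 0 -and $stack.Peek().Kind -eq 'cell') {
            $cell = $stack.Peek()
            $cell.Text += $Html.Substring($pos, $m.Index - $pos)
        }
        $pos = $m.Index + $m.Length

        if ($m.Groups[1].Value -eq '/') {
            # Closing tag: pop the element; for cells, strip inline tags and decode entities
            if ($stack.Count -gt 0) {
                $closed = $stack.Pop()
                if ($closed.Kind -eq 'cell') {
                    $plain = $closed.Text -replace '<[^>]+>', ' ' -replace '\s+', ' '
                    $closed.Text = [System.Net.WebUtility]::HtmlDecode($plain).Trim()
                }
            }
            continue
        }

        switch ($m.Groups[2].Value.ToLowerInvariant()) {
            'table' {
                $node = [pscustomobject]@{ Kind = 'table'; Rows = [System.Collections.Generic.List[object]]::new() }
                # A table opened inside a cell is a nested table of that cell
                if ($stack.Count -gt 0 -and $stack.Peek().Kind -eq 'cell') { $stack.Peek().Tables.Add($node) }
                else { $tables.Add($node) }
            }
            'tr' {
                $node = [pscustomobject]@{ Kind = 'row'; Cells = [System.Collections.Generic.List[object]]::new() }
                $stack.Peek().Rows.Add($node)
            }
            default {  # td or th
                $node = [pscustomobject]@{ Kind = 'cell'; Text = ''; Tables = [System.Collections.Generic.List[object]]::new() }
                $stack.Peek().Cells.Add($node)
            }
        }
        $stack.Push($node)
    }
    $tables
}

# Hypothetical sample: a property row whose value cell holds a nested table
$html = '<table><tr><th>Property</th><td><table><tr><td>Name</td><td>Value &amp; more</td></tr></table></td></tr></table>'
$parsed = @(ConvertFrom-HtmlTable -Html $html)
# -Depth must exceed the nesting level; the default depth of 2 would truncate the output
ConvertTo-Json -InputObject $parsed -Depth 20
```

A cell's own `Text` excludes the text of any table nested inside it, which keeps the JSON hierarchy aligned with the HTML hierarchy; a real drift report will probably need extra handling for whatever markup it places inside cells.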
We can only provide support for things we develop; a custom conversion from HTML to JSON is not something we can do.
Hi @FabienTschanz, thanks for your response. I understand that you cannot support this, but can you suggest a way to convert complex/nested HTML tables to JSON in PowerShell, either with a regular expression or with some other option?
I'm really not sure. Since I don't know the specifics of the HTML content very well myself, I'd suggest feeding its content to ChatGPT or another AI assistant together with the code you have, and letting it create a nested version for you. That is how I would approach it. The HTML report is really not made to be reversible, so if you can fetch the original files the report was generated from, that would be the best option.
Can we close this issue?
Closing the issue.