InMemoryDatasets.jl

HTML serialization of dataset produces invalid HTML

Open ufechner7 opened this issue 1 year ago • 3 comments

Example:

using InMemoryDatasets

const HTML_HEADER = """
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
"""
const HTML_FOOTER = """
</body>
</html>
"""

function save_summary(ds)
    fullname = "summary.html"
    # The do-block closes the file even if show throws.
    open(fullname, "w") do file
        println(file, HTML_HEADER)
        show(file, "text/html", ds)
        println(file, HTML_FOOTER)
    end
end

ds = Dataset(var1 = [1, 2, 3],
                var2 = [1.2, 0.5, 3.3],
                var3 = ["C1", "C2", "C3"])

save_summary(ds)

Content of the file summary.html after running this script:

<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>

<table class="data-set"><thead><tr><th></th><th>var1</th><th>var2</th><th>var3</th></tr><th></th><th>identity</th><th>identity</th><th>identity</th></tr><tr><th></th><th title="Union{Missing, Int64}">Int64?</th><th title="Union{Missing, Float64}">Float64?</th><th title="Union{Missing, String}">String?</th></tr></thead><tbody><p>3 rows × 3 columns</p><tr><th>1</th><td>1</td><td>1.2</td><td>C1</td></tr><tr><th>2</th><td>2</td><td>0.5</td><td>C2</td></tr><tr><th>3</th><td>3</td><td>3.3</td><td>C3</td></tr></tbody></table></body>
</html>

Checking if the file is correct using the program tidy:

ufechner@ubuntu:~/repos/Dataset2Html/data$ tidy -e summary.html 
line 8 column 89 - Warning: missing <tr>
line 8 column 328 - Warning: <p> isn't allowed in <tbody> elements
line 8 column 321 - Info: <tbody> previously mentioned
Info: Document content looks like HTML5
Tidy found 2 warnings and 0 errors!


About HTML Tidy: https://github.com/htacg/tidy-html5
Bug reports and comments: https://github.com/htacg/tidy-html5/issues
Official mailing list: https://lists.w3.org/Archives/Public/public-htacg/
Latest HTML specification: http://dev.w3.org/html5/spec-author-view/
Validate your HTML documents: http://validator.w3.org/nu/
Lobby your company to join the W3C: http://www.w3.org/Consortium

This is not only a theoretical problem; this HTML fails to render with some renderers.
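Both tidy warnings can also be seen without tidy. The following is a minimal sketch using only Base Julia string functions. It is a plain substring count on the table markup copied from summary.html above, not an HTML validator, and the variable names are made up for illustration.

```julia
# Table markup copied verbatim from the summary.html shown above.
html = """<table class="data-set"><thead><tr><th></th><th>var1</th><th>var2</th><th>var3</th></tr><th></th><th>identity</th><th>identity</th><th>identity</th></tr><tr><th></th><th title="Union{Missing, Int64}">Int64?</th><th title="Union{Missing, Float64}">Float64?</th><th title="Union{Missing, String}">String?</th></tr></thead><tbody><p>3 rows × 3 columns</p><tr><th>1</th><td>1</td><td>1.2</td><td>C1</td></tr><tr><th>2</th><td>2</td><td>0.5</td><td>C2</td></tr><tr><th>3</th><td>3</td><td>3.3</td><td>C3</td></tr></tbody></table>"""

# Count literal (non-overlapping) occurrences of the row tags.
n_open  = count("<tr>", html)
n_close = count("</tr>", html)

# Take everything from <tbody> through </tbody> and look for a <p> inside it.
body_start = first(findfirst("<tbody>", html))
body_stop  = last(findfirst("</tbody>", html))
p_in_tbody = occursin("<p>", html[body_start:body_stop])

println("<tr> opened: ", n_open, ", closed: ", n_close)
println("<p> inside <tbody>: ", p_in_tbody)
```

For the markup above this reports 5 opening and 6 closing row tags, and it finds the <p> inside <tbody>, which matches the two warnings from tidy.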

ufechner7, Oct 03 '22 10:10

The first warning is a bug, and I am fixing it. For the second warning, however, I still need to learn how to fix it; PrettyTables version 2 will probably resolve it.

PS: We will move the display functionality to PrettyTables version 2 soon. We haven't moved to the new version of the package yet because the change will be breaking, so a little more study would be a good idea.

sl-solution, Oct 03 '22 11:10

In case it helps, the program tidy fixes the output into this form:

<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<title></title>
</head>
<body>
<p>3 rows × 3 columns</p>
<table class="data-set">
<thead>
<tr>
<th></th>
<th>var1</th>
<th>var2</th>
<th>var3</th>
</tr>
<tr>
<th></th>
<th>identity</th>
<th>identity</th>
<th>identity</th>
</tr>
<tr>
<th></th>
<th title="Union{Missing, Int64}">Int64?</th>
<th title="Union{Missing, Float64}">Float64?</th>
<th title="Union{Missing, String}">String?</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>1</td>
<td>1.2</td>
<td>C1</td>
</tr>
<tr>
<th>2</th>
<td>2</td>
<td>0.5</td>
<td>C2</td>
</tr>
<tr>
<th>3</th>
<td>3</td>
<td>3.3</td>
<td>C3</td>
</tr>
</tbody>
</table>
</body>
</html>

ufechner7, Oct 03 '22 12:10

This is helpful. If I understand it correctly, it suggests moving the <p>..</p> out of the table.

sl-solution, Oct 03 '22 12:10
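Until this is fixed in the package, one stopgap is to post-process the generated fragment along the lines of what tidy does. The sketch below is a workaround under stated assumptions, not the maintainers' fix: `repair_table_html` is a hypothetical helper, and it relies on the output having exactly the shape shown in this issue (a single <p>..</p> summary, and a header row that loses its <tr> right after a </tr>).

```julia
# Table markup copied verbatim from the summary.html in the original report.
html = """<table class="data-set"><thead><tr><th></th><th>var1</th><th>var2</th><th>var3</th></tr><th></th><th>identity</th><th>identity</th><th>identity</th></tr><tr><th></th><th title="Union{Missing, Int64}">Int64?</th><th title="Union{Missing, Float64}">Float64?</th><th title="Union{Missing, String}">String?</th></tr></thead><tbody><p>3 rows × 3 columns</p><tr><th>1</th><td>1</td><td>1.2</td><td>C1</td></tr><tr><th>2</th><td>2</td><td>0.5</td><td>C2</td></tr><tr><th>3</th><td>3</td><td>3.3</td><td>C3</td></tr></tbody></table>"""

# Hypothetical helper (not part of InMemoryDatasets): repairs the two defects
# by plain string replacement, assuming the output shape shown in this issue.
function repair_table_html(html::AbstractString)
    fixed = String(html)
    # Move the first <p>...</p> from inside the table to just before <table>.
    m = match(r"<p>.*?</p>", fixed)
    if m !== nothing
        note = String(m.match)
        fixed = replace(fixed, note => ""; count = 1)
        fixed = replace(fixed, "<table" => note * "<table"; count = 1)
    end
    # Re-open any header row that starts directly after a closing </tr>.
    return replace(fixed, "</tr><th>" => "</tr><tr><th>")
end

fixed = repair_table_html(html)
println(fixed)
```

In the script from the report, the fragment could be captured with sprint(show, "text/html", ds) and passed through the helper before it is written to the file. Because this is string surgery rather than real HTML parsing, it is worth re-running tidy -e on the result.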