XLSX.jl
deal with phonetic text
When reading an Excel file that has phonetic hints (ruby) (see sample), each cell.value
contains both the base text and its phonetic text.
You can check this with the following:
# Employment Status Survey / Statistical Tables(Time Series) / Statistical Tables(Time Series) from e-Stat (a portal site for Japanese Government Statistics)
# https://www.e-stat.go.jp/en/stat-search/files?page=1&layout=datalist&toukei=00200532&tstat=000001116777&cycle=0&tclass1=000001116800&stat_infid=000031732265&tclass2val=0
using XLSX, Downloads
f = tempname()
Downloads.download("https://www.e-stat.go.jp/en/stat-search/file-download?statInfId=000031732265&fileKind=0", f)
wb = XLSX.readxlsx(f)
wb[1][:]
This behavior is unusual: openpyxl and pandas do not include phonetic text in cell values.
It could be fixed by changing the gather_strings! function inside the unformatted_text function, like this:
function unformatted_text(el::EzXML.Node) :: String
    function gather_strings!(v::Vector{String}, e::EzXML.Node)
        if EzXML.nodename(e) == "t"
            push!(v, EzXML.nodecontent(e))
        end
        for ch in EzXML.eachelement(e)
            if EzXML.nodename(e) != "rPh" ## !!HERE!!
                gather_strings!(v, ch)
            end
        end
        nothing
    end
    ...
This change seems reasonable: phonetic text is stored in "rPh" elements, which hold it in a child "t" element, so gather_strings! does not need to descend into the children of an "rPh" node.
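
The effect can be seen on a single shared-string item without downloading a workbook. The following is only a minimal sketch, assuming EzXML is installed; `collect_text` and its `skip_phonetic` keyword are hypothetical names made up for illustration (they are not part of XLSX.jl), and the XML fragment is a hand-written example of an `si` element with one phonetic run.

```julia
using EzXML

# Hypothetical helper (not part of XLSX.jl): join the content of every "t"
# element under `e`, optionally skipping phonetic runs ("rPh" elements).
function collect_text(e::EzXML.Node; skip_phonetic::Bool=true)
    parts = String[]
    function walk(n::EzXML.Node)
        if EzXML.nodename(n) == "t"
            push!(parts, EzXML.nodecontent(n))
        end
        for ch in EzXML.eachelement(n)
            # Do not descend into phonetic runs when skip_phonetic is set.
            if skip_phonetic && EzXML.nodename(ch) == "rPh"
                continue
            end
            walk(ch)
        end
    end
    walk(e)
    return join(parts)
end

# Hand-written shared-string item: base text plus its phonetic reading.
si = EzXML.root(EzXML.parsexml("""
<si>
  <t>東京</t>
  <rPh sb="0" eb="2"><t>トウキョウ</t></rPh>
  <phoneticPr fontId="1"/>
</si>
"""))

with_ruby = collect_text(si; skip_phonetic=false)  # base text followed by the reading
without_ruby = collect_text(si)                    # base text only
```

With `skip_phonetic=false` the reading is appended to the base text, which mirrors what is currently seen in cell.value; with the default, only the base text remains.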