protege
Protege OWLDoc cannot save Chinese class names correctly
Hello all,
I'm looking for help here. I created an ontology in Protege with class names in Chinese characters. Protege is the latest version, 5.6.2, on Mac OS.
After I use OWLDoc to generate the documentation, the classes folder shows the class file names in non-Chinese characters. One sample looks like this:
%E4%B8%8A%E5%90%90%E4%B8%8B%E6%B3%BB___126541734.html
The home page displays the class names in Chinese correctly, and the URL links to the classes are in Chinese characters, but since the actual file names are not in Chinese, the navigation does not work.
Is this due to something like a UTF-8 setting for Chinese characters? I couldn't find a way to configure that. Please help.
Thanks, Xiaoqi
The problem seems to be caused by Chinese characters in the IRI, not in the label. I cannot reproduce the problem at all if I use Chinese characters only in class labels, but I do reproduce it as soon as I use them also in class IRIs.
Can you confirm your ontology is using Chinese characters in IRIs and not only in labels?
When non-ASCII characters are present in an IRI, the OWLDoc plugin converts the IRI into a URI by encoding all the non-ASCII characters with the “percent-encoding” method (leading to those '%E4%B8...' strings). The percent-encoded URI is then written in the generated HTML files as the target of a link:
<li class="asserted"><a href="TEST_%E8%89%BE%E5%85%8B%E8%88%87%E8%92%82%E5%A8%9C___1344495863.html" class='Class' title="http://purl.obolibrary.org/obo/TEST_艾克與蒂娜">艾克與蒂娜</a></li>
So far, so good: that’s the expected behaviour.
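As an illustration of the percent-encoding step described above, here is a minimal Python sketch. It is not the plugin's code (OWLDoc is a Java plugin); the helper name `percent_encode_local_name` is hypothetical, and the sketch only assumes that the encoding is standard UTF-8 percent-encoding, which matches the strings seen in this thread.

```python
from urllib.parse import quote


def percent_encode_local_name(name: str) -> str:
    """Percent-encode a name: each non-ASCII character becomes its UTF-8 bytes as %XX.

    Hypothetical helper for illustration only; ASCII letters, digits and
    the characters _ . - ~ are left unchanged by urllib.parse.quote.
    """
    return quote(name, safe="", encoding="utf-8")


encoded = percent_encode_local_name("TEST_艾克與蒂娜")
print(encoded)  # TEST_%E8%89%BE%E5%85%8B%E8%88%87%E8%92%82%E5%A8%9C
```

Each Chinese character takes three bytes in UTF-8, which is why every character turns into three `%XX` groups.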
But then, the HTML file for that class is written under a filename which is also percent-encoded (i.e., TEST_%E8%89%BE%E5%85%8B%E8%88%87%E8%92%82%E5%A8%9C___1344495863.html instead of TEST_艾克與蒂娜___1344495863.html). That I believe is incorrect, and is why the browser cannot find the file.
When the browser reads TEST_%E8%89%BE%E5%85%8B%E8%88%87%E8%92%82%E5%A8%9C___1344495863.html in the href attribute of a link, it decodes the percent-encoded URI back into the original IRI (TEST_艾克與蒂娜___1344495863.html), and then looks for precisely that filename; it does not look for a percent-encoded filename.
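The mismatch can be reproduced outside Protege with a short Python sketch. It is a simplified model, not a browser: the hypothetical `resolve_href` function just decodes the percent-escapes and joins the result to the folder, which is assumed here to be close enough to how a local file link is resolved. The temporary folder stands in for the generated classes folder.

```python
import tempfile
from pathlib import Path
from urllib.parse import unquote


def resolve_href(base_dir: Path, href: str) -> Path:
    """Simplified model of link resolution: decode %XX escapes, then join to the folder."""
    return base_dir / unquote(href)


href = "TEST_%E8%89%BE%E5%85%8B%E8%88%87%E8%92%82%E5%A8%9C___1344495863.html"

with tempfile.TemporaryDirectory() as tmp:
    classes = Path(tmp)

    # File named as reported in this thread: the percent signs are literally in the name.
    (classes / href).write_text("<html></html>", encoding="utf-8")
    target = resolve_href(classes, href)
    found_before = target.exists()

    # File named with the decoded (Chinese) name: this is what the link resolves to.
    (classes / unquote(href)).write_text("<html></html>", encoding="utf-8")
    found_after = target.exists()

print(target.name)
print("percent-encoded filename only:", found_before)
print("decoded filename present:", found_after)
```

In this model the link only resolves once a file with the decoded name exists, which is consistent with the broken navigation described above.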
Bottom line is that this looks like a bug in the OWLDoc plugin, not a configuration problem on your side.
Hi @gouttegd,
Thanks a lot for your quick support, it's exactly the error I'm facing.
Yes, when I create the class name in Chinese characters, the IRI also uses the Chinese characters. Below is a sample full IRI for the class "金":
http://www.semanticweb.org/yasen/ontologies/2023/4/medica#金
As you said, the encoding step is as designed, but the URI is not properly decoded back into the original IRI, so the file name with Chinese characters cannot be found in the classes folder.
Good to know this can be fixed as a bug in the OWLDoc plugin.
Thanks again, Xiaoqi
Hi again, I recorded a quick video in Protege on Windows (v5.6.2 as well) to demo this issue, for easier review: https://youtu.be/vEaSQo3h87s. The situation is the same as on Mac OS. Thanks a lot!
Hi there, where can I find out when this bug will be fixed? Thanks.
Well, nobody has worked on the owldoc plugin for the past 7 years, so it’s unlikely to be fixed any time soon.