
import ontology error

Open ahmad88me opened this issue 5 years ago • 11 comments

Hi, I have a test ontology published here.

It works with curl

curl -L -H "Accept: application/rdf+xml" https://w3id.org/def/geo123

But I cannot import it with Protege (using the link https://w3id.org/def/geo123).

Thanks,

ahmad88me avatar May 17 '19 08:05 ahmad88me

Using the OWL API outside of Protege, I get the same set of parse errors. It looks like this issue isn't caused by Protege-specific settings.

When parsing with the OWL API, the correct Accept headers are set. However, the final stream that is opened contains an HTML document. This is the first line of the document: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">. The OWL API does seem to follow an initial redirect. Continuing to investigate.
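A quick way to confirm this kind of symptom is to look at the first bytes of whatever the final stream returns. The sketch below is illustrative only: `looks_like_html` and `looks_like_rdf_xml` are hypothetical helper names, not part of the OWL API, and the checks are rough heuristics rather than real parsing.

```python
def looks_like_html(payload: bytes) -> bool:
    """Heuristic: True if the payload starts like an HTML document."""
    head = payload.lstrip()[:256].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def looks_like_rdf_xml(payload: bytes) -> bool:
    """Heuristic: True if the payload starts like an XML / RDF/XML document."""
    head = payload.lstrip()[:1024]
    return head.startswith(b"<?xml") or b"<rdf:RDF" in head


# First line reported for the stream the parser actually received.
BAD = b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">'
# Start of the document that curl ends up with.
GOOD = b'<?xml version="1.0"?>\n<rdf:RDF xmlns="http://geo.linkeddata.es/ontology/"'
```

If `looks_like_html` is true for the bytes handed to the parser, the parse errors are probably about the redirect page body rather than about the ontology itself.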

cc @ignazio1977

matthewhorridge avatar May 17 '19 16:05 matthewhorridge

Verbose curl output, which shows multiple redirects. I think this is causing problems for the OWL API.

*   Trying 138.100.15.128...
* TCP_NODELAY set
* Connected to ontoology.linkeddata.es (138.100.15.128) port 80 (#0)
> GET /publish/geo123 HTTP/1.1
> Host: ontoology.linkeddata.es
> User-Agent: curl/7.54.0
> Accept: application/rdf+xml, application/xml; q=0.7, text/xml; q=0.6
> 
< HTTP/1.1 301 Moved Permanently
< Date: Fri, 17 May 2019 16:51:39 GMT
< Server: Apache
< Location: http://ontoology.linkeddata.es/publish/geo123/
< Content-Length: 246
< Content-Type: text/html; charset=iso-8859-1
< 
* Ignoring the response-body
* Connection #0 to host ontoology.linkeddata.es left intact
* Issue another request to this URL: 'http://ontoology.linkeddata.es/publish/geo123/'
* Found bundle for host ontoology.linkeddata.es: 0x7f8cde0015a0 [can pipeline]
* Re-using existing connection! (#0) with host ontoology.linkeddata.es
* Connected to ontoology.linkeddata.es (138.100.15.128) port 80 (#0)
> GET /publish/geo123/ HTTP/1.1
> Host: ontoology.linkeddata.es
> User-Agent: curl/7.54.0
> Accept: application/rdf+xml, application/xml; q=0.7, text/xml; q=0.6
> 
< HTTP/1.1 303 See Other
< Date: Fri, 17 May 2019 16:51:39 GMT
< Server: Apache
< Location: https://ahmad88me.github.io/demo/OnToology/geolinkeddata.owl/documentation/doc/ontology.xml
< Content-Length: 298
< Content-Type: text/html; charset=iso-8859-1
< 
* Ignoring the response-body
* Connection #0 to host ontoology.linkeddata.es left intact
* Issue another request to this URL: 'https://ahmad88me.github.io/demo/OnToology/geolinkeddata.owl/documentation/doc/ontology.xml'
*   Trying 185.199.108.153...
* TCP_NODELAY set
* Connected to ahmad88me.github.io (185.199.108.153) port 443 (#1)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=www.github.com
*  start date: Jun 27 00:00:00 2018 GMT
*  expire date: Jun 20 12:00:00 2020 GMT
*  subjectAltName: host "ahmad88me.github.io" matched cert's "*.github.io"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7f8cdf800400)
> GET /demo/OnToology/geolinkeddata.owl/documentation/doc/ontology.xml HTTP/2
> Host: ahmad88me.github.io
> User-Agent: curl/7.54.0
> Accept: application/rdf+xml, application/xml; q=0.7, text/xml; q=0.6
> 
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 200 
< server: GitHub.com
< content-type: application/xml
< last-modified: Fri, 10 May 2019 15:40:09 GMT
< etag: "5cd59b59-87a"
< access-control-allow-origin: *
< expires: Fri, 17 May 2019 16:22:48 GMT
< cache-control: max-age=600
< x-github-request-id: C464:35DE:D41356:E2B863:5CDEDD7F
< accept-ranges: bytes
< date: Fri, 17 May 2019 16:51:40 GMT
< via: 1.1 varnish
< age: 0
< x-served-by: cache-pao17439-PAO
< x-cache: HIT
< x-cache-hits: 1
< x-timer: S1558111900.024108,VS0,VE24
< vary: Accept-Encoding
< x-fastly-request-id: 919cc428c290c570390a2e2d4d867613dbec66e2
< content-length: 2170
< 
<?xml version="1.0"?>
<rdf:RDF xmlns="http://geo.linkeddata.es/ontology/"
     xml:base="http://geo.linkeddata.es/ontology/"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:adms="http://www.w3.org/ns/adms#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:skos="http://www.w3.org/2004/02/skos/core#"
     xmlns:vann="http://purl.org/vocab/vann/"
     xmlns:scovo="http://purl.org/NET/scovo#"
     xmlns:terms="http://purl.org/dc/terms/"
     xmlns:ontology="http://geo.linkeddata.es/ontology/"
     xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#">
    <owl:Ontology rdf:about="http://geo.linkeddata.es/ontology/">
        <owl:imports rdf:resource="http://geo.linkeddata.es/ontology/geopolitica.owl"/>
        <owl:imports rdf:resource="http://geo.linkeddata.es/ontology/hydro.owl"/>
        <owl:imports rdf:resource="http://geo.linkeddata.es/ontology/transportes.owl"/>
        <terms:contributor rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://delicias.dia.fi.upm.es/members/DGarijo/#me</terms:contributor>
        <terms:creator>Luis Manuel Vilches-Blázquez</terms:creator>
        <terms:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</terms:license>
        <terms:publisher xml:lang="en">Ontology Engineering Group</terms:publisher>
        <terms:title xml:lang="es">Red de ontologías geo.linkeddata.es</terms:title>
        <vann:preferredNamespacePrefix>geoes</vann:preferredNamespacePrefix>
        <vann:preferredNamespaceUri>http://geo.linkeddata.es/ontology/</vann:preferredNamespaceUri>
        <rdfs:comment xml:lang="en">Geo.linkeddata.es ontology network</rdfs:comment>
        <rdfs:comment xml:lang="es">Red de ontologías geo.linkeddata.es</rdfs:comment>
        <owl:versionInfo>v0.1</owl:versionInfo>
    </owl:Ontology>
</rdf:RDF>



<!-- Generated by the OWL API (version 5.1.7) https://github.com/owlcs/owlapi/ -->


* Connection #1 to host ahmad88me.github.io left intact
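To make the chain easier to read, the hops can be pulled out of verbose output like the above. This is a minimal sketch: `redirect_chain` and `SAMPLE` are made-up names, and the sample is just a trimmed copy of the response lines from the log.

```python
import re

# Trimmed copy of the response status and Location lines from the log above.
SAMPLE = """\
< HTTP/1.1 301 Moved Permanently
< Location: http://ontoology.linkeddata.es/publish/geo123/
< HTTP/1.1 303 See Other
< Location: https://ahmad88me.github.io/demo/OnToology/geolinkeddata.owl/documentation/doc/ontology.xml
< HTTP/2 200
< content-type: application/xml
"""


def redirect_chain(verbose_log: str):
    """Return one dict per response: its status code and Location (if any)."""
    hops = []
    for raw in verbose_log.splitlines():
        line = raw.strip()
        status = re.match(r"<\s*HTTP/[\d.]+\s+(\d{3})", line)
        if status:
            hops.append({"status": int(status.group(1)), "location": None})
            continue
        location = re.match(r"<\s*location:\s*(\S+)", line, re.IGNORECASE)
        if location and hops:
            hops[-1]["location"] = location.group(1)
    return hops


if __name__ == "__main__":
    for hop in redirect_chain(SAMPLE):
        print(hop["status"], hop["location"] or "")
```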

matthewhorridge avatar May 17 '19 16:05 matthewhorridge

Trying with okhttp, I get HTML back on http://aims.fao.org/aos/geopolitical.owl.

This returns HTML on curl as well.

With OWLAPI 6, the error is different from OWLAPI 4 - OWLAPI 4 cannot parse the first ontology, while OWLAPI 6 gets stuck on the above link, so there's likely an issue in the version 4 code; however, I'm not sure whether the error on http://aims.fao.org/aos/geopolitical.owl is legitimate or not.
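Since the failure may sit in an imported document rather than in the root ontology, it can help to list the `owl:imports` targets and fetch each one separately. A rough sketch, assuming the document is RDF/XML; `list_imports` is a hypothetical helper and `SAMPLE_RDF` is a cut-down version of the ontology header shown earlier in this thread.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Cut-down header of the test ontology (only the import statements are kept).
SAMPLE_RDF = """\
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <owl:Ontology rdf:about="http://geo.linkeddata.es/ontology/">
        <owl:imports rdf:resource="http://geo.linkeddata.es/ontology/geopolitica.owl"/>
        <owl:imports rdf:resource="http://geo.linkeddata.es/ontology/hydro.owl"/>
        <owl:imports rdf:resource="http://geo.linkeddata.es/ontology/transportes.owl"/>
    </owl:Ontology>
</rdf:RDF>
"""


def list_imports(rdf_xml):
    """Return the IRIs named by owl:imports elements, in document order."""
    root = ET.fromstring(rdf_xml)
    resource_attr = f"{{{RDF}}}resource"
    return [
        el.attrib[resource_attr]
        for el in root.iter(f"{{{OWL}}}imports")
        if resource_attr in el.attrib
    ]
```

Each returned IRI can then be requested on its own (for example with curl and the same Accept header) to see which one answers with HTML or an error status.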

ignazio1977 avatar May 18 '19 05:05 ignazio1977

Looks like a plain 404

*   Trying 54.72.210.230...
* TCP_NODELAY set
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to aims.fao.org (54.72.210.230) port 80 (#0)
> GET /aos/geopolitical.owl HTTP/1.1
> Host: aims.fao.org
> User-Agent: curl/7.52.1
> Accept: application/rdf+xml
>
< HTTP/1.1 404 Not Found
< Date: Sat, 18 May 2019 05:43:11 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Server: Apache/2.4.39 (Amazon) PHP/7.2.17
< X-Content-Type-Options: nosniff
< X-Powered-By: PHP/7.2.17
< X-Drupal-Cache: HIT
< Etag: "1558157778-0"
< Content-Language: en
< X-Frame-Options: SAMEORIGIN
< X-Generator: Drupal 7 (http://drupal.org)
< Cache-Control: public, max-age=300
< Last-Modified: Sat, 18 May 2019 05:36:18 GMT
< Expires: Sun, 19 Nov 1978 05:00:00 GMT
< Vary: Cookie,Accept-Encoding
<

ignazio1977 avatar May 18 '19 05:05 ignazio1977

Hello, the problem also occurs for other ontologies:

https://w3id.org/def/saref4agri
https://w3id.org/def/saref4city
https://w3id.org/def/easytv

In all of them there is a double redirection: a 302 from w3id to the server where content negotiation happens, and then a 303 that depends on the requested format (HTML, TTL, etc.). Curl works fine, and the final redirected URL loads in Protege as well.
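The double redirection described here can be reproduced step by step by following each hop manually and re-sending the same Accept header every time. Below is a minimal sketch using only the Python standard library. `resolve`, `http_fetch` and the `fetch` parameter are hypothetical names for illustration, and this is not how the OWL API itself resolves documents.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def http_fetch(url, headers):
    """Fetch one URL without following redirects: (status, headers, body)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request, timeout=30) as resp:
            found = {k.lower(): v for k, v in resp.headers.items()}
            return resp.status, found, resp.read()
    except urllib.error.HTTPError as err:
        found = {k.lower(): v for k, v in err.headers.items()}
        return err.code, found, err.read()


def resolve(url, accept="application/rdf+xml", fetch=http_fetch, max_hops=10):
    """Follow redirects by hand, sending the same Accept header on every hop.

    Returns (chain, content_type, body), where chain is a list of
    (status, url) pairs in the order they were requested.
    """
    chain = []
    for _ in range(max_hops):
        status, headers, body = fetch(url, {"Accept": accept})
        chain.append((status, url))
        if status in REDIRECT_CODES and "location" in headers:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, headers["location"])
            continue
        return chain, headers.get("content-type"), body
    raise RuntimeError("too many redirects")
```

Calling `resolve(url)` on one of the URLs above and printing the returned chain shows each status code alongside the URL that produced it. The `fetch` parameter exists so the hop logic can be exercised with canned responses instead of real network calls.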

dgarijo avatar Jun 10 '19 13:06 dgarijo

Thanks for the extra details @dgarijo. Much appreciated.

matthewhorridge avatar Jun 10 '19 14:06 matthewhorridge

Hello, I'm encountering the same type of error with some ontologies that include only a single redirection (see for example: http://ontology.eil.utoronto.ca/5087/Activity/1.2/). In this case, a 301 redirect is used but I've also tried with a 302 with the same result. Curl works fine and the ontologies can be loaded with Protege without issue (just not imported).

megankatsumi avatar Oct 23 '20 16:10 megankatsumi

This appears related to https://github.com/owlcs/owlapi/issues/954, which is fixed for version 4 in 4.5.18

ignazio1977 avatar Oct 23 '20 22:10 ignazio1977

My apologies, I should have clarified that I am using Protege 5.5.0.

In addition, in this particular instance I don't think there are multiple http redirects (see excerpt of curl output below):

*   Trying 128.100.48.242...
* TCP_NODELAY set
* Connected to ontology.eil.utoronto.ca (128.100.48.242) port 80 (#0)
> GET /5087/Activity/1.2/ HTTP/1.1
> Host: ontology.eil.utoronto.ca
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Date: Fri, 23 Oct 2020 16:44:07 GMT
< Server: Apache
< Location: https://enterpriseintegrationlab.github.io/icity/5087/Activity/Activity_1.2.owl
< Content-Length: 287
< Content-Type: text/html; charset=iso-8859-1
< 
* Ignoring the response-body
* Connection #0 to host ontology.eil.utoronto.ca left intact
* Issue another request to this URL: 'https://enterpriseintegrationlab.github.io/icity/5087/Activity/Activity_1.2.owl'
*   Trying 185.199.108.153...
* TCP_NODELAY set
* Connected to enterpriseintegrationlab.github.io (185.199.108.153) port 443 (#1)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=www.github.com
*  start date: May  6 00:00:00 2020 GMT
*  expire date: Apr 14 12:00:00 2022 GMT
*  subjectAltName: host "enterpriseintegrationlab.github.io" matched cert's "*.github.io"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7f9f03809200)
> GET /icity/5087/Activity/Activity_1.2.owl HTTP/2
> Host: enterpriseintegrationlab.github.io
> User-Agent: curl/7.64.1
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 200 
< server: GitHub.com
< content-type: application/rdf+xml
< strict-transport-security: max-age=31556952
< last-modified: Mon, 19 Oct 2020 18:29:14 GMT
< etag: "5f8ddafa-aa3a"
< access-control-allow-origin: *
< expires: Fri, 23 Oct 2020 13:46:29 GMT
< cache-control: max-age=600
< x-proxy-cache: MISS
< x-github-request-id: 890E:3B1E:12C1884:16D9A8C:5F92DC5D
< accept-ranges: bytes
< date: Fri, 23 Oct 2020 16:44:07 GMT
< via: 1.1 varnish
< age: 0
< x-served-by: cache-mdw17355-MDW
< x-cache: MISS
< x-cache-hits: 0
< x-timer: S1603471447.437002,VS0,VE27
< vary: Accept-Encoding
< x-fastly-request-id: 3e59ac7dc52351586177771dec254b5a616d31a5
< content-length: 43578
< 
<?xml version="1.0"?>
<rdf:RDF xmlns="http://ontology.eil.utoronto.ca/5087/Activity/"
     xml:base="http://ontology.eil.utoronto.ca/5087/Activity/"
     xmlns:WV="http://www.wurvoc.org/vocabularies/WV/"

megankatsumi avatar Oct 26 '20 11:10 megankatsumi

Sorry, I should have been more precise - version 4 referred to OWLAPI version 4, which is used throughout Protege version 5.x.

I've just tried importing https://enterpriseintegrationlab.github.io/icity/5087/Activity/Activity_1.2.owl with the main branch Protege (OWLAPI 4.5.12) and then tried dropping in the 4.5.18 jar. In both cases the loading completes, but with differences in the logs.

OWL API Version: 4.5.12.2019-05-06T20:49:08Z
Imported ontology document http://purl.org/dc/elements/1.1/ was not resolved to any documents defined in the ontology catalog.
[Fatal Error] :9:3: The element type "p" must be terminated by the matching end-tag "</p>".
Failed to load imported ontology at http://purl.org/dc/elements/1.1/
Finished loading https://enterpriseintegrationlab.github.io/icity/5087/Activity/Activity_1.2.owl

While on 4.5.18:

OWL API Version: 4.5.18.2020-10-23T22:28:07Z
Imported ontology document http://purl.org/dc/elements/1.1/ was not resolved to any documents defined in the ontology catalog.
Notice: root element does not have an xml:base. Relative IRIs will be resolved against 
Finished loading imported ontology at http://purl.org/dc/elements/1.1/
Finished loading https://enterpriseintegrationlab.github.io/icity/5087/Activity/Activity_1.2.owl

Using only OWLAPI code, so that bad imports are not silenced, I can replicate the failure; I cannot replicate it with 4.5.18.

It's worth trying a drop-in replacement for your local install while Protege gets updated to use 4.5.18.
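To check which OWL API build a local install is actually running, the "OWL API Version" line in the log can be compared against 4.5.18. A small sketch under that assumption; `owlapi_version` and `has_redirect_fix` are hypothetical names, and the answer is only meaningful for the 4.x line, since that is the only line this thread gives a fix version for.

```python
import re

FIXED_IN = (4, 5, 18)  # version named in this thread as containing the fix


def owlapi_version(log_line):
    """Extract (major, minor, patch) from an 'OWL API Version: ...' log line."""
    match = re.search(r"OWL API Version:\s*(\d+)\.(\d+)\.(\d+)", log_line)
    return tuple(int(part) for part in match.groups()) if match else None


def has_redirect_fix(log_line):
    """True/False for 4.x builds; None when the line is not a 4.x version."""
    version = owlapi_version(log_line)
    if version is None or version[0] != 4:
        return None
    return version >= FIXED_IN
```

A result of `False` suggests the jar swap described above has not taken effect yet.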

ignazio1977 avatar Oct 26 '20 21:10 ignazio1977

Thanks for clarifying @ignazio1977. This worked for me!

megankatsumi avatar Oct 27 '20 18:10 megankatsumi