n-quads is UTF-8, but Blazegraph only supports US-ASCII
According to the IANA record [1], N-Quads is only supposed to be interpreted as UTF-8, but currently POSTing UTF-8 data as N-Quads results in it being interpreted as ASCII. You claim to support the appropriate charset for each format, but N-Quads needs to honor UTF-8.
Encoding considerations: 8bit The syntax of N-Quads is expressed over code points in Unicode. The encoding is always UTF-8. Unicode code points may also be expressed using an \uXXXX (U+0 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-F]
[1] https://www.iana.org/assignments/media-types/application/n-quads
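The escape syntax quoted above suggests one possible client-side workaround: rewrite every non-ASCII code point as \uXXXX or \UXXXXXXXX before posting, so the payload is pure ASCII. The following is only a minimal sketch; the function name escape_non_ascii and the example.org IRIs are made up for illustration, and nothing in this thread says this is the workaround that was actually used. It also assumes non-ASCII characters occur only in IRIs and string literals, since the N-Quads grammar does not allow these escapes in blank node labels.

```python
def escape_non_ascii(nquads: str) -> str:
    """Rewrite non-ASCII code points as N-Quads numeric escapes.

    Hypothetical helper: U+0080..U+FFFF become \\uXXXX and anything
    above U+FFFF becomes \\UXXXXXXXX, as in the IANA record quoted above.
    """
    out = []
    for ch in nquads:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)     # 4 hex digits
        else:
            out.append("\\U%08X" % cp)     # 8 hex digits
    return "".join(out)


# Example statement with one BMP character and one character above U+FFFF.
line = ('<http://example.org/s> <http://example.org/p> '
        '"caf\u00e9 \U0001F600" <http://example.org/g> .')
escaped = escape_non_ascii(line)
payload = escaped.encode("ascii")  # succeeds: no non-ASCII characters remain
print(escaped)
```

The escaped text survives an ASCII-only decoder unchanged, at the cost of being harder to read.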
Jamie, that certainly looks like a bug. Can you work up a PR with a test and a fix? I can point you to the relevant parts of the code if you are unfamiliar with it.
Thanks, Bryan
(In reply to Jamie McCusker's message of Wed, Aug 11, 2021 at 16:03; thread: https://github.com/blazegraph/database/issues/206)
We've worked around it, and will be retiring support for Blazegraph with Whyis 2.0. We will be moving over to Fuseki, which is easier for us to extend and control. We've had ongoing production stability issues with Blazegraph, especially when we push multiple mutations per second. I haven't reported this because it's hard to reproduce and it seemed like Blazegraph was EOL.
We do have other projects that are using Blazegraph, so I'll ask around the lab and see if anyone wants to take this on.
(In reply to Bryan Thompson's message of Thu, Aug 12, 2021 at 9:57 AM.)
-- Jamie McCusker (she/they)
Director, Data Operations, Tetherless World Constellation, Rensselaer Polytechnic Institute
http://tw.rpi.edu
Adding the JVM options -Dfile.encoding=UTF-8 -Dfile.client.encoding=UTF-8 -Dclient.encoding.override=UTF-8 did the trick.
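To picture why a charset default matters here, the sketch below decodes the same UTF-8 bytes twice, once as US-ASCII and once as UTF-8. It is an illustration in Python, not Blazegraph's code: the decode_body helper and the example.org IRIs are hypothetical, and it assumes the server was falling back to an ASCII default when reading the request body. Of the three options in the comment above, file.encoding is the property conventionally used to set the JVM default charset; the other two are reproduced as given.

```python
def decode_body(raw: bytes, default_charset: str) -> str:
    """Decode a request body with the given fallback charset.

    Hypothetical stand-in for a server that decodes with a default
    charset; undecodable bytes become U+FFFD instead of raising.
    """
    return raw.decode(default_charset, errors="replace")


# An N-Quads statement containing one non-ASCII character, sent as UTF-8.
raw = ('<http://example.org/s> <http://example.org/p> '
       '"caf\u00e9" .\n').encode("utf-8")

as_ascii = decode_body(raw, "ascii")   # the two bytes of U+00E9 are lost
as_utf8 = decode_body(raw, "utf-8")    # the literal round-trips intact

print(as_ascii)
print(as_utf8)
```

With the ASCII fallback, each of the two bytes encoding the accented character is replaced separately, so the stored literal no longer matches what the client sent.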