Actor ID encoding
Quick summary
I found an actor whose ID contains Unicode characters: https://xn--y9aai3au2bc2f.xn--y9a3aq/գրառում/author/antranigv/ (nodeinfo). This does not look like a valid actor ID.
The ActivityPub specification says that identifiers must be "Publicly dereferencable URIs": https://www.w3.org/TR/activitypub/#obj-id. URIs can't contain non-ASCII characters, so I assume that these characters should be percent-encoded.
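To illustrate what percent-encoding would look like here, a minimal sketch in plain PHP, using the path segment from the actor ID above. The helper name `has_non_ascii()` is hypothetical and only exists for this example; it is not part of the plugin.

```php
<?php
// Hypothetical helper for illustration: true if the string contains
// any byte outside the printable ASCII range (0x21-0x7E).
function has_non_ascii( string $uri ): bool {
	return 1 === preg_match( '/[^\x21-\x7E]/', $uri );
}

$segment = 'գրառում';                // Path segment taken from the actor ID above.
$encoded = rawurlencode( $segment ); // Each UTF-8 byte becomes a %XX triplet.

var_dump( has_non_ascii( $segment ) ); // bool(true)
var_dump( has_non_ascii( $encoded ) ); // bool(false)
echo $encoded, PHP_EOL;                // %D5%A3%D6%80%D5%A1%D5%BC%D5%B8%D6%82%D5%B4
```

The encoded form is the same one used in the curl command under "Steps to reproduce".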
Steps to reproduce
curl -H "Accept: application/activity+json" https://xn--y9aai3au2bc2f.xn--y9a3aq/%D5%A3%D6%80%D5%A1%D5%BC%D5%B8%D6%82%D5%B4/author/antranigv/
What you expected to happen
Unicode characters in the actor ID should be percent-encoded.
What actually happened
The ID contains unencoded Unicode characters.
Impact
One
Available workarounds?
There is no user impact
Logs or notes
No response
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days.
I don't think this has been fixed
Bump
@pfefferle @obenland ping
@pfefferle What would potentially break if we were to run id and url paths through rawurlencode() to fix this?
# functions.php
/**
 * Percent-encodes the path of a URL so that it is RFC 3986 compliant.
 */
function encode_url_path( $url ) {
	$path = \wp_parse_url( $url, PHP_URL_PATH );
	if ( empty( $path ) ) {
		return \esc_url_raw( $url );
	}

	// Encode each path segment to be RFC 3986 compliant.
	// Decode first so an already-encoded path is not double-encoded.
	$segments     = array_map( 'rawurldecode', explode( '/', $path ) );
	$encoded_path = implode( '/', array_map( 'rawurlencode', $segments ) );

	return \esc_url_raw( str_replace( $path, $encoded_path, $url ) );
}
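One thing that might break is double-encoding: `rawurlencode()` is not idempotent, so a path that is already percent-encoded would have its `%` signs escaped again. A minimal sketch of the difference in plain PHP; `encode_segment_once()` is a hypothetical name used only for this example.

```php
<?php
// Hypothetical helper: encode one path segment, decoding first so the
// result is the same for raw and already-encoded input.
function encode_segment_once( string $segment ): string {
	return rawurlencode( rawurldecode( $segment ) );
}

$raw     = 'գրառում';
$encoded = rawurlencode( $raw );

// Plain rawurlencode() on encoded input turns every "%" into "%25".
var_dump( rawurlencode( $encoded ) === $encoded );        // bool(false)

// Decoding first gives a stable result either way.
var_dump( encode_segment_once( $raw ) === $encoded );     // bool(true)
var_dump( encode_segment_once( $encoded ) === $encoded ); // bool(true)
```

Whether decoding first is acceptable would depend on whether any existing IDs rely on a literal `%` in the path, which I have not checked.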