
File-Editor: Can't open files with multi-byte UTF-8 characters

Open nickjer opened this issue 8 years ago • 7 comments

I am unable to open files that have funky characters in the file name, whether from the FileExplorer or from this internal file explorer:

[screenshot]


nickjer, May 06 '17 17:05

The viewer in FileExplorer works, though, so this seems to affect only opening files in FileEditor.

nickjer, May 06 '17 18:05

This one is turning out to be a little sticky.

Take the example (ノ°Д°)ノ︵ ┻━┻: it is actually causing problems on the Rails end.

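On the system, ） is actually a three-byte character (U+FF09, FULLWIDTH RIGHT PARENTHESIS) represented by %EF%BC%89, but the Ruby renderer is writing it out as a standard close paren ), which doesn't get escaped.

Likewise, ︵ (U+FE35, PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS) is represented by %EF%B8%B5, and it's being rendered as a standard open paren (.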

These are valid characters on the filesystem, and I can open the files in the File Explorer viewer. But because Rails (or maybe the browser) makes some odd substitutions while or before the path is rendered, it isn't encoded properly before it gets passed to the API.

I'll need to pinpoint exactly where the substitution is happening and enforce the proper encoding.
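One quick way to check the two byte sequences above is to decode them and apply Unicode compatibility normalization (NFKC), assuming NFKC is where the substitution comes from (the Addressable discussion further down points that way). This is a minimal sketch using only the Ruby standard library (`URI.decode_www_form_component`, `String#unicode_normalize`); the helper name `decode_and_nfkc` is hypothetical.

```ruby
require "uri"

# Hypothetical helper: decode one percent-encoded character and return its
# code point label, the decoded character, and its NFKC-normalized form.
def decode_and_nfkc(seq)
  char = URI.decode_www_form_component(seq) # raw bytes -> UTF-8 string
  [format("U+%04X", char.ord), char, char.unicode_normalize(:nfkc)]
end

%w[%EF%BC%89 %EF%B8%B5].each do |seq|
  label, char, nfkc = decode_and_nfkc(seq)
  puts "#{seq} is #{label} #{char}, NFKC gives #{nfkc}"
end
```

Both characters carry compatibility decompositions to the ASCII parentheses, which would match the `)` and `(` showing up in the rendered paths.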

brianmcmichael, May 10 '17 20:05

The problem seems to be coming from the OodAppkit files API generator. There may be a setting required for the Addressable gem to handle this appropriately.

irb(main):002:0> p = Pathname.new '(ノ°Д°)ノ︵ ┻━┻'
=> #<Pathname:(ノ°Д°)ノ︵ ┻━┻>
irb(main):003:0> o = OodAppkit.files.api(path: p).to_s
=> "/pun/sys/files/api/v1/fs(%E3%83%8E%C2%B0%D0%94%C2%B0)%E3%83%8E(%20%E2%94%BB%E2%94%81%E2%94%BB"
irb(main):005:0> p.to_s
=> "(ノ°Д°)ノ︵ ┻━┻"
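Comparing the irb output with the input, the generated URL has already lost the distinction between the fullwidth and ASCII parentheses. Assuming the cause is NFKC normalization, a minimal sketch for spotting which file names are exposed is to compare each name with its NFKC form. `nfkc_sensitive?` is a hypothetical helper, not part of OodAppkit.

```ruby
# Hypothetical helper: true when NFKC normalization would change the name,
# i.e. when a URL builder that normalizes could point at a different
# (or nonexistent) file than the one on disk.
def nfkc_sensitive?(name)
  name.unicode_normalize(:nfkc) != name
end

# "\uFF09" is the fullwidth ")" and "\uFE35" the vertical "(" from this thread.
["report.txt", "(ノ°Д°\uFF09ノ\uFE35 ┻━┻"].each do |name|
  puts "#{name}: #{nfkc_sensitive?(name) ? 'affected' : 'safe'}"
end
```

Running the same check over the entries of a directory (for example via `Dir.children`) would list the files a normalizing URL builder cannot address.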

brianmcmichael, May 10 '17 21:05

This appears to affect Addressable 2.5.1:

Addressable::URI.parse('http://www.google.com/(╯°□°)╯︵ ┻━┻').normalize
=> #<Addressable::URI:0x23b0478 URI:http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0)%E2%95%AF(%20%E2%94%BB%E2%94%81%E2%94%BB>

brianmcmichael, May 11 '17 13:05

The problem with Addressable seems intentional.

It's doing the right thing actually. IRIs (unicode-friendly URIs) use unicode normalization form KC to limit phishing. NFKC tends to do perceptual codepoint conversions, like converting '？' to '?'. The solution here is not to normalize the URI if this is causing a problem, or to instead normalize components piecemeal. "http://foo.com/blah%ef%bc%9f" and "http://foo.com/blah%3F" are considered equivalent.

https://github.com/sporkmonger/addressable/issues/8#issuecomment-26674048
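The equivalence in that quote can be reproduced with the standard library by decoding the two percent-encoded path tails and comparing them before and after NFKC. This is a sketch of the idea only, not Addressable's implementation; `same_after_nfkc?` is a hypothetical helper. For a file browser the collapse is exactly the problem: `blah？` and `blah?` are different files on disk but end up with the same normalized URL.

```ruby
require "uri"

# Hypothetical helper: decode two percent-encoded strings and report whether
# they become identical once NFKC normalization is applied.
def same_after_nfkc?(a, b)
  da, db = [a, b].map { |s| URI.decode_www_form_component(s) }
  da.unicode_normalize(:nfkc) == db.unicode_normalize(:nfkc)
end

fullwidth = "blah%ef%bc%9f" # U+FF1F FULLWIDTH QUESTION MARK
ascii     = "blah%3F"       # ASCII "?"

puts URI.decode_www_form_component(fullwidth) == URI.decode_www_form_component(ascii) # false
puts same_after_nfkc?(fullwidth, ascii) # true
```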

@nickjer, is the .normalize call in OodAppkit necessary?

irb(main):003:0> p = Pathname.new "http://www.google.com/(╯°□°)╯︵ ┻━┻"
=> #<Pathname:http://www.google.com/(╯°□°)╯︵ ┻━┻>
irb(main):006:0> v = URI.encode p.to_s
=> "http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0%EF%BC%89%E2%95%AF%EF%B8%B5%20%E2%94%BB%E2%94%81%E2%94%BB"
irb(main):008:0> g = Addressable::URI.parse v
=> #<Addressable::URI:0x24c86f8 URI:http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0%EF%BC%89%E2%95%AF%EF%B8%B5%20%E2%94%BB%E2%94%81%E2%94%BB>
irb(main):009:0> g.normalize
=> #<Addressable::URI:0x24b92e8 URI:http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0)%E2%95%AF(%20%E2%94%BB%E2%94%81%E2%94%BB>
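If the `.normalize` step turns out not to be needed, one option is to percent-encode each path segment byte for byte and never normalize. A minimal sketch, assuming paths are UTF-8 strings with `/` separators; `encode_fs_path` is hypothetical, not an OodAppkit or Addressable method. It uses `ERB::Util.url_encode` because `URI.encode`, used in the session above, was removed in Ruby 3.0.

```ruby
require "erb"

# Hypothetical helper: percent-encode every byte outside the unreserved set
# in each segment, keep the "/" separators, and skip Unicode normalization,
# so U+FF09 stays %EF%BC%89 instead of collapsing to ")".
def encode_fs_path(path)
  path.split("/", -1).map { |segment| ERB::Util.url_encode(segment) }.join("/")
end

puts encode_fs_path("/home/user/(ノ°Д°\uFF09ノ\uFE35 ┻━┻")
```

Unlike the `URI.encode` output above, this also escapes ASCII characters such as `(` as `%28`; both forms are valid, and what matters is that no normalization runs afterward.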

brianmcmichael, May 11 '17 13:05

I say we punt on this and propose a more URL-friendly encoding for file paths that all of our apps would implement for ingest.

One example is Base64 (https://ruby-doc.org/stdlib-2.2.0/libdoc/base64/rdoc/Base64.html), in particular its urlsafe option.
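A minimal sketch of that proposal: carry the whole path as a single URL-safe Base64 token, so no percent-encoding or URI normalization ever touches the original bytes. The helpers `encode_path_token` and `decode_path_token` are hypothetical; they use `Array#pack("m0")` plus the `-`/`_` alphabet swap, which is what `Base64.urlsafe_encode64` does with its default padding, so the sketch needs no extra library.

```ruby
# Hypothetical helpers for the proposal above: encode a file path as one
# URL-safe Base64 token ("-" and "_" replace "+" and "/"), and decode it back.
def encode_path_token(path)
  [path].pack("m0").tr("+/", "-_")
end

def decode_path_token(token)
  token.tr("-_", "+/").unpack1("m0").force_encoding(Encoding::UTF_8)
end

path = "/home/user/(ノ°Д°\uFF09ノ\uFE35 ┻━┻"
token = encode_path_token(path)
puts token
puts decode_path_token(token) == path # prints true: the exact bytes survive
```

The trade-off is readability: the token is opaque in logs and the address bar, and it makes the path about a third longer.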

nickjer, May 11 '17 15:05

Reviewed: similar to #254, but distinct.

matt257, May 02 '24 19:05