ejbca-ce
Handle non-ASCII character filenames in EnrollWithRequestIdBean
We had issues issuing certs containing Croatian characters (žđšćč), as mentioned in #174, so I looked into the code that handles this. I found slightly different logic in the downloadToken function of EnrollWithRequestIdBean and EnrollMakeNewRequestBean: one uses a getFileName() function to generate the name, while the other does it inline.
The one using the function also encodes the filename to a Base64 string which bypasses the issues with non-ASCII characters.
I'm not too fond of Base64 as a filename; it's very user-unfriendly. Isn't there an Apache string function somewhere that makes strings "filename" friendly?
Ah, I see, you just copied the behavior that was already part of another piece of EJBCA...
Base64 encoding as the last resort is definitely a good approach. One potential improvement that could preserve some UTF-8 characters would be to Base64-encode the name only if it doesn't pass java.nio.file.Paths.get(filename). If I understand it correctly, that checks the filename for validity without doing any I/O or anything else costly.
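A minimal sketch of that suggestion. It assumes the Base64 fallback uses the URL-safe alphabet (EJBCA's existing getFileName() may encode differently), and it additionally requires a single path element so that names containing / are not accepted. The class and method names are hypothetical. Keep in mind that Paths.get validates against the server's default file system, not the client's: on Linux almost anything except NUL passes, and depending on the JVM's locale settings non-ASCII names may also be rejected.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

class PathCheckedFilename {
    /**
     * Hypothetical helper: keep the name if the default file system accepts it as a
     * single relative path element, otherwise fall back to Base64 of its UTF-8 bytes.
     */
    static String safeFilename(String filename) {
        try {
            Path p = Paths.get(filename); // parses the string only, no file system I/O
            if (p.getNameCount() == 1 && !p.isAbsolute()) {
                return filename;
            }
        } catch (InvalidPathException e) {
            // Rejected by the path parser (e.g. a NUL character): use the fallback below
        }
        // Assumption: URL-safe alphabet without padding, so the result never contains '/'
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(filename.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(safeFilename("Test CA.pem"));      // accepted as-is
        System.out.println(safeFilename("bad\u0000name.pem")); // falls back to Base64
    }
}
```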
Uh, yeah, it's unwieldy and I didn't really like it either, but since it was already there for a different enrollment option I went with it as the obvious solution.
The best solution I found with the Apache string utilities was checking whether the name is ASCII printable (StringUtils.isAsciiPrintable) and then dropping the special characters, but that also produces weird results: Šandor Štefanović would get a cert named andor tefanovi.pem.
I'll check out your suggestion and get back to you here; I'm also looking at what else could be done.
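A minimal JDK-only sketch of the "drop the special characters" approach shows where the odd name comes from. It keeps only characters in the printable ASCII range 32 to 126, the same range StringUtils.isAsciiPrintable accepts, so every Š and ć simply disappears. The class and method names are hypothetical.

```java
class DropNonAscii {
    // Keeps only printable ASCII (0x20..0x7E) and silently drops everything else
    static String dropNonPrintableAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch >= 32 && ch < 127) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Šandor Štefanović.pem" written with escapes to avoid source-encoding issues
        System.out.println(dropNonPrintableAscii("\u0160andor \u0160tefanovi\u0107.pem"));
    }
}
```

Running main prints andor tefanovi.pem.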
A nice solution could be replacing UTF-8 characters with corresponding ASCII ones - https://stackoverflow.com/a/4122207/19848036
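One JDK-only way to do this kind of replacement, which may differ in detail from the linked answer, is to decompose the string with java.text.Normalizer (form NFD) and then delete the combining marks that the decomposition splits off. The class and method names are hypothetical.

```java
import java.text.Normalizer;

class AsciiFold {
    static String stripDiacritics(String s) {
        // NFD splits e.g. "č" into "c" followed by U+030C COMBINING CARON
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches every combining mark, so only the base letters remain
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(stripDiacritics("Stevo \u0160tefanovi\u0107")); // Stevo Štefanović
    }
}
```

With this sketch, Stevo Štefanović becomes Stevo Stefanovic. Letters that have no canonical decomposition, such as đ, ł or ß, pass through unchanged, so the result is still not guaranteed to be ASCII.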
About the time you posted this, I found https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#stripAccents-java.lang.String- which eluded me at first. I'll test it out and update the PR, although I wonder if the solution you posted would cover a wider range of edge cases.
Alright, I've changed the filename logic so it now uses Apache Commons Lang3 StringUtils, whose stripAccents function does exactly what I wanted to achieve here.
I tested this change with the names Lars Űmlaöt d'Ăȯçěny Fųßßb¤łł and Stevo Štefanovčić, which yielded the functional filenames Lars Umlaot dAoceny Fußßb¤ll.pem and Stevo Stefanovcic.pem respectively. That keeps the Apache server happy enough.
If nothing else, this will remedy some issues for users with less-than-standard alphabets.
Unfortunately, StringUtils.stripAccents does not guarantee that the output is ASCII printable, as demonstrated by the following unit test:
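import static org.junit.Assert.assertTrue;

import org.apache.commons.lang3.StringUtils;
import org.junit.Test;

public class TextNormalisationTest {
    @Test
    public void testTextNormalisation() {
        assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("Test CA")));
        assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("malmö.se")));
        // Fails here: the combining accents are removed, but the Cyrillic letters remain
        assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("Га́рри Ки́мович Каспа́ров")));
        assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("Mǎ Yún")));
        assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("马云"))); // would fail too: Han characters are not ASCII
    }
}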
This will cause problems for anyone using Chinese or Russian characters in their CN. I don't know how common that is, but there are a couple of people in China using EJBCA.
The standards don't really say what to do, so non-ASCII filenames are a bit of a mess, but my understanding is that most (all?) modern web browsers support UTF-8 characters in the Content-Disposition header. The trick is to percent-encode the filename and pass it in the filename* parameter. This should be done in DownloadHelper.sendFile. For example:
// URLEncoder does HTML form encoding and turns spaces into "+", so write them as "%20" instead
final String encodedFilename = URLEncoder.encode(filename, StandardCharsets.UTF_8).replace("+", "%20");
ec.setResponseHeader("Content-Disposition", "attachment; filename*=UTF-8''" + encodedFilename);
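URLEncoder implements HTML form encoding (application/x-www-form-urlencoded), which is close to, but not the same as, the ext-value encoding used by filename* (RFC 5987, since updated by RFC 8187): it writes spaces as + and leaves * unescaped, although * is not one of the allowed attr-char characters. Below is a minimal sketch of a stricter encoder that percent-encodes every UTF-8 byte outside attr-char. The class and method names are hypothetical, and it assumes the UTF-8 charset is always used.

```java
import java.nio.charset.StandardCharsets;

class Rfc5987ExtValue {
    // attr-char punctuation from RFC 5987 / RFC 8187; letters and digits are checked separately
    private static final String ATTR_PUNCT = "!#$&+-.^_`|~";

    static String encodeExtValue(String value) {
        StringBuilder sb = new StringBuilder("UTF-8''"); // charset, empty language tag
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            if (alnum || ATTR_PUNCT.indexOf(c) >= 0) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c)); // e.g. space -> %20
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("attachment; filename*=" + encodeExtValue("my file with \u00e5\u00e4\u00f6 characters.pdf"));
    }
}
```

Running main prints attachment; filename*=UTF-8''my%20file%20with%20%C3%A5%C3%A4%C3%B6%20characters.pdf, and a name such as a*b c.pem becomes UTF-8''a%2Ab%20c.pem.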
As a side note, I asked ChatGPT and it seems to agree and provides some more details:
To create a Content-Disposition header with a filename containing a UTF-8 string in Java, you can use the URLEncoder.encode() method to percent-encode the filename for the filename* parameter described in RFC 6266. The resulting string can then be used as the value of the filename* parameter in the Content-Disposition header. Here is an example:
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

String filename = "my file with åäö characters.pdf";
// URLEncoder encodes spaces as "+", which filename* would show literally, so use "%20" instead
String encodedFilename = URLEncoder.encode(filename, StandardCharsets.UTF_8).replace("+", "%20");
String contentDisposition = "attachment; filename*=UTF-8''" + encodedFilename;
System.out.println(contentDisposition);
This will output the following Content-Disposition header:
attachment; filename*=UTF-8''my%20file%20with%20%C3%A5%C3%A4%C3%B6%20characters.pdf
This should work in most major browsers. Note that the actual behavior may vary depending on the browser and its settings.
To keep backwards compatibility with older browsers, one could look at the User-Agent string and fall back to the ugly but reliable Base64 encoding if needed. There are still some people using EJBCA with Internet Explorer, probably because that's what their smartcard solution supports.
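A rough sketch of that fallback. It assumes Internet Explorer can be recognized by "MSIE" (up to IE 10) or "Trident/" (IE 11) in the User-Agent header, and it assumes a URL-safe Base64 name plus extension as the fallback format; EJBCA's existing getFileName() may build the name differently. The class and method names are hypothetical, and User-Agent sniffing is fragile, so treat this as an illustration rather than a finished implementation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DispositionHeader {
    // Hypothetical check: "MSIE" appears in IE <= 10, "Trident/" in IE 11
    static boolean isInternetExplorer(String userAgent) {
        return userAgent != null && (userAgent.contains("MSIE") || userAgent.contains("Trident/"));
    }

    static String contentDisposition(String baseName, String extension, String userAgent) {
        if (isInternetExplorer(userAgent)) {
            // Assumed fallback format: ASCII-only Base64 of the UTF-8 name, then the extension
            String b64 = Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(baseName.getBytes(StandardCharsets.UTF_8));
            return "attachment; filename=\"" + b64 + extension + "\"";
        }
        // Modern browsers: percent-encoded UTF-8 in filename*, spaces as %20 rather than "+"
        String encoded = URLEncoder.encode(baseName + extension, StandardCharsets.UTF_8).replace("+", "%20");
        return "attachment; filename*=UTF-8''" + encoded;
    }

    public static void main(String[] args) {
        String ie11 = "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko";
        System.out.println(contentDisposition("\u0160andor", ".pem", ie11));
        System.out.println(contentDisposition("\u0160andor", ".pem", "Mozilla/5.0 Firefox/120.0"));
    }
}
```

In a JSF bean, the header value could be read with ec.getRequestHeaderMap().get("User-Agent") before calling such a helper.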