openvsx

Enforce the use of utf-8 as default charset for api mappings

Open: netomi opened this issue 1 month ago • 7 comments

This fixes #1346.

A CharacterEncodingFilter is added to enforce UTF-8 encoding for every API request.
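Spring's `CharacterEncodingFilter` works by calling `setCharacterEncoding` on the request and, when forced, on the response; on the response side this typically surfaces as the `charset` parameter of the `Content-Type` header. The JDK-only sketch below illustrates that effect on the header value. It is not Spring's code: the class `ForcedCharsetSketch` and its `withCharset` helper are hypothetical, and the parameter parsing is deliberately simple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of "forcing" a response charset; not Spring's implementation.
final class ForcedCharsetSketch {

    /** Returns contentType with any existing charset parameter replaced by the given charset. */
    static String withCharset(String contentType, String charset) {
        String[] parts = contentType.split(";");
        List<String> kept = new ArrayList<>();
        kept.add(parts[0].trim()); // the media type itself, e.g. "application/json"
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            // Drop empty fragments and whatever charset the handler may have set.
            if (param.isEmpty() || param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                continue;
            }
            kept.add(param);
        }
        kept.add("charset=" + charset); // the forced charset always wins
        return String.join(";", kept);
    }

    public static void main(String[] args) {
        System.out.println(withCharset("application/json", "UTF-8"));
        System.out.println(withCharset("text/plain; charset=ISO-8859-1", "UTF-8"));
    }
}
```

With forcing, a handler-supplied charset such as ISO-8859-1 is overridden rather than respected, which is what makes the behavior uniform across all `/api` mappings.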

netomi, Nov 04 '25 10:11

@amvanbaren - At your earliest convenience, could you please take a look at this MR?

chrisguindon, Nov 05 '25 20:11

FYI: this is just one solution to the problem. I am happy to discuss other approaches, but IMHO we should certainly ensure UTF-8 encoding throughout the server.

netomi, Nov 05 '25 21:11

Another option would be to add this to application.yaml:

server:
  servlet:
    encoding:
      charset: UTF-8 # it's already the default, just to make it clear that this is what we want
      force: true

A further option would be to explicitly set the character encoding to UTF-8 on every response, but that is tedious and some occurrences are easy to miss.

The downside of the configuration approach is that each deployment has to make sure its instance is configured this way, instead of the behavior being hardcoded in the application itself.

netomi, Nov 06 '25 14:11

> fyi: this is just one solution to the problem, I am happy to discuss other approaches but we should certainly ensure utf-8 encoding throughout the server imho.

Throughout or only /api?

amvanbaren, Nov 10 '25 10:11

The change is currently limited to /api, as these routes are the most affected, but the whole app should probably default to UTF-8. I'm not sure why it doesn't already; the Spring documentation on this is rather sparse.

Some claim that UTF-8 is already the default, but I could not find official documentation confirming it. Maybe the default charset is UTF-8 and only the force parameter is not set.
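To illustrate what goes wrong when a response's charset is missing or not forced, the JDK-only sketch below writes a string as UTF-8 bytes and decodes them as ISO-8859-1, a frequent fallback (it is, for example, the Servlet API's implicit response encoding when none is assigned). The class name `MojibakeDemo` is hypothetical, and this is not a reproduction of #1346 itself, only the kind of corruption a charset mismatch produces.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of a UTF-8 / ISO-8859-1 mismatch between server and client.
final class MojibakeDemo {

    /** Encodes text as UTF-8 (what the server writes) and decodes it with readAs (what the client assumes). */
    static String roundTrip(String text, Charset readAs) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, readAs);
    }

    public static void main(String[] args) {
        String name = "M\u00FCller"; // "Müller", escaped so the source file encoding does not matter
        // Matching charsets: the text survives unchanged.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8));
        // Mismatch: the two UTF-8 bytes of "ü" (0xC3 0xBC) become two separate Latin-1 characters.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1));
    }
}
```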

netomi, Nov 10 '25 10:11

This works for local storage, but not for cloud storage.

amvanbaren, Nov 20 '25 17:11

I have not been able to test on cloud storage yet, and I feared it would not work there.

Digging further into this topic, it turns out you can set properties on files stored as blobs: https://learn.microsoft.com/en-us/rest/api/storageservices/set-blob-properties?tabs=microsoft-entra-id

These properties include the content type and encoding, so we should change the existing storage provider to set the encoding to UTF-8 by default.

The open question is how to update existing files: there are currently 1.3M entries, of which several hundred thousand are text/JSON files that should be changed, AFAICT.
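One way to "set the encoding to UTF-8 by default" at upload time is to derive the blob's Content-Type, including a `charset` parameter, from the file name. Note that in HTTP the character set travels as the `charset` parameter of Content-Type (e.g. `application/json; charset=utf-8`); Content-Encoding, which Set Blob Properties also exposes, is meant for transformations such as gzip. The helper below is a hypothetical sketch: `BlobContentTypes`, `contentTypeFor`, and its extension table are illustrative assumptions, not the existing storage provider's code, and it only computes the header value without making any Azure calls.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; the real storage provider may detect content types differently.
final class BlobContentTypes {

    // Assumed set of text-like extensions that should carry an explicit UTF-8 charset.
    private static final Map<String, String> TEXT_TYPES = Map.of(
            "json", "application/json",
            "txt", "text/plain",
            "md", "text/markdown",
            "html", "text/html");

    /** Returns the Content-Type to store with a blob, adding charset=utf-8 for text-like files. */
    static String contentTypeFor(String fileName) {
        String baseName = fileName.substring(fileName.lastIndexOf('/') + 1); // ignore dots in directories
        int dot = baseName.lastIndexOf('.');
        String ext = dot < 0 ? "" : baseName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String mediaType = TEXT_TYPES.get(ext);
        if (mediaType == null) {
            return "application/octet-stream"; // binary or unknown: no charset applies
        }
        return mediaType + "; charset=utf-8";
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor("extension/package.json"));
        System.out.println(contentTypeFor("extension/icon.png"));
    }
}
```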
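For the backfill, one possible shape is to stream over the existing blob names, keep only the text-like ones, and process them in fixed-size batches so the job can be throttled and resumed. The sketch below is hypothetical: `BackfillPlanner`, the suffix list, and the batch size are assumptions, and listing blobs and calling Set Blob Properties are left out. As far as the linked documentation describes it, content headers omitted from a Set Blob Properties request are cleared, so each update would likely need to resend the existing headers along with the new Content-Type; this is worth verifying before running anything against 1.3M entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical planner for a charset backfill; performs no storage calls.
final class BackfillPlanner {

    // Assumed suffixes of blobs that should get a UTF-8 charset.
    private static final List<String> TEXT_SUFFIXES = List.of(".json", ".txt", ".md");

    static boolean isTextBlob(String blobName) {
        String lower = blobName.toLowerCase(Locale.ROOT);
        for (String suffix : TEXT_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    /** Groups text blobs into batches of at most batchSize names, preserving listing order. */
    static List<List<String>> planBatches(Iterable<String> blobNames, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String name : blobNames) {
            if (!isTextBlob(name)) {
                continue; // binary blobs keep their current properties
            }
            current.add(name);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // trailing partial batch
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> names = List.of("a/package.json", "a/extension.vsix", "a/README.md", "b/CHANGELOG.md", "b/icon.png");
        // Three text blobs in batches of two: [[a/package.json, a/README.md], [b/CHANGELOG.md]]
        System.out.println(planBatches(names, 2));
    }
}
```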

netomi, Nov 20 '25 18:11