Set content encoding for files stored on cloud storage
Currently we do not set the content encoding for text files stored on a cloud provider (e.g. README, LICENSE, ...). As a side effect, browsers have to guess the encoding when displaying these files, and they often guess wrong.
We should explicitly set the content encoding when uploading a file (there should be support for that by each cloud provider).
Additionally, we need to add some migration procedure for existing files.
Related:
- https://github.com/EclipseFdn/open-vsx.org/issues/1122
- #1346
@netomi can you assign this issue to me?
we don't assign tickets to outside contributors, but you are free to submit a PR, of course.
Hi @netomi, I’m exploring this issue as part of my GSoC 2026 preparation and wanted to discuss my approach before starting a PR.
My idea directly addresses the problem description and the related issue #1122:
- **Prevent future issues**
  The root cause is that text files (e.g. README, LICENSE) are uploaded with a Content-Type like `text/plain` without specifying the character encoding. My solution is to modify `StorageUtil.java` to include `; charset=utf-8` in the Content-Type header. This ensures browsers render special characters correctly.
- **Fix existing files**
  Changing the code only affects new uploads; existing files on S3/Azure/Google Cloud still have the wrong metadata. My plan is to create a small migration task (`ContentEncodingMigration.java`) that:
  - identifies affected text files in the database
  - downloads and re-uploads them to update the metadata with the correct charset
This approach follows the suggestion in the issue comments: "We should explicitly set the content encoding when uploading a file... Additionally, we need to add some migration procedure for existing files."
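To make the first part more concrete, here is a minimal sketch of the kind of change I have in mind. The class and method names are illustrative, not the actual `StorageUtil` API:

```java
import java.util.Map;

// Hypothetical sketch: map known text file extensions to a Content-Type
// that carries an explicit charset, so browsers no longer have to guess.
public class CharsetContentTypes {

    private static final Map<String, String> TEXT_TYPES = Map.of(
            ".md", "text/markdown",
            ".txt", "text/plain",
            ".json", "application/json"
    );

    /** Returns a Content-Type with charset=utf-8 for known text files, or null for binary files. */
    public static String getTextContentType(String fileName) {
        var lower = fileName.toLowerCase();
        for (var entry : TEXT_TYPES.entrySet()) {
            if (lower.endsWith(entry.getKey())) {
                return entry.getValue() + "; charset=utf-8";
            }
        }
        return null;
    }
}
```

The returned value would then be passed as the Content-Type when uploading to the storage provider.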
Before I start, I wanted to confirm that this approach aligns with the project’s expectations. Any feedback is appreciated!
Hi @siddharthbaleja7 thanks for the interest in this ticket.
The solution to explicitly set the content encoding for uploading new files to the storage provider sounds good. To fix existing files we will need to apply a different approach. In large instances of openvsx there are more than a million files stored, so downloading them will not work imho.
Thanks for the feedback! You are absolutely right, downloading and re-uploading millions of files would be too resource-intensive.
I propose updating the migration strategy to perform an in-place metadata update instead. Most cloud storage providers allow updating the Content-Type without transferring the file content:
- **AWS S3:** use `CopyObject` with the source and destination being the same object, and `MetadataDirective.REPLACE`.
- **Azure Blob:** use `setHttpHeaders` to update the properties directly.
- **Google Cloud:** use `blob.update()` to modify the metadata.
This approach only makes lightweight API calls for the affected text files (README, LICENSE, etc.) to set the correct charset=utf-8, avoiding heavy data transfer.
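For the S3 case, a minimal sketch of the in-place update could look like this (assuming the AWS SDK for Java v2; the class and method names here are illustrative, not existing openvsx code):

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CopyObjectRequest;
import software.amazon.awssdk.services.s3.model.MetadataDirective;

// Hypothetical migration sketch: update the Content-Type of an existing S3
// object in place by copying it onto itself with MetadataDirective.REPLACE,
// so the object content is never transferred through the client.
public class ContentTypeMigration {

    /** Rewrites the object's metadata so its Content-Type carries an explicit charset. */
    public static void fixContentType(S3Client s3, String bucket, String key, String contentType) {
        var request = CopyObjectRequest.builder()
                .sourceBucket(bucket)
                .sourceKey(key)
                .destinationBucket(bucket)
                .destinationKey(key)
                .contentType(contentType)                     // e.g. "text/plain; charset=utf-8"
                .metadataDirective(MetadataDirective.REPLACE) // replace metadata instead of copying it
                .build();
        s3.copyObject(request);
    }
}
```

Note that `MetadataDirective.REPLACE` is required here: S3 rejects a copy of an object onto itself unless the metadata is being changed. The Azure and Google Cloud variants would be analogous single API calls per object.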
Does this approach sound reasonable?