Use UTF8 encoding on Tar string fields
Fixes https://github.com/dotnet/runtime/issues/75482.
Issue Details
| Author: | Jozkee |
|---|---|
| Assignees: | - |
| Labels: | |
| Milestone: | - |
Could you describe the interop testing you're planning to do for this? Trying with various tar implementations. Might be worth a table.
@danmoseley I wasn't planning on adding interop tests to System.Formats.Tar.Tests. I did play a bit with GNU tar with all the formats we support, to see how they handle non-ASCII characters, and all formats are able to write non-ASCII correctly.
I also tested groupname and username with the Linux commands useradd and groupadd, and both commands were able to handle non-ASCII.
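For anyone who wants to repeat this kind of manual check against a second implementation, here is a minimal sketch using Python's standard-library tarfile module. It is not part of this PR or of System.Formats.Tar.Tests; the helper name `round_trip` and the sample strings are made up for illustration. It writes one entry with a non-ASCII name, uname, and gname in each of the ustar, GNU, and pax formats, then reads it back.

```python
import io
import tarfile

# The three on-disk formats that Python's tarfile can write.
FORMATS = {
    "ustar": tarfile.USTAR_FORMAT,
    "gnu": tarfile.GNU_FORMAT,
    "pax": tarfile.PAX_FORMAT,
}


def round_trip(fmt, name, uname, gname):
    """Write one entry in the given format and return the fields read back.

    Hypothetical helper for illustration only: it encodes and decodes header
    strings as UTF-8 and keeps the whole archive in memory.
    """
    payload = b"hello"
    buf = io.BytesIO()

    # Write a single regular-file entry with the requested string fields.
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding="utf-8") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        info.uname = uname
        info.gname = gname
        tar.addfile(info, io.BytesIO(payload))

    # Read the archive back, again decoding header strings as UTF-8.
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:", encoding="utf-8") as tar:
        member = tar.getmembers()[0]
        return member.name, member.uname, member.gname
```

Calling `round_trip(FORMATS["ustar"], "каталог/файл.txt", "użytkownik", "группа")` and comparing the result with the inputs gives a quick signal of whether the fields survive in that format. Writing the buffer to disk instead would allow the same archive to be inspected with GNU tar or read by System.Formats.Tar.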
The last commit addresses the problem with the ustar prefix (https://github.com/dotnet/runtime/issues/75360) and partially addresses the truncation issue (https://github.com/dotnet/runtime/issues/75921): the name won't be truncated on formats that do not support unlimited-size names. I say "partially" because other fields are still being truncated (linkname, uname, gname). I was hoping to present a sketch of what fixing those issues would look like, and to extend the fix to the other fields once you agree with it.
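To make the prefix and truncation discussion concrete, here is a rough Python sketch of the general idea, not the code in this PR. It assumes the standard ustar layout (a 100-byte name field and a 155-byte prefix field), measures lengths in UTF-8 bytes rather than characters, and raises an error instead of truncating when a path cannot fit. The function name `split_ustar_path` and the constants are hypothetical.

```python
from typing import Tuple

NAME_MAX = 100    # size of the ustar name field, in bytes
PREFIX_MAX = 155  # size of the ustar prefix field, in bytes


def split_ustar_path(path: str) -> Tuple[bytes, bytes]:
    """Split a path into (prefix, name) byte strings for a ustar header.

    Illustrative sketch only. Lengths are checked on the UTF-8 encoding,
    because a non-ASCII character can occupy several bytes.
    """
    raw = path.encode("utf-8")
    if len(raw) <= NAME_MAX:
        return b"", raw  # fits in the name field, no prefix needed

    # Try each '/' as the split point. The separator itself is not stored.
    # Splitting at byte 0x2F is safe: UTF-8 multi-byte sequences only use
    # bytes >= 0x80, so a '/' byte never falls inside a character.
    for i, byte in enumerate(raw):
        if byte == 0x2F:
            prefix, name = raw[:i], raw[i + 1:]
            if 0 < len(prefix) <= PREFIX_MAX and 0 < len(name) <= NAME_MAX:
                return prefix, name

    # Refuse to truncate: cutting bytes could split a multi-byte character.
    raise ValueError("path does not fit in the ustar name and prefix fields")
```

The same byte-length check could in principle be applied to linkname, uname, and gname, each against its own field size, so that an oversized value is rejected or moved to an extended header rather than cut mid-character.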
Updated PR to also fix the other two related issues, field truncation and ustar prefix logic (see description).
/backport to release/7.0
Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3146104764