sunpy
Fix filename sanitization for downloaded files (do not replace periods, do not change case, and do not leave Unicode characters decomposed)
This PR addresses issue #7450.
Can you also add a changelog, please?
Actually in line with https://github.com/sunpy/sunpy/issues/7450#issuecomment-1951436626, his point makes a lot of sense. I think we should leave the periods alone.
So should I remove my fix in net.py and remove the periods from the list that the slugify() function replaces?
Yes please.
Thanks for the PR. This addresses the specific issue and looks good to me.
It would be good to consider whether it could also address the root cause; see the discussion on #7450.
Sure, I'll investigate the root cause.
I've directly committed to this PR to re-implement slugify() in a more straightforward way.
For the Unicode fans out there, slugify() has been normalizing to NFKD form (decomposed by compatibility), but I feel it should normalize to NFKC form (decomposed by compatibility, then recomposed by canonical equivalence). It's weird to me that it leaves characters decomposed. For example, "ä" becomes two Unicode characters ("a" + U+0308) instead of being restored to a single character. Any thoughts?
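To make the NFKD vs. NFKC difference concrete, here is a minimal sketch using only the standard library's `unicodedata` module. The sample strings are my own illustrations and are not taken from the sunpy test suite.

```python
import unicodedata

text = "\u00e4"  # "ä" as a single precomposed code point

# NFKD: compatibility decomposition, characters stay decomposed.
nfkd = unicodedata.normalize("NFKD", text)
# NFKC: compatibility decomposition, then canonical recomposition.
nfkc = unicodedata.normalize("NFKC", text)

print([hex(ord(c)) for c in nfkd])  # ['0x61', '0x308']
print([hex(ord(c)) for c in nfkc])  # ['0xe4']

# Both forms still apply compatibility mappings, e.g. the "fi" ligature.
ligature = "\ufb01"
print(unicodedata.normalize("NFKD", ligature))  # fi
print(unicodedata.normalize("NFKC", ligature))  # fi
```

So switching to NFKC would keep the compatibility folding while avoiding the stray combining characters in filenames.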
I'll note that there is currently a test line that looks intended to test the decomposition of "ä", but that line is bugged, so the output isn't actually tested.
I would be in favour of NFKC
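For reference, a rough sketch of what a sanitizer with the behaviour discussed here (periods kept, case kept, NFKC normalization) could look like. This is an illustration only and not the code committed to this PR: the name `slugify_sketch`, the `delim` parameter, and the exact set of allowed characters are all assumptions.

```python
import re
import unicodedata


def slugify_sketch(text, delim="_"):
    """Hypothetical filename sanitizer; not sunpy's actual slugify()."""
    # Normalize to NFKC so characters such as "ä" end up recomposed.
    text = unicodedata.normalize("NFKC", text)
    # Replace each run of characters that are not word characters,
    # periods, or hyphens with the delimiter. Periods and case are
    # left untouched.
    text = re.sub(r"[^\w.\-]+", delim, text)
    # Trim delimiter characters from both ends.
    return text.strip(delim)


print(slugify_sketch("AIA 2012.05.01 a\u0308.fits"))  # AIA_2012.05.01_ä.fits
print(slugify_sketch("a/b:c"))  # a_b_c
```

Note that the decomposed input "a" + U+0308 comes out as the single character "ä" because of the NFKC step.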
The online test failures look to be real.
One is a filename difference, and the other seems to be more results returned from the VSOClient?
Do you have any tasks for me at the moment? Apologies for not being active last week; I was occupied with my mid-terms.
The online tests need updating, but that is it.
Thanks for the PR @ViciousEagle03 and @ayshih
I decided not to backport this and add a whatsnew as I don't want to break previous behaviour for filenames for the released versions.