sunpy icon indicating copy to clipboard operation
sunpy copied to clipboard

Fix filename sanitization for downloaded files (do not replace periods, do not change case, and do not leave Unicode characters decomposed)

Open ViciousEagle03 opened this issue 1 year ago • 12 comments

This PR address the issue #7450

ViciousEagle03 avatar Feb 18 '24 09:02 ViciousEagle03

Can you also add a changelog please.

nabobalis avatar Feb 18 '24 17:02 nabobalis

Actually in line with https://github.com/sunpy/sunpy/issues/7450#issuecomment-1951436626, his point makes a lot of sense. I think we should leave the periods alone.

nabobalis avatar Feb 18 '24 21:02 nabobalis

So should I remove my fix in net.py and remove the periods from the list that slugify() function replaces?

ViciousEagle03 avatar Feb 19 '24 04:02 ViciousEagle03

So should I remove my fix in net.py and remove the periods from the list that slugify() function replaces?

Yes please.

nabobalis avatar Feb 19 '24 05:02 nabobalis

Thanks for the PR this addresses the specific issue and look good to me.

It would be good to consider if could address the the root cause see the discussion on #7450

samaloney avatar Feb 19 '24 11:02 samaloney

Sure, I'll investigate the root cause

ViciousEagle03 avatar Feb 19 '24 15:02 ViciousEagle03

I've directly committed to this PR to re-implement slugify() in a more straightforward way

ayshih avatar Feb 20 '24 14:02 ayshih

For the Unicode fans out there, slugify() has been normalizing to NFKD form (decomposed by compatibility), but I feel it should normalize to NFKC form (decomposed by compatibility, then recomposed by canonical equivalence). It's weird to me that it leaves characters decomposed. For example, "ä" becomes two Unicode characters ("a" + U+0308) instead of being restored to a single character. Any thoughts?

I'll note that there is currently a test line that looks intended to test the decomposition of "ä", but that line is bugged, so the output isn't actually tested.

ayshih avatar Feb 20 '24 17:02 ayshih

I would be in favour of NFKC

nabobalis avatar Feb 20 '24 17:02 nabobalis

Online Test failures look to be real.

One is a filename difference, and the other seems to be more results returned from the VSOClient?

nabobalis avatar Feb 21 '24 16:02 nabobalis

Do you have any tasks for me at the moment . Apologies for not being active last week I was occupied with my mid-terms

ViciousEagle03 avatar Mar 02 '24 18:03 ViciousEagle03

Do you have any tasks for me at the moment . Apologies for not being active last week I was occupied with my mid-terms

The online tests need updating but that is it.

nabobalis avatar Mar 02 '24 18:03 nabobalis

Thanks for the PR @ViciousEagle03 and @ayshih

nabobalis avatar Mar 21 '24 01:03 nabobalis

I decided not to backport this and add a whatsnew as I don't want to break previous behaviour for filenames for the released versions.

nabobalis avatar Mar 21 '24 03:03 nabobalis