PatreonDownloader
Odd UTF-8 sequences break title formatting
Odd Unicode sequences that create so-called "magic" or "cursed" text cause problems when the downloader generates the post title folder, and the download breaks. I even took precautions to prevent the folder name from ending in a space. These sequences should probably be filtered out to prevent naming failures.
https://www.patreon.com/posts/42453989
Here is something that helps to break it down. It looks like these are really normal Latin characters with a huge number of combining accents stacked on them. Maybe catching or removing those marks would take care of this?
Yes.... I know. Really long url. https://qaz.wtf/u/show.cgi?show=S%CC%B7%CC%BE%CC%81%CD%80%CD%9B%CC%8E%CC%8D%CC%88%CC%80%CC%8A%CC%BF%CD%83%CD%9B%CC%97%CC%A8h%CC%B6%CC%8E%CC%80%CC%BD%CC%85%CC%80%CC%BF%CD%91%CC%8F%CC%9A%CD%9B%CC%91%CC%88%CC%BD%CC%8D%CC%AE%CC%99%CD%96%CD%93%CC%A1%CC%AD%CC%9D%CD%88%CC%AD%CD%8E%CC%99%CD%8De%CC%B6%CD%8B%CC%82%CC%88%CC%87%CC%BD%CD%9D%CD%8C%CC%87%CD%83%CD%94%CC%AA%CC%96%CD%87%CD%8E%CD%88%CC%BA%CC%AE%CC%BB%CC%B2+%CC%B7%CC%BF%CC%BF%CC%93%CD%8B%CC%88%CC%81%CC%AB%CD%89%CC%BA%CC%9E%CC%99%CD%95%CC%9F%CC%AE%CC%A3%CC%9C%CC%A0%CC%B2%CC%ACA%CC%B5%CD%92%CC%80%CC%8A%CC%BD%CD%8A%CC%BF%CD%98%CC%8B%CC%8F%CD%87%CD%88%CD%87%CC%B0%CC%BA%CC%BA%CC%97%CD%93%CD%89%CC%B3%CD%96%CD%87%CC%B3%CD%9A%CC%B0p%CC%B8%CD%8B%CC%86%CC%82%CC%9B%CD%9B%CD%8C%CD%84%CC%82%CC%80%CD%92%CC%A0%CD%89%CC%A3%CD%89%CD%96%CC%A3%CC%99%CD%87%CC%AAp%CC%B8%CC%8B%CD%86%CC%92%CC%BF%CD%83%CD%91%CD%83%CC%A0%CD%8D%CD%88%CC%A0%CC%98%CD%88%CD%93%CD%8D%CC%B3%CD%94%CC%99%CC%96%CC%9F%CD%99r%CC%B7%CD%8B%CD%8C%CC%95%CC%93%CD%8A%CD%84%CC%A6%CD%9A%CC%98%CC%A6o%CC%B8%CC%83%CC%8E%CC%89%CC%83%CD%83%CD%84%CC%9A%CC%81%CC%82%CD%9B%CC%BD%CD%9B%CD%81%CC%92%CC%9A%CC%ADa%CC%B8%CD%82%CC%91%CD%9D%CC%95%CC%86%CC%9B%CC%BF%CC%8F%CC%87%CD%90%CC%8C%CC%9B%CD%8C%CC%8E%CC%B1%CC%A9%CC%97%CD%93%CC%AA%CC%AC%CC%A0%CC%AA%CD%9C%CC%A5c%CC%B4%CC%9B%CD%8A%CC%BE%CD%9D%CC%80%CC%94%CD%97%CD%91%CD%8A%CD%97%CC%9A%CC%84%CC%98%CD%88%CC%9Ch%CC%B4%CC%81%CD%87%CC%96%CC%A0%CD%94%CC%A7%CC%B2%CC%BB%CD%95%CC%96%CD%96%CD%9C%CC%98e%CC%B7%CC%87%CD%91%CC%8A%CC%8E%CC%BC%CC%B1%CC%AF%CD%9A%CD%87%CD%99%CC%A7%CD%8D%CC%A6%CD%99%CC%A9%CD%89%CC%9Es%CC%B8%CD%9D%CC%90%CD%A0%CC%A2&type=string
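To make the "catch and remove the combining accents" idea concrete, here is a minimal sketch. `CombiningMarkStripper` and `StripCombiningMarks` are hypothetical names and are not part of PatreonDownloader. The sketch assumes it is acceptable to drop every combining mark, which also removes ordinary accents (for example, "é" becomes "e") and would damage scripts that depend on such marks, so treat it as a starting point rather than a finished fix.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; not part of PatreonDownloader.
public static class CombiningMarkStripper
{
    public static string StripCombiningMarks(string input)
    {
        // Decompose first so that precomposed letters split into base letter + mark.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var result = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);

            // Skip the marks that stack on a base character; keep everything else.
            if (category != UnicodeCategory.NonSpacingMark &&
                category != UnicodeCategory.EnclosingMark)
            {
                result.Append(c);
            }
        }

        // Recompose whatever is left into the usual composed form.
        return result.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Calling `CombiningMarkStripper.StripCombiningMarks(title)` before the title is turned into a folder name should leave only the base letters. Unlike a conversion to ASCII, this keeps non-Latin letters, which may matter for posts with titles in other scripts.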
Hit a similar issue, and it appears to break file naming as well...
Something like this in PathSanitizer.cs should help:
using System;
using System.IO;
using System.Linq;
using System.Text;

namespace UniversalDownloaderPlatform.Common.Helpers {
    public class PathSanitizer {
        private static readonly char[] _invalidPathCharacters;

        static PathSanitizer() {
            // Platform-specific invalid characters plus the ones Windows rejects in names.
            _invalidPathCharacters = Path.GetInvalidPathChars().Concat("\\/:*?\"<>|".ToCharArray()).ToArray();
        }

        public static string SanitizePath(string path) {
            foreach (char c in _invalidPathCharacters) {
                path = path.Replace(c, '_');
            }

            // Re-encode as ASCII and drop every character that has no ASCII equivalent.
            // This removes the combining marks, but also any other non-ASCII character.
            path = Encoding.ASCII.GetString(
                Encoding.Convert(
                    Encoding.UTF8,
                    Encoding.GetEncoding(
                        // WebName is the registered name; EncodingName is a display name.
                        Encoding.ASCII.WebName,
                        new EncoderReplacementFallback(string.Empty),
                        new DecoderExceptionFallback()
                    ),
                    Encoding.UTF8.GetBytes(path)
                )
            );

            return path;
        }
    }
}