PatreonDownloader

Odd UTF-8 sequences break title formatting

Open · shinji257 opened this issue 2 years ago · 2 comments

Odd sequences that produce so-called "magic" or "cursed" text cause strange behavior when the post title folder is generated, and the folder name ends up broken. I even took precautions to prevent the name from ending in a space. These sequences should probably be filtered out so they can't break the naming.

https://www.patreon.com/posts/42453989

shinji257 · Mar 13 '22 20:03

Here is something that helps break it down. It looks like these characters are really just normal Latin letters with a ton of stacked combining accents. Maybe catching/removing those will take care of this?

Yes.... I know. Really long url. https://qaz.wtf/u/show.cgi?show=S%CC%B7%CC%BE%CC%81%CD%80%CD%9B%CC%8E%CC%8D%CC%88%CC%80%CC%8A%CC%BF%CD%83%CD%9B%CC%97%CC%A8h%CC%B6%CC%8E%CC%80%CC%BD%CC%85%CC%80%CC%BF%CD%91%CC%8F%CC%9A%CD%9B%CC%91%CC%88%CC%BD%CC%8D%CC%AE%CC%99%CD%96%CD%93%CC%A1%CC%AD%CC%9D%CD%88%CC%AD%CD%8E%CC%99%CD%8De%CC%B6%CD%8B%CC%82%CC%88%CC%87%CC%BD%CD%9D%CD%8C%CC%87%CD%83%CD%94%CC%AA%CC%96%CD%87%CD%8E%CD%88%CC%BA%CC%AE%CC%BB%CC%B2+%CC%B7%CC%BF%CC%BF%CC%93%CD%8B%CC%88%CC%81%CC%AB%CD%89%CC%BA%CC%9E%CC%99%CD%95%CC%9F%CC%AE%CC%A3%CC%9C%CC%A0%CC%B2%CC%ACA%CC%B5%CD%92%CC%80%CC%8A%CC%BD%CD%8A%CC%BF%CD%98%CC%8B%CC%8F%CD%87%CD%88%CD%87%CC%B0%CC%BA%CC%BA%CC%97%CD%93%CD%89%CC%B3%CD%96%CD%87%CC%B3%CD%9A%CC%B0p%CC%B8%CD%8B%CC%86%CC%82%CC%9B%CD%9B%CD%8C%CD%84%CC%82%CC%80%CD%92%CC%A0%CD%89%CC%A3%CD%89%CD%96%CC%A3%CC%99%CD%87%CC%AAp%CC%B8%CC%8B%CD%86%CC%92%CC%BF%CD%83%CD%91%CD%83%CC%A0%CD%8D%CD%88%CC%A0%CC%98%CD%88%CD%93%CD%8D%CC%B3%CD%94%CC%99%CC%96%CC%9F%CD%99r%CC%B7%CD%8B%CD%8C%CC%95%CC%93%CD%8A%CD%84%CC%A6%CD%9A%CC%98%CC%A6o%CC%B8%CC%83%CC%8E%CC%89%CC%83%CD%83%CD%84%CC%9A%CC%81%CC%82%CD%9B%CC%BD%CD%9B%CD%81%CC%92%CC%9A%CC%ADa%CC%B8%CD%82%CC%91%CD%9D%CC%95%CC%86%CC%9B%CC%BF%CC%8F%CC%87%CD%90%CC%8C%CC%9B%CD%8C%CC%8E%CC%B1%CC%A9%CC%97%CD%93%CC%AA%CC%AC%CC%A0%CC%AA%CD%9C%CC%A5c%CC%B4%CC%9B%CD%8A%CC%BE%CD%9D%CC%80%CC%94%CD%97%CD%91%CD%8A%CD%97%CC%9A%CC%84%CC%98%CD%88%CC%9Ch%CC%B4%CC%81%CD%87%CC%96%CC%A0%CD%94%CC%A7%CC%B2%CC%BB%CD%95%CC%96%CD%96%CD%9C%CC%98e%CC%B7%CC%87%CD%91%CC%8A%CC%8E%CC%BC%CC%B1%CC%AF%CD%9A%CD%87%CD%99%CC%A7%CD%8D%CC%A6%CD%99%CC%A9%CD%89%CC%9Es%CC%B8%CD%9D%CC%90%CD%A0%CC%A2&type=string
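To see that structure in code, here is a small C# sketch that walks a string and counts how many combining marks (Unicode categories Mn and Me) are stacked on each base character, roughly what the qaz.wtf viewer shows. `CursedTextInspector` and `CountMarksPerBase` are hypothetical names for illustration, not part of PatreonDownloader. It inspects UTF-16 code units, so combining marks outside the Basic Multilingual Plane are not detected.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical diagnostic helper (not part of PatreonDownloader): pairs each base
// character with the number of combining marks that follow it.
public static class CursedTextInspector
{
    public static List<(char Base, int MarkCount)> CountMarksPerBase(string text)
    {
        var result = new List<(char Base, int MarkCount)>();

        // Works on UTF-16 code units; supplementary-plane marks are counted as bases.
        foreach (char c in text)
        {
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
            bool isMark = category == UnicodeCategory.NonSpacingMark
                       || category == UnicodeCategory.EnclosingMark;

            if (isMark && result.Count > 0)
            {
                // Attach the mark to the most recent base character.
                var last = result[result.Count - 1];
                result[result.Count - 1] = (last.Base, last.MarkCount + 1);
            }
            else
            {
                // A new base character (or a leading mark with nothing to attach to).
                result.Add((c, 0));
            }
        }

        return result;
    }
}
```

For the first letter of the example title (the "S" followed by the 15 percent-encoded combining code points in the URL above), this reports a mark count of 15, while ordinary text such as "cafe" + U+0301 reports at most one mark per letter.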

shinji257 · Mar 13 '22 20:03

I hit a similar issue, and it appears to break file naming as well...

Something like this in PathSanitizer.cs should help:

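using System.IO;
using System.Linq;
using System.Text;

namespace UniversalDownloaderPlatform.Common.Helpers {
  public class PathSanitizer {
    // Path.GetInvalidPathChars() is platform-dependent, so the characters Windows
    // forbids in file and folder names are added explicitly.
    private static readonly char[] _invalidPathCharacters;

    // ASCII encoding whose encoder fallback replaces every non-ASCII character
    // (including combining marks) with an empty string. The IANA name "us-ascii"
    // is used rather than the display name from Encoding.ASCII.EncodingName.
    private static readonly Encoding _strippingAscii = Encoding.GetEncoding(
      "us-ascii",
      new EncoderReplacementFallback(string.Empty),
      new DecoderExceptionFallback());

    static PathSanitizer() {
      _invalidPathCharacters = Path.GetInvalidPathChars().Concat("\\/:*?\"<>|".ToCharArray()).ToArray();
    }

    public static string SanitizePath(string path) {
      foreach (char c in _invalidPathCharacters) {
        path = path.Replace(c, '_');
      }

      // Encode to ASCII (dropping anything non-ASCII) and decode back. This removes
      // "cursed" text, but it also removes legitimate non-Latin and accented characters.
      path = _strippingAscii.GetString(_strippingAscii.GetBytes(path));

      // Windows does not allow file or folder names that end in a space or a period.
      return path.TrimEnd(' ', '.');
    }
  }
}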
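The ASCII round-trip works, but it is lossy: a title written entirely in Japanese, for example, comes out as an empty string. A gentler option, sketched here under the assumption that keeping non-Latin titles readable matters, is to NFC-normalize the name and then cap how many combining marks may follow each base character. `UnicodeFriendlySanitizer`, `SanitizeFileName`, and the default limit of 2 are hypothetical choices for illustration, not part of PatreonDownloader or UniversalDownloaderPlatform.

```csharp
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical alternative (not the project's API): keeps non-ASCII titles but
// flattens "cursed" text by limiting stacked combining marks.
public static class UnicodeFriendlySanitizer
{
    // Platform invalid file-name characters plus the ones Windows always rejects.
    private static readonly char[] InvalidChars =
        Path.GetInvalidFileNameChars().Concat("\\/:*?\"<>|".ToCharArray()).Distinct().ToArray();

    public static string SanitizeFileName(string name, int maxMarksPerBase = 2)
    {
        // Compose first so ordinary accented letters ("e" + U+0301) become a single
        // code point ("é") and do not count against the limit.
        string composed = name.Normalize(NormalizationForm.FormC);
        var sb = new StringBuilder(composed.Length);
        int marksSinceBase = 0;

        foreach (char c in composed)
        {
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
            bool isMark = category == UnicodeCategory.NonSpacingMark
                       || category == UnicodeCategory.EnclosingMark;

            if (isMark)
            {
                // Keep only the first maxMarksPerBase marks after each base character.
                if (marksSinceBase++ < maxMarksPerBase) sb.Append(c);
            }
            else
            {
                marksSinceBase = 0;
                sb.Append(InvalidChars.Contains(c) ? '_' : c);
            }
        }

        // Windows rejects names that end in a space or a period.
        return sb.ToString().TrimEnd(' ', '.');
    }
}
```

Scripts that legitimately place several nonspacing marks on one character (some Indic or Thai text, for instance) may need a higher limit. The loop checks UTF-16 code units, so marks outside the Basic Multilingual Plane pass through unchanged.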

ootz0rz · Mar 19 '22 20:03