YouTube channels with unicode names are not accepted by YouTube.getChannelExtractor

Open sadr0b0t opened this issue 5 years ago • 0 comments

Hello, I am using NewPipeExtractor 0.20.1.

I am trying to work with a YouTube channel that has Unicode characters in its URL: https://www.youtube.com/c/СтудияДиафильм

NewPipeExtractor throws a ParsingException for it. At first I thought that java.net.URL would not accept a UTF-8 URL string, so I converted the URL to ASCII, but I got the same exception with the ASCII representation as well:

        try {
            String plUrl = "https://www.youtube.com/c/СтудияДиафильм";
            String plUrlAscii = new java.net.URI(plUrl).toASCIIString();
            System.out.println(plUrlAscii);

            java.net.URL url1 = new java.net.URL(plUrl);
            System.out.println(url1.toString());
            java.net.URL url2 = new java.net.URL(plUrlAscii);
            System.out.println(url2.toString());

            YouTube.getChannelExtractor(url2.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }

Constructing a java.net.URL works both from the UTF-8 URL and from the ASCII-escaped one; only the NewPipeExtractor call fails:

System.out: https://www.youtube.com/c/%D0%A1%D1%82%D1%83%D0%B4%D0%B8%D1%8F%D0%94%D0%B8%D0%B0%D1%84%D0%B8%D0%BB%D1%8C%D0%BC
System.out: https://www.youtube.com/c/СтудияДиафильм
System.out: https://www.youtube.com/c/%D0%A1%D1%82%D1%83%D0%B4%D0%B8%D1%8F%D0%94%D0%B8%D0%B0%D1%84%D0%B8%D0%BB%D1%8C%D0%BC
 org.schabi.newpipe.extractor.exceptions.ParsingException: Malformed unacceptable url: https://www.youtube.com/c/%D0%A1%D1%82%D1%83%D0%B4%D0%B8%D1%8F%D0%94%D0%B8%D0%B0%D1%84%D0%B8%D0%BB%D1%8C%D0%BC
     at org.schabi.newpipe.extractor.linkhandler.LinkHandlerFactory.fromUrl(LinkHandlerFactory.java:54)
     at org.schabi.newpipe.extractor.linkhandler.ListLinkHandlerFactory.fromUrl(ListLinkHandlerFactory.java:43)
     at org.schabi.newpipe.extractor.linkhandler.ListLinkHandlerFactory.fromUrl(ListLinkHandlerFactory.java:36)
     at org.schabi.newpipe.extractor.StreamingService.getChannelExtractor(StreamingService.java:253)
...

It looks very much like the URL (both Unicode and ASCII-escaped) is rejected somewhere around here, in YouTubeChannelLinkHandlerFactory.getId(url):

            if (id == null || !id.matches("[A-Za-z0-9_-]+")) {
                throw new ParsingException("The given id is not a Youtube-Video-ID");
            }
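
To illustrate why a check like this would reject both forms of the URL, here is a minimal standalone sketch. It does not use NewPipeExtractor at all. The class name ChannelIdCheck, its methods, and the relaxed pattern are hypothetical and only there for illustration; they are not the library's code or a proposed fix. The strict pattern is copied from the snippet above. Note also that the message in the stack trace ("Malformed unacceptable url") differs from the message in this check, so the actual rejection may happen in a different place.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NewPipeExtractor.
final class ChannelIdCheck {
    // Same character class as in the check quoted above.
    private static final Pattern STRICT = Pattern.compile("[A-Za-z0-9_-]+");
    // Assumption: a relaxed variant that allows any Unicode letter or digit.
    private static final Pattern RELAXED = Pattern.compile("[\\p{L}\\p{N}_-]+");

    // Percent-decodes a path segment as UTF-8.
    // Throws IllegalArgumentException on malformed escapes.
    static String decode(String segment) {
        try {
            return URLDecoder.decode(segment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    static boolean matchesStrict(String id) {
        return id != null && STRICT.matcher(id).matches();
    }

    // Decodes first, then matches against the relaxed pattern.
    static boolean matchesRelaxed(String id) {
        return id != null && RELAXED.matcher(decode(id)).matches();
    }

    public static void main(String[] args) {
        // Escaped channel name taken from the output above.
        String escapedId = "%D0%A1%D1%82%D1%83%D0%B4%D0%B8%D1%8F"
                + "%D0%94%D0%B8%D0%B0%D1%84%D0%B8%D0%BB%D1%8C%D0%BC";
        // Decoding yields the Cyrillic channel name from the original URL.
        String unicodeId = decode(escapedId);

        System.out.println(matchesStrict(unicodeId));  // false: Cyrillic letters
        System.out.println(matchesStrict(escapedId));  // false: '%' is not allowed
        System.out.println(matchesRelaxed(unicodeId)); // true
        System.out.println(matchesRelaxed(escapedId)); // true
    }
}
```

The strict pattern fails on the Unicode form because of the Cyrillic letters and on the escaped form because of the '%' character, which would explain why converting the URL to ASCII made no difference.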

sadr0b0t, Oct 19 '20 21:10