
Empty body with strangely encoded url

Open benjiro29 opened this issue 1 year ago • 3 comments

Description

Try opening the following URL with CycleTLS:

https://img.asuracomics.com/unsafe/fit-in/200x260/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg

This results in an empty body, and no error is reported either.

Other URLs, such as:

https://img.asuracomics.com/unsafe/fit-in/200x260/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/09/InsanelyTalentedPlayerCover01.png

Work without issue.

I am guessing the odd URL encoding is the issue.

Issue Type

Bug

Operating System

Linux

Node Version

None

Golang Version

Other

Relevant Log Output

No response

benjiro29 avatar Dec 05 '23 19:12 benjiro29

Both this

package main

import (
	"encoding/base64"
	"log"
	"os"

	"github.com/Danny-Dasilva/CycleTLS/cycletls"
)

func main() {
	client := cycletls.Init()
	response, err := client.Do("https://img.asuracomics.com/unsafe/fit-in/200x260/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg", cycletls.Options{
		Body:      "",
		Ja3:       "771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513,29-23-24,0",
		UserAgent: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
	}, "GET")
	if err != nil {
		// Stop here: a failed request leaves no body worth decoding
		log.Fatal("Request Failed: " + err.Error())
	}

	// Decode the base64 response body into raw image bytes
	dec, err := base64.StdEncoding.DecodeString(response.Body)
	if err != nil {
		panic(err)
	}

	// Create the file to write
	f, err := os.Create("test.jpeg")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Write the decoded bytes to the file and flush them to disk
	if _, err := f.Write(dec); err != nil {
		panic(err)
	}
	if err := f.Sync(); err != nil {
		panic(err)
	}
}

and this

const initCycleTLS = require("cycletls");
var fs = require("fs");

//Function to write image to a file
const writeImage = (filename, data) => {
  let writeStream = fs.createWriteStream(filename);

  // write some data with a base64 encoding
  writeStream.write(data, "base64");
  writeStream.on("finish", () => {
    console.log(`wrote to file ${filename}`);
  });
  
  // close the stream
  writeStream.end();
};

(async () => {
  const cycleTLS = await initCycleTLS();
  const jpegImage = await cycleTLS("https://img.asuracomics.com/unsafe/fit-in/200x260/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg", {
    ja3: "771,4865-4867-4866-49195-49199-52393-52392-49196-49200-49162-49161-49171-49172-51-57-47-53-10,0-23-65281-10-11-35-16-5-51-43-13-45-28-21,29-23-24-25-256-257,0",
    userAgent:
      "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
  });
  //Write Image
  writeImage("test.jpeg", jpegImage.body);

  cycleTLS.exit();
})();

seem to work for me. Body data is returned, and the output can be written to an image file in both cases. Can you add your Linux version so I can better try to reproduce this?

Danny-Dasilva avatar Dec 05 '23 22:12 Danny-Dasilva

Here is the actual raw URL that fails. The one I posted previously was auto-corrected, which is why it worked.

https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/¸A°×AC_½AA¼AY´A_AµAcAuc_A¸AIAE²_AOA¾.jpg

Notice the warnings:

The character U+00b8 "¸" could be confused with the ASCII character U+002c ",", which is more common in source code
The character U+00d7 "×" could be confused with the ASCII character U+0078 "x", which is more common in source code
...

In other words, odd characters that would be out of place in any normal URL. So the next step is checking the encoding...
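A quick way to see which characters are involved is to list every rune outside the ASCII range together with its byte offset. This is only a minimal sketch: the helper name nonASCIIRunes and the found type are made up for illustration, and the URL in main is a shortened stand-in for the raw one above.

```go
package main

import (
	"fmt"
	"unicode"
)

// found records one non-ASCII rune and the byte offset where it starts.
// (Hypothetical helper type, not part of CycleTLS.)
type found struct {
	Offset int
	Rune   rune
}

// nonASCIIRunes returns every rune in s that lies outside the ASCII range.
func nonASCIIRunes(s string) []found {
	var out []found
	for i, r := range s {
		if r > unicode.MaxASCII {
			out = append(out, found{Offset: i, Rune: r})
		}
	}
	return out
}

func main() {
	// Shortened stand-in for the raw URL from the issue.
	raw := "https://asuratoon.com/wp-content/uploads/2023/12/¸A°×AC_½AA.jpg"
	for _, f := range nonASCIIRunes(raw) {
		fmt.Printf("byte %d: %q (U+%04X)\n", f.Offset, f.Rune, f.Rune)
	}
}
```

Any hit here means the URL still contains unescaped bytes and was not percent-encoded before being handed to the client.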

If you use:

fmt.Println(url.PathEscape(rawURL))

You get:

https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format%28webp%29/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg

But this fails. Change the "%28webp%29" back to "(webp)" and bingo, it works. It looks like Go's PathEscape is too aggressive.

So the issue is that the URL is not correctly encoded. And if you use the URL directly (from scraping), ... CycleTLS chokes on that raw URL.
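One possible workaround, sketched here under assumptions rather than taken from CycleTLS itself, is to percent-encode only the non-ASCII bytes and leave every ASCII character, including the parentheses, as it is. The helper name escapeNonASCII is hypothetical, and the sketch assumes the scraped URL is a UTF-8 string. It does not escape spaces or other ASCII characters that may also need encoding.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeNonASCII percent-encodes every byte >= 0x80 and copies all ASCII
// bytes through unchanged, so "(webp)" and existing %XX escapes survive.
// (Hypothetical helper; assumes raw is UTF-8 encoded.)
func escapeNonASCII(raw string) string {
	var b strings.Builder
	for i := 0; i < len(raw); i++ {
		c := raw[i]
		if c < 0x80 {
			b.WriteByte(c)
			continue
		}
		fmt.Fprintf(&b, "%%%02X", c)
	}
	return b.String()
}

func main() {
	raw := "https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/¸A°×AC_½AA¼AY´A_AµAcAuc_A¸AIAE²_AOA¾.jpg"
	fmt.Println(escapeNonASCII(raw))
}
```

The result keeps "(webp)" literal while turning characters such as "¸" into "%C2%B8", which is the form of the URL reported as working earlier in this thread.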

I love to test things on Asura's website because they are constantly trying to prevent scraping and pull a lot of tricks.

benjiro29 avatar Dec 06 '23 01:12 benjiro29

This is caused by how the parentheses "()" are handled, similar to this bug: https://github.com/golang/go/issues/63586

gospider007 avatar Dec 06 '23 05:12 gospider007