CycleTLS
Empty body with strangely encoded url
Description
Try opening the following URL with CycleTLS:
https://img.asuracomics.com/unsafe/fit-in/200x260/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg
This results in an empty body, and no error is reported either.
Other URLs, such as:
https://img.asuracomics.com/unsafe/fit-in/200x260/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/09/InsanelyTalentedPlayerCover01.png
Work without issue.
My guess is that the odd URL encoding is the cause.
Issue Type
Bug
Operating System
Linux
Node Version
None
Golang Version
Other
Relevant Log Output
No response
both this
package main

import (
	"encoding/base64"
	"log"
	"os"

	"github.com/Danny-Dasilva/CycleTLS/cycletls"
)

func main() {
	client := cycletls.Init()
	response, err := client.Do("https://img.asuracomics.com/unsafe/fit-in/200x260/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg", cycletls.Options{
		Body:      "",
		Ja3:       "771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513,29-23-24,0",
		UserAgent: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
	}, "GET")
	if err != nil {
		log.Print("Request Failed: " + err.Error())
	}
	dec, err := base64.StdEncoding.DecodeString(response.Body)
	if err != nil {
		panic(err)
	}
	//create file to write
	f, err := os.Create("test.jpeg")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	//write b64 to file
	if _, err := f.Write(dec); err != nil {
		panic(err)
	}
	if err := f.Sync(); err != nil {
		panic(err)
	}
}
and this
const initCycleTLS = require("cycletls");
var fs = require("fs");

//Function to write image to a file
const writeImage = (filename, data) => {
  let writeStream = fs.createWriteStream(filename);
  // write some data with a base64 encoding
  writeStream.write(data, "base64");
  writeStream.on("finish", () => {
    console.log(`wrote to file ${filename}`);
  });
  // close the stream
  writeStream.end();
};

(async () => {
  const cycleTLS = await initCycleTLS();
  // try {
  const jpegImage = await cycleTLS("https://img.asuracomics.com/unsafe/fit-in/200x260/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg", {
    ja3: "771,4865-4867-4866-49195-49199-52393-52392-49196-49200-49162-49161-49171-49172-51-57-47-53-10,0-23-65281-10-11-35-16-5-51-43-13-45-28-21,29-23-24-25-256-257,0",
    userAgent:
      "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
  });
  //Write Image
  writeImage("test.jpeg", jpegImage.body);
  cycleTLS.exit();
})();
seem to work for me. Body data is returned, and the output can be written to an image file in both cases. Can you add your Linux version so I can better try to reproduce it?
Here is the actual raw URL that fails. The previously posted one was auto-corrected, which is why it worked.
https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/¸A°×AC_½AA¼AY´A_AµAcAuc_A¸AIAE²_AOA¾.jpg
Notice the:
The character U+00b8 "¸" could be confused with the ASCII character U+002c ",", which is more common in source code
The character U+00d7 "×" could be confused with the ASCII character U+0078 "x", which is more common in source code
...
In other words, odd characters that do not belong in any normal URL. So the next step is checking the encoding...
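To make that check concrete, here is a minimal Go sketch that lists every non-ASCII character in a URL along with its code point. The helper name nonASCIIRunes is hypothetical and not part of CycleTLS; it only assumes the URL is held as a UTF-8 string.

```go
package main

import "fmt"

// nonASCIIRunes (hypothetical helper) returns every rune in s that lies
// outside the 7-bit ASCII range, in order of appearance.
func nonASCIIRunes(s string) []rune {
	var out []rune
	for _, r := range s {
		if r > 0x7F {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// The raw URL from this thread.
	raw := "https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/¸A°×AC_½AA¼AY´A_AµAcAuc_A¸AIAE²_AOA¾.jpg"
	for _, r := range nonASCIIRunes(raw) {
		// %U prints the code point (for example U+00B8), %q the quoted character.
		fmt.Printf("%U %q\n", r, r)
	}
}
```

An empty result means the URL is plain ASCII and the encoding is probably not the problem.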
If you use:
fmt.Println(url.PathEscape(rawURL))
You get:
https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format%28webp%29/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg
But this fails. Change the "%28webp%29" back to "(webp)" and bingo, it works. It looks like Go's PathEscape is too aggressive.
So the issue is that the URL is not correctly encoded. And if you use the URL directly (from scraping), ... And CycleTLS chokes on that raw URL.
I love testing things on Asura's website because they are constantly trying to prevent scraping and pull a lot of tricks.
This is caused by the escaping of (), similar to this bug: https://github.com/golang/go/issues/63586
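One possible workaround, sketched below in Go, is to percent-encode only the non-ASCII bytes of the scraped URL and leave every ASCII character, including the parentheses, untouched. The helper name escapeNonASCII is hypothetical and this is not something CycleTLS does itself. It assumes the input is UTF-8 and that the server accepts the unescaped parentheses, as observed above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// escapeNonASCII (hypothetical helper) percent-encodes every byte >= 0x80
// and copies all ASCII bytes through unchanged, so "(", ")", "/" and ":"
// keep their literal form.
func escapeNonASCII(raw string) string {
	var b strings.Builder
	for i := 0; i < len(raw); i++ {
		c := raw[i]
		if c < 0x80 {
			b.WriteByte(c)
			continue
		}
		fmt.Fprintf(&b, "%%%02X", c)
	}
	return b.String()
}

func main() {
	// For comparison: url.PathEscape on a single path segment also
	// escapes the parentheses.
	fmt.Println(url.PathEscape("filters:format(webp)"))

	// The raw URL from this thread, with only the odd characters encoded.
	raw := "https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/¸A°×AC_½AA¼AY´A_AµAcAuc_A¸AIAE²_AOA¾.jpg"
	fmt.Println(escapeNonASCII(raw))
}
```

Existing percent-escapes such as %20 are left alone, so the helper can be applied to URLs that are already partly encoded.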