WebToEpub
Help write parser for https://ffxs8.com
Hello friend, I am trying to learn to write my own parsers but I don't understand much. Can you tell me what the error is?
If you could tell me where I went wrong, I would appreciate it.
@Alastorwill hi, your problem is in line 11:
let tocUrl = dom.querySelector(".detail .detail-btn .read").href;
The error message says that it can't read href.
example link: https://ffxs8.com/trxs/16224.html
Solution:
let tocUrl = dom.querySelector(".detail .detail-btn .read").firstChild.getAttribute("href");
tocUrl is "/trxs/16224/1.html"
Or:
let tocUrl = dom.querySelector(".detail .detail-btn .read").firstChild.href;
tocUrl is "https://ffxs8.com/trxs/16224/1.html"
it didn't work for me
These are the results. I really don't know if the .detail .detail-btn .read part is correct; since I don't have much experience, I looked at a different parser and searched for the equivalent parts on this page. I'm not sure if I went wrong on that part, since I didn't understand it.
@Alastorwill
This works for me. Tested with
- https://www.ffxs8.com/trxs/16224.html, chapters 1 and 2
Notes:
- As the above page seems to have the list of chapters, there's no need to make an additional call to get the list; just read it from the initial web page
- Character set is gb2312, not gb18030
- 48 minutes work
"use strict";
parserFactory.register("ffxs8.com", () => new Ffxs8Parser());
class Ffxs8Parser extends Parser{
constructor() {
super();
}
async getChapterUrls(dom) {
let menu = dom.querySelector("div.catalog");
return util.hyperlinksToChapterList(menu);
}
findContent(dom) {
return dom.querySelector("div.content");
}
findChapterTitle(dom) {
return dom.querySelector("div.article h1");
}
findCoverImageUrl(dom) {
return util.getFirstImgSrc(dom, "div.cover");
}
async fetchChapter(url) {
return (await HttpClient.wrapFetch(url, this.makeOptions())).responseXML;
}
makeOptions() {
return ({
makeTextDecoder: () => new TextDecoder("gb2312")
});
}
getInformationEpubItemChildNodes(dom) {
return [...dom.querySelectorAll("div.descInfo")];
}
}
Thanks for helping me, I'm already understanding more. I tried another page that was similar, made some modifications, and it worked for me; some pages are more complicated than others. By the way, do you know how to add a way to log in? There are pages that don't let you see the entire chapter unless you are registered on the site, but WebToEpub does not recognize the account even if I'm logged in in the browser, so I want to know if there is a way to put the account in WebToEpub.
@Alastorwill
so I want to know if there is a way to put the account in webtoepub
Most sites do this by setting cookies. And WebToEpub will use the cookies in the browser for the site. So, all you should need to do is log onto the site normally, and then run WebToEpub.
Failing that, you need to see how the logon works. Usually, POSTing a form with the credentials.
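If you do need to POST the logon form yourself, the request usually looks something like the sketch below. The /login path and the username/password field names here are made up; copy the real endpoint and field names from the Network tab when you log in manually.
// Hypothetical sketch only: the endpoint and field names must come from the real site.
let body = new URLSearchParams();
body.append("username", "myUser");
body.append("password", "myPassword");
let response = await fetch("https://example.com/login", {
    method: "POST",
    body: body,
    credentials: "include"   // lets the browser store the session cookie
});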
Sorry again, this is a parser I made and it had been working for me, but suddenly it no longer works and WebToEpub sends me to the manual parser page as if I didn't have one for that site. I checked whether they had changed the domain, but they hadn't. I don't know if they changed something that I haven't accounted for; this only used to happen to me when the parser had an error, but I had already been using this one without problems.
I also tried https://ffxs8.com but it doesn't work either. It seems like they changed something on those pages, but I don't know what it could be; everything looks the same.
@Alastorwill
I also tried https://ffxs8.com/ but it doesn't work either,
I tried https://www.ffxs8.com/trxs/16224.html, chapters 1 & 2. They worked fine for me.
As regards trxs.me, it looks like it should work, although I've changed the CSS for getChapterUrls; refer to the commit above. Also,
- Please don't send me screen shots of code. Paste the actual code in, I don't want to spend time typing it in.
- Create a new issue, for a new site.
That said, test versions for Firefox and Chrome (with code for the above sites) have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:
- https://trxs.me/tongren/8595.html, chapters 1 & 2
- https://trxs.me/tongren/8596.html, chapter 1
For my notes: 34 minutes work
Sorry for bothering you again. I was writing the parser for this page but it tells me that the content cannot be found.
"use strict";
parserFactory.register("mtlnation.com", function() { return new MtlnationParser() });
class MtlnationParser extends Parser{ constructor() { super(); }
async getChapterUrls(dom) {
let menu = dom.querySelector("div.chapters.my-shadow");
return util.hyperlinksToChapterList(menu);
}
findContent(dom) {
return dom.querySelector("div.desc.font-poppins");
}
findChapterTitle(dom) {
return dom.querySelector("div.name-chapter");
}
findCoverImageUrl(dom) {
return util.getFirstImgSrc(dom, "div.text-center");
}
removeUnwantedElementsFromContentElement(element) { util.removeChildElementsMatchingCss(element, ".#row"); super.removeUnwantedElementsFromContentElement(element); }
getInformationEpubItemChildNodes(dom) {
return [...dom.querySelectorAll("div.descInfo")];
}
}
Error: Could not find content element for web page 'https://mtlnation.com/novel/comprehensive-manga-you-were-taken-in/chapter-1-twelve-trials'. at chrome-extension://lljhlogpiejifggpcabgbckpmpoplefk/js/Parser.js:501:23 at async Promise.all (index 0) at async MtlnationParser.fetchWebPages (chrome-extension://lljhlogpiejifggpcabgbckpmpoplefk/js/Parser.js:470:17)
@Alastorwill
The owners of the mtlnation site have requested that WebToEpub does not process their site.
However, if you want to build a parser for your own use, you'll need to learn how to use Chrome's Network tab under the Developer tools.
If you use that, you will see that the page https://mtlnation.com/novel/comprehensive-manga-you-were-taken-in/chapter-1-twelve-trials does not contain any content. Instead, it makes a REST call to https://api.mtlnation.com/api/v2/chapters/comprehensive-manga-you-were-taken-in/chapter-1-twelve-trials, which returns a JSON object with the chapter's content in data.content.
So, have a look at the fetchChapter function in the InoveltranslationParser for how to handle something like this. https://github.com/dteviot/WebToEpub/blob/902e259f23440b3380e53921b7ac4bf03f916179/plugin/js/parsers/InoveltranslationParser.js#L53-L67
I couldn't understand how the inoveltranslation.com parser works; it is tangled up with the other code. Can you give me an explanation of how that code works? I tried to modify it but ended up not even knowing where I was going.
@Alastorwill
The fetchChapter() function will fetch a chapter when the URL provided doesn't have the chapter's content.
Step 1, convert the URL of the given page into the URL that will give you the wanted content. So, replace "//mtlnation.com/novel/" with "//api.mtlnation.com/api/v2/chapters/", e.g.
let apiUrl = url.replace("//mtlnation.com/novel/", "//api.mtlnation.com/api/v2/chapters/");
Step 2, make the REST call and get the JSON response
let json = (await HttpClient.fetchJson(apiUrl)).json;
Step 3, create an empty chapter document and put the content from the JSON into it
let newDoc = Parser.makeEmptyDocForContent(url);
this.appendElement(newDoc, "h1", this.titleFromJson(json));
this.appendParagraphs(newDoc, json.content);
Step 4, return the chapter
return newDoc.dom;
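Putting those four steps together, fetchChapter() for this parser would look roughly like the sketch below. The appendElement(), appendParagraphs() and titleFromJson() helpers are assumed to exist on the parser, the same way InoveltranslationParser uses them, and whether the text sits in json.content or json.data.content is something to confirm from the actual API response.
async fetchChapter(url) {
    // Step 1: turn the page URL into the matching REST API URL
    let apiUrl = url.replace("//mtlnation.com/novel/", "//api.mtlnation.com/api/v2/chapters/");
    // Step 2: fetch the chapter's JSON from the API
    let json = (await HttpClient.fetchJson(apiUrl)).json;
    // Step 3: build an empty chapter document and fill it from the JSON
    let newDoc = Parser.makeEmptyDocForContent(url);
    this.appendElement(newDoc, "h1", this.titleFromJson(json));
    this.appendParagraphs(newDoc, json.content);
    // Step 4: return the DOM of the new document
    return newDoc.dom;
}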
I have been trying as you told me and it hasn't worked; I always get error 401. This is the last version I tried. Can you tell me if you see something wrong?
"use strict";
parserFactory.register("mtlnation.com", function() { return new MtlnationParser() });
class MtlnationParser extends Parser{ constructor() { super(); }
async getChapterUrls(dom) {
let menu = dom.querySelector("div.chapters.my-shadow");
return util.hyperlinksToChapterList(menu);
}
findContent(dom) {
return dom.querySelector("div.desc.font-poppins");
}
findChapterTitle(dom) {
return dom.querySelector("div.name-chapter");
}
findCoverImageUrl(dom) {
return util.getFirstImgSrc(dom, "div.text-center");
}
async fetchChapter(url) {
let apiUrl = url.replace("mtlnation.com/novel/", "api.mtlnation.com/api/v2/chapters");
let json = (await HttpClient.fetchJson(apiUrl)).json;
let new = Parser.makeEmptyDocForContent(url);
this.appendElement(new, "h1", this.titleFromJson(json)); this.appendParagraphs(new, json.content);
return new.dom;
}
removeUnwantedElementsFromContentElement(element) { util.removeChildElementsMatchingCss(element, ".#row"); super.removeUnwantedElementsFromContentElement(element); }
getInformationEpubItemChildNodes(dom) {
return [...dom.querySelectorAll("div.descInfo")];
}
}
@Alastorwill
HTTP 401 error usually means you don't have access permissions. Which means:
- Wrong URL,
- Wrong cookies, or
- You're running into anti-scraping protection.
At this point, I try to compare the network trace between what works when I browse manually and what WebToEpub sends, to see if there's an obvious difference/mistake.
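One simple way to make that comparison is to call the API yourself from the DevTools console, first in a tab where mtlnation.com is open (and you are logged in), then in a clean tab. If the first call also returns 401, the problem is cookies/headers rather than the parser; note that CORS may block the call from an unrelated tab, which is itself a hint that the API expects requests from its own pages.
// Run in the DevTools console; this is the chapter URL from the error message above.
let r = await fetch("https://api.mtlnation.com/api/v2/chapters/comprehensive-manga-you-were-taken-in/chapter-1-twelve-trials",
    { credentials: "include" });
console.log(r.status);        // 401 here too means the API wants cookies or extra headers
console.log(await r.text());  // the response body often says what is missing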
I searched everything and couldn't find anything; I'm baffled. I don't understand why it keeps failing.