WebToEpub
Help write parser for https://ffxs8.com
Hello friend, I am trying to learn to write my own parsers but I don't understand much. Can you tell me what the error is?
If you could tell me where I went wrong, I would appreciate it.
@Alastorwill hi, your problem is in line 11:
let tocUrl = dom.querySelector(".detail .detail-btn .read").href;
The error message says that it can't read href.
example link: https://ffxs8.com/trxs/16224.html
Solution:
let tocUrl = dom.querySelector(".detail .detail-btn .read").firstChild.getAttribute("href");
tocUrl is "/trxs/16224/1.html"
Or:
let tocUrl = dom.querySelector(".detail .detail-btn .read").firstChild.href;
tocUrl is "https://ffxs8.com/trxs/16224/1.html"
it didn't work for me
These are the results. I really don't know if the .detail .detail-btn .read part is correct; since I don't have much experience, I looked at a different parser and searched for the equivalent parts on this page. I'm not sure if I went wrong on that part, since I didn't understand it.
@Alastorwill
This works for me. Tested with
- https://www.ffxs8.com/trxs/16224.html, chapters 1 and 2
Notes:
- As the above page seems to have the list of chapters, there's no need to make an additional call to get the list; just read it from the initial web page
- Character set is gb2312, not gb18030
- 48 minutes work
"use strict";
parserFactory.register("ffxs8.com", () => new Ffxs8Parser());
class Ffxs8Parser extends Parser{
constructor() {
super();
}
async getChapterUrls(dom) {
let menu = dom.querySelector("div.catalog");
return util.hyperlinksToChapterList(menu);
}
findContent(dom) {
return dom.querySelector("div.content");
}
findChapterTitle(dom) {
return dom.querySelector("div.article h1");
}
findCoverImageUrl(dom) {
return util.getFirstImgSrc(dom, "div.cover");
}
async fetchChapter(url) {
return (await HttpClient.wrapFetch(url, this.makeOptions())).responseXML;
}
makeOptions() {
return ({
makeTextDecoder: () => new TextDecoder("gb2312")
});
}
getInformationEpubItemChildNodes(dom) {
return [...dom.querySelectorAll("div.descInfo")];
}
}
Thanks for helping me, I'm already understanding more. I tried another page that was similar, made some modifications, and it worked for me; some pages are more complicated than others. By the way, do you know how to add a way to log in? There are pages that don't let you see the entire chapter unless you are registered on the site, but WebToEpub does not recognize the account even if I'm logged in in the browser, so I want to know if there is a way to put the account in WebToEpub.
@Alastorwill
so I want to know if there is a way to put the account in webtoepub
Most sites do this by setting cookies. And WebToEpub will use the cookies in the browser for the site. So, all you should need to do is log onto the site normally, and then run WebToEpub.
Failing that, you need to see how the logon works. Usually, POSTing a form with the credentials.
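If you do need to POST the logon form yourself, the request usually looks something like the sketch below. The /login path and the username/password field names here are made up; copy the real endpoint and field names from the Network tab when you log in manually.
// Hypothetical sketch only: the endpoint and field names must come from the real site.
let body = new URLSearchParams();
body.append("username", "myUser");
body.append("password", "myPassword");
let response = await fetch("https://example.com/login", {
    method: "POST",
    body: body,
    credentials: "include"   // lets the browser store the session cookie
});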
Sorry again, this is a parser I made and it had been working for me, but suddenly it no longer works and WebToEpub sends me to the manual parser page as if I didn't have one for that site. I checked whether they had changed the domain, but they hadn't. I don't know if they changed something that I haven't accounted for; this only used to happen to me when the parser had an error, but I had already been using this one without problems.
I also tried https://ffxs8.com but it doesn't work either. It seems like they changed something on those pages, but I don't know what it could be; everything looks the same.
@Alastorwill
I also tried https://ffxs8.com/ but it doesn't work either,
I tried https://www.ffxs8.com/trxs/16224.html, chapters 1 & 2. They worked fine for me.
As regards trxs.me, it looks like it should work, although I've changed the CSS for getChapterUrls; refer to the commit above. Also,
- Please don't send me screen shots of code. Paste the actual code in, I don't want to spend time typing it in.
- Create a new issue, for a new site.
That said, test versions for Firefox and Chrome (with code for the above sites) have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:
- https://trxs.me/tongren/8595.html, chapters 1 & 2
- https://trxs.me/tongren/8596.html, chapter 1
For my notes: 34 minutes work
Sorry for bothering you again. I was writing the parser for this page but it tells me that the content cannot be found.
"use strict";
parserFactory.register("mtlnation.com", function() { return new MtlnationParser() });
class MtlnationParser extends Parser{ constructor() { super(); }
async getChapterUrls(dom) {
let menu = dom.querySelector("div.chapters.my-shadow");
return util.hyperlinksToChapterList(menu);
}
findContent(dom) {
return dom.querySelector("div.desc.font-poppins");
}
findChapterTitle(dom) {
return dom.querySelector("div.name-chapter");
}
findCoverImageUrl(dom) {
return util.getFirstImgSrc(dom, "div.text-center");
}
removeUnwantedElementsFromContentElement(element) { util.removeChildElementsMatchingCss(element, ".#row"); super.removeUnwantedElementsFromContentElement(element); }
getInformationEpubItemChildNodes(dom) {
return [...dom.querySelectorAll("div.descInfo")];
}
}
Error: Could not find content element for web page 'https://mtlnation.com/novel/comprehensive-manga-you-were-taken-in/chapter-1-twelve-trials'. at chrome-extension://lljhlogpiejifggpcabgbckpmpoplefk/js/Parser.js:501:23 at async Promise.all (index 0) at async MtlnationParser.fetchWebPages (chrome-extension://lljhlogpiejifggpcabgbckpmpoplefk/js/Parser.js:470:17)
@Alastorwill
The owners of the mtlnation site have requested that WebToEpub does not process their site.
However, if you want to build a parser for your own use, you'll need to learn how to use Chrome's Network tab under the Developer tools.
If you use that, you will see that the page https://mtlnation.com/novel/comprehensive-manga-you-were-taken-in/chapter-1-twelve-trials does not contain any content. Instead, it makes a REST call to https://api.mtlnation.com/api/v2/chapters/comprehensive-manga-you-were-taken-in/chapter-1-twelve-trials, which returns a JSON object with the chapter's content in data.content.
So, have a look at the fetchChapter function in the InoveltranslationParser for how to handle something like this. https://github.com/dteviot/WebToEpub/blob/902e259f23440b3380e53921b7ac4bf03f916179/plugin/js/parsers/InoveltranslationParser.js#L53-L67
I couldn't understand how the inoveltranslation.com parser works; it is tangled up with the other code. Can you give me an explanation of how that code works? I tried to modify it but ended up not even knowing where I was going.
@Alastorwill
The fetchChapter() function will fetch a chapter when the URL provided doesn't have the chapter's content.
Step 1, convert the URL of the given page into the URL that will give you the wanted content. So, replace "//mtlnation.com/novel/" with "//api.mtlnation.com/api/v2/chapters/", e.g.
let apiUrl = url.replace("//mtlnation.com/novel/", "//api.mtlnation.com/api/v2/chapters/");
Step 2, make the REST call and get the JSON response
let json = (await HttpClient.fetchJson(apiUrl)).json;
Step 3, create an empty chapter document and put the content from the JSON into it
let newDoc = Parser.makeEmptyDocForContent(url);
this.appendElement(newDoc, "h1", this.titleFromJson(json));
this.appendParagraphs(newDoc, json.content);
Step 4, return the chapter
return newDoc.dom;
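Putting those four steps together, fetchChapter() for this parser would look roughly like the sketch below. The appendElement(), appendParagraphs() and titleFromJson() helpers are assumed to exist on the parser, the same way InoveltranslationParser uses them, and whether the text sits in json.content or json.data.content is something to confirm from the actual API response.
async fetchChapter(url) {
    // Step 1: turn the page URL into the matching REST API URL
    let apiUrl = url.replace("//mtlnation.com/novel/", "//api.mtlnation.com/api/v2/chapters/");
    // Step 2: fetch the chapter's JSON from the API
    let json = (await HttpClient.fetchJson(apiUrl)).json;
    // Step 3: build an empty chapter document and fill it from the JSON
    let newDoc = Parser.makeEmptyDocForContent(url);
    this.appendElement(newDoc, "h1", this.titleFromJson(json));
    this.appendParagraphs(newDoc, json.content);
    // Step 4: return the DOM of the new document
    return newDoc.dom;
}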
I have been trying as you told me and it hasn't worked; I always get error 401. This is the last version I tried. Can you tell me if you see something wrong?
"use strict";
parserFactory.register("mtlnation.com", function() { return new MtlnationParser() });
class MtlnationParser extends Parser{ constructor() { super(); }
async getChapterUrls(dom) {
let menu = dom.querySelector("div.chapters.my-shadow");
return util.hyperlinksToChapterList(menu);
}
findContent(dom) {
return dom.querySelector("div.desc.font-poppins");
}
findChapterTitle(dom) {
return dom.querySelector("div.name-chapter");
}
findCoverImageUrl(dom) {
return util.getFirstImgSrc(dom, "div.text-center");
}
async fetchChapter(url) {
let apiUrl = url.replace("mtlnation.com/novel/", "api.mtlnation.com/api/v2/chapters");
let json = (await HttpClient.fetchJson(apiUrl)).json;
let new = Parser.makeEmptyDocForContent(url);
this.appendElement(new, "h1", this.titleFromJson(json)); this.appendParagraphs(new, json.content);
return new.dom;
}
removeUnwantedElementsFromContentElement(element) { util.removeChildElementsMatchingCss(element, ".#row"); super.removeUnwantedElementsFromContentElement(element); }
getInformationEpubItemChildNodes(dom) {
return [...dom.querySelectorAll("div.descInfo")];
}
}
@Alastorwill
HTTP 401 error usually means you don't have access permissions. Which means:
- Wrong URL,
- Wrong cookies, or
- You're running into anti-scraping protection.
At this point, I try to compare the network trace between what works when I browse manually and what WebToEpub sends, to see if there's an obvious difference/mistake.
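One simple way to make that comparison is to call the API yourself from the DevTools console, first in a tab where mtlnation.com is open (and you are logged in), then in a clean tab. If the first call also returns 401, the problem is cookies/headers rather than the parser; note that CORS may block the call from an unrelated tab, which is itself a hint that the API expects requests from its own pages.
// Run in the DevTools console; this is the chapter URL from the error message above.
let r = await fetch("https://api.mtlnation.com/api/v2/chapters/comprehensive-manga-you-were-taken-in/chapter-1-twelve-trials",
    { credentials: "include" });
console.log(r.status);        // 401 here too means the API wants cookies or extra headers
console.log(await r.text());  // the response body often says what is missing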
I searched everything and couldn't find anything; I'm baffled. I don't understand why it keeps failing.