Please add site https://fanficus.com

Open DEATHAN-SAA opened this issue 11 months ago • 4 comments

Hello. I'm sorry to bother you, but I don't know how to get past the protection on this website so that WebToEpub stops giving errors like this:

Error: Element with content not found on page 'https://fanficus.com/events'.
    at chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/Parser.js:519:23
    at async Promise.all (index 0)
    at async DefaultParser.fetchWebPages (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/Parser.js:484:17)

Provide URL for web page that contains Table of Contents (list of chapters) of a typical story on the site

https://fanficus.com/post/65155f1a60150b0018c984c7

Did you try using the Default Parser for the site? If not, why not?

Instructions for using the default parser can be found at https://github.com/dteviot/WebToEpub/wiki/FAQ#how-to-convert-a-new-site-using-the-default-parser

What settings did you use? What didn't work?

  • URL of first chapter: https://fanficus.com/post/65155f1a60150b0018c984c7/post-part/65155f6a60150b0018c98621?ownership=
  • CSS selector for element holding content to put into EPUB: #post-part__text
  • CSS selector for element holding Title of Chapter: #post-part > div.ff-reader-content > div.ff-reader-header.ng-star-inserted > app-reader-header > div > div > span

Unfortunately, I don't understand how to create a parser on my own while bypassing the error.

DEATHAN-SAA avatar Jan 23 '25 18:01 DEATHAN-SAA

@DEATHAN-SAA At this point in time, I'm unable to get WebToEpub to handle this site.

Notes on fetching content for chapter.

Starting with ToC of https://fanficus.com/post/65155f1a60150b0018c984c7

URL of first chapter is given as https://fanficus.com/post/65155f1a60150b0018c984c7/post-part/65155f6a60150b0018c98621?ownership=

But there's no content in the HTML returned by that URL. Instead, content seems to come from a REST call to https://fanficus-server-mirror-879c30cd977f.herokuapp.com/api/v1/post/65155f1a60150b0018c984c7/post-part/65155f6a60150b0018c98621?uSId=da23ec4e-791d-4012-b361-f7295938109e

I don't know where/how to get the uSId value.
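
The mapping from the chapter page URL to the REST URL observed above can be sketched roughly as follows. This is only an illustration: the hostname is copied from the request seen above and may change, `chapterPageToApiUrl` is a hypothetical helper rather than part of WebToEpub, and the `uSId` query parameter is left out because its origin is unknown, so the server may well reject the resulting request.

```javascript
// Hostname copied from the observed REST call; an assumption that may not stay valid.
const API_HOST = "https://fanficus-server-mirror-879c30cd977f.herokuapp.com";

// Hypothetical helper: map a chapter page URL to the API URL pattern seen above.
// Returns null when the path is not /post/<postId>/post-part/<partId>.
function chapterPageToApiUrl(pageUrl, apiHost = API_HOST) {
    const parts = new URL(pageUrl).pathname.split("/").filter(p => p !== "");
    if (parts.length !== 4 || parts[0] !== "post" || parts[2] !== "post-part") {
        return null;
    }
    // The uSId query parameter is deliberately omitted: where it comes from is unknown.
    return `${apiHost}/api/v1/post/${parts[1]}/post-part/${parts[3]}`;
}

const apiUrl = chapterPageToApiUrl(
    "https://fanficus.com/post/65155f1a60150b0018c984c7/post-part/65155f6a60150b0018c98621?ownership="
);
console.log(apiUrl);
```

Passing the host as a parameter keeps the hardcoded value in one place if it ever needs to be discovered dynamically.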

dteviot avatar Jan 26 '25 06:01 dteviot

Hello! I looked at it again.

  • URL of first chapter: https://fanficus.com/post/65155f1a60150b0018c984c7/post-part/65155f6a60150b0018c98621?ownership=
  • CSS selector for element holding content to put into EPUB: #post-part__text
  • CSS selector for element holding Title of Chapter: #.ff-rh-title > span:nth-child(1)

With these settings WebToEpub let me go further and even collected the chapters, although it missed the "prologue" and included many "extra" entries. After that, WebToEpub created a working archive.
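
The "extra" entries are likely links on the page that are not chapters. One way to narrow such a list, sketched here under the assumption that chapter links always have the form `/post/<postId>/post-part/<partId>` as in the URL above, is to filter by path and drop duplicates. `keepChapterLinks` is a hypothetical helper, not WebToEpub code.

```javascript
// Hypothetical helper: keep only links that look like chapters of the story at tocUrl.
// Assumes chapter paths are /post/<postId>/post-part/<partId>.
function keepChapterLinks(urls, tocUrl) {
    const postId = new URL(tocUrl).pathname.split("/").filter(p => p !== "")[1];
    const prefix = `/post/${postId}/post-part/`;
    const seen = new Set();
    return urls.filter(u => {
        const path = new URL(u, tocUrl).pathname;
        // Reject non-chapter paths and paths already seen (ignoring the query string).
        if (!path.startsWith(prefix) || seen.has(path)) {
            return false;
        }
        seen.add(path);
        return true;
    });
}

const kept = keepChapterLinks(
    [
        "https://fanficus.com/events",
        "https://fanficus.com/post/65155f1a60150b0018c984c7/post-part/65155f6a60150b0018c98621?ownership=",
        "https://fanficus.com/post/65155f1a60150b0018c984c7/post-part/65155f6a60150b0018c98621"
    ],
    "https://fanficus.com/post/65155f1a60150b0018c984c7"
);
console.log(kept);
```

This does not explain the missing "prologue"; it only removes links that do not match the assumed pattern.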

DEATHAN-SAA avatar Feb 05 '25 18:02 DEATHAN-SAA

@DEATHAN-SAA

I'm a bit confused. Are you saying that the default parser is working? In which case, what are you asking me to do?

dteviot avatar Feb 09 '25 22:02 dteviot

Hello. Before this, WebToEpub didn't work for me at all; I mentioned the error in the first post. I still don't understand how it started working, because I had tried the same parameters before.

Why am I asking? It would be nice not to have to search for the first chapter every time and add it to the chapter list manually, and not to have to delete the extra entries that get picked up.

I tried to write a parser myself so that I wouldn't have to keep fixing links, but I'm a newbie and don't know where I'm making mistakes. I based it on another parser and replaced its selectors and values with what I found on the page.

```javascript
"use strict";

parserFactory.register("fanficus.com", () => new fanficusParser());

class fanficusParser extends Parser{
    constructor() {
        super();
    }

    async getChapterUrls(dom) {
        let base = this.makeChapterBaseUrl(dom);
        let json = this.getJsonWithChapters(dom);
        return json.map(j => this.jsonToChapters(j, base));
    }

    getJsonWithChapters(dom) {
        let startString = "window.__CONTENT__ = ";
        let scriptElement = [...dom.querySelectorAll("script")]
            .filter(s => s.textContent.includes(startString))[0];
        return util.locateAndExtractJson(scriptElement.textContent, startString)
    }

    makeChapterBaseUrl(dom) {
        let base = new URL(dom.baseURI);
        let tip = base.pathname.split("/").pop();
        return `https://fanficus.com/post/${tip}/post-part/`
    }

    jsonToChapters(json, base) {
        let name = json.name;
        if (!util.isNullOrEmpty(name)) {
            name = " - " + name;
        }
        return ({
            sourceUrl: `${base}v${json.volume}/c${json.number}`,
            title: `Том ${json.volume} Глава ${json.number}${name}`
        });
    }

    findContent(dom) {
        return dom.querySelector("#post-part__text");
    }

    extractTitleImpl(dom) {
        return dom.querySelector(".ng-star-inserted");
    }

    extractAuthor(dom) {
        let authorLabel = dom.querySelector(".app-post-meta .app-post-main-info span");
        return authorLabel?.textContent ?? super.extractAuthor(dom);
    }

    extractLanguage(dom) {
        return dom.querySelector("html").getAttribute("lang");
    }

    findChapterTitle(dom) {
        return dom.querySelector("[data-media-down].ff-chapter-list overflow-hidden ng-star-inserted");
    }

    findCoverImageUrl(dom) {
        return util.getFirstImgSrc(dom, ".ff-post-header-meta");
    }
}
```

And this is what I end up with:

```
TypeError: Cannot read properties of undefined (reading 'textContent')
    at fanficusParser.getJsonWithChapters (chrome-extension://negdafacmkjjmmmepginmndfpgonehmd/js/parsers/fanficusParser.js:20:56)
    at fanficusParser.getChapterUrls (chrome-extension://negdafacmkjjmmmepginmndfpgonehmd/js/parsers/fanficusParser.js:12:25)
    at fanficusParser.onLoadFirstPage (chrome-extension://negdafacmkjjmmmepginmndfpgonehmd/js/Parser.js:392:14)
    at processInitialHtml (chrome-extension://negdafacmkjjmmmepginmndfpgonehmd/js/main.js:47:24)
    at populateControlsWithDom (chrome-extension://negdafacmkjjmmmepginmndfpgonehmd/js/main.js:253:9)
    at onMessageListener (chrome-extension://negdafacmkjjmmmepginmndfpgonehmd/js/main.js:16:13)
```

Thank you for spending your time on me.
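
For reference, the TypeError above is raised because `filter(...)[0]` yields `undefined` when no `<script>` element contains the text `window.__CONTENT__ = `, which suggests this page does not embed its chapter data that way. A guarded lookup makes that case explicit. The sketch below uses plain objects as stand-ins for DOM script elements, and `findScriptWithMarker` is a hypothetical name, not a WebToEpub function.

```javascript
// Hypothetical helper: return the first script whose text contains the marker, or null.
function findScriptWithMarker(scripts, marker) {
    return scripts.find(s => s.textContent.includes(marker)) ?? null;
}

// Plain objects stand in for the result of [...dom.querySelectorAll("script")].
const scripts = [
    { textContent: "console.log('analytics');" },
    { textContent: "var settings = {};" }
];

const found = findScriptWithMarker(scripts, "window.__CONTENT__ = ");
if (found === null) {
    // No embedded JSON on this page; the chapter list has to come from somewhere else.
    console.log("No script contains the marker.");
}
```

Checking for `null` before reading `textContent` turns the crash into a condition the parser can report or work around.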

DEATHAN-SAA avatar Feb 12 '25 19:02 DEATHAN-SAA

@DEATHAN-SAA

Possible limitations.

  1. I've hardcoded the server hostname for fetching content. It might need to be fetched dynamically.
  2. Not sure how this will work for a story with lots of chapters, as it assumes all chapters are listed on the Table of Contents page, so it might not find all the chapters to fetch. If it misses chapters, provide the URL of the Table of Contents page concerned and I can look into it.

Test versions for Firefox and Chrome have been uploaded to https://github.com/dteviot/WebToEpub/releases/tag/developer-build. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:

  • https://fanficus.com/post/65155f1a60150b0018c984c7, chapters 1, 2

Notes

Time taken: 34 minutes (not counting initial examination 3 weeks ago; estimated 1 hour total)

dteviot avatar Feb 23 '25 07:02 dteviot

@DEATHAN-SAA

Updated version (1.0.3.0) has been submitted to Firefox and Chrome stores. Firefox version is available now. Chrome might be available in a few hours (typical) to 21 days.

dteviot avatar Mar 09 '25 05:03 dteviot