Please add site https://sangtacviet.vip/

Open Samirbsnajh opened this issue 1 year ago • 10 comments

Please note, I'm basically the only developer working on WebToEpub, and I'm not paid for doing this. (WebToEpub is completely free, and generates no money.) By asking to add a site, you're asking me to give you some of my limited free time. So, I think it's not unreasonable for me to ask you to do as much as you can to help me.

Provide URL for web page that contains Table of Contents (list of chapters) of a typical story on the site

Did you try using the Default Parser for the site? If not, why not?

Instructions for using the default parser can be found at https://dteviot.github.io/Projects/webToEpub_DefaultParser.html

What settings did you use? What didn't work?

  • URL of first chapter
  • CSS selector for element holding content to put into EPUB
  • CSS selector for element holding Title of Chapter
  • CSS selector for element(s) to remove

If the Default Parser did not work, if you have developer skills, did you try writing a new parser?

Instructions https://dteviot.github.io/Projects/webToEpub_FAQ.html#write-parser

If you don't have developer skills, can you ask a friend who does have them if they can do it for you?

If you tried writing a parser and it doesn't work, attach the parser here.

Samirbsnajh avatar Sep 09 '24 23:09 Samirbsnajh

This host seems to be a mirror of https://sangtacviet.com/, so this is almost (but not quite) a duplicate of https://github.com/dteviot/WebToEpub/issues/1477. For example:

  • https://sangtacviet.com/truyen/qidian/1/1041491430/
  • https://sangtacviet.vip/truyen/qidian/1/1041491430/

dteviot avatar Sep 10 '24 07:09 dteviot

@dteviot This site is hard to crawl.

Kaizo2004 avatar Sep 10 '24 16:09 Kaizo2004

@Kaizo2004 Can you provide more details?

  • What exactly are you trying to do?
  • How are you doing it?
  • What makes it hard?

dteviot avatar Sep 10 '24 19:09 dteviot

@dteviot I've tried adding the site multiple times, but it didn't work. This is the first time I've encountered an issue with any site

OK -- Error: Could not find content element for web page 'https://sangtacviet.vip/truyen/faloo/1/1433830/1/'. at chrome-extension://lmpaopndjhekdgkedjoefdamomekeiic/js/DefaultParserUI.js:154:23

Kaizo2004 avatar Sep 10 '24 21:09 Kaizo2004

@Kaizo2004 The problem is the same as in #1477. Example:

  • Novel: https://sangtacviet.vip/truyen/qidian/1/1041491430/
  • 1st Chapter: https://sangtacviet.vip/truyen/qidian/1/1041491430/804134403/
  • 1st Chapter content link: https://sangtacviet.vip/index.php?bookid=1041491430&h=qidian&c=804134403&ngmar=readc&sajax=readchapter&sty=1&exts=

Problems:

  1. You can't just crawl the Chapter link, as the content isn't in the HTML returned for this request.
  2. You have to use the Chapter content link, which, if you analyze it, shares several parts with the Chapter link.
  3. If you just crawl the Chapter content link, you get an empty response from the server. To get the Chapter content instead of an empty response, you need to send the header "Referer: https://sangtacviet.vip". If it succeeds, you get a JSON response.
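
The relationship between the Chapter link and the Chapter content link in the example above can be sketched as a small TypeScript helper. The mapping is inferred only from the single example in this thread, so the fixed query parameters (`ngmar`, `sajax`, `sty`, `exts`) and the names `ChapterRef`, `parseChapterUrl` and `toChapterContentUrl` are assumptions for illustration, not the site's documented API and not WebToEpub code.

```typescript
// Parts of a chapter URL such as
// https://sangtacviet.vip/truyen/qidian/1/1041491430/804134403/
interface ChapterRef {
    origin: string;    // e.g. "https://sangtacviet.vip"
    source: string;    // e.g. "qidian" (sent as the "h" parameter)
    bookId: string;    // e.g. "1041491430"
    chapterId: string; // e.g. "804134403"
}

// Hypothetical parser: expects /truyen/<source>/<n>/<bookId>/<chapterId>/
// and returns null for anything else (for example the novel's own page).
function parseChapterUrl(chapterUrl: string): ChapterRef | null {
    const pattern =
        /^(https?:\/\/[^\/]+)\/truyen\/([^\/]+)\/[^\/]+\/([^\/]+)\/([^\/?#]+)\/?$/;
    const match = pattern.exec(chapterUrl);
    if (match === null) {
        return null;
    }
    return {
        origin: match[1],
        source: match[2],
        bookId: match[3],
        chapterId: match[4],
    };
}

// Builds the content link. The trailing parameters are copied verbatim
// from the one example in the thread (assumption: they are constant).
function toChapterContentUrl(chapterUrl: string): string | null {
    const ref = parseChapterUrl(chapterUrl);
    if (ref === null) {
        return null;
    }
    const query = [
        "bookid=" + encodeURIComponent(ref.bookId),
        "h=" + encodeURIComponent(ref.source),
        "c=" + encodeURIComponent(ref.chapterId),
        "ngmar=readc",
        "sajax=readchapter",
        "sty=1",
        "exts=",
    ].join("&");
    return ref.origin + "/index.php?" + query;
}
```

Calling `toChapterContentUrl` with the 1st Chapter link from the example reproduces the 1st Chapter content link. The request to that link would still need the Referer header described in problem 3.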

How to test it:

  1. Use Firefox.
  2. Open Dev tools (CTRL+Shift+E).
  3. Open this link to get the right cookies: https://sangtacviet.vip/truyen/qidian/1/1041491430/
  4. Open this link to try and get the content: https://sangtacviet.vip/index.php?bookid=1041491430&h=qidian&c=804134403&ngmar=readc&sajax=readchapter&sty=1&exts=
  5. In the Network tab of the dev tools, select the first request with a Size of 0 B.
  6. Click "Resend".
  7. In the new section, under Headers, add a "Referer" header with the value "https://sangtacviet.vip".
  8. Click "Send".
  9. Now you have a new Network request with a size of 162.18 kB (in this example), and if you look at the Response you can see the content.

How to set headers in the fetch API is described here: https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch#setting_headers Maybe I'm going to do that, I'm not sure.

gamebeaker avatar Sep 10 '24 21:09 gamebeaker

@gamebeaker

How to set headers in the fetch API is described here: https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch#setting_headers Maybe I'm going to do that, I'm not sure.

That won't work. Referer is a forbidden header name, so it can't be set through the fetch API's headers: https://developer.mozilla.org/en-US/docs/Glossary/Forbidden_header_name. IIRC you need to use the webRequest API, which IIRC is not supported (in its blocking form) by Chrome's Manifest V3. Refer to the Firefox.js file.
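
For illustration, the webRequest approach can be sketched in TypeScript as a listener that rewrites the Referer header before the request is sent. This is a minimal sketch under stated assumptions, not the contents of Firefox.js: the local interfaces only mirror the shape of the webRequest types, and `REFERER_VALUE`, `withReferer`, `registerRefererOverride` and the URL pattern are hypothetical names and values chosen for this example.

```typescript
// Local stand-ins for the webRequest types so the sketch is self-contained.
interface HttpHeader {
    name: string;
    value?: string;
}
interface RequestDetails {
    url: string;
    requestHeaders?: HttpHeader[];
}
interface BlockingResponse {
    requestHeaders: HttpHeader[];
}

// Assumption: the value the site checks for, taken from this thread.
const REFERER_VALUE = "https://sangtacviet.vip";

// Listener body: drop any existing Referer (header names are
// case-insensitive) and append the one the site expects.
function withReferer(details: RequestDetails): BlockingResponse {
    const headers = (details.requestHeaders ?? []).filter(
        (h) => h.name.toLowerCase() !== "referer"
    );
    headers.push({ name: "Referer", value: REFERER_VALUE });
    return { requestHeaders: headers };
}

// Shape of the event object the listener is attached to
// (in Firefox this would be browser.webRequest.onBeforeSendHeaders).
interface OnBeforeSendHeadersLike {
    addListener(
        callback: (details: RequestDetails) => BlockingResponse,
        filter: { urls: string[] },
        extraInfoSpec: string[]
    ): void;
}

// Hypothetical registration helper: only requests to the content
// endpoint are touched, and "blocking" lets the listener modify headers.
function registerRefererOverride(event: OnBeforeSendHeadersLike): void {
    event.addListener(
        withReferer,
        { urls: ["https://sangtacviet.vip/index.php*"] },
        ["blocking", "requestHeaders"]
    );
}
```

In a Firefox extension this would be wired up with `registerRefererOverride(browser.webRequest.onBeforeSendHeaders)`, and the manifest would need the `webRequest` and `webRequestBlocking` permissions plus host access to the site.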

dteviot avatar Sep 10 '24 22:09 dteviot

This is another site where the solution is probably to open the page in a new tab, then inject a content script into the page to fetch the content. Note to self, I really need to stop procrastinating and build that.

dteviot avatar Sep 10 '24 22:09 dteviot

@gamebeaker

How to set headers in the fetch API is described here: https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch#setting_headers Maybe I'm going to do that, I'm not sure.

That won't work. Referer is a forbidden header name, so it can't be set through the fetch API's headers: https://developer.mozilla.org/en-US/docs/Glossary/Forbidden_header_name. IIRC you need to use the webRequest API, which IIRC is not supported (in its blocking form) by Chrome's Manifest V3. Refer to the Firefox.js file.

I guess https://developer.chrome.com/docs/extensions/reference/api/declarativeNetRequest should work (it needs a new manifest permission). But this is just a temporary fix. I guess more websites will be using frameworks like Next.js (reaperscans.com), and for these the new tab method would be the right solution.
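
A rough, untested sketch of what such a declarativeNetRequest rule might look like, written in TypeScript. The rule shape follows the documented `modifyHeaders` action, but the interfaces are local stand-ins, and `buildRefererRule`, `buildUpdateOptions`, the rule id and the URL filter are hypothetical choices for this example rather than anything WebToEpub ships.

```typescript
// Local stand-ins for the declarativeNetRequest rule types.
interface HeaderModification {
    header: string;
    operation: "set" | "remove";
    value?: string;
}
interface RefererRule {
    id: number;
    priority: number;
    action: { type: "modifyHeaders"; requestHeaders: HeaderModification[] };
    condition: { urlFilter: string; resourceTypes: string[] };
}
interface UpdateRuleOptions {
    removeRuleIds: number[];
    addRules: RefererRule[];
}

// Builds a rule that sets Referer on requests to the content endpoint.
// Assumption: the extension's fetch calls are classified as "xmlhttprequest".
function buildRefererRule(ruleId: number, host: string): RefererRule {
    return {
        id: ruleId,
        priority: 1,
        action: {
            type: "modifyHeaders",
            requestHeaders: [
                { header: "Referer", operation: "set", value: "https://" + host },
            ],
        },
        condition: {
            // "||" anchors the filter to the start of the domain name.
            urlFilter: "||" + host + "/index.php",
            resourceTypes: ["xmlhttprequest"],
        },
    };
}

// Options object for updateDynamicRules: removing the id first keeps
// repeated registration from failing on a duplicate rule id.
function buildUpdateOptions(rule: RefererRule): UpdateRuleOptions {
    return { removeRuleIds: [rule.id], addRules: [rule] };
}
```

In the extension, the result of `buildUpdateOptions(buildRefererRule(1, "sangtacviet.vip"))` would be passed to `chrome.declarativeNetRequest.updateDynamicRules`. The manifest would also need the `declarativeNetRequest` permission and host access to the site.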

gamebeaker avatar Sep 10 '24 23:09 gamebeaker

I give up xD

@gamebeaker

Kaizo2004 avatar Sep 13 '24 23:09 Kaizo2004

One fruity idea: if the plugin can access the Chrome DevTools, you can puppet a remote tab to fetch the page, and then retrieve the HTML after it's been rendered by the site's JS.

I have done this from Python without too much trouble, but I have no idea how to do it from within the context of an extension.

This may also wind up needing different code between chromium and firefox, due to variances in the debug protocols.

fake-name avatar Jan 03 '25 10:01 fake-name

@Samirbsnajh Test versions for Firefox and Chrome have been uploaded to https://github.com/dteviot/WebToEpub/releases/tag/developer-build. Pick the one suitable for you, follow the "How to install from Source" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-installation and let me know how it goes.

gamebeaker avatar May 02 '25 04:05 gamebeaker

@Samirbsnajh

Updated version (1.0.6.0) has been submitted to Firefox and Chrome stores. Firefox version is available now. Chrome might be available in a few hours (typical) to 21 days.

dteviot avatar Jul 06 '25 03:07 dteviot