WebToEpub cannot capture the Story Seedling Webpage
Hi, I recently discovered that WebToEpub cannot read pages from the Story Seedling website; I think they have some sort of protection against it? Copying and pasting does not work either, and WebToEpub cannot capture the contents of the page: it just shows up blank.
Any suggestions on how to make the contents visible? I've also tried pasting this Story Seedling parser into epubeditor and using the "run above script to modify epub" function, but no new EPUB was generated, and I'm officially out of ideas from here:

"use strict";

//dead url/ parser
parserFactory.register("storyseedling.com", () => new StorySeedlingParser());

class StorySeedlingParser extends Parser {
    constructor() {
        super();
    }

    async getChapterUrls(dom) {
        return [...dom.querySelectorAll("main .grid.w-full a")]
            .map(link => this.linkToChapter(link));
    }

    linkToChapter(link) {
        let title = link.querySelector(".truncate").textContent;
        return ({
            sourceUrl: link.href,
            title: title,
        });
    }

    findContent(dom) {
        return (
            dom.querySelector("div.prose .mb-4") || dom.querySelector("#chapter-content")
        );
    }

    populateUIImpl() {
        document.getElementById("removeAuthorNotesRow").hidden = false;
    }

    preprocessRawDom(webPageDom) {
        let notes = webPageDom.querySelector("div.prose .mb-4:nth-of-type(2)");
        if ((notes != null) && !this.userPreferences.removeAuthorNotes.value) {
            this.tagAuthorNotes([notes]);
            this.findContent(webPageDom).appendChild(notes);
        }
    }

    extractTitleImpl(dom) {
        return dom.querySelector("h1");
    }

    extractAuthor(dom) {
        let authorLabel = dom.querySelector("div.leading-7 a");
        return authorLabel?.textContent ?? super.extractAuthor(dom);
    }

    findChapterTitle(dom) {
        return dom.querySelector(".truncate");
    }

    findCoverImageUrl(dom) {
        return util.getFirstImgSrc(dom, "div[x-data='']");
    }

    getInformationEpubItemChildNodes(dom) {
        return [...dom.querySelectorAll("div.order-2.mb-4")];
    }
}
I can confirm with a different dataset that something strange is happening. The path mapping looks correct (when explored with an online tool) in the h5 file's Measurements/DATETIME/Experiment/Path_Mappings folder, and the Measurements/DATETIME/Image/PathName_CHANNEL and URLName_CHANNEL tables all show mapped paths, not unmapped paths. But when I try to execute the batch file on my local machine, it says it cannot find the unmapped directory. We tried this in 4 permutations (\\ vs \ separators, and trailing slash present vs absent), with the same issue every time.
Stack trace is
pipeline_exception
Error detected during run of module NamesAndTypes
Traceback (most recent call last):
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/pipeline/_pipeline.py", line 1016, in run_with_yield
self.run_module(module, workspace)
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/pipeline/_pipeline.py", line 1349, in run_module
module.run(workspace)
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/modules/namesandtypes.py", line 1940, in run
self.add_image_provider(
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/modules/namesandtypes.py", line 2003, in add_image_provider
self.add_simple_image(
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/modules/namesandtypes.py", line 2062, in add_simple_image
self.add_provider_measurements(provider, m, "Image")
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/modules/namesandtypes.py", line 2076, in add_provider_measurements
img = provider.provide_image(m)
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/image/abstract_image/file/url/_monochrome_image.py", line 33, in provide_image
image = URLImage.provide_image(self, image_set)
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/image/abstract_image/file/_file_image.py", line 328, in provide_image
self.__set_image()
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/image/abstract_image/file/_file_image.py", line 259, in __set_image
self.__set_image_volume()
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/image/abstract_image/file/_file_image.py", line 332, in __set_image_volume
pathname = url2pathname(self.get_url())
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/image/abstract_image/file/url/_url_image.py", line 33, in get_url
if self.cache_file():
File "/Users/bcimini/Documents/GitHub/CellProfiler/core/cellprofiler_core/image/abstract_image/file/_file_image.py", line 159, in cache_file
raise IOError(
OSError: Test for access to directory failed. Directory: ///C:/REST_OF_UNMAPPED_PATH
Nowhere near solved yet, but a bit of tracing (all the below in 4.2.x):
The name, it turns out, is stored in the ImagePlaneDetails, which is a Java object. Horrifying that that's what we're putting in the batch file IMO.
For a file on Mac, it starts with file:; for a file on Windows, it starts with file:///C: etc. NamesAndTypes pulls this and then sends it to alter_url_post_create_batch in the measurements class, which has several problems. It hardcodes the number of characters before the path starts to 5, and that hardcoded count is wrong. It then tries a dumb find-and-replace using startswith, which fails in part because of that wrong count. It also adjusts the path separators on the filename but not on the path mapping, and then checks whether the path mapping is present in the filename; it isn't, because only one of the two has been adjusted! So I think alter_url_post_create_batch is the thing we need to change if we're hotfixing CP4. I'm wondering, though, whether this means it has been broken all the way back to at least CP4.0.0, if not before? I swear I used to make batch files on my Windows machine when I started in the group, but maybe that's a Mandela effect...
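To make the failure mode concrete, here is a minimal Python sketch of a prefix remap that parses the URL instead of counting characters, and normalizes both the file path and the mapping before comparing them. This is not CellProfiler's code: remap_file_url, normalize, the (local prefix, remote prefix) mapping format, and the example paths are all hypothetical, and the comparison is deliberately case-sensitive.

```python
from urllib.parse import quote, unquote, urlsplit


def normalize(path):
    """Use forward slashes and drop any trailing separator."""
    return path.replace("\\", "/").rstrip("/")


def remap_file_url(url, mappings):
    """Rewrite a file: URL using (local_prefix, remote_prefix) pairs.

    Hypothetical helper for illustration only; not the CellProfiler API.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        return url  # leave http:, s3:, etc. untouched

    # Parsing handles both "file:/Users/..." and "file:///C:/..." without
    # assuming a fixed number of characters before the path.
    path = unquote(parts.path)
    # "file:///C:/data" parses to "/C:/data"; drop the slash before the drive.
    if len(path) >= 3 and path[0] == "/" and path[2] == ":":
        path = path[1:]
    path = normalize(path)

    for local, remote in mappings:
        # Normalize BOTH sides of the mapping, not just the file path.
        local, remote = normalize(local), normalize(remote)
        # Match whole path components so "C:/images" does not match
        # "C:/imagesOLD".
        if path == local or path.startswith(local + "/"):
            new_path = remote + path[len(local):]
            if not new_path.startswith("/"):
                new_path = "/" + new_path
            return "file://" + quote(new_path, safe="/:")
    return url
```

With these assumptions, a Windows URL and a mapping written with backslashes and a trailing separator still match:

```python
remap_file_url("file:///C:/images/plate1/a.tif", [("C:\\images\\", "/mnt/images")])
# returns "file:///mnt/images/plate1/a.tif"
```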
For CP5, since the H5 file already has lovely path-adjusted names in it, I think we should consider whether the input modules should just read those, treating them more like a load_data, rather than trying to pass along the Java objects. Not sure if that's birds or national parks, though.