EPUB Support

Open akshithio opened this issue 9 months ago • 2 comments

Copied over from PR #252:

This PR adds initial support for epub files, along with a custom-made my_epub.epub in testdata, for which test cases are also written and passing.

Since epub files are essentially zipped folders containing xhtml and html files, epub.go currently works by unzipping the file and looking for these assets. I've tried to take inspiration from existing implementations and patterns in html.go where I felt applicable. Please advise on whether this is a suitable approach.


Additionally, I have fixed the issue with ensureBasePath so that the implementation is more OS-agnostic.

Attempts to close #160.

akshithio avatar Apr 07 '25 18:04 akshithio

Thanks! It does look good! But does it not make a lot of duplicated code in EPUBAssets with the HTML assets extraction?

CorentinB avatar Apr 10 '25 13:04 CorentinB

@CorentinB

Apologies for the delay, but I did make some changes to try to address this issue. I'm not sure that all of them are an improvement, which is why I'd like to hear what you think about them.

Initially, I thought it would be best to mostly re-write the logic in a new file for .epub extensions, to deal with the minor differences between the two formats, for example the difference in folder structures and how relative file paths are written in .epub files. However, I've now changed that approach to use a shared helper for both outlinks and assets, which means I have also had to change html.go and some other files. I have listed those changes in detail here:

  • html_extractor.go - new file created containing methods to extract outlinks and assets in html.go and epub.go
  • html.go - Changed to call the new methods in html_extractor.go instead
  • epub.go - Changed to call the new methods in html_extractor.go instead
  • epub_test.go - Added two new test links to an existing test case because I was not previously detecting a .js file and one other file (I noticed this when re-writing, and the new html_extractor.go file let me pick them up)
  • html_test.go - Changed to check the returned links instead of only the count, both to make the test cases stronger and to help with debugging.
  • json.go - Minor changes to findURLs to ensure relative paths are also passed back, which is required for .epub asset detection.

Additionally, I think it would be wise to hold off on fixing the errors in url_test.go and addressing the other comments on my other PR #261 until you decide whether this refactor is suitable, since the two PRs edit some of the same files.

akshithio avatar Apr 21 '25 21:04 akshithio

No activity, closing.

CorentinB avatar Aug 14 '25 08:08 CorentinB