EPUB Support
Copied over from PR #252:
This PR adds initial support for epub files alongside a custom-made my_epub.epub in testdata for which test cases are ~still a work in progress~ also written and passing.
Since epub files are essentially zipped folders containing xhtml and html files, epub.go currently works by unzipping the file and looking for these assets. I've tried to take inspiration from existing implementations and patterns in html.go where I felt applicable. Please advise on whether this is a suitable approach.
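For readers following along, here is a minimal sketch of the "unzip and look for the xhtml/html files" idea, using only the standard library. It is not the code in epub.go: the names buildSampleEPUB and listHTMLEntries, and the file names inside the sample archive, are made up for illustration. The sample archive only mimics an EPUB folder layout and is not a valid EPUB.

```go
package main

import (
	"archive/zip"
	"bytes"
	"path"
	"sort"
	"strings"
)

// buildSampleEPUB writes a tiny in-memory zip that mimics an EPUB layout.
// The entry names are hypothetical and chosen only for this sketch.
func buildSampleEPUB() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	files := []struct{ name, body string }{
		{"mimetype", "application/epub+zip"},
		{"OEBPS/chapter1.xhtml", "<html><body><p>one</p></body></html>"},
		{"OEBPS/notes.html", "<html><body><p>two</p></body></html>"},
		{"OEBPS/images/cover.png", "not a real image"},
	}
	for _, f := range files {
		w, err := zw.Create(f.name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte(f.body)); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listHTMLEntries opens the archive from memory and returns the sorted
// names of every entry whose extension is .xhtml, .html or .htm.
func listHTMLEntries(data []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range zr.File {
		// Zip entry names use forward slashes, so "path" is used here
		// rather than "path/filepath".
		switch strings.ToLower(path.Ext(f.Name)) {
		case ".xhtml", ".html", ".htm":
			names = append(names, f.Name)
		}
	}
	sort.Strings(names)
	return names, nil
}
```

Passing the bytes from buildSampleEPUB to listHTMLEntries should return the two document entries and skip the mimetype file and the image. Using path instead of path/filepath for names inside the archive keeps this part independent of the host OS.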
Additionally, I have fixed the issue with ensureBasePath so that the implementation is more OS-agnostic.
Attempts to close #160.
Thanks! It does look good! But doesn't this duplicate a lot of code between EPUBAssets and the HTML assets extraction?
@CorentinB
Apologies for the delay, but I did make some changes to try to address this issue. I'm not sure all of them are an improvement, which is why I'd like to hear what you think about them.
Initially, I thought it would be best to mostly rewrite the logic in a new file for .epub extensions, to deal with the minor differences between the two formats, for example the difference in folder structures and how relative file paths are referenced in .epub files. However, I've now changed that approach to use a shared helper for both outlinks and assets, which means I have also had to change html.go and some other files. I have listed those changes in detail here:
- html_extractor.go - New file containing the methods used to extract outlinks and assets in html.go and epub.go
- html.go - Changed to call the new methods in html_extractor.go instead
- epub.go - Changed to call the new methods in html_extractor.go instead
- epub_test.go - Added two new test links to an existing test case, because I was not previously detecting a .js file and one other file (I noticed this when rewriting and was accordingly able to use the new html_extractor.go file effectively)
- html_test.go - Changed to check the returned links instead of the count, both to make the test cases stronger and to help with debugging
- json.go - Minor changes to findURLs to ensure relative paths are also passed back, which is required for .epub asset detection
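To make the shared-helper idea concrete, here is a rough sketch of one extractor that both an HTML and an EPUB code path could call, plus the extra step EPUB needs: resolving a relative reference against the location of the document inside the archive. It is a sketch under assumptions, not the contents of html_extractor.go. The function names extractRefs and resolveInArchive are hypothetical, the split between assets and outlinks is simplified, and encoding/xml in non-strict mode stands in for whatever HTML parser the project actually uses.

```go
package main

import (
	"encoding/xml"
	"io"
	"net/url"
	"path"
	"strings"
)

// extractRefs walks the markup once and returns raw attribute values:
// src attributes and <link href> as assets, <a href> as outlinks.
func extractRefs(doc string) ([]string, []string, error) {
	var assets, outlinks []string
	dec := xml.NewDecoder(strings.NewReader(doc))
	// Non-strict settings let the XML decoder tolerate common HTML quirks.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return assets, outlinks, nil
		}
		if err != nil {
			return nil, nil, err
		}
		el, ok := tok.(xml.StartElement)
		if !ok {
			continue
		}
		for _, a := range el.Attr {
			switch {
			case el.Name.Local == "a" && a.Name.Local == "href":
				outlinks = append(outlinks, a.Value)
			case a.Name.Local == "src",
				el.Name.Local == "link" && a.Name.Local == "href":
				assets = append(assets, a.Value)
			}
		}
	}
}

// resolveInArchive maps a relative reference found in docPath to an
// archive entry name. It reports false for absolute URLs and for
// fragment-only references, which do not name a file in the EPUB.
func resolveInArchive(docPath, ref string) (string, bool) {
	u, err := url.Parse(ref)
	if err != nil || u.IsAbs() || u.Host != "" || u.Path == "" {
		return "", false
	}
	// Entry names inside a zip use forward slashes on every OS.
	return path.Join(path.Dir(docPath), u.Path), true
}
```

With this shape, an HTML caller would resolve the returned values against the page URL, while an EPUB caller would pass them through something like resolveInArchive. For example, "../css/style.css" referenced from "OEBPS/text/ch1.xhtml" resolves to "OEBPS/css/style.css".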
Additionally, I think it would be wise to hold off on the changes needed to fix the errors in url_test.go, and on the other comments on my other PR #261, until you decide whether this refactor is suitable, since the two PRs touch overlapping files.
No activity, closing.