Remove « .html » extension

Open kelson42 opened this issue 2 years ago • 8 comments

These « .html » extensions, for example here https://library.kiwix.org/content/gutenberg_fr_all/A/Les%20Fleurs%20du%20Mal_cover.6099.html, were necessary at the time we were using zimwriterfs. Zimwriterfs needed them to identify HTML content which should be indexed. This is not necessary anymore. Therefore they should be removed, for cleaner URLs and a smaller ZIM size.

kelson42 avatar Jan 25 '23 12:01 kelson42

It is not that simple, unless I am missing something: this extension is necessary to distinguish between the various file formats in the archive.

For instance for book ID 18812 we have these three files now:

Douze ans de séjour dans la Haute-Éthiopie.18812.epub
Douze ans de séjour dans la Haute-Éthiopie.18812.html
Douze ans de séjour dans la Haute-Éthiopie_cover.18812.html
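To illustrate the concern, here is a minimal sketch, not code from the scraper, that compares stripping every extension with stripping only « .html » on the three entries above. The helper names (`strip_all`, `strip_html_only`, `duplicates`) are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath

# The three entries listed above for book ID 18812.
entries = [
    "Douze ans de séjour dans la Haute-Éthiopie.18812.epub",
    "Douze ans de séjour dans la Haute-Éthiopie.18812.html",
    "Douze ans de séjour dans la Haute-Éthiopie_cover.18812.html",
]


def strip_all(name: str) -> str:
    """Drop the final extension, whatever it is (hypothetical helper)."""
    return PurePosixPath(name).stem


def strip_html_only(name: str) -> str:
    """Drop the final extension only when it is .html (hypothetical helper)."""
    return name[: -len(".html")] if name.endswith(".html") else name


def duplicates(names):
    """Return the names that occur more than once, sorted."""
    return sorted(n for n, count in Counter(names).items() if count > 1)


# Stripping every extension makes the EPUB and HTML entries share a name.
collisions_all = duplicates(strip_all(e) for e in entries)

# Stripping only .html leaves the three names distinct.
collisions_html = duplicates(strip_html_only(e) for e in entries)

print("all extensions stripped:", collisions_all)
print("only .html stripped:", collisions_html)
```

With these three entries, only the first strategy produces a clash, because the EPUB keeps its own extension in the second one.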

benoit74 avatar Jan 25 '23 12:01 benoit74

@benoit74 Removing « .html » for books in HTML should not create a conflict. This topic will anyway disappear IMO if we implement #95.

kelson42 avatar Jan 25 '23 12:01 kelson42

OK, I didn't get this: all files would have an extension except for the HTML version. Makes sense to me.

benoit74 avatar Jan 25 '23 12:01 benoit74

> This topic will anyway disappear IMO if we implement #95.

No, we'd still need the cover page so it won't be affected.

@benoit74 besides the chrome URLs (Home.html), the most important ones are the cover and, yes, the HTML format version when it's included.

To avoid conflicts yet keep decent-looking URLs I'd propose the following:

/18812/Douze ans de séjour dans la Haute-Éthiopie  # Cover page
/18812/Douze ans de séjour dans la Haute-Éthiopie.epub
/18812/Douze ans de séjour dans la Haute-Éthiopie.pdf
/18812/Douze ans de séjour dans la Haute-Éthiopie.html

I am fine with the HTML format being named .html because it's a formatted book and a single file that can be saved as well; and I like consistency.

@kelson42 if you don't like it, please suggest another pattern, keeping in mind:

  • Extensions are very important for files that can be saved to disk/phone.
  • We need the book ID somewhere because there can be duplicates in titles.
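The pattern above could be sketched as follows. This is only an illustration under assumptions, not the scraper's actual code: the function name `book_paths` and its `formats` parameter are hypothetical, and titles are assumed not to contain « / ».

```python
from urllib.parse import quote, unquote


def book_paths(book_id: int, title: str, formats=("epub", "pdf", "html")) -> dict:
    """Build the proposed paths for one book (hypothetical helper).

    The cover page has no extension; every downloadable format keeps its own.
    Assumes the title contains no "/" character.
    """
    base = f"/{book_id}/{title}"
    paths = {"cover": base}
    for fmt in formats:
        paths[fmt] = f"{base}.{fmt}"
    return paths


paths = book_paths(18812, "Douze ans de séjour dans la Haute-Éthiopie")
for kind, path in paths.items():
    print(f"{kind:>5}: {path}")

# Percent-encoded form of the cover path, as it would appear in a URL.
encoded_cover = quote(paths["cover"])
print(encoded_cover)
```

Because the book ID is its own path segment, two books with the same title still get distinct paths, and the extension-less cover cannot clash with any of the formats.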

rgaudin avatar Jan 26 '23 11:01 rgaudin

@rgaudin Agree with your proposal.

kelson42 avatar Jan 26 '23 13:01 kelson42

If it helps, I have code that will make a title-based, GitHub-safe filename slug for any book in PG.

eshellman avatar Apr 20 '23 13:04 eshellman

Hi, may I help by removing the « .html » extensions?

prathamkumarjha avatar Apr 21 '23 01:04 prathamkumarjha

@prathamkumarjha yes, you can submit a PR, as per my comment above.

rgaudin avatar Apr 22 '23 19:04 rgaudin