Make the directory hierarchy and the file name structure configurable
I would like an option to omit file extensions such as html or php for HTML documents, or to make them configurable. So instead of /path/filename.html I would like to use /path/filename or /path/filename/.
Of course, the resulting static copy of the website may then only be usable when served by a web server. I am aware of that.
I assume that the URL is split into its components anyway. Ideally, the directory hierarchy and the file name structure could then be made fully configurable. The components could be assigned to variables or placeholders and then used in a new parameter:
s:scheme
h:host
p:path (with leading but without trailing slash)
f:filename (without last dot and extension)
e:extension (without dot)
b:basename (with all dots and extension)
q:query (without ?)
a:fragment (anchor link without #)
We should make it as simple as possible and stick to the rules of parse_url() and pathinfo().
Example: --custom-site-structure='%s%h%p/%f__%q#%a'
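A minimal sketch of how such a template could be expanded, written in Python rather than PHP. The function names `url_components` and `expand_structure` are hypothetical, and `urlsplit` plus a manual split on the last dot only approximate the rules of parse_url() and pathinfo(); this is not the crawler's actual implementation.

```python
import posixpath
import re
from urllib.parse import urlsplit


def url_components(url: str) -> dict[str, str]:
    """Split a URL into the proposed placeholder values (hypothetical helper)."""
    parts = urlsplit(url)
    directory, basename = posixpath.split(parts.path)
    # Split on the last dot only, so "b.tar.gz" yields "b.tar" and "gz".
    if "." in basename:
        filename, extension = basename.rsplit(".", 1)
    else:
        filename, extension = basename, ""
    return {
        "s": parts.scheme,
        "h": parts.hostname or "",
        "p": directory.rstrip("/"),  # leading slash kept, trailing slash dropped
        "f": filename,
        "e": extension,
        "b": basename,
        "q": parts.query,
        "a": parts.fragment,
    }


def expand_structure(template: str, url: str) -> str:
    """Replace each %x placeholder in the template with its URL component."""
    values = url_components(url)
    return re.sub(r"%([shpfebqa])", lambda m: values[m.group(1)], template)


print(expand_structure(
    "%s%h%p/%f__%q#%a",
    "https://example.com/docs/guide/intro.html?lang=en#top",
))
# httpsexample.com/docs/guide/intro__lang=en#top
```

Note that the example template places no separator between %s and %h, so the scheme and host run together in the output; a real template would probably add one.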
Care must be taken to ensure that there is no collision with the replace-query-string functionality.
Adjusting the file extension for HTML documents would be sufficient for now. However, my ideas may be useful for preparing further configurability in the code, even if this is not yet feasible.
Hi @GitHub-Mike,
the implementation of this function will be quite difficult. When saving to the filesystem, the crawler already has to resolve conflicts such as "/dev.js" (as a file) in combination with "/dev.js/file.js". In such a case, the root directory on disk cannot contain "dev.js" both as a folder and as a file. Real sites do contain such conflicts; they are fine at the URL level, but unfortunately not when storing files on disk. I have encountered these conflicts mostly on the websites of various libraries and frameworks.
These edge cases significantly limit the kind of configurability you suggest. It is not unrealistic, but it is quite difficult to implement.
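The conflict described above can be illustrated with a short sketch. It assumes a plain list of URL paths and a hypothetical helper `find_file_dir_conflicts`; how the crawler actually detects or resolves such conflicts is not stated in this thread.

```python
from pathlib import PurePosixPath


def find_file_dir_conflicts(url_paths: list[str]) -> list[str]:
    """Return paths that would be needed both as a file and as a directory."""
    files = {PurePosixPath(p) for p in url_paths}
    root = PurePosixPath("/")
    conflicts = set()
    for path in files:
        # Every ancestor of a path has to exist as a directory on disk.
        for parent in path.parents:
            if parent != root and parent in files:
                conflicts.add(str(parent))
    return sorted(conflicts)


print(find_file_dir_conflicts(["/dev.js", "/dev.js/file.js", "/css/site.css"]))
# ['/dev.js']
```

Any configurable structure would need a check of this kind on the final disk paths, because a template can create collisions that the original URLs did not have.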
> the implementation of this function will be quite difficult.
Yes, there are of course a few things to consider when fully implementing configurability.
> When saving to the filesystem, the crawler already has to resolve conflicts such as "/dev.js" (as a file) in combination with "/dev.js/file.js". In such a case, the root directory on disk cannot contain "dev.js" both as a folder and as a file.
Yes, this is a file system limitation (not only on Windows) and must be taken into account. However, the request to remove the file extension relates exclusively to HTML documents and not to assets. The idea with the parameters, on the other hand, could also be used for the asset structure.
> Real sites do contain such conflicts; they are fine at the URL level, but unfortunately not when storing files on disk. I have encountered these conflicts mostly on the websites of various libraries and frameworks.
As mentioned, I would rule out removing the file extension for assets. HTML documents without an extension would be saved as folders containing an index.html file. Of course, this only works behind a web server and not as a local offline copy.
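The folder-plus-index.html idea could look roughly like the following sketch. The function `target_path`, the `is_html` flag, and the set of extensions treated as HTML are assumptions made for illustration only.

```python
import posixpath

# Assumption: extensions that should be dropped for HTML documents.
HTML_EXTENSIONS = {"html", "htm", "php"}


def target_path(url_path: str, is_html: bool) -> str:
    """Map a URL path to a relative path on disk (hypothetical helper)."""
    if not is_html:
        # Assets keep their original name and extension.
        return url_path.lstrip("/")
    directory, basename = posixpath.split(url_path)
    stem, dot, ext = basename.rpartition(".")
    if dot and ext.lower() in HTML_EXTENSIONS:
        basename = stem
    # The document becomes a folder that contains index.html.
    return posixpath.join(directory, basename, "index.html").lstrip("/")


print(target_path("/path/filename.html", is_html=True))
# path/filename/index.html
print(target_path("/path/filename/", is_html=True))
# path/filename/index.html
print(target_path("/dev.js", is_html=False))
# dev.js
```

With this mapping, /path/filename.html, /path/filename and /path/filename/ all end up in the same file, so a web server that serves index.html for directory requests can answer any of the three URL forms.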
> These edge cases significantly limit the kind of configurability you suggest. It is not unrealistic, but it is quite difficult to implement.
Yes, I am aware of that, which is why some special cases would need to be analysed and excluded.