hexo-util
hexo-util copied to clipboard
fix(full_url_for): handle config.url with a trailing slash
a notable example is Github Pages with project page, e.g. https://github.com/OpenTechSchool/social-coding corresponds to https://opentechschool.github.io/social-coding/. Project page always have a trailing slash, accessing https://opentechschool.github.io/social-coding
will 301-redirect to https://opentechschool.github.io/social-coding/
. So, in this case, user may prefer to use url: https://opentechschool.github.io/social-coding/
.
This PR adds support for that.
Coverage increased (+0.005%) to 97.092% when pulling 6cd14945a668052bf58dcba1b856bbd95eac69e5 on curbengh:full-url-for into 14fa817f350cc4c318a411fbd52c0f0cce7cea2e on hexojs:master.
Yes, and there is a hint in the _config.yml
If your site is put in a subdirectory, set url as 'http://yoursite.com/child' and root as '/child/'
https://github.com/hexojs/hexo-starter/blob/259ecc3982e0445075f6c0daaccc24e4f1c7e663/_config.yml#L15
Which means
url: https://example.com/blog/
should be considered as invalid config.
See also https://github.com/hexojs/hexo/issues/1812#issuecomment-213667682
config.url = config.url.replace(//+$/, '');
that failsafe is not available in Hexo API (e.g. when use in a plugin's unit test).
It is also not ideal in use cases such as rss and sitemap. When a trailing slash (e.g. http://example.com/blog/
) is the actual URL (no redirection), those use cases should use that URL; currently due to that failsafe (trailing slash is removed), web crawler may catch a 301 redirection (when crawling through a sitemap).
@curbengh
hexo.config
is added by load_config
, thus the failsafe is always there. Plugins‘ unit test? They should be written in valid config format (because we already transform it into the valid way during load_config
). In case you are worrying about mocking test cases, use await hexo.init() && await hexo.load()
(in which load_config
will be called) will be fine.
Also, current behavior of full_url_for('/')
will always resulted in hexo.config.url + '/'
(see full_url_for
's test cases). There is no need to worry about trailing slash
issue.
https://github.com/hexojs/hexo-util/blob/14fa817f350cc4c318a411fbd52c0f0cce7cea2e/test/full_url_for.spec.js#L13-L15
Last but not least, all browsers, http request libs as well as crawlers will treat //example.com
as //example.com/
, and no 301 will be returned:
GET https://blog.skk.moe
> HTTP/1.1 200 OK
GET https://blog.skk.moe.
> HTTP/1.1 200 OK
GET https://blog.skk.moe/
> HTTP/1.1 200 OK
no 301 will be returned
I was referring to url: http://example.com/blog/
example. load_config.js
removes the trailing slash, so requesting http://example.com/blog
will be 301-ed to http://example.com/blog/
(in GH Pages case). feed
currently does add it back, but for unrelated purpose (it was for config.url + path
). This remove/add trailing slash operation seems un-intuitive to me.
## If your site is put in a subdirectory, set url as 'http://yoursite.com/child' and root as '/child/'
url: http://yoursite.com
root: /
My understanding of the rational behind http://yoursite.com/child
limitation is that Hexo previously used config.url + path
. After the availability of full_url_for()
, that concat approach is no longer used (AFAIK) and we even discourage theme/plugin devs from using it. So, the limitation is no longer needed, as in it's now trivial to be flexible to existence of trailing slash.
In GH Pages, since http://yoursite.com/child/
is the actual URL, it's natural for newcomers to use that; it's only after they notice ## If your site is put in a subdirectory...
that the trailing slash needs to be removed.
Hexo blog can be hosted anywhere, we can't know whether http://yoursite.com/child/
or http://yoursite.com/child
is the actual one, a web server/hosting platform may prefer to have a trailing slash and vice versa. Since we can't know, why not let the user decides?
Fixed invalid protocol (https:/example.com/index.html
) raised by @jiangtj
full_url_for('/')
now simply returns config.url
(note: this is different from previous behavior):
const hexo = {
config: {
url: 'http://example.com'
}
}
console.log(full_url_for.call(hexo, '/'))
// http://example.com
hexo.config.url = 'http://example.com/'
console.log(full_url_for.call(hexo, '/'))
// http://example.com/
hexo.config.url = 'http://example.com/blog'
console.log(full_url_for.call(hexo, '/'))
// http://example.com/blog
hexo.config.url = 'http://example.com/blog/'
console.log(full_url_for.call(hexo, '/'))
// http://example.com/blog/
full_url_for('index.html')
appends a trailing slash if config.pretty_urls.trailing_index === false
(same as previous behavior):
const hexo = {
config: {
url: 'http://example.com'
}
}
console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/
hexo.config.url = 'http://example.com/'
console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/
hexo.config.url = 'http://example.com/blog'
console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/blog/
hexo.config.url = 'http://example.com/blog/'
console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/blog/
In fact, I am confused about these two configurations.
url: http://yoursite.com => root: /
url: http://yoursite.com/child => root: /child/
If such a rule must meet, I think that only the configuration of url
is enough, root
can be generated during the initialization of hexo
Or is there a situation that does not meet this rule?
url: http://yoursite.com/child ≠> root: /child/
This PR makes it flexible to any combination of the following configs:
url: http://yoursite.com/child
root: /child
url: http://yoursite.com/child/
root: /child
url: http://yoursite.com/child
root: /child/
url: http://yoursite.com/child/
root: /child/
url: http://yoursite.com/child/
root: child/
url: http://yoursite.com/child/
root: child
url: http://yoursite.com/child
root: child/
url: http://yoursite.com/child
root: child
@jiangtj
I think that only the configuration of url is enough, root can be generated during the initialization of hexo
You are right, it's trivial to derive root
from url
; we can make root
config as optional and generate it if it's empty.
I wonder if there is any use case for:
url: http://yoursite.com/
root: /child/
I'm not sure if this PR is being too lenient on invalid config.
I'm not sure if this PR is being too lenient on invalid config.
Hexo currently is already pretty lenient about it.
This PR may break {{ url + path }}
or {{ root + path }}
, however, these approaches are discouraged and we recommend using full_url_for(path)
or url_for(path)
instead.
The purpose of this PR is reducing 301/302 redirects for anyone using url: http://example.com/child
(possibly benefiting SEO) and encourage users to use ``url: http://example.com/child/` instead since this is the true URL most of the time.
The main goal is to make the config slightly more user-friendly.
@jiangtj @curbengh @stevenjoezhang
Please consider the Breaking Change we introduced and the backward compatibility of current themes.
If theme is designed to work with the current configuration format, then we shouldn't change the root
& url
in case things go wrong.
You are right, it's trivial to derive
root
fromurl
; we can makeroot
config as optional and generate it if it's empty.I wonder if there is any use case for:
url: http://yoursite.com/ root: /child/
@curbengh If there is no special case, I prefer to remove the root
directly, and then generate it
url: http://yoursite.com => {url: http://yoursite.com, root: /}
url: http://yoursite.com/ => {url: http://yoursite.com, root: /}
url: http://yoursite.com/child => {url: http://yoursite.com/child, root: /child/}
url: http://yoursite.com/child/ => {url: http://yoursite.com/child, root: /child/}
Like https://github.com/hexojs/hexo/pull/4414, handle url
and root
in load_config, which can also be pretty lenient, and both full_url_for
and url + path
root + path
are valid
Please considering about Breaking Changes and the backward compatibility of current themes then.
@SukkaW If just make full_url_for
more flexible, I think this is not BC. Because the documentation and many third-party plugin themes may use url + path
or root + path
, so there is no change for users. . .
@SukkaW If theme is designed to work with the current configuration format, then we shouldn't change the root & url in case things go wrong. @jiangtj If just make full_url_for more flexible, I think this is not BC.
To clarify, this PR doesn't directly cause a BC; the flexibility I mentioned earlier is more of a side effect, not a primary intention of this PR, though I think it's a benefit. It is this flexibility that could be a BC.
However, I reckon all users with root: /child/
uses url: http://yoursite.com/child
as currently enforced by Hexo, so there is no immediate BC, {{ url + path }}
still works.
With this PR, Hexo doesn't need (and shouldn't) enforce the syntax, so user can then use:
url: http://yoursite.com/child/
root: child
But that will break {{ url + path }}
. Theme has two (easy) options:
-
full_url_for()
(recommended) - User change config to
url: http://yoursite.com/child
@jiangtj If there is no special case, I prefer to remove the root directly, and then generate it and both
full_url_for
andurl + path
root + path
are valid
The intention is to keep the trailing slash url: http://yoursite.com/child/
, so {{ url + path }}
will not work without workaround.
I mean when the user configures url: http://yoursite.com/child/
, generate a new url without the trailing slash (url: http://yoursite.com/child
) and the correct root (root: /child/
)
So that {{ url + path }}
can also work normally