hexo-util icon indicating copy to clipboard operation
hexo-util copied to clipboard

fix(full_url_for): handle config.url with a trailing slash

Open curbengh opened this issue 4 years ago • 14 comments

a notable example is Github Pages with project page, e.g. https://github.com/OpenTechSchool/social-coding corresponds to https://opentechschool.github.io/social-coding/. Project page always have a trailing slash, accessing https://opentechschool.github.io/social-coding will 301-redirect to https://opentechschool.github.io/social-coding/. So, in this case, user may prefer to use url: https://opentechschool.github.io/social-coding/.

This PR adds support for that.

curbengh avatar Jul 01 '20 08:07 curbengh

Coverage Status

Coverage increased (+0.005%) to 97.092% when pulling 6cd14945a668052bf58dcba1b856bbd95eac69e5 on curbengh:full-url-for into 14fa817f350cc4c318a411fbd52c0f0cce7cea2e on hexojs:master.

coveralls avatar Jul 01 '20 08:07 coveralls

Yes, and there is a hint in the _config.yml

If your site is put in a subdirectory, set url as 'http://yoursite.com/child' and root as '/child/'

https://github.com/hexojs/hexo-starter/blob/259ecc3982e0445075f6c0daaccc24e4f1c7e663/_config.yml#L15

Which means

url: https://example.com/blog/

should be considered as invalid config.

See also https://github.com/hexojs/hexo/issues/1812#issuecomment-213667682

stevenjoezhang avatar Jul 01 '20 09:07 stevenjoezhang

config.url = config.url.replace(//+$/, '');

that failsafe is not available in Hexo API (e.g. when use in a plugin's unit test).

It is also not ideal in use cases such as rss and sitemap. When a trailing slash (e.g. http://example.com/blog/) is the actual URL (no redirection), those use cases should use that URL; currently due to that failsafe (trailing slash is removed), web crawler may catch a 301 redirection (when crawling through a sitemap).

curbengh avatar Jul 01 '20 10:07 curbengh

@curbengh

hexo.config is added by load_config, thus the failsafe is always there. Plugins‘ unit test? They should be written in valid config format (because we already transform it into the valid way during load_config). In case you are worrying about mocking test cases, use await hexo.init() && await hexo.load() (in which load_config will be called) will be fine.

Also, current behavior of full_url_for('/') will always resulted in hexo.config.url + '/' (see full_url_for's test cases). There is no need to worry about trailing slash issue.

https://github.com/hexojs/hexo-util/blob/14fa817f350cc4c318a411fbd52c0f0cce7cea2e/test/full_url_for.spec.js#L13-L15

Last but not least, all browsers, http request libs as well as crawlers will treat //example.com as //example.com/, and no 301 will be returned:

GET https://blog.skk.moe

> HTTP/1.1 200 OK
GET https://blog.skk.moe.

> HTTP/1.1 200 OK
GET https://blog.skk.moe/

> HTTP/1.1 200 OK

SukkaW avatar Jul 01 '20 12:07 SukkaW

no 301 will be returned

I was referring to url: http://example.com/blog/ example. load_config.js removes the trailing slash, so requesting http://example.com/blog will be 301-ed to http://example.com/blog/ (in GH Pages case). feed currently does add it back, but for unrelated purpose (it was for config.url + path). This remove/add trailing slash operation seems un-intuitive to me.


## If your site is put in a subdirectory, set url as 'http://yoursite.com/child' and root as '/child/'
url: http://yoursite.com
root: /

My understanding of the rational behind http://yoursite.com/child limitation is that Hexo previously used config.url + path. After the availability of full_url_for(), that concat approach is no longer used (AFAIK) and we even discourage theme/plugin devs from using it. So, the limitation is no longer needed, as in it's now trivial to be flexible to existence of trailing slash.

In GH Pages, since http://yoursite.com/child/ is the actual URL, it's natural for newcomers to use that; it's only after they notice ## If your site is put in a subdirectory... that the trailing slash needs to be removed.

Hexo blog can be hosted anywhere, we can't know whether http://yoursite.com/child/ or http://yoursite.com/child is the actual one, a web server/hosting platform may prefer to have a trailing slash and vice versa. Since we can't know, why not let the user decides?

curbengh avatar Jul 02 '20 01:07 curbengh

Fixed invalid protocol (https:/example.com/index.html) raised by @jiangtj


full_url_for('/') now simply returns config.url (note: this is different from previous behavior):

const hexo = {
  config: {
    url: 'http://example.com'
  }
}

console.log(full_url_for.call(hexo, '/'))
// http://example.com

hexo.config.url = 'http://example.com/'
console.log(full_url_for.call(hexo, '/'))
// http://example.com/

hexo.config.url = 'http://example.com/blog'
console.log(full_url_for.call(hexo, '/'))
// http://example.com/blog

hexo.config.url = 'http://example.com/blog/'
console.log(full_url_for.call(hexo, '/'))
// http://example.com/blog/

full_url_for('index.html') appends a trailing slash if config.pretty_urls.trailing_index === false (same as previous behavior):

const hexo = {
  config: {
    url: 'http://example.com'
  }
}

console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/

hexo.config.url = 'http://example.com/'
console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/

hexo.config.url = 'http://example.com/blog'
console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/blog/

hexo.config.url = 'http://example.com/blog/'
console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/blog/

curbengh avatar Jul 12 '20 09:07 curbengh

In fact, I am confused about these two configurations.

url: http://yoursite.com  => root: /
url: http://yoursite.com/child  => root: /child/

If such a rule must meet, I think that only the configuration of url is enough, root can be generated during the initialization of hexo

Or is there a situation that does not meet this rule?

url: http://yoursite.com/child  ≠> root: /child/

jiangtj avatar Jul 13 '20 02:07 jiangtj

This PR makes it flexible to any combination of the following configs:

url: http://yoursite.com/child
root: /child

url: http://yoursite.com/child/
root: /child

url: http://yoursite.com/child
root: /child/

url: http://yoursite.com/child/
root: /child/

url: http://yoursite.com/child/
root: child/

url: http://yoursite.com/child/
root: child

url: http://yoursite.com/child
root: child/

url: http://yoursite.com/child
root: child

@jiangtj

I think that only the configuration of url is enough, root can be generated during the initialization of hexo

You are right, it's trivial to derive root from url; we can make root config as optional and generate it if it's empty.

I wonder if there is any use case for:

url: http://yoursite.com/
root: /child/

curbengh avatar Jul 13 '20 02:07 curbengh

I'm not sure if this PR is being too lenient on invalid config.

stevenjoezhang avatar Jul 13 '20 02:07 stevenjoezhang

I'm not sure if this PR is being too lenient on invalid config.

Hexo currently is already pretty lenient about it.

This PR may break {{ url + path }} or {{ root + path }}, however, these approaches are discouraged and we recommend using full_url_for(path) or url_for(path) instead.

The purpose of this PR is reducing 301/302 redirects for anyone using url: http://example.com/child (possibly benefiting SEO) and encourage users to use ``url: http://example.com/child/` instead since this is the true URL most of the time.

The main goal is to make the config slightly more user-friendly.

curbengh avatar Jul 13 '20 03:07 curbengh

@jiangtj @curbengh @stevenjoezhang

Please consider the Breaking Change we introduced and the backward compatibility of current themes.

If theme is designed to work with the current configuration format, then we shouldn't change the root & url in case things go wrong.

SukkaW avatar Jul 13 '20 06:07 SukkaW

You are right, it's trivial to derive root from url; we can make root config as optional and generate it if it's empty.

I wonder if there is any use case for:

url: http://yoursite.com/
root: /child/

@curbengh If there is no special case, I prefer to remove the root directly, and then generate it

url: http://yoursite.com  => {url: http://yoursite.com, root: /}
url: http://yoursite.com/  => {url: http://yoursite.com, root: /}
url: http://yoursite.com/child  => {url: http://yoursite.com/child, root: /child/}
url: http://yoursite.com/child/  => {url: http://yoursite.com/child, root: /child/}

Like https://github.com/hexojs/hexo/pull/4414, handle url and root in load_config, which can also be pretty lenient, and both full_url_for and url + path root + path are valid

Please considering about Breaking Changes and the backward compatibility of current themes then.

@SukkaW If just make full_url_for more flexible, I think this is not BC. Because the documentation and many third-party plugin themes may use url + path or root + path, so there is no change for users. . .

jiangtj avatar Jul 13 '20 06:07 jiangtj

@SukkaW If theme is designed to work with the current configuration format, then we shouldn't change the root & url in case things go wrong. @jiangtj If just make full_url_for more flexible, I think this is not BC.

To clarify, this PR doesn't directly cause a BC; the flexibility I mentioned earlier is more of a side effect, not a primary intention of this PR, though I think it's a benefit. It is this flexibility that could be a BC.

However, I reckon all users with root: /child/ uses url: http://yoursite.com/child as currently enforced by Hexo, so there is no immediate BC, {{ url + path }} still works.

With this PR, Hexo doesn't need (and shouldn't) enforce the syntax, so user can then use:

url: http://yoursite.com/child/
root: child

But that will break {{ url + path }}. Theme has two (easy) options:

  1. full_url_for() (recommended)
  2. User change config to url: http://yoursite.com/child

@jiangtj If there is no special case, I prefer to remove the root directly, and then generate it and both full_url_for and url + path root + path are valid

The intention is to keep the trailing slash url: http://yoursite.com/child/, so {{ url + path }} will not work without workaround.

curbengh avatar Jul 23 '20 06:07 curbengh

I mean when the user configures url: http://yoursite.com/child/, generate a new url without the trailing slash (url: http://yoursite.com/child) and the correct root (root: /child/)

So that {{ url + path }} can also work normally

jiangtj avatar Jul 30 '20 09:07 jiangtj