incubator-answer
fix/slugify-non-ascii
This pull request fixes an issue where questions whose titles were written entirely in Persian caused a "too many redirects" error when opened through their link. The cause was that the slugify library returned an empty string for these titles. With this fix, the slugify step returns a proper, non-empty string for such titles, so every question can be accessed regardless of the language of its title.
Firstly, thank you for your contribution. We have a few questions and requests:
- Could you share some Persian test cases to help us reproduce the problem?
- If we replace the slugify library outright, it may change links that have already been generated. I haven't tested the various scenarios, and covering them all would be a lot of work. My personal suggestion is to add separate post-processing for the specific cases that fail.
- Do not change the maximum length limit; our current maximum is 150.
- Do not leave debug output such as fmt.Println in the code.
@LinkinStars Thank you for your response. I've addressed points 3 and 4 (the length limit and the debug output).
- A post with this title hits the redirect problem:
title = چگونه میتوانم مساله گزارش شده را ارزبای کنم؟ ("How can I evaluate the reported issue?")
A post with this title does not:
title = چگونه میتوانم Bug گزارش شده را ارزیابی کنم؟ ("How can I evaluate the reported Bug?")
- I think that because slugs aren't saved, replacing the library won't cause problems. I also tested a few cases and didn't see any issues. Do you have a specific scenario in mind? Could you describe your exact solution?
Thank you for the examples. First, you're right that a Persian title causes problems and can lead to redirect errors. The cause is that the URL slug generated from the title is empty: slugify.Slugify(title) returns an empty string for such a title.
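The empty result can be reproduced with a small stand-in for slugify's default behavior. The slug function below is a simplified re-implementation, not the library's own code, and it assumes the default checker keeps only lowercase ASCII letters and digits while collapsing every other run of characters into a single '-'. Under that assumption it also suggests why the second example title works: only the ASCII word "Bug" survives.

```go
package main

import (
	"fmt"
	"strings"
)

// asciiOnly is a simplified stand-in for slugify's default character
// checker, assumed to accept only lowercase ASCII letters and digits.
func asciiOnly(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9')
}

// slug lowercases s, keeps runes accepted by valid, collapses each run of
// rejected runes into a single '-', and trims '-' from both ends.
func slug(s string, valid func(rune) bool) string {
	var b strings.Builder
	lastInvalid := false
	for _, r := range strings.ToLower(s) {
		if valid(r) {
			b.WriteRune(r)
			lastInvalid = false
		} else if !lastInvalid {
			b.WriteRune('-')
			lastInvalid = true
		}
	}
	return strings.Trim(b.String(), "-")
}

func main() {
	broken := "چگونه میتوانم مساله گزارش شده را ارزبای کنم؟"
	working := "چگونه میتوانم Bug گزارش شده را ارزیابی کنم؟"
	fmt.Printf("%q\n", slug(broken, asciiOnly))  // "": no character survives
	fmt.Printf("%q\n", slug(working, asciiOnly)) // "bug": only the ASCII word survives
}
```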
I'd like to discuss two better solutions to this problem.
Solution 1
You may have noticed that the UrlTitle method calls a convertChinese method. Chinese titles actually have the same problem, and UrlTitle relies on convertChinese to detect Chinese text and convert it to English. We could write a similar method for Persian.
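A Persian counterpart to convertChinese might look roughly like the sketch below. convertPersian, asciiSlug, and the transliteration table are all hypothetical and deliberately incomplete; Persian script omits most short vowels, so the Latin output is lossy (for example "سلام" becomes "slam"). It is meant only to show the shape of the approach: transliterate first, then let an ASCII-only slug step drop whatever is left.

```go
package main

import (
	"fmt"
	"strings"
)

// persianToLatin is a hypothetical, approximate transliteration table.
var persianToLatin = map[rune]string{
	'ا': "a", 'آ': "a", 'ب': "b", 'پ': "p", 'ت': "t", 'ث': "s",
	'ج': "j", 'چ': "ch", 'ح': "h", 'خ': "kh", 'د': "d", 'ذ': "z",
	'ر': "r", 'ز': "z", 'ژ': "zh", 'س': "s", 'ش': "sh", 'ص': "s",
	'ض': "z", 'ط': "t", 'ظ': "z", 'ع': "a", 'غ': "gh", 'ف': "f",
	'ق': "q", 'گ': "g", 'ل': "l", 'م': "m", 'ن': "n", 'و': "v",
	'ه': "h", 'ئ': "y",
	'\u06A9': "k", '\u0643': "k", // Persian keheh and the Arabic kaf often typed instead
	'\u06CC': "y", '\u064A': "y", // Persian yeh and the Arabic yeh often typed instead
}

// convertPersian replaces each mapped Persian letter or digit with Latin
// text and leaves every other rune (spaces, punctuation, ASCII) unchanged.
func convertPersian(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= '\u06F0' && r <= '\u06F9' { // Persian digits zero to nine
			b.WriteRune('0' + (r - '\u06F0'))
			continue
		}
		if latin, ok := persianToLatin[r]; ok {
			b.WriteString(latin)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// asciiSlug keeps runs of lowercase ASCII letters and digits and joins
// them with '-'; every other rune acts as a separator and is dropped.
func asciiSlug(s string) string {
	words := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !((r >= 'a' && r <= 'z') || (r >= '0' && r <= '9'))
	})
	return strings.Join(words, "-")
}

func main() {
	title := "چگونه میتوانم مساله گزارش شده را ارزبای کنم؟"
	fmt.Println(asciiSlug(convertPersian(title)))
}
```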
Solution 2
You are right that the title part of this URL is not saved; it exists only for SEO. If a title cannot be converted to English in the URL, the slug adds little SEO value anyway. So I would recommend using a shorter address for access, which avoids this problem.
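Solution 2 could be sketched as below. The /questions/{id}/{slug} layout and the questionPath helper are hypothetical, chosen only to illustrate omitting the title segment when it would be empty; they are not taken from the project's router.

```go
package main

import (
	"fmt"
	"strings"
)

// questionPath builds a hypothetical question URL. When the slug is empty
// (or only separators), the title segment is omitted and the shorter,
// ID-only address is returned instead.
func questionPath(id, slug string) string {
	slug = strings.Trim(slug, "-")
	if slug == "" {
		return "/questions/" + id
	}
	return "/questions/" + id + "/" + slug
}

func main() {
	fmt.Println(questionPath("10010000000000001", ""))
	fmt.Println(questionPath("10010000000000001", "how-to-evaluate-a-bug"))
}
```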
By the way, you don't actually have to abandon slugify to get the behavior you need. You can simply supply your own IsValidCharacterChecker function, as in the example below.
package main

import (
	"fmt"
	"unicode"

	"github.com/Machiel/slugify"
)

// IsValidCharacterChecker decides which runes slugify keeps. This version
// accepts any Unicode letter or digit, so Persian words are kept instead of
// being replaced; adjust it to your own validation rules.
func IsValidCharacterChecker(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r)
}

func main() {
	originalTitle := " چگونه میتوانم مساله گزارش شده را ارزبای کنم؟"
	slug := slugify.New(slugify.Configuration{IsValidCharacterChecker: IsValidCharacterChecker})
	title := slug.Slugify(originalTitle)
	fmt.Println(title)
}
https://github.com/apache/incubator-answer/blob/9681c026adfe4ddd0c8ae158b2f95e76e414df9f/pkg/htmltext/htmltext.go#L94
Our final solution is to use a fixed description ("topic") as the slug when the title's language cannot be parsed.
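A minimal sketch of that fallback, assuming the fixed text is "topic". The urlTitle function here is hypothetical, not the project's actual UrlTitle, and for brevity it keeps only ASCII letters and digits rather than running a real slugifier.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// fallbackSlug is the fixed text used when a title yields no slug.
// Assumption: the value is "topic", as mentioned in the thread.
const fallbackSlug = "topic"

// urlTitle keeps runs of ASCII letters and digits, joins them with '-',
// and falls back to fallbackSlug when nothing survives.
func urlTitle(title string) string {
	words := strings.FieldsFunc(strings.ToLower(title), func(r rune) bool {
		return r > unicode.MaxASCII || !(unicode.IsLetter(r) || unicode.IsDigit(r))
	})
	if len(words) == 0 {
		return fallbackSlug
	}
	return strings.Join(words, "-")
}

func main() {
	fmt.Println(urlTitle("چگونه میتوانم مساله گزارش شده را ارزبای کنم؟")) // topic
	fmt.Println(urlTitle("How can I evaluate a Bug?"))                  // how-can-i-evaluate-a-bug
}
```

Because the fallback only applies when the normal slug is empty, links for titles that already slugify correctly stay unchanged, which addresses the concern about existing links.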