godot
godot copied to clipboard
Add `String.strip_bbcode()` and `String.bbcode_escape()` BBCode methods
String.strip_bbcode()
can be used to display text from a RichTextLabel in places that don't support rich formatting.
String.bbcode_escape()
can be used to prevent BBCode injection in RichTextLabels that display user input.
- This closes https://github.com/godotengine/godot-proposals/issues/5056. See also https://github.com/godotengine/godot/pull/65839, which provides an alternative implementation that only has
String.strip_bbcode()
implemented.
Thanks @dalexeev for providing the String.strip_bbcode()
implementation :slightly_smiling_face:
Preview
From top to bottom: unescaped, stripped, escaped
[b]hello[/b] [u][color=#f89]world[/color][/u]
@Calinou Just tested and using the same format as your example it seems like strip_bbcode()
strips both BBCode and non-BBCode tags.
From top to bottom: unescaped, stripped, escaped
@markdibarry See https://github.com/godotengine/godot-proposals/issues/5056#issuecomment-1202460982:
It should likely strip every
[tag]
-like substring in the source string (and the closing pair, if present), without relying on Godot's own implementation of BBCode and the tags that we have defined.
It removes all tags, whether they exist or not. It's a String
method, not a RichTextLabel
method, so it can't know which tags exist and which don't. Just like to_pascal_case()
method uses general rules and does not recognize specific abbreviations.
@dalexeev Ah, I see now. I think it would be misleading to call it strip_bbcode()
then, since it strips all tags whether or not it contains BBCode. I think strip_tags()
would be a more appropriate name, since it can't verify anything to do with BBCode, and this also would prevent a name conflict if in the future someone implemented a proper way to strip BBCode from text. Though, that would probably be part of the text server or RTL... though I can't remember if Godot's BBCode tag logic is tightly coupled to that node. That would probably need changed in the future as well, if so. escape_bbcode()
still makes sense, since it's relying on the Godot BBCode implementation.
I think it is fine as is. bbcode
refers to BBCode syntax, not a list of specific codes. Any [tag]
is a BBCode tag, any <tag>
is a HTML tag. Just like is_valid_identifier()
method only analyzes the string itself, but does not answer the question whether this identifier is declared in your script or not (and does not even answer the question whether the string is a keyword like if
, var
, etc.).
I disagree, but you're the decision makers. Just pointing out that this method conveys that it removes BBCode, but instead, it removes BBCode, BBCode-similar tags, and any content that happens to be within brackets.
I, and two other devs I know who have established dialog languages for Godot, use tags for our language and also support BBCode. The problem, as we've discussed, is that in order to assign text, the current workflow is:
- Assign the entire string with BBCode and custom tags to the RTL text field.
- Call
get_parsed_text()
to get the text without BBCode - Take that result and parse it for custom tags.
- Rebuild the string adding the BBCode as you go, comparing the two strings.
- Reassign the new text.
This interested me (and I'm sure others), because I assumed it actually stripped BBCode, preventing us from having to assign the text field twice. It doesn't, so it's not relevant to our cases, but since it also removes all tags, even if they're not in BBCode syntax format, I don't see how it'd would be reliable to use in any scenario.
Currently would heavily appreciate these methods, as it would prevent users from writing BBCode in a chat system I'm currently experimenting with. Fortunately, writing a workaround isn't exactly difficult, but it would be nice.
@Calinou Any chance you can name it strip_tags()
and tag_escape()
instead since this doesn't check for bbcode? Pretty, pretty please? I'd be so grateful.
@Calinou Any chance you can name it
strip_tags()
andtag_escape()
instead since this doesn't check for bbcode? Pretty, pretty please? I'd be so grateful.
We already use bbcode
in RichTextLabel's properties, while tag
is a generic term that doesn't refer to any specific syntax. It's not quite as expressive as actually having bbcode
in the function name.
It isn't technically 100% correct, but just because there's no true BBCode standard doesn't mean we can't call it BBCode. It's just like INI :slightly_smiling_face:
@Calinou I know, I know. It sounds totally pedantic, I just knew I'd regret it if I didn't throw my reasoning out there before it's too late, so I want to be annoying and express my concerns while I still can:
You're absolutely right that there's no official standard for BBCode. Almost every program out there that uses the term BBCode uses their own implementation/flavor, but there is a current established implementation that we use in Godot, and this method doesn't support it. I'm not invested in the word "tag" at all, it just seems like the closest word to what it actually does without being misleading, and I couldn't think of something better. It makes more sense in my head to use a generic word for this method that works in a generic way, rather than use a word that means something else specific in Godot, since that can cause a name conflict in the near future. String.strip_bbcode()
seems to work like a method called Node.remove_sprite2ds()
that removes all child Node2D
s.
More selfishly: My projects (and a few others I know) use dialog systems that get bogged down with heavy string manipulation required to strip RichTextLabel
's implementation of BBCode, since Godot doesn't currently provide an avenue to do so without a bunch of hacks. I've spoken to a few others that maintain dialog systems and we generally all employ the same hack to strip BBCode, because stripping all text between brackets would break our systems. Even though we wouldn't be able to use this method, I would love it if someday a performant way to strip BBCode was provided, so I am invested in an eventual RichTextLabel.strip_bbcode()
being available.
If Godot's implementation of BBCode and the BBCode that this method supports are incompatible, then that raises a lot of questions. In the case of a method to strip Godot's implementation of BBCode was eventually exposed, what would happen with these methods? I'd think it'd be confusing for many to have two methods that take strings that are named the same thing, but do something different. Would we rename the string.strip_bbcode()
? What would we rename it to? If so, why not now? Would we have to have a different name for the RichTextLabel
version? What would that be named? RichTextLabel.strip_godot_bbcode()
?
If you want a specialized and reasonably performant workaround then I'd say just to use regex, as long as the documentation clarifies it isn't selective I don't see any reason to bog this down or rename it, naming it anything else would be confusing IMO, the only other name I'd say is non-confusing would be something like strip_square_bracket_tags
or something verbose, because otherwise the assumption to me would be <tag>
tags
That's all a fair point to be made. Perhaps it is best to delegate this method to RichTextLabel itself to have more control over tags. However, there have also been occasions where the editor may need to strip generic BBCode from a text before displaying it, too, without creating a whole RichTextLabel for it. Perhaps there should be a static method for it, too...
@markdibarry Formally, [unknown_tag]
is also a BBCode (can be BBCode), so its removal is correct. Just RichTextLabel
does not report unknown tags and invalid BBCode markup, but displays them as is.
If you are dealing with user input, then you should use escape_bbcode()
. If you are dealing with valid BBCode markup, then you can use strip_bbcode()
to display text where BBCode is not supported. If you are dealing with invalid BBCode markup, then strip_bbcode()
may not be suitable, and you will need to implement the necessary functionality yourself.
If we want to achieve more accurate (but more verbose) naming, then I would suggest the escape_bbcode_markup()
and strip_bbcode_tags()
. strip_tags()
is too general.
I agree that there is good use for this method, and I don't want to hold it up too much, I just wanted to share my thoughts, since I can already see problems in the future, but only with name conflicts. Some background:
In RichTextLabel
, when you use valid Godot BBCode tags, it'll be applied, otherwise, it displays it as normal text.
Ex:
This word is [color="green"]green[/color] and this does [foo="bar"]nothing[/foo].
would display as
This word is green and this does [foo="bar"]nothing[/foo].
In most of the dialog systems for Godot (including mine), the built in dialog code is between brackets ([mood="happy"]
, [GetName("John")]
, or [jumpTo="Section 3"]
), but it also supports Godot's RichTextLabel
implementation of BBCode, so the system needs to know the difference between valid Godot BBCode tags, valid dialog code tags, and normal text.
The only current avenue to get the text with valid Godot BBCode tags removed, is to assign it to the text
property first and call get_parsed_text()
on the node. This needs done every time any text changes:
# in RichTextLabel node
text = fullText
var parsedText = get_parsed_text()
# for dialog parsing
var dialogTags = get_dialog_tags(fullText)
text = reconstruct_text(fullText, parsedText, dialogTags)
This proposed method would remove valid Godot BBCode, valid dialog code, and any intentional non-code bracket surrounded text. So, of course, it can't be used for anything that needs awareness of Godot's implementation of BBCode, since it works on a very general level of anything that is between brackets. However, it'd be very helpful to someday have a static method that parses Godot's implementation of BBCode from a string without needing to update the RichTextLabel
's text
twice. The question still is, if we were to implement this, would both methods be called strip_bbcode()
even though "BBCode" means two different things in these contexts?
The only current avenue to get the text with valid Godot BBCode tags removed, is to assign it to the
text
property first and callget_parsed_text()
on the node.
Note that you are not required to use get_parsed_text()
. You can implement a BBCode parser that takes into account both standard BBCode and custom pseudo-tags. For example, this is how Editor Help and the doc comment parser in GDScript are made. Also, to simplify things, you can use delimiters other than brackets.
String.strip_bbcode()
is designed to deterministically process strings, regardless of the specific RichTextLabel
instance, its settings, and pseudo-tags. For example, this method can be used with print_rich()
.
As for RichTextLabel
and advanced scenarios like dialog systems, I'm not sure which method we might need in the future. There may be strip_bbcode()
which you described. There may be some kind of parse_bbcode()
(although the BBCode parser is not difficult to implement in GDScript). There are also proposals to add support for alternative markup such as Markdown.
The question still is, if we were to implement this, would both methods be called
strip_bbcode()
even though "BBCode" means two different things in these contexts?
Thanks for your concerns about potential naming conflicts! I don't have a clear opinion on this issue other than the suggestion above. But I'm not sure if clarifying tags vs markup really helps. To be completely precise, the method should be named strip_bbcode_tags_and_special_chars()
, but this is too verbose.
Also, see String.is_valid_identifier()
and TextServer.is_valid_identifier()
methods. Just like here, the methods are named the same, and "identifier" denotes different things in different contexts. However, both methods cannot be used to check whether a string is a valid GDScript identifier, since they do not take into account the _
token, keywords, built-in types, etc. I think String
vs RichTextLabel
would be enough to understand the difference in purpose.
Finally, if we add the RichTextLabel
method in the future, I think we could name it strip_valid_tags()
, strip_valid_bbcode()
or something like that to emphasize that the RichTextLabel
method keeps invalid markup, unlike the String
method.
But I'm not sure if clarifying tags vs markup really helps
Just to reiterate, I have no need for the word "tag" to be used, it's just the closest general word I could think up that doesn't have a double meaning in the code base. I'm more on the "not bbcode" team than the "yes tag" team. :joy:
To be completely precise, the method should be named strip_bbcode_tags_and_special_chars(), but this is too verbose.
I doubt this would be a widely used method, but I also totally agree about keeping verbosity to a minimum.
I think String vs RichTextLabel would be enough to understand the difference in purpose.
I also like to think my code is self-documenting, but I don't think most users know that RichTextLabel
and String
have two different implementations of the BBCode concept; The first pulling from an internal list of valid tags, and the second meaning anything between brackets. To be honest, it sounds obvious since I assume you and I have been familiar for so long, but I originally thought there was a standard, and it was a surprise to find out that it's still kinda Calvinball out there as far as BBCode goes. :sweat:
I think if it's really important that they're named the same, some detailed documentation on the method's limitations may help alleviate some potential confusion among the community. I think at least just a note that says "Does not consider any specific BBCode implementation when parsing." or something similar would help.
Also, to simplify things, you can use delimiters other than brackets.
I can't speak for the other dialog system maintainers, but I personally think it's definitely a double-edged sword! You're right that it does make it tricky to parse, but the benefits make it worth it. Using a unified syntax can be very helpful to users looking to adopt, without having them needing to memorize a ton out of the gate. Especially when those who are most likely to use the dialog system aren't programmers. Plus, the obvious, it wouldn't be a great solution to completely rewrite the system and make all users upgrade and relearn a new syntax.
You can implement a BBCode parser that takes into account both standard BBCode and custom pseudo-tags.
Okay, this is something new to me! You've got me very interested! How do you access the internal tag list RichTextLabel
supports? I'd be totally down to spend some time making a custom parser, and I know a ton of people in the community who'd be interested in having an optimized solution. I had originally planned to just copy the internal tag list into my project, but I see that the list does get changed every so often for what is/isn't supported, so that was a no-go.
Okay, this is something new to me! You've got me very interested! How do you access the internal tag list
RichTextLabel
supports? I'd be totally down to spend some time making a custom parser, and I know a ton of people in the community who'd be interested in having an optimized solution. I had originally planned to just copy the internal tag list into my project, but I see that the list does get changed every so often for what is/isn't supported, so that was a no-go.
This is not relevant to this PR, but I made a gist (if you have any questions, please leave comments there, not in this PR). It is not possible to obtain the tag list via Godot API, see source code instead. However, you probably won't need to get the full list, you can only process custom pseudo-tags and individual tags, such as [url]
(option 2 in the gist). If you don't need to validate user input, you probably don't need a parser, you can just use String.replace()
or at most regular expressions.
It is not possible to obtain the tag list via Godot API
Ah that's a shame. Your gist looks like what I came up with a few years ago, but this would require constant maintenance, so it wouldn't be of any use. Thanks anyway! I appreciate you trying!
Seeing as though I'm the minority on the naming ambiguity, as long as there's a small note in the method description about its limitations or intended use, I won't push it any further. It's enough to point users to properly read the docs if they're confused. 😅
Hey, is this going to be added? Would be useful to have a neat built-in function.
Hey, is this going to be added? Would be useful to have a neat built-in function.
4.3 is in feature freeze now, so this will probably not be merged in that version. That said, it should be straightforward to implement the functions from this PR in GDScript so you can use them in your project.
See this comment from the original proposal if you want a work-around in the meantime.
With time thinking about it, I do think this method belongs inside RichTextLabel. Whether it should be static or directly tied to a RichTextLabel object (hence it'd be able to distinguish valid BBcode tags), I am not sure.
My reasoning is also... Well, the entirety of String's method list has turned into a movie to scroll through. It's a relatively niche method, I'm not sure if it deserves to be there.
To be honest, it's fine. I will still sleep at night if this PR is merged as is.
With time thinking about it, I do think this method belongs inside RichTextLabel. Whether it should be static or directly tied to a RichTextLabel object (hence it'd be able to distinguish valid BBcode tags), I am not sure.
Here are code examples of both implementations.
RichTextLabel:
# node reference
@onready var label = $RichTextLabel
var stripped_text = label.strip_bbcode("[center]Text Here")
# or, class reference
var stripped_text = RichTextLabel.strip_bbcode("[center]Text Here")
String:
var stripped_text = "[center]Text Here".strip_bbcode()
Personally I prefer how the code reads when it is under the String class. It also functions like other immutable String methods, so the behavior is clearer from the outset. Implementing it under RichTextLabel would imply (to me at least) that the method would render or otherwise modify the text in some way, when all it does is return a modified String.