Add custom BBCode parser to bundle

Open taravasya opened this issue 3 years ago • 8 comments

Hello! I'm trying to migrate from my old forum to Flarum, and I need to save custom BBCodes to the database as parsed XML. Can I add a parser to my own predefined bundle? From the docs, I learned how to add items to a bundle, including the 'HEART' tag:

$configurator = s9e\TextFormatter\Configurator\Bundles\Forum::getConfigurator();
$configurator->BBCodes->addFromRepository('ACRONYM');
$configurator->tags->add('HEART')->template = '♥';
$configurator->MediaEmbed->add('spotify');
$configurator->saveBundle('FlarumBundle', 'new_flarumbundle.php');

and then parse this tag when running the import script:

include_once 'new_flarumbundle.php';
$parser = FlarumBundle::getParser();
$parser->registerParser(
    'MyParser',
    function ($text, $matches) use ($parser)
    {
        // Here, $matches will contain the result of the following instruction:
        // preg_match_all('(<3)', $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)
        foreach ($matches as $match)
        {
            // Let's create a self-closing tag around the match
            $parser->addSelfClosingTag('HEART', $match[0][1], 2);
        }
    },
    // Here we pass a regexp as the third argument to indicate that we only want to
    // run this parser if the text matches (<3)
    '(<3)'
);

But I want to add this parser to the bundle itself, so that I have a ready-to-use setup for other migrations. Can I do this somehow? Thanks!

taravasya avatar Jul 04 '22 22:07 taravasya

Ah, I missed the line extract($configurator->finalize());

With it, I can add my parser directly to the configurator and generate a bundle:

$configurator = s9e\TextFormatter\Configurator\Bundles\Forum::getConfigurator();
$configurator->tags->add('HEART')->template = '&#9829;';
extract($configurator->finalize());
$parser->registerParser(
    'MyParser',
    function ($text, $matches) use ($parser)
    {
        // Here, $matches will contain the result of the following instruction:
        // preg_match_all('(<3)', $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)
        foreach ($matches as $match)
        {
            // Let's create a self-closing tag around the match
            $parser->addSelfClosingTag('HEART', $match[0][1], 2);
        }
    },
    // Here we pass a regexp as the third argument to indicate that we only want to
    // run this parser if the text matches (<3)
    '(<3)'
);
// Save it back as your own
$configurator->saveBundle('FlarumBundle', 'new_flarumbundle.php');

taravasya avatar Jul 05 '22 07:07 taravasya

But now I can't figure out the syntax for creating a BBCode parser. In the docs I only found examples that use addSelfClosingTag, and I don't see anything in the s9e sources that helps me with this. Can someone share a hello-world example of how to build a custom BBCode parser? Thanks.

taravasya avatar Jul 05 '22 08:07 taravasya

Bundles are a way to easily add and redistribute a preconfigured parser to an application, but Flarum already has a parser and that's the one you should use. Check out Flarum's docs and forums for how to extend Flarum's parser.

If you're converting from another forum software, a migration script may already exist too.

JoshyPHP avatar Jul 05 '22 14:07 JoshyPHP

Thanks for the reply. Yes, I'm migrating from another forum, vBulletin, and I didn't find a proper migration script for it. There are only a few "drafts", and none of them include a parser. Flarum's own parser is not suitable for me either, because I need to import the post text directly into the database rather than through the normal posting process, and there are also some customizations I would like to add during the migration.

I need a simple example of how to correctly add a custom BBCode parser to my bundle. For example, I'd like to improve the quote parser, because as it stands no proper records of post and user mentions are created in the database. As a result, after the transfer it is often hard to tell where a quote came from, because no correct URL is attached to it. I achieved the desired result with a regular expression and a callback function, but the regular expression fails when a post contains a lot of text: matching simply breaks and the text with BBCode is not parsed at all. I also want to migrate some custom BBCodes, but that is another story. I'm here for a simple example of how to add a BBCode parser to a bundle. Thanks.

taravasya avatar Jul 05 '22 15:07 taravasya

You can't use registerParser() to add parsers to a bundle. A custom parser is only useful for custom markup. If you need to transform an attribute value into another format, you should use an attribute filter. If you need to create an attribute based on the value of another attribute, you can use a tag filter.

  • https://s9etextformatter.readthedocs.io/Filters/Attribute_filters/
  • https://s9etextformatter.readthedocs.io/Filters/Tag_filters/
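As a rough illustration of the two filter types, the sketch below defines one attribute filter and one tag filter. The function names, the pixel table and the `postid` attribute are assumptions made up for this example, not part of the library. The wiring at the bottom relies on `filterChain->prepend()` and `filterChain->append()` as described on the pages linked above, and it is skipped unless the library has been autoloaded.

```php
<?php
// Hypothetical attribute filter: maps an assumed 1-7 size scale to pixels.
// An attribute filter receives the attribute's value and returns the new
// value, or false to reject it.
function vbSizeToPixels($value)
{
    // Assumed pixel table, chosen only for illustration
    $map = [1 => 10, 2 => 13, 3 => 16, 4 => 18, 5 => 24, 6 => 30, 7 => 35];

    return $map[(int) $value] ?? false;
}

// Hypothetical tag filter: splits author="name;postid" into two attributes.
// A tag filter receives the tag object being validated.
function splitQuoteAuthor($tag)
{
    if ($tag->hasAttribute('author')
     && preg_match('(^(.+);(\d+)$)', $tag->getAttribute('author'), $m))
    {
        $tag->setAttribute('author', $m[1]);
        $tag->setAttribute('postid', $m[2]);
    }

    // Keep the tag in case the installed version checks the return value
    return true;
}

// Wiring; skipped unless s9e\TextFormatter has been autoloaded
if (class_exists('s9e\\TextFormatter\\Configurator'))
{
    $configurator = new s9e\TextFormatter\Configurator;
    $configurator->BBCodes->addFromRepository('SIZE');
    $configurator->BBCodes->addFromRepository('QUOTE');

    // Run the size filter before the attribute's own filters
    $configurator->tags['SIZE']->attributes['size']->filterChain->prepend('vbSizeToPixels');
    // Run the quote filter after the default tag filters
    $configurator->tags['QUOTE']->filterChain->append('splitQuoteAuthor');
}
```

Because the filters are referenced by name, both functions would have to be defined wherever the resulting parser runs.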

What does the vBulletin markup look like?

JoshyPHP avatar Jul 08 '22 19:07 JoshyPHP

Hi! It seems to me that filters are useful at the rendering stage and are of little help when parsing to XML. In any case, I don't see how they can be used to modify the XML data of the posts written to the database.

What does the vBulletin markup look like?

If I understand you correctly, it's very similar to phpBB. Quotes usually look like this:
[quote="john doe;10000"]text[/quote]
The text is stored as plain text, without any other markup, and s9e parses it correctly. But as I said earlier, I want to add user/post mention records to the Flarum database, because without these records Flarum doesn't link a quote to its source, and as a result the quotes look simplified.
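To make the mention records concrete, here is a minimal sketch that collects every quoted author and post ID from a raw post. The function name and the returned array shape are hypothetical. It also accepts the unquoted form `[QUOTE=name;id]`, which is an assumption about the stored text rather than something shown above.

```php
<?php
/**
 * Collect every [quote="name;postid"] reference found in a raw post.
 * Hypothetical helper; the returned shape is chosen for illustration.
 */
function collectQuoteReferences(string $text): array
{
    $refs = [];

    // Name: anything except quotes and brackets, up to the last semicolon.
    // ID: the digits that follow it.
    preg_match_all('(\[quote="?([^"\]]+);(\d+)"?\])i', $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m)
    {
        $refs[] = ['author' => trim($m[1]), 'postId' => (int) $m[2]];
    }

    return $refs;
}
```

Each returned entry could then be paired with the ID of the post being imported to produce one mention row.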

This is a "good", linked quote: (image) And this is a "not good", unlinked quote: (image)

For this I need custom callback points to insert a DB row with the current post ID and the quoted post ID, and I also need to get the discussion ID and post number of the quoted post to generate a correct link to it. In general, a number of actions are required. So in the end I was forced to create my own parser for quotes and add it to the migration script. The final quote in the Flarum DB will look like this: <QUOTE><i>&gt; </i><p><POSTMENTION discussionid="0001" displayname="john doe" id="10000" number="100">@"john doe"#p10000</POSTMENTION> Text </p></QUOTE>

There are also a number of other problems. For example, the values of the SIZE BBCode in vBulletin go from 1 to 7, and they are rendered not as sizes in pixels but through CSS rules:

font-size: x-small;
font-size: small;
font-size: medium;

and so on. This conflicts with the s9e parser/renderer, which treats SIZE=20 as font-size: 20px;. To get the largest font in vBulletin I used [SIZE=7], while for Flarum I need to use either [H] or [SIZE=30], [SIZE=35], etc. The [H] BBCode in vBulletin is used like this: [H=1] text [/H], while for s9e the correct format looks like this: [H1] text [/H1].

Unfortunately, there are many more small and large inconsistencies that I have to catch, choosing the right parsing approach for each one individually. But at this point I think I have worked out everything needed to give the migrated forum a really good-looking appearance.
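For the heading case only, a text-level pass before parsing might look like the sketch below. The function name is hypothetical, levels 1 to 6 are an assumption, and nested headings are not handled.

```php
<?php
// Rewrite vBulletin-style [H=n]...[/H] into [Hn]...[/Hn] before parsing.
// Hypothetical helper; assumes heading levels 1 to 6 and no nesting.
function rewriteHeadings(string $text): string
{
    // "i" makes the tag name case-insensitive, "s" lets the content span lines
    return preg_replace('(\[h=([1-6])\](.*?)\[/h\])is', '[H$1]$2[/H$1]', $text);
}
```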

taravasya avatar Jul 08 '22 22:07 taravasya

If the goal is to replace the original markup to make it match Flarum's (rather than extend Flarum to support vBulletin's markup), then what you need is a Parser instance that understands vBulletin's markup. You can parse the original text, modify the generated XML using DOM methods rather than string replacements, then serialize it back to XML in a manner that's consistent with what Flarum produces. Alternatively, you can unparse the XML back to plain text and reparse it using Flarum's parser. It should produce the same XML if done correctly.
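A minimal sketch of the DOM route, using only PHP's DOM extension, could look like this. It assumes the parsed XML holds the quote as `<QUOTE author="name;postid">` with the original markup in `<s>` and `<e>` elements, and it targets the POSTMENTION shape quoted earlier in the thread. The function name and the `$lookup` callback, which stands in for whatever database query returns the discussion ID and post number, are hypothetical.

```php
<?php
/**
 * Rewrite <QUOTE author="name;postid"> elements into the POSTMENTION shape.
 * $lookup is a hypothetical callback: post ID => ['discussionId' => ..., 'number' => ...]
 */
function rewriteQuotes(string $xml, callable $lookup): string
{
    $dom = new DOMDocument;
    $dom->loadXML($xml);

    // Copy the live node list first because the loop modifies the tree
    foreach (iterator_to_array($dom->getElementsByTagName('QUOTE')) as $quote)
    {
        if (!preg_match('(^(.+);(\d+)$)', $quote->getAttribute('author'), $m))
        {
            continue;
        }
        $post = $lookup((int) $m[2]);

        // Drop the original [quote=...] and [/quote] markup
        foreach (iterator_to_array($quote->childNodes) as $child)
        {
            if ($child instanceof DOMElement && in_array($child->nodeName, ['s', 'e'], true))
            {
                $quote->removeChild($child);
            }
        }

        $mention = $dom->createElement('POSTMENTION');
        $mention->setAttribute('discussionid', (string) $post['discussionId']);
        $mention->setAttribute('displayname', $m[1]);
        $mention->setAttribute('id', $m[2]);
        $mention->setAttribute('number', (string) $post['number']);
        $mention->appendChild($dom->createTextNode('@"' . $m[1] . '"#p' . $m[2]));

        // Move the remaining quote content into a paragraph after the mention
        $p = $dom->createElement('p');
        $p->appendChild($mention);
        $p->appendChild($dom->createTextNode(' '));
        while ($quote->firstChild)
        {
            $p->appendChild($quote->firstChild);
        }

        // Quote marker kept as ignored text, then the paragraph
        $marker = $dom->createElement('i');
        $marker->appendChild($dom->createTextNode('> '));
        $quote->appendChild($marker);
        $quote->appendChild($p);
        $quote->removeAttribute('author');
    }

    return $dom->saveXML($dom->documentElement);
}
```

Whether the result matches what Flarum would produce for the same post still has to be checked against a post written through Flarum itself.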

I would consider creating a new bundle for vBulletin markup as part of the library if you're able to assist with verifying its correctness. Is there some public documentation about vBulletin's markup somewhere?

JoshyPHP avatar Jul 10 '22 17:07 JoshyPHP

Unfortunately, I am not sure that I can provide qualified assistance here. As I said above, if I understand correctly what you mean by markup, vBulletin stores the text of messages in the database as plain text in which some parts are wrapped in BBCode tags. These parts are either rendered to HTML at load time or read, already rendered, from the cache. So no XML is used at any stage.

I can give you examples of all the standard BBCodes used in vBulletin and how they look in the final HTML.

I should also note that on my production site I'm using vBulletin 4.2.2, while the latest version is 5.6.9, and I have no idea how much vBulletin has changed structurally. Visually, the frontend of version 5 differs significantly from version 4. The vast majority of existing forums use vB3 and vB4, which are very similar to each other, and only a small share of forums use vB5.

(Around the time version 5 was created, all the main vB developers left the team and started developing XenForo. The latest versions of vB4 and vB5 were left with many vulnerabilities and performance issues. As a result, version 5 ended up being less in demand than the others.)

Here: https://wedframe.ru/misc.php?do=bbcode all the BBCodes used on my forum are listed. From the igm (Instagram) BBCode to the end of the list, these are custom BBCodes that may differ between forums. (The misc.php?do=bbcode part of the link usually shows the BBCodes on any vB4 forum.)

Here: https://forum.vbulletin.com/help#bbcode_reference/bbcode_basic are the standard BBCodes used in vB5.

Here: https://www.vbulletin.com/docs/html/ is some general info about vB. I don't know whether you will find anything useful there.

taravasya avatar Jul 10 '22 17:07 taravasya