ptbtest icon indicating copy to clipboard operation
ptbtest copied to clipboard

[EntityParser] `parse_markdown`. Markdown V1 entities parser.

Open elebur opened this issue 9 months ago • 1 comments

The code of the parse_markdown method is ready. Most tests are written.

I want to add more tests but have still decided to open the PR so you can review the code. Perhaps you'll have some comments or recommendations.

Also, I have a question about licensing. My code is partly based on the code from https://github.com/tdlib/td Should it be mentioned?

elebur avatar Mar 31 '25 19:03 elebur

One of the tests for MessageGenerator is failing now, but it will be fixed as soon as I add parsers for URLs, mentions, and hashtags.

elebur avatar Mar 31 '25 20:03 elebur

Also, I have a question about licensing. My code is partly based on the code from https://github.com/tdlib/td Should it be mentioned?

Hmm, good question. I'll have to study it a little (the licensing part). This library is written is C++. In what way did you base your code on this? The general strategy for parsing?

AlexPHorta avatar Apr 01 '25 11:04 AlexPHorta

Yes, I took the algorithm from it and rewrite in python.

elebur avatar Apr 01 '25 12:04 elebur

@AlexPHorta the PR is ready to be merged.

elebur avatar Apr 02 '25 16:04 elebur

@AlexPHorta the PR is ready to be merged.

Nice! Let me just do some more reading on jurisprudence about the license issue.

AlexPHorta avatar Apr 02 '25 16:04 AlexPHorta

Hi, @elebur, sorry about the delay. Things are a little messy right now on my side. Ok about the license, but please avoid basing your work on something that's not GPL compatible. Since it's only about the markdown parsing I'm gonna allow it, but you could have used any of these:

  • https://github.com/progsource/maddy
  • https://github.com/mekhontsev/imgui_md?tab=MIT-1-ov-file
  • https://github.com/mity/md4c
  • https://github.com/markedjs/marked?tab=License-1-ov-file

All of them are MIT licensed. The first two are C++.

Gonna check the actual code until the end of the week. Thanks!

AlexPHorta avatar Apr 08 '25 01:04 AlexPHorta

Hey, @AlexPHorta .

The td project uses BSL (Boost Software License).

According to gnu.org BSL is compatible with GPL. So, I think, we won't have any issues with using this code. But I will pay attention in the future.

The reason I took the algorithm from the tdlib is that Telegram has its own Markdown syntax which is slightly differs from the original Markdown. All libraries that you mentioned deal with the original syntax.

Also, I don't know if the original code/project/authors should be mentioned in our code.

elebur avatar Apr 08 '25 08:04 elebur

According to gnu.org BSL is compatible with GPL. So, I think, we won't have any issues with using this code. But I will pay attention in the future.

Hahaha, you live you learn. I was interpreting things in a totally different direction. Who am I to question the creators of the license? Just one thing, I believe it's necessary to add the text of the BSL license to the repository. Could you add a copy of it to the PR?

Also, check the output of hatch fmt --check and apply the corrections it points to. There are some 'input' names in the code that need to be changed, no need of name collisions in what is strictly our code.

Besides that, thank very much for the effort and sorry, sorry, sorry for the delay in approving the PR. I still need to make some adjustments on my schedule.

AlexPHorta avatar Apr 13 '25 22:04 AlexPHorta

Hey, Alex! Added the license and applied the linter corrections. But there are some warnings left about the complexity of functions, like this one C901 `_handle_attachments` is too complex (39 > 10). I ignored such warnings.

elebur avatar Apr 14 '25 18:04 elebur

Yes, ignore those for now. Thanks!!!

Hey, Alex! Added the license and applied the linter corrections. But there are some warnings left about the complexity of functions, like this one C901 `_handle_attachments` is too complex (39 > 10). I ignored such warnings.

AlexPHorta avatar Apr 14 '25 18:04 AlexPHorta