skin.arctic.horizon.2
[BUG] Global search doesn't return any results for Arabic words/letters, but works fine with tmdbHelper search.
Skin section
Other
Current Behavior
I have enabled the Arabic keyboard layout and changed the font to NotoRoboto-Arabic so that the keyboard shows Arabic letters. When I search for any word, or even a single letter, that I know should return results from the TMDB database and from my library, I get no results.
Expected Behavior
It should return results for what I'm searching for, because the exact same input returns a lot of results when searching with TMDB Helper directly.
Steps To Reproduce
- Change the font to NotoRoboto-Arabic and add the Arabic keyboard layout.
- Go to the home screen, click on Search, and type a single Arabic letter.
- No matter what you type, you get no results.
Screenshots and Additional Info
I have tested a couple of other non-Latin scripts and there is no issue with them; this only happens with Arabic.
Checklist
- [X] I have searched the issues and this bug has not been reported.
- [X] I am using the most recent skin version.
- [X] I have confirmed that this bug does not occur in the default skin.
It's because RTL languages reverse the string order, which means that the search path is backwards too.
Could you test the version in the fix_search_rtl branch PR and let me know if that fixes the issue?
https://github.com/jurialmunkey/skin.arctic.horizon.2/archive/refs/heads/fix_search_rtl.zip
If that version doesn't fix it then it is unlikely to be fixable.
I've installed that fix and it doesn't fix the issue. I think the appearance of Arabic letters in reverse order is a separate issue, which may affect the Global search results, but it's not what is causing the bug I'm reporting.
I don't think RTL languages are the issue here. I may not have included enough information in my original post, so I will try to be more detailed in this comment.
First of all, there are no issues when searching with tmdbHelper directly: I get the expected results no matter what language I use, whether it is LTR like English/Russian or RTL like Arabic/Hebrew.
The issue is solely related to the Global Search on the home page. The reason I don't think it's related to RTL languages is that I've tested Hebrew, which is also an RTL language, and it returns the expected results.
I also made sure to search using only a single letter, to rule out the reversed letter order as the cause: a single letter of an RTL language has no order, so there is nothing to reverse.
I didn't include any logs in my original post, so here are the Kodi.log files for the searches I did. I cleared the Kodi.log file before every search, so each log shows exactly what happened during that search only.
Arabic
- Kodi.log after tmdbHelper search for a single Arabic letter: expected results, no errors
- Kodi.log after tmdbHelper search for a single Arabic word: expected results, no errors
- Kodi.log after Global search for a single Arabic letter: no results, multiple errors
- Kodi.log after Global search for a single Arabic word: no results, multiple errors
From the above results, we can conclude that Arabic search works as expected within tmdbHelper, and doesn't work at all within the Global search.
Hebrew
- Kodi.log after Global search for a single Hebrew word: expected results, multiple errors
Hebrew is also an RTL language, and the Global search returns the expected results for it, so we can conclude that this bug is not related to RTL languages. That said, there are a couple of errors that you may find interesting.
Russian
- Kodi.log after Global search for a single Russian letter: expected results, multiple errors
I added a Russian search because I wanted to include a non-Latin LTR language, to make sure that the errors I got from the Hebrew single-letter search are not related to RTL languages. The result is that the Russian search behaves like the Hebrew one: it returns the expected results, but at the same time produces some interesting errors that don't seem to interfere with the search results. So RTL languages are not at fault for these errors, since Russian is an LTR language.
Special Characters
- Kodi.log after Global search for $: no results, multiple errors
Before I did this search, I went to the themoviedb.org website and made sure that there are results for this character; there are a lot of movies/shows in the results when I search for $ on that website.
However, in the Global search I get zero results, just as with Arabic, and I also get multiple errors.
If you want me to do any more testing, let me know and I will be happy to run all the tests required to fix this issue.
The errors are not important. It is just Kodi saying the plugin returned no items for the container.
There are two issues at play here. The first is that RTL languages in Kodi will reverse the order of infolabels. It is not reversing just the search query but the entire plugin path, so the path becomes invalid. That was what my patch addressed and likely why you find you can now search Hebrew without issue.
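To illustrate why reversing the whole label breaks the path rather than just the query, here is a minimal Python sketch. The plugin path and its parameter names are hypothetical placeholders, not the exact path AH2 builds, and the RTL bug is modelled as a plain character reversal, which is an assumption; Kodi's actual reordering may differ.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical search path; the real AH2/TMDbHelper path may use other parameters.
PLUGIN_PATH = "plugin://plugin.video.themoviedb.helper/?info=search&tmdb_type=movie&query="

def build_path(query: str) -> str:
    """Append the raw query to the plugin path, as the skin does (no encoding)."""
    return PLUGIN_PATH + query

def simulate_rtl_reversal(label: str) -> str:
    """Assumed model of the bug: the whole label is reversed, not just the query."""
    return label[::-1]

good = build_path("\u0633\u0644\u0627\u0645")  # an Arabic word in the Arabic block
bad = simulate_rtl_reversal(good)

good_parts = urlsplit(good)
print(good_parts.scheme, good_parts.netloc)  # plugin plugin.video.themoviedb.helper
print(dict(parse_qsl(good_parts.query)))     # the query parameter survives intact
print(repr(urlsplit(bad).scheme))            # '' : no plugin:// scheme left to route on
```

Reversing only the query would still leave a routable path; reversing the full label destroys the scheme and add-on id, so Kodi cannot even reach the plugin.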
The second issue is going to be encoding. The path needs percent-encoded UTF-8 to handle special characters (such as dollar signs) and UTF-16 (such as Arabic forms). There's no option to encode text via the skinning engine, so there's no way to pass the plugin correctly encoded text, and these characters will break things.
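As a rough illustration of what percent encoding would do if the skin could apply it, the sketch below uses Python's standard `urllib.parse.quote` to encode text as UTF-8 bytes. It is not how the skin or TMDbHelper builds paths; it only shows that the dollar sign, a base Arabic letter and its presentation form each become distinct, ASCII-safe byte sequences that decode back losslessly.

```python
from urllib.parse import quote, unquote

def encode_query(text: str) -> str:
    """Percent-encode text as UTF-8 so every byte is ASCII-safe inside a plugin path."""
    return quote(text, safe="")

samples = {
    "dollar sign": "$",
    "SEEN (Arabic block, U+0633)": "\u0633",
    "SEEN isolated form (Presentation Forms-B, U+FEB1)": "\ufeb1",
}

for label, raw in samples.items():
    encoded = encode_query(raw)
    assert unquote(encoded) == raw  # decoding restores the original text exactly
    print(f"{label}: {encoded}")
# dollar sign: %24
# SEEN (Arabic block, U+0633): %D8%B3
# SEEN isolated form (Presentation Forms-B, U+FEB1): %EF%BA%B1
```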
Re: "That was what my patch addressed and why you find you can now search Hebrew without issue."
All the tests and their logs were done on the unpatched version, the one that is available in your repo. I only tested your patch, and since it didn't fix the results issue, I returned to the unpatched version and then performed all the tests above. So Hebrew works fine even without the patch. And it is an RTL language, so there should be no difference between it and Arabic.
Concerning encoding, yes, I understand that special characters may need encoding, and that's a separate issue, but it shouldn't be the cause of Arabic searches not working. After all, how do you explain Hebrew searches working but not Arabic, when they are similar in every way?
Hebrew and Arabic are different character sets. All characters require encoding, not just special characters. Arabic characters are complex and are likely being lost or garbled when passed from Kodi to the Python interpreter via the paramstring, because they have not been encoded in the correct format before being passed. You don't have this issue when searching within tmdbhelper because this happens internally rather than as a script argument.
As an example, here is your single word search as a saved search in TMDbHelper. You can see from the overlay that the path has been saved with percent encoding to ensure that the utf8 characters are maintained whilst being passed through the encoding used by sys.argv.
By contrast, here is the same search via the skinning engine. You can see that percent encoding has not been applied. There is no way to apply percent encoding. Percent encoding the value would require sending the string to a script, by which point it is lost. You can also see clear issues with loss of RTL and combined forms here.
We can see the difference in the output of the paramstring that TMDbHelper receives from Kodi for these queries. You can see that the Arabic forms get messed up whereas the Hebrew forms do not. That would be because Arabic is more complex due to word ligatures and these are lost in translation without proper encoding (well actually the reverse is happening - the ligatures are being applied via the skin when we just want raw character codes passed as percent encoded byte references)
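The point about ligatures can be made concrete with Python's standard `unicodedata` module. In this sketch, LAM followed by ALEF is two code points in the Arabic block, while the shaped ligature U+FEFB is a single Presentation Forms-B code point; they percent-encode to different bytes, and only compatibility normalization (NFKC) maps the ligature back. Which exact forms Kodi emits for a given word is not shown here.

```python
import unicodedata
from urllib.parse import quote

base = "\u0644\u0627"  # LAM + ALEF as two separate letters (Arabic block)
ligature = "\ufefb"    # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM (Presentation Forms-B)

print(len(base), len(ligature))      # 2 1
print(quote(base), quote(ligature))  # %D9%84%D8%A7 %EF%BB%BB
print(base == ligature)              # False: a plain comparison fails
print(unicodedata.normalize("NFKC", ligature) == base)  # True: NFKC unfolds the ligature
```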
Do you know why this only started happening with Arctic Horizon 2 but not the original Arctic Horizon 1? I've just tested Arctic Horizon 1 skin and it returned results for Arabic words. I can also search my local library in Kodi's default skin with Arabic words and get expected results.
What's new in the AH2 search function that is different from what was in AH1 or from Kodi's default skin global search function?
Also, I have made some tests and found that in some scenarios I was able to get search results when searching Arabic letters. I will have to do more testing to get the full picture, and when I have any concrete results I will share them here.
@jurialmunkey I've done some testing and I have concluded that Kodi's text-shaping implementation for Arabic script doesn't follow the Unicode Consortium standard. In short, Kodi converts the Arabic characters that I input from the on-screen keyboard from the Arabic Unicode block (U+0600-U+06FF) to an old, obsolete Unicode block called Arabic Presentation Forms-B (U+FE70-U+FEFF) before it puts the input text into the search label, so the AH2 skin receives characters that are not normal Arabic characters.
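A small diagnostic along these lines can confirm which block the received characters come from. This is a hypothetical helper, not part of Kodi or TMDbHelper; it only checks the two ranges discussed here, Arabic (U+0600-U+06FF) and Arabic Presentation Forms-B (U+FE70-U+FEFF).

```python
from typing import List

def arabic_block(ch: str) -> str:
    """Return which of the two Arabic-related Unicode blocks a character belongs to."""
    cp = ord(ch)
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    if 0xFE70 <= cp <= 0xFEFF:
        return "Arabic Presentation Forms-B"
    return "Other"

def describe(text: str) -> List[str]:
    """List each code point with its block, e.g. to paste into a log line."""
    return [f"U+{ord(ch):04X} {arabic_block(ch)}" for ch in text]

typed = "\u0633\u0644\u0627\u0645"  # the word as the keyboard should produce it
shaped = "\ufeb3\ufefc\ufee1"       # a shaped (presentation-form) spelling of the same word
print(describe(typed))
print(describe(shaped))
```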
I've opened an issue in the xbmc repository and I will wait to see what they think.
Meanwhile, I've tried to use https://kodi.wiki/view/Add-on:Global_Search in AH2 as an alternative to the AH2 internal search. It works fine and returns everything that is in my library, whether I use Arabic or English text. The only problem is that it isn't integrated into the AH2 skin, so the normal views that everyone loves about AH2 cannot be used for the search results returned by this addon.
The Global Search addon is in the official Kodi repository for both Matrix and Nexus, is used by multiple skins, and uses information from movies, shows, actors, episodes and a lot of other metadata to find search results for everything you have in your library.
That is some good investigation and I think you're right. I thought it was because the characters are not encoded (which possibly could be an additional problem) but it appears that it is actually an issue with the edit control converting the character set. That would also explain why the edit control text displays incorrectly to start with - the characters are wrong before they even go into the skin infolabels and so are then subsequently wrong in the content path that goes to the plugin.
On the topic of global search - it isn't customisable because it is a scripted window which is why I don't bother with it. You can't add custom search lists like tmdbhelper to it.
The difference between AH1 and AH2 search is the edit control. AH1 uses a button control you must click to set a skinstring which is then passed to the search paths as the query; by contrast AH2 uses the edit control to allow the user to type directly in the box and have search results on demand.
Could you confirm that search definitely works correctly in AH1? If it does, I might be able to add a setting to toggle between the two methods. It's just a bit of work, so I don't want to go to the effort if it doesn't work properly. It will mean that you lose the ability to type directly in the box, but working search is better than not working search!
What I meant about the AH1 search is that, at least in that skin, it returns results for the local library, so if I bring up the on-screen keyboard in AH1 and type in Arabic or English it will return content that is in my local library. The AH1 default search doesn't include addons like tmdbHelper, so I didn't pay too much attention to that. So when you said I should test AH1 again, I went and set up custom searches for tmdbHelper and Youtube, and this is what I found.
Arctic Horizon 1
In AH1, searching in Arabic doesn't return results from the tmdbHelper addon no matter what I type (it works with English). As for Youtube, it returns results for Arabic searches, but it actually searches the wrong input because it reverses the order of letters in the words I'm searching, so I get incorrect results, but at least I get some results. But as I said, tmdbHelper doesn't return any results, even if I try a single letter or type the search input in reverse, so that the reverse of the reverse is a correct word that has results. Even then, there are no results.
So to conclude for AH1: library searches work as expected. The Youtube custom search reverses the letter order, so we get incorrect results. As for tmdbHelper, I honestly don't know why, but maybe some substitution of Arabic characters happens before the input is submitted to tmdbHelper. This would be the same substitution that happens in AH2, where Arabic characters get converted from the correct Arabic Unicode block to the old, obsolete Arabic Presentation Forms-B block, and this is what causes the problem.
I went to the Youtube website and did two searches: the first using a single character from the Arabic Unicode block, and the second using the same letter taken from Arabic Presentation Forms-B. I got almost the same results. But when I did the same thing on themoviedb.org, I only got results if the Arabic letter was from the Arabic block, and no results if it was from Arabic Presentation Forms-B. This explains the different behavior between the Youtube and tmdbHelper addons. So it's not that the Youtube addon is doing something special to the input to make it return results; it's just that the Youtube API/website can handle characters from different Unicode blocks that the themoviedb.org API/website can't. As for why the local library returns the expected results, I have no idea; maybe since the search matching for the local library is done by Kodi itself using the local database, there is no character substitution and it actually uses the correct Arabic block, similar to how you use the search function inside an addon.
Arctic Horizon 2
Now if we move to AH2: if I type in the search label using the physical keyboard, I get exactly the same behavior as in AH1 when typing with either the on-screen or the physical keyboard. But if I type using the on-screen keyboard in AH2, the input first passes through the label and from there gets passed to the different addons. This passing from the on-screen keyboard to the label and out to the addons is definitely converting all the input from the Arabic Unicode block to Arabic Presentation Forms-B, and the only results I get are from Youtube (but as in AH1, the input gets reversed before the search happens), so using the on-screen keyboard prevents local library results. The reason Youtube returns any results at all is that the Youtube API/website can handle a much wider range of Unicode blocks and is smart enough to know that different Unicode code points are actually the same abstract letter. Other websites/APIs could easily do this through a process called Unicode normalization, but it seems some don't bother.
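Unicode normalization of the kind mentioned above can be sketched with Python's `unicodedata.normalize`. How Youtube actually matches queries is not known; this only shows that NFKC folds the presentation forms back to the base Arabic letters, so a comparison after normalization succeeds where a plain comparison fails.

```python
import unicodedata

def normalized_equal(a: str, b: str) -> bool:
    """Compare two strings after NFKC normalization, which folds compatibility forms."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

typed = "\u0633\u0644\u0627\u0645"  # as typed: SEEN, LAM, ALEF, MEEM
shaped = "\ufeb3\ufefc\ufee1"       # shaped: SEEN initial, LAM-ALEF final ligature, MEEM isolated

print(typed == shaped)                  # False
print(normalized_equal(typed, shaped))  # True
```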
Summary:
AH1 => using on-screen keyboard => local library results + Youtube results, but the search input gets reversed.
AH2 => using physical keyboard => local library results + Youtube results, but the search input gets reversed.
AH2 => using on-screen keyboard => Youtube results, but the search input gets reversed.
So with these findings, I think it may be worth adding an option to enable a search button that, when clicked, brings up the on-screen keyboard and passes that input directly to the different addons. This way there will be no character substitution, at least for the local library, and the user can avoid looking at Arabic text that is inverted in the search label. And for my use case, if I want to add a custom search for other addons, I may have to edit that specific addon's search function so that it can handle incorrectly mapped Arabic characters before making any API calls with that input.
Concerning Global Search, I should have created a separate issue for this. What I meant was to customize the search results view of the Global Search addon to fit AH2, not to add custom searches to the addon. Right now, the Global Search addon in AH2 shows results in a page that looks like the Estuary skin, so it looks like you are not using the AH2 skin at all. I will create a separate issue with the enhancement tag, along with screenshots, so you can see for yourself whether it's possible.
Thanks, this is all really useful info. I'll have to think a bit more about possible solutions. At the moment I think there are three issues: RTL text being reversed, lack of percent encoding, AND the character code translation issue.
Re global search - my point was that as a scripted window it does not allow me to customise the searches provided. I have a mixture of library and addon content and since global search addon only allows library searches it is not useful for me. Skinning scripted windows is a lot of work and as global search addon isn't something I will use at all, it is very low priority for me - I would rather spend the time on fixing my own custom search window.
That's understandable. I only brought up the Global Search addon because I thought this bug might be very hard to solve, so that addon could be used as a compromise. But if you think the bug can be solved, then you should definitely focus on fixing the AH2 search function, as it's way better. I myself would rather use the internal search function, as it's way more customizable and can search more data sources and addons.
@movianlost - Can you test v4.10.21 of TMDbHelper https://github.com/jurialmunkey/plugin.video.themoviedb.helper/releases/tag/v4.10.21
That should normalize and decompose the FE character block back into 06 forms which should make the searches work correctly from the search window.
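A minimal sketch of that kind of normalization on the plugin side, assuming the query arrives in the paramstring Kodi passes as `sys.argv[2]` under a parameter named `query`. `clean_query` is a hypothetical helper and not the code in TMDbHelper v4.10.21, which may use a different normalization form.

```python
import unicodedata
from urllib.parse import parse_qsl

def clean_query(paramstring: str) -> str:
    """Hypothetical helper: read 'query' from a plugin paramstring (e.g. sys.argv[2])
    and fold Arabic Presentation Forms back to base Arabic letters with NFKC."""
    params = dict(parse_qsl(paramstring.lstrip("?")))  # also percent-decodes values
    return unicodedata.normalize("NFKC", params.get("query", ""))

# Unencoded presentation forms, as the skin currently passes them
print(clean_query("?info=search&query=\ufeb3\ufefc\ufee1") == "\u0633\u0644\u0627\u0645")  # True
# Percent-encoded presentation form (U+FEB1, SEEN isolated)
print(clean_query("?query=%EF%BA%B1") == "\u0633")  # True
```

NFKC is chosen here rather than NFKD because NFKD also splits precomposed letters such as U+0625 (ALEF WITH HAMZA BELOW) into U+0627 + U+0655, which NFKC recombines. The sketch does nothing about reversed text.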
We will still have the issue with edit control reversing the characters but this should hopefully get us to a somewhat usable search in the meantime until (or if) things can be fixed on the Kodi side.
Thanks for the fix; yes, I'm now getting the expected results from TMDBHelper in the home search.
As you said, this may be a temporary solution that at least can give the user search results from TMDBHelper, but the real fix should come from Kodi itself so that users can use other addons in the custom search without having to normalize the characters for every addon they want to use.
A reminder on what's working and what's not. Opening the on-screen keyboard and typing text in Arabic will get expected results in TMDBHelper and Youtube. But on the other hand, it doesn't return any results from content in my local library and the text in the search label is shown backward.
So it seems none of the components that are part of the custom search are receiving correct input. Addons like TMDBHelper and Youtube receive text converted from the U+06xx Arabic block to the U+FExx presentation forms, but the good thing is that they receive it in the correct order. On the other hand, the local library search function receives the correct Unicode code points, so no conversion is happening, which is good, but the bad thing is that it receives the text backward. The same applies to the search label, where the text is shown backward.
I think we should pick a text input method that we consider correct, so that we can objectively determine what's working and what's not. Using the on-screen keyboard makes addons return results but gives none for the local library, while using the physical keyboard on the search label makes addon searches fail and the local library work. So I think we should consider the on-screen keyboard as the one inputting correct data, and start from there to determine where the data is being altered.
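Taking the on-screen keyboard input as the reference, a diagnostic like the following could classify what happened to the text at each stage (label, library search, addon paramstring). `diagnose` is a hypothetical helper: reversal is modelled as a plain code-point reversal and conversion as presentation forms that NFKC undoes, both assumptions based on the behavior described in this thread.

```python
import unicodedata

def nfkc(text: str) -> str:
    """Fold presentation forms and ligatures back to base letters."""
    return unicodedata.normalize("NFKC", text)

def diagnose(expected: str, received: str) -> str:
    """Hypothetical check of how `received` differs from the reference `expected`."""
    if received == expected:
        return "unchanged"
    if received[::-1] == expected:
        return "reversed"
    if nfkc(received) == expected:
        return "converted to presentation forms"
    if nfkc(received[::-1]) == expected:
        return "converted and reversed"
    return "other change"

typed = "\u0633\u0644\u0627\u0645"  # reference input in the Arabic block
shaped = "\ufeb3\ufefc\ufee1"       # the same word as presentation forms
for label, received in [("label", typed[::-1]), ("addon", shaped), ("both", shaped[::-1])]:
    print(label, diagnose(typed, received))
```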
Another issue that I just found today, caused by the same original bug: when you click on information on a movie and then press the Down button to bring up the crew/recommendations/Youtube etc., the Youtube videos that appear in this section come from a search using the movie title. If the title is in Arabic, the search is done with that title in reverse order, so you get incorrect results, similar to the behavior of the home custom search label.