Ebuku could not get any bookmarks

Hi, I can't get any bookmarks in Ebuku, although buku has imported some bookmarks. When I run the command "ebuku", the error message is "Invalid string for collation: Invalid argument". Here is my configuration.

OS: Windows 10.0.19045
Emacs version: 30.0.50
buku: 4.9
ebuku: 2024.09.05

Sep 20 '24 04:09 1925381584

This reminds me of #31, which the user fixed by ensuring that the value of the LC_ALL environment variable was changed from C to the appropriate locale (in that case, zh_CN.UTF-8, as described here).

What's the value of LC_ALL in Emacs' environment on your system? You can find that information via e.g. M-x list-environment.
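The same information can also be read directly with the built-in `getenv`, for example by evaluating a form with M-:. The following is only a minimal sketch; the helper name `my/locale-env` is hypothetical, and a value of nil means the variable is not set in Emacs' environment.

```elisp
(defun my/locale-env ()
  "Return an alist of common locale variables as Emacs sees them.
Hypothetical helper for illustration only."
  (mapcar (lambda (name)
            ;; `getenv' returns nil when NAME is unset.
            (cons name (getenv name)))
          '("LC_ALL" "LC_CTYPE" "LANG")))

(my/locale-env)
```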

Sep 21 '24 02:09 flexibeast

There is no command called "list-environment" in my Emacs, and I could not find the variable LC_ALL either.

Sep 21 '24 05:09 1925381584

My apologies. Can you instead please evaluate:

(seq-filter #'(lambda (v) (numberp (string-match "^LC" v))) process-environment)

and report the output?

Sep 21 '24 07:09 flexibeast

the output is that

Sep 21 '24 07:09 1925381584

Okay, so LC_ALL is set appropriately.

Could you please do M-x toggle-debug-on-error, and then try to start Ebuku? It should result in a buffer showing what commands/functions got called; could you please share the contents of that buffer?

Sep 21 '24 08:09 flexibeast

here is

Sep 21 '24 08:09 1925381584

This is my configuration:

Sep 21 '24 08:09 1925381584

Thank you - i'll investigate this and get back to you.

Sep 21 '24 08:09 flexibeast

In the second line of the backtrace - the one that starts with #<subr string-collate-lessp ... - there are two bookmark tags being compared in order to sort them correctly. However, it appears that the tags have been saved in the buku database with different encodings; presumably the second is UTF-8, as it's rendering correctly, but the first one is showing the raw bytes (in octal), and i'm not sure what encoding it might be.

Can you please copy-and-paste the two tags into two new and separate files, each tag in their own file, and then open up each of those files in Emacs, calling C-h v buffer-file-coding-system in each buffer, and sharing the results?
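As background on why a tag can show up as octal escapes: Emacs displays bytes it has not decoded as raw bytes, while decoded text displays as characters. The following is a minimal sketch of that difference, assuming UTF-8 purely for illustration (the actual encoding of the first tag has not been established); the helper name and the sample string are hypothetical.

```elisp
(defun my/tag-encoding-demo ()
  "Contrast raw UTF-8 bytes with decoded text.
Hypothetical helper; the sample string stands in for a bookmark tag."
  (let* ((tag "\u4e2d\u6587")
         ;; Encoding yields a unibyte string of raw bytes, which Emacs
         ;; prints as octal escapes (beginning \344\270\255 here).
         (raw (encode-coding-string tag 'utf-8))
         ;; Decoding those bytes as UTF-8 recovers the characters.
         (decoded (decode-coding-string raw 'utf-8)))
    (list (multibyte-string-p raw)      ; raw bytes: not multibyte
          (multibyte-string-p decoded)  ; decoded text: multibyte
          (string= decoded tag))))

(my/tag-encoding-demo)
```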

Sep 21 '24 09:09 flexibeast

Now I have imported only one bookmark, but I'm still getting this error.

Sep 21 '24 13:09 1925381584

But you're importing one bookmark into a pre-existing buku database, correct? If so, then there's still the issue of comparing pre-existing tags with the tag(s) of the bookmark being imported. So, please follow the instructions i provided in my previous comment, and share the results.

Sep 21 '24 14:09 flexibeast

I'm sorry, I don't know how to copy and paste the two tags into two new, separate files. When I imported the bookmark, I had a clean bookmark database. After that, I made the changes shown in the image below, and it reads successfully. It looks like there is a problem parsing Chinese.

Sep 21 '24 16:09 1925381584

It's clearly not a problem with handling Chinese per se, for two reasons:

The user in #31 and #32 is successfully using Chinese overall, even if there's a specific issue with emoji (as described in #32).
Your own screenshots show that some Chinese text is being displayed okay, whereas other Chinese text is not. This is why i think the issue might be an encoding problem - that is, how Chinese text is being stored in the computer, with some text being stored in the way it expects, and other text being stored in a way it doesn't expect - and why i've needed you to copy and paste the text as i describe.

It's okay if you don't understand how to do something i ask of you, but in that case, please ask for further instructions. As the developer of this software, i can't help you if you don't provide me with the information i need.

To copy and paste text:

Move point / the cursor to the start of the text you want to copy.
Press C-SPC.
Move point / the cursor to the end of the text you want to copy.
Press M-w.

That will copy the text to the 'kill-ring' / 'clipboard'.

To paste text:

Move point / the cursor to where you want to paste the text.
Press C-y.

Sep 22 '24 00:09 flexibeast

Thank you for your answer. I did as you asked and did find something new. First I created two new files with Notepad++, put the respective text in them, and saved them. Then I opened them in Emacs. Their encodings are different, as shown below.

But I'm not quite sure whether this difference means the encodings in the buku database differ, because I looked at the database with an SQLite tool and found that the Chinese text is all displayed properly and is all UTF-8 encoded.

So I guess there are two possible reasons. The first is that the encodings in the buku database do differ, but the tool doesn't show it. The second is that there is a problem with parsing Chinese in Ebuku.

Sep 22 '24 03:09 1925381584

The issue seems to be that Emacs is sometimes incorrectly guessing the encoding as undecided-dos, as in your first screenshot, rather than UTF-8. Ebuku uses Emacs' built-in call-process to retrieve data from the buku database - refer to this part of the Ebuku code, where it calls buku and inserts the resulting output in a temporary buffer. It's Emacs, not Ebuku, that guesses the encoding of the buffer.
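One way to experiment with this is to override Emacs' guess by binding the built-in variable `coding-system-for-read` around `call-process`. The following is only a sketch under that assumption, not how Ebuku itself is written; the helper name `my/call-process-as-utf-8` is hypothetical.

```elisp
(defun my/call-process-as-utf-8 (program &rest args)
  "Run PROGRAM with ARGS and return its output decoded as UTF-8.
Hypothetical helper for illustration; not part of Ebuku."
  (with-temp-buffer
    ;; Binding `coding-system-for-read' makes Emacs decode the
    ;; subprocess output as UTF-8 instead of guessing.
    (let ((coding-system-for-read 'utf-8))
      (apply #'call-process program nil t nil args))
    (buffer-string)))
```

It could be called with the buku executable and whatever arguments are of interest. Note that this only affects how the output is decoded; it does not change how command-line arguments passed to the program are encoded.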

Please read through this discussion on #32, in which, as i noted above, the user wasn't having problems with Chinese in Ebuku in general, but only when also using certain emoji. Emacs maintainer Eli Zaretskii is part of that discussion, and he noted that using UTF-8 on Windows machines is problematic:

[T]he user sets a UTF-8 locale, which as I wrote up-thread is not a good idea on MS-Windows. It could well cause failures in invoking external programs from Emacs, if the arguments to those programs include non-ASCII characters. In general, on MS-Windows Emacs can only safely invoke programs with non-ASCII characters in the command-line arguments if those characters can be encoded by the system codepage, in this case codepage-936 AFAIU. ... Emacs on MS-Windows cannot use UTF-8 when encoding command-line arguments for sub-programs, it can only use the system codepage. Using set-language-environment as above will force Emacs to encode command-line arguments in UTF-8, which could very well be the reason for some of these problems. ... [Setting the language environment to "UTF-8" is] NOT RECOMMENDED!
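For anyone wanting to see what their own Emacs is currently using for the settings mentioned in the quote above, a few built-in variables can be inspected. This is a minimal diagnostic sketch; the helper name `my/coding-diagnostics` is hypothetical, and the values will differ from system to system.

```elisp
(defun my/coding-diagnostics ()
  "Return a plist of encoding-related settings in this Emacs session.
Hypothetical helper for illustration only."
  (list
   ;; Name of the active language environment, as a string.
   :language-environment current-language-environment
   ;; Coding system Emacs derived from the system locale.
   :locale-coding-system locale-coding-system
   ;; Cons of (DECODING . ENCODING) defaults for subprocess I/O.
   :default-process-coding-system default-process-coding-system))

(my/coding-diagnostics)
```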

Unfortunately, that discussion wasn't resolved because the user has never responded to Eli's most recent comment. However, in this case, you've reported that the value of buffer-file-coding-system is undecided-dos when it comes to some of the Chinese text in your buku database, and this was some of the information Eli was seeking from the other user. So i'm going to cc him on this discussion, as he might be able to assist further.

@Eli-Zaretskii

Sep 22 '24 04:09 flexibeast