py-leveldb-windows

Does it support Unicode paths?

Open zedxxx opened this issue 9 years ago • 11 comments

Can I create a db in a path like c:\tmp\中文-español\?

zedxxx avatar Oct 30 '15 10:10 zedxxx

I am not sure. But if the official leveldb supports it, there is no reason this code can't.

happynear avatar Oct 31 '15 11:10 happynear

Unfortunately, the official leveldb does not support Windows :(

zedxxx avatar Oct 31 '15 11:10 zedxxx

The same question applies to Linux: can the Linux version support /usr/xxx/中文目录/?

happynear avatar Oct 31 '15 11:10 happynear

The default encoding on Linux is UTF-8, which is a Unicode encoding, so there is no problem there. On Windows, however, the default encoding is something like Win1251, for example. So the C code must do some conversions to support Unicode on Windows.

When we call this:

leveldb_t* leveldb_open(const leveldb_options_t* options, const char* name, char** errptr);

by default we pass in name the path to the db in the ANSI encoding on Windows and in UTF-8 on Linux. On Windows this means we can't reach a path that is not representable in the system encoding. To reach such paths on Windows we should pass UTF-8 in name too, but the Windows port of leveldb must expect this: it has to convert UTF-8 to UTF-16 and call the Unicode functions of the Windows API (CreateFileW instead of CreateFileA).
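The conversion step described above can be sketched in pure Python 3. This is only an illustration of the idea, not code from this port: utf8_name_to_wide is a hypothetical helper name, and in the real C code the same step would typically be done with the Windows API function MultiByteToWideChar (with CP_UTF8) before calling CreateFileW.

```python
# Sketch only: mimics the UTF-8 -> UTF-16 step a Windows port would need
# before calling a wide-character API such as CreateFileW.
# utf8_name_to_wide is a hypothetical helper, not part of leveldb.

def utf8_name_to_wide(name: bytes) -> bytes:
    """Convert a UTF-8 encoded path into a NUL-terminated UTF-16-LE buffer."""
    text = name.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    return text.encode("utf-16-le") + b"\x00\x00"


# The caller passes UTF-8 bytes in `name`, as proposed above.
name = "c:\\tmp\\中文-español".encode("utf-8")
wide = utf8_name_to_wide(name)

# Decoding the buffer (without the terminator) gives back the original path.
round_trip = wide[:-2].decode("utf-16-le")
```

Because the wide buffer carries the real code points, no character is lost regardless of the system ANSI code page.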

So, does your port of leveldb work with UTF-8 or with the default encoding?

zedxxx avatar Oct 31 '15 11:10 zedxxx

I am not quite sure. I am busy with a conference deadline now. You can check it yourself.

happynear avatar Oct 31 '15 12:10 happynear

Can you give me a precompiled *.pyd for x86 Python?

zedxxx avatar Oct 31 '15 13:10 zedxxx

I don't have x86 Python. I have updated the Win32 configuration of the project, so you can compile it yourself.

happynear avatar Oct 31 '15 15:10 happynear

In that case I can install Python x64 for testing, because installing Visual Studio and compiling the lib from source is more difficult. So, could you give me your pyd for x64, please?

zedxxx avatar Oct 31 '15 15:10 zedxxx

You can download the x64 leveldb.pyd at http://pan.baidu.com/s/1pJ1mMnx .

happynear avatar Oct 31 '15 15:10 happynear

Test code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import codecs
import leveldb

db_path_uni = u'c:\\tmp\\中文-español'

with codecs.open('leveldb_uni_test.txt', 'w', encoding='utf-8') as f:
    f.write(db_path_uni)

db = leveldb.LevelDB(db_path_uni)

db.Put('hello', 'hello world')

print db.Get('hello')

It failed with this message:

Traceback (most recent call last):
  File "C:\Python27\leveldb_uni_test.py", line 12, in <module>
    db = leveldb.LevelDB(db_path_uni)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
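The error above looks like Python 2 implicitly encoding the unicode object with its default 'ascii' codec when the extension asks for a byte string. That is an assumption about how this binding reads its argument, not something confirmed in this thread. The same failure can be reproduced explicitly in Python 3:

```python
# Python 3 sketch: make explicit the ASCII encoding that Python 2 appears
# to attempt implicitly when leveldb.LevelDB() receives a unicode path.
path = "c:\\tmp\\中文-español"

failed_span = None
message = None
try:
    path.encode("ascii")
except UnicodeEncodeError as exc:
    # exc.start is the first offending index, exc.end is one past the last
    # character of the first unencodable run.
    failed_span = (exc.start, exc.end)
    message = str(exc)
```

The reported span covers the two Chinese characters at indexes 7 and 8, which matches "position 7-8" in the traceback.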

If I encode the unicode string as UTF-8 and then try to open the db:

db = leveldb.LevelDB(db_path_uni.encode('utf-8'))

then it works, BUT it creates a new directory c:\tmp\дё­ж–‡-espaГ±ol, which is not the Unicode path I asked for. It is a path with garbage text: the UTF-8 bytes read in my Windows default encoding, win1251.
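The garbage name can be reproduced without leveldb at all. A minimal Python 3 sketch, assuming the ANSI file API simply reads the UTF-8 bytes as cp1251 (Python's name for win1251):

```python
# Sketch: what an ANSI (cp1251) API sees when it is handed UTF-8 bytes.
path = "c:\\tmp\\中文-español"

utf8_bytes = path.encode("utf-8")

# The ANSI function has no way to know the bytes are UTF-8, so each byte
# is mapped to a cp1251 character instead.
as_seen_by_ansi_api = utf8_bytes.decode("cp1251")

# The damage is reversible as long as every byte exists in cp1251,
# which is the case for this particular path.
recovered = as_seen_by_ansi_api.encode("cp1251").decode("utf-8")
```

Here as_seen_by_ansi_api is the same garbled directory name that appeared on disk, and recovered equals the original path, which shows the bytes were stored intact but interpreted with the wrong encoding.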

In summary, this port does not work with Unicode paths :(

What do you think, and can you fix it?

zedxxx avatar Oct 31 '15 16:10 zedxxx

I will try to fix this problem after the paper deadline on 11/6.

happynear avatar Nov 01 '15 01:11 happynear