django-import-export

Suggestion: encoding conversion

Open hebijiandai opened this issue 10 years ago • 23 comments

Sir, I have been thinking about this problem in Django:

UnicodeDecodeError at /admin/core/book/import/
'utf8' codec can't decode byte 0xc4 in position 0: invalid continuation byte

Then I checked the file in vim (set fileencoding) and found that CSV files exported by django-import-export are UTF-8. I searched the internet and also found that importing a UTF-8 file works fine.

Maybe the problem is caused by my OS being set to Chinese. Would you please modify the project so that, if the import or export file is not UTF-8, it is first converted to UTF-8 before the rest of the code runs?

I have written some code to do the encoding conversion:

import chardet


def convertEncoding(from_encode, to_encode, old_filepath, target_file):
    # Read the source file as text in its original encoding.
    # newline="" keeps the original line endings untouched.
    with open(old_filepath, "r", encoding=from_encode, newline="") as f1:
        content = f1.read()
    # Write the same text back out in the target encoding.
    with open(target_file, "w", encoding=to_encode, newline="") as f2:
        f2.write(content)


# Read the raw bytes so chardet can guess the source encoding.
with open('1234.csv', 'rb') as convertFile:
    data = convertFile.read()

convertEncoding(chardet.detect(data)['encoding'], "utf-8", "1234.csv", "1234_bak.csv")

I am a newbie, so my code is not concise. Would you please consider this and integrate the routine into the project? I like this project very much. Thanks for your work!

hebijiandai avatar Mar 09 '14 07:03 hebijiandai

Have you checked the docs, specifically the settings for from_encoding and to_encoding?

bmihelac avatar Mar 09 '14 13:03 bmihelac

Yes sir, I tested it in my code, and in my OS environment it still causes the error. Does from_encoding='utf-8' mean that the import file must be encoded as UTF-8? I also changed the parameter to from_encoding='GB2312' to match my CSV file's encoding, and it still causes an error. When I convert the file to UTF-8, it runs well.

hebijiandai avatar Mar 09 '14 15:03 hebijiandai

Please see how data is encoded and decoded:

https://github.com/bmihelac/django-import-export/blob/master/import_export/admin.py

bmihelac avatar Mar 10 '14 15:03 bmihelac

I have the same problem on a Chinese-language OS, even though the file content is encoded as UTF-8.

Could this be a problem with the OS environment settings? Should the file be opened with open(..., encoding="...") rather than just open() with a filename and read mode?

my traceback:

Traceback:
File "D:\My Documents\Workspaces\xbcWeb\xbcWeb\env\lib\site-packages\django\core\handlers\base.py" in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "D:\My Documents\Workspaces\xbcWeb\xbcWeb\env\lib\site-packages\django\utils\decorators.py" in _wrapped_view
    response = view_func(request, *args, **kwargs)
File "D:\My Documents\Workspaces\xbcWeb\xbcWeb\env\lib\site-packages\django\views\decorators\cache.py" in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
File "D:\My Documents\Workspaces\xbcWeb\xbcWeb\env\lib\site-packages\django\contrib\admin\sites.py" in inner
    return view(request, *args, **kwargs)
File "D:\My Documents\Workspaces\xbcWeb\xbcWeb\env\lib\site-packages\import_export\admin.py" in import_action
    data = uploaded_import_file.read()

u8621011 avatar Jan 08 '15 01:01 u8621011

I tried this change and it fixes the exception. Please apply it as a patch if it is suitable; I am not familiar with GitHub.

In process_import():
import_file = open(import_file_name, input_format.get_read_mode(), encoding=self.from_encoding)

In import_action():
with open(uploaded_file.name, input_format.get_read_mode(), encoding=self.from_encoding) as uploaded_import_file:
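For illustration only (this is not code from django-import-export), a minimal sketch of the idea behind the change above: pass an explicit encoding to open() so that decoding does not depend on the platform default. The helper name read_text and the sample data are assumptions made for this example.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a text file using an explicit encoding (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Demo: write GB2312-encoded bytes to a temporary file, then read them back.
sample = "书名,作者\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(sample.encode("gb2312"))
    path = tmp.name

try:
    # The encoding is stated explicitly, so the OS locale is irrelevant here.
    decoded = read_text(path, "gb2312")
finally:
    os.remove(path)
```

With the encoding stated explicitly, the same file should decode identically on a Chinese-language Windows machine and on a Linux server.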

u8621011 avatar Jan 08 '15 06:01 u8621011

Still having this problem using v1.1.0 with Python 3 on Ubuntu 16.04 when trying to import a CSV file with Latin characters. The error also occurs even if you specify the from_encoding attribute.

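from django.contrib import admin
from import_export.admin import ImportMixin
from import_export.formats.base_formats import CSV

# UsuariosTmpResource and Usuarios_temporales are defined elsewhere in the project.

class MyImportMixin(ImportMixin):
    formats = (CSV,)
    from_encoding = 'latin-1'

class UserTmpAdmin(MyImportMixin, admin.ModelAdmin):
    resource_class = UsuariosTmpResource

admin.site.register(Usuarios_temporales, UserTmpAdmin)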

I also tested tablib in the shell and it works OK...

Any ideas what could be wrong? :-( The strangest thing is that it works like a charm on Windows, but the same code base fails when deployed on Ubuntu 16.04.

hypnotic-frog avatar Oct 08 '18 19:10 hypnotic-frog

I've hit similar problems, and there are several distinct problems with the current code:

1: Encoding is happening in the wrong place

... should probably be happening with standard Python open() & encoding=, rather than fetching the data and using force_text() on it. (And while this is being fixed: it may be better to default to the utf-8-sig encoding rather than plain utf-8, but only for reading, as it will detect and skip the BOM if there is one.)

2: force_text() exceptions aren't being caught

And if the code is going to use force_text(), it should catch the correct exception (DjangoUnicodeDecodeError, not UnicodeDecodeError)

3: Doesn't handle universal newline types

The code already knows what open() modes to use for each format, thanks to base_formats. Text formats should have the U (universal newline) flag added. (Update: the U flag is deprecated in open() now. Use text mode or set newline=None)

4: Should just raise the actual exception rather than doing an HttpResponse + <h1> simulation of an error

No wonder nothing was showing up in my error logs. forehead-slap
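To illustrate points 1 and 3, here is a minimal sketch using only the standard library. It is not the project's code: the function name read_csv_rows and the sample bytes are assumptions for this example. It decodes with utf-8-sig, which skips a BOM if one is present, and passes newline="" as the csv module documentation recommends, so both \r\n and \n line endings parse the same way.

```python
import csv
import io


def read_csv_rows(raw_bytes, encoding="utf-8-sig"):
    """Decode raw bytes and parse them as CSV rows (hypothetical helper).

    utf-8-sig drops a leading BOM when there is one and otherwise behaves
    like plain utf-8 for reading.
    """
    # newline="" lets the csv module handle line endings itself.
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline="")
    with text:
        return list(csv.reader(text))


# Same data twice: once with a BOM and Windows line endings,
# once without a BOM and with Unix line endings.
with_bom = b"\xef\xbb\xbfid,name\r\n1,Caf\xc3\xa9\r\n"
without_bom = b"id,name\n1,Caf\xc3\xa9\n"

rows_a = read_csv_rows(with_bom)
rows_b = read_csv_rows(without_bom)
```

Decoding the first sample with plain utf-8 instead would leave the BOM character at the start of the first header name.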

yozlet avatar Jan 22 '19 19:01 yozlet

@yozlet great tip about utf-8-sig.

While such issues have almost always lacked a reproducible test case, I'm all for making the library more robust when handling different encodings.

bmihelac avatar Jan 23 '19 11:01 bmihelac

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 06 '19 07:06 stale[bot]

Hi guys, any updates on this? I'm currently experiencing similar issues.

xiubinzheng avatar Oct 03 '19 20:10 xiubinzheng

Me too

Imported file has a wrong encoding: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)

I can't get the file parsed as UTF-8 in the deployment environment, although it works fine in development.
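The message above is what Python reports when UTF-8 bytes are decoded with the ascii codec. The cause in this environment is not established in the thread; one possibility (an assumption) is a server process whose locale makes ASCII the default encoding. The sketch below only reproduces the message itself, using made-up sample text and a hypothetical helper named try_decode.

```python
# UTF-8 encodes "à" as the two bytes 0xc3 0xa0.
payload = "città".encode("utf-8")


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# ASCII cannot represent byte 0xc3, so this attempt fails.
text_ascii, error_ascii = try_decode(payload, "ascii")

# Decoding the same bytes as UTF-8 succeeds.
text_utf8, error_utf8 = try_decode(payload, "utf-8")
```

If the bytes are valid UTF-8 but the error names the ascii codec, the file itself is probably fine and the encoding used to read it is the thing to check.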

GabrieleCalarota avatar May 25 '20 15:05 GabrieleCalarota

Me too

Imported file has a wrong encoding: 'ascii' codec can't decode byte 0xc3 in position 2011: ordinal not in range(128)

This happens only in the production environment, with a JSON file. @bmihelac, why does this occur if the file is encoded as UTF-8? I'm on Django 2.2.

giuseppenovielli avatar Apr 07 '21 13:04 giuseppenovielli

4: Should just raise the actual exception rather than doing an HttpResponse + h1 simulation of an error

Implemented in PR #1281 (although the error is presented back in the UI as a form error)

matthewhegarty avatar Jul 31 '21 16:07 matthewhegarty

I have created #1306 based on the suggestions made by @yozlet here. I cannot reproduce the import errors but it would be great if anyone who has commented previously in this thread can test the PR to see if it resolves their issues.

matthewhegarty avatar Aug 01 '21 10:08 matthewhegarty

I would like to know whether this issue has been resolved. The file I uploaded contains Chinese characters, and the problem is the same as yours...

bowuL avatar Feb 20 '22 17:02 bowuL

@bowuL Please can you try this branch and let us know if the problem still exists?

matthewhegarty avatar Feb 20 '22 19:02 matthewhegarty

I had the same issue and now it's fixed. Thx.

jairodri avatar Mar 05 '22 12:03 jairodri

@jairodri Thanks - was that using the new branch?

matthewhegarty avatar Mar 05 '22 12:03 matthewhegarty

yes, i'm using it :)

jairodri avatar Mar 05 '22 13:03 jairodri

@matthewhegarty Sorry for the late reply. I didn't use the other branch; I just changed the source code and rewrote the FolderStorage class.

bowuL avatar Mar 15 '22 07:03 bowuL

@bowuL thanks - if you could try the other branch that would be great, as it would help us understand whether the proposed fix is going to work.

matthewhegarty avatar Mar 15 '22 09:03 matthewhegarty

Release 3.0 (beta) is now available, so anyone who is hitting this issue is encouraged to test with v3.0-0-beta.

matthewhegarty avatar Apr 07 '22 12:04 matthewhegarty

Release 3.0 (beta) is now available, so anyone who is hitting this issue is encouraged to test with v3.0-0-beta.

I had a similar issue on django-import-export==2.8.0

'charmap' codec can't decode byte 0x8f in position 29

After upgrading to django-import-export==3.0.0b4, I no longer get this error.

Thanks

NwawelAIroume avatar Jun 25 '22 12:06 NwawelAIroume

Closing - this should be fixed after release v3 - please raise new issue if still occurring.

matthewhegarty avatar Apr 12 '23 14:04 matthewhegarty