django-import-export Suggestion: Encoding conversion

Sir, I ran into the following error in Django:

UnicodeDecodeError at /admin/core/book/import/
'utf8' codec can't decode byte 0xc4 in position 0: invalid continuation byte

Then I used Vim's `:set fileencoding` and found that CSV files exported by django-import-export are UTF-8. I searched online, and I also found that importing a UTF-8 file works fine.

Maybe the problem is caused by my OS being in Chinese. Would you please modify the project so that, if the import or export file is not UTF-8, it is first converted to UTF-8 before any further processing?
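A minimal sketch that reproduces the error above in plain Python 3, assuming the CSV was saved as GB2312 (a common default for files written on Simplified Chinese Windows). The two characters are only sample data, not taken from the original file.

```python
# "你好" encoded the way a GB2312/GBK program would save it.
raw = "你好".encode("gb2312")
print(raw)  # b'\xc4\xe3\xba\xc3'

# Decoding those bytes as UTF-8 fails on the very first byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte

# Decoding with the encoding the file was actually saved in works.
print(raw.decode("gb2312"))  # 你好
```

Python 2 spells the codec `'utf8'` in the message, which matches the error in the original report.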

I have written some code to do the encoding conversion:

```python
import chardet


def convert_encoding(from_encoding, to_encoding, old_filepath, target_file):
    # Read the raw bytes, decode them with the source encoding,
    # then re-encode the text with the target encoding.
    with open(old_filepath, 'rb') as f1:
        content = f1.read().decode(from_encoding)
    with open(target_file, 'wb') as f2:
        f2.write(content.encode(to_encoding))


# chardet works on raw bytes, so read the file in binary mode.
with open('1234.csv', 'rb') as convert_file:
    data = convert_file.read()

convert_encoding(chardet.detect(data)['encoding'], 'utf-8', '1234.csv', '1234_bak.csv')
```

I am a newbie, so my code is not very concise. Would you please consider this and integrate the conversion into the project? I really like this project; thanks for your contribution!

Mar 09 '14 07:03 hebijiandai

Have you checked the docs, specifically the settings for from_encoding and to_encoding?

Mar 09 '14 13:03 bmihelac

Yes, sir, I tested it in my code, but in my OS environment it still causes the error. Does from_encoding='utf-8' mean the import file's encoding must be UTF-8? I also changed the parameter to from_encoding='GB2312' to match my CSV file's encoding, and it still causes an error. When I convert the file to UTF-8, it runs fine.

Mar 09 '14 15:03 hebijiandai

Please see how data is encoded and decoded:

https://github.com/bmihelac/django-import-export/blob/master/import_export/admin.py

Mar 10 '14 15:03 bmihelac

I also have the same problem on a Chinese-language OS, even though the file content is encoded in UTF-8.

Could this be caused by the OS environment settings? To solve it, should the file be opened with open(..., encoding="...") rather than with just the file name and read mode?

My traceback:

```
Traceback:
File "D:\My Documents\Workspaces\xbcWeb\xbcWeb\env\lib\site-packages\django\core\handlers\base.py" in get_response
  response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "D:\My Documents\Workspaces\xbcWeb\xbcWeb\env\lib\site-packages\django\utils\decorators.py" in _wrapped_view
  response = view_func(request, *args, **kwargs)
File "D:\My Documents\Workspaces\xbcWeb\xbcWeb\env\lib\site-packages\django\views\decorators\cache.py" in _wrapped_view_func
  response = view_func(request, *args, **kwargs)
File "D:\My Documents\Workspaces\xbcWeb\xbcWeb\env\lib\site-packages\django\contrib\admin\sites.py" in inner
  return view(request, *args, **kwargs)
File "D:\My Documents\Workspaces\xbcWeb\xbcWeb\env\lib\site-packages\import_export\admin.py" in import_action
  data = uploaded_import_file.read()
```

Jan 08 '15 01:01 u8621011

I tried the following change and it fixed the exception. Please apply the patch if it's suitable; I am not familiar with GitHub.

In `process_import()`:

```python
import_file = open(import_file_name, input_format.get_read_mode(), encoding=self.from_encoding)
```

In `import_action()`:

```python
with open(uploaded_file.name, input_format.get_read_mode(), encoding=self.from_encoding) as uploaded_import_file:
```
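One caveat when trying the patch above: `open()` refuses an `encoding=` argument in binary mode, so the patch only works if `get_read_mode()` returns a text mode for the format in question (not checked here for each format). A minimal sketch of a guard, using a hypothetical helper name `open_import_file()` that is not part of django-import-export:

```python
import os
import tempfile


def open_import_file(path, read_mode, encoding):
    """Hypothetical helper: pass encoding= only when read_mode is a text mode."""
    if "b" in read_mode:
        # Binary mode returns raw bytes; decoding has to happen later.
        return open(path, read_mode)
    return open(path, read_mode, encoding=encoding)


path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", encoding="gb2312") as f:
    f.write("name\n你好\n")

with open_import_file(path, "r", "gb2312") as f:
    print(f.read().splitlines())  # ['name', '你好']

with open_import_file(path, "rb", "gb2312") as f:
    print(f.read()[:4])  # b'name'

# Without the guard, combining "rb" with encoding= raises immediately.
try:
    open(path, "rb", encoding="gb2312")
except ValueError as exc:
    print(exc)  # binary mode doesn't take an encoding argument
```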

Jan 08 '15 06:01 u8621011

I'm still having this problem with v1.1.0, Python 3, on Ubuntu 16.04 when trying to import a CSV file with Latin characters. It also gives the error even if you specify the from_encoding attribute:

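```python
from django.contrib import admin
from import_export.admin import ImportMixin
from import_export.formats.base_formats import CSV

# UsuariosTmpResource and Usuarios_temporales are defined elsewhere in the project.

class MyImportMixin(ImportMixin):
    formats = (CSV,)
    from_encoding = 'latin-1'

class UserTmpAdmin(MyImportMixin, admin.ModelAdmin):
    resource_class = UsuariosTmpResource

admin.site.register(Usuarios_temporales, UserTmpAdmin)
```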

I also tested tablib in the shell, and it works OK...

Any ideas about what could be wrong? :-( The strangest thing is that it works like a charm on Windows, but the same code base fails when deployed on Ubuntu 16.04.
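A small illustration of why this is puzzling: latin-1 maps every one of the 256 byte values to a character, so decoding with it can never raise `UnicodeDecodeError`. If the error persists with `from_encoding = 'latin-1'`, that hints (an inference, not a confirmed diagnosis) that the bytes are being decoded somewhere that does not use `from_encoding`. The flip side is that UTF-8 input decoded as latin-1 silently becomes mojibake:

```python
# latin-1 assigns a character to every byte value 0x00-0xFF,
# so this decode cannot fail, whatever the input bytes are.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
print(len(text))  # 256

# But UTF-8 bytes decoded as latin-1 turn into mojibake instead of an error.
utf8_bytes = "José".encode("utf-8")  # b'Jos\xc3\xa9'
print(utf8_bytes.decode("latin-1"))  # JosÃ©
print(utf8_bytes.decode("utf-8"))    # José
```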

Oct 08 '18 19:10 hypnotic-frog

I've hit similar problems, and there are several distinct issues with the current code:

1: Encoding is happening in the wrong place

... it should probably be happening via the standard Python open() with encoding=, rather than by fetching the data and calling force_text() on it. (And while this is being fixed: it may be better to default to the utf-8-sig encoding rather than plain utf-8, but only for reading, as it will detect and skip the BOM if there is one.)
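A minimal sketch of the utf-8-sig point, using only the standard library. It assumes a CSV that starts with a UTF-8 byte order mark (as Excel typically writes when saving "CSV UTF-8"); `io.TextIOWrapper` stands in for `open(..., encoding=...)` so the example needs no file on disk, and `read_header()` is a hypothetical helper name:

```python
import codecs
import csv
import io


def read_header(raw_bytes, encoding):
    """Decode raw_bytes as text (like open(..., encoding=...)) and return the CSV header."""
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline="")
    return next(csv.reader(text))


without_bom = "name,city\r\nZhang,Beijing\r\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

print(read_header(with_bom, "utf-8"))         # ['\ufeffname', 'city']
print(read_header(with_bom, "utf-8-sig"))     # ['name', 'city']
print(read_header(without_bom, "utf-8-sig"))  # ['name', 'city']
```

With plain utf-8 the BOM survives as `'\ufeff'` glued to the first column name, so a lookup for `name` would fail. utf-8-sig strips the BOM when present and otherwise behaves like utf-8; for writing, plain utf-8 avoids emitting a BOM, which is why the suggestion is limited to reading.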

2: force_text() exceptions aren't being caught

And if the code is going to use force_text(), it should catch the correct exception (DjangoUnicodeDecodeError, not UnicodeDecodeError).

3: Doesn't handle universal newline types

The code already knows which open() modes to use for each format, thanks to base_formats. Text formats should have the U (universal newline) flag added. (Update: the U flag is deprecated in open(), and Python 3.11 removed it. Use text mode instead, where newline=None, the default, gives universal newlines.)
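To illustrate the update in point 3: in text mode, `newline=None` (the default) already gives universal-newline behaviour, so the U flag is unnecessary. `decode_universal()` is a hypothetical helper, and `io.TextIOWrapper` again stands in for `open()`:

```python
import io


def decode_universal(raw_bytes, encoding="utf-8"):
    # newline=None: '\n', '\r\n' and '\r' are all translated to '\n' on read.
    with io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline=None) as text:
        return text.read()


samples = {
    "unix": b"a,b\n1,2\n",
    "windows": b"a,b\r\n1,2\r\n",
    "classic_mac": b"a,b\r1,2\r",
}
for name, raw in samples.items():
    print(name, repr(decode_universal(raw)))  # 'a,b\n1,2\n' for all three
```

If the text is then handed to the `csv` module, its documentation recommends opening with `newline=''` instead and letting the reader handle line endings itself.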

4: Should just raise the actual exception rather than doing an `HttpResponse` + `<h1>` simulation of an error

No wonder nothing was showing up in my error logs. forehead-slap
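A framework-agnostic sketch of point 4, assuming a hypothetical helper `decode_import_file()` (not part of django-import-export): log the failure with its traceback and re-raise a descriptive exception chained to the original `UnicodeDecodeError`, rather than rendering an ad-hoc `<h1>` page. How the admin view then presents the error (for example as a form error) is a separate decision.

```python
import logging

logger = logging.getLogger(__name__)


def decode_import_file(raw_bytes, encoding="utf-8-sig"):
    """Hypothetical helper: decode an uploaded file and let failures surface."""
    try:
        return raw_bytes.decode(encoding)
    except UnicodeDecodeError as exc:
        # logger.exception() records the traceback, so the failure reaches the logs.
        logger.exception("Import file could not be decoded as %s", encoding)
        bad_byte = exc.object[exc.start]
        raise ValueError(
            f"Imported file is not valid {encoding}: "
            f"byte {bad_byte:#x} at position {exc.start}"
        ) from exc


try:
    decode_import_file("你好".encode("gb2312"))
except ValueError as err:
    print(err)  # Imported file is not valid utf-8-sig: byte 0xc4 at position 0
    print(type(err.__cause__).__name__)  # UnicodeDecodeError
```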

Jan 22 '19 19:01 yozlet

@yozlet Great tip about utf-8-sig.

While such issues have almost always lacked a reproducible test case, I'm all for making the library more robust in handling different encodings.

Jan 23 '19 11:01 bmihelac

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Jun 06 '19 07:06 stale[bot]

Hi all, any updates on this? I'm currently experiencing similar issues.

Oct 03 '19 20:10 xiubinzheng

Me too

Imported file has a wrong encoding: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)

I can't get it parsed as UTF-8 in the deployment environment, although it works fine in development.

May 25 '20 15:05 GabrieleCalarota

Me too

Imported file has a wrong encoding: 'ascii' codec can't decode byte 0xc3 in position 2011: ordinal not in range(128)

It happens only in the production environment, with a JSON file. @bmihelac, why does this happen if the file is encoded in UTF-8? (Django 2.2)
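One plausible explanation for an `'ascii'` codec error that appears only in production (not confirmed anywhere in this thread): if some code path opens the file in text mode without `encoding=`, Python falls back to the server's locale encoding, which can be ASCII when the process runs under a bare C/POSIX locale (more common with older Python versions). A minimal sketch showing the fallback and the explicit alternative; the file name and contents are arbitrary:

```python
import locale
import os
import tempfile

# Approximately the encoding open() uses in text mode when encoding= is omitted.
# It depends on the OS locale, e.g. 'cp936' on Simplified Chinese Windows.
print(locale.getpreferredencoding(False))

path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"city": "Zürich"}')

# Risky: open(path) alone would decode with the locale encoding printed above.
# Safer: name the encoding explicitly, so every environment behaves the same.
with open(path, encoding="utf-8") as f:
    print(f.read())  # {"city": "Zürich"}
```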

Apr 07 '21 13:04 giuseppenovielli

> 4: Should just raise the actual exception rather than doing an `HttpResponse` + `<h1>` simulation of an error

Implemented in PR #1281 (although the error is presented back in the UI as a form error)

Jul 31 '21 16:07 matthewhegarty

I have created #1306 based on the suggestions made by @yozlet here. I cannot reproduce the import errors but it would be great if anyone who has commented previously in this thread can test the PR to see if it resolves their issues.

Aug 01 '21 10:08 matthewhegarty

I want to know whether this issue has been resolved. The file I uploaded contains Chinese characters, and I have the same problem as you...

Feb 20 '22 17:02 bowuL

@bowuL Please can you try this branch and let us know if the problem still exists?

Feb 20 '22 19:02 matthewhegarty

I had the same issue and now it's fixed. Thx.

Mar 05 '22 12:03 jairodri

@jairodri Thanks - was that using the new branch?

Mar 05 '22 12:03 matthewhegarty

Yes, I'm using it :)

Mar 05 '22 13:03 jairodri

@matthewhegarty Sorry for the late reply. I didn't use the other branch; I just changed the source code and rewrote the FolderStorage class.

Mar 15 '22 07:03 bowuL

@bowuL thanks - if you could try the other branch that would be great, as it would help us understand whether the proposed fix is going to work.

Mar 15 '22 09:03 matthewhegarty

Release 3.0 (beta) is now available, so anyone who is hitting this issue is encouraged to test with v3.0-0-beta.

Apr 07 '22 12:04 matthewhegarty

> Release 3.0 (beta) is now available, so anyone who is hitting this issue is encouraged to test with v3.0-0-beta.

I had a similar issue on django-import-export==2.8.0

'charmap' codec can't decode byte 0x8f in position 29

After upgrading to django-import-export==3.0.0b4, I no longer get this error.

Thanks

Jun 25 '22 12:06 NwawelAIroume

Closing - this should be fixed after release v3 - please raise new issue if still occurring.

Apr 12 '23 14:04 matthewhegarty
