svn-scm

Try using a different encoding package

Open JohnstonCode opened this issue 4 years ago • 23 comments

I propose we test chardet for detecting character encoding. The new package will sit behind a settings option, svn.experimental.detect_encoding.

JohnstonCode avatar Feb 19 '20 16:02 JohnstonCode

@Yanpas @Blashaq @nihlet @michelegalante

Can you all update to 2.10.0, set svn.experimental.detect_encoding to true, and see if you have any encoding issues, please?

Can you also remove "files.encoding" and "svn.default.encoding" to make sure?

Let me know if you have any issues.

JohnstonCode avatar Feb 19 '20 16:02 JohnstonCode

@jacobweber

JohnstonCode avatar Feb 19 '20 16:02 JohnstonCode

Thanks for thinking of me. :) I'm still fairly convinced that my issue isn't related to character encoding, though. In fact, I've recently been seeing it with git as well, which makes me think it may actually be a vscode issue.

jacobweber avatar Feb 19 '20 17:02 jacobweber

This extension and git use the exact same encoding detection.

JohnstonCode avatar Feb 19 '20 17:02 JohnstonCode

Well, I'll give it a try.

jacobweber avatar Feb 19 '20 17:02 jacobweber

I have a file in cp1251 encoding. Whether or not I set detect_encoding, the diff lines appear if I set the cp1251 encoding in the editor (with UTF8 there is no diff, but a lot of unreadable symbols).

The diff in the item history view does not respect encodings.

Yanpas avatar Feb 20 '20 14:02 Yanpas

Can you send me said file?

JohnstonCode avatar Feb 20 '20 14:02 JohnstonCode

@Yanpas can you try 2.10.1?

JohnstonCode avatar Feb 20 '20 15:02 JohnstonCode

Here it is in base64:

IyEvYmluL2Jhc2gKCkRBVEFfRElSPSIkMSIKV0VCX0RJUj0iJDIiClJPT1RfRElSPSIkMyIKClNF
TEY9IiQoIHdoaWNoICIkMCIgKSIKQklOX0RJUj0iJCggZGlybmFtZSAiJFNFTEYiICkiCgpta2Rp
ciAtcCAkV0VCX0RJUgpjcCAtcmYgJEJJTl9ESVIvaW1hZ2VzICRXRUJfRElSLy4uCiMg5ODt7fvl
LCDo5yDq7vLu8Pv1IOPl7eXw6PDz5fLx/yDu8vfl8iwg7O7j8/Ig7/Do4+7k6PL88f8g5Ov/IO7y
6+Dk6ugg6CDv8Ogg8e7n5ODt6Ogg5O7v7uvt6PLl6/zt+/UKIyDu8vfl8u7iIOTr/yDx8uDw+/Ug
8eHu8O7qCmNwICREQVRBX0RJUi8qLmNzdiAkV0VCX0RJUgoKJEJJTl9ESVIvZ2VuX3BhZ2VfYnVp
bGQuc2ggJERBVEFfRElSICRXRUJfRElSCiMg8ePl7eXw6PDz5ewganVuaXQueG1sIOIg6u7w7eUs
IOIg6u7y7vDu7CDh8+Tl8iDo7fTu8Ozg9uj/IO4g8uXx8uD1LCD38u7h+yBqZW5raW5zIO/u5PXi
4PLo6yD98vMg6O307vDs4Pbo/gokQklOX0RJUi9nZW5feG1sX3Rlc3RzLnNoICREQVRBX0RJUiAk
Uk9PVF9ESVIK

With 2.10.1 I see blue lines with both 1251 and utf8 encodings.

Yanpas avatar Feb 20 '20 20:02 Yanpas

BTW I disabled vscode's encoding autodetection since it's very buggy and often treats utf8 files as cp1252.

https://github.com/microsoft/vscode/issues/85480 https://github.com/microsoft/vscode/issues/33720

Yanpas avatar Feb 20 '20 20:02 Yanpas

Thanks for that, @Yanpas. I will see if I can fix it today.

JohnstonCode avatar Feb 21 '20 08:02 JohnstonCode

jschardet guesses the correct encoding for your file. It is the combination of the default encoding being utf8 and auto-guess being disabled that prevents detection from running.

https://github.com/JohnstonCode/svn-scm/blob/master/src/svnRepository.ts#L313 https://github.com/JohnstonCode/svn-scm/blob/master/src/svnRepository.ts#L336

JohnstonCode avatar Feb 21 '20 09:02 JohnstonCode

Maybe an encoding detection priority list? https://github.com/microsoft/vscode/issues/85480#issuecomment-579880763

JohnstonCode avatar Feb 21 '20 10:02 JohnstonCode

Can you give 2.10.2 a try? This adds a new config option, svn.experimental.encoding_priority, which takes an array of encoding types and prioritises them based on order. For example, ["UTF-8", "GB18030", "windows-1251"] will prioritise UTF-8, then GB18030, etc. If there are no matches it will just return null and will use either svn.default.encoding or UTF-8.
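
The priority behaviour described above could look roughly like the sketch below. This is only an illustration, not the extension's actual code: the `EncodingCandidate` type and the `pickByPriority` and `resolveEncoding` functions are hypothetical names, and it assumes the detector reports a list of candidate encodings.

```typescript
// Hypothetical shape of one detection result; not the extension's real type.
interface EncodingCandidate {
  name: string;       // e.g. "UTF-8", "windows-1251"
  confidence: number; // score reported by the detector
}

// Return the first encoding in `priority` that the detector also reported,
// or null when none of the prioritised encodings was detected.
function pickByPriority(
  candidates: EncodingCandidate[],
  priority: string[]
): string | null {
  const detected = new Set(candidates.map(c => c.name.toLowerCase()));
  for (const wanted of priority) {
    if (detected.has(wanted.toLowerCase())) {
      return wanted;
    }
  }
  return null;
}

// Fall back to a configured default (assumed to mirror svn.default.encoding),
// then to UTF-8, when the priority list produces no match.
function resolveEncoding(
  candidates: EncodingCandidate[],
  priority: string[],
  defaultEncoding?: string
): string {
  const picked = pickByPriority(candidates, priority);
  if (picked !== null) {
    return picked;
  }
  return defaultEncoding !== undefined ? defaultEncoding : "UTF-8";
}
```

With this sketch, the order of the array decides the winner, not the detector's confidence score, which matches the "prioritise based on order" description.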

JohnstonCode avatar Feb 21 '20 11:02 JohnstonCode

This helps

Things would be much easier if vscode exposed the chosen encoding. Maybe request an API extension?

Yanpas avatar Feb 21 '20 11:02 Yanpas

Glad that is working for you. As per your linked issues, I think they need to decide how they will handle it internally before they do any API work.

JohnstonCode avatar Feb 21 '20 11:02 JohnstonCode

There still seems to be a problem: the function createTempSvnRevisionFile in temp_svn_fs.ts seems to break characters, because it converts the encoded buffer to a JS string before saving the file. JS strings hold Unicode text (UTF-16 internally) rather than the original bytes, so the file's encoding is lost in the conversion. Working on buffers in this case solves the problem.
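
A minimal Node sketch of the effect, assuming a file that contains cp1251 bytes. It does not reproduce createTempSvnRevisionFile; the temp directory and file names are made up for illustration.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Bytes of the Cyrillic word "тест" in windows-1251 (cp1251).
const original = Buffer.from([0xf2, 0xe5, 0xf1, 0xf2]);

// Illustrative temp location; not the path the extension uses.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "svn-enc-"));
const lossyPath = path.join(dir, "lossy.txt");
const exactPath = path.join(dir, "exact.txt");

// Lossy: decoding the bytes as UTF-8 turns each invalid byte into U+FFFD,
// and writing the string back re-encodes those replacement characters.
fs.writeFileSync(lossyPath, original.toString("utf8"));

// Exact: writing the Buffer stores the bytes unchanged.
fs.writeFileSync(exactPath, original);

const lossy = fs.readFileSync(lossyPath);
const exact = fs.readFileSync(exactPath);

console.log("string round trip preserved bytes:", lossy.equals(original));
console.log("buffer write preserved bytes:", exact.equals(original));
```

Only the buffer write keeps the file byte-for-byte identical, which is what a diff against the working copy needs.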

Blashaq avatar Feb 21 '20 12:02 Blashaq

I will create a pull request with this change for you to review.

Blashaq avatar Feb 21 '20 12:02 Blashaq

Thanks, @Blashaq.

JohnstonCode avatar Feb 21 '20 13:02 JohnstonCode

I'm still seeing the diffs, using the latest version of the plugin.

jacobweber avatar Mar 24 '20 18:03 jacobweber

Can you give 2.10.2 a try? This adds a new config option, svn.experimental.encoding_priority, which takes an array of encoding types and prioritises them based on order. For example, ["UTF-8", "GB18030", "windows-1251"] will prioritise UTF-8, then GB18030, etc. If there are no matches it will just return null and will use either svn.default.encoding or UTF-8.

Have you given this a try?

JohnstonCode avatar Mar 24 '20 18:03 JohnstonCode

No, I was just using `svn.experimental.detect_encoding: true`. I just added `"svn.experimental.encoding_priority": ["UTF-8"]` and reloaded the window. I'll keep an eye on it and see if it appears again.

jacobweber avatar Mar 24 '20 18:03 jacobweber

Still seeing it -- same symptoms (log shows a "svn info" with no "svn cat").

jacobweber avatar Apr 07 '20 18:04 jacobweber