vlfi
Opening file with wrong encoding.
Hi, I find that vlf cannot detect the encoding of a file and open it with the right encoding the way GNU Emacs's default find-file does.
I can open the same file with the right encoding using find-file.
How can I open the file with the right encoding?
Thanks!
Can you tell me which encoding find-file reports? After opening the file, this can be checked with:
M-x describe-current-coding-system
Maybe I would be able to reproduce it on an arbitrary file of my own and attempt some tweaks. Otherwise, it's a known issue that detecting the correct encoding when starting at a random part of a file is imperfect: #16
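To see the imperfect detection directly, here is a minimal sketch (not VLF's own code) that reports which coding system Emacs picks for an arbitrary byte range of a file. The function name `my/detected-coding` is hypothetical; it relies only on `insert-file-contents` and `last-coding-system-used`.

```elisp
;; Hypothetical helper for illustration; not part of VLF.
(defun my/detected-coding (file beg end)
  "Return the coding system Emacs picks when decoding bytes BEG..END of FILE."
  (with-temp-buffer
    ;; BEG and END are byte offsets into the file.
    (insert-file-contents file nil beg end)
    ;; `insert-file-contents' records the coding system it decoded with here.
    last-coding-system-used))
```

Calling it once with a range at the start of the file and once with a range somewhere in the middle (for example `(my/detected-coding "/path/to/file" 0 4096)` and then a later range of the same size) shows whether the two chunks are detected differently.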
Coding system for saving this buffer:
c -- chinese-gbk-dos (alias: gbk-dos cp936-dos windows-936-dos)
Default coding system (for new files):
U -- utf-8-unix (alias: mule-utf-8-unix)
Coding system for keyboard input:
U -- utf-8-unix (alias: mule-utf-8-unix)
Coding system for terminal output:
U -- utf-8-unix (alias: mule-utf-8-unix)
Coding system for inter-client cut and paste:
nil
Defaults for subprocess I/O:
decoding: U -- utf-8-unix (alias: mule-utf-8-unix)
encoding: U -- utf-8-unix (alias: mule-utf-8-unix)
Priority order for recognizing coding systems when reading files:
1. utf-8 (alias: mule-utf-8)
2. chinese-gbk (alias: gbk cp936 windows-936)
3. iso-2022-cn (alias: chinese-iso-7bit)
4. chinese-big5 (alias: big5 cn-big5 cp950)
5. chinese-iso-8bit (alias: cn-gb-2312 euc-china euc-cn cn-gb gb2312)
6. iso-2022-7bit
7. iso-2022-8bit-ss2
8. emacs-mule
9. raw-text
10. iso-2022-jp (alias: junet)
11. in-is13194-devanagari (alias: devanagari)
12. utf-8-auto
13. utf-8-with-signature
14. utf-16
15. utf-16be-with-signature (alias: utf-16-be)
16. utf-16le-with-signature (alias: utf-16-le)
17. utf-16be
18. utf-16le
19. japanese-shift-jis (alias: shift_jis sjis)
20. undecided
Other coding systems cannot be distinguished automatically
from these, and therefore cannot be recognized automatically
with the present coding system priorities.
Particular coding systems specified for certain file names:
OPERATION TARGET PATTERN CODING SYSTEM(s)
--------- -------------- ----------------
File I/O "\\.dz\\'" (no-conversion . no-conversion)
"\\.txz\\'" (no-conversion . no-conversion)
"\\.xz\\'" (no-conversion . no-conversion)
"\\.lzma\\'" (no-conversion . no-conversion)
"\\.lz\\'" (no-conversion . no-conversion)
"\\.g?z\\'" (no-conversion . no-conversion)
"\\.\\(?:tgz\\|svgz\\|sifz\\)\\'"
(no-conversion . no-conversion)
"\\.tbz2?\\'" (no-conversion . no-conversion)
"\\.bz2\\'" (no-conversion . no-conversion)
"\\.Z\\'" (no-conversion . no-conversion)
"\\.elc\\'" utf-8-emacs
"\\.el\\'" prefer-utf-8
"\\.utf\\(-8\\)?\\'" utf-8
"\\.xml\\'" xml-find-file-coding-system
"\\(\\`\\|/\\)loaddefs.el\\'"
(raw-text . raw-text-unix)
"\\.tar\\'" (no-conversion . no-conversion)
"\\.po[tx]?\\'\\|\\.po\\."
po-find-file-coding-system
"\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'"
latexenc-find-file-coding-system
"" (undecided)
Process I/O nothing specified
Network I/O nothing specified
I found that vlf can correctly open a file that I cut from the beginning of the large file, even though the large file itself cannot be opened correctly.
Thank you for the details! It seems in line with what I observed once upon a time with utf-16. The case back then was that there were some magic header bytes at the beginning of the file which specified the encoding. A batch inserted from anywhere other than the beginning doesn't carry this information, so the insert function is unable to detect the proper encoding.
Probably in such cases VLF has to keep track of the initially observed encoding and use it in case auto detection fails on other batches. I'll look deeper, probably this weekend, and hopefully come up with a solution this time. Keep your file around just in case ;-)
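The idea of remembering the initially observed encoding could look roughly like the sketch below. This is an illustration under assumptions, not the fix that went into VLF: the names `my/chunk-coding` and `my/insert-chunk` are hypothetical, and instead of trying to notice when auto detection fails, it simply reuses the coding system from the first chunk for every later chunk. It also ignores chunk boundaries that fall in the middle of a multibyte character.

```elisp
;; Hypothetical names for illustration; not part of VLF.
(defvar-local my/chunk-coding nil
  "Coding system remembered from the first chunk inserted in this buffer.")

(defun my/insert-chunk (file beg end)
  "Replace the buffer contents with bytes BEG..END of FILE.
The first call lets Emacs auto-detect the coding system and remembers
the result; later calls decode with the remembered coding system."
  (erase-buffer)
  ;; A nil `coding-system-for-read' means normal auto-detection.
  (let ((coding-system-for-read my/chunk-coding))
    (insert-file-contents file nil beg end))
  (unless my/chunk-coding
    (setq my/chunk-coding last-coding-system-used)))
```

If the first chunk happens to be plain ASCII, the remembered value may be an undecided variant, in which case later chunks are still auto-detected.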
Thank you for your work. I recall that one of the chapters of the Emacs or Elisp manual has some description of the magic header bytes of files in other encodings.
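For reference, the magic header bytes in the UTF cases are the byte order mark (BOM). A minimal sketch of checking for one by hand follows; the function name `my/bom-coding` is hypothetical, and only the three common BOMs are covered. GBK files carry no such marker, so a check like this returns nil for them.

```elisp
;; Hypothetical helper for illustration; not part of VLF or Emacs.
(defun my/bom-coding (file)
  "Return the coding system implied by a BOM at the start of FILE, or nil."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; work on raw bytes
    (insert-file-contents-literally file nil 0 3)
    (let ((head (buffer-string)))
      (cond
       ((string-prefix-p (unibyte-string #xEF #xBB #xBF) head)
        'utf-8-with-signature)
       ((string-prefix-p (unibyte-string #xFF #xFE) head)
        'utf-16le-with-signature)
       ((string-prefix-p (unibyte-string #xFE #xFF) head)
        'utf-16be-with-signature)))))
```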
I've just pushed something that fixes the issue with utf-16 (at least). Hopefully it will work in this case too.
Sorry for reopening. I had just opened the wrong file, and the file mentioned above still cannot be opened correctly.
The file is at http://vdisk.weibo.com/s/utbH7Zm3Y8yvm , in case you can access it and want to use it for testing.
To download it, please click on the [button image not shown] on this page, and then click on the [button image not shown] in the popup window.
Note that this page should not be opened on a mobile device. Check the URL after opening it: it should not have changed to http://vdisk.weibo.com/wap/s/utbH7Zm3Y8yvm .
If you cannot access the file and want it for testing, please @ me and I will upload it to Dropbox and send it to you.
Thanks a lot!
Got the file, thanks!
So the battle continues. I'll investigate in the coming days.