NAME
utf8::all - turn on Unicode - all of it
VERSION
version 0.024
SYNOPSIS
use utf8::all; # Turn on UTF-8, all of it.
open my $in, '<', 'contains-utf8'; # UTF-8 already turned on here
print length 'føø bār'; # 7 UTF-8 characters
my $utf8_arg = shift @ARGV; # @ARGV is UTF-8 too (only for main)
DESCRIPTION
The use utf8 pragma tells the Perl parser to allow UTF-8 in the
program text in the current lexical scope. This also means that you
can now use literal Unicode characters as part of strings, variable
names, and regular expressions.
utf8::all goes further:
- charnames are imported so \N{...} sequences can be used to compile Unicode characters based on names.
- On Perl v5.11.0 or higher, use feature 'unicode_strings' is enabled.
- use feature fc and use feature unicode_eval are enabled on Perl 5.16.0 and higher.
- Filehandles are opened with UTF-8 encoding turned on by default (including STDIN, STDOUT, and STDERR when utf8::all is used from the main package), meaning that they automatically convert UTF-8 octets to characters and vice versa. If you don't want UTF-8 for a particular filehandle, call binmode $filehandle (with no layer argument this is equivalent to :raw).
- @ARGV gets converted from UTF-8 octets to Unicode characters (when utf8::all is used from the main package). This is similar to the behaviour of the -CA perl command-line switch (see perlrun).
- readdir, readlink, readpipe (including the qx// and backtick operators), and glob (including its <...> operator form, e.g. <*.txt>) now all work with and return Unicode characters instead of (UTF-8) octets (again only when utf8::all is used from the main package).
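To make the list above concrete, here is a rough manual approximation in plain Perl that uses only core pragmas and the core Encode module. It is a sketch that assumes Perl 5.16 or newer, not the module's actual implementation; the decoded_readdir helper is hypothetical and covers only readdir (readlink, readpipe, and glob would need the same treatment).

```perl
#!/usr/bin/perl
# Rough manual approximation of the effects listed above (assumes Perl 5.16+).
# This is NOT how utf8::all is implemented; it only uses core pragmas/modules.
use strict;
use warnings;
use utf8;                                   # source code is UTF-8
use charnames ':full';                      # \N{...} by character name
use feature qw(unicode_strings fc unicode_eval);
use open qw(:std :encoding(UTF-8));         # UTF-8 layers, incl. STDIN/STDOUT/STDERR
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Decode command-line arguments from UTF-8 octets, similar to perl -CA.
@ARGV = map { decode('UTF-8', $_, FB_CROAK | LEAVE_SRC) } @ARGV;

# Hypothetical helper: readdir that returns decoded names instead of octets.
sub decoded_readdir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory '$dir': $!";
    my @names = map { decode('UTF-8', $_, FB_CROAK | LEAVE_SRC) } readdir $dh;
    closedir $dh;
    return @names;
}

my $snowman = "\N{SNOWMAN}";                # one character, U+2603
print length($snowman), "\n";               # prints 1
print fc('ÆB') eq fc('æb') ? "fc ok\n" : "fc mismatch\n";
```

The point of the comparison is that utf8::all bundles all of these separate steps, and in the main package also applies them to the standard handles and filesystem functions, behind a single use line.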
Lexical Scope
The pragma is lexically scoped, so you can do the following if you had some reason to:
{
use utf8::all;
open my $out, '>', 'outfile';
my $utf8_str = 'føø bār';
print length $utf8_str, "\n"; # 7
print $out $utf8_str; # out as utf8
}
open my $in, '<', 'outfile'; # in as raw
my $text = do { local $/; <$in>};
print length $text, "\n"; # 10, not 7!
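The 10 in the example above is a byte count: the file was written through a UTF-8 layer but read back raw, so each of ø, ø, and ā comes back as two octets. A minimal sketch using only the core Encode module (no files and no utf8::all) reproduces the same arithmetic.

```perl
use strict;
use warnings;
use utf8;                                    # the literal below is decoded to characters
use Encode qw(encode decode);

my $chars  = 'føø bār';                      # 7 characters
my $octets = encode('UTF-8', $chars);        # the bytes that end up in 'outfile'

print length($chars),  "\n";                 # 7
print length($octets), "\n";                 # 10: ø and ā take two bytes each
print length(decode('UTF-8', $octets)), "\n";   # 7 again after decoding
```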
Instead of lexical scoping, you can also use no utf8::all to turn
off the effects.
Note that the effect on @ARGV and the STDIN, STDOUT, and
STDERR file handles is always global and cannot be undone!
Enabling/Disabling Global Features
As described above, the default behaviour of utf8::all is to
convert @ARGV, to open the STDIN, STDOUT, and STDERR
file handles with UTF-8 encoding, and to override the readdir,
readlink, and readpipe functions and the glob operator when
utf8::all is used from the main package.
If you want to disable these features even when utf8::all is used
from the main package, add the option NO-GLOBAL (or
LEXICAL-ONLY) to the use line. E.g.:
use utf8::all 'NO-GLOBAL';
If, on the other hand, you want to enable these global effects even when
utf8::all is used from a package other than main, use the
option GLOBAL on the use line:
use utf8::all 'GLOBAL';
UTF-8 Errors
utf8::all treats invalid code points (i.e., UTF-8 that does
not map to a valid Unicode character) as fatal errors.
For glob, readdir, and readlink, you can change this
behaviour by setting the attribute $utf8::all::UTF8_CHECK.
ATTRIBUTES
$utf8::all::UTF8_CHECK
By default utf8::all treats decoding errors as fatal (the default value
for this setting is Encode::FB_CROAK). If you want, you can change this by
setting $utf8::all::UTF8_CHECK. The value Encode::FB_WARN reports
decoding errors as warnings, and Encode::FB_DEFAULT silently replaces
malformed data with a substitution character instead of reporting it.
Please see Encode for details. Note: Encode::LEAVE_SRC is always enforced.
Important: this setting only controls the handling of decoding errors in glob,
readdir, and readlink.
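Since $utf8::all::UTF8_CHECK takes an Encode CHECK value (e.g. $utf8::all::UTF8_CHECK = Encode::FB_WARN;), it can help to see what the three values mentioned above do on their own. The sketch below calls Encode::decode directly on a made-up invalid byte string and does not load utf8::all. As in utf8::all, LEAVE_SRC is combined with each value so the input is not modified.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK FB_WARN FB_DEFAULT LEAVE_SRC);

# "caf\xE9" is Latin-1 text, not valid UTF-8: the byte \xE9 starts a
# multi-byte sequence that is never completed.
my $bad = "caf\xE9";

# FB_CROAK (utf8::all's default): decoding dies.
my $ok = eval { decode('UTF-8', $bad, FB_CROAK | LEAVE_SRC); 1 };
print $ok ? "decoded\n" : "FB_CROAK died: $@";

# FB_WARN: emits a warning and returns only the part decoded before the error.
my $partial = decode('UTF-8', $bad, FB_WARN | LEAVE_SRC);
print "FB_WARN returned '$partial'\n";

# FB_DEFAULT: the malformed byte is replaced by U+FFFD, with no error or warning.
my $replaced = decode('UTF-8', $bad, FB_DEFAULT | LEAVE_SRC);
printf "FB_DEFAULT returned code points %vX\n", $replaced;

# LEAVE_SRC keeps $bad intact; without it, Encode may modify the source string.
print $bad eq "caf\xE9" ? "source untouched\n" : "source modified\n";
```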
INTERACTION WITH AUTODIE
If you use autodie, which is a great idea, you need to use at least version 2.12, released on June 26, 2012. Otherwise, autodie obliterates the IO layers set by the open pragma. See RT #54777 and GH #7.
BUGS
Please report any bugs or feature requests on the bugtracker website.
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
COMPATIBILITY
The filesystems of DOS, Windows, and OS/2 do not (fully) support
UTF-8. The readlink and readdir functions and the glob operator
will therefore not be replaced on these systems.
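If your own code needs to know whether it is running on one of these systems, Perl reports the platform in $^O ('MSWin32' for Windows, 'dos' for DOS, 'os2' for OS/2). The helper below is a hypothetical sketch built on that variable; it is not taken from utf8::all, whose internal check may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: returns true unless $os (a $^O value) names one of the
# platforms on which, per the documentation, utf8::all leaves readdir,
# readlink, and glob unreplaced. The check inside utf8::all may differ.
sub fs_functions_wrapped {
    my ($os) = @_;
    return $os !~ /^(?:MSWin32|dos|os2)\z/;
}

printf "%s: %s\n", $^O,
    fs_functions_wrapped($^O) ? 'readdir/readlink/glob decoded'
                              : 'left as octets';
```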
SEE ALSO
- File::Find::utf8 for fully UTF-8 aware File::Find functions.
- Cwd::utf8 for fully UTF-8 aware Cwd functions.
AUTHORS
- Michael Schwern
- Mike Doherty
- Hayo Baan
COPYRIGHT AND LICENSE
This software is copyright (c) 2009 by Michael Schwern; he originated it.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
