If you read this file _as_is_, just ignore the funny characters you
see. It is written in the POD format (see pod/perlpod.pod) which is
specially designed to be readable as is.
=head1 NAME
perlvos - Perl for Stratus OpenVOS
=head1 SYNOPSIS
This file contains notes for building perl on the Stratus OpenVOS
operating system. Perl is a scripting or macro language that is
popular on many systems. See L<perlbook> for a number of good books
on Perl.
These are instructions for building Perl from source. This version of
Perl requires the dynamic linking support that is found in OpenVOS
Release 17.1 and thus is not supported on OpenVOS Release 17.0 or
earlier releases.
If you are running VOS Release 14.4.1 or later, you can obtain a
pre-compiled, supported copy of perl by purchasing the GNU Tools
product from Stratus Technologies.
=head1 BUILDING PERL FOR OPENVOS
To build perl from its source code on the Stratus V Series platform
you must have OpenVOS Release 17.1.0 or later, GNU Tools Release
3.5 or later, and the C/POSIX Runtime Libraries.
Follow the normal instructions for building perl; e.g., enter bash, run
the Configure script, then use "gmake" to build perl.
=head1 INSTALLING PERL IN OPENVOS
=over 4
=item 1
After you have built perl using the Configure script, ensure that you
have modify and default write permission to C<< >system>ported >> and
all subdirectories. Then type
gmake install
=item 2
While there are currently no architecture-specific extensions or
modules distributed with perl, the following directory can be
used to hold such files (replace the string VERSION by the
appropriate version number):
>system>ported>lib>perl5>VERSION>i786
=item 3
Site-specific perl extensions and modules can be installed in one of
two places. Put architecture-independent files into:
>system>ported>lib>perl5>site_perl>VERSION
Put site-specific architecture-dependent files into the
following directory:
>system>ported>lib>perl5>site_perl>VERSION>i786
=item 4
You can examine the @INC variable from within a perl program
to see the order in which Perl searches these directories.
=back
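As a minimal sketch of item 4, the following program prints each entry of
C<@INC> in the order perl searches it. The numbering in the output is only
for illustration, and the actual directories depend on how your perl was
configured.

```perl
use strict;
use warnings;

# Copy the module search path in the order perl consults it.
my @search_path = @INC;

# Print each entry with its position; earlier entries are searched first.
my $position = 0;
for my $dir (@search_path) {
    printf "%2d  %s\n", ++$position, $dir;
}
```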
=head1 USING PERL IN OPENVOS
=head2 Restrictions of Perl on OpenVOS
This port of Perl version 5 prefers Unix-style, slash-separated
pathnames over OpenVOS-style greater-than-separated pathnames.
OpenVOS-style pathnames should work in most contexts, but if you have
trouble, replace all greater-than characters by slash characters.
Because the slash character is used as a pathname delimiter, Perl
cannot process OpenVOS pathnames containing a slash character in a
directory or file name; these must be renamed.
This port of Perl also uses Unix-epoch date values internally.
As long as you are dealing with ASCII character string
representations of dates, this should not be an issue. The
supported epoch is January 1, 1980 to January 17, 2038.
See the file pod/perlport.pod for more information about the OpenVOS
port of Perl.
=head1 TEST STATUS
A number of the perl self-tests fail for various reasons; generally
these are minor and due to subtle differences between common
POSIX-based environments and the OpenVOS POSIX environment. Ensure
that you conduct sufficient testing of your code to guarantee that it
works properly in the OpenVOS environment.
=head1 SUPPORT STATUS
I'm offering this port "as is". You can ask me questions, but I
can't guarantee I'll be able to answer them. There are some
excellent books available on the Perl language; consult a book
seller.
If you want a supported version of perl for OpenVOS, purchase the
OpenVOS GNU Tools product from Stratus Technologies, along with a
support contract (or from anyone else who will sell you support).
=head1 AUTHOR
Paul Green (Paul.Green@stratus.com)
=head1 LAST UPDATE
February 28, 2013
=cut
If you read this file _as_is_, just ignore the funny characters you
see. It is written in the POD format (see pod/perlpod.pod) which is
specifically designed to be readable as is.
=head1 NAME
perllinux - Perl version 5 on Linux systems
=head1 DESCRIPTION
This document describes various features of Linux that will affect how Perl
version 5 (hereafter just Perl) is compiled and/or runs.
=head2 Experimental Support for Sun Studio Compilers for Linux OS
Sun Microsystems has released a port of their Sun Studio compilers for
Linux. As of November 2005, only an alpha version has been released.
Until a release of these compilers is made, support for compiling Perl with
these compilers is experimental.
Also, some special instructions are needed for building Perl with Sun Studio
on Linux.
Following the normal C<Configure>, you have to run make as follows:
LDLOADLIBS=-lc make
C<LDLOADLIBS> is an environment variable used by the linker to link
C</ext> modules to glibc. Currently, that environment variable is not getting
populated by a combination of C<Config> entries and C<ExtUtils::MakeMaker>.
While there may be a bug somewhere in Perl's configuration or
C<ExtUtils::MakeMaker> causing the problem, the most likely cause is an
incomplete understanding of Sun Studio by this author. Further investigation
is needed to get this working better.
=head1 AUTHOR
Steve Peters <steve@fisharerojo.org>
Please report any errors, updates, or suggestions to F<perlbug@perl.org>.
=for comment
This document is in Pod format. To read this, use a Pod formatter,
like "perldoc perlpod".
=head1 NAME
X<POD> X<plain old documentation>
perlpod - the Plain Old Documentation format
=head1 DESCRIPTION
Pod is a simple-to-use markup language used for writing documentation
for Perl, Perl programs, and Perl modules.
Translators are available for converting Pod to various formats
like plain text, HTML, man pages, and more.
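As a rough illustration of such a translator, the sketch below uses the core
C<Pod::Text> module (the engine behind B<pod2text>) to render a small Pod
string as plain text. The sample document and the C<width> setting are
assumptions made for the example.

```perl
use strict;
use warnings;
use Pod::Text;    # core module; the engine behind pod2text

# A tiny, made-up Pod document. It is built with join so that no line
# of this example begins with "=".
my $pod = join "\n\n",
    '=head1 NAME',
    'demo - a tiny example document',
    '=head1 DESCRIPTION',
    'An ordinary paragraph with I<italic> and C<code> text.',
    '=cut',
    '';

# Render the Pod into a scalar instead of printing it to STDOUT.
my $parser = Pod::Text->new(width => 60);
my $text   = '';
$parser->output_string(\$text);
$parser->parse_string_document($pod);

print $text;
```

Other translators, such as C<Pod::Man>, are driven in much the same way.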
Pod markup consists of three basic kinds of paragraphs:
L<ordinary|/"Ordinary Paragraph">,
L<verbatim|/"Verbatim Paragraph">, and
L<command|/"Command Paragraph">.
=head2 Ordinary Paragraph
X<POD, ordinary paragraph>
Most paragraphs in your documentation will be ordinary blocks
of text, like this one. You can simply type in your text without
any markup whatsoever, and with just a blank line before and
after. When it gets formatted, it will undergo minimal formatting,
like being rewrapped, probably put into a proportionally spaced
font, and maybe even justified.
You can use formatting codes in ordinary paragraphs, for B<bold>,
I<italic>, C<code-style>, L<hyperlinks|perlfaq>, and more. Such
codes are explained in the "L<Formatting Codes|/"Formatting Codes">"
section, below.
=head2 Verbatim Paragraph
X<POD, verbatim paragraph> X<verbatim>
Verbatim paragraphs are usually used for presenting a code block or
other text which does not require any special parsing or formatting,
and which shouldn't be wrapped.
A verbatim paragraph is distinguished by having its first character
be a space or a tab. (And commonly, all its lines begin with spaces
and/or tabs.) It should be reproduced exactly, with tabs assumed to
be on 8-column boundaries. There are no special formatting codes,
so you can't italicize or anything like that. A \ means \, and
nothing else.
=head2 Command Paragraph
X<POD, command>
A command paragraph is used for special treatment of whole chunks
of text, usually as headings or parts of lists.
All command paragraphs (which are typically only one line long) start
with "=", followed by an identifier, followed by arbitrary text that
the command can use however it pleases. Currently recognized commands
are
=pod
=head1 Heading Text
=head2 Heading Text
=head3 Heading Text
=head4 Heading Text
=over indentlevel
=item stuff
=back
=begin format
=end format
=for format text...
=encoding type
=cut
To explain them each in detail:
=over
=item C<=head1 I<Heading Text>>
X<=head1> X<=head2> X<=head3> X<=head4>
X<head1> X<head2> X<head3> X<head4>
=item C<=head2 I<Heading Text>>
=item C<=head3 I<Heading Text>>
=item C<=head4 I<Heading Text>>
Head1 through head4 produce headings, head1 being the highest
level. The text in the rest of this paragraph is the content of the
heading. For example:
=head2 Object Attributes
The text "Object Attributes" comprises the heading there.
The text in these heading commands can use formatting codes, as seen here:
=head2 Possible Values for C<$/>
Such codes are explained in the
"L<Formatting Codes|/"Formatting Codes">" section, below.
=item C<=over I<indentlevel>>
X<=over> X<=item> X<=back> X<over> X<item> X<back>
=item C<=item I<stuff...>>
=item C<=back>
Item, over, and back require a little more explanation: "=over" starts
a region specifically for the generation of a list using "=item"
commands, or for indenting (groups of) normal paragraphs. At the end
of your list, use "=back" to end it. The I<indentlevel> option to
"=over" indicates how far over to indent, generally in ems (where
one em is the width of an "M" in the document's base font) or roughly
comparable units; if there is no I<indentlevel> option, it defaults
to four. (And some formatters may just ignore whatever I<indentlevel>
you provide.) In the I<stuff> in C<=item I<stuff...>>, you may
use formatting codes, as seen here:
=item Using C<$|> to Control Buffering
Such codes are explained in the
"L<Formatting Codes|/"Formatting Codes">" section, below.
Note also that there are some basic rules to using "=over" ...
"=back" regions:
=over
=item *
Don't use "=item"s outside of an "=over" ... "=back" region.
=item *
The first thing after the "=over" command should be an "=item", unless
there aren't going to be any items at all in this "=over" ... "=back"
region.
=item *
Don't put "=headI<n>" commands inside an "=over" ... "=back" region.
=item *
And perhaps most importantly, keep the items consistent: either use
"=item *" for all of them, to produce bullets; or use "=item 1.",
"=item 2.", etc., to produce numbered lists; or use "=item foo",
"=item bar", etc.--namely, things that look nothing like bullets or
numbers.
If you start with bullets or numbers, stick with them, as
formatters use the first "=item" type to decide how to format the
list.
=back
=item C<=cut>
X<=cut> X<cut>
To end a Pod block, use a blank line,
then a line beginning with "=cut", and a blank
line after it. This lets Perl (and the Pod formatter) know that
this is where Perl code is resuming. (The blank line before the "=cut"
is not technically necessary, but many older Pod processors require it.)
=item C<=pod>
X<=pod> X<pod>
The "=pod" command by itself doesn't do much of anything, but it
signals to Perl (and Pod formatters) that a Pod block starts here. A
Pod block starts with I<any> command paragraph, so a "=pod" command is
usually used just when you want to start a Pod block with an ordinary
paragraph or a verbatim paragraph. For example:
=item stuff()
This function does stuff.
=cut
sub stuff {
...
}
=pod
Remember to check its return value, as in:
stuff() || die "Couldn't do stuff!";
=cut
=item C<=begin I<formatname>>
X<=begin> X<=end> X<=for> X<begin> X<end> X<for>
=item C<=end I<formatname>>
=item C<=for I<formatname> I<text...>>
For, begin, and end will let you have regions of text/code/data that
are not generally interpreted as normal Pod text, but are passed
directly to particular formatters, or are otherwise special. A
formatter that can use that format will use the region; otherwise it
will be completely ignored.
A command "=begin I<formatname>", some paragraphs, and a
command "=end I<formatname>", mean that the text/data in between
is meant for formatters that understand the special format
called I<formatname>. For example,
=begin html
<hr> <img src="thang.png">
<p> This is a raw HTML paragraph </p>
=end html
The command "=for I<formatname> I<text...>"
specifies that the remainder of just this paragraph (starting
right after I<formatname>) is in that special format.
=for html <hr> <img src="thang.png">
<p> This is a raw HTML paragraph </p>
This means the same thing as the above "=begin html" ... "=end html"
region.
That is, with "=for", you can have only one paragraph's worth
of text (i.e., the text in "=for targetname text..."), but with
"=begin targetname" ... "=end targetname", you can have any amount
of stuff in between. (Note that there still must be a blank line
after the "=begin" command and a blank line before the "=end"
command.)
Here are some examples of how to use these:
=begin html
<br>Figure 1.<br><IMG SRC="figure1.png"><br>
=end html
=begin text
---------------
| foo |
| bar |
---------------
^^^^ Figure 1. ^^^^
=end text
Some format names that formatters currently are known to accept
include "roff", "man", "latex", "tex", "text", and "html". (Some
formatters will treat some of these as synonyms.)
A format name of "comment" is common for just making notes (presumably
to yourself) that won't appear in any formatted version of the Pod
document:
=for comment
Make sure that all the available options are documented!
Some I<formatnames> will require a leading colon (as in
C<"=for :formatname">, or
C<"=begin :formatname" ... "=end :formatname">),
to signal that the text is not raw data, but instead I<is> Pod text
(i.e., possibly containing formatting codes) that's just not for
normal formatting (e.g., may not be a normal-use paragraph, but might
be for formatting as a footnote).
=item C<=encoding I<encodingname>>
X<=encoding> X<encoding>
This command is used for declaring the encoding of a document. Most
users won't need this; but if your encoding isn't US-ASCII,
then put a C<=encoding I<encodingname>> command very early in the document so
that pod formatters will know how to decode the document. For
I<encodingname>, use a name recognized by the L<Encode::Supported>
module. Some pod formatters may try to guess between a Latin-1 or
CP-1252 versus
UTF-8 encoding, but they may guess wrong. It's best to be explicit if
you use anything besides strict ASCII. Examples:
=encoding latin1
=encoding utf8
=encoding koi8-r
=encoding ShiftJIS
=encoding big5
C<=encoding> affects the whole document, and must occur only once.
=back
And don't forget, all commands but C<=encoding> last up
until the end of their I<paragraph>, not their line. So in the
examples below, you can see that every command needs the blank
line after it, to end its paragraph. (And some older Pod translators
may require the C<=encoding> line to have a following blank line as
well, even though it should be legal to omit.)
Some examples of lists include:
=over
=item *
First item
=item *
Second item
=back
=over
=item Foo()
Description of Foo function
=item Bar()
Description of Bar function
=back
=head2 Formatting Codes
X<POD, formatting code> X<formatting code>
X<POD, interior sequence> X<interior sequence>
In ordinary paragraphs and in some command paragraphs, various
formatting codes (a.k.a. "interior sequences") can be used:
=for comment
"interior sequences" is such an opaque term.
Prefer "formatting codes" instead.
=over
=item C<IE<lt>textE<gt>> -- italic text
X<I> X<< IZ<><> >> X<POD, formatting code, italic> X<italic>
Used for emphasis ("C<be IE<lt>careful!E<gt>>") and parameters
("C<redo IE<lt>LABELE<gt>>")
=item C<BE<lt>textE<gt>> -- bold text
X<B> X<< BZ<><> >> X<POD, formatting code, bold> X<bold>
Used for switches ("C<perl's BE<lt>-nE<gt> switch>"), programs
("C<some systems provide a BE<lt>chfnE<gt> for that>"),
emphasis ("C<be BE<lt>careful!E<gt>>"), and so on
("C<and that feature is known as BE<lt>autovivificationE<gt>>").
=item C<CE<lt>codeE<gt>> -- code text
X<C> X<< CZ<><> >> X<POD, formatting code, code> X<code>
Renders code in a typewriter font, or gives some other indication that
this represents program text ("C<CE<lt>gmtime($^T)E<gt>>") or some other
form of computerese ("C<CE<lt>drwxr-xr-xE<gt>>").
=item C<LE<lt>nameE<gt>> -- a hyperlink
X<L> X<< LZ<><> >> X<POD, formatting code, hyperlink> X<hyperlink>
There are various syntaxes, listed below. In the syntaxes given,
C<text>, C<name>, and C<section> cannot contain the characters
'/' and '|'; and any '<' or '>' should be matched.
=over
=item *
C<LE<lt>nameE<gt>>
Link to a Perl manual page (e.g., C<LE<lt>Net::PingE<gt>>). Note
that C<name> should not contain spaces. This syntax
is also occasionally used for references to Unix man pages, as in
C<LE<lt>crontab(5)E<gt>>.
=item *
C<LE<lt>name/"sec"E<gt>> or C<LE<lt>name/secE<gt>>
Link to a section in another manual page. E.g.,
C<LE<lt>perlsyn/"For Loops"E<gt>>
=item *
C<LE<lt>/"sec"E<gt>> or C<LE<lt>/secE<gt>>
Link to a section in this manual page. E.g.,
C<LE<lt>/"Object Methods"E<gt>>
=back
A section is started by the named heading or item. For
example, C<LE<lt>perlvar/$.E<gt>> or C<LE<lt>perlvar/"$."E<gt>> both
link to the section started by "C<=item $.>" in perlvar. And
C<LE<lt>perlsyn/For LoopsE<gt>> or C<LE<lt>perlsyn/"For Loops"E<gt>>
both link to the section started by "C<=head2 For Loops>"
in perlsyn.
To control what text is used for display, you
use "C<LE<lt>text|...E<gt>>", as in:
=over
=item *
C<LE<lt>text|nameE<gt>>
Link this text to that manual page. E.g.,
C<LE<lt>Perl Error Messages|perldiagE<gt>>
=item *
C<LE<lt>text|name/"sec"E<gt>> or C<LE<lt>text|name/secE<gt>>
Link this text to that section in that manual page. E.g.,
C<LE<lt>postfix "if"|perlsyn/"Statement Modifiers"E<gt>>
=item *
C<LE<lt>text|/"sec"E<gt>> or C<LE<lt>text|/secE<gt>>
or C<LE<lt>text|"sec"E<gt>>
Link this text to that section in this manual page. E.g.,
C<LE<lt>the various attributes|/"Member Data"E<gt>>
=back
Or you can link to a web page:
=over
=item *
C<LE<lt>scheme:...E<gt>>
C<LE<lt>text|scheme:...E<gt>>
Links to an absolute URL. For example, C<LE<lt>http://www.perl.org/E<gt>> or
C<LE<lt>The Perl Home Page|http://www.perl.org/E<gt>>.
=back
=item C<EE<lt>escapeE<gt>> -- a character escape
X<E> X<< EZ<><> >> X<POD, formatting code, escape> X<escape>
Very similar to HTML/XML C<&I<foo>;> "entity references":
=over
=item *
C<EE<lt>ltE<gt>> -- a literal E<lt> (less than)
=item *
C<EE<lt>gtE<gt>> -- a literal E<gt> (greater than)
=item *
C<EE<lt>verbarE<gt>> -- a literal | (I<ver>tical I<bar>)
=item *
C<EE<lt>solE<gt>> -- a literal / (I<sol>idus)
The above four are optional except in other formatting codes,
notably C<LE<lt>...E<gt>>, and when preceded by a
capital letter.
=item *
C<EE<lt>htmlnameE<gt>>
Some non-numeric HTML entity name, such as C<EE<lt>eacuteE<gt>>,
meaning the same thing as C<&eacute;> in HTML -- i.e., a lowercase
e with an acute (/-shaped) accent.
=item *
C<EE<lt>numberE<gt>>
The ASCII/Latin-1/Unicode character with that number. A
leading "0x" means that I<number> is hex, as in
C<EE<lt>0x201EE<gt>>. A leading "0" means that I<number> is octal,
as in C<EE<lt>075E<gt>>. Otherwise I<number> is interpreted as being
in decimal, as in C<EE<lt>181E<gt>>.
Note that older Pod formatters might not recognize octal or
hex numeric escapes, and that many formatters cannot reliably
render characters above 255. (Some formatters may even have
to use compromised renderings of Latin-1/CP-1252 characters, like
rendering C<EE<lt>eacuteE<gt>> as just a plain "e".)
=back
=item C<FE<lt>filenameE<gt>> -- used for filenames
X<F> X<< FZ<><> >> X<POD, formatting code, filename> X<filename>
Typically displayed in italics. Example: "C<FE<lt>.cshrcE<gt>>"
=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces
X<S> X<< SZ<><> >> X<POD, formatting code, non-breaking space>
X<non-breaking space>
This means that the words in I<text> should not be broken
across lines. Example: S<C<SE<lt>$x ? $y : $zE<gt>>>.
=item C<XE<lt>topic nameE<gt>> -- an index entry
X<X> X<< XZ<><> >> X<POD, formatting code, index entry> X<index entry>
This is ignored by most formatters, but some may use it for building
indexes. It always renders as empty-string.
Example: C<XE<lt>absolutizing relative URLsE<gt>>
=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code
X<Z> X<< ZZ<><> >> X<POD, formatting code, null> X<null>
This is rarely used. It's one way to get around using an
EE<lt>...E<gt> code sometimes. For example, instead of
"C<NEE<lt>ltE<gt>3>" (for "NE<lt>3") you could write
"C<NZE<lt>E<gt>E<lt>3>" (the "ZE<lt>E<gt>" breaks up the "N" and
the "E<lt>" so they can't be considered
part of a (fictitious) "NE<lt>...E<gt>" code).
=for comment
This was formerly explained as a "zero-width character". But in
most parser models, it parses to nothing at all, as opposed to parsing
as if it were a E<zwnj> or E<zwj>, which are REAL zero-width characters.
So "width" and "character" are exactly the wrong words.
=back
Most of the time, you will need only a single set of angle brackets to
delimit the beginning and end of formatting codes. However,
sometimes you will want to put a real right angle bracket (a
greater-than sign, '>') inside of a formatting code. This is particularly
common when using a formatting code to provide a different font-type for a
snippet of code. As with all things in Perl, there is more than
one way to do it. One way is to simply escape the closing bracket
using an C<E> code:
C<$a E<lt>=E<gt> $b>
This will produce: "C<$a E<lt>=E<gt> $b>"
A more readable, and perhaps more "plain" way is to use an alternate
set of delimiters that doesn't require a single ">" to be escaped.
Doubled angle brackets ("<<" and ">>") may be used I<if and only if there is
whitespace right after the opening delimiter and whitespace right
before the closing delimiter!> For example, the following will
do the trick:
X<POD, formatting code, escaping with multiple brackets>
C<< $a <=> $b >>
In fact, you can use as many repeated angle-brackets as you like so
long as you have the same number of them in the opening and closing
delimiters, and make sure that whitespace immediately follows the last
'<' of the opening delimiter, and immediately precedes the first '>'
of the closing delimiter. (The whitespace is ignored.) So the
following will also work:
X<POD, formatting code, escaping with multiple brackets>
C<<< $a <=> $b >>>
C<<<< $a <=> $b >>>>
And they all mean exactly the same as this:
C<$a E<lt>=E<gt> $b>
The multiple-bracket form does not affect the interpretation of the contents of
the formatting code, only how it must end. That means that the examples above
are also exactly the same as this:
C<< $a E<lt>=E<gt> $b >>
As a further example, this means that if you wanted to put these bits of
code in C<C> (code) style:
open(X, ">>thing.dat") || die $!
$foo->bar();
you could do it like so:
C<<< open(X, ">>thing.dat") || die $! >>>
C<< $foo->bar(); >>
which is presumably easier to read than the old way:
C<open(X, "E<gt>E<gt>thing.dat") || die $!>
C<$foo-E<gt>bar();>
This is currently supported by pod2text (Pod::Text), pod2man (Pod::Man),
and any other pod2xxx or Pod::Xxxx translators that use
Pod::Parser 1.093 or later, or Pod::Tree 1.02 or later.
=head2 The Intent
X<POD, intent of>
The intent is simplicity of use, not power of expression. Paragraphs
look like paragraphs (block format), so that they stand out
visually, and so that I could run them through C<fmt> easily to reformat
them (that's F7 in my version of B<vi>, or Esc Q in my version of
B<emacs>). I wanted the translator to always leave the C<'> and C<`> and
C<"> quotes alone, in verbatim mode, so I could slurp in a
working program, shift it over four spaces, and have it print out, er,
verbatim. And presumably in a monospace font.
The Pod format is not necessarily sufficient for writing a book. Pod
is just meant to be an idiot-proof common source for nroff, HTML,
TeX, and other markup languages, as used for online
documentation. Translators include B<pod2text>, B<pod2html>,
B<pod2man> (that's for nroff(1) and troff(1)), B<pod2latex>, and
B<pod2fm>. Various others are available in CPAN.
=head2 Embedding Pods in Perl Modules
X<POD, embedding>
You can embed Pod documentation in your Perl modules and scripts. Start
your documentation with an empty line, a "=head1" command at the
beginning, and end it with a "=cut" command and an empty line. The
B<perl> executable will ignore the Pod text. You can place a Pod
statement where B<perl> expects the beginning of a new statement, but
not within a statement, as that would result in an error. See any of
the supplied library modules for examples.
If you're going to put your Pod at the end of the file, and you're using
an C<__END__> or C<__DATA__> cut mark, make sure to put an empty line there
before the first Pod command.
 __END__

 =head1 NAME

 Time::Local - efficiently compute time from local and GMT time
Without that empty line before the "=head1", many translators wouldn't
have recognized the "=head1" as starting a Pod block.
=head2 Hints for Writing Pod
=over
=item *
X<podchecker> X<POD, validating>
The B<podchecker> command is provided for checking Pod syntax for errors
and warnings. For example, it checks for completely blank lines in
Pod blocks and for unknown commands and formatting codes. You should
still also pass your document through one or more translators and proofread
the result, or print out the result and proofread that. Some of the
problems found may be bugs in the translators, which you may or may not
wish to work around.
=item *
If you're more familiar with writing in HTML than with writing in Pod, you
can try your hand at writing documentation in simple HTML, and converting
it to Pod with the experimental L<Pod::HTML2Pod|Pod::HTML2Pod> module,
(available in CPAN), and looking at the resulting code. The experimental
L<Pod::PXML|Pod::PXML> module in CPAN might also be useful.
=item *
Many older Pod translators require the lines before every Pod
command and after every Pod command (including "=cut"!) to be a blank
line. Having something like this:
# - - - - - - - - - - - -
=item $firecracker->boom()
This noisily detonates the firecracker object.
=cut
sub boom {
...
...will make such Pod translators completely fail to see the Pod block
at all.
Instead, have it like this:
 # - - - - - - - - - - - -

 =item $firecracker->boom()

 This noisily detonates the firecracker object.

 =cut

 sub boom {
 ...
=item *
Some older Pod translators require paragraphs (including command
paragraphs like "=head2 Functions") to be separated by I<completely>
empty lines. If you have an apparently empty line with some spaces
on it, this might not count as a separator for those translators, and
that could cause odd formatting.
=item *
Older translators might add wording around an LE<lt>E<gt> link, so that
C<LE<lt>Foo::BarE<gt>> may become "the Foo::Bar manpage", for example.
So you shouldn't write things like C<the LE<lt>fooE<gt>
documentation>, if you want the translated document to read sensibly.
Instead, write C<the LE<lt>Foo::Bar|Foo::BarE<gt> documentation> or
C<LE<lt>the Foo::Bar documentation|Foo::BarE<gt>>, to control how the
link comes out.
=item *
Text that goes past the 70th column in a verbatim block might be ungracefully
wrapped by some formatters.
=back
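The B<podchecker> hint above can also be applied from inside a program. The
following is a sketch using the core C<Pod::Checker> module; the helper name
C<count_pod_errors> and both sample documents are hypothetical, and the exact
set of problems reported can vary between C<Pod::Checker> versions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Pod::Checker;    # core module; the engine behind podchecker

# Write a Pod string to a temporary file and return the number of
# errors Pod::Checker reports for it. (Hypothetical helper.)
sub count_pod_errors {
    my ($pod) = @_;
    my ($fh, $path) = tempfile(SUFFIX => '.pod', UNLINK => 1);
    print {$fh} $pod;
    close $fh or die "close failed: $!";

    # Send any report to a scalar so the example stays quiet.
    open my $report_fh, '>', \my $report or die "open failed: $!";
    my $checker = Pod::Checker->new(-quiet => 1);
    $checker->parse_from_file($path, $report_fh);
    return $checker->num_errors;
}

# The documents are built with join so that no line of this example
# begins with "=". The second one contains an unknown command.
my $good = join "\n\n", '=head1 NAME', 'demo - well-formed Pod', '=cut', '';
my $bad  = join "\n\n", '=head1 NAME', 'demo - broken Pod',
                        '=foo bar', '=cut', '';

my $good_errors = count_pod_errors($good);
my $bad_errors  = count_pod_errors($bad);
print "good: $good_errors error(s); bad: $bad_errors error(s)\n";
```

Running a translator over the same text and proofreading the result remains
worthwhile, as the first hint says.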
=head1 SEE ALSO
L<perlpodspec>, L<perlsyn/"PODs: Embedded Documentation">,
L<perlnewmod>, L<perldoc>, L<pod2html>, L<pod2man>, L<podchecker>.
=head1 AUTHOR
Larry Wall, Sean M. Burke
=cut
=head1 NAME
perl5004delta - what's new for perl5.004
=head1 DESCRIPTION
This document describes differences between the 5.003 release (as
documented in I<Programming Perl>, second edition--the Camel Book) and
this one.
=head1 Supported Environments
Perl5.004 builds out of the box on Unix, Plan 9, LynxOS, VMS, OS/2,
QNX, AmigaOS, and Windows NT. Perl runs on Windows 95 as well, but it
cannot be built there, for lack of a reasonable command interpreter.
=head1 Core Changes
Most importantly, many bugs were fixed, including several security
problems. See the F<Changes> file in the distribution for details.
=head2 List assignment to %ENV works
C<%ENV = ()> and C<%ENV = @list> now work as expected (except on VMS
where it generates a fatal error).
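A minimal sketch of both forms follows. It works on a localized copy of
C<%ENV> so the real environment is restored afterwards; the C<PATH> and
C<LANG> values are placeholders chosen for the example.

```perl
use strict;
use warnings;

my ($cleared_count, $path_seen, $lang_seen);
{
    # Localize so the caller's environment comes back on scope exit.
    local %ENV = %ENV;

    # Assigning the empty list clears the environment.
    %ENV = ();
    $cleared_count = keys %ENV;

    # Assigning a key/value list repopulates it.
    my @list = (PATH => '/bin:/usr/bin', LANG => 'C');
    %ENV = @list;
    ($path_seen, $lang_seen) = @ENV{qw(PATH LANG)};
}
print "cleared: $cleared_count entries; PATH=$path_seen LANG=$lang_seen\n";
```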
=head2 Change to "Can't locate Foo.pm in @INC" error
The error "Can't locate Foo.pm in @INC" now lists the contents of @INC
for easier debugging.
=head2 Compilation option: Binary compatibility with 5.003
There is a new Configure question that asks if you want to maintain
binary compatibility with Perl 5.003. If you choose binary
compatibility, you do not have to recompile your extensions, but you
might have symbol conflicts if you embed Perl in another application,
just as in the 5.003 release. By default, binary compatibility
is preserved at the expense of symbol table pollution.
=head2 $PERL5OPT environment variable
You may now put Perl options in the $PERL5OPT environment variable.
Unless Perl is running with taint checks, it will interpret this
variable as if its contents had appeared on a "#!perl" line at the
beginning of your script, except that hyphens are optional. PERL5OPT
may only be used to set the following switches: B<-[DIMUdmw]>.
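To see the effect, the sketch below sets C<PERL5OPT> for a child perl and
asks the child whether warnings (C<$^W>) are enabled. It assumes a Unix-like
system where the list form of a piped C<open> is available; the choice of
the B<-w> switch is only for illustration.

```perl
use strict;
use warnings;

# Run a child perl with PERL5OPT set and capture its $^W flag.
my $child_warn_flag;
{
    # Localize so the parent's own setting is restored afterwards.
    local $ENV{PERL5OPT} = '-w';
    open my $child, '-|', $^X, '-e', 'print $^W'
        or die "cannot run $^X: $!";
    $child_warn_flag = <$child>;
    close $child;
}
print "child \$^W = $child_warn_flag\n";
```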
=head2 Limitations on B<-M>, B<-m>, and B<-T> options
The C<-M> and C<-m> options are no longer allowed on the C<#!> line of
a script. If a script needs a module, it should invoke it with the
C<use> pragma.
The B<-T> option is also forbidden on the C<#!> line of a script,
unless it was present on the Perl command line. Due to the way C<#!>
works, this usually means that B<-T> must be in the first argument.
Thus:
#!/usr/bin/perl -T -w
will probably work for an executable script invoked as C<scriptname>,
while:
#!/usr/bin/perl -w -T
will probably fail under the same conditions. (Non-Unix systems will
probably not follow this rule.) But C<perl scriptname> is guaranteed
to fail, since then there is no chance of B<-T> being found on the
command line before it is found on the C<#!> line.
=head2 More precise warnings
If you removed the B<-w> option from your Perl 5.003 scripts because it
made Perl too verbose, we recommend that you try putting it back when
you upgrade to Perl 5.004. Each new perl version tends to remove some
undesirable warnings, while adding new warnings that may catch bugs in
your scripts.
=head2 Deprecated: Inherited C<AUTOLOAD> for non-methods
Before Perl 5.004, C<AUTOLOAD> functions were looked up as methods
(using the C<@ISA> hierarchy), even when the function to be autoloaded
was called as a plain function (e.g. C<Foo::bar()>), not a method
(e.g. C<< Foo->bar() >> or C<< $obj->bar() >>).
Perl 5.005 will use method lookup only for methods' C<AUTOLOAD>s.
However, there is a significant base of existing code that may be using
the old behavior. So, as an interim step, Perl 5.004 issues an optional
warning when a non-method uses an inherited C<AUTOLOAD>.
The simple rule is: Inheritance will not work when autoloading
non-methods. The simple fix for old code is: In any module that used to
depend on inheriting C<AUTOLOAD> for non-methods from a base class named
C<BaseClass>, execute C<*AUTOLOAD = \&BaseClass::AUTOLOAD> during startup.
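The fix can be sketched as follows. The package name C<Derived>, the function
C<greet>, and the body of the C<AUTOLOAD> routine are all hypothetical; only
the glob assignment comes from the rule above.

```perl
use strict;
use warnings;

package BaseClass;
our $AUTOLOAD;

# Catch-all that reports which function was requested.
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;    # strip the package qualifier
    return "autoloaded $name(@_)";
}

package Derived;
our @ISA = ('BaseClass');

# Inheritance alone does not cover plain function calls, so alias the
# base class's AUTOLOAD into this package explicitly at startup.
*AUTOLOAD = \&BaseClass::AUTOLOAD;

package main;

# A plain function call (not a method call) to an undefined function.
my $result = Derived::greet('world');
print "$result\n";
```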
=head2 Previously deprecated %OVERLOAD is no longer usable
Using %OVERLOAD to define overloading was deprecated in 5.003.
Overloading is now defined using the overload pragma. %OVERLOAD is
still used internally but should not be used by Perl scripts. See
L<overload> for more details.
=head2 Subroutine arguments created only when they're modified
In Perl 5.004, nonexistent array and hash elements used as subroutine
parameters are brought into existence only if they are actually
assigned to (via C<@_>).
Earlier versions of Perl vary in their handling of such arguments.
Perl versions 5.002 and 5.003 always brought them into existence.
Perl versions 5.000 and 5.001 brought them into existence only if
they were not the first argument (which was almost certainly a bug).
Earlier versions of Perl never brought them into existence.
For example, given this code:
undef @a; undef %a;
sub show { print $_[0] };
sub change { $_[0]++ };
show($a[2]);
change($a{b});
After this code executes in Perl 5.004, $a{b} exists but $a[2] does
not. In Perl 5.002 and 5.003, both $a{b} and $a[2] would have existed
(but $a[2]'s value would have been undefined).
=head2 Group vector changeable with C<$)>
The C<$)> special variable has always (well, in Perl 5, at least)
reflected not only the current effective group, but also the group list
as returned by the C<getgroups()> C function (if there is one).
However, until this release, there has not been a way to call the
C<setgroups()> C function from Perl.
In Perl 5.004, assigning to C<$)> is exactly symmetrical with examining
it: The first number in its string value is used as the effective gid;
if there are any numbers after the first one, they are passed to the
C<setgroups()> C function (if there is one).
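As a rough illustration of the string layout described above, the following sketch only reads C<$)>. It assumes a Unix-like system; the assignment form is shown in a comment because changing groups normally requires privileges.

```perl
use strict;
use warnings;

# $) stringifies as "EGID GID1 GID2 ...": the effective gid followed by
# the group list reported by getgroups(), if the system has one.
my ($egid, @groups) = split ' ', $);

print "effective gid: $egid\n";
print "group list:    @groups\n";

# Assignment uses the same layout, for example:
#   $) = "$egid $egid";   # effective gid, then a one-element group list
```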
=head2 Fixed parsing of $$<digit>, &$<digit>, etc.
Perl versions before 5.004 misinterpreted any type marker followed by
"$" and a digit. For example, "$$0" was incorrectly taken to mean
"${$}0" instead of "${$0}". This bug is (mostly) fixed in Perl 5.004.
However, the developers of Perl 5.004 could not fix this bug completely,
because at least two widely-used modules depend on the old meaning of
"$$0" in a string. So Perl 5.004 still interprets "$$<digit>" in the
old (broken) way inside strings, but it generates a warning when it
does so. In Perl 5.005, this special treatment will cease.
=head2 Fixed localization of $<digit>, $&, etc.
Perl versions before 5.004 did not always properly localize the
regex-related special variables. Perl 5.004 does localize them, as
the documentation has always said it should. This may result in $1,
$2, etc. no longer being set where existing programs use them.
=head2 No resetting of $. on implicit close
The documentation for Perl 5.0 has always stated that C<$.> is I<not>
reset when an already-open file handle is reopened with no intervening
call to C<close>. Due to a bug, perl versions 5.000 through 5.003
I<did> reset C<$.> under that circumstance; Perl 5.004 does not.
=head2 C<wantarray> may return undef
The C<wantarray> operator returns true if a subroutine is expected to
return a list, and false otherwise. In Perl 5.004, C<wantarray> can
also return the undefined value if a subroutine's return value will
not be used at all, which allows subroutines to avoid a time-consuming
calculation of a return value if it isn't going to be used.
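A small sketch of the three possible results follows. The subroutine name C<report_context> and the C<@seen> array are made up for the example.

```perl
use strict;
use warnings;

my @seen;

# Records the context in which the subroutine was called.
sub report_context {
    my $want = wantarray;
    push @seen, !defined $want ? 'void'
              : $want          ? 'list'
              :                  'scalar';
    return;
}

my @list   = report_context();    # list context
my $scalar = report_context();    # scalar context
report_context();                 # void context: wantarray returns undef

print "@seen\n";
```

A subroutine that computes an expensive return value can test C<defined wantarray> first and skip the work in void context.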
=head2 C<eval EXPR> determines value of EXPR in scalar context
Perl (version 5) used to determine the value of EXPR inconsistently,
sometimes incorrectly using the surrounding context for the determination.
Now, the value of EXPR (before being parsed by eval) is always determined in
a scalar context. Once parsed, it is executed as before, by providing
the context that the scope surrounding the eval provided. This change
makes the behavior Perl4 compatible, besides fixing bugs resulting from
the inconsistent behavior. This program:
@a = qw(time now is time);
print eval @a;
print '|', scalar eval @a;
used to print something like "timenowis881399109|4", but now (and in perl4)
prints "4|4".
=head2 Changes to tainting checks
Because of a bug, previous versions may have failed to detect some insecure
conditions when taint checks are turned on. (Taint checks are used
in setuid or setgid scripts, or when explicitly turned on with the
C<-T> invocation option.) Although it's unlikely, this may cause a
previously-working script to now fail, which should be construed
as a blessing since that indicates a potentially-serious security
hole was just plugged.
The new restrictions when tainting include:
=over 4
=item No glob() or <*>
These operators may spawn the C shell (csh), which cannot be made
safe. This restriction will be lifted in a future version of Perl
when globbing is implemented without the use of an external program.
=item No spawning if tainted $CDPATH, $ENV, $BASH_ENV
These environment variables may alter the behavior of spawned programs
(especially shells) in ways that subvert security. So now they are
treated as dangerous, in the manner of $IFS and $PATH.
=item No spawning if tainted $TERM doesn't look like a terminal name
Some termcap libraries do unsafe things with $TERM. However, it would be
unnecessarily harsh to treat all $TERM values as unsafe, since only shell
metacharacters can cause trouble in $TERM. So a tainted $TERM is
considered to be safe if it contains only alphanumerics, underscores,
dashes, and colons, and unsafe if it contains other characters (including
whitespace).
=back
=head2 New Opcode module and revised Safe module
A new Opcode module supports the creation, manipulation and
application of opcode masks. The revised Safe module has a new API
and is implemented using the new Opcode module. Please read the new
Opcode and Safe documentation.
=head2 Embedding improvements
In older versions of Perl it was not possible to create more than one
Perl interpreter instance inside a single process without leaking like a
sieve and/or crashing. The bugs that caused this behavior have all been
fixed. However, you still must take care when embedding Perl in a C
program. See the updated perlembed manpage for tips on how to manage
your interpreters.
=head2 Internal change: FileHandle class based on IO::* classes
File handles are now stored internally as type IO::Handle. The
FileHandle module is still supported for backwards compatibility, but
it is now merely a front end to the IO::* modules, specifically
IO::Handle, IO::Seekable, and IO::File. We suggest, but do not
require, that you use the IO::* modules in new code.
In harmony with this change, C<*GLOB{FILEHANDLE}> is now just a
backward-compatible synonym for C<*GLOB{IO}>.
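For new code, the suggestion above might look like this sketch, which uses IO::File on an anonymous temporary file. The contents written are arbitrary.

```perl
use strict;
use warnings;
use IO::File;
use Fcntl qw(SEEK_SET);

# new_tmpfile() returns a handle opened for reading and writing.
my $fh = IO::File->new_tmpfile
    or die "cannot create temporary file: $!";

$fh->print("first line\n", "second line\n");

# Rewind and read the first line back through the object interface.
$fh->seek(0, SEEK_SET) or die "seek failed: $!";
my $line = $fh->getline;
$fh->close;

print $line;
```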
=head2 Internal change: PerlIO abstraction interface
It is now possible to build Perl with AT&T's sfio IO package
instead of stdio. See L<perlapio> for more details, and
the F<INSTALL> file for how to use it.
=head2 New and changed syntax
=over 4
=item $coderef->(PARAMS)
A subroutine reference may now be suffixed with an arrow and a
(possibly empty) parameter list. This syntax denotes a call of the
referenced subroutine, with the given parameters (if any).
This new syntax follows the pattern of S<C<< $hashref->{FOO} >>> and
S<C<< $aryref->[$foo] >>>: You may now write S<C<&$subref($foo)>> as
S<C<< $subref->($foo) >>>. All these arrow terms may be chained;
thus, S<C<< &{$table->{FOO}}($bar) >>> may now be written
S<C<< $table->{FOO}->($bar) >>>.
=back
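The equivalences above can be checked with a short sketch; C<$subref> and C<$table> are illustrative names.

```perl
use strict;
use warnings;

my $subref = sub { return 2 * $_[0] };
my $table  = { FOO => $subref };

my $old_style   = &$subref(21);               # pre-5.004 spelling
my $new_style   = $subref->(21);              # arrow call
my $old_chained = &{ $table->{FOO} }(21);     # pre-5.004 spelling
my $new_chained = $table->{FOO}->(21);        # chained arrows

print "$old_style $new_style $old_chained $new_chained\n";
```

All four calls invoke the same subroutine with the same argument.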
=head2 New and changed builtin constants
=over 4
=item __PACKAGE__
The current package name at compile time, or the undefined value if
there is no current package (due to a C<package;> directive). Like
C<__FILE__> and C<__LINE__>, C<__PACKAGE__> does I<not> interpolate
into strings.
=back
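A brief sketch of both points, using a hypothetical package C<My::Logger>:

```perl
use strict;
use warnings;

package My::Logger;

# __PACKAGE__ is replaced by the package name when this code is compiled.
sub tag { return '[' . __PACKAGE__ . ']' }

package main;

my $tag     = My::Logger::tag();
my $literal = "__PACKAGE__";    # not interpolated: stays as written

print "$tag $literal\n";
```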
=head2 New and changed builtin variables
=over 4
=item $^E
Extended error message on some platforms. (Also known as
$EXTENDED_OS_ERROR if you C<use English>).
=item $^H
The current set of syntax checks enabled by C<use strict>. See the
documentation of C<strict> for more details. Not actually new, but
newly documented.
Because it is intended for internal use by Perl core components,
there is no C<use English> long name for this variable.
=item $^M
By default, running out of memory is not trappable. However, if
compiled for this, Perl may use the contents of C<$^M> as an emergency
pool after die()ing with an out-of-memory error. Suppose that your Perl
were compiled with -DPERL_EMERGENCY_SBRK and used Perl's malloc. Then

    $^M = 'a' x (1<<16);

would allocate a 64K buffer for use in an emergency.
See the F<INSTALL> file for information on how to enable this option.
As a disincentive to casual use of this advanced feature,
there is no C<use English> long name for this variable.
=back
=head2 New and changed builtin functions
=over 4
=item delete on slices
This now works. (e.g. C<delete @ENV{'PATH', 'MANPATH'}>)
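A minimal sketch with an ordinary hash (the keys are arbitrary):

```perl
use strict;
use warnings;

my %config = (host => 'localhost', port => 8080, debug => 1);

# Deleting a slice removes every listed key and returns the old values.
my @removed = delete @config{qw(port debug)};

print "removed: @removed\n";
print "left:    ", join(',', sort keys %config), "\n";
```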
=item flock
is now supported on more platforms, prefers fcntl to lockf when
emulating, and always flushes before (un)locking.
=item printf and sprintf
Perl now implements these functions itself; it doesn't use the C
library function sprintf() any more, except for floating-point
numbers, and even then only known flags are allowed. As a result, it
is now possible to know which conversions and flags will work, and
what they will do.
The new conversions in Perl's sprintf() are:
    %i   a synonym for %d
    %p   a pointer (the address of the Perl value, in hexadecimal)
    %n   special: *stores* the number of characters output so far
         into the next variable in the parameter list

The new flags that go between the C<%> and the conversion are:

    #    prefix octal with "0", hex with "0x"
    h    interpret integer as C type "short" or "unsigned short"
    V    interpret integer as Perl's standard integer type
Also, where a number would appear in the flags, an asterisk ("*") may
be used instead, in which case Perl uses the next item in the
parameter list as the given number (that is, as the field width or
precision). If a field width obtained through "*" is negative, it has
the same effect as the '-' flag: left-justification.
See L<perlfunc/sprintf> for a complete list of conversions and flags.
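Some of the conversions and flags described above, in a short sketch:

```perl
use strict;
use warnings;

my @formatted = (
    sprintf('%i',    42),          # %i behaves like %d
    sprintf('%#o',   8),           # '#' prefixes octal with "0"
    sprintf('%#x',   255),         # '#' prefixes hex with "0x"
    sprintf('[%*d]', 6, 42),       # field width taken from the argument list
    sprintf('[%*d]', -6, 42),      # negative width means left-justify
    sprintf('%.*f',  2, 3.14159),  # precision taken from the argument list
);

print "$_\n" for @formatted;
```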
=item keys as an lvalue
As an lvalue, C<keys> allows you to increase the number of hash buckets
allocated for the given hash. This can gain you a measure of efficiency if
you know the hash is going to get big. (This is similar to pre-extending
an array by assigning a larger number to $#array.) If you say
keys %hash = 200;
then C<%hash> will have at least 200 buckets allocated for it. These
buckets will be retained even if you do C<%hash = ()>; use C<undef
%hash> if you want to free the storage while C<%hash> is still in scope.
You can't shrink the number of buckets allocated for the hash using
C<keys> in this way (but you needn't worry about doing this by accident,
as trying has no effect).
=item my() in Control Structures
You can now use my() (with or without the parentheses) in the control
expressions of control structures such as:
    while (defined(my $line = <>)) {
        $line = lc $line;
    } continue {
        print $line;
    }

    if ((my $answer = <STDIN>) =~ /^y(es)?$/i) {
        user_agrees();
    } elsif ($answer =~ /^n(o)?$/i) {
        user_disagrees();
    } else {
        chomp $answer;
        die "`$answer' is neither `yes' nor `no'";
    }
Also, you can declare a foreach loop control variable as lexical by
preceding it with the word "my". For example, in:
foreach my $i (1, 2, 3) {
some_function();
}
$i is a lexical variable, and the scope of $i extends to the end of
the loop, but not beyond it.
Note that you still cannot use my() on global punctuation variables
such as $_ and the like.
=item pack() and unpack()
A new format 'w' represents a BER compressed integer (as defined in
ASN.1). Its format is a sequence of one or more bytes, each of which
provides seven bits of the total value, with the most significant
first. Bit eight of each byte is set, except for the last byte, in
which bit eight is clear.
If 'p' or 'P' are given undef as values, they now generate a NULL
pointer.
Both pack() and unpack() now fail when their templates contain invalid
types. (Invalid types used to be ignored.)
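The byte layout of the 'w' format can be inspected with a sketch like the one below. The value 300 is 2*128 + 44, so it needs two bytes: 0x82 (high bit set, payload 2) followed by 0x2c (high bit clear, payload 44).

```perl
use strict;
use warnings;

my $encoded   = pack('w', 300);
my $hex       = unpack('H*', $encoded);           # hex dump of the bytes
my ($decoded) = unpack('w', $encoded);            # round trip
my $small     = unpack('H*', pack('w', 127));     # fits in a single byte

print "$hex $decoded $small\n";
```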
=item sysseek()
The new sysseek() operator is a variant of seek() that sets and gets the
file's system read/write position, using the lseek(2) system call. It is
the only reliable way to seek before using sysread() or syswrite(). Its
return value is the new position, or the undefined value on failure.
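A sketch of sysseek() used together with syswrite() and sysread(). The scratch file comes from File::Temp, which is only a convenience for this example.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Scratch file for the example; tempfile() opens it for reading and writing.
my ($fh, $path) = tempfile(UNLINK => 1);

defined(syswrite($fh, "hello world")) or die "syswrite failed: $!";

# Move the system file position to byte 6 and keep the returned offset.
my $pos = sysseek($fh, 6, SEEK_SET);
defined $pos or die "sysseek failed: $!";

my $got = sysread($fh, my $buf, 5);
defined $got or die "sysread failed: $!";

print "position $pos, read '$buf'\n";
```

Test the result with C<defined>, not for truth alone, when you need to tell failure apart from a position of zero.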
=item use VERSION
If the first argument to C<use> is a number, it is treated as a version
number instead of a module name. If the version of the Perl interpreter
is less than VERSION, then an error message is printed and Perl exits
immediately. Because C<use> occurs at compile time, this check happens
immediately during the compilation process, unlike C<require VERSION>,
which waits until runtime for the check. This is often useful if you
need to check the current Perl version before C<use>ing library modules
which have changed in incompatible ways from older versions of Perl.
(We try not to do this more than we have to.)
=item use Module VERSION LIST
If the VERSION argument is present between Module and LIST, then the
C<use> will call the VERSION method in class Module with the given
version as an argument. The default VERSION method, inherited from
the UNIVERSAL class, croaks if the given version is larger than the
value of the variable $Module::VERSION. (Note that there is not a
comma after VERSION!)
This version-checking mechanism is similar to the one currently used
in the Exporter module, but it is faster and can be used with modules
that don't use the Exporter. It is the recommended method for new
code.
=item prototype(FUNCTION)
Returns the prototype of a function as a string (or C<undef> if the
function has no prototype). FUNCTION is a reference to or the name of the
function whose prototype you want to retrieve.
(Not actually new; just never documented before.)
=item srand
The default seed for C<srand>, which used to be C<time>, has been changed.
Now it's a heady mix of difficult-to-predict system-dependent values,
which should be sufficient for most everyday purposes.
Previous to version 5.004, calling C<rand> without first calling C<srand>
would yield the same sequence of random numbers on most or all machines.
Now, when perl sees that you're calling C<rand> and haven't yet called
C<srand>, it calls C<srand> with the default seed. You should still call
C<srand> manually if your code might ever be run on a pre-5.004 system,
of course, or if you want a seed other than the default.
=item $_ as Default
Functions documented in the Camel to default to $_ now in
fact do, and all those that do are so documented in L<perlfunc>.
=item C<m//gc> does not reset search position on failure
The C<m//g> match iteration construct has always reset its target
string's search position (which is visible through the C<pos> operator)
when a match fails; as a result, the next C<m//g> match after a failure
starts again at the beginning of the string. With Perl 5.004, this
reset may be disabled by adding the "c" (for "continue") modifier,
i.e. C<m//gc>. This feature, in conjunction with the C<\G> zero-width
assertion, makes it possible to chain matches together. See L<perlop>
and L<perlre>.
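As an illustration of chaining matches, here is a very small tokenizer sketch. The token names and the input string are invented for the example.

```perl
use strict;
use warnings;

my $input = "foo=42";
my @tokens;

# Each alternative is anchored with \G at the current search position.
# Thanks to /c, a failed alternative leaves pos($input) untouched, so
# the next alternative is tried at the same place.
while (1) {
    if ($input =~ /\G([a-z]+)/gc) { push @tokens, "IDENT:$1"; next }
    if ($input =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1";   next }
    if ($input =~ /\G=/gc)        { push @tokens, "EQ";       next }
    last;    # nothing matched at this position: stop
}

print "@tokens\n";
```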
=item C<m//x> ignores whitespace before ?*+{}
The C<m//x> construct has always been intended to ignore all unescaped
whitespace. However, before Perl 5.004, whitespace had the effect of
escaping repeat modifiers like "*" or "?"; for example, C</a *b/x> was
(mis)interpreted as C</a\*b/x>. This bug has been fixed in 5.004.
=item nested C<sub{}> closures work now
Prior to the 5.004 release, nested anonymous functions didn't work
right. They do now.
=item formats work right on changing lexicals
Just like anonymous functions that contain lexical variables
that change (like a lexical index variable for a C<foreach> loop),
formats now work properly. For example, this silently failed
before (printed only zeros), but is fine now:
    my $i;
    foreach $i ( 1 .. 10 ) {
        write;
    }
    format =
        my i is @#
        $i
    .
However, it still fails (without a warning) if the foreach is within a
subroutine:
    my $i;
    sub foo {
        foreach $i ( 1 .. 10 ) {
            write;
        }
    }
    foo;
    format =
        my i is @#
        $i
    .
=back
=head2 New builtin methods
The C<UNIVERSAL> package automatically contains the following methods that
are inherited by all other classes:
=over 4
=item isa(CLASS)
C<isa> returns I<true> if its object is blessed into C<CLASS> or a subclass of C<CLASS>.
C<isa> is also exportable and can be called as a sub with two arguments. This
makes it possible to check what a reference points to. Example:
use UNIVERSAL qw(isa);
if(isa($ref, 'ARRAY')) {
...
}
=item can(METHOD)
C<can> checks whether its object has a method called C<METHOD>.
If it does, a reference to the sub is returned; if it does not,
I<undef> is returned.
=item VERSION( [NEED] )
C<VERSION> returns the version number of the class (package). If the
NEED argument is given then it will check that the current version (as
defined by the $VERSION variable in the given package) is not less than
NEED; it will die if this is not the case. This method is normally
called as a class method. This method is called automatically by the
C<VERSION> form of C<use>.
use A 1.2 qw(some imported subs);
# implies:
A->VERSION(1.2);
=back
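The three methods can be exercised with a sketch like the following. The classes C<Animal> and C<Dog> are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

package Animal;
our $VERSION = '1.20';
sub new   { my $class = shift; return bless {}, $class }
sub speak { return 'generic noise' }

package Dog;
our @ISA = ('Animal');

package main;

my $dog = Dog->new;

my $is_animal = $dog->isa('Animal');             # Dog inherits from Animal
my $is_hash   = UNIVERSAL::isa($dog, 'HASH');    # the object is a blessed hash

my $speak = $dog->can('speak');                  # code reference if available
my $sound = $speak ? $dog->$speak() : 'silence';

my $version = Animal->VERSION;                   # value of $Animal::VERSION
my $too_old = !eval { Animal->VERSION(2.0); 1 }; # dies because 1.20 < 2.0

print "$sound, version $version\n";
```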
B<NOTE:> C<can> directly uses Perl's internal code for method lookup, and
C<isa> uses a very similar method and caching strategy. This may cause
strange effects if the Perl code dynamically changes @ISA in any package.
You may add other methods to the UNIVERSAL class via Perl or XS code.
You do not need to C<use UNIVERSAL> in order to make these methods
available to your program. This is necessary only if you wish to
have C<isa> available as a plain subroutine in the current package.
=head2 TIEHANDLE now supported
See L<perltie> for other kinds of tie()s.
=over 4
=item TIEHANDLE classname, LIST
This is the constructor for the class. That means it is expected to
return an object of some sort. The reference can be used to
hold some internal information.
sub TIEHANDLE {
print "<shout>\n";
my $i;
return bless \$i, shift;
}
=item PRINT this, LIST
This method will be triggered every time the tied handle is printed to.
Beyond its self reference it also expects the list that was passed to
the print function.
    sub PRINT {
        my $r = shift;
        $$r++;
        return print join( $, => map {uc} @_), $\;
    }
=item PRINTF this, LIST
This method will be triggered every time the tied handle is printed to
with the C<printf()> function.
Beyond its self reference it also expects the format and list that was
passed to the printf function.
sub PRINTF {
shift;
my $fmt = shift;
print sprintf($fmt, @_)."\n";
}
=item READ this, LIST
This method will be called when the handle is read from via the C<read>
or C<sysread> functions.
    sub READ {
        my $r = shift;
        my($buf,$len,$offset) = @_;
        print "READ called, \$buf=$buf, \$len=$len, \$offset=$offset";
    }
=item READLINE this
This method will be called when the handle is read from. The method
should return undef when there is no more data.
    sub READLINE {
        my $r = shift;
        return "PRINT called $$r times\n";
    }
=item GETC this
This method will be called when the C<getc> function is called.
sub GETC { print "Don't GETC, Get Perl"; return "a"; }
=item DESTROY this
As with the other types of ties, this method will be called when the
tied handle is about to be destroyed. This is useful for debugging and
possibly for cleaning up.
sub DESTROY {
print "</shout>\n";
}
=back
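Putting several of these methods together, a self-contained sketch of a tied handle might look like this. The class name C<Shout> and its behavior (upper-casing output and queueing it for later reads) are invented for illustration and are not the example class used above.

```perl
use strict;
use warnings;

package Shout;

# Constructor: the object holds a queue of captured lines.
sub TIEHANDLE {
    my ($class) = @_;
    return bless { lines => [] }, $class;
}

# print FH LIST: upper-case the text and queue it instead of writing it.
sub PRINT {
    my $self = shift;
    push @{ $self->{lines} }, uc join('', @_);
    return 1;
}

# printf FH FORMAT, LIST: format first, then reuse PRINT.
sub PRINTF {
    my $self = shift;
    my $fmt  = shift;
    return $self->PRINT(sprintf $fmt, @_);
}

# <FH>: hand back queued lines one at a time, undef when none are left.
sub READLINE {
    my $self = shift;
    return shift @{ $self->{lines} };
}

package main;

tie *FH, 'Shout';

print  FH "hello, ", "world\n";
printf FH "%d items\n", 3;

my $first  = <FH>;
my $second = <FH>;
my $third  = <FH>;    # queue is empty now

untie *FH;

print $first, $second;
```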
=head2 Malloc enhancements
If perl is compiled with the malloc included with the perl distribution
(that is, if C<perl -V:d_mymalloc> is 'define') then you can print
memory statistics at runtime by running Perl thusly:
env PERL_DEBUG_MSTATS=2 perl your_script_here
The value of 2 means to print statistics after compilation and on
exit; with a value of 1, the statistics are printed only on exit.
(If you want the statistics at an arbitrary time, you'll need to
install the optional module Devel::Peek.)
Three new compilation flags are recognized by malloc.c. (They have no
effect if perl is compiled with system malloc().)
=over 4
=item -DPERL_EMERGENCY_SBRK
If this macro is defined, running out of memory need not be a fatal
error: a memory pool can be allocated by assigning to the special
variable C<$^M>. See L</"$^M">.
=item -DPACK_MALLOC
Perl memory allocation is by bucket with sizes close to powers of two.
Because of this, malloc overhead may be big, especially for data of
size exactly a power of two. If C<PACK_MALLOC> is defined, perl uses
a slightly different algorithm for small allocations (up to 64 bytes
long), which makes it possible to have overhead down to 1 byte for
allocations which are powers of two (and appear quite often).
Expected memory savings (with 8-byte alignment in C<alignbytes>) is
about 20% for typical Perl usage. Expected slowdown due to additional
malloc overhead is in fractions of a percent (hard to measure, because
of the effect of saved memory on speed).
=item -DTWO_POT_OPTIMIZE
Similarly to C<PACK_MALLOC>, this macro improves allocations of data
with size close to a power of two; but this works for big allocations
(starting with 16K by default). Such allocations are typical for big
hashes and special-purpose scripts, especially image processing.
On recent systems, the fact that perl requires 2M from the system for a 1M
allocation will not affect speed of execution, since the tail of such
a chunk is not going to be touched (and thus will not require real
memory). However, it may result in a premature out-of-memory error.
So if you will be manipulating very large blocks with sizes close to
powers of two, it would be wise to define this macro.
Expected saving of memory is 0-100% (100% in applications which
require most memory in such 2**n chunks); expected slowdown is
negligible.
=back
=head2 Miscellaneous efficiency enhancements
Functions that have an empty prototype and that do nothing but return
a fixed value are now inlined (e.g. C<sub PI () { 3.14159 }>).
Each unique hash key is only allocated once, no matter how many hashes
have an entry with that key. So even if you have 100 copies of the
same hash, the hash keys never have to be reallocated.
=head1 Support for More Operating Systems
Support for the following operating systems is new in Perl 5.004.
=head2 Win32
Perl 5.004 now includes support for building a "native" perl under
Windows NT, using the Microsoft Visual C++ compiler (versions 2.0
and above) or the Borland C++ compiler (versions 5.02 and above).
The resulting perl can also be used under Windows 95, provided it
is installed in the same directory locations as it was
under Windows NT. This port includes support for perl extension
building tools like L<ExtUtils::MakeMaker> and L<h2xs>, so that many extensions
available on the Comprehensive Perl Archive Network (CPAN) can now be
readily built under Windows NT. See http://www.perl.com/ for more
information on CPAN and F<README.win32> in the perl distribution for more
details on how to get started with building this port.
There is also support for building perl under the Cygwin32 environment.
Cygwin32 is a set of GNU tools that make it possible to compile and run
many Unix programs under Windows NT by providing a mostly Unix-like
interface for compilation and execution. See F<README.cygwin32> in the
perl distribution for more details on this port and how to obtain the
Cygwin32 toolkit.
=head2 Plan 9
See F<README.plan9> in the perl distribution.
=head2 QNX
See F<README.qnx> in the perl distribution.
=head2 AmigaOS
See F<README.amigaos> in the perl distribution.
=head1 Pragmata
Six new pragmatic modules exist:
=over 4
=item use autouse MODULE => qw(sub1 sub2 sub3)
Defers C<require MODULE> until someone calls one of the specified
subroutines (which must be exported by MODULE). This pragma should be
used with caution, and only when necessary.
=item use blib
=item use blib 'dir'
Looks for a MakeMaker-like I<'blib'> directory structure starting in
I<dir> (or current directory) and working back up to five levels of
parent directories.
Intended for use on the command line with the B<-M> option as a way of testing
arbitrary scripts against an uninstalled version of a package.
=item use constant NAME => VALUE
Provides a convenient interface for creating compile-time constants.
See L<perlsub/"Constant Functions">.
=item use locale
Tells the compiler to enable (or disable) the use of POSIX locales for
builtin operations.
When C<use locale> is in effect, the current LC_CTYPE locale is used
for regular expressions and case mapping; LC_COLLATE for string
ordering; and LC_NUMERIC for numeric formatting in printf and sprintf
(but B<not> in print). LC_NUMERIC is always used in write, since
lexical scoping of formats is problematic at best.
Each C<use locale> or C<no locale> affects statements to the end of
the enclosing BLOCK or, if not inside a BLOCK, to the end of the
current file. Locales can be switched and queried with
POSIX::setlocale().
See L<perllocale> for more information.
=item use ops
Disable unsafe opcodes, or any named opcodes, when compiling Perl code.
=item use vmsish
Enable VMS-specific language features. Currently, there are three
VMS-specific features available: 'status', which makes C<$?> and
C<system> return genuine VMS status values instead of emulating POSIX;
'exit', which makes C<exit> take a genuine VMS status value instead of
assuming that C<exit 1> is an error; and 'time', which makes all times
relative to the local time zone, in the VMS tradition.
=back
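As one example of the pragmata listed above, here is a short sketch of C<use constant>. The constant names and values are arbitrary.

```perl
use strict;
use warnings;

use constant PI    => 3.14159;
use constant DEBUG => 0;

# Constants are called like functions without arguments.
my $radius = 2;
my $area   = PI * $radius ** 2;

print "area: $area\n";
print "debugging is off\n" unless DEBUG;
```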
=head1 Modules
=head2 Required Updates
Though Perl 5.004 is compatible with almost all modules that work
with Perl 5.003, there are a few exceptions:
    Module   Required Version for Perl 5.004
    ------   -------------------------------
    Filter   Filter-1.12
    LWP      libwww-perl-5.08
    Tk       Tk400.202 (-w makes noise)
Also, the majordomo mailing list program, version 1.94.1, doesn't work
with Perl 5.004 (nor with perl 4), because it executes an invalid
regular expression. This bug is fixed in majordomo version 1.94.2.
=head2 Installation directories
The I<installperl> script now places the Perl source files for
extensions in the architecture-specific library directory, which is
where the shared libraries for extensions have always been. This
change is intended to allow administrators to keep the Perl 5.004
library directory unchanged from a previous version, without running
the risk of binary incompatibility between extensions' Perl source and
shared libraries.
=head2 Module information summary
Brand new modules, arranged by topic rather than strictly
alphabetically:
    CGI.pm               Web server interface ("Common Gateway Interface")
    CGI/Apache.pm        Support for Apache's Perl module
    CGI/Carp.pm          Log server errors with helpful context
    CGI/Fast.pm          Support for FastCGI (persistent server process)
    CGI/Push.pm          Support for server push
    CGI/Switch.pm        Simple interface for multiple server types

    CPAN                 Interface to Comprehensive Perl Archive Network
    CPAN::FirstTime      Utility for creating CPAN configuration file
    CPAN::Nox            Runs CPAN while avoiding compiled extensions

    IO.pm                Top-level interface to IO::* classes
    IO/File.pm           IO::File extension Perl module
    IO/Handle.pm         IO::Handle extension Perl module
    IO/Pipe.pm           IO::Pipe extension Perl module
    IO/Seekable.pm       IO::Seekable extension Perl module
    IO/Select.pm         IO::Select extension Perl module
    IO/Socket.pm         IO::Socket extension Perl module

    Opcode.pm            Disable named opcodes when compiling Perl code

    ExtUtils/Embed.pm    Utilities for embedding Perl in C programs
    ExtUtils/testlib.pm  Fixes up @INC to use just-built extension

    FindBin.pm           Find path of currently executing program

    Class/Struct.pm      Declare struct-like datatypes as Perl classes
    File/stat.pm         By-name interface to Perl's builtin stat
    Net/hostent.pm       By-name interface to Perl's builtin gethost*
    Net/netent.pm        By-name interface to Perl's builtin getnet*
    Net/protoent.pm      By-name interface to Perl's builtin getproto*
    Net/servent.pm       By-name interface to Perl's builtin getserv*
    Time/gmtime.pm       By-name interface to Perl's builtin gmtime
    Time/localtime.pm    By-name interface to Perl's builtin localtime
    Time/tm.pm           Internal object for Time::{gm,local}time
    User/grent.pm        By-name interface to Perl's builtin getgr*
    User/pwent.pm        By-name interface to Perl's builtin getpw*

    Tie/RefHash.pm       Base class for tied hashes with references as keys

    UNIVERSAL.pm         Base class for *ALL* classes
=head2 Fcntl
New constants in the existing Fcntl modules are now supported,
provided that your operating system happens to support them:
F_GETOWN F_SETOWN
O_ASYNC O_DEFER O_DSYNC O_FSYNC O_SYNC
O_EXLOCK O_SHLOCK
These constants are intended for use with the Perl operators sysopen()
and fcntl() and the basic database modules like SDBM_File. For the
exact meaning of these and other Fcntl constants please refer to your
operating system's documentation for fcntl() and open().
In addition, the Fcntl module now provides these constants for use
with the Perl operator flock():
LOCK_SH LOCK_EX LOCK_NB LOCK_UN
These constants are defined in all environments (because where there is
no flock() system call, Perl emulates it). However, for historical
reasons, these constants are not exported unless they are explicitly
requested with the ":flock" tag (e.g. C<use Fcntl ':flock'>).
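A sketch of the C<:flock> constants in use. The scratch file from File::Temp stands in for whatever file your program needs to lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Scratch file for the example.
my ($fh, $path) = tempfile(UNLINK => 1);

# Ask for an exclusive lock, but fail at once if it cannot be granted.
my $locked = flock($fh, LOCK_EX | LOCK_NB);
$locked or die "cannot lock $path: $!";

print {$fh} "written while holding the lock\n";

my $unlocked = flock($fh, LOCK_UN);
$unlocked or die "cannot unlock $path: $!";
```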
=head2 IO
The IO module provides a simple mechanism to load all the IO modules at one
go. Currently this includes:
IO::Handle
IO::Seekable
IO::File
IO::Pipe
IO::Socket
For more information on any of these modules, please see its
respective documentation.
=head2 Math::Complex
The Math::Complex module has been totally rewritten, and now supports
more operations. These are overloaded:
+ - * / ** <=> neg ~ abs sqrt exp log sin cos atan2 "" (stringify)
And these functions are now exported:
pi i Re Im arg
log10 logn ln cbrt root
tan
csc sec cot
asin acos atan
acsc asec acot
sinh cosh tanh
csch sech coth
asinh acosh atanh
acsch asech acoth
cplx cplxe
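A short sketch using a few of the overloaded operators and exported functions; the values are arbitrary.

```perl
use strict;
use warnings;
use Math::Complex;

my $z       = cplx(3, 4);    # 3 + 4i
my $modulus = abs($z);       # overloaded abs
my $root    = sqrt(-1);      # a complex result, not an error
my $product = $z * i;        # (3 + 4i) * i

printf "modulus %.1f, product real %.1f, imaginary %.1f\n",
    $modulus, Re($product), Im($product);
```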
=head2 Math::Trig
This new module provides a simpler interface to parts of Math::Complex for
those who need trigonometric functions only for real numbers.
=head2 DB_File
There have been quite a few changes made to DB_File. Here are a few of
the highlights:
=over 4
=item *
Fixed a handful of bugs.
=item *
By public demand, added support for the standard hash function exists().
=item *
Made it compatible with Berkeley DB 1.86.
=item *
Made negative subscripts work with RECNO interface.
=item *
Changed the default flags from O_RDWR to O_CREAT|O_RDWR and the default
mode from 0640 to 0666.
=item *
Made DB_File automatically import the open() constants (O_RDWR,
O_CREAT etc.) from Fcntl, if available.
=item *
Updated documentation.
=back
Refer to the HISTORY section in DB_File.pm for a complete list of
changes. Everything after DB_File 1.01 has been added since 5.003.
=head2 Net::Ping
The module has been substantially rewritten. Support has been added for
both UDP echo and real ICMP pings.
=head2 Object-oriented overrides for builtin operators
Many of the Perl builtins returning lists now have
object-oriented overrides. These are:
File::stat
Net::hostent
Net::netent
Net::protoent
Net::servent
Time::gmtime
Time::localtime
User::grent
User::pwent
For example, you can now say
use File::stat;
use User::pwent;
    $his = (stat($filename)->uid == getpw($whoever)->uid);
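A self-contained sketch of the by-name interface, limited to File::stat so that it does not depend on the local password database. The scratch file is created only for the example.

```perl
use strict;
use warnings;
use File::stat;
use File::Temp qw(tempfile);

# Create a small scratch file to inspect.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "12345";
close $fh or die "close failed: $!";

# With File::stat loaded, stat() returns an object with named accessors.
my $st = stat($path) or die "cannot stat $path: $!";

printf "size=%d uid=%d mode=%04o\n", $st->size, $st->uid, $st->mode & 07777;
```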
=head1 Utility Changes
=head2 pod2html
=over 4
=item Sends converted HTML to standard output
The I<pod2html> utility included with Perl 5.004 is entirely new.
By default, it sends the converted HTML to its standard output,
instead of writing it to a file like Perl 5.003's I<pod2html> did.
Use the B<--outfile=FILENAME> option to write to a file.
=back
=head2 xsubpp
=over 4
=item C<void> XSUBs now default to returning nothing
Due to a documentation/implementation bug in previous versions of
Perl, XSUBs with a return type of C<void> have actually been
returning one value. Usually that value was the GV for the XSUB,
but sometimes it was some already freed or reused value, which would
sometimes lead to program failure.
In Perl 5.004, if an XSUB is declared as returning C<void>, it
actually returns no value, i.e. an empty list (though there is a
backward-compatibility exception; see below). If your XSUB really
does return an SV, you should give it a return type of C<SV *>.
For backward compatibility, I<xsubpp> tries to guess whether a
C<void> XSUB is really C<void> or if it wants to return an C<SV *>.
It does so by examining the text of the XSUB: if I<xsubpp> finds
what looks like an assignment to C<ST(0)>, it assumes that the
XSUB's return type is really C<SV *>.
=back
=head1 C Language API Changes
=over 4
=item C<gv_fetchmethod> and C<perl_call_sv>
The C<gv_fetchmethod> function finds a method for an object, just like
in Perl 5.003. The GV it returns may be a method cache entry.
However, in Perl 5.004, method cache entries are not visible to users;
therefore, they can no longer be passed directly to C<perl_call_sv>.
Instead, you should use the C<GvCV> macro on the GV to extract its CV,
and pass the CV to C<perl_call_sv>.
The most likely symptom of passing the result of C<gv_fetchmethod> to
C<perl_call_sv> is Perl's producing an "Undefined subroutine called"
error on the I<second> call to a given method (since there is no cache
on the first call).
=item C<perl_eval_pv>
A new function handy for eval'ing strings of Perl code inside C code.
This function returns the value from the eval statement, which can
be used instead of fetching globals from the symbol table. See
L<perlguts>, L<perlembed> and L<perlcall> for details and examples.
=item Extended API for manipulating hashes
Internal handling of hash keys has changed. The old hashtable API is
still fully supported, and will likely remain so. The additions to the
API allow passing keys as C<SV*>s, so that C<tied> hashes can be given
real scalars as keys rather than plain strings (nontied hashes still
can only use strings as keys). New extensions must use the new hash
access functions and macros if they wish to use C<SV*> keys. These
additions also make it feasible to manipulate C<HE*>s (hash entries),
which can be more efficient. See L<perlguts> for details.
=back
=head1 Documentation Changes
Many of the base and library pods were updated. These
new pods are included in section 1:
=over 4
=item L<perldelta>
This document.
=item L<perlfaq>
Frequently asked questions.
=item L<perllocale>
Locale support (internationalization and localization).
=item L<perltoot>
Tutorial on Perl OO programming.
=item L<perlapio>
Perl internal IO abstraction interface.
=item L<perlmodlib>
Perl module library and recommended practice for module creation.
Extracted from L<perlmod> (which is much smaller as a result).
=item L<perldebug>
Although not new, this has been massively updated.
=item L<perlsec>
Although not new, this has been massively updated.
=back
=head1 New Diagnostics
Several new conditions will trigger warnings that were
silent before. Some only affect certain platforms.
The following new warnings and errors outline these.
These messages are classified as follows (listed in
increasing order of desperation):
    (W) A warning (optional).
    (D) A deprecation (optional).
    (S) A severe warning (mandatory).
    (F) A fatal error (trappable).
    (P) An internal error you should never see (trappable).
    (X) A very fatal error (nontrappable).
    (A) An alien error message (not generated by Perl).
=over 4
=item "my" variable %s masks earlier declaration in same scope
(W) A lexical variable has been redeclared in the same scope, effectively
eliminating all access to the previous instance. This is almost always
a typographical error. Note that the earlier variable will still exist
until the end of the scope or until all closure referents to it are
destroyed.
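A minimal sketch of the situation this warning describes. The variable names are hypothetical, and the redeclaration is deliberate so that the effect can be seen: the closure keeps the earlier variable alive, while all later code sees only the new one.

```perl
use strict;
use warnings;

my $count = 1;
my $bump  = sub { $count++ };    # closure over the first $count

# Deliberate redeclaration (illustration only); with warnings enabled this
# line draws the '"my" variable $count masks earlier declaration' warning.
my $count = 100;                 # a second, separate variable

$bump->();                       # still updates the first $count
print "visible count: $count\n"; # the new variable, untouched by $bump
```

Renaming the second variable, or dropping the second C<my> if a plain assignment was intended, removes the warning.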
=item %s argument is not a HASH element or slice
(F) The argument to delete() must be either a hash element, such as
    $foo{$bar}
    $ref->[12]->{"susie"}

or a hash slice, such as

    @foo{$bar, $baz, $xyzzy}
    @{$ref->[12]}{"susie", "queue"}
=item Allocation too large: %lx
(X) You can't allocate more than 64K on an MS-DOS machine.
=item Allocation too large
(F) You can't allocate more than 2^31+"small amount" bytes.
=item Applying %s to %s will act on scalar(%s)
(W) The pattern match (//), substitution (s///), and transliteration (tr///)
operators work on scalar values. If you apply one of them to an array
or a hash, it will convert the array or hash to a scalar value (the
length of an array or the population info of a hash) and then work on
that scalar value. This is probably not what you meant to do. See
L<perlfunc/grep> and L<perlfunc/map> for alternatives.
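As a hedged illustration of the alternatives mentioned above, the sketch below applies a pattern and a transformation to each element rather than to the array as a whole. The data and variable names are made up for the example.

```perl
use strict;
use warnings;

my @names = ("fred", "barney", "wilma");

# Not this: "@names =~ /ar/" would test scalar(@names), i.e. the count 3.
my @matches = grep { /ar/ } @names;   # elements that contain "ar"
my @upper   = map  { uc }   @names;   # a transformed copy of every element

print "matches: @matches\n";
print "upper:   @upper\n";
```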
=item Attempt to free nonexistent shared string
(P) Perl maintains a reference counted internal table of strings to
optimize the storage and access of hash keys and other strings. This
indicates someone tried to decrement the reference count of a string
that can no longer be found in the table.
=item Attempt to use reference as lvalue in substr
(W) You supplied a reference as the first argument to substr() used
as an lvalue, which is pretty strange. Perhaps you forgot to
dereference it first. See L<perlfunc/substr>.
=item Bareword "%s" refers to nonexistent package
(W) You used a qualified bareword of the form C<Foo::>, but
the compiler saw no other uses of that namespace before that point.
Perhaps you need to predeclare a package?
=item Can't redefine active sort subroutine %s
(F) Perl optimizes the internal handling of sort subroutines and keeps
pointers into them. You tried to redefine one such sort subroutine when it
was currently active, which is not allowed. If you really want to do
this, you should write C<sort { &func } @x> instead of C<sort func @x>.
=item Can't use bareword ("%s") as %s ref while "strict refs" in use
(F) Only hard references are allowed by "strict refs". Symbolic references
are disallowed. See L<perlref>.
=item Cannot resolve method `%s' overloading `%s' in package `%s'
(P) Internal error trying to resolve overloading specified by a method
name (as opposed to a subroutine reference).
=item Constant subroutine %s redefined
(S) You redefined a subroutine which had previously been eligible for
inlining. See L<perlsub/"Constant Functions"> for commentary and
workarounds.
=item Constant subroutine %s undefined
(S) You undefined a subroutine which had previously been eligible for
inlining. See L<perlsub/"Constant Functions"> for commentary and
workarounds.
=item Copy method did not return a reference
(F) The method which overloads "=" is buggy. See L<overload/Copy Constructor>.
=item Died
(F) You passed die() an empty string (the equivalent of C<die "">) or
you called it with no args and both C<$@> and C<$_> were empty.
=item Exiting pseudo-block via %s
(W) You are exiting a rather special block construct (like a sort block or
subroutine) by unconventional means, such as a goto, or a loop control
statement. See L<perlfunc/sort>.
=item Identifier too long
(F) Perl limits identifiers (names for variables, functions, etc.) to
252 characters for simple names, somewhat more for compound names (like
C<$A::B>). You've exceeded Perl's limits. Future versions of Perl are
likely to eliminate these arbitrary limitations.
=item Illegal character %s (carriage return)
(F) A carriage return character was found in the input. This is an
error, and not a warning, because carriage return characters can break
multi-line strings, including here documents (e.g., C<print <<EOF;>).
=item Illegal switch in PERL5OPT: %s
(X) The PERL5OPT environment variable may only be used to set the
following switches: B<-[DIMUdmw]>.
=item Integer overflow in hex number
(S) The literal hex number you have specified is too big for your
architecture. On a 32-bit architecture the largest hex literal is
0xFFFFFFFF.
=item Integer overflow in octal number
(S) The literal octal number you have specified is too big for your
architecture. On a 32-bit architecture the largest octal literal is
037777777777.
=item internal error: glob failed
(P) Something went wrong with the external program(s) used for C<glob>
and C<< <*.c> >>. This may mean that your csh (C shell) is
broken. If so, you should change all of the csh-related variables in
config.sh: If you have tcsh, make the variables refer to it as if it
were csh (e.g. C<full_csh='/usr/bin/tcsh'>); otherwise, make them all
empty (except that C<d_csh> should be C<'undef'>) so that Perl will
think csh is missing. In either case, after editing config.sh, run
C<./Configure -S> and rebuild Perl.
=item Invalid conversion in %s: "%s"
(W) Perl does not understand the given format conversion.
See L<perlfunc/sprintf>.
=item Invalid type in pack: '%s'
(F) The given character is not a valid pack type. See L<perlfunc/pack>.
=item Invalid type in unpack: '%s'
(F) The given character is not a valid unpack type. See L<perlfunc/unpack>.
=item Name "%s::%s" used only once: possible typo
(W) Typographical errors often show up as unique variable names.
If you had a good reason for having a unique name, then just mention
it again somehow to suppress the message (the C<use vars> pragma is
provided for just this purpose).
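A short sketch of the C<use vars> approach, assuming a hypothetical package variable named C<$Debug> that is set here and read only by other files:

```perl
use warnings;
use vars qw($Debug);   # predeclare the package variable $main::Debug

$Debug = 1;            # a lone mention no longer looks like a typo
```

The pragma counts as a mention of the name, which is why the message goes away.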
=item Null picture in formline
(F) The first argument to formline must be a valid format picture
specification. It was found to be empty, which probably means you
supplied it an uninitialized value. See L<perlform>.
=item Offset outside string
(F) You tried to do a read/write/send/recv operation with an offset
pointing outside the buffer. This is difficult to imagine.
The sole exception to this is that C<sysread()>ing past the buffer
will extend the buffer and zero pad the new area.
=item Out of memory!
(X|F) The malloc() function returned 0, indicating there was insufficient
remaining memory (or virtual memory) to satisfy the request.
The request was judged to be small, so the possibility to trap it
depends on the way Perl was compiled. By default it is not trappable.
However, if compiled for this, Perl may use the contents of C<$^M> as
an emergency pool after die()ing with this message. In this case the
error is trappable I<once>.
=item Out of memory during request for %s
(F) The malloc() function returned 0, indicating there was insufficient
remaining memory (or virtual memory) to satisfy the request. However,
the request was judged large enough (compile-time default is 64K), so
a possibility to shut down by trapping this error is granted.
=item panic: frexp
(P) The library function frexp() failed, making printf("%f") impossible.
=item Possible attempt to put comments in qw() list
(W) qw() lists contain items separated by whitespace; as with literal
strings, comment characters are not ignored, but are instead treated
as literal data. (You may have used different delimiters than the
parentheses shown here; braces are also frequently used.)
You probably wrote something like this:
    @list = qw(
        a # a comment
        b # another comment
    );

when you should have written this:

    @list = qw(
        a
        b
    );

If you really want comments, build your list the
old-fashioned way, with quotes and commas:

    @list = (
        'a',    # a comment
        'b',    # another comment
    );
=item Possible attempt to separate words with commas
(W) qw() lists contain items separated by whitespace; therefore commas
aren't needed to separate the items. (You may have used different
delimiters than the parentheses shown here; braces are also frequently
used.)
You probably wrote something like this:
qw! a, b, c !;
which puts literal commas into some of the list items. Write it without
commas if you don't want them to appear in your data:
qw! a b c !;
=item Scalar value @%s{%s} better written as $%s{%s}
(W) You've used a hash slice (indicated by @) to select a single element of
a hash. Generally it's better to ask for a scalar value (indicated by $).
The difference is that C<$foo{&bar}> always behaves like a scalar, both when
assigning to it and when evaluating its argument, while C<@foo{&bar}> behaves
like a list when you assign to it, and provides a list context to its
subscript, which can do weird things if you're expecting only one subscript.
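The context difference can be observed directly. In this sketch, C<bar> is a hypothetical helper that records the context it was called in and then returns a key:

```perl
use strict;
use warnings;

my %foo = (a => 1, b => 2);
my @contexts;

# Hypothetical helper: notes its calling context, then returns the key "a".
sub bar { push @contexts, (wantarray ? "list" : "scalar"); return "a" }

my $one    = $foo{ bar() };    # element: subscript evaluated in scalar context
my ($also) = @foo{ bar() };    # one-element slice: subscript in list context

print "contexts seen: @contexts\n";
```

A subscript expression that returns more than one value in list context would make the slice select several elements, which is the "weird" behavior referred to above.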
=item Stub found while resolving method `%s' overloading `%s' in %s
(P) Overloading resolution over the @ISA tree may be broken by importing
stubs. Stubs should never be implicitly created, but explicit calls to
C<can> may break this.
=item Too late for "B<-T>" option
(X) The #! line (or local equivalent) in a Perl script contains the
B<-T> option, but Perl was not invoked with B<-T> in its argument
list. This is an error because, by the time Perl discovers a B<-T> in
a script, it's too late to properly taint everything from the
environment. So Perl gives up.
=item untie attempted while %d inner references still exist
(W) A copy of the object returned from C<tie> (or C<tied>) was still
valid when C<untie> was called.
=item Unrecognized character %s
(F) The Perl parser has no idea what to do with the specified character
in your Perl script (or eval). Perhaps you tried to run a compressed
script, a binary program, or a directory as a Perl program.
=item Unsupported function fork
(F) Your version of the perl executable does not support forking.
Note that on some systems, such as OS/2, there may be different flavors of
Perl executables, some of which support fork and some of which do not. Try
invoking Perl under a different name, such as C<perl_>, C<perl__>, and so on.
=item Use of "$$<digit>" to mean "${$}<digit>" is deprecated
(D) Perl versions before 5.004 misinterpreted any type marker followed
by "$" and a digit. For example, "$$0" was incorrectly taken to mean
"${$}0" instead of "${$0}". This bug is (mostly) fixed in Perl 5.004.
However, the developers of Perl 5.004 could not fix this bug completely,
because at least two widely-used modules depend on the old meaning of
"$$0" in a string. So Perl 5.004 still interprets "$$<digit>" in the
old (broken) way inside strings; but it generates this message as a
warning. And in Perl 5.005, this special treatment will cease.
=item Value of %s can be "0"; test with defined()
(W) In a conditional expression, you used <HANDLE>, <*> (glob), C<each()>,
or C<readdir()> as a boolean value. Each of these constructs can return a
value of "0"; that would make the conditional expression false, which is
probably not what you intended. When using these constructs in conditional
expressions, test their values with the C<defined> operator.
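A minimal sketch of the recommended test. As an assumption for the example, an in-memory filehandle stands in for a real file whose last line is a bare "0" with no trailing newline:

```perl
use strict;
use warnings;

# Assumption: in-memory data replaces a real file for this illustration.
my $data = "first\n0";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @seen;
while (1) {
    my $line = <$fh>;
    last unless defined $line;   # "last unless $line" would drop the final "0"
    chomp $line;
    push @seen, $line;
}
close $fh;

print "read ", scalar(@seen), " lines\n";
```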
=item Variable "%s" may be unavailable
(W) An inner (nested) I<anonymous> subroutine is inside a I<named>
subroutine, and outside that is another subroutine; and the anonymous
(innermost) subroutine is referencing a lexical variable defined in
the outermost subroutine. For example:
sub outermost { my $a; sub middle { sub { $a } } }
If the anonymous subroutine is called or referenced (directly or
indirectly) from the outermost subroutine, it will share the variable
as you would expect. But if the anonymous subroutine is called or
referenced when the outermost subroutine is not active, it will see
the value of the shared variable as it was before and during the
*first* call to the outermost subroutine, which is probably not what
you want.
In these circumstances, it is usually best to make the middle
subroutine anonymous, using the C<sub {}> syntax. Perl has specific
support for shared variables in nested anonymous subroutines; a named
subroutine in between interferes with this feature.
=item Variable "%s" will not stay shared
(W) An inner (nested) I<named> subroutine is referencing a lexical
variable defined in an outer subroutine.
When the inner subroutine is called, it will probably see the value of
the outer subroutine's variable as it was before and during the
*first* call to the outer subroutine; in this case, after the first
call to the outer subroutine is complete, the inner and outer
subroutines will no longer share a common value for the variable. In
other words, the variable will no longer be shared.
Furthermore, if the outer subroutine is anonymous and references a
lexical variable outside itself, then the outer and inner subroutines
will I<never> share the given variable.
This problem can usually be solved by making the inner subroutine
anonymous, using the C<sub {}> syntax. When inner anonymous subs that
reference variables in outer subroutines are called or referenced,
they are automatically rebound to the current values of such
variables.
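The suggested fix can be sketched as follows. The names C<make_counter>, C<$start> and C<$next> are hypothetical; the point is that the inner subroutine is anonymous, so each call to the outer subroutine gets an inner sub bound to that call's variable.

```perl
use strict;
use warnings;

sub make_counter {
    my $start = shift;
    # Anonymous inner sub: bound to this call's $start. A named
    # "sub inner { $start++ }" nested here would draw the warning.
    my $next = sub { return $start++ };
    return $next;
}

my $from_five = make_counter(5);
my $from_ten  = make_counter(10);
my @values = ($from_five->(), $from_five->(), $from_ten->());

print "@values\n";
```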
=item Warning: something's wrong
(W) You passed warn() an empty string (the equivalent of C<warn "">) or
you called it with no args and C<$_> was empty.
=item Ill-formed logical name |%s| in prime_env_iter
(W) A warning peculiar to VMS. A logical name was encountered when preparing
to iterate over %ENV which violates the syntactic rules governing logical
names. Since it cannot be translated normally, it is skipped, and will not
appear in %ENV. This may be a benign occurrence, as some software packages
might directly modify logical name tables and introduce nonstandard names,
or it may indicate that a logical name table has been corrupted.
=item Got an error from DosAllocMem
(P) An error peculiar to OS/2. Most probably you're using an obsolete
version of Perl, and this should not happen anyway.
=item Malformed PERLLIB_PREFIX
(F) An error peculiar to OS/2. PERLLIB_PREFIX should be of the form
prefix1;prefix2
or
prefix1 prefix2
with nonempty prefix1 and prefix2. If C<prefix1> is indeed a prefix
of a builtin library search path, prefix2 is substituted. The error
may appear if components are not found, or are too long. See
"PERLLIB_PREFIX" in F<README.os2>.
=item PERL_SH_DIR too long
(F) An error peculiar to OS/2. PERL_SH_DIR is the directory to find the
C<sh>-shell in. See "PERL_SH_DIR" in F<README.os2>.
=item Process terminated by SIG%s
(W) This is a standard message issued by OS/2 applications, while *nix
applications die in silence. It is considered a feature of the OS/2
port. One can easily disable this with appropriate signal handlers; see
L<perlipc/"Signals">. See also "Process terminated by SIGTERM/SIGINT"
in F<README.os2>.
=back
=head1 BUGS
If you find what you think is a bug, you might check the headers of
recently posted articles in the comp.lang.perl.misc newsgroup.
There may also be information at http://www.perl.com/perl/ , the Perl
Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Make sure you trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to <F<perlbug@perl.com>> to be
analysed by the Perl porting team.
=head1 SEE ALSO
The F<Changes> file for exhaustive details on what changed.
The F<INSTALL> file for how to build Perl. This file has been
significantly updated for 5.004, so even veteran users should
look through it.
The F<README> file for general stuff.
The F<Copying> file for copyright information.
=head1 HISTORY
Constructed by Tom Christiansen, grabbing material with permission
from innumerable contributors, with kibitzing by more than a few Perl
porters.
Last update: Wed May 14 11:14:09 EDT 1997
If you read this file _as_is_, just ignore the funny characters you see.
It is written in the POD format (see pod/perlpod.pod) which is specially
designed to be readable as is.
=head1 NAME
perlplan9 - Plan 9-specific documentation for Perl
=head1 DESCRIPTION
These are a few notes describing features peculiar to
Plan 9 Perl. As such, it is not intended to be a replacement
for the rest of the Perl 5 documentation (which is both
copious and excellent). If you have any questions to
which you can't find answers in these man pages, contact
Luther Huffman at lutherh@stratcom.com and we'll try to
answer them.
=head2 Invoking Perl
Perl is invoked from the command line as described in
L<perl>. Most perl scripts, however, do have a first line
such as "#!/usr/local/bin/perl". This is known as a shebang
(shell-bang) statement and tells the OS shell where to find
the perl interpreter. In Plan 9 Perl this statement should be
"#!/bin/perl" if you wish to be able to directly invoke the
script by its name.
Alternatively, you may invoke perl with the command "Perl"
instead of "perl". This will produce Acme-friendly error
messages of the form "filename:18".
Some scripts, usually identified with a *.PL extension, are
self-configuring and are able to correctly create their own
shebang path from config information located in Plan 9
Perl. These you won't need to be worried about.
=head2 What's in Plan 9 Perl
Although Plan 9 Perl currently only provides static
loading, it is built with a number of useful extensions.
These include Opcode, FileHandle, Fcntl, and POSIX. Expect
to see others (and DynaLoading!) in the future.
=head2 What's not in Plan 9 Perl
As mentioned previously, dynamic loading isn't currently
available nor is MakeMaker. Both are high-priority items.
=head2 Perl5 Functions not currently supported in Plan 9 Perl
Some, such as C<chown> and C<umask>, aren't provided
because the concept does not exist within Plan 9. Others,
such as some of the socket-related functions, simply
haven't been written yet. Many in the latter category
may be supported in the future.
The functions not currently implemented include:
chown, chroot, dbmclose, dbmopen, getsockopt,
setsockopt, recvmsg, sendmsg, getnetbyname,
getnetbyaddr, getnetent, getprotoent, getservent,
sethostent, setnetent, setprotoent, setservent,
endservent, endnetent, endprotoent, umask
There may be several other functions that have undefined
behavior so this list shouldn't be considered complete.
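A script that must run on several platforms can probe for a builtin before relying on it. This is only a sketch: it assumes that a missing builtin raises a trappable fatal error, and C<builtin_status> is a hypothetical helper, not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: runs a code block and reports whether the builtin
# it wraps could be called on this platform.
sub builtin_status {
    my ($name, $code) = @_;
    my $ok = eval { $code->(); 1 };   # false if the call died
    return $ok ? "$name: ok" : "$name: unavailable";
}

my $report = builtin_status("umask", sub { umask });
print "$report\n";
```

Functions with merely undefined behavior will not be caught this way, so such a probe complements the list above rather than replacing it.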
=head2 Signals in Plan 9 Perl
For compatibility with perl scripts written for the Unix
environment, Plan 9 Perl uses the POSIX signal emulation
provided in Plan 9's ANSI POSIX Environment (APE). Signal stacking
isn't supported. The signals provided are:
    SIGHUP, SIGINT, SIGQUIT, SIGILL, SIGABRT,
    SIGFPE, SIGKILL, SIGSEGV, SIGPIPE, SIGALRM,
    SIGTERM, SIGUSR1, SIGUSR2, SIGCHLD, SIGCONT,
    SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU
=head1 COMPILING AND INSTALLING PERL ON PLAN 9
WELCOME to Plan 9 Perl, brave soul!
This is a preliminary alpha version of Plan 9 Perl. Still to be
implemented are MakeMaker and DynaLoader. Many perl commands are
missing or currently behave in an inscrutable manner. These gaps will,
with perseverance and a modicum of luck, be remedied in the near
future. To install this software:
1. Create the source directories and libraries for perl by running the
plan9/setup.rc command (that is, the setup.rc script in the plan9 subdirectory).
Note: the setup routine assumes that you haven't dearchived these
files into /sys/src/cmd/perl. After running setup.rc you may delete
the copy of the source you originally detarred, as source code has now
been installed in /sys/src/cmd/perl. If you plan on installing perl
binaries for all architectures, run "setup.rc -a".
2. After making sure that you have adequate privileges to build system
software, from /sys/src/cmd/perl/5.00301 (adjust version
appropriately) run:
mk install
If you wish to install perl versions for all architectures (68020,
mips, sparc and 386) run:
mk installall
3. Wait. The build process will take a *long* time because perl
bootstraps itself. A 75MHz Pentium, 16MB RAM machine takes roughly 30
minutes to build the distribution from scratch.
=head2 Installing Perl Documentation on Plan 9
This perl distribution comes with a tremendous amount of
documentation. To add these to the built-in manuals that come with
Plan 9, from /sys/src/cmd/perl/5.00301 (adjust version appropriately)
run:
mk man
To begin your reading, start with:
man perl
This is a good introduction and will direct you towards other man
pages that may interest you.
(Note: "mk man" may produce some extraneous noise. Fear not.)
=head1 BUGS
"As many as there are grains of sand on all the beaches of the
world . . ." - Carl Sagan
=head1 Revision date
This document was revised 09-October-1996 for Perl 5.003_7.
=head1 AUTHOR
Direct questions, comments, and the unlikely bug report (ahem) toward:
Luther Huffman, lutherh@stratcom.com,
Strategic Computer Solutions, Inc.
=head1 NAME
perllol - Manipulating Arrays of Arrays in Perl
=head1 DESCRIPTION
=head2 Declaration and Access of Arrays of Arrays
The simplest two-level data structure to build in Perl is an array of
arrays, sometimes casually called a list of lists. It's reasonably easy to
understand, and almost everything that applies here will also be applicable
later on with the fancier data structures.
An array of an array is just a regular old array @AoA that you can
get at with two subscripts, like C<$AoA[3][2]>. Here's a declaration
of the array:
    use 5.010;  # so we can use say()

    # assign to our array, an array of array references
    @AoA = (
        [ "fred", "barney", "pebbles", "bambam", "dino", ],
        [ "george", "jane", "elroy", "judy", ],
        [ "homer", "bart", "marge", "maggie", ],
    );
    say $AoA[2][1];
  bart
Now you should be very careful that the outer bracket type
is a round one, that is, a parenthesis. That's because you're assigning to
an @array, so you need parentheses. If you wanted there I<not> to be an @AoA,
but rather just a reference to it, you could do something more like this:
    # assign a reference to array of array references
    $ref_to_AoA = [
        [ "fred", "barney", "pebbles", "bambam", "dino", ],
        [ "george", "jane", "elroy", "judy", ],
        [ "homer", "bart", "marge", "maggie", ],
    ];
    say $ref_to_AoA->[2][1];
  bart
Notice that the outer bracket type has changed, and so our access syntax
has also changed. That's because unlike C, in perl you can't freely
interchange arrays and references thereto. $ref_to_AoA is a reference to an
array, whereas @AoA is an array proper. Likewise, C<$AoA[2]> is not an
array, but an array ref. So how come you can write these:
$AoA[2][2]
$ref_to_AoA->[2][2]
instead of having to write these:
$AoA[2]->[2]
$ref_to_AoA->[2]->[2]
Well, that's because the rule is that on adjacent brackets only (whether
square or curly), you are free to omit the pointer dereferencing arrow.
But you cannot do so for the very first one if it's a scalar containing
a reference, which means that $ref_to_AoA always needs it.
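The arrow rule can be checked with a small, made-up structure. All three accesses below reach the same element; only the arrow directly after C<$ref_to_AoA> is mandatory.

```perl
use strict;
use warnings;

my @AoA = ( [ "fred", "barney" ], [ "homer", "bart", "marge" ] );
my $ref_to_AoA = \@AoA;

# Between adjacent subscripts the arrow is optional ...
my $short = $AoA[1][2];
my $long  = $AoA[1]->[2];
# ... but the arrow right after a scalar holding a reference is required.
my $via_ref = $ref_to_AoA->[1][2];

print "$short $long $via_ref\n";
```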
=head2 Growing Your Own
That's all well and good for declaration of a fixed data structure,
but what if you wanted to add new elements on the fly, or build
it up entirely from scratch?
First, let's look at reading it in from a file. This is something like
adding a row at a time. We'll assume that there's a flat file in which
each line is a row and each word an element. If you're trying to develop an
@AoA array containing all these, here's the right way to do that:
    while (<>) {
        @tmp = split;
        push @AoA, [ @tmp ];
    }

You might also have loaded that from a function:

    for $i ( 1 .. 10 ) {
        $AoA[$i] = [ somefunc($i) ];
    }

Or you might have had a temporary variable sitting around with the
array in it.

    for $i ( 1 .. 10 ) {
        @tmp = somefunc($i);
        $AoA[$i] = [ @tmp ];
    }
It's important you make sure to use the C<[ ]> array reference
constructor. That's because this wouldn't work:
    $AoA[$i] = @tmp;   # WRONG!

That doesn't do what you want because assigning a named array
like that to a scalar evaluates the array in scalar context,
which just counts the number of elements in @tmp.
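A small sketch makes the difference visible. The data is made up; the first assignment stores a count, the second stores a reference to a fresh copy of the array.

```perl
use strict;
use warnings;

my @tmp = ("a", "b", "c");
my @AoA;

$AoA[0] = @tmp;        # scalar context: stores the element count, 3
$AoA[1] = [ @tmp ];    # stores a reference to a new copy of @tmp

$tmp[0] = "z";         # later changes to @tmp do not affect the copy
print "count: $AoA[0], first element of copy: $AoA[1][0]\n";
```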
If you are running under C<use strict> (and if you aren't, why in
the world aren't you?), you'll have to add some declarations to
make it happy:
    use strict;
    my(@AoA, @tmp);
    while (<>) {
        @tmp = split;
        push @AoA, [ @tmp ];
    }

Of course, you don't need the temporary array to have a name at all:

    while (<>) {
        push @AoA, [ split ];
    }

You also don't have to use push(). You could just make a direct assignment
if you knew where you wanted to put it:

    my (@AoA, $i, $line);
    for $i ( 0 .. 10 ) {
        $line = <>;
        $AoA[$i] = [ split " ", $line ];
    }

or even just

    my (@AoA, $i);
    for $i ( 0 .. 10 ) {
        $AoA[$i] = [ split " ", <> ];
    }

You should in general be leery of using functions that could
potentially return lists in scalar context without explicitly stating
such. This would be clearer to the casual reader:

    my (@AoA, $i);
    for $i ( 0 .. 10 ) {
        $AoA[$i] = [ split " ", scalar(<>) ];
    }

If you wanted to have a $ref_to_AoA variable as a reference to an array,
you'd have to do something like this:

    while (<>) {
        push @$ref_to_AoA, [ split ];
    }
Now you can add new rows. What about adding new columns? If you're
dealing with just matrices, it's often easiest to use simple assignment:
    for $x (1 .. 10) {
        for $y (1 .. 10) {
            $AoA[$x][$y] = func($x, $y);
        }
    }

    for $x ( 3, 7, 9 ) {
        $AoA[$x][20] += func2($x);
    }
It doesn't matter whether those elements are already
there or not: it'll gladly create them for you, setting
intervening elements to C<undef> as need be.
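To see this in action, here is a sketch that assigns to a single element of an empty array and then inspects what Perl created along the way:

```perl
use strict;
use warnings;

my @AoA;
$AoA[2][3] = "x";            # creates row 2 and pads everything before it

my $rows = @AoA;             # rows 0 and 1 exist but are undef
my $cols = @{ $AoA[2] };     # columns 0..2 of row 2 exist but are undef

print "rows: $rows, columns in row 2: $cols\n";
```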
If you wanted just to append to a row, you'd have
to do something a bit funnier looking:
# add new columns to an existing row
push @{ $AoA[0] }, "wilma", "betty"; # explicit deref
=head2 Access and Printing
Now it's time to print your data structure out. How
are you going to do that? Well, if you want only one
of the elements, it's trivial:
print $AoA[0][0];
If you want to print the whole thing, though, you can't
say
print @AoA; # WRONG
because you'll get just references listed, and perl will never
automatically dereference things for you. Instead, you have to
roll yourself a loop or two. This prints the whole structure,
using the shell-style for() construct to loop across the outer
set of subscripts.
    for $aref ( @AoA ) {
        say "\t [ @$aref ],";
    }

If you wanted to keep track of subscripts, you might do this:

    for $i ( 0 .. $#AoA ) {
        say "\t elt $i is [ @{$AoA[$i]} ],";
    }

or maybe even this. Notice the inner loop.

    for $i ( 0 .. $#AoA ) {
        for $j ( 0 .. $#{$AoA[$i]} ) {
            say "elt $i $j is $AoA[$i][$j]";
        }
    }

As you can see, it's getting a bit complicated. That's why
sometimes it's easier to take a temporary on your way through:

    for $i ( 0 .. $#AoA ) {
        $aref = $AoA[$i];
        for $j ( 0 .. $#{$aref} ) {
            say "elt $i $j is $aref->[$j]";
        }
    }

Hmm... that's still a bit ugly. How about this:

    for $i ( 0 .. $#AoA ) {
        $aref = $AoA[$i];
        $n = @$aref - 1;
        for $j ( 0 .. $n ) {
            say "elt $i $j is $aref->[$j]";
        }
    }
When you get tired of writing a custom print for your data structures,
you might look at the standard L<Dumpvalue> or L<Data::Dumper> modules.
The former is what the Perl debugger uses, while the latter generates
parsable Perl code. For example:
    use v5.14;     # using the + prototype, new to v5.14

    sub show(+) {
        require Dumpvalue;
        state $prettily = new Dumpvalue::
                            tick        => q("),
                            compactDump => 1,  # comment these two lines out
                            veryCompact => 1,  # if you want a bigger dump
                        ;
        dumpValue $prettily @_;
    }

    # Assign a list of array references to an array.
    my @AoA = (
        [ "fred", "barney" ],
        [ "george", "jane", "elroy" ],
        [ "homer", "marge", "bart" ],
    );
    push @{ $AoA[0] }, "wilma", "betty";
    show @AoA;
will print out:
    0  0..3  "fred" "barney" "wilma" "betty"
    1  0..2  "george" "jane" "elroy"
    2  0..2  "homer" "marge" "bart"
Whereas if you comment out the two lines I said you might wish to,
then it shows it to you this way instead:
    0  ARRAY(0x8031d0)
       0  "fred"
       1  "barney"
       2  "wilma"
       3  "betty"
    1  ARRAY(0x803d40)
       0  "george"
       1  "jane"
       2  "elroy"
    2  ARRAY(0x803e10)
       0  "homer"
       1  "marge"
       2  "bart"
=head2 Slices
If you want to get at a slice (part of a row) in a multidimensional
array, you're going to have to do some fancy subscripting. That's
because while we have a nice synonym for single elements via the
pointer arrow for dereferencing, no such convenience exists for slices.
Here's how to do one operation using a loop. We'll assume an @AoA
variable as before.
    @part = ();
    $x = 4;
    for ($y = 7; $y < 13; $y++) {
        push @part, $AoA[$x][$y];
    }

That same loop could be replaced with a slice operation:

    @part = @{$AoA[4]}[7..12];

or spaced out a bit:

    @part = @{ $AoA[4] } [ 7..12 ];
But as you might well imagine, this can get pretty rough on the reader.
Ah, but what if you wanted a I<two-dimensional slice>, such as having
$x run from 4..8 and $y run from 7 to 12? Hmm... here's the simple way:
    @newAoA = ();
    for ($startx = $x = 4; $x <= 8; $x++) {
        for ($starty = $y = 7; $y <= 12; $y++) {
            $newAoA[$x - $startx][$y - $starty] = $AoA[$x][$y];
        }
    }

We can reduce some of the looping through slices:

    for ($x = 4; $x <= 8; $x++) {
        push @newAoA, [ @{ $AoA[$x] } [ 7..12 ] ];
    }

If you were into Schwartzian Transforms, you would probably
have selected map for that:

    @newAoA = map { [ @{ $AoA[$_] } [ 7..12 ] ] } 4 .. 8;
Although if your manager accused you of seeking job security (or rapid
insecurity) through inscrutable code, it would be hard to argue. :-)
If I were you, I'd put that in a function:
    @newAoA = splice_2D( \@AoA, 4 => 8, 7 => 12 );
    sub splice_2D {
        my $lrr = shift;        # ref to array of array refs!
        my ($x_lo, $x_hi,
            $y_lo, $y_hi) = @_;

        return map {
            [ @{ $lrr->[$_] } [ $y_lo .. $y_hi ] ]
        } $x_lo .. $x_hi;
    }
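For a version that runs on its own, the sketch below repeats the function and applies it to a small made-up matrix, where each value encodes its row and column so the result is easy to check by eye. The variable C<@block> is introduced only for this example.

```perl
use strict;
use warnings;

sub splice_2D {
    my $lrr = shift;                        # ref to array of array refs
    my ($x_lo, $x_hi, $y_lo, $y_hi) = @_;
    return map { [ @{ $lrr->[$_] }[ $y_lo .. $y_hi ] ] } $x_lo .. $x_hi;
}

my @AoA = (
    [  0,  1,  2,  3 ],
    [ 10, 11, 12, 13 ],
    [ 20, 21, 22, 23 ],
);

# Rows 1..2, columns 2..3 of the matrix above.
my @block = splice_2D( \@AoA, 1 => 2, 2 => 3 );
print "[ @$_ ]\n" for @block;
```

Despite its name, the function copies the selected elements and leaves the original array untouched.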
=head1 SEE ALSO
L<perldata>, L<perlref>, L<perldsc>
=head1 AUTHOR
Tom Christiansen <F<tchrist@perl.com>>
Last update: Tue Apr 26 18:30:55 MDT 2011
=head1 NAME
X<operator>
perlop - Perl operators and precedence
=head1 DESCRIPTION
In Perl, the operator determines what operation is performed,
independent of the type of the operands. For example S<C<$x + $y>>
is always a numeric addition, and if C<$x> or C<$y> do not contain
numbers, an attempt is made to convert them to numbers first.
This is in contrast to many other dynamic languages, where the
operation is determined by the type of the first argument. It also
means that Perl has two versions of some operators, one for numeric
and one for string comparison. For example S<C<$x == $y>> compares
two numbers for equality, and S<C<$x eq $y>> compares two strings.
There are a few exceptions though: C<x> can be either string
repetition or list repetition, depending on the type of the left
operand, and C<&>, C<|>, C<^> and C<~> can be either string or numeric bit
operations.
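A minimal sketch of the point above: the operator, not the operand, picks
the operation. The variable names are illustrative only.

```perl
use strict;
use warnings;

my ($x, $y) = ("10", "9.0");

my $sum    = $x + $y;                      # numeric addition: 19
my $joined = $x . $y;                      # string concatenation: "109.0"

my $num_eq = ($y == 9)   ? "yes" : "no";   # numeric comparison: yes
my $str_eq = ($y eq "9") ? "yes" : "no";   # string comparison: no

my $row  = "-" x 3;                        # string repetition: "---"
my @list = (1, 2) x 2;                     # list repetition: (1, 2, 1, 2)

print "$sum $joined $num_eq $str_eq $row @list\n";
```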
=head2 Operator Precedence and Associativity
X<operator, precedence> X<precedence> X<associativity>
Operator precedence and associativity work in Perl more or less like
they do in mathematics.
I<Operator precedence> means some operators are evaluated before
others. For example, in S<C<2 + 4 * 5>>, the multiplication has higher
precedence so S<C<4 * 5>> is evaluated first yielding S<C<2 + 20 ==
22>> and not S<C<6 * 5 == 30>>.
I<Operator associativity> defines what happens if a sequence of the
same operators is used one after another: whether the evaluator will
evaluate the left operations first, or the right first. For example, in
S<C<8 - 4 - 2>>, subtraction is left associative so Perl evaluates the
expression left to right. S<C<8 - 4>> is evaluated first making the
expression S<C<4 - 2 == 2>> and not S<C<8 - 2 == 6>>.
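The two rules can be checked directly. This short sketch uses only
arithmetic operators; the C<**> lines rely on its right associativity as
listed in the precedence table of this document.

```perl
use strict;
use warnings;

my $prec_default = 2 + 4 * 5;       # multiplication first: 22
my $prec_forced  = (2 + 4) * 5;     # parentheses override precedence: 30

my $left_default = 8 - 4 - 2;       # left associative: (8 - 4) - 2 == 2
my $left_forced  = 8 - (4 - 2);     # 6

my $right_default = 2 ** 3 ** 2;    # right associative: 2 ** (3 ** 2) == 512
my $right_forced  = (2 ** 3) ** 2;  # 64

print "$prec_default $prec_forced $left_default $left_forced ",
      "$right_default $right_forced\n";
```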
Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy. (This makes learning Perl easier
for C folks.) With very few exceptions, these all operate on scalar
values only, not array values.
    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp ~~
    left        &
    left        | ^
    left        &&
    left        || //
    nonassoc    .. ...
    right       ?:
    right       = += -= *= etc. goto last next redo dump
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor
In the following sections, these operators are covered in detail, in the
same order in which they appear in the table above.
Many operators can be overloaded for objects. See L<overload>.
=head2 Terms and List Operators (Leftward)
X<list operator> X<operator, list> X<term>
TERMs have the highest precedence in Perl. They include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.
If any list operator (C<print()>, etc.) or any unary operator (C<chdir()>, etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.
In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in
@ary = (1, 3, sort 4, 2);
print @ary; # prints 1324
the commas on the right of the C<sort> are evaluated before the C<sort>,
but the commas on the left are evaluated after. In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:
# These evaluate exit before doing the print:
print($foo, exit); # Obviously not what you want.
print $foo, exit; # Nor is this.
# These do the print before evaluating exit:
(print $foo), exit; # This is what you want.
print($foo), exit; # Or this.
print ($foo), exit; # Or even this.
Also note that
print ($foo & 255) + 1, "\n";
probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for C<print> which is evaluated (printing
the result of S<C<$foo & 255>>). Then one is added to the return value
of C<print> (usually 1). The result is something like this:
1 + 1, "\n"; # Obviously not what you meant.
To do what you meant properly, you must write:
print(($foo & 255) + 1, "\n");
See L</Named Unary Operators> for more discussion of this.
Also parsed as terms are the S<C<do {}>> and S<C<eval {}>> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.
See also L</Quote and Quote-like Operators> toward the end of this section,
as well as L</"I/O Operators">.
=head2 The Arrow Operator
X<arrow> X<dereference> X<< -> >>
"C<< -> >>" is an infix dereference operator, just as it is in C
and C++. If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See L<perlreftut> and L<perlref>.
Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.
The dereferencing cases (as opposed to method-calling cases) are
somewhat extended by the C<postderef> feature. For the
details of that feature, consult L<perlref/Postfix Dereference Syntax>.
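The following sketch shows the arrow used for each kind of subscript and
for method calls. The C<Counter> class and its methods are hypothetical,
invented here purely for illustration.

```perl
use strict;
use warnings;

my $aref = [10, 20, 30];
my $href = { color => "red" };
my $cref = sub { return "called with @_" };

my $elem  = $aref->[1];        # array element: 20
my $value = $href->{color};    # hash element: "red"
my $out   = $cref->(1, 2);     # subroutine call: "called with 1 2"

{
    package Counter;           # hypothetical class, for illustration only
    sub new  { my ($class) = @_; return bless { n => 0 }, $class }
    sub bump { my ($self)  = @_; return ++$self->{n} }
}

my $obj    = Counter->new;     # class name on the left
my $first  = $obj->bump;       # object on the left: 1
my $method = "bump";
my $second = $obj->$method;    # method name held in a scalar: 2

print "$elem $value $out $first $second\n";
```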
=head2 Auto-increment and Auto-decrement
X<increment> X<auto-increment> X<++> X<decrement> X<auto-decrement> X<-->
C<"++"> and C<"--"> work as in C. That is, if placed before a variable,
they increment or decrement the variable by one before returning the
value, and if placed after, increment or decrement after returning the
value.
$i = 0; $j = 0;
print $i++; # prints 0
print ++$j; # prints 1
Note that just as in C, Perl doesn't define B<when> the variable is
incremented or decremented. You just know it will be done sometime
before or after the value is returned. This also means that modifying
a variable twice in the same statement will lead to undefined behavior.
Avoid statements like:
$i = $i ++;
print ++ $i + $i ++;
Perl will not guarantee what the result of the above statements is.
The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:
print ++($foo = "99"); # prints "100"
print ++($foo = "a0"); # prints "a1"
print ++($foo = "Az"); # prints "Ba"
print ++($foo = "zz"); # prints "aaa"
C<undef> is always treated as numeric, and in particular is changed
to C<0> before incrementing (so that a post-increment of an undef value
will return C<0> rather than C<undef>).
The auto-decrement operator is not magical.
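As a small sketch of the contrast, the following increments two strings
magically, post-increments an undefined value, and then decrements a
string, which is simply converted to a number. The C<numeric> warning is
silenced only for the decrement, since C<"aa"> is not a number.

```perl
use strict;
use warnings;

my $id = "Az";
$id++;                        # magical string increment: "Ba"

my $ver = "a9";
$ver++;                       # carry across the digit: "b0"

my $unset;
my $before = $unset++;        # undef is treated as 0, so $before is 0

my $s = "aa";
{
    no warnings 'numeric';    # "aa" isn't numeric
    $s--;                     # no magic: "aa" becomes 0, then -1
}

print "$id $ver $before $unset $s\n";
```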
=head2 Exponentiation
X<**> X<exponentiation> X<power>
Binary C<"**"> is the exponentiation operator. It binds even more
tightly than unary minus, so C<-2**4> is C<-(2**4)>, not C<(-2)**4>.
(This is
implemented using C's C<pow(3)> function, which actually works on doubles
internally.)
Note that certain exponentiation expressions are ill-defined:
these include C<0**0>, C<1**Inf>, and C<Inf**0>. Do not expect
any particular results from these special cases; the results
are platform-dependent.
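A brief sketch of how C<**> binds relative to unary minus, plus a negative
exponent to show that the result need not be an integer:

```perl
use strict;
use warnings;

my $neg   = -2 ** 4;      # parsed as -(2 ** 4): -16
my $paren = (-2) ** 4;    # parentheses force the other reading: 16
my $half  = 2 ** -1;      # fractional result: 0.5

print "$neg $paren $half\n";
```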
=head2 Symbolic Unary Operators
X<unary operator> X<operator, unary>
Unary C<"!"> performs logical negation, that is, "not". See also
L<C<not>|/Logical Not> for a lower precedence version of this.
X<!>
Unary C<"-"> performs arithmetic negation if the operand is numeric,
including any string that looks like a number. If the operand is
an identifier, a string consisting of a minus sign concatenated
with the identifier is returned. Otherwise, if the string starts
with a plus or minus, a string starting with the opposite sign is
returned. One effect of these rules is that C<-bareword> is equivalent
to the string C<"-bareword">. If, however, the string begins with a
non-alphabetic character (excluding C<"+"> or C<"-">), Perl will attempt
to convert
the string to a numeric, and the arithmetic negation is performed. If the
string cannot be cleanly converted to a numeric, Perl will give the warning
B<Argument "the string" isn't numeric in negation (-) at ...>.
X<-> X<negation, arithmetic>
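The rules for unary minus on strings can be sketched as follows; the
string values are arbitrary examples.

```perl
use strict;
use warnings;

my $number = -"10";      # looks like a number: arithmetic negation, -10
my $ident  = -"foo";     # identifier: minus sign is prepended, "-foo"
my $minus  = -"-foo";    # leading minus becomes plus, "+foo"
my $plus   = -"+foo";    # leading plus becomes minus, "-foo"

print "$number $ident $minus $plus\n";
```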
Unary C<"~"> performs bitwise negation, that is, 1's complement. For
example, S<C<0666 & ~027>> is 0640. (See also L</Integer Arithmetic> and
L</Bitwise String Operators>.) Note that the width of the result is
platform-dependent: C<~0> is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the C<"&"> operator to mask off the excess bits.
X<~> X<negation, binary>
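A short sketch of the masking advice above. Masking C<~0> with C<"&">
gives the same value regardless of the platform's integer width.

```perl
use strict;
use warnings;

my $perms = 0666 & ~027;          # clear the bits set in 027: 0640
my $low16 = ~0 & 0xFFFF;          # keep only the low 16 bits: 65535
my $low32 = ~0 & 0xFFFF_FFFF;     # keep only the low 32 bits: 4294967295

printf "%04o %d %d\n", $perms, $low16, $low32;
```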
When complementing strings, if all characters have ordinal values under
256, then their complements will also. But if they do not, all
characters will be in either 32- or 64-bit complements, depending on your
architecture. So for example, C<~"\x{3B1}"> is C<"\x{FFFF_FC4E}"> on
32-bit machines and C<"\x{FFFF_FFFF_FFFF_FC4E}"> on 64-bit machines.
If the experimental "bitwise" feature is enabled via S<C<use feature
'bitwise'>>, then unary C<"~"> always treats its argument as a number, and an
alternate form of the operator, C<"~.">, always treats its argument as a
string. So C<~0> and C<~"0"> will both give 2**32-1 on 32-bit platforms,
whereas C<~.0> and C<~."0"> will both yield C<"\xff">. This feature
produces a warning unless you use S<C<no warnings 'experimental::bitwise'>>.
Unary C<"+"> has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L</Terms and List Operators (Leftward)>.)
X<+>
Unary C<"\"> creates a reference to whatever follows it. See L<perlreftut>
and L<perlref>. Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.
X<\> X<reference> X<backslash>
=head2 Binding Operators
X<binding> X<operator, binding> X<=~> X<!~>
Binary C<"=~"> binds a scalar expression to a pattern match. Certain operations
search or modify the string C<$_> by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
C<$_>. When used in scalar context, the return value generally indicates the
success of the operation. The exceptions are substitution (C<s///>)
and transliteration (C<y///>) with the C</r> (non-destructive) option,
which cause the B<r>eturn value to be the result of the substitution.
Behavior in list context depends on the particular operator.
See L</"Regexp Quote-Like Operators"> for details and L<perlretut> for
examples using these operators.
If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time. Note that this means that its
contents will be interpolated twice, so
'\\' =~ q'\\';
is not ok, as the regex engine will end up trying to compile the
pattern C<\>, which it will consider a syntax error.
Binary C<"!~"> is just like C<"=~"> except the return value is negated in
the logical sense.
Binary C<"!~"> with a non-destructive substitution (C<s///r>) or transliteration
(C<y///r>) is a syntax error.
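The following sketch exercises each form described in this section. It
assumes Perl 5.14 or later for the C</r> option; the sample text and
variable names are illustrative.

```perl
use v5.14;
use warnings;

my $text = "Hello, World";

my $found  = ($text =~ /World/) ? 1 : 0;   # match succeeds: 1
my $absent = ($text !~ /Perl/)  ? 1 : 0;   # negated match: 1

# /r returns the modified copy and leaves $text untouched
my $copy = $text =~ s/World/Perl/r;        # "Hello, Perl"

# without /r, bind to a copy to avoid changing the original
(my $upper = $text) =~ tr/a-z/A-Z/;        # "HELLO, WORLD"

# tr with an empty replacement list just counts characters
my $count = ($text =~ tr/l//);             # 3

# an expression on the right is used as a pattern at run time
my $pattern = "W.rld";
my $dynamic = ($text =~ $pattern) ? 1 : 0; # 1

say "$found $absent $copy $upper $count $dynamic";
```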
=head2 Multiplicative Operators
X<operator, multiplicative>
Binary C<"*"> multiplies two numbers.
X<*>
Binary C<"/"> divides two numbers.
X</> X<slash>
Binary C<"%"> is the modulo operator, which computes the division
remainder of its first argument with respect to its second argument.
Given integer
operands C<$m> and C<$n>: If C<$n> is positive, then S<C<$m % $n>> is
C<$m> minus the largest multiple of C<$n> less than or equal to
C<$m>. If C<$n> is negative, then S<C<$m % $n>> is C<$m> minus the
smallest multiple of C<$n> that is not less than C<$m> (that is, the
result will be less than or equal to zero). If the operands
C<$m> and C<$n> are floating point values and the absolute value of
C<$n> (that is C<abs($n)>) is less than S<C<(UV_MAX + 1)>>, only
the integer portion of C<$m> and C<$n> will be used in the operation
(Note: here C<UV_MAX> means the maximum of the unsigned integer type).
If the absolute value of the right operand (C<abs($n)>) is greater than
or equal to S<C<(UV_MAX + 1)>>, C<"%"> computes the floating-point remainder
C<$r> in the equation S<C<($r = $m - $i*$n)>> where C<$i> is a certain
integer that makes C<$r> have the same sign as the right operand
C<$n> (B<not> as the left operand C<$m> like C function C<fmod()>)
and the absolute value less than that of C<$n>.
Note that when S<C<use integer>> is in scope, C<"%"> gives you direct access
to the modulo operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.
X<%> X<remainder> X<modulo> X<mod>
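A worked sketch of the sign rules for C<"%">. The result under
S<C<use integer>> is printed but deliberately not annotated, because it
depends on the C compiler.

```perl
use strict;
use warnings;

my $pos_right = -7 % 3;      # -7 minus -9 (largest multiple of 3 <= -7): 2
my $neg_right = 7 % -3;      # 7 minus 9 (smallest multiple of -3 >= 7): -2
my $floats    = 7.9 % 3.1;   # only the integer portions are used: 7 % 3 == 1

# Under "use integer" the C compiler's % is used directly.
my $c_style = do { use integer; -7 % 3 };

print "$pos_right $neg_right $floats $c_style\n";
```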
Binary C<"x"> is the repetition operator. In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses or is a list formed by C<qw/I<STRING>/>, it repeats the list.
If the right operand is zero or negative (raising a warning on
negative), it returns an empty string
or an empty list, depending on the context.
X<x>
print '-' x 80; # print row of dashes
print "\t" x ($tab/8), ' ' x ($tab%8); # tab over
@ones = (1) x 80; # a list of 80 1's
@ones = (5) x @ones; # set all elements to 5
=head2 Additive Operators
X<operator, additive>
Binary C<"+"> returns the sum of two numbers.
X<+>
Binary C<"-"> returns the difference of two numbers.
X<->
Binary C<"."> concatenates two strings.
X<string, concatenation> X<concatenation>
X<cat> X<concat> X<concatenate> X<.>
=head2 Shift Operators
X<shift operator> X<operator, shift> X<<< << >>>
X<<< >> >>> X<right shift> X<left shift> X<bitwise shift>
X<shl> X<shr> X<shift, right> X<shift, left>
Binary C<<< "<<" >>> returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L</Integer Arithmetic>.)
Binary C<<< ">>" >>> returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L</Integer Arithmetic>.)
If S<C<use integer>> (see L</Integer Arithmetic>) is in force then
signed C integers are used (I<arithmetic shift>), otherwise unsigned C
integers are used (I<logical shift>), even for negative shiftees.
In arithmetic right shift the sign bit is replicated on the left,
in logical shift zero bits come in from the left.
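The difference between the two kinds of right shift shows up only for
negative values, as this sketch illustrates. The exact value of the logical
shift depends on the integer width, so it is only described in the comment.

```perl
use strict;
use warnings;

my $left  = 1 << 4;      # 16
my $right = 256 >> 4;    # 16

# Logical shift: -8 is treated as an unsigned integer, so zero bits
# come in from the left and the result is a large positive number.
my $logical = -8 >> 1;

# Arithmetic shift: under "use integer" the sign bit is replicated.
my $arithmetic = do { use integer; -8 >> 1 };    # -4

print "$left $right $logical $arithmetic\n";
```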
Either way, the implementation isn't going to generate results larger
than the size of the integer type Perl was built with (32 bits or 64 bits).
Shifting by a negative number of bits means the reverse shift: left
shift becomes right shift, right shift becomes left shift. This is
unlike in C, where negative shift is undefined.
Shifting by more bits than the size of the integers usually yields
zero (all bits fall off), except that under S<C<use integer>>
right overshifting a negative shiftee results in -1. This is unlike
in C, where shifting by too many bits is undefined. A common C
behavior is "shift by modulo wordbits", so that for example
1 >> 64 == 1 >> (64 % 64) == 1 >> 0 == 1 # Common C behavior.
but that is completely accidental.
If you get tired of being subject to your platform's native integers,
the S<C<use bigint>> pragma neatly sidesteps the issue altogether:
print 20 << 20; # 20971520
print 20 << 40; # 5120 on 32-bit machines,
# 21990232555520 on 64-bit machines
use bigint;
print 20 << 100; # 25353012004564588029934064107520
=head2 Named Unary Operators
X<operator, named unary>
The various named unary operators are treated as functions with one
argument, with optional parentheses.
If any list operator (C<print()>, etc.) or any unary operator (C<chdir()>, etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. For example,
because named unary operators are higher precedence than C<||>:
chdir $foo || die; # (chdir $foo) || die
chdir($foo) || die; # (chdir $foo) || die
chdir ($foo) || die; # (chdir $foo) || die
chdir +($foo) || die; # (chdir $foo) || die
but, because C<"*"> is higher precedence than named operators:
chdir $foo * 20; # chdir ($foo * 20)
chdir($foo) * 20; # (chdir $foo) * 20
chdir ($foo) * 20; # (chdir $foo) * 20
chdir +($foo) * 20; # chdir ($foo * 20)
rand 10 * 20; # rand (10 * 20)
rand(10) * 20; # (rand 10) * 20
rand (10) * 20; # (rand 10) * 20
rand +(10) * 20; # rand (10 * 20)
Regarding precedence, the filetest operators, like C<-f>, C<-M>, etc. are
treated like named unary operators, but they don't follow this functional
parenthesis rule. That means, for example, that C<-f($file).".bak"> is
equivalent to S<C<-f "$file.bak">>.
X<-X> X<filetest> X<operator, filetest>
See also L</"Terms and List Operators (Leftward)">.
=head2 Relational Operators
X<relational operator> X<operator, relational>
Perl operators that return true or false generally return values
that can be safely used as numbers. For example, the relational
operators in this section and the equality operators in the next
one return C<1> for true and, for false, a special version of the defined empty
string, C<"">, which counts as a zero but is exempt from warnings
about improper numeric conversions, just as S<C<"0 but true">> is.
Binary C<< "<" >> returns true if the left argument is numerically less than
the right argument.
X<< < >>
Binary C<< ">" >> returns true if the left argument is numerically greater
than the right argument.
X<< > >>
Binary C<< "<=" >> returns true if the left argument is numerically less than
or equal to the right argument.
X<< <= >>
Binary C<< ">=" >> returns true if the left argument is numerically greater
than or equal to the right argument.
X<< >= >>
Binary C<"lt"> returns true if the left argument is stringwise less than
the right argument.
X<< lt >>
Binary C<"gt"> returns true if the left argument is stringwise greater
than the right argument.
X<< gt >>
Binary C<"le"> returns true if the left argument is stringwise less than
or equal to the right argument.
X<< le >>
Binary C<"ge"> returns true if the left argument is stringwise greater
than or equal to the right argument.
X<< ge >>
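A sketch contrasting the numeric and stringwise operators on the same
values, and showing what the false value looks like as a string and as a
number:

```perl
use strict;
use warnings;

my $numeric  = (10 < 9)      ? "true" : "false";   # 10 is not less than 9: false
my $stringy  = ("10" lt "9") ? "true" : "false";   # "1" sorts before "9": true

my $yes = (2 > 1);           # 1
my $no  = (2 < 1);           # the special empty string

my $no_length = length $no;  # 0, it is an empty string
my $no_number = $no + 0;     # 0, and no "isn't numeric" warning is raised

print "$numeric $stringy $yes [$no] $no_length $no_number\n";
```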
=head2 Equality Operators
X<equality> X<equal> X<equals> X<operator, equality>
Binary C<< "==" >> returns true if the left argument is numerically equal to
the right argument.
X<==>
Binary C<< "!=" >> returns true if the left argument is numerically not equal
to the right argument.
X<!=>
Binary C<< "<=>" >> returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports C<NaN>'s (not-a-numbers) as numeric
values, using them with C<< "<=>" >> returns undef. C<NaN> is not
C<< "<" >>, C<< "==" >>, C<< ">" >>, C<< "<=" >> or C<< ">=" >> anything
(even C<NaN>), so those 5 return false. S<C<< NaN != NaN >>> returns
true, as does S<C<NaN !=> I<anything else>>. If your platform doesn't
support C<NaN>'s then C<NaN> is just a string with numeric value 0.
X<< <=> >>
X<spaceship>
$ perl -le '$x = "NaN"; print "No NaN support here" if $x == $x'
$ perl -le '$x = "NaN"; print "NaN support here" if $x != $x'
(Note that the L<bigint>, L<bigrat>, and L<bignum> pragmas all
support C<"NaN">.)
Binary C<"eq"> returns true if the left argument is stringwise equal to
the right argument.
X<eq>
Binary C<"ne"> returns true if the left argument is stringwise not equal
to the right argument.
X<ne>
Binary C<"cmp"> returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.
X<cmp>
Binary C<"~~"> does a smartmatch between its arguments. Smart matching
is described in the next section.
X<~~>
C<"lt">, C<"le">, C<"ge">, C<"gt"> and C<"cmp"> use the collation (sort)
order specified by the current C<LC_COLLATE> locale if a S<C<use
locale>> form that includes collation is in effect. See L<perllocale>.
Do not mix these with Unicode;
use them only with legacy 8-bit locale encodings.
The standard C<L<Unicode::Collate>> and
C<L<Unicode::Collate::Locale>> modules offer much more powerful
solutions to collation issues.
For case-insensitive comparisons, look at the L<perlfunc/fc> case-folding
function, available in Perl v5.16 or later:
if ( fc($x) eq fc($y) ) { ... }
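The three-way operators are most often seen in C<sort> blocks. This sketch
sorts the same list numerically and stringwise, then uses C<fc> for a
case-insensitive test; it assumes Perl v5.16 or later, where
S<C<use v5.16>> enables C<fc>.

```perl
use v5.16;
use warnings;

my @values = (10, 9, 100, 1);

my @numeric = sort { $a <=> $b } @values;    # 1 9 10 100
my @stringy = sort { $a cmp $b } @values;    # 1 10 100 9

my $same = (fc("HELLO") eq fc("hello")) ? 1 : 0;    # 1

say "@numeric | @stringy | $same";
```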
=head2 Smartmatch Operator
First available in Perl 5.10.1 (the 5.10.0 version behaved differently),
binary C<~~> does a "smartmatch" between its arguments. This is mostly
used implicitly in the C<when> construct described in L<perlsyn>, although
not all C<when> clauses call the smartmatch operator. Unique among all of
Perl's operators, the smartmatch operator can recurse. The smartmatch
operator is L<experimental|perlpolicy/experimental> and its behavior is
subject to change.
It is also unique in that all other Perl operators impose a context
(usually string or numeric context) on their operands, autoconverting
those operands to those imposed contexts. In contrast, smartmatch
I<infers> contexts from the actual types of its operands and uses that
type information to select a suitable comparison mechanism.
The C<~~> operator compares its operands "polymorphically", determining how
to compare them according to their actual types (numeric, string, array,
hash, etc.). Like the equality operators with which it shares the same
precedence, C<~~> returns 1 for true and C<""> for false. It is often best
read aloud as "in", "inside of", or "is contained in", because the left
operand is often looked for I<inside> the right operand. That makes the
order of the operands to the smartmatch operator often opposite that of
the regular match operator. In other words, the "smaller" thing is usually
placed in the left operand and the larger one in the right.
The behavior of a smartmatch depends on what type of things its arguments
are, as determined by the following table. The first row of the table
whose types apply determines the smartmatch behavior. Because what
actually happens is mostly determined by the type of the second operand,
the table is sorted on the right operand instead of on the left.
    Left      Right      Description and pseudocode
    ===============================================================
    Any       undef      check whether Any is undefined
                         like: !defined Any
    Any       Object     invoke ~~ overloading on Object, or die

    Right operand is an ARRAY:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY1    ARRAY2     recurse on paired elements of ARRAY1 and ARRAY2[2]
                         like: (ARRAY1[0] ~~ ARRAY2[0])
                                 && (ARRAY1[1] ~~ ARRAY2[1]) && ...
    HASH      ARRAY      any ARRAY elements exist as HASH keys
                         like: grep { exists HASH->{$_} } ARRAY
    Regexp    ARRAY      any ARRAY elements pattern match Regexp
                         like: grep { /Regexp/ } ARRAY
    undef     ARRAY      undef in ARRAY
                         like: grep { !defined } ARRAY
    Any       ARRAY      smartmatch each ARRAY element[3]
                         like: grep { Any ~~ $_ } ARRAY

    Right operand is a HASH:

    Left      Right      Description and pseudocode
    ===============================================================
    HASH1     HASH2      all same keys in both HASHes
                         like: keys HASH1 ==
                                  grep { exists HASH2->{$_} } keys HASH1
    ARRAY     HASH       any ARRAY elements exist as HASH keys
                         like: grep { exists HASH->{$_} } ARRAY
    Regexp    HASH       any HASH keys pattern match Regexp
                         like: grep { /Regexp/ } keys HASH
    undef     HASH       always false (undef can't be a key)
                         like: 0 == 1
    Any       HASH       HASH key existence
                         like: exists HASH->{Any}

    Right operand is CODE:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY     CODE       sub returns true on all ARRAY elements[1]
                         like: !grep { !CODE->($_) } ARRAY
    HASH      CODE       sub returns true on all HASH keys[1]
                         like: !grep { !CODE->($_) } keys HASH
    Any       CODE       sub passed Any returns true
                         like: CODE->(Any)

    Right operand is a Regexp:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY     Regexp     any ARRAY elements match Regexp
                         like: grep { /Regexp/ } ARRAY
    HASH      Regexp     any HASH keys match Regexp
                         like: grep { /Regexp/ } keys HASH
    Any       Regexp     pattern match
                         like: Any =~ /Regexp/

    Other:

    Left      Right      Description and pseudocode
    ===============================================================
    Object    Any        invoke ~~ overloading on Object,
                         or fall back to...
    Any       Num        numeric equality
                         like: Any == Num
    Num       nummy[4]   numeric equality
                         like: Num == nummy
    undef     Any        check whether undefined
                         like: !defined(Any)
    Any       Any        string equality
                         like: Any eq Any

Notes:
Notes:
=over
=item 1.
Empty hashes or arrays match.
=item 2.
That is, each element smartmatches the element of the same index in the other array.[3]
=item 3.
If a circular reference is found, fall back to referential equality.
=item 4.
Either an actual number, or a string that looks like one.
=back
The smartmatch implicitly dereferences any non-blessed hash or array
reference, so the C<I<HASH>> and C<I<ARRAY>> entries apply in those cases.
For blessed references, the C<I<Object>> entries apply. Smartmatches
involving hashes only consider hash keys, never hash values.
The "like" code entry is not always an exact rendition. For example, the
smartmatch operator short-circuits whenever possible, but C<grep> does
not. Also, C<grep> in scalar context returns the number of matches, but
C<~~> returns only true or false.
Unlike most operators, the smartmatch operator knows to treat C<undef>
specially:
use v5.10.1;
@array = (1, 2, 3, undef, 4, 5);
say "some elements undefined" if undef ~~ @array;
Each operand is considered in a modified scalar context, the modification
being that array and hash variables are passed by reference to the
operator, which implicitly dereferences them. Both elements
of each pair are the same:
    use v5.10.1;
    my %hash = (red    => 1, blue   => 2, green  => 3,
                orange => 4, yellow => 5, purple => 6,
                black  => 7, grey   => 8, white  => 9);
    my @array = qw(red blue green);
    say "some array elements in hash keys" if  @array ~~  %hash;
    say "some array elements in hash keys" if \@array ~~ \%hash;
    say "red in array" if "red" ~~  @array;
    say "red in array" if "red" ~~ \@array;
    say "some keys end in e" if /e$/ ~~  %hash;
    say "some keys end in e" if /e$/ ~~ \%hash;
Two arrays smartmatch if each element in the first array smartmatches
(that is, is "in") the corresponding element in the second array,
recursively.
    use v5.10.1;
    my @little = qw(red blue green);
    my @bigger = ("red", "blue", [ "orange", "green" ] );
    if (@little ~~ @bigger) {  # true!
        say "little is contained in bigger";
    }
Because the smartmatch operator recurses on nested arrays, this
will still report that "red" is in the array.
use v5.10.1;
my @array = qw(red blue green);
my $nested_array = [[[[[[[ @array ]]]]]]];
say "red in array" if "red" ~~ $nested_array;
If two arrays smartmatch each other, then they are deep
copies of each other's values, as this example reports:
    use v5.12.0;
    my @a = (0, 1, 2, [3, [4, 5], 6], 7);
    my @b = (0, 1, 2, [3, [4, 5], 6], 7);
    if (@a ~~ @b && @b ~~ @a) {
        say "a and b are deep copies of each other";
    }
    elsif (@a ~~ @b) {
        say "a smartmatches in b";
    }
    elsif (@b ~~ @a) {
        say "b smartmatches in a";
    }
    else {
        say "a and b don't smartmatch each other at all";
    }
If you were to set S<C<$b[3] = 4>>, then instead of reporting that "a and b
are deep copies of each other", it now reports that C<"b smartmatches in a">.
That's because the corresponding position in C<@a> contains an array that
(eventually) has a 4 in it.
Smartmatching one hash against another reports whether both contain the
same keys, no more and no less. This could be used to see whether two
records have the same field names, without caring what values those fields
might have. For example:
    use v5.10.1;
    sub make_dogtag {
        state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };
        my ($class, $init_fields) = @_;
        die "Must supply (only) name, rank, and serial number"
            unless $init_fields ~~ $REQUIRED_FIELDS;
        ...
    }
However, this only does what you mean if C<$init_fields> is indeed a hash
reference. The condition C<$init_fields ~~ $REQUIRED_FIELDS> also allows the
strings C<"name">, C<"rank">, C<"serial_num"> as well as any array reference
that contains C<"name"> or C<"rank"> or C<"serial_num"> anywhere to pass
through.
The smartmatch operator is most often used as the implicit operator of a
C<when> clause. See the section on "Switch Statements" in L<perlsyn>.
=head3 Smartmatching of Objects
To avoid relying on an object's underlying representation, if the
smartmatch's right operand is an object that doesn't overload C<~~>,
it raises the exception "C<Smartmatching a non-overloaded object
breaks encapsulation>". That's because one has no business digging
around to see whether something is "in" an object. These are all
illegal on objects without a C<~~> overload:
%hash ~~ $object
42 ~~ $object
"fred" ~~ $object
However, you can change the way an object is smartmatched by overloading
the C<~~> operator. This is allowed to
extend the usual smartmatch semantics.
For objects that do have a C<~~> overload, see L<overload>.
Using an object as the left operand is allowed, although not very useful.
Smartmatching rules take precedence over overloading, so even if the
object in the left operand has smartmatch overloading, this will be
ignored. A left operand that is a non-overloaded object falls back on a
string or numeric comparison of whatever the C<ref> operator returns. That
means that
$object ~~ X
does I<not> invoke the overload method with C<I<X>> as an argument.
Instead the above table is consulted as normal, and based on the type of
C<I<X>>, overloading may or may not be invoked. For simple strings or
numbers, "in" becomes equivalent to this:
$object ~~ $number ref($object) == $number
$object ~~ $string ref($object) eq $string
For example, this reports that the handle smells IOish
(but please don't really do this!):
    use v5.10.1;            # enables say
    use IO::Handle;
    my $fh = IO::Handle->new();
    if ($fh ~~ /\bIO\b/) {
        say "handle smells IOish";
    }
That's because it treats C<$fh> as a string like
C<"IO::Handle=GLOB(0x8039e0)">, then pattern matches against that.
=head2 Bitwise And
X<operator, bitwise, and> X<bitwise and> X<&>
Binary C<"&"> returns its operands ANDed together bit by bit. Although no
warning is currently raised, the result is not well defined when this operation
is performed on operands that are neither numbers (see
L</Integer Arithmetic>) nor bitstrings (see L</Bitwise String Operators>).
Note that C<"&"> has lower priority than relational operators, so for example
the parentheses are essential in a test like
print "Even\n" if ($x & 1) == 0;
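To see why the parentheses matter, this sketch evaluates the test both
ways for an even number. The C<precedence> warning category is disabled
around the unparenthesized form, since Perl may warn about exactly this
mistake.

```perl
use strict;
use warnings;

my $x = 2;

# Intended test: mask the low bit, then compare with zero.
my $correct = (($x & 1) == 0) ? "even" : "odd";     # "even"

# Without the inner parentheses this is $x & (1 == 0): the comparison
# is false, so the AND yields a false value and the test fails.
my $mistaken;
{
    no warnings 'precedence';
    $mistaken = ($x & 1 == 0) ? "even" : "odd";     # "odd"
}

print "$correct $mistaken\n";
```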
If the experimental "bitwise" feature is enabled via S<C<use feature
'bitwise'>>, then this operator always treats its operands as numbers. This
feature produces a warning unless you also use C<S<no warnings
'experimental::bitwise'>>.
=head2 Bitwise Or and Exclusive Or
X<operator, bitwise, or> X<bitwise or> X<|> X<operator, bitwise, xor>
X<bitwise xor> X<^>
Binary C<"|"> returns its operands ORed together bit by bit.
Binary C<"^"> returns its operands XORed together bit by bit.
Although no warning is currently raised, the results are not well
defined when these operations are performed on operands that are neither
numbers (see L</Integer Arithmetic>) nor bitstrings (see L</Bitwise String
Operators>).
Note that C<"|"> and C<"^"> have lower priority than relational operators, so
for example the parentheses are essential in a test like
print "false\n" if (8 | 2) != 10;
If the experimental "bitwise" feature is enabled via S<C<use feature
'bitwise'>>, then these operators always treat their operands as numbers. This
feature produces a warning unless you also use S<C<no warnings
'experimental::bitwise'>>.
=head2 C-style Logical And
X<&&> X<logical and> X<operator, logical, and>
Binary C<"&&"> performs a short-circuit logical AND operation. That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.
=head2 C-style Logical Or
X<||> X<operator, logical, or>
Binary C<"||"> performs a short-circuit logical OR operation. That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.
=head2 Logical Defined-Or
X<//> X<operator, logical, defined-or>
Although it has no direct equivalent in C, Perl's C<//> operator is related
to its C-style "or". In fact, it's exactly the same as C<||>, except that it
tests the left hand side's definedness instead of its truth. Thus,
S<C<< EXPR1 // EXPR2 >>> returns the value of C<< EXPR1 >> if it's defined,
otherwise, the value of C<< EXPR2 >> is returned.
(C<< EXPR1 >> is evaluated in scalar context, C<< EXPR2 >>
in the context of C<< // >> itself). Usually,
this is the same result as S<C<< defined(EXPR1) ? EXPR1 : EXPR2 >>> (except that
the ternary-operator form can be used as a lvalue, while S<C<< EXPR1 // EXPR2 >>>
cannot). This is very useful for
providing default values for variables. If you actually want to test if
at least one of C<$x> and C<$y> is defined, use S<C<defined($x // $y)>>.
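The following sketch contrasts C<//> with C<||> when supplying defaults. The hash C<%opt> and its keys are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

my %opt = (retries => 0);

# || tests truth, so a legitimate value of 0 is replaced by the default.
my $with_or  = $opt{retries} || 3;

# // tests definedness, so the defined value 0 is kept.
my $with_dor = $opt{retries} // 3;

# A missing key is undefined, so the default is used.
my $timeout  = $opt{timeout} // 30;

print "$with_or $with_dor $timeout\n";
```

This is the usual reason to prefer C<//> for defaults whenever 0 or the empty string is a meaningful value.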
The C<||>, C<//> and C<&&> operators return the last value evaluated
(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
portable way to find out the home directory might be:
$home = $ENV{HOME}
// $ENV{LOGDIR}
// (getpwuid($<))[7]
// die "You're homeless!\n";
In particular, this means that you shouldn't use this
for selecting between two aggregates for assignment:
@a = @b || @c; # This doesn't do the right thing
@a = scalar(@b) || @c; # because it really means this.
@a = @b ? @b : @c; # This works fine, though.
As alternatives to C<&&> and C<||> when used for
control flow, Perl provides the C<and> and C<or> operators (see below).
The short-circuit behavior is identical. The precedence of C<"and">
and C<"or"> is much lower, however, so that you can safely use them after a
list operator without the need for parentheses:
unlink "alpha", "beta", "gamma"
or gripe(), next LINE;
With the C-style operators that would have been written like this:
unlink("alpha", "beta", "gamma")
|| (gripe(), next LINE);
It would be even more readable to write that this way:
unless(unlink("alpha", "beta", "gamma")) {
gripe();
next LINE;
}
Using C<"or"> for assignment is unlikely to do what you want; see below.
=head2 Range Operators
X<operator, range> X<range> X<..> X<...>
Binary C<".."> is the range operator, which is really two different
operators depending on the context. In list context, it returns a
list of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
S<C<foreach (1..10)>> loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
like this:
for (1 .. 1_000_000) {
# code
}
The range operator also works on strings, using the magical
auto-increment, see below.
In scalar context, C<".."> returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma)
operator of B<sed>, B<awk>, and various editors. Each C<".."> operator
maintains its own boolean state, even across calls to a subroutine
that contains it. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. It doesn't become false till the next time the range operator
is evaluated. It can test the right operand and become false on the
same evaluation it became true (as in B<awk>), but it still returns
true once. If you don't want it to test the right operand until the
next evaluation, as in B<sed>, just use three dots (C<"...">) instead of
two. In all other regards, C<"..."> behaves just like C<".."> does.
The right operand is not evaluated while the operator is in the
"false" state, and the left operand is not evaluated while the
operator is in the "true" state. The precedence is a little lower
than || and &&. The value returned is either the empty string for
false, or a sequence number (beginning with 1) for true. The sequence
number is reset for each range encountered. The final sequence number
in a range has the string C<"E0"> appended to it, which doesn't affect
its numeric value, but gives you something to search for if you want
to exclude the endpoint. You can exclude the beginning point by
waiting for the sequence number to be greater than 1.
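A short sketch of using the sequence number and the C<"E0"> suffix to exclude both endpoints of a range. The C<BEGIN>/C<END> marker lines are made up for this example:

```perl
use strict;
use warnings;

my @lines = ("intro", "BEGIN", "a", "b", "END", "outro");
my @inner;

for (@lines) {
    # Scalar assignment puts ".." in scalar context (flip-flop mode).
    my $seq = /^BEGIN$/ .. /^END$/;
    next unless $seq;          # outside the range
    next if $seq == 1;         # first line of the range
    next if $seq =~ /E0$/;     # last line of the range
    push @inner, $_;
}

print "@inner\n";
```

Only the lines strictly between the two markers are collected.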
If either operand of scalar C<".."> is a constant expression,
that operand is considered true if it is equal (C<==>) to the current
input line number (the C<$.> variable).
To be pedantic, the comparison is actually S<C<int(EXPR) == int(EXPR)>>,
but that is only an issue if you use a floating point expression; when
implicitly using C<$.> as described in the previous paragraph, the
comparison is S<C<int(EXPR) == int($.)>> which is only an issue when C<$.>
is set to a floating point value and you are not reading from a file.
Furthermore, S<C<"span" .. "spat">> or S<C<2.18 .. 3.14>> will not do what
you want in scalar context because each of the operands is evaluated
using its integer representation.
Examples:
As a scalar operator:
    if (101 .. 200) { print; } # print 2nd hundred lines, short for
                               #  if ($. == 101 .. $. == 200) { print; }

    next LINE if (1 .. /^$/);  # skip header lines, short for
                               #   next LINE if ($. == 1 .. /^$/);
                               # (typically in a loop labeled LINE)

    s/^/> / if (/^$/ .. eof());  # quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof;
        if ($in_header) {
            # do something
        } else { # in body
            # do something else
        }
    } continue {
        close ARGV if eof;             # reset $. each file
    }
Here's a simple example to illustrate the difference between
the two range operators:
    @lines = (" - Foo",
              "01 - Bar",
              "1 - Baz",
              " - Quux");

    foreach (@lines) {
        if (/0/ .. /1/) {
            print "$_\n";
        }
    }
This program will print only the line containing "Bar". If
the range operator is changed to C<...>, it will also print the
"Baz" line.
And now some examples as a list operator:
for (101 .. 200) { print } # print $_ 100 times
@foo = @foo[0 .. $#foo]; # an expensive no-op
@foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings. You
can say
@alphabet = ("A" .. "Z");
to get all normal letters of the English alphabet, or
$hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
to get a hexadecimal digit, or
@z2 = ("01" .. "31");
print $z2[$mday];
to get dates with leading zeros.
If the final value specified is not in the sequence that the magical
increment would produce, the sequence goes until the next value would
be longer than the final value specified.
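To illustrate the stopping rule, here is a small sketch with arbitrarily chosen endpoints. The first range reaches its final value; the second never can, because magical increment keeps lowercase letters lowercase:

```perl
use strict;
use warnings;

# "z" increments to "aa", so the sequence continues until "ab" is reached.
my @wrap = ("x" .. "ab");

# "B" is never produced from "y"; the list stops before "aa",
# which would be longer than the final value.
my @stopped = ("y" .. "B");

print "@wrap\n";
print "@stopped\n";
```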
As of Perl 5.26, the list-context range operator on strings works as expected
in the scope of L<< S<C<use feature 'unicode_strings'>>|feature/The
'unicode_strings' feature >>. In previous versions, and outside the scope of
that feature, it exhibits L<perlunicode/The "Unicode Bug">: its behavior
depends on the internal encoding of the range endpoint.
If the initial value specified isn't part of a magical increment
sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
only the initial value will be returned. So the following will only
return an alpha:
use charnames "greek";
my @greek_small = ("\N{alpha}" .. "\N{omega}");
To get the 25 traditional lowercase Greek letters, including both sigmas,
you could use this instead:
use charnames "greek";
my @greek_small = map { chr } ( ord("\N{alpha}")
..
ord("\N{omega}")
);
However, because there are I<many> other lowercase Greek characters than
just those, to match lowercase Greek characters in a regular expression,
you could use the pattern C</(?:(?=\p{Greek})\p{Lower})+/> (or the
L<experimental feature|perlrecharclass/Extended Bracketed Character
Classes> C<S</(?[ \p{Greek} & \p{Lower} ])+/>>).
Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
return two elements in list context.
@list = (2.18 .. 3.14); # same as @list = (2 .. 3);
=head2 Conditional Operator
X<operator, conditional> X<operator, ternary> X<ternary> X<?:>
Ternary C<"?:"> is the conditional operator, just as in C. It works much
like an if-then-else. If the argument before the C<?> is true, the
argument before the C<:> is returned, otherwise the argument after the
C<:> is returned. For example:
printf "I have %d dog%s.\n", $n,
($n == 1) ? "" : "s";
Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.
$x = $ok ? $y : $z; # get a scalar
@x = $ok ? @y : @z; # get an array
$x = $ok ? @y : @z; # oops, that's just a count!
The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):
($x_or_y ? $x : $y) = $z;
Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble. For example, this:
$x % 2 ? $x += 10 : $x += 2
Really means this:
(($x % 2) ? ($x += 10) : $x) += 2
Rather than this:
($x % 2) ? ($x += 10) : ($x += 2)
That should probably be written more simply as:
$x += ($x % 2) ? 10 : 2;
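As a sketch of the lvalue form described above, the parenthesized conditional below selects which accumulator receives each number. The variable names are illustrative:

```perl
use strict;
use warnings;

my ($even_sum, $odd_sum) = (0, 0);

for my $n (1 .. 5) {
    # The parentheses make the whole conditional the target of "+=".
    ($n % 2 ? $odd_sum : $even_sum) += $n;
}

print "$odd_sum $even_sum\n";
```

The parentheses are what keep the assignment from being parsed as part of the third operand.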
=head2 Assignment Operators
X<assignment> X<operator, assignment> X<=> X<**=> X<+=> X<*=> X<&=>
X<<< <<= >>> X<&&=> X<-=> X</=> X<|=> X<<< >>= >>> X<||=> X<//=> X<.=>
X<%=> X<^=> X<x=> X<&.=> X<|.=> X<^.=>
C<"="> is the ordinary assignment operator.
Assignment operators work as in C. That is,
$x += 2;
is equivalent to
$x = $x + 2;
although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from C<tie()>. Other assignment operators work similarly.
The following are recognized:
    **=    +=    *=    &=    &.=    <<=    &&=
           -=    /=    |=    |.=    >>=    ||=
           .=    %=    ^=    ^.=           //=
                 x=
Although these are grouped by family, they all have the precedence
of assignment. These combined assignment operators can only operate on
scalars, whereas the ordinary assignment operator can assign to arrays,
hashes, lists and even references. (See L<"Context"|perldata/Context>
and L<perldata/List value constructors>, and L<perlref/Assigning to
References>.)
Unlike in C, the scalar assignment operator produces a valid lvalue.
Modifying an assignment is equivalent to doing the assignment and
then modifying the variable that was assigned to. This is useful
for modifying a copy of something, like this:
($tmp = $global) =~ tr/13579/24680/;
Although as of 5.14, that can also be accomplished this way:
use v5.14;
$tmp = ($global =~ tr/13579/24680/r);
Likewise,
($x += 2) *= 3;
is equivalent to
$x += 2;
$x *= 3;
Similarly, a list assignment in list context produces the list of
lvalues assigned to, and a list assignment in scalar context returns
the number of elements produced by the expression on the right hand
side of the assignment.
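A brief sketch of these assignment behaviors, using made-up data. The empty-list assignment is a common way to count the items produced by the right-hand side:

```perl
use strict;
use warnings;

# List assignment in scalar context returns the number of
# elements on the right-hand side: three matches here.
my $matches = () = "a,b,c" =~ /\w/g;

# The count reflects the right side, even though only two
# variables are assigned to.
my $rhs_count = (my ($first, $second) = (10, 20, 30));

# A scalar assignment is an lvalue, so the copy can be modified at once.
(my $copy = "  padded  ") =~ s/^\s+|\s+$//g;

print "$matches $rhs_count [$copy]\n";
```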
The three dotted bitwise assignment operators (C<&.=> C<|.=> C<^.=>) are new in
Perl 5.22 and experimental. See L</Bitwise String Operators>.
=head2 Comma Operator
X<comma> X<operator, comma> X<,>
Binary C<","> is the comma operator. In scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.
In list context, it's just the list argument separator, and inserts
both its arguments into the list. These arguments are also evaluated
from left to right.
The C<< => >> operator (sometimes pronounced "fat comma") is a synonym
for the comma except that it causes a
word on its left to be interpreted as a string if it begins with a letter
or underscore and is composed only of letters, digits and underscores.
This includes operands that might otherwise be interpreted as operators,
constants, single number v-strings or function calls. If in doubt about
this behavior, the left operand can be quoted explicitly.
Otherwise, the C<< => >> operator behaves exactly as the comma operator
or list argument separator, according to context.
For example:
use constant FOO => "something";
my %h = ( FOO => 23 );
is equivalent to:
my %h = ("FOO", 23);
It is I<NOT>:
my %h = ("something", 23);
The C<< => >> operator is helpful in documenting the correspondence
between keys and values in hashes, and other paired elements in lists.
%hash = ( $key => $value );
login( $username => $password );
The special quoting behavior ignores precedence, and hence may apply to
I<part> of the left operand:
print time.shift => "bbb";
That example prints something like C<"1314363215shiftbbb">, because the
C<< => >> implicitly quotes the C<shift> immediately on its left, ignoring
the fact that C<time.shift> is the entire left operand.
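When the value of a constant is wanted as a hash key, one common approach is to make the left operand something other than a bare word, for example by adding parentheses. A minimal sketch:

```perl
use strict;
use warnings;
use constant FOO => "something";

my %h = (
    FOO   => 1,   # bare word: quoted, so the key is "FOO"
    FOO() => 2,   # not a bare word: the constant is called
);

my @keys = sort keys %h;
print "@keys\n";
```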
=head2 List Operators (Rightward)
X<operator, list, rightward> X<list operator>
On the right side of a list operator, the comma has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
C<"and">, C<"or">, and C<"not">, which may be used to evaluate calls to list
operators without the need for parentheses:
open HANDLE, "< :encoding(UTF-8)", "filename"
or die "Can't open: $!\n";
However, some people find that code harder to read than writing
it with parentheses:
open(HANDLE, "< :encoding(UTF-8)", "filename")
or die "Can't open: $!\n";
in which case you might as well just use the more customary C<"||"> operator:
open(HANDLE, "< :encoding(UTF-8)", "filename")
|| die "Can't open: $!\n";
See also discussion of list operators in L</Terms and List Operators (Leftward)>.
=head2 Logical Not
X<operator, logical, not> X<not>
Unary C<"not"> returns the logical negation of the expression to its right.
It's the equivalent of C<"!"> except for the very low precedence.
=head2 Logical And
X<operator, logical, and> X<and>
Binary C<"and"> returns the logical conjunction of the two surrounding
expressions. It's equivalent to C<&&> except for the very low
precedence. This means that it short-circuits: the right
expression is evaluated only if the left expression is true.
=head2 Logical or and Exclusive Or
X<operator, logical, or> X<operator, logical, xor>
X<operator, logical, exclusive or>
X<or> X<xor>
Binary C<"or"> returns the logical disjunction of the two surrounding
expressions. It's equivalent to C<||> except for the very low precedence.
This makes it useful for control flow:
print FH $data or die "Can't write to FH: $!";
This means that it short-circuits: the right expression is evaluated
only if the left expression is false. Due to its precedence, you must
be careful to avoid using it as a replacement for the C<||> operator.
It usually works out better for flow control than in assignments:
$x = $y or $z; # bug: this is wrong
($x = $y) or $z; # really means this
$x = $y || $z; # better written this way
However, when it's a list-context assignment and you're trying to use
C<||> for control flow, you probably need C<"or"> so that the assignment
takes higher precedence.
@info = stat($file) || die; # oops, scalar sense of stat!
@info = stat($file) or die; # better, now @info gets its due
Then again, you could always use parentheses.
Binary C<"xor"> returns the exclusive-OR of the two surrounding expressions.
It cannot short-circuit (of course).
There is no low precedence operator for defined-OR.
=head2 C Operators Missing From Perl
X<operator, missing from perl> X<&> X<*>
X<typecasting> X<(TYPE)>
Here is what C has that Perl doesn't:
=over 8
=item unary &
Address-of operator. (But see the C<"\"> operator for taking a reference.)
=item unary *
Dereference-address operator. (Perl's prefix dereferencing
operators are typed: C<$>, C<@>, C<%>, and C<&>.)
=item (TYPE)
Type-casting operator.
=back
=head2 Quote and Quote-like Operators
X<operator, quote> X<operator, quote-like> X<q> X<qq> X<qx> X<qw> X<m>
X<qr> X<s> X<tr> X<'> X<''> X<"> X<""> X<//> X<`> X<``> X<<< << >>>
X<escape sequence> X<escape>
While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities. Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them. In the following table, a C<{}> represents
any pair of delimiters you choose.
    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.
Non-bracketing delimiters use the same character fore and aft, but the four
sorts of ASCII brackets (round, angle, square, curly) all nest, which means
that
q{foo{bar}baz}
is the same as
'foo{bar}baz'
Note, however, that this does not always work for quoting Perl code:
$s = q{ if($x eq "}") ... }; # WRONG
is a syntax error. The C<L<Text::Balanced>> module (standard as of v5.8,
and from CPAN before then) is able to do this properly.
There can (and in some cases, must) be whitespace between the operator
and the quoting
characters, except when C<#> is being used as the quoting character.
C<q#foo#> is parsed as the string C<foo>, while S<C<q #foo#>> is the
operator C<q> followed by a comment. Its argument will be taken
from the next line. This allows you to write:
s {foo} # Replace foo
{bar} # with bar.
The cases where whitespace must be used are when the quoting character
is a word character (meaning it matches C</\w/>):
q XfooX # Works: means the string 'foo'
qXfooX # WRONG!
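The generic forms can be sketched as follows. The delimiters and strings are arbitrary choices for illustration:

```perl
use strict;
use warnings;

my $name = "world";

# q{} does not interpolate, and balanced braces nest.
my $single = q{It's {nested} $name};

# qq[] interpolates, and double quotes need no escaping inside it.
my $double = qq[Hello, "$name"!];

# qw() splits on whitespace into a list of words.
my @words = qw(alpha beta gamma);

print "$single\n$double\n@words\n";
```

Choosing a delimiter that does not occur in the text avoids backslash escapes altogether.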
The following escape sequences are available in constructs that interpolate,
and in transliterations:
X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N> X<\N{}>
X<\o{}>
    Sequence     Note  Description
    \t                  tab               (HT, TAB)
    \n                  newline           (NL)
    \r                  return            (CR)
    \f                  form feed         (FF)
    \b                  backspace         (BS)
    \a                  alarm (bell)      (BEL)
    \e                  escape            (ESC)
    \x{263A}     [1,8]  hex char          (example: SMILEY)
    \x1b         [2,8]  restricted range hex char (example: ESC)
    \N{name}     [3]    named Unicode character or character sequence
    \N{U+263D}   [4,8]  Unicode character (example: FIRST QUARTER MOON)
    \c[          [5]    control char      (example: chr(27))
    \o{23072}    [6,8]  octal char        (example: SMILEY)
    \033         [7,8]  restricted range octal char  (example: ESC)
=over 4
=item [1]
The result is the character specified by the hexadecimal number between
the braces. See L</[8]> below for details on which character.
Only hexadecimal digits are valid between the braces. If an invalid
character is encountered, a warning will be issued and the invalid
character and all subsequent characters (valid or invalid) within the
braces will be discarded.
If there are no valid digits between the braces, the generated character is
the NULL character (C<\x{00}>). However, an explicit empty brace (C<\x{}>)
will not cause a warning (currently).
=item [2]
The result is the character specified by the hexadecimal number in the range
0x00 to 0xFF. See L</[8]> below for details on which character.
Only hexadecimal digits are valid following C<\x>. When C<\x> is followed
by fewer than two valid digits, any valid digits will be zero-padded. This
means that C<\x7> will be interpreted as C<\x07>, and a lone C<"\x"> will be
interpreted as C<\x00>. Except at the end of a string, having fewer than
two valid digits will result in a warning. Note that although the warning
says the illegal character is ignored, it is only ignored as part of the
escape and will still be used as the subsequent character in the string.
For example:
    Original    Result    Warns?
    "\x7"       "\x07"    no
    "\x"        "\x00"    no
    "\x7q"      "\x07q"   yes
    "\xq"       "\x00q"   yes
=item [3]
The result is the Unicode character or character sequence given by I<name>.
See L<charnames>.
=item [4]
S<C<\N{U+I<hexadecimal number>}>> means the Unicode character whose Unicode code
point is I<hexadecimal number>.
=item [5]
The character following C<\c> is mapped to some other character as shown in the
table:
    Sequence   Value
      \c@      chr(0)
      \cA      chr(1)
      \ca      chr(1)
      \cB      chr(2)
      \cb      chr(2)
      ...
      \cZ      chr(26)
      \cz      chr(26)
      \c[      chr(27)
                         # See below for chr(28)
      \c]      chr(29)
      \c^      chr(30)
      \c_      chr(31)
      \c?      chr(127) # (on ASCII platforms; see below for link to
                        #  EBCDIC discussion)
In other words, it's the character whose code point is obtained by
uppercasing the character after C<\c> and xor'ing its code point with 64.
C<\c?> is DELETE on ASCII platforms because
S<C<ord("?") ^ 64>> is 127, and
C<\c@> is NULL because the ord of C<"@"> is 64, so xor'ing 64 with itself produces 0.
Also, C<\c\I<X>> yields S<C< chr(28) . "I<X>">> for any I<X>, but cannot come at the
end of a string, because the backslash would be parsed as escaping the end
quote.
On ASCII platforms, the resulting characters from the list above are the
complete set of ASCII controls. This isn't the case on EBCDIC platforms; see
L<perlebcdic/OPERATOR DIFFERENCES> for a full discussion of the
differences between these for ASCII versus EBCDIC platforms.
Use of any other character following the C<"c"> besides those listed above is
discouraged, and as of Perl v5.20, the only characters actually allowed
are the printable ASCII ones, minus the left brace C<"{">. What happens
for any of the allowed other characters is that the value is derived by
xor'ing with the seventh bit, which is 64, and a warning raised if
enabled. Using the non-allowed characters generates a fatal error.
To get platform independent controls, you can use C<\N{...}>.
=item [6]
The result is the character specified by the octal number between the braces.
See L</[8]> below for details on which character.
If a character that isn't an octal digit is encountered, a warning is raised,
and the value is based on the octal digits before it, discarding it and all
following characters up to the closing brace. It is a fatal error if there are
no octal digits at all.
=item [7]
The result is the character specified by the three-digit octal number in the
range 000 to 777 (but best to not use above 077, see next paragraph). See
L</[8]> below for details on which character.
Some contexts allow 2 or even 1 digit, but any usage without exactly
three digits, the first being a zero, may give unintended results. (For
example, in a regular expression it may be confused with a backreference;
see L<perlrebackslash/Octal escapes>.) Starting in Perl 5.14, you may
use C<\o{}> instead, which avoids all these problems. Otherwise, it is best to
use this construct only for ordinals C<\077> and below, remembering to pad to
the left with zeros to make three digits. For larger ordinals, either use
C<\o{}>, or convert to something else, such as to hex and use C<\N{U+}>
(which is portable between platforms with different character sets) or
C<\x{}> instead.
=item [8]
Several constructs above specify a character by a number. That number
gives the character's position in the character set encoding (indexed from 0).
This is called synonymously its ordinal, code position, or code point. Perl
works on platforms that have a native encoding currently of either ASCII/Latin1
or EBCDIC, each of which allows specification of 256 characters. In general, if
the number is 255 (0xFF, 0377) or below, Perl interprets this in the platform's
native encoding. If the number is 256 (0x100, 0400) or above, Perl interprets
it as a Unicode code point and the result is the corresponding Unicode
character. For example C<\x{50}> and C<\o{120}> both are the number 80 in
decimal, which is less than 256, so the number is interpreted in the native
character set encoding. In ASCII the character in the 80th position (indexed
from 0) is the letter C<"P">, and in EBCDIC it is the ampersand symbol C<"&">.
C<\x{100}> and C<\o{400}> are both 256 in decimal, so the number is interpreted
as a Unicode code point no matter what the native encoding is. The name of the
character in the 256th position (indexed from 0) in Unicode is
C<LATIN CAPITAL LETTER A WITH MACRON>.
An exception to the above rule is that S<C<\N{U+I<hex number>}>> is
always interpreted as a Unicode code point, so that C<\N{U+0050}> is C<"P"> even
on EBCDIC platforms.
=back
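The following sketch, which assumes an ASCII platform, checks that several of the notations in the table above denote the same character. It prints only ordinals, so no wide character is ever written to the output:

```perl
use strict;
use warnings;

# Six ways to write ESC, which is chr(27) on ASCII platforms.
my @escapes = ("\e", "\x1b", "\x{1b}", "\033", "\o{33}", "\c[");
my @ords    = map { ord } @escapes;

# Three ways to write U+263A: hex, \N{U+...}, and octal.
my $smiley_same = ("\x{263A}" eq "\N{U+263A}"
                && "\x{263A}" eq "\o{23072}") ? 1 : 0;

print "@ords $smiley_same\n";
```

Note that C<\o{}> requires Perl 5.14 or later.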
B<NOTE>: Unlike C and other languages, Perl has no C<\v> escape sequence for
the vertical tab (VT, which is 11 in both ASCII and EBCDIC), but you may
use C<\N{VT}>, C<\ck>, C<\N{U+0b}>, or C<\x0b>. (C<\v>
does have meaning in regular expression patterns in Perl, see L<perlre>.)
The following escape sequences are available in constructs that interpolate,
but not in transliterations.
X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q> X<\F>
    \l          lowercase next character only
    \u          titlecase (not uppercase!) next character only
    \L          lowercase all characters till \E or end of string
    \U          uppercase all characters till \E or end of string
    \F          foldcase all characters till \E or end of string
    \Q          quote (disable) pattern metacharacters till \E or
                end of string
    \E          end either case modification or quoted section
                (whichever was last seen)
See L<perlfunc/quotemeta> for the exact definition of characters that
are quoted by C<\Q>.
C<\L>, C<\U>, C<\F>, and C<\Q> can stack, in which case you need one
C<\E> for each. For example:
    say "This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
    # prints: This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?
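A few smaller cases may make the individual escapes easier to follow. The strings here are arbitrary examples:

```perl
use strict;
use warnings;

my $name = "jOHN sMITH";

# \u combined with \L: lowercase everything, then titlecase the first character.
my $title = "\u\L$name\E";

# \U applies only up to the matching \E.
my $shout = "\Uquiet\E please";

# \Q backslashes the pattern metacharacters.
my $quoted = "\Qa.b*c\E";

print "$title\n$shout\n$quoted\n";
```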
If a S<C<use locale>> form that includes C<LC_CTYPE> is in effect (see
L<perllocale>), the case map used by C<\l>, C<\L>, C<\u>, and C<\U> is
taken from the current locale. If Unicode (for example, C<\N{}> or code
points of 0x100 or beyond) is being used, the case map used by C<\l>,
C<\L>, C<\u>, and C<\U> is as defined by Unicode. That means that
case-mapping a single character can sometimes produce a sequence of
several characters.
Under S<C<use locale>>, C<\F> produces the same results as C<\L>
for all locales but a UTF-8 one, where it instead uses the Unicode
definition.
All systems use the virtual C<"\n"> to represent a line terminator,
called a "newline". There is no such thing as an unvarying, physical
newline character. It is only an illusion that the operating system,
device drivers, C libraries, and Perl all conspire to preserve. Not all
systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example,
on the ancient Macs (pre-MacOS X) of yesteryear, these used to be reversed,
and on systems without a line terminator,
printing C<"\n"> might emit no actual data. In general, use C<"\n"> when
you mean a "newline" for your system, but use the literal ASCII when you
need an exact character. For example, most networking protocols expect
and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
and although they often accept just C<"\012">, they seldom tolerate just
C<"\015">. If you get in the habit of using C<"\n"> for networking,
you may be burned some day.
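As a sketch of that advice, a protocol line terminator can be spelled out with explicit octal escapes rather than C<"\n">. The request text and host name below are placeholders, not a complete client:

```perl
use strict;
use warnings;

# Explicit CR+LF, independent of what "\n" means on this platform.
my $CRLF = "\015\012";

# The two trailing empty strings yield the blank line that ends the headers.
my $request = join $CRLF, "GET / HTTP/1.0", "Host: example.com", "", "";

my $terminators = () = $request =~ /\015\012/g;
print "terminators: $terminators\n";
```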
X<newline> X<line terminator> X<eol> X<end of line>
X<\n> X<\r> X<\r\n>
For constructs that do interpolate, variables beginning with "C<$>"
or "C<@>" are interpolated. Subscripted variables such as C<$a[3]> or
C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
But method calls such as C<< $obj->meth >> are not.
Interpolating an array or slice interpolates the elements in order,
separated by the value of C<$">, so is equivalent to interpolating
S<C<join $", @array>>. "Punctuation" arrays such as C<@*> are usually
interpolated only if the name is enclosed in braces C<@{*}>, but the
arrays C<@_>, C<@+>, and C<@-> are interpolated even without braces.
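These interpolation rules can be sketched as follows. The array contents are arbitrary, and the C<@{[ ... ]}> construct is a common idiom for interpolating an expression, not a special operator:

```perl
use strict;
use warnings;

my @fruits = ("apple", "pear", "fig");

# Elements are joined with $", which is a single space by default.
my $spaced = "@fruits";

# A localized $" changes the separator for this one string.
my $csv = do { local $" = ","; "@fruits" };

# Array slices interpolate too.
my $slice = "@fruits[0, 1]";

# Arbitrary expressions need the anonymous-array dereference idiom.
my $count = "There are @{[ scalar @fruits ]} fruits";

print "$spaced\n$csv\n$slice\n$count\n";
```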
For double-quoted strings, the quoting from C<\Q> is applied after
interpolation and escapes are processed.
"abc\Qfoo\tbar$s\Exyz"
is equivalent to
"abc" . quotemeta("foo\tbar$s") . "xyz"
For the pattern of regex operators (C<qr//>, C<m//> and C<s///>),
the quoting from C<\Q> is applied after interpolation is processed,
but before escapes are processed. This allows the pattern to match
literally (except for C<$> and C<@>). For example, the following matches:
'\s\t' =~ /\Q\s\t/
Because C<$> or C<@> trigger interpolation, you'll need to use something
like C</\Quser\E\@\Qhost/> to match them literally.
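A small sketch of the effect of C<\Q> on an interpolated variable. The sample strings are chosen so that the unquoted pattern matches something the quoted one does not:

```perl
use strict;
use warnings;

my $literal = "a.b*c";

# With \Q, the metacharacters in $literal match only themselves.
my $quoted_hit  = "a.b*c" =~ /^\Q$literal\E$/ ? 1 : 0;
my $quoted_miss = "aXbbc" =~ /^\Q$literal\E$/ ? 1 : 0;

# Without \Q, "." matches any character and "b*" matches a run of b's.
my $unquoted_hit = "aXbbc" =~ /^$literal$/ ? 1 : 0;

print "$quoted_hit $quoted_miss $unquoted_hit\n";
```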
Patterns are subject to an additional level of interpretation as a
regular expression. This is done as a second pass, after variables are
interpolated, so that regular expressions may be incorporated into the
pattern from the variables. If this is not what you want, use C<\Q> to
interpolate a variable literally.
Apart from the behavior described above, Perl does not expand
multiple levels of interpolation. In particular, contrary to the
expectations of shell programmers, back-quotes do I<NOT> interpolate
within double quotes, nor do single quotes impede evaluation of
variables when used within double quotes.
=head2 Regexp Quote-Like Operators
X<operator, regexp>
Here are the quote-like operators that apply to pattern
matching and related activities.
=over 8
=item C<qr/I<STRING>/msixpodualn>
X<qr> X</i> X</m> X</o> X</s> X</x> X</p>
This operator quotes (and possibly compiles) its I<STRING> as a regular
expression. I<STRING> is interpolated the same way as I<PATTERN>
in C<m/I<PATTERN>/>. If C<"'"> is used as the delimiter, no variable
interpolation is done. Returns a Perl value which may be used instead of the
corresponding C</I<STRING>/msixpodualn> expression. The returned value is a
normalized version of the original pattern. It magically differs from
a string containing the same characters: C<ref(qr/x/)> returns "Regexp";
however, dereferencing it is not well defined (you currently get the
normalized version of the original pattern, but this may change).
For example,
$rex = qr/my.STRING/is;
print $rex; # prints (?^si:my.STRING) on Perl 5.14 and later
s/$rex/foo/;
is equivalent to
s/my.STRING/foo/is;
The result may be used as a subpattern in a match:
$re = qr/$pattern/;
$string =~ /foo${re}bar/; # can be interpolated in other
# patterns
$string =~ $re; # or used standalone
$string =~ /$re/; # or this way
Since Perl may compile the pattern at the moment of execution of the C<qr()>
operator, using C<qr()> may have speed advantages in some situations,
notably if the result of C<qr()> is used standalone:
    sub match {
        my $patterns = shift;
        my @compiled = map qr/$_/i, @$patterns;
        grep {
            my $success = 0;
            foreach my $pat (@compiled) {
                $success = 1, last if /$pat/;
            }
            $success;
        } @_;
    }
Precompilation of the pattern into an internal representation at
the moment of C<qr()> avoids the need to recompile the pattern every
time a match C</$pat/> is attempted. (Perl has many other internal
optimizations, but none would be triggered in the above example if
we did not use C<qr()> operator.)
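A compact sketch of the three ways a C<qr//> value is typically used. The pattern and input strings are invented for this example:

```perl
use strict;
use warnings;

my $number = qr/(\d+)/;

# A qr// value is a reference blessed into the Regexp class.
my $kind = ref $number;

# Interpolated into a larger pattern; its capture group still works.
my ($digits) = "order-1234" =~ /^[a-z]+-$number$/;

# Used standalone as the right operand of =~.
my $standalone = "42" =~ $number ? 1 : 0;

print "$kind $digits $standalone\n";
```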
Options (specified by the following modifiers) are:
    m   Treat string as multiple lines.
    s   Treat string as single line. (Make . match a newline)
    i   Do case-insensitive pattern matching.
    x   Use extended regular expressions; specifying two
        x's means \t and the SPACE character are ignored within
        square-bracketed character classes
    p   When matching preserve a copy of the matched string so
        that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be
        defined (ignored starting in v5.20) as these are always
        defined starting in that release
    o   Compile pattern only once.
    a   ASCII-restrict: Use ASCII for \d, \s, \w and [[:posix:]]
        character classes; specifying two a's adds the further
        restriction that no ASCII character will match a
        non-ASCII one under /i.
    l   Use the current run-time locale's rules.
    u   Use Unicode rules.
    d   Use Unicode or native charset, as in 5.12 and earlier.
    n   Non-capture mode. Don't let () fill in $1, $2, etc...
If a precompiled pattern is embedded in a larger pattern then the effect
of C<"msixpluadn"> will be propagated appropriately. The effect that the
C</o> modifier has is not propagated, being restricted to those patterns
explicitly using it.
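A minimal sketch of that propagation, using made-up strings: the embedded C<qr//> object keeps its own C</i>, while the rest of the enclosing pattern stays case-sensitive.

```perl
use strict;
use warnings;

my $word = qr/perl/i;    # case-insensitive on its own

# The enclosing pattern has no /i, so its literal "v" must be lower case,
# but the embedded $word still matches in any case.
my $embedded_i = ("v5 PERL" =~ /\Av\d+ $word\z/) ? 1 : 0;
my $outer_i    = ("V5 perl" =~ /\Av\d+ $word\z/) ? 1 : 0;
print "$embedded_i $outer_i\n";
```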
The C</a>, C</l>, C</u>, and C</d> modifiers, added in Perl 5.14,
control the character set rules, but C</a> is the only one you are likely
to want to specify explicitly; the other three are selected
automatically by various pragmas.
See L<perlre> for additional information on valid syntax for I<STRING>, and
for a detailed look at the semantics of regular expressions. In
particular, all modifiers except the largely obsolete C</o> are further
explained in L<perlre/Modifiers>. C</o> is described in the next section.
=item C<m/I<PATTERN>/msixpodualngc>
X<m> X<operator, match>
X<regexp, options> X<regexp> X<regex, options> X<regex>
X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c>
=item C</I<PATTERN>/msixpodualngc>
Searches a string for a pattern match, and in scalar context returns
true if it succeeds, false if it fails. If no string is specified
via the C<=~> or C<!~> operator, the C<$_> string is searched. (The
string specified with C<=~> need not be an lvalue--it may be the
result of an expression evaluation, but remember the C<=~> binds
rather tightly.) See also L<perlre>.
Options are as described in C<qr//> above; in addition, the following match
process modifiers are available:
g Match globally, i.e., find all occurrences.
c Do not reset search position on a failed match when /g is
in effect.
If C<"/"> is the delimiter then the initial C<m> is optional. With the C<m>
you can use any pair of non-whitespace (ASCII) characters
as delimiters. This is particularly useful for matching path names
that contain C<"/">, to avoid LTS (leaning toothpick syndrome). If C<"?"> is
the delimiter, then a match-only-once rule applies,
described in C<m?I<PATTERN>?> below. If C<"'"> (single quote) is the delimiter,
no variable interpolation is performed on the I<PATTERN>.
When using a delimiter character valid in an identifier, whitespace is required
after the C<m>.
I<PATTERN> may contain variables, which will be interpolated
every time the pattern search is evaluated, except
for when the delimiter is a single quote. (Note that C<$(>, C<$)>, and
C<$|> are not interpolated because they look like end-of-string tests.)
Perl will not recompile the pattern unless an interpolated
variable that it contains changes. You can force Perl to skip the
test and never recompile by adding a C</o> (which stands for "once")
after the trailing delimiter.
Once upon a time, Perl would recompile regular expressions
unnecessarily, and this modifier was useful to tell it not to do so, in the
interests of speed. But now, the only reasons to use C</o> are:
=over
=item 1
The variables are thousands of characters long and you know that they
don't change, and you need to wring out the last little bit of speed by
having Perl skip testing for that. (There is a maintenance penalty for
doing this, as mentioning C</o> constitutes a promise that you won't
change the variables in the pattern. If you do change them, Perl won't
even notice.)
=item 2
You want the pattern to use the initial values of the variables
regardless of whether they change or not. (But there are saner ways
of accomplishing this than using C</o>.)
=item 3
If the pattern contains embedded code, such as
use re 'eval';
$code = 'foo(?{ $x })';
/$code/
then perl will recompile each time, even though the pattern string hasn't
changed, to ensure that the current value of C<$x> is seen each time.
Use C</o> if you want to avoid this.
=back
The bottom line is that using C</o> is almost never a good idea.
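The maintenance penalty is easy to demonstrate. In this sketch, C<count_hits_once> is a hypothetical helper; because of C</o>, its pattern is frozen at the value C<$pat> had on the first call.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the strings that match $pat.
# The /o freezes the compiled pattern at its first use.
sub count_hits_once {
    my ($pat, @strings) = @_;
    return scalar grep { /$pat/o } @strings;
}

my $first  = count_hits_once('a', qw(a b));      # compiled as /a/
my $second = count_hits_once('b', qw(a a b));    # still /a/, not /b/
print "$first $second\n";
```

The second call counts the two C<"a"> strings even though it asked for C<"b">. Without C</o> it would count one.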
=item The empty pattern C<//>
If the I<PATTERN> evaluates to the empty string, the last
I<successfully> matched regular expression is used instead. In this
case, only the C<g> and C<c> flags on the empty pattern are honored;
the other flags are taken from the original pattern. If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match).
Note that it's possible to confuse Perl into thinking C<//> (the empty
regex) is really C<//> (the defined-or operator). Perl is usually pretty
good about this, but some pathological cases might trigger this, such as
C<$x///> (is that S<C<($x) / (//)>> or S<C<$x // />>?) and S<C<print $fh //>>
(S<C<print $fh(//>> or S<C<print($fh //>>?). In all of these examples, Perl
will assume you meant defined-or. If you meant the empty regex, just
use parentheses or spaces to disambiguate, or even prefix the empty
regex with an C<m> (so C<//> becomes C<m//>).
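A small sketch of the last-successful-pattern rule, with made-up strings. Both empty patterns below behave as C</wor/>, not as a pattern that matches everything.

```perl
use strict;
use warnings;

my $text = "hello world";
$text =~ /wor/ or die "setup match failed";

# Both empty patterns reuse /wor/, the last successfully matched pattern.
my $reused_hit  = ("sword" =~ //) ? 1 : 0;    # "sword" contains "wor"
my $reused_miss = ("hello" =~ //) ? 1 : 0;    # "hello" does not
print "$reused_hit $reused_miss\n";
```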
=item Matching in list context
If the C</g> option is not used, C<m//> in list context returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, that is, (C<$1>, C<$2>, C<$3>...) (Note that here C<$1> etc. are
also set). When there are no parentheses in the pattern, the return
value is the list C<(1)> for success.
With or without parentheses, an empty list is returned upon failure.
Examples:
open(TTY, "+</dev/tty")
|| die "can't access /dev/tty: $!";
<TTY> =~ /^y/i && foo(); # do foo if desired
if (/Version: *([0-9.]*)/) { $version = $1; }
next if m#^/usr/spool/uucp#;
# poor man's grep
$arg = shift;
while (<>) {
print if /$arg/o; # compile only once (no longer needed!)
}
if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
This last example splits C<$foo> into the first two words and the
remainder of the line, and assigns those three fields to C<$F1>, C<$F2>, and
C<$Etc>. The conditional is true if any variables were assigned; that is,
if the pattern matched.
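The three possible list-context results (captures, the list C<(1)>, and the empty list) can be seen side by side in this short sketch, which uses an arbitrary sample string:

```perl
use strict;
use warnings;

my ($hour, $min) = "12:34" =~ /^(\d+):(\d+)$/;   # parentheses: captures
my @no_parens    = "12:34" =~ /:/;               # no parentheses: (1)
my @failed       = "12:34" =~ /x/;               # failure: empty list
print "$hour $min / @no_parens / ", scalar(@failed), "\n";
```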
The C</g> modifier specifies global pattern matching--that is,
matching as many times as possible within the string. How it behaves
depends on the context. In list context, it returns a list of the
substrings matched by any capturing parentheses in the regular
expression. If there are no parentheses, it returns a list of all
the matched strings, as if there were parentheses around the whole
pattern.
In scalar context, each execution of C<m//g> finds the next match,
returning true if it matches, and false if there is no further match.
The position after the last match can be read or set using the C<pos()>
function; see L<perlfunc/pos>. A failed match normally resets the
search position to the beginning of the string, but you can avoid that
by adding the C</c> modifier (for example, C<m//gc>). Modifying the target
string also resets the search position.
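The following sketch contrasts the two contexts on a made-up string and records C<pos()> after each scalar-context match:

```perl
use strict;
use warnings;

my $s = "a1b22c333";

# List context: all captures are returned at once.
my @nums = $s =~ /(\d+)/g;

# Scalar context: one match per iteration; pos() reports where each ended.
my @ends;
while ($s =~ /\d+/g) {
    push @ends, pos($s);
}

# The final, failed match reset the search position.
my $after = defined pos($s) ? "set" : "reset";
print "@nums | @ends | $after\n";
```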
=item C<\G I<assertion>>
You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
zero-width assertion that matches the exact position where the
previous C<m//g>, if any, left off. Without the C</g> modifier, the
C<\G> assertion still anchors at C<pos()> as it was at the start of
the operation (see L<perlfunc/pos>), but the match is of course only
attempted once. Using C<\G> without C</g> on a target string that has
not previously had a C</g> match applied to it is the same as using
the C<\A> assertion to match the beginning of the string. Note also
that, currently, C<\G> is only properly supported when anchored at the
very beginning of the pattern.
Examples:
# list context
($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
# scalar context
local $/ = "";
while ($paragraph = <>) {
while ($paragraph =~ /\p{Ll}['")]*[.!?]+['")]*\s/g) {
$sentences++;
}
}
say $sentences;
Here's another way to check for sentences in a paragraph:
my $sentence_rx = qr{
(?: (?<= ^ ) | (?<= \s ) ) # after start-of-string or
# whitespace
\p{Lu} # capital letter
.*? # a bunch of anything
(?<= \S ) # that ends in non-
# whitespace
(?<! \b [DMS]r ) # but isn't a common abbr.
(?<! \b Mrs )
(?<! \b Sra )
(?<! \b St )
[.?!] # followed by a sentence
# ender
(?= $ | \s ) # in front of end-of-string
# or whitespace
}sx;
local $/ = "";
while (my $paragraph = <>) {
say "NEW PARAGRAPH";
my $count = 0;
while ($paragraph =~ /($sentence_rx)/g) {
printf "\tgot sentence %d: <%s>\n", ++$count, $1;
}
}
Here's how to use C<m//gc> with C<\G>:
$_ = "ppooqppqq";
while ($i++ < 2) {
print "1: '";
print $1 while /(o)/gc; print "', pos=", pos, "\n";
print "2: '";
print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
print "3: '";
print $1 while /(p)/gc; print "', pos=", pos, "\n";
}
print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
The last example should print:
1: 'oo', pos=4
2: 'q', pos=5
3: 'pp', pos=7
1: '', pos=7
2: 'q', pos=8
3: '', pos=8
Final: 'q', pos=8
Notice that the final match matched C<q> instead of C<p>, which a match
without the C<\G> anchor would have done. Also note that the final match
did not update C<pos>. C<pos> is only updated on a C</g> match. If the
final match did indeed match C<p>, it's a good bet that you're running a
very old (pre-5.6.0) version of Perl.
A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can
combine several regexps like this to process a string part-by-part,
doing different actions depending on which regexp matched. Each
regexp tries to match where the previous one leaves off.
$_ = <<'EOL';
$url = URI::URL->new( "http://example.com/" );
die if $url eq "xXx";
EOL
LOOP: {
print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
print(" lowercase"), redo LOOP
if /\G\p{Ll}+\b[,.;]?\s*/gc;
print(" UPPERCASE"), redo LOOP
if /\G\p{Lu}+\b[,.;]?\s*/gc;
print(" Capitalized"), redo LOOP
if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
print(" MiXeD"), redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
print(" alphanumeric"), redo LOOP
if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
print(" line-noise"), redo LOOP if /\G\W+/gc;
print ". That's all!\n";
}
Here is the output (split into several lines):
line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
line-noise lowercase line-noise lowercase line-noise lowercase
lowercase line-noise lowercase lowercase line-noise lowercase
lowercase line-noise MiXeD line-noise. That's all!
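The same idiom can drive a small tokenizer. This is only a sketch: C<tokenize> and its token names are hypothetical, and the grammar (integers, identifiers, and a few operators) is chosen just to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: each branch tries to match exactly where the
# previous successful match left off (\G); /c keeps pos() on failure.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    for ($src) {    # alias $_ to $src for the matches below
        while (1) {
            if    (/\G\s+/gc)            { next }
            elsif (/\G(\d+)/gc)          { push @tokens, "NUM:$1" }
            elsif (/\G([A-Za-z_]\w*)/gc) { push @tokens, "ID:$1" }
            elsif (/\G([-+*\/=])/gc)     { push @tokens, "OP:$1" }
            elsif (/\G\z/gc)             { last }
            else {
                die "unexpected character at offset ", pos() // 0, "\n";
            }
        }
    }
    return @tokens;
}

print join(" ", tokenize("x = 42 + y1")), "\n";
```

Because every alternative is anchored with C<\G>, input that no branch recognizes is reported at its exact offset instead of being silently skipped.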
=item C<m?I<PATTERN>?msixpodualngc>
X<?> X<operator, match-once>
This is just like the C<m/I<PATTERN>/> search, except that it matches
only once between calls to the C<reset()> operator. This is a useful
optimization when you want to see only the first occurrence of
something in each file of a set of files, for instance. Only C<m??>
patterns local to the current package are reset.
while (<>) {
if (m?^$?) {
# blank line between header and body
}
} continue {
reset if eof; # clear m?? status for next file
}
Another example switches the first "latin1" encoding it finds
to "utf8" in a pod file:
s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
The match-once behavior is controlled by the match delimiter being
C<?>; with any other delimiter this is the normal C<m//> operator.
In the past, the leading C<m> in C<m?I<PATTERN>?> was optional, but omitting it
would produce a deprecation warning. As of v5.22.0, omitting it produces a
syntax error. If you encounter this construct in older code, you can just add
C<m>.
=item C<s/I<PATTERN>/I<REPLACEMENT>/msixpodualngcer>
X<s> X<substitute> X<substitution> X<replace> X<regexp, replace>
X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e> X</r>
Searches a string for a pattern, and if found, replaces that pattern
with the replacement text and returns the number of substitutions
made. Otherwise it returns false (specifically, the empty string).
If the C</r> (non-destructive) option is used then it runs the
substitution on a copy of the string and instead of returning the
number of substitutions, it returns the copy whether or not a
substitution occurred. The original string is never changed when
C</r> is used. The copy will always be a plain string, even if the
input is an object or a tied variable.
If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified. Unless the C</r> option is used,
the string specified must be a scalar variable, an array element, a
hash element, or an assignment to one of those; that is, some sort of
scalar lvalue.
If the delimiter chosen is a single quote, no variable interpolation is
done on either the I<PATTERN> or the I<REPLACEMENT>. Otherwise, if the
I<PATTERN> contains a C<$> that looks like a variable rather than an
end-of-string test, the variable will be interpolated into the pattern
at run-time. If you want the pattern compiled only once the first time
the variable is interpolated, use the C</o> option. If the pattern
evaluates to the empty string, the last successfully executed regular
expression is used instead. See L<perlre> for further explanation on these.
Options are as with C<m//> with the addition of the following replacement
specific options:
e Evaluate the right side as an expression.
ee Evaluate the right side as a string then eval the
result.
r Return substitution and leave the original string
untouched.
Any non-whitespace delimiter may replace the slashes. Add space after
the C<s> when using a character allowed in identifiers. If single quotes
are used, no interpretation is done on the replacement string (the C</e>
modifier overrides this, however). Note that Perl treats backticks
as normal delimiters; the replacement text is not evaluated as a command.
If the I<PATTERN> is delimited by bracketing quotes, the I<REPLACEMENT> has
its own pair of quotes, which may or may not be bracketing quotes, for example,
C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the
replacement portion to be treated as a full-fledged Perl expression
and evaluated right then and there. It is, however, syntax checked at
compile-time. A second C<e> modifier will cause the replacement portion
to be C<eval>ed before being run as a Perl expression.
Examples:
s/\bgreen\b/mauve/g; # don't change wintergreen
$path =~ s|/usr/bin|/usr/local/bin|;
s/Login: $foo/Login: $bar/; # run-time pattern
($foo = $bar) =~ s/this/that/; # copy first, then
# change
($foo = "$bar") =~ s/this/that/; # convert to string,
# copy, then change
$foo = $bar =~ s/this/that/r; # Same as above using /r
$foo = $bar =~ s/this/that/r
=~ s/that/the other/r; # Chained substitutes
# using /r
@foo = map { s/this/that/r } @bar # /r is very useful in
# maps
$count = ($paragraph =~ s/Mister\b/Mr./g); # get change-cnt
$_ = 'abc123xyz';
s/\d+/$&*2/e; # yields 'abc246xyz'
s/\d+/sprintf("%5d",$&)/e; # yields 'abc  246xyz'
s/\w/$& x 2/eg; # yields 'aabbcc  224466xxyyzz'
s/%(.)/$percent{$1}/g; # change percent escapes; no /e
s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
s/^=(\w+)/pod($1)/ge; # use function call
$_ = 'abc123xyz';
$x = s/abc/def/r; # $x is 'def123xyz' and
# $_ remains 'abc123xyz'.
# expand variables in $_, but dynamics only, using
# symbolic dereferencing
s/\$(\w+)/${$1}/g;
# Add one to the value of any numbers in the string
s/(\d+)/1 + $1/eg;
# Titlecase words in the last 30 characters only
substr($str, -30) =~ s/\b(\p{Alpha}+)\b/\u\L$1/g;
# This will expand any embedded scalar variable
# (including lexicals) in $_ : First $1 is interpolated
# to the variable name, and then evaluated
s/(\$\w+)/$1/eeg;
# Delete (most) C comments.
$program =~ s {
/\* # Match the opening delimiter.
.*? # Match a minimal number of characters.
\*/ # Match the closing delimiter.
} []gsx;
s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_,
# expensively
for ($variable) { # trim whitespace in $variable,
# cheap
s/^\s+//;
s/\s+$//;
}
s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
Note the use of C<$> instead of C<\> in the last example. Unlike
B<sed>, we use the \<I<digit>> form only in the left hand side.
Anywhere else it's $<I<digit>>.
Occasionally, you can't use just a C</g> to get all the changes
to occur that you might want. Here are two common cases:
# put commas in the right places in an integer
1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
# expand tabs to 8-column spacing
1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
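To see why the comma idiom needs the loop, here it is wrapped in a hypothetical C<commify> helper. Each pass can place only the rightmost missing comma in a run of digits, so the substitution repeats until a pass changes nothing.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the comma-insertion idiom.
sub commify {
    my ($n) = @_;
    # Pass 1 turns 1234567 into 1234,567; pass 2 yields 1,234,567;
    # pass 3 finds nothing to change and ends the loop.
    1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
    return $n;
}

print commify(1234567), " ", commify("999"), " ", commify("12345.5"), "\n";
```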
=back
=head2 Quote-Like Operators
X<operator, quote-like>
=over 4
=item C<q/I<STRING>/>
X<q> X<quote, single> X<'> X<''>
=item C<'I<STRING>'>
A single-quoted, literal string. A backslash represents a backslash
unless followed by the delimiter or another backslash, in which case
the delimiter or backslash is interpolated.
$foo = q!I said, "You said, 'She said it.'"!;
$bar = q('This is it.');
$baz = '\n'; # a two-character string
=item C<qq/I<STRING>/>
X<qq> X<quote, double> X<"> X<"">
=item "I<STRING>"
A double-quoted, interpolated string.
$_ .= qq
(*** The previous line contains the naughty word "$1".\n)
if /\b(tcl|java|python)\b/i; # :-)
$baz = "\n"; # a one-character string
=item C<qx/I<STRING>/>
X<qx> X<`> X<``> X<backtick>
=item C<`I<STRING>`>
A string which is (possibly) interpolated and then executed as a
system command with F</bin/sh> or its equivalent. Shell wildcards,
pipes, and redirections will be honored. The collected standard
output of the command is returned; standard error is unaffected. In
scalar context, it comes back as a single (potentially multi-line)
string, or C<undef> if the command failed. In list context, returns a
list of lines (however you've defined lines with C<$/> or
C<$INPUT_RECORD_SEPARATOR>), or an empty list if the command failed.
Because backticks do not affect standard error, use shell file descriptor
syntax (assuming the shell supports this) if you care to address this.
To capture a command's STDERR and STDOUT together:
$output = `cmd 2>&1`;
To capture a command's STDOUT but discard its STDERR:
$output = `cmd 2>/dev/null`;
To capture a command's STDERR but discard its STDOUT (ordering is
important here):
$output = `cmd 2>&1 1>/dev/null`;
To exchange a command's STDOUT and STDERR in order to capture the STDERR
but leave its STDOUT to come out the old STDERR:
$output = `cmd 3>&1 1>&2 2>&3 3>&-`;
To read both a command's STDOUT and its STDERR separately, it's easiest
to redirect them separately to files, and then read from those files
when the program is done:
system("program args 1>program.stdout 2>program.stderr");
The STDIN filehandle used by the command is inherited from Perl's STDIN.
For example:
open(SPLAT, "stuff") || die "can't open stuff: $!";
open(STDIN, "<&SPLAT") || die "can't dupe SPLAT: $!";
print STDOUT `sort`;
will print the sorted contents of the file named F<"stuff">.
Using single-quote as a delimiter protects the command from Perl's
double-quote interpolation, passing it on to the shell instead:
$perl_info = qx(ps $$); # that's Perl's $$
$shell_info = qx'ps $$'; # that's the new shell's $$
How that string gets evaluated is entirely subject to the command
interpreter on your system. On most platforms, you will have to protect
shell metacharacters if you want them treated literally. This is in
practice difficult to do, as it's unclear how to escape which characters.
See L<perlsec> for a clean and safe example of a manual C<fork()> and C<exec()>
to emulate backticks safely.
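One way to sidestep shell quoting altogether is the list form of a piped C<open>, which passes its arguments to the command without involving a shell. This is not the C<fork()>/C<exec()> code from L<perlsec>, only a sketch of a related technique; C<capture> is a hypothetical helper, and it assumes a platform where list-form pipe opens are supported.

```perl
use strict;
use warnings;

# Hypothetical helper: capture a command's standard output without a shell.
sub capture {
    my @cmd = @_;
    open(my $fh, "-|", @cmd) or die "cannot run $cmd[0]: $!";
    local $/;                      # slurp mode: read all output at once
    my $out = <$fh>;
    close($fh);                    # sets $?, as backticks do
    return $out;
}

# $^X is the running perl. The last argument contains spaces, ";" and "$",
# yet reaches the child untouched because no shell ever parses it.
my $out = capture($^X, "-e", 'print "got: $ARGV[0]\n"', 'a b; $HOME');
print $out;
```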
On some platforms (notably DOS-like ones), the shell may not be
capable of dealing with multiline commands, so putting newlines in
the string may not get you what you want. You may be able to evaluate
multiple commands in a single line by separating them with the command
separator character, if your shell supports that (for example, C<;> on
many Unix shells and C<&> on the Windows NT C<cmd> shell).
Perl will attempt to flush all files opened for
output before starting the child process, but this may not be supported
on some platforms (see L<perlport>). To be safe, you may need to set
C<$|> (C<$AUTOFLUSH> in C<L<English>>) or call the C<autoflush()> method of
C<L<IO::Handle>> on any open handles.
Beware that some command shells may place restrictions on the length
of the command line. You must ensure your strings don't exceed this
limit after any necessary interpolations. See the platform-specific
release notes for more details about your particular environment.
Using this operator can lead to programs that are difficult to port,
because the shell commands called vary between systems, and may in
fact not be present at all. As one example, the C<type> command under
the POSIX shell is very different from the C<type> command under DOS.
That doesn't mean you should go out of your way to avoid backticks
when they're the right way to get something done. Perl was made to be
a glue language, and one of the things it glues together is commands.
Just understand what you're getting yourself into.
Like C<system>, backticks put the child process exit code in C<$?>.
If you'd like to manually inspect failure, you can check all possible
failure modes by inspecting C<$?> like this:
if ($? == -1) {
print "failed to execute: $!\n";
}
elsif ($? & 127) {
printf "child died with signal %d, %s coredump\n",
($? & 127), ($? & 128) ? 'with' : 'without';
}
else {
printf "child exited with value %d\n", $? >> 8;
}
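The same checks can be packaged as a function that takes the status as an argument, which makes them easy to try without running a command. C<describe_status> is a hypothetical name, and the status values below are examples rather than output captured from real commands.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a wait status such as $? into a description.
sub describe_status {
    my ($status) = @_;
    return "failed to execute" if $status == -1;
    if ($status & 127) {
        # Low 7 bits hold the signal number; bit 128 flags a core dump.
        return sprintf "signal %d, %s coredump",
            $status & 127, ($status & 128) ? 'with' : 'without';
    }
    # Otherwise the exit value is in the high byte.
    return sprintf "exit value %d", $status >> 8;
}

print describe_status($_), "\n" for 0, 256, 9, 139;
```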
Use the L<open> pragma to control the I/O layers used when reading the
output of the command, for example:
use open IN => ":encoding(UTF-8)";
my $x = `cmd-producing-utf-8`;
See L</"I/O Operators"> for more discussion.
=item C<qw/I<STRING>/>
X<qw> X<quote, list> X<quote, words>
Evaluates to a list of the words extracted out of I<STRING>, using embedded
whitespace as the word delimiters. It can be understood as being roughly
equivalent to:
split(" ", q/STRING/);
the differences being that it generates a real list at compile time, and
in scalar context it returns the last element in the list. So
this expression:
qw(foo bar baz)
is semantically equivalent to the list:
"foo", "bar", "baz"
Some frequently seen examples:
use POSIX qw( setlocale localeconv )
@EXPORT = qw( foo bar baz );
A common mistake is to try to separate the words with commas or to
put comments into a multi-line C<qw>-string. For this reason, the
S<C<use warnings>> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the I<STRING> contains the C<","> or the C<"#"> character.
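A short sketch of the comma mistake. The warning is silenced here only so that the demonstration runs quietly; in real code it is worth leaving on.

```perl
use strict;
use warnings;

my @words = qw(foo bar baz);

# Commas are not separators inside qw(); they stay attached to the words.
my @oops = do { no warnings 'qw'; qw(foo, bar, baz) };

print scalar(@words), " words; first element of \@oops is '$oops[0]'\n";
```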
=item C<tr/I<SEARCHLIST>/I<REPLACEMENTLIST>/cdsr>
X<tr> X<y> X<transliterate> X</c> X</d> X</s>
=item C<y/I<SEARCHLIST>/I<REPLACEMENTLIST>/cdsr>
Transliterates all occurrences of the characters found in the search list
with the corresponding character in the replacement list. It returns
the number of characters replaced or deleted. If no string is
specified via the C<=~> or C<!~> operator, the C<$_> string is transliterated.
If the C</r> (non-destructive) option is present, a new copy of the string
is made and its characters transliterated, and this copy is returned no
matter whether it was modified or not: the original string is always
left unchanged. The new copy is always a plain string, even if the input
string is an object or a tied variable.
Unless the C</r> option is used, the string specified with C<=~> must be a
scalar variable, an array element, a hash element, or an assignment to one
of those; in other words, an lvalue.
A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
I<SEARCHLIST> is delimited by bracketing quotes, the I<REPLACEMENTLIST>
must have its own pair of quotes, which may or may not be bracketing
quotes; for example, C<tr[aeiouy][yuoiea]> or C<tr(+\-*/)/ABCD/>.
Characters may be literals or any of the escape sequences accepted in
double-quoted strings. But there is no variable interpolation, so C<"$">
and C<"@"> are treated as literals. A hyphen at the beginning or end, or
preceded by a backslash is considered a literal. Escape sequence
details are in L<the table near the beginning of this section|/Quote and
Quote-like Operators>.
Note that C<tr> does B<not> do regular expression character classes such as
C<\d> or C<\pL>. The C<tr> operator is not equivalent to the C<L<tr(1)>>
utility. C<tr[a-z][A-Z]> will uppercase the 26 letters "a" through "z",
but for case changing not confined to ASCII, use
L<C<lc>|perlfunc/lc>, L<C<uc>|perlfunc/uc>,
L<C<lcfirst>|perlfunc/lcfirst>, L<C<ucfirst>|perlfunc/ucfirst>
(all documented in L<perlfunc>), or the
L<substitution operator C<sE<sol>I<PATTERN>E<sol>I<REPLACEMENT>E<sol>>|/sE<sol>PATTERNE<sol>REPLACEMENTE<sol>msixpodualngcer>
(with C<\U>, C<\u>, C<\L>, and C<\l> string-interpolation escapes in the
I<REPLACEMENT> portion).
Most ranges are unportable between character sets, but certain ones
signal Perl to do special handling to make them portable. There are two
classes of portable ranges. The first are any subsets of the ranges
C<A-Z>, C<a-z>, and C<0-9>, when expressed as literal characters.
tr/h-k/H-K/
capitalizes the letters C<"h">, C<"i">, C<"j">, and C<"k"> and nothing
else, no matter what the platform's character set is. In contrast, all
of
tr/\x68-\x6B/\x48-\x4B/
tr/h-\x6B/H-\x4B/
tr/\x68-k/\x48-K/
do the same capitalizations as the previous example when run on ASCII
platforms, but something completely different on EBCDIC ones.
The second class of portable ranges is invoked when one or both of the
range's end points are expressed as C<\N{...}>
$string =~ tr/\N{U+20}-\N{U+7E}//d;
removes from C<$string> all the platform's characters which are
equivalent to any of Unicode U+0020, U+0021, ... U+007D, U+007E. This
is a portable range, and has the same effect on every platform it is
run on. It turns out that in this example, these are the ASCII
printable characters. So after this is run, C<$string> has only
controls and characters which have no ASCII equivalents.
But, even for portable ranges, it is not generally obvious what is
included without having to look things up. A sound principle is to use
only ranges that begin from and end at either ASCII alphabetics of equal
case (C<b-e>, C<B-E>), or digits (C<1-4>). Anything else is unclear
(and unportable unless C<\N{...}> is used). If in doubt, spell out the
character sets in full.
Options:
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters.
r Return the modified string and leave the original string
untouched.
If the C</c> modifier is specified, the I<SEARCHLIST> character set
is complemented. If the C</d> modifier is specified, any characters
specified by I<SEARCHLIST> not found in I<REPLACEMENTLIST> are deleted.
(Note that this is slightly more flexible than the behavior of some
B<tr> programs, which delete anything they find in the I<SEARCHLIST>,
period.) If the C</s> modifier is specified, sequences of characters
that were transliterated to the same character are squashed down
to a single instance of the character.
If the C</d> modifier is used, the I<REPLACEMENTLIST> is always interpreted
exactly as specified. Otherwise, if the I<REPLACEMENTLIST> is shorter
than the I<SEARCHLIST>, the final character is replicated till it is long
enough. If the I<REPLACEMENTLIST> is empty, the I<SEARCHLIST> is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.
Examples:
$ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case ASCII
$cnt = tr/*/*/; # count the stars in $_
$cnt = $sky =~ tr/*/*/; # count the stars in $sky
$cnt = tr/0-9//; # count the digits in $_
tr/a-zA-Z//s; # bookkeeper -> bokeper
($HOST = $host) =~ tr/a-z/A-Z/;
$HOST = $host =~ tr/a-z/A-Z/r; # same thing
$HOST = $host =~ tr/a-z/A-Z/r # chained with s///r
=~ s/:/ -p/r;
tr/a-zA-Z/ /cs; # change non-alphas to single space
@stripped = map tr/a-zA-Z/ /csr, @original;
# /r with map
tr [\200-\377]
[\000-\177]; # wickedly delete 8th bit
If multiple transliterations are given for a character, only the
first one is used:
tr/AAA/XYZ/
will transliterate any A to X.
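The replication and deletion rules are easiest to compare on concrete input. The strings in this sketch are arbitrary samples:

```perl
use strict;
use warnings;

(my $padded   = "abcabc")       =~ tr/a-c/x/;      # "x" is replicated
(my $deleted  = "abcabc")       =~ tr/a-c/x/d;     # b and c are deleted
(my $letters  = "h3ll0 w0rld!") =~ tr/a-zA-Z//cd;  # delete non-letters
(my $squashed = "bookkeeper")   =~ tr/a-z//s;      # squash repeated letters
my $ells = ($letters =~ tr/l//);                   # count only, no change
print "$padded $deleted $letters $squashed $ells\n";
```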
Because the transliteration table is built at compile time, neither
the I<SEARCHLIST> nor the I<REPLACEMENTLIST> is subjected to double quote
interpolation. That means that if you want to use variables, you
must use an C<eval()>:
eval "tr/$oldlist/$newlist/";
die $@ if $@;
eval "tr/$oldlist/$newlist/, 1" or die $@;
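If the same run-time lists are applied many times, one option is to C<eval> once and keep the resulting subroutine. This is a sketch only: C<make_translit> is a hypothetical helper, and because the lists are pasted into source code they must come from a trusted source and must not contain an unescaped C<"/">.

```perl
use strict;
use warnings;

# Hypothetical helper: build a transliteration from run-time lists.
# The string eval happens once; the returned closure can be reused.
sub make_translit {
    my ($from, $to) = @_;
    my $code = "sub { (my \$copy = shift) =~ tr/$from/$to/; \$copy }";
    my $sub = eval $code;
    die $@ unless $sub;
    return $sub;
}

my $upcase_abc = make_translit('a-c', 'A-C');
print $upcase_abc->("abcabd"), "\n";
```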
=item C<< <<I<EOF> >>
X<here-doc> X<heredoc> X<here-document> X<<< << >>>
A line-oriented form of quoting is based on the shell "here-document"
syntax. Following a C<< << >> you specify a string to terminate
the quoted material, and all lines following the current line down to
the terminating string are the value of the item.
Prefixing the terminating string with a C<~> specifies that you
want to use L</Indented Here-docs> (see below).
The terminating string may be either an identifier (a word), or some
quoted text. An unquoted identifier works like double quotes.
There may not be a space between the C<< << >> and the identifier,
unless the identifier is explicitly quoted. (If you put a space it
will be treated as a null identifier, which is valid, and matches the
first empty line.) The terminating string must appear by itself
(unquoted and with no surrounding whitespace) on the terminating line.
If the terminating string is quoted, the type of quotes used determines
the treatment of the text.
=over 4
=item Double Quotes
Double quotes indicate that the text will be interpolated using exactly
the same rules as normal double quoted strings.
print <<EOF;
The price is $Price.
EOF
print << "EOF"; # same as above
The price is $Price.
EOF
=item Single Quotes
Single quotes indicate the text is to be treated literally with no
interpolation of its content. This is similar to single quoted
strings except that backslashes have no special meaning, with C<\\>
being treated as two backslashes and not one as they would in every
other quoting construct.
Just as in the shell, a backslashed bareword following the C<<< << >>>
means the same thing as a single-quoted string does:
$cost = <<'VISTA'; # hasta la ...
That'll be $10 please, ma'am.
VISTA
$cost = <<\VISTA; # Same thing!
That'll be $10 please, ma'am.
VISTA
This is the only form of quoting in perl where there is no need
to worry about escaping content, something that code generators
can and do make good use of.
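The difference between the two quoting styles shows up when the same text is placed in each. The variable name and text in this sketch are made up for illustration:

```perl
use strict;
use warnings;

my $Price = 10;

# Double-quoted terminator: $Price and \t are both interpreted.
my $interpolated = <<"EOF";
Cost: $Price\tdollars
EOF

# Single-quoted terminator: every character is taken literally.
my $literal = <<'EOF';
Cost: $Price\tdollars
EOF

print $interpolated, $literal;
```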
=item Backticks
The content of the here doc is treated just as it would be if the
string were embedded in backticks. Thus the content is interpolated
as though it were double quoted and then executed via the shell, with
the results of the execution returned.
print << `EOC`; # execute command and get results
echo hi there
EOC
=back
=over 4
=item Indented Here-docs
The here-doc modifier C<~> allows you to indent your here-docs to make
the code more readable:
    if ($some_var) {
      print <<~EOF;
        This is a here-doc
        EOF
    }
This will print...
This is a here-doc
...with no leading whitespace.
The delimiter is used to determine the B<exact> whitespace to
remove from the beginning of each line. All lines B<must> have
at least the same starting whitespace (except lines only
containing a newline) or perl will croak. Tabs and spaces can
be mixed, but are matched exactly. One tab will not be equal to
8 spaces!
Additional beginning whitespace (beyond what preceded the
delimiter) will be preserved:

    print <<~EOF;
      This text is not indented
        This text is indented with two spaces
      		This text is indented with two tabs
      EOF
Finally, the modifier may be used with all of the forms
mentioned above:
<<~\EOF;
<<~'EOF'
<<~"EOF"
<<~`EOF`
And whitespace may be used between the C<~> and quoted delimiters:
<<~ 'EOF'; # ... "EOF", `EOF`
=back
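As a runnable sketch of an indented here-doc inside a subroutine (indented here-docs require Perl 5.26 or later; C<usage_text> is a hypothetical helper):

```perl
use v5.26;    # <<~ needs Perl 5.26 or later
use strict;
use warnings;

sub usage_text {
    my ($prog) = @_;
    # The eight spaces before END are stripped from every line;
    # the two extra spaces before "-h" are preserved.
    return <<~"END";
        Usage: $prog [options]
          -h  show help
        END
}

print usage_text("demo");
```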
It is possible to stack multiple here-docs in a row:
print <<"foo", <<"bar"; # you can stack them
I said foo.
foo
I said bar.
bar
myfunc(<< "THIS", 23, <<'THAT');
Here's a line
or two.
THIS
and here's another.
THAT
Just don't forget that you have to put a semicolon on the end
to finish the statement, as Perl doesn't know you're not going to
try to do this:
print <<ABC
179231
ABC
+ 20;
If you want to remove the line terminator from your here-docs,
use C<chomp()>.
chomp($string = <<'END');
This is a string.
END
If you want your here-docs to be indented with the rest of the code,
you'll need to remove leading whitespace from each line manually:

    ($quote = <<'FINIS') =~ s/^\s+//gm;
       The Road goes ever on and on,
       down from the door where it began.
    FINIS
If you use a here-doc within a delimited construct, such as in C<s///eg>,
the quoted material must still come on the line following the
C<<< <<FOO >>> marker, which means it may be inside the delimited
construct:
s/this/<<E . 'that'
the other
E
. 'more '/eg;
It works this way as of Perl 5.18. Historically, it was inconsistent, and
you would have to write
s/this/<<E . 'that'
. 'more '/eg;
the other
E
outside of string evals.
Additionally, quoting rules for the end-of-string identifier are
unrelated to Perl's quoting rules. C<q()>, C<qq()>, and the like are not
supported in place of C<''> and C<"">, and the only interpolation is for
backslashing the quoting character:
print << "abc\"def";
testing...
abc"def
Finally, quoted strings cannot span multiple lines. The general rule is
that the identifier must be a string literal. Stick with that, and you
should be safe.
=back
=head2 Gory details of parsing quoted constructs
X<quote, gory details>
When presented with something that might have several different
interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
principle to pick the most probable interpretation. This strategy
is so successful that Perl programmers often do not suspect the
ambivalence of what they write. But from time to time, Perl's
notions differ substantially from what the author honestly meant.
This section hopes to clarify how Perl handles quoted constructs.
Although the most common reason to learn this is to unravel labyrinthine
regular expressions, because the initial steps of parsing are the
same for all quoting operators, they are all discussed together.
The most important Perl parsing rule is the first one discussed
below: when processing a quoted construct, Perl first finds the end
of that construct, then interprets its contents. If you understand
this rule, you may skip the rest of this section on the first
reading. The other rules are likely to contradict the user's
expectations much less frequently than this first one.
Some passes discussed below are performed concurrently, but because
their results are the same, we consider them individually. For different
quoting constructs, Perl performs different numbers of passes, from
one to four, but these passes are always performed in the same order.
=over 4
=item Finding the end
The first pass is finding the end of the quoted construct. This results
in saving to a safe location a copy of the text (between the starting
and ending delimiters), normalized as necessary to avoid needing to know
what the original delimiters were.
If the construct is a here-doc, the ending delimiter is a line
that has a terminating string as the content. Therefore C<<<EOF> is
terminated by C<EOF> immediately followed by C<"\n"> and starting
from the first column of the terminating line.
When searching for the terminating line of a here-doc, nothing
is skipped. In other words, lines after the here-doc syntax
are compared with the terminating string line by line.
For constructs other than here-docs, single characters are used as starting
and ending delimiters. If the starting delimiter is an opening punctuation
(that is C<(>, C<[>, C<{>, or C<< < >>), the ending delimiter is the
corresponding closing punctuation (that is C<)>, C<]>, C<}>, or C<< > >>).
If the starting delimiter is an unpaired character like C</> or a closing
punctuation, the ending delimiter is the same as the starting delimiter.
Therefore a C</> terminates a C<qq//> construct, while a C<]> terminates
both C<qq[]> and C<qq]]> constructs.
When searching for single-character delimiters, escaped delimiters
and C<\\> are skipped. For example, while searching for terminating C</>,
combinations of C<\\> and C<\/> are skipped. If the delimiters are
bracketing, nested pairs are also skipped. For example, while searching
for a closing C<]> paired with the opening C<[>, combinations of C<\\>, C<\]>,
and C<\[> are all skipped, and nested C<[> and C<]> are skipped as well.
However, when backslashes are used as the delimiters (like C<qq\\> and
C<tr\\\>), nothing is skipped.
During the search for the end, backslashes that escape delimiters or
other backslashes are removed (exactly speaking, they are not copied to the
safe location).
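The skipping and backslash-removal rules above can be observed with single-quoted constructs, where no further interpolation gets in the way. This is only a sketch, and the sub name C<delimiter_demo> is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns three strings built with different delimiters.
sub delimiter_demo {
    my $nested   = q{outer {inner} text};    # nested braces are skipped
    my $slashes  = q/path\/to\/file/;        # \/ does not end the construct
    my $brackets = q[a\]b];                  # \] does not end it either
    return ($nested, $slashes, $brackets);
}

print "$_\n" for delimiter_demo();
```

In the second and third strings the backslashes that protected the delimiters are gone from the result, as described above.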
For constructs with three-part delimiters (C<s///>, C<y///>, and
C<tr///>), the search is repeated once more.
If the first delimiter is not an opening punctuation, the three delimiters must
be the same, such as C<s!!!> and C<tr)))>,
in which case the second delimiter
terminates the left part and starts the right part at once.
If the left part is delimited by bracketing punctuation (that is C<()>,
C<[]>, C<{}>, or C<< <> >>), the right part needs another pair of
delimiters such as C<s(){}> and C<tr[]//>. In these cases, whitespace
and comments are allowed between the two parts, although the comment must follow
at least one whitespace character; otherwise a character expected as the
start of the comment may be regarded as the starting delimiter of the right part.
During this search no attention is paid to the semantics of the construct.
Thus:
"$hash{"$foo/$bar"}"
or:
m/
bar # NOT a comment, this slash / terminated m//!
/x
do not form legal quoted expressions. The quoted part ends on the
first C<"> and C</>, and the rest happens to be a syntax error.
Because the slash that terminated C<m//> was followed by a C<SPACE>,
the example above is not C<m//x>, but rather C<m//> with no C</x>
modifier. So the embedded C<#> is interpreted as a literal C<#>.
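One way to repair both malformed examples is to choose bracketing delimiters that do not clash with the content. The following is a sketch under that assumption; the sub names and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: interpolate a hash element whose key is itself quoted.
sub lookup_in_string {
    my %hash = ('a/b' => 'found');
    my ($foo, $bar) = ('a', 'b');
    # Braces nest, so the inner double quotes no longer end the construct.
    return qq{value: $hash{"$foo/$bar"}};
}

# Hypothetical helper: a pattern with a comment that mentions a slash.
sub bar_pattern {
    # With brace delimiters the slash in the comment is harmless,
    # and the trailing x modifier is seen as intended.
    return qr{
        bar    # a slash / here is just part of the comment
    }x;
}

print lookup_in_string(), "\n";
print "matched\n" if 'foobar' =~ bar_pattern();
```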
Also no attention is paid to C<\c\> (multichar control char syntax) during
this search. Thus the second C<\> in C<qq/\c\/> is interpreted as a part
of C<\/>, and the following C</> is not recognized as a delimiter.
Instead, use C<\034> or C<\x1c> at the end of quoted constructs.
=item Interpolation
X<interpolation>
The next step is interpolation in the text obtained, which is now
delimiter-independent. There are multiple cases.
=over 4
=item C<<<'EOF'>
No interpolation is performed.
Note that the combination C<\\> is left intact, since escaped delimiters
are not available for here-docs.
=item C<m''>, the pattern of C<s'''>
No interpolation is performed at this stage.
Any backslashed sequences, including C<\\>, are handled at the later
stage described in L</"parsing regular expressions">.
=item C<''>, C<q//>, C<tr'''>, C<y'''>, the replacement of C<s'''>
The only interpolation is removal of C<\> from pairs of C<\\>.
Therefore C<"-"> in C<tr'''> and C<y'''> is treated literally
as a hyphen and no character range is available.
C<\1> in the replacement of C<s'''> does not work as C<$1>.
=item C<tr///>, C<y///>
No variable interpolation occurs. String modifying combinations for
case and quoting such as C<\Q>, C<\U>, and C<\E> are not recognized.
The other escape sequences such as C<\200> and C<\t> and backslashed
characters such as C<\\> and C<\-> are converted to appropriate literals.
The character C<"-"> is treated specially and therefore C<\-> is treated
as a literal C<"-">.
=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>, C<<<"EOF">
C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> (possibly paired with C<\E>) are
converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
is converted to S<C<$foo . (quotemeta("baz" . $bar))>> internally.
The other escape sequences such as C<\200> and C<\t> and backslashed
characters such as C<\\> and C<\-> are replaced with appropriate
expansions.
Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
is interpolated in the usual way. Something like C<"\Q\\E"> has
no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
result is the same as for C<"\\\\E">. As a general rule, backslashes
between C<\Q> and C<\E> may lead to counterintuitive results. So,
C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
as C<"\\\t"> (since TAB is not alphanumeric). Note also that:
$str = '\t';
return "\Q$str";
may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
Interpolated scalars and arrays are converted internally to the C<join> and
C<"."> catenation operations. Thus, S<C<"$foo XXX '@arr'">> becomes:
$foo . " XXX '" . (join $", @arr) . "'";
All operations above are performed simultaneously, left to right.
Because the result of S<C<"\Q I<STRING> \E">> has all metacharacters
quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
C<"\\\$">; if not, it is interpreted as the start of an interpolated
scalar.
Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
S<C<< "a $x -> {c}" >>> really means:
"a " . $x . " -> {c}";
or:
"a " . $x -> {c};
Most of the time, Perl picks the longest possible text that does not
include spaces between components and which contains matching braces or
brackets. Because the outcome may be determined by voting based
on heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
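The C<\Q> and C<join> conversions described in this item can be checked directly. This is a small sketch with a hypothetical sub name, and it assumes C<$"> still holds its default value of a single space:

```perl
use strict;
use warnings;

# Hypothetical helper returning the three strings discussed above.
sub interpolation_demo {
    my $tab_quoted = "\Q\t\E";    # quotemeta() applied to a real TAB character
    my $str        = '\t';        # two characters: a backslash and a "t"
    my $var_quoted = "\Q$str";    # quotemeta() applied to those two characters
    my @arr        = (1, 2, 3);
    my $foo        = 'x';
    my $joined     = "$foo XXX '@arr'";    # same as the explicit join form
    return ($tab_quoted, $var_quoted, $joined);
}

my ($tab_quoted, $var_quoted, $joined) = interpolation_demo();
print "lengths: ", length($tab_quoted), " and ", length($var_quoted), "\n";
print "$joined\n";
```

The first string is a backslash followed by a TAB, while the second is two backslashes followed by the letter C<t>.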
=item the replacement of C<s///>
Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> and interpolation
happens as with C<qq//> constructs.
It is at this step that C<\1> is begrudgingly converted to C<$1> in
the replacement text of C<s///>, in order to correct the incorrigible
I<sed> hackers who haven't picked up the saner idiom yet. A warning
is emitted if the S<C<use warnings>> pragma or the B<-w> command-line flag
(that is, the C<$^W> variable) was set.
=item C<RE> in C<m?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>
Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F>, C<\E>,
and interpolation happens (almost) as with C<qq//> constructs.
Processing of C<\N{...}> is also done here, and compiled into an intermediate
form for the regex compiler. (This is because, as mentioned below, the regex
compilation may be done at execution time, and C<\N{...}> is a compile-time
construct.)
However any other combinations of C<\> followed by a character
are not substituted but only skipped, in order to parse them
as regular expressions at the following step.
As C<\c> is skipped at this step, C<@> of C<\c@> in RE is possibly
treated as an array symbol (for example C<@foo>),
even though the same text in C<qq//> gives interpolation of C<\c@>.
Code blocks such as C<(?{BLOCK})> are handled by temporarily passing control
back to the perl parser, in much the same way that an interpolated array
subscript expression such as C<"foo$array[1+f("[xyz")]bar"> would be.
Moreover, inside C<(?{BLOCK})>, S<C<(?# comment )>>, and
a C<#>-comment in a C</x>-regular expression, no processing is
performed whatsoever. This is the first step at which the presence
of the C</x> modifier is relevant.
Interpolation in patterns has several quirks: C<$|>, C<$(>, C<$)>, C<@+>
and C<@-> are not interpolated, and constructs C<$var[SOMETHING]> are
voted (by several different estimators) to be either an array element
or C<$var> followed by an RE alternative. This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
array element C<-9>, not as a regular expression from the variable
C<$arr> followed by a digit, which would be the interpretation of
C</$arr[0-9]/>. Since voting among different estimators may occur,
the result is not predictable.
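When the guess matters, braces can be used to state the intent explicitly instead of relying on the estimators. The following sketch shows both spellings; the sub names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helpers; the names are made up for this sketch.

# Braces around the name force "scalar followed by a character class".
sub prefix_then_digit {
    my ($prefix, $text) = @_;
    return $text =~ /^${prefix}[0-9]$/ ? 1 : 0;
}

# Braces around name and subscript force "array element".
sub equals_second_element {
    my ($text, @arr) = @_;
    return $text =~ /^${arr[1]}$/ ? 1 : 0;
}

print prefix_then_digit('ab', 'ab7'), "\n";
print equals_second_element('y', 'x', 'y'), "\n";
```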
The lack of processing of C<\\> creates specific restrictions on
the post-processed text. If the delimiter is C</>, one cannot get
the combination C<\/> into the result of this step. C</> will
finish the regular expression, C<\/> will be stripped to C</> on
the previous step, and C<\\/> will be left as is. Because C</> is
equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be character special to the
RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<m?foo?>; or an
alphanumeric char, as in:
m m ^ a \s* b mmx;
In the RE above, which is intentionally obfuscated for illustration, the
delimiter is C<m>, the modifier is C<mx>, and after delimiter-removal the
RE is the same as for S<C<m/ ^ a \s* b /mx>>. There's more than one
reason you're encouraged to restrict your delimiters to non-alphanumeric,
non-whitespace choices.
=back
This step is the last one for all constructs except regular expressions,
which are processed further.
=item parsing regular expressions
X<regexp, parse>
Previous steps were performed during the compilation of Perl code,
but this one happens at run time, although it may be optimized to
be calculated at compile time if appropriate. After preprocessing
described above, and possibly after evaluation if concatenation,
joining, casing translation, or metaquoting are involved, the
resulting I<string> is passed to the RE engine for compilation.
Whatever happens in the RE engine might be better discussed in L<perlre>,
but for the sake of continuity, we shall do so here.
This is another step where the presence of the C</x> modifier is
relevant. The RE engine scans the string from left to right and
converts it into a finite automaton.
Backslashed characters are either replaced with corresponding
literal strings (as with C<\{>), or else they generate special nodes
in the finite automaton (as with C<\b>). Characters special to the
RE engine (such as C<|>) generate corresponding nodes or groups of
nodes. C<(?#...)> comments are ignored. All the rest is either
converted to literal strings to match, or else is ignored (as is
whitespace and C<#>-style comments if C</x> is present).
Parsing of the bracketed character class construct, C<[...]>, is
rather different than the rule used for the rest of the pattern.
The terminator of this construct is found using the same rules as
for finding the terminator of a C<{}>-delimited construct, the only
exception being that C<]> immediately following C<[> is treated as
though preceded by a backslash.
The terminator of runtime C<(?{...})> is found by temporarily switching
control to the perl parser, which should stop at the point where the
logically balancing terminating C<}> is found.
It is possible to inspect both the string given to RE engine and the
resulting finite automaton. See the arguments C<debug>/C<debugcolor>
in the S<C<use L<re>>> pragma, as well as Perl's B<-Dr> command-line
switch documented in L<perlrun/"Command Switches">.
=item Optimization of regular expressions
X<regexp, optimization>
This step is listed for completeness only. Since it does not change
semantics, details of this step are not documented and are subject
to change without notice. This step is performed over the finite
automaton that was generated during the previous pass.
It is at this stage that C<split()> silently optimizes C</^/> to
mean C</^/m>.
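The effect of that C<split()> optimization can be seen in a short sketch (the sub name C<split_lines> is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into lines, keeping the newlines.
sub split_lines {
    my ($text) = @_;
    return split /^/, $text;    # behaves like split /^/m
}

my @lines = split_lines("one\ntwo\nthree\n");
print scalar(@lines), " lines\n";
```

Without the implied C</m>, C<^> would match only at the start of the string and the text would come back as a single element.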
=back
=head2 I/O Operators
X<operator, i/o> X<operator, io> X<io> X<while> X<filehandle>
X<< <> >> X<< <<>> >> X<@ARGV>
There are several I/O operators you should know about.
A string enclosed by backticks (grave accents) first undergoes
double-quote interpolation. It is then interpreted as an external
command, and the output of that command is the value of the
backtick string, as in a shell. In scalar context, a single string
consisting of all output is returned. In list context, a list of
values is returned, one per line of output. (You can set C<$/> to use
a different line terminator.) The command is executed each time the
pseudo-literal is evaluated. The status value of the command is
returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
Unlike in B<csh>, no translation is done on the return data--newlines
remain newlines. Unlike in any of the shells, single quotes do not
hide variable names in the command from interpretation. To pass a
literal dollar-sign through to the shell you need to hide it with a
backslash. The generalized form of backticks is C<qx//>. (Because
backticks always undergo shell expansion as well, see L<perlsec> for
security concerns.)
X<qx> X<`> X<``> X<backtick> X<glob>
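The context-dependent behavior of backticks can be sketched as follows. This example assumes a POSIX-like shell in which the C<echo> and C<printf> commands are available; the sub name C<capture_demo> is hypothetical:

```perl
use strict;
use warnings;

# A sketch assuming a POSIX-like shell with "echo" and "printf" available.
sub capture_demo {
    my $all   = `echo hello`;           # scalar context: all output in one string
    my @lines = `printf 'a\\nb\\n'`;    # list context: one element per line
    my $exit  = $? >> 8;                # exit status of the last command run
    return ($all, scalar(@lines), $exit);
}

my ($all, $count, $exit) = capture_demo();
print "got: $all";
print "$count lines, exit status $exit\n";
```

Because double-quote interpolation happens first, each C<\\n> reaches the shell as a backslash followed by C<n>, which C<printf> then turns into a newline.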
In scalar context, evaluating a filehandle in angle brackets yields
the next line from that file (the newline, if any, included), or
C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
(sometimes known as file-slurp mode) and the file is empty, it
returns C<''> the first time, followed by C<undef> subsequently.
Ordinarily you must assign the returned value to a variable, but
there is one situation where an automatic assignment happens. If
and only if the input symbol is the only thing inside the conditional
of a C<while> statement (even if disguised as a C<for(;;)> loop),
the value is automatically assigned to the global variable C<$_>,
destroying whatever was there previously. (This may seem like an
odd thing to you, but you'll use the construct in almost every Perl
script you write.) The C<$_> variable is not implicitly localized.
You'll have to put a S<C<local $_;>> before the loop if you want that
to happen.
The following lines are equivalent:
while (defined($_ = <STDIN>)) { print; }
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while defined($_ = <STDIN>);
print while ($_ = <STDIN>);
print while <STDIN>;
This also behaves similarly, but assigns to a lexical variable
instead of to C<$_>:
while (my $line = <STDIN>) { print $line }
In these loop constructs, the assigned value (whether assignment
is automatic or explicit) is then tested to see whether it is
defined. The defined test avoids problems where the line has a string
value that would be treated as false by Perl; for example a "" or
a C<"0"> with no trailing newline. If you really mean for such values
to terminate the loop, they should be tested for explicitly:
while (($_ = <STDIN>) ne '0') { ... }
while (<STDIN>) { last unless $_; ... }
In other boolean contexts, C<< <I<FILEHANDLE>> >> without an
explicit C<defined> test or comparison elicits a warning if the
S<C<use warnings>> pragma or the B<-w>
command-line switch (the C<$^W> variable) is in effect.
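The implicit C<defined> test can be demonstrated with an in-memory filehandle whose last line is a bare C<"0">. This is a sketch, and the sub name C<count_lines> is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines in a string via an in-memory filehandle.
sub count_lines {
    my ($data) = @_;
    open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
    my $count = 0;
    while (my $line = <$fh>) {    # implicitly tested with defined()
        $count++;
    }
    close($fh);
    return $count;
}

print count_lines("first\n0"), "\n";    # the last line is "0" with no newline
```

If the loop tested the plain truth of the value, the final C<"0"> line would end it early and go uncounted.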
The filehandles STDIN, STDOUT, and STDERR are predefined. (The
filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
in packages, where they would be interpreted as local identifiers
rather than global.) Additional filehandles may be created with
the C<open()> function, amongst others. See L<perlopentut> and
L<perlfunc/open> for details on this.
X<stdin> X<stdout> X<stderr>
If a C<< <I<FILEHANDLE>> >> is used in a context that is looking for
a list, a list comprising all input lines is returned, one line per
list element. It's easy to grow to a rather large data space this
way, so use with care.
C<< <I<FILEHANDLE>> >> may also be spelled C<readline(*I<FILEHANDLE>)>.
See L<perlfunc/readline>.
The null filehandle C<< <> >> is special: it can be used to emulate the
behavior of B<sed> and B<awk>, and any other Unix filter program
that takes a list of filenames, doing the same to each line
of input from all of them. Input from C<< <> >> comes either from
standard input, or from each file listed on the command line. Here's
how it works: the first time C<< <> >> is evaluated, the C<@ARGV> array is
checked, and if it is empty, C<$ARGV[0]> is set to C<"-">, which when opened
gives you standard input. The C<@ARGV> array is then processed as a list
of filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn't so cumbersome to say, and will actually work.
It really does shift the C<@ARGV> array and put the current filename
into the C<$ARGV> variable. It also uses filehandle I<ARGV>
internally. C<< <> >> is just a synonym for C<< <ARGV> >>, which
is magical. (The pseudo code above doesn't work because it treats
C<< <ARGV> >> as non-magical.)
Since the null filehandle uses the two argument form of L<perlfunc/open>
it interprets special characters, so if you have a script like this:
while (<>) {
print;
}
and call it with S<C<perl dangerous.pl 'rm -rfv *|'>>, it actually opens a
pipe, executes the C<rm> command and reads C<rm>'s output from that pipe.
If you want all items in C<@ARGV> to be interpreted as file names, you
can use the module C<ARGV::readonly> from CPAN, or use the double bracket:
while (<<>>) {
print;
}
Using double angle brackets inside of a while causes the open to use the
three argument form (with the second argument being C<< < >>), so all
arguments in C<@ARGV> are treated as literal filenames (including C<"-">).
(Note that for convenience, if you use C<< <<>> >> and if C<@ARGV> is
empty, it will still read from the standard input.)
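A minimal sketch of the double-bracket form follows. It assumes Perl 5.22 or later, uses the core C<File::Temp> module to create a scratch file, and the sub name C<read_named_files> is hypothetical. Note that calling it with an empty list would fall back to reading standard input, as described above:

```perl
use strict;
use warnings;
use 5.022;    # the <<>> operator needs Perl 5.22 or later
use File::Temp qw(tempfile);

# Hypothetical helper: read every line of the named files, treating each
# name literally (no pipes, no special meaning for "-").
sub read_named_files {
    local @ARGV = @_;
    my @lines;
    while (my $line = <<>>) {
        push @lines, $line;
    }
    return @lines;
}

# Usage with a temporary file created just for this demonstration.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\n";
close($tmp_fh) or die "cannot close $tmp_name: $!";
print read_named_files($tmp_name);
```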
You can modify C<@ARGV> before the first C<< <> >> as long as the array ends up
containing the list of filenames you really want. Line numbers (C<$.>)
continue as though the input were one big happy file. See the example
in L<perlfunc/eof> for how to reset line numbers on each file.
If you want to set C<@ARGV> to your own list of files, go right ahead.
This sets C<@ARGV> to all plain text files if no C<@ARGV> was given:
@ARGV = grep { -f && -T } glob('*') unless @ARGV;
You can even set them to pipe commands. For example, this automatically
filters compressed arguments through B<gzip>:
@ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
If you want to pass switches into your script, you can use one of the
C<Getopt> modules (such as L<Getopt::Std> or L<Getopt::Long>) or put a
loop on the front like this:
while ($_ = $ARGV[0], /^-/) {
shift;
last if /^--$/;
if (/^-D(.*)/) { $debug = $1 }
if (/^-v/) { $verbose++ }
# ... # other switches
}
while (<>) {
# ... # code for each line
}
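As an alternative to the hand-written loop, the same two switches could be handled with the core C<Getopt::Long> module. This is only a sketch: the sub name C<parse_switches> is hypothetical, and unlike the loop above it expects the debug value as a separate argument (C<-D trace>) rather than attached to the switch:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse -D VALUE and repeated -v from a list of arguments.
sub parse_switches {
    my @args = @_;
    my ($debug, $verbose) = ('', 0);
    GetOptionsFromArray(
        \@args,
        'D=s' => \$debug,      # -D takes a string value
        'v+'  => \$verbose,    # each -v increments the counter
    ) or die "unrecognized switch\n";
    return ($debug, $verbose, @args);    # remaining items are the filenames
}

my ($debug, $verbose, @files) = parse_switches(qw(-D trace -v -v input.txt));
print "debug=$debug verbose=$verbose files=@files\n";
```

In a real script the remaining arguments would be assigned back to C<@ARGV> before the C<< while (<>) >> loop.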
The C<< <> >> symbol will return C<undef> for end-of-file only once.
If you call it again after this, it will assume you are processing another
C<@ARGV> list, and if you haven't set C<@ARGV>, will read input from STDIN.
If what the angle brackets contain is a simple scalar variable (for example,
C<$foo>), then that variable contains the name of the
filehandle to input from, or its typeglob, or a reference to the
same. For example:
$fh = \*STDIN;
$line = <$fh>;
If what's within the angle brackets is neither a filehandle nor a simple
scalar variable containing a filehandle name, typeglob, or typeglob
reference, it is interpreted as a filename pattern to be globbed, and
either a list of filenames or the next filename in the list is returned,
depending on context. This distinction is determined on syntactic
grounds alone. That means C<< <$x> >> is always a C<readline()> from
an indirect handle, but C<< <$hash{key}> >> is always a C<glob()>.
That's because C<$x> is a simple scalar variable, but C<$hash{key}> is
not--it's a hash element. Even C<< <$x > >> (note the extra space)
is treated as C<glob("$x ")>, not C<readline($x)>.
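When a filehandle is stored in a hash or array element, calling C<readline()> explicitly avoids the accidental glob. This is a sketch using an in-memory filehandle; the sub name C<next_line_for> is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: read one line from a handle stored in a hash.
sub next_line_for {
    my ($handles, $key) = @_;
    # <$handles->{$key}> would be parsed as a glob, so call readline() directly.
    my $line = readline($handles->{$key});
    return $line;
}

my $data = "one\ntwo\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my %handles = (input => $fh);

print next_line_for(\%handles, 'input');    # read through the hash element

my $copy   = $handles{input};               # a simple scalar works inside <>
my $second = <$copy>;
print $second;
```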
One level of double-quote interpretation is done first, but you can't
say C<< <$foo> >> because that's an indirect filehandle as explained
in the previous paragraph. (In older versions of Perl, programmers
would insert curly brackets to force interpretation as a filename glob:
C<< <${foo}> >>. These days, it's considered cleaner to call the
internal function directly as C<glob($foo)>, which is probably the right
way to have done it in the first place.) For example:
while (<*.c>) {
chmod 0644, $_;
}
is roughly equivalent to:
open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
chomp;
chmod 0644, $_;
}
except that the globbing is actually done internally using the standard
C<L<File::Glob>> extension. Of course, the shortest way to do the above is:
chmod 0644, <*.c>;
A (file)glob evaluates its (embedded) argument only when it is
starting a new list. All values must be read before it will start
over. In list context, this isn't important because you automatically
get them all anyway. However, in scalar context the operator returns
the next value each time it's called, or C<undef> when the list has
run out. As with filehandle reads, an automatic C<defined> is
generated when the glob occurs in the test part of a C<while>,
because legal glob returns (for example,
a file called F<0>) would otherwise
terminate the loop. Again, C<undef> is returned only once. So if
you're expecting a single value from a glob, it is much better to
say
($file) = <blurch*>;
than
$file = <blurch*>;
because the latter will alternate between returning a filename and
returning false.
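The list-assignment form can be wrapped in a small helper, sketched here with the core C<File::Temp> module. The sub name C<first_match> is hypothetical, and the scratch directory path is assumed to contain no whitespace, since C<glob()> would otherwise split the pattern:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the first filename matching a pattern.
sub first_match {
    my ($pattern) = @_;
    my ($file) = glob($pattern);    # list assignment, so glob runs in list context
    return $file;
}

# Demonstration in a scratch directory (assumed to have no whitespace in its path).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(blurch1 blurch2)) {
    open(my $fh, '>', "$dir/$name") or die "cannot create $dir/$name: $!";
    close($fh);
}
print first_match("$dir/blurch*"), "\n" for 1 .. 2;    # same file both times
```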
If you're trying to do variable interpolation, it's definitely better
to use the C<glob()> function, because the older notation can cause people
to become confused with the indirect filehandle notation.
@files = glob("$dir/*.[ch]");
@files = glob($files[$i]);
=head2 Constant Folding
X<constant folding> X<folding>
Like C, Perl does a certain amount of expression evaluation at
compile time whenever it determines that all arguments to an
operator are static and have no side effects. In particular, string
concatenation happens at compile time between literals that don't do
variable substitution. Backslash interpolation also happens at
compile time. You can say
'Now is the time for all'
. "\n"
. 'good men to come to.'
and this all reduces to one string internally. Likewise, if
you say
foreach $file (@filenames) {
if (-s $file > 5 + 100 * 2**16) { }
}
the compiler precomputes the number which that expression
represents so that the interpreter won't have to.
=head2 No-ops
X<no-op> X<nop>
Perl doesn't officially have a no-op operator, but the bare constants
C<0> and C<1> are special-cased not to produce a warning in void
context, so you can for example safely do
1 while foo();
=head2 Bitwise String Operators
X<operator, bitwise, string> X<&.> X<|.> X<^.> X<~.>
Bitstrings of any size may be manipulated by the bitwise operators
(C<~ | & ^>).
If the operands to a binary bitwise op are strings of different
sizes, B<|> and B<^> ops act as though the shorter operand had
additional zero bits on the right, while the B<&> op acts as though
the longer operand were truncated to the length of the shorter.
The granularity for such extension or truncation is one or more
bytes.
# ASCII-based examples
print "j p \n" ^ " a h"; # prints "JAPH\n"
print "JA" | "  ph\n"; # prints "japh\n"
print "japh\nJunk" & '_____'; # prints "JAPH\n";
print 'p N$' ^ " E<H\n"; # prints "Perl\n";
If you are intending to manipulate bitstrings, be certain that
you're supplying bitstrings: If an operand is a number, that will imply
a B<numeric> bitwise operation. You may explicitly show which type of
operation you intend by using C<""> or C<0+>, as in the examples below.
$foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
$foo = '150' | 105; # yields 255
$foo = 150 | '105'; # yields 255
$foo = '150' | '105'; # yields string '155' (under ASCII)
$baz = 0+$foo & 0+$bar; # both ops explicitly numeric
$biz = "$foo" ^ "$bar"; # both ops explicitly stringy
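The two forcing idioms can be wrapped in helpers so that the intended operation no longer depends on how the caller's values happen to be stored. This is a sketch with hypothetical sub names; it assumes the C<bitwise> feature is not enabled and an ASCII platform:

```perl
use strict;
use warnings;

# Hypothetical helper: OR two values as strings, whatever their current type.
sub string_or {
    my ($x, $y) = @_;
    return "$x" | "$y";    # quoting both operands forces the string operation
}

# Hypothetical helper: OR two values as numbers.
sub numeric_or {
    my ($x, $y) = @_;
    return (0 + $x) | (0 + $y);    # 0+ forces the numeric operation
}

print string_or(150, 105), "\n";         # string result
print numeric_or('150', '105'), "\n";    # numeric result
```

The first call gives the string C<'155'> and the second gives C<255>, matching the table above.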
This somewhat unpredictable behavior can be avoided with the experimental
"bitwise" feature, new in Perl 5.22. You can enable it via S<C<use feature
'bitwise'>>. By default, it will warn unless the C<"experimental::bitwise">
warnings category has been disabled. (S<C<use experimental 'bitwise'>> will
enable the feature and disable the warning.) Under this feature, the four
standard bitwise operators (C<~ | & ^>) are always numeric. Adding a dot
after each operator (C<~. |. &. ^.>) forces it to treat its operands as
strings:
use experimental "bitwise";
$foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
$foo = '150' | 105; # yields 255
$foo = 150 | '105'; # yields 255
$foo = '150' | '105'; # yields 255
$foo = 150 |. 105; # yields string '155'
$foo = '150' |. 105; # yields string '155'
$foo = 150 |.'105'; # yields string '155'
$foo = '150' |.'105'; # yields string '155'
$baz = $foo & $bar; # both operands numeric
$biz = $foo ^. $bar; # both operands stringy
The assignment variants of these operators (C<&= |= ^= &.= |.= ^.=>)
behave likewise under the feature.
The behavior of these operators is problematic (and subject to change)
if either or both of the strings are encoded in UTF-8 (see
L<perlunicode/Byte and Character Semantics>).
See L<perlfunc/vec> for information on how to manipulate individual bits
in a bit vector.
=head2 Integer Arithmetic
X<integer>
By default, Perl assumes that it must do most of its arithmetic in
floating point. But by saying
use integer;
you may tell the compiler to use integer operations
(see L<integer> for a detailed explanation) from here to the end of
the enclosing BLOCK. An inner BLOCK may countermand this by saying
no integer;
which lasts until the end of that BLOCK. Note that this doesn't
mean everything is an integer, merely that Perl will use integer
operations for arithmetic, comparison, and bitwise operators. For
example, even under S<C<use integer>>, if you take the C<sqrt(2)>, you'll
still get C<1.4142135623731> or so.
Used on numbers, the bitwise operators (C<&> C<|> C<^> C<~> C<< << >>
C<< >> >>) always produce integral results. (But see also
L</Bitwise String Operators>.) However, S<C<use integer>> still has meaning for
them. By default, their results are interpreted as unsigned integers, but
if S<C<use integer>> is in effect, their results are interpreted
as signed integers. For example, C<~0> usually evaluates to a large
integral value. However, S<C<use integer; ~0>> is C<-1> on two's-complement
machines.
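The block scoping of C<use integer> and its effect on C<~0> can be sketched as follows. The sub names are hypothetical, and the last helper assumes a two's-complement machine:

```perl
use strict;
use warnings;

# Hypothetical helpers contrasting the default and "use integer" behavior.
sub divide_default {
    my ($x, $y) = @_;
    return $x / $y;    # floating-point division
}

sub divide_integer {
    my ($x, $y) = @_;
    use integer;       # in effect until the end of this block
    return $x / $y;    # integer division
}

sub complement_is_negative {
    use integer;
    return ~0 < 0 ? 1 : 0;    # ~0 is read as a signed value here
}

print divide_default(7, 2), "\n";
print divide_integer(7, 2), "\n";
print complement_is_negative(), "\n";
```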
=head2 Floating-point Arithmetic
X<floating-point> X<floating point> X<float> X<real>
While S<C<use integer>> provides integer-only arithmetic, there is no
analogous mechanism to provide automatic rounding or truncation to a
certain number of decimal places. For rounding to a certain number
of digits, C<sprintf()> or C<printf()> is usually the easiest route.
See L<perlfaq4>.
Floating-point numbers are only approximations to what a mathematician
would call real numbers. There are infinitely more reals than floats,
so some corners must be cut. For example:
printf "%.20g\n", 123456789123456789;
# produces 123456789123456784
Testing for exact floating-point equality or inequality is not a
good idea. Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
significant digits (the C<%g> format used below counts significant
digits, not decimal places). See Knuth, volume II, for a more robust
treatment of this topic.
sub fp_equal {
my ($X, $Y, $POINTS) = @_;
my ($tX, $tY);
$tX = sprintf("%.${POINTS}g", $X);
$tY = sprintf("%.${POINTS}g", $Y);
return $tX eq $tY;
}
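A short usage sketch shows why such a comparison is useful. The helper repeats the technique of C<fp_equal> above so that the example is self-contained; the name C<digits_equal> is made up for the illustration:

```perl
use strict;
use warnings;

# Same technique as fp_equal above, repeated so this sketch is self-contained;
# the name digits_equal is made up for the illustration.
sub digits_equal {
    my ($x, $y, $digits) = @_;
    return sprintf("%.${digits}g", $x) eq sprintf("%.${digits}g", $y);
}

my $sum = 0.1 + 0.2;
print "exact:   ", ($sum == 0.3 ? "equal" : "different"), "\n";
print "rounded: ", (digits_equal($sum, 0.3, 10) ? "equal" : "different"), "\n";
```

On builds that use IEEE double precision, the exact comparison typically reports a difference while the rounded one does not.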
The POSIX module (part of the standard perl distribution) implements
C<ceil()>, C<floor()>, and other mathematical and trigonometric functions.
The C<L<Math::Complex>> module (part of the standard perl distribution)
defines mathematical functions that work on both the reals and the
imaginary numbers. C<Math::Complex> is not as efficient as POSIX, but
POSIX can't work with complex numbers.
Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.
=head2 Bigger Numbers
X<number, arbitrary precision>
The standard C<L<Math::BigInt>>, C<L<Math::BigRat>>, and
C<L<Math::BigFloat>> modules,
along with the C<bignum>, C<bigint>, and C<bigrat> pragmas, provide
variable-precision arithmetic and overloaded operators, although
they're currently pretty slow. At the cost of some space and
considerable speed, they avoid the normal pitfalls associated with
limited-precision representations.
use 5.010;
use bigint; # easy interface to Math::BigInt
$x = 123456789123456789;
say $x * $x;
+15241578780673678515622620750190521
Or with rationals:
use 5.010;
use bigrat;
$x = 3/22;
$y = 4/6;
say "x/y is ", $x/$y;
say "x*y is ", $x*$y;
x/y is 9/44
x*y is 1/11
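The pragmas are a convenience layer; the underlying classes can also be used directly, which keeps the big-number handling confined to the values you choose. This is a minimal sketch using the core C<Math::BigInt> module, with a hypothetical sub name C<big_square>:

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: square an integer given as a decimal string.
sub big_square {
    my ($digits) = @_;
    my $x = Math::BigInt->new($digits);
    return ($x * $x)->bstr();    # bstr() gives the plain decimal string
}

print big_square('123456789123456789'), "\n";
```

Passing the number as a string avoids any loss of precision before the object is created.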
Several modules let you calculate with unlimited or fixed precision
(bound only by memory and CPU time). There
are also some non-standard modules that
provide faster implementations via external C libraries.
Here is a short, but incomplete summary:
  Math::String           treat string sequences like numbers
  Math::FixedPrecision   calculate with a fixed precision
  Math::Currency         for currency calculations
  Bit::Vector            manipulate bit vectors fast (uses C)
  Math::BigIntFast       Bit::Vector wrapper for big numbers
  Math::Pari             provides access to the Pari C library
  Math::Cephes           uses the external Cephes C library (no
                         big numbers)
  Math::Cephes::Fraction fractions via the Cephes library
  Math::GMP              another one using an external C library
  Math::GMPz             an alternative interface to libgmp's big ints
  Math::GMPq             an interface to libgmp's fraction numbers
  Math::GMPf             an interface to libgmp's floating point numbers
Choose wisely.
=cut
=head1 NAME
perl5005delta - what's new for perl5.005
=head1 DESCRIPTION
This document describes differences between the 5.004 release and this one.
=head1 About the new versioning system
Perl is now developed on two tracks: a maintenance track that makes
small, safe updates to released production versions with emphasis on
compatibility; and a development track that pursues more aggressive
evolution. Maintenance releases (which should be considered production
quality) have subversion numbers that run from C<1> to C<49>, and
development releases (which should be considered "alpha" quality) run
from C<50> to C<99>.
Perl 5.005 is the combined product of the new dual-track development
scheme.
=head1 Incompatible Changes
=head2 WARNING: This version is not binary compatible with Perl 5.004.
Starting with Perl 5.004_50 there were many deep and far-reaching changes
to the language internals. If you have dynamically loaded extensions
that you built under perl 5.003 or 5.004, you can continue to use them
with 5.004, but you will need to rebuild and reinstall those extensions
to use them with 5.005. See F<INSTALL> for detailed instructions on how to
upgrade.
=head2 Default installation structure has changed
The new Configure defaults are designed to allow a smooth upgrade from
5.004 to 5.005, but you should read F<INSTALL> for a detailed
discussion of the changes in order to adapt them to your system.
=head2 Perl Source Compatibility
When none of the experimental features are enabled, there should be
very few user-visible Perl source compatibility issues.
If threads are enabled, then some caveats apply. C<@_> and C<$_> become
lexical variables. The effect of this should be largely transparent to
the user, but there are some boundary conditions under which the user will
need to be aware of the issues. For example, C<local(@_)> results in
a "Can't localize lexical variable @_ ..." message. This may be enabled
in a future version.
Some new keywords have been introduced. These are generally expected to
have very little impact on compatibility. See L<New C<INIT> keyword>,
L<New C<lock> keyword>, and L<New C<qrE<sol>E<sol>> operator>.
Certain barewords are now reserved. Use of these will provoke a warning
if you have asked for them with the C<-w> switch.
See L<C<our> is now a reserved word>.
=head2 C Source Compatibility
There have been a large number of changes in the internals to support
the new features in this release.
=over 4
=item *
Core sources now require ANSI C compiler
An ANSI C compiler is now B<required> to build perl. See F<INSTALL>.
=item *
All Perl global variables must now be referenced with an explicit prefix
All Perl global variables that are visible for use by extensions now
have a C<PL_> prefix. New extensions should B<not> refer to perl globals
by their unqualified names. To preserve sanity, we provide limited
backward compatibility for globals that are being widely used like
C<sv_undef> and C<na> (which should now be written as C<PL_sv_undef>,
C<PL_na>, etc.).
If you find that your XS extension does not compile anymore because a
perl global is not visible, try adding a C<PL_> prefix to the global
and rebuild.
It is strongly recommended that all functions in the Perl API that don't
begin with C<perl> be referenced with a C<Perl_> prefix. The bare function
names without the C<Perl_> prefix are supported with macros, but this
support may cease in a future release.
See L<perlapi>.
=item *
Enabling threads has source compatibility issues
Perl built with threading enabled requires extensions to use the new
C<dTHR> macro to initialize the handle to access per-thread data.
If you see a compiler error that talks about the variable C<thr> not
being declared (when building a module that has XS code), you need
to add C<dTHR;> at the beginning of the block that elicited the error.
The API function C<perl_get_sv("@",GV_ADD)> should be used instead of
directly accessing perl globals as C<GvSV(errgv)>. The API call is
backward compatible with existing perls and provides source compatibility
when threading is enabled.
See L</"C Source Compatibility"> for more information.
=back
=head2 Binary Compatibility
This version is NOT binary compatible with older versions. All extensions
will need to be recompiled. Further, binaries built with threads enabled
are incompatible with binaries built without. This should largely be
transparent to the user, as all binary incompatible configurations have
their own unique architecture name, and extension binaries get installed at
unique locations. This allows coexistence of several configurations in
the same directory hierarchy. See F<INSTALL>.
=head2 Security fixes may affect compatibility
A few taint leaks and taint omissions have been corrected. This may lead
to "failure" of scripts that used to work with older versions. Compiling
with -DINCOMPLETE_TAINTS provides a perl with minimal amounts of changes
to the tainting behavior. But note that the resulting perl will have
known insecurities.
One-liners with the C<-e> switch do not create temporary files anymore.
=head2 Relaxed new mandatory warnings introduced in 5.004
Many new warnings that were introduced in 5.004 have been made
optional. Some of these warnings are still present, but perl's new
features make them less often a problem. See L</New Diagnostics>.
=head2 Licensing
Perl has a new Social Contract for contributors. See F<Porting/Contract>.
The license included in much of the Perl documentation has changed.
Most of the Perl documentation was previously under the implicit GNU
General Public License or the Artistic License (at the user's choice).
Now much of the documentation unambiguously states the terms under which
it may be distributed. Those terms are in general much less restrictive
than the GNU GPL. See L<perl> and the individual perl manpages listed
therein.
=head1 Core Changes
=head2 Threads
WARNING: Threading is considered an B<experimental> feature. Details of the
implementation may change without notice. There are known limitations
and some bugs. These are expected to be fixed in future versions.
See F<README.threads>.
=head2 Compiler
WARNING: The Compiler and related tools are considered B<experimental>.
Features may change without notice, and there are known limitations
and bugs. Since the compiler is fully external to perl, the default
configuration will build and install it.
The Compiler produces three different types of transformations of a
perl program. The C backend generates C code that captures perl's state
just before execution begins. It eliminates the compile-time overheads
of the regular perl interpreter, but the run-time performance remains
comparatively the same. The CC backend generates optimized C code
equivalent to the code path at run-time. The CC backend has greater
potential for big optimizations, but only a few optimizations are
implemented currently. The Bytecode backend generates a platform
independent bytecode representation of the interpreter's state
just before execution. Thus, the Bytecode backend also eliminates
much of the compilation overhead of the interpreter.
The compiler comes with several valuable utilities.
C<B::Lint> is an experimental module to detect and warn about suspicious
code, especially the cases that the C<-w> switch does not detect.
C<B::Deparse> can be used to demystify perl code, and understand
how perl optimizes certain constructs.
C<B::Xref> generates cross reference reports of all definition and use
of variables, subroutines and formats in a program.
C<B::Showlex> shows the lexical variables used by a subroutine or file
at a glance.
C<perlcc> is a simple frontend for compiling perl.
See C<ext/B/README>, L<B>, and the respective compiler modules.
=head2 Regular Expressions
Perl's regular expression engine has been seriously overhauled, and
many new constructs are supported. Several bugs have been fixed.
Here is an itemized summary:
=over 4
=item Many new and improved optimizations
Changes in the RE engine:
    Unneeded nodes removed;
    Substrings merged together;
    New types of nodes to process (SUBEXPR)* and similar expressions
        quickly, used if the SUBEXPR has no side effects and matches
        strings of the same length;
    Better optimizations by lookup for constant substrings;
    Better search for constant substrings anchored by $ ;
Changes in Perl code using RE engine:
    More optimizations to s/longer/short/;
    study() was not working;
    /blah/ may be optimized to an analogue of index() if $& $` $' not seen;
    Unneeded copying of matched-against string removed;
    Only matched part of the string is copied if $` $' were not seen;
=item Many bug fixes
Note that only the major bug fixes are listed here. See F<Changes> for others.
    Backtracking might not restore start of $3.
    No feedback if max count for * or + on "complex" subexpression
        was reached, similarly (but at compile time) for {3,34567}
    Primitive restrictions on max count introduced to decrease the
        possibility of a segfault;
    (ZERO-LENGTH)* could segfault;
    (ZERO-LENGTH)* was prohibited;
    Long REs were not allowed;
    /RE/g could skip matches at the same position after a
        zero-length match;
=item New regular expression constructs
The following new syntax elements are supported:
    (?<=RE)
    (?<!RE)
    (?{ CODE })
    (?i-x)
    (?i:RE)
    (?(COND)YES_RE|NO_RE)
    (?>RE)
    \z
=item New operator for precompiled regular expressions
See L<New C<qrE<sol>E<sol>> operator>.
=item Other improvements
Better debugging output (possibly with colors),
even from non-debugging Perl;
RE engine code now looks like C, not like assembler;
Behaviour of RE modifiable by `use re' directive;
Improved documentation;
Test suite significantly extended;
Syntax [:^upper:] etc., reserved inside character classes;
=item Incompatible changes
(?i) localized inside enclosing group;
$( is not interpolated into RE any more;
/RE/g may match at the same position (with non-zero length)
after a zero-length match (bug fix).
=back
See L<perlre> and L<perlop>.
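A few of the new constructs listed above can be sketched as follows. The sample strings and variable names are illustrative only, and C<(?{ CODE })> and the conditional form are left out.

```perl
use strict;
use warnings;

# (?<=RE) positive lookbehind: digits preceded by a dollar sign
my ($amount) = 'cost: $42' =~ /(?<=\$)(\d+)/;

# (?<!RE) negative lookbehind: "bar" not preceded by "foo"
my $neg = 'foobar' =~ /(?<!foo)bar/ ? 1 : 0;

# (?i:RE) case-insensitivity limited to one group
my $grp        = 'FOObar' =~ /(?i:foo)bar/ ? 1 : 0;
my $grp_strict = 'FOOBAR' =~ /(?i:foo)bar/ ? 1 : 0;

# \z matches only at the very end; $ also matches before a final newline
my $dollar = "line\n" =~ /line$/  ? 1 : 0;
my $z      = "line\n" =~ /line\z/ ? 1 : 0;

# (?>RE) independent subexpression: once a+ has taken every "a",
# it will not give one back, so the following "a" cannot match
my $atomic = 'aaa' =~ /(?>a+)a/ ? 1 : 0;

print "amount=$amount neg=$neg grp=$grp grp_strict=$grp_strict ",
      "dollar=$dollar z=$z atomic=$atomic\n";
```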
=head2 Improved malloc()
See banner at the beginning of C<malloc.c> for details.
=head2 Quicksort is internally implemented
Perl now contains its own highly optimized qsort() routine. The new qsort()
is resistant to inconsistent comparison functions, so Perl's C<sort()> will
not provoke coredumps any more when given poorly written sort subroutines.
(Some C library C<qsort()>s that were being used before used to have this
problem.) In our testing, the new C<qsort()> required the minimal number
of pair-wise compares on average, among all known C<qsort()> implementations.
See L<perlfunc/sort>.
=head2 Reliable signals
Perl's signal handling is susceptible to random crashes, because signals
arrive asynchronously, and the Perl runtime is not reentrant at arbitrary
times.
However, one experimental implementation of reliable signals is available
when threads are enabled. See C<Thread::Signal>. Also see F<INSTALL> for
how to build a Perl capable of threads.
=head2 Reliable stack pointers
The internals now reallocate the perl stack only at predictable times.
In particular, magic calls never trigger reallocations of the stack,
because all reentrancy of the runtime is handled using a "stack of stacks".
This should improve reliability of cached stack pointers in the internals
and in XSUBs.
=head2 More generous treatment of carriage returns
Perl used to complain if it encountered literal carriage returns in
scripts. Now they are mostly treated like whitespace within program text.
Inside string literals and here documents, literal carriage returns are
ignored if they occur paired with linefeeds, or get interpreted as whitespace
if they stand alone. This behavior means that literal carriage returns
in files should be avoided. You can get the older, more compatible (but
less generous) behavior by defining the preprocessor symbol
C<PERL_STRICT_CR> when building perl. Of course, all this has nothing
whatever to do with how escapes like C<\r> are handled within strings.
Note that this doesn't somehow magically allow you to keep all text files
in DOS format. The generous treatment only applies to files that perl
itself parses. If your C compiler doesn't allow carriage returns in
files, you may still be unable to build modules that need a C compiler.
=head2 Memory leaks
C<substr>, C<pos> and C<vec> don't leak memory anymore when used in lvalue
context. Many small leaks that impacted applications that embed multiple
interpreters have been fixed.
=head2 Better support for multiple interpreters
The build-time option C<-DMULTIPLICITY> has had many of the details
reworked. Some previously global variables that should have been
per-interpreter now are. With care, this allows interpreters to call
each other. See the C<PerlInterp> extension on CPAN.
=head2 Behavior of local() on array and hash elements is now well-defined
See L<perlsub/"Temporary Values via local()">.
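As a brief sketch of what this means in practice, C<local> on a single element saves that element and restores it when the enclosing block exits. The names C<%config>, C<@stack>, and C<debug_level> are made up for the illustration.

```perl
use strict;
use warnings;

our %config = (debug => 0);
our @stack  = (1, 2, 3);

sub debug_level { return $config{debug} }

my @seen;
{
    local $config{debug} = 1;    # only this element is saved and restored
    local $stack[1]      = 'x';  # same for a single array element
    push @seen, debug_level(), "@stack";
}
push @seen, debug_level(), "@stack";

print join(" | ", @seen), "\n";
```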
=head2 C<%!> is transparently tied to the L<Errno> module
See L<perlvar>, and L<Errno>.
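A minimal sketch of the idea, assuming a Unix-like system and a path that does not exist (the path below is made up):

```perl
use strict;
use warnings;

# Assumed not to exist on the machine running this.
my $missing = "/nonexistent/path/for/errno/demo";

my $is_enoent = 0;
if (!open(my $fh, '<', $missing)) {
    # $!{ENOENT} is true when $! currently holds ENOENT
    $is_enoent = $!{ENOENT} ? 1 : 0;
}
print "ENOENT seen: $is_enoent\n";
```

Testing the symbolic name avoids hard-coding the platform's numeric errno value.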
=head2 Pseudo-hashes are supported
See L<perlref>.
=head2 C<EXPR foreach EXPR> is supported
See L<perlsyn>.
=head2 Keywords can be globally overridden
See L<perlsub>.
=head2 C<$^E> is meaningful on Win32
See L<perlvar>.
=head2 C<foreach (1..1000000)> optimized
C<foreach (1..1000000)> is now optimized into a counting loop. It does
not try to allocate a 1000000-size list anymore.
=head2 C<Foo::> can be used as implicitly quoted package name
Barewords caused unintuitive behavior when a subroutine with the same
name as a package happened to be defined. Thus, C<new Foo @args>
would use the result of the call to C<Foo()> instead of treating C<Foo>
as a literal. The recommended way to write barewords in the indirect
object slot is C<new Foo:: @args>. Note that the method C<new()> is
called with a first argument of C<Foo>, not C<Foo::> when you do that.
=head2 C<exists $Foo::{Bar::}> tests existence of a package
It was impossible to test for the existence of a package without
actually creating it before. Now C<exists $Foo::{Bar::}> can be
used to test if the C<Foo::Bar> namespace has been created.
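A small sketch of such a test follows. It writes the key as a quoted string, C<'Bar::'>, rather than as a bareword, and the package names are placeholders.

```perl
use strict;
use warnings;

package Foo::Bar;
sub hello { return "hi" }

package main;

# The key names the entry for the nested package Bar in the symbol
# table of Foo; merely testing for it does not create it.
my $has_bar  = exists $Foo::{'Bar::'}  ? 1 : 0;
my $has_nope = exists $Foo::{'Nope::'} ? 1 : 0;

print "Foo::Bar exists: $has_bar, Foo::Nope exists: $has_nope\n";
```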
=head2 Better locale support
See L<perllocale>.
=head2 Experimental support for 64-bit platforms
Perl5 has always had 64-bit support on systems with 64-bit longs.
Starting with 5.005, the beginnings of experimental support for systems
with 32-bit long and 64-bit 'long long' integers have been added.
If you add -DUSE_LONG_LONG to your ccflags in config.sh (or manually
define it in perl.h) then perl will be built with 'long long' support.
There will be many compiler warnings, and the resultant perl may not
work on all systems. There are many other issues related to
third-party extensions and libraries. This option exists to allow
people to work on those issues.
=head2 prototype() returns useful results on builtins
See L<perlfunc/prototype>.
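As a short illustration (the two builtins queried here are arbitrary choices), a builtin is named with a C<CORE::> prefix:

```perl
use strict;
use warnings;

# rename() takes exactly two scalars, so it has an ordinary prototype.
my $rename_proto = prototype("CORE::rename");

# system() has argument rules that a prototype cannot express,
# so prototype() returns undef for it.
my $system_proto = prototype("CORE::system");

print "rename: $rename_proto\n";
print "system: ", (defined $system_proto ? $system_proto : "(undef)"), "\n";
```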
=head2 Extended support for exception handling
C<die()> now accepts a reference value, and C<$@> gets set to that
value in exception traps. This makes it possible to propagate
exception objects. This is an undocumented B<experimental> feature.
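A minimal sketch of passing a reference through C<die()>; the hash keys are arbitrary, and a blessed object would travel the same way:

```perl
use strict;
use warnings;

my $caught;
eval {
    die { code => 404, message => "not found" };   # a hash reference
    1;
} or do {
    $caught = $@;    # the same reference that was passed to die()
};

print "caught code $caught->{code}: $caught->{message}\n";
```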
=head2 Re-blessing in DESTROY() supported for chaining DESTROY() methods
See L<perlobj/Destructors>.
=head2 All C<printf> format conversions are handled internally
See L<perlfunc/printf>.
=head2 New C<INIT> keyword
C<INIT> subs are like C<BEGIN> and C<END>, but they get run just before
the perl runtime begins execution. For example, the Perl Compiler makes
use of C<INIT> blocks to initialize and resolve pointers to XSUBs.
=head2 New C<lock> keyword
The C<lock> keyword is the fundamental synchronization primitive
in threaded perl. When threads are not enabled, it is currently a no-op.
To minimize impact on source compatibility this keyword is "weak", i.e., any
user-defined subroutine of the same name overrides it, unless a C<use Thread>
has been seen.
=head2 New C<qr//> operator
The C<qr//> operator, which is syntactically similar to the other quote-like
operators, is used to create precompiled regular expressions. This compiled
form can now be explicitly passed around in variables, and interpolated in
other regular expressions. See L<perlop>.
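A brief sketch of both uses, storing a compiled pattern in a variable and interpolating it into a larger one (the patterns and sample strings are illustrative):

```perl
use strict;
use warnings;

my $number = qr/\d+(?:\.\d+)?/;                  # compiled pattern in a scalar
my $pair   = qr/($number)\s*,\s*($number)/;      # interpolated into another

my ($x, $y) = "point: 3.5, 12" =~ $pair;

# A qr// object keeps its own flags when interpolated.
my $word    = qr/perl/i;
my $matched = "I like PERL" =~ /like\s+$word/ ? 1 : 0;

print "x=$x y=$y matched=$matched\n";
```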
=head2 C<our> is now a reserved word
Calling a subroutine with the name C<our> will now provoke a warning when
using the C<-w> switch.
=head2 Tied arrays are now fully supported
See L<Tie::Array>.
=head2 Tied handles support is better
Several missing hooks have been added. There is also a new base class for
TIEHANDLE implementations. See L<Tie::Handle>.
=head2 4th argument to substr
substr() can now both return and replace in one operation. The optional
4th argument is the replacement string. See L<perlfunc/substr>.
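For instance, a minimal sketch with made-up strings:

```perl
use strict;
use warnings;

my $s   = "Hello, world";
# Replace 5 characters starting at offset 7, and get the old text back.
my $old = substr($s, 7, 5, "Perl");

print "old: $old\n";
print "new: $s\n";
```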
=head2 Negative LENGTH argument to splice
splice() with a negative LENGTH argument now works similarly to what the
LENGTH did for substr(). Previously a negative LENGTH was treated as
0. See L<perlfunc/splice>.
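A short sketch of the new behavior, using an arbitrary list: a LENGTH of -2 removes everything from OFFSET onward except the last two elements.

```perl
use strict;
use warnings;

my @a = (1 .. 6);
# Remove from index 1 onward, but leave the final 2 elements in place.
my @removed = splice(@a, 1, -2);

print "removed: @removed\n";
print "left:    @a\n";
```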
=head2 Magic lvalues are now more magical
When you say something like C<substr($x, 5) = "hi">, the scalar returned
by substr() is special, in that any modifications to it affect $x.
(This is called a 'magic lvalue' because an 'lvalue' is something on
the left side of an assignment.) Normally, this is exactly what you
would expect to happen, but Perl uses the same magic if you use substr(),
pos(), or vec() in a context where they might be modified, like taking
a reference with C<\> or as an argument to a sub that modifies C<@_>.
In previous versions, this 'magic' only went one way, but now changes
to the scalar the magic refers to ($x in the above example) affect the
magic lvalue too. For instance, this code now acts differently:
    $x = "hello";
    sub printit {
        $x = "g'bye";
        print $_[0], "\n";
    }
    printit(substr($x, 0, 5));
In previous versions, this would print "hello", but it now prints "g'bye".
=head2 <> now reads in records
If C<$/> is a reference to an integer, or a scalar that holds an integer,
<> will read in records instead of lines. For more info, see
L<perlvar/$E<sol>>.
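A minimal sketch of record-oriented reading. It uses an in-memory filehandle for self-containment, which is an assumption of the example (that feature arrived in later perls than the one described here); a real file behaves the same way.

```perl
use strict;
use warnings;

my $data = "abcdefghij";
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";

my @records;
{
    local $/ = \4;               # read fixed-size records of 4 bytes
    while (my $rec = <$fh>) {
        push @records, $rec;
    }
}
close $fh;

print join("|", @records), "\n";
```

The final record may be shorter than the requested size when the input length is not a multiple of it.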
=head1 Supported Platforms
Configure has many incremental improvements. Site-wide policy for building
perl can now be made persistent, via Policy.sh. Configure also records
the command-line arguments used in F<config.sh>.
=head2 New Platforms
BeOS is now supported. See F<README.beos>.
DOS is now supported under the DJGPP tools. See F<README.dos> (installed
as L<perldos> on some systems).
MiNT is now supported. See F<README.mint>.
MPE/iX is now supported. See F<README.mpeix>.
MVS (aka OS390, aka Open Edition) is now supported. See F<README.os390>
(installed as L<perlos390> on some systems).
Stratus VOS is now supported. See F<README.vos>.
=head2 Changes in existing support
Win32 support has been vastly enhanced. There is now support for Perl
Object, a C++ encapsulation of Perl. GCC and EGCS are now supported on Win32.
See F<README.win32>, aka L<perlwin32>.
VMS configuration system has been rewritten. See F<README.vms> (installed
as F<README_vms> on some systems).
The hints files for most Unix platforms have seen incremental improvements.
=head1 Modules and Pragmata
=head2 New Modules
=over 4
=item B
Perl compiler and tools. See L<B>.
=item Data::Dumper
A module to pretty print Perl data. See L<Data::Dumper>.
=item Dumpvalue
A module to dump perl values to the screen. See L<Dumpvalue>.
=item Errno
A module to look up errors more conveniently. See L<Errno>.
=item File::Spec
A portable API for file operations.
=item ExtUtils::Installed
Query and manage installed modules.
=item ExtUtils::Packlist
Manipulate .packlist files.
=item Fatal
Make functions/builtins succeed or die.
=item IPC::SysV
Constants and other support infrastructure for System V IPC operations
in perl.
=item Test
A framework for writing test suites.
=item Tie::Array
Base class for tied arrays.
=item Tie::Handle
Base class for tied handles.
=item Thread
Perl thread creation, manipulation, and support.
=item attrs
Set subroutine attributes.
=item fields
Compile-time class fields.
=item re
Various pragmata to control behavior of regular expressions.
=back
=head2 Changes in existing modules
=over 4
=item Benchmark
You can now run tests for I<x> seconds instead of guessing the right
number of tests to run.
Keeps better time.
=item Carp
Carp has a new function cluck(). cluck() warns, like carp(), but also adds
a stack backtrace to the error message, like confess().
=item CGI
CGI has been updated to version 2.42.
=item Fcntl
More Fcntl constants added: F_SETLK64, F_SETLKW64, O_LARGEFILE for
large (more than 4G) file access (the 64-bit support is not yet
working, though, so no need to get overly excited), Free/Net/OpenBSD
locking behaviour flags F_FLOCK, F_POSIX, Linux F_SHLCK, and
O_ACCMODE: the mask of O_RDONLY, O_WRONLY, and O_RDWR.
=item Math::Complex
The accessor methods Re, Im, arg, abs, rho, and theta
($z->Re()) can now also act as mutators ($z->Re(3)).
=item Math::Trig
A little bit of radial trigonometry (cylindrical and spherical) added,
for example the great circle distance.
=item POSIX
POSIX now has its own platform-specific hints files.
=item DB_File
DB_File supports version 2.x of Berkeley DB. See C<ext/DB_File/Changes>.
=item MakeMaker
MakeMaker now supports writing empty makefiles, and provides a way to
specify that site umask() policy should be honored. There is also
better support for manipulation of .packlist files, and getting
information about installed modules.
Extensions that have both architecture-dependent and
architecture-independent files are now always installed completely in
the architecture-dependent locations. Previously, the shareable parts
were shared both across architectures and across perl versions and were
therefore liable to be overwritten with newer versions that might have
subtle incompatibilities.
=item CPAN
See L<perlmodinstall> and L<CPAN>.
=item Cwd
Cwd::cwd is faster on most platforms.
=back
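The C<Math::Complex> change in the list above, accessors doubling as mutators, can be sketched like this (the values are arbitrary):

```perl
use strict;
use warnings;
use Math::Complex;

my $z = cplx(1, 2);          # 1 + 2i
my $re_before = $z->Re;      # called without an argument: accessor
$z->Re(3);                   # called with an argument: mutator
my $re_after = $z->Re;
my $im_after = $z->Im;       # the imaginary part is left alone

print "Re before: $re_before, after: $re_after, Im: $im_after\n";
```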
=head1 Utility Changes
C<h2ph> and related utilities have been vastly overhauled.
C<perlcc>, a new experimental front end for the compiler, is available.
The crude GNU C<configure> emulator is now called C<configure.gnu> to
avoid trampling on C<Configure> under case-insensitive filesystems.
C<perldoc> used to be rather slow. The slower features are now optional.
In particular, case-insensitive searches need the C<-i> switch, and
recursive searches need C<-r>. You can set these switches in the
C<PERLDOC> environment variable to get the old behavior.
=head1 Documentation Changes
Config.pm now has a glossary of variables.
F<Porting/patching.pod> has detailed instructions on how to create and
submit patches for perl.
L<perlport> specifies guidelines on how to write portably.
L<perlmodinstall> describes how to fetch and install modules from C<CPAN>
sites.
Some more Perl traps are documented now. See L<perltrap>.
L<perlopentut> gives a tutorial on using open().
L<perlreftut> gives a tutorial on references.
L<perlthrtut> gives a tutorial on threads.
=head1 New Diagnostics
=over 4
=item Ambiguous call resolved as CORE::%s(), qualify as such or use &
(W) A subroutine you have declared has the same name as a Perl keyword,
and you have used the name without qualification for calling one or the
other. Perl decided to call the builtin because the subroutine is
not imported.
To force interpretation as a subroutine call, either put an ampersand
before the subroutine name, or qualify the name with its package.
Alternatively, you can import the subroutine (or pretend that it's
imported with the C<use subs> pragma).
To silently interpret it as the Perl operator, use the C<CORE::> prefix
on the operator (e.g. C<CORE::log($x)>) or declare the subroutine
to be an object method (see L</attrs>).
=item Bad index while coercing array into hash
(F) The index looked up in the hash found as the 0'th element of a
pseudo-hash is not legal. Index values must be 1 or greater.
See L<perlref>.
=item Bareword "%s" refers to nonexistent package
(W) You used a qualified bareword of the form C<Foo::>, but
the compiler saw no other uses of that namespace before that point.
Perhaps you need to predeclare a package?
=item Can't call method "%s" on an undefined value
(F) You used the syntax of a method call, but the slot filled by the
object reference or package name contains an undefined value.
Something like this will reproduce the error:
    $BADREF = undef;
    process $BADREF 1,2,3;
    $BADREF->process(1,2,3);
=item Can't check filesystem of script "%s" for nosuid
(P) For some reason you can't check the filesystem of the script for nosuid.
=item Can't coerce array into hash
(F) You used an array where a hash was expected, but the array has no
information on how to map from keys to array indices. You can do that
only with arrays that have a hash reference at index 0.
=item Can't goto subroutine from an eval-string
(F) The "goto subroutine" call can't be used to jump out of an eval "string".
(You can use it to jump out of an eval {BLOCK}, but you probably don't want to.)
=item Can't localize pseudo-hash element
(F) You said something like C<< local $ar->{'key'} >>, where $ar is
a reference to a pseudo-hash. That hasn't been implemented yet, but
you can get a similar effect by localizing the corresponding array
element directly: C<< local $ar->[$ar->[0]{'key'}] >>.
=item Can't use %%! because Errno.pm is not available
(F) The first time the %! hash is used, perl automatically loads the
Errno.pm module. The Errno module is expected to tie the %! hash to
provide symbolic names for C<$!> errno values.
=item Cannot find an opnumber for "%s"
(F) A string of a form C<CORE::word> was given to prototype(), but
there is no builtin with the name C<word>.
=item Character class syntax [. .] is reserved for future extensions
(W) Within regular expression character classes ([]) the syntax beginning
with "[." and ending with ".]" is reserved for future extensions.
If you need to represent those character sequences inside a regular
expression character class, just quote the square brackets with the
backslash: "\[." and ".\]".
=item Character class syntax [: :] is reserved for future extensions
(W) Within regular expression character classes ([]) the syntax beginning
with "[:" and ending with ":]" is reserved for future extensions.
If you need to represent those character sequences inside a regular
expression character class, just quote the square brackets with the
backslash: "\[:" and ":\]".
=item Character class syntax [= =] is reserved for future extensions
(W) Within regular expression character classes ([]) the syntax
beginning with "[=" and ending with "=]" is reserved for future extensions.
If you need to represent those character sequences inside a regular
expression character class, just quote the square brackets with the
backslash: "\[=" and "=\]".
=item %s: Eval-group in insecure regular expression
(F) Perl detected tainted data when trying to compile a regular expression
that contains the C<(?{ ... })> zero-width assertion, which is unsafe.
See L<perlre/(?{ code })>, and L<perlsec>.
=item %s: Eval-group not allowed, use re 'eval'
(F) A regular expression contained the C<(?{ ... })> zero-width assertion,
but that construct is only allowed when the C<use re 'eval'> pragma is
in effect. See L<perlre/(?{ code })>.
=item %s: Eval-group not allowed at run time
(F) Perl tried to compile a regular expression containing the C<(?{ ... })>
zero-width assertion at run time, as it would when the pattern contains
interpolated values. Since that is a security risk, it is not allowed.
If you insist, you may still do this by explicitly building the pattern
from an interpolated string at run time and using that in an eval().
See L<perlre/(?{ code })>.
=item Explicit blessing to '' (assuming package main)
(W) You are blessing a reference to a zero length string. This has
the effect of blessing the reference into the package main. This is
usually not what you want. Consider providing a default target
package, e.g. bless($ref, $p || 'MyPackage');
=item Illegal hex digit ignored
(W) You may have tried to use a character other than 0 - 9 or A - F in a
hexadecimal number. Interpretation of the hexadecimal number stopped
before the illegal character.
=item No such array field
(F) You tried to access an array as a hash, but the field name used is
not defined. The hash at index 0 should map all valid field names to
array indices for that to work.
=item No such field "%s" in variable %s of type %s
(F) You tried to access a field of a typed variable where the type
does not know about the field name. The field names are looked up in
the %FIELDS hash in the type package at compile time. The %FIELDS hash
is usually set up with the 'fields' pragma.
=item Out of memory during ridiculously large request
(F) You can't allocate more than 2^31+"small amount" bytes. This error
is most likely to be caused by a typo in the Perl program, e.g.
C<$arr[time]> instead of C<$arr[$time]>.
=item Range iterator outside integer range
(F) One (or both) of the numeric arguments to the range operator ".."
are outside the range which can be represented by integers internally.
One possible workaround is to force Perl to use magical string
increment by prepending "0" to your numbers.
=item Recursive inheritance detected while looking for method '%s' %s
(F) More than 100 levels of inheritance were encountered while invoking a
method. Probably indicates an unintended loop in your inheritance hierarchy.
=item Reference found where even-sized list expected
(W) You gave a single reference where Perl was expecting a list with
an even number of elements (for assignment to a hash). This
usually means that you used the anon hash constructor when you meant
to use parens. In any case, a hash requires key/value B<pairs>.
    %hash = { one => 1, two => 2, };   # WRONG
    %hash = [ qw/ an anon array / ];   # WRONG
    %hash = ( one => 1, two => 2, );   # right
    %hash = qw( one 1 two 2 );         # also fine
=item Undefined value assigned to typeglob
(W) An undefined value was assigned to a typeglob, a la C<*foo = undef>.
This does nothing. It's possible that you really mean C<undef *foo>.
=item Use of reserved word "%s" is deprecated
(D) The indicated bareword is a reserved word. Future versions of perl
may use it as a keyword, so you're better off either explicitly quoting
the word in a manner appropriate for its context of use, or using a
different name altogether. The warning can be suppressed for subroutine
names by either adding a C<&> prefix, or using a package qualifier,
e.g. C<&our()>, or C<Foo::our()>.
=item perl: warning: Setting locale failed.
(S) The whole warning message will look something like:
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LC_ALL = "En_US",
            LANG = (unset)
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
Exactly what the failed locale settings were varies. In the above, the
settings were that LC_ALL was "En_US" and LANG had no value.
This error means that Perl detected that you and/or your system
administrator have set up the so-called locale system but Perl could
not use those settings. This is not dead serious, fortunately: there
is a "default locale" called "C" that Perl can and will use, so the
script will still run. Until you really fix the problem, however, you
will get the same error message each time you run Perl. How to really
fix the problem can be found in L<perllocale/"LOCALE PROBLEMS">.
=back
=head1 Obsolete Diagnostics
=over 4
=item Can't mktemp()
(F) The mktemp() routine failed for some reason while trying to process
a B<-e> switch. Maybe your /tmp partition is full, or clobbered.
Removed because B<-e> doesn't use temporary files any more.
=item Can't write to temp file for B<-e>: %s
(F) The write routine failed for some reason while trying to process
a B<-e> switch. Maybe your /tmp partition is full, or clobbered.
Removed because B<-e> doesn't use temporary files any more.
=item Cannot open temporary file
(F) The create routine failed for some reason while trying to process
a B<-e> switch. Maybe your /tmp partition is full, or clobbered.
Removed because B<-e> doesn't use temporary files any more.
=item regexp too big
(F) The current implementation of regular expressions uses shorts as
address offsets within a string. Unfortunately this means that if
the regular expression compiles to longer than 32767, it'll blow up.
Usually when you want a regular expression this big, there is a better
way to do it with multiple statements. See L<perlre>.
=back
=head1 Configuration Changes
You can use "Configure -Uinstallusrbinperl", which causes installperl
to skip installing perl as /usr/bin/perl as well. This is useful if you
prefer not to modify /usr/bin for some reason or another, but it can be
harmful because many scripts assume that Perl lives at /usr/bin/perl.
=head1 BUGS
If you find what you think is a bug, you might check the headers of
recently posted articles in the comp.lang.perl.misc newsgroup.
There may also be information at http://www.perl.com/perl/ , the Perl
Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Make sure you trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to <F<perlbug@perl.com>> to be
analysed by the Perl porting team.
=head1 SEE ALSO
The F<Changes> file for exhaustive details on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=head1 HISTORY
Written by Gurusamy Sarathy <F<gsar@activestate.com>>, with many contributions
from The Perl Porters.
Send omissions or corrections to <F<perlbug@perl.com>>.
=cut
=head1 NAME
perlport - Writing portable Perl
=head1 DESCRIPTION
Perl runs on numerous operating systems. While most of them share
much in common, they also have their own unique features.
This document is meant to help you to find out what constitutes portable
Perl code. That way once you make a decision to write portably,
you know where the lines are drawn, and you can stay within them.
There is a tradeoff between taking full advantage of one particular
type of computer and taking advantage of a full range of them.
Naturally, as you broaden your range and become more diverse, the
common factors drop, and you are left with an increasingly smaller
area of common ground in which you can operate to accomplish a
particular task. Thus, when you begin attacking a problem, it is
important to consider under which part of the tradeoff curve you
want to operate. Specifically, you must decide whether it is
important that the task that you are coding has the full generality
of being portable, or whether to just get the job done right now.
This is the hardest choice to be made. The rest is easy, because
Perl provides many choices, whichever way you want to approach your
problem.
Looking at it another way, writing portable code is usually about
willfully limiting your available choices. Naturally, it takes
discipline and sacrifice to do that. The product of portability
and convenience may be a constant. You have been warned.
Be aware of two important points:
=over 4
=item Not all Perl programs have to be portable
There is no reason you should not use Perl as a language to glue Unix
tools together, or to prototype a Macintosh application, or to manage the
Windows registry. If it makes no sense to aim for portability for one
reason or another in a given program, then don't bother.
=item Nearly all of Perl already I<is> portable
Don't be fooled into thinking that it is hard to create portable Perl
code. It isn't. Perl tries its level-best to bridge the gaps between
what's available on different platforms, and all the means available to
use those features. Thus almost all Perl code runs on any machine
without modification. But there are some significant issues in
writing portable code, and this document is entirely about those issues.
=back
Here's the general rule: When you approach a task commonly done
using a whole range of platforms, think about writing portable
code. That way, you don't sacrifice much by way of the implementation
choices you can avail yourself of, and at the same time you can give
your users lots of platform choices. On the other hand, when you have to
take advantage of some unique feature of a particular platform, as is
often the case with systems programming (whether for Unix, Windows,
VMS, etc.), consider writing platform-specific code.
When the code will run on only two or three operating systems, you
may need to consider only the differences of those particular systems.
The important thing is to decide where the code will run and to be
deliberate in your decision.
The material below is separated into three main sections: main issues of
portability (L</"ISSUES">), platform-specific issues (L</"PLATFORMS">), and
built-in Perl functions that behave differently on various ports
(L</"FUNCTION IMPLEMENTATIONS">).
This information should not be considered complete; it includes possibly
transient information about idiosyncrasies of some of the ports, almost
all of which are in a state of constant evolution. Thus, this material
should be considered a perpetual work in progress
(C<< <IMG SRC="yellow_sign.gif" ALT="Under Construction"> >>).
=head1 ISSUES
=head2 Newlines
In most operating systems, lines in files are terminated by newlines.
Just what is used as a newline may vary from OS to OS. Unix
traditionally uses C<\012>, one type of DOSish I/O uses C<\015\012>,
S<Mac OS> uses C<\015>, and z/OS uses C<\025>.
Perl uses C<\n> to represent the "logical" newline, where what is
logical may depend on the platform in use. In MacPerl, C<\n> always
means C<\015>. On EBCDIC platforms, C<\n> could be C<\025> or C<\045>.
In DOSish perls, C<\n> usually means C<\012>, but when
accessing a file in "text" mode, perl uses the C<:crlf> layer that
translates it to (or from) C<\015\012>, depending on whether you're
reading or writing. Unix does the same thing on ttys in canonical
mode. C<\015\012> is commonly referred to as CRLF.
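The C<:crlf> layer can also be requested explicitly, which makes the
translation visible on any platform. The following is only a sketch: the
helper name C<crlf_bytes> is made up for illustration, and the scratch file
comes from L<C<File::Temp>|File::Temp>.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the :crlf layer, then
# return the bytes that actually landed in the file.
sub crlf_bytes {
    my ($text) = @_;
    my ($tmp, $name) = tempfile(UNLINK => 1);
    close $tmp or die "close: $!";

    open(my $out, '>:crlf', $name) or die "open for write: $!";
    print {$out} $text;
    close $out or die "close: $!";

    open(my $in, '<:raw', $name) or die "open for read: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

my $bytes = crlf_bytes("one\ntwo\n");
printf "%vd\n", $bytes;            # byte values, dot-separated
```

On an ASCII platform each logical C<\n> should show up as the byte pair
13.10 in the printed list.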
To trim trailing newlines from text lines use
L<C<chomp>|perlfunc/chomp VARIABLE>. With default settings that function
looks for a trailing C<\n> character and thus trims in a portable way.
When dealing with binary files (or text files in binary mode) be sure
to explicitly set L<C<$E<sol>>|perlvar/$E<sol>> to the appropriate value for
your file format before using L<C<chomp>|perlfunc/chomp VARIABLE>.
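As a sketch of that advice, assuming the records are known to end in CRLF
(the helper name C<chomp_crlf> is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: strip one trailing CRLF from each record,
# regardless of what "\n" means on the local platform.
sub chomp_crlf {
    my @records = @_;
    local $/ = "\015\012";   # chomp removes whatever $/ currently holds
    chomp @records;
    return @records;
}

my @clean = chomp_crlf("alpha\015\012", "beta\015\012", "gamma");
print join('|', @clean), "\n";
```

Because C<$/> is localized inside the helper, the caller's record
separator is left alone.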
Because of the "text" mode translation, DOSish perls have limitations in
using L<C<seek>|perlfunc/seek FILEHANDLE,POSITION,WHENCE> and
L<C<tell>|perlfunc/tell FILEHANDLE> on a file accessed in "text" mode.
Stick to L<C<seek>|perlfunc/seek FILEHANDLE,POSITION,WHENCE>-ing to
locations you got from L<C<tell>|perlfunc/tell FILEHANDLE> (and no
others), and you are usually free to use
L<C<seek>|perlfunc/seek FILEHANDLE,POSITION,WHENCE> and
L<C<tell>|perlfunc/tell FILEHANDLE> even in "text" mode. Using
L<C<seek>|perlfunc/seek FILEHANDLE,POSITION,WHENCE> or
L<C<tell>|perlfunc/tell FILEHANDLE> or other file operations may be
non-portable. If you use L<C<binmode>|perlfunc/binmode FILEHANDLE> on a
file, however, you can usually
L<C<seek>|perlfunc/seek FILEHANDLE,POSITION,WHENCE> and
L<C<tell>|perlfunc/tell FILEHANDLE> with arbitrary values safely.
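A minimal sketch of the C<binmode> approach follows. The helper name
C<read_at> and the sample data are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read $length bytes starting at byte $offset.
sub read_at {
    my ($path, $offset, $length) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    binmode($fh);                                    # no newline translation
    seek($fh, $offset, 0) or die "cannot seek: $!";  # 0 = from start of file
    my $buf = '';
    my $got = read($fh, $buf, $length);
    defined $got or die "cannot read: $!";
    close $fh;
    return $buf;
}

# Build a small sample file in binary mode, then fetch three bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "0123456789";
close $tmp or die "close: $!";
print read_at($path, 4, 3), "\n";
```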
A common misconception in socket programming is that S<C<\n eq \012>>
everywhere. When using protocols such as common Internet protocols,
C<\012> and C<\015> are called for specifically, and the values of
the logical C<\n> and C<\r> (carriage return) are not reliable.
print $socket "Hi there, client!\r\n"; # WRONG
print $socket "Hi there, client!\015\012"; # RIGHT
However, using C<\015\012> (or C<\cM\cJ>, or C<\x0D\x0A>) can be tedious
and unsightly, as well as confusing to those maintaining the code. As
such, the L<C<Socket>|Socket> module supplies the Right Thing for those
who want it.
use Socket qw(:DEFAULT :crlf);
print $socket "Hi there, client!$CRLF"; # RIGHT
When reading from a socket, remember that the default input record
separator L<C<$E<sol>>|perlvar/$E<sol>> is C<\n>, but robust socket code
will recognize either C<\012> or C<\015\012> as end of line:
while (<$socket>) { # NOT ADVISABLE!
# ...
}
Because both CRLF and LF end in LF, the input record separator can
be set to LF and any CR stripped later. Better to write:
use Socket qw(:DEFAULT :crlf);
local($/) = LF; # not needed if $/ is already \012
while (<$socket>) {
s/$CR?$LF/\n/; # not sure if socket uses LF or CRLF, OK
# s/\015?\012/\n/; # same thing
}
This example is preferred over the previous one--even for Unix
platforms--because now any C<\015>'s (C<\cM>'s) are stripped out
(and there was much rejoicing).
Similarly, functions that return text data--such as a function that
fetches a web page--should sometimes translate newlines before
returning the data, if they've not yet been translated to the local
newline representation. A single line of code will often suffice:
$data =~ s/\015?\012/\n/g;
return $data;
Some of this may be confusing. Here's a handy reference to the ASCII CR
and LF characters. You can print it out and stick it in your wallet.
    LF  eq  \012  eq  \x0A  eq  \cJ  eq  chr(10)  eq  ASCII 10
    CR  eq  \015  eq  \x0D  eq  \cM  eq  chr(13)  eq  ASCII 13

         | Unix | DOS  | Mac  |
    ---------------------------
    \n   |  LF  |  LF  |  CR  |
    \r   |  CR  |  CR  |  LF  |
    \n * |  LF  | CRLF |  CR  |
    \r * |  CR  |  CR  |  LF  |
    ---------------------------
    * text-mode STDIO
The Unix column assumes that you are not accessing a serial line
(like a tty) in canonical mode. If you are, then CR on input becomes
"\n", and "\n" on output becomes CRLF.
These are just the most common definitions of C<\n> and C<\r> in Perl.
There may well be others. For example, on an EBCDIC implementation
such as z/OS (OS/390) or OS/400 (using the ILE, the PASE is ASCII-based)
the above material is similar to "Unix" but the code numbers change:
    LF  eq  \025  eq  \x15  eq  \cU  eq  chr(21)  eq  CP-1047 21
    LF  eq  \045  eq  \x25  eq       eq  chr(37)  eq  CP-0037 37
    CR  eq  \015  eq  \x0D  eq  \cM  eq  chr(13)  eq  CP-1047 13
    CR  eq  \015  eq  \x0D  eq  \cM  eq  chr(13)  eq  CP-0037 13

         | z/OS | OS/400 |
    ----------------------
    \n   |  LF  |  LF    |
    \r   |  CR  |  CR    |
    \n * |  LF  |  LF    |
    \r * |  CR  |  CR    |
    ----------------------
    * text-mode STDIO
=head2 Numbers endianness and Width
Different CPUs store integers and floating point numbers in different
orders (called I<endianness>) and widths (32-bit and 64-bit being the
most common today). This affects your programs when they attempt to transfer
numbers in binary format from one CPU architecture to another,
usually either "live" via network connection, or by storing the
numbers to secondary storage such as a disk file or tape.
Conflicting storage orders make an utter mess out of the numbers. If a
little-endian host (Intel, VAX) stores 0x12345678 (305419896 in
decimal), a big-endian host (Motorola, Sparc, PA) reads it as
0x78563412 (2018915346 in decimal). Alpha and MIPS can be either:
Digital/Compaq used/uses them in little-endian mode; SGI/Cray uses
them in big-endian mode. To avoid this problem in network (socket)
connections use the L<C<pack>|perlfunc/pack TEMPLATE,LIST> and
L<C<unpack>|perlfunc/unpack TEMPLATE,EXPR> formats C<n> and C<N>, the
"network" orders. These are guaranteed to be portable.
As of Perl 5.10.0, you can also use the C<E<gt>> and C<E<lt>> modifiers
to force big- or little-endian byte-order. This is useful if you want
to store signed integers or 64-bit integers, for example.
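To illustrate, here is a sketch of a small wire format that mixes the
C<n> and C<N> formats with the C<E<gt>> modifier. The record layout and
the names C<encode_record> and C<decode_record> are invented for this
example; they are not part of any real protocol.

```perl
use strict;
use warnings;

# Encode a record as: 16-bit payload length, 32-bit unsigned id,
# 32-bit signed delta, all big-endian, followed by the payload bytes.
sub encode_record {
    my ($id, $delta, $payload) = @_;
    return pack('n N l> a*', length($payload), $id, $delta, $payload);
}

sub decode_record {
    my ($bytes) = @_;
    my ($len, $id, $delta) = unpack('n N l>', $bytes);
    my $payload = substr($bytes, 10, $len);   # 2 + 4 + 4 header bytes
    return ($id, $delta, $payload);
}

my $wire = encode_record(0x12345678, -2, 'hello');
print unpack('H*', $wire), "\n";   # 000512345678fffffffe68656c6c6f
```

The hex dump should be the same on little-endian and big-endian hosts,
which is the point of spelling out the byte order in the template.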
You can explore the endianness of your platform by unpacking a
data structure packed in native format such as:
print unpack("h*", pack("s2", 1, 2)), "\n";
# '10002000' on e.g. Intel x86 or Alpha 21064 in little-endian mode
# '00100020' on e.g. Motorola 68040
If you need to distinguish between endian architectures you could use
either of the variables set like so:
$is_big_endian = unpack("h*", pack("s", 1)) =~ /01/;
$is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
Differing widths can cause truncation even between platforms of equal
endianness. The platform of shorter width loses the upper parts of the
number. There is no good solution for this problem except to avoid
transferring or storing raw binary numbers.
One can circumvent both these problems in two ways. Either
transfer and store numbers always in text format, instead of raw
binary, or else consider using modules like
L<C<Data::Dumper>|Data::Dumper> and L<C<Storable>|Storable> (included as
of Perl 5.8). Keeping all data as text significantly simplifies matters.
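Both approaches can be sketched briefly. The sample data below is made
up, and C<nfreeze> is used on the assumption that you want
L<C<Storable>|Storable>'s network byte order rather than the native one.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

my %reading = (sensor => 'probe-7', count => 305419896, ratio => 0.25);

# Option 1: plain text, one "key=value" pair per line.
my $as_text = join '', map { "$_=$reading{$_}\n" } sort keys %reading;

# Option 2: Storable, serialized in network (portable) byte order.
my $frozen = nfreeze(\%reading);
my $copy   = thaw($frozen);

print $as_text;
print "round trip ok\n" if $copy->{count} == $reading{count};
```

The text form is the easier one to inspect and debug by hand.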
=head2 Files and Filesystems
Most platforms these days structure files in a hierarchical fashion.
So, it is reasonably safe to assume that all platforms support the
notion of a "path" to uniquely identify a file on the system. How
that path is really written, though, differs considerably.
Although similar, file path specifications differ between Unix,
Windows, S<Mac OS>, OS/2, VMS, VOS, S<RISC OS>, and probably others.
Unix, for example, is one of the few OSes that has the elegant idea
of a single root directory.
DOS, OS/2, VMS, VOS, and Windows can work similarly to Unix with C</>
as path separator, or in their own idiosyncratic ways (such as having
several root directories and various "unrooted" device files such as NIL:
and LPT:).
S<Mac OS> 9 and earlier used C<:> as a path separator instead of C</>.
The filesystem may support neither hard links
(L<C<link>|perlfunc/link OLDFILE,NEWFILE>) nor symbolic links
(L<C<symlink>|perlfunc/symlink OLDFILE,NEWFILE>,
L<C<readlink>|perlfunc/readlink EXPR>,
L<C<lstat>|perlfunc/lstat FILEHANDLE>).
The filesystem may support neither access timestamp nor change
timestamp (meaning that about the only portable timestamp is the
modification timestamp), or one second granularity of any timestamps
(e.g. the FAT filesystem limits the time granularity to two seconds).
The "inode change timestamp" (the L<C<-C>|perlfunc/-X FILEHANDLE>
filetest) may really be the "creation timestamp" (which it is not in
Unix).
VOS perl can emulate Unix filenames with C</> as path separator. The
native pathname characters greater-than, less-than, number-sign, and
percent-sign are always accepted.
S<RISC OS> perl can emulate Unix filenames with C</> as path
separator, or go native and use C<.> for path separator and C<:> to
signal filesystems and disk names.
Don't assume Unix filesystem access semantics: that read, write,
and execute are all the permissions there are, and even if they exist,
that their semantics (for example what do C<r>, C<w>, and C<x> mean on
a directory) are the Unix ones. The various Unix/POSIX compatibility
layers usually try to make interfaces like L<C<chmod>|perlfunc/chmod LIST>
work, but sometimes there simply is no good mapping.
The L<C<File::Spec>|File::Spec> modules provide methods to manipulate path
specifications and return the results in native format for each
platform. This is often unnecessary as Unix-style paths are
understood by Perl on every supported platform, but if you need to
produce native paths for a native utility that does not understand
Unix syntax, or if you are operating on paths or path components
in unknown (and thus possibly native) syntax, L<C<File::Spec>|File::Spec>
is your friend. Here are two brief examples:
use File::Spec::Functions;
chdir(updir()); # go up one directory
# Concatenate a path from its components
my $file = catfile(updir(), 'temp', 'file.txt');
# on Unix: '../temp/file.txt'
# on Win32: '..\temp\file.txt'
# on VMS: '[-.temp]file.txt'
In general, production code should not have file paths hardcoded.
Making them user-supplied or read from a configuration file is
better, keeping in mind that file path syntax varies on different
machines.
This is especially noticeable in scripts like Makefiles and test suites,
which often assume C</> as a path separator for subdirectories.
Also of use is L<C<File::Basename>|File::Basename> from the standard
distribution, which splits a pathname into pieces (base filename, full
path to directory, and file suffix).
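A short sketch of C<fileparse> in use. The wrapper C<split_path> and the
sample path are illustrative only, and the suffix pattern is one choice
among many.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Split a path in the local platform's syntax into its three pieces.
# The suffix pattern here matches one trailing ".ext" component.
sub split_path {
    my ($path) = @_;
    my ($base, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);
    return ($dir, $base, $suffix);
}

my ($dir, $base, $suffix) = split_path('lib/My/Module.pm');
print "dir=$dir base=$base suffix=$suffix\n";
```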
Even when on a single platform (if you can call Unix a single platform),
remember not to count on the existence or the contents of particular
system-specific files or directories, like F</etc/passwd>,
F</etc/sendmail.conf>, F</etc/resolv.conf>, or even F</tmp/>. For
example, F</etc/passwd> may exist but not contain the encrypted
passwords, because the system is using some form of enhanced security.
Or it may not contain all the accounts, because the system is using NIS.
If code does need to rely on such a file, include a description of the
file and its format in the code's documentation, then make it easy for
the user to override the default location of the file.
Don't assume a text file will end with a newline. They should,
but people forget.
Do not have two files or directories of the same name with different
case, like F<test.pl> and F<Test.pl>, as many platforms have
case-insensitive (or at least case-forgiving) filenames. Also, try
not to have non-word characters (except for C<.>) in the names, and
keep them to the 8.3 convention, for maximum portability, onerous a
burden though this may appear.
Likewise, when using the L<C<AutoSplit>|AutoSplit> module, try to keep
your functions to 8.3 naming and case-insensitive conventions; or, at the
least, make it so the resulting files have a unique (case-insensitively)
first 8 characters.
Whitespace in filenames is tolerated on most systems, but not all,
and even on systems where it might be tolerated, some utilities
might become confused by such whitespace.
Many systems (DOS, VMS ODS-2) cannot have more than one C<.> in their
filenames.
Don't assume C<< > >> won't be the first character of a filename.
Always use the three-arg version of
L<C<open>|perlfunc/open FILEHANDLE,EXPR>:
open(my $fh, '<', $existing_file) or die $!;
Two-arg L<C<open>|perlfunc/open FILEHANDLE,EXPR> is magic and can
translate characters like C<< > >>, C<< < >>, and C<|> in filenames,
which is usually the wrong thing to do.
L<C<sysopen>|perlfunc/sysopen FILEHANDLE,FILENAME,MODE> and three-arg
L<C<open>|perlfunc/open FILEHANDLE,EXPR> don't have this problem.
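For completeness, a sketch of the C<sysopen> form. The helper
C<create_new> and the file name are hypothetical; the flags come from
the L<C<Fcntl>|Fcntl> module.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: create $path for writing, failing if it exists.
# The name is passed through untouched, even if it starts with '>' or '|'.
sub create_new {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL)
        or die "cannot create $path: $!";
    return $fh;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'report.txt');
my $fh   = create_new($path);
print {$fh} "created\n";
close $fh or die "close: $!";
```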
Don't use C<:> as a part of a filename since many systems use that for
their own semantics (Mac OS Classic for separating pathname components,
many networking schemes and utilities for separating the nodename and
the pathname, and so on). For the same reasons, avoid C<@>, C<;> and
C<|>.
Don't assume that in pathnames you can collapse two leading slashes
C<//> into one: some networking and clustering filesystems have special
semantics for that. Let the operating system sort it out.
The I<portable filename characters> as defined by ANSI C are
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
. _ -
and C<-> shouldn't be the first character. If you want to be
hypercorrect, stay case-insensitive and within the 8.3 naming
convention (all the files and directories have to be unique within one
directory if their names are lowercased and truncated to eight
characters before the C<.>, if any, and to three characters after the
C<.>, if any). (And do not use C<.>s in directory names.)
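These rules are easy to check mechanically. The sketch below covers only
the character set, the leading C<-> and the 8.3 shape of a single file
name; it does not check uniqueness within a directory. The function name
C<is_portable_name> is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name uses only the portable filename
# characters, does not start with '-', and fits the 8.3 convention.
sub is_portable_name {
    my ($name) = @_;
    return 0 if $name =~ /^-/;
    return $name =~ /\A[A-Za-z0-9_-]{1,8}(?:\.[A-Za-z0-9_-]{1,3})?\z/ ? 1 : 0;
}

for my $name (qw(README.txt -rf.txt longfilename.html a.b.c)) {
    printf "%-18s %s\n", $name, is_portable_name($name) ? 'ok' : 'avoid';
}
```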
=head2 System Interaction
Not all platforms provide a command line. These are usually platforms
that rely primarily on a Graphical User Interface (GUI) for user
interaction. A program requiring a command line interface might
not work everywhere. This is probably for the user of the program
to deal with, so don't stay up late worrying about it.
Some platforms can't delete or rename files held open by the system;
this limitation may also apply to changing filesystem metainformation
like file permissions or owners. Remember to
L<C<close>|perlfunc/close FILEHANDLE> files when you are done with them.
Don't L<C<unlink>|perlfunc/unlink LIST> or
L<C<rename>|perlfunc/rename OLDNAME,NEWNAME> an open file. Don't
L<C<tie>|perlfunc/tie VARIABLE,CLASSNAME,LIST> or
L<C<open>|perlfunc/open FILEHANDLE,EXPR> a file already tied or opened;
L<C<untie>|perlfunc/untie VARIABLE> or
L<C<close>|perlfunc/close FILEHANDLE> it first.
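A sketch of the close-before-rename ordering. The helper name
C<write_then_rename> and the file names are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: write $text to $scratch, close it, and only
# then rename it to $final.
sub write_then_rename {
    my ($scratch, $final, $text) = @_;
    open(my $fh, '>', $scratch) or die "cannot open $scratch: $!";
    print {$fh} $text;
    close $fh or die "cannot close $scratch: $!";          # close first ...
    rename($scratch, $final) or die "cannot rename: $!";   # ... then rename
    return $final;
}

my $dir = tempdir(CLEANUP => 1);
my $out = write_then_rename(
    File::Spec->catfile($dir, 'data.new'),
    File::Spec->catfile($dir, 'data.txt'),
    "done\n",
);
print "wrote $out\n";
```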
Don't open the same file more than once at a time for writing, as some
operating systems put mandatory locks on such files.
Don't assume that write/modify permission on a directory gives the
right to add or delete files/directories in that directory. That is
filesystem specific: in some filesystems you need write/modify
permission also (or even just) in the file/directory itself. In some
filesystems (AFS, DFS) the permission to add/delete directory entries
is a completely separate permission.
Don't assume that a single L<C<unlink>|perlfunc/unlink LIST> completely
gets rid of the file: some filesystems (most notably the ones in VMS) are
versioned, and L<C<unlink>|perlfunc/unlink LIST> removes only
the most recent one (it doesn't remove all the versions because by default
the native tools on those platforms remove just the most recent version,
too). The portable idiom to remove all the versions of a file is
1 while unlink "file";
This will terminate if the file is undeletable for some reason
(protected, not there, and so on).
Don't count on a specific environment variable existing in
L<C<%ENV>|perlvar/%ENV>. Don't count on L<C<%ENV>|perlvar/%ENV> entries
being case-sensitive, or even case-preserving. Don't try to clear
L<C<%ENV>|perlvar/%ENV> by saying C<%ENV = ();>, or, if you really have
to, make it conditional on C<$^O ne 'VMS'> since in VMS the
L<C<%ENV>|perlvar/%ENV> table is much more than a per-process key-value
string table.
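One way to avoid depending on a particular entry is to supply a fallback
at every lookup. This is only a sketch; the helper C<env_or>, the
variable name and the default value are all illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: look a setting up in %ENV, falling back to a
# default when the variable is missing or empty.
sub env_or {
    my ($name, $default) = @_;
    my $value = $ENV{$name};
    return (defined $value && length $value) ? $value : $default;
}

my $editor = env_or('EDITOR', 'ed');
print "editor: $editor\n";
```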
On VMS, some entries in the L<C<%ENV>|perlvar/%ENV> hash are dynamically
created when their key is used on a read if they did not previously
exist. The values for C<$ENV{HOME}>, C<$ENV{TERM}>, C<$ENV{PATH}>, and
C<$ENV{USER}>, are known to be dynamically generated. The specific names
that are dynamically generated may vary with the version of the C library
on VMS, and more may exist than are documented.
On VMS by default, changes to the L<C<%ENV>|perlvar/%ENV> hash persist
after perl exits. Subsequent invocations of perl in the same process can
inadvertently inherit environment settings that were meant to be
temporary.
Don't count on signals or L<C<%SIG>|perlvar/%SIG> for anything.
Don't count on filename globbing. Use
L<C<opendir>|perlfunc/opendir DIRHANDLE,EXPR>,
L<C<readdir>|perlfunc/readdir DIRHANDLE>, and
L<C<closedir>|perlfunc/closedir DIRHANDLE> instead.
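A sketch of the directory-handle approach, with a regular expression
standing in for the glob pattern. The helper name C<list_matching> and
the sample file names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: return the sorted names in $dir that match
# $pattern, skipping the current- and parent-directory entries.
sub list_matching {
    my ($dir, $pattern) = @_;
    opendir(my $dh, $dir) or die "cannot opendir $dir: $!";
    my @names = grep { $_ =~ $pattern } File::Spec->no_upwards(readdir $dh);
    closedir $dh;
    my @sorted = sort @names;
    return @sorted;
}

my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.log c.txt)) {
    open(my $fh, '>', File::Spec->catfile($dir, $name)) or die "create: $!";
    close $fh;
}
print join(' ', list_matching($dir, qr/\.txt\z/i)), "\n";
```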
Don't count on per-program environment variables, or per-program current
directories.
Don't count on specific values of L<C<$!>|perlvar/$!>, neither numeric nor
especially the string values. Users may switch their locales causing
error messages to be translated into their languages. If you can
trust a POSIXish environment, you can portably use the symbols defined
by the L<C<Errno>|Errno> module, like C<ENOENT>. And don't trust the
values of L<C<$!>|perlvar/$!> at all except immediately after a failed
system call.
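A sketch of testing L<C<$!>|perlvar/$!> by symbol, assuming a POSIXish
environment where C<ENOENT> is defined. The helper C<open_status> and
the file name are hypothetical.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: classify why opening $path for reading failed.
sub open_status {
    my ($path) = @_;
    if (open(my $fh, '<', $path)) {
        close $fh;
        return 'ok';
    }
    # Inspect $! immediately after the failed call, by symbol, not by text.
    return $! == ENOENT ? 'missing' : 'other error';
}

my $dir = tempdir(CLEANUP => 1);
print open_status(File::Spec->catfile($dir, 'nothere.txt')), "\n";
```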
=head2 Command names versus file pathnames
Don't assume that the name used to invoke a command or program with
L<C<system>|perlfunc/system LIST> or L<C<exec>|perlfunc/exec LIST> can
also be used to test for the existence of the file that holds the
executable code for that command or program.
First, many systems have "internal" commands that are built-in to the
shell or OS and while these commands can be invoked, there is no
corresponding file. Second, some operating systems (e.g., Cygwin,
DJGPP, OS/2, and VOS) have required suffixes for executable files;
these suffixes are generally permitted on the command name but are not
required. Thus, a command like C<perl> might exist in a file named
F<perl>, F<perl.exe>, or F<perl.pm>, depending on the operating system.
The variable L<C<$Config{_exe}>|Config/C<_exe>> in the
L<C<Config>|Config> module holds the executable suffix, if any. Third,
the VMS port carefully sets up L<C<$^X>|perlvar/$^X> and
L<C<$Config{perlpath}>|Config/C<perlpath>> so that no further processing
is required. This is just as well, because the matching regular
expression used below would then have to deal with a possible trailing
version number in the VMS file name.
To convert L<C<$^X>|perlvar/$^X> to a file pathname, taking account of
the requirements of the various operating system possibilities, say:
use Config;
my $thisperl = $^X;
if ($^O ne 'VMS') {
$thisperl .= $Config{_exe}
unless $thisperl =~ m/\Q$Config{_exe}\E$/i;
}
To convert L<C<$Config{perlpath}>|Config/C<perlpath>> to a file pathname, say:
use Config;
my $thisperl = $Config{perlpath};
if ($^O ne 'VMS') {
$thisperl .= $Config{_exe}
unless $thisperl =~ m/\Q$Config{_exe}\E$/i;
}
=head2 Networking
Don't assume that you can reach the public Internet.
Don't assume that there is only one way to get through firewalls
to the public Internet.
Don't assume that you can reach the outside world through any other port
than 80, or some web proxy. ftp is blocked by many firewalls.
Don't assume that you can send email by connecting to the local SMTP port.
Don't assume that you can reach yourself or any node by the name
'localhost'. The same goes for '127.0.0.1'. You will have to try both.
Don't assume that the host has only one network card, or that it
can't bind to many virtual IP addresses.
Don't assume a particular network device name.
Don't assume a particular set of
L<C<ioctl>|perlfunc/ioctl FILEHANDLE,FUNCTION,SCALAR>s will work.
Don't assume that you can ping hosts and get replies.
Don't assume that any particular port (service) will respond.
Don't assume that L<C<Sys::Hostname>|Sys::Hostname> (or any other API or
command) returns either a fully qualified hostname or a non-qualified
hostname: it all depends on how the system had been configured. Also
remember that for things such as DHCP and NAT, the hostname you get back
might not be very useful.
All the above I<don't>s may look daunting, and they are, but the key
is to degrade gracefully if one cannot reach the particular network
service one wants. Croaking or hanging do not look very professional.
=head2 Interprocess Communication (IPC)
In general, don't directly access the system in code meant to be
portable. That means, no L<C<system>|perlfunc/system LIST>,
L<C<exec>|perlfunc/exec LIST>, L<C<fork>|perlfunc/fork>,
L<C<pipe>|perlfunc/pipe READHANDLE,WRITEHANDLE>,
L<C<``> or C<qxE<sol>E<sol>>|perlop/C<qxE<sol>I<STRING>E<sol>>>,
L<C<open>|perlfunc/open FILEHANDLE,EXPR> with a C<|>, nor any of the other
things that make being a Perl hacker worth being.
Commands that launch external processes are generally supported on
most platforms (though many of them do not support any type of
forking). The problem with using them arises from what you invoke
them on. External tools are often named differently on different
platforms, may not be available in the same location, might accept
different arguments, can behave differently, and often present their
results in a platform-dependent way. Thus, you should seldom depend
on them to produce consistent results. (Then again, if you're calling
C<netstat -a>, you probably don't expect it to run on both Unix and CP/M.)
One especially common bit of Perl code is opening a pipe to B<sendmail>:
open(my $mail, '|-', '/usr/lib/sendmail -t')
or die "cannot fork sendmail: $!";
This is fine for systems programming when sendmail is known to be
available. But it is not fine for many non-Unix systems, and even
some Unix systems that may not have sendmail installed. If a portable
solution is needed, see the various distributions on CPAN that deal
with it. L<C<Mail::Mailer>|Mail::Mailer> and L<C<Mail::Send>|Mail::Send>
in the C<MailTools> distribution are commonly used, and provide several
mailing methods, including C<mail>, C<sendmail>, and direct SMTP (via
L<C<Net::SMTP>|Net::SMTP>) if a mail transfer agent is not available.
L<C<Mail::Sendmail>|Mail::Sendmail> is a standalone module that provides
simple, platform-independent mailing.
The Unix System V IPC (C<msg*(), sem*(), shm*()>) is not available
even on all Unix platforms.
Do not use either the bare result of C<pack("C4", 10, 20, 30, 40)> or
bare v-strings (such as C<v10.20.30.40>) to represent IPv4 addresses:
both forms just pack the four bytes into network order. That this
would be equal to the C language C<in_addr> struct (which is what the
socket code internally uses) is not guaranteed. To be portable use
the routines of the L<C<Socket>|Socket> module, such as
L<C<inet_aton>|Socket/$ip_address = inet_aton $string>,
L<C<inet_ntoa>|Socket/$string = inet_ntoa $ip_address>, and
L<C<sockaddr_in>|Socket/$sockaddr = sockaddr_in $port, $ip_address>.
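A minimal sketch using those routines. The address and port are
arbitrary example values.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa sockaddr_in);

# A dotted-quad string needs no name lookup, so this works offline.
my $packed_ip = inet_aton('10.20.30.40');
defined $packed_ip or die "not a valid IPv4 address";

# Build the opaque socket address, then take it apart again.
my $sockaddr    = sockaddr_in(8080, $packed_ip);   # port 8080 is arbitrary
my ($port, $ip) = sockaddr_in($sockaddr);          # list context unpacks

print inet_ntoa($ip), ":$port\n";
```

Treat both C<$packed_ip> and C<$sockaddr> as opaque values and let the
module decide how they are laid out.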
The rule of thumb for portable code is: Do it all in portable Perl, or
use a module (that may internally implement it with platform-specific
code, but exposes a common interface).
=head2 External Subroutines (XS)
XS code can usually be made to work with any platform, but dependent
libraries, header files, etc., might not be readily available or
portable, or the XS code itself might be platform-specific, just as Perl
code might be. If the libraries and headers are portable, then it is
normally reasonable to make sure the XS code is portable, too.
A different type of portability issue arises when writing XS code:
availability of a C compiler on the end-user's system. C brings
with it its own portability issues, and writing XS code will expose
you to some of those. Writing purely in Perl is an easier way to
achieve portability.
=head2 Standard Modules
In general, the standard modules work across platforms. Notable
exceptions are the L<C<CPAN>|CPAN> module (which currently makes
connections to external programs that may not be available),
platform-specific modules (like L<C<ExtUtils::MM_VMS>|ExtUtils::MM_VMS>),
and DBM modules.
There is no one DBM module available on all platforms.
L<C<SDBM_File>|SDBM_File> and the others are generally available on all
Unix and DOSish ports, but not in MacPerl, where only
L<C<NDBM_File>|NDBM_File> and L<C<DB_File>|DB_File> are available.
The good news is that at least some DBM module should be available, and
L<C<AnyDBM_File>|AnyDBM_File> will use whichever module it can find. Of
course, then the code needs to be fairly strict, dropping to the greatest
common factor (e.g., not exceeding 1K for each record), so that it will
work with any DBM module. See L<AnyDBM_File> for more details.
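A cautious sketch of tying a hash through L<C<AnyDBM_File>|AnyDBM_File>.
The candidate list is narrowed here to backends that accept the same
C<Fcntl> flags in C<tie>, which is an assumption made for this example
and not a requirement of the module. The helper name and file name are
hypothetical.

```perl
use strict;
use warnings;
# Assumption: restrict the candidates to backends that take O_* flags.
BEGIN { @AnyDBM_File::ISA = qw(NDBM_File DB_File SDBM_File) }
use AnyDBM_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: store one key/value pair in a DBM file, reopen
# the file, and return what comes back.
sub dbm_round_trip {
    my ($file, $key, $value) = @_;

    tie(my %db, 'AnyDBM_File', $file, O_RDWR | O_CREAT, 0666)
        or die "cannot tie $file: $!";
    $db{$key} = $value;          # keep records small for every backend
    untie %db;

    tie(my %again, 'AnyDBM_File', $file, O_RDWR, 0666)
        or die "cannot re-tie $file: $!";
    my $seen = $again{$key};
    untie %again;
    return $seen;
}

my $dir = tempdir(CLEANUP => 1);
print dbm_round_trip(File::Spec->catfile($dir, 'demo'), 'colour', 'red'), "\n";
```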
=head2 Time and Date
The system's notion of time of day and calendar date is controlled in
widely different ways. Don't assume the timezone is stored in C<$ENV{TZ}>,
and even if it is, don't assume that you can control the timezone through
that variable. Don't assume anything about the three-letter timezone
abbreviations (for example, that MST means Mountain Standard Time;
it's been known to stand for Moscow Standard Time). If you need to
use timezones, express them in some unambiguous format like the
exact number of minutes offset from UTC, or the POSIX timezone
format.
Don't assume that the epoch starts at 00:00:00, January 1, 1970,
because that is OS- and implementation-specific. It is better to
store a date in an unambiguous representation. The ISO 8601 standard
defines YYYY-MM-DD as the date format, or YYYY-MM-DDTHH:MM:SS
(that's a literal "T" separating the date from the time).
Please do use ISO 8601 instead of making us guess what
date 02/03/04 might be. ISO 8601 even sorts nicely as-is.
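Producing such a timestamp takes one call to C<strftime> from the
L<C<POSIX>|POSIX> module. In this sketch the helper name C<iso8601_utc>
is made up, and the trailing C<Z> (the ISO 8601 designator for UTC) is
an addition to the format shown above.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch value as an ISO 8601 UTC timestamp.
sub iso8601_utc {
    my ($epoch) = @_;
    return strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($epoch));
}

my @stamps = map { iso8601_utc($_) } (time, time - 86_400, time + 3_600);
print "$_\n" for sort @stamps;     # string sort is also chronological
```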
A text representation (like "1987-12-18") can be easily converted
into an OS-specific value using a module like
L<C<Time::Piece>|Time::Piece> (see L<Time::Piece/Date Parsing>) or
L<C<Date::Parse>|Date::Parse>. An array of values, such as those
returned by L<C<localtime>|perlfunc/localtime EXPR>, can be converted to an OS-specific
representation using L<C<Time::Local>|Time::Local>.
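The two conversions described above might look like the following sketch. It assumes the text date is meant as UTC, which is how C<< Time::Piece->strptime >> interprets a value without timezone information.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Local qw(timegm);

# Text representation to a time object.
my $t = Time::Piece->strptime('1987-12-18', '%Y-%m-%d');
print "parsed date: ", $t->ymd, "\n";

# Broken-down fields to an OS-specific time value.
# timegm() expects a 0-based month, so subtract 1 from $t->mon.
my $native = timegm(0, 0, 0, $t->mday, $t->mon - 1, $t->year);
print "native time value: $native\n";
```

The printed native value is whatever the local platform uses, so code should not compare it against a hard-coded number.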
When calculating specific times, such as for tests in time or date modules,
it may be appropriate to calculate an offset for the epoch.
use Time::Local qw(timegm);
my $offset = timegm(0, 0, 0, 1, 0, 70);
The value for C<$offset> in Unix will be C<0>, but in Mac OS Classic
will be some large number. C<$offset> can then be added to a Unix time
value to get what should be the proper value on any system.
=head2 Character sets and character encoding
Assume very little about character sets.
Assume nothing about numerical values (L<C<ord>|perlfunc/ord EXPR>,
L<C<chr>|perlfunc/chr NUMBER>) of characters.
Do not use explicit code point ranges (like C<\xHH-\xHH>). However,
starting in Perl v5.22, regular expression pattern bracketed character
class ranges specified like C<qr/[\N{U+HH}-\N{U+HH}]/> are portable,
and starting in Perl v5.24, the same ranges are portable in
L<C<trE<sol>E<sol>E<sol>>|perlop/C<trE<sol>I<SEARCHLIST>E<sol>I<REPLACEMENTLIST>E<sol>cdsr>>.
You can portably use symbolic character classes like C<[:print:]>.
Do not assume that the alphabetic characters are encoded contiguously
(in the numeric sense). There may be gaps. Special coding in Perl,
however, guarantees that all subsets of C<qr/[A-Z]/>, C<qr/[a-z]/>, and
C<qr/[0-9]/> behave as expected.
L<C<trE<sol>E<sol>E<sol>>|perlop/C<trE<sol>I<SEARCHLIST>E<sol>I<REPLACEMENTLIST>E<sol>cdsr>>
behaves the same for these ranges. In patterns, any range whose end
points are specified with the C<\N{...}> notation is portable across
character sets, but it is a bug in Perl v5.22 that this isn't true of
L<C<trE<sol>E<sol>E<sol>>|perlop/C<trE<sol>I<SEARCHLIST>E<sol>I<REPLACEMENTLIST>E<sol>cdsr>>,
fixed in v5.24.
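A small sketch of a portable bracketed range, assuming Perl v5.22 or later. The end points are given as Unicode code points, so the pattern does not depend on the native character set.

```perl
use 5.022;   # portable \N{U+...} ranges in patterns need v5.22 or later
use strict;
use warnings;

# Match exactly one uppercase Latin letter, A (U+41) through Z (U+5A).
my $upper = qr/\A[\N{U+41}-\N{U+5A}]\z/;

for my $char ('Q', 'q', '7') {
    printf "%s: %s\n", $char, ($char =~ $upper ? 'uppercase' : 'other');
}
```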
Do not assume anything about the ordering of the characters.
The lowercase letters may come before or after the uppercase letters;
the lowercase and uppercase may be interlaced so that both "a" and "A"
come before "b"; the accented and other international characters may
be interlaced so that E<auml> comes before "b".
L<Unicode::Collate> can be used to sort this all out.
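For instance, the sketch below compares a plain C<sort>, which follows the native code order, with C<Unicode::Collate> using its default settings. The word list is made up for illustration.

```perl
use strict;
use warnings;
use Unicode::Collate;

my @words    = qw(b A a B);
my @by_code  = sort @words;                           # native code order
my @collated = Unicode::Collate->new->sort(@words);   # collation order

print "code order: @by_code\n";
print "collated:   @collated\n";
```

The code-order line differs between ASCII and EBCDIC platforms; the collated line does not.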
=head2 Internationalisation
If you may assume POSIX (a rather large assumption), you may read
more about the POSIX locale system from L<perllocale>. The locale
system at least attempts to make things a little bit more portable,
or at least more convenient and native-friendly for non-English
users. The system affects character sets and encoding, and date
and time formatting--amongst other things.
If you really want to be international, you should consider Unicode.
See L<perluniintro> and L<perlunicode> for more information.
By default Perl assumes your source code is written in an 8-bit ASCII
superset. To embed Unicode characters in your strings and regexes, you can
use the L<C<\x{HH}> or (more portably) C<\N{U+HH}>
notations|perlop/Quote and Quote-like Operators>. You can also use the
L<C<utf8>|utf8> pragma and write your code in UTF-8, which lets you use
Unicode characters directly (not just in quoted constructs but also in
identifiers).
=head2 System Resources
If your code is destined for systems with severely constrained (or
missing!) virtual memory systems then you want to be I<especially> mindful
of avoiding wasteful constructs such as:
my @lines = <$very_large_file>; # bad
while (<$fh>) {$file .= $_} # sometimes bad
my $file = join('', <$fh>); # better
The last two constructs may appear unintuitive to most people. The
first repeatedly grows a string, whereas the second allocates a
large chunk of memory in one go. On some systems, the second is
more efficient than the first.
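When the whole file is not needed at once, reading one line at a time avoids the problem altogether. In this sketch an in-memory filehandle stands in for a large file; the record contents are invented.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a very large file.
my $text = join '', map { "record $_\n" } 1 .. 5;
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my ($count, $longest) = (0, 0);
while (my $line = <$fh>) {      # only one line is held at a time
    chomp $line;
    $count++;
    $longest = length $line if length $line > $longest;
}
close $fh;

print "$count lines, longest is $longest characters\n";
```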
=head2 Security
Most multi-user platforms provide basic levels of security, usually
implemented at the filesystem level. Some, however, unfortunately do
not. Thus the notion of user id, or "home" directory,
or even the state of being logged-in, may be unrecognizable on many
platforms. If you write programs that are security-conscious, it
is usually best to know what type of system you will be running
under so that you can write code explicitly for that platform (or
class of platforms).
Don't assume the Unix filesystem access semantics: the operating
system or the filesystem may be using some ACL systems, which are
richer languages than the usual C<rwx>. Even if the C<rwx> exist,
their semantics might be different.
(From the security viewpoint, testing for permissions before attempting to
do something is silly anyway: if one tries this, there is potential
for race conditions. Someone or something might change the
permissions between the permissions check and the actual operation.
Just try the operation.)
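A minimal sketch of "just try the operation": no C<-r> or C<-e> test comes first, and the result of C<open> decides what happens next. The file name F<missing.txt> is hypothetical and deliberately does not exist.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'missing.txt');   # hypothetical file name

# Attempt the open directly and inspect the outcome.
my $status;
if (open(my $fh, '<', $path)) {
    $status = 'opened';
    close $fh;
}
else {
    $status = "failed: $!";
}
print "$status\n";
```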
Don't assume the Unix user and group semantics: especially, don't
expect L<C<< $< >>|perlvar/$E<lt>> and L<C<< $> >>|perlvar/$E<gt>> (or
L<C<$(>|perlvar/$(> and L<C<$)>|perlvar/$)>) to work for switching
identities (or memberships).
Don't assume set-uid and set-gid semantics. (And even if you do,
think twice: set-uid and set-gid are a known can of security worms.)
=head2 Style
For those times when it is necessary to have platform-specific code,
consider keeping the platform-specific code in one place, making porting
to other platforms easier. Use the L<C<Config>|Config> module and the
special variable L<C<$^O>|perlvar/$^O> to differentiate platforms, as
described in L</"PLATFORMS">.
Beware of the "else syndrome":
if ($^O eq 'MSWin32') {
# code that assumes Windows
} else {
# code that assumes Linux
}
The C<else> branch should be used for the really ultimate fallback,
not for code specific to some platform.
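One way to avoid the syndrome is to name every platform you actually handle and keep C<else> for "unknown". The helper C<path_list_separator> below is hypothetical and covers only three platforms on purpose.

```perl
use strict;
use warnings;

# Hypothetical helper: separator used between entries of a search path.
sub path_list_separator {
    my ($os) = @_;
    if ($os eq 'MSWin32') {
        return ';';
    }
    elsif ($os eq 'linux' || $os eq 'darwin') {
        return ':';
    }
    else {
        # Ultimate fallback: report "unknown" rather than assume a platform.
        return;
    }
}

for my $os ('MSWin32', 'linux', 'riscos') {
    my $sep = path_list_separator($os);
    printf "%-8s %s\n", $os, defined $sep ? "'$sep'" : 'unknown';
}
```

In real code the value C<$Config{path_sep}> is usually a better source for this particular setting; the point here is only the shape of the branches.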
Be careful in the tests you supply with your module or programs.
Module code may be fully portable, but its tests might not be. This
often happens when tests spawn off other processes or call external
programs to aid in the testing, or when (as noted above) the tests
assume certain things about the filesystem and paths. Be careful not
to depend on a specific output style for errors, such as when checking
L<C<$!>|perlvar/$!> after a failed system call. Using
L<C<$!>|perlvar/$!> for anything other than displaying it as output is
doubtful (though see the L<C<Errno>|Errno> module for testing reasonably
portably for error value). Some platforms expect a certain output format,
and Perl on those platforms may have been adjusted accordingly. Most
specifically, don't anchor a regex when testing an error value.
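A sketch of testing an error value through C<Errno> rather than through the message text. The file name F<no-such-file> is hypothetical and is never created.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);
use File::Spec;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);
my $missing = File::Spec->catfile($dir, 'no-such-file');   # hypothetical name

my $reason = 'opened';
if (!open(my $fh, '<', $missing)) {
    # Compare the numeric error code, never the platform-specific message.
    $reason = $! == ENOENT ? 'file does not exist' : "other error: $!";
}
print "$reason\n";
```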
=head1 CPAN Testers
Modules uploaded to CPAN are tested by a variety of volunteers on
different platforms. These CPAN testers are notified by mail of each
new upload, and reply to the list with PASS, FAIL, NA (not applicable to
this platform), or UNKNOWN (unknown), along with any relevant notations.
The purpose of the testing is twofold: one, to help developers fix any
problems in their code that crop up because of lack of testing on other
platforms; two, to provide users with information about whether
a given module works on a given platform.
Also see:
=over 4
=item *
Mailing list: cpan-testers-discuss@perl.org
=item *
Testing results: L<http://www.cpantesters.org/>
=back
=head1 PLATFORMS
Perl is built with a L<C<$^O>|perlvar/$^O> variable that indicates the
operating system it was built on. This was implemented
to help speed up code that would otherwise have to C<use Config>
and use the value of L<C<$Config{osname}>|Config/C<osname>>. Of course,
to get more detailed information about the system, looking into
L<C<%Config>|Config/DESCRIPTION> is certainly recommended.
L<C<%Config>|Config/DESCRIPTION> cannot always be trusted, however,
because it was built at compile time. If perl was built in one place,
then transferred elsewhere, some values may be wrong. The values may
even have been edited after the fact.
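A quick way to see these values for the perl you are running is sketched below. The hash name C<%platform> is only for illustration.

```perl
use strict;
use warnings;
use Config;

my %platform = (
    os       => $^O,                  # built into the interpreter
    osname   => $Config{osname},      # recorded when perl was configured
    archname => $Config{archname},
);

printf "%-8s %s\n", $_, $platform{$_} for sort keys %platform;
```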
=head2 Unix
Perl works on a bewildering variety of Unix and Unix-like platforms (see
e.g. most of the files in the F<hints/> directory in the source code kit).
On most of these systems, the value of L<C<$^O>|perlvar/$^O> (hence
L<C<$Config{osname}>|Config/C<osname>>, too) is determined either by
lowercasing and stripping punctuation from the first field of the string
returned by typing C<uname -a> (or a similar command) at the shell prompt
or by testing the file system for the presence of uniquely named files
such as a kernel or header file. Here, for example, are a few of the
more popular Unix flavors:
    uname         $^O        $Config{archname}
    --------------------------------------------
    AIX           aix        aix
    BSD/OS        bsdos      i386-bsdos
    Darwin        darwin     darwin
    DYNIX/ptx     dynixptx   i386-dynixptx
    FreeBSD       freebsd    freebsd-i386
    Haiku         haiku      BePC-haiku
    Linux         linux      arm-linux
    Linux         linux      armv5tel-linux
    Linux         linux      i386-linux
    Linux         linux      i586-linux
    Linux         linux      ppc-linux
    HP-UX         hpux       PA-RISC1.1
    IRIX          irix       irix
    Mac OS X      darwin     darwin
    NeXT 3        next       next-fat
    NeXT 4        next       OPENSTEP-Mach
    openbsd       openbsd    i386-openbsd
    OSF1          dec_osf    alpha-dec_osf
    reliantunix-n svr4       RM400-svr4
    SCO_SV        sco_sv     i386-sco_sv
    SINIX-N       svr4       RM400-svr4
    sn4609        unicos     CRAY_C90-unicos
    sn6521        unicosmk   t3e-unicosmk
    sn9617        unicos     CRAY_J90-unicos
    SunOS         solaris    sun4-solaris
    SunOS         solaris    i86pc-solaris
    SunOS4        sunos      sun4-sunos
Because the value of L<C<$Config{archname}>|Config/C<archname>> may
depend on the hardware architecture, it can vary more than the value of
L<C<$^O>|perlvar/$^O>.
=head2 DOS and Derivatives
Perl has long been ported to Intel-style microcomputers running under
systems like PC-DOS, MS-DOS, OS/2, and most Windows platforms you can
bring yourself to mention (except for Windows CE, if you count that).
Users familiar with I<COMMAND.COM> or I<CMD.EXE> style shells should
be aware that each of these file specifications may have subtle
differences:
my $filespec0 = "c:/foo/bar/file.txt";
my $filespec1 = "c:\\foo\\bar\\file.txt";
my $filespec2 = 'c:\foo\bar\file.txt';
my $filespec3 = 'c:\\foo\\bar\\file.txt';
System calls accept either C</> or C<\> as the path separator.
However, many command-line utilities of DOS vintage treat C</> as
the option prefix, so may get confused by filenames containing C</>.
Aside from calling any external programs, C</> will work just fine,
and probably better, as it is more consistent with popular usage,
and avoids the problem of remembering what to backwhack and what
not to.
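When a path has to be built from components, letting C<File::Spec> pick the separator avoids the quoting question altogether. A minimal sketch, with made-up component names:

```perl
use strict;
use warnings;
use File::Spec;

# File::Spec chooses the separator for the platform perl is running on.
my $path  = File::Spec->catfile('foo', 'bar', 'file.txt');
my @parts = File::Spec->splitdir($path);

print "native path: $path\n";
print "components:  @parts\n";
```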
The DOS FAT filesystem can accommodate only "8.3" style filenames. Under
the "case-insensitive, but case-preserving" HPFS (OS/2) and NTFS (NT)
filesystems you may have to be careful about case returned with functions
like L<C<readdir>|perlfunc/readdir DIRHANDLE> or used with functions like
L<C<open>|perlfunc/open FILEHANDLE,EXPR> or
L<C<opendir>|perlfunc/opendir DIRHANDLE,EXPR>.
DOS also treats several filenames as special, such as F<AUX>, F<PRN>,
F<NUL>, F<CON>, F<COM1>, F<LPT1>, F<LPT2>, etc. Unfortunately, sometimes
these filenames won't even work if you include an explicit directory
prefix. It is best to avoid such filenames, if you want your code to be
portable to DOS and its derivatives. It's hard to know what these all
are, unfortunately.
Users of these operating systems may also wish to make use of
scripts such as F<pl2bat.bat> to put wrappers around your scripts.
Newline (C<\n>) is translated as C<\015\012> by the I/O system when
reading from and writing to files (see L</"Newlines">).
C<binmode($filehandle)> will keep C<\n> translated as C<\012> for that
filehandle.
L<C<binmode>|perlfunc/binmode FILEHANDLE> should always be used for code
that deals with binary data. That's assuming you realize in advance that
your data is in binary. General-purpose programs should often assume
nothing about their data.
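The following sketch writes four bytes, including a line feed and a carriage return, and reads them back unchanged by calling C<binmode> on both handles. The file name F<sample.bin> is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $file  = File::Spec->catfile($dir, 'sample.bin');   # hypothetical file name
my $bytes = "\x00\x0A\x0D\xFF";                        # includes LF and CR bytes

open(my $out, '>', $file) or die "Cannot write $file: $!";
binmode($out);                  # no newline translation on any platform
print {$out} $bytes;
close($out) or die "Cannot close $file: $!";

open(my $in, '<', $file) or die "Cannot read $file: $!";
binmode($in);
my $read = do { local $/; <$in> };   # slurp the whole (small) file
close $in;

printf "wrote %d bytes, read %d bytes\n", length $bytes, length $read;
```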
The L<C<$^O>|perlvar/$^O> variable and the
L<C<$Config{archname}>|Config/C<archname>> values for various DOSish
perls are as follows:
    OS             $^O       $Config{archname}  ID    Version
    ---------------------------------------------------------
    MS-DOS         dos       ?
    PC-DOS         dos       ?
    OS/2           os2       ?
    Windows 3.1    ?         ?                  0     3 01
    Windows 95     MSWin32   MSWin32-x86        1     4 00
    Windows 98     MSWin32   MSWin32-x86        1     4 10
    Windows ME     MSWin32   MSWin32-x86        1     ?
    Windows NT     MSWin32   MSWin32-x86        2     4 xx
    Windows NT     MSWin32   MSWin32-ALPHA      2     4 xx
    Windows NT     MSWin32   MSWin32-ppc        2     4 xx
    Windows 2000   MSWin32   MSWin32-x86        2     5 00
    Windows XP     MSWin32   MSWin32-x86        2     5 01
    Windows 2003   MSWin32   MSWin32-x86        2     5 02
    Windows Vista  MSWin32   MSWin32-x86        2     6 00
    Windows 7      MSWin32   MSWin32-x86        2     6 01
    Windows 7      MSWin32   MSWin32-x64        2     6 01
    Windows 2008   MSWin32   MSWin32-x86        2     6 01
    Windows 2008   MSWin32   MSWin32-x64        2     6 01
    Windows CE     MSWin32   ?                  3
    Cygwin         cygwin    cygwin
The various MSWin32 Perls can distinguish the OS they are running on
via the value of the fifth element of the list returned from
L<C<Win32::GetOSVersion()>|Win32/Win32::GetOSVersion()>. For example:
if ($^O eq 'MSWin32') {
my @os_version_info = Win32::GetOSVersion();
print +('3.1','95','NT')[$os_version_info[4]],"\n";
}
There are also L<C<Win32::IsWinNT()>|Win32/Win32::IsWinNT()>,
L<C<Win32::IsWin95()>|Win32/Win32::IsWin95()>, and
L<C<Win32::GetOSName()>|Win32/Win32::GetOSName()>; try
L<C<perldoc Win32>|Win32>.
The very portable L<C<POSIX::uname()>|POSIX/C<uname>> will work too:
c:\> perl -MPOSIX -we "print join '|', uname"
Windows NT|moonru|5.0|Build 2195 (Service Pack 2)|x86
Errors set by Winsock functions are now put directly into C<$^E>,
and the relevant C<WSAE*> error codes are now exported from the
L<Errno> and L<POSIX> modules for testing this against.
The previous behavior of putting the errors (converted to POSIX-style
C<E*> error codes since Perl 5.20.0) into C<$!> was buggy due to
the non-equivalence of like-named Winsock and POSIX error constants,
a relationship between which has unfortunately been established
in one way or another since Perl 5.8.0.
The new behavior provides a much more robust solution for checking
Winsock errors in portable software without accidentally matching
POSIX tests that were intended for other OSes and may have different
meanings for Winsock.
The old behavior is currently retained, warts and all, for backwards
compatibility, but users are encouraged to change any code that
tests C<$!> against C<E*> constants for Winsock errors to instead
test C<$^E> against C<WSAE*> constants. After a suitable deprecation
period, which started with Perl 5.24, the old behavior may be
removed, leaving C<$!> unchanged after Winsock function calls, to
avoid any possible confusion over which error variable to check.
Also see:
=over 4
=item *
The djgpp environment for DOS, L<http://www.delorie.com/djgpp/>
and L<perldos>.
=item *
The EMX environment for DOS, OS/2, etc.: emx@iaehv.nl,
L<ftp://hobbes.nmsu.edu/pub/os2/dev/emx/>. Also see L<perlos2>.
=item *
Build instructions for Win32 in L<perlwin32>, or under the Cygnus environment
in L<perlcygwin>.
=item *
The C<Win32::*> modules in L<Win32>.
=item *
The ActiveState Pages, L<http://www.activestate.com/>
=item *
The Cygwin environment for Win32; F<README.cygwin> (installed
as L<perlcygwin>), L<http://www.cygwin.com/>
=item *
The U/WIN environment for Win32,
L<http://www.research.att.com/sw/tools/uwin/>
=item *
Build instructions for OS/2, L<perlos2>
=back
=head2 VMS
Perl on VMS is discussed in L<perlvms> in the Perl distribution.
The official name of VMS as of this writing is OpenVMS.
Interacting with Perl from the Digital Command Language (DCL) shell
often requires a different set of quotation marks than Unix shells do.
For example:
$ perl -e "print ""Hello, world.\n"""
Hello, world.
There are several ways to wrap your Perl scripts in DCL F<.COM> files, if
you are so inclined. For example:
$ write sys$output "Hello from DCL!"
$ if p1 .eqs. ""
$ then perl -x 'f$environment("PROCEDURE")
$ else perl -x - 'p1 'p2 'p3 'p4 'p5 'p6 'p7 'p8
$ deck/dollars="__END__"
#!/usr/bin/perl
print "Hello from Perl!\n";
__END__
$ endif
Do take care with C<$ ASSIGN/nolog/user SYS$COMMAND: SYS$INPUT> if your
Perl-in-DCL script expects to do things like C<< $read = <STDIN>; >>.
The VMS operating system has two filesystems, designated by their
on-disk structure (ODS) level: ODS-2 and its successor ODS-5. The
initial port of Perl to VMS pre-dates ODS-5, but all current testing and
development assumes ODS-5 and its capabilities, including case
preservation, extended characters in filespecs, and names up to 8192
bytes long.
Perl on VMS can accept either VMS- or Unix-style file
specifications as in either of the following:
$ perl -ne "print if /perl_setup/i" SYS$LOGIN:LOGIN.COM
$ perl -ne "print if /perl_setup/i" /sys$login/login.com
but not a mixture of both as in:
$ perl -ne "print if /perl_setup/i" sys$login:/login.com
Can't open sys$login:/login.com: file specification syntax error
In general, the easiest path to portability is always to specify
filenames in Unix format unless they will need to be processed by native
commands or utilities. Because of this latter consideration, the
L<File::Spec> module by default returns native format specifications
regardless of input format. This default may be reversed so that
filenames are always reported in Unix format by specifying the
C<DECC$FILENAME_UNIX_REPORT> feature logical in the environment.
The file type, or extension, is always present in a VMS-format file
specification even if it's zero-length. This means that, by default,
L<C<readdir>|perlfunc/readdir DIRHANDLE> will return a trailing dot on a
file with no extension, so where you would see C<"a"> on Unix you'll see
C<"a."> on VMS. However, the trailing dot may be suppressed by enabling
the C<DECC$READDIR_DROPDOTNOTYPE> feature in the environment (see the CRTL
documentation on feature logical names).
What C<\n> represents depends on the type of file opened. It usually
represents C<\012> but it could also be C<\015>, C<\012>, C<\015\012>,
C<\000>, C<\040>, or nothing depending on the file organization and
record format. The L<C<VMS::Stdio>|VMS::Stdio> module provides access to
the special C<fopen()> requirements of files with unusual attributes on
VMS.
The value of L<C<$^O>|perlvar/$^O> on OpenVMS is "VMS". To determine the
architecture that you are running on refer to
L<C<$Config{archname}>|Config/C<archname>>.
On VMS, perl determines the UTC offset from the C<SYS$TIMEZONE_DIFFERENTIAL>
logical name. Although the VMS epoch began at 17-NOV-1858 00:00:00.00,
calls to L<C<localtime>|perlfunc/localtime EXPR> are adjusted to count
offsets from 01-JAN-1970 00:00:00.00, just like Unix.
Also see:
=over 4
=item *
F<README.vms> (installed as F<README_vms>), L<perlvms>
=item *
vmsperl list, vmsperl-subscribe@perl.org
=item *
vmsperl on the web, L<http://www.sidhe.org/vmsperl/index.html>
=item *
VMS Software Inc. web site, L<http://www.vmssoftware.com>
=back
=head2 VOS
Perl on VOS (also known as OpenVOS) is discussed in F<README.vos>
in the Perl distribution (installed as L<perlvos>). Perl on VOS
can accept either VOS- or Unix-style file specifications as in
either of the following:
$ perl -ne "print if /perl_setup/i" >system>notices
$ perl -ne "print if /perl_setup/i" /system/notices
or even a mixture of both as in:
$ perl -ne "print if /perl_setup/i" >system/notices
Even though VOS allows the slash character to appear in object
names, because the VOS port of Perl interprets it as a pathname
delimiting character, VOS files, directories, or links whose
names contain a slash character cannot be processed. Such files
must be renamed before they can be processed by Perl.
Older releases of VOS (prior to OpenVOS Release 17.0) limit file
names to 32 or fewer characters, prohibit file names from
starting with a C<-> character, and prohibit file names from
containing C< > (space) or any character from the set C<< !#%&'()*;<=>? >>.
Newer releases of VOS (OpenVOS Release 17.0 or later) support a
feature known as extended names. On these releases, file names
can contain up to 255 characters, are prohibited from starting
with a C<-> character, and the set of prohibited characters is
reduced to C<< #%*<>? >>. There are
restrictions involving spaces and apostrophes: these characters
must not begin or end a name, nor can they immediately precede or
follow a period. Additionally, a space must not immediately
precede another space or hyphen. Specifically, the following
character combinations are prohibited: space-space,
space-hyphen, period-space, space-period, period-apostrophe,
apostrophe-period, leading or trailing space, and leading or
trailing apostrophe. Although an extended file name is limited
to 255 characters, a path name is still limited to 256
characters.
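The extended-name rules above can be sketched as a single check. The helper C<is_valid_vos_extended_name> is hypothetical and is not part of Perl or OpenVOS. It also makes two assumptions that go beyond the text: an empty name is rejected, and the limit is counted in characters. A name containing a slash is rejected as well, because Perl on VOS cannot process such names.

```perl
use strict;
use warnings;

# Hypothetical helper: checks one object name (not a full path name)
# against the extended-name rules described in the text.
sub is_valid_vos_extended_name {
    my ($name) = @_;
    return 0 if !defined $name || $name eq '';   # assumption: empty is invalid
    return 0 if length $name > 255;              # at most 255 characters
    return 0 if $name =~ /\A-/;                  # must not start with a hyphen
    return 0 if $name =~ /[#%*<>?]/;             # prohibited characters
    return 0 if $name =~ m{/};                   # Perl treats "/" as a delimiter
    return 0 if $name =~ /\A[ ']|[ ']\z/;        # no leading/trailing space or apostrophe
    return 0 if $name =~ /[ ']\.|\.[ ']/;        # neither may touch a period
    return 0 if $name =~ / [ -]/;                # no space-space or space-hyphen
    return 1;
}

for my $name ('monthly report.txt', '-draft', 'a  b', 'what?') {
    printf "%-22s %s\n", "'$name'",
        is_valid_vos_extended_name($name) ? 'ok' : 'rejected';
}
```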
The value of L<C<$^O>|perlvar/$^O> on VOS is "vos". To determine the
architecture that you are running on refer to
L<C<$Config{archname}>|Config/C<archname>>.
Also see:
=over 4
=item *
F<README.vos> (installed as L<perlvos>)
=item *
The VOS mailing list.
There is no specific mailing list for Perl on VOS. You can contact
the Stratus Technologies Customer Assistance Center (CAC) for your
region, or you can use the contact information located in the
distribution files on the Stratus Anonymous FTP site.
=item *
Stratus Technologies on the web at L<http://www.stratus.com>
=item *
VOS Open-Source Software on the web at L<http://ftp.stratus.com/pub/vos/vos.html>
=back
=head2 EBCDIC Platforms
v5.22 core Perl runs on z/OS (formerly OS/390). Theoretically it could
run on the successors of OS/400 on AS/400 minicomputers as well as
VM/ESA, and BS2000 for S/390 Mainframes. Such computers use EBCDIC
character sets internally (usually Character Code Set ID 0037 for OS/400
and either 1047 or POSIX-BC for S/390 systems).
The rest of this section may need updating, but we don't know what it
should say. Please email comments to
L<perlbug@perl.org|mailto:perlbug@perl.org>.
On the mainframe Perl currently works under the "Unix system
services for OS/390" (formerly known as OpenEdition), VM/ESA OpenEdition, or
the BS2000 POSIX-BC system (BS2000 is supported in Perl 5.6 and greater).
See L<perlos390> for details. Note that for OS/400 there is also a port of
Perl 5.8.1/5.10.0 or later to the PASE which is ASCII-based (as opposed to
ILE which is EBCDIC-based), see L<perlos400>.
As of R2.5 of USS for OS/390 and Version 2.3 of VM/ESA these Unix
sub-systems do not support the C<#!> shebang trick for script invocation.
Hence, on OS/390 and VM/ESA Perl scripts can be executed with a header
similar to the following simple script:
: # use perl
eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}'
if 0;
#!/usr/local/bin/perl # just a comment really
print "Hello from perl!\n";
OS/390 will support the C<#!> shebang trick in release 2.8 and beyond.
Calls to L<C<system>|perlfunc/system LIST> and backticks can use POSIX
shell syntax on all S/390 systems.
On the AS/400, if PERL5 is in your library list, you may need
to wrap your Perl scripts in a CL procedure to invoke them like so:
BEGIN
CALL PGM(PERL5/PERL) PARM('/QOpenSys/hello.pl')
ENDPGM
This will invoke the Perl script F<hello.pl> in the root of the
QOpenSys file system. On the AS/400 calls to
L<C<system>|perlfunc/system LIST> or backticks must use CL syntax.
On these platforms, bear in mind that the EBCDIC character set may have
an effect on what happens with some Perl functions (such as
L<C<chr>|perlfunc/chr NUMBER>, L<C<pack>|perlfunc/pack TEMPLATE,LIST>,
L<C<print>|perlfunc/print FILEHANDLE LIST>,
L<C<printf>|perlfunc/printf FILEHANDLE FORMAT, LIST>,
L<C<ord>|perlfunc/ord EXPR>, L<C<sort>|perlfunc/sort SUBNAME LIST>,
L<C<sprintf>|perlfunc/sprintf FORMAT, LIST>,
L<C<unpack>|perlfunc/unpack TEMPLATE,EXPR>), as
well as bit-fiddling with ASCII constants using operators like
L<C<^>, C<&> and C<|>|perlop/Bitwise String Operators>, not to mention
dealing with socket interfaces to ASCII computers (see L</"Newlines">).
Fortunately, most web servers for the mainframe will correctly
translate the C<\n> in the following statement to its ASCII equivalent
(C<\r> is the same under both Unix and z/OS):
print "Content-type: text/html\r\n\r\n";
The values of L<C<$^O>|perlvar/$^O> on some of these platforms include:
    uname         $^O        $Config{archname}
    --------------------------------------------
    OS/390        os390      os390
    OS400         os400      os400
    POSIX-BC      posix-bc   BS2000-posix-bc
Some simple tricks for determining if you are running on an EBCDIC
platform could include any of the following (perhaps all):
if ("\t" eq "\005") { print "EBCDIC may be spoken here!\n"; }
if (ord('A') == 193) { print "EBCDIC may be spoken here!\n"; }
if (chr(169) eq 'z') { print "EBCDIC may be spoken here!\n"; }
One thing you may not want to rely on is the EBCDIC encoding
of punctuation characters since these may differ from code page to code
page (and once your module or script is rumoured to work with EBCDIC,
folks will want it to work with all EBCDIC character sets).
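As a sketch, the letter test from the list above can be kept in one constant, and code that needs Unicode code points can ask for them instead of relying on native values. The constant name C<IS_EBCDIC> is an assumption for illustration; C<utf8::native_to_unicode> is defined by the Perl core.

```perl
use strict;
use warnings;

# True on EBCDIC builds, where "A" has the native code 193.
use constant IS_EBCDIC => ord('A') == 193;

# Map a native code to its Unicode code point; the result is the same
# on ASCII and EBCDIC platforms.
my $code_point = utf8::native_to_unicode(ord 'A');

printf "EBCDIC platform: %s\n", IS_EBCDIC ? 'yes' : 'no';
printf "Unicode code point of 'A': U+%04X\n", $code_point;
```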
Also see:
=over 4
=item *
L<perlos390>, L<perlos400>, L<perlbs2000>, L<perlebcdic>.
=item *
The perl-mvs@perl.org list is for discussion of porting issues as well as
general usage issues for all EBCDIC Perls. Send a message body of
"subscribe perl-mvs" to majordomo@perl.org.
=item *
AS/400 Perl information at
L<http://as400.rochester.ibm.com/>
as well as on CPAN in the F<ports/> directory.
=back
=head2 Acorn RISC OS
Because Acorns use ASCII with newlines (C<\n>) in text files as C<\012> like
Unix, and because Unix filename emulation is turned on by default,
most simple scripts will probably work "out of the box". The native
filesystem is modular, and individual filesystems are free to be
case-sensitive or insensitive, and are usually case-preserving. Some
native filesystems have name length limits, which file and directory
names are silently truncated to fit. Scripts should be aware that the
standard filesystem currently has a name length limit of B<10>
characters, with up to 77 items in a directory, but other filesystems
may not impose such limitations.
Native filenames are of the form
Filesystem#Special_Field::DiskName.$.Directory.Directory.File
where
Special_Field is not usually present, but may contain . and $ .
Filesystem =~ m|[A-Za-z0-9_]|
DiskName =~ m|[A-Za-z0-9_/]|
$ represents the root directory
. is the path separator
@ is the current directory (per filesystem but machine global)
^ is the parent directory
Directory and File =~ m|[^\0- "\.\$\%\&:\@\\^\|\177]+|
The default filename translation is roughly C<tr|/.|./|>, swapping dots
and slashes.
Note that C<"ADFS::HardDisk.$.File" ne 'ADFS::HardDisk.$.File'> and that
the second stage of C<$> interpolation in regular expressions will fall
foul of the L<C<$.>|perlvar/$.> variable if scripts are not careful.
Logical paths specified by system variables containing comma-separated
search lists are also allowed; hence C<System:Modules> is a valid
filename, and the filesystem will prefix C<Modules> with each section of
C<System$Path> until a name is made that points to an object on disk.
Writing to a new file C<System:Modules> would be allowed only if
C<System$Path> contains a single item list. The filesystem will also
expand system variables in filenames if enclosed in angle brackets, so
C<< <System$Dir>.Modules >> would look for the file
S<C<$ENV{'System$Dir'} . 'Modules'>>. The obvious implication of this is
that B<fully qualified filenames can start with C<< <> >>> and the
three-argument form of L<C<open>|perlfunc/open FILEHANDLE,EXPR> should
always be used.
Because C<.> was in use as a directory separator and filenames could not
be assumed to be unique after 10 characters, Acorn implemented the C
compiler to strip the trailing C<.c> C<.h> C<.s> and C<.o> suffix from
filenames specified in source code and store the respective files in
subdirectories named after the suffix. Hence files are translated:
    foo.h           h.foo
    C:foo.h         C:h.foo        (logical path variable)
    sys/os.h        sys.h.os       (C compiler groks Unix-speak)
    10charname.c    c.10charname
    10charname.o    o.10charname
    11charname_.c   c.11charname   (assuming filesystem truncates at 10)
The Unix emulation library's translation of filenames to native assumes
that this sort of translation is required, and it allows a user-defined list
of known suffixes that it will transpose in this fashion. This may
seem transparent, but consider that with these rules F<foo/bar/baz.h>
and F<foo/bar/h/baz> both map to F<foo.bar.h.baz>, and that
L<C<readdir>|perlfunc/readdir DIRHANDLE> and L<C<glob>|perlfunc/glob EXPR>
cannot and do not attempt to emulate the reverse mapping. Other
C<.>'s in filenames are translated to C</>.
As implied above, the environment accessed through
L<C<%ENV>|perlvar/%ENV> is global, and the convention is that program
specific environment variables are of the form C<Program$Name>.
Each filesystem maintains a current directory,
and the current filesystem's current directory is the B<global> current
directory. Consequently, sociable programs don't change the current
directory but rely on full pathnames, and programs (and Makefiles) cannot
assume that they can spawn a child process which can change the current
directory without affecting its parent (and everyone else for that
matter).
Because native operating system filehandles are global and are currently
allocated down from 255, with 0 being a reserved value, the Unix emulation
library emulates Unix filehandles. Consequently, you can't rely on
passing C<STDIN>, C<STDOUT>, or C<STDERR> to your children.
The desire of users to express filenames of the form
C<< <Foo$Dir>.Bar >> on the command line unquoted causes problems,
too: L<C<``>|perlop/C<qxE<sol>I<STRING>E<sol>>> command output capture has
to perform a guessing game. It assumes that a string C<< <[^<>]+\$[^<>]> >>
is a reference to an environment variable, whereas anything else involving
C<< < >> or C<< > >> is redirection, and generally manages to be 99%
right. Of course, the problem remains that scripts cannot rely on any
Unix tools being available, or that any tools found have Unix-like command
line arguments.
Extensions and XS are, in theory, buildable by anyone using free
tools. In practice, many don't, as users of the Acorn platform are
used to binary distributions. MakeMaker does run, but no available
make currently copes with MakeMaker's makefiles; even if and when
this should be fixed, the lack of a Unix-like shell will cause
problems with makefile rules, especially lines of the form
C<cd sdbm && make all>, and anything using quoting.
S<"RISC OS"> is the proper name for the operating system, but the value
in L<C<$^O>|perlvar/$^O> is "riscos" (because we don't like shouting).
=head2 Other perls
Perl has been ported to many platforms that do not fit into any of
the categories listed above. Some, such as AmigaOS,
QNX, Plan 9, and VOS, have been well-integrated into the standard
Perl source code kit. You may need to see the F<ports/> directory
on CPAN for information, and possibly binaries, for the likes of:
aos, Atari ST, lynxos, riscos, Novell Netware, Tandem Guardian,
I<etc.> (Yes, we know that some of these OSes may fall under the
Unix category, but we are not a standards body.)
Some approximate operating system names and their L<C<$^O>|perlvar/$^O>
values in the "OTHER" category include:
    OS            $^O        $Config{archname}
    ------------------------------------------
    Amiga DOS     amigaos    m68k-amigos
See also:
=over 4
=item *
Amiga, F<README.amiga> (installed as L<perlamiga>).
=item *
A free perl5-based PERL.NLM for Novell Netware is available in
precompiled binary and source code form from L<http://www.novell.com/>
as well as from CPAN.
=item *
S<Plan 9>, F<README.plan9>
=back
=head1 FUNCTION IMPLEMENTATIONS
Listed below are functions that are either completely unimplemented
or else have been implemented differently on various platforms.
Preceding each description will be, in parentheses, a list of
platforms that the description applies to.
The list may well be incomplete, or even wrong in some places. When
in doubt, consult the platform-specific README files in the Perl
source distribution, and any other documentation resources accompanying
a given port.
Be aware, moreover, that even among Unix-ish systems there are variations.
For many functions, you can also query L<C<%Config>|Config/DESCRIPTION>,
exported by default from the L<C<Config>|Config> module. For example, to
check whether the platform has the L<C<lstat>|perlfunc/lstat FILEHANDLE>
call, check L<C<$Config{d_lstat}>|Config/C<d_lstat>>. See L<Config> for a
full description of available variables.
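A minimal sketch of the C<%Config> check described above, assuming that feature keys such as C<d_lstat> hold the string C<define> when the call is available. The C<has_feature> helper is a hypothetical name, not part of the C<Config> module.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: true if this perl was configured with the given
# %Config feature key (such keys hold the string 'define' when present).
sub has_feature {
    my ($key) = @_;
    my $value = $Config{$key};
    return (defined $value && $value eq 'define') ? 1 : 0;
}

# Use lstat where the platform provides it, otherwise fall back to stat.
my $path = '.';
my @info = has_feature('d_lstat') ? lstat($path) : stat($path);

print "d_lstat: ", (has_feature('d_lstat') ? "yes" : "no"),
      ", fields returned: ", scalar(@info), "\n";
```

Checking the configuration value up front lets a script choose a fallback instead of failing at the point of the call.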
=head2 Alphabetical Listing of Perl Functions
=over 8
=item -X
(Win32)
C<-w> only inspects the read-only file attribute (FILE_ATTRIBUTE_READONLY),
which determines whether the directory can be deleted, not whether it can
be written to. Directories always have read and write access unless denied
by discretionary access control lists (DACLs).
(VMS)
C<-r>, C<-w>, C<-x>, and C<-o> tell whether the file is accessible,
which may not reflect UIC-based file protections.
(S<RISC OS>)
C<-s> by name on an open file will return the space reserved on disk,
rather than the current extent. C<-s> on an open filehandle returns the
current size.
(Win32, VMS, S<RISC OS>)
C<-R>, C<-W>, C<-X>, C<-O> are indistinguishable from C<-r>, C<-w>,
C<-x>, C<-o>.
(Win32, VMS, S<RISC OS>)
C<-g>, C<-k>, C<-l>, C<-u>, C<-A> are not particularly meaningful.
(VMS, S<RISC OS>)
C<-p> is not particularly meaningful.
(VMS)
C<-d> is true if passed a device spec without an explicit directory.
(Win32)
C<-x> (or C<-X>) determines whether a file ends in one of the executable
suffixes. C<-S> is meaningless.
(S<RISC OS>)
C<-x> (or C<-X>) determines whether a file has an executable file type.
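Given the S<RISC OS> note on C<-s>, one defensive habit is to ask for the size through the open filehandle rather than by name. The sketch below is only an illustration; the temporary file and variable names are assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;

# Create a scratch file and write three bytes to it.
my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "abc";
$fh->flush;                      # push buffered data to the operating system

my $size_by_handle = -s $fh;     # current size of the open file
my $size_by_name   = -s $name;   # may report reserved space on some platforms

close $fh or die "close failed: $!";
print "by handle: $size_by_handle, by name: $size_by_name\n";
```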
=item alarm
(Win32)
Emulated using timers that must be explicitly polled whenever Perl
wants to dispatch "safe signals" and therefore cannot interrupt
blocking system calls.
=item atan2
(Tru64, HP-UX 10.20)
Due to issues with various CPUs, math libraries, compilers, and standards,
results for C<atan2> may vary depending on any combination of the above.
Perl attempts to conform to the Open Group/IEEE standards for the results
returned from C<atan2>, but cannot force the issue if the system Perl is
run on does not allow it.
The current version of the standards for C<atan2> is available at
L<http://www.opengroup.org/onlinepubs/009695399/functions/atan2.html>.
=item binmode
(S<RISC OS>)
Meaningless.
(VMS)
Reopens file and restores pointer; if function fails, underlying
filehandle may be closed, or pointer may be in a different position.
(Win32)
The value returned by L<C<tell>|perlfunc/tell FILEHANDLE> may be affected
after the call, and the filehandle may be flushed.
=item chmod
(Win32)
Only good for changing "owner" read-write access; "group" and "other"
bits are meaningless.
(S<RISC OS>)
Only good for changing "owner" and "other" read-write access.
(VOS)
Access permissions are mapped onto VOS access-control list changes.
(Cygwin)
The actual permissions set depend on the value of the C<CYGWIN> variable
in the SYSTEM environment settings.
(Android)
Setting the exec bit on some locations (generally F</sdcard>) will return true
but not actually set the bit.
=item chown
(S<Plan 9>, S<RISC OS>)
Not implemented.
(Win32)
Does nothing, but won't fail.
(VOS)
A little funky, because VOS's notion of ownership is a little funky.
=item chroot
(Win32, VMS, S<Plan 9>, S<RISC OS>, VOS)
Not implemented.
=item crypt
(Win32)
May not be available if library or source was not provided when building
perl.
(Android)
Not implemented.
=item dbmclose
(VMS, S<Plan 9>, VOS)
Not implemented.
=item dbmopen
(VMS, S<Plan 9>, VOS)
Not implemented.
=item dump
(S<RISC OS>)
Not useful.
(Cygwin, Win32)
Not supported.
(VMS)
Invokes VMS debugger.
=item exec
(Win32)
C<exec LIST> without the use of indirect object syntax (C<exec PROGRAM LIST>)
may fall back to trying the shell if the first C<spawn()> fails.
(SunOS, Solaris, HP-UX)
Does not automatically flush output handles on some platforms.
(Symbian OS)
Not supported.
=item exit
(VMS)
Emulates Unix C<exit> (which considers C<exit 1> to indicate an error) by
mapping the C<1> to C<SS$_ABORT> (C<44>). This behavior may be overridden
with the pragma L<C<use vmsish 'exit'>|vmsish/C<vmsish exit>>. As with
the CRTL's C<exit()> function, C<exit 0> is also mapped to an exit status
of C<SS$_NORMAL> (C<1>); this mapping cannot be overridden. Any other
argument to C<exit>
is used directly as Perl's exit status. On VMS, unless the future
POSIX_EXIT mode is enabled, the exit code should always be a valid
VMS exit code and not a generic number. When the POSIX_EXIT mode is
enabled, a generic number will be encoded in a method compatible with
the C library _POSIX_EXIT macro so that it can be decoded by other
programs, particularly ones written in C, like the GNV package.
(Solaris)
C<exit> resets file pointers, which is a problem when called
from a child process (created by L<C<fork>|perlfunc/fork>) in
L<C<BEGIN>|perlmod/BEGIN, UNITCHECK, CHECK, INIT and END>.
A workaround is to use L<C<POSIX::_exit>|POSIX/C<_exit>>.
    use Config;
    # Exit normally except on Solaris, where POSIX::_exit is used instead.
    exit unless $Config{archname} =~ /\bsolaris\b/;
    require POSIX;
    POSIX::_exit(0);
=item fcntl
(Win32)
Not implemented.
(VMS)
Some functions available based on the version of VMS.
=item flock
(VMS, S<RISC OS>, VOS)
Not implemented.
=item fork
(AmigaOS, S<RISC OS>, VMS)
Not implemented.
(Win32)
Emulated using multiple interpreters. See L<perlfork>.
(SunOS, Solaris, HP-UX)
Does not automatically flush output handles on some platforms.
=item getlogin
(S<RISC OS>)
Not implemented.
=item getpgrp
(Win32, VMS, S<RISC OS>)
Not implemented.
=item getppid
(Win32, S<RISC OS>)
Not implemented.
=item getpriority
(Win32, VMS, S<RISC OS>, VOS)
Not implemented.
=item getpwnam
(Win32)
Not implemented.
(S<RISC OS>)
Not useful.
=item getgrnam
(Win32, VMS, S<RISC OS>)
Not implemented.
=item getnetbyname
(Android, Win32, S<Plan 9>)
Not implemented.
=item getpwuid
(Win32)
Not implemented.
(S<RISC OS>)
Not useful.
=item getgrgid
(Win32, VMS, S<RISC OS>)
Not implemented.
=item getnetbyaddr
(Android, Win32, S<Plan 9>)
Not implemented.
=item getprotobynumber
(Android)
Not implemented.
=item getpwent
(Android, Win32)
Not implemented.
=item getgrent
(Android, Win32, VMS)
Not implemented.
=item gethostbyname
(S<Irix 5>)
C<gethostbyname('localhost')> does not work everywhere: you may have
to use C<gethostbyname('127.0.0.1')>.
=item gethostent
(Win32)
Not implemented.
=item getnetent
(Android, Win32, S<Plan 9>)
Not implemented.
=item getprotoent
(Android, Win32, S<Plan 9>)
Not implemented.
=item getservent
(Win32, S<Plan 9>)
Not implemented.
=item seekdir
(Android)
Not implemented.
=item sethostent
(Android, Win32, S<Plan 9>, S<RISC OS>)
Not implemented.
=item setnetent
(Win32, S<Plan 9>, S<RISC OS>)
Not implemented.
=item setprotoent
(Android, Win32, S<Plan 9>, S<RISC OS>)
Not implemented.
=item setservent
(S<Plan 9>, Win32, S<RISC OS>)
Not implemented.
=item endpwent
(Win32)
Not implemented.
(Android)
Either not implemented or a no-op.
=item endgrent
(Android, S<RISC OS>, VMS, Win32)
Not implemented.
=item endhostent
(Android, Win32)
Not implemented.
=item endnetent
(Android, Win32, S<Plan 9>)
Not implemented.
=item endprotoent
(Android, Win32, S<Plan 9>)
Not implemented.
=item endservent
(S<Plan 9>, Win32)
Not implemented.
=item getsockopt
(S<Plan 9>)
Not implemented.
=item glob
This operator is implemented via the L<C<File::Glob>|File::Glob> extension
on most platforms. See L<File::Glob> for portability information.
=item gmtime
In theory, C<gmtime> is reliable from -2**63 to 2**63-1. However,
because work-arounds in the implementation use floating point numbers,
it will become inaccurate as the time gets larger. This is a bug and
will be fixed in the future.
(VOS)
Time values are 32-bit quantities.
=item ioctl
(VMS)
Not implemented.
(Win32)
Available only for socket handles, and it does what the C<ioctlsocket()> call
in the Winsock API does.
(S<RISC OS>)
Available only for socket handles.
=item kill
(S<RISC OS>)
Not implemented, hence not useful for taint checking.
(Win32)
C<kill> doesn't send a signal to the identified process like it does on
Unix platforms. Instead C<kill($sig, $pid)> terminates the process
identified by C<$pid>, and makes it exit immediately with exit status
C<$sig>. As in Unix, if C<$sig> is 0 and the specified process exists, it
returns true without actually terminating it.
(Win32)
C<kill(-9, $pid)> will terminate the process specified by C<$pid> and
recursively all child processes owned by it. This is different from
the Unix semantics, where the signal will be delivered to all
processes in the same process group as the process specified by
C<$pid>.
(VMS)
A pid of -1 indicating all processes on the system is not currently
supported.
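The signal-0 form mentioned in the Win32 note can be sketched as a simple existence probe. The C<process_exists> helper is a hypothetical name; on Unix a false result can also mean that permission to signal the process was denied.

```perl
use strict;
use warnings;

# Hypothetical helper: signal 0 performs the checks without delivering a
# signal, so nothing is terminated. A false result means the process does
# not exist or (on Unix) that permission to signal it was denied.
sub process_exists {
    my ($pid) = @_;
    return kill(0, $pid) ? 1 : 0;
}

print "current process visible: ", process_exists($$), "\n";
```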
=item link
(S<RISC OS>, VOS)
Not implemented.
(AmigaOS)
Link count not updated because hard links are not quite that hard
(they are sort of half-way between hard and soft links).
(Win32)
Hard links are implemented on Win32 under NTFS only. They are
natively supported on Windows 2000 and later. On Windows NT they
are implemented using the Windows POSIX subsystem support and the
Perl process will need Administrator or Backup Operator privileges
to create hard links.
(VMS)
Available on 64 bit OpenVMS 8.2 and later.
=item localtime
C<localtime> has the same range as L</gmtime>, but because time zone
rules change, its accuracy for historical and future times may degrade,
but usually by no more than an hour.
=item lstat
(S<RISC OS>)
Not implemented.
(Win32)
Return values (especially for device and inode) may be bogus.
=item msgctl
=item msgget
=item msgsnd
=item msgrcv
(Android, Win32, VMS, S<Plan 9>, S<RISC OS>, VOS)
Not implemented.
=item open
(Win32, S<RISC OS>)
Open modes C<|-> and C<-|> are unsupported.
(SunOS, Solaris, HP-UX)
Opening a process does not automatically flush output handles on some
platforms.
=item readlink
(Win32, VMS, S<RISC OS>)
Not implemented.
=item rename
(Win32)
Can't move directories between directories on different logical volumes.
=item rewinddir
(Win32)
Will not cause L<C<readdir>|perlfunc/readdir DIRHANDLE> to re-read the
directory stream. The entries already read before the C<rewinddir> call
will just be returned again from a cache buffer.
=item select
(Win32, VMS)
Only implemented on sockets.
(S<RISC OS>)
Only reliable on sockets.
Note that the L<C<select FILEHANDLE>|perlfunc/select FILEHANDLE> form is
generally portable.
=item semctl
=item semget
=item semop
(Android, Win32, VMS, S<RISC OS>)
Not implemented.
=item setgrent
(Android, VMS, Win32, S<RISC OS>)
Not implemented.
=item setpgrp
(Win32, VMS, S<RISC OS>, VOS)
Not implemented.
=item setpriority
(Win32, VMS, S<RISC OS>, VOS)
Not implemented.
=item setpwent
(Android, Win32, S<RISC OS>)
Not implemented.
=item setsockopt
(S<Plan 9>)
Not implemented.
=item shmctl
=item shmget
=item shmread
=item shmwrite
(Android, Win32, VMS, S<RISC OS>)
Not implemented.
=item sleep
(Win32)
Emulated using synchronization functions such that it can be
interrupted by L<C<alarm>|perlfunc/alarm SECONDS>, and limited to a
maximum of 4294967 seconds, approximately 49 days.
=item socketpair
(S<RISC OS>)
Not implemented.
(VMS)
Available on 64 bit OpenVMS 8.2 and later.
=item stat
Platforms that do not have C<rdev>, C<blksize>, or C<blocks> will return
these as C<''>, so numeric comparison or manipulation of these fields may
cause 'not numeric' warnings.
(S<Mac OS X>)
C<ctime> not supported on UFS.
(Win32)
C<ctime> is creation time instead of inode change time.
(Win32)
C<dev> and C<ino> are not meaningful.
(VMS)
C<dev> and C<ino> are not necessarily reliable.
(S<RISC OS>)
C<mtime>, C<atime> and C<ctime> all return the last modification time.
C<dev> and C<ino> are not necessarily reliable.
(OS/2)
C<dev>, C<rdev>, C<blksize>, and C<blocks> are not available. C<ino> is not
meaningful and will differ between stat calls on the same file.
(Cygwin)
Some versions of cygwin when doing a C<stat("foo")> and not finding it
may then attempt to C<stat("foo.exe")>.
(Win32)
C<stat> needs to open the file to determine the link count
and update attributes that may have been changed through hard links.
Setting L<C<${^WIN32_SLOPPY_STAT}>|perlvar/${^WIN32_SLOPPY_STAT}> to a
true value speeds up C<stat> by not performing this operation.
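Because missing fields come back as C<''>, it can help to normalise them before doing arithmetic. This is a minimal sketch; C<num_or_zero> is a hypothetical helper and treating a missing field as zero is an assumption that may not suit every program.

```perl
use strict;
use warnings;

# Hypothetical helper: turn undef or '' into 0 so later numeric
# comparisons do not trigger 'not numeric' warnings.
sub num_or_zero {
    my ($value) = @_;
    return (defined $value && length $value) ? $value : 0;
}

my @st = stat('.') or die "stat failed: $!";

# Fields 11 and 12 are blksize and blocks in the list returned by stat.
my ($blksize, $blocks) = map { num_or_zero($_) } @st[11, 12];

print "blksize=$blksize blocks=$blocks\n";
```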
=item symlink
(Win32, S<RISC OS>)
Not implemented.
(VMS)
Implemented on 64 bit VMS 8.3. VMS requires the symbolic link to be in Unix
syntax if it is intended to resolve to a valid path.
=item syscall
(Win32, VMS, S<RISC OS>, VOS)
Not implemented.
=item sysopen
(S<Mac OS>, OS/390)
The traditional C<0>, C<1>, and C<2> MODEs are implemented with different
numeric values on some systems. The flags exported by L<C<Fcntl>|Fcntl>
(C<O_RDONLY>, C<O_WRONLY>, C<O_RDWR>) should work everywhere though.
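A short sketch of the portable form, using the symbolic flags from C<Fcntl> instead of literal mode numbers. The scratch file created with C<File::Temp> is only there to make the example self-contained.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY O_WRONLY O_CREAT O_TRUNC);
use File::Temp qw(tempfile);

# Obtain a scratch file name; the handle from tempfile is not needed.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh or die "close failed: $!";

# Write using symbolic flags rather than the numeric modes 0, 1, 2.
sysopen(my $out, $tmp_name, O_WRONLY | O_CREAT | O_TRUNC)
    or die "sysopen for write failed: $!";
print {$out} "hello\n";
close $out or die "close failed: $!";

# Read the line back, again with a symbolic flag.
sysopen(my $in, $tmp_name, O_RDONLY)
    or die "sysopen for read failed: $!";
my $line = <$in>;
close $in;

print "read back: $line";
```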
=item system
(Win32)
As an optimization, may not call the command shell specified in
C<$ENV{PERL5SHELL}>. C<system(1, @args)> spawns an external
process and immediately returns its process designator, without
waiting for it to terminate. Return value may be used subsequently
in L<C<wait>|perlfunc/wait> or L<C<waitpid>|perlfunc/waitpid PID,FLAGS>.
Failure to C<spawn()> a subprocess is indicated by setting
L<C<$?>|perlvar/$?> to C<<< 255 << 8 >>>. L<C<$?>|perlvar/$?> is set in a
way compatible with Unix (i.e. the exit status of the subprocess is
obtained by C<<< $? >> 8 >>>, as described in the documentation).
(S<RISC OS>)
There is no shell to process metacharacters, and the native standard is
to pass a command line terminated by "\n", "\r", or "\0" to the spawned
program. Redirection such as C<< > foo >> is performed (if at all) by
the run time library of the spawned program. C<system LIST> will call
the Unix emulation library's L<C<exec>|perlfunc/exec LIST> emulation,
which attempts to provide emulation of the stdin, stdout, stderr in force
in the parent, provided the child program uses a compatible version of the
emulation library. C<system SCALAR> will call the native command line
directly and no such emulation of a child Unix program will occur.
Mileage B<will> vary.
(Win32)
C<system LIST> without the use of indirect object syntax (C<system PROGRAM LIST>)
may fall back to trying the shell if the first C<spawn()> fails.
(SunOS, Solaris, HP-UX)
Does not automatically flush output handles on some platforms.
(VMS)
The return value is POSIX-like (shifted up by 8 bits), which only allows
room for a made-up value derived from the severity bits of the native
32-bit condition code (unless overridden by
L<C<use vmsish 'status'>|vmsish/C<vmsish status>>). If the native
condition code is one that has a POSIX value encoded, the POSIX value will
be decoded to extract the expected exit value. For more details see
L<perlvms/$?>.
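The Unix-compatible decoding of C<$?> described in the Win32 note can be sketched as follows. The child command (the running perl, found through C<$^X>, exiting with status 3) is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

# Run a child perl that exits with status 3, using the list form of
# system so that no shell is involved.
my $rc = system($^X, '-e', 'exit 3');
die "could not start child: $!" if $rc == -1;

my $exit_code = $? >> 8;     # exit status of the child
my $signal    = $? & 127;    # signal number, if the child died from one

print "exit code: $exit_code, signal: $signal\n";
```

On VMS the decoded value is derived from the native condition code as described above, so the same arithmetic may yield a made-up value there.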
=item telldir
(Android)
Not implemented.
=item times
(Win32)
"Cumulative" times will be bogus. On anything other than Windows NT
or Windows 2000, "system" time will be bogus, and "user" time is
actually the time returned by the L<C<clock()>|clock(3)> function in the C
runtime library.
(S<RISC OS>)
Not useful.
=item truncate
(Older versions of VMS)
Not implemented.
(VOS)
Truncation to same-or-shorter lengths only.
(Win32)
If a FILEHANDLE is supplied, it must be writable and opened in append
mode (i.e., use C<<< open(my $fh, '>>', 'filename') >>>
or C<sysopen(my $fh, ..., O_APPEND|O_RDWR)>). If a filename is supplied, it
should not be held open elsewhere.
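A minimal sketch of the append-mode pattern from the Win32 note, shortening a ten-byte scratch file to four bytes. The file contents and lengths are arbitrary; shrinking rather than growing also stays within the VOS restriction.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Prepare a ten-byte scratch file and close it so nothing else holds it open.
my ($setup_fh, $name) = tempfile(UNLINK => 1);
print {$setup_fh} "0123456789";
close $setup_fh or die "close failed: $!";

# Reopen in append mode, as the Win32 note requires, then shorten the file.
open(my $fh, '>>', $name) or die "open failed: $!";
truncate($fh, 4)          or die "truncate failed: $!";
close $fh                 or die "close failed: $!";

my $size = -s $name;
print "size after truncate: $size\n";
```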
=item umask
Returns C<undef> where unavailable.
(AmigaOS)
C<umask> works but the correct permissions are set only when the file
is finally closed.
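Since C<umask> may return C<undef>, a defensive caller can check the result before formatting it. This is only a sketch; the message wording is arbitrary.

```perl
use strict;
use warnings;

# Query the current mask without changing it.
my $mask = umask;

# Format the mask in octal only when the platform reports one.
my $report = defined $mask
    ? sprintf("umask is %03o", $mask)
    : "umask is not available on this platform";

print "$report\n";
```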
=item utime
(VMS, S<RISC OS>)
Only the modification time is updated.
(Win32)
May not behave as expected. Behavior depends on the C runtime
library's implementation of L<C<utime()>|utime(2)>, and the filesystem
being used. The FAT filesystem typically does not support an "access
time" field, and it may limit timestamps to a granularity of two seconds.
=item wait
=item waitpid
(Win32)
Can only be applied to process handles returned for processes spawned
using C<system(1, ...)> or pseudo processes created with
L<C<fork>|perlfunc/fork>.
(S<RISC OS>)
Not useful.
=back
=head1 Supported Platforms
The following platforms are known to build Perl 5.12 (as of April 2010,
its release date) from the standard source code distribution available
at L<http://www.cpan.org/src>:
=over
=item Linux (x86, ARM, IA64)
=item HP-UX
=item AIX
=item Win32
=over
=item Windows 2000
=item Windows XP
=item Windows Server 2003
=item Windows Vista
=item Windows Server 2008
=item Windows 7
=back
=item Cygwin
Some tests are known to fail:
=over
=item *
F<ext/XS-APItest/t/call_checker.t> - see
L<https://rt.perl.org/Ticket/Display.html?id=78502>
=item *
F<dist/I18N-Collate/t/I18N-Collate.t>
=item *
F<ext/Win32CORE/t/win32core.t> - may fail on recent cygwin installs.
=back
=item Solaris (x86, SPARC)
=item OpenVMS
=over
=item Alpha (7.2 and later)
=item I64 (8.2 and later)
=back
=item Symbian
=item NetBSD
=item FreeBSD
=item Debian GNU/kFreeBSD
=item Haiku
=item Irix (6.5. What else?)
=item OpenBSD
=item Dragonfly BSD
=item Midnight BSD
=item QNX Neutrino RTOS (6.5.0)
=item MirOS BSD
=item Stratus OpenVOS (17.0 or later)
Caveats:
=over
=item time_t issues that may or may not be fixed
=back
=item Symbian (Series 60 v3, 3.2 and 5 - what else?)
=item Stratus VOS / OpenVOS
=item AIX
=item Android
=item FreeMINT
Perl now builds with FreeMiNT/Atari. It fails a few tests, which need
some investigation.
The FreeMiNT port uses GNU dld for loadable module capabilities. So
ensure you have that library installed when building perl.
=back
=head1 EOL Platforms
=head2 (Perl 5.20)
The following platforms were supported by a previous version of
Perl but have been officially removed from Perl's source code
as of 5.20:
=over
=item AT&T 3b1
=back
=head2 (Perl 5.14)
The following platforms were supported up to 5.10. They may still
have worked in 5.12, but supporting code has been removed for 5.14:
=over
=item Windows 95
=item Windows 98
=item Windows ME
=item Windows NT4
=back
=head2 (Perl 5.12)
The following platforms were supported by a previous version of
Perl but have been officially removed from Perl's source code
as of 5.12:
=over
=item Atari MiNT
=item Apollo Domain/OS
=item Apple Mac OS 8/9
=item Tenon Machten
=back
=head1 Supported Platforms (Perl 5.8)
As of July 2002 (the Perl release 5.8.0), the following platforms were
able to build Perl from the standard source code distribution
available at L<http://www.cpan.org/src/>:
    AIX
    BeOS
    BSD/OS (BSDi)
    Cygwin
    DG/UX
    DOS DJGPP 1)
    DYNIX/ptx
    EPOC R5
    FreeBSD
    HI-UXMPP (Hitachi) (5.8.0 worked but we didn't know it)
    HP-UX
    IRIX
    Linux
    Mac OS Classic
    Mac OS X (Darwin)
    MPE/iX
    NetBSD
    NetWare
    NonStop-UX
    ReliantUNIX (formerly SINIX)
    OpenBSD
    OpenVMS (formerly VMS)
    Open UNIX (Unixware) (since Perl 5.8.1/5.9.0)
    OS/2
    OS/400 (using the PASE) (since Perl 5.8.1/5.9.0)
    PowerUX
    POSIX-BC (formerly BS2000)
    QNX
    Solaris
    SunOS 4
    SUPER-UX (NEC)
    Tru64 UNIX (formerly DEC OSF/1, Digital UNIX)
    UNICOS
    UNICOS/mk
    UTS
    VOS / OpenVOS
    Win95/98/ME/2K/XP 2)
    WinCE
    z/OS (formerly OS/390)
    VM/ESA
    1) in DOS mode either the DOS or OS/2 ports can be used
    2) compilers: Borland, MinGW (GCC), VC6
The following platforms worked with the previous releases (5.6 and
5.7), but we did not manage either to fix or to test these in time
for the 5.8.0 release. There is a very good chance that many of these
will work fine with 5.8.0.
    BSD/OS
    DomainOS
    Hurd
    LynxOS
    MachTen
    PowerMAX
    SCO SV
    SVR4
    Unixware
    Windows 3.1
Known to be broken for 5.8.0 (but 5.6.1 and 5.7.2 can be used):
    AmigaOS 3
The following platforms have been known to build Perl from source in
the past (5.005_03 and earlier), but we haven't been able to verify
their status for the current release, either because the
hardware/software platforms are rare or because we don't have an
active champion on these platforms--or both. They used to work,
though, so go ahead and try compiling them, and let perlbug@perl.org
know of any trouble.
    3b1
    A/UX
    ConvexOS
    CX/UX
    DC/OSx
    DDE SMES
    DOS EMX
    Dynix
    EP/IX
    ESIX
    FPS
    GENIX
    Greenhills
    ISC
    MachTen 68k
    MPC
    NEWS-OS
    NextSTEP
    OpenSTEP
    Opus
    Plan 9
    RISC/os
    SCO ODT/OSR
    Stellar
    SVR2
    TI1500
    TitanOS
    Ultrix
    Unisys Dynix
The following platforms have their own source code distributions and
binaries available via L<http://www.cpan.org/ports/>
                            Perl release
    OS/400 (ILE)            5.005_02
    Tandem Guardian         5.004
The following platforms have only binaries available via
L<http://www.cpan.org/ports/index.html> :
                            Perl release
    Acorn RISCOS            5.005_02
    AOS                     5.002
    LynxOS                  5.004_02
Although we do suggest that you always build your own Perl from
the source code, both for maximal configurability and for security,
in case you are in a hurry you can check
L<http://www.cpan.org/ports/index.html> for binary distributions.
=head1 SEE ALSO
L<perlaix>, L<perlamiga>, L<perlbs2000>,
L<perlce>, L<perlcygwin>, L<perldos>,
L<perlebcdic>, L<perlfreebsd>, L<perlhurd>, L<perlhpux>, L<perlirix>,
L<perlmacos>, L<perlmacosx>,
L<perlnetware>, L<perlos2>, L<perlos390>, L<perlos400>,
L<perlplan9>, L<perlqnx>, L<perlsolaris>, L<perltru64>,
L<perlunicode>, L<perlvms>, L<perlvos>, L<perlwin32>, and L<Win32>.
=head1 AUTHORS / CONTRIBUTORS
Abigail <abigail@abigail.be>,
Charles Bailey <bailey@newman.upenn.edu>,
Graham Barr <gbarr@pobox.com>,
Tom Christiansen <tchrist@perl.com>,
Nicholas Clark <nick@ccl4.org>,
Thomas Dorner <Thomas.Dorner@start.de>,
Andy Dougherty <doughera@lafayette.edu>,
Dominic Dunlop <domo@computer.org>,
Neale Ferguson <neale@vma.tabnsw.com.au>,
David J. Fiander <davidf@mks.com>,
Paul Green <Paul.Green@stratus.com>,
M.J.T. Guy <mjtg@cam.ac.uk>,
Jarkko Hietaniemi <jhi@iki.fi>,
Luther Huffman <lutherh@stratcom.com>,
Nick Ing-Simmons <nick@ing-simmons.net>,
Andreas J. KE<ouml>nig <a.koenig@mind.de>,
Markus Laker <mlaker@contax.co.uk>,
Andrew M. Langmead <aml@world.std.com>,
Lukas Mai <l.mai@web.de>,
Larry Moore <ljmoore@freespace.net>,
Paul Moore <Paul.Moore@uk.origin-it.com>,
Chris Nandor <pudge@pobox.com>,
Matthias Neeracher <neeracher@mac.com>,
Philip Newton <pne@cpan.org>,
Gary Ng <71564.1743@CompuServe.COM>,
Tom Phoenix <rootbeer@teleport.com>,
AndrE<eacute> Pirard <A.Pirard@ulg.ac.be>,
Peter Prymmer <pvhp@forte.com>,
Hugo van der Sanden <hv@crypt0.demon.co.uk>,
Gurusamy Sarathy <gsar@activestate.com>,
Paul J. Schinder <schinder@pobox.com>,
Michael G Schwern <schwern@pobox.com>,
Dan Sugalski <dan@sidhe.org>,
Nathan Torkington <gnat@frii.com>,
John Malmberg <wb8tyw@qsl.net>
=encoding utf8
=head1 NAME
perl5261delta - what is new for perl v5.26.1
=head1 DESCRIPTION
This document describes differences between the 5.26.0 release and the 5.26.1
release.
If you are upgrading from an earlier release such as 5.24.0, first read
L<perl5260delta>, which describes differences between 5.24.0 and 5.26.0.
=head1 Security
=head2 [CVE-2017-12837] Heap buffer overflow in regular expression compiler
Compiling certain regular expression patterns with the case-insensitive
modifier could cause a heap buffer overflow and crash perl. This has now been
fixed.
L<[perl #131582]|https://rt.perl.org/Public/Bug/Display.html?id=131582>
=head2 [CVE-2017-12883] Buffer over-read in regular expression parser
For certain types of syntax error in a regular expression pattern, the error
message could either contain the contents of a random, possibly large, chunk of
memory, or could crash perl. This has now been fixed.
L<[perl #131598]|https://rt.perl.org/Public/Bug/Display.html?id=131598>
=head2 [CVE-2017-12814] C<$ENV{$key}> stack buffer overflow on Windows
A possible stack buffer overflow in the C<%ENV> code on Windows has been fixed
by removing the buffer completely since it was superfluous anyway.
L<[perl #131665]|https://rt.perl.org/Public/Bug/Display.html?id=131665>
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.26.0. If any exist,
they are bugs, and we request that you submit a report. See L</Reporting
Bugs> below.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<base> has been upgraded from version 2.25 to 2.26.
The effects of dotless C<@INC> on this module have been limited by the
introduction of a more refined and accurate solution for removing C<'.'> from
C<@INC> while reducing the false positives.
=item *
L<charnames> has been upgraded from version 1.44 to 1.45.
=item *
L<Module::CoreList> has been upgraded from version 5.20170530 to 5.20170922_26.
=back
=head1 Platform Support
=head2 Platform-Specific Notes
=over 4
=item FreeBSD
=over 4
=item *
Building with B<g++> on FreeBSD-11.0 has been fixed.
L<[perl #131337]|https://rt.perl.org/Public/Bug/Display.html?id=131337>
=back
=item Windows
=over 4
=item *
Support for compiling perl on Windows using Microsoft Visual Studio 2017
(containing Visual C++ 14.1) has been added.
=item *
Building XS modules with GCC 6 in a 64-bit build of Perl failed due to
incorrect mapping of C<strtoll> and C<strtoull>. This has now been fixed.
L<[perl #131726]|https://rt.perl.org/Public/Bug/Display.html?id=131726>
L<[cpan #121683]|https://rt.cpan.org/Public/Bug/Display.html?id=121683>
L<[cpan #122353]|https://rt.cpan.org/Public/Bug/Display.html?id=122353>
=back
=back
=head1 Selected Bug Fixes
=over 4
=item *
Several built-in functions previously had bugs that could cause them to write
to the internal stack without allocating room for the item being written. In
rare situations, this could have led to a crash. These bugs have now been
fixed, and if any similar bugs are introduced in future, they will be detected
automatically in debugging builds.
L<[perl #131732]|https://rt.perl.org/Public/Bug/Display.html?id=131732>
=item *
Using a symbolic ref with postderef syntax as the key in a hash lookup was
yielding an assertion failure on debugging builds.
L<[perl #131627]|https://rt.perl.org/Public/Bug/Display.html?id=131627>
=item *
List assignment (C<aassign>) could in some rare cases allocate an entry on the
mortal stack and leave the entry uninitialized.
L<[perl #131570]|https://rt.perl.org/Public/Bug/Display.html?id=131570>
=item *
Attempting to apply an attribute to an C<our> variable where a function of that
name already exists could result in a NULL pointer being supplied where an SV
was expected, crashing perl.
L<[perl #131597]|https://rt.perl.org/Public/Bug/Display.html?id=131597>
=item *
The code that vivifies a typeglob out of a code ref made some false assumptions
that could lead to a crash in cases such as C<< $::{"A"} = sub {}; \&{"A"} >>.
This has now been fixed.
L<[perl #131085]|https://rt.perl.org/Public/Bug/Display.html?id=131085>
=item *
C<my_atof2> no longer reads beyond the terminating NUL, which previously
occurred if the decimal point was immediately before the NUL.
L<[perl #131526]|https://rt.perl.org/Public/Bug/Display.html?id=131526>
=item *
Occasional "Malformed UTF-8 character" crashes in C<s//> on utf8 strings have
been fixed.
L<[perl #131575]|https://rt.perl.org/Public/Bug/Display.html?id=131575>
=item *
C<perldoc -f s> now finds C<s///>.
L<[perl #131371]|https://rt.perl.org/Public/Bug/Display.html?id=131371>
=item *
Some erroneous warnings after utf8 conversion have been fixed.
L<[perl #131190]|https://rt.perl.org/Public/Bug/Display.html?id=131190>
=item *
The C<jmpenv> frame to catch Perl exceptions is set up lazily, and this used to
be a bit too lazy. The catcher is now set up earlier, preventing some possible
crashes.
L<[perl #105930]|https://rt.perl.org/Public/Bug/Display.html?id=105930>
=item *
Spurious "Assuming NOT a POSIX class" warnings have been removed.
L<[perl #131522]|https://rt.perl.org/Public/Bug/Display.html?id=131522>
=back
=head1 Acknowledgements
Perl 5.26.1 represents approximately 4 months of development since Perl 5.26.0
and contains approximately 8,900 lines of changes across 85 files from 23
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 990 lines of changes to 38 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.26.1:
Aaron Crane, Andy Dougherty, Aristotle Pagaltzis, Chris 'BinGOs' Williams,
Craig A. Berry, Dagfinn Ilmari Mannsåker, David Mitchell, E. Choroba, Eric
Herman, Father Chrysostomos, Jacques Germishuys, James E Keenan, John SJ
Anderson, Karl Williamson, Ken Brown, Lukas Mai, Matthew Horsfall, Ricardo
Signes, Sawyer X, Steve Hay, Tony Cook, Yves Orton, Zefram.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the perl bug database
at L<https://rt.perl.org/> . There may also be information at
L<http://www.perl.org/> , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications which make it
inappropriate to send to a publicly archived mailing list, then see
L<perlsec/SECURITY VULNERABILITY CONTACT INFORMATION> for details of how to
report the issue.
=head1 Give Thanks
If you wish to thank the Perl 5 Porters for the work we have done in Perl 5, you
can do so by running the C<perlthanks> program:
    perlthanks
This will send an email to the Perl 5 Porters list with your show of thanks.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perlmod - Perl modules (packages and symbol tables)
=head1 DESCRIPTION
=head2 Is this the document you were after?
There are other documents which might contain the information that you're
looking for:
=over 2
=item This doc
Perl's packages, namespaces, and some info on classes.
=item L<perlnewmod>
Tutorial on making a new module.
=item L<perlmodstyle>
Best practices for making a new module.
=back
=head2 Packages
X<package> X<namespace> X<variable, global> X<global variable> X<global>
Unlike Perl 4, in which all the variables were dynamic and shared one
global name space, causing maintainability problems, Perl 5 provides two
mechanisms for protecting code from having its variables stomped on by
other code: lexically scoped variables created with C<my> or C<state> and
namespaced global variables, which are exposed via the C<vars> pragma,
or the C<our> keyword. Any global variable is considered to
be part of a namespace and can be accessed via a "fully qualified form".
Conversely, any lexically scoped variable is considered to be part of
that lexical-scope, and does not have a "fully qualified form".
In Perl, namespaces are called "packages" and
the C<package> declaration tells the compiler which
namespace to prefix to C<our> variables and unqualified dynamic names.
This both protects
against accidental stomping and provides an interface for deliberately
clobbering global dynamic variables declared and used in other scopes or
packages, when that is what you want to do.
The scope of the C<package> declaration is from the
declaration itself through the end of the enclosing block, C<eval>,
or file, whichever comes first (the same scope as the my(), our(), state(), and
local() operators, and also the effect
of the experimental "reference aliasing," which may change), or until
the next C<package> declaration. Unqualified dynamic identifiers will be in
this namespace, except for those few identifiers that, if unqualified,
default to the main package instead of the current one as described
below. A C<package> statement affects only dynamic global
symbols, including subroutine names, and variables you've used local()
on, but I<not> lexical variables created with my(), our() or state().
Typically, a C<package> statement is the first declaration in a file
included in a program by one of the C<do>, C<require>, or C<use> operators. You can
switch into a package in more than one place: C<package> has no
effect beyond specifying which symbol table the compiler will use for
dynamic symbols for the rest of that block or until the next C<package> statement.
You can refer to variables and filehandles in other packages
by prefixing the identifier with the package name and a double
colon: C<$Package::Variable>. If the package name is null, the
C<main> package is assumed. That is, C<$::sail> is equivalent to
C<$main::sail>.
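As a minimal sketch of qualified names (the C<Boat> package and its variable are made up for illustration), the following shows that C<$::sail> and C<$main::sail> name the same variable, while C<$Boat::sail> is a different one:

```perl
use strict;
use warnings;

{
    package Boat;                   # hypothetical package, for illustration only
    our $sail = 'jib';              # stored in the symbol table as $Boat::sail
}

our $sail = 'mainsail';             # we are in package main: this is $main::sail

# Fully qualified names can be used from any package.
my @seen = ($Boat::sail, $main::sail, $::sail);
print join(', ', @seen), "\n";      # prints "jib, mainsail, mainsail"
```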
The old package delimiter was a single quote, but double colon is now the
preferred delimiter, in part because it's more readable to humans, and
in part because it's more readable to B<emacs> macros. It also makes C++
programmers feel like they know what's going on--as opposed to using the
single quote as separator, which was there to make Ada programmers feel
like they knew what was going on. Because the old-fashioned syntax is still
supported for backwards compatibility, if you try to use a string like
C<"This is $owner's house">, you'll be accessing C<$owner::s>; that is,
the $s variable in package C<owner>, which is probably not what you meant.
Use braces to disambiguate, as in C<"This is ${owner}'s house">.
X<::> X<'>
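A short sketch of the brace form, assuming a lexical C<$owner> invented for the example. The braces end the variable name before the apostrophe, so no package lookup is attempted:

```perl
use strict;
use warnings;

my $owner = 'Ada';                          # illustrative value

# ${owner} ends the variable name at the closing brace, so the
# apostrophe that follows is ordinary text.
my $text = "This is ${owner}'s house";
print "$text\n";                            # prints "This is Ada's house"
```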
Packages may themselves contain package separators, as in
C<$OUTER::INNER::var>. This implies nothing about the order of
name lookups, however. There are no relative packages: all symbols
are either local to the current package, or must be fully qualified
from the outer package name down. For instance, there is nowhere
within package C<OUTER> that C<$INNER::var> refers to
C<$OUTER::INNER::var>. C<INNER> refers to a totally
separate global package. The custom of treating package names as a
hierarchy is very strong, but the language in no way enforces it.
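The absence of relative lookups can be seen in a small sketch. The package names mirror the ones used above; the variable values and the C<which> subroutine are assumptions made for the example:

```perl
use strict;
use warnings;

{
    package OUTER::INNER;
    our $var = 'nested';            # $OUTER::INNER::var
}
{
    package INNER;
    our $var = 'top-level';         # $INNER::var, an unrelated package
}
{
    package OUTER;
    # Even inside OUTER, $INNER::var names the top-level INNER package.
    sub which { return $INNER::var }
}

print OUTER::which(), "\n";         # prints "top-level"
```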
Only identifiers starting with letters (or underscore) are stored
in a package's symbol table. All other symbols are kept in package
C<main>, including all punctuation variables, like $_. In addition,
when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV,
ARGVOUT, ENV, INC, and SIG are forced to be in package C<main>,
even when used for other purposes than their built-in ones. If you
have a package called C<m>, C<s>, or C<y>, then you can't use the
qualified form of an identifier because it would be instead interpreted
as a pattern match, a substitution, or a transliteration.
X<variable, punctuation>
Variables beginning with underscore used to be forced into package
main, but we decided it was more useful for package writers to be able
to use leading underscore to indicate private variables and method names.
However, variables and functions named with a single C<_>, such as
$_ and C<sub _>, are still forced into the package C<main>. See also
L<perlvar/"The Syntax of Variable Names">.
C<eval>ed strings are compiled in the package in which the eval() was
compiled. (Assignments to C<$SIG{}>, however, assume the signal
handler specified is in the C<main> package. Qualify the signal handler
name if you wish to have a signal handler in a package.) For an
example, examine F<perl5db.pl> in the Perl library. It initially switches
to the C<DB> package so that the debugger doesn't interfere with variables
in the program you are trying to debug. At various points, however, it
temporarily switches back to the C<main> package to evaluate various
expressions in the context of the C<main> package (or wherever you came
from). See L<perldebug>.
The special symbol C<__PACKAGE__> contains the current package, but cannot
(easily) be used to construct variable names. After C<my($foo)> has hidden
package variable C<$foo>, it can still be accessed, without knowing what
package you are in, as C<${__PACKAGE__.'::foo'}>.
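A hedged sketch of that lookup follows. Because the name is built at run time, it is a symbolic reference and needs C<no strict 'refs'> when C<strict> is in effect; the variable values are illustrative:

```perl
use strict;
use warnings;

{ our $foo = 'package variable'; }  # sets the global, then "our" goes out of scope
my $foo = 'lexical';                # from here on, plain $foo is the lexical

my $global = do {
    no strict 'refs';               # the lookup below is a symbolic reference
    ${ __PACKAGE__ . '::foo' };
};

print "$foo / $global\n";           # prints "lexical / package variable"
```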
See L<perlsub> for other scoping issues related to my() and local(),
and L<perlref> regarding closures.
=head2 Symbol Tables
X<symbol table> X<stash> X<%::> X<%main::> X<typeglob> X<glob> X<alias>
The symbol table for a package happens to be stored in the hash of that
name with two colons appended. The main symbol table's name is thus
C<%main::>, or C<%::> for short. Likewise the symbol table for the nested
package mentioned earlier is named C<%OUTER::INNER::>.
The value in each entry of the hash is what you are referring to when you
use the C<*name> typeglob notation.
local *main::foo = *main::bar;
You can use this to print out all the variables in a package, for
instance. The standard but antiquated F<dumpvar.pl> library and
the CPAN module Devel::Symdump make use of this.
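As a rough sketch of the idea (read-only, and using a made-up C<Zoo> package), the names in a package can be listed by walking its symbol table hash, and an individual slot can be reached through the typeglob stored there:

```perl
use strict;
use warnings;

{
    package Zoo;                            # hypothetical package
    our $keeper  = 'Alice';
    our @animals = ('emu', 'yak');
    sub feed { return 'fed' }
}

# The keys of %Zoo:: are the identifiers known in package Zoo.
my @names = sort keys %Zoo::;
print "Zoo contains: @names\n";

# The stash entry for a package array is a typeglob; ask it for the ARRAY slot.
my $aref = *{ $Zoo::{animals} }{ARRAY};
print "animals: @$aref\n";
```

This only inspects existing entries; it does not create or modify any.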
The results of creating new symbol table entries directly or modifying any
entries that are not already typeglobs are undefined and subject to change
between releases of perl.
Assignment to a typeglob performs an aliasing operation, i.e.,
*dick = *richard;
causes variables, subroutines, formats, and file and directory handles
accessible via the identifier C<richard> also to be accessible via the
identifier C<dick>. If you want to alias only a particular variable or
subroutine, assign a reference instead:
*dick = \$richard;
Which makes $richard and $dick the same variable, but leaves
@richard and @dick as separate arrays. Tricky, eh?
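A small sketch of the scalar-only alias, with illustrative values. Writing through C<$dick> changes C<$richard>, while the two arrays stay independent:

```perl
use strict;
use warnings;

our ($richard, @richard, $dick, @dick);
$richard = 'Richard';
@richard = (1, 2, 3);

*dick = \$richard;      # alias the scalar slot only
$dick = 'Dick';         # writes through to $richard
push @dick, 'own';      # @dick is still its own array

print "$richard / @richard / @dick\n";   # prints "Dick / 1 2 3 / own"
```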
There is one subtle difference between the following statements:
*foo = *bar;
*foo = \$bar;
C<*foo = *bar> makes the typeglobs themselves synonymous while
C<*foo = \$bar> makes the SCALAR portions of two distinct typeglobs
refer to the same scalar value. This means that the following code:
    $bar = 1;
    *foo = \$bar;       # Make $foo an alias for $bar
    {
        local $bar = 2; # Restrict changes to block
        print $foo;     # Prints '1'!
    }
Would print '1', because C<$foo> holds a reference to the I<original>
C<$bar>: the one that was stuffed away by C<local()> and which will be
restored when the block ends. Because variables are accessed through the
typeglob, you can use C<*foo = *bar> to create an alias which can be
localized. (But be aware that this means you can't have a separate
C<@foo> and C<@bar>, etc.)
What makes all of this important is that the Exporter module uses glob
aliasing as the import/export mechanism. Whether or not you can properly
localize a variable that has been exported from a module depends on how
it was exported:
@EXPORT = qw($FOO); # Usual form, can't be localized
@EXPORT = qw(*FOO); # Can be localized
You can work around the first case by using the fully qualified name
(C<$Package::FOO>) where you need a local value, or by overriding it
by saying C<*FOO = *Package::FOO> in your script.
The C<*x = \$y> mechanism may be used to pass and return cheap references
into or from subroutines if you don't want to copy the whole
thing. It only works when assigning to dynamic variables, not
lexicals.
    %some_hash = ();            # can't be my()
    *some_hash = fn( \%another_hash );
    sub fn {
        local *hashsym = shift;
        # now use %hashsym normally, and you
        # will affect the caller's %another_hash
        my %nhash = ();         # do what you want
        return \%nhash;
    }
On return, the reference will overwrite the hash slot in the
symbol table specified by the *some_hash typeglob. This
is a somewhat tricky way of passing around references cheaply
when you don't want to have to remember to dereference variables
explicitly.
Another use of symbol tables is for making "constant" scalars.
X<constant> X<scalar, constant>
*PI = \3.14159265358979;
Now you cannot alter C<$PI>, which is probably a good thing all in all.
This isn't the same as a constant subroutine, which is subject to
optimization at compile-time. A constant subroutine is one prototyped
to take no arguments and to return a constant expression. See
L<perlsub> for details on these. The C<use constant> pragma is a
convenient shorthand for these.
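The difference can be sketched as follows. The name C<TAU> is an assumption introduced for the example; the read-only scalar rejects assignment at run time, whereas the C<use constant> form defines a constant subroutine:

```perl
use strict;
use warnings;

our $PI;
*PI = \3.14159265358979;            # $PI now aliases a read-only literal

# Assigning to the alias dies at run time; trap it to show the message.
my $ok = eval { $PI = 3; 1 };
print "assignment refused: $@" unless $ok;

# A constant subroutine, declared through the constant pragma.
use constant TAU => 2 * 3.14159265358979;   # TAU is a hypothetical name
print "PI=$PI TAU=", TAU, "\n";
```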
You can say C<*foo{PACKAGE}> and C<*foo{NAME}> to find out what name and
package the *foo symbol table entry comes from. This may be useful
in a subroutine that gets passed typeglobs as arguments:
    sub identify_typeglob {
        my $glob = shift;
        print 'You gave me ', *{$glob}{PACKAGE},
            '::', *{$glob}{NAME}, "\n";
    }
    identify_typeglob *foo;
    identify_typeglob *bar::baz;
This prints
You gave me main::foo
You gave me bar::baz
The C<*foo{THING}> notation can also be used to obtain references to the
individual elements of *foo. See L<perlref>.
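A brief sketch of the C<*foo{THING}> notation, using illustrative contents for C<foo>. A slot that has never been used, such as the hash here, comes back as C<undef>:

```perl
use strict;
use warnings;

our $foo = 'scalar';
our @foo = (1, 2);
sub foo { 'code' }

my $sref = *foo{SCALAR};
my $aref = *foo{ARRAY};
my $cref = *foo{CODE};
my $href = *foo{HASH};      # undef: %foo was never used

print "$$sref @$aref ", $cref->(), ' ',
    (defined $href ? 'hash' : 'no hash'), "\n";
# prints "scalar 1 2 code no hash"
```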
Subroutine definitions (and declarations, for that matter) need
not necessarily be situated in the package whose symbol table they
occupy. You can define a subroutine outside its package by
explicitly qualifying the name of the subroutine:
package main;
sub Some_package::foo { ... } # &foo defined in Some_package
This is just a shorthand for a typeglob assignment at compile time:
BEGIN { *Some_package::foo = sub { ... } }
and is I<not> the same as writing:
{
package Some_package;
sub foo { ... }
}
In the first two versions, the body of the subroutine is
lexically in the main package, I<not> in Some_package. So
something like this:
package main;
$Some_package::name = "fred";
$main::name = "barney";
sub Some_package::foo {
print "in ", __PACKAGE__, ": \$name is '$name'\n";
}
Some_package::foo();
prints:
in main: $name is 'barney'
rather than:
in Some_package: $name is 'fred'
This also has implications for the use of the SUPER:: qualifier
(see L<perlobj>).
=head2 BEGIN, UNITCHECK, CHECK, INIT and END
X<BEGIN> X<UNITCHECK> X<CHECK> X<INIT> X<END>
Five specially named code blocks are executed at the beginning and at
the end of a running Perl program. These are the C<BEGIN>,
C<UNITCHECK>, C<CHECK>, C<INIT>, and C<END> blocks.
These code blocks can be prefixed with C<sub> to give the appearance of a
subroutine (although this is not considered good style). One should note
that these code blocks don't really exist as named subroutines (despite
their appearance). The thing that gives this away is the fact that you can
have B<more than one> of these code blocks in a program, and they will get
B<all> executed at the appropriate moment. So you can't execute any of
these code blocks by name.
A C<BEGIN> code block is executed as soon as possible, that is, the moment
it is completely defined, even before the rest of the containing file (or
string) is parsed. You may have multiple C<BEGIN> blocks within a file (or
eval'ed string); they will execute in order of definition. Because a C<BEGIN>
code block executes immediately, it can pull in definitions of subroutines
and such from other files in time to be visible to the rest of the compile
and run time. Once a C<BEGIN> has run, it is immediately undefined and any
code it used is returned to Perl's memory pool.
An C<END> code block is executed as late as possible, that is, after
perl has finished running the program and just before the interpreter
is being exited, even if it is exiting as a result of a die() function.
(But not if it's morphing into another program via C<exec>, or
being blown out of the water by a signal--you have to trap that yourself
(if you can).) You may have multiple C<END> blocks within a file--they
will execute in reverse order of definition; that is: last in, first
out (LIFO). C<END> blocks are not executed when you run perl with the
C<-c> switch, or if compilation fails.
Note that C<END> code blocks are B<not> executed at the end of a string
C<eval()>: if any C<END> code blocks are created in a string C<eval()>,
they will be executed just as any other C<END> code block of that package
in LIFO order just before the interpreter is being exited.
Inside an C<END> code block, C<$?> contains the value that the program is
going to pass to C<exit()>. You can modify C<$?> to change the exit
value of the program. Beware of changing C<$?> by accident (e.g. by
running something via C<system>).
X<$?>
Inside of an C<END> block, the value of C<${^GLOBAL_PHASE}> will be
C<"END">.
C<UNITCHECK>, C<CHECK> and C<INIT> code blocks are useful to catch the
transition between the compilation phase and the execution phase of
the main program.
C<UNITCHECK> blocks are run just after the unit which defined them has
been compiled. The main program file and each module it loads are
compilation units, as are string C<eval>s, run-time code compiled using the
C<(?{ })> construct in a regex, calls to C<do FILE>, C<require FILE>,
and code after the C<-e> switch on the command line.
C<BEGIN> and C<UNITCHECK> blocks are not directly related to the phase of
the interpreter. They can be created and executed during any phase.
C<CHECK> code blocks are run just after the B<initial> Perl compile phase ends
and before the run time begins, in LIFO order. C<CHECK> code blocks are used
in the Perl compiler suite to save the compiled state of the program.
Inside of a C<CHECK> block, the value of C<${^GLOBAL_PHASE}> will be
C<"CHECK">.
C<INIT> blocks are run just before the Perl runtime begins execution, in
"first in, first out" (FIFO) order.
Inside of an C<INIT> block, the value of C<${^GLOBAL_PHASE}> will be C<"INIT">.
The C<CHECK> and C<INIT> blocks in code compiled by C<require>, string C<do>,
or string C<eval> will not be executed if they occur after the end of the
main compilation phase; that can be a problem in mod_perl and other persistent
environments which use those functions to load code at runtime.
When you use the B<-n> and B<-p> switches to Perl, C<BEGIN> and
C<END> work just as they do in B<awk>, as a degenerate case.
Both C<BEGIN> and C<CHECK> blocks are run when you use the B<-c>
switch for a compile-only syntax check, although your main code
is not.
The B<begincheck> program makes it all clear, eventually:
    #!/usr/bin/perl

    # begincheck

    print "10. Ordinary code runs at runtime.\n";

    END { print "16. So this is the end of the tale.\n" }
    INIT { print " 7. INIT blocks run FIFO just before runtime.\n" }
    UNITCHECK {
        print " 4. And therefore before any CHECK blocks.\n"
    }
    CHECK { print " 6. So this is the sixth line.\n" }

    print "11. It runs in order, of course.\n";

    BEGIN { print " 1. BEGIN blocks run FIFO during compilation.\n" }
    END { print "15. Read perlmod for the rest of the story.\n" }
    CHECK { print " 5. CHECK blocks run LIFO after all compilation.\n" }
    INIT { print " 8. Run this again, using Perl's -c switch.\n" }

    print "12. This is anti-obfuscated code.\n";

    END { print "14. END blocks run LIFO at quitting time.\n" }
    BEGIN { print " 2. So this line comes out second.\n" }
    UNITCHECK {
        print " 3. UNITCHECK blocks run LIFO after each file is compiled.\n"
    }
    INIT { print " 9. You'll see the difference right away.\n" }

    print "13. It only _looks_ like it should be confusing.\n";

    __END__
=head2 Perl Classes
X<class> X<@ISA>
There is no special class syntax in Perl, but a package may act
as a class if it provides subroutines to act as methods. Such a
package may also derive some of its methods from another class (package)
by listing the other package name(s) in its global @ISA array (which
must be a package global, not a lexical).
For more on this, see L<perlootut> and L<perlobj>.
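A minimal sketch of a package acting as a class, with made-up C<Animal> and C<Dog> packages. C<Dog> inherits C<new>, C<name>, and C<speak> through its package C<@ISA> and overrides only C<sound>:

```perl
use strict;
use warnings;

{
    package Animal;                         # hypothetical base class
    sub new   { my ($class, %args) = @_; return bless { %args }, $class }
    sub name  { return $_[0]{name} }
    sub sound { return 'nothing' }
    sub speak {
        my $self = shift;
        return $self->name . ' says ' . $self->sound;
    }
}
{
    package Dog;                            # hypothetical subclass
    our @ISA = ('Animal');                  # must be a package global
    sub sound { return 'Woof' }
}

my $dog = Dog->new(name => 'Rex');
print $dog->speak, "\n";                    # prints "Rex says Woof"
```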
=head2 Perl Modules
X<module>
A module is just a set of related functions in a library file, i.e.,
a Perl package with the same name as the file. It is specifically
designed to be reusable by other modules or programs. It may do this
by providing a mechanism for exporting some of its symbols into the
symbol table of any package using it, or it may function as a class
definition and make its semantics available implicitly through
method calls on the class and its objects, without explicitly
exporting anything. Or it can do a little of both.
For example, to start a traditional, non-OO module called Some::Module,
create a file called F<Some/Module.pm> and start with this template:
    package Some::Module;  # assumes Some/Module.pm

    use strict;
    use warnings;

    BEGIN {
        require Exporter;

        # set the version for version checking
        our $VERSION = 1.00;

        # Inherit from Exporter to export functions and variables
        our @ISA = qw(Exporter);

        # Functions and variables which are exported by default
        our @EXPORT = qw(func1 func2);

        # Functions and variables which can be optionally exported
        our @EXPORT_OK = qw($Var1 %Hashit func3);
    }

    # exported package globals go here
    our $Var1 = '';
    our %Hashit = ();

    # non-exported package globals go here
    # (they are still accessible as $Some::Module::stuff)
    our @more = ();
    our $stuff = '';

    # file-private lexicals go here, before any functions which use them
    my $priv_var = '';
    my %secret_hash = ();

    # here's a file-private function as a closure,
    # callable as $priv_func->();
    my $priv_func = sub {
        ...
    };

    # make all your functions, whether exported or not;
    # remember to put something interesting in the {} stubs
    sub func1 { ... }
    sub func2 { ... }

    # this one isn't exported, but could be called directly
    # as Some::Module::func3()
    sub func3 { ... }

    END { ... }  # module clean-up code here (global destructor)

    1;  # don't forget to return a true value from the file
Then go on to declare and use your variables in functions without
any qualifications. See L<Exporter> and the L<perlmodlib> for
details on mechanics and style issues in module creation.
Perl modules are included into your program by saying
use Module;
or
use Module LIST;
This is exactly equivalent to
BEGIN { require 'Module.pm'; 'Module'->import; }
or
BEGIN { require 'Module.pm'; 'Module'->import( LIST ); }
As a special case
use Module ();
is exactly equivalent to
BEGIN { require 'Module.pm'; }
All Perl module files have the extension F<.pm>. The C<use> operator
assumes this so you don't have to spell out "F<Module.pm>" in quotes.
This also helps to differentiate new modules from old F<.pl> and
F<.ph> files. Module names are also capitalized unless they're
functioning as pragmas; pragmas are in effect compiler directives,
and are sometimes called "pragmatic modules" (or even "pragmata"
if you're a classicist).
The two statements:
require SomeModule;
require "SomeModule.pm";
differ from each other in two ways. In the first case, any double
colons in the module name, such as C<Some::Module>, are translated
into your system's directory separator, usually "/". The second
case does not, and would have to be specified literally. The other
difference is that seeing the first C<require> clues in the compiler
that uses of indirect object notation involving "SomeModule", as
in C<$ob = purge SomeModule>, are method calls, not function calls.
(Yes, this really can make a difference.)
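When the module name is only known at run time, the translation has to be done by hand before calling C<require> with a string. The following sketch assumes the core module C<List::Util> purely as a convenient example:

```perl
use strict;
use warnings;

my $module = 'List::Util';                  # name known only at run time

# require EXPR does no "::" translation, so build the file name ourselves.
(my $file = "$module.pm") =~ s{::}{/}g;     # "List/Util.pm"
require $file;

print "loaded $file from $INC{$file}\n";
print 'max is ', List::Util::max(3, 9, 4), "\n";   # prints "max is 9"
```

Because nothing is imported, the function is called by its fully qualified name.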
Because the C<use> statement implies a C<BEGIN> block, the importing
of semantics happens as soon as the C<use> statement is compiled,
before the rest of the file is compiled. This is how it is able
to function as a pragma mechanism, and also how modules are able to
declare subroutines that are then visible as list or unary operators for
the rest of the current file. This will not work if you use C<require>
instead of C<use>. With C<require> you can get into this problem:
    require Cwd;                # make Cwd:: accessible
    $here = Cwd::getcwd();

    use Cwd;                    # import names from Cwd::
    $here = getcwd();

    require Cwd;                # make Cwd:: accessible
    $here = getcwd();           # oops! no main::getcwd()
In general, C<use Module ()> is recommended over C<require Module>,
because it determines module availability at compile time, not in the
middle of your program's execution. An exception would be if two modules
each tried to C<use> each other, and each also called a function from
that other module. In that case, it's easy to use C<require> instead.
Perl packages may be nested inside other package names, so we can have
package names containing C<::>. But if we used that package name
directly as a filename it would make for unwieldy or impossible
filenames on some systems. Therefore, if a module's name is, say,
C<Text::Soundex>, then its definition is actually found in the library
file F<Text/Soundex.pm>.
Perl modules always have a F<.pm> file, but there may also be
dynamically linked executables (often ending in F<.so>) or autoloaded
subroutine definitions (often ending in F<.al>) associated with the
module. If so, these will be entirely transparent to the user of
the module. It is the responsibility of the F<.pm> file to load
(or arrange to autoload) any additional functionality. For example,
although the POSIX module happens to do both dynamic loading and
autoloading, the user can say just C<use POSIX> to get it all.
=head2 Making your module threadsafe
X<threadsafe> X<thread safe>
X<module, threadsafe> X<module, thread safe>
X<CLONE> X<CLONE_SKIP> X<thread> X<threads> X<ithread>
Perl supports a type of threads called interpreter threads (ithreads).
These threads can be used explicitly and implicitly.
Ithreads work by cloning the data tree so that no data is shared
between different threads. These threads can be used by using the C<threads>
module or by doing fork() on win32 (fake fork() support). When a
thread is cloned all Perl data is cloned, however non-Perl data cannot
be cloned automatically. Perl after 5.7.2 has support for the C<CLONE>
special subroutine. In C<CLONE> you can do whatever you need to do,
for example handle the cloning of non-Perl data, if necessary.
C<CLONE> will be called once as a class method for every package that has it
defined (or inherits it). It will be called in the context of the new thread,
so all modifications are made in the new area. Currently CLONE is called with
no parameters other than the invocant package name, but code should not assume
that this will remain unchanged, as it is likely that in future extra parameters
will be passed in to give more information about the state of cloning.
If you want to CLONE all objects you will need to keep track of them per
package. This is simply done using a hash and Scalar::Util::weaken().
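One way to keep such a per-package record is sketched below. The C<Tracked> class, its registry layout, and the C<live_count> helper are all assumptions made for the example. The registry holds weak references keyed by address, so it does not keep objects alive, and C<CLONE> re-keys it because addresses differ in a new thread. No thread is started here; C<CLONE> is called by hand only to exercise the code path:

```perl
package Tracked;                    # hypothetical class name
use strict;
use warnings;
use Scalar::Util qw(weaken refaddr);

my %registry;                       # object address => weak reference

sub new {
    my ($class, %args) = @_;
    my $self = bless {%args}, $class;
    weaken( $registry{ refaddr($self) } = $self );
    return $self;
}

sub DESTROY {
    my $self = shift;
    delete $registry{ refaddr($self) };
}

# Perl calls this as a class method inside each newly created thread.
sub CLONE {
    my $class   = shift;
    my @objects = grep { defined } values %registry;   # strong copies for now
    %registry = ();
    for my $obj (@objects) {
        # Re-key by the new address; non-Perl resources held by $obj
        # would be duplicated at this point.
        weaken( $registry{ refaddr($obj) } = $obj );
    }
    return;
}

sub live_count {
    my $count = grep { defined } values %registry;
    return $count;
}

package main;

my $keep = Tracked->new( id => 1 );
{ my $temp = Tracked->new( id => 2 ); }     # destroyed at the end of the block
print Tracked::live_count(), "\n";          # prints "1"
Tracked->CLONE;                             # simulate the per-thread callback
print Tracked::live_count(), "\n";          # prints "1"
```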
Perl after 5.8.7 has support for the C<CLONE_SKIP> special subroutine.
Like C<CLONE>, C<CLONE_SKIP> is called once per package; however, it is
called just before cloning starts, and in the context of the parent
thread. If it returns a true value, then no objects of that class will
be cloned; or rather, they will be copied as unblessed, undef values.
For example: if in the parent there are two references to a single blessed
hash, then in the child there will be two references to a single undefined
scalar value instead.
This provides a simple mechanism for making a module threadsafe; just add
C<sub CLONE_SKIP { 1 }> at the top of the class, and C<DESTROY()> will
now only be called once per object. Of course, if the child thread needs
to make use of the objects, then a more sophisticated approach is
needed.
Like C<CLONE>, C<CLONE_SKIP> is currently called with no parameters other
than the invocant package name, although that may change. Similarly, to
allow for future expansion, the return value should be a single C<0> or
C<1> value.
=head1 SEE ALSO
See L<perlmodlib> for general style issues related to building Perl
modules and classes, as well as descriptions of the standard library
and CPAN, L<Exporter> for how Perl's standard import/export mechanism
works, L<perlootut> and L<perlobj> for in-depth information on
creating classes, L<perlobj> for a hard-core reference document on
objects, L<perlsub> for an explanation of functions and scoping,
and L<perlxstut> and L<perlguts> for more information on writing
extension modules.
If you read this file _as_is_, just ignore the funny characters you see.
It is written in the POD format (see pod/perlpod.pod) which is specially
designed to be readable as is.
=head1 NAME
perlsymbian - Perl version 5 on Symbian OS
=head1 DESCRIPTION
This document describes various features of the Symbian operating
system that will affect how Perl version 5 (hereafter just Perl)
is compiled and/or runs.
B<NOTE: this port (as of 0.4.1) does not compile into a Symbian
OS GUI application, but instead it results in a Symbian DLL.>
The DLL includes a C++ class called CPerlBase, which one can then
(derive from and) use to embed Perl into applications, see F<symbian/README>.
The base port of Perl to Symbian only implements the basic POSIX-like
functionality; it does not implement any further Symbian or Series 60,
Series 80, or UIQ bindings for Perl.
It is also possible to generate Symbian executables for "miniperl"
and "perl", but since there is no standard command line interface
for Symbian (nor full keyboards in the devices), these are useful
mainly as demonstrations.
=head2 Compiling Perl on Symbian
(0) You need to have the appropriate Symbian SDK installed.
These instructions have been tested under various Nokia Series 60
Symbian SDKs (1.2 to 2.6, 2.8 should also work, 1.2 compiles but
does not work), Series 80 2.0, and Nokia 7710 (Series 90) SDK.
You can get the SDKs from Forum Nokia (L<http://www.forum.nokia.com/>).
A very rough port ("it compiles") to UIQ 2.1 has also been made.
A prerequisite for any of the SDKs is to install ActivePerl
from ActiveState, L<http://www.activestate.com/Products/ActivePerl/>
Having the SDK installed also means that you need to have either
the Metrowerks CodeWarrior installed (2.8 and 3.0 were used in testing)
or the Microsoft Visual C++ 6.0 installed (SP3 minimum, SP5 recommended).
Note that for example the Series 60 2.0 VC SDK installation talks
about ActivePerl build 518, which no longer exists (as of mid-2005)
at the ActiveState website. The ActivePerl 5.8.4 build 810 was
used successfully for compiling Perl on Symbian. The 5.6.x ActivePerls
do not work.
Other SDKs or compilers like Visual.NET, command-line-only
Visual.NET, Borland, GnuPoc, or sdk2unix have not been tried.
These instructions almost certainly won't work with older Symbian
releases or other SDKs. Patches to get this port running in other
releases, SDKs, compilers, platforms, or devices are naturally welcome.
(1) Get a Perl source code distribution (for example the file
perl-5.9.2.tar.gz is fine) from L<http://www.cpan.org/src/>
and unpack it in the C:/Symbian directory of your Windows
system.
(2) Change to the perl source directory.
cd c:\Symbian\perl-5.x.x
(3) Run the following script using the perl coming with the SDK
perl symbian\config.pl
You must use the cmd.exe, the Cygwin shell will not work.
The PATH must include the SDK tools, including a Perl,
which should be the case under cmd.exe. If you do not
have that, see the end of symbian\sdk.pl for notes on
how your environment should be set up for Symbian compiles.
(4) Build the project, either by
make all
in cmd.exe or by using either the Metrowerks CodeWarrior
or the Visual C++ 6.0, or the Visual Studio 8 (the Visual C++
2005 Express Edition works fine).
If you use the VC IDE, you will have to run F<symbian\config.pl>
first using the cmd.exe, and then run 'make win.mf vc6.mf' to generate
the VC6 makefiles and workspaces. "make vc6" will compile for the VC6,
and "make cw" for the CodeWarrior.
The following SDK and compiler configurations and Nokia phones were
tested at some point in time (+ = compiled and PerlApp run, - = not),
both for Perl 5.8.x and 5.9.x:
    SDK     | VC | CW |
    --------+----+----+---
    S60 1.2 | +  | +  | 3650 (*)
    S60 2.0 | +  | +  | 6600
    S60 2.1 | -  | +  | 6670
    S60 2.6 | +  | +  | 6630
    S60 2.8 | +  | +  | (not tested in a device)
    S80 2.6 | -  | +  | 9300
    S90 1.1 | +  | -  | 7710
    UIQ 2.1 | -  | +  | (not tested in a device)
(*) Compiles but does not work, unfortunately, a problem with Symbian.
If you are using the 'make' directly, it is the GNU make from the SDKs,
and it will invoke the right make commands for the Windows emulator
build and the Arm target builds ('thumb' by default) as necessary.
The build scripts assume the 'absolute style' SDK installs under C:,
the 'subst style' will not work.
If using the VC IDE, to build use for example the File->Open Workspace->
C:\Symbian\8.0a\S60_2nd_FP2\epoc32\build\symbian\perl\perl\wins\perl.dsw
The emulator binaries will appear in the same directory.
If using the VC IDE, you will see a lot of warnings in the beginning of
the build because a lot of headers mentioned by the source cannot
be found, but this is not serious since those headers are not used.
The Metrowerks will give a lot of warnings about unused variables and
empty declarations, you can ignore those.
When the Windows and Arm DLLs are built, do not be scared by the very long
messages whizzing by: it is the "export freeze" phase where the whole
(rather large) API of Perl is listed.
Once the build is completed you need to create the DLL SIS file by
make perldll.sis
which will create the file perlXYZ.sis (the XYZ being the Perl version)
which you can then install into your Symbian device: an easy way
to do this is to send them via Bluetooth or infrared and just open
the messages.
Since the total size of all Perl SIS files once installed is
over 2 MB, it is recommended to do the installation into a
memory card (drive E:) instead of the C: drive.
The size of the perlXYZ.SIS is about 370 kB but once it is in the
device it is about 750 kB (according to the application manager).
The perlXYZ.sis includes only the Perl DLL: to create an additional
SIS file which includes some of the standard (pure) Perl libraries,
issue the command
make perllib.sis
Some of the standard Perl libraries are included, but not all:
see L</HISTORY> or F<symbian\install.cfg> for more details
(250 kB -> 700 kB).
Some of the standard Perl XS extensions (see L</HISTORY>) are
also available:
make perlext.sis
which will create perlXYZext.sis (290 kB -> 770 kB).
To compile the demonstration application PerlApp you need first to
install the Perl headers under the SDK.
To install the Perl headers and the class CPerlBase documentation
so that you no longer need the Perl sources around to compile Perl
applications using the SDK:
make sdkinstall
The destination directory is C:\Symbian\perl\X.Y.Z. For more
details, see F<symbian\PerlBase.pod>.
Once the headers have been installed, you can create a SIS for
the PerlApp:
make perlapp.sis
The perlapp.sis (11 kB -> 16 kB) will be built in the symbian
subdirectory, but a copy will also be made to the main directory.
If you want to package the Perl DLLs (one for WINS, one for ARMI),
the headers, and the documentation:
make perlsdk.zip
which will create perlXYZsdk.zip that can be used in another
Windows system with the SDK, without having to compile Perl in
that system.
If you want to package the PerlApp sources:
make perlapp.zip
If you want to package the perl.exe and miniperl.exe, you
can use the perlexe.sis and miniperlexe.sis make targets.
You also probably want the perllib.sis for the libraries
and maybe even the perlapp.sis for the recognizer.
The make target 'allsis' combines all the above SIS targets.
To clean up after compilation you can use either of
make clean
make distclean
depending on how clean you want to be.
=head2 Compilation problems
If you see right after "make" this
cat makefile.sh >makefile
'cat' is not recognized as an internal or external command,
operable program or batch file.
it means you need to (re)run the F<symbian\config.pl>.
If you get the error
'perl' is not recognized as an internal or external command,
operable program or batch file.
you may need to reinstall the ActivePerl.
If you see this
ren makedef.pl nomakedef.pl
The system cannot find the file specified.
C:\Symbian\...\make.exe: [rename_makedef] Error 1 (ignored)
please ignore it since it is nothing serious (the build process
renames the Perl makedef.pl as nomakedef.pl to avoid confusing it
with a makedef.pl of the SDK).
=head2 PerlApp
The PerlApp application demonstrates how to embed Perl interpreters
into a Symbian application. The "Time" menu item runs the following
Perl code: C<print "Running in ", $^O, "\n", scalar localtime>,
the "Oneliner" allows one to type in Perl code, and the "Run"
opens a file chooser for selecting a Perl file to run.
The PerlApp also is started when the "Perl recognizer" (also included
and installed) detects a Perl file being activated through the GUI,
and offers either to install it under \Perl (if the Perl file is in
the inbox of the messaging application) or to run it (if the Perl file
is under \Perl).
=head2 sisify.pl
In the symbian subdirectory there is a F<sisify.pl> utility which can be used
to package Perl scripts and/or Perl library directories into SIS files,
which can be installed to the device. To run the sisify.pl utility,
you will need to have the 'makesis' and 'uidcrc' utilities already
installed. If you don't have the Win32 SDKs, you may try for example
L<http://gnupoc.sourceforge.net/> or L<http://symbianos.org/~andreh/>.
=head2 Using Perl in Symbian
First of all note that you have full access to the Symbian device
when using Perl: you can do a lot of damage to your device (like
removing system files) unless you are careful. Please do take
backups before doing anything.
The Perl port has been done for the most part using the Symbian
standard POSIX-ish STDLIB library. It is a reasonably complete
library, but certain corners of such emulation libraries that tend
to be left unimplemented on non-UNIX platforms have been left
unimplemented also this time: fork(), signals, user/group ids,
select() working for sockets, non-blocking sockets, and so forth.
See the file F<symbian/config.sh> and look for 'undef' to find the
unsupported APIs (or from Perl use Config).
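The C<use Config> route mentioned above can be sketched as follows. This is a minimal illustration, not part of the port: the helper name C<missing_features> and the particular Configure symbols probed are assumptions chosen for the example.

```perl
use strict;
use warnings;
use Config;

# Return those Config keys (from the arguments) that are not set to 'define',
# i.e. capabilities that Configure recorded as unavailable on this build.
sub missing_features {
    my @keys = @_;
    return grep { !defined $Config{$_} || $Config{$_} ne 'define' } @keys;
}

# Illustrative selection of Configure symbols; adjust to what your script needs.
my @probe   = qw(d_fork d_select d_getpwent d_sigaction);
my @missing = missing_features(@probe);
print @missing ? "Not available: @missing\n" : "All probed features available\n";
```

A script can call such a helper once at startup and fall back to a simpler code path when something it needs is reported missing.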
The filesystem of Symbian devices uses DOSish syntax, "drives"
separated from paths by a colon, and backslashes for the path. The
exact assignment of the drives probably varies between platforms, but
for example in Series 60 you might see C: as the (flash) main memory,
D: as the RAM drive, E: as the memory card (MMC), Z: as the ROM. In
Series 80 D: is the memory card. As far as the devices go, the NUL: is
the bit bucket, the COMx: are the serial lines, IRCOMx: are the IR
ports, TMP: might be C:\System\Temp. Remember to double those
backslashes in doublequoted strings.
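As a small sketch of the quoting rule just mentioned: the same path written with single quotes, with double quotes, and built from components. The helper name C<symbian_path> is hypothetical and only joins strings; it does not touch the filesystem.

```perl
use strict;
use warnings;

# Build a DOSish path from a drive letter and path components.
# (Hypothetical helper for illustration only.)
sub symbian_path {
    my ($drive, @parts) = @_;
    return join '\\', "$drive:", @parts;
}

my $single = 'C:\System\Temp';      # single quotes: these backslashes are literal
my $double = "C:\\System\\Temp";    # double quotes: each backslash must be doubled

print "same path\n"
    if $single eq $double && $single eq symbian_path('C', 'System', 'Temp');
```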
The Perl DLL is installed in \System\Libs\. The Perl libraries and
extension DLLs are installed in \System\Libs\Perl\X.Y.Z\. The PerlApp
is installed in \System\Apps\, and the SIS also installs a couple of
demo scripts in \Perl\ (C:\Mydocs\Perl\ on Nokia 7710).
Note that the Symbian filesystem is very picky: it strongly prefers
the \ instead of the /.
When doing XS / Symbian C++ programming include first the Symbian
headers, then any standard C/POSIX headers, then Perl headers, and finally
any application headers.
New() and Copy() are unfortunately used by both Symbian and Perl code
so you'll have to play cpp games if you need them. PerlBase.h undefines
the Perl definitions and redefines them as PerlNew() and PerlCopy().
=head1 TO DO
Lots. See F<symbian/TODO>.
=head1 WARNING
As of Perl Symbian port version 0.4.1, no part of Perl's standard
regression test suite has been run on a real Symbian device using
the ported Perl, so innumerable bugs may lie in wait. Therefore there
is absolutely no warranty.
=head1 NOTE
When creating and extending application programming interfaces (APIs)
for Symbian or Series 60 or Series 80 or Series 90 it is suggested
that trademarks, registered trademarks, or trade names are not used in
the API names. Instead, developers should consider basing the API
naming on the existing (C++, or maybe Java) public component and API
naming, modified as appropriate by the rules of the programming
language the new APIs are for.
Nokia is a registered trademark of Nokia Corporation. Nokia's product
names are trademarks or registered trademarks of Nokia. Other product
and company names mentioned herein may be trademarks or trade names of
their respective owners.
=head1 AUTHOR
Jarkko Hietaniemi
=head1 COPYRIGHT
Copyright (c) 2004-2005 Nokia. All rights reserved.
Copyright (c) 2006-2007 Jarkko Hietaniemi.
=head1 LICENSE
The Symbian port is licensed under the same terms as Perl itself.
=head1 HISTORY
=over 4
=item *
0.1.0: April 2005
(This will show as "0.01" in the Symbian Installer.)
- The console window is a very simple console indeed: one can
get the newline with "000" and the "C" button is a backspace.
Do not expect a terminal capable of vt100 or ANSI sequences.
The console is also "ASCII", you cannot input e.g. any accented
letters. Because of obvious physical constraints the console is
also very small: (in Nokia 6600) 22 columns, 17 rows.
- The following libraries are available:
AnyDBM_File AutoLoader base Carp Config Cwd constant
DynaLoader Exporter File::Spec integer lib strict Symbol
vars warnings XSLoader
- The following extensions are available:
attributes Compress::Zlib Cwd Data::Dumper Devel::Peek
Digest::MD5 DynaLoader Fcntl File::Glob Filter::Util::Call
IO List::Util MIME::Base64
PerlIO::scalar PerlIO::via SDBM_File Socket Storable Time::HiRes
- The following extensions are missing for various technical
reasons:
B ByteLoader Devel::DProf Devel::PPPort Encode GDBM_File
I18N::Langinfo IPC::SysV NDBM_File Opcode PerlIO::encoding POSIX
re Safe Sys::Hostname Sys::Syslog
threads threads::shared Unicode::Normalize
- Using MakeMaker or the Module::* to build and install modules
is not supported.
- Building XS other than the ones in the core is not supported.
Since this is a 0.something release, any future releases are almost
guaranteed to be binary incompatible. As a sign of this the Symbian
symbol exports are kept unfrozen and the .def files fully rebuilt
every time.
=item *
0.2.0: October 2005
- Perl 5.9.3 (patch level 25741)
- Compress::Zlib and IO::Zlib supported
- sisify.pl added
We maintain the binary incompatibility.
=item *
0.3.0: October 2005
- Perl 5.9.3 (patch level 25911)
- Series 80 2.0 and UIQ 2.1 support
We maintain the binary incompatibility.
=item *
0.4.0: November 2005
- Perl 5.9.3 (patch level 26052)
- adding a sample Symbian extension
We maintain the binary incompatibility.
=item *
0.4.1: December 2006
- Perl 5.9.5-to-be (patch level 30002)
- added extensions: Compress/Raw/Zlib, Digest/SHA,
Hash/Util, Math/BigInt/FastCalc, Text/Soundex, Time/Piece
- port to S90 1.1 by alexander smishlajev
We maintain the binary incompatibility.
=item *
0.4.2: March 2007
- catchup with Perl 5.9.5-to-be (patch level 30812)
- tested to build with Microsoft Visual C++ 2005 Express Edition
(which uses Microsoft Visual C 8, instead of the old VC6),
SDK used for testing S60_2nd_FP3 aka 8.1a
We maintain the binary incompatibility.
=back
=cut
=head1 NAME
perlstyle - Perl style guide
=head1 DESCRIPTION
Each programmer will, of course, have his or her own preferences
with regard to formatting, but there are some general guidelines that will
make your programs easier to read, understand, and maintain.
The most important thing is to run your programs under the B<-w>
flag at all times. You may turn it off explicitly for particular
portions of code via the C<no warnings> pragma or the C<$^W> variable
if you must. You should also always run under C<use strict> or know the
reason why not. The C<use sigtrap> and even C<use diagnostics> pragmas
may also prove useful.
Regarding aesthetics of code layout, about the only thing Larry
cares strongly about is that the closing curly bracket of
a multi-line BLOCK should line up with the keyword that started the construct.
Beyond that, he has other preferences that aren't so strong:
=over 4
=item *
4-column indent.
=item *
Opening curly on same line as keyword, if possible, otherwise line up.
=item *
Space before the opening curly of a multi-line BLOCK.
=item *
One-line BLOCK may be put on one line, including curlies.
=item *
No space before the semicolon.
=item *
Semicolon omitted in "short" one-line BLOCK.
=item *
Space around most operators.
=item *
Space around a "complex" subscript (inside brackets).
=item *
Blank lines between chunks that do different things.
=item *
Uncuddled elses.
=item *
No space between function name and its opening parenthesis.
=item *
Space after each comma.
=item *
Long lines broken after an operator (except C<and> and C<or>).
=item *
Space after last parenthesis matching on current line.
=item *
Line up corresponding items vertically.
=item *
Omit redundant punctuation as long as clarity doesn't suffer.
=back
Larry has his reasons for each of these things, but he doesn't claim that
everyone else's mind works the same as his does.
Here are some other more substantive style issues to think about:
=over 4
=item *
Just because you I<CAN> do something a particular way doesn't mean that
you I<SHOULD> do it that way. Perl is designed to give you several
ways to do anything, so consider picking the most readable one. For
instance
open(FOO,$foo) || die "Can't open $foo: $!";
is better than
die "Can't open $foo: $!" unless open(FOO,$foo);
because the second way hides the main point of the statement in a
modifier. On the other hand
print "Starting analysis\n" if $verbose;
is better than
$verbose && print "Starting analysis\n";
because the main point isn't whether the user typed B<-v> or not.
Similarly, just because an operator lets you assume default arguments
doesn't mean that you have to make use of the defaults. The defaults
are there for lazy systems programmers writing one-shot programs. If
you want your program to be readable, consider supplying the argument.
Along the same lines, just because you I<CAN> omit parentheses in many
places doesn't mean that you ought to:
return print reverse sort num values %array;
return print(reverse(sort num (values(%array))));
When in doubt, parenthesize. At the very least it will let some poor
schmuck bounce on the % key in B<vi>.
Even if you aren't in doubt, consider the mental welfare of the person
who has to maintain the code after you, and who will probably put
parentheses in the wrong place.
=item *
Don't go through silly contortions to exit a loop at the top or the
bottom, when Perl provides the C<last> operator so you can exit in
the middle. Just "outdent" it a little to make it more visible:
    LINE:
        for (;;) {
            statements;
          last LINE if $foo;
            next LINE if /^#/;
            statements;
        }
=item *
Don't be afraid to use loop labels--they're there to enhance
readability as well as to allow multilevel loop breaks. See the
previous example.
=item *
Avoid using C<grep()> (or C<map()>) or `backticks` in a void context, that is,
when you just throw away their return values. Those functions all
have return values, so use them. Otherwise use a C<foreach()> loop or
the C<system()> function instead.
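A minimal sketch of the distinction, using made-up data: a C<foreach> loop where only the side effect matters, and C<map>/C<grep> where the returned list is actually kept. The sub name C<long_names> is illustrative.

```perl
use strict;
use warnings;

my @names = qw(alpha beta gamma);

# Discouraged: map used only for its side effect, result thrown away.
#   map { print "$_\n" } @names;

# Preferred for side effects: a plain foreach loop.
foreach my $name (@names) {
    print "$name\n";
}

# Preferred use of map and grep: keep the list they return.
sub long_names {
    my ($min, @words) = @_;
    return map { uc $_ } grep { length($_) >= $min } @words;
}

print join(',', long_names(5, @names)), "\n";
```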
=item *
For portability, when using features that may not be implemented on
every machine, test the construct in an eval to see if it fails. If
you know in which version or patchlevel a particular feature was
implemented, you can test C<$]> (C<$PERL_VERSION> in C<English>) to see if it
will be there. The C<Config> module will also let you interrogate values
determined by the B<Configure> program when Perl was installed.
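The three checks described here might look like the following sketch. The helper name C<construct_works> is an assumption made for this example; the C<symlink('', '')> probe is the idiom shown in L<perlfunc>, and C<d_fork> is one of the symbols recorded by B<Configure>.

```perl
use strict;
use warnings;
use Config;

# Run a code reference inside eval; true if it completed without dying.
sub construct_works {
    my ($probe) = @_;
    return eval { $probe->(); 1 } ? 1 : 0;
}

# symlink() dies on platforms that do not implement it, so probe it in an eval.
my $has_symlink = construct_works(sub { symlink('', '') });

# $] is the numeric version, e.g. 5.008008 for 5.8.8.
my $has_defined_or = $] >= 5.010 ? 1 : 0;

# %Config reflects what Configure determined at build time.
my $has_fork = (defined $Config{d_fork} && $Config{d_fork} eq 'define') ? 1 : 0;

print "symlink=$has_symlink defined-or=$has_defined_or fork=$has_fork\n";
```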
=item *
Choose mnemonic identifiers. If you can't remember what mnemonic means,
you've got a problem.
=item *
While short identifiers like C<$gotit> are probably ok, use underscores to
separate words in longer identifiers. It is generally easier to read
C<$var_names_like_this> than C<$VarNamesLikeThis>, especially for
non-native speakers of English. It's also a simple rule that works
consistently with C<VAR_NAMES_LIKE_THIS>.
Package names are sometimes an exception to this rule. Perl informally
reserves lowercase module names for "pragma" modules like C<integer> and
C<strict>. Other modules should begin with a capital letter and use mixed
case, but probably without underscores due to limitations in primitive
file systems' representations of module names as files that must fit into a
few sparse bytes.
=item *
You may find it helpful to use letter case to indicate the scope
or nature of a variable. For example:
    $ALL_CAPS_HERE   constants only (beware clashes with perl vars!)
    $Some_Caps_Here  package-wide global/static
    $no_caps_here    function scope my() or local() variables
Function and method names seem to work best as all lowercase.
E.g., C<$obj-E<gt>as_string()>.
You can use a leading underscore to indicate that a variable or
function should not be used outside the package that defined it.
=item *
If you have a really hairy regular expression, use the C</x> or C</xx>
modifiers and put in some whitespace to make it look a little less like
line noise.
Don't use slash as a delimiter when your regexp has slashes or backslashes.
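Both points can be illustrated with a short sketch. The pattern and the sub names (C<parse_iso_date>, C<is_under_usr_local>) are invented for this example.

```perl
use strict;
use warnings;

# A date pattern laid out with /x so each piece can carry a comment.
my $iso_date = qr{
    \A
    (\d{4}) -    # year
    (\d{2}) -    # month
    (\d{2})      # day
    \z
}x;

sub parse_iso_date {
    my ($text) = @_;
    return $text =~ $iso_date ? ($1, $2, $3) : ();
}

# Braces as delimiters avoid escaping every slash in a path pattern.
sub is_under_usr_local {
    my ($path) = @_;
    return $path =~ m{^/usr/local/} ? 1 : 0;
}

print join('/', parse_iso_date('2007-03-15')), "\n";
```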
=item *
Use the new C<and> and C<or> operators to avoid having to parenthesize
list operators so much, and to reduce the incidence of punctuation
operators like C<&&> and C<||>. Call your subroutines as if they were
functions or list operators to avoid excessive ampersands and parentheses.
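A minimal sketch of why the low-precedence operators save parentheses; the sub name C<first_line_of> is made up for the example.

```perl
use strict;
use warnings;

# With the low-precedence "or", the whole open(...) argument list is
# evaluated first, so no parentheses are needed around it.
sub first_line_of {
    my ($path) = @_;
    open my $fh, '<', $path or return;
    my $line = <$fh>;
    close $fh;
    return $line;
}

# Had "||" been used instead, it would bind to $path alone:
#   open my $fh, '<', $path || return;
#   # parsed as: open(my $fh, '<', ($path || return))

my $line = first_line_of($0);
print defined $line ? "read a line\n" : "could not read\n";
```

Note that C<first_line_of> is also called as a plain function, without a leading ampersand.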
=item *
Use here documents instead of repeated C<print()> statements.
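For instance, a usage message can be kept in one here document rather than one C<print()> per line. The sub name and the option text below are purely illustrative.

```perl
use strict;
use warnings;

# One here document instead of a print per line; variables interpolate
# because the terminator is written in double quotes.
sub usage_text {
    my ($program) = @_;
    return <<"END_USAGE";
Usage: $program [-v] FILE
  -v    print progress messages
  FILE  input file to analyse
END_USAGE
}

print usage_text('analyse');
```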
=item *
Line up corresponding things vertically, especially if it'd be too long
to fit on one line anyway.
    $IDX = $ST_MTIME;
    $IDX = $ST_ATIME       if $opt_u;
    $IDX = $ST_CTIME       if $opt_c;
    $IDX = $ST_SIZE        if $opt_s;
    mkdir $tmpdir, 0700 or die "can't mkdir $tmpdir: $!";
    chdir($tmpdir)      or die "can't chdir $tmpdir: $!";
    mkdir 'tmp',   0777 or die "can't mkdir $tmpdir/tmp: $!";
=item *
Always check the return codes of system calls. Good error messages should
go to C<STDERR>, include which program caused the problem, what the failed
system call and arguments were, and (VERY IMPORTANT) should contain the
standard system error message for what went wrong. Here's a simple but
sufficient example:
opendir(D, $dir) or die "can't opendir $dir: $!";
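A slightly fuller sketch that also names the program, as the paragraph above asks. The helper C<failure_message> and the directory path are hypothetical; the path is simply assumed not to exist.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Compose a message naming the program, the failed call, its argument,
# and the system error text.
sub failure_message {
    my ($call, $arg, $oserr) = @_;
    return sprintf "%s: can't %s %s: %s\n", basename($0), $call, $arg, $oserr;
}

my $dir = '/no/such/directory';    # hypothetical path, assumed to be missing
opendir(my $dh, $dir)
    or print STDERR failure_message('opendir', $dir, "$!");
```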
=item *
Line up your transliterations when it makes sense:
    tr [abc]
       [xyz];
=item *
Think about reusability. Why waste brainpower on a one-shot when you
might want to do something like it again? Consider generalizing your
code. Consider writing a module or object class. Consider making your
code run cleanly with C<use strict> and C<use warnings> (or B<-w>) in
effect. Consider giving away your code. Consider changing your whole
world view. Consider... oh, never mind.
=item *
Try to document your code and use Pod formatting in a consistent way. Here
are commonly expected conventions:
=over 4
=item *
use C<CE<lt>E<gt>> for function, variable and module names (and more
generally anything that can be considered part of code, like filehandles
or specific values). Note that function names are considered more readable
with parentheses after their name, that is C<function()>.
=item *
use C<BE<lt>E<gt>> for command names like B<cat> or B<grep>.
=item *
use C<FE<lt>E<gt>> or C<CE<lt>E<gt>> for file names. C<FE<lt>E<gt>> should
be the only Pod code for file names, but as most Pod formatters render it
as italic, Unix and Windows paths with their slashes and backslashes may
be less readable, and better rendered with C<CE<lt>E<gt>>.
=back
=item *
Be consistent.
=item *
Be nice.
=back
=encoding utf8
=head1 NAME
perl5161delta - what is new for perl v5.16.1
=head1 DESCRIPTION
This document describes differences between the 5.16.0 release and
the 5.16.1 release.
If you are upgrading from an earlier release such as 5.14.0, first read
L<perl5160delta>, which describes differences between 5.14.0 and
5.16.0.
=head1 Security
=head2 an off-by-two error in Scalar-List-Util has been fixed
The bugfix was in Scalar-List-Util 1.23_04, and perl 5.16.1 includes
Scalar-List-Util 1.25.
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.16.0. If any
exist, they are bugs, and we request that you submit a report. See
L</Reporting Bugs> below.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<Scalar::Util> and L<List::Util> have been upgraded from version 1.23 to
version 1.25.
=item *
L<B::Deparse> has been updated from version 1.14 to 1.14_01. An
"uninitialized" warning emitted by B::Deparse has been squashed
[perl #113464].
=back
=head1 Configuration and Compilation
=over
=item *
Building perl with some Windows compilers used to fail due to a problem
with miniperl's C<glob> operator (which uses the C<perlglob> program)
deleting the PATH environment variable [perl #113798].
=back
=head1 Platform Support
=head2 Platform-Specific Notes
=over 4
=item VMS
All C header files from the top-level directory of the distribution are now
installed on VMS, providing consistency with a long-standing practice on other
platforms. Previously only a subset were installed, which broke non-core extension
builds for extensions that depended on the missing include files.
=back
=head1 Selected Bug Fixes
=over 4
=item *
A regression introduced in Perl v5.16.0 involving
C<tr/I<SEARCHLIST>/I<REPLACEMENTLIST>/> has been fixed. Only the first
instance is supposed to be meaningful if a character appears more than
once in C<I<SEARCHLIST>>. Under some circumstances, the final instance
was overriding all earlier ones. [perl #113584]
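The documented behaviour can be checked with a sketch like this one, in which C<a> appears twice in the search list. The sub name C<translate_dup> and the sample word are made up for the example.

```perl
use strict;
use warnings;

# 'a' appears twice in the search list; only its first mapping (a -> x)
# is supposed to be meaningful.
sub translate_dup {
    my ($text) = @_;
    (my $copy = $text) =~ tr/aa/xy/;
    return $copy;
}

print translate_dup('banana'), "\n";
```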
=item *
C<B::COP::stashlen> has been added. This provides access to an internal
field added in perl 5.16 under threaded builds. It was broken at the last
minute before 5.16 was released [perl #113034].
=item *
The L<re> pragma will no longer clobber C<$_>. [perl #113750]
=item *
Unicode 6.1 published an incorrect alias for one of the
Canonical_Combining_Class property's values (which range between 0 and
254). The alias C<CCC133> should have been C<CCC132>. Perl now
overrides the data file furnished by Unicode to give the correct value.
=item *
Duplicating scalar filehandles works again. [perl #113764]
=item *
Under threaded perls, a runtime code block in a regular expression could
corrupt the package name stored in the op tree, resulting in bad reads
in C<caller>, and possibly crashes [perl #113060].
=item *
For efficiency's sake, many operators and built-in functions return the
same scalar each time. Lvalue subroutines and subroutines in the CORE::
namespace were allowing this implementation detail to leak through.
C<print &CORE::uc("a"), &CORE::uc("b")> used to print "BB". The same thing
would happen with an lvalue subroutine returning the return value of C<uc>.
Now the value is copied in such cases [perl #113044].
=item *
C<__SUB__> now works in special blocks (C<BEGIN>, C<END>, etc.).
=item *
Formats that reference lexical variables from outside no longer result
in crashes.
=back
=head1 Known Problems
There are no new known problems, but consult L<perl5160delta/Known
Problems> to see those identified in the 5.16.0 release.
=head1 Acknowledgements
Perl 5.16.1 represents approximately 2 months of development since Perl
5.16.0 and contains approximately 14,000 lines of changes across 96
files from 8 authors.
Perl continues to flourish into its third decade thanks to a vibrant
community of users and developers. The following people are known to
have contributed the improvements that became Perl 5.16.1:
Chris 'BinGOs' Williams, Craig A. Berry, Father Chrysostomos, Karl
Williamson, Paul Johnson, Reini Urban, Ricardo Signes, Tony Cook.
The list above is almost certainly incomplete as it is automatically
generated from version control history. In particular, it does not
include the names of the (very much appreciated) contributors who
reported issues to the Perl bug tracker.
Many of the changes included in this version originated in the CPAN
modules included in Perl's core. We're grateful to the entire CPAN
community for helping Perl to flourish.
For a more complete list of all of Perl's historical contributors,
please see the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://rt.perl.org/perlbug/ . There may also be
information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please
send it to perl5-security-report@perl.org. This points to a closed
subscription unarchived mailing list, which includes all the core
committers, who will be able to help assess the impact of issues, figure
out a resolution, and help co-ordinate the release of patches to
mitigate or fix the problem across all platforms on which Perl is
supported. Please only use this address for security issues in the Perl
core, not for modules independently distributed on CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details
on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perlexperiment - A listing of experimental features in Perl
=head1 DESCRIPTION
This document lists the current and past experimental features in the perl
core. Although all of these are documented with their appropriate topics,
this succinct listing gives you an overview and basic facts about their
status.
So far we've merely tried to find and list the experimental features and infer
their inception, versions, etc. There's a lot of speculation here.
=head2 Current experiments
=over 8
=item C<our> can now have an experimental optional attribute C<unique>
Introduced in Perl 5.8.0
Deprecated in Perl 5.10.0
The ticket for this feature is
L<[perl #119313]|https://rt.perl.org/rt3/Ticket/Display.html?id=119313>.
=item Smart match (C<~~>)
Introduced in Perl 5.10.0
Modified in Perl 5.10.1, 5.12.0
Using this feature triggers warnings in the category
C<experimental::smartmatch>.
The ticket for this feature is
L<[perl #119317]|https://rt.perl.org/rt3/Ticket/Display.html?id=119317>.
=item Pluggable keywords
The ticket for this feature is
L<[perl #119455]|https://rt.perl.org/rt3/Ticket/Display.html?id=119455>.
See L<perlapi/PL_keyword_plugin> for the mechanism.
Introduced in Perl 5.11.2
=item Regular Expression Set Operations
Introduced in Perl 5.18
The ticket for this feature is
L<[perl #119451]|https://rt.perl.org/rt3/Ticket/Display.html?id=119451>.
See also: L<perlrecharclass/Extended Bracketed Character Classes>
Using this feature triggers warnings in the category
C<experimental::regex_sets>.
=item Subroutine signatures
Introduced in Perl 5.20.0
Using this feature triggers warnings in the category
C<experimental::signatures>.
The ticket for this feature is
L<[perl #121481]|https://rt.perl.org/Ticket/Display.html?id=121481>.
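A minimal sketch of enabling the feature and silencing the warning category named above, assuming Perl 5.20 or later. The sub C<add> is illustrative.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';    # must come after "use warnings"

# Two parameters, the second with a default value.
sub add ($x, $y = 1) {
    return $x + $y;
}

print add(2, 3), " ", add(2), "\n";
```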
=item Aliasing via reference
Introduced in Perl 5.22.0
Using this feature triggers warnings in the category
C<experimental::refaliasing>.
The ticket for this feature is
L<[perl #122947]|https://rt.perl.org/rt3/Ticket/Display.html?id=122947>.
See also: L<perlref/Assigning to References>
=item The "const" attribute
Introduced in Perl 5.22.0
Using this feature triggers warnings in the category
C<experimental::const_attr>.
The ticket for this feature is
L<[perl #123630]|https://rt.perl.org/rt3/Ticket/Display.html?id=123630>.
See also: L<perlsub/Constant Functions>
=item use re 'strict';
Introduced in Perl 5.22.0
Using this feature triggers warnings in the category
C<experimental::re_strict>.
See L<re/'strict' mode>
=item String- and number-specific bitwise operators
Introduced in Perl 5.22.0
See also: L<perlop/Bitwise String Operators>
Using this feature triggers warnings in the category
C<experimental::bitwise>.
The ticket for this feature is
L<[perl #123707]|https://rt.perl.org/rt3/Ticket/Display.html?id=123707>.
=item The <:win32> IO pseudolayer
The ticket for this feature is
L<[perl #119453]|https://rt.perl.org/rt3/Ticket/Display.html?id=119453>.
See also L<perlrun>
=item Declaring a reference to a variable
Introduced in Perl 5.26.0
Using this feature triggers warnings in the category
C<experimental::declared_refs>.
The ticket for this feature is
L<[perl #128654]|https://rt.perl.org/rt3/Ticket/Display.html?id=128654>.
See also: L<perlref/Declaring a Reference to a Variable>
=item There is an C<installhtml> target in the Makefile.
The ticket for this feature is
L<[perl #116487]|https://rt.perl.org/rt3/Ticket/Display.html?id=116487>.
=item Unicode in Perl on EBCDIC
=back
=head2 Accepted features
These features were so wildly successful and played so well with others that
we decided to remove their experimental status and admit them as full, stable
features in the world of Perl, lavishing all the benefits and luxuries thereof.
They are also awarded +5 Stability and +3 Charisma.
=over 8
=item 64-bit support
Introduced in Perl 5.005
=item die accepts a reference
Introduced in Perl 5.005
=item DB module
Introduced in Perl 5.6.0
See also L<perldebug>, L<perldebtut>
=item Weak references
Introduced in Perl 5.6.0
=item Internal file glob
Introduced in Perl 5.6.0
=item fork() emulation
Introduced in Perl 5.6.1
See also L<perlfork>
=item -Dusemultiplicity -Duseithreads
Introduced in Perl 5.6.0
Accepted in Perl 5.8.0
=item Support for long doubles
Introduced in Perl 5.6.0
Accepted in Perl 5.8.1
=item The C<\N> regex character class
The C<\N> character class, not to be confused with the named character
sequence C<\N{NAME}>, denotes any non-newline character in a regular
expression.
Introduced in Perl 5.12
Exact version of acceptance unclear, but no later than Perl 5.18.
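As a short sketch, assuming Perl 5.12 or later: C<\N> stops at the first newline even when C</s> lets C<.> match one. The sub name C<first_line> is made up for the example.

```perl
use strict;
use warnings;

# \N matches any character except newline, so \N* stops at the first "\n"
# regardless of the /s modifier.
sub first_line {
    my ($text) = @_;
    return $text =~ /\A(\N*)/s ? $1 : '';
}

print first_line("alpha\nbeta\n"), "\n";
```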
=item C<(?{code})> and C<(??{ code })>
Introduced in Perl 5.6.0
Accepted in Perl 5.20.0
See also L<perlre>
=item Linux abstract Unix domain sockets
Introduced in Perl 5.9.2
Accepted before Perl 5.20.0. The Socket library is now primarily maintained
on CPAN, rather than in the perl core.
See also L<Socket>
=item Lvalue subroutines
Introduced in Perl 5.6.0
Accepted in Perl 5.20.0
See also L<perlsub>
=item Backtracking control verbs
C<(*ACCEPT)>
Introduced in Perl 5.10
Accepted in Perl 5.20.0
=item The <:pop> IO pseudolayer
See also L<perlrun>
Accepted in Perl 5.20.0
=item C<\s> in regexp matches vertical tab
Accepted in Perl 5.22.0
=item Postfix dereference syntax
Introduced in Perl 5.20.0
Accepted in Perl 5.24.0
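A small sketch of the syntax, assuming Perl 5.24 or later so that no C<feature> or warnings incantation is needed. The sub C<total> is illustrative.

```perl
use v5.24;
use warnings;

# $numbers->@* is the postfix spelling of @{$numbers} or @$numbers.
sub total {
    my ($numbers) = @_;
    my $sum = 0;
    $sum += $_ for $numbers->@*;
    return $sum;
}

say total([1, 2, 3]);
```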
=item Lexical subroutines
Introduced in Perl 5.18.0
Accepted in Perl 5.26.0
=back
=head2 Removed features
These features are no longer considered experimental and their functionality
has disappeared. It's your own fault if you wrote production programs using
these features after we explicitly told you not to (see L<perlpolicy>).
=over 8
=item 5.005-style threading
Introduced in Perl 5.005
Removed in Perl 5.10
=item perlcc
Introduced in Perl 5.005
Moved from Perl 5.9.0 to CPAN
=item The pseudo-hash data type
Introduced in Perl 5.6.0
Removed in Perl 5.9.0
=item Getopt::Long options can now take multiple values at once (experimental)
C<Getopt::Long> upgraded to version 2.35
Removed in Perl 5.8.8
=item Assertions
The C<-A> command line switch
Introduced in Perl 5.9.0
Removed in Perl 5.9.5
=item Test::Harness::Straps
Moved from Perl 5.10.1 to CPAN
=item C<legacy>
The experimental C<legacy> pragma was swallowed by the C<feature> pragma.
Introduced in Perl 5.11.2
Removed in Perl 5.11.3
=item Lexical C<$_>
Using this feature triggered warnings in the category
C<experimental::lexical_topic>.
Introduced in Perl 5.10.0
Removed in Perl 5.24.0
=item Array and hash container functions accept references
Using this feature triggered warnings in the category
C<experimental::autoderef>.
Superseded by L</Postfix dereference syntax>.
Introduced in Perl 5.14.0
Removed in Perl 5.24.0
=back
=head1 SEE ALSO
For a complete list of features check L<feature>.
=head1 AUTHORS
brian d foy C<< <brian.d.foy@gmail.com> >>
SE<eacute>bastien Aperghis-Tramoni C<< <saper@cpan.org> >>
=head1 COPYRIGHT
Copyright 2010, brian d foy C<< <brian.d.foy@gmail.com> >>
=head1 LICENSE
You can use and redistribute this document under the same terms as Perl
itself.
=cut
If you read this file _as_is_, just ignore the funny characters you
see. It is written in the POD format (see perlpod manpage) which is
specially designed to be readable as is.
=head1 NAME
perldos - Perl under DOS, W31, W95.
=head1 SYNOPSIS
These are instructions for building Perl under DOS (or w??), using
DJGPP v2.03 or later. Under w95 long filenames are supported.
=head1 DESCRIPTION
Before you start, you should glance through the README file
found in the top-level directory where the Perl distribution
was extracted. Make sure you read and understand the terms under
which this software is being distributed.
This port currently supports MakeMaker (the set of modules that
is used to build extensions to perl). Therefore, you should be
able to build and install most extensions found in the CPAN sites.
Detailed instructions on how to build and install perl extension
modules, including XS-type modules, are included. See
L</BUILDING AND INSTALLING MODULES ON DOS>.
=head2 Prerequisites for Compiling Perl on DOS
=over 4
=item DJGPP
DJGPP is a port of the GNU C/C++ compiler and development tools to a 32-bit,
protected-mode environment on Intel 32-bit CPUs running MS-DOS and compatible
operating systems, by DJ Delorie <dj@delorie.com> and friends.
For more details (FAQ), check out the home of DJGPP at:
http://www.delorie.com/djgpp/
If you have questions about DJGPP, try posting to the DJGPP newsgroup:
comp.os.msdos.djgpp, or use the email gateway djgpp@delorie.com.
You can find the full DJGPP distribution on any of the mirrors listed here:
http://www.delorie.com/djgpp/getting.html
You need the following files to build perl (or add new modules):
v2/djdev203.zip
v2gnu/bnu2112b.zip
v2gnu/gcc2953b.zip
v2gnu/bsh204b.zip
v2gnu/mak3791b.zip
v2gnu/fil40b.zip
v2gnu/sed3028b.zip
v2gnu/txt20b.zip
v2gnu/dif272b.zip
v2gnu/grep24b.zip
v2gnu/shl20jb.zip
v2gnu/gwk306b.zip
v2misc/csdpmi5b.zip
or possibly any newer version.
=item Pthreads
Thread support is not tested in this version of the djgpp perl.
=back
=head2 Shortcomings of Perl under DOS
Perl under DOS lacks some features of perl under UNIX because of
deficiencies in the UNIX-emulation, most notably:
=over 4
=item *
fork() and pipe()
=item *
some features of the UNIX filesystem regarding link count and file dates
=item *
in-place operation is a little bit broken with short filenames
=item *
sockets
=back
=head2 Building Perl on DOS
=over 4
=item *
Unpack the source package F<perl5.8*.tar.gz> with djtarx. If you want
to use long file names under w95 and also to get Perl to pass all its
tests, don't forget to use
set LFN=y
set FNCASE=y
before unpacking the archive.
=item *
Create a "symlink" or copy your bash.exe to sh.exe in your C<($DJDIR)/bin>
directory.
ln -s bash.exe sh.exe
[If you have the recommended version of bash for DJGPP, this is already
done for you.]
And make the C<SHELL> environment variable point to this F<sh.exe>:
set SHELL=c:/djgpp/bin/sh.exe (use full path name!)
You can do this in F<djgpp.env> too. Add this line BEFORE any section
definition:
+SHELL=%DJDIR%/bin/sh.exe
=item *
If you have F<split.exe> and F<gsplit.exe> in your path, then rename
F<split.exe> to F<djsplit.exe>, and F<gsplit.exe> to F<split.exe>.
Copy or link F<gecho.exe> to F<echo.exe> if you don't have F<echo.exe>.
Copy or link F<gawk.exe> to F<awk.exe> if you don't have F<awk.exe>.
[If you have the recommended versions of djdev, shell utilities and
gawk, all these are already done for you, and you will not need to do
anything.]
=item *
Chdir to the djgpp subdirectory of perl toplevel and type the following
commands:
set FNCASE=y
configure.bat
This will do some preprocessing then run the Configure script for you.
The Configure script is interactive, but in most cases you just need to
press ENTER. The "set" command ensures that DJGPP preserves the letter
case of file names when reading directories. If you already issued this
set command when unpacking the archive, and you are in the same DOS
session as when you unpacked the archive, you don't have to issue the
set command again. This command is necessary *before* you start to
(re)configure or (re)build perl in order to ensure both that perl builds
correctly and that building XS-type modules can succeed. See the DJGPP
info entry for "_preserve_fncase" for more information:
info libc alphabetical _preserve_fncase
If the script says that your package is incomplete, and asks whether
to continue, just answer with Y (this can only happen if you don't use
long filenames or forget to issue "set FNCASE=y" first).
When Configure asks about the extensions, I suggest IO and Fcntl,
and if you want database handling then SDBM_File or GDBM_File
(you need to install gdbm for this one). If you want to use the
POSIX extension (this is the default), make sure that the stack
size of your F<cc1.exe> is at least 512 KB (you can check this
with: C<stubedit cc1.exe>).
You can use the Configure script in non-interactive mode too.
When I built my F<perl.exe>, I used something like this:
configure.bat -des
You can find more info about Configure's command line switches in
the F<INSTALL> file.
When the script ends, and you want to change some values in the
generated F<config.sh> file, then run
sh Configure -S
after you made your modifications.
IMPORTANT: if you use this C<-S> switch, be sure to delete the CONFIG
environment variable before running the script:
set CONFIG=
=item *
Now you can compile Perl. Type:
make
=back
=head2 Testing Perl on DOS
Type:
make test
If you're lucky you should see "All tests successful". But there can be
a few failed subtests (fewer than 5, hopefully) depending on some external
conditions (e.g. some subtests fail under linux/dosemu or plain dos
with short filenames only).
=head2 Installation of Perl on DOS
Type:
make install
This will copy the newly compiled perl and libraries into your DJGPP
directory structure. Perl.exe and the utilities go into C<($DJDIR)/bin>,
and the library goes under C<($DJDIR)/lib/perl5>. The pod documentation
goes under C<($DJDIR)/lib/perl5/pod>.
=head1 BUILDING AND INSTALLING MODULES ON DOS
=head2 Building Prerequisites for Perl on DOS
For building and installing non-XS modules, all you need is a working
perl under DJGPP. Non-XS modules do not require re-linking the perl
binary, and so are simpler to build and install.
XS-type modules do require re-linking the perl binary, because part of
an XS module is written in "C", and has to be linked together with the
perl binary to be executed. This is required because perl under DJGPP
is built with the "static link" option, due to the lack of "dynamic
linking" in the DJGPP environment.
Because XS modules require re-linking of the perl binary, you need both
the perl binary distribution and the perl source distribution to build
an XS extension module. In addition, you will have to have built your
perl binary from the source distribution so that all of the components
of the perl binary are available for the required link step.
=head2 Unpacking CPAN Modules on DOS
First, download the module package from CPAN (e.g., the "Comma Separated
Value" text package, Text-CSV-0.01.tar.gz). Then expand the contents of
the package into some location on your disk. Most CPAN modules are
built with an internal directory structure, so it is usually safe to
expand it in the root of your DJGPP installation. Some people prefer to
locate source trees under /usr/src (i.e., C<($DJDIR)/usr/src>), but you may
put it wherever seems most logical to you, *EXCEPT* under the same
directory as your perl source code. There are special rules that apply
to modules which live in the perl source tree that do not apply to most
of the modules in CPAN.
Unlike other DJGPP packages, which are normal "zip" files, most CPAN
module packages are "gzipped tarballs". Recent versions of WinZip will
safely unpack and expand them, *UNLESS* they have zero-length files. It
is a known WinZip bug (as of v7.0) that it will not extract zero-length
files.
From the command line, you can use the djtarx utility provided with DJGPP
to unpack and expand these files. For example:
C:\djgpp>djtarx -v Text-CSV-0.01.tar.gz
This will create the new directory C<($DJDIR)/Text-CSV-0.01>, filling
it with the source for this module.
=head2 Building Non-XS Modules on DOS
To build a non-XS module, you can use the standard module-building
instructions distributed with perl modules.
perl Makefile.PL
make
make test
make install
This is sufficient because non-XS modules install only ".pm" files and
(sometimes) pod and/or man documentation. No re-linking of the perl
binary is needed to build, install or use non-XS modules.
=head2 Building XS Modules on DOS
To build an XS module, you must use the standard module-building
instructions distributed with perl modules *PLUS* three extra
instructions specific to the DJGPP "static link" build environment.
set FNCASE=y
perl Makefile.PL
make
make perl
make test
make -f Makefile.aperl inst_perl MAP_TARGET=perl.exe
make install
The first extra instruction sets DJGPP's FNCASE environment variable so
that the new perl binary which you must build for an XS-type module will
build correctly. The second extra instruction re-builds the perl binary
in your module directory before you run "make test", so that you are
testing with the new module code you built with "make". The third extra
instruction installs the perl binary from your module directory into the
standard DJGPP binary directory, C<($DJDIR)/bin>, replacing your
previous perl binary.
Note that the MAP_TARGET value *must* have the ".exe" extension or you
will not create a "perl.exe" to replace the one in C<($DJDIR)/bin>.
When you are done, the XS-module install process will have added information
to your "perllocal" information telling that the perl binary has been replaced,
and what module was installed. You can view this information at any time
by using the command:
perl -S perldoc perllocal
=head1 AUTHOR
Laszlo Molnar, F<laszlo.molnar@eth.ericsson.se> [Installing/building perl]
Peter J. Farley III F<pjfarley@banet.net> [Building/installing modules]
=head1 SEE ALSO
perl(1).
=cut
=head1 NAME
perl586delta - what is new for perl v5.8.6
=head1 DESCRIPTION
This document describes differences between the 5.8.5 release and
the 5.8.6 release.
=head1 Incompatible Changes
There are no changes incompatible with 5.8.5.
=head1 Core Enhancements
The perl interpreter is now more tolerant of UTF-16-encoded scripts.
On Win32, Perl can now use non-IFS compatible LSPs, which allows Perl to
work in conjunction with firewalls such as McAfee Guardian. For full details
see the file F<README.win32>, particularly if you're running Win95.
=head1 Modules and Pragmata
=over 4
=item *
With the C<base> pragma, an intermediate class with no fields used to mess
up private fields in the base class. This has been fixed.
=item *
Cwd upgraded to version 3.01 (as part of the new PathTools distribution)
=item *
Devel::PPPort upgraded to version 3.03
=item *
File::Spec upgraded to version 3.01 (as part of the new PathTools distribution)
=item *
Encode upgraded to version 2.08
=item *
ExtUtils::MakeMaker remains at version 6.17, as later stable releases currently
available on CPAN have some issues with core modules on some core platforms.
=item *
I18N::LangTags upgraded to version 0.35
=item *
Math::BigInt upgraded to version 1.73
=item *
Math::BigRat upgraded to version 0.13
=item *
MIME::Base64 upgraded to version 3.05
=item *
POSIX::sigprocmask function can now retrieve the current signal mask without
also setting it.
=item *
Time::HiRes upgraded to version 1.65
=back
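The C<POSIX::sigprocmask> item above can be illustrated with a minimal sketch. It assumes a Unix-like platform where C<SIGUSR1> exists; the variable names are illustrative only. Passing C<undef> as the new signal set leaves the mask untouched and only fills in the old-set argument.

```perl
use strict;
use warnings;
use POSIX qw(sigprocmask SIG_BLOCK SIG_SETMASK SIGUSR1);

# Remember the mask we started with, then block SIGUSR1 so that there
# is something to observe.
my $original = POSIX::SigSet->new;
my $to_block = POSIX::SigSet->new(SIGUSR1);
sigprocmask(SIG_BLOCK, $to_block, $original)
    or die "cannot block SIGUSR1: $!";

# Query only: an undef new-set means "do not change the mask".
my $current = POSIX::SigSet->new;
sigprocmask(SIG_BLOCK, undef, $current)
    or die "cannot query the signal mask: $!";
my $usr1_blocked = $current->ismember(SIGUSR1) ? 1 : 0;

# Put the original mask back.
sigprocmask(SIG_SETMASK, $original)
    or die "cannot restore the signal mask: $!";
```

When the new set is C<undef>, the C<how> argument has no effect, so C<SIG_BLOCK> here is only a placeholder.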
=head1 Utility Changes
Perl has a new -dt command-line flag, which enables threads support in the
debugger.
=head1 Performance Enhancements
C<reverse sort ...> is now optimized to sort in reverse, avoiding the
generation of a temporary intermediate list.
C<for (reverse @foo)> now iterates in reverse, avoiding the generation of a
temporary reversed list.
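A short sketch of the two constructs named above; the data is made up for illustration. The results are the same as in earlier releases; only the amount of temporary storage changes.

```perl
use strict;
use warnings;

my @names = qw(pear apple fig);

# Sorted in descending order; per the note above, no ascending
# intermediate list is built first.
my @descending = reverse sort @names;

# Walk the array back to front without materialising a reversed copy.
my @seen;
for my $name (reverse @names) {
    push @seen, $name;
}

print "@descending\n";
print "@seen\n";
```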
=head1 Selected Bug Fixes
The regexp engine is now more robust when given invalid utf8 input, as is
sometimes generated by buggy XS modules.
C<foreach> on threads::shared array used to be able to crash Perl. This bug
has now been fixed.
A regexp in C<STDOUT>'s destructor used to coredump, because the regexp pad
was already freed. This has been fixed.
C<goto &> is now more robust - bugs in deep recursion and chained C<goto &>
have been fixed.
Using C<delete> on an array no longer leaks memory. A C<pop> of an item from a
shared array reference no longer causes a leak.
C<eval_sv()> failing a taint test could corrupt the stack - this has been
fixed.
On platforms with 64-bit pointers, numeric comparison operators used to
erroneously compare the addresses of references that are overloaded, rather
than using the overloaded values. This has been fixed.
C<read> into a UTF8-encoded buffer with an offset off the end of the buffer
no longer mis-calculates buffer lengths.
Although Perl has promised since version 5.8 that C<sort()> would be
stable, the two cases C<sort {$b cmp $a}> and C<< sort {$b <=> $a} >> could
produce non-stable sorts. This is corrected in perl5.8.6.
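To see what stability means for the C<< $b <=> $a >> case, here is a small sketch with made-up input. It assumes the stable behaviour described above: elements that compare equal keep their input order.

```perl
use strict;
use warnings;

# "1.0" and "1" compare equal under <=>, as do "2" and "2.0", so a
# stable sort must keep each pair in the order it arrived in.
my @input  = ("1.0", "2", "1", "2.0");
my @sorted = sort { $b <=> $a } @input;

print join(' ', @sorted), "\n";
```

With a stable sort, C<"2"> stays ahead of C<"2.0"> and C<"1.0"> stays ahead of C<"1">.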
Localising C<$^D> no longer generates a diagnostic message about valid -D
flags.
=head1 New or Changed Diagnostics
For -t and -T,
Too late for "-T" option
has been changed to the more informative
"-T" is on the #! line, it must also be used on the command line
=head1 Changed Internals
From now on all applications embedding perl will behave as if perl
were compiled with -DPERL_USE_SAFE_PUTENV. See "Environment access" in
the F<INSTALL> file for details.
Most C<C> source files now have comments at the top explaining their purpose,
which should help anyone wishing to get an overview of the implementation.
=head1 New Tests
There are significantly more tests for the C<B> suite of modules.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://bugs.perl.org. There may also be
information at http://www.perl.org, the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team. You can browse and search
the Perl 5 bugs at http://bugs.perl.org/
=head1 SEE ALSO
The F<Changes> file for exhaustive details on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perl5101delta - what is new for perl v5.10.1
=head1 DESCRIPTION
This document describes differences between the 5.10.0 release and
the 5.10.1 release.
If you are upgrading from an earlier release such as 5.8.8, first read
L<perl5100delta>, which describes differences between 5.8.8 and
5.10.0.
=head1 Incompatible Changes
=head2 Switch statement changes
The handling of complex expressions by the C<given>/C<when> switch
statement has been enhanced. There are two new cases where C<when> now
interprets its argument as a boolean, instead of an expression to be used
in a smart match:
=over 4
=item flip-flop operators
The C<..> and C<...> flip-flop operators are now evaluated in boolean
context, following their usual semantics; see L<perlop/"Range Operators">.
Note that, as in perl 5.10.0, C<when (1..10)> will not work to test
whether a given value is an integer between 1 and 10; you should use
C<when ([1..10])> instead (note the array reference).
However, contrary to 5.10.0, evaluating the flip-flop operators in boolean
context means they can now be useful in a C<when()>, notably for
implementing bistable conditions, as in:
when (/^=begin/ .. /^=end/) {
# do something
}
=item defined-or operator
A compound expression involving the defined-or operator, as in
C<when (expr1 // expr2)>, will be treated as boolean if the first
expression is boolean. (This just extends the existing rule that applies
to the regular or operator, as in C<when (expr1 || expr2)>.)
=back
The next section details further changes to the semantics of
the smart match operator, which naturally also modify the behaviour
of the switch statements where smart matching is implicitly used.
=head2 Smart match changes
=head3 Changes to type-based dispatch
The smart match operator C<~~> is no longer commutative. The behaviour of
a smart match now depends primarily on the type of its right hand
argument. Moreover, its semantics have been adjusted for greater
consistency or usefulness in several cases. While the general backwards
compatibility is maintained, several changes must be noted:
=over 4
=item *
Code references with an empty prototype are no longer treated specially.
They are passed an argument like the other code references (even if they
choose to ignore it).
=item *
C<%hash ~~ sub {}> and C<@array ~~ sub {}> now test that the subroutine
returns a true value for each key of the hash (or element of the
array), instead of passing the whole hash or array as a reference to
the subroutine.
=item *
Due to the commutativity breakage, code references are no longer
treated specially when appearing on the left of the C<~~> operator,
but like any ordinary scalar.
=item *
C<undef ~~ %hash> is always false (since C<undef> can't be a key in a
hash). No implicit conversion to C<""> is done (as was the case in perl
5.10.0).
=item *
C<$scalar ~~ @array> now always distributes the smart match across the
elements of the array. It's true if at least one element in @array satisfies
C<$scalar ~~ $element>. This is a generalization of the old behaviour
that tested whether the array contained the scalar.
=back
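The dispatch rules listed above can be sketched as follows. This is an illustration under stated assumptions, not a reference: the variable names are made up, and later perls mark smart match as experimental (from 5.18) and deprecated (from 5.38), so warnings are switched off here.

```perl
use strict;
no warnings;   # later perls warn that smart match is experimental or deprecated

my @letters = qw(a b c);
my @evens   = (2, 4, 6);
my %colour  = (red => 1);
my $missing;                       # deliberately left undefined

# $scalar ~~ @array distributes the match over the elements.
my $has_b = ('b' ~~ @letters) ? 1 : 0;

# @array ~~ sub {} calls the sub once per element and requires every
# call to return a true value.
my $all_even = (@evens ~~ sub { $_[0] % 2 == 0 }) ? 1 : 0;

# An undefined value on the left of a hash is always false.
my $undef_in_hash = ($missing ~~ %colour) ? 1 : 0;

print "has_b=$has_b all_even=$all_even undef_in_hash=$undef_in_hash\n";
```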
The full dispatch table for the smart match operator is given in
L<perlsyn/"Smart matching in detail">.
=head3 Smart match and overloading
According to the rule of dispatch based on the rightmost argument type,
when an object overloading C<~~> appears on the right side of the
operator, the overload routine will always be called (with a 3rd argument
set to a true value; see L<overload>). However, when the object
appears on the left, the overload routine will be called only when the
rightmost argument is a simple scalar. This way, distributivity of smart match
across arrays is not broken, nor are the other behaviours with complex
types (coderefs, hashes, regexes). Thus, writers of overloading routines
for smart match mostly need to worry only about comparing against a scalar,
and possibly about stringification overloading; the other common cases
will be automatically handled consistently.
C<~~> will now refuse to work on objects that do not overload it (in order
to avoid relying on the object's underlying structure). (However, if the
object overloads the stringification or the numification operators, and
if overload fallback is active, it will be used instead, as usual.)
=head2 Other incompatible changes
=over 4
=item *
The semantics of C<use feature :5.10*> have changed slightly.
See L</"Modules and Pragmata"> for more information.
=item *
It is now a run-time error to use the smart match operator C<~~>
with an object that has no overload defined for it. (This way
C<~~> will not break encapsulation by matching against the
object's internal representation as a reference.)
=item *
The version control system used for the development of the perl
interpreter has been switched from Perforce to git. This is mainly an
internal issue that only affects people actively working on the perl core;
but it may have minor external visibility, for example in some of the details
of the output of C<perl -V>. See L<perlrepository> for more information.
=item *
The internal structure of the C<ext/> directory in the perl source has
been reorganised. In general, a module C<Foo::Bar> whose source was
stored under F<ext/Foo/Bar/> is now located under F<ext/Foo-Bar/>. Also,
some modules have been moved from F<lib/> to F<ext/>. This is purely a
source tarball change, and should make no difference to the compilation or
installation of perl, unless you have a very customised build process that
explicitly relies on this structure, or which hard-codes the C<nonxs_ext>
F<Configure> parameter. Specifically, this change does not by default
alter the location of any files in the final installation.
=item *
As part of the C<Test::Harness> 2.x to 3.x upgrade, the experimental
C<Test::Harness::Straps> module has been removed.
See L</"Updated Modules"> for more details.
=item *
As part of the C<ExtUtils::MakeMaker> upgrade, the
C<ExtUtils::MakeMaker::bytes> and C<ExtUtils::MakeMaker::vmsish> modules
have been removed from this distribution.
=item *
C<Module::CoreList> no longer contains the C<%Module::CoreList::patchlevel> hash.
=item *
This one is actually a change introduced in 5.10.0, but it was missed
from that release's perldelta, so it is mentioned here instead.
A bugfix related to the handling of the C</m> modifier and C<qr> resulted
in a change of behaviour between 5.8.x and 5.10.0:
# matches in 5.8.x, doesn't match in 5.10.0
$re = qr/^bar/; "foo\nbar" =~ /$re/m;
=back
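The C</m> and C<qr> change in the last item can be checked with a small sketch (variable names are illustrative). The flags of a C<qr//> object are fixed when it is compiled, so to get multi-line matching the modifier has to go on the C<qr//> itself.

```perl
use strict;
use warnings;

my $text = "foo\nbar";

# An outer /m does not reach inside a pattern compiled without it
# (the 5.10.0 and later behaviour described above).
my $plain   = qr/^bar/;
my $outer_m = ($text =~ /$plain/m) ? 1 : 0;

# Putting /m on the qr// makes ^ match after the embedded newline.
my $multi   = qr/^bar/m;
my $inner_m = ($text =~ /$multi/) ? 1 : 0;

print "outer /m: $outer_m, /m on qr: $inner_m\n";
```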
=head1 Core Enhancements
=head2 Unicode Character Database 5.1.0
The copy of the Unicode Character Database included in Perl 5.10.1 has
been updated to 5.1.0 from 5.0.0. See
L<http://www.unicode.org/versions/Unicode5.1.0/#Notable_Changes> for the
notable changes.
=head2 A proper interface for pluggable Method Resolution Orders
As of Perl 5.10.1 there is a new interface for plugging and using method
resolution orders other than the default (linear depth first search).
The C3 method resolution order added in 5.10.0 has been re-implemented as
a plugin, without changing its Perl-space interface. See L<perlmroapi> for
more information.
=head2 The C<overloading> pragma
This pragma allows you to lexically disable or enable overloading
for some or all operations. (Yuval Kogman)
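A minimal sketch of the pragma, using a hypothetical C<Point> class defined inline: inside the C<no overloading> block the object stringifies as a plain reference instead of going through its C<""> overload.

```perl
use strict;
use warnings;

# Hypothetical class with overloaded stringification.
package Point;
use overload '""' => sub { "Point(" . $_[0]{x} . ")" };
sub new { my ($class, $x) = @_; return bless { x => $x }, $class }

package main;

my $point = Point->new(3);

my $with = "$point";            # goes through the overload

my $without;
{
    no overloading;             # lexically disable all overloading
    $without = "$point";        # raw reference, e.g. Point=HASH(0x...)
}

print "$with\n$without\n";
```

The pragma is lexically scoped, so code outside the block is unaffected.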
=head2 Parallel tests
The core distribution can now run its regression tests in parallel on
Unix-like platforms. Instead of running C<make test>, set C<TEST_JOBS> in
your environment to the number of tests to run in parallel, and run
C<make test_harness>. On a Bourne-like shell, this can be done as
TEST_JOBS=3 make test_harness # Run 3 tests in parallel
An environment variable is used, rather than parallel make itself, because
L<TAP::Harness> needs to be able to schedule individual non-conflicting test
scripts itself, and there is no standard interface to C<make> utilities to
interact with their job schedulers.
Note that currently some test scripts may fail when run in parallel (most
notably C<ext/IO/t/io_dir.t>). If necessary run just the failing scripts
again sequentially and see if the failures go away.
=head2 DTrace support
Some support for DTrace has been added. See "DTrace support" in F<INSTALL>.
=head2 Support for C<configure_requires> in CPAN module metadata
Both C<CPAN> and C<CPANPLUS> now support the C<configure_requires> keyword
in the C<META.yml> metadata file included in most recent CPAN distributions.
This allows distribution authors to specify configuration prerequisites that
must be installed before running F<Makefile.PL> or F<Build.PL>.
See the documentation for C<ExtUtils::MakeMaker> or C<Module::Build> for more
on how to specify C<configure_requires> when creating a distribution for CPAN.
=head1 Modules and Pragmata
=head2 New Modules and Pragmata
=over 4
=item C<autodie>
This is a new lexically-scoped alternative for the C<Fatal> module.
The bundled version is 2.06_01. Note that in this release, using a string
eval when C<autodie> is in effect can cause the autodie behaviour to leak
into the surrounding scope. See L<autodie/"BUGS"> for more details.
=item C<Compress::Raw::Bzip2>
This has been added to the core (version 2.020).
=item C<parent>
This pragma establishes an ISA relationship with base classes at compile
time. It provides the key feature of C<base> without the feature creep.
=item C<Parse::CPAN::Meta>
This has been added to the core (version 1.39).
=back
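As a brief sketch of two of the additions above, C<parent> and C<autodie>: the classes and the file path are hypothetical and exist only for the example.

```perl
use strict;
use warnings;

# Hypothetical classes, defined inline for the example.
package Animal;
sub new   { my ($class) = @_; return bless {}, $class }
sub speak { return 'generic noise' }

package Dog;
use parent -norequire, 'Animal';   # -norequire: Animal lives in this file
sub speak { return 'Woof' }

package main;

my $dog_is_animal = Dog->new->isa('Animal') ? 1 : 0;

# Within this block a failing open() throws instead of returning false.
my $error;
{
    use autodie;
    eval { open(my $fh, '<', '/no/such/dir/no-such-file'); 1 }
        or $error = $@;
}
my $open_threw = (ref $error && $error->isa('autodie::exception')) ? 1 : 0;

print "isa=$dog_is_animal threw=$open_threw\n";
```

Because C<autodie> is lexically scoped, an C<open> outside the inner block still returns false on failure and has to be checked by hand.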
=head2 Pragmata Changes
=over 4
=item C<attributes>
Upgraded from version 0.08 to 0.09.
=item C<attrs>
Upgraded from version 1.02 to 1.03.
=item C<base>
Upgraded from version 2.13 to 2.14. See L<parent> for a replacement.
=item C<bigint>
Upgraded from version 0.22 to 0.23.
=item C<bignum>
Upgraded from version 0.22 to 0.23.
=item C<bigrat>
Upgraded from version 0.22 to 0.23.
=item C<charnames>
Upgraded from version 1.06 to 1.07.
The Unicode F<NameAliases.txt> database file has been added. This has the
effect of adding some extra C<\N> character names that formerly wouldn't
have been recognised; for example, C<"\N{LATIN CAPITAL LETTER GHA}">.
=item C<constant>
Upgraded from version 1.13 to 1.17.
=item C<feature>
The meaning of the C<:5.10> and C<:5.10.X> feature bundles has
changed slightly. The last component, if any (i.e. C<X>) is simply ignored.
This is predicated on the assumption that new features will not, in
general, be added to maintenance releases. So C<:5.10> and C<:5.10.X>
have identical effect. This is a change to the behaviour documented for
5.10.0.
=item C<fields>
Upgraded from version 2.13 to 2.14 (this was just a version bump; there
were no functional changes).
=item C<lib>
Upgraded from version 0.5565 to 0.62.
=item C<open>
Upgraded from version 1.06 to 1.07.
=item C<overload>
Upgraded from version 1.06 to 1.07.
=item C<overloading>
See L</"The C<overloading> pragma"> above.
=item C<version>
Upgraded from version 0.74 to 0.77.
=back
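The C<feature> item above says that C<:5.10> and C<:5.10.X> load the same bundle. A small sketch, with made-up counters, that uses C<state> under both spellings:

```perl
use strict;
use warnings;

my ($first, $second);
{
    use feature ':5.10';       # the 5.10 bundle, which includes state
    my $counter = sub { state $n = 0; return ++$n };
    $counter->();
    $first = $counter->();     # second call on the same closure
}
{
    use feature ':5.10.1';     # last component ignored: same bundle
    my $counter = sub { state $n = 10; return ++$n };
    $counter->();
    $second = $counter->();
}

print "$first $second\n";
```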
=head2 Updated Modules
=over 4
=item C<Archive::Extract>
Upgraded from version 0.24 to 0.34.
=item C<Archive::Tar>
Upgraded from version 1.38 to 1.52.
=item C<Attribute::Handlers>
Upgraded from version 0.79 to 0.85.
=item C<AutoLoader>
Upgraded from version 5.63 to 5.68.
=item C<AutoSplit>
Upgraded from version 1.05 to 1.06.
=item C<B>
Upgraded from version 1.17 to 1.22.
=item C<B::Debug>
Upgraded from version 1.05 to 1.11.
=item C<B::Deparse>
Upgraded from version 0.83 to 0.89.
=item C<B::Lint>
Upgraded from version 1.09 to 1.11.
=item C<B::Xref>
Upgraded from version 1.01 to 1.02.
=item C<Benchmark>
Upgraded from version 1.10 to 1.11.
=item C<Carp>
Upgraded from version 1.08 to 1.11.
=item C<CGI>
Upgraded from version 3.29 to 3.43.
(also includes the "default_value for popup_menu()" fix from 3.45).
=item C<Compress::Zlib>
Upgraded from version 2.008 to 2.020.
=item C<CPAN>
Upgraded from version 1.9205 to 1.9402. C<CPAN::FTP> has a local fix to
stop it being too verbose on download failure.
=item C<CPANPLUS>
Upgraded from version 0.84 to 0.88.
=item C<CPANPLUS::Dist::Build>
Upgraded from version 0.06_02 to 0.36.
=item C<Cwd>
Upgraded from version 3.25_01 to 3.30.
=item C<Data::Dumper>
Upgraded from version 2.121_14 to 2.124.
=item C<DB>
Upgraded from version 1.01 to 1.02.
=item C<DB_File>
Upgraded from version 1.816_1 to 1.820.
=item C<Devel::PPPort>
Upgraded from version 3.13 to 3.19.
=item C<Digest::MD5>
Upgraded from version 2.36_01 to 2.39.
=item C<Digest::SHA>
Upgraded from version 5.45 to 5.47.
=item C<DirHandle>
Upgraded from version 1.01 to 1.03.
=item C<Dumpvalue>
Upgraded from version 1.12 to 1.13.
=item C<DynaLoader>
Upgraded from version 1.08 to 1.10.
=item C<Encode>
Upgraded from version 2.23 to 2.35.
=item C<Errno>
Upgraded from version 1.10 to 1.11.
=item C<Exporter>
Upgraded from version 5.62 to 5.63.
=item C<ExtUtils::CBuilder>
Upgraded from version 0.21 to 0.2602.
=item C<ExtUtils::Command>
Upgraded from version 1.13 to 1.16.
=item C<ExtUtils::Constant>
Upgraded from 0.20 to 0.22. (Note that neither of these versions is
available on CPAN.)
=item C<ExtUtils::Embed>
Upgraded from version 1.27 to 1.28.
=item C<ExtUtils::Install>
Upgraded from version 1.44 to 1.54.
=item C<ExtUtils::MakeMaker>
Upgraded from version 6.42 to 6.55_02.
Note that C<ExtUtils::MakeMaker::bytes> and C<ExtUtils::MakeMaker::vmsish>
have been removed from this distribution.
=item C<ExtUtils::Manifest>
Upgraded from version 1.51_01 to 1.56.
=item C<ExtUtils::ParseXS>
Upgraded from version 2.18_02 to 2.2002.
=item C<Fatal>
Upgraded from version 1.05 to 2.06_01. See also the new pragma C<autodie>.
=item C<File::Basename>
Upgraded from version 2.76 to 2.77.
=item C<File::Compare>
Upgraded from version 1.1005 to 1.1006.
=item C<File::Copy>
Upgraded from version 2.11 to 2.14.
=item C<File::Fetch>
Upgraded from version 0.14 to 0.20.
=item C<File::Find>
Upgraded from version 1.12 to 1.14.
=item C<File::Path>
Upgraded from version 2.04 to 2.07_03.
=item C<File::Spec>
Upgraded from version 3.2501 to 3.30.
=item C<File::stat>
Upgraded from version 1.00 to 1.01.
=item C<File::Temp>
Upgraded from version 0.18 to 0.22.
=item C<FileCache>
Upgraded from version 1.07 to 1.08.
=item C<FileHandle>
Upgraded from version 2.01 to 2.02.
=item C<Filter::Simple>
Upgraded from version 0.82 to 0.84.
=item C<Filter::Util::Call>
Upgraded from version 1.07 to 1.08.
=item C<FindBin>
Upgraded from version 1.49 to 1.50.
=item C<GDBM_File>
Upgraded from version 1.08 to 1.09.
=item C<Getopt::Long>
Upgraded from version 2.37 to 2.38.
=item C<Hash::Util::FieldHash>
Upgraded from version 1.03 to 1.04. This fixes a memory leak.
=item C<I18N::Collate>
Upgraded from version 1.00 to 1.01.
=item C<IO>
Upgraded from version 1.23_01 to 1.25.
This makes non-blocking mode work on Windows in C<IO::Socket::INET>
[CPAN #43573].
=item C<IO::Compress::*>
Upgraded from version 2.008 to 2.020.
=item C<IO::Dir>
Upgraded from version 1.06 to 1.07.
=item C<IO::Handle>
Upgraded from version 1.27 to 1.28.
=item C<IO::Socket>
Upgraded from version 1.30_01 to 1.31.
=item C<IO::Zlib>
Upgraded from version 1.07 to 1.09.
=item C<IPC::Cmd>
Upgraded from version 0.40_1 to 0.46.
=item C<IPC::Open3>
Upgraded from version 1.02 to 1.04.
=item C<IPC::SysV>
Upgraded from version 1.05 to 2.01.
=item C<lib>
Upgraded from version 0.5565 to 0.62.
=item C<List::Util>
Upgraded from version 1.19 to 1.21.
=item C<Locale::Maketext>
Upgraded from version 1.12 to 1.13.
=item C<Log::Message>
Upgraded from version 0.01 to 0.02.
=item C<Math::BigFloat>
Upgraded from version 1.59 to 1.60.
=item C<Math::BigInt>
Upgraded from version 1.88 to 1.89.
=item C<Math::BigInt::FastCalc>
Upgraded from version 0.16 to 0.19.
=item C<Math::BigRat>
Upgraded from version 0.21 to 0.22.
=item C<Math::Complex>
Upgraded from version 1.37 to 1.56.
=item C<Math::Trig>
Upgraded from version 1.04 to 1.20.
=item C<Memoize>
Upgraded from version 1.01_02 to 1.01_03 (just a minor documentation
change).
=item C<Module::Build>
Upgraded from version 0.2808_01 to 0.34_02.
=item C<Module::CoreList>
Upgraded from version 2.13 to 2.18. This release no longer contains the
C<%Module::CoreList::patchlevel> hash.
=item C<Module::Load>
Upgraded from version 0.12 to 0.16.
=item C<Module::Load::Conditional>
Upgraded from version 0.22 to 0.30.
=item C<Module::Loaded>
Upgraded from version 0.01 to 0.02.
=item C<Module::Pluggable>
Upgraded from version 3.6 to 3.9.
=item C<NDBM_File>
Upgraded from version 1.07 to 1.08.
=item C<Net::Ping>
Upgraded from version 2.33 to 2.36.
=item C<NEXT>
Upgraded from version 0.60_01 to 0.64.
=item C<Object::Accessor>
Upgraded from version 0.32 to 0.34.
=item C<OS2::REXX>
Upgraded from version 1.03 to 1.04.
=item C<Package::Constants>
Upgraded from version 0.01 to 0.02.
=item C<PerlIO>
Upgraded from version 1.04 to 1.06.
=item C<PerlIO::via>
Upgraded from version 0.04 to 0.07.
=item C<Pod::Man>
Upgraded from version 2.16 to 2.22.
=item C<Pod::Parser>
Upgraded from version 1.35 to 1.37.
=item C<Pod::Simple>
Upgraded from version 3.05 to 3.07.
=item C<Pod::Text>
Upgraded from version 3.08 to 3.13.
=item C<POSIX>
Upgraded from version 1.13 to 1.17.
=item C<Safe>
Upgraded from 2.12 to 2.18.
=item C<Scalar::Util>
Upgraded from version 1.19 to 1.21.
=item C<SelectSaver>
Upgraded from 1.01 to 1.02.
=item C<SelfLoader>
Upgraded from 1.11 to 1.17.
=item C<Socket>
Upgraded from 1.80 to 1.82.
=item C<Storable>
Upgraded from 2.18 to 2.20.
=item C<Switch>
Upgraded from version 2.13 to 2.14. Please see L</Deprecations>.
=item C<Symbol>
Upgraded from version 1.06 to 1.07.
=item C<Sys::Syslog>
Upgraded from version 0.22 to 0.27.
=item C<Term::ANSIColor>
Upgraded from version 1.12 to 2.00.
=item C<Term::ReadLine>
Upgraded from version 1.03 to 1.04.
=item C<Term::UI>
Upgraded from version 0.18 to 0.20.
=item C<Test::Harness>
Upgraded from version 2.64 to 3.17.
Note that one side-effect of the 2.x to 3.x upgrade is that the
experimental C<Test::Harness::Straps> module (and its supporting
C<Assert>, C<Iterator>, C<Point> and C<Results> modules) have been
removed. If you still need this, then they are available in the
(unmaintained) C<Test-Harness-Straps> distribution on CPAN.
=item C<Test::Simple>
Upgraded from version 0.72 to 0.92.
=item C<Text::ParseWords>
Upgraded from version 3.26 to 3.27.
=item C<Text::Tabs>
Upgraded from version 2007.1117 to 2009.0305.
=item C<Text::Wrap>
Upgraded from version 2006.1117 to 2009.0305.
=item C<Thread::Queue>
Upgraded from version 2.00 to 2.11.
=item C<Thread::Semaphore>
Upgraded from version 2.01 to 2.09.
=item C<threads>
Upgraded from version 1.67 to 1.72.
=item C<threads::shared>
Upgraded from version 1.14 to 1.29.
=item C<Tie::RefHash>
Upgraded from version 1.37 to 1.38.
=item C<Tie::StdHandle>
This has documentation changes, and has been assigned a version for the
first time: version 4.2.
=item C<Time::HiRes>
Upgraded from version 1.9711 to 1.9719.
=item C<Time::Local>
Upgraded from version 1.18 to 1.1901.
=item C<Time::Piece>
Upgraded from version 1.12 to 1.15.
=item C<Unicode::Normalize>
Upgraded from version 1.02 to 1.03.
=item C<Unicode::UCD>
Upgraded from version 0.25 to 0.27.
C<charinfo()> now works on Unified CJK code points added to later versions
of Unicode.
C<casefold()> has new fields returned to provide both a simpler interface
and previously missing information. The old fields are retained for
backwards compatibility. Information about Turkic-specific code points is
now returned.
The documentation has been corrected and expanded.
=item C<UNIVERSAL>
Upgraded from version 1.04 to 1.05.
=item C<Win32>
Upgraded from version 0.34 to 0.39.
=item C<Win32API::File>
Upgraded from version 0.1001_01 to 0.1101.
=item C<XSLoader>
Upgraded from version 0.08 to 0.10.
=back
=head1 Utility Changes
=over 4
=item F<h2ph>
Now looks in C<include-fixed> too, which is a recent addition to gcc's
search path.
=item F<h2xs>
No longer incorrectly treats enum values like macros (Daniel Burr).
Now handles C++ style comments (C<//>) properly in enums. (A patch from
Rainer Weikusat was used; Daniel Burr also proposed a similar fix).
=item F<perl5db.pl>
C<LVALUE> subroutines now work under the debugger.
The debugger now correctly handles proxy constant subroutines, and
subroutine stubs.
=item F<perlthanks>
Perl 5.10.1 adds a new utility F<perlthanks>, which is a variant of
F<perlbug>, but for sending non-bug-reports to the authors and maintainers
of Perl. Getting nothing but bug reports can become a bit demoralising:
we'll see if this changes things.
=back
=head1 New Documentation
=over 4
=item L<perlhaiku>
This contains instructions on how to build perl for the Haiku platform.
=item L<perlmroapi>
This describes the new interface for pluggable Method Resolution Orders.
=item L<perlperf>
This document, by Richard Foley, provides an introduction to the use of
performance and optimization techniques which can be used with particular
reference to perl programs.
=item L<perlrepository>
This describes how to access the perl source using the I<git> version
control system.
=item L<perlthanks>
This describes the new F<perlthanks> utility.
=back
=head1 Changes to Existing Documentation
The various large C<Changes*> files (which listed every change made to perl
over the last 18 years) have been removed, and replaced by a small file,
also called C<Changes>, which just explains how that same information may
be extracted from the git version control system.
The file F<Porting/patching.pod> has been deleted, as it mainly described
interacting with the old Perforce-based repository, which is now obsolete.
Information still relevant has been moved to L<perlrepository>.
L<perlapi>, L<perlintern>, L<perlmodlib> and L<perltoc> are now all
generated at build time, rather than being shipped as part of the release.
=head1 Performance Enhancements
=over 4
=item *
A new internal cache means that C<isa()> will often be faster.
=item *
Under C<use locale>, the locale-relevant information is now cached on
read-only values, such as the list returned by C<keys %hash>. This makes
operations such as C<sort keys %hash> in the scope of C<use locale> much
faster.
=item *
Empty C<DESTROY> methods are no longer called.
=back
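As a rough sketch of the idiom that the locale cache is meant to speed up,
the following sorts hash keys inside the scope of C<use locale>. The hash
contents are made up for illustration, and the resulting order assumes a
locale in which unaccented lower-case ASCII letters collate alphabetically.

```perl
use strict;
use warnings;
use locale;    # string comparisons below follow the current locale

# Hypothetical sample data; any hash would do.
my %stock = (pear => 3, apple => 7, fig => 1);

# Sorting the (read-only) key list is the operation described above.
my @names = sort keys %stock;

print "$_: $stock{$_}\n" for @names;
```

The code itself is the same as it would be without the cache; the change
only affects how quickly the comparison runs.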
=head1 Installation and Configuration Improvements
=head2 F<ext/> reorganisation
The layout of directories in F<ext> has been revised. Specifically, all
extensions are now flat, and at the top level, with C</> in pathnames
replaced by C<->, so that F<ext/Data/Dumper/> is now F<ext/Data-Dumper/>,
etc. The names of the extensions as specified to F<Configure>, and as
reported by C<%Config::Config> under the keys C<dynamic_ext>,
C<known_extensions>, C<nonxs_ext> and C<static_ext> have not changed, and
still use C</>. Hence this change will not have any effect once perl is
installed. However, C<Attribute::Handlers>, C<Safe> and C<mro> have now
become extensions in their own right, so if you run F<Configure> with
options to specify an exact list of extensions to build, you will need to
change it to account for this.
For 5.10.2, it is planned that many dual-life modules will have been moved
from F<lib> to F<ext>; again this will have no effect on an installed
perl, but will matter if you invoke F<Configure> with a pre-canned list of
extensions to build.
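The renaming rule can be sketched in a few lines. The helper name
C<ext_dir> and the second extension name are hypothetical, used only to
illustrate the mapping from a F<Configure>-style name to the flattened
directory.

```perl
use strict;
use warnings;

# Hypothetical helper: map an extension name as given to Configure
# (which still uses "/") to its flattened directory under ext/.
sub ext_dir {
    my ($name) = @_;
    (my $flat = $name) =~ s{/}{-}g;    # every "/" becomes "-"
    return "ext/$flat/";
}

# Data/Dumper is the example from the text; Foo/Bar/Baz is made up.
my @dirs = map { ext_dir($_) } qw(Data/Dumper Foo/Bar/Baz);

print "$_\n" for @dirs;
```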
=head2 Configuration improvements
If C<vendorlib> and C<vendorarch> are the same, then they are only added to
C<@INC> once.
C<$Config{usedevel}> and the C-level C<PERL_USE_DEVEL> are now defined if
perl is built with C<-Dusedevel>.
F<Configure> will enable use of C<-fstack-protector>, to provide protection
against stack-smashing attacks, if the compiler supports it.
F<Configure> will now determine the correct prototypes for re-entrant
functions, and for C<gconvert>, if you are using a C++ compiler rather
than a C compiler.
On Unix, if you build from a tree containing a git repository, the
configuration process will note the commit hash you have checked out, for
display in the output of C<perl -v> and C<perl -V>. Unpushed local commits
are automatically added to the list of local patches displayed by
C<perl -V>.
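A script can inspect some of these settings through the C<Config> module.
This is a minimal sketch; it assumes that C<usedevel> holds the string
C<define> when the option was given, which is how F<Configure> normally
records C<-D> switches, and it treats a missing value as "not set".

```perl
use strict;
use warnings;
use Config;

# Assumption: Configure records -Dusedevel as the string 'define'.
my $usedevel = $Config{usedevel};
my $is_devel = (defined $usedevel && $usedevel eq 'define') ? 1 : 0;

print "Built with -Dusedevel: ", ($is_devel ? "yes" : "no"), "\n";

# If these two are identical, the directory appears in @INC only once.
print "vendorlib:  ", ($Config{vendorlib}  || '(not set)'), "\n";
print "vendorarch: ", ($Config{vendorarch} || '(not set)'), "\n";
```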
=head2 Compilation improvements
As part of the flattening of F<ext>, all extensions on all platforms are
built by F<make_ext.pl>. This replaces the Unix-specific
F<ext/util/make_ext>, VMS-specific F<make_ext.com> and Win32-specific
F<win32/buildext.pl>.
=head2 Platform Specific Changes
=over 4
=item AIX
Removed F<libbsd> for AIX 5L and 6.1. Only flock() was used from F<libbsd>.
Removed F<libgdbm> for AIX 5L and 6.1. The F<libgdbm> is delivered as an
optional package with the AIX Toolbox. Unfortunately the 64 bit version
is broken.
Hints changes mean that AIX 4.2 should work again.
=item Cygwin
On Cygwin we now strip the last number from the DLL. This has been the
behaviour in the cygwin.com build for years. The hints files have been
updated.
=item FreeBSD
The hints files now identify the correct threading libraries on FreeBSD 7
and later.
=item Irix
We now work around a bizarre preprocessor bug in the Irix 6.5 compiler:
C<cc -E -> unfortunately goes into K&R mode, but C<cc -E file.c> doesn't.
=item Haiku
Patches from the Haiku maintainers have been merged in. Perl should now
build on Haiku.
=item MirOS BSD
Perl should now build on MirOS BSD.
=item NetBSD
Hints now support versions 5.*.
=item Stratus VOS
Various changes from Stratus have been merged in.
=item Symbian
There is now support for Symbian S60 3.2 SDK and S60 5.0 SDK.
=item Win32
Improved message window handling means that C<alarm> and C<kill> messages
will no longer be dropped under race conditions.
=item VMS
Reads from the in-memory temporary files of C<PerlIO::scalar> used to fail
if C<$/> was set to a numeric reference (to indicate record-style reads).
This is now fixed.
VMS now supports C<getgrgid>.
Many improvements and cleanups have been made to the VMS file name handling
and conversion code.
Enabling the C<PERL_VMS_POSIX_EXIT> logical name now encodes a POSIX exit
status in a VMS condition value for better interaction with GNV's bash
shell and other utilities that depend on POSIX exit values. See
L<perlvms/"$?"> for details.
=back
=head1 Selected Bug Fixes
=over 4
=item *
5.10.0 inadvertently disabled an optimisation, which caused a measurable
performance drop in list assignment, such as is often used to assign
function parameters from C<@_>. The optimisation has been re-instated, and
the performance regression fixed.
=item *
Fixed memory leak on C<while (1) { map 1, 1 }> [RT #53038].
=item *
Some potential coredumps in PerlIO fixed [RT #57322,54828].
=item *
The debugger now works with lvalue subroutines.
=item *
The debugger's C<m> command was broken on modules that defined constants
[RT #61222].
=item *
C<crypt()> and string complement could return tainted values for untainted
arguments [RT #59998].
=item *
The C<-i.suffix> command-line switch now recreates the file using
restricted permissions, before changing its mode to match the original
file. This eliminates a potential race condition [RT #60904].
=item *
On some Unix systems, the value in C<$?> would not have the top bit set
(C<$? & 128>) even if the child core dumped.
=item *
Under some circumstances, $^R could incorrectly become undefined
[RT #57042].
=item *
(XS) In various hash functions, passing a pre-computed hash value when the
key is UTF-8 might result in an incorrect lookup.
=item *
(XS) Including F<XSUB.h> before F<perl.h> gave a compile-time error
[RT #57176].
=item *
C<< $object->isa('Foo') >> would report false if the package C<Foo> didn't
exist, even if the object's C<@ISA> contained C<Foo>.
=item *
Various bugs in the new-to-5.10.0 mro code, triggered by manipulating
C<@ISA>, have been found and fixed.
=item *
Bitwise operations on references could crash the interpreter, e.g.
C<$x=\$y; $x |= "foo"> [RT #54956].
=item *
Patterns including alternation might be sensitive to the internal UTF-8
representation, e.g.
    my $byte = chr(192);
    my $utf8 = chr(192); utf8::upgrade($utf8);
    $utf8 =~ /$byte|X}/i; # failed in 5.10.0
=item *
Within UTF8-encoded Perl source files (i.e. where C<use utf8> is in
effect), double-quoted literal strings could be corrupted where a C<\xNN>,
C<\0NNN> or C<\N{}> is followed by a literal character with ordinal value
greater than 255 [RT #59908].
=item *
C<B::Deparse> failed to correctly deparse various constructs:
C<readpipe STRING> [RT #62428], C<CORE::require(STRING)> [RT #62488],
C<sub foo(_)> [RT #62484].
=item *
Using C<setpgrp()> with no arguments could corrupt the perl stack.
=item *
The block form of C<eval> is now specifically trappable by C<Safe> and
C<ops>. Previously it was erroneously treated like string C<eval>.
=item *
In 5.10.0, the two characters C<[~> were sometimes parsed as the smart
match operator (C<~~>) [RT #63854].
=item *
In 5.10.0, the C<*> quantifier in patterns was sometimes treated as
C<{0,32767}> [RT #60034, #60464]. For example, this match would fail:
    ("ab" x 32768) =~ /^(ab)*$/
=item *
C<shmget> was limited to a 32 bit segment size on a 64 bit OS [RT #63924].
=item *
Using C<next> or C<last> to exit a C<given> block no longer produces a
spurious warning like the following:
    Exiting given via last at foo.pl line 123
=item *
On Windows, C<'.\foo'> and C<'..\foo'> were treated differently than
C<'./foo'> and C<'../foo'> by C<do> and C<require> [RT #63492].
=item *
Assigning a format to a glob could corrupt the format; e.g.:
    *bar=*foo{FORMAT}; # foo format now bad
=item *
Attempting to coerce a typeglob to a string or number could cause an
assertion failure. The correct error message is now generated,
C<Can't coerce GLOB to I<$type>>.
=item *
Under C<use filetest 'access'>, C<-x> was using the wrong access mode. This
has been fixed [RT #49003].
=item *
C<length> on a tied scalar that returned a Unicode value would not be
correct the first time. This has been fixed.
=item *
Using an array C<tie> inside an array C<tie> could SEGV. This has been
fixed. [RT #51636]
=item *
A race condition inside C<PerlIOStdio_close()> has been identified and
fixed. This used to cause various threading issues, including SEGVs.
=item *
In C<unpack>, the use of C<()> groups in scalar context was internally
placing a list on the interpreter's stack, which manifested in various
ways, including SEGVs. This is now fixed [RT #50256].
=item *
Magic was called twice in C<substr>, C<\&$x>, C<tie $x, $m> and C<chop>.
These have all been fixed.
=item *
A 5.10.0 optimisation to clear the temporary stack within the implicit
loop of C<s///ge> has been reverted, as it turned out to be the cause of
obscure bugs in seemingly unrelated parts of the interpreter [commit
ef0d4e17921ee3de].
=item *
The line numbers for warnings inside C<elsif> are now correct.
=item *
The C<..> operator now works correctly with ranges whose ends are at or
close to the values of the smallest and largest integers.
=item *
C<binmode STDIN, ':raw'> could lead to segmentation faults on some platforms.
This has been fixed [RT #54828].
=item *
An off-by-one error meant that C<index $str, ...> was effectively being
executed as C<index "$str\0", ...>. This has been fixed [RT #53746].
=item *
Various leaks associated with named captures in regexes have been fixed
[RT #57024].
=item *
A weak reference to a hash would leak. This was affecting C<DBI>
[RT #56908].
=item *
Using (?|) in a regex could cause a segfault [RT #59734].
=item *
Use of a UTF-8 C<tr//> within a closure could cause a segfault [RT #61520].
=item *
Calling C<sv_chop()> or otherwise upgrading an SV could result in an
unaligned 64-bit access on the SPARC architecture [RT #60574].
=item *
In the 5.10.0 release, C<inc_version_list> would incorrectly list
C<5.10.*> after C<5.8.*>; this affected the C<@INC> search order
[RT #67628].
=item *
In 5.10.0, C<pack "a*", $tainted_value> returned a non-tainted value
[RT #52552].
=item *
In 5.10.0, C<printf> and C<sprintf> could produce the fatal error
C<panic: utf8_mg_pos_cache_update> when printing UTF-8 strings
[RT #62666].
=item *
In the 5.10.0 release, a dynamically created C<AUTOLOAD> method might be
missed (method cache issue) [RT #60220,60232].
=item *
In the 5.10.0 release, a combination of C<use feature> and C<//ee> could
cause a memory leak [RT #63110].
=item *
C<-C> on the shebang (C<#!>) line is once more permitted if it is also
specified on the command line. C<-C> on the shebang line used to be a
silent no-op I<if> it was not also on the command line, so perl 5.10.0
disallowed it, which broke some scripts. Now perl checks whether it is
also on the command line and only dies if it is not [RT #67880].
=item *
In 5.10.0, certain types of re-entrant regular expression could crash,
or cause the following assertion failure [RT #60508]:
    Assertion rx->sublen >= (s - rx->subbeg) + i failed
=back
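As an illustration of one fix from the list above, the C<isa> item can be
reproduced with a class whose C<@ISA> names a package that is never
defined. The package names C<Child> and C<Ghost> are hypothetical. On a
perl that includes the fix, the method call reports true; perl may also
print a warning about the missing package while resolving the method.

```perl
use strict;
use warnings;

package Child;
our @ISA = ('Ghost');    # 'Ghost' is deliberately never defined

package main;

my $obj = bless {}, 'Child';

# True once the fix is in place: @ISA mentions Ghost, even though
# no such package exists.
my $is_ghost = $obj->isa('Ghost') ? 1 : 0;

print "isa('Ghost'): $is_ghost\n";
```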
=head1 New or Changed Diagnostics
=over 4
=item C<panic: sv_chop %s>
This new fatal error occurs when the C routine C<Perl_sv_chop()> was
passed a position that is not within the scalar's string buffer. This
could be caused by buggy XS code, and at this point recovery is not
possible.
=item C<Can't locate package %s for the parents of %s>
This warning has been removed. In general, it only got produced in
conjunction with other warnings, and removing it allowed an ISA lookup
optimisation to be added.
=item C<v-string in use/require is non-portable>
This warning has been removed.
=item C<Deep recursion on subroutine "%s">
It is now possible to change the depth threshold for this warning from the
default of 100, by recompiling the F<perl> binary, setting the C
pre-processor macro C<PERL_SUB_DEPTH_WARN> to the desired value.
=back
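To see the C<Deep recursion> diagnostic from the list above in practice,
a script can recurse past the threshold and capture the warning. This
sketch assumes the default threshold of 100; a F<perl> binary built with
a larger C<PERL_SUB_DEPTH_WARN> may capture nothing at this depth. The
subroutine name C<descend> is made up.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical recursive routine: returns the depth it reached.
sub descend {
    my ($depth) = @_;
    return $depth <= 1 ? 1 : 1 + descend($depth - 1);
}

my $levels = descend(200);
my @deep   = grep { /Deep recursion/ } @warnings;

print "reached depth $levels, captured ", scalar(@deep),
      " deep-recursion warning(s)\n";
```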
=head1 Changed Internals
=over 4
=item *
The J.R.R. Tolkien quotes at the head of C source files have been checked and
proper citations added, thanks to a patch from Tom Christiansen.
=item *
C<vcroak()> now accepts a null first argument. In addition, a full audit
was made of the "not NULL" compiler annotations, and those for several
other internal functions were corrected.
=item *
New macros C<dSAVEDERRNO>, C<dSAVE_ERRNO>, C<SAVE_ERRNO>, C<RESTORE_ERRNO>
have been added to formalise the temporary saving of the C<errno>
variable.
=item *
The function C<Perl_sv_insert_flags> has been added to augment
C<Perl_sv_insert>.
=item *
The function C<Perl_newSV_type(type)> has been added, equivalent to
C<Perl_newSV()> followed by C<Perl_sv_upgrade(type)>.
=item *
The function C<Perl_newSVpvn_flags()> has been added, equivalent to
C<Perl_newSVpvn()> and then performing the action relevant to the flag.
Two flag bits are currently supported.
=over 4
=item C<SVf_UTF8>
This will call C<SvUTF8_on()> for you. (Note that this does not convert a
sequence of ISO 8859-1 characters to UTF-8). A wrapper, C<newSVpvn_utf8()>
is available for this.
=item C<SVs_TEMP>
Call C<sv_2mortal()> on the new SV.
=back
There is also a wrapper that takes constant strings, C<newSVpvs_flags()>.
=item *
The function C<Perl_croak_xs_usage> has been added as a wrapper to
C<Perl_croak>.
=item *
The functions C<PerlIO_find_layer> and C<PerlIO_list_alloc> are now
exported.
=item *
C<PL_na> has been exterminated from the core code, replaced by local STRLEN
temporaries, or C<*_nolen()> calls. Either approach is faster than C<PL_na>,
which is a pointer dereference into the interpreter structure under ithreads,
and a global variable otherwise.
=item *
C<Perl_mg_free()> used to leave freed memory accessible via SvMAGIC() on
the scalar. It now updates the linked list to remove each piece of magic
as it is freed.
=item *
Under ithreads, the regex in C<PL_reg_curpm> is now reference counted. This
eliminates a lot of hackish workarounds to cope with it not being reference
counted.
=item *
C<Perl_mg_magical()> would sometimes incorrectly turn on C<SvRMAGICAL()>.
This has been fixed.
=item *
The I<public> IV and NV flags are now not set if the string value has
trailing "garbage". This behaviour is consistent with not setting the
public IV or NV flags if the value is out of range for the type.
=item *
SV allocation tracing has been added to the diagnostics enabled by C<-Dm>.
The tracing can alternatively output via the C<PERL_MEM_LOG> mechanism, if
that was enabled when the F<perl> binary was compiled.
=item *
Uses of C<Nullav>, C<Nullcv>, C<Nullhv>, C<Nullop>, C<Nullsv> etc have been
replaced by C<NULL> in the core code, and non-dual-life modules, as C<NULL>
is clearer to those unfamiliar with the core code.
=item *
A macro C<MUTABLE_PTR(p)> has been added, which on (non-pedantic) gcc will
not cast away C<const>, returning a C<void *>. Macros C<MUTABLE_AV(p)>,
C<MUTABLE_CV(p)> etc build on this, casting to C<AV *> etc without
casting away C<const>. This allows proper compile-time auditing of
C<const> correctness in the core, and helped pick up some errors (now
fixed).
=item *
Macros C<mPUSHs()> and C<mXPUSHs()> have been added, for pushing SVs on the
stack and mortalizing them.
=item *
Use of the private structure C<mro_meta> has changed slightly. Nothing
outside the core should be accessing this directly anyway.
=item *
A new tool, C<Porting/expand-macro.pl> has been added, that allows you
to view how a C preprocessor macro would be expanded when compiled.
This is handy when trying to decode the macro hell that is the perl
guts.
=back
=head1 New Tests
Many modules updated from CPAN incorporate new tests.
Several tests that have the potential to hang forever if they fail now
incorporate a "watchdog" functionality that will kill them after a timeout,
which helps ensure that C<make test> and C<make test_harness> run to
completion automatically. (Jerry Hedden).
Some core-specific tests have been added:
=over 4
=item t/comp/retainedlines.t
Check that the debugger can retain source lines from C<eval>.
=item t/io/perlio_fail.t
Check that bad layers fail.
=item t/io/perlio_leaks.t
Check that PerlIO layers are not leaking.
=item t/io/perlio_open.t
Check that certain special forms of open work.
=item t/io/perlio.t
General PerlIO tests.
=item t/io/pvbm.t
Check that there is no unexpected interaction between the internal types
C<PVBM> and C<PVGV>.
=item t/mro/package_aliases.t
Check that mro works properly in the presence of aliased packages.
=item t/op/dbm.t
Tests for C<dbmopen> and C<dbmclose>.
=item t/op/index_thr.t
Tests for the interaction of C<index> and threads.
=item t/op/pat_thr.t
Tests for the interaction of esoteric patterns and threads.
=item t/op/qr_gc.t
Test that C<qr> doesn't leak.
=item t/op/reg_email_thr.t
Tests for the interaction of regex recursion and threads.
=item t/op/regexp_qr_embed_thr.t
Tests for the interaction of patterns with embedded C<qr//> and threads.
=item t/op/regexp_unicode_prop.t
Tests for Unicode properties in regular expressions.
=item t/op/regexp_unicode_prop_thr.t
Tests for the interaction of Unicode properties and threads.
=item t/op/reg_nc_tie.t
Test the tied methods of C<Tie::Hash::NamedCapture>.
=item t/op/reg_posixcc.t
Check that POSIX character classes behave consistently.
=item t/op/re.t
Check that exportable C<re> functions in F<universal.c> work.
=item t/op/setpgrpstack.t
Check that C<setpgrp> works.
=item t/op/substr_thr.t
Tests for the interaction of C<substr> and threads.
=item t/op/upgrade.t
Check that upgrading and assigning scalars works.
=item t/uni/lex_utf8.t
Check that Unicode in the lexer works.
=item t/uni/tie.t
Check that Unicode and C<tie> work.
=back
=head1 Known Problems
This is a list of some significant unfixed bugs, which are regressions
from either 5.10.0 or 5.8.x.
=over 4
=item *
C<List::Util::first> misbehaves in the presence of a lexical C<$_>
(typically introduced by C<my $_> or implicitly by C<given>). The variable
which gets set for each iteration is the package variable C<$_>, not the
lexical C<$_> [RT #67694].
A similar issue may occur in other modules that provide functions which
take a block as their first argument, like
    foo { ... $_ ...} list
=item *
The C<charnames> pragma may generate a run-time error when a regex is
interpolated [RT #56444]:
    use charnames ':full';
    my $r1 = qr/\N{THAI CHARACTER SARA I}/;
    "foo" =~ $r1; # okay
    "foo" =~ /$r1+/; # runtime error
A workaround is to generate the character outside of the regex:
    my $a = "\N{THAI CHARACTER SARA I}";
    my $r1 = qr/$a/;
=item *
Some regexes may run much more slowly when run in a child thread compared
with the thread the pattern was compiled into [RT #55600].
=back
=head1 Deprecations
The following items are now deprecated.
=over 4
=item *
C<Switch> is buggy and should be avoided. From perl 5.11.0 onwards, it is
intended that any use of the core version of this module will emit a
warning, and that the module will eventually be removed from the core
(probably in perl 5.14.0). See L<perlsyn/"Switch statements"> for its
replacement.
=item *
C<suidperl> will be removed in 5.12.0. This provides a mechanism to
emulate setuid permission bits on systems that don't support it properly.
=back
=head1 Acknowledgements
Some of the work in this release was funded by a TPF grant.
Nicholas Clark officially retired from maintenance pumpking duty at the
end of 2008; however in reality he has put much effort in since then to
help get 5.10.1 into a fit state to be released, including writing a
considerable chunk of this perldelta.
Steffen Mueller and David Golden in particular helped get CPAN modules
polished and synchronised with their in-core equivalents.
Craig Berry was tireless in getting maint to run under VMS, no matter how
many times we broke it for him.
The other core committers contributed most of the changes, and applied most
of the patches sent in by the hundreds of contributors listed in F<AUTHORS>.
(Sorry to all the people I haven't mentioned by name).
Finally, thanks to Larry Wall, without whom none of this would be
necessary.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://rt.perl.org/perlbug/ . There may also be
information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send
it to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes
all the core committers, who will be able
to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently
distributed on CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details
on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
=head1 NAME
perl5123delta - what is new for perl v5.12.3
=head1 DESCRIPTION
This document describes differences between the 5.12.2 release and
the 5.12.3 release.
If you are upgrading from an earlier release such as 5.12.1, first read
L<perl5122delta>, which describes differences between 5.12.1 and
5.12.2. The major changes made in 5.12.0 are described in L<perl5120delta>.
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.12.2. If any
exist, they are bugs and reports are welcome.
=head1 Core Enhancements
=head2 C<keys>, C<values> work on arrays
You can now use the C<keys>, C<values> and C<each> builtin functions on arrays
(previously you could only use them on hashes). See L<perlfunc> for details.
This is actually a change introduced in perl 5.12.0, but it was missed from
that release's perldelta.
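A short sketch of the three builtins applied to an array follows. The
array contents and variable names are arbitrary; it needs perl 5.12 or
later.

```perl
use strict;
use warnings;

my @colors = qw(red green blue);

# keys on an array returns its indices, values returns its elements.
my @indices = keys @colors;
my @items   = values @colors;

# each returns (index, element) pairs in index order.
my @pairs;
while (my ($index, $color) = each @colors) {
    push @pairs, "$index=$color";
}

print "indices: @indices\n";
print "items:   @items\n";
print "pairs:   @pairs\n";
```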
=head1 Bug Fixes
"no VERSION" will now correctly deparse with B::Deparse, as will certain
constant expressions.
Module::Build should more reliably pass its tests under cygwin.
Lvalue subroutines are again able to return copy-on-write scalars. This
had been broken since version 5.10.0.
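For readers unfamiliar with the feature, this is a minimal sketch of an
lvalue subroutine. It shows the general mechanism only and does not try to
reproduce the copy-on-write case that was fixed; the hash and the
subroutine name C<mode> are made up.

```perl
use strict;
use warnings;

my %config = (mode => 'fast');

# An lvalue subroutine returns something that can be assigned to.
sub mode :lvalue { $config{mode} }

mode() = 'safe';    # assigns through the subroutine call

print "mode is now $config{mode}\n";
```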
=head1 Platform Specific Notes
=over 4
=item Solaris
A separate DTrace is now built for miniperl, which means that perl can be
compiled with -Dusedtrace on Solaris again.
=item VMS
A number of regressions on VMS have been fixed. In addition to minor cleanup
of questionable expressions in F<vms.c>, file permissions should no longer be
garbled by the PerlIO layer, and spurious record boundaries should no longer be
introduced by the PerlIO layer during output.
For more details and discussion on the latter, see:
    http://www.nntp.perl.org/group/perl.vmsperl/2010/11/msg15419.html
=item VOS
A few very small changes were made to the build process on VOS to better
support the platform. Longer-than-32-character filenames are now supported on
OpenVOS, and perl now builds properly without IPv6 support.
=back
=head1 Acknowledgements
Perl 5.12.3 represents approximately four months of development since
Perl 5.12.2 and contains approximately 2500 lines of changes across
54 files from 16 authors.
Perl continues to flourish into its third decade thanks to a vibrant
community of users and developers. The following people are known to
have contributed the improvements that became Perl 5.12.3:
Craig A. Berry, David Golden, David Leadbeater, Father Chrysostomos, Florian
Ragwitz, Jesse Vincent, Karl Williamson, Nick Johnston, Nicolas Kaiser, Paul
Green, Rafael Garcia-Suarez, Rainer Tammer, Ricardo Signes, Steffen Mueller,
Zsbán Ambrus, Ævar Arnfjörð Bjarmason
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://rt.perl.org/perlbug/ . There may also be
information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send
it to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes
all the core committers, who will be able
to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently
distributed on CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details
on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perlrebackslash - Perl Regular Expression Backslash Sequences and Escapes
=head1 DESCRIPTION
The top level documentation about Perl regular expressions
is found in L<perlre>.
This document describes all backslash and escape sequences. After
explaining the role of the backslash, it lists all the sequences that have
a special meaning in Perl regular expressions (in alphabetical order),
then describes each of them.
Most sequences are described in detail in different documents; the primary
purpose of this document is to have a quick reference guide describing all
backslash and escape sequences.
=head2 The backslash
In a regular expression, the backslash can perform one of two tasks:
it either takes away the special meaning of the character following it
(for instance, C<\|> matches a vertical bar, it's not an alternation),
or it is the start of a backslash or escape sequence.
The rules determining what it is are quite simple: if the character
following the backslash is an ASCII punctuation (non-word) character (that is,
anything that is not a letter, digit, or underscore), then the backslash just
takes away any special meaning of the character following it.
If the character following the backslash is an ASCII letter or an ASCII digit,
then the sequence may be special; if so, it's listed below. A few letters have
not been used yet, so escaping them with a backslash doesn't change them to be
special. A future version of Perl may assign a special meaning to them, so if
you have warnings turned on, Perl issues a warning if you use such a
sequence [1].
It is however guaranteed that backslash or escape sequences never have a
punctuation character following the backslash, not now, and not in a future
version of Perl 5. So it is safe to put a backslash in front of a non-word
character.
Note that the backslash itself is special; if you want to match a backslash,
you have to escape the backslash with a backslash: C</\\/> matches a single
backslash.
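The two rules above can be checked with a few matches. This is a small
sketch with made-up strings; each variable ends up as 1 when the pattern
matched and 0 when it did not.

```perl
use strict;
use warnings;

# \| is a literal vertical bar, so the whole string "a|b" matches.
my $literal_bar = ('a|b' =~ /^a\|b$/) ? 1 : 0;

# Unescaped, | is alternation: only "a" or "b" alone would match.
my $alternation = ('a|b' =~ /^(?:a|b)$/) ? 1 : 0;

# A backslash is matched by escaping it with another backslash.
my $backslash = ('C:\\temp' =~ /\\/) ? 1 : 0;

# Escaping a non-word character such as "." is always safe.
my $literal_dot = ('3x14' =~ /^3\.14$/) ? 1 : 0;

print "bar=$literal_bar alt=$alternation ",
      "backslash=$backslash dot=$literal_dot\n";
```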
=over 4
=item [1]
There is one exception. If you use an alphanumeric character as the
delimiter of your pattern (which you probably shouldn't do for readability
reasons), you have to escape the delimiter if you want to match
it. Perl won't warn then. See also L<perlop/Gory details of parsing
quoted constructs>.
=back
=head2 All the sequences and escapes
Those not usable within a bracketed character class (like C<[\da-z]>) are marked
as C<Not in [].>
 \000              Octal escape sequence. See also \o{}.
 \1                Absolute backreference. Not in [].
 \a                Alarm or bell.
 \A                Beginning of string. Not in [].
 \b{}, \b          Boundary. (\b is a backspace in []).
 \B{}, \B          Not a boundary. Not in [].
 \cX               Control-X.
 \d                Match any digit character.
 \D                Match any character that isn't a digit.
 \e                Escape character.
 \E                Turn off \Q, \L and \U processing. Not in [].
 \f                Form feed.
 \F                Foldcase till \E. Not in [].
 \g{}, \g1         Named, absolute or relative backreference.
                   Not in [].
 \G                Pos assertion. Not in [].
 \h                Match any horizontal whitespace character.
 \H                Match any character that isn't horizontal whitespace.
 \k{}, \k<>, \k''  Named backreference. Not in [].
 \K                Keep the stuff left of \K. Not in [].
 \l                Lowercase next character. Not in [].
 \L                Lowercase till \E. Not in [].
 \n                (Logical) newline character.
 \N                Match any character but newline. Not in [].
 \N{}              Named or numbered (Unicode) character or sequence.
 \o{}              Octal escape sequence.
 \p{}, \pP         Match any character with the given Unicode property.
 \P{}, \PP         Match any character without the given property.
 \Q                Quote (disable) pattern metacharacters till \E. Not
                   in [].
 \r                Return character.
 \R                Generic new line. Not in [].
 \s                Match any whitespace character.
 \S                Match any character that isn't a whitespace.
 \t                Tab character.
 \u                Titlecase next character. Not in [].
 \U                Uppercase till \E. Not in [].
 \v                Match any vertical whitespace character.
 \V                Match any character that isn't vertical whitespace
 \w                Match any word character.
 \W                Match any character that isn't a word character.
 \x{}, \x00        Hexadecimal escape sequence.
 \X                Unicode "extended grapheme cluster". Not in [].
 \z                End of string. Not in [].
 \Z                End of string. Not in [].
=head2 Character Escapes
=head3 Fixed characters
A handful of characters have a dedicated I<character escape>. The following
table shows them, along with their ASCII code points (in decimal and hex),
their ASCII name, the control escape on ASCII platforms and a short
description. (For EBCDIC platforms, see L<perlebcdic/OPERATOR DIFFERENCES>.)
 Seq.  Code Point  ASCII   Cntrl   Description.
       Dec    Hex
  \a     7     07    BEL    \cG    alarm or bell
  \b     8     08     BS    \cH    backspace [1]
  \e    27     1B    ESC    \c[    escape character
  \f    12     0C     FF    \cL    form feed
  \n    10     0A     LF    \cJ    line feed [2]
  \r    13     0D     CR    \cM    carriage return
  \t     9     09    TAB    \cI    tab
=over 4
=item [1]
C<\b> is the backspace character only inside a character class. Outside a
character class, C<\b> alone is a word-character/non-word-character
boundary, and C<\b{}> is some other type of boundary.
=item [2]
C<\n> matches a logical newline. Perl converts between C<\n> and your
OS's native newline character when reading from or writing to text files.
=back
=head4 Example
 $str =~ /\t/; # Matches if $str contains a (horizontal) tab.
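The code points in the table, and the two meanings of C<\b>, can be
checked with a short script. This is a sketch that assumes an ASCII
platform; the variable names are made up.

```perl
use strict;
use warnings;

# Each escape written twice: as a label and as the actual character.
my @escapes = (
    [ '\a', "\a" ], [ '\b', "\b" ], [ '\e', "\e" ], [ '\f', "\f" ],
    [ '\n', "\n" ], [ '\r', "\r" ], [ '\t', "\t" ],
);

my %code_point;
$code_point{ $_->[0] } = ord( $_->[1] ) for @escapes;

printf "%-3s %3d  %02X\n", $_->[0], ($code_point{ $_->[0] }) x 2
    for @escapes;

# Inside a character class, \b is the backspace character.
my $in_class = ("a\bc" =~ /a[\b]c/) ? 1 : 0;

# Outside a class, \b is a word boundary, so it cannot match the
# backspace character sitting between "a" and "c".
my $outside = ("a\bc" =~ /a\bc/) ? 1 : 0;

print "in class: $in_class, outside class: $outside\n";
```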
=head3 Control characters
C<\c> is used to denote a control character; the character following C<\c>
determines the value of the construct. For example the value of C<\cA> is
C<chr(1)>, and the value of C<\cb> is C<chr(2)>, etc.
The gory details are in L<perlop/"Regexp Quote-Like Operators">. A complete
list of what C<chr(1)>, etc. means for ASCII and EBCDIC platforms is in
L<perlebcdic/OPERATOR DIFFERENCES>.
Note that C<\c\> alone at the end of a regular expression (or double-quoted
string) is not valid. The backslash must be followed by another character.
That is, C<\c\I<X>> means C<chr(28) . 'I<X>'> for all characters I<X>.
To write platform-independent code, you must use C<\N{I<NAME>}> instead, like
C<\N{ESCAPE}> or C<\N{U+001B}>, see L<charnames>.
Mnemonic: I<c>ontrol character.
=head4 Example
 $str =~ /\cK/; # Matches if $str contains a vertical tab (control-K).
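A slightly longer sketch of the same idea, again assuming an ASCII
platform, checks the values mentioned above and shows the portable
C<\N{U+...}> spelling for the escape character. The sample strings are
made up.

```perl
use strict;
use warnings;

# \c followed by a letter gives that letter's control character.
my $soh = ord("\cA");    # control-A
my $stx = ord("\cb");    # a lower-case letter works the same way

# \cK in a pattern matches a vertical tab, written here as \x{0B}.
my $has_vt = ("line\x{0B}break" =~ /\cK/) ? 1 : 0;

# Portable spelling of ESC: by code point rather than by control letter.
my $has_esc = ("\e[0m" =~ /^\N{U+001B}/) ? 1 : 0;

print "soh=$soh stx=$stx vt=$has_vt esc=$has_esc\n";
```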
=head3 Named or numbered characters and character sequences
Unicode characters have a Unicode name and numeric code point (ordinal)
value. Use the
C<\N{}> construct to specify a character by either of these values.
Certain sequences of characters also have names.
To specify by name, the name of the character or character sequence goes
between the curly braces.
To specify a character by Unicode code point, use the form C<\N{U+I<code
point>}>, where I<code point> is a number in hexadecimal that gives the
code point that Unicode has assigned to the desired character. It is
customary but not required to use leading zeros to pad the number to 4
digits. Thus C<\N{U+0041}> means C<LATIN CAPITAL LETTER A>, and you will
rarely see it written without the two leading zeros. C<\N{U+0041}> means
"A" even on EBCDIC machines (where the ordinal value of "A" is not 0x41).
It is even possible to give your own names to characters and character
sequences. For details, see L<charnames>.
(There is an expanded internal form that you may see in debug output:
C<\N{U+I<code point>.I<code point>...}>.
The C<...> means any number of these I<code point>s separated by dots.
This represents the sequence formed by the characters. This is an internal
form only, subject to change, and you should not try to use it yourself.)
Mnemonic: I<N>amed character.
Note that a character or character sequence expressed as a named
or numbered character is considered a character without special
meaning by the regex engine, and will match "as is".
=head4 Example
$str =~ /\N{THAI CHARACTER SO SO}/; # Matches the Thai SO SO character
use charnames 'Cyrillic'; # Loads Cyrillic names.
$str =~ /\N{ZHE}\N{KA}/; # Match "ZHE" followed by "KA".
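The points above can be sketched as follows. This is a minimal illustration, assuming an ASCII platform and a Perl recent enough to accept C<\N{U+...}> inside a pattern; the variable names are hypothetical. It shows the same character given by name and by code point, and a named character matching literally even though it would otherwise be a metacharacter.

```perl
use strict;
use warnings;
use charnames ':full';    # explicit load; newer Perls load names on demand

# The same character, specified by name and by code point.
my $by_name  = "A" =~ /\A\N{LATIN CAPITAL LETTER A}\z/ ? 1 : 0;
my $by_code  = "A" =~ /\A\N{U+0041}\z/                 ? 1 : 0;
my $unpadded = "A" =~ /\A\N{U+41}\z/                   ? 1 : 0;  # zeros optional

# \N{U+002B} is "+", but it is matched "as is", not treated as a quantifier.
my $literal_plus   = "a+" =~ /\Aa\N{U+002B}\z/ ? 1 : 0;
my $not_quantifier = "aa" =~ /\Aa\N{U+002B}\z/ ? 1 : 0;
```

Here only C<$not_quantifier> is expected to be false, because the pattern requires a literal plus sign after the "a".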
=head3 Octal escapes
There are two forms of octal escapes. Each is used to specify a character by
its code point specified in octal notation.
One form, available starting in Perl 5.14 looks like C<\o{...}>, where the dots
represent one or more octal digits. It can be used for any Unicode character.
It was introduced to avoid the potential problems with the other form,
available in all Perls. That form consists of a backslash followed by three
octal digits. One problem with this form is that it can look exactly like an
old-style backreference (see
L</Disambiguation rules between old-style octal escapes and backreferences>
below.) You can avoid this by making the first of the three digits always a
zero, but that makes \077 the largest code point specifiable.
In some contexts, a backslash followed by two or even one octal digits may be
interpreted as an octal escape, sometimes with a warning, and because of some
bugs, sometimes with surprising results. Also, if you are creating a regex
out of smaller snippets concatenated together, and you use fewer than three
digits, the beginning of one snippet may be interpreted as adding digits to the
ending of the snippet before it. See L</Absolute referencing> for more
discussion and examples of the snippet problem.
Note that a character expressed as an octal escape is considered
a character without special meaning by the regex engine, and will match
"as is".
To summarize, the C<\o{}> form is always safe to use, and the other form is
safe to use for code points through \077 when you use exactly three digits to
specify them.
Mnemonic: I<0>ctal or I<o>ctal.
=head4 Examples (assuming an ASCII platform)
$str = "Perl";
$str =~ /\o{120}/; # Match, "\120" is "P".
$str =~ /\120/; # Same.
$str =~ /\o{120}+/; # Match, "\120" is "P",
# it's repeated at least once.
$str =~ /\120+/; # Same.
$str =~ /P\053/; # No match, "\053" is "+" and taken literally.
/\o{23073}/ # Black foreground, white background smiling face.
/\o{4801234567}/ # Raises a warning, and yields chr(4).
=head4 Disambiguation rules between old-style octal escapes and backreferences
Octal escapes of the C<\000> form outside of bracketed character classes
potentially clash with old-style backreferences (see L</Absolute referencing>
below). They both consist of a backslash followed by numbers. So Perl has to
use heuristics to determine whether it is a backreference or an octal escape.
Perl uses the following rules to disambiguate:
=over 4
=item 1
If the backslash is followed by a single digit, it's a backreference.
=item 2
If the first digit following the backslash is a 0, it's an octal escape.
=item 3
If the number following the backslash is N (in decimal), and Perl already
has seen N capture groups, Perl considers this a backreference. Otherwise,
it considers it an octal escape. If N has more than three digits, Perl
takes only the first three for the octal escape; the rest are matched as is.
my $pat = "(" x 999;
$pat .= "a";
$pat .= ")" x 999;
/^($pat)\1000$/; # Matches 'aa'; there are 1000 capture groups.
/^$pat\1000$/; # Matches 'a@0'; there are 999 capture groups
# and \1000 is seen as \100 (a '@') and a '0'.
=back
You can force a backreference interpretation always by using the C<\g{...}>
form. You can force an octal interpretation always by using the C<\o{...}>
form, or for numbers up through \077 (= 63 decimal), by using three digits,
beginning with a "0".
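The three rules can be checked with a small sketch. It assumes an ASCII platform and Perl 5.14 or later (for C<\o{...}>); the variable names are illustrative only.

```perl
use strict;
use warnings;

# Rule 1: a single digit after the backslash is a backreference.
my $rule1 = "aa" =~ /\A(a)\1\z/ ? 1 : 0;

# Rule 2: a leading 0 makes it an octal escape (\001 is chr(1)).
my $rule2 = "a\x01" =~ /\A(a)\001\z/ ? 1 : 0;

# Rule 3: only one capture group exists, so \101 cannot be a
# backreference; it is taken as octal 101, which is "A" in ASCII.
my $rule3 = "aA" =~ /\A(a)\101\z/ ? 1 : 0;

# The unambiguous spellings of each interpretation.
my $forced_backref = "aa" =~ /\A(a)\g{1}\z/   ? 1 : 0;
my $forced_octal   = "aA" =~ /\A(a)\o{101}\z/ ? 1 : 0;
```

All five matches succeed. The last two forms say the same thing without relying on how many capture groups happen to precede them.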
=head3 Hexadecimal escapes
Like octal escapes, there are two forms of hexadecimal escapes, but both start
with the sequence C<\x>. This is followed by either exactly two hexadecimal
digits forming a number, or a hexadecimal number of arbitrary length surrounded
by curly braces. The hexadecimal number is the code point of the character you
want to express.
Note that a character expressed as one of these escapes is considered a
character without special meaning by the regex engine, and will match
"as is".
Mnemonic: heI<x>adecimal.
=head4 Examples (assuming an ASCII platform)
$str = "Perl";
$str =~ /\x50/; # Match, "\x50" is "P".
$str =~ /\x50+/; # Match, "\x50" is "P", it is repeated at least once
$str =~ /P\x2B/; # No match, "\x2B" is "+" and taken literally.
/\x{2603}\x{2602}/ # Snowman with an umbrella.
# The Unicode character 2603 is a snowman,
# the Unicode character 2602 is an umbrella.
/\x{263B}/ # Black smiling face.
/\x{263b}/ # Same, the hex digits A - F are case insensitive.
=head2 Modifiers
A number of backslash sequences have to do with changing the character,
or characters following them. C<\l> will lowercase the character following
it, while C<\u> will uppercase (or, more accurately, titlecase) the
character following it. They provide functionality similar to the
functions C<lcfirst> and C<ucfirst>.
To uppercase or lowercase several characters, one might want to use
C<\L> or C<\U>, which will lowercase/uppercase all characters following
them, until either the end of the pattern or the next occurrence of
C<\E>, whichever comes first. They provide functionality similar to what
the functions C<lc> and C<uc> provide.
C<\Q> is used to quote (disable) pattern metacharacters, up to the next
C<\E> or the end of the pattern. C<\Q> adds a backslash to any character
that could have special meaning to Perl. In the ASCII range, it quotes
every character that isn't a letter, digit, or underscore. See
L<perlfunc/quotemeta> for details on what gets quoted for non-ASCII
code points. Using this ensures that any character between C<\Q> and
C<\E> will be matched literally, not interpreted as a metacharacter by
the regex engine.
C<\F> can be used to casefold all characters following, up to the next C<\E>
or the end of the pattern. It provides the functionality similar to
the C<fc> function.
Mnemonic: I<L>owercase, I<U>ppercase, I<F>old-case, I<Q>uotemeta, I<E>nd.
=head4 Examples
$sid = "sid";
$greg = "GrEg";
$miranda = "(Miranda)";
$str =~ /\u$sid/; # Matches 'Sid'
$str =~ /\L$greg/; # Matches 'greg'
$str =~ /\Q$miranda\E/; # Matches '(Miranda)', as if the pattern
# had been written as /\(Miranda\)/
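As a further sketch (the variables C<$name>, C<$greg> and C<$raw> are hypothetical), the following contrasts an interpolated string containing a metacharacter with and without C<\Q>, and anchors each pattern so that the effect of the case modifiers is visible:

```perl
use strict;
use warnings;

my $name = "sid";
my $greg = "GrEg";
my $raw  = "a.c";    # contains the metacharacter "."

# \u affects only the next character; \L applies up to \E.
my $titlecased = "Sid"  =~ /\A\u$name\z/   ? 1 : 0;
my $lowercased = "greg" =~ /\A\L$greg\E\z/ ? 1 : 0;

# Without \Q the dot is a wildcard; with \Q it must be a literal dot.
my $dot_is_wild    = "abc" =~ /\A$raw\z/     ? 1 : 0;
my $dot_is_literal = "abc" =~ /\A\Q$raw\E\z/ ? 1 : 0;
my $literal_match  = "a.c" =~ /\A\Q$raw\E\z/ ? 1 : 0;
```

Only C<$dot_is_literal> is false: once quoted, C<a.c> no longer matches C<abc>.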
=head2 Character classes
Perl regular expressions have a large range of character classes. Some of
the character classes are written as a backslash sequence. We will briefly
discuss those here; full details of character classes can be found in
L<perlrecharclass>.
C<\w> is a character class that matches any single I<word> character
(letters, digits, Unicode marks, and connector punctuation (like the
underscore)). C<\d> is a character class that matches any decimal
digit, while the character class C<\s> matches any whitespace character.
New in perl 5.10.0 are the classes C<\h> and C<\v> which match horizontal
and vertical whitespace characters.
The exact set of characters matched by C<\d>, C<\s>, and C<\w> varies
depending on various pragma and regular expression modifiers. It is
possible to restrict the match to the ASCII range by using the C</a>
regular expression modifier. See L<perlrecharclass>.
The uppercase variants (C<\W>, C<\D>, C<\S>, C<\H>, and C<\V>) are
character classes that match, respectively, any character that isn't a
word character, digit, whitespace, horizontal whitespace, or vertical
whitespace.
Mnemonics: I<w>ord, I<d>igit, I<s>pace, I<h>orizontal, I<v>ertical.
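A short sketch of these classes follows. It assumes Perl 5.14 or later, since it uses the C</a> modifier, and the variable names are illustrative. U+0663 is ARABIC-INDIC DIGIT THREE, a decimal digit outside the ASCII range.

```perl
use strict;
use warnings;

my $arabic_three = "\x{0663}";    # ARABIC-INDIC DIGIT THREE

# \d matches any Unicode decimal digit, unless /a restricts it to ASCII.
my $unicode_digit = $arabic_three =~ /\A\d\z/  ? 1 : 0;
my $ascii_digit   = $arabic_three =~ /\A\d\z/a ? 1 : 0;

# \h and \v split whitespace into horizontal and vertical.
my $tab_is_h     = "\t" =~ /\A\h\z/ ? 1 : 0;
my $newline_is_h = "\n" =~ /\A\h\z/ ? 1 : 0;
my $newline_is_v = "\n" =~ /\A\v\z/ ? 1 : 0;

# The uppercase variant is the complement.
my $newline_is_V = "\n" =~ /\A\V\z/ ? 1 : 0;
```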
=head3 Unicode classes
C<\pP> (where C<P> is a single letter) and C<\p{Property}> are used to
match a character that matches the given Unicode property; properties
include things like "letter", or "thai character". Capitalizing the
sequence to C<\PP> and C<\P{Property}> makes the sequence match a character
that doesn't match the given Unicode property. For more details, see
L<perlrecharclass/Backslash sequences> and
L<perlunicode/Unicode Character Properties>.
Mnemonic: I<p>roperty.
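For illustration, here is a minimal sketch using a few well-known properties (C<Lu> for uppercase letter, C<L> for letter, C<Thai> for the Thai script). The variable names are hypothetical; U+0E01 is THAI CHARACTER KO KAI.

```perl
use strict;
use warnings;

my $upper           = "A" =~ /\A\p{Lu}\z/ ? 1 : 0;   # uppercase letter
my $lower_not_upper = "a" =~ /\A\p{Lu}\z/ ? 1 : 0;   # "a" is not uppercase
my $short_form      = "a" =~ /\A\pL\z/    ? 1 : 0;   # single-letter form
my $negated         = "5" =~ /\A\PL\z/    ? 1 : 0;   # a digit is not a letter
my $thai            = "\x{0E01}" =~ /\A\p{Thai}\z/ ? 1 : 0;
```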
=head2 Referencing
If capturing parentheses are used in a regular expression, we can refer
to the part of the source string that was matched, and match exactly the
same thing. There are three ways of referring to such I<backreferences>:
absolutely, relatively, and by name.
=for later add link to perlrecapture
=head3 Absolute referencing
Either C<\gI<N>> (starting in Perl 5.10.0), or C<\I<N>> (old-style) where I<N>
is a positive (unsigned) decimal number of any length is an absolute reference
to a capturing group.
I<N> refers to the Nth set of parentheses, so C<\gI<N>> refers to whatever has
been matched by that set of parentheses. Thus C<\g1> refers to the first
capture group in the regex.
The C<\gI<N>> form can be equivalently written as C<\g{I<N>}>
which avoids ambiguity when building a regex by concatenating shorter
strings. Otherwise if you had a regex C<qr/$a$b/>, and C<$a> contained
C<"\g1">, and C<$b> contained C<"37">, you would get C</\g137/> which is
probably not what you intended.
In the C<\I<N>> form, I<N> must not begin with a "0", and there must be at
least I<N> capturing groups, or else I<N> is considered an octal escape
(but something like C<\18> is the same as C<\0018>; that is, the octal escape
C<"\001"> followed by a literal digit C<"8">).
Mnemonic: I<g>roup.
=head4 Examples
/(\w+) \g1/; # Finds a duplicated word, (e.g. "cat cat").
/(\w+) \1/; # Same thing; written old-style.
/(.)(.)\g2\g1/; # Match a four letter palindrome (e.g. "ABBA").
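The concatenation hazard described above can be sketched like this. The fragment names are hypothetical, and the pattern sources are kept as plain strings so that the joined text can be inspected.

```perl
use strict;
use warnings;

my $rest = '37';

# Without braces the digits run together into a reference to group 137.
my $ambiguous = '(\w)\g1' . $rest;      # '(\w)\g137'

# With braces the reference ends at the closing brace.
my $safe_src = '(\w)\g{1}' . $rest;     # '(\w)\g{1}37'
my $safe_re  = qr/\A$safe_src\z/;

my $matches = "aa37" =~ $safe_re ? 1 : 0;   # doubled letter, then "37"
my $rejects = "ab37" =~ $safe_re ? 1 : 0;   # letters differ
```

The ambiguous string is only built here, never compiled, because a reference to a group that does not exist is not what was intended.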
=head3 Relative referencing
C<\g-I<N>> (starting in Perl 5.10.0) is used for relative addressing. (It can
be written as C<\g{-I<N>}>.) It refers to the I<N>th group before the
C<\g{-I<N>}>.
The big advantage of this form is that it makes it much easier to write
patterns with references that can be interpolated in larger patterns,
even if the larger pattern also contains capture groups.
=head4 Examples
 /(A)        # Group 1
  (          # Group 2
    (B)      # Group 3
    \g{-1}   # Refers to group 3 (B)
    \g{-3}   # Refers to group 1 (A)
  )
 /x;         # Matches "ABBA".
my $qr = qr /(.)(.)\g{-2}\g{-1}/; # Matches 'abab', 'cdcd', etc.
/$qr$qr/ # Matches 'ababcdcd'.
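To make the advantage concrete, the sketch below embeds the same "doubled character" fragment in a larger pattern that adds a capture group in front of it. The names are illustrative. The relative form keeps pointing at its own group, while the absolute form ends up pointing at the new first group.

```perl
use strict;
use warnings;

my $doubled_rel = qr/(\w)\g{-1}/;   # refers to the group just before it
my $doubled_abs = qr/(\w)\g{1}/;    # refers to group 1 of the whole pattern

# Embedding adds a new group 1, "(x)", in front of each fragment.
my $embedded_rel = qr/\A(x)$doubled_rel\z/;
my $embedded_abs = qr/\A(x)$doubled_abs\z/;

my $rel_doubled = "xaa" =~ $embedded_rel ? 1 : 0;   # still "a" twice
my $abs_doubled = "xaa" =~ $embedded_abs ? 1 : 0;   # now wants "x" again
my $abs_shifted = "xax" =~ $embedded_abs ? 1 : 0;   # \g{1} is "(x)"
```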
=head3 Named referencing
C<\g{I<name>}> (starting in Perl 5.10.0) can be used to back refer to a
named capture group, dispensing completely with having to think about capture
buffer positions.
To be compatible with .NET regular expressions, C<\g{name}> may also be
written as C<\k{name}>, C<< \k<name> >> or C<\k'name'>.
To prevent any ambiguity, I<name> must not start with a digit nor contain a
hyphen.
=head4 Examples
/(?<word>\w+) \g{word}/ # Finds duplicated word, (e.g. "cat cat")
/(?<word>\w+) \k{word}/ # Same.
/(?<word>\w+) \k<word>/ # Same.
/(?<letter1>.)(?<letter2>.)\g{letter2}\g{letter1}/
# Match a four letter palindrome (e.g. "ABBA")
=head2 Assertions
Assertions are conditions that have to be true; they don't actually
match parts of the substring. There are six assertions that are written as
backslash sequences.
=over 4
=item \A
C<\A> only matches at the beginning of the string. If the C</m> modifier
isn't used, then C</\A/> is equivalent to C</^/>. However, if the C</m>
modifier is used, then C</^/> matches internal newlines, but the meaning
of C</\A/> isn't changed by the C</m> modifier. C<\A> matches at the beginning
of the string regardless of whether the C</m> modifier is used.
=item \z, \Z
C<\z> and C<\Z> match at the end of the string. If the C</m> modifier isn't
used, then C</\Z/> is equivalent to C</$/>; that is, it matches at the
end of the string, or one before the newline at the end of the string. If the
C</m> modifier is used, then C</$/> matches at internal newlines, but the
meaning of C</\Z/> isn't changed by the C</m> modifier. C<\Z> matches at
the end of the string (or just before a trailing newline) regardless of whether
the C</m> modifier is used.
C<\z> is just like C<\Z>, except that it does not match before a trailing
newline. C<\z> matches at the end of the string only, regardless of the
modifiers used, and not just before a newline. It is how to anchor the
match to the true end of the string under all conditions.
=item \G
C<\G> is usually used only in combination with the C</g> modifier. If the
C</g> modifier is used and the match is done in scalar context, Perl
remembers where in the source string the last match ended, and the next time,
it will start the match from where it ended the previous time.
C<\G> matches the point where the previous match on that string ended,
or the beginning of that string if there was no previous match.
=for later add link to perlremodifiers
Mnemonic: I<G>lobal.
=item \b{}, \b, \B{}, \B
C<\b{...}>, available starting in v5.22, matches a boundary (between two
characters, or before the first character of the string, or after the
final character of the string) based on the Unicode rules for the
boundary type specified inside the braces. The boundary
types are given a few paragraphs below. C<\B{...}> matches at any place
between characters where C<\b{...}> of the same type doesn't match.
C<\b> when not immediately followed by a C<"{"> matches at any place
between a word (something matched by C<\w>) and a non-word character
(C<\W>); C<\B> when not immediately followed by a C<"{"> matches at any
place between characters where C<\b> doesn't match. To get better
word matching of natural language text, see L</\b{wb}> below.
C<\b>
and C<\B> assume there's a non-word character before the beginning and after
the end of the source string; so C<\b> will match at the beginning (or end)
of the source string if the source string begins (or ends) with a word
character. Otherwise, C<\B> will match.
Do not use something like C<\b=head\d\b> and expect it to match the
beginning of a line. It can't, because for there to be a boundary before
the non-word "=", there must be a word character immediately previous.
All plain C<\b> and C<\B> boundary determinations look for word
characters alone, not for
non-word characters nor for string ends. It may help to understand how
C<\b> and C<\B> work by equating them as follows:
\b really means (?:(?<=\w)(?!\w)|(?<!\w)(?=\w))
\B really means (?:(?<=\w)(?=\w)|(?<!\w)(?!\w))
In contrast, C<\b{...}> and C<\B{...}> may or may not match at the
beginning and end of the line, depending on the boundary type. These
implement the Unicode default boundaries, specified in
L<http://www.unicode.org/reports/tr14/> and
L<http://www.unicode.org/reports/tr29/>.
The boundary types are:
=over
=item C<\b{gcb}> or C<\b{g}>
This matches a Unicode "Grapheme Cluster Boundary". (Actually Perl
always uses the improved "extended" grapheme cluster.) These are
explained below under L</C<\X>>. In fact, C<\X> is another way to get
the same functionality. It is equivalent to C</.+?\b{gcb}/>. Use
whichever is most convenient for your situation.
=item C<\b{lb}>
This matches according to the default Unicode Line Breaking Algorithm
(L<http://www.unicode.org/reports/tr14/>), as customized in that
document
(L<Example 7 of revision 35|http://www.unicode.org/reports/tr14/tr14-35.html#Example7>)
for better handling of numeric expressions.
This is suitable for many purposes, but the L<Unicode::LineBreak> module,
available on CPAN, provides many more features, including
customization.
=item C<\b{sb}>
This matches a Unicode "Sentence Boundary". This is an aid to parsing
natural language sentences. It gives good, but imperfect results. For
example, it thinks that "Mr. Smith" is two sentences. More details are
at L<http://www.unicode.org/reports/tr29/>. Note also that it thinks
that anything matching L</\R> (except form feed and vertical tab) is a
sentence boundary. C<\b{sb}> works with text designed for
word-processors which wrap lines
automatically for display, but hard-coded line boundaries are considered
to be essentially the ends of text blocks (paragraphs really), and hence
the ends of sentences. C<\b{sb}> doesn't do well with text containing
embedded newlines, like the source text of the document you are reading.
Such text needs to be preprocessed to get rid of the line separators
before looking for sentence boundaries. Some people view this as a bug
in the Unicode standard, and this behavior is quite subject to change in
future Perl versions.
=item C<\b{wb}>
This matches a Unicode "Word Boundary", but tailored to Perl
expectations. This gives better (though not
perfect) results for natural language processing than plain C<\b>
(without braces) does. For example, it understands that apostrophes can
be in the middle of words and that parentheses aren't (see the examples
below). More details are at L<http://www.unicode.org/reports/tr29/>.
The current Unicode definition of a Word Boundary matches between every
white space character. Perl tailors this, starting in version 5.24, to
generally not break up spans of white space, just as plain C<\b> has
always functioned. This allows C<\b{wb}> to be a drop-in replacement for
C<\b>, but with generally better results for natural language
processing. (The exception to this tailoring is when a span of white
space is immediately followed by something like U+0303, COMBINING TILDE.
If the final space character in the span is a horizontal white space, it
is broken out so that it attaches instead to the combining character.
To be precise, if a span of white space that ends in a horizontal space
is immediately followed by a character whose Word Boundary property
value is "Extend", "Format" or "ZWJ", the boundary between the
final horizontal space character and the rest of the span matches
C<\b{wb}>. In all other cases the boundary between two white space
characters matches C<\B{wb}>.)
=back
It is important to realize when you use these Unicode boundaries,
that you are taking a risk that a future version of Perl which contains
a later version of the Unicode Standard will not work precisely the same
way as it did when your code was written. These rules are not
considered stable and have been somewhat more subject to change than the
rest of the Standard. Unicode reserves the right to change them at
will, and Perl reserves the right to update its implementation to
Unicode's new rules. In the past, some changes have been because new
characters have been added to the Standard which have different
characteristics than all previous characters, so new rules are
formulated for handling them. These should not cause any backward
compatibility issues. But some changes have changed the treatment of
existing characters because the Unicode Technical Committee has decided
that the change is warranted for whatever reason. This could be to fix
a bug, or because they think better results are obtained with the new
rule.
It is also important to realize that these are default boundary
definitions, and that implementations may wish to tailor the results for
particular purposes and locales. For example, some languages, such as
Japanese and Thai, require dictionary lookup to determine word
boundaries.
Mnemonic: I<b>oundary.
=back
=head4 Examples
"cat" =~ /\Acat/; # Match.
"cat" =~ /cat\Z/; # Match.
"cat\n" =~ /cat\Z/; # Match.
"cat\n" =~ /cat\z/; # No match.
"cat" =~ /\bcat\b/; # Matches.
"cats" =~ /\bcat\b/; # No match.
"cat" =~ /\bcat\B/; # No match.
"cats" =~ /\bcat\B/; # Match.
while ("cat dog" =~ /(\w+)/g) {
print $1; # Prints 'catdog'
}
while ("cat dog" =~ /\G(\w+)/g) {
print $1; # Prints 'cat'
}
my $s = "He said, \"Is pi 3.14? (I'm not sure).\"";
print join("|", $s =~ m/ ( .+? \b ) /xg), "\n";
print join("|", $s =~ m/ ( .+? \b{wb} ) /xg), "\n";
prints
He| |said|, "|Is| |pi| |3|.|14|? (|I|'|m| |not| |sure
He| |said|,| |"|Is| |pi| |3.14|?| |(|I'm| |not| |sure|)|.|"
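One common use of C<\G> is a small tokenizer that consumes a string piece by piece. The sketch below is an illustration only; the token labels and the input are made up. It combines C<\G> with the C</gc> modifiers so that a failed alternative leaves the current position untouched.

```perl
use strict;
use warnings;

my $text = "abc123def";
my @tokens;

pos($text) = 0;                       # start scanning at the beginning
while (pos($text) < length $text) {
    # \G anchors each attempt at the point where the last match ended;
    # /c keeps that position when an attempt fails.
    if    ($text =~ /\G([a-z]+)/gc) { push @tokens, "WORD:$1" }
    elsif ($text =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1"  }
    else                            { last }    # unrecognized input
}
```

After the loop, C<@tokens> holds C<WORD:abc>, C<NUM:123> and C<WORD:def>, in that order.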
=head2 Misc
Here we document the backslash sequences that don't fall in one of the
categories above. These are:
=over 4
=item \K
This appeared in perl 5.10.0. Anything matched left of C<\K> is
not included in C<$&>, and will not be replaced if the pattern is
used in a substitution. This lets you write C<s/PAT1 \K PAT2/REPL/x>
instead of C<s/(PAT1) PAT2/${1}REPL/x> or C<s/(?<=PAT1) PAT2/REPL/x>.
Mnemonic: I<K>eep.
=item \N
This feature, available starting in v5.12, matches any character
that is B<not> a newline. It is a short-hand for writing C<[^\n]>, and is
identical to the C<.> metasymbol, except under the C</s> flag, which changes
the meaning of C<.>, but not C<\N>.
Note that C<\N{...}> can mean a
L<named or numbered character
|/Named or numbered characters and character sequences>.
Mnemonic: Complement of I<\n>.
=item \R
X<\R>
C<\R> matches a I<generic newline>; that is, anything considered a
linebreak sequence by Unicode. This includes all characters matched by
C<\v> (vertical whitespace), and the multi character sequence C<"\x0D\x0A">
(carriage return followed by a line feed, sometimes called the network
newline; it's the end of line sequence used in Microsoft text files opened
in binary mode). C<\R> is equivalent to C<< (?>\x0D\x0A|\v) >>. (The
reason it doesn't backtrack is that the sequence is considered
inseparable. That means that
"\x0D\x0A" =~ /^\R\x0A$/ # No match
fails, because the C<\R> matches the entire string, and won't backtrack
to match just the C<"\x0D">.) Since
C<\R> can match a sequence of more than one character, it cannot be put
inside a bracketed character class; C</[\R]/> is an error; use C<\v>
instead. C<\R> was introduced in perl 5.10.0.
Note that this does not respect any locale that might be in effect; it
matches according to the platform's native character set.
Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>,
and more importantly because Unicode recommends such a regular expression
metacharacter, and suggests C<\R> as its notation.
=item \X
X<\X>
This matches a Unicode I<extended grapheme cluster>.
C<\X> matches quite well what normal (non-Unicode-programmer) usage
would consider a single character. As an example, consider a G with some sort
of diacritic mark, such as an arrow. There is no such single character in
Unicode, but one can be composed by using a G followed by a Unicode "COMBINING
UPWARDS ARROW BELOW", and would be displayed by Unicode-aware software as if it
were a single character.
The match is greedy and non-backtracking, so that the cluster is never
broken up into smaller components.
See also L<C<\b{gcb}>|/\b{}, \b, \B{}, \B>.
Mnemonic: eI<X>tended Unicode character.
=back
=head4 Examples
$str =~ s/foo\Kbar/baz/g; # Change any 'bar' following a 'foo' to 'baz'
$str =~ s/(.)\K\g1//g; # Delete duplicated characters.
"\n" =~ /^\R$/; # Match, \n is a generic newline.
"\r" =~ /^\R$/; # Match, \r is a generic newline.
"\r\n" =~ /^\R$/; # Match, \r\n is a generic newline.
"P\x{307}" =~ /^\X$/ # \X matches a P with a dot above.
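The escapes in this section can also be exercised together in a short sketch. The strings and variable names are illustrative, and Perl 5.12 or later is assumed because of C<\N>.

```perl
use strict;
use warnings;

# \K: keep "price: " and replace only the digits that follow it.
(my $updated = "price: 100 USD") =~ s/price: \K\d+/200/;

# \R: split on any generic newline, treating "\r\n" as a single break.
my @lines = split /\R/, "a\r\nb\nc\rd";

# \N never matches a newline, even under /s, whereas "." then does.
my $dot_under_s = "a\nb" =~ /\A.+\z/s   ? 1 : 0;
my $N_under_s   = "a\nb" =~ /\A\N+\z/s  ? 1 : 0;

# \X: "P" plus COMBINING DOT ABOVE is one cluster, "x" is another.
my $clusters = () = "P\x{307}x" =~ /\X/g;
```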
=cut
=encoding utf8
=head1 NAME
perl5100delta - what is new for perl 5.10.0
=head1 DESCRIPTION
This document describes the differences between the 5.8.8 release and
the 5.10.0 release.
Many of the bug fixes in 5.10.0 were already seen in the 5.8.X maintenance
releases; they are not duplicated here and are documented in the set of
man pages named perl58[1-8]?delta.
=head1 Core Enhancements
=head2 The C<feature> pragma
The C<feature> pragma is used to enable new syntax that would break Perl's
backwards-compatibility with older releases of the language. It's a lexical
pragma, like C<strict> or C<warnings>.
Currently the following new features are available: C<switch> (adds a
switch statement), C<say> (adds a C<say> built-in function), and C<state>
(adds a C<state> keyword for declaring "static" variables). Those
features are described in their own sections of this document.
The C<feature> pragma is also implicitly loaded when you require a minimal
perl version (with the C<use VERSION> construct) greater than, or equal
to, 5.9.5. See L<feature> for details.
=head2 New B<-E> command-line switch
B<-E> is equivalent to B<-e>, but it implicitly enables all
optional features (like C<use feature ":5.10">).
=head2 Defined-or operator
A new operator C<//> (defined-or) has been implemented.
The following expression:
$a // $b
is merely equivalent to
defined $a ? $a : $b
and the statement
$c //= $d;
can now be used instead of
$c = $d unless defined $c;
The C<//> operator has the same precedence and associativity as C<||>.
Special care has been taken to ensure that this operator Does What You Mean
while not breaking old code, but some edge cases involving the empty
regular expression may now parse differently. See L<perlop> for
details.
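The practical difference from C<||> shows up with values that are defined but false. A minimal sketch, with hypothetical variable names:

```perl
use strict;
use warnings;

my $zero = 0;
my $unset;

my $with_or  = $zero  || 10;    # 10: "||" tests truth, and 0 is false
my $with_dor = $zero  // 10;    # 0:  "//" tests definedness, and 0 is defined
my $fallback = $unset // 10;    # 10: undef falls through to the default

# //= only assigns when the left side is undefined.
my %opt = (retries => 0);
$opt{retries} //= 3;            # stays 0
$opt{timeout} //= 30;           # becomes 30
```

This is why C<//> is usually the better choice for supplying defaults when 0 or the empty string are legitimate values.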
=head2 Switch and Smart Match operator
Perl 5 now has a switch statement. It's available when C<use feature
'switch'> is in effect. This feature introduces three new keywords,
C<given>, C<when>, and C<default>:
    given ($foo) {
        when (/^abc/) { $abc = 1; }
        when (/^def/) { $def = 1; }
        when (/^xyz/) { $xyz = 1; }
        default { $nothing = 1; }
    }
A more complete description of how Perl matches the switch variable
against the C<when> conditions is given in L<perlsyn/"Switch statements">.
This kind of match is called I<smart match>, and it's also possible to use
it outside of switch statements, via the new C<~~> operator. See
L<perlsyn/"Smart matching in detail">.
This feature was contributed by Robin Houston.
=head2 Regular expressions
=over 4
=item Recursive Patterns
It is now possible to write recursive patterns without using the C<(??{})>
construct. This new way is more efficient, and in many cases easier to
read.
Each capturing parenthesis can now be treated as an independent pattern
that can be entered by using the C<(?PARNO)> syntax (C<PARNO> standing for
"parenthesis number"). For example, the following pattern will match
nested balanced angle brackets:
    /
     ^                      # start of line
     (                      # start capture buffer 1
        <                   # match an opening angle bracket
        (?:                 # match one of:
            (?>             # don't backtrack over the inside of this group
                [^<>]+      # one or more non angle brackets
            )               # end non backtracking group
        |                   # ... or ...
            (?1)            # recurse to bracket 1 and try it again
        )*                  # 0 or more times.
        >                   # match a closing angle bracket
     )                      # end capture buffer one
     $                      # end of line
    /x
PCRE users should note that Perl's recursive regex feature allows
backtracking into a recursed pattern, whereas in PCRE the recursion is
atomic or "possessive" in nature. As in the example above, you can
add (?>) to control this selectively. (Yves Orton)
=item Named Capture Buffers
It is now possible to name capturing parentheses in a pattern and refer to
the captured contents by name. The naming syntax is C<< (?<NAME>....) >>.
It's possible to backreference to a named buffer with the C<< \k<NAME> >>
syntax. In code, the new magical hashes C<%+> and C<%-> can be used to
access the contents of the capture buffers.
Thus, to replace all doubled chars with a single copy, one could write
s/(?<letter>.)\k<letter>/$+{letter}/g
Only buffers with defined contents will be "visible" in the C<%+> hash, so
it's possible to do something like
foreach my $name (keys %+) {
print "content of buffer '$name' is $+{$name}\n";
}
The C<%-> hash is a bit more complete, since it will contain array refs
holding values from all capture buffers similarly named, if there should
be many of them.
C<%+> and C<%-> are implemented as tied hashes through the new module
C<Tie::Hash::NamedCapture>.
Users exposed to the .NET regex engine will find that the perl
implementation differs in that the numerical ordering of the buffers
is sequential, and not "unnamed first, then named". Thus in the pattern
/(A)(?<B>B)(C)(?<D>D)/
$1 will be 'A', $2 will be 'B', $3 will be 'C' and $4 will be 'D' and not
$1 is 'A', $2 is 'C' and $3 is 'B' and $4 is 'D' that a .NET programmer
would expect. This is considered a feature. :-) (Yves Orton)
=item Possessive Quantifiers
Perl now supports the "possessive quantifier" syntax of the "atomic match"
pattern. Basically a possessive quantifier matches as much as it can and never
gives any back. Thus it can be used to control backtracking. The syntax is
similar to non-greedy matching, except instead of using a '?' as the modifier
the '+' is used. Thus C<?+>, C<*+>, C<++>, C<{min,max}+> are now legal
quantifiers. (Yves Orton)
=item Backtracking control verbs
The regex engine now supports a number of special-purpose backtrack
control verbs: (*THEN), (*PRUNE), (*MARK), (*SKIP), (*COMMIT), (*FAIL)
and (*ACCEPT). See L<perlre> for their descriptions. (Yves Orton)
=item Relative backreferences
A new syntax C<\g{N}> or C<\gN> where "N" is a decimal integer allows a
safer form of back-reference notation as well as allowing relative
backreferences. This should make it easier to generate and embed patterns
that contain backreferences. See L<perlre/"Capture buffers">. (Yves Orton)
=item C<\K> escape
The functionality of Jeff Pinyan's module Regexp::Keep has been added to
the core. In regular expressions you can now use the special escape C<\K>
as a way to do something like floating length positive lookbehind. It is
also useful in substitutions like:
s/(foo)bar/$1/g
that can now be converted to
s/foo\Kbar//g
which is much more efficient. (Yves Orton)
=item Vertical and horizontal whitespace, and linebreak
Regular expressions now recognize the C<\v> and C<\h> escapes that match
vertical and horizontal whitespace, respectively. C<\V> and C<\H>
logically match their complements.
C<\R> matches a generic linebreak, that is, vertical whitespace, plus
the multi-character sequence C<"\x0D\x0A">.
=item Optional pre-match and post-match captures with the /p flag
There is a new flag C</p> for regular expressions. Using this
makes the engine preserve a copy of the part of the matched string before
the matching substring to the new special variable C<${^PREMATCH}>, the
part after the matching substring to C<${^POSTMATCH}>, and the matched
substring itself to C<${^MATCH}>.
Perl is still able to store these substrings to the special variables
C<$`>, C<$'>, C<$&>, but using these variables anywhere in the program
adds a penalty to all regular expression matches, whereas if you use
the C</p> flag and the new special variables instead, you pay only for
the regular expressions where the flag is used.
For more detail on the new variables, see L<perlvar>; for the use of
the regular expression flag, see L<perlop> and L<perlre>.
=back
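Several of the regular expression additions listed above can be tried in one short sketch. The input strings and variable names are illustrative only; the patterns are compact versions of the ones described in the items.

```perl
use strict;
use warnings;

# Recursive pattern: (?1) re-enters capture group 1.
my $balanced  = qr/\A(<(?:(?>[^<>]+)|(?1))*>)\z/;
my $nested_ok = "<a<b>c>" =~ $balanced ? 1 : 0;
my $unclosed  = "<a<b>c"  =~ $balanced ? 1 : 0;

# Named capture: replace every doubled character with a single copy.
(my $collapsed = "bookkeeper") =~ s/(?<letter>.)\k<letter>/$+{letter}/g;

# Possessive quantifier: a++ takes every "a" and never gives one back.
my $greedy     = "aaa" =~ /\Aa+a/  ? 1 : 0;
my $possessive = "aaa" =~ /\Aa++a/ ? 1 : 0;

# /p: the pieces around the match, without using $`, $& and $'.
my ($pre, $match, $post);
if ("hello world" =~ /o w/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}
```

The possessive match fails where the greedy one succeeds, because the trailing C<a> in the pattern has nothing left to match.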
=head2 C<say()>
say() is a new built-in, only available when C<use feature 'say'> is in
effect, that is similar to print(), but that implicitly appends a newline
to the printed string. See L<perlfunc/say>. (Robin Houston)
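A minimal sketch of the difference, writing to an in-memory filehandle so that the result can be inspected (the handle and buffer names are hypothetical):

```perl
use strict;
use warnings;
use feature 'say';

my $buffer = '';
open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";

say   {$fh} "hello";    # newline appended automatically
print {$fh} "world";    # no newline added

close $fh;
```

Afterwards C<$buffer> contains C<"hello\nworld">.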
=head2 Lexical C<$_>
The default variable C<$_> can now be lexicalized, by declaring it like
any other lexical variable, with a simple
my $_;
The operations that default on C<$_> will use the lexically-scoped
version of C<$_> when it exists, instead of the global C<$_>.
In a C<map> or a C<grep> block, if C<$_> was previously my'ed, then the
C<$_> inside the block is lexical as well (and scoped to the block).
In a scope where C<$_> has been lexicalized, you can still have access to
the global version of C<$_> by using C<$::_>, or, more simply, by
overriding the lexical declaration with C<our $_>. (Rafael Garcia-Suarez)
=head2 The C<_> prototype
A new prototype character has been added. C<_> is equivalent to C<$> but
defaults to C<$_> if the corresponding argument isn't supplied (both C<$>
and C<_> denote a scalar). Due to the optional nature of the argument,
you can only use it at the end of a prototype, or before a semicolon.
This has a small incompatible consequence: the prototype() function has
been adjusted to return C<_> for some built-ins in appropriate cases (for
example, C<prototype('CORE::rmdir')>). (Rafael Garcia-Suarez)
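The following sketch shows a user-defined sub with the C<_> prototype. The name shout() is hypothetical and exists only for illustration.

```perl
use strict;
use warnings;

# shout() is a hypothetical helper; the "_" prototype makes its single
# scalar argument default to $_ when the caller omits it.
sub shout(_) {
    my ($text) = @_;
    return uc $text;
}

my $explicit = shout('hello');              # argument supplied
my @implicit = map { shout() } qw(a b);     # falls back to $_
```

The sub has to be declared before its callers are compiled, as with any prototype.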
=head2 UNITCHECK blocks
C<UNITCHECK>, a new special code block, has been introduced in addition to
C<BEGIN>, C<CHECK>, C<INIT> and C<END>.
C<CHECK> and C<INIT> blocks, while useful for some specialized purposes,
are always executed at the transition between the compilation and the
execution of the main program, and thus are useless whenever code is
loaded at runtime. On the other hand, C<UNITCHECK> blocks are executed
just after the unit which defined them has been compiled. See L<perlmod>
for more information. (Alex Gough)
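A minimal sketch of the timing, assuming a string eval as the compilation unit (the array name @events is made up for the example):

```perl
use strict;
use warnings;

our @events;

# A string eval is its own compilation unit, so its UNITCHECK block fires
# as soon as that unit has been compiled, before the unit's body runs.
eval q{
    push @events, 'body runs';
    UNITCHECK { push @events, 'UNITCHECK runs' }
    1;
} or die $@;
```

A C<CHECK> or C<INIT> block in the same position would come too late to run, since the main program has already started executing.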
=head2 New Pragma, C<mro>
A new pragma, C<mro> (for Method Resolution Order), has been added. It
lets you switch, on a per-class basis, the algorithm that perl uses to
find inherited methods in case of a multiple inheritance hierarchy. The
default MRO hasn't changed (DFS, for Depth First Search). Another MRO is
available: the C3 algorithm. See L<mro> for more information.
(Brandon Black)
Note that, due to changes in the implementation of class hierarchy search,
code that used to undef the C<*ISA> glob will most probably break. Anyway,
undef'ing C<*ISA> had the side-effect of removing the magic on the @ISA
array and should not have been done in the first place. Also, the
cache C<*::ISA::CACHE::> no longer exists; to force reset the @ISA cache,
you now need to use the C<mro> API, or more simply to assign to @ISA
(e.g. with C<@ISA = @ISA>).
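To make the difference between the two orders concrete, here is a sketch with a hypothetical diamond hierarchy (the class names Top, Left, Right and Diamond are invented for the example):

```perl
use strict;
use warnings;
use mro;    # loads the mro API; the default order stays DFS

# Hypothetical diamond: Diamond -> (Left, Right) -> Top
{ package Top;     sub name { 'Top' } }
{ package Left;    our @ISA = ('Top'); }
{ package Right;   our @ISA = ('Top'); sub name { 'Right' } }
{ package Diamond; our @ISA = ('Left', 'Right'); }

my @dfs = @{ mro::get_linear_isa('Diamond', 'dfs') };
my @c3  = @{ mro::get_linear_isa('Diamond', 'c3') };

my $dfs_name = Diamond->name;     # DFS reaches Top before Right
mro::set_mro('Diamond', 'c3');    # switch this class only
my $c3_name  = Diamond->name;     # C3 visits Right before Top
```

Writing C<use mro 'c3';> inside the class has the same effect as the set_mro() call.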
=head2 readdir() may return a "short filename" on Windows
The readdir() function may return a "short filename" when the long
filename contains characters outside the ANSI codepage. Similarly
Cwd::cwd() may return a short directory name, and glob() may return short
names as well. On the NTFS file system these short names can always be
represented in the ANSI codepage. This will not be true for all other file
system drivers; e.g. the FAT filesystem stores short filenames in the OEM
codepage, so some files on FAT volumes remain inaccessible through the
ANSI APIs.
Similarly, $^X, @INC, and $ENV{PATH} are preprocessed at startup to make
sure all paths are valid in the ANSI codepage (if possible).
The Win32::GetLongPathName() function now returns the UTF-8 encoded
correct long file name instead of using replacement characters to force
the name into the ANSI codepage. The new Win32::GetANSIPathName()
function can be used to turn a long pathname into a short one only if the
long one cannot be represented in the ANSI codepage.
Many other functions in the C<Win32> module have been improved to accept
UTF-8 encoded arguments. Please see L<Win32> for details.
=head2 readpipe() is now overridable
The built-in function readpipe() is now overridable. Overriding it also
overrides its operator counterpart, C<qx//> (a.k.a. C<``>).
Moreover, it now defaults to C<$_> if no argument is provided. (Rafael
Garcia-Suarez)
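One possible use is stubbing out external commands. The sketch below installs a hypothetical override through C<CORE::GLOBAL::readpipe> that records the command instead of running it; the recording behaviour is an illustration, not something the core provides.

```perl
use strict;
use warnings;

our @commands_seen;

# Install the override at compile time so later qx// and backticks see it.
# The stub records the command instead of running it (illustrative only).
BEGIN {
    no warnings 'once';
    *CORE::GLOBAL::readpipe = sub {
        my ($command) = @_;
        push @commands_seen, $command;
        return "stubbed: $command";
    };
}

my $from_qx       = qx/echo hello/;
my $from_function = readpipe('echo again');
```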
=head2 Default argument for readline()
readline() now defaults to C<*ARGV> if no argument is provided. (Rafael
Garcia-Suarez)
=head2 state() variables
A new class of variables has been introduced. State variables are similar
to C<my> variables, but are declared with the C<state> keyword in place of
C<my>. They're visible only in their lexical scope, but their value is
persistent: unlike C<my> variables, they're not undefined at scope entry,
but retain their previous value. (Rafael Garcia-Suarez, Nicholas Clark)
To use state variables, one needs to enable them by using
    use feature 'state';
or by using the C<-E> command-line switch in one-liners.
See L<perlsub/"Persistent Private Variables">.
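A minimal sketch of a persistent counter; next_id() is a hypothetical name used only here.

```perl
use strict;
use warnings;
use feature 'state';

# next_id() is a hypothetical counter; $count is initialised once and
# keeps its value between calls.
sub next_id {
    state $count = 0;
    return ++$count;
}

my @ids = (next_id(), next_id(), next_id());
```

With C<my> in place of C<state>, every call would start again from zero.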
=head2 Stacked filetest operators
As a new form of syntactic sugar, it's now possible to stack up filetest
operators. You can now write C<-f -w -x $file> in a row to mean
C<-x $file && -w _ && -f _>. See L<perlfunc/-X>.
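The stacked form might be wrapped in a helper like the one below. The helper name and the use of C<File::Temp> to obtain something to test are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# is_plain_readable() is a hypothetical helper. The stacked form is
# shorthand for: -r $path && -f _
sub is_plain_readable {
    my ($path) = @_;
    return (-f -r $path) ? 1 : 0;
}

my ($fh, $file) = tempfile(UNLINK => 1);
my $dir         = tempdir(CLEANUP => 1);

my $file_ok = is_plain_readable($file);
my $dir_ok  = is_plain_readable($dir);
```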
=head2 UNIVERSAL::DOES()
The C<UNIVERSAL> class has a new method, C<DOES()>. It has been added to
solve semantic problems with the C<isa()> method. C<isa()> checks for
inheritance, while C<DOES()> has been designed to be overridden when
module authors use other types of relations between classes (in addition
to inheritance). (chromatic)
See L<< UNIVERSAL/"$obj->DOES( ROLE )" >>.
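As a sketch of the intended use, a class that fulfils a role without inheriting from anything could override C<DOES()> as follows. The class name Report and the role name Serializable are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical class that fulfils a "Serializable" role by some means
# other than inheritance, and says so through DOES().
{
    package Report;
    sub new { return bless {}, shift }
    sub DOES {
        my ($self, $role) = @_;
        return 1 if $role eq 'Serializable';
        return $self->SUPER::DOES($role);    # fall back to the default check
    }
}

my $report = Report->new;
my $does   = $report->DOES('Serializable') ? 1 : 0;
my $isa    = $report->isa('Serializable')  ? 1 : 0;
```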
=head2 Formats
Formats were improved in several ways. A new field, C<^*>, can be used for
variable-width, one-line-at-a-time text. Null characters are now handled
correctly in picture lines. Using C<@#> and C<~~> together will now
produce a compile-time error, as those format fields are incompatible.
L<perlform> has been improved, and miscellaneous bugs fixed.
=head2 Byte-order modifiers for pack() and unpack()
There are two new byte-order modifiers, C<E<gt>> (big-endian) and C<E<lt>>
(little-endian), that can be appended to most pack() and unpack() template
characters and groups to force a certain byte-order for that type or group.
See L<perlfunc/pack> and L<perlpacktut> for details.
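A short sketch of the modifiers applied to the C<S> (16-bit) and C<L> (32-bit) template characters:

```perl
use strict;
use warnings;

my $big    = unpack 'H*', pack('S>', 0x1234);    # high byte first: '1234'
my $little = unpack 'H*', pack('S<', 0x1234);    # low byte first:  '3412'

# For 32-bit unsigned values the modifiers line up with the older N and V.
my $same_as_n = pack('L>', 1) eq pack('N', 1) ? 1 : 0;
my $same_as_v = pack('L<', 1) eq pack('V', 1) ? 1 : 0;
```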
=head2 C<no VERSION>
You can now use C<no> followed by a version number to specify that you
want to use a version of perl older than the specified one.
=head2 C<chdir>, C<chmod> and C<chown> on filehandles
C<chdir>, C<chmod> and C<chown> can now work on filehandles as well as
filenames, if the system supports respectively C<fchdir>, C<fchmod> and
C<fchown>, thanks to a patch provided by Gisle Aas.
=head2 OS groups
C<$(> and C<$)> now return groups in the order in which the OS returns them,
thanks to Gisle Aas. This wasn't previously the case.
=head2 Recursive sort subs
You can now use recursive subroutines with sort(), thanks to Robin Houston.
=head2 Exceptions in constant folding
The constant folding routine is now wrapped in an exception handler, and
if folding throws an exception (such as attempting to evaluate 0/0), perl
now retains the current optree, rather than aborting the whole program.
Without this change, programs would not compile if they had expressions that
happened to generate exceptions, even though those expressions were in code
that could never be reached at runtime. (Nicholas Clark, Dave Mitchell)
=head2 Source filters in @INC
It's possible to enhance the mechanism of subroutine hooks in @INC by
adding a source filter on top of the filehandle opened and returned by the
hook. This feature was planned a long time ago, but wasn't quite working
until now. See L<perlfunc/require> for details. (Nicholas Clark)
=head2 New internal variables
=over 4
=item C<${^RE_DEBUG_FLAGS}>
This variable controls what debug flags are in effect for the regular
expression engine when running under C<use re "debug">. See L<re> for
details.
=item C<${^CHILD_ERROR_NATIVE}>
This variable gives the native status returned by the last pipe close,
backtick command, successful call to wait() or waitpid(), or from the
system() operator. See L<perlvar> for details. (Contributed by Gisle Aas.)
=item C<${^RE_TRIE_MAXBUF}>
See L</"Trie optimisation of literal string alternations">.
=item C<${^WIN32_SLOPPY_STAT}>
See L</"Sloppy stat on Windows">.
=back
=head2 Miscellaneous
C<unpack()> now defaults to unpacking the C<$_> variable.
C<mkdir()> without arguments now defaults to C<$_>.
The internal dump output has been improved, so that non-printable characters
such as newline and backspace are output in C<\x> notation, rather than
octal.
The B<-C> option can no longer be used on the C<#!> line. It wasn't
working there anyway, since the standard streams are already set up
at this point in the execution of the perl interpreter. You can use
binmode() instead to get the desired behaviour.
=head2 UCD 5.0.0
The copy of the Unicode Character Database included in Perl 5 has
been updated to version 5.0.0.
=head2 MAD
MAD, which stands for I<Miscellaneous Attribute Decoration>, is a
still-in-development work leading to a Perl 5 to Perl 6 converter. To
enable it, it's necessary to pass the argument C<-Dmad> to Configure. The
obtained perl isn't binary compatible with a regular perl 5.10, and has
space and speed penalties; moreover not all regression tests still pass
with it. (Larry Wall, Nicholas Clark)
=head2 kill() on Windows
On Windows platforms, C<kill(-9, $pid)> now kills a process tree.
(On Unix, this delivers the signal to all processes in the same process
group.)
=head1 Incompatible Changes
=head2 Packing and UTF-8 strings
The semantics of pack() and unpack() regarding UTF-8-encoded data has been
changed. Processing is now by default character per character instead of
byte per byte on the underlying encoding. Notably, code that used things
like C<pack("a*", $string)> to see through the encoding of a string will now
simply get back the original $string. Packed strings can also get upgraded
during processing when you store upgraded characters. You can get the old
behaviour by using C<use bytes>.
To be consistent with pack(), the C<C0> in unpack() templates indicates
that the data is to be processed in character mode, i.e. character by
character; on the contrary, C<U0> in unpack() indicates UTF-8 mode, where
the packed string is processed in its UTF-8-encoded Unicode form on a byte
by byte basis. This is reversed with regard to perl 5.8.X, but now consistent
between pack() and unpack().
Moreover, C<C0> and C<U0> can also be used in pack() templates to specify
respectively character and byte modes.
C<C0> and C<U0> in the middle of a pack or unpack format now switch to the
specified encoding mode, honoring parens grouping. Previously, parens were
ignored.
Also, there is a new pack() character format, C<W>, which is intended to
replace the old C<C>. C<C> is kept for unsigned chars coded as bytes in
the string's internal representation. C<W> represents unsigned (logical)
character values, which can be greater than 255. It is therefore more
robust when dealing with potentially UTF-8-encoded data (as C<C> will wrap
values outside the range 0..255, and not respect the string encoding).
In practice, that means that pack formats are now encoding-neutral, except
C<C>.
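The contrast between C<W> and C<C> can be sketched like this (the code points are arbitrary sample values):

```perl
use strict;
use warnings;

my $wide   = pack 'W', 0x263A;       # one character with code point 0x263A
my ($back) = unpack 'W', $wide;      # round-trips to the same value

my $wrapped;
{
    no warnings 'pack';              # C warns when it has to wrap a value
    $wrapped = pack 'C', 300;        # 300 does not fit in a byte: 300 % 256 == 44
}
```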
For consistency, C<A> in unpack() format now trims all Unicode whitespace
from the end of the string. Before perl 5.9.2, it used to strip only the
classical ASCII space characters.
=head2 Byte/character count feature in unpack()
A new unpack() template character, C<".">, returns the number of bytes or
characters (depending on the selected encoding mode, see above) read so far.
=head2 The C<$*> and C<$#> variables have been removed
C<$*>, which was deprecated in favor of the C</s> and C</m> regexp
modifiers, has been removed.
The deprecated C<$#> variable (output format for numbers) has been
removed.
Two new severe warnings, C<$#/$* is no longer supported>, have been added.
=head2 substr() lvalues are no longer fixed-length
The lvalues returned by the three argument form of substr() used to be a
"fixed length window" on the original string. In some cases this could
cause surprising action at a distance or other undefined behaviour. Now the
length of the window adjusts itself to the length of the string assigned to
it.
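A sketch of the adjusting window, aliasing the lvalue through a C<for> loop (the variable names are made up):

```perl
use strict;
use warnings;

my $text = '1234';
my @snapshots;
for (substr($text, 1, 2)) {      # $_ aliases the substr lvalue
    $_ = 'a';
    push @snapshots, $text;      # the two-character window shrank to "a"
    $_ = 'xyz';
    push @snapshots, $text;      # the one-character window grew to "xyz"
}
```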
=head2 Parsing of C<-f _>
The identifier C<_> is now forced to be a bareword after a filetest
operator. This solves a number of misparsing issues when a global C<_>
subroutine is defined.
=head2 C<:unique>
The C<:unique> attribute has been made a no-op, since its current
implementation was fundamentally flawed and not threadsafe.
=head2 Effect of pragmas in eval
The compile-time value of the C<%^H> hint variable can now propagate into
code evaluated with eval(""). This makes it more useful for implementing
lexical pragmas.
As a side-effect of this, the overloaded-ness of constants now propagates
into eval("").
=head2 chdir FOO
A bareword argument to chdir() is now recognized as a file handle.
Earlier releases interpreted the bareword as a directory name.
(Gisle Aas)
=head2 Handling of .pmc files
An old feature of perl is that before C<require> or C<use> look for a
file with a F<.pm> extension, they first look for a similar filename
with a F<.pmc> extension. If this file is found, it is loaded in
place of any potentially existing file ending in a F<.pm> extension.
Previously, F<.pmc> files were loaded only if more recent than the
matching F<.pm> file. Starting with 5.9.4, they'll always be loaded if
they exist.
=head2 $^V is now a C<version> object instead of a v-string
$^V can still be used with the C<%vd> format in printf, but any
character-level operations will now access the string representation
of the C<version> object and not the ordinals of a v-string.
Expressions like C<< substr($^V, 0, 2) >> or C<< split //, $^V >>
no longer work and must be rewritten.
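One way to rewrite such expressions, sketched here under the assumption that only the numeric components are needed, is to go through the C<%vd> format first:

```perl
use strict;
use warnings;

# Format the version object once, then work on the plain string.
my $dotted = sprintf '%vd', $^V;
my ($revision, $major, $minor) = split /\./, $dotted;

my $is_object = ref($^V) eq 'version' ? 1 : 0;
```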
=head2 @- and @+ in patterns
The special arrays C<@-> and C<@+> are no longer interpolated in regular
expressions. (Sadahiro Tomoyuki)
=head2 $AUTOLOAD can now be tainted
If you call a subroutine by a tainted name, and if it defers to an
AUTOLOAD function, then $AUTOLOAD will be (correctly) tainted.
(Rick Delaney)
=head2 Tainting and printf
When perl is run under taint mode, C<printf()> and C<sprintf()> will now
reject any tainted format argument. (Rafael Garcia-Suarez)
=head2 undef and signal handlers
Undefining or deleting a signal handler via C<undef $SIG{FOO}> is now
equivalent to setting it to C<'DEFAULT'>. (Rafael Garcia-Suarez)
=head2 strictures and dereferencing in defined()
C<use strict 'refs'> was ignoring taking a hard reference in an argument
to defined(), as in:
    use strict 'refs';
    my $x = 'foo';
    if (defined $$x) {...}
This now correctly produces the run-time error C<Can't use string as a
SCALAR ref while "strict refs" in use>.
C<defined @$foo> and C<defined %$bar> are now also subject to C<strict
'refs'> (that is, C<$foo> and C<$bar> shall be proper references there.)
(C<defined(@foo)> and C<defined(%bar)> are discouraged constructs anyway.)
(Nicholas Clark)
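The run-time error can be observed by trapping it with a block eval, as in this sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my $name = 'foo';

# The symbolic dereference inside defined() is no longer exempt.
my $survived = eval {
    my $unused = defined $$name;
    1;
};
my $error = $@;
```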
=head2 C<(?p{})> has been removed
The regular expression construct C<(?p{})>, which was deprecated in perl
5.8, has been removed. Use C<(??{})> instead. (Rafael Garcia-Suarez)
=head2 Pseudo-hashes have been removed
Support for pseudo-hashes has been removed from Perl 5.9. (The C<fields>
pragma remains here, but uses an alternate implementation.)
=head2 Removal of the bytecode compiler and of perlcc
C<perlcc>, the byteloader and the supporting modules (B::C, B::CC,
B::Bytecode, etc.) are no longer distributed with the perl sources. Those
experimental tools have never worked reliably, and, due to the lack of
volunteers to keep them in line with the perl interpreter developments, it
was decided to remove them instead of shipping a broken version of those.
The last version of those modules can be found with perl 5.9.4.
However the B compiler framework stays supported in the perl core, as with
the more useful modules it has permitted (among others, B::Deparse and
B::Concise).
=head2 Removal of the JPL
The JPL (Java-Perl Lingo) has been removed from the perl sources tarball.
=head2 Recursive inheritance detected earlier
Perl will now immediately throw an exception if you modify any package's
C<@ISA> in such a way that it would cause recursive inheritance.
Previously, the exception would not occur until Perl attempted to make
use of the recursive inheritance while resolving a method or doing a
C<$foo-E<gt>isa($bar)> lookup.
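A sketch of the earlier detection, using two hypothetical packages that are made to inherit from each other:

```perl
use strict;
use warnings;
no warnings 'once';

@Loop::First::ISA = ('Loop::Second');

# Closing the cycle is rejected at assignment time, not at method lookup.
my $accepted = eval { @Loop::Second::ISA = ('Loop::First'); 1 };
my $error    = $@;

@Loop::Second::ISA = ();    # undo the cycle so later lookups stay sane
```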
=head2 warnings::enabled and warnings::warnif changed to favor users of modules
The behaviour in 5.10.x favors the person using the module, whereas the
behaviour in 5.8.x favored the module writer.
Assume the following code:
    main calls Foo::Bar::baz()
    Foo::Bar inherits from Foo::Base
    Foo::Bar::baz() calls Foo::Base::_bazbaz()
    Foo::Base::_bazbaz() calls: warnings::warnif('substr', 'some warning
    message');
On 5.8.x, the code warns when Foo::Bar contains C<use warnings;>. It does
not matter whether Foo::Base or main have warnings enabled; to disable the
warning, one has to modify Foo::Bar.
On 5.10.0 and newer, the code warns when main contains C<use warnings;>.
It does not matter whether Foo::Base or Foo::Bar have warnings enabled; to
disable the warning, one has to modify main.
=head1 Modules and Pragmata
=head2 Upgrading individual core modules
Even more core modules are now also available separately through the
CPAN. If you wish to update one of these modules, you don't need to
wait for a new perl release. From within the cpan shell, running the
'r' command will report on modules with upgrades available. See
C<perldoc CPAN> for more information.
=head2 Pragmata Changes
=over 4
=item C<feature>
The new pragma C<feature> is used to enable new features that might break
old code. See L</"The C<feature> pragma"> above.
=item C<mro>
This new pragma lets you change the algorithm used to resolve inherited
methods. See L</"New Pragma, C<mro>"> above.
=item Scoping of the C<sort> pragma
The C<sort> pragma is now lexically scoped. Its effect used to be global.
=item Scoping of C<bignum>, C<bigint>, C<bigrat>
The three numeric pragmas C<bignum>, C<bigint> and C<bigrat> are now
lexically scoped. (Tels)
=item C<base>
The C<base> pragma now warns if a class tries to inherit from itself.
(Curtis "Ovid" Poe)
=item C<strict> and C<warnings>
C<strict> and C<warnings> will now complain loudly if they are loaded via
incorrect casing (as in C<use Strict;>). (Johan Vromans)
=item C<version>
The C<version> module provides support for version objects.
=item C<warnings>
The C<warnings> pragma doesn't load C<Carp> anymore. That means that code
that used C<Carp> routines without having loaded it at compile time might
need to be adjusted; typically, the following (faulty) code won't work
anymore, and will require parentheses to be added after the function name:
    use warnings;
    require Carp;
    Carp::confess 'argh';
=item C<less>
C<less> now does something useful (or at least it tries to). In fact, it
has been turned into a lexical pragma. So, in your modules, you can now
test whether your users have requested to use less CPU, or less memory,
less magic, or maybe even less fat. See L<less> for more. (Joshua ben
Jore)
=back
=head2 New modules
=over 4
=item *
C<encoding::warnings>, by Audrey Tang, is a module to emit warnings
whenever an ASCII character string containing high-bit bytes is implicitly
converted into UTF-8. It's a lexical pragma since Perl 5.9.4; on older
perls, its effect is global.
=item *
C<Module::CoreList>, by Richard Clamp, is a small handy module that tells
you what versions of core modules ship with any versions of Perl 5. It
comes with a command-line frontend, C<corelist>.
=item *
C<Math::BigInt::FastCalc> is an XS-enabled, and thus faster, version of
C<Math::BigInt::Calc>.
=item *
C<Compress::Zlib> is an interface to the zlib compression library. It
comes with a bundled version of zlib, so having a working zlib is not a
prerequisite to install it. It's used by C<Archive::Tar> (see below).
=item *
C<IO::Zlib> is an C<IO::>-style interface to C<Compress::Zlib>.
=item *
C<Archive::Tar> is a module to manipulate C<tar> archives.
=item *
C<Digest::SHA>, a module used to calculate many types of SHA digests,
has been included for SHA support in the CPAN module.
=item *
C<ExtUtils::CBuilder> and C<ExtUtils::ParseXS> have been added.
=item *
C<Hash::Util::FieldHash>, by Anno Siegel, has been added. This module
provides support for I<field hashes>: hashes that maintain an association
of a reference with a value, in a thread-safe garbage-collected way.
Such hashes are useful to implement inside-out objects.
=item *
C<Module::Build>, by Ken Williams, has been added. It's an alternative to
C<ExtUtils::MakeMaker> to build and install perl modules.
=item *
C<Module::Load>, by Jos Boumans, has been added. It provides a single
interface to load Perl modules and F<.pl> files.
=item *
C<Module::Loaded>, by Jos Boumans, has been added. It's used to mark
modules as loaded or unloaded.
=item *
C<Package::Constants>, by Jos Boumans, has been added. It's a simple
helper to list all constants declared in a given package.
=item *
C<Win32API::File>, by Tye McQueen, has been added (for Windows builds).
This module provides low-level access to Win32 system API calls for
files/dirs.
=item *
C<Locale::Maketext::Simple>, needed by CPANPLUS, is a simple wrapper around
C<Locale::Maketext::Lexicon>. Note that C<Locale::Maketext::Lexicon> isn't
included in the perl core; the behaviour of C<Locale::Maketext::Simple>
gracefully degrades when the latter isn't present.
=item *
C<Params::Check> implements a generic input parsing/checking mechanism. It
is used by CPANPLUS.
=item *
C<Term::UI> simplifies the task of asking questions at a terminal prompt.
=item *
C<Object::Accessor> provides an interface to create per-object accessors.
=item *
C<Module::Pluggable> is a simple framework to create modules that accept
pluggable sub-modules.
=item *
C<Module::Load::Conditional> provides simple ways to query and possibly
load installed modules.
=item *
C<Time::Piece> provides an object oriented interface to time functions,
overriding the built-ins localtime() and gmtime().
=item *
C<IPC::Cmd> helps to find and run external commands, possibly
interactively.
=item *
C<File::Fetch> provides a simple generic file fetching mechanism.
=item *
C<Log::Message> and C<Log::Message::Simple> are used by the log facility
of C<CPANPLUS>.
=item *
C<Archive::Extract> is a generic archive extraction mechanism
for F<.tar> (plain, gzipped or bzipped) or F<.zip> files.
=item *
C<CPANPLUS> provides an API and a command-line tool to access the CPAN
mirrors.
=item *
C<Pod::Escapes> provides utilities that are useful in decoding Pod
EE<lt>...E<gt> sequences.
=item *
C<Pod::Simple> is now the backend for several of the Pod-related modules
included with Perl.
=back
=head2 Selected Changes to Core Modules
=over 4
=item C<Attribute::Handlers>
C<Attribute::Handlers> can now report the caller's file and line number.
(David Feldman)
All interpreted attributes are now passed as array references. (Damian
Conway)
=item C<B::Lint>
C<B::Lint> is now based on C<Module::Pluggable>, and so can be extended
with plugins. (Joshua ben Jore)
=item C<B>
It's now possible to access the lexical pragma hints (C<%^H>) by using the
method B::COP::hints_hash(). It returns a C<B::RHE> object, which in turn
can be used to get a hash reference via the method B::RHE::HASH(). (Joshua
ben Jore)
=item C<Thread>
As the old 5005thread threading model has been removed, in favor of the
ithreads scheme, the C<Thread> module is now a compatibility wrapper, to
be used in old code only. It has been removed from the default list of
dynamic extensions.
=back
=head1 Utility Changes
=over 4
=item perl -d
The Perl debugger can now save all debugger commands for sourcing later;
notably, it can now emulate stepping backwards, by restarting and
rerunning all bar the last command from a saved command history.
It can also display the parent inheritance tree of a given class, with the
C<i> command.
=item ptar
C<ptar> is a pure perl implementation of C<tar> that comes with
C<Archive::Tar>.
=item ptardiff
C<ptardiff> is a small utility used to generate a diff between the contents
of a tar archive and a directory tree. Like C<ptar>, it comes with
C<Archive::Tar>.
=item shasum
C<shasum> is a command-line utility, used to print or to check SHA
digests. It comes with the new C<Digest::SHA> module.
=item corelist
The C<corelist> utility is now installed with perl (see L</"New modules">
above).
=item h2ph and h2xs
C<h2ph> and C<h2xs> have been made more robust with regard to
"modern" C code.
C<h2xs> implements a new option C<--use-xsloader> to force use of
C<XSLoader> even in backwards compatible modules.
The handling of authors' names that had apostrophes has been fixed.
Any enums with negative values are now skipped.
=item perlivp
C<perlivp> no longer checks for F<*.ph> files by default. Use the new C<-a>
option to run I<all> tests.
=item find2perl
C<find2perl> now assumes C<-print> as a default action. Previously, it
needed to be specified explicitly.
Several bugs have been fixed in C<find2perl>, regarding C<-exec> and
C<-eval>. Also the options C<-path>, C<-ipath> and C<-iname> have been
added.
=item config_data
C<config_data> is a new utility that comes with C<Module::Build>. It
provides a command-line interface to the configuration of Perl modules
that use Module::Build's framework of configurability (that is,
C<*::ConfigData> modules that contain local configuration information for
their parent modules.)
=item cpanp
C<cpanp>, the CPANPLUS shell, has been added. (C<cpanp-run-perl>, a
helper for CPANPLUS operation, has been added too, but isn't intended for
direct use).
=item cpan2dist
C<cpan2dist> is a new utility that comes with CPANPLUS. It's a tool to
create distributions (or packages) from CPAN modules.
=item pod2html
The output of C<pod2html> has been enhanced to be more customizable via
CSS. Some formatting problems were also corrected. (Jari Aalto)
=back
=head1 New Documentation
The L<perlpragma> manpage documents how to write one's own lexical
pragmas in pure Perl (something that is possible starting with 5.9.4).
The new L<perlglossary> manpage is a glossary of terms used in the Perl
documentation, technical and otherwise, kindly provided by O'Reilly Media,
Inc.
The L<perlreguts> manpage, courtesy of Yves Orton, describes internals of the
Perl regular expression engine.
The L<perlreapi> manpage describes the interface to the perl interpreter
used to write pluggable regular expression engines (by Ævar Arnfjörð
Bjarmason).
The L<perlunitut> manpage is a tutorial for programming with Unicode and
string encodings in Perl, courtesy of Juerd Waalboer.
A new manual page, L<perlunifaq> (the Perl Unicode FAQ), has been added
(Juerd Waalboer).
The L<perlcommunity> manpage gives a description of the Perl community
on the Internet and in real life. (Edgar "Trizor" Bering)
The L<CORE> manual page documents the C<CORE::> namespace. (Tels)
The long-existing feature of C</(?{...})/> regexps setting C<$_> and pos()
is now documented.
=head1 Performance Enhancements
=head2 In-place sorting
Sorting arrays in place (C<@a = sort @a>) is now optimized to avoid
making a temporary copy of the array.
Likewise, C<reverse sort ...> is now optimized to sort in reverse,
avoiding the generation of a temporary intermediate list.
=head2 Lexical array access
Access to elements of lexical arrays via a numeric constant between 0 and
255 is now faster. (This used to be only the case for global arrays.)
=head2 XS-assisted SWASHGET
Some pure-perl code that perl was using to retrieve Unicode properties and
transliteration mappings has been reimplemented in XS.
=head2 Constant subroutines
The interpreter internals now support a far more memory efficient form of
inlineable constants. Storing a reference to a constant value in a symbol
table is equivalent to a full typeglob referencing a constant subroutine,
but using about 400 bytes less memory. This proxy constant subroutine is
automatically upgraded to a real typeglob with subroutine if necessary.
The approach taken is analogous to the existing space optimisation for
subroutine stub declarations, which are stored as plain scalars in place
of the full typeglob.
Several of the core modules have been converted to use this feature for
their system dependent constants - as a result C<use POSIX;> now takes about
200K less memory.
=head2 C<PERL_DONT_CREATE_GVSV>
The new compilation flag C<PERL_DONT_CREATE_GVSV>, introduced as an option
in perl 5.8.8, is turned on by default in perl 5.9.3. It prevents perl
from creating an empty scalar with every new typeglob. See L<perl589delta>
for details.
=head2 Weak references are cheaper
Weak reference creation is now I<O(1)> rather than I<O(n)>, courtesy of
Nicholas Clark. Weak reference deletion remains I<O(n)>, but if deletion only
happens at program exit, it may be skipped completely.
=head2 sort() enhancements
Salvador Fandiño provided improvements to reduce the memory usage of C<sort>
and to speed up some cases.
=head2 Memory optimisations
Several internal data structures (typeglobs, GVs, CVs, formats) have been
restructured to use less memory. (Nicholas Clark)
=head2 UTF-8 cache optimisation
The UTF-8 caching code is now more efficient, and used more often.
(Nicholas Clark)
=head2 Sloppy stat on Windows
On Windows, perl's stat() function normally opens the file to determine
the link count and update attributes that may have been changed through
hard links. Setting ${^WIN32_SLOPPY_STAT} to a true value speeds up
stat() by not performing this operation. (Jan Dubois)
=head2 Regular expressions optimisations
=over 4
=item Engine de-recursivised
The regular expression engine is no longer recursive, meaning that
patterns that used to overflow the stack will either die with useful
explanations, or run to completion, which, since they were able to blow
the stack before, will likely take a very long time to happen. If you were
experiencing the occasional stack overflow (or segfault) and upgrade to
discover that now perl apparently hangs instead, look for a degenerate
regex. (Dave Mitchell)
=item Single char char-classes treated as literals
Classes of a single character are now treated the same as if the character
had been used as a literal, meaning that code that uses char-classes as an
escaping mechanism will see a speedup. (Yves Orton)
=item Trie optimisation of literal string alternations
Alternations, where possible, are optimised into more efficient matching
structures. String literal alternations are merged into a trie and are
matched simultaneously. This means that instead of O(N) time for matching
N alternations at a given point, the new code performs in O(1) time.
A new special variable, ${^RE_TRIE_MAXBUF}, has been added to fine-tune
this optimization. (Yves Orton)
B<Note:> Much code exists that works around perl's historic poor
performance on alternations. Often the tricks used to do so will disable
the new optimisations. Hopefully the utility modules used for this purpose
will be educated about these new optimisations.
=item Aho-Corasick start-point optimisation
When a pattern starts with a trie-able alternation and there aren't
better optimisations available, the regex engine will use Aho-Corasick
matching to find the start point. (Yves Orton)
=back
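The kind of pattern the trie item above describes is a plain alternation of literal strings, written without the older workarounds. The sketch below builds one from a hypothetical keyword list. Whether the engine actually applies the trie to a given pattern is not checked here; C<use re 'debug'> can be used to inspect that.

```perl
use strict;
use warnings;

# Hypothetical keyword list; quotemeta keeps each entry a literal string.
my @keywords    = qw(foo foobar food bar baz);
my $alternation = join '|', map { quotemeta($_) } @keywords;
my $keyword_re  = qr/\b(?:$alternation)\b/;

my @hits = grep { /$keyword_re/ } ('eat food', 'nothing here', 'baz!');
```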
=head1 Installation and Configuration Improvements
=head2 Configuration improvements
=over 4
=item C<-Dusesitecustomize>
Run-time customization of @INC can be enabled by passing the
C<-Dusesitecustomize> flag to Configure. When enabled, this will make perl
run F<$sitelibexp/sitecustomize.pl> before anything else. This script can
then be set up to add additional entries to @INC.
=item Relocatable installations
There is now Configure support for creating a relocatable perl tree. If
you Configure with C<-Duserelocatableinc>, then the paths in @INC (and
everything else in %Config) can be optionally located via the path of the
perl executable.
That means that, if the string C<".../"> is found at the start of any
path, it's substituted with the directory of $^X. So, the relocation can
be configured on a per-directory basis, although the default with
C<-Duserelocatableinc> is that everything is relocated. The initial
install is done to the original configured prefix.
=item strlcat() and strlcpy()
The configuration process now detects whether strlcat() and strlcpy() are
available. When they are not available, perl's own version is used (from
Russ Allbery's public domain implementation). Various places in the perl
interpreter now use them. (Steve Peters)
=item C<d_pseudofork> and C<d_printf_format_null>
A new configuration variable, available as C<$Config{d_pseudofork}> in
the L<Config> module, has been added to distinguish real fork() support
from the emulated pseudofork used on Windows platforms.
A new configuration variable, C<d_printf_format_null>, has been added,
to see if printf-like formats are allowed to be NULL.
=item Configure help
C<Configure -h> has been extended with the most commonly used options.
=back
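As an illustration of how the C<d_pseudofork> and C<d_printf_format_null>
variables described above might be consulted at run time, here is a minimal
sketch. It assumes only the core L<Config> module; the variable name
C<$fork_kind> and the wording of the messages are made up for this example.

```perl
use strict;
use warnings;
use Config;

# Config values are typically 'define' when a feature is present and
# undef (or empty) when it is not, so plain truth tests are enough.
my $fork_kind = $Config{d_fork}       ? "real fork()"
              : $Config{d_pseudofork} ? "emulated fork (pseudofork)"
              :                         "no fork support";

my $null_format = $Config{d_printf_format_null} ? "yes" : "no";

print "fork support: $fork_kind\n";
print "printf-like formats may be NULL: $null_format\n";
```

A script that relies on copy-on-write semantics or on real process IDs could
use a check like this to choose a different strategy on pseudofork platforms.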
=head2 Compilation improvements
=over 4
=item Parallel build
Parallel makes should work properly now, although there may still be problems
if C<make test> is instructed to run in parallel.
=item Borland's compilers support
Building with Borland's compilers on Win32 should work more smoothly. In
particular, Steve Hay has worked to sidestep many warnings emitted by their
compilers and at least one C compiler internal error.
=item Static build on Windows
Perl extensions on Windows now can be statically built into the Perl DLL.
Also, it's now possible to build a C<perl-static.exe> that doesn't depend
on the Perl DLL on Win32. See the Win32 makefiles for details.
(Vadim Konovalov)
=item ppport.h files
All F<ppport.h> files in the XS modules bundled with perl are now
autogenerated at build time. (Marcus Holland-Moritz)
=item C++ compatibility
Efforts have been made to make perl and the core XS modules compilable
with various C++ compilers (although the situation is not perfect with
some of the compilers on some of the platforms tested.)
=item Support for Microsoft 64-bit compiler
Support for building perl with Microsoft's 64-bit compiler has been
improved. (ActiveState)
=item Visual C++
Perl can now be compiled with Microsoft Visual C++ 2005 (and 2008 Beta 2).
=item Win32 builds
All win32 builds (MS-Win, WinCE) have been merged and cleaned up.
=back
=head2 Installation improvements
=over 4
=item Module auxiliary files
README files and changelogs for CPAN modules bundled with perl are no
longer installed.
=back
=head2 New Or Improved Platforms
Perl has been reported to work on Symbian OS. See L<perlsymbian> for more
information.
Many improvements have been made towards making Perl work correctly on
z/OS.
Perl has been reported to work on DragonFlyBSD and MidnightBSD.
Perl has also been reported to work on NexentaOS
( http://www.gnusolaris.org/ ).
The VMS port has been improved. See L<perlvms>.
Support for Cray XT4 Catamount/Qk has been added. See
F<hints/catamount.sh> in the source code distribution for more
information.
Vendor patches have been merged for RedHat and Gentoo.
DynaLoader::dl_unload_file() now works on Windows.
=head1 Selected Bug Fixes
=over 4
=item strictures in regexp-eval blocks
C<strict> wasn't in effect in regexp-eval blocks (C</(?{...})/>).
=item Calling CORE::require()
CORE::require() and CORE::do() were always parsed as require() and do()
when they were overridden. This is now fixed.
=item Subscripts of slices
You can now use a non-arrowed form for chained subscripts after a list
slice, like in:
({foo => "bar"})[0]{foo}
This used to be a syntax error; a C<< -> >> was required.
=item C<no warnings 'category'> works correctly with -w
Previously, when running with warnings enabled globally via C<-w>, selectively
disabling a specific warning category would actually turn off all warnings.
This is now fixed; C<no warnings 'io';> now turns off only the warnings in the
C<io> class.
=item threads improvements
Several memory leaks in ithreads were closed. Also, ithreads were made
less memory-intensive.
C<threads> is now a dual-life module, also available on CPAN. It has been
expanded in many ways. A kill() method is available for thread signalling.
One can get thread status, or the list of running or joinable threads.
A new C<< threads->exit() >> method is used to exit from the application
(this is the default for the main thread) or from the current thread only
(this is the default for all other threads). On the other hand, the exit()
built-in now always causes the whole application to terminate. (Jerry
D. Hedden)
=item chr() and negative values
chr() on a negative value now gives C<\x{FFFD}>, the Unicode replacement
character, unless the C<bytes> pragma is in effect, in which case the low
eight bits of the value are used.
=item PERL5SHELL and tainting
On Windows, the PERL5SHELL environment variable is now checked for
taintedness. (Rafael Garcia-Suarez)
=item Using *FILE{IO}
C<stat()> and C<-X> filetests now treat *FILE{IO} filehandles like *FILE
filehandles. (Steve Peters)
=item Overloading and reblessing
Overloading now works when references are reblessed into another class.
Internally, this has been implemented by moving the flag for "overloading"
from the reference to the referent, which logically is where it should
always have been. (Nicholas Clark)
=item Overloading and UTF-8
A few bugs related to UTF-8 handling with objects that have
stringification overloaded have been fixed. (Nicholas Clark)
=item eval memory leaks fixed
Traditionally, C<eval 'syntax error'> has leaked badly. Many (but not all)
of these leaks have now been eliminated or reduced. (Dave Mitchell)
=item Random device on Windows
In previous versions, perl would read the file F</dev/urandom> if it
existed when seeding its random number generator. That file is unlikely
to exist on Windows, and if it did would probably not contain appropriate
data, so perl no longer tries to read it on Windows. (Alex Davies)
=item PERLIO_DEBUG
The C<PERLIO_DEBUG> environment variable no longer has any effect for
setuid scripts and for scripts run with B<-T>.
Moreover, with a thread-enabled perl, using C<PERLIO_DEBUG> could lead to
an internal buffer overflow. This has been fixed.
=item PerlIO::scalar and read-only scalars
PerlIO::scalar will now prevent writing to read-only scalars. Moreover,
seek() is now supported with PerlIO::scalar-based filehandles, the
underlying string being zero-filled as needed. (Rafael, Jarkko Hietaniemi)
=item study() and UTF-8
study() never worked for UTF-8 strings, but could lead to false results.
It's now a no-op on UTF-8 data. (Yves Orton)
=item Critical signals
The signals SIGILL, SIGBUS and SIGSEGV are now always delivered in an
"unsafe" manner (unlike other signals, which are deferred until the
perl interpreter reaches a reasonably stable state; see
L<perlipc/"Deferred Signals (Safe Signals)">). (Rafael)
=item @INC-hook fix
When a module or a file is loaded through an @INC-hook, and this hook
has set a filename entry in %INC, __FILE__ is now set for this module
according to the contents of that %INC entry. (Rafael)
=item C<-t> switch fix
The C<-w> and C<-t> switches can now be used together without messing
up which categories of warnings are activated. (Rafael)
=item Duping UTF-8 filehandles
Duping a filehandle which has the C<:utf8> PerlIO layer set will now
properly carry that layer on the duped filehandle. (Rafael)
=item Localisation of hash elements
Localizing a hash element whose key was given as a variable didn't work
correctly if the variable was changed while the local() was in effect (as
in C<local $h{$x}; ++$x>). (Bo Lindbergh)
=back
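The chr() behaviour described in the list above can be observed with a short
script. This is a sketch rather than a test from the perl distribution; the
variable names are illustrative, and the C<utf8> warning category is
switched off only to keep the output quiet on perls that warn about negative
arguments to chr().

```perl
use strict;
use warnings;

# Default (character) semantics: a negative argument yields the
# Unicode replacement character.
my $as_char = do { no warnings 'utf8'; chr(-1) };

# Under the bytes pragma only the low eight bits of the value are used.
my $as_byte = do { use bytes; chr(-1) };

printf "without bytes: U+%04X\n", ord $as_char;
printf "with bytes:    0x%02X\n", ord $as_byte;
```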
=head1 New or Changed Diagnostics
=over 4
=item Use of uninitialized value
Perl will now try to tell you the name of the variable (if any) that was
undefined.
=item Deprecated use of my() in false conditional
A new deprecation warning, I<Deprecated use of my() in false conditional>,
has been added, to warn against the use of the dubious and deprecated
construct
my $x if 0;
See L<perldiag>. Use C<state> variables instead.
=item !=~ should be !~
A new warning, C<!=~ should be !~>, is emitted to prevent this misspelling
of the non-matching operator.
=item Newline in left-justified string
The warning I<Newline in left-justified string> has been removed.
=item Too late for "-T" option
The error I<Too late for "-T" option> has been reformulated to be more
descriptive.
=item "%s" variable %s masks earlier declaration
This warning is now emitted in more consistent cases; in short, when one
of the declarations involved is a C<my> variable:
my $x; my $x; # warns
my $x; our $x; # warns
our $x; my $x; # warns
On the other hand, the following:
our $x; our $x;
now gives a C<"our" variable %s redeclared> warning.
=item readdir()/closedir()/etc. attempted on invalid dirhandle
These new warnings are now emitted when a dirhandle is used but is
either closed or not really a dirhandle.
=item Opening dirhandle/filehandle %s also as a file/directory
Two deprecation warnings have been added: (Rafael)
Opening dirhandle %s also as a file
Opening filehandle %s also as a directory
=item Use of -P is deprecated
Perl's command-line switch C<-P> is now deprecated.
=item v-string in use/require is non-portable
Perl will warn you against potential backwards compatibility problems with
the C<use VERSION> syntax.
=item perl -V
C<perl -V> has several improvements, making it more useable from shell
scripts to get the value of configuration variables. See L<perlrun> for
details.
=back
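For the I<Deprecated use of my() in false conditional> entry above, the
following sketch shows the kind of C<state> variable that can replace a
C<my $x if 0;> construct used to keep a value between calls. The subroutine
name C<next_id> is hypothetical and chosen only for this example.

```perl
use strict;
use warnings;
use feature 'state';

# A persistent counter: $counter is initialised once and keeps its
# value between calls, which is what "my $x if 0" was abused for.
sub next_id {
    state $counter = 0;
    return ++$counter;
}

my @ids = map { next_id() } 1 .. 3;
print "@ids\n";
```

Unlike the deprecated idiom, the persistence here is explicit in the
declaration and does not depend on an implementation accident.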
=head1 Changed Internals
In general, the source code of perl has been refactored, tidied up,
and optimized in many places. Also, memory management and allocation
has been improved in several points.
When compiling the perl core with gcc, as many gcc warning flags are
turned on as is possible on the platform. (This quest for cleanliness
doesn't extend to XS code because we cannot guarantee the tidiness of
code we didn't write.) Similar strictness flags have been added or
tightened for various other C compilers.
=head2 Reordering of SVt_* constants
The relative ordering of constants that define the various types of C<SV>
has changed; in particular, C<SVt_PVGV> has been moved before C<SVt_PVLV>,
C<SVt_PVAV>, C<SVt_PVHV> and C<SVt_PVCV>. This is unlikely to make any
difference unless you have code that explicitly makes assumptions about that
ordering. (The inheritance hierarchy of C<B::*> objects has been changed
to reflect this.)
=head2 Elimination of SVt_PVBM
Related to this, the internal type C<SVt_PVBM> has been removed. This
dedicated type of C<SV> was used by the C<index> operator and parts of the
regexp engine to facilitate fast Boyer-Moore matches. Its use internally has
been replaced by C<SV>s of type C<SVt_PVGV>.
=head2 New type SVt_BIND
A new type C<SVt_BIND> has been added, in readiness for the project to
implement Perl 6 on 5. There is deliberately no implementation yet, and
SVs of this type cannot yet be created or destroyed.
=head2 Removal of CPP symbols
The C preprocessor symbols C<PERL_PM_APIVERSION> and
C<PERL_XS_APIVERSION>, which were supposed to give the version number of
the oldest perl binary-compatible (resp. source-compatible) with the
present one, were not used, and sometimes had misleading values. They have
been removed.
=head2 Less space is used by ops
The C<BASEOP> structure now uses less space. The C<op_seq> field has been
removed and replaced by a single-bit bit-field C<op_opt>. C<op_type> is now 9
bits long. (Consequently, the C<B::OP> class doesn't provide an C<seq>
method anymore.)
=head2 New parser
perl's parser is now generated by bison (it used to be generated by
byacc.) As a result, it seems to be a bit more robust.
Also, Dave Mitchell improved the lexer debugging output under C<-DT>.
=head2 Use of C<const>
Andy Lester supplied many improvements to determine which function
parameters and local variables could actually be declared C<const> to the C
compiler. Steve Peters provided new C<*_set> macros and reworked the core to
use these rather than assigning to macros in LVALUE context.
=head2 Mathoms
A new file, F<mathoms.c>, has been added. It contains functions that are
no longer used in the perl core, but that remain available for binary or
source compatibility reasons. However, those functions will not be
compiled in if you add C<-DNO_MATHOMS> in the compiler flags.
=head2 C<AvFLAGS> has been removed
The C<AvFLAGS> macro has been removed.
=head2 C<av_*> changes
The C<av_*()> functions, used to manipulate arrays, no longer accept null
C<AV*> parameters.
=head2 $^H and %^H
The implementation of the special variables $^H and %^H has changed, to
allow implementing lexical pragmas in pure Perl.
=head2 B:: modules inheritance changed
The inheritance hierarchy of C<B::> modules has changed; C<B::NV> now
inherits from C<B::SV> (it used to inherit from C<B::IV>).
=head2 Anonymous hash and array constructors
The anonymous hash and array constructors now take 1 op in the optree
instead of 3, now that pp_anonhash and pp_anonlist return a reference to
a hash/array when the op is flagged with OPf_SPECIAL. (Nicholas Clark)
=head1 Known Problems
There's still a remaining problem in the implementation of the lexical
C<$_>: it doesn't work inside C</(?{...})/> blocks. (See the TODO test in
F<t/op/mydef.t>.)
Stacked filetest operators won't work when the C<filetest> pragma is in
effect, because they rely on the stat() buffer C<_> being populated, and
filetest bypasses stat().
=head2 UTF-8 problems
The handling of Unicode still is unclean in several places, where it's
dependent on whether a string is internally flagged as UTF-8. This will
be made more consistent in perl 5.12, but that won't be possible without
a certain amount of backwards incompatibility.
=head1 Platform Specific Problems
When perl is compiled with g++ and thread support on Linux, C<$!>
reportedly stops working correctly. This is because glibc
provides two strerror_r(3) implementations, and perl selects the wrong
one.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://rt.perl.org/rt3/ . There may also be
information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
=head1 SEE ALSO
The F<Changes> file and the perl590delta to perl595delta man pages for
exhaustive details on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
=head1 NAME
perl5241delta - what is new for perl v5.24.1
=head1 DESCRIPTION
This document describes differences between the 5.24.0 release and the 5.24.1
release.
If you are upgrading from an earlier release such as 5.22.0, first read
L<perl5240delta>, which describes differences between 5.22.0 and 5.24.0.
=head1 Security
=head2 B<-Di> switch is now required for PerlIO debugging output
Previously PerlIO debugging output would be sent to the file specified by the
C<PERLIO_DEBUG> environment variable if perl wasn't running setuid and the
B<-T> or B<-t> switches hadn't been parsed yet.
If perl performed output at a point where it hadn't yet parsed its switches,
this could result in perl creating or overwriting the file named by
C<PERLIO_DEBUG> even when the B<-T> switch had been supplied.
Perl now requires the B<-Di> switch to produce PerlIO debugging output. By
default this is written to C<stderr>, but can optionally be redirected to a
file by setting the C<PERLIO_DEBUG> environment variable.
If perl is running setuid or the B<-T> switch was supplied, C<PERLIO_DEBUG> is
ignored and the debugging output is sent to C<stderr> as for any other B<-D>
switch.
=head2 Core modules and tools no longer search F<"."> for optional modules
The tools and many modules supplied in core no longer search the default
current directory entry in L<C<@INC>|perlvar/@INC> for optional modules. For
example, L<Storable> will remove the final F<"."> from C<@INC> before trying to
load L<Log::Agent>.
This prevents an attacker injecting an optional module into a process run by
another user where the current directory is writable by the attacker, e.g. the
F</tmp> directory.
In most cases this removal should not cause problems, but difficulties were
encountered with L<base>, which treats every module name supplied as optional.
These difficulties have not yet been resolved, so for this release there are no
changes to L<base>. We hope to have a fix for L<base> in Perl 5.24.2.
To protect your own code from this attack, either remove the default F<".">
entry from C<@INC> at the start of your script, so:
    #!/usr/bin/perl
    use strict;
    ...
becomes:
    #!/usr/bin/perl
    BEGIN { pop @INC if $INC[-1] eq '.' }
    use strict;
    ...
or for modules, remove F<"."> from a localized C<@INC>, so:
    my $can_foo = eval { require Foo; };
becomes:
    my $can_foo = eval {
        local @INC = @INC;
        pop @INC if $INC[-1] eq '.';
        require Foo;
    };
=head1 Incompatible Changes
Other than the security changes above there are no changes intentionally
incompatible with Perl 5.24.0. If any exist, they are bugs, and we request
that you submit a report. See L</Reporting Bugs> below.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<Archive::Tar> has been upgraded from version 2.04 to 2.04_01.
=item *
L<bignum> has been upgraded from version 0.42 to 0.42_01.
=item *
L<CPAN> has been upgraded from version 2.11 to 2.11_01.
=item *
L<Digest> has been upgraded from version 1.17 to 1.17_01.
=item *
L<Digest::SHA> has been upgraded from version 5.95 to 5.95_01.
=item *
L<Encode> has been upgraded from version 2.80 to 2.80_01.
=item *
L<ExtUtils::MakeMaker> has been upgraded from version 7.10_01 to 7.10_02.
=item *
L<File::Fetch> has been upgraded from version 0.48 to 0.48_01.
=item *
L<File::Spec> has been upgraded from version 3.63 to 3.63_01.
=item *
L<HTTP::Tiny> has been upgraded from version 0.056 to 0.056_001.
=item *
L<IO> has been upgraded from version 1.36 to 1.36_01.
=item *
The IO-Compress modules have been upgraded from version 2.069 to 2.069_001.
=item *
L<IPC::Cmd> has been upgraded from version 0.92 to 0.92_01.
=item *
L<JSON::PP> has been upgraded from version 2.27300 to 2.27300_01.
=item *
L<Locale::Maketext> has been upgraded from version 1.26 to 1.26_01.
=item *
L<Locale::Maketext::Simple> has been upgraded from version 0.21 to 0.21_01.
=item *
L<Memoize> has been upgraded from version 1.03 to 1.03_01.
=item *
L<Module::CoreList> has been upgraded from version 5.20160506 to 5.20170114_24.
=item *
L<Net::Ping> has been upgraded from version 2.43 to 2.43_01.
=item *
L<Parse::CPAN::Meta> has been upgraded from version 1.4417 to 1.4417_001.
=item *
L<Pod::Html> has been upgraded from version 1.22 to 1.2201.
=item *
L<Pod::Perldoc> has been upgraded from version 3.25_02 to 3.25_03.
=item *
L<Storable> has been upgraded from version 2.56 to 2.56_01.
=item *
L<Sys::Syslog> has been upgraded from version 0.33 to 0.33_01.
=item *
L<Test> has been upgraded from version 1.28 to 1.28_01.
=item *
L<Test::Harness> has been upgraded from version 3.36 to 3.36_01.
=item *
L<XSLoader> has been upgraded from version 0.21 to 0.22, fixing a security hole
in which binary files could be loaded from a path outside of C<@INC>.
L<[perl #128528]|https://rt.perl.org/Public/Bug/Display.html?id=128528>
=back
=head1 Documentation
=head2 Changes to Existing Documentation
=head3 L<perlapio>
=over 4
=item *
The documentation of C<PERLIO_DEBUG> has been updated.
=back
=head3 L<perlrun>
=over 4
=item *
The new B<-Di> switch has been documented, and the documentation of
C<PERLIO_DEBUG> has been updated.
=back
=head1 Testing
=over 4
=item *
A new test script, F<t/run/switchDx.t>, has been added to test that the new
B<-Di> switch is working correctly.
=back
=head1 Selected Bug Fixes
=over 4
=item *
The change to hashbang redirection introduced in Perl 5.24.0, whereby perl
would redirect to another interpreter (Perl 6) if it found a hashbang path
which contains "perl" followed by "6", has been reverted because it broke in
cases such as C<#!/opt/perl64/bin/perl>.
=back
=head1 Acknowledgements
Perl 5.24.1 represents approximately 8 months of development since Perl 5.24.0
and contains approximately 8,100 lines of changes across 240 files from 18
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 2,200 lines of changes to 170 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.24.1:
Aaron Crane, Alex Vandiver, Aristotle Pagaltzis, Chad Granum, Chris 'BinGOs'
Williams, Craig A. Berry, Father Chrysostomos, James E Keenan, Jarkko
Hietaniemi, Karen Etheridge, Leon Timmermans, Matthew Horsfall, Ricardo Signes,
Sawyer X, Sébastien Aperghis-Tramoni, Stevan Little, Steve Hay, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the Perl bug database at
L<https://rt.perl.org/> . There may also be information at
L<http://www.perl.org/> , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications which make it
inappropriate to send to a publicly archived mailing list, then see
L<perlsec/SECURITY VULNERABILITY CONTACT INFORMATION> for details of how to
report the issue.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perlipc - Perl interprocess communication (signals, fifos, pipes, safe subprocesses, sockets, and semaphores)
=head1 DESCRIPTION
The basic IPC facilities of Perl are built out of the good old Unix
signals, named pipes, pipe opens, the Berkeley socket routines, and SysV
IPC calls. Each is used in slightly different situations.
=head1 Signals
Perl uses a simple signal handling model: the %SIG hash contains names
or references of user-installed signal handlers. Each handler is
called with one argument, the name of the signal that
triggered it. A signal may be generated intentionally from a
particular keyboard sequence like control-C or control-Z, sent to you
from another process, or triggered automatically by the kernel when
special events transpire, like a child process exiting, your own process
running out of stack space, or hitting a process file-size limit.
For example, to trap an interrupt signal, set up a handler like this:
    our $shucks;
    sub catch_zap {
        my $signame = shift;
        $shucks++;
        die "Somebody sent me a SIG$signame";
    }
    $SIG{INT} = __PACKAGE__ . "::catch_zap";
    $SIG{INT} = \&catch_zap;  # best strategy
Prior to Perl 5.8.0 it was necessary to do as little as you possibly
could in your handler; notice how all we do is set a global variable
and then raise an exception. That's because on most systems,
libraries are not re-entrant; particularly, memory allocation and I/O
routines are not. That meant that doing nearly I<anything> in your
handler could in theory trigger a memory fault and subsequent core
dump - see L</Deferred Signals (Safe Signals)> below.
The names of the signals are the ones listed out by C<kill -l> on your
system, or you can retrieve them using the CPAN module L<IPC::Signal>.
You may also choose to assign the strings C<"IGNORE"> or C<"DEFAULT"> as
the handler, in which case Perl will try to discard the signal or do the
default thing.
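If installing a CPAN module is not an option, the core L<Config> module is
another place to look. The following is a minimal sketch, assuming that your
perl's C<%Config> carries the parallel C<sig_name> and C<sig_num> entries;
the hash name C<%signo> is chosen here purely for illustration.

```perl
use strict;
use warnings;
use Config;

defined $Config{sig_name} && defined $Config{sig_num}
    or die "this perl's Config has no signal tables";

# sig_name and sig_num are space-separated lists in the same order,
# so a hash slice pairs each name with its number.
my @names = split ' ', $Config{sig_name};
my @nums  = split ' ', $Config{sig_num};

my %signo;
@signo{@names} = @nums;

print "SIGINT is signal number $signo{INT}\n" if exists $signo{INT};
```

The keys of C<%signo> are the names you would use as keys in C<%SIG>,
without the C<SIG> prefix.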
On most Unix platforms, the C<CHLD> (sometimes also known as C<CLD>) signal
has special behavior with respect to a value of C<"IGNORE">.
Setting C<$SIG{CHLD}> to C<"IGNORE"> on such a platform has the effect of
not creating zombie processes when the parent process fails to C<wait()>
on its child processes (i.e., child processes are automatically reaped).
Calling C<wait()> with C<$SIG{CHLD}> set to C<"IGNORE"> usually returns
C<-1> on such platforms.
Some signals can be neither trapped nor ignored, such as the KILL and STOP
(but not the TSTP) signals. Note that ignoring signals makes them disappear.
If you only want them blocked temporarily without them getting lost, you'll
have to use the C<sigprocmask> function from the L<POSIX> module.
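Temporary blocking of that kind might look like the sketch below. It assumes
a Unix-like platform where the L<POSIX> module provides C<sigprocmask> and
C<POSIX::SigSet>; the variable names are illustrative only. The process
sends itself a SIGINT while the signal is blocked, and the handler runs only
after the previous mask has been restored.

```perl
use strict;
use warnings;
use POSIX qw(sigprocmask SIG_BLOCK SIG_SETMASK SIGINT);

my $caught = 0;
$SIG{INT} = sub { $caught++ };

my $blocked  = POSIX::SigSet->new(SIGINT);   # signals to hold back
my $previous = POSIX::SigSet->new;           # receives the old mask

sigprocmask(SIG_BLOCK, $blocked, $previous)
    or die "cannot block SIGINT: $!";

kill INT => $$;                     # stays pending; it is not lost
my $seen_while_blocked = $caught;   # the handler has not run yet

sigprocmask(SIG_SETMASK, $previous)
    or die "cannot restore signal mask: $!";

print "caught while blocked: $seen_while_blocked, ",
      "after unblocking: $caught\n";
```

Restoring the saved mask with C<SIG_SETMASK>, rather than unblocking SIGINT
explicitly, leaves any blocking that was already in force before this code
ran untouched.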
Sending a signal to a negative process ID means that you send the signal
to the entire Unix process group. This code sends a hang-up signal to all
processes in the current process group, and also sets $SIG{HUP} to C<"IGNORE">
so it doesn't kill itself:
    # block scope for local
    {
        local $SIG{HUP} = "IGNORE";
        kill HUP => -$$;
        # snazzy writing of: kill("HUP", -$$)
    }
Another interesting signal to send is signal number zero. This doesn't
actually affect a child process, but instead checks whether it's alive
or has changed its UIDs.
    unless (kill 0 => $kid_pid) {
        warn "something wicked happened to $kid_pid";
    }
Signal number zero may fail because you lack permission to send the
signal when directed at a process whose real or saved UID is not
identical to the real or effective UID of the sending process, even
though the process is alive. You may be able to determine the cause of
failure using C<$!> or C<%!>.
    unless (kill(0 => $pid) || $!{EPERM}) {
        warn "$pid looks dead";
    }
You might also want to employ anonymous functions for simple signal
handlers:
$SIG{INT} = sub { die "\nOutta here!\n" };
SIGCHLD handlers require some special care. If a second child dies
while we are in the signal handler triggered by the first death, we won't get
another signal. So the handler must loop, or else the unreaped child is left
as a zombie. And the next time two children die we get another zombie.
And so on.
    use POSIX ":sys_wait_h";
    $SIG{CHLD} = sub {
        while ((my $child = waitpid(-1, WNOHANG)) > 0) {
            $Kid_Status{$child} = $?;
        }
    };
    # do something that forks...
Be careful: qx(), system(), and some modules for calling external commands
do a fork(), then wait() for the result. Thus, your signal handler
will be called. Because wait() was already called by system() or qx(),
the wait() in the signal handler will see no more zombies and will
therefore block.
The best way to prevent this issue is to use waitpid(), as in the following
example:
    use POSIX ":sys_wait_h"; # for nonblocking read
    my %children;
    $SIG{CHLD} = sub {
        # don't change $! and $? outside handler
        local ($!, $?);
        while ( (my $pid = waitpid(-1, WNOHANG)) > 0 ) {
            delete $children{$pid};
            cleanup_child($pid, $?);
        }
    };
    while (1) {
        my $pid = fork();
        die "cannot fork" unless defined $pid;
        if ($pid == 0) {
            # ...
            exit 0;
        } else {
            $children{$pid}=1;
            # ...
            system($command);
            # ...
        }
    }
Signal handling is also used for timeouts in Unix. While safely
protected within an C<eval{}> block, you set a signal handler to trap
alarm signals and then schedule to have one delivered to you in some
number of seconds. Then try your blocking operation, clearing the alarm
when it's done but not before you've exited your C<eval{}> block. If it
goes off, you'll use die() to jump out of the block.
Here's an example:
    my $ALARM_EXCEPTION = "alarm clock restart";
    eval {
        local $SIG{ALRM} = sub { die $ALARM_EXCEPTION };
        alarm 10;
        flock(FH, 2)    # blocking write lock
            || die "cannot flock: $!";
        alarm 0;
    };
    if ($@ && $@ !~ quotemeta($ALARM_EXCEPTION)) { die }
If the operation being timed out is system() or qx(), this technique
is liable to generate zombies. If this matters to you, you'll
need to do your own fork() and exec(), and kill the errant child process.
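One way to do your own fork() and exec() with a timeout is sketched below.
The helper name C<run_with_timeout> and its return convention are inventions
of this example, not part of any module, and a Unix-like platform with
working fork() and alarm() is assumed. The parent waits inside an
C<eval{}> block guarded by an alarm; if the alarm fires, it kills the child
and then reaps it so that no zombie is left behind.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run @cmd, killing it after $timeout seconds.
# Returns (timed-out flag, wait status as found in $?).
sub run_with_timeout {
    my ($timeout, @cmd) = @_;

    my $pid = fork();
    die "cannot fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: replace ourselves with the command, bypassing the shell.
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }

    my $timed_out = 0;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        waitpid($pid, 0);       # interrupted if the alarm fires
        alarm 0;
        1;
    } or do {
        die $@ unless $@ eq "timeout\n";   # rethrow unrelated errors
        $timed_out = 1;
        kill KILL => $pid;      # stop the errant child ...
        waitpid($pid, 0);       # ... and reap it
    };
    return ($timed_out, $?);
}

my ($timed_out, $status) = run_with_timeout(1, $^X, '-e', 'sleep 30');
print $timed_out ? "child was killed after timeout\n"
                 : "child finished with status $status\n";
```

Sending C<KILL> is the bluntest choice; a gentler variant might send C<TERM>
first and escalate only if the child does not exit.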
For more complex signal handling, you might see the standard POSIX
module. Lamentably, this is almost entirely undocumented, but the
F<ext/POSIX/t/sigaction.t> file from the Perl source distribution has
some examples in it.
=head2 Handling the SIGHUP Signal in Daemons
A process that usually starts when the system boots and shuts down
when the system is shut down is called a daemon (Disk And Execution
MONitor). If a daemon process has a configuration file which is
modified after the process has been started, there should be a way to
tell that process to reread its configuration file without stopping
the process. Many daemons provide this mechanism using a C<SIGHUP>
signal handler. When you want to tell the daemon to reread the file,
simply send it the C<SIGHUP> signal.
The following example implements a simple daemon, which restarts
itself every time the C<SIGHUP> signal is received. The actual code is
located in the subroutine C<code()>, which just prints some debugging
info to show that it works; it should be replaced with the real code.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use POSIX ();
    use FindBin ();
    use File::Basename ();
    use File::Spec::Functions qw(catfile);
    $| = 1;
    # make the daemon cross-platform, so exec always calls the script
    # itself with the right path, no matter how the script was invoked.
    my $script = File::Basename::basename($0);
    my $SELF  = catfile($FindBin::Bin, $script);
    # POSIX unmasks the sigprocmask properly
    $SIG{HUP} = sub {
        print "got SIGHUP\n";
        exec($SELF, @ARGV) || die "$0: couldn't restart: $!";
    };
    code();
    sub code {
        print "PID: $$\n";
        print "ARGV: @ARGV\n";
        my $count = 0;
        while (1) {
            sleep 2;
            print ++$count, "\n";
        }
    }
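Restarting the whole process is not the only option. A daemon can also
reread its configuration in place, as in the following sketch. Here the
handler only sets a flag, and the main loop does the actual reloading.
The C<load_config> routine is a hypothetical stand-in for parsing a real
configuration file, and the loop sends itself a C<SIGHUP> once purely to
simulate an administrator's request.

```perl
use strict;
use warnings;

my %config;
my $loads  = 0;
my $reload = 1;                     # load the configuration on first pass

$SIG{HUP} = sub { $reload = 1 };    # just note the request

# Hypothetical stand-in for reading and parsing a configuration file.
sub load_config {
    %config = (loaded_at => time);
    $loads++;
}

for my $tick (1 .. 3) {
    if ($reload) {
        $reload = 0;
        load_config();
    }
    kill HUP => $$ if $tick == 1;   # simulate "kill -HUP <pid>"
    # ... the daemon's real work would go here ...
}

print "configuration was loaded $loads times\n";
```

Keeping the handler this small means the reload happens at a point the main
loop chooses, not in the middle of whatever work was interrupted.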
=head2 Deferred Signals (Safe Signals)
Before Perl 5.8.0, installing Perl code to deal with signals exposed you to
danger from two things. First, few system library functions are
re-entrant. If the signal interrupts while Perl is executing one function
(like malloc(3) or printf(3)), and your signal handler then calls the same
function again, you could get unpredictable behavior--often, a core dump.
Second, Perl isn't itself re-entrant at the lowest levels. If the signal
interrupts Perl while Perl is changing its own internal data structures,
similarly unpredictable behavior may result.
There were two things you could do, knowing this: be paranoid or be
pragmatic. The paranoid approach was to do as little as possible in your
signal handler. Set an existing integer variable that already has a
value, and return. This doesn't help you if you're in a slow system call,
which will just restart. That means you have to C<die> to longjmp(3) out
of the handler. Even this is a little cavalier for the true paranoiac,
who avoids C<die> in a handler because the system I<is> out to get you.
The pragmatic approach was to say "I know the risks, but prefer the
convenience", and to do anything you wanted in your signal handler,
and be prepared to clean up core dumps now and again.
Perl 5.8.0 and later avoid these problems by "deferring" signals. That is,
when the signal is delivered to the process by the system (to the C code
that implements Perl) a flag is set, and the handler returns immediately.
Then at strategic "safe" points in the Perl interpreter (e.g. when it is
about to execute a new opcode) the flags are checked and the Perl level
handler from %SIG is executed. The "deferred" scheme allows much more
flexibility in the coding of signal handlers as we know the Perl
interpreter is in a safe state, and that we are not in a system library
function when the handler is called. However the implementation does
differ from previous Perls in the following ways:
=over 4
=item Long-running opcodes
As the Perl interpreter looks at signal flags only when it is about
to execute a new opcode, a signal that arrives during a long-running
opcode (e.g. a regular expression operation on a very large string) will
not be seen until the current opcode completes.
If a signal of any given type fires multiple times during an opcode
(such as from a fine-grained timer), the handler for that signal will
be called only once, after the opcode completes; all other
instances will be discarded. Furthermore, if your system's signal queue
gets flooded to the point that there are signals that have been raised
but not yet caught (and thus not deferred) at the time an opcode
completes, those signals may well be caught and deferred during
subsequent opcodes, with sometimes surprising results. For example, you
may see alarms delivered even after calling C<alarm(0)> as the latter
stops the raising of alarms but does not cancel the delivery of alarms
raised but not yet caught. Do not depend on the behaviors described in
this paragraph as they are side effects of the current implementation and
may change in future versions of Perl.
=item Interrupting IO
When a signal is delivered (e.g., SIGINT from a control-C) the operating
system breaks into IO operations like I<read>(2), which is used to
implement Perl's readline() function, the C<< <> >> operator. On older
Perls the handler was called immediately (and as C<read> is not "unsafe",
this worked well). With the "deferred" scheme the handler is I<not> called
immediately, and if Perl is using the system's C<stdio> library that
library may restart the C<read> without returning to Perl to give it a
chance to call the %SIG handler. If this happens on your system the
solution is to use the C<:perlio> layer to do IO--at least on those handles
that you want to be able to break into with signals. (The C<:perlio> layer
checks the signal flags and calls %SIG handlers before resuming IO
operation.)
The default in Perl 5.8.0 and later is to automatically use
the C<:perlio> layer.
Note that it is not advisable to access a file handle within a signal
handler where that signal has interrupted an I/O operation on that same
handle. While perl will at least try hard not to crash, there are no
guarantees of data integrity; for example, some data might get dropped or
written twice.
Some networking library functions like gethostbyname() are known to have
their own implementations of timeouts which may conflict with your
timeouts. If you have problems with such functions, try using the POSIX
sigaction() function, which bypasses Perl safe signals. Be warned that
this does subject you to possible memory corruption, as described above.
Instead of setting C<$SIG{ALRM}>:
local $SIG{ALRM} = sub { die "alarm" };
try something like the following:
use POSIX qw(SIGALRM);
POSIX::sigaction(SIGALRM,
POSIX::SigAction->new(sub { die "alarm" }))
|| die "Error setting SIGALRM handler: $!\n";
Another way to disable the safe signal behavior locally is to use
the C<Perl::Unsafe::Signals> module from CPAN, which affects
all signals.
=item Restartable system calls
On systems that supported it, older versions of Perl used the
SA_RESTART flag when installing %SIG handlers. This meant that
restartable system calls would continue rather than returning when
a signal arrived. In order to deliver deferred signals promptly,
Perl 5.8.0 and later do I<not> use SA_RESTART. Consequently,
restartable system calls can fail (with $! set to C<EINTR>) in places
where they previously would have succeeded.
The default C<:perlio> layer retries C<read>, C<write>
and C<close> as described above; interrupted C<wait> and
C<waitpid> calls will always be retried.
=item Signals as "faults"
Certain signals like SEGV, ILL, and BUS are generated by virtual memory
addressing errors and similar "faults". These are normally fatal: there is
little a Perl-level handler can do with them. So Perl delivers them
immediately rather than attempting to defer them.
=item Signals triggered by operating system state
On some operating systems certain signal handlers are supposed to "do
something" before returning. One example can be CHLD or CLD, which
indicates a child process has completed. On some operating systems the
signal handler is expected to C<wait> for the completed child
process. On such systems the deferred signal scheme will not work for
those signals: it does not do the C<wait>. Again the failure will
look like a loop as the operating system will reissue the signal because
there are completed child processes that have not yet been C<wait>ed for.
=back
If you want the old signal behavior back despite possible
memory corruption, set the environment variable C<PERL_SIGNALS> to
C<"unsafe">. This feature first appeared in Perl 5.8.1.
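The deferred scheme can be observed with a small experiment. The following is only a sketch under stated assumptions: the counter C<$got_usr1>, the use of SIGUSR1, and the polling loop are illustrative choices, not part of the text above. The handler does nothing but bump a counter, and the main program keeps executing opcodes until the interpreter reaches a safe point and dispatches it.

```perl
use strict;
use warnings;

my $got_usr1 = 0;                       # hypothetical counter for this sketch
$SIG{USR1} = sub { $got_usr1++ };       # runs only at a "safe" point

# Send the signal to ourselves.
kill(USR1 => $$) || die "can't signal myself: $!";

# Keep executing opcodes (with a short sleep) until the interpreter
# has noticed the pending-signal flag and called the %SIG handler.
my $tries = 0;
until ($got_usr1 || $tries++ >= 100) {
    select(undef, undef, undef, 0.01);  # sleep about 10 ms; may return early
}

print "handler ran $got_usr1 time(s)\n";
```

Keeping the handler this small works under both the deferred and the "unsafe" behavior, which is why it is a reasonable default.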
=head1 Named Pipes
A named pipe (often referred to as a FIFO) is an old Unix IPC
mechanism for processes communicating on the same machine. It works
just like regular anonymous pipes, except that the
processes rendezvous using a filename and need not be related.
To create a named pipe, use the C<POSIX::mkfifo()> function.
use POSIX qw(mkfifo);
mkfifo($path, 0700) || die "mkfifo $path failed: $!";
You can also use the Unix command mknod(1), or on some
systems, mkfifo(1). These may not be in your normal path, though.
# system return val is backwards, so && not ||
#
$ENV{PATH} .= ":/etc:/usr/etc";
if ( system("mknod", $path, "p")
&& system("mkfifo", $path) )
{
die "mk{nod,fifo} $path failed";
}
A fifo is convenient when you want to connect a process to an unrelated
one. When you open a fifo, the program will block until there's something
on the other end.
For example, let's say you'd like to have your F<.signature> file be a
named pipe that has a Perl program on the other end. Now every time any
program (like a mailer, news reader, finger program, etc.) tries to read
from that file, the reading program will read the new signature from your
program. We'll use the pipe-checking file-test operator, B<-p>, to find
out whether anyone (or anything) has accidentally removed our fifo.
chdir(); # go home
my $FIFO = ".signature";
while (1) {
unless (-p $FIFO) {
unlink $FIFO; # discard any failure, will catch later
require POSIX; # delayed loading of heavy module
POSIX::mkfifo($FIFO, 0700)
|| die "can't mkfifo $FIFO: $!";
}
# next line blocks till there's a reader
open (FIFO, "> $FIFO") || die "can't open $FIFO: $!";
print FIFO "John Smith (smith\@host.org)\n", `fortune -s`;
close(FIFO) || die "can't close $FIFO: $!";
sleep 2; # to avoid dup signals
}
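For experimenting with the rendezvous behavior without touching F<.signature>, here is a minimal self-contained sketch. It assumes a temporary directory from C<File::Temp> and a hypothetical file name F<demo.fifo>; it also uses the three-argument form of open() with lexical filehandles rather than the barewords used above. The parent's open for reading and the child's open for writing each block until the other side shows up.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $fifo = "$dir/demo.fifo";            # hypothetical path for this sketch
mkfifo($fifo, 0700) || die "mkfifo $fifo failed: $!";

my $pid = fork();
defined($pid) || die "can't fork: $!";

if ($pid == 0) {                        # child: the writer
    open(my $out, ">", $fifo) || die "can't write $fifo: $!";
    print $out "hello through the fifo\n";
    close($out) || die "can't close $fifo: $!";
    POSIX::_exit(0);                    # skip END blocks and temp-dir cleanup
}

# parent: the reader; this open blocks until the child opens for writing
open(my $in, "<", $fifo) || die "can't read $fifo: $!";
my $line = <$in>;
close($in) || die "can't close $fifo: $!";
waitpid($pid, 0);

print "parent read: $line";
```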
=head1 Using open() for IPC
Perl's basic open() statement can also be used for unidirectional
interprocess communication by either appending or prepending a pipe
symbol to the second argument to open(). Here's how to start
something up in a child process you intend to write to:
open(SPOOLER, "| cat -v | lpr -h 2>/dev/null")
|| die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER || die "bad spool: $! $?";
And here's how to start up a child process you intend to read from:
open(STATUS, "netstat -an 2>&1 |")
|| die "can't fork: $!";
while (<STATUS>) {
next if /^(tcp|udp)/;
print;
}
close STATUS || die "bad netstat: $! $?";
If one can be sure that a particular program is a Perl script expecting
filenames in @ARGV, the clever programmer can write something like this:
% program f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile
and no matter which sort of shell it's called from, the Perl program will
read from the file F<f1>, the process F<cmd1>, standard input (F<tmpfile>
in this case), the F<f2> file, the F<cmd2> command, and finally the F<f3>
file. Pretty nifty, eh?
You might notice that you could use backticks for much the
same effect as opening a pipe for reading:
print grep { !/^(tcp|udp)/ } `netstat -an 2>&1`;
die "bad netstatus ($?)" if $?;
While this is true on the surface, it's much more efficient to process the
file one line or record at a time because then you don't have to read the
whole thing into memory at once. It also gives you finer control of the
whole process, letting you kill off the child process early if you'd like.
Be careful to check the return values from both open() and close(). If
you're I<writing> to a pipe, you should also trap SIGPIPE. Otherwise,
think of what happens when you start up a pipe to a command that doesn't
exist: the open() will in all likelihood succeed (it only reflects the
fork()'s success), but then your output will fail--spectacularly. Perl
can't know whether the command worked, because your command is actually
running in a separate process whose exec() might have failed. Therefore,
while readers of bogus commands return just a quick EOF, writers
to bogus commands will get hit with a signal, which they'd best be prepared
to handle. Consider:
open(FH, "|bogus") || die "can't fork: $!";
print FH "bang\n"; # neither necessary nor sufficient
# to check print retval!
close(FH) || die "can't close: $!";
Checking the return value from print() doesn't help here because of
pipe buffering: physical writes are delayed. The failure won't surface
until the close, and then it arrives as a SIGPIPE. To catch it, you could
use this:
$SIG{PIPE} = "IGNORE";
open(FH, "|bogus") || die "can't fork: $!";
print FH "bang\n";
close(FH) || die "can't close: status=$?";
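To see what close() reports when the command on the other end fails, the following sketch may help. It makes two assumptions that are not in the text above: it uses C<$^X> (the running perl) as a stand-in for the external command, and it uses the list form of pipe open so no shell is involved. The child exits with status 3 without reading anything.

```perl
use strict;
use warnings;

local $SIG{PIPE} = "IGNORE";            # turn SIGPIPE into a failed write

# The child ignores its input and exits with status 3.
my $pid = open(my $fh, "|-", $^X, "-e", "exit 3")
    || die "can't fork: $!";

print $fh "bang\n";                     # buffered; tells us nothing yet

# close() flushes, waits for the child, and sets $? to its wait status.
my $closed_ok   = close($fh);
my $exit_status = $? >> 8;

print $closed_ok
    ? "child succeeded\n"
    : "child failed with exit status $exit_status\n";
```

The exit status lives in the high byte of C<$?>, so it is shifted right by eight bits before being reported.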
=head2 Filehandles
Both the main process and any child processes it forks share the same
STDIN, STDOUT, and STDERR filehandles. If both processes try to access
them at once, strange things can happen. You may also want to close
or reopen the filehandles for the child. You can get around this by
opening your pipe with open(), but on some systems this means that the
child process cannot outlive the parent.
=head2 Background Processes
You can run a command in the background with:
system("cmd &");
The command's STDOUT and STDERR (and possibly STDIN, depending on your
shell) will be the same as the parent's. You won't need to catch
SIGCHLD because of the double-fork taking place; see below for details.
=head2 Complete Dissociation of Child from Parent
In some cases (starting server processes, for instance) you'll want to
completely dissociate the child process from the parent. This is
often called daemonization. A well-behaved daemon will also chdir()
to the root directory so it doesn't prevent unmounting the filesystem
containing the directory from which it was launched, and redirect its
standard file descriptors from and to F</dev/null> so that random
output doesn't wind up on the user's terminal.
use POSIX "setsid";
sub daemonize {
chdir("/") || die "can't chdir to /: $!";
open(STDIN, "< /dev/null") || die "can't read /dev/null: $!";
open(STDOUT, "> /dev/null") || die "can't write to /dev/null: $!";
defined(my $pid = fork()) || die "can't fork: $!";
exit if $pid; # non-zero now means I am the parent
(setsid() != -1) || die "Can't start a new session: $!";
open(STDERR, ">&STDOUT") || die "can't dup stdout: $!";
}
The fork() has to come before the setsid() to ensure you aren't a
process group leader; the setsid() will fail if you are. If your
system doesn't have the setsid() function, open F</dev/tty> and use the
C<TIOCNOTTY> ioctl() on it instead. See tty(4) for details.
Non-Unix users should check their C<< I<Your_OS>::Process >> module for
other possible solutions.
=head2 Safe Pipe Opens
Another interesting approach to IPC is making your single program go
multiprocess and communicate between--or even amongst--yourselves. The
open() function will accept a file argument of either C<"-|"> or C<"|-">
to do a very interesting thing: it forks a child connected to the
filehandle you've opened. The child is running the same program as the
parent. This is useful for safely opening a file when running under an
assumed UID or GID, for example. If you open a pipe I<to> minus, you can
write to the filehandle you opened and your kid will find it in I<his>
STDIN. If you open a pipe I<from> minus, you can read from the filehandle
you opened whatever your kid writes to I<his> STDOUT.
use English;
my $PRECIOUS = "/path/to/some/safe/file";
my $sleep_count;
my $pid;
do {
$pid = open(KID_TO_WRITE, "|-");
unless (defined $pid) {
warn "cannot fork: $!";
die "bailing out" if $sleep_count++ > 6;
sleep 10;
}
} until defined $pid;
if ($pid) { # I am the parent
print KID_TO_WRITE @some_data;
close(KID_TO_WRITE) || warn "kid exited $?";
} else { # I am the child
# drop permissions in setuid and/or setgid programs:
($EUID, $EGID) = ($UID, $GID);
open (OUTFILE, "> $PRECIOUS")
|| die "can't open $PRECIOUS: $!";
while (<STDIN>) {
print OUTFILE; # child's STDIN is parent's KID_TO_WRITE
}
close(OUTFILE) || die "can't close $PRECIOUS: $!";
exit(0); # don't forget this!!
}
Another common use for this construct is when you need to execute
something without the shell's interference. With system(), it's
straightforward, but you can't use a pipe open or backticks safely.
That's because there's no way to stop the shell from getting its hands on
your arguments. Instead, use lower-level control to call exec() directly.
Here's a safe backtick or pipe open for read:
my $pid = open(KID_TO_READ, "-|");
defined($pid) || die "can't fork: $!";
if ($pid) { # parent
while (<KID_TO_READ>) {
# do something interesting
}
close(KID_TO_READ) || warn "kid exited $?";
} else { # child
($EUID, $EGID) = ($UID, $GID); # suid only
exec($program, @options, @args)
|| die "can't exec program: $!";
# NOTREACHED
}
And here's a safe pipe open for writing:
my $pid = open(KID_TO_WRITE, "|-");
defined($pid) || die "can't fork: $!";
$SIG{PIPE} = sub { die "whoops, $program pipe broke" };
if ($pid) { # parent
print KID_TO_WRITE @data;
close(KID_TO_WRITE) || warn "kid exited $?";
} else { # child
($EUID, $EGID) = ($UID, $GID);
exec($program, @options, @args)
|| die "can't exec program: $!";
# NOTREACHED
}
It is very easy to dead-lock a process using this form of open(), or
indeed with any use of pipe() with multiple subprocesses. The
example above is "safe" because it is simple and calls exec(). See
L</"Avoiding Pipe Deadlocks"> for general safety principles, but there
are extra gotchas with Safe Pipe Opens.
In particular, if you opened the pipe using C<open FH, "|-">, then you
cannot simply use close() in the parent process to close an unwanted
writer. Consider this code:
my $pid = open(WRITER, "|-"); # fork open a kid
defined($pid) || die "first fork failed: $!";
if ($pid) {
my $sub_pid = fork();
defined($sub_pid) || die "second fork failed: $!";
if ($sub_pid) {
close(WRITER) || die "couldn't close WRITER: $!";
# now do something else...
}
else {
# first write to WRITER
# ...
# then when finished
close(WRITER) || die "couldn't close WRITER: $!";
exit(0);
}
}
else {
# first do something with STDIN, then
exit(0);
}
In the example above, the true parent does not want to write to the WRITER
filehandle, so it closes it. However, because WRITER was opened using
C<open FH, "|-">, it has a special behavior: closing it calls
waitpid() (see L<perlfunc/waitpid>), which waits for the subprocess
to exit. If the child process ends up waiting for something happening
in the section marked "do something else", you have deadlock.
This can also be a problem with intermediate subprocesses in more
complicated code, which will call waitpid() on all open filehandles
during global destruction--in no predictable order.
To solve this, you must manually use pipe(), fork(), and the form of
open() which sets one file descriptor to another, as shown below:
pipe(READER, WRITER) || die "pipe failed: $!";
$pid = fork();
defined($pid) || die "first fork failed: $!";
if ($pid) {
close READER;
my $sub_pid = fork();
defined($sub_pid) || die "second fork failed: $!";
if ($sub_pid) {
close(WRITER) || die "can't close WRITER: $!";
}
else {
# write to WRITER...
# ...
# then when finished
close(WRITER) || die "can't close WRITER: $!";
exit(0);
}
# now do something else...
}
else {
open(STDIN, "<&READER") || die "can't reopen STDIN: $!";
close(WRITER) || die "can't close WRITER: $!";
# do something...
exit(0);
}
Since Perl 5.8.0, you can also use the list form of C<open> for pipes.
This is preferred when you wish to avoid having the shell interpret
metacharacters that may be in your command string.
So for example, instead of using:
open(PS_PIPE, "ps aux|") || die "can't open ps pipe: $!";
One would use either of these:
open(PS_PIPE, "-|", "ps", "aux")
|| die "can't open ps pipe: $!";
@ps_args = qw[ ps aux ];
open(PS_PIPE, "-|", @ps_args)
|| die "can't open @ps_args|: $!";
Because there are more than three arguments to open(), it forks the ps(1)
command I<without> spawning a shell, and reads its standard output via the
C<PS_PIPE> filehandle. The corresponding syntax to I<write> to command
pipes is to use C<"|-"> in place of C<"-|">.
This was admittedly a rather silly example, because you're using string
literals whose content is perfectly safe. There is therefore no cause to
resort to the harder-to-read, multi-argument form of pipe open(). However,
whenever you cannot be assured that the program arguments are free of shell
metacharacters, the fancier form of open() should be used. For example:
@grep_args = ("egrep", "-i", $some_pattern, @many_files);
open(GREP_PIPE, "-|", @grep_args)
|| die "can't open @grep_args|: $!";
Here the multi-argument form of pipe open() is preferred because the
pattern and indeed even the filenames themselves might hold metacharacters.
Be aware that these operations are full Unix forks, which means they may
not be correctly implemented on all alien systems.
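A quick way to convince yourself that the list form bypasses the shell is to pass arguments a shell would mangle and check that they arrive untouched. This is a sketch only: the sample arguments in C<@tricky> are made up, and C<$^X> (the running perl) stands in for an arbitrary external program that echoes its arguments one per line.

```perl
use strict;
use warnings;

# Arguments a shell would split, expand, or glob (hypothetical sample data).
my @tricky = ('a;b', '$HOME', '*', 'two words');

# The child prints each of its arguments on a line of its own.
my @cmd = ($^X, "-e", 'print "$_\n" for @ARGV', "--", @tricky);

open(my $kid, "-|", @cmd) || die "can't open @cmd|: $!";
chomp(my @seen = <$kid>);
close($kid) || die "child failed: status=$?";

my $same = @seen == @tricky
    && !grep { $seen[$_] ne $tricky[$_] } 0 .. $#tricky;
print $same ? "arguments arrived unchanged\n" : "arguments were altered\n";
```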
=head2 Avoiding Pipe Deadlocks
Whenever you have more than one subprocess, you must be careful that each
closes whichever half of any pipes created for interprocess communication
it is not using. This is because any child process reading from the pipe
and expecting an EOF will never receive it, and therefore never exit. A
single process closing a pipe is not enough to close it; the last process
with the pipe open must close it for it to read EOF.
Certain built-in Unix features help prevent this most of the time. For
instance, filehandles have a "close on exec" flag, which is set I<en masse>
under control of the C<$^F> variable. This is so any filehandles you
didn't explicitly route to the STDIN, STDOUT or STDERR of a child
I<program> will be automatically closed.
Always explicitly and immediately call close() on the writable end of any
pipe, unless that process is actually writing to it. Even if you don't
explicitly call close(), Perl will still close() all filehandles during
global destruction. As previously discussed, if those filehandles have
been opened with Safe Pipe Open, this will result in calling waitpid(),
which may again deadlock.
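The rule about closing unused pipe ends can be illustrated with a plain pipe() and fork(). This sketch assumes lexical filehandles and a child that writes three lines; neither detail comes from the text above. If the parent kept its copy of the write end open, its read loop would never see EOF.

```perl
use strict;
use warnings;
use POSIX ();

pipe(my $reader, my $writer) || die "pipe failed: $!";

my $pid = fork();
defined($pid) || die "can't fork: $!";

if ($pid == 0) {                        # child: writes, never reads
    close($reader) || die "can't close reader: $!";
    print $writer "line $_\n" for 1 .. 3;
    close($writer) || die "can't close writer: $!";
    POSIX::_exit(0);
}

# Parent: reads, never writes. Close the write end right away; otherwise
# the parent itself would hold the pipe open and <$reader> would block
# forever waiting for an EOF that cannot arrive.
close($writer) || die "can't close writer: $!";

my @lines = <$reader>;                  # returns once the child has closed
close($reader) || die "can't close reader: $!";
waitpid($pid, 0);

print "read ", scalar(@lines), " lines\n";
```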
=head2 Bidirectional Communication with Another Process
While this works reasonably well for unidirectional communication, what
about bidirectional communication? The most obvious approach doesn't work:
# THIS DOES NOT WORK!!
open(PROG_FOR_READING_AND_WRITING, "| some program |")
If you forget to C<use warnings>, you'll miss out entirely on the
helpful diagnostic message:
Can't do bidirectional pipe at -e line 1.
If you really want to, you can use the standard open2() from the
C<IPC::Open2> module to catch both ends. There's also an open3() in
C<IPC::Open3> for tridirectional I/O so you can also catch your child's
STDERR, but doing so would then require an awkward select() loop and
wouldn't allow you to use normal Perl input operations.
If you look at its source, you'll see that open2() uses low-level
primitives like the pipe() and exec() syscalls to create all the
connections. Although it might have been more efficient by using
socketpair(), this would have been even less portable than it already
is. The open2() and open3() functions are unlikely to work anywhere
except on a Unix system, or at least one purporting POSIX compliance.
=for TODO
Hold on, is this even true? First it says that socketpair() is avoided
for portability, but then it says it probably won't work except on
Unixy systems anyway. Which one of those is true?
Here's an example of using open2():
use FileHandle;
use IPC::Open2;
$pid = open2(*Reader, *Writer, "cat -un");
print Writer "stuff\n";
$got = <Reader>;
The problem with this is that buffering is really going to ruin your
day. Even though your C<Writer> filehandle is auto-flushed so the process
on the other end gets your data in a timely manner, you can't usually do
anything to force that process to give its data to you in a similarly quick
fashion. In this special case, we could actually do so, because we gave
I<cat> a B<-u> flag to make it unbuffered. But very few commands are
designed to operate over pipes, so this seldom works unless you yourself
wrote the program on the other end of the double-ended pipe.
A solution to this is to use a library which uses pseudottys to make your
program behave more reasonably. This way you don't have to have control
over the source code of the program you're using. The C<Expect> module
from CPAN also addresses this kind of thing. This module requires two
other modules from CPAN, C<IO::Pty> and C<IO::Stty>. It sets up a pseudo
terminal to interact with programs that insist on talking to the terminal
device driver. If your system is supported, this may be your best bet.
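When you do control the program on the other end, making it unbuffered is enough for open2() to behave. The sketch below assumes a tiny child written as a perl one-liner, run via C<$^X>, that sets C<$|> and upper-cases each line; the handle names C<$from_kid> and C<$to_kid> are illustrative.

```perl
use strict;
use warnings;
use IPC::Open2;

# The child turns on autoflush, so every reply is sent immediately.
my $pid = open2(my $from_kid, my $to_kid,
                $^X, "-e", '$| = 1; print uc($_) while <STDIN>');

print $to_kid "stuff\n";                # open2() autoflushes this handle
my $got = <$from_kid>;                  # would hang if the child buffered

close($to_kid);                         # child sees EOF and exits
close($from_kid);
waitpid($pid, 0);                       # open2() does not reap for you

print "child replied: $got";
```

Note the explicit waitpid(): open2() leaves reaping the child to the caller.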
=head2 Bidirectional Communication with Yourself
If you want, you may make low-level pipe() and fork() syscalls to stitch
this together by hand. This example only talks to itself, but you could
reopen the appropriate handles to STDIN and STDOUT and call other processes.
(The following example lacks proper error checking.)
#!/usr/bin/perl -w
# pipe1 - bidirectional communication using two pipe pairs
# designed for the socketpair-challenged
use IO::Handle; # thousands of lines just for autoflush :-(
pipe(PARENT_RDR, CHILD_WTR); # XXX: check failure?
pipe(CHILD_RDR, PARENT_WTR); # XXX: check failure?
CHILD_WTR->autoflush(1);
PARENT_WTR->autoflush(1);
if ($pid = fork()) {
close PARENT_RDR;
close PARENT_WTR;
print CHILD_WTR "Parent Pid $$ is sending this\n";
chomp($line = <CHILD_RDR>);
print "Parent Pid $$ just read this: '$line'\n";
close CHILD_RDR; close CHILD_WTR;
waitpid($pid, 0);
} else {
die "cannot fork: $!" unless defined $pid;
close CHILD_RDR;
close CHILD_WTR;
chomp($line = <PARENT_RDR>);
print "Child Pid $$ just read this: '$line'\n";
print PARENT_WTR "Child Pid $$ is sending this\n";
close PARENT_RDR;
close PARENT_WTR;
exit(0);
}
But you don't actually have to make two pipe calls. If you
have the socketpair() system call, it will do this all for you.
#!/usr/bin/perl -w
# pipe2 - bidirectional communication using socketpair
# "the best ones always go both ways"
use Socket;
use IO::Handle; # thousands of lines just for autoflush :-(
# We say AF_UNIX because although *_LOCAL is the
# POSIX 1003.1g form of the constant, many machines
# still don't have it.
socketpair(CHILD, PARENT, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
|| die "socketpair: $!";
CHILD->autoflush(1);
PARENT->autoflush(1);
if ($pid = fork()) {
close PARENT;
print CHILD "Parent Pid $$ is sending this\n";
chomp($line = <CHILD>);
print "Parent Pid $$ just read this: '$line'\n";
close CHILD;
waitpid($pid, 0);
} else {
die "cannot fork: $!" unless defined $pid;
close CHILD;
chomp($line = <PARENT>);
print "Child Pid $$ just read this: '$line'\n";
print PARENT "Child Pid $$ is sending this\n";
close PARENT;
exit(0);
}
=head1 Sockets: Client/Server Communication
While not entirely limited to Unix-derived operating systems (e.g., WinSock
on PCs provides socket support, as do some VMS libraries), you might not have
sockets on your system, in which case this section probably isn't going to
do you much good. With sockets, you can do both virtual circuits like TCP
streams and datagrams like UDP packets. You may be able to do even more
depending on your system.
The Perl functions for dealing with sockets have the same names as
the corresponding system calls in C, but their arguments tend to differ
for two reasons. First, Perl filehandles work differently than C file
descriptors. Second, Perl already knows the length of its strings, so you
don't need to pass that information.
One of the major problems with ancient, antemillennial socket code in Perl
was that it used hard-coded values for some of the constants, which
severely hurt portability. If you ever see code that does anything like
explicitly setting C<$AF_INET = 2>, you know you're in for big trouble.
An immeasurably superior approach is to use the C<Socket> module, which more
reliably grants access to the various constants and functions you'll need.
If you're not writing a server/client for an existing protocol like
NNTP or SMTP, you should give some thought to how your server will
know when the client has finished talking, and vice-versa. Most
protocols are based on one-line messages and responses (so one party
knows the other has finished when a "\n" is received) or multi-line
messages and responses that end with a period on an empty line
("\n.\n" terminates a message/response).
=head2 Internet Line Terminators
The Internet line terminator is "\015\012". Under ASCII variants of
Unix, that could usually be written as "\r\n", but under other systems,
"\r\n" might at times be "\015\015\012", "\012\012\015", or something
completely different. The standards specify writing "\015\012" to be
conformant (be strict in what you provide), but they also recommend
accepting a lone "\012" on input (be lenient in what you require).
We haven't always been very good about that in the code in this manpage,
but unless you're on a Mac from way back in its pre-Unix dark ages, you'll
probably be ok.
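The "strict in what you provide, lenient in what you require" rule can be sketched with two small helpers. The function names C<split_net_lines> and C<join_net_lines> and the sample input are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

my $EOL = "\015\012";                   # what we always send

# Lenient on input: accept "\015\012" or a lone "\012" as the terminator.
sub split_net_lines {
    my ($raw) = @_;
    return split /\015?\012/, $raw;
}

# Strict on output: terminate every line with "\015\012".
sub join_net_lines {
    return join "", map { $_ . $EOL } @_;
}

my @lines = split_net_lines("HELO example\015\012QUIT\012");
my $wire  = join_net_lines(@lines);

printf "%d lines, %d bytes on the wire\n", scalar(@lines), length($wire);
```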
=head2 Internet TCP Clients and Servers
Use Internet-domain sockets when you want to do client-server
communication that might extend to machines outside of your own system.
Here's a sample TCP client using Internet-domain sockets:
#!/usr/bin/perl -w
use strict;
use Socket;
my ($remote, $port, $iaddr, $paddr, $proto, $line);
$remote = shift || "localhost";
$port = shift || 2345; # random port
if ($port =~ /\D/) { $port = getservbyname($port, "tcp") }
die "No port" unless $port;
$iaddr = inet_aton($remote) || die "no host: $remote";
$paddr = sockaddr_in($port, $iaddr);
$proto = getprotobyname("tcp");
socket(SOCK, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
connect(SOCK, $paddr) || die "connect: $!";
while ($line = <SOCK>) {
print $line;
}
close (SOCK) || die "close: $!";
exit(0);
And here's a corresponding server to go along with it. We'll
leave the address as C<INADDR_ANY> so that the kernel can choose
the appropriate interface on multihomed hosts. If you want to sit
on a particular interface (like the external side of a gateway
or firewall machine), fill this in with your real address instead.
#!/usr/bin/perl -Tw
use strict;
BEGIN { $ENV{PATH} = "/usr/bin:/bin" }
use Socket;
use Carp;
my $EOL = "\015\012";
sub logmsg { print "$0 $$: @_ at ", scalar localtime(), "\n" }
my $port = shift || 2345;
die "invalid port" unless $port =~ /^ \d+ $/x;
my $proto = getprotobyname("tcp");
socket(Server, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, pack("l", 1))
|| die "setsockopt: $!";
bind(Server, sockaddr_in($port, INADDR_ANY)) || die "bind: $!";
listen(Server, SOMAXCONN) || die "listen: $!";
logmsg "server started on port $port";
my $paddr;
for ( ; $paddr = accept(Client, Server); close Client) {
my($port, $iaddr) = sockaddr_in($paddr);
my $name = gethostbyaddr($iaddr, AF_INET);
logmsg "connection from $name [",
inet_ntoa($iaddr),
"] at port $port";
print Client "Hello there, $name, it's now ",
scalar localtime(), $EOL;
}
And here's a multitasking version. It's multitasked in that
like most typical servers, it spawns (fork()s) a slave server to
handle the client request so that the master server can quickly
go back to service a new client.
#!/usr/bin/perl -Tw
use strict;
BEGIN { $ENV{PATH} = "/usr/bin:/bin" }
use Socket;
use Carp;
my $EOL = "\015\012";
sub spawn; # forward declaration
sub logmsg { print "$0 $$: @_ at ", scalar localtime(), "\n" }
my $port = shift || 2345;
die "invalid port" unless $port =~ /^ \d+ $/x;
my $proto = getprotobyname("tcp");
socket(Server, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, pack("l", 1))
|| die "setsockopt: $!";
bind(Server, sockaddr_in($port, INADDR_ANY)) || die "bind: $!";
listen(Server, SOMAXCONN) || die "listen: $!";
logmsg "server started on port $port";
my $waitedpid = 0;
my $paddr;
use POSIX ":sys_wait_h";
use Errno;
sub REAPER {
local $!; # don't let waitpid() overwrite current error
while ((my $pid = waitpid(-1, WNOHANG)) > 0 && WIFEXITED($?)) {
logmsg "reaped $pid" . ($? ? " with exit $?" : "");
}
$SIG{CHLD} = \&REAPER; # loathe SysV
}
$SIG{CHLD} = \&REAPER;
while (1) {
$paddr = accept(Client, Server) || do {
# try again if accept() returned because got a signal
next if $!{EINTR};
die "accept: $!";
};
my ($port, $iaddr) = sockaddr_in($paddr);
my $name = gethostbyaddr($iaddr, AF_INET);
logmsg "connection from $name [",
inet_ntoa($iaddr),
"] at port $port";
spawn sub {
$| = 1;
print "Hello there, $name, it's now ",
scalar localtime(),
$EOL;
exec "/usr/games/fortune" # XXX: "wrong" line terminators
or confess "can't exec fortune: $!";
};
close Client;
}
sub spawn {
my $coderef = shift;
unless (@_ == 0 && $coderef && ref($coderef) eq "CODE") {
confess "usage: spawn CODEREF";
}
my $pid;
unless (defined($pid = fork())) {
logmsg "cannot fork: $!";
return;
}
elsif ($pid) {
logmsg "begat $pid";
return; # I'm the parent
}
# else I'm the child -- go spawn
open(STDIN, "<&Client") || die "can't dup client to stdin";
open(STDOUT, ">&Client") || die "can't dup client to stdout";
## open(STDERR, ">&STDOUT") || die "can't dup stdout to stderr";
exit($coderef->());
}
This server takes the trouble to clone off a child version via fork()
for each incoming request. That way it can handle many requests at
once, which you might not always want. Even if you don't fork(), the
listen() will allow up to SOMAXCONN pending connections. Forking servers
have to be particularly careful about cleaning up their dead children
(called "zombies" in Unix parlance), because otherwise you'll quickly
fill up your process table. The REAPER subroutine is used here to
call waitpid() for any child processes that have finished, thereby
ensuring that they terminate cleanly and don't join the ranks of the
living dead.
Within the while loop we call accept() and check to see if it returns
a false value. This would normally indicate a system error needs
to be reported. However, the introduction of safe signals (see
L</Deferred Signals (Safe Signals)> above) in Perl 5.8.0 means that
accept() might also be interrupted when the process receives a signal.
This typically happens when one of the forked subprocesses exits and
notifies the parent process with a CHLD signal.
If accept() is interrupted by a signal, $! will be set to EINTR.
If this happens, we can safely continue to the next iteration of
the loop and another call to accept(). It is important that your
signal handling code not modify the value of $!, or else this test
will likely fail. In the REAPER subroutine we create a local version
of $! before calling waitpid(). When waitpid() sets $! to ECHILD as
it inevitably does when it has no more children waiting, it
updates the local copy and leaves the original unchanged.
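The effect of localizing $! can be checked in isolation. In this sketch the subroutine name C<reaper_like> and the use of ENOENT as the "original" error are assumptions made for illustration; the process has no children, so waitpid() fails with ECHILD right away.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";
use Errno qw(ENOENT ECHILD);

my $errno_inside;                       # what $! looked like in the handler

sub reaper_like {
    local $!;                           # protect the caller's $!
    1 while waitpid(-1, WNOHANG) > 0;   # no children: sets $! to ECHILD
    $errno_inside = $! + 0;
}

$! = ENOENT;                            # pretend an earlier call failed
reaper_like();
my $errno_after = $! + 0;               # restored when the sub returned

print "inside: ",  ($errno_inside == ECHILD ? "ECHILD" : $errno_inside), "\n";
print "after:  ",  ($errno_after  == ENOENT ? "ENOENT" : $errno_after),  "\n";
```

Without the C<local $!>, the second value would be ECHILD and a test such as C<$!{EINTR}> in the caller would no longer see the error it was meant to check.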
You should use the B<-T> flag to enable taint checking (see L<perlsec>)
even if you aren't running setuid or setgid. This is always a good idea
for servers or any program run on behalf of someone else (like CGI
scripts), because it lessens the chances that people from the outside will
be able to compromise your system.
Let's look at another TCP client. This one connects to the TCP "time"
service on a number of different machines and shows how far their clocks
differ from the system on which it's being run:
#!/usr/bin/perl -w
use strict;
use Socket;
my $SECS_OF_70_YEARS = 2208988800;
sub ctime { scalar localtime(shift() || time()) }
my $iaddr = gethostbyname("localhost");
my $proto = getprotobyname("tcp");
my $port = getservbyname("time", "tcp");
my $paddr = sockaddr_in(0, $iaddr);
my($host);
$| = 1;
printf "%-24s %8s %s\n", "localhost", 0, ctime();
foreach $host (@ARGV) {
printf "%-24s ", $host;
my $hisiaddr = inet_aton($host) || die "unknown host";
my $hispaddr = sockaddr_in($port, $hisiaddr);
socket(SOCKET, PF_INET, SOCK_STREAM, $proto)
|| die "socket: $!";
connect(SOCKET, $hispaddr) || die "connect: $!";
my $rtime = pack("C4", ());
read(SOCKET, $rtime, 4);
close(SOCKET);
my $histime = unpack("N", $rtime) - $SECS_OF_70_YEARS;
printf "%8d %s\n", $histime - time(), ctime($histime);
}
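The only arithmetic in that client is the epoch shift: the "time" service counts seconds from 1900, while Unix time counts from 1970. Here is a small sketch of that conversion on its own, with no network involved. The helper name time_reply_to_unix() is made up for this illustration, and the four-byte reply is built locally rather than read from a server.

```perl
use strict;
use warnings;

my $SECS_OF_70_YEARS = 2_208_988_800;   # seconds between 1900 and 1970

# Convert a 4-byte big-endian "time" reply into Unix epoch seconds.
sub time_reply_to_unix {
    my ($reply) = @_;
    die "short reply" unless length($reply) == 4;
    return unpack("N", $reply) - $SECS_OF_70_YEARS;
}

# Fake the reply a server would send exactly one day after the Unix epoch.
my $reply = pack("N", $SECS_OF_70_YEARS + 86_400);
my $unix  = time_reply_to_unix($reply);

print "$unix\n";                        # prints 86400
```

Checking the reply length first matters in practice, because read() may return fewer than four bytes if the remote host closes the connection early.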
=head2 Unix-Domain TCP Clients and Servers
That's fine for Internet-domain clients and servers, but what about local
communications? While you can use the same setup, sometimes you don't
want to. Unix-domain sockets are local to the current host, and are often
used internally to implement pipes. Unlike Internet domain sockets, Unix
domain sockets can show up in the file system with an ls(1) listing.
% ls -l /dev/log
srw-rw-rw- 1 root 0 Oct 31 07:23 /dev/log
You can test for these with Perl's B<-S> file test:
unless (-S "/dev/log") {
die "something's wicked with the log system";
}
Here's a sample Unix-domain client:
#!/usr/bin/perl -w
use Socket;
use strict;
my ($rendezvous, $line);
$rendezvous = shift || "catsock";
socket(SOCK, PF_UNIX, SOCK_STREAM, 0) || die "socket: $!";
connect(SOCK, sockaddr_un($rendezvous)) || die "connect: $!";
while (defined($line = <SOCK>)) {
print $line;
}
exit(0);
And here's a corresponding server. You don't have to worry about silly
network terminators here because Unix domain sockets are guaranteed
to be on the localhost, and thus everything works right.
#!/usr/bin/perl -Tw
use strict;
use Socket;
use Carp;
BEGIN { $ENV{PATH} = "/usr/bin:/bin" }
sub spawn; # forward declaration
sub logmsg { print "$0 $$: @_ at ", scalar localtime(), "\n" }
my $NAME = "catsock";
my $uaddr = sockaddr_un($NAME);
my $proto = getprotobyname("tcp");
socket(Server, PF_UNIX, SOCK_STREAM, 0) || die "socket: $!";
unlink($NAME);
bind (Server, $uaddr) || die "bind: $!";
listen(Server, SOMAXCONN) || die "listen: $!";
logmsg "server started on $NAME";
my $waitedpid;
use POSIX ":sys_wait_h";
sub REAPER {
my $child;
while (($waitedpid = waitpid(-1, WNOHANG)) > 0) {
logmsg "reaped $waitedpid" . ($? ? " with exit $?" : "");
}
$SIG{CHLD} = \&REAPER; # loathe SysV
}
$SIG{CHLD} = \&REAPER;
for ( $waitedpid = 0;
accept(Client, Server) || $waitedpid;
$waitedpid = 0, close Client)
{
next if $waitedpid;
logmsg "connection on $NAME";
spawn sub {
print "Hello there, it's now ", scalar localtime(), "\n";
exec("/usr/games/fortune") || die "can't exec fortune: $!";
};
}
sub spawn {
my $coderef = shift();
unless (@_ == 0 && $coderef && ref($coderef) eq "CODE") {
confess "usage: spawn CODEREF";
}
my $pid;
unless (defined($pid = fork())) {
logmsg "cannot fork: $!";
return;
}
elsif ($pid) {
logmsg "begat $pid";
return; # I'm the parent
}
else {
# I'm the child -- go spawn
}
open(STDIN, "<&Client") || die "can't dup client to stdin";
open(STDOUT, ">&Client") || die "can't dup client to stdout";
## open(STDERR, ">&STDOUT") || die "can't dup stdout to stderr";
exit($coderef->());
}
As you see, it's remarkably similar to the Internet domain TCP server, so
much so, in fact, that the spawn(), logmsg(), and REAPER() functions
are the same as in the other server.
So why would you ever want to use a Unix domain socket instead of a
simpler named pipe? Because a named pipe doesn't give you sessions. You
can't tell one process's data from another's. With socket programming,
you get a separate session for each client; that's why accept() takes two
arguments.
For example, let's say that you have a long-running database server daemon
that you want folks to be able to access from the Web, but only
if they go through a CGI interface. You'd have a small, simple CGI
program that does whatever checks and logging you feel like, and then acts
as a Unix-domain client and connects to your private server.
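To see the per-client sessions in action, here is a minimal sketch that uses IO::Socket::UNIX rather than the low-level calls shown above. The socket name F<demo.sock>, the temporary directory, and the client names are all invented for the illustration; each accept() hands back a fresh handle that belongs to exactly one client.

```perl
use strict;
use warnings;
use IO::Socket;
use IO::Socket::UNIX;
use File::Temp qw(tempdir);

my $dir  = tempdir();                   # scratch directory for the rendezvous point
my $path = "$dir/demo.sock";            # hypothetical socket name

my $server = IO::Socket::UNIX->new(
    Type   => SOCK_STREAM,
    Local  => $path,
    Listen => 5,
) || die "can't listen on $path: $!";

my @seen;
for my $name (qw(alice bob)) {
    my $pid = fork();
    die "cannot fork: $!" unless defined $pid;
    if ($pid == 0) {                    # child: act as one client
        my $c = IO::Socket::UNIX->new(Type => SOCK_STREAM, Peer => $path)
            || die "connect: $!";
        print $c "hello from $name\n";
        close($c);
        exit(0);
    }
    my $session = $server->accept() || die "accept: $!";   # one handle per client
    my $line = <$session>;
    push @seen, $line;
    close($session);
    waitpid($pid, 0);
}

close($server);
unlink($path);
rmdir($dir);
print @seen;
```

With a named pipe, both greetings would arrive on the same handle with nothing to say who sent which.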
=head1 TCP Clients with IO::Socket
For those preferring a higher-level interface to socket programming, the
IO::Socket module provides an object-oriented approach. If for some reason
you lack this module, you can just fetch IO::Socket from CPAN, where you'll also
find modules providing easy interfaces to the following systems: DNS, FTP,
Ident (RFC 931), NIS and NISPlus, NNTP, Ping, POP3, SMTP, SNMP, SSLeay,
Telnet, and Time--to name just a few.
=head2 A Simple Client
Here's a client that creates a TCP connection to the "daytime"
service at port 13 of the host name "localhost" and prints out everything
that the server there cares to provide.
#!/usr/bin/perl -w
use IO::Socket;
$remote = IO::Socket::INET->new(
Proto => "tcp",
PeerAddr => "localhost",
PeerPort => "daytime(13)",
)
|| die "can't connect to daytime service on localhost";
while (<$remote>) { print }
When you run this program, you should get something back that
looks like this:
Wed May 14 08:40:46 MDT 1997
Here are what those parameters to the new() constructor mean:
=over 4
=item C<Proto>
This is which protocol to use. In this case, the socket handle returned
will be connected to a TCP socket, because we want a stream-oriented
connection, that is, one that acts pretty much like a plain old file.
Not all sockets are of this type. For example, the UDP protocol
can be used to make a datagram socket, used for message-passing.
=item C<PeerAddr>
This is the name or Internet address of the remote host the server is
running on. We could have specified a longer name like C<"www.perl.com">,
or an address like C<"207.171.7.72">. For demonstration purposes, we've
used the special hostname C<"localhost">, which should always mean the
current machine you're running on. The corresponding Internet address
for localhost is C<"127.0.0.1">, if you'd rather use that.
=item C<PeerPort>
This is the service name or port number we'd like to connect to.
We could have gotten away with using just C<"daytime"> on systems with a
well-configured system services file (which is found in
I</etc/services> under Unixy systems), but here we've specified the
port number (13) in parentheses. Using just the number would have also
worked, but numeric literals make careful programmers nervous.
=back
Notice how the return value from the C<new> constructor is used as
a filehandle in the C<while> loop? That's what's called an I<indirect
filehandle>, a scalar variable containing a filehandle. You can use
it the same way you would a normal filehandle. For example, you
can read one line from it this way:
$line = <$handle>;
all remaining lines from it this way:
@lines = <$handle>;
and send a line of data to it this way:
print $handle "some data\n";
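Nothing about these three idioms is specific to sockets. As a sketch you can run without any network at all, the same operations work on indirect filehandles opened on in-memory strings; the variable names $text, $sink, and $out are just for this illustration.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";
open(my $handle, "<", \$text) || die "can't open in-memory file: $!";

my $line  = <$handle>;      # scalar context: one line
my @lines = <$handle>;      # list context: all remaining lines
close($handle);

my $out = "";
open(my $sink, ">", \$out) || die "can't open in-memory file: $!";
print $sink "some data\n";  # note: no comma after the handle
close($sink);

print "one line: $line";
print "remaining: ", scalar(@lines), " lines\n";
```

The only thing that changes with a socket is where the bytes come from and go to.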
=head2 A Webget Client
Here's a simple client that takes a remote host to fetch a document
from, and then a list of files to get from that host. This is a
more interesting client than the previous one because it first sends
something to the server before fetching the server's response.
#!/usr/bin/perl -w
use IO::Socket;
unless (@ARGV > 1) { die "usage: $0 host url ..." }
$host = shift(@ARGV);
$EOL = "\015\012";
$BLANK = $EOL x 2;
for my $document (@ARGV) {
$remote = IO::Socket::INET->new( Proto => "tcp",
PeerAddr => $host,
PeerPort => "http(80)",
) || die "cannot connect to httpd on $host";
$remote->autoflush(1);
print $remote "GET $document HTTP/1.0" . $BLANK;
while ( <$remote> ) { print }
close $remote;
}
The web server handling the HTTP service is assumed to be at
its standard port, number 80. If the server you're trying to
connect to is at a different port, like 1080 or 8080, you should specify it
as the named-parameter pair, C<< PeerPort => 8080 >>. The C<autoflush>
method is used on the socket because otherwise the system would buffer
up the output we sent it. (If you're on a prehistoric Mac, you'll also
need to change every C<"\n"> in your code that sends data over the network
to be a C<"\015\012"> instead.)
Connecting to the server is only the first part of the process: once you
have the connection, you have to use the server's language. Each server
on the network has its own little command language that it expects as
input. The string that we send to the server starting with "GET" is in
HTTP syntax. In this case, we simply request each specified document.
Yes, we really are making a new connection for each document, even though
it's the same host. That's the way you always used to have to speak HTTP.
Recent versions of web browsers may request that the remote server leave
the connection open a little while, but the server doesn't have to honor
such a request.
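The request itself is nothing more than a string in the right shape. This sketch builds the same text that webget sends, so you can inspect it before putting it on the wire; the helper name http_get_request() is an invention for this example.

```perl
use strict;
use warnings;

my $EOL   = "\015\012";     # network line terminator: CR LF
my $BLANK = $EOL x 2;       # end of request line plus the empty line

# Build a bare HTTP/1.0 GET request for one document.
sub http_get_request {
    my ($document) = @_;
    return "GET $document HTTP/1.0" . $BLANK;
}

my $request = http_get_request("/guanaco.html");
(my $visible = $request) =~ s/\015\012/<CRLF>/g;   # make the terminators visible
print "$visible\n";
```

The empty line at the end is what tells the server that the request headers are complete.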
Here's an example of running that program, which we'll call I<webget>:
% webget www.perl.com /guanaco.html
HTTP/1.1 404 File Not Found
Date: Thu, 08 May 1997 18:02:32 GMT
Server: Apache/1.2b6
Connection: close
Content-type: text/html
<HEAD><TITLE>404 File Not Found</TITLE></HEAD>
<BODY><H1>File Not Found</H1>
The requested URL /guanaco.html was not found on this server.<P>
</BODY>
Ok, so that's not very interesting, because it didn't find that
particular document. But a long response wouldn't have fit on this page.
For a more featureful version of this program, you should look to
the I<lwp-request> program included with the LWP modules from CPAN.
=head2 Interactive Client with IO::Socket
Well, that's all fine if you want to send one command and get one answer,
but what about setting up something fully interactive, somewhat like
the way I<telnet> works? That way you can type a line, get the answer,
type a line, get the answer, etc.
This client is more complicated than the two we've done so far, but if
you're on a system that supports the powerful C<fork> call, the solution
isn't that rough. Once you've made the connection to whatever service
you'd like to chat with, call C<fork> to clone your process. Each of
these two identical processes has a very simple job to do: the parent
copies everything from the socket to standard output, while the child
simultaneously copies everything from standard input to the socket.
To accomplish the same thing using just one process would be I<much>
harder, because it's easier to code two processes to do one thing than it
is to code one process to do two things. (This keep-it-simple principle
is a cornerstone of the Unix philosophy, and of good software engineering as
well, which is probably why it's spread to other systems.)
Here's the code:
#!/usr/bin/perl -w
use strict;
use IO::Socket;
my ($host, $port, $kidpid, $handle, $line);
unless (@ARGV == 2) { die "usage: $0 host port" }
($host, $port) = @ARGV;
# create a tcp connection to the specified host and port
$handle = IO::Socket::INET->new(Proto => "tcp",
PeerAddr => $host,
PeerPort => $port)
|| die "can't connect to port $port on $host: $!";
$handle->autoflush(1); # so output gets there right away
print STDERR "[Connected to $host:$port]\n";
# split the program into two processes, identical twins
die "can't fork: $!" unless defined($kidpid = fork());
# the if{} block runs only in the parent process
if ($kidpid) {
# copy the socket to standard output
while (defined ($line = <$handle>)) {
print STDOUT $line;
}
kill("TERM", $kidpid); # send SIGTERM to child
}
# the else{} block runs only in the child process
else {
# copy standard input to the socket
while (defined ($line = <STDIN>)) {
print $handle $line;
}
exit(0); # just in case
}
The C<kill> function in the parent's C<if> block is there to send a
signal to our child process, currently running in the C<else> block,
as soon as the remote server has closed its end of the connection.
If the remote server sends data a byte at a time, and you need that
data immediately without waiting for a newline (which might not happen),
you may wish to replace the C<while> loop in the parent with the
following:
my $byte;
while (sysread($handle, $byte, 1) == 1) {
print STDOUT $byte;
}
Making a system call for each byte you want to read is not very efficient
(to put it mildly) but is the simplest to explain and works reasonably
well.
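If the per-byte overhead bothers you, one possible middle ground is to ask sysread() for a larger chunk. It returns as soon as any data is available, so you still don't wait for a newline. The sketch below uses a local pipe as a stand-in for the socket, and the 4096-byte chunk size is an arbitrary choice.

```perl
use strict;
use warnings;

pipe(my $reader, my $writer) || die "pipe: $!";
defined(syswrite($writer, "partial line, no newline")) || die "syswrite: $!";
close($writer);                         # the "server" hangs up

my ($chunk, $received) = ("", "");
while (1) {
    my $n = sysread($reader, $chunk, 4096);   # up to 4096 bytes per call
    die "sysread: $!" unless defined $n;
    last if $n == 0;                          # 0 means end of file
    $received .= $chunk;
}
close($reader);

print "got: $received\n";
```

Remember not to mix sysread() with buffered reads such as C<< <$handle> >> on the same handle.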
=head1 TCP Servers with IO::Socket
As always, setting up a server is a little bit more involved than running a client.
The model is that the server creates a special kind of socket that
does nothing but listen on a particular port for incoming connections.
It does this by calling the C<< IO::Socket::INET->new() >> method with
slightly different arguments than the client did.
=over 4
=item Proto
This is which protocol to use. Like our clients, we'll
still specify C<"tcp"> here.
=item LocalPort
We specify a local
port in the C<LocalPort> argument, which we didn't do for the client.
This is the service name or port number for which you want to be the
server. (Under Unix, ports under 1024 are restricted to the
superuser.) In our sample, we'll use port 9000, but you can use
any port that's not currently in use on your system. If you try
to use one already in use, you'll get an "Address already in use"
message. Under Unix, the C<netstat -a> command will show
which services currently have servers.
=item Listen
The C<Listen> parameter is set to the maximum number of
pending connections we can accept until we turn away incoming clients.
Think of it as a call-waiting queue for your telephone.
The low-level Socket module has a special symbol for the system maximum, which
is SOMAXCONN.
=item Reuse
The C<Reuse> parameter is needed so that we can restart our server
manually without waiting a few minutes to allow system buffers to
clear out.
=back
Once the generic server socket has been created using the parameters
listed above, the server then waits for a new client to connect
to it. The server blocks in the C<accept> method, which eventually accepts a
bidirectional connection from the remote client. (Make sure to autoflush
this handle to circumvent buffering.)
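The listen-then-accept sequence can be tried out inside a single process on the loopback interface. This is only a sketch under a few assumptions: it binds to 127.0.0.1 with C<< LocalPort => 0 >> so that the system picks a free port, and it plays both server and client itself, which works because the pending connection sits in the listen queue until accept() collects it.

```perl
use strict;
use warnings;
use IO::Socket;

my $server = IO::Socket::INET->new(
    Proto     => "tcp",
    LocalAddr => "127.0.0.1",
    LocalPort => 0,             # 0 means let the system pick a free port
    Listen    => SOMAXCONN,
    Reuse     => 1,
) || die "can't set up server: $!";

my $port = $server->sockport(); # find out which port we were given

my $client = IO::Socket::INET->new(
    Proto    => "tcp",
    PeerAddr => "127.0.0.1",
    PeerPort => $port,
) || die "can't connect to port $port: $!";

my $session = $server->accept() || die "accept: $!";
$session->autoflush(1);         # don't let the greeting sit in a buffer
print $session "Welcome\n";

my $greeting = <$client>;
print "client read: $greeting";

close($_) for $client, $session, $server;
```

A real server would of course loop on accept() instead of handling a single connection and exiting.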
To add to user-friendliness, our server prompts the user for commands.
Most servers don't do this. Because of the prompt without a newline,
you'll have to use the C<sysread> variant of the interactive client above.
This server accepts one of five different commands, sending output back to
the client. Unlike most network servers, this one handles only one
incoming client at a time. Multitasking servers are covered in
Chapter 16 of the Camel.
Here's the code:
#!/usr/bin/perl -w
use IO::Socket;
use Net::hostent; # for OOish version of gethostbyaddr
$PORT = 9000; # pick something not in use
$server = IO::Socket::INET->new( Proto => "tcp",
LocalPort => $PORT,
Listen => SOMAXCONN,
Reuse => 1);
die "can't setup server" unless $server;
print "[Server $0 accepting clients]\n";
while ($client = $server->accept()) {
$client->autoflush(1);
print $client "Welcome to $0; type help for command list.\n";
$hostinfo = gethostbyaddr($client->peeraddr);
printf "[Connect from %s]\n",
$hostinfo ? $hostinfo->name : $client->peerhost;
print $client "Command? ";
while ( <$client>) {
next unless /\S/; # blank line
if (/quit|exit/i) { last }
elsif (/date|time/i) { printf $client "%s\n", scalar localtime() }
elsif (/who/i ) { print $client `who 2>&1` }
elsif (/cookie/i ) { print $client `/usr/games/fortune 2>&1` }
elsif (/motd/i ) { print $client `cat /etc/motd 2>&1` }
else {
print $client "Commands: quit date who cookie motd\n";
}
} continue {
print $client "Command? ";
}
close $client;
}
=head1 UDP: Message Passing
Another kind of client-server setup is one that uses not connections, but
messages. UDP communications involve much lower overhead but also provide
less reliability, as there are no promises that messages will arrive at
all, let alone in order and unmangled. Still, UDP offers some advantages
over TCP, including being able to "broadcast" or "multicast" to a whole
bunch of destination hosts at once (usually on your local subnet). If you
find yourself overly concerned about reliability and start building checks
into your message system, then you probably should use just TCP to start
with.
UDP datagrams are I<not> a bytestream and should not be treated as such.
This makes using I/O mechanisms with internal buffering like stdio (i.e.
print() and friends) especially cumbersome. Use syswrite(), or better
send(), like in the example below.
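For a feel of datagram semantics before tackling the full program, here is a minimal sketch in which a UDP socket on the loopback interface sends a single datagram to itself. The five-second timeout and the "ping" payload are arbitrary choices for the illustration.

```perl
use strict;
use warnings;
use Socket;

my $proto = getprotobyname("udp") || 0;    # 0 lets the system pick the default
socket(my $sock, PF_INET, SOCK_DGRAM, $proto)   || die "socket: $!";
bind($sock, sockaddr_in(0, INADDR_LOOPBACK))    || die "bind: $!";
my $self = getsockname($sock);             # our own packed address

# One send() is one datagram; there is no stream to buffer.
defined(send($sock, "ping", 0, $self))          || die "send: $!";

my $rin = "";
vec($rin, fileno($sock), 1) = 1;
my $rout;
select($rout = $rin, undef, undef, 5.0)         || die "no datagram within 5 seconds";

my $msg  = "";
my $from = recv($sock, $msg, 64, 0);       # returns the sender's packed address
defined($from)                                  || die "recv: $!";

my ($port, $addr) = sockaddr_in($from);
print "got '$msg' from ", inet_ntoa($addr), ":$port\n";
close($sock);
```

Each recv() returns at most one datagram, no matter how large a buffer you offer it.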
Here's a UDP program similar to the sample Internet TCP client given
earlier. However, instead of checking one host at a time, the UDP version
will check many of them asynchronously by simulating a multicast and then
using select() to do a timed-out wait for I/O. To do something similar
with TCP, you'd have to use a different socket handle for each host.
#!/usr/bin/perl -w
use strict;
use Socket;
use Sys::Hostname;
my ( $count, $hisiaddr, $hispaddr, $histime,
$host, $iaddr, $paddr, $port, $proto,
$rin, $rout, $rtime, $SECS_OF_70_YEARS);
$SECS_OF_70_YEARS = 2_208_988_800;
$iaddr = gethostbyname(hostname());
$proto = getprotobyname("udp");
$port = getservbyname("time", "udp");
$paddr = sockaddr_in(0, $iaddr); # 0 means let kernel pick
socket(SOCKET, PF_INET, SOCK_DGRAM, $proto) || die "socket: $!";
bind(SOCKET, $paddr) || die "bind: $!";
$| = 1;
printf "%-12s %8s %s\n", "localhost", 0, scalar localtime();
$count = 0;
for $host (@ARGV) {
$count++;
$hisiaddr = inet_aton($host) || die "unknown host";
$hispaddr = sockaddr_in($port, $hisiaddr);
defined(send(SOCKET, 0, 0, $hispaddr)) || die "send $host: $!";
}
$rin = "";
vec($rin, fileno(SOCKET), 1) = 1;
# timeout after 10.0 seconds
while ($count && select($rout = $rin, undef, undef, 10.0)) {
$rtime = "";
$hispaddr = recv(SOCKET, $rtime, 4, 0) || die "recv: $!";
($port, $hisiaddr) = sockaddr_in($hispaddr);
$host = gethostbyaddr($hisiaddr, AF_INET);
$histime = unpack("N", $rtime) - $SECS_OF_70_YEARS;
printf "%-12s ", $host;
printf "%8d %s\n", $histime - time(), scalar localtime($histime);
$count--;
}
This example does not include any retries and may consequently fail to
contact a reachable host. The most prominent reason for this is congestion
of the queues on the sending host if the number of hosts to contact is
sufficiently large.
=head1 SysV IPC
While System V IPC isn't so widely used as sockets, it still has some
interesting uses. However, you cannot use SysV IPC or Berkeley mmap() to
have a variable shared amongst several processes. That's because Perl
would reallocate your string when you weren't wanting it to. You might
look into the C<IPC::Shareable> or C<threads::shared> modules for that.
Here's a small example showing shared memory usage.
use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRUSR S_IWUSR);
$size = 2000;
$id = shmget(IPC_PRIVATE, $size, S_IRUSR | S_IWUSR);
defined($id) || die "shmget: $!";
print "shm key $id\n";
$message = "Message #1";
shmwrite($id, $message, 0, 60) || die "shmwrite: $!";
print "wrote: '$message'\n";
shmread($id, $buff, 0, 60) || die "shmread: $!";
print "read : '$buff'\n";
# the buffer of shmread is zero-character end-padded.
substr($buff, index($buff, "\0")) = "";
print "un" unless $buff eq $message;
print "swell\n";
print "deleting shm $id\n";
shmctl($id, IPC_RMID, 0) || die "shmctl: $!";
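One detail of that example deserves a closer look. If the buffer contains no zero character at all, index() returns -1, and the substr() assignment then chops the final character off the data. A substitution is one possible way to sidestep that. The sketch below doesn't touch shared memory; it fakes the padded buffer with a plain string.

```perl
use strict;
use warnings;

my $message = "Message #1";
my $buff    = $message . ("\0" x 50);   # stand-in for what shmread() hands back

# Strip everything from the first zero character onward, if there is one.
(my $trimmed = $buff) =~ s/\0.*\z//s;

my $full = "x" x 60;                    # a buffer with no zero character at all
(my $kept = $full) =~ s/\0.*\z//s;      # nothing matches, so nothing is removed

print "trimmed: '$trimmed'\n";
print "kept ", length($kept), " of ", length($full), " characters\n";
```

The C</s> modifier is there so that C<.> also matches any newlines that happen to follow the first zero character.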
Here's an example of a semaphore:
use IPC::SysV qw(IPC_CREAT);
$IPC_KEY = 1234;
$id = semget($IPC_KEY, 10, 0666 | IPC_CREAT);
defined($id) || die "semget: $!";
print "sem id $id\n";
Put this code in a separate file to be run in more than one process.
Call the file F<take>:
# look up the existing semaphore set
$IPC_KEY = 1234;
$id = semget($IPC_KEY, 0, 0);
defined($id) || die "semget: $!";
$semnum = 0;
$semflag = 0;
# "take" semaphore
# wait for semaphore to be zero
$semop = 0;
$opstring1 = pack("s!s!s!", $semnum, $semop, $semflag);
# Increment the semaphore count
$semop = 1;
$opstring2 = pack("s!s!s!", $semnum, $semop, $semflag);
$opstring = $opstring1 . $opstring2;
semop($id, $opstring) || die "semop: $!";
Put this code in a separate file to be run in more than one process.
Call this file F<give>:
# "give" the semaphore
# run this in the original process and you will see
# that the second process continues
$IPC_KEY = 1234;
$id = semget($IPC_KEY, 0, 0);
die unless defined($id);
$semnum = 0;
$semflag = 0;
# Decrement the semaphore count
$semop = -1;
$opstring = pack("s!s!s!", $semnum, $semop, $semflag);
semop($id, $opstring) || die "semop: $!";
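The operation strings handed to semop() are just packed triples of native shorts: semaphore number, operation, and flags. Here is a sketch of that packing on its own, with no semaphores involved; the helper name sem_op() is made up for this illustration.

```perl
use strict;
use warnings;

# Pack one semaphore operation: (semaphore number, operation, flags).
sub sem_op {
    my ($semnum, $semop, $semflag) = @_;
    return pack("s!s!s!", $semnum, $semop, $semflag);
}

my $op_size = 3 * length(pack("s!", 0));    # three native shorts per operation

my $take = sem_op(0, 0, 0)      # wait for semaphore 0 to reach zero ...
         . sem_op(0, 1, 0);     # ... then increment it
my $give = sem_op(0, -1, 0);    # decrement semaphore 0

my @decoded = unpack("s!*", $take);
print "take holds ", length($take) / $op_size, " operations: @decoded\n";
```

Concatenating the two operations in "take" matters, because semop() applies the whole string as a single atomic step.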
The SysV IPC code above was written long ago, and it's definitely
clunky looking. For a more modern look, see the IPC::SysV module.
A small example demonstrating SysV message queues:
use IPC::SysV qw(IPC_PRIVATE IPC_RMID IPC_CREAT S_IRUSR S_IWUSR);
my $id = msgget(IPC_PRIVATE, IPC_CREAT | S_IRUSR | S_IWUSR);
defined($id) || die "msgget failed: $!";
my $sent = "message";
my $type_sent = 1234;
msgsnd($id, pack("l! a*", $type_sent, $sent), 0)
|| die "msgsnd failed: $!";
msgrcv($id, my $rcvd_buf, 60, 0, 0)
|| die "msgrcv failed: $!";
my($type_rcvd, $rcvd) = unpack("l! a*", $rcvd_buf);
if ($rcvd eq $sent) {
print "okay\n";
} else {
print "not okay\n";
}
msgctl($id, IPC_RMID, 0) || die "msgctl failed: $!\n";
=head1 NOTES
Most of these routines quietly but politely return C<undef> when they
fail instead of causing your program to die right then and there due to
an uncaught exception. (Actually, some of the new I<Socket> conversion
functions do croak() on bad arguments.) It is therefore essential to
check return values from these functions. Always begin your socket
programs this way for optimal success, and don't forget to add the B<-T>
taint-checking flag to the C<#!> line for servers:
#!/usr/bin/perl -Tw
use strict;
use sigtrap;
use Socket;
=head1 BUGS
These routines all create system-specific portability problems. As noted
elsewhere, Perl is at the mercy of your C libraries for much of its system
behavior. It's probably safest to assume broken SysV semantics for
signals and to stick with simple TCP and UDP socket operations; e.g., don't
try to pass open file descriptors over a local UDP datagram socket if you
want your code to stand a chance of being portable.
=head1 AUTHOR
Tom Christiansen, with occasional vestiges of Larry Wall's original
version and suggestions from the Perl Porters.
=head1 SEE ALSO
There's a lot more to networking than this, but this should get you
started.
For intrepid programmers, the indispensable textbook is I<Unix Network
Programming, 2nd Edition, Volume 1> by W. Richard Stevens (published by
Prentice-Hall). Most books on networking address the subject from the
perspective of a C programmer; translation to Perl is left as an exercise
for the reader.
The IO::Socket(3) manpage describes the object library, and the Socket(3)
manpage describes the low-level interface to sockets. Besides the obvious
functions in L<perlfunc>, you should also check out the F<modules> file at
your nearest CPAN site, especially
L<http://www.cpan.org/modules/00modlist.long.html#ID5_Networking_>.
See L<perlmodlib> or best yet, the F<Perl FAQ> for a description
of what CPAN is and where to get it if the previous link doesn't work
for you.
Section 5 of CPAN's F<modules> file is devoted to "Networking, Device
Control (modems), and Interprocess Communication", and contains numerous
unbundled modules: networking modules, Chat and Expect operations,
CGI programming, DCE, FTP, IPC, NNTP, Proxy, Ptty, RPC, SNMP, SMTP, Telnet,
Threads, and ToolTalk--to name just a few.
=head1 NAME
perlreapi - Perl regular expression plugin interface
=head1 DESCRIPTION
As of Perl 5.9.5 there is a new interface for plugging and using
regular expression engines other than the default one.
Each engine is supposed to provide access to a constant structure of the
following format:
typedef struct regexp_engine {
REGEXP* (*comp) (pTHX_
const SV * const pattern, const U32 flags);
I32 (*exec) (pTHX_
REGEXP * const rx,
char* stringarg,
char* strend, char* strbeg,
SSize_t minend, SV* sv,
void* data, U32 flags);
char* (*intuit) (pTHX_
REGEXP * const rx, SV *sv,
const char * const strbeg,
char *strpos, char *strend, U32 flags,
struct re_scream_pos_data_s *data);
SV* (*checkstr) (pTHX_ REGEXP * const rx);
void (*free) (pTHX_ REGEXP * const rx);
void (*numbered_buff_FETCH) (pTHX_
REGEXP * const rx,
const I32 paren,
SV * const sv);
void (*numbered_buff_STORE) (pTHX_
REGEXP * const rx,
const I32 paren,
SV const * const value);
I32 (*numbered_buff_LENGTH) (pTHX_
REGEXP * const rx,
const SV * const sv,
const I32 paren);
SV* (*named_buff) (pTHX_
REGEXP * const rx,
SV * const key,
SV * const value,
U32 flags);
SV* (*named_buff_iter) (pTHX_
REGEXP * const rx,
const SV * const lastkey,
const U32 flags);
SV* (*qr_package)(pTHX_ REGEXP * const rx);
#ifdef USE_ITHREADS
void* (*dupe) (pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
#endif
REGEXP* (*op_comp) (...);
} regexp_engine;
When a regexp is compiled, its C<engine> field is then set to point at
the appropriate structure, so that when it needs to be used Perl can find
the right routines to do so.
In order to install a new regexp handler, C<$^H{regcomp}> is set
to an integer which (when cast appropriately) resolves to one of these
structures. When compiling, the C<comp> method is executed, and the
resulting C<regexp> structure's engine field is expected to point back at
the same structure.
The pTHX_ symbol in the definition is a macro used by Perl under threading
to provide an extra argument to the routine holding a pointer back to
the interpreter that is executing the regexp. So under threading all
routines get an extra argument.
=head1 Callbacks
=head2 comp
REGEXP* comp(pTHX_ const SV * const pattern, const U32 flags);
Compile the pattern stored in C<pattern> using the given C<flags> and
return a pointer to a prepared C<REGEXP> structure that can perform
the match. See L</The REGEXP structure> below for an explanation of
the individual fields in the REGEXP struct.
The C<pattern> parameter is the scalar that was used as the
pattern. Previous versions of Perl would pass two C<char*> indicating
the start and end of the stringified pattern; the following snippet can
be used to get the old parameters:
STRLEN plen;
char* exp = SvPV(pattern, plen);
char* xend = exp + plen;
Since any scalar can be passed as a pattern, it's possible to implement
an engine that does something with an array (C<< "ook" =~ [ qw/ eek
hlagh / ] >>) or with the non-stringified form of a compiled regular
expression (C<< "ook" =~ qr/eek/ >>). Perl's own engine will always
stringify everything using the snippet above, but that doesn't mean
other engines have to.
The C<flags> parameter is a bitfield which indicates which of the
C<msixpn> flags the regex was compiled with. It also contains
additional info, such as whether C<use locale> is in effect.
The C<eogc> flags are stripped out before being passed to the comp
routine. The regex engine does not need to know if any of these
are set, as those flags should only affect what Perl does with the
pattern and its match variables, not how it gets compiled and
executed.
By the time the comp callback is called, some of these flags have
already had effect (noted below where applicable). However most of
their effect occurs after the comp callback has run, in routines that
read the C<< rx->extflags >> field which it populates.
In general the flags should be preserved in C<< rx->extflags >> after
compilation, although the regex engine might want to add or delete
some of them to invoke or disable some special behavior in Perl. The
flags along with any special behavior they cause are documented below:
The pattern modifiers:
=over 4
=item C</m> - RXf_PMf_MULTILINE
If this is in C<< rx->extflags >> it will be passed to
C<Perl_fbm_instr> by C<pp_split> which will treat the subject string
as a multi-line string.
=item C</s> - RXf_PMf_SINGLELINE
=item C</i> - RXf_PMf_FOLD
=item C</x> - RXf_PMf_EXTENDED
If present on a regex, C<"#"> comments will be handled differently by the
tokenizer in some cases.
TODO: Document those cases.
=item C</p> - RXf_PMf_KEEPCOPY
TODO: Document this
=item Character set
The character set rules are determined by an enum that is contained
in this field. This is still experimental and subject to change, but
the current interface returns the rules by use of the in-line function
C<get_regex_charset(const U32 flags)>. The only currently documented
value returned from it is REGEX_LOCALE_CHARSET, which is set if
C<use locale> is in effect. If present in C<< rx->extflags >>,
C<split> will use the locale dependent definition of whitespace
when RXf_SKIPWHITE or RXf_WHITE is in effect. ASCII whitespace
is defined as per L<isSPACE|perlapi/isSPACE>, and by the internal
macros C<is_utf8_space> under UTF-8, and C<isSPACE_LC> under C<use
locale>.
=back
Additional flags:
=over 4
=item RXf_SPLIT
This flag was removed in perl 5.18.0. C<split ' '> is now special-cased
solely in the parser. RXf_SPLIT is still #defined, so you can test for it.
This is how it used to work:
If C<split> is invoked as C<split ' '> or with no arguments (which
really means C<split(' ', $_)>, see L<split|perlfunc/split>), Perl will
set this flag. The regex engine can then check for it and set the
SKIPWHITE and WHITE extflags. To do this, the Perl engine does:
if (flags & RXf_SPLIT && r->prelen == 1 && r->precomp[0] == ' ')
r->extflags |= (RXf_SKIPWHITE|RXf_WHITE);
=back
These flags can be set during compilation to enable optimizations in
the C<split> operator.
=over 4
=item RXf_SKIPWHITE
This flag was removed in perl 5.18.0. It is still #defined, so you can
set it, but doing so will have no effect. This is how it used to work:
If the flag is present in C<< rx->extflags >> C<split> will delete
whitespace from the start of the subject string before it's operated
on. What is considered whitespace depends on whether the subject is a
UTF-8 string and whether the C<RXf_PMf_LOCALE> flag is set.
If RXf_WHITE is set in addition to this flag, C<split> will behave like
C<split " "> under the Perl engine.
=item RXf_START_ONLY
Tells the split operator to split the target string on newlines
(C<\n>) without invoking the regex engine.
Perl's engine sets this if the pattern is C</^/> (C<plen == 1 && *exp
== '^'>), even under C</^/s>; see L<split|perlfunc>. Of course a
different regex engine might want to use the same optimizations
with a different syntax.
=item RXf_WHITE
Tells the split operator to split the target string on whitespace
without invoking the regex engine. The definition of whitespace varies
depending on whether the target string is a UTF-8 string and on
whether RXf_PMf_LOCALE is set.
Perl's engine sets this flag if the pattern is C<\s+>.
=item RXf_NULL
Tells the split operator to split the target string on
characters. The definition of character varies depending on whether
the target string is a UTF-8 string.
Perl's engine sets this flag on empty patterns; this optimization
makes C<split //> much faster than it would otherwise be. It's even
faster than C<unpack>.
=item RXf_NO_INPLACE_SUBST
Added in perl 5.18.0, this flag indicates that a regular expression might
perform an operation that would interfere with inplace substitution. For
instance it might contain lookbehind, or assign to non-magical variables
(such as $REGMARK and $REGERROR) during matching. C<s///> will skip
certain optimisations when this is set.
=back
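Seen from the Perl side, the C<split> special cases that these flags describe look like this. It is only a user-level sketch of the behaviour, written in Perl; it says nothing about how a particular engine sets the flags internally.

```perl
use strict;
use warnings;

# split ' ' skips leading whitespace and splits on runs of whitespace.
my @words = split ' ', "  alpha beta   gamma ";

# split /^/ breaks a string into lines, keeping each newline.
my @lines = split /^/, "one\ntwo\nthree\n";

# split // breaks a string into individual characters.
my @chars = split //, "abc";

print scalar(@words), " words: @words\n";
print scalar(@lines), " lines\n";
print scalar(@chars), " chars: @chars\n";
```

An alternative engine that wants the same fast paths would need to recognize its own equivalents of these patterns and set the corresponding flags.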
=head2 exec
I32 exec(pTHX_ REGEXP * const rx,
char *stringarg, char* strend, char* strbeg,
SSize_t minend, SV* sv,
void* data, U32 flags);
Execute a regexp. The arguments are
=over 4
=item rx
The regular expression to execute.
=item sv
This is the SV to be matched against. Note that the
actual char array to be matched against is supplied by the arguments
described below; the SV is just used to determine UTF8ness, C<pos()> etc.
=item strbeg
Pointer to the physical start of the string.
=item strend
Pointer to the character following the physical end of the string (i.e.
the C<\0>, if any).
=item stringarg
Pointer to the position in the string where matching should start; it might
not be equal to C<strbeg> (for example in a later iteration of C</.../g>).
=item minend
Minimum length of string (measured in bytes from C<stringarg>) that must
match; if the engine reaches the end of the match but hasn't reached this
position in the string, it should fail.
=item data
Optimisation data; subject to change.
=item flags
Optimisation flags; subject to change.
=back
=head2 intuit
char* intuit(pTHX_
REGEXP * const rx,
SV *sv,
const char * const strbeg,
char *strpos,
char *strend,
const U32 flags,
struct re_scream_pos_data_s *data);
Find the start position where a regex match should be attempted,
or possibly if the regex engine should not be run because the
pattern can't match. This is called, as appropriate, by the core,
depending on the values of the C<extflags> member of the C<regexp>
structure.
Arguments:
rx: the regex to match against
sv: the SV being matched: only used for utf8 flag; the string
itself is accessed via the pointers below. Note that on
something like an overloaded SV, SvPOK(sv) may be false
and the string pointers may point to something unrelated to
the SV itself.
strbeg: real beginning of string
strpos: the point in the string at which to begin matching
strend: pointer to the byte following the last char of the string
flags: currently unused; set to 0
data: currently unused; set to NULL
=head2 checkstr
SV* checkstr(pTHX_ REGEXP * const rx);
Return a SV containing a string that must appear in the pattern. Used
by C<split> for optimising matches.
=head2 free
void free(pTHX_ REGEXP * const rx);
Called by Perl when it is freeing a regexp pattern so that the engine
can release any resources pointed to by the C<pprivate> member of the
C<regexp> structure. This is only responsible for freeing private data;
Perl will handle releasing anything else contained in the C<regexp> structure.
=head2 Numbered capture callbacks
Called to get/set the value of C<$`>, C<$'>, C<$&> and their named
equivalents, ${^PREMATCH}, ${^POSTMATCH} and ${^MATCH}, as well as the
numbered capture groups (C<$1>, C<$2>, ...).
The C<paren> parameter will be C<1> for C<$1>, C<2> for C<$2> and so
forth, and have these symbolic values for the special variables:
${^PREMATCH} RX_BUFF_IDX_CARET_PREMATCH
${^POSTMATCH} RX_BUFF_IDX_CARET_POSTMATCH
${^MATCH} RX_BUFF_IDX_CARET_FULLMATCH
$` RX_BUFF_IDX_PREMATCH
$' RX_BUFF_IDX_POSTMATCH
$& RX_BUFF_IDX_FULLMATCH
Note that in Perl 5.17.3 and earlier, the last three constants were also
used for the caret variants of the variables.
The names have been chosen by analogy with L<Tie::Scalar> method
names, with an additional B<LENGTH> callback for efficiency. However
named capture variables are currently not tied internally but
implemented via magic.
=head3 numbered_buff_FETCH
void numbered_buff_FETCH(pTHX_ REGEXP * const rx, const I32 paren,
SV * const sv);
Fetch a specified numbered capture. C<sv> should be set to the scalar
to return. The scalar is passed as an argument rather than being
returned from the function because Perl already has a scalar to store
the value when the callback is invoked, so creating another one would be
redundant. The scalar can be set with C<sv_setsv>, C<sv_setpvn> and
friends; see L<perlapi>.
This callback is where Perl untaints its own capture variables under
taint mode (see L<perlsec>). See the C<Perl_reg_numbered_buff_fetch>
function in F<regcomp.c> for how to untaint capture variables if
that's something you'd like your engine to do as well.
=head3 numbered_buff_STORE
void (*numbered_buff_STORE) (pTHX_
REGEXP * const rx,
const I32 paren,
SV const * const value);
Set the value of a numbered capture variable. C<value> is the scalar
that is to be used as the new value. It's up to the engine to make
sure this is used as the new value (or reject it).
Example:
if ("ook" =~ /(o*)/) {
# 'paren' will be '1' and 'value' will be 'ee'
$1 =~ tr/o/e/;
}
Perl's own engine will croak on any attempt to modify the capture
variables. To do this in another engine, use the following callback
(copied from C<Perl_reg_numbered_buff_store>):
void
Example_reg_numbered_buff_store(pTHX_
REGEXP * const rx,
const I32 paren,
SV const * const value)
{
PERL_UNUSED_ARG(rx);
PERL_UNUSED_ARG(paren);
PERL_UNUSED_ARG(value);
if (!PL_localizing)
Perl_croak(aTHX_ PL_no_modify);
}
Actually Perl will not I<always> croak in a statement that looks
like it would modify a numbered capture variable. This is because the
STORE callback will not be called if Perl can determine that it
doesn't have to modify the value. This is exactly how tied variables
behave in the same situation:
package CaptureVar;
use parent 'Tie::Scalar';
sub TIESCALAR { bless [] }
sub FETCH { undef }
sub STORE { die "This doesn't get called" }
package main;
tie my $sv => "CaptureVar";
$sv =~ y/a/b/;
Because C<$sv> is C<undef> when the C<y///> operator is applied to it,
the transliteration won't actually execute and the program won't
C<die>. This differs from the behaviour of 5.8 and earlier versions,
where the capture variables were READONLY; now they'll
just die when assigned to in the default engine.
=head3 numbered_buff_LENGTH
I32 numbered_buff_LENGTH (pTHX_
REGEXP * const rx,
const SV * const sv,
const I32 paren);
Get the C<length> of a capture variable. There's a special callback
for this so that Perl doesn't have to do a FETCH and run C<length> on
the result. Since the length is (in Perl's case) known from the offsets
stored in C<< rx->offs >>, this is much more efficient:
I32 s1 = rx->offs[paren].start;
I32 s2 = rx->offs[paren].end;
I32 len = s2 - s1;
This is a little bit more complex in the case of UTF-8, see what
C<Perl_reg_numbered_buff_length> does with
L<is_utf8_string_loclen|perlapi/is_utf8_string_loclen>.
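The offset arithmetic above can be sketched outside of Perl as follows. This is only an illustration: C<demo_paren_pair>, C<demo_regexp> and C<demo_capture_length> are hypothetical stand-ins for the real types in F<regexp.h>, the result is a byte length (the UTF-8 case is not handled), and treating an unmatched group as length 0 is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for regexp_paren_pair and the regexp struct;
 * only the members needed for the length calculation are modelled. */
typedef struct {
    int start;
    int end;
} demo_paren_pair;

typedef struct {
    unsigned nparens;        /* number of capture groups */
    demo_paren_pair *offs;   /* offs[0] is the whole match */
} demo_regexp;

/* Byte length of capture group `paren`, or 0 if the group is out of
 * range or did not take part in the match (an offset of -1). */
int demo_capture_length(const demo_regexp *rx, unsigned paren)
{
    assert(rx != NULL && rx->offs != NULL);

    if (paren > rx->nparens)
        return 0;

    int s1 = rx->offs[paren].start;
    int s2 = rx->offs[paren].end;

    if (s1 == -1 || s2 == -1)
        return 0;

    return s2 - s1;
}
```

No scalar is created or copied here, which is the point of having a separate LENGTH callback.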
=head2 Named capture callbacks
Called to get/set the value of C<%+> and C<%->, as well as by some
utility functions in L<re>.
There are two callbacks, C<named_buff> is called in all the cases the
FETCH, STORE, DELETE, CLEAR, EXISTS and SCALAR L<Tie::Hash> callbacks
would be on changes to C<%+> and C<%-> and C<named_buff_iter> in the
same cases as FIRSTKEY and NEXTKEY.
The C<flags> parameter can be used to determine which of these
operations the callbacks should respond to. The following flags are
currently defined:
Which L<Tie::Hash> operation is being performed from the Perl level on
C<%+> or C<%->, if any:
RXapif_FETCH
RXapif_STORE
RXapif_DELETE
RXapif_CLEAR
RXapif_EXISTS
RXapif_SCALAR
RXapif_FIRSTKEY
RXapif_NEXTKEY
Whether C<%+> or C<%-> is being operated on, if any.
RXapif_ONE /* %+ */
RXapif_ALL /* %- */
Whether this is being called as C<re::regname>, C<re::regnames> or
C<re::regnames_count>, if any. The first two will be combined with
C<RXapif_ONE> or C<RXapif_ALL>.
RXapif_REGNAME
RXapif_REGNAMES
RXapif_REGNAMES_COUNT
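A callback might decode the C<flags> parameter along the following lines. This is a minimal sketch, not Perl's implementation: the C<DEMO_*> constants and their bit values are placeholders invented for the example (a real engine must use the C<RXapif_*> constants from F<regexp.h>), and the helper names are hypothetical.

```c
#include <assert.h>

/* Placeholder bit values for illustration only. */
enum {
    DEMO_FETCH    = 0x0001,
    DEMO_STORE    = 0x0002,
    DEMO_DELETE   = 0x0004,
    DEMO_CLEAR    = 0x0008,
    DEMO_EXISTS   = 0x0010,
    DEMO_SCALAR   = 0x0020,
    DEMO_FIRSTKEY = 0x0040,
    DEMO_NEXTKEY  = 0x0080,
    DEMO_ONE      = 0x0100,  /* %+ */
    DEMO_ALL      = 0x0200   /* %- */
};

#define DEMO_OP_MASK     0x00FFu
#define DEMO_TARGET_MASK (DEMO_ONE | DEMO_ALL)

/* The hash operation requested, or 0 if none. */
unsigned demo_operation(unsigned flags)
{
    return flags & DEMO_OP_MASK;
}

/* DEMO_ONE for %+, DEMO_ALL for %-, or 0 if neither applies. */
unsigned demo_target(unsigned flags)
{
    /* Assumption of this sketch: both target bits are never set at once. */
    assert((flags & DEMO_TARGET_MASK) != DEMO_TARGET_MASK);
    return flags & DEMO_TARGET_MASK;
}

/* 1 if the operation would modify the hash; an engine with read-only
 * named captures could reject these. */
int demo_is_write(unsigned flags)
{
    return (flags & (DEMO_STORE | DEMO_DELETE | DEMO_CLEAR)) != 0;
}
```

An engine that treats named captures as read-only could test C<demo_is_write> first and croak, then branch on the remaining operations.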
Internally C<%+> and C<%-> are implemented with a real tied interface
via L<Tie::Hash::NamedCapture>. The methods in that package will call
back into these functions. However the usage of
L<Tie::Hash::NamedCapture> for this purpose might change in future
releases. For instance this might be implemented by magic instead
(would need an extension to mgvtbl).
=head3 named_buff
SV* (*named_buff) (pTHX_ REGEXP * const rx, SV * const key,
SV * const value, U32 flags);
=head3 named_buff_iter
SV* (*named_buff_iter) (pTHX_
REGEXP * const rx,
const SV * const lastkey,
const U32 flags);
=head2 qr_package
SV* qr_package(pTHX_ REGEXP * const rx);
The package the qr// magic object is blessed into (as seen by C<ref
qr//>). It is recommended that engines change this to their package
name for identification regardless of whether they implement methods
on the object.
The package this method returns should also have the internal
C<Regexp> package in its C<@ISA>. C<< qr//->isa("Regexp") >> should always
be true regardless of what engine is being used.
Example implementation might be:
SV*
Example_qr_package(pTHX_ REGEXP * const rx)
{
PERL_UNUSED_ARG(rx);
return newSVpvs("re::engine::Example");
}
Any method calls on an object created with C<qr//> will be dispatched to the
package as a normal object.
use re::engine::Example;
my $re = qr//;
$re->meth; # dispatched to re::engine::Example::meth()
To retrieve the C<REGEXP> object from the scalar in an XS function use
the C<SvRX> macro, see L<"REGEXP Functions" in perlapi|perlapi/REGEXP
Functions>.
void meth(SV * rv)
PPCODE:
REGEXP * re = SvRX(rv);
=head2 dupe
void* dupe(pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
On threaded builds a regexp may need to be duplicated so that the pattern
can be used by multiple threads. This routine is expected to handle the
duplication of any private data pointed to by the C<pprivate> member of
the C<regexp> structure. It will be called with the preconstructed new
C<regexp> structure as an argument, the C<pprivate> member will point at
the B<old> private structure, and it is this routine's responsibility to
construct a copy and return a pointer to it (which Perl will then use to
overwrite the field as passed to this routine).
This allows the engine to dupe its private data but also if necessary
modify the final structure if it really must.
On unthreaded builds this field doesn't exist.
=head2 op_comp
This is private to the Perl core and subject to change. Should be left
null.
=head1 The REGEXP structure
The REGEXP struct is defined in F<regexp.h>.
All regex engines must be able to
correctly build such a structure in their L</comp> routine.
The REGEXP structure contains all the data that Perl needs to be aware of
to properly work with the regular expression. It includes data about
optimisations that Perl can use to determine if the regex engine should
really be used, and various other control info that is needed to properly
execute patterns in various contexts, such as whether the pattern is anchored in
some way, what flags were used during the compile, or whether the
program contains special constructs that Perl needs to be aware of.
In addition it contains two fields that are intended for the private
use of the regex engine that compiled the pattern. These are the
C<intflags> and C<pprivate> members. C<pprivate> is a void pointer to
an arbitrary structure, whose use and management is the responsibility
of the compiling engine. Perl will never modify either of these
values.
typedef struct regexp {
/* what engine created this regexp? */
const struct regexp_engine* engine;
/* what re is this a lightweight copy of? */
struct regexp* mother_re;
/* Information about the match that the Perl core uses to manage
* things */
U32 extflags; /* Flags used both externally and internally */
I32 minlen; /* minimum possible number of chars in
string to match */
I32 minlenret; /* minimum possible number of chars in $& */
U32 gofs; /* chars left of pos that we search from */
/* substring data about strings that must appear
in the final match, used for optimisations */
struct reg_substr_data *substrs;
U32 nparens; /* number of capture groups */
/* private engine specific data */
U32 intflags; /* Engine Specific Internal flags */
void *pprivate; /* Data private to the regex engine which
created this object. */
/* Data about the last/current match. These are modified during
* matching*/
U32 lastparen; /* highest close paren matched ($+) */
U32 lastcloseparen; /* last close paren matched ($^N) */
regexp_paren_pair *swap; /* Swap copy of *offs */
regexp_paren_pair *offs; /* Array of offsets for (@-) and
(@+) */
char *subbeg; /* saved or original string so \digit works
forever. */
SV_SAVED_COPY /* If non-NULL, SV which is COW from original */
I32 sublen; /* Length of string pointed by subbeg */
I32 suboffset; /* byte offset of subbeg from logical start of
str */
I32 subcoffset; /* suboffset equiv, but in chars (for @-/@+) */
/* Information about the match that isn't often used */
I32 prelen; /* length of precomp */
const char *precomp; /* pre-compilation regular expression */
char *wrapped; /* wrapped version of the pattern */
I32 wraplen; /* length of wrapped */
I32 seen_evals; /* number of eval groups in the pattern - for
security checks */
HV *paren_names; /* Optional hash of paren names */
/* Refcount of this regexp */
I32 refcnt; /* Refcount of this regexp */
} regexp;
The fields are discussed in more detail below:
=head2 C<engine>
This field points at a C<regexp_engine> structure which contains pointers
to the subroutines that are to be used for performing a match. It
is the compiling routine's responsibility to populate this field before
returning the regexp object.
Internally this is set to C<NULL> unless a custom engine is specified in
C<$^H{regcomp}>. Perl's own set of callbacks can be accessed in the struct
pointed to by C<RE_ENGINE_PTR>.
=head2 C<mother_re>
TODO, see L<http://www.mail-archive.com/perl5-changes@perl.org/msg17328.html>
=head2 C<extflags>
This will be used by Perl to see what flags the regexp was compiled
with. It will normally be set to the value of the flags parameter by
the L<comp|/comp> callback. See the L<comp|/comp> documentation for
valid flags.
=head2 C<minlen> C<minlenret>
The minimum string length (in characters) required for the pattern to match.
This is used to
prune the search space by not bothering to match any closer to the end of a
string than would allow a match. For instance there is no point in even
starting the regex engine if the minlen is 10 but the string is only 5
characters long. There is no way that the pattern can match.
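The pruning test described above amounts to a single comparison. The sketch below assumes lengths are counted in characters; the function name C<demo_can_possibly_match> and its parameters are hypothetical and are not part of Perl's API.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if it is worth starting the engine at character offset
 * `pos` of a string `len` characters long, given that the pattern needs
 * at least `minlen` characters; returns 0 if no match can fit. */
int demo_can_possibly_match(size_t len, size_t pos, size_t minlen)
{
    assert(pos <= len);
    return len - pos >= minlen;
}
```

With a C<minlen> of 10 and a 5-character string the function returns 0, so the engine need not be started at all.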
C<minlenret> is the minimum length (in characters) of the string that would
be found in $& after a match.
The difference between C<minlen> and C<minlenret> can be seen in the
following pattern:
/ns(?=\d)/
where the C<minlen> would be 3 but C<minlenret> would only be 2 as the \d is
required to match but is not actually
included in the matched content. This
distinction is particularly important as the substitution logic uses the
C<minlenret> to tell if it can do in-place substitutions (these can
result in considerable speed-up).
=head2 C<gofs>
Left offset from pos() to start match at.
=head2 C<substrs>
Substring data about strings that must appear in the final match. This
is currently only used internally by Perl's engine, but might be
used in the future for all engines for optimisations.
=head2 C<nparens>, C<lastparen>, and C<lastcloseparen>
These fields are used to keep track of how many paren groups could be matched
in the pattern, which was the last open paren to be entered, and which was
the last close paren to be entered.
=head2 C<intflags>
The engine's private copy of the flags the pattern was compiled with. Usually
this is the same as C<extflags> unless the engine chose to modify one of them.
=head2 C<pprivate>
A void* pointing to an engine-defined
data structure. The Perl engine uses the
C<regexp_internal> structure (see L<perlreguts/Base Structures>) but a custom
engine should use something else.
=head2 C<swap>
Unused. Left in for compatibility with Perl 5.10.0.
=head2 C<offs>
A C<regexp_paren_pair> structure which defines offsets into the string being
matched which correspond to the C<$&> and C<$1>, C<$2> etc. captures, the
C<regexp_paren_pair> struct is defined as follows:
typedef struct regexp_paren_pair {
I32 start;
I32 end;
} regexp_paren_pair;
If C<< ->offs[num].start >> or C<< ->offs[num].end >> is C<-1> then that
capture group did not match.
C<< ->offs[0].start/end >> represents C<$&> (or
C<${^MATCH}> under C</p>) and C<< ->offs[paren].end >> matches C<$$paren> where
C<< $paren >= 1 >>.
=head2 C<precomp> C<prelen>
Used for optimisations. C<precomp> holds a copy of the pattern that
was compiled and C<prelen> its length. When a new pattern is to be
compiled (such as inside a loop) the internal C<regcomp> operator
checks if the last compiled C<REGEXP>'s C<precomp> and C<prelen>
are equivalent to the new one, and if so uses the old pattern instead
of compiling a new one.
The relevant snippet from C<Perl_pp_regcomp>:
if (!re || !re->precomp || re->prelen != (I32)len ||
memNE(re->precomp, t, len))
/* Compile a new pattern */
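The same comparison can be written as a standalone function, which makes the reuse condition easier to see. This is a sketch under stated assumptions: C<demo_regexp_src> models only the two fields involved, C<demo_must_recompile> is a hypothetical name, and plain C<memcmp> stands in for Perl's C<memNE> macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in holding only the fields used by the check. */
typedef struct {
    const char *precomp;   /* source text of the compiled pattern */
    int prelen;            /* length of precomp in bytes */
} demo_regexp_src;

/* Returns 1 if a new pattern of `len` bytes at `pat` must be compiled,
 * 0 if the previously compiled regexp `re` can be reused. */
int demo_must_recompile(const demo_regexp_src *re,
                        const char *pat, size_t len)
{
    assert(pat != NULL);
    return !re
        || !re->precomp
        || (size_t)re->prelen != len
        || memcmp(re->precomp, pat, len) != 0;
}
```

The length is compared before the bytes, so patterns of different lengths are rejected without touching the pattern text.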
=head2 C<paren_names>
This is a hash used internally to track named capture groups and their
offsets. The keys are the names of the buffers; the values are dualvars,
with the IV slot holding the number of buffers with the given name and the
pv being an embedded array of I32. The values may also be contained
independently in the data array in cases where named backreferences are
used.
=head2 C<substrs>
Holds information on the longest string that must occur at a fixed
offset from the start of the pattern, and the longest string that must
occur at a floating offset from the start of the pattern. Used to do
Fast-Boyer-Moore searches on the string to find out if it's worth using
the regex engine at all, and if so where in the string to search.
=head2 C<subbeg> C<sublen> C<saved_copy> C<suboffset> C<subcoffset>
Used during the execution phase for managing search and replace patterns,
and for providing the text for C<$&>, C<$1> etc. C<subbeg> points to a
buffer (either the original string, or a copy in the case of
C<RX_MATCH_COPIED(rx)>), and C<sublen> is the length of the buffer. The
C<RX_OFFS> start and end indices index into this buffer.
In the presence of the C<REXEC_COPY_STR> flag, but with the addition of
the C<REXEC_COPY_SKIP_PRE> or C<REXEC_COPY_SKIP_POST> flags, an engine
can choose not to copy the full buffer (although it must still do so in
the presence of C<RXf_PMf_KEEPCOPY> or the relevant bits being set in
C<PL_sawampersand>). In this case, it may set C<suboffset> to indicate the
number of bytes from the logical start of the buffer to the physical start
(i.e. C<subbeg>). It should also set C<subcoffset>, the number of
characters in the offset. The latter is needed to support C<@-> and C<@+>
which work in characters, not bytes.
=head2 C<wrapped> C<wraplen>
Stores the string C<qr//> stringifies to. The Perl engine for example
stores C<(?^:eek)> in the case of C<qr/eek/>.
When using a custom engine that doesn't support the C<(?:)> construct
for inline modifiers, it's probably best to have C<qr//> stringify to
the supplied pattern. Note that this will create undesired patterns in
cases such as:
my $x = qr/a|b/; # "a|b"
my $y = qr/c/i; # "c"
my $z = qr/$x$y/; # "a|bc"
There's no solution for this problem other than making the custom
engine understand a construct like C<(?:)>.
=head2 C<seen_evals>
This stores the number of eval groups in
the pattern. This is used for security
purposes when embedding compiled regexes into larger patterns with C<qr//>.
=head2 C<refcnt>
The number of times the structure is referenced. When
this falls to 0, the regexp is automatically freed
by a call to pregfree. This should be set to 1 in
each engine's L</comp> routine.
=head1 HISTORY
Originally part of L<perlreguts>.
=head1 AUTHORS
Originally written by Yves Orton, expanded by E<AElig>var ArnfjE<ouml>rE<eth>
Bjarmason.
=head1 LICENSE
Copyright 2006 Yves Orton and 2007 E<AElig>var ArnfjE<ouml>rE<eth> Bjarmason.
This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.
=cut
=begin comment
# !!!!!!! DO NOT EDIT THIS FILE !!!!!!!
# This file is machine-generated by lib/unicore/mktables from the Unicode
# database, Version 9.0.0. Any changes made here will be lost!
To change this file, edit lib/unicore/mktables instead.
=end comment
=head1 NAME
perluniprops - Index of Unicode Version 9.0.0 character properties in Perl
=head1 DESCRIPTION
This document provides information about the portion of the Unicode database
that deals with character properties, that is the portion that is defined on
single code points. (L</Other information in the Unicode data base>
below briefly mentions other data that Unicode provides.)
Perl can provide access to all non-provisional Unicode character properties,
though not all are enabled by default. The omitted ones are the Unihan
properties (accessible via the CPAN module L<Unicode::Unihan>) and certain
deprecated or Unicode-internal properties. (An installation may choose to
recompile Perl's tables to change this. See L<Unicode character
properties that are NOT accepted by Perl>.)
For most purposes, access to Unicode properties from the Perl core is through
regular expression matches, as described in the next section.
For some special purposes, and to access the properties that are not suitable
for regular expression matching, all the Unicode character properties that
Perl handles are accessible via the standard L<Unicode::UCD> module, as
described in the section L</Properties accessible through Unicode::UCD>.
Perl also provides some additional extensions and short-cut synonyms
for Unicode properties.
This document merely lists all available properties and does not attempt to
explain what each property really means. There is a brief description of each
Perl extension; see L<perlunicode/Other Properties> for more information on
these. There is some detail about Blocks, Scripts, General_Category,
and Bidi_Class in L<perlunicode>, but to find out about the intricacies of the
official Unicode properties, refer to the Unicode standard. A good starting
place is L<http://www.unicode.org/reports/tr44/>.
Note that you can define your own properties; see
L<perlunicode/"User-Defined Character Properties">.
=head1 Properties accessible through C<\p{}> and C<\P{}>
The Perl regular expression C<\p{}> and C<\P{}> constructs give access to
most of the Unicode character properties. The table below shows all these
constructs, both single and compound forms.
B<Compound forms> consist of two components, separated by an equals sign or a
colon. The first component is the property name, and the second component is
the particular value of the property to match against, for example,
C<\p{Script: Greek}> and C<\p{Script=Greek}> both mean to match characters
whose Script property value is Greek.
B<Single forms>, like C<\p{Greek}>, are mostly Perl-defined shortcuts for
their equivalent compound forms. The table shows these equivalences. (In our
example, C<\p{Greek}> is just a shortcut for C<\p{Script=Greek}>.)
There are also a few Perl-defined single forms that are not shortcuts for a
compound form. One such is C<\p{Word}>. These are also listed in the table.
In parsing these constructs, Perl always ignores Upper/lower case differences
everywhere within the {braces}. Thus C<\p{Greek}> means the same thing as
C<\p{greek}>. But note that changing the case of the C<"p"> or C<"P"> before
the left brace completely changes the meaning of the construct, from "match"
(for C<\p{}>) to "doesn't match" (for C<\P{}>). Casing in this document is
for improved legibility.
Also, white space, hyphens, and underscores are normally ignored
everywhere between the {braces}, and hence can be freely added or removed
even if the C</x> modifier hasn't been specified on the regular expression.
But in the table below a 'B<T>' at the beginning of an entry
means that tighter (stricter) rules are used for that entry:
=over 4
=over 4
=item Single form (C<\p{name}>) tighter rules:
White space, hyphens, and underscores ARE significant
except for:
=over 4
=item * white space adjacent to a non-word character
=item * underscores separating digits in numbers
=back
That means, for example, that you can freely add or remove white space
adjacent to (but within) the braces without affecting the meaning.
=item Compound form (C<\p{name=value}> or C<\p{name:value}>) tighter rules:
The tighter rules given above for the single form apply to everything to the
right of the colon or equals; the looser rules still apply to everything to
the left.
That means, for example, that you can freely add or remove white space
adjacent to (but within) the braces and the colon or equal sign.
=back
=back
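The looser matching rules described above (case, white space, hyphens and underscores all ignored) can be approximated by normalizing a name before comparing it. The sketch below is not how Perl implements property lookup: C<demo_loose_name> and C<demo_loose_equal> are hypothetical helpers, the 128-byte buffers are an arbitrary limit, and neither the tighter rules for 'B<T>' entries nor any other special case is handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy `name` into `out` (capacity `outsz`), lowercased and with white
 * space, hyphens and underscores dropped.  Returns 0 on success, -1 if
 * the result does not fit. */
int demo_loose_name(const char *name, char *out, size_t outsz)
{
    size_t n = 0;

    assert(name != NULL && out != NULL && outsz > 0);

    for (; *name; name++) {
        unsigned char c = (unsigned char)*name;
        if (isspace(c) || c == '-' || c == '_')
            continue;                 /* ignored under the loose rules */
        if (n + 1 >= outsz)
            return -1;                /* no room for char plus NUL */
        out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
    return 0;
}

/* Returns 1 if two property names are equal under the loose rules. */
int demo_loose_equal(const char *a, const char *b)
{
    char na[128], nb[128];

    if (demo_loose_name(a, na, sizeof na) != 0 ||
        demo_loose_name(b, nb, sizeof nb) != 0)
        return 0;
    return strcmp(na, nb) == 0;
}
```

Under this normalization C<Script_Extensions>, C<script extensions> and C<SCRIPT-EXTENSIONS> all compare equal, whereas an entry marked 'B<T>' would need a stricter comparison.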
Some properties are considered obsolete by Unicode, but still available.
There are several varieties of obsolescence:
=over 4
=over 4
=item Stabilized
A property may be stabilized. Such a determination does not indicate
that the property should or should not be used; instead it is a declaration
that the property will not be maintained nor extended for newly encoded
characters. Such properties are marked with an 'B<S>' in the
table.
=item Deprecated
A property may be deprecated, perhaps because its original intent
has been replaced by another property, or because its specification was
somehow defective. This means that its use is strongly
discouraged, so much so that a warning will be issued if used, unless the
regular expression is in the scope of a C<S<no warnings 'deprecated'>>
statement. A 'B<D>' flags each such entry in the table, and
the entry there for the longest, most descriptive version of the property will
give the reason it is deprecated, and perhaps advice. Perl may issue such a
warning, even for properties that aren't officially deprecated by Unicode,
when there used to be characters or code points that were matched by them, but
no longer. This is to warn you that your program may not work like it did on
earlier Unicode releases.
A deprecated property may be made unavailable in a future Perl version, so it
is best to move away from them.
A deprecated property may also be stabilized, but this fact is not shown.
=item Obsolete
Properties marked with an 'B<O>' in the table are considered (plain)
obsolete. Generally this designation is given to properties that Unicode once
used for internal purposes (but not any longer).
=item Discouraged
This is not actually a Unicode-specified obsolescence, but applies to certain
Perl extensions that are present for backwards compatibility, but are
discouraged from being used. These are not obsolete, but their meanings are
not stable. Future Unicode versions could force any of these extensions to be
removed without warning, replaced by another property with the same name that
means something different. An 'B<X>' flags each such entry in the
table. Use the equivalent shown instead.
In particular, matches in the Block property have single forms
defined by Perl that begin with C<"In_">, C<"Is_">, or even with no prefix at
all. Like all B<DISCOURAGED> forms, these are not stable. For example,
C<\p{Block=Deseret}> can currently be written as C<\p{In_Deseret}>,
C<\p{Is_Deseret}>, or C<\p{Deseret}>. But, a new Unicode version may
come along that would force Perl to change the meaning of one or more of
these, and your program would no longer be correct. Currently there are no
such conflicts with the form that begins C<"In_">, but there are many with the
other two shortcuts, and Unicode continues to define new properties that begin
with C<"In">, so it's quite possible that a conflict will occur in the future.
The compound form is guaranteed to not become obsolete, and its meaning is
clearer anyway. See L<perlunicode/"Blocks"> for more information about this.
=back
=back
The table below has two columns. The left column contains the C<\p{}>
constructs to look up, possibly preceded by the flags mentioned above; and
the right column contains information about them, like a description, or
synonyms. The table shows both the single and compound forms for each
property that has them. If the left column is a short name for a property,
the right column will give its longer, more descriptive name; and if the left
column is the longest name, the right column will show any equivalent shortest
name, in both single and compound forms if applicable.
If braces are not needed to specify a property (e.g., C<\pL>), the left
column contains both forms, with and without braces.
The right column will also caution you if a property means something different
than what might normally be expected.
All single forms are Perl extensions; a few compound forms are as well, and
are noted as such.
Numbers in (parentheses) indicate the total number of Unicode code points
matched by the property. For emphasis, those properties that match no code
points at all are listed as well in a separate section following the table.
Most properties match the same code points regardless of whether C<"/i">
case-insensitive matching is specified or not. But a few properties are
affected. These are shown with the notation S<C<(/i= I<other_property>)>>
in the second column. Under case-insensitive matching they match the
same code points as the property I<other_property>.
There is no description given for most non-Perl defined properties (See
L<http://www.unicode.org/reports/tr44/> for that).
For compactness, 'B<*>' is used as a wildcard instead of showing all possible
combinations. For example, entries like:
\p{Gc: *} \p{General_Category: *}
mean that 'Gc' is a synonym for 'General_Category', and anything that is valid
for the latter is also valid for the former. Similarly,
\p{Is_*} \p{*}
means that if and only if, for example, C<\p{Foo}> exists, then
C<\p{Is_Foo}> and C<\p{IsFoo}> are also valid and all mean the same thing.
And similarly, C<\p{Foo=Bar}> means the same as C<\p{Is_Foo=Bar}> and
C<\p{IsFoo=Bar}>. "*" here is restricted to something not beginning with an
underscore.
Also, in binary properties, 'Yes', 'T', and 'True' are all synonyms for 'Y'.
And 'No', 'F', and 'False' are all synonyms for 'N'. The table shows 'Y*' and
'N*' to indicate this, and doesn't have separate entries for the other
possibilities. Note that not all properties which have values 'Yes' and 'No'
are binary, and they have all their values spelled out without using this wild
card, and a C<NOT> clause in their description that highlights their not being
binary. These also require the compound form to match them, whereas true
binary properties have both single and compound forms available.
Note that all non-essential underscores are removed in the display of the
short names below.
B<Legend summary:>
=over 4
=item Z<>B<*> is a wild-card
=item B<(\d+)> in the info column gives the number of Unicode code points matched
by this property.
=item B<D> means this is deprecated.
=item B<O> means this is obsolete.
=item B<S> means this is stabilized.
=item B<T> means tighter (stricter) name matching applies.
=item B<X> means use of this form is discouraged, and may not be
stable.
=back
NAME INFO
\p{Adlam} \p{Script_Extensions=Adlam} (Short:
\p{Adlm}; NOT \p{Block=Adlam}) (88)
\p{Adlm} \p{Adlam} (= \p{Script_Extensions=Adlam})
(NOT \p{Block=Adlam}) (88)
X \p{Aegean_Numbers} \p{Block=Aegean_Numbers} (64)
T \p{Age: 1.1} \p{Age=V1_1} (33_979)
T \p{Age: 2.0} \p{Age=V2_0} (144_521)
T \p{Age: 2.1} \p{Age=V2_1} (2)
T \p{Age: 3.0} \p{Age=V3_0} (10_307)
T \p{Age: 3.1} \p{Age=V3_1} (44_978)
T \p{Age: 3.2} \p{Age=V3_2} (1016)
T \p{Age: 4.0} \p{Age=V4_0} (1226)
T \p{Age: 4.1} \p{Age=V4_1} (1273)
T \p{Age: 5.0} \p{Age=V5_0} (1369)
T \p{Age: 5.1} \p{Age=V5_1} (1624)
T \p{Age: 5.2} \p{Age=V5_2} (6648)
T \p{Age: 6.0} \p{Age=V6_0} (2088)
T \p{Age: 6.1} \p{Age=V6_1} (732)
T \p{Age: 6.2} \p{Age=V6_2} (1)
T \p{Age: 6.3} \p{Age=V6_3} (5)
T \p{Age: 7.0} \p{Age=V7_0} (2834)
T \p{Age: 8.0} \p{Age=V8_0} (7716)
T \p{Age: 9.0} \p{Age=V9_0} (7500)
\p{Age: NA} \p{Age=Unassigned} (846_293 plus all
above-Unicode code points)
\p{Age: Unassigned} Code point's usage has not been assigned
in any Unicode release thus far. (Short:
\p{Age=NA}) (846_293 plus all above-
Unicode code points)
\p{Age: V1_1} Code point's usage introduced in version
1.1 (33_979)
\p{Age: V2_0} Code point's usage was introduced in
version 2.0; See also Property
'Present_In' (144_521)
\p{Age: V2_1} Code point's usage was introduced in
version 2.1; See also Property
'Present_In' (2)
\p{Age: V3_0} Code point's usage was introduced in
version 3.0; See also Property
'Present_In' (10_307)
\p{Age: V3_1} Code point's usage was introduced in
version 3.1; See also Property
'Present_In' (44_978)
\p{Age: V3_2} Code point's usage was introduced in
version 3.2; See also Property
'Present_In' (1016)
\p{Age: V4_0} Code point's usage was introduced in
version 4.0; See also Property
'Present_In' (1226)
\p{Age: V4_1} Code point's usage was introduced in
version 4.1; See also Property
'Present_In' (1273)
\p{Age: V5_0} Code point's usage was introduced in
version 5.0; See also Property
'Present_In' (1369)
\p{Age: V5_1} Code point's usage was introduced in
version 5.1; See also Property
'Present_In' (1624)
\p{Age: V5_2} Code point's usage was introduced in
version 5.2; See also Property
'Present_In' (6648)
\p{Age: V6_0} Code point's usage was introduced in
version 6.0; See also Property
'Present_In' (2088)
\p{Age: V6_1} Code point's usage was introduced in
version 6.1; See also Property
'Present_In' (732)
\p{Age: V6_2} Code point's usage was introduced in
version 6.2; See also Property
'Present_In' (1)
\p{Age: V6_3} Code point's usage was introduced in
version 6.3; See also Property
'Present_In' (5)
\p{Age: V7_0} Code point's usage was introduced in
version 7.0; See also Property
'Present_In' (2834)
\p{Age: V8_0} Code point's usage was introduced in
version 8.0; See also Property
'Present_In' (7716)
\p{Age: V9_0} Code point's usage was introduced in
version 9.0; See also Property
'Present_In' (7500)
\p{Aghb} \p{Caucasian_Albanian} (=
\p{Script_Extensions=
Caucasian_Albanian}) (NOT \p{Block=
Caucasian_Albanian}) (53)
\p{AHex} \p{PosixXDigit} (= \p{ASCII_Hex_Digit=Y})
(22)
\p{AHex: *} \p{ASCII_Hex_Digit: *}
\p{Ahom} \p{Script_Extensions=Ahom} (NOT \p{Block=
Ahom}) (57)
X \p{Alchemical} \p{Alchemical_Symbols} (= \p{Block=
Alchemical_Symbols}) (128)
X \p{Alchemical_Symbols} \p{Block=Alchemical_Symbols} (Short:
\p{InAlchemical}) (128)
\p{All} All code points, including those above
Unicode. Same as qr/./s (1_114_112 plus
all above-Unicode code points)
\p{Alnum} \p{XPosixAlnum} (118_820)
\p{Alpha} \p{XPosixAlpha} (= \p{Alphabetic=Y})
(118_240)
\p{Alpha: *} \p{Alphabetic: *}
\p{Alphabetic} \p{XPosixAlpha} (= \p{Alphabetic=Y})
(118_240)
\p{Alphabetic: N*} (Short: \p{Alpha=N}, \P{Alpha}) (995_872
plus all above-Unicode code points)
\p{Alphabetic: Y*} (Short: \p{Alpha=Y}, \p{Alpha}) (118_240)
X \p{Alphabetic_PF} \p{Alphabetic_Presentation_Forms} (=
\p{Block=Alphabetic_Presentation_Forms})
(80)
X \p{Alphabetic_Presentation_Forms} \p{Block=
Alphabetic_Presentation_Forms} (Short:
\p{InAlphabeticPF}) (80)
\p{Anatolian_Hieroglyphs} \p{Script_Extensions=
Anatolian_Hieroglyphs} (Short: \p{Hluw};
NOT \p{Block=Anatolian_Hieroglyphs})
(583)
X \p{Ancient_Greek_Music} \p{Ancient_Greek_Musical_Notation} (=
\p{Block=
Ancient_Greek_Musical_Notation}) (80)
X \p{Ancient_Greek_Musical_Notation} \p{Block=
Ancient_Greek_Musical_Notation} (Short:
\p{InAncientGreekMusic}) (80)
X \p{Ancient_Greek_Numbers} \p{Block=Ancient_Greek_Numbers} (80)
X \p{Ancient_Symbols} \p{Block=Ancient_Symbols} (64)
\p{Any} All Unicode code points: [\x{0000}-
\x{10FFFF}] (1_114_112)
\p{Arab} \p{Arabic} (= \p{Script_Extensions=
Arabic}) (NOT \p{Block=Arabic}) (1323)
\p{Arabic} \p{Script_Extensions=Arabic} (Short:
\p{Arab}; NOT \p{Block=Arabic}) (1323)
X \p{Arabic_Ext_A} \p{Arabic_Extended_A} (= \p{Block=
Arabic_Extended_A}) (96)
X \p{Arabic_Extended_A} \p{Block=Arabic_Extended_A} (Short:
\p{InArabicExtA}) (96)
X \p{Arabic_Math} \p{Arabic_Mathematical_Alphabetic_Symbols}
(= \p{Block=
Arabic_Mathematical_Alphabetic_Symbols})
(256)
X \p{Arabic_Mathematical_Alphabetic_Symbols} \p{Block=
Arabic_Mathematical_Alphabetic_Symbols}
(Short: \p{InArabicMath}) (256)
X \p{Arabic_PF_A} \p{Arabic_Presentation_Forms_A} (=
\p{Block=Arabic_Presentation_Forms_A})
(688)
X \p{Arabic_PF_B} \p{Arabic_Presentation_Forms_B} (=
\p{Block=Arabic_Presentation_Forms_B})
(144)
X \p{Arabic_Presentation_Forms_A} \p{Block=
Arabic_Presentation_Forms_A} (Short:
\p{InArabicPFA}) (688)
X \p{Arabic_Presentation_Forms_B} \p{Block=
Arabic_Presentation_Forms_B} (Short:
\p{InArabicPFB}) (144)
X \p{Arabic_Sup} \p{Arabic_Supplement} (= \p{Block=
Arabic_Supplement}) (48)
X \p{Arabic_Supplement} \p{Block=Arabic_Supplement} (Short:
\p{InArabicSup}) (48)
\p{Armenian} \p{Script_Extensions=Armenian} (Short:
\p{Armn}; NOT \p{Block=Armenian}) (94)
\p{Armi} \p{Imperial_Aramaic} (=
\p{Script_Extensions=Imperial_Aramaic})
(NOT \p{Block=Imperial_Aramaic}) (31)
\p{Armn} \p{Armenian} (= \p{Script_Extensions=
Armenian}) (NOT \p{Block=Armenian}) (94)
X \p{Arrows} \p{Block=Arrows} (112)
\p{ASCII} \p{Block=Basic_Latin} [[:ASCII:]] (128)
\p{ASCII_Hex_Digit} \p{PosixXDigit} (= \p{ASCII_Hex_Digit=Y})
(22)
\p{ASCII_Hex_Digit: N*} (Short: \p{AHex=N}, \P{AHex}) (1_114_090
plus all above-Unicode code points)
\p{ASCII_Hex_Digit: Y*} (Short: \p{AHex=Y}, \p{AHex}) (22)
\p{Assigned} All assigned code points (267_753)
\p{Avestan} \p{Script_Extensions=Avestan} (Short:
\p{Avst}; NOT \p{Block=Avestan}) (61)
\p{Avst} \p{Avestan} (= \p{Script_Extensions=
Avestan}) (NOT \p{Block=Avestan}) (61)
\p{Bali} \p{Balinese} (= \p{Script_Extensions=
Balinese}) (NOT \p{Block=Balinese}) (121)
\p{Balinese} \p{Script_Extensions=Balinese} (Short:
\p{Bali}; NOT \p{Block=Balinese}) (121)
\p{Bamu} \p{Bamum} (= \p{Script_Extensions=Bamum})
(NOT \p{Block=Bamum}) (657)
\p{Bamum} \p{Script_Extensions=Bamum} (Short:
\p{Bamu}; NOT \p{Block=Bamum}) (657)
X \p{Bamum_Sup} \p{Bamum_Supplement} (= \p{Block=
Bamum_Supplement}) (576)
X \p{Bamum_Supplement} \p{Block=Bamum_Supplement} (Short:
\p{InBamumSup}) (576)
X \p{Basic_Latin} \p{ASCII} (= \p{Block=Basic_Latin}) (128)
\p{Bass} \p{Bassa_Vah} (= \p{Script_Extensions=
Bassa_Vah}) (NOT \p{Block=Bassa_Vah})
(36)
\p{Bassa_Vah} \p{Script_Extensions=Bassa_Vah} (Short:
\p{Bass}; NOT \p{Block=Bassa_Vah}) (36)
\p{Batak} \p{Script_Extensions=Batak} (Short:
\p{Batk}; NOT \p{Block=Batak}) (56)
\p{Batk} \p{Batak} (= \p{Script_Extensions=Batak})
(NOT \p{Block=Batak}) (56)
\p{Bc: *} \p{Bidi_Class: *}
\p{Beng} \p{Bengali} (= \p{Script_Extensions=
Bengali}) (NOT \p{Block=Bengali}) (98)
\p{Bengali} \p{Script_Extensions=Bengali} (Short:
\p{Beng}; NOT \p{Block=Bengali}) (98)
\p{Bhaiksuki} \p{Script_Extensions=Bhaiksuki} (Short:
\p{Bhks}; NOT \p{Block=Bhaiksuki}) (97)
\p{Bhks} \p{Bhaiksuki} (= \p{Script_Extensions=
Bhaiksuki}) (NOT \p{Block=Bhaiksuki})
(97)
\p{Bidi_C} \p{Bidi_Control} (= \p{Bidi_Control=Y})
(12)
\p{Bidi_C: *} \p{Bidi_Control: *}
\p{Bidi_Class: AL} \p{Bidi_Class=Arabic_Letter} (1420)
\p{Bidi_Class: AN} \p{Bidi_Class=Arabic_Number} (51)
\p{Bidi_Class: Arabic_Letter} (Short: \p{Bc=AL}) (1420)
\p{Bidi_Class: Arabic_Number} (Short: \p{Bc=AN}) (51)
\p{Bidi_Class: B} \p{Bidi_Class=Paragraph_Separator} (7)
\p{Bidi_Class: BN} \p{Bidi_Class=Boundary_Neutral} (4016)
\p{Bidi_Class: Boundary_Neutral} (Short: \p{Bc=BN}) (4016)
\p{Bidi_Class: Common_Separator} (Short: \p{Bc=CS}) (15)
\p{Bidi_Class: CS} \p{Bidi_Class=Common_Separator} (15)
\p{Bidi_Class: EN} \p{Bidi_Class=European_Number} (158)
\p{Bidi_Class: ES} \p{Bidi_Class=European_Separator} (12)
\p{Bidi_Class: ET} \p{Bidi_Class=European_Terminator} (87)
\p{Bidi_Class: European_Number} (Short: \p{Bc=EN}) (158)
\p{Bidi_Class: European_Separator} (Short: \p{Bc=ES}) (12)
\p{Bidi_Class: European_Terminator} (Short: \p{Bc=ET}) (87)
\p{Bidi_Class: First_Strong_Isolate} (Short: \p{Bc=FSI}) (1)
\p{Bidi_Class: FSI} \p{Bidi_Class=First_Strong_Isolate} (1)
\p{Bidi_Class: L} \p{Bidi_Class=Left_To_Right} (1_097_280
plus all above-Unicode code points)
\p{Bidi_Class: Left_To_Right} (Short: \p{Bc=L}) (1_097_280 plus
all above-Unicode code points)
\p{Bidi_Class: Left_To_Right_Embedding} (Short: \p{Bc=LRE}) (1)
\p{Bidi_Class: Left_To_Right_Isolate} (Short: \p{Bc=LRI}) (1)
\p{Bidi_Class: Left_To_Right_Override} (Short: \p{Bc=LRO}) (1)
\p{Bidi_Class: LRE} \p{Bidi_Class=Left_To_Right_Embedding} (1)
\p{Bidi_Class: LRI} \p{Bidi_Class=Left_To_Right_Isolate} (1)
\p{Bidi_Class: LRO} \p{Bidi_Class=Left_To_Right_Override} (1)
\p{Bidi_Class: Nonspacing_Mark} (Short: \p{Bc=NSM}) (1700)
\p{Bidi_Class: NSM} \p{Bidi_Class=Nonspacing_Mark} (1700)
\p{Bidi_Class: ON} \p{Bidi_Class=Other_Neutral} (5267)
\p{Bidi_Class: Other_Neutral} (Short: \p{Bc=ON}) (5267)
\p{Bidi_Class: Paragraph_Separator} (Short: \p{Bc=B}) (7)
\p{Bidi_Class: PDF} \p{Bidi_Class=Pop_Directional_Format} (1)
\p{Bidi_Class: PDI} \p{Bidi_Class=Pop_Directional_Isolate} (1)
\p{Bidi_Class: Pop_Directional_Format} (Short: \p{Bc=PDF}) (1)
\p{Bidi_Class: Pop_Directional_Isolate} (Short: \p{Bc=PDI}) (1)
\p{Bidi_Class: R} \p{Bidi_Class=Right_To_Left} (4070)
\p{Bidi_Class: Right_To_Left} (Short: \p{Bc=R}) (4070)
\p{Bidi_Class: Right_To_Left_Embedding} (Short: \p{Bc=RLE}) (1)
\p{Bidi_Class: Right_To_Left_Isolate} (Short: \p{Bc=RLI}) (1)
\p{Bidi_Class: Right_To_Left_Override} (Short: \p{Bc=RLO}) (1)
\p{Bidi_Class: RLE} \p{Bidi_Class=Right_To_Left_Embedding} (1)
\p{Bidi_Class: RLI} \p{Bidi_Class=Right_To_Left_Isolate} (1)
\p{Bidi_Class: RLO} \p{Bidi_Class=Right_To_Left_Override} (1)
\p{Bidi_Class: S} \p{Bidi_Class=Segment_Separator} (3)
\p{Bidi_Class: Segment_Separator} (Short: \p{Bc=S}) (3)
\p{Bidi_Class: White_Space} (Short: \p{Bc=WS}) (17)
\p{Bidi_Class: WS} \p{Bidi_Class=White_Space} (17)
\p{Bidi_Control} \p{Bidi_Control=Y} (Short: \p{BidiC}) (12)
\p{Bidi_Control: N*} (Short: \p{BidiC=N}, \P{BidiC}) (1_114_100
plus all above-Unicode code points)
\p{Bidi_Control: Y*} (Short: \p{BidiC=Y}, \p{BidiC}) (12)
\p{Bidi_M} \p{Bidi_Mirrored} (= \p{Bidi_Mirrored=Y})
(545)
\p{Bidi_M: *} \p{Bidi_Mirrored: *}
\p{Bidi_Mirrored} \p{Bidi_Mirrored=Y} (Short: \p{BidiM})
(545)
\p{Bidi_Mirrored: N*} (Short: \p{BidiM=N}, \P{BidiM}) (1_113_567
plus all above-Unicode code points)
\p{Bidi_Mirrored: Y*} (Short: \p{BidiM=Y}, \p{BidiM}) (545)
\p{Bidi_Paired_Bracket_Type: C} \p{Bidi_Paired_Bracket_Type=Close}
(60)
\p{Bidi_Paired_Bracket_Type: Close} (Short: \p{Bpt=C}) (60)
\p{Bidi_Paired_Bracket_Type: N} \p{Bidi_Paired_Bracket_Type=None}
(1_113_992 plus all above-Unicode code
points)
\p{Bidi_Paired_Bracket_Type: None} (Short: \p{Bpt=N}) (1_113_992
plus all above-Unicode code points)
\p{Bidi_Paired_Bracket_Type: O} \p{Bidi_Paired_Bracket_Type=Open}
(60)
\p{Bidi_Paired_Bracket_Type: Open} (Short: \p{Bpt=O}) (60)
\p{Blank} \p{XPosixBlank} (18)
\p{Blk: *} \p{Block: *}
\p{Block: Adlam} (NOT \p{Adlam} NOR \p{Is_Adlam}) (96)
\p{Block: Aegean_Numbers} (64)
\p{Block: Ahom} (NOT \p{Ahom} NOR \p{Is_Ahom}) (64)
\p{Block: Alchemical} \p{Block=Alchemical_Symbols} (128)
\p{Block: Alchemical_Symbols} (Short: \p{Blk=Alchemical}) (128)
\p{Block: Alphabetic_PF} \p{Block=Alphabetic_Presentation_Forms}
(80)
\p{Block: Alphabetic_Presentation_Forms} (Short: \p{Blk=
AlphabeticPF}) (80)
\p{Block: Anatolian_Hieroglyphs} (NOT \p{Anatolian_Hieroglyphs}
NOR \p{Is_Anatolian_Hieroglyphs}) (640)
\p{Block: Ancient_Greek_Music} \p{Block=
Ancient_Greek_Musical_Notation} (80)
\p{Block: Ancient_Greek_Musical_Notation} (Short: \p{Blk=
AncientGreekMusic}) (80)
\p{Block: Ancient_Greek_Numbers} (80)
\p{Block: Ancient_Symbols} (64)
\p{Block: Arabic} (NOT \p{Arabic} NOR \p{Is_Arabic}) (256)
\p{Block: Arabic_Ext_A} \p{Block=Arabic_Extended_A} (96)
\p{Block: Arabic_Extended_A} (Short: \p{Blk=ArabicExtA}) (96)
\p{Block: Arabic_Math} \p{Block=
Arabic_Mathematical_Alphabetic_Symbols}
(256)
\p{Block: Arabic_Mathematical_Alphabetic_Symbols} (Short: \p{Blk=
ArabicMath}) (256)
\p{Block: Arabic_PF_A} \p{Block=Arabic_Presentation_Forms_A} (688)
\p{Block: Arabic_PF_B} \p{Block=Arabic_Presentation_Forms_B} (144)
\p{Block: Arabic_Presentation_Forms_A} (Short: \p{Blk=ArabicPFA})
(688)
\p{Block: Arabic_Presentation_Forms_B} (Short: \p{Blk=ArabicPFB})
(144)
\p{Block: Arabic_Sup} \p{Block=Arabic_Supplement} (48)
\p{Block: Arabic_Supplement} (Short: \p{Blk=ArabicSup}) (48)
\p{Block: Armenian} (NOT \p{Armenian} NOR \p{Is_Armenian}) (96)
\p{Block: Arrows} (112)
\p{Block: ASCII} \p{Block=Basic_Latin} (128)
\p{Block: Avestan} (NOT \p{Avestan} NOR \p{Is_Avestan}) (64)
\p{Block: Balinese} (NOT \p{Balinese} NOR \p{Is_Balinese})
(128)
\p{Block: Bamum} (NOT \p{Bamum} NOR \p{Is_Bamum}) (96)
\p{Block: Bamum_Sup} \p{Block=Bamum_Supplement} (576)
\p{Block: Bamum_Supplement} (Short: \p{Blk=BamumSup}) (576)
\p{Block: Basic_Latin} (Short: \p{Blk=ASCII}) (128)
\p{Block: Bassa_Vah} (NOT \p{Bassa_Vah} NOR \p{Is_Bassa_Vah})
(48)
\p{Block: Batak} (NOT \p{Batak} NOR \p{Is_Batak}) (64)
\p{Block: Bengali} (NOT \p{Bengali} NOR \p{Is_Bengali}) (128)
\p{Block: Bhaiksuki} (NOT \p{Bhaiksuki} NOR \p{Is_Bhaiksuki})
(112)
\p{Block: Block_Elements} (32)
\p{Block: Bopomofo} (NOT \p{Bopomofo} NOR \p{Is_Bopomofo}) (48)
\p{Block: Bopomofo_Ext} \p{Block=Bopomofo_Extended} (32)
\p{Block: Bopomofo_Extended} (Short: \p{Blk=BopomofoExt}) (32)
\p{Block: Box_Drawing} (128)
\p{Block: Brahmi} (NOT \p{Brahmi} NOR \p{Is_Brahmi}) (128)
\p{Block: Braille} \p{Block=Braille_Patterns} (256)
\p{Block: Braille_Patterns} (Short: \p{Blk=Braille}) (256)
\p{Block: Buginese} (NOT \p{Buginese} NOR \p{Is_Buginese}) (32)
\p{Block: Buhid} (NOT \p{Buhid} NOR \p{Is_Buhid}) (32)
\p{Block: Byzantine_Music} \p{Block=Byzantine_Musical_Symbols}
(256)
\p{Block: Byzantine_Musical_Symbols} (Short: \p{Blk=
ByzantineMusic}) (256)
\p{Block: Canadian_Syllabics} \p{Block=
Unified_Canadian_Aboriginal_Syllabics}
(640)
\p{Block: Carian} (NOT \p{Carian} NOR \p{Is_Carian}) (64)
\p{Block: Caucasian_Albanian} (NOT \p{Caucasian_Albanian} NOR
\p{Is_Caucasian_Albanian}) (64)
\p{Block: Chakma} (NOT \p{Chakma} NOR \p{Is_Chakma}) (80)
\p{Block: Cham} (NOT \p{Cham} NOR \p{Is_Cham}) (96)
\p{Block: Cherokee} (NOT \p{Cherokee} NOR \p{Is_Cherokee}) (96)
\p{Block: Cherokee_Sup} \p{Block=Cherokee_Supplement} (80)
\p{Block: Cherokee_Supplement} (Short: \p{Blk=CherokeeSup}) (80)
\p{Block: CJK} \p{Block=CJK_Unified_Ideographs} (20_992)
\p{Block: CJK_Compat} \p{Block=CJK_Compatibility} (256)
\p{Block: CJK_Compat_Forms} \p{Block=CJK_Compatibility_Forms} (32)
\p{Block: CJK_Compat_Ideographs} \p{Block=
CJK_Compatibility_Ideographs} (512)
\p{Block: CJK_Compat_Ideographs_Sup} \p{Block=
CJK_Compatibility_Ideographs_Supplement}
(544)
\p{Block: CJK_Compatibility} (Short: \p{Blk=CJKCompat}) (256)
\p{Block: CJK_Compatibility_Forms} (Short: \p{Blk=CJKCompatForms})
(32)
\p{Block: CJK_Compatibility_Ideographs} (Short: \p{Blk=
CJKCompatIdeographs}) (512)
\p{Block: CJK_Compatibility_Ideographs_Supplement} (Short: \p{Blk=
CJKCompatIdeographsSup}) (544)
\p{Block: CJK_Ext_A} \p{Block=
CJK_Unified_Ideographs_Extension_A}
(6592)
\p{Block: CJK_Ext_B} \p{Block=
CJK_Unified_Ideographs_Extension_B}
(42_720)
\p{Block: CJK_Ext_C} \p{Block=
CJK_Unified_Ideographs_Extension_C}
(4160)
\p{Block: CJK_Ext_D} \p{Block=
CJK_Unified_Ideographs_Extension_D} (224)
\p{Block: CJK_Ext_E} \p{Block=
CJK_Unified_Ideographs_Extension_E}
(5776)
\p{Block: CJK_Radicals_Sup} \p{Block=CJK_Radicals_Supplement} (128)
\p{Block: CJK_Radicals_Supplement} (Short: \p{Blk=CJKRadicalsSup})
(128)
\p{Block: CJK_Strokes} (48)
\p{Block: CJK_Symbols} \p{Block=CJK_Symbols_And_Punctuation} (64)
\p{Block: CJK_Symbols_And_Punctuation} (Short: \p{Blk=CJKSymbols})
(64)
\p{Block: CJK_Unified_Ideographs} (Short: \p{Blk=CJK}) (20_992)
\p{Block: CJK_Unified_Ideographs_Extension_A} (Short: \p{Blk=
CJKExtA}) (6592)
\p{Block: CJK_Unified_Ideographs_Extension_B} (Short: \p{Blk=
CJKExtB}) (42_720)
\p{Block: CJK_Unified_Ideographs_Extension_C} (Short: \p{Blk=
CJKExtC}) (4160)
\p{Block: CJK_Unified_Ideographs_Extension_D} (Short: \p{Blk=
CJKExtD}) (224)
\p{Block: CJK_Unified_Ideographs_Extension_E} (Short: \p{Blk=
CJKExtE}) (5776)
\p{Block: Combining_Diacritical_Marks} (Short: \p{Blk=
Diacriticals}) (112)
\p{Block: Combining_Diacritical_Marks_Extended} (Short: \p{Blk=
DiacriticalsExt}) (80)
\p{Block: Combining_Diacritical_Marks_For_Symbols} (Short: \p{Blk=
DiacriticalsForSymbols}) (48)
\p{Block: Combining_Diacritical_Marks_Supplement} (Short: \p{Blk=
DiacriticalsSup}) (64)
\p{Block: Combining_Half_Marks} (Short: \p{Blk=HalfMarks}) (16)
\p{Block: Combining_Marks_For_Symbols} \p{Block=
Combining_Diacritical_Marks_For_Symbols}
(48)
\p{Block: Common_Indic_Number_Forms} (Short: \p{Blk=
IndicNumberForms}) (16)
\p{Block: Compat_Jamo} \p{Block=Hangul_Compatibility_Jamo} (96)
\p{Block: Control_Pictures} (64)
\p{Block: Coptic} (NOT \p{Coptic} NOR \p{Is_Coptic}) (128)
\p{Block: Coptic_Epact_Numbers} (32)
\p{Block: Counting_Rod} \p{Block=Counting_Rod_Numerals} (32)
\p{Block: Counting_Rod_Numerals} (Short: \p{Blk=CountingRod}) (32)
\p{Block: Cuneiform} (NOT \p{Cuneiform} NOR \p{Is_Cuneiform})
(1024)
\p{Block: Cuneiform_Numbers} \p{Block=
Cuneiform_Numbers_And_Punctuation} (128)
\p{Block: Cuneiform_Numbers_And_Punctuation} (Short: \p{Blk=
CuneiformNumbers}) (128)
\p{Block: Currency_Symbols} (48)
\p{Block: Cypriot_Syllabary} (64)
\p{Block: Cyrillic} (NOT \p{Cyrillic} NOR \p{Is_Cyrillic})
(256)
\p{Block: Cyrillic_Ext_A} \p{Block=Cyrillic_Extended_A} (32)
\p{Block: Cyrillic_Ext_B} \p{Block=Cyrillic_Extended_B} (96)
\p{Block: Cyrillic_Ext_C} \p{Block=Cyrillic_Extended_C} (16)
\p{Block: Cyrillic_Extended_A} (Short: \p{Blk=CyrillicExtA}) (32)
\p{Block: Cyrillic_Extended_B} (Short: \p{Blk=CyrillicExtB}) (96)
\p{Block: Cyrillic_Extended_C} (Short: \p{Blk=CyrillicExtC}) (16)
\p{Block: Cyrillic_Sup} \p{Block=Cyrillic_Supplement} (48)
\p{Block: Cyrillic_Supplement} (Short: \p{Blk=CyrillicSup}) (48)
\p{Block: Cyrillic_Supplementary} \p{Block=Cyrillic_Supplement}
(48)
\p{Block: Deseret} (80)
\p{Block: Devanagari} (NOT \p{Devanagari} NOR \p{Is_Devanagari})
(128)
\p{Block: Devanagari_Ext} \p{Block=Devanagari_Extended} (32)
\p{Block: Devanagari_Extended} (Short: \p{Blk=DevanagariExt}) (32)
\p{Block: Diacriticals} \p{Block=Combining_Diacritical_Marks} (112)
\p{Block: Diacriticals_Ext} \p{Block=
Combining_Diacritical_Marks_Extended}
(80)
\p{Block: Diacriticals_For_Symbols} \p{Block=
Combining_Diacritical_Marks_For_Symbols}
(48)
\p{Block: Diacriticals_Sup} \p{Block=
Combining_Diacritical_Marks_Supplement}
(64)
\p{Block: Dingbats} (192)
\p{Block: Domino} \p{Block=Domino_Tiles} (112)
\p{Block: Domino_Tiles} (Short: \p{Blk=Domino}) (112)
\p{Block: Duployan} (NOT \p{Duployan} NOR \p{Is_Duployan})
(160)
\p{Block: Early_Dynastic_Cuneiform} (208)
\p{Block: Egyptian_Hieroglyphs} (NOT \p{Egyptian_Hieroglyphs} NOR
\p{Is_Egyptian_Hieroglyphs}) (1072)
\p{Block: Elbasan} (NOT \p{Elbasan} NOR \p{Is_Elbasan}) (48)
\p{Block: Emoticons} (80)
\p{Block: Enclosed_Alphanum} \p{Block=Enclosed_Alphanumerics} (160)
\p{Block: Enclosed_Alphanum_Sup} \p{Block=
Enclosed_Alphanumeric_Supplement} (256)
\p{Block: Enclosed_Alphanumeric_Supplement} (Short: \p{Blk=
EnclosedAlphanumSup}) (256)
\p{Block: Enclosed_Alphanumerics} (Short: \p{Blk=
EnclosedAlphanum}) (160)
\p{Block: Enclosed_CJK} \p{Block=Enclosed_CJK_Letters_And_Months}
(256)
\p{Block: Enclosed_CJK_Letters_And_Months} (Short: \p{Blk=
EnclosedCJK}) (256)
\p{Block: Enclosed_Ideographic_Sup} \p{Block=
Enclosed_Ideographic_Supplement} (256)
\p{Block: Enclosed_Ideographic_Supplement} (Short: \p{Blk=
EnclosedIdeographicSup}) (256)
\p{Block: Ethiopic} (NOT \p{Ethiopic} NOR \p{Is_Ethiopic})
(384)
\p{Block: Ethiopic_Ext} \p{Block=Ethiopic_Extended} (96)
\p{Block: Ethiopic_Ext_A} \p{Block=Ethiopic_Extended_A} (48)
\p{Block: Ethiopic_Extended} (Short: \p{Blk=EthiopicExt}) (96)
\p{Block: Ethiopic_Extended_A} (Short: \p{Blk=EthiopicExtA}) (48)
\p{Block: Ethiopic_Sup} \p{Block=Ethiopic_Supplement} (32)
\p{Block: Ethiopic_Supplement} (Short: \p{Blk=EthiopicSup}) (32)
\p{Block: General_Punctuation} (Short: \p{Blk=Punctuation}; NOT
\p{Punct} NOR \p{Is_Punctuation}) (112)
\p{Block: Geometric_Shapes} (96)
\p{Block: Geometric_Shapes_Ext} \p{Block=
Geometric_Shapes_Extended} (128)
\p{Block: Geometric_Shapes_Extended} (Short: \p{Blk=
GeometricShapesExt}) (128)
\p{Block: Georgian} (NOT \p{Georgian} NOR \p{Is_Georgian}) (96)
\p{Block: Georgian_Sup} \p{Block=Georgian_Supplement} (48)
\p{Block: Georgian_Supplement} (Short: \p{Blk=GeorgianSup}) (48)
\p{Block: Glagolitic} (NOT \p{Glagolitic} NOR \p{Is_Glagolitic})
(96)
\p{Block: Glagolitic_Sup} \p{Block=Glagolitic_Supplement} (48)
\p{Block: Glagolitic_Supplement} (Short: \p{Blk=GlagoliticSup})
(48)
\p{Block: Gothic} (NOT \p{Gothic} NOR \p{Is_Gothic}) (32)
\p{Block: Grantha} (NOT \p{Grantha} NOR \p{Is_Grantha}) (128)
\p{Block: Greek} \p{Block=Greek_And_Coptic} (NOT \p{Greek}
NOR \p{Is_Greek}) (144)
\p{Block: Greek_And_Coptic} (Short: \p{Blk=Greek}; NOT \p{Greek}
NOR \p{Is_Greek}) (144)
\p{Block: Greek_Ext} \p{Block=Greek_Extended} (256)
\p{Block: Greek_Extended} (Short: \p{Blk=GreekExt}) (256)
\p{Block: Gujarati} (NOT \p{Gujarati} NOR \p{Is_Gujarati})
(128)
\p{Block: Gurmukhi} (NOT \p{Gurmukhi} NOR \p{Is_Gurmukhi})
(128)
\p{Block: Half_And_Full_Forms} \p{Block=
Halfwidth_And_Fullwidth_Forms} (240)
\p{Block: Half_Marks} \p{Block=Combining_Half_Marks} (16)
\p{Block: Halfwidth_And_Fullwidth_Forms} (Short: \p{Blk=
HalfAndFullForms}) (240)
\p{Block: Hangul} \p{Block=Hangul_Syllables} (NOT \p{Hangul}
NOR \p{Is_Hangul}) (11_184)
\p{Block: Hangul_Compatibility_Jamo} (Short: \p{Blk=CompatJamo})
(96)
\p{Block: Hangul_Jamo} (Short: \p{Blk=Jamo}) (256)
\p{Block: Hangul_Jamo_Extended_A} (Short: \p{Blk=JamoExtA}) (32)
\p{Block: Hangul_Jamo_Extended_B} (Short: \p{Blk=JamoExtB}) (80)
\p{Block: Hangul_Syllables} (Short: \p{Blk=Hangul}; NOT \p{Hangul}
NOR \p{Is_Hangul}) (11_184)
\p{Block: Hanunoo} (NOT \p{Hanunoo} NOR \p{Is_Hanunoo}) (32)
\p{Block: Hatran} (NOT \p{Hatran} NOR \p{Is_Hatran}) (32)
\p{Block: Hebrew} (NOT \p{Hebrew} NOR \p{Is_Hebrew}) (112)
\p{Block: High_Private_Use_Surrogates} (Short: \p{Blk=
HighPUSurrogates}) (128)
\p{Block: High_PU_Surrogates} \p{Block=
High_Private_Use_Surrogates} (128)
\p{Block: High_Surrogates} (896)
\p{Block: Hiragana} (NOT \p{Hiragana} NOR \p{Is_Hiragana}) (96)
\p{Block: IDC} \p{Block=
Ideographic_Description_Characters} (NOT
\p{ID_Continue} NOR \p{Is_IDC}) (16)
\p{Block: Ideographic_Description_Characters} (Short: \p{Blk=IDC};
NOT \p{ID_Continue} NOR \p{Is_IDC}) (16)
\p{Block: Ideographic_Symbols} \p{Block=
Ideographic_Symbols_And_Punctuation} (32)
\p{Block: Ideographic_Symbols_And_Punctuation} (Short: \p{Blk=
IdeographicSymbols}) (32)
\p{Block: Imperial_Aramaic} (NOT \p{Imperial_Aramaic} NOR
\p{Is_Imperial_Aramaic}) (32)
\p{Block: Indic_Number_Forms} \p{Block=Common_Indic_Number_Forms}
(16)
\p{Block: Inscriptional_Pahlavi} (NOT \p{Inscriptional_Pahlavi}
NOR \p{Is_Inscriptional_Pahlavi}) (32)
\p{Block: Inscriptional_Parthian} (NOT \p{Inscriptional_Parthian}
NOR \p{Is_Inscriptional_Parthian}) (32)
\p{Block: IPA_Ext} \p{Block=IPA_Extensions} (96)
\p{Block: IPA_Extensions} (Short: \p{Blk=IPAExt}) (96)
\p{Block: Jamo} \p{Block=Hangul_Jamo} (256)
\p{Block: Jamo_Ext_A} \p{Block=Hangul_Jamo_Extended_A} (32)
\p{Block: Jamo_Ext_B} \p{Block=Hangul_Jamo_Extended_B} (80)
\p{Block: Javanese} (NOT \p{Javanese} NOR \p{Is_Javanese}) (96)
\p{Block: Kaithi} (NOT \p{Kaithi} NOR \p{Is_Kaithi}) (80)
\p{Block: Kana_Sup} \p{Block=Kana_Supplement} (256)
\p{Block: Kana_Supplement} (Short: \p{Blk=KanaSup}) (256)
\p{Block: Kanbun} (16)
\p{Block: Kangxi} \p{Block=Kangxi_Radicals} (224)
\p{Block: Kangxi_Radicals} (Short: \p{Blk=Kangxi}) (224)
\p{Block: Kannada} (NOT \p{Kannada} NOR \p{Is_Kannada}) (128)
\p{Block: Katakana} (NOT \p{Katakana} NOR \p{Is_Katakana}) (96)
\p{Block: Katakana_Ext} \p{Block=Katakana_Phonetic_Extensions} (16)
\p{Block: Katakana_Phonetic_Extensions} (Short: \p{Blk=
KatakanaExt}) (16)
\p{Block: Kayah_Li} (48)
\p{Block: Kharoshthi} (NOT \p{Kharoshthi} NOR \p{Is_Kharoshthi})
(96)
\p{Block: Khmer} (NOT \p{Khmer} NOR \p{Is_Khmer}) (128)
\p{Block: Khmer_Symbols} (32)
\p{Block: Khojki} (NOT \p{Khojki} NOR \p{Is_Khojki}) (80)
\p{Block: Khudawadi} (NOT \p{Khudawadi} NOR \p{Is_Khudawadi})
(80)
\p{Block: Lao} (NOT \p{Lao} NOR \p{Is_Lao}) (128)
\p{Block: Latin_1} \p{Block=Latin_1_Supplement} (128)
\p{Block: Latin_1_Sup} \p{Block=Latin_1_Supplement} (128)
\p{Block: Latin_1_Supplement} (Short: \p{Blk=Latin1}) (128)
\p{Block: Latin_Ext_A} \p{Block=Latin_Extended_A} (128)
\p{Block: Latin_Ext_Additional} \p{Block=
Latin_Extended_Additional} (256)
\p{Block: Latin_Ext_B} \p{Block=Latin_Extended_B} (208)
\p{Block: Latin_Ext_C} \p{Block=Latin_Extended_C} (32)
\p{Block: Latin_Ext_D} \p{Block=Latin_Extended_D} (224)
\p{Block: Latin_Ext_E} \p{Block=Latin_Extended_E} (64)
\p{Block: Latin_Extended_A} (Short: \p{Blk=LatinExtA}) (128)
\p{Block: Latin_Extended_Additional} (Short: \p{Blk=
LatinExtAdditional}) (256)
\p{Block: Latin_Extended_B} (Short: \p{Blk=LatinExtB}) (208)
\p{Block: Latin_Extended_C} (Short: \p{Blk=LatinExtC}) (32)
\p{Block: Latin_Extended_D} (Short: \p{Blk=LatinExtD}) (224)
\p{Block: Latin_Extended_E} (Short: \p{Blk=LatinExtE}) (64)
\p{Block: Lepcha} (NOT \p{Lepcha} NOR \p{Is_Lepcha}) (80)
\p{Block: Letterlike_Symbols} (80)
\p{Block: Limbu} (NOT \p{Limbu} NOR \p{Is_Limbu}) (80)
\p{Block: Linear_A} (NOT \p{Linear_A} NOR \p{Is_Linear_A})
(384)
\p{Block: Linear_B_Ideograms} (128)
\p{Block: Linear_B_Syllabary} (128)
\p{Block: Lisu} (48)
\p{Block: Low_Surrogates} (1024)
\p{Block: Lycian} (NOT \p{Lycian} NOR \p{Is_Lycian}) (32)
\p{Block: Lydian} (NOT \p{Lydian} NOR \p{Is_Lydian}) (32)
\p{Block: Mahajani} (NOT \p{Mahajani} NOR \p{Is_Mahajani}) (48)
\p{Block: Mahjong} \p{Block=Mahjong_Tiles} (48)
\p{Block: Mahjong_Tiles} (Short: \p{Blk=Mahjong}) (48)
\p{Block: Malayalam} (NOT \p{Malayalam} NOR \p{Is_Malayalam})
(128)
\p{Block: Mandaic} (NOT \p{Mandaic} NOR \p{Is_Mandaic}) (32)
\p{Block: Manichaean} (NOT \p{Manichaean} NOR \p{Is_Manichaean})
(64)
\p{Block: Marchen} (NOT \p{Marchen} NOR \p{Is_Marchen}) (80)
\p{Block: Math_Alphanum} \p{Block=
Mathematical_Alphanumeric_Symbols} (1024)
\p{Block: Math_Operators} \p{Block=Mathematical_Operators} (256)
\p{Block: Mathematical_Alphanumeric_Symbols} (Short: \p{Blk=
MathAlphanum}) (1024)
\p{Block: Mathematical_Operators} (Short: \p{Blk=MathOperators})
(256)
\p{Block: Meetei_Mayek} (NOT \p{Meetei_Mayek} NOR
\p{Is_Meetei_Mayek}) (64)
\p{Block: Meetei_Mayek_Ext} \p{Block=Meetei_Mayek_Extensions} (32)
\p{Block: Meetei_Mayek_Extensions} (Short: \p{Blk=MeeteiMayekExt})
(32)
\p{Block: Mende_Kikakui} (NOT \p{Mende_Kikakui} NOR
\p{Is_Mende_Kikakui}) (224)
\p{Block: Meroitic_Cursive} (NOT \p{Meroitic_Cursive} NOR
\p{Is_Meroitic_Cursive}) (96)
\p{Block: Meroitic_Hieroglyphs} (32)
\p{Block: Miao} (NOT \p{Miao} NOR \p{Is_Miao}) (160)
\p{Block: Misc_Arrows} \p{Block=Miscellaneous_Symbols_And_Arrows}
(256)
\p{Block: Misc_Math_Symbols_A} \p{Block=
Miscellaneous_Mathematical_Symbols_A}
(48)
\p{Block: Misc_Math_Symbols_B} \p{Block=
Miscellaneous_Mathematical_Symbols_B}
(128)
\p{Block: Misc_Pictographs} \p{Block=
Miscellaneous_Symbols_And_Pictographs}
(768)
\p{Block: Misc_Symbols} \p{Block=Miscellaneous_Symbols} (256)
\p{Block: Misc_Technical} \p{Block=Miscellaneous_Technical} (256)
\p{Block: Miscellaneous_Mathematical_Symbols_A} (Short: \p{Blk=
MiscMathSymbolsA}) (48)
\p{Block: Miscellaneous_Mathematical_Symbols_B} (Short: \p{Blk=
MiscMathSymbolsB}) (128)
\p{Block: Miscellaneous_Symbols} (Short: \p{Blk=MiscSymbols}) (256)
\p{Block: Miscellaneous_Symbols_And_Arrows} (Short: \p{Blk=
MiscArrows}) (256)
\p{Block: Miscellaneous_Symbols_And_Pictographs} (Short: \p{Blk=
MiscPictographs}) (768)
\p{Block: Miscellaneous_Technical} (Short: \p{Blk=MiscTechnical})
(256)
\p{Block: Modi} (NOT \p{Modi} NOR \p{Is_Modi}) (96)
\p{Block: Modifier_Letters} \p{Block=Spacing_Modifier_Letters} (80)
\p{Block: Modifier_Tone_Letters} (32)
\p{Block: Mongolian} (NOT \p{Mongolian} NOR \p{Is_Mongolian})
(176)
\p{Block: Mongolian_Sup} \p{Block=Mongolian_Supplement} (32)
\p{Block: Mongolian_Supplement} (Short: \p{Blk=MongolianSup}) (32)
\p{Block: Mro} (NOT \p{Mro} NOR \p{Is_Mro}) (48)
\p{Block: Multani} (NOT \p{Multani} NOR \p{Is_Multani}) (48)
\p{Block: Music} \p{Block=Musical_Symbols} (256)
\p{Block: Musical_Symbols} (Short: \p{Blk=Music}) (256)
\p{Block: Myanmar} (NOT \p{Myanmar} NOR \p{Is_Myanmar}) (160)
\p{Block: Myanmar_Ext_A} \p{Block=Myanmar_Extended_A} (32)
\p{Block: Myanmar_Ext_B} \p{Block=Myanmar_Extended_B} (32)
\p{Block: Myanmar_Extended_A} (Short: \p{Blk=MyanmarExtA}) (32)
\p{Block: Myanmar_Extended_B} (Short: \p{Blk=MyanmarExtB}) (32)
\p{Block: Nabataean} (NOT \p{Nabataean} NOR \p{Is_Nabataean})
(48)
\p{Block: NB} \p{Block=No_Block} (842_320 plus all
above-Unicode code points)
\p{Block: New_Tai_Lue} (NOT \p{New_Tai_Lue} NOR
\p{Is_New_Tai_Lue}) (96)
\p{Block: Newa} (NOT \p{Newa} NOR \p{Is_Newa}) (128)
\p{Block: NKo} (NOT \p{Nko} NOR \p{Is_NKo}) (64)
\p{Block: No_Block} (Short: \p{Blk=NB}) (842_320 plus all
above-Unicode code points)
\p{Block: Number_Forms} (64)
\p{Block: OCR} \p{Block=Optical_Character_Recognition}
(32)
\p{Block: Ogham} (NOT \p{Ogham} NOR \p{Is_Ogham}) (32)
\p{Block: Ol_Chiki} (48)
\p{Block: Old_Hungarian} (NOT \p{Old_Hungarian} NOR
\p{Is_Old_Hungarian}) (128)
\p{Block: Old_Italic} (NOT \p{Old_Italic} NOR \p{Is_Old_Italic})
(48)
\p{Block: Old_North_Arabian} (32)
\p{Block: Old_Permic} (NOT \p{Old_Permic} NOR \p{Is_Old_Permic})
(48)
\p{Block: Old_Persian} (NOT \p{Old_Persian} NOR
\p{Is_Old_Persian}) (64)
\p{Block: Old_South_Arabian} (32)
\p{Block: Old_Turkic} (NOT \p{Old_Turkic} NOR \p{Is_Old_Turkic})
(80)
\p{Block: Optical_Character_Recognition} (Short: \p{Blk=OCR}) (32)
\p{Block: Oriya} (NOT \p{Oriya} NOR \p{Is_Oriya}) (128)
\p{Block: Ornamental_Dingbats} (48)
\p{Block: Osage} (NOT \p{Osage} NOR \p{Is_Osage}) (80)
\p{Block: Osmanya} (NOT \p{Osmanya} NOR \p{Is_Osmanya}) (48)
\p{Block: Pahawh_Hmong} (NOT \p{Pahawh_Hmong} NOR
\p{Is_Pahawh_Hmong}) (144)
\p{Block: Palmyrene} (32)
\p{Block: Pau_Cin_Hau} (NOT \p{Pau_Cin_Hau} NOR
\p{Is_Pau_Cin_Hau}) (64)
\p{Block: Phags_Pa} (NOT \p{Phags_Pa} NOR \p{Is_Phags_Pa}) (64)
\p{Block: Phaistos} \p{Block=Phaistos_Disc} (48)
\p{Block: Phaistos_Disc} (Short: \p{Blk=Phaistos}) (48)
\p{Block: Phoenician} (NOT \p{Phoenician} NOR \p{Is_Phoenician})
(32)
\p{Block: Phonetic_Ext} \p{Block=Phonetic_Extensions} (128)
\p{Block: Phonetic_Ext_Sup} \p{Block=
Phonetic_Extensions_Supplement} (64)
\p{Block: Phonetic_Extensions} (Short: \p{Blk=PhoneticExt}) (128)
\p{Block: Phonetic_Extensions_Supplement} (Short: \p{Blk=
PhoneticExtSup}) (64)
\p{Block: Playing_Cards} (96)
\p{Block: Private_Use} \p{Block=Private_Use_Area} (NOT
\p{Private_Use} NOR \p{Is_Private_Use})
(6400)
\p{Block: Private_Use_Area} (Short: \p{Blk=PUA}; NOT
\p{Private_Use} NOR \p{Is_Private_Use})
(6400)
\p{Block: Psalter_Pahlavi} (NOT \p{Psalter_Pahlavi} NOR
\p{Is_Psalter_Pahlavi}) (48)
\p{Block: PUA} \p{Block=Private_Use_Area} (NOT
\p{Private_Use} NOR \p{Is_Private_Use})
(6400)
\p{Block: Punctuation} \p{Block=General_Punctuation} (NOT
\p{Punct} NOR \p{Is_Punctuation}) (112)
\p{Block: Rejang} (NOT \p{Rejang} NOR \p{Is_Rejang}) (48)
\p{Block: Rumi} \p{Block=Rumi_Numeral_Symbols} (32)
\p{Block: Rumi_Numeral_Symbols} (Short: \p{Blk=Rumi}) (32)
\p{Block: Runic} (NOT \p{Runic} NOR \p{Is_Runic}) (96)
\p{Block: Samaritan} (NOT \p{Samaritan} NOR \p{Is_Samaritan})
(64)
\p{Block: Saurashtra} (NOT \p{Saurashtra} NOR \p{Is_Saurashtra})
(96)
\p{Block: Sharada} (NOT \p{Sharada} NOR \p{Is_Sharada}) (96)
\p{Block: Shavian} (48)
\p{Block: Shorthand_Format_Controls} (16)
\p{Block: Siddham} (NOT \p{Siddham} NOR \p{Is_Siddham}) (128)
\p{Block: Sinhala} (NOT \p{Sinhala} NOR \p{Is_Sinhala}) (128)
\p{Block: Sinhala_Archaic_Numbers} (32)
\p{Block: Small_Form_Variants} (Short: \p{Blk=SmallForms}) (32)
\p{Block: Small_Forms} \p{Block=Small_Form_Variants} (32)
\p{Block: Sora_Sompeng} (NOT \p{Sora_Sompeng} NOR
\p{Is_Sora_Sompeng}) (48)
\p{Block: Spacing_Modifier_Letters} (Short: \p{Blk=
ModifierLetters}) (80)
\p{Block: Specials} (16)
\p{Block: Sundanese} (NOT \p{Sundanese} NOR \p{Is_Sundanese})
(64)
\p{Block: Sundanese_Sup} \p{Block=Sundanese_Supplement} (16)
\p{Block: Sundanese_Supplement} (Short: \p{Blk=SundaneseSup}) (16)
\p{Block: Sup_Arrows_A} \p{Block=Supplemental_Arrows_A} (16)
\p{Block: Sup_Arrows_B} \p{Block=Supplemental_Arrows_B} (128)
\p{Block: Sup_Arrows_C} \p{Block=Supplemental_Arrows_C} (256)
\p{Block: Sup_Math_Operators} \p{Block=
Supplemental_Mathematical_Operators}
(256)
\p{Block: Sup_PUA_A} \p{Block=Supplementary_Private_Use_Area_A}
(65_536)
\p{Block: Sup_PUA_B} \p{Block=Supplementary_Private_Use_Area_B}
(65_536)
\p{Block: Sup_Punctuation} \p{Block=Supplemental_Punctuation} (128)
\p{Block: Sup_Symbols_And_Pictographs} \p{Block=
Supplemental_Symbols_And_Pictographs}
(256)
\p{Block: Super_And_Sub} \p{Block=Superscripts_And_Subscripts} (48)
\p{Block: Superscripts_And_Subscripts} (Short: \p{Blk=
SuperAndSub}) (48)
\p{Block: Supplemental_Arrows_A} (Short: \p{Blk=SupArrowsA}) (16)
\p{Block: Supplemental_Arrows_B} (Short: \p{Blk=SupArrowsB}) (128)
\p{Block: Supplemental_Arrows_C} (Short: \p{Blk=SupArrowsC}) (256)
\p{Block: Supplemental_Mathematical_Operators} (Short: \p{Blk=
SupMathOperators}) (256)
\p{Block: Supplemental_Punctuation} (Short: \p{Blk=
SupPunctuation}) (128)
\p{Block: Supplemental_Symbols_And_Pictographs} (Short: \p{Blk=
SupSymbolsAndPictographs}) (256)
\p{Block: Supplementary_Private_Use_Area_A} (Short: \p{Blk=
SupPUAA}) (65_536)
\p{Block: Supplementary_Private_Use_Area_B} (Short: \p{Blk=
SupPUAB}) (65_536)
\p{Block: Sutton_SignWriting} (688)
\p{Block: Syloti_Nagri} (NOT \p{Syloti_Nagri} NOR
\p{Is_Syloti_Nagri}) (48)
\p{Block: Syriac} (NOT \p{Syriac} NOR \p{Is_Syriac}) (80)
\p{Block: Tagalog} (NOT \p{Tagalog} NOR \p{Is_Tagalog}) (32)
\p{Block: Tagbanwa} (NOT \p{Tagbanwa} NOR \p{Is_Tagbanwa}) (32)
\p{Block: Tags} (128)
\p{Block: Tai_Le} (NOT \p{Tai_Le} NOR \p{Is_Tai_Le}) (48)
\p{Block: Tai_Tham} (NOT \p{Tai_Tham} NOR \p{Is_Tai_Tham})
(144)
\p{Block: Tai_Viet} (NOT \p{Tai_Viet} NOR \p{Is_Tai_Viet}) (96)
\p{Block: Tai_Xuan_Jing} \p{Block=Tai_Xuan_Jing_Symbols} (96)
\p{Block: Tai_Xuan_Jing_Symbols} (Short: \p{Blk=TaiXuanJing}) (96)
\p{Block: Takri} (NOT \p{Takri} NOR \p{Is_Takri}) (80)
\p{Block: Tamil} (NOT \p{Tamil} NOR \p{Is_Tamil}) (128)
\p{Block: Tangut} (NOT \p{Tangut} NOR \p{Is_Tangut}) (6144)
\p{Block: Tangut_Components} (768)
\p{Block: Telugu} (NOT \p{Telugu} NOR \p{Is_Telugu}) (128)
\p{Block: Thaana} (NOT \p{Thaana} NOR \p{Is_Thaana}) (64)
\p{Block: Thai} (NOT \p{Thai} NOR \p{Is_Thai}) (128)
\p{Block: Tibetan} (NOT \p{Tibetan} NOR \p{Is_Tibetan}) (256)
\p{Block: Tifinagh} (NOT \p{Tifinagh} NOR \p{Is_Tifinagh}) (80)
\p{Block: Tirhuta} (NOT \p{Tirhuta} NOR \p{Is_Tirhuta}) (96)
\p{Block: Transport_And_Map} \p{Block=Transport_And_Map_Symbols}
(128)
\p{Block: Transport_And_Map_Symbols} (Short: \p{Blk=
TransportAndMap}) (128)
\p{Block: UCAS} \p{Block=
Unified_Canadian_Aboriginal_Syllabics}
(640)
\p{Block: UCAS_Ext} \p{Block=
Unified_Canadian_Aboriginal_Syllabics_Extended} (80)
\p{Block: Ugaritic} (NOT \p{Ugaritic} NOR \p{Is_Ugaritic}) (32)
\p{Block: Unified_Canadian_Aboriginal_Syllabics} (Short: \p{Blk=
UCAS}) (640)
\p{Block: Unified_Canadian_Aboriginal_Syllabics_Extended} (Short:
\p{Blk=UCASExt}) (80)
\p{Block: Vai} (NOT \p{Vai} NOR \p{Is_Vai}) (320)
\p{Block: Variation_Selectors} (Short: \p{Blk=VS}; NOT
\p{Variation_Selector} NOR \p{Is_VS})
(16)
\p{Block: Variation_Selectors_Supplement} (Short: \p{Blk=VSSup})
(240)
\p{Block: Vedic_Ext} \p{Block=Vedic_Extensions} (48)
\p{Block: Vedic_Extensions} (Short: \p{Blk=VedicExt}) (48)
\p{Block: Vertical_Forms} (16)
\p{Block: VS} \p{Block=Variation_Selectors} (NOT
\p{Variation_Selector} NOR \p{Is_VS})
(16)
\p{Block: VS_Sup} \p{Block=Variation_Selectors_Supplement}
(240)
\p{Block: Warang_Citi} (NOT \p{Warang_Citi} NOR
\p{Is_Warang_Citi}) (96)
\p{Block: Yi_Radicals} (64)
\p{Block: Yi_Syllables} (1168)
\p{Block: Yijing} \p{Block=Yijing_Hexagram_Symbols} (64)
\p{Block: Yijing_Hexagram_Symbols} (Short: \p{Blk=Yijing}) (64)
X \p{Block_Elements} \p{Block=Block_Elements} (32)
\p{Bopo} \p{Bopomofo} (= \p{Script_Extensions=
Bopomofo}) (NOT \p{Block=Bopomofo}) (110)
\p{Bopomofo} \p{Script_Extensions=Bopomofo} (Short:
\p{Bopo}; NOT \p{Block=Bopomofo}) (110)
X \p{Bopomofo_Ext} \p{Bopomofo_Extended} (= \p{Block=
Bopomofo_Extended}) (32)
X \p{Bopomofo_Extended} \p{Block=Bopomofo_Extended} (Short:
\p{InBopomofoExt}) (32)
X \p{Box_Drawing} \p{Block=Box_Drawing} (128)
\p{Bpt: *} \p{Bidi_Paired_Bracket_Type: *}
\p{Brah} \p{Brahmi} (= \p{Script_Extensions=
Brahmi}) (NOT \p{Block=Brahmi}) (109)
\p{Brahmi} \p{Script_Extensions=Brahmi} (Short:
\p{Brah}; NOT \p{Block=Brahmi}) (109)
\p{Brai} \p{Braille} (= \p{Script_Extensions=
Braille}) (256)
\p{Braille} \p{Script_Extensions=Braille} (Short:
\p{Brai}) (256)
X \p{Braille_Patterns} \p{Block=Braille_Patterns} (Short:
\p{InBraille}) (256)
\p{Bugi} \p{Buginese} (= \p{Script_Extensions=
Buginese}) (NOT \p{Block=Buginese}) (31)
\p{Buginese} \p{Script_Extensions=Buginese} (Short:
\p{Bugi}; NOT \p{Block=Buginese}) (31)
\p{Buhd} \p{Buhid} (= \p{Script_Extensions=Buhid})
(NOT \p{Block=Buhid}) (22)
\p{Buhid} \p{Script_Extensions=Buhid} (Short:
\p{Buhd}; NOT \p{Block=Buhid}) (22)
X \p{Byzantine_Music} \p{Byzantine_Musical_Symbols} (= \p{Block=
Byzantine_Musical_Symbols}) (256)
X \p{Byzantine_Musical_Symbols} \p{Block=Byzantine_Musical_Symbols}
(Short: \p{InByzantineMusic}) (256)
\p{C} \pC \p{Other} (= \p{General_Category=Other})
(986_091 plus all above-Unicode code
points)
\p{Cakm} \p{Chakma} (= \p{Script_Extensions=
Chakma}) (NOT \p{Block=Chakma}) (87)
\p{Canadian_Aboriginal} \p{Script_Extensions=Canadian_Aboriginal}
(Short: \p{Cans}) (710)
X \p{Canadian_Syllabics} \p{Unified_Canadian_Aboriginal_Syllabics}
(= \p{Block=
Unified_Canadian_Aboriginal_Syllabics})
(640)
T \p{Canonical_Combining_Class: 0} \p{Canonical_Combining_Class=
Not_Reordered} (1_113_298 plus all
above-Unicode code points)
T \p{Canonical_Combining_Class: 1} \p{Canonical_Combining_Class=
Overlay} (32)
T \p{Canonical_Combining_Class: 7} \p{Canonical_Combining_Class=
Nukta} (22)
T \p{Canonical_Combining_Class: 8} \p{Canonical_Combining_Class=
Kana_Voicing} (2)
T \p{Canonical_Combining_Class: 9} \p{Canonical_Combining_Class=
Virama} (47)
T \p{Canonical_Combining_Class: 10} \p{Canonical_Combining_Class=
CCC10} (1)
T \p{Canonical_Combining_Class: 11} \p{Canonical_Combining_Class=
CCC11} (1)
T \p{Canonical_Combining_Class: 12} \p{Canonical_Combining_Class=
CCC12} (1)
T \p{Canonical_Combining_Class: 13} \p{Canonical_Combining_Class=
CCC13} (1)
T \p{Canonical_Combining_Class: 14} \p{Canonical_Combining_Class=
CCC14} (1)
T \p{Canonical_Combining_Class: 15} \p{Canonical_Combining_Class=
CCC15} (1)
T \p{Canonical_Combining_Class: 16} \p{Canonical_Combining_Class=
CCC16} (1)
T \p{Canonical_Combining_Class: 17} \p{Canonical_Combining_Class=
CCC17} (1)
T \p{Canonical_Combining_Class: 18} \p{Canonical_Combining_Class=
CCC18} (2)
T \p{Canonical_Combining_Class: 19} \p{Canonical_Combining_Class=
CCC19} (2)
T \p{Canonical_Combining_Class: 20} \p{Canonical_Combining_Class=
CCC20} (1)
T \p{Canonical_Combining_Class: 21} \p{Canonical_Combining_Class=
CCC21} (1)
T \p{Canonical_Combining_Class: 22} \p{Canonical_Combining_Class=
CCC22} (1)
T \p{Canonical_Combining_Class: 23} \p{Canonical_Combining_Class=
CCC23} (1)
T \p{Canonical_Combining_Class: 24} \p{Canonical_Combining_Class=
CCC24} (1)
T \p{Canonical_Combining_Class: 25} \p{Canonical_Combining_Class=
CCC25} (1)
T \p{Canonical_Combining_Class: 26} \p{Canonical_Combining_Class=
CCC26} (1)
T \p{Canonical_Combining_Class: 27} \p{Canonical_Combining_Class=
CCC27} (2)
T \p{Canonical_Combining_Class: 28} \p{Canonical_Combining_Class=
CCC28} (2)
T \p{Canonical_Combining_Class: 29} \p{Canonical_Combining_Class=
CCC29} (2)
T \p{Canonical_Combining_Class: 30} \p{Canonical_Combining_Class=
CCC30} (2)
T \p{Canonical_Combining_Class: 31} \p{Canonical_Combining_Class=
CCC31} (2)
T \p{Canonical_Combining_Class: 32} \p{Canonical_Combining_Class=
CCC32} (2)
T \p{Canonical_Combining_Class: 33} \p{Canonical_Combining_Class=
CCC33} (1)
T \p{Canonical_Combining_Class: 34} \p{Canonical_Combining_Class=
CCC34} (1)
T \p{Canonical_Combining_Class: 35} \p{Canonical_Combining_Class=
CCC35} (1)
T \p{Canonical_Combining_Class: 36} \p{Canonical_Combining_Class=
CCC36} (1)
T \p{Canonical_Combining_Class: 84} \p{Canonical_Combining_Class=
CCC84} (1)
T \p{Canonical_Combining_Class: 91} \p{Canonical_Combining_Class=
CCC91} (1)
T \p{Canonical_Combining_Class: 103} \p{Canonical_Combining_Class=
CCC103} (2)
T \p{Canonical_Combining_Class: 107} \p{Canonical_Combining_Class=
CCC107} (4)
T \p{Canonical_Combining_Class: 118} \p{Canonical_Combining_Class=
CCC118} (2)
T \p{Canonical_Combining_Class: 122} \p{Canonical_Combining_Class=
CCC122} (4)
T \p{Canonical_Combining_Class: 129} \p{Canonical_Combining_Class=
CCC129} (1)
T \p{Canonical_Combining_Class: 130} \p{Canonical_Combining_Class=
CCC130} (6)
T \p{Canonical_Combining_Class: 132} \p{Canonical_Combining_Class=
CCC132} (1)
T \p{Canonical_Combining_Class: 133} \p{Canonical_Combining_Class=
CCC133} (0)
T \p{Canonical_Combining_Class: 200} \p{Canonical_Combining_Class=
Attached_Below_Left} (0)
T \p{Canonical_Combining_Class: 202} \p{Canonical_Combining_Class=
Attached_Below} (5)
T \p{Canonical_Combining_Class: 214} \p{Canonical_Combining_Class=
Attached_Above} (1)
T \p{Canonical_Combining_Class: 216} \p{Canonical_Combining_Class=
Attached_Above_Right} (9)
T \p{Canonical_Combining_Class: 218} \p{Canonical_Combining_Class=
Below_Left} (1)
T \p{Canonical_Combining_Class: 220} \p{Canonical_Combining_Class=
Below} (153)
T \p{Canonical_Combining_Class: 222} \p{Canonical_Combining_Class=
Below_Right} (4)
T \p{Canonical_Combining_Class: 224} \p{Canonical_Combining_Class=
Left} (2)
T \p{Canonical_Combining_Class: 226} \p{Canonical_Combining_Class=
Right} (1)
T \p{Canonical_Combining_Class: 228} \p{Canonical_Combining_Class=
Above_Left} (3)
T \p{Canonical_Combining_Class: 230} \p{Canonical_Combining_Class=
Above} (461)
T \p{Canonical_Combining_Class: 232} \p{Canonical_Combining_Class=
Above_Right} (4)
T \p{Canonical_Combining_Class: 233} \p{Canonical_Combining_Class=
Double_Below} (4)
T \p{Canonical_Combining_Class: 234} \p{Canonical_Combining_Class=
Double_Above} (5)
T \p{Canonical_Combining_Class: 240} \p{Canonical_Combining_Class=
Iota_Subscript} (1)
\p{Canonical_Combining_Class: A} \p{Canonical_Combining_Class=
Above} (461)
\p{Canonical_Combining_Class: Above} (Short: \p{Ccc=A}) (461)
\p{Canonical_Combining_Class: Above_Left} (Short: \p{Ccc=AL}) (3)
\p{Canonical_Combining_Class: Above_Right} (Short: \p{Ccc=AR}) (4)
\p{Canonical_Combining_Class: AL} \p{Canonical_Combining_Class=
Above_Left} (3)
\p{Canonical_Combining_Class: AR} \p{Canonical_Combining_Class=
Above_Right} (4)
\p{Canonical_Combining_Class: ATA} \p{Canonical_Combining_Class=
Attached_Above} (1)
\p{Canonical_Combining_Class: ATAR} \p{Canonical_Combining_Class=
Attached_Above_Right} (9)
\p{Canonical_Combining_Class: ATB} \p{Canonical_Combining_Class=
Attached_Below} (5)
\p{Canonical_Combining_Class: ATBL} \p{Canonical_Combining_Class=
Attached_Below_Left} (0)
\p{Canonical_Combining_Class: Attached_Above} (Short: \p{Ccc=ATA})
(1)
\p{Canonical_Combining_Class: Attached_Above_Right} (Short:
\p{Ccc=ATAR}) (9)
\p{Canonical_Combining_Class: Attached_Below} (Short: \p{Ccc=ATB})
(5)
\p{Canonical_Combining_Class: Attached_Below_Left} (Short: \p{Ccc=
ATBL}) (0)
\p{Canonical_Combining_Class: B} \p{Canonical_Combining_Class=
Below} (153)
\p{Canonical_Combining_Class: Below} (Short: \p{Ccc=B}) (153)
\p{Canonical_Combining_Class: Below_Left} (Short: \p{Ccc=BL}) (1)
\p{Canonical_Combining_Class: Below_Right} (Short: \p{Ccc=BR}) (4)
\p{Canonical_Combining_Class: BL} \p{Canonical_Combining_Class=
Below_Left} (1)
\p{Canonical_Combining_Class: BR} \p{Canonical_Combining_Class=
Below_Right} (4)
\p{Canonical_Combining_Class: CCC10} (Short: \p{Ccc=CCC10}) (1)
\p{Canonical_Combining_Class: CCC103} (Short: \p{Ccc=CCC103}) (2)
\p{Canonical_Combining_Class: CCC107} (Short: \p{Ccc=CCC107}) (4)
\p{Canonical_Combining_Class: CCC11} (Short: \p{Ccc=CCC11}) (1)
\p{Canonical_Combining_Class: CCC118} (Short: \p{Ccc=CCC118}) (2)
\p{Canonical_Combining_Class: CCC12} (Short: \p{Ccc=CCC12}) (1)
\p{Canonical_Combining_Class: CCC122} (Short: \p{Ccc=CCC122}) (4)
\p{Canonical_Combining_Class: CCC129} (Short: \p{Ccc=CCC129}) (1)
\p{Canonical_Combining_Class: CCC13} (Short: \p{Ccc=CCC13}) (1)
\p{Canonical_Combining_Class: CCC130} (Short: \p{Ccc=CCC130}) (6)
\p{Canonical_Combining_Class: CCC132} (Short: \p{Ccc=CCC132}) (1)
\p{Canonical_Combining_Class: CCC133} (Short: \p{Ccc=CCC133}) (0)
\p{Canonical_Combining_Class: CCC14} (Short: \p{Ccc=CCC14}) (1)
\p{Canonical_Combining_Class: CCC15} (Short: \p{Ccc=CCC15}) (1)
\p{Canonical_Combining_Class: CCC16} (Short: \p{Ccc=CCC16}) (1)
\p{Canonical_Combining_Class: CCC17} (Short: \p{Ccc=CCC17}) (1)
\p{Canonical_Combining_Class: CCC18} (Short: \p{Ccc=CCC18}) (2)
\p{Canonical_Combining_Class: CCC19} (Short: \p{Ccc=CCC19}) (2)
\p{Canonical_Combining_Class: CCC20} (Short: \p{Ccc=CCC20}) (1)
\p{Canonical_Combining_Class: CCC21} (Short: \p{Ccc=CCC21}) (1)
\p{Canonical_Combining_Class: CCC22} (Short: \p{Ccc=CCC22}) (1)
\p{Canonical_Combining_Class: CCC23} (Short: \p{Ccc=CCC23}) (1)
\p{Canonical_Combining_Class: CCC24} (Short: \p{Ccc=CCC24}) (1)
\p{Canonical_Combining_Class: CCC25} (Short: \p{Ccc=CCC25}) (1)
\p{Canonical_Combining_Class: CCC26} (Short: \p{Ccc=CCC26}) (1)
\p{Canonical_Combining_Class: CCC27} (Short: \p{Ccc=CCC27}) (2)
\p{Canonical_Combining_Class: CCC28} (Short: \p{Ccc=CCC28}) (2)
\p{Canonical_Combining_Class: CCC29} (Short: \p{Ccc=CCC29}) (2)
\p{Canonical_Combining_Class: CCC30} (Short: \p{Ccc=CCC30}) (2)
\p{Canonical_Combining_Class: CCC31} (Short: \p{Ccc=CCC31}) (2)
\p{Canonical_Combining_Class: CCC32} (Short: \p{Ccc=CCC32}) (2)
\p{Canonical_Combining_Class: CCC33} (Short: \p{Ccc=CCC33}) (1)
\p{Canonical_Combining_Class: CCC34} (Short: \p{Ccc=CCC34}) (1)
\p{Canonical_Combining_Class: CCC35} (Short: \p{Ccc=CCC35}) (1)
\p{Canonical_Combining_Class: CCC36} (Short: \p{Ccc=CCC36}) (1)
\p{Canonical_Combining_Class: CCC84} (Short: \p{Ccc=CCC84}) (1)
\p{Canonical_Combining_Class: CCC91} (Short: \p{Ccc=CCC91}) (1)
\p{Canonical_Combining_Class: DA} \p{Canonical_Combining_Class=
Double_Above} (5)
\p{Canonical_Combining_Class: DB} \p{Canonical_Combining_Class=
Double_Below} (4)
\p{Canonical_Combining_Class: Double_Above} (Short: \p{Ccc=DA}) (5)
\p{Canonical_Combining_Class: Double_Below} (Short: \p{Ccc=DB}) (4)
\p{Canonical_Combining_Class: Iota_Subscript} (Short: \p{Ccc=IS})
(1)
\p{Canonical_Combining_Class: IS} \p{Canonical_Combining_Class=
Iota_Subscript} (1)
\p{Canonical_Combining_Class: Kana_Voicing} (Short: \p{Ccc=KV}) (2)
\p{Canonical_Combining_Class: KV} \p{Canonical_Combining_Class=
Kana_Voicing} (2)
\p{Canonical_Combining_Class: L} \p{Canonical_Combining_Class=
Left} (2)
\p{Canonical_Combining_Class: Left} (Short: \p{Ccc=L}) (2)
\p{Canonical_Combining_Class: NK} \p{Canonical_Combining_Class=
Nukta} (22)
\p{Canonical_Combining_Class: Not_Reordered} (Short: \p{Ccc=NR})
(1_113_298 plus all above-Unicode code
points)
\p{Canonical_Combining_Class: NR} \p{Canonical_Combining_Class=
Not_Reordered} (1_113_298 plus all
above-Unicode code points)
\p{Canonical_Combining_Class: Nukta} (Short: \p{Ccc=NK}) (22)
\p{Canonical_Combining_Class: OV} \p{Canonical_Combining_Class=
Overlay} (32)
\p{Canonical_Combining_Class: Overlay} (Short: \p{Ccc=OV}) (32)
\p{Canonical_Combining_Class: R} \p{Canonical_Combining_Class=
Right} (1)
\p{Canonical_Combining_Class: Right} (Short: \p{Ccc=R}) (1)
\p{Canonical_Combining_Class: Virama} (Short: \p{Ccc=VR}) (47)
\p{Canonical_Combining_Class: VR} \p{Canonical_Combining_Class=
Virama} (47)
\p{Cans} \p{Canadian_Aboriginal} (=
\p{Script_Extensions=
Canadian_Aboriginal}) (710)
\p{Cari} \p{Carian} (= \p{Script_Extensions=
Carian}) (NOT \p{Block=Carian}) (49)
\p{Carian} \p{Script_Extensions=Carian} (Short:
\p{Cari}; NOT \p{Block=Carian}) (49)
\p{Case_Ignorable} \p{Case_Ignorable=Y} (Short: \p{CI}) (2240)
\p{Case_Ignorable: N*} (Short: \p{CI=N}, \P{CI}) (1_111_872 plus
all above-Unicode code points)
\p{Case_Ignorable: Y*} (Short: \p{CI=Y}, \p{CI}) (2240)
\p{Cased} \p{Cased=Y} (4105)
\p{Cased: N*} (Single: \P{Cased}) (1_110_007 plus all
above-Unicode code points)
\p{Cased: Y*} (Single: \p{Cased}) (4105)
\p{Cased_Letter} \p{General_Category=Cased_Letter} (Short:
\p{LC}) (3796)
\p{Category: *} \p{General_Category: *}
\p{Caucasian_Albanian} \p{Script_Extensions=Caucasian_Albanian}
(Short: \p{Aghb}; NOT \p{Block=
Caucasian_Albanian}) (53)
\p{Cc} \p{XPosixCntrl} (= \p{General_Category=
Control}) (65)
\p{Ccc: *} \p{Canonical_Combining_Class: *}
\p{CE} \p{Composition_Exclusion} (=
\p{Composition_Exclusion=Y}) (81)
\p{CE: *} \p{Composition_Exclusion: *}
\p{Cf} \p{Format} (= \p{General_Category=Format})
(151)
\p{Chakma} \p{Script_Extensions=Chakma} (Short:
\p{Cakm}; NOT \p{Block=Chakma}) (87)
\p{Cham} \p{Script_Extensions=Cham} (NOT \p{Block=
Cham}) (83)
\p{Changes_When_Casefolded} \p{Changes_When_Casefolded=Y} (Short:
\p{CWCF}) (1377)
\p{Changes_When_Casefolded: N*} (Short: \p{CWCF=N}, \P{CWCF})
(1_112_735 plus all above-Unicode code
points)
\p{Changes_When_Casefolded: Y*} (Short: \p{CWCF=Y}, \p{CWCF})
(1377)
\p{Changes_When_Casemapped} \p{Changes_When_Casemapped=Y} (Short:
\p{CWCM}) (2669)
\p{Changes_When_Casemapped: N*} (Short: \p{CWCM=N}, \P{CWCM})
(1_111_443 plus all above-Unicode code
points)
\p{Changes_When_Casemapped: Y*} (Short: \p{CWCM=Y}, \p{CWCM})
(2669)
\p{Changes_When_Lowercased} \p{Changes_When_Lowercased=Y} (Short:
\p{CWL}) (1304)
\p{Changes_When_Lowercased: N*} (Short: \p{CWL=N}, \P{CWL})
(1_112_808 plus all above-Unicode code
points)
\p{Changes_When_Lowercased: Y*} (Short: \p{CWL=Y}, \p{CWL}) (1304)
\p{Changes_When_NFKC_Casefolded} \p{Changes_When_NFKC_Casefolded=
Y} (Short: \p{CWKCF}) (10_227)
\p{Changes_When_NFKC_Casefolded: N*} (Short: \p{CWKCF=N},
\P{CWKCF}) (1_103_885 plus all above-
Unicode code points)
\p{Changes_When_NFKC_Casefolded: Y*} (Short: \p{CWKCF=Y},
\p{CWKCF}) (10_227)
\p{Changes_When_Titlecased} \p{Changes_When_Titlecased=Y} (Short:
\p{CWT}) (1369)
\p{Changes_When_Titlecased: N*} (Short: \p{CWT=N}, \P{CWT})
(1_112_743 plus all above-Unicode code
points)
\p{Changes_When_Titlecased: Y*} (Short: \p{CWT=Y}, \p{CWT}) (1369)
\p{Changes_When_Uppercased} \p{Changes_When_Uppercased=Y} (Short:
\p{CWU}) (1396)
\p{Changes_When_Uppercased: N*} (Short: \p{CWU=N}, \P{CWU})
(1_112_716 plus all above-Unicode code
points)
\p{Changes_When_Uppercased: Y*} (Short: \p{CWU=Y}, \p{CWU}) (1396)
\p{Cher} \p{Cherokee} (= \p{Script_Extensions=
Cherokee}) (NOT \p{Block=Cherokee}) (172)
\p{Cherokee} \p{Script_Extensions=Cherokee} (Short:
\p{Cher}; NOT \p{Block=Cherokee}) (172)
X \p{Cherokee_Sup} \p{Cherokee_Supplement} (= \p{Block=
Cherokee_Supplement}) (80)
X \p{Cherokee_Supplement} \p{Block=Cherokee_Supplement} (Short:
\p{InCherokeeSup}) (80)
\p{CI} \p{Case_Ignorable} (= \p{Case_Ignorable=
Y}) (2240)
\p{CI: *} \p{Case_Ignorable: *}
X \p{CJK} \p{CJK_Unified_Ideographs} (= \p{Block=
CJK_Unified_Ideographs}) (20_992)
X \p{CJK_Compat} \p{CJK_Compatibility} (= \p{Block=
CJK_Compatibility}) (256)
X \p{CJK_Compat_Forms} \p{CJK_Compatibility_Forms} (= \p{Block=
CJK_Compatibility_Forms}) (32)
X \p{CJK_Compat_Ideographs} \p{CJK_Compatibility_Ideographs} (=
\p{Block=CJK_Compatibility_Ideographs})
(512)
X \p{CJK_Compat_Ideographs_Sup}
\p{CJK_Compatibility_Ideographs_Supplement} (= \p{Block=
CJK_Compatibility_Ideographs_Supplement}) (544)
X \p{CJK_Compatibility} \p{Block=CJK_Compatibility} (Short:
\p{InCJKCompat}) (256)
X \p{CJK_Compatibility_Forms} \p{Block=CJK_Compatibility_Forms}
(Short: \p{InCJKCompatForms}) (32)
X \p{CJK_Compatibility_Ideographs} \p{Block=
CJK_Compatibility_Ideographs} (Short:
\p{InCJKCompatIdeographs}) (512)
X \p{CJK_Compatibility_Ideographs_Supplement} \p{Block=
CJK_Compatibility_Ideographs_Supplement}
(Short: \p{InCJKCompatIdeographsSup})
(544)
X \p{CJK_Ext_A} \p{CJK_Unified_Ideographs_Extension_A} (=
\p{Block=
CJK_Unified_Ideographs_Extension_A})
(6592)
X \p{CJK_Ext_B} \p{CJK_Unified_Ideographs_Extension_B} (=
\p{Block=
CJK_Unified_Ideographs_Extension_B})
(42_720)
X \p{CJK_Ext_C} \p{CJK_Unified_Ideographs_Extension_C} (=
\p{Block=
CJK_Unified_Ideographs_Extension_C})
(4160)
X \p{CJK_Ext_D} \p{CJK_Unified_Ideographs_Extension_D} (=
\p{Block=
CJK_Unified_Ideographs_Extension_D})
(224)
X \p{CJK_Ext_E} \p{CJK_Unified_Ideographs_Extension_E} (=
\p{Block=
CJK_Unified_Ideographs_Extension_E})
(5776)
X \p{CJK_Radicals_Sup} \p{CJK_Radicals_Supplement} (= \p{Block=
CJK_Radicals_Supplement}) (128)
X \p{CJK_Radicals_Supplement} \p{Block=CJK_Radicals_Supplement}
(Short: \p{InCJKRadicalsSup}) (128)
X \p{CJK_Strokes} \p{Block=CJK_Strokes} (48)
X \p{CJK_Symbols} \p{CJK_Symbols_And_Punctuation} (=
\p{Block=CJK_Symbols_And_Punctuation})
(64)
X \p{CJK_Symbols_And_Punctuation} \p{Block=
CJK_Symbols_And_Punctuation} (Short:
\p{InCJKSymbols}) (64)
X \p{CJK_Unified_Ideographs} \p{Block=CJK_Unified_Ideographs}
(Short: \p{InCJK}) (20_992)
X \p{CJK_Unified_Ideographs_Extension_A} \p{Block=
CJK_Unified_Ideographs_Extension_A}
(Short: \p{InCJKExtA}) (6592)
X \p{CJK_Unified_Ideographs_Extension_B} \p{Block=
CJK_Unified_Ideographs_Extension_B}
(Short: \p{InCJKExtB}) (42_720)
X \p{CJK_Unified_Ideographs_Extension_C} \p{Block=
CJK_Unified_Ideographs_Extension_C}
(Short: \p{InCJKExtC}) (4160)
X \p{CJK_Unified_Ideographs_Extension_D} \p{Block=
CJK_Unified_Ideographs_Extension_D}
(Short: \p{InCJKExtD}) (224)
X \p{CJK_Unified_Ideographs_Extension_E} \p{Block=
CJK_Unified_Ideographs_Extension_E}
(Short: \p{InCJKExtE}) (5776)
\p{Close_Punctuation} \p{General_Category=Close_Punctuation}
(Short: \p{Pe}) (73)
\p{Cn} \p{Unassigned} (= \p{General_Category=
Unassigned}) (846_359 plus all above-
Unicode code points)
\p{Cntrl} \p{XPosixCntrl} (= \p{General_Category=
Control}) (65)
\p{Co} \p{Private_Use} (= \p{General_Category=
Private_Use}) (NOT \p{Private_Use_Area})
(137_468)
X \p{Combining_Diacritical_Marks} \p{Block=
Combining_Diacritical_Marks} (Short:
\p{InDiacriticals}) (112)
X \p{Combining_Diacritical_Marks_Extended} \p{Block=
Combining_Diacritical_Marks_Extended}
(Short: \p{InDiacriticalsExt}) (80)
X \p{Combining_Diacritical_Marks_For_Symbols} \p{Block=
Combining_Diacritical_Marks_For_Symbols}
(Short: \p{InDiacriticalsForSymbols})
(48)
X \p{Combining_Diacritical_Marks_Supplement} \p{Block=
Combining_Diacritical_Marks_Supplement}
(Short: \p{InDiacriticalsSup}) (64)
X \p{Combining_Half_Marks} \p{Block=Combining_Half_Marks} (Short:
\p{InHalfMarks}) (16)
\p{Combining_Mark} \p{Mark} (= \p{General_Category=Mark})
(2097)
X \p{Combining_Marks_For_Symbols}
\p{Combining_Diacritical_Marks_For_Symbols} (= \p{Block=
Combining_Diacritical_Marks_For_Symbols}) (48)
\p{Common} \p{Script_Extensions=Common} (Short:
\p{Zyyy}) (6864)
X \p{Common_Indic_Number_Forms} \p{Block=Common_Indic_Number_Forms}
(Short: \p{InIndicNumberForms}) (16)
\p{Comp_Ex} \p{Full_Composition_Exclusion} (=
\p{Full_Composition_Exclusion=Y}) (1120)
\p{Comp_Ex: *} \p{Full_Composition_Exclusion: *}
X \p{Compat_Jamo} \p{Hangul_Compatibility_Jamo} (= \p{Block=
Hangul_Compatibility_Jamo}) (96)
\p{Composition_Exclusion} \p{Composition_Exclusion=Y} (Short:
\p{CE}) (81)
\p{Composition_Exclusion: N*} (Short: \p{CE=N}, \P{CE}) (1_114_031
plus all above-Unicode code points)
\p{Composition_Exclusion: Y*} (Short: \p{CE=Y}, \p{CE}) (81)
\p{Connector_Punctuation} \p{General_Category=
Connector_Punctuation} (Short: \p{Pc})
(10)
\p{Control} \p{XPosixCntrl} (= \p{General_Category=
Control}) (65)
X \p{Control_Pictures} \p{Block=Control_Pictures} (64)
\p{Copt} \p{Coptic} (= \p{Script_Extensions=
Coptic}) (NOT \p{Block=Coptic}) (165)
\p{Coptic} \p{Script_Extensions=Coptic} (Short:
\p{Copt}; NOT \p{Block=Coptic}) (165)
X \p{Coptic_Epact_Numbers} \p{Block=Coptic_Epact_Numbers} (32)
X \p{Counting_Rod} \p{Counting_Rod_Numerals} (= \p{Block=
Counting_Rod_Numerals}) (32)
X \p{Counting_Rod_Numerals} \p{Block=Counting_Rod_Numerals} (Short:
\p{InCountingRod}) (32)
\p{Cprt} \p{Cypriot} (= \p{Script_Extensions=
Cypriot}) (112)
\p{Cs} \p{Surrogate} (= \p{General_Category=
Surrogate}) (2048)
\p{Cuneiform} \p{Script_Extensions=Cuneiform} (Short:
\p{Xsux}; NOT \p{Block=Cuneiform}) (1234)
X \p{Cuneiform_Numbers} \p{Cuneiform_Numbers_And_Punctuation} (=
\p{Block=
Cuneiform_Numbers_And_Punctuation}) (128)
X \p{Cuneiform_Numbers_And_Punctuation} \p{Block=
Cuneiform_Numbers_And_Punctuation}
(Short: \p{InCuneiformNumbers}) (128)
\p{Currency_Symbol} \p{General_Category=Currency_Symbol}
(Short: \p{Sc}) (53)
X \p{Currency_Symbols} \p{Block=Currency_Symbols} (48)
\p{CWCF} \p{Changes_When_Casefolded} (=
\p{Changes_When_Casefolded=Y}) (1377)
\p{CWCF: *} \p{Changes_When_Casefolded: *}
\p{CWCM} \p{Changes_When_Casemapped} (=
\p{Changes_When_Casemapped=Y}) (2669)
\p{CWCM: *} \p{Changes_When_Casemapped: *}
\p{CWKCF} \p{Changes_When_NFKC_Casefolded} (=
\p{Changes_When_NFKC_Casefolded=Y})
(10_227)
\p{CWKCF: *} \p{Changes_When_NFKC_Casefolded: *}
\p{CWL} \p{Changes_When_Lowercased} (=
\p{Changes_When_Lowercased=Y}) (1304)
\p{CWL: *} \p{Changes_When_Lowercased: *}
\p{CWT} \p{Changes_When_Titlecased} (=
\p{Changes_When_Titlecased=Y}) (1369)
\p{CWT: *} \p{Changes_When_Titlecased: *}
\p{CWU} \p{Changes_When_Uppercased} (=
\p{Changes_When_Uppercased=Y}) (1396)
\p{CWU: *} \p{Changes_When_Uppercased: *}
\p{Cypriot} \p{Script_Extensions=Cypriot} (Short:
\p{Cprt}) (112)
X \p{Cypriot_Syllabary} \p{Block=Cypriot_Syllabary} (64)
\p{Cyrillic} \p{Script_Extensions=Cyrillic} (Short:
\p{Cyrl}; NOT \p{Block=Cyrillic}) (446)
X \p{Cyrillic_Ext_A} \p{Cyrillic_Extended_A} (= \p{Block=
Cyrillic_Extended_A}) (32)
X \p{Cyrillic_Ext_B} \p{Cyrillic_Extended_B} (= \p{Block=
Cyrillic_Extended_B}) (96)
X \p{Cyrillic_Ext_C} \p{Cyrillic_Extended_C} (= \p{Block=
Cyrillic_Extended_C}) (16)
X \p{Cyrillic_Extended_A} \p{Block=Cyrillic_Extended_A} (Short:
\p{InCyrillicExtA}) (32)
X \p{Cyrillic_Extended_B} \p{Block=Cyrillic_Extended_B} (Short:
\p{InCyrillicExtB}) (96)
X \p{Cyrillic_Extended_C} \p{Block=Cyrillic_Extended_C} (Short:
\p{InCyrillicExtC}) (16)
X \p{Cyrillic_Sup} \p{Cyrillic_Supplement} (= \p{Block=
Cyrillic_Supplement}) (48)
X \p{Cyrillic_Supplement} \p{Block=Cyrillic_Supplement} (Short:
\p{InCyrillicSup}) (48)
X \p{Cyrillic_Supplementary} \p{Cyrillic_Supplement} (= \p{Block=
Cyrillic_Supplement}) (48)
\p{Cyrl} \p{Cyrillic} (= \p{Script_Extensions=
Cyrillic}) (NOT \p{Block=Cyrillic}) (446)
\p{Dash} \p{Dash=Y} (28)
\p{Dash: N*} (Single: \P{Dash}) (1_114_084 plus all
above-Unicode code points)
\p{Dash: Y*} (Single: \p{Dash}) (28)
\p{Dash_Punctuation} \p{General_Category=Dash_Punctuation}
(Short: \p{Pd}) (24)
\p{Decimal_Number} \p{XPosixDigit} (= \p{General_Category=
Decimal_Number}) (580)
\p{Decomposition_Type: Can} \p{Decomposition_Type=Canonical}
(13_232)
\p{Decomposition_Type: Canonical} (Short: \p{Dt=Can}) (13_232)
\p{Decomposition_Type: Circle} (Short: \p{Dt=Enc}) (240)
\p{Decomposition_Type: Com} \p{Decomposition_Type=Compat} (720)
\p{Decomposition_Type: Compat} (Short: \p{Dt=Com}) (720)
\p{Decomposition_Type: Enc} \p{Decomposition_Type=Circle} (240)
\p{Decomposition_Type: Fin} \p{Decomposition_Type=Final} (240)
\p{Decomposition_Type: Final} (Short: \p{Dt=Fin}) (240)
\p{Decomposition_Type: Font} (Short: \p{Dt=Font}) (1184)
\p{Decomposition_Type: Fra} \p{Decomposition_Type=Fraction} (20)
\p{Decomposition_Type: Fraction} (Short: \p{Dt=Fra}) (20)
\p{Decomposition_Type: Init} \p{Decomposition_Type=Initial} (171)
\p{Decomposition_Type: Initial} (Short: \p{Dt=Init}) (171)
\p{Decomposition_Type: Iso} \p{Decomposition_Type=Isolated} (238)
\p{Decomposition_Type: Isolated} (Short: \p{Dt=Iso}) (238)
\p{Decomposition_Type: Med} \p{Decomposition_Type=Medial} (82)
\p{Decomposition_Type: Medial} (Short: \p{Dt=Med}) (82)
\p{Decomposition_Type: Nar} \p{Decomposition_Type=Narrow} (122)
\p{Decomposition_Type: Narrow} (Short: \p{Dt=Nar}) (122)
\p{Decomposition_Type: Nb} \p{Decomposition_Type=Nobreak} (5)
\p{Decomposition_Type: Nobreak} (Short: \p{Dt=Nb}) (5)
\p{Decomposition_Type: Non_Canon} \p{Decomposition_Type=
Non_Canonical} (Perl extension) (3662)
\p{Decomposition_Type: Non_Canonical} Union of all non-canonical
decompositions (Short: \p{Dt=NonCanon})
(Perl extension) (3662)
\p{Decomposition_Type: None} (Short: \p{Dt=None}) (1_097_218 plus
all above-Unicode code points)
\p{Decomposition_Type: Small} (Short: \p{Dt=Sml}) (26)
\p{Decomposition_Type: Sml} \p{Decomposition_Type=Small} (26)
\p{Decomposition_Type: Sqr} \p{Decomposition_Type=Square} (285)
\p{Decomposition_Type: Square} (Short: \p{Dt=Sqr}) (285)
\p{Decomposition_Type: Sub} (Short: \p{Dt=Sub}) (38)
\p{Decomposition_Type: Sup} \p{Decomposition_Type=Super} (152)
\p{Decomposition_Type: Super} (Short: \p{Dt=Sup}) (152)
\p{Decomposition_Type: Vert} \p{Decomposition_Type=Vertical} (35)
\p{Decomposition_Type: Vertical} (Short: \p{Dt=Vert}) (35)
\p{Decomposition_Type: Wide} (Short: \p{Dt=Wide}) (104)
\p{Default_Ignorable_Code_Point} \p{Default_Ignorable_Code_Point=
Y} (Short: \p{DI}) (4173)
\p{Default_Ignorable_Code_Point: N*} (Short: \p{DI=N}, \P{DI})
(1_109_939 plus all above-Unicode code
points)
\p{Default_Ignorable_Code_Point: Y*} (Short: \p{DI=Y}, \p{DI})
(4173)
\p{Dep} \p{Deprecated} (= \p{Deprecated=Y}) (15)
\p{Dep: *} \p{Deprecated: *}
\p{Deprecated} \p{Deprecated=Y} (Short: \p{Dep}) (15)
\p{Deprecated: N*} (Short: \p{Dep=N}, \P{Dep}) (1_114_097
plus all above-Unicode code points)
\p{Deprecated: Y*} (Short: \p{Dep=Y}, \p{Dep}) (15)
\p{Deseret} \p{Script_Extensions=Deseret} (Short:
\p{Dsrt}) (80)
\p{Deva} \p{Devanagari} (= \p{Script_Extensions=
Devanagari}) (NOT \p{Block=Devanagari})
(210)
\p{Devanagari} \p{Script_Extensions=Devanagari} (Short:
\p{Deva}; NOT \p{Block=Devanagari}) (210)
X \p{Devanagari_Ext} \p{Devanagari_Extended} (= \p{Block=
Devanagari_Extended}) (32)
X \p{Devanagari_Extended} \p{Block=Devanagari_Extended} (Short:
\p{InDevanagariExt}) (32)
\p{DI} \p{Default_Ignorable_Code_Point} (=
\p{Default_Ignorable_Code_Point=Y})
(4173)
\p{DI: *} \p{Default_Ignorable_Code_Point: *}
\p{Dia} \p{Diacritic} (= \p{Diacritic=Y}) (782)
\p{Dia: *} \p{Diacritic: *}
\p{Diacritic} \p{Diacritic=Y} (Short: \p{Dia}) (782)
\p{Diacritic: N*} (Short: \p{Dia=N}, \P{Dia}) (1_113_330
plus all above-Unicode code points)
\p{Diacritic: Y*} (Short: \p{Dia=Y}, \p{Dia}) (782)
X \p{Diacriticals} \p{Combining_Diacritical_Marks} (=
\p{Block=Combining_Diacritical_Marks})
(112)
X \p{Diacriticals_Ext} \p{Combining_Diacritical_Marks_Extended}
(= \p{Block=
Combining_Diacritical_Marks_Extended})
(80)
X \p{Diacriticals_For_Symbols}
\p{Combining_Diacritical_Marks_For_Symbols} (= \p{Block=
Combining_Diacritical_Marks_For_Symbols}) (48)
X \p{Diacriticals_Sup} \p{Combining_Diacritical_Marks_Supplement}
(= \p{Block=
Combining_Diacritical_Marks_Supplement})
(64)
\p{Digit} \p{XPosixDigit} (= \p{General_Category=
Decimal_Number}) (580)
X \p{Dingbats} \p{Block=Dingbats} (192)
X \p{Domino} \p{Domino_Tiles} (= \p{Block=
Domino_Tiles}) (112)
X \p{Domino_Tiles} \p{Block=Domino_Tiles} (Short:
\p{InDomino}) (112)
\p{Dsrt} \p{Deseret} (= \p{Script_Extensions=
Deseret}) (80)
\p{Dt: *} \p{Decomposition_Type: *}
\p{Dupl} \p{Duployan} (= \p{Script_Extensions=
Duployan}) (NOT \p{Block=Duployan}) (147)
\p{Duployan} \p{Script_Extensions=Duployan} (Short:
\p{Dupl}; NOT \p{Block=Duployan}) (147)
\p{Ea: *} \p{East_Asian_Width: *}
X \p{Early_Dynastic_Cuneiform} \p{Block=Early_Dynastic_Cuneiform}
(208)
\p{East_Asian_Width: A} \p{East_Asian_Width=Ambiguous} (138_739)
\p{East_Asian_Width: Ambiguous} (Short: \p{Ea=A}) (138_739)
\p{East_Asian_Width: F} \p{East_Asian_Width=Fullwidth} (104)
\p{East_Asian_Width: Fullwidth} (Short: \p{Ea=F}) (104)
\p{East_Asian_Width: H} \p{East_Asian_Width=Halfwidth} (123)
\p{East_Asian_Width: Halfwidth} (Short: \p{Ea=H}) (123)
\p{East_Asian_Width: N} \p{East_Asian_Width=Neutral} (794_146 plus
all above-Unicode code points)
\p{East_Asian_Width: Na} \p{East_Asian_Width=Narrow} (111)
\p{East_Asian_Width: Narrow} (Short: \p{Ea=Na}) (111)
\p{East_Asian_Width: Neutral} (Short: \p{Ea=N}) (794_146 plus all
above-Unicode code points)
\p{East_Asian_Width: W} \p{East_Asian_Width=Wide} (180_889)
\p{East_Asian_Width: Wide} (Short: \p{Ea=W}) (180_889)
\p{Egyp} \p{Egyptian_Hieroglyphs} (=
\p{Script_Extensions=
Egyptian_Hieroglyphs}) (NOT \p{Block=
Egyptian_Hieroglyphs}) (1071)
\p{Egyptian_Hieroglyphs} \p{Script_Extensions=
Egyptian_Hieroglyphs} (Short: \p{Egyp};
NOT \p{Block=Egyptian_Hieroglyphs})
(1071)
\p{Elba} \p{Elbasan} (= \p{Script_Extensions=
Elbasan}) (NOT \p{Block=Elbasan}) (40)
\p{Elbasan} \p{Script_Extensions=Elbasan} (Short:
\p{Elba}; NOT \p{Block=Elbasan}) (40)
X \p{Emoticons} \p{Block=Emoticons} (80)
X \p{Enclosed_Alphanum} \p{Enclosed_Alphanumerics} (= \p{Block=
Enclosed_Alphanumerics}) (160)
X \p{Enclosed_Alphanum_Sup} \p{Enclosed_Alphanumeric_Supplement} (=
\p{Block=
Enclosed_Alphanumeric_Supplement}) (256)
X \p{Enclosed_Alphanumeric_Supplement} \p{Block=
Enclosed_Alphanumeric_Supplement}
(Short: \p{InEnclosedAlphanumSup}) (256)
X \p{Enclosed_Alphanumerics} \p{Block=Enclosed_Alphanumerics}
(Short: \p{InEnclosedAlphanum}) (160)
X \p{Enclosed_CJK} \p{Enclosed_CJK_Letters_And_Months} (=
\p{Block=
Enclosed_CJK_Letters_And_Months}) (256)
X \p{Enclosed_CJK_Letters_And_Months} \p{Block=
Enclosed_CJK_Letters_And_Months} (Short:
\p{InEnclosedCJK}) (256)
X \p{Enclosed_Ideographic_Sup} \p{Enclosed_Ideographic_Supplement}
(= \p{Block=
Enclosed_Ideographic_Supplement}) (256)
X \p{Enclosed_Ideographic_Supplement} \p{Block=
Enclosed_Ideographic_Supplement} (Short:
\p{InEnclosedIdeographicSup}) (256)
\p{Enclosing_Mark} \p{General_Category=Enclosing_Mark}
(Short: \p{Me}) (13)
\p{Ethi} \p{Ethiopic} (= \p{Script_Extensions=
Ethiopic}) (NOT \p{Block=Ethiopic}) (495)
\p{Ethiopic} \p{Script_Extensions=Ethiopic} (Short:
\p{Ethi}; NOT \p{Block=Ethiopic}) (495)
X \p{Ethiopic_Ext} \p{Ethiopic_Extended} (= \p{Block=
Ethiopic_Extended}) (96)
X \p{Ethiopic_Ext_A} \p{Ethiopic_Extended_A} (= \p{Block=
Ethiopic_Extended_A}) (48)
X \p{Ethiopic_Extended} \p{Block=Ethiopic_Extended} (Short:
\p{InEthiopicExt}) (96)
X \p{Ethiopic_Extended_A} \p{Block=Ethiopic_Extended_A} (Short:
\p{InEthiopicExtA}) (48)
X \p{Ethiopic_Sup} \p{Ethiopic_Supplement} (= \p{Block=
Ethiopic_Supplement}) (32)
X \p{Ethiopic_Supplement} \p{Block=Ethiopic_Supplement} (Short:
\p{InEthiopicSup}) (32)
\p{Ext} \p{Extender} (= \p{Extender=Y}) (42)
\p{Ext: *} \p{Extender: *}
\p{Extender} \p{Extender=Y} (Short: \p{Ext}) (42)
\p{Extender: N*} (Short: \p{Ext=N}, \P{Ext}) (1_114_070
plus all above-Unicode code points)
\p{Extender: Y*} (Short: \p{Ext=Y}, \p{Ext}) (42)
\p{Final_Punctuation} \p{General_Category=Final_Punctuation}
(Short: \p{Pf}) (10)
\p{Format} \p{General_Category=Format} (Short:
\p{Cf}) (151)
\p{Full_Composition_Exclusion} \p{Full_Composition_Exclusion=Y}
(Short: \p{CompEx}) (1120)
\p{Full_Composition_Exclusion: N*} (Short: \p{CompEx=N},
\P{CompEx}) (1_112_992 plus all above-
Unicode code points)
\p{Full_Composition_Exclusion: Y*} (Short: \p{CompEx=Y},
\p{CompEx}) (1120)
\p{Gc: *} \p{General_Category: *}
\p{GCB: *} \p{Grapheme_Cluster_Break: *}
\p{General_Category: C} \p{General_Category=Other} (986_091 plus
all above-Unicode code points)
\p{General_Category: Cased_Letter} [\p{Ll}\p{Lu}\p{Lt}] (Short:
\p{Gc=LC}, \p{LC}) (3796)
\p{General_Category: Cc} \p{General_Category=Control} (65)
\p{General_Category: Cf} \p{General_Category=Format} (151)
\p{General_Category: Close_Punctuation} (Short: \p{Gc=Pe}, \p{Pe})
(73)
\p{General_Category: Cn} \p{General_Category=Unassigned} (846_359
plus all above-Unicode code points)
\p{General_Category: Cntrl} \p{General_Category=Control} (65)
\p{General_Category: Co} \p{General_Category=Private_Use} (137_468)
\p{General_Category: Combining_Mark} \p{General_Category=Mark}
(2097)
\p{General_Category: Connector_Punctuation} (Short: \p{Gc=Pc},
\p{Pc}) (10)
\p{General_Category: Control} (Short: \p{Gc=Cc}, \p{Cc}) (65)
\p{General_Category: Cs} \p{General_Category=Surrogate} (2048)
\p{General_Category: Currency_Symbol} (Short: \p{Gc=Sc}, \p{Sc})
(53)
\p{General_Category: Dash_Punctuation} (Short: \p{Gc=Pd}, \p{Pd})
(24)
\p{General_Category: Decimal_Number} (Short: \p{Gc=Nd}, \p{Nd})
(580)
\p{General_Category: Digit} \p{General_Category=Decimal_Number}
(580)
\p{General_Category: Enclosing_Mark} (Short: \p{Gc=Me}, \p{Me})
(13)
\p{General_Category: Final_Punctuation} (Short: \p{Gc=Pf}, \p{Pf})
(10)
\p{General_Category: Format} (Short: \p{Gc=Cf}, \p{Cf}) (151)
\p{General_Category: Initial_Punctuation} (Short: \p{Gc=Pi},
\p{Pi}) (12)
\p{General_Category: L} \p{General_Category=Letter} (116_766)
X \p{General_Category: L&} \p{General_Category=Cased_Letter} (3796)
X \p{General_Category: L_} \p{General_Category=Cased_Letter} Note
the trailing '_' matters in spite of
loose matching rules. (3796)
\p{General_Category: LC} \p{General_Category=Cased_Letter} (3796)
\p{General_Category: Letter} (Short: \p{Gc=L}, \p{L}) (116_766)
\p{General_Category: Letter_Number} (Short: \p{Gc=Nl}, \p{Nl})
(236)
\p{General_Category: Line_Separator} (Short: \p{Gc=Zl}, \p{Zl}) (1)
\p{General_Category: Ll} \p{General_Category=Lowercase_Letter}
(/i= General_Category=Cased_Letter)
(2063)
\p{General_Category: Lm} \p{General_Category=Modifier_Letter} (249)
\p{General_Category: Lo} \p{General_Category=Other_Letter}
(112_721)
\p{General_Category: Lowercase_Letter} (Short: \p{Gc=Ll}, \p{Ll};
/i= General_Category=Cased_Letter) (2063)
\p{General_Category: Lt} \p{General_Category=Titlecase_Letter}
(/i= General_Category=Cased_Letter) (31)
\p{General_Category: Lu} \p{General_Category=Uppercase_Letter}
(/i= General_Category=Cased_Letter)
(1702)
\p{General_Category: M} \p{General_Category=Mark} (2097)
\p{General_Category: Mark} (Short: \p{Gc=M}, \p{M}) (2097)
\p{General_Category: Math_Symbol} (Short: \p{Gc=Sm}, \p{Sm}) (948)
\p{General_Category: Mc} \p{General_Category=Spacing_Mark} (394)
\p{General_Category: Me} \p{General_Category=Enclosing_Mark} (13)
\p{General_Category: Mn} \p{General_Category=Nonspacing_Mark}
(1690)
\p{General_Category: Modifier_Letter} (Short: \p{Gc=Lm}, \p{Lm})
(249)
\p{General_Category: Modifier_Symbol} (Short: \p{Gc=Sk}, \p{Sk})
(121)
\p{General_Category: N} \p{General_Category=Number} (1492)
\p{General_Category: Nd} \p{General_Category=Decimal_Number} (580)
\p{General_Category: Nl} \p{General_Category=Letter_Number} (236)
\p{General_Category: No} \p{General_Category=Other_Number} (676)
\p{General_Category: Nonspacing_Mark} (Short: \p{Gc=Mn}, \p{Mn})
(1690)
\p{General_Category: Number} (Short: \p{Gc=N}, \p{N}) (1492)
\p{General_Category: Open_Punctuation} (Short: \p{Gc=Ps}, \p{Ps})
(75)
\p{General_Category: Other} (Short: \p{Gc=C}, \p{C}) (986_091 plus
all above-Unicode code points)
\p{General_Category: Other_Letter} (Short: \p{Gc=Lo}, \p{Lo})
(112_721)
\p{General_Category: Other_Number} (Short: \p{Gc=No}, \p{No}) (676)
\p{General_Category: Other_Punctuation} (Short: \p{Gc=Po}, \p{Po})
(544)
\p{General_Category: Other_Symbol} (Short: \p{Gc=So}, \p{So})
(5777)
\p{General_Category: P} \p{General_Category=Punctuation} (748)
\p{General_Category: Paragraph_Separator} (Short: \p{Gc=Zp},
\p{Zp}) (1)
\p{General_Category: Pc} \p{General_Category=
Connector_Punctuation} (10)
\p{General_Category: Pd} \p{General_Category=Dash_Punctuation} (24)
\p{General_Category: Pe} \p{General_Category=Close_Punctuation}
(73)
\p{General_Category: Pf} \p{General_Category=Final_Punctuation}
(10)
\p{General_Category: Pi} \p{General_Category=Initial_Punctuation}
(12)
\p{General_Category: Po} \p{General_Category=Other_Punctuation}
(544)
\p{General_Category: Private_Use} (Short: \p{Gc=Co}, \p{Co})
(137_468)
\p{General_Category: Ps} \p{General_Category=Open_Punctuation} (75)
\p{General_Category: Punct} \p{General_Category=Punctuation} (748)
\p{General_Category: Punctuation} (Short: \p{Gc=P}, \p{P}) (748)
\p{General_Category: S} \p{General_Category=Symbol} (6899)
\p{General_Category: Sc} \p{General_Category=Currency_Symbol} (53)
\p{General_Category: Separator} (Short: \p{Gc=Z}, \p{Z}) (19)
\p{General_Category: Sk} \p{General_Category=Modifier_Symbol} (121)
\p{General_Category: Sm} \p{General_Category=Math_Symbol} (948)
\p{General_Category: So} \p{General_Category=Other_Symbol} (5777)
\p{General_Category: Space_Separator} (Short: \p{Gc=Zs}, \p{Zs})
(17)
\p{General_Category: Spacing_Mark} (Short: \p{Gc=Mc}, \p{Mc}) (394)
\p{General_Category: Surrogate} (Short: \p{Gc=Cs}, \p{Cs}) (2048)
\p{General_Category: Symbol} (Short: \p{Gc=S}, \p{S}) (6899)
\p{General_Category: Titlecase_Letter} (Short: \p{Gc=Lt}, \p{Lt};
/i= General_Category=Cased_Letter) (31)
\p{General_Category: Unassigned} (Short: \p{Gc=Cn}, \p{Cn})
(846_359 plus all above-Unicode code
points)
\p{General_Category: Uppercase_Letter} (Short: \p{Gc=Lu}, \p{Lu};
/i= General_Category=Cased_Letter) (1702)
\p{General_Category: Z} \p{General_Category=Separator} (19)
\p{General_Category: Zl} \p{General_Category=Line_Separator} (1)
\p{General_Category: Zp} \p{General_Category=Paragraph_Separator}
(1)
\p{General_Category: Zs} \p{General_Category=Space_Separator} (17)
X \p{General_Punctuation} \p{Block=General_Punctuation} (Short:
\p{InPunctuation}) (112)
X \p{Geometric_Shapes} \p{Block=Geometric_Shapes} (96)
X \p{Geometric_Shapes_Ext} \p{Geometric_Shapes_Extended} (=
\p{Block=Geometric_Shapes_Extended})
(128)
X \p{Geometric_Shapes_Extended} \p{Block=Geometric_Shapes_Extended}
(Short: \p{InGeometricShapesExt}) (128)
\p{Geor} \p{Georgian} (= \p{Script_Extensions=
Georgian}) (NOT \p{Block=Georgian}) (129)
\p{Georgian} \p{Script_Extensions=Georgian} (Short:
\p{Geor}; NOT \p{Block=Georgian}) (129)
X \p{Georgian_Sup} \p{Georgian_Supplement} (= \p{Block=
Georgian_Supplement}) (48)
X \p{Georgian_Supplement} \p{Block=Georgian_Supplement} (Short:
\p{InGeorgianSup}) (48)
\p{Glag} \p{Glagolitic} (= \p{Script_Extensions=
Glagolitic}) (NOT \p{Block=Glagolitic})
(136)
\p{Glagolitic} \p{Script_Extensions=Glagolitic} (Short:
\p{Glag}; NOT \p{Block=Glagolitic}) (136)
X \p{Glagolitic_Sup} \p{Glagolitic_Supplement} (= \p{Block=
Glagolitic_Supplement}) (48)
X \p{Glagolitic_Supplement} \p{Block=Glagolitic_Supplement} (Short:
\p{InGlagoliticSup}) (48)
\p{Goth} \p{Gothic} (= \p{Script_Extensions=
Gothic}) (NOT \p{Block=Gothic}) (27)
\p{Gothic} \p{Script_Extensions=Gothic} (Short:
\p{Goth}; NOT \p{Block=Gothic}) (27)
\p{Gr_Base} \p{Grapheme_Base} (= \p{Grapheme_Base=Y})
(126_288)
\p{Gr_Base: *} \p{Grapheme_Base: *}
\p{Gr_Ext} \p{Grapheme_Extend} (= \p{Grapheme_Extend=
Y}) (1828)
\p{Gr_Ext: *} \p{Grapheme_Extend: *}
\p{Gran} \p{Grantha} (= \p{Script_Extensions=
Grantha}) (NOT \p{Block=Grantha}) (113)
\p{Grantha} \p{Script_Extensions=Grantha} (Short:
\p{Gran}; NOT \p{Block=Grantha}) (113)
\p{Graph} \p{XPosixGraph} (265_621)
\p{Grapheme_Base} \p{Grapheme_Base=Y} (Short: \p{GrBase})
(126_288)
\p{Grapheme_Base: N*} (Short: \p{GrBase=N}, \P{GrBase}) (987_824
plus all above-Unicode code points)
\p{Grapheme_Base: Y*} (Short: \p{GrBase=Y}, \p{GrBase}) (126_288)
\p{Grapheme_Cluster_Break: CN} \p{Grapheme_Cluster_Break=Control}
(5925)
\p{Grapheme_Cluster_Break: Control} (Short: \p{GCB=CN}) (5925)
\p{Grapheme_Cluster_Break: CR} (Short: \p{GCB=CR}) (1)
\p{Grapheme_Cluster_Break: E_Base} (Short: \p{GCB=EB}) (79)
\p{Grapheme_Cluster_Break: E_Base_GAZ} (Short: \p{GCB=EBG}) (4)
\p{Grapheme_Cluster_Break: E_Modifier} (Short: \p{GCB=EM}) (5)
\p{Grapheme_Cluster_Break: EB} \p{Grapheme_Cluster_Break=E_Base}
(79)
\p{Grapheme_Cluster_Break: EBG} \p{Grapheme_Cluster_Break=
E_Base_GAZ} (4)
\p{Grapheme_Cluster_Break: EM} \p{Grapheme_Cluster_Break=
E_Modifier} (5)
\p{Grapheme_Cluster_Break: EX} \p{Grapheme_Cluster_Break=Extend}
(1828)
\p{Grapheme_Cluster_Break: Extend} (Short: \p{GCB=EX}) (1828)
\p{Grapheme_Cluster_Break: GAZ} \p{Grapheme_Cluster_Break=
Glue_After_Zwj} (3)
\p{Grapheme_Cluster_Break: Glue_After_Zwj} (Short: \p{GCB=GAZ}) (3)
\p{Grapheme_Cluster_Break: L} (Short: \p{GCB=L}) (125)
\p{Grapheme_Cluster_Break: LF} (Short: \p{GCB=LF}) (1)
\p{Grapheme_Cluster_Break: LV} (Short: \p{GCB=LV}) (399)
\p{Grapheme_Cluster_Break: LVT} (Short: \p{GCB=LVT}) (10_773)
\p{Grapheme_Cluster_Break: Other} (Short: \p{GCB=XX}) (1_094_356
plus all above-Unicode code points)
\p{Grapheme_Cluster_Break: PP} \p{Grapheme_Cluster_Break=Prepend}
(13)
\p{Grapheme_Cluster_Break: Prepend} (Short: \p{GCB=PP}) (13)
\p{Grapheme_Cluster_Break: Regional_Indicator} (Short: \p{GCB=RI})
(26)
\p{Grapheme_Cluster_Break: RI} \p{Grapheme_Cluster_Break=
Regional_Indicator} (26)
\p{Grapheme_Cluster_Break: SM} \p{Grapheme_Cluster_Break=
SpacingMark} (341)
\p{Grapheme_Cluster_Break: SpacingMark} (Short: \p{GCB=SM}) (341)
\p{Grapheme_Cluster_Break: T} (Short: \p{GCB=T}) (137)
\p{Grapheme_Cluster_Break: V} (Short: \p{GCB=V}) (95)
\p{Grapheme_Cluster_Break: XX} \p{Grapheme_Cluster_Break=Other}
(1_094_356 plus all above-Unicode code
points)
\p{Grapheme_Cluster_Break: ZWJ} (Short: \p{GCB=ZWJ}) (1)
\p{Grapheme_Extend} \p{Grapheme_Extend=Y} (Short: \p{GrExt})
(1828)
\p{Grapheme_Extend: N*} (Short: \p{GrExt=N}, \P{GrExt}) (1_112_284
plus all above-Unicode code points)
\p{Grapheme_Extend: Y*} (Short: \p{GrExt=Y}, \p{GrExt}) (1828)
\p{Greek} \p{Script_Extensions=Greek} (Short:
\p{Grek}; NOT \p{Greek_And_Coptic}) (522)
X \p{Greek_And_Coptic} \p{Block=Greek_And_Coptic} (Short:
\p{InGreek}) (144)
X \p{Greek_Ext} \p{Greek_Extended} (= \p{Block=
Greek_Extended}) (256)
X \p{Greek_Extended} \p{Block=Greek_Extended} (Short:
\p{InGreekExt}) (256)
\p{Grek} \p{Greek} (= \p{Script_Extensions=Greek})
(NOT \p{Greek_And_Coptic}) (522)
\p{Gujarati} \p{Script_Extensions=Gujarati} (Short:
\p{Gujr}; NOT \p{Block=Gujarati}) (99)
\p{Gujr} \p{Gujarati} (= \p{Script_Extensions=
Gujarati}) (NOT \p{Block=Gujarati}) (99)
\p{Gurmukhi} \p{Script_Extensions=Gurmukhi} (Short:
\p{Guru}; NOT \p{Block=Gurmukhi}) (93)
\p{Guru} \p{Gurmukhi} (= \p{Script_Extensions=
Gurmukhi}) (NOT \p{Block=Gurmukhi}) (93)
X \p{Half_And_Full_Forms} \p{Halfwidth_And_Fullwidth_Forms} (=
\p{Block=Halfwidth_And_Fullwidth_Forms})
(240)
X \p{Half_Marks} \p{Combining_Half_Marks} (= \p{Block=
Combining_Half_Marks}) (16)
X \p{Halfwidth_And_Fullwidth_Forms} \p{Block=
Halfwidth_And_Fullwidth_Forms} (Short:
\p{InHalfAndFullForms}) (240)
\p{Han} \p{Script_Extensions=Han} (82_013)
\p{Hang} \p{Hangul} (= \p{Script_Extensions=
Hangul}) (NOT \p{Hangul_Syllables})
(11_775)
\p{Hangul} \p{Script_Extensions=Hangul} (Short:
\p{Hang}; NOT \p{Hangul_Syllables})
(11_775)
X \p{Hangul_Compatibility_Jamo} \p{Block=Hangul_Compatibility_Jamo}
(Short: \p{InCompatJamo}) (96)
X \p{Hangul_Jamo} \p{Block=Hangul_Jamo} (Short: \p{InJamo})
(256)
X \p{Hangul_Jamo_Extended_A} \p{Block=Hangul_Jamo_Extended_A}
(Short: \p{InJamoExtA}) (32)
X \p{Hangul_Jamo_Extended_B} \p{Block=Hangul_Jamo_Extended_B}
(Short: \p{InJamoExtB}) (80)
\p{Hangul_Syllable_Type: L} \p{Hangul_Syllable_Type=Leading_Jamo}
(125)
\p{Hangul_Syllable_Type: Leading_Jamo} (Short: \p{Hst=L}) (125)
\p{Hangul_Syllable_Type: LV} \p{Hangul_Syllable_Type=LV_Syllable}
(399)
\p{Hangul_Syllable_Type: LV_Syllable} (Short: \p{Hst=LV}) (399)
\p{Hangul_Syllable_Type: LVT} \p{Hangul_Syllable_Type=
LVT_Syllable} (10_773)
\p{Hangul_Syllable_Type: LVT_Syllable} (Short: \p{Hst=LVT})
(10_773)
\p{Hangul_Syllable_Type: NA} \p{Hangul_Syllable_Type=
Not_Applicable} (1_102_583 plus all
above-Unicode code points)
\p{Hangul_Syllable_Type: Not_Applicable} (Short: \p{Hst=NA})
(1_102_583 plus all above-Unicode code
points)
\p{Hangul_Syllable_Type: T} \p{Hangul_Syllable_Type=Trailing_Jamo}
(137)
\p{Hangul_Syllable_Type: Trailing_Jamo} (Short: \p{Hst=T}) (137)
\p{Hangul_Syllable_Type: V} \p{Hangul_Syllable_Type=Vowel_Jamo}
(95)
\p{Hangul_Syllable_Type: Vowel_Jamo} (Short: \p{Hst=V}) (95)
X \p{Hangul_Syllables} \p{Block=Hangul_Syllables} (Short:
\p{InHangul}) (11_184)
\p{Hani} \p{Han} (= \p{Script_Extensions=Han})
(82_013)
\p{Hano} \p{Hanunoo} (= \p{Script_Extensions=
Hanunoo}) (NOT \p{Block=Hanunoo}) (23)
\p{Hanunoo} \p{Script_Extensions=Hanunoo} (Short:
\p{Hano}; NOT \p{Block=Hanunoo}) (23)
\p{Hatr} \p{Hatran} (= \p{Script_Extensions=
Hatran}) (NOT \p{Block=Hatran}) (26)
\p{Hatran} \p{Script_Extensions=Hatran} (Short:
\p{Hatr}; NOT \p{Block=Hatran}) (26)
\p{Hebr} \p{Hebrew} (= \p{Script_Extensions=
Hebrew}) (NOT \p{Block=Hebrew}) (133)
\p{Hebrew} \p{Script_Extensions=Hebrew} (Short:
\p{Hebr}; NOT \p{Block=Hebrew}) (133)
\p{Hex} \p{XPosixXDigit} (= \p{Hex_Digit=Y}) (44)
\p{Hex: *} \p{Hex_Digit: *}
\p{Hex_Digit} \p{XPosixXDigit} (= \p{Hex_Digit=Y}) (44)
\p{Hex_Digit: N*} (Short: \p{Hex=N}, \P{Hex}) (1_114_068
plus all above-Unicode code points)
\p{Hex_Digit: Y*} (Short: \p{Hex=Y}, \p{Hex}) (44)
X \p{High_Private_Use_Surrogates} \p{Block=
High_Private_Use_Surrogates} (Short:
\p{InHighPUSurrogates}) (128)
X \p{High_PU_Surrogates} \p{High_Private_Use_Surrogates} (=
\p{Block=High_Private_Use_Surrogates})
(128)
X \p{High_Surrogates} \p{Block=High_Surrogates} (896)
\p{Hira} \p{Hiragana} (= \p{Script_Extensions=
Hiragana}) (NOT \p{Block=Hiragana}) (143)
\p{Hiragana} \p{Script_Extensions=Hiragana} (Short:
\p{Hira}; NOT \p{Block=Hiragana}) (143)
\p{Hluw} \p{Anatolian_Hieroglyphs} (=
\p{Script_Extensions=
Anatolian_Hieroglyphs}) (NOT \p{Block=
Anatolian_Hieroglyphs}) (583)
\p{Hmng} \p{Pahawh_Hmong} (= \p{Script_Extensions=
Pahawh_Hmong}) (NOT \p{Block=
Pahawh_Hmong}) (127)
\p{HorizSpace} \p{XPosixBlank} (18)
\p{Hst: *} \p{Hangul_Syllable_Type: *}
\p{Hung} \p{Old_Hungarian} (= \p{Script_Extensions=
Old_Hungarian}) (NOT \p{Block=
Old_Hungarian}) (108)
D \p{Hyphen} \p{Hyphen=Y} (11)
D \p{Hyphen: N*} Supplanted by Line_Break property values;
see www.unicode.org/reports/tr14
(Single: \P{Hyphen}) (1_114_101 plus all
above-Unicode code points)
D \p{Hyphen: Y*} Supplanted by Line_Break property values;
see www.unicode.org/reports/tr14
(Single: \p{Hyphen}) (11)
\p{ID_Continue} \p{ID_Continue=Y} (Short: \p{IDC}; NOT
\p{Ideographic_Description_Characters})
(119_691)
\p{ID_Continue: N*} (Short: \p{IDC=N}, \P{IDC}) (994_421 plus
all above-Unicode code points)
\p{ID_Continue: Y*} (Short: \p{IDC=Y}, \p{IDC}) (119_691)
\p{ID_Start} \p{ID_Start=Y} (Short: \p{IDS}) (117_007)
\p{ID_Start: N*} (Short: \p{IDS=N}, \P{IDS}) (997_105 plus
all above-Unicode code points)
\p{ID_Start: Y*} (Short: \p{IDS=Y}, \p{IDS}) (117_007)
\p{IDC} \p{ID_Continue} (= \p{ID_Continue=Y}) (NOT
\p{Ideographic_Description_Characters})
(119_691)
\p{IDC: *} \p{ID_Continue: *}
\p{Ideo} \p{Ideographic} (= \p{Ideographic=Y})
(88_284)
\p{Ideo: *} \p{Ideographic: *}
\p{Ideographic} \p{Ideographic=Y} (Short: \p{Ideo})
(88_284)
\p{Ideographic: N*} (Short: \p{Ideo=N}, \P{Ideo}) (1_025_828
plus all above-Unicode code points)
\p{Ideographic: Y*} (Short: \p{Ideo=Y}, \p{Ideo}) (88_284)
X \p{Ideographic_Description_Characters} \p{Block=
Ideographic_Description_Characters}
(Short: \p{InIDC}) (16)
X \p{Ideographic_Symbols} \p{Ideographic_Symbols_And_Punctuation} (=
\p{Block=
Ideographic_Symbols_And_Punctuation})
(32)
X \p{Ideographic_Symbols_And_Punctuation} \p{Block=
Ideographic_Symbols_And_Punctuation}
(Short: \p{InIdeographicSymbols}) (32)
\p{IDS} \p{ID_Start} (= \p{ID_Start=Y}) (117_007)
\p{IDS: *} \p{ID_Start: *}
\p{IDS_Binary_Operator} \p{IDS_Binary_Operator=Y} (Short:
\p{IDSB}) (10)
\p{IDS_Binary_Operator: N*} (Short: \p{IDSB=N}, \P{IDSB})
(1_114_102 plus all above-Unicode code
points)
\p{IDS_Binary_Operator: Y*} (Short: \p{IDSB=Y}, \p{IDSB}) (10)
\p{IDS_Trinary_Operator} \p{IDS_Trinary_Operator=Y} (Short:
\p{IDST}) (2)
\p{IDS_Trinary_Operator: N*} (Short: \p{IDST=N}, \P{IDST})
(1_114_110 plus all above-Unicode code
points)
\p{IDS_Trinary_Operator: Y*} (Short: \p{IDST=Y}, \p{IDST}) (2)
\p{IDSB} \p{IDS_Binary_Operator} (=
\p{IDS_Binary_Operator=Y}) (10)
\p{IDSB: *} \p{IDS_Binary_Operator: *}
\p{IDST} \p{IDS_Trinary_Operator} (=
\p{IDS_Trinary_Operator=Y}) (2)
\p{IDST: *} \p{IDS_Trinary_Operator: *}
\p{Imperial_Aramaic} \p{Script_Extensions=Imperial_Aramaic}
(Short: \p{Armi}; NOT \p{Block=
Imperial_Aramaic}) (31)
\p{In: *} \p{Present_In: *} (Perl extension)
X \p{In_*} \p{Block: *}
X \p{Indic_Number_Forms} \p{Common_Indic_Number_Forms} (= \p{Block=
Common_Indic_Number_Forms}) (16)
\p{Indic_Positional_Category: Bottom} (Short: \p{InPC=Bottom})
(300)
\p{Indic_Positional_Category: Bottom_And_Right} (Short: \p{InPC=
BottomAndRight}) (2)
\p{Indic_Positional_Category: Left} (Short: \p{InPC=Left}) (57)
\p{Indic_Positional_Category: Left_And_Right} (Short: \p{InPC=
LeftAndRight}) (21)
\p{Indic_Positional_Category: NA} (Short: \p{InPC=NA}) (1_113_069
plus all above-Unicode code points)
\p{Indic_Positional_Category: Overstruck} (Short: \p{InPC=
Overstruck}) (10)
\p{Indic_Positional_Category: Right} (Short: \p{InPC=Right}) (258)
\p{Indic_Positional_Category: Top} (Short: \p{InPC=Top}) (342)
\p{Indic_Positional_Category: Top_And_Bottom} (Short: \p{InPC=
TopAndBottom}) (10)
\p{Indic_Positional_Category: Top_And_Bottom_And_Right} (Short:
\p{InPC=TopAndBottomAndRight}) (1)
\p{Indic_Positional_Category: Top_And_Left} (Short: \p{InPC=
TopAndLeft}) (6)
\p{Indic_Positional_Category: Top_And_Left_And_Right} (Short:
\p{InPC=TopAndLeftAndRight}) (4)
\p{Indic_Positional_Category: Top_And_Right} (Short: \p{InPC=
TopAndRight}) (13)
\p{Indic_Positional_Category: Visual_Order_Left} (Short: \p{InPC=
VisualOrderLeft}) (19)
\p{Indic_Syllabic_Category: Avagraha} (Short: \p{InSC=Avagraha})
(15)
\p{Indic_Syllabic_Category: Bindu} (Short: \p{InSC=Bindu}) (67)
\p{Indic_Syllabic_Category: Brahmi_Joining_Number} (Short:
\p{InSC=BrahmiJoiningNumber}) (20)
\p{Indic_Syllabic_Category: Cantillation_Mark} (Short: \p{InSC=
CantillationMark}) (53)
\p{Indic_Syllabic_Category: Consonant} (Short: \p{InSC=Consonant})
(1907)
\p{Indic_Syllabic_Category: Consonant_Dead} (Short: \p{InSC=
ConsonantDead}) (10)
\p{Indic_Syllabic_Category: Consonant_Final} (Short: \p{InSC=
ConsonantFinal}) (62)
\p{Indic_Syllabic_Category: Consonant_Head_Letter} (Short:
\p{InSC=ConsonantHeadLetter}) (5)
\p{Indic_Syllabic_Category: Consonant_Killer} (Short: \p{InSC=
ConsonantKiller}) (2)
\p{Indic_Syllabic_Category: Consonant_Medial} (Short: \p{InSC=
ConsonantMedial}) (22)
\p{Indic_Syllabic_Category: Consonant_Placeholder} (Short:
\p{InSC=ConsonantPlaceholder}) (16)
\p{Indic_Syllabic_Category: Consonant_Preceding_Repha} (Short:
\p{InSC=ConsonantPrecedingRepha}) (1)
\p{Indic_Syllabic_Category: Consonant_Prefixed} (Short: \p{InSC=
ConsonantPrefixed}) (2)
\p{Indic_Syllabic_Category: Consonant_Subjoined} (Short: \p{InSC=
ConsonantSubjoined}) (90)
\p{Indic_Syllabic_Category: Consonant_Succeeding_Repha} (Short:
\p{InSC=ConsonantSucceedingRepha}) (4)
\p{Indic_Syllabic_Category: Consonant_With_Stacker} (Short:
\p{InSC=ConsonantWithStacker}) (4)
\p{Indic_Syllabic_Category: Gemination_Mark} (Short: \p{InSC=
GeminationMark}) (2)
\p{Indic_Syllabic_Category: Invisible_Stacker} (Short: \p{InSC=
InvisibleStacker}) (7)
\p{Indic_Syllabic_Category: Joiner} (Short: \p{InSC=Joiner}) (1)
\p{Indic_Syllabic_Category: Modifying_Letter} (Short: \p{InSC=
ModifyingLetter}) (1)
\p{Indic_Syllabic_Category: Non_Joiner} (Short: \p{InSC=
NonJoiner}) (1)
\p{Indic_Syllabic_Category: Nukta} (Short: \p{InSC=Nukta}) (24)
\p{Indic_Syllabic_Category: Number} (Short: \p{InSC=Number}) (459)
\p{Indic_Syllabic_Category: Number_Joiner} (Short: \p{InSC=
NumberJoiner}) (1)
\p{Indic_Syllabic_Category: Other} (Short: \p{InSC=Other})
(1_110_129 plus all above-Unicode code
points)
\p{Indic_Syllabic_Category: Pure_Killer} (Short: \p{InSC=
PureKiller}) (16)
\p{Indic_Syllabic_Category: Register_Shifter} (Short: \p{InSC=
RegisterShifter}) (2)
\p{Indic_Syllabic_Category: Syllable_Modifier} (Short: \p{InSC=
SyllableModifier}) (22)
\p{Indic_Syllabic_Category: Tone_Letter} (Short: \p{InSC=
ToneLetter}) (7)
\p{Indic_Syllabic_Category: Tone_Mark} (Short: \p{InSC=ToneMark})
(42)
\p{Indic_Syllabic_Category: Virama} (Short: \p{InSC=Virama}) (24)
\p{Indic_Syllabic_Category: Visarga} (Short: \p{InSC=Visarga}) (31)
\p{Indic_Syllabic_Category: Vowel} (Short: \p{InSC=Vowel}) (30)
\p{Indic_Syllabic_Category: Vowel_Dependent} (Short: \p{InSC=
VowelDependent}) (602)
\p{Indic_Syllabic_Category: Vowel_Independent} (Short: \p{InSC=
VowelIndependent}) (431)
\p{Inherited} \p{Script_Extensions=Inherited} (Short:
\p{Zinh}) (496)
\p{Initial_Punctuation} \p{General_Category=Initial_Punctuation}
(Short: \p{Pi}) (12)
\p{InPC: *} \p{Indic_Positional_Category: *}
\p{InSC: *} \p{Indic_Syllabic_Category: *}
\p{Inscriptional_Pahlavi} \p{Script_Extensions=
Inscriptional_Pahlavi} (Short: \p{Phli};
NOT \p{Block=Inscriptional_Pahlavi}) (27)
\p{Inscriptional_Parthian} \p{Script_Extensions=
Inscriptional_Parthian} (Short:
\p{Prti}; NOT \p{Block=
Inscriptional_Parthian}) (30)
X \p{IPA_Ext} \p{IPA_Extensions} (= \p{Block=
IPA_Extensions}) (96)
X \p{IPA_Extensions} \p{Block=IPA_Extensions} (Short:
\p{InIPAExt}) (96)
\p{Is_*} \p{*} (Any exceptions are individually
noted beginning with the word NOT.) If
an entry has flag(s) at its beginning,
like "D", the "Is_" form has the same
flag(s).
\p{Ital} \p{Old_Italic} (= \p{Script_Extensions=
Old_Italic}) (NOT \p{Block=Old_Italic})
(36)
X \p{Jamo} \p{Hangul_Jamo} (= \p{Block=Hangul_Jamo})
(256)
X \p{Jamo_Ext_A} \p{Hangul_Jamo_Extended_A} (= \p{Block=
Hangul_Jamo_Extended_A}) (32)
X \p{Jamo_Ext_B} \p{Hangul_Jamo_Extended_B} (= \p{Block=
Hangul_Jamo_Extended_B}) (80)
\p{Java} \p{Javanese} (= \p{Script_Extensions=
Javanese}) (NOT \p{Block=Javanese}) (91)
\p{Javanese} \p{Script_Extensions=Javanese} (Short:
\p{Java}; NOT \p{Block=Javanese}) (91)
\p{Jg: *} \p{Joining_Group: *}
\p{Join_C} \p{Join_Control} (= \p{Join_Control=Y}) (2)
\p{Join_C: *} \p{Join_Control: *}
\p{Join_Control} \p{Join_Control=Y} (Short: \p{JoinC}) (2)
\p{Join_Control: N*} (Short: \p{JoinC=N}, \P{JoinC}) (1_114_110
plus all above-Unicode code points)
\p{Join_Control: Y*} (Short: \p{JoinC=Y}, \p{JoinC}) (2)
\p{Joining_Group: African_Feh} (Short: \p{Jg=AfricanFeh}) (1)
\p{Joining_Group: African_Noon} (Short: \p{Jg=AfricanNoon}) (1)
\p{Joining_Group: African_Qaf} (Short: \p{Jg=AfricanQaf}) (1)
\p{Joining_Group: Ain} (Short: \p{Jg=Ain}) (8)
\p{Joining_Group: Alaph} (Short: \p{Jg=Alaph}) (1)
\p{Joining_Group: Alef} (Short: \p{Jg=Alef}) (10)
\p{Joining_Group: Beh} (Short: \p{Jg=Beh}) (24)
\p{Joining_Group: Beth} (Short: \p{Jg=Beth}) (2)
\p{Joining_Group: Burushaski_Yeh_Barree} (Short: \p{Jg=
BurushaskiYehBarree}) (2)
\p{Joining_Group: Dal} (Short: \p{Jg=Dal}) (15)
\p{Joining_Group: Dalath_Rish} (Short: \p{Jg=DalathRish}) (4)
\p{Joining_Group: E} (Short: \p{Jg=E}) (1)
\p{Joining_Group: Farsi_Yeh} (Short: \p{Jg=FarsiYeh}) (7)
\p{Joining_Group: Fe} (Short: \p{Jg=Fe}) (1)
\p{Joining_Group: Feh} (Short: \p{Jg=Feh}) (10)
\p{Joining_Group: Final_Semkath} (Short: \p{Jg=FinalSemkath}) (1)
\p{Joining_Group: Gaf} (Short: \p{Jg=Gaf}) (14)
\p{Joining_Group: Gamal} (Short: \p{Jg=Gamal}) (3)
\p{Joining_Group: Hah} (Short: \p{Jg=Hah}) (18)
\p{Joining_Group: Hamza_On_Heh_Goal} (Short: \p{Jg=
HamzaOnHehGoal}) (1)
\p{Joining_Group: He} (Short: \p{Jg=He}) (1)
\p{Joining_Group: Heh} (Short: \p{Jg=Heh}) (1)
\p{Joining_Group: Heh_Goal} (Short: \p{Jg=HehGoal}) (2)
\p{Joining_Group: Heth} (Short: \p{Jg=Heth}) (1)
\p{Joining_Group: Kaf} (Short: \p{Jg=Kaf}) (6)
\p{Joining_Group: Kaph} (Short: \p{Jg=Kaph}) (1)
\p{Joining_Group: Khaph} (Short: \p{Jg=Khaph}) (1)
\p{Joining_Group: Knotted_Heh} (Short: \p{Jg=KnottedHeh}) (2)
\p{Joining_Group: Lam} (Short: \p{Jg=Lam}) (7)
\p{Joining_Group: Lamadh} (Short: \p{Jg=Lamadh}) (1)
\p{Joining_Group: Manichaean_Aleph} (Short: \p{Jg=
ManichaeanAleph}) (1)
\p{Joining_Group: Manichaean_Ayin} (Short: \p{Jg=ManichaeanAyin})
(2)
\p{Joining_Group: Manichaean_Beth} (Short: \p{Jg=ManichaeanBeth})
(2)
\p{Joining_Group: Manichaean_Daleth} (Short: \p{Jg=
ManichaeanDaleth}) (1)
\p{Joining_Group: Manichaean_Dhamedh} (Short: \p{Jg=
ManichaeanDhamedh}) (1)
\p{Joining_Group: Manichaean_Five} (Short: \p{Jg=ManichaeanFive})
(1)
\p{Joining_Group: Manichaean_Gimel} (Short: \p{Jg=
ManichaeanGimel}) (2)
\p{Joining_Group: Manichaean_Heth} (Short: \p{Jg=ManichaeanHeth})
(1)
\p{Joining_Group: Manichaean_Hundred} (Short: \p{Jg=
ManichaeanHundred}) (1)
\p{Joining_Group: Manichaean_Kaph} (Short: \p{Jg=ManichaeanKaph})
(3)
\p{Joining_Group: Manichaean_Lamedh} (Short: \p{Jg=
ManichaeanLamedh}) (1)
\p{Joining_Group: Manichaean_Mem} (Short: \p{Jg=ManichaeanMem}) (1)
\p{Joining_Group: Manichaean_Nun} (Short: \p{Jg=ManichaeanNun}) (1)
\p{Joining_Group: Manichaean_One} (Short: \p{Jg=ManichaeanOne}) (1)
\p{Joining_Group: Manichaean_Pe} (Short: \p{Jg=ManichaeanPe}) (2)
\p{Joining_Group: Manichaean_Qoph} (Short: \p{Jg=ManichaeanQoph})
(3)
\p{Joining_Group: Manichaean_Resh} (Short: \p{Jg=ManichaeanResh})
(1)
\p{Joining_Group: Manichaean_Sadhe} (Short: \p{Jg=
ManichaeanSadhe}) (1)
\p{Joining_Group: Manichaean_Samekh} (Short: \p{Jg=
ManichaeanSamekh}) (1)
\p{Joining_Group: Manichaean_Taw} (Short: \p{Jg=ManichaeanTaw}) (1)
\p{Joining_Group: Manichaean_Ten} (Short: \p{Jg=ManichaeanTen}) (1)
\p{Joining_Group: Manichaean_Teth} (Short: \p{Jg=ManichaeanTeth})
(1)
\p{Joining_Group: Manichaean_Thamedh} (Short: \p{Jg=
ManichaeanThamedh}) (1)
\p{Joining_Group: Manichaean_Twenty} (Short: \p{Jg=
ManichaeanTwenty}) (1)
\p{Joining_Group: Manichaean_Waw} (Short: \p{Jg=ManichaeanWaw}) (1)
\p{Joining_Group: Manichaean_Yodh} (Short: \p{Jg=ManichaeanYodh})
(1)
\p{Joining_Group: Manichaean_Zayin} (Short: \p{Jg=
ManichaeanZayin}) (2)
\p{Joining_Group: Meem} (Short: \p{Jg=Meem}) (4)
\p{Joining_Group: Mim} (Short: \p{Jg=Mim}) (1)
\p{Joining_Group: No_Joining_Group} (Short: \p{Jg=NoJoiningGroup})
(1_113_818 plus all above-Unicode code
points)
\p{Joining_Group: Noon} (Short: \p{Jg=Noon}) (8)
\p{Joining_Group: Nun} (Short: \p{Jg=Nun}) (1)
\p{Joining_Group: Nya} (Short: \p{Jg=Nya}) (1)
\p{Joining_Group: Pe} (Short: \p{Jg=Pe}) (1)
\p{Joining_Group: Qaf} (Short: \p{Jg=Qaf}) (5)
\p{Joining_Group: Qaph} (Short: \p{Jg=Qaph}) (1)
\p{Joining_Group: Reh} (Short: \p{Jg=Reh}) (19)
\p{Joining_Group: Reversed_Pe} (Short: \p{Jg=ReversedPe}) (1)
\p{Joining_Group: Rohingya_Yeh} (Short: \p{Jg=RohingyaYeh}) (1)
\p{Joining_Group: Sad} (Short: \p{Jg=Sad}) (6)
\p{Joining_Group: Sadhe} (Short: \p{Jg=Sadhe}) (1)
\p{Joining_Group: Seen} (Short: \p{Jg=Seen}) (11)
\p{Joining_Group: Semkath} (Short: \p{Jg=Semkath}) (1)
\p{Joining_Group: Shin} (Short: \p{Jg=Shin}) (1)
\p{Joining_Group: Straight_Waw} (Short: \p{Jg=StraightWaw}) (1)
\p{Joining_Group: Swash_Kaf} (Short: \p{Jg=SwashKaf}) (1)
\p{Joining_Group: Syriac_Waw} (Short: \p{Jg=SyriacWaw}) (1)
\p{Joining_Group: Tah} (Short: \p{Jg=Tah}) (4)
\p{Joining_Group: Taw} (Short: \p{Jg=Taw}) (1)
\p{Joining_Group: Teh_Marbuta} (Short: \p{Jg=TehMarbuta}) (3)
\p{Joining_Group: Teh_Marbuta_Goal} \p{Joining_Group=
Hamza_On_Heh_Goal} (1)
\p{Joining_Group: Teth} (Short: \p{Jg=Teth}) (2)
\p{Joining_Group: Waw} (Short: \p{Jg=Waw}) (16)
\p{Joining_Group: Yeh} (Short: \p{Jg=Yeh}) (11)
\p{Joining_Group: Yeh_Barree} (Short: \p{Jg=YehBarree}) (2)
\p{Joining_Group: Yeh_With_Tail} (Short: \p{Jg=YehWithTail}) (1)
\p{Joining_Group: Yudh} (Short: \p{Jg=Yudh}) (1)
\p{Joining_Group: Yudh_He} (Short: \p{Jg=YudhHe}) (1)
\p{Joining_Group: Zain} (Short: \p{Jg=Zain}) (1)
\p{Joining_Group: Zhain} (Short: \p{Jg=Zhain}) (1)
\p{Joining_Type: C} \p{Joining_Type=Join_Causing} (4)
\p{Joining_Type: D} \p{Joining_Type=Dual_Joining} (501)
\p{Joining_Type: Dual_Joining} (Short: \p{Jt=D}) (501)
\p{Joining_Type: Join_Causing} (Short: \p{Jt=C}) (4)
\p{Joining_Type: L} \p{Joining_Type=Left_Joining} (3)
\p{Joining_Type: Left_Joining} (Short: \p{Jt=L}) (3)
\p{Joining_Type: Non_Joining} (Short: \p{Jt=U}) (1_111_653 plus
all above-Unicode code points)
\p{Joining_Type: R} \p{Joining_Type=Right_Joining} (112)
\p{Joining_Type: Right_Joining} (Short: \p{Jt=R}) (112)
\p{Joining_Type: T} \p{Joining_Type=Transparent} (1839)
\p{Joining_Type: Transparent} (Short: \p{Jt=T}) (1839)
\p{Joining_Type: U} \p{Joining_Type=Non_Joining} (1_111_653
plus all above-Unicode code points)
\p{Jt: *} \p{Joining_Type: *}
\p{Kaithi} \p{Script_Extensions=Kaithi} (Short:
\p{Kthi}; NOT \p{Block=Kaithi}) (86)
\p{Kali} \p{Kayah_Li} (= \p{Script_Extensions=
Kayah_Li}) (48)
\p{Kana} \p{Katakana} (= \p{Script_Extensions=
Katakana}) (NOT \p{Block=Katakana}) (352)
X \p{Kana_Sup} \p{Kana_Supplement} (= \p{Block=
Kana_Supplement}) (256)
X \p{Kana_Supplement} \p{Block=Kana_Supplement} (Short:
\p{InKanaSup}) (256)
X \p{Kanbun} \p{Block=Kanbun} (16)
X \p{Kangxi} \p{Kangxi_Radicals} (= \p{Block=
Kangxi_Radicals}) (224)
X \p{Kangxi_Radicals} \p{Block=Kangxi_Radicals} (Short:
\p{InKangxi}) (224)
\p{Kannada} \p{Script_Extensions=Kannada} (Short:
\p{Knda}; NOT \p{Block=Kannada}) (100)
\p{Katakana} \p{Script_Extensions=Katakana} (Short:
\p{Kana}; NOT \p{Block=Katakana}) (352)
X \p{Katakana_Ext} \p{Katakana_Phonetic_Extensions} (=
\p{Block=Katakana_Phonetic_Extensions})
(16)
X \p{Katakana_Phonetic_Extensions} \p{Block=
Katakana_Phonetic_Extensions} (Short:
\p{InKatakanaExt}) (16)
\p{Kayah_Li} \p{Script_Extensions=Kayah_Li} (Short:
\p{Kali}) (48)
\p{Khar} \p{Kharoshthi} (= \p{Script_Extensions=
Kharoshthi}) (NOT \p{Block=Kharoshthi})
(65)
\p{Kharoshthi} \p{Script_Extensions=Kharoshthi} (Short:
\p{Khar}; NOT \p{Block=Kharoshthi}) (65)
\p{Khmer} \p{Script_Extensions=Khmer} (Short:
\p{Khmr}; NOT \p{Block=Khmer}) (146)
X \p{Khmer_Symbols} \p{Block=Khmer_Symbols} (32)
\p{Khmr} \p{Khmer} (= \p{Script_Extensions=Khmer})
(NOT \p{Block=Khmer}) (146)
\p{Khoj} \p{Khojki} (= \p{Script_Extensions=
Khojki}) (NOT \p{Block=Khojki}) (72)
\p{Khojki} \p{Script_Extensions=Khojki} (Short:
\p{Khoj}; NOT \p{Block=Khojki}) (72)
\p{Khudawadi} \p{Script_Extensions=Khudawadi} (Short:
\p{Sind}; NOT \p{Block=Khudawadi}) (81)
\p{Knda} \p{Kannada} (= \p{Script_Extensions=
Kannada}) (NOT \p{Block=Kannada}) (100)
\p{Kthi} \p{Kaithi} (= \p{Script_Extensions=
Kaithi}) (NOT \p{Block=Kaithi}) (86)
\p{L} \pL \p{Letter} (= \p{General_Category=Letter})
(116_766)
X \p{L&} \p{Cased_Letter} (= \p{General_Category=
Cased_Letter}) (3796)
X \p{L_} \p{Cased_Letter} (= \p{General_Category=
Cased_Letter}) Note the trailing '_'
matters in spite of loose matching
rules. (3796)
\p{Lana} \p{Tai_Tham} (= \p{Script_Extensions=
Tai_Tham}) (NOT \p{Block=Tai_Tham}) (127)
\p{Lao} \p{Script_Extensions=Lao} (NOT \p{Block=
Lao}) (67)
\p{Laoo} \p{Lao} (= \p{Script_Extensions=Lao}) (NOT
\p{Block=Lao}) (67)
\p{Latin} \p{Script_Extensions=Latin} (Short:
\p{Latn}) (1370)
X \p{Latin_1} \p{Latin_1_Supplement} (= \p{Block=
Latin_1_Supplement}) (128)
X \p{Latin_1_Sup} \p{Latin_1_Supplement} (= \p{Block=
Latin_1_Supplement}) (128)
X \p{Latin_1_Supplement} \p{Block=Latin_1_Supplement} (Short:
\p{InLatin1}) (128)
X \p{Latin_Ext_A} \p{Latin_Extended_A} (= \p{Block=
Latin_Extended_A}) (128)
X \p{Latin_Ext_Additional} \p{Latin_Extended_Additional} (=
\p{Block=Latin_Extended_Additional})
(256)
X \p{Latin_Ext_B} \p{Latin_Extended_B} (= \p{Block=
Latin_Extended_B}) (208)
X \p{Latin_Ext_C} \p{Latin_Extended_C} (= \p{Block=
Latin_Extended_C}) (32)
X \p{Latin_Ext_D} \p{Latin_Extended_D} (= \p{Block=
Latin_Extended_D}) (224)
X \p{Latin_Ext_E} \p{Latin_Extended_E} (= \p{Block=
Latin_Extended_E}) (64)
X \p{Latin_Extended_A} \p{Block=Latin_Extended_A} (Short:
\p{InLatinExtA}) (128)
X \p{Latin_Extended_Additional} \p{Block=Latin_Extended_Additional}
(Short: \p{InLatinExtAdditional}) (256)
X \p{Latin_Extended_B} \p{Block=Latin_Extended_B} (Short:
\p{InLatinExtB}) (208)
X \p{Latin_Extended_C} \p{Block=Latin_Extended_C} (Short:
\p{InLatinExtC}) (32)
X \p{Latin_Extended_D} \p{Block=Latin_Extended_D} (Short:
\p{InLatinExtD}) (224)
X \p{Latin_Extended_E} \p{Block=Latin_Extended_E} (Short:
\p{InLatinExtE}) (64)
\p{Latn} \p{Latin} (= \p{Script_Extensions=Latin})
(1370)
\p{Lb: *} \p{Line_Break: *}
\p{LC} \p{Cased_Letter} (= \p{General_Category=
Cased_Letter}) (3796)
\p{Lepc} \p{Lepcha} (= \p{Script_Extensions=
Lepcha}) (NOT \p{Block=Lepcha}) (74)
\p{Lepcha} \p{Script_Extensions=Lepcha} (Short:
\p{Lepc}; NOT \p{Block=Lepcha}) (74)
\p{Letter} \p{General_Category=Letter} (Short: \p{L})
(116_766)
\p{Letter_Number} \p{General_Category=Letter_Number} (Short:
\p{Nl}) (236)
X \p{Letterlike_Symbols} \p{Block=Letterlike_Symbols} (80)
\p{Limb} \p{Limbu} (= \p{Script_Extensions=Limbu})
(NOT \p{Block=Limbu}) (69)
\p{Limbu} \p{Script_Extensions=Limbu} (Short:
\p{Limb}; NOT \p{Block=Limbu}) (69)
\p{Lina} \p{Linear_A} (= \p{Script_Extensions=
Linear_A}) (NOT \p{Block=Linear_A}) (386)
\p{Linb} \p{Linear_B} (= \p{Script_Extensions=
Linear_B}) (268)
\p{Line_Break: AI} \p{Line_Break=Ambiguous} (707)
\p{Line_Break: AL} \p{Line_Break=Alphabetic} (19_523)
\p{Line_Break: Alphabetic} (Short: \p{Lb=AL}) (19_523)
\p{Line_Break: Ambiguous} (Short: \p{Lb=AI}) (707)
\p{Line_Break: B2} \p{Line_Break=Break_Both} (3)
\p{Line_Break: BA} \p{Line_Break=Break_After} (218)
\p{Line_Break: BB} \p{Line_Break=Break_Before} (37)
\p{Line_Break: BK} \p{Line_Break=Mandatory_Break} (4)
\p{Line_Break: Break_After} (Short: \p{Lb=BA}) (218)
\p{Line_Break: Break_Before} (Short: \p{Lb=BB}) (37)
\p{Line_Break: Break_Both} (Short: \p{Lb=B2}) (3)
\p{Line_Break: Break_Symbols} (Short: \p{Lb=SY}) (1)
\p{Line_Break: Carriage_Return} (Short: \p{Lb=CR}) (1)
\p{Line_Break: CB} \p{Line_Break=Contingent_Break} (1)
\p{Line_Break: CJ} \p{Line_Break=
Conditional_Japanese_Starter} (51)
\p{Line_Break: CL} \p{Line_Break=Close_Punctuation} (90)
\p{Line_Break: Close_Parenthesis} (Short: \p{Lb=CP}) (2)
\p{Line_Break: Close_Punctuation} (Short: \p{Lb=CL}) (90)
\p{Line_Break: CM} \p{Line_Break=Combining_Mark} (2090)
\p{Line_Break: Combining_Mark} (Short: \p{Lb=CM}) (2090)
\p{Line_Break: Complex_Context} (Short: \p{Lb=SA}) (734)
\p{Line_Break: Conditional_Japanese_Starter} (Short: \p{Lb=CJ})
(51)
\p{Line_Break: Contingent_Break} (Short: \p{Lb=CB}) (1)
\p{Line_Break: CP} \p{Line_Break=Close_Parenthesis} (2)
\p{Line_Break: CR} \p{Line_Break=Carriage_Return} (1)
\p{Line_Break: E_Base} (Short: \p{Lb=EB}) (83)
\p{Line_Break: E_Modifier} (Short: \p{Lb=EM}) (5)
\p{Line_Break: EB} \p{Line_Break=E_Base} (83)
\p{Line_Break: EM} \p{Line_Break=E_Modifier} (5)
\p{Line_Break: EX} \p{Line_Break=Exclamation} (37)
\p{Line_Break: Exclamation} (Short: \p{Lb=EX}) (37)
\p{Line_Break: GL} \p{Line_Break=Glue} (18)
\p{Line_Break: Glue} (Short: \p{Lb=GL}) (18)
\p{Line_Break: H2} (Short: \p{Lb=H2}) (399)
\p{Line_Break: H3} (Short: \p{Lb=H3}) (10_773)
\p{Line_Break: Hebrew_Letter} (Short: \p{Lb=HL}) (74)
\p{Line_Break: HL} \p{Line_Break=Hebrew_Letter} (74)
\p{Line_Break: HY} \p{Line_Break=Hyphen} (1)
\p{Line_Break: Hyphen} (Short: \p{Lb=HY}) (1)
\p{Line_Break: ID} \p{Line_Break=Ideographic} (172_133)
\p{Line_Break: Ideographic} (Short: \p{Lb=ID}) (172_133)
\p{Line_Break: IN} \p{Line_Break=Inseparable} (6)
\p{Line_Break: Infix_Numeric} (Short: \p{Lb=IS}) (13)
\p{Line_Break: Inseparable} (Short: \p{Lb=IN}) (6)
\p{Line_Break: Inseperable} \p{Line_Break=Inseparable} (6)
\p{Line_Break: IS} \p{Line_Break=Infix_Numeric} (13)
\p{Line_Break: JL} (Short: \p{Lb=JL}) (125)
\p{Line_Break: JT} (Short: \p{Lb=JT}) (137)
\p{Line_Break: JV} (Short: \p{Lb=JV}) (95)
\p{Line_Break: LF} \p{Line_Break=Line_Feed} (1)
\p{Line_Break: Line_Feed} (Short: \p{Lb=LF}) (1)
\p{Line_Break: Mandatory_Break} (Short: \p{Lb=BK}) (4)
\p{Line_Break: Next_Line} (Short: \p{Lb=NL}) (1)
\p{Line_Break: NL} \p{Line_Break=Next_Line} (1)
\p{Line_Break: Nonstarter} (Short: \p{Lb=NS}) (30)
\p{Line_Break: NS} \p{Line_Break=Nonstarter} (30)
\p{Line_Break: NU} \p{Line_Break=Numeric} (572)
\p{Line_Break: Numeric} (Short: \p{Lb=NU}) (572)
\p{Line_Break: OP} \p{Line_Break=Open_Punctuation} (87)
\p{Line_Break: Open_Punctuation} (Short: \p{Lb=OP}) (87)
\p{Line_Break: PO} \p{Line_Break=Postfix_Numeric} (30)
\p{Line_Break: Postfix_Numeric} (Short: \p{Lb=PO}) (30)
\p{Line_Break: PR} \p{Line_Break=Prefix_Numeric} (65)
\p{Line_Break: Prefix_Numeric} (Short: \p{Lb=PR}) (65)
\p{Line_Break: QU} \p{Line_Break=Quotation} (39)
\p{Line_Break: Quotation} (Short: \p{Lb=QU}) (39)
\p{Line_Break: Regional_Indicator} (Short: \p{Lb=RI}) (26)
\p{Line_Break: RI} \p{Line_Break=Regional_Indicator} (26)
\p{Line_Break: SA} \p{Line_Break=Complex_Context} (734)
D \p{Line_Break: SG} \p{Line_Break=Surrogate} (2048)
\p{Line_Break: SP} \p{Line_Break=Space} (1)
\p{Line_Break: Space} (Short: \p{Lb=SP}) (1)
D \p{Line_Break: Surrogate} Deprecated by Unicode because surrogates
should never appear in well-formed text,
and therefore shouldn't be the basis for
line breaking (Short: \p{Lb=SG}) (2048)
\p{Line_Break: SY} \p{Line_Break=Break_Symbols} (1)
\p{Line_Break: Unknown} (Short: \p{Lb=XX}) (903_847 plus all
above-Unicode code points)
\p{Line_Break: WJ} \p{Line_Break=Word_Joiner} (2)
\p{Line_Break: Word_Joiner} (Short: \p{Lb=WJ}) (2)
\p{Line_Break: XX} \p{Line_Break=Unknown} (903_847 plus all
above-Unicode code points)
\p{Line_Break: ZW} \p{Line_Break=ZWSpace} (1)
\p{Line_Break: ZWJ} (Short: \p{Lb=ZWJ}) (1)
\p{Line_Break: ZWSpace} (Short: \p{Lb=ZW}) (1)
\p{Line_Separator} \p{General_Category=Line_Separator}
(Short: \p{Zl}) (1)
\p{Linear_A} \p{Script_Extensions=Linear_A} (Short:
\p{Lina}; NOT \p{Block=Linear_A}) (386)
\p{Linear_B} \p{Script_Extensions=Linear_B} (Short:
\p{Linb}) (268)
X \p{Linear_B_Ideograms} \p{Block=Linear_B_Ideograms} (128)
X \p{Linear_B_Syllabary} \p{Block=Linear_B_Syllabary} (128)
\p{Lisu} \p{Script_Extensions=Lisu} (48)
\p{Ll} \p{Lowercase_Letter} (=
\p{General_Category=Lowercase_Letter})
(/i= General_Category=Cased_Letter)
(2063)
\p{Lm} \p{Modifier_Letter} (=
\p{General_Category=Modifier_Letter})
(249)
\p{Lo} \p{Other_Letter} (= \p{General_Category=
Other_Letter}) (112_721)
\p{LOE} \p{Logical_Order_Exception} (=
\p{Logical_Order_Exception=Y}) (19)
\p{LOE: *} \p{Logical_Order_Exception: *}
\p{Logical_Order_Exception} \p{Logical_Order_Exception=Y} (Short:
\p{LOE}) (19)
\p{Logical_Order_Exception: N*} (Short: \p{LOE=N}, \P{LOE})
(1_114_093 plus all above-Unicode code
points)
\p{Logical_Order_Exception: Y*} (Short: \p{LOE=Y}, \p{LOE}) (19)
X \p{Low_Surrogates} \p{Block=Low_Surrogates} (1024)
\p{Lower} \p{XPosixLower} (= \p{Lowercase=Y}) (/i=
Cased=Yes) (2252)
\p{Lower: *} \p{Lowercase: *}
\p{Lowercase} \p{XPosixLower} (= \p{Lowercase=Y}) (/i=
Cased=Yes) (2252)
\p{Lowercase: N*} (Short: \p{Lower=N}, \P{Lower}; /i= Cased=
No) (1_111_860 plus all above-Unicode
code points)
\p{Lowercase: Y*} (Short: \p{Lower=Y}, \p{Lower}; /i= Cased=
Yes) (2252)
\p{Lowercase_Letter} \p{General_Category=Lowercase_Letter}
(Short: \p{Ll}; /i= General_Category=
Cased_Letter) (2063)
\p{Lt} \p{Titlecase_Letter} (=
\p{General_Category=Titlecase_Letter})
(/i= General_Category=Cased_Letter) (31)
\p{Lu} \p{Uppercase_Letter} (=
\p{General_Category=Uppercase_Letter})
(/i= General_Category=Cased_Letter)
(1702)
\p{Lyci} \p{Lycian} (= \p{Script_Extensions=
Lycian}) (NOT \p{Block=Lycian}) (29)
\p{Lycian} \p{Script_Extensions=Lycian} (Short:
\p{Lyci}; NOT \p{Block=Lycian}) (29)
\p{Lydi} \p{Lydian} (= \p{Script_Extensions=
Lydian}) (NOT \p{Block=Lydian}) (27)
\p{Lydian} \p{Script_Extensions=Lydian} (Short:
\p{Lydi}; NOT \p{Block=Lydian}) (27)
\p{M} \pM \p{Mark} (= \p{General_Category=Mark})
(2097)
\p{Mahajani} \p{Script_Extensions=Mahajani} (Short:
\p{Mahj}; NOT \p{Block=Mahajani}) (61)
\p{Mahj} \p{Mahajani} (= \p{Script_Extensions=
Mahajani}) (NOT \p{Block=Mahajani}) (61)
X \p{Mahjong} \p{Mahjong_Tiles} (= \p{Block=
Mahjong_Tiles}) (48)
X \p{Mahjong_Tiles} \p{Block=Mahjong_Tiles} (Short:
\p{InMahjong}) (48)
\p{Malayalam} \p{Script_Extensions=Malayalam} (Short:
\p{Mlym}; NOT \p{Block=Malayalam}) (119)
\p{Mand} \p{Mandaic} (= \p{Script_Extensions=
Mandaic}) (NOT \p{Block=Mandaic}) (30)
\p{Mandaic} \p{Script_Extensions=Mandaic} (Short:
\p{Mand}; NOT \p{Block=Mandaic}) (30)
\p{Mani} \p{Manichaean} (= \p{Script_Extensions=
Manichaean}) (NOT \p{Block=Manichaean})
(52)
\p{Manichaean} \p{Script_Extensions=Manichaean} (Short:
\p{Mani}; NOT \p{Block=Manichaean}) (52)
\p{Marc} \p{Marchen} (= \p{Script_Extensions=
Marchen}) (NOT \p{Block=Marchen}) (68)
\p{Marchen} \p{Script_Extensions=Marchen} (Short:
\p{Marc}; NOT \p{Block=Marchen}) (68)
\p{Mark} \p{General_Category=Mark} (Short: \p{M})
(2097)
\p{Math} \p{Math=Y} (2310)
\p{Math: N*} (Single: \P{Math}) (1_111_802 plus all
above-Unicode code points)
\p{Math: Y*} (Single: \p{Math}) (2310)
X \p{Math_Alphanum} \p{Mathematical_Alphanumeric_Symbols} (=
\p{Block=
Mathematical_Alphanumeric_Symbols})
(1024)
X \p{Math_Operators} \p{Mathematical_Operators} (= \p{Block=
Mathematical_Operators}) (256)
\p{Math_Symbol} \p{General_Category=Math_Symbol} (Short:
\p{Sm}) (948)
X \p{Mathematical_Alphanumeric_Symbols} \p{Block=
Mathematical_Alphanumeric_Symbols}
(Short: \p{InMathAlphanum}) (1024)
X \p{Mathematical_Operators} \p{Block=Mathematical_Operators}
(Short: \p{InMathOperators}) (256)
\p{Mc} \p{Spacing_Mark} (= \p{General_Category=
Spacing_Mark}) (394)
\p{Me} \p{Enclosing_Mark} (= \p{General_Category=
Enclosing_Mark}) (13)
\p{Meetei_Mayek} \p{Script_Extensions=Meetei_Mayek} (Short:
\p{Mtei}; NOT \p{Block=Meetei_Mayek})
(79)
X \p{Meetei_Mayek_Ext} \p{Meetei_Mayek_Extensions} (= \p{Block=
Meetei_Mayek_Extensions}) (32)
X \p{Meetei_Mayek_Extensions} \p{Block=Meetei_Mayek_Extensions}
(Short: \p{InMeeteiMayekExt}) (32)
\p{Mend} \p{Mende_Kikakui} (= \p{Script_Extensions=
Mende_Kikakui}) (NOT \p{Block=
Mende_Kikakui}) (213)
\p{Mende_Kikakui} \p{Script_Extensions=Mende_Kikakui}
(Short: \p{Mend}; NOT \p{Block=
Mende_Kikakui}) (213)
\p{Merc} \p{Meroitic_Cursive} (=
\p{Script_Extensions=Meroitic_Cursive})
(NOT \p{Block=Meroitic_Cursive}) (90)
\p{Mero} \p{Meroitic_Hieroglyphs} (=
\p{Script_Extensions=
Meroitic_Hieroglyphs}) (32)
\p{Meroitic_Cursive} \p{Script_Extensions=Meroitic_Cursive}
(Short: \p{Merc}; NOT \p{Block=
Meroitic_Cursive}) (90)
\p{Meroitic_Hieroglyphs} \p{Script_Extensions=
Meroitic_Hieroglyphs} (Short: \p{Mero})
(32)
\p{Miao} \p{Script_Extensions=Miao} (NOT \p{Block=
Miao}) (133)
X \p{Misc_Arrows} \p{Miscellaneous_Symbols_And_Arrows} (=
\p{Block=
Miscellaneous_Symbols_And_Arrows}) (256)
X \p{Misc_Math_Symbols_A} \p{Miscellaneous_Mathematical_Symbols_A}
(= \p{Block=
Miscellaneous_Mathematical_Symbols_A})
(48)
X \p{Misc_Math_Symbols_B} \p{Miscellaneous_Mathematical_Symbols_B}
(= \p{Block=
Miscellaneous_Mathematical_Symbols_B})
(128)
X \p{Misc_Pictographs} \p{Miscellaneous_Symbols_And_Pictographs}
(= \p{Block=
Miscellaneous_Symbols_And_Pictographs})
(768)
X \p{Misc_Symbols} \p{Miscellaneous_Symbols} (= \p{Block=
Miscellaneous_Symbols}) (256)
X \p{Misc_Technical} \p{Miscellaneous_Technical} (= \p{Block=
Miscellaneous_Technical}) (256)
X \p{Miscellaneous_Mathematical_Symbols_A} \p{Block=
Miscellaneous_Mathematical_Symbols_A}
(Short: \p{InMiscMathSymbolsA}) (48)
X \p{Miscellaneous_Mathematical_Symbols_B} \p{Block=
Miscellaneous_Mathematical_Symbols_B}
(Short: \p{InMiscMathSymbolsB}) (128)
X \p{Miscellaneous_Symbols} \p{Block=Miscellaneous_Symbols} (Short:
\p{InMiscSymbols}) (256)
X \p{Miscellaneous_Symbols_And_Arrows} \p{Block=
Miscellaneous_Symbols_And_Arrows}
(Short: \p{InMiscArrows}) (256)
X \p{Miscellaneous_Symbols_And_Pictographs} \p{Block=
Miscellaneous_Symbols_And_Pictographs}
(Short: \p{InMiscPictographs}) (768)
X \p{Miscellaneous_Technical} \p{Block=Miscellaneous_Technical}
(Short: \p{InMiscTechnical}) (256)
\p{Mlym} \p{Malayalam} (= \p{Script_Extensions=
Malayalam}) (NOT \p{Block=Malayalam})
(119)
\p{Mn} \p{Nonspacing_Mark} (=
\p{General_Category=Nonspacing_Mark})
(1690)
\p{Modi} \p{Script_Extensions=Modi} (NOT \p{Block=
Modi}) (89)
\p{Modifier_Letter} \p{General_Category=Modifier_Letter}
(Short: \p{Lm}) (249)
X \p{Modifier_Letters} \p{Spacing_Modifier_Letters} (= \p{Block=
Spacing_Modifier_Letters}) (80)
\p{Modifier_Symbol} \p{General_Category=Modifier_Symbol}
(Short: \p{Sk}) (121)
X \p{Modifier_Tone_Letters} \p{Block=Modifier_Tone_Letters} (32)
\p{Mong} \p{Mongolian} (= \p{Script_Extensions=
Mongolian}) (NOT \p{Block=Mongolian})
(169)
\p{Mongolian} \p{Script_Extensions=Mongolian} (Short:
\p{Mong}; NOT \p{Block=Mongolian}) (169)
X \p{Mongolian_Sup} \p{Mongolian_Supplement} (= \p{Block=
Mongolian_Supplement}) (32)
X \p{Mongolian_Supplement} \p{Block=Mongolian_Supplement} (Short:
\p{InMongolianSup}) (32)
\p{Mro} \p{Script_Extensions=Mro} (NOT \p{Block=
Mro}) (43)
\p{Mroo} \p{Mro} (= \p{Script_Extensions=Mro}) (NOT
\p{Block=Mro}) (43)
\p{Mtei} \p{Meetei_Mayek} (= \p{Script_Extensions=
Meetei_Mayek}) (NOT \p{Block=
Meetei_Mayek}) (79)
\p{Mult} \p{Multani} (= \p{Script_Extensions=
Multani}) (NOT \p{Block=Multani}) (48)
\p{Multani} \p{Script_Extensions=Multani} (Short:
\p{Mult}; NOT \p{Block=Multani}) (48)
X \p{Music} \p{Musical_Symbols} (= \p{Block=
Musical_Symbols}) (256)
X \p{Musical_Symbols} \p{Block=Musical_Symbols} (Short:
\p{InMusic}) (256)
\p{Myanmar} \p{Script_Extensions=Myanmar} (Short:
\p{Mymr}; NOT \p{Block=Myanmar}) (224)
X \p{Myanmar_Ext_A} \p{Myanmar_Extended_A} (= \p{Block=
Myanmar_Extended_A}) (32)
X \p{Myanmar_Ext_B} \p{Myanmar_Extended_B} (= \p{Block=
Myanmar_Extended_B}) (32)
X \p{Myanmar_Extended_A} \p{Block=Myanmar_Extended_A} (Short:
\p{InMyanmarExtA}) (32)
X \p{Myanmar_Extended_B} \p{Block=Myanmar_Extended_B} (Short:
\p{InMyanmarExtB}) (32)
\p{Mymr} \p{Myanmar} (= \p{Script_Extensions=
Myanmar}) (NOT \p{Block=Myanmar}) (224)
\p{N} \pN \p{Number} (= \p{General_Category=Number})
(1492)
\p{Nabataean} \p{Script_Extensions=Nabataean} (Short:
\p{Nbat}; NOT \p{Block=Nabataean}) (40)
\p{Narb} \p{Old_North_Arabian} (=
\p{Script_Extensions=Old_North_Arabian})
(32)
X \p{NB} \p{No_Block} (= \p{Block=No_Block})
(842_320 plus all above-Unicode code
points)
\p{Nbat} \p{Nabataean} (= \p{Script_Extensions=
Nabataean}) (NOT \p{Block=Nabataean})
(40)
\p{NChar} \p{Noncharacter_Code_Point} (=
\p{Noncharacter_Code_Point=Y}) (66)
\p{NChar: *} \p{Noncharacter_Code_Point: *}
\p{Nd} \p{XPosixDigit} (= \p{General_Category=
Decimal_Number}) (580)
\p{New_Tai_Lue} \p{Script_Extensions=New_Tai_Lue} (Short:
\p{Talu}; NOT \p{Block=New_Tai_Lue}) (83)
\p{Newa} \p{Script_Extensions=Newa} (NOT \p{Block=
Newa}) (92)
\p{NFC_QC: *} \p{NFC_Quick_Check: *}
\p{NFC_Quick_Check: M} \p{NFC_Quick_Check=Maybe} (110)
\p{NFC_Quick_Check: Maybe} (Short: \p{NFCQC=M}) (110)
\p{NFC_Quick_Check: N} \p{NFC_Quick_Check=No} (NOT
\P{NFC_Quick_Check} NOR \P{NFC_QC})
(1120)
\p{NFC_Quick_Check: No} (Short: \p{NFCQC=N}; NOT
\P{NFC_Quick_Check} NOR \P{NFC_QC})
(1120)
\p{NFC_Quick_Check: Y} \p{NFC_Quick_Check=Yes} (NOT
\p{NFC_Quick_Check} NOR \p{NFC_QC})
(1_112_882 plus all above-Unicode code
points)
\p{NFC_Quick_Check: Yes} (Short: \p{NFCQC=Y}; NOT
\p{NFC_Quick_Check} NOR \p{NFC_QC})
(1_112_882 plus all above-Unicode code
points)
\p{NFD_QC: *} \p{NFD_Quick_Check: *}
\p{NFD_Quick_Check: N} \p{NFD_Quick_Check=No} (NOT
\P{NFD_Quick_Check} NOR \P{NFD_QC})
(13_232)
\p{NFD_Quick_Check: No} (Short: \p{NFDQC=N}; NOT
\P{NFD_Quick_Check} NOR \P{NFD_QC})
(13_232)
\p{NFD_Quick_Check: Y} \p{NFD_Quick_Check=Yes} (NOT
\p{NFD_Quick_Check} NOR \p{NFD_QC})
(1_100_880 plus all above-Unicode code
points)
\p{NFD_Quick_Check: Yes} (Short: \p{NFDQC=Y}; NOT
\p{NFD_Quick_Check} NOR \p{NFD_QC})
(1_100_880 plus all above-Unicode code
points)
\p{NFKC_QC: *} \p{NFKC_Quick_Check: *}
\p{NFKC_Quick_Check: M} \p{NFKC_Quick_Check=Maybe} (110)
\p{NFKC_Quick_Check: Maybe} (Short: \p{NFKCQC=M}) (110)
\p{NFKC_Quick_Check: N} \p{NFKC_Quick_Check=No} (NOT
\P{NFKC_Quick_Check} NOR \P{NFKC_QC})
(4794)
\p{NFKC_Quick_Check: No} (Short: \p{NFKCQC=N}; NOT
\P{NFKC_Quick_Check} NOR \P{NFKC_QC})
(4794)
\p{NFKC_Quick_Check: Y} \p{NFKC_Quick_Check=Yes} (NOT
\p{NFKC_Quick_Check} NOR \p{NFKC_QC})
(1_109_208 plus all above-Unicode code
points)
\p{NFKC_Quick_Check: Yes} (Short: \p{NFKCQC=Y}; NOT
\p{NFKC_Quick_Check} NOR \p{NFKC_QC})
(1_109_208 plus all above-Unicode code
points)
\p{NFKD_QC: *} \p{NFKD_Quick_Check: *}
\p{NFKD_Quick_Check: N} \p{NFKD_Quick_Check=No} (NOT
\P{NFKD_Quick_Check} NOR \P{NFKD_QC})
(16_894)
\p{NFKD_Quick_Check: No} (Short: \p{NFKDQC=N}; NOT
\P{NFKD_Quick_Check} NOR \P{NFKD_QC})
(16_894)
\p{NFKD_Quick_Check: Y} \p{NFKD_Quick_Check=Yes} (NOT
\p{NFKD_Quick_Check} NOR \p{NFKD_QC})
(1_097_218 plus all above-Unicode code
points)
\p{NFKD_Quick_Check: Yes} (Short: \p{NFKDQC=Y}; NOT
\p{NFKD_Quick_Check} NOR \p{NFKD_QC})
(1_097_218 plus all above-Unicode code
points)
\p{Nko} \p{Script_Extensions=Nko} (NOT \p{NKo})
(59)
\p{Nkoo} \p{Nko} (= \p{Script_Extensions=Nko}) (NOT
\p{NKo}) (59)
\p{Nl} \p{Letter_Number} (= \p{General_Category=
Letter_Number}) (236)
\p{No} \p{Other_Number} (= \p{General_Category=
Other_Number}) (676)
X \p{No_Block} \p{Block=No_Block} (Short: \p{InNB})
(842_320 plus all above-Unicode code
points)
\p{Noncharacter_Code_Point} \p{Noncharacter_Code_Point=Y} (Short:
\p{NChar}) (66)
\p{Noncharacter_Code_Point: N*} (Short: \p{NChar=N}, \P{NChar})
(1_114_046 plus all above-Unicode code
points)
\p{Noncharacter_Code_Point: Y*} (Short: \p{NChar=Y}, \p{NChar})
(66)
\p{Nonspacing_Mark} \p{General_Category=Nonspacing_Mark}
(Short: \p{Mn}) (1690)
\p{Nt: *} \p{Numeric_Type: *}
\p{Number} \p{General_Category=Number} (Short: \p{N})
(1492)
X \p{Number_Forms} \p{Block=Number_Forms} (64)
\p{Numeric_Type: De} \p{Numeric_Type=Decimal} (580)
\p{Numeric_Type: Decimal} (Short: \p{Nt=De}) (580)
\p{Numeric_Type: Di} \p{Numeric_Type=Digit} (128)
\p{Numeric_Type: Digit} (Short: \p{Nt=Di}) (128)
\p{Numeric_Type: None} (Short: \p{Nt=None}) (1_112_539 plus all
above-Unicode code points)
\p{Numeric_Type: Nu} \p{Numeric_Type=Numeric} (865)
\p{Numeric_Type: Numeric} (Short: \p{Nt=Nu}) (865)
T \p{Numeric_Value: -1/2} (Short: \p{Nv=-1/2}) (1)
T \p{Numeric_Value: 0} (Short: \p{Nv=0}) (74)
T \p{Numeric_Value: 1/160} (Short: \p{Nv=1/160}) (1)
T \p{Numeric_Value: 1/40} (Short: \p{Nv=1/40}) (1)
T \p{Numeric_Value: 3/80} (Short: \p{Nv=3/80}) (1)
T \p{Numeric_Value: 1/20} (Short: \p{Nv=1/20}) (1)
T \p{Numeric_Value: 1/16} (Short: \p{Nv=1/16}) (4)
T \p{Numeric_Value: 1/12} (Short: \p{Nv=1/12}) (1)
T \p{Numeric_Value: 1/10} (Short: \p{Nv=1/10}) (2)
T \p{Numeric_Value: 1/9} (Short: \p{Nv=1/9}) (1)
T \p{Numeric_Value: 1/8} (Short: \p{Nv=1/8}) (6)
T \p{Numeric_Value: 1/7} (Short: \p{Nv=1/7}) (1)
T \p{Numeric_Value: 3/20} (Short: \p{Nv=3/20}) (1)
T \p{Numeric_Value: 1/6} (Short: \p{Nv=1/6}) (3)
T \p{Numeric_Value: 3/16} (Short: \p{Nv=3/16}) (4)
T \p{Numeric_Value: 1/5} (Short: \p{Nv=1/5}) (2)
T \p{Numeric_Value: 1/4} (Short: \p{Nv=1/4}) (12)
T \p{Numeric_Value: 1/3} (Short: \p{Nv=1/3}) (6)
T \p{Numeric_Value: 3/8} (Short: \p{Nv=3/8}) (1)
T \p{Numeric_Value: 2/5} (Short: \p{Nv=2/5}) (1)
T \p{Numeric_Value: 5/12} (Short: \p{Nv=5/12}) (1)
T \p{Numeric_Value: 1/2} (Short: \p{Nv=1/2}) (13)
T \p{Numeric_Value: 7/12} (Short: \p{Nv=7/12}) (1)
T \p{Numeric_Value: 3/5} (Short: \p{Nv=3/5}) (1)
T \p{Numeric_Value: 5/8} (Short: \p{Nv=5/8}) (1)
T \p{Numeric_Value: 2/3} (Short: \p{Nv=2/3}) (7)
T \p{Numeric_Value: 3/4} (Short: \p{Nv=3/4}) (7)
T \p{Numeric_Value: 4/5} (Short: \p{Nv=4/5}) (1)
T \p{Numeric_Value: 5/6} (Short: \p{Nv=5/6}) (3)
T \p{Numeric_Value: 7/8} (Short: \p{Nv=7/8}) (1)
T \p{Numeric_Value: 11/12} (Short: \p{Nv=11/12}) (1)
T \p{Numeric_Value: 1} (Short: \p{Nv=1}) (121)
T \p{Numeric_Value: 3/2} (Short: \p{Nv=3/2}) (1)
T \p{Numeric_Value: 2} (Short: \p{Nv=2}) (121)
T \p{Numeric_Value: 5/2} (Short: \p{Nv=5/2}) (1)
T \p{Numeric_Value: 3} (Short: \p{Nv=3}) (123)
T \p{Numeric_Value: 7/2} (Short: \p{Nv=7/2}) (1)
T \p{Numeric_Value: 4} (Short: \p{Nv=4}) (115)
T \p{Numeric_Value: 9/2} (Short: \p{Nv=9/2}) (1)
T \p{Numeric_Value: 5} (Short: \p{Nv=5}) (113)
T \p{Numeric_Value: 11/2} (Short: \p{Nv=11/2}) (1)
T \p{Numeric_Value: 6} (Short: \p{Nv=6}) (100)
T \p{Numeric_Value: 13/2} (Short: \p{Nv=13/2}) (1)
T \p{Numeric_Value: 7} (Short: \p{Nv=7}) (99)
T \p{Numeric_Value: 15/2} (Short: \p{Nv=15/2}) (1)
T \p{Numeric_Value: 8} (Short: \p{Nv=8}) (95)
T \p{Numeric_Value: 17/2} (Short: \p{Nv=17/2}) (1)
T \p{Numeric_Value: 9} (Short: \p{Nv=9}) (99)
T \p{Numeric_Value: 10} (Short: \p{Nv=10}) (54)
T \p{Numeric_Value: 11} (Short: \p{Nv=11}) (6)
T \p{Numeric_Value: 12} (Short: \p{Nv=12}) (6)
T \p{Numeric_Value: 13} (Short: \p{Nv=13}) (4)
T \p{Numeric_Value: 14} (Short: \p{Nv=14}) (4)
T \p{Numeric_Value: 15} (Short: \p{Nv=15}) (4)
T \p{Numeric_Value: 16} (Short: \p{Nv=16}) (5)
T \p{Numeric_Value: 17} (Short: \p{Nv=17}) (5)
T \p{Numeric_Value: 18} (Short: \p{Nv=18}) (5)
T \p{Numeric_Value: 19} (Short: \p{Nv=19}) (5)
T \p{Numeric_Value: 20} (Short: \p{Nv=20}) (31)
T \p{Numeric_Value: 21} (Short: \p{Nv=21}) (1)
T \p{Numeric_Value: 22} (Short: \p{Nv=22}) (1)
T \p{Numeric_Value: 23} (Short: \p{Nv=23}) (1)
T \p{Numeric_Value: 24} (Short: \p{Nv=24}) (1)
T \p{Numeric_Value: 25} (Short: \p{Nv=25}) (1)
T \p{Numeric_Value: 26} (Short: \p{Nv=26}) (1)
T \p{Numeric_Value: 27} (Short: \p{Nv=27}) (1)
T \p{Numeric_Value: 28} (Short: \p{Nv=28}) (1)
T \p{Numeric_Value: 29} (Short: \p{Nv=29}) (1)
T \p{Numeric_Value: 30} (Short: \p{Nv=30}) (16)
T \p{Numeric_Value: 31} (Short: \p{Nv=31}) (1)
T \p{Numeric_Value: 32} (Short: \p{Nv=32}) (1)
T \p{Numeric_Value: 33} (Short: \p{Nv=33}) (1)
T \p{Numeric_Value: 34} (Short: \p{Nv=34}) (1)
T \p{Numeric_Value: 35} (Short: \p{Nv=35}) (1)
T \p{Numeric_Value: 36} (Short: \p{Nv=36}) (1)
T \p{Numeric_Value: 37} (Short: \p{Nv=37}) (1)
T \p{Numeric_Value: 38} (Short: \p{Nv=38}) (1)
T \p{Numeric_Value: 39} (Short: \p{Nv=39}) (1)
T \p{Numeric_Value: 40} (Short: \p{Nv=40}) (16)
T \p{Numeric_Value: 41} (Short: \p{Nv=41}) (1)
T \p{Numeric_Value: 42} (Short: \p{Nv=42}) (1)
T \p{Numeric_Value: 43} (Short: \p{Nv=43}) (1)
T \p{Numeric_Value: 44} (Short: \p{Nv=44}) (1)
T \p{Numeric_Value: 45} (Short: \p{Nv=45}) (1)
T \p{Numeric_Value: 46} (Short: \p{Nv=46}) (1)
T \p{Numeric_Value: 47} (Short: \p{Nv=47}) (1)
T \p{Numeric_Value: 48} (Short: \p{Nv=48}) (1)
T \p{Numeric_Value: 49} (Short: \p{Nv=49}) (1)
T \p{Numeric_Value: 50} (Short: \p{Nv=50}) (27)
T \p{Numeric_Value: 60} (Short: \p{Nv=60}) (11)
T \p{Numeric_Value: 70} (Short: \p{Nv=70}) (11)
T \p{Numeric_Value: 80} (Short: \p{Nv=80}) (10)
T \p{Numeric_Value: 90} (Short: \p{Nv=90}) (10)
T \p{Numeric_Value: 100} (Short: \p{Nv=100}) (30)
T \p{Numeric_Value: 200} (Short: \p{Nv=200}) (4)
T \p{Numeric_Value: 300} (Short: \p{Nv=300}) (5)
T \p{Numeric_Value: 400} (Short: \p{Nv=400}) (4)
T \p{Numeric_Value: 500} (Short: \p{Nv=500}) (14)
T \p{Numeric_Value: 600} (Short: \p{Nv=600}) (4)
T \p{Numeric_Value: 700} (Short: \p{Nv=700}) (4)
T \p{Numeric_Value: 800} (Short: \p{Nv=800}) (4)
T \p{Numeric_Value: 900} (Short: \p{Nv=900}) (5)
T \p{Numeric_Value: 1000} (Short: \p{Nv=1000}) (20)
T \p{Numeric_Value: 2000} (Short: \p{Nv=2000}) (2)
T \p{Numeric_Value: 3000} (Short: \p{Nv=3000}) (2)
T \p{Numeric_Value: 4000} (Short: \p{Nv=4000}) (2)
T \p{Numeric_Value: 5000} (Short: \p{Nv=5000}) (6)
T \p{Numeric_Value: 6000} (Short: \p{Nv=6000}) (2)
T \p{Numeric_Value: 7000} (Short: \p{Nv=7000}) (2)
T \p{Numeric_Value: 8000} (Short: \p{Nv=8000}) (2)
T \p{Numeric_Value: 9000} (Short: \p{Nv=9000}) (2)
T \p{Numeric_Value: 10000} (= 1.0e+04) (Short: \p{Nv=10000}) (9)
T \p{Numeric_Value: 20000} (= 2.0e+04) (Short: \p{Nv=20000}) (2)
T \p{Numeric_Value: 30000} (= 3.0e+04) (Short: \p{Nv=30000}) (2)
T \p{Numeric_Value: 40000} (= 4.0e+04) (Short: \p{Nv=40000}) (2)
T \p{Numeric_Value: 50000} (= 5.0e+04) (Short: \p{Nv=50000}) (5)
T \p{Numeric_Value: 60000} (= 6.0e+04) (Short: \p{Nv=60000}) (2)
T \p{Numeric_Value: 70000} (= 7.0e+04) (Short: \p{Nv=70000}) (2)
T \p{Numeric_Value: 80000} (= 8.0e+04) (Short: \p{Nv=80000}) (2)
T \p{Numeric_Value: 90000} (= 9.0e+04) (Short: \p{Nv=90000}) (2)
T \p{Numeric_Value: 100000} (= 1.0e+05) (Short: \p{Nv=100000}) (2)
T \p{Numeric_Value: 200000} (= 2.0e+05) (Short: \p{Nv=200000}) (1)
T \p{Numeric_Value: 216000} (= 2.2e+05) (Short: \p{Nv=216000}) (1)
T \p{Numeric_Value: 300000} (= 3.0e+05) (Short: \p{Nv=300000}) (1)
T \p{Numeric_Value: 400000} (= 4.0e+05) (Short: \p{Nv=400000}) (1)
T \p{Numeric_Value: 432000} (= 4.3e+05) (Short: \p{Nv=432000}) (1)
T \p{Numeric_Value: 500000} (= 5.0e+05) (Short: \p{Nv=500000}) (1)
T \p{Numeric_Value: 600000} (= 6.0e+05) (Short: \p{Nv=600000}) (1)
T \p{Numeric_Value: 700000} (= 7.0e+05) (Short: \p{Nv=700000}) (1)
T \p{Numeric_Value: 800000} (= 8.0e+05) (Short: \p{Nv=800000}) (1)
T \p{Numeric_Value: 900000} (= 9.0e+05) (Short: \p{Nv=900000}) (1)
T \p{Numeric_Value: 1000000} (= 1.0e+06) (Short: \p{Nv=1000000}) (1)
T \p{Numeric_Value: 100000000} (= 1.0e+08) (Short: \p{Nv=100000000})
(3)
T \p{Numeric_Value: 10000000000} (= 1.0e+10) (Short: \p{Nv=
10000000000}) (1)
T \p{Numeric_Value: 1000000000000} (= 1.0e+12) (Short: \p{Nv=
1000000000000}) (2)
\p{Numeric_Value: NaN} (Short: \p{Nv=NaN}) (1_112_539 plus all
above-Unicode code points)
\p{Nv: *} \p{Numeric_Value: *}
X \p{OCR} \p{Optical_Character_Recognition} (=
\p{Block=Optical_Character_Recognition})
(32)
\p{Ogam} \p{Ogham} (= \p{Script_Extensions=Ogham})
(NOT \p{Block=Ogham}) (29)
\p{Ogham} \p{Script_Extensions=Ogham} (Short:
\p{Ogam}; NOT \p{Block=Ogham}) (29)
\p{Ol_Chiki} \p{Script_Extensions=Ol_Chiki} (Short:
\p{Olck}) (48)
\p{Olck} \p{Ol_Chiki} (= \p{Script_Extensions=
Ol_Chiki}) (48)
\p{Old_Hungarian} \p{Script_Extensions=Old_Hungarian}
(Short: \p{Hung}; NOT \p{Block=
Old_Hungarian}) (108)
\p{Old_Italic} \p{Script_Extensions=Old_Italic} (Short:
\p{Ital}; NOT \p{Block=Old_Italic}) (36)
\p{Old_North_Arabian} \p{Script_Extensions=Old_North_Arabian}
(Short: \p{Narb}) (32)
\p{Old_Permic} \p{Script_Extensions=Old_Permic} (Short:
\p{Perm}; NOT \p{Block=Old_Permic}) (44)
\p{Old_Persian} \p{Script_Extensions=Old_Persian} (Short:
\p{Xpeo}; NOT \p{Block=Old_Persian}) (50)
\p{Old_South_Arabian} \p{Script_Extensions=Old_South_Arabian}
(Short: \p{Sarb}) (32)
\p{Old_Turkic} \p{Script_Extensions=Old_Turkic} (Short:
\p{Orkh}; NOT \p{Block=Old_Turkic}) (73)
\p{Open_Punctuation} \p{General_Category=Open_Punctuation}
(Short: \p{Ps}) (75)
X \p{Optical_Character_Recognition} \p{Block=
Optical_Character_Recognition} (Short:
\p{InOCR}) (32)
\p{Oriya} \p{Script_Extensions=Oriya} (Short:
\p{Orya}; NOT \p{Block=Oriya}) (94)
\p{Orkh} \p{Old_Turkic} (= \p{Script_Extensions=
Old_Turkic}) (NOT \p{Block=Old_Turkic})
(73)
X \p{Ornamental_Dingbats} \p{Block=Ornamental_Dingbats} (48)
\p{Orya} \p{Oriya} (= \p{Script_Extensions=Oriya})
(NOT \p{Block=Oriya}) (94)
\p{Osage} \p{Script_Extensions=Osage} (Short:
\p{Osge}; NOT \p{Block=Osage}) (72)
\p{Osge} \p{Osage} (= \p{Script_Extensions=Osage})
(NOT \p{Block=Osage}) (72)
\p{Osma} \p{Osmanya} (= \p{Script_Extensions=
Osmanya}) (NOT \p{Block=Osmanya}) (40)
\p{Osmanya} \p{Script_Extensions=Osmanya} (Short:
\p{Osma}; NOT \p{Block=Osmanya}) (40)
\p{Other} \p{General_Category=Other} (Short: \p{C})
(986_091 plus all above-Unicode code
points)
\p{Other_Letter} \p{General_Category=Other_Letter} (Short:
\p{Lo}) (112_721)
\p{Other_Number} \p{General_Category=Other_Number} (Short:
\p{No}) (676)
\p{Other_Punctuation} \p{General_Category=Other_Punctuation}
(Short: \p{Po}) (544)
\p{Other_Symbol} \p{General_Category=Other_Symbol} (Short:
\p{So}) (5777)
\p{P} \pP \p{Punct} (= \p{General_Category=
Punctuation}) (NOT
\p{General_Punctuation}) (748)
\p{Pahawh_Hmong} \p{Script_Extensions=Pahawh_Hmong} (Short:
\p{Hmng}; NOT \p{Block=Pahawh_Hmong})
(127)
\p{Palm} \p{Palmyrene} (= \p{Script_Extensions=
Palmyrene}) (32)
\p{Palmyrene} \p{Script_Extensions=Palmyrene} (Short:
\p{Palm}) (32)
\p{Paragraph_Separator} \p{General_Category=Paragraph_Separator}
(Short: \p{Zp}) (1)
\p{Pat_Syn} \p{Pattern_Syntax} (= \p{Pattern_Syntax=
Y}) (2760)
\p{Pat_Syn: *} \p{Pattern_Syntax: *}
\p{Pat_WS} \p{Pattern_White_Space} (=
\p{Pattern_White_Space=Y}) (11)
\p{Pat_WS: *} \p{Pattern_White_Space: *}
\p{Pattern_Syntax} \p{Pattern_Syntax=Y} (Short: \p{PatSyn})
(2760)
\p{Pattern_Syntax: N*} (Short: \p{PatSyn=N}, \P{PatSyn})
(1_111_352 plus all above-Unicode code
points)
\p{Pattern_Syntax: Y*} (Short: \p{PatSyn=Y}, \p{PatSyn}) (2760)
\p{Pattern_White_Space} \p{Pattern_White_Space=Y} (Short:
\p{PatWS}) (11)
\p{Pattern_White_Space: N*} (Short: \p{PatWS=N}, \P{PatWS})
(1_114_101 plus all above-Unicode code
points)
\p{Pattern_White_Space: Y*} (Short: \p{PatWS=Y}, \p{PatWS}) (11)
\p{Pau_Cin_Hau} \p{Script_Extensions=Pau_Cin_Hau} (Short:
\p{Pauc}; NOT \p{Block=Pau_Cin_Hau}) (57)
\p{Pauc} \p{Pau_Cin_Hau} (= \p{Script_Extensions=
Pau_Cin_Hau}) (NOT \p{Block=
Pau_Cin_Hau}) (57)
\p{Pc} \p{Connector_Punctuation} (=
\p{General_Category=
Connector_Punctuation}) (10)
\p{PCM} \p{Prepended_Concatenation_Mark} (=
\p{Prepended_Concatenation_Mark=Y}) (10)
\p{PCM: *} \p{Prepended_Concatenation_Mark: *}
\p{Pd} \p{Dash_Punctuation} (=
\p{General_Category=Dash_Punctuation})
(24)
\p{Pe} \p{Close_Punctuation} (=
\p{General_Category=Close_Punctuation})
(73)
\p{PerlSpace} \p{PosixSpace} (6)
\p{PerlWord} \p{PosixWord} (63)
\p{Perm} \p{Old_Permic} (= \p{Script_Extensions=
Old_Permic}) (NOT \p{Block=Old_Permic})
(44)
\p{Pf} \p{Final_Punctuation} (=
\p{General_Category=Final_Punctuation})
(10)
\p{Phag} \p{Phags_Pa} (= \p{Script_Extensions=
Phags_Pa}) (NOT \p{Block=Phags_Pa}) (59)
\p{Phags_Pa} \p{Script_Extensions=Phags_Pa} (Short:
\p{Phag}; NOT \p{Block=Phags_Pa}) (59)
X \p{Phaistos} \p{Phaistos_Disc} (= \p{Block=
Phaistos_Disc}) (48)
X \p{Phaistos_Disc} \p{Block=Phaistos_Disc} (Short:
\p{InPhaistos}) (48)
\p{Phli} \p{Inscriptional_Pahlavi} (=
\p{Script_Extensions=
Inscriptional_Pahlavi}) (NOT \p{Block=
Inscriptional_Pahlavi}) (27)
\p{Phlp} \p{Psalter_Pahlavi} (=
\p{Script_Extensions=Psalter_Pahlavi})
(NOT \p{Block=Psalter_Pahlavi}) (30)
\p{Phnx} \p{Phoenician} (= \p{Script_Extensions=
Phoenician}) (NOT \p{Block=Phoenician})
(29)
\p{Phoenician} \p{Script_Extensions=Phoenician} (Short:
\p{Phnx}; NOT \p{Block=Phoenician}) (29)
X \p{Phonetic_Ext} \p{Phonetic_Extensions} (= \p{Block=
Phonetic_Extensions}) (128)
X \p{Phonetic_Ext_Sup} \p{Phonetic_Extensions_Supplement} (=
\p{Block=
Phonetic_Extensions_Supplement}) (64)
X \p{Phonetic_Extensions} \p{Block=Phonetic_Extensions} (Short:
\p{InPhoneticExt}) (128)
X \p{Phonetic_Extensions_Supplement} \p{Block=
Phonetic_Extensions_Supplement} (Short:
\p{InPhoneticExtSup}) (64)
\p{Pi} \p{Initial_Punctuation} (=
\p{General_Category=
Initial_Punctuation}) (12)
X \p{Playing_Cards} \p{Block=Playing_Cards} (96)
\p{Plrd} \p{Miao} (= \p{Script_Extensions=Miao})
(NOT \p{Block=Miao}) (133)
\p{Po} \p{Other_Punctuation} (=
\p{General_Category=Other_Punctuation})
(544)
\p{PosixAlnum} [A-Za-z0-9] (62)
\p{PosixAlpha} [A-Za-z] (52)
\p{PosixBlank} \t and ' ' (2)
\p{PosixCntrl} ASCII control characters: NUL, SOH, STX,
ETX, EOT, ENQ, ACK, BEL, BS, HT, LF, VT,
FF, CR, SO, SI, DLE, DC1, DC2, DC3, DC4,
NAK, SYN, ETB, CAN, EOM, SUB, ESC, FS,
GS, RS, US, and DEL (33)
\p{PosixDigit} [0-9] (10)
\p{PosixGraph} [-!"#$%&'()*+,./:;<=>?@[\\]^_`{|}~0-9A-Za-z]
(94)
\p{PosixLower} [a-z] (/i= PosixAlpha) (26)
\p{PosixPrint} [- 0-9A-Za-z!"#$%&'()*+,./:;<=>?@[\\]^_`{|}~]
(95)
\p{PosixPunct} [-!"#$%&'()*+,./:;<=>?@[\\]^_`{|}~] (32)
\p{PosixSpace} \t, \n, \cK, \f, \r, and ' '. (\cK is
vertical tab) (Short: \p{PerlSpace}) (6)
\p{PosixUpper} [A-Z] (/i= PosixAlpha) (26)
\p{PosixWord} \w, restricted to ASCII = [A-Za-z0-9_]
(Short: \p{PerlWord}) (63)
\p{PosixXDigit} \p{ASCII_Hex_Digit=Y} [0-9A-Fa-f] (Short:
\p{AHex}) (22)
\p{Prepended_Concatenation_Mark} \p{Prepended_Concatenation_Mark=
Y} (Short: \p{PCM}) (10)
\p{Prepended_Concatenation_Mark: N*} (Short: \p{PCM=N}, \P{PCM})
(1_114_102 plus all above-Unicode code
points)
\p{Prepended_Concatenation_Mark: Y*} (Short: \p{PCM=Y}, \p{PCM})
(10)
T \p{Present_In: 1.1} \p{Age=V1_1} (Short: \p{In=1.1}) (Perl
extension) (33_979)
T \p{Present_In: 2.0} Code point's usage introduced in version
2.0 or earlier (Short: \p{In=2.0}) (Perl
extension) (178_500)
T \p{Present_In: 2.1} Code point's usage introduced in version
2.1 or earlier (Short: \p{In=2.1}) (Perl
extension) (178_502)
T \p{Present_In: 3.0} Code point's usage introduced in version
3.0 or earlier (Short: \p{In=3.0}) (Perl
extension) (188_809)
T \p{Present_In: 3.1} Code point's usage introduced in version
3.1 or earlier (Short: \p{In=3.1}) (Perl
extension) (233_787)
T \p{Present_In: 3.2} Code point's usage introduced in version
3.2 or earlier (Short: \p{In=3.2}) (Perl
extension) (234_803)
T \p{Present_In: 4.0} Code point's usage introduced in version
4.0 or earlier (Short: \p{In=4.0}) (Perl
extension) (236_029)
T \p{Present_In: 4.1} Code point's usage introduced in version
4.1 or earlier (Short: \p{In=4.1}) (Perl
extension) (237_302)
T \p{Present_In: 5.0} Code point's usage introduced in version
5.0 or earlier (Short: \p{In=5.0}) (Perl
extension) (238_671)
T \p{Present_In: 5.1} Code point's usage introduced in version
5.1 or earlier (Short: \p{In=5.1}) (Perl
extension) (240_295)
T \p{Present_In: 5.2} Code point's usage introduced in version
5.2 or earlier (Short: \p{In=5.2}) (Perl
extension) (246_943)
T \p{Present_In: 6.0} Code point's usage introduced in version
6.0 or earlier (Short: \p{In=6.0}) (Perl
extension) (249_031)
T \p{Present_In: 6.1} Code point's usage introduced in version
6.1 or earlier (Short: \p{In=6.1}) (Perl
extension) (249_763)
T \p{Present_In: 6.2} Code point's usage introduced in version
6.2 or earlier (Short: \p{In=6.2}) (Perl
extension) (249_764)
T \p{Present_In: 6.3} Code point's usage introduced in version
6.3 or earlier (Short: \p{In=6.3}) (Perl
extension) (249_769)
T \p{Present_In: 7.0} Code point's usage introduced in version
7.0 or earlier (Short: \p{In=7.0}) (Perl
extension) (252_603)
T \p{Present_In: 8.0} Code point's usage introduced in version
8.0 or earlier (Short: \p{In=8.0}) (Perl
extension) (260_319)
T \p{Present_In: 9.0} Code point's usage introduced in version
9.0 or earlier (Short: \p{In=9.0}) (Perl
extension) (267_819)
\p{Present_In: Unassigned} \p{Age=Unassigned} (Short: \p{In=
Unassigned}) (Perl extension) (846_293
plus all above-Unicode code points)
\p{Print} \p{XPosixPrint} (265_638)
\p{Private_Use} \p{General_Category=Private_Use} (Short:
\p{Co}; NOT \p{Private_Use_Area})
(137_468)
X \p{Private_Use_Area} \p{Block=Private_Use_Area} (Short:
\p{InPUA}) (6400)
\p{Prti} \p{Inscriptional_Parthian} (=
\p{Script_Extensions=
Inscriptional_Parthian}) (NOT \p{Block=
Inscriptional_Parthian}) (30)
\p{Ps} \p{Open_Punctuation} (=
\p{General_Category=Open_Punctuation})
(75)
\p{Psalter_Pahlavi} \p{Script_Extensions=Psalter_Pahlavi}
(Short: \p{Phlp}; NOT \p{Block=
Psalter_Pahlavi}) (30)
X \p{PUA} \p{Private_Use_Area} (= \p{Block=
Private_Use_Area}) (6400)
\p{Punct} \p{General_Category=Punctuation} (Short:
\p{P}; NOT \p{General_Punctuation}) (748)
\p{Punctuation} \p{Punct} (= \p{General_Category=
Punctuation}) (NOT
\p{General_Punctuation}) (748)
\p{Qaac} \p{Coptic} (= \p{Script_Extensions=
Coptic}) (NOT \p{Block=Coptic}) (165)
\p{Qaai} \p{Inherited} (= \p{Script_Extensions=
Inherited}) (496)
\p{QMark} \p{Quotation_Mark} (= \p{Quotation_Mark=
Y}) (30)
\p{QMark: *} \p{Quotation_Mark: *}
\p{Quotation_Mark} \p{Quotation_Mark=Y} (Short: \p{QMark})
(30)
\p{Quotation_Mark: N*} (Short: \p{QMark=N}, \P{QMark}) (1_114_082
plus all above-Unicode code points)
\p{Quotation_Mark: Y*} (Short: \p{QMark=Y}, \p{QMark}) (30)
\p{Radical} \p{Radical=Y} (329)
\p{Radical: N*} (Single: \P{Radical}) (1_113_783 plus all
above-Unicode code points)
\p{Radical: Y*} (Single: \p{Radical}) (329)
\p{Rejang} \p{Script_Extensions=Rejang} (Short:
\p{Rjng}; NOT \p{Block=Rejang}) (37)
\p{Rjng} \p{Rejang} (= \p{Script_Extensions=
Rejang}) (NOT \p{Block=Rejang}) (37)
X \p{Rumi} \p{Rumi_Numeral_Symbols} (= \p{Block=
Rumi_Numeral_Symbols}) (32)
X \p{Rumi_Numeral_Symbols} \p{Block=Rumi_Numeral_Symbols} (Short:
\p{InRumi}) (32)
\p{Runic} \p{Script_Extensions=Runic} (Short:
\p{Runr}; NOT \p{Block=Runic}) (86)
\p{Runr} \p{Runic} (= \p{Script_Extensions=Runic})
(NOT \p{Block=Runic}) (86)
\p{S} \pS \p{Symbol} (= \p{General_Category=Symbol})
(6899)
\p{Samaritan} \p{Script_Extensions=Samaritan} (Short:
\p{Samr}; NOT \p{Block=Samaritan}) (61)
\p{Samr} \p{Samaritan} (= \p{Script_Extensions=
Samaritan}) (NOT \p{Block=Samaritan})
(61)
\p{Sarb} \p{Old_South_Arabian} (=
\p{Script_Extensions=Old_South_Arabian})
(32)
\p{Saur} \p{Saurashtra} (= \p{Script_Extensions=
Saurashtra}) (NOT \p{Block=Saurashtra})
(82)
\p{Saurashtra} \p{Script_Extensions=Saurashtra} (Short:
\p{Saur}; NOT \p{Block=Saurashtra}) (82)
\p{SB: *} \p{Sentence_Break: *}
\p{Sc} \p{Currency_Symbol} (=
\p{General_Category=Currency_Symbol})
(53)
\p{Sc: *} \p{Script: *}
\p{Script: Adlam} (Short: \p{Sc=Adlm}) (87)
\p{Script: Adlm} \p{Script=Adlam} (87)
\p{Script: Aghb} \p{Script=Caucasian_Albanian} (53)
\p{Script: Ahom} (Short: \p{Sc=Ahom}) (57)
\p{Script: Anatolian_Hieroglyphs} (Short: \p{Sc=Hluw}) (583)
\p{Script: Arab} \p{Script=Arabic} (1279)
\p{Script: Arabic} (Short: \p{Sc=Arab}) (1279)
\p{Script: Armenian} (Short: \p{Sc=Armn}) (93)
\p{Script: Armi} \p{Script=Imperial_Aramaic} (31)
\p{Script: Armn} \p{Script=Armenian} (93)
\p{Script: Avestan} (Short: \p{Sc=Avst}) (61)
\p{Script: Avst} \p{Script=Avestan} (61)
\p{Script: Bali} \p{Script=Balinese} (121)
\p{Script: Balinese} (Short: \p{Sc=Bali}) (121)
\p{Script: Bamu} \p{Script=Bamum} (657)
\p{Script: Bamum} (Short: \p{Sc=Bamu}) (657)
\p{Script: Bass} \p{Script=Bassa_Vah} (36)
\p{Script: Bassa_Vah} (Short: \p{Sc=Bass}) (36)
\p{Script: Batak} (Short: \p{Sc=Batk}) (56)
\p{Script: Batk} \p{Script=Batak} (56)
\p{Script: Beng} \p{Script=Bengali} (93)
\p{Script: Bengali} (Short: \p{Sc=Beng}) (93)
\p{Script: Bhaiksuki} (Short: \p{Sc=Bhks}) (97)
\p{Script: Bhks} \p{Script=Bhaiksuki} (97)
\p{Script: Bopo} \p{Script=Bopomofo} (70)
\p{Script: Bopomofo} (Short: \p{Sc=Bopo}) (70)
\p{Script: Brah} \p{Script=Brahmi} (109)
\p{Script: Brahmi} (Short: \p{Sc=Brah}) (109)
\p{Script: Brai} \p{Script=Braille} (256)
\p{Script: Braille} (Short: \p{Sc=Brai}) (256)
\p{Script: Bugi} \p{Script=Buginese} (30)
\p{Script: Buginese} (Short: \p{Sc=Bugi}) (30)
\p{Script: Buhd} \p{Script=Buhid} (20)
\p{Script: Buhid} (Short: \p{Sc=Buhd}) (20)
\p{Script: Cakm} \p{Script=Chakma} (67)
\p{Script: Canadian_Aboriginal} (Short: \p{Sc=Cans}) (710)
\p{Script: Cans} \p{Script=Canadian_Aboriginal} (710)
\p{Script: Cari} \p{Script=Carian} (49)
\p{Script: Carian} (Short: \p{Sc=Cari}) (49)
\p{Script: Caucasian_Albanian} (Short: \p{Sc=Aghb}) (53)
\p{Script: Chakma} (Short: \p{Sc=Cakm}) (67)
\p{Script: Cham} (Short: \p{Sc=Cham}) (83)
\p{Script: Cher} \p{Script=Cherokee} (172)
\p{Script: Cherokee} (Short: \p{Sc=Cher}) (172)
\p{Script: Common} (Short: \p{Sc=Zyyy}) (7279)
\p{Script: Copt} \p{Script=Coptic} (137)
\p{Script: Coptic} (Short: \p{Sc=Copt}) (137)
\p{Script: Cprt} \p{Script=Cypriot} (55)
\p{Script: Cuneiform} (Short: \p{Sc=Xsux}) (1234)
\p{Script: Cypriot} (Short: \p{Sc=Cprt}) (55)
\p{Script: Cyrillic} (Short: \p{Sc=Cyrl}) (443)
\p{Script: Cyrl} \p{Script=Cyrillic} (443)
\p{Script: Deseret} (Short: \p{Sc=Dsrt}) (80)
\p{Script: Deva} \p{Script=Devanagari} (154)
\p{Script: Devanagari} (Short: \p{Sc=Deva}) (154)
\p{Script: Dsrt} \p{Script=Deseret} (80)
\p{Script: Dupl} \p{Script=Duployan} (143)
\p{Script: Duployan} (Short: \p{Sc=Dupl}) (143)
\p{Script: Egyp} \p{Script=Egyptian_Hieroglyphs} (1071)
\p{Script: Egyptian_Hieroglyphs} (Short: \p{Sc=Egyp}) (1071)
\p{Script: Elba} \p{Script=Elbasan} (40)
\p{Script: Elbasan} (Short: \p{Sc=Elba}) (40)
\p{Script: Ethi} \p{Script=Ethiopic} (495)
\p{Script: Ethiopic} (Short: \p{Sc=Ethi}) (495)
\p{Script: Geor} \p{Script=Georgian} (127)
\p{Script: Georgian} (Short: \p{Sc=Geor}) (127)
\p{Script: Glag} \p{Script=Glagolitic} (132)
\p{Script: Glagolitic} (Short: \p{Sc=Glag}) (132)
\p{Script: Goth} \p{Script=Gothic} (27)
\p{Script: Gothic} (Short: \p{Sc=Goth}) (27)
\p{Script: Gran} \p{Script=Grantha} (85)
\p{Script: Grantha} (Short: \p{Sc=Gran}) (85)
\p{Script: Greek} (Short: \p{Sc=Grek}) (518)
\p{Script: Grek} \p{Script=Greek} (518)
\p{Script: Gujarati} (Short: \p{Sc=Gujr}) (85)
\p{Script: Gujr} \p{Script=Gujarati} (85)
\p{Script: Gurmukhi} (Short: \p{Sc=Guru}) (79)
\p{Script: Guru} \p{Script=Gurmukhi} (79)
\p{Script: Han} (Short: \p{Sc=Han}) (81_734)
\p{Script: Hang} \p{Script=Hangul} (11_739)
\p{Script: Hangul} (Short: \p{Sc=Hang}) (11_739)
\p{Script: Hani} \p{Script=Han} (81_734)
\p{Script: Hano} \p{Script=Hanunoo} (21)
\p{Script: Hanunoo} (Short: \p{Sc=Hano}) (21)
\p{Script: Hatr} \p{Script=Hatran} (26)
\p{Script: Hatran} (Short: \p{Sc=Hatr}) (26)
\p{Script: Hebr} \p{Script=Hebrew} (133)
\p{Script: Hebrew} (Short: \p{Sc=Hebr}) (133)
\p{Script: Hira} \p{Script=Hiragana} (91)
\p{Script: Hiragana} (Short: \p{Sc=Hira}) (91)
\p{Script: Hluw} \p{Script=Anatolian_Hieroglyphs} (583)
\p{Script: Hmng} \p{Script=Pahawh_Hmong} (127)
\p{Script: Hung} \p{Script=Old_Hungarian} (108)
\p{Script: Imperial_Aramaic} (Short: \p{Sc=Armi}) (31)
\p{Script: Inherited} (Short: \p{Sc=Zinh}) (564)
\p{Script: Inscriptional_Pahlavi} (Short: \p{Sc=Phli}) (27)
\p{Script: Inscriptional_Parthian} (Short: \p{Sc=Prti}) (30)
\p{Script: Ital} \p{Script=Old_Italic} (36)
\p{Script: Java} \p{Script=Javanese} (90)
\p{Script: Javanese} (Short: \p{Sc=Java}) (90)
\p{Script: Kaithi} (Short: \p{Sc=Kthi}) (66)
\p{Script: Kali} \p{Script=Kayah_Li} (47)
\p{Script: Kana} \p{Script=Katakana} (300)
\p{Script: Kannada} (Short: \p{Sc=Knda}) (88)
\p{Script: Katakana} (Short: \p{Sc=Kana}) (300)
\p{Script: Kayah_Li} (Short: \p{Sc=Kali}) (47)
\p{Script: Khar} \p{Script=Kharoshthi} (65)
\p{Script: Kharoshthi} (Short: \p{Sc=Khar}) (65)
\p{Script: Khmer} (Short: \p{Sc=Khmr}) (146)
\p{Script: Khmr} \p{Script=Khmer} (146)
\p{Script: Khoj} \p{Script=Khojki} (62)
\p{Script: Khojki} (Short: \p{Sc=Khoj}) (62)
\p{Script: Khudawadi} (Short: \p{Sc=Sind}) (69)
\p{Script: Knda} \p{Script=Kannada} (88)
\p{Script: Kthi} \p{Script=Kaithi} (66)
\p{Script: Lana} \p{Script=Tai_Tham} (127)
\p{Script: Lao} (Short: \p{Sc=Lao}) (67)
\p{Script: Laoo} \p{Script=Lao} (67)
\p{Script: Latin} (Short: \p{Sc=Latn}) (1350)
\p{Script: Latn} \p{Script=Latin} (1350)
\p{Script: Lepc} \p{Script=Lepcha} (74)
\p{Script: Lepcha} (Short: \p{Sc=Lepc}) (74)
\p{Script: Limb} \p{Script=Limbu} (68)
\p{Script: Limbu} (Short: \p{Sc=Limb}) (68)
\p{Script: Lina} \p{Script=Linear_A} (341)
\p{Script: Linb} \p{Script=Linear_B} (211)
\p{Script: Linear_A} (Short: \p{Sc=Lina}) (341)
\p{Script: Linear_B} (Short: \p{Sc=Linb}) (211)
\p{Script: Lisu} (Short: \p{Sc=Lisu}) (48)
\p{Script: Lyci} \p{Script=Lycian} (29)
\p{Script: Lycian} (Short: \p{Sc=Lyci}) (29)
\p{Script: Lydi} \p{Script=Lydian} (27)
\p{Script: Lydian} (Short: \p{Sc=Lydi}) (27)
\p{Script: Mahajani} (Short: \p{Sc=Mahj}) (39)
\p{Script: Mahj} \p{Script=Mahajani} (39)
\p{Script: Malayalam} (Short: \p{Sc=Mlym}) (114)
\p{Script: Mand} \p{Script=Mandaic} (29)
\p{Script: Mandaic} (Short: \p{Sc=Mand}) (29)
\p{Script: Mani} \p{Script=Manichaean} (51)
\p{Script: Manichaean} (Short: \p{Sc=Mani}) (51)
\p{Script: Marc} \p{Script=Marchen} (68)
\p{Script: Marchen} (Short: \p{Sc=Marc}) (68)
\p{Script: Meetei_Mayek} (Short: \p{Sc=Mtei}) (79)
\p{Script: Mend} \p{Script=Mende_Kikakui} (213)
\p{Script: Mende_Kikakui} (Short: \p{Sc=Mend}) (213)
\p{Script: Merc} \p{Script=Meroitic_Cursive} (90)
\p{Script: Mero} \p{Script=Meroitic_Hieroglyphs} (32)
\p{Script: Meroitic_Cursive} (Short: \p{Sc=Merc}) (90)
\p{Script: Meroitic_Hieroglyphs} (Short: \p{Sc=Mero}) (32)
\p{Script: Miao} (Short: \p{Sc=Miao}) (133)
\p{Script: Mlym} \p{Script=Malayalam} (114)
\p{Script: Modi} (Short: \p{Sc=Modi}) (79)
\p{Script: Mong} \p{Script=Mongolian} (166)
\p{Script: Mongolian} (Short: \p{Sc=Mong}) (166)
\p{Script: Mro} (Short: \p{Sc=Mro}) (43)
\p{Script: Mroo} \p{Script=Mro} (43)
\p{Script: Mtei} \p{Script=Meetei_Mayek} (79)
\p{Script: Mult} \p{Script=Multani} (38)
\p{Script: Multani} (Short: \p{Sc=Mult}) (38)
\p{Script: Myanmar} (Short: \p{Sc=Mymr}) (223)
\p{Script: Mymr} \p{Script=Myanmar} (223)
\p{Script: Nabataean} (Short: \p{Sc=Nbat}) (40)
\p{Script: Narb} \p{Script=Old_North_Arabian} (32)
\p{Script: Nbat} \p{Script=Nabataean} (40)
\p{Script: New_Tai_Lue} (Short: \p{Sc=Talu}) (83)
\p{Script: Newa} (Short: \p{Sc=Newa}) (92)
\p{Script: Nko} (Short: \p{Sc=Nko}) (59)
\p{Script: Nkoo} \p{Script=Nko} (59)
\p{Script: Ogam} \p{Script=Ogham} (29)
\p{Script: Ogham} (Short: \p{Sc=Ogam}) (29)
\p{Script: Ol_Chiki} (Short: \p{Sc=Olck}) (48)
\p{Script: Olck} \p{Script=Ol_Chiki} (48)
\p{Script: Old_Hungarian} (Short: \p{Sc=Hung}) (108)
\p{Script: Old_Italic} (Short: \p{Sc=Ital}) (36)
\p{Script: Old_North_Arabian} (Short: \p{Sc=Narb}) (32)
\p{Script: Old_Permic} (Short: \p{Sc=Perm}) (43)
\p{Script: Old_Persian} (Short: \p{Sc=Xpeo}) (50)
\p{Script: Old_South_Arabian} (Short: \p{Sc=Sarb}) (32)
\p{Script: Old_Turkic} (Short: \p{Sc=Orkh}) (73)
\p{Script: Oriya} (Short: \p{Sc=Orya}) (90)
\p{Script: Orkh} \p{Script=Old_Turkic} (73)
\p{Script: Orya} \p{Script=Oriya} (90)
\p{Script: Osage} (Short: \p{Sc=Osge}) (72)
\p{Script: Osge} \p{Script=Osage} (72)
\p{Script: Osma} \p{Script=Osmanya} (40)
\p{Script: Osmanya} (Short: \p{Sc=Osma}) (40)
\p{Script: Pahawh_Hmong} (Short: \p{Sc=Hmng}) (127)
\p{Script: Palm} \p{Script=Palmyrene} (32)
\p{Script: Palmyrene} (Short: \p{Sc=Palm}) (32)
\p{Script: Pau_Cin_Hau} (Short: \p{Sc=Pauc}) (57)
\p{Script: Pauc} \p{Script=Pau_Cin_Hau} (57)
\p{Script: Perm} \p{Script=Old_Permic} (43)
\p{Script: Phag} \p{Script=Phags_Pa} (56)
\p{Script: Phags_Pa} (Short: \p{Sc=Phag}) (56)
\p{Script: Phli} \p{Script=Inscriptional_Pahlavi} (27)
\p{Script: Phlp} \p{Script=Psalter_Pahlavi} (29)
\p{Script: Phnx} \p{Script=Phoenician} (29)
\p{Script: Phoenician} (Short: \p{Sc=Phnx}) (29)
\p{Script: Plrd} \p{Script=Miao} (133)
\p{Script: Prti} \p{Script=Inscriptional_Parthian} (30)
\p{Script: Psalter_Pahlavi} (Short: \p{Sc=Phlp}) (29)
\p{Script: Qaac} \p{Script=Coptic} (137)
\p{Script: Qaai} \p{Script=Inherited} (564)
\p{Script: Rejang} (Short: \p{Sc=Rjng}) (37)
\p{Script: Rjng} \p{Script=Rejang} (37)
\p{Script: Runic} (Short: \p{Sc=Runr}) (86)
\p{Script: Runr} \p{Script=Runic} (86)
\p{Script: Samaritan} (Short: \p{Sc=Samr}) (61)
\p{Script: Samr} \p{Script=Samaritan} (61)
\p{Script: Sarb} \p{Script=Old_South_Arabian} (32)
\p{Script: Saur} \p{Script=Saurashtra} (82)
\p{Script: Saurashtra} (Short: \p{Sc=Saur}) (82)
\p{Script: Sgnw} \p{Script=SignWriting} (672)
\p{Script: Sharada} (Short: \p{Sc=Shrd}) (94)
\p{Script: Shavian} (Short: \p{Sc=Shaw}) (48)
\p{Script: Shaw} \p{Script=Shavian} (48)
\p{Script: Shrd} \p{Script=Sharada} (94)
\p{Script: Sidd} \p{Script=Siddham} (92)
\p{Script: Siddham} (Short: \p{Sc=Sidd}) (92)
\p{Script: SignWriting} (Short: \p{Sc=Sgnw}) (672)
\p{Script: Sind} \p{Script=Khudawadi} (69)
\p{Script: Sinh} \p{Script=Sinhala} (110)
\p{Script: Sinhala} (Short: \p{Sc=Sinh}) (110)
\p{Script: Sora} \p{Script=Sora_Sompeng} (35)
\p{Script: Sora_Sompeng} (Short: \p{Sc=Sora}) (35)
\p{Script: Sund} \p{Script=Sundanese} (72)
\p{Script: Sundanese} (Short: \p{Sc=Sund}) (72)
\p{Script: Sylo} \p{Script=Syloti_Nagri} (44)
\p{Script: Syloti_Nagri} (Short: \p{Sc=Sylo}) (44)
\p{Script: Syrc} \p{Script=Syriac} (77)
\p{Script: Syriac} (Short: \p{Sc=Syrc}) (77)
\p{Script: Tagalog} (Short: \p{Sc=Tglg}) (20)
\p{Script: Tagb} \p{Script=Tagbanwa} (18)
\p{Script: Tagbanwa} (Short: \p{Sc=Tagb}) (18)
\p{Script: Tai_Le} (Short: \p{Sc=Tale}) (35)
\p{Script: Tai_Tham} (Short: \p{Sc=Lana}) (127)
\p{Script: Tai_Viet} (Short: \p{Sc=Tavt}) (72)
\p{Script: Takr} \p{Script=Takri} (66)
\p{Script: Takri} (Short: \p{Sc=Takr}) (66)
\p{Script: Tale} \p{Script=Tai_Le} (35)
\p{Script: Talu} \p{Script=New_Tai_Lue} (83)
\p{Script: Tamil} (Short: \p{Sc=Taml}) (72)
\p{Script: Taml} \p{Script=Tamil} (72)
\p{Script: Tang} \p{Script=Tangut} (6881)
\p{Script: Tangut} (Short: \p{Sc=Tang}) (6881)
\p{Script: Tavt} \p{Script=Tai_Viet} (72)
\p{Script: Telu} \p{Script=Telugu} (96)
\p{Script: Telugu} (Short: \p{Sc=Telu}) (96)
\p{Script: Tfng} \p{Script=Tifinagh} (59)
\p{Script: Tglg} \p{Script=Tagalog} (20)
\p{Script: Thaa} \p{Script=Thaana} (50)
\p{Script: Thaana} (Short: \p{Sc=Thaa}) (50)
\p{Script: Thai} (Short: \p{Sc=Thai}) (86)
\p{Script: Tibetan} (Short: \p{Sc=Tibt}) (207)
\p{Script: Tibt} \p{Script=Tibetan} (207)
\p{Script: Tifinagh} (Short: \p{Sc=Tfng}) (59)
\p{Script: Tirh} \p{Script=Tirhuta} (82)
\p{Script: Tirhuta} (Short: \p{Sc=Tirh}) (82)
\p{Script: Ugar} \p{Script=Ugaritic} (31)
\p{Script: Ugaritic} (Short: \p{Sc=Ugar}) (31)
\p{Script: Unknown} (Short: \p{Sc=Zzzz}) (985_875 plus all
above-Unicode code points)
\p{Script: Vai} (Short: \p{Sc=Vai}) (300)
\p{Script: Vaii} \p{Script=Vai} (300)
\p{Script: Wara} \p{Script=Warang_Citi} (84)
\p{Script: Warang_Citi} (Short: \p{Sc=Wara}) (84)
\p{Script: Xpeo} \p{Script=Old_Persian} (50)
\p{Script: Xsux} \p{Script=Cuneiform} (1234)
\p{Script: Yi} (Short: \p{Sc=Yi}) (1220)
\p{Script: Yiii} \p{Script=Yi} (1220)
\p{Script: Zinh} \p{Script=Inherited} (564)
\p{Script: Zyyy} \p{Script=Common} (7279)
\p{Script: Zzzz} \p{Script=Unknown} (985_875 plus all
above-Unicode code points)
\p{Script_Extensions: Adlam} (Short: \p{Scx=Adlm}, \p{Adlm}) (88)
\p{Script_Extensions: Adlm} \p{Script_Extensions=Adlam} (88)
\p{Script_Extensions: Aghb} \p{Script_Extensions=
Caucasian_Albanian} (53)
\p{Script_Extensions: Ahom} (Short: \p{Scx=Ahom}, \p{Ahom}) (57)
\p{Script_Extensions: Anatolian_Hieroglyphs} (Short: \p{Scx=Hluw},
\p{Hluw}) (583)
\p{Script_Extensions: Arab} \p{Script_Extensions=Arabic} (1323)
\p{Script_Extensions: Arabic} (Short: \p{Scx=Arab}, \p{Arab})
(1323)
\p{Script_Extensions: Armenian} (Short: \p{Scx=Armn}, \p{Armn})
(94)
\p{Script_Extensions: Armi} \p{Script_Extensions=Imperial_Aramaic}
(31)
\p{Script_Extensions: Armn} \p{Script_Extensions=Armenian} (94)
\p{Script_Extensions: Avestan} (Short: \p{Scx=Avst}, \p{Avst}) (61)
\p{Script_Extensions: Avst} \p{Script_Extensions=Avestan} (61)
\p{Script_Extensions: Bali} \p{Script_Extensions=Balinese} (121)
\p{Script_Extensions: Balinese} (Short: \p{Scx=Bali}, \p{Bali})
(121)
\p{Script_Extensions: Bamu} \p{Script_Extensions=Bamum} (657)
\p{Script_Extensions: Bamum} (Short: \p{Scx=Bamu}, \p{Bamu}) (657)
\p{Script_Extensions: Bass} \p{Script_Extensions=Bassa_Vah} (36)
\p{Script_Extensions: Bassa_Vah} (Short: \p{Scx=Bass}, \p{Bass})
(36)
\p{Script_Extensions: Batak} (Short: \p{Scx=Batk}, \p{Batk}) (56)
\p{Script_Extensions: Batk} \p{Script_Extensions=Batak} (56)
\p{Script_Extensions: Beng} \p{Script_Extensions=Bengali} (98)
\p{Script_Extensions: Bengali} (Short: \p{Scx=Beng}, \p{Beng}) (98)
\p{Script_Extensions: Bhaiksuki} (Short: \p{Scx=Bhks}, \p{Bhks})
(97)
\p{Script_Extensions: Bhks} \p{Script_Extensions=Bhaiksuki} (97)
\p{Script_Extensions: Bopo} \p{Script_Extensions=Bopomofo} (110)
\p{Script_Extensions: Bopomofo} (Short: \p{Scx=Bopo}, \p{Bopo})
(110)
\p{Script_Extensions: Brah} \p{Script_Extensions=Brahmi} (109)
\p{Script_Extensions: Brahmi} (Short: \p{Scx=Brah}, \p{Brah}) (109)
\p{Script_Extensions: Brai} \p{Script_Extensions=Braille} (256)
\p{Script_Extensions: Braille} (Short: \p{Scx=Brai}, \p{Brai})
(256)
\p{Script_Extensions: Bugi} \p{Script_Extensions=Buginese} (31)
\p{Script_Extensions: Buginese} (Short: \p{Scx=Bugi}, \p{Bugi})
(31)
\p{Script_Extensions: Buhd} \p{Script_Extensions=Buhid} (22)
\p{Script_Extensions: Buhid} (Short: \p{Scx=Buhd}, \p{Buhd}) (22)
\p{Script_Extensions: Cakm} \p{Script_Extensions=Chakma} (87)
\p{Script_Extensions: Canadian_Aboriginal} (Short: \p{Scx=Cans},
\p{Cans}) (710)
\p{Script_Extensions: Cans} \p{Script_Extensions=
Canadian_Aboriginal} (710)
\p{Script_Extensions: Cari} \p{Script_Extensions=Carian} (49)
\p{Script_Extensions: Carian} (Short: \p{Scx=Cari}, \p{Cari}) (49)
\p{Script_Extensions: Caucasian_Albanian} (Short: \p{Scx=Aghb},
\p{Aghb}) (53)
\p{Script_Extensions: Chakma} (Short: \p{Scx=Cakm}, \p{Cakm}) (87)
\p{Script_Extensions: Cham} (Short: \p{Scx=Cham}, \p{Cham}) (83)
\p{Script_Extensions: Cher} \p{Script_Extensions=Cherokee} (172)
\p{Script_Extensions: Cherokee} (Short: \p{Scx=Cher}, \p{Cher})
(172)
\p{Script_Extensions: Common} (Short: \p{Scx=Zyyy}, \p{Zyyy})
(6864)
\p{Script_Extensions: Copt} \p{Script_Extensions=Coptic} (165)
\p{Script_Extensions: Coptic} (Short: \p{Scx=Copt}, \p{Copt}) (165)
\p{Script_Extensions: Cprt} \p{Script_Extensions=Cypriot} (112)
\p{Script_Extensions: Cuneiform} (Short: \p{Scx=Xsux}, \p{Xsux})
(1234)
\p{Script_Extensions: Cypriot} (Short: \p{Scx=Cprt}, \p{Cprt})
(112)
\p{Script_Extensions: Cyrillic} (Short: \p{Scx=Cyrl}, \p{Cyrl})
(446)
\p{Script_Extensions: Cyrl} \p{Script_Extensions=Cyrillic} (446)
\p{Script_Extensions: Deseret} (Short: \p{Scx=Dsrt}, \p{Dsrt}) (80)
\p{Script_Extensions: Deva} \p{Script_Extensions=Devanagari} (210)
\p{Script_Extensions: Devanagari} (Short: \p{Scx=Deva}, \p{Deva})
(210)
\p{Script_Extensions: Dsrt} \p{Script_Extensions=Deseret} (80)
\p{Script_Extensions: Dupl} \p{Script_Extensions=Duployan} (147)
\p{Script_Extensions: Duployan} (Short: \p{Scx=Dupl}, \p{Dupl})
(147)
\p{Script_Extensions: Egyp} \p{Script_Extensions=
Egyptian_Hieroglyphs} (1071)
\p{Script_Extensions: Egyptian_Hieroglyphs} (Short: \p{Scx=Egyp},
\p{Egyp}) (1071)
\p{Script_Extensions: Elba} \p{Script_Extensions=Elbasan} (40)
\p{Script_Extensions: Elbasan} (Short: \p{Scx=Elba}, \p{Elba}) (40)
\p{Script_Extensions: Ethi} \p{Script_Extensions=Ethiopic} (495)
\p{Script_Extensions: Ethiopic} (Short: \p{Scx=Ethi}, \p{Ethi})
(495)
\p{Script_Extensions: Geor} \p{Script_Extensions=Georgian} (129)
\p{Script_Extensions: Georgian} (Short: \p{Scx=Geor}, \p{Geor})
(129)
\p{Script_Extensions: Glag} \p{Script_Extensions=Glagolitic} (136)
\p{Script_Extensions: Glagolitic} (Short: \p{Scx=Glag}, \p{Glag})
(136)
\p{Script_Extensions: Goth} \p{Script_Extensions=Gothic} (27)
\p{Script_Extensions: Gothic} (Short: \p{Scx=Goth}, \p{Goth}) (27)
\p{Script_Extensions: Gran} \p{Script_Extensions=Grantha} (113)
\p{Script_Extensions: Grantha} (Short: \p{Scx=Gran}, \p{Gran})
(113)
\p{Script_Extensions: Greek} (Short: \p{Scx=Grek}, \p{Grek}) (522)
\p{Script_Extensions: Grek} \p{Script_Extensions=Greek} (522)
\p{Script_Extensions: Gujarati} (Short: \p{Scx=Gujr}, \p{Gujr})
(99)
\p{Script_Extensions: Gujr} \p{Script_Extensions=Gujarati} (99)
\p{Script_Extensions: Gurmukhi} (Short: \p{Scx=Guru}, \p{Guru})
(93)
\p{Script_Extensions: Guru} \p{Script_Extensions=Gurmukhi} (93)
\p{Script_Extensions: Han} (Short: \p{Scx=Han}, \p{Han}) (82_013)
\p{Script_Extensions: Hang} \p{Script_Extensions=Hangul} (11_775)
\p{Script_Extensions: Hangul} (Short: \p{Scx=Hang}, \p{Hang})
(11_775)
\p{Script_Extensions: Hani} \p{Script_Extensions=Han} (82_013)
\p{Script_Extensions: Hano} \p{Script_Extensions=Hanunoo} (23)
\p{Script_Extensions: Hanunoo} (Short: \p{Scx=Hano}, \p{Hano}) (23)
\p{Script_Extensions: Hatr} \p{Script_Extensions=Hatran} (26)
\p{Script_Extensions: Hatran} (Short: \p{Scx=Hatr}, \p{Hatr}) (26)
\p{Script_Extensions: Hebr} \p{Script_Extensions=Hebrew} (133)
\p{Script_Extensions: Hebrew} (Short: \p{Scx=Hebr}, \p{Hebr}) (133)
\p{Script_Extensions: Hira} \p{Script_Extensions=Hiragana} (143)
\p{Script_Extensions: Hiragana} (Short: \p{Scx=Hira}, \p{Hira})
(143)
\p{Script_Extensions: Hluw} \p{Script_Extensions=
Anatolian_Hieroglyphs} (583)
\p{Script_Extensions: Hmng} \p{Script_Extensions=Pahawh_Hmong}
(127)
\p{Script_Extensions: Hung} \p{Script_Extensions=Old_Hungarian}
(108)
\p{Script_Extensions: Imperial_Aramaic} (Short: \p{Scx=Armi},
\p{Armi}) (31)
\p{Script_Extensions: Inherited} (Short: \p{Scx=Zinh}, \p{Zinh})
(496)
\p{Script_Extensions: Inscriptional_Pahlavi} (Short: \p{Scx=Phli},
\p{Phli}) (27)
\p{Script_Extensions: Inscriptional_Parthian} (Short: \p{Scx=
Prti}, \p{Prti}) (30)
\p{Script_Extensions: Ital} \p{Script_Extensions=Old_Italic} (36)
\p{Script_Extensions: Java} \p{Script_Extensions=Javanese} (91)
\p{Script_Extensions: Javanese} (Short: \p{Scx=Java}, \p{Java})
(91)
\p{Script_Extensions: Kaithi} (Short: \p{Scx=Kthi}, \p{Kthi}) (86)
\p{Script_Extensions: Kali} \p{Script_Extensions=Kayah_Li} (48)
\p{Script_Extensions: Kana} \p{Script_Extensions=Katakana} (352)
\p{Script_Extensions: Kannada} (Short: \p{Scx=Knda}, \p{Knda})
(100)
\p{Script_Extensions: Katakana} (Short: \p{Scx=Kana}, \p{Kana})
(352)
\p{Script_Extensions: Kayah_Li} (Short: \p{Scx=Kali}, \p{Kali})
(48)
\p{Script_Extensions: Khar} \p{Script_Extensions=Kharoshthi} (65)
\p{Script_Extensions: Kharoshthi} (Short: \p{Scx=Khar}, \p{Khar})
(65)
\p{Script_Extensions: Khmer} (Short: \p{Scx=Khmr}, \p{Khmr}) (146)
\p{Script_Extensions: Khmr} \p{Script_Extensions=Khmer} (146)
\p{Script_Extensions: Khoj} \p{Script_Extensions=Khojki} (72)
\p{Script_Extensions: Khojki} (Short: \p{Scx=Khoj}, \p{Khoj}) (72)
\p{Script_Extensions: Khudawadi} (Short: \p{Scx=Sind}, \p{Sind})
(81)
\p{Script_Extensions: Knda} \p{Script_Extensions=Kannada} (100)
\p{Script_Extensions: Kthi} \p{Script_Extensions=Kaithi} (86)
\p{Script_Extensions: Lana} \p{Script_Extensions=Tai_Tham} (127)
\p{Script_Extensions: Lao} (Short: \p{Scx=Lao}, \p{Lao}) (67)
\p{Script_Extensions: Laoo} \p{Script_Extensions=Lao} (67)
\p{Script_Extensions: Latin} (Short: \p{Scx=Latn}, \p{Latn}) (1370)
\p{Script_Extensions: Latn} \p{Script_Extensions=Latin} (1370)
\p{Script_Extensions: Lepc} \p{Script_Extensions=Lepcha} (74)
\p{Script_Extensions: Lepcha} (Short: \p{Scx=Lepc}, \p{Lepc}) (74)
\p{Script_Extensions: Limb} \p{Script_Extensions=Limbu} (69)
\p{Script_Extensions: Limbu} (Short: \p{Scx=Limb}, \p{Limb}) (69)
\p{Script_Extensions: Lina} \p{Script_Extensions=Linear_A} (386)
\p{Script_Extensions: Linb} \p{Script_Extensions=Linear_B} (268)
\p{Script_Extensions: Linear_A} (Short: \p{Scx=Lina}, \p{Lina})
(386)
\p{Script_Extensions: Linear_B} (Short: \p{Scx=Linb}, \p{Linb})
(268)
\p{Script_Extensions: Lisu} (Short: \p{Scx=Lisu}, \p{Lisu}) (48)
\p{Script_Extensions: Lyci} \p{Script_Extensions=Lycian} (29)
\p{Script_Extensions: Lycian} (Short: \p{Scx=Lyci}, \p{Lyci}) (29)
\p{Script_Extensions: Lydi} \p{Script_Extensions=Lydian} (27)
\p{Script_Extensions: Lydian} (Short: \p{Scx=Lydi}, \p{Lydi}) (27)
\p{Script_Extensions: Mahajani} (Short: \p{Scx=Mahj}, \p{Mahj})
(61)
\p{Script_Extensions: Mahj} \p{Script_Extensions=Mahajani} (61)
\p{Script_Extensions: Malayalam} (Short: \p{Scx=Mlym}, \p{Mlym})
(119)
\p{Script_Extensions: Mand} \p{Script_Extensions=Mandaic} (30)
\p{Script_Extensions: Mandaic} (Short: \p{Scx=Mand}, \p{Mand}) (30)
\p{Script_Extensions: Mani} \p{Script_Extensions=Manichaean} (52)
\p{Script_Extensions: Manichaean} (Short: \p{Scx=Mani}, \p{Mani})
(52)
\p{Script_Extensions: Marc} \p{Script_Extensions=Marchen} (68)
\p{Script_Extensions: Marchen} (Short: \p{Scx=Marc}, \p{Marc}) (68)
\p{Script_Extensions: Meetei_Mayek} (Short: \p{Scx=Mtei},
\p{Mtei}) (79)
\p{Script_Extensions: Mend} \p{Script_Extensions=Mende_Kikakui}
(213)
\p{Script_Extensions: Mende_Kikakui} (Short: \p{Scx=Mend},
\p{Mend}) (213)
\p{Script_Extensions: Merc} \p{Script_Extensions=Meroitic_Cursive}
(90)
\p{Script_Extensions: Mero} \p{Script_Extensions=
Meroitic_Hieroglyphs} (32)
\p{Script_Extensions: Meroitic_Cursive} (Short: \p{Scx=Merc},
\p{Merc}) (90)
\p{Script_Extensions: Meroitic_Hieroglyphs} (Short: \p{Scx=Mero},
\p{Mero}) (32)
\p{Script_Extensions: Miao} (Short: \p{Scx=Miao}, \p{Miao}) (133)
\p{Script_Extensions: Mlym} \p{Script_Extensions=Malayalam} (119)
\p{Script_Extensions: Modi} (Short: \p{Scx=Modi}, \p{Modi}) (89)
\p{Script_Extensions: Mong} \p{Script_Extensions=Mongolian} (169)
\p{Script_Extensions: Mongolian} (Short: \p{Scx=Mong}, \p{Mong})
(169)
\p{Script_Extensions: Mro} (Short: \p{Scx=Mro}, \p{Mro}) (43)
\p{Script_Extensions: Mroo} \p{Script_Extensions=Mro} (43)
\p{Script_Extensions: Mtei} \p{Script_Extensions=Meetei_Mayek} (79)
\p{Script_Extensions: Mult} \p{Script_Extensions=Multani} (48)
\p{Script_Extensions: Multani} (Short: \p{Scx=Mult}, \p{Mult}) (48)
\p{Script_Extensions: Myanmar} (Short: \p{Scx=Mymr}, \p{Mymr})
(224)
\p{Script_Extensions: Mymr} \p{Script_Extensions=Myanmar} (224)
\p{Script_Extensions: Nabataean} (Short: \p{Scx=Nbat}, \p{Nbat})
(40)
\p{Script_Extensions: Narb} \p{Script_Extensions=
Old_North_Arabian} (32)
\p{Script_Extensions: Nbat} \p{Script_Extensions=Nabataean} (40)
\p{Script_Extensions: New_Tai_Lue} (Short: \p{Scx=Talu}, \p{Talu})
(83)
\p{Script_Extensions: Newa} (Short: \p{Scx=Newa}, \p{Newa}) (92)
\p{Script_Extensions: Nko} (Short: \p{Scx=Nko}, \p{Nko}) (59)
\p{Script_Extensions: Nkoo} \p{Script_Extensions=Nko} (59)
\p{Script_Extensions: Ogam} \p{Script_Extensions=Ogham} (29)
\p{Script_Extensions: Ogham} (Short: \p{Scx=Ogam}, \p{Ogam}) (29)
\p{Script_Extensions: Ol_Chiki} (Short: \p{Scx=Olck}, \p{Olck})
(48)
\p{Script_Extensions: Olck} \p{Script_Extensions=Ol_Chiki} (48)
\p{Script_Extensions: Old_Hungarian} (Short: \p{Scx=Hung},
\p{Hung}) (108)
\p{Script_Extensions: Old_Italic} (Short: \p{Scx=Ital}, \p{Ital})
(36)
\p{Script_Extensions: Old_North_Arabian} (Short: \p{Scx=Narb},
\p{Narb}) (32)
\p{Script_Extensions: Old_Permic} (Short: \p{Scx=Perm}, \p{Perm})
(44)
\p{Script_Extensions: Old_Persian} (Short: \p{Scx=Xpeo}, \p{Xpeo})
(50)
\p{Script_Extensions: Old_South_Arabian} (Short: \p{Scx=Sarb},
\p{Sarb}) (32)
\p{Script_Extensions: Old_Turkic} (Short: \p{Scx=Orkh}, \p{Orkh})
(73)
\p{Script_Extensions: Oriya} (Short: \p{Scx=Orya}, \p{Orya}) (94)
\p{Script_Extensions: Orkh} \p{Script_Extensions=Old_Turkic} (73)
\p{Script_Extensions: Orya} \p{Script_Extensions=Oriya} (94)
\p{Script_Extensions: Osage} (Short: \p{Scx=Osge}, \p{Osge}) (72)
\p{Script_Extensions: Osge} \p{Script_Extensions=Osage} (72)
\p{Script_Extensions: Osma} \p{Script_Extensions=Osmanya} (40)
\p{Script_Extensions: Osmanya} (Short: \p{Scx=Osma}, \p{Osma}) (40)
\p{Script_Extensions: Pahawh_Hmong} (Short: \p{Scx=Hmng},
\p{Hmng}) (127)
\p{Script_Extensions: Palm} \p{Script_Extensions=Palmyrene} (32)
\p{Script_Extensions: Palmyrene} (Short: \p{Scx=Palm}, \p{Palm})
(32)
\p{Script_Extensions: Pau_Cin_Hau} (Short: \p{Scx=Pauc}, \p{Pauc})
(57)
\p{Script_Extensions: Pauc} \p{Script_Extensions=Pau_Cin_Hau} (57)
\p{Script_Extensions: Perm} \p{Script_Extensions=Old_Permic} (44)
\p{Script_Extensions: Phag} \p{Script_Extensions=Phags_Pa} (59)
\p{Script_Extensions: Phags_Pa} (Short: \p{Scx=Phag}, \p{Phag})
(59)
\p{Script_Extensions: Phli} \p{Script_Extensions=
Inscriptional_Pahlavi} (27)
\p{Script_Extensions: Phlp} \p{Script_Extensions=Psalter_Pahlavi}
(30)
\p{Script_Extensions: Phnx} \p{Script_Extensions=Phoenician} (29)
\p{Script_Extensions: Phoenician} (Short: \p{Scx=Phnx}, \p{Phnx})
(29)
\p{Script_Extensions: Plrd} \p{Script_Extensions=Miao} (133)
\p{Script_Extensions: Prti} \p{Script_Extensions=
Inscriptional_Parthian} (30)
\p{Script_Extensions: Psalter_Pahlavi} (Short: \p{Scx=Phlp},
\p{Phlp}) (30)
\p{Script_Extensions: Qaac} \p{Script_Extensions=Coptic} (165)
\p{Script_Extensions: Qaai} \p{Script_Extensions=Inherited} (496)
\p{Script_Extensions: Rejang} (Short: \p{Scx=Rjng}, \p{Rjng}) (37)
\p{Script_Extensions: Rjng} \p{Script_Extensions=Rejang} (37)
\p{Script_Extensions: Runic} (Short: \p{Scx=Runr}, \p{Runr}) (86)
\p{Script_Extensions: Runr} \p{Script_Extensions=Runic} (86)
\p{Script_Extensions: Samaritan} (Short: \p{Scx=Samr}, \p{Samr})
(61)
\p{Script_Extensions: Samr} \p{Script_Extensions=Samaritan} (61)
\p{Script_Extensions: Sarb} \p{Script_Extensions=
Old_South_Arabian} (32)
\p{Script_Extensions: Saur} \p{Script_Extensions=Saurashtra} (82)
\p{Script_Extensions: Saurashtra} (Short: \p{Scx=Saur}, \p{Saur})
(82)
\p{Script_Extensions: Sgnw} \p{Script_Extensions=SignWriting} (672)
\p{Script_Extensions: Sharada} (Short: \p{Scx=Shrd}, \p{Shrd})
(100)
\p{Script_Extensions: Shavian} (Short: \p{Scx=Shaw}, \p{Shaw}) (48)
\p{Script_Extensions: Shaw} \p{Script_Extensions=Shavian} (48)
\p{Script_Extensions: Shrd} \p{Script_Extensions=Sharada} (100)
\p{Script_Extensions: Sidd} \p{Script_Extensions=Siddham} (92)
\p{Script_Extensions: Siddham} (Short: \p{Scx=Sidd}, \p{Sidd}) (92)
\p{Script_Extensions: SignWriting} (Short: \p{Scx=Sgnw}, \p{Sgnw})
(672)
\p{Script_Extensions: Sind} \p{Script_Extensions=Khudawadi} (81)
\p{Script_Extensions: Sinh} \p{Script_Extensions=Sinhala} (112)
\p{Script_Extensions: Sinhala} (Short: \p{Scx=Sinh}, \p{Sinh})
(112)
\p{Script_Extensions: Sora} \p{Script_Extensions=Sora_Sompeng} (35)
\p{Script_Extensions: Sora_Sompeng} (Short: \p{Scx=Sora},
\p{Sora}) (35)
\p{Script_Extensions: Sund} \p{Script_Extensions=Sundanese} (72)
\p{Script_Extensions: Sundanese} (Short: \p{Scx=Sund}, \p{Sund})
(72)
\p{Script_Extensions: Sylo} \p{Script_Extensions=Syloti_Nagri} (56)
\p{Script_Extensions: Syloti_Nagri} (Short: \p{Scx=Sylo},
\p{Sylo}) (56)
\p{Script_Extensions: Syrc} \p{Script_Extensions=Syriac} (93)
\p{Script_Extensions: Syriac} (Short: \p{Scx=Syrc}, \p{Syrc}) (93)
\p{Script_Extensions: Tagalog} (Short: \p{Scx=Tglg}, \p{Tglg}) (22)
\p{Script_Extensions: Tagb} \p{Script_Extensions=Tagbanwa} (20)
\p{Script_Extensions: Tagbanwa} (Short: \p{Scx=Tagb}, \p{Tagb})
(20)
\p{Script_Extensions: Tai_Le} (Short: \p{Scx=Tale}, \p{Tale}) (45)
\p{Script_Extensions: Tai_Tham} (Short: \p{Scx=Lana}, \p{Lana})
(127)
\p{Script_Extensions: Tai_Viet} (Short: \p{Scx=Tavt}, \p{Tavt})
(72)
\p{Script_Extensions: Takr} \p{Script_Extensions=Takri} (78)
\p{Script_Extensions: Takri} (Short: \p{Scx=Takr}, \p{Takr}) (78)
\p{Script_Extensions: Tale} \p{Script_Extensions=Tai_Le} (45)
\p{Script_Extensions: Talu} \p{Script_Extensions=New_Tai_Lue} (83)
\p{Script_Extensions: Tamil} (Short: \p{Scx=Taml}, \p{Taml}) (80)
\p{Script_Extensions: Taml} \p{Script_Extensions=Tamil} (80)
\p{Script_Extensions: Tang} \p{Script_Extensions=Tangut} (6881)
\p{Script_Extensions: Tangut} (Short: \p{Scx=Tang}, \p{Tang})
(6881)
\p{Script_Extensions: Tavt} \p{Script_Extensions=Tai_Viet} (72)
\p{Script_Extensions: Telu} \p{Script_Extensions=Telugu} (101)
\p{Script_Extensions: Telugu} (Short: \p{Scx=Telu}, \p{Telu}) (101)
\p{Script_Extensions: Tfng} \p{Script_Extensions=Tifinagh} (59)
\p{Script_Extensions: Tglg} \p{Script_Extensions=Tagalog} (22)
\p{Script_Extensions: Thaa} \p{Script_Extensions=Thaana} (65)
\p{Script_Extensions: Thaana} (Short: \p{Scx=Thaa}, \p{Thaa}) (65)
\p{Script_Extensions: Thai} (Short: \p{Scx=Thai}, \p{Thai}) (86)
\p{Script_Extensions: Tibetan} (Short: \p{Scx=Tibt}, \p{Tibt})
(207)
\p{Script_Extensions: Tibt} \p{Script_Extensions=Tibetan} (207)
\p{Script_Extensions: Tifinagh} (Short: \p{Scx=Tfng}, \p{Tfng})
(59)
\p{Script_Extensions: Tirh} \p{Script_Extensions=Tirhuta} (94)
\p{Script_Extensions: Tirhuta} (Short: \p{Scx=Tirh}, \p{Tirh}) (94)
\p{Script_Extensions: Ugar} \p{Script_Extensions=Ugaritic} (31)
\p{Script_Extensions: Ugaritic} (Short: \p{Scx=Ugar}, \p{Ugar})
(31)
\p{Script_Extensions: Unknown} (Short: \p{Scx=Zzzz}, \p{Zzzz})
(985_875 plus all above-Unicode code
points)
\p{Script_Extensions: Vai} (Short: \p{Scx=Vai}, \p{Vai}) (300)
\p{Script_Extensions: Vaii} \p{Script_Extensions=Vai} (300)
\p{Script_Extensions: Wara} \p{Script_Extensions=Warang_Citi} (84)
\p{Script_Extensions: Warang_Citi} (Short: \p{Scx=Wara}, \p{Wara})
(84)
\p{Script_Extensions: Xpeo} \p{Script_Extensions=Old_Persian} (50)
\p{Script_Extensions: Xsux} \p{Script_Extensions=Cuneiform} (1234)
\p{Script_Extensions: Yi} (Short: \p{Scx=Yi}, \p{Yi}) (1246)
\p{Script_Extensions: Yiii} \p{Script_Extensions=Yi} (1246)
\p{Script_Extensions: Zinh} \p{Script_Extensions=Inherited} (496)
\p{Script_Extensions: Zyyy} \p{Script_Extensions=Common} (6864)
\p{Script_Extensions: Zzzz} \p{Script_Extensions=Unknown} (985_875
plus all above-Unicode code points)
\p{Scx: *} \p{Script_Extensions: *}
\p{SD} \p{Soft_Dotted} (= \p{Soft_Dotted=Y}) (46)
\p{SD: *} \p{Soft_Dotted: *}
\p{Sentence_Break: AT} \p{Sentence_Break=ATerm} (4)
\p{Sentence_Break: ATerm} (Short: \p{SB=AT}) (4)
\p{Sentence_Break: CL} \p{Sentence_Break=Close} (187)
\p{Sentence_Break: Close} (Short: \p{SB=CL}) (187)
\p{Sentence_Break: CR} (Short: \p{SB=CR}) (1)
\p{Sentence_Break: EX} \p{Sentence_Break=Extend} (2197)
\p{Sentence_Break: Extend} (Short: \p{SB=EX}) (2197)
\p{Sentence_Break: FO} \p{Sentence_Break=Format} (53)
\p{Sentence_Break: Format} (Short: \p{SB=FO}) (53)
\p{Sentence_Break: LE} \p{Sentence_Break=OLetter} (113_027)
\p{Sentence_Break: LF} (Short: \p{SB=LF}) (1)
\p{Sentence_Break: LO} \p{Sentence_Break=Lower} (2251)
\p{Sentence_Break: Lower} (Short: \p{SB=LO}) (2251)
\p{Sentence_Break: NU} \p{Sentence_Break=Numeric} (572)
\p{Sentence_Break: Numeric} (Short: \p{SB=NU}) (572)
\p{Sentence_Break: OLetter} (Short: \p{SB=LE}) (113_027)
\p{Sentence_Break: Other} (Short: \p{SB=XX}) (993_796 plus all
above-Unicode code points)
\p{Sentence_Break: SC} \p{Sentence_Break=SContinue} (26)
\p{Sentence_Break: SContinue} (Short: \p{SB=SC}) (26)
\p{Sentence_Break: SE} \p{Sentence_Break=Sep} (3)
\p{Sentence_Break: Sep} (Short: \p{SB=SE}) (3)
\p{Sentence_Break: Sp} (Short: \p{SB=Sp}) (20)
\p{Sentence_Break: ST} \p{Sentence_Break=STerm} (121)
\p{Sentence_Break: STerm} (Short: \p{SB=ST}) (121)
\p{Sentence_Break: UP} \p{Sentence_Break=Upper} (1853)
\p{Sentence_Break: Upper} (Short: \p{SB=UP}) (1853)
\p{Sentence_Break: XX} \p{Sentence_Break=Other} (993_796 plus all
above-Unicode code points)
\p{Sentence_Terminal} \p{Sentence_Terminal=Y} (Short: \p{STerm})
(124)
\p{Sentence_Terminal: N*} (Short: \p{STerm=N}, \P{STerm})
(1_113_988 plus all above-Unicode code
points)
\p{Sentence_Terminal: Y*} (Short: \p{STerm=Y}, \p{STerm}) (124)
\p{Separator} \p{General_Category=Separator} (Short:
\p{Z}) (19)
\p{Sgnw} \p{SignWriting} (= \p{Script_Extensions=
SignWriting}) (672)
\p{Sharada} \p{Script_Extensions=Sharada} (Short:
\p{Shrd}; NOT \p{Block=Sharada}) (100)
\p{Shavian} \p{Script_Extensions=Shavian} (Short:
\p{Shaw}) (48)
\p{Shaw} \p{Shavian} (= \p{Script_Extensions=
Shavian}) (48)
X \p{Shorthand_Format_Controls} \p{Block=Shorthand_Format_Controls}
(16)
\p{Shrd} \p{Sharada} (= \p{Script_Extensions=
Sharada}) (NOT \p{Block=Sharada}) (100)
\p{Sidd} \p{Siddham} (= \p{Script_Extensions=
Siddham}) (NOT \p{Block=Siddham}) (92)
\p{Siddham} \p{Script_Extensions=Siddham} (Short:
\p{Sidd}; NOT \p{Block=Siddham}) (92)
\p{SignWriting} \p{Script_Extensions=SignWriting} (Short:
\p{Sgnw}) (672)
\p{Sind} \p{Khudawadi} (= \p{Script_Extensions=
Khudawadi}) (NOT \p{Block=Khudawadi})
(81)
\p{Sinh} \p{Sinhala} (= \p{Script_Extensions=
Sinhala}) (NOT \p{Block=Sinhala}) (112)
\p{Sinhala} \p{Script_Extensions=Sinhala} (Short:
\p{Sinh}; NOT \p{Block=Sinhala}) (112)
X \p{Sinhala_Archaic_Numbers} \p{Block=Sinhala_Archaic_Numbers} (32)
\p{Sk} \p{Modifier_Symbol} (=
\p{General_Category=Modifier_Symbol})
(121)
\p{Sm} \p{Math_Symbol} (= \p{General_Category=
Math_Symbol}) (948)
X \p{Small_Form_Variants} \p{Block=Small_Form_Variants} (Short:
\p{InSmallForms}) (32)
X \p{Small_Forms} \p{Small_Form_Variants} (= \p{Block=
Small_Form_Variants}) (32)
\p{So} \p{Other_Symbol} (= \p{General_Category=
Other_Symbol}) (5777)
\p{Soft_Dotted} \p{Soft_Dotted=Y} (Short: \p{SD}) (46)
\p{Soft_Dotted: N*} (Short: \p{SD=N}, \P{SD}) (1_114_066 plus
all above-Unicode code points)
\p{Soft_Dotted: Y*} (Short: \p{SD=Y}, \p{SD}) (46)
\p{Sora} \p{Sora_Sompeng} (= \p{Script_Extensions=
Sora_Sompeng}) (NOT \p{Block=
Sora_Sompeng}) (35)
\p{Sora_Sompeng} \p{Script_Extensions=Sora_Sompeng} (Short:
\p{Sora}; NOT \p{Block=Sora_Sompeng})
(35)
\p{Space} \p{White_Space} (= \p{White_Space=Y}) (25)
\p{Space: *} \p{White_Space: *}
\p{Space_Separator} \p{General_Category=Space_Separator}
(Short: \p{Zs}) (17)
\p{SpacePerl} \p{XPosixSpace} (25)
\p{Spacing_Mark} \p{General_Category=Spacing_Mark} (Short:
\p{Mc}) (394)
X \p{Spacing_Modifier_Letters} \p{Block=Spacing_Modifier_Letters}
(Short: \p{InModifierLetters}) (80)
X \p{Specials} \p{Block=Specials} (16)
\p{STerm} \p{Sentence_Terminal} (=
\p{Sentence_Terminal=Y}) (124)
\p{STerm: *} \p{Sentence_Terminal: *}
\p{Sund} \p{Sundanese} (= \p{Script_Extensions=
Sundanese}) (NOT \p{Block=Sundanese})
(72)
\p{Sundanese} \p{Script_Extensions=Sundanese} (Short:
\p{Sund}; NOT \p{Block=Sundanese}) (72)
X \p{Sundanese_Sup} \p{Sundanese_Supplement} (= \p{Block=
Sundanese_Supplement}) (16)
X \p{Sundanese_Supplement} \p{Block=Sundanese_Supplement} (Short:
\p{InSundaneseSup}) (16)
X \p{Sup_Arrows_A} \p{Supplemental_Arrows_A} (= \p{Block=
Supplemental_Arrows_A}) (16)
X \p{Sup_Arrows_B} \p{Supplemental_Arrows_B} (= \p{Block=
Supplemental_Arrows_B}) (128)
X \p{Sup_Arrows_C} \p{Supplemental_Arrows_C} (= \p{Block=
Supplemental_Arrows_C}) (256)
X \p{Sup_Math_Operators} \p{Supplemental_Mathematical_Operators} (=
\p{Block=
Supplemental_Mathematical_Operators})
(256)
X \p{Sup_PUA_A} \p{Supplementary_Private_Use_Area_A} (=
\p{Block=
Supplementary_Private_Use_Area_A})
(65_536)
X \p{Sup_PUA_B} \p{Supplementary_Private_Use_Area_B} (=
\p{Block=
Supplementary_Private_Use_Area_B})
(65_536)
X \p{Sup_Punctuation} \p{Supplemental_Punctuation} (= \p{Block=
Supplemental_Punctuation}) (128)
X \p{Sup_Symbols_And_Pictographs}
\p{Supplemental_Symbols_And_Pictographs}
(= \p{Block=
Supplemental_Symbols_And_Pictographs})
(256)
X \p{Super_And_Sub} \p{Superscripts_And_Subscripts} (=
\p{Block=Superscripts_And_Subscripts})
(48)
X \p{Superscripts_And_Subscripts} \p{Block=
Superscripts_And_Subscripts} (Short:
\p{InSuperAndSub}) (48)
X \p{Supplemental_Arrows_A} \p{Block=Supplemental_Arrows_A} (Short:
\p{InSupArrowsA}) (16)
X \p{Supplemental_Arrows_B} \p{Block=Supplemental_Arrows_B} (Short:
\p{InSupArrowsB}) (128)
X \p{Supplemental_Arrows_C} \p{Block=Supplemental_Arrows_C} (Short:
\p{InSupArrowsC}) (256)
X \p{Supplemental_Mathematical_Operators} \p{Block=
Supplemental_Mathematical_Operators}
(Short: \p{InSupMathOperators}) (256)
X \p{Supplemental_Punctuation} \p{Block=Supplemental_Punctuation}
(Short: \p{InSupPunctuation}) (128)
X \p{Supplemental_Symbols_And_Pictographs} \p{Block=
Supplemental_Symbols_And_Pictographs}
(Short: \p{InSupSymbolsAndPictographs})
(256)
X \p{Supplementary_Private_Use_Area_A} \p{Block=
Supplementary_Private_Use_Area_A}
(Short: \p{InSupPUAA}) (65_536)
X \p{Supplementary_Private_Use_Area_B} \p{Block=
Supplementary_Private_Use_Area_B}
(Short: \p{InSupPUAB}) (65_536)
\p{Surrogate} \p{General_Category=Surrogate} (Short:
\p{Cs}) (2048)
X \p{Sutton_SignWriting} \p{Block=Sutton_SignWriting} (688)
\p{Sylo} \p{Syloti_Nagri} (= \p{Script_Extensions=
Syloti_Nagri}) (NOT \p{Block=
Syloti_Nagri}) (56)
\p{Syloti_Nagri} \p{Script_Extensions=Syloti_Nagri} (Short:
\p{Sylo}; NOT \p{Block=Syloti_Nagri})
(56)
\p{Symbol} \p{General_Category=Symbol} (Short: \p{S})
(6899)
\p{Syrc} \p{Syriac} (= \p{Script_Extensions=
Syriac}) (NOT \p{Block=Syriac}) (93)
\p{Syriac} \p{Script_Extensions=Syriac} (Short:
\p{Syrc}; NOT \p{Block=Syriac}) (93)
\p{Tagalog} \p{Script_Extensions=Tagalog} (Short:
\p{Tglg}; NOT \p{Block=Tagalog}) (22)
\p{Tagb} \p{Tagbanwa} (= \p{Script_Extensions=
Tagbanwa}) (NOT \p{Block=Tagbanwa}) (20)
\p{Tagbanwa} \p{Script_Extensions=Tagbanwa} (Short:
\p{Tagb}; NOT \p{Block=Tagbanwa}) (20)
X \p{Tags} \p{Block=Tags} (128)
\p{Tai_Le} \p{Script_Extensions=Tai_Le} (Short:
\p{Tale}; NOT \p{Block=Tai_Le}) (45)
\p{Tai_Tham} \p{Script_Extensions=Tai_Tham} (Short:
\p{Lana}; NOT \p{Block=Tai_Tham}) (127)
\p{Tai_Viet} \p{Script_Extensions=Tai_Viet} (Short:
\p{Tavt}; NOT \p{Block=Tai_Viet}) (72)
X \p{Tai_Xuan_Jing} \p{Tai_Xuan_Jing_Symbols} (= \p{Block=
Tai_Xuan_Jing_Symbols}) (96)
X \p{Tai_Xuan_Jing_Symbols} \p{Block=Tai_Xuan_Jing_Symbols} (Short:
\p{InTaiXuanJing}) (96)
\p{Takr} \p{Takri} (= \p{Script_Extensions=Takri})
(NOT \p{Block=Takri}) (78)
\p{Takri} \p{Script_Extensions=Takri} (Short:
\p{Takr}; NOT \p{Block=Takri}) (78)
\p{Tale} \p{Tai_Le} (= \p{Script_Extensions=
Tai_Le}) (NOT \p{Block=Tai_Le}) (45)
\p{Talu} \p{New_Tai_Lue} (= \p{Script_Extensions=
New_Tai_Lue}) (NOT \p{Block=
New_Tai_Lue}) (83)
\p{Tamil} \p{Script_Extensions=Tamil} (Short:
\p{Taml}; NOT \p{Block=Tamil}) (80)
\p{Taml} \p{Tamil} (= \p{Script_Extensions=Tamil})
(NOT \p{Block=Tamil}) (80)
\p{Tang} \p{Tangut} (= \p{Script_Extensions=
Tangut}) (NOT \p{Block=Tangut}) (6881)
\p{Tangut} \p{Script_Extensions=Tangut} (Short:
\p{Tang}; NOT \p{Block=Tangut}) (6881)
X \p{Tangut_Components} \p{Block=Tangut_Components} (768)
\p{Tavt} \p{Tai_Viet} (= \p{Script_Extensions=
Tai_Viet}) (NOT \p{Block=Tai_Viet}) (72)
\p{Telu} \p{Telugu} (= \p{Script_Extensions=
Telugu}) (NOT \p{Block=Telugu}) (101)
\p{Telugu} \p{Script_Extensions=Telugu} (Short:
\p{Telu}; NOT \p{Block=Telugu}) (101)
\p{Term} \p{Terminal_Punctuation} (=
\p{Terminal_Punctuation=Y}) (246)
\p{Term: *} \p{Terminal_Punctuation: *}
\p{Terminal_Punctuation} \p{Terminal_Punctuation=Y} (Short:
\p{Term}) (246)
\p{Terminal_Punctuation: N*} (Short: \p{Term=N}, \P{Term})
(1_113_866 plus all above-Unicode code
points)
\p{Terminal_Punctuation: Y*} (Short: \p{Term=Y}, \p{Term}) (246)
\p{Tfng} \p{Tifinagh} (= \p{Script_Extensions=
Tifinagh}) (NOT \p{Block=Tifinagh}) (59)
\p{Tglg} \p{Tagalog} (= \p{Script_Extensions=
Tagalog}) (NOT \p{Block=Tagalog}) (22)
\p{Thaa} \p{Thaana} (= \p{Script_Extensions=
Thaana}) (NOT \p{Block=Thaana}) (65)
\p{Thaana} \p{Script_Extensions=Thaana} (Short:
\p{Thaa}; NOT \p{Block=Thaana}) (65)
\p{Thai} \p{Script_Extensions=Thai} (NOT \p{Block=
Thai}) (86)
\p{Tibetan} \p{Script_Extensions=Tibetan} (Short:
\p{Tibt}; NOT \p{Block=Tibetan}) (207)
\p{Tibt} \p{Tibetan} (= \p{Script_Extensions=
Tibetan}) (NOT \p{Block=Tibetan}) (207)
\p{Tifinagh} \p{Script_Extensions=Tifinagh} (Short:
\p{Tfng}; NOT \p{Block=Tifinagh}) (59)
\p{Tirh} \p{Tirhuta} (= \p{Script_Extensions=
Tirhuta}) (NOT \p{Block=Tirhuta}) (94)
\p{Tirhuta} \p{Script_Extensions=Tirhuta} (Short:
\p{Tirh}; NOT \p{Block=Tirhuta}) (94)
\p{Title} \p{Titlecase} (/i= Cased=Yes) (31)
\p{Titlecase} (= \p{Gc=Lt}) (Short: \p{Title}; /i=
Cased=Yes) (31)
\p{Titlecase_Letter} \p{General_Category=Titlecase_Letter}
(Short: \p{Lt}; /i= General_Category=
Cased_Letter) (31)
X \p{Transport_And_Map} \p{Transport_And_Map_Symbols} (= \p{Block=
Transport_And_Map_Symbols}) (128)
X \p{Transport_And_Map_Symbols} \p{Block=Transport_And_Map_Symbols}
(Short: \p{InTransportAndMap}) (128)
X \p{UCAS} \p{Unified_Canadian_Aboriginal_Syllabics}
(= \p{Block=
Unified_Canadian_Aboriginal_Syllabics})
(640)
X \p{UCAS_Ext} \p{Unified_Canadian_Aboriginal_Syllabics_-
Extended} (= \p{Block=
Unified_Canadian_Aboriginal_Syllabics_-
Extended}) (80)
\p{Ugar} \p{Ugaritic} (= \p{Script_Extensions=
Ugaritic}) (NOT \p{Block=Ugaritic}) (31)
\p{Ugaritic} \p{Script_Extensions=Ugaritic} (Short:
\p{Ugar}; NOT \p{Block=Ugaritic}) (31)
\p{UIdeo} \p{Unified_Ideograph} (=
\p{Unified_Ideograph=Y}) (80_388)
\p{UIdeo: *} \p{Unified_Ideograph: *}
\p{Unassigned} \p{General_Category=Unassigned} (Short:
\p{Cn}) (846_359 plus all above-Unicode
code points)
\p{Unicode} \p{Any} (1_114_112)
X \p{Unified_Canadian_Aboriginal_Syllabics} \p{Block=
Unified_Canadian_Aboriginal_Syllabics}
(Short: \p{InUCAS}) (640)
X \p{Unified_Canadian_Aboriginal_Syllabics_Extended} \p{Block=
Unified_Canadian_Aboriginal_Syllabics_-
Extended} (Short: \p{InUCASExt}) (80)
\p{Unified_Ideograph} \p{Unified_Ideograph=Y} (Short: \p{UIdeo})
(80_388)
\p{Unified_Ideograph: N*} (Short: \p{UIdeo=N}, \P{UIdeo})
(1_033_724 plus all above-Unicode code
points)
\p{Unified_Ideograph: Y*} (Short: \p{UIdeo=Y}, \p{UIdeo}) (80_388)
\p{Unknown} \p{Script_Extensions=Unknown} (Short:
\p{Zzzz}) (985_875 plus all above-
Unicode code points)
\p{Upper} \p{XPosixUpper} (= \p{Uppercase=Y}) (/i=
Cased=Yes) (1822)
\p{Upper: *} \p{Uppercase: *}
\p{Uppercase} \p{XPosixUpper} (= \p{Uppercase=Y}) (/i=
Cased=Yes) (1822)
\p{Uppercase: N*} (Short: \p{Upper=N}, \P{Upper}; /i= Cased=
No) (1_112_290 plus all above-Unicode
code points)
\p{Uppercase: Y*} (Short: \p{Upper=Y}, \p{Upper}; /i= Cased=
Yes) (1822)
\p{Uppercase_Letter} \p{General_Category=Uppercase_Letter}
(Short: \p{Lu}; /i= General_Category=
Cased_Letter) (1702)
\p{Vai} \p{Script_Extensions=Vai} (NOT \p{Block=
Vai}) (300)
\p{Vaii} \p{Vai} (= \p{Script_Extensions=Vai}) (NOT
\p{Block=Vai}) (300)
\p{Variation_Selector} \p{Variation_Selector=Y} (Short: \p{VS};
NOT \p{Variation_Selectors}) (259)
\p{Variation_Selector: N*} (Short: \p{VS=N}, \P{VS}) (1_113_853
plus all above-Unicode code points)
\p{Variation_Selector: Y*} (Short: \p{VS=Y}, \p{VS}) (259)
X \p{Variation_Selectors} \p{Block=Variation_Selectors} (Short:
\p{InVS}) (16)
X \p{Variation_Selectors_Supplement} \p{Block=
Variation_Selectors_Supplement} (Short:
\p{InVSSup}) (240)
X \p{Vedic_Ext} \p{Vedic_Extensions} (= \p{Block=
Vedic_Extensions}) (48)
X \p{Vedic_Extensions} \p{Block=Vedic_Extensions} (Short:
\p{InVedicExt}) (48)
X \p{Vertical_Forms} \p{Block=Vertical_Forms} (16)
\p{VertSpace} \v (7)
\p{VS} \p{Variation_Selector} (=
\p{Variation_Selector=Y}) (NOT
\p{Variation_Selectors}) (259)
\p{VS: *} \p{Variation_Selector: *}
X \p{VS_Sup} \p{Variation_Selectors_Supplement} (=
\p{Block=
Variation_Selectors_Supplement}) (240)
\p{Wara} \p{Warang_Citi} (= \p{Script_Extensions=
Warang_Citi}) (NOT \p{Block=
Warang_Citi}) (84)
\p{Warang_Citi} \p{Script_Extensions=Warang_Citi} (Short:
\p{Wara}; NOT \p{Block=Warang_Citi}) (84)
\p{WB: *} \p{Word_Break: *}
\p{White_Space} \p{White_Space=Y} (Short: \p{Space}) (25)
\p{White_Space: N*} (Short: \p{Space=N}, \P{Space}) (1_114_087
plus all above-Unicode code points)
\p{White_Space: Y*} (Short: \p{Space=Y}, \p{Space}) (25)
\p{Word} \p{XPosixWord} (119_821)
\p{Word_Break: ALetter} (Short: \p{WB=LE}) (27_992)
\p{Word_Break: CR} (Short: \p{WB=CR}) (1)
\p{Word_Break: Double_Quote} (Short: \p{WB=DQ}) (1)
\p{Word_Break: DQ} \p{Word_Break=Double_Quote} (1)
\p{Word_Break: E_Base} (Short: \p{WB=EB}) (79)
\p{Word_Break: E_Base_GAZ} (Short: \p{WB=EBG}) (4)
\p{Word_Break: E_Modifier} (Short: \p{WB=EM}) (5)
\p{Word_Break: EB} \p{Word_Break=E_Base} (79)
\p{Word_Break: EBG} \p{Word_Break=E_Base_GAZ} (4)
\p{Word_Break: EM} \p{Word_Break=E_Modifier} (5)
\p{Word_Break: EX} \p{Word_Break=ExtendNumLet} (11)
\p{Word_Break: Extend} (Short: \p{WB=Extend}) (2196)
\p{Word_Break: ExtendNumLet} (Short: \p{WB=EX}) (11)
\p{Word_Break: FO} \p{Word_Break=Format} (52)
\p{Word_Break: Format} (Short: \p{WB=FO}) (52)
\p{Word_Break: GAZ} \p{Word_Break=Glue_After_Zwj} (3)
\p{Word_Break: Glue_After_Zwj} (Short: \p{WB=GAZ}) (3)
\p{Word_Break: Hebrew_Letter} (Short: \p{WB=HL}) (74)
\p{Word_Break: HL} \p{Word_Break=Hebrew_Letter} (74)
\p{Word_Break: KA} \p{Word_Break=Katakana} (310)
\p{Word_Break: Katakana} (Short: \p{WB=KA}) (310)
\p{Word_Break: LE} \p{Word_Break=ALetter} (27_992)
\p{Word_Break: LF} (Short: \p{WB=LF}) (1)
\p{Word_Break: MB} \p{Word_Break=MidNumLet} (7)
\p{Word_Break: MidLetter} (Short: \p{WB=ML}) (9)
\p{Word_Break: MidNum} (Short: \p{WB=MN}) (15)
\p{Word_Break: MidNumLet} (Short: \p{WB=MB}) (7)
\p{Word_Break: ML} \p{Word_Break=MidLetter} (9)
\p{Word_Break: MN} \p{Word_Break=MidNum} (15)
\p{Word_Break: Newline} (Short: \p{WB=NL}) (5)
\p{Word_Break: NL} \p{Word_Break=Newline} (5)
\p{Word_Break: NU} \p{Word_Break=Numeric} (571)
\p{Word_Break: Numeric} (Short: \p{WB=NU}) (571)
\p{Word_Break: Other} (Short: \p{WB=XX}) (1_082_748 plus all
above-Unicode code points)
\p{Word_Break: Regional_Indicator} (Short: \p{WB=RI}) (26)
\p{Word_Break: RI} \p{Word_Break=Regional_Indicator} (26)
\p{Word_Break: Single_Quote} (Short: \p{WB=SQ}) (1)
\p{Word_Break: SQ} \p{Word_Break=Single_Quote} (1)
\p{Word_Break: XX} \p{Word_Break=Other} (1_082_748 plus all
above-Unicode code points)
\p{Word_Break: ZWJ} (Short: \p{WB=ZWJ}) (1)
\p{WSpace} \p{White_Space} (= \p{White_Space=Y}) (25)
\p{WSpace: *} \p{White_Space: *}
\p{XDigit} \p{XPosixXDigit} (= \p{Hex_Digit=Y}) (44)
\p{XID_Continue} \p{XID_Continue=Y} (Short: \p{XIDC})
(119_672)
\p{XID_Continue: N*} (Short: \p{XIDC=N}, \P{XIDC}) (994_440
plus all above-Unicode code points)
\p{XID_Continue: Y*} (Short: \p{XIDC=Y}, \p{XIDC}) (119_672)
\p{XID_Start} \p{XID_Start=Y} (Short: \p{XIDS}) (116_984)
\p{XID_Start: N*} (Short: \p{XIDS=N}, \P{XIDS}) (997_128
plus all above-Unicode code points)
\p{XID_Start: Y*} (Short: \p{XIDS=Y}, \p{XIDS}) (116_984)
\p{XIDC} \p{XID_Continue} (= \p{XID_Continue=Y})
(119_672)
\p{XIDC: *} \p{XID_Continue: *}
\p{XIDS} \p{XID_Start} (= \p{XID_Start=Y}) (116_984)
\p{XIDS: *} \p{XID_Start: *}
\p{Xpeo} \p{Old_Persian} (= \p{Script_Extensions=
Old_Persian}) (NOT \p{Block=
Old_Persian}) (50)
\p{XPerlSpace} \p{XPosixSpace} (25)
\p{XPosixAlnum} Alphabetic and (decimal) Numeric (Short:
\p{Alnum}) (118_820)
\p{XPosixAlpha} \p{Alphabetic=Y} (Short: \p{Alpha})
(118_240)
\p{XPosixBlank} \h, Horizontal white space (Short:
\p{Blank}) (18)
\p{XPosixCntrl} \p{General_Category=Control} Control
characters (Short: \p{Cc}) (65)
\p{XPosixDigit} \p{General_Category=Decimal_Number} [0-9]
+ all other decimal digits (Short:
\p{Nd}) (580)
\p{XPosixGraph} Characters that are graphical (Short:
\p{Graph}) (265_621)
\p{XPosixLower} \p{Lowercase=Y} (Short: \p{Lower}; /i=
Cased=Yes) (2252)
\p{XPosixPrint} Characters that are graphical plus space
characters (but no controls) (Short:
\p{Print}) (265_638)
\p{XPosixPunct} \p{Punct} + ASCII-range \p{Symbol} (757)
\p{XPosixSpace} \s including beyond ASCII and vertical tab
(Short: \p{SpacePerl}) (25)
\p{XPosixUpper} \p{Uppercase=Y} (Short: \p{Upper}; /i=
Cased=Yes) (1822)
\p{XPosixWord} \w, including beyond ASCII; = \p{Alnum} +
\pM + \p{Pc} + \p{Join_Control} (Short:
\p{Word}) (119_821)
\p{XPosixXDigit} \p{Hex_Digit=Y} (Short: \p{Hex}) (44)
\p{Xsux} \p{Cuneiform} (= \p{Script_Extensions=
Cuneiform}) (NOT \p{Block=Cuneiform})
(1234)
\p{Yi} \p{Script_Extensions=Yi} (1246)
X \p{Yi_Radicals} \p{Block=Yi_Radicals} (64)
X \p{Yi_Syllables} \p{Block=Yi_Syllables} (1168)
\p{Yiii} \p{Yi} (= \p{Script_Extensions=Yi}) (1246)
X \p{Yijing} \p{Yijing_Hexagram_Symbols} (= \p{Block=
Yijing_Hexagram_Symbols}) (64)
X \p{Yijing_Hexagram_Symbols} \p{Block=Yijing_Hexagram_Symbols}
(Short: \p{InYijing}) (64)
\p{Z} \pZ \p{Separator} (= \p{General_Category=
Separator}) (19)
\p{Zinh} \p{Inherited} (= \p{Script_Extensions=
Inherited}) (496)
\p{Zl} \p{Line_Separator} (= \p{General_Category=
Line_Separator}) (1)
\p{Zp} \p{Paragraph_Separator} (=
\p{General_Category=
Paragraph_Separator}) (1)
\p{Zs} \p{Space_Separator} (=
\p{General_Category=Space_Separator})
(17)
\p{Zyyy} \p{Common} (= \p{Script_Extensions=
Common}) (6864)
\p{Zzzz} \p{Unknown} (= \p{Script_Extensions=
Unknown}) (985_875 plus all above-
Unicode code points)
TX\p{_CanonDCIJ} (For internal use by Perl, not necessarily
stable) (= \p{Soft_Dotted=Y}) (46)
TX\p{_Case_Ignorable} (For internal use by Perl, not necessarily
stable) (= \p{Case_Ignorable=Y}) (2240)
TX\p{_CombAbove} (For internal use by Perl, not necessarily
stable) (= \p{Canonical_Combining_Class=
Above}) (461)
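
Two of the annotations used in the table above, C<NOT \p{Block=...}> and C</i=>, can be checked directly in a pattern. The following is a minimal sketch and not part of the generated table. The sample characters (U+0E01 THAI CHARACTER KO KAI and U+0E3F THAI CURRENCY SYMBOL BAHT) and the C<%result> hash are chosen here purely for illustration.

```perl
use strict;
use warnings;

# U+0E01 THAI CHARACTER KO KAI is a Thai letter inside the Thai block.
my $ko_kai = "\x{0E01}";
# U+0E3F THAI CURRENCY SYMBOL BAHT sits in the Thai block, but its
# script is Common, so the single form \p{Thai} should not match it.
my $baht = "\x{0E3F}";

my %result = (
    ko_kai_is_thai_script => ($ko_kai =~ /\p{Thai}/       ? 1 : 0),
    ko_kai_in_thai_block  => ($ko_kai =~ /\p{Block=Thai}/ ? 1 : 0),
    baht_is_thai_script   => ($baht   =~ /\p{Thai}/       ? 1 : 0),
    baht_in_thai_block    => ($baht   =~ /\p{Block=Thai}/ ? 1 : 0),
    # Entries annotated "/i= Cased=Yes" match any cased character
    # under /i, so a lowercase letter satisfies \p{Upper} there.
    lower_is_upper        => ("a" =~ /\p{Upper}/  ? 1 : 0),
    lower_is_upper_nocase => ("a" =~ /\p{Upper}/i ? 1 : 0),
);

printf "%-22s %d\n", $_, $result{$_} for sort keys %result;
```

The single form C<\p{Thai}> is about the script of a character, while C<\p{Block=Thai}> is only about the code point range it was allocated in, which is why the two can disagree.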
=head2 Legal C<\p{}> and C<\P{}> constructs that match no characters
Unicode has some property-value pairs that currently don't match anything.
This generally happens either because they are obsolete or because they exist
for symmetry with other forms but no language that uses them has been encoded
yet. In this version of Unicode, the following match zero code points:
=over 4
=item \p{Canonical_Combining_Class=Attached_Below_Left}
=item \p{Canonical_Combining_Class=CCC133}
=back
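
One way to confirm that such a construct is legal yet matches nothing is to run it against a string that holds every Unicode code point. This is only a sketch under the assumption that the installed Perl uses a Unicode version in which these two values are still empty. The C<Above> value is included as a contrasting case that does match, and the variable names are illustrative.

```perl
use strict;
use warnings;

# One string holding every Unicode code point, U+0000..U+10FFFF.
# Surrogates and noncharacters are included deliberately.
my $all_code_points = join '', map { chr $_ } 0 .. 0x10FFFF;

my %pattern = (
    Attached_Below_Left =>
        qr/\p{Canonical_Combining_Class=Attached_Below_Left}/,
    CCC133 => qr/\p{Canonical_Combining_Class=CCC133}/,
    # Contrasting case: this value is listed with a non-zero count.
    Above  => qr/\p{Canonical_Combining_Class=Above}/,
);

# Record whether each pattern finds at least one code point.
my %matched =
    map { $_ => ($all_code_points =~ $pattern{$_} ? 1 : 0) } keys %pattern;

printf "%-20s %s\n", $_, ($matched{$_} ? "matches" : "matches nothing")
    for sort keys %matched;
```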
=head1 Properties accessible through Unicode::UCD
The value of any Unicode (not including Perl extensions) character
property mentioned above for any single code point is available through
L<Unicode::UCD/charprop()>. L<Unicode::UCD/charprops_all()> returns the
values of all the Unicode properties for a given code point.
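
As a rough illustration of these two functions, the sketch below looks up two properties of a single code point and then fetches the whole set. It assumes a Perl whose C<Unicode::UCD> exports C<charprop()> and C<charprops_all()>. The choice of "A" and the variable names are for illustration only.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charprop charprops_all);

my $code_point = ord "A";    # U+0041, passed as a decimal integer

# Look up single properties of the code point by name.
my $general_category = charprop($code_point, "General_Category");
my $script           = charprop($code_point, "Script");

# charprops_all() returns a hash reference keyed by property name.
my $all_properties = charprops_all($code_point);

printf "U+%04X General_Category=%s Script=%s\n",
    $code_point, $general_category, $script;
printf "charprops_all() returned %d properties\n",
    scalar keys %$all_properties;
```

C<charprop()> suits a lookup of one or two properties. C<charprops_all()> has to consult every property table, so it is better kept for debugging or exploration.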
Besides these, all the Unicode character properties mentioned above
(except for those marked as for internal use by Perl) are also
accessible by L<Unicode::UCD/prop_invlist()>.
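
The counts shown in parentheses in the table above can be recomputed from an inversion list. The following sketch assumes the usual inversion-list layout, in which even-indexed elements start a range that is in the set and odd-indexed elements start a range that is not. The helper C<count_code_points()> is a hypothetical name introduced here and is not part of C<Unicode::UCD>.

```perl
use strict;
use warnings;
use Unicode::UCD qw(prop_invlist);

# Count the Unicode code points (U+0000..U+10FFFF) that a property matches.
# An unknown property yields an empty list and therefore a count of 0.
sub count_code_points {
    my ($property) = @_;
    my @invlist = prop_invlist($property);

    my $count = 0;
    for (my $i = 0; $i < @invlist; $i += 2) {
        my $first = $invlist[$i];
        # A missing end element means the range runs to infinity;
        # clamp every range to the last Unicode code point.
        my $last = $i + 1 < @invlist ? $invlist[$i + 1] - 1 : 0x10FFFF;
        $last = 0x10FFFF if $last > 0x10FFFF;
        $count += $last - $first + 1 if $first <= $last;
    }
    return $count;
}

for my $property (qw(ASCII_Hex_Digit White_Space Soft_Dotted)) {
    printf "%-16s %d\n", $property, count_code_points($property);
}
```

Clamping to U+10FFFF mirrors the way the table reports "plus all above-Unicode code points" separately from the numeric count.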
Due to their nature, not all Unicode character properties are suitable for
regular expression matches or for C<prop_invlist()>. The remaining
non-provisional, non-internal ones are accessible via
L<Unicode::UCD/prop_invmap()> (except for those that this Perl installation
hasn't included; see L<below for which those are|/Unicode character properties
that are NOT accepted by Perl>).
For compatibility with other parts of Perl, all the single forms given in the
table in the L<section above|/Properties accessible through \p{} and \P{}>
are recognized. However, some Perl extensions are ambiguous with Unicode
properties, and every such ambiguity is silently resolved in favor of the
official Unicode property. To avoid surprises, use C<prop_invmap()> only for
forms listed in the table below, which omits the non-recommended ones. The
affected forms are the Perl single-form equivalents of Unicode properties.
For example, C<\p{sc}> is a single-form equivalent of C<\p{gc=sc}>, but
C<prop_invmap()> treats C<sc> as the C<Script> property, whose short name is
C<sc>. The table indicates the current ambiguities in the INFO column,
beginning with the word C<"NOT">.
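
The C<sc> ambiguity can be observed by asking C<prop_invmap()> for both names and comparing what each reports for the same code point. This is a minimal sketch, assuming C<Unicode::UCD> exports C<search_invlist()> alongside C<prop_invmap()>. The helper C<mapped_value()> is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;
use Unicode::UCD qw(prop_invmap search_invlist);

# Return the value that prop_invmap($property) maps $code_point to.
sub mapped_value {
    my ($property, $code_point) = @_;
    my ($list_ref, $map_ref, $format, $default) = prop_invmap($property);
    die "prop_invmap() does not know '$property'\n"
        unless defined $list_ref;

    # Find the range containing the code point, then read the
    # corresponding element of the parallel map array.
    my $index = search_invlist($list_ref, $code_point);
    return $map_ref->[$index];
}

my $dollar    = ord '$';    # U+0024 DOLLAR SIGN, General_Category=Sc
my $via_short = mapped_value("sc",     $dollar);
my $via_full  = mapped_value("Script", $dollar);

print "sc => $via_short, Script => $via_full\n";
```

If C<sc> were read as C<General_Category=Currency_Symbol>, the two lookups would differ. Since it is read as C<Script>, both should report the dollar sign's script.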
The standard Unicode properties listed below are documented in
L<http://www.unicode.org/reports/tr44/>; Perl_Decimal_Digit is documented in
L<Unicode::UCD/prop_invmap()>. The other Perl extensions are in
L<perlunicode/Other Properties>.
The first column in the table is a name for the property; the second column is
an alternative name, if any, plus possibly some annotations. The alternative
name is the property's full name, unless that would simply repeat the first
column, in which case the second column indicates the property's short name
(if different). The annotations are given only in the entry for the full
name. If a property is obsolete, etc., the entry will be flagged with the same
characters used in the table in the L<section above|/Properties accessible
through \p{} and \P{}>, like B<D> or B<S>.
NAME INFO
Age
AHex ASCII_Hex_Digit
All (Perl extension). All code points,
including those above Unicode. Same as
qr/./s
Alnum XPosixAlnum. (Perl extension)
Alpha Alphabetic
Alphabetic (Short: Alpha)
Any (Perl extension). All Unicode code
points: [\x{0000}-\x{10FFFF}]
ASCII Block=ASCII. (Perl extension).
[[:ASCII:]]
ASCII_Hex_Digit (Short: AHex)
Assigned (Perl extension). All assigned code points
Bc Bidi_Class
Bidi_C Bidi_Control
Bidi_Class (Short: bc)
Bidi_Control (Short: Bidi_C)
Bidi_M Bidi_Mirrored
Bidi_Mirrored (Short: Bidi_M)
Bidi_Mirroring_Glyph (Short: bmg)
Bidi_Paired_Bracket (Short: bpb)
Bidi_Paired_Bracket_Type (Short: bpt)
Blank XPosixBlank. (Perl extension)
Blk Block
Block (Short: blk)
Bmg Bidi_Mirroring_Glyph
Bpb Bidi_Paired_Bracket
Bpt Bidi_Paired_Bracket_Type
Canonical_Combining_Class (Short: ccc)
Case_Folding (Short: cf)
Case_Ignorable (Short: CI)
Cased
Category General_Category
Ccc Canonical_Combining_Class
CE Composition_Exclusion
Cf Case_Folding; NOT 'cf' meaning
'General_Category=Format'
Changes_When_Casefolded (Short: CWCF)
Changes_When_Casemapped (Short: CWCM)
Changes_When_Lowercased (Short: CWL)
Changes_When_NFKC_Casefolded (Short: CWKCF)
Changes_When_Titlecased (Short: CWT)
Changes_When_Uppercased (Short: CWU)
CI Case_Ignorable
Cntrl General_Category=XPosixCntrl. (Perl
extension)
Comp_Ex Full_Composition_Exclusion
Composition_Exclusion (Short: CE)
CWCF Changes_When_Casefolded
CWCM Changes_When_Casemapped
CWKCF Changes_When_NFKC_Casefolded
CWL Changes_When_Lowercased
CWT Changes_When_Titlecased
CWU Changes_When_Uppercased
Dash
Decomposition_Mapping (Short: dm)
Decomposition_Type (Short: dt)
Default_Ignorable_Code_Point (Short: DI)
Dep Deprecated
Deprecated (Short: Dep)
DI Default_Ignorable_Code_Point
Dia Diacritic
Diacritic (Short: Dia)
Digit General_Category=XPosixDigit. (Perl
extension)
Dm Decomposition_Mapping
Dt Decomposition_Type
Ea East_Asian_Width
East_Asian_Width (Short: ea)
Ext Extender
Extender (Short: Ext)
Full_Composition_Exclusion (Short: Comp_Ex)
Gc General_Category
GCB Grapheme_Cluster_Break
General_Category (Short: gc)
Gr_Base Grapheme_Base
Gr_Ext Grapheme_Extend
Graph XPosixGraph. (Perl extension)
Grapheme_Base (Short: Gr_Base)
Grapheme_Cluster_Break (Short: GCB)
Grapheme_Extend (Short: Gr_Ext)
Hangul_Syllable_Type (Short: hst)
Hex Hex_Digit
Hex_Digit (Short: Hex)
HorizSpace XPosixBlank. (Perl extension)
Hst Hangul_Syllable_Type
D Hyphen Supplanted by Line_Break property values;
see www.unicode.org/reports/tr14
ID_Continue (Short: IDC)
ID_Start (Short: IDS)
IDC ID_Continue
Ideo Ideographic
Ideographic (Short: Ideo)
IDS ID_Start
IDS_Binary_Operator (Short: IDSB)
IDS_Trinary_Operator (Short: IDST)
IDSB IDS_Binary_Operator
IDST IDS_Trinary_Operator
In Present_In. (Perl extension)
Indic_Positional_Category (Short: InPC)
Indic_Syllabic_Category (Short: InSC)
InPC Indic_Positional_Category
InSC Indic_Syllabic_Category
Isc ISO_Comment; NOT 'isc' meaning
'General_Category=Other'
ISO_Comment (Short: isc)
Jg Joining_Group
Join_C Join_Control
Join_Control (Short: Join_C)
Joining_Group (Short: jg)
Joining_Type (Short: jt)
Jt Joining_Type
Lb Line_Break
Lc Lowercase_Mapping; NOT 'lc' meaning
'General_Category=Cased_Letter'
Line_Break (Short: lb)
LOE Logical_Order_Exception
Logical_Order_Exception (Short: LOE)
Lower Lowercase
Lowercase (Short: Lower)
Lowercase_Mapping (Short: lc)
Math
Na Name
Na1 Unicode_1_Name
Name (Short: na)
Name_Alias
NChar Noncharacter_Code_Point
NFC_QC NFC_Quick_Check
NFC_Quick_Check (Short: NFC_QC)
NFD_QC NFD_Quick_Check
NFD_Quick_Check (Short: NFD_QC)
NFKC_Casefold (Short: NFKC_CF)
NFKC_CF NFKC_Casefold
NFKC_QC NFKC_Quick_Check
NFKC_Quick_Check (Short: NFKC_QC)
NFKD_QC NFKD_Quick_Check
NFKD_Quick_Check (Short: NFKD_QC)
Noncharacter_Code_Point (Short: NChar)
Nt Numeric_Type
Numeric_Type (Short: nt)
Numeric_Value (Short: nv)
Nv Numeric_Value
Pat_Syn Pattern_Syntax
Pat_WS Pattern_White_Space
Pattern_Syntax (Short: Pat_Syn)
Pattern_White_Space (Short: Pat_WS)
PCM Prepended_Concatenation_Mark
Perl_Decimal_Digit (Perl extension)
PerlSpace PosixSpace. (Perl extension)
PerlWord PosixWord. (Perl extension)
PosixAlnum (Perl extension). [A-Za-z0-9]
PosixAlpha (Perl extension). [A-Za-z]
PosixBlank (Perl extension). \t and ' '
PosixCntrl (Perl extension). ASCII control
characters: NUL, SOH, STX, ETX, EOT, ENQ,
ACK, BEL, BS, HT, LF, VT, FF, CR, SO, SI,
DLE, DC1, DC2, DC3, DC4, NAK, SYN, ETB,
CAN, EOM, SUB, ESC, FS, GS, RS, US, and DEL
PosixDigit (Perl extension). [0-9]
PosixGraph (Perl extension). [-!"#$%&'()*+,./:;<=
>?@[\\]^_`{|}~0-9A-Za-z]
PosixLower (Perl extension). [a-z]
PosixPrint (Perl extension). [- 0-9A-Za-
z!"#$%&'()*+,./:;<=>?@[\\]^_`{|}~]
PosixPunct (Perl extension). [-!"#$%&'()*+,./:;<=
>?@[\\]^_`{|}~]
PosixSpace (Perl extension). \t, \n, \cK, \f, \r,
and ' '. (\cK is vertical tab)
PosixUpper (Perl extension). [A-Z]
PosixWord (Perl extension). \w, restricted to ASCII
= [A-Za-z0-9_]
PosixXDigit (Perl extension). [0-9A-Fa-f]
Prepended_Concatenation_Mark (Short: PCM)
Present_In (Short: In). (Perl extension)
Print XPosixPrint. (Perl extension)
Punct General_Category=Punct. (Perl extension)
QMark Quotation_Mark
Quotation_Mark (Short: QMark)
Radical
SB Sentence_Break
Sc Script; NOT 'sc' meaning
'General_Category=Currency_Symbol'
Scf Simple_Case_Folding
Script (Short: sc)
Script_Extensions (Short: scx)
Scx Script_Extensions
SD Soft_Dotted
Sentence_Break (Short: SB)
Sentence_Terminal (Short: STerm)
Sfc Simple_Case_Folding
Simple_Case_Folding (Short: scf)
Simple_Lowercase_Mapping (Short: slc)
Simple_Titlecase_Mapping (Short: stc)
Simple_Uppercase_Mapping (Short: suc)
Slc Simple_Lowercase_Mapping
Soft_Dotted (Short: SD)
Space White_Space
SpacePerl XPosixSpace. (Perl extension)
Stc Simple_Titlecase_Mapping
STerm Sentence_Terminal
Suc Simple_Uppercase_Mapping
Tc Titlecase_Mapping
Term Terminal_Punctuation
Terminal_Punctuation (Short: Term)
Title Titlecase. (Perl extension)
Titlecase (Short: Title). (Perl extension). (=
\p{Gc=Lt})
Titlecase_Mapping (Short: tc)
Uc Uppercase_Mapping
UIdeo Unified_Ideograph
Unicode Any. (Perl extension)
Unicode_1_Name (Short: na1)
Unified_Ideograph (Short: UIdeo)
Upper Uppercase
Uppercase (Short: Upper)
Uppercase_Mapping (Short: uc)
Variation_Selector (Short: VS)
VertSpace (Perl extension). \v
VS Variation_Selector
WB Word_Break
White_Space (Short: WSpace)
Word XPosixWord. (Perl extension)
Word_Break (Short: WB)
WSpace White_Space
XDigit XPosixXDigit. (Perl extension)
XID_Continue (Short: XIDC)
XID_Start (Short: XIDS)
XIDC XID_Continue
XIDS XID_Start
XPerlSpace XPosixSpace. (Perl extension)
XPosixAlnum (Short: Alnum). (Perl extension).
Alphabetic and (decimal) Numeric
XPosixAlpha (Perl extension)
XPosixBlank (Short: Blank). (Perl extension). \h,
Horizontal white space
XPosixCntrl General_Category=XPosixCntrl (Short:
Cntrl). (Perl extension). Control
characters
XPosixDigit General_Category=XPosixDigit (Short:
Digit). (Perl extension). [0-9] + all
other decimal digits
XPosixGraph (Short: Graph). (Perl extension).
Characters that are graphical
XPosixLower (Perl extension)
XPosixPrint (Short: Print). (Perl extension).
Characters that are graphical plus space
characters (but no controls)
XPosixPunct (Perl extension). \p{Punct} + ASCII-range
\p{Symbol}
XPosixSpace (Perl extension). \s including beyond
ASCII and vertical tab
XPosixUpper (Perl extension)
XPosixWord (Short: Word). (Perl extension). \w,
including beyond ASCII; = \p{Alnum} + \pM
+ \p{Pc} + \p{Join_Control}
XPosixXDigit (Short: XDigit). (Perl extension)
=head1 Properties accessible through other means
Certain properties are accessible also via core function calls. These are:
Lowercase_Mapping lc() and lcfirst()
Titlecase_Mapping ucfirst()
Uppercase_Mapping uc()
Also, Case_Folding is accessible through the C</i> modifier in regular
expressions, the C<\F> transliteration escape, and the C<L<fc|perlfunc/fc>>
operator.
And, the Name and Name_Alias properties are accessible through the C<\N{}>
interpolation in double-quoted strings and regular expressions; and functions
C<charnames::viacode()>, C<charnames::vianame()>, and
C<charnames::string_vianame()> (which require a C<use charnames ();> to be
specified).
Finally, most properties related to decomposition are accessible via
L<Unicode::Normalize>.
=head1 Unicode character properties that are NOT accepted by Perl
Perl will generate an error for a few character properties in Unicode when
used in a regular expression. The non-Unihan ones are listed below, with the
reasons they are not accepted, perhaps with work-arounds. The short names for
the properties are listed enclosed in (parentheses).
As described after the list, an installation can change the defaults and choose
to accept any of these. The list is machine generated based on the
choices made for the installation that generated this document.
=over 4
=item I<Expands_On_NFC> (XO_NFC)
=item I<Expands_On_NFD> (XO_NFD)
=item I<Expands_On_NFKC> (XO_NFKC)
=item I<Expands_On_NFKD> (XO_NFKD)
Deprecated by Unicode. These are characters that expand to more than one character in the specified normalization form, but whether they actually take up more bytes or not depends on the encoding being used. For example, a UTF-8 encoded character may expand to a different number of bytes than a UTF-32 encoded character.
=item I<Grapheme_Link> (Gr_Link)
Deprecated by Unicode: Duplicates ccc=vr (Canonical_Combining_Class=Virama)
=item I<Jamo_Short_Name> (JSN)
=item I<Other_Alphabetic> (OAlpha)
=item I<Other_Default_Ignorable_Code_Point> (ODI)
=item I<Other_Grapheme_Extend> (OGr_Ext)
=item I<Other_ID_Continue> (OIDC)
=item I<Other_ID_Start> (OIDS)
=item I<Other_Lowercase> (OLower)
=item I<Other_Math> (OMath)
=item I<Other_Uppercase> (OUpper)
Used by Unicode internally for generating other properties and not intended to be used stand-alone
=item I<Script=Katakana_Or_Hiragana> (sc=Hrkt)
Obsolete. All code points previously matched by this have been moved to "Script=Common". Consider instead using "Script_Extensions=Katakana" or "Script_Extensions=Hiragana" (or both)
=item I<Script_Extensions=Katakana_Or_Hiragana> (scx=Hrkt)
All code points that would be matched by this are matched by either "Script_Extensions=Katakana" or "Script_Extensions=Hiragana"
=back
An installation can choose to allow any of these to be matched by downloading
the Unicode database from L<http://www.unicode.org/Public/> to
C<$Config{privlib}>/F<unicore/> in the Perl source tree, changing the
controlling lists contained in the program
C<$Config{privlib}>/F<unicore/mktables> and then re-compiling and installing.
(C<%Config> is available from the Config module).
Also, perl can be recompiled to operate on an earlier version of the Unicode
standard. Further information is at
C<$Config{privlib}>/F<unicore/README.perl>.
=head1 Other information in the Unicode data base
The Unicode data base is delivered in two different formats. The XML version
is valid for more modern Unicode releases. The other version is a collection
of files. The two are intended to give equivalent information. Perl uses the
older form; this allows you to recompile Perl to use early Unicode releases.
The only non-character property that Perl currently supports is Named
Sequences, in which a sequence of code points
is given a name and generally treated as a single entity. (Perl supports
these via the C<\N{...}> double-quotish construct,
L<charnames/charnames::string_vianame(name)>, and L<Unicode::UCD/namedseq()>.)
Below is a list of the files in the Unicode data base that Perl doesn't
currently use, along with very brief descriptions of their purposes.
Some of the names of the files have been shortened from those that Unicode
uses, in order to allow them to be distinguishable from similarly named files
on file systems for which only the first 8 characters of a name are
significant.
=over 4
=item F<auxiliary/GraphemeBreakTest.html>
=item F<auxiliary/LineBreakTest.html>
=item F<auxiliary/SentenceBreakTest.html>
=item F<auxiliary/WordBreakTest.html>
Documentation of validation Tests
=item F<BidiCharacterTest.txt>
=item F<BidiTest.txt>
=item F<NormTest.txt>
Validation Tests
=item F<CJKRadicals.txt>
Maps the kRSUnicode property values to corresponding code points
=item F<EmojiSources.txt>
Maps certain Unicode code points to their legacy Japanese cell-phone values
=item F<Index.txt>
Alphabetical index of Unicode characters
=item F<NamedSqProv.txt>
Named sequences proposed for inclusion in a later version of the Unicode Standard; if you need them now, you can append this file to F<NamedSequences.txt> and recompile perl
=item F<NamesList.html>
Describes the format and contents of F<NamesList.txt>
=item F<NamesList.txt>
Annotated list of characters
=item F<NormalizationCorrections.txt>
Documentation of corrections already incorporated into the Unicode data base
=item F<ReadMe.txt>
Documentation
=item F<StandardizedVariants.html>
Obsoleted as of Unicode 9.0, but previously provided a visual display of the standard variant sequences derived from F<StandardizedVariants.txt>.
=item F<StandardizedVariants.txt>
Certain glyph variations for character display are standardized. This lists the non-Unihan ones; the Unihan ones are also not used by Perl, and are in a separate Unicode data base L<http://www.unicode.org/ivd>
=item F<TangutSources.txt>
Specifies source mappings for Tangut ideographs and components. This data file also includes informative radical-stroke values that are used internally by Unicode
=item F<USourceData.txt>
Documentation of status and cross reference of proposals for encoding by Unicode of Unihan characters
=item F<USourceGlyphs.pdf>
Pictures of the characters in F<USourceData.txt>
=back
=head1 SEE ALSO
L<http://www.unicode.org/reports/tr44/>
L<perlrecharclass>
L<perlunicode>
=head1 NAME
perlguts - Introduction to the Perl API
=head1 DESCRIPTION
This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.
=head1 Variables
=head2 Datatypes
Perl has three typedefs that handle Perl's three main data types:
SV Scalar Value
AV Array Value
HV Hash Value
Each typedef has specific routines that manipulate the various data types.
=head2 What is an "IV"?
Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.
Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.
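As a rough illustration of why a pointer-sized integer matters, the following plain-C sketch round-trips a pointer through an integer. It uses C99's C<intptr_t> as a stand-in for C<IV>; that substitution, the C<toy_iv> typedef and the function name C<roundtrip_through_integer> are assumptions made for this example and are not Perl's definitions. In real extension code the C<PTR2IV> and C<INT2PTR> macros perform these conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Perl's IV in this sketch only: a signed integer type
 * wide enough to hold a pointer. Perl picks the real IV type when it
 * is configured. */
typedef intptr_t toy_iv;

/* Store a pointer in the integer type and recover it again. */
int *roundtrip_through_integer(int *p)
{
    toy_iv as_integer;

    assert(sizeof(toy_iv) >= sizeof(void *));
    as_integer = (toy_iv)(void *)p;     /* pointer -> integer */
    return (int *)(void *)as_integer;   /* integer -> pointer */
}
```

Because the integer type is at least as wide as a pointer, the recovered pointer compares equal to the original one.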
=head2 Working with SVs
An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
("PV" stands for "Pointer Value". You might think that it is misnamed
because it is described as pointing only to strings. However, it is
possible to have it point to other things. For example, it could point
to an array of UVs. But,
using it for non-strings requires care, as the underlying assumption of
much of the internals is that PVs are just for strings. Often, for
example, a trailing C<NUL> is tacked on automatically. The non-string use
is documented only in this paragraph.)
The seven routines are:
SV* newSViv(IV);
SV* newSVuv(UV);
SV* newSVnv(double);
SV* newSVpv(const char*, STRLEN);
SV* newSVpvn(const char*, STRLEN);
SV* newSVpvf(const char*, ...);
SV* newSVsv(SV*);
C<STRLEN> is an integer type (Size_t, usually defined as size_t in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.
In the unlikely case of an SV requiring more complex initialization, you
can create an empty SV with newSV(len). If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has the undef value.
SV *sv = newSV(0); /* no storage allocated */
SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage
* allocated */
To change the value of an I<already-existing> SV, there are eight routines:
void sv_setiv(SV*, IV);
void sv_setuv(SV*, UV);
void sv_setnv(SV*, double);
void sv_setpv(SV*, const char*);
void sv_setpvn(SV*, const char*, STRLEN);
void sv_setpvf(SV*, const char*, ...);
void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
SV **, I32, bool *);
void sv_setsv(SV*, SV*);
Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a C<NUL> character, and not otherwise containing
NULs.
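The difference between a measured length and a supplied length can be seen without any Perl headers. The sketch below is plain C, not the Perl API: C<length_by_strlen> models what a C<strlen>-based call such as C<sv_setpv> would see, and C<copy_with_length> models a length-taking call such as C<sv_setpvn>. Both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Length as a strlen-based call would see it: counting stops at the
 * first NUL byte. */
size_t length_by_strlen(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);
}

/* Copy exactly n bytes, embedded NULs included, then add a trailing NUL,
 * the way a length-taking call would. dst must have room for n + 1 bytes. */
size_t copy_with_length(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

For the five bytes C<"ab\0cd">, the first function reports 2 while the second preserves all five bytes, which is why data that may contain NULs should go through the length-taking routines.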
The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.
C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.
The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L</Magic Virtual Tables> later in this document.
All SVs that contain strings should be terminated with a C<NUL> character.
If a string is not C<NUL>-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a C<NUL>-terminated string.
Perl's own functions typically add a trailing C<NUL> for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.
To access the actual value that an SV points to, you can use the macros:
SvIV(SV*)
SvUV(SV*)
SvNV(SV*)
SvPV(SV*, STRLEN len)
SvPV_nolen(SV*)
which will automatically coerce the actual scalar type into an IV, UV, double,
or string.
In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may contain NULs and
might not be terminated by a C<NUL>.
Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. It might work with your
compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:
SV *s;
STRLEN len;
char *ptr;
ptr = SvPV(s, len);
foo(ptr, len);
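The reason the one-line form is unsafe is that C does not fix the order in which function arguments are evaluated, so C<len> may be read before the macro has assigned it. The following plain-C sketch reproduces the safe pattern with a toy macro; C<struct toy_sv>, C<TOY_PV>, C<count_bytes> and C<safe_call> are all hypothetical stand-ins and not part of the Perl API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SV holding a string; not Perl's real layout. */
struct toy_sv {
    const char *pv;  /* string bytes */
    size_t cur;      /* string length */
};

/* Like SvPV, this macro assigns the length to its second argument as a
 * side effect and evaluates to the string pointer. */
#define TOY_PV(sv, len) ((len) = (sv)->cur, (sv)->pv)

/* Hypothetical consumer that needs both the pointer and the length. */
size_t count_bytes(const char *ptr, size_t len)
{
    assert(ptr != NULL);
    return len;
}

size_t safe_call(const struct toy_sv *sv)
{
    size_t len = 0;
    const char *ptr;

    /* Unsafe form: count_bytes(TOY_PV(sv, len), len);
     * the two arguments are not sequenced relative to each other, so
     * the second one may be read before the macro has set len. */
    ptr = TOY_PV(sv, len);         /* len is assigned here ... */
    return count_bytes(ptr, len);  /* ... and only read afterwards */
}
```

Splitting the statement puts a sequence point between the assignment to C<len> and its use.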
If you want to know if the scalar value is TRUE, you can use:
SvTRUE(SV*)
Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro
SvGROW(SV*, STRLEN newlen)
which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing C<NUL> byte (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).
If you want to write to an existing SV's buffer and set its value to a
string, use SvPV_force() or one of its variants to force the SV to be
a PV. This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV. This can be
used, for example, to append data from an API function to a buffer
without extra copying:
(void)SvPVbyte_force(sv, len);
s = SvGROW(sv, len + needlen + 1);
/* something that modifies up to needlen bytes at s+len, but
modifies newlen bytes
eg. newlen = read(fd, s + len, needlen);
ignoring errors for these examples
*/
s[len + newlen] = '\0';
SvCUR_set(sv, len + newlen);
SvUTF8_off(sv);
SvSETMAGIC(sv);
If you already have the data in memory or if you want to keep your
code simple, you can use one of the sv_cat*() variants, such as
sv_catpvn(). If you want to insert anywhere in the string you can use
sv_insert() or sv_insert_flags().
If you don't need the existing content of the SV, you can avoid some
copying with:
SvPVCLEAR(sv);
s = SvGROW(sv, needlen + 1);
/* something that modifies up to needlen bytes at s, but modifies
newlen bytes
eg. newlen = read(fd, s, needlen);
*/
s[newlen] = '\0';
SvCUR_set(sv, newlen);
SvPOK_only(sv); /* also clears SVf_UTF8 */
SvSETMAGIC(sv);
Again, if you already have the data in memory or want to avoid the
complexity of the above, you can use sv_setpvn().
If you have a buffer allocated with Newx() and want to set that as the
SV's value, you can use sv_usepvn_flags(). That has some requirements
if you want to avoid perl re-allocating the buffer to fit the trailing
NUL:
Newx(buf, somesize+1, char);
/* ... fill in buf ... */
buf[somesize] = '\0';
sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
/* buf now belongs to perl, don't release it */
If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.
SvIOK(SV*)
SvNOK(SV*)
SvPOK(SV*)
You can get and set the current length of the string stored in an SV with
the following macros:
SvCUR(SV*)
SvCUR_set(SV*, I32 val)
You can also get a pointer to the end of the string stored in the SV
with the macro:
SvEND(SV*)
But note that these last three macros are valid only if C<SvPOK()> is true.
If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:
void sv_catpv(SV*, const char*);
void sv_catpvn(SV*, const char*, STRLEN);
void sv_catpvf(SV*, const char*, ...);
void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
I32, bool *);
void sv_catsv(SV*, SV*);
The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function
extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.
The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L</Magic Virtual Tables> later in this document.
If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:
SV* get_sv("package::varname", 0);
This returns NULL if the variable does not exist.
If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:
SvOK(SV*)
The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.
Its address can be used whenever an C<SV*> is needed. Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example
when interfacing Perl code, it'll work correctly for:
foo(undef);
But won't work when called as:
$x = undef;
foo($x);
So, to repeat, always use SvOK() to check whether an sv is defined.
Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L</AVs, HVs and undefined values>).
There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.
Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:
SV* sv = (SV*) 0;
if (I-am-to-return-a-real-value) {
sv = sv_2mortal(newSViv(42));
}
sv_setsv(ST(0), sv);
This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.
To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L</Reference Counts and Mortality>).
=head2 Offsets
Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
accordingly. (A portion of the space between the old and new PV
pointers is used to store the count of chopped bytes.)
Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.
This is best demonstrated by example. Normally copy-on-write will prevent
the substitution operator from using this hack, but if you can craft a
string for which copy-on-write is not possible, you can see it in play. In
the current implementation, the final byte of a string buffer is used as a
copy-on-write reference count. If the buffer is not big enough, then
copy-on-write is skipped. First have a look at an empty string:
% ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7ffb7bc05b50 ""\0
CUR = 0
LEN = 10
Notice here the LEN is 10. (It may differ on your platform.) Extend the
length of the string to one less than 10, and do a substitution:
% ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
Dump($a)'
SV = PV(0x7ffa04008a70) at 0x7ffa04030390
REFCNT = 1
FLAGS = (POK,OOK,pPOK)
OFFSET = 1
PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
CUR = 8
LEN = 9
Here the number of bytes chopped off (1) is shown next as the OFFSET. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one. (The first character of the string
buffer happens to have changed to "\1" here, not "1", because the current
implementation stores the offset count in the string buffer. This is
subject to change.)
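The bookkeeping behind the offset hack can be modelled in a few lines of plain C. This is a toy sketch, not Perl's implementation: C<struct toy_pv>, C<toy_chop> and C<toy_alloc_start> are hypothetical names, and the chopped-byte count is kept in a separate field here, whereas the current Perl implementation stores it inside the string buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a string body. The field names echo SvPVX, SvCUR and
 * SvLEN, but this is not Perl's real structure. */
struct toy_pv {
    char *pv;       /* visible start of the string (SvPVX) */
    size_t cur;     /* visible string length (SvCUR) */
    size_t len;     /* allocated bytes from pv onwards (SvLEN) */
    size_t offset;  /* bytes chopped so far; zero means no offset */
};

/* Discard everything before ptr without moving any bytes. */
void toy_chop(struct toy_pv *s, char *ptr)
{
    size_t delta;

    assert(ptr >= s->pv && ptr <= s->pv + s->cur);
    delta = (size_t)(ptr - s->pv);
    s->pv += delta;      /* move the visible start forward */
    s->cur -= delta;     /* the string is now shorter ... */
    s->len -= delta;     /* ... and so is the space after pv */
    s->offset += delta;  /* remember how far we have moved */
}

/* The address that must eventually be handed back to the allocator. */
char *toy_alloc_start(const struct toy_pv *s)
{
    return s->pv - s->offset;
}
```

Chopping one byte from a nine-byte string held in a ten-byte buffer leaves C<cur> at 8, C<len> at 9 and C<offset> at 1, mirroring the CUR, LEN and OFFSET values in the dump above.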
Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.
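The array variant of the trick can be sketched the same way. The model below is plain C with C<int> elements instead of C<SV*> pointers; C<struct toy_av> and C<toy_shift> are hypothetical names chosen to echo C<AvALLOC>, C<AvARRAY>, C<AvFILL> and C<AvMAX>, and the code is an assumption-laden illustration rather than the logic in F<av.c>.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an array body; not Perl's real structure. */
struct toy_av {
    int *alloc;      /* real start of the C array (AvALLOC) */
    int *array;      /* first element visible from Perl (AvARRAY) */
    ptrdiff_t fill;  /* highest visible index, -1 when empty (AvFILL) */
    ptrdiff_t max;   /* highest allocated index, counted from array (AvMAX) */
};

/* Remove and return the first element without moving the others. */
int toy_shift(struct toy_av *av)
{
    int first;

    assert(av->fill >= 0);  /* shifting an empty array is not modelled */
    first = av->array[0];
    av->array++;            /* the visible start moves forward by one */
    av->fill--;             /* one fewer visible element */
    av->max--;              /* one fewer slot after the visible start */
    return first;
}
```

After a shift, C<alloc> still points at the original allocation, so only the code that frees the array needs to look at it.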
=head2 What's Really Stored in an SV?
Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.
If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:
SvIOKp(SV*)
SvNOKp(SV*)
SvPOKp(SV*)
These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.
There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false. (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.) Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.
In general, though, it's best to use the C<Sv*V> macros.
=head2 Working with AVs
There are two ways to create and load an AV. The first method creates an
empty AV:
AV* newAV();
The second method both creates the AV and initially populates it with SVs:
AV* av_make(SSize_t num, SV **ptr);
The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.
Once the AV has been created, the following operations are possible on it:
void av_push(AV*, SV*);
SV* av_pop(AV*);
SV* av_shift(AV*);
void av_unshift(AV*, SSize_t num);
These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.
Here are some other functions:
SSize_t av_top_index(AV*);
SV** av_fetch(AV*, SSize_t key, I32 lval);
SV** av_store(AV*, SSize_t key, SV* val);
The C<av_top_index> function returns the highest index value in an array (just
like $#array in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.
A few more:
void av_clear(AV*);
void av_undef(AV*);
void av_extend(AV*, SSize_t key);
The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.
If you know the name of an array variable, you can get a pointer to its AV
by using the following:
AV* get_av("package::varname", 0);
This returns NULL if the variable does not exist.
See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.
=head2 Working with HVs
To create an HV, you use the following routine:
HV* newHV();
Once the HV has been created, the following operations are possible on it:
SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.
Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.
The first of these two functions checks if a hash table entry exists, and the
second deletes it.
bool hv_exists(HV*, const char* key, U32 klen);
SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.
And more miscellaneous functions:
void hv_clear(HV*);
void hv_undef(HV*);
Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. C<hv_undef> deletes
both the entries and the hash table itself.
Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.
I32 hv_iterinit(HV*);
/* Prepares starting point to traverse hash table */
HE* hv_iternext(HV*);
/* Get the next entry, and return a pointer to a
structure that has both the key and value */
char* hv_iterkey(HE* entry, I32* retlen);
/* Get the key from an HE structure and also return
the length of the key string */
SV* hv_iterval(HV*, HE* entry);
/* Return an SV pointer to the value of the HE
structure */
SV* hv_iternextsv(HV*, char** key, I32* retlen);
/* This convenience routine combines hv_iternext,
hv_iterkey, and hv_iterval. The key and retlen
arguments are return values for the key and its
length. The value is the SV* that the function returns */
If you know the name of a hash variable, you can get a pointer to its HV
by using the following:
HV* get_hv("package::varname", 0);
This returns NULL if the variable does not exist.
The hash algorithm is defined in the C<PERL_HASH> macro:
PERL_HASH(hash, key, klen)
The exact implementation of this macro varies by architecture and version
of perl, and the return value may change per invocation, so the value
is only valid for the duration of a single perl process.
See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.
=head2 Hash API Extensions
Beginning with version 5.004, the following functions are also supported:
HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
bool hv_exists_ent (HV* tb, SV* key, U32 hash);
SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
SV* hv_iterkeysv (HE* entry);
Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).
They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.
The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.
HePV(HE* he, STRLEN len)
HeVAL(HE* he)
HeHASH(HE* he)
HeSVKEY(HE* he)
HeSVKEY_force(HE* he)
HeSVKEY_set(HE* he, SV* sv)
These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:
HeKEY(HE* he)
HeKLEN(HE* he)
Note that neither C<hv_store> nor C<hv_store_ent> increments the
reference count of the stored C<val>; that is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.
=head2 AVs, HVs and undefined values
Sometimes you have to store undefined values in AVs or HVs. Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.
For example, intuition tells you that this XS code:
AV *av = newAV();
av_store( av, 0, &PL_sv_undef );
is equivalent to this Perl code:
my @av;
$av[0] = undef;
Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use
C<&PL_sv_undef> as a marker for indicating that an array element has not
yet been initialized. Thus, C<exists $av[0]> would be true for the above
Perl code, but false for the array generated by the XS code. In perl 5.20,
storing C<&PL_sv_undef> will create a read-only element, because the scalar
C<&PL_sv_undef> itself is stored, not a copy.
Similar problems can occur when storing C<&PL_sv_undef> in HVs:
hv_store( hv, "key", 3, &PL_sv_undef, 0 );
This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:
Modification of non-creatable hash value attempted
In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.
You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:
Modification of a read-only value attempted
To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.
Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:
av_store( av, 42, newSV(0) );
hv_store( hv, "foo", 3, newSV(0), 0 );
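The difference between storing a shared sentinel and storing a fresh
undefined value can be sketched with a small stand-alone model. This is
not the Perl API: C<toy_sv>, C<toy_undef>, C<toy_new_sv>, C<toy_assign>
and C<toy_exists> are hypothetical names, and the model folds the
pre-5.20 "not yet initialized" marker behaviour and the 5.20 read-only
behaviour into one structure purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; none of these are the real Perl types. */
typedef struct { int defined; int readonly; int value; } toy_sv;

/* Plays the role of PL_sv_undef: one shared, read-only scalar. */
static toy_sv toy_undef = { 0, 1, 0 };

/* Plays the role of newSV(0): a fresh, writable, undefined scalar. */
static toy_sv *toy_new_sv(void) {
    toy_sv *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);
    return sv;
}

/* Assignment fails (returns 0) on a read-only scalar. */
static int toy_assign(toy_sv *sv, int value) {
    if (sv->readonly) return 0;
    sv->defined = 1;
    sv->value = value;
    return 1;
}

/* Marker-style "exists": a slot holding the sentinel counts as empty. */
static int toy_exists(toy_sv *const *slots, size_t i) {
    return slots[i] != NULL && slots[i] != &toy_undef;
}
```

In this model a slot filled with C<&toy_undef> neither "exists" nor
accepts assignment, while a slot filled by C<toy_new_sv()> does both,
which mirrors the advice to prefer C<newSV(0)>.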
=head2 References
References are a special type of scalar that point to other data types
(including other references).
To create a reference, use either of the following functions:
SV* newRV_inc((SV*) thing);
SV* newRV_noinc((SV*) thing);
The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.
Once you have a reference, you can use the following macro to dereference
the reference:
SvRV(SV*)
then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.
To determine if an SV is a reference, you can use the following macro:
SvROK(SV*)
To discover what type of value the reference refers to, use the following
macro and then check the return value.
SvTYPE(SvRV(SV*))
The most useful types that will be returned are:
< SVt_PVAV Scalar
SVt_PVAV Array
SVt_PVHV Hash
SVt_PVCV Code
SVt_PVGV Glob (possibly a file handle)
See L<perlapi/svtype> for more details.
=head2 Blessed References and Class Objects
References are also used to support object-oriented programming. In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may now use the reference
to access the various methods in the class.
A reference can be blessed into a package with the following function:
SV* sv_bless(SV* sv, HV* stash);
The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L</Stashes and Globs> for information on converting class names into stashes.
/* Still under construction */
The following function upgrades C<rv> to a reference if it is not already
one and creates a new SV for C<rv> to point to. If C<classname> is
non-null, the new SV is blessed into the specified class. The new SV is
returned.
SV* newSVrv(SV* rv, const char* classname);
The following three functions copy integer, unsigned integer or double
into an SV whose reference is C<rv>. SV is blessed if C<classname> is
non-null.
SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
SV* sv_setref_nv(SV* rv, const char* classname, NV nv);
The following function copies the pointer value (I<the address, not the
string!>) into an SV whose reference is rv. SV is blessed if C<classname>
is non-null.
SV* sv_setref_pv(SV* rv, const char* classname, void* pv);
The following function copies a string into an SV whose reference is C<rv>.
Set length to 0 to let Perl calculate the string length. SV is blessed if
C<classname> is non-null.
SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
STRLEN length);
The following function tests whether the SV is blessed into the specified
class. It does not check inheritance relationships.
int sv_isa(SV* sv, const char* name);
The following function tests whether the SV is a reference to a blessed object.
int sv_isobject(SV* sv);
The following function tests whether the SV is derived from the specified
class. SV can be either a reference to a blessed object or a string
containing a class name. This is the function implementing the
C<UNIVERSAL::isa> functionality.
bool sv_derived_from(SV* sv, const char* name);
To check if you've got an object derived from a specific class you have
to write:
if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
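The distinction between an exact class match (C<sv_isa>) and an
inheritance-aware match (C<sv_derived_from>) can be illustrated with a
stand-alone sketch. It assumes single inheritance through a parent
pointer, whereas real Perl consults C<@ISA>; C<toy_class>, C<toy_isa> and
C<toy_derived_from> are hypothetical names, not Perl API functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy class record with single inheritance; real Perl consults @ISA. */
typedef struct toy_class {
    const char *name;
    const struct toy_class *parent;   /* NULL for a root class */
} toy_class;

/* Like sv_isa: exact class match only. */
static int toy_isa(const toy_class *cls, const char *name) {
    assert(cls != NULL && name != NULL);
    return strcmp(cls->name, name) == 0;
}

/* Like sv_derived_from: walk up the inheritance chain. */
static int toy_derived_from(const toy_class *cls, const char *name) {
    assert(name != NULL);
    for (; cls != NULL; cls = cls->parent)
        if (strcmp(cls->name, name) == 0)
            return 1;
    return 0;
}
```

With a C<Dog> class whose parent is C<Animal>, C<toy_isa> accepts only
C<"Dog">, while C<toy_derived_from> accepts both names.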
=head2 Creating New Variables
To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.
SV* get_sv("package::varname", GV_ADD);
AV* get_av("package::varname", GV_ADD);
HV* get_hv("package::varname", GV_ADD);
Notice the use of GV_ADD as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.
There are additional macros whose values may be bitwise OR'ed with the
C<GV_ADD> argument to enable certain extra features. Those bits are:
=over
=item GV_ADDMULTI
Marks the variable as multiply defined, thus preventing the:
Name <varname> used only once: possible typo
warning.
=item GV_ADDWARN
Issues the warning:
Had to create <varname> unexpectedly
if the variable did not exist before the function was called.
=back
If you do not specify a package name, the variable is created in the current
package.
=head2 Reference Counts and Mortality
Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.
This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten. At the internal level, however, reference counts can be
manipulated with the following macros:
int SvREFCNT(SV* sv);
SV* SvREFCNT_inc(SV* sv);
void SvREFCNT_dec(SV* sv);
However, there is one other function which manipulates the reference
count of its argument. The C<newRV_inc> function, you will recall,
creates a reference to the specified argument. As a side effect,
it increments the argument's reference count. If this is not what
you want, use C<newRV_noinc> instead.
For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one. Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two. Now you
return the reference from the XSUB routine and forget about the SV.
But Perl hasn't! Whenever the returned reference is destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates. This is a memory leak.
The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>. Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
stopping any memory leak.
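The leak described above can be reproduced in a small stand-alone model
of reference counting. This is only a sketch: C<toy_sv>, C<toy_new>,
C<toy_dec>, C<toy_rv_inc>, C<toy_rv_noinc> and C<toy_leak_after> are
hypothetical names that imitate, but do not call, the Perl API.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_sv {
    int refcnt;
    struct toy_sv *target;   /* non-NULL if this is a reference */
} toy_sv;

static int toy_live = 0;     /* number of allocated toy_sv objects */

/* New objects start with a reference count of 1. */
static toy_sv *toy_new(void) {
    toy_sv *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;
    toy_live++;
    return sv;
}

/* Decrement; at zero, release the target (if any) and free the object. */
static void toy_dec(toy_sv *sv) {
    if (sv == NULL || --sv->refcnt > 0) return;
    toy_dec(sv->target);
    free(sv);
    toy_live--;
}

/* Like newRV_noinc: the reference takes over the caller's count. */
static toy_sv *toy_rv_noinc(toy_sv *thing) {
    toy_sv *rv = toy_new();
    rv->target = thing;
    return rv;
}

/* Like newRV_inc: the target's count goes up by one. */
static toy_sv *toy_rv_inc(toy_sv *thing) {
    thing->refcnt++;
    return toy_rv_noinc(thing);
}

/* Build a reference to a fresh object, then destroy the reference;
 * return how many objects are still allocated afterwards. */
static int toy_leak_after(toy_sv *(*make_rv)(toy_sv *)) {
    int before = toy_live;
    toy_sv *rv = make_rv(toy_new());
    toy_dec(rv);
    return toy_live - before;
}
```

Calling C<toy_leak_after(toy_rv_inc)> leaves one object behind, because
the caller's original count is never released, while
C<toy_leak_after(toy_rv_noinc)> leaves none.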
There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later". Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function. The actual determinant for when mortal xVs have their
reference count decremented depends on two macros, SAVETMPS and FREETMPS.
See L<perlcall> and L<perlxs> for more details on these macros.
"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.
"Mortal" SVs are mainly used for SVs that are placed on perl's stack.
For example an SV which is created just to pass a number to a called sub
is made mortal to have it cleaned up automatically when it's popped off
the stack. Similarly, results returned by XSUBs (which are pushed on the
stack) are often made mortal.
To create a mortal variable, use the functions:
SV* sv_newmortal()
SV* sv_2mortal(SV*)
SV* sv_mortalcopy(SV*)
The first call creates a mortal SV (with no value), the second converts an existing
SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the
third creates a mortal copy of an existing SV.
Because C<sv_newmortal> gives the new SV no value, it must normally be given one
via C<sv_setpv>, C<sv_setiv>, etc. :
SV *tmp = sv_newmortal();
sv_setiv(tmp, an_integer);
As that takes multiple C statements, it is quite common to see this idiom instead:
SV *tmp = sv_2mortal(newSViv(an_integer));
You should be careful about creating mortal variables. Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times. Thinking of
"Mortalization" as deferred C<SvREFCNT_dec> should help to minimize such
problems.
For example if you are passing an SV which you I<know> has a high enough REFCNT
to survive its use on the stack you need not do any mortalization.
If you are not sure then doing an C<SvREFCNT_inc> and C<sv_2mortal>, or
making a C<sv_mortalcopy> is safer.
The mortal routines are not just for SVs; AVs and HVs can be
made mortal by passing their address (type-casted to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.
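The idea of mortality as a deferred decrement can be modelled with a
small stack of pending decrements. This is a toy sketch under stated
assumptions: C<toy_2mortal> and C<toy_freetmps> are hypothetical
stand-ins for C<sv_2mortal> and C<FREETMPS>, the stack has a fixed size,
and the model only counts (it never frees anything at zero).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TMPS_MAX 16

typedef struct { int refcnt; } toy_sv;

static toy_sv *toy_tmps[TOY_TMPS_MAX];  /* pending deferred decrements */
static size_t toy_tmps_top = 0;

/* Like sv_2mortal: record one deferred decrement and return the SV. */
static toy_sv *toy_2mortal(toy_sv *sv) {
    assert(toy_tmps_top < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_top++] = sv;
    return sv;
}

/* Like FREETMPS: apply every decrement recorded since 'mark'. */
static void toy_freetmps(size_t mark) {
    while (toy_tmps_top > mark)
        toy_tmps[--toy_tmps_top]->refcnt--;
}
```

Mortalizing the same C<toy_sv> twice records two entries, so its count
later drops by two, which is the hazard the text warns about.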
=head2 Stashes and Globs
A B<stash> is a hash that contains all variables that are defined
within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value). This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:
Scalar Value
Array Value
Hash Value
I/O Handle
Format
Subroutine
There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package. To get at the items in other packages, append the
string "::" to the package name. The items in the C<Foo> package are in
the stash C<Foo::> in PL_defstash. The items in the C<Bar::Baz> package are
in the stash C<Baz::> in C<Bar::>'s stash.
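The nesting rule can be made concrete with a helper that turns a package
name into the sequence of stash keys to look up, starting from
C<PL_defstash>. This is a stand-alone sketch, not part of the Perl API:
C<toy_next_stash_key> is a hypothetical name, and it handles only the
C<::> separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the stash key for the next package-name component of *path into
 * buf (component plus trailing "::") and advance *path past it.
 * Returns 1 on success, 0 when the name is exhausted or buf is too small.
 */
static int toy_next_stash_key(const char **path, char *buf, size_t buflen) {
    const char *p, *sep;
    size_t len;

    assert(path != NULL && *path != NULL && buf != NULL);
    p = *path;
    if (*p == '\0') return 0;

    sep = strstr(p, "::");
    len = sep ? (size_t)(sep - p) : strlen(p);
    if (len + 3 > buflen) return 0;      /* component + "::" + NUL */

    memcpy(buf, p, len);
    memcpy(buf + len, "::", 3);          /* copies the NUL too */
    *path = sep ? sep + 2 : p + len;
    return 1;
}
```

For the name C<Bar::Baz>, successive calls yield the key C<Bar::> (to be
looked up in the top-level stash) and then C<Baz::> (to be looked up in
the stash found under C<Bar::>).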
To get the stash pointer for a particular package, use the function:
HV* gv_stashpv(const char* name, I32 flags)
HV* gv_stashsv(SV*, I32 flags)
The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. If C<flags> is set to C<GV_ADD>, the package is created if it
does not already exist.
The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.
Alternately, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:
HV* SvSTASH(SvRV(SV*));
then use the following to get the package name itself:
char* HvNAME(HV* stash);
If you need to bless or re-bless an object you can use the following
function:
SV* sv_bless(SV*, HV* stash)
where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.
For more information on references and blessings, consult L<perlref>.
=head2 Double-Typed SVs
Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.
Some scalar variables contain more than one type of scalar data. For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.
To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:
SvIOK_on
SvNOK_on
SvPOK_on
SvROK_on
The particular macro you must use depends on which C<sv_set*v> routine
you called first. This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.
For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:
extern int dberror;
extern char *dberror_list[];
SV* sv = get_sv("dberror", GV_ADD);
sv_setiv(sv, (IV) dberror);
sv_setpv(sv, dberror_list[dberror]);
SvIOK_on(sv);
If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.
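Why the flag has to be turned back on by hand can be seen in a tiny
stand-alone model of the flag handling. The names C<toy_sv>,
C<toy_setiv>, C<toy_setpv>, C<toy_iok_on>, C<TOY_IOK> and C<TOY_POK> are
hypothetical; the sketch only imitates the behaviour described above,
in which each setter leaves just its own type flag on.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_IOK = 1, TOY_POK = 2 };

typedef struct {
    unsigned flags;
    long iv;
    const char *pv;
} toy_sv;

/* Like sv_setiv: store the integer and mark only the integer slot valid. */
static void toy_setiv(toy_sv *sv, long iv) {
    assert(sv != NULL);
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Like sv_setpv: store the string and mark only the string slot valid. */
static void toy_setpv(toy_sv *sv, const char *pv) {
    assert(sv != NULL);
    sv->pv = pv;
    sv->flags = TOY_POK;
}

/* Like SvIOK_on: declare the integer slot valid again. */
static void toy_iok_on(toy_sv *sv) {
    assert(sv != NULL);
    sv->flags |= TOY_IOK;
}
```

After C<toy_setiv> followed by C<toy_setpv>, the integer is still stored
but only C<TOY_POK> is set; calling C<toy_iok_on> makes both values
visible at once.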
=head2 Read-Only Values
In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
flag bit with read-only scalars. So the only way to test whether
C<sv_setsv>, etc., will raise a "Modification of a read-only value" error
in those versions is:
SvREADONLY(sv) && !SvIsCOW(sv)
Under Perl 5.18 and later, SvREADONLY only applies to read-only variables,
and, under 5.20, copy-on-write scalars can also be read-only, so the above
check is incorrect. You just want:
SvREADONLY(sv)
If you need to do this check often, define your own macro like this:
#if PERL_VERSION >= 18
# define SvTRULYREADONLY(sv) SvREADONLY(sv)
#else
# define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
#endif
=head2 Copy on Write
Perl implements a copy-on-write (COW) mechanism for scalars, in which
string copies are not immediately made when requested, but are deferred
until made necessary by one or the other scalar changing. This is mostly
transparent, but one must take care not to modify string buffers that are
shared by multiple SVs.
You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>.
You can force an SV to make its own copy of its string buffer by calling
C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>.
If you want to make the SV drop its string buffer, use
C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply
C<sv_setsv(sv, NULL)>.
All of these functions will croak on read-only scalars (see the previous
section for more on those).
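The general copy-on-write idea, sharing a buffer until one side needs to
write, can be sketched in a few lines of stand-alone C. This is a
simplified model and makes no claim about how perl lays out its buffers:
C<toy_buf>, C<toy_sv>, C<toy_new>, C<toy_copy>, C<toy_is_cow>,
C<toy_force_normal> and C<toy_free> are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *data; int sharers; } toy_buf;
typedef struct { toy_buf *buf; } toy_sv;

/* Create a scalar that owns a private copy of s. */
static toy_sv toy_new(const char *s) {
    toy_sv sv;
    sv.buf = malloc(sizeof *sv.buf);
    assert(sv.buf != NULL);
    sv.buf->data = malloc(strlen(s) + 1);
    assert(sv.buf->data != NULL);
    strcpy(sv.buf->data, s);
    sv.buf->sharers = 1;
    return sv;
}

/* "Copying" only shares the buffer; no bytes are duplicated yet. */
static toy_sv toy_copy(toy_sv src) { src.buf->sharers++; return src; }

/* Like SvIsCOW: is the buffer shared with another scalar? */
static int toy_is_cow(toy_sv sv) { return sv.buf->sharers > 1; }

/* Like sv_force_normal: give sv a private buffer before writing. */
static void toy_force_normal(toy_sv *sv) {
    if (!toy_is_cow(*sv)) return;
    sv->buf->sharers--;
    *sv = toy_new(sv->buf->data);
}

/* Release this scalar's hold on its buffer. */
static void toy_free(toy_sv *sv) {
    if (--sv->buf->sharers == 0) { free(sv->buf->data); free(sv->buf); }
    sv->buf = NULL;
}
```

Writing through a shared C<toy_sv> without first calling
C<toy_force_normal> would silently change every scalar that shares the
buffer, which is the mistake the text warns against.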
To test that your code is behaving correctly and not modifying COW buffers,
on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with
C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations
into crashes. You will find it to be marvellously slow, so you may want to
skip perl's own tests.
=head2 Magic Variables
[This section still under construction. Ignore everything here. Post no
bills. Everything not permitted is forbidden.]
Any SV may be magical, that is, it has special features that a normal
SV does not have. These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.
struct magic {
MAGIC* mg_moremagic;
MGVTBL* mg_virtual;
U16 mg_private;
char mg_type;
U8 mg_flags;
I32 mg_len;
SV* mg_obj;
char* mg_ptr;
};
Note this is current as of patchlevel 0, and could change at any time.
=head2 Assigning Magic
Perl adds magic to an SV using the sv_magic function:
void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.
If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>.
Perl then continues by adding new magic
to the beginning of the linked list of magical features. Any prior entry
of the same type of magic is deleted. Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.
The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
whether C<namlen> is greater than zero or equal to zero respectively. As a
special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed
to contain an C<SV*> and is stored as-is with its REFCNT incremented.
The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L</Magic Virtual Tables> section below. The C<how> argument is also
stored in the C<mg_type> field. The value of
C<how> should be chosen from the set of macros
C<PERL_MAGIC_foo> found in F<perl.h>. Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.
The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure. If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented. If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, C<PERL_MAGIC_regdatum>,
C<PERL_MAGIC_regdata>, or if it is a NULL pointer, then C<obj> is merely
stored, without the reference count being incremented.
See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic
to an SV.
There is also a function to add magic to an C<HV>:
void hv_magic(HV *hv, GV *gv, int how);
This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.
To remove the magic from an SV, call the function sv_unmagic:
int sv_unmagic(SV *sv, int type);
The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.
However, note that C<sv_unmagic> removes all magic of a certain C<type>
from the C<SV>. If you want to remove only certain magic of a C<type>
based on the magic virtual table, use C<sv_unmagicext> instead:
int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);
=head2 Magic Virtual Tables
The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL>, which is a structure of function pointers and stands for
"Magic Virtual Table" to handle the various operations that might be
applied to that variable.
The C<MGVTBL> has five (or sometimes eight) pointers to the following
routine types:
int (*svt_get) (pTHX_ SV* sv, MAGIC* mg);
int (*svt_set) (pTHX_ SV* sv, MAGIC* mg);
U32 (*svt_len) (pTHX_ SV* sv, MAGIC* mg);
int (*svt_clear)(pTHX_ SV* sv, MAGIC* mg);
int (*svt_free) (pTHX_ SV* sv, MAGIC* mg);
int (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv,
const char *name, I32 namlen);
int (*svt_dup) (pTHX_ MAGIC *mg, CLONE_PARAMS *param);
int (*svt_local)(pTHX_ SV *nsv, MAGIC *mg);
This MGVTBL structure is set at compile-time in F<perl.h> and there are
currently 32 types. These different structures contain pointers to various
routines that perform additional actions depending on which function is
being called.
Function pointer Action taken
---------------- ------------
svt_get Do something before the value of the SV is
retrieved.
svt_set Do something after the SV is assigned a value.
svt_len Report on the SV's length.
svt_clear Clear something the SV represents.
svt_free Free any extra storage associated with the SV.
svt_copy copy tied variable magic to a tied element
svt_dup duplicate a magic structure during thread cloning
svt_local copy magic to local value during 'local'
For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:
{ magic_get, magic_set, magic_len, 0, 0 }
Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called. All the various routines for the various magical types begin
with C<magic_>. NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.
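The dispatch through a table of function pointers can be illustrated
with a stand-alone sketch that keeps only the 'get' and 'set' slots. It
is a model under stated assumptions, not perl's implementation:
C<toy_vtbl>, C<toy_sv>, C<toy_read>, C<toy_write> and the two example
hooks are hypothetical, and the C<backing> field is an invented stand-in
for whatever external state a real magic routine would consult.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sv;

typedef struct {
    int (*get)(struct toy_sv *sv);   /* like svt_get */
    int (*set)(struct toy_sv *sv);   /* like svt_set */
} toy_vtbl;

typedef struct toy_sv {
    int value;
    const toy_vtbl *magic;           /* NULL when the SV is not magical */
    int *backing;                    /* hypothetical external storage */
} toy_sv;

/* Read: run the 'get' hook first so it can refresh the value. */
static int toy_read(toy_sv *sv) {
    assert(sv != NULL);
    if (sv->magic && sv->magic->get) sv->magic->get(sv);
    return sv->value;
}

/* Write: store the value, then run the 'set' hook. */
static void toy_write(toy_sv *sv, int value) {
    assert(sv != NULL);
    sv->value = value;
    if (sv->magic && sv->magic->set) sv->magic->set(sv);
}

/* Example hooks that mirror the SV into an external int. */
static int toy_get_backing(toy_sv *sv) { sv->value = *sv->backing; return 0; }
static int toy_set_backing(toy_sv *sv) { *sv->backing = sv->value; return 0; }

static const toy_vtbl toy_backing_vtbl = { toy_get_backing, toy_set_backing };
```

The ordering follows the table above: the 'get' hook runs before the
value is retrieved and the 'set' hook runs after the value is assigned.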
The last three slots are a recent addition, and for source code
compatibility they are only checked for if one of the three flags
MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags.
This means that most code can continue declaring
a vtable as a 5-element value. These three are
currently used exclusively by the threading code, and are highly subject
to change.
The current kinds of Magic Virtual Tables are:
=for comment
This table is generated by regen/mg_vtable.pl. Any changes made here
will be lost.
=for mg_vtable.pl begin
mg_type
(old-style char and macro) MGVTBL Type of magic
-------------------------- ------ -------------
\0 PERL_MAGIC_sv vtbl_sv Special scalar variable
# PERL_MAGIC_arylen vtbl_arylen Array length ($#ary)
% PERL_MAGIC_rhash (none) Extra data for restricted
hashes
* PERL_MAGIC_debugvar vtbl_debugvar $DB::single, signal, trace
vars
. PERL_MAGIC_pos vtbl_pos pos() lvalue
: PERL_MAGIC_symtab (none) Extra data for symbol
tables
< PERL_MAGIC_backref vtbl_backref For weak ref data
@ PERL_MAGIC_arylen_p (none) To move arylen out of XPVAV
B PERL_MAGIC_bm vtbl_regexp Boyer-Moore
(fast string search)
c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table
(AMT) on stash
D PERL_MAGIC_regdata vtbl_regdata Regex match position data
(@+ and @- vars)
d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data
element
E PERL_MAGIC_env vtbl_env %ENV hash
e PERL_MAGIC_envelem vtbl_envelem %ENV hash element
f PERL_MAGIC_fm vtbl_regexp Formline
('compiled' format)
g PERL_MAGIC_regex_global vtbl_mglob m//g target
H PERL_MAGIC_hints vtbl_hints %^H hash
h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element
I PERL_MAGIC_isa vtbl_isa @ISA array
i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element
k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue
L PERL_MAGIC_dbfile (none) Debugger %_<filename
l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename
element
N PERL_MAGIC_shared (none) Shared between threads
n PERL_MAGIC_shared_scalar (none) Shared between threads
o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation
P PERL_MAGIC_tied vtbl_pack Tied array or hash
p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element
q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle
r PERL_MAGIC_qr vtbl_regexp Precompiled qr// regex
S PERL_MAGIC_sig (none) %SIG hash
s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element
t PERL_MAGIC_taint vtbl_taint Taintedness
U PERL_MAGIC_uvar vtbl_uvar Available for use by
extensions
u PERL_MAGIC_uvar_elem (none) Reserved for use by
extensions
V PERL_MAGIC_vstring (none) SV was vstring literal
v PERL_MAGIC_vec vtbl_vec vec() lvalue
w PERL_MAGIC_utf8 vtbl_utf8 Cached UTF-8 information
x PERL_MAGIC_substr vtbl_substr substr() lvalue
y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator
variable / smart parameter
vivification
\ PERL_MAGIC_lvref vtbl_lvref Lvalue reference
constructor
] PERL_MAGIC_checkcall vtbl_checkcall Inlining/mutation of call
to this CV
~ PERL_MAGIC_ext (none) Available for use by
extensions
=for mg_vtable.pl end
When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
of that composite type. Some internals code makes use of this case
relationship. However, 'v' and 'V' (vec and v-string) are in no way related.
The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
specifically for use by extensions and will not be used by perl itself.
Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information
to variables (typically objects). This is especially useful because
there is no way for normal perl code to corrupt this private information
(unlike using extra elements of a hash object).
Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed. The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:
struct ufuncs {
I32 (*uf_val)(pTHX_ IV, SV*);
I32 (*uf_set)(pTHX_ IV, SV*);
IV uf_index;
};
When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a pointer to
the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar>
magic is shown below. Note that the ufuncs structure is copied by
sv_magic, so you can safely allocate it on the stack.
void
Umagic(sv)
SV *sv;
PREINIT:
struct ufuncs uf;
CODE:
uf.uf_val = &my_get_fn;
uf.uf_set = &my_set_fn;
uf.uf_index = 0;
sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect.
For hashes there is a specialized hook that gives control over hash
keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic
if the "set" function in the C<ufuncs> structure is NULL. The hook
is activated whenever the hash is accessed with a key specified as
an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>,
C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string
through the functions without the C<..._ent> suffix circumvents the
hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description.
Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict. Typically only using the magic on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an
C<MGVTBL>, even if all its fields will be C<0>, so that individual
C<MAGIC> pointers can be identified as a particular kind of magic
using their magic virtual table. C<mg_findext> provides an easy way
to do that:
STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };
MAGIC *mg;
if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
/* this is really ours, not another module's PERL_MAGIC_ext */
my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
...
}
Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.
=head2 Finding Magic
MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
* type */
This routine returns a pointer to a C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned. If the
SV has multiple instances of that magical feature, the first one will be
returned. C<mg_findext> can be used to find a C<MAGIC> structure of an SV
based on both its magic type and its magic virtual table:
MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type
SVt_PVMG, Perl may core dump.
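The two lookups differ only in what they compare while walking the
linked list of magic entries. The stand-alone sketch below models that
walk; C<toy_magic>, C<toy_vtbl>, C<toy_find> and C<toy_findext> are
hypothetical names, and the structure keeps just the three fields needed
for the comparison.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int unused; } toy_vtbl;   /* identity is all that matters */

typedef struct toy_magic {
    struct toy_magic *moremagic;   /* next entry, like mg_moremagic */
    const toy_vtbl *virt;          /* like mg_virtual */
    char type;                     /* like mg_type */
} toy_magic;

/* Like mg_find: first entry of the given type, or NULL. */
static toy_magic *toy_find(toy_magic *head, char type) {
    for (; head != NULL; head = head->moremagic)
        if (head->type == type) return head;
    return NULL;
}

/* Like mg_findext: the type and the vtable pointer must both match. */
static toy_magic *toy_findext(toy_magic *head, char type,
                              const toy_vtbl *vtbl) {
    assert(vtbl != NULL);
    for (; head != NULL; head = head->moremagic)
        if (head->type == type && head->virt == vtbl) return head;
    return NULL;
}
```

When two extensions have both attached C<'~'> magic to the same value,
C<toy_find> returns whichever entry comes first, while C<toy_findext>
picks out the entry carrying the caller's own vtable.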
int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
This routine checks to see what types of magic C<sv> has. If the mg_type
field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
the mg_type field is changed to be the lowercase letter.
=head2 Understanding the Magic of Tied Hashes and Arrays
Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.
WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats. Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.
The perl tie function associates a variable with an object that implements
the various GET, SET, etc methods. To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour. The code below
carries out the necessary steps -- firstly it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
the tie methods. Lastly it ties the two hashes together, and returns a
reference to the new tied hash. Note that the code below does NOT call the
TIEHASH method in the MyTie class -
see L</Calling Perl Routines from within C Programs> for details on how
to do this.
SV*
mytie()
PREINIT:
HV *hash;
HV *stash;
SV *tie;
CODE:
hash = newHV();
tie = newRV_noinc((SV*)newHV());
stash = gv_stashpv("MyTie", GV_ADD);
sv_bless(tie, stash);
hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
RETVAL = newRV_noinc((SV*)hash);
OUTPUT:
RETVAL
The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>. It may also return NULL, indicating that the value did not
actually need to be stored in the array. [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object. If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory
leak. [/MAYCHANGE]
The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.
C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>. Note the value so returned does not
need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object. Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]
[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes. They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects. Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.
Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants. The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]
You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax. The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.
=head2 Localizing changes
Perl has a very handy construction
{
local $var = 2;
...
}
This construction is I<approximately> equivalent to
{
my $oldvar = $var;
$var = 2;
...
$var = $oldvar;
}
The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit
more efficient as well.
There is a way to achieve a similar task from C via the Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly, or via a non-local exit (via
die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of additional localization must
be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.
Inside such a I<pseudo-block> the following services are available:
=over 4
=item C<SAVEINT(int i)>
=item C<SAVEIV(IV i)>
=item C<SAVEI32(I32 i)>
=item C<SAVELONG(long i)>
These macros arrange things to restore the value of the integer variable
C<i> at the end of the enclosing I<pseudo-block>.
=item C<SAVESPTR(s)>
=item C<SAVEPPTR(p)>
These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.
=item C<SAVEFREESV(SV *sv)>
The refcount of C<sv> will be decremented at the end of
I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope. These
lifetimes can be wildly different.
Also compare C<SAVEMORTALIZESV>.
=item C<SAVEMORTALIZESV(SV *sv)>
Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count. This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.
=item C<SAVEFREEOP(OP *op)>
The C<OP *> is op_free()ed at the end of I<pseudo-block>.
=item C<SAVEFREEPV(p)>
The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of I<pseudo-block>.
=item C<SAVECLEARSV(SV *sv)>
Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.
=item C<SAVEDELETE(HV *hv, char *key, I32 length)>
The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed. If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:
SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>
At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.
=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>
At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.
=item C<SAVESTACK_POS()>
The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.
=back
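The save-and-restore behaviour of the macros above can be pictured with a
small stand-alone C model. This is only a sketch under stated assumptions:
it does not use Perl's headers, it handles C<int> values only, and every
name in it (C<model_enter>, C<model_save_int>, C<model_leave>, and so on)
is hypothetical rather than part of the Perl API.

```c
#include <stddef.h>

/* Stand-alone model of a save stack; NOT Perl's implementation.
 * All names here are hypothetical. */
#define MODEL_SAVE_MAX 32

typedef struct {
    int *addr;   /* variable to restore */
    int  old;    /* value it held when it was saved */
} model_save_entry;

static model_save_entry model_saves[MODEL_SAVE_MAX];
static size_t model_save_top = 0;

static size_t model_marks[MODEL_SAVE_MAX];  /* save-stack height at each enter */
static size_t model_mark_top = 0;

/* Plays the role of ENTER: remember where this pseudo-block's saves begin. */
int model_enter(void)
{
    if (model_mark_top == MODEL_SAVE_MAX)
        return 0;
    model_marks[model_mark_top++] = model_save_top;
    return 1;
}

/* Plays the role of SAVEINT(i): record the current value for later. */
int model_save_int(int *p)
{
    if (model_save_top == MODEL_SAVE_MAX)
        return 0;
    model_saves[model_save_top].addr = p;
    model_saves[model_save_top].old  = *p;
    model_save_top++;
    return 1;
}

/* Plays the role of LEAVE: undo, newest first, everything saved since
 * the matching model_enter(). */
void model_leave(void)
{
    size_t mark;
    if (model_mark_top == 0)
        return;
    mark = model_marks[--model_mark_top];
    while (model_save_top > mark) {
        model_save_top--;
        *model_saves[model_save_top].addr = model_saves[model_save_top].old;
    }
}

static int model_var = 1;

/* Returns 10 * (value seen inside the block) + (value after the block). */
int model_localize_demo(void)
{
    int inside;
    model_var = 1;
    if (!model_enter())
        return -1;
    if (!model_save_int(&model_var)) {
        model_leave();
        return -1;
    }
    model_var = 2;          /* the "localized" assignment */
    inside = model_var;
    model_leave();          /* model_var is back to 1 here */
    return 10 * inside + model_var;
}
```

In this model C<model_localize_demo()> sees 2 inside the block and 1 after
it. What the model leaves out is the important part of the real mechanism:
perl also unwinds the save stack when control leaves the block by a
non-local exit.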
The following API list contains functions, thus one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s). Where the above macros take C<int>, a similar
function takes C<int *>.
=over 4
=item C<SV* save_scalar(GV *gv)>
Equivalent to Perl code C<local $gv>.
=item C<AV* save_ary(GV *gv)>
=item C<HV* save_hash(GV *gv)>
Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.
=item C<void save_item(SV *item)>
Duplicates the current value of C<item>. On exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<item> is restored
from the stored copy. It doesn't handle magic. Use C<save_scalar> if
magic is affected.
=item C<void save_list(SV **sarg, I32 maxsarg)>
A variant of C<save_item> which takes multiple arguments via an array
C<sarg> of C<SV*> of length C<maxsarg>.
=item C<SV* save_svref(SV **sptr)>
Similar to C<save_scalar>, but will reinstate an C<SV *>.
=item C<void save_aptr(AV **aptr)>
=item C<void save_hptr(HV **hptr)>
Similar to C<save_svref>, but localize C<AV *> and C<HV *>.
=back
The C<Alias> module implements localization of the basic types within the
I<caller's scope>. People who are interested in how to localize things in
the containing scope should take a look there too.
=head1 Subroutines
=head2 XSUBs and the Argument Stack
The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.
The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument. Argument 0 is the first argument passed in the
Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.
Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives. However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two, the local time zone's standard and summer time abbreviations.
To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:
EXTEND(SP, num);
where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.
Now that there is room on the stack, values can be pushed on it using C<PUSHs>
macro. The pushed values will often need to be "mortal" (See
L</Reference Counts and Mortality>):
PUSHs(sv_2mortal(newSViv(an_integer)))
PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
PUSHs(sv_2mortal(newSVnv(a_double)))
PUSHs(sv_2mortal(newSVpv("Some String",0)))
/* Although the last example is better written as the more
* efficient: */
PUSHs(newSVpvs_flags("Some String", SVs_TEMP))
In the Perl program calling C<tzname>, the two values will then be assigned
as in:
($standard_abbrev, $summer_abbrev) = POSIX::tzname;
An alternate (and possibly simpler) method of pushing values on the stack is
to use the macro:
XPUSHs(SV*)
This macro automatically adjusts the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.
Despite suggestions in earlier versions of this document, the macros
C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.
For more information, consult L<perlxs> and L<perlxstut>.
=head2 Autoloading with XSUBs
If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the
fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable
of the XSUB's package.
But it also puts the same information in certain fields of the XSUB itself:
HV *stash = CvSTASH(cv);
const char *subname = SvPVX(cv);
STRLEN name_length = SvCUR(cv); /* in bytes */
U32 is_utf8 = SvUTF8(cv);
C<SvPVX(cv)> contains just the sub name itself, not including the package.
For an AUTOLOAD routine in UNIVERSAL or one of its superclasses,
C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package.
B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the
XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need
to support 5.8-5.14, use the XSUB's fields.
=head2 Calling Perl Routines from within C Programs
There are four routines that can be used to call a Perl subroutine from
within a C program. These four are:
I32 call_sv(SV*, I32);
I32 call_pv(const char*, I32);
I32 call_method(const char*, I32);
I32 call_argv(const char*, I32, char**);
The routine most often used is C<call_sv>. The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine. The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.
All four routines return the number of values that the subroutine returned
on the Perl stack.
These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.
When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack. The macros and functions used for this
include the following:
dSP
SP
PUSHMARK()
PUTBACK
SPAGAIN
ENTER
SAVETMPS
FREETMPS
LEAVE
XPUSH*()
POP*()
For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.
=head2 Putting a C value on Perl stack
A lot of opcodes (an opcode is an elementary operation in the internal perl
stack machine) put an SV* on the stack. However, as an optimization
the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.
Each of the targets is created only once (but see
L</Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on the stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on the stack.
The macro to put this target on the stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[iunp]>.
Because the target is reused, you must be careful when pushing multiple
values on the stack. The following code will not do what you think:
XPUSHi(10);
XPUSHi(20);
This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20.
If you need to push multiple different values then you should either use
the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros,
none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an
SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>,
will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make
this a little easier to achieve by creating a new mortal for you (via
C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
in the case of the C<mXPUSH[iunp]> macros), and then setting its value.
Thus, instead of writing this to "fix" the example above:
XPUSHs(sv_2mortal(newSViv(10)))
XPUSHs(sv_2mortal(newSViv(20)))
you can simply write:
mXPUSHi(10)
mXPUSHi(20)
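The aliasing problem, and why a fresh SV per value avoids it, can be shown
with a stand-alone C model. This is a sketch only: it does not use Perl's
headers, C<model_sv> is a stand-in for an SV holding one integer, and
C<model_push_targ> and C<model_push_fresh> are hypothetical names that
imitate what C<XPUSHi> and C<mXPUSHi> do to the stack.

```c
#include <stddef.h>

/* Stand-alone model; hypothetical names, not Perl's data structures. */
typedef struct { long iv; } model_sv;

#define MODEL_STACK_MAX 8
static model_sv *model_stack[MODEL_STACK_MAX];
static size_t    model_sp;

static model_sv  model_targ;                    /* the single reused target */
static model_sv  model_temps[MODEL_STACK_MAX];  /* stand-ins for new mortals */
static size_t    model_temps_used;

/* Imitates XPUSHi: set TARG, then push a pointer to TARG. */
void model_push_targ(long v)
{
    if (model_sp == MODEL_STACK_MAX)
        return;
    model_targ.iv = v;
    model_stack[model_sp++] = &model_targ;
}

/* Imitates mXPUSHi: take a fresh SV, set it, push a pointer to it. */
void model_push_fresh(long v)
{
    model_sv *sv;
    if (model_sp == MODEL_STACK_MAX || model_temps_used == MODEL_STACK_MAX)
        return;
    sv = &model_temps[model_temps_used++];
    sv->iv = v;
    model_stack[model_sp++] = sv;
}

static void model_reset(void)
{
    model_sp = 0;
    model_temps_used = 0;
}

/* Each demo returns 100 * (first stack entry) + (second stack entry). */
long model_targ_demo(void)
{
    model_reset();
    model_push_targ(10);
    model_push_targ(20);    /* overwrites the value the first entry points at */
    return 100 * model_stack[0]->iv + model_stack[1]->iv;
}

long model_fresh_demo(void)
{
    model_reset();
    model_push_fresh(10);
    model_push_fresh(20);
    return 100 * model_stack[0]->iv + model_stack[1]->iv;
}
```

In the model, C<model_targ_demo()> reads 20 through both stack entries
because both point at the same target, while C<model_fresh_demo()> reads
10 and 20.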
On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
need a C<dTARG> in your variable declarations so that the C<*PUSH*>
macros can make use of the local variable C<TARG>. See also C<dTARGET>
and C<dXSTARG>.
=head2 Scratchpads
The question remains of when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current
unit--a subroutine or a file (for opcodes for statements outside of
subroutines)--is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current unit.
A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes. A previous version of this document
stated that one can deduce that an SV lives on a scratchpad
by looking on its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set. But this has never been fully true.
C<SVs_PADMY> could be set on a variable that no longer resides in any pad.
While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables
that have never resided in a pad, but nonetheless act like I<target>s. As
of perl 5.21.5, the C<SVs_PADMY> flag is no longer used and is defined as
0. C<SvPADMY()> now returns true for anything without C<SVs_PADTMP>.
The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.
=head2 Scratchpads and recursion
It is not 100% true that a compiled unit contains a pointer to
the scratchpad AV. In fact it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?
The answer is B<recursion>, and maybe B<threads>. Both
these can create several execution pointers going into the same
subroutine. For the subroutine-child not to write over the temporaries
for the subroutine-parent (lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)
So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.
The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.
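The per-depth scratchpad scheme can be illustrated with a stand-alone C
model. It is a sketch under simplifying assumptions: a pad is just a small
array of C<long>, the array of pads has a fixed capacity, and all names
(C<model_cv>, C<model_enter_sub>, C<model_sum_to>) are hypothetical. It
does not use Perl's headers or reproduce Perl's actual pad code.

```c
#include <stddef.h>

#define MODEL_PAD_SLOTS 2
#define MODEL_MAX_PADS  16

/* Stand-alone model of a sub with one scratchpad per recursion depth. */
typedef struct {
    long pads[MODEL_MAX_PADS][MODEL_PAD_SLOTS];
    int  npads;   /* pads created so far (1 for a freshly compiled sub) */
    int  depth;   /* current recursion depth; 0 when the sub is not running */
} model_cv;

/* On entry: if the depth exceeds the number of pads, create a new pad. */
long *model_enter_sub(model_cv *cv)
{
    int i;
    if (cv->depth == MODEL_MAX_PADS)
        return NULL;
    cv->depth++;
    if (cv->depth > cv->npads) {
        for (i = 0; i < MODEL_PAD_SLOTS; i++)
            cv->pads[cv->npads][i] = 0;     /* new targets start out empty */
        cv->npads++;
    }
    return cv->pads[cv->depth - 1];
}

void model_leave_sub(model_cv *cv)
{
    if (cv->depth > 0)
        cv->depth--;
}

/* Recursive sum 0..n that keeps a temporary in its own pad across the
 * call to its child. Returns -1 if the model runs out of pads. */
long model_sum_to(model_cv *cv, long n)
{
    long rest, result;
    long *pad = model_enter_sub(cv);
    if (!pad)
        return -1;
    pad[0] = n;                       /* must survive the child call */
    rest = (n > 0) ? model_sum_to(cv, n - 1) : 0;
    if (rest < 0) {
        model_leave_sub(cv);
        return -1;
    }
    result = pad[0] + rest;
    model_leave_sub(cv);
    return result;
}

/* Returns the number of pads after computing model_sum_to(cv, 4),
 * or -1 if the sum or the final depth is not what the model expects. */
int model_pad_demo(void)
{
    model_cv cv = { {{0}}, 1, 0 };
    long total = model_sum_to(&cv, 4);
    if (total != 10 || cv.depth != 0)
        return -1;
    return cv.npads;
}
```

Calling C<model_sum_to(&cv, 4)> recurses five levels deep, so the model
ends with five pads. Had all levels shared one pad, each child would have
overwritten its parent's C<pad[0]>.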
=head1 Memory Allocation
=head2 Allocation
All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section. The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.
It is suggested that you enable the version of malloc that is distributed
with Perl. It keeps pools of various sizes of unallocated memory in
order to satisfy allocation requests more quickly. However, on some
platforms, it may cause spurious malloc or free errors.
The following three macros are used to initially allocate memory:
Newx(pointer, number, type);
Newxc(pointer, number, type, cast);
Newxz(pointer, number, type);
The first argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.
The second and third arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated. The argument
C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.
Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
to zero out all the newly allocated memory.
=head2 Reallocation
Renew(pointer, number, type);
Renewc(pointer, number, type, cast);
Safefree(pointer)
These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
match those of C<Newx> and C<Newxc>. (The obsolete C<New> and C<Newc>
macros took an extra leading "magic cookie" argument, which C<Renew> and
C<Renewc> never needed.)
=head2 Moving
Move(source, dest, number, type);
Copy(source, dest, number, type);
Zero(dest, number, type);
These three macros are used to move, copy, or zero out previously allocated
memory. The C<source> and C<dest> arguments point to the source and
destination starting points. Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
operator).
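The argument order (source first, then destination) is the reverse of the
C library's C<memcpy> family, which is easy to get wrong. The stand-alone
sketch below defines hypothetical C<MODEL_MOVE>, C<MODEL_COPY> and
C<MODEL_ZERO> macros on the assumption that C<Move>, C<Copy> and C<Zero>
are thin wrappers over C<memmove>, C<memcpy> and C<memset>. It does not
include Perl's headers, and the real macros may add checks that this
model omits.

```c
#include <string.h>

/* Hypothetical stand-ins that mirror the (source, dest, number, type)
 * argument order described above; not the real Perl macros. */
#define MODEL_MOVE(src, dst, n, t) memmove((dst), (src), (n) * sizeof(t))
#define MODEL_COPY(src, dst, n, t) memcpy((dst), (src), (n) * sizeof(t))
#define MODEL_ZERO(dst, n, t)      memset((dst), 0, (n) * sizeof(t))

/* Packs the five elements of out[] into one decimal number and adds
 * buf[0], which is 0 after zeroing. */
int model_move_demo(void)
{
    int buf[5] = { 1, 2, 3, 4, 5 };
    int out[5];

    /* Overlapping shift right by one element: needs memmove semantics. */
    MODEL_MOVE(buf, buf + 1, 4, int);   /* buf is now 1 1 2 3 4 */

    /* Non-overlapping copy into a separate buffer. */
    MODEL_COPY(buf, out, 5, int);

    /* Clear the original buffer. */
    MODEL_ZERO(buf, 5, int);

    return out[0] * 10000 + out[1] * 1000 + out[2] * 100
         + out[3] * 10 + out[4] + buf[0];
}
```

The overlapping shift is done with the move variant on purpose: C<memcpy>
gives no guarantees when source and destination overlap.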
=head1 PerlIO
The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used. This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with. All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.
For a complete description of the PerlIO abstraction, consult L<perlapio>.
=head1 Compiled code
=head2 Code tree
Here we describe the internal form your code is converted to by
Perl. Start with a simple example:
$a = $b + $c;
This is converted to a tree similar to this one:
         assign-to
       /           \
      +             $a
    /   \
  $b     $c
(but slightly more complicated). This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes. In our simplified
example above it looks like:
$b ---> $c ---> + ---> $a ---> assign-to
But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.
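The distinction between the tree structure and the execution-order thread
can be modelled in a few lines of stand-alone C. The sketch below builds
the simplified tree for C<$a = $b + $c> and then follows the thread. The
C<model_op> structure and its field names are hypothetical and much
simpler than Perl's real op structures.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-alone model of a parse-tree node with an execution thread. */
typedef struct model_op {
    const char      *name;
    struct model_op *left, *right;  /* tree children (NULL for leaves) */
    struct model_op *next;          /* execution-order thread */
} model_op;

/* Builds the simplified tree and returns the node names in execution
 * order, separated by single spaces. */
const char *model_exec_order(void)
{
    static char out[64];
    model_op b      = { "$b",        NULL, NULL, NULL };
    model_op c      = { "$c",        NULL, NULL, NULL };
    model_op a      = { "$a",        NULL, NULL, NULL };
    model_op add    = { "+",         NULL, NULL, NULL };
    model_op assign = { "assign-to", NULL, NULL, NULL };
    const model_op *o;
    size_t used = 0;

    /* Tree links: how the expression was parsed. */
    add.left = &b;
    add.right = &c;
    assign.left = &add;
    assign.right = &a;

    /* Thread links: the order in which the nodes run. */
    b.next = &c;
    c.next = &add;
    add.next = &a;
    a.next = &assign;

    out[0] = '\0';
    for (o = &b; o; o = o->next) {
        int n = snprintf(out + used, sizeof out - used, "%s%s",
                         used ? " " : "", o->name);
        if (n < 0 || (size_t)n >= sizeof out - used)
            break;
        used += (size_t)n;
    }
    return out;
}
```

Walking C<next> from C<$b> never consults C<left> or C<right>, which is
the point: once the thread exists, execution does not need to traverse
the tree.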
=head2 Examining the tree
If you have your perl compiled for debugging (usually done with
C<-DDEBUGGING> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line. The
output takes several lines per node, and for C<$b+$c> it looks like
this:
     5           TYPE = add  ===> 6
                 TARG = 1
                 FLAGS = (SCALAR,KIDS)
                 {
                     TYPE = null  ===> (4)
                       (was rv2sv)
                     FLAGS = (SCALAR,KIDS)
                     {
     3                   TYPE = gvsv  ===> 4
                         FLAGS = (SCALAR)
                         GV = main::b
                     }
                 }
                 {
                     TYPE = null  ===> (5)
                       (was rv2sv)
                     FLAGS = (SCALAR,KIDS)
                     {
     4                   TYPE = gvsv  ===> 5
                         FLAGS = (SCALAR)
                         GV = main::c
                     }
                 }
This tree has 5 nodes (one per C<TYPE> specifier), of which only 3 are
not optimized away (one per number in the left column). The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:
                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv
The execution order is indicated by C<===E<gt>> marks, thus it is C<3
4 5 6> (node C<6> is not included into above listing), i.e.,
C<gvsv gvsv add whatever>.
Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.
The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children. In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<OpSIBLING> pointer from the first child to the last (but
see below).
There are also some other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.
Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one
or more children, but it doesn't have an C<op_last> field: so you have to
follow C<op_first> and then the C<OpSIBLING> chain itself to find the
last child. Instead it has an C<op_other> field, which is comparable to
the C<op_next> field described below, and represents an alternate
execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note
that in general, C<op_other> may not point to any of the direct children
of the C<LOGOP>.
Starting in version 5.21.2, perls built with the experimental
define C<-DPERL_OP_PARENT> add an extra boolean flag for each op,
C<op_moresib>. When not set, this indicates that this is the last op in an
C<OpSIBLING> chain. This frees up the C<op_sibling> field on the last
sibling to point back to the parent op. Under this build, that field is
also renamed C<op_sibparent> to reflect its joint role. The macro
C<OpSIBLING(o)> wraps this special behaviour, and always returns NULL on
the last sibling. With this build the C<op_parent(o)> function can be
used to find the parent of any op. Thus for forward compatibility, you
should always use the C<OpSIBLING(o)> macro rather than accessing
C<op_sibling> directly.
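The way one pointer plus one flag can serve both as "next sibling" and as
"parent" is easiest to see in a stand-alone model. The sketch below is not
Perl's code: C<model_op>, C<model_sibling> and C<model_parent> are
hypothetical names that imitate the roles of C<op_sibparent>,
C<OpSIBLING(o)> and C<op_parent(o)> as described above.

```c
#include <stddef.h>

/* Stand-alone model of the sibling/parent sharing scheme. */
typedef struct model_op {
    const char      *name;
    struct model_op *first;       /* first child, or NULL */
    struct model_op *sibparent;   /* next sibling, or the parent if last */
    unsigned         moresib : 1; /* set if sibparent is a sibling */
} model_op;

/* Imitates OpSIBLING(o): NULL on the last sibling. */
model_op *model_sibling(const model_op *o)
{
    return o->moresib ? o->sibparent : NULL;
}

/* Imitates op_parent(o): walk to the last sibling, then step up. */
model_op *model_parent(const model_op *o)
{
    while (o->moresib)
        o = o->sibparent;
    return o->sibparent;
}

int model_count_kids(const model_op *parent)
{
    int n = 0;
    const model_op *kid;
    for (kid = parent->first; kid; kid = model_sibling(kid))
        n++;
    return n;
}

/* Returns the number of children found, or -1 if a link is wrong. */
int model_sibling_demo(void)
{
    model_op list = { "list", NULL, NULL, 0 };
    model_op k1   = { "k1",   NULL, NULL, 1 };
    model_op k2   = { "k2",   NULL, NULL, 1 };
    model_op k3   = { "k3",   NULL, NULL, 0 };   /* last sibling */

    list.first   = &k1;
    k1.sibparent = &k2;
    k2.sibparent = &k3;
    k3.sibparent = &list;     /* the last sibling points back at the parent */

    if (model_parent(&k2) != &list)
        return -1;
    if (model_sibling(&k3) != NULL)
        return -1;
    return model_count_kids(&list);
}
```

Code that read C<sibparent> directly on C<k3> would wander up to the parent
instead of stopping, which is why the accessor is the safe way to iterate.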
Another way to examine the tree is to use a compiler back-end module, such
as L<B::Concise>.
=head2 Compile pass 1: check routines
The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.
What makes this pass interesting for perl developers is that some
optimization may be performed on this pass. This is optimization by
so-called "check routines". The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).
A check routine is called when the node is fully constructed except
for the execution-order thread. Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.
The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, check routine returns
its argument).
By convention, check routines have names C<ck_*>. They are usually
called from C<new*OP> subroutines (or C<convert>) (which in turn are
called from F<perly.y>).
=head2 Compile pass 1a: constant folding
Immediately after the check routine is called the returned node is
checked for being compile-time executable. If it is (the value is
judged to be constant) it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead. The subtree is deleted.
If constant folding was not performed, the execution-order thread is
created.
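The idea of folding can be sketched with a toy expression tree in
stand-alone C. This is an illustration under strong assumptions, not Perl's
algorithm: the only operator is addition, the fold is a direct computation
rather than a run of the real op code, and one post-order walk stands in
for perl folding each node as it is constructed. All names are
hypothetical.

```c
#include <stddef.h>

typedef enum { MODEL_CONST, MODEL_VAR, MODEL_ADD } model_type;

/* Stand-alone toy expression node. */
typedef struct model_node {
    model_type         type;
    long               value;        /* meaningful for MODEL_CONST only */
    struct model_node *left, *right; /* meaningful for MODEL_ADD only */
} model_node;

/* Folds bottom-up, in place. Returns 1 if the node is (now) a constant. */
int model_fold(model_node *n)
{
    int l, r;
    if (n->type == MODEL_CONST)
        return 1;
    if (n->type == MODEL_VAR)
        return 0;
    l = model_fold(n->left);
    r = model_fold(n->right);
    if (l && r) {
        /* Replace the subtree with a constant holding its value. */
        n->value = n->left->value + n->right->value;
        n->type  = MODEL_CONST;
        n->left  = n->right = NULL;
        return 1;
    }
    return 0;
}

/* Folds (2 + 3) + $x and returns the folded inner value, or -1 if the
 * tree did not end up in the expected shape. */
int model_fold_demo(void)
{
    model_node two   = { MODEL_CONST, 2, NULL, NULL };
    model_node three = { MODEL_CONST, 3, NULL, NULL };
    model_node x     = { MODEL_VAR,   0, NULL, NULL };
    model_node inner = { MODEL_ADD,   0, NULL, NULL };
    model_node outer = { MODEL_ADD,   0, NULL, NULL };

    inner.left  = &two;
    inner.right = &three;
    outer.left  = &inner;
    outer.right = &x;

    model_fold(&outer);
    if (outer.type != MODEL_ADD || inner.type != MODEL_CONST)
        return -1;
    return (int)inner.value;
}
```

In the demo the inner C<2 + 3> becomes a constant node, while the outer
addition survives because one of its operands is a variable.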
=head2 Compile pass 2: context propagation
When a context for a part of compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue. In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.
Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of free()ing (i.e. their type is changed to OP_NULL).
=head2 Compile pass 3: peephole optimization
After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). Optimizations performed
at this stage are subject to the same restrictions as in pass 2.
Peephole optimizations are done by calling the function pointed to
by the global variable C<PL_peepp>. By default, C<PL_peepp> just
calls the function pointed to by the global variable C<PL_rpeepp>.
By default, that performs some basic op fixups and optimisations along
the execution-order op chain, and recursively calls C<PL_rpeepp> for
each side chain of ops (resulting from conditionals). Extensions may
provide additional optimisations or fixups, hooking into either the
per-subroutine or recursive stage, like this:
    static peep_t prev_peepp;
    static void my_peep(pTHX_ OP *o)
    {
        /* custom per-subroutine optimisation goes here */
        prev_peepp(aTHX_ o);
        /* custom per-subroutine optimisation may also go here */
    }
    BOOT:
        prev_peepp = PL_peepp;
        PL_peepp = my_peep;

    static peep_t prev_rpeepp;
    static void my_rpeep(pTHX_ OP *o)
    {
        OP *orig_o = o;
        for(; o; o = o->op_next) {
            /* custom per-op optimisation goes here */
        }
        prev_rpeepp(aTHX_ orig_o);
    }
    BOOT:
        prev_rpeepp = PL_rpeepp;
        PL_rpeepp = my_rpeep;
=head2 Pluggable runops
The compile tree is executed in a runops function. There are two runops
functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used
with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine
control over the execution of the compile tree it is possible to provide
your own runops function.
It's probably best to copy one of the existing runops functions and
change it to suit your needs. Then, in the BOOT section of your XS
file, add the line:
PL_runops = my_runops;
This function should be as efficient as possible to keep your programs
running as fast as possible.
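The shape of a runops loop, and what replacing it means, can be shown with
a stand-alone model. The sketch below is not Perl's code: each model op
carries a function pointer that does the work and returns the next op to
run, and a replaceable C<model_runops> pointer plays the role of
C<PL_runops>. All names are hypothetical.

```c
#include <stddef.h>

typedef struct model_op model_op;
typedef model_op *(*model_pp)(model_op *);

/* Stand-alone model op: the function to run, the next op, one operand. */
struct model_op {
    model_pp  ppaddr;
    model_op *next;
    long      arg;
};

static long model_acc;      /* state the ops act on */
static long model_ops_run;  /* bookkeeping done by the custom loop */

/* A model "pp" function: do the work, return the next op to run. */
model_op *model_pp_add(model_op *o)
{
    model_acc += o->arg;
    return o->next;
}

/* The plain loop: keep running ops until one returns NULL. */
void model_runops_standard(model_op *o)
{
    while (o)
        o = o->ppaddr(o);
}

/* A custom loop: same dispatch, plus a per-op counter. */
void model_runops_counting(model_op *o)
{
    while (o) {
        model_ops_run++;
        o = o->ppaddr(o);
    }
}

/* Plays the role of PL_runops. */
static void (*model_runops)(model_op *) = model_runops_standard;

/* Returns 100 * (accumulated sum) + (number of ops run). */
long model_runops_demo(void)
{
    model_op third  = { model_pp_add, NULL,    3 };
    model_op second = { model_pp_add, &third,  2 };
    model_op first  = { model_pp_add, &second, 1 };

    model_acc = 0;
    model_ops_run = 0;
    model_runops = model_runops_counting;   /* install the custom loop */
    model_runops(&first);
    return 100 * model_acc + model_ops_run;
}
```

Because the loop body runs once per op, anything added to it (here a
single increment) is paid on every operation, which is why a replacement
runops function should stay as lean as possible.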
=head2 Compile-time scope hooks
As of perl 5.14 it is possible to hook into the compile-time lexical
scope mechanism using C<Perl_blockhook_register>. This is used like
this:
STATIC void my_start_hook(pTHX_ int full);
STATIC BHK my_hooks;
BOOT:
BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
Perl_blockhook_register(aTHX_ &my_hooks);
This will arrange to have C<my_start_hook> called at the start of
compiling every lexical scope. The available hooks are:
=over 4
=item C<void bhk_start(pTHX_ int full)>
This is called just after starting a new lexical scope. Note that Perl
code like
if ($x) { ... }
creates two scopes: the first starts at the C<(> and has C<full == 1>,
the second starts at the C<{> and has C<full == 0>. Both end at the
C<}>, so calls to C<start> and C<pre>/C<post_end> will match. Anything
pushed onto the save stack by this hook will be popped just before the
scope ends (between the C<pre_> and C<post_end> hooks, in fact).
=item C<void bhk_pre_end(pTHX_ OP **o)>
This is called at the end of a lexical scope, just before unwinding the
stack. I<o> is the root of the optree representing the scope; it is a
double pointer so you can replace the OP if you need to.
=item C<void bhk_post_end(pTHX_ OP **o)>
This is called at the end of a lexical scope, just after unwinding the
stack. I<o> is as above. Note that it is possible for calls to C<pre_>
and C<post_end> to nest, if there is something on the save stack that
calls string eval.
=item C<void bhk_eval(pTHX_ OP *const o)>
This is called just before starting to compile an C<eval STRING>, C<do
FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the
OP that requested the eval, and will normally be an C<OP_ENTEREVAL>,
C<OP_DOFILE> or C<OP_REQUIRE>.
=back
Once you have your hook functions, you need a C<BHK> structure to put
them in. It's best to allocate it statically, since there is no way to
free it once it's registered. The function pointers should be inserted
into this structure using the C<BhkENTRY_set> macro, which will also set
flags indicating which entries are valid. If you do need to allocate
your C<BHK> dynamically for some reason, be sure to zero it before you
start.
Once registered, there is no mechanism to switch these hooks off, so if
that is necessary you will need to do this yourself. An entry in C<%^H>
is probably the best way, so the effect is lexically scoped; however it
is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to
temporarily switch entries on and off. You should also be aware that
generally speaking at least one scope will have opened before your
extension is loaded, so you will see some C<pre>/C<post_end> pairs that
didn't have a matching C<start>.
=head1 Examining internal data structures with the C<dump> functions
To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.
The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.
C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.
Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the
subroutines in a package like so: (Thankfully, these are all xsubs, so
there is no op tree)
(gdb) print Perl_dump_packsubs(PL_defstash)
SUB attributes::bootstrap = (xsub 0x811fedc 0)
SUB UNIVERSAL::can = (xsub 0x811f50c 0)
SUB UNIVERSAL::isa = (xsub 0x811f304 0)
SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
and C<Perl_dump_all>, which dumps all the subroutines in the stash and
the op tree of the main root.
=head1 How multiple interpreters and concurrency are supported
=head2 Background and PERL_IMPLICIT_CONTEXT
The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use. This smells a lot like an object, and
there are ways for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C structure,
or inside a thread-specific structure. These structures contain all
the context, the state of that interpreter.
One macro controls the major Perl build flavor: MULTIPLICITY. The
MULTIPLICITY build has a C structure that packages all the interpreter
state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also
normally defined, and enables the support for passing in a "hidden" first
argument that represents the interpreter's state. MULTIPLICITY makes
multi-threaded perls possible (with the ithreads threading model, related
to the macro USE_ITHREADS.)
Two other "encapsulation" macros are the PERL_GLOBAL_STRUCT and
PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the
former turns on MULTIPLICITY.) The PERL_GLOBAL_STRUCT causes all the
internal variables of Perl to be wrapped inside a single global struct,
struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or
the function Perl_GetVars(). The PERL_GLOBAL_STRUCT_PRIVATE goes
one step further, there is still a single struct (allocated in main()
either from heap or from stack) but there are no global data symbols
pointing to it. In either case the global struct should be initialized
as the very first thing in main() using Perl_init_global_struct() and
correspondingly torn down after perl_free() using Perl_free_global_struct();
please see F<miniperlmain.c> for usage details. You may also need
to use C<dVAR> in your coding to "declare the global variables"
when you are using them. dTHX does this for you automatically.
To see whether you have non-const data you can use a BSD (or GNU)
compatible C<nm>:
nm libperl.a | grep -v ' [TURtr] '
If this displays any C<D> or C<d> symbols (or possibly C<C> or C<c>),
you have non-const data. The symbols the C<grep> removed are as follows:
C<Tt> are I<text>, or code, the C<Rr> are I<read-only> (const) data,
and the C<U> is I<undefined>, external symbols referred to.
The test F<t/porting/libperl.t> does this kind of symbol sanity
checking on C<libperl.a>.
For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
doesn't actually hide all symbols inside a big global struct: some
PerlIO_xxx vtables are left visible. The PERL_GLOBAL_STRUCT_PRIVATE
then hides everything (see how the PERLIO_FUNCS_DECL is used).
All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument. To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.
First problem: deciding which functions will be public API functions and
which will be private. All functions whose names begin C<S_> are private
(think "S" for "secret" or "static"). All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal
Functions>.) The easiest way to be B<sure> a
function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
think it should be (i.e., you need it for your extension), send mail via
L<perlbug> explaining why you think it should be.
Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing. To solve this, the subroutines are named and
declared in a particular way. Here's a typical start of a static
function used within the Perl guts:
STATIC void
S_incline(pTHX_ char *s)
STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.
A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:
void
Perl_sv_setiv(pTHX_ SV* dsv, IV num)
C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
details of the interpreter's context. THX stands for "thread", "this",
or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.
When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no
first argument containing the interpreter's context. The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it. If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.
When a core function calls another, it must pass the context. This
is normally hidden via macros. Consider C<sv_setiv>. It expands into
something like this:
    #ifdef PERL_IMPLICIT_CONTEXT
      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
      /* can't do this for vararg functions, see below */
    #else
      #define sv_setiv           Perl_sv_setiv
    #endif
This works well, and means that XS authors can gleefully write:
sv_setiv(foo, bar);
and still have it work under all the modes Perl could have been
compiled with.
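The mechanism can be tried out in miniature without the Perl headers. The following is a standalone sketch, not Perl's actual definitions: every name in it (C<DEMO_IMPLICIT_CONTEXT>, C<d_pTHX_>, C<d_aTHX_>, C<demo_interp>, C<Demo_add>) is hypothetical and only mirrors the shape of the real macros.

```c
#include <assert.h>

/* Hypothetical miniature of the scheme; none of these names are Perl's. */
#define DEMO_IMPLICIT_CONTEXT 1   /* set to 0 for the "no context" flavor */

typedef struct demo_interp { long counter; } demo_interp;

#if DEMO_IMPLICIT_CONTEXT
#  define d_pTHX      demo_interp *my_perl   /* prototype, no other args */
#  define d_pTHX_     d_pTHX,                /* prototype, more args follow */
#  define d_aTHX      my_perl                /* argument, no other args */
#  define d_aTHX_     d_aTHX,                /* argument, more args follow */
#  define STATE(x)    (my_perl->x)
#  define demo_add(n) Demo_add(d_aTHX_ n)
#else
static demo_interp demo_global;
#  define d_pTHX      void
#  define d_pTHX_
#  define d_aTHX
#  define d_aTHX_
#  define STATE(x)    (demo_global.x)
#  define demo_add(n) Demo_add(n)
#endif

/* The "real" function: takes the context only in the implicit flavor. */
long Demo_add(d_pTHX_ long n)
{
    STATE(counter) += n;
    return STATE(counter);
}

/* A caller written once; it compiles unchanged in both flavors. */
long demo_add_twice(d_pTHX_ long n)
{
    long before = demo_add(0);
    long after;

    demo_add(n);
    after = demo_add(n);
    assert(after == before + 2 * n);
    return after;
}
```

The short name C<demo_add> works inside any function that has C<my_perl> in scope, which is the job C<pTHX_> in the prototype (or C<dTHX> in the body) does in real code.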
This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance. Instead we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.
The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument. Instead
it does dTHX; to get the context from thread-local storage. We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance. (Passing an arg is
cheaper than grabbing it from thread-local storage.)
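The trade-off between the two variants can be sketched in standalone C. This is an illustration only, assuming a C11 compiler for C<_Thread_local>; the names C<demo_interp>, C<Demo_warner>, C<Demo_warner_nocontext> and C<demo_get_context> are hypothetical stand-ins, not Perl's implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are not Perl's types or functions. */
typedef struct demo_interp { char last_warning[128]; } demo_interp;

/* One "current interpreter" per thread (C11 thread-local storage). */
static _Thread_local demo_interp *demo_current;

void demo_set_context(demo_interp *interp) { demo_current = interp; }
demo_interp *demo_get_context(void) { return demo_current; }

/* Shared worker that takes a va_list. */
static void demo_vwarner(demo_interp *my_perl, const char *pat, va_list ap)
{
    assert(my_perl != NULL);
    vsnprintf(my_perl->last_warning, sizeof my_perl->last_warning, pat, ap);
}

/* Context-taking variant: callers spell out the context argument. */
void Demo_warner(demo_interp *my_perl, const char *pat, ...)
{
    va_list ap;
    va_start(ap, pat);
    demo_vwarner(my_perl, pat, ap);
    va_end(ap);
}

/* Context-free variant: looks the context up itself, at some cost. */
void Demo_warner_nocontext(const char *pat, ...)
{
    va_list ap;
    va_start(ap, pat);
    demo_vwarner(demo_get_context(), pat, ap);
    va_end(ap);
}

/* An object-like macro can alias the variadic function, because it
 * does not need to know how many arguments follow. */
#define demo_warner Demo_warner_nocontext
```

The context-free variant only works once the thread's slot has been set, here with the hypothetical C<demo_set_context>.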
You can ignore [pad]THXx when browsing the Perl headers/sources.
Those are strictly for use within the core. Extensions and embedders
need only be aware of [pad]THX.
=head2 So what happened to dTHR?
C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.
=head2 How do I use all this in extensions?
When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow. The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with PERL_IMPLICIT_CONTEXT enabled.
There are three ways to do this. First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:
sv_setiv(sv, num);
in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
in effect:
Perl_sv_setiv(Perl_get_context(), sv, num);
or to this otherwise:
Perl_sv_setiv(sv, num);
You don't have to do anything new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.
The second, more efficient way is to use the following template for
your Foo.xs:
    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"
    STATIC void my_private_function(int arg1, int arg2);
    STATIC void
    my_private_function(int arg1, int arg2)
    {
        dTHX;       /* fetch context */
        ... call many Perl API functions ...
    }
    [... etc ...]
    MODULE = Foo            PACKAGE = Foo
    /* typical XSUB */
    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(arg, 10);
Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, followed by a C<dTHX;> declaration at
the start of every function that will call the Perl API. (You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.) No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.
The third, even more efficient way is to ape how it is done within
the Perl guts:
    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"
    /* pTHX_ only needed for functions that call Perl API */
    STATIC void my_private_function(pTHX_ int arg1, int arg2);
    STATIC void
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        ... call Perl API functions ...
    }
    [... etc ...]
    MODULE = Foo            PACKAGE = Foo
    /* typical XSUB */
    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);
This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument. Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.
Never add a comma after C<pTHX> yourself--always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit arguments.
If one is compiling Perl with the C<-DPERL_GLOBAL_STRUCT> the C<dVAR>
definition is needed if the Perl global variables (see F<perlvars.h>
or F<globvar.sym>) are accessed in the function and C<dTHX> is not
used (the C<dTHX> includes the C<dVAR> if necessary). One notices
the need for C<dVAR> only with the said compile-time define, because
otherwise the Perl global variables are visible as-is.
=head2 Should I do anything special if I call perl from multiple threads?
If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.
The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards. If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:
/* do this before doing anything else with some_perl */
PERL_SET_CONTEXT(some_perl);
... other Perl API calls on some_perl go here ...
=head2 Future Plans and PERL_IMPLICIT_SYS
Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
Windows.
This allows an extra pointer (called the "host" environment) to be
provided for all the system calls. This makes it possible for
all the system stuff to maintain its own state, broken down into
seven C structures. These are thin wrappers around the usual system
calls (see F<win32/perllib.c>) for the default perl executable, but for a
more ambitious host (like the one that would do fork() emulation) all
the extra work needed to pretend that different interpreters are
actually different "processes", would be done here.
The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.
=head1 Internal Functions
All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)
Inside the Perl core (C<PERL_CORE> defined), you can get at the functions
either with or without the C<Perl_> prefix, thanks to a bunch of defines
that live in F<embed.h>. Note that extension code should I<not> set
C<PERL_CORE>; this exposes the full perl internals, and is likely to cause
breakage of the XS in each new perl release.
The file F<embed.h> is generated automatically from
F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces. It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well. Here's a sample entry from
that table:
Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
The second column is the return type, the third column the name. Columns
after that are the arguments. The first column is a set of flags:
=over 3
=item A
This function is a part of the public
API. All such functions should also
have 'd'; very few do not.
=item p
This function has a C<Perl_> prefix; i.e. it is defined as
C<Perl_av_fetch>.
=item d
This function has documentation using the C<apidoc> feature which we'll
look at in a second. Some functions have 'd' but not 'A'; docs are good.
=back
Other available flags are:
=over 3
=item s
This is a static function and is defined as C<STATIC S_whatever>, and
usually called within the sources as C<whatever(...)>.
=item n
This does not need an interpreter context, so the definition has no
C<pTHX>, and it follows that callers don't use C<aTHX>. (See
L</Background and PERL_IMPLICIT_CONTEXT>.)
=item r
This function never returns; C<croak>, C<exit> and friends.
=item f
This function takes a variable number of arguments, C<printf> style.
The argument list should end with C<...>, like this:
Afprd |void |croak |const char* pat|...
=item M
This function is part of the experimental development API, and may change
or disappear without notice.
=item o
This function should not have a compatibility macro to define, say,
C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.
=item x
This function isn't exported out of the Perl core.
=item m
This is implemented as a macro.
=item X
This function is explicitly exported.
=item E
This function is visible to extensions included in the Perl core.
=item b
Binary backward compatibility; this function is a macro but also has
a C<Perl_> implementation (which is exported).
=item others
See the comments at the top of F<embed.fnc> for others.
=back
If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.
=head2 Formatted Printing of IVs, UVs, and NVs
If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:
    IVdf    IV in decimal
    UVuf    UV in decimal
    UVof    UV in octal
    UVxf    UV in hexadecimal
    NVef    NV %e-like
    NVff    NV %f-like
    NVgf    NV %g-like
These will take care of 64-bit integers and long doubles.
For example:
    printf("IV is %" IVdf "\n", iv);
The IVdf will expand to whatever is the correct format for the IVs.
Note that there are different "long doubles": Perl will use
whatever the compiler has.
If you are printing addresses of pointers, use UVxf combined
with PTR2UV(); do not use %lx or %p.
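The same idea exists in standard C as the C<PRI*> macros from F<inttypes.h>: the format macro is a string literal that the compiler concatenates with its neighbours. The sketch below is standalone and does not use the Perl headers; C<demo_iv>, C<demo_uv>, C<DEMO_IVdf> and C<DEMO_UVxf> are hypothetical names chosen to echo the Perl ones, and fixing them at 64 bits is an assumption made for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical typedefs and format macros in the spirit of IV/IVdf. */
typedef int64_t  demo_iv;
typedef uint64_t demo_uv;
#define DEMO_IVdf PRId64
#define DEMO_UVxf PRIx64

/* Formats "iv=<decimal> uv=0x<hex>" into buf and returns the length
 * snprintf reports. Note the spaces around the macros: adjacent string
 * literals are concatenated by the compiler. */
int demo_format(char *buf, size_t size, demo_iv iv, demo_uv uv)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "iv=%" DEMO_IVdf " uv=0x%" DEMO_UVxf, iv, uv);
}
```

Because the width of the integer type is hidden behind the macro, the call site stays the same whichever underlying type the build selects.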
=head2 Formatted Printing of Size_t and SSize_t
The most general way to do this is to cast them to a UV or IV, and
print as in the
L<previous section|/Formatted Printing of IVs, UVs, and NVs>.
But if you're using C<PerlIO_printf()>, it's less typing and visual
clutter to use the C<"%z"> length modifier (for I<siZe>):
    PerlIO_printf(PerlIO_stderr(), "STRLEN is %zu\n", len);
This modifier is not portable, so its use should be restricted to
C<PerlIO_printf()>.
=head2 Pointer-To-Integer and Integer-To-Pointer
Because pointer size does not necessarily equal integer size,
use the following macros to do it right.
PTR2UV(pointer)
PTR2IV(pointer)
PTR2NV(pointer)
INT2PTR(pointertotype, integer)
For example:
IV iv = ...;
SV *sv = INT2PTR(SV*, iv);
and
AV *av = ...;
UV uv = PTR2UV(av);
=head2 Exception Handling
There are a couple of macros to do very basic exception handling in XS
modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
be able to use these macros:
#define NO_XSLOCKS
#include "XSUB.h"
You can use these macros if you call code that may croak, but you need
to do some cleanup before giving control back to Perl. For example:
    dXCPT;    /* set up necessary variables */
    XCPT_TRY_START {
      code_that_may_croak();
    } XCPT_TRY_END
    XCPT_CATCH
    {
      /* do cleanup here */
      XCPT_RETHROW;
    }
Note that you always have to rethrow an exception that has been
caught. Using these macros, it is not possible to just catch the
exception and ignore it. If you have to ignore the exception, you
have to use a C<call_*> function.
The advantage of using the above macros is that you don't have
to set up an extra function for C<call_*>, and that using these
macros is faster than using C<call_*>.
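The catch, clean up, rethrow pattern can be illustrated in standalone C with C<setjmp> and C<longjmp>. This is a generic sketch of the control flow only, not the definition of the C<XCPT_*> macros; C<demo_croak>, C<demo_inner>, C<demo_outer> and the globals are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf *demo_top;   /* innermost active handler, or NULL */
static int demo_code;       /* code carried by the exception in flight */
int demo_cleanups;          /* counts cleanups, for illustration */

/* Stand-in for code that "croaks": never returns to its caller. */
static void demo_croak(int code)
{
    assert(demo_top != NULL);
    demo_code = code;
    longjmp(*demo_top, 1);
}

/* Inner frame: holds a resource, so it must catch, clean up, rethrow. */
static void demo_inner(int fail)
{
    jmp_buf here;
    jmp_buf *outer = demo_top;
    char *resource = malloc(16);
    int caught = 0;

    demo_top = &here;
    if (setjmp(here) == 0) {
        if (fail)
            demo_croak(7);          /* the "try" body */
    } else {
        caught = 1;                 /* the "catch" branch */
    }
    demo_top = outer;               /* pop our handler on both paths */
    free(resource);                 /* cleanup always runs */
    demo_cleanups++;
    if (caught)
        demo_croak(demo_code);      /* rethrow to the outer handler */
}

/* Outer frame: returns 0 on success, or the code that was thrown. */
int demo_outer(int fail)
{
    jmp_buf here;
    int rc = 0;

    demo_top = &here;
    if (setjmp(here) == 0)
        demo_inner(fail);
    else
        rc = demo_code;
    demo_top = NULL;
    return rc;
}
```

Without the handler in C<demo_inner>, the jump would go straight to C<demo_outer> and the resource would leak, which is the situation the macros above are meant to avoid.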
=head2 Source Documentation
There's an effort going on to document the internal functions and
automatically produce reference manuals from them -- L<perlapi> is one
such manual which details all the functions which are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.
Source documentation is created by putting POD comments into the C
source, like this:
/*
=for apidoc sv_setiv
Copies an integer into the given SV. Does not handle 'set' magic. See
L<perlapi/sv_setiv_mg>.
=cut
*/
Please try and supply some documentation if you add functions to the
Perl core.
=head2 Backwards compatibility
The Perl API changes over time. New functions are
added or the interfaces of existing functions are
changed. The C<Devel::PPPort> module tries to
provide compatibility code for some of these changes, so XS writers don't
have to code it themselves when supporting multiple versions of Perl.
C<Devel::PPPort> generates a C header file F<ppport.h> that can also
be run as a Perl script. To generate F<ppport.h>, run:
perl -MDevel::PPPort -eDevel::PPPort::WriteFile
Besides checking existing XS code, the script can also be used to retrieve
compatibility information for various API calls using the C<--api-info>
command line switch. For example:
% perl ppport.h --api-info=sv_magicext
For details, see C<perldoc ppport.h>.
=head1 Unicode Support
Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.
=head2 What B<is> Unicode, anyway?
In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.
Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.
To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF-8. UTF-8 uses
a variable number of bytes to represent a character. You can learn more
about Unicode and Perl's Unicode model in L<perlunicode>.
(On EBCDIC platforms, Perl uses instead UTF-EBCDIC, which is a form of
UTF-8 adapted for EBCDIC platforms. Below, we just talk about UTF-8.
UTF-EBCDIC is like UTF-8, but the details are different. The macros
hide the differences from you, just remember that the particular numbers
and bit patterns presented below will differ in UTF-EBCDIC.)
=head2 How can I recognise a UTF-8 string?
You can't. This is because UTF-8 data is stored in bytes just like
non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking -- this
is what makes Unicode input an interesting problem.
In general, you either have to know what you're dealing with, or you
have to guess. The API function C<is_utf8_string> can help; it'll tell
you if a string contains only valid UTF-8 characters, and the chances
of a non-UTF-8 string looking like valid UTF-8 become very small very
quickly with increasing string length. On a character-by-character
basis, C<isUTF8_CHAR>
will tell you whether the current character in a string is valid UTF-8.
=head2 How does UTF-8 represent Unicode characters?
As mentioned above, UTF-8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one
byte, just like good ol' ASCII. Character 128 is stored as
C<v194.128>; this continues up to character 191, which is
C<v194.191>. Now we've run out of bits (191 is binary
C<10111111>) so we move on; character 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
L<perlunicode/Unicode Encodings> has pictures of how this works.
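The byte values quoted above follow directly from the bit layout. The encoder below is a standalone sketch of standard UTF-8 for code points up to 0x10FFFF; C<demo_utf8_encode> is a hypothetical helper, not a Perl API function, and it makes no attempt to reject surrogates or to handle Perl's extended encodings or UTF-EBCDIC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a Perl API function. Encodes code point cp
 * (at most 0x10FFFF) into out, which must have room for 4 bytes.
 * Returns the number of bytes written. */
size_t demo_utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL && cp <= 0x10FFFFUL);
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Feeding it 128, 191 and 192 gives the byte pairs 194/128, 194/191 and 195/128 listed above, and 2048 is the first value to take three bytes.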
Assuming you know you're dealing with a UTF-8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:
char *utf = "\305\233\340\240\201";
I32 len;
len = UTF8SKIP(utf); /* len is 2 here */
utf += len;
len = UTF8SKIP(utf); /* len is 3 here */
Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.
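What a skip macro and a bounds-checked hop have to do can be sketched in standalone C. The helpers C<demo_utf8_skip> and C<demo_utf8_hop> are hypothetical and simplified: they assume standard UTF-8 of at most four bytes per character, whereas Perl's own tables also cover longer sequences and UTF-EBCDIC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers; they assume well-formed standard UTF-8. */

/* Length in bytes of the character whose first byte is lead. */
size_t demo_utf8_skip(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;                             /* continuation or invalid byte */
}

/* Advances s by up to n characters, never moving past end. */
const unsigned char *demo_utf8_hop(const unsigned char *s,
                                   const unsigned char *end, size_t n)
{
    assert(s != NULL && end != NULL && s <= end);
    while (n-- > 0 && s < end) {
        size_t len = demo_utf8_skip(*s);
        if (len > (size_t)(end - s))
            return end;                   /* truncated final character */
        s += len;
    }
    return s;
}
```

Passing the end of the buffer explicitly is what makes the hop safe on input that stops in the middle of a character.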
All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test if you need to do something special with this
character like this (the C<UTF8_IS_INVARIANT()> is a macro that tests
whether the byte is encoded as a single byte even in UTF-8):
    U8 *utf;
    U8 *utf_end; /* 1 beyond buffer pointed to by utf */
    UV uv;       /* Note: a UV, not a U8, not a char */
    STRLEN len;  /* length of character in bytes */
    if (!UTF8_IS_INVARIANT(*utf))
        /* Must treat this as UTF-8 */
        uv = utf8_to_uvchr_buf(utf, utf_end, &len);
    else
        /* OK to treat this character as a byte */
        uv = *utf;
You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
value of the character; the inverse function C<uvchr_to_utf8> is available
for putting a UV into UTF-8:
    if (!UVCHR_IS_INVARIANT(uv))
        /* Must treat this as UTF8 */
        utf8 = uvchr_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;
You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!
(Note that we don't have to test for invariant characters in the
examples above. The functions work on any well-formed UTF-8 input.
It's just that it's faster to avoid the function overhead when it's not
needed.)
=head2 How does Perl store UTF-8 strings?
Currently, Perl deals with UTF-8 strings and non-UTF-8 strings
slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
string is internally encoded as UTF-8. Without it, the byte value is the
codepoint number and vice versa. This flag is only meaningful if the SV
is C<SvPOK> or immediately after stringification via C<SvPV> or a
similar macro. You can check and manipulate this flag with the
following macros:
SvUTF8(sv)
SvUTF8_on(sv)
SvUTF8_off(sv)
This flag has an important effect on Perl's treatment of the string: if
UTF-8 data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable (wrong) results.
The problem comes when you have, for instance, a string that isn't
flagged as UTF-8, and contains a byte sequence that could be UTF-8 --
especially when combining non-UTF-8 and UTF-8 strings.
Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:
SV *sv;
SV *nsv;
STRLEN len;
char *p;
p = SvPV(sv, len);
frobnicate(p);
nsv = newSVpvn(p, len);
The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act
accordingly:
    p = SvPV(sv, len);
    is_utf8 = SvUTF8(sv);
    frobnicate(p, is_utf8);
    nsv = newSVpvn(p, len);
    if (is_utf8)
        SvUTF8_on(nsv);
In the above, your C<frobnicate> function has been changed to be made
aware of whether or not it's dealing with UTF-8 data, so that it can
handle the string appropriately.
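Why the flag has to travel with the bytes can be shown with a toy string type in standalone C. The C<demo_str> struct and its functions are hypothetical and far simpler than an SV; they only model "bytes plus an encoding flag".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of "bytes plus an encoding flag"; not an SV. */
typedef struct demo_str {
    char  *pv;        /* the bytes */
    size_t len;       /* length in bytes */
    int    is_utf8;   /* how the bytes are to be interpreted */
} demo_str;

/* Copies the bytes AND the flag. Returns 0 on success, -1 on failure. */
int demo_str_copy(demo_str *dst, const demo_str *src)
{
    assert(dst != NULL && src != NULL && src->pv != NULL);
    dst->pv = malloc(src->len + 1);
    if (dst->pv == NULL)
        return -1;
    memcpy(dst->pv, src->pv, src->len);
    dst->pv[src->len] = '\0';
    dst->len = src->len;
    dst->is_utf8 = src->is_utf8;   /* the step that is easy to forget */
    return 0;
}

/* Character count: every byte if not UTF-8, otherwise every byte that
 * is not a continuation byte (10xxxxxx). */
size_t demo_str_chars(const demo_str *s)
{
    size_t i, n = 0;

    if (!s->is_utf8)
        return s->len;
    for (i = 0; i < s->len; i++)
        if (((unsigned char)s->pv[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

The same two bytes count as one character with the flag set and as two without it, so a copy that drops the flag changes the meaning of the string.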
Since just passing an SV to an XS function and copying the data of
the SV is not enough to copy the UTF8 flags, even less right is just
passing a S<C<char *>> to an XS function.
For full generality, use the L<C<DO_UTF8>|perlapi/DO_UTF8> macro to see if the
string in an SV is to be I<treated> as UTF-8. This takes into account
if the call to the XS function is being made from within the scope of
L<S<C<use bytes>>|bytes>. If so, the underlying bytes that comprise the
UTF-8 string are to be exposed, rather than the character they
represent. But this pragma should only really be used for debugging and
perhaps low-level testing at the byte level. Hence most XS code need
not concern itself with this, but various areas of the perl core do need
to support it.
And this isn't the whole story. Starting in Perl v5.12, strings that
aren't encoded in UTF-8 may also be treated as Unicode under various
conditions (see L<perlunicode/ASCII Rules versus Unicode Rules>).
This is only really a problem for characters whose ordinals are between
128 and 255, and their behavior varies under ASCII versus Unicode rules
in ways that your code cares about (see L<perlunicode/The "Unicode Bug">).
There is no published API for dealing with this, as it is subject to
change, but you can look at the code for C<pp_lc> in F<pp.c> for an
example as to how it's currently done.
=head2 How do I convert a string to UTF-8?
If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
the non-UTF-8 strings to UTF-8. If you've got an SV, the easiest way to do
this is:
sv_utf8_upgrade(sv);
However, you must not do this, for example:
if (!SvUTF8(left))
sv_utf8_upgrade(left);
If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems in deficient code.
Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.
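The copy-instead-of-modify approach looks roughly like this in standalone C. The function C<demo_bytes_to_utf8> is a hypothetical sketch in the spirit of C<bytes_to_utf8>, not Perl's code or its signature; it assumes each input byte is a code point in the range 0..255.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not Perl's code. Treats each input byte as a
 * code point 0..255 and returns a newly malloc'ed, NUL-terminated
 * UTF-8 copy, storing its byte length in *out_len. The input is left
 * untouched. Returns NULL if allocation fails. */
unsigned char *demo_bytes_to_utf8(const unsigned char *s, size_t len,
                                  size_t *out_len)
{
    unsigned char *buf, *d;
    size_t i;

    assert(s != NULL && out_len != NULL);
    buf = malloc(2 * len + 1);            /* worst case: every byte doubles */
    if (buf == NULL)
        return NULL;
    d = buf;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            *d++ = s[i];                  /* invariant: same byte in UTF-8 */
        } else {
            *d++ = (unsigned char)(0xC0 | (s[i] >> 6));
            *d++ = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    *d = '\0';
    *out_len = (size_t)(d - buf);
    return buf;
}
```

The caller owns the returned buffer and frees it after the comparison, while the original string keeps its encoding.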
=head2 How do I compare strings?
L<perlapi/sv_cmp> and L<perlapi/sv_cmp_flags> do a lexicographic
comparison of two SV's, and handle UTF-8ness properly. Note, however,
that Unicode specifies a much fancier mechanism for collation, available
via the L<Unicode::Collate> module.
To just compare two strings for equality/non-equality, you can just use
L<C<memEQ()>|perlapi/memEQ> and L<C<memNE()>|perlapi/memNE> as usual,
except the strings must be both UTF-8 or not UTF-8 encoded.
To compare two strings case-insensitively, use
L<C<foldEQ_utf8()>|perlapi/foldEQ_utf8> (the strings don't have to have
the same UTF-8ness).
=head2 Is there anything else I need to know?
Not really. Just remember these things:
=over 3
=item *
There's no way to tell if a S<C<char *>> or S<C<U8 *>> string is UTF-8
or not. But you can tell if an SV is to be treated as UTF-8 by calling
C<DO_UTF8> on it, after stringifying it with C<SvPV> or a similar
macro. And, you can tell if an SV is actually UTF-8 (even if it is not to
be treated as such) by looking at its C<SvUTF8> flag (again after
stringifying it). Don't forget to set the flag if something should be
UTF-8.
Treat the flag as part of the PV, even though it's not -- if you pass on
the PV to somewhere, pass on the flag too.
=item *
If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.
=item *
When writing a character UV to a UTF-8 string, B<always> use
C<uvchr_to_utf8>, unless C<UVCHR_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.
=item *
Mixing UTF-8 and non-UTF-8 strings is
tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded, and then combine them.
=back
=head1 Custom Operators
Custom operator support is an experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>.)
This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations. This also means that you can
define your custom ops to be any op structure -- unary, binary, list and
so on -- you like.
It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to do those changes yourself, after
Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.
When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.
You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages. Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> to determine which custom op
it is dealing with. You should create an C<XOP> structure for each
ppaddr you use, set the properties of the custom op with
C<XopENTRY_set>, and register the structure against the ppaddr using
C<Perl_custom_op_register>. A trivial example might look like:
    static XOP my_xop;
    static OP *my_pp(pTHX);
    BOOT:
        XopENTRY_set(&my_xop, xop_name, "myxop");
        XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
        Perl_custom_op_register(aTHX_ my_pp, &my_xop);
The available fields in the structure are:
=over 4
=item xop_name
A short name for your op. This will be included in some error messages,
and will also be returned as C<< $op->name >> by the L<B|B> module, so
it will appear in the output of modules like L<B::Concise|B::Concise>.
=item xop_desc
A short description of the function of the op.
=item xop_class
Which of the various C<*OP> structures this op uses. This should be one of
the C<OA_*> constants from F<op.h>, namely
=over 4
=item OA_BASEOP
=item OA_UNOP
=item OA_BINOP
=item OA_LOGOP
=item OA_LISTOP
=item OA_PMOP
=item OA_SVOP
=item OA_PADOP
=item OA_PVOP_OR_SVOP
This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because
the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead.
=item OA_LOOP
=item OA_COP
=back
The other C<OA_*> constants should not be used.
=item xop_peep
This member is of type C<Perl_cpeep_t>, which expands to C<void
(*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)>. If it is set, this function
will be called from C<Perl_rpeep> when ops of this type are encountered
by the peephole optimizer. I<o> is the OP that needs optimizing;
I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.
=back
C<B::Generate> directly supports the creation of custom ops by name.
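The idea of identifying a custom op by its implementation function, described above, can be sketched in standalone C. Everything here is hypothetical (C<demo_xop>, C<demo_custom_op_register>, C<demo_op_name>), and a small linear table is used for simplicity; it says nothing about how the interpreter stores its registrations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature; these are not Perl's XOP structures. */
typedef int (*demo_pp_fn)(int);

typedef struct demo_xop {
    const char *name;
    const char *desc;
} demo_xop;

#define DEMO_MAX_XOPS 8
static struct {
    demo_pp_fn fn;
    const demo_xop *xop;
} demo_registry[DEMO_MAX_XOPS];
static size_t demo_registered;

/* Registers metadata against an implementation function.
 * Returns 0 on success, -1 if the table is full. */
int demo_custom_op_register(demo_pp_fn fn, const demo_xop *xop)
{
    assert(fn != NULL && xop != NULL);
    if (demo_registered == DEMO_MAX_XOPS)
        return -1;
    demo_registry[demo_registered].fn = fn;
    demo_registry[demo_registered].xop = xop;
    demo_registered++;
    return 0;
}

/* Finds an op's name from its function pointer alone. */
const char *demo_op_name(demo_pp_fn fn)
{
    size_t i;

    for (i = 0; i < demo_registered; i++)
        if (demo_registry[i].fn == fn)
            return demo_registry[i].xop->name;
    return "unknown custom op";
}

/* Two example implementations sharing one "op type". */
int demo_pp_double(int x) { return 2 * x; }
int demo_pp_negate(int x) { return -x; }
```

An op that was never registered still runs; it just cannot be described by name in a diagnostic, which is the reason for registering.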
=head1 Dynamic Scope and the Context Stack
B<Note:> this section describes a non-public internal API that is subject
to change without notice.
=head2 Introduction to the context stack
In Perl, dynamic scoping refers to the runtime nesting of things like
subroutine calls, evals etc, as well as the entering and exiting of block
scopes. For example, the restoring of a C<local>ised variable is
determined by the dynamic scope.
Perl tracks the dynamic scope by a data structure called the context
stack, which is an array of C<PERL_CONTEXT> structures, each of which is
a big union covering all the types of context. Whenever a new scope is
entered (such as a block, a C<for> loop, or a subroutine call), a new
context entry is pushed onto the stack. Similarly when leaving a block or
returning from a subroutine call etc. a context is popped. Since the
context stack represents the current dynamic scope, it can be searched.
For example, C<next LABEL> searches back through the stack looking for a
loop context that matches the label; C<return> pops contexts until it
finds a sub or eval context or similar; C<caller> examines sub contexts on
the stack.
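The effects of these context-stack searches can be observed from Perl code. The following is only a Perl-level sketch of the visible behaviour, not a view of the internal API; the names C<outer>, C<inner>, C<$level> and the C<OUTER> label are invented for illustration.

```perl
use strict;
use warnings;

our $level = "global";

sub inner {
    # caller() examines the sub contexts on the context stack
    my @frames;
    my $i = 0;
    while (my @info = caller($i++)) {
        push @frames, $info[3];          # fully qualified sub name
    }
    return ($level, @frames);
}

sub outer {
    local $level = "localised";          # restored when outer()'s scope exits
    return inner();
}

my ($seen, @frames) = outer();

# next LABEL searches back for the loop context carrying that label
my @pairs;
OUTER: for my $x (1 .. 3) {
    for my $y (1 .. 3) {
        next OUTER if $y > $x;
        push @pairs, "$x$y";
    }
}

print "inside: $seen, afterwards: $level\n";
print "frames: @frames\n";
print "pairs: @pairs\n";
```

The C<local>ised value is visible inside C<inner> because it is called within the dynamic scope of C<outer>, and the old value is back once that scope has been left.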
Each context entry is labelled with a context type, C<cx_type>. Typical
context types are C<CXt_SUB>, C<CXt_EVAL> etc., as well as C<CXt_BLOCK>
and C<CXt_NULL>, which represent a basic scope (as pushed by C<pp_enter>)
and a sort block respectively. The type determines which parts of the
context union are valid.
The main division in the context struct is between a substitution scope
(C<CXt_SUBST>) and block scopes, which are everything else. The former is
just used while executing C<s///e>, and won't be discussed further
here.
All the block scope types share a common base, which corresponds to
C<CXt_BLOCK>. This stores the old values of various scope-related
variables like C<PL_curpm>, as well as information about the current
scope, such as C<gimme>. On scope exit, the old variables are restored.
Particular block scope types store extra per-type information. For
example, C<CXt_SUB> stores the currently executing CV, while the various
for loop types might hold the original loop variable SV. On scope exit,
the per-type data is processed; for example the CV has its reference count
decremented, and the original loop variable is restored.
The macro C<cxstack> returns the base of the current context stack, while
C<cxstack_ix> is the index of the current frame within that stack.
In fact, the context stack is actually part of a stack-of-stacks system;
whenever something unusual is done such as calling a C<DESTROY> or tie
handler, a new stack is pushed, then popped at the end.
Note that the API described here changed considerably in perl 5.24; prior
to that, big macros like C<PUSHBLOCK> and C<POPSUB> were used; in 5.24
they were replaced by the inline static functions described below. In
addition, the ordering and detail of how these macros/functions work
changed in many ways, often subtly. In particular they didn't handle
saving the savestack and temps stack positions, and required additional
C<ENTER>, C<SAVETMPS> and C<LEAVE> compared to the new functions. The
old-style macros will not be described further.
=head2 Pushing contexts
For pushing a new context, the two basic functions are
C<cx = cx_pushblock()>, which pushes a new basic context block and returns
its address, and a family of similar functions with names like
C<cx_pushsub(cx)> which populate the additional type-dependent fields in
the C<cx> struct. Note that C<CXt_NULL> and C<CXt_BLOCK> don't have their
own push functions, as they don't store any data beyond that pushed by
C<cx_pushblock>.
The fields of the context struct and the arguments to the C<cx_*>
functions are subject to change between perl releases, representing
whatever is convenient or efficient for that release.
A typical context stack pushing can be found in C<pp_entersub>; the
following shows a simplified and stripped-down example of a non-XS call,
along with comments showing roughly what each function does.
dMARK;
U8 gimme = GIMME_V;
bool hasargs = cBOOL(PL_op->op_flags & OPf_STACKED);
OP *retop = PL_op->op_next;
I32 old_ss_ix = PL_savestack_ix;
CV *cv = ....;
/* ... make mortal copies of stack args which are PADTMPs here ... */
/* ... do any additional savestack pushes here ... */
/* Now push a new context entry of type 'CXt_SUB'; initially just
* doing the actions common to all block types: */
cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);
/* this does (approximately):
CXINC; /* cxstack_ix++ (grow if necessary) */
cx = CX_CUR(); /* and get the address of new frame */
cx->cx_type = CXt_SUB;
cx->blk_gimme = gimme;
cx->blk_oldsp = MARK - PL_stack_base;
cx->blk_oldsaveix = old_ss_ix;
cx->blk_oldcop = PL_curcop;
cx->blk_oldmarksp = PL_markstack_ptr - PL_markstack;
cx->blk_oldscopesp = PL_scopestack_ix;
cx->blk_oldpm = PL_curpm;
cx->blk_old_tmpsfloor = PL_tmps_floor;
PL_tmps_floor = PL_tmps_ix;
*/
/* then update the new context frame with subroutine-specific info,
* such as the CV about to be executed: */
cx_pushsub(cx, cv, retop, hasargs);
/* this does (approximately):
cx->blk_sub.cv = cv;
cx->blk_sub.olddepth = CvDEPTH(cv);
cx->blk_sub.prevcomppad = PL_comppad;
cx->cx_type |= (hasargs) ? CXp_HASARGS : 0;
cx->blk_sub.retop = retop;
SvREFCNT_inc_simple_void_NN(cv);
*/
Note that C<cx_pushblock()> sets two new floors: for the args stack (to
C<MARK>) and the temps stack (to C<PL_tmps_ix>). While executing at this
scope level, every C<nextstate> (amongst others) will reset the args and
tmps stack levels to these floors. Note that since C<cx_pushblock> uses
the current value of C<PL_tmps_ix> rather than it being passed as an arg,
this dictates at what point C<cx_pushblock> should be called. In
particular, any new mortals which should be freed only on scope exit
(rather than at the next C<nextstate>) should be created first.
Most callers of C<cx_pushblock> simply set the new args stack floor to the
top of the previous stack frame, but for C<CXt_LOOP_LIST> it stores the
items being iterated over on the stack, and so sets C<blk_oldsp> to the
top of these items instead. Note that, contrary to its name, C<blk_oldsp>
doesn't always represent the value to restore C<PL_stack_sp> to on scope
exit.
Note the early capture of C<PL_savestack_ix> to C<old_ss_ix>, which is
later passed as an arg to C<cx_pushblock>. In the case of C<pp_entersub>,
this is because, although most values needing saving are stored in fields
of the context struct, an extra value needs saving only when the debugger
is running, and it doesn't make sense to bloat the struct for this rare
case. So instead it is saved on the savestack. Since this value gets
calculated and saved before the context is pushed, it is necessary to pass
the old value of C<PL_savestack_ix> to C<cx_pushblock>, to ensure that the
saved value gets freed during scope exit. For most users of
C<cx_pushblock>, where nothing needs pushing on the save stack,
C<PL_savestack_ix> is just passed directly as an arg to C<cx_pushblock>.
Note that where possible, values should be saved in the context struct
rather than on the save stack; it's much faster that way.
Normally C<cx_pushblock> should be immediately followed by the appropriate
C<cx_pushfoo>, with nothing between them; this is because if code
in-between could die (e.g. a warning upgraded to fatal), then the context
stack unwinding code in C<dounwind> would see (in the example above) a
C<CXt_SUB> context frame, but without all the subroutine-specific fields
set, and crashes would soon ensue.
Where the two must be separate, initially set the type to C<CXt_NULL> or
C<CXt_BLOCK>, and later change it to C<CXt_foo> when doing the
C<cx_pushfoo>. This is exactly what C<pp_enteriter> does, once it's
determined which type of loop it's pushing.
=head2 Popping contexts
Contexts are popped using C<cx_popsub()> etc. and C<cx_popblock()>. Note
however, that unlike C<cx_pushblock>, neither of these functions actually
decrements the current context stack index; this is done separately using
C<CX_POP()>.
There are two main ways that contexts are popped. During normal execution
as scopes are exited, functions like C<pp_leave>, C<pp_leaveloop> and
C<pp_leavesub> process and pop just one context using C<cx_popfoo> and
C<cx_popblock>. On the other hand, things like C<pp_return> and C<next>
may have to pop back several scopes until a sub or loop context is found,
and exceptions (such as C<die>) need to pop back contexts until an eval
context is found. Both of these are accomplished by C<dounwind()>, which
is capable of processing and popping all contexts above the target one.
Here is a typical example of context popping, as found in C<pp_leavesub>
(simplified slightly):
U8 gimme;
PERL_CONTEXT *cx;
SV **oldsp;
OP *retop;
cx = CX_CUR();
gimme = cx->blk_gimme;
oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */
if (gimme == G_VOID)
PL_stack_sp = oldsp;
else
leave_adjust_stacks(oldsp, oldsp, gimme, 0);
CX_LEAVE_SCOPE(cx);
cx_popsub(cx);
cx_popblock(cx);
retop = cx->blk_sub.retop;
CX_POP(cx);
return retop;
The steps above are in a very specific order, designed to be the reverse
order of when the context was pushed. The first thing to do is to copy
and/or protect any return arguments and free any temps in the current
scope. Scope exits like an rvalue sub normally return a mortal copy of
their return args (as opposed to lvalue subs). It is important to make
this copy before the save stack is popped or variables are restored, or
bad things like the following can happen:
sub f { my $x =...; $x } # $x freed before we get to copy it
sub f { /(...)/; $1 } # PL_curpm restored before $1 copied
Although we wish to free any temps at the same time, we have to be careful
not to free any temps which are keeping return args alive; nor to free the
temps we have just created while mortal copying return args. Fortunately,
C<leave_adjust_stacks()> is capable of making mortal copies of return args,
shifting args down the stack, and only processing those entries on the
temps stack that are safe to do so.
In void context no args are returned, so it's more efficient to skip
calling C<leave_adjust_stacks()>. Also in void context, a C<nextstate> op
is likely to be imminently called which will do a C<FREETMPS>, so there's
no need to do that either.
The next step is to pop savestack entries: C<CX_LEAVE_SCOPE(cx)> is just
defined as C<< LEAVE_SCOPE(cx->blk_oldsaveix) >>. Note that during the
popping, it's possible for perl to call destructors, call C<STORE> to undo
localisations of tied vars, and so on. Any of these can die or call
C<exit()>. In this case, C<dounwind()> will be called, and the current
context stack frame will be re-processed. Thus it is vital that all steps
in popping a context are done in such a way as to support reentrancy. The
other alternative, of decrementing C<cxstack_ix> I<before> processing the
frame, would lead to leaks and the like if something died halfway through,
or overwriting of the current frame.
C<CX_LEAVE_SCOPE> itself is safely re-entrant: if only half the savestack
items have been popped before dying and getting trapped by eval, then the
C<CX_LEAVE_SCOPE>s in C<dounwind> or C<pp_leaveeval> will continue where
the first one left off.
The next step is the type-specific context processing; in this case
C<cx_popsub>. In part, this looks like:
cv = cx->blk_sub.cv;
CvDEPTH(cv) = cx->blk_sub.olddepth;
cx->blk_sub.cv = NULL;
SvREFCNT_dec(cv);
where it's processing the just-executed CV. Note that before it decrements
the CV's reference count, it nulls the C<blk_sub.cv>. This means that if
it re-enters, the CV won't be freed twice. It also means that you can't
rely on such type-specific fields having useful values after the return
from C<cx_popfoo>.
Next, C<cx_popblock> restores all the various interpreter vars to their
previous values or previous high water marks; it expands to:
PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp;
PL_scopestack_ix = cx->blk_oldscopesp;
PL_curpm = cx->blk_oldpm;
PL_curcop = cx->blk_oldcop;
PL_tmps_floor = cx->blk_old_tmpsfloor;
Note that it I<doesn't> restore C<PL_stack_sp>; as mentioned earlier,
which value to restore it to depends on the context type (specifically
C<for (list) {}>), and what args (if any) it returns; and that will
already have been sorted out earlier by C<leave_adjust_stacks()>.
Finally, the context stack pointer is actually decremented by C<CX_POP(cx)>.
After this point, it's possible that the current context frame could
be overwritten by other contexts being pushed. Although things like ties
and C<DESTROY> are supposed to work within a new context stack, it's best
not to assume this. Indeed on debugging builds, C<CX_POP(cx)> deliberately
sets C<cx> to null to detect code that is still relying on the field
values in that context frame. Note in the C<pp_leavesub()> example above,
we grab C<blk_sub.retop> I<before> calling C<CX_POP>.
=head2 Redoing contexts
Finally, there is C<cx_topblock(cx)>, which acts like a super-C<nextstate>
as regards resetting various vars to their base values. It is used in
places like C<pp_next>, C<pp_redo> and C<pp_goto> where rather than
exiting a scope, we want to re-initialise the scope. As well as resetting
C<PL_stack_sp> like C<nextstate>, it also resets C<PL_markstack_ptr>,
C<PL_scopestack_ix> and C<PL_curpm>. Note that it doesn't do a
C<FREETMPS>.
=head1 AUTHORS
Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.
With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.
=head1 SEE ALSO
L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed>
=encoding utf8
=head1 NAME
perlunicook - cookbookish examples of handling Unicode in Perl
=head1 DESCRIPTION
This manpage contains short recipes demonstrating how to handle common Unicode
operations in Perl, plus one complete program at the end. Any undeclared
variables in individual recipes are assumed to have a previous appropriate
value in them.
=head1 EXAMPLES
=head2 ℞ 0: Standard preamble
Unless otherwise noted, all examples below require this standard preamble
to work correctly, with the C<#!> adjusted to work on your system:
#!/usr/bin/env perl
use utf8; # so literals and identifiers can be in UTF-8
use v5.12; # or later to get "unicode_strings" feature
use strict; # quote strings, declare variables
use warnings; # on by default
use warnings qw(FATAL utf8); # fatalize encoding glitches
use open qw(:std :encoding(UTF-8)); # undeclared streams in UTF-8
use charnames qw(:full :short); # unneeded in v5.16
This I<does> oblige even Unix programmers to C<binmode> their binary streams,
or open them with C<:raw>, but that's the only way to get at them
portably anyway.
B<WARNING>: C<use autodie> (pre 2.26) and C<use open> do not get along with each
other.
=head2 ℞ 1: Generic Unicode-savvy filter
Always decompose on the way in, then recompose on the way out.
use Unicode::Normalize;
while (<>) {
$_ = NFD($_); # decompose + reorder canonically
...
} continue {
print NFC($_); # recompose (where possible) + reorder canonically
}
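To see what decomposing and recomposing actually does to a string, here is a minimal sketch using the core C<Unicode::Normalize> module. The sample word and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

my $composed   = "caf\x{E9}";       # "cafe" ending in precomposed U+00E9
my $decomposed = NFD($composed);    # "e" followed by U+0301 COMBINING ACUTE ACCENT
my $roundtrip  = NFC($decomposed);  # back to the precomposed form

printf "NFC: %d codepoints, NFD: %d codepoints\n",
    length($composed), length($decomposed);
print "round trip ok\n" if $roundtrip eq $composed;
```

Working on the decomposed form inside the loop means a pattern such as C</e/> also finds the base letter of an accented C<e>.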
=head2 ℞ 2: Fine-tuning Unicode warnings
As of v5.14, Perl distinguishes three subclasses of UTF‑8 warnings.
use v5.14; # subwarnings unavailable any earlier
no warnings "nonchar"; # the 66 forbidden non-characters
no warnings "surrogate"; # UTF-16/CESU-8 nonsense
no warnings "non_unicode"; # for codepoints over 0x10_FFFF
=head2 ℞ 3: Declare source in utf8 for identifiers and literals
Without the all-critical C<use utf8> declaration, putting UTF‑8 in your
literals and identifiers won’t work right. If you used the standard
preamble just given above, this already happened. If you did, you can
do things like this:
use utf8;
my $measure = "Ångström";
my @μsoft = qw( cp852 cp1251 cp1252 );
my @ὑπέρμεγας = qw( ὑπέρ μεγας );
my @鯉 = qw( koi8-f koi8-u koi8-r );
my $motto = "👪 💗 🐪"; # FAMILY, GROWING HEART, DROMEDARY CAMEL
If you forget C<use utf8>, high bytes will be misunderstood as
separate characters, and nothing will work right.
=head2 ℞ 4: Characters and their numbers
The C<ord> and C<chr> functions work transparently on all codepoints,
not just on ASCII, and in fact not even just on Unicode.
# ASCII characters
ord("A")
chr(65)
# characters from the Basic Multilingual Plane
ord("Σ")
chr(0x3A3)
# beyond the BMP
ord("𝑛") # MATHEMATICAL ITALIC SMALL N
chr(0x1D45B)
# beyond Unicode! (up to MAXINT)
ord("\x{20_0000}")
chr(0x20_0000)
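As a runnable illustration of the expressions above, this small sketch formats the code points of a few characters and checks that C<chr> and C<ord> are inverses. The array names are made up for the example.

```perl
use strict;
use warnings;
use utf8;

my @chars  = ("A", "Σ", "\x{1D45B}");   # ASCII, BMP, beyond the BMP
my @report = map { sprintf("U+%04X", ord($_)) } @chars;

# only code point numbers are printed, so the output is plain ASCII
print "@report\n";

my $same = chr(0x3A3) eq "Σ" ? 1 : 0;
print "chr(0x3A3) is GREEK CAPITAL LETTER SIGMA\n" if $same;
```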
=head2 ℞ 5: Unicode literals by character number
In an interpolated literal, whether a double-quoted string or a
regex, you may specify a character by its number using the
C<\x{I<HHHHHH>}> escape.
String: "\x{3a3}"
Regex: /\x{3a3}/
String: "\x{1d45b}"
Regex: /\x{1d45b}/
# even non-BMP ranges in regex work fine
/[\x{1D434}-\x{1D467}]/
=head2 ℞ 6: Get character name by number
use charnames ();
my $name = charnames::viacode(0x03A3);
=head2 ℞ 7: Get character number by name
use charnames ();
my $number = charnames::vianame("GREEK CAPITAL LETTER SIGMA");
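The two lookups from ℞ 6 and ℞ 7 are inverses of each other for ordinary named characters. A short sketch of the round trip, with illustrative variable names:

```perl
use strict;
use warnings;
use charnames ();

my $name   = charnames::viacode(0x03A3);   # number to name
my $number = charnames::vianame($name);    # name back to number

printf "U+%04X is %s\n", $number, $name;
```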
=head2 ℞ 8: Unicode named characters
Use the C<< \N{I<charname>} >> notation to get the character
by that name for use in interpolated literals (double-quoted
strings and regexes). In v5.16, there is an implicit
use charnames qw(:full :short);
But prior to v5.16, you must be explicit about which set of charnames you
want. The C<:full> names are the official Unicode character name, alias, or
sequence, which all share a namespace.
use charnames qw(:full :short latin greek);
"\N{MATHEMATICAL ITALIC SMALL N}" # :full
"\N{GREEK CAPITAL LETTER SIGMA}" # :full
Anything else is a Perl-specific convenience abbreviation. Specify one or
more scripts by names if you want short names that are script-specific.
"\N{Greek:Sigma}" # :short
"\N{ae}" # latin
"\N{epsilon}" # greek
The v5.16 release also supports a C<:loose> import for loose matching of
character names, which works just like loose matching of property names:
that is, it disregards case, whitespace, and underscores:
"\N{euro sign}" # :loose (from v5.16)
=head2 ℞ 9: Unicode named sequences
These look just like character names but return multiple codepoints.
Notice the C<%vx> vector-print functionality in C<printf>.
use charnames qw(:full);
my $seq = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";
printf "U+%v04X\n", $seq;
U+0100.0300
=head2 ℞ 10: Custom named characters
Use C<:alias> to give your own lexically scoped nicknames to existing
characters, or even to give unnamed private-use characters useful names.
use charnames ":full", ":alias" => {
ecute => "LATIN SMALL LETTER E WITH ACUTE",
"APPLE LOGO" => 0xF8FF, # private use character
};
"\N{ecute}"
"\N{APPLE LOGO}"
=head2 ℞ 11: Names of CJK codepoints
Sinograms like “東京” come back with character names of
C<CJK UNIFIED IDEOGRAPH-6771> and C<CJK UNIFIED IDEOGRAPH-4EAC>,
because their “names” vary. The CPAN C<Unicode::Unihan> module
has a large database for decoding these (and a whole lot more), provided you
know how to understand its output.
# cpan -i Unicode::Unihan
use Unicode::Unihan;
my $str = "東京";
my $unhan = Unicode::Unihan->new;
for my $lang (qw(Mandarin Cantonese Korean JapaneseOn JapaneseKun)) {
printf "CJK $str in %-12s is ", $lang;
say $unhan->$lang($str);
}
prints:
CJK 東京 in Mandarin is DONG1JING1
CJK 東京 in Cantonese is dung1ging1
CJK 東京 in Korean is TONGKYENG
CJK 東京 in JapaneseOn is TOUKYOU KEI KIN
CJK 東京 in JapaneseKun is HIGASHI AZUMAMIYAKO
If you have a specific romanization scheme in mind,
use the specific module:
# cpan -i Lingua::JA::Romanize::Japanese
use Lingua::JA::Romanize::Japanese;
my $k2r = Lingua::JA::Romanize::Japanese->new;
my $str = "東京";
say "Japanese for $str is ", $k2r->chars($str);
prints
Japanese for 東京 is toukyou
=head2 ℞ 12: Explicit encode/decode
On rare occasion, such as a database read, you may be
given encoded text you need to decode.
use Encode qw(encode decode);
my $chars = decode("shiftjis", $bytes, 1);
# OR
my $bytes = encode("MIME-Header-ISO_2022_JP", $chars, 1);
For streams all in the same encoding, don't use encode/decode; instead
set the file encoding when you open the file or immediately after with
C<binmode> as described later below.
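The third argument of C<1> used above has the same value as the C<Encode::FB_CROAK> constant, which makes malformed data fatal instead of silently substituting replacement characters. The following sketch shows a round trip and a rejected input; the sample strings are invented, and C<Encode::LEAVE_SRC> is added here only so that the source variables are left unmodified.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

my $chars = "na\x{EF}ve \x{263A}";              # 7 characters
my $bytes = encode("UTF-8", $chars, $check);    # characters to bytes
my $back  = decode("UTF-8", $bytes, $check);    # bytes back to characters

printf "%d characters, %d bytes\n", length($chars), length($bytes);

# malformed input croaks rather than being quietly patched up
my $bad = "\xFF\xFE";
my $ok  = eval { decode("UTF-8", $bad, $check); 1 };
print "invalid UTF-8 rejected\n" unless $ok;
```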
=head2 ℞ 13: Decode program arguments as utf8
$ perl -CA ...
or
$ export PERL_UNICODE=A
or
use Encode qw(decode);
@ARGV = map { decode('UTF-8', $_, 1) } @ARGV;
=head2 ℞ 14: Decode program arguments as locale encoding
# cpan -i Encode::Locale
use Encode qw(decode);
use Encode::Locale;
# use "locale" as an arg to encode/decode
@ARGV = map { decode(locale => $_, 1) } @ARGV;
=head2 ℞ 15: Declare STD{IN,OUT,ERR} to be utf8
Use a command-line option, an environment variable, or else
call C<binmode> explicitly:
$ perl -CS ...
or
$ export PERL_UNICODE=S
or
use open qw(:std :encoding(UTF-8));
or
binmode(STDIN, ":encoding(UTF-8)");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");
=head2 ℞ 16: Declare STD{IN,OUT,ERR} to be in locale encoding
# cpan -i Encode::Locale
use Encode;
use Encode::Locale;
# or as a stream for binmode or open
binmode STDIN, ":encoding(console_in)" if -t STDIN;
binmode STDOUT, ":encoding(console_out)" if -t STDOUT;
binmode STDERR, ":encoding(console_out)" if -t STDERR;
=head2 ℞ 17: Make file I/O default to utf8
Files opened without an encoding argument will be in UTF-8:
$ perl -CD ...
or
$ export PERL_UNICODE=D
or
use open qw(:encoding(UTF-8));
=head2 ℞ 18: Make all I/O and args default to utf8
$ perl -CSDA ...
or
$ export PERL_UNICODE=SDA
or
use open qw(:std :encoding(UTF-8));
use Encode qw(decode);
@ARGV = map { decode('UTF-8', $_, 1) } @ARGV;
=head2 ℞ 19: Open file with specific encoding
Specify stream encoding. This is the normal way
to deal with encoded text, not by calling low-level
functions.
# input file
open(my $in_file, "< :encoding(UTF-16)", "wintext");
OR
open(my $in_file, "<", "wintext");
binmode($in_file, ":encoding(UTF-16)");
THEN
my $line = <$in_file>;
# output file
open(my $out_file, "> :encoding(cp1252)", "wintext");
OR
open(my $out_file, ">", "wintext");
binmode($out_file, ":encoding(cp1252)");
THEN
print $out_file "some text\n";
More layers than just the encoding can be specified here. For example,
the incantation C<":raw :encoding(UTF-16LE) :crlf"> includes implicit
CRLF handling.
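To make the effect of an encoding layer visible, the sketch below writes through C<:encoding(UTF-8)>, then reads the same file once as raw bytes and once as characters. It uses a scratch file from the core C<File::Temp> module; the handle and variable names are illustrative, and UTF-8 is chosen here purely for simplicity.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# hypothetical scratch file, removed automatically at exit
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# write one non-ASCII character plus a newline through the encoding layer
open(my $out, "> :raw :encoding(UTF-8)", $path) or die "open for write: $!";
print {$out} "\x{3A3}\n";
close($out) or die "close: $!";

# read the file back as raw bytes to see what the layer wrote
open(my $raw, "< :raw", $path) or die "open raw: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# read it again as characters through the same layer
open(my $in, "< :raw :encoding(UTF-8)", $path) or die "open for read: $!";
my $line = <$in>;
close $in;

printf "%d bytes on disk, %d characters read\n", length($bytes), length($line);
```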
=head2 ℞ 20: Unicode casing
Unicode casing is very different from ASCII casing.
uc("henry ⅷ") # "HENRY Ⅷ"
uc("tschüß") # "TSCHÜSS" notice ß => SS
# both are true:
"tschüß" =~ /TSCHÜSS/i # notice ß => SS
"Σίσυφος" =~ /ΣΊΣΥΦΟΣ/i # notice Σ,σ,ς sameness
=head2 ℞ 21: Unicode case-insensitive comparisons
Also available in the CPAN L<Unicode::CaseFold> module,
the new C<fc> “foldcase” function from v5.16 grants
access to the same Unicode casefolding as the C</i>
pattern modifier has always used:
use feature "fc"; # fc() function is from v5.16
# sort case-insensitively
my @sorted = sort { fc($a) cmp fc($b) } @list;
# both are true:
fc("tschüß") eq fc("TSCHÜSS")
fc("Σίσυφος") eq fc("ΣΊΣΥΦΟΣ")
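A self-contained sketch of the difference between a plain sort and a case-folded one, assuming perl v5.16 or later for C<fc>. The word list is invented for the example.

```perl
use v5.16;            # enables fc(), say and strict
use warnings;
use utf8;

my @list   = qw(cherry Banana apple);
my @plain  = sort @list;                          # uppercase sorts first
my @folded = sort { fc($a) cmp fc($b) } @list;    # case-insensitive order

my $upper = uc("tschüß");                         # sharp s uppercases to SS
my $equal = fc("tschüß") eq fc("TSCHÜSS") ? 1 : 0;

say "plain:  @plain";
say "folded: @folded";
say "fold-equal" if $equal;
```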
=head2 ℞ 22: Match Unicode linebreak sequence in regex
A Unicode linebreak matches the two-character CRLF
grapheme or any of seven vertical whitespace characters.
Good for dealing with textfiles coming from different
operating systems.
\R
s/\R/\n/g; # normalize all linebreaks to \n
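For instance, a string that mixes several line-ending conventions can be normalized and split in one pass. This is a minimal sketch with made-up sample text:

```perl
use strict;
use warnings;

# LF, CRLF, bare CR and U+2028 LINE SEPARATOR in one string
my $text = "unix\nwindows\r\nold mac\rseparator\x{2028}end";

(my $clean = $text) =~ s/\R/\n/g;    # CRLF is replaced as a single unit
my @lines = split /\n/, $clean;

print scalar(@lines), " lines\n";
```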
=head2 ℞ 23: Get character category
Find the general category of a numeric codepoint.
use Unicode::UCD qw(charinfo);
my $cat = charinfo(0x3A3)->{category}; # "Lu"
=head2 ℞ 24: Disabling Unicode-awareness in builtin charclasses
Disable C<\w>, C<\b>, C<\s>, C<\d>, and the POSIX
classes from working correctly on Unicode either in this
scope, or in just one regex.
use v5.14;
use re "/a";
# OR
my($num) = $str =~ /(\d+)/a;
Or use specific un-Unicode properties, like C<\p{ahex}>
and C<\p{POSIX_Digit}>. Properties still work normally
no matter what charset modifiers (C</d /u /l /a /aa>)
should be in effect.
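The difference is easy to observe with digits from a non-Latin script. In this sketch the sample string and variable names are illustrative:

```perl
use v5.14;            # the /a modifier needs v5.14
use warnings;
use utf8;

my $str = "४५६";      # DEVANAGARI DIGITS FOUR, FIVE, SIX

my $unicode_match = $str =~ /^\d+$/  ? 1 : 0;   # \d matches any decimal digit
my $ascii_match   = $str =~ /^\d+$/a ? 1 : 0;   # \d restricted to 0-9

say "default: $unicode_match, /a: $ascii_match";
```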
=head2 ℞ 25: Match Unicode properties in regex with \p, \P
These all match a single codepoint with the given
property. Use C<\P> in place of C<\p> to match
one codepoint lacking that property.
\pL, \pN, \pS, \pP, \pM, \pZ, \pC
\p{Sk}, \p{Ps}, \p{Lt}
\p{alpha}, \p{upper}, \p{lower}
\p{Latin}, \p{Greek}
\p{script_extensions=Latin}, \p{scx=Greek}
\p{East_Asian_Width=Wide}, \p{EA=W}
\p{Line_Break=Hyphen}, \p{LB=HY}
\p{Numeric_Value=4}, \p{NV=4}
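As a quick check of how a few of these properties classify text, the following sketch counts matching codepoints in a short sample string. The string and the counter names are invented for the example.

```perl
use strict;
use warnings;
use utf8;

my $text = "αβγ ABC Σ 42";

my $greek  = () = $text =~ /\p{Greek}/g;   # codepoints in the Greek script
my $upper  = () = $text =~ /\p{Lu}/g;      # uppercase letters of any script
my $digits = () = $text =~ /\pN/g;         # numbers
my $others = () = $text =~ /\P{L}/g;       # everything that is not a letter

print "Greek=$greek Lu=$upper N=$digits non-letters=$others\n";
```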
=head2 ℞ 26: Custom character properties
Define at compile-time your own custom character
properties for use in regexes.
# using private-use characters
sub In_Tengwar { "E000\tE07F\n" }
if (/\p{In_Tengwar}/) { ... }
# blending existing properties
sub Is_GraecoRoman_Title {<<'END_OF_SET'}
+utf8::IsLatin
+utf8::IsGreek
&utf8::IsTitle
END_OF_SET
if (/\p{Is_GraecoRoman_Title}/) { ... }
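A complete, runnable version of the private-use example might look like the sketch below. The test characters are arbitrary; the property sub returns a start and end code point in hexadecimal, separated by a tab.

```perl
use strict;
use warnings;

# private-use range U+E000..U+E07F; the name must begin with In or Is
sub In_Tengwar { "E000\tE07F\n" }

my $inside  = chr(0xE010) =~ /\p{In_Tengwar}/ ? 1 : 0;
my $outside = "a"         =~ /\p{In_Tengwar}/ ? 1 : 0;

print "inside: $inside, outside: $outside\n";
```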
=head2 ℞ 27: Unicode normalization
Typically render into NFD on input and NFC on output. Using NFKC or NFKD
functions improves recall on searches, assuming you've already done the
same to the text to be searched. Note that this is about much more than
just pre-combined compatibility glyphs; it also reorders marks according to
their canonical combining classes and weeds out singletons.
use Unicode::Normalize;
my $nfd = NFD($orig);
my $nfc = NFC($orig);
my $nfkd = NFKD($orig);
my $nfkc = NFKC($orig);
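The effect on search recall shows up with compatibility characters such as ligatures. In this sketch the sample word and variable names are illustrative:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFKC);

my $word = "\x{FB01}le";     # "file" spelled with U+FB01 LATIN SMALL LIGATURE FI

my $nfc  = NFC($word);       # canonical form keeps the ligature
my $nfkc = NFKC($word);      # compatibility form expands it to "fi"

my $found_nfc  = $nfc  =~ /file/ ? 1 : 0;
my $found_nfkc = $nfkc =~ /file/ ? 1 : 0;

print "NFC match: $found_nfc, NFKC match: $found_nfkc\n";
```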
=head2 ℞ 28: Convert non-ASCII Unicode numerics
Unless you’ve used C</a> or C</aa>, C<\d> matches more than
ASCII digits only, but Perl’s implicit string-to-number
conversion does not currently recognize these. Here’s how to
convert such strings manually.
use v5.14; # needed for num() function
use Unicode::UCD qw(num);
my $str = "got Ⅻ and ४५६७ and ⅞ and here";
my @nums = ();
while ($str =~ /(\d+|\N)/g) { # not just ASCII!
push @nums, num($1);
}
say "@nums"; # 12 4567 0.875
use charnames qw(:full);
my $nv = num("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}");
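Calling C<num> on individual strings makes its behaviour easier to see. The sketch below assumes perl v5.14 or later; the variable names are invented, and the last call illustrates that a run of digits drawn from two different scripts is rejected.

```perl
use v5.14;
use warnings;
use utf8;
use Unicode::UCD qw(num);

my $devanagari = num("४५६७");   # a run of decimal digits from one script
my $fraction   = num("⅞");      # a single character with a numeric value
my $mixed      = num("4५");     # ASCII digit followed by a Devanagari digit

say "$devanagari $fraction";
say "mixed-script digits rejected" unless defined $mixed;
```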
=head2 ℞ 29: Match Unicode grapheme cluster in regex
Programmer-visible “characters” are codepoints matched by C</./s>,
but user-visible “characters” are graphemes matched by C</\X/>.
# Find vowel *plus* any combining diacritics,underlining,etc.
my $nfd = NFD($orig);
$nfd =~ / (?=[aeiou]) \X /xi
=head2 ℞ 30: Extract by grapheme instead of by codepoint (regex)
# match and grab five first graphemes
my($first_five) = $str =~ /^ ( \X{5} ) /x;
=head2 ℞ 31: Extract by grapheme instead of by codepoint (substr)
# cpan -i Unicode::GCString
use Unicode::GCString;
my $gcs = Unicode::GCString->new($str);
my $first_five = $gcs->substr(0, 5);
=head2 ℞ 32: Reverse string by grapheme
Reversing by codepoint messes up diacritics, mistakenly converting
C<crème brûlée> into C<éel̂urb em̀erc> instead of into C<eélûrb emèrc>;
so reverse by grapheme instead. Both these approaches work
right no matter what normalization the string is in:
$str = join("", reverse $str =~ /\X/g);
# OR: cpan -i Unicode::GCString
use Unicode::GCString;
$str = reverse Unicode::GCString->new($str);
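The difference between the two kinds of reversal can be checked directly with core modules only. In this sketch the string is decomposed first so that the marks are separate codepoints; the variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFC NFD);

binmode(STDOUT, ":encoding(UTF-8)");

my $str = NFD("crème brûlée");      # marks become separate codepoints

# reversing codepoints moves each mark onto the wrong base letter
my $by_codepoint = NFC(scalar reverse $str);

# reversing graphemes keeps each base letter together with its marks
my $by_grapheme  = NFC(join "", reverse $str =~ /\X/g);

print "by codepoint: $by_codepoint\n";
print "by grapheme:  $by_grapheme\n";
```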
=head2 ℞ 33: String length in graphemes
The string C<brûlée> has six graphemes but up to eight codepoints.
This counts by grapheme, not by codepoint:
my $str = "brûlée";
my $count = 0;
while ($str =~ /\X/g) { $count++ }
# OR: cpan -i Unicode::GCString
use Unicode::GCString;
my $gcs = Unicode::GCString->new($str);
my $count = $gcs->length;
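The "six graphemes but up to eight codepoints" claim can be verified with core modules alone. This sketch counts both for the composed and decomposed forms; the variable names are made up.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFC NFD);

my $nfc = NFC("brûlée");
my $nfd = NFD("brûlée");

# assigning the match list to () in scalar context yields the match count
my $graphemes_nfc = () = $nfc =~ /\X/g;
my $graphemes_nfd = () = $nfd =~ /\X/g;

printf "NFC: %d codepoints, %d graphemes\n", length($nfc), $graphemes_nfc;
printf "NFD: %d codepoints, %d graphemes\n", length($nfd), $graphemes_nfd;
```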
=head2 ℞ 34: Unicode column-width for printing
Perl’s C<printf>, C<sprintf>, and C<format> think all
codepoints take up 1 print column, but many take 0 or 2.
Here to show that normalization makes no difference,
we print out both forms:
use Unicode::GCString;
use Unicode::Normalize;
my @words = qw/crème brûlée/;
@words = map { NFC($_), NFD($_) } @words;
for my $str (@words) {
my $gcs = Unicode::GCString->new($str);
my $cols = $gcs->columns;
my $pad = " " x (10 - $cols);
say $str, $pad, " |";
}
generates this to show that it pads correctly no matter
the normalization:
crème      |
crème      |
brûlée     |
brûlée     |
=head2 ℞ 35: Unicode collation
Text sorted by numeric codepoint follows no reasonable alphabetic order;
use the UCA for sorting text.
use Unicode::Collate;
my $col = Unicode::Collate->new();
my @list = $col->sort(@old_list);
See the I<ucsort> program from the L<Unicode::Tussle> CPAN module
for a convenient command-line interface to this module.
=head2 ℞ 36: Case- I<and> accent-insensitive Unicode sort
Specify a collation strength of level 1 to ignore case and
diacritics, only looking at the basic character.
use Unicode::Collate;
my $col = Unicode::Collate->new(level => 1);
my @list = $col->sort(@old_list);
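The sketch below contrasts codepoint order with UCA order, and the default collation strength with level 1. The word list is invented, and non-ASCII letters are written as escapes so that the example does not depend on how the source file is normalized.

```perl
use strict;
use warnings;
use Unicode::Collate;

binmode(STDOUT, ":encoding(UTF-8)");

my @words = ("zebra", "\x{E9}clair", "apple");    # e-acute + "clair"

my @by_codepoint = sort @words;                           # accented word lands last
my @by_uca       = Unicode::Collate->new->sort(@words);   # alphabetic order

my ($plain, $shouted) = ("Garc\x{ED}a", "GARCIA");
my $default_eq = Unicode::Collate->new->eq($plain, $shouted)             ? 1 : 0;
my $level1_eq  = Unicode::Collate->new(level => 1)->eq($plain, $shouted) ? 1 : 0;

print "codepoint: @by_codepoint\n";
print "UCA:       @by_uca\n";
print "default eq: $default_eq, level 1 eq: $level1_eq\n";
```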
=head2 ℞ 37: Unicode locale collation
Some locales have special sorting rules.
# either use v5.12, OR: cpan -i Unicode::Collate::Locale
use Unicode::Collate::Locale;
my $col = Unicode::Collate::Locale->new(locale => "de__phonebook");
my @list = $col->sort(@old_list);
The I<ucsort> program mentioned above accepts a C<--locale> parameter.
=head2 ℞ 38: Making C<cmp> work on text instead of codepoints
Instead of this:
@srecs = sort {
$b->{AGE} <=> $a->{AGE}
||
$a->{NAME} cmp $b->{NAME}
} @recs;
Use this:
my $coll = Unicode::Collate->new();
for my $rec (@recs) {
$rec->{NAME_key} = $coll->getSortKey( $rec->{NAME} );
}
@srecs = sort {
$b->{AGE} <=> $a->{AGE}
||
$a->{NAME_key} cmp $b->{NAME_key}
} @recs;
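With some sample data filled in, the sort-key approach can be run as is. The records below are invented for illustration, and non-ASCII letters are written as escapes to keep the example independent of source normalization.

```perl
use strict;
use warnings;
use Unicode::Collate;

binmode(STDOUT, ":encoding(UTF-8)");

my @recs = (
    { NAME => "Zo\x{EB}",    AGE => 30 },
    { NAME => "\x{C9}mile",  AGE => 30 },
    { NAME => "Adam",        AGE => 41 },
    { NAME => "eve",         AGE => 30 },
);

# compute each sort key once, rather than on every comparison
my $coll = Unicode::Collate->new();
$_->{NAME_key} = $coll->getSortKey($_->{NAME}) for @recs;

my @srecs = sort {
    $b->{AGE} <=> $a->{AGE}                 # oldest first
        ||
    $a->{NAME_key} cmp $b->{NAME_key}       # then alphabetically by UCA
} @recs;

my @names = map { $_->{NAME} } @srecs;
print "@names\n";
```

Sort keys are plain binary strings, so the ordinary C<cmp> operator compares them correctly.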
=head2 ℞ 39: Case- I<and> accent-insensitive comparisons
Use a collator object to compare Unicode text by character
instead of by codepoint.
use Unicode::Collate;
my $es = Unicode::Collate->new(
level => 1,
normalization => undef
);
# now both are true:
$es->eq("García", "GARCIA" );
$es->eq("Márquez", "MARQUEZ");
=head2 ℞ 40: Case- I<and> accent-insensitive locale comparisons
Same, but in a specific locale.
my $de = Unicode::Collate::Locale->new(
locale => "de__phonebook",
level => 1, # ignore case and accents, as in the previous recipe
);
# now this is true:
$de->eq("tschüß", "TSCHUESS"); # notice ü => UE, ß => SS
=head2 ℞ 41: Unicode linebreaking
Break up text into lines according to Unicode rules.
# cpan -i Unicode::LineBreak
use Unicode::LineBreak;
use charnames qw(:full);
my $para = "This is a super\N{HYPHEN}long string. " x 20;
my $fmt = Unicode::LineBreak->new;
print $fmt->break($para), "\n";
=head2 ℞ 42: Unicode text in DBM hashes, the tedious way
Using a regular Perl string as a key or value for a DBM
hash will trigger a wide character exception if any codepoints
won’t fit into a byte. Here’s how to manually manage the translation:
use DB_File;
use Encode qw(encode decode);
tie %dbhash, "DB_File", "pathname";
# STORE
# assume $uni_key and $uni_value are abstract Unicode strings
my $enc_key = encode("UTF-8", $uni_key, 1);
my $enc_value = encode("UTF-8", $uni_value, 1);
$dbhash{$enc_key} = $enc_value;
# FETCH
# assume $uni_key holds a normal Perl string (abstract Unicode)
my $enc_key = encode("UTF-8", $uni_key, 1);
my $enc_value = $dbhash{$enc_key};
my $uni_value = decode("UTF-8", $enc_value, 1);
=head2 ℞ 43: Unicode text in DBM hashes, the easy way
Here’s how to implicitly manage the translation; all encoding
and decoding is done automatically, just as with streams that
have a particular encoding attached to them:
use DB_File;
use DBM_Filter;
my $dbobj = tie %dbhash, "DB_File", "pathname";
$dbobj->Filter_Value("utf8"); # this is the magic bit
# STORE
# assume $uni_key and $uni_value are abstract Unicode strings
$dbhash{$uni_key} = $uni_value;
# FETCH
# $uni_key holds a normal Perl string (abstract Unicode)
my $uni_value = $dbhash{$uni_key};
=head2 ℞ 44: PROGRAM: Demo of Unicode collation and printing
Here’s a full program showing how to make use of locale-sensitive
sorting, Unicode casing, and managing print widths when some of the
characters take up zero or two columns, not just one column each time.
When run, the following program produces this nicely aligned output:
Crème Brûlée....... €2.00
Éclair............. €1.60
Fideuà............. €4.20
Hamburger.......... €6.00
Jamón Serrano...... €4.45
Linguiça........... €7.00
Pâté............... €4.15
Pears.............. €2.00
Pêches............. €2.25
Smørbrød........... €5.75
Spätzle............ €5.50
Xoriço............. €3.00
Γύρος.............. €6.50
막걸리............. €4.00
おもち............. €2.65
お好み焼き......... €8.00
シュークリーム..... €1.85
寿司............... €9.99
包子............... €7.50
Here's that program; tested on v5.14.
#!/usr/bin/env perl
# umenu - demo sorting and printing of Unicode food
#
# (obligatory and increasingly long preamble)
#
use utf8;
use v5.14; # for locale sorting
use strict;
use warnings;
use warnings qw(FATAL utf8); # fatalize encoding faults
use open qw(:std :encoding(UTF-8)); # undeclared streams in UTF-8
use charnames qw(:full :short); # unneeded in v5.16
# std modules
use Unicode::Normalize; # std perl distro as of v5.8
use List::Util qw(max); # std perl distro as of v5.10
use Unicode::Collate::Locale; # std perl distro as of v5.14
# cpan modules
use Unicode::GCString; # from CPAN
# forward defs
sub pad($$$);
sub colwidth(_);
sub entitle(_);
my %price = (
"γύρος" => 6.50, # gyros
"pears" => 2.00, # like um, pears
"linguiça" => 7.00, # spicy sausage, Portuguese
"xoriço" => 3.00, # chorizo sausage, Catalan
"hamburger" => 6.00, # burgermeister meisterburger
"éclair" => 1.60, # dessert, French
"smørbrød" => 5.75, # sandwiches, Norwegian
"spätzle" => 5.50, # Bayerisch noodles, little sparrows
"包子" => 7.50, # bao1 zi5, steamed pork buns, Mandarin
"jamón serrano" => 4.45, # country ham, Spanish
"pêches" => 2.25, # peaches, French
"シュークリーム" => 1.85, # cream-filled pastry like eclair
"막걸리" => 4.00, # makgeolli, Korean rice wine
"寿司" => 9.99, # sushi, Japanese
"おもち" => 2.65, # omochi, rice cakes, Japanese
"crème brûlée" => 2.00, # crema catalana
"fideuà" => 4.20, # more noodles, Valencian
# (Catalan=fideuada)
"pâté" => 4.15, # gooseliver paste, French
"お好み焼き" => 8.00, # okonomiyaki, Japanese
);
my $width = 5 + max map { colwidth } keys %price;
# So the Asian stuff comes out in an order that someone
# who reads those scripts won't freak out over; the
# CJK stuff will be in JIS X 0208 order that way.
my $coll = Unicode::Collate::Locale->new(locale => "ja");
for my $item ($coll->sort(keys %price)) {
print pad(entitle($item), $width, ".");
printf " €%.2f\n", $price{$item};
}
sub pad($$$) {
my($str, $width, $padchar) = @_;
return $str . ($padchar x ($width - colwidth($str)));
}
sub colwidth(_) {
my($str) = @_;
return Unicode::GCString->new($str)->columns;
}
sub entitle(_) {
my($str) = @_;
$str =~ s{ (?=\pL)(\S) (\S*) }
{ ucfirst($1) . lc($2) }xge;
return $str;
}
=head1 SEE ALSO
See these manpages, some of which are CPAN modules:
L<perlunicode>, L<perluniprops>,
L<perlre>, L<perlrecharclass>,
L<perluniintro>, L<perlunitut>, L<perlunifaq>,
L<PerlIO>, L<DB_File>, L<DBM_Filter>, L<DBM_Filter::utf8>,
L<Encode>, L<Encode::Locale>,
L<Unicode::UCD>,
L<Unicode::Normalize>,
L<Unicode::GCString>, L<Unicode::LineBreak>,
L<Unicode::Collate>, L<Unicode::Collate::Locale>,
L<Unicode::Unihan>,
L<Unicode::CaseFold>,
L<Unicode::Tussle>,
L<Lingua::JA::Romanize::Japanese>,
L<Lingua::ZH::Romanize::Pinyin>,
L<Lingua::KO::Romanize::Hangul>.
The L<Unicode::Tussle> CPAN module includes many programs
to help with working with Unicode, including
these programs to fully or partly replace standard utilities:
I<tcgrep> instead of I<egrep>,
I<uniquote> instead of I<cat -v> or I<hexdump>,
I<uniwc> instead of I<wc>,
I<unilook> instead of I<look>,
I<unifmt> instead of I<fmt>,
and
I<ucsort> instead of I<sort>.
For exploring Unicode character names and character properties,
see its I<uniprops>, I<unichars>, and I<uninames> programs.
It also supplies these programs, all of which are general filters that do Unicode-y things:
I<unititle> and I<unicaps>;
I<uniwide> and I<uninarrow>;
I<unisupers> and I<unisubs>;
I<nfd>, I<nfc>, I<nfkd>, and I<nfkc>;
and I<uc>, I<lc>, and I<tc>.
Finally, see the published Unicode Standard (page numbers are from version
6.0.0), including these specific annexes and technical reports:
=over
=item §3.13 Default Case Algorithms, page 113;
§4.2 Case, pages 120–122;
Case Mappings, pages 166–172, especially Caseless Matching starting on page 170.
=item UAX #44: Unicode Character Database
=item UTS #18: Unicode Regular Expressions
=item UAX #15: Unicode Normalization Forms
=item UTS #10: Unicode Collation Algorithm
=item UAX #29: Unicode Text Segmentation
=item UAX #14: Unicode Line Breaking Algorithm
=item UAX #11: East Asian Width
=back
=head1 AUTHOR
Tom Christiansen E<lt>tchrist@perl.comE<gt> wrote this, with occasional
kibitzing from Larry Wall and Jeffrey Friedl in the background.
=head1 COPYRIGHT AND LICENCE
Copyright © 2012 Tom Christiansen.
This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.
Most of these examples are taken from the current edition of the “Camel Book”;
that is, from the 4ᵗʰ Edition of I<Programming Perl>, Copyright © 2012 Tom
Christiansen I<et al.>, 2012-02-13 by O’Reilly Media. The code itself is
freely redistributable, and you are encouraged to transplant, fold,
spindle, and mutilate any of the examples in this manpage however you please
for inclusion into your own programs without any encumbrance whatsoever.
Acknowledgement via code comment is polite but not required.
=head1 REVISION HISTORY
v1.0.0 – first public release, 2012-02-27
=encoding utf8
=head1 NAME
perl5200delta - what is new for perl v5.20.0
=head1 DESCRIPTION
This document describes differences between the 5.18.0 release and the
5.20.0 release.
If you are upgrading from an earlier release such as 5.16.0, first read
L<perl5180delta>, which describes differences between 5.16.0 and 5.18.0.
=head1 Core Enhancements
=head2 Experimental Subroutine signatures
This is a declarative syntax for unpacking a subroutine's argument list into lexical variables.
C<sub foo ($a,$b) {...}> checks the number of arguments and puts the
arguments into lexical variables. Signatures are not equivalent to
the existing idiom of C<sub foo { my($a,$b) = @_; ... }>. Signatures
are only available by enabling a non-default feature, and generate
warnings about being experimental. The syntactic clash with
prototypes is managed by disabling the short prototype syntax when
signatures are enabled.
See L<perlsub/Signatures> for details.
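As a minimal sketch of the behaviour described above, the following enables the feature, silences the experimental warning, and shows that a call with the wrong number of arguments dies at run time. The C<rect_area> subroutine is a hypothetical name chosen for illustration.

```perl
use strict;
use warnings;
use feature 'signatures';                # non-default feature
no warnings 'experimental::signatures';  # silence the "experimental" warning

# Hypothetical helper: the signature checks the argument count and
# copies the arguments into the lexicals $width and $height.
sub rect_area ($width, $height) {
    return $width * $height;
}

my $area = rect_area(3, 4);

# A call with too few arguments throws an exception at run time.
my $ok    = eval { rect_area(3); 1 };
my $error = $ok ? '' : $@;

print "area = $area\n";
print "short call died: ", ($ok ? "no" : "yes"), "\n";
```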
=head2 C<sub>s now take a C<prototype> attribute
When declaring or defining a C<sub>, the prototype can now be specified inside
of a C<prototype> attribute instead of in parens following the name.
For example, C<sub foo($$){}> could be rewritten as
C<sub foo : prototype($$){}>.
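The two spellings can be compared side by side. This sketch uses hypothetical subroutine names (C<add_old>, C<add_new>) and checks with the C<prototype> builtin that both forms record the same prototype.

```perl
use strict;
use warnings;

# Traditional form: prototype in parens after the name.
sub add_old($$) { return $_[0] + $_[1] }

# Attribute form: the same prototype given as a C<prototype> attribute.
sub add_new : prototype($$) { return $_[0] + $_[1] }

my $old_proto = prototype(\&add_old);
my $new_proto = prototype(\&add_new);
my $sum       = add_new(2, 3);

print "old prototype: $old_proto\n";
print "new prototype: $new_proto\n";
print "sum: $sum\n";
```

The attribute form is the one that keeps working when signatures are enabled, since the short syntax is then taken to be a signature.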
=head2 More consistent prototype parsing
Multiple semicolons in subroutine prototypes have long been tolerated and
treated as a single semicolon. There was one case where this did not
happen. A subroutine whose prototype begins with "*" or ";*" can affect
whether a bareword is considered a method name or sub call. This now
applies also to ";;;*".
Whitespace has long been allowed inside subroutine prototypes, so
C<sub( $ $ )> is equivalent to C<sub($$)>, but until now it was stripped
when the subroutine was parsed. Hence, whitespace was I<not> allowed in
prototypes set by C<Scalar::Util::set_prototype>. Now it is permitted,
and the parser no longer strips whitespace. This means
C<prototype &mysub> returns the original prototype, whitespace and all.
=head2 C<rand> now uses a consistent random number generator
Previously perl would use a platform-specific random number generator, varying
between the libc rand(), random() or drand48().
This meant that the quality of perl's random numbers would vary from platform
to platform, from the 15 bits of rand() on Windows to 48 bits on POSIX
platforms such as Linux with drand48().
Perl now uses its own internal drand48() implementation on all platforms. This
does not make perl's C<rand> cryptographically secure. [perl #115928]
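One practical consequence is that seeding with C<srand> gives a repeatable sequence. The sketch below only demonstrates repeatability within one process; that the same seed yields the same numbers on every platform is an inference from the paragraph above, not something this code verifies. The seed value 42 is arbitrary.

```perl
use strict;
use warnings;

# Seed, draw three numbers, then reseed and draw again.
srand(42);
my @first = map { rand() } 1 .. 3;

srand(42);
my @second = map { rand() } 1 .. 3;

my $same = "@first" eq "@second" ? 1 : 0;

# A die roll in 1..6; fine for games, not for secrets,
# because rand is not cryptographically secure.
my $roll = 1 + int(rand(6));

print "sequences identical: ", ($same ? "yes" : "no"), "\n";
print "roll: $roll\n";
```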
=head2 New slice syntax
The new C<%hash{...}> and C<%array[...]> syntax returns a list of key/value (or
index/value) pairs. See L<perldata/"Key/Value Hash Slices">.
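A short sketch of both forms, using made-up data: the hash form is convenient for copying a subset of a hash, and the array form returns indices alongside values.

```perl
use strict;
use warnings;

my %color   = (red => 0xf00, green => 0x0f0, blue => 0x00f);
my @letters = qw(a b c d);

# Key/value hash slice: returns (key, value, key, value, ...).
my %subset = %color{'red', 'blue'};

# Index/value array slice: returns (index, value, index, value, ...).
my @pairs = %letters[0, 2];

print join(",", sort keys %subset), "\n";
print "@pairs\n";
```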
=head2 Experimental Postfix Dereferencing
When the C<postderef> feature is in effect, the following syntactical
equivalencies are set up:
$sref->$*; # same as ${ $sref } # interpolates
$aref->@*; # same as @{ $aref } # interpolates
$href->%*; # same as %{ $href }
$cref->&*; # same as &{ $cref }
$gref->**; # same as *{ $gref }
$aref->$#*; # same as $#{ $aref }
$gref->*{ $slot }; # same as *{ $gref }{ $slot }
$aref->@[ ... ]; # same as @$aref[ ... ] # interpolates
$href->@{ ... }; # same as @$href{ ... } # interpolates
$aref->%[ ... ]; # same as %$aref[ ... ]
$href->%{ ... }; # same as %$href{ ... }
Those marked as interpolating only interpolate if the associated
C<postderef_qq> feature is also enabled. This feature is B<experimental> and
will trigger C<experimental::postderef>-category warnings when used, unless
they are suppressed.
For more information, consult L<the Postfix Dereference Syntax section of
perlref|perlref/Postfix Dereference Syntax>.
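The following sketch exercises a few of the forms from the table above on made-up data. It enables both C<postderef> and C<postderef_qq> and suppresses the experimental warning category named above.

```perl
use strict;
use warnings;
use feature qw(postderef postderef_qq);
no warnings 'experimental::postderef';

my $aref = [10, 20, 30];
my $href = { a => 1, b => 2 };

my @all    = $aref->@*;            # same as @{ $aref }
my $last   = $aref->$#*;           # same as $#{ $aref }
my @slice  = $aref->@[0, 1];       # same as @$aref[0, 1]
my @values = $href->@{qw(a b)};    # same as @$href{qw(a b)}
my %kv     = $href->%{'a'};        # same as %$href{'a'}

# Interpolates only because postderef_qq is enabled.
my $text = "items: $aref->@*";

print "$text\n";
print "last index: $last\n";
```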
=head2 Unicode 6.3 now supported
Perl now supports and is shipped with Unicode 6.3 (though Perl may be
recompiled with any previous Unicode release as well). A detailed list of
Unicode 6.3 changes is at L<http://www.unicode.org/versions/Unicode6.3.0/>.
=head2 New C<\p{Unicode}> regular expression pattern property
This is a synonym for C<\p{Any}> and matches the set of Unicode-defined
code points 0 - 0x10FFFF.
=head2 Better 64-bit support
On 64-bit platforms, the internal array functions now use 64-bit offsets,
allowing Perl arrays to hold more than 2**31 elements, if you have the memory
available.
The regular expression engine now supports strings longer than 2**31
characters. [perl #112790, #116907]
The functions PerlIO_get_bufsiz, PerlIO_get_cnt, PerlIO_set_cnt and
PerlIO_set_ptrcnt now have SSize_t, rather than int, return values and
parameters.
=head2 C<S<use locale>> now works on UTF-8 locales
Until this release, only single-byte locales, such as the ISO 8859
series were supported. Now, the increasingly common multi-byte UTF-8
locales are also supported. A UTF-8 locale is one in which the
character set is Unicode and the encoding is UTF-8. The POSIX
C<LC_CTYPE> category operations (case changing (like C<lc()>, C<"\U">),
and character classification (C<\w>, C<\D>, C<qr/[[:punct:]]/>)) under
such a locale work just as if not under locale, but instead as if under
C<S<use feature 'unicode_strings'>>, except taint rules are followed.
Sorting remains by code point order in this release. [perl #56820].
=head2 C<S<use locale>> now compiles on systems without locale ability
Previously doing this caused the program to not compile. Within its
scope the program behaves as if in the "C" locale. Thus programs
written for platforms that support locales can run on locale-less
platforms without change. Attempts to change the locale away from the
"C" locale will, of course, fail.
=head2 More locale initialization fallback options
Previously, if there was an error with locales during Perl start-up, Perl
immediately gave up and tried to use the C<"C"> locale. Now it first tries using
other locales given by the environment variables, as detailed in
L<perllocale/ENVIRONMENT>. For example, if C<LC_ALL> and C<LANG> are
both set, and using the C<LC_ALL> locale fails, Perl will now try the
C<LANG> locale, and only if that fails, will it fall back to C<"C">. On
Windows machines, Perl will try, ahead of using C<"C">, the system
default locale if all the locales given by environment variables fail.
=head2 C<-DL> runtime option now added for tracing locale setting
This is designed for Perl core developers to aid in field debugging bugs
regarding locales.
=head2 B<-F> now implies B<-a> and B<-a> implies B<-n>
Previously B<-F> without B<-a> was a no-op, and B<-a> without B<-n> or B<-p>
was a no-op. With this change, if you supply B<-F> then both B<-a> and B<-n>
are implied, and if you supply B<-a> then B<-n> is implied.
You can still use B<-p> for its extra behaviour. [perl #116190]
=head2 $a and $b warnings exemption
The special variables $a and $b, used in C<sort>, are now exempt from "used
once" warnings, even where C<sort> is not used. This makes it easier for
CPAN modules to provide functions using $a and $b for similar purposes.
[perl #120462]
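As an illustration of the kind of function this helps, here is a sketch of a hypothetical C<pairwise_map> helper that sets C<$a> and C<$b> for a caller-supplied block, much as C<sort> does. It assumes the helper and its caller live in the same package, since C<$a> and C<$b> are package variables.

```perl
use strict;
use warnings;

# Hypothetical helper: calls the block once per adjacent pair of items,
# with $a and $b set to the two members of the pair.
sub pairwise_map(&@) {
    my ($code, @items) = @_;
    my @out;
    for my $i (0 .. $#items - 1) {
        local ($a, $b) = @items[$i, $i + 1];
        push @out, $code->();
    }
    return @out;
}

# Differences between neighbouring squares.
my @diffs = pairwise_map { $b - $a } 1, 4, 9, 16;

print "@diffs\n";
```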
=head1 Security
=head2 Avoid possible read of free()d memory during parsing
It was possible that free()d memory could be read during parsing in the unusual
circumstance of the Perl program ending with a heredoc and the last line of the
file on disk having no terminating newline character. This has now been fixed.
=head1 Incompatible Changes
=head2 C<do> can no longer be used to call subroutines
The C<do SUBROUTINE(LIST)> form has resulted in a deprecation warning
since Perl v5.0.0, and is now a syntax error.
=head2 Quote-like escape changes
The character after C<\c> in a double-quoted string ("..." or qq(...))
or regular expression must now be a printable character and may not be
C<{>.
A literal C<{> after C<\B> or C<\b> is now fatal.
These were deprecated in perl v5.14.0.
=head2 Tainting happens under more circumstances; now conforms to documentation
This affects regular expression matching and changing the case of a
string (C<lc>, C<"\U">, I<etc>.) within the scope of C<use locale>.
The result is now tainted based on the operation, no matter what the
contents of the string were, as the documentation (L<perlsec>,
L<perllocale/SECURITY>) indicates it should. Previously, for the case
change operation, if the string contained no characters whose case
change could be affected by the locale, the result would not be tainted.
For example, the result of C<uc()> on an empty string or one containing
only above-Latin1 code points is now tainted, and wasn't before. This
leads to more consistent tainting results. Regular expression patterns
taint their non-binary results (like C<$&>, C<$2>) if and only if the
pattern contains elements whose matching depends on the current
(potentially tainted) locale. Like the case changing functions, the
actual contents of the string being matched now do not matter, whereas
formerly they did. For example, if the pattern contains a C<\w>, the
results will be tainted even if the match did not have to use that
portion of the pattern to succeed or fail, because what a C<\w> matches
depends on locale. However, for example, a C<.> in a pattern will not
enable tainting, because the dot matches any single character, and what
the current locale is doesn't change in any way what matches and what
doesn't.
=head2 C<\p{}>, C<\P{}> matching has changed for non-Unicode code points.
C<\p{}> and C<\P{}> are defined by Unicode only on Unicode-defined code
points (C<U+0000> through C<U+10FFFF>). Their behavior on matching
these legal Unicode code points is unchanged, but there are changes for
code points C<0x110000> and above. Previously, Perl treated the result
of matching C<\p{}> and C<\P{}> against these as C<undef>, which
translates into "false". For C<\P{}>, this was then complemented into
"true". A warning was supposed to be raised when this happened.
However, various optimizations could prevent the warning, and the
results were often counter-intuitive, with both a match and its seeming
complement being false. Now all non-Unicode code points are treated as
typical unassigned Unicode code points. This generally is more
Do-What-I-Mean. A warning is raised only if the results are arguably
different from a strict Unicode approach, and from what Perl used to do.
Code that needs to be strictly Unicode compliant can make this warning
fatal, and then Perl always raises the warning.
Details are in L<perlunicode/Beyond Unicode code points>.
=head2 C<\p{All}> has been expanded to match all possible code points
The Perl-defined regular expression pattern element C<\p{All}>, unused
on CPAN, used to match just the Unicode code points; now it matches all
possible code points; that is, it is equivalent to C<qr/./s>. Thus
C<\p{All}> is no longer synonymous with C<\p{Any}>, which continues to
match just the Unicode code points, as Unicode says it should.
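The difference shows up only for code points above C<U+10FFFF>. This sketch compares the two properties at the boundary. It turns off the C<non_unicode> warning category as a precaution, on the assumption that matching an above-Unicode code point against a Unicode property may warn.

```perl
use strict;
use warnings;
no warnings 'non_unicode';   # precaution; see the note above

my $inside = chr(0x10FFFF);  # last Unicode code point
my $beyond = chr(0x110000);  # first code point above Unicode

my %match = (
    inside_any => ($inside =~ /\p{Any}/ ? 1 : 0),
    inside_all => ($inside =~ /\p{All}/ ? 1 : 0),
    beyond_any => ($beyond =~ /\p{Any}/ ? 1 : 0),
    beyond_all => ($beyond =~ /\p{All}/ ? 1 : 0),
);

printf "%-10s %d\n", $_, $match{$_} for sort keys %match;
```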
=head2 Data::Dumper's output may change
Depending on the data structures dumped and the settings set for
Data::Dumper, the dumped output may have changed from previous
versions.
If you have tests that depend on the exact output of Data::Dumper,
they may fail.
To avoid this problem in your code, test against the data structure
from evaluating the dumped structure, instead of the dump itself.
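One way to follow this advice is sketched below, with made-up data: evaluate the dump and compare the resulting structure with C<is_deeply> from L<Test::More>, rather than comparing dump strings. Setting C<$Data::Dumper::Terse> is a convenience assumed here so that the dump is a bare expression that can be evaluated under C<strict>.

```perl
use strict;
use warnings;
use Data::Dumper;
use Test::More tests => 1;

my $expected = { name => 'perl', versions => [ '5.18', '5.20' ] };

# Terse output is a bare expression rather than "$VAR1 = ...;".
local $Data::Dumper::Terse = 1;
my $dump = Dumper($expected);

# Evaluate the dump and compare structures, not text.
my $roundtrip = eval $dump;
defined $roundtrip or die "could not evaluate dump: $@";

is_deeply($roundtrip, $expected, 'dumped structure round-trips');
```

A test written this way is unaffected by changes in whitespace, quoting or key order in the dumped text.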
=head2 Locale decimal point character no longer leaks outside of S<C<use locale>> scope
This is actually a bug fix, but some code has come to rely on the bug
being present, so this change is listed here. The current locale that
the program is running under is not supposed to be visible to Perl code
except within the scope of a S<C<use locale>>. However, until now under
certain circumstances, the character used for a decimal point (often a
comma) leaked outside the scope. If your code is affected by this
change, simply add a S<C<use locale>>.
=head2 Assignments of Windows sockets error codes to $! now prefer F<errno.h> values over WSAGetLastError() values
In previous versions of Perl, Windows sockets error codes as returned by
WSAGetLastError() were assigned to $!, and some constants such as ECONNABORTED,
not in F<errno.h> in VC++ (or the various Windows ports of gcc) were defined to
corresponding WSAE* values to allow $! to be tested against the E* constants
exported by L<Errno> and L<POSIX>.
This worked well until VC++ 2010 and later, which introduced new E* constants
with values E<gt> 100 into F<errno.h>, including some being (re)defined by perl
to WSAE* values. That caused problems when linking XS code against other
libraries which used the original definitions of F<errno.h> constants.
To avoid this incompatibility, perl now maps WSAE* error codes to E* values
where possible, and assigns those values to $!. The E* constants exported by
L<Errno> and L<POSIX> are updated to match so that testing $! against them,
wherever previously possible, will continue to work as expected, and all E*
constants found in F<errno.h> are now exported from those modules with their
original F<errno.h> values.
In order to avoid breakage in existing Perl code which assigns WSAE* values to
$!, perl now intercepts the assignment and performs the same mapping to E*
values as it uses internally when assigning to $! itself.
However, one backwards-incompatibility remains: existing Perl code which
compares $! against the numeric values of the WSAE* error codes that were
previously assigned to $! will now be broken in those cases where a
corresponding E* value has been assigned instead. This is only an issue for
those E* values E<lt> 100, which were always exported from L<Errno> and
L<POSIX> with their original F<errno.h> values, and therefore could not be used
for WSAE* error code tests (e.g. WSAEINVAL is 10022, but the corresponding
EINVAL is 22). (E* values E<gt> 100, if present, were redefined to WSAE*
values anyway, so compatibility can be achieved by using the E* constants,
which will work both before and after this change, albeit using different
numeric values under the hood.)
=head2 Functions C<PerlIO_vsprintf> and C<PerlIO_sprintf> have been removed
These two functions, undocumented, unused in CPAN, and problematic, have been
removed.
=head1 Deprecations
=head2 The C</\C/> character class
The C</\C/> regular expression character class is deprecated. From perl
5.22 onwards it will generate a warning, and from perl 5.24 onwards it
will be a regular expression compiler error. If you need to examine the
individual bytes that make up a UTF8-encoded character, then use
C<utf8::encode()> on the string (or a copy) first.
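A minimal sketch of that replacement, using U+263A as an arbitrary example character: copy the string, encode the copy in place, then look at its octets.

```perl
use strict;
use warnings;

my $char  = "\x{263A}";   # one character (WHITE SMILING FACE)
my $bytes = $char;        # work on a copy; utf8::encode modifies in place
utf8::encode($bytes);

# Each element of the encoded string is now one octet.
my @octets = map { sprintf '%02X', ord($_) } split //, $bytes;

print "characters: ", length($char),  "\n";
print "octets:     ", length($bytes), "\n";
print "hex:        @octets\n";
```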
=head2 Literal control characters in variable names
This deprecation affects things like $\cT, where \cT is a literal control (such
as a C<NAK> or C<NEGATIVE ACKNOWLEDGE> character) in
the source code. Surprisingly, it appears that originally this was intended as
the canonical way of accessing variables like $^T, with the caret form only
being added as an alternative.
The literal control form is being deprecated for two main reasons. It has what
are likely unfixable bugs, such as $\cI not working as an alias for $^I, and
its usage is not portable to non-ASCII platforms: while $^T will work
everywhere, \cT is whitespace in EBCDIC. [perl #119123]
=head2 References to non-integers and non-positive integers in C<$/>
Setting C<$/> to a reference to zero or a reference to a negative integer is
now deprecated, and will behave B<exactly> as though it was set to C<undef>.
If you want slurp behavior, set C<$/> to C<undef> explicitly.
Setting C<$/> to a reference to a non-integer is now forbidden and will
throw an error. Perl has never documented what would happen in this
context, and while it used to behave the same as setting C<$/> to
the address of the reference, in future it may behave differently, so we
have forbidden this usage.
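For contrast, the two supported settings look like this. The sketch reads from an in-memory filehandle with made-up data: a reference to a positive integer gives fixed-size records, and C<undef> slurps the whole input.

```perl
use strict;
use warnings;

my $data = "abcdefghij";

# Fixed-size records: a reference to a positive integer.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @records = do { local $/ = \4; <$fh> };
close $fh;

# Slurp mode: set $/ to undef explicitly.
open $fh, '<', \$data or die "cannot open in-memory file: $!";
my $whole = do { local $/ = undef; <$fh> };
close $fh;

print join("|", @records), "\n";
print "$whole\n";
```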
=head2 Character matching routines in POSIX
Use of any of these functions in the C<POSIX> module is now deprecated:
C<isalnum>, C<isalpha>, C<iscntrl>, C<isdigit>, C<isgraph>, C<islower>,
C<isprint>, C<ispunct>, C<isspace>, C<isupper>, and C<isxdigit>. The
functions are buggy and don't work on UTF-8 encoded strings. See their
entries in L<POSIX> for more information.
A warning is raised on the first call to any of them from each place in
the code that they are called. (Hence a repeated statement in a loop
will raise just the one warning.)
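A regular expression with a POSIX bracket class is a common substitute. The helpers below are hypothetical names, sketched on the assumption that the intent is "every character of a non-empty string is in the class"; check the L<POSIX> entries for the exact behaviour being replaced.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for POSIX::isalpha and POSIX::isdigit.
sub is_alpha {
    my ($str) = @_;
    return (defined($str) && $str =~ /\A[[:alpha:]]+\z/) ? 1 : 0;
}

sub is_digit {
    my ($str) = @_;
    return (defined($str) && $str =~ /\A[[:digit:]]+\z/) ? 1 : 0;
}

my @checks = (
    is_alpha("Perl"),               # letters only
    is_alpha("Perl5"),              # contains a digit
    is_alpha("\x{3b1}\x{3b2}"),     # Greek letters
    is_digit("2014"),               # digits only
);

print "@checks\n";
```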
=head2 Interpreter-based threads are now I<discouraged>
The "interpreter-based threads" provided by Perl are not the fast, lightweight
system for multitasking that one might expect or hope for. Threads are
implemented in a way that makes them easy to misuse. Few people know how to
use them correctly or will be able to provide help.
The use of interpreter-based threads in perl is officially
L<discouraged|perlpolicy/discouraged>.
=head2 Module removals
The following modules will be removed from the core distribution in a
future release, and will at that time need to be installed from CPAN.
Distributions on CPAN which require these modules will need to list them as
prerequisites.
The core versions of these modules will now issue C<"deprecated">-category
warnings to alert you to this fact. To silence these deprecation warnings,
install the modules in question from CPAN.
Note that the planned removal of these modules from core does not reflect a
judgement about the quality of the code and should not be taken as a suggestion
that their use be halted. Their removal from core hinges primarily on
whether they are needed to bootstrap a fully functional, CPAN-capable Perl
installation, not on concerns over their design.
=over
=item L<CGI> and its associated CGI:: packages
=item L<inc::latest>
=item L<Package::Constants>
=item L<Module::Build> and its associated Module::Build:: packages
=back
=head2 Utility removals
The following utilities will be removed from the core distribution in a
future release, and will at that time need to be installed from CPAN.
=over 4
=item L<find2perl>
=item L<s2p>
=item L<a2p>
=back
=head1 Performance Enhancements
=over 4
=item *
Perl has a new copy-on-write mechanism that avoids the need to copy the
internal string buffer when assigning from one scalar to another. This
makes copying large strings appear much faster. Modifying one of the two
(or more) strings after an assignment will force a copy internally. This
makes it unnecessary to pass strings by reference for efficiency.
This feature was already available in 5.18.0, but wasn't enabled by
default. It is the default now, and so you no longer need to build perl with
the F<Configure> argument:
-Accflags=-DPERL_NEW_COPY_ON_WRITE
It can be disabled (for now) in a perl build with:
-Accflags=-DPERL_NO_COW
On some operating systems Perl can be compiled in such a way that any
attempt to modify string buffers shared by multiple SVs will crash. This
way XS authors can test that their modules handle copy-on-write scalars
correctly. See L<perlguts/"Copy on Write"> for details.
=item *
Perl has an optimizer for regular expression patterns. It analyzes the pattern
to find things such as the minimum length a string has to be to match, etc. It
now better handles code points that are above the Latin1 range.
=item *
Executing a regex that contains the C<^> anchor (or its variant under the
C</m> flag) has been made much faster in several situations.
=item *
Precomputed hash values are now used in more places during method lookup.
=item *
Constant hash key lookups (C<$hash{key}> as opposed to C<$hash{$key}>) have
long had the internal hash value computed at compile time, to speed up
lookup. This optimisation has now been applied to hash slices as
well.
=item *
Combined C<and> and C<or> operators in void context, like those
generated for C<< unless ($a && $b) >> and C<< if ($a || $b) >> now
short circuit directly to the end of the statement. [perl #120128]
=item *
In certain situations, when C<return> is the last statement in a subroutine's
main scope, it will be optimized out. This means code like:
sub baz { return $cat; }
will now behave like:
sub baz { $cat; }
which is notably faster.
[perl #120765]
=item *
Code like:
my $x; # or @x, %x
my $y;
is now optimized to:
my ($x, $y);
In combination with the L<padrange optimization introduced in
v5.18.0|perl5180delta/Internal Changes>, this means longer uninitialized my
variable statements are also optimized, so:
my $x; my @y; my %z;
becomes:
my ($x, @y, %z);
[perl #121077]
=item *
The creation of certain sorts of lists, including array and hash slices, is now
faster.
=item *
The optimisation for arrays indexed with a small constant integer is now
applied for integers in the range -128..127, rather than 0..255. This should
speed up Perl code using expressions like C<$x[-1]>, at the expense of
(presumably much rarer) code using expressions like C<$x[200]>.
=item *
The first iteration over a large hash (using C<keys> or C<each>) is now
faster. This is achieved by preallocating the hash's internal iterator
state, rather than lazily creating it when the hash is first iterated. (For
small hashes, the iterator is still created only when first needed. The
assumption is that small hashes are more likely to be used as objects, and
therefore never iterated. For large hashes, that's less likely to be true,
and the cost of allocating the iterator is swamped by the cost of allocating
space for the hash itself.)
=item *
When doing a global regex match on a string that came from the C<readline>
or C<E<lt>E<gt>> operator, the data is no longer copied unnecessarily.
[perl #121259]
=item *
Dereferencing (as in C<$obj-E<gt>[0]> or C<$obj-E<gt>{k}>) is now faster
when C<$obj> is an instance of a class that has overloaded methods, but
doesn't overload any of the dereferencing methods C<@{}>, C<%{}>, and so on.
=item *
Perl's optimiser no longer skips optimising code that follows certain
C<eval {}> expressions (including those with an apparent infinite loop).
=item *
The implementation now does a better job of avoiding meaningless work at
runtime. Internal effect-free "null" operations (created as a side-effect of
parsing Perl programs) are normally deleted during compilation. That
deletion is now applied in some situations that weren't previously handled.
=item *
Perl now does less disk I/O when dealing with Unicode properties that cover
up to three ranges of consecutive code points.
=back
=head1 Modules and Pragmata
=head2 New Modules and Pragmata
=over 4
=item *
L<experimental> 0.007 has been added to the Perl core.
=item *
L<IO::Socket::IP> 0.29 has been added to the Perl core.
=back
=head2 Updated Modules and Pragmata
=over 4
=item *
L<Archive::Tar> has been upgraded from version 1.90 to 1.96.
=item *
L<arybase> has been upgraded from version 0.06 to 0.07.
=item *
L<Attribute::Handlers> has been upgraded from version 0.94 to 0.96.
=item *
L<attributes> has been upgraded from version 0.21 to 0.22.
=item *
L<autodie> has been upgraded from version 2.13 to 2.23.
=item *
L<AutoLoader> has been upgraded from version 5.73 to 5.74.
=item *
L<autouse> has been upgraded from version 1.07 to 1.08.
=item *
L<B> has been upgraded from version 1.42 to 1.48.
=item *
L<B::Concise> has been upgraded from version 0.95 to 0.992.
=item *
L<B::Debug> has been upgraded from version 1.18 to 1.19.
=item *
L<B::Deparse> has been upgraded from version 1.20 to 1.26.
=item *
L<base> has been upgraded from version 2.18 to 2.22.
=item *
L<Benchmark> has been upgraded from version 1.15 to 1.18.
=item *
L<bignum> has been upgraded from version 0.33 to 0.37.
=item *
L<Carp> has been upgraded from version 1.29 to 1.3301.
=item *
L<CGI> has been upgraded from version 3.63 to 3.65.
NOTE: L<CGI> is deprecated and may be removed from a future version of Perl.
=item *
L<charnames> has been upgraded from version 1.36 to 1.40.
=item *
L<Class::Struct> has been upgraded from version 0.64 to 0.65.
=item *
L<Compress::Raw::Bzip2> has been upgraded from version 2.060 to 2.064.
=item *
L<Compress::Raw::Zlib> has been upgraded from version 2.060 to 2.065.
=item *
L<Config::Perl::V> has been upgraded from version 0.17 to 0.20.
=item *
L<constant> has been upgraded from version 1.27 to 1.31.
=item *
L<CPAN> has been upgraded from version 2.00 to 2.05.
=item *
L<CPAN::Meta> has been upgraded from version 2.120921 to 2.140640.
=item *
L<CPAN::Meta::Requirements> has been upgraded from version 2.122 to 2.125.
=item *
L<CPAN::Meta::YAML> has been upgraded from version 0.008 to 0.012.
=item *
L<Data::Dumper> has been upgraded from version 2.145 to 2.151.
=item *
L<DB> has been upgraded from version 1.04 to 1.07.
=item *
L<DB_File> has been upgraded from version 1.827 to 1.831.
=item *
L<DBM_Filter> has been upgraded from version 0.05 to 0.06.
=item *
L<deprecate> has been upgraded from version 0.02 to 0.03.
=item *
L<Devel::Peek> has been upgraded from version 1.11 to 1.16.
=item *
L<Devel::PPPort> has been upgraded from version 3.20 to 3.21.
=item *
L<diagnostics> has been upgraded from version 1.31 to 1.34.
=item *
L<Digest::MD5> has been upgraded from version 2.52 to 2.53.
=item *
L<Digest::SHA> has been upgraded from version 5.84 to 5.88.
=item *
L<DynaLoader> has been upgraded from version 1.18 to 1.25.
=item *
L<Encode> has been upgraded from version 2.49 to 2.60.
=item *
L<encoding> has been upgraded from version 2.6_01 to 2.12.
=item *
L<English> has been upgraded from version 1.06 to 1.09.
C<$OLD_PERL_VERSION> was added as an alias of C<$]>.
=item *
L<Errno> has been upgraded from version 1.18 to 1.20_03.
=item *
L<Exporter> has been upgraded from version 5.68 to 5.70.
=item *
L<ExtUtils::CBuilder> has been upgraded from version 0.280210 to 0.280216.
=item *
L<ExtUtils::Command> has been upgraded from version 1.17 to 1.18.
=item *
L<ExtUtils::Embed> has been upgraded from version 1.30 to 1.32.
=item *
L<ExtUtils::Install> has been upgraded from version 1.59 to 1.67.
=item *
L<ExtUtils::MakeMaker> has been upgraded from version 6.66 to 6.98.
=item *
L<ExtUtils::Miniperl> has been upgraded from version to 1.01.
=item *
L<ExtUtils::ParseXS> has been upgraded from version 3.18 to 3.24.
=item *
L<ExtUtils::Typemaps> has been upgraded from version 3.19 to 3.24.
=item *
L<ExtUtils::XSSymSet> has been upgraded from version 1.2 to 1.3.
=item *
L<feature> has been upgraded from version 1.32 to 1.36.
=item *
L<fields> has been upgraded from version 2.16 to 2.17.
=item *
L<File::Basename> has been upgraded from version 2.84 to 2.85.
=item *
L<File::Copy> has been upgraded from version 2.26 to 2.29.
=item *
L<File::DosGlob> has been upgraded from version 1.10 to 1.12.
=item *
L<File::Fetch> has been upgraded from version 0.38 to 0.48.
=item *
L<File::Find> has been upgraded from version 1.23 to 1.27.
=item *
L<File::Glob> has been upgraded from version 1.20 to 1.23.
=item *
L<File::Spec> has been upgraded from version 3.40 to 3.47.
=item *
L<File::Temp> has been upgraded from version 0.23 to 0.2304.
=item *
L<FileCache> has been upgraded from version 1.08 to 1.09.
=item *
L<Filter::Simple> has been upgraded from version 0.89 to 0.91.
=item *
L<Filter::Util::Call> has been upgraded from version 1.45 to 1.49.
=item *
L<Getopt::Long> has been upgraded from version 2.39 to 2.42.
=item *
L<Getopt::Std> has been upgraded from version 1.07 to 1.10.
=item *
L<Hash::Util::FieldHash> has been upgraded from version 1.10 to 1.15.
=item *
L<HTTP::Tiny> has been upgraded from version 0.025 to 0.043.
=item *
L<I18N::Langinfo> has been upgraded from version 0.10 to 0.11.
=item *
L<I18N::LangTags> has been upgraded from version 0.39 to 0.40.
=item *
L<if> has been upgraded from version 0.0602 to 0.0603.
=item *
L<inc::latest> has been upgraded from version 0.4003 to 0.4205.
NOTE: L<inc::latest> is deprecated and may be removed from a future version of Perl.
=item *
L<integer> has been upgraded from version 1.00 to 1.01.
=item *
L<IO> has been upgraded from version 1.28 to 1.31.
=item *
L<IO::Compress::Gzip> and friends have been upgraded from version 2.060 to
2.064.
=item *
L<IPC::Cmd> has been upgraded from version 0.80 to 0.92.
=item *
L<IPC::Open3> has been upgraded from version 1.13 to 1.16.
=item *
L<IPC::SysV> has been upgraded from version 2.03 to 2.04.
=item *
L<JSON::PP> has been upgraded from version 2.27202 to 2.27203.
=item *
L<List::Util> has been upgraded from version 1.27 to 1.38.
=item *
L<locale> has been upgraded from version 1.02 to 1.03.
=item *
L<Locale::Codes> has been upgraded from version 3.25 to 3.30.
=item *
L<Locale::Maketext> has been upgraded from version 1.23 to 1.25.
=item *
L<Math::BigInt> has been upgraded from version 1.9991 to 1.9993.
=item *
L<Math::BigInt::FastCalc> has been upgraded from version 0.30 to 0.31.
=item *
L<Math::BigRat> has been upgraded from version 0.2604 to 0.2606.
=item *
L<MIME::Base64> has been upgraded from version 3.13 to 3.14.
=item *
L<Module::Build> has been upgraded from version 0.4003 to 0.4205.
NOTE: L<Module::Build> is deprecated and may be removed from a future version of Perl.
=item *
L<Module::CoreList> has been upgraded from version 2.89 to 3.10.
=item *
L<Module::Load> has been upgraded from version 0.24 to 0.32.
=item *
L<Module::Load::Conditional> has been upgraded from version 0.54 to 0.62.
=item *
L<Module::Metadata> has been upgraded from version 1.000011 to 1.000019.
=item *
L<mro> has been upgraded from version 1.11 to 1.16.
=item *
L<Net::Ping> has been upgraded from version 2.41 to 2.43.
=item *
L<Opcode> has been upgraded from version 1.25 to 1.27.
=item *
L<Package::Constants> has been upgraded from version 0.02 to 0.04.
NOTE: L<Package::Constants> is deprecated and may be removed from a future version of Perl.
=item *
L<Params::Check> has been upgraded from version 0.36 to 0.38.
=item *
L<parent> has been upgraded from version 0.225 to 0.228.
=item *
L<Parse::CPAN::Meta> has been upgraded from version 1.4404 to 1.4414.
=item *
L<Perl::OSType> has been upgraded from version 1.003 to 1.007.
=item *
L<perlfaq> has been upgraded from version 5.0150042 to 5.0150044.
=item *
L<PerlIO> has been upgraded from version 1.07 to 1.09.
=item *
L<PerlIO::encoding> has been upgraded from version 0.16 to 0.18.
=item *
L<PerlIO::scalar> has been upgraded from version 0.16 to 0.18.
=item *
L<PerlIO::via> has been upgraded from version 0.12 to 0.14.
=item *
L<Pod::Escapes> has been upgraded from version 1.04 to 1.06.
=item *
L<Pod::Functions> has been upgraded from version 1.06 to 1.08.
=item *
L<Pod::Html> has been upgraded from version 1.18 to 1.21.
=item *
L<Pod::Parser> has been upgraded from version 1.60 to 1.62.
=item *
L<Pod::Perldoc> has been upgraded from version 3.19 to 3.23.
=item *
L<Pod::Usage> has been upgraded from version 1.61 to 1.63.
=item *
L<POSIX> has been upgraded from version 1.32 to 1.38_03.
=item *
L<re> has been upgraded from version 0.23 to 0.26.
=item *
L<Safe> has been upgraded from version 2.35 to 2.37.
=item *
L<Scalar::Util> has been upgraded from version 1.27 to 1.38.
=item *
L<SDBM_File> has been upgraded from version 1.09 to 1.11.
=item *
L<Socket> has been upgraded from version 2.009 to 2.013.
=item *
L<Storable> has been upgraded from version 2.41 to 2.49.
=item *
L<strict> has been upgraded from version 1.07 to 1.08.
=item *
L<subs> has been upgraded from version 1.01 to 1.02.
=item *
L<Sys::Hostname> has been upgraded from version 1.17 to 1.18.
=item *
L<Sys::Syslog> has been upgraded from version 0.32 to 0.33.
=item *
L<Term::Cap> has been upgraded from version 1.13 to 1.15.
=item *
L<Term::ReadLine> has been upgraded from version 1.12 to 1.14.
=item *
L<Test::Harness> has been upgraded from version 3.26 to 3.30.
=item *
L<Test::Simple> has been upgraded from version 0.98 to 1.001002.
=item *
L<Text::ParseWords> has been upgraded from version 3.28 to 3.29.
=item *
L<Text::Tabs> has been upgraded from version 2012.0818 to 2013.0523.
=item *
L<Text::Wrap> has been upgraded from version 2012.0818 to 2013.0523.
=item *
L<Thread> has been upgraded from version 3.02 to 3.04.
=item *
L<Thread::Queue> has been upgraded from version 3.02 to 3.05.
=item *
L<threads> has been upgraded from version 1.86 to 1.93.
=item *
L<threads::shared> has been upgraded from version 1.43 to 1.46.
=item *
L<Tie::Array> has been upgraded from version 1.05 to 1.06.
=item *
L<Tie::File> has been upgraded from version 0.99 to 1.00.
=item *
L<Tie::Hash> has been upgraded from version 1.04 to 1.05.
=item *
L<Tie::Scalar> has been upgraded from version 1.02 to 1.03.
=item *
L<Tie::StdHandle> has been upgraded from version 4.3 to 4.4.
=item *
L<Time::HiRes> has been upgraded from version 1.9725 to 1.9726.
=item *
L<Time::Piece> has been upgraded from version 1.20_01 to 1.27.
=item *
L<Unicode::Collate> has been upgraded from version 0.97 to 1.04.
=item *
L<Unicode::Normalize> has been upgraded from version 1.16 to 1.17.
=item *
L<Unicode::UCD> has been upgraded from version 0.51 to 0.57.
=item *
L<utf8> has been upgraded from version 1.10 to 1.13.
=item *
L<version> has been upgraded from version 0.9902 to 0.9908.
=item *
L<vmsish> has been upgraded from version 1.03 to 1.04.
=item *
L<warnings> has been upgraded from version 1.18 to 1.23.
=item *
L<Win32> has been upgraded from version 0.47 to 0.49.
=item *
L<XS::Typemap> has been upgraded from version 0.10 to 0.13.
=item *
L<XSLoader> has been upgraded from version 0.16 to 0.17.
=back
=head1 Documentation
=head2 New Documentation
=head3 L<perlrepository>
This document was removed (actually, renamed L<perlgit> and given a major
overhaul) in Perl v5.14, causing Perl documentation websites to show the now
out of date version in Perl v5.12 as the latest version. It has now been
restored in stub form, directing readers to current information.
=head2 Changes to Existing Documentation
=head3 L<perldata>
=over 4
=item *
New sections have been added to document the new index/value array slice and
key/value hash slice syntax.
=back
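As a brief illustration of the two slice forms mentioned above, the following
sketch shows what each one returns. The variable names and data are invented
for the example, and it assumes perl 5.20 or later.

```perl
use strict;
use warnings;
use 5.020;    # the %array[...] and %hash{...} slices need perl 5.20 or later

my @colors = qw(red green blue);
my %age    = (alice => 30, bob => 25, carol => 41);

# Index/value array slice: a flat list of (index, value) pairs.
my %by_index = %colors[0, 2];          # (0 => 'red', 2 => 'blue')

# Key/value hash slice: a flat list of (key, value) pairs.
my %subset = %age{qw(alice carol)};    # (alice => 30, carol => 41)

# The older slices return the values only.
my @values_only = @age{qw(alice carol)};    # (30, 41)
```

Assigning the result to a hash, as here, is a convenient way to copy a subset
of an aggregate while keeping its indices or keys.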
=head3 L<perldebguts>
=over 4
=item *
The C<DB::goto> and C<DB::lsub> debugger subroutines are now documented. [perl
#77680]
=back
=head3 L<perlexperiment>
=over
=item *
C<\s> matching C<\cK> is marked experimental.
=item *
ithreads were accepted in v5.8.0 (but are discouraged as of v5.20.0).
=item *
Long doubles are not considered experimental.
=item *
Code in regular expressions, regular expression backtracking verbs,
and lvalue subroutines are no longer listed as experimental. (This
also affects L<perlre> and L<perlsub>.)
=back
=head3 L<perlfunc>
=over
=item *
C<chop> and C<chomp> now note that they can reset the hash iterator.
=item *
C<exec>'s handling of arguments is now more clearly documented.
=item *
C<eval EXPR> now has caveats about expanding floating point numbers in some
locales.
=item *
C<goto EXPR> is now documented to handle an expression that evaluates to a
code reference as if it were C<goto &$coderef>. This behavior is at least ten
years old.
=item *
Since Perl v5.10, it has been possible for subroutines in C<@INC> to return
a reference to a scalar holding initial source code to prepend to the file.
This is now documented.
=item *
The documentation of C<ref> has been updated to recommend the use of
C<blessed>, C<isa> and C<reftype> when dealing with references to blessed
objects.
=back
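The recommendation about C<ref> in the list above can be sketched as follows.
This is only an illustration: C<My::Class> and C<describe> are hypothetical
names, and C<blessed> and C<reftype> are taken from L<Scalar::Util>.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

my $plain  = { name => 'plain hash' };
my $object = bless { name => 'object' }, 'My::Class';    # hypothetical class

# Describe a value without relying on the string returned by ref().
sub describe {
    my ($thing) = @_;
    return 'not a reference' unless ref $thing;
    my $class = blessed($thing);    # undef unless $thing is blessed
    my $type  = reftype($thing);    # underlying type: HASH, ARRAY, CODE, ...
    return defined $class
        ? "object of $class stored in a $type"
        : "plain $type reference";
}

my $d1 = describe($plain);
my $d2 = describe($object);
my $d3 = describe(42);

# isa() answers the class-membership question, including inheritance.
my $is_mine = blessed($object) && $object->isa('My::Class') ? 1 : 0;
```

For a blessed reference, C<ref> returns the class name, which hides the
underlying type; C<reftype> recovers it.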
=head3 L<perlguts>
=over 4
=item *
Numerous minor changes have been made to reflect changes made to the perl
internals in this release.
=item *
New sections on L<Read-Only Values|perlguts/"Read-Only Values"> and
L<Copy on Write|perlguts/"Copy on Write"> have been added.
=back
=head3 L<perlhack>
=over 4
=item *
The L<Super Quick Patch Guide|perlhack/SUPER QUICK PATCH GUIDE> section has
been updated.
=back
=head3 L<perlhacktips>
=over 4
=item *
The documentation has been updated to include some more examples of C<gdb>
usage.
=back
=head3 L<perllexwarn>
=over 4
=item *
The L<perllexwarn> documentation used to describe the hierarchy of warning
categories understood by the L<warnings> pragma. That description has now
been moved to the L<warnings> documentation itself, leaving L<perllexwarn>
as a stub that points to it. This change consolidates all documentation for
lexical warnings in a single place.
=back
=head3 L<perllocale>
=over
=item *
The documentation now mentions C<fc()> and C<\F>, and includes many
clarifications and corrections in general.
=back
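For readers unfamiliar with the two constructs just mentioned, here is a
minimal sketch of case-insensitive comparison with C<fc> and of the C<\F>
escape. The strings are arbitrary examples, and the sketch assumes perl 5.16
or later.

```perl
use strict;
use warnings;
use feature 'fc';    # fc() needs the "fc" feature (perl 5.16 or later)

my $left  = "Hello World";
my $right = "HELLO WORLD";

# Compare two strings case-insensitively by fold-casing both sides.
my $same_by_fc = fc($left) eq fc($right) ? 1 : 0;

# \F ... \E applies the same folding inside a double-quoted string.
my $folded_literal = "\F$left\E";
```

For plain ASCII text C<fc> gives the same result as C<lc>; the two differ only
for characters whose Unicode case folding is not simply their lowercase form.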
=head3 L<perlop>
=over 4
=item *
The language design of Perl has always called for monomorphic operators.
This is now mentioned explicitly.
=back
=head3 L<perlopentut>
=over 4
=item *
The C<open> tutorial has been completely rewritten by Tom Christiansen, and now
focuses on covering only the basics, rather than providing a comprehensive
reference to all things openable. This rewrite came as the result of a
vigorous discussion on perl5-porters kicked off by a set of improvements
written by Alexander Hartmaier to the existing L<perlopentut>. A "more than
you ever wanted to know about C<open>" document may follow in subsequent
versions of perl.
=back
=head3 L<perlre>
=over 4
=item *
The fact that the regexp engine makes no effort to call (?{}) and (??{})
constructs any specified number of times (although it will basically DWIM
in case of a successful match) has been documented.
=item *
The C</r> modifier (for non-destructive substitution) is now documented. [perl
#119151]
=item *
The documentation for C</x> and C<(?# comment)> has been expanded and clarified.
=back
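A short sketch of the C</r> modifier mentioned above, using made-up strings:
the substitution returns the modified copy and leaves its target untouched.

```perl
use strict;
use warnings;

my $original = "Hello, World";

# With /r the result is returned and $original is not modified.
my $copy = $original =~ s/World/Perl/r;

# /r is convenient inside map, where modifying $_ would alter the input list.
my @padded  = ("  alpha ", "beta  ");
my @trimmed = map { s/^\s+|\s+$//gr } @padded;
```

Without C</r>, the same C<map> block would return substitution counts and
change the elements of C<@padded> in place.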
=head3 L<perlreguts>
=over 4
=item *
The documentation has been updated in the light of recent changes to
F<regcomp.c>.
=back
=head3 L<perlsub>
=over 4
=item *
The need to predeclare recursive functions with prototypes in order for the
prototype to be honoured in the recursive call is now documented. [perl #2726]
=item *
A list of subroutine names used by the perl implementation is now included.
[perl #77680]
=back
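The predeclaration point above can be illustrated with a small sketch. The
subroutine C<count_items> is hypothetical; the forward declaration makes the
C<(\@)> prototype visible while the body, including its recursive call, is
being compiled.

```perl
use strict;
use warnings;

# Forward declaration: the prototype is now known before the body is compiled.
sub count_items(\@);

sub count_items(\@) {
    my ($aref) = @_;
    return 0 unless @$aref;
    my @rest = @{$aref}[1 .. $#{$aref}];
    # Because of the predeclaration, @rest is passed as \@rest here too.
    return 1 + count_items(@rest);
}

my @items = qw(a b c);
my $count = count_items(@items);
```

Without the forward declaration, the recursive call would be compiled before
the prototype took effect, so C<@rest> would be flattened into a list and
C<$aref> would receive its first element rather than an array reference.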
=head3 L<perltrap>
=over 4
=item *
There is now a L<JavaScript|perltrap/JavaScript Traps> section.
=back
=head3 L<perlunicode>
=over 4
=item *
The documentation has been updated to reflect C<Bidi_Class> changes in
Unicode 6.3.
=back
=head3 L<perlvar>
=over 4
=item *
A new section explaining the performance issues of $`, $& and $', including
workarounds and changes in different versions of Perl, has been added.
=item *
Three L<English> variable names which have long been documented but do not
actually exist have been removed from the documentation. These were
C<$OLD_PERL_VERSION>, C<$OFMT>, and C<$ARRAY_BASE>.
(Actually, C<$OLD_PERL_VERSION> I<does> exist, starting with this revision, but
remained undocumented until perl 5.22.0.)
=back
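Two commonly used alternatives to $`, $& and $' are sketched below: the C</p>
modifier with C<${^PREMATCH}>, C<${^MATCH}> and C<${^POSTMATCH}> (available
since Perl v5.10), and the offset arrays C<@-> and C<@+>. The input string is
invented for the example, and this is not a summary of everything the new
L<perlvar> section covers.

```perl
use strict;
use warnings;

my $line = "name=Larry Wall";

# Option 1: /p makes the ${^...} variables available for this match.
my ($before, $matched, $after);
if ($line =~ /=/p) {
    ($before, $matched, $after) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

# Option 2: compute the pieces from the match offsets in @- and @+.
my ($key, $value);
if ($line =~ /=/) {
    $key   = substr($line, 0, $-[0]);    # text before the match
    $value = substr($line, $+[0]);       # text after the match
}
```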
=head3 L<perlxs>
=over 4
=item *
Several problems in the C<MY_CXT> example have been fixed.
=back
=head1 Diagnostics
The following additions or changes have been made to diagnostic output,
including warnings and fatal error messages. For the complete list of
diagnostic messages, see L<perldiag>.
=head2 New Diagnostics
=head3 New Errors
=over 4
=item *
L<delete argument is indexE<sol>value array slice, use array slice|perldiag/"delete argument is index/value array slice, use array slice">
(F) You used index/value array slice syntax (C<%array[...]>) as the argument to
C<delete>. You probably meant C<@array[...]> with an @ symbol instead.
=item *
L<delete argument is keyE<sol>value hash slice, use hash slice|perldiag/"delete argument is key/value hash slice, use hash slice">
(F) You used key/value hash slice syntax (C<%hash{...}>) as the argument to
C<delete>. You probably meant C<@hash{...}> with an @ symbol instead.
=item *
L<Magical list constants are not supported|perldiag/"Magical list constants are
not supported">
(F) You assigned a magical array to a stash element, and then tried to use the
subroutine from the same slot. You are asking Perl to do something it cannot
do; the details are subject to change between Perl versions.
=item *
L<Setting $E<sol> to a %s reference is forbidden|perldiag/"Setting $E<sol> to a %s reference is forbidden">
=back
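The two C<delete> errors listed above arise from writing C<%> where C<@> was
intended. The following sketch, with invented data, shows the accepted form
and one way to keep the removed key/value pairs; it assumes perl 5.20 or later
for the key/value slice.

```perl
use strict;
use warnings;
use 5.020;    # needed only for the %hash{...} key/value slice below

my %config = (host => 'localhost', port => 8080, debug => 1);

# Copy the pairs first if they are still needed after removal.
my %gone = %config{qw(port debug)};

# delete takes an ordinary hash slice (@ sigil) and returns the values.
my @removed = delete @config{qw(port debug)};

# Writing "delete %config{qw(port debug)}" instead is a compile-time error.
```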
=head3 New Warnings
=over 4
=item *
L<%s on reference is experimental|perldiag/"push on reference is experimental">:
The "auto-deref" feature is experimental.
Starting in v5.14.0, it was possible to use push, pop, keys, and other
built-in functions not only on aggregate types, but on references to
them. The feature was not deployed to its original intended
specification, and now may become redundant to postfix dereferencing.
It has always been categorized as an experimental feature, and in
v5.20.0 it carries a warning as such.
Warnings will now be issued at compile time when these operations are
detected.
no if $] >= 5.01908, warnings => "experimental::autoderef";
Consider, though, replacing the use of these features, as they may
change behavior again before becoming stable.
=item *
L<A sequence of multiple spaces in a charnames alias definition is deprecated|perldiag/"A sequence of multiple spaces in a charnames alias definition is deprecated">
L<Trailing white-space in a charnames alias definition is deprecated|perldiag/"Trailing white-space in a charnames alias definition is deprecated">
These two deprecation warnings involving C<\N{...}> were incorrectly
implemented. They did not warn by default (now they do) and could not be
made fatal via C<< use warnings FATAL => 'deprecated' >> (now they can).
=item *
L<Attribute prototype(%s) discards earlier prototype attribute in same sub|perldiag/"Attribute prototype(%s) discards earlier prototype attribute in same sub">
(W misc) A sub was declared as C<sub foo : prototype(A) : prototype(B) {}>, for
example. Since each sub can only have one prototype, the earlier
declaration(s) are discarded while the last one is applied.
=item *
L<Invalid \0 character in %s for %s: %s\0%s|perldiag/"Invalid \0 character in %s for %s: %s\0%s">
(W syscalls) Embedded \0 characters in pathnames or other system call arguments
produce a warning as of 5.20. The parts after the \0 were formerly ignored by
system calls.
=item *
L<Matched non-Unicode code point 0x%X against Unicode property; may not be portable|perldiag/"Matched non-Unicode code point 0x%X against Unicode property; may not be portable">.
This replaces the message "Code point 0x%X is not Unicode, all \p{} matches
fail; all \P{} matches succeed".
=item *
L<Missing ']' in prototype for %s : %s|perldiag/"Missing ']' in prototype for %s : %s">
(W illegalproto) A grouping was started with C<[> but never closed with C<]>.
=item *
L<Possible precedence issue with control flow operator|perldiag/"Possible precedence issue with control flow operator">
(W syntax) There is a possible problem with the mixing of a control flow
operator (e.g. C<return>) and a low-precedence operator like C<or>. Consider:
sub { return $a or $b; }
This is parsed as:
sub { (return $a) or $b; }
Which is effectively just:
sub { return $a; }
Either use parentheses or the high-precedence variant of the operator.
Note this may be also triggered for constructs like:
sub { 1 if die; }
=item *
L<Postfix dereference is experimental|perldiag/"Postfix dereference is experimental">
(S experimental::postderef) This warning is emitted if you use the experimental
postfix dereference syntax. Simply suppress the warning if you want to use the
feature, but know that in doing so you are taking the risk of using an
experimental feature which may change or be removed in a future Perl version:
no warnings "experimental::postderef";
use feature "postderef", "postderef_qq";
$ref->$*;
$aref->@*;
$aref->@[@indices];
... etc ...
=item *
L<Prototype '%s' overridden by attribute 'prototype(%s)' in %s|perldiag/"Prototype '%s' overridden by attribute 'prototype(%s)' in %s">
(W prototype) A prototype was declared in both the parentheses after the sub
name and via the prototype attribute. The prototype in parentheses is useless,
since it will be replaced by the prototype from the attribute before it's ever
used.
=item *
L<Scalar value @%s[%s] better written as $%s[%s]|perldiag/"Scalar value @%s[%s] better written as $%s[%s]">
(W syntax) In scalar context, you've used an array index/value slice (indicated
by %) to select a single element of an array. Generally it's better to ask for
a scalar value (indicated by $). The difference is that C<$foo[&bar]> always
behaves like a scalar, both in the value it returns and when evaluating its
argument, while C<%foo[&bar]> provides a list context to its subscript, which
can do weird things if you're expecting only one subscript. When called in
list context, it also returns the index (what C<&bar> returns) in addition to
the value.
=item *
L<Scalar value @%s{%s} better written as $%s{%s}|perldiag/"Scalar value @%s{%s} better written as $%s{%s}">
(W syntax) In scalar context, you've used a hash key/value slice (indicated by
%) to select a single element of a hash. Generally it's better to ask for a
scalar value (indicated by $). The difference is that C<$foo{&bar}> always
behaves like a scalar, both in the value it returns and when evaluating its
argument, while C<%foo{&bar}> provides a list context to its subscript,
which can do weird things if you're expecting only one subscript. When called
in list context, it also returns the key in addition to the value.
=item *
L<Setting $E<sol> to a reference to %s as a form of slurp is deprecated, treating as undef|perldiag/"Setting $E<sol> to a reference to %s as a form of slurp is deprecated, treating as undef">
=item *
L<Unexpected exit %u|perldiag/"Unexpected exit %u">
(S) exit() was called or the script otherwise finished gracefully when
C<PERL_EXIT_WARN> was set in C<PL_exit_flags>.
=item *
L<Unexpected exit failure %d|perldiag/"Unexpected exit failure %d">
(S) An uncaught die() was called when C<PERL_EXIT_WARN> was set in
C<PL_exit_flags>.
=item *
L<Use of literal control characters in variable names is deprecated|perldiag/"Use of literal control characters in variable names is deprecated">
(D deprecated) Using literal control characters in the source to refer to the
^FOO variables, like $^X and ${^GLOBAL_PHASE} is now deprecated. This only
affects code like $\cT, where \cT is a control (like a C<SOH>) in the
source code: ${"\cT"} and $^T remain valid.
=item *
L<Useless use of greediness modifier|perldiag/"Useless use of greediness modifier '%c' in regex; marked by <-- HERE in m/%s/">
This fixes [perl #42957].
=back
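To make the "Possible precedence issue with control flow operator" entry above
concrete, the sketch below compares the problematic form with the two fixes
that entry suggests. The subroutine names are hypothetical, and the warning is
silenced in the first one only so that the example runs quietly.

```perl
use strict;
use warnings;

# Problematic: parsed as "(return $x) or $y", so $y is never considered.
sub loose_or {
    my ($x, $y) = @_;
    no warnings 'syntax';    # silenced for this demonstration only
    return $x or $y;
}

# Fix 1: use the high-precedence operator.
sub tight_or {
    my ($x, $y) = @_;
    return $x || $y;
}

# Fix 2: keep "or" but parenthesize the whole expression.
sub paren_or {
    my ($x, $y) = @_;
    return ($x or $y);
}

my $loose = loose_or(0, 5);    # 0: the fallback is ignored
my $tight = tight_or(0, 5);    # 5
my $paren = paren_or(0, 5);    # 5
```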
=head2 Changes to Existing Diagnostics
=over 4
=item *
Warnings and errors from the regexp engine are now UTF-8 clean.
=item *
The "Unknown switch condition" error message has some slight changes. This
error triggers when there is an unknown condition in a C<(?(foo))> conditional.
The error message used to read:
Unknown switch condition (?(%s in regex;
But what %s could be was mostly up to luck. For C<(?(foobar))>, you might have
seen "fo" or "f". For Unicode characters, you would generally get a corrupted
string. The message has been changed to read:
Unknown switch condition (?(...)) in regex;
Additionally, the C<'E<lt>-- HERE'> marker in the error will now point to the
correct spot in the regex.
=item *
The "%s "\x%X" does not map to Unicode" warning is now correctly listed as a
severe warning rather than as a fatal error.
=item *
Under rare circumstances, one could get a "Can't coerce readonly REF to
string" instead of the customary "Modification of a read-only value". This
alternate error message has been removed.
=item *
"Ambiguous use of * resolved as operator *": This and similar warnings
about "%" and "&" used to occur in some circumstances where there was no
operator of the type cited, so the warning was completely wrong. This has
been fixed [perl #117535, #76910].
=item *
Warnings about malformed subroutine prototypes are now more consistent in
how the prototypes are rendered. Some of these warnings would truncate
prototypes containing nulls. In other cases one warning would suppress
another. The warning about illegal characters in prototypes no longer says
"after '_'" if the bad character came before the underscore.
=item *
L<Perl folding rules are not up-to-date for 0x%X; please use the perlbug
utility to report; in regex; marked by <-- HERE in
mE<sol>%sE<sol>|perldiag/"Perl folding rules are not up-to-date for 0x%X;
please use the perlbug utility to report; in regex; marked by <-- HERE in
m/%s/">
This message is now only in the regexp category, and not in the deprecated
category. It is still a default (i.e., severe) warning [perl #89648].
=item *
L<%%s[%s] in scalar context better written as $%s[%s]|perldiag/"%%s[%s] in scalar context better written as $%s[%s]">
This warning now occurs for any C<%array[$index]> or C<%hash{key}> known to
be in scalar context at compile time. Previously it was worded "Scalar
value %%s[%s] better written as $%s[%s]".
=item *
L<Switch condition not recognized in regex; marked by <-- HERE in mE<sol>%sE<sol>|perldiag/"Switch condition not recognized in regex; marked by <-- HERE in m/%s/">:
The description for this diagnostic has been extended to cover all cases where the warning may occur.
Issues with the positioning of the arrow indicator have also been resolved.
=item *
The error messages for C<my($a?$b:$c)> and C<my(do{})> now mention "conditional
expression" and "do block", respectively, instead of reading 'Can't declare
null operation in "my"'.
=item *
When C<use re "debug"> executes a regex containing a backreference, the
debugging output now shows what string is being matched.
=item *
The now fatal error message C<Character following "\c" must be ASCII> has been
reworded as C<Character following "\c" must be printable ASCII> to emphasize
that in C<\cI<X>>, I<X> must be a I<printable (non-control)> ASCII character.
=back
=head1 Utility Changes
=head3 L<a2p>
=over 4
=item *
A possible crash from an off-by-one error when trying to access before the
beginning of a buffer has been fixed. [perl #120244]
=back
=head3 F<bisect.pl>
The git bisection tool F<Porting/bisect.pl> has had many enhancements.
It is provided as part of the source distribution but not installed because
it is not self-contained as it relies on being run from within a git
checkout. Note also that it makes no attempt to fix tests, correct runtime
bugs or make something useful to install - its purpose is to make minimal
changes to get any historical revision of interest to build and run as close
as possible to "as-was", and thereby make C<git bisect> easy to use.
=over 4
=item *
Can optionally run the test case with a timeout.
=item *
Can now run in-place in a clean git checkout.
=item *
Can run the test case under C<valgrind>.
=item *
Can apply user supplied patches and fixes to the source checkout before
building.
=item *
Now has fixups to enable building several more historical ranges of bleadperl,
which can be useful for pinpointing the origins of bugs or behaviour changes.
=back
=head3 L<find2perl>
=over 4
=item *
L<find2perl> now handles C<?> wildcards correctly. [perl #113054]
=back
=head3 L<perlbug>
=over 4
=item *
F<perlbug> now has a C<-p> option for attaching patches with a bug report.
=item *
L<perlbug> has been modified to supply the report template with CRLF line
endings on Windows.
[L<perl #121277|https://rt.perl.org/Public/Bug/Display.html?id=121277>]
=item *
L<perlbug> now makes as few assumptions as possible about the encoding of the
report. This will likely change in the future to assume UTF-8 by default but
allow a user override.
=back
=head1 Configuration and Compilation
=over 4
=item *
The F<Makefile.PL> for L<SDBM_File> now generates a better F<Makefile>, which
avoids a race condition during parallel makes, which could cause the build to
fail. This is the last known parallel make problem (on *nix platforms), and
therefore we believe that a parallel make should now always be error free.
=item *
F<installperl> and F<installman>'s option handling has been refactored to use
L<Getopt::Long>. Both are used by the F<Makefile> C<install> targets, and
are not installed, so these changes are only likely to affect custom
installation scripts.
=over 4
=item *
Single letter options now also have long names.
=item *
Invalid options are now rejected.
=item *
Command line arguments that are not options are now rejected.
=item *
Each now has a C<--help> option to display the usage message.
=back
The behaviour for all valid documented invocations is unchanged.
=item *
Where possible, the build now avoids recursive invocations of F<make> when
building pure-Perl extensions, without removing any parallelism from the
build. Currently around 80 extensions can be processed directly by the
F<make_ext.pl> tool, meaning that 80 invocations of F<make> and 160
invocations of F<miniperl> are no longer made.
=item *
The build system now works correctly when compiling under GCC or Clang with
link-time optimization enabled (the C<-flto> option). [perl #113022]
=item *
Distinct library basenames with C<d_libname_unique>.
When compiling perl with this option, the library files for XS modules are
named something "unique" -- for example, Hash/Util/Util.so becomes
Hash/Util/PL_Hash__Util.so. This behavior is similar to what currently
happens on VMS, and serves as groundwork for the Android port.
=item *
C<sysroot> option to indicate the logical root directory under gcc and clang.
When building with this option set, both Configure and the compilers search
for all headers and libraries under this new sysroot, instead of /.
This is a huge time saver if cross-compiling, but can also help
on native builds if your toolchain's files have non-standard locations.
=item *
The cross-compilation model has been renovated.
There are several new options, and some backwards-incompatible changes:
We now build binaries for miniperl and generate_uudmap to be used on the host,
rather than running every miniperl call on the target; this means that, short
of 'make test', we no longer need access to the target system once Configure is
done. You can provide already-built binaries through the C<hostperl> and
C<hostgenerate> options to Configure.
Additionally, if targeting an EBCDIC platform from an ASCII host,
or vice versa, you'll need to run Configure with C<-Uhostgenerate>, to
indicate that generate_uudmap should be run on the target.
Finally, there's also a way of having Configure end early, right after
building the host binaries, by cross-compiling without specifying a
C<targethost>.
The incompatible changes include no longer using xconfig.h, xlib, or
Cross.pm, so canned config files and Makefiles will have to be updated.
=item *
Related to the above, there is now a way of specifying the location of sh
(or equivalent) on the target system: C<targetsh>.
For example, Android has its sh in /system/bin/sh, so if cross-compiling
from a more normal Unixy system with sh in /bin/sh, "targetsh" would end
up as /system/bin/sh, and "sh" as /bin/sh.
=item *
By default, B<gcc> 4.9 performs some optimizations that break perl. The
B<-fwrapv> option disables those optimizations (and probably others), so
F<Configure> now adds B<-fwrapv> for B<gcc> 4.3 and later. (Older versions
might have similar problems, but B<-fwrapv> was broken before 4.3, and the
optimizations probably won't go away.) F<Configure> omits the option if the
user requests B<-fno-wrapv>, which disables B<-fwrapv>, or
B<-fsanitize=undefined>, which turns the overflows B<-fwrapv> ignores into
runtime errors.
[L<perl #121505|https://rt.perl.org/Public/Bug/Display.html?id=121505>]
=back
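The option-handling behaviour described earlier in this list for
F<installperl> and F<installman> (single-letter options with long names,
rejection of invalid options and of non-option arguments, and a C<--help>
option) can be sketched with L<Getopt::Long> as follows. The option names and
the C<parse_args> helper are hypothetical and are not taken from either
script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Returns (\%options, undef) on success or (undef, $error) on failure.
sub parse_args {
    my @args = @_;
    my %opt  = (verbose => 0, notify => 0, help => 0);
    my @complaints;

    # Getopt::Long reports problems via warn(); collect them instead.
    local $SIG{__WARN__} = sub { push @complaints, @_ };

    my $ok = GetOptionsFromArray(
        \@args,
        'verbose|v' => \$opt{verbose},    # each short name has a long alias
        'notify|n'  => \$opt{notify},
        'help|h'    => \$opt{help},
    );

    return (undef, join '', @complaints) unless $ok;       # invalid option
    return (undef, "unexpected argument: $args[0]\n")       # non-option word
        if @args;
    return (\%opt, undef);
}

my ($opts,    $err)     = parse_args('-v', '--notify');
my ($bad_opt, $bad_err) = parse_args('--bogus');
my ($extra,   $ext_err) = parse_args('-v', 'stray');
```

A caller would typically print a usage message and exit when an error is
returned or when C<help> is set.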
=head1 Testing
=over 4
=item *
The C<test.valgrind> make target now allows tests to be run in parallel.
This target allows Perl's test suite to be run under Valgrind, which detects
certain sorts of C programming errors, though at significant cost in running
time. On suitable hardware, allowing parallel execution claws back a lot of
that additional cost. [perl #121431]
=item *
Various tests in F<t/porting/> are no longer skipped when the perl
F<.git> directory is outside the perl tree and pointed to by
C<$GIT_DIR>. [perl #120505]
=item *
The test suite no longer fails when the user's interactive shell maintains a
C<$PWD> environment variable, but the F</bin/sh> used for running tests
doesn't.
=back
=head1 Platform Support
=head2 New Platforms
=over 4
=item Android
Perl can now be built for Android, either natively or through
cross-compilation, for all three currently available architectures (ARM,
MIPS, and x86), on a wide range of versions.
=item Bitrig
Compile support has been added for Bitrig, a fork of OpenBSD.
=item FreeMiNT
Support has been added for FreeMiNT, a free open-source OS for the Atari ST
system and its successors, based on the original MiNT that was officially
adopted by Atari.
=item Synology
Synology ships its NAS boxes with a lean Linux distribution (DSM) on relatively
cheap CPUs (like the Marvell Kirkwood mv6282 - ARMv5tel or Freescale QorIQ
P1022 ppc - e500v2) not meant for workstations or development. These boxes
should now build. The basic problem is the non-standard location of tools.
=back
=head2 Discontinued Platforms
=over 4
=item C<sfio>
Code related to supporting the C<sfio> I/O system has been removed.
Perl 5.004 added support to use the native API of C<sfio>, AT&T's Safe/Fast
I/O library. This code still built with v5.8.0, albeit with many regression
tests failing, but was inadvertently broken before the v5.8.1 release,
meaning that it has not worked on any version of Perl released since then.
In over a decade we have received no bug reports about this, hence it is clear
that no-one is using this functionality on any version of Perl that is still
supported to any degree.
=item AT&T 3b1
Configure support for the 3b1, also known as the AT&T Unix PC (and the similar
AT&T 7300), has been removed.
=item DG/UX
DG/UX was a Unix sold by Data General. The last release was in April 2001.
It only runs on Data General's own hardware.
=item EBCDIC
In the absence of a regular source of smoke reports, code intended to support
native EBCDIC platforms will be removed from perl before 5.22.0.
=back
=head2 Platform-Specific Notes
=over 4
=item Cygwin
=over 4
=item *
recv() on a connected handle would populate the returned sender
address with whatever happened to be in the working buffer. recv()
now uses a workaround similar to the Win32 recv() wrapper and returns
an empty string when recvfrom(2) doesn't modify the supplied address
length. [perl #118843]
=item *
Fixed a build error in cygwin.c on Cygwin 1.7.28.
Tests now handle the errors that occur when C<cygserver> isn't
running.
=back
=item GNU/Hurd
The BSD compatibility library C<libbsd> is no longer required for builds.
=item Linux
The hints file now looks for C<libgdbm_compat> only if C<libgdbm> itself is
also wanted. The former is never useful without the latter, and in some
circumstances, including it could actually prevent building.
=item Mac OS
The build system now honors an C<ld> setting supplied by the user running
F<Configure>.
=item MidnightBSD
C<objformat> was removed from version 0.4-RELEASE of MidnightBSD and had been
deprecated on earlier versions. This caused the build environment to be
erroneously configured for C<a.out> rather than C<elf>. This has now been
corrected.
=item Mixed-endian platforms
The code supporting C<pack> and C<unpack> operations on mixed endian
platforms has been removed. We believe that Perl has long been unable to
build on mixed endian architectures (such as PDP-11s), so we don't think
that this change will affect any platforms which were able to build v5.18.0.
=item VMS
=over 4
=item *
The C<PERL_ENV_TABLES> feature to control the population of %ENV at perl
start-up was broken in Perl 5.16.0 but has now been fixed.
=item *
Skip access checks on remotes in opendir(). [perl #121002]
=item *
A check for glob metacharacters in a path returned by the
L<C<glob()>|perlfunc/glob> operator has been replaced with a check for VMS
wildcard characters. This saves a significant number of unnecessary
L<C<lstat()>|perlfunc/lstat> calls such that some simple glob operations become
60-80% faster.
=back
=item Win32
=over 4
=item *
C<rename> and C<link> on Win32 now set $! to ENOSPC and EDQUOT when
appropriate. [perl #119857]
=item *
The BUILD_STATIC and ALL_STATIC makefile options for linking some or (nearly)
all extensions statically (into perl520.dll, and into a separate
perl-static.exe too) were broken for MinGW builds. This has now been fixed.
The ALL_STATIC option has also been improved to include the Encode and Win32
extensions (for both VC++ and MinGW builds).
=item *
Support for building with Visual C++ 2013 has been added. There are currently
two possible test failures (see L<perlwin32/"Testing Perl on Windows">) which
will hopefully be resolved soon.
=item *
Experimental support for building with Intel C++ Compiler has been added. The
nmake makefile (win32/Makefile) and the dmake makefile (win32/makefile.mk) can
be used. A "nmake test" will not pass at this time due to F<cpan/CGI/t/url.t>.
=item *
Killing a process tree with L<perlfunc/kill> and a negative signal was broken
starting in 5.18.0. In this bug, C<kill> always returned 0 for a negative
signal even for valid PIDs, and no processes were terminated. This has been
fixed [perl #121230].
=item *
The time taken to build perl on Windows has been reduced quite significantly
(time savings in the region of 30-40% are typically seen) by reducing the
number of, usually failing, I/O calls for each L<C<require()>|perlfunc/require>
(for B<miniperl.exe> only).
[L<perl #121119|https://rt.perl.org/Public/Bug/Display.html?id=121119>]
=item *
About 15 minutes of idle sleeping was removed from running C<make test> due to
a bug in which the timeout monitor used for tests could not be cancelled once
the test completes, and the full timeout period elapsed before running the next
test file.
[L<perl #121395|https://rt.perl.org/Public/Bug/Display.html?id=121395>]
=item *
On a perl built without pseudo-fork (pseudo-fork builds were not affected by
this bug), killing a process tree with L<C<kill()>|perlfunc/kill> and a negative
signal resulted in C<kill()> inverting the returned value. For example, if
C<kill()> killed 1 process tree PID then it returned 0 instead of 1, and if
C<kill()> was passed 2 invalid PIDs then it returned 2 instead of 0. This has
probably been the case since the process tree kill feature was implemented on
Win32. It has now been corrected to follow the documented behaviour.
[L<perl #121230|https://rt.perl.org/Public/Bug/Display.html?id=121230>]
=item *
When building a 64-bit perl, an uninitialized memory read in B<miniperl.exe>,
used during the build process, could lead to a 4GB B<wperl.exe> being created.
This has now been fixed. (Note that B<perl.exe> itself was unaffected, but
obviously B<wperl.exe> would have been completely broken.)
[L<perl #121471|https://rt.perl.org/Public/Bug/Display.html?id=121471>]
=item *
Perl can now be built with B<gcc> version 4.8.1 from L<http://www.mingw.org>.
This was previously broken due to an incorrect definition of DllMain() in one
of perl's source files. Earlier B<gcc> versions were also affected when using
version 4 of the w32api package. Versions of B<gcc> available from
L<http://mingw-w64.sourceforge.net/> were not affected.
[L<perl #121643|https://rt.perl.org/Public/Bug/Display.html?id=121643>]
=item *
The test harness now has no failures when perl is built on a FAT drive with the
Windows OS on an NTFS drive.
[L<perl #21442|https://rt.perl.org/Public/Bug/Display.html?id=21442>]
=item *
When cloning the context stack in fork() emulation, Perl_cx_dup()
would crash accessing parameter information for context stack entries
that included no parameters, as with C<&foo;>.
[L<perl #121721|https://rt.perl.org/Public/Bug/Display.html?id=121721>]
=item *
Introduced by
L<perl #113536|https://rt.perl.org/Public/Bug/Display.html?id=113536>, a memory
leak on every call to C<system> and backticks (C< `` >), on most Win32 Perls
starting from 5.18.0 has been fixed. The memory leak only occurred if you
enabled pseudo-fork in your build of Win32 Perl, and were running that build on
Server 2003 R2 or newer OS. The leak does not appear on WinXP SP3.
[L<perl #121676|https://rt.perl.org/Public/Bug/Display.html?id=121676>]
=back
=item WinCE
=over 4
=item *
The building of XS modules has largely been restored. Several still cannot
(yet) be built but it is now possible to build Perl on WinCE with only a couple
of further patches (to L<Socket> and L<ExtUtils::MakeMaker>), hopefully to be
incorporated soon.
=item *
Perl can now be built in one shot with no user intervention on WinCE by running
C<nmake -f Makefile.ce all>.
Support for building with EVC (Embedded Visual C++) 4 has been restored. Perl
can also be built using Smart Devices for Visual C++ 2005 or 2008.
=back
=back
=head1 Internal Changes
=over 4
=item *
The internal representation has changed for the match variables $1, $2 etc.,
$`, $&, $', ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}. It uses slightly less
memory, avoids string comparisons and numeric conversions during lookup, and
uses 23 fewer lines of C. This change should not affect any external code.
=item *
Arrays now use NULL internally to represent unused slots, instead of
&PL_sv_undef. &PL_sv_undef is no longer treated as a special value, so
av_store(av, 0, &PL_sv_undef) will cause element 0 of that array to hold a
read-only undefined scalar. C<$array[0] = anything> will croak and
C<\$array[0]> will compare equal to C<\undef>.
=item *
The SV returned by HeSVKEY_force() now correctly reflects the UTF8ness of the
underlying hash key when that key is not stored as a SV. [perl #79074]
=item *
Certain rarely used functions and macros available to XS code are now
deprecated. These are:
C<utf8_to_uvuni_buf> (use C<utf8_to_uvchr_buf> instead),
C<valid_utf8_to_uvuni> (use C<utf8_to_uvchr_buf> instead),
C<NATIVE_TO_NEED> (this did not work properly anyway),
and C<ASCII_TO_NEED> (this did not work properly anyway).
Starting in this release, application code almost never needs to
distinguish between the platform's character set and Latin1, on which the
lowest 256 characters of Unicode are based. New code should not use
C<utf8n_to_uvuni> (use C<utf8_to_uvchr_buf> instead),
nor
C<uvuni_to_utf8> (use C<uvchr_to_utf8> instead).
=item *
The Makefile shortcut targets for many rarely (or never) used testing and
profiling targets have been removed, or merged into the only other Makefile
target that uses them. Specifically, these targets are gone, along with
documentation that referenced them or explained how to use them:
    check.third check.utf16 check.utf8 coretest minitest.prep
    minitest.utf16 perl.config.dashg perl.config.dashpg
    perl.config.gcov perl.gcov perl.gprof perl.gprof.config
    perl.pixie perl.pixie.atom perl.pixie.config perl.pixie.irix
    perl.third perl.third.config perl.valgrind.config purecovperl
    pureperl quantperl test.deparse test.taintwarn test.third
    test.torture test.utf16 test.utf8 test_notty.deparse
    test_notty.third test_notty.valgrind test_prep.third
    test_prep.valgrind torturetest ucheck ucheck.third ucheck.utf16
    ucheck.valgrind utest utest.third utest.utf16 utest.valgrind
It's still possible to run the relevant commands by "hand" - no underlying
functionality has been removed.
=item *
It is now possible to keep Perl from initializing locale handling.
For the most part, Perl doesn't pay attention to locale. (See
L<perllocale>.) Nonetheless, until now, on startup, it has always
initialized locale handling to the system default, just in case the
program being executed ends up using locales. (This is one of the first
things a locale-aware program should do, long before Perl knows if it
will actually be needed or not.) This works well except when Perl is
embedded in another application which wants a locale that isn't the
system default. Now, if the environment variable
C<PERL_SKIP_LOCALE_INIT> is set at the time Perl is started, this
initialization step is skipped. Prior to this, on Windows platforms,
the only workaround for this deficiency was to use a hacked-up copy of
internal Perl code. Applications that need to use older Perls can
discover if the embedded Perl they are using needs the workaround by
testing that the C preprocessor symbol C<HAS_SKIP_LOCALE_INIT> is not
defined. [RT #38193]
=item *
C<BmRARE> and C<BmPREVIOUS> have been removed. They were not used anywhere
and are not part of the API. For XS modules, they are now #defined as 0.
=item *
C<sv_force_normal>, which usually croaks on read-only values, used to allow
read-only values to be modified at compile time. This has been changed to
croak on read-only values regardless. This change uncovered several core
bugs.
=item *
Perl's new copy-on-write mechanism (which is now enabled by default),
allows any C<SvPOK> scalar to be automatically upgraded to a copy-on-write
scalar when copied. A reference count on the string buffer is stored in
the string buffer itself.
For example:
    $ perl -MDevel::Peek -e'$a="abc"; $b = $a; Dump $a; Dump $b'
    SV = PV(0x260cd80) at 0x2620ad8
      REFCNT = 1
      FLAGS = (POK,IsCOW,pPOK)
      PV = 0x2619bc0 "abc"\0
      CUR = 3
      LEN = 16
      COW_REFCNT = 1
    SV = PV(0x260ce30) at 0x2620b20
      REFCNT = 1
      FLAGS = (POK,IsCOW,pPOK)
      PV = 0x2619bc0 "abc"\0
      CUR = 3
      LEN = 16
      COW_REFCNT = 1
Note that both scalars share the same PV buffer and have a COW_REFCNT
greater than zero.
This means that XS code which wishes to modify the C<SvPVX()> buffer of an
SV should call C<SvPV_force()> or similar first, to ensure a valid (and
unshared) buffer, and to call C<SvSETMAGIC()> afterwards. This in fact has
always been the case (for example hash keys were already copy-on-write);
this change just spreads the COW behaviour to a wider variety of SVs.
One important difference is that before 5.18.0, shared hash-key scalars
used to have the C<SvREADONLY> flag set; this is no longer the case.
This new behaviour can still be disabled by running F<Configure> with
B<-Accflags=-DPERL_NO_COW>. This option will probably be removed in Perl
5.22.
=item *
C<PL_sawampersand> is now a constant. The switch this variable provided
(to enable/disable the pre-match copy depending on whether C<$&> had been
seen) has been removed and replaced with copy-on-write, eliminating a few
bugs.
The previous behaviour can still be enabled by running F<Configure> with
B<-Accflags=-DPERL_SAWAMPERSAND>.
=item *
The functions C<my_swap>, C<my_htonl> and C<my_ntohl> have been removed.
It is unclear why these functions were ever marked as I<A>, part of the
API. XS code can't call them directly, as it can't rely on them being
compiled. Unsurprisingly, no code on CPAN references them.
=item *
The signature of the C<Perl_re_intuit_start()> regex function has changed;
the function pointer C<intuit> in the regex engine plugin structure
has also changed accordingly. A new parameter, C<strbeg> has been added;
this has the same meaning as the same-named parameter in
C<Perl_regexec_flags>. Previously intuit would try to guess the start of
the string from the passed SV (if any), and would sometimes get it wrong
(e.g. with an overloaded SV).
=item *
The signature of the C<Perl_regexec_flags()> regex function has
changed; the function pointer C<exec> in the regex engine plugin
structure has also changed to match. The C<minend> parameter now has
type C<SSize_t> to better support 64-bit systems.
=item *
XS code may use various macros to change the case of a character or code
point (for example C<toLOWER_utf8()>). Until now only a couple of these
were documented; they should now be used in preference to calling the
underlying functions. See L<perlapi/Character case changing>.
=item *
The code dealt rather inconsistently with uids and gids. Some
places assumed that they could be safely stored in UVs, others
in IVs, others in ints. Four new macros are introduced:
    SvUID(), sv_setuid(), SvGID(), and sv_setgid()
=item *
C<sv_pos_b2u_flags> has been added to the API. It is similar to C<sv_pos_b2u>,
but supports long strings on 64-bit platforms.
=item *
C<PL_exit_flags> can now be used by perl embedders or other XS code to have
perl C<warn> or C<abort> on an attempted exit. [perl #52000]
=item *
Compiling with C<-Accflags=-DPERL_BOOL_AS_CHAR> now allows C99 and C++
compilers to emulate the aliasing of C<bool> to C<char> that perl does for
C89 compilers. [perl #120314]
=item *
The C<sv> argument in L<perlapi/sv_2pv_flags>, L<perlapi/sv_2iv_flags>,
L<perlapi/sv_2uv_flags>, and L<perlapi/sv_2nv_flags> and their older wrappers
sv_2pv, sv_2iv, sv_2uv, sv_2nv, is now non-NULL. Passing NULL now will crash.
When the non-NULL marker was introduced en masse in 5.9.3 the functions
were marked non-NULL, but since the creation of the SV API in 5.0 alpha 2, if
NULL was passed, the functions returned 0 or false-type values. The code that
supports C<sv> argument being non-NULL dates to 5.0 alpha 2 directly, and
indirectly to Perl 1.0 (pre 5.0 api). The lack of documentation that the
functions accepted a NULL C<sv> was corrected in 5.11.0 and between 5.11.0
and 5.19.5 the functions were marked NULLOK. As an optimization the NULLOK code
has now been removed, and the functions became non-NULL marked again, because
core getter-type macros never pass NULL to these functions and would crash
before ever passing NULL.
The only way a NULL C<sv> can be passed to sv_2*v* functions is if XS code
directly calls sv_2*v*. This is unlikely as XS code uses Sv*V* macros to get
the underlying value out of the SV. One possible situation which leads to
a NULL C<sv> being passed to sv_2*v* functions, is if XS code defines its own
getter type Sv*V* macros, which check for NULL B<before> dereferencing and
checking the SV's flags through public API Sv*OK* macros or directly using
private API C<SvFLAGS>, and if C<sv> is NULL, then calling the sv_2*v functions
with a NULL literal or passing the C<sv> containing a NULL value.
=item *
newATTRSUB is now a macro
The public API newATTRSUB was previously a macro to the private
function Perl_newATTRSUB. Function Perl_newATTRSUB has been removed. newATTRSUB
is now a macro to a different internal function.
=item *
Changes in warnings raised by C<utf8n_to_uvchr()>
This bottom level function decodes the first character of a UTF-8 string
into a code point. It is accessible to C<XS> level code, but using it
directly is discouraged. There are higher level functions
that call this that should be used instead, such as
L<perlapi/utf8_to_uvchr_buf>. For completeness though, this documents
some changes to it. Now, tests for malformations are done before any
tests for other potential issues. One of those issues involves code
points so large that they have never appeared in any official standard
(the current standard has scaled back the highest acceptable code point
from earlier versions). It is possible (though not done in CPAN) to
warn and/or forbid these code points, while accepting smaller code
points that are still above the legal Unicode maximum. The warning
message for this now includes the code point if representable on the
machine. Previously it always displayed raw bytes, which is what it
still does for non-representable code points.
=item *
Regexp engine changes that affect the pluggable regex engine interface
Many flags that used to be exposed via regexp.h and used to populate the
extflags member of struct regexp have been removed. These fields were
technically private to Perl's own regexp engine and should not have been
exposed there in the first place.
The affected flags are:
    RXf_NOSCAN
    RXf_CANY_SEEN
    RXf_GPOS_SEEN
    RXf_GPOS_FLOAT
    RXf_ANCH_BOL
    RXf_ANCH_MBOL
    RXf_ANCH_SBOL
    RXf_ANCH_GPOS
As well as the following flag masks:
    RXf_ANCH_SINGLE
    RXf_ANCH
All have been renamed to PREGf_ equivalents and moved to regcomp.h.
The behavior previously achieved by setting one or more of the RXf_ANCH_
flags (via the RXf_ANCH mask) has now been replaced by a *single* flag bit
in extflags:
    RXf_IS_ANCHORED
Pluggable regex engines that previously set these flags should
now set this flag ALONE.
=item *
The Perl core now consistently uses C<av_tindex()> ("the top index of an
array") as a more clearly-named synonym for C<av_len()>.
=item *
The obscure interpreter variable C<PL_timesbuf> is expected to be removed
early in the 5.21.x development series, so that Perl 5.22.0 will not provide
it to XS authors. While the variable still exists in 5.20.0, we hope that
this advance warning of the deprecation will help anyone who is using that
variable.
=back
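As an illustration of the C<PERL_SKIP_LOCALE_INIT> item above, here is a
minimal sketch of starting a child perl with the variable already set in its
environment. The helper name C<run_without_locale_init> and the child
one-liner are hypothetical, and list-form pipe C<open> is assumed to be
available on the platform. The variable is only consulted when an interpreter
starts, so setting it inside a running script affects child interpreters, not
the current one.

```perl
use strict;
use warnings;

# Run a child perl (program text passed as a string) with
# PERL_SKIP_LOCALE_INIT set before that interpreter starts.
sub run_without_locale_init {
    my ($code) = @_;

    # local() restores the parent's environment when this sub returns.
    local $ENV{PERL_SKIP_LOCALE_INIT} = 1;

    # List-form pipe open avoids shell quoting; $^X is the running perl.
    open my $child, '-|', $^X, '-e', $code
        or die "cannot start child perl: $!";
    my $output = do { local $/; <$child> };
    close $child or die "child perl failed (status $?)";
    return $output;
}

my $output = run_without_locale_init('print "child started\n"');
print $output;
```

An embedding application written in C would achieve the same effect by
setting the variable in the process environment before constructing the
interpreter.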
=head1 Selected Bug Fixes
=head2 Regular Expressions
=over 4
=item *
Fixed a small number of regexp constructions that could either fail to
match or crash perl when the string being matched against was
allocated above the 2GB line on 32-bit systems. [RT #118175]
=item *
Various memory leaks involving the parsing of the C<(?[...])> regular
expression construct have been fixed.
=item *
C<(?[...])> now allows interpolation of precompiled patterns consisting of
C<(?[...])> with bracketed character classes inside (C<$pat =
S<qr/(?[ [a] ])/;> S</(?[ $pat ])/>>). Formerly, the brackets would
confuse the regular expression parser.
=item *
The "Quantifier unexpected on zero-length expression" warning message could
appear twice starting in Perl v5.10 for a regular expression also
containing alternations (e.g., "a|b") triggering the trie optimisation.
=item *
Perl v5.18 inadvertently introduced a bug whereby interpolating mixed up-
and down-graded UTF-8 strings in a regex could result in malformed UTF-8
in the pattern: specifically if a downgraded character in the range
C<\x80..\xff> followed a UTF-8 string, e.g.
    utf8::upgrade(  my $u = "\x{e5}");
    utf8::downgrade(my $d = "\x{e5}");
    /$u$d/
[RT #118297]
=item *
In regular expressions containing multiple code blocks, the values of
C<$1>, C<$2>, etc., set by nested regular expression calls would leak from
one block to the next. Now these variables always refer to the outer
regular expression at the start of an embedded block [perl #117917].
=item *
C</$qr/p> was broken in Perl 5.18.0; the C</p> flag was ignored. This has been
fixed. [perl #118213]
=item *
Starting in Perl 5.18.0, a construct like C</[#](?{})/x> would have its C<#>
incorrectly interpreted as a comment. The code block would be skipped,
unparsed. This has been corrected.
=item *
Starting in Perl 5.001, a regular expression like C</[#$a]/x> or C</[#]$a/x>
would have its C<#> incorrectly interpreted as a comment, so the variable would
not interpolate. This has been corrected. [perl #45667]
=item *
Perl 5.18.0 inadvertently made dereferenced regular expressions
S<(C<${ qr// }>)> false as booleans. This has been fixed.
=item *
The use of C<\G> in regular expressions, where it's not at the start of the
pattern, is now slightly less buggy (although it is still somewhat
problematic).
=item *
Where a regular expression included code blocks (C</(?{...})/>), and where the
use of constant overloading triggered a re-compilation of the code block, the
second compilation didn't see its outer lexical scope. This was a regression
in Perl 5.18.0.
=item *
The string position set by C<pos> could shift if the string changed
representation internally to or from utf8. This could happen, e.g., with
references to objects with string overloading.
=item *
Taking references to the return values of two C<pos> calls with the same
argument, and then assigning a reference to one and C<undef> to the other,
could result in assertion failures or memory leaks.
=item *
Elements of @- and @+ now update correctly when they refer to non-existent
captures. Previously, a referenced element (C<$ref = \$-[1]>) could refer to
the wrong match after subsequent matches.
=item *
The code that parses regex backrefs (or ambiguous backref/octals) such as \123
did a simple atoi(), which could wrap round to negative values on long digit
strings and cause segmentation faults. This has now been fixed. [perl
#119505]
=item *
Assigning another typeglob to C<*^R> no longer makes the regular expression
engine crash.
=item *
The C<\N> regular expression escape, when used without the curly braces (to
mean C<[^\n]>), was ignoring a following C<*> if followed by whitespace
under /x. It had been this way since C<\N> to mean C<[^\n]> was introduced
in 5.12.0.
=item *
C<s///>, C<tr///> and C<y///> now work when a wide character is used as the
delimiter. [perl #120463]
=item *
Some cases of unterminated (?...) sequences in regular expressions (e.g.,
C</(?</>) have been fixed to produce the proper error message instead of
"panic: memory wrap". Other cases (e.g., C</(?(/>) have yet to be fixed.
=item *
When a reference to a reference to an overloaded object was returned from
a regular expression C<(??{...})> code block, an incorrect implicit
dereference could take place if the inner reference had been returned by
a code block previously.
=item *
A tied variable returned from C<(??{...})> sees the inner values of match
variables (i.e., the $1 etc. from any matches inside the block) in its
FETCH method. This was not the case if a reference to an overloaded object
was the last thing assigned to the tied variable. Instead, the match
variables referred to the outer pattern during the FETCH call.
=item *
Fix unexpected tainting via regexp using locale. Previously, under certain
conditions, the use of character classes could cause tainting when it
shouldn't. Some character classes are locale-dependent, but before this
patch, sometimes tainting was happening even for character classes that
don't depend on the locale. [perl #120675]
=item *
Under certain conditions, Perl would throw an error if a lookbehind
assertion in a regexp referred to a named subpattern, complaining that
the lookbehind was variable when it wasn't. This has been
fixed. [perl #120600], [perl #120618]. The current fix may be improved
on in the future.
=item *
C<$^R> wasn't available outside of the regular expression that
initialized it. [perl #121070]
=item *
A large set of fixes and refactoring for re_intuit_start() was merged.
The highlights are:
=over
=item *
Fixed a panic when compiling the regular expression
C</\x{100}[xy]\x{100}{2}/>.
=item *
Fixed a performance regression when performing a global pattern match
against a UTF-8 string. [perl #120692]
=item *
Fixed another performance issue where matching a regular expression
like C</ab.{1,2}x/> against a long UTF-8 string would unnecessarily
calculate byte offsets for a large portion of the string. [perl
#120692]
=back
=item *
Fixed an alignment error when compiling regular expressions when built
with GCC on HP-UX 64-bit.
=item *
On 64-bit platforms C<pos> can now be set to a value higher than 2**31-1.
[perl #72766]
=back
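The following sketch exercises three of the fixes listed above from ordinary
Perl code: the C</p> flag on a pattern that interpolates a C<qr//> object, the
truth value of a dereferenced regular expression, and a literal C<#> inside a
bracketed character class under C</x>. The variable names are illustrative
only, and the behaviour described assumes perl 5.20.0 or later.

```perl
use strict;
use warnings;

# 1. /$qr/p: the /p flag on the outer match is honoured, so the
#    ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} variables are populated.
my $qr = qr/b+/;
my @parts;
if ("aabbcc" =~ /$qr/p) {
    @parts = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}
print join('|', @parts), "\n";

# 2. A dereferenced regular expression is true in boolean context.
my $re = qr/x/;
my $deref_is_true = ${$re} ? 1 : 0;
print "deref true: $deref_is_true\n";

# 3. Under /x, '#' inside a bracketed class is a literal character,
#    so the variable after the class still interpolates.
my $suffix = 'b';
my $class_match = ("#b" =~ /[#]$suffix/x) ? 1 : 0;
print "class match: $class_match\n";
```

On such a perl the three lines printed should be C<aa|bb|cc>,
C<deref true: 1> and C<class match: 1>.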
=head2 Perl 5 Debugger and -d
=over 4
=item *
The debugger's C<man> command has been fixed. It was broken in the v5.18.0
release. The C<man> command is aliased to the names C<doc> and C<perldoc> -
all now work again.
=item *
C<@_> is now correctly visible in the debugger, fixing a regression
introduced in v5.18.0's debugger. [RT #118169]
=item *
Under copy-on-write builds (the default as of 5.20.0) C<< ${'_<-e'}[0] >>
no longer gets mangled. This is the first line of input saved for the
debugger's use for one-liners [perl #118627].
=item *
On non-threaded builds, setting C<${"_E<lt>filename"}> to a reference or
typeglob no longer causes C<__FILE__> and some error messages to produce a
corrupt string, and no longer prevents C<#line> directives in string evals from
providing the source lines to the debugger. Threaded builds were unaffected.
=item *
Starting with Perl 5.12, line numbers were off by one if the B<-d> switch was
used on the #! line. Now they are correct.
=item *
C<*DB::DB = sub {} if 0> no longer stops Perl's debugging mode from finding
C<DB::DB> subs declared thereafter.
=item *
C<%{'_<...'}> hashes now set breakpoints on the corresponding C<@{'_<...'}>
rather than whichever array C<@DB::dbline> is aliased to. [perl #119799]
=item *
Call set-magic when setting $DB::sub. [perl #121255]
=item *
The debugger's "n" command now respects lvalue subroutines and steps over
them [perl #118839].
=back
=head2 Lexical Subroutines
=over 4
=item *
Lexical constants (C<my sub a() { 42 }>) no longer crash when inlined.
=item *
Parameter prototypes attached to lexical subroutines are now respected when
compiling sub calls without parentheses. Previously, the prototypes were
honoured only for calls I<with> parentheses. [RT #116735]
=item *
Syntax errors in lexical subroutines in combination with calls to the same
subroutines no longer cause crashes at compile time.
=item *
Deep recursion warnings no longer crash lexical subroutines. [RT #118521]
=item *
The dtrace sub-entry probe now works with lexical subs, instead of
crashing [perl #118305].
=item *
Undefining an inlinable lexical subroutine (C<my sub foo() { 42 } undef
&foo>) would result in a crash if warnings were turned on.
=item *
An undefined lexical sub used as an inherited method no longer crashes.
=item *
The presence of a lexical sub named "CORE" no longer stops the CORE::
prefix from working.
=back
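Two of the lexical subroutine fixes above can be sketched as follows: an
inlinable lexical constant, and a prototype that is respected on a call
without parentheses. The subroutine names C<answer> and C<first_of> are
hypothetical, and the sketch assumes perl 5.20.0 or later, where lexical
subroutines were still an experimental feature (hence the C<no warnings>
line).

```perl
use strict;
use warnings;
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';

# An inlinable lexical constant: the empty prototype means "takes no
# arguments", so "answer + 1" parses as answer() + 1.
my sub answer() { 42 }
my $sum = answer + 1;

# A (\@) prototype makes a call without parentheses pass a reference
# to the array rather than its flattened elements.
my sub first_of(\@) {
    my ($array_ref) = @_;
    return $array_ref->[0];
}
my @items = (10, 20, 30);
my $first = first_of @items;

print "sum=$sum first=$first\n";
```

With the prototypes honoured this should print C<sum=43 first=10>. Were the
C<(\@)> prototype ignored, C<first_of> would receive the flattened list and
the dereference would fail under C<use strict>.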
=head2 Everything Else
=over 4
=item *
The OP allocation code now returns correctly aligned memory in all cases
for C<struct pmop>. Previously it could return memory only aligned to a
4-byte boundary, which is not correct for an ithreads build with 64 bit IVs
on some 32 bit platforms. Notably, this caused the build to fail completely
on sparc GNU/Linux. [RT #118055]
=item *
Evaluating large hashes in scalar context is now much faster, as the number
of used chains in the hash is now cached for larger hashes. Smaller hashes
continue not to store it and calculate it when needed, as this saves one IV.
That would be 1 IV overhead for every object built from a hash. [RT #114576]
=item *
Perl v5.16 inadvertently introduced a bug whereby calls to XSUBs that were
not visible at compile time were treated as lvalues and could be assigned
to, even when the subroutine was not an lvalue sub. This has been fixed.
[RT #117947]
=item *
In Perl v5.18.0 dualvars that had an empty string for the string part but a
non-zero number for the number part started being treated as true. In
previous versions they were treated as false, the string representation
taking precedence. The old behaviour has been restored. [RT #118159]
=item *
Since Perl v5.12, inlining of constants that override built-in keywords of
the same name had countermanded C<use subs>, causing subsequent mentions of
the constant to use the built-in keyword instead. This has been fixed.
=item *
The warning produced by C<-l $handle> now applies to IO refs and globs, not
just to glob refs. That warning is also now UTF8-clean. [RT #117595]
=item *
C<delete local $ENV{nonexistent_env_var}> no longer leaks memory.
=item *
C<sort> and C<require> followed by a keyword prefixed with C<CORE::> now
treat it as a keyword, and not as a subroutine or module name. [RT #24482]
=item *
Through certain conundrums, it is possible to cause the current package to
be freed. Certain operators (C<bless>, C<reset>, C<open>, C<eval>) could
not cope and would crash. They have been made more resilient. [RT #117941]
=item *
Aliasing filehandles through glob-to-glob assignment would not update
internal method caches properly if a package of the same name as the
filehandle existed, resulting in filehandle method calls going to the
package instead. This has been fixed.
=item *
C<./Configure -de -Dusevendorprefix> didn't default. [RT #64126]
=item *
The C<Statement unlikely to be reached> warning was listed in
L<perldiag> as an C<exec>-category warning, but was enabled and disabled
by the C<syntax> category. On the other hand, the C<exec> category
controlled its fatal-ness. It is now entirely handled by the C<exec>
category.
=item *
The "Replacement list is longer than search list" warning for C<tr///> and
C<y///> no longer occurs in the presence of the C</c> flag. [RT #118047]
=item *
Stringification of NVs is not cached so that the lexical locale controls
stringification of the decimal point. [perl #108378] [perl #115800]
=item *
There have been several fixes related to Perl's handling of locales. perl
#38193 was described above in L</Internal Changes>.
Also fixed is
#118197, where the radix (decimal point) character had to be an ASCII
character (which doesn't work for some non-Western languages);
and #115808, in which C<POSIX::setlocale()> on failure returned an
C<undef> which didn't warn about not being defined even if those
warnings were enabled.
=item *
Compiling a C<split> operator whose third argument is a named constant
evaluating to 0 no longer causes the constant's value to change.
=item *
A named constant used as the second argument to C<index> no longer gets
coerced to a string if it is a reference, regular expression, dualvar, etc.
=item *
A named constant evaluating to the undefined value used as the second
argument to C<index> no longer produces "uninitialized" warnings at compile
time. It will still produce them at run time.
=item *
When a scalar was returned from a subroutine in @INC, the referenced scalar
was magically converted into an IO thingy, possibly resulting in "Bizarre
copy" errors if that scalar continued to be used elsewhere. Now Perl uses
an internal copy of the scalar instead.
=item *
Certain uses of the C<sort> operator are optimised to modify an array in
place, such as C<@a = sort @a>. During the sorting, the array is made
read-only. If a sort block should happen to die, then the array remained
read-only even outside the C<sort>. This has been fixed.
=item *
C<$a> and C<$b> inside a sort block are aliased to the actual arguments to
C<sort>, so they can be modified through those two variables. This did not
always work, e.g., for lvalue subs and C<$#ary>, and probably many other
operators. It works now.
=item *
The arguments to C<sort> are now all in list context. If the C<sort>
itself were called in void or scalar context, then I<some>, but not all, of
the arguments used to be in void or scalar context.
=item *
Subroutine prototypes with Unicode characters above U+00FF were getting
mangled during closure cloning. This would happen with subroutines closing
over lexical variables declared outside, and with lexical subs.
=item *
C<UNIVERSAL::can> now treats its first argument the same way that method
calls do: Typeglobs and glob references with non-empty IO slots are treated
as handles, and strings are treated as filehandles, rather than packages,
if a handle with that name exists [perl #113932].
=item *
Method calls on typeglobs (e.g., C<< *ARGV->getline >>) used to stringify
the typeglob and then look it up again. Combined with changes in Perl
5.18.0, this allowed C<< *foo->bar >> to call methods on the "foo" package
(like C<< foo->bar >>). In some cases it could cause the method to be
called on the wrong handle. Now a typeglob argument is treated as a
handle (just like C<< (\*foo)->bar >>), or, if its IO slot is empty, an
error is raised.
=item *
Assigning a vstring to a tied variable or to a subroutine argument aliased
to a nonexistent hash or array element now works, without flattening the
vstring into a regular string.
=item *
C<pos>, C<tie>, C<tied> and C<untie> did not work
properly on subroutine arguments aliased to nonexistent
hash and array elements [perl #77814, #27010].
=item *
The C<< => >> fat arrow operator can now quote built-in keywords even if it
occurs on the next line, making it consistent with how it treats other
barewords.
=item *
Autovivifying a subroutine stub via C<\&$glob> started causing crashes in Perl
5.18.0 if the $glob was merely a copy of a real glob, i.e., a scalar that had
had a glob assigned to it. This has been fixed. [perl #119051]
=item *
Perl used to leak an implementation detail when it came to referencing the
return values of certain operators. C<for ($a+$b) { warn \$_; warn \$_ }> used
to display two different memory addresses, because the C<\> operator was
copying the variable. Under threaded builds, it would also happen for
constants (C<for(1) { ... }>). This has been fixed. [perl #21979, #78194,
#89188, #109746, #114838, #115388]
=item *
The range operator C<..> was returning the same modifiable scalars with each
call, unless it was the only thing in a C<foreach> loop header. This meant
that changes to values within the list returned would be visible the next time
the operator was executed. [perl #3105]
=item *
Constant folding and subroutine inlining no longer cause operations that would
normally return new modifiable scalars to return read-only values instead.
=item *
Closures of the form C<sub () { $some_variable }> are no longer inlined.
Inlining them caused later changes to the variable to be ignored by callers
of the subroutine. [perl #79908]
=item *
Return values of certain operators such as C<ref> would sometimes be shared
between recursive calls to the same subroutine, causing the inner call to
modify the value returned by C<ref> in the outer call. This has been fixed.
=item *
C<__PACKAGE__> and constants returning a package name or hash key are now
consistently read-only. In various previous Perl releases, they have become
mutable under certain circumstances.
=item *
Enabling "used once" warnings no longer causes crashes on stash circularities
created at compile time (C<*Foo::Bar::Foo:: = *Foo::>).
=item *
Undef constants used in hash keys (C<use constant u =E<gt> undef; $h{+u}>) no
longer produce "uninitialized" warnings at compile time.
=item *
Modifying a substitution target inside the substitution replacement no longer
causes crashes.
=item *
The first statement inside a string eval used to use the wrong pragma setting
sometimes during constant folding. C<eval 'uc chr 0xe0'> would randomly choose
between Unicode, byte, and locale semantics. This has been fixed.
=item *
The handling of return values of @INC filters (subroutines returned by
subroutines in @INC) has been fixed in various ways. Previously tied variables
were mishandled, and setting $_ to a reference or typeglob could result in
crashes.
=item *
The C<SvPVbyte> XS function has been fixed to work with tied scalars returning
something other than a string. It used to return utf8 in those cases where
C<SvPV> would.
=item *
Perl 5.18.0 inadvertently made C<--> and C<++> crash on dereferenced regular
expressions, and stopped C<++> from flattening vstrings.
=item *
C<bless> no longer dies with "Can't bless non-reference value" if its first
argument is a tied reference.
=item *
C<reset> with an argument no longer skips copy-on-write scalars, regular
expressions, typeglob copies, and vstrings. Also, when encountering those or
read-only values, it no longer skips any array or hash with the same name.
=item *
C<reset> with an argument now skips scalars aliased to typeglobs
(C<for $z (*foo) { reset "z" }>). Previously it would corrupt memory or crash.
=item *
C<ucfirst> and C<lcfirst> were not respecting the bytes pragma. This was a
regression from Perl 5.12. [perl #117355]
=item *
Changes to C<UNIVERSAL::DESTROY> now update DESTROY caches in all classes,
instead of causing classes that have already had objects destroyed to continue
using the old sub. This was a regression in Perl 5.18. [perl #114864]
=item *
All known false-positive occurrences of the deprecation warning "Useless use of
'\'; doesn't escape metacharacter '%c'", added in Perl 5.18.0, have been
removed. [perl #119101]
=item *
The value of $^E is now saved across signal handlers on Windows. [perl #85104]
=item *
A lexical filehandle (as in C<open my $fh...>) is usually given a name based on
the current package and the name of the variable, e.g. "main::$fh". Under
recursion, the filehandle was losing the "$fh" part of the name. This has been
fixed.
=item *
Uninitialized values returned by XSUBs are no longer exempt from uninitialized
warnings. [perl #118693]
=item *
C<elsif ("")> no longer erroneously produces a warning about void context.
[perl #118753]
=item *
Passing C<undef> to a subroutine now causes @_ to contain the same read-only
undefined scalar that C<undef> returns. Furthermore, C<exists $_[0]> will now
return true if C<undef> was the first argument. [perl #7508, #109726]
=item *
Passing a non-existent array element to a subroutine does not usually
autovivify it unless the subroutine modifies its argument. This did not work
correctly with negative indices and with non-existent elements within the
array. The element would be vivified immediately. The delayed vivification
has been extended to work with those. [perl #118691]
=item *
Assigning references or globs to the scalar returned by $#foo after the @foo
array has been freed no longer causes assertion failures on debugging builds
and memory leaks on regular builds.
=item *
On 64-bit platforms, large ranges like 1..1000000000000 no longer crash, but
eat up all your memory instead. [perl #119161]
=item *
C<__DATA__> now puts the C<DATA> handle in the right package, even if the
current package has been renamed through glob assignment.
=item *
When C<die>, C<last>, C<next>, C<redo>, C<goto> and C<exit> unwind the scope,
it is possible for C<DESTROY> recursively to call a subroutine or format that
is currently being exited. In that case, sometimes the lexical variables
inside the sub would start out having values from the outer call, instead of
being undefined as they should. This has been fixed. [perl #119311]
=item *
${^MPEN} is no longer treated as a synonym for ${^MATCH}.
=item *
Perl now tries a little harder to return the correct line number in
C<(caller)[2]>. [perl #115768]
=item *
Line numbers inside multiline quote-like operators are now reported correctly.
[perl #3643]
=item *
C<#line> directives inside code embedded in quote-like operators are now
respected.
=item *
Line numbers are now correct inside the second here-doc when two here-doc
markers occur on the same line.
=item *
An optimization in Perl 5.18 made incorrect assumptions causing a bad
interaction with the L<Devel::CallParser> CPAN module. If the module was
loaded then lexical variables declared in separate statements following a
C<my(...)> list might fail to be cleared on scope exit.
=item *
C<&xsub> and C<goto &xsub> calls now allow the called subroutine to autovivify
elements of @_.
=item *
C<&xsub> and C<goto &xsub> no longer crash if *_ has been undefined and has no
ARRAY entry (i.e. @_ does not exist).
=item *
C<&xsub> and C<goto &xsub> now work with tied @_.
=item *
Overlong identifiers no longer cause a buffer overflow (and a crash). They
started doing so in Perl 5.18.
=item *
The warning "Scalar value @hash{foo} better written as $hash{foo}" now produces
far fewer false positives. In particular, C<@hash{+function_returning_a_list}>
and C<@hash{ qw "foo bar baz" }> no longer warn. The same applies to array
slices. [perl #28380, #114024]
=item *
C<$! = EINVAL; waitpid(0, WNOHANG);> no longer goes into an internal infinite
loop. [perl #85228]
=item *
A possible segmentation fault in filehandle duplication has been fixed.
=item *
A subroutine in @INC can return a reference to a scalar containing the initial
contents of the file. However, that scalar was freed prematurely if not
referenced elsewhere, giving random results.
=item *
C<last> no longer returns values that the same statement has accumulated so
far, fixing amongst other things the long-standing bug that C<push @a, last>
would try to return the @a, copying it like a scalar in the process and
resulting in the error, "Bizarre copy of ARRAY in last." [perl #3112]
=item *
In some cases, closing file handles opened to pipe to or from a process, which
had been duplicated into a standard handle, would call perl's internal waitpid
wrapper with a pid of zero. With the fix for [perl #85228] this zero pid was
passed to C<waitpid>, possibly blocking the process. This wait for process
zero no longer occurs. [perl #119893]
=item *
C<select> used to ignore magic on the fourth (timeout) argument, leading to
effects such as C<select> blocking indefinitely rather than the expected sleep
time. This has now been fixed. [perl #120102]
=item *
The class name in C<for my class $foo> is now parsed correctly. In the case of
the second character of the class name being followed by a digit (e.g. 'a1b')
this used to give the error "Missing $ on loop variable". [perl #120112]
=item *
Perl 5.18.0 accidentally disallowed C<-bareword> under C<use strict> and
C<use integer>. This has been fixed. [perl #120288]
=item *
C<-a> at the start of a line (or a hyphen with any single letter that is
not a filetest operator) no longer produces an erroneous 'Use of "-a"
without parentheses is ambiguous' warning. [perl #120288]
=item *
Lvalue context is now properly propagated into bare blocks and C<if> and
C<else> blocks in lvalue subroutines. Previously, arrays and hashes would
sometimes incorrectly be flattened when returned in lvalue list context, or
"Bizarre copy" errors could occur. [perl #119797]
=item *
Lvalue context is now propagated to the branches of C<||> and C<&&> (and
their alphabetic equivalents, C<or> and C<and>). This means
C<foreach (pos $x || pos $y) {...}> now allows C<pos> to be modified
through $_.
=item *
C<stat> and C<readline> remember the last handle used; the former
for the special C<_> filehandle, the latter for C<${^LAST_FH}>.
C<eval "*foo if 0"> where *foo was the last handle passed to C<stat>
or C<readline> could cause that handle to be forgotten if the
handle were not opened yet. This has been fixed.
=item *
Various cases of C<delete $::{a}>, C<delete $::{ENV}> etc. causing a crash
have been fixed. [perl #54044]
=item *
Setting C<$!> to EACCES before calling C<require> could affect
C<require>'s behaviour. This has been fixed.
=item *
The "Can't use \1 to mean $1 in expression" warning message now only occurs
on the right-hand (replacement) part of a substitution. Formerly it could
happen in code embedded in the left-hand side, or in any other quote-like
operator.
=item *
Blessing into a reference (C<bless $thisref, $thatref>) has long been
disallowed, but magical scalars passed as the second argument, such as C<$/>
and tied scalars, were exempt. They no longer are. [perl #119809]
=item *
Blessing into a reference was accidentally allowed in 5.18 if the class
argument was a blessed reference with stale method caches (i.e., whose
class had had subs defined since the last method call). It is
disallowed once more, as in 5.16.
=item *
C<< $x->{key} >> where $x was declared as C<my Class $x> no longer crashes
if a Class::FIELDS subroutine stub has been declared.
=item *
C<@$obj{'key'}> and C<${$obj}{key}> used to be exempt from compile-time
field checking ("No such class field"; see L<fields>) but no longer are.
=item *
A nonexistent array element with a large index passed to a subroutine that
ties the array and then tries to access the element no longer results in a
crash.
=item *
Declaring a subroutine stub named NEGATIVE_INDICES no longer makes negative
array indices crash when the current package is a tied array class.
=item *
Declaring a C<require>, C<glob>, or C<do> subroutine stub in the
CORE::GLOBAL:: package no longer makes compilation of calls to the
corresponding functions crash.
=item *
Aliasing CORE::GLOBAL:: functions to constants stopped working in Perl 5.10
but has now been fixed.
=item *
When C<`...`> or C<qx/.../> calls a C<readpipe> override, double-quotish
interpolation now happens, as is the case when there is no override.
Previously, the presence of an override would make these quote-like
operators act like C<q{}>, suppressing interpolation. [perl #115330]
=item *
C<<<<`...`> here-docs (with backticks as the delimiters) now call
C<readpipe> overrides. [perl #119827]
=item *
C<&CORE::exit()> and C<&CORE::die()> now respect L<vmsish> hints.
=item *
Undefining a glob that triggers a DESTROY method that undefines the same
glob is now safe. It used to produce "Attempt to free unreferenced glob
pointer" warnings and leak memory.
=item *
If subroutine redefinition (C<eval 'sub foo{}'> or C<newXS> for XS code)
triggers a DESTROY method on the sub that is being redefined, and that
method assigns a subroutine to the same slot (C<*foo = sub {}>), C<$_[0]>
is no longer left pointing to a freed scalar. Now DESTROY is delayed until
the new subroutine has been installed.
=item *
On Windows, perl no longer calls CloseHandle() on a socket handle. This makes
debugging easier on Windows by removing certain irrelevant bad handle
exceptions. It also fixes a race condition that made socket functions randomly
fail in a Perl process with multiple OS threads, and possible test failures in
F<dist/IO/t/cachepropagate-tcp.t>. [perl #120091/118059]
=item *
Strings encoded in UTF-8, and unusual values such as tied or overloaded
variables or stringified references (and, in recent perls, pure NOK
variables), were generally mishandled in formats that treat the value as a
string and chop it repeatedly, as in C<< ^<<<~~ >> and similar. This has now
been resolved. [perl #33832/45325/113868/119847/119849/119851]
=item *
C<< semctl(..., SETVAL, ...) >> would set the semaphore to the top
32-bits of the supplied integer instead of the bottom 32-bits on
64-bit big-endian systems. [perl #120635]
=item *
C<< readdir() >> now only sets C<$!> on error. C<$!> is no longer set
to C<EBADF> when the terminating C<undef> is read from the directory
unless the system call sets C<$!>. [perl #118651]
=item *
C<&CORE::glob> no longer causes an intermittent crash due to perl's stack
getting corrupted. [perl #119993]
=item *
C<open> with layers that load modules (e.g., "<:encoding(utf8)") no longer
runs the risk of crashing due to stack corruption.
=item *
Perl 5.18 broke autoloading via C<< ->SUPER::foo >> method calls by looking
up AUTOLOAD from the current package rather than the current package's
superclass. This has been fixed. [perl #120694]
=item *
A longstanding bug causing C<do {} until CONSTANT>, where the constant
holds a true value, to read unallocated memory has been resolved. This
would usually happen after a syntax error. In past versions of Perl it has
crashed intermittently. [perl #72406]
=item *
Fix HP-UX C<$!> failure. HP-UX strerror() returns an empty string for an
unknown error code. This caused an assertion to fail under DEBUGGING
builds. Now instead, the returned string for C<"$!"> contains text
indicating the code is for an unknown error.
=item *
Individually-tied elements of @INC (as in C<tie $INC[0]...>) are now
handled correctly. Formerly, whether a sub returned by such a tied element
would be treated as a sub depended on whether a FETCH had occurred
previously.
=item *
C<getc> on a byte-sized handle after the same C<getc> operator had been
used on a utf8 handle used to treat the bytes as utf8, resulting in erratic
behavior (e.g., malformed UTF-8 warnings).
=item *
An initial C<{> at the beginning of a format argument line was always
interpreted as the beginning of a block prior to v5.18. In Perl v5.18, it
started being treated as an ambiguous token. The parser would guess
whether it was supposed to be an anonymous hash constructor or a block
based on the contents. Now the previous behaviour has been restored.
[perl #119973]
=item *
In Perl v5.18 C<undef *_; goto &sub> and C<local *_; goto &sub> started
crashing. This has been fixed. [perl #119949]
=item *
Backticks (C< `` > or C< qx// >) combined with multiple threads on
Win32 could result in output sent to stdout on one thread being
captured by backticks of an external command in another thread.
This could occur for pseudo-forked processes too, as Win32's
pseudo-fork is implemented in terms of threads. [perl #77672]
=item *
C<< open $fh, ">+", undef >> no longer leaks memory when TMPDIR is set
but points to a directory a temporary file cannot be created in. [perl
#120951]
=item *
C< for ( $h{k} || '' ) > no longer auto-vivifies C<$h{k}>. [perl
#120374]
=item *
On Windows machines, Perl now emulates the POSIX use of the environment
for locale initialization. Previously, the environment was ignored.
See L<perllocale/ENVIRONMENT>.
=item *
Fixed a crash when destroying a self-referencing GLOB. [perl #121242]
=back
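Several of the fixes listed above can be observed from plain Perl code. The
following is a minimal sketch, assuming perl 5.20 or later; the subroutine,
variable, and package names (C<first_arg_exists>, C<Demo::Base>, and so on)
are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# 1. Passing undef: exists $_[0] is true when undef was the first argument.
sub first_arg_exists { return exists $_[0] ? 1 : 0 }
my $undef_exists = first_arg_exists(undef);

# 2. Deferred vivification: a nonexistent array element passed to a
#    subroutine is only created if the subroutine assigns to its argument.
sub leave_alone { return }
sub assign_to   { $_[0] = 'x'; return }
my @array = (1, 2, 3);
leave_alone($array[5]);
my $size_after_read = @array;       # array was not extended
assign_to($array[5]);
my $size_after_write = @array;      # element 5 now exists

# 3. for ( $h{k} || '' ) does not autovivify $h{k}.
my %h;
for ( $h{k} || '' ) { }
my $vivified = exists $h{k} ? 1 : 0;

# 4. ->SUPER::method looks up AUTOLOAD starting from the superclass.
{
    package Demo::Base;
    our $AUTOLOAD;
    sub AUTOLOAD {
        my $name = $AUTOLOAD;
        $name =~ s/.*:://;              # keep only the method name
        return if $name eq 'DESTROY';
        return "autoloaded $name";
    }

    package Demo::Derived;
    our @ISA = ('Demo::Base');
    sub greet { my $class = shift; return $class->SUPER::greet() }
}
my $autoloaded = Demo::Derived->greet;

# 5. -bareword is accepted under "use strict" combined with "use integer".
my $dash_word = do { use integer; -foo };

print "exists after undef: $undef_exists\n";
print "array size: $size_after_read, then $size_after_write\n";
print "hash element vivified: $vivified\n";
print "SUPER call: $autoloaded\n";
print "negated bareword: $dash_word\n";
```

On perl 5.18 some of these checks would be expected to behave differently,
which makes the script a rough way to tell the two releases apart.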
=head1 Known Problems
=over 4
=item *
L<IO::Socket> is known to fail tests on AIX 5.3. There is
L<a patch|https://rt.perl.org/Ticket/Display.html?id=120835> in the request
tracker, #120835, which may be applied to future releases.
=item *
The following modules are known to have test failures with this version of
Perl. Patches have been submitted, so there will hopefully be new releases
soon:
=over
=item *
L<Data::Structure::Util> version 0.15
=item *
L<HTML::StripScripts> version 1.05
=item *
L<List::Gather> version 0.08
=back
=back
=head1 Obituary
Diana Rosa, 27, of Rio de Janeiro, went to her long rest on May 10,
2014, along with the plush camel she kept hanging on her computer screen
all the time. She was a passionate Perl hacker who loved the language and its
community, and who never missed a Rio.pm event. She was a true artist, an
enthusiast about writing code, singing arias and graffiting walls. We'll never
forget you.
Greg McCarroll died on August 28, 2013.
Greg was well known for many good reasons. He was one of the organisers of
the first YAPC::Europe, which concluded with an unscheduled auction where he
frantically tried to raise extra money to avoid the conference making a
loss. It was Greg who mistakenly arrived for a london.pm meeting a week
late; some years later he was the one who sold the choice of official
meeting date at a YAPC::Europe auction, and eventually as glorious leader of
london.pm he got to inherit the irreverent confusion that he had created.
Always helpful, friendly and cheerfully optimistic, you will be missed, but
never forgotten.
=head1 Acknowledgements
Perl 5.20.0 represents approximately 12 months of development since Perl 5.18.0
and contains approximately 470,000 lines of changes across 2,900 files from 124
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 280,000 lines of changes to 1,800 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed the
improvements that became Perl 5.20.0:
Aaron Crane, Abhijit Menon-Sen, Abigail, Abir Viqar, Alan Haggai Alavi, Alan
Hourihane, Alexander Voronov, Alexandr Ciornii, Andy Dougherty, Anno Siegel,
Aristotle Pagaltzis, Arthur Axel 'fREW' Schmidt, Brad Gilbert, Brendan Byrd,
Brian Childs, Brian Fraser, Brian Gottreu, Chris 'BinGOs' Williams, Christian
Millour, Colin Kuskie, Craig A. Berry, Dabrien 'Dabe' Murphy, Dagfinn Ilmari
Mannsåker, Daniel Dragan, Darin McBride, David Golden, David Leadbeater, David
Mitchell, David Nicol, David Steinbrunner, Dennis Kaarsemaker, Dominic
Hargreaves, Ed Avis, Eric Brine, Evan Zacks, Father Chrysostomos, Florian
Ragwitz, François Perrad, Gavin Shelley, Gideon Israel Dsouza, Gisle Aas,
Graham Knop, H.Merijn Brand, Hauke D, Heiko Eissfeldt, Hiroo Hayashi, Hojung
Youn, James E Keenan, Jarkko Hietaniemi, Jerry D. Hedden, Jess Robinson, Jesse
Luehrs, Johan Vromans, John Gardiner Myers, John Goodyear, John P. Linderman,
John Peacock, kafka, Kang-min Liu, Karen Etheridge, Karl Williamson, Keedi Kim,
Kent Fredric, kevin dawson, Kevin Falcone, Kevin Ryde, Leon Timmermans, Lukas
Mai, Marc Simpson, Marcel Grünauer, Marco Peereboom, Marcus Holland-Moritz,
Mark Jason Dominus, Martin McGrath, Matthew Horsfall, Max Maischein, Mike
Doherty, Moritz Lenz, Nathan Glenn, Nathan Trapuzzano, Neil Bowers, Neil
Williams, Nicholas Clark, Niels Thykier, Niko Tyni, Olivier Mengué, Owain G.
Ainsworth, Paul Green, Paul Johnson, Peter John Acklam, Peter Martini, Peter
Rabbitson, Petr Písař, Philip Boulain, Philip Guenther, Piotr Roszatycki,
Rafael Garcia-Suarez, Reini Urban, Reuben Thomas, Ricardo Signes, Ruslan
Zakirov, Sergey Alekseev, Shirakata Kentaro, Shlomi Fish, Slaven Rezic,
Smylers, Steffen Müller, Steve Hay, Sullivan Beck, Thomas Sibley, Tobias
Leich, Toby Inkster, Tokuhiro Matsuno, Tom Christiansen, Tom Hukins, Tony Cook,
Victor Efimov, Viktor Turskyi, Vladimir Timofeev, YAMASHINA Hio, Yves Orton,
Zefram, Zsbán Ambrus, Ævar Arnfjörð Bjarmason.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
http://rt.perl.org/perlbug/ . There may also be information at
http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send it
to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be
able to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
If you are reading this file as is, please ignore the funny characters.
This document is written in the POD format (see F<pod/perlpod.pod>) so that
it can be read as POD.
=head1 NAME
perlko - Korean Perl Guide
=head1 DESCRIPTION
Welcome to the world of Perl!
Perl is sometimes called the B<'Practical Extraction and Report Language'>,
and among other popular expansions it is also called the
B<'Pathologically Eclectic Rubbish Lister'>. These are in fact backronyms;
Perl was not named by taking their initial letters. Larry, the creator of
Perl, came up with the first name first and the popular one later. That is
why B<'Perl'> is not written in all capitals. There is no point in arguing
over which expansion is right. Larry endorses both.
You will sometimes see B<'perl'> written with a lowercase p. B<'Perl'> with a
capital P refers to the language, while lowercase B<'perl'> refers to the
interpreter used to compile and run your programs.
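=head1 About Perl
Perl was originally created for text processing, but it is now a
general-purpose programming language widely used in many areas, including
system administration, web development, network programming, and GUI
development.
The language is intended to be practical (easy to use, efficient, complete)
rather than beautiful (tiny, elegant, minimal).
Its most important features are that it is easy to use, supports both
procedural and object-oriented programming, has powerful built-in text
processing, and has one of the world's most impressive collections of
third-party modules.
The language features of Perl are introduced in F<pod/perlintro.pod>.
The most important changes in this release are discussed in
F<pod/perldelta.pod>.
There are also many Perl books from various publishers covering a wide range
of topics. See F<pod/perlbook.pod> for details.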
=head1 Installation
If you are using a relatively modern operating system and want to install
this version of Perl locally, run the following commands.
./Configure -des -Dprefix=$HOME/localperl
make test
make install
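These commands configure and compile Perl for your platform, run the
regression tests, and install perl in the F<localperl> directory under your
home directory.
If you run into any problems, or need to install a customized version of
Perl, read the detailed instructions in the F<INSTALL> file included in this
distribution.
In addition, there are a number of F<README> files with help and hints on
building and using Perl on a wide variety of less common platforms.
Once Perl is installed, a wealth of documentation is available through the
C<perldoc> tool. To get started, run the following command.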
perldoc perl
=head1 If You Run Into Problems
Perl is a large and complex system that is used for everything from knitting
to rocket science. If you run into trouble, it is quite likely that someone
else has already solved the problem. If you have checked all the
documentation and are sure you have found a bug, please report it to us with
the C<perlbug> tool. For more information about C<perlbug>, run
C<perldoc perlbug> or C<perlbug> from the command line.
Even if you have Perl up and running, Perl keeps evolving, so there may be a
more recent version that fixes the bugs you have encountered or adds new
features you may find useful.
You can always find the latest version of perl on a CPAN (Comprehensive Perl
Archive Network) site at L<http://www.cpan.org/src/>.
If you want to submit a simple patch to the perl source, see
B<"SUPER QUICK PATCH GUIDE"> in F<pod/perlhack.pod>.
Just a personal note:
I want you to know that I create nice things like this
because it pleases the B<"Author"> of my story.
If this bothers you, then your notion of B<"Authorship">
needs some revision. But you can use
Perl anyway. :-)
- The B<"Author">.
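=head1 Encodings
Since version 5.8.0, Perl has had extensive support for Unicode/ISO 10646.
As part of its Unicode support, Perl supports a large number of encodings
that were used around the world before Unicode, including in China, Japan
and Korea, and that are still widely used today. Unicode aims to accommodate
the writing systems of all languages used in the world (the Latin, Cyrillic
and Greek alphabets of Europe, the Brahmic scripts of India and Southeast
Asia, the Arabic and Hebrew scripts, the Han characters of CJK, Korean
Hangul, Japanese Kana, the writing systems of Native North Americans, and so
on). It therefore includes not only every character available in existing
character sets and encodings specific to a language, country or operating
system, but also a great many characters that those character sets did not
support.
Perl uses Unicode internally to represent characters.
More concretely, you can use UTF-8 strings in Perl scripts, and functions
and operators (for example, regular expressions, index, substr) work on
Unicode characters rather than on bytes.
See F<pod/perlunicode.pod> for details.
The L<Encode> module helps with input and output in the national and
language-specific encodings that were widely used before Unicode became
widespread and still are, and with handling data and documents in those
encodings.
Above all, the L<Encode> module makes it easy to convert between a large
number of encodings.
=head2 The Encode module
=head3 Supported encodings
The L<Encode> module supports the following Korean encodings.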
=over 4
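=item * C<euc-kr>
A multibyte encoding that combines US-ASCII and KS X 1001, commonly known as
Wansung. See KS X 2901 and RFC 1557.
=item * C<cp949>
Extended Wansung, used in MS-Windows 9x/ME. It is euc-kr plus 8,822
additional Hangul syllables. Its aliases are uhc, windows-949,
x-windows-949 and ks_c_5601-1987. The last one is not an appropriate name,
but Microsoft products use it to mean CP949.
=item * C<johab>
The Johab encoding specified in Annex 3 of KS X 1001:1998. Its character
repertoire is the same as that of cp949 (US-ASCII and KS X 1001 plus 8,822
Hangul syllables), but the encoding scheme is completely different.
=item * C<iso-2022-kr>
The encoding for Korean Internet mail exchange specified in RFC 1557. It
shares the US-ASCII and KS X 1001 repertoire with euc-kr but uses a
different encoding scheme.
It was used until around 1997-8 and is no longer used for mail exchange.
=item * C<ksc5601-raw>
KS X 1001 (KS C 5601) placed in GL (that is, with the MSB set to 0).
It is rarely used on its own without US-ASCII, except as a font encoding in
X11 and similar systems (ksc5601.1987-0, where '0' means GL).
KS C 5601 was renamed KS X 1001 in 1997. In 1998 two characters (the euro
sign and the registered trademark sign) were added.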
=back
=head3 Conversion examples
For example, to convert a file in the euc-kr encoding to UTF-8, run the
following on the command line.
perl -Mencoding=euc-kr,STDOUT,utf8 -pe1 < file.euc-kr > file.utf8
To convert in the opposite direction, run the following.
perl -Mencoding=utf8,STDOUT,euc-kr -pe1 < file.utf8 > file.euc-kr
F<piconv>, which ships with Perl, makes such conversions more convenient.
It is a pure-Perl utility built on the L<Encode> module and, as its name
suggests, is modeled on the Unix C<iconv>.
It is used as follows.
piconv -f euc-kr -t utf8 < file.euc-kr > file.utf8
piconv -f utf8 -t euc-kr < file.utf8 > file.euc-kr
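The same conversion can also be done inside a program with the L<Encode>
functions C<decode> and C<encode>. The following is a minimal sketch; the
byte string is an illustrative sample (the two Hangul syllables of the word
"Hangul" in euc-kr), not data from any particular file. Note also that the
C<encoding> pragma used in the one-liners above has been deprecated since
Perl 5.18, so an explicit C<decode>/C<encode> pair may be the safer choice on
newer perls.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Two Hangul syllables as raw euc-kr bytes (sample data for illustration).
my $euckr_bytes = "\xC7\xD1\xB1\xDB";

# Bytes in a legacy encoding -> Perl character string.
my $text = decode('euc-kr', $euckr_bytes);

# Perl character string -> UTF-8 bytes.
my $utf8_bytes = encode('UTF-8', $text);

printf "%d characters, %d euc-kr bytes, %d UTF-8 bytes\n",
    length($text), length($euckr_bytes), length($utf8_bytes);
# prints "2 characters, 4 euc-kr bytes, 6 UTF-8 bytes"
```

Decoding first and encoding afterwards keeps the program working on
characters rather than bytes in between, so C<length>, C<substr> and regular
expressions see two characters here, not four or six bytes.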
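=head3 Best practices
Perl uses UTF-8 internally by default and supports a variety of encodings
through the Encode module, but we recommend always following the rules below
to reduce the likelihood of the various encoding-related problems that can
arise.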
=over 4
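=item * Always save source code in the UTF-8 encoding
=item * Use the C<use utf8;> pragma at the top of the source code
=item * Treat the encodings of the source code, terminal, operating system, and data as separate things
=item * Use an explicit encoding on input and output filehandles
=item * Beware of double encoding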
=back
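A minimal sketch of these rules in practice: the source is saved as UTF-8 and
declares C<use utf8;>, and every filehandle gets an explicit encoding layer.
The temporary file and the sample string are assumptions made only for this
illustration.

```perl
use strict;
use warnings;
use utf8;                            # this source file is saved as UTF-8
use File::Temp qw(tempfile);

# Scratch file used only for the demonstration.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write characters through an explicit UTF-8 layer.
open my $out, '>:encoding(UTF-8)', $path or die "open for write: $!";
print {$out} "한글\n";
close $out or die "close: $!";

# Read them back through the same explicit layer.
open my $in, '<:encoding(UTF-8)', $path or die "open for read: $!";
my $line = <$in>;
close $in;
chomp $line;

my $char_count = length $line;       # characters seen by the program
my $byte_count = -s $path;           # bytes stored on disk

print "characters: $char_count, bytes on disk: $byte_count\n";
```

Because the layer encodes exactly once on output and decodes exactly once on
input, the program never calls C<encode> on data that is already encoded,
which is the usual cause of double encoding.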
=head3 Unicode and Korean encoding resources
=over 4
=item * L<perluniintro>
=item * L<perlunicode>
=item * L<Encode>
=item * L<Encode::KR>
=item * L<encoding>
=item * L<http://www.unicode.org/>
The Unicode Consortium
=item * L<http://std.dkuug.dk/JTC1/SC2/WG2>
The web page of ISO/IEC JTC1/SC2/WG2, which produces ISO/IEC 10646 UCS
(Universal Character Set), the ISO standard that is essentially the same as
Unicode
=item * L<http://www.cl.cam.ac.uk/~mgk25/unicode.html>
UTF-8 and Unicode FAQ for Unix/Linux users
=item * L<http://wiki.kldp.org/Translations/html/UTF8-Unicode-KLDP/UTF8-Unicode-KLDP.html>
Korean translation of the UTF-8 and Unicode FAQ for Unix/Linux users
=back
=head1 Perl resources
The following are some of the official Perl resources.
=over 4
=item * L<http://www.perl.org/>
The official Perl home page
=item * L<http://www.perl.com/>
O'Reilly's Perl web page
=item * L<http://www.cpan.org/>
CPAN - Comprehensive Perl Archive Network
=item * L<http://metacpan.org>
MetaCPAN
=item * L<http://lists.perl.org/>
Perl mailing lists
=item * L<http://blogs.perl.org/>
Perl meta-blog
=item * L<http://www.perlmonks.org/>
The monastery for Perl monks
=item * L<http://www.pm.org/groups/asia.html>
Perl Mongers groups in Asia
=item * L<http://www.perladvent.org/>
Perl Advent Calendar
=back
The following Korean-language sites can help you study Perl in more depth.
=over 4
=item * L<http://perl.kr/>
Official portal of the Korean Perl community
=item * L<http://doc.perl.kr/>
Perl documentation Korean translation project
=item * L<http://cafe.naver.com/perlstudy.cafe>
Naver Perl cafe
=item * L<http://www.perl.or.kr/>
Korean Perl users group
=item * L<http://advent.perl.kr>
Seoul.pm Perl Advent Calendar (2010 ~ 2012)
=item * L<http://gypark.pe.kr/wiki/Perl>
GYPARK (Geunyoung Park)'s repository of Perl documents in Korean
=item * L<http://seoul.pm.org>
Seoul.pm - Seoul Perl Mongers
=back
=head1 License
See the B<'LICENSING'> section of the F<README> file.
=head1 AUTHORS
=over
=item * Jarkko Hietaniemi E<lt>jhi@iki.fiE<gt>
=item * 신정식 E<lt>jshin@mailaps.orgE<gt>
=item * 김도형 E<lt>keedi@cpan.orgE<gt>
=back
=cut
If you read this file _as_is_, just ignore the funny characters you
see. It is written in the POD format (see pod/perlpod.pod) which is
specifically designed to be readable as is.
=head1 NAME
perlriscos - Perl version 5 for RISC OS
=head1 DESCRIPTION
This document gives instructions for building Perl for RISC OS. It is
complicated by the need to cross compile. There is a binary version of
perl available from L<http://www.cp15.org/perl/> which you may wish to
use instead of trying to compile it yourself.
=head1 BUILD
You need an installed and working gccsdk cross compiler
L<http://gccsdk.riscos.info/> and REXEN
L<http://www.cp15.org/programming/>
Firstly, copy the source and build a native copy of perl for your host
system.
Then, in the source to be cross compiled:
=over 4
=item 1.
$ ./Configure
=item 2.
Select the riscos hint file. The default answers for the rest of the
questions are usually sufficient.
Note that, if you wish to run Configure non-interactively (see the INSTALL
document for details), to have it select the correct hint file, you'll
need to provide the argument -Dhintfile=riscos on the Configure
command-line.
=item 3.
$ make miniperl
=item 4.
This should build miniperl and then fail when it tries to run it.
=item 5.
Copy the miniperl executable from the native build done earlier to
replace the cross compiled miniperl.
=item 6.
$ make
=item 7.
This will use miniperl to complete the rest of the build.
=back
=head1 AUTHOR
Alex Waugh <alex@alexwaugh.com>
=encoding utf8
=head1 NAME
perljp - Japanese Perl Guide
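=head1 DESCRIPTION
Welcome to the world of Perl!
Starting with Perl 5.8.0, Unicode support was greatly enhanced, and as a
result support was added for character encodings beyond the Latin ones,
including CJK (Chinese, Japanese, Korean). Unicode is a standard that aims
to handle all the world's characters in a single character encoding. It
already includes characters from East to West and everything in between
(Greek, Cyrillic, Arabic, Hebrew, Devanagari, and so on), as well as
characters that OS vendors had previously defined on their own (PC and
Macintosh).
Perl itself works in Unicode. String literals and regular expressions in
Perl scripts assume Unicode. For input and output, the "Encode" module,
which handles the various character encodings that have been in use, ships
as standard, so converting between Unicode and these encodings is easy.
Encode currently supports the following character encodings.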
7bit-jis AdobeStandardEncoding AdobeSymbol AdobeZdingbat
ascii big5 big5-hkscs cp1006
cp1026 cp1047 cp1250 cp1251
cp1252 cp1253 cp1254 cp1255
cp1256 cp1257 cp1258 cp37
cp424 cp437 cp500 cp737
cp775 cp850 cp852 cp855
cp856 cp857 cp860 cp861
cp862 cp863 cp864 cp865
cp866 cp869 cp874 cp875
cp932 cp936 cp949 cp950
dingbats euc-cn euc-jp euc-kr
gb12345-raw gb2312-raw gsm0338 hp-roman8
hz iso-2022-jp iso-2022-jp-1 iso-8859-1
iso-8859-10 iso-8859-11 iso-8859-13 iso-8859-14
iso-8859-15 iso-8859-16 iso-8859-2 iso-8859-3
iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7
iso-8859-8 iso-8859-9 iso-ir-165 jis0201-raw
jis0208-raw jis0212-raw johab koi8-f
koi8-r koi8-u ksc5601-raw MacArabic
MacCentralEurRoman MacChineseSimp MacChineseTrad MacCroatian
MacCyrillic MacDingbats MacFarsi MacGreek
MacHebrew MacIcelandic MacJapanese MacKorean
MacRoman MacRomanian MacRumanian MacSami
MacSymbol MacThai MacTurkish MacUkrainian
nextstep posix-bc shiftjis symbol
UCS-2BE UCS-2LE UTF-16 UTF-16BE
UTF-16LE UTF-32 UTF-32BE UTF-32LE
utf8 viscii
(114 encodings in total)
For example, to convert a file in encoding FOO to UTF-8, do the following.
perl -Mencoding=FOO,STDOUT,utf8 -pe1 < file.FOO > file.utf8
Perl also ships with piconv, a character encoding conversion utility written entirely in Perl, so you can do the following as well.
piconv -f FOO -t utf8 < file.FOO > file.utf8
piconv -f utf8 -t FOO < file.utf8 > file.FOO
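=head2 About (jcode.pl|Jcode.pm|JPerl)
Before 5.8, Perl could handle literals only, and only if the script was written in EUC-JP. In addition, Jcode.pm ( L<http://openlab.ring.gr.jp/Jcode/> ) existed as a module for handling input and output, and jcode.pl existed as a utility for perl4; many readers will know that both were widely used in CGI programs that handle Japanese. However, it was not possible to handle regular expressions in Japanese properly.
For Perl 5.005 and earlier there was Jperl, a localised version specialised for Japanese ( L<http://homepage2.nifty.com/kipp/perl/jperl/index.html> ). There was also MacJPerl, a Japanese version of MacPerl, the Perl for Mac OS 9.x/Classic ( L<http://habilis.net/macjperl/> ). These could handle Shift_JIS directly in addition to EUC-JP, and could also handle regular expressions in Japanese.
In Perl 5.8, all of these capabilities are provided by Perl itself. Moreover, Perl can handle not only Japanese but all 114 of the encodings listed above, and can do so at the same time. New character encoding modules can also easily be obtained from CPAN and elsewhere.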
=over 4
=item *
Input and output
The following examples all convert Shift_JIS input to EUC-JP and print it.
# jcode.pl
require "jcode.pl";
while(<>){
jcode::convert(*_, 'euc', 'sjis');
print;
}
# Jcode.pm
use Jcode;
while(<>){
print Jcode->new($_, 'sjis')->euc;
}
# Perl 5.8
use Encode;
while(<>){
from_to($_, 'shiftjis', 'euc-jp');
print;
}
# Perl 5.8 - using the encoding pragma
use encoding 'euc-jp', STDIN => 'shiftjis';
while(<>){
print;
}
=item *
Jperl-compatible scripts
Most scripts written for Jperl should be usable without modification simply by changing the so-called "shebang" line.
#!/path/to/jperl
↓
#!/path/to/perl -Mencoding=euc-jp
For details, see perldoc encoding.
=back
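=head2 Further details
Perl comes with a vast amount of documentation, which covers in detail Perl's new features, its Unicode support, and how to use the Encode module (unfortunately, most of it is in English). You can view some of it with the following commands.
perldoc perlunicode # Perl's Unicode support in general
perldoc Encode # About the Encode module
perldoc Encode::JP # About Japanese character encodings in particular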
=head2 URLs about Perl in general
=over 4
=item L<http://www.perl.com/>
Perl home page (O'Reilly and Associates)
=item L<http://www.cpan.org/>
CPAN (Comprehensive Perl Archive Network)
=item L<http://lists.perl.org/>
Perl mailing lists
=back
=head2 URLs useful for learning Perl
=over 4
=item L<http://www.oreilly.com.tw/>
O'Reilly's Perl books (Traditional Chinese)
=item L<http://www.oreilly.com.cn/>
O'Reilly's Perl books (Simplified Chinese)
=item L<http://www.oreilly.co.jp/catalog/>
O'Reilly's Perl books (Japanese)
=back
=head2 Perl user groups
=over 4
=item L<http://www.pm.org/groups/asia.html>
=back
=head2 Unicode-related URLs
=over 4
=item L<http://www.unicode.org/>
Unicode Consortium (the body that defines the Unicode standard)
=item L<http://www.cl.cam.ac.uk/%7Emgk25/unicode.html>
UTF-8 and Unicode FAQ for Unix/Linux
=item L<http://wiki.kldp.org/Translations/html/UTF8-Unicode-KLDP/UTF8-Unicode-KLDP.html>
UTF-8 and Unicode FAQ for Unix/Linux (Korean translation)
=back
=head1 AUTHORS
Jarkko Hietaniemi E<lt>jhi@iki.fiE<gt>
Dan Kogai (小飼 弾) E<lt>dankogai@dan.co.jpE<gt>
=cut
=head1 NAME
perlreguts - Description of the Perl regular expression engine.
=head1 DESCRIPTION
This document is an attempt to shine some light on the guts of the regex
engine and how it works. The regex engine represents a significant chunk
of the perl codebase, but is relatively poorly understood. This document
is a meagre attempt at addressing this situation. It is derived from the
author's experience, comments in the source code, other papers on the
regex engine, feedback on the perl5-porters mail list, and no doubt other
places as well.
B<NOTICE!> It should be clearly understood that the behavior and
structures discussed in this document represent the state of the engine as the
author understood it at the time of writing. It is B<NOT> an API
definition; it is purely an internals guide for those who want to hack
the regex engine, or understand how the regex engine works. Readers of
this document are expected to understand perl's regex syntax and its
usage in detail. If you want to learn about the basics of Perl's
regular expressions, see L<perlre>. And if you want to replace the
regex engine with your own, see L<perlreapi>.
=head1 OVERVIEW
=head2 A quick note on terms
There is some debate as to whether to say "regexp" or "regex". In this
document we will use the term "regex" unless there is a special reason
not to, in which case we will explain why.
When speaking about regexes we need to distinguish between their source
code form and their internal form. In this document we will use the term
"pattern" when we speak of their textual, source code form, and the term
"program" when we speak of their internal representation. These
correspond to the terms I<S-regex> and I<B-regex> that Mark Jason
Dominus employs in his paper on "Rx" ([1] in L</REFERENCES>).
=head2 What is a regular expression engine?
A regular expression engine is a program that takes a set of constraints
specified in a mini-language, and then applies those constraints to a
target string, and determines whether or not the string satisfies the
constraints. See L<perlre> for a full definition of the language.
In less grandiose terms, the first part of the job is to turn a pattern into
something the computer can efficiently use to find the matching point in
the string, and the second part is performing the search itself.
To do this we need to produce a program by parsing the text. We then
need to execute the program to find the point in the string that
matches. And we need to do the whole thing efficiently.
=head2 Structure of a Regexp Program
=head3 High Level
Although it is a bit confusing and some people object to the terminology, it
is worth taking a look at a comment that has
been in F<regexp.h> for years:
I<This is essentially a linear encoding of a nondeterministic
finite-state machine (aka syntax charts or "railroad normal form" in
parsing technology).>
The term "railroad normal form" is a bit esoteric, with "syntax
diagram/charts", or "railroad diagram/charts" being more common terms.
Nevertheless it provides a useful mental image of a regex program: each
node can be thought of as a unit of track, with a single entry and in
most cases a single exit point (there are pieces of track that fork, but
statistically not many), and the whole forms a layout with a
single entry and single exit point. The matching process can be thought
of as a car that moves along the track, with the particular route through
the system being determined by the character read at each possible
connector point. A car can fall off the track at any point but it may
only proceed as long as it matches the track.
Thus the pattern C</foo(?:\w+|\d+|\s+)bar/> can be thought of as the
following chart:
[start]
|
<foo>
|
+-----+-----+
| | |
<\w+> <\d+> <\s+>
| | |
+-----+-----+
|
<bar>
|
[end]
The truth of the matter is that perl's regular expressions these days are
much more complex than this kind of structure, but visualising it this way
can help when trying to get your bearings, and it matches the
current implementation pretty closely.
To be more precise, we will say that a regex program is an encoding
of a graph. Each node in the graph corresponds to part of
the original regex pattern, such as a literal string or a branch,
and has a pointer to the nodes representing the next component
to be matched. Since "node" and "opcode" already have other meanings in the
perl source, we will call the nodes in a regex program "regops".
The program is represented by an array of C<regnode> structures, one or
more of which represent a single regop of the program. Struct
C<regnode> is the smallest struct needed, and has a field structure which is
shared with all the other larger structures.
The "next" pointers of all regops except C<BRANCH> implement concatenation;
a "next" pointer with a C<BRANCH> on both ends of it is connecting two
alternatives. [Here we have one of the subtle syntax dependencies: an
individual C<BRANCH> (as opposed to a collection of them) is never
concatenated with anything because of operator precedence.]
The operand of some types of regop is a literal string; for others,
it is a regop leading into a sub-program. In particular, the operand
of a C<BRANCH> node is the first regop of the branch.
B<NOTE>: As the railroad metaphor suggests, this is B<not> a tree
structure: the tail of the branch connects to the thing following the
set of C<BRANCH>es. It is like a single line of railway track that
splits as it goes into a station or railway yard and rejoins as it comes
out the other side.
=head3 Regops
The base structure of a regop is defined in F<regexp.h> as follows:
struct regnode {
U8 flags; /* Various purposes, sometimes overridden */
U8 type; /* Opcode value as specified by regnodes.h */
U16 next_off; /* Offset in size regnode */
};
Other larger C<regnode>-like structures are defined in F<regcomp.h>. They
are almost like subclasses in that they have the same fields as
C<regnode>, with possibly additional fields following in
the structure, and in some cases the specific meaning (and name)
of some of the base fields is overridden. The following is a more
complete description.
=over 4
=item C<regnode_1>
=item C<regnode_2>
C<regnode_1> structures have the same header, followed by a single
four-byte argument; C<regnode_2> structures contain two two-byte
arguments instead:
regnode_1 U32 arg1;
regnode_2 U16 arg1; U16 arg2;
=item C<regnode_string>
C<regnode_string> structures, used for literal strings, follow the header
with a one-byte length and then the string data. Strings are padded on
the end with zero bytes so that the total length of the node is a
multiple of four bytes:
regnode_string char string[1];
U8 str_len; /* overrides flags */
=item C<regnode_charclass>
Bracketed character classes are represented by C<regnode_charclass>
structures, which have a four-byte argument and then a 32-byte (256-bit)
bitmap indicating which characters in the Latin1 range are included in
the class.
regnode_charclass U32 arg1;
char bitmap[ANYOF_BITMAP_SIZE];
Various flags whose names begin with C<ANYOF_> are used for special
situations. Above Latin1 matches and things not known until run-time
are stored in L</Perl's pprivate structure>.
=item C<regnode_charclass_posixl>
There is also a larger form of a char class structure used to represent
POSIX char classes under C</l> matching,
called C<regnode_charclass_posixl> which has an
additional 32-bit bitmap indicating which POSIX char classes
have been included.
regnode_charclass_posixl U32 arg1;
char bitmap[ANYOF_BITMAP_SIZE];
U32 classflags;
=back
F<regnodes.h> defines an array called C<regarglen[]> which gives the size
of each opcode in units of C<size regnode> (4-byte). A macro is used
to calculate the size of an C<EXACT> node based on its C<str_len> field.
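As a rough illustration of the layouts described above, the following C
sketch mirrors the base C<regnode> header and the C<regnode_1> variant
using fixed-width integer types, and computes how many regnode-sized
units a literal string node would occupy. The C<toy_> names and the
rounding helper are assumptions made for this example; they are not the
definitions or macros found in F<regcomp.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the base header described above: 4 bytes in total. */
struct toy_regnode {
    uint8_t  flags;     /* reused as str_len by string nodes */
    uint8_t  type;      /* opcode value */
    uint16_t next_off;  /* offset to the logical successor, in regnodes */
};

/* Stand-in for regnode_1: the same header plus one 32-bit argument. */
struct toy_regnode_1 {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
    uint32_t arg1;
};

/* Number of regnode-sized units needed for a string node: one unit for
 * the header, plus the string data rounded up to whole units. */
size_t toy_exact_node_units(size_t str_len)
{
    const size_t unit = sizeof(struct toy_regnode);
    return 1 + (str_len + unit - 1) / unit;
}
```

Under these assumptions C<toy_exact_node_units(3)> is 2, which agrees
with the dumps later in this document, where C<< EXACT <foo> >> at
position 1 is followed by a regop at position 3.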
The regops are defined in F<regnodes.h> which is generated from
F<regcomp.sym> by F<regcomp.pl>. Currently the maximum possible number
of distinct regops is restricted to 256, with about a quarter already
used.
A set of macros makes accessing the fields
easier and more consistent. These include C<OP()>, which is used to determine
the type of a C<regnode>-like structure; C<NEXT_OFF()>, which is the offset to
the next node (more on this later); C<ARG()>, C<ARG1()>, C<ARG2()>, C<ARG_SET()>,
and equivalents for reading and setting the arguments; and C<STR_LEN()>,
C<STRING()> and C<OPERAND()> for manipulating strings and regop bearing
types.
=head3 What regop is next?
There are three distinct concepts of "next" in the regex engine, and
it is important to keep them clear.
=over 4
=item *
There is the "next regnode" from a given regnode, a value which is
rarely useful except that sometimes it matches up in terms of value
with one of the others, and that sometimes the code assumes this to
always be so.
=item *
There is the "next regop" from a given regop/regnode. This is the
regop physically located after the current one, as determined by
the size of the current regop. This is often useful; for example, when
dumping the structure we traverse it in this order. Sometimes the code
assumes that the "next regnode" is the same as the "next regop", or in
other words assumes that the size of a given regop type is always going
to be one regnode.
=item *
There is the "regnext" from a given regop. This is the regop which
is reached by jumping forward by the value of C<NEXT_OFF()>,
or in a few cases for longer jumps by the C<arg1> field of the C<regnode_1>
structure. The subroutine C<regnext()> handles this transparently.
This is the logical successor of the node, which in some cases, like
that of the C<BRANCH> regop, has special meaning.
=back
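The difference between the "next regop" and the "regnext" can be made
concrete with a small mock-up. The sketch below lays out the C</foo+/>
program from the debug output later in this document in an array of
4-byte units, then computes both successors for a given position. The
opcode numbers, the C<toy_> helper names and the size rule for string
nodes are assumptions for illustration; the real engine uses
C<regarglen[]>, C<NEXT_OFF()> and C<regnext()>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode numbers; the real ones come from regnodes.h. */
enum { TOY_END = 0, TOY_EXACT = 1, TOY_PLUS = 2 };

typedef struct {
    uint8_t  flags;     /* string length for TOY_EXACT */
    uint8_t  type;
    uint16_t next_off;  /* 0 means "no regnext" */
} toy_rn;

/* Size of the regop at prog[i], in regnode units (assumed rule). */
static size_t toy_size(const toy_rn *prog, size_t i)
{
    if (prog[i].type == TOY_EXACT)
        return 1 + ((size_t)prog[i].flags + 3) / 4;
    return 1;
}

/* "Next regop": the regop physically following the one at i. */
size_t toy_next_regop(const toy_rn *prog, size_t i)
{
    return i + toy_size(prog, i);
}

/* "regnext": the logical successor, or 0 if there is none. */
size_t toy_regnext(const toy_rn *prog, size_t i)
{
    return prog[i].next_off ? i + prog[i].next_off : 0;
}

static void toy_put_exact(toy_rn *prog, size_t at, const char *s,
                          uint16_t next_off)
{
    prog[at].type = TOY_EXACT;
    prog[at].flags = (uint8_t)strlen(s);
    prog[at].next_off = next_off;
    memcpy(&prog[at + 1], s, strlen(s));  /* string data follows header */
}

/* Lay out the /foo+/ program; prog needs room for 7 units (0 unused). */
void toy_build_foo_plus(toy_rn *prog)
{
    memset(prog, 0, 7 * sizeof *prog);
    toy_put_exact(prog, 1, "fo", 2);  /* 1: EXACT <fo>(3) */
    prog[3].type = TOY_PLUS;          /* 3: PLUS(6)       */
    prog[3].next_off = 3;
    toy_put_exact(prog, 4, "o", 0);   /* 4: EXACT <o>(0)  */
    prog[6].type = TOY_END;           /* 6: END(0)        */
}
```

For position 3 (the C<PLUS>) the two notions disagree: the next regop is
4, the C<EXACT> operand, while the regnext is 6, the C<END>.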
=head1 Process Overview
Broadly speaking, performing a match of a string against a pattern
involves the following steps:
=over 5
=item A. Compilation
=over 5
=item 1. Parsing for size
=item 2. Parsing for construction
=item 3. Peep-hole optimisation and analysis
=back
=item B. Execution
=over 5
=item 4. Start position and no-match optimisations
=item 5. Program execution
=back
=back
Where these steps occur in the actual execution of a perl program is
determined by whether the pattern involves interpolating any string
variables. If interpolation occurs, then compilation happens at run time. If it
does not, then compilation is performed at compile time. (The C</o> modifier changes this,
as does C<qr//> to a certain extent.) The engine doesn't really care that
much.
=head2 Compilation
This code resides primarily in F<regcomp.c>, along with the header files
F<regcomp.h>, F<regexp.h> and F<regnodes.h>.
Compilation starts with C<pregcomp()>, which is mostly an initialisation
wrapper which farms work out to two other routines for the heavy lifting: the
first is C<reg()>, which is the start point for parsing; the second,
C<study_chunk()>, is responsible for optimisation.
Initialisation in C<pregcomp()> mostly involves the creation and data-filling
of a special structure, C<RExC_state_t> (defined in F<regcomp.c>).
Almost all internally-used routines in F<regcomp.c> take a pointer to one
of these structures as their first argument, with the name C<pRExC_state>.
This structure is used to store the compilation state and contains many
fields. Likewise there are many macros which operate on this
variable: anything that looks like C<RExC_xxxx> is a macro that operates on
this pointer/structure.
=head3 Parsing for size
In this pass the input pattern is parsed in order to calculate how much
space is needed for each regop we would need to emit. The size is also
used to determine whether long jumps will be required in the program.
This stage is controlled by the macro C<SIZE_ONLY> being set.
The parse proceeds pretty much exactly as it does during the
construction phase, except that most routines are short-circuited to
change the size field C<RExC_size> and not do anything else.
=head3 Parsing for construction
Once the size of the program has been determined, the pattern is parsed
again, but this time for real. Now C<SIZE_ONLY> will be false, and the
actual construction can occur.
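The two-pass scheme can be sketched in miniature as follows. A single
parse routine is run twice over the pattern: first with a size-only flag
set, so that emitting merely counts units, and then again with the flag
clear, writing into a buffer of exactly the counted size. Everything
here (C<toy_state>, C<toy_emit()>, the two-byte "regops") is a
hypothetical stand-in; the real code keys off C<SIZE_ONLY> and
C<RExC_size> in F<regcomp.c>.

```c
#include <stdlib.h>

typedef struct {
    int size_only;        /* nonzero during the sizing pass */
    size_t size;          /* units counted during the sizing pass */
    unsigned char *emit;  /* write cursor during the construction pass */
} toy_state;

/* Either count one unit or actually store it, depending on the pass. */
static void toy_emit(toy_state *st, unsigned char unit)
{
    if (st->size_only) {
        st->size++;
        return;
    }
    *st->emit++ = unit;
}

/* The same parse code serves both passes. */
static void toy_parse(toy_state *st, const char *pat)
{
    for (; *pat; pat++) {
        toy_emit(st, 'X');                  /* placeholder opcode */
        toy_emit(st, (unsigned char)*pat);  /* its one-byte operand */
    }
    toy_emit(st, 'E');                      /* placeholder END */
}

/* Returns a malloc'd program of *len_out bytes, or NULL on failure. */
unsigned char *toy_compile(const char *pat, size_t *len_out)
{
    toy_state st = { 1, 0, NULL };
    unsigned char *prog;

    toy_parse(&st, pat);       /* pass 1: sizing only */
    prog = malloc(st.size);
    if (prog == NULL)
        return NULL;
    st.size_only = 0;
    st.emit = prog;
    toy_parse(&st, pat);       /* pass 2: construction */
    *len_out = st.size;
    return prog;
}
```

The appeal of the design is that the sizing logic cannot drift out of
step with the construction logic, because both are the same code path.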
C<reg()> is the start of the parse process. It is responsible for
parsing an arbitrary chunk of pattern up to either the end of the
string, or the first closing parenthesis it encounters in the pattern.
This means it can be used to parse the top-level regex, or any section
inside of a grouping parenthesis. It also handles the "special parens"
that perl's regexes have. For instance when parsing C</x(?:foo)y/> C<reg()>
will at one point be called to parse from the "?" symbol up to and
including the ")".
Additionally, C<reg()> is responsible for parsing the one or more
branches from the pattern, and for "finishing them off" by correctly
setting their next pointers. In order to do the parsing, it repeatedly
calls out to C<regbranch()>, which is responsible for handling up to the
first C<|> symbol it sees.
C<regbranch()> in turn calls C<regpiece()> which
handles "things" followed by a quantifier. In order to parse the
"things", C<regatom()> is called. This is the lowest level routine, which
parses out constant strings, character classes, and the
various special symbols like C<$>. If C<regatom()> encounters a "("
character it in turn calls C<reg()>.
The routine C<regtail()> is called by both C<reg()> and C<regbranch()>
in order to "set the tail pointer" correctly. When executing, once
we get to the end of a branch, we need to go to the node following the
grouping parens. When parsing, however, we don't know where the end will
be until we get there, so when we do we must go back and update the
offsets as appropriate. C<regtail()> is used to make this easier.
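The back-patching that C<regtail()> performs can be pictured with the
following sketch, which follows the chain of next offsets from a
starting position until it finds a regop whose offset is still unset,
and then points that regop at the target. The function name, the
array-index interface and the use of 0 for "not yet set" are assumptions
for illustration, not the signature of the routine in F<regcomp.c>.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;  /* 0 means the tail has not been set yet */
} toy_tail_node;

/* Walk the chain starting at prog[start] and attach its last regop
 * to prog[target]. The target must lie after the end of the chain. */
void toy_regtail(toy_tail_node *prog, size_t start, size_t target)
{
    size_t i = start;

    while (prog[i].next_off != 0)
        i += prog[i].next_off;
    prog[i].next_off = (uint16_t)(target - i);
}
```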
A subtlety of the parsing process means that a regex like C</foo/> is
originally parsed into an alternation with a single branch. It is only
afterwards that the optimiser converts single branch alternations into the
simpler form.
=head3 Parse Call Graph and a Grammar
The call graph looks like this:
reg() # parse a top level regex, or inside of
# parens
regbranch() # parse a single branch of an alternation
regpiece() # parse a pattern followed by a quantifier
regatom() # parse a simple pattern
regclass() # used to handle a class
reg() # used to handle a parenthesised
# subpattern
....
...
regtail() # finish off the branch
...
regtail() # finish off the branch sequence. Tie each
# branch's tail to the tail of the
# sequence
# (NEW) In Debug mode this is
# regtail_study().
A grammar form might be something like this:
atom : constant | class
quant : '*' | '+' | '?' | '{min,max}'
_branch: piece
| piece _branch
| nothing
branch: _branch
| _branch '|' branch
group : '(' branch ')'
_piece: atom | group
piece : _piece
| _piece quant
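To make the shape of the call graph concrete, here is a toy
recursive-descent recogniser that follows the grammar above, with one
function per level named after the corresponding routine. It only
accepts or rejects a pattern and builds no program; it knows nothing of
character classes, C<{min,max}>, escapes or the special C<(?...)>
parens. All names are hypothetical and the code is not derived from
F<regcomp.c>.

```c
#include <string.h>

typedef struct {
    const char *p;  /* current position in the pattern */
    int ok;         /* cleared on the first syntax error */
} toy_parser;

static void toy_reg(toy_parser *ps);

/* atom: a single ordinary character, or a parenthesised group. */
static void toy_atom(toy_parser *ps)
{
    if (*ps->p == '(') {
        ps->p++;
        toy_reg(ps);            /* recurse for the subpattern */
        if (*ps->p == ')')
            ps->p++;
        else
            ps->ok = 0;         /* unmatched '(' */
    } else if (*ps->p && !strchr("|)*+?", *ps->p)) {
        ps->p++;
    } else {
        ps->ok = 0;             /* e.g. quantifier follows nothing */
    }
}

/* piece: an atom optionally followed by one quantifier. */
static void toy_piece(toy_parser *ps)
{
    toy_atom(ps);
    if (ps->ok && *ps->p && strchr("*+?", *ps->p))
        ps->p++;
}

/* branch: zero or more pieces, up to '|', ')' or end of pattern. */
static void toy_branch(toy_parser *ps)
{
    while (ps->ok && *ps->p && *ps->p != '|' && *ps->p != ')')
        toy_piece(ps);
}

/* reg: one or more branches separated by '|'. */
static void toy_reg(toy_parser *ps)
{
    toy_branch(ps);
    while (ps->ok && *ps->p == '|') {
        ps->p++;
        toy_branch(ps);
    }
}

/* Returns 1 if the whole pattern fits the toy grammar, else 0. */
int toy_parses(const char *pat)
{
    toy_parser ps = { pat, 1 };

    toy_reg(&ps);
    return ps.ok && *ps.p == '\0';
}
```

With this sketch C<toy_parses("foo(a|b)*bar")> returns 1, while
C<toy_parses("+a")> returns 0 because a quantifier follows nothing.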
=head3 Parsing complications
The implication of the above description is that a pattern containing nested
parentheses will result in a call graph which cycles through C<reg()>,
C<regbranch()>, C<regpiece()>, C<regatom()>, C<reg()>, C<regbranch()> I<etc>
multiple times, until the deepest level of nesting is reached. All the above
routines return a pointer to a C<regnode>, which is usually the last regnode
added to the program. However, one complication is that C<reg()> returns NULL
when parsing C<(?:)> syntax for embedded modifiers, setting the flag
C<TRYAGAIN>. The C<TRYAGAIN> propagates upwards until it is captured, in
some cases by C<regatom()>, but otherwise unconditionally by
C<regbranch()>. Hence it will never be returned by C<regbranch()> to
C<reg()>. This flag permits patterns such as C<(?i)+> to be detected as
errors (I<Quantifier follows nothing in regex; marked by <-- HERE in m/(?i)+
<-- HERE />).
Another complication is that the representation used for the program differs
if it needs to store Unicode, but it's not always possible to know for sure
whether it does until midway through parsing. The Unicode representation for
the program is larger, and cannot be matched as efficiently. (See L</Unicode
and Localisation Support> below for more details as to why.) If the pattern
contains literal Unicode, it's obvious that the program needs to store
Unicode. Otherwise, the parser optimistically assumes that the more
efficient representation can be used, and starts sizing on this basis.
However, if it then encounters something in the pattern which must be stored
as Unicode, such as an C<\x{...}> escape sequence representing a character
literal, then this means that all previously calculated sizes need to be
redone, using values appropriate for the Unicode representation. Currently,
all regular expression constructions which can trigger this are parsed by code
in C<regatom()>.
To avoid wasted work when a restart is needed, the sizing pass is abandoned
- C<regatom()> immediately returns NULL, setting the flag C<RESTART_UTF8>.
(This action is encapsulated using the macro C<REQUIRE_UTF8>.) This restart
request is propagated up the call chain in a similar fashion, until it is
"caught" in C<Perl_re_op_compile()>, which marks the pattern as containing
Unicode, and restarts the sizing pass. It is also possible for constructions
within run-time code blocks to turn out to need Unicode representation,
which is signalled by C<S_compile_runtime_code()> returning false to
C<Perl_re_op_compile()>.
The restart was previously implemented using a C<longjmp> in C<regatom()>
back to a C<setjmp> in C<Perl_re_op_compile()>, but this proved to be
problematic as the latter is a large function containing many automatic
variables, which interact badly with the emergent control flow of C<setjmp>.
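The return-value style of restart can be sketched as follows. A
low-level routine that discovers it needs the wider representation
returns a "restart" status instead of jumping; every caller passes that
status upwards unchanged, and the top level sets the flag and starts the
sizing pass again. The names, the status codes and the rule that any
byte of 0x80 or above "needs Unicode" and then costs two units are all
assumptions chosen to keep the example small; they are not perl's actual
rules.

```c
#include <stddef.h>

enum { TOY_OK = 0, TOY_RESTART_UTF8 = 1 };

typedef struct {
    int utf8;      /* nonzero once the wide representation is chosen */
    size_t size;   /* units counted so far in this pass */
    int passes;    /* how many sizing passes have been started */
} toy_sizer;

/* Lowest level: size one "atom", or ask for a restart. */
static int toy_size_atom(toy_sizer *st, unsigned char c)
{
    if (c >= 0x80 && !st->utf8)
        return TOY_RESTART_UTF8;      /* abandon this pass at once */
    st->size += (st->utf8 && c >= 0x80) ? 2 : 1;
    return TOY_OK;
}

/* Middle level: propagates a restart request to its caller. */
static int toy_size_pattern(toy_sizer *st, const char *pat)
{
    for (; *pat; pat++) {
        int rc = toy_size_atom(st, (unsigned char)*pat);
        if (rc != TOY_OK)
            return rc;
    }
    return TOY_OK;
}

/* Top level: catches the request, flips the flag and starts over. */
size_t toy_size_with_restart(const char *pat, int *passes_out)
{
    toy_sizer st = { 0, 0, 0 };

    for (;;) {
        st.size = 0;
        st.passes++;
        if (toy_size_pattern(&st, pat) == TOY_OK)
            break;
        st.utf8 = 1;   /* at most one restart: the flag stays set */
    }
    if (passes_out)
        *passes_out = st.passes;
    return st.size;
}
```

Because the restart travels through ordinary return values, no automatic
variables in the top-level function are left in an indeterminate state,
which is the hazard the C<setjmp> approach ran into.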
=head3 Debug Output
In the 5.9.x development version of perl you can C<< use re Debug => 'PARSE' >>
to see some trace information about the parse process. We will start with some
simple patterns and build up to more complex patterns.
So when we parse C</foo/> we see something like the following table. The
left shows what is being parsed, and the number indicates where the next regop
would go. The stuff on the right is the trace output of the graph. The
names are chosen to be short to make it less dense on the screen. 'tsdy'
is a special form of C<regtail()> which does some extra analysis.
>foo< 1 reg
brnc
piec
atom
>< 4 tsdy~ EXACT <foo> (EXACT) (1)
~ attach to END (3) offset to 2
The resulting program then looks like:
1: EXACT <foo>(3)
3: END(0)
As you can see, even though we parsed out a branch and a piece, it was ultimately
only an atom. The final program shows us how things work. We have an C<EXACT> regop,
followed by an C<END> regop. The number in parens indicates where the C<regnext> of
the node goes. The C<regnext> of an C<END> regop is unused, as C<END> regops mean
we have successfully matched. The number on the left indicates the position of
the regop in the regnode array.
Now let's try a harder pattern. We will add a quantifier, so now we have the pattern
C</foo+/>. We will see that C<regbranch()> calls C<regpiece()> twice.
>foo+< 1 reg
brnc
piec
atom
>o+< 3 piec
atom
>< 6 tail~ EXACT <fo> (1)
7 tsdy~ EXACT <fo> (EXACT) (1)
~ PLUS (END) (3)
~ attach to END (6) offset to 3
And we end up with the program:
1: EXACT <fo>(3)
3: PLUS(6)
4: EXACT <o>(0)
6: END(0)
Now we have a special case. The C<EXACT> regop has a C<regnext> of 0. This is
because if it matches it should try to match itself again. The C<PLUS> regop
handles the actual failure of the C<EXACT> regop and acts appropriately (going
to regnode 6 if the C<EXACT> matched at least once, or failing if it didn't).
Now for something much more complex: C</x(?:foo*|b[a][rR])(foo|bar)$/>
>x(?:foo*|b... 1 reg
brnc
piec
atom
>(?:foo*|b[... 3 piec
atom
>?:foo*|b[a... reg
>foo*|b[a][... brnc
piec
atom
>o*|b[a][rR... 5 piec
atom
>|b[a][rR])... 8 tail~ EXACT <fo> (3)
>b[a][rR])(... 9 brnc
10 piec
atom
>[a][rR])(f... 12 piec
atom
>a][rR])(fo... clas
>[rR])(foo|... 14 tail~ EXACT <b> (10)
piec
atom
>rR])(foo|b... clas
>)(foo|bar)... 25 tail~ EXACT <a> (12)
tail~ BRANCH (3)
26 tsdy~ BRANCH (END) (9)
~ attach to TAIL (25) offset to 16
tsdy~ EXACT <fo> (EXACT) (4)
~ STAR (END) (6)
~ attach to TAIL (25) offset to 19
tsdy~ EXACT <b> (EXACT) (10)
~ EXACT <a> (EXACT) (12)
~ ANYOF[Rr] (END) (14)
~ attach to TAIL (25) offset to 11
>(foo|bar)$< tail~ EXACT <x> (1)
piec
atom
>foo|bar)$< reg
28 brnc
piec
atom
>|bar)$< 31 tail~ OPEN1 (26)
>bar)$< brnc
32 piec
atom
>)$< 34 tail~ BRANCH (28)
36 tsdy~ BRANCH (END) (31)
~ attach to CLOSE1 (34) offset to 3
tsdy~ EXACT <foo> (EXACT) (29)
~ attach to CLOSE1 (34) offset to 5
tsdy~ EXACT <bar> (EXACT) (32)
~ attach to CLOSE1 (34) offset to 2
>$< tail~ BRANCH (3)
~ BRANCH (9)
~ TAIL (25)
piec
atom
>< 37 tail~ OPEN1 (26)
~ BRANCH (28)
~ BRANCH (31)
~ CLOSE1 (34)
38 tsdy~ EXACT <x> (EXACT) (1)
~ BRANCH (END) (3)
~ BRANCH (END) (9)
~ TAIL (END) (25)
~ OPEN1 (END) (26)
~ BRANCH (END) (28)
~ BRANCH (END) (31)
~ CLOSE1 (END) (34)
~ EOL (END) (36)
~ attach to END (37) offset to 1
Resulting in the program
1: EXACT <x>(3)
3: BRANCH(9)
4: EXACT <fo>(6)
6: STAR(26)
7: EXACT <o>(0)
9: BRANCH(25)
10: EXACT <ba>(14)
12: OPTIMIZED (2 nodes)
14: ANYOF[Rr](26)
25: TAIL(26)
26: OPEN1(28)
28: TRIE-EXACT(34)
[StS:1 Wds:2 Cs:6 Uq:5 #Sts:7 Mn:3 Mx:3 Stcls:bf]
<foo>
<bar>
30: OPTIMIZED (4 nodes)
34: CLOSE1(36)
36: EOL(37)
37: END(0)
Here we can see a much more complex program, with various optimisations in
play. At regnode 10 we see an example where a character class with only
one character in it was turned into an C<EXACT> node. We can also see where
an entire alternation was turned into a C<TRIE-EXACT> node. As a consequence,
some of the regnodes have been marked as optimised away. We can see that
the C<$> symbol has been converted into an C<EOL> regop, a special piece of
code that looks for C<\n> or the end of the string.
The next pointer for C<BRANCH>es is interesting in that it points at where
execution should go if the branch fails. When executing, if the engine
tries to traverse from a branch to a C<regnext> that isn't a branch then
the engine will know that the entire set of branches has failed.
=head3 Peep-hole Optimisation and Analysis
The regular expression engine can be a weighty tool to wield. On long
strings and complex patterns it can end up having to do a lot of work
to find a match, and even more to decide that no match is possible.
Consider a situation like the following pattern.
'ababababababababababab' =~ /(a|b)*z/
The C<(a|b)*> part can match at every char in the string, and then fail
every time because there is no C<z> in the string. So obviously we can
avoid using the regex engine unless there is a C<z> in the string.
Likewise in a pattern like:
/foo(\w+)bar/
In this case we know that the string must contain a C<foo> which must be
followed by C<bar>. We can use Fast Boyer-Moore matching as implemented
in C<fbm_instr()> to find the location of these strings. If they don't exist
then we don't need to resort to the much more expensive regex engine.
Even better, if they do exist then we can use their positions to
reduce the search space that the regex engine needs to cover to determine
if the entire pattern matches.
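The kind of pre-check described here can be sketched with the standard
C<strstr()> function standing in for C<fbm_instr()>. The function below
answers only the question "could the pattern possibly match?" for a
pattern that requires one fixed string followed, after at least
C<min_gap> characters, by a second one. C<toy_prefilter()> and its
parameters are inventions for this example.

```c
#include <stddef.h>
#include <string.h>

/* Returns 0 if a match is impossible, 1 if the full engine must run.
 * Using the earliest occurrence of 'first' is enough: if any later
 * occurrence leaves room for 'second', the earliest one does too. */
int toy_prefilter(const char *s, const char *first, const char *second,
                  size_t min_gap)
{
    const char *hit = strstr(s, first);

    if (hit == NULL)
        return 0;                 /* required string is absent */
    hit += strlen(first);
    if (strlen(hit) < min_gap)
        return 0;                 /* not enough room for the gap */
    return strstr(hit + min_gap, second) != NULL;
}
```

For C</foo(\w+)bar/> the call would be
C<toy_prefilter(s, "foo", "bar", 1)>, and for C</(a|b)*z/> it would be
C<toy_prefilter(s, "z", "", 0)>. A result of 1 does not mean the pattern
matches, only that the engine cannot be skipped.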
There are various aspects of the pattern that can be used to facilitate
optimisations along these lines:
=over 5
=item * anchored fixed strings
=item * floating fixed strings
=item * minimum and maximum length requirements
=item * start class
=item * Beginning/End of line positions
=back
Another form of optimisation that can occur is the post-parse "peep-hole"
optimisation, where inefficient constructs are replaced by more efficient
constructs. The C<TAIL> regops which are used during parsing to mark the end
of branches and the end of groups are examples of this. These regops are used
as place-holders during construction and "always match" so they can be
"optimised away" by making the things that point to the C<TAIL> point to the
thing that C<TAIL> points to, thus "skipping" the node.
Another optimisation that can occur is that of "C<EXACT> merging" which is
where two consecutive C<EXACT> nodes are merged into a single
regop. An even more aggressive form of this is that a branch
sequence of the form C<EXACT BRANCH ... EXACT> can be converted into a
C<TRIE-EXACT> regop.
All of this occurs in the routine C<study_chunk()> which uses a special
structure C<scan_data_t> to store the analysis that it has performed, and
does the "peep-hole" optimisations as it goes.
The code involved in C<study_chunk()> is extremely cryptic. Be careful. :-)
=head2 Execution
Execution of a regex generally involves two phases, the first being
finding the start point in the string where we should match from,
and the second being running the regop interpreter.
If we can tell that there is no valid start point then we don't bother running
the interpreter at all. Likewise, if we know from the analysis phase that we
cannot detect a short-cut to the start position, we go straight to the
interpreter.
The two entry points are C<re_intuit_start()> and C<pregexec()>. These routines
have a somewhat incestuous relationship with overlap between their functions,
and C<pregexec()> may even call C<re_intuit_start()> on its own. Nevertheless
other parts of the perl source code may call into either, or both.
Execution of the interpreter itself used to be recursive, but thanks to the
efforts of Dave Mitchell in the 5.9.x development track, that has changed: now an
internal stack is maintained on the heap and the routine is fully
iterative. This can make it tricky as the code is quite conservative
about what state it stores, with the result that two consecutive lines in the
code can actually be running in totally different contexts due to the
simulated recursion.
=head3 Start position and no-match optimisations
C<re_intuit_start()> is responsible for handling start points and no-match
optimisations as determined by the results of the analysis done by
C<study_chunk()> (and described in L</Peep-hole Optimisation and Analysis>).
The basic structure of this routine is to try to find the start- and/or
end-points of where the pattern could match, and to ensure that the string
is long enough to match the pattern. It tries to use more efficient
methods over less efficient methods and may involve considerable
cross-checking of constraints to find the place in the string that matches.
For instance it may try to determine that a given fixed string must be
not only present but a certain number of chars before the end of the
string, or whatever.
It calls several other routines, such as C<fbm_instr()> which does
Fast Boyer-Moore matching and C<find_byclass()> which is responsible for
finding the start using the first mandatory regop in the program.
When the optimisation criteria have been satisfied, C<regtry()> is called
to perform the match.
=head3 Program execution
C<pregexec()> is the main entry point for running a regex. It contains
support for initialising the regex interpreter's state, running
C<re_intuit_start()> if needed, and running the interpreter on the string
from various start positions as needed. When it is necessary to use
the regex interpreter C<pregexec()> calls C<regtry()>.
C<regtry()> is the entry point into the regex interpreter. It expects
as arguments a pointer to a C<regmatch_info> structure and a pointer to
a string. It returns an integer 1 for success and a 0 for failure.
It is basically a set-up wrapper around C<regmatch()>.
C<regmatch()> is the main "recursive loop" of the interpreter. It is
basically a giant switch statement that implements a state machine, where
the possible states are the regops themselves, plus a number of additional
intermediate and failure states. A few of the states are implemented as
subroutines but the bulk are inline code.
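The "giant switch statement" structure can be shown in miniature. The
sketch below interprets a three-opcode program equivalent to C</foo+/>,
anchored at the start of the string. It is a deliberately simplified
assumption-laden model: regops are plain structs rather than packed
regnodes, C<next> is an absolute index, and there is no backtracking, so
it is only correct for programs like this one where a greedy loop never
has to give characters back. None of the names come from F<regexec.c>.

```c
#include <stddef.h>
#include <string.h>

enum toy_op { T_END, T_EXACT, T_PLUS };

struct toy_regop {
    enum toy_op op;
    const char *str;  /* literal for T_EXACT, otherwise unused */
    int next;         /* index of the logical successor */
};

/* 0: EXACT <fo>(1)  1: PLUS(3)  2: EXACT <o>  3: END */
const struct toy_regop toy_foo_plus[] = {
    { T_EXACT, "fo", 1 },
    { T_PLUS,  NULL, 3 },
    { T_EXACT, "o",  0 },
    { T_END,   NULL, 0 },
};

/* Returns the number of characters matched at the start of s,
 * or -1 if the program does not match there. */
int toy_regmatch(const struct toy_regop *prog, int pc, const char *s)
{
    const char *start = s;

    for (;;) {
        const struct toy_regop *r = &prog[pc];

        switch (r->op) {
        case T_EXACT: {
            size_t n = strlen(r->str);
            if (strncmp(s, r->str, n) != 0)
                return -1;
            s += n;
            pc = r->next;
            break;
        }
        case T_PLUS: {
            /* The operand is the regop physically after the PLUS. */
            const struct toy_regop *operand = &prog[pc + 1];
            size_t n = strlen(operand->str);
            int count = 0;

            while (n > 0 && strncmp(s, operand->str, n) == 0) {
                s += n;
                count++;
            }
            if (count == 0)
                return -1;      /* needs at least one repetition */
            pc = r->next;
            break;
        }
        case T_END:
            return (int)(s - start);
        }
    }
}
```

Note how C<T_PLUS> uses both notions of "next": the physically following
regop is its operand, while its C<next> field says where to continue
once the repetition is done.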
=head1 MISCELLANEOUS
=head2 Unicode and Localisation Support
When dealing with strings containing characters that cannot be represented
using an eight-bit character set, perl uses an internal representation
that is a permissive version of Unicode's UTF-8 encoding[2]. This uses single
bytes to represent characters from the ASCII character set, and sequences
of two or more bytes for all other characters. (See L<perlunitut>
for more information about the relationship between UTF-8 and perl's
encoding, utf8. The difference isn't important for this discussion.)
No matter how you look at it, Unicode support is going to be a pain in a
regex engine. Tricks that might be fine when you have 256 possible
characters often won't scale to handle the size of the UTF-8 character
set. Things you can take for granted with ASCII may not be true with
Unicode. For instance, in ASCII, it is safe to assume that
C<sizeof(char1) == sizeof(char2)>, but in UTF-8 it isn't. Unicode case folding is
vastly more complex than the simple rules of ASCII, and even when not
using Unicode but only localised single byte encodings, things can get
tricky (for example, B<LATIN SMALL LETTER SHARP S> (U+00DF, E<szlig>)
should match 'SS' in localised case-insensitive matching).
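The point about character sizes can be illustrated by computing the
length of a sequence from its first byte. The sketch below covers
standard UTF-8 only (one to four bytes) and classifies by bit pattern
alone, so it does not reject overlong forms; perl's internal encoding is
more permissive, as noted above, and the engine uses its own macros
rather than this hypothetical C<toy_utf8_len()>.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with 'lead',
 * or 0 if 'lead' cannot start a standard sequence. */
size_t toy_utf8_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;                  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;                  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;                  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;                  /* 11110xxx */
    return 0;                      /* continuation byte or out of range */
}
```

So C<a> occupies one byte, while E<szlig> (U+00DF, encoded as the bytes
0xC3 0x9F) occupies two, and code that steps through a string one byte
at a time will land in the middle of a character.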
Making things worse is that UTF-8 support was a later addition to the
regex engine (as it was to perl) and this necessarily made things a lot
more complicated. Obviously it is easier to design a regex engine with
Unicode support in mind from the beginning than it is to retrofit it to
one that wasn't.
Nearly all regops that involve looking at the input string have
two cases, one for UTF-8, and one not. In fact, it's often more complex
than that, as the pattern may be UTF-8 as well.
Care must be taken when making changes to make sure that you handle
UTF-8 properly, both at compile time and at execution time, including
when the string and pattern are mismatched.
=head2 Base Structures
The C<regexp> structure described in L<perlreapi> is common to all
regex engines. Two of its fields are intended for the private use
of the regex engine that compiled the pattern. These are the
C<intflags> and C<pprivate> members. C<pprivate> is a void pointer to
an arbitrary structure whose use and management is the responsibility
of the compiling engine. perl will never modify either of these
values. In the case of the stock engine the structure pointed to by
C<pprivate> is called C<regexp_internal>.
The C<pprivate> and C<intflags> fields of the C<regexp> structure contain
data specific to each engine.
There are two structures used to store a compiled regular expression.
One, the C<regexp> structure described in L<perlreapi>, is populated by
the engine currently being used, and some of its fields are read by perl to
implement things such as the stringification of C<qr//>.
The other structure is pointed to by the C<regexp> struct's
C<pprivate> and is, in addition to C<intflags> in the same struct,
considered to be the property of the regex engine which compiled the
regular expression.
The C<regexp> structure contains all the data that perl needs to be aware of
to properly work with the regular expression. It includes data about
optimisations that perl can use to determine if the regex engine should
really be used, and various other control info that is needed to properly
execute patterns in various contexts, such as whether the pattern is
anchored in some way, which flags were used during the compile, and whether
the program contains special constructs that perl needs to be aware of.
In addition it contains two fields that are intended for the private use
of the regex engine that compiled the pattern. These are the C<intflags>
and C<pprivate> members. C<pprivate> is a void pointer to an arbitrary
structure whose use and management is the responsibility of the compiling
engine. perl will never modify either of these values.
As mentioned earlier, in the case of the default engines, C<pprivate>
will be a pointer to a C<regexp_internal> structure which holds the compiled
program and any additional data that is private to the regex engine
implementation.
=head3 Perl's C<pprivate> structure
The following structure is used as the C<pprivate> struct by perl's
regex engine. Since it is specific to perl it is only of curiosity
value to other engine implementations.
    typedef struct regexp_internal {
            U32 *offsets;           /* offset annotations 20001228 MJD
                                     * data about mapping the program to
                                     * the string*/
            regnode *regstclass;    /* Optional startclass as identified or
                                     * constructed by the optimiser */
            struct reg_data *data;  /* Additional miscellaneous data used
                                     * by the program. Used to make it
                                     * easier to clone and free arbitrary
                                     * data that the regops need. Often the
                                     * ARG field of a regop is an index
                                     * into this structure */
            regnode program[1];     /* Unwarranted chumminess with
                                     * compiler. */
    } regexp_internal;
=over 5
=item C<offsets>
Offsets holds a mapping of offset in the C<program>
to offset in the C<precomp> string. This is only used by ActiveState's
visual regex debugger.
=item C<regstclass>
Special regop that is used by C<re_intuit_start()> to check if a pattern
can match at a certain position. For instance if the regex engine knows
that the pattern must start with a 'Z' then it can scan the string until
it finds one and then launch the regex engine from there. The routine
that handles this is called C<find_by_class()>. Sometimes this field
points at a regop embedded in the program, and sometimes it points at
an independent synthetic regop that has been constructed by the optimiser.
=item C<data>
This field points at a C<reg_data> structure, which is defined as follows:
    struct reg_data {
        U32 count;
        U8 *what;
        void* data[1];
    };
This structure is used for handling data structures that the regex engine
needs to handle specially during a clone or free operation on the compiled
product. Each element in the C<data> array has a corresponding element in the
C<what> array. During compilation, regops that need special structures stored
will add an element to each array using the C<add_data()> routine and then
store the index in the regop.
=item C<program>
Compiled program. Inlined into the structure so the entire struct can be
treated as a single blob.
=back
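The parallel C<what>/C<data> arrays described under C<data> above can be illustrated with a simplified C sketch. Everything named C<toy_*> is hypothetical, and so are the tag letters. Unlike the real C<reg_data>, this version keeps the two arrays as separate heap allocations instead of inlining C<data> into the struct, purely to keep the example short.

```c
#include <stdlib.h>

/* Simplified stand-in for reg_data: two parallel growable arrays. */
struct toy_data {
    unsigned count;
    unsigned char *what;    /* one type tag per entry */
    void **data;            /* one payload pointer per entry */
};

/* Append (tag, ptr) and return the new entry's index, or -1 on
 * allocation failure. A regop would keep this index as its argument. */
int toy_add_data(struct toy_data *d, unsigned char tag, void *ptr)
{
    unsigned char *w = realloc(d->what, (d->count + 1) * sizeof *w);
    if (!w)
        return -1;
    d->what = w;
    void **p = realloc(d->data, (d->count + 1) * sizeof *p);
    if (!p)
        return -1;
    d->data = p;
    d->what[d->count] = tag;
    d->data[d->count] = ptr;
    return (int)d->count++;
}

/* Free by consulting each entry's tag: 's' entries own a malloc'ed
 * buffer, anything else is treated as borrowed and left alone. */
void toy_free_data(struct toy_data *d)
{
    for (unsigned i = 0; i < d->count; i++)
        if (d->what[i] == 's')
            free(d->data[i]);
    free(d->what);
    free(d->data);
    d->what = NULL;
    d->data = NULL;
    d->count = 0;
}
```

The point of the tag array is visible in C<toy_free_data()>: the code that frees (or clones) the structure does not need to know which regop stored an entry, only what kind of thing the entry is.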
=head1 SEE ALSO
L<perlreapi>
L<perlre>
L<perlunitut>
=head1 AUTHOR
by Yves Orton, 2006.
With excerpts from Perl, and contributions and suggestions from
Ronald J. Kimball, Dave Mitchell, Dominic Dunlop, Mark Jason Dominus,
Stephen McCamant, and David Landgren.
=head1 LICENCE
Same terms as Perl.
=head1 REFERENCES
[1] L<http://perl.plover.com/Rx/paper/>
[2] L<http://www.unicode.org>
=cut
=head1 NAME
perlclib - Internal replacements for standard C library functions
=head1 DESCRIPTION
One thing Perl porters should note is that F<perl> doesn't tend to use that
much of the C standard library internally; you'll see very little use of,
for example, the F<ctype.h> functions in there. This is because Perl
tends to reimplement or abstract standard library functions, so that we
know exactly how they're going to operate.
This is a reference card for people who are familiar with the C library
and who want to do things the Perl way; to tell them which functions
they ought to use instead of the more normal C functions.
=head2 Conventions
In the following tables:
=over 3
=item C<t>
is a type.
=item C<p>
is a pointer.
=item C<n>
is a number.
=item C<s>
is a string.
=back
C<sv>, C<av>, C<hv>, etc. represent variables of their respective types.
=head2 File Operations
Instead of the F<stdio.h> functions, you should use the Perl abstraction
layer. Instead of C<FILE*> types, you need to be handling C<PerlIO*>
types. Don't forget that with the new PerlIO layered I/O abstraction
C<FILE*> types may not even be available. See also the C<perlapio>
documentation for more information about the following functions:
    Instead Of:                 Use:
    stdin                       PerlIO_stdin()
    stdout                      PerlIO_stdout()
    stderr                      PerlIO_stderr()
    fopen(fn, mode)             PerlIO_open(fn, mode)
    freopen(fn, mode, stream)   PerlIO_reopen(fn, mode, perlio) (Deprecated)
    fflush(stream)              PerlIO_flush(perlio)
    fclose(stream)              PerlIO_close(perlio)
=head2 File Input and Output
    Instead Of:                 Use:
    fprintf(stream, fmt, ...)   PerlIO_printf(perlio, fmt, ...)
    [f]getc(stream)             PerlIO_getc(perlio)
    [f]putc(stream, n)          PerlIO_putc(perlio, n)
    ungetc(n, stream)           PerlIO_ungetc(perlio, n)
Note that the PerlIO equivalents of C<fread> and C<fwrite> are slightly
different from their C library counterparts:
    fread(p, size, n, stream)   PerlIO_read(perlio, buf, numbytes)
    fwrite(p, size, n, stream)  PerlIO_write(perlio, buf, numbytes)
    fputs(s, stream)            PerlIO_puts(perlio, s)
There is no equivalent to C<fgets>; one should use C<sv_gets> instead:
    fgets(s, n, stream)         sv_gets(sv, perlio, append)
=head2 File Positioning
    Instead Of:                 Use:
    feof(stream)                PerlIO_eof(perlio)
    fseek(stream, n, whence)    PerlIO_seek(perlio, n, whence)
    rewind(stream)              PerlIO_rewind(perlio)
    fgetpos(stream, p)          PerlIO_getpos(perlio, sv)
    fsetpos(stream, p)          PerlIO_setpos(perlio, sv)
    ferror(stream)              PerlIO_error(perlio)
    clearerr(stream)            PerlIO_clearerr(perlio)
=head2 Memory Management and String Handling
    Instead Of:                     Use:
    t* p = malloc(n)                Newx(p, n, t)
    t* p = calloc(n, s)             Newxz(p, n, t)
    p = realloc(p, n)               Renew(p, n, t)
    memcpy(dst, src, n)             Copy(src, dst, n, t)
    memmove(dst, src, n)            Move(src, dst, n, t)
    memcpy(dst, src, sizeof(t))     StructCopy(src, dst, t)
    memset(dst, 0, n * sizeof(t))   Zero(dst, n, t)
    memzero(dst, 0)                 Zero(dst, n, char)
    free(p)                         Safefree(p)
    strdup(p)                       savepv(p)
    strndup(p, n)                   savepvn(p, n) (Hey, strndup doesn't exist!)
    strstr(big, little)             instr(big, little)
    strcmp(s1, s2)                  strLE(s1, s2) / strEQ(s1, s2)
                                                  / strGT(s1,s2)
    strncmp(s1, s2, n)              strnNE(s1, s2, n) / strnEQ(s1, s2, n)
    memcmp(p1, p2, n)               memNE(p1, p2, n)
    !memcmp(p1, p2, n)              memEQ(p1, p2, n)
Notice the different order of arguments to C<Copy> and C<Move> than used
in C<memcpy> and C<memmove>.
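The swapped argument order is easy to get wrong, so here is a self-contained C sketch of it. C<TOY_COPY> is a stand-in written for this example with the same argument order as the C<Copy> entry in the table; it is not perl's own macro definition, and real code should use C<Copy> from the perl headers.

```c
#include <string.h>

/* Stand-in with the argument order of the Copy entry in the table:
 * source first, then destination, element count, element type. */
#define TOY_COPY(src, dst, n, t) \
    ((void)memcpy((dst), (src), (n) * sizeof(t)))

/* Copy the first n ints of src into dst; returns dst for convenience. */
int *toy_copy_ints(const int *src, int *dst, size_t n)
{
    TOY_COPY(src, dst, n, int);     /* note: src before dst */
    return dst;
}
```

Note also that the count is given in elements of type C<t>, not in bytes; the multiplication by C<sizeof(t)> happens inside the macro.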
Most of the time, though, you'll want to be dealing with SVs internally
instead of raw C<char *> strings:
    strlen(s)                   sv_len(sv)
    strcpy(dt, src)             sv_setpv(sv, s)
    strncpy(dt, src, n)         sv_setpvn(sv, s, n)
    strcat(dt, src)             sv_catpv(sv, s)
    strncat(dt, src, n)         sv_catpvn(sv, s, n)
    sprintf(s, fmt, ...)        sv_setpvf(sv, fmt, ...)
Note also the existence of C<sv_catpvf> and C<sv_vcatpvfn>, combining
concatenation with formatting.
Sometimes instead of zeroing the allocated heap by using Newxz() you
should consider "poisoning" the data. This means writing a bit
pattern into it that should be illegal as pointers (and floating point
numbers), and also hopefully surprising enough as integers, so that
any code attempting to use the data without forethought will break
sooner rather than later. Poisoning can be done using the Poison()
macros, which have similar arguments to Zero():
    PoisonWith(dst, n, t, b)    scribble memory with byte b
    PoisonNew(dst, n, t)        equal to PoisonWith(dst, n, t, 0xAB)
    PoisonFree(dst, n, t)       equal to PoisonWith(dst, n, t, 0xEF)
    Poison(dst, n, t)           equal to PoisonFree(dst, n, t)
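As an illustration of what poisoning amounts to, the following C sketch fills a buffer with a poison byte and then checks whether it has been touched. C<TOY_POISON_WITH> and C<toy_is_poisoned()> are hypothetical names used only here; the first mirrors the argument order of C<PoisonWith> in the table above but is not perl's implementation.

```c
#include <string.h>

/* Stand-in for PoisonWith(dst, n, t, b): fill n elements of type t
 * with the byte b. */
#define TOY_POISON_WITH(dst, n, t, b) \
    ((void)memset((dst), (b), (n) * sizeof(t)))

/* Returns 1 if every byte of the buffer still holds the poison byte. */
int toy_is_poisoned(const void *buf, size_t nbytes, unsigned char b)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < nbytes; i++)
        if (p[i] != b)
            return 0;
    return 1;
}
```

A pointer-sized slot filled with 0xAB bytes is very unlikely to be a valid address, which is why code that dereferences poisoned memory tends to fail immediately instead of silently reading zeros.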
=head2 Character Class Tests
There are several types of character class tests that Perl implements.
The only ones described here are those that directly correspond to C
library functions that operate on 8-bit characters, but there are
equivalents that operate on wide characters, and UTF-8 encoded strings.
All are more fully described in L<perlapi/Character classification> and
L<perlapi/Character case changing>.
The C library routines listed in the table below return values based on
the current locale. Use the entries in the final column for that
functionality. The other two columns always assume a POSIX (or C)
locale. The entries in the ASCII column are only meaningful for ASCII
inputs, returning FALSE for anything else. Use these only when you
B<know> that is what you want. The entries in the Latin1 column assume
that the non-ASCII 8-bit characters are as Unicode defines them, the
same as ISO-8859-1, often called Latin 1.
    Instead Of:  Use for ASCII:     Use for Latin1:       Use for locale:
    isalnum(c)   isALPHANUMERIC(c)  isALPHANUMERIC_L1(c)  isALPHANUMERIC_LC(c)
    isalpha(c)   isALPHA(c)         isALPHA_L1(c)         isALPHA_LC(c)
    isascii(c)   isASCII(c)                               isASCII_LC(c)
    isblank(c)   isBLANK(c)         isBLANK_L1(c)         isBLANK_LC(c)
    iscntrl(c)   isCNTRL(c)         isCNTRL_L1(c)         isCNTRL_LC(c)
    isdigit(c)   isDIGIT(c)         isDIGIT_L1(c)         isDIGIT_LC(c)
    isgraph(c)   isGRAPH(c)         isGRAPH_L1(c)         isGRAPH_LC(c)
    islower(c)   isLOWER(c)         isLOWER_L1(c)         isLOWER_LC(c)
    isprint(c)   isPRINT(c)         isPRINT_L1(c)         isPRINT_LC(c)
    ispunct(c)   isPUNCT(c)         isPUNCT_L1(c)         isPUNCT_LC(c)
    isspace(c)   isSPACE(c)         isSPACE_L1(c)         isSPACE_LC(c)
    isupper(c)   isUPPER(c)         isUPPER_L1(c)         isUPPER_LC(c)
    isxdigit(c)  isXDIGIT(c)        isXDIGIT_L1(c)        isXDIGIT_LC(c)
    tolower(c)   toLOWER(c)         toLOWER_L1(c)         toLOWER_LC(c)
    toupper(c)   toUPPER(c)                               toUPPER_LC(c)
To emphasize that you are operating only on ASCII characters, you can
append C<_A> to each of the macros in the ASCII column: C<isALPHA_A>,
C<isDIGIT_A>, and so on.
(There is no entry in the Latin1 column for C<isascii> even though there
is an C<isASCII_L1>, which is identical to C<isASCII>; the
latter name is clearer. There is no entry in the Latin1 column for
C<toupper> because the result can be non-Latin1. You have to use
C<toUPPER_uni>, as described in L<perlapi/Character case changing>.)
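The behaviour described for the ASCII column can be pictured with a locale-independent C sketch. The C<toy_*> functions below are written for this example and are not perl's macros. They assume an ASCII execution character set, and leaving non-ASCII input unchanged in the case-mapping function is a choice made for this sketch.

```c
/* ASCII-only classification in the spirit of the "Use for ASCII"
 * column: anything outside the ASCII letters is not a member. */
int toy_is_alpha_ascii(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* ASCII-only lowercasing: only 'A'..'Z' are mapped; every other
 * value, including non-ASCII bytes, is returned unchanged. */
int toy_to_lower_ascii(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}
```

Unlike C<isalpha()>, the result here cannot change when the program's locale changes, which is the property the first two columns of the table are offering.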
=head2 F<stdlib.h> functions
    Instead Of:                 Use:
    atof(s)                     Atof(s)
    atoi(s)                     grok_atoUV(s, &uv, &e)
    atol(s)                     grok_atoUV(s, &uv, &e)
    strtod(s, &p)               Nothing. Just don't use it.
    strtol(s, &p, n)            grok_atoUV(s, &uv, &e)
    strtoul(s, &p, n)           grok_atoUV(s, &uv, &e)
Typical use is to do range checks on C<uv> before casting:
    int i; UV uv; char* end_ptr;
    if (grok_atoUV(input, &uv, &end_ptr)
        && uv <= INT_MAX)
    {
        i = (int)uv;
        ... /* continue parsing from end_ptr */
    } else {
        ... /* parse error: not a decimal integer in range 0 .. INT_MAX */
    }
Notice also the C<grok_bin>, C<grok_hex>, and C<grok_oct> functions in
F<numeric.c> for converting strings representing numbers in the respective
bases into C<NV>s. Note that grok_atoUV() doesn't handle negative inputs,
or leading whitespace (being purposefully strict).
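To show what "purposefully strict" means in practice, here is a standalone C sketch of a strict unsigned decimal parser. It is not C<grok_atoUV()> and does not claim to match it in every detail: C<toy_atouv()> is a hypothetical name, it works on C<unsigned long> rather than C<UV>, and it makes its own choices where the text above is silent (for example, it accepts leading zeros).

```c
#include <limits.h>

/* Strict unsigned decimal parser: no sign, no leading whitespace,
 * no overflow. On success stores the value and the position just
 * after the last digit, and returns 1; otherwise returns 0. */
int toy_atouv(const char *s, unsigned long *out, const char **end)
{
    unsigned long v = 0;

    if (*s < '0' || *s > '9')
        return 0;                   /* rejects "", " 1", "-1", "+1" */
    while (*s >= '0' && *s <= '9') {
        unsigned digit = (unsigned)(*s - '0');
        if (v > (ULONG_MAX - digit) / 10)
            return 0;               /* v * 10 + digit would overflow */
        v = v * 10 + digit;
        s++;
    }
    *out = v;
    *end = s;
    return 1;
}
```

The caller gets back the position where parsing stopped, so trailing text such as the C<abc> in C<42abc> can be handled explicitly rather than being silently ignored as C<atoi()> would.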
Note that strtol() and strtoul() may be disguised as Strtol(), Strtoul(),
Atol(), Atoul(). Avoid those, too.
In theory C<Strtol> and C<Strtoul> may not be defined if the machine perl is
built on doesn't actually have strtol and strtoul. But as those 2
functions are part of the 1989 ANSI C spec we suspect you'll find them
everywhere by now.
    int rand()                  double Drand01()
    srand(n)                    { seedDrand01((Rand_seed_t)n);
                                  PL_srand_called = TRUE; }
    exit(n)                     my_exit(n)
    system(s)                   Don't. Look at pp_system or use my_popen.
    getenv(s)                   PerlEnv_getenv(s)
    setenv(s, val)              my_setenv(s, val)
=head2 Miscellaneous functions
You should not even B<want> to use F<setjmp.h> functions, but if you
think you do, use the C<JMPENV> stack in F<scope.h> instead.
For C<signal>/C<sigaction>, use C<rsignal(signo, handler)>.
=head1 SEE ALSO
L<perlapi>, L<perlapio>, L<perlguts>
=head1 NAME
perlbook - Books about and related to Perl
=head1 DESCRIPTION
There are many books on Perl and Perl-related topics. A few of these are
good, some are OK, but many aren't worth your money. There is a list
of these books, some with extensive reviews, at
L<http://books.perl.org/> . We list some of the books here, and while
listing a book implies our endorsement, don't think that not including
a book means anything.
Most of these books are available online through Safari Books Online
( L<http://safaribooksonline.com/> ).
=head2 The most popular books
The major reference book on Perl, written by the creator of Perl, is
I<Programming Perl>:
=over 4
=item I<Programming Perl> (the "Camel Book"):
by Tom Christiansen, brian d foy, Larry Wall with Jon Orwant
ISBN 978-0-596-00492-7 [4th edition February 2012]
ISBN 978-1-4493-9890-3 [ebook]
http://oreilly.com/catalog/9780596004927
=back
The Ram is a cookbook with hundreds of examples of using Perl to
accomplish specific tasks:
=over 4
=item I<The Perl Cookbook> (the "Ram Book"):
by Tom Christiansen and Nathan Torkington,
with Foreword by Larry Wall
ISBN 978-0-596-00313-5 [2nd Edition August 2003]
ISBN 978-0-596-15888-0 [ebook]
http://oreilly.com/catalog/9780596003135/
=back
If you want to learn the basics of Perl, you might start with the
Llama book, which assumes that you already know a little about
programming:
=over 4
=item I<Learning Perl> (the "Llama Book")
by Randal L. Schwartz, Tom Phoenix, and brian d foy
ISBN 978-1-4493-0358-7 [6th edition June 2011]
ISBN 978-1-4493-0458-4 [ebook]
http://www.learning-perl.com/
=back
The tutorial started in the Llama continues in the Alpaca, which
introduces the intermediate features of references, data structures,
object-oriented programming, and modules:
=over 4
=item I<Intermediate Perl> (the "Alpaca Book")
by Randal L. Schwartz and brian d foy, with Tom Phoenix
foreword by Damian Conway
ISBN 978-1-4493-9309-0 [2nd edition August 2012]
ISBN 978-1-4493-0459-1 [ebook]
http://www.intermediateperl.com/
=back
=head2 References
You might want to keep these desktop references close by your keyboard:
=over 4
=item I<Perl 5 Pocket Reference>
by Johan Vromans
ISBN 978-1-4493-0370-9 [5th edition July 2011]
ISBN 978-1-4493-0813-1 [ebook]
http://oreilly.com/catalog/0636920018476/
=item I<Perl Debugger Pocket Reference>
by Richard Foley
ISBN 978-0-596-00503-0 [1st edition January 2004]
ISBN 978-0-596-55625-9 [ebook]
http://oreilly.com/catalog/9780596005030/
=item I<Regular Expression Pocket Reference>
by Tony Stubblebine
ISBN 978-0-596-51427-3 [2nd edition July 2007]
ISBN 978-0-596-55782-9 [ebook]
http://oreilly.com/catalog/9780596514273/
=back
=head2 Tutorials
=over 4
=item I<Beginning Perl>
(There are 2 books with this title)
by Curtis 'Ovid' Poe
ISBN 978-1-118-01384-7
http://www.wrox.com/WileyCDA/WroxTitle/productCd-1118013840.html
by James Lee
ISBN 1-59059-391-X [3rd edition April 2010 & ebook]
http://www.apress.com/9781430227939
=item I<Learning Perl> (the "Llama Book")
by Randal L. Schwartz, Tom Phoenix, and brian d foy
ISBN 978-1-4493-0358-7 [6th edition June 2011]
ISBN 978-1-4493-0458-4 [ebook]
http://www.learning-perl.com/
=item I<Intermediate Perl> (the "Alpaca Book")
by Randal L. Schwartz and brian d foy, with Tom Phoenix
foreword by Damian Conway
ISBN 978-1-4493-9309-0 [2nd edition August 2012]
ISBN 978-1-4493-0459-1 [ebook]
http://www.intermediateperl.com/
=item I<Mastering Perl>
by brian d foy
ISBN 978-1-4493-9311-3 [2nd edition January 2014]
ISBN 978-1-4493-6487-8 [ebook]
http://www.masteringperl.org/
=item I<Effective Perl Programming>
by Joseph N. Hall, Joshua A. McAdams, brian d foy
ISBN 0-321-49694-9 [2nd edition 2010]
http://www.effectiveperlprogramming.com/
=back
=head2 Task-Oriented
=over 4
=item I<Writing Perl Modules for CPAN>
by Sam Tregar
ISBN 1-59059-018-X [1st edition August 2002 & ebook]
http://www.apress.com/9781590590188
=item I<The Perl Cookbook>
by Tom Christiansen and Nathan Torkington,
with Foreword by Larry Wall
ISBN 978-0-596-00313-5 [2nd Edition August 2003]
ISBN 978-0-596-15888-0 [ebook]
http://oreilly.com/catalog/9780596003135/
=item I<Automating System Administration with Perl>
by David N. Blank-Edelman
ISBN 978-0-596-00639-6 [2nd edition May 2009]
ISBN 978-0-596-80251-6 [ebook]
http://oreilly.com/catalog/9780596006396
=item I<Real World SQL Server Administration with Perl>
by Linchi Shea
ISBN 1-59059-097-X [1st edition July 2003 & ebook]
http://www.apress.com/9781590590973
=back
=head2 Special Topics
=over 4
=item I<Regular Expressions Cookbook>
by Jan Goyvaerts and Steven Levithan
ISBN 978-1-4493-1943-4 [2nd edition August 2012]
ISBN 978-1-4493-2747-7 [ebook]
http://shop.oreilly.com/product/0636920023630.do
=item I<Programming the Perl DBI>
by Tim Bunce and Alligator Descartes
ISBN 978-1-56592-699-8 [February 2000]
ISBN 978-1-4493-8670-2 [ebook]
http://oreilly.com/catalog/9781565926998
=item I<Perl Best Practices>
by Damian Conway
ISBN 978-0-596-00173-5 [1st edition July 2005]
ISBN 978-0-596-15900-9 [ebook]
http://oreilly.com/catalog/9780596001735
=item I<Higher-Order Perl>
by Mark-Jason Dominus
ISBN 1-55860-701-3 [1st edition March 2005]
free ebook http://hop.perl.plover.com/book/
http://hop.perl.plover.com/
=item I<Mastering Regular Expressions>
by Jeffrey E. F. Friedl
ISBN 978-0-596-52812-6 [3rd edition August 2006]
ISBN 978-0-596-55899-4 [ebook]
http://oreilly.com/catalog/9780596528126
=item I<Network Programming with Perl>
by Lincoln Stein
ISBN 0-201-61571-1 [1st edition 2001]
http://www.pearsonhighered.com/educator/product/Network-Programming-with-Perl/9780201615715.page
=item I<Perl Template Toolkit>
by Darren Chamberlain, Dave Cross, and Andy Wardley
ISBN 978-0-596-00476-7 [December 2003]
ISBN 978-1-4493-8647-4 [ebook]
http://oreilly.com/catalog/9780596004767
=item I<Object Oriented Perl>
by Damian Conway
with foreword by Randal L. Schwartz
ISBN 1-884777-79-1 [1st edition August 1999 & ebook]
http://www.manning.com/conway/
=item I<Data Munging with Perl>
by Dave Cross
ISBN 1-930110-00-6 [1st edition 2001 & ebook]
http://www.manning.com/cross
=item I<Mastering Perl/Tk>
by Steve Lidie and Nancy Walsh
ISBN 978-1-56592-716-2 [1st edition January 2002]
ISBN 978-0-596-10344-6 [ebook]
http://oreilly.com/catalog/9781565927162
=item I<Extending and Embedding Perl>
by Tim Jenness and Simon Cozens
ISBN 1-930110-82-0 [1st edition August 2002 & ebook]
http://www.manning.com/jenness
=item I<Pro Perl Debugging>
by Richard Foley with Andy Lester
ISBN 1-59059-454-1 [1st edition July 2005 & ebook]
http://www.apress.com/9781590594544
=back
=head2 Free (as in beer) books
Some of these books are available as free downloads.
I<Higher-Order Perl>: L<http://hop.perl.plover.com/>
I<Modern Perl>: L<http://onyxneon.com/books/modern_perl/>
=head2 Other interesting, non-Perl books
You might notice several familiar Perl concepts in this collection of
ACM columns from Jon Bentley. The similarity to the title of the major
Perl book (which came later) is not completely accidental:
=over 4
=item I<Programming Pearls>
by Jon Bentley
ISBN 978-0-201-65788-3 [2nd edition, October 1999]
=item I<More Programming Pearls>
by Jon Bentley
ISBN 0-201-11889-0 [January 1988]
=back
=head2 A note on freshness
Each version of Perl comes with the documentation that was current at
the time of release. This poses a problem for content such as book
lists. There are probably very nice books published after this list
was included in your Perl release, and you can check the latest
released version at L<http://perldoc.perl.org/perlbook.html> .
Some of the books we've listed appear almost ancient in internet
scale, but we've included those books because they still describe the
current way of doing things. Not everything in Perl changes every day.
Many of the beginner-level books, too, go over basic features and
techniques that are still valid today. In general though, we try to
limit this list to books published in the past five years.
=head2 Get your book listed
If your Perl book isn't listed and you think it should be, let us know.
L<mailto:perl5-porters@perl.org>
=cut
If you read this file _as_is_, just ignore the funny characters you
see. It is written in the POD format (see F<pod/perlpod.pod>) which is
specially designed to be readable as is.
=head1 NAME
perlcygwin - Perl for Cygwin
=head1 SYNOPSIS
This document will help you configure, make, test and install Perl
on Cygwin. This document also describes features of Cygwin that will
affect how Perl behaves at runtime.
B<NOTE:> There are pre-built Perl packages available for Cygwin and a
version of Perl is provided in the normal Cygwin install. If you do
not need to customize the configuration, consider using one of those
packages.
=head1 PREREQUISITES FOR COMPILING PERL ON CYGWIN
=head2 Cygwin = GNU+Cygnus+Windows (Don't leave UNIX without it)
The Cygwin tools are ports of the popular GNU development tools for Win32
platforms. They run thanks to the Cygwin library which provides the UNIX
system calls and environment these programs expect. More information
about this project can be found at:
L<http://www.cygwin.com/>
A recent net or commercial release of Cygwin is required.
At the time this document was last updated, Cygwin 1.7.16 was current.
=head2 Cygwin Configuration
While building Perl some changes may be necessary to your Cygwin setup so
that Perl builds cleanly. These changes are B<not> required for normal
Perl usage.
B<NOTE:> The binaries that are built will run on all Win32 versions.
They do not depend on your host system (WinXP/Win2K/Win7) or your
Cygwin configuration (binary/text mounts, cygserver).
The only dependencies come from hard-coded pathnames like F</usr/local>.
However, your host system and Cygwin configuration will affect Perl's
runtime behavior (see L</"TEST ON CYGWIN">).
=over 4
=item * C<PATH>
Set the C<PATH> environment variable so that Configure finds the Cygwin
versions of programs. Any Windows directories that are not needed should be
removed or moved to the end of your C<PATH>.
=item * I<nroff>
If you do not have I<nroff> (which is part of the I<groff> package),
Configure will B<not> prompt you to install I<man> pages.
=back
=head1 CONFIGURE PERL ON CYGWIN
The default options gathered by Configure with the assistance of
F<hints/cygwin.sh> will build a Perl that supports dynamic loading
(which requires a shared F<cygperl5_16.dll>).
This will run Configure and keep a record:
./Configure 2>&1 | tee log.configure
If you are willing to accept all the defaults, run Configure with B<-de>.
However, several useful customizations are available.
=head2 Stripping Perl Binaries on Cygwin
It is possible to strip the EXEs and DLLs created by the build process.
The resulting binaries will be significantly smaller. If you want the
binaries to be stripped, you can either add a B<-s> option when Configure
prompts you,
Any additional ld flags (NOT including libraries)? [none] -s
Any special flags to pass to g++ to create a dynamically loaded
library?
[none] -s
Any special flags to pass to gcc to use dynamic linking? [none] -s
or you can edit F<hints/cygwin.sh> and uncomment the relevant variables
near the end of the file.
=head2 Optional Libraries for Perl on Cygwin
Several Perl functions and modules depend on the existence of
some optional libraries. Configure will find them if they are
installed in one of the directories listed as being used for library
searches. Pre-built packages for most of these are available from
the Cygwin installer.
=over 4
=item * C<-lcrypt>
The crypt package distributed with Cygwin is a Linux compatible 56-bit
DES crypt port by Corinna Vinschen.
Alternatively, the crypt libraries in GNU libc have been ported to Cygwin.
As of libcrypt 1.3 (March 2016), you will need to install the
libcrypt-devel package for Configure to detect crypt().
=item * C<-lgdbm_compat> (C<use GDBM_File>)
GDBM is available for Cygwin.
NOTE: The GDBM library only works on NTFS partitions.
=item * C<-ldb> (C<use DB_File>)
BerkeleyDB is available for Cygwin.
NOTE: The BerkeleyDB library only completely works on NTFS partitions.
=item * C<cygserver> (C<use IPC::SysV>)
A port of SysV IPC is available for Cygwin.
NOTE: This has B<not> been extensively tested. In particular,
C<d_semctl_semun> is undefined because it fails a Configure test
and on Win9x the I<shm*()> functions seem to hang. It also creates
a compile time dependency because F<perl.h> includes F<<sys/ipc.h>>
and F<<sys/sem.h>> (which will be required in the future when compiling
CPAN modules). CURRENTLY NOT SUPPORTED!
=item * C<-lutil>
Included with the standard Cygwin net release is the inetutils package,
which includes libutil.a.
=back
=head2 Configure-time Options for Perl on Cygwin
The F<INSTALL> document describes several Configure-time options. Some of
these will work with Cygwin, others are not yet possible. Also, some of
these are experimental. You can either select an option when Configure
prompts you or you can define (undefine) symbols on the command line.
=over 4
=item * C<-Uusedl>
Undefining this symbol forces Perl to be compiled statically.
=item * C<-Dusemymalloc>
By default Perl does not use the C<malloc()> included with the Perl source,
because it was slower and not entirely thread-safe. If you want to force
Perl to build with its own C<malloc()>, define this symbol.
=item * C<-Uuseperlio>
Undefining this symbol disables the PerlIO abstraction. PerlIO is now the
default; it is not recommended to disable PerlIO.
=item * C<-Dusemultiplicity>
Multiplicity is required when embedding Perl in a C program and using
more than one interpreter instance. This is only required when you build
a non-threaded perl with C<-Uuseithreads>.
=item * C<-Uuse64bitint>
By default Perl uses 64-bit integers. If you want to use smaller 32-bit
integers, undefine this symbol.
=item * C<-Duselongdouble>
I<gcc> supports long doubles (12 bytes). However, several additional
long double math functions are necessary to use them within Perl
(I<{atan2, cos, exp, floor, fmod, frexp, isnan, log, modf, pow, sin, sqrt}l,
strtold>).
These are B<not> yet available with newlib, the Cygwin libc.
=item * C<-Uuseithreads>
Undefine this symbol if you want a faster, non-threaded perl.
=item * C<-Duselargefiles>
Cygwin uses 64-bit integers for internal size and position calculations;
this will be correctly detected and defined by Configure.
=item * C<-Dmksymlinks>
Use this to build perl outside of the source tree. Details can be
found in the F<INSTALL> document. This is the recommended way to
build perl from sources.
=back
=head2 Suspicious Warnings on Cygwin
You may see some messages during Configure that seem suspicious.
=over 4
=item * Win9x and C<d_eofnblk>
Win9x does not correctly report C<EOF> with a non-blocking read on a
closed pipe. You will see the following messages:
But it also returns -1 to signal EOF, so be careful!
WARNING: you can't distinguish between EOF and no data!
*** WHOA THERE!!! ***
The recommended value for $d_eofnblk on this machine was
"define"!
Keep the recommended value? [y]
At least for consistency with WinNT, you should keep the recommended
value.
=item * Compiler/Preprocessor defines
The following error occurs because of the Cygwin C<#define> of
C<_LONG_DOUBLE>:
Guessing which symbols your C compiler and preprocessor define...
try.c:<line#>: missing binary operator
This failure does not seem to cause any problems. With older gcc
versions, "parse error" is reported instead of "missing binary
operator".
=back
=head1 MAKE ON CYGWIN
Simply run I<make> and wait:
make 2>&1 | tee log.make
=head1 TEST ON CYGWIN
There are two steps to running the test suite:
make test 2>&1 | tee log.make-test
cd t; ./perl harness 2>&1 | tee ../log.harness
The same tests are run both times, but more information is provided when
running as C<./perl harness>.
Test results vary depending on your host system and your Cygwin
configuration. If a test can pass in some Cygwin setup, it is always
attempted and explainable test failures are documented. It is possible
for Perl to pass all the tests, but it is more likely that some tests
will fail for one of the reasons listed below.
=head2 File Permissions on Cygwin
UNIX file permissions are based on sets of mode bits for
{read,write,execute} for each {user,group,other}. By default Cygwin
only tracks the Win32 read-only attribute represented as the UNIX file
user write bit (files are always readable, files are executable if they
have a F<.{com,bat,exe}> extension or begin with C<#!>, directories are
always readable and executable). On WinNT with the I<ntea> C<CYGWIN>
setting, the additional mode bits are stored as extended file attributes.
On WinNT with the default I<ntsec> C<CYGWIN> setting, permissions use the
standard WinNT security descriptors and access control lists. Without one of
these options, these tests will fail (listing not updated yet):
    Failed Test         List of failed
    ------------------------------------
    io/fs.t             5, 7, 9-10
    lib/anydbm.t        2
    lib/db-btree.t      20
    lib/db-hash.t       16
    lib/db-recno.t      18
    lib/gdbm.t          2
    lib/ndbm.t          2
    lib/odbm.t          2
    lib/sdbm.t          2
    op/stat.t           9, 20 (.tmp not an executable extension)
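The default executable-bit rule described above (a F<.{com,bat,exe}> extension or a leading C<#!>) can be expressed as a short C sketch. This is a reading of the prose, not Cygwin's code: C<toy_looks_executable()> is a hypothetical name, and the case-sensitive extension comparison is a simplification.

```c
#include <string.h>

/* Returns 1 if the file name ends in .com, .bat or .exe, or if the
 * first two bytes of the file content are "#!"; otherwise 0.
 * head points at the first headlen bytes of the file. */
int toy_looks_executable(const char *name, const char *head, size_t headlen)
{
    static const char *const exts[] = { ".com", ".bat", ".exe" };
    size_t len = strlen(name);

    for (size_t i = 0; i < sizeof exts / sizeof exts[0]; i++) {
        if (len >= 4 && strcmp(name + len - 4, exts[i]) == 0)
            return 1;
    }
    return headlen >= 2 && head[0] == '#' && head[1] == '!';
}
```

Under this rule a file named F<run.tmp> that does not start with C<#!> is not executable, which is consistent with the note against the F<op/stat.t> failures in the table.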
=head2 NDBM_File and ODBM_File do not work on FAT filesystems
Do not use NDBM_File or ODBM_File on a FAT filesystem. They can be
built on a FAT filesystem, but many tests will fail:
../ext/NDBM_File/ndbm.t 13 3328 71 59 83.10% 1-2 4 16-71
../ext/ODBM_File/odbm.t 255 65280 ?? ?? % ??
../lib/AnyDBM_File.t 2 512 12 2 16.67% 1 4
../lib/Memoize/t/errors.t 0 139 11 5 45.45% 7-11
../lib/Memoize/t/tie_ndbm.t 13 3328 4 4 100.00% 1-4
run/fresh_perl.t 97 1 1.03% 91
If you intend to run only on FAT (or if using AnyDBM_File on FAT),
run Configure with the -Ui_ndbm and -Ui_dbm options to prevent
NDBM_File and ODBM_File being built.
With NTFS (and no CYGWIN=nontsec), there should be no problems even if
perl was built on FAT.
=head2 C<fork()> failures in io_* tests
A C<fork()> failure may result in the following tests failing:
ext/IO/lib/IO/t/io_multihomed.t
ext/IO/lib/IO/t/io_sock.t
ext/IO/lib/IO/t/io_unix.t
See comment on fork in L</Miscellaneous> below.
=head1 Specific features of the Cygwin port
=head2 Script Portability on Cygwin
Cygwin does an outstanding job of providing UNIX-like semantics on top of
Win32 systems. However, in addition to the items noted above, there are
some differences that you should know about. This is a very brief guide
to portability; more information can be found in the Cygwin documentation.
=over 4
=item * Pathnames
Cygwin pathnames are separated by forward slashes (F</>). Universal
Naming Convention paths (F<//UNC>) are also supported. Since cygwin-1.7,
non-POSIX pathnames are discouraged. Names may contain all printable
characters.
File names are case insensitive, but case preserving. A pathname that
contains a backslash or drive letter is a Win32 pathname, and not
subject to the translations applied to POSIX style pathnames, but
cygwin will warn about it, so it is better to convert such names to POSIX.
For conversion we have C<Cygwin::win_to_posix_path()> and
C<Cygwin::posix_to_win_path()>.
Since cygwin-1.7 pathnames are UTF-8 encoded.
=item * Text/Binary
Since cygwin-1.7 textmounts are deprecated and strongly discouraged.
When a file is opened it is in either text or binary mode. In text mode
a file is subject to CR/LF/Ctrl-Z translations. With Cygwin, the default
mode for an C<open()> is determined by the mode of the mount that underlies
the file. See L</Cygwin::is_binmount>(). Perl provides a C<binmode()> function
to set binary mode on files that otherwise would be treated as text.
C<sysopen()> with the C<O_TEXT> flag sets text mode on files that otherwise
would be treated as binary:
sysopen(FOO, "bar", O_WRONLY|O_CREAT|O_TEXT)
C<lseek()>, C<tell()> and C<sysseek()> only work with files opened in binary
mode.
The text/binary issue is covered at length in the Cygwin documentation.
=item * PerlIO
PerlIO overrides the default Cygwin Text/Binary behaviour. A file will
always be treated as binary, regardless of the mode of the mount it lives
on, just like it is in UNIX. So CR/LF translation needs to be requested in
either the C<open()> call like this:
open(FH, ">:crlf", "out.txt");
which will do conversion from LF to CR/LF on the output, or in the
environment settings (add this to your .bashrc):
export PERLIO=crlf
which will pull in the crlf PerlIO layer which does LF -> CRLF conversion
on every output generated by perl.
=item * F<.exe>
The Cygwin C<stat()>, C<lstat()> and C<readlink()> functions make the F<.exe>
extension transparent by looking for F<foo.exe> when you ask for F<foo>
(unless a F<foo> also exists). Cygwin does not require a F<.exe>
extension, but I<gcc> adds it automatically when building a program.
However, when accessing an executable as a normal file (e.g., I<cp>
in a makefile) the F<.exe> is not transparent. The I<install> program
included with Cygwin automatically appends a F<.exe> when necessary.
=item * Cygwin vs. Windows process ids
Cygwin processes have their own pid, which is different from the
underlying Windows pid. Most POSIX-compliant Proc functions expect
the cygwin pid, but several Win32::Process functions expect the
winpid. E.g. C<$$> is the cygwin pid of F</usr/bin/perl>, which is not
the winpid. Use C<Cygwin::pid_to_winpid()> and C<Cygwin::winpid_to_pid()>
to translate between them.
=item * Cygwin vs. Windows errors
Under Cygwin, $^E is the same as $!. When using L<Win32 API Functions|Win32>,
use C<Win32::GetLastError()> to get the last Windows error.
=item * rebase errors on fork or system
Using C<fork()> or C<system()> out to another perl after loading multiple dlls
may result in a DLL base address conflict. The internal cygwin error
looks like the following:
0 [main] perl 8916 child_info_fork::abort: data segment start:
parent (0xC1A000) != child(0xA6A000)
or:
183 [main] perl 3588 C:\cygwin\bin\perl.exe: *** fatal error -
unable to remap C:\cygwin\bin\cygsvn_subr-1-0.dll to same address
as parent(0x6FB30000) != 0x6FE60000 46 [main] perl 3488 fork: child
3588 - died waiting for dll loading, errno11
See L<http://cygwin.com/faq/faq-nochunks.html#faq.using.fixing-fork-failures>
It helps if not too many DLLs are loaded in memory, so that the available
address space is larger; e.g. stopping the MS Internet Explorer might help.
Use the perlrebase or rebase utilities to resolve the conflicting dll addresses.
The rebase package is included in the Cygwin setup. Use F<setup.exe>
from L<http://www.cygwin.com/setup.exe> to install it.
1. kill all perl processes and run C<perlrebase> or
2. kill all cygwin processes and services, start dash from cmd.exe and run C<rebaseall>.
=item * C<chown()>
On WinNT C<chown()> can change a file's user and group IDs. On Win9x C<chown()>
is a no-op, although this is appropriate since there is no security model.
=item * Miscellaneous
File locking using the C<F_GETLK> command to C<fcntl()> is a stub that
returns C<ENOSYS>.
Win9x can not C<rename()> an open file (although WinNT can).
The Cygwin C<chroot()> implementation has holes (it can not restrict file
access by native Win32 programs).
In-place editing of files with C<perl -i> doesn't work without making a
backup of the file being edited (as with C<perl -i.bak>) because of Windows
restrictions; therefore Perl adds the suffix C<.bak> automatically if you
use C<perl -i> without specifying a backup extension.
=back
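The PerlIO item above says that CR/LF translation has to be requested explicitly. The following is a minimal sketch of that, not part of the port itself: it writes through the C<:crlf> layer and reads the file back with C<:raw> to show the bytes that were stored. The helper name C<write_with_crlf> and the use of File::Temp are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the :crlf layer, then return
# the raw bytes that ended up on disk.
sub write_with_crlf {
    my ($text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open(my $out, '>:crlf', $path) or die "open $path: $!";
    print {$out} $text;
    close($out) or die "close $path: $!";

    open(my $in, '<:raw', $path) or die "open $path: $!";
    local $/;                       # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

my $bytes = write_with_crlf("one\ntwo\n");
printf "%d bytes on disk for 8 characters written\n", length $bytes;
```

Reading back with C<:raw> matters here: reading through C<:crlf> again would undo the translation and hide the extra carriage returns.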
=head2 Prebuilt methods:
=over 4
=item C<Cwd::cwd>
Returns the current working directory.
=item C<Cygwin::pid_to_winpid>
Translates a cygwin pid to the corresponding Windows pid (which may or
may not be the same).
=item C<Cygwin::winpid_to_pid>
Translates a Windows pid to the corresponding cygwin pid (if any).
=item C<Cygwin::win_to_posix_path>
Translates a Windows path to the corresponding cygwin path respecting
the current mount points. With a second non-null argument returns an
absolute path. Double-byte characters will not be translated.
=item C<Cygwin::posix_to_win_path>
Translates a cygwin path to the corresponding Windows path respecting
the current mount points. With a second non-null argument returns an
absolute path. Double-byte characters will not be translated.
=item C<Cygwin::mount_table()>
Returns an array of [mnt_dir, mnt_fsname, mnt_type, mnt_opts].
perl -e 'for $i (Cygwin::mount_table) {print join(" ",@$i),"\n";}'
/bin c:\cygwin\bin system binmode,cygexec
/usr/bin c:\cygwin\bin system binmode
/usr/lib c:\cygwin\lib system binmode
/ c:\cygwin system binmode
/cygdrive/c c: system binmode,noumount
/cygdrive/d d: system binmode,noumount
/cygdrive/e e: system binmode,noumount
=item C<Cygwin::mount_flags>
Returns the mount type and flags for a specified mount point.
A comma-separated string of mntent->mnt_type (always
"system" or "user"), then the mntent->mnt_opts, where
the first is always "binmode" or "textmode".
system|user,binmode|textmode,exec,cygexec,cygdrive,mixed,
notexec,managed,nosuid,devfs,proc,noumount
If the argument is "/cygdrive", then just the volume mount settings,
and the cygdrive mount prefix are returned.
User mounts override system mounts.
$ perl -e 'print Cygwin::mount_flags "/usr/bin"'
system,binmode,cygexec
$ perl -e 'print Cygwin::mount_flags "/cygdrive"'
binmode,cygdrive,/cygdrive
=item C<Cygwin::is_binmount>
Returns true if the given cygwin path is binary mounted, false if the
path is mounted in textmode.
=item C<Cygwin::sync_winenv>
Cygwin does not initialize all original Win32 environment variables.
See the bottom of this page L<http://cygwin.com/cygwin-ug-net/setup-env.html>
for "Restricted Win32 environment".
Certain Win32 programs called from cygwin programs might need some environment
variables; for example, ADODB needs %COMMONPROGRAMFILES%.
Call Cygwin::sync_winenv() to copy all Win32 environment variables to your
process and note that cygwin will warn on every encounter of non-POSIX paths.
=back
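The functions listed above can be combined as in the following sketch. It only calls functions named in this section, and it checks C<$^O> first so that the script still runs (and reports nothing) on a perl that is not built for Cygwin. The helper name C<cygwin_report> and the returned hash layout are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference describing the current
# process and a path, or nothing when perl is not a Cygwin build.
sub cygwin_report {
    my ($posix_path) = @_;
    return unless $^O eq 'cygwin';

    return {
        winpid   => Cygwin::pid_to_winpid($$),
        win_path => Cygwin::posix_to_win_path($posix_path, 1),  # absolute
        binmount => Cygwin::is_binmount($posix_path) ? 1 : 0,
        mounts   => [ Cygwin::mount_table() ],
    };
}

my $report = cygwin_report('/usr/bin');
if ($report) {
    printf "winpid=%d path=%s binary=%d mounts=%d\n",
        $report->{winpid}, $report->{win_path},
        $report->{binmount}, scalar @{ $report->{mounts} };
}
else {
    print "not a Cygwin perl; nothing to report\n";
}
```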
=head1 INSTALL PERL ON CYGWIN
This will install Perl, including I<man> pages.
make install 2>&1 | tee log.make-install
NOTE: If C<STDERR> is redirected, C<make install> will B<not> prompt
you to install I<perl> into F</usr/bin>.
You may need to be I<Administrator> to run C<make install>. If you
are not, you must have write access to the directories in question.
Information on installing the Perl documentation in HTML format can be
found in the F<INSTALL> document.
=head1 MANIFEST ON CYGWIN
These are the files in the Perl release that contain references to Cygwin.
These very brief notes attempt to explain the reason for all conditional
code. Hopefully, keeping this up to date will allow the Cygwin port to
be kept as clean as possible.
=over 4
=item Documentation
INSTALL README.cygwin README.win32 MANIFEST
pod/perl.pod pod/perlport.pod pod/perlfaq3.pod
pod/perldelta.pod pod/perl5004delta.pod pod/perl56delta.pod
pod/perl561delta.pod pod/perl570delta.pod pod/perl572delta.pod
pod/perl573delta.pod pod/perl58delta.pod pod/perl581delta.pod
pod/perl590delta.pod pod/perlhist.pod pod/perlmodlib.pod
pod/perltoc.pod Porting/Glossary pod/perlgit.pod
Porting/checkAUTHORS.pl
dist/Cwd/Changes ext/Compress-Raw-Zlib/Changes
dist/Time-HiRes/Changes
ext/Compress-Raw-Zlib/README ext/Compress-Zlib/Changes
ext/DB_File/Changes ext/Encode/Changes ext/Sys-Syslog/Changes
ext/Win32API-File/Changes
lib/ExtUtils/CBuilder/Changes lib/ExtUtils/Changes
lib/ExtUtils/NOTES lib/ExtUtils/PATCHING lib/ExtUtils/README
lib/Net/Ping/Changes lib/Test/Harness/Changes
lib/Term/ANSIColor/ChangeLog lib/Term/ANSIColor/README
README.symbian symbian/TODO
=item Build, Configure, Make, Install
cygwin/Makefile.SHs
ext/IPC/SysV/hints/cygwin.pl
ext/NDBM_File/hints/cygwin.pl
ext/ODBM_File/hints/cygwin.pl
hints/cygwin.sh
Configure - help finding hints from uname,
shared libperl required for dynamic loading
Makefile.SH Cross/Makefile-cross-SH
- linklibperl
Porting/patchls - cygwin in port list
installman - man pages with :: translated to .
installperl - install dll, install to 'pods'
makedepend.SH - uwinfix
regen_lib.pl - file permissions
NetWare/Makefile
plan9/mkfile
symbian/sanity.pl symbian/sisify.pl
hints/uwin.sh
vms/descrip_mms.template
win32/Makefile win32/makefile.mk
=item Tests
t/io/fs.t - no file mode checks if not ntsec
skip rename() check when not
check_case:relaxed
t/io/tell.t - binmode
t/lib/cygwin.t - builtin cygwin function tests
t/op/groups.t - basegroup has ID = 0
t/op/magic.t - $^X/symlink WORKAROUND, s/.exe//
t/op/stat.t - no /dev, skip Win32 ftCreationTime quirk
(cache manager sometimes preserves ctime of
file previously created and deleted), no -u
(setuid)
t/op/taint.t - can't use empty path under Cygwin Perl
t/op/time.t - no tzset()
=item Compiled Perl Source
EXTERN.h - __declspec(dllimport)
XSUB.h - __declspec(dllexport)
cygwin/cygwin.c - os_extras (getcwd, spawn, and several
Cygwin:: functions)
perl.c - os_extras, -i.bak
perl.h - binmode
doio.c - win9x can not rename a file when it is open
pp_sys.c - do not define h_errno, init
_pwent_struct.pw_comment
util.c - use setenv
util.h - PERL_FILE_IS_ABSOLUTE macro
pp.c - Comment about Posix vs IEEE math under
Cygwin
perlio.c - CR/LF mode
perliol.c - Comment about EXTCONST under Cygwin
=item Compiled Module Source
ext/Compress-Raw-Zlib/Makefile.PL
- Can't install via CPAN shell under Cygwin
ext/Compress-Raw-Zlib/zlib-src/zutil.h
- Cygwin is Unix-like and has vsnprintf
ext/Errno/Errno_pm.PL - Special handling for Win32 Perl under
Cygwin
ext/POSIX/POSIX.xs - tzname defined externally
ext/SDBM_File/sdbm/pair.c
- EXTCONST needs to be redefined from
EXTERN.h
ext/SDBM_File/sdbm/sdbm.c
- binary open
ext/Sys/Syslog/Syslog.xs
- Cygwin has syslog.h
ext/Sys/Syslog/win32/compile.pl
- Convert paths to Windows paths
ext/Time-HiRes/HiRes.xs
- Various timers not available
ext/Time-HiRes/Makefile.PL
- Find w32api/windows.h
ext/Win32/Makefile.PL - Use various libraries under Cygwin
ext/Win32/Win32.xs - Child dir and child env under Cygwin
ext/Win32API-File/File.xs
- _open_osfhandle not implemented under
Cygwin
ext/Win32CORE/Win32CORE.c
- __declspec(dllexport)
=item Perl Modules/Scripts
ext/B/t/OptreeCheck.pm - Comment about stderr/stdout order under
Cygwin
ext/Digest-SHA/bin/shasum
- Use binary mode under Cygwin
ext/Sys/Syslog/win32/Win32.pm
- Convert paths to Windows paths
ext/Time-HiRes/HiRes.pm
- Comment about various timers not available
ext/Win32API-File/File.pm
- _open_osfhandle not implemented under
Cygwin
ext/Win32CORE/Win32CORE.pm
- History of Win32CORE under Cygwin
lib/Cwd.pm - hook to internal Cwd::cwd
lib/ExtUtils/CBuilder/Platform/cygwin.pm
- use gcc for ld, and link to libperl.dll.a
lib/ExtUtils/CBuilder.pm
- Cygwin is Unix-like
lib/ExtUtils/Install.pm - Install and rename issues under Cygwin
lib/ExtUtils/MM.pm - OS classifications
lib/ExtUtils/MM_Any.pm - Example for Cygwin
lib/ExtUtils/MakeMaker.pm
- require MM_Cygwin.pm
lib/ExtUtils/MM_Cygwin.pm
- canonpath, cflags, manifypods, perl_archive
lib/File/Fetch.pm - Comment about quotes using a Cygwin example
lib/File/Find.pm - on remote drives stat() always sets
st_nlink to 1
lib/File/Spec/Cygwin.pm - case_tolerant
lib/File/Spec/Unix.pm - preserve //unc
lib/File/Spec/Win32.pm - References a message on cygwin.com
lib/File/Spec.pm - Pulls in lib/File/Spec/Cygwin.pm
lib/File/Temp.pm - no directory sticky bit
lib/Module/CoreList.pm - List of all module files and versions
lib/Net/Domain.pm - No domainname command under Cygwin
lib/Net/Netrc.pm - Bypass using stat() under Cygwin
lib/Net/Ping.pm - ECONNREFUSED is EAGAIN under Cygwin
lib/Pod/Find.pm - Set 'pods' dir
lib/Pod/Perldoc/ToMan.pm - '-c' switch for pod2man
lib/Pod/Perldoc.pm - Use 'less' pager, and use .exe extension
lib/Term/ANSIColor.pm - Cygwin terminal info
lib/perl5db.pl - use stdin not /dev/tty
utils/perlbug.PL - Add CYGWIN environment variable to report
=item Perl Module Tests
dist/Cwd/t/cwd.t
ext/Compress-Zlib/t/14gzopen.t
ext/DB_File/t/db-btree.t
ext/DB_File/t/db-hash.t
ext/DB_File/t/db-recno.t
ext/DynaLoader/t/DynaLoader.t
ext/File-Glob/t/basic.t
ext/GDBM_File/t/gdbm.t
ext/POSIX/t/sysconf.t
ext/POSIX/t/time.t
ext/SDBM_File/t/sdbm.t
ext/Sys/Syslog/t/syslog.t
ext/Time-HiRes/t/HiRes.t
ext/Win32/t/Unicode.t
ext/Win32API-File/t/file.t
ext/Win32CORE/t/win32core.t
lib/AnyDBM_File.t
lib/Archive/Extract/t/01_Archive-Extract.t
lib/Archive/Tar/t/02_methods.t
lib/ExtUtils/t/Embed.t
lib/ExtUtils/t/eu_command.t
lib/ExtUtils/t/MM_Cygwin.t
lib/ExtUtils/t/MM_Unix.t
lib/File/Compare.t
lib/File/Copy.t
lib/File/Find/t/find.t
lib/File/Path.t
lib/File/Spec/t/crossplatform.t
lib/File/Spec/t/Spec.t
lib/Net/hostent.t
lib/Net/Ping/t/110_icmp_inst.t
lib/Net/Ping/t/500_ping_icmp.t
lib/Net/t/netrc.t
lib/Pod/Simple/t/perlcyg.pod
lib/Pod/Simple/t/perlcygo.txt
lib/Pod/Simple/t/perlfaq.pod
lib/Pod/Simple/t/perlfaqo.txt
lib/User/grent.t
lib/User/pwent.t
=back
=head1 BUGS ON CYGWIN
Support for swapping real and effective user and group IDs is incomplete.
On WinNT Cygwin provides C<setuid()>, C<seteuid()>, C<setgid()> and C<setegid()>.
However, additional Cygwin calls for manipulating WinNT access tokens
and security contexts are required.
=head1 AUTHORS
Charles Wilson <cwilson@ece.gatech.edu>,
Eric Fifer <egf7@columbia.edu>,
alexander smishlajev <als@turnhere.com>,
Steven Morlock <newspost@morlock.net>,
Sebastien Barre <Sebastien.Barre@utc.fr>,
Teun Burgers <burgers@ecn.nl>,
Gerrit P. Haase <gp@familiehaase.de>,
Reini Urban <rurban@cpan.org>,
Jan Dubois <jand@activestate.com>,
Jerry D. Hedden <jdhedden@cpan.org>.
=head1 HISTORY
Last updated: 2012-02-08
=head1 NAME
perlunicode - Unicode support in Perl
=head1 DESCRIPTION
If you haven't already, before reading this document, you should become
familiar with both L<perlunitut> and L<perluniintro>.
Unicode aims to B<UNI>-fy the en-B<CODE>-ings of all the world's
character sets into a single Standard. For quite a few of the various
coding standards that existed when Unicode was first created, converting
from each to Unicode essentially meant adding a constant to each code
point in the original standard, and converting back meant just
subtracting that same constant. For ASCII and ISO-8859-1, the constant
is 0. For ISO-8859-5 (Cyrillic), the constant is 864; for Hebrew
(ISO-8859-8), it's 1264; Thai (ISO-8859-11), 3424; and so forth. This
made it easy to do the conversions, and facilitated the adoption of
Unicode.
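The constant-offset idea can be checked with a short sketch. It adds the Cyrillic and Thai constants quoted above to one letter from each legacy standard and compares the result with what the core Encode module returns. The helper name C<legacy_to_unicode> is an assumption made for illustration, and the offset holds only for the letters of each standard, not for every byte value.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: convert one legacy byte to a Unicode code point
# by adding the per-standard constant.
sub legacy_to_unicode {
    my ($byte, $constant) = @_;
    return ord($byte) + $constant;
}

# 0xB0 is CYRILLIC CAPITAL LETTER A in ISO-8859-5;
# 0xA1 is THAI CHARACTER KO KAI in ISO-8859-11.
for my $case (['iso-8859-5', "\xB0", 864], ['iso-8859-11', "\xA1", 3424]) {
    my ($encoding, $byte, $constant) = @$case;
    my $by_offset = legacy_to_unicode($byte, $constant);
    my $by_encode = ord(decode($encoding, $byte));
    printf "%-11s 0x%02X -> U+%04X (Encode gives U+%04X)\n",
        $encoding, ord($byte), $by_offset, $by_encode;
}
```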
And it worked; nowadays, those legacy standards are rarely used. Most
everyone uses Unicode.
Unicode is a comprehensive standard. It specifies many things outside
the scope of Perl, such as how to display sequences of characters. For
a full discussion of all aspects of Unicode, see
L<http://www.unicode.org>.
=head2 Important Caveats
Even though some of this section may not be understandable to you on
first reading, we think it's important enough to highlight some of the
gotchas before delving further, so here goes:
Unicode support is an extensive requirement. While Perl does not
implement the Unicode standard or the accompanying technical reports
from cover to cover, Perl does support many Unicode features.
Also, the use of Unicode may present security issues that aren't
obvious, see L</Security Implications of Unicode>.
=over 4
=item Safest if you C<use feature 'unicode_strings'>
In order to preserve backward compatibility, Perl does not turn
on full internal Unicode support unless the pragma
L<S<C<use feature 'unicode_strings'>>|feature/The 'unicode_strings' feature>
is specified. (This is automatically
selected if you S<C<use 5.012>> or higher.) Failure to do this can
cause surprises. See L</The "Unicode Bug"> below.
This pragma doesn't affect I/O. Nor does it change the internal
representation of strings, only their interpretation. There are still
several places where Unicode isn't fully supported, such as in
filenames.
=item Input and Output Layers
Use the C<:encoding(...)> layer to read from and write to
filehandles using the specified encoding. (See L<open>.)
=item You should convert your non-ASCII, non-UTF-8 Perl scripts to be
UTF-8.
See L<encoding>.
=item C<use utf8> still needed to enable L<UTF-8|/Unicode Encodings> in scripts
If your Perl script is itself encoded in L<UTF-8|/Unicode Encodings>,
the S<C<use utf8>> pragma must be explicitly included to enable
recognition of that (in string or regular expression literals, or in
identifier names). B<This is the only time when an explicit S<C<use
utf8>> is needed.> (See L<utf8>).
If a Perl script begins with the bytes that form the UTF-8 encoding of
the Unicode BYTE ORDER MARK (C<BOM>, see L</Unicode Encodings>), those
bytes are completely ignored.
=item L<UTF-16|/Unicode Encodings> scripts autodetected
If a Perl script begins with the Unicode C<BOM> (UTF-16LE,
UTF-16BE), or if the script looks like non-C<BOM>-marked
UTF-16 of either endianness, Perl will correctly read in the script as
the appropriate Unicode encoding.
=back
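As an illustration of the "Input and Output Layers" item above, the following sketch writes a string through an C<:encoding(UTF-8)> layer and reads it back through the same layer. It reports the file size in bytes and the string length in characters, which differ as soon as a non-ASCII character is involved. The helper name C<utf8_round_trip> and the use of File::Temp are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $string as UTF-8, then return the size on
# disk in bytes, the length in characters after reading it back, and a
# flag saying whether the round trip preserved the string.
sub utf8_round_trip {
    my ($string) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open(my $out, '>:encoding(UTF-8)', $path) or die "open $path: $!";
    print {$out} $string;
    close($out) or die "close $path: $!";

    open(my $in, '<:encoding(UTF-8)', $path) or die "open $path: $!";
    local $/;                       # slurp the whole file
    my $decoded = <$in>;
    close $in;

    my $size = -s $path;
    return ($size, length($decoded), $decoded eq $string ? 1 : 0);
}

my ($bytes, $chars, $same) = utf8_round_trip("caf\x{E9} \x{263A}");
print "bytes=$bytes chars=$chars identical=$same\n";
```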
=head2 Byte and Character Semantics
Before Unicode, most encodings used 8 bits (a single byte) to encode
each character. Thus a character was a byte, and a byte was a
character, and there could be only 256 or fewer possible characters.
"Byte Semantics" in the title of this section refers to
this behavior. There was no need to distinguish between "Byte" and
"Character".
Then along comes Unicode which has room for over a million characters
(and Perl allows for even more). This means that a character may
require more than a single byte to represent it, and so the two terms
are no longer equivalent. What matter are the characters as whole
entities, and not usually the bytes that comprise them. That's what the
term "Character Semantics" in the title of this section refers to.
Perl had to change internally to decouple "bytes" from "characters".
It is important that you too change your ideas, if you haven't already,
so that "byte" and "character" no longer mean the same thing in your
mind.
The basic building block of Perl strings has always been a "character".
The changes basically come down to the implementation no longer
assuming that a character is always just a single byte.
There are various things to note:
=over 4
=item *
String handling functions, for the most part, continue to operate in
terms of characters. C<length()>, for example, returns the number of
characters in a string, just as before. But that number no longer is
necessarily the same as the number of bytes in the string (there may be
more bytes than characters). The other such functions include
C<chop()>, C<chomp()>, C<substr()>, C<pos()>, C<index()>, C<rindex()>,
C<sort()>, C<sprintf()>, and C<write()>.
The exceptions are:
=over 4
=item *
the bit-oriented C<vec>
E<nbsp>
=item *
the byte-oriented C<pack>/C<unpack> C<"C"> format
However, the C<W> specifier does operate on whole characters, as does the
C<U> specifier.
=item *
some operators that interact with the platform's operating system
Operators dealing with filenames are examples.
=item *
when the functions are called from within the scope of the
S<C<L<use bytes|bytes>>> pragma
Likely, you should use this only for debugging anyway.
=back
=item *
Strings--including hash keys--and regular expression patterns may
contain characters that have ordinal values larger than 255.
If you use a Unicode editor to edit your program, Unicode characters may
occur directly within the literal strings in UTF-8 encoding, or UTF-16.
(The former requires a C<use utf8>, the latter may require a C<BOM>.)
L<perluniintro/Creating Unicode> gives other ways to place non-ASCII
characters in your strings.
=item *
The C<chr()> and C<ord()> functions work on whole characters.
=item *
Regular expressions match whole characters. For example, C<"."> matches
a whole character instead of only a single byte.
=item *
The C<tr///> operator translates whole characters. (Note that the
C<tr///CU> functionality has been removed. For similar functionality to
that, see C<pack('U0', ...)> and C<pack('C0', ...)>).
=item *
C<scalar reverse()> reverses by character rather than by byte.
=item *
The bit string operators, C<& | ^ ~> and (starting in v5.22)
C<&. |. ^. ~.> can operate on characters that don't fit into a byte.
However, the current behavior is likely to change. You should not use
these operators on strings that are encoded in UTF-8. If you're not
sure about the encoding of a string, downgrade it before using any of
these operators; you can use
L<C<utf8::downgrade()>|utf8/Utility functions>.
=back
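The points in the list above can be seen in a small sketch. It compares the length of a string in characters with the length of its UTF-8 encoding in bytes, and shows that C<ord()>, C<reverse> and C<"."> treat a character above 255 as one unit. The helper name C<char_and_byte_length> is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the length of a string in characters and
# in UTF-8 encoded bytes.
sub char_and_byte_length {
    my ($string) = @_;
    my $octets = $string;           # work on a copy
    utf8::encode($octets);          # characters -> UTF-8 bytes, in place
    return (length($string), length($octets));
}

my $smiley = "ab\x{263A}";          # WHITE SMILING FACE takes 3 bytes in UTF-8
my ($chars, $bytes) = char_and_byte_length($smiley);
print "characters=$chars bytes=$bytes\n";

# ord(), reverse and "." all see the smiley as a single character.
printf "last character is U+%04X\n", ord(substr($smiley, -1));
printf "reversed string starts with U+%04X\n", ord(scalar reverse($smiley));
print  "dot matched one whole character\n" if $smiley =~ /^ab.$/;
```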
The bottom line is that Perl has always practiced "Character Semantics",
but with the advent of Unicode, that is now different than "Byte
Semantics".
=head2 ASCII Rules versus Unicode Rules
Before Unicode, when a character was a byte was a character,
Perl knew only about the 128 characters defined by ASCII, code points 0
through 127 (except for under L<S<C<use locale>>|perllocale>). That
left the code
points 128 to 255 as unassigned, and available for whatever use a
program might want. The only semantics they have is their ordinal
numbers, and that they are members of none of the non-negative character
classes. None are considered to match C<\w> for example, but all match
C<\W>.
Unicode, of course, assigns each of those code points a particular
meaning (along with ones above 255). To preserve backward
compatibility, Perl only uses the Unicode meanings when there is some
indication that Unicode is what is intended; otherwise the non-ASCII
code points remain treated as if they are unassigned.
Here are the ways that Perl knows that a string should be treated as
Unicode:
=over
=item *
Within the scope of S<C<use utf8>>
If the whole program is Unicode (signified by using 8-bit B<U>nicode
B<T>ransformation B<F>ormat), then all strings within it must be
Unicode.
=item *
Within the scope of
L<S<C<use feature 'unicode_strings'>>|feature/The 'unicode_strings' feature>
This pragma was created so you can explicitly tell Perl that operations
executed within its scope are to use Unicode rules. More operations are
affected with newer perls. See L</The "Unicode Bug">.
=item *
Within the scope of S<C<use 5.012>> or higher
This implicitly turns on S<C<use feature 'unicode_strings'>>.
=item *
Within the scope of
L<S<C<use locale 'not_characters'>>|perllocale/Unicode and UTF-8>,
or L<S<C<use locale>>|perllocale> and the current
locale is a UTF-8 locale.
The former is defined to imply Unicode handling; and the latter
indicates a Unicode locale, hence a Unicode interpretation of all
strings within it.
=item *
When the string contains a Unicode-only code point
Perl has never accepted code points above 255 without them being
Unicode, so their use implies Unicode for the whole string.
=item *
When the string contains a Unicode named code point C<\N{...}>
The C<\N{...}> construct explicitly refers to a Unicode code point,
even if it is one that is also in ASCII. Therefore the string
containing it must be Unicode.
=item *
When the string has come from an external source marked as
Unicode
The L<C<-C>|perlrun/-C [numberE<sol>list]> command line option can
specify that certain inputs to the program are Unicode, and the values
of this can be read by your Perl code, see L<perlvar/"${^UNICODE}">.
=item * When the string has been upgraded to UTF-8
The function L<C<utf8::upgrade()>|utf8/Utility functions>
can be explicitly used to permanently (unless a subsequent
C<utf8::downgrade()> is called) cause a string to be treated as
Unicode.
=item * There are additional methods for regular expression patterns
A pattern that is compiled with the C<< /u >> or C<< /a >> modifiers is
treated as Unicode (though there are some restrictions with C<< /a >>).
Under the C<< /d >> and C<< /l >> modifiers, there are several other
indications for Unicode; see L<perlre/Character set modifiers>.
=back
Note that all of the above are overridden within the scope of
C<L<use bytes|bytes>>; but you should be using this pragma only for
debugging.
Note also that some interactions with the platform's operating system
never use Unicode rules.
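Several of the indications listed above can be compared side by side. The sketch below tests whether code point 233 (LATIN SMALL LETTER E WITH ACUTE) matches C<\w> with and without the C<unicode_strings> feature, under the C</u> and C</a> modifiers, and after the string has been upgraded. The sub names are assumptions made for illustration, and the upgrade uses C<utf8::upgrade()>, the name under which the utf8 module documents that function.

```perl
use strict;
use warnings;

my $e_acute = "\xE9";   # LATIN SMALL LETTER E WITH ACUTE, code point 233

# No indication of Unicode: code points 128..255 are in no word class.
sub matches_word_native {
    no feature 'unicode_strings';
    return $_[0] =~ /^\w$/ ? 1 : 0;
}

# The feature requests Unicode rules for operations in this scope.
sub matches_word_unicode {
    use feature 'unicode_strings';
    return $_[0] =~ /^\w$/ ? 1 : 0;
}

printf "native rules:  %d\n", matches_word_native($e_acute);
printf "unicode rules: %d\n", matches_word_unicode($e_acute);
printf "/u modifier:   %d\n", ($e_acute =~ /^\w$/u ? 1 : 0);
printf "/a modifier:   %d\n", ($e_acute =~ /^\w$/a ? 1 : 0);

# Upgrading the string is another indication of Unicode.
my $upgraded = $e_acute;
utf8::upgrade($upgraded);
printf "after upgrade: %d\n", matches_word_native($upgraded);
```

The two strings compare equal with C<eq>; only the rules applied to them differ.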
When Unicode rules are in effect:
=over 4
=item *
Case translation operators use the Unicode case translation tables.
Note that C<uc()>, or C<\U> in interpolated strings, translates to
uppercase, while C<ucfirst>, or C<\u> in interpolated strings,
translates to titlecase in languages that make the distinction (which is
equivalent to uppercase in languages without the distinction).
There is a CPAN module, C<L<Unicode::Casing>>, which allows you to
define your own mappings to be used in C<lc()>, C<lcfirst()>, C<uc()>,
C<ucfirst()>, and C<fc> (or their double-quoted string inlined versions
such as C<\U>). (Prior to Perl 5.16, this functionality was partially
provided in the Perl core, but suffered from a number of insurmountable
drawbacks, so the CPAN module was written instead.)
=item *
Character classes in regular expressions match based on the character
properties specified in the Unicode properties database.
C<\w> can be used to match a Japanese ideograph, for instance; and
C<[[:digit:]]> a Bengali number.
=item *
Named Unicode properties, scripts, and block ranges may be used (like
bracketed character classes) by using the C<\p{}> "matches property"
construct and the C<\P{}> negation, "doesn't match property".
See L</"Unicode Character Properties"> for more details.
You can define your own character properties and use them
in the regular expression with the C<\p{}> or C<\P{}> construct.
See L</"User-Defined Character Properties"> for more details.
=back
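The effects listed above can be observed with a few one-character strings. This sketch uses U+01C6 (LATIN SMALL LETTER DZ WITH CARON), which has distinct uppercase and titlecase forms, plus a CJK ideograph and a Bengali digit for the character classes. The helper name C<upper_and_title> is an assumption made for illustration.

```perl
use strict;
use warnings;
use feature 'unicode_strings';

# Hypothetical helper: code points produced by uc() and ucfirst()
# for a single character.
sub upper_and_title {
    my ($char) = @_;
    return (ord(uc($char)), ord(ucfirst($char)));
}

# uc() gives the uppercase form, ucfirst() the titlecase form.
my ($upper, $title) = upper_and_title("\x{01C6}");
printf "uc -> U+%04X, ucfirst -> U+%04X\n", $upper, $title;

# U+65E5 is a CJK ideograph; U+09E9 is BENGALI DIGIT THREE.
print "ideograph matches \\w\n"             if "\x{65E5}" =~ /^\w$/;
print "Bengali digit matches [[:digit:]]\n" if "\x{09E9}" =~ /^[[:digit:]]$/;
```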
=head2 Extended Grapheme Clusters (Logical characters)
Consider a character, say C<H>. It could appear with various marks around it,
such as an acute accent, or a circumflex, or various hooks, circles, arrows,
I<etc.>, above, below, to one side or the other, I<etc>. There are many
possibilities among the world's languages. The number of combinations is
astronomical, and if there were a character for each combination, it would
soon exhaust Unicode's more than a million possible characters. So Unicode
took a different approach: there is a character for the base C<H>, and a
character for each of the possible marks, and these can be variously combined
to get a final logical character. So a logical character--what appears to be a
single character--can be a sequence of more than one individual character.
The Unicode standard calls these "extended grapheme clusters" (which
is an improved version of the no-longer much used "grapheme cluster");
Perl furnishes the C<\X> regular expression construct to match such
sequences in their entirety.
But Unicode's intent is to unify the existing character set standards and
practices, and several pre-existing standards have single characters that
mean the same thing as some of these combinations, like ISO-8859-1,
which has quite a few of them. For example, C<"LATIN CAPITAL LETTER E
WITH ACUTE"> was already in this standard when Unicode came along.
Unicode therefore added it to its repertoire as that single character.
But this character is considered by Unicode to be equivalent to the
sequence consisting of the character C<"LATIN CAPITAL LETTER E">
followed by the character C<"COMBINING ACUTE ACCENT">.
C<"LATIN CAPITAL LETTER E WITH ACUTE"> is called a "pre-composed"
character, and its equivalence with the "E" and the "COMBINING ACUTE ACCENT"
sequence is called canonical equivalence. All pre-composed characters
are said to have a decomposition (into the equivalent sequence), and the
decomposition type is also called canonical. A string may be comprised
as much as possible of precomposed characters, or it may be comprised of
entirely decomposed characters. Unicode calls these respectively,
"Normalization Form Composed" (NFC) and "Normalization Form Decomposed" (NFD).
The C<L<Unicode::Normalize>> module contains functions that convert
between the two. A string may also have both composed characters and
decomposed characters; this module can be used to make it all one or the
other.
You may be presented with strings in any of these equivalent forms.
There is currently nothing in Perl 5 that ignores the differences. So
you'll have to specially handle it. The usual advice is to convert your
inputs to C<NFD> before processing further.
For more detailed information, see L<http://unicode.org/reports/tr15/>.
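The following sketch puts the two forms of E with an acute accent next to each other. It counts extended grapheme clusters with C<\X> and uses the C<NFC> and C<NFD> functions of Unicode::Normalize to make the strings comparable. The helper name C<grapheme_count> is an assumption made for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: count extended grapheme clusters with \X.
sub grapheme_count {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;
    return $count;
}

my $decomposed  = "E\x{0301}";  # LATIN CAPITAL LETTER E + COMBINING ACUTE ACCENT
my $precomposed = "\x{C9}";     # LATIN CAPITAL LETTER E WITH ACUTE

printf "decomposed: %d code points, %d grapheme cluster\n",
    length($decomposed), grapheme_count($decomposed);
printf "equal as-is:  %s\n", ($decomposed eq $precomposed ? 'yes' : 'no');
printf "equal in NFD: %s\n",
    (NFD($decomposed) eq NFD($precomposed) ? 'yes' : 'no');
printf "equal in NFC: %s\n",
    (NFC($decomposed) eq NFC($precomposed) ? 'yes' : 'no');
```

Normalizing both sides to the same form before comparing is the point; which form is chosen matters less than using it consistently.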
=head2 Unicode Character Properties
(The only time that Perl considers a sequence of individual code
points as a single logical character is in the C<\X> construct, already
mentioned above. Therefore "character" in this discussion means a single
Unicode code point.)
Very nearly all Unicode character properties are accessible through
regular expressions by using the C<\p{}> "matches property" construct
and the C<\P{}> "doesn't match property" for its negation.
For instance, C<\p{Uppercase}> matches any single character with the Unicode
C<"Uppercase"> property, while C<\p{L}> matches any character with a
C<General_Category> of C<"L"> (letter) property (see
L</General_Category> below). Brackets are not
required for single letter property names, so C<\p{L}> is equivalent to C<\pL>.
More formally, C<\p{Uppercase}> matches any single character whose Unicode
C<Uppercase> property value is C<True>, and C<\P{Uppercase}> matches any character
whose C<Uppercase> property value is C<False>, and they could have been written as
C<\p{Uppercase=True}> and C<\p{Uppercase=False}>, respectively.
This formality is needed when properties are not binary; that is, if they can
take on more values than just C<True> and C<False>. For example, the
C<Bidi_Class> property (see L</"Bidirectional Character Types"> below),
can take on several different
values, such as C<Left>, C<Right>, C<Whitespace>, and others. To match these, one needs
to specify both the property name (C<Bidi_Class>), AND the value being
matched against
(C<Left>, C<Right>, I<etc.>). This is done, as in the examples above, by having the
two components separated by an equal sign (or interchangeably, a colon), like
C<\p{Bidi_Class: Left}>.
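The single and compound forms can be tried out with a short sketch. It checks a few ASCII letters against binary properties and against C<Bidi_Class>. The helper name C<has_property> is an assumption made for illustration, and the C<Bidi_Class> values are written as C<L> and C<Left_To_Right>, the short and long value names used in the Unicode property value aliases.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $char matches the compiled pattern $re.
sub has_property {
    my ($char, $re) = @_;
    return $char =~ $re ? 1 : 0;
}

my @checks = (
    [ 'A', '\p{Uppercase}',                 qr/^\p{Uppercase}$/                 ],
    [ 'A', '\p{Uppercase=True}',            qr/^\p{Uppercase=True}$/            ],
    [ 'a', '\P{Uppercase}',                 qr/^\P{Uppercase}$/                 ],
    [ 'a', '\p{Uppercase=False}',           qr/^\p{Uppercase=False}$/           ],
    [ 'a', '\pL',                           qr/^\pL$/                           ],
    [ 'A', '\p{Bidi_Class: L}',             qr/^\p{Bidi_Class: L}$/             ],
    [ 'A', '\p{Bidi_Class=Left_To_Right}',  qr/^\p{Bidi_Class=Left_To_Right}$/  ],
);

for my $check (@checks) {
    my ($char, $label, $re) = @$check;
    printf "%s %-30s %s\n", $char, $label,
        has_property($char, $re) ? 'matches' : 'does not match';
}
```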
All Unicode-defined character properties may be written in these compound forms
of C<\p{I<property>=I<value>}> or C<\p{I<property>:I<value>}>, but Perl provides some
additional properties that are written only in the single form, as well as
single-form short-cuts for all binary properties and certain others described
below, in which you may omit the property name and the equals or colon
separator.
Most Unicode character properties have at least two synonyms (or aliases if you
prefer): a short one that is easier to type and a longer one that is more
descriptive and hence easier to understand. Thus the C<"L"> and
C<"Letter"> properties above are equivalent and can be used
interchangeably. Likewise, C<"Upper"> is a synonym for C<"Uppercase">,
and we could have written C<\p{Uppercase}> equivalently as C<\p{Upper}>.
Also, there are typically various synonyms for the values the property
can be. For binary properties, C<"True"> has 3 synonyms: C<"T">,
C<"Yes">, and C<"Y">; and C<"False"> has correspondingly C<"F">,
C<"No">, and C<"N">. But be careful. A short form of a value for one
property may not mean the same thing as the same short form for another.
Thus, for the C<L</General_Category>> property, C<"L"> means
C<"Letter">, but for the L<C<Bidi_Class>|/Bidirectional Character Types>
property, C<"L"> means C<"Left">. A complete list of properties and
synonyms is in L<perluniprops>.
Upper/lower case differences in property names and values are irrelevant;
thus C<\p{Upper}> means the same thing as C<\p{upper}> or even C<\p{UpPeR}>.
Similarly, you can add or subtract underscores anywhere in the middle of a
word, so that these are also equivalent to C<\p{U_p_p_e_r}>. And white space
is irrelevant adjacent to non-word characters, such as the braces and the equals
or colon separators, so C<\p{ Upper }> and C<\p{ Upper_case : Y }> are
equivalent to these as well. In fact, white space and even
hyphens can usually be added or deleted anywhere. So even C<\p{ Up-per case = Yes}> is
equivalent. All this is called "loose-matching" by Unicode. The few places
where stricter matching is used are in the middle of numbers, and in the Perl
extension properties that begin or end with an underscore. Stricter matching
cares about white space (except adjacent to non-word characters),
hyphens, and non-interior underscores.
You can also use negation in both C<\p{}> and C<\P{}> by introducing a caret
(C<^>) between the first brace and the property name: C<\p{^Tamil}> is
equal to C<\P{Tamil}>.
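The loose-matching and caret rules can be checked with a short script. This is only a sketch: the list of spellings is taken from the paragraphs above, and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Spellings that this section describes as equivalent under loose matching.
my @spellings = ('Upper', 'upper', 'UpPeR', 'U_p_p_e_r',
                 ' Upper ', ' Upper_case : Y ', 'Uppercase=True');

# For each spelling, record whether it matches "A" and rejects "a".
my %ok;
for my $name (@spellings) {
    my $pattern = '\A\p{' . $name . '}\z';
    $ok{$name} = ("A" =~ /$pattern/ && "a" !~ /$pattern/) ? 1 : 0;
}

# A caret after the opening brace negates: \p{^Tamil} acts like \P{Tamil}.
my $tamil_a = "\x{0B85}";    # TAMIL LETTER A
my $caret_negates = ($tamil_a !~ /\p{^Tamil}/ && "A" =~ /\p{^Tamil}/) ? 1 : 0;

print "$_ => $ok{$_}\n" for @spellings;
print "caret negates => $caret_negates\n";
```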
Almost all properties are immune to case-insensitive matching. That is,
adding a C</i> regular expression modifier does not change what they
match. There are two sets that are affected.
The first set is
C<Uppercase_Letter>,
C<Lowercase_Letter>,
and C<Titlecase_Letter>,
all of which match C<Cased_Letter> under C</i> matching.
And the second set is
C<Uppercase>,
C<Lowercase>,
and C<Titlecase>,
all of which match C<Cased> under C</i> matching.
This set also includes its subsets C<PosixUpper> and C<PosixLower> both
of which under C</i> match C<PosixAlpha>.
(The difference between these sets is that some things, such as Roman
numerals, come in both upper and lower case so they are C<Cased>, but
aren't considered letters, so they aren't C<Cased_Letter>'s.)
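The effect of C</i> on the two sets can be seen with a lowercase letter and a Roman numeral. The sketch below assumes only the behavior described above; the hash keys are arbitrary labels.

```perl
use strict;
use warnings;

my $small_roman_one = "\x{2170}";  # SMALL ROMAN NUMERAL ONE: cased, not a letter

my %result = (
    # "a" is not an uppercase letter ...
    lu_plain       => ("a" =~ /\p{Lu}/)                      ? 1 : 0,
    # ... but under /i, \p{Lu} matches anything in Cased_Letter.
    lu_caseless    => ("a" =~ /\p{Lu}/i)                     ? 1 : 0,
    # Under /i, \p{Uppercase} matches anything in Cased.
    upper_caseless => ($small_roman_one =~ /\p{Uppercase}/i) ? 1 : 0,
    # The numeral is not a letter, so the Cased_Letter set excludes it.
    lu_roman       => ($small_roman_one =~ /\p{Lu}/i)        ? 1 : 0,
);

print "$_ => $result{$_}\n" for sort keys %result;
```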
See L</Beyond Unicode code points> for special considerations when
matching Unicode properties against non-Unicode code points.
=head3 B<General_Category>
Every Unicode character is assigned a general category, which is the "most
usual categorization of a character" (from
L<http://www.unicode.org/reports/tr44>).
The compound way of writing these is like C<\p{General_Category=Number}>
(short: C<\p{gc:n}>). But Perl furnishes shortcuts in which everything up
through the equal or colon separator is omitted. So you can instead just write
C<\pN>.
Here are the short and long forms of the values the C<General Category> property
can have:
 Short       Long
 L           Letter
 LC, L&      Cased_Letter (that is: [\p{Ll}\p{Lu}\p{Lt}])
 Lu          Uppercase_Letter
 Ll          Lowercase_Letter
 Lt          Titlecase_Letter
 Lm          Modifier_Letter
 Lo          Other_Letter
 M           Mark
 Mn          Nonspacing_Mark
 Mc          Spacing_Mark
 Me          Enclosing_Mark
 N           Number
 Nd          Decimal_Number (also Digit)
 Nl          Letter_Number
 No          Other_Number
 P           Punctuation (also Punct)
 Pc          Connector_Punctuation
 Pd          Dash_Punctuation
 Ps          Open_Punctuation
 Pe          Close_Punctuation
 Pi          Initial_Punctuation
             (may behave like Ps or Pe depending on usage)
 Pf          Final_Punctuation
             (may behave like Ps or Pe depending on usage)
 Po          Other_Punctuation
 S           Symbol
 Sm          Math_Symbol
 Sc          Currency_Symbol
 Sk          Modifier_Symbol
 So          Other_Symbol
 Z           Separator
 Zs          Space_Separator
 Zl          Line_Separator
 Zp          Paragraph_Separator
 C           Other
 Cc          Control (also Cntrl)
 Cf          Format
 Cs          Surrogate
 Co          Private_Use
 Cn          Unassigned
Single-letter properties match all characters in any of the
two-letter sub-properties starting with the same letter.
C<LC> and C<L&> are special: both are aliases for the set consisting of everything matched by C<Ll>, C<Lu>, and C<Lt>.
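To see how the one-letter and two-letter values relate, here is a small sketch that looks up the two-letter category of a character by trying values from the table. The function name C<gc_of> is hypothetical, and the list it tries is deliberately partial.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two-letter General_Category value of a
# one-character string, or nothing if the value is not in the list tried.
sub gc_of {
    my ($char) = @_;
    for my $gc (qw(Lu Ll Lt Lm Lo Mn Nd Nl No Pc Pd Ps Pe Po Sm Sc Zs Cc Cn)) {
        my $pattern = '\A\p{General_Category=' . $gc . '}\z';
        return $gc if $char =~ /$pattern/;
    }
    return;
}

my $roman_eight = "\x{2167}";    # ROMAN NUMERAL EIGHT, a Letter_Number

print gc_of("A"), " ", gc_of("5"), " ", gc_of("_"), " ", gc_of($roman_eight), "\n";

# The one-letter form matches every sub-category that starts with that letter.
my $n_covers_nl = ($roman_eight =~ /\pN/ && $roman_eight !~ /\p{Nd}/) ? 1 : 0;
```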
=head3 B<Bidirectional Character Types>
Because scripts differ in their directionality (Hebrew and Arabic are
written right to left, for example) Unicode supplies a C<Bidi_Class> property.
Some of the values this property can have are:
 Value       Meaning
 L           Left-to-Right
 LRE         Left-to-Right Embedding
 LRO         Left-to-Right Override
 R           Right-to-Left
 AL          Arabic Letter
 RLE         Right-to-Left Embedding
 RLO         Right-to-Left Override
 PDF         Pop Directional Format
 EN          European Number
 ES          European Separator
 ET          European Terminator
 AN          Arabic Number
 CS          Common Separator
 NSM         Non-Spacing Mark
 BN          Boundary Neutral
 B           Paragraph Separator
 S           Segment Separator
 WS          Whitespace
 ON          Other Neutrals
This property is always written in the compound form.
For example, C<\p{Bidi_Class:R}> matches characters that are normally
written right to left. Unlike the
C<L</General_Category>> property, this
property can have more values added in a future Unicode release. Those
listed above comprised the complete set for many Unicode releases, but
others were added in Unicode 6.3; you can always find what the
current ones are in L<perluniprops>. And
L<http://www.unicode.org/reports/tr9/> describes how to use them.
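A minimal sketch of the compound form in use: the hypothetical helper C<bidi_class_of> tries each value from the table above and returns the first one that matches. Values added in later Unicode releases are not covered.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the Bidi_Class short value of a one-character
# string, trying only the values listed in the table above.
sub bidi_class_of {
    my ($char) = @_;
    for my $bc (qw(L LRE LRO R AL RLE RLO PDF EN ES ET AN CS NSM BN B S WS ON)) {
        my $pattern = '\A\p{Bidi_Class=' . $bc . '}\z';
        return $bc if $char =~ /$pattern/;
    }
    return;
}

print bidi_class_of("A"), "\n";          # Latin letter
print bidi_class_of("\x{05D0}"), "\n";   # HEBREW LETTER ALEF
print bidi_class_of("\x{0627}"), "\n";   # ARABIC LETTER ALEF
print bidi_class_of("1"), "\n";          # ASCII digit
```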
=head3 B<Scripts>
The world's languages are written in many different scripts. This sentence
(unless you're reading it in translation) is written in Latin, while Russian is
written in Cyrillic, and Greek is written in, well, Greek; Japanese mainly in
Hiragana or Katakana. There are many more.
The Unicode C<Script> and C<Script_Extensions> properties give what
script a given character is in. The C<Script_Extensions> property is an
improved version of C<Script>, as demonstrated below. Either property
can be specified with the compound form like
C<\p{Script=Hebrew}> (short: C<\p{sc=hebr}>), or
C<\p{Script_Extensions=Javanese}> (short: C<\p{scx=java}>).
In addition, Perl furnishes shortcuts for all
C<Script_Extensions> property names. You can omit everything up through
the equals (or colon), and simply write C<\p{Latin}> or C<\P{Cyrillic}>.
(This is not true for C<Script>, which is required to be
written in the compound form. Prior to Perl v5.26, the single form
returned the plain old C<Script> version, but was changed because
C<Script_Extensions> gives better results.)
The difference between these two properties involves characters that are
used in multiple scripts. For example the digits '0' through '9' are
used in many parts of the world. These are placed in a script named
C<Common>. Other characters are used in just a few scripts. For
example, the C<"KATAKANA-HIRAGANA DOUBLE HYPHEN"> is used in both Japanese
scripts, Katakana and Hiragana, but nowhere else. The C<Script>
property places all characters that are used in multiple scripts in the
C<Common> script, while the C<Script_Extensions> property places those
that are used in only a few scripts into each of those scripts; while
still using C<Common> for those used in many scripts. Thus both these
match:
"0" =~ /\p{sc=Common}/ # Matches
"0" =~ /\p{scx=Common}/ # Matches
and only the first of these matches:
"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{sc=Common}/ # Matches
"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{scx=Common}/ # No match
And only the last two of these match:
"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{sc=Hiragana}/ # No match
"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{sc=Katakana}/ # No match
"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{scx=Hiragana}/ # Matches
"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{scx=Katakana}/ # Matches
C<Script_Extensions> is thus an improved C<Script>, in which there are
fewer characters in the C<Common> script, and correspondingly more in
other scripts. It is new in Unicode version 6.0, and its data are likely
to change significantly in later releases, as things get sorted out.
New code should probably be using C<Script_Extensions> and not plain
C<Script>. If you compile perl with a Unicode release that doesn't have
C<Script_Extensions>, the single form Perl extensions will instead refer
to the plain C<Script> property. If you compile with a version of
Unicode that doesn't have the C<Script> property, these extensions will
not be defined at all.
(Actually, besides C<Common>, the C<Inherited> script also contains
characters that are used in multiple scripts. These are modifier
characters which inherit the script value
of the controlling character. Some of these are used in many scripts,
and so go into C<Inherited> in both C<Script> and C<Script_Extensions>.
Others are used in just a few scripts, so are in C<Inherited> in
C<Script>, but not in C<Script_Extensions>.)
It is worth stressing that there are several different sets of digits in
Unicode that are equivalent to 0-9 and are matchable by C<\d> in a
regular expression. If they are used in a single language only, they
are in that language's C<Script> and C<Script_Extensions>. If they are
used in more than one script, they will be in C<sc=Common>, but only
if they are used in many scripts should they be in C<scx=Common>.
The explanation above has omitted some detail; refer to UAX#24 "Unicode
Script Property": L<http://www.unicode.org/reports/tr24>.
A complete list of scripts and their shortcuts is in L<perluniprops>.
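The difference between C<Script> and C<Script_Extensions> can be collected into one runnable script. This sketch writes the character as a code point rather than by name, and assumes a perl built with Unicode 6.0 or later so that C<scx> is available.

```perl
use strict;
use warnings;

my $double_hyphen = "\x{30A0}";    # KATAKANA-HIRAGANA DOUBLE HYPHEN

my %in = (
    sc_common    => ($double_hyphen =~ /\p{sc=Common}/)    ? 1 : 0,
    scx_common   => ($double_hyphen =~ /\p{scx=Common}/)   ? 1 : 0,
    sc_hiragana  => ($double_hyphen =~ /\p{sc=Hiragana}/)  ? 1 : 0,
    scx_hiragana => ($double_hyphen =~ /\p{scx=Hiragana}/) ? 1 : 0,
    scx_katakana => ($double_hyphen =~ /\p{scx=Katakana}/) ? 1 : 0,
    # The ASCII digits are used in many scripts, so both properties say Common.
    zero_scx     => ("0" =~ /\p{scx=Common}/)              ? 1 : 0,
);

print "$_ => $in{$_}\n" for sort keys %in;
```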
=head3 B<Use of the C<"Is"> Prefix>
For backward compatibility (with Perl 5.6), all properties writable
without using the compound form mentioned
so far may have C<Is> or C<Is_> prepended to their name, so C<\P{Is_Lu}>, for
example, is equal to C<\P{Lu}>, and C<\p{IsScript:Arabic}> is equal to
C<\p{Arabic}>.
=head3 B<Blocks>
In addition to B<scripts>, Unicode also defines B<blocks> of
characters. The difference between scripts and blocks is that the
concept of scripts is closer to natural languages, while the concept
of blocks is more of an artificial grouping based on groups of Unicode
characters with consecutive ordinal values. For example, the C<"Basic Latin">
block is all the characters whose ordinals are between 0 and 127, inclusive; in
other words, the ASCII characters. The C<"Latin"> script contains some letters
from this as well as several other blocks, like C<"Latin-1 Supplement">,
C<"Latin Extended-A">, I<etc.>, but it does not contain all the characters from
those blocks. It does not, for example, contain the digits 0-9, because
those digits are shared across many scripts, and hence are in the
C<Common> script.
For more about scripts versus blocks, see UAX#24 "Unicode Script Property":
L<http://www.unicode.org/reports/tr24>.
The C<Script_Extensions> or C<Script> properties are likely to be the
ones you want to use when processing
natural language; the C<Block> property may occasionally be useful in working
with the nuts and bolts of Unicode.
Block names are matched in the compound form, like C<\p{Block: Arrows}> or
C<\p{Blk=Hebrew}>. Unlike most other properties, only a few block names have a
Unicode-defined short name.
Perl also defines single form synonyms for the block property in cases
where these do not conflict with something else. But don't use any of
these, because they are unstable. Since these are Perl extensions, they
are subordinate to official Unicode property names; Unicode doesn't know
nor care about Perl's extensions. It may happen that a name that
currently means the Perl extension will later be changed without warning
to mean a different Unicode property in a future version of the perl
interpreter that uses a later Unicode release, and your code would no
longer work. The extensions are mentioned here for completeness: Take
the block name and prefix it with one of: C<In> (for example
C<\p{Blk=Arrows}> can currently be written as C<\p{In_Arrows}>); or
sometimes C<Is> (like C<\p{Is_Arrows}>); or sometimes no prefix at all
(C<\p{Arrows}>). As of this writing (Unicode 9.0) there are no
conflicts with using the C<In_> prefix, but there are plenty with the
other two forms. For example, C<\p{Is_Hebrew}> and C<\p{Hebrew}> mean
C<\p{Script_Extensions=Hebrew}> which is NOT the same thing as
C<\p{Blk=Hebrew}>. Our
advice used to be to use the C<In_> prefix as a single form way of
specifying a block. But Unicode 8.0 added properties whose names begin
with C<In>, and it's now clear that it's only luck that's so far
prevented a conflict. Using C<In> is only marginally less typing than
C<Blk:>, and the latter's meaning is clearer anyway, and guaranteed to
never conflict. So don't take chances. Use C<\p{Blk=foo}> for new
code. And be sure that a block is what you really want. In
most cases scripts are what you want instead.
A complete list of blocks is in L<perluniprops>.
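A short sketch of why a block and a script with the same name are not interchangeable. The code points chosen are illustrative; C<U+0590> is assumed to be unassigned in the Unicode version your perl was built with.

```perl
use strict;
use warnings;

my $reserved = "\x{0590}";    # inside the Hebrew block, assumed unassigned
my $yod      = "\x{FB1D}";    # HEBREW LETTER YOD WITH HIRIQ, in another block

my %check = (
    # A block is just a range, so it includes unassigned code points.
    reserved_in_block  => ($reserved =~ /\p{Blk=Hebrew}/)    ? 1 : 0,
    reserved_in_script => ($reserved =~ /\p{Script=Hebrew}/) ? 1 : 0,
    # A script follows usage, so it reaches into other blocks.
    yod_in_block       => ($yod =~ /\p{Blk=Hebrew}/)         ? 1 : 0,
    yod_in_script      => ($yod =~ /\p{Script=Hebrew}/)      ? 1 : 0,
    # The digits are in the Basic Latin block but not in the Latin script.
    five_in_block      => ("5" =~ /\p{Block: Basic_Latin}/)  ? 1 : 0,
    five_in_script     => ("5" =~ /\p{Script=Latin}/)        ? 1 : 0,
);

print "$_ => $check{$_}\n" for sort keys %check;
```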
=head3 B<Other Properties>
There are many more properties than the very basic ones described here.
A complete list is in L<perluniprops>.
Unicode defines all its properties in the compound form, so all single-form
properties are Perl extensions. Most of these are just synonyms for the
Unicode ones, but some are genuine extensions, including several that are in
the compound form. And quite a few of these are actually recommended by Unicode
(in L<http://www.unicode.org/reports/tr18>).
This section gives some details on all extensions that aren't just
synonyms for compound-form Unicode properties
(for those properties, you'll have to refer to the
L<Unicode Standard|http://www.unicode.org/reports/tr44>).
=over
=item B<C<\p{All}>>
This matches every possible code point. It is equivalent to C<qr/./s>.
Unlike all the other non-user-defined C<\p{}> property matches, no
warning is ever generated if this property is matched against a
non-Unicode code point (see L</Beyond Unicode code points> below).
=item B<C<\p{Alnum}>>
This matches any C<\p{Alphabetic}> or C<\p{Decimal_Number}> character.
=item B<C<\p{Any}>>
This matches any of the 1_114_112 Unicode code points. It is a synonym
for C<\p{Unicode}>.
=item B<C<\p{ASCII}>>
This matches any of the 128 characters in the US-ASCII character set,
which is a subset of Unicode.
=item B<C<\p{Assigned}>>
This matches any assigned code point; that is, any code point whose L<general
category|/General_Category> is not C<Unassigned> (or equivalently, not C<Cn>).
=item B<C<\p{Blank}>>
This is the same as C<\h> and C<\p{HorizSpace}>: A character that changes the
spacing horizontally.
=item B<C<\p{Decomposition_Type: Non_Canonical}>> (Short: C<\p{Dt=NonCanon}>)
Matches a character that has a non-canonical decomposition.
The L</Extended Grapheme Clusters (Logical characters)> section above
talked about canonical decompositions. However, many more characters
have a different type of decomposition, a "compatible" or
"non-canonical" decomposition. The sequences that form these
decompositions are not considered canonically equivalent to the
pre-composed character. An example is the C<"SUPERSCRIPT ONE">. It is
somewhat like a regular digit 1, but not exactly; its decomposition into
the digit 1 is called a "compatible" decomposition, specifically a
"super" decomposition. There are several such compatibility
decompositions (see L<http://www.unicode.org/reports/tr44>), including
one called "compat", which means some miscellaneous type of
decomposition that doesn't fit into the other decomposition categories
that Unicode has chosen.
Note that most Unicode characters don't have a decomposition, so their
decomposition type is C<"None">.
For your convenience, Perl has added the C<Non_Canonical> decomposition
type to mean any of the several compatibility decompositions.
=item B<C<\p{Graph}>>
Matches any character that is graphic. Theoretically, this means a character
that on a printer would cause ink to be used.
=item B<C<\p{HorizSpace}>>
This is the same as C<\h> and C<\p{Blank}>: a character that changes the
spacing horizontally.
=item B<C<\p{In=*}>>
This is a synonym for C<\p{Present_In=*}>.
=item B<C<\p{PerlSpace}>>
This is the same as C<\s>, restricted to ASCII, namely C<S<[ \f\n\r\t]>>
and starting in Perl v5.18, a vertical tab.
Mnemonic: Perl's (original) space
=item B<C<\p{PerlWord}>>
This is the same as C<\w>, restricted to ASCII, namely C<[A-Za-z0-9_]>.
Mnemonic: Perl's (original) word.
=item B<C<\p{Posix...}>>
There are several of these, which are equivalents, using the C<\p{}>
notation, for Posix classes and are described in
L<perlrecharclass/POSIX Character Classes>.
=item B<C<\p{Present_In: *}>> (Short: C<\p{In=*}>)
This property is used when you need to know in what Unicode version(s) a
character is.
The "*" above stands for some two digit Unicode version number, such as
C<1.1> or C<4.0>; or the "*" can also be C<Unassigned>. This property will
match the code points whose final disposition has been settled as of the
Unicode release given by the version number; C<\p{Present_In: Unassigned}>
will match those code points whose meaning has yet to be assigned.
For example, C<U+0041> C<"LATIN CAPITAL LETTER A"> was present in the very first
Unicode release available, which is C<1.1>, so this property is true for all
valid "*" versions. On the other hand, C<U+1EFF> was not assigned until version
5.1 when it became C<"LATIN SMALL LETTER Y WITH LOOP">, so the only "*" values that
match it are 5.1, 5.2, and later.
Unicode furnishes the C<Age> property from which this is derived. The problem
with Age is that a strict interpretation of it (which Perl takes) has it
matching the precise release a code point's meaning is introduced in. Thus
C<U+0041> would match only 1.1; and C<U+1EFF> only 5.1. This is not usually what
you want.
Some non-Perl implementations of the Age property may change its meaning to be
the same as the Perl C<Present_In> property; just be aware of that.
Another confusion with both these properties is that the definition is not
that the code point has been I<assigned>, but that the meaning of the code point
has been I<determined>. This is because 66 code points will always be
unassigned, and so the C<Age> for them is the Unicode version in which the decision
to make them so was made. For example, C<U+FDD0> is to be permanently
unassigned to a character, and the decision to do that was made in version 3.1,
so C<\p{Age=3.1}> matches this character, as also does C<\p{Present_In: 3.1}> and up.
=item B<C<\p{Print}>>
This matches any character that is graphical or blank, except controls.
=item B<C<\p{SpacePerl}>>
This is the same as C<\s>, including beyond ASCII.
Mnemonic: Space, as modified by Perl. (It doesn't include the vertical tab
until v5.18, which both the Posix standard and Unicode consider white space.)
=item B<C<\p{Title}>> and B<C<\p{Titlecase}>>
Under case-sensitive matching, these both match the same code points as
C<\p{General Category=Titlecase_Letter}> (C<\p{gc=lt}>). The difference
is that under C</i> caseless matching, these match the same as
C<\p{Cased}>, whereas C<\p{gc=lt}> matches C<\p{Cased_Letter}>.
=item B<C<\p{Unicode}>>
This matches any of the 1_114_112 Unicode code points. It is a synonym
for C<\p{Any}>.
=item B<C<\p{VertSpace}>>
This is the same as C<\v>: A character that changes the spacing vertically.
=item B<C<\p{Word}>>
This is the same as C<\w>, including over 100_000 characters beyond ASCII.
=item B<C<\p{XPosix...}>>
There are several of these, which are the standard Posix classes
extended to the full Unicode range. They are described in
L<perlrecharclass/POSIX Character Classes>.
=back
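Two of the extensions in the list above lend themselves to a quick check: C<Present_In> against C<Age>, and C<Word> against C<PerlWord>. The sketch below uses the code points from the C<Present_In> entry and assumes a perl built with Unicode 5.2 or later; the hash keys are arbitrary labels.

```perl
use strict;
use warnings;

my $y_loop = "\x{1EFF}";    # LATIN SMALL LETTER Y WITH LOOP, added in 5.1

my %age_vs_present = (
    # Present_In is cumulative: true for 5.1 and every later version.
    y_in_5_0  => ($y_loop =~ /\p{Present_In: 5.0}/) ? 1 : 0,
    y_in_5_1  => ($y_loop =~ /\p{Present_In: 5.1}/) ? 1 : 0,
    y_in_5_2  => ($y_loop =~ /\p{In=5.2}/)          ? 1 : 0,
    # Age names only the release in which the code point was introduced.
    y_age_5_1 => ($y_loop =~ /\p{Age=5.1}/)         ? 1 : 0,
    y_age_5_2 => ($y_loop =~ /\p{Age=5.2}/)         ? 1 : 0,
    a_in_5_1  => ("A" =~ /\p{In=5.1}/)              ? 1 : 0,
    a_age_5_1 => ("A" =~ /\p{Age=5.1}/)             ? 1 : 0,
    a_age_1_1 => ("A" =~ /\p{Age=1.1}/)             ? 1 : 0,
);

# \p{Word} reaches beyond ASCII; \p{PerlWord} does not.
my $e_acute = "\x{E9}";     # LATIN SMALL LETTER E WITH ACUTE
my $word_beyond_ascii =
    ($e_acute =~ /\p{Word}/ && $e_acute !~ /\p{PerlWord}/) ? 1 : 0;

print "$_ => $age_vs_present{$_}\n" for sort keys %age_vs_present;
```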
=head2 User-Defined Character Properties
You can define your own binary character properties by defining subroutines
whose names begin with C<"In"> or C<"Is">. (The experimental feature
L<perlre/(?[ ])> provides an alternative which allows more complex
definitions.) The subroutines can be defined in any
package. The user-defined properties can be used in the regular expression
C<\p{}> and C<\P{}> constructs; if you are using a user-defined property from a
package other than the one you are in, you must specify its package in the
C<\p{}> or C<\P{}> construct.
# assuming property IsForeign defined in Lang::
package main; # property package name required
if ($txt =~ /\p{Lang::IsForeign}+/) { ... }
package Lang; # property package name not required
if ($txt =~ /\p{IsForeign}+/) { ... }
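A complete, runnable version of the fragment above might look like the following. The definition of C<IsForeign> is invented for the example (every code point above ASCII), as are the two helper subroutine names.

```perl
use strict;
use warnings;

{
    package Lang;

    # Hypothetical definition: treat every code point above ASCII as foreign.
    sub IsForeign { return "0080\t10FFFF\n"; }

    # Inside package Lang the bare property name is enough.
    sub has_foreign {
        my ($txt) = @_;
        return $txt =~ /\p{IsForeign}/ ? 1 : 0;
    }
}

package main;

# From another package, the property must be qualified with its package.
sub mentions_foreign {
    my ($txt) = @_;
    return $txt =~ /\p{Lang::IsForeign}/ ? 1 : 0;
}

print mentions_foreign("caf\x{E9}"), " ", mentions_foreign("cafe"), "\n";
```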
Note that the effect is compile-time and immutable once defined.
However, the subroutines are passed a single parameter, which is 0 if
case-sensitive matching is in effect and non-zero if caseless matching
is in effect. The subroutine may return different values depending on
the value of the flag, and one set of values will immutably be in effect
for all case-sensitive matches, and the other set for all case-insensitive
matches.
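As a sketch of how the flag might be used, the hypothetical property below covers the ASCII uppercase vowels and widens itself to both cases when Perl passes a true value. The property name and its contents are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical property: ASCII uppercase vowels. When the argument is true
# (caseless matching is in effect), the lowercase vowels are included too.
sub IsUpperVowel {
    my ($caseless) = @_;
    my $upper = "0041\n0045\n0049\n004F\n0055\n";
    my $lower = "0061\n0065\n0069\n006F\n0075\n";
    return $caseless ? $upper . $lower : $upper;
}

my %vowel = (
    upper_plain    => ("E" =~ /\A\p{IsUpperVowel}\z/)  ? 1 : 0,
    lower_plain    => ("e" =~ /\A\p{IsUpperVowel}\z/)  ? 1 : 0,
    lower_caseless => ("e" =~ /\A\p{IsUpperVowel}\z/i) ? 1 : 0,
);

print "$_ => $vowel{$_}\n" for sort keys %vowel;
```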
Note that if the regular expression is tainted, then Perl will die rather
than calling the subroutine when the name of the subroutine is
determined by the tainted data.
The subroutines must return a specially-formatted string, with one
or more newline-separated lines. Each line must be one of the following:
=over 4
=item *
A single hexadecimal number denoting a code point to include.
=item *
Two hexadecimal numbers separated by horizontal whitespace (space or
tab characters) denoting a range of code points to include.
=item *
Something to include, prefixed by C<"+">: a built-in character
property (prefixed by C<"utf8::">) or a fully qualified (including package
name) user-defined character property,
to represent all the characters in that property; two hexadecimal code
points for a range; or a single hexadecimal code point.
=item *
Something to exclude, prefixed by C<"-">: an existing character
property (prefixed by C<"utf8::">) or a fully qualified (including package
name) user-defined character property,
to represent all the characters in that property; two hexadecimal code
points for a range; or a single hexadecimal code point.
=item *
Something to negate, prefixed by C<"!">: an existing character
property (prefixed by C<"utf8::">) or a fully qualified (including package
name) user-defined character property,
to represent all the characters in that property; two hexadecimal code
points for a range; or a single hexadecimal code point.
=item *
Something to intersect with, prefixed by C<"&">: an existing character
property (prefixed by C<"utf8::">) or a fully qualified (including package
name) user-defined character property,
to represent all the characters in that property; two
hexadecimal code points for a range; or a single hexadecimal code point.
Only characters that are in both the set built so far and this one are kept.
=back
For example, to define a property that covers both the Japanese
syllabaries (hiragana and katakana), you can define
sub InKana {
return <<END;
3040\t309F
30A0\t30FF
END
}
Imagine that the here-doc end marker is at the beginning of the line.
Now you can use C<\p{InKana}> and C<\P{InKana}>.
You could also have used the existing block property names:
sub InKana {
return <<'END';
+utf8::InHiragana
+utf8::InKatakana
END
}
Suppose you wanted to match only the allocated characters,
not the raw block ranges: in other words, you want to remove
the unassigned characters:
sub InKana {
return <<'END';
+utf8::InHiragana
+utf8::InKatakana
-utf8::IsCn
END
}
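To see what removing C<Cn> changes, the two styles of definition can be placed side by side. The names C<InKanaBlocks> and C<InKanaAssigned> are hypothetical, chosen so that both can exist in one script; C<U+3040> is assumed to be an unassigned code point at the start of the Hiragana block.

```perl
use strict;
use warnings;

# Raw block ranges, as in the first InKana definition.
sub InKanaBlocks {
    return <<"END";
3040\t309F
30A0\t30FF
END
}

# Block properties minus unassigned code points, as in the third definition.
sub InKanaAssigned {
    return <<'END';
+utf8::InHiragana
+utf8::InKatakana
-utf8::IsCn
END
}

my $reserved   = "\x{3040}";    # assumed unassigned
my $hiragana_a = "\x{3042}";    # HIRAGANA LETTER A

my %kana = (
    reserved_blocks   => ($reserved   =~ /\p{InKanaBlocks}/)   ? 1 : 0,
    reserved_assigned => ($reserved   =~ /\p{InKanaAssigned}/) ? 1 : 0,
    letter_blocks     => ($hiragana_a =~ /\p{InKanaBlocks}/)   ? 1 : 0,
    letter_assigned   => ($hiragana_a =~ /\p{InKanaAssigned}/) ? 1 : 0,
);

print "$_ => $kana{$_}\n" for sort keys %kana;
```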
The negation is useful for defining (surprise!) negated classes.
sub InNotKana {
return <<'END';
!utf8::InHiragana
-utf8::InKatakana
+utf8::IsCn
END
}
This will match all non-Unicode code points, since every one of them is
not in Kana. You can use intersection to exclude these, if desired, as
this modified example shows:
sub InNotKana {
return <<'END';
!utf8::InHiragana
-utf8::InKatakana
+utf8::IsCn
&utf8::Any
END
}
C<&utf8::Any> must be the last line in the definition.
Intersection is used generally for getting the common characters matched
by two (or more) classes. It's important to remember not to use C<"&"> for
the first set; that would be intersecting with nothing, resulting in an
empty set.
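The effect of the final C<&utf8::Any> line can be observed with a code point above the Unicode range. In this sketch the second property is renamed C<InNotKanaUnicode> (a hypothetical name) so that both definitions can be compared in one script.

```perl
use strict;
use warnings;

# Negated class without the intersection: also matches above-Unicode code points.
sub InNotKana {
    return <<'END';
!utf8::InHiragana
-utf8::InKatakana
+utf8::IsCn
END
}

# Same class, intersected with \p{Any} to stay within Unicode.
sub InNotKanaUnicode {
    return <<'END';
!utf8::InHiragana
-utf8::InKatakana
+utf8::IsCn
&utf8::Any
END
}

my $above_unicode = "\x{110000}";    # one past the last Unicode code point
my $katakana_a    = "\x{30A2}";      # KATAKANA LETTER A

my %not_kana = (
    above_plain     => ($above_unicode =~ /\p{InNotKana}/)        ? 1 : 0,
    above_unicode   => ($above_unicode =~ /\p{InNotKanaUnicode}/) ? 1 : 0,
    latin_unicode   => ("A" =~ /\p{InNotKanaUnicode}/)            ? 1 : 0,
    katakana_plain  => ($katakana_a =~ /\p{InNotKana}/)           ? 1 : 0,
);

print "$_ => $not_kana{$_}\n" for sort keys %not_kana;
```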
Unlike non-user-defined C<\p{}> property matches, no warning is ever
generated if these properties are matched against a non-Unicode code
point (see L</Beyond Unicode code points> below).
=head2 User-Defined Case Mappings (for serious hackers only)
B<This feature has been removed as of Perl 5.16.>
The CPAN module C<L<Unicode::Casing>> provides better functionality without
the drawbacks that this feature had. If you are using a Perl earlier
than 5.16, this feature was most fully documented in the 5.14 version of
this pod:
L<http://perldoc.perl.org/5.14.0/perlunicode.html#User-Defined-Case-Mappings-%28for-serious-hackers-only%29>
=head2 Character Encodings for Input and Output
See L<Encode>.
=head2 Unicode Regular Expression Support Level
The following list of Unicode supported features for regular expressions describes
all features currently directly supported by core Perl. The references
to "Level I<N>" and the section numbers refer to
L<UTS#18 "Unicode Regular Expressions"|http://www.unicode.org/reports/tr18>,
version 13, November 2013.
=head3 Level 1 - Basic Unicode Support
 RL1.1   Hex Notation                     - Done          [1]
 RL1.2   Properties                       - Done          [2]
 RL1.2a  Compatibility Properties         - Done          [3]
 RL1.3   Subtraction and Intersection     - Experimental  [4]
 RL1.4   Simple Word Boundaries           - Done          [5]
 RL1.5   Simple Loose Matches             - Done          [6]
 RL1.6   Line Boundaries                  - Partial       [7]
 RL1.7   Supplementary Code Points        - Done          [8]
=over 4
=item [1] C<\N{U+...}> and C<\x{...}>
=item [2]
C<\p{...}> C<\P{...}>. This requirement is for a minimal list of
properties. Perl supports these and all other Unicode character
properties, as RL2.7 asks (see L</"Unicode Character Properties"> above).
=item [3]
Perl has C<\d> C<\D> C<\s> C<\S> C<\w> C<\W> C<\X> C<[:I<prop>:]>
C<[:^I<prop>:]>, plus all the properties specified by
L<http://www.unicode.org/reports/tr18/#Compatibility_Properties>. These
are described above in L</Other Properties>.
=item [4]
The experimental feature C<"(?[...])"> starting in v5.18 accomplishes
this.
See L<perlre/(?[ ])>. If you don't want to use an experimental
feature, you can use one of the following:
=over 4
=item *
Regular expression lookahead
You can mimic class subtraction using lookahead.
For example, what UTS#18 might write as
[{Block=Greek}-[{UNASSIGNED}]]
in Perl can be written as:
(?!\p{Unassigned})\p{Block=Greek}
(?=\p{Assigned})\p{Block=Greek}
But in this particular example, you probably really want
\p{Greek}
which will match assigned characters known to be part of the Greek script.
=item *
CPAN module C<L<Unicode::Regex::Set>>
It does implement the full UTS#18 grouping, intersection, union, and
removal (subtraction) syntax.
=item *
L</"User-Defined Character Properties">
C<"+"> for union, C<"-"> for removal (set-difference), C<"&"> for intersection
=back
=item [5]
C<\b> and C<\B> meet most, but not all, of the details of this requirement;
C<\b{wb}> and C<\B{wb}> meet all of them, as well as the stricter RL2.3.
=item [6]
Note that Perl does Full case-folding in matching, not Simple:
For example C<U+1F88> is equivalent to C<U+1F00 U+03B9>, instead of just
C<U+1F80>. This difference matters mainly for certain Greek capital
letters with certain modifiers: the Full case-folding decomposes the
letter, while the Simple case-folding would map it to a single
character.
=item [7]
The reason this is considered to be only partially implemented is that
Perl has L<C<qrE<sol>\b{lb}E<sol>>|perlrebackslash/\b{lb}> and
C<L<Unicode::LineBreak>> that are conformant with
L<UAX#14 "Unicode Line Breaking Algorithm"|http://www.unicode.org/reports/tr14>.
The regular expression construct provides default behavior, while the
heavier-weight module provides customizable line breaking.
But Perl treats C<\n> as the start- and end-line
delimiter, whereas Unicode specifies more characters that should be
so-interpreted.
These are:
VT U+000B (\v in C)
FF U+000C (\f)
CR U+000D (\r)
NEL U+0085
LS U+2028
PS U+2029
C<^> and C<$> in regular expression patterns are supposed to match all
these, but don't.
These characters also don't, but should, affect C<< <> >> C<$.>, and
script line numbers.
Also, lines should not be split within C<CRLF> (i.e. there is no
empty line between C<\r> and C<\n>). For C<CRLF>, try the C<:crlf>
layer (see L<PerlIO>).
=item [8]
UTF-8/UTF-EBCDIC used in Perl allows not only C<U+10000> to
C<U+10FFFF> but also beyond C<U+10FFFF>.
=back
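Several of the Level 1 notes can be exercised in a few lines. This is a sketch rather than a conformance test: it needs Perl v5.16 or later for C<fc>, and it assumes C<U+0378> is unassigned in the Unicode version your perl was built with.

```perl
use strict;
use warnings;
use feature 'fc';    # fc() is available from Perl v5.16

# [1] Hex notation: both escapes name the same code point.
my $hex_same = ("\N{U+263A}" eq "\x{263A}") ? 1 : 0;

# [4] Subtraction via lookahead: the Greek block minus unassigned code points.
my $assigned_greek = qr/\A(?!\p{Unassigned})\p{Block=Greek}\z/;
my $alpha_ok    = ("\x{03B1}" =~ $assigned_greek) ? 1 : 0;  # GREEK SMALL LETTER ALPHA
my $reserved_ok = ("\x{0378}" =~ $assigned_greek) ? 1 : 0;  # assumed unassigned

# [6] Full case folding maps U+1F88 to a two-character sequence.
my $full_fold = (fc("\x{1F88}") eq "\x{1F00}\x{03B9}") ? 1 : 0;

# [7] Under /m, "^" honors "\n" but not U+2028 LINE SEPARATOR.
my @line_starts = "a\x{2028}b\nc" =~ /^(.)/mg;

print "line starts: @line_starts\n";
```

Only C<a> and C<c> are captured as line starts; the character after C<U+2028> is not treated as beginning a line.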
=head3 Level 2 - Extended Unicode Support
 RL2.1   Canonical Equivalents           - Retracted     [9]
                                           by Unicode
 RL2.2   Extended Grapheme Clusters      - Partial       [10]
 RL2.3   Default Word Boundaries         - Done          [11]
 RL2.4   Default Case Conversion         - Done
 RL2.5   Name Properties                 - Done
 RL2.6   Wildcard Properties             - Missing
 RL2.7   Full Properties                 - Done
=over 4
=item [9]
Unicode has rewritten this portion of UTS#18 to say that getting
canonical equivalence (see UAX#15
L<"Unicode Normalization Forms"|http://www.unicode.org/reports/tr15>)
is basically to be done at the programmer level. Use NFD to write
both your regular expressions and text to match them against (you
can use L<Unicode::Normalize>).
=item [10]
Perl has C<\X> and C<\b{gcb}> but we don't have a "Grapheme Cluster Mode".
=item [11]
See L<UAX#29 "Unicode Text Segmentation"|http://www.unicode.org/reports/tr29>.
=back
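Notes [9] and [10] can be illustrated with the core module L<Unicode::Normalize> and C<\X>. The sketch below is a minimal example; the variable names are arbitrary.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

my $precomposed = "\x{E9}";      # LATIN SMALL LETTER E WITH ACUTE
my $decomposed  = "e\x{301}";    # "e" followed by COMBINING ACUTE ACCENT

# [9] The two spellings differ as strings until both are normalized.
my $raw_equal = ($precomposed eq $decomposed)           ? 1 : 0;
my $nfd_equal = (NFD($precomposed) eq NFD($decomposed)) ? 1 : 0;

# [10] \X matches one extended grapheme cluster, so the base letter and
# its combining mark count as a single unit.
my $clusters = () = "$decomposed!" =~ /\X/g;

print "raw=$raw_equal nfd=$nfd_equal clusters=$clusters\n";
```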
=head3 Level 3 - Tailored Support
 RL3.1   Tailored Punctuation            - Missing
 RL3.2   Tailored Grapheme Clusters      - Missing       [12]
 RL3.3   Tailored Word Boundaries        - Missing
 RL3.4   Tailored Loose Matches          - Retracted by Unicode
 RL3.5   Tailored Ranges                 - Retracted by Unicode
 RL3.6   Context Matching                - Missing       [13]
 RL3.7   Incremental Matches             - Missing
 RL3.8   Unicode Set Sharing             - Unicode is proposing
                                           to retract this
 RL3.9   Possible Match Sets             - Missing
 RL3.10  Folded Matching                 - Retracted by Unicode
 RL3.11  Submatchers                     - Missing
=over 4
=item [12]
Perl has L<Unicode::Collate>, but it isn't integrated with regular
expressions. See
L<UTS#10 "Unicode Collation Algorithm"|http://www.unicode.org/reports/tr10>.
=item [13]
Perl has C<(?<=x)> and C<(?=x)>, but lookaheads or lookbehinds should
see outside of the target substring.
=back
=head2 Unicode Encodings
Unicode characters are assigned to I<code points>, which are abstract
numbers. To use these numbers, various encodings are needed.
=over 4
=item *
UTF-8
UTF-8 is a variable-length (1 to 4 bytes), byte-order independent
encoding. In most of Perl's documentation, including elsewhere in this
document, the term "UTF-8" means also "UTF-EBCDIC". But in this section,
"UTF-8" refers only to the encoding used on ASCII platforms. It is a
superset of 7-bit US-ASCII, so anything encoded in ASCII has the
identical representation when encoded in UTF-8.
The following table is from Unicode 3.2.
 Code Points            1st Byte  2nd Byte  3rd Byte  4th Byte
   U+0000..U+007F       00..7F
   U+0080..U+07FF     * C2..DF    80..BF
   U+0800..U+0FFF       E0      * A0..BF    80..BF
   U+1000..U+CFFF       E1..EC    80..BF    80..BF
   U+D000..U+D7FF       ED        80..9F    80..BF
   U+D800..U+DFFF       +++++ utf16 surrogates, not legal utf8 +++++
   U+E000..U+FFFF       EE..EF    80..BF    80..BF
  U+10000..U+3FFFF      F0      * 90..BF    80..BF    80..BF
  U+40000..U+FFFFF      F1..F3    80..BF    80..BF    80..BF
 U+100000..U+10FFFF     F4        80..8F    80..BF    80..BF
Note the gaps marked by "*" before several of the byte entries above. These are
caused by legal UTF-8 avoiding non-shortest encodings: it is technically
possible to UTF-8-encode a single code point in different ways, but that is
explicitly forbidden, and the shortest possible encoding should always be used
(and that is what Perl does).
Another way to look at it is via bits:
                Code Points  1st Byte  2nd Byte  3rd Byte  4th Byte
                   0aaaaaaa  0aaaaaaa
           00000bbbbbaaaaaa  110bbbbb  10aaaaaa
           ccccbbbbbbaaaaaa  1110cccc  10bbbbbb  10aaaaaa
 00000dddccccccbbbbbbaaaaaa  11110ddd  10cccccc  10bbbbbb  10aaaaaa
As you can see, the continuation bytes all begin with C<"10">, and the
leading bits of the start byte tell how many bytes there are in the
encoded character.
The original UTF-8 specification allowed up to 6 bytes, to allow
encoding of numbers up to C<0x7FFF_FFFF>. Perl continues to allow those,
and has extended that up to 13 bytes to encode code points up to what
can fit in a 64-bit word. However, Perl will warn if you output any of
these as being non-portable; and under strict UTF-8 input protocols,
they are forbidden. In addition, it is deprecated to use a code point
larger than what a signed integer variable on your system can hold. On
32-bit ASCII systems, this means C<0x7FFF_FFFF> is the legal maximum
going forward (much higher on 64-bit systems).
=item *
UTF-EBCDIC
Like UTF-8, but EBCDIC-safe, in the way that UTF-8 is ASCII-safe.
This means that all the basic characters (which include all
those that have ASCII equivalents, like C<"A">, C<"0">, C<"%">, I<etc.>)
are the same in both EBCDIC and UTF-EBCDIC.
UTF-EBCDIC is used on EBCDIC platforms. It generally requires more
bytes to represent a given code point than UTF-8 does; the largest
Unicode code points take 5 bytes to represent (instead of 4 in UTF-8),
and, extended for 64-bit words, it uses 14 bytes instead of 13 bytes in
UTF-8.
=item *
UTF-16, UTF-16BE, UTF-16LE, Surrogates, and C<BOM>'s (Byte Order Marks)
The following items are mostly for reference and general Unicode
knowledge; Perl doesn't use these constructs internally.
Like UTF-8, UTF-16 is a variable-width encoding, but where
UTF-8 uses 8-bit code units, UTF-16 uses 16-bit code units.
All code points occupy either 2 or 4 bytes in UTF-16: code points
C<U+0000..U+FFFF> are stored in a single 16-bit unit, and code
points C<U+10000..U+10FFFF> in two 16-bit units. The latter case is
using I<surrogates>, the first 16-bit unit being the I<high
surrogate>, and the second being the I<low surrogate>.
Surrogates are code points set aside to encode the C<U+10000..U+10FFFF>
range of Unicode code points in pairs of 16-bit units. The I<high
surrogates> are the range C<U+D800..U+DBFF> and the I<low surrogates>
are the range C<U+DC00..U+DFFF>. The surrogate encoding is
    $hi = int(($uni - 0x10000) / 0x400) + 0xD800;
    $lo = ($uni - 0x10000) % 0x400 + 0xDC00;
and the decoding is
    $uni = 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
Because of the 16-bitness, UTF-16 is byte-order dependent. UTF-16
itself can be used for in-memory computations, but if storage or
transfer is required either UTF-16BE (big-endian) or UTF-16LE
(little-endian) encodings must be chosen.
This introduces another problem: what if you just know that your data
is UTF-16, but you don't know which endianness? Byte Order Marks, or
C<BOM>'s, are a solution to this. A special character has been reserved
in Unicode to function as a byte order marker: the character with the
code point C<U+FEFF> is the C<BOM>.
The trick is that if you read a C<BOM>, you will know the byte order,
since if it was written on a big-endian platform, you will read the
bytes C<0xFE 0xFF>, but if it was written on a little-endian platform,
you will read the bytes C<0xFF 0xFE>. (And if the originating platform
was writing in ASCII platform UTF-8, you will read the bytes
C<0xEF 0xBB 0xBF>.)
The way this trick works is that the character with the code point
C<U+FFFE> is not supposed to be in input streams, so the
sequence of bytes C<0xFF 0xFE> is unambiguously "C<BOM>, represented in
little-endian format" and cannot be "C<U+FFFE>, represented in big-endian
format".
Surrogates have no meaning in Unicode outside their use in pairs to
represent other code points. However, Perl allows them to be
represented individually internally, for example by saying
C<chr(0xD801)>, so that all code points, not just those valid for open
interchange, are
representable. Unicode does define semantics for them, such as their
C<L</General_Category>> being C<"Cs">. But because their use is somewhat dangerous,
Perl will warn (using the warning category C<"surrogate">, which is a
sub-category of C<"utf8">) if an attempt is made
to do things like take the lower case of one, or match
case-insensitively, or to output them. (But don't try this on Perls
before 5.14.)
=item *
UTF-32, UTF-32BE, UTF-32LE
The UTF-32 family is pretty much like the UTF-16 family, except that
the units are 32-bit, and therefore the surrogate scheme is not
needed. UTF-32 is a fixed-width encoding. The C<BOM> signatures are
C<0x00 0x00 0xFE 0xFF> for BE and C<0xFF 0xFE 0x00 0x00> for LE.
=item *
UCS-2, UCS-4
Legacy, fixed-width encodings defined by the ISO 10646 standard. UCS-2 is a 16-bit
encoding. Unlike UTF-16, UCS-2 is not extensible beyond C<U+FFFF>,
because it does not use surrogates. UCS-4 is a 32-bit encoding,
functionally identical to UTF-32 (the difference being that
UCS-4 forbids neither surrogates nor code points larger than C<0x10_FFFF>).
=item *
UTF-7
A seven-bit safe (non-eight-bit) encoding, which is useful if the
transport or storage is not eight-bit safe. Defined by RFC 2152.
=back
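The bit layout given for UTF-8 and the surrogate arithmetic given for
UTF-16 above can be checked with a short script. This is a minimal sketch
for ASCII platforms, not a description of how Perl implements either
encoding internally. The helper names C<utf8_bytes>, C<to_surrogates>, and
C<from_surrogates> are hypothetical, and only code points up to
C<U+10FFFF> are handled.

```perl
use strict;
use warnings;

# Hand-encode one code point (U+0000..U+10FFFF, surrogates excluded) into
# UTF-8 bytes, following the bit layout shown in the table.
sub utf8_bytes {
    my ($cp) = @_;
    die "surrogate or out of range\n"
        if $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF);
    return ($cp) if $cp < 0x80;
    return (0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)) if $cp < 0x800;
    return (0xE0 | ($cp >> 12),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F)) if $cp < 0x10000;
    return (0xF0 | ($cp >> 18),
            0x80 | (($cp >> 12) & 0x3F),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F));
}

# Split a supplementary code point into a UTF-16 surrogate pair.
sub to_surrogates {
    my ($uni) = @_;
    my $hi = int(($uni - 0x10000) / 0x400) + 0xD800;
    my $lo = ($uni - 0x10000) % 0x400 + 0xDC00;
    return ($hi, $lo);
}

# Recombine a surrogate pair into the code point it represents.
sub from_surrogates {
    my ($hi, $lo) = @_;
    return 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
}

# Cross-check the hand encoder against Perl's own encoder.
for my $cp (0x41, 0xE9, 0x20AC, 0x1F600) {
    my $expected = chr($cp);
    utf8::encode($expected);              # now a string of UTF-8 bytes
    my @bytes = utf8_bytes($cp);
    printf "U+%04X -> %s (%s)\n", $cp,
        join(" ", map { sprintf "%02X", $_ } @bytes),
        pack("C*", @bytes) eq $expected ? "matches utf8::encode" : "MISMATCH";
}

my ($hi, $lo) = to_surrogates(0x1F600);
printf "U+1F600 -> %04X %04X -> U+%X\n", $hi, $lo, from_surrogates($hi, $lo);
```

In real programs, prefer C<utf8::encode> or the L<Encode> module over
hand-rolled encoders; the sketch only makes the tables concrete.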
=head2 Noncharacter code points
66 code points are set aside in Unicode as "noncharacter code points".
These all have the C<Unassigned> (C<Cn>) C<L</General_Category>>, and
no character will ever be assigned to any of them. They are the 32 code
points between C<U+FDD0> and C<U+FDEF> inclusive, and the 34 code
points:
    U+FFFE   U+FFFF
    U+1FFFE  U+1FFFF
    U+2FFFE  U+2FFFF
    ...
    U+EFFFE  U+EFFFF
    U+FFFFE  U+FFFFF
    U+10FFFE U+10FFFF
Until Unicode 7.0, the noncharacters were "B<forbidden> for use in open
interchange of Unicode text data", so that code that processed those
streams could use these code points as sentinels that could be mixed in
with character data, and would always be distinguishable from that data.
(The emphasis above and in the next paragraph has been added in this document.)
Unicode 7.0 changed the wording so that they are "B<not recommended> for
use in open interchange of Unicode text data". The 7.0 Standard goes on
to say:
=over 4
"If a noncharacter is received in open interchange, an application is
not required to interpret it in any way. It is good practice, however,
to recognize it as a noncharacter and to take appropriate action, such
as replacing it with C<U+FFFD> replacement character, to indicate the
problem in the text. It is not recommended to simply delete
noncharacter code points from such text, because of the potential
security issues caused by deleting uninterpreted characters. (See
conformance clause C7 in Section 3.2, Conformance Requirements, and
L<Unicode Technical Report #36, "Unicode Security
Considerations"|http://www.unicode.org/reports/tr36/#Substituting_for_Ill_Formed_Subsequences>)."
=back
This change was made because it was found that various commercial tools
like editors, or for things like source code control, had been written
so that they would not handle program files that used these code points,
effectively precluding their use almost entirely! And that was never
the intent. They've always been meant to be usable within an
application, or cooperating set of applications, at will.
If you're writing code, such as an editor, that is supposed to be able
to handle any Unicode text data, then you shouldn't be using these code
points yourself, and instead allow them in the input. If you need
sentinels, they should instead be something that isn't legal Unicode.
For UTF-8 data, you can use the bytes 0xC0 and 0xC1 as sentinels, as
they never appear in well-formed UTF-8. (There are equivalents for
UTF-EBCDIC.) You can also store your Unicode code points in integer
variables and use negative values as sentinels.
If you're not writing such a tool, then whether you accept noncharacters
as input is up to you (though the Standard recommends that you not). If
you do strict input stream checking with Perl, these code points
continue to be forbidden. This is to maintain backward compatibility
(otherwise potential security holes could open up, as an unsuspecting
application that was written assuming the noncharacters would be
filtered out before getting to it, could now, without warning, start
getting them). To do strict checking, you can use the layer
C<:encoding('UTF-8')>.
Perl continues to warn (using the warning category C<"nonchar">, which
is a sub-category of C<"utf8">) if an attempt is made to output
noncharacters.
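The definition of the 66 noncharacters given at the start of this section
reduces to a small predicate. The following is a minimal sketch;
C<is_noncharacter> is a hypothetical helper name, not a Perl built-in.

```perl
use strict;
use warnings;

# True if $cp is one of the 66 Unicode noncharacter code points.
sub is_noncharacter {
    my ($cp) = @_;
    return 0 if $cp < 0 || $cp > 0x10FFFF;          # outside Unicode entirely
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;     # the contiguous block of 32
    return ($cp & 0xFFFE) == 0xFFFE ? 1 : 0;        # last two of each of 17 planes
}

# Count them by brute force as a sanity check on the predicate.
my $count = 0;
for my $cp (0 .. 0x10FFFF) {
    $count++ if is_noncharacter($cp);
}
print "noncharacters found: $count\n";
```

A filter built on such a predicate could, for example, replace each
noncharacter with C<U+FFFD> rather than deleting it, as the Standard
suggests in the passage quoted above.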
=head2 Beyond Unicode code points
The maximum Unicode code point is C<U+10FFFF>, and Unicode only defines
operations on code points up through that. But Perl works on code
points up to the maximum permissible unsigned number available on the
platform. However, Perl will not accept these from input streams unless
lax rules are being used, and will warn (using the warning category
C<"non_unicode">, which is a sub-category of C<"utf8">) if any are output.
Since Unicode rules are not defined on these code points, if a
Unicode-defined operation is done on them, Perl uses what we believe are
sensible rules, while generally warning, using the C<"non_unicode">
category. For example, C<uc("\x{11_0000}")> will generate such a
warning, returning the input parameter as its result, since Perl defines
the uppercase of every non-Unicode code point to be the code point
itself. (All the case changing operations, not just uppercasing, work
this way.)
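The case-changing behavior just described can be observed directly. This
is a small sketch that assumes a Perl recent enough to know the
C<"non_unicode"> warning category (v5.14 or later); the warning is
disabled only so that the script runs quietly.

```perl
use strict;
use warnings;

my $above = chr(0x11_0000);       # first code point beyond Unicode

my ($upper, $lower);
{
    no warnings 'non_unicode';    # silence the warning these operations raise
    $upper = uc $above;
    $lower = lc $above;
}

printf "uc leaves U+%X unchanged: %s\n", ord($above),
    $upper eq $above ? "yes" : "no";
printf "lc leaves U+%X unchanged: %s\n", ord($above),
    $lower eq $above ? "yes" : "no";
```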
The situation with matching Unicode properties in regular expressions,
the C<\p{}> and C<\P{}> constructs, against these code points is not as
clear cut, and how these are handled has changed as we've gained
experience.
One possibility is to treat any match against these code points as
undefined. But since Perl doesn't have the concept of a match being
undefined, it converts this to failing or C<FALSE>. This is almost, but
not quite, what Perl did from v5.14 (when use of these code points
became generally reliable) through v5.18. The difference is that Perl
treated all C<\p{}> matches as failing, but all C<\P{}> matches as
succeeding.
One problem with this is that it leads to unexpected, and confusing
results in some cases:
    chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/     # Failed on <= v5.18
    chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/    # Failed! on <= v5.18
That is, it treated both matches as undefined, and converted that to
false (raising a warning on each). The first case is the expected
result, but the second is likely counterintuitive: "How could both be
false when they are complements?" Another problem was that the
implementation optimized many Unicode property matches down to already
existing simpler, faster operations, which don't raise the warning. We
chose to not forgo those optimizations, which help the vast majority of
matches, just to generate a warning for the unlikely event that an
above-Unicode code point is being matched against.
As a result of these problems, starting in v5.20, what Perl does is
to treat non-Unicode code points as just typical unassigned Unicode
characters, and matches accordingly. (Note: Unicode has atypical
unassigned code points. For example, it has noncharacter code points,
and ones that, when they do get assigned, are destined to be written
Right-to-left, as Arabic and Hebrew are. Perl assumes that no
non-Unicode code point has any atypical properties.)
Perl, in most cases, will raise a warning when matching an above-Unicode
code point against a Unicode property when the result is C<TRUE> for
C<\p{}>, and C<FALSE> for C<\P{}>. For example:
    chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/     # Fails, no warning
    chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/    # Succeeds, with warning
In both these examples, the character being matched is non-Unicode, so
Unicode doesn't define how it should match. It clearly isn't an ASCII
hex digit, so the first example clearly should fail, and so it does,
with no warning. But it is arguable that the second example should have
an undefined, hence C<FALSE>, result. So a warning is raised for it.
Thus the warning is raised for many fewer cases than in earlier Perls,
and only when what the result is could be arguable. It turns out that
none of the optimizations made by Perl (or are ever likely to be made)
cause the warning to be skipped, so it solves both problems of Perl's
earlier approach. The most commonly used property that is affected by
this change is C<\p{Unassigned}> which is a short form for
C<\p{General_Category=Unassigned}>. Starting in v5.20, all non-Unicode
code points are considered C<Unassigned>. In earlier releases the
matches failed because the result was considered undefined.
The only place where the warning is not raised when it arguably should
have been is if optimizations cause the whole pattern match to not even
be attempted. For example, Perl may figure out that for a string to
match a certain regular expression pattern, the string has to contain
the substring C<"foobar">. Before attempting the match, Perl may look
for that substring, and if not found, immediately fail the match without
actually trying it; so no warning gets generated even if the string
contains an above-Unicode code point.
This behavior is more "Do what I mean" than in earlier Perls for most
applications. But it catches fewer issues for code that needs to be
strictly Unicode compliant. Therefore there is an additional mode of
operation available to accommodate such code. This mode is enabled if a
regular expression pattern is compiled within the lexical scope where
the C<"non_unicode"> warning class has been made fatal, say by:
    use warnings FATAL => "non_unicode";
(see L<warnings>). In this mode of operation, Perl will raise the
warning for all matches against a non-Unicode code point (not just the
arguable ones), and it skips the optimizations that might cause the
warning to not be output. (It currently still won't warn if the match
isn't even attempted, like in the C<"foobar"> example above.)
In summary, Perl now normally treats non-Unicode code points as typical
Unicode unassigned code points for regular expression matches, raising a
warning only when it is arguable what the result should be. However, if
this warning has been made fatal, it isn't skipped.
There is one exception to all this. C<\p{All}> looks like a Unicode
property, but it is a Perl extension that is defined to be true for all
possible code points, Unicode or not, so no warning is ever generated
when matching this against a non-Unicode code point. (Prior to v5.20,
it was an exact synonym for C<\p{Any}>, matching code points C<0>
through C<0x10FFFF>.)
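The v5.20 matching rules summarized above can be tried out with a few
patterns. This is a sketch that assumes Perl v5.20 or later; earlier
releases give different results, as described in this section. The
C<"non_unicode"> warnings are turned off so that only the match results
are shown.

```perl
use strict;
use warnings;
no warnings 'non_unicode';    # keep the "arguable result" warnings quiet here

my $above = chr(0x110000);    # a non-Unicode code point

my %result = (
    'ASCII_Hex_Digit=True'  => ($above =~ /\p{ASCII_Hex_Digit=True}/  ? 1 : 0),
    'ASCII_Hex_Digit=False' => ($above =~ /\p{ASCII_Hex_Digit=False}/ ? 1 : 0),
    'Unassigned'            => ($above =~ /\p{Unassigned}/            ? 1 : 0),
    'All'                   => ($above =~ /\p{All}/                   ? 1 : 0),
);

printf "%-22s %s\n", $_, $result{$_} ? "matches" : "fails"
    for sort keys %result;
```

Code that must be strictly Unicode compliant would instead compile its
patterns under C<use warnings FATAL =E<gt> "non_unicode"> so that such
matches are not silently accepted.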
=head2 Security Implications of Unicode
First, read
L<Unicode Security Considerations|http://www.unicode.org/reports/tr36>.
Also, note the following:
=over 4
=item *
Malformed UTF-8
UTF-8 is very structured, so many combinations of bytes are invalid. In
the past, Perl tried to soldier on and make some sense of invalid
combinations, but this can lead to security holes, so now, if the Perl
core needs to process an invalid combination, it will either raise a
fatal error, or will replace those bytes by the sequence that forms the
Unicode REPLACEMENT CHARACTER, for which purpose Unicode created it.
Every code point can be represented by more than one possible
syntactically valid UTF-8 sequence. Early on, both Unicode and Perl
considered any of these to be valid, but now, all sequences longer
than the shortest possible one are considered to be malformed.
Unicode considers many code points to be illegal, or to be avoided.
Perl generally accepts them, once they have passed through any input
filters that may try to exclude them. These have been discussed above
(see "Surrogates" under UTF-16 in L</Unicode Encodings>,
L</Noncharacter code points>, and L</Beyond Unicode code points>).
=item *
Regular expression pattern matching may surprise you if you're not
accustomed to Unicode. Starting in Perl 5.14, several pattern
modifiers are available to control this, called the character set
modifiers. Details are given in L<perlre/Character set modifiers>.
=back
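The two ways of handling malformed UTF-8 mentioned above, substituting
the REPLACEMENT CHARACTER or raising an error, are both available from
the L<Encode> module. This is a minimal sketch using a non-shortest
encoding as the malformed input; it shows Encode's behavior, not that of
every code path in the Perl core.

```perl
use strict;
use warnings;
use Encode ();

my $overlong = "\xC0\x80";    # non-shortest encoding of U+0000, hence malformed

# Default handling: the malformed bytes are replaced by U+FFFD.
my $lenient = Encode::decode('UTF-8', $overlong);

# Strict handling: FB_CROAK makes decode() die instead. decode() may modify
# its input when a CHECK argument is given, so work on a copy.
my $copy = $overlong;
my $strict_ok =
    eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;

printf "lenient decode contains U+FFFD: %s\n",
    $lenient =~ /\x{FFFD}/ ? "yes" : "no";
printf "strict decode %s\n",
    $strict_ok ? "succeeded" : "died: malformed input rejected";
```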
As discussed elsewhere, Perl has one foot (two hooves?) planted in
each of two worlds: the old world of ASCII and single-byte locales, and
the new world of Unicode, upgrading when necessary.
If your legacy code does not explicitly use Unicode, no automatic
switch-over to Unicode should happen.
=head2 Unicode in Perl on EBCDIC
Unicode is supported on EBCDIC platforms. See L<perlebcdic>.
Unless ASCII vs. EBCDIC issues are specifically being discussed,
references to UTF-8 encoding in this document and elsewhere should be
read as meaning UTF-EBCDIC on EBCDIC platforms.
See L<perlebcdic/Unicode and UTF>.
Because UTF-EBCDIC is so similar to UTF-8, the differences are mostly
hidden from you; S<C<use utf8>> (and NOT something like
S<C<use utfebcdic>>) declares that the script is in the platform's
"native" 8-bit encoding of Unicode. (Similarly for the C<":utf8">
layer.)
=head2 Locales
See L<perllocale/Unicode and UTF-8>.
=head2 When Unicode Does Not Happen
There are still many places where Unicode (in some encoding or
another) could be given as arguments or received as results, or both in
Perl, but it is not, in spite of Perl having extensive ways to input and
output in Unicode, and a few other "entry points" like the C<@ARGV>
array (which can sometimes be interpreted as UTF-8).
The following are such interfaces. Also, see L</The "Unicode Bug">.
For all of these interfaces Perl
currently (as of v5.16.0) simply assumes byte strings both as arguments
and results, or UTF-8 strings if the (deprecated) C<encoding> pragma has been used.
One reason that Perl does not attempt to resolve the role of Unicode in
these situations is that the answers are highly dependent on the operating
system and the file system(s). For example, whether filenames can be
in Unicode and in exactly what kind of encoding, is not exactly a
portable concept. Similarly for C<qx> and C<system>: how well will the
"command-line interface" (and which of them?) handle Unicode?
=over 4
=item *
C<chdir>, C<chmod>, C<chown>, C<chroot>, C<exec>, C<link>, C<lstat>, C<mkdir>,
C<rename>, C<rmdir>, C<stat>, C<symlink>, C<truncate>, C<unlink>, C<utime>, C<-X>
=item *
C<%ENV>
=item *
C<glob> (aka the C<E<lt>*E<gt>>)
=item *
C<open>, C<opendir>, C<sysopen>
=item *
C<qx> (aka the backtick operator), C<system>
=item *
C<readdir>, C<readlink>
=back
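Because the interfaces listed above deal in byte strings, a program that
keeps its filenames as character strings has to encode and decode them
itself. The following is a sketch under two assumptions that do not hold
everywhere: that the file system stores the bytes it is given unchanged,
and that UTF-8 is the convention in use on it.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp ();

my $dir  = File::Temp::tempdir(CLEANUP => 1);
my $name = "smile-\x{263A}.txt";               # a character string

# Encode explicitly: open() is handed octets, not characters.
my $octets = Encode::encode('UTF-8', $name);
open my $out, '>', "$dir/$octets" or die "open: $!";
close $out or die "close: $!";

# readdir() hands back octets too, so decode them on the way in.
opendir my $dh, $dir or die "opendir: $!";
my @found = map  { Encode::decode('UTF-8', $_) }
            grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

print "round trip ",
    (@found == 1 && $found[0] eq $name ? "ok" : "differs"), "\n";
```

On systems whose file APIs normalize or re-encode names, the decoded name
may differ from the original, which is exactly the portability problem
described above.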
=head2 The "Unicode Bug"
The term "Unicode bug" has been applied to an inconsistency with the
code points in the C<Latin-1 Supplement> block, that is, between
128 and 255. Without a locale specified, unlike all other characters or
code points, these characters can have very different semantics
depending on the rules in effect. (Characters whose code points are
above 255 force Unicode rules; whereas the rules for ASCII characters
are the same under both ASCII and Unicode rules.)
Under Unicode rules, these upper-Latin1 characters are interpreted as
Unicode code points, which means they have the same semantics as Latin-1
(ISO-8859-1) and C1 controls.
As explained in L</ASCII Rules versus Unicode Rules>, under ASCII rules,
they are considered to be unassigned characters.
This can lead to unexpected results. For example, a string's
semantics can suddenly change if a code point above 255 is appended to
it, which changes the rules from ASCII to Unicode. As an
example, consider the following program and its output:
    $ perl -le'
        no feature "unicode_strings";
        $s1 = "\xC2";
        $s2 = "\x{2660}";
        for ($s1, $s2, $s1.$s2) {
            print /\w/ || 0;
        }
    '
    0
    0
    1
If there's no C<\w> in either C<$s1> or C<$s2>, why does their concatenation
have one?
This anomaly stems from Perl's attempt to not disturb older programs that
didn't use Unicode, along with Perl's desire to add Unicode support
seamlessly. But the result turned out to not be seamless. (By the way,
you can choose to be warned when things like this happen. See
C<L<encoding::warnings>>.)
L<S<C<use feature 'unicode_strings'>>|feature/The 'unicode_strings' feature>
was added, starting in Perl v5.12, to address this problem. It affects
these things:
=over 4
=item *
Changing the case of a scalar, that is, using C<uc()>, C<ucfirst()>, C<lc()>,
and C<lcfirst()>, or C<\L>, C<\U>, C<\u> and C<\l> in double-quotish
contexts, such as regular expression substitutions.
Under C<unicode_strings> starting in Perl 5.12.0, Unicode rules are
generally used. See L<perlfunc/lc> for details on how this works
in combination with various other pragmas.
=item *
Using caseless (C</i>) regular expression matching.
Starting in Perl 5.14.0, regular expressions compiled within
the scope of C<unicode_strings> use Unicode rules
even when executed or compiled into larger
regular expressions outside the scope.
=item *
Matching any of several properties in regular expressions.
These properties are C<\b> (without braces), C<\B> (without braces),
C<\s>, C<\S>, C<\w>, C<\W>, and all the POSIX character classes
I<except> C<[[:ascii:]]>.
Starting in Perl 5.14.0, regular expressions compiled within
the scope of C<unicode_strings> use Unicode rules
even when executed or compiled into larger
regular expressions outside the scope.
=item *
In C<quotemeta> or its inline equivalent C<\Q>.
Starting in Perl 5.16.0, consistent quoting rules are used within the
scope of C<unicode_strings>, as described in L<perlfunc/quotemeta>.
Prior to that, or outside its scope, no code points above 127 are quoted
in UTF-8 encoded strings, but in byte encoded strings, code points
between 128-255 are always quoted.
=item *
In the C<..> or L<range|perlop/Range Operators> operator.
Starting in Perl 5.26.0, the range operator on strings treats their lengths
consistently within the scope of C<unicode_strings>. Prior to that, or
outside its scope, it could produce strings whose length in characters
exceeded that of the right-hand side, where the right-hand side took up more
bytes than the correct range endpoint.
=item *
In L<< C<split>'s special-case whitespace splitting|perlfunc/split >>.
Starting in Perl 5.28.0, the C<split> function with a pattern specified as
a string containing a single space handles whitespace characters consistently
within the scope of C<unicode_strings>. Prior to that, or outside its scope,
characters that are whitespace according to Unicode rules but not according to
ASCII rules were treated as field contents rather than field separators when
they appear in byte-encoded strings.
=back
You can see from the above that the effect of C<unicode_strings>
increased over several Perl releases. (And Perl's support for Unicode
continues to improve; it's best to use the latest available release in
order to get the most complete and accurate results possible.) Note that
C<unicode_strings> is automatically chosen if you S<C<use 5.012>> or
higher.
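The effect of the feature on an upper-Latin1 character can be seen by
running the same operations inside and outside its scope. This is a
sketch that assumes an ASCII platform, no locale in effect, and Perl
v5.14 or later (so that both case changing and C<\w> matching are
covered).

```perl
use strict;
use warnings;

my $e_acute = "\xE9";    # LATIN SMALL LETTER E WITH ACUTE, stored as one byte

my ($word_without, $uc_without, $word_with, $uc_with);
{
    no feature 'unicode_strings';     # ASCII rules for this byte string
    $word_without = $e_acute =~ /\w/ ? 1 : 0;
    $uc_without   = uc $e_acute;
}
{
    use feature 'unicode_strings';    # Unicode rules regardless of storage
    $word_with = $e_acute =~ /\w/ ? 1 : 0;
    $uc_with   = uc $e_acute;
}

printf "without: \\w=%d uc=0x%02X\n", $word_without, ord $uc_without;
printf "with:    \\w=%d uc=0x%02X\n", $word_with,    ord $uc_with;
```

Outside the feature's scope the character is neither a word character nor
changed by C<uc>; inside it, the character matches C<\w> and uppercases to
C<0xC9>.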
For Perls earlier than those described above, or when a string is passed
to a function outside the scope of C<unicode_strings>, see the next section.
=head2 Forcing Unicode in Perl (Or Unforcing Unicode in Perl)
Sometimes (see L</"When Unicode Does Not Happen"> or L</The "Unicode Bug">)
there are situations where you simply need to force a byte
string into UTF-8, or vice versa. The standard module L<Encode> can be
used for this, or the low-level calls
L<C<utf8::upgrade($bytestring)>|utf8/Utility functions> and
L<C<utf8::downgrade($utf8string[, FAIL_OK])>|utf8/Utility functions>.
Note that C<utf8::downgrade()> can fail if the string contains characters
that don't fit into a byte.
Calling either function on a string that already is in the desired state is a
no-op.
L</ASCII Rules versus Unicode Rules> gives all the ways that a string is
made to use Unicode rules.
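The two low-level calls mentioned above change only how a string is
stored, not which characters it contains. The following sketch uses
C<utf8::is_utf8> purely to make the internal state visible; ordinary
code rarely needs to inspect it.

```perl
use strict;
use warnings;

my $s = "\xE9";                        # one character, stored as one byte
utf8::upgrade($s);                     # same character, now stored as UTF-8
my $after_upgrade = utf8::is_utf8($s) ? 1 : 0;

utf8::downgrade($s);                   # back to the one-byte representation
my $after_downgrade = utf8::is_utf8($s) ? 1 : 0;

my $wide = "\x{2660}";                 # BLACK SPADE SUIT cannot fit in a byte
my $ok   = utf8::downgrade($wide, 1);  # FAIL_OK: return false instead of dying

print "flag after upgrade: $after_upgrade, after downgrade: $after_downgrade\n";
print "downgrade of U+2660 ",
    ($ok ? "succeeded" : "failed, string left as is"), "\n";
```

Without the second argument, the failing C<utf8::downgrade> call would
die rather than return false.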
=head2 Using Unicode in XS
See L<perlguts/"Unicode Support"> for an introduction to Unicode at
the XS level, and L<perlapi/Unicode Support> for the API details.
=head2 Hacking Perl to work on earlier Unicode versions (for very serious hackers only)
Perl by default comes with the latest supported Unicode version built-in, but
the goal is to allow you to change to use any earlier one. In Perls
v5.20 and v5.22, however, the earliest usable version is Unicode 5.1.
Perl v5.18 and v5.24 are able to handle all earlier versions.
Download the files in the desired version of Unicode from the Unicode web
site (L<http://www.unicode.org>). These should replace the existing files in
F<lib/unicore> in the Perl source tree. Follow the instructions in
F<README.perl> in that directory to change some of their names, and then build
perl (see L<INSTALL>).
=head2 Porting code from perl-5.6.X
Perls starting in 5.8 have a different Unicode model from 5.6. In 5.6 the
programmer was required to use the C<utf8> pragma to declare that a
given scope expected to deal with Unicode data and had to make sure that
only Unicode data were reaching that scope. If you have code that is
working with 5.6, you will need some of the following adjustments to
your code. The examples are written such that the code will continue to
work under 5.6, so you should be safe to try them out.
=over 3
=item *
A filehandle that should read or write UTF-8
    if ($] >= 5.008) {
        binmode $fh, ":encoding(UTF-8)";
    }
=item *
A scalar that is going to be passed to some extension
Be it C<Compress::Zlib>, C<Apache::Request> or any extension that has no
mention of Unicode in the manpage, you need to make sure that the
UTF8 flag is stripped off. Note that at the time of this writing
(January 2012) the mentioned modules are not UTF-8-aware. Please
check the documentation to verify if this is still true.
    if ($] >= 5.008) {
        require Encode;
        $val = Encode::encode("UTF-8", $val); # make octets
    }
=item *
A scalar we got back from an extension
If you believe the scalar comes back as UTF-8, you will most likely
want the UTF8 flag restored:
    if ($] >= 5.008) {
        require Encode;
        $val = Encode::decode("UTF-8", $val);
    }
=item *
Same thing, if you are really sure it is UTF-8
    if ($] >= 5.008) {
        require Encode;
        Encode::_utf8_on($val);
    }
=item *
A wrapper for L<DBI> C<fetchrow_array> and C<fetchrow_hashref>
When the database contains only UTF-8, a wrapper function or method is
a convenient way to replace all your C<fetchrow_array> and
C<fetchrow_hashref> calls. A wrapper function will also make it easier to
adapt to future enhancements in your database driver. Note that at the
time of this writing (January 2012), the DBI has no standardized way
to deal with UTF-8 data. Please check the L<DBI documentation|DBI> to verify if
that is still true.
    sub fetchrow {
        # $what is one of fetchrow_{array,hashref}
        my($self, $sth, $what) = @_;
        if ($] < 5.008) {
            return $sth->$what;
        } else {
            require Encode;
            if (wantarray) {
                my @arr = $sth->$what;
                for (@arr) {
                    defined && /[^\000-\177]/ && Encode::_utf8_on($_);
                }
                return @arr;
            } else {
                my $ret = $sth->$what;
                if (ref $ret) {
                    for my $k (keys %$ret) {
                        defined
                        && /[^\000-\177]/
                        && Encode::_utf8_on($_) for $ret->{$k};
                    }
                    return $ret;
                } else {
                    defined && /[^\000-\177]/ && Encode::_utf8_on($_) for $ret;
                    return $ret;
                }
            }
        }
    }
=item *
A large scalar that you know can only contain ASCII
Scalars that contain only ASCII and are marked as UTF-8 are sometimes
a drag to your program. If you recognize such a situation, just remove
the UTF8 flag:
    utf8::downgrade($val) if $] >= 5.008;
=back
=head1 BUGS
See also L</The "Unicode Bug"> above.
=head2 Interaction with Extensions
When Perl exchanges data with an extension, the extension should be
able to understand the UTF8 flag and act accordingly. If the
extension doesn't recognize that flag, it's likely that the extension
will return incorrectly-flagged data.
So if you're working with Unicode data, consult the documentation of
every module you're using if there are any issues with Unicode data
exchange. If the documentation does not talk about Unicode at all,
suspect the worst and probably look at the source to learn how the
module is implemented. Modules written completely in Perl shouldn't
cause problems. Modules that directly or indirectly access code written
in other programming languages are at risk.
For affected functions, the simple strategy to avoid data corruption is
to always make the encoding of the exchanged data explicit. Choose an
encoding that you know the extension can handle. Convert arguments passed
to the extensions to that encoding and convert results back from that
encoding. Write wrapper functions that do the conversions for you, so
you can later change the functions when the extension catches up.
To provide an example, let's say the popular C<Foo::Bar::escape_html>
function doesn't deal with Unicode data yet. The wrapper function
would convert the argument to raw UTF-8 and convert the result back to
Perl's internal representation like so:
    sub my_escape_html ($) {
        my($what) = shift;
        return unless defined $what;
        require Encode;
        Encode::decode("UTF-8", Foo::Bar::escape_html(
                                    Encode::encode("UTF-8", $what)));
    }
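The same idea can be written once and reused for several functions. This
is only a sketch: C<with_encoding> is a hypothetical helper, and
C<bytes_only_reverse_words> is a stand-in for an extension function that
understands only bytes.

```perl
use strict;
use warnings;
use Encode ();

# Wrap a byte-oriented function so callers can pass and receive character
# strings. $encoding is whatever the extension is known to handle.
sub with_encoding {
    my ($encoding, $func) = @_;
    return sub {
        my @octets = map { defined $_ ? Encode::encode($encoding, $_) : undef } @_;
        my @result = $func->(@octets);
        my @chars  = map { defined $_ ? Encode::decode($encoding, $_) : undef } @result;
        return wantarray ? @chars : $chars[0];
    };
}

# Stand-in for an extension function: it splits and joins on the space
# byte and knows nothing about characters.
sub bytes_only_reverse_words {
    my ($octets) = @_;
    return join " ", reverse split / /, $octets;
}

my $reverse_words = with_encoding('UTF-8', \&bytes_only_reverse_words);
my $out = $reverse_words->("\x{2660} \x{2665}");

printf "result code points: %s\n",
    join " ", map { sprintf "U+%04X", ord } split //, $out;
```

Keeping the encoding name in one place makes it easy to change or remove
the wrapper once the extension handles Unicode itself.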
Sometimes, when the extension does not convert data but just stores
and retrieves it, you will be able to use the otherwise
dangerous L<C<Encode::_utf8_on()>|Encode/_utf8_on> function. Let's say
the popular C<Foo::Bar> extension, written in C, provides a C<param>
method that lets you store and retrieve data according to these prototypes:
    $self->param($name, $value);            # set a scalar
    $value = $self->param($name);           # retrieve a scalar
If it does not yet provide support for any encoding, one could write a
derived class with such a C<param> method:
    sub param {
        my($self,$name,$value) = @_;
        utf8::upgrade($name);     # make sure it is UTF-8 encoded
        if (defined $value) {
            utf8::upgrade($value);  # make sure it is UTF-8 encoded
            return $self->SUPER::param($name,$value);
        } else {
            require Encode;
            my $ret = $self->SUPER::param($name);
            Encode::_utf8_on($ret); # we know, it is UTF-8 encoded
            return $ret;
        }
    }
Some extensions provide filters on data entry/exit points, such as
C<DB_File::filter_store_key> and family. Look out for such filters in
the documentation of your extensions; they can make the transition to
Unicode data much easier.
=head2 Speed
Some functions are slower when working on UTF-8 encoded strings than
on byte encoded strings. All functions that need to hop over
characters such as C<length()>, C<substr()> or C<index()>, or matching
regular expressions can work B<much> faster when the underlying data are
byte-encoded.
In Perl 5.8.0 the slowness was often quite spectacular; in Perl 5.8.1
a caching scheme was introduced which improved the situation. In general,
operations with UTF-8 encoded strings are still slower. As an example,
the Unicode properties (character classes) like C<\p{Nd}> are known to
be quite a bit slower (5-20 times) than their simpler counterparts
like C<[0-9]> (then again, there are hundreds of Unicode characters matching
C<Nd> compared with the 10 ASCII characters matching C<[0-9]>).
=head1 SEE ALSO
L<perlunitut>, L<perluniintro>, L<perluniprops>, L<Encode>, L<open>, L<utf8>, L<bytes>,
L<perlretut>, L<perlvar/"${^UNICODE}">,
L<http://www.unicode.org/reports/tr44>.
=cut
=head1 NAME
perlreftut - Mark's very short tutorial about references
=head1 DESCRIPTION
One of the most important new features in Perl 5 was the capability to
manage complicated data structures like multidimensional arrays and
nested hashes. To enable these, Perl 5 introduced a feature called
I<references>, and using references is the key to managing complicated,
structured data in Perl. Unfortunately, there's a lot of funny syntax
to learn, and the main manual page can be hard to follow. The manual
is quite complete, and sometimes people find that a problem, because
it can be hard to tell what is important and what isn't.
Fortunately, you only need to know 10% of what's in the main page to get
90% of the benefit. This page will show you that 10%.
=head1 Who Needs Complicated Data Structures?
One problem that comes up all the time is needing a hash whose values are
lists. Perl has hashes, of course, but the values have to be scalars;
they can't be lists.
Why would you want a hash of lists? Let's take a simple example: You
have a file of city and country names, like this:
    Chicago, USA
    Frankfurt, Germany
    Berlin, Germany
    Washington, USA
    Helsinki, Finland
    New York, USA
and you want to produce an output like this, with each country mentioned
once, and then an alphabetical list of the cities in that country:
    Finland: Helsinki.
    Germany: Berlin, Frankfurt.
    USA:  Chicago, New York, Washington.
The natural way to do this is to have a hash whose keys are country
names. Associated with each country name key is a list of the cities in
that country. Each time you read a line of input, split it into a country
and a city, look up the list of cities already known to be in that
country, and append the new city to the list. When you're done reading
the input, iterate over the hash as usual, sorting each list of cities
before you print it out.
If hash values can't be lists, you lose. You'd probably have to
combine all the cities into a single string somehow, and then when
time came to write the output, you'd have to break the string into a
list, sort the list, and turn it back into a string. This is messy
and error-prone. And it's frustrating, because Perl already has
perfectly good lists that would solve the problem if only you could
use them.
=head1 The Solution
By the time Perl 5 rolled around, we were already stuck with this
design: Hash values must be scalars. The solution to this is
references.
A reference is a scalar value that I<refers to> an entire array or an
entire hash (or to just about anything else). Names are one kind of
reference that you're already familiar with. Think of the President
of the United States: a messy, inconvenient bag of blood and bones.
But to talk about him, or to represent him in a computer program, all
you need is the easy, convenient scalar string "Barack Obama".
References in Perl are like names for arrays and hashes. They're
Perl's private, internal names, so you can be sure they're
unambiguous. Unlike "Barack Obama", a reference only refers to one
thing, and you always know what it refers to. If you have a reference
to an array, you can recover the entire array from it. If you have a
reference to a hash, you can recover the entire hash. But the
reference is still an easy, compact scalar value.
You can't have a hash whose values are arrays; hash values can only be
scalars. We're stuck with that. But a single reference can refer to
an entire array, and references are scalars, so you can have a hash of
references to arrays, and it'll act a lot like a hash of arrays, and
it'll be just as useful as a hash of arrays.
We'll come back to this city-country problem later, after we've seen
some syntax for managing references.
=head1 Syntax
There are just two ways to make a reference, and just two ways to use
it once you have it.
=head2 Making References
=head3 B<Make Rule 1>
If you put a C<\> in front of a variable, you get a
reference to that variable.
$aref = \@array; # $aref now holds a reference to @array
$href = \%hash; # $href now holds a reference to %hash
$sref = \$scalar; # $sref now holds a reference to $scalar
Once the reference is stored in a variable like $aref or $href, you
can copy it or store it just the same as any other scalar value:
$xy = $aref; # $xy now holds a reference to @array
$p[3] = $href; # $p[3] now holds a reference to %hash
$z = $p[3]; # $z now holds a reference to %hash
These examples show how to make references to variables with names.
Sometimes you want to make an array or a hash that doesn't have a
name. This is analogous to the way you like to be able to use the
string C<"\n"> or the number 80 without having to store it in a named
variable first.
=head3 B<Make Rule 2>
C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a
reference to that hash.
$aref = [ 1, "foo", undef, 13 ];
# $aref now holds a reference to an array
$href = { APR => 4, AUG => 8 };
# $href now holds a reference to a hash
The references you get from rule 2 are the same kind of
references that you get from rule 1:
# This:
$aref = [ 1, 2, 3 ];
# Does the same as this:
@array = (1, 2, 3);
$aref = \@array;
The first line is an abbreviation for the following two lines, except
that it doesn't create the superfluous array variable C<@array>.
If you write just C<[]>, you get a new, empty anonymous array.
If you write just C<{}>, you get a new, empty anonymous hash.
=head2 Using References
What can you do with a reference once you have it? It's a scalar
value, and we've seen that you can store it as a scalar and get it back
again just like any scalar. There are just two more ways to use it:
=head3 B<Use Rule 1>
You can always use an array reference, in curly braces, in place of
the name of an array. For example, C<@{$aref}> instead of C<@array>.
Here are some examples of that:
Arrays:
    @a               @{$aref}             An array
    reverse @a       reverse @{$aref}     Reverse the array
    $a[3]            ${$aref}[3]          An element of the array
    $a[3] = 17;      ${$aref}[3] = 17     Assigning an element
On each line are two expressions that do the same thing. The
left-hand versions operate on the array C<@a>. The right-hand
versions operate on the array that is referred to by C<$aref>. Once
they find the array they're operating on, both versions do the same
things to the arrays.
Using a hash reference is I<exactly> the same:
    %h               %{$href}             A hash
    keys %h          keys %{$href}        Get the keys from the hash
    $h{'red'}        ${$href}{'red'}      An element of the hash
    $h{'red'} = 17   ${$href}{'red'} = 17 Assigning an element
Whatever you want to do with a reference, B<Use Rule 1> tells you how
to do it. You just write the Perl code that you would have written
for doing the same thing to a regular array or hash, and then replace
the array or hash name with C<{$reference}>. "How do I loop over an
array when all I have is a reference?" Well, to loop over an array, you
would write
for my $element (@array) {
...
}
so replace the array name, C<@array>, with the reference:
for my $element (@{$aref}) {
...
}
"How do I print out the contents of a hash when all I have is a
reference?" First write the code for printing out a hash:
for my $key (keys %hash) {
print "$key => $hash{$key}\n";
}
And then replace the hash name with the reference:
for my $key (keys %{$href}) {
print "$key => ${$href}{$key}\n";
}
=head3 B<Use Rule 2>
L<B<Use Rule 1>|/B<Use Rule 1>> is all you really need, because it tells
you how to do absolutely everything you ever need to do with references.
But the most common thing to do with an array or a hash is to extract a
single element, and the L<B<Use Rule 1>|/B<Use Rule 1>> notation is
cumbersome. So there is an abbreviation.
C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >>
instead.
C<${$href}{red}> is too hard to read, so you can write
C<< $href->{red} >> instead.
If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is
the fourth element of the array. Don't confuse this with C<$aref[3]>,
which is the fourth element of a totally different array, one
deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the
same way that C<$item> and C<@item> are.
Similarly, C<< $href->{'red'} >> is part of the hash referred to by
the scalar variable C<$href>, perhaps even one with no name.
C<$href{'red'}> is part of the deceptively named C<%href> hash. It's
easy to leave out the C<< -> >> by accident, and if you do, you'll get
bizarre results when your program gets array and hash elements out of
totally unexpected hashes and arrays that weren't the ones you wanted
to use.
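Here is a small sketch of that mistake. The data and variable names are made up for illustration; the point is only that C<< $aref->[3] >> and C<$aref[3]> read from two different arrays.

```perl
use strict;
use warnings;

my $aref = [ 'a', 'b', 'c', 'd' ];   # reference to an anonymous array
my @aref = ( 'w', 'x', 'y', 'z' );   # unrelated array that happens to share the name

my $via_reference = $aref->[3];      # fourth element of the referred-to array
my $via_array     = $aref[3];        # fourth element of @aref

print "\$aref->[3] is $via_reference\n";
print "\$aref[3]   is $via_array\n";
```

With C<use strict> in effect, writing C<$aref[3]> when no C<@aref> has been declared is a compile-time error, so the mistake tends to be caught early.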
=head2 An Example
Let's see a quick example of how all this is useful.
First, remember that C<[1, 2, 3]> makes an anonymous array containing
C<(1, 2, 3)>, and gives you a reference to that array.
Now think about
@a = ( [1, 2, 3],
[4, 5, 6],
[7, 8, 9]
);
C<@a> is an array with three elements, and each one is a reference to
another array.
C<$a[1]> is one of these references. It refers to an array, the array
containing C<(4, 5, 6)>, and because it is a reference to an array,
L<B<Use Rule 2>|/B<Use Rule 2>> says that we can write C<< $a[1]->[2] >>
to get the third element from that array. C<< $a[1]->[2] >> is the 6.
Similarly, C<< $a[0]->[1] >> is the 2. What we have here is like a
two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get or
set the element in any row and any column of the array.
The notation still looks a little cumbersome, so there's one more
abbreviation:
=head2 Arrow Rule
In between two B<subscripts>, the arrow is optional.
Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the
same thing. Instead of C<< $a[0]->[1] = 23 >>, we can write
C<$a[0][1] = 23>; it means the same thing.
Now it really looks like two-dimensional arrays!
You can see why the arrows are important. Without them, we would have
had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For
three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
the unreadable C<${${$x[2]}[3]}[5]>.
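To check that the three notations really are interchangeable, here is a short sketch using the same C<@a> as above. The variable names C<$rows> and C<$first_arrow> are only for illustration.

```perl
use strict;
use warnings;

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9] );

my $use_rule_1 = ${$a[1]}[2];   # Use Rule 1
my $use_rule_2 = $a[1]->[2];    # Use Rule 2
my $arrow_rule = $a[1][2];      # arrow omitted between two subscripts

$a[0][1] = 23;                  # same as $a[0]->[1] = 23

# The rule covers only the arrow *between* subscripts. When the outer
# array is itself reached through a reference, the first arrow stays.
my $rows        = \@a;
my $first_arrow = $rows->[1][2];

print "$use_rule_1 $use_rule_2 $arrow_rule $first_arrow\n";
print "@{$a[0]}\n";
```

All four reads give the same element, and the assignment changes the middle element of the first row.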
=head1 Solution
Here's the answer to the problem I posed earlier, of reformatting a
file of city and country names.
    1   my %table;

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5     $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

    8   for my $country (sort keys %table) {
    9     print "$country: ";
   10     my @cities = @{$table{$country}};
   11     print join ', ', sort @cities;
   12     print ".\n";
   13   }
The program has two pieces: Lines 2-7 read the input and build a data
structure, and lines 8-13 analyze the data and print out the report.
We're going to have a hash, C<%table>, whose keys are country names,
and whose values are references to arrays of city names. The data
structure will look like this:
           %table
        +-------+---+
        |       |   |   +-----------+--------+
        |Germany| *---->| Frankfurt | Berlin |
        |       |   |   +-----------+--------+
        +-------+---+
        |       |   |   +----------+
        |Finland| *---->| Helsinki |
        |       |   |   +----------+
        +-------+---+
        |       |   |   +---------+------------+----------+
        |  USA  | *---->| Chicago | Washington | New York |
        |       |   |   +---------+------------+----------+
        +-------+---+
We'll look at output first. Supposing we already have this structure,
how do we print it out?
    8   for my $country (sort keys %table) {
    9     print "$country: ";
   10     my @cities = @{$table{$country}};
   11     print join ', ', sort @cities;
   12     print ".\n";
   13   }
C<%table> is an ordinary hash, and we get a list of keys from it, sort
the keys, and loop over the keys as usual. The only use of references
is in line 10. C<$table{$country}> looks up the key C<$country> in the
hash and gets the value, which is a reference to an array of cities in
that country. L<B<Use Rule 1>|/B<Use Rule 1>> says that we can recover
the array by saying C<@{$table{$country}}>. Line 10 is just like
@cities = @array;
except that the name C<array> has been replaced by the reference
C<{$table{$country}}>. The C<@> tells Perl to get the entire array.
Having gotten the list of cities, we sort it, join it, and print it
out as usual.
Lines 2-7 are responsible for building the structure in the first
place. Here they are again:
    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5     $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }
Lines 2-4 acquire a city and country name. Line 5 looks to see if the
country is already present as a key in the hash. If it's not, the
program uses the C<[]> notation (L<B<Make Rule 2>|/B<Make Rule 2>>) to
manufacture a new, empty anonymous array of cities, and installs a
reference to it into the hash under the appropriate key.
Line 6 installs the city name into the appropriate array.
C<$table{$country}> now holds a reference to the array of cities seen
in that country so far. Line 6 is exactly like
push @array, $city;
except that the name C<array> has been replaced by the reference
C<{$table{$country}}>. The L<C<push>|perlfunc/push ARRAY,LIST> adds a
city name to the end of the referred-to array.
There's one fine point I skipped. Line 5 is unnecessary, and we can
get rid of it.
    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5   ####  $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }
If there's already an entry in C<%table> for the current C<$country>,
then nothing is different. Line 6 will locate the value in
C<$table{$country}>, which is a reference to an array, and push C<$city>
into the array. But what does it do when C<$country> holds a key, say
C<Greece>, that is not yet in C<%table>?
This is Perl, so it does the exact right thing. It sees that you want
to push C<Athens> onto an array that doesn't exist, so it helpfully
makes a new, empty, anonymous array for you, installs it into
C<%table>, and then pushes C<Athens> onto it. This is called
I<autovivification>--bringing things to life automatically. Perl saw
that the key wasn't in the hash, so it created a new hash entry
automatically. Perl saw that you wanted to use the hash value as an
array, so it created a new empty array and installed a reference to it
in the hash automatically. And as usual, Perl made the array one
element longer to hold the new city name.
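Here is a self-contained sketch of autovivification at work. It reads from a small in-memory list instead of a file; the sample data and the C<$before> and C<$kind> variables are there only to make the effect visible.

```perl
use strict;
use warnings;

# Sample input for illustration, standing in for the lines of the file.
my @lines = ( "Chicago, USA", "Athens, Greece", "New York, USA" );
my %table;

my $before = exists $table{Greece} ? 1 : 0;   # no entry for Greece yet

for my $line (@lines) {
    my ($city, $country) = split /, /, $line;
    push @{ $table{$country} }, $city;        # no explicit [] needed
}

my $kind = ref $table{Greece};                # what Perl created for us

for my $country (sort keys %table) {
    print "$country: ", join(', ', sort @{ $table{$country} }), ".\n";
}
```

After the loop, C<$table{Greece}> holds an array reference even though the program never assigned one.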
=head1 The Rest
I promised to give you 90% of the benefit with 10% of the details, and
that means I left out 90% of the details. Now that you have an
overview of the important parts, it should be easier to read the
L<perlref> manual page, which discusses 100% of the details.
Some of the highlights of L<perlref>:
=over 4
=item *
You can make references to anything, including scalars, functions, and
other references.
=item *
In L<B<Use Rule 1>|/B<Use Rule 1>>, you can omit the curly brackets
whenever the thing inside them is an atomic scalar variable like
C<$aref>. For example, C<@$aref> is the same as C<@{$aref}>, and
C<$$aref[1]> is the same as C<${$aref}[1]>. If you're just starting
out, you may want to adopt the habit of always including the curly
brackets.
=item *
This doesn't copy the underlying array:
$aref2 = $aref1;
You get two references to the same array. If you modify
C<< $aref1->[23] >> and then look at
C<< $aref2->[23] >> you'll see the change.
To copy the array, use
$aref2 = [@{$aref1}];
This uses C<[...]> notation to create a new anonymous array, and
C<$aref2> is assigned a reference to the new array. The new array is
initialized with the contents of the array referred to by C<$aref1>.
Similarly, to copy an anonymous hash, you can use
$href2 = {%{$href1}};
=item *
To see if a variable contains a reference, use the
L<C<ref>|perlfunc/ref EXPR> function. It returns true if its argument
is a reference. Actually it's a little better than that: It returns
C<HASH> for hash references and C<ARRAY> for array references.
=item *
If you try to use a reference like a string, you get strings like
ARRAY(0x80f5dec) or HASH(0x826afc0)
If you ever see a string that looks like this, you'll know you
printed out a reference by mistake.
A side effect of this representation is that you can use
L<C<eq>|perlop/Equality Operators> to see if two references refer to the
same thing. (But you should usually use
L<C<==>|perlop/Equality Operators> instead because it's much faster.)
=item *
You can use a string as if it were a reference. If you use the string
C<"foo"> as an array reference, it's taken to be a reference to the
array C<@foo>. This is called a I<symbolic reference>. The declaration
L<C<use strict 'refs'>|strict> disables this feature, which can cause
all sorts of trouble if you use it by accident.
=back
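Several of these points can be tried out in one short sketch. The variable names are illustrative; the code only uses C<ref>, C<==>, and the C<[...]> notation described above.

```perl
use strict;
use warnings;

my $aref1 = [ 1, 2, 3 ];
my $alias = $aref1;           # second reference to the same array
my $copy  = [ @{$aref1} ];    # new array initialized with the same contents

$aref1->[0] = 99;             # visible through $alias, not through $copy

print "alias sees $alias->[0], copy sees $copy->[0]\n";

# == compares references: true only when both refer to the same thing.
my $same_array   = ($aref1 == $alias) ? 1 : 0;
my $same_as_copy = ($aref1 == $copy)  ? 1 : 0;

# ref returns a type name for references and an empty string otherwise.
my @kinds = ( ref($aref1), ref({}), ref("plain string") );
```

Note that C<[@{$aref1}]> copies only one level. If the elements are themselves references, the original and the copy still share the inner data.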
You might prefer to go on to L<perllol> instead of L<perlref>; it
discusses lists of lists and multidimensional arrays in detail. After
that, you should move on to L<perldsc>; it's a Data Structure Cookbook
that shows recipes for using and printing out arrays of hashes, hashes
of arrays, and other kinds of data.
=head1 Summary
Everyone needs compound data structures, and in Perl the way you get
them is with references. There are four important rules for managing
references: Two for making references and two for using them. Once
you know these rules you can do most of the important things you need
to do with references.
=head1 Credits
Author: Mark Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>)
This article originally appeared in I<The Perl Journal>
( L<http://www.tpj.com/> ) volume 3, #2. Reprinted with permission.
The original title was I<Understand References Today>.
=head2 Distribution Conditions
Copyright 1998 The Perl Journal.
This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.
Irrespective of its distribution, all code examples in these files are
hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun or for profit
as you see fit. A simple comment in the code giving credit would be
courteous but is not required.
=cut
=head1 NAME
perlartistic - the Perl Artistic License
=head1 SYNOPSIS
You can refer to this document in Pod via "L<perlartistic>"
Or you can see this document by entering "perldoc perlartistic"
=head1 DESCRIPTION
Perl is free software; you can redistribute it and/or modify
it under the terms of either:
a) the GNU General Public License as published by the Free
Software Foundation; either version 1, or (at your option) any
later version, or
b) the "Artistic License" which comes with this Kit.
This is B<"The Artistic License">.
It's here so that modules, programs, etc., that want to declare
this as their distribution license can link to it.
For the GNU General Public License, see L<perlgpl>.
=head1 The "Artistic License"
=head2 Preamble
The intent of this document is to state the conditions under which a
Package may be copied, such that the Copyright Holder maintains some
semblance of artistic control over the development of the package,
while giving the users of the package the right to use and distribute
the Package in a more-or-less customary fashion, plus the right to make
reasonable modifications.
=head2 Definitions
=over
=item "Package"
refers to the collection of files distributed by the
Copyright Holder, and derivatives of that collection of files created
through textual modification.
=item "Standard Version"
refers to such a Package if it has not been
modified, or has been modified in accordance with the wishes of the
Copyright Holder as specified below.
=item "Copyright Holder"
is whoever is named in the copyright or
copyrights for the package.
=item "You"
is you, if you're thinking about copying or distributing this Package.
=item "Reasonable copying fee"
is whatever you can justify on the basis
of media cost, duplication charges, time of people involved, and so on.
(You will not be required to justify it to the Copyright Holder, but
only to the computing community at large as a market that must bear the
fee.)
=item "Freely Available"
means that no fee is charged for the item
itself, though there may be fees involved in handling the item. It also
means that recipients of the item may redistribute it under the same
conditions they received it.
=back
=head2 Conditions
=over
=item 1.
You may make and give away verbatim copies of the source form of the
Standard Version of this Package without restriction, provided that you
duplicate all of the original copyright notices and associated disclaimers.
=item 2.
You may apply bug fixes, portability fixes and other modifications
derived from the Public Domain or from the Copyright Holder. A Package
modified in such a way shall still be considered the Standard Version.
=item 3.
You may otherwise modify your copy of this Package in any way, provided
that you insert a prominent notice in each changed file stating how and
when you changed that file, and provided that you do at least ONE of the
following:
=over
=item a)
place your modifications in the Public Domain or otherwise make them
Freely Available, such as by posting said modifications to Usenet or an
equivalent medium, or placing the modifications on a major archive site
such as uunet.uu.net, or by allowing the Copyright Holder to include
your modifications in the Standard Version of the Package.
=item b)
use the modified Package only within your corporation or organization.
=item c)
rename any non-standard executables so the names do not conflict with
standard executables, which must also be provided, and provide a
separate manual page for each non-standard executable that clearly
documents how it differs from the Standard Version.
=item d)
make other distribution arrangements with the Copyright Holder.
=back
=item 4.
You may distribute the programs of this Package in object code or
executable form, provided that you do at least ONE of the following:
=over
=item a)
distribute a Standard Version of the executables and library files,
together with instructions (in the manual page or equivalent) on where
to get the Standard Version.
=item b)
accompany the distribution with the machine-readable source of the
Package with your modifications.
=item c)
give non-standard executables non-standard names, and clearly
document the differences in manual pages (or equivalent), together with
instructions on where to get the Standard Version.
=item d)
make other distribution arrangements with the Copyright Holder.
=back
=item 5.
You may charge a reasonable copying fee for any distribution of this
Package. You may charge any fee you choose for support of this
Package. You may not charge a fee for this Package itself. However,
you may distribute this Package in aggregate with other (possibly
commercial) programs as part of a larger (possibly commercial) software
distribution provided that you do not advertise this Package as a
product of your own. You may embed this Package's interpreter within
an executable of yours (by linking); this shall be construed as a mere
form of aggregation, provided that the complete Standard Version of the
interpreter is so embedded.
=item 6.
The scripts and library files supplied as input to or produced as
output from the programs of this Package do not automatically fall
under the copyright of this Package, but belong to whoever generated
them, and may be sold commercially, and may be aggregated with this
Package. If such scripts or library files are aggregated with this
Package via the so-called "undump" or "unexec" methods of producing a
binary executable image, then distribution of such an image shall
neither be construed as a distribution of this Package nor shall it
fall under the restrictions of Paragraphs 3 and 4, provided that you do
not represent such an executable image as a Standard Version of this
Package.
=item 7.
C subroutines (or comparably compiled subroutines in other
languages) supplied by you and linked into this Package in order to
emulate subroutines and variables of the language defined by this
Package shall not be considered part of this Package, but are the
equivalent of input as in Paragraph 6, provided these subroutines do
not change the language in any way that would cause it to fail the
regression tests for the language.
=item 8.
Aggregation of this Package with a commercial distribution is always
permitted provided that the use of this Package is embedded; that is,
when no overt attempt is made to make this Package's interfaces visible
to the end user of the commercial distribution. Such use shall not be
construed as a distribution of this Package.
=item 9.
The name of the Copyright Holder may not be used to endorse or promote
products derived from this software without specific prior written permission.
=item 10.
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
=back
The End
=cut
=head1 NAME
perlvms - VMS-specific documentation for Perl
=head1 DESCRIPTION
Gathered below are notes describing details of Perl 5's
behavior on VMS. They are a supplement to the regular Perl 5
documentation, so we have focussed on the ways in which Perl
5 functions differently under VMS than it does under Unix,
and on the interactions between Perl and the rest of the
operating system. We haven't tried to duplicate complete
descriptions of Perl features from the main Perl
documentation, which can be found in the F<[.pod]>
subdirectory of the Perl distribution.
We hope these notes will save you from confusion and lost
sleep when writing Perl scripts on VMS. If you find we've
missed something you think should appear here, please don't
hesitate to drop a line to vmsperl@perl.org.
=head1 Installation
Directions for building and installing Perl 5 can be found in
the file F<README.vms> in the main source directory of the
Perl distribution.
=head1 Organization of Perl Images
=head2 Core Images
During the build process, three Perl images are produced.
F<Miniperl.Exe> is an executable image which contains all of
the basic functionality of Perl, but cannot take advantage of
Perl XS extensions and has a hard-wired list of library locations
for loading pure-Perl modules. It is used extensively to build and
test Perl and various extensions, but is not installed.
Most of the complete Perl resides in the shareable image F<PerlShr.Exe>,
which provides a core to which the Perl executable image and all Perl
extensions are linked. It is generally located via the logical name
F<PERLSHR>. While it's possible to put the image in F<SYS$SHARE> to
make it loadable, that's not recommended. And while you may wish to
INSTALL the image for performance reasons, you should not install it
with privileges; if you do, the result will not be what you expect as
image privileges are disabled during Perl start-up.
Finally, F<Perl.Exe> is an executable image containing the main
entry point for Perl, as well as some initialization code. It
should be placed in a public directory, and made world executable.
In order to run Perl with command line arguments, you should
define a foreign command to invoke this image.
=head2 Perl Extensions
Perl extensions are packages which provide both XS and Perl code
to add new functionality to perl. (XS is a meta-language that
simplifies writing C code that interacts with Perl; see
L<perlxs> for more details.) The Perl code for an
extension is treated like any other library module - it's
made available in your script through the appropriate
C<use> or C<require> statement, and usually defines a Perl
package containing the extension.
The portion of the extension provided by the XS code may be
connected to the rest of Perl in either of two ways. In the
B<static> configuration, the object code for the extension is
linked directly into F<PerlShr.Exe>, and is initialized whenever
Perl is invoked. In the B<dynamic> configuration, the extension's
machine code is placed into a separate shareable image, which is
mapped by Perl's DynaLoader when the extension is C<use>d or
C<require>d in your script. This allows you to maintain the
extension as a separate entity, at the cost of keeping track of the
additional shareable image. Most extensions can be set up as either
static or dynamic.
The source code for an extension usually resides in its own
directory. At least three files are generally provided:
I<Extshortname>F<.xs> (where I<Extshortname> is the portion of
the extension's name following the last C<::>), containing
the XS code, I<Extshortname>F<.pm>, the Perl library module
for the extension, and F<Makefile.PL>, a Perl script which uses
the C<MakeMaker> library modules supplied with Perl to generate
a F<Descrip.MMS> file for the extension.
=head2 Installing static extensions
Since static extensions are incorporated directly into
F<PerlShr.Exe>, you'll have to rebuild Perl to incorporate a
new extension. You should edit the main F<Descrip.MMS> or F<Makefile>
you use to build Perl, adding the extension's name to the C<ext>
macro, and the extension's object file to the C<extobj> macro.
You'll also need to build the extension's object file, either
by adding dependencies to the main F<Descrip.MMS>, or using a
separate F<Descrip.MMS> for the extension. Then, rebuild
F<PerlShr.Exe> to incorporate the new code.
Finally, you'll need to copy the extension's Perl library
module to the F<[.>I<Extname>F<]> subdirectory under one
of the directories in C<@INC>, where I<Extname> is the name
of the extension, with all C<::> replaced by C<.> (e.g.
the library module for extension Foo::Bar would be copied
to a F<[.Foo.Bar]> subdirectory).
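The naming conventions used in this section can be sketched in a few lines of Perl. The two helper functions below are hypothetical and are not part of Perl or MakeMaker; they only restate the rules for I<Extname> and I<Extshortname> given above.

```perl
use strict;
use warnings;

# Hypothetical helper: relative VMS subdirectory for an extension's
# library module, with every :: replaced by a dot.
sub ext_subdirectory {
    my ($extname) = @_;
    (my $dotted = $extname) =~ s/::/./g;   # Foo::Bar -> Foo.Bar
    return "[.$dotted]";
}

# Hypothetical helper: the portion of the name after the last ::
sub ext_shortname {
    my ($extname) = @_;
    my @parts = split /::/, $extname;
    return $parts[-1];                     # Foo::Bar -> Bar
}

print ext_subdirectory('Foo::Bar'), "\n";
print ext_shortname('Foo::Bar'), "\n";
```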
=head2 Installing dynamic extensions
In general, the distributed kit for a Perl extension includes
a file named F<Makefile.PL>, a Perl program that creates a
F<Descrip.MMS> file, which in turn is used to build and
install the files required by the extension. The kit should be
unpacked into a directory tree B<not> under the main Perl source
directory, and the procedure for building the extension is simply
$ perl Makefile.PL ! Create Descrip.MMS
$ mmk ! Build necessary files
$ mmk test ! Run test code, if supplied
$ mmk install ! Install into public Perl tree
VMS support for this process in the current release of Perl
is sufficient to handle most extensions. (See the MakeMaker
documentation for more details on installation options for
extensions.) The shareable image for an extension can be placed
in any of several locations:
=over 4
=item *
the F<[.Lib.Auto.>I<Arch>I<$PVers>I<Extname>F<]> subdirectory
of one of the directories in C<@INC> (where I<PVers>
is the version of Perl you're using, as supplied in C<$]>,
with '.' converted to '_'), or
=item *
one of the directories in C<@INC>, or
=item *
a directory which the extension's Perl library module
passes to the DynaLoader when asking it to map
the shareable image, or
=item *
F<Sys$Share> or F<Sys$Library>.
=back
If the shareable image isn't in any of these places, you'll need
to define a logical name I<Extshortname>, where I<Extshortname>
is the portion of the extension's name after the last C<::>, which
translates to the full file specification of the shareable image.
=head1 File specifications
=head2 Syntax
We have tried to make Perl aware of both VMS-style and Unix-style file
specifications wherever possible. You may use either style, or both,
on the command line and in scripts, but you may not combine the two
styles within a single file specification. VMS Perl interprets Unix
pathnames in much the same way as the CRTL (I<e.g.> the first component
of an absolute path is read as the device name for the VMS file
specification). A set of functions is provided in the
C<VMS::Filespec> package for explicit interconversion between VMS and
Unix syntax; its documentation provides more details.
We've tried to minimize the dependence of Perl library
modules on Unix syntax, but you may find that some of these,
as well as some scripts written for Unix systems, will
require that you use Unix syntax, since they will assume that
'/' is the directory separator, I<etc.> If you find instances
of this in the Perl distribution itself, please let us know,
so we can try to work around them.
Also, when working on Perl programs on VMS, if you need a file specification
in a specific operating system format, then you need either to
check the appropriate DECC$ feature logical, or call a conversion
routine to force it to that format.
The feature logical name DECC$FILENAME_UNIX_REPORT modifies traditional
Perl behavior in the conversion of file specifications from Unix to VMS
format in order to follow the extended character handling rules now
expected by the CRTL. Specifically, when this feature is in effect, the
C<./.../> in a Unix path is now translated to C<[.^.^.^.]> instead of
the traditional VMS C<[...]>. To be compatible with what MakeMaker
expects, if a VMS path cannot be translated to a Unix path, it is
passed through unchanged, so C<unixify("[...]")> will return C<[...]>.
There are several ambiguous cases where a conversion routine cannot
determine whether an input filename is in Unix format or in VMS format,
since now both VMS and Unix file specifications may have characters in
them that could be mistaken for syntax delimiters of the other type. So
some pathnames simply cannot be used in a mode that allows either type
of pathname to be present. Perl will tend to assume that an ambiguous
filename is in Unix format.
Allowing "." as a version delimiter is simply incompatible with
determining whether a pathname is in VMS format or in Unix format with
extended file syntax. There is no way to know whether "perl-5.8.6" is a
Unix "perl-5.8.6" or a VMS "perl-5.8;6" when passing it to unixify() or
vmsify().
The DECC$FILENAME_UNIX_REPORT logical name controls how Perl interprets
filenames to the extent that Perl uses the CRTL internally for many
purposes, and attempts to follow CRTL conventions for reporting
filenames. The DECC$FILENAME_UNIX_ONLY feature differs in that it
expects all filenames passed to the C run-time to be already in Unix
format. This feature is not yet supported in Perl since Perl uses
traditional OpenVMS file specifications internally and in the test
harness, and it is not yet clear whether this mode will be useful or
useable. The feature logical name DECC$POSIX_COMPLIANT_PATHNAMES is new
with the RMS Symbolic Link SDK and included with OpenVMS v8.3, but is
not yet supported in Perl.
=head2 Filename Case
Perl enables DECC$EFS_CASE_PRESERVE and DECC$ARGV_PARSE_STYLE by
default. Note that the latter only takes effect when extended parse
is set in the process in which Perl is running. When these features
are explicitly disabled in the environment or the CRTL does not support
them, Perl follows the traditional CRTL behavior of downcasing command-line
arguments and returning file specifications in lower case only.
I<N. B.> It is very easy to get tripped up using a mixture of other
programs, external utilities, and Perl scripts that are in varying
states of being able to handle case preservation. For example, a file
created by an older version of an archive utility or a build utility
such as MMK or MMS may generate a filename in all upper case even on an
ODS-5 volume. If this filename is later retrieved by a Perl script or
module in a case preserving environment, that upper case name may not
match the mixed-case or lower-case expectations of the Perl code. Your
best bet is to follow an all-or-nothing approach to case preservation:
either don't use it at all, or make sure your entire toolchain and
application environment support and use it.
OpenVMS Alpha v7.3-1 and later and all versions of OpenVMS I64 support
case sensitivity as a process setting (see C<SET PROCESS
/CASE_LOOKUP=SENSITIVE>). Perl does not currently support case
sensitivity on VMS, but it may in the future, so Perl programs should
use the C<< File::Spec->case_tolerant >> method to determine the state, and
not the C<$^O> variable.
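As a minimal sketch of that advice, the following compares two file names according to what C<< File::Spec->case_tolerant >> reports rather than testing C<$^O>. The helper name C<same_file_name> is hypothetical and is not part of any module.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: compare two file names the way the current
# platform is expected to compare them.
sub same_file_name {
    my ($left, $right) = @_;
    # case_tolerant() is true when case is not significant in file names.
    return File::Spec->case_tolerant
        ? lc($left) eq lc($right)
        : $left eq $right;
}

print same_file_name('Makefile.PL', 'makefile.pl')
    ? "names match\n"
    : "names differ\n";
```

Because the decision is taken at run time, the same code should keep working if case sensitivity is supported on VMS in the future.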
=head2 Symbolic Links
When built on an ODS-5 volume with symbolic links enabled, Perl by
default supports symbolic links when the requisite support is available
in the filesystem and CRTL (generally 64-bit OpenVMS v8.3 and later).
There are a number of limitations and caveats to be aware of when
working with symbolic links on VMS. Most notably, the target of a valid
symbolic link must be expressed as a Unix-style path and it must exist
on a volume visible from your POSIX root (see the C<SHOW ROOT> command
in DCL help). For further details on symbolic link capabilities and
requirements, see chapter 12 of the CRTL manual that ships with OpenVMS
v8.3 or later.
=head2 Wildcard expansion
File specifications containing wildcards are allowed both on
the command line and within Perl globs (e.g. C<E<lt>*.cE<gt>>). If
the wildcard filespec uses VMS syntax, the resultant
filespecs will follow VMS syntax; if a Unix-style filespec is
passed in, Unix-style filespecs will be returned.
Similar to the behavior of wildcard globbing for a Unix shell,
one can escape command line wildcards with double quotation
marks C<"> around a perl program command line argument. However,
owing to the stripping of C<"> characters carried out by the C
handling of argv you will need to escape a construct such as
this one (in a directory containing the files F<PERL.C>, F<PERL.EXE>,
F<PERL.H>, and F<PERL.OBJ>):
$ perl -e "print join(' ',@ARGV)" perl.*
perl.c perl.exe perl.h perl.obj
in the following triple quoted manner:
$ perl -e "print join(' ',@ARGV)" """perl.*"""
perl.*
For both unquoted command-line arguments and calls to C<glob()>,
VMS wildcard expansion is performed. (csh-style
wildcard expansion is available if you use C<File::Glob::glob>.)
If the wildcard filespec contains a device or directory
specification, then the resultant filespecs will also contain
a device and directory; otherwise, device and directory
information are removed. VMS-style resultant filespecs will
contain a full device and directory, while Unix-style
resultant filespecs will contain only as much of a directory
path as was present in the input filespec. For example, if
your default directory is Perl_Root:[000000], the expansion
of C<[.t]*.*> will yield filespecs like
"perl_root:[t]base.dir", while the expansion of C<t/*/*> will
yield filespecs like "t/base.dir". (This is done to match
the behavior of glob expansion performed by Unix shells.)
Similarly, the resultant filespec will contain the file version
only if one was present in the input filespec.
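The csh-style expansion mentioned above can be sketched as follows. This example assumes Unix-style paths and uses C<bsd_glob>, the name under which current versions of File::Glob export this functionality; the file names and the scratch directory are made up for illustration.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Spec;
use File::Temp qw(tempdir);

# Create a scratch directory holding a few files to match against.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(perl.c perl.h readme.txt)) {
    my $path = File::Spec->catfile($dir, $name);
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    close($fh) or die "Cannot close $path: $!";
}

# Expand a csh-style pattern; only the two perl.* files should match.
my @matches = bsd_glob("$dir/perl.*");
print "$_\n" for @matches;
```

How a VMS-syntax directory name combines with a Unix-style pattern has not been checked here, so treat the path handling as an assumption.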
=head2 Pipes
Input and output pipes to Perl filehandles are supported; the
"file name" is passed to lib$spawn() for asynchronous
execution. You should be careful to close any pipes you have
opened in a Perl script, lest you leave any "orphaned"
subprocesses around when Perl exits.
You may also use backticks to invoke a DCL subprocess, whose
output is used as the return value of the expression. The
string between the backticks is handled as if it were the
argument to the C<system> operator (see below). In this case,
Perl will wait for the subprocess to complete before continuing.
The mailbox (MBX) that perl can create to communicate with a pipe
defaults to a buffer size of 8192 on 64-bit systems, 512 on VAX. The
default buffer size is adjustable via the logical name PERL_MBX_SIZE
provided that the value falls between 128 and the SYSGEN parameter
MAXBUF inclusive. For example, to set the mailbox size to 32767 use
C<$ENV{'PERL_MBX_SIZE'} = 32767;> and then open and use pipe constructs.
An alternative would be to issue the command:
$ Define PERL_MBX_SIZE 32767
before running your wide record pipe program. A larger value may
improve performance at the expense of the BYTLM UAF quota.
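A minimal sketch of the first approach follows. It assumes that 32767 is within the limit set by MAXBUF on your system and that C<$^X> can be invoked as a command; the child command is only a placeholder.

```perl
use strict;
use warnings;

# On VMS, request a larger mailbox before any pipe is opened.
# (Assumption: 32767 does not exceed the SYSGEN parameter MAXBUF.)
$ENV{'PERL_MBX_SIZE'} = 32767 if $^O eq 'VMS';

# Read from a child Perl process through a pipe; $^X is the running perl.
my $command = qq{$^X -e "print 6*7"};
open(my $pipe, "$command |") or die "Cannot start '$command': $!";
my $output = do { local $/; <$pipe> };

# Close explicitly so that no subprocess is left behind.
close($pipe) or warn "Child reported status $?\n";

print "child said: $output\n";
```

Closing the pipe explicitly follows the earlier advice about not leaving orphaned subprocesses around.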
=head1 PERL5LIB and PERLLIB
The PERL5LIB and PERLLIB environment elements work as documented in L<perl>,
except that the element separator is, by default, '|' instead of ':'.
However, when running under a Unix shell as determined by the logical
name C<GNV$UNIX_SHELL>, the separator will be ':' as on Unix systems. The
directory specifications may use either VMS or Unix syntax.
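The separator rule can be sketched as a small function. C<split_lib_path> is a hypothetical helper written for this illustration, and it does not reproduce what Perl itself does internally.

```perl
use strict;
use warnings;

# Hypothetical helper: split a PERL5LIB-style value into directories.
# $os is an operating system name as found in $^O; $unix_shell says
# whether the logical name GNV$UNIX_SHELL is defined.
sub split_lib_path {
    my ($value, $os, $unix_shell) = @_;
    my $separator = ($os eq 'VMS' && !$unix_shell) ? '|' : ':';
    return split /\Q$separator\E/, $value;
}

# Under plain DCL the elements are separated by '|'.
my @dirs = split_lib_path('perl_root:[lib]|/my/site/lib', 'VMS', 0);
print "$_\n" for @dirs;
```

Note that ':' cannot serve as the default separator on VMS because it already appears in VMS device specifications, as the first element above shows.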
=head1 The Perl Forked Debugger
The Perl forked debugger places the debugger commands and output in a
separate X11 terminal window so that commands and output from multiple
processes are not mixed together.
Perl on VMS supports an emulation of the forked debugger when Perl is
run on a VMS system that has X11 support installed.
To use the forked debugger, you need to have the default display set to an
X11 server and to set some environment variables that Unix expects.
The forked debugger requires the environment variable C<TERM> to be C<xterm>,
and the environment variable C<DISPLAY> to exist. C<xterm> must be in
lower case.
$define TERM "xterm"
$define DISPLAY "hostname:0.0"
Currently the value of C<DISPLAY> is ignored. It is recommended that it be set
to be the hostname of the display, the server and screen in Unix notation. In
the future the value of DISPLAY may be honored by Perl instead of using the
default display.
It may be helpful to always use the forked debugger so that script I/O is
separated from debugger I/O. You can force the debugger to be forked by
assigning a value to the logical name C<PERLDB_PIDS> that is not a process
identification number.
$define PERLDB_PIDS XXXX
=head1 PERL_VMS_EXCEPTION_DEBUG
Defining the logical name PERL_VMS_EXCEPTION_DEBUG as "ENABLE" causes the
VMS debugger to be invoked if a fatal exception that is not otherwise
handled is raised. The purpose of this is to allow debugging of
internal Perl problems that would cause such a condition.
This allows the programmer to look at the execution stack and variables to
find out the cause of the exception. As the debugger is being invoked as
the Perl interpreter is about to do a fatal exit, continuing the execution
in debug mode is usually not practical.
Starting Perl in the VMS debugger may change the program execution
profile in a way that such problems are not reproduced.
The C<kill> function can be used to test this functionality from within
a program.
In typical VMS style, only the first character of the value of this
logical name is checked, without regard to case, and the feature is
considered enabled if that character is "T", "1", or "E".
This logical name must be defined before Perl is started.
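The first-character rule can be illustrated with a short function. C<exception_debug_enabled> is a hypothetical helper that only mirrors the rule as stated here; it is not the code Perl uses internally.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the rule described above: only the first
# character matters, compared without regard to case.
sub exception_debug_enabled {
    my ($value) = @_;
    return 0 unless defined $value && length $value;
    return uc(substr($value, 0, 1)) =~ /\A[T1E]\z/ ? 1 : 0;
}

for my $setting (qw(ENABLE true 1 DISABLE 0)) {
    printf "%-8s => %s\n", $setting,
        exception_debug_enabled($setting) ? 'enabled' : 'not enabled';
}
```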
=head1 Command line
=head2 I/O redirection and backgrounding
Perl for VMS supports redirection of input and output on the
command line, using a subset of Bourne shell syntax:
=over 4
=item *
C<E<lt>file> reads stdin from C<file>,
=item *
C<E<gt>file> writes stdout to C<file>,
=item *
C<E<gt>E<gt>file> appends stdout to C<file>,
=item *
C<2E<gt>file> writes stderr to C<file>,
=item *
C<2E<gt>E<gt>file> appends stderr to C<file>, and
=item *
C<< 2>&1 >> redirects stderr to stdout.
=back
In addition, output may be piped to a subprocess, using the
character '|'. Anything after this character on the command
line is passed to a subprocess for execution; the subprocess
takes the output of Perl as its input.
Finally, if the command line ends with '&', the entire
command is run in the background as an asynchronous
subprocess.
=head2 Command line switches
The following command line switches behave differently under
VMS than described in L<perlrun>. Note also that in order
to pass uppercase switches to Perl, you need to enclose
them in double-quotes on the command line, since the CRTL
downcases all unquoted strings.
On newer 64-bit versions of OpenVMS, a process setting now
controls whether quoting is needed to preserve the case of
command line arguments.
=over 4
=item -i
If the C<-i> switch is present but no extension for a backup
copy is given, then inplace editing creates a new version of
a file; the existing copy is not deleted. (Note that if
an extension is given, an existing file is renamed to the backup
file, as is the case under other operating systems, so it does
not remain as a previous version under the original filename.)
=item -S
If the C<"-S"> or C<-"S"> switch is present I<and> the script
name does not contain a directory, then Perl translates the
logical name DCL$PATH as a searchlist, using each translation
as a directory in which to look for the script. In addition,
if no file type is specified, Perl looks in each directory
for a file matching the name specified, with a blank type,
a type of F<.pl>, and a type of F<.com>, in that order.
=item -u
The C<-u> switch causes the VMS debugger to be invoked
after the Perl program is compiled, but before it has
run. It does not create a core dump file.
=back
=head1 Perl functions
As of the time this document was last revised, the following
Perl functions were implemented in the VMS port of Perl
(functions marked with * are discussed in more detail below):
file tests*, abs, alarm, atan2, backticks*, binmode*, bless,
caller, chdir, chmod, chown, chomp, chop, chr,
close, closedir, cos, crypt*, defined, delete, die, do, dump*,
each, endgrent, endpwent, eof, eval, exec*, exists, exit, exp,
fileno, flock, getc, getgrent*, getgrgid*, getgrnam, getlogin,
getppid, getpwent*, getpwnam*, getpwuid*, glob, gmtime*, goto,
grep, hex, ioctl, import, index, int, join, keys, kill*,
last, lc, lcfirst, lchown*, length, link*, local, localtime, log,
lstat, m//, map, mkdir, my, next, no, oct, open, opendir, ord,
pack, pipe, pop, pos, print, printf, push, q//, qq//, qw//,
qx//*, quotemeta, rand, read, readdir, readlink*, redo, ref,
rename, require, reset, return, reverse, rewinddir, rindex,
rmdir, s///, scalar, seek, seekdir, select(internal),
select (system call)*, setgrent, setpwent, shift, sin, sleep,
socketpair, sort, splice, split, sprintf, sqrt, srand, stat,
study, substr, symlink*, sysread, system*, syswrite, tell,
telldir, tie, time, times*, tr///, uc, ucfirst, umask,
undef, unlink*, unpack, untie, unshift, use, utime*,
values, vec, wait, waitpid*, wantarray, warn, write, y///
The following functions were not implemented in the VMS port,
and calling them produces a fatal error (usually) or
undefined behavior (rarely, we hope):
chroot, dbmclose, dbmopen, fork*, getpgrp, getpriority,
msgctl, msgget, msgsend, msgrcv, semctl,
semget, semop, setpgrp, setpriority, shmctl, shmget,
shmread, shmwrite, syscall
The following functions are available on Perls compiled with DEC C
5.2 or greater and running VMS 7.0 or greater:
truncate
The following functions are available on Perls built on VMS 7.2 or
greater:
fcntl (without locking)
The following functions may or may not be implemented,
depending on what type of socket support you've built into
your copy of Perl:
accept, bind, connect, getpeername,
gethostbyname, getnetbyname, getprotobyname,
getservbyname, gethostbyaddr, getnetbyaddr,
getprotobynumber, getservbyport, gethostent,
getnetent, getprotoent, getservent, sethostent,
setnetent, setprotoent, setservent, endhostent,
endnetent, endprotoent, endservent, getsockname,
getsockopt, listen, recv, select(system call)*,
send, setsockopt, shutdown, socket
The following function is available on Perls built on 64-bit OpenVMS v8.2
with hard links enabled on an ODS-5 formatted build disk. CRTL support
is in principle available as of OpenVMS v7.3-1, and better configuration
support could detect this.
link
The following functions are available on Perls built on 64-bit OpenVMS
v8.2 and later. CRTL support is in principle available as of OpenVMS
v7.3-2, and better configuration support could detect this.
getgrgid, getgrnam, getpwnam, getpwuid,
setgrent, ttyname
The following functions are available on Perls built on 64-bit OpenVMS v8.2
and later.
statvfs, socketpair
=over 4
=item File tests
The tests C<-b>, C<-B>, C<-c>, C<-C>, C<-d>, C<-e>, C<-f>,
C<-o>, C<-M>, C<-s>, C<-S>, C<-t>, C<-T>, and C<-z> work as
advertised. The return values for C<-r>, C<-w>, and C<-x>
tell you whether you can actually access the file; this may
not reflect the UIC-based file protections. Since real and
effective UIC don't differ under VMS, C<-O>, C<-R>, C<-W>,
and C<-X> are equivalent to C<-o>, C<-r>, C<-w>, and C<-x>.
Similarly, several other tests, including C<-A>, C<-g>, C<-k>,
C<-l>, C<-p>, and C<-u>, aren't particularly meaningful under
VMS, and the values returned by these tests reflect whatever
your CRTL C<stat()> routine does to the equivalent bits in the
st_mode field. Finally, C<-d> returns true if passed a device
specification without an explicit directory (e.g. C<DUA1:>), as
well as if passed a directory.
There are DECC feature logical names and ODS-5 volume attributes that
also control what values are returned for the date fields.
Note: Some sites have reported problems when using the file-access
tests (C<-r>, C<-w>, and C<-x>) on files accessed via DEC's DFS.
Specifically, since DFS does not currently provide access to the
extended file header of files on remote volumes, attempts to
examine the ACL fail, and the file tests will return false,
with C<$!> indicating that the file does not exist. You can
use C<stat> on these files, since that checks UIC-based protection
only, and then manually check the appropriate bits, as defined by
your C compiler's F<stat.h>, in the mode value it returns, if you
need an approximation of the file's protections.
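A minimal sketch of that manual check follows, using the C<S_I*> constants exported by the Fcntl module instead of reading F<stat.h> by hand. The helper C<owner_permissions> and its hash keys are hypothetical, and the result reflects only the mode bits, not any ACL.

```perl
use strict;
use warnings;
use Fcntl qw(:mode);
use File::Temp qw(tempfile);

# Hypothetical helper: report the owner permission bits recorded in the
# mode value returned by stat, without consulting any ACL.
sub owner_permissions {
    my ($file) = @_;
    my $mode = (stat($file))[2];
    return unless defined $mode;
    return +{
        owner_read    => ($mode & S_IRUSR) ? 1 : 0,
        owner_write   => ($mode & S_IWUSR) ? 1 : 0,
        owner_execute => ($mode & S_IXUSR) ? 1 : 0,
    };
}

# Try it on a scratch file that is readable and writable by its owner.
my ($fh, $path) = tempfile(UNLINK => 1);
close($fh) or die "Cannot close $path: $!";
chmod 0600, $path or die "Cannot chmod $path: $!";

my $perms = owner_permissions($path);
print "$_: $perms->{$_}\n" for sort keys %$perms;
```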
=item backticks
Backticks create a subprocess, and pass the enclosed string
to it for execution as a DCL command. Since the subprocess is
created directly via C<lib$spawn()>, any valid DCL command string
may be specified.
=item binmode FILEHANDLE
The C<binmode> operator will attempt to ensure that no translation
of carriage control occurs on input from or output to this filehandle.
Since this involves reopening the file and then restoring its
file position indicator, if this function returns FALSE, the
underlying filehandle may no longer point to an open file, or may
point to a different position in the file than before C<binmode>
was called.
Note that C<binmode> is generally not necessary when using normal
filehandles; it is provided so that you can control I/O to existing
record-structured files when necessary. You can also use the
C<vmsfopen> function in the VMS::Stdio extension to gain finer
control of I/O to files and devices with different record structures.
=item crypt PLAINTEXT, USER
The C<crypt> operator uses the C<sys$hash_password> system
service to generate the hashed representation of PLAINTEXT.
If USER is a valid username, the algorithm and salt values
are taken from that user's UAF record. If it is not, then
the preferred algorithm and a salt of 0 are used. The
quadword encrypted value is returned as an 8-character string.
The value returned by C<crypt> may be compared against
the encrypted password from the UAF returned by the C<getpw*>
functions, in order to authenticate users. If you're
going to do this, remember that the encrypted password in
the UAF was generated using uppercase username and
password strings; you'll have to upcase the arguments to
C<crypt> to ensure that you'll get the proper value:
    sub validate_passwd {
        my($user,$passwd) = @_;
        my($pwdhash);
        # intruder_alert() stands for a site-specific routine, not shown here
        if ( !($pwdhash = (getpwnam($user))[1]) ||
               $pwdhash ne crypt("\U$passwd","\U$user") ) {
            intruder_alert($user);
        }
        return 1;
    }
=item die
C<die> will force the native VMS exit status to be an SS$_ABORT code
if neither the C<$!> nor the C<$?> status value is one that would cause
the native status to be interpreted as having what VMS classifies as
SEVERE_ERROR severity for DCL error handling.
When C<PERL_VMS_POSIX_EXIT> is active (see L</"$?"> below), the native VMS exit
status value will have one of C<$!>, C<$?>, C<$^E>, or
the Unix value 255 encoded into it in a way that the effective original
value can be decoded by other programs written in C, including Perl
and the GNV package. As per the normal non-VMS behavior of C<die>, if
either C<$!> or C<$?> is non-zero, one of those values will be
encoded into a native VMS status value. If both of the Unix status
values are 0, and the C<$^E> value has ERROR or SEVERE_ERROR
severity, then the C<$^E> value will be used as the exit code as is.
If none of the above apply, the Unix value of 255 will be encoded into
a native VMS exit status value.
Please note a significant difference in the behavior of C<die> in
the C<PERL_VMS_POSIX_EXIT> mode is that it does not force a VMS
SEVERE_ERROR status on exit. The Unix exit values of 2 through
255 will be encoded in VMS status values with severity levels of
SUCCESS. The Unix exit value of 1 will be encoded in a VMS status
value with a severity level of ERROR. This is to be compatible with
how the VMS C library encodes these values.
The minimum severity level set by C<die> in C<PERL_VMS_POSIX_EXIT> mode
may be changed to be ERROR or higher in the future depending on the
results of testing and further review.
See L</"$?"> for a description of the encoding of the Unix value to
produce a native VMS status containing it.
=item dump
Rather than causing Perl to abort and dump core, the C<dump>
operator invokes the VMS debugger. If you continue to
execute the Perl program under the debugger, control will
be transferred to the label specified as the argument to
C<dump>, or, if no label was specified, back to the
beginning of the program. All other state of the program
(I<e.g.> values of variables, open file handles) is not
affected by calling C<dump>.
=item exec LIST
A call to C<exec> will cause Perl to exit, and to invoke the command
given as an argument to C<exec> via C<lib$do_command>. If the
argument begins with '@' or '$' (other than as part of a filespec),
then it is executed as a DCL command. Otherwise, the first token on
the command line is treated as the filespec of an image to run, and
an attempt is made to invoke it (using F<.Exe> and the process
defaults to expand the filespec) and pass the rest of C<exec>'s
argument to it as parameters. If the token has no file type, and
matches a file with null type, then an attempt is made to determine
whether the file is an executable image which should be invoked
using C<MCR> or a text file which should be passed to DCL as a
command procedure.
=item fork
While in principle the C<fork> operator could be implemented via
(and with the same rather severe limitations as) the CRTL C<vfork()>
routine, and while some internal support to do just that is in
place, the implementation has never been completed, making C<fork>
currently unavailable. A true kernel C<fork()> is expected in a
future version of VMS, and the pseudo-fork based on interpreter
threads may be available in a future version of Perl on VMS (see
L<perlfork>). In the meantime, use C<system>, backticks, or piped
filehandles to create subprocesses.
=item getpwent
=item getpwnam
=item getpwuid
These operators obtain the information described in L<perlfunc>,
if you have the privileges necessary to retrieve the named user's
UAF information via C<sys$getuai>. If not, then only the C<$name>,
C<$uid>, and C<$gid> items are returned. The C<$dir> item contains
the login directory in VMS syntax, while the C<$comment> item
contains the login directory in Unix syntax. The C<$gcos> item
contains the owner field from the UAF record. The C<$quota>
item is not used.
=item gmtime
The C<gmtime> operator will function properly if you have a
working CRTL C<gmtime()> routine, or if the logical name
SYS$TIMEZONE_DIFFERENTIAL is defined as the number of seconds
which must be added to UTC to yield local time. (This logical
name is defined automatically if you are running a version of
VMS with built-in UTC support.) If neither of these cases is
true, a warning message is printed, and C<undef> is returned.
=item kill
In most cases, C<kill> is implemented via the undocumented system
service C<$SIGPRC>, which has the same calling sequence as C<$FORCEX>, but
throws an exception in the target process rather than forcing it to call
C<$EXIT>. Generally speaking, C<kill> follows the behavior of the
CRTL's C<kill()> function, but unlike that function can be called from
within a signal handler. Also, unlike the C<kill> in some versions of
the CRTL, Perl's C<kill> checks the validity of the signal passed in and
returns an error rather than attempting to send an unrecognized signal.
Also, negative signal values don't do anything special under
VMS; they're just converted to the corresponding positive value.
=item qx//
See the entry on C<backticks> above.
=item select (system call)
If Perl was not built with socket support, the system call
version of C<select> is not available at all. If socket
support is present, then the system call version of
C<select> functions only for file descriptors attached
to sockets. It will not provide information about regular
files or pipes, since the CRTL C<select()> routine does not
provide this functionality.
=item stat EXPR
Since VMS keeps track of files according to a different scheme
than Unix, it's not really possible to represent the file's ID
in the C<st_dev> and C<st_ino> fields of a C<struct stat>. Perl
tries its best, though, and the values it uses are pretty unlikely
to be the same for two different files. We can't guarantee this,
though, so caveat scriptor.
=item system LIST
The C<system> operator creates a subprocess, and passes its
arguments to the subprocess for execution as a DCL command.
Since the subprocess is created directly via C<lib$spawn()>, any
valid DCL command string may be specified. If the string begins with
'@', it is treated as a DCL command unconditionally. Otherwise, if
the first token contains a character used as a delimiter in file
specification (e.g. C<:> or C<]>), an attempt is made to expand it
using a default type of F<.Exe> and the process defaults, and if
successful, the resulting file is invoked via C<MCR>. This allows you
to invoke an image directly simply by passing the file specification
to C<system>, a common Unixish idiom. If the token has no file type,
and matches a file with null type, then an attempt is made to
determine whether the file is an executable image which should be
invoked using C<MCR> or a text file which should be passed to DCL
as a command procedure.
If LIST consists of the empty string, C<system> spawns an
interactive DCL subprocess, in the same fashion as typing
B<SPAWN> at the DCL prompt.
Perl waits for the subprocess to complete before continuing
execution in the current process. As described in L<perlfunc>,
the return value of C<system> is a fake "status" which follows
POSIX semantics unless the pragma C<use vmsish 'status'> is in
effect; see the description of C<$?> in this document for more
detail.
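A short sketch of reading that status under the default POSIX semantics follows. It assumes that C<use vmsish 'status'> is not in effect and that C<$^X> names a perl that can be run as a child; the exit value 3 is arbitrary.

```perl
use strict;
use warnings;

# Run a child Perl that exits with status 3; $^X is the running perl.
my $status = system($^X, '-e', 'exit 3');
die "Could not start child: $!" if $status == -1;

# With POSIX semantics the exit value is held in the high byte of $?.
my $exit_value = $? >> 8;
print "child exit value: $exit_value\n";
```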
=item time
The value returned by C<time> is the offset in seconds from
01-JAN-1970 00:00:00 (just like the CRTL's C<time()> routine), in order
to make life easier for code coming in from the POSIX/Unix world.
=item times
The array returned by the C<times> operator is divided up
according to the same rules as the CRTL C<times()> routine.
Therefore, the "system time" elements will always be 0, since
there is no difference between "user time" and "system" time
under VMS, and the time accumulated by a subprocess may or may
not appear separately in the "child time" field, depending on
whether C<times()> keeps track of subprocesses separately. Note
especially that the VAXCRTL (at least) keeps track only of
subprocesses spawned using C<fork()> and C<exec()>; it will not
accumulate the times of subprocesses spawned via pipes, C<system()>,
or backticks.
=item unlink LIST
C<unlink> will delete the highest version of a file only; in
order to delete all versions, you need to say
1 while unlink LIST;
You may need to make this change to scripts written for a
Unix system which expect that after a call to C<unlink>,
no files with the names passed to C<unlink> will exist.
(Note: This can be changed at compile time; if you
C<use Config> and C<$Config{'d_unlink_all_versions'}> is
C<define>, then C<unlink> will delete all versions of a
file on the first call.)
C<unlink> will delete a file if at all possible, even if it
requires changing file protection (though it won't try to
change the protection of the parent directory). You can tell
whether you've got explicit delete access to a file by using the
C<VMS::Filespec::candelete> operator. For instance, in order
to delete only files to which you have delete access, you could
say something like
    use VMS::Filespec;    # provides candelete() on VMS
    sub safe_unlink {
        my $num = 0;      # number of files actually deleted
        foreach my $file (@_) {
            next unless VMS::Filespec::candelete($file);
            $num += unlink $file;
        }
        return $num;
    }
(or you could just use C<VMS::Stdio::remove>, if you've installed
the VMS::Stdio extension distributed with Perl). If C<unlink> has to
change the file protection to delete the file, and you interrupt it
in midstream, the file may be left intact, but with a changed ACL
allowing you delete access.
This behavior of C<unlink> is intended to match POSIX behavior
rather than traditional VMS behavior.
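The version-handling rule for C<unlink> can be combined with the C<%Config> check in one routine. This is a sketch only: C<unlink_all_versions> is a hypothetical helper, and on systems without file versions both branches simply delete the file once.

```perl
use strict;
use warnings;
use Config;
use File::Temp qw(tempfile);

# Hypothetical helper: remove every version of each named file and return
# the number of names that no longer exist afterwards.
sub unlink_all_versions {
    my @files = @_;
    my $gone = 0;
    for my $file (@files) {
        if (($Config{d_unlink_all_versions} // '') eq 'define') {
            unlink $file;            # one call removes all versions
        }
        else {
            1 while unlink $file;    # repeat until no version is left
        }
        $gone++ unless -e $file;
    }
    return $gone;
}

# Try it on a scratch file.
my ($fh, $path) = tempfile();
close($fh) or die "Cannot close $path: $!";
my $removed = unlink_all_versions($path);
print "removed $removed file(s)\n";
```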
=item utime LIST
This operator changes only the modification time of the file (VMS
revision date) on ODS-2 volumes and ODS-5 volumes without access
dates enabled. On ODS-5 volumes with access dates enabled, the
true access time is modified.
=item waitpid PID,FLAGS
If PID is a subprocess started by a piped C<open()> (see L<perlfunc/open>),
C<waitpid> will wait for that subprocess, and return its final status
value in C<$?>. If PID is a subprocess created in some other way (e.g.
SPAWNed before Perl was invoked), C<waitpid> will simply check once per
second whether the process has completed, and return when it has. (If
PID specifies a process that isn't a subprocess of the current process,
and you invoked Perl with the C<-w> switch, a warning will be issued.)
Returns PID on success, -1 on error. The FLAGS argument is ignored
in all cases.
=back
=head1 Perl variables
The following VMS-specific information applies to the indicated
"special" Perl variables, in addition to the general information
in L<perlvar>. Where there is a conflict, this information
takes precedence.
=over 4
=item %ENV
The operation of the C<%ENV> hash depends on the translation
of the logical name F<PERL_ENV_TABLES>. If defined, it should
be a search list, each element of which specifies a location
for C<%ENV> elements. If you tell Perl to read or set the
element C<$ENV{>I<name>C<}>, then Perl uses the translations of
F<PERL_ENV_TABLES> as follows:
=over 4
=item CRTL_ENV
This string tells Perl to consult the CRTL's internal C<environ> array
of key-value pairs, using I<name> as the key. In most cases, this
contains only a few keys, but if Perl was invoked via the C
C<exec[lv]e()> function, as is the case for some embedded Perl
applications or when running under a shell such as GNV bash, the
C<environ> array may have been populated by the calling program.
=item CLISYM_[LOCAL]
A string beginning with C<CLISYM_> tells Perl to consult the CLI's
symbol tables, using I<name> as the name of the symbol. When reading
an element of C<%ENV>, the local symbol table is scanned first, followed
by the global symbol table. The characters following C<CLISYM_> are
significant when an element of C<%ENV> is set or deleted: if the
complete string is C<CLISYM_LOCAL>, the change is made in the local
symbol table; otherwise the global symbol table is changed.
=item Any other string
If an element of F<PERL_ENV_TABLES> translates to any other string,
that string is used as the name of a logical name table, which is
consulted using I<name> as the logical name. The normal search
order of access modes is used.
=back
F<PERL_ENV_TABLES> is translated once when Perl starts up; any changes
you make while Perl is running do not affect the behavior of C<%ENV>.
If F<PERL_ENV_TABLES> is not defined, then Perl defaults to consulting
first the logical name tables specified by F<LNM$FILE_DEV>, and then
the CRTL C<environ> array. This default order is reversed when the
logical name F<GNV$UNIX_SHELL> is defined, such as when running under
GNV bash.
For operations on %ENV entries based on logical names or DCL symbols, the
key string is treated as if it were entirely uppercase, regardless of the
case actually specified in the Perl expression. Entries in %ENV based on the
CRTL's environ array preserve the case of the key string when stored, and
lookups are case sensitive.
When an element of C<%ENV> is read, the locations to which
F<PERL_ENV_TABLES> points are checked in order, and the value
obtained from the first successful lookup is returned. If the
name of the C<%ENV> element contains a semi-colon, it and
any characters after it are removed. These are ignored when
the CRTL C<environ> array or a CLI symbol table is consulted.
However, if the name is looked up in a logical name table, the
suffix after the semi-colon is treated as the translation index
to be used for the lookup. This lets you look up successive values
for search list logical names. For instance, if you say
$ Define STORY once,upon,a,time,there,was
$ perl -e "for ($i = 0; $i <= 6; $i++) " -
_$ -e "{ print $ENV{'story;'.$i},' '}"
Perl will print C<ONCE UPON A TIME THERE WAS>, assuming, of course,
that F<PERL_ENV_TABLES> is set up so that the logical name C<story>
is found, rather than a CLI symbol or CRTL C<environ> element with
the same name.
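Within a script, the same indexed lookup can be wrapped in a function. This is a sketch under assumptions: C<search_list_values> is a hypothetical helper, the upper bound of 128 indices is assumed rather than taken from this document, and a lookup past the last translation is assumed to return C<undef>. A stand-in hash is used so that the logic can also be exercised away from VMS.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the translations of a search list by asking
# for "NAME;0", "NAME;1", ... until a lookup fails. $env defaults to %ENV;
# passing another hash reference allows testing on any system.
sub search_list_values {
    my ($name, $env) = @_;
    $env ||= \%ENV;
    my @values;
    for my $index (0 .. 127) {    # assumed upper bound on indices
        my $value = $env->{"$name;$index"};
        last unless defined $value;
        push @values, $value;
    }
    return @values;
}

# Stand-in data in place of a real search list logical name.
my %fake_env = ('STORY;0' => 'ONCE', 'STORY;1' => 'UPON', 'STORY;2' => 'A');
print join(' ', search_list_values('STORY', \%fake_env)), "\n";
```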
When an element of C<%ENV> is set to a defined string, the
corresponding definition is made in the location to which the
first translation of F<PERL_ENV_TABLES> points. If this causes a
logical name to be created, it is defined in supervisor mode.
(The same is done if an existing logical name was defined in
executive or kernel mode; an existing user or supervisor mode
logical name is reset to the new value.) If the value is an empty
string, the logical name's translation is defined as a single C<NUL>
(ASCII C<\0>) character, since a logical name cannot translate to a
zero-length string. (This restriction does not apply to CLI symbols
or CRTL C<environ> values; they are set to the empty string.)
When an element of C<%ENV> is set to C<undef>, the element is looked
up as if it were being read, and if it is found, it is deleted. (An
item "deleted" from the CRTL C<environ> array is set to the empty
string.) Using C<delete> to remove an element from C<%ENV> has a
similar effect, but after the element is deleted, another attempt is
made to look up the element, so an inner-mode logical name or a name
in another location will replace the logical name just deleted. In
either case, only the first value found searching PERL_ENV_TABLES is
altered. It is not possible at present to define a search list
logical name via %ENV.
The element C<$ENV{DEFAULT}> is special: when read, it returns
Perl's current default device and directory, and when set, it
resets them, regardless of the definition of F<PERL_ENV_TABLES>.
It cannot be cleared or deleted; attempts to do so are silently
ignored.
Note that if you want to pass on any elements of the
C-local environ array to a subprocess which isn't
started by fork/exec, or isn't running a C program, you
can "promote" them to logical names in the current
process, which will then be inherited by all subprocesses,
by saying
foreach my $key (qw[C-local keys you want promoted]) {
    my $temp = $ENV{$key}; # read from C-local array
    $ENV{$key} = $temp;    # and define as logical name
}
(You can't just say C<$ENV{$key} = $ENV{$key}>, since the
Perl optimizer is smart enough to elide the expression.)
Don't try to clear C<%ENV> by saying C<%ENV = ();>; it will throw
a fatal error. This is equivalent to doing the following from DCL:
DELETE/LOGICAL *
You can imagine how bad things would be if, for example, the SYS$MANAGER
or SYS$SYSTEM logical names were deleted.
At present, the first time you iterate over %ENV using
C<keys>, or C<values>, you will incur a time penalty as all
logical names are read, in order to fully populate %ENV.
Subsequent iterations will not reread logical names, so they
won't be as slow, but they also won't reflect any changes
to logical name tables caused by other programs.
You do need to be careful with the logical names representing
process-permanent files, such as C<SYS$INPUT> and C<SYS$OUTPUT>.
The translations for these logical names are prepended with a
two-byte binary value (0x1B 0x00) that needs to be stripped off
if you want to use it. (In previous versions of Perl it wasn't
possible to get the values of these logical names, as the null
byte acted as an end-of-string marker.)
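One possible way to strip that two-byte prefix is sketched below. The helper name C<strip_ppf_prefix> and the device name in the sample string are made up for illustration; only the 0x1B 0x00 prefix itself comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the leading 0x1B 0x00 pair, if present, from
# the translation of a process-permanent file logical name.
sub strip_ppf_prefix {
    my ($translation) = @_;
    return $translation unless defined $translation;
    (my $clean = $translation) =~ s/\A\x1B\x00//;
    return $clean;
}

# The device name here is invented; on VMS the input would come from
# something like $ENV{'SYS$OUTPUT'}.
my $raw = "\x1B\x00_FTA12:";
print strip_ppf_prefix($raw), "\n";
```

Values without the prefix pass through unchanged, so the helper can be applied to any logical name translation.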
=item $!
The string value of C<$!> is that returned by the CRTL's
strerror() function, so it will include the VMS message for
VMS-specific errors. The numeric value of C<$!> is the
value of C<errno>, except if errno is EVMSERR, in which
case C<$!> contains the value of vaxc$errno. Setting C<$!>
always sets errno to the value specified. If this value is
EVMSERR, it also sets vaxc$errno to 4 (NONAME-F-NOMSG), so
that the string value of C<$!> won't reflect the VMS error
message from before C<$!> was set.
=item $^E
This variable provides direct access to VMS status values
in vaxc$errno, which are often more specific than the
generic Unix-style error messages in C<$!>. Its numeric value
is the value of vaxc$errno, and its string value is the
corresponding VMS message string, as retrieved by sys$getmsg().
Setting C<$^E> sets vaxc$errno to the value specified.
While Perl attempts to keep the vaxc$errno value current, if
errno is not EVMSERR, it may not be from the current operation.
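As a rough illustration of reading both variables after a failed operation, the sketch below captures the numeric and string values of C<$!> and C<$^E> immediately after an C<open> fails. The file name is arbitrary, and on non-VMS systems C<$^E> will usually just mirror C<$!>.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Build a path that is known not to exist.
my $dir     = tempdir(CLEANUP => 1);
my $missing = File::Spec->catfile($dir, 'no_such_file.txt');

my ($errno_num, $errno_str, $os_num, $os_str);
if (!open(my $fh, '<', $missing)) {
    # Capture both variables right away, before anything else resets them.
    ($errno_num, $errno_str) = ($! + 0,  "$!");
    ($os_num,    $os_str)    = ($^E + 0, "$^E");
    printf "\$!  = %d (%s)\n", $errno_num, $errno_str;
    printf "\$^E = %d (%s)\n", $os_num,    $os_str;
}
```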
=item $?
The "status value" returned in C<$?> is synthesized from the
actual exit status of the subprocess in a way that approximates
POSIX wait(5) semantics, in order to allow Perl programs to
portably test for successful completion of subprocesses. The
low order 8 bits of C<$?> are always 0 under VMS, since the
termination status of a process may or may not have been
generated by an exception.
The next 8 bits contain the termination status of the program.
If the child process follows the convention of C programs
compiled with the _POSIX_EXIT macro set, the status value will
contain the actual value of 0 to 255 returned by that program
on a normal exit.
With the _POSIX_EXIT macro set, the Unix exit value of zero is
represented as a VMS native status of 1, and the Unix values
from 2 to 255 are encoded by the equation:
VMS_status = 0x35a000 + (unix_value * 8) + 1.
And in the special case of Unix value 1 the encoding is:
VMS_status = 0x35a000 + 8 + 2 + 0x10000000.
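The encoding rules above can be written out as a short function. This is a sketch of the arithmetic only, not code taken from the Perl or CRTL sources, and C<posix_exit_to_vms_status> is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch of the _POSIX_EXIT encoding described above.
sub posix_exit_to_vms_status {
    my ($unix_value) = @_;
    die "exit value must be an integer from 0 to 255\n"
        unless defined $unix_value
            && $unix_value =~ /\A\d+\z/
            && $unix_value <= 255;

    return 1 if $unix_value == 0;                               # plain success
    return 0x35a000 + 8 + 2 + 0x10000000 if $unix_value == 1;   # special case
    return 0x35a000 + ($unix_value * 8) + 1;                    # values 2..255
}

printf "%3d -> 0x%08x\n", $_, posix_exit_to_vms_status($_) for 0, 1, 2, 255;
```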
For other termination statuses, the severity portion of the
subprocess's exit status is used: if the severity was success or
informational, these bits are all 0; if the severity was
warning, they contain a value of 1; if the severity was
error or fatal error, they contain the actual severity bits,
which turns out to be a value of 2 for error and 4 for severe_error.
Fatal is another term for the severe_error status.
As a result, C<$?> will always be zero if the subprocess's exit
status indicated successful completion, and non-zero if a
warning or error occurred or a program compliant with encoding
_POSIX_EXIT values was run and set a status.
How can you tell the difference between a non-zero status that is
the result of a VMS native error status or an encoded Unix status?
You cannot, unless you look at the ${^CHILD_ERROR_NATIVE} value.
The ${^CHILD_ERROR_NATIVE} value holds the actual VMS status value;
check its severity bits. If the severity bits are equal to 1 and the
numeric value of C<$?> is 0 or between 2 and 255, then
C<$?> accurately reflects a value passed back from a Unix application.
If C<$?> is 1, and the severity bits indicate a VMS error (2), then
C<$?> is from a Unix application exit value.
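The check described above might look roughly like this. The helper names are hypothetical, the sketch assumes that the severity occupies the low three bits of a VMS status value, and it reads the exit value from the second byte of C<$?> as described earlier in this section. The sample statuses are the encoded forms of Unix exit values 2 and 1.

```perl
use strict;
use warnings;

# Assumption: the severity is held in the low three bits of a VMS status.
sub vms_severity {
    my ($native_status) = @_;
    return $native_status & 0x7;
}

# Hypothetical helper: pair the POSIX-style exit value with the native
# severity. Real code would pass $? and ${^CHILD_ERROR_NATIVE}.
sub describe_child_status {
    my ($child_error, $native_status) = @_;
    return {
        exit_value => ($child_error >> 8) & 0xFF,
        severity   => vms_severity($native_status),
    };
}

for my $pair ([0x0200, 0x35a011], [0x0100, 0x1035a00a]) {
    my $info = describe_child_status(@$pair);
    printf "exit value %d, severity %d\n", $info->{exit_value}, $info->{severity};
}
```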
In practice, Perl scripts that call programs that return _POSIX_EXIT
type status values will be expecting those values, and programs that
call traditional VMS programs will either be expecting the previous
behavior or just checking for a non-zero status.
And success is always the value 0 in all behaviors.
When the actual VMS termination status of the child is an error,
internally the C<$!> value will be set to the closest Unix errno
value to that error so that Perl scripts that test for error
messages will see the expected Unix style error message instead
of a VMS message.
Conversely, when setting C<$?> in an END block, an attempt is made
to convert the POSIX value into a native status intelligible to
the operating system upon exiting Perl. What this boils down to
is that setting C<$?> to zero results in the generic success value
SS$_NORMAL, and setting C<$?> to a non-zero value results in the
generic failure status SS$_ABORT. See also L<perlport/exit>.
With the C<PERL_VMS_POSIX_EXIT> logical name defined as "ENABLE",
setting C<$?> will cause the new value to be encoded into C<$^E>
so that either the original parent or child exit status values
0 to 255 can be automatically recovered by C programs expecting
_POSIX_EXIT behavior. If both a parent and a child exit value are
non-zero, then it will be assumed that this is actually a VMS native
status value to be passed through. The special value of 0xFFFF is
almost a NOOP as it will cause the current native VMS status in the
C library to become the current native Perl VMS status, and is handled
this way as it is known to not be a valid native VMS status value.
It is recommended that only values in the range of normal Unix parent or
child status numbers, 0 to 255, be used.
The pragma C<use vmsish 'status'> makes C<$?> reflect the actual
VMS exit status instead of the default emulation of POSIX status
described above. This pragma also disables the conversion of
non-zero values to SS$_ABORT when setting C<$?> in an END
block (but zero will still be converted to SS$_NORMAL).
Do not use the pragma C<use vmsish 'status'> with C<PERL_VMS_POSIX_EXIT>
enabled, as they are at times requesting conflicting actions and the
consequence of ignoring this advice will be undefined to allow future
improvements in the POSIX exit handling.
In general, with C<PERL_VMS_POSIX_EXIT> enabled, more detailed information
will be available in the exit status for DCL scripts or other native VMS tools,
and will give the expected information for POSIX programs. It has not been
made the default in order to preserve backward compatibility.
N.B. Setting C<DECC$FILENAME_UNIX_REPORT> implicitly enables
C<PERL_VMS_POSIX_EXIT>.
=item $|
Setting C<$|> for an I/O stream causes data to be flushed
all the way to disk on each write (I<i.e.> not just to
the underlying RMS buffers for a file). In other words,
it's equivalent to calling fflush() and fsync() from C.
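A minimal sketch of enabling this behaviour for a single handle follows. It uses C<IO::Handle>'s C<autoflush> method, which sets C<$|> for that handle; the temporary file is only there to make the example self-contained. Note that on non-VMS systems this flushes Perl's buffer to the operating system but does not by itself force the data to disk.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Create a scratch file and turn on autoflush for its handle only.
my ($fh, $path) = tempfile(UNLINK => 1);
$fh->autoflush(1);

# With autoflush on, each print is pushed out immediately, so the file
# already has content while the handle is still open.
print {$fh} "checkpoint\n";
my $size_while_open = -s $path;
printf "%d bytes visible before close\n", $size_while_open;

close $fh or die "close failed: $!";
```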
=back
=head1 Standard modules with VMS-specific differences
=head2 SDBM_File
SDBM_File works properly on VMS. It has, however, one minor
difference. The database directory file created has a F<.sdbm_dir>
extension rather than a F<.dir> extension. F<.dir> files are VMS filesystem
directory files, and using them for other purposes could cause unacceptable
problems.
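For reference, a basic SDBM_File session might look like the sketch below. The database name C<example> and the temporary directory are arbitrary choices for illustration; on VMS the directory file would carry the F<.sdbm_dir> extension mentioned above rather than F<.dir>.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Spec;
use File::Temp qw(tempdir);

# Keep the database files in a scratch directory.
my $dir  = tempdir(CLEANUP => 1);
my $base = File::Spec->catfile($dir, 'example');

my %db;
tie(%db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie SDBM file '$base': $!";

$db{greeting} = 'hello';
my $fetched = $db{greeting};
untie %db;

# List the files SDBM created next to the base name.
my @created = map { (File::Spec->splitpath($_))[2] } glob("$base.*");
print "fetched: $fetched\n";
print "files:   @created\n";
```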
=head1 Revision date
Please see the git repository for revision history.
=head1 AUTHOR
Charles Bailey bailey@cor.newman.upenn.edu
Craig Berry craigberry@mac.com
Dan Sugalski dan@sidhe.org
John Malmberg wb8tyw@qsl.net
=encoding utf8
=head1 NAME
perl5181delta - what is new for perl v5.18.1
=head1 DESCRIPTION
This document describes differences between the 5.18.0 release and the 5.18.1
release.
If you are upgrading from an earlier release such as 5.16.0, first read
L<perl5180delta>, which describes differences between 5.16.0 and 5.18.0.
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.18.0.
If any exist, they are bugs, and we request that you submit a
report. See L</Reporting Bugs> below.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
B has been upgraded from 1.42 to 1.42_01, fixing bugs related to lexical
subroutines.
=item *
Digest::SHA has been upgraded from 5.84 to 5.84_01, fixing a crashing bug.
[RT #118649]
=item *
Module::CoreList has been upgraded from 2.89 to 2.96.
=back
=head1 Platform Support
=head2 Platform-Specific Notes
=over 4
=item AIX
A rarely-encountered configuration bug in the AIX hints file has been corrected.
=item MidnightBSD
After a patch to the relevant hints file, perl should now build correctly on
MidnightBSD 0.4-RELEASE.
=back
=head1 Selected Bug Fixes
=over 4
=item *
Starting in v5.18.0, a construct like C</[#](?{})/x> would have its C<#>
incorrectly interpreted as a comment. The code block would be skipped,
unparsed. This has been corrected.
=item *
A number of memory leaks related to the new, experimental regexp bracketed
character class feature have been plugged.
=item *
The OP allocation code now returns correctly aligned memory in all cases
for C<struct pmop>. Previously it could return memory only aligned to a
4-byte boundary, which is not correct for an ithreads build with 64 bit IVs
on some 32 bit platforms. Notably, this caused the build to fail completely
on sparc GNU/Linux. [RT #118055]
=item *
The debugger's C<man> command has been fixed. It was broken in the v5.18.0
release. The C<man> command is aliased to the names C<doc> and C<perldoc> -
all now work again.
=item *
C<@_> is now correctly visible in the debugger, fixing a regression
introduced in v5.18.0's debugger. [RT #118169]
=item *
Fixed a small number of regexp constructions that could either fail to
match or crash perl when the string being matched against was
allocated above the 2GB line on 32-bit systems. [RT #118175]
=item *
Perl v5.16 inadvertently introduced a bug whereby calls to XSUBs that were
not visible at compile time were treated as lvalues and could be assigned
to, even when the subroutine was not an lvalue sub. This has been fixed.
[perl #117947]
=item *
Perl v5.18 inadvertently introduced a bug whereby the truthiness of
dual-vars (i.e. variables with both string and numeric values, such as
C<$!>) was determined by the numeric value rather than the string value.
This has been fixed. [RT #118159]
=item *
Perl v5.18 inadvertently introduced a bug whereby interpolating mixed up-
and down-graded UTF-8 strings in a regex could result in malformed UTF-8
in the pattern: specifically if a downgraded character in the range
C<\x80..\xff> followed a UTF-8 string, e.g.
utf8::upgrade( my $u = "\x{e5}");
utf8::downgrade(my $d = "\x{e5}");
/$u$d/
[perl #118297].
=item *
Lexical constants (C<my sub a() { 42 }>) no longer crash when inlined.
=item *
Parameter prototypes attached to lexical subroutines are now respected when
compiling sub calls without parentheses. Previously, the prototypes were
honoured only for calls I<with> parentheses. [RT #116735]
=item *
Syntax errors in lexical subroutines in combination with calls to the same
subroutines no longer cause crashes at compile time.
=item *
The dtrace sub-entry probe now works with lexical subs, instead of
crashing [perl #118305].
=item *
Undefining an inlinable lexical subroutine (C<my sub foo() { 42 } undef
&foo>) would result in a crash if warnings were turned on.
=item *
Deep recursion warnings no longer crash lexical subroutines. [RT #118521]
=back
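To illustrate the dual-var fix listed above, the following sketch builds two dual-vars with C<Scalar::Util::dualvar> and tests them in boolean context. It shows the corrected behaviour, where the string value decides truth; the variable names are arbitrary.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# Numeric value 0 but a non-empty string value.
my $zero_with_text = dualvar(0, 'some message');

# Numeric value 5 but an empty string value.
my $five_but_empty = dualvar(5, '');

# Boolean context looks at the string value.
my $first_is_true  = $zero_with_text ? 1 : 0;
my $second_is_true = $five_but_empty ? 1 : 0;

print "dualvar(0, 'some message') is ", ($first_is_true  ? 'true' : 'false'), "\n";
print "dualvar(5, '') is ",             ($second_is_true ? 'true' : 'false'), "\n";
```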
=head1 Acknowledgements
Perl 5.18.1 represents approximately 2 months of development since Perl 5.18.0
and contains approximately 8,400 lines of changes across 60 files from 12
authors.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed the
improvements that became Perl 5.18.1:
Chris 'BinGOs' Williams, Craig A. Berry, Dagfinn Ilmari Mannsåker, David
Mitchell, Father Chrysostomos, Karl Williamson, Lukas Mai, Nicholas Clark,
Peter Martini, Ricardo Signes, Shlomi Fish, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
http://rt.perl.org/perlbug/ . There may also be information at
http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send it
to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be
able to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
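If you are reading this file in an ordinary text editor, please ignore the
odd markup characters. This file is written in POD (Plain Old Documentation),
a format specially designed to be readable as is. For more information about
the format, see the perlpod documentation.
=head1 NAME
perltw - Perl Guide in Traditional Chinese
=head1 DESCRIPTION
Welcome to the world of Perl!
Starting with version 5.8.0, Perl has had comprehensive Unicode support,
and with it support for many encodings beyond the Latin-based ones; CJK
(Chinese, Japanese, and Korean) is one part of that.
Unicode is an international standard that attempts to cover all the
characters of the world: the Western world, the Eastern world, and
everything in between (Greek, Syriac, Arabic, Hebrew, Indic, Native
American, and so on). It also accommodates multiple operating systems and
platforms (such as PC and Macintosh).
Perl itself operates on Unicode. This means that string data inside Perl
can be represented in Unicode; Perl's functions and operators (such as
regular expression matching) can also operate on Unicode.
For input and output, to handle data stored in pre-Unicode encodings, Perl
provides the Encode module, which lets you easily read and write data in
legacy encodings.
The Encode extension supports the following Traditional Chinese encodings
('big5' means 'big5-eten'):
big5-eten Big5 encoding (with ETEN extensions)
big5-hkscs Big5 + Hong Kong Supplementary Character Set, 2001 edition
cp950 Code Page 950 (Big5 + characters added by Microsoft)
For example, to convert a Big5-encoded file to Unicode, just type the
following command: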
perl -MEncode -pe '$_= encode( utf8 => decode( big5 => $_ ) )' \
< file.big5 > file.utf8
Perl also includes "piconv", a character conversion utility written entirely in Perl. It is used as follows:
piconv -f big5 -t utf8 < file.big5 > file.utf8
piconv -f utf8 -t big5 < file.utf8 > file.big5
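The same conversion can be done inside a program with the Encode module. The sketch below is illustrative only: it decodes a short hard-coded Big5 byte string (the two characters of "中文") into Perl characters and re-encodes them as UTF-8. The variable names are arbitrary.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The Big5 bytes for the two characters U+4E2D and U+6587.
my $big5_bytes = "\xA4\xA4\xA4\xE5";

# Bytes in, characters out.
my $text = decode('big5-eten', $big5_bytes);

# Characters in, UTF-8 bytes out.
my $utf8_bytes = encode('UTF-8', $text);

printf "%d characters, %d Big5 bytes, %d UTF-8 bytes\n",
    length($text), length($big5_bytes), length($utf8_bytes);
```

Decoding first and encoding afterwards keeps all string operations in between working on characters rather than bytes.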
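In addition, if the program source itself is saved in utf8, using the utf8
module makes the strings in the source, and the operations on them, work in
units of characters rather than bytes, as shown below:
#!/usr/bin/env perl
use utf8;
print length("駱駝"); # 2 (not 6)
print index("諄諄教誨", "教誨"); # 2 (the character at index 2, counting from 0)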
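=head2 Additional Chinese encodings
If you need more Chinese encodings, you can download the Encode::HanExtra
module from CPAN (L<http://www.cpan.org/>). It currently provides the
following encodings:
cccii The 1980 Chinese Character Code for Information Interchange from the Council for Cultural Affairs
euc-tw Extended Unix Character set, including CNS11643 planes 1-7
big5plus Big5+ from the Chinese Foundation for Digitization Technology
big5ext Big5e from the Chinese Foundation for Digitization Technology
In addition, the Encode::HanConvert module provides two encodings for
converting between Simplified and Traditional Chinese:
big5-simp Converts between Big5 Traditional Chinese and Unicode Simplified Chinese
gbk-trad Converts between GBK Simplified Chinese and Unicode Traditional Chinese
To convert between GBK and Big5, see the b2g.pl and g2b.pl programs that
come with the module, or use the following in your program:
use Encode::HanConvert;
$euc_cn = big5_to_gb($big5); # convert from Big5 to GBK
$big5 = gb_to_big5($euc_cn); # convert from GBK to Big5
=head2 Further information
Please refer to the extensive documentation that comes with Perl
(unfortunately all written in English) to learn more about Perl and about
using Unicode. External resources, however, are quite plentiful:
=head2 Perl resource sites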
=over 4
=item L<http://www.perl.com/>
The Perl home page (maintained by O'Reilly)
=item L<http://www.cpan.org/>
The Comprehensive Perl Archive Network
=item L<http://lists.perl.org/>
Overview of the Perl mailing lists
=back
=head2 Sites for learning Perl
=over 4
=item L<http://www.oreilly.com.tw/product_perl.php?id=index_perl>
O'Reilly Perl books in Traditional Chinese
=back
=head2 Perl user groups
=over 4
=item L<http://www.pm.org/groups/taiwan.html>
List of Perl Mongers groups in Taiwan
=item L<irc://irc.freenode.org/#perl.tw>
The Perl.tw online chat room
=back
=head2 Unicode-related sites
=over 4
=item L<http://www.unicode.org/>
The Unicode Consortium (the maker of the Unicode standard)
=item L<http://www.cl.cam.ac.uk/%7Emgk25/unicode.html>
UTF-8 and Unicode FAQ for Unix/Linux
=back
=head2 Chinese localization resources
=over 4
=item Chinese Software Localization Alliance (中文化軟體聯盟)
L<http://www.cpatch.org/>
=item Linux Software Chinese Localization Project (Linux 軟體中文化計劃)
L<http://www.linux.org.tw/CLDP/>
=back
=head1 SEE ALSO
L<Encode>, L<Encode::TW>, L<perluniintro>, L<perlunicode>
=head1 AUTHORS
Jarkko Hietaniemi E<lt>jhi@iki.fiE<gt>
Audrey Tang (唐鳳) E<lt>audreyt@audreyt.orgE<gt>
=cut
=encoding utf8
=head1 NAME
perl5143delta - what is new for perl v5.14.3
=head1 DESCRIPTION
This document describes differences between the 5.14.2 release and
the 5.14.3 release.
If you are upgrading from an earlier release such as 5.12.0, first read
L<perl5140delta>, which describes differences between 5.12.0 and
5.14.0.
=head1 Core Enhancements
No changes since 5.14.0.
=head1 Security
=head2 C<Digest> unsafe use of eval (CVE-2011-3597)
The C<Digest-E<gt>new()> function did not properly sanitize input before
using it in an eval() call, which could lead to the injection of arbitrary
Perl code.
In order to exploit this flaw, the attacker would need to be able to set
the algorithm name used, or be able to execute arbitrary Perl code already.
This problem has been fixed.
=head2 Heap buffer overrun in 'x' string repeat operator (CVE-2012-5195)
Poorly written perl code that allows an attacker to specify the count to
perl's 'x' string repeat operator can already cause a memory exhaustion
denial-of-service attack. A flaw in versions of perl before 5.15.5 can
escalate that into a heap buffer overrun; coupled with versions of glibc
before 2.16, it possibly allows the execution of arbitrary code.
This problem has been fixed.
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.14.0. If any
exist, they are bugs and reports are welcome.
=head1 Deprecations
There have been no deprecations since 5.14.0.
=head1 Modules and Pragmata
=head2 New Modules and Pragmata
None
=head2 Updated Modules and Pragmata
=over 4
=item *
L<PerlIO::scalar> was updated to fix a bug in which opening a filehandle to
a glob copy caused assertion failures (under debugging) or hangs or other
erratic behaviour without debugging.
=item *
L<ODBM_File> and L<NDBM_File> were updated to allow building on GNU/Hurd.
=item *
L<IPC::Open3> has been updated to fix a regression introduced in perl
5.12, which broke C<IPC::Open3::open3($in, $out, $err, '-')>.
[perl #95748]
=item *
L<Digest> has been upgraded from version 1.16 to 1.16_01.
See L</Security>.
=item *
L<Module::CoreList> has been updated to version 2.49_04 to add data for
this release.
=back
=head2 Removed Modules and Pragmata
None
=head1 Documentation
=head2 New Documentation
None
=head2 Changes to Existing Documentation
=head3 L<perlcheat>
=over 4
=item *
L<perlcheat> was updated to 5.14.
=back
=head1 Configuration and Compilation
=over 4
=item *
h2ph was updated to correctly search gcc include directories on platforms
such as Debian with multi-architecture support.
=item *
In Configure, the test for procselfexe was refactored into a loop.
=back
=head1 Platform Support
=head2 New Platforms
None
=head2 Discontinued Platforms
None
=head2 Platform-Specific Notes
=over 4
=item FreeBSD
The FreeBSD hints file was corrected to be compatible with FreeBSD 10.0.
=item Solaris and NetBSD
Configure was updated for "procselfexe" support on Solaris and NetBSD.
=item HP-UX
README.hpux was updated to note the existence of a broken header in
HP-UX 11.00.
=item Linux
libutil is no longer used when compiling on Linux platforms, which avoids
warnings being emitted.
The system gcc (rather than any other gcc which might be in the compiling
user's path) is now used when searching for libraries such as C<-lm>.
=item Mac OS X
The locale tests were updated to reflect the behaviour of locales in
Mountain Lion.
=item GNU/Hurd
Various build and test fixes were included for GNU/Hurd.
LFS support was enabled in GNU/Hurd.
=item NetBSD
The NetBSD hints file was corrected to be compatible with NetBSD 6.*.
=back
=head1 Bug Fixes
=over 4
=item *
A regression has been fixed that was introduced in 5.14, in C</i>
regular expression matching, in which a match improperly fails if the
pattern is in UTF-8, the target string is not, and a Latin-1 character
precedes a character in the string that should match the pattern. [perl
#101710]
=item *
In case-insensitive regular expression pattern matching on UTF-8 encoded
strings, the scan for the start of a match no longer looks only at
the first possible position. Doing so caused matches such as
C<"f\x{FB00}" =~ /ff/i> to fail.
=item *
The sitecustomize support was made relocatableinc aware, so that
-Dusesitecustomize and -Duserelocatableinc may be used together.
=item *
The smartmatch operator (C<~~>) was changed so that the right-hand side
takes precedence during C<Any ~~ Object> operations.
=item *
A bug has been fixed in the tainting support, in which an C<index()>
operation on a tainted constant would cause all other constants to become
tainted. [perl #64804]
=item *
A regression has been fixed that was introduced in perl 5.12, whereby
tainting errors were not correctly propagated through C<die()>.
[perl #111654]
=item *
A regression has been fixed that was introduced in perl 5.14, in which
C</[[:lower:]]/i> and C</[[:upper:]]/i> no longer matched the opposite case.
[perl #101970]
=back
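Some of the case-insensitive matching fixes listed above can be checked with a few one-line matches. This is only an illustrative sketch of the corrected behaviour, with arbitrary variable names; it is not taken from the Perl test suite.

```perl
use strict;
use warnings;

# U+FB00 (LATIN SMALL LIGATURE FF) folds to "ff", so the match can start
# at the second character of the target string.
my $ligature_matches = ("f\x{FB00}" =~ /ff/i) ? 1 : 0;

# Under /i, the POSIX classes match either case.
my $lower_matches_upper = ('A' =~ /[[:lower:]]/i) ? 1 : 0;
my $upper_matches_lower = ('a' =~ /[[:upper:]]/i) ? 1 : 0;

print "ligature match:      $ligature_matches\n";
print "[[:lower:]]/i on 'A': $lower_matches_upper\n";
print "[[:upper:]]/i on 'a': $upper_matches_lower\n";
```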
=head1 Acknowledgements
Perl 5.14.3 represents approximately 12 months of development since Perl 5.14.2
and contains approximately 2,300 lines of changes across 64 files from 22
authors.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed the
improvements that became Perl 5.14.3:
Abigail, Andy Dougherty, Carl Hayter, Chris 'BinGOs' Williams, Dave Rolsky,
David Mitchell, Dominic Hargreaves, Father Chrysostomos, Florian Ragwitz,
H.Merijn Brand, Jilles Tjoelker, Karl Williamson, Leon Timmermans, Michael G
Schwern, Nicholas Clark, Niko Tyni, Pino Toscano, Ricardo Signes, Salvador
Fandiño, Samuel Thibault, Steve Hay, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://rt.perl.org/perlbug/ . There may also be
information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send
it to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be able
to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently
distributed on CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details
on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
X<reference> X<pointer> X<data structure> X<structure> X<struct>
perlref - Perl references and nested data structures
=head1 NOTE
This is complete documentation about all aspects of references.
For a shorter, tutorial introduction to just the essential features,
see L<perlreftut>.
=head1 DESCRIPTION
Before release 5 of Perl it was difficult to represent complex data
structures, because all references had to be symbolic--and even then
it was difficult to refer to a variable instead of a symbol table entry.
Perl now not only makes it easier to use symbolic references to variables,
but also lets you have "hard" references to any piece of data or code.
Any scalar may hold a hard reference. Because arrays and hashes contain
scalars, you can now easily build arrays of arrays, arrays of hashes,
hashes of arrays, arrays of hashes of functions, and so on.
Hard references are smart--they keep track of reference counts for you,
automatically freeing the thing referred to when its reference count goes
to zero. (Reference counts for values in self-referential or
cyclic data structures may not go to zero without a little help; see
L</"Circular References"> for a detailed explanation.)
If that thing happens to be an object, the object is destructed. See
L<perlobj> for more about objects. (In a sense, everything in Perl is an
object, but we usually reserve the word for references to objects that
have been officially "blessed" into a class package.)
Symbolic references are names of variables or other objects, just as a
symbolic link in a Unix filesystem contains merely the name of a file.
The C<*glob> notation is something of a symbolic reference. (Symbolic
references are sometimes called "soft references", but please don't call
them that; references are confusing enough without useless synonyms.)
X<reference, symbolic> X<reference, soft>
X<symbolic reference> X<soft reference>
In contrast, hard references are more like hard links in a Unix file
system: They are used to access an underlying object without concern for
what its (other) name is. When the word "reference" is used without an
adjective, as in the following paragraph, it is usually talking about a
hard reference.
X<reference, hard> X<hard reference>
References are easy to use in Perl. There is just one overriding
principle: in general, Perl does no implicit referencing or dereferencing.
When a scalar is holding a reference, it always behaves as a simple scalar.
It doesn't magically start being an array or hash or subroutine; you have to
tell it explicitly to do so, by dereferencing it.
=head2 Making References
X<reference, creation> X<referencing>
References can be created in several ways.
=over 4
=item 1.
X<\> X<backslash>
By using the backslash operator on a variable, subroutine, or value.
(This works much like the & (address-of) operator in C.)
This typically creates I<another> reference to a variable, because
there's already a reference to the variable in the symbol table. But
the symbol table reference might go away, and you'll still have the
reference that the backslash returned. Here are some examples:
$scalarref = \$foo;
$arrayref = \@ARGV;
$hashref = \%ENV;
$coderef = \&handler;
$globref = \*foo;
It isn't possible to create a true reference to an IO handle (filehandle
or dirhandle) using the backslash operator. The most you can get is a
reference to a typeglob, which is actually a complete symbol table entry.
But see the explanation of the C<*foo{THING}> syntax below. However,
you can still use type globs and globrefs as though they were IO handles.
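As a quick check of what the backslash operator produces, the sketch below takes one reference of each kind and asks C<ref> for its type. The variable and subroutine names are arbitrary, and C<\*STDOUT> stands in for C<\*foo> so that the globref can also be used as a filehandle.

```perl
use strict;
use warnings;

my $foo   = 'bar';
my @list  = (1, 2, 3);
my %table = (answer => 42);
sub handler { return "handled: @_" }

my $scalarref = \$foo;
my $arrayref  = \@list;
my $hashref   = \%table;
my $coderef   = \&handler;
my $globref   = \*STDOUT;

# ref() reports the kind of thing each reference points to.
my @kinds = map { ref } ($scalarref, $arrayref, $hashref, $coderef, $globref);
print {$globref} "@kinds\n";            # the globref works as a filehandle

# Nothing is dereferenced implicitly; each access below is explicit.
my $scalar_value = $$scalarref;
my $call_result  = $coderef->('x');
```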
=item 2.
X<array, anonymous> X<[> X<[]> X<square bracket>
X<bracket, square> X<arrayref> X<array reference> X<reference, array>
A reference to an anonymous array can be created using square
brackets:
$arrayref = [1, 2, ['a', 'b', 'c']];
Here we've created a reference to an anonymous array of three elements
whose final element is itself a reference to another anonymous array of three
elements. (The multidimensional syntax described later can be used to
access this. For example, after the above, C<< $arrayref->[2][1] >> would have
the value "b".)
Taking a reference to an enumerated list is not the same
as using square brackets--instead it's the same as creating
a list of references!
@list = (\$a, \@b, \%c);
@list = \($a, @b, %c); # same thing!
As a special case, C<\(@foo)> returns a list of references to the contents
of C<@foo>, not a reference to C<@foo> itself. Likewise for C<%foo>,
except that the key references are to copies (since the keys are just
strings rather than full-fledged scalars).
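The difference between a reference to an array and a list of references to its elements can be seen in a short sketch. The variable names are arbitrary.

```perl
use strict;
use warnings;

# Nested anonymous arrays and multidimensional access.
my $arrayref = [1, 2, ['a', 'b', 'c']];
my $inner    = $arrayref->[2][1];

my @foo = (10, 20, 30);

# \(@foo) yields one scalar reference per element ...
my @element_refs = \(@foo);

# ... and each one aliases the original element.
${ $element_refs[0] } = 99;

# \@foo is a single reference to the array as a whole.
my $whole = \@foo;

printf "inner=%s refs=%d first=%d whole=%s\n",
    $inner, scalar(@element_refs), $foo[0], ref($whole);
```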
=item 3.
X<hash, anonymous> X<{> X<{}> X<curly bracket>
X<bracket, curly> X<brace> X<hashref> X<hash reference> X<reference, hash>
A reference to an anonymous hash can be created using curly
brackets:
$hashref = {
    'Adam'  => 'Eve',
    'Clyde' => 'Bonnie',
};
Anonymous hash and array composers like these can be intermixed freely to
produce as complicated a structure as you want. The multidimensional
syntax described below works for these too. The values above are
literals, but variables and expressions would work just as well, because
assignment operators in Perl (even within local() or my()) are executable
statements, not compile-time declarations.
Because curly brackets (braces) are used for several other things
including BLOCKs, you may occasionally have to disambiguate braces at the
beginning of a statement by putting a C<+> or a C<return> in front so
that Perl realizes the opening brace isn't starting a BLOCK. The economy and
mnemonic value of using curlies is deemed worth this occasional extra
hassle.
For example, if you wanted a function to make a new hash and return a
reference to it, you have these options:
sub hashem { { @_ } } # silently wrong
sub hashem { +{ @_ } } # ok
sub hashem { return { @_ } } # ok
On the other hand, if you want the other meaning, you can do this:
sub showem { { @_ } } # ambiguous (currently ok,
# but may change)
sub showem { {; @_ } } # ok
sub showem { { return @_ } } # ok
The leading C<+{> and C<{;> always serve to disambiguate
the expression to mean either the HASH reference, or the BLOCK.
=item 4.
X<subroutine, anonymous> X<subroutine, reference> X<reference, subroutine>
X<scope, lexical> X<closure> X<lexical> X<lexical scope>
A reference to an anonymous subroutine can be created by using
C<sub> without a subname:
$coderef = sub { print "Boink!\n" };
Note the semicolon. Except for the code
inside not being immediately executed, a C<sub {}> is not so much a
declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no
matter how many times you execute that particular line (unless you're in an
C<eval("...")>), $coderef will still have a reference to the I<same>
anonymous subroutine.)
Anonymous subroutines act as closures with respect to my() variables,
that is, variables lexically visible within the current scope. Closure
is a notion out of the Lisp world that says if you define an anonymous
function in a particular lexical context, it pretends to run in that
context even when it's called outside the context.
In human terms, it's a funny way of passing arguments to a subroutine when
you define it as well as when you call it. It's useful for setting up
little bits of code to run later, such as callbacks. You can even
do object-oriented stuff with it, though Perl already provides a different
mechanism to do that--see L<perlobj>.
You might also think of closure as a way to write a subroutine
template without using eval(). Here's a small example of how
closures work:
sub newprint {
my $x = shift;
return sub { my $y = shift; print "$x, $y!\n"; };
}
$h = newprint("Howdy");
$g = newprint("Greetings");
# Time passes...
&$h("world");
&$g("earthlings");
This prints
Howdy, world!
Greetings, earthlings!
Note particularly that $x continues to refer to the value passed
into newprint() I<despite> "my $x" having gone out of scope by the
time the anonymous subroutine runs. That's what a closure is all
about.
This applies only to lexical variables, by the way. Dynamic variables
continue to work as they have always worked. Closure is not something
that most Perl programmers need trouble themselves about to begin with.
=item 5.
X<constructor> X<new>
References are often returned by special subroutines called constructors. Perl
objects are just references to a special type of object that happens to know
which package it's associated with. Constructors are just special subroutines
that know how to create that association. They do so by starting with an
ordinary reference, and it remains an ordinary reference even while it's also
being an object. Constructors are often named C<new()>. You I<can> call them
indirectly:
$objref = new Doggie( Tail => 'short', Ears => 'long' );
But that can produce ambiguous syntax in certain cases, so it's often
better to use the direct method invocation approach:
$objref = Doggie->new(Tail => 'short', Ears => 'long');
use Term::Cap;
$terminal = Term::Cap->Tgetent( { OSPEED => 9600 });
use Tk;
$main = MainWindow->new();
$menubar = $main->Frame(-relief => "raised",
-borderwidth => 2);
=item 6.
X<autovivification>
References of the appropriate type can spring into existence if you
dereference them in a context that assumes they exist. Because we haven't
talked about dereferencing yet, we can't show you any examples yet.
=item 7.
X<*foo{THING}> X<*>
A reference can be created by using a special syntax, lovingly known as
the *foo{THING} syntax. *foo{THING} returns a reference to the THING
slot in *foo (which is the symbol table entry which holds everything
known as foo).
$scalarref = *foo{SCALAR};
$arrayref = *ARGV{ARRAY};
$hashref = *ENV{HASH};
$coderef = *handler{CODE};
$ioref = *STDIN{IO};
$globref = *foo{GLOB};
$formatref = *foo{FORMAT};
$globname = *foo{NAME}; # "foo"
$pkgname = *foo{PACKAGE}; # "main"
Most of these are self-explanatory, but C<*foo{IO}>
deserves special attention. It returns
the IO handle, used for file handles (L<perlfunc/open>), sockets
(L<perlfunc/socket> and L<perlfunc/socketpair>), and directory
handles (L<perlfunc/opendir>). For compatibility with previous
versions of Perl, C<*foo{FILEHANDLE}> is a synonym for C<*foo{IO}>, though it
is discouraged, to encourage a consistent use of one name: IO. On perls
between v5.8 and v5.22, it will issue a deprecation warning, but this
deprecation has since been rescinded.
C<*foo{THING}> returns undef if that particular THING hasn't been used yet,
except in the case of scalars. C<*foo{SCALAR}> returns a reference to an
anonymous scalar if $foo hasn't been used yet. This might change in a
future release.
C<*foo{NAME}> and C<*foo{PACKAGE}> are the exception, in that they return
strings, rather than references. These return the package and name of the
typeglob itself, rather than one that has been assigned to it. So, after
C<*foo=*Foo::bar>, C<*foo> will become "*Foo::bar" when used as a string,
but C<*foo{PACKAGE}> and C<*foo{NAME}> will continue to produce "main" and
"foo", respectively.
C<*foo{IO}> is an alternative to the C<*HANDLE> mechanism given in
L<perldata/"Typeglobs and Filehandles"> for passing filehandles
into or out of subroutines, or storing into larger data structures.
Its disadvantage is that it won't create a new filehandle for you.
Its advantage is that you have less risk of clobbering more than
you want to with a typeglob assignment. (It still conflates file
and directory handles, though.) However, if you assign the incoming
value to a scalar instead of a typeglob as we do in the examples
below, there's no risk of that happening.
splutter(*STDOUT); # pass the whole glob
splutter(*STDOUT{IO}); # pass both file and dir handles
sub splutter {
my $fh = shift;
print $fh "her um well a hmmm\n";
}
$rec = get_rec(*STDIN); # pass the whole glob
$rec = get_rec(*STDIN{IO}); # pass both file and dir handles
sub get_rec {
my $fh = shift;
return scalar <$fh>;
}
=back
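The behavior of the C<*foo{THING}> slots described above can be checked with a short script. This is only a sketch: the names C<@used>, C<*alias> and C<$Foo::bar> are made up for the illustration, and the results noted in the comments are the ones the text above leads us to expect.

```perl
use strict;
use warnings;

our @used = (1, 2, 3);     # *used now has an ARRAY slot, but %used was never touched

my $has_array = defined(*used{ARRAY}) ? 1 : 0;   # slot in use
my $has_hash  = defined(*used{HASH})  ? 1 : 0;   # slot never used, so undef

$Foo::bar = 1;             # hypothetical package variable to alias
*alias    = *Foo::bar;     # typeglob assignment

my $as_string = "" . *alias;        # the glob now stringifies as the assigned one
my $name      = *alias{NAME};       # but NAME still describes *alias itself
my $package   = *alias{PACKAGE};    # and so does PACKAGE

print "ARRAY slot: $has_array, HASH slot: $has_hash\n";
print "$as_string is still known as ${package}::$name\n";
```

Note that C<*alias{NAME}> and C<*alias{PACKAGE}> are plain strings, so they can be interpolated or compared directly.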
=head2 Using References
X<reference, use> X<dereferencing> X<dereference>
That's it for creating references. By now you're probably dying to
know how to use references to get back to your long-lost data. There
are several basic methods.
=over 4
=item 1.
Anywhere you'd put an identifier (or chain of identifiers) as part
of a variable or subroutine name, you can replace the identifier with
a simple scalar variable containing a reference of the correct type:
$bar = $$scalarref;
push(@$arrayref, $filename);
$$arrayref[0] = "January";
$$hashref{"KEY"} = "VALUE";
&$coderef(1,2,3);
print $globref "output\n";
It's important to understand that we are specifically I<not> dereferencing
C<$arrayref[0]> or C<$hashref{"KEY"}> there. The dereference of the
scalar variable happens I<before> it does any key lookups. Anything more
complicated than a simple scalar variable must use methods 2 or 3 below.
However, a "simple scalar" includes an identifier that itself uses method
1 recursively. Therefore, the following prints "howdy".
$refrefref = \\\"howdy";
print $$$$refrefref;
=item 2.
Anywhere you'd put an identifier (or chain of identifiers) as part of a
variable or subroutine name, you can replace the identifier with a
BLOCK returning a reference of the correct type. In other words, the
previous examples could be written like this:
$bar = ${$scalarref};
push(@{$arrayref}, $filename);
${$arrayref}[0] = "January";
${$hashref}{"KEY"} = "VALUE";
&{$coderef}(1,2,3);
$globref->print("output\n"); # iff IO::Handle is loaded
Admittedly, it's a little silly to use the curlies in this case, but
the BLOCK can contain any arbitrary expression, in particular,
subscripted expressions:
&{ $dispatch{$index} }(1,2,3); # call correct routine
Because of being able to omit the curlies for the simple case of C<$$x>,
people often make the mistake of viewing the dereferencing symbols as
proper operators, and wonder about their precedence. If they were,
though, you could use parentheses instead of braces. That's not the case.
Consider the difference below; case 0 is a short-hand version of case 1,
I<not> case 2:
$$hashref{"KEY"} = "VALUE"; # CASE 0
${$hashref}{"KEY"} = "VALUE"; # CASE 1
${$hashref{"KEY"}} = "VALUE"; # CASE 2
${$hashref->{"KEY"}} = "VALUE"; # CASE 3
Case 2 is also deceptive in that you're accessing a variable
called %hashref, not dereferencing through $hashref to the hash
it's presumably referencing. That would be case 3.
=item 3.
Subroutine calls and lookups of individual array elements arise often
enough that it gets cumbersome to use method 2. As a form of
syntactic sugar, the examples for method 2 may be written:
$arrayref->[0] = "January"; # Array element
$hashref->{"KEY"} = "VALUE"; # Hash element
$coderef->(1,2,3); # Subroutine call
The left side of the arrow can be any expression returning a reference,
including a previous dereference. Note that C<$array[$x]> is I<not> the
same thing as C<< $array->[$x] >> here:
$array[$x]->{"foo"}->[0] = "January";
This is one of the cases we mentioned earlier in which references could
spring into existence when in an lvalue context. Before this
statement, C<$array[$x]> may have been undefined. If so, it's
automatically defined with a hash reference so that we can look up
C<{"foo"}> in it. Likewise C<< $array[$x]->{"foo"} >> will automatically get
defined with an array reference so that we can look up C<[0]> in it.
This process is called I<autovivification>.
One more thing here. The arrow is optional I<between> bracketed
subscripts, so you can shrink the above down to
$array[$x]{"foo"}[0] = "January";
Which, in the degenerate case of using only ordinary arrays, gives you
multidimensional arrays just like C's:
$score[$x][$y][$z] += 42;
Well, okay, not entirely like C's arrays, actually. C doesn't know how
to grow its arrays on demand. Perl does.
=item 4.
If a reference happens to be a reference to an object, then there are
probably methods to access the things referred to, and you should probably
stick to those methods unless you're in the class package that defines the
object's methods. In other words, be nice, and don't violate the object's
encapsulation without a very good reason. Perl does not enforce
encapsulation. We are not totalitarians here. We do expect some basic
civility though.
=back
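To tie methods 1 through 3 together, here is a small sketch that autovivifies a nested structure and then reads the same element back with each notation. The variable names are chosen for the illustration only.

```perl
use strict;
use warnings;

my @array;
my $x = 2;

# Neither $array[2] nor $array[2]{foo} exists yet; both are autovivified.
$array[$x]{"foo"}[0] = "January";

my $outer_type = ref $array[$x];          # HASH
my $inner_type = ref $array[$x]{"foo"};   # ARRAY
my $length     = scalar @array;           # 3, because index 2 was assigned

my $aref = \@array;
my $method1 = $$aref[2]{"foo"}[0];        # method 1: scalar variable in place of the name
my $method2 = ${$aref}[2]{"foo"}[0];      # method 2: BLOCK returning a reference
my $method3 = $aref->[2]{"foo"}[0];       # method 3: arrow notation

print "$outer_type $inner_type $length\n";
print "$method1 $method2 $method3\n";
```

All three reads return the same element; the arrow form is usually the easiest to follow once the structure is more than one level deep.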
Using a string or number as a reference produces a symbolic reference,
as explained above. Using a reference as a number produces an
integer representing its storage location in memory. The only
useful thing to be done with this is to compare two references
numerically to see whether they refer to the same location.
X<reference, numeric context>
if ($ref1 == $ref2) { # cheap numeric compare of references
print "refs 1 and 2 refer to the same thing\n";
}
Using a reference as a string produces both its referent's type,
including any package blessing as described in L<perlobj>, as well
as the numeric address expressed in hex. The ref() operator returns
just the type of thing the reference is pointing to, without the
address. See L<perlfunc/ref> for details and examples of its use.
X<reference, string context>
The bless() operator may be used to associate the object a reference
points to with a package functioning as an object class. See L<perlobj>.
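The numeric, string and ref() views of a reference can be compared side by side. The following is a minimal sketch; the class name C<Doggie> is borrowed from the constructor example earlier, and the hex addresses will differ from run to run.

```perl
use strict;
use warnings;

my @data  = (1, 2, 3);
my $ref1  = \@data;
my $ref2  = \@data;
my $other = [ 1, 2, 3 ];          # same contents, different array

my $same_target      = ($ref1 == $ref2)  ? 1 : 0;   # both point at @data
my $different_target = ($ref1 == $other) ? 1 : 0;   # equal contents do not count

my $type          = ref $ref1;    # just the type, no address
my $string        = "$ref1";      # type plus address, e.g. ARRAY(0x...)
my $object        = bless {}, 'Doggie';
my $object_string = "$object";    # package name comes first, e.g. Doggie=HASH(0x...)

print "same: $same_target, different: $different_target\n";
print "$type / $string / $object_string\n";
```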
A typeglob may be dereferenced the same way a reference can, because
the dereference syntax always indicates the type of reference desired.
So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable.
Here's a trick for interpolating a subroutine call into a string:
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
The way it works is that when the C<@{...}> is seen in the double-quoted
string, it's evaluated as a block. The block creates a reference to an
anonymous array containing the results of the call to C<mysub(1,2,3)>. So
the whole block returns a reference to an array, which is then
dereferenced by C<@{...}> and stuck into the double-quoted string. This
chicanery is also useful for arbitrary expressions:
print "That yields @{[$n + 5]} widgets\n";
Similarly, an expression that returns a reference to a scalar can be
dereferenced via C<${...}>. Thus, the above expression may be written
as:
print "That yields ${\($n + 5)} widgets\n";
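Both interpolation tricks can be tried in a self-contained script. The body of C<mysub> below is invented purely so that the example has something to return.

```perl
use strict;
use warnings;

# Hypothetical helper: doubles each of its arguments.
sub mysub { return map { $_ * 2 } @_ }

my $n = 7;

# @{[ ... ]} builds an anonymous array and dereferences it in place.
my $list_text = "My sub returned @{[ mysub(1, 2, 3) ]} that time.";

# ${\( ... )} takes a reference to a scalar result and dereferences it.
my $scalar_text = "That yields ${\($n + 5)} widgets";

print "$list_text\n$scalar_text\n";
```

The array form evaluates its contents in list context, while the scalar form shown here is best kept to expressions that yield a single value.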
=head2 Circular References
X<circular reference> X<reference, circular>
It is possible to create a "circular reference" in Perl, which can lead
to memory leaks. A circular reference occurs when two data structures
each contain a reference to the other, like this:
my $foo = {};
my $bar = { foo => $foo };
$foo->{bar} = $bar;
You can also create a circular reference with a single variable:
my $foo;
$foo = \$foo;
In this case, the reference count for the variables will never reach 0,
and the references will never be garbage-collected. This can lead to
memory leaks.
Because objects in Perl are implemented as references, it's possible to
have circular references with objects as well. Imagine a TreeNode class
where each node references its parent and child nodes. Any node with a
parent will be part of a circular reference.
You can break circular references by creating a "weak reference". A
weak reference does not increment the reference count for a variable,
which means that the object can go out of scope and be destroyed. You
can weaken a reference with the C<weaken> function exported by the
L<Scalar::Util> module.
Here's how we can make the first example safer:
use Scalar::Util 'weaken';
my $foo = {};
my $bar = { foo => $foo };
$foo->{bar} = $bar;
weaken $foo->{bar};
The reference from C<$foo> to C<$bar> has been weakened. When the
C<$bar> variable goes out of scope, it will be garbage-collected. The
next time you look at the value of the C<< $foo->{bar} >> key, it will
be C<undef>.
This action at a distance can be confusing, so you should be careful
with your use of weaken. You should weaken the reference in the
variable that will go out of scope I<first>. That way, the longer-lived
variable will contain the expected reference until it goes out of
scope.
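The effect of weakening can be observed directly. This sketch repeats the example above, but puts C<$bar> in an inner block so that it goes out of scope first; C<isweak> is the companion function that L<Scalar::Util> exports for testing a reference.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my $foo = {};
my $weak_inside;

{
    my $bar = { foo => $foo };
    $foo->{bar} = $bar;
    weaken $foo->{bar};          # $foo no longer keeps $bar's hash alive
    $weak_inside = isweak($foo->{bar}) ? 1 : 0;
}

# $bar has gone out of scope, so its hash was freed and the weak
# reference stored in $foo was reset to undef.
my $after = defined $foo->{bar} ? "defined" : "undef";

print "weak inside block: $weak_inside; afterwards: $after\n";
```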
=head2 Symbolic references
X<reference, symbolic> X<reference, soft>
X<symbolic reference> X<soft reference>
We said that references spring into existence as necessary if they are
undefined, but we didn't say what happens if a value used as a
reference is already defined, but I<isn't> a hard reference. If you
use it as a reference, it'll be treated as a symbolic
reference. That is, the value of the scalar is taken to be the I<name>
of a variable, rather than a direct link to a (possibly) anonymous
value.
People frequently expect it to work like this. So it does.
$name = "foo";
$$name = 1; # Sets $foo
${$name} = 2; # Sets $foo
${$name x 2} = 3; # Sets $foofoo
$name->[0] = 4; # Sets $foo[0]
@$name = (); # Clears @foo
&$name(); # Calls &foo()
$pack = "THAT";
${"${pack}::$name"} = 5; # Sets $THAT::foo without eval
This is powerful, and slightly dangerous, in that it's possible
to intend (with the utmost sincerity) to use a hard reference, and
accidentally use a symbolic reference instead. To protect against
that, you can say
use strict 'refs';
and then only hard references will be allowed for the rest of the enclosing
block. An inner block may countermand that with
no strict 'refs';
Only package variables (globals, even if localized) are visible to
symbolic references. Lexical variables (declared with my()) aren't in
a symbol table, and thus are invisible to this mechanism. For example:
local $value = 10;
$ref = "value";
{
my $value = 20;
print $$ref;
}
This will still print 10, not 20. Remember that local() affects package
variables, which are all "global" to the package.
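Here is a short sketch of how C<use strict 'refs'> changes the picture. The same lookup is made twice: once with strict refs switched off in an inner block, and once under strict refs inside an C<eval> so that the resulting error can be captured rather than ending the program.

```perl
use strict;
use warnings;

our $value = 10;         # package variable, visible to symbolic references
my  $name  = "value";

my $symbolic = do {
    no strict 'refs';    # countermand strict refs for this block only
    my $value = 20;      # lexical, invisible to the symbolic lookup
    ${$name};            # finds $main::value
};

# Under strict refs the same expression dies at run time.
my $ok    = eval { my $v = ${$name}; 1 };
my $error = $ok ? "" : $@;

print "symbolic lookup found: $symbolic\n";
print "strict refs said: $error";
```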
=head2 Not-so-symbolic references
Brackets around a symbolic reference can simply
serve to isolate an identifier or variable name from the rest of an
expression, just as they always have within a string. For example,
$push = "pop on ";
print "${push}over";
has always meant to print "pop on over", even though push is
a reserved word. This is generalized to work the same
without the enclosing double quotes, so that
print ${push} . "over";
and even
print ${ push } . "over";
will have the same effect. This
construct is I<not> considered to be a symbolic reference when you're
using strict refs:
use strict 'refs';
${ bareword }; # Okay, means $bareword.
${ "bareword" }; # Error, symbolic reference.
Similarly, because of all the subscripting that is done using single words,
the same rule applies to any bareword that is used for subscripting a hash.
So now, instead of writing
$array{ "aaa" }{ "bbb" }{ "ccc" }
you can write just
$array{ aaa }{ bbb }{ ccc }
and not worry about whether the subscripts are reserved words. In the
rare event that you do wish to do something like
$array{ shift }
you can force interpretation as a reserved word by adding anything that
makes it more than a bareword:
$array{ shift() }
$array{ +shift }
$array{ shift @_ }
The C<use warnings> pragma or the B<-w> switch will warn you if it
interprets a reserved word as a string.
But it will no longer warn you about using lowercase words, because the
string is effectively quoted.
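The difference between a bareword subscript and a function call in a subscript is easy to see in a small test. The hash contents and sub names below are invented for the illustration.

```perl
use strict;
use warnings;

my %h = (
    shift => "the literal key 'shift'",
    first => "the key taken from the argument list",
);

sub literal { return $h{ shift } }      # bareword: means the string "shift"
sub lookup  { return $h{ shift() } }    # function call: shifts @_

my $from_literal = literal("first");
my $from_lookup  = lookup("first");

print "$from_literal\n$from_lookup\n";
```

Writing C<shift()> or C<shift @_> makes the intent obvious to a human reader as well as to the parser.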
=head2 Pseudo-hashes: Using an array as a hash
X<pseudo-hash> X<pseudo hash> X<pseudohash>
Pseudo-hashes have been removed from Perl. The 'fields' pragma
remains available.
=head2 Function Templates
X<scope, lexical> X<closure> X<lexical> X<lexical scope>
X<subroutine, nested> X<sub, nested> X<subroutine, local> X<sub, local>
As explained above, an anonymous function with access to the lexical
variables visible when that function was compiled, creates a closure. It
retains access to those variables even though it doesn't get run until
later, such as in a signal handler or a Tk callback.
Using a closure as a function template allows us to generate many functions
that act similarly. Suppose you wanted functions named after the colors
that generated HTML font changes for the various colors:
print "Be ", red("careful"), " with that ", green("light");
The red() and green() functions would be similar. To create these,
we'll assign a closure to a typeglob of the name of the function we're
trying to build.
@colors = qw(red blue green yellow orange purple violet);
for my $name (@colors) {
no strict 'refs'; # allow symbol table manipulation
*$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
}
Now all those different functions appear to exist independently. You can
call red(), RED(), blue(), BLUE(), green(), etc. This technique saves on
both compile time and memory use, and is less error-prone as well, since
syntax checks happen at compile time. It's critical that any variables in
the anonymous subroutine be lexicals in order to create a proper closure.
That's the reason for the C<my> on the loop iteration variable.
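The same template technique works for any family of similar functions. As a sketch under assumed names (C<double>, C<triple> and C<%multiplier_for> are not part of Perl, just placeholders), each generated function closes over its own lexical C<$factor>:

```perl
use strict;
use warnings;

my %multiplier_for = (double => 2, triple => 3);

for my $name (sort keys %multiplier_for) {
    my $factor = $multiplier_for{$name};   # fresh lexical on every iteration
    no strict 'refs';                      # allow symbol table manipulation
    *$name = sub { return $_[0] * $factor };
}

# The functions are installed at run time, so call them with parentheses.
my $six  = double(3);
my $nine = triple(3);

print "$six $nine\n";
```

Had C<$factor> been a package variable instead of a lexical, every generated function would have seen whatever value it held at call time.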
This is one of the only places where giving a prototype to a closure makes
much sense. If you wanted to impose scalar context on the arguments of
these functions (probably not a wise idea for this particular example),
you could have written it this way instead:
*$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
However, since prototype checking happens at compile time, the assignment
above happens too late to be of much use. You could address this by
putting the whole loop of assignments within a BEGIN block, forcing it
to occur during compilation.
Access to lexicals that change over time--like those in the C<for> loop
above, basically aliases to elements from the surrounding lexical scopes--
only works with anonymous subs, not with named subroutines. Generally
said, named subroutines do not nest properly and should only be declared
in the main package scope.
This is because named subroutines are created at compile time so their
lexical variables get assigned to the parent lexicals from the first
execution of the parent block. If a parent scope is entered a second
time, its lexicals are created again, while the nested subs still
reference the old ones.
Anonymous subroutines get to capture each time you execute the C<sub>
operator, as they are created on the fly. If you are accustomed to using
nested subroutines in other programming languages with their own private
variables, you'll have to work at it a bit in Perl. The intuitive coding
of this type of thing incurs mysterious warnings about "will not stay
shared" due to the reasons explained above.
For example, this won't work:
sub outer {
my $x = $_[0] + 35;
sub inner { return $x * 19 } # WRONG
return $x + inner();
}
A work-around is the following:
sub outer {
my $x = $_[0] + 35;
local *inner = sub { return $x * 19 };
return $x + inner();
}
Now inner() can only be called from within outer(), because the
anonymous subroutine is assigned only temporarily. But when it is called,
it has normal access to the lexical variable $x from the scope of
outer() at the time outer() is invoked.
This has the interesting effect of creating a function local to another
function, something not normally supported in Perl.
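The difference only shows up on the second call, so a side-by-side run is instructive. In this sketch the two variants are renamed (C<outer_named>, C<outer_local> and so on) so that they can live in one file, and the "will not stay shared" warning is silenced for the deliberately broken version.

```perl
use strict;
use warnings;

sub outer_named {
    my $x = $_[0] + 35;
    no warnings 'closure';                  # silence "will not stay shared"
    sub inner_named { return $x * 19 }      # bound to the $x of the first call
    return $x + inner_named();
}

sub outer_local {
    my $x = $_[0] + 35;
    local *inner_local = sub { return $x * 19 };   # captures the current $x
    return $x + inner_local();
}

my @named = (outer_named(1), outer_named(5));
my @local = (outer_local(1), outer_local(5));

print "named: @named\n";    # second result is computed from a stale $x
print "local: @local\n";    # both results use the $x of their own call
```

On the first call both versions agree; on the second, the named inner sub still multiplies the old value of C<$x>.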
=head1 WARNING: Don't use references as hash keys
X<reference, string context> X<reference, use as hash key>
You may not (usefully) use a reference as the key to a hash. It will be
converted into a string:
$x{ \$a } = $a;
If you try to dereference the key, it won't do a hard dereference, and
you won't accomplish what you're attempting. You might want to do something
more like
$r = \@a;
$x{ $r } = $r;
And then at least you can use the values(), which will be
real refs, instead of the keys(), which won't.
The standard Tie::RefHash module provides a convenient workaround to this.
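A brief sketch contrasting an ordinary hash with one tied to Tie::RefHash; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Tie::RefHash;

my @a = (1, 2, 3);
my $r = \@a;

# Ordinary hash: the key is the stringified reference.
my %plain = ($r => 'payload');
my ($plain_key)  = keys %plain;
my $plain_is_ref = ref($plain_key) ? 1 : 0;

# Tied hash: keys() hands back the original reference.
tie my %by_ref, 'Tie::RefHash';
$by_ref{$r} = 'payload';
my ($tied_key)  = keys %by_ref;
my $tied_is_ref = ref($tied_key) ? 1 : 0;

print "plain key is a reference: $plain_is_ref\n";
print "tied key is a reference:  $tied_is_ref\n";
```

With the tied hash, the key can be dereferenced again, for example as C<@$tied_key>.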
=head2 Postfix Dereference Syntax
Beginning in v5.20.0, a postfix syntax for using references is
available. It behaves as described in L</Using References>, but instead
of a prefixed sigil, a postfixed sigil-and-star is used.
For example:
$r = \@a;
@b = $r->@*; # equivalent to @$r or @{ $r }
$r = [ 1, [ 2, 3 ], 4 ];
$r->[1]->@*; # equivalent to @{ $r->[1] }
In Perl 5.20 and 5.22, this syntax must be enabled with C<use feature
'postderef'>. As of Perl 5.24, no feature declarations are required to make
it available.
Postfix dereference should work in all circumstances where block
(circumfix) dereference worked, and should be entirely equivalent. This
syntax allows dereferencing to be written and read entirely
left-to-right. The following equivalencies are defined:
$sref->$*; # same as ${ $sref }
$aref->@*; # same as @{ $aref }
$aref->$#*; # same as $#{ $aref }
$href->%*; # same as %{ $href }
$cref->&*; # same as &{ $cref }
$gref->**; # same as *{ $gref }
Note especially that C<< $cref->&* >> is I<not> equivalent to C<<
$cref->() >>, and can serve different purposes.
Glob elements can be extracted through the postfix dereferencing feature:
$gref->*{SCALAR}; # same as *{ $gref }{SCALAR}
Postfix array and scalar dereferencing I<can> be used in interpolating
strings (double quotes or the C<qq> operator), but only if the
C<postderef_qq> feature is enabled.
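The equivalences above can be exercised in a few lines. This sketch assumes Perl 5.24 or later; there C<use v5.24> also loads the feature bundle that contains C<postderef_qq>, which the interpolating string at the end relies on.

```perl
use v5.24;      # also enables strict and the postderef_qq feature
use warnings;

my $r    = [ 1, [ 2, 3 ], 4 ];
my $href = { b => 2, a => 1 };
my $sref = \"howdy";

my @inner      = $r->[1]->@*;           # same as @{ $r->[1] }
my $last_index = $r->$#*;               # same as $#{ $r }
my @keys       = sort keys $href->%*;   # same as keys %{ $href }
my $greeting   = $sref->$*;             # same as ${ $sref }
my $text       = "inner: $r->[1]->@*";  # interpolation needs postderef_qq

say "@inner | $last_index | @keys | $greeting";
say $text;
```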
=head2 Postfix Reference Slicing
Value slices of arrays and hashes may also be taken with postfix
dereferencing notation, with the following equivalencies:
$aref->@[ ... ]; # same as @$aref[ ... ]
$href->@{ ... }; # same as @$href{ ... }
Postfix key/value pair slicing, added in 5.20.0 and documented in
L<the KeyE<sol>Value Hash Slices section of perldata|perldata/"Key/Value Hash
Slices">, also behaves as expected:
$aref->%[ ... ]; # same as %$aref[ ... ]
$href->%{ ... }; # same as %$href{ ... }
As with postfix array dereferencing, postfix value slice dereferencing
I<can> be used in interpolating strings (double quotes or the C<qq>
operator), but only if the C<postderef_qq> L<feature> is enabled.
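A short sketch of the four slice forms, again assuming Perl 5.24 or later so that no feature declaration is needed; the data is made up for the example.

```perl
use v5.24;
use warnings;

my $aref = [ qw(a b c d) ];
my $href = { one => 1, two => 2, three => 3 };

my @values      = $aref->@[ 1, 2 ];             # same as @$aref[ 1, 2 ]
my @hash_values = $href->@{ qw(one three) };    # same as @$href{ qw(one three) }
my %index_pairs = $aref->%[ 0, 3 ];             # same as %$aref[ 0, 3 ]
my %key_pairs   = $href->%{ qw(one two) };      # same as %$href{ qw(one two) }

say "@values | @hash_values";
say join ", ", map { "$_=$index_pairs{$_}" } sort keys %index_pairs;
say join ", ", map { "$_=$key_pairs{$_}" }   sort keys %key_pairs;
```

The C<@> forms return values only; the C<%> forms return index/value or key/value pairs, which is why they are assigned to hashes here.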
=head2 Assigning to References
Beginning in v5.22.0, the referencing operator can be assigned to. It
performs an aliasing operation, so that the variable name referenced on the
left-hand side becomes an alias for the thing referenced on the right-hand
side:
\$a = \$b; # $a and $b now point to the same scalar
\&foo = \&bar; # foo() now means bar()
This syntax must be enabled with C<use feature 'refaliasing'>. It is
experimental, and will warn by default unless C<no warnings
'experimental::refaliasing'> is in effect.
These forms may be assigned to, and cause the right-hand side to be
evaluated in scalar context:
\$scalar
\@array
\%hash
\&sub
\my $scalar
\my @array
\my %hash
\state $scalar # or @array, etc.
\our $scalar # etc.
\local $scalar # etc.
\local our $scalar # etc.
\$some_array[$index]
\$some_hash{$key}
\local $some_array[$index]
\local $some_hash{$key}
condition ? \$this : \$that[0] # etc.
Slicing operations and parentheses cause
the right-hand side to be evaluated in
list context:
\@array[5..7]
(\@array[5..7])
\(@array[5..7])
\@hash{'foo','bar'}
(\@hash{'foo','bar'})
\(@hash{'foo','bar'})
(\$scalar)
\($scalar)
\(my $scalar)
\my($scalar)
(\@array)
(\%hash)
(\&sub)
\(&sub)
\($foo, @bar, %baz)
(\$foo, \@bar, \%baz)
Each element on the right-hand side must be a reference to a datum of the
right type. Parentheses immediately surrounding an array (and possibly
also C<my>/C<state>/C<our>/C<local>) will make each element of the array an
alias to the corresponding scalar referenced on the right-hand side:
\(@a) = \(@b); # @a and @b now have the same elements
\my(@a) = \(@b); # likewise
\(my @a) = \(@b); # likewise
push @a, 3; # but now @a has an extra element that @b lacks
\(@a) = (\$a, \$b, \$c); # @a now contains $a, $b, and $c
Combining that form with C<local> and putting parentheses immediately
around a hash are forbidden (because it is not clear what they should do):
\local(@array) = foo(); # WRONG
\(%hash) = bar(); # WRONG
Assignment to references and non-references may be combined in lists and
conditional ternary expressions, as long as the values on the right-hand
side are the right type for each element on the left, though this may make
for obfuscated code:
(my $tom, \my $dick, \my @harry) = (\1, \2, [1..3]);
# $tom is now \1
# $dick is now 2 (read-only)
# @harry is (1,2,3)
my $type = ref $thingy;
($type ? $type eq 'ARRAY' ? \@foo : \$bar : $baz) = $thingy;
The C<foreach> loop can also take a reference constructor for its loop
variable, though the syntax is limited to one of the following, with an
optional C<my>, C<state>, or C<our> after the backslash:
\$s
\@a
\%h
\&c
No parentheses are permitted. This feature is particularly useful for
arrays-of-arrays, or arrays-of-hashes:
foreach \my @a (@array_of_arrays) {
frobnicate($a[0], $a[-1]);
}
foreach \my %h (@array_of_hashes) {
$h{gelastic}++ if $h{type} eq 'funny';
}
B<CAVEAT:> Aliasing does not work correctly with closures. If you try to
alias lexical variables from an inner subroutine or C<eval>, the aliasing
will only be visible within that inner sub, and will not affect the outer
subroutine where the variables are declared. This bizarre behavior is
subject to change.
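Putting the pieces together, the following sketch enables the feature, silences its warning, and uses both a plain aliasing assignment and an aliasing C<foreach>. It assumes Perl 5.22 or later, and the variable names are illustrative.

```perl
use v5.22;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

my ($x, $y) = (1, 2);
\$x = \$y;              # $x is now another name for $y
$y = 42;                # so this change is visible through $x

my @rows = ([ 1, 2, 3 ], [ 4, 5, 6 ]);
my $sum  = 0;
foreach \my @row (@rows) {          # @row aliases each inner array in turn
    $sum += $row[0] + $row[-1];
}

say "x is $x, sum of first and last elements is $sum";
```

Because C<@row> is an alias rather than a copy, assigning to its elements inside the loop would modify the arrays held in C<@rows>.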
=head1 Declaring a Reference to a Variable
Beginning in v5.26.0, the referencing operator can come after C<my>,
C<state>, C<our>, or C<local>. This syntax must be enabled with C<use
feature 'declared_refs'>. It is experimental, and will warn by default
unless C<no warnings 'experimental::declared_refs'> is in effect.
This feature makes these:
my \$x;
our \$y;
equivalent to:
\my $x;
\our $y;
It is intended mainly for use in assignments to references (see
L</Assigning to References>, above). It also allows the backslash to be
used on just some items in a list of declared variables:
my ($foo, \@bar, \%baz); # equivalent to: my $foo, \my(@bar, %baz);
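As a minimal sketch, assuming Perl 5.26 or later, a declared reference can be combined with an aliasing assignment like this. Both experimental features are enabled and both warning categories are silenced, because the declaration and the assignment to a reference are separate features.

```perl
use v5.26;
use warnings;
use feature qw(refaliasing declared_refs);
no warnings qw(experimental::refaliasing experimental::declared_refs);

my $data = [ 10, 20, 30 ];

my \@alias = $data;     # same as: \my @alias = $data;
push @alias, 40;        # modifies the array that $data refers to

my $count = scalar @$data;
say "the shared array now has $count elements";
```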
=head1 SEE ALSO
Besides the obvious documents, source code can be instructive.
Some pathological examples of the use of references can be found
in the F<t/op/ref.t> regression test in the Perl source directory.
See also L<perldsc> and L<perllol> for how to use references to create
complex data structures, and L<perlootut> and L<perlobj>
for how to use them to create objects.
=encoding utf8
=head1 NAME
perlrepository - Links to current information on the Perl source repository
=head1 DESCRIPTION
Perl's source code is stored in a Git repository.
See L<perlhack> for an explanation of Perl development, including the
L<Super Quick Patch Guide|perlhack/SUPER QUICK PATCH GUIDE> for making and
submitting a small patch.
See L<perlgit> for detailed information about Perl's Git repository.
(The above documents supersede the information that was formerly here in
perlrepository.)
=head1 NAME
perlcommunity - a brief overview of the Perl community
=head1 DESCRIPTION
This document aims to provide an overview of the vast Perl community, which is
far too large and diverse to list in detail. If any specific niche
has been forgotten, it is not meant as an insult but an omission for the sake
of brevity.
The Perl community is as diverse as Perl, and there is a large amount of
evidence that the Perl users apply TMTOWTDI to all endeavors, not just
programming. From websites, to IRC, to mailing lists, there is more than one
way to get involved in the community.
=head2 Where to Find the Community
There is a central directory for the Perl community: L<http://perl.org>
maintained by the Perl Foundation (L<http://www.perlfoundation.org/>),
which tracks and provides services for a variety of other community sites.
=head2 Mailing Lists and Newsgroups
Perl runs on e-mail; there is no doubt about it. The Camel book was originally
written mostly over e-mail and today Perl's development is co-ordinated through
mailing lists. The largest repository of Perl mailing lists is located at
L<http://lists.perl.org>.
Most Perl-related projects set up mailing lists for both users and
contributors. If you don't see a certain project listed at
L<http://lists.perl.org>, check the particular website for that project.
Most mailing lists are archived at L<http://nntp.perl.org/>.
=head2 IRC
The Perl community has a rather large IRC presence. For starters, it has its
own IRC network, L<irc://irc.perl.org>. General (not help-oriented) chat can be
found at L<irc://irc.perl.org/#perl>. Many other more specific chats are also
hosted on the network. Information about irc.perl.org is located on the
network's website: L<http://www.irc.perl.org>. For a more help-oriented #perl,
check out L<irc://irc.freenode.net/#perl>. Perl 6 development also has a
presence in L<irc://irc.freenode.net/#perl6>. Most Perl-related channels will
be kind enough to point you in the right direction if you ask nicely.
Any large IRC network (Dalnet, EFnet) is also likely to have a #perl channel,
with varying activity levels.
=head2 Websites
Perl websites come in a variety of forms, but they fit into two large
categories: forums and news websites. There are many Perl-related
websites, so only a few of the community's largest are mentioned here.
=head3 News sites
=over 4
=item L<http://perl.com/>
Originally run by O'Reilly Media (the publisher of L<the Camel Book|perlbook>),
this site provides quality articles mostly about technical details of Perl.
=item L<http://blogs.perl.org/>
Many members of the community have a Perl-related blog on this site. If
you'd like to join them, you can sign up for free.
=item L<http://perlsphere.net/>
Perlsphere is one of several aggregators of Perl-related blog feeds.
=item L<http://perlweekly.com/>
Perl Weekly is a weekly mailing list that keeps you up to date on conferences,
releases and notable blog posts.
=item L<http://use.perl.org/>
use Perl; used to provide a slashdot-style news/blog website covering all
things Perl, from minutes of the meetings of the Perl 6 Design team to
conference announcements with (ir)relevant discussion. It no longer accepts
updates, but you can still use the site to read old entries and comments.
=back
=head3 Forums
=over 4
=item L<http://www.perlmonks.org/>
PerlMonks is one of the largest Perl forums, and describes itself as "A place
for individuals to polish, improve, and showcase their Perl skills." and "A
community which allows everyone to grow and learn from each other."
=item L<http://stackoverflow.com/>
Stack Overflow is a free question-and-answer site for programmers. It's not
focussed solely on Perl, but it does have an active group of users who do
their best to help people with their Perl programming questions.
=item L<http://prepan.org/>
PrePAN is used as a place to discuss modules that you're considering uploading
to the CPAN. You can get feedback on their design before you upload.
=back
=head2 User Groups
Many cities around the world have local Perl Mongers chapters. A Perl Mongers
chapter is a local user group which typically holds regular in-person meetings,
both social and technical; helps organize local conferences, workshops, and
hackathons; and provides a mailing list or other continual contact method for
its members to keep in touch.
To find your local Perl Mongers (or PM as they're commonly abbreviated) group
check the international Perl Mongers directory at L<http://www.pm.org/>.
=head2 Workshops
Perl workshops are, as the name might suggest, workshops where Perl is taught
in a variety of ways. At the workshops, subjects range from a beginner's
introduction (such as the Pittsburgh Perl Workshop's "Zero To Perl") to much
more advanced subjects.
There are several great resources for locating workshops: the
L<websites|"Websites"> mentioned above, the
L<calendar|"Calendar of Perl Events"> mentioned below, and the YAPC Europe
website, L<http://www.yapceurope.org/>, which is probably the best resource for
European Perl events.
=head2 Hackathons
Hackathons are a very different kind of gathering where Perl hackers gather to
do just that, hack nonstop for an extended (several day) period on a specific
project or projects. Information about hackathons can be located in the same
place as information about L<workshops|"Workshops"> as well as in
L<irc://irc.perl.org/#perl>.
If you have never been to a hackathon, here are a few basic things you need to
know before attending: have a working laptop and know how to use it; check out
the involved projects beforehand; have the necessary version control client;
and bring backup equipment (an extra LAN cable, additional power strips, etc.)
because someone will forget.
=head2 Conventions
Perl has two major annual conventions: The Perl Conference (now part of OSCON),
put on by O'Reilly, and Yet Another Perl Conference or YAPC (pronounced
yap-see), which is localized into several regional YAPCs (North America,
Europe, Asia) in a stunning grassroots display by the Perl community. For more
information about either conference, check out their respective web pages:
OSCON L<http://conferences.oreillynet.com/>; YAPC L<http://www.yapc.org>.
A relatively new conference franchise with a large Perl portion is the
Open Source Developers Conference or OSDC. First held in Australia it has
recently also spread to Israel and France. More information can be found at:
L<http://www.osdc.com.au/> for Australia, L<http://www.osdc.org.il>
for Israel, and L<http://www.osdc.fr/> for France.
=head2 Calendar of Perl Events
The Perl Review, L<http://www.theperlreview.com>, maintains a website
and Google calendar
(L<http://www.theperlreview.com/community_calendar>) for tracking
workshops, hackathons, Perl Mongers meetings, and other events. Views
of this calendar are at L<http://www.perl.org/events.html> and
L<http://www.yapc.org>.
Not every event or Perl Mongers group is on that calendar, so don't lose
heart if you don't see yours posted. To have your event or group listed,
contact brian d foy (brian@theperlreview.com).
=head1 AUTHOR
Edgar "Trizor" Bering <trizor@gmail.com>
=cut
If you read this file _as_is_, just ignore the funny characters you see.
It is written in the POD format (see pod/perlpod.pod) which is specially
designed to be readable as is.
=head1 NAME
perlos400 - Perl version 5 on OS/400
B<This document needs to be updated, but we don't know what it should say.
Please email comments to L<perlbug@perl.org|mailto:perlbug@perl.org>.>
=head1 DESCRIPTION
This document describes various features of IBM's OS/400 operating
system that will affect how Perl version 5 (hereafter just Perl) is
compiled and/or runs.
By far the easiest way to build Perl for OS/400 is to use the PASE
(Portable Application Solutions Environment). For more information see
L<http://www.iseries.ibm.com/developer/factory/pase/index.html>.
This environment allows one to use AIX APIs while programming, and it
provides a runtime that allows AIX binaries to execute directly on the
PowerPC iSeries.
=head2 Compiling Perl for OS/400 PASE
The recommended way to build Perl for the OS/400 PASE is to build the
Perl 5 source code (release 5.8.1 or later) under AIX.
The trick is to give a special parameter to the Configure shell script
when running it on AIX:
sh Configure -DPASE ...
The default installation directory of Perl under PASE is /QOpenSys/perl.
This can be modified if needed with Configure parameter -Dprefix=/some/dir.
Starting from OS/400 V5R2 the IBM Visual Age compiler is supported
on OS/400 PASE, so it is possible to build Perl natively on OS/400.
The easier way, however, is to compile in AIX, as just described.
If you don't want to install the compiled Perl in AIX into /QOpenSys
(for packaging it before copying it to PASE), you can use a Configure
parameter: -Dinstallprefix=/tmp/QOpenSys/perl. This will cause the
"make install" to install everything into that directory, while the
installed files still think they are (will be) in /QOpenSys/perl.
If building natively on PASE, please do the build under the /QOpenSys
directory, since Perl is happier when built on a case sensitive filesystem.
=head2 Installing Perl in OS/400 PASE
If you are compiling on AIX, simply do a "make install" on the AIX box.
Once the install finishes, tar up the /QOpenSys/perl directory. Transfer
the tarball to the OS/400 using FTP with the following commands:
> binary
> site namefmt 1
> put perl.tar /QOpenSys
Once you have it on, simply bring up a PASE shell and extract the tarball.
If you are compiling in PASE, then "make install" is the only thing you
will need to do.
The default path for the perl binary is /QOpenSys/perl/bin/perl. You'll
want to symlink /QOpenSys/usr/bin/perl to this file so you don't have
to modify your path.
=head2 Using Perl in OS/400 PASE
Perl in PASE may be used in the same manner as you would use Perl on AIX.
Scripts starting with #!/usr/bin/perl should work if you have
/QOpenSys/usr/bin/perl symlinked to your perl binary. This will not
work if you've done a setuid/setgid or have environment variable
PASE_EXEC_QOPENSYS="N". If you have V5R1, you'll need to get the
latest PTFs to have this feature. Scripts starting with
#!/QOpenSys/perl/bin/perl should always work.
=head2 Known Problems
When compiling in PASE, there is no "oslevel" command. Therefore,
you may want to create a script called "oslevel" that echoes the
level of AIX that your version of PASE runtime supports. If you're
unsure, consult your documentation or use "4.3.3.0".
If you have test cases that fail, check for the existence of spool files.
The test case may be trying to use a syscall that is not implemented
in PASE. To avoid the SIGILL, try setting the PASE_SYSCALL_NOSIGILL
environment variable or have a handler for the SIGILL. If you can
compile programs for PASE, run the config script and edit config.sh
when it gives you the option. If you want to remove fchdir(), which
isn't implemented in V5R1, simply change the line that says:
d_fchdir='define'
to
d_fchdir='undef'
and then compile Perl. The places where fchdir() is used have
alternatives for systems that do not have fchdir() available.
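The config.sh edit described above can also be scripted. The following is a minimal sketch, assuming the setting appears on a line of its own exactly as shown; the helper name C<disable_fchdir> and the sample text are hypothetical and not part of Perl's build system.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl's build tools): given the text of
# a config.sh file, switch d_fchdir from 'define' to 'undef'.
sub disable_fchdir {
    my ($config_text) = @_;
    # /m lets ^ and $ match at line boundaries, so only the d_fchdir
    # line is rewritten and every other setting is left alone.
    $config_text =~ s/^d_fchdir='define'$/d_fchdir='undef'/m;
    return $config_text;
}

# Sample input standing in for a real config.sh (assumed contents).
my $sample = "d_fchmod='define'\nd_fchdir='define'\nd_fchown='define'\n";
my $edited = disable_fchdir($sample);
print $edited;
```

For a real build you would read config.sh into a string, pass it through the helper, and write the result back before compiling.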
=head2 Perl on ILE
There exists a port of Perl to the ILE environment. This port, however,
is based on quite an old release of Perl, Perl 5.00502 (August 1998).
(As of July 2002 the latest release of Perl is 5.8.0, and even 5.6.1
has been out since April 2001.) If you need to run Perl on ILE, though,
you may need this older port: L<http://www.cpan.org/ports/#os400>
Note that any Perl release later than 5.00502 has not been ported to ILE.
If you need to use Perl in the ILE environment, you may want to consider
using Qp2RunPase() to call the PASE version of Perl.
=head1 AUTHORS
Jarkko Hietaniemi <jhi@iki.fi>
Bryan Logan <bryanlog@us.ibm.com>
David Larson <larson1@us.ibm.com>
=cut
=encoding utf8
=head1 NAME
perl5182delta - what is new for perl v5.18.2
=head1 DESCRIPTION
This document describes differences between the 5.18.1 release and the 5.18.2
release.
If you are upgrading from an earlier release such as 5.18.0, first read
L<perl5181delta>, which describes differences between 5.18.0 and 5.18.1.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<B> has been upgraded from version 1.42_01 to 1.42_02.
The fix for [perl #118525] introduced a regression in the behaviour of
C<B::CV::GV>, changing the return value from a C<B::SPECIAL> object on
a C<NULL> C<CvGV> to C<undef>. C<B::CV::GV> again returns a
C<B::SPECIAL> object in this case. [perl #119413]
=item *
L<B::Concise> has been upgraded from version 0.95 to 0.95_01.
This fixes a bug in dumping unexpected SPECIALs.
=item *
L<English> has been upgraded from version 1.06 to 1.06_01. This fixes an
error about the performance of C<$`>, C<$&>, and C<$'>.
=item *
L<File::Glob> has been upgraded from version 1.20 to 1.20_01.
=back
=head1 Documentation
=head2 Changes to Existing Documentation
=over 4
=item *
L<perlrepository> has been restored with a pointer to more useful pages.
=item *
L<perlhack> has been updated with the latest changes from blead.
=back
=head1 Selected Bug Fixes
=over 4
=item *
Perl 5.18.1 introduced a regression along with a bugfix for lexical subs.
Some B::SPECIAL results from B::CV::GV became undefs instead. This broke
Devel::Cover among other libraries. This has been fixed. [perl #119351]
=item *
Perl 5.18.0 introduced a regression whereby C<[:^ascii:]>, if used in the same
character class as other qualifiers, would fail to match characters in the
Latin-1 block. This has been fixed. [perl #120799]
=item *
Perl 5.18.0 introduced a regression when using ->SUPER::method with AUTOLOAD
by looking up AUTOLOAD from the current package, rather than the current
package’s superclass. This has been fixed. [perl #120694]
=item *
Perl 5.18.0 introduced a regression whereby C<-bareword> was no longer
permitted under the C<strict> and C<integer> pragmata when used together. This
has been fixed. [perl #120288]
=item *
Previously PerlIOBase_dup didn't check if pushing the new layer succeeded
before (optionally) setting the utf8 flag. This could cause
segfaults-by-nullpointer. This has been fixed.
=item *
A buffer overflow with very long identifiers has been fixed.
=item *
A regression from 5.16 in the handling of padranges led to assertion failures
if a keyword plugin declined to handle the second ‘my’, but only after creating
a padop.
This affected, at least, Devel::CallParser under threaded builds.
This has been fixed.
=item *
The construct C<< $r=qr/.../; /$r/p >> is now handled properly, an issue which
had been worsened by changes in 5.18.0. [perl #118213]
=back
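Several of the fixes listed above can be checked from Perl code. The script below is a sketch of such checks, not the test suite used by the Perl porters; the package names C<Demo::Base> and C<Demo::Derived> and all variable names are made up for illustration.

```perl
use strict;
use warnings;

# [perl #120799]: [:^ascii:] sharing a bracketed class with another element
# should still match a character from the Latin-1 block.
my $e_acute  = "\xE9";    # LATIN SMALL LETTER E WITH ACUTE
my $ascii_ok = ($e_acute =~ /[[:^ascii:]x]/) ? 1 : 0;

# [perl #120694]: ->SUPER::method should find AUTOLOAD in the superclass,
# not in the package making the call.
package Demo::Base;
sub new      { return bless {}, shift }
sub AUTOLOAD { return 'Demo::Base::AUTOLOAD' }
sub DESTROY  { }    # keep object destruction away from AUTOLOAD

package Demo::Derived;
our @ISA = ('Demo::Base');
sub AUTOLOAD   { return 'Demo::Derived::AUTOLOAD' }
sub DESTROY    { }
sub call_super { my $self = shift; return $self->SUPER::no_such_method() }

package main;
my $autoload_from = Demo::Derived->new->call_super;

# [perl #120288]: -bareword under strict and integer together.
my $negated_word;
{
    use integer;
    $negated_word = -foo;
}

# [perl #118213]: a qr// object interpolated into a match that uses /p.
my $r = qr/b+/;
my ($pre, $match, $post) = ('', '', '');
if ('aabbbcc' =~ /$r/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "$ascii_ok $autoload_from $negated_word $pre|$match|$post\n";
```

On a perl that contains these fixes, each check should report the fixed behaviour; on 5.18.0 or 5.18.1 some of them may fail or refuse to compile.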
=head1 Acknowledgements
Perl 5.18.2 represents approximately 3 months of development since Perl
5.18.1 and contains approximately 980 lines of changes across 39 files from 4
authors.
Perl continues to flourish into its third decade thanks to a vibrant
community of users and developers. The following people are known to have
contributed the improvements that became Perl 5.18.2:
Craig A. Berry, David Mitchell, Ricardo Signes, Tony Cook.
The list above is almost certainly incomplete as it is automatically
generated from version control history. In particular, it does not include
the names of the (very much appreciated) contributors who reported issues to
the Perl bug tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
http://rt.perl.org/perlbug/ . There may also be information at
http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send it
to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be
able to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
=head1 NAME
perl5263delta - what is new for perl v5.26.3
=head1 DESCRIPTION
This document describes differences between the 5.26.2 release and the 5.26.3
release.
If you are upgrading from an earlier release such as 5.26.1, first read
L<perl5262delta>, which describes differences between 5.26.1 and 5.26.2.
=head1 Security
=head2 [CVE-2018-12015] Directory traversal in module Archive::Tar
By default, L<Archive::Tar> doesn't allow extracting files outside the current
working directory. However, this secure extraction mode could be bypassed by
putting a symlink and a regular file with the same name into the tar file.
L<[perl #133250]|https://rt.perl.org/Ticket/Display.html?id=133250>
L<[cpan #125523]|https://rt.cpan.org/Ticket/Display.html?id=125523>
=head2 [CVE-2018-18311] Integer overflow leading to buffer overflow and segmentation fault
Integer arithmetic in C<Perl_my_setenv()> could wrap when the combined length
of the environment variable name and value exceeded around 0x7fffffff. This
could lead to writing beyond the end of an allocated buffer with attacker
supplied data.
L<[perl #133204]|https://rt.perl.org/Ticket/Display.html?id=133204>
=head2 [CVE-2018-18312] Heap-buffer-overflow write in S_regatom (regcomp.c)
A crafted regular expression could cause heap-buffer-overflow write during
compilation, potentially allowing arbitrary code execution.
L<[perl #133423]|https://rt.perl.org/Ticket/Display.html?id=133423>
=head2 [CVE-2018-18313] Heap-buffer-overflow read in S_grok_bslash_N (regcomp.c)
A crafted regular expression could cause heap-buffer-overflow read during
compilation, potentially leading to sensitive information being leaked.
L<[perl #133192]|https://rt.perl.org/Ticket/Display.html?id=133192>
=head2 [CVE-2018-18314] Heap-buffer-overflow write in S_regatom (regcomp.c)
A crafted regular expression could cause heap-buffer-overflow write during
compilation, potentially allowing arbitrary code execution.
L<[perl #131649]|https://rt.perl.org/Ticket/Display.html?id=131649>
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.26.2. If any exist,
they are bugs, and we request that you submit a report. See
L</Reporting Bugs> below.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<Archive::Tar> has been upgraded from version 2.24 to 2.24_01.
=item *
L<Module::CoreList> has been upgraded from version 5.20180414_26 to 5.20181129_26.
=back
=head1 Diagnostics
The following additions or changes have been made to diagnostic output,
including warnings and fatal error messages. For the complete list of
diagnostic messages, see L<perldiag>.
=head2 New Diagnostics
=head3 New Errors
=over 4
=item *
L<Unexpected ']' with no following ')' in (?[... in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>|perldiag/"Unexpected ']' with no following ')' in (?[... in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>">
(F) While parsing an extended character class a ']' character was encountered
at a point in the definition where the only legal use of ']' is to close the
character class definition as part of a '])'. You may have forgotten the close
paren, or otherwise confused the parser.
=item *
L<Expecting close paren for nested extended charclass in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>|perldiag/"Expecting close paren for nested extended charclass in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>">
(F) While parsing a nested extended character class like:
(?[ ... (?flags:(?[ ... ])) ... ])
^
we expected to see a close paren ')' (marked by ^) but did not.
=item *
L<Expecting close paren for wrapper for nested extended charclass in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>|perldiag/"Expecting close paren for wrapper for nested extended charclass in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>">
(F) While parsing a nested extended character class like:
(?[ ... (?flags:(?[ ... ])) ... ])
^
we expected to see a close paren ')' (marked by ^) but did not.
=back
=head2 Changes to Existing Diagnostics
=over 4
=item *
L<Syntax error in (?[...]) in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>|perldiag/"Syntax error in (?[...]) in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>">
This fatal error message has been slightly expanded (from "Syntax error in
(?[...]) in regex mE<sol>%sE<sol>") for greater clarity.
=back
=head1 Acknowledgements
Perl 5.26.3 represents approximately 8 months of development since Perl 5.26.2
and contains approximately 4,500 lines of changes across 51 files from 15
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 770 lines of changes to 10 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.26.3:
Aaron Crane, Abigail, Chris 'BinGOs' Williams, Dagfinn Ilmari Mannsåker, David
Mitchell, H.Merijn Brand, James E Keenan, John SJ Anderson, Karen Etheridge,
Karl Williamson, Sawyer X, Steve Hay, Todd Rinaldo, Tony Cook, Yves Orton.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the perl bug database
at L<https://rt.perl.org/> . There may also be information at
L<http://www.perl.org/> , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications which make it
inappropriate to send to a publicly archived mailing list, then see
L<perlsec/SECURITY VULNERABILITY CONTACT INFORMATION>
for details of how to report the issue.
=head1 Give Thanks
If you wish to thank the Perl 5 Porters for the work we have done in Perl 5,
you can do so by running the C<perlthanks> program:
perlthanks
This will send an email to the Perl 5 Porters list with your show of thanks.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perl58delta - what is new for perl v5.8.0
=head1 DESCRIPTION
This document describes differences between the 5.6.0 release and
the 5.8.0 release.
Many of the bug fixes in 5.8.0 were already seen in the 5.6.1
maintenance release since the two releases were kept closely
coordinated (while 5.8.0 was still called 5.7.something).
Changes that were integrated into the 5.6.1 release are marked C<[561]>.
Many of these changes have been further developed since 5.6.1 was released,
those are marked C<[561+]>.
You can see the list of changes in the 5.6.1 release (both from the
5.005_03 release and the 5.6.0 release) by reading L<perl561delta>.
=head1 Highlights In 5.8.0
=over 4
=item *
Better Unicode support
=item *
New IO Implementation
=item *
New Thread Implementation
=item *
Better Numeric Accuracy
=item *
Safe Signals
=item *
Many New Modules
=item *
More Extensive Regression Testing
=back
=head1 Incompatible Changes
=head2 Binary Incompatibility
B<Perl 5.8 is not binary compatible with earlier releases of Perl.>
B<You have to recompile your XS modules.>
(Pure Perl modules should continue to work.)
The major reason for the discontinuity is the new IO architecture
called PerlIO. PerlIO is the default configuration because without
it many new features of Perl 5.8 cannot be used. In other words:
you just have to recompile your modules containing XS code, sorry
about that.
In future releases of Perl, non-PerlIO aware XS modules may become
completely unsupported. This shouldn't be too difficult for module
authors, however: PerlIO has been designed as a drop-in replacement
(at the source code level) for the stdio interface.
Depending on your platform, there are also other reasons why
we decided to break binary compatibility, please read on.
=head2 64-bit platforms and malloc
If your pointers are 64 bits wide, the Perl malloc is no longer being
used because it does not work well with 8-byte pointers. Also,
usually the system mallocs on such platforms are much better optimized
for such large memory models than the Perl malloc. Some memory-hungry
Perl applications like the PDL don't work well with Perl's malloc.
Finally, other applications than Perl (such as mod_perl) tend to prefer
the system malloc. Such platforms include Alpha and 64-bit HPPA,
MIPS, PPC, and Sparc.
=head2 AIX Dynaloading
On AIX releases 4.3 and newer, dynaloading now uses the native
dlopen interface of AIX instead of the old emulated interface. This
change will probably break backward compatibility with compiled
modules. The change was made to make Perl more compliant with other
applications like mod_perl which are using the AIX native interface.
=head2 Attributes for C<my> variables now handled at run-time
The C<my EXPR : ATTRS> syntax now applies variable attributes at
run-time. (Subroutine and C<our> variables still get attributes applied
at compile-time.) See L<attributes> for additional details. In particular,
however, this allows variable attributes to be useful for C<tie> interfaces,
which was a deficiency of earlier releases. Note that the new semantics
doesn't work with the Attribute::Handlers module (as of version 0.76).
=head2 Socket Extension Dynamic in VMS
The Socket extension is now dynamically loaded instead of being
statically built in. This may or may not be a problem with ancient
TCP/IP stacks of VMS: we do not know since we weren't able to test
Perl in such configurations.
=head2 IEEE-format Floating Point Default on OpenVMS Alpha
Perl now uses IEEE format (T_FLOAT) as the default internal floating
point format on OpenVMS Alpha, potentially breaking binary compatibility
with external libraries or existing data. G_FLOAT is still available as
a configuration option. The default on VAX (D_FLOAT) has not changed.
=head2 New Unicode Semantics (no more C<use utf8>, almost)
Previously in Perl 5.6 to use Unicode one would say "use utf8" and
then the operations (like string concatenation) were Unicode-aware
in that lexical scope.
This was found to be an inconvenient interface, and in Perl 5.8 the
Unicode model has completely changed: now the "Unicodeness" is bound
to the data itself, and for most of the time "use utf8" is not needed
at all. The only remaining use of "use utf8" is when the Perl script
itself has been written in the UTF-8 encoding of Unicode. (UTF-8 has
not been made the default since there are many Perl scripts out there
that are using various national eight-bit character sets, which would
be illegal in UTF-8.)
See L<perluniintro> for the explanation of the current model,
and L<utf8> for the current use of the utf8 pragma.
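As a small illustration of "Unicodeness" being bound to the data, the sketch below builds strings with escapes only, so the source stays plain ASCII and no "use utf8" is needed. The variable names are illustrative.

```perl
use strict;
use warnings;

# No "use utf8" here: the source is plain ASCII and the characters are
# written with escapes, so the pragma is not needed.
my $smiley = "\x{263A}";            # WHITE SMILING FACE, above 0xFF
my $latin1 = "caf\xE9";             # ends in a character from 0x80..0xFF
my $joined = "$latin1 $smiley";     # mixing the two needs no special handling

# length() counts characters, not bytes, in every case.
my @lengths = (length $smiley, length $latin1, length $joined);
printf "%d %d %d\n", @lengths;
```

Only the lengths are printed here, because printing the strings themselves to a handle with no encoding layer would raise a "Wide character" warning.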
=head2 New Unicode Properties
Unicode I<scripts> are now supported. Scripts are similar to (and superior
to) Unicode I<blocks>. The difference between scripts and blocks is that
scripts are the glyphs used by a language or a group of languages, while
the blocks are more artificial groupings of (mostly) 256 characters based
on the Unicode numbering.
In general, scripts are more inclusive, but not universally so. For
example, while the script C<Latin> includes all the Latin characters and
their various diacritic-adorned versions, it does not include the various
punctuation or digits (since they are not solely C<Latin>).
A number of other properties are now supported, including C<\p{L&}>,
C<\p{Any}>, C<\p{Assigned}>, C<\p{Unassigned}>, C<\p{Blank}> [561] and
C<\p{SpacePerl}> [561] (along with their C<\P{...}> versions, of course).
See L<perlunicode> for details, and more additions.
The C<In> and C<Is> prefixes to names used with C<\p{...}> and C<\P{...}>
are now almost always optional. The only exception is that an C<In> prefix
is required to signify a Unicode block when a block name conflicts with a
script name. For example, C<\p{Tibetan}> refers to the script, while
C<\p{InTibetan}> refers to the block. When there is no name conflict, you
can omit the C<In> from the block name (e.g. C<\p{BraillePatterns}>), but
to be safe, it's probably best to always use the C<In>.
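The script/block distinction can be seen with two code points from the Tibetan range. This is a sketch that assumes U+0F48 is still an unassigned code point inside the Tibetan block, as it is in current Unicode charts; the hash keys are illustrative names.

```perl
use strict;
use warnings;

my $ka  = "\x{0F40}";    # TIBETAN LETTER KA: in the block and in the script
my $gap = "\x{0F48}";    # assumed unassigned: in the block, not in the script

my %matches = (
    ka_script  => ($ka  =~ /\p{Tibetan}/   ? 1 : 0),
    ka_block   => ($ka  =~ /\p{InTibetan}/ ? 1 : 0),
    gap_script => ($gap =~ /\p{Tibetan}/   ? 1 : 0),
    gap_block  => ($gap =~ /\p{InTibetan}/ ? 1 : 0),
);

printf "%-10s %d\n", $_, $matches{$_} for sort keys %matches;
```

The block test accepts every code point in the numeric range, while the script test accepts only characters that actually belong to the Tibetan script.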
=head2 REF(...) Instead Of SCALAR(...)
A reference to a reference now stringifies as "REF(0x81485ec)" instead
of "SCALAR(0x81485ec)" in order to be more consistent with the return
value of ref().
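A short sketch of the new stringification follows; the hexadecimal address will differ from run to run, so only the prefix is meaningful. Variable names are illustrative.

```perl
use strict;
use warnings;

my $value         = 42;
my $ref_to_scalar = \$value;            # reference to a plain scalar
my $ref_to_ref    = \$ref_to_scalar;    # reference to a reference

# ref() and stringification now agree on the kind of referent.
my $kind   = ref $ref_to_ref;
my $string = "$ref_to_ref";

print "$kind $string\n";
print ref($ref_to_scalar), " $ref_to_scalar\n";
```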
=head2 pack/unpack D/F recycled
The undocumented pack/unpack template letters D/F have been recycled
for better use: now they stand for long double (if supported by the
platform) and NV (Perl internal floating point type). (They used
to be aliases for d/f, but you never knew that.)
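The C<F> template can be tried with a simple round trip. This sketch covers only C<F>, since C<D> depends on long double support in the platform; the variable names are illustrative.

```perl
use strict;
use warnings;
use Config;

my $number = 0.1 + 0.2;          # an NV that has no short decimal form

# "F" packs the value in Perl's own floating point format (NV), so the
# round trip preserves every bit of the original.
my $packed      = pack('F', $number);
my ($roundtrip) = unpack('F', $packed);

printf "packed %d bytes, nvsize is %d\n", length($packed), $Config{nvsize};
```

Because the packed form is the platform's native NV layout, it is not portable between builds with different NV sizes.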
=head2 glob() now returns filenames in alphabetical order
The list of filenames from glob() (or <...>) is now by default sorted
alphabetically to be csh-compliant (which is what happened before
in most Unix platforms). (bsd_glob() does still sort platform
natively, ASCII or EBCDIC, unless GLOB_ALPHASORT is specified.) [561]
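The default ordering can be observed with a scratch directory. The sketch below uses File::Temp and made-up file names, and changes directory so that the glob pattern contains no path that could include spaces.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Create a few empty files, deliberately out of alphabetical order.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(pear apple mango)) {
    open(my $fh, '>', "$dir/$name.txt") or die "Cannot create $name.txt: $!";
    close($fh) or die "Cannot close $name.txt: $!";
}

# glob() returns the names sorted alphabetically by default.
my $start_dir = getcwd();
chdir($dir) or die "Cannot chdir to $dir: $!";
my @found = glob('*.txt');
chdir($start_dir) or die "Cannot chdir back to $start_dir: $!";

print "$_\n" for @found;
```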
=head2 Deprecations
=over 4
=item *
The semantics of bless(REF, REF) were unclear and until someone proves
it to make some sense, it is forbidden.
=item *
The obsolete chat2 library that should never have been allowed
to escape the laboratory has been decommissioned.
=item *
Using chdir("") or chdir(undef) instead of explicit chdir() is
doubtful. A failure (think chdir(some_function())) can lead to an
unintended chdir() to the home directory, therefore this behaviour
is deprecated.
=item *
The builtin dump() function has probably outlived most of its
usefulness. The core-dumping functionality will remain available in
future as an explicit call to C<CORE::dump()>, but in future
releases the behaviour of an unqualified C<dump()> call may change.
=item *
The very dusty examples in the eg/ directory have been removed.
Suggestions for new shiny examples welcome but the main issue is that
the examples need to be documented, tested and (most importantly)
maintained.
=item *
The (bogus) escape sequences \8 and \9 now give an optional warning
("Unrecognized escape passed through"). There is no need to \-escape
any C<\w> character.
=item *
The *glob{FILEHANDLE} is deprecated, use *glob{IO} instead.
=item *
The C<package;> syntax (C<package> without an argument) has been
deprecated. Its semantics were never that clear and its
implementation even less so. If you have used that feature to
disallow all but fully qualified variables, use C<use strict;> instead.
=item *
The unimplemented POSIX regex features [[.cc.]] and [[=c=]] are still
recognised but now cause fatal errors. The previous behaviour of
ignoring them by default and warning if requested was unacceptable
since it, in a way, falsely promised that the features could be used.
=item *
In future releases, non-PerlIO aware XS modules may become completely
unsupported. Since PerlIO is a drop-in replacement for stdio at the
source code level, this shouldn't be that drastic a change.
=item *
Previous versions of perl and some readings of some sections of Camel
III implied that the C<:raw> "discipline" was the inverse of C<:crlf>.
Turning off "crlfness" is no longer enough to make a stream truly
binary. So the PerlIO C<:raw> layer (or "discipline", to use the Camel
book's older terminology) is now formally defined as being equivalent
to binmode(FH) - which is in turn defined as doing whatever is
necessary to pass each byte as-is without any translation. In
particular binmode(FH) - and hence C<:raw> - will now turn off both
CRLF and UTF-8 translation and remove other layers (e.g. :encoding())
which would modify the byte stream.
=item *
The current user-visible implementation of pseudo-hashes (the weird
use of the first array element) is deprecated starting from Perl 5.8.0
and will be removed in Perl 5.10.0, and the feature will be
implemented differently. Not only is the current interface rather
ugly, but the current implementation slows down normal array and hash
use quite noticeably. The C<fields> pragma interface will remain
available. The I<restricted hashes> interface is expected to
be the replacement interface (see L<Hash::Util>). If your existing
programs depend on the underlying implementation, consider using
L<Class::PseudoHash> from CPAN.
=item *
The syntaxes C<< @a->[...] >> and C<< %h->{...} >> have now been deprecated.
=item *
After years of trying, suidperl is considered to be too complex to
ever be considered truly secure. The suidperl functionality is likely
to be removed in a future release.
=item *
The 5.005 threads model (module C<Thread>) is deprecated and expected
to be removed in Perl 5.10. Multithreaded code should be migrated to
the new ithreads model (see L<threads>, L<threads::shared> and
L<perlthrtut>).
=item *
The long deprecated uppercase aliases for the string comparison
operators (EQ, NE, LT, LE, GE, GT) have now been removed.
=item *
The tr///C and tr///U features have been removed and will not return;
the interface was a mistake. Sorry about that. For similar
functionality, see pack('U0', ...) and pack('C0', ...). [561]
=item *
Earlier Perls treated "sub foo (@bar)" as equivalent to "sub foo (@)".
The prototypes are now checked better at compile-time for invalid
syntax. An optional warning is generated ("Illegal character in
prototype...") but this may be upgraded to a fatal error in a future
release.
=item *
The C<exec LIST> and C<system LIST> operations now produce warnings on
tainted data and in some future release they will produce fatal errors.
=item *
The existing behaviour when localising tied arrays and hashes is wrong,
and will be changed in a future release, so do not rely on the existing
behaviour. See L</"Localising Tied Arrays and Hashes Is Broken">.
=back
=head1 Core Enhancements
=head2 Unicode Overhaul
Unicode in general should be now much more usable than in Perl 5.6.0
(or even in 5.6.1). Unicode can be used in hash keys, Unicode in
regular expressions should work now, Unicode in tr/// should work now,
Unicode in I/O should work now. See L<perluniintro> for an introduction
and L<perlunicode> for details.
=over 4
=item *
The Unicode Character Database coming with Perl has been upgraded
to Unicode 3.2.0. For more information, see http://www.unicode.org/ .
[561+] (5.6.1 has UCD 3.0.1.)
=item *
For developers interested in enhancing Perl's Unicode capabilities:
almost all the UCD files are included with the Perl distribution in
the F<lib/unicore> subdirectory. The most notable omission, for space
considerations, is the Unihan database.
=item *
The properties \p{Blank} and \p{SpacePerl} have been added. "Blank" is like
C isblank(), that is, it contains only "horizontal whitespace" (the space
character is, the newline isn't), and the "SpacePerl" is the Unicode
equivalent of C<\s> (\p{Space} isn't, since that includes the vertical
tabulator character, whereas C<\s> doesn't.)
See "New Unicode Properties" earlier in this document for additional
information on changes with Unicode properties.
=back
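As a small illustration of the C<\p{Blank}> property described above, the following sketch checks three single-character strings against it. The hash name C<%is_blank> is only an illustrative choice, and the example assumes a Perl new enough to know the property.

```perl
use strict;
use warnings;

# \p{Blank} is meant to cover "horizontal whitespace" only:
# the space and tab characters qualify, the newline does not.
my %is_blank = map { $_ => ($_ =~ /\A\p{Blank}\z/ ? 1 : 0) } (" ", "\t", "\n");

printf "space:   %d\n", $is_blank{" "};
printf "tab:     %d\n", $is_blank{"\t"};
printf "newline: %d\n", $is_blank{"\n"};
```

Anchoring with C<\A> and C<\z> avoids the special treatment that C<$> gives to a trailing newline.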
=head2 PerlIO is Now The Default
=over 4
=item *
IO is now by default done via PerlIO rather than the system's "stdio".
PerlIO allows "layers" to be "pushed" onto a file handle to alter the
handle's behaviour. Layers can be specified at open time via the 3-arg
form of open:
open($fh,'>:crlf :utf8', $path) || ...
or on already opened handles via extended C<binmode>:
binmode($fh,':encoding(iso-8859-7)');
The built-in layers are: unix (low level read/write), stdio (as in
previous Perls), perlio (re-implementation of stdio buffering in a
portable manner), crlf (does CRLF <=> "\n" translation as on Win32,
but available on any platform). A mmap layer may be available if
the platform supports it (mostly Unixes).
Layers to be applied by default may be specified via the 'open' pragma.
See L</"Installation and Configuration Improvements"> for the effects
of PerlIO on your architecture name.
=item *
If your platform supports fork(), you can use the list form of C<open>
for pipes. For example:
open KID_PS, "-|", "ps", "aux" or die $!;
forks the ps(1) command (without spawning a shell, as there are more
than three arguments to open()), and reads its standard output via the
C<KID_PS> filehandle. See L<perlipc>.
=item *
File handles can be marked as accepting Perl's internal encoding of Unicode
(UTF-8 or UTF-EBCDIC depending on platform) by a pseudo layer ":utf8" :
open($fh,">:utf8","Uni.txt");
Note for EBCDIC users: the pseudo layer ":utf8" is misleadingly named
for you, since what you will be getting is not UTF-8 but
UTF-EBCDIC. See L<perlunicode>, L<utf8>, and
http://www.unicode.org/unicode/reports/tr16/ for more information.
In future releases this naming may change. See L<perluniintro>
for more information about UTF-8.
=item *
If your environment variables (LC_ALL, LC_CTYPE, LANG) look like you
want to use UTF-8 (any of the variables match C</utf-?8/i>), your
STDIN, STDOUT, STDERR handles and the default open layer (see L<open>)
are marked as UTF-8. (This feature, like other new features that
combine Unicode and I/O, works only if you are using PerlIO, but that's
the default.)
Note that after this Perl really does assume that everything is UTF-8:
for example if some input handle is not, Perl will probably very soon
complain about the input data like this "Malformed UTF-8 ..." since
any old eight-bit data is not legal UTF-8.
Note for code authors: if you want to enable your users to use UTF-8
as their default encoding but in your code still have eight-bit I/O streams
(such as images or zip files), you need to explicitly open() or binmode()
with C<:bytes> (see L<perlfunc/open> and L<perlfunc/binmode>), or you
can just use C<binmode(FH)> (nice for pre-5.8.0 backward compatibility).
=item *
File handles can translate character encodings from/to Perl's internal
Unicode form on read/write via the ":encoding()" layer.
=item *
File handles can be opened to "in memory" files held in Perl scalars via:
open($fh,'>', \$variable) || ...
=item *
Anonymous temporary files are available without need to
'use FileHandle' or other module via
open($fh,"+>", undef) || ...
That is a literal undef, not an undefined value.
=back
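Two of the features listed above, "in memory" files and the ":utf8" pseudo layer, can be combined in one short sketch. The variable names are illustrative only, and the byte counts assume a platform whose internal encoding is UTF-8 (not an EBCDIC one).

```perl
use strict;
use warnings;

# Write through the :utf8 layer into an "in memory" file held in a scalar.
my $buffer = '';
open(my $out, '>:utf8', \$buffer) or die "cannot open in-memory file: $!";
print {$out} "smiley: \x{263A}\n";
close($out) or die "close failed: $!";

# The scalar now holds encoded bytes, so its length counts bytes.
my $byte_count = length $buffer;

# Read the same scalar back through :utf8 to get characters again.
open(my $in, '<:utf8', \$buffer) or die "cannot reopen in-memory file: $!";
my $line = <$in>;
close($in);
chomp $line;
my $char_count = length $line;

print "bytes written: $byte_count, characters read: $char_count\n";
```

The difference between the two counts comes from U+263A, which occupies three bytes in UTF-8 but is a single character once decoded.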
=head2 ithreads
The new interpreter threads ("ithreads" for short) implementation of
multithreading, by Arthur Bergman, replaces the old "5.005 threads"
implementation. In the ithreads model any data sharing between
threads must be explicit, as opposed to the model where data sharing
was implicit. See L<threads> and L<threads::shared>, and
L<perlthrtut>.
As a part of the ithreads implementation Perl will also use
any necessary and detectable reentrant libc interfaces.
=head2 Restricted Hashes
A restricted hash is restricted to a certain set of keys; no keys
outside the set can be added. Also individual keys can be restricted
so that the key cannot be deleted and the value cannot be changed.
No new syntax is involved: the Hash::Util module is the interface.
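A minimal sketch of the idea, using the C<lock_keys> and C<lock_value> functions of Hash::Util; the hash contents are made up for illustration.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys lock_value);

my %config = (host => 'localhost', port => 8080);

# Restrict the hash to its current set of keys.
lock_keys(%config);

# Changing the value of an existing key is still allowed.
$config{port} = 9090;

# Adding a key outside the set dies, so trap the error with eval.
my $added     = eval { $config{user} = 'nobody'; 1 };
my $add_error = $@;

# Restrict a single key so that its value cannot be changed.
lock_value(%config, 'host');
my $changed = eval { $config{host} = 'example.org'; 1 };

print "adding a key ",     ($added   ? "succeeded" : "failed"), "\n";
print "changing host ",    ($changed ? "succeeded" : "failed"), "\n";
print "host is still $config{host}, port is now $config{port}\n";
```

Because violations are reported with C<die>, code that wants to probe a restricted hash should wrap the access in C<eval> as shown.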
=head2 Safe Signals
Perl used to be fragile in that signals arriving at inopportune moments
could corrupt Perl's internal state. Now Perl postpones handling of
signals until it's safe (between opcodes).
This change may have surprising side effects because signals no longer
interrupt Perl instantly. Perl will now first finish whatever it was
doing, like finishing an internal operation (like sort()) or an
external operation (like an I/O operation), and only then look at any
arrived signals (and before starting the next operation). No more corrupt
internal state since the current operation is always finished first,
but the signal may take more time to get heard. Note that breaking
out from potentially blocking operations should still work, though.
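The deferred handling can be observed with a process that signals itself. This is only a sketch: it assumes a POSIX-like platform where C<SIGUSR1> exists, and the C<@events> array is an illustrative name.

```perl
use strict;
use warnings;

my @events;

# The Perl-level handler is not run at the instant the signal arrives;
# Perl records the signal and calls the handler at the next safe point.
$SIG{USR1} = sub { push @events, 'handler ran' };

push @events, 'before kill';
kill 'USR1', $$;               # send SIGUSR1 to this very process
push @events, 'after kill';

print "$_\n" for @events;
```

The handler only touches an ordinary Perl array, which is exactly the kind of operation that used to be risky when handlers could run in the middle of an opcode.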
=head2 Understanding of Numbers
In general a lot of fixing has happened in the area of Perl's
understanding of numbers, both integer and floating point. Since in
many systems the standard number parsing functions like C<strtoul()>
and C<atof()> seem to have bugs, Perl tries to work around their
deficiencies. This results hopefully in more accurate numbers.
Perl now tries internally to use integer values in numeric conversions
and basic arithmetics (+ - * /) if the arguments are integers, and
tries also to keep the results stored internally as integers.
This change leads to often slightly faster and always less lossy
arithmetics. (Previously Perl always preferred floating point numbers
in its math.)
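The effect is easiest to see with integers that a double cannot represent exactly. The sketch below assumes a perl built with 64-bit integers (C<$Config{ivsize}> of 8 or more); on other builds the values fall back to floating point and the result may differ.

```perl
use strict;
use warnings;
use Config;

# 2**53 + 1 cannot be stored exactly in an IEEE double,
# but it fits comfortably in a 64-bit integer.
my $big = 9_007_199_254_740_993;
my $sum = $big + 1;

if ($Config{ivsize} >= 8) {
    print "integer arithmetic kept every digit: $sum\n";
}
else {
    print "this perl has no 64-bit integers; result is $sum\n";
}
```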
=head2 Arrays now always interpolate into double-quoted strings [561]
In double-quoted strings, arrays now interpolate, no matter what. The
behavior in earlier versions of perl 5 was that arrays would interpolate
into strings if the array had been mentioned before the string was
compiled, and otherwise Perl would raise a fatal compile-time error.
In versions 5.000 through 5.003, the error was
Literal @example now requires backslash
In versions 5.004_01 through 5.6.0, the error was
In string, @example now must be written as \@example
The idea here was to get people into the habit of writing
C<"fred\@example.com"> when they wanted a literal C<@> sign, just as
they have always written C<"Give me back my \$5"> when they wanted a
literal C<$> sign.
Starting with 5.6.1, when Perl sees an C<@> sign in a
double-quoted string, it I<always> attempts to interpolate an array,
regardless of whether or not the array has been used or declared
already. The fatal error has been downgraded to an optional warning:
Possible unintended interpolation of @example in string
This warns you that C<"fred@example.com"> is going to turn into
C<fred.com> if you don't backslash the C<@>.
See http://perl.plover.com/at-error.html for more details
about the history here.
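The two cases can be put side by side in a short sketch (the array contents are invented for illustration):

```perl
use strict;
use warnings;

my @example = ('one', 'two');

# A backslash keeps the @ literal.
my $address = "fred\@example.com";

# Without the backslash the array is interpolated,
# with its elements joined by $" (a space by default).
my $listing = "items: @example";

print "$address\n";
print "$listing\n";
```

Note that under C<use strict> an unescaped C<@example> that names no declared array is a compile-time error rather than a warning, which is one more reason to write the backslash.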
=head2 Miscellaneous Changes
=over 4
=item *
AUTOLOAD is now lvaluable, meaning that you can add the :lvalue attribute
to AUTOLOAD subroutines and you can assign to the AUTOLOAD return value.
=item *
The $Config{byteorder} (and corresponding BYTEORDER in config.h) was
previously wrong on platforms where sizeof(long) was 4, but sizeof(IV)
was 8. The byteorder was only sizeof(long) bytes long (1234 or 4321),
but now it is correctly sizeof(IV) bytes long (12345678 or 87654321).
(This problem didn't affect Windows platforms.)
Also, $Config{byteorder} is now computed dynamically--this is more
robust with "fat binaries" where an executable image contains binaries
for more than one binary platform, and when cross-compiling.
=item *
C<perl -d:Module=arg,arg,arg> now works (previously one couldn't pass
in multiple arguments.)
=item *
C<do> followed by a bareword now ensures that this bareword isn't
a keyword (to avoid a bug where C<do q(foo.pl)> tried to call a
subroutine called C<q>). This means that for example instead of
C<do format()> you must write C<do &format()>.
=item *
The builtin dump() now gives an optional warning
C<dump() better written as CORE::dump()>,
meaning that by default C<dump(...)> is resolved as the builtin
dump() which dumps core and aborts, not as (possibly) user-defined
C<sub dump>. To call the latter, qualify the call as C<&dump(...)>.
(The whole dump() feature is to be considered deprecated, and possibly
removed/changed in future releases.)
=item *
chomp() and chop() are now overridable. Note, however, that their
prototype (as given by C<prototype("CORE::chomp")>) is undefined,
because it cannot be expressed and therefore one cannot really write
replacements to override these builtins.
=item *
END blocks are now run even if you exit/die in a BEGIN block.
Internally, the execution of END blocks is now controlled by
PL_exit_flags & PERL_EXIT_DESTRUCT_END. This enables the new
behaviour for Perl embedders. This will be the default in 5.10. See
L<perlembed>.
=item *
Formats now support zero-padded decimal fields.
=item *
Although "you shouldn't do that", it was possible to write code that
depends on Perl's hashed key order (Data::Dumper does this). The new
algorithm "One-at-a-Time" produces a different hashed key order.
More details are in L</"Performance Enhancements">.
=item *
lstat(FILEHANDLE) now gives a warning because the operation makes no sense.
In future releases this may become a fatal error.
=item *
Spurious syntax errors generated in certain situations, when glob()
caused File::Glob to be loaded for the first time, have been fixed. [561]
=item *
Lvalue subroutines can now return C<undef> in list context. However,
the lvalue subroutine feature still remains experimental. [561+]
=item *
A lost warning "Can't declare ... dereference in my" has been
restored (Perl had it earlier but it became lost in later releases.)
=item *
A new special regular expression variable has been introduced:
C<$^N>, which contains the most-recently closed group (submatch).
=item *
C<no Module;> does not produce an error even if Module does not have an
unimport() method. This parallels the behavior of C<use> vis-a-vis
C<import>. [561]
=item *
The numerical comparison operators return C<undef> if either operand
is a NaN. Previously the behaviour was unspecified.
=item *
C<our> can now have an experimental optional attribute C<unique> that
affects how global variables are shared among multiple interpreters,
see L<perlfunc/our>.
=item *
The following builtin functions are now overridable: each(), keys(),
pop(), push(), shift(), splice(), unshift(). [561]
=item *
C<pack() / unpack()> can now group template letters with C<()> and then
apply repetition/count modifiers on the groups.
=item *
C<pack() / unpack()> can now process the Perl internal numeric types:
IVs, UVs, NVs-- and also long doubles, if supported by the platform.
The template letters are C<j>, C<J>, C<F>, and C<D>.
=item *
C<pack('U0a*', ...)> can now be used to force a string to UTF-8.
=item *
my __PACKAGE__ $obj now works. [561]
=item *
POSIX::sleep() now returns the number of I<unslept> seconds
(as the POSIX standard says), as opposed to CORE::sleep() which
returns the number of slept seconds.
=item *
printf() and sprintf() now support parameter reordering using the
C<%\d+\$> and C<*\d+\$> syntaxes. For example
printf "%2\$s %1\$s\n", "foo", "bar";
will print "bar foo\n". This feature helps in writing
internationalised software, and in general when the order
of the parameters can vary.
=item *
The (\&) prototype now works properly. [561]
=item *
prototype(\[$@%&]) is now available to implicitly create references
(useful for example if you want to emulate the tie() interface).
=item *
A new command-line option, C<-t> is available. It is the
little brother of C<-T>: instead of dying on taint violations,
lexical warnings are given. B<This is only meant as a temporary
debugging aid while securing the code of old legacy applications.
This is not a substitute for -T.>
=item *
In other taint news, the C<exec LIST> and C<system LIST> forms are now
considered too risky (think C<exec @ARGV>: it can start any program
with any arguments), and now the said forms cause a warning under
lexical warnings. You should carefully launder the arguments to
guarantee their validity. In future releases of Perl the forms will
become fatal errors so consider starting laundering now.
=item *
Tied hash interfaces are now required to have the EXISTS and DELETE
methods (either own or inherited).
=item *
If tr/// is just counting characters, it doesn't attempt to
modify its target.
=item *
untie() will now call an UNTIE() hook if it exists. See L<perltie>
for details. [561]
=item *
L<perlfunc/utime> now supports C<utime undef, undef, @files> to change the
file timestamps to the current time.
=item *
The rules for allowing underscores (underbars) in numeric constants
have been relaxed and simplified: now you can have an underscore
simply B<between digits>.
=item *
Rather than relying on C's argv[0] (which may not contain a full pathname)
where possible $^X is now set by asking the operating system.
(e.g. by reading F</proc/self/exe> on Linux, F</proc/curproc/file> on FreeBSD.)
=item *
A new variable, C<${^TAINT}>, indicates whether taint mode is enabled.
=item *
You can now override the readline() builtin, and this overrides also
the <FILEHANDLE> angle bracket operator.
=item *
The command-line options -s and -F are now recognized on the shebang
(#!) line.
=item *
Use of the C</c> match modifier without an accompanying C</g> modifier
elicits a new warning: C<Use of /c modifier is meaningless without /g>.
Use of C</c> in substitutions, even with C</g>, elicits
C<Use of /c modifier is meaningless in s///>.
Use of C</g> with C<split> elicits C<Use of /g modifier is meaningless
in split>.
=item *
Support for the C<CLONE> special subroutine has been added.
With ithreads, when a new thread is created, all Perl data is cloned;
however, non-Perl data cannot be cloned automatically. In C<CLONE> you
can do whatever you need to do, such as handling the cloning of
non-Perl data, if necessary. C<CLONE> will be executed once for every
package that has it defined or inherited. It will be called in the
context of the new thread, so all modifications are made in the new area.
See L<perlmod>.
=back
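Several of the smaller changes in the list above can be tried out together. The following sketch exercises parameter reordering in sprintf(), the C<$^N> variable, grouping in pack()/unpack() templates, and underscores in numeric constants; all variable names and data are illustrative.

```perl
use strict;
use warnings;

# Parameter reordering: single quotes, so the $ needs no backslash.
my $swapped = sprintf('%2$s %1$s', 'foo', 'bar');

# $^N holds the text of the most recently closed group.
my $last_group;
if ('2002-07-18' =~ /(\d{4})-(\d\d)/) {
    $last_group = $^N;
}

# A () group in a template can carry its own repeat count:
# here "two ASCII characters plus a 16-bit big-endian integer", twice.
my $packed = pack('(A2 n)2', 'ab', 1, 'cd', 2);
my @fields = unpack('(A2 n)2', $packed);

# Underscores are allowed between digits of a numeric constant.
my $million = 1_000_000;

print "$swapped\n";
print "last closed group: $last_group\n";
print "packed ", length($packed), " bytes, unpacked: @fields\n";
print "million: $million\n";
```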
=head1 Modules and Pragmata
=head2 New Modules and Pragmata
=over 4
=item *
C<Attribute::Handlers>, originally by Damian Conway and now maintained
by Arthur Bergman, allows a class to define attribute handlers.
package MyPack;
use Attribute::Handlers;
sub Wolf :ATTR(SCALAR) { print "howl!\n" }
# later, in some package using or inheriting from MyPack...
my MyPack $Fluffy : Wolf; # the attribute handler Wolf will be called
Both variables and routines can have attribute handlers. Handlers can
be specific to type (SCALAR, ARRAY, HASH, or CODE), or specific to the
exact compilation phase (BEGIN, CHECK, INIT, or END).
See L<Attribute::Handlers>.
=item *
C<B::Concise>, by Stephen McCamant, is a new compiler backend for
walking the Perl syntax tree, printing concise info about ops.
The output is highly customisable. See L<B::Concise>. [561+]
=item *
The new bignum, bigint, and bigrat pragmas, by Tels, implement
transparent bignum support (using the Math::BigInt, Math::BigFloat,
and Math::BigRat backends).
=item *
C<Class::ISA>, by Sean Burke, is a module for reporting the search
path for a class's ISA tree. See L<Class::ISA>.
=item *
C<Cwd> now has a split personality: if possible, an XS extension is
used (this will hopefully be faster, more secure, and more robust),
but if not possible, the familiar Perl implementation is used.
=item *
C<Devel::PPPort>, originally by Kenneth Albanowski and now
maintained by Paul Marquess, has been added. It is primarily used
by C<h2xs> to enhance portability of XS modules between different
versions of Perl. See L<Devel::PPPort>.
=item *
C<Digest>, frontend module for calculating digests (checksums), from
Gisle Aas, has been added. See L<Digest>.
=item *
C<Digest::MD5> for calculating MD5 digests (checksums) as defined in
RFC 1321, from Gisle Aas, has been added. See L<Digest::MD5>.
use Digest::MD5 'md5_hex';
$digest = md5_hex("Thirsty Camel");
print $digest, "\n"; # 01d19d9d2045e005c3f1b80e8b164de1
NOTE: the C<MD5> backward compatibility module is deliberately not
included since its further use is discouraged.
See also L<PerlIO::via::QuotedPrint>.
=item *
C<Encode>, originally by Nick Ing-Simmons and now maintained by Dan
Kogai, provides a mechanism to translate between different character
encodings. Support for Unicode, ISO-8859-1, and ASCII is compiled in
to the module. Several other encodings (like the rest of the
ISO-8859, CP*/Win*, Mac, KOI8-R, three variants of EBCDIC, Chinese,
Japanese, and Korean encodings) are included and can be loaded at
runtime. (For space considerations, the largest Chinese encodings
have been separated into their own CPAN module, Encode::HanExtra,
which Encode will use if available). See L<Encode>.
Any encoding supported by Encode module is also available to the
":encoding()" layer if PerlIO is used.
=item *
C<Hash::Util> is the interface to the new I<restricted hashes>
feature. (Implemented by Jeffrey Friedl, Nick Ing-Simmons, and
Michael Schwern.) See L<Hash::Util>.
=item *
C<I18N::Langinfo> can be used to query locale information.
See L<I18N::Langinfo>.
=item *
C<I18N::LangTags>, by Sean Burke, has functions for dealing with
RFC3066-style language tags. See L<I18N::LangTags>.
=item *
C<ExtUtils::Constant>, by Nicholas Clark, is a new tool for extension
writers for generating XS code to import C header constants.
See L<ExtUtils::Constant>.
=item *
C<Filter::Simple>, by Damian Conway, is an easy-to-use frontend to
Filter::Util::Call. See L<Filter::Simple>.
# in MyFilter.pm:
package MyFilter;
use Filter::Simple sub {
while (my ($from, $to) = splice @_, 0, 2) {
s/$from/$to/g;
}
};
1;
# in user's code:
use MyFilter qr/red/ => 'green';
print "red\n"; # this code is filtered, will print "green\n"
print "bored\n"; # this code is filtered, will print "bogreen\n"
no MyFilter;
print "red\n"; # this code is not filtered, will print "red\n"
=item *
C<File::Temp>, by Tim Jenness, allows one to create temporary files
and directories in an easy, portable, and secure way. See L<File::Temp>.
[561+]
=item *
C<Filter::Util::Call>, by Paul Marquess, provides you with the
framework to write I<source filters> in Perl. For most uses, the
frontend Filter::Simple is to be preferred. See L<Filter::Util::Call>.
=item *
C<if>, by Ilya Zakharevich, is a new pragma for conditional inclusion
of modules.
=item *
L<libnet>, by Graham Barr, is a collection of perl5 modules related
to network programming. See L<Net::FTP>, L<Net::NNTP>, L<Net::Ping>
(not part of libnet, but related), L<Net::POP3>, L<Net::SMTP>,
and L<Net::Time>.
Perl installation leaves libnet unconfigured; use F<libnetcfg>
to configure it.
=item *
C<List::Util>, by Graham Barr, is a selection of general-utility
list subroutines, such as sum(), min(), first(), and shuffle().
See L<List::Util>.
=item *
C<Locale::Constants>, C<Locale::Country>, C<Locale::Currency>,
C<Locale::Language>, and L<Locale::Script>, by Neil Bowers, have
been added. They provide the codes for various locale standards, such
as "fr" for France, "usd" for US Dollar, and "ja" for Japanese.
use Locale::Country;
$country = code2country('jp'); # $country gets 'Japan'
$code = country2code('Norway'); # $code gets 'no'
See L<Locale::Constants>, L<Locale::Country>, L<Locale::Currency>,
and L<Locale::Language>.
=item *
C<Locale::Maketext>, by Sean Burke, is a localization framework. See
L<Locale::Maketext>, and L<Locale::Maketext::TPJ13>. The latter is an
article about software localization, originally published in The Perl
Journal #13, and republished here with kind permission.
=item *
C<Math::BigRat> for big rational numbers, to accompany Math::BigInt and
Math::BigFloat, from Tels. See L<Math::BigRat>.
=item *
C<Memoize> can make your functions faster by trading space for time,
from Mark-Jason Dominus. See L<Memoize>.
=item *
C<MIME::Base64>, by Gisle Aas, allows you to encode data in base64,
as defined in RFC 2045 - I<MIME (Multipurpose Internet Mail
Extensions)>.
use MIME::Base64;
$encoded = encode_base64('Aladdin:open sesame');
$decoded = decode_base64($encoded);
print $encoded, "\n"; # "QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
See L<MIME::Base64>.
=item *
C<MIME::QuotedPrint>, by Gisle Aas, allows you to encode data
in quoted-printable encoding, as defined in RFC 2045 - I<MIME
(Multipurpose Internet Mail Extensions)>.
use MIME::QuotedPrint;
$encoded = encode_qp("\xDE\xAD\xBE\xEF");
$decoded = decode_qp($encoded);
print $encoded, "\n"; # "=DE=AD=BE=EF\n"
print $decoded, "\n"; # "\xDE\xAD\xBE\xEF\n"
See also L<PerlIO::via::QuotedPrint>.
=item *
C<NEXT>, by Damian Conway, is a pseudo-class for method redispatch.
See L<NEXT>.
=item *
C<open> is a new pragma for setting the default I/O layers
for open().
=item *
C<PerlIO::scalar>, by Nick Ing-Simmons, provides the implementation
of IO to "in memory" Perl scalars as discussed above. It also serves
as an example of a loadable PerlIO layer. Other future possibilities
include PerlIO::Array and PerlIO::Code. See L<PerlIO::scalar>.
=item *
C<PerlIO::via>, by Nick Ing-Simmons, acts as a PerlIO layer and wraps
PerlIO layer functionality provided by a class (typically implemented
in Perl code).
=item *
C<PerlIO::via::QuotedPrint>, by Elizabeth Mattijsen, is an example
of a C<PerlIO::via> class:
use PerlIO::via::QuotedPrint;
open($fh,">:via(QuotedPrint)",$path);
This will automatically convert everything output to C<$fh> to
Quoted-Printable. See L<PerlIO::via> and L<PerlIO::via::QuotedPrint>.
=item *
C<Pod::ParseLink>, by Russ Allbery, has been added,
to parse LZ<><> links in pods as described in the new
perlpodspec.
=item *
C<Pod::Text::Overstrike>, by Joe Smith, has been added.
It converts POD data to formatted overstrike text.
See L<Pod::Text::Overstrike>. [561+]
=item *
C<Scalar::Util> is a selection of general-utility scalar subroutines,
such as blessed(), reftype(), and tainted(). See L<Scalar::Util>.
=item *
C<sort> is a new pragma for controlling the behaviour of sort().
=item *
C<Storable> gives persistence to Perl data structures by allowing the
storage and retrieval of Perl data to and from files in a fast and
compact binary format. Because in effect Storable does serialisation
of Perl data structures, with it you can also clone deep, hierarchical
data structures. Storable was originally created by Raphael Manfredi,
but it is now maintained by Abhijit Menon-Sen. Storable has been
enhanced to understand the two new hash features, Unicode keys and
restricted hashes. See L<Storable>.
=item *
C<Switch>, by Damian Conway, has been added. Just by saying
use Switch;
you have C<switch> and C<case> available in Perl.
use Switch;
switch ($val) {
case 1 { print "number 1" }
case "a" { print "string a" }
case [1..10,42] { print "number in list" }
case (@array) { print "number in list" }
case /\w+/ { print "pattern" }
case qr/\w+/ { print "pattern" }
case (%hash) { print "entry in hash" }
case (\%hash) { print "entry in hash" }
case (\&sub) { print "arg to subroutine" }
else { print "previous case not true" }
}
See L<Switch>.
=item *
C<Test::More>, by Michael Schwern, is yet another framework for writing
test scripts, more extensive than Test::Simple. See L<Test::More>.
=item *
C<Test::Simple>, by Michael Schwern, has basic utilities for writing
tests. See L<Test::Simple>.
=item *
C<Text::Balanced>, by Damian Conway, has been added, for extracting
delimited text sequences from strings.
use Text::Balanced 'extract_delimited';
($a, $b) = extract_delimited("'never say never', he never said", "'", '');
$a will be "'never say never'", $b will be ', he never said'.
In addition to extract_delimited(), there are also extract_bracketed(),
extract_quotelike(), extract_codeblock(), extract_variable(),
extract_tagged(), extract_multiple(), gen_delimited_pat(), and
gen_extract_tagged(). With these, you can implement rather advanced
parsing algorithms. See L<Text::Balanced>.
=item *
C<threads>, by Arthur Bergman, is an interface to interpreter threads.
Interpreter threads (ithreads) is the new thread model introduced in
Perl 5.6 but only available as an internal interface for extension
writers (and for Win32 Perl for C<fork()> emulation). See L<threads>,
L<threads::shared>, and L<perlthrtut>.
=item *
C<threads::shared>, by Arthur Bergman, allows data sharing for
interpreter threads. See L<threads::shared>.
=item *
C<Tie::File>, by Mark-Jason Dominus, associates a Perl array with the
lines of a file. See L<Tie::File>.
=item *
C<Tie::Memoize>, by Ilya Zakharevich, provides on-demand loaded hashes.
See L<Tie::Memoize>.
=item *
C<Tie::RefHash::Nestable>, by Edward Avis, allows storing hash
references (unlike the standard Tie::RefHash). The module is contained
within Tie::RefHash. See L<Tie::RefHash>.
=item *
C<Time::HiRes>, by Douglas E. Wegscheid, provides high resolution
timing (ualarm, usleep, and gettimeofday). See L<Time::HiRes>.
=item *
C<Unicode::UCD> offers a querying interface to the Unicode Character
Database. See L<Unicode::UCD>.
=item *
C<Unicode::Collate>, by SADAHIRO Tomoyuki, implements the UCA
(Unicode Collation Algorithm) for sorting Unicode strings.
See L<Unicode::Collate>.
=item *
C<Unicode::Normalize>, by SADAHIRO Tomoyuki, implements the various
Unicode normalization forms. See L<Unicode::Normalize>.
=item *
C<XS::APItest>, by Tim Jenness, is a test extension that exercises XS
APIs. Currently only C<printf()> is tested: how to output various
basic data types from XS.
=item *
C<XS::Typemap>, by Tim Jenness, is a test extension that exercises
XS typemaps. Nothing gets installed, but the code is worth studying
for extension writers.
=back
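As a quick tour of three of the new general-purpose modules listed above (List::Util, Scalar::Util, and Storable), here is a sketch with invented data. The class name C<Example::Thing> is hypothetical and exists only for the demonstration.

```perl
use strict;
use warnings;
use List::Util   qw(sum min first);
use Scalar::Util qw(blessed reftype);
use Storable     qw(dclone);

# List::Util: small list helpers.
my @numbers   = (4, 8, 15, 16, 23, 42);
my $total     = sum(@numbers);
my $smallest  = min(@numbers);
my $first_big = first { $_ > 10 } @numbers;

# Scalar::Util: inspect a reference without guessing from its string form.
my $object = bless { tags => ['a', 'b'] }, 'Example::Thing';
my $class  = blessed($object);    # the package the reference is blessed into
my $type   = reftype($object);    # the underlying data type

# Storable: dclone() makes a deep copy, so nested data is not shared.
my $copy = dclone($object);
push @{ $copy->{tags} }, 'c';

print "total=$total smallest=$smallest first_big=$first_big\n";
print "class=$class type=$type\n";
print "original tags: @{ $object->{tags} }\n";
print "copied tags:   @{ $copy->{tags} }\n";
```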
=head2 Updated And Improved Modules and Pragmata
=over 4
=item *
The following independently supported modules have been updated to the
newest versions from CPAN: CGI, CPAN, DB_File, File::Spec, File::Temp,
Getopt::Long, Math::BigFloat, Math::BigInt, the podlators bundle
(Pod::Man, Pod::Text), Pod::LaTeX [561+], Pod::Parser, Storable,
Term::ANSIColor, Test, Text-Tabs+Wrap.
=item *
attributes::reftype() now works on tied arguments.
=item *
AutoLoader can now be disabled with C<no AutoLoader;>.
=item *
B::Deparse has been significantly enhanced by Robin Houston. It can
now deparse almost all of the standard test suite (so that the tests
still succeed). There is a make target "test.deparse" for trying this
out.
=item *
Carp now has better interface documentation, and the @CARP_NOT
interface has been added to get optional control over where errors
are reported independently of @ISA, by Ben Tilly.
=item *
Class::Struct can now define the classes at compile time.
=item *
Class::Struct now assigns the array/hash element if the accessor
is called with an array/hash element as the B<sole> argument.
=item *
The return value of Cwd::fastcwd() is now tainted.
=item *
Data::Dumper now has an option to sort hashes.
=item *
Data::Dumper now has an option to dump code references
using B::Deparse.
=item *
DB_File now supports newer Berkeley DB versions, among
other improvements.
=item *
Devel::Peek now has an interface for the Perl memory statistics
(this works only if you are using perl's malloc, and if you have
compiled with debugging).
=item *
The English module can now be used without the infamous performance
hit by saying
use English '-no_match_vars';
(Assuming, of course, that you don't need the troublesome variables
C<$`>, C<$&>, or C<$'>.) Also introduced are the C<@LAST_MATCH_START> and
C<@LAST_MATCH_END> English aliases for C<@-> and C<@+>.
=item *
ExtUtils::MakeMaker has been significantly cleaned up and fixed.
The enhanced version has also been backported to earlier releases
of Perl and submitted to CPAN so that the earlier releases can
enjoy the fixes.
=item *
The arguments of WriteMakefile() in Makefile.PL are now checked
for sanity much more carefully than before. This may cause new
warnings when modules are being installed. See L<ExtUtils::MakeMaker>
for more details.
=item *
ExtUtils::MakeMaker now uses File::Spec internally, which hopefully
leads to better portability.
=item *
Fcntl, Socket, and Sys::Syslog have been rewritten by Nicholas Clark
to use the new-style constant dispatch section (see L<ExtUtils::Constant>).
This means that they will be more robust and hopefully faster.
=item *
File::Find now chdir()s correctly when chasing symbolic links. [561]
=item *
File::Find now has pre- and post-processing callbacks. It also
correctly changes directories when chasing symbolic links. Callbacks
(naughtily) exiting with "next;" instead of "return;" now work.
=item *
File::Find is now (again) reentrant. It also has been made
more portable.
=item *
The warnings issued by File::Find now belong to their own category.
You can enable/disable them with C<use/no warnings 'File::Find';>.
=item *
File::Glob::glob() has been renamed to File::Glob::bsd_glob()
because the name clashes with the builtin glob(). The older
name is still available for compatibility, but is deprecated. [561]
=item *
File::Glob now supports C<GLOB_LIMIT> constant to limit the size of
the returned list of filenames.
=item *
IPC::Open3 now allows the use of numeric file descriptors.
=item *
IO::Socket now has an atmark() method, which returns true if the socket
is positioned at the out-of-band mark. The method is also exportable
as a sockatmark() function.
=item *
IO::Socket::INET failed to open the specified port if the service name
was not known. It now correctly uses the supplied port number as is. [561]
=item *
IO::Socket::INET has support for the ReusePort option (if your
platform supports it). The Reuse option now has an alias, ReuseAddr.
For clarity, you may want to prefer ReuseAddr.
=item *
IO::Socket::INET now supports a value of zero for C<LocalPort>
(usually meaning that the operating system will make one up.)
=item *
'use lib' now works identically to @INC. Removing directories
with 'no lib' now works.
=item *
Math::BigFloat and Math::BigInt have undergone a full rewrite by Tels.
They are now orders of magnitude faster, and they support various bignum
libraries such as GMP and PARI as their backends.
=item *
Math::Complex now handles inf, NaN, etc. better.
=item *
Net::Ping has been considerably enhanced by Rob Brown: multihoming is
now supported, Win32 functionality is better, there is now time
measuring functionality (optionally high-resolution using
Time::HiRes), and there is now an "external" protocol which uses the
Net::Ping::External module, which runs your external ping utility and
parses the output. A version of Net::Ping::External is available on
CPAN.
Note that some of the Net::Ping tests are disabled when running
under the Perl distribution since one cannot assume one or more
of the following: enabled echo port at localhost, full Internet
connectivity, or sympathetic firewalls. You can set the environment
variable PERL_TEST_Net_Ping to "1" (one) before running the Perl test
suite to enable all the Net::Ping tests.
=item *
POSIX::sigaction() is now much more flexible and robust.
You can now install coderef handlers as well as 'DEFAULT' and 'IGNORE'
handlers. Previously, installing new handlers was not atomic.
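The following is a hedged sketch of installing a coderef handler and
then a C<'DEFAULT'> handler with POSIX::sigaction(). It assumes a
Unix-like platform where C<SIGUSR1> exists; the counter variable is
purely illustrative.

```perl
use strict;
use warnings;
use POSIX ();

my $caught = 0;

# Install an anonymous sub as the handler for SIGUSR1.
my $action = POSIX::SigAction->new(sub { $caught++ });
POSIX::sigaction(POSIX::SIGUSR1(), $action)
    or die "sigaction failed: $!";

# Send the signal to this process so the handler runs.
kill 'USR1', $$;
print "caught $caught signal(s)\n";

# Restore the default disposition using the string 'DEFAULT'.
POSIX::sigaction(POSIX::SIGUSR1(), POSIX::SigAction->new('DEFAULT'))
    or die "sigaction failed: $!";
```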
=item *
In Safe, C<%INC> is now localised in a Safe compartment so that
use/require work.
=item *
In SDBM_File on DOSish platforms, some keys went missing because of
lack of support for files with "holes". A workaround for the problem
has been added.
=item *
In Search::Dict one can now have a pre-processing hook for the
lines being searched.
=item *
The Shell module now has an OO interface.
=item *
In Sys::Syslog there is now a failover mechanism that will go
through alternative connection mechanisms until the message
is successfully logged.
=item *
The Test module has been significantly enhanced.
=item *
Time::Local::timelocal() does not handle fractional seconds anymore.
The rationale is that neither does localtime(), and timelocal() and
localtime() are supposed to be inverses of each other.
=item *
The vars pragma now supports declaring fully qualified variables.
(Something that C<our()> does not and will not support.)
=item *
The C<utf8::> name space (as in the pragma) provides various
Perl-callable functions to provide low level access to Perl's
internal Unicode representation. At the moment only length()
has been implemented.
=back
=head1 Utility Changes
=over 4
=item *
Emacs perl mode (emacs/cperl-mode.el) has been updated to version
4.31.
=item *
F<emacs/e2ctags.pl> is now much faster.
=item *
C<enc2xs> is a tool for people adding their own encodings to the
Encode module.
=item *
C<h2ph> now supports C trigraphs.
=item *
C<h2xs> now produces a template README.
=item *
C<h2xs> now uses C<Devel::PPPort> for better portability between
different versions of Perl.
=item *
C<h2xs> uses the new L<ExtUtils::Constant|ExtUtils::Constant> module
which will affect newly created extensions that define constants.
Since the new code is more correct (if you have two constants where the
first one is a prefix of the second one, the first constant B<never>
got defined), less lossy (it uses integers for integer constants,
as opposed to the old code that used floating point numbers even for
integer constants), and slightly faster, you might want to consider
regenerating your extension code (the new scheme makes regenerating
easy). L<h2xs> now also supports C trigraphs.
=item *
C<libnetcfg> has been added to configure libnet.
=item *
C<perlbug> is now much more robust. It also sends the bug report to
perl.org, not perl.com.
=item *
C<perlcc> has been rewritten and its user interface (that is,
command line) is much more like that of the Unix C compiler, cc.
(The perlbc tool has been removed. Use C<perlcc -B> instead.)
B<Note that perlcc is still considered very experimental and
unsupported.> [561]
=item *
C<perlivp> is a new Installation Verification Procedure utility
for running any time after installing Perl.
=item *
C<piconv> is an implementation of the character conversion utility
C<iconv>, demonstrating the new Encode module.
=item *
C<pod2html> now allows specifying a cache directory.
=item *
C<pod2html> now produces XHTML 1.0.
=item *
C<pod2html> now understands POD written using different line endings
(PC-like CRLF versus Unix-like LF versus MacClassic-like CR).
=item *
C<s2p> has been completely rewritten in Perl. (It is in fact a full
implementation of sed in Perl: you can use the sed functionality by
using the C<psed> utility.)
=item *
C<xsubpp> now understands POD documentation embedded in the *.xs
files. [561]
=item *
C<xsubpp> now supports the OUT keyword.
=back
=head1 New Documentation
=over 4
=item *
perl56delta details the changes between the 5.005 release and the
5.6.0 release.
=item *
perlclib documents the internal replacements for standard C library
functions. (Interesting only for extension writers and Perl core
hackers.) [561+]
=item *
perldebtut is a Perl debugging tutorial. [561+]
=item *
perlebcdic contains considerations for running Perl on EBCDIC
platforms. [561+]
=item *
perlintro is a gentle introduction to Perl.
=item *
perliol documents the internals of PerlIO with layers.
=item *
perlmodstyle is a style guide for writing modules.
=item *
perlnewmod tells about writing and submitting a new module. [561+]
=item *
perlpacktut is a pack() tutorial.
=item *
perlpod has been rewritten to be clearer and to record the best
practices gathered over the years.
=item *
perlpodspec is a more formal specification of the pod format,
mainly of interest for writers of pod applications, not to
people writing in pod.
=item *
perlretut is a regular expression tutorial. [561+]
=item *
perlrequick is a regular expressions quick-start guide.
Yes, much quicker than perlretut. [561]
=item *
perltodo has been updated.
=item *
perltootc has been renamed as perltooc (so as not to conflict
with perltoot in filesystems restricted to "8.3" names).
=item *
perluniintro is an introduction to using Unicode in Perl.
(perlunicode is more of a detailed reference and background
information)
=item *
perlutil explains the command line utilities packaged with the Perl
distribution. [561+]
=back
The following platform-specific documents are available before
the installation as README.I<platform>, and after the installation
as perlI<platform>:
perlaix perlamiga perlapollo perlbeos perlbs2000
perlce perlcygwin perldgux perldos perlepoc perlfreebsd perlhpux
perlhurd perlirix perlmachten perlmacos perlmint perlmpeix
perlnetware perlos2 perlos390 perlplan9 perlqnx perlsolaris
perltru64 perluts perlvmesa perlvms perlvos perlwin32
These documents usually detail one or more of the following subjects:
configuring, building, testing, installing, and sometimes also using
Perl on the said platform.
Eastern Asian Perl users are now welcomed in their own languages:
README.jp (Japanese), README.ko (Korean), README.cn (simplified
Chinese) and README.tw (traditional Chinese), which are written in
normal pod but encoded in EUC-JP, EUC-KR, EUC-CN and Big5. These
will get installed as
perljp perlko perlcn perltw
=over 4
=item *
The documentation for the POSIX-BC platform is called "BS2000", to avoid
confusion with the Perl POSIX module.
=item *
The documentation for the WinCE platform is called perlce (README.ce
in the source code kit), to avoid confusion with the perlwin32
documentation on 8.3-restricted filesystems.
=back
=head1 Performance Enhancements
=over 4
=item *
map() could get pathologically slow when the result list it generates
is larger than the source list. The performance has been improved for
common scenarios. [561]
=item *
sort() is now fully reentrant, in the sense that the sort function
can itself call sort(). This did not work reliably in previous
releases. [561]
=item *
sort() has been changed to use primarily mergesort internally as
opposed to the earlier quicksort. For very small lists this may
result in slightly slower sorting times, but in general the speedup
should be at least 20%. Additional bonuses are that the worst case
behaviour of sort() is now better (in computer science terms it now
runs in time O(N log N), as opposed to quicksort's Theta(N**2)
worst-case run time behaviour), and that sort() is now stable
(meaning that elements with identical keys will stay ordered as they
were before the sort). See the C<sort> pragma for information.
The story in more detail: suppose you want to serve yourself a little
slice of Pi.
@digits = ( 3,1,4,1,5,9 );
A numerical sort of the digits will yield (1,1,3,4,5,9), as expected.
Which C<1> comes first is hard to know, since one C<1> looks pretty
much like any other. You can regard this as totally trivial,
or somewhat profound. However, if you just want to sort the even
digits ahead of the odd ones, then what will
sort { ($a % 2) <=> ($b % 2) } @digits;
yield? The only even digit, C<4>, will come first. But how about
the odd numbers, which all compare equal? With the quicksort algorithm
used to implement Perl 5.6 and earlier, the order of ties is left up
to the sort. So, as you add more and more digits of Pi, the order
in which the sorted even and odd digits appear will change.
And, for sufficiently large slices of Pi, the quicksort algorithm
in Perl 5.8 won't return the same results even if reinvoked with the
same input. The justification for this rests with quicksort's
worst case behavior. If you run
sort { $a <=> $b } ( 1 .. $N , 1 .. $N );
(something you might approximate if you wanted to merge two sorted
arrays using sort), doubling $N doesn't just double the quicksort time,
it I<quadruples> it. Quicksort has a worst case run time that can
grow like N**2, so-called I<quadratic> behaviour, and it can happen
on patterns that may well arise in normal use. You won't notice this
for small arrays, but you I<will> notice it with larger arrays,
and you may not live long enough for the sort to complete on arrays
of a million elements. So the 5.8 quicksort scrambles large arrays
before sorting them, as a statistical defence against quadratic behaviour.
But that means if you sort the same large array twice, ties may be
broken in different ways.
Because of the unpredictability of tie-breaking order, and the quadratic
worst-case behaviour, quicksort was I<almost> replaced completely with
a stable mergesort. I<Stable> means that ties are broken to preserve
the original order of appearance in the input array. So
sort { ($a % 2) <=> ($b % 2) } (3,1,4,1,5,9);
will yield (4,3,1,1,5,9), guaranteed. The even and odd numbers
appear in the output in the same order they appeared in the input.
Mergesort has worst case O(N log N) behaviour, the best value
attainable. And, ironically, this mergesort does particularly
well where quicksort goes quadratic: mergesort sorts (1..$N, 1..$N)
in O(N) time. But quicksort was rescued at the last moment because
it is faster than mergesort on certain inputs and platforms.
For example, if you really I<don't> care about the order of even
and odd digits, quicksort will run in O(N) time; it's very good
at sorting many repetitions of a small number of distinct elements.
The quicksort divide and conquer strategy works well on platforms
with relatively small, very fast, caches. Eventually, the problem gets
whittled down to one that fits in the cache, from which point it
benefits from the increased memory speed.
Quicksort was rescued by implementing a sort pragma to control aspects
of the sort. The B<stable> subpragma forces stable behaviour,
regardless of algorithm. The B<_quicksort> and B<_mergesort>
subpragmas are heavy-handed ways to select the underlying implementation.
The leading C<_> is a reminder that these subpragmas may not survive
beyond 5.8. More appropriate mechanisms for selecting the implementation
exist, but they wouldn't have arrived in time to save quicksort.
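To make the meaning of I<stable> concrete, here is a small sketch that
obtains the same tie-breaking order explicitly, whatever algorithm
sort() uses underneath. It sorts array indices rather than values and
falls back to the original position when two digits compare equal;
this index-based approach is an illustration, not how the core
implements stability.

```perl
use strict;
use warnings;

my @digits = (3, 1, 4, 1, 5, 9);

# Sort the indices: even digits first, ties broken by original
# position, which is exactly what a stable sort does implicitly.
my @order = sort {
    ($digits[$a] % 2) <=> ($digits[$b] % 2)
        or $a <=> $b
} 0 .. $#digits;

my @sorted = @digits[@order];
print "@sorted\n";    # prints "4 3 1 1 5 9"
```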
=item *
Hashes now use Bob Jenkins' "One-at-a-Time" hashing key algorithm
( http://burtleburtle.net/bob/hash/doobs.html ). This algorithm is
reasonably fast while producing a much better spread of values than
the old hashing algorithm (originally by Chris Torek, later tweaked by
Ilya Zakharevich). Hash values output from the algorithm on a hash of
all 3-char printable ASCII keys come much closer to passing the
DIEHARD random number generation tests. According to perlbench, this
change has not affected the overall speed of Perl.
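For readers curious about the algorithm itself, the published
"One-at-a-Time" function can be sketched in Perl as follows. This is
an illustration of the algorithm only, not the C code used inside the
interpreter; the function name is hypothetical, and the sketch assumes
a Perl built with 64-bit integers so that the 32-bit masking is exact.

```perl
use strict;
use warnings;

# Illustrative re-implementation of the one-at-a-time hash,
# keeping every intermediate value within 32 bits.
sub one_at_a_time {
    my ($key) = @_;
    my $hash = 0;
    for my $byte (unpack 'C*', $key) {
        $hash = ($hash + $byte)          & 0xFFFFFFFF;
        $hash = ($hash + ($hash << 10))  & 0xFFFFFFFF;
        $hash ^= $hash >> 6;
    }
    # Final mixing steps.
    $hash = ($hash + ($hash << 3))  & 0xFFFFFFFF;
    $hash ^= $hash >> 11;
    $hash = ($hash + ($hash << 15)) & 0xFFFFFFFF;
    return $hash;
}

my $hex = sprintf '%08x', one_at_a_time('a');
print "$hex\n";    # prints "ca2e9442"
```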
=item *
unshift() should now be noticeably faster.
=back
=head1 Installation and Configuration Improvements
=head2 Generic Improvements
=over 4
=item *
INSTALL now explains how you can configure Perl to use 64-bit
integers even on non-64-bit platforms.
=item *
Policy.sh policy change: if you are reusing a Policy.sh file
(see INSTALL) and you use Configure -Dprefix=/foo/bar and in the old
Policy $prefix eq $siteprefix and $prefix eq $vendorprefix, all of
them will now be changed to the new prefix, /foo/bar. (Previously
only $prefix changed.) If you do not like this new behaviour,
specify prefix, siteprefix, and vendorprefix explicitly.
=item *
A new optional location for Perl libraries, otherlibdirs, is available.
It can be used for example for vendor add-ons without disturbing Perl's
own library directories.
=item *
On many platforms, the vendor-supplied 'cc' is too stripped-down to
build Perl (basically, 'cc' doesn't do ANSI C). If this seems
to be the case and 'cc' does not seem to be the GNU C compiler
'gcc', an automatic attempt is made to find and use 'gcc' instead.
=item *
gcc needs to closely track the operating system release to avoid
build problems. If Configure finds that gcc was built for a different
operating system release than is running, it now gives a clearly visible
warning that there may be trouble ahead.
=item *
Since Perl 5.8 is not binary-compatible with previous releases
of Perl, Configure no longer suggests including the 5.005
modules in @INC.
=item *
Configure C<-S> can now run non-interactively. [561]
=item *
Configure support for pdp11-style memory models has been removed due
to obsolescence. [561]
=item *
configure.gnu now works with options with whitespace in them.
=item *
installperl now outputs everything to STDERR.
=item *
Because PerlIO is now the default on most platforms, "-perlio" doesn't
get appended to the $Config{archname} anymore.
Instead, if you explicitly choose not to use perlio (Configure command
line option -Uuseperlio), you will get "-stdio" appended.
=item *
Another change related to the architecture name is that "-64all"
(-Duse64bitall, or "maximally 64-bit") is appended only if your
pointers are 64 bits wide. (To be exact, the use64bitall is ignored.)
=item *
In AFS installations, one can configure the root of the AFS to be
somewhere else than the default F</afs> by using the Configure
parameter C<-Dafsroot=/some/where/else>.
=item *
APPLLIB_EXP, a lesser-known configuration-time definition, has been
documented. It can be used to prepend site-specific directories
to Perl's default search path (@INC); see INSTALL for information.
=item *
The version of Berkeley DB used when the Perl (and, presumably, the
DB_File extension) was built is now available as
C<@Config{qw(db_version_major db_version_minor db_version_patch)}>
from Perl and as C<DB_VERSION_MAJOR_CFG DB_VERSION_MINOR_CFG
DB_VERSION_PATCH_CFG> from C.
=item *
Building Berkeley DB3 for compatibility modes for DB, NDBM, and ODBM
has been documented in INSTALL.
=item *
If you have CPAN access (either network or a local copy such as a
CD-ROM) you can specify extra modules for Configure to build and
install with Perl using the -Dextras=... option. See INSTALL for
more details.
=item *
In addition to config.over, a new override file, config.arch, is
available. This file is supposed to be used by hints file writers
for architecture-wide changes (as opposed to config.over which is
for site-wide changes).
=item *
If your file system supports symbolic links, you can build Perl outside
of the source directory by
mkdir perl/build/directory
cd perl/build/directory
sh /path/to/perl/source/Configure -Dmksymlinks ...
This will create in perl/build/directory a tree of symbolic links
pointing to files in /path/to/perl/source. The original files are left
unaffected. After Configure has finished, you can just say
make all test
and Perl will be built and tested, all in perl/build/directory.
[561]
=item *
For Perl developers, several new make targets for profiling
and debugging have been added; see L<perlhack>.
=over 8
=item *
Use of the F<gprof> tool to profile Perl has been documented in
L<perlhack>. There is a make target called "perl.gprof" for
generating a gprofiled Perl executable.
=item *
If you have GCC 3, there is a make target called "perl.gcov" for
creating a gcoved Perl executable for coverage analysis. See
L<perlhack>.
=item *
If you are on IRIX or Tru64 platforms, new profiling/debugging options
have been added; see L<perlhack> for more information about pixie and
Third Degree.
=back
=item *
Guidelines on how to construct minimal Perl installations have
been added to INSTALL.
=item *
The Thread extension is now not built at all under ithreads
(C<Configure -Duseithreads>) because it wouldn't work anyway (the
Thread extension requires being Configured with C<-Duse5005threads>).
B<Note that the 5.005 threads are unsupported and deprecated: if you
have code written for the old threads you should migrate it to the
new ithreads model.>
=item *
The Gconvert macro ($Config{d_Gconvert}) used by perl for stringifying
floating-point numbers is now more picky about using sprintf %.*g
rules for the conversion. Some platforms that used to use gcvt may
now resort to the slower sprintf.
=item *
The obsolete method of making a special (e.g., debugging) flavor
of perl by saying
make LIBPERL=libperld.a
has been removed. Use -DDEBUGGING instead.
=back
=head2 New Or Improved Platforms
For the list of platforms known to support Perl,
see L<perlport/"Supported Platforms">.
=over 4
=item *
AIX dynamic loading should now be better supported.
=item *
AIX should now work better with gcc, threads, and 64-bitness. Also the
long doubles support in AIX should be better now. See L<perlaix>.
=item *
AtheOS ( http://www.atheos.cx/ ) is a new platform.
=item *
BeOS has been reclaimed.
=item *
The DG/UX platform now supports 5.005-style threads.
See L<perldgux>.
=item *
The DYNIX/ptx platform (also known as dynixptx) is supported at or
near osvers 4.5.2.
=item *
EBCDIC platforms (z/OS (also known as OS/390), POSIX-BC, and VM/ESA)
have been regained. Many test suite tests still fail and the
co-existence of Unicode and EBCDIC isn't quite settled, but the
situation is much better than with Perl 5.6. See L<perlos390>,
L<perlbs2000> (for POSIX-BC), and perlvmesa for more information.
(B<Note:> support for VM/ESA was removed in Perl v5.18.0. The relevant
information was in F<README.vmesa>)
=item *
Building perl with -Duseithreads or -Duse5005threads now works under
HP-UX 10.20 (previously it only worked under 10.30 or later). You will
need a thread library package installed. See README.hpux. [561]
=item *
Mac OS Classic is now supported in the mainstream source package
(MacPerl has of course been available since perl 5.004 but now the
source code bases of standard Perl and MacPerl have been synchronised)
[561]
=item *
Mac OS X (or Darwin) should now be able to build Perl even on HFS+
filesystems. (The case-insensitivity used to confuse the Perl build
process.)
=item *
NCR MP-RAS is now supported. [561]
=item *
All the NetBSD specific patches (except for the installation
specific ones) have been merged back to the main distribution.
=item *
NetWare from Novell is now supported. See L<perlnetware>.
=item *
NonStop-UX is now supported. [561]
=item *
NEC SUPER-UX is now supported.
=item *
All the OpenBSD specific patches (except for the installation
specific ones) have been merged back to the main distribution.
=item *
Perl has been tested with the GNU pth userlevel thread package
( http://www.gnu.org/software/pth/pth.html ). All thread tests
of Perl now work, but not without adding some yield()s to the tests,
so while pth (and other userlevel thread implementations) can be
considered to be "working" with Perl ithreads, keep in mind the
possible non-preemptability of the underlying thread implementation.
=item *
Stratus VOS is now supported using Perl's native build method
(Configure). This is the recommended method to build Perl on
VOS. The older methods, which build miniperl, are still
available. See L<perlvos>. [561+]
=item *
The Amdahl UTS Unix mainframe platform is now supported. [561]
=item *
WinCE is now supported. See L<perlce>.
=item *
z/OS (formerly known as OS/390, formerly known as MVS OE) now has
support for dynamic loading. This is not selected by default,
however; you must specify -Dusedl in the arguments of Configure. [561]
=back
=head1 Selected Bug Fixes
Numerous memory leaks and uninitialized memory accesses have been
hunted down. Most importantly, anonymous subs used to leak quite
a bit. [561]
=over 4
=item *
The autouse pragma didn't work for Multi::Part::Function::Names.
=item *
caller() could cause core dumps in certain situations. Carp was
sometimes affected by this problem. In particular, caller() now
returns a subroutine name of C<(unknown)> for subroutines that have
been removed from the symbol table.
=item *
chop(@list) in list context returned the characters chopped in
reverse order. This has been reversed to be in the right order. [561]
=item *
Configure no longer includes the DBM libraries (dbm, gdbm, db, ndbm)
when building the Perl binary. The only exception to this is SunOS 4.x,
which needs them. [561]
=item *
The behaviour of non-decimal but numeric string constants such as
"0x23" was platform-dependent: in some platforms that was seen as 35,
in some as 0, in some as a floating point number (don't ask). This
was caused by Perl's using the operating system libraries in a situation
where the result of the string to number conversion is undefined: now
Perl consistently handles such strings as zero in numeric contexts.
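A short sketch of the consistent behaviour: the string C<"0x23"> is
zero in numeric context, while hex() and oct() interpret the prefix
explicitly. The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "0x23";

# In numeric context only the leading "0" is numeric, so the value
# is zero (the 'numeric' warning is silenced for the demonstration).
my $as_number = do { no warnings 'numeric'; $string + 0 };

# hex() and oct() both understand the 0x prefix.
my $via_hex = hex($string);
my $via_oct = oct($string);

print "$as_number $via_hex $via_oct\n";    # prints "0 35 35"
```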
=item *
Several debugger fixes: exit code now reflects the script exit code,
condition C<"0"> now treated correctly, the C<d> command now checks
line number, C<$.> no longer gets corrupted, and all debugger output
now goes correctly to the socket if RemotePort is set. [561]
=item *
The debugger (perl5db.pl) has been modified to present a more
consistent commands interface, via (CommandSet=580). perl5db.t was
also added to test the changes, and as a placeholder for further tests.
See L<perldebug>.
=item *
The debugger has a new C<dumpDepth> option to control the maximum
depth to which nested structures are dumped. The C<x> command has
been extended so that C<x N EXPR> dumps out the value of I<EXPR> to a
depth of at most I<N> levels.
=item *
The debugger can now show lexical variables if you have the CPAN
module PadWalker installed.
=item *
The order of DESTROYs has been made more predictable.
=item *
Perl 5.6.0 could emit spurious warnings about redefinition of
dl_error() when statically building extensions into perl.
This has been corrected. [561]
=item *
L<dprofpp> -R didn't work.
=item *
C<*foo{FORMAT}> now works.
=item *
Infinity is now recognized as a number.
=item *
UNIVERSAL::isa no longer caches methods incorrectly. (This broke
the Tk extension with 5.6.0.) [561]
=item *
Lexicals I: lexicals outside an eval "" weren't resolved
correctly inside a subroutine definition inside the eval "" if they
were not already referenced in the top level of the eval""ed code.
=item *
Lexicals II: lexicals leaked at file scope into subroutines that
were declared before the lexicals.
=item *
Lexical warnings now propagate correctly between scopes
and into C<eval "...">.
=item *
C<use warnings qw(FATAL all)> did not work as intended. This has been
corrected. [561]
=item *
warnings::enabled() now reports the state of $^W correctly if the caller
isn't using lexical warnings. [561]
=item *
Line renumbering with eval and C<#line> now works. [561]
=item *
Fixed numerous memory leaks, especially in eval "".
=item *
Localised tied variables no longer leak memory:
use Tie::Hash;
tie my %tied_hash => 'Tie::StdHash';
...
# Used to leak memory every time local() was called;
# in a loop, this added up.
local($tied_hash{Foo}) = 1;
=item *
Localised hash elements (and %ENV) are correctly unlocalised to not
exist, if they didn't before they were localised.
use Tie::Hash;
tie my %tied_hash => 'Tie::StdHash';
...
# Nothing has set the FOO element so far
{ local $tied_hash{FOO} = 'Bar' }
# This used to print, but not now.
print "exists!\n" if exists $tied_hash{FOO};
As a side effect of this fix, tied hash interfaces B<must> define
the EXISTS and DELETE methods.
=item *
mkdir() now ignores trailing slashes in the directory name,
as mandated by POSIX.
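A minimal sketch of the trailing-slash behaviour, using a scratch
directory from File::Temp so nothing is left behind. The directory
name C<reports> is a hypothetical example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch area that is removed automatically at exit.
my $base = tempdir(CLEANUP => 1);

# Note the trailing slash on the name passed to mkdir().
my $dir = "$base/reports/";
mkdir $dir or die "mkdir failed: $!";

my $exists = -d "$base/reports" ? 1 : 0;
print $exists ? "created\n" : "missing\n";
```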
=item *
Some versions of glibc have a broken modfl(). This affects builds
with C<-Duselongdouble>. This version of Perl detects this brokenness
and has a workaround for it. The glibc release 2.2.2 is known to have
fixed the modfl() bug.
=item *
Modulus of unsigned numbers now works (4063328477 % 65535 used to
return 27406, instead of 27407). [561]
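The arithmetic can be checked with a short sketch that computes the
remainder directly and then again after deliberately misreading the
operand as a signed 32-bit value, which is the assumed cause of the
old result. The variable names are illustrative.

```perl
use strict;
use warnings;

my $n = 4063328477;    # too large for a signed 32-bit integer

# Remainder of the value treated as unsigned.
my $unsigned = $n % 65535;

# The same bits misread as a signed 32-bit integer (-231638819).
my $as_signed = $n - 2**32;
my $signed    = $as_signed % 65535;

print "$unsigned $signed\n";
```

The two results differ by one because 2**32 leaves a remainder of 1
when divided by 65535.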
=item *
Some "not a number" warnings introduced in 5.6.0 have been eliminated
to be more compatible with 5.005. Infinity is now recognised as a
number. [561]
=item *
Numeric conversions did not recognize changes in the string value
properly in certain circumstances. [561]
=item *
Attributes (such as :shared) didn't work with our().
=item *
our() variables will not cause bogus "Variable will not stay shared"
warnings. [561]
=item *
"our" variables of the same name declared in two sibling blocks
resulted in bogus warnings about "redeclaration" of the variables.
The problem has been corrected. [561]
=item *
pack "Z" now correctly terminates the string with "\0".
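As a brief sketch of the termination rule, packing with an explicit
count reserves the last byte for the null, and C<Z*> appends one. The
sample strings are arbitrary.

```perl
use strict;
use warnings;

# With a count, only count-1 bytes of data are kept, then "\0".
my $sized = pack('Z5', 'abcdefgh');

# With "*", the whole string is kept and "\0" is appended.
my $unsized = pack('Z*', 'abc');

print length($sized), " ", length($unsized), "\n";    # prints "5 4"
```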
=item *
Fix password routines which in some shadow password platforms
(e.g. HP-UX) caused getpwent() to return every other entry.
=item *
The PERL5OPT environment variable (for passing command line arguments
to Perl) didn't work for more than a single group of options. [561]
=item *
PERL5OPT with embedded spaces didn't work.
=item *
printf() no longer resets the numeric locale to "C".
=item *
C<qw(a\\b)> now parses correctly as C<'a\\b'>: that is, as three
characters, not four. [561]
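A quick check of the corrected parsing, comparing the C<qw> form with
the equivalent single-quoted string:

```perl
use strict;
use warnings;

# Inside qw(), as inside single quotes, \\ stands for one backslash.
my ($word) = qw(a\\b);
my $quoted = 'a\\b';

print length($word), "\n";    # prints "3"
```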
=item *
pos() did not return the correct value within s///ge in earlier
versions. This is now handled correctly. [561]
=item *
Printing quads (64-bit integers) with printf/sprintf now works
without the q L ll prefixes (assuming you are on a quad-capable platform).
=item *
Regular expressions on references and overloaded scalars now work. [561+]
=item *
Right-hand side magic (GMAGIC) could in many cases such as string
concatenation be invoked too many times.
=item *
scalar() now forces scalar context even when used in void context.
=item *
SOCKS support is now much more robust.
=item *
sort() arguments are now compiled in the right wantarray context
(they were accidentally using the context of the sort() itself).
The comparison block is now run in scalar context, and the arguments
to be sorted are always provided in list context. [561]
=item *
Changed the POSIX character class C<[[:space:]]> to include the (very
rarely used) vertical tab character. Added a new POSIX-ish character
class C<[[:blank:]]> which stands for horizontal whitespace
(currently, the space and the tab).
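The two classes can be compared with a small sketch; the variable
names are illustrative.

```perl
use strict;
use warnings;

my $vertical_tab = "\x0B";

# [[:space:]] includes the vertical tab.
my $vt_is_space = $vertical_tab =~ /\A[[:space:]]\z/ ? 1 : 0;

# [[:blank:]] covers horizontal whitespace only.
my $tab_is_blank = "\t" =~ /\A[[:blank:]]\z/ ? 1 : 0;
my $nl_is_blank  = "\n" =~ /\A[[:blank:]]\z/ ? 1 : 0;

print "$vt_is_space $tab_is_blank $nl_is_blank\n";    # prints "1 1 0"
```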
=item *
The tainting behaviour of sprintf() has been rationalized. It does
not taint the result of floating point formats anymore, making the
behaviour consistent with that of string interpolation. [561]
=item *
Some cases of inconsistent taint propagation (such as within hash
values) have been fixed.
=item *
The RE engine found in Perl 5.6.0 accidentally pessimised certain kinds
of simple pattern matches. These are now handled better. [561]
=item *
Regular expression debug output (whether through C<use re 'debug'>
or via C<-Dr>) now looks better. [561]
=item *
Multi-line matches like C<"a\nxb\n" =~ /(?!\A)x/m> were flawed. The
bug has been fixed. [561]
=item *
Use of $& could trigger a core dump under some situations. This
is now avoided. [561]
=item *
The regular expression captured submatches ($1, $2, ...) are now
more consistently unset if the match fails, instead of leaving false
data lying around in them. [561]
=item *
readline() on files opened in "slurp" mode could return an extra
"" (blank line) at the end in certain situations. This has been
corrected. [561]
=item *
Autovivification of symbolic references of special variables described
in L<perlvar> (as in C<${$num}>) was accidentally disabled. This works
again now. [561]
=item *
Sys::Syslog ignored the C<LOG_AUTH> constant.
=item *
$AUTOLOAD, sort(), lock(), and spawning subprocesses
in multiple threads simultaneously are now thread-safe.
=item *
Tie::Array's SPLICE method was broken.
=item *
Allow a read-only string on the left-hand side of a non-modifying tr///.
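For example, a string literal is read-only, yet a counting C<tr///>
(one with an empty replacement list and no modifiers) can be applied
to it. The sample string is arbitrary.

```perl
use strict;
use warnings;

# tr/o// with an empty replacement only counts matches, so it is
# permitted on a constant string.
my $count = ('hello world' =~ tr/o//);

print "$count\n";    # prints "2"
```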
=item *
If C<STDERR> is tied, warnings caused by C<warn> and C<die> now
correctly pass to it.
=item *
Several Unicode fixes.
=over 8
=item *
BOMs (byte order marks) at the beginning of Perl files
(scripts, modules) should now be transparently skipped.
UTF-16 and UCS-2 encoded Perl files should now be read correctly.
=item *
The character tables have been updated to Unicode 3.2.0.
=item *
Comparing with utf8 data does not magically upgrade non-utf8 data
into utf8. (This was a problem for example if you were mixing data
from I/O and Unicode data: your output might have got magically encoded
as UTF-8.)
=item *
Generating illegal Unicode code points such as U+FFFE, or the UTF-16
surrogates, now also generates an optional warning.
=item *
C<IsAlnum>, C<IsAlpha>, and C<IsWord> now match titlecase.
=item *
Concatenation with the C<.> operator or via variable interpolation,
C<eq>, C<substr>, C<reverse>, C<quotemeta>, the C<x> operator,
substitution with C<s///>, single-quoted UTF-8, should now work.
=item *
The C<tr///> operator now works. Note that the C<tr///CU>
functionality has been removed (but see pack('U0', ...)).
=item *
C<eval "v200"> now works.
=item *
Perl 5.6.0 parsed m/\x{ab}/ incorrectly, leading to spurious warnings.
This has been corrected. [561]
=item *
Zero entries were missing from the Unicode classes such as C<IsDigit>.
=back
=item *
Large unsigned numbers (those above 2**31) could sometimes lose their
unsignedness, causing bogus results in arithmetic operations. [561]
=item *
The Perl parser has been stress tested using both random input and
Markov chain input and the few found crashes and lockups have been
fixed.
=back
=head2 Platform Specific Changes and Fixes
=over 4
=item *
BSDI 4.*
Perl now works on post-4.0 BSD/OSes.
=item *
All BSDs
Setting C<$0> now works (as much as possible; see L<perlvar> for details).
=item *
Cygwin
Numerous updates; currently synchronised with Cygwin 1.3.10.
=item *
Previously DYNIX/ptx had problems in its Configure probe for non-blocking I/O.
=item *
EPOC
EPOC now better supported. See README.epoc. [561]
=item *
FreeBSD 3.*
Perl now works on post-3.0 FreeBSDs.
=item *
HP-UX
README.hpux updated; C<Configure -Duse64bitall> now works;
now uses HP-UX malloc instead of Perl malloc.
=item *
IRIX
Numerous compilation flag and hint enhancements; accidental mixing
of 32-bit and 64-bit libraries (a doomed attempt) made much harder.
=item *
Linux
=over 8
=item *
Long doubles should now work (see INSTALL). [561]
=item *
Linux previously had problems related to sockaddrlen when using
accept(), recvfrom() (in Perl: recv()), getpeername(), and
getsockname().
=back
=item *
Mac OS Classic
Compilation of the standard Perl distribution in Mac OS Classic should
now work if you have the Metrowerks development environment and the
missing Mac-specific toolkit bits. Contact the macperl mailing list
for details.
=item *
MPE/iX
MPE/iX update after Perl 5.6.0. See README.mpeix. [561]
=item *
NetBSD/threads: try installing the GNU pth (should be in the
packages collection, or http://www.gnu.org/software/pth/),
and Configure with -Duseithreads.
=item *
NetBSD/sparc
Perl now works on NetBSD/sparc.
=item *
OS/2
Now works with usethreads (see INSTALL). [561]
=item *
Solaris
64-bitness using the Sun Workshop compiler now works.
=item *
Stratus VOS
The native build method requires at least VOS Release 14.5.0
and GNU C++/GNU Tools 2.0.1 or later. The Perl pack function
now maps overflowed values to +infinity and underflowed values
to -infinity.
=item *
Tru64 (aka Digital UNIX, aka DEC OSF/1)
The operating system version letter is now recorded in $Config{osvers}.
Compiling with gcc is now allowed (previously it was explicitly
forbidden), but it is still not recommended because buggy code
results, even with gcc 2.95.2.
=item *
Unicos
Fixed various alignment problems that led to core dumps either
during build or later; no longer dies on math errors at runtime;
now using full quad integers (64 bits), previously was using
only 46 bit integers for speed.
=item *
VMS
See L</"Socket Extension Dynamic in VMS"> and L</"IEEE-format Floating Point
Default on OpenVMS Alpha"> for important changes not otherwise listed here.
chdir() now works better despite a CRT bug; now works with MULTIPLICITY
(see INSTALL); now works with Perl's malloc.
The tainting of C<%ENV> elements via C<keys> or C<values> was previously
unimplemented. It now works as documented.
The C<waitpid> emulation has been improved. The worst bug (now fixed)
was that a pid of -1 would cause a wildcard search of all processes on
the system.
POSIX-style signals are now emulated much better on VMS versions prior
to 7.0.
The C<system> function and backticks operator have improved
functionality and better error handling. [561]
File access tests now use current process privileges rather than the
user's default privileges, which could sometimes result in a mismatch
between reported access and actual access. This improvement is only
available on VMS v6.0 and later.
There is a new C<kill> implementation based on C<sys$sigprc> that allows
older VMS systems (pre-7.0) to use C<kill> to send signals rather than
simply force exit. This implementation also allows later systems to
call C<kill> from within a signal handler.
Iterative logical name translations are now limited to 10 iterations in
imitation of SHOW LOGICAL and other OpenVMS facilities.
=item *
Windows
=over 8
=item *
Signal handling now works better than it used to. It is now implemented
using a Windows message loop, and is therefore less prone to random
crashes.
=item *
fork() emulation is now more robust, but still continues to have a few
esoteric bugs and caveats. See L<perlfork> for details. [561+]
=item *
A failed (pseudo)fork now returns undef and sets errno to EAGAIN. [561]
=item *
The following modules now work on Windows:
ExtUtils::Embed [561]
IO::Pipe
IO::Poll
Net::Ping
=item *
IO::File::new_tmpfile() is no longer limited to 32767 invocations
per-process.
=item *
Better chdir() return value for a non-existent directory.
=item *
Compiling perl using the 64-bit Platform SDK tools is now supported.
=item *
The Win32::SetChildShowWindow() builtin can be used to control the
visibility of windows created by child processes. See L<Win32> for
details.
=item *
Non-blocking waits for child processes (or pseudo-processes) are
supported via C<waitpid($pid, &POSIX::WNOHANG)>.
=item *
The behavior of system() with multiple arguments has been rationalized.
Each unquoted argument will be automatically quoted to protect whitespace,
and any existing whitespace in the arguments will be preserved. This
improves the portability of system(@args) by avoiding the need for
Windows C<cmd> shell specific quoting in perl programs.
Note that this means that some scripts that may have relied on earlier
buggy behavior may no longer work correctly. For example,
C<system("nmake /nologo", @args)> will now attempt to run the file
C<nmake /nologo> and will fail when such a file isn't found.
On the other hand, perl will now execute code such as
C<system("c:/Program Files/MyApp/foo.exe", @args)> correctly.
=item *
The perl header files no longer suppress common warnings from the
Microsoft Visual C++ compiler. This means that additional warnings may
now show up when compiling XS code.
=item *
Borland C++ v5.5 is now a supported compiler that can build Perl.
However, the generated binaries continue to be incompatible with those
generated by the other supported compilers (GCC and Visual C++). [561]
=item *
Duping socket handles with open(F, ">&MYSOCK") now works under Windows 9x.
[561]
=item *
Current directory entries in %ENV are now correctly propagated to child
processes. [561]
=item *
New %ENV entries now propagate to subprocesses. [561]
=item *
Win32::GetCwd() correctly returns C:\ instead of C: when at the drive root.
Other bugs in chdir() and Cwd::cwd() have also been fixed. [561]
=item *
The makefiles now default to the features enabled in ActiveState ActivePerl
(a popular Win32 binary distribution). [561]
=item *
HTML files will now be installed in c:\perl\html instead of
c:\perl\lib\pod\html
=item *
REG_EXPAND_SZ keys are now allowed in registry settings used by perl. [561]
=item *
Can now send() from all threads, not just the first one. [561]
=item *
ExtUtils::MakeMaker now uses $ENV{LIB} to search for libraries. [561]
=item *
Less stack reserved per thread so that more threads can run
concurrently. (Still 16M per thread.) [561]
=item *
C<< File::Spec->tmpdir() >> now prefers C:/temp over /tmp
(works better when perl is running as a service).
=item *
Better UNC path handling under ithreads. [561]
=item *
wait(), waitpid(), and backticks now return the correct exit status
under Windows 9x. [561]
=item *
A socket handle leak in accept() has been fixed. [561]
=back
=back
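The rationalized multi-argument system() described in the Windows notes
above can be sketched as follows. This is only an illustration, not part
of the original notes: it uses C<$^X> (the path of the running perl) as a
stand-in for an external program, and the argument values are made up.

```perl
use strict;
use warnings;

# $^X is the path of the running perl; it stands in for any external program.
my @args = ('two words', 'x');

# List form: the first element is the program, and every later element is
# passed as exactly one argument, whitespace included.
my $ok_status = system($^X, '-e', 'exit(@ARGV == 2 ? 0 : 1)', @args);

# Fusing an option into the program name makes perl look for a file that is
# literally named "<perl> -e1", which is the failure mode described above.
my $bad_status;
{
    no warnings 'exec';    # silence the expected "Can't exec" warning
    $bad_status = system("$^X -e1", @args);
}

print "list form exit status: ", $ok_status >> 8, "\n";
print "fused form ", ($bad_status == 0 ? "ran" : "failed to run"), "\n";
```

Because the program path is a single list element, a path containing
spaces needs no shell-specific quoting in the list form.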
=head1 New or Changed Diagnostics
Please see L<perldiag> for more details.
=over 4
=item *
Ambiguous range in the transliteration operator (like a-z-9) now
gives a warning.
=item *
chdir("") and chdir(undef) now give a deprecation warning because they
cause a possible unintentional chdir to the home directory.
Say chdir() if you really mean that.
=item *
Two new debugging options have been added: if you have compiled your
Perl with debugging, you can use the -DT [561] and -DR options to trace
tokenising and to add reference counts when displaying variables,
respectively.
=item *
The lexical warnings category "deprecated" is no longer a sub-category
of the "syntax" category. It is now a top-level category in its own
right.
=item *
Unadorned dump() will now give a warning suggesting the use of an
explicit CORE::dump() if that is what is really meant.
=item *
The "Unrecognized escape" warning has been extended to include C<\8>,
C<\9>, and C<\_>. There is no need to escape any of the C<\w> characters.
=item *
All regular expression compilation error messages are now hopefully
easier to understand both because the error message now comes before
the failed regex and because the point of failure is now clearly
marked by a C<E<lt>-- HERE> marker.
=item *
Various I/O (and socket) functions like binmode(), close(), and so
forth now more consistently warn if they are used illogically either
on a yet unopened or on an already closed filehandle (or socket).
=item *
Using lstat() on a filehandle now gives a warning. (It's a nonsensical
thing to do.)
=item *
The C<-M> and C<-m> options now warn if you didn't supply the module name.
=item *
If you specify a required minimum version in C<use>, modules matching
the name but not defining a $VERSION will cause a fatal failure.
=item *
Using negative offset for vec() in lvalue context is now a warnable offense.
=item *
Odd number of arguments to overload::constant now elicits a warning.
=item *
Odd number of elements in anonymous hash now elicits a warning.
=item *
The various "opened only for", "on closed", "never opened" warnings
drop the C<main::> prefix for filehandles in the C<main> package,
for example C<STDIN> instead of C<main::STDIN>.
=item *
Subroutine prototypes are now checked more carefully; you may get
warnings if, for example, you have used non-prototype characters.
=item *
If an attempt to use a (non-blessed) reference as an array index
is made, a warning is given.
=item *
C<push @a;> and C<unshift @a;> (with no values to push or unshift)
now give a warning. This may be a problem for generated and eval'ed
code.
=item *
If you try to L<perlfunc/pack> a number less than 0 or larger than 255
using the C<"C"> format you will get an optional warning. Similarly
for the C<"c"> format and a number less than -128 or more than 127.
=item *
pack C<P> format now demands an explicit size.
=item *
unpack C<w> now warns of unterminated compressed integers.
=item *
Warnings relating to the use of PerlIO have been added.
=item *
Certain regex modifiers such as C<(?o)> make sense only if applied to
the entire regex. You will get an optional warning if you try to do
otherwise.
=item *
Variable length lookbehind has not yet been implemented; trying to
use it now produces a diagnostic saying so.
=item *
Using arrays or hashes as references (e.g. C<< %foo->{bar} >>)
has been deprecated for a while. Now you will get an optional warning.
=item *
Warnings relating to the use of the new restricted hashes feature
have been added.
=item *
Self-ties of arrays and hashes are not supported, and even an attempt
to create one causes a fatal error.
=item *
Using C<sort> in scalar context now issues an optional warning.
This didn't do anything useful, as the sort was not performed.
=item *
Using the /g modifier in split() is meaningless and will cause a warning.
=item *
Using splice() past the end of an array now causes a warning.
=item *
Malformed Unicode encodings (UTF-8 and UTF-16) cause a lot of warnings,
as does trying to use UTF-16 surrogates (which are unimplemented).
=item *
Trying to use Unicode characters on an I/O stream without marking the
stream's encoding (using open() or binmode()) will cause "Wide character"
warnings.
=item *
Use of v-strings in use/require causes a (backward) portability warning.
=item *
Warnings relating to the use of interpreter threads and their shared
data have been added.
=back
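A few of the diagnostics listed above can be observed with a short
script. This is a minimal sketch rather than an exhaustive tour: the
variable names are illustrative, the warnings are collected through a
C<$SIG{__WARN__}> handler, and in-memory filehandles stand in for real
I/O streams. The exact warning wording is assumed to match L<perldiag>.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR.
my @seen;
local $SIG{__WARN__} = sub { push @seen, $_[0] };

# Odd number of elements in an anonymous hash.
my @list = (a => 1, 'b');
my $href = { @list };

# A value outside 0..255 packed with the "C" format.
my $n    = 300;
my $byte = pack("C", $n);

# A Unicode character written to a stream with no declared encoding.
open(my $raw, '>', \my $raw_buf) or die "open: $!";
print {$raw} "\x{263A}";
close($raw);

# Declaring the encoding when opening the stream avoids the warning.
my $before = @seen;
open(my $utf8, '>:encoding(UTF-8)', \my $utf8_buf) or die "open: $!";
print {$utf8} "\x{263A}";
close($utf8);
my $quiet_with_encoding = (@seen == $before);

print "caught: $_" for @seen;
```

Calling binmode() with an encoding layer on an already open handle is
the other way the text mentions for marking a stream's encoding.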
=head1 Changed Internals
=over 4
=item *
PerlIO is now the default.
=item *
perlapi.pod (a companion to perlguts) now attempts to document the
internal API.
=item *
You can now build a really minimal perl called microperl.
Building microperl does not require even running Configure;
C<make -f Makefile.micro> should be enough. Beware: microperl makes
many assumptions, some of which may be too bold; the resulting
executable may crash or otherwise misbehave in wondrous ways.
For careful hackers only.
=item *
Added rsignal(), whichsig(), do_join(), op_clear, op_null,
ptr_table_clear(), ptr_table_free(), sv_setref_uv(), and several UTF-8
interfaces to the publicised API. For the full list of the available
APIs see L<perlapi>.
=item *
It is now possible to propagate customised exceptions via croak()ing.
=item *
Now xsubs can have attributes just like subs. (Well, at least the
built-in attributes.)
=item *
dTHR and djSP have been obsoleted; the former removed (because it's
a no-op) and the latter replaced with dSP.
=item *
PERL_OBJECT has been completely removed.
=item *
The MAGIC constants (e.g. C<'P'>) have been macrofied
(e.g. C<PERL_MAGIC_TIED>) for better source code readability
and maintainability.
=item *
The regex compiler now maintains a structure that identifies nodes in
the compiled bytecode with the corresponding syntactic features of the
original regex expression. The information is attached to the new
C<offsets> member of the C<struct regexp>. See L<perldebguts> for more
complete information.
=item *
The C code has been made much more C<gcc -Wall> clean. Some warning
messages still remain in some platforms, so if you are compiling with
gcc you may see some warnings about dubious practices. The warnings
are being worked on.
=item *
F<perly.c>, F<sv.c>, and F<sv.h> have now been extensively commented.
=item *
Documentation on how to use the Perl source repository has been added
to F<Porting/repository.pod>.
=item *
There are now several profiling make targets.
=back
=head1 Security Vulnerability Closed [561]
(This change was already made in 5.7.0 but bears repeating here.)
(5.7.0 came out before 5.6.1: the development branch 5.7 was released
earlier than the maintenance branch 5.6.)
A potential security vulnerability in the optional suidperl component
of Perl was identified in August 2000. suidperl is neither built nor
installed by default. As of November 2001 the only known vulnerable
platform is Linux, most likely all Linux distributions. CERT and
various vendors and distributors have been alerted about the vulnerability.
See http://www.cpan.org/src/5.0/sperl-2000-08-05/sperl-2000-08-05.txt
for more information.
The problem was caused by Perl trying to report a suspected security
exploit attempt using an external program, /bin/mail. On Linux
platforms the /bin/mail program had an undocumented feature which
when combined with suidperl gave access to a root shell, resulting in
a serious compromise instead of reporting the exploit attempt. If you
don't have /bin/mail, or if you have 'safe setuid scripts', or if
suidperl is not installed, you are safe.
The exploit attempt reporting feature has been completely removed from
Perl 5.8.0 (and the maintenance release 5.6.1, and it was removed also
from all the Perl 5.7 releases), so that particular vulnerability
isn't there anymore. However, further security vulnerabilities are,
unfortunately, always possible. The suidperl functionality is most
probably going to be removed in Perl 5.10. In any case, suidperl
should only be used by security experts who know exactly what they are
doing and why they are using suidperl instead of some other solution
such as sudo ( see http://www.courtesan.com/sudo/ ).
=head1 New Tests
Several new tests have been added, especially for the F<lib> and
F<ext> subsections. There are now about 69 000 individual tests
(spread over about 700 test scripts) in the regression suite (5.6.1
has about 11 700 tests, in 258 test scripts). The exact numbers depend
on the platform and Perl configuration used. Many of the new tests
are of course introduced by the new modules, but still in general Perl
is now more thoroughly tested.
Because of the large number of tests, running the regression suite
will take considerably longer than it used to: expect the suite
to take up to 4-5 times longer to run than in perl 5.6. On a really
fast machine you can hope to finish the suite in about 6-8 minutes
(wallclock time).
The tests are now reported in a different order than in earlier Perls.
(This happens because the test scripts from under t/lib have been moved
to be closer to the library/extension they are testing.)
=head1 Known Problems
=head2 The Compiler Suite Is Still Very Experimental
The compiler suite is slowly getting better but it continues to be
highly experimental. Use in production environments is discouraged.
=head2 Localising Tied Arrays and Hashes Is Broken
local %tied_array;
doesn't work as one would expect: the old value is restored
incorrectly. This will be changed in a future release, but we don't
know yet what the new semantics will exactly be. In any case, the
change will break existing code that relies on the current
(ill-defined) semantics, so just avoid doing this in general.
=head2 Building Extensions Can Fail Because Of Largefiles
Some extensions like mod_perl are known to have issues with
`largefiles', a change brought by Perl 5.6.0 in which file offsets
default to 64 bits wide, where supported. Modules may fail to compile
at all, or they may compile and work incorrectly. Currently, there
is no good solution for the problem, but Configure now provides
appropriate non-largefile ccflags, ldflags, libswanted, and libs
in the %Config hash (e.g., $Config{ccflags_nolargefiles}) so the
extensions that are having problems can try configuring themselves
without the largefileness. This is admittedly not a clean solution,
and the solution may not even work at all. One potential failure is
whether one can (or, if one can, whether it's a good idea to) link
together at all binaries with different ideas about file offsets;
all this is platform-dependent.
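As a rough sketch of how an extension's build script might look up the
non-largefile settings mentioned above, the following reads the four
C<_nolargefiles> entries from C<%Config>. The loop and the hash name
C<%nolargefiles> are illustrative only; what the values contain, and
whether using them actually helps, depends on the platform.

```perl
use strict;
use warnings;
use Config;

# Look up the non-largefile counterpart of each build setting.
my %nolargefiles;
for my $name (qw(ccflags ldflags libswanted libs)) {
    my $value = $Config{"${name}_nolargefiles"};
    $nolargefiles{$name} = defined $value ? $value : '';
    print "${name}_nolargefiles = '$nolargefiles{$name}'\n";
}

# Report whether this perl was configured with largefile support at all.
my $uses_largefiles = $Config{uselargefiles} ? 'yes' : 'no';
print "uselargefiles: $uses_largefiles\n";
```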
=head2 Modifying $_ Inside for(..)
for (1..5) { $_++ }
works without complaint. It shouldn't. (You should be able to
modify only lvalue elements inside the loops.) You can see the
correct behaviour by replacing the 1..5 with 1, 2, 3, 4, 5.
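The difference between the two loops can be made visible by trapping the
error. This is a small sketch with an illustrative helper, C<try_loop>,
and it assumes a perl in which literal constants are read-only.

```perl
use strict;
use warnings;

# Run a code reference and return the error message, or '' on success.
sub try_loop {
    my ($code) = @_;
    return eval { $code->(); 1 } ? '' : $@;
}

# A range: modifying $_ is silently accepted.
my $range_error = try_loop(sub { for (1..5) { $_++ } });

# A list of literal constants: modifying $_ is rejected at run time.
my $list_error = try_loop(sub { for (1, 2, 3, 4, 5) { $_++ } });

print "range loop: ", ($range_error || "no error\n");
print "list loop:  ", ($list_error  || "no error\n");
```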
=head2 mod_perl 1.26 Doesn't Build With Threaded Perl
Use mod_perl 1.27 or higher.
=head2 lib/ftmp-security tests warn 'system possibly insecure'
Don't panic. Read the 'make test' section of INSTALL instead.
=head2 libwww-perl (LWP) fails base/date #51
Use libwww-perl 5.65 or later.
=head2 PDL failing some tests
Use PDL 2.3.4 or later.
=head2 Perl_get_sv
You may get errors like 'Undefined symbol "Perl_get_sv"' or "can't
resolve symbol 'Perl_get_sv'", or the symbol may be "Perl_sv_2pv".
This probably means that you are trying to use an older shared Perl
library (or extensions linked with such) with a Perl 5.8.0 executable.
Perl used to have such a subroutine, but that is no longer the case.
Check your shared library path, and any shared Perl libraries in those
directories.
Sometimes this problem may also indicate a partial Perl 5.8.0
installation; see L</"Mac OS X dyld undefined symbols"> for an
example and how to deal with it.
=head2 Self-tying Problems
Self-tying of arrays and hashes is broken in rather deep and
hard-to-fix ways. As a stop-gap measure to keep people from getting
frustrated at the mysterious results (core dumps, most often), it is
forbidden for now (you will get a fatal error even from an attempt).
A change to self-tying of globs has caused them to be recursively
referenced (see: L<perlobj/"Two-Phased Garbage Collection">). You
will now need an explicit untie to destroy a self-tied glob. This
behaviour may be fixed at a later date.
Self-tying of scalars and IO thingies works.
=head2 ext/threads/t/libc
If this test fails, it indicates that your libc (C library) is not
threadsafe. This particular test stress tests the localtime() call to
find out whether it is threadsafe. See L<perlthrtut> for more information.
=head2 Failure of Thread (5.005-style) tests
B<Note that support for 5.005-style threading is deprecated,
experimental and practically unsupported. In 5.10, it is expected
to be removed. You should migrate your code to ithreads.>
The following tests are known to fail due to fundamental problems in
the 5.005 threading implementation. These are not new failures--Perl
5.005_0x has the same bugs, but didn't have these tests.
 Failed Test                         Stat Wstat Total Fail  Failed  List of Failed
 ---------------------------------------------------------------------------------
 ../ext/B/t/xref.t                    255 65280    14   12  85.71%  3-14
 ../ext/List/Util/t/first.t           255 65280     7    4  57.14%  2 5-7
 ../lib/English.t                       2   512    54    2   3.70%  2-3
 ../lib/FileCache.t                                 5    1  20.00%  5
 ../lib/Filter/Simple/t/data.t                      6    3  50.00%  1-3
 ../lib/Filter/Simple/t/filter_only.                9    3  33.33%  1-2 5
 ../lib/Math/BigInt/t/bare_mbf.t                 1627    4   0.25%  8 11 1626-1627
 ../lib/Math/BigInt/t/bigfltpm.t                 1629    4   0.25%  10 13 1628-1629
 ../lib/Math/BigInt/t/sub_mbf.t                  1633    4   0.24%  8 11 1632-1633
 ../lib/Math/BigInt/t/with_sub.t                 1628    4   0.25%  9 12 1627-1628
 ../lib/Tie/File/t/31_autodefer.t     255 65280    65   32  49.23%  34-65
 ../lib/autouse.t                                  10    1  10.00%  4
 op/flip.t                                         15    1   6.67%  15
These failures are unlikely to get fixed as 5.005-style threads
are considered fundamentally broken. (Basically what happens is that
competing threads can corrupt shared global state, one good example
being the regular expression engine's state.)
=head2 Timing problems
The following tests may fail intermittently because of timing
problems, for example if the system is heavily loaded.
t/op/alarm.t
ext/Time/HiRes/HiRes.t
lib/Benchmark.t
lib/Memoize/t/expmod_t.t
lib/Memoize/t/speed.t
In case of failure please try running them manually, for example
./perl -Ilib ext/Time/HiRes/HiRes.t
=head2 Tied/Magical Array/Hash Elements Do Not Autovivify
For normal arrays C<$foo = \$bar[1]> will assign C<undef> to
C<$bar[1]> (assuming that it didn't exist before), but for
tied/magical arrays and hashes such autovivification does not happen
because there is currently no way to catch the reference creation.
The same problem affects slicing over non-existent indices/keys of
a tied/magical array/hash.
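The autovivification of a plain array element can be seen in a few
lines. The sketch below also takes the same kind of reference into an
array tied to C<Tie::StdArray> (chosen here purely for illustration) and
prints its size for comparison; no particular result is claimed for the
tied case, since it depends on the tie implementation and perl version.

```perl
use strict;
use warnings;
use Tie::Array;    # provides Tie::StdArray

# Plain array: taking a reference to a missing element creates it.
my @bar;
my $foo        = \$bar[1];
my $plain_size = scalar @bar;

# Tied array: take the same kind of reference and inspect the size.
tie my @tied, 'Tie::StdArray';
my $ref       = \$tied[1];
my $tied_size = scalar @tied;

print "plain array size after taking the reference: $plain_size\n";
print "tied array size after taking the reference:  $tied_size\n";
```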
=head2 Unicode in package/class and subroutine names does not work
One can have Unicode in identifier names, but not in package/class or
subroutine names. While some limited functionality towards this does
exist as of Perl 5.8.0, that is more accidental than designed; use of
Unicode for the said purposes is unsupported.
One reason for this unfinished state is its (currently) inherent
unportability: since both package names and subroutine names may
need to be mapped to file and directory names, the Unicode capability
of the filesystem becomes important, and unfortunately there are no
portable answers.
=head1 Platform Specific Problems
=head2 AIX
=over 4
=item *
If using the AIX native make command, instead of just "make" issue
"make all". In some setups the former has been known to spuriously
also try to run "make install". Alternatively, you may want to use
GNU make.
=item *
In AIX 4.2, Perl extensions that use C++ functions that use statics
may have problems in that the statics are not getting initialized.
In newer AIX releases, this has been solved by linking Perl with
the libC_r library, but unfortunately in AIX 4.2 the said library
has an obscure bug where the various functions related to time
(such as time() and gettimeofday()) return broken values, and
therefore in AIX 4.2 Perl is not linked against libC_r.
=item *
vac 5.0.0.0 May Produce Buggy Code For Perl
The AIX C compiler vac version 5.0.0.0 may produce buggy code,
resulting in a few random tests failing when run as part of "make
test", but when the failing tests are run by hand, they succeed.
We suggest upgrading to at least vac version 5.0.1.0, which is
known to compile Perl correctly. "lslpp -L|grep vac.C" will tell
you the vac version. See README.aix.
=item *
If building threaded Perl, you may get a compilation warning from pp_sys.c:
"pp_sys.c", line 4651.39: 1506-280 (W) Function argument assignment between types "unsigned char*" and "const void*" is not allowed.
This is harmless; it is caused by getnetbyaddr() and getnetbyaddr_r()
having slightly different types for their first argument.
=back
=head2 Alpha systems with old gccs fail several tests
If you see op/pack, op/pat, op/regexp, or ext/Storable tests failing
on Linux/alpha or *BSD/Alpha, it's probably time to upgrade your gcc.
gccs prior to 2.95.3 are definitely not good enough, and gcc 3.1 may
be even better. (RedHat Linux/alpha with gcc 3.1 reported no problems,
as did Linux 2.4.18 with gcc 2.95.4.) (In Tru64, it is preferable to
use the bundled C compiler.)
=head2 AmigaOS
Perl 5.8.0 doesn't build in AmigaOS. It broke at some point during
the ithreads work and we could not find Amiga experts to unbreak the
problems. Perl 5.6.1 still works for AmigaOS (as does the 5.7.2
development release).
=head2 BeOS
The following tests fail on 5.8.0 Perl in BeOS Personal 5.03:
t/op/lfs............................FAILED at test 17
t/op/magic..........................FAILED at test 24
ext/Fcntl/t/syslfs..................FAILED at test 17
ext/File/Glob/t/basic...............FAILED at test 3
ext/POSIX/t/sigaction...............FAILED at test 13
ext/POSIX/t/waitpid.................FAILED at test 1
(B<Note:> more information was available in F<README.beos> until support for
BeOS was removed in Perl v5.18.0)
=head2 Cygwin "unable to remap"
When building extensions for Cygwin (the Tk extension, for example),
you may get an error message saying "unable to remap".
This is a known problem with Cygwin, and a workaround is
detailed here: http://sources.redhat.com/ml/cygwin/2001-12/msg00894.html
=head2 Cygwin ndbm tests fail on FAT
One can build but not install (or test the build of) the NDBM_File
on FAT filesystems. Installation (or build) on NTFS works fine.
If one attempts the test on a FAT install (or build) the following
failures are expected:
../ext/NDBM_File/ndbm.t 13 3328 71 59 83.10% 1-2 4 16-71
../ext/ODBM_File/odbm.t 255 65280 ?? ?? % ??
../lib/AnyDBM_File.t 2 512 12 2 16.67% 1 4
../lib/Memoize/t/errors.t 0 139 11 5 45.45% 7-11
../lib/Memoize/t/tie_ndbm.t 13 3328 4 4 100.00% 1-4
run/fresh_perl.t 97 1 1.03% 91
NDBM_File fails and ODBM_File just coredumps.
If you intend to run only on FAT (or if using AnyDBM_File on FAT),
run Configure with the -Ui_ndbm and -Ui_dbm options to prevent
NDBM_File and ODBM_File being built.
=head2 DJGPP Failures
t/op/stat............................FAILED at test 29
lib/File/Find/t/find.................FAILED at test 1
lib/File/Find/t/taint................FAILED at test 1
lib/h2xs.............................FAILED at test 15
lib/Pod/t/eol........................FAILED at test 1
lib/Test/Harness/t/strap-analyze.....FAILED at test 8
lib/Test/Harness/t/test-harness......FAILED at test 23
lib/Test/Simple/t/exit...............FAILED at test 1
The above failures are known as of 5.8.0 with native builds with long
filenames, but there are a few more if running under dosemu because of
limitations (and maybe bugs) of dosemu:
t/comp/cpp...........................FAILED at test 3
t/op/inccode.........................(crash)
and a few lib/ExtUtils tests, and several hundred Encode/t/Aliases.t
failures that work fine with long filenames. So you really might
prefer native builds and long filenames.
=head2 FreeBSD built with ithreads coredumps reading large directories
This is a known bug in FreeBSD 4.5's readdir_r(); it has been fixed in
FreeBSD 4.6 (see L<perlfreebsd> (README.freebsd)).
=head2 FreeBSD Failing locale Test 117 For ISO 8859-15 Locales
The ISO 8859-15 locales may fail the locale test 117 in FreeBSD.
This is caused by the characters \xFF (y with diaeresis) and \xBE
(Y with diaeresis) not behaving correctly when being matched
case-insensitively. Apparently this problem has been fixed in
the latest FreeBSD releases.
( http://www.freebsd.org/cgi/query-pr.cgi?pr=34308 )
=head2 IRIX fails ext/List/Util/t/shuffle.t or Digest::MD5
IRIX with MIPSpro 7.3.1.2m or 7.3.1.3m compiler may fail the List::Util
test ext/List/Util/t/shuffle.t by dumping core. This seems to be
a compiler error since if compiled with gcc no core dump ensues, and
no failures have been seen on the said test on any other platform.
Similarly, building the Digest::MD5 extension has been
known to fail with "*** Termination code 139 (bu21)".
The cure is to drop the optimization level (Configure -Doptimize=-O2).
=head2 HP-UX lib/posix Subtest 9 Fails When LP64-Configured
If perl is configured with -Duse64bitall, the successful result of the
subtest 10 of lib/posix may arrive before the successful result of the
subtest 9, which confuses the test harness so much that it thinks the
subtest 9 failed.
=head2 Linux with glibc 2.2.5 fails t/op/int subtest #6 with -Duse64bitint
This is a known bug in glibc 2.2.5 with long long integers.
( http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=65612 )
=head2 Linux With Sfio Fails op/misc Test 48
No known fix.
=head2 Mac OS X
Please remember to set your environment variable LC_ALL to "C"
(setenv LC_ALL C) before running "make test" to avoid a lot of
warnings about the broken locales of Mac OS X.
The following tests are known to fail in Mac OS X 10.1.5 because of
buggy (old) implementations of Berkeley DB included in Mac OS X:
Failed Test Stat Wstat Total Fail Failed List of Failed
-------------------------------------------------------------------------
../ext/DB_File/t/db-btree.t 0 11 ?? ?? % ??
../ext/DB_File/t/db-recno.t 149 3 2.01% 61 63 65
If you are building on a UFS partition, you will also probably see
t/op/stat.t subtest #9 fail. This is caused by Darwin's UFS not
supporting inode change time.
Also the ext/POSIX/t/posix.t subtest #10 fails but it is skipped for
now because the failure is Apple's fault, not Perl's (blocked signals
are lost).
If you Configure with ithreads, ext/threads/t/libc.t will fail. Again,
this is not Perl's fault: the libc of Mac OS X is not threadsafe
(in this particular test, the localtime() call is found not to be
threadsafe).
=head2 Mac OS X dyld undefined symbols
If after installing Perl 5.8.0 you are getting warnings about missing
symbols, for example
dyld: perl Undefined symbols
_perl_sv_2pv
_perl_get_sv
you probably have an old pre-Perl-5.8.0 installation (or parts of one)
in /Library/Perl (the undefined symbols used to exist in pre-5.8.0 Perls).
It seems that for some reason "make install" doesn't always completely
overwrite the files in /Library/Perl. You can move the old Perl
shared library out of the way like this:
cd /Library/Perl/darwin/CORE
mv libperl.dylib libperlold.dylib
and then reissue "make install". Note that this is of course
extremely disruptive for anything using /usr/local/bin/perl.
If that doesn't help, you may have to try removing all the .bundle
files from beneath /Library/Perl, and again "make install"-ing.
=head2 OS/2 Test Failures
The following tests are known to fail on OS/2 (for clarity
only the failures are shown, not the full error messages):
../lib/ExtUtils/t/Mkbootstrap.t 1 256 18 1 5.56% 8
../lib/ExtUtils/t/Packlist.t 1 256 34 1 2.94% 17
../lib/ExtUtils/t/basic.t 1 256 17 1 5.88% 14
lib/os2_process.t 2 512 227 2 0.88% 174 209
lib/os2_process_kid.t 227 2 0.88% 174 209
lib/rx_cmprt.t 255 65280 18 3 16.67% 16-18
=head2 op/sprintf tests 91, 129, and 130
The op/sprintf tests 91, 129, and 130 are known to fail on some platforms.
Examples include any platform using sfio, and Compaq/Tandem's NonStop-UX.
Test 91 is known to fail on QNX6 (nto), because C<sprintf '%e',0>
incorrectly produces C<0.000000e+0> instead of C<0.000000e+00>.
For tests 129 and 130, the failing platforms do not comply with
the ANSI C Standard: lines 19ff on page 134 of ANSI X3.159 1989, to
be exact. (They produce something other than "1" and "-1" when
formatting 0.6 and -0.6 using the printf format "%.0f"; most often,
they produce "0" and "-0".)
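The two behaviours these tests check can be reproduced directly. This is
a minimal sketch with illustrative variable names; on a platform whose C
library follows the ANSI C rules cited above, the rounded values should
be "1" and "-1".

```perl
use strict;
use warnings;

# Tests 129 and 130: "%.0f" must round 0.6 and -0.6 away from zero.
my ($pos, $neg) = map { sprintf("%.0f", $_) } (0.6, -0.6);

# Test 91: "%e" must print at least two exponent digits.
my $exp = sprintf('%e', 0);

print "0.6  formats as $pos\n";
print "-0.6 formats as $neg\n";
print "0    formats as $exp\n";
```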
=head2 SCO
The socketpair tests are known to be unhappy in SCO 3.2v5.0.4:
ext/Socket/socketpair.t...............FAILED tests 15-45
=head2 Solaris 2.5
In case you are still using Solaris 2.5 (aka SunOS 5.5), you may
experience failures (the test core dumping) in lib/locale.t.
The suggested cure is to upgrade your Solaris.
=head2 Solaris x86 Fails Tests With -Duse64bitint
The following tests are known to fail in Solaris x86 with Perl
configured to use 64 bit integers:
ext/Data/Dumper/t/dumper.............FAILED at test 268
ext/Devel/Peek/Peek..................FAILED at test 7
=head2 SUPER-UX (NEC SX)
The following tests are known to fail on SUPER-UX:
op/64bitint...........................FAILED tests 29-30, 32-33, 35-36
op/arith..............................FAILED tests 128-130
op/pack...............................FAILED tests 25-5625
op/pow................................
op/taint..............................# msgsnd failed
../ext/IO/lib/IO/t/io_poll............FAILED tests 3-4
../ext/IPC/SysV/ipcsysv...............FAILED tests 2, 5-6
../ext/IPC/SysV/t/msg.................FAILED tests 2, 4-6
../ext/Socket/socketpair..............FAILED tests 12
../lib/IPC/SysV.......................FAILED tests 2, 5-6
../lib/warnings.......................FAILED tests 115-116, 118-119
The op/pack failure ("Cannot compress negative numbers at op/pack.t line 126")
is serious but as yet unsolved. It points at some problems with the
signedness handling of the C compiler, as do the 64bitint, arith, and pow
failures. Most of the rest point at problems with SysV IPC.
=head2 Term::ReadKey not working on Win32
Use Term::ReadKey 2.20 or later.
=head2 UNICOS/mk
=over 4
=item *
During Configure, the test
Guessing which symbols your C compiler and preprocessor define...
will probably fail with error messages like
CC-20 cc: ERROR File = try.c, Line = 3
The identifier "bad" is undefined.
bad switch yylook 79bad switch yylook 79bad switch yylook 79bad switch yylook 79#ifdef A29K
^
CC-65 cc: ERROR File = try.c, Line = 3
A semicolon is expected at this point.
This is caused by a bug in the awk utility of UNICOS/mk. You can ignore
the error, but it does cause a slight problem: you cannot fully
benefit from the h2ph utility (see L<h2ph>) that can be used to
convert C headers to Perl libraries, mainly used to be able to access
from Perl the constants defined using C preprocessor, cpp. Because of
the above error, parts of the converted headers will be invisible.
Luckily, these days the need for h2ph is rare.
=item *
If building Perl with interpreter threads (ithreads), the
getgrent(), getgrnam(), and getgrgid() functions cannot return the
list of the group members due to a bug in the multithreaded support of
UNICOS/mk. What this means is that in list context the functions will
return only three values, not four.
=back
=head2 UTS
There are a few known test failures. (B<Note:> the relevant information was
available in F<README.uts> until support for UTS was removed in Perl
v5.18.0)
=head2 VOS (Stratus)
When Perl is built using the native build process on VOS Release
14.5.0 and GNU C++/GNU Tools 2.0.1, all attempted tests either
pass or result in TODO (ignored) failures.
=head2 VMS
There should be no reported test failures with a default configuration,
though there are a number of tests marked TODO that point to areas
needing further debugging and/or porting work.
=head2 Win32
On multi-CPU boxes, there are some problems with the I/O buffering:
some output may appear twice.
=head2 XML::Parser not working
Use XML::Parser 2.31 or later.
=head2 z/OS (OS/390)
z/OS has quite a few test failures, but the situation is actually much
better than it was in 5.6.0; it's just that so many new modules and
tests have been added.
 Failed Test                    Stat Wstat Total Fail  Failed  List of Failed
 -----------------------------------------------------------------------------
 ../ext/Data/Dumper/t/dumper.t               357    8   2.24%  311 314 325 327
                                                               331 333 337 339
 ../ext/IO/lib/IO/t/io_unix.t                  5    4  80.00%  2-5
 ../ext/Storable/t/downgrade.t    12  3072   169   12   7.10%  14-15 46-47 78-79
                                                               110-111 150 161
 ../lib/ExtUtils/t/Constant.t    121 30976    48   48 100.00%  1-48
 ../lib/ExtUtils/t/Embed.t                     9    9 100.00%  1-9
 op/pat.t                                    922    7   0.76%  665 776 785 832-
                                                               834 845
 op/sprintf.t                                224    3   1.34%  98 100 136
 op/tr.t                                      97    5   5.15%  63 71-74
 uni/fold.t                                  780    6   0.77%  61 169 196 661
                                                               710-711
The failures in dumper.t and downgrade.t are problems in the tests;
those in io_unix and sprintf are problems in the USS (UDP sockets and
printf formats). The pat, tr, and fold failures are genuine Perl
problems caused by EBCDIC (and in the pat and fold cases, combining
that with Unicode). The Constant and Embed failures are probably
problems in the tests (since they test Perl's ability to build
extensions, and that seems to be working reasonably well).
=head2 Unicode Support on EBCDIC Still Spotty
Though mostly working, Unicode support still has problem spots on
EBCDIC platforms. One such known spot is the C<\p{}> and C<\P{}>
regular expression constructs for code points less than 256: these
constructs test for Unicode code points, without knowing about EBCDIC.
=head2 Seen In Perl 5.7 But Gone Now
C<Time::Piece> (previously known as C<Time::Object>) was removed
because it was felt that it didn't have enough value in it to be a
core module. It is still a useful module, though, and is available
from the CPAN.
Perl 5.8 unfortunately does not build anymore on AmigaOS; this broke
accidentally at some point. Since there are not that many Amiga
developers available, we could not get this fixed and tested in time
for 5.8.0. Perl 5.6.1 still works for AmigaOS (as does the 5.7.2
development release).
The C<PerlIO::Scalar> and C<PerlIO::Via> (capitalised) were renamed as
C<PerlIO::scalar> and C<PerlIO::via> (all lowercase) just before 5.8.0.
The main rationale was to give all core PerlIO layers all-lowercase
names. The "plugins" are named as usual, for example
C<PerlIO::via::QuotedPrint>.
The C<threads::shared::queue> and C<threads::shared::semaphore> were
renamed as C<Thread::Queue> and C<Thread::Semaphore> just before 5.8.0.
The main rationale was to have thread modules obey the normal naming
convention, C<Thread::> (C<threads> and C<threads::shared> themselves are
more pragma-like: they affect compile-time, so they stay lowercase).
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://bugs.perl.org/ . There may also be
information at http://www.perl.com/ , the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
=head1 SEE ALSO
The F<Changes> file for exhaustive details on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=head1 HISTORY
Written by Jarkko Hietaniemi <F<jhi@iki.fi>>.
=cut
=head1 NAME
perl583delta - what is new for perl v5.8.3
=head1 DESCRIPTION
This document describes differences between the 5.8.2 release and
the 5.8.3 release.
If you are upgrading from an earlier release such as 5.6.1, first read
L<perl58delta>, which describes differences between 5.6.0 and
5.8.0, and L<perl581delta> and L<perl582delta>, which describe differences
between 5.8.0, 5.8.1 and 5.8.2.
=head1 Incompatible Changes
There are no changes incompatible with 5.8.2.
=head1 Core Enhancements
A C<SCALAR> method is now available for tied hashes. This is called when
a tied hash is used in scalar context, such as
    if (%tied_hash) {
        ...
    }
The old behaviour was that %tied_hash would return whatever would have been
returned for that hash before the hash was tied (so usually 0). The new
behaviour in the absence of a SCALAR method is to return TRUE if in the
middle of an C<each> iteration, and otherwise call FIRSTKEY to check if the
hash is empty (making sure that a subsequent C<each> will also begin by
calling FIRSTKEY). Please see L<perltie/SCALAR> for the full details and
caveats.
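As an illustration of the C<SCALAR> method described above, here is a minimal sketch. The class name C<CountingHash> and its internal fields are hypothetical, and only the tie methods this example needs are implemented; see L<perltie> for the complete interface.

```perl
use strict;
use warnings;

# Hypothetical tied-hash class, for illustration only.
package CountingHash;

sub TIEHASH {
    my ($class) = @_;
    return bless { data => {}, scalar_calls => 0 }, $class;
}

sub STORE { my ($self, $key, $value) = @_; $self->{data}{$key} = $value }
sub FETCH { my ($self, $key) = @_; return $self->{data}{$key} }

# Called when the tied hash is evaluated in scalar (or boolean) context.
sub SCALAR {
    my ($self) = @_;
    $self->{scalar_calls}++;
    return scalar keys %{ $self->{data} };   # 0 keys means false
}

package main;

tie my %tied_hash, 'CountingHash';

my @seen;
push @seen, (%tied_hash ? 'non-empty' : 'empty');
$tied_hash{answer} = 42;
push @seen, (%tied_hash ? 'non-empty' : 'empty');

my $calls = tied(%tied_hash)->{scalar_calls};
print join(', ', @seen), " ($calls SCALAR calls)\n";
```

Returning the key count keeps the truth value in line with what an untied hash would give; a class could return any other value that suits it.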
=head1 Modules and Pragmata
=over 4
=item CGI
=item Cwd
=item Digest
=item Digest::MD5
=item Encode
=item File::Spec
=item FindBin
A function C<again> is provided to resolve problems where modules in different
directories wish to use FindBin.
=item List::Util
You can now weaken references to read-only values.
=item Math::BigInt
=item PodParser
=item Pod::Perldoc
=item POSIX
=item Unicode::Collate
=item Unicode::Normalize
=item Test::Harness
=item threads::shared
C<cond_wait> has a new two argument form. C<cond_timedwait> has been added.
=back
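For the FindBin entry in the list above, a minimal sketch of how C<again> might be used: a second caller asks FindBin to recompute its variables rather than relying on values cached when the module was first loaded. The variable name C<$bin> is only for illustration.

```perl
use strict;
use warnings;
use FindBin ();

# Recompute $FindBin::Bin and related variables for the current $0.
FindBin::again();

my $bin = $FindBin::Bin;
print "script directory: $bin\n";
```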
=head1 Utility Changes
C<find2perl> now assumes C<-print> as a default action. Previously, it
needed to be specified explicitly.
A new utility, C<prove>, makes it easy to run an individual regression test
at the command line. C<prove> is part of Test::Harness, which users of earlier
Perl versions can install from CPAN.
=head1 New Documentation
The documentation has been revised in places to produce more standard manpages.
The documentation for the special code blocks (BEGIN, CHECK, INIT, END)
has been improved.
=head1 Installation and Configuration Improvements
Perl now builds on OpenVMS I64.
=head1 Selected Bug Fixes
Using substr() on a UTF8 string could cause subsequent accesses on that
string to return garbage. This was due to incorrect UTF8 offsets being
cached, and is now fixed.
join() could return garbage when the same join() statement was used to
process 8 bit data having earlier processed UTF8 data, due to the flags
on that statement's temporary workspace not being reset correctly. This
is now fixed.
C<$a .. $b> will now work as expected when either C<$a> or C<$b> is C<undef>.
Using Unicode keys with tied hashes should now work correctly.
Reading $^E now preserves $!. Previously, the C code implementing $^E
did not preserve C<errno>, so reading $^E could cause C<errno> and therefore
C<$!> to change unexpectedly.
Reentrant functions will (once more) work with C++. 5.8.2 introduced a bugfix
which accidentally broke the compilation of Perl extensions written in C++.
=head1 New or Changed Diagnostics
The fatal error "DESTROY created new reference to dead object" is now
documented in L<perldiag>.
=head1 Changed Internals
The hash code has been refactored to reduce source duplication. The
external interface is unchanged, and aside from the bug fixes described
above, there should be no change in behaviour.
C<hv_clear_placeholders> is now part of the perl API.
Some C macros have been tidied. In particular macros which create temporary
local variables now name these variables more defensively, which should
avoid bugs where names clash.
<signal.h> is now always included.
=head1 Configuration and Building
C<Configure> now invokes callbacks regardless of the value of the variable
they are called for. Previously callbacks were only invoked in the
C<case $variable $define)> branch. This change should only affect platform
maintainers writing configuration hints files.
=head1 Platform Specific Problems
The regression test ext/threads/shared/t/wait.t fails on early RedHat 9
and HP-UX 10.20 due to bugs in their threading implementations.
RedHat users should see https://rhn.redhat.com/errata/RHBA-2003-136.html
and consider upgrading their glibc.
=head1 Known Problems
Detached threads aren't supported on Windows yet, as they may lead to
memory access violation problems.
There is a known race condition opening scripts in C<suidperl>. C<suidperl>
is neither built nor installed by default, and has been deprecated since
perl 5.8.0. You are advised to replace use of suidperl with tools such
as sudo ( http://www.courtesan.com/sudo/ ).
We have a backlog of unresolved bugs. Dealing with bugs and bug reports
is unglamorous work; not something ideally suited to volunteer labour,
but that is all that we have.
The perl5 development team are implementing changes to help address this
problem, which should go live in early 2004.
=head1 Future Directions
Code freeze for the next maintenance release (5.8.4) is on March 31st 2004,
with release expected by mid April. Similarly 5.8.5's freeze will be at
the end of June, with release by mid July.
=head1 Obituary
Iain 'Spoon' Truskett, Perl hacker, author of L<perlreref> and
contributor to CPAN, died suddenly on 29th December 2003, aged 24.
He will be missed.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://bugs.perl.org. There may also be
information at http://www.perl.org, the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team. You can browse and search
the Perl 5 bugs at http://bugs.perl.org/
=head1 SEE ALSO
The F<Changes> file for exhaustive details on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perl582delta - what is new for perl v5.8.2
=head1 DESCRIPTION
This document describes differences between the 5.8.1 release and
the 5.8.2 release.
If you are upgrading from an earlier release such as 5.6.1, first read
the L<perl58delta>, which describes differences between 5.6.0 and
5.8.0, and the L<perl581delta>, which describes differences between
5.8.0 and 5.8.1.
=head1 Incompatible Changes
For threaded builds for modules calling certain re-entrant system calls,
binary compatibility was accidentally lost between 5.8.0 and 5.8.1.
Binary compatibility with 5.8.0 has been restored in 5.8.2, which
necessitates breaking compatibility with 5.8.1. We see this as the
lesser of two evils.
This will only affect people who have a threaded perl 5.8.1, and compiled
modules which use these calls, and now attempt to run the compiled modules
with 5.8.2. The fix is to re-compile and re-install the modules using 5.8.2.
=head1 Core Enhancements
=head2 Hash Randomisation
The hash randomisation introduced with 5.8.1 has been amended. It
transpired that although the implementation introduced in 5.8.1 was source
compatible with 5.8.0, it was not binary compatible in certain cases. 5.8.2
contains an improved implementation which is both source and binary
compatible with both 5.8.0 and 5.8.1, and remains robust against the form of
attack which prompted the change for 5.8.1.
We are grateful to the Debian project for their input in this area.
See L<perlsec/"Algorithmic Complexity Attacks"> for the original
rationale behind this change.
=head2 Threading
Several memory leaks associated with variables shared between threads
have been fixed.
=head1 Modules and Pragmata
=head2 Updated Modules And Pragmata
The following modules and pragmata have been updated since Perl 5.8.1:
=over 4
=item Devel::PPPort
=item Digest::MD5
=item I18N::LangTags
=item libnet
=item MIME::Base64
=item Pod::Perldoc
=item strict
Documentation improved
=item Tie::Hash
Documentation improved
=item Time::HiRes
=item Unicode::Collate
=item Unicode::Normalize
=item UNIVERSAL
Documentation improved
=back
=head1 Selected Bug Fixes
Some syntax errors involving unrecognized filetest operators are now handled
correctly by the parser.
=head1 Changed Internals
Interpreter initialization is more complete when -DMULTIPLICITY is off.
This should resolve problems with initializing and destroying the Perl
interpreter more than once in a single process.
=head1 Platform Specific Problems
Dynamic linker flags have been tweaked for Solaris and OS X, which should
solve problems seen while building some XS modules.
Bugs in OS/2 sockets and tmpfile have been fixed.
On OS X, C<setreuid> and friends are troublesome; perl will now work
around their problems as best it can.
=head1 Future Directions
Starting with 5.8.3 we intend to make more frequent maintenance releases,
with a smaller number of changes in each. The intent is to propagate
bug fixes out to stable releases more rapidly and make upgrading stable
releases less of an upheaval. This should give end users more
flexibility in their choice of upgrade timing, and allow them easier
assessment of the impact of upgrades. The current plan is for code freezes
as follows
=over 4
=item *
5.8.3 23:59:59 GMT, Wednesday December 31st 2003
=item *
5.8.4 23:59:59 GMT, Wednesday March 31st 2004
=item *
5.8.5 23:59:59 GMT, Wednesday June 30th 2004
=back
with the release following soon after, when testing is complete.
See L<perl581delta/"Future Directions"> for more soothsaying.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://bugs.perl.org/. There may also be
information at http://www.perl.com/, the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team. You can browse and search
the Perl 5 bugs at http://bugs.perl.org/
=head1 SEE ALSO
The F<Changes> file for exhaustive details on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perlgpl - the GNU General Public License, version 1
=head1 SYNOPSIS
You can refer to this document in Pod via "L<perlgpl>"
Or you can see this document by entering "perldoc perlgpl"
=head1 DESCRIPTION
Perl is free software; you can redistribute it and/or modify
it under the terms of either:
a) the GNU General Public License as published by the Free
Software Foundation; either version 1, or (at your option) any
later version, or
b) the "Artistic License" which comes with this Kit.
This is the B<"GNU General Public License, version 1">.
It's here so that modules, programs, etc., that want to declare
this as their distribution license can link to it.
For the Perl Artistic License, see L<perlartistic>.
=cut
# Because the following document's language disallows "changing"
# it, we haven't gone thru and prettied it up with =item's or
# anything. It's good enough the way it is.
=head1 GNU GENERAL PUBLIC LICENSE
GNU GENERAL PUBLIC LICENSE
Version 1, February 1989
Copyright (C) 1989 Free Software Foundation, Inc.
51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The license agreements of most software companies try to keep users
at the mercy of those companies. By contrast, our General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. The
General Public License applies to the Free Software Foundation's
software and to any other program whose authors commit to using it.
You can use it for your programs, too.
When we speak of free software, we are referring to freedom, not
price. Specifically, the General Public License is designed to make
sure that you have the freedom to give away or sell copies of free
software, that you receive source code or can get it if you want it,
that you can change the software or use pieces of it in new free
programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of a such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must tell them their rights.
We protect your rights with two steps: (1) copyright the software,
and (2) offer you this license which gives you legal permission to
copy, distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on,
we want its recipients to know that what they have is not the original,
so that any problems introduced by others will not reflect on the
original authors' reputations.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License Agreement applies to any program or other work which
contains a notice placed by the copyright holder saying it may be
distributed under the terms of this General Public License. The
"Program", below, refers to any such program or work, and a "work based
on the Program" means either the Program or any work containing the
Program or a portion of it, either verbatim or with modifications.
Each licensee is addressed as "you".
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this General Public License and to the absence of
any warranty; and give any other recipients of the Program a copy of
this General Public License along with the Program. You may charge a
fee for the physical act of transferring a copy.
2. You may modify your copy or copies of the Program or any portion
of it, and copy and distribute such modifications under the terms of
Paragraph 1 above, provided that you also do the following:
a) cause the modified files to carry prominent notices stating that
you changed the files and the date of any change; and
b) cause the whole of any work that you distribute or publish, that
in whole or in part contains the Program or any part thereof,
either with or without modifications, to be licensed at no charge
to all third parties under the terms of this General Public License
(except that you may choose to grant warranty protection to some or
all third parties, at your option).
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the simplest and most usual way, to print or
display an announcement including an appropriate copyright notice
and a notice that there is no warranty (or else, saying that you
provide a warranty) and that users may redistribute the program
under these conditions, and telling the user how to view a copy of
this General Public License.
d) You may charge a fee for the physical act of transferring a
copy, and you may at your option offer warranty protection in
exchange for a fee.
Mere aggregation of another independent work with the Program (or its
derivative) on a volume of a storage or distribution medium does not
bring the other work under the scope of these terms.
3. You may copy and distribute the Program (or a portion or
derivative of it, under Paragraph 2) in object code or executable form
under the terms of Paragraphs 1 and 2 above provided that you also do
one of the following:
a) accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of
Paragraphs 1 and 2 above; or,
b) accompany it with a written offer, valid for at least three
years, to give any third party free (except for a nominal charge
for the cost of distribution) a complete machine-readable copy of
the corresponding source code, to be distributed under the terms of
Paragraphs 1 and 2 above; or,
c) accompany it with the information you received as to where the
corresponding source code may be obtained. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form alone.)
Source code for a work means the preferred form of the work for making
modifications to it. For an executable file, complete source code
means all the source code for all modules it contains; but, as a
special exception, it need not include source code for modules which
are standard libraries that accompany the operating system on which the
executable file runs, or for standard header files or definitions files
that accompany that operating system.
4. You may not copy, modify, sublicense, distribute or transfer the
Program except as expressly provided under this General Public License.
Any attempt otherwise to copy, modify, sublicense, distribute or
transfer the Program is void, and will automatically terminate your
rights to use the Program under this License. However, parties who
have received copies, or rights to use copies, from you under this
General Public License will not have their licenses terminated so long
as such parties remain in full compliance.
5. By copying, distributing or modifying the Program (or any work
based on the Program) you indicate your acceptance of this license to
do so, and all its terms and conditions.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
7. The Free Software Foundation may publish revised and/or new
versions of the General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of the license which applies to it and "any
later version", you have the option of following the terms and
conditions either of that version or of any later version published by
the Free Software Foundation. If the Program does not specify a
version number of the license, you may choose any version ever
published by the Free Software Foundation.
8. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the
author to ask for permission. For software which is copyrighted by the
Free Software Foundation, write to the Free Software Foundation; we
sometimes make exceptions for this. Our decision will be guided by the
two goals of preserving the free status of all derivatives of our free
software and of promoting the sharing and reuse of software generally.
NO WARRANTY
9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS
WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU
FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
END OF TERMS AND CONDITIONS
Appendix: How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to humanity, the best way to achieve this is to make it
free software which everyone can redistribute and change under these
terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it
does.>
Copyright (C) 19yy <name of author>
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 1, or (at
your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA
02110-1301 USA
Also add information on how to contact you by electronic and paper
mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) 19xx name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type
'show w'. This is free software, and you are welcome to
redistribute it under certain conditions; type 'show c' for
details.
The hypothetical commands 'show w' and 'show c' should show the
appropriate parts of the General Public License. Of course, the
commands you use may be called something other than 'show w' and 'show
c'; they could even be mouse-clicks or menu items--whatever suits your
program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the
program 'Gnomovision' (a program to direct compilers to make passes
at assemblers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
That's all there is to it!
=cut
=encoding utf8
=head1 NAME
perl5222delta - what is new for perl v5.22.2
=head1 DESCRIPTION
This document describes differences between the 5.22.1 release and the 5.22.2
release.
If you are upgrading from an earlier release such as 5.22.0, first read
L<perl5221delta>, which describes differences between 5.22.0 and 5.22.1.
=head1 Security
=head2 Fix out of boundary access in Win32 path handling
This is CVE-2015-8608. For more information see
L<[perl #126755]|https://rt.perl.org/Ticket/Display.html?id=126755>.
=head2 Fix loss of taint in C<canonpath()>
This is CVE-2015-8607. For more information see
L<[perl #126862]|https://rt.perl.org/Ticket/Display.html?id=126862>.
=head2 Set proper umask before calling C<mkstemp(3)>
In 5.22.0 perl started setting umask to C<0600> before calling C<mkstemp(3)>
and restoring it afterwards. This wrongfully tells C<open(2)> to strip the
owner read and write bits from the given mode before applying it, rather than
the intended negation of leaving only those bits in place.
Systems that use mode C<0666> in C<mkstemp(3)> (like old versions of glibc)
create a file with permissions C<0066>, leaving world read and write permissions
regardless of current umask.
This has been fixed by using umask C<0177> instead.
L<[perl #127322]|https://rt.perl.org/Ticket/Display.html?id=127322>
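The arithmetic behind the umask fix can be checked with a short sketch. It assumes the usual rule that the resulting permissions are the requested mode with the umask bits cleared; the helper name C<effective_mode> is hypothetical and is not part of perl's implementation.

```perl
use strict;
use warnings;

# Mode that a mkstemp(3) implementation might request (0666 on old
# versions of glibc, according to the text above).
my $requested = 0666;

# Hypothetical helper: permission bits that survive a given umask.
sub effective_mode {
    my ($mode, $umask) = @_;
    return $mode & ~$umask & 0777;
}

my $buggy = effective_mode($requested, 0600);   # umask set by 5.22.0
my $fixed = effective_mode($requested, 0177);   # umask set after the fix

printf "umask 0600 -> %04o\n", $buggy;   # prints "umask 0600 -> 0066"
printf "umask 0177 -> %04o\n", $fixed;   # prints "umask 0177 -> 0600"
```

With umask C<0600> the owner bits are removed and the group and world bits remain; with C<0177> only the owner read and write bits remain.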
=head2 Avoid accessing uninitialized memory in Win32 C<crypt()>
Validation that will detect both a short salt and invalid characters in the
salt has been added.
L<[perl #126922]|https://rt.perl.org/Ticket/Display.html?id=126922>
=head2 Remove duplicate environment variables from C<environ>
Previously, if an environment variable appeared more than once in C<environ[]>,
L<C<%ENV>|perlvar/%ENV> would contain the last entry for that name, while a
typical C<getenv()> would return the first entry. We now make sure C<%ENV>
contains the same as what C<getenv()> returns.
Secondly, we now remove duplicates from C<environ[]>, so if a setting with that
name is set in C<%ENV> we won't pass an unsafe value to a child process.
This is CVE-2016-2381.
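The policy described above (the first entry wins and later duplicates are dropped) can be sketched in Perl. This is only an illustration of the rule, not perl's internal code; the sample entries and the C<dedupe_environ> helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical raw environment block containing a duplicated name.
my @environ = ('PATH=/usr/bin', 'HOME=/home/user', 'PATH=/tmp/untrusted');

# Keep only the first entry for each name (the one a typical getenv()
# would return) and drop any later duplicates.
sub dedupe_environ {
    my @entries = @_;
    my (%seen, @kept);
    for my $entry (@entries) {
        my ($name) = split /=/, $entry, 2;
        next if $seen{$name}++;
        push @kept, $entry;
    }
    return @kept;
}

my @clean = dedupe_environ(@environ);
my %env   = map { my ($name, $value) = split /=/, $_, 2; ($name => $value) } @clean;

print "$_=$env{$_}\n" for sort keys %env;
```

Because the duplicate is removed from the list itself, a child process that receives the cleaned list cannot see the second C<PATH> entry.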
=head1 Incompatible Changes
There are no changes intentionally incompatible with Perl 5.22.1. If any
exist, they are bugs, and we request that you submit a report. See
L</Reporting Bugs> below.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<File::Spec> has been upgraded from version 3.56 to 3.56_01.
C<canonpath()> now preserves taint. See L</"Fix loss of taint in
C<canonpath()>">.
=item *
L<Module::CoreList> has been upgraded from version 5.20151213 to 5.20160429.
The version number of L<Digest::SHA> listed for Perl 5.18.4 was wrong and has
been corrected. Likewise for the version number of L<Config> in 5.18.3 and
5.18.4.
L<[perl #127624]|https://rt.perl.org/Ticket/Display.html?id=127624>
=back
=head1 Documentation
=head2 Changes to Existing Documentation
=head3 L<perldiag>
=over 4
=item *
The explanation of the warning "unable to close filehandle %s properly: %s"
which can occur when doing an implicit close of a filehandle has been expanded
and improved.
=back
=head3 L<perlfunc>
=over 4
=item *
The documentation of L<C<hex()>|perlfunc/hex> has been revised to clarify valid
inputs.
=back
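As a brief illustration of inputs that C<hex()> accepts, the following sketch converts the same hexadecimal string with and without the C<0x> prefix. The variable names are chosen only for this example.

```perl
use strict;
use warnings;

my $with_prefix    = hex("0xAf");
my $without_prefix = hex("aF");

print "$with_prefix $without_prefix\n";   # prints "175 175"
```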
=head1 Configuration and Compilation
=over 4
=item *
Dtrace builds now succeed on systems with a newer dtrace that
requires an input object file that uses the probes in the F<.d> file.
Previously the probe would fail and cause a build failure.
L<[perl #122287]|https://rt.perl.org/Ticket/Display.html?id=122287>
=item *
F<Configure> no longer probes for F<libnm> by default. Originally this was the
"New Math" library, but the name has been re-used by the GNOME NetworkManager.
L<[perl #127131]|https://rt.perl.org/Ticket/Display.html?id=127131>
=item *
F<Configure> now knows about gcc 5.
=item *
Compiling perl with B<-DPERL_MEM_LOG> now works again.
=back
=head1 Platform Support
=head2 Platform-Specific Notes
=over 4
=item Darwin
Compiling perl with B<-Dusecbacktrace> on Darwin now works again.
L<[perl #127764]|https://rt.perl.org/Ticket/Display.html?id=127764>
=item OS X/Darwin
Builds with both B<-DDEBUGGING> and threading enabled would fail with a "panic:
free from wrong pool" error when built or tested from Terminal on OS X. This
was caused by perl's internal management of the environment conflicting with an
atfork handler using the libc C<setenv()> function to update the environment.
Perl now uses C<setenv()>/C<unsetenv()> to update the environment on OS X.
L<[perl #126240]|https://rt.perl.org/Ticket/Display.html?id=126240>
=item ppc64el
The floating point format of ppc64el (Debian naming for little-endian PowerPC)
is now detected correctly.
=item Tru64
A test failure in F<t/porting/extrefs.t> has been fixed.
=back
=head1 Internal Changes
=over 4
=item *
An unwarranted assertion in C<Perl_newATTRSUB_x()> has been removed. If a stub
subroutine definition with a prototype has been seen, then any subsequent stub
(or definition) of the same subroutine with an attribute was causing an
assertion failure because of a null pointer.
L<[perl #126845]|https://rt.perl.org/Ticket/Display.html?id=126845>
=back
=head1 Selected Bug Fixes
=over 4
=item *
Calls to the placeholder C<&PL_sv_yes> used internally when an C<import()> or
C<unimport()> method isn't found now correctly handle scalar context.
L<[perl #126042]|https://rt.perl.org/Ticket/Display.html?id=126042>
=item *
The L<C<pipe()>|perlfunc/pipe> operator would assert for C<DEBUGGING> builds
instead of producing the correct error message. The condition asserted on is
detected and reported on correctly without the assertions, so the assertions
were removed.
L<[perl #126480]|https://rt.perl.org/Ticket/Display.html?id=126480>
=item *
In some cases, failing to parse a here-doc would attempt to use freed memory.
This was caused by a pointer not being restored correctly.
L<[perl #126443]|https://rt.perl.org/Ticket/Display.html?id=126443>
=item *
Perl now reports more context when it sees an array where it expects to see an
operator, and avoids an assertion failure.
L<[perl #123737]|https://rt.perl.org/Ticket/Display.html?id=123737>
=item *
If a here-doc was found while parsing another operator, the parser had already
read end of file, and the here-doc was not terminated, perl could produce an
assertion or a segmentation fault. This now reliably complains about the
unterminated here-doc.
L<[perl #125540]|https://rt.perl.org/Ticket/Display.html?id=125540>
=item *
Parsing beyond the end of the buffer when processing a C<#line> directive with
no filename is now avoided.
L<[perl #127334]|https://rt.perl.org/Ticket/Display.html?id=127334>
=item *
Perl 5.22.0 added support for the C99 hexadecimal floating point notation, but
sometimes misparsed hex floats. This has been fixed.
L<[perl #127183]|https://rt.perl.org/Ticket/Display.html?id=127183>
=item *
Certain regex patterns involving a complemented posix class in an inverted
bracketed character class, and matching something else optionally would
improperly fail to match. An example of one that could fail is
C<qr/_?[^\Wbar]\x{100}/>. This has been fixed.
L<[perl #127537]|https://rt.perl.org/Ticket/Display.html?id=127537>
=item *
Fixed an issue with L<C<pack()>|perlfunc/pack> where C<< pack "H" >> (and
C<< pack "h" >>) could read past the source when given a non-utf8 source and a
utf8 target.
L<[perl #126325]|https://rt.perl.org/Ticket/Display.html?id=126325>
=item *
Fixed some cases where perl would abort due to a segmentation fault, or a
C-level assert.
L<[perl #126193]|https://rt.perl.org/Ticket/Display.html?id=126193>
L<[perl #126257]|https://rt.perl.org/Ticket/Display.html?id=126257>
L<[perl #126258]|https://rt.perl.org/Ticket/Display.html?id=126258>
L<[perl #126405]|https://rt.perl.org/Ticket/Display.html?id=126405>
L<[perl #126602]|https://rt.perl.org/Ticket/Display.html?id=126602>
L<[perl #127773]|https://rt.perl.org/Ticket/Display.html?id=127773>
L<[perl #127786]|https://rt.perl.org/Ticket/Display.html?id=127786>
=item *
A memory leak when setting C<$ENV{foo}> on Darwin has been fixed.
L<[perl #126240]|https://rt.perl.org/Ticket/Display.html?id=126240>
=item *
Perl now correctly raises an error when trying to compile patterns with
unterminated character classes while there are trailing backslashes.
L<[perl #126141]|https://rt.perl.org/Ticket/Display.html?id=126141>
=item *
C<NOTHING> regops and C<EXACTFU_SS> regops in C<make_trie()> are now handled
properly.
L<[perl #126206]|https://rt.perl.org/Ticket/Display.html?id=126206>
=item *
Perl now only tests C<semctl()> if we have everything needed to use it. In
FreeBSD the C<semctl()> entry point may exist, but it can be disabled by
policy.
L<[perl #127533]|https://rt.perl.org/Ticket/Display.html?id=127533>
=item *
A regression that allowed undeclared barewords as hash keys to work despite
strictures has been fixed.
L<[perl #126981]|https://rt.perl.org/Ticket/Display.html?id=126981>
=item *
As an optimization (introduced in Perl 5.20.0), L<C<uc()>|perlfunc/uc>,
L<C<lc()>|perlfunc/lc>, L<C<ucfirst()>|perlfunc/ucfirst> and
L<C<lcfirst()>|perlfunc/lcfirst> sometimes modify their argument in-place
rather than returning a modified copy. The criteria for this optimization have
been made stricter to avoid these functions accidentally modifying in-place
when they should not, which has been happening in some cases, e.g. in
L<List::Util>.
=item *
Excessive memory usage in the compilation of some regular expressions involving
non-ASCII characters has been reduced. A more complete fix is forthcoming in
Perl 5.24.0.
=back
=head1 Acknowledgements
Perl 5.22.2 represents approximately 5 months of development since Perl 5.22.1
and contains approximately 3,000 lines of changes across 110 files from 24
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 1,500 lines of changes to 52 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.22.2:
Aaron Crane, Abigail, Andreas König, Aristotle Pagaltzis, Chris 'BinGOs'
Williams, Craig A. Berry, Dagfinn Ilmari Mannsåker, David Golden, David
Mitchell, H.Merijn Brand, James E Keenan, Jarkko Hietaniemi, Karen Etheridge,
Karl Williamson, Matthew Horsfall, Niko Tyni, Ricardo Signes, Sawyer X, Stevan
Little, Steve Hay, Todd Rinaldo, Tony Cook, Vladimir Timofeev, Yves Orton.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
https://rt.perl.org/ . There may also be information at http://www.perl.org/ ,
the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send it
to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be
able to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
=head1 NAME
perl5125delta - what is new for perl v5.12.5
=head1 DESCRIPTION
This document describes differences between the 5.12.4 release and
the 5.12.5 release.
If you are upgrading from an earlier release such as 5.12.3, first read
L<perl5124delta>, which describes differences between 5.12.3 and
5.12.4.
=head1 Security
=head2 C<Encode> decode_xs n-byte heap-overflow (CVE-2011-2939)
A bug in C<Encode> could, on certain inputs, cause the heap to overflow.
This problem has been corrected. Bug reported by Robert Zacek.
=head2 C<File::Glob::bsd_glob()> memory error with GLOB_ALTDIRFUNC (CVE-2011-2728)
Calling C<File::Glob::bsd_glob> with the unsupported flag GLOB_ALTDIRFUNC would
cause an access violation / segfault. A Perl program that accepts a flags value from
an external source could expose itself to denial of service or arbitrary code
execution attacks. There are no known exploits in the wild. The problem has been
corrected by explicitly disabling all unsupported flags and setting unused function
pointers to null. Bug reported by Clément Lecigne.
=head2 Heap buffer overrun in 'x' string repeat operator (CVE-2012-5195)
Poorly written perl code that allows an attacker to specify the count to
perl's 'x' string repeat operator can already cause a memory exhaustion
denial-of-service attack. A flaw in versions of perl before 5.15.5 can
escalate that into a heap buffer overrun; coupled with versions of glibc
before 2.16, it possibly allows the execution of arbitrary code.
This problem has been fixed.
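As a general illustration of the kind of check that guards against this class
of bug, a C routine that repeats a string can verify that the total size fits
in a C<size_t> before it allocates and copies. This is only a sketch: it is not
the patch that was applied to perl, whose details are not given here, and the
name C<repeat_checked> is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding `count` copies of `s`,
 * or NULL if the result would not fit in a size_t or if the
 * allocation fails. The caller frees the result. */
char *repeat_checked(const char *s, size_t count)
{
    size_t len = strlen(s);
    size_t total, i;
    char *out;

    if (len == 0)
        count = 0;                      /* repeating "" always yields "" */
    else if (count > (SIZE_MAX - 1) / len)
        return NULL;                    /* len * count + 1 would wrap around */

    total = len * count;
    out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    for (i = 0; i < count; i++)
        memcpy(out + i * len, s, len);
    out[total] = '\0';
    return out;
}
```

The size test happens before any memory is touched, so an attacker-supplied
count can at worst make the call fail.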
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.12.4. If any
exist, they are bugs and reports are welcome.
=head1 Modules and Pragmata
=head2 Updated Modules
=head3 L<B::Concise>
L<B::Concise> no longer produces mangled output with the B<-tree> option
[perl #80632].
=head3 L<charnames>
A regression introduced in Perl 5.8.8, which caused
C<charnames::viacode(0)> to return C<undef> instead of the string "NULL",
has been fixed [perl #72624].
=head3 L<Encode> has been upgraded from version 2.39 to version 2.39_01.
See L</Security>.
=head3 L<File::Glob> has been upgraded from version 1.07 to version 1.07_01.
See L</Security>.
=head3 L<Unicode::UCD>
The documentation for the C<upper> function now actually says "upper", not
"lower".
=head3 L<Module::CoreList>
L<Module::CoreList> has been updated to version 2.50_02 to add data for
this release.
=head1 Changes to Existing Documentation
=head2 L<perlebcdic>
The L<perlebcdic> document contains a helpful table to use in C<tr///> to
convert between EBCDIC and Latin1/ASCII. Unfortunately, the table was the
inverse of the one it describes. This has been corrected.
=head2 L<perlunicode>
The section on
L<User-Defined Case Mappings|perlunicode/User-Defined Case Mappings> had
some bad markup and unclear sentences, making parts of it unreadable. This
has been rectified.
=head2 L<perluniprops>
This document has been corrected to take non-ASCII platforms into account.
=head1 Installation and Configuration Improvements
=head2 Platform Specific Changes
=over 4
=item Mac OS X
There have been configuration and test fixes to make Perl build cleanly on
Lion and Mountain Lion.
=item NetBSD
The NetBSD hints file was corrected to be compatible with NetBSD 6.*
=back
=head1 Selected Bug Fixes
=over 4
=item *
C<chop> now correctly handles characters above "\x{7fffffff}"
[perl #73246].
=item *
C<< ($<,$>) = (...) >> stopped working properly in 5.12.0. It is supposed
to make a single C<setreuid()> call, rather than calling C<setruid()> and
C<seteuid()> separately. Consequently it did not work properly. This has
been fixed [perl #75212].
=item *
Fixed a regression of kill() when a match variable is used for the
process ID to kill [perl #75812].
=item *
C<UNIVERSAL::VERSION> no longer leaks memory. It started leaking in Perl
5.10.0.
=item *
The C-level C<my_strftime> function no longer leaks memory. This fixes a
memory leak in C<POSIX::strftime> [perl #73520].
=item *
C<caller> no longer leaks memory when called from the DB package if
C<@DB::args> was assigned to after the first call to C<caller>. L<Carp>
was triggering this bug [perl #97010].
=item *
Passing to C<index> an offset beyond the end of the string when the string
is encoded internally in UTF8 no longer causes panics [perl #75898].
=item *
Syntax errors in C<< (?{...}) >> blocks in regular expressions no longer
cause panic messages [perl #2353].
=item *
Perl 5.10.0 introduced some faulty logic that made "U*" in the middle of
a pack template equivalent to "U0" if the input string was empty. This has
been fixed [perl #90160].
=back
=head1 Errata
=head2 split() and C<@_>
split() no longer modifies C<@_> when called in scalar or void context.
In void context it now produces a "Useless use of split" warning.
This is actually a change introduced in perl 5.12.0, but it was missed from
that release's L<perl5120delta>.
=head1 Acknowledgements
Perl 5.12.5 represents approximately 17 months of development since Perl 5.12.4
and contains approximately 1,900 lines of changes across 64 files from 18
authors.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed the
improvements that became Perl 5.12.5:
Andy Dougherty, Chris 'BinGOs' Williams, Craig A. Berry, David Mitchell,
Dominic Hargreaves, Father Chrysostomos, Florian Ragwitz, George Greer, Goro
Fuji, Jesse Vincent, Karl Williamson, Leon Brocard, Nicholas Clark, Rafael
Garcia-Suarez, Reini Urban, Ricardo Signes, Steve Hay, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://rt.perl.org/perlbug/ . There may also be
information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send
it to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be able
to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently
distributed on CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details
on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perlxstut - Tutorial for writing XSUBs
=head1 DESCRIPTION
This tutorial will educate the reader on the steps involved in creating
a Perl extension. The reader is assumed to have access to L<perlguts>,
L<perlapi> and L<perlxs>.
This tutorial starts with very simple examples and becomes more complex,
with each new example adding new features. Certain concepts may not be
completely explained until later in the tutorial in order to slowly ease
the reader into building extensions.
This tutorial was written from a Unix point of view. Where I know things
to be different on other platforms (e.g. Win32), I will note the
differences. If you find something that was missed, please let me know.
=head1 SPECIAL NOTES
=head2 make
This tutorial assumes that the make program that Perl is configured to
use is called C<make>. Instead of running "make" in the examples that
follow, you may have to substitute whatever make program Perl has been
configured to use. Running B<perl -V:make> should tell you what it is.
=head2 Version caveat
When writing a Perl extension for general consumption, one should expect that
the extension will be used with versions of Perl different from the
version available on your machine. Since you are reading this document,
the version of Perl on your machine is probably 5.005 or later, but the users
of your extension may have more ancient versions.
To understand what kinds of incompatibilities one may expect, and in the rare
case that the version of Perl on your machine is older than this document,
see the section on "Troubleshooting these Examples" for more information.
If your extension uses some features of Perl which are not available on older
releases of Perl, your users would appreciate an early meaningful warning.
You would probably put this information into the F<README> file, but nowadays
installation of extensions may be performed automatically, guided by the
F<CPAN.pm> module or other tools.
In MakeMaker-based installations, F<Makefile.PL> provides the earliest
opportunity to perform version checks. One can put something like this
in F<Makefile.PL> for this purpose:
    eval { require 5.007 }
        or die <<EOD;
    ############
    ### This module uses frobnication framework which is not available
    ### before version 5.007 of Perl. Upgrade your Perl before
    ### installing Kara::Mba.
    ############
    EOD
=head2 Dynamic Loading versus Static Loading
It is commonly thought that if a system does not have the capability to
dynamically load a library, you cannot build XSUBs. This is incorrect.
You I<can> build them, but you must link the XSUB subroutines with the
rest of Perl, creating a new executable. This situation is similar to
Perl 4.
This tutorial can still be used on such a system. The XSUB build mechanism
will check the system and build a dynamically-loadable library if possible,
or else a static library and then, optionally, a new statically-linked
executable with that static library linked in.
Should you wish to build a statically-linked executable on a system which
can dynamically load libraries, you may, in all the following examples,
where the command "C<make>" with no arguments is executed, run the command
"C<make perl>" instead.
If you have generated such a statically-linked executable by choice, then
instead of saying "C<make test>", you should say "C<make test_static>".
On systems that cannot build dynamically-loadable libraries at all, simply
saying "C<make test>" is sufficient.
=head2 Threads and PERL_NO_GET_CONTEXT
For threaded builds, perl requires the context pointer for the current
thread. Without C<PERL_NO_GET_CONTEXT>, perl will call a function to
retrieve the context.
For improved performance, include:
    #define PERL_NO_GET_CONTEXT
as shown below.
For more details, see L<perlguts|perlguts/How multiple interpreters
and concurrency are supported>.
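The cost being avoided can be pictured with a plain C analogy. This is a
sketch only: the names C<toy_interp>, C<toy_get_context>, C<bump_implicit> and
C<bump_explicit> are hypothetical and are not part of the Perl API. Roughly
speaking, without C<PERL_NO_GET_CONTEXT> each API call behaves like
C<bump_implicit> and looks the context up again; with it, the context is
handed along as an argument, as in C<bump_explicit>.

```c
/* Hypothetical stand-in for an interpreter context. */
typedef struct toy_interp {
    int calls;    /* work done through this context */
    int lookups;  /* how many times the context had to be looked up */
} toy_interp;

toy_interp the_interp = { 0, 0 };

/* Stands in for the function that retrieves the current thread's
 * context; a real lookup would consult thread-specific storage. */
toy_interp *toy_get_context(void)
{
    the_interp.lookups++;
    return &the_interp;
}

/* Implicit style: every call pays for a lookup. */
void bump_implicit(void)
{
    toy_get_context()->calls++;
}

/* Explicit style: the caller already holds the pointer. */
void bump_explicit(toy_interp *ctx)
{
    ctx->calls++;
}
```

Calling C<bump_explicit> many times after a single lookup does the same work
as the implicit version with far fewer lookups.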
=head1 TUTORIAL
Now let's go on with the show!
=head2 EXAMPLE 1
Our first extension will be very simple. When we call the routine in the
extension, it will print out a well-known message and return.
Run "C<h2xs -A -n Mytest>". This creates a directory named Mytest,
possibly under ext/ if that directory exists in the current working
directory. Several files will be created under the Mytest dir, including
MANIFEST, Makefile.PL, lib/Mytest.pm, Mytest.xs, t/Mytest.t, and Changes.
The MANIFEST file contains the names of all the files just created in the
Mytest directory.
The file Makefile.PL should look something like this:
    use ExtUtils::MakeMaker;
    # See lib/ExtUtils/MakeMaker.pm for details of how to influence
    # the contents of the Makefile that is written.
    WriteMakefile(
        NAME         => 'Mytest',
        VERSION_FROM => 'lib/Mytest.pm', # finds $VERSION
        LIBS         => [''],   # e.g., '-lm'
        DEFINE       => '',     # e.g., '-DHAVE_SOMETHING'
        INC          => '',     # e.g., '-I/usr/include/other'
    );
The file Mytest.pm should start with something like this:
    package Mytest;
    use 5.008008;
    use strict;
    use warnings;
    require Exporter;
    our @ISA = qw(Exporter);
    our %EXPORT_TAGS = ( 'all' => [ qw(
    ) ] );
    our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
    our @EXPORT = qw(
    );
    our $VERSION = '0.01';
    require XSLoader;
    XSLoader::load('Mytest', $VERSION);
    # Preloaded methods go here.
    1;
    __END__
    # Below is the stub of documentation for your module. You better
    # edit it!
The rest of the .pm file contains sample code for providing documentation for
the extension.
Finally, the Mytest.xs file should look something like this:
    #define PERL_NO_GET_CONTEXT
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"
    #include "ppport.h"
    MODULE = Mytest         PACKAGE = Mytest
Let's edit the .xs file by adding this to the end of the file:
    void
    hello()
        CODE:
            printf("Hello, world!\n");
It is okay for the lines starting at the "CODE:" line to not be indented.
However, for readability purposes, it is suggested that you indent CODE:
one level and the lines following one more level.
Now we'll run "C<perl Makefile.PL>". This will create a real Makefile,
which make needs. Its output looks something like:
    % perl Makefile.PL
    Checking if your kit is complete...
    Looks good
    Writing Makefile for Mytest
    %
Now, running make will produce output that looks something like this (some
long lines have been shortened for clarity and some extraneous lines have
been deleted):
    % make
    cp lib/Mytest.pm blib/lib/Mytest.pm
    perl xsubpp -typemap typemap Mytest.xs > Mytest.xsc && \
        mv Mytest.xsc Mytest.c
    Please specify prototyping behavior for Mytest.xs (see perlxs manual)
    cc -c Mytest.c
    Running Mkbootstrap for Mytest ()
    chmod 644 Mytest.bs
    rm -f blib/arch/auto/Mytest/Mytest.so
    cc -shared -L/usr/local/lib Mytest.o -o blib/arch/auto/Mytest/Mytest.so
    chmod 755 blib/arch/auto/Mytest/Mytest.so
    cp Mytest.bs blib/arch/auto/Mytest/Mytest.bs
    chmod 644 blib/arch/auto/Mytest/Mytest.bs
    Manifying blib/man3/Mytest.3pm
    %
You can safely ignore the line about "prototyping behavior" - it is
explained in L<perlxs/"The PROTOTYPES: Keyword">.
Perl has its own special way of easily writing test scripts, but for this
example only, we'll create our own test script. Create a file called hello
that looks like this:
    #! /opt/perl5/bin/perl
    use ExtUtils::testlib;
    use Mytest;
    Mytest::hello();
Now we make the script executable (C<chmod +x hello>), run the script
and we should see the following output:
    % ./hello
    Hello, world!
    %
=head2 EXAMPLE 2
Now let's add to our extension a subroutine that will take a single numeric
argument as input and return 1 if the number is even or 0 if the number
is odd.
Add the following to the end of Mytest.xs:
    int
    is_even(input)
            int input
        CODE:
            RETVAL = (input % 2 == 0);
        OUTPUT:
            RETVAL
There does not need to be whitespace at the start of the "C<int input>"
line, but it is useful for improving readability. Placing a semi-colon at
the end of that line is also optional. Any amount and kind of whitespace
may be placed between the "C<int>" and "C<input>".
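Stripped of the XS keywords, the body of this XSUB is ordinary C. As a sketch
of what it computes, here is a stand-alone C function using the same
expression. It is for illustration only and is not the code that B<xsubpp>
generates.

```c
/* Returns 1 if input is even, 0 if it is odd. */
int is_even(int input)
{
    /* In C a comparison yields the int 1 (true) or 0 (false),
     * which is why the XSUB can assign it straight to RETVAL. */
    return (input % 2 == 0);
}
```

In the XSUB the same value is stored in RETVAL, and the OUTPUT: section tells
B<xsubpp> to hand it back to Perl as the return value.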
Now rebuild the shared library by performing the same steps as before:
generate a Makefile from the Makefile.PL file, and run make.
In order to test that our extension works, we now need to look at the
file Mytest.t. This file is set up to imitate the same kind of testing
structure that Perl itself has. Within the test script, you perform a
number of tests to confirm the behavior of the extension, printing "ok"
when the test is correct, "not ok" when it is not.
    use Test::More tests => 4;
    BEGIN { use_ok('Mytest') };
    #########################
    # Insert your test code below, the Test::More module is use()ed here
    # so read its man page ( perldoc Test::More ) for help writing this
    # test script.
    is(&Mytest::is_even(0), 1);
    is(&Mytest::is_even(1), 0);
    is(&Mytest::is_even(2), 1);
We will be calling the test script through the command "C<make test>". You
should see output that looks something like this:
    % make test
    PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"
    "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
    t/Mytest....ok
    All tests successful.
    Files=1, Tests=4, 0 wallclock secs ( 0.03 cusr + 0.00 csys = 0.03 CPU)
    %
=head2 What has gone on?
The program h2xs is the starting point for creating extensions. In later
examples we'll see how we can use h2xs to read header files and generate
templates to connect to C routines.
h2xs creates a number of files in the extension directory. The file
Makefile.PL is a perl script which will generate a true Makefile to build
the extension. We'll take a closer look at it later.
The .pm and .xs files contain the meat of the extension. The .xs file holds
the C routines that make up the extension. The .pm file contains routines
that tell Perl how to load your extension.
Generating the Makefile and running C<make> created a directory called blib
(which stands for "build library") in the current working directory. This
directory will contain the shared library that we will build. Once we have
tested it, we can install it into its final location.
Invoking the test script via "C<make test>" did something very important.
It invoked perl with the F<blib/lib> and F<blib/arch> directories passed to
the test harness so that it could find the various files that are part of the
extension. It is I<very> important that, while you are still testing
extensions, you use "C<make test>". If you try to run the test script all by
itself, you will get a fatal error.
Another reason it is important to use "C<make test>" to run your test
script is that if you are testing an upgrade to an already-existing version,
using "C<make test>" ensures that you will test your new extension, not the
already-existing version.
When Perl sees a C<use extension;>, it searches for a file with the same name
as the C<use>'d extension that has a .pm suffix. If that file cannot be found,
Perl dies with a fatal error. The default search path is contained in the
C<@INC> array.
In our case, Mytest.pm tells perl that it will need the Exporter and XSLoader
modules. It then sets the C<@ISA> and C<@EXPORT> arrays and the
C<$VERSION> scalar; finally it tells perl to bootstrap the module. Perl
will call its dynamic loader routine (if there is one) and load the shared
library.
The two arrays C<@ISA> and C<@EXPORT> are very important. The C<@ISA>
array contains a list of other packages in which to search for methods (or
subroutines) that do not exist in the current package. This is usually
only important for object-oriented extensions (which we will talk about
much later), and so usually doesn't need to be modified.
The C<@EXPORT> array tells Perl which of the extension's variables and
subroutines should be placed into the calling package's namespace. Because
you don't know if the user has already used your variable and subroutine
names, it's vitally important to carefully select what to export. Do I<not>
export method or variable names I<by default> without a good reason.
As a general rule, if the module is trying to be object-oriented then don't
export anything. If it's just a collection of functions and variables, then
you can export them via another array, called C<@EXPORT_OK>. This array
does not automatically place its subroutine and variable names into the
namespace unless the user specifically requests that this be done.
See L<perlmod> for more information.
The C<$VERSION> variable is used to ensure that the .pm file and the shared
library are "in sync" with each other. Any time you make changes to
the .pm or .xs files, you should increment the value of this variable.
=head2 Writing good test scripts
The importance of writing good test scripts cannot be over-emphasized. You
should closely follow the "ok/not ok" style that Perl itself uses, so that
it is very easy and unambiguous to determine the outcome of each test case.
When you find and fix a bug, make sure you add a test case for it.
By running "C<make test>", you ensure that your Mytest.t script runs and uses
the correct version of your extension. If you have many test cases,
save your test files in the "t" directory and use the suffix ".t".
When you run "C<make test>", all of these test files will be executed.
=head2 EXAMPLE 3
Our third extension will take one argument as its input, round off that
value, and set the I<argument> to the rounded value.
Add the following to the end of Mytest.xs:
    void
    round(arg)
            double arg
        CODE:
            if (arg > 0.0) {
                    arg = floor(arg + 0.5);
            } else if (arg < 0.0) {
                    arg = ceil(arg - 0.5);
            } else {
                    arg = 0.0;
            }
        OUTPUT:
            arg
Edit the Makefile.PL file so that the corresponding line looks like this:
        LIBS         => ['-lm'],   # e.g., '-lm'
Generate the Makefile and run make. Change the test number in Mytest.t to
"9" and add the following tests:
    $i = -1.5; &Mytest::round($i); is( $i, -2.0 );
    $i = -1.1; &Mytest::round($i); is( $i, -1.0 );
    $i = 0.0; &Mytest::round($i); is( $i, 0.0 );
    $i = 0.5; &Mytest::round($i); is( $i, 1.0 );
    $i = 1.2; &Mytest::round($i); is( $i, 1.0 );
Running "C<make test>" should now print out that all nine tests are okay.
Notice that in these new test cases, the argument passed to round was a
scalar variable. You might be wondering if you can round a constant or
literal. To see what happens, temporarily add the following line to Mytest.t:
    &Mytest::round(3);
Run "C<make test>" and notice that Perl dies with a fatal error. Perl won't
let you change the value of constants!
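The same modify-the-argument pattern can be sketched in plain C by passing a
pointer. This is only an illustration: C<round_in_place> is a hypothetical
name, and to keep the sketch free of the libm dependency it rounds by
truncating through a cast rather than calling floor and ceil, so it matches
the XSUB only for values that fit in a C<long long>.

```c
/* Rounds *arg to the nearest whole number, halves away from zero,
 * and stores the result back through the pointer.
 * Assumes the value fits in a long long. */
void round_in_place(double *arg)
{
    if (*arg > 0.0) {
        /* truncation toward zero acts like floor() for positive values */
        *arg = (double)(long long)(*arg + 0.5);
    } else if (*arg < 0.0) {
        /* truncation toward zero acts like ceil() for negative values */
        *arg = (double)(long long)(*arg - 0.5);
    } else {
        *arg = 0.0;
    }
}
```

The caller must supply a variable for the function to write into. A literal
such as 3 has no address to pass, which is loosely comparable to Perl refusing
to modify a constant.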
=head2 What's new here?
=over 4
=item *
We've made some changes to Makefile.PL. In this case, we've specified an
extra library to be linked into the extension's shared library, the math
library libm in this case. We'll talk later about how to write XSUBs that
can call every routine in a library.
=item *
The value of the function is not being passed back as the function's return
value, but by changing the value of the variable that was passed into the
function. You might have guessed that when you saw that the return value
of round is of type "void".
=back
=head2 Input and Output Parameters
You specify the parameters that will be passed into the XSUB on the line(s)
after you declare the function's return value and name. Each input parameter
line starts with optional whitespace, and may have an optional terminating
semicolon.
The list of output parameters occurs at the very end of the function, just
after the OUTPUT: directive. The use of RETVAL tells Perl that you
wish to send this value back as the return value of the XSUB function. In
Example 3, we wanted the "return value" placed in the original variable
which we passed in, so we listed it (and not RETVAL) in the OUTPUT: section.
=head2 The XSUBPP Program
The B<xsubpp> program takes the XS code in the .xs file and translates it into
C code, placing it in a file whose suffix is .c. The C code created makes
heavy use of the C functions within Perl.
=head2 The TYPEMAP file
The B<xsubpp> program uses rules to convert from Perl's data types (scalar,
array, etc.) to C's data types (int, char, etc.). These rules are stored
in the typemap file ($PERLLIB/ExtUtils/typemap). There's a brief discussion
below, but all the nitty-gritty details can be found in L<perlxstypemap>.
If you have a new-enough version of perl (5.16 and up) or an upgraded
XS compiler (C<ExtUtils::ParseXS> 3.13_01 or better), then you can inline
typemaps in your XS instead of writing separate files.
Either way, this typemap thing is split into three parts:
The first section maps various C data types to a name, which corresponds
somewhat with the various Perl types. The second section contains C code
which B<xsubpp> uses to handle input parameters. The third section contains
C code which B<xsubpp> uses to handle output parameters.
Let's take a look at a portion of the .c file created for our extension.
The file name is Mytest.c:
XS(XS_Mytest_round)
{
dXSARGS;
if (items != 1)
Perl_croak(aTHX_ "Usage: Mytest::round(arg)");
PERL_UNUSED_VAR(cv); /* -W */
{
double arg = (double)SvNV(ST(0)); /* XXXXX */
if (arg > 0.0) {
arg = floor(arg + 0.5);
} else if (arg < 0.0) {
arg = ceil(arg - 0.5);
} else {
arg = 0.0;
}
sv_setnv(ST(0), (double)arg); /* XXXXX */
SvSETMAGIC(ST(0));
}
XSRETURN_EMPTY;
}
Notice the two lines commented with "XXXXX". If you check the first part
of the typemap file (or section), you'll see that doubles are of type
T_DOUBLE. In the INPUT part of the typemap, an argument that is T_DOUBLE
is obtained by calling the routine SvNV on the stack entry, casting the
result to double, and assigning it to the variable arg. Similarly,
in the OUTPUT section, once arg has its final value, it is passed to the
sv_setnv function to be passed back to the calling subroutine. These two
functions are explained in L<perlguts>; we'll talk more later about what
that "ST(0)" means in the section on the argument stack.
=head2 Warning about Output Arguments
In general, it's not a good idea to write extensions that modify their input
parameters, as in Example 3. Instead, you should probably return multiple
values in an array and let the caller handle them (we'll do this in a later
example). However, in order to better accommodate calling pre-existing C
routines, which often do modify their input parameters, this behavior is
tolerated.
=head2 EXAMPLE 4
In this example, we'll now begin to write XSUBs that will interact with
pre-defined C libraries. To begin with, we will build a small library of
our own, then let h2xs write our .pm and .xs files for us.
Create a new directory called Mytest2 at the same level as the directory
Mytest. In the Mytest2 directory, create another directory called mylib,
and cd into that directory.
Here we'll create some files that will generate a test library. These will
include a C source file and a header file. We'll also create a Makefile.PL
in this directory. Then we'll make sure that running make at the Mytest2
level will automatically run this Makefile.PL file and the resulting Makefile.
In the mylib directory, create a file mylib.h that looks like this:
#define TESTVAL 4
extern double foo(int, long, const char*);
Also create a file mylib.c that looks like this:
#include <stdlib.h>
#include "./mylib.h"
double
foo(int a, long b, const char *c)
{
return (a + b + atof(c) + TESTVAL);
}
And finally create a file Makefile.PL that looks like this:
use ExtUtils::MakeMaker;
$Verbose = 1;
WriteMakefile(
NAME => 'Mytest2::mylib',
SKIP => [qw(all static static_lib dynamic dynamic_lib)],
clean => {'FILES' => 'libmylib$(LIB_EXT)'},
);
sub MY::top_targets {
'
all :: static
pure_all :: static
static :: libmylib$(LIB_EXT)
libmylib$(LIB_EXT): $(O_FILES)
$(AR) cr libmylib$(LIB_EXT) $(O_FILES)
$(RANLIB) libmylib$(LIB_EXT)
';
}
Make sure you use a tab and not spaces on the lines beginning with "$(AR)"
and "$(RANLIB)". Make will not function properly if you use spaces.
It has also been reported that the "cr" argument to $(AR) is unnecessary
on Win32 systems.
We will now create the main top-level Mytest2 files. Change to the directory
above Mytest2 and run the following command:
% h2xs -O -n Mytest2 ./Mytest2/mylib/mylib.h
This will print out a warning about overwriting Mytest2, but that's okay.
Our files are stored in Mytest2/mylib, and will be untouched.
The normal Makefile.PL that h2xs generates doesn't know about the mylib
directory. We need to tell it that there is a subdirectory and that we
will be generating a library in it. Let's add the argument MYEXTLIB to
the WriteMakefile call so that it looks like this:
WriteMakefile(
'NAME' => 'Mytest2',
'VERSION_FROM' => 'Mytest2.pm', # finds $VERSION
'LIBS' => [''], # e.g., '-lm'
'DEFINE' => '', # e.g., '-DHAVE_SOMETHING'
'INC' => '', # e.g., '-I/usr/include/other'
'MYEXTLIB' => 'mylib/libmylib$(LIB_EXT)',
);
and then at the end add a subroutine (which will override the pre-existing
subroutine). Remember to use a tab character to indent the line beginning
with "cd"!
sub MY::postamble {
'
$(MYEXTLIB): mylib/Makefile
cd mylib && $(MAKE) $(PASSTHRU)
';
}
Let's also fix the MANIFEST file so that it accurately reflects the contents
of our extension. The single line that says "mylib" should be replaced by
the following three lines:
mylib/Makefile.PL
mylib/mylib.c
mylib/mylib.h
To keep our namespace nice and unpolluted, edit the .pm file and change
the variable C<@EXPORT> to C<@EXPORT_OK>. Finally, in the
.xs file, edit the #include line to read:
#include "mylib/mylib.h"
And also add the following function definition to the end of the .xs file:
double
foo(a,b,c)
int a
long b
const char * c
OUTPUT:
RETVAL
Now we also need to create a typemap because the default Perl doesn't
currently support the C<const char *> type. Include a new TYPEMAP
section in your XS code before the above function:
TYPEMAP: <<END
const char * T_PV
END
Now run perl on the top-level Makefile.PL. Notice that it also created a
Makefile in the mylib directory. Run make and watch that it does cd into
the mylib directory and run make in there as well.
Now edit the Mytest2.t script and change the number of tests to "4",
and add the following lines to the end of the script:
is( &Mytest2::foo(1, 2, "Hello, world!"), 7 );
is( &Mytest2::foo(1, 2, "0.0"), 7 );
ok( abs(&Mytest2::foo(0, 0, "-3.4") - 0.6) <= 0.01 );
(When dealing with floating-point comparisons, it is best not to check for
equality, but rather to check that the difference between the expected and
actual results is below a certain amount, called epsilon, which is 0.01 in
this case.)
Run "C<make test>" and all should be well. There are some warnings on missing
tests for the Mytest2::mylib extension, but you can ignore them.
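The expected values in these tests follow directly from the arithmetic in
mylib.c. As a minimal sketch in plain C (no XS involved), the code below
repeats C<foo()> and C<TESTVAL> from this example and adds a helper,
C<nearly_equal()>, to show the epsilon comparison that the test script
performs. The helper name is an assumption made for illustration; it is
not part of the extension.

```c
#include <stdlib.h>   /* atof */

#define TESTVAL 4

/* Same body as foo() in mylib.c from this example. */
double
foo(int a, long b, const char *c)
{
    return (a + b + atof(c) + TESTVAL);
}

/* Hypothetical helper: true when x and y differ by at most epsilon. */
int
nearly_equal(double x, double y, double epsilon)
{
    double diff = x - y;

    if (diff < 0.0)
        diff = -diff;
    return diff <= epsilon;
}
```

Because C<atof()> returns 0.0 for a string that does not start with a
number, C<foo(1, 2, "Hello, world!")> and C<foo(1, 2, "0.0")> both yield 7.
The third test value is only approximately 0.6, so it is better checked
with something like C<nearly_equal(foo(0, 0, "-3.4"), 0.6, 0.01)>.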
=head2 What has happened here?
Unlike previous examples, we've now run h2xs on a real include file. This
has caused some extra goodies to appear in both the .pm and .xs files.
=over 4
=item *
In the .xs file, there's now a #include directive with the absolute path to
the mylib.h header file. We changed this to a relative path so that we
could move the extension directory if we wanted to.
=item *
There's now some new C code that's been added to the .xs file. The purpose
of the C<constant> routine is to make the values that are #define'd in the
header file accessible by the Perl script (by calling either C<TESTVAL> or
C<&Mytest2::TESTVAL>). There's also some XS code to allow calls to the
C<constant> routine.
=item *
The .pm file originally exported the name C<TESTVAL> in the C<@EXPORT> array.
This could lead to name clashes. A good rule of thumb is that if the #define
is only going to be used by the C routines themselves, and not by the user,
they should be removed from the C<@EXPORT> array. Alternately, if you don't
mind using the "fully qualified name" of a variable, you could move most
or all of the items from the C<@EXPORT> array into the C<@EXPORT_OK> array.
=item *
If our include file had contained #include directives, these would not have
been processed by h2xs. There is no good solution to this right now.
=item *
We've also told Perl about the library that we built in the mylib
subdirectory. That required only the addition of the C<MYEXTLIB> variable
to the WriteMakefile call and the replacement of the postamble subroutine
to cd into the subdirectory and run make. The Makefile.PL for the
library is a bit more complicated, but not excessively so. Here we
replaced the top_targets subroutine to insert our own code. This code
simply specified that the library to be created here was a static archive
library (as opposed to a dynamically loadable library) and provided the
commands to build it.
=back
=head2 Anatomy of .xs file
The .xs file of L<"EXAMPLE 4"> contained some new elements. To understand
the meaning of these elements, pay attention to the line which reads
MODULE = Mytest2 PACKAGE = Mytest2
Anything before this line is plain C code which describes which headers
to include, and defines some convenience functions. No translations are
performed on this part: apart from having embedded POD documentation
skipped over (see L<perlpod>), it goes into the generated output C file as is.
Anything after this line is the description of XSUB functions.
These descriptions are translated by B<xsubpp> into C code which
implements these functions using Perl calling conventions, and which
makes these functions visible from the Perl interpreter.
Pay special attention to the function C<constant>. This name appears
twice in the generated .xs file: once in the first part, as a static C
function, then another time in the second part, when an XSUB interface to
this static C function is defined.
This is quite typical for .xs files: usually the .xs file provides
an interface to an existing C function. Then this C function is defined
somewhere (either in an external library, or in the first part of .xs file),
and a Perl interface to this function (i.e. "Perl glue") is described in the
second part of .xs file. The situation in L<"EXAMPLE 1">, L<"EXAMPLE 2">,
and L<"EXAMPLE 3">, when all the work is done inside the "Perl glue", is
somewhat of an exception rather than the rule.
=head2 Getting the fat out of XSUBs
In L<"EXAMPLE 4"> the second part of .xs file contained the following
description of an XSUB:
double
foo(a,b,c)
int a
long b
const char * c
OUTPUT:
RETVAL
Note that in contrast with L<"EXAMPLE 1">, L<"EXAMPLE 2"> and L<"EXAMPLE 3">,
this description does not contain the actual I<code> for what is done
during a call to Perl function foo(). To understand what is going
on here, one can add a CODE section to this XSUB:
double
foo(a,b,c)
int a
long b
const char * c
CODE:
RETVAL = foo(a,b,c);
OUTPUT:
RETVAL
However, these two XSUBs produce almost identical generated C code: the
B<xsubpp> compiler is smart enough to figure out the C<CODE:> section from
the first two lines of the description of the XSUB. What about the
C<OUTPUT:> section? In fact, the same applies! The C<OUTPUT:> section can
be removed as well, I<as long as no C<CODE:> section or C<PPCODE:> section>
is specified: B<xsubpp> can see that it needs to generate a function call
section, and will autogenerate the OUTPUT section too. Thus one can
shorten the XSUB to:
double
foo(a,b,c)
int a
long b
const char * c
Can we do the same with an XSUB
int
is_even(input)
int input
CODE:
RETVAL = (input % 2 == 0);
OUTPUT:
RETVAL
of L<"EXAMPLE 2">? To do this, one needs to define a C function C<int
is_even(int input)>. As we saw in L<"Anatomy of .xs file">, a proper place
for this definition is in the first part of the .xs file. In fact a C function
int
is_even(int arg)
{
return (arg % 2 == 0);
}
is probably overkill for this. Something as simple as a C<#define> will
do too:
#define is_even(arg) ((arg) % 2 == 0)
After having this in the first part of .xs file, the "Perl glue" part becomes
as simple as
int
is_even(input)
int input
This technique of separation of the glue part from the workhorse part has
obvious tradeoffs: if you want to change a Perl interface, you need to
change two places in your code. However, it removes a lot of clutter,
and makes the workhorse part independent from idiosyncrasies of Perl calling
convention. (In fact, there is nothing Perl-specific in the above description:
a different version of B<xsubpp> might have translated this to TCL glue or
Python glue as well.)
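To make the "workhorse" half of this split concrete, here is a small sketch
in plain C showing both forms discussed above side by side. The names
C<is_even_func> and C<forms_agree> are assumptions introduced only so that
the function and the macro can coexist in one file; in a real .xs file you
would keep just one of them under the name C<is_even>.

```c
/* Workhorse written as a real C function. */
int
is_even_func(int arg)
{
    return (arg % 2 == 0);
}

/* Workhorse written as a macro. The parentheses around arg keep an
 * expression such as is_even(n + 1) grouped as intended. */
#define is_even(arg) ((arg) % 2 == 0)

/* Hypothetical caller standing in for the generated glue: returns 1
 * when both forms give the same answer for the given input. */
int
forms_agree(int input)
{
    return is_even_func(input) == is_even(input);
}
```

Either form can sit in the first part of the .xs file. The glue only needs
something callable as C<is_even(input)> that yields an C<int>.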
=head2 More about XSUB arguments
With the completion of Example 4, we now have an easy way to simulate some
real-life libraries whose interfaces may not be the cleanest in the world.
We shall now continue with a discussion of the arguments passed to the
B<xsubpp> compiler.
When you specify arguments to routines in the .xs file, you are really
passing three pieces of information for each argument listed. The first
piece is the order of that argument relative to the others (first, second,
etc). The second is the type of argument, and consists of the type
declaration of the argument (e.g., int, char*, etc). The third piece is
the calling convention for the argument in the call to the library function.
While Perl passes arguments to functions by reference,
C passes arguments by value; to implement a C function which modifies data
of one of the "arguments", the actual argument of this C function would be
a pointer to the data. Thus two C functions with declarations
int string_length(char *s);
int upper_case_char(char *cp);
may have completely different semantics: the first one may inspect an array
of chars pointed to by s, and the second one may immediately dereference C<cp>
and manipulate C<*cp> only (using the return value as, say, a success
indicator). From Perl one would use these functions in
a completely different manner.
One conveys this info to B<xsubpp> by replacing C<*> before the
argument by C<&>. C<&> means that the argument should be passed to a library
function by its address. The above two functions may be XSUB-ified as
int
string_length(s)
char * s
int
upper_case_char(cp)
char &cp
For example, consider:
int
foo(a,b)
char &a
char * b
The first Perl argument to this function would be treated as a char and
assigned to the variable a, and its address would be passed into the function
foo. The second Perl argument would be treated as a string pointer and assigned
to the variable b. The I<value> of b would be passed into the function foo.
The actual call to the function foo that B<xsubpp> generates would look like
this:
foo(&a, b);
B<xsubpp> will parse the following function argument lists identically:
char &a
char&a
char & a
However, to help ease understanding, it is suggested that you place a "&"
next to the variable name (and away from the variable type), and place a
"*" near the variable type, but away from the variable name (as in the
call to foo above). By doing so, it is easy to understand exactly what
will be passed to the C function; it will be whatever is in the "last
column".
You should take great pains to try to pass the function the type of variable
it wants, when possible. It will save you a lot of trouble in the long run.
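The difference between the two calling conventions is easier to see with
bodies behind the two declarations used earlier in this section. The
implementations below are only a sketch: the source gives the prototypes
of C<string_length> and C<upper_case_char>, and the bodies are assumptions
chosen to match the semantics described there.

```c
#include <ctype.h>    /* islower, toupper */
#include <string.h>   /* strlen */

/* Inspects the whole array of chars that s points to. */
int
string_length(char *s)
{
    return (int)strlen(s);
}

/* Dereferences cp and manipulates *cp only. The return value is a
 * success indicator: 1 if the character was changed, 0 otherwise. */
int
upper_case_char(char *cp)
{
    if (islower((unsigned char)*cp)) {
        *cp = (char)toupper((unsigned char)*cp);
        return 1;
    }
    return 0;
}
```

With C<char &cp> in the XSUB, the glue holds a plain C<char> and calls
C<upper_case_char(&cp)>; with C<char * s>, it passes the pointer value
itself, as in C<string_length(s)>.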
=head2 The Argument Stack
If you look at the C code generated by any of the examples except
example 1, you will notice a number of references to ST(n), where n is
usually 0. "ST" is actually a macro that points to the n'th argument
on the argument stack. ST(0) is thus the first argument on the stack and
therefore the first argument passed to the XSUB, ST(1) is the second
argument, and so on.
When you list the arguments to the XSUB in the .xs file, that tells B<xsubpp>
which argument corresponds to which position on the argument stack (i.e., the first
one listed is the first argument, and so on). You invite disaster if you
do not list them in the same order as the function expects them.
The actual values on the argument stack are pointers to the values passed
in. When an argument is listed as being an OUTPUT value, its corresponding
value on the stack (i.e., ST(0) if it was the first argument) is changed.
You can verify this by looking at the C code generated for Example 3.
The code for the round() XSUB routine contains lines that look like this:
double arg = (double)SvNV(ST(0));
/* Round the contents of the variable arg */
sv_setnv(ST(0), (double)arg);
The arg variable is initially set by taking the value from ST(0), then is
stored back into ST(0) at the end of the routine.
XSUBs are also allowed to return lists, not just scalars. This must be
done by manipulating stack values ST(0), ST(1), etc, in a subtly
different way. See L<perlxs> for details.
XSUBs are also allowed to avoid automatic conversion of Perl function arguments
to C function arguments. See L<perlxs> for details. Some people prefer
manual conversion by inspecting C<ST(i)> even in the cases when automatic
conversion will do, arguing that this makes the logic of an XSUB call clearer.
Compare with L<"Getting the fat out of XSUBs"> for a similar tradeoff of
a complete separation of "Perl glue" and "workhorse" parts of an XSUB.
While experts may argue about these idioms, a novice to Perl guts may
prefer a way which is as little Perl-guts-specific as possible, meaning
automatic conversion and automatic call generation, as in
L<"Getting the fat out of XSUBs">. This approach has the additional
benefit of protecting the XSUB writer from future changes to the Perl API.
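The read-then-write-back pattern on a stack slot can be imitated in plain C.
The following is a toy model only, not Perl's implementation: the names
C<toy_stack>, C<TOY_ST> and C<toy_round> are hypothetical, the slots hold
pointers to doubles rather than real SV pointers, and the rounding uses a
cast to C<long> instead of floor() and ceil(), which gives the same result
only for values that fit in a C<long>.

```c
/* Toy argument stack: each slot points at a caller-owned value. */
double *toy_stack[8];

/* Toy counterpart of ST(n): the n'th slot of the toy stack. */
#define TOY_ST(n) (toy_stack[(n)])

/* Toy counterpart of the round() XSUB from Example 3. */
void
toy_round(void)
{
    double arg = *TOY_ST(0);              /* input: read slot 0 */

    if (arg > 0.0)
        arg = (double)(long)(arg + 0.5);  /* truncation acts as floor here */
    else if (arg < 0.0)
        arg = (double)(long)(arg - 0.5);  /* truncation acts as ceil here */
    else
        arg = 0.0;

    *TOY_ST(0) = arg;                     /* output: write slot 0 back */
}
```

After setting C<toy_stack[0]> to the address of a double and calling
C<toy_round()>, the caller's own variable holds the rounded value, which is
the effect that listing an argument under OUTPUT: has in a real XSUB.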
=head2 Extending your Extension
Sometimes you might want to provide some extra methods or subroutines
to assist in making the interface between Perl and your extension simpler
or easier to understand. These routines should live in the .pm file.
Whether they are automatically loaded when the extension itself is loaded
or only loaded when called depends on where in the .pm file the subroutine
definition is placed. You can also consult L<AutoLoader> for an alternate
way to store and load your extra subroutines.
=head2 Documenting your Extension
There is absolutely no excuse for not documenting your extension.
Documentation belongs in the .pm file. This file will be fed to pod2man,
and the embedded documentation will be converted to the manpage format,
then placed in the blib directory. It will be copied to Perl's
manpage directory when the extension is installed.
You may intersperse documentation and Perl code within the .pm file.
In fact, if you want to use method autoloading, you must do this,
as the comment inside the .pm file explains.
See L<perlpod> for more information about the pod format.
=head2 Installing your Extension
Once your extension is complete and passes all its tests, installing it
is quite simple: you simply run "make install". You will either need
to have write permission into the directories where Perl is installed,
or ask your system administrator to run the make for you.
Alternately, you can specify the exact directory to place the extension's
files by placing a "PREFIX=/destination/directory" after the make install
(or in between the make and install if you have a brain-dead version of make).
This can be very useful if you are building an extension that will eventually
be distributed to multiple systems. You can then just archive the files in
the destination directory and distribute them to your destination systems.
=head2 EXAMPLE 5
In this example, we'll do some more work with the argument stack. The
previous examples have all returned only a single value. We'll now
create an extension that returns an array.
This extension is very Unix-oriented (struct statfs and the statfs system
call). If you are not running on a Unix system, you can substitute for
statfs any other function that returns multiple values, you can hard-code
values to be returned to the caller (although this will be a bit harder
to test the error case), or you can simply not do this example. If you
change the XSUB, be sure to fix the test cases to match the changes.
Return to the Mytest directory and add the following code to the end of
Mytest.xs:
void
statfs(path)
char * path
INIT:
int i;
struct statfs buf;
PPCODE:
i = statfs(path, &buf);
if (i == 0) {
XPUSHs(sv_2mortal(newSVnv(buf.f_bavail)));
XPUSHs(sv_2mortal(newSVnv(buf.f_bfree)));
XPUSHs(sv_2mortal(newSVnv(buf.f_blocks)));
XPUSHs(sv_2mortal(newSVnv(buf.f_bsize)));
XPUSHs(sv_2mortal(newSVnv(buf.f_ffree)));
XPUSHs(sv_2mortal(newSVnv(buf.f_files)));
XPUSHs(sv_2mortal(newSVnv(buf.f_type)));
} else {
XPUSHs(sv_2mortal(newSVnv(errno)));
}
You'll also need to add the following code to the top of the .xs file, just
after the include of "XSUB.h":
#include <sys/vfs.h>
Also add the following code segment to Mytest.t while incrementing the "9"
tests to "11":
@a = &Mytest::statfs("/blech");
ok( scalar(@a) == 1 && $a[0] == 2 );
@a = &Mytest::statfs("/");
is( scalar(@a), 7 );
=head2 New Things in this Example
This example added quite a few new concepts. We'll take them one at a time.
=over 4
=item *
The INIT: directive contains code that will be placed immediately after
the argument stack is decoded. C (before C99) does not allow variable
declarations at arbitrary locations inside a function,
so this is usually the best way to declare local variables needed by the XSUB.
(Alternatively, one could put the whole C<PPCODE:> section into braces, and
put these declarations on top.)
=item *
This routine also returns a different number of arguments depending on the
success or failure of the call to statfs. If there is an error, the error
number is returned as a single-element array. If the call is successful,
then a 7-element array is returned. Since only one argument is passed into
this function, we need room on the stack to hold the 7 values which may be
returned.
We do this by using the PPCODE: directive, rather than the CODE: directive.
This tells B<xsubpp> that we will be managing the return values that will be
put on the argument stack by ourselves.
=item *
When we want to place values to be returned to the caller onto the stack,
we use the series of macros that begin with "XPUSH". There are five
different versions, for placing integers, unsigned integers, doubles,
strings, and Perl scalars on the stack. In our example, we placed a
Perl scalar onto the stack. (In fact this is the only macro which
can be used to return multiple values.)
The XPUSH* macros will automatically extend the return stack to prevent
it from being overrun. You push values onto the stack in the order you
want them seen by the calling program.
=item *
The values pushed onto the return stack of the XSUB are actually mortal SV's.
They are made mortal so that once the values are copied by the calling
program, the SV's that held the returned values can be deallocated.
If they were not mortal, then they would continue to exist after the XSUB
routine returned, but would not be accessible. This is a memory leak.
=item *
If we were interested in performance, not in code compactness, in the success
branch we would not use C<XPUSHs> macros, but C<PUSHs> macros, and would
pre-extend the stack before pushing the return values:
EXTEND(SP, 7);
The tradeoff is that one needs to calculate the number of return values
in advance (though overextending the stack will not typically hurt
anything but memory consumption).
Similarly, in the failure branch we could use C<PUSHs> I<without> extending
the stack: the Perl function reference comes to an XSUB on the stack, thus
the stack is I<always> large enough to take one return value.
=back
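The variable-length return described in this list can be modeled without
any Perl headers. The sketch below assumes a Linux system with
F<sys/vfs.h>, as the example itself does, and uses a hypothetical function
name, C<statfs_values>. It fills a plain array in the same order in which
the XSUB pushes values and reports how many it stored.

```c
#include <errno.h>
#include <sys/vfs.h>   /* struct statfs, statfs(); Linux-specific header */

/* Fills out[] the way the statfs XSUB fills the return stack and
 * returns how many values were stored: 7 on success, 1 on failure. */
int
statfs_values(const char *path, double out[7])
{
    struct statfs buf;

    if (statfs(path, &buf) != 0) {
        out[0] = (double)errno;       /* single-element "list" */
        return 1;
    }
    out[0] = (double)buf.f_bavail;
    out[1] = (double)buf.f_bfree;
    out[2] = (double)buf.f_blocks;
    out[3] = (double)buf.f_bsize;
    out[4] = (double)buf.f_ffree;
    out[5] = (double)buf.f_files;
    out[6] = (double)buf.f_type;
    return 7;
}
```

Because the count is known on each branch before anything is stored, the
same structure is what makes pre-extending with C<EXTEND(SP, 7)> and using
C<PUSHs> possible in the real XSUB.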
=head2 EXAMPLE 6
In this example, we will accept a reference to an array as an input
parameter, and return a reference to an array of hashes. This will
demonstrate manipulation of complex Perl data types from an XSUB.
This extension is somewhat contrived. It is based on the code in
the previous example. It calls the statfs function multiple times,
accepting a reference to an array of filenames as input, and returning
a reference to an array of hashes containing the data for each of the
filesystems.
Return to the Mytest directory and add the following code to the end of
Mytest.xs:
SV *
multi_statfs(paths)
SV * paths
INIT:
AV * results;
SSize_t numpaths = 0, n;
int i;
struct statfs buf;
SvGETMAGIC(paths);
if ((!SvROK(paths))
|| (SvTYPE(SvRV(paths)) != SVt_PVAV)
|| ((numpaths = av_top_index((AV *)SvRV(paths))) < 0))
{
XSRETURN_UNDEF;
}
results = (AV *)sv_2mortal((SV *)newAV());
CODE:
for (n = 0; n <= numpaths; n++) {
HV * rh;
STRLEN l;
char * fn = SvPV(*av_fetch((AV *)SvRV(paths), n, 0), l);
i = statfs(fn, &buf);
if (i != 0) {
av_push(results, newSVnv(errno));
continue;
}
rh = (HV *)sv_2mortal((SV *)newHV());
hv_store(rh, "f_bavail", 8, newSVnv(buf.f_bavail), 0);
hv_store(rh, "f_bfree", 7, newSVnv(buf.f_bfree), 0);
hv_store(rh, "f_blocks", 8, newSVnv(buf.f_blocks), 0);
hv_store(rh, "f_bsize", 7, newSVnv(buf.f_bsize), 0);
hv_store(rh, "f_ffree", 7, newSVnv(buf.f_ffree), 0);
hv_store(rh, "f_files", 7, newSVnv(buf.f_files), 0);
hv_store(rh, "f_type", 6, newSVnv(buf.f_type), 0);
av_push(results, newRV_inc((SV *)rh));
}
RETVAL = newRV_inc((SV *)results);
OUTPUT:
RETVAL
And add the following code to Mytest.t, while incrementing the "11"
tests to "13":
$results = Mytest::multi_statfs([ '/', '/blech' ]);
ok( ref $results->[0] );
ok( ! ref $results->[1] );
=head2 New Things in this Example
There are a number of new concepts introduced here, described below:
=over 4
=item *
This function does not use a typemap. Instead, we declare it as accepting
one SV* (scalar) parameter, and returning an SV* value, and we take care of
populating these scalars within the code. Because we are only returning
one value, we don't need a C<PPCODE:> directive - instead, we use C<CODE:>
and C<OUTPUT:> directives.
=item *
When dealing with references, it is important to handle them with caution.
The C<INIT:> block first calls SvGETMAGIC(paths), in case
paths is a tied variable. Then it checks that C<SvROK> returns
true, which indicates that paths is a valid reference. (Simply
checking C<SvROK> won't trigger FETCH on a tied variable.) It
then verifies that the object referenced by paths is an array, using C<SvRV>
to dereference paths, and C<SvTYPE> to discover its type. As an added test,
it checks that the array referenced by paths is non-empty, using the
C<av_top_index> function (which returns -1 if the array is empty). The
XSRETURN_UNDEF macro is used to abort the XSUB and return the undefined value
unless all three of these conditions are met.
=item *
We manipulate several arrays in this XSUB. Note that an array is represented
internally by an AV* pointer. The functions and macros for manipulating
arrays are similar to the functions in Perl: C<av_top_index> returns the
highest index in an AV*, much like $#array; C<av_fetch> fetches a single scalar
value from an array, given its index; C<av_push> pushes a scalar value onto the
end of the array, automatically extending the array as necessary.
Specifically, we read pathnames one at a time from the input array, and
store the results in an output array (results) in the same order. If
statfs fails, the element pushed onto the return array is the value of
errno after the failure. If statfs succeeds, though, the value pushed
onto the return array is a reference to a hash containing some of the
information in the statfs structure.
As with the return stack, it would be possible (and a small performance win)
to pre-extend the return array before pushing data into it, since we know
how many elements we will return:
av_extend(results, numpaths);
=item *
We are performing only one hash operation in this function, which is storing
a new scalar under a key using C<hv_store>. A hash is represented by an HV*
pointer. Like arrays, the functions for manipulating hashes from an XSUB
mirror the functionality available from Perl. See L<perlguts> and L<perlapi>
for details.
=item *
To create a reference, we use the C<newRV_inc> function. Note that you can
cast an AV* or an HV* to type SV* in this case (and many others). This
allows you to take references to arrays, hashes and scalars with the same
function. Conversely, the C<SvRV> function always returns an SV*, which may
need to be cast to the appropriate type if it is something other than a
scalar (check with C<SvTYPE>).
=item *
At this point, xsubpp is doing very little work - the differences between
Mytest.xs and Mytest.c are minimal.
=back
=head2 EXAMPLE 7 (Coming Soon)
XPUSH args AND set RETVAL AND assign return value to array
=head2 EXAMPLE 8 (Coming Soon)
Setting $!
=head2 EXAMPLE 9 Passing open files to XSes
You would think passing files to an XS is difficult, with all the
typeglobs and stuff. Well, it isn't.
Suppose that for some strange reason we need a wrapper around the
standard C library function C<fputs()>. This is all we need:
#define PERLIO_NOT_STDIO 0
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include <stdio.h>
int
fputs(s, stream)
char * s
FILE * stream
The real work is done in the standard typemap.
B<But> you lose all the fine stuff done by the perlio layers. This
calls the stdio function C<fputs()>, which knows nothing about them.
The standard typemap offers three variants of PerlIO *:
C<InputStream> (T_IN), C<InOutStream> (T_INOUT) and C<OutputStream>
(T_OUT). A bare C<PerlIO *> is considered a T_INOUT. If it matters
in your code (see below for why it might) #define or typedef
one of the specific names and use that as the argument or result
type in your XS file.
The standard typemap does not contain PerlIO * before perl 5.7,
but it has the three stream variants. Using a PerlIO * directly
is not backwards compatible unless you provide your own typemap.
For streams coming I<from> perl the main difference is that
C<OutputStream> will get the output PerlIO * - which may make
a difference on a socket. Like in our example...
For streams being handed I<to> perl a new file handle is created
(i.e. a reference to a new glob) and associated with the PerlIO *
provided. If the read/write state of the PerlIO * is not correct then you
may get errors or warnings when the file handle is used.
So if you opened the PerlIO * as "w" it should really be an
C<OutputStream>; if opened as "r" it should be an C<InputStream>.
Now, suppose you want to use perlio layers in your XS. We'll use the
perlio C<PerlIO_puts()> function as an example.
In the C part of the XS file (above the first MODULE line) you
have
#define OutputStream PerlIO *
or
typedef PerlIO * OutputStream;
And this is the XS code:
int
perlioputs(s, stream)
char * s
OutputStream stream
CODE:
RETVAL = PerlIO_puts(stream, s);
OUTPUT:
RETVAL
We have to use a C<CODE> section because C<PerlIO_puts()> has the arguments
reversed compared to C<fputs()>, and we want to keep the arguments the same.
Wanting to explore this thoroughly, we want to use the stdio C<fputs()>
on a PerlIO *. This means we have to ask the perlio system for a stdio
C<FILE *>:
int
perliofputs(s, stream)
char * s
OutputStream stream
PREINIT:
FILE *fp = PerlIO_findFILE(stream);
CODE:
if (fp != (FILE*) 0) {
RETVAL = fputs(s, fp);
} else {
RETVAL = -1;
}
OUTPUT:
RETVAL
Note: C<PerlIO_findFILE()> will search the layers for a stdio
layer. If it can't find one, it will call C<PerlIO_exportFILE()> to
generate a new stdio C<FILE>. Please only call C<PerlIO_exportFILE()> if
you want a I<new> C<FILE>. It will generate one on each call and push a
new stdio layer. So don't call it repeatedly on the same
file. C<PerlIO_findFILE()> will retrieve the stdio layer once it has been
generated by C<PerlIO_exportFILE()>.
This applies to the perlio system only. For versions before 5.7,
C<PerlIO_exportFILE()> is equivalent to C<PerlIO_findFILE()>.
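The two ideas in the XS code of this example, restoring the C<fputs()>
argument order and reporting -1 when no stdio stream is available, can be
sketched in plain C. The function C<stream_first_puts> below is a
hypothetical stand-in for a stream-first API such as C<PerlIO_puts()>, and
C<checked_puts> is an illustrative name; neither is part of perlio.

```c
#include <stdio.h>

/* Hypothetical stand-in for a stream-first API: same job as fputs(),
 * but with the arguments in the other order. */
int
stream_first_puts(FILE *stream, const char *s)
{
    return fputs(s, stream);
}

/* Wrapper that keeps the fputs() argument order and, like the
 * perliofputs XSUB, reports -1 when no usable stream is available. */
int
checked_puts(const char *s, FILE *stream)
{
    if (stream == NULL)
        return -1;
    return stream_first_puts(stream, s);
}
```

A caller can then write C<checked_puts("hello\n", stdout)> and treat a
negative result as failure, just as with C<fputs()> itself.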
=head2 Troubleshooting these Examples
As mentioned at the top of this document, if you are having problems with
these example extensions, you might see if any of these help you.
=over 4
=item *
In versions of 5.002 prior to the gamma version, the test script in Example
1 will not function properly. You need to change the "use lib" line to
read:
use lib './blib';
=item *
In versions of 5.002 prior to version 5.002b1h, the test.pl file was not
automatically created by h2xs. This means that you cannot say "make test"
to run the test script. You will need to add the following line before the
"use extension" statement:
use lib './blib';
=item *
In versions 5.000 and 5.001, instead of using the above line, you will need
to use the following line:
BEGIN { unshift(@INC, "./blib") }
=item *
This document assumes that the executable named "perl" is Perl version 5.
Some systems may have installed Perl version 5 as "perl5".
=back
=head1 See also
For more information, consult L<perlguts>, L<perlapi>, L<perlxs>, L<perlmod>,
and L<perlpod>.
=head1 Author
Jeff Okamoto <F<okamoto@corp.hp.com>>
Reviewed and assisted by Dean Roehrich, Ilya Zakharevich, Andreas Koenig,
and Tim Bunce.
PerlIO material contributed by Lupe Christoph, with some clarification
by Nick Ing-Simmons.
Changes for h2xs as of Perl 5.8.x by Renee Baecker
=head2 Last Changed
2012-01-20
=head1 NAME
X<syntax>
perlsyn - Perl syntax
=head1 DESCRIPTION
A Perl program consists of a sequence of declarations and statements
which run from the top to the bottom. Loops, subroutines, and other
control structures allow you to jump around within the code.
Perl is a B<free-form> language: you can format and indent it however
you like. Whitespace serves mostly to separate tokens, unlike
languages like Python where it is an important part of the syntax,
or Fortran where it is immaterial.
Many of Perl's syntactic elements are B<optional>. Rather than
requiring you to put parentheses around every function call and
declare every variable, you can often leave such explicit elements off
and Perl will figure out what you meant. This is known as B<Do What I
Mean>, abbreviated B<DWIM>. It allows programmers to be B<lazy> and to
code in a style with which they are comfortable.
Perl B<borrows syntax> and concepts from many languages: awk, sed, C,
Bourne Shell, Smalltalk, Lisp and even English. Other
languages have borrowed syntax from Perl, particularly its regular
expression extensions. So if you have programmed in another language
you will see familiar pieces in Perl. They often work the same, but
see L<perltrap> for information about how they differ.
=head2 Declarations
X<declaration> X<undef> X<undefined> X<uninitialized>
The only things you need to declare in Perl are report formats and
subroutines (and sometimes not even subroutines). A scalar variable holds
the undefined value (C<undef>) until it has been assigned a defined
value, which is anything other than C<undef>. When used as a number,
C<undef> is treated as C<0>; when used as a string, it is treated as
the empty string, C<"">; and when used as a reference that isn't being
assigned to, it is treated as an error. If you enable warnings,
you'll be notified of an uninitialized value whenever you treat
C<undef> as a string or a number. Well, usually. Boolean contexts,
such as:
if ($a) {}
are exempt from warnings (because they care about truth rather than
definedness). Operators such as C<++>, C<-->, C<+=>,
C<-=>, and C<.=>, that operate on undefined variables such as:
undef $a;
$a++;
are also always exempt from such warnings.
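A minimal sketch of those rules, assuming warnings are enabled. The
variable names and the C<__WARN__> handler are only there for
illustration; the handler collects warnings so you can count them:

```perl
use strict;
use warnings;

# Collect warnings instead of printing them (illustrative only).
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $count;               # holds undef until assigned
$count++;                # exempt: no "uninitialized" warning

my $text;
$text .= "abc";          # exempt as well

my $unset;
my $sum = $unset + 1;    # undef used as a number: treated as 0, but warns

print "count=$count text=$text sum=$sum warnings=", scalar(@warnings), "\n";
```

Only the addition should produce a warning here; the C<++> and C<.=>
operations stay silent.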
A declaration can be put anywhere a statement can, but has no effect on
the execution of the primary sequence of statements: declarations all
take effect at compile time. All declarations are typically put at
the beginning or the end of the script. However, if you're using
lexically-scoped private variables created with C<my()>,
C<state()>, or C<our()>, you'll have to make sure
your format or subroutine definition is within the same block scope
as the C<my> if you expect to be able to access those private variables.
Declaring a subroutine allows a subroutine name to be used as if it were a
list operator from that point forward in the program. You can declare a
subroutine without defining it by saying C<sub name>, thus:
X<subroutine, declaration>
sub myname;
$me = myname $0 or die "can't get myname";
A bare declaration like that declares the function to be a list operator,
not a unary operator, so you have to be careful to use parentheses (or
C<or> instead of C<||>). The C<||> operator binds too tightly to use after
list operators; it becomes part of the last element. You can always use
parentheses around the list operator's arguments to turn the list operator
back into something that behaves more like a function call. Alternatively,
you can use the prototype C<($)> to turn the subroutine into a unary
operator:
sub myname ($);
$me = myname $0 || die "can't get myname";
That now parses as you'd expect, but you still ought to get in the habit of
using parentheses in that situation. For more on prototypes, see
L<perlsub>.
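The difference in binding can be seen in a small sketch. The helpers
C<first_arg> and C<first_unary> are hypothetical; each just returns its
first argument:

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; both return their first argument.
sub first_arg      { return $_[0] }
sub first_unary ($) { return $_[0] }

# With parentheses, || applies to the result of the call.
my $with_parens = first_arg(0) || "fallback";

# As a list operator, the call swallows (0 || "default") as its argument.
my $without_parens = first_arg 0 || "default";

# The low-precedence "or" applies to the whole assignment instead.
my $status  = "ok";
my $with_or = first_arg 0 or $status = "call returned false";

# With the ($) prototype the sub parses as a unary operator,
# so || again applies to the result of the call.
my $unary = first_unary 0 || "fallback";

print "$with_parens $without_parens $with_or $unary\n";
```

Writing the parentheses, as in the first call, avoids having to remember
any of this.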
Subroutine declarations can also be loaded up with the C<require> statement
or both loaded and imported into your namespace with a C<use> statement.
See L<perlmod> for details on this.
A statement sequence may contain declarations of lexically-scoped
variables, but apart from declaring a variable name, the declaration acts
like an ordinary statement, and is elaborated within the sequence of
statements as if it were an ordinary statement. That means it actually
has both compile-time and run-time effects.
=head2 Comments
X<comment> X<#>
Text from a C<"#"> character until the end of the line is a comment,
and is ignored. Exceptions include C<"#"> inside a string or regular
expression.
=head2 Simple Statements
X<statement> X<semicolon> X<expression> X<;>
The only kind of simple statement is an expression evaluated for its
side-effects. Every simple statement must be terminated with a
semicolon, unless it is the final statement in a block, in which case
the semicolon is optional. But put the semicolon in anyway if the
block takes up more than one line, because you may eventually add
another line. Note that there are operators like C<eval {}>, C<sub {}>, and
C<do {}> that I<look> like compound statements, but aren't--they're just
TERMs in an expression--and thus need an explicit termination when used
as the last item in a statement.
=head2 Truth and Falsehood
X<truth> X<falsehood> X<true> X<false> X<!> X<not> X<negation> X<0>
The number 0, the strings C<'0'> and C<"">, the empty list C<()>, and
C<undef> are all false in a boolean context. All other values are true.
Negation of a true value by C<!> or C<not> returns a special false value.
When evaluated as a string it is treated as C<"">, but as a number, it
is treated as 0. Most Perl operators
that return true or false behave this way.
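As a quick sketch of these rules (the variable names are made up for
the example):

```perl
use strict;
use warnings;

# The scalar values the text lists as false.
my @false_values = (0, '0', '', undef);
my @seen_true    = grep { $_ } @false_values;    # nothing passes

# Only '0' and '' are false strings; these are all true.
my @true_values = ('0.0', '00', ' ', 'a');
my @all_true    = grep { $_ } @true_values;

# Negating a true value gives the special false value.
my $false     = !1;
my $as_string = "$false";      # the empty string
my $as_number = $false + 0;    # 0, without a numeric warning

printf "%d false values were true, %d true values were true\n",
    scalar(@seen_true), scalar(@all_true);
```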
=head2 Statement Modifiers
X<statement modifier> X<modifier> X<if> X<unless> X<while>
X<until> X<when> X<foreach> X<for>
Any simple statement may optionally be followed by a I<SINGLE> modifier,
just before the terminating semicolon (or block ending). The possible
modifiers are:
if EXPR
unless EXPR
while EXPR
until EXPR
for LIST
foreach LIST
when EXPR
The C<EXPR> following the modifier is referred to as the "condition".
Its truth or falsehood determines how the modifier will behave.
C<if> executes the statement once I<if> and only if the condition is
true. C<unless> is the opposite: it executes the statement I<unless>
the condition is true (that is, if the condition is false).
print "Basset hounds got long ears" if length $ear >= 10;
go_outside() and play() unless $is_raining;
The C<for(each)> modifier is an iterator: it executes the statement once
for each item in the LIST (with C<$_> aliased to each item in turn).
print "Hello $_!\n" for qw(world Dolly nurse);
C<while> repeats the statement I<while> the condition is true.
C<until> does the opposite: it repeats the statement I<until> the
condition is true (or while the condition is false):
# Both of these count from 0 to 10.
print $i++ while $i <= 10;
print $j++ until $j > 10;
The C<while> and C<until> modifiers have the usual "C<while> loop"
semantics (conditional evaluated first), except when applied to a
C<do>-BLOCK (or to the Perl4 C<do>-SUBROUTINE statement), in
which case the block executes once before the conditional is
evaluated.
This is so that you can write loops like:
do {
$line = <STDIN>;
...
} until !defined($line) || $line eq ".\n"
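To see the difference between a plain statement and a C<do>-BLOCK under
the same modifier, here is a small sketch with made-up counters:

```perl
use strict;
use warnings;

my $never = 0;

my $plain_runs = 0;
$plain_runs++ while $never;          # condition tested first: never runs

my $do_runs = 0;
do { $do_runs++ } while $never;      # block runs once before the test

my $n = 5;
my @countdown;
do { push @countdown, $n-- } until $n <= 3;

print "plain=$plain_runs do=$do_runs countdown=@countdown\n";
```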
See L<perlfunc/do>. Note also that the loop control statements described
later will I<NOT> work in this construct, because modifiers don't take
loop labels. Sorry. You can always put another block inside of it
(for C<next>/C<redo>) or around it (for C<last>) to do that sort of thing.
X<next> X<last> X<redo>
For C<next> or C<redo>, just double the braces:
    do {{
        next if $x == $y;
        # do something here
    }} until $x++ > $z;
For C<last>, you have to be more elaborate and put braces around it:
X<last>
    {
        do {
            last if $x == $y**2;
            # do something here
        } while $x++ <= $z;
    }
If you need both C<next> and C<last>, you have to do both and also use a
loop label:
    LOOP: {
        do {{
            next if $x == $y;
            last LOOP if $x == $y**2;
            # do something here
        }} until $x++ > $z;
    }
B<NOTE:> The behaviour of a C<my>, C<state>, or
C<our> modified with a statement modifier conditional
or loop construct (for example, C<my $x if ...>) is
B<undefined>. The value of the C<my> variable may be C<undef>, any
previously assigned value, or possibly anything else. Don't rely on
it. Future versions of perl might do something different from the
version of perl you try it out on. Here be dragons.
X<my>
The C<when> modifier is an experimental feature that first appeared in Perl
5.14. To use it, you should include a C<use v5.14> declaration.
(Technically, it requires only the C<switch> feature, but that aspect of it
was not available before 5.14.) Operative only from within a C<foreach>
loop or a C<given> block, it executes the statement only if the smartmatch
C<< $_ ~~ I<EXPR> >> is true. If the statement executes, it is followed by
a C<next> from inside a C<foreach> and C<break> from inside a C<given>.
Under the current implementation, the C<foreach> loop can be
anywhere within the C<when> modifier's dynamic scope, but must be
within the C<given> block's lexical scope. This restriction may
be relaxed in a future release. See L</"Switch Statements"> below.
=head2 Compound Statements
X<statement, compound> X<block> X<bracket, curly> X<curly bracket> X<brace>
X<{> X<}> X<if> X<unless> X<given> X<while> X<until> X<foreach> X<for> X<continue>
In Perl, a sequence of statements that defines a scope is called a block.
Sometimes a block is delimited by the file containing it (in the case
of a required file, or the program as a whole), and sometimes a block
is delimited by the extent of a string (in the case of an eval).
But generally, a block is delimited by curly brackets, also known as braces.
We will call this syntactic construct a BLOCK.
The following compound statements may be used to control flow:
if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
if (EXPR) BLOCK elsif (EXPR) BLOCK ...
if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
unless (EXPR) BLOCK
unless (EXPR) BLOCK else BLOCK
unless (EXPR) BLOCK elsif (EXPR) BLOCK ...
unless (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
given (EXPR) BLOCK
LABEL while (EXPR) BLOCK
LABEL while (EXPR) BLOCK continue BLOCK
LABEL until (EXPR) BLOCK
LABEL until (EXPR) BLOCK continue BLOCK
LABEL for (EXPR; EXPR; EXPR) BLOCK
LABEL for VAR (LIST) BLOCK
LABEL for VAR (LIST) BLOCK continue BLOCK
LABEL foreach (EXPR; EXPR; EXPR) BLOCK
LABEL foreach VAR (LIST) BLOCK
LABEL foreach VAR (LIST) BLOCK continue BLOCK
LABEL BLOCK
LABEL BLOCK continue BLOCK
PHASE BLOCK
The experimental C<given> statement is I<not automatically enabled>; see
L</"Switch Statements"> below for how to do so, and the attendant caveats.
Unlike in C and Pascal, in Perl these are all defined in terms of BLOCKs,
not statements. This means that the curly brackets are I<required>--no
dangling statements allowed. If you want to write conditionals without
curly brackets, there are several other ways to do it. The following
all do the same thing:
if (!open(FOO)) { die "Can't open $FOO: $!" }
die "Can't open $FOO: $!" unless open(FOO);
open(FOO) || die "Can't open $FOO: $!";
open(FOO) ? () : die "Can't open $FOO: $!";
# a bit exotic, that last one
The C<if> statement is straightforward. Because BLOCKs are always
bounded by curly brackets, there is never any ambiguity about which
C<if> an C<else> goes with. If you use C<unless> in place of C<if>,
the sense of the test is reversed. Like C<if>, C<unless> can be followed
by C<else>. C<unless> can even be followed by one or more C<elsif>
statements, though you may want to think twice before using that particular
language construct, as everyone reading your code will have to think at least
twice before they can understand what's going on.
The C<while> statement executes the block as long as the expression is
L<true|/"Truth and Falsehood">.
The C<until> statement executes the block as long as the expression is
false.
The LABEL is optional, and if present, consists of an identifier followed
by a colon. The LABEL identifies the loop for the loop control
statements C<next>, C<last>, and C<redo>.
If the LABEL is omitted, the loop control statement
refers to the innermost enclosing loop. This may include dynamically
looking back your call-stack at run time to find the LABEL. Such
desperate behavior triggers a warning if you use the C<use warnings>
pragma or the B<-w> flag.
If there is a C<continue> BLOCK, it is always executed just before the
conditional is about to be evaluated again. Thus it can be used to
increment a loop variable, even when the loop has been continued via
the C<next> statement.
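A minimal sketch of that behaviour, using a made-up counter: the
C<continue> block runs even on iterations cut short by C<next>, so the
loop still terminates.

```perl
use strict;
use warnings;

my $i = 0;
my @kept;
while ($i < 6) {
    next if $i % 2;      # skip odd numbers; the continue block still runs
    push @kept, $i;
} continue {
    $i++;                # runs just before the condition is tested again
}

print "kept: @kept, final i: $i\n";
```

Without the C<continue> block, the increment would have to be repeated
before every C<next>.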
When a block is preceded by a compilation phase keyword such as C<BEGIN>,
C<END>, C<INIT>, C<CHECK>, or C<UNITCHECK>, then the block will run only
during the corresponding phase of execution. See L<perlmod> for more details.
Extension modules can also hook into the Perl parser to define new
kinds of compound statements. These are introduced by a keyword which
the extension recognizes, and the syntax following the keyword is
defined entirely by the extension. If you are an implementor, see
L<perlapi/PL_keyword_plugin> for the mechanism. If you are using such
a module, see the module's documentation for details of the syntax that
it defines.
=head2 Loop Control
X<loop control> X<loop, control> X<next> X<last> X<redo> X<continue>
The C<next> command starts the next iteration of the loop:
LINE: while (<STDIN>) {
next LINE if /^#/; # discard comments
...
}
The C<last> command immediately exits the loop in question. The
C<continue> block, if any, is not executed:
LINE: while (<STDIN>) {
last LINE if /^$/; # exit when done with header
...
}
The C<redo> command restarts the loop block without evaluating the
conditional again. The C<continue> block, if any, is I<not> executed.
This command is normally used by programs that want to lie to themselves
about what was just input.
For example, when processing a file like F</etc/termcap>, if your input
lines might end in backslashes to indicate continuation, you want to skip
ahead and get the next record.
    while (<>) {
        chomp;
        if (s/\\$//) {
            $_ .= <>;
            redo unless eof();
        }
        # now process $_
    }
which is Perl shorthand for the more explicitly written version:
    LINE: while (defined($line = <ARGV>)) {
        chomp($line);
        if ($line =~ s/\\$//) {
            $line .= <ARGV>;
            redo LINE unless eof(); # not eof(ARGV)!
        }
        # now process $line
    }
Note that if there were a C<continue> block on the above code, it would
get executed only on lines discarded by the regex (since redo skips the
continue block). A continue block is often used to reset line counters
or C<m?pat?> one-time matches:
# inspired by :1,$g/fred/s//WILMA/
while (<>) {
m?(fred)? && s//WILMA $1 WILMA/;
m?(barney)? && s//BETTY $1 BETTY/;
m?(homer)? && s//MARGE $1 MARGE/;
} continue {
print "$ARGV $.: $_";
close ARGV if eof; # reset $.
reset if eof; # reset ?pat?
}
If the word C<while> is replaced by the word C<until>, the sense of the
test is reversed, but the conditional is still tested before the first
iteration.
Loop control statements don't work in an C<if> or C<unless>, since
they aren't loops. You can double the braces to make them such, though.
if (/pattern/) {{
last if /fred/;
next if /barney/; # same effect as "last",
# but doesn't document as well
# do something here
}}
This is caused by the fact that a block by itself acts as a loop that
executes once; see L</"Basic BLOCKs">.
The form C<while/if BLOCK BLOCK>, available in Perl 4, is no longer
available. Replace any occurrence of C<if BLOCK> by C<if (do BLOCK)>.
=head2 For Loops
X<for> X<foreach>
Perl's C-style C<for> loop works like the corresponding C<while> loop;
that means that this:
for ($i = 1; $i < 10; $i++) {
...
}
is the same as this:
$i = 1;
while ($i < 10) {
...
} continue {
$i++;
}
There is one minor difference: if variables are declared with C<my>
in the initialization section of the C<for>, the lexical scope of
those variables is exactly the C<for> loop (the body of the loop
and the control sections).
X<my>
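A short sketch of that scoping, with an outer variable that
deliberately shares the loop variable's name:

```perl
use strict;
use warnings;

my $i = "outer";
my @seen;
for (my $i = 0; $i < 3; $i++) {
    push @seen, $i;      # this is the loop's own $i
}

# The loop's $i no longer exists here; the outer one is untouched.
print "seen: @seen, outer i: $i\n";
```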
As a special case, if the test in the C<for> loop (or the corresponding
C<while> loop) is empty, it is treated as true. That is, both
for (;;) {
...
}
and
while () {
...
}
are treated as infinite loops.
Besides the normal array index looping, C<for> can lend itself
to many other interesting applications. Here's one that avoids the
problem you get into if you explicitly test for end-of-file on
an interactive file descriptor causing your program to appear to
hang.
X<eof> X<end-of-file> X<end of file>
$on_a_tty = -t STDIN && -t STDOUT;
sub prompt { print "yes? " if $on_a_tty }
for ( prompt(); <STDIN>; prompt() ) {
# do something
}
Using C<readline> (or the operator form, C<< <EXPR> >>) as the
conditional of a C<for> loop is shorthand for the following. This
behaviour is the same as a C<while> loop conditional.
X<readline> X<< <> >>
for ( prompt(); defined( $_ = <STDIN> ); prompt() ) {
# do something
}
=head2 Foreach Loops
X<for> X<foreach>
The C<foreach> loop iterates over a normal list value and sets the scalar
variable VAR to be each element of the list in turn. If the variable
is preceded with the keyword C<my>, then it is lexically scoped, and
is therefore visible only within the loop. Otherwise, the variable is
implicitly local to the loop and regains its former value upon exiting
the loop. If the variable was previously declared with C<my>, it uses
that variable instead of the global one, but it's still localized to
the loop. This implicit localization occurs I<only> in a C<foreach>
loop.
X<my> X<local>
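Here is a sketch of the implicit localization for a package variable.
The names C<$item> and C<current_item> are hypothetical:

```perl
use strict;
use warnings;

our $item = "before";                 # a package (global) variable
sub current_item { return $item }     # reads the global

my @inside;
for $item (qw(a b)) {
    push @inside, current_item();     # sees the loop's current value
}

# On exit the variable regains its former value.
print "inside: @inside, after: $item\n";
```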
The C<foreach> keyword is actually a synonym for the C<for> keyword, so
you can use either. If VAR is omitted, C<$_> is set to each value.
X<$_>
If any element of LIST is an lvalue, you can modify it by modifying
VAR inside the loop. Conversely, if any element of LIST is NOT an
lvalue, any attempt to modify that element will fail. In other words,
the C<foreach> loop index variable is an implicit alias for each item
in the list that you're looping over.
X<alias>
If any part of LIST is an array, C<foreach> will get very confused if
you add or remove elements within the loop body, for example with
C<splice>. So don't do that.
X<splice>
C<foreach> probably won't do what you expect if VAR is a tied or other
special variable. Don't do that either.
As of Perl 5.22, there is an experimental variant of this loop that accepts
a variable preceded by a backslash for VAR, in which case the items in the
LIST must be references. The backslashed variable will become an alias
to each referenced item in the LIST, which must be of the correct type.
The variable needn't be a scalar in this case, and the backslash may be
followed by C<my>. To use this form, you must enable the C<refaliasing>
feature via C<use feature>. (See L<feature>. See also L<perlref/Assigning
to References>.)
Examples:
for (@ary) { s/foo/bar/ }
for my $elem (@elements) {
$elem *= 2;
}
for $count (reverse(1..10), "BOOM") {
print $count, "\n";
sleep(1);
}
for (1..15) { print "Merry Christmas\n"; }
foreach $item (split(/:[\\\n:]*/, $ENV{TERMCAP})) {
print "Item: $item\n";
}
use feature "refaliasing";
no warnings "experimental::refaliasing";
foreach \my %hash (@array_of_hash_references) {
# do something with each %hash
}
Here's how a C programmer might code up a particular algorithm in Perl:
    for (my $i = 0; $i < @ary1; $i++) {
        for (my $j = 0; $j < @ary2; $j++) {
            if ($ary1[$i] > $ary2[$j]) {
                last; # can't go to outer :-(
            }
            $ary1[$i] += $ary2[$j];
        }
        # this is where that last takes me
    }
Whereas here's how a Perl programmer more comfortable with the idiom might
do it:
    OUTER: for my $wid (@ary1) {
        INNER: for my $jet (@ary2) {
            next OUTER if $wid > $jet;
            $wid += $jet;
        }
    }
See how much easier this is? It's cleaner, safer, and faster. It's
cleaner because it's less noisy. It's safer because if code gets added
between the inner and outer loops later on, the new code won't be
accidentally executed. The C<next> explicitly iterates the other loop
rather than merely terminating the inner one. And it's faster because
Perl executes a C<foreach> statement more rapidly than it would the
equivalent C<for> loop.
Perceptive Perl hackers may have noticed that a C<for> loop has a return
value, and that this value can be captured by wrapping the loop in a C<do>
block. The reward for this discovery is this cautionary advice: The
return value of a C<for> loop is unspecified and may change without notice.
Do not rely on it.
=head2 Basic BLOCKs
X<block>
A BLOCK by itself (labeled or not) is semantically equivalent to a
loop that executes once. Thus you can use any of the loop control
statements in it to leave or restart the block. (Note that this is
I<NOT> true in C<eval{}>, C<sub{}>, or contrary to popular belief
C<do{}> blocks, which do I<NOT> count as loops.) The C<continue>
block is optional.
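For instance, a bare block can serve as a small retry loop. This is
only a sketch; the label and the counter are made up:

```perl
use strict;
use warnings;

my $attempts = 0;
my $result;
RETRY: {
    $attempts++;
    redo RETRY if $attempts < 3;    # restart the block from the top
    $result = "done after $attempts attempts";
    last RETRY;                     # leave the block early
    $result = "never reached";
}

print "$result\n";
```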
The BLOCK construct can be used to emulate case structures.
SWITCH: {
if (/^abc/) { $abc = 1; last SWITCH; }
if (/^def/) { $def = 1; last SWITCH; }
if (/^xyz/) { $xyz = 1; last SWITCH; }
$nothing = 1;
}
You'll also find the C<foreach> loop used to create a topicalizer
and a switch:
SWITCH:
for ($var) {
if (/^abc/) { $abc = 1; last SWITCH; }
if (/^def/) { $def = 1; last SWITCH; }
if (/^xyz/) { $xyz = 1; last SWITCH; }
$nothing = 1;
}
Such constructs are quite frequently used, both because older versions of
Perl had no official C<switch> statement, and also because the new version
described immediately below remains experimental and can sometimes be confusing.
=head2 Switch Statements
X<switch> X<case> X<given> X<when> X<default>
Starting from Perl 5.10.1 (well, 5.10.0, but it didn't work
right), you can say
use feature "switch";
to enable an experimental switch feature. This is loosely based on an
old version of a Perl 6 proposal, but it no longer resembles the Perl 6
construct. You also get the switch feature whenever you declare that your
code prefers to run under a version of Perl that is 5.10 or later. For
example:
use v5.14;
Under the "switch" feature, Perl gains the experimental keywords
C<given>, C<when>, C<default>, C<continue>, and C<break>.
Starting from Perl 5.16, one can prefix the switch
keywords with C<CORE::> to access the feature without a C<use feature>
statement. The keywords C<given> and
C<when> are analogous to C<switch> and
C<case> in other languages -- though C<continue> is not -- so the code
in the previous section could be rewritten as
use v5.10.1;
for ($var) {
when (/^abc/) { $abc = 1 }
when (/^def/) { $def = 1 }
when (/^xyz/) { $xyz = 1 }
default { $nothing = 1 }
}
The C<foreach> is the non-experimental way to set a topicalizer.
If you wish to use the highly experimental C<given>, that could be
written like this:
use v5.10.1;
given ($var) {
when (/^abc/) { $abc = 1 }
when (/^def/) { $def = 1 }
when (/^xyz/) { $xyz = 1 }
default { $nothing = 1 }
}
As of 5.14, that can also be written this way:
use v5.14;
for ($var) {
$abc = 1 when /^abc/;
$def = 1 when /^def/;
$xyz = 1 when /^xyz/;
default { $nothing = 1 }
}
Or if you don't care to play it safe, like this:
use v5.14;
given ($var) {
$abc = 1 when /^abc/;
$def = 1 when /^def/;
$xyz = 1 when /^xyz/;
default { $nothing = 1 }
}
The arguments to C<given> and C<when> are in scalar context,
and C<given> assigns the C<$_> variable its topic value.
Exactly what the I<EXPR> argument to C<when> does is hard to describe
precisely, but in general, it tries to guess what you want done. Sometimes
it is interpreted as C<< $_ ~~ I<EXPR> >>, and sometimes it is not. It
also behaves differently when lexically enclosed by a C<given> block than
it does when dynamically enclosed by a C<foreach> loop. The rules are far
too difficult to understand to be described here. See L</"Experimental Details
on given and when"> later on.
Due to an unfortunate bug in how C<given> was implemented between Perl 5.10
and 5.16, under those implementations the version of C<$_> governed by
C<given> is merely a lexically scoped copy of the original, not a
dynamically scoped alias to the original, as it would be if it were a
C<foreach> or under both the original and the current Perl 6 language
specification. This bug was fixed in Perl 5.18 (and lexicalized C<$_> itself
was removed in Perl 5.24).
If your code still needs to run on older versions,
stick to C<foreach> for your topicalizer and
you will be less unhappy.
=head2 Goto
X<goto>
Although not for the faint of heart, Perl does support a C<goto>
statement. There are three forms: C<goto>-LABEL, C<goto>-EXPR, and
C<goto>-&NAME. A loop's LABEL is not actually a valid target for
a C<goto>; it's just the name of the loop.
The C<goto>-LABEL form finds the statement labeled with LABEL and resumes
execution there. It may not be used to go into any construct that
requires initialization, such as a subroutine or a C<foreach> loop. It
also can't be used to go into a construct that is optimized away. It
can be used to go almost anywhere else within the dynamic scope,
including out of subroutines, but it's usually better to use some other
construct such as C<last> or C<die>. The author of Perl has never felt the
need to use this form of C<goto> (in Perl, that is--C is another matter).
The C<goto>-EXPR form expects a label name, whose scope will be resolved
dynamically. This allows for computed C<goto>s per FORTRAN, but isn't
necessarily recommended if you're optimizing for maintainability:
goto(("FOO", "BAR", "GLARCH")[$i]);
The C<goto>-&NAME form is highly magical, and substitutes a call to the
named subroutine for the currently running subroutine. This is used by
C<AUTOLOAD()> subroutines that wish to load another subroutine and then
pretend that the other subroutine had been called in the first place
(except that any modifications to C<@_> in the current subroutine are
propagated to the other subroutine.) After the C<goto>, not even C<caller()>
will be able to tell that this routine was called first.
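A rough sketch of that C<AUTOLOAD> idiom follows. The package
C<Greeter> and its C<say_*> functions are hypothetical; the point is
only that the generated subroutine is installed and then entered with
C<goto>:

```perl
use strict;
use warnings;

package Greeter;    # hypothetical package for illustration

our $AUTOLOAD;
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                      # strip the package name
    return if $name eq 'DESTROY';
    die "No such function: $name" unless $name =~ /^say_(\w+)$/;
    my $word = $1;

    no strict 'refs';
    *{"Greeter::$name"} = sub { return "$word, $_[0]!" };
    goto &{"Greeter::$name"};               # the current @_ is passed along
}

package main;

my $first  = Greeter::say_hello("world");   # goes through AUTOLOAD
my $second = Greeter::say_hello("again");   # calls the installed sub directly

print "$first $second\n";
```

After the first call the generated subroutine exists in the symbol
table, so C<AUTOLOAD> is not consulted again for that name.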
In almost all cases like this, it's a far, far better idea to use the
structured control flow mechanisms of C<next>, C<last>, or C<redo> instead of
resorting to a C<goto>. For certain applications, the catch and throw pair of
C<eval{}> and C<die()> for exception processing can also be a prudent approach.
=head2 The Ellipsis Statement
X<...>
X<... statement>
X<ellipsis operator>
X<elliptical statement>
X<unimplemented statement>
X<unimplemented operator>
X<yada-yada>
X<yada-yada operator>
X<... operator>
X<whatever operator>
X<triple-dot operator>
Beginning in Perl 5.12, Perl accepts an ellipsis, "C<...>", as a
placeholder for code that you haven't implemented yet. This form of
ellipsis, the unimplemented statement, should not be confused with the
binary flip-flop C<...> operator. One is a statement and the other an
operator. (Perl doesn't usually confuse them because usually Perl can tell
whether it wants an operator or a statement, but see below for exceptions.)
When Perl 5.12 or later encounters an ellipsis statement, it parses this
without error, but if and when you should actually try to execute it, Perl
throws an exception with the text C<Unimplemented>:
use v5.12;
sub unimplemented { ... }
eval { unimplemented() };
if ($@ =~ /^Unimplemented at /) {
say "I found an ellipsis!";
}
You can only use the elliptical statement to stand in for a
complete statement. These are examples of how the ellipsis works:
use v5.12;
{ ... }
sub foo { ... }
...;
eval { ... };
sub somemeth {
my $self = shift;
...;
}
$x = do {
my $n;
...;
say "Hurrah!";
$n;
};
The elliptical statement cannot stand in for an expression that
is part of a larger statement, since the C<...> is also the three-dot
version of the flip-flop operator (see L<perlop/"Range Operators">).
These examples of attempts to use an ellipsis are syntax errors:
use v5.12;
print ...;
open(my $fh, ">", "/dev/passwd") or ...;
if ($condition && ... ) { say "Howdy" };
There are some cases where Perl can't immediately tell the difference
between an expression and a statement. For instance, the syntax for a
block and an anonymous hash reference constructor look the same unless
there's something in the braces to give Perl a hint. The ellipsis is a
syntax error if Perl doesn't guess that the C<{ ... }> is a block. In that
case, it doesn't think the C<...> is an ellipsis because it's expecting an
expression instead of a statement:
@transformed = map { ... } @input; # syntax error
Inside your block, you can use a C<;> before the ellipsis to denote that the
C<{ ... }> is a block and not a hash reference constructor. Now the ellipsis
works:
@transformed = map {; ... } @input; # ';' disambiguates
Note: Some folks colloquially refer to this bit of punctuation as a
"yada-yada" or "triple-dot", but its true name
is actually an ellipsis.
=head2 PODs: Embedded Documentation
X<POD> X<documentation>
Perl has a mechanism for intermixing documentation with source code.
While it's expecting the beginning of a new statement, if the compiler
encounters a line that begins with an equal sign and a word, like this
=head1 Here There Be Pods!
Then that text and all remaining text up through and including a line
beginning with C<=cut> will be ignored. The format of the intervening
text is described in L<perlpod>.
This allows you to intermix your source code
and your documentation text freely, as in
=item snazzle($)
The snazzle() function will behave in the most spectacular
form that you can possibly imagine, not even excepting
cybernetic pyrotechnics.
=cut back to the compiler, nuff of this pod stuff!
sub snazzle($) {
my $thingie = shift;
.........
}
Note that pod translators should look at only paragraphs beginning
with a pod directive (it makes parsing easier), whereas the compiler
actually knows to look for pod escapes even in the middle of a
paragraph. This means that the following secret stuff will be
ignored by both the compiler and the translators.
$a=3;
=secret stuff
warn "Neither POD nor CODE!?"
=cut back
print "got $a\n";
You probably shouldn't rely upon the C<warn()> being podded out forever.
Not all pod translators are well-behaved in this regard, and perhaps
the compiler will become pickier.
One may also use pod directives to quickly comment out a section
of code.
=head2 Plain Old Comments (Not!)
X<comment> X<line> X<#> X<preprocessor> X<eval>
Perl can process line directives, much like the C preprocessor. Using
this, one can control Perl's idea of filenames and line numbers in
error or warning messages (especially for strings that are processed
with C<eval()>). The syntax for this mechanism is almost the same as for
most C preprocessors: it matches the regular expression
# example: '# line 42 "new_filename.plx"'
/^\# \s*
line \s+ (\d+) \s*
(?:\s("?)([^"]+)\g2)? \s*
$/x
with C<$1> being the line number for the next line, and C<$3> being
the optional filename (specified with or without quotes). Note that
no whitespace may precede the C<< # >>, unlike modern C preprocessors.
There is a fairly obvious gotcha included with the line directive:
Debuggers and profilers will only show the last source line to appear
at a particular line number in a given file. Care should be taken not
to cause line number collisions in code you'd like to debug later.
Here are some examples that you should be able to type into your command
shell:
% perl
# line 200 "bzzzt"
# the '#' on the previous line must be the first char on line
die 'foo';
__END__
foo at bzzzt line 201.
% perl
# line 200 "bzzzt"
eval qq[\n#line 2001 ""\ndie 'foo']; print $@;
__END__
foo at - line 2001.
% perl
eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@;
__END__
foo at foo bar line 200.
% perl
# line 345 "goop"
eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'";
print $@;
__END__
foo at goop line 345.
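The same idea works from inside a script rather than at the shell. In
this sketch the filename F<generated.pl> is made up, and the directive
is placed at the start of the evaluated string so that the C<#> is the
first character on its line:

```perl
use strict;
use warnings;

# The directive names the line number of the line that follows it.
my $code = qq{#line 42 "generated.pl"\ndie "boom"};

eval $code;
my $message = $@;

print $message;
```

The error message should then name F<generated.pl> and line 42 rather
than an anonymous C<(eval N)>.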
=head2 Experimental Details on given and when
As previously mentioned, the "switch" feature is considered highly
experimental; it is subject to change with little notice. In particular,
C<when> has tricky behaviours that are expected to change to become less
tricky in the future. Do not rely upon its current (mis)implementation.
Before Perl 5.18, C<given> also had tricky behaviours that you should still
beware of if your code must run on older versions of Perl.
Here is a longer example of C<given>:
    use feature ":5.10";
    given ($foo) {
        when (undef) {
            say '$foo is undefined';
        }
        when ("foo") {
            say '$foo is the string "foo"';
        }
        when ([1,3,5,7,9]) {
            say '$foo is an odd digit';
            continue; # Fall through
        }
        when ($_ < 100) {
            say '$foo is numerically less than 100';
        }
        when (\&complicated_check) {
            say 'a complicated check for $foo is true';
        }
        default {
            die q(I don't know what to do with $foo);
        }
    }
Before Perl 5.18, C<given(EXPR)> assigned the value of I<EXPR> to
merely a lexically scoped I<B<copy>> (!) of C<$_>, not a dynamically
scoped alias the way C<foreach> does. That made it similar to
do { my $_ = EXPR; ... }
except that the block was automatically broken out of by a successful
C<when> or an explicit C<break>. Because it was only a copy, and because
it was only lexically scoped, not dynamically scoped, you could not do the
things with it that you are used to in a C<foreach> loop. In particular,
it did not work for arbitrary function calls if those functions might try
to access C<$_>. Best stick to C<foreach> for that.
Most of the power comes from the implicit smartmatching that can
sometimes apply. Most of the time, C<when(EXPR)> is treated as an
implicit smartmatch of C<$_>, that is, C<$_ ~~ EXPR>. (See
L<perlop/"Smartmatch Operator"> for more information on smartmatching.)
But when I<EXPR> is one of the 10 exceptional cases (or things like them)
listed below, it is used directly as a boolean.
=over 4
=item Z<>1.
A user-defined subroutine call or a method invocation.
=item Z<>2.
A regular expression match in the form of C</REGEX/>, C<$foo =~ /REGEX/>,
or C<$foo =~ EXPR>. Also, a negated regular expression match in
the form C<!/REGEX/>, C<$foo !~ /REGEX/>, or C<$foo !~ EXPR>.
=item Z<>3.
A smart match that uses an explicit C<~~> operator, such as C<EXPR ~~ EXPR>.
B<NOTE:> You will often have to use C<$c ~~ $_> because the default case
uses C<$_ ~~ $c>, which is frequently the opposite of what you want.
=item Z<>4.
A boolean comparison operator such as C<$_ E<lt> 10> or C<$x eq "abc">. The
relational operators that this applies to are the six numeric comparisons
(C<< < >>, C<< > >>, C<< <= >>, C<< >= >>, C<< == >>, and C<< != >>), and
the six string comparisons (C<lt>, C<gt>, C<le>, C<ge>, C<eq>, and C<ne>).
=item Z<>5.
At least the three builtin functions C<defined(...)>, C<exists(...)>, and
C<eof(...)>. We might add more of these someday if we think of them.
=item Z<>6.
A negated expression, whether C<!(EXPR)> or C<not(EXPR)>, or a logical
exclusive-or, C<(EXPR1) xor (EXPR2)>. The bitwise versions (C<~> and C<^>)
are not included.
=item Z<>7.
A filetest operator, with exactly 4 exceptions: C<-s>, C<-M>, C<-A>, and
C<-C>, as these return numerical values, not boolean ones. The C<-z>
filetest operator is not included in the exception list.
=item Z<>8.
The C<..> and C<...> flip-flop operators. Note that the C<...> flip-flop
operator is completely different from the C<...> elliptical statement
just described.
=back
In those 8 cases above, the value of EXPR is used directly as a boolean, so
no smartmatching is done. You may think of C<when> as a smartsmartmatch.
Furthermore, Perl inspects the operands of logical operators to
decide whether to use smartmatching for each one by applying the
above test to the operands:
=over 4
=item Z<>9.
If EXPR is C<EXPR1 && EXPR2> or C<EXPR1 and EXPR2>, the test is applied
I<recursively> to both EXPR1 and EXPR2.
Only if I<both> operands also pass the
test, I<recursively>, will the expression be treated as boolean. Otherwise,
smartmatching is used.
=item Z<>10.
If EXPR is C<EXPR1 || EXPR2>, C<EXPR1 // EXPR2>, or C<EXPR1 or EXPR2>, the
test is applied I<recursively> to EXPR1 only (which might itself be a
higher-precedence AND operator, for example, and thus subject to the
previous rule), not to EXPR2. If EXPR1 is to use smartmatching, then EXPR2
also does so, no matter what EXPR2 contains. But if EXPR1 does not get to
use smartmatching, then EXPR2 will not either. This is
quite different from the C<&&> case just described, so be careful.
=back
These rules are complicated, but the goal is for them to do what you want
(even if you don't quite understand why they are doing it). For example:
when (/^\d+$/ && $_ < 75) { ... }
will be treated as a boolean match because the rules say both
a regex match and an explicit test on C<$_> will be treated
as boolean.
Also:
when ([qw(foo bar)] && /baz/) { ... }
will use smartmatching because only I<one> of the operands is a boolean:
the other uses smartmatching, and that wins.
Further:
when ([qw(foo bar)] || /^baz/) { ... }
will use smart matching (only the first operand is considered), whereas
when (/^baz/ || [qw(foo bar)]) { ... }
will test only the regex, which causes both operands to be
treated as boolean. Watch out for this one, then, because an
arrayref is always a true value, which makes it effectively
redundant. Not a good idea.
Tautologous boolean operators are still going to be optimized
away. Don't be tempted to write
when ("foo" or "bar") { ... }
This will optimize down to C<"foo">, so C<"bar"> will never be considered (even
though the rules say to use a smartmatch
on C<"foo">). For an alternation like
this, an array ref will work, because this will instigate smartmatching:
when ([qw(foo bar)]) { ... }
This is somewhat equivalent to the C-style switch statement's fallthrough
functionality (not to be confused with I<Perl's> fallthrough
functionality--see below), wherein the same block is used for several
C<case> statements.
Another useful shortcut is that, if you use a literal array or hash as the
argument to C<given>, it is turned into a reference. So C<given(@foo)> is
the same as C<given(\@foo)>, for example.
C<default> behaves exactly like C<when(1 == 1)>, which is
to say that it always matches.
=head3 Breaking out
You can use the C<break> keyword to break out of the enclosing
C<given> block. Every C<when> block is implicitly ended with
a C<break>.
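As a minimal sketch of an explicit C<break>, assuming a perl that still provides the experimental "switch" feature (the variable names and the blanket C<no warnings> are illustrative choices for this example only):

```perl
use v5.10.1;
no warnings;    # assumption: silence experimental-feature warnings for this sketch

my @seen;
my $foo = "xyz";

given ($foo) {
    when (/x/) {
        push @seen, "matched x";
        break if $foo =~ /z/;          # leave the enclosing given block immediately
        push @seen, "not reached";     # skipped whenever $foo also contains a z
    }
    default { push @seen, "default" }
}

say "@seen";
```

Because C<break> leaves the whole C<given> block, neither the rest of the C<when> block nor the C<default> clause runs here.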
=head3 Fall-through
You can use the C<continue> keyword to fall through from one
case to the next immediate C<when> or C<default>:
    given($foo) {
        when (/x/) { say '$foo contains an x'; continue }
        when (/y/) { say '$foo contains a y'            }
        default    { say '$foo does not contain a y'    }
    }
=head3 Return value
When a C<given> statement is also a valid expression (for example,
when it's the last statement of a block), it evaluates to:
=over 4
=item *
An empty list as soon as an explicit C<break> is encountered.
=item *
The value of the last evaluated expression of the successful
C<when>/C<default> clause, if there happens to be one.
=item *
The value of the last evaluated expression of the C<given> block if no
condition is true.
=back
In the last two cases, the last expression is evaluated in the context that
was applied to the C<given> block.
Note that, unlike C<if> and C<unless>, failed C<when> statements always
evaluate to an empty list.
    my $price = do {
        given ($item) {
            when (["pear", "apple"]) { 1 }
            break when "vote";      # My vote cannot be bought
            1e10  when /Mona Lisa/;
            "unknown";
        }
    };
Currently, C<given> blocks can't always
be used as proper expressions. This
may be addressed in a future version of Perl.
=head3 Switching in a loop
Instead of using C<given()>, you can use a C<foreach()> loop.
For example, here's one way to count how many times a particular
string occurs in an array:
    use v5.10.1;
    my $count = 0;
    for (@array) {
        when ("foo") { ++$count }
    }
    print "\@array contains $count copies of 'foo'\n";
Or in a more recent version:
    use v5.14;
    my $count = 0;
    for (@array) {
        ++$count when "foo";
    }
    print "\@array contains $count copies of 'foo'\n";
At the end of all C<when> blocks, there is an implicit C<next>.
You can override that with an explicit C<last> if you're
interested in only the first match.
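A small sketch of overriding the implicit C<next> with C<last>, again assuming the experimental "switch" feature is available; the sample data and variable names are made up for illustration:

```perl
use v5.10.1;
no warnings;    # assumption: silence experimental-feature warnings for this sketch

my @array = qw(bar foo baz foo);    # illustrative data
my $index = -1;
my $first_foo;

for (@array) {
    $index++;
    when ("foo") {
        $first_foo = $index;
        last;                       # stop at the first match instead of the implicit next
    }
}

say defined $first_foo ? "first 'foo' at index $first_foo" : "no 'foo' found";
```

The loop stops at the first matching element, so the second C<"foo"> in the list is never examined.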
This doesn't work if you explicitly specify a loop variable, as
in C<for $item (@array)>. You have to use the default variable C<$_>.
=head3 Differences from Perl 6
The Perl 5 smartmatch and C<given>/C<when> constructs are not compatible
with their Perl 6 analogues. The most visible and least
important difference is that, in Perl 5, parentheses are required around
the argument to C<given()> and C<when()> (except when this last one is used
as a statement modifier). Parentheses in Perl 6 are always optional in a
control construct such as C<if()>, C<while()>, or C<when()>; they can't be
made optional in Perl 5 without a great deal of potential confusion,
because Perl 5 would parse the expression
given $foo {
...
}
as though the argument to C<given> were an element of the hash
C<%foo>, interpreting the braces as hash-element syntax.
However, there are many, many other differences. For example,
this works in Perl 5:
    use v5.12;
    my @primary = ("red", "blue", "green");
    if (@primary ~~ "red") {
        say "primary smartmatches red";
    }
    if ("red" ~~ @primary) {
        say "red smartmatches primary";
    }
    say "that's all, folks!";
But it doesn't work at all in Perl 6. Instead, you should
use the (parallelizable) C<any> operator:
    if any(@primary) eq "red" {
        say "primary smartmatches red";
    }
    if "red" eq any(@primary) {
        say "red smartmatches primary";
    }
The table of smartmatches in L<perlop/"Smartmatch Operator"> is not
identical to that proposed by the Perl 6 specification, mainly due to
differences between Perl 6's and Perl 5's data models, but also because
the Perl 6 spec has changed since Perl 5 rushed into early adoption.
In Perl 6, C<when()> will always do an implicit smartmatch with its
argument, while in Perl 5 it is convenient (albeit potentially confusing) to
suppress this implicit smartmatch in various rather loosely-defined
situations, as roughly outlined above. (The difference is largely because
Perl 5 does not have, even internally, a boolean type.)
=cut
=encoding utf8
=head1 NAME
perl5184delta - what is new for perl v5.18.4
=head1 DESCRIPTION
This document describes differences between the 5.18.4 release and the 5.18.2
release. B<Please note:> This document ignores perl 5.18.3, a broken release
which existed for a few hours only.
If you are upgrading from an earlier release such as 5.18.1, first read
L<perl5182delta>, which describes differences between 5.18.1 and 5.18.2.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<Digest::SHA> has been upgraded from 5.84_01 to 5.84_02.
=item *
L<perl5db.pl> has been upgraded from version 1.39_10 to 1.39_11.
This fixes a crash in tab completion, where available. [perl #120827] Also,
filehandle information is properly reset after a pager is run. [perl #121456]
=back
=head1 Platform Support
=head2 Platform-Specific Notes
=over 4
=item Win32
=over 4
=item *
A memory leak on every call to C<system> and backticks (C< `` >), introduced
by L<perl #113536|https://rt.perl.org/Public/Bug/Display.html?id=113536> and
present on most Win32 Perls starting from 5.18.0, has been fixed. The memory
leak only occurred if you enabled pseudo-fork in your build of Win32 Perl and
were running that build on Server 2003 R2 or a newer OS. The leak does not
appear on WinXP SP3.
[L<perl #121676|https://rt.perl.org/Public/Bug/Display.html?id=121676>]
=back
=back
=head1 Selected Bug Fixes
=over 4
=item *
The debugger now properly resets filehandles as needed. [perl #121456]
=item *
A segfault in Digest::SHA has been addressed. [perl #121421]
=item *
perl can again be built with USE_64_BIT_INT, with Visual C 2003, 32 bit.
[perl #120925]
=item *
A leading { (brace) in formats is properly parsed again. [perl #119973]
=item *
Copy the values used to perturb hash iteration when cloning an
interpreter. This was fairly harmless but caused C<valgrind> to
complain. [perl #121336]
=item *
In Perl v5.18 C<undef *_; goto &sub> and C<local *_; goto &sub> started
crashing. This has been fixed. [perl #119949]
=back
=head1 Acknowledgements
Perl 5.18.4 represents approximately 9 months of development since Perl 5.18.2
and contains approximately 2,000 lines of changes across 53 files from 13
authors.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed the
improvements that became Perl 5.18.4:
Daniel Dragan, David Mitchell, Doug Bell, Father Chrysostomos, Hiroo Hayashi,
James E Keenan, Karl Williamson, Mark Shelor, Ricardo Signes, Shlomi Fish,
Smylers, Steve Hay, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
http://rt.perl.org/perlbug/ . There may also be information at
http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send it
to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be
able to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
=head1 NAME
perlebcdic - Considerations for running Perl on EBCDIC platforms
=head1 DESCRIPTION
An exploration of some of the issues facing Perl programmers
on EBCDIC based computers.
Portions of this document that are still incomplete are marked with XXX.
Early Perl versions worked on some EBCDIC machines, but the last known
version that ran on EBCDIC was v5.8.7, until v5.22, when the Perl core
again works on z/OS. Theoretically, it could work on OS/400 or Siemens'
BS2000 (or their successors), but this is untested. In v5.22 and v5.24,
not all of the modules that are found on CPAN but shipped with core Perl
work on z/OS.
If you want to use Perl on a non-z/OS EBCDIC machine, please let us know
by sending mail to perlbug@perl.org.
Writing Perl on an EBCDIC platform is really no different than writing
on an L</ASCII> one, but with different underlying numbers, as we'll see
shortly. You'll have to know something about those L</ASCII> platforms
because the documentation is biased and will frequently use example
numbers that don't apply to EBCDIC. There are also very few CPAN
modules that are written for EBCDIC and which don't work on ASCII;
instead the vast majority of CPAN modules are written for ASCII, and
some may happen to work on EBCDIC, while a few have been designed to
portably work on both.
If your code just uses the 52 letters A-Z and a-z, plus SPACE, the
digits 0-9, and the punctuation characters that Perl uses, plus a few
controls that are denoted by escape sequences like C<\n> and C<\t>, then
there's nothing special about using Perl, and your code may very well
work on an ASCII machine without change.
But if you write code that uses C<\005> to mean a TAB or C<\xC1> to mean
an "A", or C<\xDF> to mean a "E<yuml>" (small C<"y"> with a diaeresis),
then your code may well work on your EBCDIC platform, but not on an
ASCII one. That's fine to do if no one will ever want to run your code
on an ASCII platform; but the bias in this document will be towards writing
code portable between EBCDIC and ASCII systems. Again, if every
character you care about is easily enterable from your keyboard, you
don't have to know anything about ASCII, but many keyboards don't easily
allow you to directly enter, say, the character C<\xDF>, so you have to
specify it indirectly, such as by using the C<"\xDF"> escape sequence.
In those cases it's easiest to know something about the ASCII/Unicode
character sets. If you know that the small "E<yuml>" is C<U+00FF>, then
you can instead specify it as C<"\N{U+FF}">, and have the computer
automatically translate it to C<\xDF> on your platform, and leave it as
C<\xFF> on ASCII ones. Or you could specify it by name, C<\N{LATIN
SMALL LETTER Y WITH DIAERESIS}> and not have to know the numbers.
Either way works, but both require familiarity with Unicode.
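The two portable spellings can be compared directly. This is only a sketch, assuming Perl 5.16 or later (where character names in C<\N{...}> load automatically); the variable names are illustrative:

```perl
use strict;
use warnings;

# Two portable spellings of the same character, U+00FF.
my $by_code_point = "\N{U+FF}";
my $by_name       = "\N{LATIN SMALL LETTER Y WITH DIAERESIS}";

# Both yield the platform's native representation of the character,
# so they compare equal whichever character set perl was built for.
print "same character\n" if $by_code_point eq $by_name;

# The native ordinal is platform-dependent: the text above gives
# 0xFF for ASCII platforms and 0xDF for the author's EBCDIC platform.
printf "native ordinal: 0x%02X\n", ord $by_code_point;
```

Neither spelling hard-codes a platform-specific byte value, which is the point of preferring them over C<"\xDF">.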
=head1 COMMON CHARACTER CODE SETS
=head2 ASCII
The American Standard Code for Information Interchange (ASCII or
US-ASCII) is a set of
integers running from 0 to 127 (decimal) that have standardized
interpretations by the computers which use ASCII. For example, 65 means
the letter "A".
The range 0..127 can be covered by setting various bits in a 7-bit binary
number, hence the set is sometimes referred to as "7-bit ASCII".
ASCII was described by the American National Standards Institute
document ANSI X3.4-1986. It was also described by ISO 646:1991
(with localization for currency symbols). The full ASCII set is
given in the table L<below|/recipe 3> as the first 128 elements.
Languages that
can be written adequately with the characters in ASCII include
English, Hawaiian, Indonesian, Swahili and some Native American
languages.
Most non-EBCDIC character sets are supersets of ASCII. That is, the
integers 0-127 mean what ASCII says they mean. But integers 128 and
above are specific to the character set.
Many of these fit entirely into 8 bits, using ASCII as 0-127, while
specifying what 128-255 mean, and not using anything above 255.
Thus, these are single-byte (or octet if you prefer) character sets.
One important one (since Unicode is a superset of it) is the ISO 8859-1
character set.
=head2 ISO 8859
The ISO 8859-I<B<$n>> are a collection of character code sets from the
International Organization for Standardization (ISO), each of which adds
characters to the ASCII set that are typically found in various
languages, many of which are based on the Roman, or Latin, alphabet.
Most are for European languages, but there are also ones for Arabic,
Greek, Hebrew, and Thai. There are good references on the web about
all these.
=head2 Latin 1 (ISO 8859-1)
A particular 8-bit extension to ASCII that includes grave and acute
accented Latin characters. Languages that can employ ISO 8859-1
include all the languages covered by ASCII as well as Afrikaans,
Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian,
Portuguese, Spanish, and Swedish. Dutch is covered albeit without
the ij ligature. French is covered too but without the oe ligature.
German can use ISO 8859-1 but must do so without German-style
quotation marks. This set is based on Western European extensions
to ASCII and is commonly encountered in world wide web work.
In IBM character code set identification terminology, ISO 8859-1 is
also known as CCSID 819 (or sometimes 0819 or even 00819).
=head2 EBCDIC
The Extended Binary Coded Decimal Interchange Code refers to a
large collection of single- and multi-byte coded character sets that are
quite different from ASCII and ISO 8859-1, and are all slightly
different from each other; they typically run on host computers. The
EBCDIC encodings derive from 8-bit byte extensions of Hollerith punched
card encodings, which long predate ASCII. The layout on the
cards was such that high bits were set for the upper and lower case
alphabetic characters C<[a-z]> and C<[A-Z]>, but there were gaps within
each Latin
alphabet range, visible in the table L<below|/recipe 3>. These gaps can
cause complications.
Some IBM EBCDIC character sets may be known by character code set
identification numbers (CCSID numbers) or code page numbers.
Perl can be compiled on platforms that run any of three commonly used EBCDIC
character sets, listed below.
=head3 The 13 variant characters
Among IBM EBCDIC character code sets there are 13 characters that
are often mapped to different integer values. Those characters
are known as the 13 "variant" characters and are:
\ [ ] { } ^ ~ ! # | $ @ `
When Perl is compiled for a platform, it looks at all of these characters to
guess which EBCDIC character set the platform uses, and adapts itself
accordingly to that platform. If the platform uses a character set that is not
one of the three Perl knows about, Perl will either fail to compile, or
mistakenly and silently choose one of the three.
=head3 EBCDIC code sets recognized by Perl
=over
=item B<0037>
Character code set ID 0037 is a mapping of the ASCII plus Latin-1
characters (i.e. ISO 8859-1) to an EBCDIC set. 0037 is used
in North American English locales on the OS/400 operating system
that runs on AS/400 computers. CCSID 0037 differs from ISO 8859-1
in 236 places; in other words they agree on only 20 code point values.
=item B<1047>
Character code set ID 1047 is also a mapping of the ASCII plus
Latin-1 characters (i.e. ISO 8859-1) to an EBCDIC set. 1047 is
used under Unix System Services for OS/390 or z/OS, and OpenEdition
for VM/ESA. CCSID 1047 differs from CCSID 0037 in eight places,
and from ISO 8859-1 in 236.
=item B<POSIX-BC>
The EBCDIC code page in use on Siemens' BS2000 system is distinct from
1047 and 0037. It is identified below as the POSIX-BC set.
Like 0037 and 1047, it is the same as ISO 8859-1 in 20 code point
values.
=back
=head2 Unicode code points versus EBCDIC code points
In Unicode terminology a I<code point> is the number assigned to a
character: for example, in EBCDIC the character "A" is usually assigned
the number 193. In Unicode, the character "A" is assigned the number 65.
All the code points in ASCII and Latin-1 (ISO 8859-1) have the same
meaning in Unicode. All three of the recognized EBCDIC code sets have
256 code points, and in each code set, all 256 code points are mapped to
equivalent Latin1 code points. Obviously, "A" will map to "A", "B" =>
"B", "%" => "%", etc., for all printable characters in Latin1 and these
code pages.
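One way to see the difference in native code points is to ask perl for the ordinal of "A". The following is a rough sketch rather than a complete platform test; it relies only on the 65 versus 193 values quoted above, and the variable names are illustrative:

```perl
use strict;
use warnings;

# "A" is 65 in ASCII/Unicode and usually 193 in EBCDIC, per the text above.
my $native_a     = ord "A";
my $looks_ebcdic = $native_a == 193;

printf "ord('A') is %d, so this looks like an %s platform\n",
    $native_a, $looks_ebcdic ? "EBCDIC" : "ASCII";
```

A single character is enough to tell ASCII from EBCDIC here, but not to tell the EBCDIC code pages apart, since they generally agree on the letters.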
It also turns out that EBCDIC has nearly precise equivalents for the
ASCII/Latin1 C0 controls and the DELETE control. (The C0 controls are
those whose ASCII code points are 0..0x1F; things like TAB, ACK, BEL,
etc.) A mapping is set up between these ASCII/EBCDIC controls. There
isn't such a precise mapping between the C1 controls on ASCII platforms
and the remaining EBCDIC controls. What has been done is to map these
controls, mostly arbitrarily, to some otherwise unmatched character in
the other character set. Most of these are very rarely used
nowadays in EBCDIC anyway, and their names have been dropped, without
much complaint. For example the EO (Eight Ones) EBCDIC control
(consisting of eight one bits = 0xFF) is mapped to the C1 APC control
(0x9F), and you can't use the name "EO".
The EBCDIC controls provide three possible line terminator characters,
CR (0x0D), LF (0x25), and NL (0x15). On ASCII platforms, the symbols
"NL" and "LF" refer to the same character, but in strict EBCDIC
terminology they are different ones. The EBCDIC NL is mapped to the C1
control called "NEL" ("Next Line"; here's a case where the mapping makes
quite a bit of sense, and hence isn't just arbitrary). On some EBCDIC
platforms, this NL or NEL is the typical line terminator. This is true
of z/OS and BS2000. In these platforms, the C compilers will swap the
LF and NEL code points, so that C<"\n"> is 0x15, and refers to NL. Perl
does that too; you can see it in the code chart L<below|/recipe 3>.
This makes things generally "just work" without you even having to be
aware that there is a swap.
=head2 Unicode and UTF
UTF stands for "Unicode Transformation Format".
UTF-8 is an encoding of Unicode into a sequence of 8-bit byte chunks, based on
ASCII and Latin-1.
The length of a sequence required to represent a Unicode code point
depends on the ordinal number of that code point,
with larger numbers requiring more bytes.
UTF-EBCDIC is like UTF-8, but based on EBCDIC.
They are enough alike that often, casual usage will conflate the two
terms, and use "UTF-8" to mean both the UTF-8 found on ASCII platforms,
and the UTF-EBCDIC found on EBCDIC ones.
You may see the term "invariant" character or code point.
This simply means that the character has the same numeric
value and representation when encoded in UTF-8 (or UTF-EBCDIC) as when
not. (Note that this is a very different concept from L</The 13 variant
characters> mentioned above. Careful prose will use the term "UTF-8
invariant" instead of just "invariant", but most often you'll see just
"invariant".) For example, the ordinal value of "A" is 193 in most
EBCDIC code pages, and also is 193 when encoded in UTF-EBCDIC. All
UTF-8 (or UTF-EBCDIC) variant code points occupy at least two bytes when
encoded in UTF-8 (or UTF-EBCDIC); by definition, the UTF-8 (or
UTF-EBCDIC) invariant code points are exactly one byte whether encoded
in UTF-8 (or UTF-EBCDIC), or not. (By now you see why people typically
just say "UTF-8" when they also mean "UTF-EBCDIC". For the rest of this
document, we'll mostly be casual about it too.)
In ASCII UTF-8, the code points corresponding to the lowest 128
ordinal numbers (0 - 127: the ASCII characters) are invariant.
In UTF-EBCDIC, there are 160 invariant characters.
(If you care, the EBCDIC invariants are those characters
which have ASCII equivalents, plus those that correspond to
the C1 controls (128 - 159 on ASCII platforms).)
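The notion of an invariant can be checked by counting encoded bytes. This sketch assumes an ASCII platform, where C<chr()> takes Unicode code points and the first 128 are invariant; the choice of sample code points is arbitrary:

```perl
use strict;
use warnings;

# Count the bytes each code point occupies once UTF-8 encoded.
# Assumption: an ASCII platform; an EBCDIC build would give different counts.
my %bytes;
for my $cp (0x41, 0x7F, 0x80, 0xFF) {
    my $encoded = chr $cp;
    utf8::encode($encoded);           # in place: characters become UTF-8 bytes
    $bytes{$cp} = length $encoded;
    printf "U+%04X occupies %d byte(s)\n", $cp, $bytes{$cp};
}
```

On such a platform the invariant code points (here 0x41 and 0x7F) stay one byte long, while the variant ones (0x80 and 0xFF) grow to two.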
A string encoded in UTF-EBCDIC may be longer (very rarely shorter) than
one encoded in UTF-8. Perl extends both UTF-8 and UTF-EBCDIC so that
they can encode code points above the Unicode maximum of U+10FFFF. Both
extensions are constructed to allow encoding of any code point that fits
in a 64-bit word.
UTF-EBCDIC is defined by
L<Unicode Technical Report #16|http://www.unicode.org/reports/tr16>
(often referred to as just TR16).
It is defined based on CCSID 1047, not allowing for the differences for
other code pages. This allows for easy interchange of text between
computers running different code pages, but makes it unusable, without
adaptation, for Perl on those other code pages.
The reason for this unusability is that a fundamental assumption of Perl
is that the characters it cares about for parsing and lexical analysis
are the same whether or not the text is in UTF-8. For example, Perl
expects the character C<"["> to have the same representation, no matter
if the string containing it (or program text) is UTF-8 encoded or not.
To ensure this, Perl adapts UTF-EBCDIC to the particular code page so
that all characters it expects to be UTF-8 invariant are in fact UTF-8
invariant. This means that text generated on a computer running one
version of Perl's UTF-EBCDIC has to be translated to be intelligible to
a computer running another.
TR16 implies a method to extend UTF-EBCDIC to encode points up through
S<C<2 ** 31 - 1>>. Perl uses this method for code points up through
S<C<2 ** 30 - 1>>, but uses an incompatible method for larger ones, to
enable it to handle much larger code points than otherwise.
=head2 Using Encode
Starting from Perl 5.8 you can use the standard module Encode
to translate from EBCDIC to Latin-1 code points.
Encode knows about more EBCDIC character sets than Perl can currently
be compiled to run on.
use Encode 'from_to';
my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );
# $a is in EBCDIC code points
from_to($a, $ebcdic{ord '^'}, 'latin1');
# $a is ISO 8859-1 code points
and from Latin-1 code points to EBCDIC code points
use Encode 'from_to';
my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );
# $a is ISO 8859-1 code points
from_to($a, 'latin1', $ebcdic{ord '^'});
# $a is in EBCDIC code points
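The same conversion can be done without modifying the string in place by using C<encode> and C<decode>. This is a sketch run on an ASCII platform, with C<cp1047> chosen arbitrarily from the encodings named above and illustrative variable names:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Encode three ASCII-range characters into CCSID 1047 bytes.
my $text   = "A0 ";
my $ebcdic = encode("cp1047", $text);

# List the resulting byte values; this document gives 193 for "A",
# 240 for "0" and 64 for SPACE in CCSID 1047.
my @ordinals = map { ord } split //, $ebcdic;
print "@ordinals\n";

# Decoding reverses the mapping.
my $round_trip = decode("cp1047", $ebcdic);
print "round trip ok\n" if $round_trip eq $text;
```

Unlike C<from_to>, these two functions return a new string, which is convenient when the original text is still needed.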
For doing I/O it is suggested that you use the autotranslating features
of PerlIO, see L<perluniintro>.
Since version 5.8 Perl uses the PerlIO I/O library. This enables
you to use different encodings per IO channel. For example you may use
use Encode;
open($f, ">:encoding(ascii)", "test.ascii");
print $f "Hello World!\n";
open($f, ">:encoding(cp37)", "test.ebcdic");
print $f "Hello World!\n";
open($f, ">:encoding(latin1)", "test.latin1");
print $f "Hello World!\n";
open($f, ">:encoding(utf8)", "test.utf8");
print $f "Hello World!\n";
to get four files containing "Hello World!\n" in ASCII, CP 0037 EBCDIC,
ISO 8859-1 (Latin-1) (in this example identical to ASCII since only ASCII
characters were printed), and
UTF-EBCDIC (in this example identical to normal EBCDIC since only characters
that don't differ between EBCDIC and UTF-EBCDIC were printed). See the
documentation of L<Encode::PerlIO> for details.
As the PerlIO layer uses raw IO (bytes) internally, all this totally
ignores things like the type of your filesystem (ASCII or EBCDIC).
=head1 SINGLE OCTET TABLES
The following tables list the ASCII and Latin 1 ordered sets including
the subsets: C0 controls (0..31), ASCII graphics (32..7e), delete (7f),
C1 controls (80..9f), and Latin-1 (a.k.a. ISO 8859-1) (a0..ff). In the
table, the names of the Latin 1 extensions to ASCII have been labelled
with character names roughly
corresponding to I<The Unicode Standard, Version 6.1> albeit with
substitutions such as C<s/LATIN//> and C<s/VULGAR//> in all cases;
S<C<s/CAPITAL LETTER//>> in some cases; and
S<C<s/SMALL LETTER ([A-Z])/\l$1/>> in some other
cases. Controls are listed using their Unicode 6.2 abbreviations.
The differences between the 0037 and 1047 sets are
flagged with C<**>. The differences between the 1047 and POSIX-BC sets
are flagged with C<##>. All C<ord()> numbers listed are decimal. If you
would rather see this table listing octal values, then run the table
(that is, the pod source text of this document, since this recipe may not
work with a pod2_other_format translation) through:
=over 4
=item recipe 0
=back
perl -ne 'if(/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
-e '{printf("%s%-5.03o%-5.03o%-5.03o%.03o\n",$1,$2,$3,$4,$5)}' \
perlebcdic.pod
If you want to retain the UTF-x code points then in script form you
might want to write:
=over 4
=item recipe 1
=back
    open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!";
    while (<FH>) {
        if (/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)
                                                         \s+(\d+)\.?(\d*)/x)
        {
            if ($7 ne '' && $9 ne '') {
                printf(
                    "%s%-5.03o%-5.03o%-5.03o%-5.03o%-3o.%-5o%-3o.%.03o\n",
                    $1,$2,$3,$4,$5,$6,$7,$8,$9);
            }
            elsif ($7 ne '') {
                printf("%s%-5.03o%-5.03o%-5.03o%-5.03o%-3o.%-5o%.03o\n",
                       $1,$2,$3,$4,$5,$6,$7,$8);
            }
            else {
                printf("%s%-5.03o%-5.03o%-5.03o%-5.03o%-5.03o%.03o\n",
                       $1,$2,$3,$4,$5,$6,$8);
            }
        }
    }
If you would rather see this table listing hexadecimal values then
run the table through:
=over 4
=item recipe 2
=back
perl -ne 'if(/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
-e '{printf("%s%-5.02X%-5.02X%-5.02X%.02X\n",$1,$2,$3,$4,$5)}' \
perlebcdic.pod
Or, in order to retain the UTF-x code points in hexadecimal:
=over 4
=item recipe 3
=back
    open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!";
    while (<FH>) {
        if (/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)
             \s+(\d+)\.?(\d*)/x)
        {
            if ($7 ne '' && $9 ne '') {
                printf(
                  "%s%-5.02X%-5.02X%-5.02X%-5.02X%-2X.%-6.02X%02X.%02X\n",
                  $1,$2,$3,$4,$5,$6,$7,$8,$9);
            }
            elsif ($7 ne '') {
                printf("%s%-5.02X%-5.02X%-5.02X%-5.02X%-2X.%-6.02X%02X\n",
                       $1,$2,$3,$4,$5,$6,$7,$8);
            }
            else {
                printf("%s%-5.02X%-5.02X%-5.02X%-5.02X%-5.02X%02X\n",
                       $1,$2,$3,$4,$5,$6,$8);
            }
        }
    }
chr   ISO-8859-1/CCSID-0819   CCSID-0037   CCSID-1047   POSIX-BC   UTF-8   UTF-EBCDIC(CCSID-1047)
---------------------------------------------------------------------
<NUL> 0 0 0 0 0 0
<SOH> 1 1 1 1 1 1
<STX> 2 2 2 2 2 2
<ETX> 3 3 3 3 3 3
<EOT> 4 55 55 55 4 55
<ENQ> 5 45 45 45 5 45
<ACK> 6 46 46 46 6 46
<BEL> 7 47 47 47 7 47
<BS> 8 22 22 22 8 22
<HT> 9 5 5 5 9 5
<LF> 10 37 21 21 10 21 **
<VT> 11 11 11 11 11 11
<FF> 12 12 12 12 12 12
<CR> 13 13 13 13 13 13
<SO> 14 14 14 14 14 14
<SI> 15 15 15 15 15 15
<DLE> 16 16 16 16 16 16
<DC1> 17 17 17 17 17 17
<DC2> 18 18 18 18 18 18
<DC3> 19 19 19 19 19 19
<DC4> 20 60 60 60 20 60
<NAK> 21 61 61 61 21 61
<SYN> 22 50 50 50 22 50
<ETB> 23 38 38 38 23 38
<CAN> 24 24 24 24 24 24
<EOM> 25 25 25 25 25 25
<SUB> 26 63 63 63 26 63
<ESC> 27 39 39 39 27 39
<FS> 28 28 28 28 28 28
<GS> 29 29 29 29 29 29
<RS> 30 30 30 30 30 30
<US> 31 31 31 31 31 31
<SPACE> 32 64 64 64 32 64
! 33 90 90 90 33 90
" 34 127 127 127 34 127
# 35 123 123 123 35 123
$ 36 91 91 91 36 91
% 37 108 108 108 37 108
& 38 80 80 80 38 80
' 39 125 125 125 39 125
( 40 77 77 77 40 77
) 41 93 93 93 41 93
* 42 92 92 92 42 92
+ 43 78 78 78 43 78
, 44 107 107 107 44 107
- 45 96 96 96 45 96
. 46 75 75 75 46 75
/ 47 97 97 97 47 97
0 48 240 240 240 48 240
1 49 241 241 241 49 241
2 50 242 242 242 50 242
3 51 243 243 243 51 243
4 52 244 244 244 52 244
5 53 245 245 245 53 245
6 54 246 246 246 54 246
7 55 247 247 247 55 247
8 56 248 248 248 56 248
9 57 249 249 249 57 249
: 58 122 122 122 58 122
; 59 94 94 94 59 94
< 60 76 76 76 60 76
= 61 126 126 126 61 126
> 62 110 110 110 62 110
? 63 111 111 111 63 111
@ 64 124 124 124 64 124
A 65 193 193 193 65 193
B 66 194 194 194 66 194
C 67 195 195 195 67 195
D 68 196 196 196 68 196
E 69 197 197 197 69 197
F 70 198 198 198 70 198
G 71 199 199 199 71 199
H 72 200 200 200 72 200
I 73 201 201 201 73 201
J 74 209 209 209 74 209
K 75 210 210 210 75 210
L 76 211 211 211 76 211
M 77 212 212 212 77 212
N 78 213 213 213 78 213
O 79 214 214 214 79 214
P 80 215 215 215 80 215
Q 81 216 216 216 81 216
R 82 217 217 217 82 217
S 83 226 226 226 83 226
T 84 227 227 227 84 227
U 85 228 228 228 85 228
V 86 229 229 229 86 229
W 87 230 230 230 87 230
X 88 231 231 231 88 231
Y 89 232 232 232 89 232
Z 90 233 233 233 90 233
[ 91 186 173 187 91 173 ** ##
\ 92 224 224 188 92 224 ##
] 93 187 189 189 93 189 **
^ 94 176 95 106 94 95 ** ##
_ 95 109 109 109 95 109
` 96 121 121 74 96 121 ##
a 97 129 129 129 97 129
b 98 130 130 130 98 130
c 99 131 131 131 99 131
d 100 132 132 132 100 132
e 101 133 133 133 101 133
f 102 134 134 134 102 134
g 103 135 135 135 103 135
h 104 136 136 136 104 136
i 105 137 137 137 105 137
j 106 145 145 145 106 145
k 107 146 146 146 107 146
l 108 147 147 147 108 147
m 109 148 148 148 109 148
n 110 149 149 149 110 149
o 111 150 150 150 111 150
p 112 151 151 151 112 151
q 113 152 152 152 113 152
r 114 153 153 153 114 153
s 115 162 162 162 115 162
t 116 163 163 163 116 163
u 117 164 164 164 117 164
v 118 165 165 165 118 165
w 119 166 166 166 119 166
x 120 167 167 167 120 167
y 121 168 168 168 121 168
z 122 169 169 169 122 169
{ 123 192 192 251 123 192 ##
| 124 79 79 79 124 79
} 125 208 208 253 125 208 ##
~ 126 161 161 255 126 161 ##
<DEL> 127 7 7 7 127 7
<PAD> 128 32 32 32 194.128 32
<HOP> 129 33 33 33 194.129 33
<BPH> 130 34 34 34 194.130 34
<NBH> 131 35 35 35 194.131 35
<IND> 132 36 36 36 194.132 36
<NEL> 133 21 37 37 194.133 37 **
<SSA> 134 6 6 6 194.134 6
<ESA> 135 23 23 23 194.135 23
<HTS> 136 40 40 40 194.136 40
<HTJ> 137 41 41 41 194.137 41
<VTS> 138 42 42 42 194.138 42
<PLD> 139 43 43 43 194.139 43
<PLU> 140 44 44 44 194.140 44
<RI> 141 9 9 9 194.141 9
<SS2> 142 10 10 10 194.142 10
<SS3> 143 27 27 27 194.143 27
<DCS> 144 48 48 48 194.144 48
<PU1> 145 49 49 49 194.145 49
<PU2> 146 26 26 26 194.146 26
<STS> 147 51 51 51 194.147 51
<CCH> 148 52 52 52 194.148 52
<MW> 149 53 53 53 194.149 53
<SPA> 150 54 54 54 194.150 54
<EPA> 151 8 8 8 194.151 8
<SOS> 152 56 56 56 194.152 56
<SGC> 153 57 57 57 194.153 57
<SCI> 154 58 58 58 194.154 58
<CSI> 155 59 59 59 194.155 59
<ST> 156 4 4 4 194.156 4
<OSC> 157 20 20 20 194.157 20
<PM> 158 62 62 62 194.158 62
<APC> 159 255 255 95 194.159 255 ##
<NON-BREAKING SPACE> 160 65 65 65 194.160 128.65
<INVERTED "!" > 161 170 170 170 194.161 128.66
<CENT SIGN> 162 74 74 176 194.162 128.67 ##
<POUND SIGN> 163 177 177 177 194.163 128.68
<CURRENCY SIGN> 164 159 159 159 194.164 128.69
<YEN SIGN> 165 178 178 178 194.165 128.70
<BROKEN BAR> 166 106 106 208 194.166 128.71 ##
<SECTION SIGN> 167 181 181 181 194.167 128.72
<DIAERESIS> 168 189 187 121 194.168 128.73 ** ##
<COPYRIGHT SIGN> 169 180 180 180 194.169 128.74
<FEMININE ORDINAL> 170 154 154 154 194.170 128.81
<LEFT POINTING GUILLEMET> 171 138 138 138 194.171 128.82
<NOT SIGN> 172 95 176 186 194.172 128.83 ** ##
<SOFT HYPHEN> 173 202 202 202 194.173 128.84
<REGISTERED TRADE MARK> 174 175 175 175 194.174 128.85
<MACRON> 175 188 188 161 194.175 128.86 ##
<DEGREE SIGN> 176 144 144 144 194.176 128.87
<PLUS-OR-MINUS SIGN> 177 143 143 143 194.177 128.88
<SUPERSCRIPT TWO> 178 234 234 234 194.178 128.89
<SUPERSCRIPT THREE> 179 250 250 250 194.179 128.98
<ACUTE ACCENT> 180 190 190 190 194.180 128.99
<MICRO SIGN> 181 160 160 160 194.181 128.100
<PARAGRAPH SIGN> 182 182 182 182 194.182 128.101
<MIDDLE DOT> 183 179 179 179 194.183 128.102
<CEDILLA> 184 157 157 157 194.184 128.103
<SUPERSCRIPT ONE> 185 218 218 218 194.185 128.104
<MASC. ORDINAL INDICATOR> 186 155 155 155 194.186 128.105
<RIGHT POINTING GUILLEMET> 187 139 139 139 194.187 128.106
<FRACTION ONE QUARTER> 188 183 183 183 194.188 128.112
<FRACTION ONE HALF> 189 184 184 184 194.189 128.113
<FRACTION THREE QUARTERS> 190 185 185 185 194.190 128.114
<INVERTED QUESTION MARK> 191 171 171 171 194.191 128.115
<A WITH GRAVE> 192 100 100 100 195.128 138.65
<A WITH ACUTE> 193 101 101 101 195.129 138.66
<A WITH CIRCUMFLEX> 194 98 98 98 195.130 138.67
<A WITH TILDE> 195 102 102 102 195.131 138.68
<A WITH DIAERESIS> 196 99 99 99 195.132 138.69
<A WITH RING ABOVE> 197 103 103 103 195.133 138.70
<CAPITAL LIGATURE AE> 198 158 158 158 195.134 138.71
<C WITH CEDILLA> 199 104 104 104 195.135 138.72
<E WITH GRAVE> 200 116 116 116 195.136 138.73
<E WITH ACUTE> 201 113 113 113 195.137 138.74
<E WITH CIRCUMFLEX> 202 114 114 114 195.138 138.81
<E WITH DIAERESIS> 203 115 115 115 195.139 138.82
<I WITH GRAVE> 204 120 120 120 195.140 138.83
<I WITH ACUTE> 205 117 117 117 195.141 138.84
<I WITH CIRCUMFLEX> 206 118 118 118 195.142 138.85
<I WITH DIAERESIS> 207 119 119 119 195.143 138.86
<CAPITAL LETTER ETH> 208 172 172 172 195.144 138.87
<N WITH TILDE> 209 105 105 105 195.145 138.88
<O WITH GRAVE> 210 237 237 237 195.146 138.89
<O WITH ACUTE> 211 238 238 238 195.147 138.98
<O WITH CIRCUMFLEX> 212 235 235 235 195.148 138.99
<O WITH TILDE> 213 239 239 239 195.149 138.100
<O WITH DIAERESIS> 214 236 236 236 195.150 138.101
<MULTIPLICATION SIGN> 215 191 191 191 195.151 138.102
<O WITH STROKE> 216 128 128 128 195.152 138.103
<U WITH GRAVE> 217 253 253 224 195.153 138.104 ##
<U WITH ACUTE> 218 254 254 254 195.154 138.105
<U WITH CIRCUMFLEX> 219 251 251 221 195.155 138.106 ##
<U WITH DIAERESIS> 220 252 252 252 195.156 138.112
<Y WITH ACUTE> 221 173 186 173 195.157 138.113 ** ##
<CAPITAL LETTER THORN> 222 174 174 174 195.158 138.114
<SMALL LETTER SHARP S> 223 89 89 89 195.159 138.115
<a WITH GRAVE> 224 68 68 68 195.160 139.65
<a WITH ACUTE> 225 69 69 69 195.161 139.66
<a WITH CIRCUMFLEX> 226 66 66 66 195.162 139.67
<a WITH TILDE> 227 70 70 70 195.163 139.68
<a WITH DIAERESIS> 228 67 67 67 195.164 139.69
<a WITH RING ABOVE> 229 71 71 71 195.165 139.70
<SMALL LIGATURE ae> 230 156 156 156 195.166 139.71
<c WITH CEDILLA> 231 72 72 72 195.167 139.72
<e WITH GRAVE> 232 84 84 84 195.168 139.73
<e WITH ACUTE> 233 81 81 81 195.169 139.74
<e WITH CIRCUMFLEX> 234 82 82 82 195.170 139.81
<e WITH DIAERESIS> 235 83 83 83 195.171 139.82
<i WITH GRAVE> 236 88 88 88 195.172 139.83
<i WITH ACUTE> 237 85 85 85 195.173 139.84
<i WITH CIRCUMFLEX> 238 86 86 86 195.174 139.85
<i WITH DIAERESIS> 239 87 87 87 195.175 139.86
<SMALL LETTER eth> 240 140 140 140 195.176 139.87
<n WITH TILDE> 241 73 73 73 195.177 139.88
<o WITH GRAVE> 242 205 205 205 195.178 139.89
<o WITH ACUTE> 243 206 206 206 195.179 139.98
<o WITH CIRCUMFLEX> 244 203 203 203 195.180 139.99
<o WITH TILDE> 245 207 207 207 195.181 139.100
<o WITH DIAERESIS> 246 204 204 204 195.182 139.101
<DIVISION SIGN> 247 225 225 225 195.183 139.102
<o WITH STROKE> 248 112 112 112 195.184 139.103
<u WITH GRAVE> 249 221 221 192 195.185 139.104 ##
<u WITH ACUTE> 250 222 222 222 195.186 139.105
<u WITH CIRCUMFLEX> 251 219 219 219 195.187 139.106
<u WITH DIAERESIS> 252 220 220 220 195.188 139.112
<y WITH ACUTE> 253 141 141 141 195.189 139.113
<SMALL LETTER thorn> 254 142 142 142 195.190 139.114
<y WITH DIAERESIS> 255 223 223 223 195.191 139.115
If you would rather see the above table in CCSID 0037 order rather than
ASCII + Latin-1 order then run the table through:
=over 4
=item recipe 4
=back
perl \
-ne 'if(/.{29}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}/)'\
-e '{push(@l,$_)}' \
-e 'END{print map{$_->[0]}' \
-e ' sort{$a->[1] <=> $b->[1]}' \
-e ' map{[$_,substr($_,34,3)]}@l;}' perlebcdic.pod
If you would rather see it in CCSID 1047 order then change the number
34 in the last line to 39, like this:
=over 4
=item recipe 5
=back
perl \
-ne 'if(/.{29}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}/)'\
-e '{push(@l,$_)}' \
-e 'END{print map{$_->[0]}' \
-e ' sort{$a->[1] <=> $b->[1]}' \
-e ' map{[$_,substr($_,39,3)]}@l;}' perlebcdic.pod
If you would rather see it in POSIX-BC order then change the number
34 in the last line to 44, like this:
=over 4
=item recipe 6
=back
perl \
-ne 'if(/.{29}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}/)'\
-e '{push(@l,$_)}' \
-e 'END{print map{$_->[0]}' \
-e ' sort{$a->[1] <=> $b->[1]}' \
-e ' map{[$_,substr($_,44,3)]}@l;}' perlebcdic.pod
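The recipes above depend on fixed column offsets in the pod source. As a minimal sketch of an alternative, a row can be split on whitespace instead. The helper name C<parse_row> and its hash keys are hypothetical, and the three sample rows are copied from the table above; only the first four numeric columns are kept.

```perl
use strict;
use warnings;

# Hypothetical helper: split one table row into its name and the four
# single-octet columns (8859-1, 0037, 1047, POSIX-BC). The UTF-8 and
# UTF-EBCDIC columns and the trailing ** / ## flags are ignored.
sub parse_row {
    my ($line) = @_;
    return unless $line =~
        /^\s*(\S.*?)\s+(\d{1,3})\s+(\d{1,3})\s+(\d{1,3})\s+(\d{1,3})\s/;
    return {
        name     => $1,
        latin1   => $2,
        cp0037   => $3,
        cp1047   => $4,
        posix_bc => $5,
    };
}

# Sample rows taken from the table; header and ruler lines do not match
# the pattern and would simply be skipped.
my @rows = map { parse_row($_) } (
    '<LF> 10 37 21 21 10 21 **',
    'A 65 193 193 193 65 193',
    '[ 91 186 173 187 91 173 ** ##',
);

# Sort on any column by name rather than by substr() offset.
my @by_0037 = sort { $a->{cp0037}   <=> $b->{cp0037} }   @rows;
my @by_bc   = sort { $a->{posix_bc} <=> $b->{posix_bc} } @rows;

print "$_->{name}\n" for @by_0037;
```

Sorting on a named field avoids having to remember which offset belongs to which code page, at the cost of assuming that every data row has at least five whitespace-separated numeric columns.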
=head2 Table in hex, sorted in 1047 order
Since this document was first written, hexadecimal notation has become
the prevailing convention for code points. Producing a table that is
both in hex and sorted takes several steps with the recipes, so here, for
convenience, is the table from above, re-sorted to be in Code Page 1047
order, and using hex notation.
chr   ISO-8859-1/CCSID-0819   CCSID-0037   CCSID-1047   POSIX-BC   UTF-8   UTF-EBCDIC(CCSID-1047)
---------------------------------------------------------------------
<NUL> 00 00 00 00 00 00
<SOH> 01 01 01 01 01 01
<STX> 02 02 02 02 02 02
<ETX> 03 03 03 03 03 03
<ST> 9C 04 04 04 C2.9C 04
<HT> 09 05 05 05 09 05
<SSA> 86 06 06 06 C2.86 06
<DEL> 7F 07 07 07 7F 07
<EPA> 97 08 08 08 C2.97 08
<RI> 8D 09 09 09 C2.8D 09
<SS2> 8E 0A 0A 0A C2.8E 0A
<VT> 0B 0B 0B 0B 0B 0B
<FF> 0C 0C 0C 0C 0C 0C
<CR> 0D 0D 0D 0D 0D 0D
<SO> 0E 0E 0E 0E 0E 0E
<SI> 0F 0F 0F 0F 0F 0F
<DLE> 10 10 10 10 10 10
<DC1> 11 11 11 11 11 11
<DC2> 12 12 12 12 12 12
<DC3> 13 13 13 13 13 13
<OSC> 9D 14 14 14 C2.9D 14
<LF> 0A 25 15 15 0A 15 **
<BS> 08 16 16 16 08 16
<ESA> 87 17 17 17 C2.87 17
<CAN> 18 18 18 18 18 18
<EOM> 19 19 19 19 19 19
<PU2> 92 1A 1A 1A C2.92 1A
<SS3> 8F 1B 1B 1B C2.8F 1B
<FS> 1C 1C 1C 1C 1C 1C
<GS> 1D 1D 1D 1D 1D 1D
<RS> 1E 1E 1E 1E 1E 1E
<US> 1F 1F 1F 1F 1F 1F
<PAD> 80 20 20 20 C2.80 20
<HOP> 81 21 21 21 C2.81 21
<BPH> 82 22 22 22 C2.82 22
<NBH> 83 23 23 23 C2.83 23
<IND> 84 24 24 24 C2.84 24
<NEL> 85 15 25 25 C2.85 25 **
<ETB> 17 26 26 26 17 26
<ESC> 1B 27 27 27 1B 27
<HTS> 88 28 28 28 C2.88 28
<HTJ> 89 29 29 29 C2.89 29
<VTS> 8A 2A 2A 2A C2.8A 2A
<PLD> 8B 2B 2B 2B C2.8B 2B
<PLU> 8C 2C 2C 2C C2.8C 2C
<ENQ> 05 2D 2D 2D 05 2D
<ACK> 06 2E 2E 2E 06 2E
<BEL> 07 2F 2F 2F 07 2F
<DCS> 90 30 30 30 C2.90 30
<PU1> 91 31 31 31 C2.91 31
<SYN> 16 32 32 32 16 32
<STS> 93 33 33 33 C2.93 33
<CCH> 94 34 34 34 C2.94 34
<MW> 95 35 35 35 C2.95 35
<SPA> 96 36 36 36 C2.96 36
<EOT> 04 37 37 37 04 37
<SOS> 98 38 38 38 C2.98 38
<SGC> 99 39 39 39 C2.99 39
<SCI> 9A 3A 3A 3A C2.9A 3A
<CSI> 9B 3B 3B 3B C2.9B 3B
<DC4> 14 3C 3C 3C 14 3C
<NAK> 15 3D 3D 3D 15 3D
<PM> 9E 3E 3E 3E C2.9E 3E
<SUB> 1A 3F 3F 3F 1A 3F
<SPACE> 20 40 40 40 20 40
<NON-BREAKING SPACE> A0 41 41 41 C2.A0 80.41
<a WITH CIRCUMFLEX> E2 42 42 42 C3.A2 8B.43
<a WITH DIAERESIS> E4 43 43 43 C3.A4 8B.45
<a WITH GRAVE> E0 44 44 44 C3.A0 8B.41
<a WITH ACUTE> E1 45 45 45 C3.A1 8B.42
<a WITH TILDE> E3 46 46 46 C3.A3 8B.44
<a WITH RING ABOVE> E5 47 47 47 C3.A5 8B.46
<c WITH CEDILLA> E7 48 48 48 C3.A7 8B.48
<n WITH TILDE> F1 49 49 49 C3.B1 8B.58
<CENT SIGN> A2 4A 4A B0 C2.A2 80.43 ##
. 2E 4B 4B 4B 2E 4B
< 3C 4C 4C 4C 3C 4C
( 28 4D 4D 4D 28 4D
+ 2B 4E 4E 4E 2B 4E
| 7C 4F 4F 4F 7C 4F
& 26 50 50 50 26 50
<e WITH ACUTE> E9 51 51 51 C3.A9 8B.4A
<e WITH CIRCUMFLEX> EA 52 52 52 C3.AA 8B.51
<e WITH DIAERESIS> EB 53 53 53 C3.AB 8B.52
<e WITH GRAVE> E8 54 54 54 C3.A8 8B.49
<i WITH ACUTE> ED 55 55 55 C3.AD 8B.54
<i WITH CIRCUMFLEX> EE 56 56 56 C3.AE 8B.55
<i WITH DIAERESIS> EF 57 57 57 C3.AF 8B.56
<i WITH GRAVE> EC 58 58 58 C3.AC 8B.53
<SMALL LETTER SHARP S> DF 59 59 59 C3.9F 8A.73
! 21 5A 5A 5A 21 5A
$ 24 5B 5B 5B 24 5B
* 2A 5C 5C 5C 2A 5C
) 29 5D 5D 5D 29 5D
; 3B 5E 5E 5E 3B 5E
^ 5E B0 5F 6A 5E 5F ** ##
- 2D 60 60 60 2D 60
/ 2F 61 61 61 2F 61
<A WITH CIRCUMFLEX> C2 62 62 62 C3.82 8A.43
<A WITH DIAERESIS> C4 63 63 63 C3.84 8A.45
<A WITH GRAVE> C0 64 64 64 C3.80 8A.41
<A WITH ACUTE> C1 65 65 65 C3.81 8A.42
<A WITH TILDE> C3 66 66 66 C3.83 8A.44
<A WITH RING ABOVE> C5 67 67 67 C3.85 8A.46
<C WITH CEDILLA> C7 68 68 68 C3.87 8A.48
<N WITH TILDE> D1 69 69 69 C3.91 8A.58
<BROKEN BAR> A6 6A 6A D0 C2.A6 80.47 ##
, 2C 6B 6B 6B 2C 6B
% 25 6C 6C 6C 25 6C
_ 5F 6D 6D 6D 5F 6D
> 3E 6E 6E 6E 3E 6E
? 3F 6F 6F 6F 3F 6F
<o WITH STROKE> F8 70 70 70 C3.B8 8B.67
<E WITH ACUTE> C9 71 71 71 C3.89 8A.4A
<E WITH CIRCUMFLEX> CA 72 72 72 C3.8A 8A.51
<E WITH DIAERESIS> CB 73 73 73 C3.8B 8A.52
<E WITH GRAVE> C8 74 74 74 C3.88 8A.49
<I WITH ACUTE> CD 75 75 75 C3.8D 8A.54
<I WITH CIRCUMFLEX> CE 76 76 76 C3.8E 8A.55
<I WITH DIAERESIS> CF 77 77 77 C3.8F 8A.56
<I WITH GRAVE> CC 78 78 78 C3.8C 8A.53
` 60 79 79 4A 60 79 ##
: 3A 7A 7A 7A 3A 7A
# 23 7B 7B 7B 23 7B
@ 40 7C 7C 7C 40 7C
' 27 7D 7D 7D 27 7D
= 3D 7E 7E 7E 3D 7E
" 22 7F 7F 7F 22 7F
<O WITH STROKE> D8 80 80 80 C3.98 8A.67
a 61 81 81 81 61 81
b 62 82 82 82 62 82
c 63 83 83 83 63 83
d 64 84 84 84 64 84
e 65 85 85 85 65 85
f 66 86 86 86 66 86
g 67 87 87 87 67 87
h 68 88 88 88 68 88
i 69 89 89 89 69 89
<LEFT POINTING GUILLEMET> AB 8A 8A 8A C2.AB 80.52
<RIGHT POINTING GUILLEMET> BB 8B 8B 8B C2.BB 80.6A
<SMALL LETTER eth> F0 8C 8C 8C C3.B0 8B.57
<y WITH ACUTE> FD 8D 8D 8D C3.BD 8B.71
<SMALL LETTER thorn> FE 8E 8E 8E C3.BE 8B.72
<PLUS-OR-MINUS SIGN> B1 8F 8F 8F C2.B1 80.58
<DEGREE SIGN> B0 90 90 90 C2.B0 80.57
j 6A 91 91 91 6A 91
k 6B 92 92 92 6B 92
l 6C 93 93 93 6C 93
m 6D 94 94 94 6D 94
n 6E 95 95 95 6E 95
o 6F 96 96 96 6F 96
p 70 97 97 97 70 97
q 71 98 98 98 71 98
r 72 99 99 99 72 99
<FEMININE ORDINAL> AA 9A 9A 9A C2.AA 80.51
<MASC. ORDINAL INDICATOR> BA 9B 9B 9B C2.BA 80.69
<SMALL LIGATURE ae> E6 9C 9C 9C C3.A6 8B.47
<CEDILLA> B8 9D 9D 9D C2.B8 80.67
<CAPITAL LIGATURE AE> C6 9E 9E 9E C3.86 8A.47
<CURRENCY SIGN> A4 9F 9F 9F C2.A4 80.45
<MICRO SIGN> B5 A0 A0 A0 C2.B5 80.64
~ 7E A1 A1 FF 7E A1 ##
s 73 A2 A2 A2 73 A2
t 74 A3 A3 A3 74 A3
u 75 A4 A4 A4 75 A4
v 76 A5 A5 A5 76 A5
w 77 A6 A6 A6 77 A6
x 78 A7 A7 A7 78 A7
y 79 A8 A8 A8 79 A8
z 7A A9 A9 A9 7A A9
<INVERTED "!" > A1 AA AA AA C2.A1 80.42
<INVERTED QUESTION MARK> BF AB AB AB C2.BF 80.73
<CAPITAL LETTER ETH> D0 AC AC AC C3.90 8A.57
[ 5B BA AD BB 5B AD ** ##
<CAPITAL LETTER THORN> DE AE AE AE C3.9E 8A.72
<REGISTERED TRADE MARK> AE AF AF AF C2.AE 80.55
<NOT SIGN> AC 5F B0 BA C2.AC 80.53 ** ##
<POUND SIGN> A3 B1 B1 B1 C2.A3 80.44
<YEN SIGN> A5 B2 B2 B2 C2.A5 80.46
<MIDDLE DOT> B7 B3 B3 B3 C2.B7 80.66
<COPYRIGHT SIGN> A9 B4 B4 B4 C2.A9 80.4A
<SECTION SIGN> A7 B5 B5 B5 C2.A7 80.48
<PARAGRAPH SIGN> B6 B6 B6 B6 C2.B6 80.65
<FRACTION ONE QUARTER> BC B7 B7 B7 C2.BC 80.70
<FRACTION ONE HALF> BD B8 B8 B8 C2.BD 80.71
<FRACTION THREE QUARTERS> BE B9 B9 B9 C2.BE 80.72
<Y WITH ACUTE> DD AD BA AD C3.9D 8A.71 ** ##
<DIAERESIS> A8 BD BB 79 C2.A8 80.49 ** ##
<MACRON> AF BC BC A1 C2.AF 80.56 ##
] 5D BB BD BD 5D BD **
<ACUTE ACCENT> B4 BE BE BE C2.B4 80.63
<MULTIPLICATION SIGN> D7 BF BF BF C3.97 8A.66
{ 7B C0 C0 FB 7B C0 ##
A 41 C1 C1 C1 41 C1
B 42 C2 C2 C2 42 C2
C 43 C3 C3 C3 43 C3
D 44 C4 C4 C4 44 C4
E 45 C5 C5 C5 45 C5
F 46 C6 C6 C6 46 C6
G 47 C7 C7 C7 47 C7
H 48 C8 C8 C8 48 C8
I 49 C9 C9 C9 49 C9
<SOFT HYPHEN> AD CA CA CA C2.AD 80.54
<o WITH CIRCUMFLEX> F4 CB CB CB C3.B4 8B.63
<o WITH DIAERESIS> F6 CC CC CC C3.B6 8B.65
<o WITH GRAVE> F2 CD CD CD C3.B2 8B.59
<o WITH ACUTE> F3 CE CE CE C3.B3 8B.62
<o WITH TILDE> F5 CF CF CF C3.B5 8B.64
} 7D D0 D0 FD 7D D0 ##
J 4A D1 D1 D1 4A D1
K 4B D2 D2 D2 4B D2
L 4C D3 D3 D3 4C D3
M 4D D4 D4 D4 4D D4
N 4E D5 D5 D5 4E D5
O 4F D6 D6 D6 4F D6
P 50 D7 D7 D7 50 D7
Q 51 D8 D8 D8 51 D8
R 52 D9 D9 D9 52 D9
<SUPERSCRIPT ONE> B9 DA DA DA C2.B9 80.68
<u WITH CIRCUMFLEX> FB DB DB DB C3.BB 8B.6A
<u WITH DIAERESIS> FC DC DC DC C3.BC 8B.70
<u WITH GRAVE> F9 DD DD C0 C3.B9 8B.68 ##
<u WITH ACUTE> FA DE DE DE C3.BA 8B.69
<y WITH DIAERESIS> FF DF DF DF C3.BF 8B.73
\ 5C E0 E0 BC 5C E0 ##
<DIVISION SIGN> F7 E1 E1 E1 C3.B7 8B.66
S 53 E2 E2 E2 53 E2
T 54 E3 E3 E3 54 E3
U 55 E4 E4 E4 55 E4
V 56 E5 E5 E5 56 E5
W 57 E6 E6 E6 57 E6
X 58 E7 E7 E7 58 E7
Y 59 E8 E8 E8 59 E8
Z 5A E9 E9 E9 5A E9
<SUPERSCRIPT TWO> B2 EA EA EA C2.B2 80.59
<O WITH CIRCUMFLEX> D4 EB EB EB C3.94 8A.63
<O WITH DIAERESIS> D6 EC EC EC C3.96 8A.65
<O WITH GRAVE> D2 ED ED ED C3.92 8A.59
<O WITH ACUTE> D3 EE EE EE C3.93 8A.62
<O WITH TILDE> D5 EF EF EF C3.95 8A.64
0 30 F0 F0 F0 30 F0
1 31 F1 F1 F1 31 F1
2 32 F2 F2 F2 32 F2
3 33 F3 F3 F3 33 F3
4 34 F4 F4 F4 34 F4
5 35 F5 F5 F5 35 F5
6 36 F6 F6 F6 36 F6
7 37 F7 F7 F7 37 F7
8 38 F8 F8 F8 38 F8
9 39 F9 F9 F9 39 F9
<SUPERSCRIPT THREE> B3 FA FA FA C2.B3 80.62
<U WITH CIRCUMFLEX> DB FB FB DD C3.9B 8A.6A ##
<U WITH DIAERESIS> DC FC FC FC C3.9C 8A.70
<U WITH GRAVE> D9 FD FD E0 C3.99 8A.68 ##
<U WITH ACUTE> DA FE FE FE C3.9A 8A.69
<APC> 9F FF FF 5F C2.9F FF ##
=head1 IDENTIFYING CHARACTER CODE SETS
It is possible to determine which character set you are operating under.
But first be sure that you really need to do this. Your
code will be simpler and probably just as portable if it doesn't have
to test the character set and branch on the result. There are
very few circumstances where it's not easy to write
straight-line code portable to all character sets. See
L<perluniintro/Unicode and EBCDIC> for how to portably specify
characters.
But there are some cases where you may want to know which character set
you are running under. One possible example is doing
L<sorting|/SORTING> in inner loops where performance is critical.
To determine if you are running under ASCII or EBCDIC, you can use the
return value of C<ord()> or C<chr()> to test one or more character
values. For example:
$is_ascii = "A" eq chr(65);
$is_ebcdic = "A" eq chr(193);
$is_ascii = ord("A") == 65;
$is_ebcdic = ord("A") == 193;
There's even less need to distinguish between EBCDIC code pages, but to
do so try looking at one or more of the characters that differ between
them.
$is_ascii = ord('[') == 91;
$is_ebcdic_37 = ord('[') == 186;
$is_ebcdic_1047 = ord('[') == 173;
$is_ebcdic_POSIX_BC = ord('[') == 187;
However, it would be unwise to write tests such as:
$is_ascii = "\r" ne chr(13); # WRONG
$is_ascii = "\n" ne chr(10); # ILL ADVISED
Obviously the first of these will fail to distinguish most ASCII
platforms from either a CCSID 0037, a 1047, or a POSIX-BC EBCDIC
platform since S<C<"\r" eq chr(13)>> under all of those coded character
sets. But note too that because C<"\n"> is C<chr(13)> and C<"\r"> is
C<chr(10)> on old Macintosh (which is an ASCII platform) the second
C<$is_ascii> test will lead to trouble there.
To determine whether or not perl was built under an EBCDIC
code page you can use the Config module like so:
use Config;
$is_ebcdic = $Config{'ebcdic'} eq 'define';
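The C<ord('[')> tests shown earlier in this section can be folded into one routine. This is only a sketch: the name C<native_charset> and the strings it returns are hypothetical, and it recognizes nothing beyond the four coded character sets that this document tabulates.

```perl
use strict;
use warnings;

# Hypothetical helper: name the native coded character set by looking at
# the ordinal of '[', which differs in all four sets covered here.
sub native_charset {
    my $bracket = ord('[');
    return 'ASCII'      if $bracket == 91;
    return 'CCSID 0037' if $bracket == 186;
    return 'CCSID 1047' if $bracket == 173;
    return 'POSIX-BC'   if $bracket == 187;
    return 'unknown';    # some other code page
}

print native_charset(), "\n";
```

Returning C<'unknown'> rather than guessing keeps the caller honest about code pages that this document does not cover.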
=head1 CONVERSIONS
=head2 C<utf8::unicode_to_native()> and C<utf8::native_to_unicode()>
These functions take an input numeric code point in one encoding and
return what its equivalent value is in the other.
See L<utf8>.
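As a small illustration of the two functions, the following sketch converts in both directions and checks that the conversions are inverses over the single-octet range. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Unicode code point -> native code point (65 on ASCII, 193 on EBCDIC).
my $native_A = utf8::unicode_to_native(0x41);

# Native code point -> Unicode code point (0x41 on every platform).
my $unicode_A = utf8::native_to_unicode(ord 'A');

# The two functions undo each other for every code point below 256.
my @mismatches = grep {
    utf8::native_to_unicode(utf8::unicode_to_native($_)) != $_
} 0 .. 255;

printf "native A = %d, Unicode A = %d, mismatches = %d\n",
    $native_A, $unicode_A, scalar @mismatches;
```

Both functions live in the C<utf8::> namespace but do not require C<use utf8>.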
=head2 tr///
In order to convert a string of characters from one character set to
another a simple list of numbers, such as in the right columns in the
above table, along with Perl's C<tr///> operator is all that is needed.
The data in the table are in ASCII/Latin1 order, hence the EBCDIC columns
provide easy-to-use ASCII/Latin1 to EBCDIC operations that are also easily
reversed.
For example, to convert ASCII/Latin1 to code page 037 take the output of the
second numbers column from the output of recipe 2 (modified to add a
C<"\x"> prefix to each value), and use it in C<tr///> like so:
$cp_037 =
'\x00\x01\x02\x03\x37\x2D\x2E\x2F\x16\x05\x25\x0B\x0C\x0D\x0E\x0F' .
'\x10\x11\x12\x13\x3C\x3D\x32\x26\x18\x19\x3F\x27\x1C\x1D\x1E\x1F' .
'\x40\x5A\x7F\x7B\x5B\x6C\x50\x7D\x4D\x5D\x5C\x4E\x6B\x60\x4B\x61' .
'\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\x7A\x5E\x4C\x7E\x6E\x6F' .
'\x7C\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xD1\xD2\xD3\xD4\xD5\xD6' .
'\xD7\xD8\xD9\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xBA\xE0\xBB\xB0\x6D' .
'\x79\x81\x82\x83\x84\x85\x86\x87\x88\x89\x91\x92\x93\x94\x95\x96' .
'\x97\x98\x99\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xC0\x4F\xD0\xA1\x07' .
'\x20\x21\x22\x23\x24\x15\x06\x17\x28\x29\x2A\x2B\x2C\x09\x0A\x1B' .
'\x30\x31\x1A\x33\x34\x35\x36\x08\x38\x39\x3A\x3B\x04\x14\x3E\xFF' .
'\x41\xAA\x4A\xB1\x9F\xB2\x6A\xB5\xBD\xB4\x9A\x8A\x5F\xCA\xAF\xBC' .
'\x90\x8F\xEA\xFA\xBE\xA0\xB6\xB3\x9D\xDA\x9B\x8B\xB7\xB8\xB9\xAB' .
'\x64\x65\x62\x66\x63\x67\x9E\x68\x74\x71\x72\x73\x78\x75\x76\x77' .
'\xAC\x69\xED\xEE\xEB\xEF\xEC\xBF\x80\xFD\xFE\xFB\xFC\xAD\xAE\x59' .
'\x44\x45\x42\x46\x43\x47\x9C\x48\x54\x51\x52\x53\x58\x55\x56\x57' .
'\x8C\x49\xCD\xCE\xCB\xCF\xCC\xE1\x70\xDD\xDE\xDB\xDC\x8D\x8E\xDF';
my $ebcdic_string = $ascii_string;
eval '$ebcdic_string =~ tr/\000-\377/' . $cp_037 . '/';
To convert from EBCDIC 037 to ASCII just reverse the order of the tr///
arguments like so:
my $ascii_string = $ebcdic_string;
eval '$ascii_string =~ tr/' . $cp_037 . '/\000-\377/';
Similarly one could take the output of the third numbers column from recipe 2
to obtain a C<$cp_1047> table. The fourth numbers column of the output from
recipe 2 could provide a C<$cp_posix_bc> table suitable for transcoding as
well.
If you wanted to see the inverse tables, you would first have to sort on the
desired numbers column as in recipes 4, 5 or 6, then take the output of the
first numbers column.
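The same data can also drive a lookup table, which avoids the string C<eval> and makes the inverse mapping a one-line loop. The following is a sketch rather than a replacement for the C<tr///> approach: the values are the C<$cp_037> data from above written as plain hex digits, and the names C<@to_037>, C<@from_037> and C<translate> are hypothetical.

```perl
use strict;
use warnings;

# The $cp_037 values from above, two hex digits per Latin-1 ordinal.
my $cp_037_hex = join '', qw(
    00010203372D2E2F1605250B0C0D0E0F
    101112133C3D322618193F271C1D1E1F
    405A7F7B5B6C507D4D5D5C4E6B604B61
    F0F1F2F3F4F5F6F7F8F97A5E4C7E6E6F
    7CC1C2C3C4C5C6C7C8C9D1D2D3D4D5D6
    D7D8D9E2E3E4E5E6E7E8E9BAE0BBB06D
    79818283848586878889919293949596
    979899A2A3A4A5A6A7A8A9C04FD0A107
    202122232415061728292A2B2C090A1B
    30311A333435360838393A3B04143EFF
    41AA4AB19FB26AB5BDB49A8A5FCAAFBC
    908FEAFABEA0B6B39DDA9B8BB7B8B9AB
    6465626663679E687471727378757677
    AC69EDEEEBEFECBF80FDFEFBFCADAE59
    4445424643479C485451525358555657
    8C49CDCECBCFCCE170DDDEDBDC8D8EDF
);

# Forward table: index is the Latin-1 ordinal, value is the 0037 ordinal.
my @to_037 = unpack 'C*', pack 'H*', $cp_037_hex;

# Inverse table: index is the 0037 ordinal, value is the Latin-1 ordinal.
my @from_037;
$from_037[ $to_037[$_] ] = $_ for 0 .. 255;

# Translate a string of octets one octet at a time through a table.
sub translate {
    my ($table, $octets) = @_;
    return pack 'C*', map { $table->[$_] } unpack 'C*', $octets;
}

my $latin1 = pack 'C*', 0x48, 0x65, 0x6C, 0x6C, 0x6F;   # "Hello" in 8859-1
my $ebcdic = translate(\@to_037,   $latin1);
my $back   = translate(\@from_037, $ebcdic);

print unpack('H*', $ebcdic), "\n";
```

Because the strings are built and inspected with C<pack> and C<unpack> on octet values, the sketch does not depend on the native character set of the machine it runs on.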
=head2 iconv
XPG operability often implies the presence of an I<iconv> utility
available from the shell or from the C library. Consult your system's
documentation for information on iconv.
On OS/390 or z/OS see the L<iconv(1)> manpage. One way to invoke the C<iconv>
shell utility from within perl would be to:
# OS/390 or z/OS example
$ascii_data = `echo '$ebcdic_data'| iconv -f IBM-1047 -t ISO8859-1`;
or the inverse map:
# OS/390 or z/OS example
$ebcdic_data = `echo '$ascii_data'| iconv -f ISO8859-1 -t IBM-1047`;
For other Perl-based conversion options see the C<Convert::*> modules on CPAN.
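One further option, not named in the text above, is the C<Encode> module bundled with Perl, whose C<Encode::EBCDIC> tables include C<cp37>, C<cp1047> and C<posix-bc>. The following is a minimal sketch that assumes it is run on an ASCII platform; only letters are converted here, because the treatment of line terminators varies between conversion tables.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "Hello";

# Characters -> octets in CCSID 1047.
my $ebcdic_octets = encode('cp1047', $text);

# Octets in CCSID 1047 -> characters.
my $round_trip = decode('cp1047', $ebcdic_octets);

print unpack('H*', $ebcdic_octets), "\n";
```

Unlike the C<iconv> examples, this involves no shell, so the data does not need to be quoted for C<echo>.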
=head2 C RTL
The OS/390 and z/OS C run-time libraries provide C<_atoe()> and C<_etoa()> functions.
=head1 OPERATOR DIFFERENCES
The C<..> range operator treats certain character ranges with
care on EBCDIC platforms. For example the following array
will have twenty-six elements on either an EBCDIC platform
or an ASCII platform:
@alphabet = ('A'..'Z'); # $#alphabet == 25
The bitwise operators such as & ^ | may return different results
when operating on string or character data in a Perl program running
on an EBCDIC platform than when run on an ASCII platform. Here is
an example adapted from the one in L<perlop>:
# EBCDIC-based examples
print "j p \n" ^ " a h"; # prints "JAPH\n"
print "JA" | " ph\n"; # prints "japh\n"
print "JAPH\nJunk" & "\277\277\277\277\277"; # prints "japh\n";
print 'p N$' ^ " E<H\n"; # prints "Perl\n";
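Where the same result is wanted on every platform, one possible approach is to apply the operator to Unicode ordinals rather than native ones. The sketch below does that for C<^>; the helper C<xor_by_unicode> is hypothetical, and it assumes two strings of equal length whose characters are all below 256.

```perl
use strict;
use warnings;

# Hypothetical helper: XOR two equal-length strings character by
# character, working on Unicode ordinals instead of native ordinals.
sub xor_by_unicode {
    my ($left, $right) = @_;
    my @l = map { utf8::native_to_unicode(ord($_)) } split //, $left;
    my @r = map { utf8::native_to_unicode(ord($_)) } split //, $right;
    return join '',
        map { chr(utf8::unicode_to_native($l[$_] ^ $r[$_])) } 0 .. $#l;
}

# The first example above, with the shorter operand padded with "\0" so
# that the newline passes through unchanged.
my $japh = xor_by_unicode("j p \n", " a h\0");
print $japh;
```

The conversion costs two function calls per character, so it is only worthwhile where the bit pattern of the ASCII encoding is what the algorithm really depends on.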
An interesting property of the 32 C0 control characters
in the ASCII table is that they can "literally" be constructed
as control characters in Perl, e.g. S<C<chr(0) eq "\c@">>,
S<C<chr(1) eq "\cA">>, and so on. Perl on EBCDIC platforms has been
ported to take C<\c@> to C<chr(0)> and C<\cA> to C<chr(1)>, etc. as well, but the
characters that result depend on which code page you are
using. The table below uses the standard acronyms for the controls.
The POSIX-BC and 1047 sets are
identical throughout this range and differ from the 0037 set at only
one spot (21 decimal). Note that the line terminator character
may be generated by C<\cJ> on ASCII platforms but by C<\cU> on 1047 or POSIX-BC
platforms and cannot be generated as a C<"\cI<letter>"> control character on
0037 platforms. Note also that C<\c\> cannot be the final element in a string
or regex, as it will absorb the terminator. But C<\c\I<X>> is a C<FILE
SEPARATOR> concatenated with I<X> for all I<X>.
The outlier C<\c?> on ASCII, which yields a non-C0 control C<DEL>,
yields the outlier control C<APC> on EBCDIC, the one that isn't in the
block of contiguous controls. Note that a subtlety of this is that
C<\c?> on ASCII platforms is an ASCII character, while it isn't
equivalent to any ASCII character in EBCDIC platforms.
    chr   ord   8859-1   0037     1047 and POSIX-BC
    ------------------------------------------------
    \c@     0   <NUL>    <NUL>    <NUL>
    \cA     1   <SOH>    <SOH>    <SOH>
    \cB     2   <STX>    <STX>    <STX>
    \cC     3   <ETX>    <ETX>    <ETX>
    \cD     4   <EOT>    <ST>     <ST>
    \cE     5   <ENQ>    <HT>     <HT>
    \cF     6   <ACK>    <SSA>    <SSA>
    \cG     7   <BEL>    <DEL>    <DEL>
    \cH     8   <BS>     <EPA>    <EPA>
    \cI     9   <HT>     <RI>     <RI>
    \cJ    10   <LF>     <SS2>    <SS2>
    \cK    11   <VT>     <VT>     <VT>
    \cL    12   <FF>     <FF>     <FF>
    \cM    13   <CR>     <CR>     <CR>
    \cN    14   <SO>     <SO>     <SO>
    \cO    15   <SI>     <SI>     <SI>
    \cP    16   <DLE>    <DLE>    <DLE>
    \cQ    17   <DC1>    <DC1>    <DC1>
    \cR    18   <DC2>    <DC2>    <DC2>
    \cS    19   <DC3>    <DC3>    <DC3>
    \cT    20   <DC4>    <OSC>    <OSC>
    \cU    21   <NAK>    <NEL>    <LF>     **
    \cV    22   <SYN>    <BS>     <BS>
    \cW    23   <ETB>    <ESA>    <ESA>
    \cX    24   <CAN>    <CAN>    <CAN>
    \cY    25   <EOM>    <EOM>    <EOM>
    \cZ    26   <SUB>    <PU2>    <PU2>
    \c[    27   <ESC>    <SS3>    <SS3>
    \c\X   28   <FS>X    <FS>X    <FS>X
    \c]    29   <GS>     <GS>     <GS>
    \c^    30   <RS>     <RS>     <RS>
    \c_    31   <US>     <US>     <US>
    \c?     *   <DEL>    <APC>    <APC>
C<*> Note: C<\c?> maps to ordinal 127 (C<DEL>) on ASCII platforms, but
since ordinal 127 is not a control character on EBCDIC machines,
C<\c?> instead maps on them to C<APC>, which is 255 in 0037 and 1047,
and 95 in POSIX-BC.
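The ordinals produced by the C<\c> escapes can be inspected directly. In the sketch below the hash C<%ordinal_of> is only an illustration; the values it holds are the same on ASCII and EBCDIC platforms, even though the controls they name differ, and only C<\c?> varies.

```perl
use strict;
use warnings;

# Native ordinals of a few \c escapes. The keys are written with single
# quotes so that they stay as literal text for display.
my %ordinal_of = (
    '\cA' => ord("\cA"),
    '\cM' => ord("\cM"),
    '\cZ' => ord("\cZ"),
    '\c_' => ord("\c_"),
);

# \c? is the outlier: 127 on ASCII, the native APC ordinal on EBCDIC.
my $outlier = ord("\c?");

printf "%s => %d\n", $_, $ordinal_of{$_} for sort keys %ordinal_of;
printf "\\c? => %d\n", $outlier;
```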
=head1 FUNCTION DIFFERENCES
=over 8
=item C<chr()>
C<chr()> must be given an EBCDIC code number argument to yield a desired
character return value on an EBCDIC platform. For example:
$CAPITAL_LETTER_A = chr(193);
=item C<ord()>
C<ord()> will return EBCDIC code number values on an EBCDIC platform.
For example:
$the_number_193 = ord("A");
=item C<pack()>
The C<"c"> and C<"C"> templates for C<pack()> are dependent upon character set
encoding. Examples of usage on EBCDIC include:
$foo = pack("CCCC",193,194,195,196);
# $foo eq "ABCD"
$foo = pack("C4",193,194,195,196);
# same thing
$foo = pack("ccxxcc",193,194,195,196);
# $foo eq "AB\0\0CD"
The C<"U"> template has been ported to mean "Unicode" on all platforms so
that
pack("U", 65) eq 'A'
is true on all platforms. If you want native code points for the low
256, use the C<"W"> template. This means that the equivalences
pack("W", ord($character)) eq $character
unpack("W", $character) == ord $character
will hold.
=item C<print()>
One must be careful with scalars and strings that are passed to
print that contain ASCII encodings. One common place
for this to occur is in the output of the MIME type header for
CGI script writing. For example, many Perl programming guides
recommend something similar to:
print "Content-type:\ttext/html\015\012\015\012";
# this may be wrong on EBCDIC
You can instead write
print "Content-type:\ttext/html\r\n\r\n"; # OK for DGW et al
and have it work portably.
That is because the translation from EBCDIC to ASCII is done
by the web server in this case. Consult your web server's documentation for
further details.
=item C<printf()>
The formats that can convert characters to numbers and vice versa
will be different from their ASCII counterparts when executed
on an EBCDIC platform. Examples include:
printf("%c%c%c",193,194,195); # prints ABC
=item C<sort()>
EBCDIC sort results may differ from ASCII sort results especially for
mixed case strings. This is discussed in more detail L<below|/SORTING>.
=item C<sprintf()>
See the discussion of C<L</printf()>> above. An example of the use
of sprintf would be:
$CAPITAL_LETTER_A = sprintf("%c",193);
=item C<unpack()>
See the discussion of C<L</pack()>> above.
=back
Note that it is possible to write portable code for these by specifying
things in Unicode numbers, and using a conversion function:
printf("%c",utf8::unicode_to_native(65)); # prints A on all
# platforms
print utf8::native_to_unicode(ord("A")); # Likewise, prints 65
See L<perluniintro/Unicode and EBCDIC> and L</CONVERSIONS>
for other options.
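The functions in this section can be compared side by side when each is fed a code point obtained from C<utf8::unicode_to_native()>. This is a sketch with illustrative variable names; it assumes Perl v5.10 or later for the C<"W"> template.

```perl
use strict;
use warnings;

# Native ordinal of U+0041: 65 on ASCII platforms, 193 on EBCDIC ones.
my $A_native = utf8::unicode_to_native(0x41);

my $via_chr     = chr $A_native;
my $via_sprintf = sprintf '%c', $A_native;
my $via_pack_W  = pack 'W', $A_native;    # "W" takes native code points
my $via_pack_U  = pack 'U', 0x41;         # "U" takes Unicode code points

# Going the other way, ord() and unpack("W") agree on the native value.
my $via_ord    = ord 'A';
my $via_unpack = unpack 'W', 'A';

print join(' ', $via_chr, $via_sprintf, $via_pack_W, $via_pack_U), "\n";
```

Only the C<"U"> line is written with a Unicode number directly; every other line goes through the native ordinal, which is why the conversion function is needed there.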
=head1 REGULAR EXPRESSION DIFFERENCES
You can write your regular expressions just like someone on an ASCII
platform would do. But keep in mind that using octal or hex notation to
specify a particular code point will give you the character that the
EBCDIC code page natively maps to it. (This is also true of all
double-quoted strings.) If you want to write portably, just use the
C<\N{U+...}> notation everywhere where you would have used C<\x{...}>,
and don't use octal notation at all.
Starting in Perl v5.22, this applies to ranges in bracketed character
classes. If you say, for example, C<qr/[\N{U+20}-\N{U+7F}]/>, it means
the characters C<\N{U+20}>, C<\N{U+21}>, ..., C<\N{U+7F}>. This range
is all the printable characters that the ASCII character set contains.
Prior to v5.22, you couldn't specify any ranges portably, except that
(starting in Perl v5.5.3) all subsets of the C<[A-Z]> and C<[a-z]>
ranges are specially coded to not pick up gap characters. For example,
characters such as "E<ocirc>" (C<o WITH CIRCUMFLEX>) that lie between
"I" and "J" would not be matched by the regular expression range
C</[H-K]/>. But if either of the range end points is explicitly numeric
(and neither is specified by C<\N{U+...}>), the gap characters are
matched:
/[\x89-\x91]/
will match C<\x8e>, even though C<\x89> is "i" and C<\x91> is "j",
and C<\x8e> is a gap character, from the alphabetic viewpoint.
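The difference between alphabetic and numeric ranges can be observed by counting how many of the 256 single-octet characters each class matches. This is a sketch with illustrative variable names; the counts are the same on every platform, but on EBCDIC the numeric range is the one that picks up gap characters.

```perl
use strict;
use warnings;

# An alphabetic range matches only the letters between its end points.
my $letters = join '', map { chr($_) }
              grep { chr($_) =~ /[h-k]/ } 0 .. 255;

my $two_letters = grep { chr($_) =~ /[i-j]/ } 0 .. 255;

# A numeric range matches every code point between its end points:
# 0x89 through 0x91 inclusive.
my $numeric = grep { chr($_) =~ /[\x89-\x91]/ } 0 .. 255;

print "letters: $letters\n";
print "i-j matches $two_letters, \\x89-\\x91 matches $numeric\n";
```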
Another construct to be wary of is the inappropriate use of hex (unless
you use C<\N{U+...}>) or
octal constants in regular expressions. Consider the following
set of subs:
sub is_c0 {
my $char = substr(shift,0,1);
$char =~ /[\000-\037]/;
}
sub is_print_ascii {
my $char = substr(shift,0,1);
$char =~ /[\040-\176]/;
}
sub is_delete {
my $char = substr(shift,0,1);
$char eq "\177";
}
sub is_c1 {
my $char = substr(shift,0,1);
$char =~ /[\200-\237]/;
}
sub is_latin_1 { # But not ASCII; not C1
my $char = substr(shift,0,1);
$char =~ /[\240-\377]/;
}
These are valid only on ASCII platforms. Starting in Perl v5.22, simply
changing the octal constants to equivalent C<\N{U+...}> values makes
them portable:
sub is_c0 {
my $char = substr(shift,0,1);
$char =~ /[\N{U+00}-\N{U+1F}]/;
}
sub is_print_ascii {
my $char = substr(shift,0,1);
$char =~ /[\N{U+20}-\N{U+7E}]/;
}
sub is_delete {
my $char = substr(shift,0,1);
$char eq "\N{U+7F}";
}
sub is_c1 {
my $char = substr(shift,0,1);
$char =~ /[\N{U+80}-\N{U+9F}]/;
}
sub is_latin_1 { # But not ASCII; not C1
my $char = substr(shift,0,1);
$char =~ /[\N{U+A0}-\N{U+FF}]/;
}
And here are some alternative portable ways to write them:
sub Is_c0 {
my $char = substr(shift,0,1);
return $char =~ /[[:cntrl:]]/a && ! Is_delete($char);
# Alternatively:
# return $char =~ /[[:cntrl:]]/
# && $char =~ /[[:ascii:]]/
# && ! Is_delete($char);
}
sub Is_print_ascii {
my $char = substr(shift,0,1);
return $char =~ /[[:print:]]/a;
# Alternatively:
# return $char =~ /[[:print:]]/ && $char =~ /[[:ascii:]]/;
# Or
# return $char
# =~ /[ !"\#\$%&'()*+,\-.\/0-9:;<=>?\@A-Z[\\\]^_`a-z{|}~]/;
}
sub Is_delete {
my $char = substr(shift,0,1);
return utf8::native_to_unicode(ord $char) == 0x7F;
}
sub Is_c1 {
use feature 'unicode_strings';
my $char = substr(shift,0,1);
return $char =~ /[[:cntrl:]]/ && $char !~ /[[:ascii:]]/;
}
sub Is_latin_1 { # But not ASCII; not C1
use feature 'unicode_strings';
my $char = substr(shift,0,1);
return ord($char) < 256
&& $char !~ /[[:ascii:]]/
&& $char !~ /[[:cntrl:]]/;
}
Another way to write C<Is_latin_1()> would be
to use the characters in the range explicitly:
sub Is_latin_1 {
my $char = substr(shift,0,1);
$char =~ /[ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ]
| [ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]/x;
}
That form may, however, run into trouble in network transit (due to the
presence of 8 bit characters) or on non ISO-Latin character sets. But
it does allow C<Is_c1> to be rewritten so it works on Perls that don't
have C<'unicode_strings'> (earlier than v5.14):
sub Is_c1 { # But not ASCII; not a Latin-1 graphic character
my $char = substr(shift,0,1);
return ord($char) < 256
&& $char !~ /[[:ascii:]]/
&& ! Is_latin_1($char);
}
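The five predicates above can also be folded into a single classifier
that works on the Unicode code point rather than the native one. This is
a minimal sketch, not part of the original recipes: the name
C<char_class> and its return strings are hypothetical, and it relies
only on C<utf8::native_to_unicode()>, which is an identity map on ASCII
platforms.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the first character of a string by its
# Unicode code point, so the boundaries are the same on ASCII and EBCDIC.
# Note that an empty string yields code point 0 and is reported as 'c0'.
sub char_class {
    my ($string) = @_;
    my $cp = utf8::native_to_unicode(ord substr($string, 0, 1));
    return 'c0'          if $cp <= 0x1F;
    return 'print_ascii' if $cp <= 0x7E;
    return 'delete'      if $cp == 0x7F;
    return 'c1'          if $cp <= 0x9F;
    return 'latin_1'     if $cp <= 0xFF;
    return 'other';
}

print char_class("A"), "\n";
```

Because every comparison is made after the mapping, no octal or hex
constant in the code depends on the native character set.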
=head1 SOCKETS
Most socket programming assumes ASCII character encodings in network
byte order. Exceptions can include CGI script writing under a
host web server where the server may take care of translation for you.
Most host web servers convert EBCDIC data to ISO-8859-1 or Unicode on
output.
=head1 SORTING
One big difference between ASCII-based character sets and EBCDIC ones
is the relative position of the characters when sorted in native
order. Of most concern are the upper- and lowercase letters, the
digits, and the underscore (C<"_">). On ASCII platforms the native sort
order has the digits come before the uppercase letters which come before
the underscore which comes before the lowercase letters. On EBCDIC, the
underscore comes first, then the lowercase letters, then the uppercase
ones, and the digits last. If sorted on an ASCII-based platform, the
two-letter abbreviation for a physician comes before the two-letter
abbreviation for drive; that is:
@sorted = sort(qw(Dr. dr.)); # @sorted holds ('Dr.','dr.') on ASCII,
# but ('dr.','Dr.') on EBCDIC
The property of lowercase before uppercase letters in EBCDIC is
even carried to the Latin 1 EBCDIC pages such as 0037 and 1047.
An example would be that "E<Euml>" (C<E WITH DIAERESIS>, 203) comes
before "E<euml>" (C<e WITH DIAERESIS>, 235) on an ASCII platform, but
the latter (83) comes before the former (115) on an EBCDIC platform.
(Astute readers will note that the uppercase version of "E<szlig>"
C<SMALL LETTER SHARP S> is simply "SS" and that the upper case versions
of "E<yuml>" (small C<y WITH DIAERESIS>) and "E<micro>" (C<MICRO SIGN>)
are not in the 0..255 range but are in Unicode, in a Unicode enabled
Perl).
The sort order will cause differences between results obtained on
ASCII platforms versus EBCDIC platforms. What follows are some suggestions
on how to deal with these differences.
=head2 Ignore ASCII vs. EBCDIC sort differences.
This is the least computationally expensive strategy. It may require
some user education.
=head2 Use a sort helper function
This is completely general, but the most computationally expensive
strategy. Choose one or the other character set and transform to that
for every sort comparison. Here's a complete example that transforms
to ASCII sort order:
sub native_to_uni($) {
my $string = shift;
# Saves time on an ASCII platform
return $string if ord 'A' == 65;
my $output = "";
for my $i (0 .. length($string) - 1) {
$output
.= chr(utf8::native_to_unicode(ord(substr($string, $i, 1))));
}
# Preserve utf8ness of input onto the output, even if it didn't need
# to be utf8
utf8::upgrade($output) if utf8::is_utf8($string);
return $output;
}
sub ascii_order { # Sort helper
return native_to_uni($a) cmp native_to_uni($b);
}
sort ascii_order @list;
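Since the helper above converts both operands on every comparison, a
large list pays for the conversion many times over. One common way to
reduce that cost is to compute each key once and sort on the
precomputed keys. The following is a sketch under that assumption; the
names C<to_unicode_order> and C<sort_ascii_order> are hypothetical and
not part of the original example.

```perl
use strict;
use warnings;

# Hypothetical helper: map every native character to the character whose
# ordinal is its Unicode code point (an identity map on ASCII platforms).
sub to_unicode_order {
    my ($string) = @_;
    return join '', map { chr utf8::native_to_unicode(ord $_) }
                    split //, $string;
}

# Compute each sort key once, sort on the keys, then discard them.
sub sort_ascii_order {
    my @keyed = map { [ to_unicode_order($_), $_ ] } @_;
    return map { $_->[1] } sort { $a->[0] cmp $b->[0] } @keyed;
}

my @sorted = sort_ascii_order('dr.', 'Dr.', '_x', '9a');
print "@sorted\n";
```

The result follows ASCII order (digits, uppercase, underscore,
lowercase) regardless of the native character set, at the price of
holding one extra key string per element in memory.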
=head2 MONO CASE then sort data (for non-digits, non-underscore)
If you don't care about where digits and underscore sort to, you can do
something like this
sub case_insensitive_order { # Sort helper
return lc($a) cmp lc($b)
}
sort case_insensitive_order @list;
If performance is an issue, and you don't care if the output is in the
same case as the input, use C<tr///> to transform to the case most
employed within the data. If the data are primarily UPPERCASE
non-Latin1, then apply C<tr/[a-z]/[A-Z]/>, and then C<sort()>. If the
data are primarily lowercase non Latin1 then apply C<tr/[A-Z]/[a-z]/>
before sorting. If the data are primarily UPPERCASE and include Latin-1
characters then apply:
tr/[a-z]/[A-Z]/;
tr/[àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ]/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ]/;
s/ß/SS/g;
then C<sort()>. If you have a choice, it's better to lowercase things
to avoid the problems of the two Latin-1 characters whose uppercase is
outside Latin-1: "E<yuml>" (small C<y WITH DIAERESIS>) and "E<micro>"
(C<MICRO SIGN>). If you do need to uppercase, you can; with a
Unicode-enabled Perl, do:
tr/ÿ/\x{178}/;
tr/µ/\x{39C}/;
=head2 Perform sorting on one type of platform only.
This strategy can employ a network connection. As such
it would be computationally expensive.
=head1 TRANSFORMATION FORMATS
There are a variety of ways of transforming data with an intra character set
mapping that serve a variety of purposes. Sorting was discussed in the
previous section and a few of the other more popular mapping techniques are
discussed next.
=head2 URL decoding and encoding
Note that some URLs have hexadecimal ASCII code points in them in an
attempt to overcome character or protocol limitation issues. For example
the tilde character is not on every keyboard hence a URL of the form:
http://www.pvhp.com/~pvhp/
may also be expressed as either of:
http://www.pvhp.com/%7Epvhp/
http://www.pvhp.com/%7epvhp/
where 7E is the hexadecimal ASCII code point for "~". Here is an example
of decoding such a URL in any EBCDIC code page:
$url = 'http://www.pvhp.com/%7Epvhp/';
$url =~ s/%([0-9a-fA-F]{2})/
pack("c",utf8::unicode_to_native(hex($1)))/xge;
Conversely, here is a partial solution for the task of encoding such
a URL in any EBCDIC code page:
$url = 'http://www.pvhp.com/~pvhp/';
# The following regular expression does not address the
# mappings for: ('.' => '%2E', '/' => '%2F', ':' => '%3A')
$url =~ s/([\t "#%&\(\),;<=>\?\@\[\\\]^`{|}~])/
sprintf("%%%02X",utf8::native_to_unicode(ord($1)))/xge;
where a more complete solution would split the URL into components
and apply a full s/// substitution only to the appropriate parts.
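The two substitutions can be wrapped into a pair of subs to check that
they round-trip. This is only a sketch: the names C<url_decode> and
C<url_encode_partial> are hypothetical, the set of escaped characters is
the same partial set used above, and C<chr> is used here in place of
C<pack>.

```perl
use strict;
use warnings;

# The same partial set of characters escaped in the example above.
my $unsafe = qr/[\t "#%&(),;<=>?\@\[\\\]^`{|}~]/;

# Hypothetical helper: turn %XX escapes (ASCII code points) into
# native characters.
sub url_decode {
    my ($url) = @_;
    $url =~ s/%([0-9a-fA-F]{2})/chr utf8::unicode_to_native(hex $1)/ge;
    return $url;
}

# Hypothetical helper: escape the characters in $unsafe using their
# ASCII code points, whatever the native character set is.
sub url_encode_partial {
    my ($url) = @_;
    $url =~ s/($unsafe)/sprintf("%%%02X", utf8::native_to_unicode(ord $1))/ge;
    return $url;
}

my $encoded = url_encode_partial('http://www.pvhp.com/~pvhp/');
print "$encoded\n";
print url_decode($encoded), "\n";
```

Note that decoding uses C<utf8::unicode_to_native()> while encoding uses
C<utf8::native_to_unicode()>; the hexadecimal digits in the URL always
denote ASCII code points.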
=head2 uu encoding and decoding
The C<u> template to C<pack()> or C<unpack()> will render EBCDIC data in
EBCDIC characters equivalent to their ASCII counterparts. For example,
the following will print "Yes indeed\n" on either an ASCII or EBCDIC
computer:
$all_byte_chrs = '';
for (0..255) { $all_byte_chrs .= chr($_); }
$uuencode_byte_chrs = pack('u', $all_byte_chrs);
($uu = <<'ENDOFHEREDOC') =~ s/^\s*//gm;
M``$"`P0%!@<("0H+#`T.#Q`1$A,4%187&!D:&QP='A\@(2(C)"4F)R@I*BLL
M+2XO,#$R,S0U-C<X.3H[/#T^/T!!0D-$149'2$E*2TQ-3D]045)35%565UA9
M6EM<75Y?8&%B8V1E9F=H:6IK;&UN;W!Q<G-T=79W>'EZ>WQ]?G^`@8*#A(6&
MAXB)BHN,C8Z/D)&2DY25EI>8F9J;G)V>GZ"AHJ.DI::GJ*FJJZRMKJ^PL;*S
MM+6VM[BYNKN\O;Z_P,'"P\3%QL?(R<K+S,W.S]#1TM/4U=;7V-G:V]S=WM_@
?X>+CY.7FY^CIZNOL[>[O\/'R\_3U]O?X^?K[_/W^_P``
ENDOFHEREDOC
if ($uuencode_byte_chrs eq $uu) {
print "Yes ";
}
$uudecode_byte_chrs = unpack('u', $uuencode_byte_chrs);
if ($uudecode_byte_chrs eq $all_byte_chrs) {
print "indeed\n";
}
Here is a very spartan uudecoder that will work on EBCDIC:
#!/usr/local/bin/perl
$_ = <> until ($mode,$file) = /^begin\s*(\d*)\s*(\S*)/;
open(OUT, "> $file") if $file ne "";
while(<>) {
last if /^end/;
next if /[a-z]/;
next unless int((((utf8::native_to_unicode(ord()) - 32 ) & 077)
+ 2) / 3)
== int(length() / 4);
print OUT unpack("u", $_);
}
close(OUT);
chmod oct($mode), $file;
=head2 Quoted-Printable encoding and decoding
On ASCII-encoded platforms it is possible to strip characters outside of
the printable set using:
# This QP encoder works on ASCII only
$qp_string =~ s/([=\x00-\x1F\x80-\xFF])/
sprintf("=%02X",ord($1))/xge;
Starting in Perl v5.22, this is trivially changeable to work portably on
both ASCII and EBCDIC platforms.
# This QP encoder works on both ASCII and EBCDIC
$qp_string =~ s/([=\N{U+00}-\N{U+1F}\N{U+80}-\N{U+FF}])/
sprintf("=%02X",utf8::native_to_unicode(ord($1)))/xge;
For earlier Perls, a QP encoder that works on both ASCII and EBCDIC
platforms would look somewhat like the following:
$delete = chr utf8::unicode_to_native(0x7F);
$qp_string =~
s/([^[:print:]$delete]|=)/
sprintf("=%02X",utf8::native_to_unicode(ord($1)))/xage;
(although in production code the substitutions might be done
in the EBCDIC branch with the function call and separately in the
ASCII branch without the expense of the identity map; in Perl v5.22, the
identity map is optimized out so there is no expense, but the
alternative above is simpler and is also available in v5.22).
Such QP strings can be decoded with:
# This QP decoder is limited to ASCII only
$string =~ s/=([[:xdigit:]]{2})/chr hex $1/ge;
$string =~ s/=[\n\r]+$//;
Whereas a QP decoder that works on both ASCII and EBCDIC platforms
would look somewhat like the following:
$string =~ s/=([[:xdigit:]]{2})/
chr utf8::unicode_to_native(hex $1)/xge;
$string =~ s/=[\n\r]+$//;
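To see the encoder and decoder working as a pair, here is a small
round-trip sketch. The names C<qp_encode> and C<qp_decode> are
hypothetical, the code handles only the character escaping discussed
here (no line-length limits or soft line breaks), and the test for which
characters to escape is done on code points instead of with a regular
expression so that it runs on Perls older than v5.22.

```perl
use strict;
use warnings;

# Hypothetical helper: escape "=", the C0 controls, and U+80..U+FF as
# =XX, where XX is the ASCII/Latin-1 code point in hexadecimal.
sub qp_encode {
    my ($string) = @_;
    my $out = '';
    for my $char (split //, $string) {
        my $cp = utf8::native_to_unicode(ord $char);
        if ($cp == 0x3D || $cp <= 0x1F || ($cp >= 0x80 && $cp <= 0xFF)) {
            $out .= sprintf "=%02X", $cp;
        }
        else {
            $out .= $char;
        }
    }
    return $out;
}

# Hypothetical helper: map each =XX escape back to the native character.
sub qp_decode {
    my ($string) = @_;
    $string =~ s/=([[:xdigit:]]{2})/chr utf8::unicode_to_native(hex $1)/ge;
    return $string;
}

my $encoded = qp_encode("a=b");
print "$encoded\n";
print qp_decode($encoded), "\n";
```

The encoder maps native to Unicode before formatting the hex digits and
the decoder maps Unicode back to native, so both functions are identity
maps on ASCII platforms.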
=head2 Caesarean ciphers
The practice of shifting an alphabet one or more characters for encipherment
dates back thousands of years and was explicitly detailed by Gaius Julius
Caesar in his B<Gallic Wars> text. A single alphabet shift is sometimes
referred to as a rotation and the shift amount is given as a number $n after
the string 'rot' or "rot$n". Rot0 and rot26 would designate identity maps
on the 26-letter English version of the Latin alphabet. Rot13 has the
interesting property that alternate subsequent invocations are identity maps
(thus rot13 is its own non-trivial inverse in the group of 26 alphabet
rotations). Hence the following is a rot13 encoder and decoder that will
work on ASCII and EBCDIC platforms:
#!/usr/local/bin/perl
while(<>){
tr/n-za-mN-ZA-M/a-zA-Z/;
print;
}
In one-liner form:
perl -ne 'tr/n-za-mN-ZA-M/a-zA-Z/;print'
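The same idea extends to an arbitrary rotation. Because C<tr///> builds
its table at compile time, a variable shift needs a different approach;
the sketch below builds a lookup hash from the letter lists instead. The
name C<rot_n> is hypothetical, and the code relies on C<'a' .. 'z'> and
C<[A-Za-z]> yielding exactly the 26 letters, as described earlier for
ranges that are subsets of C<[A-Z]> and C<[a-z]>.

```perl
use strict;
use warnings;

# Hypothetical helper: rotate the 26 English letters by $n positions,
# leaving every other character untouched.
sub rot_n {
    my ($n, $text) = @_;
    my @lower = ('a' .. 'z');
    my @upper = ('A' .. 'Z');

    # Map each letter to the letter $n positions later, wrapping around.
    my %map;
    for my $i (0 .. 25) {
        $map{ $lower[$i] } = $lower[ ($i + $n) % 26 ];
        $map{ $upper[$i] } = $upper[ ($i + $n) % 26 ];
    }

    $text =~ s/([A-Za-z])/$map{$1}/g;
    return $text;
}

print rot_n(13, "Hello, World!"), "\n";
```

Applying C<rot_n(13, ...)> twice, or C<rot_n(26, ...)> once, returns the
original text, which matches the identity-map remarks above.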
=head1 Hashing order and checksums
Perl deliberately randomizes hash order for security purposes on both
ASCII and EBCDIC platforms.
EBCDIC checksums will differ for the same file translated into ASCII
and vice versa.
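If a checksum has to agree between an ASCII copy and an EBCDIC copy of
the same text, one possible approach is to sum Unicode code points
rather than native ones. This is a sketch only: C<portable_checksum> is
a hypothetical name, and the simple 16-bit sum is chosen for
illustration, not for strength.

```perl
use strict;
use warnings;

# Hypothetical helper: 16-bit additive checksum over Unicode code points,
# so the result does not depend on the native character set.
sub portable_checksum {
    my ($text) = @_;
    my $sum = 0;
    $sum += utf8::native_to_unicode(ord $_) for split //, $text;
    return $sum % 65536;
}

print portable_checksum("Perl"), "\n";
```

This only helps for text; binary data that was never translated between
character sets should be summed byte for byte.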
=head1 I18N AND L10N
Internationalization (I18N) and localization (L10N) are supported at least
in principle even on EBCDIC platforms. The details are system-dependent
and discussed under the L</OS ISSUES> section below.
=head1 MULTI-OCTET CHARACTER SETS
Perl works with UTF-EBCDIC, a multi-byte encoding. In Perls earlier
than v5.22, there may be various bugs in this regard.
Legacy multi byte EBCDIC code pages XXX.
=head1 OS ISSUES
There may be a few system-dependent issues
of concern to EBCDIC Perl programmers.
=head2 OS/400
=over 8
=item PASE
The PASE environment is a runtime environment for OS/400 that can run
executables built for PowerPC AIX in OS/400; see L<perlos400>. PASE
is ASCII-based, unlike the ILE, which is EBCDIC-based.
=item IFS access
XXX.
=back
=head2 OS/390, z/OS
Perl runs under Unix System Services (USS).
=over 8
=item C<sigaction>
Using C<SA_SIGINFO> can cause segmentation faults.
=item C<chcp>
B<chcp> is supported as a shell utility for displaying and changing
one's code page. See also L<chcp(1)>.
=item dataset access
For sequential data set access try:
my @ds_records = `cat //DSNAME`;
or:
my @ds_records = `cat //'HLQ.DSNAME'`;
See also the OS390::Stdio module on CPAN.
=item C<iconv>
B<iconv> is supported as both a shell utility and a C RTL routine.
See also the L<iconv(1)> and L<iconv(3)> manual pages.
=item locales
Locales are supported. There may be glitches when a locale is another
EBCDIC code page which has some of the
L<code-page variant characters|/The 13 variant characters> in other
positions.
There aren't currently any real UTF-8 locales, even though some locale
names contain the string "UTF-8".
See L<perllocale> for information on locales. The L10N files
are in F</usr/nls/locale>. C<$Config{d_setlocale}> is C<'define'> on
OS/390 or z/OS.
=back
=head2 POSIX-BC?
XXX.
=head1 BUGS
=over 4
=item *
Not all shells will allow multiple C<-e> string arguments to perl to
be concatenated together properly, as recipes 0, 2, 4, 5, and 6 in
this document might seem to imply.
=item *
There are a significant number of test failures in the CPAN modules
shipped with Perl v5.22 and 5.24. These are only in modules not primarily
maintained by Perl 5 porters. Some of these are failures in the tests
only: they don't realize that it is proper to get different results on
EBCDIC platforms. And some of the failures are real bugs. If you
compile and do a C<make test> on Perl, all tests on the C</cpan>
directory are skipped.
L<Encode> partially works.
=item *
In earlier Perl versions, when byte and character data were
concatenated, the new string was sometimes created by
decoding the byte strings as I<ISO 8859-1 (Latin-1)>, even if the
old Unicode string used EBCDIC.
=back
=head1 SEE ALSO
L<perllocale>, L<perlfunc>, L<perlunicode>, L<utf8>.
=head1 REFERENCES
L<http://anubis.dkuug.dk/i18n/charmaps>
L<http://www.unicode.org/>
L<http://www.unicode.org/unicode/reports/tr16/>
L<http://www.wps.com/projects/codes/>
B<ASCII: American Standard Code for Information Infiltration> Tom Jennings,
September 1999.
B<The Unicode Standard, Version 3.0> The Unicode Consortium, Lisa Moore ed.,
ISBN 0-201-61633-5, Addison Wesley Developers Press, February 2000.
B<CDRA: IBM - Character Data Representation Architecture -
Reference and Registry>, IBM SC09-2190-00, December 1996.
"Demystifying Character Sets", Andrea Vine, Multilingual Computing
& Technology, B<#26 Vol. 10 Issue 4>, August/September 1999;
ISSN 1523-0309; Multilingual Computing Inc. Sandpoint ID, USA.
B<Codes, Ciphers, and Other Cryptic and Clandestine Communication>
Fred B. Wrixon, ISBN 1-57912-040-7, Black Dog & Leventhal Publishers,
1998.
L<http://www.bobbemer.com/P-BIT.HTM>
B<IBM - EBCDIC and the P-bit; The biggest Computer Goof Ever> Robert Bemer.
=head1 HISTORY
15 April 2001: added UTF-8 and UTF-EBCDIC to main table, pvhp.
=head1 AUTHOR
Peter Prymmer pvhp@best.com wrote this in 1999 and 2000
with CCSID 0819 and 0037 help from Chris Leach and
AndrE<eacute> Pirard A.Pirard@ulg.ac.be as well as POSIX-BC
help from Thomas Dorner Thomas.Dorner@start.de.
Thanks also to Vickie Cooper, Philip Newton, William Raffloer, and
Joe Smith. Trademarks, registered trademarks, service marks and
registered service marks used in this document are the property of
their respective owners.
Now maintained by Perl5 Porters.
=encoding utf8
=head1 NAME
perl5203delta - what is new for perl v5.20.3
=head1 DESCRIPTION
This document describes differences between the 5.20.2 release and the 5.20.3
release.
If you are upgrading from an earlier release such as 5.20.1, first read
L<perl5202delta>, which describes differences between 5.20.1 and 5.20.2.
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.20.2. If any exist,
they are bugs, and we request that you submit a report. See L</Reporting Bugs>
below.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<Errno> has been upgraded from version 1.20_05 to 1.20_06.
Add B<-P> to the pre-processor command-line on GCC 5. GCC added extra line
directives, breaking parsing of error code definitions.
L<[perl #123784]|https://rt.perl.org/Ticket/Display.html?id=123784>
=item *
L<Module::CoreList> has been upgraded from version 5.20150214 to 5.20150822.
Updated to cover the latest releases of Perl.
=item *
L<perl5db.pl> has been upgraded from 1.44 to 1.44_01.
The debugger would cause an assertion failure.
L<[perl #124127]|https://rt.perl.org/Ticket/Display.html?id=124127>
=back
=head1 Documentation
=head2 Changes to Existing Documentation
=head3 L<perlfunc>
=over 4
=item *
Mention that L<C<study()>|perlfunc/study> is currently a no-op.
=back
=head3 L<perlguts>
=over 4
=item *
The OOK example has been updated to account for COW changes and a change in the
storage of the offset.
=back
=head3 L<perlhacktips>
=over 4
=item *
Documentation has been added illustrating the perils of assuming the contents
of static memory pointed to by the return values of Perl wrappers for C library
functions doesn't change.
=back
=head3 L<perlpodspec>
=over 4
=item *
The specification of the POD language is changing so that the default encoding
of PODs that aren't in UTF-8 (unless otherwise indicated) is CP1252 instead of
ISO-8859-1 (Latin1).
=back
=head1 Utility Changes
=head2 L<h2ph>
=over 4
=item *
B<h2ph> now handles hexadecimal constants in the compiler's predefined macro
definitions, as visible in C<$Config{cppsymbols}>.
L<[perl #123784]|https://rt.perl.org/Ticket/Display.html?id=123784>
=back
=head1 Testing
=over 4
=item *
F<t/perf/taint.t> has been added to see if optimisations with taint issues are
keeping things fast.
=item *
F<t/porting/re_context.t> has been added to test that L<utf8> and its
dependencies only use the subset of the C<$1..$n> capture vars that
Perl_save_re_context() is hard-coded to localize, because that function has no
efficient way of determining at runtime what vars to localize.
=back
=head1 Platform Support
=head2 Platform-Specific Notes
=over 4
=item Win32
=over 4
=item *
Previously, when compiling with a 64-bit Visual C++, every Perl XS module
(including CPAN ones) and Perl aware C file would unconditionally have around a
dozen warnings from F<hv_func.h>. These warnings have been silenced. GCC (all
bitness) and 32-bit Visual C++ were not affected.
=item *
B<miniperl.exe> is now built with B<-fno-strict-aliasing>, allowing 64-bit
builds to complete with GCC 4.8.
L<[perl #123976]|https://rt.perl.org/Ticket/Display.html?id=123976>
=back
=back
=head1 Selected Bug Fixes
=over 4
=item *
Repeated global pattern matches in scalar context on large tainted strings were
exponentially slow depending on the current match position in the string.
L<[perl #123202]|https://rt.perl.org/Ticket/Display.html?id=123202>
=item *
The original visible value of L<C<$E<sol>>|perlvar/$E<sol>> is now preserved
when it is set to an invalid value. Previously if you set C<$/> to a reference
to an array, for example, perl would produce a runtime error and not set PL_rs,
but Perl code that checked C<$/> would see the array reference.
L<[perl #123218]|https://rt.perl.org/Ticket/Display.html?id=123218>
=item *
Perl 5.14.0 introduced a bug whereby C<eval { LABEL: }> would crash. This has
been fixed.
L<[perl #123652]|https://rt.perl.org/Ticket/Display.html?id=123652>
=item *
Extending an array cloned from a parent thread could result in "Modification of
a read-only value attempted" errors when attempting to modify the new elements.
L<[perl #124127]|https://rt.perl.org/Ticket/Display.html?id=124127>
=item *
Several cases of data used to store environment variable contents in core C
code being potentially overwritten before being used have been fixed.
L<[perl #123748]|https://rt.perl.org/Ticket/Display.html?id=123748>
=item *
UTF-8 variable names used in array indexes, unquoted UTF-8 HERE-document
terminators and UTF-8 function names all now work correctly.
L<[perl #124113]|https://rt.perl.org/Ticket/Display.html?id=124113>
=item *
A subtle bug introduced in Perl 5.20.2 involving UTF-8 in regular expressions
and sometimes causing a crash has been fixed. A new test script has been added
to test this fix; see under L</Testing>.
L<[perl #124109]|https://rt.perl.org/Ticket/Display.html?id=124109>
=item *
Some patterns starting with C</.*..../> matched against long strings have been
slow since Perl 5.8, and some of the form C</.*..../i> have been slow since
Perl 5.18. They are now all fast again.
L<[perl #123743]|https://rt.perl.org/Ticket/Display.html?id=123743>
=item *
Warning fatality is now ignored when rewinding the stack. This prevents
infinite recursion when the now fatal error also causes rewinding of the stack.
L<[perl #123398]|https://rt.perl.org/Ticket/Display.html?id=123398>
=item *
C<setpgrp($nonzero)> (with one argument) was accidentally changed in Perl 5.16
to mean C<setpgrp(0)>. This has been fixed.
=item *
A crash with C<< %::=(); J->${\"::"} >> has been fixed.
L<[perl #125541]|https://rt.perl.org/Ticket/Display.html?id=125541>
=item *
Regular expression possessive quantifier Perl 5.20 regression now fixed.
C<qr/>I<PAT>C<{>I<min>,I<max>C<}+>C</> is supposed to behave identically to
C<qr/(?E<gt>>I<PAT>C<{>I<min>,I<max>C<})/>. Since Perl 5.20, this didn't work
if I<min> and I<max> were equal.
L<[perl #125825]|https://rt.perl.org/Ticket/Display.html?id=125825>
=item *
Code like C</$a[/> used to read the next line of input and treat it as though
it came immediately after the opening bracket. Some invalid code consequently
would parse and run, but some code caused crashes, so this is now disallowed.
L<[perl #123712]|https://rt.perl.org/Ticket/Display.html?id=123712>
=back
=head1 Acknowledgements
Perl 5.20.3 represents approximately 7 months of development since Perl 5.20.2
and contains approximately 3,200 lines of changes across 99 files from 26
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 1,500 lines of changes to 43 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.20.3:
Alex Vandiver, Andy Dougherty, Aristotle Pagaltzis, Chris 'BinGOs' Williams,
Craig A. Berry, Dagfinn Ilmari Mannsåker, Daniel Dragan, David Mitchell,
Father Chrysostomos, H.Merijn Brand, James E Keenan, James McCoy, Jarkko
Hietaniemi, Karen Etheridge, Karl Williamson, kmx, Lajos Veres, Lukas Mai,
Matthew Horsfall, Petr Písař, Randy Stauner, Ricardo Signes, Sawyer X, Steve
Hay, Tony Cook, Yves Orton.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
https://rt.perl.org/ . There may also be information at
http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send it
to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be
able to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
=head1 NAME
perlthrtut - Tutorial on threads in Perl
=head1 DESCRIPTION
This tutorial describes the use of Perl interpreter threads (sometimes
referred to as I<ithreads>). In this
model, each thread runs in its own Perl interpreter, and any data sharing
between threads must be explicit. The user-level interface for I<ithreads>
uses the L<threads> class.
B<NOTE>: There was another older Perl threading flavor called the 5.005 model
that used the L<Thread> class. This old model was known to have problems, was
deprecated, and was removed in release 5.10. You are
strongly encouraged to migrate any existing 5.005 threads code to the new
model as soon as possible.
You can see which threading flavour you have, if any, by
running C<perl -V> and looking at the C<Platform> section.
If you have C<useithreads=define> you have ithreads, if you
have C<use5005threads=define> you have 5.005 threads.
If you have neither, you don't have any thread support built in.
If you have both, you are in trouble.
The L<threads> and L<threads::shared> modules are included in the core Perl
distribution. Additionally, they are maintained as separate modules on
CPAN, so you can check there for any updates.
=head1 What Is A Thread Anyway?
A thread is a flow of control through a program with a single
execution point.
Sounds an awful lot like a process, doesn't it? Well, it should.
Threads are one of the pieces of a process. Every process has at least
one thread and, up until now, every process running Perl had only one
thread. With 5.8, though, you can create extra threads. We're going
to show you how, when, and why.
=head1 Threaded Program Models
There are three basic ways that you can structure a threaded
program. Which model you choose depends on what you need your program
to do. For many non-trivial threaded programs, you'll need to choose
different models for different pieces of your program.
=head2 Boss/Worker
The boss/worker model usually has one I<boss> thread and one or more
I<worker> threads. The boss thread gathers or generates tasks that need
to be done, then parcels those tasks out to the appropriate worker
thread.
This model is common in GUI and server programs, where a main thread
waits for some event and then passes that event to the appropriate
worker threads for processing. Once the event has been passed on, the
boss thread goes back to waiting for another event.
The boss thread does relatively little work. While tasks aren't
necessarily performed faster than with any other method, it tends to
have the best user-response times.
=head2 Work Crew
In the work crew model, several threads are created that do
essentially the same thing to different pieces of data. It closely
mirrors classical parallel processing and vector processors, where a
large array of processors do the exact same thing to many pieces of
data.
This model is particularly useful if the system running the program
will distribute multiple threads across different processors. It can
also be useful in ray tracing or rendering engines, where the
individual threads can pass on interim results to give the user visual
feedback.
=head2 Pipeline
The pipeline model divides up a task into a series of steps, and
passes the results of one step on to the thread processing the
next. Each thread does one thing to each piece of data and passes the
results to the next thread in line.
This model makes the most sense if you have multiple processors so two
or more threads will be executing in parallel, though it can often
make sense in other contexts as well. It tends to keep the individual
tasks small and simple, as well as allowing some parts of the pipeline
to block (on I/O or system calls, for example) while other parts keep
going. If you're running different parts of the pipeline on different
processors you may also take advantage of the caches on each
processor.
This model is also handy for a form of recursive programming where,
rather than having a subroutine call itself, it instead creates
another thread. Prime and Fibonacci generators both map well to this
form of the pipeline model. (A version of a prime number generator is
presented later on.)
=head1 What kind of threads are Perl threads?
If you have experience with other thread implementations, you might
find that things aren't quite what you expect. It's very important to
remember when dealing with Perl threads that I<Perl Threads Are Not X
Threads> for all values of X. They aren't POSIX threads, or
DecThreads, or Java's Green threads, or Win32 threads. There are
similarities, and the broad concepts are the same, but if you start
looking for implementation details you're going to be either
disappointed or confused. Possibly both.
This is not to say that Perl threads are completely different from
everything that's ever come before. They're not. Perl's threading
model owes a lot to other thread models, especially POSIX. Just as
Perl is not C, though, Perl threads are not POSIX threads. So if you
find yourself looking for mutexes, or thread priorities, it's time to
step back a bit and think about what you want to do and how Perl can
do it.
However, it is important to remember that Perl threads cannot magically
do things unless your operating system's threads allow it. So if your
system blocks the entire process on C<sleep()>, Perl usually will, as well.
B<Perl Threads Are Different.>
=head1 Thread-Safe Modules
The addition of threads has changed Perl's internals
substantially. There are implications for people who write
modules with XS code or external libraries. However, since Perl data is
not shared among threads by default, Perl modules stand a high chance of
being thread-safe or can be made thread-safe easily. Modules that are not
tagged as thread-safe should be tested or code reviewed before being used
in production code.
Not all modules that you might use are thread-safe, and you should
always assume a module is unsafe unless the documentation says
otherwise. This includes modules that are distributed as part of the
core. Threads are a relatively new feature, and even some of the standard
modules aren't thread-safe.
Even if a module is thread-safe, it doesn't mean that the module is optimized
to work well with threads. A module could possibly be rewritten to utilize
the new features in threaded Perl to increase performance in a threaded
environment.
If you're using a module that's not thread-safe for some reason, you
can protect yourself by using it from one, and only one, thread.
If you need multiple threads to access such a module, you can use semaphores and
lots of programming discipline to control access to it. Semaphores
are covered in L</"Basic semaphores">.
See also L</"Thread-Safety of System Libraries">.
=head1 Thread Basics
The L<threads> module provides the basic functions you need to write
threaded programs. In the following sections, we'll cover the basics,
showing you what you need to do to create a threaded program. After
that, we'll go over some of the features of the L<threads> module that
make threaded programming easier.
=head2 Basic Thread Support
Thread support is a Perl compile-time option. It's something that's
turned on or off when Perl is built at your site, rather than when
your programs are compiled. If your Perl wasn't compiled with thread
support enabled, then any attempt to use threads will fail.
Your programs can use the Config module to check whether threads are
enabled. If your program can't run without them, you can say something
like:
use Config;
$Config{useithreads} or
die('Recompile Perl with threads to run this program.');
A possibly-threaded program using a possibly-threaded module might
have code like this:
use Config;
use MyMod;
BEGIN {
if ($Config{useithreads}) {
# We have threads
require MyMod_threaded;
MyMod_threaded->import();
} else {
require MyMod_unthreaded;
MyMod_unthreaded->import();
}
}
Since code that runs both with and without threads is usually pretty
messy, it's best to isolate the thread-specific code in its own
module. In our example above, that's what C<MyMod_threaded> is, and it's
only imported if we're running on a threaded Perl.
=head2 A Note about the Examples
In a real situation, care should be taken that all threads are finished
executing before the program exits. That care has B<not> been taken in these
examples in the interest of simplicity. Running these examples I<as is> will
produce error messages, usually caused by the fact that there are still
threads running when the program exits. You should not be alarmed by this.
=head2 Creating Threads
The L<threads> module provides the tools you need to create new
threads. Like any other module, you need to tell Perl that you want to use
it; C<use threads;> imports all the pieces you need to create basic
threads.
The simplest, most straightforward way to create a thread is with C<create()>:
use threads;
my $thr = threads->create(\&sub1);
sub sub1 {
print("In the thread\n");
}
The C<create()> method takes a reference to a subroutine and creates a new
thread that starts executing in the referenced subroutine. Control
then passes both to the subroutine and the caller.
If you need to, your program can pass parameters to the subroutine as
part of the thread startup. Just include the list of parameters as
part of the C<threads-E<gt>create()> call, like this:
use threads;
my $Param3 = 'foo';
my $thr1 = threads->create(\&sub1, 'Param 1', 'Param 2', $Param3);
my @ParamList = (42, 'Hello', 3.14);
my $thr2 = threads->create(\&sub1, @ParamList);
my $thr3 = threads->create(\&sub1, qw(Param1 Param2 Param3));
sub sub1 {
my @InboundParameters = @_;
print("In the thread\n");
print('Got parameters >', join('<>',@InboundParameters), "<\n");
}
The last example illustrates another feature of threads. You can spawn
off several threads using the same subroutine. Each thread executes
the same subroutine, but in a separate thread with a separate
environment and potentially separate arguments.
C<new()> is a synonym for C<create()>.
=head2 Waiting For A Thread To Exit
Since threads are also subroutines, they can return values. To wait
for a thread to exit and extract any values it might return, you can
use the C<join()> method:
use threads;
my ($thr) = threads->create(\&sub1);
my @ReturnData = $thr->join();
print('Thread returned ', join(', ', @ReturnData), "\n");
sub sub1 { return ('Fifty-six', 'foo', 2); }
In the example above, the C<join()> method returns as soon as the thread
ends. In addition to waiting for a thread to finish and gathering up
any values that the thread might have returned, C<join()> also performs
any OS cleanup necessary for the thread. That cleanup might be
important, especially for long-running programs that spawn lots of
threads. If you don't want the return values and don't want to wait
for the thread to finish, you should call the C<detach()> method
instead, as described next.
NOTE: In the example above, the thread returns a list, thus necessitating
that the thread creation call be made in list context (i.e., C<my ($thr)>).
See L<< threads/"$thr->join()" >> and L<threads/"THREAD CONTEXT"> for more
details on thread context and return values.
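To make the effect of thread context concrete, here is a small sketch that
runs the same subroutine in list context, in scalar context, and with the
context requested explicitly through the C<context> option described in
L<threads/"THREAD CONTEXT">. The subroutine name C<three_values()> is
illustrative only.

```perl
use strict;
use warnings;
use threads;

# Illustrative worker that returns a list of three values.
sub three_values { return ('Fifty-six', 'foo', 2); }

# List context at creation time: join() returns the whole list.
my ($thr_list) = threads->create(\&three_values);
my @all = $thr_list->join();

# Scalar context at creation time: join() returns a single value.
my $thr_scalar = threads->create(\&three_values);
my $one = $thr_scalar->join();

# The context can also be requested explicitly,
# regardless of how the create() call itself is written.
my $thr_explicit = threads->create({ 'context' => 'list' }, \&three_values);
my @explicit = $thr_explicit->join();

print('List context:   ', join(', ', @all), "\n");
print("Scalar context: $one\n");
print('Explicit list:  ', join(', ', @explicit), "\n");
```

In the scalar case the subroutine's C<return> is itself evaluated in scalar
context, so only the last value of the list comes back.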
=head2 Ignoring A Thread
C<join()> does three things: it waits for a thread to exit, cleans up
after it, and returns any data the thread may have produced. But what
if you're not interested in the thread's return values, and you don't
really care when the thread finishes? All you want is for the thread
to be cleaned up once it's done.
In this case, you use the C<detach()> method. Once a thread is detached,
it'll run until it's finished; then Perl will clean up after it
automatically.
use threads;
my $thr = threads->create(\&sub1); # Spawn the thread
$thr->detach(); # Now we officially don't care any more
sleep(15); # Let thread run for a while
sub sub1 {
my $count = 0;
while (1) {
$count++;
print("\$count is $count\n");
sleep(1);
}
}
Once a thread is detached, it may not be joined, and any return data
that it might have produced (if it was done and waiting for a join) is
lost.
C<detach()> can also be called as a class method to allow a thread to
detach itself:
use threads;
my $thr = threads->create(\&sub1);
sub sub1 {
threads->detach();
# Do more work
}
=head2 Process and Thread Termination
With threads one must be careful to make sure they all have a chance to
run to completion, assuming that is what you want.
An action that terminates a process will terminate I<all> running
threads. die() and exit() have this property,
and perl does an exit when the main thread exits,
perhaps implicitly by falling off the end of your code,
even if that's not what you want.
As an example of this case, this code prints the message
"Perl exited with active threads: 2 running and unjoined":
use threads;
my $thr1 = threads->new(\&thrsub, "test1");
my $thr2 = threads->new(\&thrsub, "test2");
sub thrsub {
my ($message) = @_;
sleep 1;
print "thread $message\n";
}
But when the following lines are added at the end:
$thr1->join();
$thr2->join();
it prints two lines of output, a perhaps more useful outcome.
=head1 Threads And Data
Now that we've covered the basics of threads, it's time for our next
topic: Data. Threading introduces a couple of complications to data
access that non-threaded programs never need to worry about.
=head2 Shared And Unshared Data
The biggest difference between Perl I<ithreads> and the old 5.005-style
threading, or for that matter most other threading systems out there,
is that by default, no data is shared. When a new Perl thread is created,
all the data associated with the current thread is copied to the new
thread, and is subsequently private to that new thread!
This is similar in feel to what happens when a Unix process forks,
except that in this case, the data is just copied to a different part of
memory within the same process rather than a real fork taking place.
To make use of threading, however, one usually wants the threads to share
at least some data between themselves. This is done with the
L<threads::shared> module and the C<:shared> attribute:
use threads;
use threads::shared;
my $foo :shared = 1;
my $bar = 1;
threads->create(sub { $foo++; $bar++; })->join();
print("$foo\n"); # Prints 2 since $foo is shared
print("$bar\n"); # Prints 1 since $bar is not shared
In the case of a shared array, all the array's elements are shared, and for
a shared hash, all the keys and values are shared. This places
restrictions on what may be assigned to shared array and hash elements: only
simple values or references to shared variables are allowed - this is
so that a private variable can't accidentally become shared. A bad
assignment will cause the thread to die. For example:
use threads;
use threads::shared;
my $var = 1;
my $svar :shared = 2;
my %hash :shared;
# ... create some threads ...
$hash{a} = 1; # All threads see exists($hash{a})
# and $hash{a} == 1
$hash{a} = $var; # okay - copy-by-value: same effect as previous
$hash{a} = $svar; # okay - copy-by-value: same effect as previous
$hash{a} = \$svar; # okay - a reference to a shared variable
$hash{a} = \$var; # This will die
delete($hash{a}); # okay - all threads will see !exists($hash{a})
Note that a shared variable guarantees that if two or more threads try to
modify it at the same time, the internal state of the variable will not
become corrupted. However, there are no guarantees beyond this, as
explained in the next section.
=head2 Thread Pitfalls: Races
While threads bring a new set of useful tools, they also bring a
number of pitfalls. One pitfall is the race condition:
use threads;
use threads::shared;
my $x :shared = 1;
my $thr1 = threads->create(\&sub1);
my $thr2 = threads->create(\&sub2);
$thr1->join();
$thr2->join();
print("$x\n");
sub sub1 { my $foo = $x; $x = $foo + 1; }
sub sub2 { my $bar = $x; $x = $bar + 1; }
What do you think C<$x> will be? The answer, unfortunately, is I<it
depends>. Both C<sub1()> and C<sub2()> access the global variable C<$x>, once
to read and once to write. Depending on factors ranging from your
thread implementation's scheduling algorithm to the phase of the moon,
C<$x> can be 2 or 3.
Race conditions are caused by unsynchronized access to shared
data. Without explicit synchronization, there's no way to be sure that
nothing has happened to the shared data between the time you access it
and the time you update it. Even this simple code fragment has the
possibility of error:
use threads;
use threads::shared;
my $x :shared = 2;
my $y :shared;
my $z :shared;
my $thr1 = threads->create(sub { $y = $x; $x = $y + 1; });
my $thr2 = threads->create(sub { $z = $x; $x = $z + 1; });
$thr1->join();
$thr2->join();
Two threads both access C<$x>. Each thread can potentially be interrupted
at any point, or be executed in any order. At the end, C<$x> could be 3
or 4, and both C<$y> and C<$z> could be 2 or 3.
Even C<$x += 5> or C<$x++> are not guaranteed to be atomic.
Whenever your program accesses data or resources that can be accessed
by other threads, you must take steps to coordinate access or risk
data inconsistency and race conditions. Note that Perl will protect its
internals from your race conditions, but it won't protect you from you.
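As a sketch of what "taking steps to coordinate access" can look like, the
racy read-then-write from the first example can be wrapped in a lock, using
the C<lock()> function covered in L</"Controlling access: lock()">. The
subroutine name C<add_one()> is made up for this illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $x :shared = 1;

# The same read-modify-write as in the racy example,
# but performed only while holding the lock on $x.
sub add_one {
    lock($x);           # Block until no other thread holds the lock
    my $copy = $x;
    $x = $copy + 1;
}                       # Lock released when the subroutine's block exits

my @workers = map { threads->create(\&add_one) } 1 .. 2;
$_->join() for @workers;

print("$x\n");
```

With the lock in place, the second thread cannot read C<$x> until the first
has stored its result, so both increments are always applied.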
=head1 Synchronization and control
Perl provides a number of mechanisms to coordinate the interactions
between threads and their data, to avoid race conditions and the like.
Some of these are designed to resemble the common techniques used in thread
libraries such as C<pthreads>; others are Perl-specific. Often, the
standard techniques are clumsy and difficult to get right (such as
condition waits). Where possible, it is usually easier to use Perlish
techniques such as queues, which remove some of the hard work involved.
=head2 Controlling access: lock()
The C<lock()> function takes a shared variable and puts a lock on it.
No other thread may lock the variable until the variable is unlocked
by the thread holding the lock. Unlocking happens automatically
when the locking thread exits the block that contains the call to the
C<lock()> function. Using C<lock()> is straightforward: This example has
several threads doing some calculations in parallel, and occasionally
updating a running total:
use threads;
use threads::shared;
my $total :shared = 0;
sub calc {
while (1) {
my $result;
# (... do some calculations and set $result ...)
{
lock($total); # Block until we obtain the lock
$total += $result;
} # Lock implicitly released at end of scope
last if $result == 0;
}
}
my $thr1 = threads->create(\&calc);
my $thr2 = threads->create(\&calc);
my $thr3 = threads->create(\&calc);
$thr1->join();
$thr2->join();
$thr3->join();
print("total=$total\n");
C<lock()> blocks the thread until the variable being locked is
available. When C<lock()> returns, your thread can be sure that no other
thread can lock that variable until the block containing the
lock exits.
It's important to note that locks don't prevent access to the variable
in question, only lock attempts. This is in keeping with Perl's
longstanding tradition of courteous programming, and the advisory file
locking that C<flock()> gives you.
You may lock arrays and hashes as well as scalars. Locking an array,
though, will not block subsequent locks on array elements, just lock
attempts on the array itself.
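A small sketch of locking an array as a whole follows. It assumes that every
thread touching the array agrees to take the same lock first; the names
C<@log> and C<record()> are illustrative.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @log :shared;

sub record {
    my ($who) = @_;
    for (1 .. 3) {
        lock(@log);                     # Lock the array itself
        my $seq = scalar(@log) + 1;     # Read the current size ...
        push(@log, "$seq:$who");        # ... and append an entry based on it
    }                                   # Lock released after each iteration
}

my @workers = map { threads->create(\&record, $_) } qw(a b);
$_->join() for @workers;

print(join(' ', @log), "\n");
```

The lock makes reading the size and pushing the new entry a single unit, so
the sequence numbers stay in step with the entries' positions no matter how
the two threads interleave.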
Locks are recursive, which means it's okay for a thread to
lock a variable more than once. The lock will last until the outermost
C<lock()> on the variable goes out of scope. For example:
use threads;
use threads::shared;
my $x :shared;
doit();
sub doit {
{
{
lock($x); # Wait for lock
lock($x); # NOOP - we already have the lock
{
lock($x); # NOOP
{
lock($x); # NOOP
lockit_some_more();
}
}
} # *** Implicit unlock here ***
}
}
sub lockit_some_more {
lock($x); # NOOP
} # Nothing happens here
Note that there is no C<unlock()> function - the only way to unlock a
variable is to allow it to go out of scope.
A lock can either be used to guard the data contained within the variable
being locked, or it can be used to guard something else, like a section
of code. In this latter case, the variable in question does not hold any
useful data, and exists only for the purpose of being locked. In this
respect, the variable behaves like the mutexes and basic semaphores of
traditional thread libraries.
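Here is a minimal sketch of a variable used purely as a guard. The variable
C<$print_guard> never holds data; by convention (an assumption of this
example, not something Perl enforces) every thread locks it before printing
or touching the line counter.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $print_guard :shared;    # Holds no data; exists only to be locked
my $lines :shared = 0;      # Protected by $print_guard, by convention

sub report {
    my ($id) = @_;
    for my $step (1 .. 3) {
        lock($print_guard);     # One thread at a time past this point
        print("thread $id, step $step\n");
        $lines++;
    }                           # Guard released after each iteration
}

my @workers = map { threads->create(\&report, $_) } 1 .. 2;
$_->join() for @workers;

print("lines printed: $lines\n");
```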
=head2 A Thread Pitfall: Deadlocks
Locks are a handy tool to synchronize access to data, and using them
properly is the key to safe shared data. Unfortunately, locks aren't
without their dangers, especially when multiple locks are involved.
Consider the following code:
use threads;
use threads::shared;
my $x :shared = 4;
my $y :shared = 'foo';
my $thr1 = threads->create(sub {
lock($x);
sleep(20);
lock($y);
});
my $thr2 = threads->create(sub {
lock($y);
sleep(20);
lock($x);
});
This program will probably hang until you kill it. The only way it
won't hang is if one of the two threads acquires both locks
first. A guaranteed-to-hang version is more complicated, but the
principle is the same.
The first thread will grab a lock on C<$x>, then, after a pause during which
the second thread has probably had time to do some work, try to grab a
lock on C<$y>. Meanwhile, the second thread grabs a lock on C<$y>, then later
tries to grab a lock on C<$x>. The second lock attempt for both threads will
block, each waiting for the other to release its lock.
This condition is called a deadlock, and it occurs whenever two or
more threads are trying to get locks on resources that the others
own. Each thread will block, waiting for the other to release a lock
on a resource. That never happens, though, since the thread with the
resource is itself waiting for a lock to be released.
There are a number of ways to handle this sort of problem. The best
way is to always have all threads acquire locks in the exact same
order. If, for example, you lock variables C<$x>, C<$y>, and C<$z>, always lock
C<$x> before C<$y>, and C<$y> before C<$z>. It's also best to hold on to locks for
as short a period of time as possible to minimize the risks of deadlock.
The other synchronization primitives described below can suffer from
similar problems.
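The lock-ordering advice can be sketched by reworking the deadlocking program
so that both threads take C<$x> first and C<$y> second. The subroutine name
C<use_both()> and the counter C<$done> are additions for this illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $x :shared = 4;
my $y :shared = 'foo';
my $done :shared = 0;

# Every thread acquires the locks in the same order: $x, then $y.
sub use_both {
    my ($name) = @_;
    lock($x);       # Always first
    sleep(1);       # Give the other thread time to start
    lock($y);       # Always second
    $done++;        # Safe here: only the holder of $x reaches this line
    print("$name holds both locks\n");
}                   # Both locks released here

my $thr1 = threads->create(\&use_both, 'first');
my $thr2 = threads->create(\&use_both, 'second');
$thr1->join();
$thr2->join();

print("threads completed: $done\n");
```

Whichever thread gets C<$x> first also gets C<$y>, because the other thread
is still waiting for C<$x> and so cannot be holding C<$y>.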
=head2 Queues: Passing Data Around
A queue is a special thread-safe object that lets you put data in one
end and take it out the other without having to worry about
synchronization issues. They're pretty straightforward, and look like
this:
use threads;
use Thread::Queue;
my $DataQueue = Thread::Queue->new();
my $thr = threads->create(sub {
while (defined(my $DataElement = $DataQueue->dequeue())) {
print("Popped $DataElement off the queue\n");
}
});
$DataQueue->enqueue(12);
$DataQueue->enqueue("A", "B", "C");
sleep(10);
$DataQueue->enqueue(undef);
$thr->join();
You create the queue with C<Thread::Queue-E<gt>new()>. Then you can
add lists of scalars onto the end with C<enqueue()>, and pop scalars off
the front of it with C<dequeue()>. A queue has no fixed size, and can grow
as needed to hold everything pushed on to it.
If a queue is empty, C<dequeue()> blocks until another thread enqueues
something. This makes queues ideal for event loops and other
communications between threads.
=head2 Semaphores: Synchronizing Data Access
Semaphores are a kind of generic locking mechanism. In their most basic
form, they behave very much like lockable scalars, except that they
can't hold data, and that they must be explicitly unlocked. In their
advanced form, they act like a kind of counter, and can allow multiple
threads to have the I<lock> at any one time.
=head2 Basic semaphores
Semaphores have two methods, C<down()> and C<up()>: C<down()> decrements the resource
count, while C<up()> increments it. Calls to C<down()> will block if the
semaphore's current count would decrement below zero. This program
gives a quick demonstration:
use threads;
use threads::shared;
use Thread::Semaphore;
my $semaphore = Thread::Semaphore->new();
my $GlobalVariable :shared = 0;
my $thr1 = threads->create(\&sample_sub, 1);
my $thr2 = threads->create(\&sample_sub, 2);
my $thr3 = threads->create(\&sample_sub, 3);
sub sample_sub {
my $SubNumber = shift(@_);
my $TryCount = 10;
my $LocalCopy;
sleep(1);
while ($TryCount--) {
$semaphore->down();
$LocalCopy = $GlobalVariable;
print("$TryCount tries left for sub $SubNumber "
."(\$GlobalVariable is $GlobalVariable)\n");
sleep(2);
$LocalCopy++;
$GlobalVariable = $LocalCopy;
$semaphore->up();
}
}
$thr1->join();
$thr2->join();
$thr3->join();
The three invocations of the subroutine all operate in sync. The
semaphore, though, makes sure that only one thread is accessing the
global variable at once.
=head2 Advanced Semaphores
By default, semaphores behave like locks, letting only one thread
C<down()> them at a time. However, there are other uses for semaphores.
Each semaphore has a counter attached to it. By default, semaphores are
created with the counter set to one, C<down()> decrements the counter by
one, and C<up()> increments by one. However, we can override any or all
of these defaults simply by passing in different values:
use threads;
use Thread::Semaphore;
my $semaphore = Thread::Semaphore->new(5);
# Creates a semaphore with the counter set to five
my $thr1 = threads->create(\&sub1);
my $thr2 = threads->create(\&sub1);
sub sub1 {
$semaphore->down(5); # Decrements the counter by five
# Do stuff here
$semaphore->up(5); # Increment the counter by five
}
$thr1->detach();
$thr2->detach();
If C<down()> attempts to decrement the counter below zero, it blocks until
the counter is large enough. Note that while a semaphore can be created
with a starting count of zero, any C<up()> or C<down()> always changes the
counter by at least one, and so C<< $semaphore->down(0) >> is the same as
C<< $semaphore->down(1) >>.
The question, of course, is why would you do something like this? Why
create a semaphore with a starting count that's not one, or why
decrement or increment it by more than one? The answer is resource
availability. Many resources that you want to manage access for can be
safely used by more than one thread at once.
For example, let's take a GUI driven program. It has a semaphore that
it uses to synchronize access to the display, so only one thread is
ever drawing at once. Handy, but of course you don't want any thread
to start drawing until things are properly set up. In this case, you
can create a semaphore with a counter set to zero, and up it when
things are ready for drawing.
Semaphores with counters greater than one are also useful for
establishing quotas. Say, for example, that you have a number of
threads that can do I/O at once. You don't want all the threads
reading or writing at once though, since that can potentially swamp
your I/O channels, or deplete your process's quota of filehandles. You
can use a semaphore initialized to the number of concurrent I/O
requests (or open files) that you want at any one time, and have your
threads quietly block and unblock themselves.
Larger increments or decrements are handy in those cases where a
thread needs to check out or return a number of resources at once.
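The quota idea can be sketched as follows. The limit of two concurrent
workers, the C<sleep()> standing in for real I/O, and the names
C<limited_work()>, C<$active>, and C<$peak> are all assumptions made for the
sake of the example.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;

my $max_concurrent = 2;     # Illustrative quota
my $slots = Thread::Semaphore->new($max_concurrent);

my $active :shared = 0;     # Workers currently holding a slot
my $peak   :shared = 0;     # Highest value $active has reached

sub limited_work {
    $slots->down();         # Take a slot; blocks while none are free
    {
        lock($active);      # Also guards $peak, by convention
        $active++;
        $peak = $active if $active > $peak;
    }
    sleep(1);               # Stand-in for the real I/O
    {
        lock($active);
        $active--;
    }
    $slots->up();           # Hand the slot back
}

my @workers = map { threads->create(\&limited_work) } 1 .. 4;
$_->join() for @workers;

print("peak concurrency: $peak\n");
```

However the four threads are scheduled, no more than C<$max_concurrent> of
them can be between C<down()> and C<up()> at the same moment.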
=head2 Waiting for a Condition
The functions C<cond_wait()> and C<cond_signal()>
can be used in conjunction with locks to notify
co-operating threads that a resource has become available. They are
very similar in use to the functions found in C<pthreads>. However,
for most purposes, queues are simpler to use and more intuitive. See
L<threads::shared> for more details.
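For readers who do want the condition functions, here is a minimal sketch
using C<cond_wait()> and C<cond_signal()> from L<threads::shared>. The flag
C<$ready> and the C<sleep()> standing in for resource preparation are
illustrative.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $ready :shared = 0;

my $waiter = threads->create(sub {
    lock($ready);
    # cond_wait() releases the lock while waiting and reacquires it
    # before returning; the loop re-checks the flag after each wakeup.
    cond_wait($ready) until $ready;
    return 'resource is ready';
});

sleep(1);                   # Stand-in for preparing the resource
{
    lock($ready);
    $ready = 1;
    cond_signal($ready);    # Wake one thread waiting on $ready
}

my $msg = $waiter->join();
print("$msg\n");
```

Checking the flag in a loop matters: if the signal is sent before the waiter
reaches C<cond_wait()>, the waiter sees C<$ready> already set and never
blocks.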
=head2 Giving up control
There are times when you may find it useful to have a thread
explicitly give up the CPU to another thread. You may be doing something
processor-intensive and want to make sure that the user-interface thread
gets called frequently. Regardless, there are times that you might want
a thread to give up the processor.
Perl's threading package provides the C<yield()> function that does
this. C<yield()> is pretty straightforward, and works like this:
use threads;
sub loop {
my $thread = shift;
my $foo = 50;
while($foo--) { print("In thread $thread\n"); }
threads->yield();
$foo = 50;
while($foo--) { print("In thread $thread\n"); }
}
my $thr1 = threads->create(\&loop, 'first');
my $thr2 = threads->create(\&loop, 'second');
my $thr3 = threads->create(\&loop, 'third');
It is important to remember that C<yield()> is only a hint to give up the CPU;
what actually happens depends on your hardware, OS, and threading libraries.
B<On many operating systems, yield() is a no-op.> Therefore it is important
to note that one should not build the scheduling of the threads around
C<yield()> calls. It might work on your platform but it won't work on another
platform.
=head1 General Thread Utility Routines
We've covered the workhorse parts of Perl's threading package, and
with these tools you should be well on your way to writing threaded
code and packages. There are a few useful little pieces that didn't
really fit in anyplace else.
=head2 What Thread Am I In?
The C<threads-E<gt>self()> class method provides your program with a way to
get an object representing the thread it's currently in. You can use this
object in the same way as the ones returned from thread creation.
=head2 Thread IDs
C<tid()> is a thread object method that returns the thread ID of the
thread the object represents. Thread IDs are integers, with the main
thread in a program being 0. Currently Perl assigns a unique TID to
every thread ever created in your program, assigning the first thread
to be created a TID of 1, and increasing the TID by 1 for each new
thread that's created. When used as a class method, C<threads-E<gt>tid()>
can be used by a thread to get its own TID.
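A short sketch of C<self()> and C<tid()> used together follows; the
subroutine name C<whoami()> is made up for the example.

```perl
use strict;
use warnings;
use threads;

sub whoami {
    my $self = threads->self();     # Object for the thread we are in
    return $self->tid();            # Same value as threads->tid()
}

my $main_tid = threads->tid();      # Called from the main thread

my $thr = threads->create(\&whoami);
my $tid_seen_by_parent = $thr->tid();
my $tid_seen_by_child  = $thr->join();

print("main thread TID:  $main_tid\n");
print("child TID (parent's view): $tid_seen_by_parent\n");
print("child TID (child's view):  $tid_seen_by_child\n");
```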
=head2 Are These Threads The Same?
The C<equal()> method takes two thread objects and returns true
if the objects represent the same thread, and false if they don't.
Thread objects also have an overloaded C<==> comparison so that you can do
comparison on them as you would with normal objects.
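Both forms of comparison can be tried out with a sketch like this one, which
compares the main thread's object with itself and with a newly created
thread. The variable names are illustrative.

```perl
use strict;
use warnings;
use threads;

my $main  = threads->self();
my $other = threads->create(sub { return; });

# Two separately obtained objects for the same thread compare as equal.
my $same_via_method   = $main->equal(threads->self()) ? 1 : 0;
my $same_via_operator = ($main == threads->self())    ? 1 : 0;

# Objects for different threads do not.
my $different_threads = ($main == $other) ? 1 : 0;

$other->join();

print("equal():    $same_via_method\n");
print("==:         $same_via_operator\n");
print("main==other: $different_threads\n");
```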
=head2 What Threads Are Running?
C<threads-E<gt>list()> returns a list of thread objects, one for each thread
that's currently running and not detached. Handy for a number of things,
including cleaning up at the end of your program (from the main Perl thread,
of course):
# Loop through all the threads
foreach my $thr (threads->list()) {
$thr->join();
}
If some threads have not finished running when the main Perl thread
ends, Perl will warn you about it and die, since it is impossible for Perl
to clean up after itself while other threads are running.
NOTE: The main Perl thread (thread 0) is in a I<detached> state, and so
does not appear in the list returned by C<threads-E<gt>list()>.
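The following sketch puts C<threads-E<gt>list()> to work in a slightly larger
setting. It also uses the C<threads::running> and C<threads::joinable>
arguments documented in L<threads>; how many threads fall into each group at
any instant depends on scheduling, so the example only prints those counts.

```perl
use strict;
use warnings;
use threads;

# Start a few workers that each return twice their argument.
my @workers = map { threads->create(sub { return $_[0] * 2; }, $_) } 1 .. 3;

# In scalar context, list() returns a count rather than a list.
my $unjoined = threads->list();
my $running  = threads->list(threads::running);
my $joinable = threads->list(threads::joinable);
print("$unjoined unjoined, $running running, $joinable joinable\n");

# Join everything that is still outstanding and gather the results.
my @results;
foreach my $thr (threads->list()) {
    push(@results, $thr->join());
}

my $left = threads->list();
print('results: ', join(', ', sort { $a <=> $b } @results), "\n");
print("threads left to join: $left\n");
```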
=head1 A Complete Example
Confused yet? It's time for an example program to show some of the
things we've covered. This program finds prime numbers using threads.
1 #!/usr/bin/perl
2 # prime-pthread, courtesy of Tom Christiansen
3
4 use strict;
5 use warnings;
6
7 use threads;
8 use Thread::Queue;
9
10 sub check_num {
11 my ($upstream, $cur_prime) = @_;
12 my $kid;
13 my $downstream = Thread::Queue->new();
14 while (my $num = $upstream->dequeue()) {
15 next unless ($num % $cur_prime);
16 if ($kid) {
17 $downstream->enqueue($num);
18 } else {
19 print("Found prime: $num\n");
20 $kid = threads->create(\&check_num, $downstream, $num);
21 if (! $kid) {
22 warn("Sorry. Ran out of threads.\n");
23 last;
24 }
25 }
26 }
27 if ($kid) {
28 $downstream->enqueue(undef);
29 $kid->join();
30 }
31 }
32
33 my $stream = Thread::Queue->new(3..1000, undef);
34 check_num($stream, 2);
This program uses the pipeline model to generate prime numbers. Each
thread in the pipeline has an input queue that feeds numbers to be
checked, a prime number that it's responsible for, and an output queue
into which it funnels numbers that have failed the check. If the thread
has a number that's failed its check and there's no child thread, then
the thread must have found a new prime number. In that case, a new
child thread is created for that prime and stuck on the end of the
pipeline.
This probably sounds a bit more confusing than it really is, so let's
go through this program piece by piece and see what it does. (For
those of you who might be trying to remember exactly what a prime
number is, it's a number that's only evenly divisible by itself and 1.)
The bulk of the work is done by the C<check_num()> subroutine, which
takes a reference to its input queue and a prime number that it's
responsible for. After pulling in the input queue and the prime that
the subroutine is checking (line 11), we create a new queue (line 13)
and reserve a scalar for the thread that we're likely to create later
(line 12).
The while loop from line 14 to line 26 grabs a scalar off the input
queue and checks against the prime this thread is responsible
for. Line 15 checks to see if there's a remainder when we divide the
number to be checked by our prime. If there is one, the number
must not be evenly divisible by our prime, so we need to either pass
it on to the next thread if we've created one (line 17) or create a
new thread if we haven't.
The new thread creation is line 20. We pass on to it a reference to
the queue we've created, and the prime number we've found. In lines 21
through 24, we check to make sure that our new thread got created, and
if not, we stop checking any remaining numbers in the queue.
Finally, once the loop terminates (because we got a 0 or C<undef> in the
queue, which serves as a note to terminate), we pass on the notice to our
child and wait for it to exit, if we've created a child (lines 27 through
30).
Meanwhile, back in the main thread, we first create a queue (line 33) and
queue up all the numbers from 3 to 1000 for checking, plus a termination
notice. Then all we have to do to get the ball rolling is pass the queue
and the first prime to the C<check_num()> subroutine (line 34).
That's how it works. It's pretty simple; as with many Perl programs,
the explanation is much longer than the program.
=head1 Different implementations of threads
Here is some background on thread implementations from the operating system
viewpoint. There are three basic categories of threads: user-mode threads,
kernel threads, and multiprocessor kernel threads.
User-mode threads are threads that live entirely within a program and
its libraries. In this model, the OS knows nothing about threads. As
far as it's concerned, your process is just a process.
This is the easiest way to implement threads, and the way most OSes
start. The big disadvantage is that, since the OS knows nothing about
threads, if one thread blocks they all do. Typical blocking activities
include most system calls, most I/O, and things like C<sleep()>.
Kernel threads are the next step in thread evolution. The OS knows
about kernel threads, and makes allowances for them. The main
difference between a kernel thread and a user-mode thread is
blocking. With kernel threads, things that block a single thread don't
block other threads. This is not the case with user-mode threads,
where the kernel blocks at the process level and not the thread level.
This is a big step forward, and can give a threaded program quite a
performance boost over non-threaded programs. Threads that block
performing I/O, for example, won't block threads that are doing other
things. Each process still has only one thread running at once,
though, regardless of how many CPUs a system might have.
Since kernel threads can be interrupted at any time, they will
uncover some of the implicit locking assumptions you may make in your
program. For example, something as simple as C<$x = $x + 2> can behave
unpredictably with kernel threads if C<$x> is visible to other
threads, as another thread may have changed C<$x> between the time it
was fetched on the right hand side and the time the new value is
stored.
Multiprocessor kernel threads are the final step in thread
support. With multiprocessor kernel threads on a machine with multiple
CPUs, the OS may schedule two or more threads to run simultaneously on
different CPUs.
This can give a serious performance boost to your threaded program,
since more than one thread will be executing at the same time. As a
tradeoff, though, any of those nagging synchronization issues that
might not have shown with basic kernel threads will appear with a
vengeance.
In addition to the different levels of OS involvement in threads,
different OSes (and different thread implementations for a particular
OS) allocate CPU cycles to threads in different ways.
Cooperative multitasking systems have running threads give up control
if one of two things happens. If a thread calls a yield function, it
gives up control. It also gives up control if the thread does
something that would cause it to block, such as perform I/O. In a
cooperative multitasking implementation, one thread can starve all the
others for CPU time if it so chooses.
Preemptive multitasking systems interrupt threads at regular intervals
while the system decides which thread should run next. In a preemptive
multitasking system, one thread usually won't monopolize the CPU.
On some systems, there can be cooperative and preemptive threads
running simultaneously. (Threads running with realtime priorities
often behave cooperatively, for example, while threads running at
normal priorities behave preemptively.)
Most modern operating systems support preemptive multitasking.
=head1 Performance considerations
The main thing to bear in mind when comparing Perl's I<ithreads> to other threading
models is the fact that for each new thread created, a complete copy of
all the variables and data of the parent thread has to be taken. Thus,
thread creation can be quite expensive, both in terms of memory usage and
time spent in creation. The ideal way to reduce these costs is to have a
relatively small number of long-lived threads, all created fairly early
on (before the base thread has accumulated too much data). Of course, this
may not always be possible, so compromises have to be made. However, after
a thread has been created, its performance and extra memory usage should
be little different than ordinary code.
Also note that under the current implementation, shared variables
use a little more memory and are a little slower than ordinary variables.
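One common way to get a small number of long-lived threads is a worker pool
fed by a queue. The sketch below is one possible arrangement, not the only
one: the pool size of three, the squaring "work", and the queue names are all
assumptions made for illustration.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work    = Thread::Queue->new();
my $results = Thread::Queue->new();

# Create the pool early, before the main thread has much data to copy.
my $pool_size = 3;      # Illustrative value
my @pool = map {
    threads->create(sub {
        # Each worker lives until it dequeues a termination notice.
        while (defined(my $n = $work->dequeue())) {
            $results->enqueue($n * $n);
        }
    });
} 1 .. $pool_size;

# Hand out the work, then one termination notice per worker.
$work->enqueue(1 .. 10);
$work->enqueue((undef) x $pool_size);
$_->join() for @pool;

# Drain the results without blocking.
my $sum = 0;
while (defined(my $r = $results->dequeue_nb())) {
    $sum += $r;
}
print("sum of squares: $sum\n");
```

The cost of copying the parent's data is paid three times, at startup,
instead of once per work item.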
=head1 Process-scope Changes
Note that while threads themselves are separate execution threads and
Perl data is thread-private unless explicitly shared, the threads can
affect process-scope state, affecting all the threads.
The most common example of this is changing the current working
directory using C<chdir()>. One thread calls C<chdir()>, and the working
directory of all the threads changes.
An even more drastic example of a process-scope change is C<chroot()>:
the root directory of all the threads changes, and no thread can
undo it (as opposed to C<chdir()>).
Further examples of process-scope changes include C<umask()> and
changing uids and gids.
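The C<chdir()> case is easy to observe with a sketch like the one below. It
assumes a platform where the working directory really is a process-wide
attribute, as on typical Unix systems; behavior may differ elsewhere (for
example on Windows). The root directory is used as the target only because
it is sure to exist.

```perl
use strict;
use warnings;
use threads;
use Cwd qw(getcwd);
use File::Spec;

my $before = getcwd();
my $target = File::Spec->rootdir();     # Any existing directory would do

# Only the child thread calls chdir() ...
threads->create(sub {
    chdir($target) or die("chdir failed: $!");
})->join();

# ... but the main thread looks at its own working directory afterwards.
my $after = getcwd();

print("before: $before\n");
print("after:  $after\n");
```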
Thinking of mixing C<fork()> and threads? Please lie down and wait
until the feeling passes. Be aware that the semantics of C<fork()> vary
between platforms. For example, some Unix systems copy all the current
threads into the child process, while others only copy the thread that
called C<fork()>. You have been warned!
Similarly, mixing signals and threads may be problematic.
Implementations are platform-dependent, and even the POSIX
semantics may not be what you expect (and Perl doesn't even
give you the full POSIX API). For example, there is no way to
guarantee that a signal sent to a multi-threaded Perl application
will get intercepted by any particular thread. (However, a recently
added feature does provide the capability to send signals between
threads. See L<threads/THREAD SIGNALLING> for more details.)
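As a rough sketch of the thread signalling feature mentioned above, the
following uses the C<kill()> method described in
L<threads/THREAD SIGNALLING> to ask a single thread to stop. The semaphore
C<$handler_ready> is an addition for this example; it only makes sure the
handler is installed before the signal is sent.

```perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;

my $handler_ready = Thread::Semaphore->new(0);

my $thr = threads->create(sub {
    # The handler is installed inside the thread that should react.
    $SIG{'KILL'} = sub { threads->exit(); };
    $handler_ready->up();       # Tell the main thread we are ready
    sleep(1) while 1;           # Stand-in for long-running work
});

$handler_ready->down();         # Wait until the handler is in place
$thr->kill('KILL');             # Signal that one thread, not the process
$thr->join();                   # Returns once the thread has exited

my $finished = 1;
print("thread finished\n");
```

The signal is acted on only between operations, so a thread that is in the
middle of C<sleep()> or a blocking call notices it once that call returns.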
=head1 Thread-Safety of System Libraries
Whether various library calls are thread-safe is outside the control
of Perl. Calls often suffering from not being thread-safe include:
C<localtime()>, C<gmtime()>, functions fetching user, group and
network information (such as C<getgrent()>, C<gethostent()>,
C<getnetent()> and so on), C<readdir()>, C<rand()>, and C<srand()>. In
general, these are calls that depend on some global external state.
If the system on which Perl is compiled has thread-safe variants of such
calls, they will be used. Beyond that, Perl is at the mercy of
the thread-safety or -unsafety of the calls. Please consult your
C library call documentation.
On some platforms the thread-safe library interfaces may fail if the
result buffer is too small (for example the user group databases may
be rather large, and the reentrant interfaces may have to carry around
a full snapshot of those databases). Perl will start with a small
buffer, but keep retrying and growing the result buffer
until the result fits. If this limitless growing sounds bad for
security or memory consumption reasons you can recompile Perl with
C<PERL_REENTRANT_MAXSIZE> defined to the maximum number of bytes you will
allow.
=head1 Conclusion
A complete thread tutorial could fill a book (and has, many times),
but with what we've covered in this introduction, you should be well
on your way to becoming a threaded Perl expert.
=head1 SEE ALSO
Annotated POD for L<threads>:
L<http://annocpan.org/?mode=search&field=Module&name=threads>
Latest version of L<threads> on CPAN:
L<http://search.cpan.org/search?module=threads>
Annotated POD for L<threads::shared>:
L<http://annocpan.org/?mode=search&field=Module&name=threads%3A%3Ashared>
Latest version of L<threads::shared> on CPAN:
L<http://search.cpan.org/search?module=threads%3A%3Ashared>
Perl threads mailing list:
L<http://lists.perl.org/list/ithreads.html>
=head1 Bibliography
Here's a short bibliography courtesy of Jürgen Christoffel:
=head2 Introductory Texts
Birrell, Andrew D. An Introduction to Programming with
Threads. Digital Equipment Corporation, 1989, DEC-SRC Research Report
#35 online as
L<ftp://ftp.dec.com/pub/DEC/SRC/research-reports/SRC-035.pdf>
(highly recommended)
Robbins, Kay A., and Steven Robbins. Practical Unix Programming: A
Guide to Concurrency, Communication, and
Multithreading. Prentice-Hall, 1996.
Lewis, Bill, and Daniel J. Berg. Multithreaded Programming with
Pthreads. Prentice Hall, 1997, ISBN 0-13-443698-9 (a well-written
introduction to threads).
Nelson, Greg (editor). Systems Programming with Modula-3. Prentice
Hall, 1991, ISBN 0-13-590464-1.
Nichols, Bradford, Dick Buttlar, and Jacqueline Proulx Farrell.
Pthreads Programming. O'Reilly & Associates, 1996, ISBN 156592-115-1
(covers POSIX threads).
=head2 OS-Related References
Boykin, Joseph, David Kirschen, Alan Langerman, and Susan
LoVerso. Programming under Mach. Addison-Wesley, 1994, ISBN
0-201-52739-1.
Tanenbaum, Andrew S. Distributed Operating Systems. Prentice Hall,
1995, ISBN 0-13-219908-4 (great textbook).
Silberschatz, Abraham, and Peter B. Galvin. Operating System Concepts,
4th ed. Addison-Wesley, 1995, ISBN 0-201-59292-4.
=head2 Other References
Arnold, Ken and James Gosling. The Java Programming Language, 2nd
ed. Addison-Wesley, 1998, ISBN 0-201-31006-6.
comp.programming.threads FAQ,
L<http://www.serpentine.com/~bos/threads-faq/>
Le Sergent, T. and B. Berthomieu. "Incremental MultiThreaded Garbage
Collection on Virtually Shared Memory Architectures" in Memory
Management: Proc. of the International Workshop IWMM 92, St. Malo,
France, September 1992, Yves Bekkers and Jacques Cohen, eds. Springer,
1992, ISBN 3540-55940-X (real-life thread applications).
Artur Bergman, "Where Wizards Fear To Tread", June 11, 2002,
L<http://www.perl.com/pub/a/2002/06/11/threads.html>
=head1 Acknowledgements
Thanks (in no particular order) to Chaim Frenkel, Steve Fink, Gurusamy
Sarathy, Ilya Zakharevich, Benjamin Sugars, Jürgen Christoffel, Joshua
Pritikin, and Alan Burlison, for their help in reality-checking and
polishing this article. Big thanks to Tom Christiansen for his rewrite
of the prime number generator.
=head1 AUTHOR
Dan Sugalski E<lt>dan@sidhe.orgE<gt>
Slightly modified by Arthur Bergman to fit the new thread model/module.
Reworked slightly by Jörg Walter E<lt>jwalt@cpan.orgE<gt> to be more concise
about thread-safety of Perl code.
Rearranged slightly by Elizabeth Mattijsen E<lt>liz@dijkmat.nlE<gt> to put
less emphasis on yield().
=head1 Copyrights
The original version of this article appeared in The Perl
Journal #10, and is copyright 1998 The Perl Journal. It appears courtesy
of Jon Orwant and The Perl Journal. This document may be distributed
under the same terms as Perl itself.
=cut
=head1 NAME
perlpragma - how to write a user pragma
=head1 DESCRIPTION
A pragma is a module which influences some aspect of the compile time or run
time behaviour of Perl, such as C<strict> or C<warnings>. With Perl 5.10 you
are no longer limited to the built in pragmata; you can now create user
pragmata that modify the behaviour of user functions within a lexical scope.
=head1 A basic example
For example, say you need to create a class implementing overloaded
mathematical operators, and would like to provide your own pragma that
functions much like C<use integer;>. You'd like this code
    use MyMaths;
    my $l = MyMaths->new(1.2);
    my $r = MyMaths->new(3.4);
    print "A: ", $l + $r, "\n";
    use myint;
    print "B: ", $l + $r, "\n";
    {
        no myint;
        print "C: ", $l + $r, "\n";
    }
    print "D: ", $l + $r, "\n";
    no myint;
    print "E: ", $l + $r, "\n";
to give the output
    A: 4.6
    B: 4
    C: 4.6
    D: 4
    E: 4.6
I<i.e.>, where C<use myint;> is in effect, addition operations are forced
to integer, whereas by default they are not, with the default behaviour being
restored via C<no myint;>.
The minimal implementation of the package C<MyMaths> would be something like
this:
    package MyMaths;
    use warnings;
    use strict;
    use myint();
    use overload '+' => sub {
        my ($l, $r) = @_;
        # Pass 1 to check up one call level from here
        if (myint::in_effect(1)) {
            int($$l) + int($$r);
        } else {
            $$l + $$r;
        }
    };
    sub new {
        my ($class, $value) = @_;
        bless \$value, $class;
    }
    1;
Note how we load the user pragma C<myint> with an empty list C<()> to
prevent its C<import> being called.
The interaction with the Perl compilation happens inside package C<myint>:
    package myint;
    use strict;
    use warnings;
    sub import {
        $^H{"myint/in_effect"} = 1;
    }
    sub unimport {
        $^H{"myint/in_effect"} = 0;
    }
    sub in_effect {
        my $level = shift // 0;
        my $hinthash = (caller($level))[10];
        return $hinthash->{"myint/in_effect"};
    }
    1;
As pragmata are implemented as modules, like any other module, C<use myint;>
becomes
    BEGIN {
        require myint;
        myint->import();
    }
and C<no myint;> is
    BEGIN {
        require myint;
        myint->unimport();
    }
Hence the C<import> and C<unimport> routines are called at B<compile time>
for the user's code.
User pragmata store their state by writing to the magical hash C<%^H>,
hence these two routines manipulate it. The state information in C<%^H> is
stored in the optree, and can be retrieved read-only at runtime with C<caller()>,
at index 10 of the list of returned results. In the example pragma, retrieval
is encapsulated into the routine C<in_effect()>, which takes as parameter
the number of call frames to go up to find the value of the pragma in the
user's script. This uses C<caller()> to determine the value of
C<$^H{"myint/in_effect"}> when each line of the user's script was called, and
therefore provide the correct semantics in the subroutine implementing the
overloaded addition.
=head1 Key naming
There is only a single C<%^H>, but arbitrarily many modules that want
to use its scoping semantics. To avoid stepping on each other's toes,
they need to be sure to use different keys in the hash. It is therefore
conventional for a module to use only keys that begin with the module's
name (the name of its main package) and a "/" character. After this
module-identifying prefix, the rest of the key is entirely up to the
module: it may include any characters whatsoever. For example, a module
C<Foo::Bar> should use keys such as C<Foo::Bar/baz> and C<Foo::Bar/$%/_!>.
Modules following this convention all play nicely with each other.
The Perl core uses a handful of keys in C<%^H> which do not follow this
convention, because they predate it. Keys that follow the convention
won't conflict with the core's historical keys.
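As a rough illustration of the naming convention, here is a sketch of a pragma that keeps two settings under its own prefix. C<My::Tracing> and its C<setting()> helper are hypothetical names invented for this example, and the C<BEGIN> blocks stand in for the C<use>/C<no> statements you would write if the module lived in its own file:

```perl
use strict;
use warnings;

package My::Tracing;    # hypothetical pragma; the name is for illustration only

# Every key starts with the package name followed by "/".
sub import {
    my ($class, $level) = @_;
    $^H{'My::Tracing/enabled'} = 1;
    $^H{'My::Tracing/level'}   = $level // 1;
}

sub unimport {
    $^H{'My::Tracing/enabled'} = 0;
}

# Look up one of this module's settings as seen by the direct caller.
sub setting {
    my ($name) = @_;
    my $hints = (caller(0))[10] || {};
    return $hints->{"My::Tracing/$name"};
}

package main;

# Equivalent to "use My::Tracing 3;" for a module in its own file.
BEGIN { My::Tracing->import(3) }

my $level = My::Tracing::setting('level');
my $inner;
{
    BEGIN { My::Tracing->unimport }    # like "no My::Tracing;"
    $inner = My::Tracing::setting('enabled');
}
my $outer = My::Tracing::setting('enabled');

print "level=$level inner=$inner outer=$outer\n";
```

Because both keys carry the C<My::Tracing/> prefix, another module can use its own C<enabled> or C<level> key without either one noticing.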
=head1 Implementation details
The optree is shared between threads. This means there is a possibility that
the optree will outlive the particular thread (and therefore the interpreter
instance) that created it, so true Perl scalars cannot be stored in the
optree. Instead a compact form is used, which can only store values that are
integers (signed and unsigned), strings or C<undef> - references and
floating point values are stringified. If you need to store multiple values
or complex structures, you should serialise them, for example with C<pack>.
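The following sketch shows the difference between storing a reference and storing a serialised string. The C<My::Units/> key prefix and the C<hints()> helper are assumptions made for the example, and a plain C<join>/C<split> is used here in place of C<pack> simply to keep the stored value readable:

```perl
use strict;
use warnings;

# 'My::Units' is a hypothetical pragma name used only as a key prefix here.
BEGIN {
    my @factors = (10, 100, 1000);
    $^H{'My::Units/factors'} = join ',', @factors;    # serialised by hand
    $^H{'My::Units/raw'}     = \@factors;             # a reference, for contrast
}

# Return the hint hash that was in effect where hints() was called.
sub hints { (caller(0))[10] || {} }

my $hints   = hints();
my @factors = split /,/, $hints->{'My::Units/factors'};
my $raw     = $hints->{'My::Units/raw'};

print "factors: @factors\n";
print "raw is ", (ref $raw ? "a reference" : "the plain string '$raw'"), "\n";
```

The serialised list can be rebuilt at runtime, whereas the value stored as a reference comes back only as its stringified form and can no longer be dereferenced.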
The deletion of a hash key from C<%^H> is recorded, and as ever can be
distinguished from the existence of a key with value C<undef> with
C<exists>.
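A small sketch of that distinction, using made-up C<Demo/> keys and a hypothetical C<hints()> helper that returns the caller's hint hash:

```perl
use strict;
use warnings;

# Return the hint hash that was in effect where hints() was called.
sub hints { (caller(0))[10] || {} }

# The 'Demo/...' keys are hypothetical and exist only for this illustration.
BEGIN {
    $^H{'Demo/unset'}   = undef;
    $^H{'Demo/removed'} = 1;
}

my ($outer, $inner);
$outer = hints();
{
    BEGIN { delete $^H{'Demo/removed'} }    # deletion applies to this block only
    $inner = hints();
}

printf "outer: unset exists=%d defined=%d, removed exists=%d\n",
    exists $outer->{'Demo/unset'}   ? 1 : 0,
    defined $outer->{'Demo/unset'}  ? 1 : 0,
    exists $outer->{'Demo/removed'} ? 1 : 0;
printf "inner: unset exists=%d, removed exists=%d\n",
    exists $inner->{'Demo/unset'}   ? 1 : 0,
    exists $inner->{'Demo/removed'} ? 1 : 0;
```

A pragma can therefore use three states for a key: absent, present but undefined, and present with a value.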
B<Don't> attempt to store references to data structures as integers which
are retrieved via C<caller> and converted back, as this will not be threadsafe.
Accesses would be to the structure without locking (which is not safe for
Perl's scalars), and either the structure has to leak, or it has to be
freed when its creating thread terminates, which may be before the optree
referencing it is deleted, if other threads outlive it.
=encoding utf8
=head1 NAME
perl5224delta - what is new for perl v5.22.4
=head1 DESCRIPTION
This document describes differences between the 5.22.3 release and the 5.22.4
release.
If you are upgrading from an earlier release such as 5.22.2, first read
L<perl5223delta>, which describes differences between 5.22.2 and 5.22.3.
=head1 Security
=head2 Improved handling of '.' in @INC in base.pm
The handling of (the removal of) C<'.'> in C<@INC> in L<base> has been
improved. This resolves some problematic behaviour in the approach taken in
Perl 5.22.3, which is probably best described in the following two threads on
the Perl 5 Porters mailing list:
L<http://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg238991.html>,
L<http://www.nntp.perl.org/group/perl.perl5.porters/2016/10/msg240297.html>.
=head2 "Escaped" colons and relative paths in PATH
On Unix systems, Perl treats any relative paths in the PATH environment
variable as tainted when starting a new process. Previously, it was allowing a
backslash to escape a colon (unlike the OS), consequently allowing relative
paths to be considered safe if the PATH was set to something like C</\:.>. The
check has been fixed to treat C<.> as tainted in that example.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<base> has been upgraded from version 2.22 to 2.22_01.
=item *
L<Module::CoreList> has been upgraded from version 5.20170114_22 to 5.20170715_22.
=back
=head1 Selected Bug Fixes
=over 4
=item *
Fixed a crash with C<s///l> where it thought it was dealing with UTF-8 when it
wasn't.
L<[perl #129038]|https://rt.perl.org/Ticket/Display.html?id=129038>
=back
=head1 Acknowledgements
Perl 5.22.4 represents approximately 6 months of development since Perl 5.22.3
and contains approximately 2,200 lines of changes across 52 files from 16
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 970 lines of changes to 18 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.22.4:
Aaron Crane, Abigail, Aristotle Pagaltzis, Chris 'BinGOs' Williams, David
Mitchell, Eric Herman, Father Chrysostomos, James E Keenan, Karl Williamson,
Lukas Mai, Renee Baecker, Ricardo Signes, Sawyer X, Stevan Little, Steve Hay,
Tony Cook.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
https://rt.perl.org/ . There may also be information at
http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send it
to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be
able to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
=head1 NAME
perl5142delta - what is new for perl v5.14.2
=head1 DESCRIPTION
This document describes differences between the 5.14.1 release and
the 5.14.2 release.
If you are upgrading from an earlier release such as 5.14.0, first read
L<perl5141delta>, which describes differences between 5.14.0 and
5.14.1.
=head1 Core Enhancements
No changes since 5.14.0.
=head1 Security
=head2 C<File::Glob::bsd_glob()> memory error with GLOB_ALTDIRFUNC (CVE-2011-2728).
Calling C<File::Glob::bsd_glob> with the unsupported flag GLOB_ALTDIRFUNC would
cause an access violation / segfault. A Perl program that accepts a flags value from
an external source could expose itself to denial of service or arbitrary code
execution attacks. There are no known exploits in the wild. The problem has been
corrected by explicitly disabling all unsupported flags and setting unused function
pointers to null. Bug reported by Clément Lecigne.
=head2 C<Encode> decode_xs n-byte heap-overflow (CVE-2011-2939)
A bug in C<Encode> could, on certain inputs, cause the heap to overflow.
This problem has been corrected. Bug reported by Robert Zacek.
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.14.0. If any
exist, they are bugs and reports are welcome.
=head1 Deprecations
There have been no deprecations since 5.14.0.
=head1 Modules and Pragmata
=head2 New Modules and Pragmata
None
=head2 Updated Modules and Pragmata
=over 4
=item *
L<CPAN> has been upgraded from version 1.9600 to version 1.9600_01.
L<CPAN::Distribution> has been upgraded from version 1.9602 to 1.9602_01.
Backported bugfixes from CPAN version 1.9800. Ensures proper
detection of C<configure_requires> prerequisites from CPAN Meta files
in the case where C<dynamic_config> is true. [rt.cpan.org #68835]
Also ensures that C<configure_requires> is only checked in META files,
not MYMETA files, to protect against MYMETA generation that drops
C<configure_requires>.
=item *
L<Encode> has been upgraded from version 2.42 to 2.42_01.
See L</Security>.
=item *
L<File::Glob> has been upgraded from version 1.12 to version 1.13.
See L</Security>.
=item *
L<PerlIO::scalar> has been upgraded from version 0.11 to 0.11_01.
It fixes a problem with C<< open my $fh, ">", \$scalar >> not working if
C<$scalar> is a copy-on-write scalar.
=back
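For readers unfamiliar with the construct mentioned in the L<PerlIO::scalar> entry above, this is a short sketch of opening a scalar as an output file. Taking the initial value from a hash key is just one assumed way of obtaining a scalar that may be copy-on-write internally; the variable names are illustrative:

```perl
use strict;
use warnings;

# A copy of a hash key is one way to end up with a copy-on-write scalar.
my %seed = (placeholder => 1);
my ($buffer) = keys %seed;

# Open the scalar itself as an output file; ">" truncates it first.
open my $fh, '>', \$buffer or die "cannot open in-memory file: $!";
print {$fh} "line one\n";
print {$fh} "line two\n";
close $fh or die "close failed: $!";

print $buffer;
```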
=head2 Removed Modules and Pragmata
None
=head1 Platform Support
=head2 New Platforms
None
=head2 Discontinued Platforms
None
=head2 Platform-Specific Notes
=over 4
=item HP-UX PA-RISC/64 now supports gcc-4.x
A fix to correct the socketsize now makes the test suite pass on HP-UX
PA-RISC for 64bitall builds.
=item Building on OS X 10.7 Lion and Xcode 4 works again
The build system has been updated to work with the build tools under Mac OS X
10.7.
=back
=head1 Bug Fixes
=over 4
=item *
In @INC filters (subroutines returned by subroutines in @INC), $_ used to
misbehave: If returned from a subroutine, it would not be copied, but the
variable itself would be returned; and freeing $_ (e.g., with C<undef *_>)
would cause perl to crash. This has been fixed [perl #91880].
=item *
Perl 5.10.0 introduced some faulty logic that made "U*" in the middle of
a pack template equivalent to "U0" if the input string was empty. This has
been fixed [perl #90160].
=item *
C<caller> no longer leaks memory when called from the DB package if
C<@DB::args> was assigned to after the first call to C<caller>. L<Carp>
was triggering this bug [perl #97010].
=item *
C<utf8::decode> had a nasty bug that would modify copy-on-write scalars'
string buffers in place (i.e., skipping the copy). This could result in
hashes having two elements with the same key [perl #91834].
=item *
Localising a tied variable used to make it read-only if it contained a
copy-on-write string.
=item *
Elements of restricted hashes (see the L<fields> pragma) containing
copy-on-write values couldn't be deleted, nor could such hashes be cleared
(C<%hash = ()>).
=item *
Locking a hash element that is a glob copy no longer causes subsequent
assignment to it to corrupt the glob.
=item *
A panic involving the combination of the regular expression modifiers
C</aa> introduced in 5.14.0 and the C<\b> escape sequence has been
fixed [perl #95964].
=back
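The restricted-hash fix listed above concerns operations like the ones in this sketch. It uses C<lock_keys> from L<Hash::Util>; the hash contents are invented for the example, and copying a hash key is only an assumed way of producing a value that may be copy-on-write:

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my %config;
lock_keys(%config, qw(name mode));    # only these two keys are legal

# A value copied from a hash key, which may be copy-on-write internally.
my %source = (readonly => 1);
($config{mode}) = keys %source;
$config{name} = 'example';

delete $config{mode};                 # allowed: the key itself stays legal
my $after_delete = exists $config{mode} ? 1 : 0;

%config = ();                         # clearing a restricted hash is allowed too
my $after_clear = scalar keys %config;

# Storing under a key outside the permitted set still fails.
my $rejected = eval { $config{other} = 1; 1 } ? 0 : 1;

print "after_delete=$after_delete after_clear=$after_clear rejected=$rejected\n";
```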
=head1 Known Problems
This is a list of some significant unfixed bugs, which are regressions
from 5.12.0.
=over 4
=item *
C<PERL_GLOBAL_STRUCT> is broken.
Since perl 5.14.0, building with C<-DPERL_GLOBAL_STRUCT> hasn't been
possible. This means that perl currently doesn't work on any platforms that
require it to be built this way, including Symbian.
While C<PERL_GLOBAL_STRUCT> now works again on recent development versions of
perl, it actually working on Symbian again hasn't been verified.
We'd be very interested in hearing from anyone working with Perl on Symbian.
=back
=head1 Acknowledgements
Perl 5.14.2 represents approximately three months of development since
Perl 5.14.1 and contains approximately 1200 lines of changes
across 61 files from 9 authors.
Perl continues to flourish into its third decade thanks to a vibrant
community of users and developers. The following people are known to
have contributed the improvements that became Perl 5.14.2:
Craig A. Berry, David Golden, Father Chrysostomos, Florian Ragwitz, H.Merijn
Brand, Karl Williamson, Nicholas Clark, Pau Amma and Ricardo Signes.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://rt.perl.org/perlbug/ . There may also be
information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send
it to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be able
to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently
distributed on CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details
on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perl587delta - what is new for perl v5.8.7
=head1 DESCRIPTION
This document describes differences between the 5.8.6 release and
the 5.8.7 release.
=head1 Incompatible Changes
There are no changes incompatible with 5.8.6.
=head1 Core Enhancements
=head2 Unicode Character Database 4.1.0
The copy of the Unicode Character Database included in Perl 5.8 has
been updated to 4.1.0 from 4.0.1. See
L<http://www.unicode.org/versions/Unicode4.1.0/#NotableChanges> for the
notable changes.
=head2 suidperl less insecure
A pair of exploits in C<suidperl> involving debugging code have been closed.
For new projects the core perl team strongly recommends that you use
dedicated, single purpose security tools such as C<sudo> in preference to
C<suidperl>.
=head2 Optional site customization script
The perl interpreter can be built to allow the use of a site customization
script. By default this is not enabled, to be consistent with previous perl
releases. To use this, add C<-Dusesitecustomize> to the command line flags
when running the C<Configure> script. See also L<perlrun/-f>.
=head2 C<Config.pm> is now much smaller.
C<Config.pm> is now about 3K rather than 32K, with the infrequently used
code and C<%Config> values loaded on demand. This is transparent to the
programmer, but means that most code will save parsing and loading 29K of
script (for example, code that uses C<File::Find>).
=head1 Modules and Pragmata
=over 4
=item *
B upgraded to version 1.09
=item *
base upgraded to version 2.07
=item *
bignum upgraded to version 0.17
=item *
bytes upgraded to version 1.02
=item *
Carp upgraded to version 1.04
=item *
CGI upgraded to version 3.10
=item *
Class::ISA upgraded to version 0.33
=item *
Data::Dumper upgraded to version 2.121_02
=item *
DB_File upgraded to version 1.811
=item *
Devel::PPPort upgraded to version 3.06
=item *
Digest upgraded to version 1.10
=item *
Encode upgraded to version 2.10
=item *
FileCache upgraded to version 1.05
=item *
File::Path upgraded to version 1.07
=item *
File::Temp upgraded to version 0.16
=item *
IO::File upgraded to version 1.11
=item *
IO::Socket upgraded to version 1.28
=item *
Math::BigInt upgraded to version 1.77
=item *
Math::BigRat upgraded to version 0.15
=item *
overload upgraded to version 1.03
=item *
PathTools upgraded to version 3.05
=item *
Pod::HTML upgraded to version 1.0503
=item *
Pod::Perldoc upgraded to version 3.14
=item *
Pod::LaTeX upgraded to version 0.58
=item *
Pod::Parser upgraded to version 1.30
=item *
Symbol upgraded to version 1.06
=item *
Term::ANSIColor upgraded to version 1.09
=item *
Test::Harness upgraded to version 2.48
=item *
Test::Simple upgraded to version 0.54
=item *
Text::Wrap upgraded to version 2001.09293, to fix a bug when wrap() was
called with a non-space separator.
=item *
threads::shared upgraded to version 0.93
=item *
Time::HiRes upgraded to version 1.66
=item *
Time::Local upgraded to version 1.11
=item *
Unicode::Normalize upgraded to version 0.32
=item *
utf8 upgraded to version 1.05
=item *
Win32 upgraded to version 0.24, which provides Win32::GetFileVersion
=back
=head1 Utility Changes
=head2 find2perl enhancements
C<find2perl> has new options C<-iname>, C<-path> and C<-ipath>.
=head1 Performance Enhancements
The internal pointer mapping hash used during ithreads cloning now uses an
arena for memory allocation. In tests this reduced ithreads cloning time by
about 10%.
=head1 Installation and Configuration Improvements
=over 4
=item *
The Win32 "dmake" makefile.mk has been updated to make it compatible
with the latest versions of dmake.
=item *
C<PERL_MALLOC>, C<DEBUG_MSTATS>, C<PERL_HASH_SEED_EXPLICIT> and C<NO_HASH_SEED>
should now work in Win32 makefiles.
=back
=head1 Selected Bug Fixes
=over 4
=item *
The socket() function on Win32 has been fixed so that it is able to use
transport providers which specify a protocol of 0 (meaning any protocol
is allowed) once more. (This was broken in 5.8.6, and typically caused
the use of ICMP sockets to fail.)
=item *
Another obscure bug involving C<substr> and UTF-8 caused by bad internal
offset caching has been identified and fixed.
=item *
A bug involving the loading of UTF-8 tables by the regexp engine has been
fixed - code such as C<"\x{100}" =~ /[[:print:]]/> will no longer give
corrupt results.
=item *
Case conversion operations such as C<uc> on a long Unicode string could
exhaust memory. This has been fixed.
=item *
C<index>/C<rindex> were buggy for some combinations of Unicode and
non-Unicode data. This has been fixed.
=item *
C<read> (and presumably C<sysread>) would expose the UTF-8 internals when
reading from a byte oriented file handle into a UTF-8 scalar. This has
been fixed.
=item *
Several C<pack>/C<unpack> bug fixes:
=over 4
=item *
Checksums with C<b> or C<B> formats were broken.
=item *
C<unpack> checksums could overflow with the C<C> format.
=item *
C<U0> and C<C0> are now scoped to C<()> C<pack> sub-templates.
=item *
Counted length prefixes now don't change C<C0>/C<U0> mode.
=item *
C<pack> C<Z0> used to destroy the preceding character.
=item *
C<P>/C<p> C<pack> formats used to only recognise literal C<undef>.
=back
=item *
Using closures with ithreads could cause perl to crash. This was due to
failure to correctly lock internal OP structures, and has been fixed.
=item *
The return value of C<close> now correctly reflects any file errors that
occur while flushing the handle's data, instead of just giving failure if
the actual underlying file close operation failed.
=item *
C<not() || 1> used to segfault. C<not()> now behaves like C<not(0)>, which was
the pre 5.6.0 behaviour.
=item *
C<h2ph> has various enhancements to cope with constructs in header files that
used to result in incorrect or invalid output.
=back
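Since C<close> now reports errors that occur while flushing, it is worth checking its return value when writing files. A minimal sketch follows; C<write_lines> is a hypothetical helper and the temporary file is only there to make the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a list of lines and report failure from any stage, including close().
sub write_lines {
    my ($path, @lines) = @_;
    open my $out, '>', $path or die "open $path: $!";
    print {$out} @lines      or die "print $path: $!";
    # Buffered data is flushed here, so a write error may only show up now.
    close $out               or die "close $path: $!";
    return scalar @lines;
}

my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close $tmp_fh or die "close $tmp_path: $!";

my $written = write_lines($tmp_path, "alpha\n", "beta\n");
print "wrote $written lines to $tmp_path\n";
```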
=head1 New or Changed Diagnostics
There is a new taint error, "%ENV is aliased to %s". This error is thrown
when taint checks are enabled and when C<*ENV> has been aliased, so that
C<%ENV> has no env-magic anymore and hence the environment cannot be verified
as taint-free.
The internals of C<pack> and C<unpack> have been updated. All legitimate
templates should work as before, but there may be some changes in the error
reported for complex failure cases. Any behaviour changes for non-error cases
are bugs, and should be reported.
=head1 Changed Internals
There has been a fair amount of refactoring of the C<C> source code, partly to
make it tidier and more maintainable. The resulting object code and the
C<perl> binary may well be smaller than 5.8.6, and hopefully faster in some
cases, but apart from this there should be no user-detectable changes.
C<${^UTF8LOCALE}> has been added to give perl space access to C<PL_utf8locale>.
The size of the arenas used to allocate SV heads and most SV bodies can now
be changed at compile time. The old size was 1008 bytes, the new default size
is 4080 bytes.
=head1 Known Problems
Unicode strings returned from overloaded operators can be buggy. This is a
long standing bug reported since 5.8.6 was released, but we do not yet have
a suitable fix for it.
=head1 Platform Specific Problems
On UNICOS, lib/Math/BigInt/t/bigintc.t hangs burning CPU.
ext/B/t/bytecode.t and ext/Socket/t/socketpair.t both fail tests.
These are unlikely to be resolved, as our valiant UNICOS porter's last
Cray is being decommissioned.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://bugs.perl.org. There may also be
information at http://www.perl.org, the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team. You can browse and search
the Perl 5 bugs at http://bugs.perl.org/
=head1 SEE ALSO
The F<Changes> file for exhaustive details on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
=head1 NAME
perl5124delta - what is new for perl v5.12.4
=head1 DESCRIPTION
This document describes differences between the 5.12.3 release and
the 5.12.4 release.
If you are upgrading from an earlier release such as 5.12.2, first read
L<perl5123delta>, which describes differences between 5.12.2
and 5.12.3. The major changes made in 5.12.0 are described in L<perl5120delta>.
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.12.3. If any
exist, they are bugs and reports are welcome.
=head1 Selected Bug Fixes
When strict "refs" mode is off, C<%{...}> in rvalue context returns
C<undef> if its argument is undefined. An optimisation introduced in Perl
5.12.0 to make C<keys %{...}> faster when used as a boolean did not take
this into account, causing C<keys %{+undef}> (and C<keys %$foo> when
C<$foo> is undefined) to be an error, which it should be in strict
mode only [perl #81750].
C<lc>, C<uc>, C<lcfirst>, and C<ucfirst> no longer return untainted strings
when the argument is tainted. This has been broken since perl 5.8.9
[perl #87336].
Fixed a case where it was possible that a freed buffer may have been read
from when parsing a here document.
=head1 Modules and Pragmata
L<Module::CoreList> has been upgraded from version 2.43 to 2.50.
=head1 Testing
The F<cpan/CGI/t/http.t> test script has been fixed to work when the
environment has HTTPS_* environment variables, such as HTTPS_PROXY.
=head1 Documentation
Updated the documentation for rand() in L<perlfunc> to note that it is not
cryptographically secure.
=head1 Platform Specific Notes
=over 4
=item Linux
Support Ubuntu 11.04's new multi-arch library layout.
=back
=head1 Acknowledgements
Perl 5.12.4 represents approximately 5 months of development since
Perl 5.12.3 and contains approximately 200 lines of changes across
11 files from 8 authors.
Perl continues to flourish into its third decade thanks to a vibrant
community of users and developers. The following people are known to
have contributed the improvements that became Perl 5.12.4:
Andy Dougherty, David Golden, David Leadbeater, Father Chrysostomos,
Florian Ragwitz, Jesse Vincent, Leon Brocard, Zsbán Ambrus.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://rt.perl.org/perlbug/ . There may also be
information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send
it to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be able
to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently
distributed on CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details
on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
X<format> X<report> X<chart>
perlform - Perl formats
=head1 DESCRIPTION
Perl has a mechanism to help you generate simple reports and charts. To
facilitate this, Perl helps you code up your output page close to how it
will look when it's printed. It can keep track of things like how many
lines are on a page, what page you're on, when to print page headers,
etc. Keywords are borrowed from FORTRAN: format() to declare and write()
to execute; see their entries in L<perlfunc>. Fortunately, the layout is
much more legible, more like BASIC's PRINT USING statement. Think of it
as a poor man's nroff(1).
X<nroff>
Formats, like packages and subroutines, are declared rather than
executed, so they may occur at any point in your program. (Usually it's
best to keep them all together though.) They have their own namespace
apart from all the other "types" in Perl. This means that if you have a
function named "Foo", it is not the same thing as having a format named
"Foo". However, the default name for the format associated with a given
filehandle is the same as the name of the filehandle. Thus, the default
format for STDOUT is named "STDOUT", and the default format for filehandle
TEMP is named "TEMP". They just look the same. They aren't.
Output record formats are declared as follows:
format NAME =
FORMLIST
.
If the name is omitted, format "STDOUT" is defined. A single "." in
column 1 is used to terminate a format. FORMLIST consists of a sequence
of lines, each of which may be one of three types:
=over 4
=item 1.
A comment, indicated by putting a '#' in the first column.
=item 2.
A "picture" line giving the format for one output line.
=item 3.
An argument line supplying values to plug into the previous picture line.
=back
Picture lines contain output field definitions, intermingled with
literal text. These lines do not undergo any kind of variable interpolation.
Field definitions are made up from a set of characters, for starting and
extending a field to its desired width. This is the complete set of
characters for field definitions:
X<format, picture line>
X<@> X<^> X<< < >> X<< | >> X<< > >> X<#> X<0> X<.> X<...>
X<@*> X<^*> X<~> X<~~>
@ start of regular field
^ start of special field
< pad character for left justification
| pad character for centering
> pad character for right justification
# pad character for a right-justified numeric field
0 instead of first #: pad number with leading zeroes
. decimal point within a numeric field
... terminate a text field, show "..." as truncation evidence
@* variable width field for a multi-line value
^* variable width field for next line of a multi-line value
~ suppress line with all fields empty
~~ repeat line until all fields are exhausted
Each field in a picture line starts with either "@" (at) or "^" (caret),
indicating what we'll call, respectively, a "regular" or "special" field.
The choice of pad characters determines whether a field is textual or
numeric. The tilde operators are not part of a field. Let's look at
the various possibilities in detail.
=head2 Text Fields
X<format, text field>
The length of the field is supplied by padding out the field with multiple
"E<lt>", "E<gt>", or "|" characters to specify a non-numeric field with,
respectively, left justification, right justification, or centering.
For a regular field, the value (up to the first newline) is taken and
printed according to the selected justification, truncating excess characters.
If you terminate a text field with "...", three dots will be shown if
the value is truncated. A special text field may be used to do rudimentary
multi-line text block filling; see L</Using Fill Mode> for details.
Example:
format STDOUT =
@<<<<<< @|||||| @>>>>>>
"left", "middle", "right"
.
Output:
left middle right
=head2 Numeric Fields
X<#> X<format, numeric field>
Using "#" as a padding character specifies a numeric field, with
right justification. An optional "." defines the position of the
decimal point. With a "0" (zero) instead of the first "#", the
formatted number will be padded with leading zeroes if necessary.
A special numeric field is blanked out if the value is undefined.
If the resulting value would exceed the width specified the field is
filled with "#" as overflow evidence.
Example:
format STDOUT =
@### @.### @##.### @### @### ^####
42, 3.1415, undef, 0, 10000, undef
.
Output:
42 3.142 0.000 0 ####
=head2 The Field @* for Variable-Width Multi-Line Text
X<@*>
The field "@*" can be used for printing multi-line, nontruncated
values; it should (but need not) appear by itself on a line. A final
line feed is chomped off, but all other characters are emitted verbatim.
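As a minimal sketch of the behavior just described, the following uses
formline() and C<$^A> rather than a declared format, so that the result can
be captured in a string. The helper name C<render> and the sample text are
assumptions made for illustration only.

```perl
use strict;
use warnings;

# Render a picture with formline() and return the accumulated text.
# "render" is a hypothetical helper, not part of Perl itself.
sub render {
    my ($picture, @values) = @_;
    $^A = "";                      # start from an empty accumulator
    formline($picture, @values);
    my $out = $^A;
    $^A = "";                      # leave the accumulator clean
    return $out;
}

my $note = "first line\nsecond line\n";

# The "@*" field sits by itself on the second picture line.
my $out = render("Note:\n\@*\n", $note);
print $out;
```

The value keeps its embedded newline, while its final newline is dropped and
replaced by the one that ends the picture line.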
=head2 The Field ^* for Variable-Width One-line-at-a-time Text
X<^*>
Like "@*", this is a variable-width field. The value supplied must be a
scalar variable. Perl puts the first line (up to the first "\n") of the
text into the field, and then chops off the front of the string so that
the next time the variable is referenced, more of the text can be printed.
The variable will I<not> be restored.
Example:
$text = "line 1\nline 2\nline 3";
format STDOUT =
Text: ^*
$text
~~ ^*
$text
.
Output:
Text: line 1
line 2
line 3
=head2 Specifying Values
X<format, specifying values>
The values are specified on the following format line in the same order as
the picture fields. The expressions providing the values must be
separated by commas. They are all evaluated in a list context
before the line is processed, so a single list expression could produce
multiple list elements. The expressions may be spread out to more than
one line if enclosed in braces. If so, the opening brace must be the first
token on the first line. If an expression evaluates to a number with a
decimal part, and if the corresponding picture specifies that the decimal
part should appear in the output (that is, any picture except multiple "#"
characters B<without> an embedded "."), the character used for the decimal
point is determined by the current LC_NUMERIC locale if C<use locale> is in
effect. This means that, if, for example, the run-time environment happens
to specify a German locale, "," will be used instead of the default ".". See
L<perllocale> and L</"WARNINGS"> for more information.
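The brace form of the argument line might look like the following sketch.
The format name C<ORDER>, the variables, and the C<order_line> helper are
hypothetical; the in-memory filehandle is used only so that the output can be
returned as a string.

```perl
use strict;
use warnings;

our ($name, $qty, $price);

# The opening brace is the first token on the first argument line,
# so the values may continue over several lines.
format ORDER =
@<<<<<<<<< @>> @##.##
{
    $name,
    $qty,
    $price
}
.

# Write one ORDER record into a string via an in-memory filehandle.
sub order_line {
    ($name, $qty, $price) = @_;
    my $buf = "";
    open(my $fh, '>', \$buf) or die "cannot open in-memory handle: $!";
    my $ofh = select($fh);
    $~ = "ORDER";                  # use the ORDER format for this handle
    write;
    select($ofh);
    close($fh);
    return $buf;
}

print order_line("widget", 3, 4.5);
```

The three expressions are evaluated in list context and assigned to the
picture fields from left to right, exactly as if they had been written on a
single line.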
=head2 Using Fill Mode
X<format, fill mode>
On text fields the caret enables a kind of fill mode. Instead of an
arbitrary expression, the value supplied must be a scalar variable
that contains a text string. Perl puts the next portion of the text into
the field, and then chops off the front of the string so that the next time
the variable is referenced, more of the text can be printed. (Yes, this
means that the variable itself is altered during execution of the write()
call, and is not restored.) The next portion of text is determined by
a crude line-breaking algorithm. You may use the carriage return character
(C<\r>) to force a line break. You can change which characters are legal
to break on by changing the variable C<$:> (that's
$FORMAT_LINE_BREAK_CHARACTERS if you're using the English module) to a
list of the desired characters.
Normally you would use a sequence of fields in a vertical stack associated
with the same scalar variable to print out a block of text. You might wish
to end the final field with the text "...", which will appear in the output
if the text was too long to appear in its entirety.
=head2 Suppressing Lines Where All Fields Are Void
X<format, suppressing lines>
Using caret fields can produce lines where all fields are blank. You can
suppress such lines by putting a "~" (tilde) character anywhere in the
line. The tilde will be translated to a space upon output.
=head2 Repeating Format Lines
X<format, repeating lines>
If you put two contiguous tilde characters "~~" anywhere into a line,
the line will be repeated until all the fields on the line are exhausted,
i.e. undefined. For special (caret) text fields this will occur sooner or
later, but if you use a text field of the at variety, the expression you
supply had better not give the same value every time forever! (C<shift(@f)>
is a simple example that would work.) Don't use a regular (at) numeric
field in such lines, because it will never go blank.
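A small sketch of fill mode combined with "~~", assuming a hypothetical
helper named C<fill>. It builds a one-line picture containing a single caret
field and lets formline() repeat that line until the text is used up.

```perl
use strict;
use warnings;

# Break $text into lines of at most $width characters using a caret
# field and "~~". "fill" is a hypothetical helper name.
sub fill {
    my ($text, $width) = @_;
    my $picture = '^' . '<' x ($width - 1) . " ~~\n";
    $^A = "";
    formline($picture, $text);     # chops the front off $text on each pass
    my $out = $^A;
    $^A = "";
    return $out;
}

print fill("the quick brown fox jumps over the lazy dog", 10);
```

Because the caret field alters its argument, C<fill> works on its own copy of
the text; the caller's string is left untouched.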
=head2 Top of Form Processing
X<format, top of form> X<top> X<header>
Top-of-form processing is by default handled by a format with the
same name as the current filehandle with "_TOP" concatenated to it.
It's triggered at the top of each page. See L<perlfunc/write>.
Examples:
# a report on the /etc/passwd file
format STDOUT_TOP =
Passwd File
Name Login Office Uid Gid Home
------------------------------------------------------------------
.
format STDOUT =
@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
$name, $login, $office,$uid,$gid, $home
.
# a report from a bug report form
format STDOUT_TOP =
Bug Reports
@<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
$system, $%, $date
------------------------------------------------------------------
.
format STDOUT =
Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$subject
Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$index, $description
Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$priority, $date, $description
From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$from, $description
Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$programmer, $description
~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$description
~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$description
~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$description
~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$description
~ ^<<<<<<<<<<<<<<<<<<<<<<<...
$description
.
It is possible to intermix print()s with write()s on the same output
channel, but you'll have to handle C<$-> (C<$FORMAT_LINES_LEFT>)
yourself.
=head2 Format Variables
X<format variables>
X<format, variables>
The current format name is stored in the variable C<$~> (C<$FORMAT_NAME>),
and the current top of form format name is in C<$^> (C<$FORMAT_TOP_NAME>).
The current output page number is stored in C<$%> (C<$FORMAT_PAGE_NUMBER>),
and the number of lines on the page is in C<$=> (C<$FORMAT_LINES_PER_PAGE>).
Whether to autoflush output on this handle is stored in C<$|>
(C<$OUTPUT_AUTOFLUSH>). The string output before each top of page (except
the first) is stored in C<$^L> (C<$FORMAT_FORMFEED>). These variables are
set on a per-filehandle basis, so you'll need to select() into a different
one to affect them:
select((select(OUTF),
$~ = "My_Other_Format",
$^ = "My_Top_Format"
)[0]);
Pretty ugly, eh? It's a common idiom though, so don't be too surprised
when you see it. You can at least use a temporary variable to hold
the previous filehandle. This is a much better approach in general,
because not only does legibility improve, but you also get an intermediate
stage in the expression that you can single-step through in the debugger:
$ofh = select(OUTF);
$~ = "My_Other_Format";
$^ = "My_Top_Format";
select($ofh);
If you use the English module, you can even read the variable names:
use English;
$ofh = select(OUTF);
$FORMAT_NAME = "My_Other_Format";
$FORMAT_TOP_NAME = "My_Top_Format";
select($ofh);
But you still have those funny select()s. So just use the FileHandle
module. Now, you can access these special variables using lowercase
method names instead:
use FileHandle;
format_name OUTF "My_Other_Format";
format_top_name OUTF "My_Top_Format";
Much better!
=head1 NOTES
Because the values line may contain arbitrary expressions (for at fields,
not caret fields), you can farm out more sophisticated processing
to other functions, like sprintf() or one of your own. For example:
format Ident =
@<<<<<<<<<<<<<<<
&commify($n)
.
To get a real at or caret into the field, do this:
format Ident =
I have an @ here.
"@"
.
To center a whole line of text, do something like this:
format Ident =
@|||||||||||||||||||||||||||||||||||||||||||||||
"Some text line"
.
There is no builtin way to say "float this to the right hand side
of the page, however wide it is." You have to specify where it goes.
The truly desperate can generate their own format on the fly, based
on the current number of columns, and then eval() it:
$format = "format STDOUT = \n"
. '^' . '<' x $cols . "\n"
. '$entry' . "\n"
. "\t^" . "<" x ($cols-8) . "~~\n"
. '$entry' . "\n"
. ".\n";
print $format if $Debugging;
eval $format;
die $@ if $@;
Which would generate a format looking something like this:
format STDOUT =
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$entry
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
$entry
.
Here's a little program that's somewhat like fmt(1):
format =
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~~
$_
.
$/ = '';
while (<>) {
s/\s*\n\s*/ /g;
write;
}
=head2 Footers
X<format, footer> X<footer>
While $FORMAT_TOP_NAME contains the name of the current header format,
there is no corresponding mechanism to automatically do the same thing
for a footer. Not knowing how big a format is going to be until you
evaluate it is one of the major problems. It's on the TODO list.
Here's one strategy: If you have a fixed-size footer, you can get footers
by checking $FORMAT_LINES_LEFT before each write() and print the footer
yourself if necessary.
Here's another strategy: Open a pipe to yourself, using C<open(MYSELF, "|-")>
(see L<perlfunc/open>) and always write() to MYSELF instead of STDOUT.
Have your child process massage its STDIN to rearrange headers and footers
however you like. Not very convenient, but doable.
=head2 Accessing Formatting Internals
X<format, internals>
For low-level access to the formatting mechanism, you may use formline()
and access C<$^A> (the $ACCUMULATOR variable) directly.
For example:
$str = formline <<'END', 1,2,3;
@<<< @||| @>>>
END
print "Wow, I just stored '$^A' in the accumulator!\n";
Or to make an swrite() subroutine, which is to write() what sprintf()
is to printf(), do this:
use Carp;
sub swrite {
croak "usage: swrite PICTURE ARGS" unless @_;
my $format = shift;
$^A = "";
formline($format,@_);
return $^A;
}
$string = swrite(<<'END', 1, 2, 3);
Check me out
@<<< @||| @>>>
END
print $string;
=head1 WARNINGS
The lone dot that ends a format can also prematurely end a mail
message passing through a misconfigured Internet mailer (and based on
experience, such misconfiguration is the rule, not the exception). So
when sending format code through mail, you should indent it so that
the format-ending dot is not on the left margin; this will prevent
SMTP cutoff.
Lexical variables (declared with "my") are not visible within a
format unless the format is declared within the scope of the lexical
variable.
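The scoping rule can be illustrated with a short sketch. The format name
C<TITLE> and the helper C<render_title> are hypothetical; the point is only
that C<$title> is declared before the format, in the same scope.

```perl
use strict;
use warnings;

my $title = "Quarterly";           # declared before the format, same scope

# The format can see $title because it is declared within its scope.
# A "my" variable declared after this point would not be visible here.
format TITLE =
Title: @<<<<<<<<<<<
$title
.

# Write the TITLE format into a string via an in-memory filehandle.
sub render_title {
    my $buf = "";
    open(my $fh, '>', \$buf) or die "cannot open in-memory handle: $!";
    my $ofh = select($fh);
    $~ = "TITLE";
    write;
    select($ofh);
    close($fh);
    return $buf;
}

print render_title();
```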
If a program's environment specifies an LC_NUMERIC locale and C<use
locale> is in effect when the format is declared, the locale is used
to specify the decimal point character in formatted output. Formatted
output cannot be controlled by C<use locale> at the time when write()
is called. See L<perllocale> for further discussion of locale handling.
Within strings that are to be displayed in a fixed-length text field,
each control character is substituted by a space. (But remember the
special meaning of C<\r> when using fill mode.) This is done to avoid
misalignment when control characters "disappear" on some output media.
=head1 NAME
perldtrace - Perl's support for DTrace
=head1 SYNOPSIS
# dtrace -Zn 'perl::sub-entry, perl::sub-return { trace(copyinstr(arg0)) }'
dtrace: description 'perl::sub-entry, perl::sub-return ' matched 10 probes
# perl -E 'sub outer { inner(@_) } sub inner { say shift } outer("hello")'
hello
(dtrace output)
CPU ID FUNCTION:NAME
0 75915 Perl_pp_entersub:sub-entry BEGIN
0 75915 Perl_pp_entersub:sub-entry import
0 75922 Perl_pp_leavesub:sub-return import
0 75922 Perl_pp_leavesub:sub-return BEGIN
0 75915 Perl_pp_entersub:sub-entry outer
0 75915 Perl_pp_entersub:sub-entry inner
0 75922 Perl_pp_leavesub:sub-return inner
0 75922 Perl_pp_leavesub:sub-return outer
=head1 DESCRIPTION
DTrace is a framework for comprehensive system- and application-level
tracing. Perl is a DTrace I<provider>, meaning it exposes several
I<probes> for instrumentation. You can use these in conjunction
with kernel-level probes, as well as probes from other providers
such as MySQL, in order to diagnose software defects, or even just
your application's bottlenecks.
Perl must be compiled with the C<-Dusedtrace> option in order to
make use of the provided probes. While DTrace aims to have no
overhead when its instrumentation is not active, Perl's support
itself cannot uphold that guarantee, so it is built without DTrace
probes under most systems. One notable exception is that Mac OS X
ships a F</usr/bin/perl> with DTrace support enabled.
=head1 HISTORY
=over 4
=item 5.10.1
Perl's initial DTrace support was added, providing C<sub-entry> and
C<sub-return> probes.
=item 5.14.0
The C<sub-entry> and C<sub-return> probes gain a fourth argument: the
package name of the function.
=item 5.16.0
The C<phase-change> probe was added.
=item 5.18.0
The C<op-entry>, C<loading-file>, and C<loaded-file> probes were added.
=back
=head1 PROBES
=over 4
=item sub-entry(SUBNAME, FILE, LINE, PACKAGE)
Traces the entry of any subroutine. Note that all of the variables
refer to the subroutine that is being invoked; there is currently
no way to get ahold of any information about the subroutine's
I<caller> from a DTrace action.
:*perl*::sub-entry {
printf("%s::%s entered at %s line %d\n",
copyinstr(arg3), copyinstr(arg0), copyinstr(arg1), arg2);
}
=item sub-return(SUBNAME, FILE, LINE, PACKAGE)
Traces the exit of any subroutine. Note that all of the variables
refer to the subroutine that is returning; there is currently no
way to get ahold of any information about the subroutine's I<caller>
from a DTrace action.
:*perl*::sub-return {
printf("%s::%s returned at %s line %d\n",
copyinstr(arg3), copyinstr(arg0), copyinstr(arg1), arg2);
}
=item phase-change(NEWPHASE, OLDPHASE)
Traces changes to Perl's interpreter state. You can internalize this
as tracing changes to Perl's C<${^GLOBAL_PHASE}> variable, especially
since the values for C<NEWPHASE> and C<OLDPHASE> are the strings that
C<${^GLOBAL_PHASE}> reports.
:*perl*::phase-change {
printf("Phase changed from %s to %s\n",
copyinstr(arg1), copyinstr(arg0));
}
=item op-entry(OPNAME)
Traces the execution of each opcode in the Perl runloop. This probe
is fired before the opcode is executed. When the Perl debugger is
enabled, the DTrace probe is fired I<after> the debugger hooks (but
still before the opcode itself is executed).
:*perl*::op-entry {
printf("About to execute opcode %s\n", copyinstr(arg0));
}
=item loading-file(FILENAME)
Fires when Perl is about to load an individual file, whether from
C<use>, C<require>, or C<do>. This probe fires before the file is
read from disk. The filename argument is converted to local filesystem
paths instead of providing C<Module::Name>-style names.
:*perl*::loading-file {
printf("About to load %s\n", copyinstr(arg0));
}
=item loaded-file(FILENAME)
Fires when Perl has successfully loaded an individual file, whether
from C<use>, C<require>, or C<do>. This probe fires after the file
is read from disk and its contents evaluated. The filename argument
is converted to local filesystem paths instead of providing
C<Module::Name>-style names.
:*perl*::loaded-file {
printf("Successfully loaded %s\n", copyinstr(arg0));
}
=back
=head1 EXAMPLES
=over 4
=item Most frequently called functions
# dtrace -qZn 'sub-entry { @[strjoin(strjoin(copyinstr(arg3),"::"),copyinstr(arg0))] = count() } END {trunc(@, 10)}'
Class::MOP::Attribute::slots 400
Try::Tiny::catch 411
Try::Tiny::try 411
Class::MOP::Instance::inline_slot_access 451
Class::MOP::Class::Immutable::Trait:::around 472
Class::MOP::Mixin::AttributeCore::has_initializer 496
Class::MOP::Method::Wrapped::__ANON__ 544
Class::MOP::Package::_package_stash 737
Class::MOP::Class::initialize 1128
Class::MOP::get_metaclass_by_name 1204
=item Trace function calls
# dtrace -qFZn 'sub-entry, sub-return { trace(copyinstr(arg0)) }'
0 -> Perl_pp_entersub BEGIN
0 <- Perl_pp_leavesub BEGIN
0 -> Perl_pp_entersub BEGIN
0 -> Perl_pp_entersub import
0 <- Perl_pp_leavesub import
0 <- Perl_pp_leavesub BEGIN
0 -> Perl_pp_entersub BEGIN
0 -> Perl_pp_entersub dress
0 <- Perl_pp_leavesub dress
0 -> Perl_pp_entersub dirty
0 <- Perl_pp_leavesub dirty
0 -> Perl_pp_entersub whiten
0 <- Perl_pp_leavesub whiten
0 <- Perl_dounwind BEGIN
=item Function calls during interpreter cleanup
# dtrace -Zn 'phase-change /copyinstr(arg0) == "END"/ { self->ending = 1 } sub-entry /self->ending/ { trace(copyinstr(arg0)) }'
CPU ID FUNCTION:NAME
1 77214 Perl_pp_entersub:sub-entry END
1 77214 Perl_pp_entersub:sub-entry END
1 77214 Perl_pp_entersub:sub-entry cleanup
1 77214 Perl_pp_entersub:sub-entry _force_writable
1 77214 Perl_pp_entersub:sub-entry _force_writable
=item System calls at compile time
# dtrace -qZn 'phase-change /copyinstr(arg0) == "START"/ { self->interesting = 1 } phase-change /copyinstr(arg0) == "RUN"/ { self->interesting = 0 } syscall::: /self->interesting/ { @[probefunc] = count() } END { trunc(@, 3) }'
lseek 310
read 374
stat64 1056
=item Perl functions that execute the most opcodes
# dtrace -qZn 'sub-entry { self->fqn = strjoin(copyinstr(arg3), strjoin("::", copyinstr(arg0))) } op-entry /self->fqn != ""/ { @[self->fqn] = count() } END { trunc(@, 3) }'
warnings::unimport 4589
Exporter::Heavy::_rebuild_cache 5039
Exporter::import 14578
=back
=head1 REFERENCES
=over 4
=item DTrace Dynamic Tracing Guide
L<http://dtrace.org/guide/preface.html>
=item DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD
L<http://www.amazon.com/DTrace-Dynamic-Tracing-Solaris-FreeBSD/dp/0132091518/>
=back
=head1 SEE ALSO
=over 4
=item L<Devel::DTrace::Provider>
This CPAN module lets you create application-level DTrace probes written in
Perl.
=back
=head1 AUTHORS
Shawn M Moore C<sartak@gmail.com>
=cut
=encoding utf8
=head1 NAME
perl5180delta - what is new for perl v5.18.0
=head1 DESCRIPTION
This document describes differences between the v5.16.0 release and the v5.18.0
release.
If you are upgrading from an earlier release such as v5.14.0, first read
L<perl5160delta>, which describes differences between v5.14.0 and v5.16.0.
=head1 Core Enhancements
=head2 New mechanism for experimental features
Newly-added experimental features will now require this incantation:
no warnings "experimental::feature_name";
use feature "feature_name"; # would warn without the prev line
There is a new warnings category, called "experimental", containing
warnings that the L<feature> pragma emits when enabling experimental
features.
Newly-added experimental features will also be given special warning IDs,
which consist of "experimental::" followed by the name of the feature. (The
plan is to extend this mechanism eventually to all warnings, to allow them
to be enabled or disabled individually, and not just by category.)
By saying
no warnings "experimental::feature_name";
you are taking responsibility for any breakage that future changes to, or
removal of, the feature may cause.
Since some features (like C<~~> or C<my $_>) now emit experimental warnings,
and you may want to disable them in code that is also run on perls that do not
recognize these warning categories, consider using the C<if> pragma like this:
no if $] >= 5.018, warnings => "experimental::feature_name";
Existing experimental features may begin emitting these warnings, too. Please
consult L<perlexperiment> for information on which features are considered
experimental.
=head2 Hash overhaul
Changes to the implementation of hashes in perl v5.18.0 will be one of the most
visible changes to the behavior of existing code.
By default, two distinct hash variables with identical keys and values may now
provide their contents in a different order where it was previously identical.
When you encounter these changes, the key to cleaning up after them is to
accept that B<hashes are unordered collections> and to act accordingly.
=head3 Hash randomization
The seed used by Perl's hash function is now random. This means that the
order in which keys and values are returned from functions like C<keys()>,
C<values()>, and C<each()> will differ from run to run.
This change was introduced to make Perl's hashes more robust to algorithmic
complexity attacks, and also because we discovered that it exposes hash
ordering dependency bugs and makes them easier to track down.
Toolchain maintainers might want to invest in additional infrastructure to
test for things like this. Running tests several times in a row and then
comparing results will make it easier to spot hash order dependencies in
code. Authors are strongly encouraged not to expose the key order of
Perl's hashes to insecure audiences.
Further, every hash has its own iteration order, which should make it much
more difficult to determine what the current hash seed is.
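Code that needs a stable order has to impose one itself. A minimal sketch,
using a made-up C<%stock> hash:

```perl
use strict;
use warnings;

my %stock = (pears => 4, apples => 7, plums => 2);

# keys() may return a different order on every run, so sort explicitly.
my @by_name  = sort keys %stock;

# Sort by value, falling back to the key to break ties deterministically.
my @by_count = sort { $stock{$a} <=> $stock{$b} || $a cmp $b } keys %stock;

print "@by_name\n";
print "@by_count\n";
```

Sorting at the point of use keeps the output independent of the hash seed,
which is usually simpler than trying to pin the seed.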
=head3 New hash functions
Perl v5.18 includes support for multiple hash functions, and the default
has changed (to ONE_AT_A_TIME_HARD). You can choose a different
algorithm by defining a symbol at compile time. For a current list,
consult the F<INSTALL> document. Note that as of Perl v5.18 we can
only recommend use of the default or SIPHASH. All the others are
known to have security issues and are for research purposes only.
=head3 PERL_HASH_SEED environment variable now takes a hex value
C<PERL_HASH_SEED> no longer accepts an integer as a parameter;
instead the value is expected to be a binary value encoded in a hex
string, such as "0xf5867c55039dc724". This is to make the
infrastructure support hash seeds of arbitrary lengths, which might
exceed that of an integer. (SipHash uses a 16 byte seed.)
=head3 PERL_PERTURB_KEYS environment variable added
The C<PERL_PERTURB_KEYS> environment variable allows one to control the level of
randomization applied to C<keys> and friends.
When C<PERL_PERTURB_KEYS> is 0, perl will not randomize the key order at all. The
chance that C<keys> changes due to an insert will be the same as in previous
perls, basically only when the bucket size is changed.
When C<PERL_PERTURB_KEYS> is 1, perl will randomize keys in a non-repeatable
way. The chance that C<keys> changes due to an insert will be very high. This
is the most secure and default mode.
When C<PERL_PERTURB_KEYS> is 2, perl will randomize keys in a repeatable way.
Repeated runs of the same program should produce the same output every time.
C<PERL_HASH_SEED> implies a non-default C<PERL_PERTURB_KEYS> setting. Setting
C<PERL_HASH_SEED=0> (exactly one 0) implies C<PERL_PERTURB_KEYS=0> (hash key
randomization disabled); setting C<PERL_HASH_SEED> to any other value implies
C<PERL_PERTURB_KEYS=2> (deterministic and repeatable hash key randomization).
Specifying C<PERL_PERTURB_KEYS> explicitly to a different level overrides this
behavior.
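To see the effect of these settings, one can run a child perl with a fixed
environment and compare the key order across runs. The sketch below assumes a
perl of v5.18 or later and a platform where the list form of a piped open is
available; the helper name C<child_key_order> is hypothetical.

```perl
use strict;
use warnings;

# Run a child perl with extra environment settings and return the order
# in which it lists the keys of a small hash.
sub child_key_order {
    my (%extra_env) = @_;
    local @ENV{keys %extra_env} = values %extra_env;
    my $code = 'my %h = map { $_ => 1 } "a" .. "j"; print join ",", keys %h';
    open(my $child, '-|', $^X, '-e', $code) or die "cannot run $^X: $!";
    my $order = <$child>;
    close($child);
    return $order;
}

# With randomization disabled, two runs should agree.
my $first  = child_key_order(PERL_HASH_SEED => "0", PERL_PERTURB_KEYS => "0");
my $second = child_key_order(PERL_HASH_SEED => "0", PERL_PERTURB_KEYS => "0");
print "repeatable\n" if $first eq $second;
```

Pinning the seed this way is useful for reproducing a failure, but code
should not rely on the resulting order.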
=head3 Hash::Util::hash_seed() now returns a string
Hash::Util::hash_seed() now returns a string instead of an integer. This
is to make the infrastructure support hash seeds of arbitrary lengths
which might exceed that of an integer. (SipHash uses a 16 byte seed.)
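Because the return value is now a string of raw bytes, it is usually
converted to hex before being displayed. A minimal sketch, with
C<hash_seed_hex> as a hypothetical helper name:

```perl
use strict;
use warnings;
use Hash::Util qw(hash_seed);

# hash_seed() returns raw bytes; convert them to a printable hex string.
sub hash_seed_hex {
    return unpack("H*", hash_seed());
}

print "seed: 0x", hash_seed_hex(), "\n";
```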
=head3 Output of PERL_HASH_SEED_DEBUG has been changed
The environment variable PERL_HASH_SEED_DEBUG now makes perl show both the
hash function perl was built with, I<and> the seed, in hex, in use for that
process. Code parsing this output, should it exist, must change to accommodate
the new format. Example of the new format:
$ PERL_HASH_SEED_DEBUG=1 ./perl -e1
HASH_FUNCTION = MURMUR3 HASH_SEED = 0x1476bb9f
=head2 Upgrade to Unicode 6.2
Perl now supports Unicode 6.2. A list of changes from Unicode
6.1 is at L<http://www.unicode.org/versions/Unicode6.2.0>.
=head2 Character name aliases may now include non-Latin1-range characters
It is possible to define your own names for characters for use in
C<\N{...}>, C<charnames::vianame()>, etc. These names can now be
comprised of characters from the whole Unicode range. This allows for
names to be in your native language, and not just English. Certain
restrictions apply to the characters that may be used (you can't define
a name that has punctuation in it, for example). See L<charnames/CUSTOM
ALIASES>.
=head2 New DTrace probes
The following new DTrace probes have been added:
=over 4
=item *
C<op-entry>
=item *
C<loading-file>
=item *
C<loaded-file>
=back
=head2 C<${^LAST_FH}>
This new variable provides access to the filehandle that was last read.
This is the handle used by C<$.> and by C<tell> and C<eof> without
arguments.
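A short sketch of how the variable might be used, assuming perl v5.18 or
later. The in-memory filehandle and the sample data are there only to keep
the example self-contained.

```perl
use strict;
use warnings;

my $data = "alpha\nbeta\n";
open(my $in, '<', \$data) or die "cannot open in-memory handle: $!";

my $line = <$in>;                  # $in is now the last-read filehandle

my $last    = ${^LAST_FH};         # a reference to that filehandle
my $lineno  = $.;                  # line number on the same handle
my $is_same = ($last == $in) ? 1 : 0;

print "read line $lineno from the expected handle\n" if $is_same;
```

This can be handy in diagnostics, where a message wants to name the handle
that C<$.> currently refers to.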
=head2 Regular Expression Set Operations
This is an B<experimental> feature to allow matching against the union,
intersection, etc., of sets of code points, similar to
L<Unicode::Regex::Set>. It can also be used to extend C</x> processing
to [bracketed] character classes, and as a replacement of user-defined
properties, allowing more complex expressions than they do. See
L<perlrecharclass/Extended Bracketed Character Classes>.
=head2 Lexical subroutines
This new feature is still considered B<experimental>. To enable it:
use 5.018;
no warnings "experimental::lexical_subs";
use feature "lexical_subs";
You can now declare subroutines with C<state sub foo>, C<my sub foo>, and
C<our sub foo>. (C<state sub> requires that the "state" feature be
enabled, unless you write it as C<CORE::state sub foo>.)
C<state sub> creates a subroutine visible within the lexical scope in which
it is declared. The subroutine is shared between calls to the outer sub.
C<my sub> declares a lexical subroutine that is created each time the
enclosing block is entered. C<state sub> is generally slightly faster than
C<my sub>.
C<our sub> declares a lexical alias to the package subroutine of the same
name.
For more information, see L<perlsub/Lexical Subroutines>.
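As an illustration, here is a minimal sketch of C<my sub>. The names
C<area_report> and C<area> are made up for the example.

```perl
use 5.018;
no warnings "experimental::lexical_subs";
use feature "lexical_subs";

sub area_report {
    my ($w, $h) = @_;

    # "area" exists only inside this block; it is not a package sub.
    my sub area { return $_[0] * $_[1] }

    return "area=" . area($w, $h);
}

say area_report(2, 3);
```

Since C<area> is lexical, it cannot be called from outside C<area_report>
and does not appear in the package's symbol table.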
=head2 Computed Labels
The loop controls C<next>, C<last> and C<redo>, and the special C<dump>
operator, now allow arbitrary expressions to be used to compute labels at run
time. Previously, any argument that was not a constant was treated as the
empty string.
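A minimal sketch of a label computed at run time, assuming perl v5.18 or
later. The sub name C<first_pair_over> and the label C<OUTER> are chosen for
the example only.

```perl
use strict;
use warnings;

# Return the first pair ($i, $j) whose product exceeds $limit.
sub first_pair_over {
    my ($limit) = @_;
    my $target = "OUTER";          # label held in a variable
    my @found;
    OUTER: for my $i (1 .. 3) {
        for my $j (1 .. 3) {
            if ($i * $j > $limit) {
                @found = ($i, $j);
                last $target;      # leaves the loop labelled OUTER
            }
        }
    }
    return @found;
}

print join(",", first_pair_over(3)), "\n";
```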
=head2 More CORE:: subs
Several more built-in functions have been added as subroutines to the
CORE:: namespace - namely, those non-overridable keywords that can be
implemented without custom parsers: C<defined>, C<delete>, C<exists>,
C<glob>, C<pos>, C<prototype>, C<scalar>, C<split>, C<study>, and C<undef>.
As some of these have prototypes, C<prototype('CORE::...')> has been
changed to not make a distinction between overridable and non-overridable
keywords. This is to make C<prototype('CORE::pos')> consistent with
C<prototype(&CORE::pos)>.
=head2 C<kill> with negative signal names
C<kill> has always allowed a negative signal number, which kills the
process group instead of a single process. It has also allowed signal
names. But it did not behave consistently, because negative signal names
were treated as 0. Now negative signal names like C<-INT> are supported
and treated the same way as -2 [perl #112990].
=head1 Security
=head2 See also: hash overhaul
Some of the changes in the L<hash overhaul|/"Hash overhaul"> were made to
enhance security. Please read that section.
=head2 C<Storable> security warning in documentation
The documentation for C<Storable> now includes a section which warns readers
of the danger of accepting Storable documents from untrusted sources. The
short version is that deserializing certain types of data can lead to loading
modules and other code execution. This is documented behavior and wanted
behavior, but this opens an attack vector for malicious entities.
=head2 C<Locale::Maketext> allowed code injection via a malicious template
If users could provide a translation string to Locale::Maketext, this could be
used to invoke arbitrary Perl subroutines available in the current process.
This has been fixed, but it is still possible to invoke any method provided by
C<Locale::Maketext> itself or a subclass that you are using. One of these
methods in turn will invoke the Perl core's C<sprintf> subroutine.
In summary, allowing users to provide translation strings without auditing
them is a bad idea.
This vulnerability is documented in CVE-2012-6329.
=head2 Avoid calling memset with a negative count
Poorly written perl code that allows an attacker to specify the count to perl's
C<x> string repeat operator can already cause a memory exhaustion
denial-of-service attack. A flaw in versions of perl before v5.15.5 can escalate
that into a heap buffer overrun; coupled with versions of glibc before 2.16, it
possibly allows the execution of arbitrary code.
The flaw addressed by this commit has been assigned identifier CVE-2012-5195
and was researched by Tim Brown.
=head1 Incompatible Changes
=head2 See also: hash overhaul
Some of the changes in the L<hash overhaul|/"Hash overhaul"> are not fully
compatible with previous versions of perl. Please read that section.
=head2 An unknown character name in C<\N{...}> is now a syntax error
Previously, it warned, and the Unicode REPLACEMENT CHARACTER was
substituted. Unicode now recommends that this situation be a syntax
error. Also, the previous behavior led to some confusing warnings and
behaviors, and since the REPLACEMENT CHARACTER has no use other than as
a stand-in for some unknown character, any code that has this problem is
buggy.
=head2 Formerly deprecated characters in C<\N{}> character name aliases are now errors
Since v5.12.0, it has been deprecated to use certain characters in
user-defined C<\N{...}> character names. These now cause a syntax
error. For example, it is now an error to begin a name with a digit,
such as in
my $undraftable = "\N{4F}"; # Syntax error!
or to have commas anywhere in the name. See L<charnames/CUSTOM ALIASES>.
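For contrast, here is a sketch of an alias that satisfies these rules. The alias name C<e_ACUTE> is only an illustration; it begins with a letter and contains no commas:

```perl
use strict;
use warnings;
use charnames ':full', ':alias' => {
    e_ACUTE => 'LATIN SMALL LETTER E WITH ACUTE',   # valid: starts with a letter
};

my $char = "\N{e_ACUTE}";
printf "U+%04X\n", ord $char;
```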
=head2 C<\N{BELL}> now refers to U+1F514 instead of U+0007
Unicode 6.0 reused the name "BELL" for a different code point than it
traditionally had meant. Since Perl v5.14, use of this name still
referred to U+0007, but would raise a deprecation warning. Now, "BELL"
refers to U+1F514, and the name for U+0007 is "ALERT". All the
functions in L<charnames> have been correspondingly updated.
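A quick way to see the new mapping is to print the code points the two names resolve to (a sketch; it prints only the numeric code points, so no output encoding layer is needed):

```perl
use strict;
use warnings;
use charnames ':full';

my $bell  = "\N{BELL}";    # the Unicode 6.0 character
my $alert = "\N{ALERT}";   # the traditional control character

printf "BELL  is U+%04X\n", ord $bell;
printf "ALERT is U+%04X\n", ord $alert;
```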
=head2 New Restrictions in Multi-Character Case-Insensitive Matching in Regular Expression Bracketed Character Classes
Unicode has now withdrawn their previous recommendation for regular
expressions to automatically handle cases where a single character can
match multiple characters case-insensitively, for example, the letter
LATIN SMALL LETTER SHARP S and the sequence C<ss>. This is because
it turns out to be impracticable to do this correctly in all
circumstances. Because Perl has tried to do this as best it can, it
will continue to do so. (We are considering an option to turn it off.)
However, a new restriction is being added on such matches when they
occur in [bracketed] character classes. People were specifying
things such as C</[\0-\xff]/i>, and being surprised that it matches the
two character sequence C<ss> (since LATIN SMALL LETTER SHARP S occurs in
this range). This behavior is also inconsistent with using a
property instead of a range: C<\p{Block=Latin1}> also includes LATIN
SMALL LETTER SHARP S, but C</[\p{Block=Latin1}]/i> does not match C<ss>.
The new rule is that for there to be a multi-character case-insensitive
match within a bracketed character class, the character must be
explicitly listed, and not as an end point of a range. This more
closely obeys the Principle of Least Astonishment. See
L<perlrecharclass/Bracketed Character Classes>. Note that a bug [perl
#89774], now fixed as part of this change, prevented the previous
behavior from working fully.
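The rule can be illustrated with the following sketch. The C</u> modifier is used here on the assumption that Unicode rules are wanted for C<\xDF>; the variable names are illustrative:

```perl
use strict;
use warnings;

# SHARP S listed explicitly: the multi-character fold to "ss" applies.
my $listed   = "ss"   =~ /^[\xDF]$/iu      ? 1 : 0;

# SHARP S only inside a range: no multi-character match.
my $in_range = "ss"   =~ /^[\xDE-\xE0]$/iu ? 1 : 0;

# The single character itself still matches the range.
my $single   = "\xDF" =~ /^[\xDE-\xE0]$/iu ? 1 : 0;

print "listed=$listed in_range=$in_range single=$single\n";
```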
=head2 Explicit rules for variable names and identifiers
Due to an oversight, single character variable names in v5.16 were
completely unrestricted. This opened the door to several kinds of
insanity. As of v5.18, these now follow the rules of other identifiers,
in addition to accepting characters that match the C<\p{POSIX_Punct}>
property.
There is no longer any difference in the parsing of identifiers
specified by using braces versus without braces. For instance, perl
used to allow C<${foo:bar}> (with a single colon) but not C<$foo:bar>.
Now that both are handled by a single code path, they are both treated
the same way: both are forbidden. Note that this change is about the
range of permissible literal identifiers, not other expressions.
=head2 Vertical tabs are now whitespace
No one could recall why C<\s> didn't match C<\cK>, the vertical tab.
Now it does. Given the extreme rarity of that character, very little
breakage is expected. That said, here's what it means:
C<\s> in a regex now matches a vertical tab in all circumstances.
Literal vertical tabs in a regex literal are ignored when the C</x>
modifier is used.
Leading vertical tabs, alone or mixed with other whitespace, are now
ignored when interpreting a string as a number. For example:
$dec = " \cK \t 123";
$hex = " \cK \t 0xF";
say 0 + $dec; # was 0 with warning, now 123
say int $dec; # was 0, now 123
say oct $hex; # was 0, now 15
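The first two points can be checked in the same spirit. This is a sketch; the vertical tab is interpolated into the pattern so that it appears there as a literal character:

```perl
use strict;
use warnings;

my $vt = "\cK";   # vertical tab

# \s now matches a vertical tab.
my $is_space = $vt =~ /\A\s\z/ ? 1 : 0;

# A literal vertical tab inside an /x pattern is ignored like other whitespace.
my $pattern = "a${vt}b";
my $ignored = "ab" =~ /\A$pattern\z/x ? 1 : 0;

print "is_space=$is_space ignored=$ignored\n";
```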
=head2 C</(?{})/> and C</(??{})/> have been heavily reworked
The implementation of this feature has been almost completely rewritten.
Although its main intent is to fix bugs, some behaviors, especially
related to the scope of lexical variables, will have changed. This is
described more fully in the L</Selected Bug Fixes> section.
=head2 Stricter parsing of substitution replacement
It is no longer possible to abuse the way the parser parses C<s///e> like
this:
%_=(_,"Just another ");
$_="Perl hacker,\n";
s//_}->{_/e;print
=head2 C<given> now aliases the global C<$_>
Instead of assigning to an implicit lexical C<$_>, C<given> now makes the
global C<$_> an alias for its argument, just like C<foreach>. However, it
still uses lexical C<$_> if there is lexical C<$_> in scope (again, just like
C<foreach>) [perl #114020].
=head2 The smartmatch family of features are now experimental
Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been
a regular point of complaint. Although there are a number of ways in which
it is useful, it has also proven problematic and confusing for both users and
implementors of Perl. There have been a number of proposals on how to best
address the problem. It is clear that smartmatch is almost certainly either
going to change or go away in the future. Relying on its current behavior
is not recommended.
Warnings will now be issued when the parser sees C<~~>, C<given>, or C<when>.
To disable these warnings, you can add this line to the appropriate scope:
no if $] >= 5.018, warnings => "experimental::smartmatch";
Consider, though, replacing the use of these features, as they may change
behavior again before becoming stable.
=head2 Lexical C<$_> is now experimental
Since it was introduced in Perl v5.10, it has caused much confusion with no
obvious solution:
=over
=item *
Various modules (e.g., List::Util) expect callback routines to use the
global C<$_>. C<use List::Util 'first'; my $_; first { $_ == 1 } @list>
does not work as one would expect.
=item *
A C<my $_> declaration earlier in the same file can cause confusing closure
warnings.
=item *
The "_" subroutine prototype character allows called subroutines to access
your lexical C<$_>, so it is not really private after all.
=item *
Nevertheless, subroutines with a "(@)" prototype and methods cannot access
the caller's lexical C<$_>, unless they are written in XS.
=item *
But even XS routines cannot access a lexical C<$_> declared, not in the
calling subroutine, but in an outer scope, iff that subroutine happened not
to mention C<$_> or use any operators that default to C<$_>.
=back
It is our hope that lexical C<$_> can be rehabilitated, but this may
cause changes in its behavior. Please use it with caution until it
becomes stable.
=head2 readline() with C<$/ = \N> now reads N characters, not N bytes
Previously, when reading from a stream with I/O layers such as
C<encoding>, the readline() function, otherwise known as the C<< <> >>
operator, would read I<N> bytes from the top-most layer. [perl #79960]
Now, I<N> characters are read instead.
There is no change in behaviour when reading from streams with no
extra layers, since bytes map exactly to characters.
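A small sketch of the new behavior, reading from an in-memory filehandle through a UTF-8 decoding layer (the sample data is made up for illustration):

```perl
use strict;
use warnings;

# Five characters, seven bytes: the first two characters take two bytes each.
my $utf8_bytes = "\xC3\xA9\xC3\xA8abc";

open my $fh, '<:encoding(UTF-8)', \$utf8_bytes
    or die "cannot open in-memory handle: $!";

my $record;
{
    local $/ = \2;      # fixed-size records of 2 characters
    $record = <$fh>;
}
close $fh;

printf "read %d characters\n", length $record;
```

Under the old byte-based behavior the record boundary would not have lined up with two decoded characters.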
=head2 Overridden C<glob> is now passed one argument
C<glob> overrides used to be passed a magical undocumented second argument
that identified the caller. Nothing on CPAN was using this, and it got in
the way of a bug fix, so it was removed. If you really need to identify
the caller, see L<Devel::Callsite> on CPAN.
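The following sketch installs a global C<glob> override that merely records how many arguments it receives. The override body and the C<@arg_counts> variable are illustrative only:

```perl
use strict;
use warnings;

our @arg_counts;

BEGIN {
    no warnings 'once';
    # Install the override before any glob() call is compiled.
    *CORE::GLOBAL::glob = sub {
        push @arg_counts, scalar @_;
        return ();
    };
}

my @files = glob('*.no-such-extension');
print "override received $arg_counts[0] argument(s)\n";
```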
=head2 Here doc parsing
The body of a here document inside a quote-like operator now always begins
on the line after the "<<foo" marker. Previously, it was documented to
begin on the line following the containing quote-like operator, but that
was only sometimes the case [perl #114040].
=head2 Alphanumeric operators must now be separated from the closing delimiter of regular expressions
You may no longer write something like:
m/a/and 1
Instead you must write
m/a/ and 1
with whitespace separating the operator from the closing delimiter of
the regular expression. Not having whitespace has resulted in a
deprecation warning since Perl v5.14.0.
=head2 qw(...) can no longer be used as parentheses
C<qw> lists used to fool the parser into thinking they were always
surrounded by parentheses. This permitted some surprising constructions
such as C<foreach $x qw(a b c) {...}>, which should really be written
C<foreach $x (qw(a b c)) {...}>. These would sometimes get the lexer into
the wrong state, so they didn't fully work, and the similar C<foreach qw(a
b c) {...}> that one might expect to be permitted never worked at all.
This side effect of C<qw> has now been abolished. It has been deprecated
since Perl v5.13.11. It is now necessary to use real parentheses
everywhere that the grammar calls for them.
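For reference, the supported spelling looks like this (a trivial sketch with made-up data):

```perl
use strict;
use warnings;

my @seen;

# Real parentheses around the qw list are required by the grammar.
for my $x (qw(a b c)) {
    push @seen, uc $x;
}

print "@seen\n";
```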
=head2 Interaction of lexical and default warnings
Turning on any lexical warnings used to first disable all default warnings
if lexical warnings were not already enabled:
$*; # deprecation warning
use warnings "void";
$#; # void warning; no deprecation warning
Now, the C<debugging>, C<deprecated>, C<glob>, C<inplace> and C<malloc> warnings
categories are left on when turning on lexical warnings (unless they are
turned off by C<no warnings>, of course).
This may cause deprecation warnings to occur in code that used to be free
of warnings.
Those are the only categories consisting only of default warnings. Default
warnings in other categories are still disabled by C<< use warnings "category" >>,
as we do not yet have the infrastructure for controlling
individual warnings.
=head2 C<state sub> and C<our sub>
Due to an accident of history, C<state sub> and C<our sub> were equivalent
to a plain C<sub>, so one could even create an anonymous sub with
C<our sub { ... }>. These are now disallowed outside of the "lexical_subs"
feature. Under the "lexical_subs" feature they have new meanings described
in L<perlsub/Lexical Subroutines>.
=head2 Defined values stored in environment are forced to byte strings
A value stored in an environment variable has always been stringified when
inherited by child processes.
In this release, when assigning to C<%ENV>, values are immediately stringified,
and converted to be only a byte string.
First, it is forced to be only a string. Then if the string is utf8 and the
equivalent of C<utf8::downgrade()> works, that result is used; otherwise, the
equivalent of C<utf8::encode()> is used, and a warning is issued about wide
characters (L</Diagnostics>).
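A sketch of both steps follows. The environment variable names are hypothetical and are removed again at the end:

```perl
use strict;
use warnings;

# A reference is stringified as soon as it is stored.
$ENV{PERLDELTA_DEMO_REF} = [1, 2, 3];
my $ref_value = $ENV{PERLDELTA_DEMO_REF};
my $is_ref    = ref $ref_value ? 1 : 0;

# A utf8-flagged string that fits in bytes is downgraded.
my $upgraded = "caf\xE9";
utf8::upgrade($upgraded);
$ENV{PERLDELTA_DEMO_STR} = $upgraded;
my $is_utf8 = utf8::is_utf8($ENV{PERLDELTA_DEMO_STR}) ? 1 : 0;
my $length  = length $ENV{PERLDELTA_DEMO_STR};

delete @ENV{qw(PERLDELTA_DEMO_REF PERLDELTA_DEMO_STR)};

print "is_ref=$is_ref is_utf8=$is_utf8 length=$length\n";
```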
=head2 C<require> dies for unreadable files
When C<require> encounters an unreadable file, it now dies. It used to
ignore the file and continue searching the directories in C<@INC>
[perl #113422].
=head2 C<gv_fetchmeth_*> and SUPER
The various C<gv_fetchmeth_*> XS functions used to treat a package whose
name ended with C<::SUPER> specially. A method lookup on the C<Foo::SUPER>
package would be treated as a C<SUPER> method lookup on the C<Foo> package. This
is no longer the case. To do a C<SUPER> lookup, pass the C<Foo> stash and the
C<GV_SUPER> flag.
=head2 C<split>'s first argument is more consistently interpreted
After some changes earlier in v5.17, C<split>'s behavior has been
simplified: if the PATTERN argument evaluates to a string
containing one space, it is treated the way that a I<literal> string
containing one space once was.
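In practice this means a separator held in a variable behaves like the literal C<' '>, as in this sketch (sample data invented for illustration):

```perl
use strict;
use warnings;

my $text = "  alpha  beta gamma ";

# A string of one space: leading whitespace is skipped, runs are collapsed.
my $sep    = ' ';
my @fields = split $sep, $text;

# A regex of one space: every single space is a separator.
my @by_regex = split / /, $text;

print scalar(@fields), " fields via string, ",
      scalar(@by_regex), " fields via regex\n";
```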
=head1 Deprecations
=head2 Module removals
The following modules will be removed from the core distribution in a future
release, and will at that time need to be installed from CPAN. Distributions
on CPAN which require these modules will need to list them as prerequisites.
The core versions of these modules will now issue C<"deprecated">-category
warnings to alert you to this fact. To silence these deprecation warnings,
install the modules in question from CPAN.
Note that these are (with rare exceptions) fine modules that you are encouraged
to continue to use. Their disinclusion from core primarily hinges on their
necessity for bootstrapping a fully functional, CPAN-capable Perl installation,
not usually on concerns over their design.
=over
=item L<encoding>
The use of this pragma is now strongly discouraged. It conflates the encoding
of source text with the encoding of I/O data, reinterprets escape sequences in
source text (a questionable choice), and introduces the UTF-8 bug to all runtime
handling of character strings. It is broken as designed and beyond repair.
For using non-ASCII literal characters in source text, please refer to L<utf8>.
For dealing with textual I/O data, please refer to L<Encode> and L<open>.
=item L<Archive::Extract>
=item L<B::Lint>
=item L<B::Lint::Debug>
=item L<CPANPLUS> and all included C<CPANPLUS::*> modules
=item L<Devel::InnerPackage>
=item L<Log::Message>
=item L<Log::Message::Config>
=item L<Log::Message::Handlers>
=item L<Log::Message::Item>
=item L<Log::Message::Simple>
=item L<Module::Pluggable>
=item L<Module::Pluggable::Object>
=item L<Object::Accessor>
=item L<Pod::LaTeX>
=item L<Term::UI>
=item L<Term::UI::History>
=back
=head2 Deprecated Utilities
The following utilities will be removed from the core distribution in a
future release as their associated modules have been deprecated. They
will remain available with the applicable CPAN distribution.
=over
=item L<cpanp>
=item C<cpanp-run-perl>
=item L<cpan2dist>
These items are part of the C<CPANPLUS> distribution.
=item L<pod2latex>
This item is part of the C<Pod::LaTeX> distribution.
=back
=head2 PL_sv_objcount
This interpreter-global variable used to track the total number of
Perl objects in the interpreter. It is no longer maintained and will
be removed altogether in Perl v5.20.
=head2 Five additional characters should be escaped in patterns with C</x>
When a regular expression pattern is compiled with C</x>, Perl treats 6
characters as white space to ignore, such as SPACE and TAB. However,
Unicode recommends 11 characters be treated thusly. We will conform
with this in a future Perl version. In the meantime, use of any of the
missing characters will raise a deprecation warning, unless turned off.
The five characters are:
U+0085 NEXT LINE
U+200E LEFT-TO-RIGHT MARK
U+200F RIGHT-TO-LEFT MARK
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
=head2 User-defined charnames with surprising whitespace
A user-defined character name with trailing or multiple spaces in a row is
likely a typo. This now generates a warning when defined, on the assumption
that uses of it will be unlikely to include the excess whitespace.
=head2 Various XS-callable functions are now deprecated
All the functions used to classify characters will be removed from a
future version of Perl, and should not be used. With participating C
compilers (e.g., gcc), compiling any file that uses any of these will
generate a warning. These were not intended for public use; there are
equivalent, faster, macros for most of them.
See L<perlapi/Character classes>. The complete list is:
C<is_uni_alnum>, C<is_uni_alnumc>, C<is_uni_alnumc_lc>,
C<is_uni_alnum_lc>, C<is_uni_alpha>, C<is_uni_alpha_lc>,
C<is_uni_ascii>, C<is_uni_ascii_lc>, C<is_uni_blank>,
C<is_uni_blank_lc>, C<is_uni_cntrl>, C<is_uni_cntrl_lc>,
C<is_uni_digit>, C<is_uni_digit_lc>, C<is_uni_graph>,
C<is_uni_graph_lc>, C<is_uni_idfirst>, C<is_uni_idfirst_lc>,
C<is_uni_lower>, C<is_uni_lower_lc>, C<is_uni_print>,
C<is_uni_print_lc>, C<is_uni_punct>, C<is_uni_punct_lc>,
C<is_uni_space>, C<is_uni_space_lc>, C<is_uni_upper>,
C<is_uni_upper_lc>, C<is_uni_xdigit>, C<is_uni_xdigit_lc>,
C<is_utf8_alnum>, C<is_utf8_alnumc>, C<is_utf8_alpha>,
C<is_utf8_ascii>, C<is_utf8_blank>, C<is_utf8_char>,
C<is_utf8_cntrl>, C<is_utf8_digit>, C<is_utf8_graph>,
C<is_utf8_idcont>, C<is_utf8_idfirst>, C<is_utf8_lower>,
C<is_utf8_mark>, C<is_utf8_perl_space>, C<is_utf8_perl_word>,
C<is_utf8_posix_digit>, C<is_utf8_print>, C<is_utf8_punct>,
C<is_utf8_space>, C<is_utf8_upper>, C<is_utf8_xdigit>,
C<is_utf8_xidcont>, C<is_utf8_xidfirst>.
In addition these three functions that have never worked properly are
deprecated:
C<to_uni_lower_lc>, C<to_uni_title_lc>, and C<to_uni_upper_lc>.
=head2 Certain rare uses of backslashes within regexes are now deprecated
There are three pairs of characters that Perl recognizes as
metacharacters in regular expression patterns: C<{}>, C<[]>, and C<()>.
These can be used as well to delimit patterns, as in:
m{foo}
s(foo)(bar)
Since they are metacharacters, they have special meaning to regular
expression patterns, and it turns out that you can't turn off that
special meaning by the normal means of preceding them with a backslash,
if you use them, paired, within a pattern delimited by them. For
example, in
m{foo\{1,3\}}
the backslashes do not change the behavior, and this matches
S<C<"f o">> followed by one to three more occurrences of C<"o">.
Usages like this, where they are interpreted as metacharacters, are
exceedingly rare; we think there are none, for example, in all of CPAN.
Hence, this deprecation should affect very little code. It does give
notice, however, that any such code needs to change, which will in turn
allow us to change the behavior in future Perl versions so that the
backslashes do have an effect, and without fear that we are silently
breaking any existing code.
=head2 Splitting the tokens C<(?> and C<(*> in regular expressions
A deprecation warning is now raised if the C<(> and C<?> are separated
by white space or comments in C<(?...)> regular expression constructs.
The same applies if the C<(> and C<*> are separated in C<(*VERB...)>
constructs.
=head2 Pre-PerlIO IO implementations
In theory, you can currently build perl without PerlIO. Instead, you'd use a
wrapper around stdio or sfio. In practice, this isn't very useful. It's not
well tested, and without any support for IO layers or (thus) Unicode, it's not
much of a perl. Building without PerlIO will most likely be removed in the
next version of perl.
PerlIO supports a C<stdio> layer if stdio use is desired. Similarly a
sfio layer could be produced in the future, if needed.
=head1 Future Deprecations
=over
=item *
Platforms without support infrastructure
Both Windows CE and z/OS have been historically under-maintained, and are
currently neither successfully building nor regularly being smoke tested.
Efforts are underway to change this situation, but it should not be taken for
granted that the platforms are safe and supported. If they do not become
buildable and regularly smoked, support for them may be actively removed in
future releases. If you have an interest in these platforms and you can lend
your time, expertise, or hardware to help support these platforms, please let
the perl development effort know by emailing C<perl5-porters@perl.org>.
Some platforms that appear otherwise entirely dead are also on the short list
for removal between now and v5.20.0:
=over
=item DG/UX
=item NeXT
=back
We also think it likely that current versions of Perl will no longer
build on AmigaOS, DJGPP, NetWare (natively), OS/2 and Plan 9. If you
are using Perl on such a platform and have an interest in ensuring
Perl's future on them, please contact us.
We believe that Perl has long been unable to build on mixed endian
architectures (such as PDP-11s), and intend to remove any remaining
support code. Similarly, code supporting the long unmaintained GNU
dld will be removed soon if no-one makes themselves known as an
active user.
=item *
Swapping of $< and $>
Perl has supported the idiom of swapping $< and $> (and likewise $( and
$)) to temporarily drop permissions since 5.0, like this:
($<, $>) = ($>, $<);
However, this idiom modifies the real user/group id, which can have
undesirable side-effects, is no longer useful on any platform perl
supports and complicates the implementation of these variables and list
assignment in general.
As an alternative, assignment only to C<< $> >> is recommended:
local $> = $<;
See also: L<Setuid Demystified|http://www.cs.berkeley.edu/~daw/papers/setuid-usenix02.pdf>.
=item *
C<microperl>, long broken and of unclear present purpose, will be removed.
=item *
Revamping C<< "\Q" >> semantics in double-quotish strings when combined with
other escapes.
There are several bugs and inconsistencies involving combinations
of C<\Q> and escapes like C<\x>, C<\L>, etc., within a C<\Q...\E> pair.
These need to be fixed, and doing so will necessarily change current
behavior. The changes have not yet been settled.
=item *
Use of C<$x>, where C<x> stands for any actual (non-printing) C0 control
character will be disallowed in a future Perl version. Use C<${x}>
instead (where again C<x> stands for a control character),
or better, C<$^A>, where C<^> is a caret (CIRCUMFLEX ACCENT),
and C<A> stands for any of the characters listed at the end of
L<perlebcdic/OPERATOR DIFFERENCES>.
=back
=head1 Performance Enhancements
=over 4
=item *
Lists of lexical variable declarations (C<my($x, $y)>) are now optimised
down to a single op and are hence faster than before.
=item *
A new C preprocessor define C<NO_TAINT_SUPPORT> was added that, if set,
disables Perl's taint support altogether. Using the -T or -t command
line flags will cause a fatal error. Beware that both core tests as
well as many a CPAN distribution's tests will fail with this change. On
the upside, it provides a small performance benefit due to reduced
branching.
B<Do not enable this unless you know exactly what you are getting yourself
into.>
=item *
C<pack> with constant arguments is now constant folded in most cases
[perl #113470].
=item *
Speed up in regular expression matching against Unicode properties. The
largest gain is for C<\X>, the Unicode "extended grapheme cluster." The
gain for it is about 35% - 40%. Bracketed character classes, e.g.,
C<[0-9\x{100}]> containing code points above 255 are also now faster.
=item *
On platforms supporting it, several former macros are now implemented as static
inline functions. This should speed things up slightly on non-GCC platforms.
=item *
The optimisation of hashes in boolean context has been extended to
affect C<scalar(%hash)>, C<%hash ? ... : ...>, and C<sub { %hash || ... }>.
=item *
Filetest operators manage the stack in a fractionally more efficient manner.
=item *
Globs used in a numeric context are now numified directly in most cases,
rather than being numified via stringification.
=item *
The C<x> repetition operator is now folded to a single constant at compile
time if called in scalar context with constant operands and no parentheses
around the left operand.
=back
=head1 Modules and Pragmata
=head2 New Modules and Pragmata
=over 4
=item *
L<Config::Perl::V> version 0.16 has been added as a dual-lifed module.
It provides structured data retrieval of C<perl -V> output including
information only known to the C<perl> binary and not available via L<Config>.
=back
=head2 Updated Modules and Pragmata
For a complete list of updates, run:
$ corelist --diff 5.16.0 5.18.0
You can substitute your favorite version in place of C<5.16.0>, too.
=over
=item *
L<Archive::Extract> has been upgraded to 0.68.
Work around an edge case on Linux with Busybox's unzip.
=item *
L<Archive::Tar> has been upgraded to 1.90.
ptar now supports the -T option as well as dashless options
[rt.cpan.org #75473], [rt.cpan.org #75475].
Auto-encode filenames marked as UTF-8 [rt.cpan.org #75474].
Don't use C<tell> on L<IO::Zlib> handles [rt.cpan.org #64339].
Don't try to C<chown> on symlinks.
=item *
L<autodie> has been upgraded to 2.13.
C<autodie> now plays nicely with the 'open' pragma.
=item *
L<B> has been upgraded to 1.42.
The C<stashoff> method of COPs has been added. This provides access to an
internal field added in perl 5.16 under threaded builds [perl #113034].
C<B::COP::stashpv> now supports UTF-8 package names and embedded NULs.
All C<CVf_*> and C<GVf_*>
and more SV-related flag values are now provided as constants in the C<B::>
namespace and available for export. The default export list has not changed.
This makes the module work with the new pad API.
=item *
L<B::Concise> has been upgraded to 0.95.
The C<-nobanner> option has been fixed, and C<format>s can now be dumped.
When passed a sub name to dump, it will check also to see whether it
is the name of a format. If a sub and a format share the same name,
it will dump both.
This adds support for the new C<OPpMAYBE_TRUEBOOL> and C<OPpTRUEBOOL> flags.
=item *
L<B::Debug> has been upgraded to 1.18.
This adds support (experimentally) for C<B::PADLIST>, which was
added in Perl 5.17.4.
=item *
L<B::Deparse> has been upgraded to 1.20.
Avoid warning when run under C<perl -w>.
It now deparses
loop controls with the correct precedence, and multiple statements in a
C<format> line are also now deparsed correctly.
This release suppresses trailing semicolons in formats.
This release adds stub deparsing for lexical subroutines.
It no longer dies when deparsing C<sort> without arguments. It now
correctly omits the comma for C<system $prog @args> and C<exec $prog
@args>.
=item *
L<bignum>, L<bigint> and L<bigrat> have been upgraded to 0.33.
The overrides for C<hex> and C<oct> have been rewritten, eliminating
several problems, and making one incompatible change:
=over
=item *
Formerly, whichever of C<use bigint> or C<use bigrat> was compiled later
would take precedence over the other, causing C<hex> and C<oct> not to
respect the other pragma when in scope.
=item *
Using any of these three pragmata would cause C<hex> and C<oct> anywhere
else in the program to evaluate their arguments in list context and prevent
them from inferring $_ when called without arguments.
=item *
Using any of these three pragmata would make C<oct("1234")> return 1234
(for any number not beginning with 0) anywhere in the program. Now "1234"
is translated from octal to decimal, whether within the pragma's scope or
not.
=item *
The global overrides that facilitate lexical use of C<hex> and C<oct> now
respect any existing overrides that were in place before the new overrides
were installed, falling back to them outside of the scope of C<use bignum>.
=item *
C<use bignum "hex">, C<use bignum "oct"> and similar invocations for bigint
and bigrat now export a C<hex> or C<oct> function, instead of providing a
global override.
=back
=item *
L<Carp> has been upgraded to 1.29.
Carp is no longer confused when C<caller> returns undef for a package that
has been deleted.
The C<longmess()> and C<shortmess()> functions are now documented.
=item *
L<CGI> has been upgraded to 3.63.
Unrecognized HTML escape sequences are now handled better, problematic
trailing newlines are no longer inserted after E<lt>formE<gt> tags
by C<startform()> or C<start_form()>, and bogus "Insecure Dependency"
warnings appearing with some versions of perl are now worked around.
=item *
L<Class::Struct> has been upgraded to 0.64.
The constructor now respects overridden accessor methods [perl #29230].
=item *
L<Compress::Raw::Bzip2> has been upgraded to 2.060.
The misuse of Perl's "magic" API has been fixed.
=item *
L<Compress::Raw::Zlib> has been upgraded to 2.060.
Upgrade bundled zlib to version 1.2.7.
Fix build failures on Irix, Solaris, and Win32, and also when building as C++
[rt.cpan.org #69985], [rt.cpan.org #77030], [rt.cpan.org #75222].
The misuse of Perl's "magic" API has been fixed.
C<compress()>, C<uncompress()>, C<memGzip()> and C<memGunzip()> have
been speeded up by making parameter validation more efficient.
=item *
L<CPAN::Meta::Requirements> has been upgraded to 2.122.
Treat undef requirements to C<from_string_hash> as 0 (with a warning).
Added C<requirements_for_module> method.
=item *
L<CPANPLUS> has been upgraded to 0.9135.
Allow adding F<blib/script> to PATH.
Save the history between invocations of the shell.
Handle multiple C<makemakerargs> and C<makeflags> arguments better.
This resolves issues with the SQLite source engine.
=item *
L<Data::Dumper> has been upgraded to 2.145.
It has been optimized to only build a seen-scalar hash as necessary,
thereby speeding up serialization drastically.
Additional tests were added in order to improve statement, branch, condition
and subroutine coverage. On the basis of the coverage analysis, some of the
internals of Dumper.pm were refactored. Almost all methods are now
documented.
=item *
L<DB_File> has been upgraded to 1.827.
The main Perl module no longer uses the C<"@_"> construct.
=item *
L<Devel::Peek> has been upgraded to 1.11.
This fixes compilation with C++ compilers and makes the module work with
the new pad API.
=item *
L<Digest::MD5> has been upgraded to 2.52.
Fix C<Digest::Perl::MD5> OO fallback [rt.cpan.org #66634].
=item *
L<Digest::SHA> has been upgraded to 5.84.
This fixes a double-free bug, which might have caused vulnerabilities
in some cases.
=item *
L<DynaLoader> has been upgraded to 1.18.
This is due to a minor code change in the XS for the VMS implementation.
This fixes warnings about using C<CODE> sections without an C<OUTPUT>
section.
=item *
L<Encode> has been upgraded to 2.49.
The Mac alias x-mac-ce has been added, and various bugs have been fixed
in Encode::Unicode, Encode::UTF7 and Encode::GSM0338.
=item *
L<Env> has been upgraded to 1.04.
Its SPLICE implementation no longer misbehaves in list context.
=item *
L<ExtUtils::CBuilder> has been upgraded to 0.280210.
Manifest files are now correctly embedded for those versions of VC++ which
make use of them [perl #111782, #111798].
A list of symbols to export can now be passed to C<link()> when on
Windows, as on other OSes [perl #115100].
=item *
L<ExtUtils::ParseXS> has been upgraded to 3.18.
The generated C code now avoids unnecessarily incrementing
C<PL_amagic_generation> on Perl versions where it's done automatically
(or on current Perl where the variable no longer exists).
This avoids a bogus warning for initialised XSUB non-parameters [perl
#112776].
=item *
L<File::Copy> has been upgraded to 2.26.
C<copy()> no longer zeros files when copying into the same directory,
and also now fails (as it has long been documented to do) when attempting
to copy a file over itself.
=item *
L<File::DosGlob> has been upgraded to 1.10.
The internal cache of file names that it keeps for each caller is now
freed when that caller is freed. This means
C<< use File::DosGlob 'glob'; eval 'scalar <*>' >> no longer leaks memory.
=item *
L<File::Fetch> has been upgraded to 0.38.
Added the 'file_default' option for URLs that do not have a file
component.
Use C<File::HomeDir> when available, and provide C<PERL5_CPANPLUS_HOME> to
override the autodetection.
Always re-fetch F<CHECKSUMS> if C<fetchdir> is set.
=item *
L<File::Find> has been upgraded to 1.23.
This fixes inconsistent unixy path handling on VMS.
Individual files may now appear in list of directories to be searched
[perl #59750].
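As a rough illustration of passing a plain file as a search root, the sketch below hands C<find()> a temporary file instead of a directory. The scratch file is an assumption made for the example.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Temp qw(tempfile);
use File::Basename qw(basename);

# A scratch file stands in for a plain file passed as a search root.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh or die "close failed: $!";

# Record every name the callback is invoked for.
my @seen;
find(sub { push @seen, $File::Find::name }, $path);

print "visited: $_\n" for @seen;
```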
=item *
L<File::Glob> has been upgraded to 1.20.
File::Glob has had exactly the same fix as File::DosGlob. Since it is
what Perl's own C<glob> operator itself uses (except on VMS), this means
C<< eval 'scalar <*>' >> no longer leaks.
A space-separated list of patterns that returns long lists of results no
longer causes memory corruption or crashes. This bug was introduced in
Perl 5.16.0. [perl #114984]
=item *
L<File::Spec::Unix> has been upgraded to 3.40.
C<abs2rel> could produce incorrect results when given two relative paths or
the root directory twice [perl #111510].
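The two cases mentioned above can be exercised directly. This sketch calls C<File::Spec::Unix> explicitly so that the results do not depend on the host platform; the paths are made up for the example.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Both arguments relative.
my $both_relative = File::Spec::Unix->abs2rel('a/b/c', 'a');

# The root directory given twice.
my $root_twice = File::Spec::Unix->abs2rel('/', '/');

# The ordinary absolute case, for comparison.
my $absolute = File::Spec::Unix->abs2rel('/t1/t2/t3', '/t1');

print "$both_relative $root_twice $absolute\n";
```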
=item *
L<File::stat> has been upgraded to 1.07.
C<File::stat> ignores the L<filetest> pragma, and warns when used in
combination therewith. But it was not warning for C<-r>. This has been
fixed [perl #111640].
C<-p> now works, and does not return false for pipes [perl #111638].
Previously C<File::stat>'s overloaded C<-x> and C<-X> operators did not give
the correct results for directories or executable files when running as
root. They had been treating executable permissions for root just like for
any other user, performing group membership tests I<etc> for files not owned
by root. They now follow the correct Unix behaviour - for a directory they
are always true, and for a file if any of the three execute permission bits
are set then they report that root can execute the file. Perl's builtin
C<-x> and C<-X> operators have always been correct.
=item *
L<File::Temp> has been upgraded to 0.23.
Fixes various bugs involving directory removal. Defers unlinking tempfiles if
the initial unlink fails, which fixes problems on NFS.
=item *
L<GDBM_File> has been upgraded to 1.15.
The undocumented optional fifth parameter to C<TIEHASH> has been
removed. This was intended to provide control of the callback used by
C<gdbm*> functions in case of fatal errors (such as filesystem problems),
but did not work (and could never have worked). No code on CPAN even
attempted to use it. The callback is now always the previous default,
C<croak>. Problems on some platforms with how the C<C> C<croak> function
is called have also been resolved.
=item *
L<Hash::Util> has been upgraded to 0.15.
C<hash_unlocked> and C<hashref_unlocked> now return true if the hash is
unlocked, instead of always returning false [perl #112126].
C<hash_unlocked>, C<hashref_unlocked>, C<lock_hash_recurse> and
C<unlock_hash_recurse> are now exportable [perl #112126].
Two new functions, C<hash_locked> and C<hashref_locked>, have been added.
Oddly enough, these two functions were already exported, even though they
did not exist [perl #112126].
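A minimal sketch of the lock-state predicates, assuming a recent Hash::Util in which all four functions are importable. The hash contents are illustrative.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_hash unlock_hash hash_locked hash_unlocked);

my %config = (host => 'localhost', port => 8080);

# While the hash is locked, hash_locked is true and hash_unlocked is false.
lock_hash(%config);
my $locked_after_lock   = hash_locked(%config)   ? 1 : 0;
my $unlocked_after_lock = hash_unlocked(%config) ? 1 : 0;

# After unlocking, hash_unlocked reports true again.
unlock_hash(%config);
my $unlocked_after_unlock = hash_unlocked(%config) ? 1 : 0;

print "$locked_after_lock $unlocked_after_lock $unlocked_after_unlock\n";
```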
=item *
L<HTTP::Tiny> has been upgraded to 0.025.
Add SSL verification features [github #6], [github #9].
Include the final URL in the response hashref.
Add C<local_address> option.
This improves SSL support.
=item *
L<IO> has been upgraded to 1.28.
C<sync()> can now be called on read-only file handles [perl #64772].
L<IO::Socket> tries harder to cache or otherwise fetch socket
information.
=item *
L<IPC::Cmd> has been upgraded to 0.80.
Use C<POSIX::_exit> instead of C<exit> in C<run_forked> [rt.cpan.org #76901].
=item *
L<IPC::Open3> has been upgraded to 1.13.
The C<open3()> function no longer uses C<POSIX::close()> to close file
descriptors since that breaks the ref-counting of file descriptors done by
PerlIO in cases where the file descriptors are shared by PerlIO streams,
leading to attempts to close the file descriptors a second time when
any such PerlIO streams are closed later on.
=item *
L<Locale::Codes> has been upgraded to 3.25.
It includes some new codes.
=item *
L<Memoize> has been upgraded to 1.03.
Fix the C<MERGE> cache option.
=item *
L<Module::Build> has been upgraded to 0.4003.
Fixed bug where modules without C<$VERSION> might have a version of '0' listed
in 'provides' metadata, which will be rejected by PAUSE.
Fixed bug in PodParser to allow numerals in module names.
Fixed bug where giving arguments twice led to them becoming arrays, resulting
in install paths like F<ARRAY(0xdeadbeef)/lib/Foo.pm>.
A minor bug fix allows markup to be used around the leading "Name" in
a POD "abstract" line, and some documentation improvements have been made.
=item *
L<Module::CoreList> has been upgraded to 2.90.
Version information is now stored as a delta, which greatly reduces the
size of the F<CoreList.pm> file.
This restores compatibility with older versions of perl and cleans up
the corelist data for various modules.
=item *
L<Module::Load::Conditional> has been upgraded to 0.54.
Fix use of C<requires> on perls installed to a path with spaces.
Various enhancements include the new use of Module::Metadata.
=item *
L<Module::Metadata> has been upgraded to 1.000011.
The creation of a Module::Metadata object for a typical module file has
been sped up by about 40%, and some spurious warnings about C<$VERSION>s
have been suppressed.
=item *
L<Module::Pluggable> has been upgraded to 4.7.
Amongst other changes, triggers are now allowed on events, which gives
a powerful way to modify behaviour.
=item *
L<Net::Ping> has been upgraded to 2.41.
This fixes some test failures on Windows.
=item *
L<Opcode> has been upgraded to 1.25.
Reflect the removal of the boolkeys opcode and the addition of the
clonecv, introcv and padcv opcodes.
=item *
L<overload> has been upgraded to 1.22.
C<no overload> now warns for invalid arguments, just like C<use overload>.
=item *
L<PerlIO::encoding> has been upgraded to 0.16.
This is the module implementing the ":encoding(...)" I/O layer. It no
longer corrupts memory or crashes when the encoding back-end reallocates
the buffer or gives it a typeglob or shared hash key scalar.
=item *
L<PerlIO::scalar> has been upgraded to 0.16.
The buffer scalar supplied may now only contain code points 0xFF or
lower. [perl #109828]
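The restriction can be observed by opening two in-memory handles for reading. This is a sketch only; the sample strings are arbitrary, and the warning is captured rather than printed.

```perl
use strict;
use warnings;

my $narrow = "caf\x{e9}\n";         # every code point is 0xFF or lower
my $wide   = "snowman \x{2603}\n";  # contains a code point above 0xFF

# Capture the warning that accompanies the failed open.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $narrow_ok = open(my $fh1, '<', \$narrow) ? 1 : 0;
my $wide_ok   = open(my $fh2, '<', \$wide)   ? 1 : 0;

print "narrow: $narrow_ok, wide: $wide_ok\n";
```

Encoding the string to bytes first (for example with C<Encode::encode>) is the usual way to put such text behind an in-memory handle.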
=item *
L<Perl::OSType> has been upgraded to 1.003.
This fixes a bug detecting the VOS operating system.
=item *
L<Pod::Html> has been upgraded to 1.18.
The option C<--libpods> has been reinstated. It is deprecated, and its use
does nothing other than issue a warning that it is no longer supported.
Since the HTML files generated by pod2html claim to have a UTF-8 charset,
actually write the files out using UTF-8 [perl #111446].
=item *
L<Pod::Simple> has been upgraded to 3.28.
Numerous improvements have been made, mostly to Pod::Simple::XHTML,
which also has a compatibility change: the C<codes_in_verbatim> option
is now disabled by default. See F<cpan/Pod-Simple/ChangeLog> for the
full details.
=item *
L<re> has been upgraded to 0.23.
Single character [class]es like C</[s]/> or C</[s]/i> are now optimized
as if they did not have the brackets, i.e. C</s/> or C</s/i>.
See note about C<op_comp> in the L</Internal Changes> section below.
=item *
L<Safe> has been upgraded to 2.35.
Fix interactions with C<Devel::Cover>.
Don't eval code under C<no strict>.
=item *
L<Scalar::Util> has been upgraded to version 1.27.
Fix an overloading issue with C<sum>.
C<first> and C<reduce> now check the callback first (so C<&first(1)> is
disallowed).
Fix C<tainted> on magical values [rt.cpan.org #55763].
Fix C<sum> on previously magical values [rt.cpan.org #61118].
Fix reading past the end of a fixed buffer [rt.cpan.org #72700].
=item *
L<Search::Dict> has been upgraded to 1.07.
No longer require C<stat> on filehandles.
Use C<fc> for casefolding.
=item *
L<Socket> has been upgraded to 2.009.
Constants and functions required for IP multicast source group membership
have been added.
C<unpack_sockaddr_in()> and C<unpack_sockaddr_in6()> now return just the IP
address in scalar context, and C<inet_ntop()> now guards against incorrect
length scalars being passed in.
This fixes an uninitialized memory read.
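The difference between list and scalar context can be sketched as follows. The port number and address are arbitrary example values.

```perl
use strict;
use warnings;
use Socket qw(pack_sockaddr_in unpack_sockaddr_in inet_aton inet_ntoa);

my $packed = pack_sockaddr_in(8080, inet_aton('127.0.0.1'));

# List context: port and packed address.
my ($port, $addr) = unpack_sockaddr_in($packed);

# Scalar context: just the packed address.
my $addr_only = unpack_sockaddr_in($packed);

my $dotted = inet_ntoa($addr_only);
print "port $port, address $dotted\n";
```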
=item *
L<Storable> has been upgraded to 2.41.
Modifying C<$_[0]> within C<STORABLE_freeze> no longer results in crashes
[perl #112358].
An object whose class implements C<STORABLE_attach> is now thawed only once
when there are multiple references to it in the structure being thawed
[perl #111918].
Restricted hashes were not always thawed correctly [perl #73972].
Storable would croak when freezing a blessed REF object with a
C<STORABLE_freeze()> method [perl #113880].
It can now freeze and thaw vstrings correctly. This causes a slight
incompatible change in the storage format, so the format version has
increased to 2.9.
This contains various bugfixes, including compatibility fixes for older
versions of Perl and vstring handling.
=item *
L<Sys::Syslog> has been upgraded to 0.32.
This contains several bug fixes relating to C<getservbyname()>,
C<setlogsock()> and log levels in C<syslog()>, together with fixes for
Windows, Haiku-OS and GNU/kFreeBSD. See F<cpan/Sys-Syslog/Changes>
for the full details.
=item *
L<Term::ANSIColor> has been upgraded to 4.02.
Add support for italics.
Improve error handling.
=item *
L<Term::ReadLine> has been upgraded to 1.10. This fixes the
use of the B<cpan> and B<cpanp> shells on Windows in the event that the current
drive happens to contain a F<\dev\tty> file.
=item *
L<Test::Harness> has been upgraded to 3.26.
Fix glob semantics on Win32 [rt.cpan.org #49732].
Don't use C<Win32::GetShortPathName> when calling perl [rt.cpan.org #47890].
Ignore -T when reading shebang [rt.cpan.org #64404].
Handle the case where we don't know the wait status of the test more
gracefully.
Make the test summary 'ok' line overridable so that it can be changed to a
plugin to make the output of prove idempotent.
Don't run world-writable files.
=item *
L<Text::Tabs> and L<Text::Wrap> have been upgraded to
2012.0818. Support for Unicode combining characters has been added to them
both.
=item *
L<threads::shared> has been upgraded to 1.31.
This adds the option to warn about or ignore attempts to clone structures
that can't be cloned, as opposed to just unconditionally dying in
that case.
This adds support for dual-valued values as created by
L<Scalar::Util::dualvar|Scalar::Util/"dualvar NUM, STRING">.
=item *
L<Tie::StdHandle> has been upgraded to 4.3.
C<READ> now respects the offset argument to C<read> [perl #112826].
=item *
L<Time::Local> has been upgraded to 1.2300.
Seconds values greater than 59 but less than 60 no longer cause
C<timegm()> and C<timelocal()> to croak.
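A small sketch of the fractional-seconds case, assuming a Time::Local at least as new as the one described here. The date is arbitrary.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# 2012-12-31 23:59:59 UTC with whole seconds.
my $whole = timegm(59, 59, 23, 31, 11, 2012);

# The same instant plus half a second; eval guards against a croak
# on older versions of the module.
my $fractional = eval { timegm(59.5, 59, 23, 31, 11, 2012) };

print defined $fractional ? "accepted\n" : "croaked: $@";
```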
=item *
L<Unicode::UCD> has been upgraded to 0.53.
This adds a function L<all_casefolds()|Unicode::UCD/all_casefolds()>
that returns all the casefolds.
=item *
L<Win32> has been upgraded to 0.47.
New APIs have been added for getting and setting the current code page.
=back
=head2 Removed Modules and Pragmata
=over
=item *
L<Version::Requirements> has been removed from the core distribution. It is
available under a different name: L<CPAN::Meta::Requirements>.
=back
=head1 Documentation
=head2 Changes to Existing Documentation
=head3 L<perlcheat>
=over 4
=item *
L<perlcheat> has been reorganized, and a few new sections were added.
=back
=head3 L<perldata>
=over 4
=item *
Now explicitly documents the behaviour of hash initializer lists that
contain duplicate keys.
=back
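For reference, the documented behaviour is that the last occurrence of a duplicated key wins. A minimal sketch, with made-up keys and values:

```perl
use strict;
use warnings;

# "sky" appears twice in the initializer list; the later value is kept.
my %colour = (sky => 'grey', grass => 'green', sky => 'blue');

my $key_count = keys %colour;
print "sky is $colour{sky}, $key_count keys\n";
```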
=head3 L<perldiag>
=over 4
=item *
The explanation of symbolic references being prevented by "strict refs"
now doesn't assume that the reader knows what symbolic references are.
=back
=head3 L<perlfaq>
=over 4
=item *
L<perlfaq> has been synchronized with version 5.0150040 from CPAN.
=back
=head3 L<perlfunc>
=over 4
=item *
The return value of C<pipe> is now documented.
=item *
Clarified documentation of C<our>.
=back
=head3 L<perlop>
=over 4
=item *
Loop control verbs (C<dump>, C<goto>, C<next>, C<last> and C<redo>) have always
had the same precedence as assignment operators, but this was not documented
until now.
=back
=head1 Diagnostics
The following additions or changes have been made to diagnostic output,
including warnings and fatal error messages. For the complete list of
diagnostic messages, see L<perldiag>.
=head2 New Diagnostics
=head3 New Errors
=over 4
=item *
L<Unterminated delimiter for here document|perldiag/"Unterminated delimiter for here document">
This message now occurs when a here document label has an initial quotation
mark but the final quotation mark is missing.
This replaces a bogus and misleading error message about not finding the label
itself [perl #114104].
=item *
L<panic: child pseudo-process was never scheduled|perldiag/"panic: child pseudo-process was never scheduled">
This error is thrown when a child pseudo-process in the ithreads implementation
on Windows was not scheduled within the time period allowed and therefore was
not able to initialize properly [perl #88840].
=item *
L<Group name must start with a non-digit word character in regex; marked by <-- HERE in mE<sol>%sE<sol>|perldiag/"Group name must start with a non-digit word character in regex; marked by <-- HERE in m/%s/">
This error has been added for C<(?&0)>, which is invalid. It used to
produce an incomprehensible error message [perl #101666].
=item *
L<Can't use an undefined value as a subroutine reference|perldiag/"Can't use an undefined value as %s reference">
Calling an undefined value as a subroutine now produces this error message.
It used to, but was accidentally disabled, first in Perl 5.004 for
non-magical variables, and then in Perl v5.14 for magical (e.g., tied)
variables. It has now been restored. In the meantime, undef was treated
as an empty string [perl #113576].
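The restored error can be trapped with C<eval>. This is a sketch; C<$callback> is a hypothetical variable that is deliberately left undefined.

```perl
use strict;
use warnings;

my $callback;    # deliberately left undefined

# Calling through the undefined value dies; eval traps the error.
my $ok    = eval { $callback->(); 1 };
my $error = $ok ? '' : $@;

print "error: $error";
```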
=item *
L<Experimental "%s" subs not enabled|perldiag/"Experimental "%s" subs not enabled">
To use lexical subs, you must first enable them:
no warnings 'experimental::lexical_subs';
use feature 'lexical_subs';
my sub foo { ... }
=back
=head3 New Warnings
=over 4
=item *
L<'Strings with code points over 0xFF may not be mapped into in-memory file handles'|perldiag/"Strings with code points over 0xFF may not be mapped into in-memory file handles">
=item *
L<'%s' resolved to '\o{%s}%d'|perldiag/"'%s' resolved to '\o{%s}%d'">
=item *
L<'Trailing white-space in a charnames alias definition is deprecated'|perldiag/"Trailing white-space in a charnames alias definition is deprecated">
=item *
L<'A sequence of multiple spaces in a charnames alias definition is deprecated'|perldiag/"A sequence of multiple spaces in a charnames alias definition is deprecated">
=item *
L<'Passing malformed UTF-8 to "%s" is deprecated'|perldiag/"Passing malformed UTF-8 to "%s" is deprecated">
=item *
L<Subroutine "&%s" is not available|perldiag/"Subroutine "&%s" is not available">
(W closure) During compilation, an inner named subroutine or eval is
attempting to capture an outer lexical subroutine that is not currently
available. This can happen for one of two reasons. First, the lexical
subroutine may be declared in an outer anonymous subroutine that has not
yet been created. (Remember that named subs are created at compile time,
while anonymous subs are created at run-time.) For example,
sub { my sub a {...} sub f { \&a } }
At the time that f is created, it can't capture the current "a" sub,
since the anonymous subroutine hasn't been created yet. Conversely, the
following won't give a warning since the anonymous subroutine has by now
been created and is live:
sub { my sub a {...} eval 'sub f { \&a }' }->();
The second situation is caused by an eval accessing a variable that has
gone out of scope, for example,
sub f {
my sub a {...}
sub { eval '\&a' }
}
f()->();
Here, when the '\&a' in the eval is being compiled, f() is not currently
being executed, so its &a is not available for capture.
=item *
L<"%s" subroutine &%s masks earlier declaration in same %s|perldiag/"%s" subroutine &%s masks earlier declaration in same %s>
(W misc) A "my" or "state" subroutine has been redeclared in the
current scope or statement, effectively eliminating all access to
the previous instance. This is almost always a typographical error.
Note that the earlier subroutine will still exist until the end of
the scope or until all closure references to it are destroyed.
=item *
L<The %s feature is experimental|perldiag/"The %s feature is experimental">
(S experimental) This warning is emitted if you enable an experimental
feature via C<use feature>. Simply suppress the warning if you want
to use the feature, but know that in doing so you are taking the risk
of using an experimental feature which may change or be removed in a
future Perl version:
no warnings "experimental::lexical_subs";
use feature "lexical_subs";
=item *
L<sleep(%u) too large|perldiag/"sleep(%u) too large">
(W overflow) You called C<sleep> with a number that was larger than it can
reliably handle and C<sleep> probably slept for less time than requested.
=item *
L<Wide character in setenv|perldiag/"Wide character in %s">
Attempts to put wide characters into environment variables via C<%ENV> now
provoke this warning.
=item *
"L<Invalid negative number (%s) in chr|perldiag/"Invalid negative number (%s) in chr">"
C<chr()> now warns when passed a negative value [perl #83048].
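A sketch of how the warning can be observed. The value is held in a variable so that the call is evaluated at run time, after the warning handler has been installed.

```perl
use strict;
use warnings;

my @warnings;
my $n = -1;

# Capture the warning instead of printing it.
my $char = do {
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    chr($n);
};

printf "got U+%04X and %d warning(s)\n", ord($char), scalar @warnings;
```

Outside the C<bytes> pragma, the result is the Unicode replacement character, as described in L<perlfunc/chr>.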
=item *
"L<Integer overflow in srand|perldiag/"Integer overflow in srand">"
C<srand()> now warns when passed a value that doesn't fit in a C<UV> (since the
value will be truncated rather than overflowing) [perl #40605].
=item *
"L<-i used with no filenames on the command line, reading from STDIN|perldiag/"-i used with no filenames on the command line, reading from STDIN">"
Running perl with the C<-i> flag now warns if no input files are provided on
the command line [perl #113410].
=back
=head2 Changes to Existing Diagnostics
=over 4
=item *
L<$* is no longer supported|perldiag/"$* is no longer supported">
The warning that use of C<$*> and C<$#> is no longer supported is now
generated for every location that references them. Previously it would fail
to be generated if another variable using the same typeglob was seen first
(e.g. C<@*> before C<$*>), and would not be generated for the second and
subsequent uses. (It's hard to fix the failure to generate warnings at all
without also generating them every time, and warning every time is
consistent with the warnings that C<$[> used to generate.)
=item *
The warnings for C<\b{> and C<\B{> were added. They are a deprecation
warning which should be turned off by that category. One should not
have to turn off regular regexp warnings as well to get rid of these.
=item *
L<Constant(%s): Call to &{$^H{%s}} did not return a defined value|perldiag/Constant(%s): Call to &{$^H{%s}} did not return a defined value>
Constant overloading that returns C<undef> results in this error message.
For numeric constants, it used to say "Constant(undef)". "undef" has been
replaced with the number itself.
=item *
The error produced when a module cannot be loaded now includes a hint that
the module may need to be installed: "Can't locate hopping.pm in @INC (you
may need to install the hopping module) (@INC contains: ...)"
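The hint can be seen by requiring a module that does not exist. In this sketch, C<Hopping::Mad::Nonexistent> is a hypothetical name chosen because it should not be installed anywhere.

```perl
use strict;
use warnings;

# String eval so that the failed require is trapped at run time.
my $ok    = eval "require Hopping::Mad::Nonexistent; 1";
my $error = $ok ? '' : $@;

print $error;
```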
=item *
L<vector argument not supported with alpha versions|perldiag/vector argument not supported with alpha versions>
This warning was not suppressible, even with C<no warnings>. Now it is
suppressible, and has been moved from the "internal" category to the
"printf" category.
=item *
C<< Can't do {n,m} with n > m in regex; marked by <-- HERE in m/%s/ >>
This fatal error has been turned into a warning that reads:
L<< Quantifier {n,m} with n > m can't match in regex | perldiag/Quantifier {n,m} with n > m can't match in regex >>
(W regexp) Minima should be less than or equal to maxima. If you really want
your regexp to match something 0 times, just put {0}.
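The change from a fatal error to a warning can be sketched as follows. The pattern is interpolated from a variable so that it is compiled at run time, after the warning handler is in place.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Minimum greater than maximum: compiles with a warning, never matches.
my $pattern = 'a{3,1}';
my $re      = eval { qr/$pattern/ };
my $matched = (defined($re) && 'aaa' =~ $re) ? 1 : 0;

print "compiled: ", (defined $re ? "yes" : "no"), ", matched: $matched\n";
```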
=item *
The "Runaway prototype" warning that occurs in bizarre cases has been
removed as being unhelpful and inconsistent.
=item *
The "Not a format reference" error has been removed, as the only case in
which it could be triggered was a bug.
=item *
The "Unable to create sub named %s" error has been removed for the same
reason.
=item *
The 'Can't use "my %s" in sort comparison' error has been downgraded to a
warning, '"my %s" used in sort comparison' (with 'state' instead of 'my'
for state variables). In addition, the heuristics for guessing whether
lexical $a or $b has been misused have been improved to generate fewer
false positives. Lexical $a and $b are no longer disallowed if they are
outside the sort block. Also, a named unary or list operator inside the
sort block no longer causes the $a or $b to be ignored [perl #86136].
=back
=head1 Utility Changes
=head3 L<h2xs>
=over 4
=item *
F<h2xs> no longer produces invalid code for empty defines. [perl #20636]
=back
=head1 Configuration and Compilation
=over 4
=item *
Added C<useversionedarchname> option to Configure
When set, it includes 'api_versionstring' in 'archname'. E.g.
x86_64-linux-5.13.6-thread-multi. It is unset by default.
This feature was requested by Tim Bunce, who observed that
C<INSTALL_BASE> creates a library structure that does not
differentiate by perl version. Instead, it places architecture
specific files in "$install_base/lib/perl5/$archname". This makes
it difficult to use a common C<INSTALL_BASE> library path with
multiple versions of perl.
By setting C<-Duseversionedarchname>, the $archname will be
distinct for architecture I<and> API version, allowing mixed use of
C<INSTALL_BASE>.
=item *
Add a C<PERL_NO_INLINE_FUNCTIONS> option
If C<PERL_NO_INLINE_FUNCTIONS> is defined, don't include "inline.h".
This permits test code to include the perl headers for definitions without
creating a link dependency on the perl library (which may not exist yet).
=item *
Configure will honour the external C<MAILDOMAIN> environment variable, if set.
=item *
C<installman> no longer ignores the silent option.
=item *
Both C<META.yml> and C<META.json> files are now included in the distribution.
=item *
F<Configure> will now correctly detect C<isblank()> when compiling with a C++
compiler.
=item *
The pager detection in F<Configure> has been improved to allow responses which
specify options after the program name, e.g. B</usr/bin/less -R>, if the user
accepts the default value. This helps B<perldoc> when handling ANSI escapes
[perl #72156].
=back
=head1 Testing
=over 4
=item *
The test suite now has a section for tests that require very large amounts
of memory. These tests won't run by default; they can be enabled by
setting the C<PERL_TEST_MEMORY> environment variable to the number of
gibibytes of memory that may be safely used.
=back
=head1 Platform Support
=head2 Discontinued Platforms
=over 4
=item BeOS
BeOS was an operating system for personal computers developed by Be Inc,
initially for their BeBox hardware. The OS Haiku was written as an open
source replacement for/continuation of BeOS, and its perl port is current and
actively maintained.
=item UTS Global
Support code relating to UTS Global has been removed. UTS was a mainframe
version of System V created by Amdahl, subsequently sold to UTS Global. The
port has not been touched since before Perl v5.8.0, and UTS Global is now
defunct.
=item VM/ESA
Support for VM/ESA has been removed. The port was tested on 2.3.0, for
which IBM ended service in March 2002. 2.4.0 ended service in June 2003, and
was superseded by Z/VM. The current version of Z/VM is V6.2.0, which is
scheduled for end of service on 2015/04/30.
=item MPE/IX
Support for MPE/IX has been removed.
=item EPOC
Support code relating to EPOC has been removed. EPOC was a family of
operating systems developed by Psion for mobile devices. It was the
predecessor of Symbian. The port was last updated in April 2002.
=item Rhapsody
Support for Rhapsody has been removed.
=back
=head2 Platform-Specific Notes
=head3 AIX
Configure now always adds C<-qlanglvl=extc99> to the CC flags on AIX when
using xlC. This will make it easier to compile a number of XS-based modules
that assume C99 [perl #113778].
=head3 clang++
There is now a workaround for a compiler bug that prevented compiling
with clang++ since Perl v5.15.7 [perl #112786].
=head3 C++
When compiling the Perl core as C++ (which is only semi-supported), the
mathom functions are now compiled as C<extern "C">, to ensure proper
binary compatibility. (However, binary compatibility isn't generally
guaranteed anyway in the situations where this would matter.)
=head3 Darwin
Stop hardcoding an alignment on 8 byte boundaries to fix builds using
-Dusemorebits.
=head3 Haiku
Perl should now work out of the box on Haiku R1 Alpha 4.
=head3 MidnightBSD
C<libc_r> was removed from recent versions of MidnightBSD and older versions
work better with C<pthread>. Threading is now enabled using C<pthread> which
corrects build errors with threading enabled on 0.4-CURRENT.
=head3 Solaris
In Configure, avoid running sed commands with flags not supported on Solaris.
=head3 VMS
=over
=item *
Where possible, the case of filenames and command-line arguments is now
preserved by enabling the CRTL features C<DECC$EFS_CASE_PRESERVE> and
C<DECC$ARGV_PARSE_STYLE> at start-up time. The latter only takes effect
when extended parse is enabled in the process from which Perl is run.
=item *
The character set for Extended Filename Syntax (EFS) is now enabled by default
on VMS. Among other things, this provides better handling of dots in directory
names, multiple dots in filenames, and spaces in filenames. To obtain the old
behavior, set the logical name C<DECC$EFS_CHARSET> to C<DISABLE>.
=item *
Fixed linking on builds configured with C<-Dusemymalloc=y>.
=item *
Experimental support for building Perl with the HP C++ compiler is available
by configuring with C<-Dusecxx>.
=item *
All C header files from the top-level directory of the distribution are now
installed on VMS, providing consistency with a long-standing practice on other
platforms. Previously only a subset were installed, which broke non-core
extension builds for extensions that depended on the missing include files.
=item *
Quotes are now removed from the command verb (but not the parameters) for
commands spawned via C<system>, backticks, or a piped C<open>. Previously,
quotes on the verb were passed through to DCL, which would fail to recognize
the command. Also, if the verb is actually a path to an image or command
procedure on an ODS-5 volume, quoting it now allows the path to contain spaces.
=item *
The B<a2p> build has been fixed for the HP C++ compiler on OpenVMS.
=back
=head3 Win32
=over
=item *
Perl can now be built using Microsoft's Visual C++ 2012 compiler by specifying
CCTYPE=MSVC110 (or MSVC110FREE if you are using the free Express edition for
Windows Desktop) in F<win32/Makefile>.
=item *
The option to build without C<USE_SOCKETS_AS_HANDLES> has been removed.
=item *
Fixed a problem where perl could crash while cleaning up threads (including the
main thread) in threaded debugging builds on Win32 and possibly other platforms
[perl #114496].
=item *
A rare race condition that would lead to L<sleep|perlfunc/sleep> taking more
time than requested, and possibly even hanging, has been fixed [perl #33096].
=item *
C<link> on Win32 now attempts to set C<$!> to more appropriate values
based on the Win32 API error code. [perl #112272]
Perl no longer mangles the environment block, e.g. when launching a new
sub-process, when the environment contains non-ASCII characters. Known
problems still remain, however, when the environment contains characters
outside of the current ANSI codepage (e.g. see the item about Unicode in
C<%ENV> in L<http://perl5.git.perl.org/perl.git/blob/HEAD:/Porting/todo.pod>).
[perl #113536]
=item *
Building perl with some Windows compilers used to fail due to a problem
with miniperl's C<glob> operator (which uses the C<perlglob> program)
deleting the PATH environment variable [perl #113798].
=item *
A new makefile option, C<USE_64_BIT_INT>, has been added to the Windows
makefiles. Set this to "define" when building a 32-bit perl if you want
it to use 64-bit integers.
Machine code size reductions, already made to the DLLs of XS modules in
Perl v5.17.2, have now been extended to the perl DLL itself.
Building with VC++ 6.0 was inadvertently broken in Perl v5.17.2 but has
now been fixed again.
=back
=head3 WinCE
Building on WinCE is now possible once again, although more work is required
to fully restore a clean build.
=head1 Internal Changes
=over
=item *
Synonyms for the misleadingly named C<av_len()> have been created:
C<av_top_index()> and C<av_tindex>. All three of these return the
number of the highest index in the array, not the number of elements it
contains.
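The distinction between the highest index and the element count is the same one that exists at the Perl level, where L<perlapi> gives C<$#array> as the equivalent of C<av_top_index()>. A small Perl-level sketch, with an arbitrary array:

```perl
use strict;
use warnings;

my @queue = qw(alpha beta gamma);

my $top_index = $#queue;          # highest index
my $count     = scalar @queue;    # number of elements

# An empty array has a highest index of -1.
my @empty     = ();
my $empty_top = $#empty;

print "top index $top_index, count $count, empty top $empty_top\n";
```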
=item *
SvUPGRADE() is no longer an expression. Originally this macro (and its
underlying function, sv_upgrade()) were documented as boolean, although
in reality they always croaked on error and never returned false. In 2005
the documentation was updated to specify a void return value, but
SvUPGRADE() was left always returning 1 for backwards compatibility. This
has now been removed, and SvUPGRADE() is now a statement with no return
value.
So this is now a syntax error:
if (!SvUPGRADE(sv)) { croak(...); }
If you have code like that, simply replace it with
SvUPGRADE(sv);
or to avoid compiler warnings with older perls, possibly
(void)SvUPGRADE(sv);
=item *
Perl has a new copy-on-write mechanism that allows any SvPOK scalar to be
upgraded to a copy-on-write scalar. A reference count on the string buffer
is stored in the string buffer itself. This feature is B<not enabled by
default>.
It can be enabled in a perl build by running F<Configure> with
B<-Accflags=-DPERL_NEW_COPY_ON_WRITE>, and we would encourage XS authors
to try their code with such an enabled perl, and provide feedback.
Unfortunately, there is not yet a good guide to updating XS code to cope
with COW. Until such a document is available, consult the perl5-porters
mailing list.
It breaks a few XS modules by allowing copy-on-write scalars to go
through code paths that never encountered them before.
=item *
Copy-on-write no longer uses the SvFAKE and SvREADONLY flags. Hence,
SvREADONLY indicates a true read-only SV.
Use the SvIsCOW macro (as before) to identify a copy-on-write scalar.
=item *
C<PL_glob_index> is gone.
=item *
The private Perl_croak_no_modify has had its context parameter removed. It
now has a void prototype. Users of the public API croak_no_modify remain
unaffected.
=item *
Copy-on-write (shared hash key) scalars are no longer marked read-only.
C<SvREADONLY> returns false on such an SV, but C<SvIsCOW> still returns
true.
=item *
A new op type, C<OP_PADRANGE>, has been introduced. The perl peephole
optimiser will, where possible, substitute a single padrange op for a
pushmark followed by one or more pad ops, and possibly also skipping list
and nextstate ops. In addition, the op can carry out the tasks associated
with the RHS of a C<< my(...) = @_ >> assignment, so those ops may be optimised
away too.
=item *
Case-insensitive matching inside a [bracketed] character class with a
multi-character fold no longer excludes one of the possibilities in the
circumstances that it used to [perl #89774].
=item *
C<PL_formfeed> has been removed.
=item *
The regular expression engine no longer reads one byte past the end of the
target string. While for all internally well-formed scalars this should
never have been a problem, this change facilitates clever tricks with
string buffers in CPAN modules. [perl #73542]
=item *
Inside a BEGIN block, C<PL_compcv> now points to the currently-compiling
subroutine, rather than the BEGIN block itself.
=item *
C<mg_length> has been deprecated.
=item *
C<sv_len> now always returns a byte count and C<sv_len_utf8> a character
count. Previously, C<sv_len> and C<sv_len_utf8> were both buggy and would
sometimes return bytes and sometimes characters. C<sv_len_utf8> no longer
assumes that its argument is in UTF-8. Neither of these creates UTF-8 caches
for tied or overloaded values or for non-PVs any more.
=item *
C<sv_mortalcopy> now copies string buffers of shared hash key scalars when
called from XS modules [perl #79824].
=item *
The new C<RXf_MODIFIES_VARS> flag can be set by custom regular expression
engines to indicate that the execution of the regular expression may cause
variables to be modified. This lets C<s///> know to skip certain
optimisations. Perl's own regular expression engine sets this flag for the
special backtracking verbs that set $REGMARK and $REGERROR.
=item *
The APIs for accessing lexical pads have changed considerably.
C<PADLIST>s are no longer C<AV>s, but their own type instead.
C<PADLIST>s now contain a C<PAD> and a C<PADNAMELIST> of C<PADNAME>s,
rather than C<AV>s for the pad and the list of pad names. C<PAD>s,
C<PADNAMELIST>s, and C<PADNAME>s are to be accessed as such through the
newly added pad API instead of the plain C<AV> and C<SV> APIs. See
L<perlapi> for details.
=item *
In the regex API, the numbered capture callbacks are passed an index
indicating what match variable is being accessed. There are special
index values for the C<$`, $&, $'> variables. Previously the same three
values were used to retrieve C<${^PREMATCH}, ${^MATCH}, ${^POSTMATCH}>
too, but these have now been assigned three separate values. See
L<perlreapi/Numbered capture callbacks>.
=item *
C<PL_sawampersand> was previously a boolean indicating that any of
C<$`, $&, $'> had been seen; it now contains three one-bit flags
indicating the presence of each of the variables individually.
=item *
The C<CV *> typemap entry now supports C<&{}> overloading and typeglobs,
just like C<&{...}> [perl #96872].
=item *
The C<SVf_AMAGIC> flag to indicate overloading is now on the stash, not the
object. It is now set automatically whenever a method or @ISA changes, so
its meaning has changed, too. It now means "potentially overloaded". When
the overload table is calculated, the flag is automatically turned off if
there is no overloading, so there should be no noticeable slowdown.
The staleness of the overload tables is now checked when overload methods
are invoked, rather than during C<bless>.
"A" magic is gone. The changes to the handling of the C<SVf_AMAGIC> flag
eliminate the need for it.
C<PL_amagic_generation> has been removed as no longer necessary. For XS
modules, it is now a macro alias to C<PL_na>.
The fallback overload setting is now stored in a stash entry separate from
overloadedness itself.
=item *
The character-processing code has been cleaned up in places. The changes
should be operationally invisible.
=item *
The C<study> function was made a no-op in v5.16. It was simply disabled via
a C<return> statement; the code was left in place. Now the code supporting
what C<study> used to do has been removed.
=item *
Under threaded perls, there is no longer a separate PV allocated for every
COP to store its package name (C<< cop->stashpv >>). Instead, there is an
offset (C<< cop->stashoff >>) into the new C<PL_stashpad> array, which
holds stash pointers.
=item *
In the pluggable regex API, the C<regexp_engine> struct has acquired a new
field C<op_comp>, which is currently just for perl's internal use, and
should be initialized to NULL by other regex plugin modules.
=item *
A new function C<alloccopstash> has been added to the API, but is considered
experimental. See L<perlapi>.
=item *
Perl used to implement get magic in a way that would sometimes hide bugs in
code that could call mg_get() too many times on magical values. This hiding of
errors no longer occurs, so long-standing bugs may become visible now. If
you see magic-related errors in XS code, check to make sure it, together
with the Perl API functions it uses, calls mg_get() only once on SvGMAGICAL()
values.
=item *
OP allocation for CVs now uses a slab allocator. This simplifies
memory management for OPs allocated to a CV, so cleaning up after a
compilation error is simpler and safer [perl #111462][perl #112312].
=item *
C<PERL_DEBUG_READONLY_OPS> has been rewritten to work with the new slab
allocator, allowing it to catch more violations than before.
=item *
The old slab allocator for ops, which was only enabled for C<PERL_IMPLICIT_SYS>
and C<PERL_DEBUG_READONLY_OPS>, has been retired.
=back
=head1 Selected Bug Fixes
=over 4
=item *
Here document terminators no longer require a terminating newline character when
they occur at the end of a file. This was already the case at the end of a
string eval [perl #65838].
=item *
C<-DPERL_GLOBAL_STRUCT> builds now free the global struct B<after>
they've finished using it.
=item *
A trailing '/' on a path in @INC will no longer have an additional '/'
appended.
=item *
The C<:crlf> layer now works when unread data doesn't fit into its own
buffer. [perl #112244].
=item *
C<ungetc()> now handles UTF-8 encoded data. [perl #116322].
=item *
A bug in the core typemap caused any C types that map to the T_BOOL core
typemap entry to not be set, updated, or modified when the T_BOOL variable was
used in an OUTPUT: section with an exception for RETVAL. T_BOOL in an INPUT:
section was not affected. Using a T_BOOL return type for an XSUB (RETVAL)
was not affected. A side effect of fixing this bug is, if a T_BOOL is specified
in the OUTPUT: section (which previously did nothing to the SV), and a read-only
SV (literal) is passed to the XSUB, croaks like "Modification of a read-only
value attempted" will happen. [perl #115796]
=item *
On many platforms, providing a directory name as the script name caused perl
to do nothing and report success. It should now universally report an error
and exit nonzero. [perl #61362]
=item *
C<sort {undef} ...> under fatal warnings no longer crashes. It had
begun crashing in Perl v5.16.
=item *
Stashes blessed into each other
(C<bless \%Foo::, 'Bar'; bless \%Bar::, 'Foo'>) no longer result in double
frees. This bug started happening in Perl v5.16.
=item *
Numerous memory leaks have been fixed, mostly involving fatal warnings and
syntax errors.
=item *
Some failed regular expression matches such as C<'f' =~ /../g> were not
resetting C<pos>. "Match-once" patterns (C<m?...?g>) also failed to reset
it when invoked a second time [perl #23180].
=item *
Several bugs involving C<local *ISA> and C<local *Foo::> causing stale
MRO caches have been fixed.
=item *
Defining a subroutine when its typeglob has been aliased no longer results
in stale method caches. This bug was introduced in Perl v5.10.
=item *
Localising a typeglob containing a subroutine when the typeglob's package
has been deleted from its parent stash no longer produces an error. This
bug was introduced in Perl v5.14.
=item *
Under some circumstances, C<local *method=...> would fail to reset method
caches upon scope exit.
=item *
C</[.foo.]/> is no longer an error, but produces a warning (as before) and
is treated as C</[.fo]/> [perl #115818].
=item *
C<goto $tied_var> now calls FETCH before deciding what type of goto
(subroutine or label) this is.
=item *
Renaming packages through glob assignment
(C<*Foo:: = *Bar::; *Bar:: = *Baz::>) in combination with C<m?...?> and
C<reset> no longer makes threaded builds crash.
=item *
A number of bugs related to assigning a list to hash have been fixed. Many of
these involve lists with repeated keys like C<(1, 1, 1, 1)>.
=over 4
=item *
The expression C<scalar(%h = (1, 1, 1, 1))> now returns C<4>, not C<2>.
=item *
The return value of C<%h = (1, 1, 1)> in list context was wrong. Previously
this would return C<(1, undef, 1)>; now it returns C<(1, undef)>.
=item *
Perl now issues the same warning on C<($s, %h) = (1, {})> as it does for
C<(%h) = ({})>, "Reference found where even-sized list expected".
=item *
A number of additional edge cases in list assignment to hashes were
corrected. For more details see commit 23b7025ebc.
=back
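As a rough illustration of the first point above, the following sketch
(assuming perl v5.18 or later; the variable names are only illustrative)
shows that the scalar-context result counts the elements on the right-hand
side, while the hash itself ends up with a single key:

```perl
use strict;
use warnings;

my %h;

# List assignment in scalar context yields the number of
# right-hand-side elements, even when keys are repeated.
my $count = scalar(%h = (1, 1, 1, 1));

# The hash itself only holds the one distinct key.
my $keys = keys %h;

print "elements assigned: $count, distinct keys: $keys\n";
```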
=item *
Attributes applied to lexical variables no longer leak memory.
[perl #114764]
=item *
C<dump>, C<goto>, C<last>, C<next>, C<redo> or C<require> followed by a
bareword (or version) and then an infix operator is no longer a syntax
error. It used to be for those infix operators (like C<+>) that have a
different meaning where a term is expected. [perl #105924]
=item *
C<require a::b . 1> and C<require a::b + 1> no longer produce erroneous
ambiguity warnings. [perl #107002]
=item *
Class method calls are now allowed on any string, and not just strings
beginning with an alphanumeric character. [perl #105922]
=item *
An empty pattern created with C<qr//> used in C<m///> no longer triggers
the "empty pattern reuses last pattern" behaviour. [perl #96230]
=item *
Tying a hash during iteration no longer results in a memory leak.
=item *
Freeing a tied hash during iteration no longer results in a memory leak.
=item *
List assignment to a tied array or hash that dies on STORE no longer
results in a memory leak.
=item *
If the hint hash (C<%^H>) is tied, compile-time scope entry (which copies
the hint hash) no longer leaks memory if FETCH dies. [perl #107000]
=item *
Constant folding no longer inappropriately triggers the special
C<split " "> behaviour. [perl #94490]
=item *
C<defined scalar(@array)>, C<defined do { &foo }>, and similar constructs
now treat the argument to C<defined> as a simple scalar. [perl #97466]
=item *
Running a custom debugger that defines no C<*DB::DB> glob or provides a
subroutine stub for C<&DB::DB> no longer results in a crash, but an error
instead. [perl #114990]
=item *
C<reset ""> now matches its documentation. C<reset> only resets C<m?...?>
patterns when called with no argument. An empty string for an argument now
does nothing. (It used to be treated as no argument.) [perl #97958]
=item *
C<printf> with an argument returning an empty list no longer reads past the
end of the stack, resulting in erratic behaviour. [perl #77094]
=item *
C<--subname> no longer produces erroneous ambiguity warnings.
[perl #77240]
=item *
C<v10> is now allowed as a label or package name. This was inadvertently
broken when v-strings were added in Perl v5.6. [perl #56880]
=item *
C<length>, C<pos>, C<substr> and C<sprintf> could be confused by ties,
overloading, references and typeglobs if the stringification of such
changed the internal representation to or from UTF-8. [perl #114410]
=item *
utf8::encode now calls FETCH and STORE on tied variables. utf8::decode now
calls STORE (it was already calling FETCH).
=item *
C<$tied =~ s/$non_utf8/$utf8/> no longer loops infinitely if the tied
variable returns a Latin-1 string, shared hash key scalar, or reference or
typeglob that stringifies as ASCII or Latin-1. This was a regression from
v5.12.
=item *
C<s///> without /e is now better at detecting when it needs to forgo
certain optimisations, fixing some buggy cases:
=over
=item *
Match variables in certain constructs (C<&&>, C<||>, C<..> and others) in
the replacement part; e.g., C<s/(.)/$l{$a||$1}/g>. [perl #26986]
=item *
Aliases to match variables in the replacement.
=item *
C<$REGERROR> or C<$REGMARK> in the replacement. [perl #49190]
=item *
An empty pattern (C<s//$foo/>) that causes the last-successful pattern to
be used, when that pattern contains code blocks that modify the variables
in the replacement.
=back
=item *
The taintedness of the replacement string no longer affects the taintedness
of the return value of C<s///e>.
=item *
The C<$|> autoflush variable is created on-the-fly when needed. If this
happened (e.g., if it was mentioned in a module or eval) when the
currently-selected filehandle was a typeglob with an empty IO slot, it used
to crash. [perl #115206]
=item *
Line numbers at the end of a string eval are no longer off by one.
[perl #114658]
=item *
@INC filters (subroutines returned by subroutines in @INC) that set $_ to a
copy-on-write scalar no longer cause the parser to modify that string
buffer in place.
=item *
C<length($object)> no longer returns the undefined value if the object has
string overloading that returns undef. [perl #115260]
=item *
The use of C<PL_stashcache>, the stash name lookup cache for method calls, has
been restored.
Commit da6b625f78f5f133 in August 2011 inadvertently broke the code that looks
up values in C<PL_stashcache>. As it's only a cache, everything quite correctly
carried on working without it.
=item *
The error "Can't localize through a reference" had disappeared in v5.16.0
when C<local %$ref> appeared on the last line of an lvalue subroutine.
This error disappeared for C<\local %$ref> in perl v5.8.1. It has now
been restored.
=item *
The parsing of here-docs has been improved significantly, fixing several
parsing bugs and crashes and one memory leak, and correcting wrong
subsequent line numbers under certain conditions.
=item *
Inside an eval, the error message for an unterminated here-doc no longer
has a newline in the middle of it [perl #70836].
=item *
A substitution inside a substitution pattern (C<s/${s|||}//>) no longer
confuses the parser.
=item *
It may be an odd place to allow comments, but C<s//"" # hello/e> has
always worked, I<unless> there happens to be a null character before the
first #. Now it works even in the presence of nulls.
=item *
An invalid range in C<tr///> or C<y///> no longer results in a memory leak.
=item *
String eval no longer treats a semicolon-delimited quote-like operator at
the very end (C<eval 'q;;'>) as a syntax error.
=item *
C<< warn {$_ => 1} + 1 >> is no longer a syntax error. The parser used to
get confused with certain list operators followed by an anonymous hash and
then an infix operator that shares its form with a unary operator.
=item *
C<(caller $n)[6]> (which gives the text of the eval) used to return the
actual parser buffer. Modifying it could result in crashes. Now it always
returns a copy. The string returned no longer has "\n;" tacked on to the
end. The returned text also includes here-doc bodies, which used to be
omitted.
=item *
The UTF-8 position cache is now reset when accessing magical variables, to
avoid the string buffer and the UTF-8 position cache getting out of sync
[perl #114410].
=item *
Various cases of get magic being called twice for magical UTF-8
strings have been fixed.
=item *
This code (when not in the presence of C<$&> etc)
    $_ = 'x' x 1_000_000;
    1 while /(.)/;
used to skip the buffer copy for performance reasons, but suffered from C<$1>
etc changing if the original string changed. That's now been fixed.
=item *
Perl doesn't use PerlIO anymore to report out of memory messages, as PerlIO
might attempt to allocate more memory.
=item *
In a regular expression, if something is quantified with C<{n,m}> where
C<S<n E<gt> m>>, it can't possibly match. Previously this was a fatal
error, but now is merely a warning (and that something won't match).
[perl #82954].
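A small sketch of the new behaviour, assuming perl v5.18 or later. The
C<regexp> warning category is disabled here only to keep the output quiet;
the pattern and variable name are illustrative:

```perl
use strict;
use warnings;

my $matched;
{
    # Silence the "Quantifier {n,m} with n > m can't match" warning.
    no warnings 'regexp';

    # The pattern compiles, but the quantified atom can never match.
    $matched = ("aaa" =~ /a{3,1}/) ? 1 : 0;
}

print "matched: $matched\n";
```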
=item *
It used to be possible for formats defined in subroutines that have
subsequently been undefined and redefined to close over variables in the
wrong pad (the newly-defined enclosing sub), resulting in crashes or
"Bizarre copy" errors.
=item *
Redefinition of XSUBs at run time could produce warnings with the wrong
line number.
=item *
The %vd sprintf format does not support version objects for alpha versions.
It used to output the format itself (%vd) when passed an alpha version, and
also emit an "Invalid conversion in printf" warning. It no longer does,
but produces the empty string in the output. It also no longer leaks
memory in this case.
=item *
C<< $obj->SUPER::method >> calls in the main package could fail if the
SUPER package had already been accessed by other means.
=item *
Stash aliasing (C<< *foo:: = *bar:: >>) no longer causes SUPER calls to ignore
changes to methods or @ISA or use the wrong package.
=item *
Method calls on packages whose names end in ::SUPER are no longer treated
as SUPER method calls, resulting in failure to find the method.
Furthermore, defining subroutines in such packages no longer causes them to
be found by SUPER method calls on the containing package [perl #114924].
=item *
C<\w> now matches the code points U+200C (ZERO WIDTH NON-JOINER) and U+200D
(ZERO WIDTH JOINER). C<\W> no longer matches these. This change is because
Unicode corrected their definition of what C<\w> should match.
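One way to check this, sketched under the assumption of perl v5.18 or
later (the variable names are illustrative):

```perl
use strict;
use warnings;

my $zwnj = "\x{200C}";   # ZERO WIDTH NON-JOINER
my $zwj  = "\x{200D}";   # ZERO WIDTH JOINER

# Both join controls are now word characters...
my $zwnj_is_word = ($zwnj =~ /\w/) ? 1 : 0;
my $zwj_is_word  = ($zwj  =~ /\w/) ? 1 : 0;

# ...and therefore no longer match \W.
my $zwj_is_nonword = ($zwj =~ /\W/) ? 1 : 0;

print "ZWNJ \\w: $zwnj_is_word, ZWJ \\w: $zwj_is_word, ZWJ \\W: $zwj_is_nonword\n";
```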
=item *
C<dump LABEL> no longer leaks its label.
=item *
Constant folding no longer changes the behaviour of functions like C<stat()>
and C<truncate()> that can take either filenames or handles.
C<stat 1 ? foo : bar> now treats its argument as a file name (since it is an
arbitrary expression), rather than the handle "foo".
=item *
C<truncate FOO, $len> no longer falls back to treating "FOO" as a file name if
the filehandle has been deleted. This was broken in Perl v5.16.0.
=item *
Subroutine redefinitions after sub-to-glob and glob-to-glob assignments no
longer cause double frees or panic messages.
=item *
C<s///> now turns vstrings into plain strings when performing a substitution,
even if the resulting string is the same (C<s/a/a/>).
=item *
Prototype mismatch warnings no longer erroneously treat constant subs as having
no prototype when they actually have "".
=item *
Constant subroutines and forward declarations no longer prevent prototype
mismatch warnings from omitting the sub name.
=item *
C<undef> on a subroutine now clears call checkers.
=item *
The C<ref> operator started leaking memory on blessed objects in Perl v5.16.0.
This has been fixed [perl #114340].
=item *
C<use> no longer tries to parse its arguments as a statement, which used to
make C<use constant { () };> a syntax error [perl #114222].
=item *
On debugging builds, "uninitialized" warnings inside formats no longer cause
assertion failures.
=item *
On debugging builds, subroutines nested inside formats no longer cause
assertion failures [perl #78550].
=item *
Formats and C<use> statements are now permitted inside formats.
=item *
C<print $x> and C<sub { print $x }-E<gt>()> now always produce the same output.
It was possible for the latter to refuse to close over $x if the variable was
not active; e.g., if it was defined outside a currently-running named
subroutine.
=item *
Similarly, C<print $x> and C<print eval '$x'> now produce the same output.
This also allows "my $x if 0" variables to be seen in the debugger [perl
#114018].
=item *
Formats called recursively no longer stomp on their own lexical variables, but
each recursive call has its own set of lexicals.
=item *
Attempting to free an active format or the handle associated with it no longer
results in a crash.
=item *
Format parsing no longer gets confused by braces, semicolons and low-precedence
operators. It used to be possible to use braces as format delimiters (instead
of C<=> and C<.>), but only sometimes. Semicolons and low-precedence operators
in format argument lines no longer confuse the parser into ignoring the line's
return value. In format argument lines, braces can now be used for anonymous
hashes, instead of being treated always as C<do> blocks.
=item *
Formats can now be nested inside code blocks in regular expressions and other
quoted constructs (C</(?{...})/> and C<qq/${...}/>) [perl #114040].
=item *
Formats are no longer created after compilation errors.
=item *
Under debugging builds, the B<-DA> command line option started crashing in Perl
v5.16.0. It has been fixed [perl #114368].
=item *
A potential deadlock scenario involving the premature termination of a pseudo-
forked child in a Windows build with ithreads enabled has been fixed. This
resolves the common problem of the F<t/op/fork.t> test hanging on Windows [perl
#88840].
=item *
The code which generates errors from C<require()> could potentially read one or
two bytes before the start of the filename for filenames less than three bytes
long and ending C</\.p?\z/>. This has now been fixed. Note that it could
never have happened with module names given to C<use()> or C<require()> anyway.
=item *
The handling of pathnames of modules given to C<require()> has been made
thread-safe on VMS.
=item *
Non-blocking sockets have been fixed on VMS.
=item *
Pod can now be nested in code inside a quoted construct outside of a string
eval. This used to work only within string evals [perl #114040].
=item *
C<goto ''> now looks for an empty label, producing the "goto must have
label" error message, instead of exiting the program [perl #111794].
=item *
C<goto "\0"> now dies with "Can't find label" instead of "goto must have
label".
=item *
The C function C<hv_store> used to result in crashes when used on C<%^H>
[perl #111000].
=item *
A call checker attached to a closure prototype via C<cv_set_call_checker>
is now copied to closures cloned from it. So C<cv_set_call_checker> now
works inside an attribute handler for a closure.
=item *
Writing to C<$^N> used to have no effect. Now it croaks with "Modification
of a read-only value" by default, but that can be overridden by a custom
regular expression engine, as with C<$1> [perl #112184].
=item *
C<undef> on a control character glob (C<undef *^H>) no longer emits an
erroneous warning about ambiguity [perl #112456].
=item *
For efficiency's sake, many operators and built-in functions return the
same scalar each time. Lvalue subroutines and subroutines in the CORE::
namespace were allowing this implementation detail to leak through.
C<print &CORE::uc("a"), &CORE::uc("b")> used to print "BB". The same thing
would happen with an lvalue subroutine returning the return value of C<uc>.
Now the value is copied in such cases.
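The C<uc> case can be sketched as follows, assuming perl v5.18 or later
(C<join> is used instead of C<print> so the result can be inspected):

```perl
use strict;
use warnings;

# Each call now yields its own copy instead of the same shared scalar,
# so the first result is not overwritten by the second call.
my $joined = join '', &CORE::uc("a"), &CORE::uc("b");

print "$joined\n";
```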
=item *
C<method {}> syntax with an empty block or a block returning an empty list
used to crash or use some random value left on the stack as its invocant.
Now it produces an error.
=item *
C<vec> now works with extremely large offsets (E<gt>2 GB) [perl #111730].
=item *
Changes to overload settings now take effect immediately, as do changes to
inheritance that affect overloading. They used to take effect only after
C<bless>.
Objects that were created before a class had any overloading used to remain
non-overloaded even if the class gained overloading through C<use overload>
or @ISA changes, and even after C<bless>. This has been fixed
[perl #112708].
=item *
Classes with overloading can now inherit fallback values.
=item *
Overloading was not respecting a fallback value of 0 if there were
overloaded objects on both sides of an assignment operator like C<+=>
[perl #111856].
=item *
C<pos> now croaks with hash and array arguments, instead of producing
erroneous warnings.
=item *
C<while(each %h)> now implies C<while(defined($_ = each %h))>, like
C<readline> and C<readdir>.
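A minimal sketch of what this means in practice, assuming perl v5.18 or
later. The hash contents are made up; the key C<0> is included because a
plain truth test would have stopped on it:

```perl
use strict;
use warnings;

my %h = (0 => 'zero', a => 'A');
my @seen;

# Equivalent to: while (defined($_ = each %h)) { ... }
while (each %h) {
    push @seen, $_;
}

@seen = sort @seen;
print "keys seen: @seen\n";
```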
=item *
Subs in the CORE:: namespace no longer crash after C<undef *_> when called
with no argument list (C<&CORE::time> with no parentheses).
=item *
C<unpack> no longer produces the "'/' must follow a numeric type in unpack"
error when it is the data that are at fault [perl #60204].
=item *
C<join> and C<"@array"> now call FETCH only once on a tied C<$">
[perl #8931].
=item *
Some subroutine calls generated by compiling core ops affected by a
C<CORE::GLOBAL> override had op checking performed twice. The checking
is always idempotent for pure Perl code, but the double checking can
matter when custom call checkers are involved.
=item *
A race condition used to exist around fork that could cause a signal sent to
the parent to be handled by both parent and child. Signals are now blocked
briefly around fork to prevent this from happening [perl #82580].
=item *
The implementation of code blocks in regular expressions, such as C<(?{})>
and C<(??{})>, has been heavily reworked to eliminate a whole slew of bugs.
The main user-visible changes are:
=over 4
=item *
Code blocks within patterns are now parsed in the same pass as the
surrounding code; in particular it is no longer necessary to have balanced
braces: this now works:
    /(?{ $x='{' })/
This means that this error message is no longer generated:
    Sequence (?{...}) not terminated or not {}-balanced in regex
but a new error may be seen:
    Sequence (?{...}) not terminated with ')'
In addition, literal code blocks within run-time patterns are only
compiled once, at perl compile-time:
    for my $p (...) {
        # this 'FOO' block of code is compiled once,
        # at the same time as the surrounding 'for' loop
        /$p(?{FOO;})/;
    }
=item *
Lexical variables are now sane as regards scope, recursion and closure
behavior. In particular, C</A(?{B})C/> behaves (from a closure viewpoint)
exactly like C</A/ && do { B } && /C/>, while C<qr/A(?{B})C/> is like
C<sub {/A/ && do { B } && /C/}>. So this code now works how you might
expect, creating three regexes that match 0, 1, and 2:
    for my $i (0..2) {
        push @r, qr/^(??{$i})$/;
    }
    "1" =~ $r[1]; # matches
=item *
The C<use re 'eval'> pragma is now only required for code blocks defined
at runtime; in particular in the following, the text of the C<$r> pattern is
still interpolated into the new pattern and recompiled, but the individual
compiled code-blocks within C<$r> are reused rather than being recompiled,
and C<use re 'eval'> isn't needed any more:
    my $r = qr/abc(?{....})def/;
    /xyz$r/;
=item *
Flow control operators no longer crash. Each code block runs in a new
dynamic scope, so C<next> etc. will not see
any enclosing loops. C<return> returns a value
from the code block, not from any enclosing subroutine.
=item *
Perl normally caches the compilation of run-time patterns, and doesn't
recompile if the pattern hasn't changed, but this is now disabled if
required for the correct behavior of closures. For example:
    my $code = '(??{$x})';
    for my $x (1..3) {
        # recompile to see fresh value of $x each time
        $x =~ /$code/;
    }
=item *
The C</msix> and C<(?msix)> etc. flags are now propagated into the return
value from C<(??{})>; this now works:
    "AB" =~ /a(??{'b'})/i;
=item *
Warnings and errors will appear to come from the surrounding code (or for
run-time code blocks, from an eval) rather than from an C<re_eval>:
    use re 'eval'; $c = '(?{ warn "foo" })'; /$c/;
    /(?{ warn "foo" })/;
formerly gave:
    foo at (re_eval 1) line 1.
    foo at (re_eval 2) line 1.
and now gives:
    foo at (eval 1) line 1.
    foo at /some/prog line 2.
=back
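Several of the points above can be exercised together. The following is
only a sketch, assuming perl v5.18 or later; the variable names are
illustrative and not taken from the perl sources:

```perl
use strict;
use warnings;

# 1. A literal code block may now contain unbalanced braces.
my $x;
"a" =~ /a(?{ $x = '{' })/;

# 2. The /i flag is propagated into the pattern returned by (??{}).
my $case_insensitive = ("AB" =~ /a(??{'b'})/i) ? 1 : 0;

# 3. qr// closes over $i like an anonymous sub would, so each
#    compiled regex remembers its own value of $i.
my @r;
for my $i (0 .. 2) {
    push @r, qr/^(??{$i})$/;
}
my @hits = map { ("1" =~ $r[$_]) ? 1 : 0 } 0 .. 2;

print "x=$x ci=$case_insensitive hits=@hits\n";
```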
=item *
Perl now can be recompiled to use any Unicode version. In v5.16, it
worked on Unicodes 6.0 and 6.1, but there were various bugs if earlier
releases were used; the older the release the more problems.
=item *
C<vec> no longer produces "uninitialized" warnings in lvalue context
[perl #9423].
=item *
An optimization involving fixed strings in regular expressions could cause
a severe performance penalty in edge cases. This has been fixed
[perl #76546].
=item *
In certain cases, including empty subpatterns within a regular expression (such
as C<(?:)> or C<(?:|)>) could disable some optimizations. This has been fixed.
=item *
The "Can't find an opnumber" message that C<prototype> produces when passed
a string like "CORE::nonexistent_keyword" now passes UTF-8 and embedded
NULs through unchanged [perl #97478].
=item *
C<prototype> now treats magical variables like C<$1> the same way as
non-magical variables when checking for the CORE:: prefix, instead of
treating them as subroutine names.
=item *
Under threaded perls, a runtime code block in a regular expression could
corrupt the package name stored in the op tree, resulting in bad reads
in C<caller>, and possibly crashes [perl #113060].
=item *
Referencing a closure prototype (C<\&{$_[1]}> in an attribute handler for a
closure) no longer results in a copy of the subroutine (or assertion
failures on debugging builds).
=item *
C<eval '__PACKAGE__'> now returns the right answer on threaded builds if
the current package has been assigned over (as in
C<*ThisPackage:: = *ThatPackage::>) [perl #78742].
=item *
If a package is deleted by code that it calls, it is possible for C<caller>
to see a stack frame belonging to that deleted package. C<caller> could
crash if the stash's memory address was reused for a scalar and a
substitution was performed on the same scalar [perl #113486].
=item *
C<UNIVERSAL::can> no longer treats its first argument differently
depending on whether it is a string or number internally.
=item *
C<open> with C<< <& >> for the mode checks to see whether the third argument is
a number, in determining whether to treat it as a file descriptor or a handle
name. Magical variables like C<$1> were always failing the numeric check and
being treated as handle names.
=item *
C<warn>'s handling of magical variables (C<$1>, ties) has undergone several
fixes. C<FETCH> is only called once now on a tied argument or a tied C<$@>
[perl #97480]. Tied variables returning objects that stringify as "" are
no longer ignored. A tied C<$@> that happened to return a reference the
I<previous> time it was used is no longer ignored.
=item *
C<warn ""> now treats C<$@> with a number in it the same way, regardless of
whether it happened via C<$@=3> or C<$@="3">. It used to ignore the
former. Now it appends "\t...caught", as it has always done with
C<$@="3">.
=item *
Numeric operators on magical variables (e.g., S<C<$1 + 1>>) used to use
floating point operations even where integer operations were more appropriate,
resulting in loss of accuracy on 64-bit platforms [perl #109542].
=item *
Unary negation no longer treats a string as a number if the string happened
to be used as a number at some point. So, if C<$x> contains the string "dogs",
C<-$x> returns "-dogs" even if C<$y=0+$x> has happened at some point.
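For instance (a sketch assuming perl v5.18 or later; the C<numeric>
warning is disabled only so that the deliberate numeric use stays quiet):

```perl
use strict;
use warnings;

my $x = "dogs";

{
    no warnings 'numeric';
    my $y = 0 + $x;      # use the string as a number at some point
}

# Negation still treats $x as the string it is.
my $neg = -$x;

print "$neg\n";
```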
=item *
In Perl v5.14, C<-'-10'> was fixed to return "10", not "+10". But magical
variables (C<$1>, ties) were not fixed till now [perl #57706].
=item *
Unary negation now treats strings consistently, regardless of the internal
C<UTF8> flag.
=item *
A regression introduced in Perl v5.16.0 involving
C<tr/I<SEARCHLIST>/I<REPLACEMENTLIST>/> has been fixed. Only the first
instance is supposed to be meaningful if a character appears more than
once in C<I<SEARCHLIST>>. Under some circumstances, the final instance
was overriding all earlier ones. [perl #113584]
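The intended rule can be seen in a small sketch like this one (the sample
string is made up; behaviour as described for perl v5.18 or later):

```perl
use strict;
use warnings;

my $s = "banana";

# 'a' appears twice in the search list; only the first mapping
# (a => x) is meaningful, so the second (a => y) is ignored.
(my $t = $s) =~ tr/aa/xy/;

print "$t\n";
```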
=item *
Regular expressions like C<qr/\87/> previously silently inserted a NUL
character, thus matching as if it had been written C<qr/\00087/>. Now it
matches as if it had been written as C<qr/87/>, with a message that the
sequence C<"\8"> is unrecognized.
=item *
C<__SUB__> now works in special blocks (C<BEGIN>, C<END>, etc.).
=item *
Thread creation on Windows could theoretically result in a crash if done
inside a C<BEGIN> block. It still does not work properly, but it no longer
crashes [perl #111610].
=item *
C<\&{''}> (with the empty string) now autovivifies a stub like any other
sub name, and no longer produces the "Unable to create sub" error
[perl #94476].
=item *
A regression introduced in v5.14.0 has been fixed, in which some calls
to the C<re> module would clobber C<$_> [perl #113750].
=item *
C<do FILE> now always either sets or clears C<$@>, even when the file can't be
read. This ensures that testing C<$@> first (as recommended by the
documentation) always returns the correct result.
=item *
The array iterator used for the C<each @array> construct is now correctly
reset when C<@array> is cleared [perl #75596]. This happens, for example, when
the array is globally assigned to, as in C<@array = (...)>, but not when its
B<values> are assigned to. In terms of the XS API, it means that C<av_clear()>
will now reset the iterator.
This mirrors the behaviour of the hash iterator when the hash is cleared.
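At the Perl level the effect can be sketched like this, assuming perl
v5.18 or later (the array contents and variable names are illustrative):

```perl
use strict;
use warnings;

my @array = qw(a b c);

# Advance the iterator past the first element.
my ($first_index) = each @array;

# Assigning to the whole array clears it, which resets the iterator.
@array = qw(x y z);

# Iteration therefore starts again from index 0.
my ($index, $value) = each @array;

print "index=$index value=$value\n";
```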
=item *
C<< $class->can >>, C<< $class->isa >>, and C<< $class->DOES >> now return
correct results, regardless of whether the package referred to by C<$class>
exists [perl #47113].
=item *
Arriving signals no longer clear C<$@> [perl #45173].
=item *
C<my ()> declarations with an empty variable list are now allowed [perl #113554].
=item *
During parsing, subs declared after errors no longer leave stubs
[perl #113712].
=item *
Closures containing no string evals no longer hang on to their containing
subroutines, allowing variables closed over by outer subroutines to be
freed when the outer sub is freed, even if the inner sub still exists
[perl #89544].
=item *
Duplication of in-memory filehandles by opening with a "<&=" or ">&=" mode
stopped working properly in v5.16.0. It was causing the new handle to
reference a different scalar variable. This has been fixed [perl #113764].
=item *
C<qr//> expressions no longer crash with custom regular expression engines
that do not set C<offs> at regular expression compilation time
[perl #112962].
=item *
C<delete local> no longer crashes with certain magical arrays and hashes
[perl #112966].
=item *
C<local> on elements of certain magical arrays and hashes used not to
arrange to have the element deleted on scope exit, even if the element did
not exist before C<local>.
=item *
C<scalar(write)> no longer returns multiple items [perl #73690].
=item *
String to floating point conversions no longer misparse certain strings under
C<use locale> [perl #109318].
=item *
C<@INC> filters that die no longer leak memory [perl #92252].
=item *
The implementations of overloaded operations are now called in the correct
context. This allows, among other things, being able to properly override
C<< <> >> [perl #47119].
=item *
Specifying only the C<fallback> key when calling C<use overload> now behaves
properly [perl #113010].
=item *
C<< sub foo { my $a = 0; while ($a) { ... } } >> and
C<< sub foo { while (0) { ... } } >> now return the same thing [perl #73618].
=item *
String negation now behaves the same under C<use integer;> as it does
without [perl #113012].
=item *
C<chr> now returns the Unicode replacement character (U+FFFD) for -1,
regardless of the internal representation. -1 used to wrap if the argument
was tied or a string internally.
=item *
Using a C<format> after its enclosing sub was freed could crash as of
perl v5.12.0, if the format referenced lexical variables from the outer sub.
=item *
Using a C<format> after its enclosing sub was undefined could crash as of
perl v5.10.0, if the format referenced lexical variables from the outer sub.
=item *
Using a C<format> defined inside a closure, which format references
lexical variables from outside, never really worked unless the C<write>
call was directly inside the closure. In v5.10.0 it even started crashing.
Now the copy of that closure nearest the top of the call stack is used to
find those variables.
=item *
Formats that close over variables in special blocks no longer crash if a
stub exists with the same name as the special block before the special
block is compiled.
=item *
The parser no longer gets confused, treating C<eval foo ()> as a syntax
error if preceded by C<print;> [perl #16249].
=item *
The return value of C<syscall> is no longer truncated on 64-bit platforms
[perl #113980].
=item *
Constant folding no longer causes C<print 1 ? FOO : BAR> to print to the
FOO handle [perl #78064].
=item *
C<do subname> now calls the named subroutine and uses the file name it
returns, instead of opening a file named "subname".
=item *
Subroutines looked up by rv2cv check hooks (registered by XS modules) are
now taken into consideration when determining whether C<foo bar> should be
the sub call C<foo(bar)> or the method call C<< "bar"->foo >>.
=item *
C<CORE::foo::bar> is no longer treated specially, allowing global overrides
to be called directly via C<CORE::GLOBAL::uc(...)> [perl #113016].
=item *
Calling an undefined sub whose typeglob has been undefined now produces the
customary "Undefined subroutine called" error, instead of "Not a CODE
reference".
=item *
Two bugs involving @ISA have been fixed. C<*ISA = *glob_without_array> and
C<undef *ISA; @{*ISA}> would prevent future modifications to @ISA from
updating the internal caches used to look up methods. The
*glob_without_array case was a regression from Perl v5.12.
=item *
Regular expression optimisations sometimes caused C<$> with C</m> to
produce failed or incorrect matches [perl #114068].
=item *
C<__SUB__> now works in a C<sort> block when the enclosing subroutine is
predeclared with C<sub foo;> syntax [perl #113710].
=item *
Unicode properties only apply to Unicode code points, which leads to
some subtleties when regular expressions are matched against
above-Unicode code points. There is a warning generated to draw your
attention to this. However, this warning was being generated
inappropriately in some cases, such as when a program was being parsed.
Non-Unicode matches such as C<\w> and C<[:word:]> should not generate the
warning, as their definitions don't limit them to apply to only Unicode
code points. Now the message is only generated when matching against
C<\p{}> and C<\P{}>. There remains a bug, [perl #114148], for the very
few properties in Unicode that match just a single code point. The
warning is not generated if they are matched against an above-Unicode
code point.
=item *
Uninitialized warnings mentioning hash elements would only mention the
element name if it was not in the first bucket of the hash, due to an
off-by-one error.
=item *
A regular expression optimizer bug could cause multiline "^" to behave
incorrectly in the presence of line breaks, such that
C<"/\n\n" =~ m#\A(?:^/$)#im> would not match [perl #115242].
=item *
Failed C<fork> in list context no longer corrupts the stack.
C<@a = (1, 2, fork, 3)> used to gobble up the 2 and assign C<(1, undef, 3)>
if the C<fork> call failed.
=item *
Numerous memory leaks have been fixed, mostly involving tied variables that
die, regular expression character classes and code blocks, and syntax
errors.
=item *
Assigning a regular expression (C<${qr//}>) to a variable that happens to
hold a floating point number no longer causes assertion failures on
debugging builds.
=item *
Assigning a regular expression to a scalar containing a number no longer
causes subsequent numification to produce random numbers.
=item *
Assigning a regular expression to a magic variable no longer wipes away the
magic. This was a regression from v5.10.
=item *
Assigning a regular expression to a blessed scalar no longer results in
crashes. This was also a regression from v5.10.
=item *
Regular expressions can now be assigned to tied hash and array elements without
flattening into strings.
=item *
Numifying a regular expression no longer results in an uninitialized
warning.
=item *
Negative array indices no longer cause EXISTS methods of tied variables to
be ignored. This was a regression from v5.12.
=item *
Negative array indices no longer result in crashes on arrays tied to
non-objects.
=item *
C<$byte_overload .= $utf8> no longer results in doubly-encoded UTF-8 if the
left-hand scalar happened to have produced a UTF-8 string the last time
overloading was invoked.
=item *
C<goto &sub> now uses the current value of @_, instead of using the array
the subroutine was originally called with. This means
C<local @_ = (...); goto &sub> now works [perl #43077].
=item *
If a debugger is invoked recursively, it no longer stomps on its own
lexical variables. Formerly under recursion all calls would share the same
set of lexical variables [perl #115742].
=item *
C<*_{ARRAY}> returned from a subroutine no longer spontaneously
becomes empty.
=item *
When using C<say> to print to a tied filehandle, the value of C<$\> is
correctly localized, even if it was previously undef. [perl #119927]
=back
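A few of the fixes listed above can be observed directly. The following is a minimal sketch, assuming perl v5.18.0 or later; the sub names C<show_args> and C<trampoline> and the missing file path are made up for illustration.

```perl
use strict;
use warnings;

# 1. each @array: clearing the array resets its iterator.
my @array = qw(a b c);
my ($first_index) = each @array;    # advance the iterator past index 0
@array = qw(x y z);                 # list assignment clears the array first
my ($index_after_clear, $value_after_clear) = each @array;

# 2. goto &sub uses the current @_, so a localized @_ is passed on.
sub show_args { return join ',', @_ }

sub trampoline {
    local @_ = ('replaced', 'args');
    goto &show_args;
}
my $seen = trampoline('original');

# 3. do FILE clears a stale $@ even when the file cannot be read.
#    The path below is hypothetical and is expected not to exist.
$@ = 'stale error';
my $loaded         = do '/nonexistent-dir-for-demo/missing.pl';
my $error_after_do = $@;

print "each restarted at index $index_after_clear ($value_after_clear)\n";
print "goto &sub saw: $seen\n";
print "do FILE left \$@ ", ($error_after_do eq '' ? 'empty' : 'set'), "\n";
```

On older perls the first two results differ, and C<$@> may keep its stale value after the failed C<do>.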
=head1 Known Problems
=over 4
=item *
UTF8-flagged strings in C<%ENV> on HP-UX 11.00 are buggy
The interaction of UTF8-flagged strings and C<%ENV> on HP-UX 11.00 is
currently dodgy in some not-yet-fully-diagnosed way. Expect test
failures in F<t/op/magic.t>, followed by unknown behavior when storing
wide characters in the environment.
=back
=head1 Obituary
Hojung Yoon (AMORETTE), 24, of Seoul, South Korea, went to his long rest
on May 8, 2013 with llama figurine and autographed TIMTOADY card. He
was a brilliant young Perl 5 & 6 hacker and a devoted member of
Seoul.pm. He programmed Perl, talked Perl, ate Perl, and loved Perl. We
believe that he is still programming in Perl with his broken IBM laptop
somewhere. He will be missed.
=head1 Acknowledgements
Perl v5.18.0 represents approximately 12 months of development since
Perl v5.16.0 and contains approximately 400,000 lines of changes across
2,100 files from 113 authors.
Perl continues to flourish into its third decade thanks to a vibrant
community of users and developers. The following people are known to
have contributed the improvements that became Perl v5.18.0:
Aaron Crane, Aaron Trevena, Abhijit Menon-Sen, Adrian M. Enache, Alan
Haggai Alavi, Alexandr Ciornii, Andrew Tam, Andy Dougherty, Anton Nikishaev,
Aristotle Pagaltzis, Augustina Blair, Bob Ernst, Brad Gilbert, Breno G. de
Oliveira, Brian Carlson, Brian Fraser, Charlie Gonzalez, Chip Salzenberg, Chris
'BinGOs' Williams, Christian Hansen, Colin Kuskie, Craig A. Berry, Dagfinn
Ilmari Mannsåker, Daniel Dragan, Daniel Perrett, Darin McBride, Dave Rolsky,
David Golden, David Leadbeater, David Mitchell, David Nicol, Dominic
Hargreaves, E. Choroba, Eric Brine, Evan Miller, Father Chrysostomos, Florian
Ragwitz, François Perrad, George Greer, Goro Fuji, H.Merijn Brand, Herbert
Breunung, Hugo van der Sanden, Igor Zaytsev, James E Keenan, Jan Dubois,
Jasmine Ahuja, Jerry D. Hedden, Jess Robinson, Jesse Luehrs, Joaquin Ferrero,
Joel Berger, John Goodyear, John Peacock, Karen Etheridge, Karl Williamson,
Karthik Rajagopalan, Kent Fredric, Leon Timmermans, Lucas Holt, Lukas Mai,
Marcus Holland-Moritz, Markus Jansen, Martin Hasch, Matthew Horsfall, Max
Maischein, Michael G Schwern, Michael Schroeder, Moritz Lenz, Nicholas Clark,
Niko Tyni, Oleg Nesterov, Patrik Hägglund, Paul Green, Paul Johnson, Paul
Marquess, Peter Martini, Rafael Garcia-Suarez, Reini Urban, Renee Baecker,
Rhesa Rozendaal, Ricardo Signes, Robin Barker, Ronald J. Kimball, Ruslan
Zakirov, Salvador Fandiño, Sawyer X, Scott Lanning, Sergey Alekseev, Shawn M
Moore, Shirakata Kentaro, Shlomi Fish, Sisyphus, Smylers, Steffen Müller,
Steve Hay, Steve Peters, Steven Schubiger, Sullivan Beck, Sven Strickroth,
Sébastien Aperghis-Tramoni, Thomas Sibley, Tobias Leich, Tom Wyant, Tony Cook,
Vadim Konovalov, Vincent Pit, Volker Schatz, Walt Mankowski, Yves Orton,
Zefram.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
http://rt.perl.org/perlbug/ . There may also be information at
http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send it
to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be
able to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
If you read this file _as_is_, just ignore the funny characters you
see. It is written in the POD format (see perlpod manpage) which is
specially designed to be readable as is.
=head1 NAME
perlos2 - Perl under OS/2, DOS, Win0.3*, Win0.95 and WinNT.
=head1 SYNOPSIS
One can read this document in the following formats:
man perlos2
view perl perlos2
explorer perlos2.html
info perlos2
to list some (not all may be available simultaneously), or it may
be read I<as is>: either as F<README.os2>, or F<pod/perlos2.pod>.
To read the F<.INF> version of documentation (B<very> recommended)
outside of OS/2, one needs IBM's reader (may be available on IBM
ftp sites (?) (URL anyone?)) or shipped with PC DOS 7.0 and IBM's
Visual Age C++ 3.5.
A copy of a Win* viewer is contained in the "Just add OS/2 Warp" package
ftp://ftp.software.ibm.com/ps/products/os2/tools/jaow/jaow.zip
in F<?:\JUST_ADD\view.exe>. This gives one access to EMX's
F<.INF> docs as well (text form is available in F</emx/doc> in
EMX's distribution). There is also a different viewer named xview.
Note that if you have F<lynx.exe> or F<netscape.exe> installed, you can follow WWW links
from this document in F<.INF> format. If you have EMX docs installed
correctly, you can follow library links (you need to have C<view emxbook>
working by setting C<EMXBOOK> environment variable as it is described
in EMX docs).
=cut
Contents (This may be a little bit obsolete)
perlos2 - Perl under OS/2, DOS, Win0.3*, Win0.95 and WinNT.
NAME
SYNOPSIS
DESCRIPTION
- Target
- Other OSes
- Prerequisites
- Starting Perl programs under OS/2 (and DOS and...)
- Starting OS/2 (and DOS) programs under Perl
Frequently asked questions
- "It does not work"
- I cannot run external programs
- I cannot embed perl into my program, or use perl.dll from my
- `` and pipe-open do not work under DOS.
- Cannot start find.exe "pattern" file
INSTALLATION
- Automatic binary installation
- Manual binary installation
- Warning
Accessing documentation
- OS/2 .INF file
- Plain text
- Manpages
- HTML
- GNU info files
- PDF files
- LaTeX docs
BUILD
- The short story
- Prerequisites
- Getting perl source
- Application of the patches
- Hand-editing
- Making
- Testing
- Installing the built perl
- a.out-style build
Build FAQ
- Some / became \ in pdksh.
- 'errno' - unresolved external
- Problems with tr or sed
- Some problem (forget which ;-)
- Library ... not found
- Segfault in make
- op/sprintf test failure
Specific (mis)features of OS/2 port
- setpriority, getpriority
- system()
- extproc on the first line
- Additional modules:
- Prebuilt methods:
- Prebuilt variables:
- Misfeatures
- Modifications
- Identifying DLLs
- Centralized management of resources
Perl flavors
- perl.exe
- perl_.exe
- perl__.exe
- perl___.exe
- Why strange names?
- Why dynamic linking?
- Why chimera build?
ENVIRONMENT
- PERLLIB_PREFIX
- PERL_BADLANG
- PERL_BADFREE
- PERL_SH_DIR
- USE_PERL_FLOCK
- TMP or TEMP
Evolution
- Text-mode filehandles
- Priorities
- DLL name mangling: pre 5.6.2
- DLL name mangling: 5.6.2 and beyond
- DLL forwarder generation
- Threading
- Calls to external programs
- Memory allocation
- Threads
BUGS
AUTHOR
SEE ALSO
=head1 DESCRIPTION
=head2 Target
The target is to make OS/2 one of the best supported platforms for
using/building/developing Perl and I<Perl applications>, as well as
make Perl the best language to use under OS/2. The secondary target is
to try to make this work under DOS and Win* as well (but not B<too> hard).
The current state is quite close to this target. Known limitations:
=over 5
=item *
Some *nix programs use fork() a lot; with the most useful flavors of
perl for OS/2 (there are several built simultaneously) this is
supported; but some flavors do not support this (e.g., when Perl is
called from inside REXX). Using fork() after
I<use>ing dynamically loading extensions would not work with I<very> old
versions of EMX.
=item *
You need a separate perl executable F<perl__.exe> (see L</perl__.exe>)
if you want to use PM code in your application (as Perl/Tk or OpenGL
Perl modules do) without having a text-mode window present.
While using the standard F<perl.exe> from a text-mode window is possible
too, I have seen cases when this causes degradation of the system stability.
Using F<perl__.exe> avoids such a degradation.
=item *
There is no simple way to access WPS objects. The only way I know
is via C<OS2::REXX> and C<SOM> extensions (see L<OS2::REXX>, L<SOM>).
However, we do not have access to
convenience methods of Object-REXX. (Is it possible at all? I know
of no Object-REXX API.) The C<SOM> extension (currently in alpha-test)
may eventually remove this shortcoming; however, due to the fact that
DII is not supported by the C<SOM> module, using C<SOM> is not as
convenient as one would like it.
=back
Please keep this list up-to-date by informing me about other items.
=head2 Other OSes
Since OS/2 port of perl uses a remarkable EMX environment, it can
run (and build extensions, and - possibly - be built itself) under any
environment which can run EMX. The current list is DOS,
DOS-inside-OS/2, Win0.3*, Win0.95 and WinNT. Out of many perl flavors,
only one works, see L</"F<perl_.exe>">.
Note that not all features of Perl are available under these
environments. This depends on the features the I<extender> - most
probably RSX - decided to implement.
Cf. L</Prerequisites>.
=head2 Prerequisites
=over 6
=item EMX
EMX runtime is required (may be substituted by RSX). Note that
it is possible to make F<perl_.exe> run under DOS without any
external support by binding F<emx.exe>/F<rsx.exe> to it, see C<emxbind>. Note
that under DOS for best results one should use RSX runtime, which
has many more functions working (like C<fork>, C<popen> and so on). In
fact RSX is required if there is no VCPI present. Note that
RSX requires DPMI. Many implementations of DPMI are known to be very
buggy, beware!
Only the latest runtime is supported, currently C<0.9d fix 03>. Perl may run
under earlier versions of EMX, but this is not tested.
One can get different parts of EMX from, say
ftp://crydee.sai.msu.ru/pub/comp/os/os2/leo/gnu/emx+gcc/
http://hobbes.nmsu.edu/h-browse.php?dir=/pub/os2/dev/emx/v0.9d/
The runtime component should have the name F<emxrt.zip>.
B<NOTE>. When using F<emx.exe>/F<rsx.exe>, it is enough to have them on your path. One
does not need to specify them explicitly (though this
emx perl_.exe -de 0
will work as well.)
=item RSX
To run Perl on DPMI platforms one needs RSX runtime. This is
needed under DOS-inside-OS/2, Win0.3*, Win0.95 and WinNT (see
L</"Other OSes">). Unlike EMX, RSX does not work with VCPI
alone; it requires DPMI.
Having RSX and the latest F<sh.exe> one gets a fully functional
B<*nix>-ish environment under DOS, say, C<fork>, C<``> and
pipe-C<open> work. In fact, MakeMaker works (for static build), so one
can have Perl development environment under DOS.
One can get RSX from, say
http://cd.textfiles.com/hobbesos29804/disk1/EMX09C/
ftp://crydee.sai.msu.ru/pub/comp/os/os2/leo/gnu/emx+gcc/contrib/
Contact the author on C<rainer@mathematik.uni-bielefeld.de>.
The latest F<sh.exe> with DOS hooks is available in
http://www.ilyaz.org/software/os2/
as F<sh_dos.zip> or under similar names starting with C<sh>, C<pdksh> etc.
=item HPFS
Perl does not care about file systems, but the perl library contains
many files with long names, so to install it intact one needs a file
system which supports long file names.
Note that if you do not plan to build the perl itself, it may be
possible to fool EMX into truncating file names. This is not supported;
read EMX docs to see how to do it.
=item pdksh
To start external programs with complicated command lines (like with
pipes in between, and/or quoting of arguments), Perl uses an external
shell. With EMX port such shell should be named F<sh.exe>, and located
either in the wired-in-during-compile locations (usually F<F:/bin>),
or in configurable location (see L</"C<PERL_SH_DIR>">).
For best results use EMX pdksh. The standard binary (5.2.14 or later) runs
under DOS (with L</RSX>) as well, see
http://www.ilyaz.org/software/os2/
=back
=head2 Starting Perl programs under OS/2 (and DOS and...)
Start your Perl program F<foo.pl> with arguments C<arg1 arg2 arg3> the
same way as on any other platform, by
perl foo.pl arg1 arg2 arg3
If you want to specify perl options C<-my_opts> to the perl itself (as
opposed to your program), use
perl -my_opts foo.pl arg1 arg2 arg3
Alternately, if you use OS/2-ish shell, like CMD or 4os2, put
the following at the start of your perl script:
extproc perl -S -my_opts
rename your program to F<foo.cmd>, and start it by typing
foo arg1 arg2 arg3
Note that because of stupid OS/2 limitations the full path of the perl
script is not available when you use C<extproc>, thus you are forced to
use C<-S> perl switch, and your script should be on the C<PATH>. As a plus
side, if you know a full path to your script, you may still start it
with
perl ../../blah/foo.cmd arg1 arg2 arg3
(note that the argument C<-my_opts> is taken care of by the C<extproc> line
in your script, see L<C<extproc> on the first line>).
To understand what the above I<magic> does, read perl docs about C<-S>
switch - see L<perlrun>, and cmdref about C<extproc>:
view perl perlrun
man perlrun
view cmdref extproc
help extproc
or whatever method you prefer.
There are also endless possibilities to use I<executable extensions> of
4os2, I<associations> of WPS and so on... However, if you use
*nixish shell (like F<sh.exe> supplied in the binary distribution),
you need to follow the syntax specified in L<perlrun/"Command Switches">.
Note that B<-S> switch supports scripts with additional extensions
F<.cmd>, F<.btm>, F<.bat>, F<.pl> as well.
=head2 Starting OS/2 (and DOS) programs under Perl
This is what system() (see L<perlfunc/system>), C<``> (see
L<perlop/"I/O Operators">), and I<open pipe> (see L<perlfunc/open>)
are for. (Avoid exec() (see L<perlfunc/exec>) unless you know what you
are doing).
Note however that to use some of these operators you need to have a
sh-syntax shell installed (see L</"Pdksh">,
L</"Frequently asked questions">), and perl should be able to find it
(see L</"C<PERL_SH_DIR>">).
The cases when the shell is used are:
=over
=item 1
One-argument system() (see L<perlfunc/system>), exec() (see L<perlfunc/exec>)
with redirection or shell meta-characters;
=item 2
Pipe-open (see L<perlfunc/open>) with the command which contains redirection
or shell meta-characters;
=item 3
Backticks C<``> (see L<perlop/"I/O Operators">) with the command which contains
redirection or shell meta-characters;
=item 4
If the executable called by system()/exec()/pipe-open()/C<``> is a script
with the "magic" C<#!> line or C<extproc> line which specifies shell;
=item 5
If the executable called by system()/exec()/pipe-open()/C<``> is a script
without "magic" line, and C<$ENV{EXECSHELL}> is set to shell;
=item 6
If the executable called by system()/exec()/pipe-open()/C<``> is not
found (is not this remark obsolete?);
=item 7
For globbing (see L<perlfunc/glob>, L<perlop/"I/O Operators">)
(obsolete? Perl uses builtin globbing nowadays...).
=back
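Most of the cases above involve a single command string containing redirection or shell meta-characters. The following is a minimal sketch of the alternative, passing an explicit argument list to system() so that characters such as C<< > >> and spaces reach the child unchanged. It uses C<$^X> (the running interpreter) as the child program only to stay self-contained. Cases 4 to 6 above concern the executable itself and may still apply on OS/2.

```perl
use strict;
use warnings;

# The child checks that it received exactly two arguments and that the
# first one still contains its embedded space.
my @command = (
    $^X, '-e',
    'exit(@ARGV == 2 && $ARGV[0] eq q{a b} ? 0 : 1)',
    'a b', '>not-a-redirect',
);

# List form: the arguments are handed over as they are, not as one
# command line to be parsed by a shell.
my $status    = system(@command);
my $exit_code = $status == -1 ? -1 : $status >> 8;

print "child exit code: $exit_code\n";
```

An exit code of 0 here means the child saw both arguments intact; -1 means the child could not be started at all.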
For the sake of speed for a common case, in the above algorithms
backslashes in the command name are not considered as shell metacharacters.
Perl starts scripts which begin with cookies
C<extproc> or C<#!> directly, without an intervention of shell. Perl uses the
same algorithm to find the executable as F<pdksh>: if the path
on C<#!> line does not work, and contains C</>, then the directory
part of the executable is ignored, and the executable
is searched in F<.> and on C<PATH>. To find arguments for these scripts
Perl uses a different algorithm than F<pdksh>: up to 3 arguments are
recognized, and trailing whitespace is stripped.
If a script
does not contain such a cookie, then to avoid calling F<sh.exe>, Perl uses
the same algorithm as F<pdksh>: if C<$ENV{EXECSHELL}> is set, the
script is given as the first argument to this command, if not set, then
C<$ENV{COMSPEC} /c> is used (or a hardwired guess if C<$ENV{COMSPEC}> is
not set).
When starting scripts directly, Perl uses exactly the same algorithm as for
the search of script given by B<-S> command-line option: it will look in
the current directory, then on components of C<$ENV{PATH}> using the
following order of appended extensions: no extension, F<.cmd>, F<.btm>,
F<.bat>, F<.pl>.
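The search order just described can be sketched in plain Perl. This is an illustration only, not the code perl itself uses: C<find_script> is a hypothetical helper, and it assumes that all extensions are tried within one directory before the next directory is examined.

```perl
use strict;
use warnings;
use File::Spec;

# Extensions in the documented order; the empty string means "no extension".
my @EXTENSIONS = ('', '.cmd', '.btm', '.bat', '.pl');

# Hypothetical helper: return the first existing candidate file, or an
# empty list / undef when nothing matches.
sub find_script {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        for my $ext (@EXTENSIONS) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            return $candidate if -f $candidate;
        }
    }
    return;
}

# Current directory first, then the components of PATH.
my @search_dirs = (File::Spec->curdir, File::Spec->path);
my $found = find_script('foo', @search_dirs);
print defined $found ? "would start $found\n" : "foo not found\n";
```

With both F<foo.cmd> and F<foo.pl> in the same directory, this sketch picks F<foo.cmd>, because F<.cmd> precedes F<.pl> in the list.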
Note that Perl will start to look for scripts only if OS/2 cannot start the
specified application, thus C<system 'blah'> will not look for a script if
there is an executable file F<blah.exe> I<anywhere> on C<PATH>. In
other words, C<PATH> is essentially searched twice: once by the OS for
an executable, then by Perl for scripts.
Note also that executable files on OS/2 can have an arbitrary extension, but
F<.exe> will be automatically appended if no dot is present in the name. The
workaround is as simple as that: since F<blah.> and F<blah> denote the same
file (at least on FAT and HPFS file systems), to start an executable residing in
file F<n:/bin/blah> (no extension) give an argument C<n:/bin/blah.> (dot
appended) to system().
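The trailing-dot workaround can be wrapped in a small helper. This is a sketch under the assumption that only the file-name part of the path (after the last slash or backslash) is checked for a dot; C<exact_program_name> is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: append a dot when the file-name part has none,
# so that ".exe" is not appended automatically.
sub exact_program_name {
    my ($path) = @_;
    my ($file) = $path =~ m{([^/\\]*)\z};    # part after the last / or \
    return $file =~ /\./ ? $path : "$path.";
}

print exact_program_name('n:/bin/blah'),     "\n";    # n:/bin/blah.
print exact_program_name('n:/bin/blah.exe'), "\n";    # n:/bin/blah.exe
```

The result can then be passed to system() in place of the bare name.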
Perl will start PM programs from VIO (=text-mode) Perl process in a
separate PM session;
the opposite is not true: when you start a non-PM program from a PM
Perl process, Perl would not run it in a separate session. If a separate
session is desired, either ensure
that shell will be used, as in C<system 'cmd /c myprog'>, or start it using
optional arguments to system() documented in C<OS2::Process> module. This
is considered to be a feature.
=head1 Frequently asked questions
=head2 "It does not work"
Perl binary distributions come with a F<testperl.cmd> script which tries
to detect common problems with misconfigured installations. There is a
pretty large chance it will discover which step of the installation you
managed to goof. C<;-)>
=head2 I cannot run external programs
=over 4
=item *
Did you run your programs with C<-w> switch? See
L<Starting OSE<sol>2 (and DOS) programs under Perl>.
=item *
Do you try to run I<internal> shell commands, like C<`copy a b`>
(internal for F<cmd.exe>), or C<`glob a*b`> (internal for ksh)? You
need to specify your shell explicitly, like C<`cmd /c copy a b`>,
since Perl cannot deduce which commands are internal to your shell.
=back
=head2 I cannot embed perl into my program, or use F<perl.dll> from my
program.
=over 4
=item Is your program EMX-compiled with C<-Zmt -Zcrtdll>?
Well, nowadays Perl DLL should be usable from a differently compiled
program too... If you can run Perl code from REXX scripts (see
L<OS2::REXX>), then there are some other aspects of interaction which
are overlooked by the current hackish code to support
differently-compiled principal programs.
If everything else fails, you need to build a stand-alone DLL for
perl. Contact me, I did it once. Sockets would not work, nor would a lot of
other stuff.
=item Did you use L<ExtUtils::Embed>?
Some time ago I had reports it does not work. Nowadays it is checked
in the Perl test suite, so grep F<./t> subdirectory of the build tree
(as well as F<*.t> files in the F<./lib> subdirectory) to find how it
should be done "correctly".
=back
=head2 C<``> and pipe-C<open> do not work under DOS.
This may be a variant of just L</"I cannot run external programs">, or a
deeper problem. Basically: you I<need> RSX (see L</Prerequisites>)
for these commands to work, and you may need a port of F<sh.exe> which
understands command arguments. One such port is listed in
L</Prerequisites> under RSX. Do not forget to set variable
L</"C<PERL_SH_DIR>"> as well.
DPMI is required for RSX.
=head2 Cannot start C<find.exe "pattern" file>
The whole idea of the "standard C API to start applications" is that
the forms C<foo> and C<"foo"> of program arguments are completely
interchangeable. F<find> breaks this paradigm;
find "pattern" file
find pattern file
are not equivalent; F<find> cannot be started directly using the above
API. One needs a way to surround the doublequotes in some other
quoting construction, necessarily having an extra non-Unixish shell in
between.
Use one of
system 'cmd', '/c', 'find "pattern" file';
`cmd /c 'find "pattern" file'`
This would start F<find.exe> via F<cmd.exe> via C<sh.exe> via
C<perl.exe>, but this is the price to pay if you want to use
a non-conforming program.
=head1 INSTALLATION
=head2 Automatic binary installation
The most convenient way of installing a binary distribution of perl is via perl installer
F<install.exe>. Just follow the instructions, and 99% of the
installation blues would go away.
Note however, that you need to have F<unzip.exe> on your path, and
EMX environment I<running>. The latter means that if you just
installed EMX, and made all the needed changes to F<Config.sys>,
you may need to reboot in between. Check EMX runtime by running
emxrev
Binary installer also creates a folder on your desktop with some useful
objects. If you need to change some aspects of the work of the binary
installer, feel free to edit the file F<Perl.pkg>. This may be useful
e.g., if you need to run the installer many times and do not want to
make many interactive changes in the GUI.
B<Things not taken care of by automatic binary installation:>
=over 15
=item C<PERL_BADLANG>
may be needed if you change your codepage I<after> perl installation,
and the new value is not supported by EMX. See L</"C<PERL_BADLANG>">.
=item C<PERL_BADFREE>
see L</"C<PERL_BADFREE>">.
=item F<Config.pm>
This file resides somewhere deep in the location you installed your
perl library, find it out by
perl -MConfig -le "print $INC{'Config.pm'}"
While most important values in this file I<are> updated by the binary
installer, some of them may need to be hand-edited. I know no such
data, please keep me informed if you find one. Moreover, manual
changes to the installed version may need to be accompanied by an edit
of this file.
=back
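The one-liner shown for F<Config.pm> can also be written as a short script, which is handy when quoting on the command line gets in the way. This is a minimal sketch; the four keys printed are ordinary C<%Config> entries chosen for illustration, not a list of values that need editing.

```perl
use strict;
use warnings;
use Config;

# %INC maps each loaded module file to the path it was loaded from.
my $config_path = $INC{'Config.pm'};
print "Config.pm was loaded from: $config_path\n";

# Show a few of the values recorded at build/installation time.
for my $key (qw(version archname privlibexp sitelibexp)) {
    my $value = defined $Config{$key} ? $Config{$key} : '(undefined)';
    print "$key = $value\n";
}
```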
B<NOTE>. Because of a typo the binary installer of 5.00305
would install a variable C<PERL_SHPATH> into F<Config.sys>. Please
remove this variable and put L</C<PERL_SH_DIR>> instead.
=head2 Manual binary installation
As of version 5.00305, OS/2 perl binary distribution comes split
into 11 components. Unfortunately, to enable configurable binary
installation, the file paths in the zip files are not absolute, but
relative to some directory.
Note that the extraction with the stored paths is still necessary
(default with unzip, specify C<-d> to pkunzip). However, you
need to know where to extract the files. You need also to manually
change entries in F<Config.sys> to reflect where you put the
files. Note that if you have some primitive unzipper (like
C<pkunzip>), you may get a lot of warnings/errors during
unzipping. Upgrade to C<(w)unzip>.
Below is the sample of what to do to reproduce the configuration on my
machine. In F<VIEW.EXE> you can press C<Ctrl-Insert> now, and
cut-and-paste from the resulting file - created in the directory you
started F<VIEW.EXE> from.
For each component, we mention environment variables related to each
installation directory. Either choose directories to match your
values of the variables, or create/append-to variables to take into
account the directories.
=over 3
=item Perl VIO and PM executables (dynamically linked)
unzip perl_exc.zip *.exe *.ico -d f:/emx.add/bin
unzip perl_exc.zip *.dll -d f:/emx.add/dll
(have the directories with C<*.exe> on PATH, and C<*.dll> on
LIBPATH);
=item Perl_ VIO executable (statically linked)
unzip perl_aou.zip -d f:/emx.add/bin
(have the directory on PATH);
=item Executables for Perl utilities
unzip perl_utl.zip -d f:/emx.add/bin
(have the directory on PATH);
=item Main Perl library
unzip perl_mlb.zip -d f:/perllib/lib
If this directory is exactly the same as the prefix which was compiled
into F<perl.exe>, you do not need to change
anything. However, for perl to find the library if you use a different
path, you need to
C<set PERLLIB_PREFIX> in F<Config.sys>, see L</"C<PERLLIB_PREFIX>">.
=item Additional Perl modules
unzip perl_ste.zip -d f:/perllib/lib/site_perl/5.26.3/
The same remark as above applies. Additionally, if this directory is not
one of the directories on @INC (and @INC is influenced by C<PERLLIB_PREFIX>),
you need to put this directory and its subdirectory F<./os2> in the
C<PERLLIB> or C<PERL5LIB> variable. Do not use C<PERL5LIB> unless you have
it set already. See
L<perl/"ENVIRONMENT">.
B<[Check whether this extraction directory is still applicable with
the new directory structure layout!]>
=item Tools to compile Perl modules
unzip perl_blb.zip -d f:/perllib/lib
Same remark as for F<perl_ste.zip>.
=item Manpages for Perl and utilities
unzip perl_man.zip -d f:/perllib/man
This directory should be on C<MANPATH>. You need to have a
working F<man> to access these files.
=item Manpages for Perl modules
unzip perl_mam.zip -d f:/perllib/man
This directory should be on C<MANPATH>. You need to have a
working F<man> to access these files.
=item Source for Perl documentation
unzip perl_pod.zip -d f:/perllib/lib
This is used by the C<perldoc> program (see L<perldoc>), and may be used to
generate HTML documentation usable by WWW browsers, and
documentation in zillions of other formats: C<info>, C<LaTeX>,
C<Acrobat>, C<FrameMaker> and so on. [Use programs such as
F<pod2latex> etc.]
=item Perl manual in F<.INF> format
unzip perl_inf.zip -d d:/os2/book
This directory should be on C<BOOKSHELF>.
=item Pdksh
unzip perl_sh.zip -d f:/bin
This is used by perl to run external commands which explicitly
require shell, like the commands using I<redirection> and I<shell
metacharacters>. It is also used instead of explicit F</bin/sh>.
Set C<PERL_SH_DIR> (see L</"C<PERL_SH_DIR>">) if you move F<sh.exe> from
the above location.
B<Note.> It may be possible to use some other sh-compatible shell (untested).
=back
After you have installed the components you need and updated
F<Config.sys> correspondingly, you need to hand-edit
F<Config.pm>. This file resides somewhere deep in the location where you
installed your perl library; find it with
perl -MConfig -le "print $INC{'Config.pm'}"
You need to correct all the entries which look like file paths (they
currently start with C<f:/>).
=head2 B<Warning>
The automatic and manual perl installation leave precompiled paths
inside perl executables. While these paths are overwriteable (see
L</"C<PERLLIB_PREFIX>">, L</"C<PERL_SH_DIR>">), some people may prefer
binary editing of paths inside the executables/DLLs.
=head1 Accessing documentation
Depending on how you built/installed perl you may have (otherwise
identical) Perl documentation in the following formats:
=head2 OS/2 F<.INF> file
Most probably the most convenient form. Under OS/2 view it as
view perl
view perl perlfunc
view perl less
view perl ExtUtils::MakeMaker
(currently the last two may hit a wrong location, but this may improve
soon). Under Win* see L</"SYNOPSIS">.
If you want to build the docs yourself, and have I<OS/2 toolkit>, run
pod2ipf > perl.ipf
in F</perllib/lib/pod> directory, then
ipfc /inf perl.ipf
(Expect a lot of errors during both steps.) Now move the resulting file to
a directory on your BOOKSHELF path.
=head2 Plain text
If you have perl documentation in the source form, perl utilities
installed, and GNU groff installed, you may use
perldoc perlfunc
perldoc less
perldoc ExtUtils::MakeMaker
to access the perl documentation in the text form (note that you may get
better results using perl manpages).
Alternately, try running pod2text on F<.pod> files.
=head2 Manpages
If you have F<man> installed on your system, and you installed perl
manpages, use something like this:
man perlfunc
man 3 less
man ExtUtils.MakeMaker
to access documentation for different components of Perl. Start with
man perl
Note that dot (F<.>) is used as a package separator for documentation
for packages, and as usual, sometimes you need to give the section - C<3>
above - to avoid shadowing by the I<less(1) manpage>.
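The name mapping described above can be sketched in C as follows. This is only an illustration: C<pkg_to_manpage> is a hypothetical helper, not something shipped with perl or F<man>, and it assumes that replacing every C<::> by a dot is all that is needed.

```c
#include <stddef.h>

/* Convert a Perl package name such as "ExtUtils::MakeMaker" into the
 * manpage name "ExtUtils.MakeMaker".  Writes at most outsz bytes
 * (including the terminating NUL).  Returns 0 on success, -1 if the
 * result does not fit.  pkg_to_manpage is a hypothetical helper. */
static int
pkg_to_manpage(const char *pkg, char *out, size_t outsz)
{
    size_t o = 0;

    while (*pkg) {
        if (o + 1 >= outsz)
            return -1;              /* no room for this char plus NUL */
        if (pkg[0] == ':' && pkg[1] == ':') {
            out[o++] = '.';         /* "::" becomes a single dot */
            pkg += 2;
        } else {
            out[o++] = *pkg++;
        }
    }
    if (outsz == 0)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Names without C<::>, such as C<less>, pass through unchanged, which is why the section number is still needed to tell the module page from the I<less(1) manpage>.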
Make sure that the directory B<above> the directory with manpages is
on your C<MANPATH>, like this
set MANPATH=c:/man;f:/perllib/man
for Perl manpages in C<f:/perllib/man/man1/> etc.
=head2 HTML
If you have some WWW browser available, installed the Perl
documentation in the source form, and Perl utilities, you can build
HTML docs. Change to the directory with the F<.pod> files, and do this
cd f:/perllib/lib/pod
pod2html
After this you can direct your browser to the file F<perl.html> in this
directory, and go ahead with reading docs, like this:
explore file:///f:/perllib/lib/pod/perl.html
Alternatively you may be able to get these docs prebuilt from CPAN.
=head2 GNU C<info> files
Users of Emacs would appreciate it very much, especially with
C<CPerl> mode loaded. You need to get latest C<pod2texi> from C<CPAN>,
or, alternately, the prebuilt info pages.
=head2 F<PDF> files
for C<Acrobat> are available on CPAN (possibly for a slightly older version
of perl).
=head2 C<LaTeX> docs
can be constructed using C<pod2latex>.
=head1 BUILD
Here we discuss how to build Perl under OS/2.
=head2 The short story
Assume that you are a seasoned porter, so are sure that all the necessary
tools are already present on your system, and you know how to get the Perl
source distribution. Untar it, change to the extract directory, and
gnupatch -p0 < os2\diff.configure
sh Configure -des -D prefix=f:/perllib
make
make test
make install
make aout_test
make aout_install
This puts the executables in f:/perllib/bin. Manually move them to the
C<PATH>, manually move the built F<perl*.dll> to C<LIBPATH> (here for
Perl DLL F<*> is a not-very-meaningful hex checksum), and run
make installcmd INSTALLCMDDIR=d:/ir/on/path
Assuming that the C<man>-files were put on an appropriate location,
this completes the installation of minimal Perl system. (The binary
distribution contains also a lot of additional modules, and the
documentation in INF format.)
What follows is a detailed guide through these steps.
=head2 Prerequisites
You need to have the latest EMX development environment, the full
GNU tool suite (gawk renamed to awk, and GNU F<find.exe>
earlier on path than the OS/2 F<find.exe>, same with F<sort.exe>, to
check use
find --version
sort --version
). You need the latest version of F<pdksh> installed as F<sh.exe>.
Check that you have B<BSD> libraries and headers installed, and -
optionally - Berkeley DB headers and libraries, and crypt.
Possible locations to get the files:
ftp://ftp.uni-heidelberg.de/pub/os2/unix/
http://hobbes.nmsu.edu/h-browse.php?dir=/pub/os2
http://cd.textfiles.com/hobbesos29804/disk1/DEV32/
http://cd.textfiles.com/hobbesos29804/disk1/EMX09C/
It is reported that the following archives contain enough utils to
build perl: F<gnufutil.zip>, F<gnusutil.zip>, F<gnututil.zip>, F<gnused.zip>,
F<gnupatch.zip>, F<gnuawk.zip>, F<gnumake.zip>, F<gnugrep.zip>, F<bsddev.zip> and
F<ksh527rt.zip> (or a later version). Note that all these utilities are
known to be available from LEO:
ftp://crydee.sai.msu.ru/pub/comp/os/os2/leo/gnu/
Note also that the F<db.lib> and F<db.a> from the EMX distribution
are not suitable for multi-threaded compile (even single-threaded
flavor of Perl uses multi-threaded C RTL, for
compatibility with XFree86-OS/2). Get a corrected one from
http://www.ilyaz.org/software/os2/db_mt.zip
If you have I<exactly the same version of Perl> installed already,
make sure that no copies of perl are currently running. Later steps
of the build may fail since an older version of F<perl.dll> loaded into
memory may be found. Running C<make test> becomes meaningless, since
the tests are checking a previous build of perl (this situation is detected
and reported by F<os2/os2_base.t> test). Do not forget to unset
C<PERL_EMXLOAD_SEC> in environment.
Also make sure that you have F</tmp> directory on the current drive,
and F<.> directory in your C<LIBPATH>. One may try to correct the
latter condition by
set BEGINLIBPATH .\.
if you use something like F<CMD.EXE> or latest versions of
F<4os2.exe>. (Setting BEGINLIBPATH to just C<.> is ignored by the
OS/2 kernel.)
Make sure your gcc is good for C<-Zomf> linking: run C<omflibs>
script in F</emx/lib> directory.
Check that you have link386 installed. It comes standard with OS/2,
but may not be installed due to customization. If typing
link386
shows you do not have it, do I<Selective install>, and choose C<Link
object modules> in I<Optional system utilities/More>. If you get into
link386 prompts, press C<Ctrl-C> to exit.
=head2 Getting perl source
You need to fetch the latest perl source (including developers
releases). With some probability it is located in
http://www.cpan.org/src/
http://www.cpan.org/src/unsupported
If not, you may need to dig in the indices to find it in the directory
of the current maintainer.
The quick cycle of developer releases may break the OS/2 build from time to
time; looking into
http://www.cpan.org/ports/os2/
may indicate the latest release which was publicly released by the
maintainer. Note that the release may include some additional patches
to apply to the current source of perl.
Extract it like this
tar vzxf perl5.00409.tar.gz
You may see a message about errors while extracting F<Configure>. This is
because there is a conflict with a similarly-named file F<configure>.
Change to the directory of extraction.
=head2 Application of the patches
You need to apply the patches in F<./os2/diff.*> like this:
gnupatch -p0 < os2\diff.configure
You may also need to apply the patches supplied with the binary
distribution of perl. It also makes sense to look on the
perl5-porters mailing list for the latest OS/2-related patches (see
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/>). Such
patches usually contain strings C</os2/> and C<patch>, so it makes
sense looking for these strings.
=head2 Hand-editing
You may look into the file F<./hints/os2.sh> and correct anything
wrong you find there. I do not expect it is needed anywhere.
=head2 Making
sh Configure -des -D prefix=f:/perllib
C<prefix> means: where to install the resulting perl library. Giving
correct prefix you may avoid the need to specify C<PERLLIB_PREFIX>,
see L</"C<PERLLIB_PREFIX>">.
I<Ignore the message about missing C<ln>, and about C<-c> option to
tr>. The latter is most probably already fixed; if you see it and can trace
where this spurious warning comes from, please inform me.
Now
make
At some moment the build may die, reporting a I<version mismatch> or
I<unable to run F<perl>>. This means that you do not have F<.> in
your LIBPATH, so F<perl.exe> cannot find the needed F<perl67B2.dll> (treat
these hex digits as line noise). After this is fixed the build
should finish without a lot of fuss.
=head2 Testing
Now run
make test
All tests should succeed (with some of them skipped). If you have the
same version of Perl installed, it is crucial that you have C<.> early
in your LIBPATH (or in BEGINLIBPATH), otherwise your tests will most
probably test the wrong version of Perl.
Some tests may generate extra messages similar to
=over 4
=item A lot of C<bad free>
in database tests related to Berkeley DB. I<This should be fixed already.>
If it persists, you may disable these warnings, see L</"C<PERL_BADFREE>">.
=item Process terminated by SIGTERM/SIGINT
This is a standard message issued by OS/2 applications. *nix
applications die in silence. It is considered to be a feature. One can
easily disable this by appropriate sighandlers.
However, the test engine bleeds these messages to the screen at unexpected
moments. Two messages of this kind I<should> be present during
testing.
=back
To get finer test reports, call
perl t/harness
The report with F<io/pipe.t> failing may look like this:
Failed Test  Status Wstat Total Fail  Failed  List of failed
------------------------------------------------------------
io/pipe.t                    12    1   8.33%  9
7 tests skipped, plus 56 subtests skipped.
Failed 1/195 test scripts, 99.49% okay. 1/6542 subtests failed,
99.98% okay.
The reasons for most important skipped tests are:
=over 8
=item F<op/fs.t>
=over 4
=item Z<>18
Checks C<atime> and C<mtime> of C<stat()> - unfortunately, HPFS
provides only 2sec time granularity (for compatibility with FAT?).
=item Z<>25
Checks C<truncate()> on a filehandle just opened for write - I do not
know why this should or should not work.
=back
=item F<op/stat.t>
Checks C<stat()>. Tests:
=over 4
=item 4
Checks C<atime> and C<mtime> of C<stat()> - unfortunately, HPFS
provides only 2sec time granularity (for compatibility with FAT?).
=back
=back
=head2 Installing the built perl
If you haven't yet moved C<perl*.dll> onto LIBPATH, do it now.
Run
make install
This puts the generated files into the needed locations. Manually put
F<perl.exe>, F<perl__.exe> and F<perl___.exe> to a location on your
PATH, F<perl.dll> to a location on your LIBPATH.
Run
make installcmd INSTALLCMDDIR=d:/ir/on/path
to convert perl utilities to F<.cmd> files and put them on
PATH. You need to put F<.EXE>-utilities on path manually. They are
installed in C<$prefix/bin>, here C<$prefix> is what you gave to
F<Configure>, see L</Making>.
If you use C<man>, either move the installed F<*/man/> directories to
your C<MANPATH>, or modify C<MANPATH> to match the location. (One
could have avoided this by providing a correct C<manpath> option to
F<./Configure>, or editing F<./config.sh> between configuring and
making steps.)
=head2 C<a.out>-style build
Proceed as above, but make F<perl_.exe> (see L</"F<perl_.exe>">) by
make perl_
test and install by
make aout_test
make aout_install
Manually put F<perl_.exe> to a location on your PATH.
B<Note.> The build process for C<perl_> I<does not know> about all the
dependencies, so you should make sure that everything is up-to-date,
say, by doing
make perl_dll
first.
=head1 Building a binary distribution
[This section provides a short overview only...]
Building should proceed differently depending on whether the version of perl
you install is already present and used on your system, or is a new version
not yet used. The description below assumes that the version is new, so
installing its DLLs and F<.pm> files will not disrupt the operation of your
system even if some intermediate steps are not yet fully working.
The other cases require a little bit more convoluted procedures. Below I
suppose that the current version of Perl is C<5.8.2>, so the executables are
named accordingly.
=over
=item 1.
Fully build and test the Perl distribution. Make sure that no tests are
failing with C<test> and C<aout_test> targets; fix the bugs in Perl and
the Perl test suite detected by these tests. Make sure that C<all_test>
make target runs as clean as possible. Check that F<os2/perlrexx.cmd>
runs fine.
=item 2.
Fully install Perl, including C<installcmd> target. Copy the generated DLLs
to C<LIBPATH>; copy the numbered Perl executables (as in F<perl5.8.2.exe>)
to C<PATH>; copy C<perl_.exe> to C<PATH> as C<perl_5.8.2.exe>. Think whether
you need backward-compatibility DLLs. In most cases you do not need to install
them yet; but sometimes this may simplify the following steps.
=item 3.
Make sure that C<CPAN.pm> can download files from CPAN. If not, you may need
to manually install C<Net::FTP>.
=item 4.
Install the bundle C<Bundle::OS2_default>
perl5.8.2 -MCPAN -e "install Bundle::OS2_default" < nul |& tee 00cpan_i_1
This may take a couple of hours on a 1GHz processor (when run the first time).
It will not necessarily be a smooth procedure. Some modules may not
specify required dependencies, so one may need to repeat this procedure several
times until the results stabilize.
perl5.8.2 -MCPAN -e "install Bundle::OS2_default" < nul |& tee 00cpan_i_2
perl5.8.2 -MCPAN -e "install Bundle::OS2_default" < nul |& tee 00cpan_i_3
Even after they stabilize, some tests may fail.
Fix as many discovered bugs as possible. Document all the bugs which are not
fixed, and all the failures with unknown reasons. Inspect the produced logs
F<00cpan_i_1> to find suspiciously skipped tests, and other fishy events.
Keep in mind that I<installation> of some modules may fail too: for example,
the DLLs to update may be already loaded by F<CPAN.pm>. Inspect the C<install>
logs (in the example above F<00cpan_i_1> etc) for errors, and install things
manually, as in
cd $CPANHOME/.cpan/build/Digest-MD5-2.31
make install
Some distributions may fail some tests, but you may want to install them
anyway (as above, or via C<force install> command of C<CPAN.pm> shell-mode).
Since this procedure may take quite a long time to complete, it makes sense
to "freeze" your CPAN configuration by disabling periodic updates of the
local copy of CPAN index: set C<index_expire> to some big value (I use 365),
then save the settings
CPAN> o conf index_expire 365
CPAN> o conf commit
Reset back to the default value C<1> when you are finished.
=item 5.
When satisfied with the results, rerun the C<installcmd> target. Now you
can copy C<perl5.8.2.exe> to C<perl.exe>, and install the other OMF-build
executables: C<perl__.exe> etc. They are ready to be used.
=item 6.
Change to the C<./pod> directory of the build tree, download the Perl logo
F<CamelGrayBig.BMP>, and run
( perl2ipf > perl.ipf ) |& tee 00ipf
ipfc /INF perl.ipf |& tee 00inf
This produces the Perl docs online book C<perl.INF>. Install it on the
C<BOOKSHELF> path.
=item 7.
Now is the time to build the statically linked executable F<perl_.exe> which
includes the modules newly installed via C<Bundle::OS2_default>. Doing testing
via C<CPAN.pm> is going to be painfully slow, since it statically links
a new executable per XS extension.
Here is a possible workaround: create a toplevel F<Makefile.PL> in
F<$CPANHOME/.cpan/build/> with contents being (compare with L</Making
executables with a custom collection of statically loaded extensions>)
use ExtUtils::MakeMaker;
WriteMakefile NAME => 'dummy';
execute this as
perl_5.8.2.exe Makefile.PL <nul |& tee 00aout_c1
make -k all test <nul |& tee 00aout_t1
Again, this procedure should not be absolutely smooth. Some C<Makefile.PL>'s
in subdirectories may be buggy, and would not run as "child" scripts. The
interdependency of modules can strike you; however, since non-XS modules
are already installed, the prerequisites of most modules have a very good
chance to be present.
If you discover some glitches, move directories of problematic modules to a
different location; if these modules are non-XS modules, you may just ignore
them - they are already installed; the remaining, XS, modules you need to
install manually one by one.
After each such removal you need to rerun the C<Makefile.PL>/C<make> process;
usually this procedure converges soon. (But be sure to convert all the
necessary external C libraries from F<.lib> format to F<.a> format: run one of
emxaout foo.lib
emximp -o foo.a foo.lib
whichever is appropriate.) Also, make sure that the DLLs for external
libraries are usable with executables compiled without the C<-Zmtd> option.
When you are sure that only a few subdirectories
lead to failures, you may want to add C<-j4> option to C<make> to speed up
skipping subdirectories with already finished build.
When you are satisfied with the results of tests, install the built C libraries
for extensions:
make install |& tee 00aout_i
Now you can rename the file F<./perl.exe> generated during the last phase
to F<perl_5.8.2.exe>; place it on C<PATH>; if there is an inter-dependency
between some XS modules, you may need to repeat the C<test>/C<install> loop
with this new executable and some excluded modules - until the procedure
converges.
Now you have all the necessary F<.a> libraries for these Perl modules in the
places where the Perl builder can find them. Use the perl builder: change to an
empty directory, create a "dummy" F<Makefile.PL> again, and run
perl_5.8.2.exe Makefile.PL |& tee 00c
make perl |& tee 00p
This should create an executable F<./perl.exe> with all the statically loaded
extensions built in. Compare the generated F<perlmain.c> files to make sure
that during the iterations the number of loaded extensions only increases.
Rename F<./perl.exe> to F<perl_5.8.2.exe> on C<PATH>.
When it converges, you have a functional variant of F<perl_5.8.2.exe>; copy it
to C<perl_.exe>. You are done with generation of the local Perl installation.
=item 8.
Make sure that the installed modules are actually installed in the location
of the new Perl, and are not inherited from entries of @INC given for
inheritance from the older versions of Perl: set C<PERLLIB_582_PREFIX> to
redirect the new version of Perl to a new location, and copy the installed
files to this new location. Redo the tests to make sure that the versions of
modules inherited from older versions of Perl are not needed.
Actually, the log output of L<pod2ipf(1)> during the step 6 gives a very detailed
info about which modules are loaded from which place; so you may use it as
an additional verification tool.
Check that no temporary files made it into the perl install tree.
Run something like this
pfind . -f "!(/\.(pm|pl|ix|al|h|a|lib|txt|pod|imp|bs|dll|ld|bs|inc|xbm|yml|cgi|uu|e2x|skip|packlist|eg|cfg|html|pub|enc|all|ini|po|pot)$/i or /^\w+$/") | less
in the install tree (both top one and F<sitelib> one).
Compress all the DLLs with F<lxlite>. The tiny F<.exe> can be compressed with
C</c:max> (the bug only appears when there is a fixup in the last 6 bytes of a
page (?); since the tiny executables are much smaller than a page, the bug
will not hit). Do not compress C<perl_.exe> - it would not work under DOS.
=item 9.
Now you can generate the binary distribution. This is done by running the
test of the CPAN distribution C<OS2::SoftInstaller>. Tune up the file
F<test.pl> to suit the layout of current version of Perl first. Do not
forget to pack the necessary external DLLs accordingly. Include the
description of the bugs and test suite failures you could not fix. Include
the small-stack versions of Perl executables from Perl build directory.
Include F<perl5.def> so that people can relink the perl DLL preserving
the binary compatibility, or can create compatibility DLLs. Include the diff
files (C<diff -pu old new>) of fixes you did so that people can rebuild your
version. Include F<perl5.map> so that one can use remote debugging.
=item 10.
Share what you did with other people. Relax. Enjoy the fruits of your work.
=item 11.
Brace yourself for thanks, bug reports, hate mail and spam coming as result
of the previous step. No good deed should remain unpunished!
=back
=head1 Building custom F<.EXE> files
The Perl executables can be easily rebuilt at any moment. Moreover, one can
use the I<embedding> interface (see L<perlembed>) to make very customized
executables.
=head2 Making executables with a custom collection of statically loaded extensions
It is a little bit easier to do so while I<decreasing> the list of statically
loaded extensions. We discuss this case only here.
=over
=item 1.
Change to an empty directory, and create a placeholder F<Makefile.PL>:
use ExtUtils::MakeMaker;
WriteMakefile NAME => 'dummy';
=item 2.
Run it with the flavor of Perl (F<perl.exe> or F<perl_.exe>) you want to
rebuild.
perl_ Makefile.PL
=item 3.
Ask it to create new Perl executable:
make perl
(you may need to manually add C<PERLTYPE=-DPERL_CORE> to this commandline on
some versions of Perl; the symptom is that the command-line globbing does not
work from OS/2 shells with the newly-compiled executable; check with
.\perl.exe -wle "print for @ARGV" *
).
=item 4.
The previous step created F<perlmain.c> which contains a list of newXS() calls
near the end. Removing unnecessary calls, and rerunning
make perl
will produce a customized executable.
=back
=head2 Making executables with a custom search-paths
The default perl executable is flexible enough to support most usages.
However, one may want something yet more flexible; for example, one may want
to find the Perl DLL relative to the location of the EXE file; or one may want
to ignore the environment when setting the Perl-library search path, etc.
If you feel comfortable with the I<embedding> interface (see L<perlembed>), such
things are easy to do by repeating the steps outlined in L</Making
executables with a custom collection of statically loaded extensions>, and
doing more comprehensive edits to main() of F<perlmain.c>. The people with
little desire to understand Perl can just rename main(), and do necessary
modification in a custom main() which calls the renamed function in appropriate
time.
However, there is a third way: perl DLL exports the main() function and several
callbacks to customize the search path. Below is a complete example of a
"Perl loader" which
=over
=item 1.
Looks for Perl DLL in the directory C<$exedir/../dll>;
=item 2.
Prepends the above directory to C<BEGINLIBPATH>;
=item 3.
Fails if the Perl DLL found via C<BEGINLIBPATH> is different from what was
loaded on step 1; e.g., another process could have loaded it from C<LIBPATH>
or from a different value of C<BEGINLIBPATH>. In these cases one needs to
modify the setting of the system so that this other process either does not
run, or loads the DLL from C<BEGINLIBPATH> with C<LIBPATHSTRICT=T> (available
with kernels after September 2000).
=item 4.
Loads Perl library from C<$exedir/../dll/lib/>.
=item 5.
Uses Bourne shell from C<$exedir/../dll/sh/ksh.exe>.
=back
For best results compile the C file below with the same options as the Perl
DLL. However, a lot of functionality will work even if the executable is not
an EMX application, e.g., if compiled with
gcc -Wall -DDOSISH -DOS2=1 -O2 -s -Zomf -Zsys perl-starter.c \
-DPERL_DLL_BASENAME=\"perl312F\" -Zstack 8192 -Zlinker /PM:VIO
Here is the sample C file:
#define INCL_DOS
#define INCL_NOPM
/* These are needed for compile if os2.h includes os2tk.h, not
* os2emx.h */
#define INCL_DOSPROCESS
#include <os2.h>
#include "EXTERN.h"
#define PERL_IN_MINIPERLMAIN_C
#include "perl.h"
static char *me;
HMODULE handle;
static void
die_with(char *msg1, char *msg2, char *msg3, char *msg4)
{
ULONG c;
char *s = " error: ";
DosWrite(2, me, strlen(me), &c);
DosWrite(2, s, strlen(s), &c);
DosWrite(2, msg1, strlen(msg1), &c);
DosWrite(2, msg2, strlen(msg2), &c);
DosWrite(2, msg3, strlen(msg3), &c);
DosWrite(2, msg4, strlen(msg4), &c);
DosWrite(2, "\r\n", 2, &c);
exit(255);
}
typedef ULONG (*fill_extLibpath_t)(int type,
char *pre,
char *post,
int replace,
char *msg);
typedef int (*main_t)(int type, char *argv[], char *env[]);
typedef int (*handler_t)(void* data, int which);
#ifndef PERL_DLL_BASENAME
# define PERL_DLL_BASENAME "perl"
#endif
static HMODULE
load_perl_dll(char *basename)
{
char buf[300], fail[260];
STRLEN l, dirl;
fill_extLibpath_t f;
ULONG rc_fullname;
HMODULE handle, handle1;
if (_execname(buf, sizeof(buf) - 13) != 0)
die_with("Can't find full path: ", strerror(errno), "", "");
/* XXXX Fill 'me' with new value */
l = strlen(buf);
while (l && buf[l-1] != '/' && buf[l-1] != '\\')
l--;
dirl = l - 1;
strcpy(buf + l, basename);
l += strlen(basename);
strcpy(buf + l, ".dll");
if ( (rc_fullname = DosLoadModule(fail, sizeof fail, buf, &handle))
!= 0
&& DosLoadModule(fail, sizeof fail, basename, &handle) != 0 )
die_with("Can't load DLL ", buf, "", "");
if (rc_fullname)
return handle; /* was loaded with short name; all is fine */
if (DosQueryProcAddr(handle, 0, "fill_extLibpath", (PFN*)&f))
die_with(buf,
": DLL exports no symbol ",
"fill_extLibpath",
"");
buf[dirl] = 0;
if (f(0 /*BEGINLIBPATH*/, buf /* prepend */, NULL /* append */,
0 /* keep old value */, me))
die_with(me, ": prepending BEGINLIBPATH", "", "");
if (DosLoadModule(fail, sizeof fail, basename, &handle1) != 0)
die_with(me,
": finding perl DLL again via BEGINLIBPATH",
"",
"");
buf[dirl] = '\\';
if (handle1 != handle) {
if (DosQueryModuleName(handle1, sizeof(fail), fail))
strcpy(fail, "???");
die_with(buf,
":\n\tperl DLL via BEGINLIBPATH is different: \n\t",
fail,
"\n\tYou may need to manipulate global BEGINLIBPATH"
" and LIBPATHSTRICT"
"\n\tso that the other copy is loaded via"
" BEGINLIBPATH.");
}
return handle;
}
int
main(int argc, char **argv, char **env)
{
main_t f;
handler_t h;
me = argv[0];
/**/
handle = load_perl_dll(PERL_DLL_BASENAME);
if (DosQueryProcAddr(handle,
0,
"Perl_OS2_handler_install",
(PFN*)&h))
die_with(PERL_DLL_BASENAME,
": DLL exports no symbol ",
"Perl_OS2_handler_install",
"");
if ( !h((void *)"~installprefix", Perlos2_handler_perllib_from)
|| !h((void *)"~dll", Perlos2_handler_perllib_to)
|| !h((void *)"~dll/sh/ksh.exe", Perlos2_handler_perl_sh) )
die_with(PERL_DLL_BASENAME,
": Can't install @INC manglers",
"",
"");
if (DosQueryProcAddr(handle, 0, "dll_perlmain", (PFN*)&f))
die_with(PERL_DLL_BASENAME,
": DLL exports no symbol ",
"dll_perlmain",
"");
return f(argc, argv, env);
}
=head1 Build FAQ
=head2 Some C</> became C<\> in pdksh.
You have a very old pdksh. See L</Prerequisites>.
=head2 C<'errno'> - unresolved external
You do not have MT-safe F<db.lib>. See L</Prerequisites>.
=head2 Problems with tr or sed
reported with very old version of tr and sed.
=head2 Some problem (forget which ;-)
You have an older version of F<perl.dll> on your LIBPATH, which
broke the build of extensions.
=head2 Library ... not found
You did not run C<omflibs>. See L</Prerequisites>.
=head2 Segfault in make
You use an old version of GNU make. See L</Prerequisites>.
=head2 op/sprintf test failure
This can result from a bug in emx sprintf which was fixed in 0.9d fix 03.
=head1 Specific (mis)features of OS/2 port
=head2 C<setpriority>, C<getpriority>
Note that these functions are compatible with *nix, not with the older
ports of '94 - 95. The priorities are absolute, go from 32 to -95,
lower is quicker. 0 is the default priority.
B<WARNING>. Calling C<getpriority> on a non-existing process could lock
the system before Warp3 fixpak22. Starting with Warp3, Perl will use
a workaround: it aborts getpriority() if the process is not present.
This is not possible on older versions C<2.*>, and has a race
condition anyway.
=head2 C<system()>
Multi-argument form of C<system()> allows an additional numeric
argument. The meaning of this argument is described in
L<OS2::Process>.
When finding a program to run, Perl first asks the OS to look for executables
on C<PATH> (OS/2 adds extension F<.exe> if no extension is present).
If not found, it looks for a script with possible extensions
added in this order: no extension, F<.cmd>, F<.btm>,
F<.bat>, F<.pl>. If found, Perl checks the start of the file for magic
strings C<"#!"> and C<"extproc ">. If found, Perl uses the rest of the
first line as the beginning of the command line to run this script. The
only mangling done to the first line is extraction of arguments (currently
up to 3), and ignoring of the path-part of the "interpreter" name if it can't
be found using the full path.
E.g., C<system 'foo', 'bar', 'baz'> may lead Perl to finding
F<C:/emx/bin/foo.cmd> with the first line being
extproc /bin/bash -x -c
If F</bin/bash.exe> is not found, then Perl looks for an executable F<bash.exe> on
C<PATH>. If found in F<C:/emx.add/bin/bash.exe>, then the above system() is
translated to
system qw(C:/emx.add/bin/bash.exe -x -c C:/emx/bin/foo.cmd bar baz)
One additional translation is performed: instead of F</bin/sh> Perl uses
the hardwired-or-customized shell (see L</"C<PERL_SH_DIR>">).
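The order in which script extensions are tried can be sketched in C as follows. This is a minimal illustration and not the code Perl uses: C<find_script> is a hypothetical name, the C<present> array is a stand-in for asking the file system whether a file exists, and names are compared exactly, although OS/2 file systems ignore case.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Extensions tried for a script, in the order given in the text. */
static const char *const script_ext[] = { "", ".cmd", ".btm", ".bat", ".pl" };

/* Return the index into script_ext[] of the first candidate name found
 * in present[] (a hypothetical stand-in for a "file exists" check),
 * leaving that candidate in out.  Return -1 if no candidate is present. */
static int
find_script(const char *name, const char *const *present, size_t npresent,
            char *out, size_t outsz)
{
    size_t e, i;

    for (e = 0; e < sizeof script_ext / sizeof script_ext[0]; e++) {
        int n = snprintf(out, outsz, "%s%s", name, script_ext[e]);

        if (n < 0 || (size_t)n >= outsz)
            continue;               /* candidate does not fit in out */
        for (i = 0; i < npresent; i++)
            if (strcmp(out, present[i]) == 0)
                return (int)e;
    }
    if (outsz)
        out[0] = '\0';
    return -1;
}
```

With both F<foo.bat> and F<foo.cmd> present, a request for C<foo> resolves to F<foo.cmd>, since F<.cmd> comes earlier in the list.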
The above search for "interpreter" is recursive: if F<bash> executable is not
found, but F<bash.btm> is found, Perl will investigate its first line etc.
The only hardwired limit on the recursion depth is implicit: there is a limit
of 4 on the number of additional arguments inserted before the actual arguments
given to system(). In particular, if no additional arguments are specified
on the "magic" first lines, then the limit on the depth is 4.
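How a "magic" first line might be split into an interpreter name and its arguments can be sketched in C as follows. This is an illustration under stated assumptions, not the parser Perl uses: C<parse_magic_line>, C<next_token> and C<struct magic_line> are hypothetical names, tokens are assumed to be separated by spaces or tabs, the magic strings are matched case-sensitively, and anything after the third argument is simply dropped.

```c
#include <stddef.h>
#include <string.h>

#define MAGIC_MAX_ARGS  3       /* "currently up to 3" in the text */
#define MAGIC_TOKEN_LEN 260

struct magic_line {
    char interp[MAGIC_TOKEN_LEN];               /* interpreter name */
    char args[MAGIC_MAX_ARGS][MAGIC_TOKEN_LEN]; /* extracted arguments */
    int  nargs;
};

/* Copy the next blank-delimited token of *p into dst and advance *p.
 * Return 1 if a token was found, 0 at end of line.  Tokens longer
 * than dstsz - 1 characters are truncated. */
static int
next_token(const char **p, char *dst, size_t dstsz)
{
    const char *s = *p;
    size_t n = 0;

    while (*s == ' ' || *s == '\t')
        s++;
    if (*s == '\0' || *s == '\r' || *s == '\n') {
        *p = s;
        return 0;
    }
    while (*s && *s != ' ' && *s != '\t' && *s != '\r' && *s != '\n') {
        if (n + 1 < dstsz)
            dst[n++] = *s;
        s++;
    }
    dst[n] = '\0';
    *p = s;
    return 1;
}

/* Parse a first line starting with "#!" or "extproc ".  Return 1 and
 * fill *m on success, 0 if the line carries no magic string or names
 * no interpreter. */
static int
parse_magic_line(const char *line, struct magic_line *m)
{
    if (strncmp(line, "#!", 2) == 0)
        line += 2;
    else if (strncmp(line, "extproc ", 8) == 0)
        line += 8;
    else
        return 0;

    m->nargs = 0;
    if (!next_token(&line, m->interp, sizeof m->interp))
        return 0;
    while (m->nargs < MAGIC_MAX_ARGS
           && next_token(&line, m->args[m->nargs], MAGIC_TOKEN_LEN))
        m->nargs++;
    return 1;
}
```

For the line C<extproc /bin/bash -x -c> from the example above, this yields the interpreter F</bin/bash> and the two arguments C<-x> and C<-c>; the script name and the arguments given to system() would then be appended after them.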
If Perl finds that the executable it located is of PM type when the
current session is not, it will start the new process in a separate session of
the necessary type. Call via C<OS2::Process> to disable this magic.
B<WARNING>. Due to the described logic, you need to explicitly
specify F<.com> extension if needed. Moreover, if the executable
F<perl5.6.1> is requested, Perl will not look for F<perl5.6.1.exe>.
[This may change in the future.]
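The translation of a "magic" first line can be sketched in plain Perl.
This is an illustration only, not the code Perl runs internally: the helper
name C<command_from_first_line> is hypothetical, the cap of 3 extracted
arguments is taken from the text above, and the search for the interpreter
on C<PATH> is not modelled.

```perl
use strict;
use warnings;

# Hypothetical helper: mimic how a "magic" first line becomes a command list.
sub command_from_first_line {
    my ($first_line, $script, @args) = @_;

    # Only "#!" and "extproc " are recognised as magic prefixes.
    return unless $first_line =~ s/^(?:#!|extproc )\s*//;

    my ($interp, @extra) = split ' ', $first_line;
    return unless defined $interp;

    # Keep at most 3 extracted arguments, as described in the text.
    splice @extra, 3 if @extra > 3;

    return ($interp, @extra, $script, @args);
}

my @cmd = command_from_first_line(
    'extproc /bin/bash -x -c', 'C:/emx/bin/foo.cmd', 'bar', 'baz');
print "@cmd\n";
```

In the real lookup F</bin/bash> would then be resolved to an executable
such as F<C:/emx.add/bin/bash.exe>, as in the example above.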
=head2 C<extproc> on the first line
If the first chars of a Perl script are C<"extproc ">, this line is treated
as a C<#!>-line, thus all the switches on this line are processed (twice
if the script was started via cmd.exe). See L<perlrun/DESCRIPTION>.
=head2 Additional modules:
L<OS2::Process>, L<OS2::DLL>, L<OS2::REXX>, L<OS2::PrfDB>, L<OS2::ExtAttr>. These
modules provide access to the additional numeric argument for C<system>
and to the information about the running process,
to DLLs having functions with REXX signature and to the REXX runtime, to
OS/2 databases in the F<.INI> format, and to Extended Attributes.
Two additional extensions by Andreas Kaiser, C<OS2::UPM> and
C<OS2::FTP>, are included in the C<ILYAZ> directory, mirrored on CPAN.
Other OS/2-related extensions are available too.
=head2 Prebuilt methods:
=over 4
=item C<File::Copy::syscopy>
used by C<File::Copy::copy>, see L<File::Copy>.
=item C<DynaLoader::mod2fname>
used by C<DynaLoader> for DLL name mangling.
=item C<Cwd::current_drive()>
Self explanatory.
=item C<Cwd::sys_chdir(name)>
leaves drive as it is.
=item C<Cwd::change_drive(name)>
changes the "current" drive.
=item C<Cwd::sys_is_absolute(name)>
means has drive letter and is_rooted.
=item C<Cwd::sys_is_rooted(name)>
means has leading C<[/\\]> (maybe after a drive-letter:).
=item C<Cwd::sys_is_relative(name)>
means changes with current dir.
=item C<Cwd::sys_cwd(name)>
Interface to cwd from EMX. Used by C<Cwd::cwd>.
=item C<Cwd::sys_abspath(name, dir)>
Really really odious function to implement. Returns absolute name of
file which would have C<name> if CWD were C<dir>. C<Dir> defaults to the
current dir.
=item C<Cwd::extLibpath([type])>
Get current value of extended library search path. If C<type> is
present and positive, works with C<END_LIBPATH>, if negative, works
with C<LIBPATHSTRICT>, otherwise with C<BEGIN_LIBPATH>.
=item C<Cwd::extLibpath_set( path [, type ] )>
Set current value of extended library search path. If C<type> is
present and positive, works with C<END_LIBPATH>, if negative, works
with C<LIBPATHSTRICT>, otherwise with C<BEGIN_LIBPATH>.
=item C<OS2::Error(do_harderror,do_exception)>
Returns C<undef> if it was not called yet, otherwise bit 1 is
set if on the previous call do_harderror was enabled, bit
2 is set if on previous call do_exception was enabled.
This function enables/disables error popups associated with
hardware errors (Disk not ready etc.) and software exceptions.
I know of no way to find out the state of popups I<before> the first call
to this function.
=item C<OS2::Errors2Drive(drive)>
Returns C<undef> if it was not called yet, otherwise return false if errors
were not requested to be written to a hard drive, or the drive letter if
this was requested.
This function may redirect error popups associated with hardware errors
(Disk not ready etc.) and software exceptions to the file POPUPLOG.OS2 at
the root directory of the specified drive. Overrides OS2::Error() specified
by individual programs. Given argument undef will disable redirection.
Has global effect, persists after the application exits.
I know of no way to find out the state of redirection of popups to the disk
I<before> the first call to this function.
=item OS2::SysInfo()
Returns a hash with system information. The keys of the hash are
MAX_PATH_LENGTH, MAX_TEXT_SESSIONS, MAX_PM_SESSIONS,
MAX_VDM_SESSIONS, BOOT_DRIVE, DYN_PRI_VARIATION,
MAX_WAIT, MIN_SLICE, MAX_SLICE, PAGE_SIZE,
VERSION_MAJOR, VERSION_MINOR, VERSION_REVISION,
MS_COUNT, TIME_LOW, TIME_HIGH, TOTPHYSMEM, TOTRESMEM,
TOTAVAILMEM, MAXPRMEM, MAXSHMEM, TIMER_INTERVAL,
MAX_COMP_LENGTH, FOREGROUND_FS_SESSION,
FOREGROUND_PROCESS
=item OS2::BootDrive()
Returns a letter without colon.
=item C<OS2::MorphPM(serve)>, C<OS2::UnMorphPM(serve)>
Transforms the current application into a PM application and back.
The argument true means that a real message loop is going to be served.
OS2::MorphPM() returns the PM message queue handle as an integer.
See L</"Centralized management of resources"> for additional details.
=item C<OS2::Serve_Messages(force)>
Fake on-demand retrieval of outstanding PM messages. If C<force> is false,
will not dispatch messages if a real message loop is known to
be present. Returns number of messages retrieved.
Dies with "QUITing..." if WM_QUIT message is obtained.
=item C<OS2::Process_Messages(force [, cnt])>
Retrieval of PM messages until window creation/destruction.
If C<force> is false, will not dispatch messages if a real message loop
is known to be present.
Returns change in number of windows. If C<cnt> is given,
it is incremented by the number of messages retrieved.
Dies with "QUITing..." if WM_QUIT message is obtained.
=item C<OS2::_control87(new,mask)>
the same as L<_control87(3)> of EMX. Takes integers as arguments, returns
the previous coprocessor control word as an integer. Only bits in C<new> which
are present in C<mask> are changed in the control word.
=item OS2::get_control87()
gets the coprocessor control word as an integer.
=item C<OS2::set_control87_em(new=MCW_EM,mask=MCW_EM)>
The variant of OS2::_control87() with default values good for
handling exception mask: if no C<mask>, uses exception mask part of C<new>
only. If no C<new>, disables all the floating point exceptions.
See L</"Misfeatures"> for details.
=item C<OS2::DLLname([how [, \&xsub]])>
Gives the information about the Perl DLL or the DLL containing the C
function bound to by C<&xsub>. The meaning of C<how> is: default (2):
full name; 0: handle; 1: module name.
=back
(Note that some of these may be moved to different libraries -
eventually).
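To make the terminology of the C<Cwd::sys_is_*> predicates concrete, here is
a rough pure-Perl approximation of "rooted" and "absolute" as they are
described above. The names C<looks_rooted> and C<looks_absolute> are
hypothetical, and the real functions are prebuilt into Perl on OS/2 and may
treat corner cases differently.

```perl
use strict;
use warnings;

# Hypothetical approximations, not the prebuilt OS/2 functions.

# "Rooted": a leading / or \, possibly after a drive letter and colon.
sub looks_rooted {
    my ($name) = @_;
    return $name =~ m{^(?:[A-Za-z]:)?[/\\]} ? 1 : 0;
}

# "Absolute": has a drive letter and is rooted.
sub looks_absolute {
    my ($name) = @_;
    return $name =~ m{^[A-Za-z]:[/\\]} ? 1 : 0;
}

for my $name ('C:/emx/bin', '/emx/bin', 'C:emx', 'emx\\bin') {
    printf "%-12s rooted=%d absolute=%d\n",
        $name, looks_rooted($name), looks_absolute($name);
}
```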
=head2 Prebuilt variables:
=over 4
=item $OS2::emx_rev
numeric value is the same as _emx_rev of EMX, a string value the same
as _emx_vprt (similar to C<0.9c>).
=item $OS2::emx_env
same as _emx_env of EMX, a number similar to 0x8001.
=item $OS2::os_ver
a number C<OS_MAJOR + 0.001 * OS_MINOR>.
=item $OS2::is_aout
true if the Perl library was compiled in AOUT format.
=item $OS2::can_fork
true if the current executable is an AOUT EMX executable, so Perl can
fork. Do not use this, use the portable check for
$Config::Config{d_fork}.
=item $OS2::nsyserror
This variable (default is 1) controls whether to enforce the contents
of $^E to start with a C<SYS0003>-like id. If set to 0, then the string
value of $^E is what is available from the OS/2 message file. (Some
messages in this file have a C<SYS0003>-like id prepended, some not.)
=back
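Since C<$OS2::os_ver> packs two numbers into one as
C<OS_MAJOR + 0.001 * OS_MINOR>, a script may want to split them again. A
minimal sketch, assuming the minor number is below 1000; the helper name
C<split_os_ver> and the sample value are made up for illustration, and on a
real system one would pass C<$OS2::os_ver>.

```perl
use strict;
use warnings;

# Hypothetical helper: undo OS_MAJOR + 0.001 * OS_MINOR.
sub split_os_ver {
    my ($os_ver) = @_;
    my $major = int $os_ver;
    # Round rather than truncate, since the fraction is not exact in binary.
    my $minor = sprintf '%.0f', ($os_ver - $major) * 1000;
    return ($major, $minor + 0);
}

my ($major, $minor) = split_os_ver(20.040);   # sample value only
print "major=$major minor=$minor\n";
```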
=head2 Misfeatures
=over 4
=item *
Since L<flock(3)> is present in EMX, but is not functional, it is
emulated by perl. To disable the emulations, set environment variable
C<USE_PERL_FLOCK=0>.
=item *
Here is the list of things which may be "broken" on
EMX (from EMX docs):
=over 4
=item *
The functions L<recvmsg(3)>, L<sendmsg(3)>, and L<socketpair(3)> are not
implemented.
=item *
L<sock_init(3)> is not required and not implemented.
=item *
L<flock(3)> is not yet implemented (dummy function). (Perl has a workaround.)
=item *
L<kill(3)>: Special treatment of PID=0, PID=1 and PID=-1 is not implemented.
=item *
L<waitpid(3)>: C<WUNTRACED> is not implemented; waitpid() is also not
implemented for negative values of PID.
=back
Note that C<kill -9> does not work with the current version of EMX.
=item *
See L</"Text-mode filehandles">.
=item *
Unix-domain sockets on OS/2 live in a pseudo-file-system C</sockets/...>.
To avoid a failure to create a socket with a name of a different form,
C<"/socket/"> is prepended to the socket name (unless it starts with this
already).
This may lead to problems later in case the socket is accessed via the
"usual" file-system calls using the "initial" name.
=item *
Apparently, IBM used a compiler (for some period of time around '95?) which
changes FP mask right and left. This is not I<that> bad for IBM's
programs, but the same compiler was used for DLLs which are used with
general-purpose applications. When these DLLs are used, the state of
floating-point flags in the application is not predictable.
What is much worse, some DLLs change the floating point flags when in
_DLLInitTerm() (e.g., F<TCP32IP>). This means that even if you do not I<call>
any function in the DLL, just the act of loading this DLL will reset your
flags. What is worse, the same compiler was used to compile some HOOK DLLs.
Given that HOOK DLLs are executed in the context of I<all> the applications
in the system, this means a complete unpredictability of floating point
flags on systems using such HOOK DLLs. E.g., F<GAMESRVR.DLL> of B<DIVE>
origin changes the floating point flags on each write to the TTY of a VIO
(windowed text-mode) application.
Some other (not completely debugged) situations when FP flags change include
some video drivers (?), and some operations related to creation of the windows.
People who code B<OpenGL> may have more experience on this.
Perl is generally used in the situation when all the floating-point
exceptions are ignored, as is the default under EMX. If they are not ignored,
some benign Perl programs would get a C<SIGFPE> and would die a horrible death.
To circumvent this, Perl uses two hacks. They help against I<one> type of
damage only: FP flags changed when loading a DLL.
One of the hacks is to disable floating point exceptions on Perl startup (as
is the default with EMX). This helps only with compile-time-linked DLLs
changing the flags before main() had a chance to be called.
The other hack is to restore FP flags after a call to dlopen(). This helps
against similar damage done by a DLL's _DLLInitTerm() at runtime. Currently
no way to switch these hacks off is provided.
=back
=head2 Modifications
Perl modifies some standard C library calls in the following ways:
=over 9
=item C<popen>
C<my_popen> uses F<sh.exe> if shell is required, cf. L</"C<PERL_SH_DIR>">.
=item C<tmpnam>
is created using C<TMP> or C<TEMP> environment variable, via
C<tempnam>.
=item C<tmpfile>
If the current directory is not writable, file is created using modified
C<tmpnam>, so there may be a race condition.
=item C<ctermid>
a dummy implementation.
=item C<stat>
C<os2_stat> special-cases F</dev/tty> and F</dev/con>.
=item C<mkdir>, C<rmdir>
these EMX functions do not work if the path contains a trailing C</>.
Perl contains a workaround for this.
=item C<flock>
Since L<flock(3)> is present in EMX, but is not functional, it is
emulated by perl. To disable the emulations, set environment variable
C<USE_PERL_FLOCK=0>.
=back
=head2 Identifying DLLs
All the DLLs built with the current versions of Perl have ID strings
identifying the name of the extension, its version, and the version
of Perl required for this DLL. Run C<bldlevel DLL-name> to find this
info.
=head2 Centralized management of resources
Since calling certain OS/2 APIs requires a correctly initialized
C<Win> subsystem, OS/2-specific extensions may require getting C<HAB>s and
C<HMQ>s. If an extension did this on its own, another extension could
fail to initialize.
Perl provides a centralized management of these resources:
=over
=item C<HAB>
To get the HAB, the extension should call C<hab = perl_hab_GET()> in C. After
this call is performed, C<hab> may be accessed as C<Perl_hab>. There is
no need to release the HAB after it is used.
If for some reason F<perl.h> cannot be included, use
extern int Perl_hab_GET(void);
instead.
=item C<HMQ>
There are two cases:
=over
=item *
the extension needs an C<HMQ> only because some API will not work otherwise.
Use C<serve = 0> below.
=item *
the extension needs an C<HMQ> since it wants to engage in a PM event loop.
Use C<serve = 1> below.
=back
To get an C<HMQ>, the extension should call C<hmq = perl_hmq_GET(serve)> in C.
After this call is performed, C<hmq> may be accessed as C<Perl_hmq>.
To signal to Perl that the HMQ is not needed any more, call
C<perl_hmq_UNSET(serve)>. The Perl process will automatically morph/unmorph itself
into/from a PM process if the HMQ is needed/not-needed. Perl will automatically
enable/disable the C<WM_QUIT> message during shutdown if the message queue is
served/not-served.
B<NOTE>. If during a shutdown there is a message queue which did not disable
WM_QUIT, and which did not process the received WM_QUIT message, the
shutdown will be automatically cancelled. Do not call C<perl_hmq_GET(1)>
unless you are going to process messages on an orderly basis.
=item Treating errors reported by OS/2 API
There are two principal conventions (it is useful to call them C<Dos*>
and C<Win*> - though this part of the function signature is not always
determined by the name of the API) of reporting the error conditions
of OS/2 API. Most of C<Dos*> APIs report the error code as the result
of the call (so 0 means success, and there are many types of errors).
Most C<Win*> APIs report success/fail via the result being
C<TRUE>/C<FALSE>; to find the reason for the failure one should call
the WinGetLastError() API.
Some C<Win*> entry points also overload a "meaningful" return value
with the error indicator; having a 0 return value indicates an error.
Yet some other C<Win*> entry points overload things even more, and 0
return value may mean a successful call returning a valid value 0, as
well as an error condition; in the case of a 0 return value one should
call WinGetLastError() API to distinguish a successful call from a
failing one.
By convention, all the calls to OS/2 API should indicate their
failures by resetting $^E. All the Perl-accessible functions which
call OS/2 API may be broken into two classes: some die() when an API
error is encountered, the others report the error via a false return
value (of course, this does not concern Perl-accessible functions
which I<expect> a failure of the OS/2 API call, having some workarounds
coded).
Obviously, in the situation of the last type of the signature of an OS/2
API, it is much more convenient for the users if the failure is
indicated by die()ing: one does not need to check $^E to know that
something went wrong. If, however, this solution is not desirable for
some reason, the code in question should reset $^E to 0 before making
this OS/2 API call, so that the caller of this Perl-accessible
function has a chance to distinguish a success-but-0-return value from
a failure. (One may return undef as an alternative way of reporting
an error.)
The macros to simplify this type of error propagation are
=over
=item C<CheckOSError(expr)>
Returns true on error, sets $^E. Expects expr() to be a call of
C<Dos*>-style API.
=item C<CheckWinError(expr)>
Returns true on error, sets $^E. Expects expr() to be a call of
C<Win*>-style API.
=item C<SaveWinError(expr)>
Returns C<expr>, sets $^E from WinGetLastError() if C<expr> is false.
=item C<SaveCroakWinError(expr,die,name1,name2)>
Returns C<expr>, sets $^E from WinGetLastError() if C<expr> is false,
and die()s if C<die> and $^E are true. The message to die is the
concatenated strings C<name1> and C<name2>, separated by C<": "> from
the contents of $^E.
=item C<WinError_2_Perl_rc>
Sets C<Perl_rc> to the return value of WinGetLastError().
=item C<FillWinError>
Sets C<Perl_rc> to the return value of WinGetLastError(), and sets $^E
to the corresponding value.
=item C<FillOSError(rc)>
Sets C<Perl_rc> to C<rc>, and sets $^E to the corresponding value.
=back
=item Loading DLLs and ordinals in DLLs
Some DLLs are only present in some versions of OS/2, or in some
configurations of OS/2. Some exported entry points are present only
in DLLs shipped with some versions of OS/2. If these DLLs and entry
points were linked directly for a Perl executable/DLL or from a Perl
extension, this binary would work only with the specified
versions/setups. Even if these entry points were not needed, the
I<load> of the executable (or DLL) would fail.
For example, many newer useful APIs are not present in OS/2 v2; many
PM-related APIs require DLLs not available on floppy-boot setup.
To make these calls fail I<only when the calls are executed>, one
should call these APIs via a dynamic linking API. There is a subsystem
in Perl to simplify such calls. A large number of entry
points available for such linking is provided (see C<entries_ordinals>
- and also C<PMWIN_entries> - in F<os2ish.h>). These ordinals can be
accessed via the APIs:
CallORD(), DeclFuncByORD(), DeclVoidFuncByORD(),
DeclOSFuncByORD(), DeclWinFuncByORD(), AssignFuncPByORD(),
DeclWinFuncByORD_CACHE(), DeclWinFuncByORD_CACHE_survive(),
DeclWinFuncByORD_CACHE_resetError_survive(),
DeclWinFunc_CACHE(), DeclWinFunc_CACHE_resetError(),
DeclWinFunc_CACHE_survive(), DeclWinFunc_CACHE_resetError_survive()
See the header files and the C code in the supplied OS/2-related
modules for the details on usage of these functions.
Some of these functions also combine dynaloading semantic with the
error-propagation semantic discussed above.
=back
=head1 Perl flavors
Because of idiosyncrasies of OS/2 one cannot have all the eggs in the
same basket (though the EMX environment tries hard to overcome these
limitations, so the situation may somewhat improve). There are 4
executables for Perl provided by the distribution:
=head2 F<perl.exe>
The main workhorse. This is a chimera executable: it is compiled as an
C<a.out>-style executable, but is linked with C<omf>-style dynamic
library F<perl.dll>, and with dynamic CRT DLL. This executable is a
VIO application.
It can load perl dynamic extensions, and it can fork().
B<Note.> Keep in mind that fork() is needed to open a pipe to yourself.
=head2 F<perl_.exe>
This is a statically linked C<a.out>-style executable. It cannot
load dynamic Perl extensions. The executable supplied in binary
distributions has a lot of extensions prebuilt, thus the above restriction is
important only if you use custom-built extensions. This executable is a VIO
application.
I<This is the only executable which does not require OS/2.> The
friends locked into C<M$> world would appreciate the fact that this
executable runs under DOS, Win0.3*, Win0.95 and WinNT with an
appropriate extender. See L</"Other OSes">.
=head2 F<perl__.exe>
This is the same executable as F<perl___.exe>, but it is a PM
application.
B<Note.> Usually (unless explicitly redirected during the startup)
STDIN, STDERR, and STDOUT of a PM
application are redirected to F<nul>. However, it is possible to I<see>
them if you start C<perl__.exe> from a PM program which emulates a
console window, like I<Shell mode> of Emacs or EPM. Thus it I<is
possible> to use Perl debugger (see L<perldebug>) to debug your PM
application (but beware of the message loop lockups - this will not
work if you have a message queue to serve, unless you hook the serving
into the getc() function of the debugger).
Another way to see the output of a PM program is to run it as
pm_prog args 2>&1 | cat -
with a shell I<different> from F<cmd.exe>, so that it does not create
a link between a VIO session and the session of C<pm_prog>. (Such a link
closes the VIO window.) E.g., this works with F<sh.exe> - or with Perl!
open P, 'pm_prog args 2>&1 |' or die;
print while <P>;
The flavor F<perl__.exe> is required if you want to start your program without
a VIO window present, but not C<detach>ed (run C<help detach> for more info).
Very useful for extensions which use PM, like C<Perl/Tk> or C<OpenGL>.
Note also that the differences between PM and VIO executables are only
in the I<default> behaviour. One can start I<any> executable in
I<any> kind of session by using the C</fs>, C</pm> or
C</win> switches of the command C<start> (of F<CMD.EXE> or a similar
shell). Alternatively, one can use the numeric first argument of the
C<system> Perl function (see L<OS2::Process>).
=head2 F<perl___.exe>
This is an C<omf>-style executable which is dynamically linked to
F<perl.dll> and CRT DLL. I know of no advantages of this executable
over C<perl.exe>, but it cannot fork() at all. Well, one advantage is
that the build process is not so convoluted as with C<perl.exe>.
It is a VIO application.
=head2 Why strange names?
Since Perl processes the C<#!>-line (cf.
L<perlrun/DESCRIPTION>, L<perlrun/Command Switches>,
L<perldiag/"No Perl script found in input">), it should know when a
program I<is a Perl>. There is some naming convention which allows
Perl to distinguish correct lines from wrong ones. The above names are
almost the only names allowed by this convention which do not contain
digits (which have absolutely different semantics).
=head2 Why dynamic linking?
Well, having several executables dynamically linked to the same huge
library has its advantages, but this would not substantiate the
additional work to make it compile. The reason is the complicated-to-developers
but very quick and convenient-to-users "hard" dynamic linking used by OS/2.
There are two distinctive features of the dyna-linking model of OS/2:
first, all the references to external functions are resolved at compile time;
second, there is no runtime fixup of the DLLs after they are loaded into memory.
The first feature is an enormous advantage over other models: it avoids
conflicts when several DLLs used by an application export entries with
the same name. In such cases "other" models of dyna-linking just choose
between these two entry points using some random criterion - with predictable
disasters as results. But it is the second feature which requires the build
of F<perl.dll>.
The address tables of DLLs are patched only once, when they are
loaded. The addresses of the entry points into DLLs are guaranteed to be
the same for all the programs which use the same DLL. This removes the
runtime fixup - once a DLL is loaded, its code is read-only.
While this allows some (significant?) performance advantages, this makes life
much harder for developers, since the above scheme makes it impossible
for a DLL to be "linked" to a symbol in the F<.EXE> file. Indeed, this
would need a DLL to have different relocation tables for the
(different) executables which use this DLL.
However, a dynamically loaded Perl extension is forced to use some symbols
from the perl
executable, e.g., to know how to find the arguments to the functions:
the arguments live on the perl
internal evaluation stack. The solution is to put the main code of
the interpreter into a DLL, and make the F<.EXE> file which just loads
this DLL into memory and supplies command-arguments. The extension DLL
cannot link to symbols in F<.EXE>, but it has no problem linking
to symbols in the F<.DLL>.
This I<greatly> increases the load time for the application (as well as
the complexity of the compilation). Since the interpreter is in a DLL,
the C RTL is basically forced to reside in a DLL as well (otherwise
extensions would not be able to use CRT). There are some advantages if
you use different flavors of perl, such as running F<perl.exe> and
F<perl__.exe> simultaneously: they share the memory of F<perl.dll>.
B<NOTE>. There is one additional effect which makes DLLs more wasteful:
DLLs are loaded in the shared memory region, which is a scarce resource
given the 512M barrier of the "standard" OS/2 virtual memory. The code of
F<.EXE> files is also shared by all the processes which use the particular
F<.EXE>, but they are "shared in the private address space of the process";
this is possible because the address at which different sections
of the F<.EXE> file are loaded is decided at compile-time, thus all the
processes have these sections loaded at the same addresses, and no fixup
of internal links inside the F<.EXE> is needed.
Since DLLs may be loaded at run time, to have the same mechanism for DLLs
one needs to have the address range of I<any of the loaded> DLLs in the
system to be available I<in all the processes> which did not load a particular
DLL yet. This is why the DLLs are mapped to the shared memory region.
=head2 Why chimera build?
Current EMX environment does not allow DLLs compiled using Unixish
C<a.out> format to export symbols for data (or at least some types of
data). This forces C<omf>-style compile of F<perl.dll>.
Current EMX environment does not allow F<.EXE> files compiled in
C<omf> format to fork(). fork() is needed for exactly three Perl
operations:
=over 4
=item *
explicit fork() in the script,
=item *
C<open FH, "|-">
=item *
C<open FH, "-|">, in other words, opening pipes to itself.
=back
While these operations are not questions of life and death, they are
needed for a lot of
useful scripts. This forces C<a.out>-style compile of
F<perl.exe>.
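The dependence of C<open FH, "-|"> on fork() can be checked portably before
it is used. The following is a minimal sketch, not OS/2-specific code: the
subroutine name C<read_from_forked_child> is made up, and the only
assumption is the standard C<d_fork> entry of C<%Config>.

```perl
use strict;
use warnings;
use Config;

$| = 1;    # keep STDOUT unbuffered so nothing pending is duplicated by fork

sub read_from_forked_child {
    # Without fork() support "-|" cannot work, so give up early.
    return unless $Config{d_fork};

    my $pid = open(my $from_kid, '-|');
    die "cannot fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: STDOUT is connected to the parent's read handle.
        print "hello from the child\n";
        exit 0;
    }

    # Parent: read whatever the child printed.
    my @lines = <$from_kid>;
    close $from_kid;
    return @lines;
}

print read_from_forked_child();
```

With an executable that cannot fork(), such as F<perl___.exe>, a script
written this way can fall back to some other mechanism instead of dying
inside open().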
=head1 ENVIRONMENT
Here we list environment variables which are either OS/2- and DOS- and
Win*-specific, or are more important under OS/2 than under other OSes.
=head2 C<PERLLIB_PREFIX>
Specific for EMX port. Should have the form
path1;path2
or
path1 path2
If the beginning of some prebuilt path matches F<path1>, it is
substituted with F<path2>.
Should be used if the perl library is moved from the default
location in preference to C<PERL(5)LIB>, since this would not leave wrong
entries in @INC. For example, if the compiled version of perl looks for @INC
in F<f:/perllib/lib>, and you want to install the library in
F<h:/opt/gnu>, do
set PERLLIB_PREFIX=f:/perllib/lib;h:/opt/gnu
This will cause Perl with the prebuilt @INC of
f:/perllib/lib/5.00553/os2
f:/perllib/lib/5.00553
f:/perllib/lib/site_perl/5.00553/os2
f:/perllib/lib/site_perl/5.00553
.
to use the following @INC:
h:/opt/gnu/5.00553/os2
h:/opt/gnu/5.00553
h:/opt/gnu/site_perl/5.00553/os2
h:/opt/gnu/site_perl/5.00553
.
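The effect of C<PERLLIB_PREFIX> on the prebuilt paths can be imitated with a
few lines of Perl. This is a simplified sketch, not the code used by the EMX
port: the helper name C<apply_perllib_prefix> is hypothetical, and matching
here is an exact, case-sensitive prefix comparison, which the real
implementation need not share.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite paths the way PERLLIB_PREFIX is described.
sub apply_perllib_prefix {
    my ($spec, @paths) = @_;

    # The two documented forms: "path1;path2" or "path1 path2".
    my ($from, $to) = split /[; ]/, $spec, 2;
    return @paths unless defined($from) && defined($to) && length($from);

    my @out;
    for my $path (@paths) {
        if (index($path, $from) == 0) {
            # The path begins with path1: substitute path2 for that part.
            push @out, $to . substr($path, length($from));
        } else {
            push @out, $path;
        }
    }
    return @out;
}

my @inc = apply_perllib_prefix(
    'f:/perllib/lib;h:/opt/gnu',
    qw(f:/perllib/lib/5.00553/os2 f:/perllib/lib/site_perl/5.00553 .),
);
print "$_\n" for @inc;
```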
=head2 C<PERL_BADLANG>
If 0, perl ignores setlocale() failing. May be useful with some
strange I<locale>s.
=head2 C<PERL_BADFREE>
If 0, perl would not warn in case of unwarranted free(). With older
perls this might be
useful in conjunction with the module DB_File, which was buggy when
dynamically linked and OMF-built.
Should not be set with newer Perls, since this may hide some I<real> problems.
=head2 C<PERL_SH_DIR>
Specific for EMX port. Gives the directory part of the location for
F<sh.exe>.
=head2 C<USE_PERL_FLOCK>
Specific for EMX port. Since L<flock(3)> is present in EMX, but is not
functional, it is emulated by perl. To disable the emulations, set
environment variable C<USE_PERL_FLOCK=0>.
=head2 C<TMP> or C<TEMP>
Specific for EMX port. Used as storage place for temporary files.
=head1 Evolution
Here we list major changes which could take you by surprise.
=head2 Text-mode filehandles
Starting from version 5.8, Perl uses a builtin translation layer for
text-mode files. This replaces the efficient well-tested EMX layer by
some code which should be best characterized as a "quick hack".
In addition to possible bugs and an inability to follow changes to the
translation policy with off/on switches of TERMIO translation, this
introduces a serious incompatible change: previously, sysread() on
text-mode filehandles went through the translation layer; now it
does not.
=head2 Priorities
C<setpriority> and C<getpriority> are not compatible with earlier
ports by Andreas Kaiser. See C<"setpriority, getpriority">.
=head2 DLL name mangling: pre 5.6.2
With the release 5.003_01 the dynamically loadable libraries
should be rebuilt when a different version of Perl is compiled. In particular,
DLLs (including F<perl.dll>) are now created with the names
which contain a checksum, thus allowing workaround for OS/2 scheme of
caching DLLs.
It may be possible to code a simple workaround which would
=over
=item *
find the old DLLs looking through the old @INC;
=item *
mangle the names according to the scheme of new perl and copy the DLLs to
these names;
=item *
edit the internal C<LX> tables of the DLL to reflect the change of the name
(probably not needed for Perl extension DLLs, since the internally coded names
are not used for "specific" DLLs, they are used only for "global" DLLs).
=item *
edit the internal C<IMPORT> tables and change the name of the "old"
F<perl????.dll> to the "new" F<perl????.dll>.
=back
=head2 DLL name mangling: 5.6.2 and beyond
In fact mangling of I<extension> DLLs was done due to a misunderstanding
of the OS/2 dynaloading model. OS/2 (effectively) maintains two
different tables of loaded DLLs:
=over
=item Global DLLs
those loaded by the base name from C<LIBPATH>; including those
associated at link time;
=item specific DLLs
loaded by the full name.
=back
When resolving a request for a global DLL, the table of already-loaded
specific DLLs is (effectively) ignored; moreover, specific DLLs are
I<always> loaded from the prescribed path.
There is/was a minor twist which makes this scheme fragile: what to do
with DLLs loaded from
=over
=item C<BEGINLIBPATH> and C<ENDLIBPATH>
(which depend on the process)
=item F<.> from C<LIBPATH>
which I<effectively> depends on the process (although C<LIBPATH> is the
same for all the processes).
=back
Unless C<LIBPATHSTRICT> is set to C<T> (and the kernel is after
2000/09/01), such DLLs are considered to be global. When loading a
global DLL it is first looked up in the table of already-loaded global
DLLs. Because of this, the fact that one executable loaded a DLL from
C<BEGINLIBPATH> and C<ENDLIBPATH>, or F<.> from C<LIBPATH> may affect
I<which> DLL is loaded when I<another> executable requests a DLL with
the same name. I<This> is the reason for version-specific mangling of
the DLL name for the perl DLL.
Since the Perl extension DLLs are always loaded with the full path,
there is no need to mangle their names in a version-specific way:
their directory already reflects the corresponding version of perl,
and @INC takes into account binary compatibility with older versions.
Starting from C<5.6.2> the name mangling scheme is fixed to be the
same as for Perl 5.005_53 (same as in a popular binary release). Thus
new Perls will be able to I<resolve the names> of old extension DLLs
if @INC allows finding their directories.
However, this still does not guarantee that these DLLs may be loaded.
The reason is the mangling of the name of the I<Perl DLL>. And since
the extension DLLs link with the Perl DLL, extension DLLs for older
versions would load an older Perl DLL, and would most probably
segfault (since the data in this DLL is not properly initialized).
There is a partial workaround (which can be made complete with newer
OS/2 kernels): create a forwarder DLL with the same name as the DLL of
the older version of Perl, which forwards the entry points to the
newer Perl's DLL. Make this DLL accessible on (say) the C<BEGINLIBPATH> of
the new Perl executable. When the new executable accesses old Perl's
extension DLLs, they would request the old Perl's DLL by name, get the
forwarder instead, so effectively will link with the currently running
(new) Perl DLL.
This may break in two ways:
=over
=item *
An old perl executable is started while a running new executable has
loaded an extension compiled for the old executable (ouph!). In this
case the old executable will get a forwarder DLL instead of the old
perl DLL, so would link with the new perl DLL. While not directly
fatal, it will behave the same as the new executable. This defeats the whole
purpose of explicitly starting an old executable.
=item *
A new executable loads an extension compiled for the old executable
when an old perl executable is running. In this case the extension
will not pick up the forwarder - with fatal results.
=back
With support for C<LIBPATHSTRICT> this may be circumvented - unless
one of the DLLs is started from F<.> from C<LIBPATH> (I do not know
whether C<LIBPATHSTRICT> affects this case).
B<REMARK>. Unless newer kernels allow F<.> in C<BEGINLIBPATH> (older
do not), this mess cannot be completely cleaned. (It turns out that
as of the beginning of 2002, F<.> is not allowed, but F<.\.> is - and
it has the same effect.)
B<REMARK>. C<LIBPATHSTRICT>, C<BEGINLIBPATH> and C<ENDLIBPATH> are
not environment variables, although F<cmd.exe> emulates them on C<SET
...> lines. From Perl they may be accessed by
L<Cwd::extLibpath|/Cwd::extLibpath([type])> and
L<Cwd::extLibpath_set|/Cwd::extLibpath_set( path [, type ] )>.
=head2 DLL forwarder generation
Assume that the old DLL is named F<perlE0AC.dll> (as is one for
5.005_53), and the new version is 5.6.1. Create a file
F<perl5shim.def-leader> with
LIBRARY 'perlE0AC' INITINSTANCE TERMINSTANCE
DESCRIPTION '@#perl5-porters@perl.org:5.006001#@ Perl module for 5.00553 -> Perl 5.6.1 forwarder'
CODE LOADONCALL
DATA LOADONCALL NONSHARED MULTIPLE
EXPORTS
modifying the versions/names as needed. Run
perl -wnle "next if 0../EXPORTS/; print qq( \"$1\")
if /\"(\w+)\"/" perl5.def >lst
in the Perl build directory (to make the DLL smaller replace perl5.def
with the definition file for the older version of Perl if present).
cat perl5shim.def-leader lst >perl5shim.def
gcc -Zomf -Zdll -o perlE0AC.dll perl5shim.def -s -llibperl
(ignore multiple C<warning L4085>).
=head2 Threading
As of release 5.003_01 perl is linked to the multithreaded C RTL
DLL. If perl itself is not compiled multithread-enabled, neither is perl's
malloc(). However, extensions may use multiple threads at their own
risk.
This was needed to compile C<Perl/Tk> for XFree86-OS/2 out-of-the-box, and
link with DLLs for other useful libraries, which typically are compiled
with C<-Zmt -Zcrtdll>.
=head2 Calls to external programs
Due to popular demand, perl's external program calling has been
changed with respect to Andreas Kaiser's port. I<If> perl needs to call an
external program I<via shell>, F<f:/bin/sh.exe> will be called, or
whatever the override is; see L</"C<PERL_SH_DIR>">.
This means that you need to get a copy of F<sh.exe> as well (I
use the one from pdksh). The path F<F:/bin> above is set up automatically during
the build to a correct value on the builder machine, but is
overridable at runtime.
B<Reasons:> a consensus on C<perl5-porters> was that perl should use
one non-overridable shell per platform. The obvious choices for OS/2
are F<cmd.exe> and F<sh.exe>. Having perl build itself would be impossible
with F<cmd.exe> as a shell, so I picked F<sh.exe>. This assures almost
100% compatibility with the scripts coming from *nix. As an added benefit
this works as well under DOS if you use a DOS-enabled port of pdksh
(see L</Prerequisites>).
B<Disadvantages:> currently F<sh.exe> of pdksh calls external programs
via fork()/exec(), and there is I<no> functioning exec() on
OS/2. exec() is emulated by EMX by an asynchronous call while the caller
waits for child completion (to pretend that the C<pid> did not change). This
means that 1 I<extra> copy of F<sh.exe> is made active via fork()/exec(),
which may lead to some resources taken from the system (even if we do
not count extra work needed for fork()ing).
Note that this is a lesser issue now that we do not spawn F<sh.exe>
unless needed (metachars found).
One can always start F<cmd.exe> explicitly via
system 'cmd', '/c', 'mycmd', 'arg1', 'arg2', ...
If you need to use F<cmd.exe>, and do not want to hand-edit thousands of your
scripts, the long-term solution proposed on p5-p is to have a directive
use OS2::Cmd;
which will override system(), exec(), C<``>, and
C<open(,'...|')>. With current perl you may override only system(),
readpipe() - the explicit version of C<``>, and maybe exec(). The code
will substitute the one-argument call to system() by
C<CORE::system('cmd.exe', '/c', shift)>.
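As a rough illustration of that substitution (a sketch only, not the proposed
C<OS2::Cmd> module; the names C<via_cmd> and C<install_system_override> are
hypothetical), the one-argument form can be rewritten before it reaches
C<CORE::system>. The sketch assumes the C<CORE::GLOBAL::system> override
mechanism described in L<perlsub>, which only affects code compiled after the
override is installed:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a one-argument command so that it is
# run by cmd.exe; multi-argument calls are passed through unchanged.
sub via_cmd {
    my @args = @_;
    return @args == 1 ? ('cmd.exe', '/c', $args[0]) : @args;
}

# Hypothetical installer: call it from a BEGIN block (or a module's
# import) so that system() calls compiled later see the override.
sub install_system_override {
    no warnings 'once';
    *CORE::GLOBAL::system = sub { CORE::system(via_cmd(@_)) };
    return;
}

# Show the rewriting without actually running anything.
my @cmd = via_cmd('dir /b');
print "@cmd\n";
```

The block form C<system { $prog } @args> is not covered by such an override,
so this should be read as a starting point rather than a complete solution.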
If you have some working code for C<OS2::Cmd>, please send it to me,
and I will include it in the distribution. I have no need for such a module, so
I cannot test it.
For the details of the current situation with calling external programs,
see L<Starting OSE<sol>2 (and DOS) programs under Perl>. Let us mention a couple
of features:
=over 4
=item *
External scripts may be called by their basename. Perl will try the same
extensions as when processing B<-S> command-line switch.
=item *
External scripts starting with C<#!> or C<extproc > will be executed directly,
without calling the shell, by calling the program specified on the rest of
the first line.
=back
=head2 Memory allocation
Perl uses its own malloc() under OS/2 - interpreters are usually malloc-bound
for speed, but perl is not, since its malloc is lightning-fast.
Perl-memory-usage-tuned benchmarks show that Perl's malloc is 5 times quicker
than the EMX one. I do not have convincing data about memory footprint, but
a (pretty random) benchmark showed that Perl's is 5% better.
The combination of perl's malloc() and rigid DLL name resolution creates
a special problem with library functions which expect their return value to
be free()d by the system's free(). To facilitate extensions which need to call
such functions, system memory-allocation functions are still available with
the prefix C<emx_> added. (Currently only DLL perl has this, it should
propagate to F<perl_.exe> shortly.)
=head2 Threads
One can build perl with thread support enabled by providing C<-D usethreads>
option to F<Configure>. Currently OS/2 support of threads is very
preliminary.
Most notable problems:
=over 4
=item C<COND_WAIT>
may have a race condition (but probably does not due to edge-triggered
nature of OS/2 Event semaphores). (Needs a reimplementation (in terms of chaining
waiting threads, with the linked list stored in per-thread structure?)?)
=item F<os2.c>
has a couple of static variables used in OS/2-specific functions. (Need to be
moved to per-thread structure, or serialized?)
=back
Note that these problems should not discourage experimenting, since they
have a low probability of affecting small programs.
=head1 BUGS
This description is not updated often (since 5.6.1?), see F<./os2/Changes>
for more info.
=cut
OS/2 extensions
~~~~~~~~~~~~~~~
I include 3 extensions by Andreas Kaiser, OS2::REXX, OS2::UPM, and OS2::FTP,
into my ftp directory, mirrored on CPAN. I made
some minor changes needed to compile them by standard tools. I cannot
test UPM and FTP, so I will appreciate your feedback. Other extensions
there are OS2::ExtAttr, OS2::PrfDB for tied access to EAs and .INI
files - and maybe some other extensions at the time you read it.
Note that OS2 perl defines 2 pseudo-extension functions
OS2::Copy::copy and DynaLoader::mod2fname (many more now, see
L</Prebuilt methods>).
The -R switch of older perl is deprecated. If you need to call a REXX code
which needs access to variables, include the call into a REXX compartment
created by
REXX_call {...block...};
Two new functions are supported by REXX code,
REXX_eval 'string';
REXX_eval_with 'string', REXX_function_name => \&perl_sub_reference;
If you have some other extensions you want to share, send the code to
me. At least two are available: tied access to EA's, and tied access
to system databases.
=head1 AUTHOR
Ilya Zakharevich, cpan@ilyaz.org
=head1 SEE ALSO
perl(1).
=cut
=head1 NAME
X<data structure> X<complex data structure> X<struct>
perldsc - Perl Data Structures Cookbook
=head1 DESCRIPTION
Perl lets us have complex data structures. You can write something like
this and all of a sudden, you'd have an array with three dimensions!
for my $x (1 .. 10) {
for my $y (1 .. 10) {
for my $z (1 .. 10) {
$AoA[$x][$y][$z] =
$x ** $y + $z;
}
}
}
Alas, however simple this may appear, underneath it's a much more
elaborate construct than meets the eye!
How do you print it out? Why can't you say just C<print @AoA>? How do
you sort it? How can you pass it to a function or get one of these back
from a function? Is it an object? Can you save it to disk to read
back later? How do you access whole rows or columns of that matrix? Do
all the values have to be numeric?
As you see, it's quite easy to become confused. While some small portion
of the blame for this can be attributed to the reference-based
implementation, it's really more due to a lack of existing documentation with
examples designed for the beginner.
This document is meant to be a detailed but understandable treatment of the
many different sorts of data structures you might want to develop. It
should also serve as a cookbook of examples. That way, when you need to
create one of these complex data structures, you can just pinch, pilfer, or
purloin a drop-in example from here.
Let's look at each of these possible constructs in detail. There are separate
sections on each of the following:
=over 5
=item * arrays of arrays
=item * hashes of arrays
=item * arrays of hashes
=item * hashes of hashes
=item * more elaborate constructs
=back
But for now, let's look at general issues common to all
these types of data structures.
=head1 REFERENCES
X<reference> X<dereference> X<dereferencing> X<pointer>
The most important thing to understand about all data structures in
Perl--including multidimensional arrays--is that even though they might
appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally
one-dimensional. They can hold only scalar values (meaning a string,
number, or a reference). They cannot directly contain other arrays or
hashes, but instead contain I<references> to other arrays or hashes.
X<multidimensional array> X<array, multidimensional>
You can't use a reference to an array or hash in quite the same way that you
would a real array or hash. For C or C++ programmers unused to
distinguishing between arrays and pointers to the same, this can be
confusing. If so, just think of it as the difference between a structure
and a pointer to a structure.
You can (and should) read more about references in L<perlref>.
Briefly, references are rather like pointers that know what they
point to. (Objects are also a kind of reference, but we won't be needing
them right away--if ever.) This means that when you have something which
looks to you like an access to a two-or-more-dimensional array and/or hash,
what's really going on is that the base type is
merely a one-dimensional entity that contains references to the next
level. It's just that you can I<use> it as though it were a
two-dimensional one. This is actually the way almost all C
multidimensional arrays work as well.
$array[7][12] # array of arrays
$array[7]{string} # array of hashes
$hash{string}[7] # hash of arrays
$hash{string}{'another string'} # hash of hashes
Now, because the top level contains only references, if you try to print
out your array with a simple print() function, you'll get something
that doesn't look very nice, like this:
my @AoA = ( [2, 3], [4, 5, 7], [0] );
print $AoA[1][2];
7
print @AoA;
ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
That's because Perl doesn't (ever) implicitly dereference your variables.
If you want to get at the thing a reference is referring to, then you have
to do this yourself using either prefix typing indicators, like
C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows,
like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>.
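To make the explicit dereferencing concrete, here is a minimal sketch that
prints the same C<@AoA> row by row; the helper variables C<@rows> and C<$row>
are introduced only for this illustration:

```perl
use strict;
use warnings;

my @AoA = ( [2, 3], [4, 5, 7], [0] );

# Dereference each row explicitly with the prefix form @$_.
my @rows = map { join(" ", @$_) } @AoA;
print "$_\n" for @rows;

# The same element reached with the prefix form and the arrow form.
my $row = $AoA[1];
print ${$row}[2], " ", $row->[2], "\n";
```

Each row now prints as its contents rather than as an C<ARRAY(0x...)> string.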
=head1 COMMON MISTAKES
The two most common mistakes made in constructing something like
an array of arrays are either accidentally counting the number of
elements or else taking a reference to the same memory location
repeatedly. Here's the case where you just get the count instead
of a nested array:
for my $i (1..10) {
my @array = somefunc($i);
$AoA[$i] = @array; # WRONG!
}
That's just the simple case of assigning an array to a scalar and getting
its element count. If that's what you really and truly want, then you
might do well to consider being a tad more explicit about it, like this:
for my $i (1..10) {
my @array = somefunc($i);
$counts[$i] = scalar @array;
}
Here's the case of taking a reference to the same memory location
again and again:
# Either without strict or having an outer-scope my @array;
# declaration.
for my $i (1..10) {
@array = somefunc($i);
$AoA[$i] = \@array; # WRONG!
}
So, what's the big problem with that? It looks right, doesn't it?
After all, I just told you that you need an array of references, so by
golly, you've made me one!
Unfortunately, while this is true, it's still broken. All the references
in @AoA refer to the I<very same place>, and they will therefore all hold
whatever was last in @array! It's similar to the problem demonstrated in
the following C program:
#include <stdio.h>
#include <pwd.h>
int main(void) {
struct passwd *rp, *dp;
rp = getpwnam("root");
dp = getpwnam("daemon");
printf("daemon name is %s\nroot name is %s\n",
dp->pw_name, rp->pw_name);
return 0;
}
Which will print
daemon name is daemon
root name is daemon
The problem is that both C<rp> and C<dp> are pointers to the same location
in memory! In C, you'd have to remember to malloc() yourself some new
memory. In Perl, you'll want to use the array constructor C<[]> or the
hash constructor C<{}> instead. Here's the right way to do the preceding
broken code fragments:
X<[]> X<{}>
# Either without strict or having an outer-scope my @array;
# declaration.
for my $i (1..10) {
@array = somefunc($i);
$AoA[$i] = [ @array ];
}
The square brackets make a reference to a new array with a I<copy>
of what's in @array at the time of the assignment. This is what
you want.
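The difference between the broken and the corrected loop can be checked
directly. In this sketch C<($i) x $i> merely stands in for C<somefunc($i)>,
and the arrays C<@shared> and C<@copied> are names invented for the comparison:

```perl
use strict;
use warnings;

my (@array, @shared, @copied);
for my $i (1 .. 3) {
    @array = ($i) x $i;          # stand-in for somefunc($i)
    $shared[$i] = \@array;       # every slot refers to the one @array
    $copied[$i] = [ @array ];    # every slot gets its own copy
}

print "shared: @{ $shared[1] }\n";   # whatever was last put in @array
print "copied: @{ $copied[1] }\n";   # the data from iteration 1
```

Comparing the references numerically (C<$shared[1] == $shared[3]>) confirms
that the first form stored the same address every time.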
Note that this will produce something similar, but it's
much harder to read:
# Either without strict or having an outer-scope my @array;
# declaration.
for my $i (1..10) {
@array = 0 .. $i;
@{$AoA[$i]} = @array;
}
Is it the same? Well, maybe so--and maybe not. The subtle difference
is that when you assign something in square brackets, you know for sure
it's always a brand new reference with a new I<copy> of the data.
Something else could be going on in this new case with the C<@{$AoA[$i]}>
dereference on the left-hand-side of the assignment. It all depends on
whether C<$AoA[$i]> had been undefined to start with, or whether it
already contained a reference. If you had already populated @AoA with
references, as in
$AoA[3] = \@another_array;
Then the assignment with the indirection on the left-hand-side would
use the existing reference that was already there:
@{$AoA[3]} = @array;
Of course, this I<would> have the "interesting" effect of clobbering
@another_array. (Have you ever noticed how when a programmer says
something is "interesting", that rather than meaning "intriguing",
they're disturbingly more apt to mean that it's "annoying",
"difficult", or both? :-)
So just remember always to use the array or hash constructors with C<[]>
or C<{}>, and you'll be fine, although it's not always optimally
efficient.
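The clobbering effect of assigning through an existing reference can be seen
in a few lines. This is only a sketch; the sample data and the use of slot 4
as a previously undefined element are assumptions made for the demonstration:

```perl
use strict;
use warnings;

my @another_array = (1, 2, 3);
my @array         = ('x', 'y');
my @AoA;

$AoA[3] = \@another_array;
@{ $AoA[3] } = @array;      # reuses the reference already stored there
print "@another_array\n";   # @another_array itself has been overwritten

@{ $AoA[4] } = @array;      # slot was undefined: a new array springs up
print "@{ $AoA[4] }\n";
```

In the second assignment nothing else refers to the new array, so no other
variable is affected.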
Surprisingly, the following dangerous-looking construct will
actually work out fine:
for my $i (1..10) {
my @array = somefunc($i);
$AoA[$i] = \@array;
}
That's because my() is more of a run-time statement than it is a
compile-time declaration I<per se>. This means that the my() variable is
remade afresh each time through the loop. So even though it I<looks> as
though you stored the same variable reference each time, you actually did
not! This is a subtle distinction that can produce more efficient code at
the risk of misleading all but the most experienced of programmers. So I
usually advise against teaching it to beginners. In fact, except for
passing arguments to functions, I seldom like to see the gimme-a-reference
operator (backslash) used much at all in code. Instead, I advise
beginners that they (and most of the rest of us) should try to use the
much more easily understood constructors C<[]> and C<{}> instead of
relying upon lexical (or dynamic) scoping and hidden reference-counting to
do the right thing behind the scenes.
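A quick way to convince yourself that my() really produces a fresh array on
each iteration is to compare the stored references. In this sketch
C<($i) x 2> is an arbitrary stand-in for C<somefunc($i)>:

```perl
use strict;
use warnings;

my @AoA;
for my $i (1 .. 3) {
    my @array = ($i) x 2;    # a new lexical array each time through
    $AoA[$i] = \@array;      # so each reference is to a different array
}

print "row 1: @{ $AoA[1] }\n";
print "row 3: @{ $AoA[3] }\n";
print "distinct\n" if $AoA[1] != $AoA[3];
```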
In summary:
$AoA[$i] = [ @array ]; # usually best
$AoA[$i] = \@array; # perilous; just how my() was that array?
@{ $AoA[$i] } = @array; # way too tricky for most programmers
=head1 CAVEAT ON PRECEDENCE
X<dereference, precedence> X<dereferencing, precedence>
Speaking of things like C<@{$AoA[$i]}>, the following are actually the
same thing:
X<< -> >>
$aref->[2][2] # clear
$$aref[2][2] # confusing
That's because Perl's precedence rules on its five prefix dereferencers
(which look like someone swearing: C<$ @ * % &>) make them bind more
tightly than the postfix subscripting brackets or braces! This will no
doubt come as a great shock to the C or C++ programmer, who is quite
accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th>
element of C<a>. That is, they first take the subscript, and only then
dereference the thing at that subscript. That's fine in C, but this isn't C.
The seemingly equivalent construct in Perl, C<$$aref[$i]>, first does
the deref of $aref, taking $aref as a reference to an
array, and then subscripts that, finally telling you the I<i'th> value
of the array pointed to by $aref. If you wanted the C notion, with an array
@AoA of references, you'd have to
write C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first
before the leading C<$> dereferencer.
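The two readings can be compared side by side. The variables and sample
values in this sketch are made up for the illustration:

```perl
use strict;
use warnings;

my $aref = [ 'a', 'b', 'c' ];
my @AoA  = ( [10, 11], [20, 21] );

# Parsed as ${$aref}[1]: dereference $aref, then subscript.
my $x = $$aref[1];

# Braces force $AoA[1] to be evaluated before the leading $.
my $y = ${ $AoA[1] }[0];

print "$x $y\n";
```

Here C<$x> is the same element as C<< $aref->[1] >> and C<$y> the same as
C<$AoA[1][0]>, which are usually the clearer spellings.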
=head1 WHY YOU SHOULD ALWAYS C<use strict>
If this is starting to sound scarier than it's worth, relax. Perl has
some features to help you avoid its most common pitfalls. The best
way to avoid getting confused is to start every program like this:
#!/usr/bin/perl -w
use strict;
This way, you'll be forced to declare all your variables with my() and
also disallow accidental "symbolic dereferencing". Therefore if you'd done
this:
my $aref = [
[ "fred", "barney", "pebbles", "bambam", "dino", ],
[ "homer", "bart", "marge", "maggie", ],
[ "george", "jane", "elroy", "judy", ],
];
print $aref[2][2];
The compiler would immediately flag that as an error I<at compile time>,
because you were accidentally accessing C<@aref>, an undeclared
variable, and it would thereby remind you to write instead:
print $aref->[2][2]
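The compile-time check can be observed without aborting a whole program by
compiling the faulty line in a string eval, which inherits the surrounding
C<use strict>. This is just a demonstration harness; C<$ok> and C<$error> are
names chosen for the sketch:

```perl
use strict;
use warnings;

# The string is compiled under the enclosing "use strict", so the
# use of the undeclared array @aref is rejected at compile time.
my $ok = eval q{
    my $aref = [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ];
    my $elt  = $aref[2][2];
    1;
};
my $error = $@;

print "compile failed: ", ($ok ? "no" : "yes"), "\n";
print $error unless $ok;
```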
=head1 DEBUGGING
X<data structure, debugging> X<complex data structure, debugging>
X<AoA, debugging> X<HoA, debugging> X<AoH, debugging> X<HoH, debugging>
X<array of arrays, debugging> X<hash of arrays, debugging>
X<array of hashes, debugging> X<hash of hashes, debugging>
You can use the debugger's C<x> command to dump out complex data structures.
For example, with the nested array reference from the previous section stored in $AoA, here's the debugger output:
DB<1> x $AoA
$AoA = ARRAY(0x13b5a0)
0 ARRAY(0x1f0a24)
0 'fred'
1 'barney'
2 'pebbles'
3 'bambam'
4 'dino'
1 ARRAY(0x13b558)
0 'homer'
1 'bart'
2 'marge'
3 'maggie'
2 ARRAY(0x13b540)
0 'george'
1 'jane'
2 'elroy'
3 'judy'
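Outside the debugger, a similar dump can be produced with the core
L<Data::Dumper> module. This is offered as an alternative sketch, not as
something the debugger itself uses; the two configuration settings are
optional:

```perl
use strict;
use warnings;
use Data::Dumper;

my $AoA = [
    [ "fred", "barney", "pebbles", "bambam", "dino" ],
    [ "homer", "bart", "marge", "maggie" ],
    [ "george", "jane", "elroy", "judy" ],
];

$Data::Dumper::Indent   = 1;   # simpler, fixed-width indentation
$Data::Dumper::Sortkeys = 1;   # stable key order for any hashes

my $dump = Dumper($AoA);
print $dump;
```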
=head1 CODE EXAMPLES
Presented with little comment (these will get their own manpages someday)
here are short code examples illustrating access of various
types of data structures.
=head1 ARRAYS OF ARRAYS
X<array of arrays> X<AoA>
=head2 Declaration of an ARRAY OF ARRAYS
@AoA = (
[ "fred", "barney" ],
[ "george", "jane", "elroy" ],
[ "homer", "marge", "bart" ],
);
=head2 Generation of an ARRAY OF ARRAYS
# reading from file
while ( <> ) {
push @AoA, [ split ];
}
# calling a function
for $i ( 1 .. 10 ) {
$AoA[$i] = [ somefunc($i) ];
}
# using temp vars
for $i ( 1 .. 10 ) {
@tmp = somefunc($i);
$AoA[$i] = [ @tmp ];
}
# add to an existing row
push @{ $AoA[0] }, "wilma", "betty";
=head2 Access and Printing of an ARRAY OF ARRAYS
# one element
$AoA[0][0] = "Fred";
# another element
$AoA[1][1] =~ s/(\w)/\u$1/;
# print the whole thing with refs
for $aref ( @AoA ) {
print "\t [ @$aref ],\n";
}
# print the whole thing with indices
for $i ( 0 .. $#AoA ) {
print "\t [ @{$AoA[$i]} ],\n";
}
# print the whole thing one at a time
for $i ( 0 .. $#AoA ) {
for $j ( 0 .. $#{ $AoA[$i] } ) {
print "elt $i $j is $AoA[$i][$j]\n";
}
}
=head1 HASHES OF ARRAYS
X<hash of arrays> X<HoA>
=head2 Declaration of a HASH OF ARRAYS
%HoA = (
flintstones => [ "fred", "barney" ],
jetsons => [ "george", "jane", "elroy" ],
simpsons => [ "homer", "marge", "bart" ],
);
=head2 Generation of a HASH OF ARRAYS
# reading from file
# flintstones: fred barney wilma dino
while ( <> ) {
next unless s/^(.*?):\s*//;
$HoA{$1} = [ split ];
}
# reading from file; more temps
# flintstones: fred barney wilma dino
while ( $line = <> ) {
($who, $rest) = split /:\s*/, $line, 2;
@fields = split ' ', $rest;
$HoA{$who} = [ @fields ];
}
# calling a function that returns a list
for $group ( "simpsons", "jetsons", "flintstones" ) {
$HoA{$group} = [ get_family($group) ];
}
# likewise, but using temps
for $group ( "simpsons", "jetsons", "flintstones" ) {
@members = get_family($group);
$HoA{$group} = [ @members ];
}
# append new members to an existing family
push @{ $HoA{"flintstones"} }, "wilma", "betty";
=head2 Access and Printing of a HASH OF ARRAYS
# one element
$HoA{flintstones}[0] = "Fred";
# another element
$HoA{simpsons}[1] =~ s/(\w)/\u$1/;
# print the whole thing
foreach $family ( keys %HoA ) {
print "$family: @{ $HoA{$family} }\n"
}
# print the whole thing with indices
foreach $family ( keys %HoA ) {
print "$family: ";
foreach $i ( 0 .. $#{ $HoA{$family} } ) {
print " $i = $HoA{$family}[$i]";
}
print "\n";
}
# print the whole thing sorted by number of members
foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) {
print "$family: @{ $HoA{$family} }\n"
}
# print the whole thing sorted by number of members and name
foreach $family ( sort {
@{$HoA{$b}} <=> @{$HoA{$a}}
||
$a cmp $b
} keys %HoA )
{
print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
}
=head1 ARRAYS OF HASHES
X<array of hashes> X<AoH>
=head2 Declaration of an ARRAY OF HASHES
@AoH = (
{
Lead => "fred",
Friend => "barney",
},
{
Lead => "george",
Wife => "jane",
Son => "elroy",
},
{
Lead => "homer",
Wife => "marge",
Son => "bart",
}
);
=head2 Generation of an ARRAY OF HASHES
# reading from file
# format: LEAD=fred FRIEND=barney
while ( <> ) {
$rec = {};
for $field ( split ) {
($key, $value) = split /=/, $field;
$rec->{$key} = $value;
}
push @AoH, $rec;
}
# reading from file
# format: LEAD=fred FRIEND=barney
# no temp
while ( <> ) {
push @AoH, { split /[\s=]+/ };
}
# calling a function that returns a key/value pair list, like
# "lead","fred","daughter","pebbles"
while ( %fields = getnextpairset() ) {
push @AoH, { %fields };
}
# likewise, but using no temp vars
while (<>) {
push @AoH, { parsepairs($_) };
}
# add key/value to an element
$AoH[0]{pet} = "dino";
$AoH[2]{pet} = "santa's little helper";
=head2 Access and Printing of an ARRAY OF HASHES
# one element
$AoH[0]{Lead} = "fred";
# another element
$AoH[1]{Lead} =~ s/(\w)/\u$1/;
# print the whole thing with refs
for $href ( @AoH ) {
print "{ ";
for $role ( keys %$href ) {
print "$role=$href->{$role} ";
}
print "}\n";
}
# print the whole thing with indices
for $i ( 0 .. $#AoH ) {
print "$i is { ";
for $role ( keys %{ $AoH[$i] } ) {
print "$role=$AoH[$i]{$role} ";
}
print "}\n";
}
# print the whole thing one at a time
for $i ( 0 .. $#AoH ) {
for $role ( keys %{ $AoH[$i] } ) {
print "elt $i $role is $AoH[$i]{$role}\n";
}
}
=head1 HASHES OF HASHES
X<hash of hashes> X<HoH>
=head2 Declaration of a HASH OF HASHES
%HoH = (
flintstones => {
lead => "fred",
pal => "barney",
},
jetsons => {
lead => "george",
wife => "jane",
"his boy" => "elroy",
},
simpsons => {
lead => "homer",
wife => "marge",
kid => "bart",
},
);
=head2 Generation of a HASH OF HASHES
# reading from file
# flintstones: lead=fred pal=barney wife=wilma pet=dino
while ( <> ) {
next unless s/^(.*?):\s*//;
$who = $1;
for $field ( split ) {
($key, $value) = split /=/, $field;
$HoH{$who}{$key} = $value;
}
}
# reading from file; more temps
while ( <> ) {
next unless s/^(.*?):\s*//;
$who = $1;
$rec = {};
$HoH{$who} = $rec;
for $field ( split ) {
($key, $value) = split /=/, $field;
$rec->{$key} = $value;
}
}
# calling a function that returns a key,value hash
for $group ( "simpsons", "jetsons", "flintstones" ) {
$HoH{$group} = { get_family($group) };
}
# likewise, but using temps
for $group ( "simpsons", "jetsons", "flintstones" ) {
%members = get_family($group);
$HoH{$group} = { %members };
}
# append new members to an existing family
%new_folks = (
wife => "wilma",
pet => "dino",
);
for $what (keys %new_folks) {
$HoH{flintstones}{$what} = $new_folks{$what};
}
=head2 Access and Printing of a HASH OF HASHES
# one element
$HoH{flintstones}{wife} = "wilma";
# another element
$HoH{simpsons}{lead} =~ s/(\w)/\u$1/;
# print the whole thing
foreach $family ( keys %HoH ) {
print "$family: { ";
for $role ( keys %{ $HoH{$family} } ) {
print "$role=$HoH{$family}{$role} ";
}
print "}\n";
}
# print the whole thing somewhat sorted
foreach $family ( sort keys %HoH ) {
print "$family: { ";
for $role ( sort keys %{ $HoH{$family} } ) {
print "$role=$HoH{$family}{$role} ";
}
print "}\n";
}
# print the whole thing sorted by number of members
foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} }
keys %HoH )
{
print "$family: { ";
for $role ( sort keys %{ $HoH{$family} } ) {
print "$role=$HoH{$family}{$role} ";
}
print "}\n";
}
# establish a sort order (rank) for each role
$i = 0;
for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }
# now print the whole thing sorted by number of members
foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } }
keys %HoH )
{
print "$family: { ";
# and print these according to rank order
for $role ( sort { $rank{$a} <=> $rank{$b} }
keys %{ $HoH{$family} } )
{
print "$role=$HoH{$family}{$role} ";
}
print "}\n";
}
=head1 MORE ELABORATE RECORDS
X<record> X<structure> X<struct>
=head2 Declaration of MORE ELABORATE RECORDS
Here's a sample showing how to create and use a record whose fields are of
many different sorts:
$rec = {
TEXT => $string,
SEQUENCE => [ @old_values ],
LOOKUP => { %some_table },
THATCODE => \&some_function,
THISCODE => sub { $_[0] ** $_[1] },
HANDLE => \*STDOUT,
};
print $rec->{TEXT};
print $rec->{SEQUENCE}[0];
$last = pop @ { $rec->{SEQUENCE} };
print $rec->{LOOKUP}{"key"};
($first_k, $first_v) = each %{ $rec->{LOOKUP} };
$answer = $rec->{THATCODE}->($arg);
$answer = $rec->{THISCODE}->($arg1, $arg2);
# careful of extra block braces on fh ref
print { $rec->{HANDLE} } "a string\n";
use FileHandle;
$rec->{HANDLE}->autoflush(1);
$rec->{HANDLE}->print(" a string\n");
=head2 Declaration of a HASH OF COMPLEX RECORDS
%TV = (
flintstones => {
series => "flintstones",
nights => [ qw(monday thursday friday) ],
members => [
{ name => "fred", role => "lead", age => 36, },
{ name => "wilma", role => "wife", age => 31, },
{ name => "pebbles", role => "kid", age => 4, },
],
},
jetsons => {
series => "jetsons",
nights => [ qw(wednesday saturday) ],
members => [
{ name => "george", role => "lead", age => 41, },
{ name => "jane", role => "wife", age => 39, },
{ name => "elroy", role => "kid", age => 9, },
],
},
simpsons => {
series => "simpsons",
nights => [ qw(monday) ],
members => [
{ name => "homer", role => "lead", age => 34, },
{ name => "marge", role => "wife", age => 37, },
{ name => "bart", role => "kid", age => 11, },
],
},
);
=head2 Generation of a HASH OF COMPLEX RECORDS
# reading from file
# this is most easily done by having the file itself be
# in the raw data format as shown above. perl is happy
# to parse complex data structures if declared as data, so
# sometimes it's easiest to do that
# here's a piece by piece build up
$rec = {};
$rec->{series} = "flintstones";
$rec->{nights} = [ find_days() ];
@members = ();
# assume this file in field=value syntax
while (<>) {
%fields = split /[\s=]+/;
push @members, { %fields };
}
$rec->{members} = [ @members ];
# now remember the whole thing
$TV{ $rec->{series} } = $rec;
###########################################################
# now, you might want to make interesting extra fields that
# include pointers back into the same data structure so that if
# you change one piece, it changes everywhere, like for example
# if you wanted a {kids} field that was a reference
# to an array of the kids' records without having duplicate
# records and thus update problems.
###########################################################
foreach $family (keys %TV) {
$rec = $TV{$family}; # temp pointer
@kids = ();
for $person ( @{ $rec->{members} } ) {
if ($person->{role} =~ /kid|son|daughter/) {
push @kids, $person;
}
}
# REMEMBER: $rec and $TV{$family} point to same data!!
$rec->{kids} = [ @kids ];
}
# you copied the array, but the array itself contains pointers
# to uncopied objects. this means that if you make bart get
# older via
$TV{simpsons}{kids}[0]{age}++;
# then this would also change in
print $TV{simpsons}{members}[2]{age};
# because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
# both point to the same underlying anonymous hash table
# print the whole thing
foreach $family ( keys %TV ) {
print "the $family";
print " is on during @{ $TV{$family}{nights} }\n";
print "its members are:\n";
for $who ( @{ $TV{$family}{members} } ) {
print " $who->{name} ($who->{role}), age $who->{age}\n";
}
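# the records have no {lead} field, so look up the member whose role is "lead"
($lead) = grep { $_->{role} eq "lead" } @{ $TV{$family}{members} };
print "it turns out that $lead->{name} has ";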
print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
print "\n";
}
=head1 Database Ties
You cannot easily tie a multilevel data structure (such as a hash of
hashes) to a dbm file. The first problem is that all but GDBM and
Berkeley DB have size limitations, but beyond that, you also have problems
with how references are to be represented on disk. One experimental
module that does partially attempt to address this need is the MLDBM
module. Check your nearest CPAN site as described in L<perlmodlib> for
source code to MLDBM.
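If all you need is to save a nested structure to disk and read it back,
rather than a tied dbm file, the core L<Storable> module is one option. The
sketch below swaps in Storable in place of MLDBM and uses a temporary file
purely for demonstration:

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

my %HoH = (
    flintstones => { lead => "fred",   pal  => "barney" },
    jetsons     => { lead => "george", wife => "jane"   },
);

# A scratch file for the demonstration; removed at program exit.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

nstore(\%HoH, $file);          # serialize the whole nested structure
my $copy = retrieve($file);    # read it back as a new hash reference

print "$copy->{jetsons}{wife}\n";
```

Unlike a tied dbm file, this reads and writes the structure as a whole, so it
suits data that fits comfortably in memory.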
=head1 SEE ALSO
L<perlref>, L<perllol>, L<perldata>, L<perlobj>
=head1 AUTHOR
Tom Christiansen <F<tchrist@perl.com>>
If you read this file _as_is_, just ignore the funny characters you
see. It is written in the POD format (see pod/perlpod.pod) which is
specifically designed to be readable as is.
=head1 NAME
perlce - Perl for WinCE
=head1 Building Perl for WinCE
=head2 WARNING
B<< Much of this document has become very out of date and needs updating,
rewriting or deleting. The build process was overhauled during the 5.19
development track and the current instructions as of that time are given
in L</CURRENT BUILD INSTRUCTIONS>; the previous build instructions, which
are largely superseded but may still contain some useful information, are
left in L</OLD BUILD INSTRUCTIONS> but really need removing after anything
of use has been extracted from them. >>
=head2 DESCRIPTION
This file gives the instructions for building Perl5.8 and above for
WinCE. Please read and understand the terms under which this
software is distributed.
=head2 General explanations on cross-compiling WinCE
=over
=item *
F<miniperl> is built. This is a single executable (without DLL), intended
to run on Win32, and it will facilitate the remaining build process; all binaries
built after it are foreign and should not run locally.
F<miniperl> is built using F<./win32/Makefile>; this is part of the normal
build process, invoked as a dependency from wince/Makefile.ce.
=item *
After F<miniperl> is built, F<configpm> is invoked to create the right F<Config.pm>
in the right place, along with its corresponding F<Cross.pm>.
Unlike the Win32 build, miniperl will not have the host's F<Config.pm> within reach;
rather, it will use F<Config.pm> from within the cross-compilation directories.
F<Cross.pm> is dead simple: for the given cross-architecture it places in @INC
a path where the perl modules are, with the right F<Config.pm> in that place.
That said, C<miniperl -Ilib -MConfig -we 1> should report an error, because
it cannot find F<Config.pm>. If it does not give an error, the wrong F<Config.pm>
has been substituted, and the resulting binaries will be a mess.
C<miniperl -MCross -MConfig -we 1> should run okay, and it will provide the right
F<Config.pm> for further compilations.
=item *
During the extensions build phase, the script F<./win32/buildext.pl> is invoked,
which in turn steps into the F<./ext> subdirectories and builds
each extension in turn.
All invocations of F<Makefile.PL> are given C<-MCross> so as to enable
cross-compilation.
=back
=head2 CURRENT BUILD INSTRUCTIONS
(These instructions assume the host is 32-bit Windows. If you're on 64-bit
Windows then change "C:\Program Files" to "C:\Program Files (x86)" throughout.)
1. Install EVC4 from
http://download.microsoft.com/download/c/3/f/c3f8b58b-9753-4c2e-8b96-2dfe3476a2f7/eVC4.exe
Use the key mentioned at
http://download.cnet.com/Microsoft-eMbedded-Visual-C/3000-2212_4-10108490.html?tag=bc
The installer is ancient and has a few bugs in the paths it uses. You
will have to fix them later. Basically, some things go into "C:/Program
Files/Windows CE Tools", others go into "C:/Windows CE Tools" regardless
of the path you gave to the installer (the default will be "C:/Windows
CE Tools"). Reboots will be required for the installer to proceed. Also
.c and .h associations with Visual Studio might get overridden when
installing EVC4. You have been warned.
2. Download celib from GitHub (using "Download ZIP") at
https://github.com/bulk88/celib
Extract it to a spaceless path but not into the perl build source.
I call this directory "celib-palm-3.0" but in the GitHub
snapshot it will be called "celib-master". Make a copy of the
"wince-arm-pocket-wce300-release" folder and rename the copy to
"wince-arm-pocket-wce400". This is a hack so we can build a CE 4.0
binary by linking in CE 3.0 ARM asm; the linker doesn't care. Windows
Mobile/WinCE are backwards compatible with machine code like Desktop Windows.
3. Download console-1.3-src.tar.gz from
http://sourceforge.net/projects/perlce/files/PerlCE%20support%20files/console/
Extract it to a spaceless path but not into the perl build source.
Don't extract it into the same directory as celib. Make a copy of the
"wince-arm-pocket-wce300" folder and rename the copy to
"wince-arm-pocket-wce400". This is a hack so we can build a CE 4.0
binary by linking in CE 3.0 ARM asm; the linker doesn't care. Windows
Mobile/WinCE are backwards compatible with machine code like Desktop Windows.
4. Open a command prompt, run your regular batch file to set the environment
for desktop Visual C building, go to the perl source directory, cd into win32/,
fill out Makefile, and do a "nmake all" to build a Desktop Perl.
5. Open win32/Makefile.ce in a text editor and do something similar to the
following patch.
-CELIBDLLDIR = h:\src\wince\celib-palm-3.0
-CECONSOLEDIR = h:\src\wince\w32console
+CELIBDLLDIR = C:\sources\celib-palm-3.0
+CECONSOLEDIR = C:\sources\w32console
Also change
!if "$(MACHINE)" == ""
MACHINE=wince-arm-hpc-wce300
#MACHINE=wince-arm-hpc-wce211
#MACHINE=wince-sh3-hpc-wce211
#MACHINE=wince-mips-hpc-wce211
#MACHINE=wince-sh3-hpc-wce200
#MACHINE=wince-mips-hpc-wce200
#MACHINE=wince-arm-pocket-wce300
#MACHINE=wince-mips-pocket-wce300
#MACHINE=wince-sh3-pocket-wce300
#MACHINE=wince-x86em-pocket-wce300
#MACHINE=wince-mips-palm-wce211
#MACHINE=wince-sh3-palm-wce211
#MACHINE=wince-x86em-palm-wce211
#MACHINE=wince-x86-hpc-wce300
#MACHINE=wince-arm-pocket-wce400
!endif
to
!if "$(MACHINE)" == ""
#MACHINE=wince-arm-hpc-wce300
#MACHINE=wince-arm-hpc-wce211
#MACHINE=wince-sh3-hpc-wce211
#MACHINE=wince-mips-hpc-wce211
#MACHINE=wince-sh3-hpc-wce200
#MACHINE=wince-mips-hpc-wce200
#MACHINE=wince-arm-pocket-wce300
#MACHINE=wince-mips-pocket-wce300
#MACHINE=wince-sh3-pocket-wce300
#MACHINE=wince-x86em-pocket-wce300
#MACHINE=wince-mips-palm-wce211
#MACHINE=wince-sh3-palm-wce211
#MACHINE=wince-x86em-palm-wce211
#MACHINE=wince-x86-hpc-wce300
MACHINE=wince-arm-pocket-wce400
!endif
so wince-arm-pocket-wce400 is the MACHINE type.
6. Use a text editor to open "C:\Program Files\Microsoft eMbedded C++
4.0\EVC\WCE400\BIN\WCEARMV4.BAT". Look for
if "%SDKROOT%"=="" set SDKROOT=...
On a new install it is "C:\Windows CE Tools". Go to
"C:\Windows CE Tools" in a file manager and see if "C:\Windows CE
Tools\wce400\STANDARDSDK\Include\Armv4" exists on your disk. If not,
SDKROOT needs to be changed to "C:\Program Files\Windows CE Tools".
Go to celib-palm-3.0\inc\cewin32.h, search for
typedef struct _ABC {
and uncomment the struct.
7. Open another command prompt, ensure PLATFORM is not set to anything
already unless you know what you're doing (so that the correct default
value is set by the next command), and run "C:\Program Files\Microsoft
eMbedded C++ 4.0\EVC\WCE400\BIN\WCEARMV4.BAT"
8. In the WinCE command prompt you made with WCEARMV4.BAT, go to the perl
source directory, cd into win32/ and run "nmake -f Makefile.ce".
9. The ARM perl interpreter (perl519.dll and perl.exe) will be in something
like "C:\perl519\src\win32\wince-arm-pocket-wce400", with the XS DLLs in
"C:\perl519\src\xlib\wince-arm-pocket-wce400\auto".
To prove success on the host machine, run
"dumpbin /headers wince-arm-pocket-wce400\perl.exe" from the win32/ folder
and look for "machine (ARM)" in the FILE HEADER VALUES and
"subsystem (Windows CE GUI)" in the OPTIONAL HEADER VALUES.
=head2 OLD BUILD INSTRUCTIONS
This section describes the steps needed to build PerlCE.
You may find additional information about building perl for WinCE,
as well as some pre-built binaries, at L<http://perlce.sourceforge.net>.
=head3 Tools & SDK
To compile, you need the following:
=over 4
=item * Microsoft Embedded Visual Tools
=item * Microsoft Visual C++
=item * Rainer Keuchel's celib-sources
=item * Rainer Keuchel's console-sources
=back
The needed source files can be downloaded from
L<http://perlce.sourceforge.net>.
=head3 Make
Normally you only need to edit F<./win32/ce-helpers/compile.bat>
to reflect your system and run it.
F<./win32/ce-helpers/compile.bat> is actually a wrapper that calls
C<nmake -f makefile.ce> with the appropriate parameters. It accepts extra
parameters and forwards them to the C<nmake> command as additional
arguments; pass the make target this way.
To prepare a distribution, do the following:
=over 4
=item * go to the F<./win32> subdirectory
=item * edit the file F<./win32/ce-helpers/compile.bat>
=item * run
compile.bat
=item * run
compile.bat dist
=back
F<Makefile.ce> has a C<CROSS_NAME> macro, which is used to refer to
your cross-compilation scheme. You can assign a name to it, but this
is not necessary, because by default it is set to your machine
configuration name, such as "wince-sh3-hpc-wce211", and this is enough
to distinguish different builds made at the same time. The option is
handy when you want several different builds for the same platform, for
example a threaded build. The following example assumes that all required
environment variables are set properly for the C cross-compiler (a special
*.bat file fits this purpose perfectly) and that your F<compile.bat>
has the proper "MACHINE" parameter set to, say, C<wince-mips-pocket-wce300>.
compile.bat
compile.bat dist
compile.bat CROSS_NAME=mips-wce300-thr "USE_ITHREADS=define" ^
"USE_IMP_SYS=define" "USE_MULTI=define"
compile.bat CROSS_NAME=mips-wce300-thr "USE_ITHREADS=define" ^
"USE_IMP_SYS=define" "USE_MULTI=define" dist
If all goes well and the build reports no errors, you'll get two independent
distributions: C<wince-mips-pocket-wce300> and C<mips-wce300-thr>.
The C<dist> target prepares the distribution file set. The C<zipdist> target
does the same as C<dist> but additionally compresses the distribution files
into a zip archive.
NOTE: a build may create one or more F<Config.pm> files
for cross-compilation ("foreign" F<Config.pm>), and those are hidden inside
F<../xlib/$(CROSS_NAME)> with other auxiliary files. It is important to
note, however, that there should be B<no> F<Config.pm> for the host miniperl.
If you get an error that perl could not find Config.pm somewhere in the build
process, something went wrong. Most probably you forgot to
specify a cross-compilation when invoking miniperl.exe on Makefile.PL.
When building an extension for cross-compilation your command line should
look like
..\miniperl.exe -I..\lib -MCross=mips-wce300-thr Makefile.PL
or just
..\miniperl.exe -I..\lib -MCross Makefile.PL
to refer to the cross-compilation that was created most recently.
Questions related to building for WinCE devices can be asked on the
F<perlce-user@lists.sourceforge.net> mailing list.
=head1 Using Perl on WinCE
=head2 DESCRIPTION
PerlCE is currently linked with a simple console window, so it also
works on non-hpc devices.
The simple stdio implementation creates the files F<stdin.txt>,
F<stdout.txt> and F<stderr.txt>, so you might examine them if your
console has only a limited number of columns.
When the exit code is non-zero, a message box appears; otherwise the
console closes, so you might have to catch an exit with
status 0 in your program to see any output.
stdout/stderr now go into the files F</perl-stdout.txt> and
F</perl-stderr.txt>.
PerlIDE is handy for working with PerlCE.
=head2 LIMITATIONS
No fork(), pipe(), popen() etc.
=head2 ENVIRONMENT
All environment vars must be stored in HKLM\Environment as
strings. They are read at process startup.
=over
=item PERL5LIB
Usual perl lib path (semicolon-separated list).
=item PATH
Semicolon-separated list of directories for executables.
=item TMP
Temporary directory.
=item UNIXROOTPATH
Root for accessing some special files, e.g. F</dev/null>, F</etc/services>.
=item ROWS/COLS
Rows/columns for the console.
=item HOME
Home directory.
=item CONSOLEFONTSIZE
Size of the console font.
=back
You can set these with cereg.exe, a (remote) registry editor
or via the PerlIDE.
=head2 REGISTRY
To start perl by clicking on a perl source file, you have
to create the corresponding entries in HKCR (see F<ce-helpers/wince-reg.bat>).
cereg.exe (which must be executed on a desktop PC with
ActiveSync) is reported not to work on some devices.
In that case you have to create the registry entries by hand using a
registry editor.
=head2 XS
The following Win32-Methods are built-in:
newXS("Win32::GetCwd", w32_GetCwd, file);
newXS("Win32::SetCwd", w32_SetCwd, file);
newXS("Win32::GetTickCount", w32_GetTickCount, file);
newXS("Win32::GetOSVersion", w32_GetOSVersion, file);
newXS("Win32::IsWinNT", w32_IsWinNT, file);
newXS("Win32::IsWin95", w32_IsWin95, file);
newXS("Win32::IsWinCE", w32_IsWinCE, file);
newXS("Win32::CopyFile", w32_CopyFile, file);
newXS("Win32::Sleep", w32_Sleep, file);
newXS("Win32::MessageBox", w32_MessageBox, file);
newXS("Win32::GetPowerStatus", w32_GetPowerStatus, file);
newXS("Win32::GetOemInfo", w32_GetOemInfo, file);
newXS("Win32::ShellEx", w32_ShellEx, file);
=head2 BUGS
Opening files for read-write is currently not supported if
they use stdio (normal perl file handles).
If you find bugs or if it does not work at all on your
device, send mail to the address below. Please report
the details of your device (processor, ceversion,
devicetype (hpc/palm/pocket)) and the date of the downloaded
files.
=head2 INSTALLATION
Currently, installation instructions are at L<http://perlce.sourceforge.net/>.
Once the installation and testing processes stabilize, this information will
become more precise.
=head1 ACKNOWLEDGEMENTS
The port for Win32 was used as a reference.
=head1 History of WinCE port
=over
=item 5.6.0
Initial port of perl to WinCE. It was done in a separate directory
named F<wince>. This port was based on the contents of the F<./win32> directory.
F<miniperl> was not built; the user had to have a HOST perl and edit
F<makefile.ce> to reflect this.
=item 5.8.0
The wince port was kept in the same F<./wince> directory, and F<wince/Makefile.ce>
was used to invoke the native compiler to create a HOST miniperl, which then
facilitated the cross-compiling process.
Extension building support was added.
=item 5.9.4
The two directories F<./win32> and F<./wince> were merged, so the perlce build
process now lives in the F<./win32> directory.
=back
=head1 AUTHORS
=over
=item Rainer Keuchel <coyxc@rainer-keuchel.de>
provided the initial port of Perl, which appears to be the most essential work,
as it was the breakthrough in having Perl ported at all.
Many thanks and obligations to Rainer!
=item Vadim Konovalov
provided further support for the WinCE port.
=item Daniel Dragan
updated the build process during the 5.19 development track.
=back
=head1 NAME
perltodo - Link to the Perl to-do list
=head1 DESCRIPTION
The Perl 5 to-do list is maintained in the git repository, and can
be viewed at L<http://perl5.git.perl.org/perl.git/blob/HEAD:/Porting/todo.pod>.
(The to-do list used to be here in perltodo. That has stopped, as installing a
snapshot that becomes increasingly out of date isn't that useful to anyone.)
If you read this file _as_is_, just ignore the funny characters you
see. It is written in the POD format (see pod/perlpod.pod) which is
specifically designed to be readable as is.
=head1 NAME
perlsolaris - Perl version 5 on Solaris systems
=head1 DESCRIPTION
This document describes various features of Sun's Solaris operating system
that will affect how Perl version 5 (hereafter just perl) is
compiled and/or runs. Some issues relating to the older SunOS 4.x are
also discussed, though they may be out of date.
For the most part, everything should just work.
Starting with Solaris 8, perl5.00503 (or higher) is supplied with the
operating system, so you might not even need to build a newer version
of perl at all. The Sun-supplied version is installed in /usr/perl5
with F</usr/bin/perl> pointing to F</usr/perl5/bin/perl>. Do not disturb
that installation unless you really know what you are doing. If you
remove the perl supplied with the OS, you will render some bits of
your system inoperable. If you wish to install a newer version of perl,
install it under a different prefix from /usr/perl5. Common prefixes
to use are /usr/local and /opt/perl.
You may wish to put your version of perl in the PATH of all users by
changing the link F</usr/bin/perl>. This is probably OK, as most perl
scripts shipped with Solaris use an explicit path. (There are a few
exceptions, such as F</usr/bin/rpm2cpio> and F</etc/rcm/scripts/README>, but
these are also sufficiently generic that the actual version of perl
probably doesn't matter too much.)
Solaris ships with a range of Solaris-specific modules. If you choose
to install your own version of perl you will find the source of many of
these modules is available on CPAN under the Sun::Solaris:: namespace.
Solaris may include two versions of perl, e.g. Solaris 9 includes
both 5.005_03 and 5.6.1. This is to provide stability across Solaris
releases, in cases where a later perl version has incompatibilities
with the version included in the preceding Solaris release. The
default perl version will always be the most recent, and in general
the old version will only be retained for one Solaris release. Note
also that the default perl will NOT be configured to search for modules
in the older version, again due to compatibility/stability concerns.
As a consequence if you upgrade Solaris, you will have to
rebuild/reinstall any additional CPAN modules that you installed for
the previous Solaris version. See the CPAN manpage under 'autobundle'
for a quick way of doing this.
As an interim measure, you may either change the #! line of your
scripts to specifically refer to the old perl version, e.g. on
Solaris 9 use #!/usr/perl5/5.00503/bin/perl to use the perl version
that was the default for Solaris 8, or if you have a large number of
scripts it may be more convenient to make the old version of perl the
default on your system. You can do this by changing the appropriate
symlinks under /usr/perl5 as follows (example for Solaris 9):
# cd /usr/perl5
# rm bin man pod
# ln -s ./5.00503/bin
# ln -s ./5.00503/man
# ln -s ./5.00503/lib/pod
# rm /usr/bin/perl
# ln -s ../perl5/5.00503/bin/perl /usr/bin/perl
In both cases this should only be considered to be a temporary
measure - you should upgrade to the later version of perl as soon as
is practicable.
Note also that the perl command-line utilities (e.g. perldoc) and any
that are added by modules that you install will be under
/usr/perl5/bin, so that directory should be added to your PATH.
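One way to add that directory without duplicating it on repeated logins is
sketched here. This is an illustration for a Bourne-style shell profile;
C<path_append> is a hypothetical helper name, not a Solaris command.

```shell
# Append a directory to PATH unless it is already listed.
# (path_append is a hypothetical helper name, not a Solaris command.)
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH alone
        *) PATH="$PATH:$1" ;;
    esac
}

path_append /usr/perl5/bin
export PATH
```

Appending rather than prepending keeps any perl that comes earlier in your
PATH as the default interpreter.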
=head2 Solaris Version Numbers.
For consistency with common usage, perl's Configure script performs
some minor manipulations on the operating system name and version
number as reported by uname. Here's a partial translation table:
             Sun:                      perl's Configure:
    uname    uname -r   Name           osname     osvers
    SunOS    4.1.3      Solaris 1.1    sunos      4.1.3
    SunOS    5.6        Solaris 2.6    solaris    2.6
    SunOS    5.8        Solaris 8      solaris    2.8
    SunOS    5.9        Solaris 9      solaris    2.9
    SunOS    5.10       Solaris 10     solaris    2.10
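The pattern in the table can be written down as a small shell function.
This is a sketch of the mapping only, not a copy of what Configure actually
runs, and the function name C<configure_os_id> is made up for illustration.

```shell
# Hypothetical helper: translate `uname -s` and `uname -r` output into the
# osname/osvers pair shown in the table.
configure_os_id() {
    sys=$1
    rel=$2
    case "$sys:$rel" in
        SunOS:5.*) echo "solaris 2.${rel#5.}" ;;   # SunOS 5.x is Solaris 2.x
        SunOS:*)   echo "sunos $rel" ;;            # SunOS 4.x keeps its number
        *)         echo "unhandled system: $sys" >&2; return 1 ;;
    esac
}

# Example: configure_os_id "$(uname -s)" "$(uname -r)"
```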
The complete table can be found in the Sun Managers' FAQ
L<ftp://ftp.cs.toronto.edu/pub/jdd/sunmanagers/faq> under
"9.1) Which Sun models run which versions of SunOS?".
=head1 RESOURCES
There are many, many sources for Solaris information. A few of the
important ones for perl:
=over 4
=item Solaris FAQ
The Solaris FAQ is available at
L<http://www.science.uva.nl/pub/solaris/solaris2.html>.
The Sun Managers' FAQ is available at
L<ftp://ftp.cs.toronto.edu/pub/jdd/sunmanagers/faq>
=item Precompiled Binaries
Precompiled binaries, links to many sites, and much, much more are
available at L<http://www.sunfreeware.com/> and
L<http://www.blastwave.org/>.
=item Solaris Documentation
All Solaris documentation is available on-line at L<http://docs.sun.com/>.
=back
=head1 SETTING UP
=head2 File Extraction Problems on Solaris.
Be sure to use a tar program compiled under Solaris (not SunOS 4.x)
to extract the perl-5.x.x.tar.gz file. Do not use GNU tar compiled
for SunOS4 on Solaris. (GNU tar compiled for Solaris should be fine.)
When you run SunOS4 binaries on Solaris, the run-time system magically
alters pathnames matching m#lib/locale# so that when tar tries to create
lib/locale.pm, a file named lib/oldlocale.pm gets created instead.
If you found this advice too late and used a SunOS4-compiled tar
anyway, you must find the incorrectly renamed file and move it back
to lib/locale.pm.
=head2 Compiler and Related Tools on Solaris.
You must use an ANSI C compiler to build perl. Perl can be compiled
with either Sun's add-on C compiler or with gcc. The C compiler that
shipped with SunOS4 will not do.
=head3 Include /usr/ccs/bin/ in your PATH.
Several tools needed to build perl are located in /usr/ccs/bin/: ar,
as, ld, and make. Make sure that /usr/ccs/bin/ is in your PATH.
On all the released versions of Solaris (8, 9 and 10) you need to make sure the following packages are installed (this info is extracted from the Solaris FAQ):
for tools (sccs, lex, yacc, make, nm, truss, ld, as): SUNWbtool,
SUNWsprot, SUNWtoo
for libraries & headers: SUNWhea, SUNWarc, SUNWlibm, SUNWlibms, SUNWdfbh,
SUNWcg6h, SUNWxwinc
Additionally, on Solaris 8 and 9 you also need:
for 64 bit development: SUNWarcx, SUNWbtoox, SUNWdplx, SUNWscpux,
SUNWsprox, SUNWtoox, SUNWlmsx, SUNWlmx, SUNWlibCx
And only on Solaris 8 you also need:
for libraries & headers: SUNWolinc
If you are in doubt which package contains a file you are missing,
try to find an installation that has that file. Then do a
$ grep /my/missing/file /var/sadm/install/contents
This will display a line like this:
/usr/include/sys/errno.h f none 0644 root bin 7471 37605 956241356 SUNWhea
The last item listed (SUNWhea in this example) is the package you need.
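If you do this often, the lookup can be wrapped in a function that prints
only the package name. A minimal sketch, assuming the one-entry-per-line
layout shown above; C<pkg_for_file> is a hypothetical name.

```shell
# Hypothetical helper: read lines in the format of
# /var/sadm/install/contents on stdin and print the last field (the
# package) of every entry whose first field is exactly the path $1.
pkg_for_file() {
    awk -v path="$1" '$1 == path { print $NF }'
}

# Example:
# pkg_for_file /usr/include/sys/errno.h < /var/sadm/install/contents
```

Unlike the plain grep, this matches the path exactly, so it will not also
list files that merely contain the path as a substring.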
=head3 Avoid /usr/ucb/cc.
You don't need to have /usr/ucb/ in your PATH to build perl. If you
want /usr/ucb/ in your PATH anyway, make sure that /usr/ucb/ is NOT
in your PATH before the directory containing the right C compiler.
=head3 Sun's C Compiler
If you use Sun's C compiler, make sure the correct directory
(usually /opt/SUNWspro/bin/) is in your PATH (before /usr/ucb/).
=head3 GCC
If you use gcc, make sure your installation is recent and complete.
Perl versions since 5.6.0 build fine with gcc newer than 2.8.1 on Solaris
2.6 or later.
You must Configure perl with
$ sh Configure -Dcc=gcc
If you don't, you may experience strange build errors.
If you have updated your Solaris version, you may also have to update
your gcc. For example, if you are running Solaris 2.6 and your gcc is
installed under /usr/local, check in /usr/local/lib/gcc-lib and make
sure you have the appropriate directory, sparc-sun-solaris2.6/ or
i386-pc-solaris2.6/. If gcc's directory is for a different version of
Solaris than you are running, then you will need to rebuild gcc for
your new version of Solaris.
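The directory name to look for follows from the processor type and the OS
release. The sketch below assumes the processor type comes from C<uname -p>
and the release from C<uname -r>; the function name is hypothetical, and
only the two target names mentioned above are covered.

```shell
# Hypothetical helper: build the gcc target directory name for a Solaris
# processor type ($1) and release ($2).
gcc_target_dir() {
    case $1 in
        sparc) vendor=sun ;;
        i386)  vendor=pc ;;
        *)     echo "unhandled processor type: $1" >&2; return 1 ;;
    esac
    # uname -r reports 5.x on Solaris; the gcc directory uses 2.x
    echo "$1-$vendor-solaris2.${2#5.}"
}

# Example: does the gcc under /usr/local match the running system?
# ls -d "/usr/local/lib/gcc-lib/$(gcc_target_dir "$(uname -p)" "$(uname -r)")"
```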
You can get a precompiled version of gcc from
L<http://www.sunfreeware.com/> or L<http://www.blastwave.org/>. Make
sure you pick up the package for your Solaris release.
If you wish to use gcc to build add-on modules for use with the perl
shipped with Solaris, you should use the Solaris::PerlGcc module
which is available from CPAN. The perl shipped with Solaris
is configured and built with the Sun compilers, and the compiler
configuration information stored in Config.pm is therefore only
relevant to the Sun compilers. The Solaris::PerlGcc module contains a
replacement Config.pm that is correct for gcc - see the module for
details.
=head3 GNU as and GNU ld
The following information applies to gcc version 2. Volunteers to
update it as appropriate for gcc version 3 would be appreciated.
The versions of as and ld supplied with Solaris work fine for building
perl. There is normally no need to install the GNU versions to
compile perl.
If you decide to ignore this advice and use the GNU versions anyway,
then be sure that they are relatively recent. Versions newer than 2.7
are apparently new enough. Older versions may have trouble with
dynamic loading.
If you wish to use GNU ld, then you need to pass it the -Wl,-E flag.
The hints/solaris_2.sh file tries to do this automatically by setting
the following Configure variables:
ccdlflags="$ccdlflags -Wl,-E"
lddlflags="$lddlflags -Wl,-E -G"
However, over the years, changes in gcc, GNU ld, and Solaris ld have made
it difficult to automatically detect which ld ultimately gets called.
You may have to manually edit config.sh and add the -Wl,-E flags
yourself, or else run Configure interactively and add the flags at the
appropriate prompts.
If your gcc is configured to use GNU as and ld but you want to use the
Solaris ones instead to build perl, then you'll need to add
-B/usr/ccs/bin/ to the gcc command line. One convenient way to do
that is with
$ sh Configure -Dcc='gcc -B/usr/ccs/bin/'
Note that the trailing slash is required. This will result in some
harmless warnings as Configure is run:
gcc: file path prefix `/usr/ccs/bin/' never used
These messages may safely be ignored.
(Note that for a SunOS4 system, you must use -B/bin/ instead.)
Alternatively, you can use the GCC_EXEC_PREFIX environment variable to
ensure that Sun's as and ld are used. Consult your gcc documentation
for further information on the -B option and the GCC_EXEC_PREFIX variable.
=head3 Sun and GNU make
The make under /usr/ccs/bin works fine for building perl. If you
have the Sun C compilers, you will also have a parallel version of
make (dmake). This works fine to build perl, but can sometimes cause
problems when running 'make test' due to underspecified dependencies
between the different test harness files. The same problem can also
affect the building of some add-on modules, so in those cases either
specify '-m serial' on the dmake command line, or use
/usr/ccs/bin/make instead. If you wish to use GNU make, be sure that
the set-group-id bit is not set. If it is, then arrange your PATH so
that /usr/ccs/bin/make is before GNU make or else have the system
administrator disable the set-group-id bit on GNU make.
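Whether the set-group-id bit is set can be checked from the shell with the
standard C<-g> file test. In this sketch, C<gmake> is an assumed name for
GNU make; adjust it to match your installation.

```shell
# Succeeds if the named file has its set-group-id bit set.
is_setgid() {
    [ -g "$1" ]
}

# gmake is an assumed name for GNU make; adjust it to your installation.
gnu_make=$(command -v gmake 2>/dev/null || true)
if [ -n "$gnu_make" ] && is_setgid "$gnu_make"; then
    echo "$gnu_make is set-group-id; prefer /usr/ccs/bin/make"
fi
```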
=head3 Avoid libucb.
Solaris provides some BSD-compatibility functions in /usr/ucblib/libucb.a.
Perl will not build and run correctly if linked against -lucb since it
contains routines that are incompatible with the standard Solaris libc.
Normally this is not a problem since the solaris hints file prevents
Configure from even looking in /usr/ucblib for libraries, and also
explicitly omits -lucb.
=head2 Environment for Compiling perl on Solaris
=head3 PATH
Make sure your PATH includes the compiler (/opt/SUNWspro/bin/ if you're
using Sun's compiler) as well as /usr/ccs/bin/ to pick up the other
development tools (such as make, ar, as, and ld). Make sure your path
either doesn't include /usr/ucb or that it includes it after the
compiler and compiler tools and other standard Solaris directories.
You definitely don't want /usr/ucb/cc.
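The ordering rule can be checked mechanically. The following is a minimal
sketch with a hypothetical function name: it walks a colon-separated list
and reports whether /usr/ucb shows up before the tool directory you name.

```shell
# Hypothetical helper: succeeds if the colon-separated list $1 is safe,
# meaning /usr/ucb is absent or appears after the tool directory $2.
path_order_ok() {
    printf '%s\n' "$1" | tr ':' '\n' | awk -v tools="${2%/}" '
        { sub(/\/$/, "") }             # ignore a trailing slash
        $0 == tools      { exit 0 }    # tool directory comes first: fine
        $0 == "/usr/ucb" { exit 1 }    # /usr/ucb comes first: not fine
    '
}

# Example:
# path_order_ok "$PATH" /usr/ccs/bin || echo "move /usr/ucb later in PATH"
```

Run it once per directory that matters, for example once for
/opt/SUNWspro/bin/ and once for /usr/ccs/bin/.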
=head3 LD_LIBRARY_PATH
If you have the LD_LIBRARY_PATH environment variable set, be sure that
it does NOT include /lib or /usr/lib. If you will be building
extensions that call third-party shared libraries (e.g. Berkeley DB)
then make sure that your LD_LIBRARY_PATH environment variable includes
the directory with that library (e.g. /usr/local/lib).
If you get an error message
dlopen: stub interception failed
it is probably because your LD_LIBRARY_PATH environment variable
includes a directory which is a symlink to /usr/lib (such as /lib).
The reason this causes a problem is quite subtle. The file
libdl.so.1.0 actually *only* contains functions which generate 'stub
interception failed' errors! The runtime linker intercepts links to
"/usr/lib/libdl.so.1.0" and links in internal implementations of those
functions instead. [Thanks to Tim Bunce for this explanation.]
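A quick check for the two literal directories is sketched below; the
function name is hypothetical. Note that it only compares strings, so it
will not notice other directories that are symlinks to /usr/lib.

```shell
# Hypothetical helper: succeeds if the colon-separated list $1 contains
# /lib or /usr/lib (with or without a trailing slash).
ldpath_has_system_lib() {
    case ":$1:" in
        *:/lib:*|*:/lib/:*|*:/usr/lib:*|*:/usr/lib/:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# ldpath_has_system_lib "$LD_LIBRARY_PATH" &&
#     echo "remove /lib and /usr/lib from LD_LIBRARY_PATH"
```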
=head1 RUN CONFIGURE.
See the INSTALL file for general information regarding Configure.
Only Solaris-specific issues are discussed here. Usually, the
defaults should be fine.
=head2 64-bit perl on Solaris.
See the INSTALL file for general information regarding 64-bit compiles.
In general, the defaults should be fine for most people.
By default, perl-5.6.0 (or later) is compiled as a 32-bit application
with largefile and long-long support.
=head3 General 32-bit vs. 64-bit issues.
Solaris 7 and above will run in either 32 bit or 64 bit mode on SPARC
CPUs, via a reboot. You can build 64 bit apps whilst running 32 bit
mode and vice-versa. 32 bit apps will run under Solaris running in
either 32 or 64 bit mode. 64 bit apps require Solaris to be running
64 bit mode.
Existing 32 bit apps are properly known as LP32, i.e. Longs and
Pointers are 32 bit. 64-bit apps are more properly known as LP64.
The discriminating feature of an LP64 app is its ability to utilise a
64-bit address space. It is perfectly possible to have an LP32 app
that supports both 64-bit integers (long long) and largefiles (> 2GB),
and this is the default for perl-5.6.0.
For a more complete explanation of 64-bit issues, see the
"Solaris 64-bit Developer's Guide" at L<http://docs.sun.com/>
You can detect the OS mode using "isainfo -v", e.g.
$ isainfo -v # Ultra 30 in 64 bit mode
64-bit sparcv9 applications
32-bit sparc applications
By default, perl will be compiled as a 32-bit application. Unless
you want to allocate more than ~ 4GB of memory inside perl, or unless
you need more than 255 open file descriptors, you probably don't need
perl to be a 64-bit app.
=head3 Large File Support
For Solaris 2.6 and onwards, there are two different ways for 32-bit
applications to manipulate large files (files whose size is > 2GByte).
(A 64-bit application automatically has largefile support built in
by default.)
First is the "transitional compilation environment", described in
lfcompile64(5). According to the man page,
The transitional compilation environment exports all the
explicit 64-bit functions (xxx64()) and types in addition to
all the regular functions (xxx()) and types. Both xxx() and
xxx64() functions are available to the program source. A
32-bit application must use the xxx64() functions in order
to access large files. See the lf64(5) manual page for a
complete listing of the 64-bit transitional interfaces.
The transitional compilation environment is obtained with the
following compiler and linker flags:
    getconf LFS64_CFLAGS        -D_LARGEFILE64_SOURCE
    getconf LFS64_LDFLAGS       # nothing special needed
    getconf LFS64_LIBS          # nothing special needed
Second is the "large file compilation environment", described in
lfcompile(5). According to the man page,
Each interface named xxx() that needs to access 64-bit entities
to access large files maps to a xxx64() call in the
resulting binary. All relevant data types are defined to be
of correct size (for example, off_t has a typedef definition
for a 64-bit entity).
An application compiled in this environment is able to use
the xxx() source interfaces to access both large and small
files, rather than having to explicitly utilize the transitional
xxx64() interface calls to access large files.
Two exceptions are fseek() and ftell(). 32-bit applications should
use fseeko(3C) and ftello(3C). These will get automatically mapped
to fseeko64() and ftello64().
The large file compilation environment is obtained with
    getconf LFS_CFLAGS      -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
    getconf LFS_LDFLAGS     # nothing special needed
    getconf LFS_LIBS        # nothing special needed
By default, perl uses the large file compilation environment and
relies on Solaris to do the underlying mapping of interfaces.
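To see which environment a given set of compiler flags selects, you can
look for the C<_FILE_OFFSET_BITS> definition. The function below is a
sketch with a hypothetical name; it inspects a flag string and does not
call the compiler.

```shell
# Hypothetical helper: succeeds if the flag string $1 selects the large
# file compilation environment, i.e. it defines _FILE_OFFSET_BITS as 64.
uses_large_file_env() {
    case " $1 " in
        *" -D_FILE_OFFSET_BITS=64 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: inspect the flags an installed perl was compiled with.
# uses_large_file_env "$(perl -MConfig -e 'print $Config{ccflags}')" &&
#     echo "perl was built in the large file compilation environment"
```

A 64-bit perl may not carry the flag at all, since 64-bit applications have
large file support built in.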
=head3 Building an LP64 perl
To compile a 64-bit application on an UltraSparc with a recent Sun Compiler,
you need to use the flag "-xarch=v9". getconf(1) will tell you this, e.g.
$ getconf -a | grep v9
XBS5_LP64_OFF64_CFLAGS: -xarch=v9
XBS5_LP64_OFF64_LDFLAGS: -xarch=v9
XBS5_LP64_OFF64_LINTFLAGS: -xarch=v9
XBS5_LPBIG_OFFBIG_CFLAGS: -xarch=v9
XBS5_LPBIG_OFFBIG_LDFLAGS: -xarch=v9
XBS5_LPBIG_OFFBIG_LINTFLAGS: -xarch=v9
_XBS5_LP64_OFF64_CFLAGS: -xarch=v9
_XBS5_LP64_OFF64_LDFLAGS: -xarch=v9
_XBS5_LP64_OFF64_LINTFLAGS: -xarch=v9
_XBS5_LPBIG_OFFBIG_CFLAGS: -xarch=v9
_XBS5_LPBIG_OFFBIG_LDFLAGS: -xarch=v9
_XBS5_LPBIG_OFFBIG_LINTFLAGS: -xarch=v9
This flag is supported in Sun WorkShop Compilers 5.0 and onwards
(now marketed under the name Forte) when used on Solaris 7 or later on
UltraSparc systems.
If you are using gcc, you would need to use -mcpu=v9 -m64 instead. This
option is not yet supported as of gcc 2.95.2; from install/SPECIFIC
in that release:
GCC version 2.95 is not able to compile code correctly for sparc64
targets. Users of the Linux kernel, at least, can use the sparc32
program to start up a new shell invocation with an environment that
causes configure to recognize (via uname -a) the system as sparc-*-*
instead.
All this should be handled automatically by the hints file, if
requested.
=head3 Long Doubles.
As of 5.8.1, long doubles are working if you use the Sun compilers
(needed for additional math routines not included in libm).
=head2 Threads in perl on Solaris.
It is possible to build a threaded version of perl on Solaris. The entire
perl thread implementation is still experimental, however, so beware.
=head2 Malloc Issues with perl on Solaris.
Starting from perl 5.7.1 perl uses the Solaris malloc, since the perl
malloc breaks when dealing with more than 2GB of memory, and the Solaris
malloc also seems to be faster.
If you for some reason (such as binary backward compatibility) really
need to use perl's malloc, you can rebuild perl from the sources
and Configure the build with
$ sh Configure -Dusemymalloc
You should not use perl's malloc if you are building with gcc. There
are reports of core dumps, especially in the PDL module. The problem
appears to go away under -DDEBUGGING, so it has been difficult to
track down. Sun's compiler appears to be okay with or without perl's
malloc. [XXX further investigation is needed here.]
=head1 MAKE PROBLEMS.
=over 4
=item Dynamic Loading Problems With GNU as and GNU ld
If you have problems with dynamic loading using gcc on SunOS or
Solaris, and you are using GNU as and GNU ld, see the section
L</"GNU as and GNU ld"> above.
=item ld.so.1: ./perl: fatal: relocation error:
If you get this message on SunOS or Solaris, and you're using gcc,
it's probably the GNU as or GNU ld problem described in the previous item;
see L</"GNU as and GNU ld">.
=item dlopen: stub interception failed
The primary cause of the 'dlopen: stub interception failed' message is
that the LD_LIBRARY_PATH environment variable includes a directory
which is a symlink to /usr/lib (such as /lib). See
L</"LD_LIBRARY_PATH"> above.
=item #error "No DATAMODEL_NATIVE specified"
This is a common error when trying to build perl on Solaris 2.6 with a
gcc installation from Solaris 2.5 or 2.5.1. The Solaris header files
changed, so you need to update your gcc installation. You can either
rerun the fixincludes script from gcc or take the opportunity to
update your gcc installation.
=item sh: ar: not found
This is a message from your shell telling you that the command 'ar'
was not found. You need to check your PATH environment variable to
make sure that it includes the directory with the 'ar' command. This
is a common problem on Solaris, where 'ar' is in the /usr/ccs/bin/
directory.
=back
=head1 MAKE TEST
=head2 op/stat.t test 4 in Solaris
F<op/stat.t> test 4 may fail if you are on a tmpfs of some sort.
Building in /tmp sometimes shows this behavior. The
test suite detects if you are building in /tmp, but it may not be able
to catch all tmpfs situations.
=head2 nss_delete core dump from op/pwent or op/grent
See L<perlhpux/"nss_delete core dump from op/pwent or op/grent">.
=head1 CROSS-COMPILATION
Nothing too unusual here. You can easily do this if you have a
cross-compiler available. A typical Configure invocation when targeting
Solaris x86 looks something like this:
sh ./Configure -des -Dusecrosscompile \
-Dcc=i386-pc-solaris2.11-gcc \
-Dsysroot=$SYSROOT \
-Alddlflags=" -Wl,-z,notext" \
-Dtargethost=... # The usual cross-compilation options
The lddlflags addition is the only abnormal bit.
=head1 PREBUILT BINARIES OF PERL FOR SOLARIS.
You can pick up prebuilt binaries for Solaris from
L<http://www.sunfreeware.com/>, L<http://www.blastwave.org>,
ActiveState L<http://www.activestate.com/>, and
L<http://www.perl.com/> under the Binaries list at the top of the
page. There are probably other sources as well. Please note that
these sites are under the control of their respective owners, not the
perl developers.
=head1 RUNTIME ISSUES FOR PERL ON SOLARIS.
=head2 Limits on Numbers of Open Files on Solaris.
The stdio(3C) manpage notes that for LP32 applications, only 255
files may be opened using fopen(), and only file descriptors 0
through 255 can be used in a stream. Since perl calls open() and
then fdopen(3C) with the resulting file descriptor, perl is limited
to 255 simultaneous open files, even if sysopen() is used. If this
proves to be an insurmountable problem, you can compile perl as an
LP64 application; see L</Building an LP64 perl> for details. Note
also that the default resource limit for open file descriptors on
Solaris is 255, so you will have to modify your ulimit or rctl
(Solaris 9 onwards) appropriately.
=head1 SOLARIS-SPECIFIC MODULES.
See the modules under the Solaris:: and Sun::Solaris namespaces on CPAN
at L<http://www.cpan.org/modules/by-module/Solaris/> and
L<http://www.cpan.org/modules/by-module/Sun/>.
=head1 SOLARIS-SPECIFIC PROBLEMS WITH MODULES.
=head2 Proc::ProcessTable on Solaris
Proc::ProcessTable does not compile on Solaris with perl5.6.0 and higher
if you have LARGEFILES defined. Since largefile support is the
default in 5.6.0 and later, you have to take special steps to use this
module.
The problem is that various structures visible via procfs use off_t,
and if you compile with largefile support these change from 32 bits to
64 bits. Thus what you get back from procfs doesn't match up with
the structures in perl, resulting in garbage. See proc(4) for further
discussion.
A fix for Proc::ProcessTable is to edit Makefile to
explicitly remove the largefile flags from the ones MakeMaker picks up
from Config.pm. This will result in Proc::ProcessTable being built
under the correct environment. Everything should then be OK as long as
Proc::ProcessTable doesn't try to share off_t's with the rest of perl,
or if it does they should be explicitly specified as off64_t.
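The flag filtering described above can be sketched in a few lines of Perl. This is only an illustration, assuming the largefile flags are the defines C<-D_LARGEFILE_SOURCE> and C<-D_FILE_OFFSET_BITS=64>; check C<perl -V:ccflags> on your own system for the actual list. The helper name C<strip_largefile_flags> and the sample flag string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a compiler flag string with the assumed
# largefile defines removed and every other flag left in place.
sub strip_largefile_flags {
    my ($flags) = @_;
    return join ' ', grep {
        !/^-D_LARGEFILE(?:64)?_SOURCE$/ && !/^-D_FILE_OFFSET_BITS=\d+$/
    } split ' ', $flags;
}

# Sample input only; a real build would start from the CCFLAGS line
# of the generated Makefile.
my $ccflags = '-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O -DDEBUGGING';
print strip_largefile_flags($ccflags), "\n";
```

The filtered string would still have to be written back into the Makefile by hand or by a small script of your own.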
=head2 BSD::Resource on Solaris
BSD::Resource versions earlier than 1.09 do not compile on Solaris
with perl 5.6.0 and higher, for the same reasons as Proc::ProcessTable.
BSD::Resource versions starting from 1.09 have a workaround for the problem.
=head2 Net::SSLeay on Solaris
Net::SSLeay requires a /dev/urandom to be present. This device is
available from Solaris 9 onwards. For earlier Solaris versions you
can either get the package SUNWski (packaged with several Sun
software products, for example the Sun WebServer, which is part of
the Solaris Server Intranet Extension, or the Sun Directory Services,
part of Solaris for ISPs) or download the ANDIrand package from
L<http://www.cosy.sbg.ac.at/~andi/>. If you use SUNWski, make a
symbolic link /dev/urandom pointing to /dev/random. For more details,
see Document ID27606 entitled "Differing /dev/random support requirements
within Solaris[TM] Operating Environments", available at
L<http://sunsolve.sun.com>.
It may be possible to use the Entropy Gathering Daemon (written in
Perl!), available from L<http://www.lothar.com/tech/crypto/>.
=head1 SunOS 4.x
In SunOS 4.x you most probably want to use the SunOS ld, /usr/bin/ld,
since the more recent versions of GNU ld (like 2.13) do not seem to
work for building Perl anymore. When linking the extensions, the
GNU ld gets very unhappy and spews a lot of errors like this
... relocation truncated to fit: BASE13 ...
and dies. Therefore the SunOS 4.1 hints file explicitly sets the
ld to be F</usr/bin/ld>.
As of Perl 5.8.1 the dynamic loading of libraries (DynaLoader, XSLoader)
also seems to have become broken in SunOS 4.x. Therefore the default
is to build Perl statically.
Running the test suite in SunOS 4.1 is a bit tricky since the
F<dist/Tie-File/t/09_gen_rs.t> test hangs (subtest #51, FWIW) for some
unknown reason. Just stop the test and kill that particular Perl
process.
There are various other failures that, as of SunOS 4.1.4 and gcc 3.2.2,
look a lot like gcc bugs. Many of the failures happen in the Encode
tests, where for example when the test expects "0" you get "&#48;"
which should after a little squinting look very odd indeed.
Another example is earlier in F<t/run/fresh_perl> where chr(0xff) is
expected but the test fails because the result is chr(0xff). Exactly.
This is the "make test" result from the said combination:
Failed 27 test scripts out of 745, 96.38% okay.
Running the C<harness> is painful because the many failing
Unicode-related tests will output megabytes of failure messages,
but if one patiently waits, one gets these results:
  Failed Test                     Stat Wstat Total Fail   Failed  List of Failed
  ------------------------------------------------------------------------------
  ...
  ../ext/Encode/t/at-cn.t            4  1024    29    4   13.79%  14-17
  ../ext/Encode/t/at-tw.t           10  2560    17   10   58.82%  2 4 6 8 10 12
                                                                  14-17
  ../ext/Encode/t/enc_data.t        29  7424    ??   ??        %  ??
  ../ext/Encode/t/enc_eucjp.t       29  7424    ??   ??        %  ??
  ../ext/Encode/t/enc_module.t      29  7424    ??   ??        %  ??
  ../ext/Encode/t/encoding.t        29  7424    ??   ??        %  ??
  ../ext/Encode/t/grow.t            12  3072    24   12   50.00%  2 4 6 8 10 12 14
                                                                  16 18 20 22 24
  ../ext/Encode/t/guess.t          255 65280    29   40  137.93%  10-29
  ../ext/Encode/t/jperl.t           29  7424    15   30  200.00%  1-15
  ../ext/Encode/t/mime-header.t      2   512    10    2   20.00%  2-3
  ../ext/Encode/t/perlio.t          22  5632    38   22   57.89%  1-4 9-16 19-20
                                                                  23-24 27-32
  ../ext/List/Util/t/shuffle.t       0   139    ??   ??        %  ??
  ../ext/PerlIO/t/encoding.t                    14    1    7.14%  11
  ../ext/PerlIO/t/fallback.t                     9    2   22.22%  3 5
  ../ext/Socket/t/socketpair.t       0     2    45   70  155.56%  11-45
  ../lib/CPAN/t/vcmp.t                          30    1    3.33%  25
  ../lib/Tie/File/t/09_gen_rs.t      0    15    ??   ??        %  ??
  ../lib/Unicode/Collate/t/test.t              199   30   15.08%  7 26-27 71-75
                                                                  81-88 95 101
                                                                  103-104 106 108-
                                                                  109 122 124 161
                                                                  169-172
  ../lib/sort.t                      0   139   119   26   21.85%  107-119
  op/alarm.t                                     4    1   25.00%  4
  op/utfhash.t                                  97    1    1.03%  31
  run/fresh_perl.t                              91    1    1.10%  32
  uni/tr_7jis.t                                 ??   ??        %  ??
  uni/tr_eucjp.t                    29  7424     6   12  200.00%  1-6
  uni/tr_sjis.t                     29  7424     6   12  200.00%  1-6
  56 tests and 467 subtests skipped.
  Failed 27/811 test scripts, 96.67% okay. 1383/75399 subtests failed,
  98.17% okay.
The alarm() test failure is caused by system() apparently blocking
alarm(). That is probably a libc bug, and given that SunOS 4.x
has been end-of-lifed years ago, don't hold your breath for a fix.
In addition to that, don't try anything too Unicode-y, especially
with Encode, and you should be fine in SunOS 4.x.
=head1 AUTHOR
The original was written by Andy Dougherty F<doughera@lafayette.edu>
drawing heavily on advice from Alan Burlison, Nick Ing-Simmons, Tim Bunce,
and many other Solaris users over the years.
Please report any errors, updates, or suggestions to F<perlbug@perl.org>.
=encoding utf8
=head1 NAME
perlbot - Links to information on object-oriented programming in Perl
=head1 DESCRIPTION
For information on OO programming with Perl, please see L<perlootut>
and L<perlobj>.
(The above documents supersede the collection of tricks that was formerly here
in perlbot.)
=cut
=encoding utf8
=for comment
Consistent formatting of this file is achieved with:
perl ./Porting/podtidy pod/perlhack.pod
=head1 NAME
perlhack - How to hack on Perl
=head1 DESCRIPTION
This document explains how Perl development works. It includes details
about the Perl 5 Porters email list, the Perl repository, the Perlbug
bug tracker, patch guidelines, and commentary on Perl development
philosophy.
=head1 SUPER QUICK PATCH GUIDE
If you just want to submit a single small patch like a pod fix, a test
for a bug, comment fixes, etc., it's easy! Here's how:
=over 4
=item * Check out the source repository
The perl source is in a git repository. You can clone the repository
with the following command:
% git clone git://perl5.git.perl.org/perl.git perl
=item * Ensure you're following the latest advice
In case the advice in this guide has been updated recently, read the
latest version directly from the perl source:
% perldoc pod/perlhack.pod
=item * Make your change
Hack, hack, hack. Keep in mind that Perl runs on many different
platforms, with different operating systems that have different
capabilities, different filesystem organizations, and even different
character sets. L<perlhacktips> gives advice on this.
=item * Test your change
You can run all the tests with the following commands:
% ./Configure -des -Dusedevel
% make test
Keep hacking until the tests pass.
=item * Commit your change
Committing your work will save the change I<on your local system>:
% git commit -a -m 'Commit message goes here'
Make sure the commit message describes your change in a single
sentence. For example, "Fixed spelling errors in perlhack.pod".
=item * Send your change to perlbug
The next step is to submit your patch to the Perl core ticket system
via email.
If your changes are in a single git commit, run the following commands
to generate the patch file and attach it to your bug report:
% git format-patch -1
% ./perl -Ilib utils/perlbug -p 0001-*.patch
The perlbug program will ask you a few questions about your email
address and the patch you're submitting. Once you've answered them it
will submit your patch via email.
If your changes are in multiple commits, generate a patch file for each
one and provide them to perlbug's C<-p> option separated by commas:
% git format-patch -3
% ./perl -Ilib utils/perlbug -p 0001-fix1.patch,0002-fix2.patch,\
> 0003-fix3.patch
When prompted, pick a subject that summarizes your changes.
=item * Thank you
The porters appreciate the time you spent helping to make Perl better.
Thank you!
=item * Next time
The next time you wish to make a patch, you need to start from the
latest perl in a pristine state. Check you don't have any local changes
or added files in your perl check-out which you wish to keep, then run
these commands:
% git pull
% git reset --hard origin/blead
% git clean -dxf
=back
=head1 BUG REPORTING
If you want to report a bug in Perl, you must use the F<perlbug>
command line tool. This tool will ensure that your bug report includes
all the relevant system and configuration information.
To browse existing Perl bugs and patches, you can use the web interface
at L<http://rt.perl.org/>.
Please check the archive of the perl5-porters list (see below) and/or
the bug tracking system before submitting a bug report. Often, you'll
find that the bug has been reported already.
You can log in to the bug tracking system and comment on existing bug
reports. If you have additional information regarding an existing bug,
please add it. This will help the porters fix the bug.
=head1 PERL 5 PORTERS
The perl5-porters (p5p) mailing list is where the Perl standard
distribution is maintained and developed. The people who maintain Perl
are also referred to as the "Perl 5 Porters", "p5p" or just the
"porters".
A searchable archive of the list is available at
L<http://markmail.org/search/?q=perl5-porters>. There is also an archive at
L<http://archive.develooper.com/perl5-porters@perl.org/>.
=head2 perl5-changes mailing list
The perl5-changes mailing list receives a copy of each patch that gets
submitted to the maintenance and development branches of the perl
repository. See L<http://lists.perl.org/list/perl5-changes.html> for
subscription and archive information.
=head2 #p5p on IRC
Many porters are also active on the L<irc://irc.perl.org/#p5p> channel.
Feel free to join the channel and ask questions about hacking on the
Perl core.
=head1 GETTING THE PERL SOURCE
All of Perl's source code is kept centrally in a Git repository at
I<perl5.git.perl.org>. The repository contains many Perl revisions
from Perl 1 onwards and all the revisions from Perforce, the previous
version control system.
For much more detail on using git with the Perl repository, please see
L<perlgit>.
=head2 Read access via Git
You will need a copy of Git for your computer. You can fetch a copy of
the repository using the git protocol:
% git clone git://perl5.git.perl.org/perl.git perl
This clones the repository and makes a local copy in the F<perl>
directory.
If you cannot use the git protocol for firewall reasons, you can also
clone via http, though this is much slower:
% git clone http://perl5.git.perl.org/perl.git perl
=head2 Read access via the web
You may access the repository over the web. This allows you to browse
the tree, see recent commits, subscribe to RSS feeds for the changes,
search for particular commits and more. You may access it at
L<http://perl5.git.perl.org/perl.git>. A mirror of the repository is
found at L<https://github.com/Perl/perl5>.
=head2 Read access via rsync
You can also choose to use rsync to get a copy of the current source
tree for the bleadperl branch and all maintenance branches:
% rsync -avz rsync://perl5.git.perl.org/perl-current .
% rsync -avz rsync://perl5.git.perl.org/perl-5.12.x .
% rsync -avz rsync://perl5.git.perl.org/perl-5.10.x .
% rsync -avz rsync://perl5.git.perl.org/perl-5.8.x .
% rsync -avz rsync://perl5.git.perl.org/perl-5.6.x .
% rsync -avz rsync://perl5.git.perl.org/perl-5.005xx .
(Add the C<--delete> option to remove leftover files.)
To get a full list of the available sync points:
% rsync perl5.git.perl.org::
=head2 Write access via git
If you have a commit bit, please see L<perlgit> for more details on
using git.
=head1 PATCHING PERL
If you're planning to do more extensive work than a single small fix,
we encourage you to read the documentation below. This will help you
focus your work and make your patches easier to incorporate into the
Perl source.
=head2 Submitting patches
If you have a small patch to submit, please submit it via perlbug. You
can also send email directly to perlbug@perl.org. Please note that
messages sent to perlbug may be held in a moderation queue, so you
won't receive a response immediately.
You'll know your submission has been processed when you receive an
email from our ticket tracking system. This email will give you a
ticket number. Once your patch has made it to the ticket tracking
system, it will also be sent to the perl5-porters@perl.org list.
Patches are reviewed and discussed on the p5p list. Simple,
uncontroversial patches will usually be applied without any discussion.
When the patch is applied, the ticket will be updated and you will
receive email. In addition, an email will be sent to the p5p list.
In other cases, the patch will need more work or discussion. That will
happen on the p5p list.
You are encouraged to participate in the discussion and advocate for
your patch. Sometimes your patch may get lost in the shuffle. It's
appropriate to send a reminder email to p5p if no action has been taken
in a month. Please remember that the Perl 5 developers are all
volunteers, and be polite.
Changes are always applied directly to the main development branch,
called "blead". Some patches may be backported to a maintenance
branch. If you think your patch is appropriate for the maintenance
branch (see L<perlpolicy/MAINTENANCE BRANCHES>), please explain why
when you submit it.
=head2 Getting your patch accepted
If you are submitting a code patch there are several things that you
can do to help the Perl 5 Porters accept your patch.
=head3 Patch style
If you used git to check out the Perl source, then using C<git
format-patch> will produce a patch in a style suitable for Perl. The
C<format-patch> command produces one patch file for each commit you
made. If you prefer to send a single patch for all commits, you can
use C<git diff>.
% git checkout blead
% git pull
% git diff blead my-branch-name
This produces a patch based on the difference between blead and your
current branch. It's important to make sure that blead is up to date
before producing the diff; that's why we call C<git pull> first.
We strongly recommend that you use git if possible. It will make your
life easier, and ours as well.
However, if you're not using git, you can still produce a suitable
patch. You'll need a pristine copy of the Perl source to diff against.
The porters prefer unified diffs. Using GNU C<diff>, you can produce a
diff like this:
% diff -Npurd perl.pristine perl.mine
Make sure that you C<make realclean> in your copy of Perl to remove any
build artifacts, or you may get a confusing result.
=head3 Commit message
As you craft each patch you intend to submit to the Perl core, it's
important to write a good commit message. This is especially important
if your submission will consist of a series of commits.
The first line of the commit message should be a short description
without a period. It should be no longer than the subject line of an
email, 50 characters being a good rule of thumb.
A lot of Git tools (Gitweb, GitHub, git log --pretty=oneline, ...) will
only display the first line (cut off at 50 characters) when presenting
commit summaries.
The commit message should include a description of the problem that the
patch corrects or new functionality that the patch adds.
As a general rule of thumb, your commit message should help a
programmer who knows the Perl core quickly understand what you were
trying to do, how you were trying to do it, and why the change matters
to Perl.
=over 4
=item * Why
Your commit message should describe why the change you are making is
important. When someone looks at your change in six months or six
years, your intent should be clear.
If you're deprecating a feature with the intent of later simplifying
another bit of code, say so. If you're fixing a performance problem or
adding a new feature to support some other bit of the core, mention
that.
=item * What
Your commit message should describe what part of the Perl core you're
changing and what you expect your patch to do.
=item * How
While it's not necessary for documentation changes, new tests or
trivial patches, it's often worth explaining how your change works.
Even if it's clear to you today, it may not be clear to a porter next
month or next year.
=back
A commit message isn't intended to take the place of comments in your
code. Commit messages should describe the change you made, while code
comments should describe the current state of the code.
If you've just implemented a new feature, complete with doc, tests and
well-commented code, a brief commit message will often suffice. If,
however, you've just changed a single character deep in the parser or
lexer, you might need to write a small novel to ensure that future
readers understand what you did and why you did it.
=head3 Comments, Comments, Comments
Be sure to adequately comment your code. While commenting every line
is unnecessary, anything that takes advantage of side effects of
operators, that creates changes that will be felt outside of the
function being patched, or that others may find confusing should be
documented. If you are going to err, it is better to err on the side
of adding too many comments than too few.
The best comments explain I<why> the code does what it does, not I<what
it does>.
=head3 Style
In general, please follow the particular style of the code you are
patching.
In particular, follow these general guidelines for patching Perl
sources:
=over 4
=item *
4-wide indents for code, 2-wide indents for nested CPP C<#define>s,
with 8-wide tabstops.
=item *
Use spaces for indentation, not tab characters.
The codebase is a mixture of tabs and spaces for indentation, and we
are moving to spaces only. Converting lines you're patching from 8-wide
tabs to spaces will help this migration.
=item *
Try hard not to exceed 79 columns
=item *
ANSI C prototypes
=item *
Uncuddled elses and "K&R" style for indenting control constructs
=item *
No C++ style (//) comments
=item *
Mark places that need to be revisited with XXX (and revisit often!)
=item *
Opening brace lines up with "if" when conditional spans multiple lines;
should be at end-of-line otherwise
=item *
In function definitions, name starts in column 0 (return value-type is on
previous line)
=item *
Single space after keywords that are followed by parens, no space
between function name and following paren
=item *
Avoid assignments in conditionals, but if they're unavoidable, use
extra paren, e.g. "if (a && (b = c)) ..."
=item *
"return foo;" rather than "return(foo);"
=item *
"if (!foo) ..." rather than "if (foo == FALSE) ..." etc.
=item *
Do not declare variables using "register". It may be counterproductive
with modern compilers, and is deprecated in C++, under which the Perl
source is regularly compiled.
=item *
In-line functions that are in headers that are accessible to XS code
need to be able to compile without warnings with commonly used extra
compilation flags, such as gcc's C<-Wswitch-default> which warns
whenever a switch statement does not have a "default" case. The use of
these extra flags is to catch potential problems in legal C code, and
is often used by Perl aggregators, such as Linux distributors.
=back
=head3 Test suite
If your patch changes code (rather than just changing documentation),
you should also include one or more test cases which illustrate the bug
you're fixing or validate the new functionality you're adding. In
general, you should update an existing test file rather than create a
new one.
Your test suite additions should generally follow these guidelines
(courtesy of Gurusamy Sarathy <gsar@activestate.com>):
=over 4
=item *
Know what you're testing. Read the docs, and the source.
=item *
Tend to fail, not succeed.
=item *
Interpret results strictly.
=item *
Use unrelated features (this will flush out bizarre interactions).
=item *
Use non-standard idioms (otherwise you are not testing TIMTOWTDI).
=item *
Avoid using hardcoded test numbers whenever possible (the EXPECTED/GOT
found in t/op/tie.t is much more maintainable, and gives better failure
reports).
=item *
Give meaningful error messages when a test fails.
=item *
Avoid using qx// and system() unless you are testing for them. If you
do use them, make sure that you cover _all_ perl platforms.
=item *
Unlink any temporary files you create.
=item *
Promote unforeseen warnings to errors with $SIG{__WARN__}.
=item *
Be sure to use the libraries and modules shipped with the version being
tested, not those that were already installed.
=item *
Add comments to the code explaining what you are testing for.
=item *
Make updating the '1..42' string unnecessary. Or make sure that you
update it.
=item *
Test _all_ behaviors of a given operator, library, or function.
Test all optional arguments.
Test return values in various contexts (boolean, scalar, list, lvalue).
Use both global and lexical variables.
Don't forget the exceptional, pathological cases.
=back
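Two of the guidelines above, promoting unforeseen warnings to errors and giving meaningful messages, can be illustrated with a small standalone script. This is a sketch rather than an actual core test: the checks and their descriptions are made up for the example, and a real test would normally use F<t/test.pl> or L<Test::More> instead of bare C<print> statements.

```perl
use strict;
use warnings;

# Promote every warning to a fatal error, so that an unforeseen warning
# fails the test instead of scrolling past unnoticed.
local $SIG{__WARN__} = sub { die "Unexpected warning: $_[0]" };

print "1..2\n";

# Code that is expected to run without warnings.
my $sum   = eval { my @n = (1, 2, 3); $n[0] + $n[1] + $n[2] };
my $clean = defined($sum) && $sum == 6;
print(($clean ? 'ok' : 'not ok'), " 1 - clean code runs silently\n");

# Using an uninitialized value warns; the handler turns that into a die,
# which the eval catches.
my $undef;
my $survived = eval { my $oops = $undef + 1; 1 };
my $caught   = !$survived && $@ =~ /^Unexpected warning: /;
print(($caught ? 'ok' : 'not ok'), " 2 - warning became an error\n");
```

Each output line names what was being checked, so a failure report is readable without opening the test file.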
=head2 Patching a core module
This works just like patching anything else, with one extra
consideration.
Modules in the F<cpan/> directory of the source tree are maintained
outside of the Perl core. When the author updates the module, the
updates are simply copied into the core. See that module's
documentation or its listing on L<http://search.cpan.org/> for more
information on reporting bugs and submitting patches.
In most cases, patches to modules in F<cpan/> should be sent upstream
and should not be applied to the Perl core individually. If a patch to
a file in F<cpan/> absolutely cannot wait for the fix to be made
upstream, released to CPAN and copied to blead, you must add (or
update) a C<CUSTOMIZED> entry in the F<"Porting/Maintainers.pl"> file
to flag that a local modification has been made. See
F<"Porting/Maintainers.pl"> for more details.
In contrast, modules in the F<dist/> directory are maintained in the
core.
=head2 Updating perldelta
For changes significant enough to warrant a F<pod/perldelta.pod> entry,
the porters will greatly appreciate it if you submit a delta entry
along with your actual change. Significant changes include, but are
not limited to:
=over 4
=item *
Adding, deprecating, or removing core features
=item *
Adding, deprecating, removing, or upgrading core or dual-life modules
=item *
Adding new core tests
=item *
Fixing security issues and user-visible bugs in the core
=item *
Changes that might break existing code, either on the perl or C level
=item *
Significant performance improvements
=item *
Adding, removing, or significantly changing documentation in the
F<pod/> directory
=item *
Important platform-specific changes
=back
Please make sure you add the perldelta entry to the right section
within F<pod/perldelta.pod>. More information on how to write good
perldelta entries is available in the C<Style> section of
F<Porting/how_to_write_a_perldelta.pod>.
=head2 What makes for a good patch?
New features and extensions to the language can be contentious. There
is no specific set of criteria which determine what features get added,
but here are some questions to consider when developing a patch:
=head3 Does the concept match the general goals of Perl?
Our goals include, but are not limited to:
=over 4
=item 1.
Keep it fast, simple, and useful.
=item 2.
Keep features/concepts as orthogonal as possible.
=item 3.
No arbitrary limits (platforms, data sizes, cultures).
=item 4.
Keep it open and exciting to use/patch/advocate Perl everywhere.
=item 5.
Either assimilate new technologies, or build bridges to them.
=back
=head3 Where is the implementation?
All the talk in the world is useless without an implementation. In
almost every case, the person or people who argue for a new feature
will be expected to be the ones who implement it. Porters capable of
coding new features have their own agendas, and are not available to
implement your (possibly good) idea.
=head3 Backwards compatibility
It's a cardinal sin to break existing Perl programs. New warnings can
be contentious--some say that a program that emits warnings is not
broken, while others say it is. Adding keywords has the potential to
break programs; changing the meaning of existing token sequences or
functions might break programs too.
The Perl 5 core includes mechanisms to help porters make backwards
incompatible changes more compatible such as the L<feature> and
L<deprecate> modules. Please use them when appropriate.
=head3 Could it be a module instead?
Perl 5 has extension mechanisms, modules and XS, specifically to avoid
the need to keep changing the Perl interpreter. You can write modules
that export functions, you can give those functions prototypes so they
can be called like built-in functions, you can even write XS code to
mess with the runtime data structures of the Perl interpreter if you
want to implement really complicated things.
Whenever possible, new features should be prototyped in a CPAN module
before they will be considered for the core.
=head3 Is the feature generic enough?
Is this something that only the submitter wants added to the language,
or is it broadly useful? Sometimes, instead of adding a feature with a
tight focus, the porters might decide to wait until someone implements
the more generalized feature.
=head3 Does it potentially introduce new bugs?
Radical rewrites of large chunks of the Perl interpreter have the
potential to introduce new bugs.
=head3 How big is it?
The smaller and more localized the change, the better. Similarly, a
series of small patches is greatly preferred over a single large patch.
=head3 Does it preclude other desirable features?
A patch is likely to be rejected if it closes off future avenues of
development. For instance, a patch that placed a true and final
interpretation on prototypes is likely to be rejected because there are
still options for the future of prototypes that haven't been addressed.
=head3 Is the implementation robust?
Good patches (tight code, complete, correct) stand more chance of going
in. Sloppy or incorrect patches might be placed on the back burner
until the pumpking has time to fix, or might be discarded altogether
without further notice.
=head3 Is the implementation generic enough to be portable?
The worst patches make use of system-specific features. It's highly
unlikely that non-portable additions to the Perl language will be
accepted.
=head3 Is the implementation tested?
Patches which change behaviour (fixing bugs or introducing new
features) must include regression tests to verify that everything works
as expected.
Without tests provided by the original author, how can anyone else
changing perl in the future be sure that they haven't unwittingly
broken the behaviour the patch implements? And without tests, how can
the patch's author be confident that his/her hard work put into the
patch won't be accidentally thrown away by someone in the future?
=head3 Is there enough documentation?
Patches without documentation are probably ill-thought out or
incomplete. No features can be added or changed without documentation,
so submitting a patch for the appropriate pod docs as well as the
source code is important.
=head3 Is there another way to do it?
Larry said "Although the Perl Slogan is I<There's More Than One Way to
Do It>, I hesitate to make 10 ways to do something". This is a tricky
heuristic to navigate, though--one man's essential addition is another
man's pointless cruft.
=head3 Does it create too much work?
Work for the pumpking, work for Perl programmers, work for module
authors, ... Perl is supposed to be easy.
=head3 Patches speak louder than words
Working code is always preferred to pie-in-the-sky ideas. A patch to
add a feature stands a much higher chance of making it to the language
than does a random feature request, no matter how fervently argued the
request might be. This ties into "Will it be useful?", as the fact
that someone took the time to make the patch demonstrates a strong
desire for the feature.
=head1 TESTING
The core uses the same testing style as the rest of Perl, a simple
"ok/not ok" run through Test::Harness, but there are a few special
considerations.
There are three ways to write a test in the core: L<Test::More>,
F<t/test.pl> and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">.
The decision of which to use depends on what part of the test suite
you're working on. This is a measure to prevent a high-level failure
(such as Config.pm breaking) from causing basic functionality tests to
fail.
The F<t/test.pl> library provides some of the features of
L<Test::More>, but avoids loading most modules and uses as few core
features as possible.
If you write your own test, use the L<Test Anything
Protocol|http://testanything.org>.
=over 4
=item * F<t/base>, F<t/comp> and F<t/opbasic>
Since we don't know if C<require> works, or even subroutines, use ad hoc
tests for these three. Step carefully to avoid using the feature being
tested. Tests in F<t/opbasic>, for instance, have been placed there
rather than in F<t/op> because they test functionality which
F<t/test.pl> presumes has already been demonstrated to work.
=item * F<t/cmd>, F<t/run>, F<t/io> and F<t/op>
Now that basic require() and subroutines are tested, you can use the
F<t/test.pl> library.
You can also use certain libraries like Config conditionally, but be
sure to skip the test gracefully if it's not there.
=item * Everything else
Now that the core of Perl is tested, L<Test::More> can and should be
used. You can also use the full suite of core modules in the tests.
=back
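As a rough illustration of the ad hoc style used for the most basic tests, the sketch below emits Test Anything Protocol output using nothing but C<print>, with no modules and no subroutines. The two checks are placeholders chosen for the example, not tests taken from the core.

```perl
# Ad hoc style: no modules, no subroutines, no require.
# Announce the plan first, then print one result line per test.
print "1..2\n";

my $test = 1;
print((1 + 1 == 2) ? "ok $test\n" : "not ok $test\n");

$test++;
print(("a" . "b" eq "ab") ? "ok $test\n" : "not ok $test\n");
```

Keeping the test number in a variable means only the plan line needs attention when a test is added.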
When you say "make test", Perl uses the F<t/TEST> program to run the
test suite (except under Win32 where it uses F<t/harness> instead).
All tests are run from the F<t/> directory, B<not> the directory which
contains the test. This causes some problems with the tests in
F<lib/>, so there is an opportunity for patching here.
You must be triply conscious of cross-platform concerns. This usually
boils down to using L<File::Spec>, avoiding things like C<fork()>
and C<system()> unless absolutely necessary, and not assuming that a
given character has a particular ordinal value (code point) or that its
UTF-8 representation is composed of particular bytes.
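As a small illustration of the L<File::Spec> advice, the sketch below builds a path from its components instead of hard-coding a separator. The file name is hypothetical, and the comment about splitting describes Unix and Windows behaviour; other platforms may split differently.

```perl
use strict;
use warnings;
use File::Spec;

# Build a path from components rather than assuming "/" is the separator.
# 'example.t' is a hypothetical file name.
my $path = File::Spec->catfile('t', 'op', 'example.t');
print "$path\n";

# Split it again. On Unix and Windows this yields the three components.
my @parts = File::Spec->splitdir($path);
print join(', ', @parts), "\n";
```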
There are several functions available to specify characters and code
points portably in tests. The always-preloaded functions
C<utf8::unicode_to_native()> and its inverse
C<utf8::native_to_unicode()> take code points and translate
appropriately. The file F<t/charset_tools.pl> has several functions
that can be useful. It has versions of the previous two functions
that take strings as inputs -- not single numeric code points:
C<uni_to_native()> and C<native_to_uni()>. If you must look at the
individual bytes comprising a UTF-8 encoded string,
C<byte_utf8a_to_utf8n()> takes as input a string of those bytes encoded
for an ASCII platform, and returns the equivalent string in the native
platform. For example, C<byte_utf8a_to_utf8n("\xC2\xA0")> returns the
byte sequence on the current platform that forms the UTF-8 for C<U+00A0>,
since C<"\xC2\xA0"> are the UTF-8 bytes on an ASCII platform for that
code point. This function returns C<"\xC2\xA0"> on an ASCII platform, and
C<"\x80\x41"> on an EBCDIC 1047 one.
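The two always-preloaded functions can be tried directly, as in this short sketch. It needs no modules; the variable names are illustrative.

```perl
use strict;
use warnings;

# Map a Unicode code point to the native code point. On ASCII platforms
# the two are identical; on EBCDIC platforms they differ.
my $native_A = utf8::unicode_to_native(0x41);
my $char     = chr $native_A;           # the letter A on every platform

# The inverse recovers the Unicode code point from a native ordinal.
my $unicode  = utf8::native_to_unicode(ord 'A');

printf "native ordinal of U+0041: %d\n", $native_A;
printf "character: %s\n", $char;
printf "Unicode code point of 'A': U+%04X\n", $unicode;
```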
The easiest approach, if the character can be written as a literal such
as C<"A"> or C<"%">, is to use that literal. If it cannot, you can use
C<\N{}>, provided the side effects aren't troublesome. Simply specify all
your characters in hex, using C<\N{U+ZZ}> instead of C<\xZZ>. C<\N{}>
is the Unicode name, and so it
always gives you the Unicode character. C<\N{U+41}> is the character
whose Unicode code point is C<0x41>, hence is C<'A'> on all platforms.
The side effects are:
=over 4
=item *
These select Unicode rules. That means that in double-quotish strings,
the string is always converted to UTF-8 to force a Unicode
interpretation (you can C<utf8::downgrade()> afterwards to convert back
to non-UTF8, if possible). In regular expression patterns, the
conversion isn't done, but if the character set modifier would
otherwise be C</d>, it is changed to C</u>.
=item *
If you use the form C<\N{I<character name>}>, the L<charnames> module
gets automatically loaded. This may not be suitable for the test level
you are doing.
=back
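The first side effect can be observed with a sketch like the one below. It uses only the C<\N{U+...}> form, so L<charnames> is not needed; the choice of U+E9 is an arbitrary example of a code point below 256.

```perl
use strict;
use warnings;

my $s = "\N{U+E9}";     # LATIN SMALL LETTER E WITH ACUTE

printf "length: %d, Unicode code point: U+%04X\n",
    length($s), utf8::native_to_unicode(ord($s));
print "UTF-8 flagged: ", (utf8::is_utf8($s) ? "yes" : "no"), "\n";

# Convert back to a non-UTF-8 string. This is possible here because
# the code point is below 256.
utf8::downgrade($s);
print "UTF-8 flagged after downgrade: ",
    (utf8::is_utf8($s) ? "yes" : "no"), "\n";
```

The string keeps its single character throughout; only its internal representation changes.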
If you are testing locales (see L<perllocale>), there are helper
functions in F<t/loc_tools.pl> to enable you to see what locales there
are on the current platform.
=head2 Special C<make test> targets
There are various special make targets that can be used to test Perl
slightly differently than the standard "test" target. Not all of them are
expected to give a 100% success rate. Many of them have several
aliases, and many of them are not available on certain operating
systems.
=over 4
=item * test_porting
This runs some basic sanity tests on the source tree and helps catch
basic errors before you submit a patch.
=item * minitest
Run F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>,
F<t/op>, F<t/uni> and F<t/mro> tests.
=item * test.valgrind check.valgrind
(Only in Linux) Run all the tests using the memory leak + naughty
memory access tool "valgrind". The log files will be named
F<testname.valgrind>.
=item * test_harness
Run the test suite with the F<t/harness> controlling program, instead
of F<t/TEST>. F<t/harness> is more sophisticated, and uses the
L<Test::Harness> module, thus using this test target supposes that perl
mostly works. The main advantage for our purposes is that it prints a
detailed summary of failed tests at the end. Also, unlike F<t/TEST>,
it doesn't redirect stderr to stdout.
Note that under Win32 F<t/harness> is always used instead of F<t/TEST>,
so there is no special "test_harness" target.
Under Win32's "test" target you may use the TEST_SWITCHES and
TEST_FILES environment variables to control the behaviour of
F<t/harness>. This means you can say
nmake test TEST_FILES="op/*.t"
nmake test TEST_SWITCHES="-torture" TEST_FILES="op/*.t"
=item * test-notty test_notty
Sets PERL_SKIP_TTY_TEST to true before running normal test.
=back
=head2 Parallel tests
The core distribution can now run its regression tests in parallel on
Unix-like platforms. Instead of running C<make test>, set C<TEST_JOBS>
in your environment to the number of tests to run in parallel, and run
C<make test_harness>. On a Bourne-like shell, this can be done as
TEST_JOBS=3 make test_harness # Run 3 tests in parallel
An environment variable is used, rather than parallel make itself,
because L<TAP::Harness> needs to be able to schedule individual
non-conflicting test scripts itself, and there is no standard interface
to C<make> utilities to interact with their job schedulers.
Note that currently some test scripts may fail when run in parallel
(most notably F<dist/IO/t/io_dir.t>). If necessary, run just the
failing scripts again sequentially and see if the failures go away.
=head2 Running tests by hand
You can run part of the test suite by hand by using one of the
following commands from the F<t/> directory:
./perl -I../lib TEST list-of-.t-files
or
./perl -I../lib harness list-of-.t-files
(If you don't specify test scripts, the whole test suite will be run.)
=head2 Using F<t/harness> for testing
If you use C<harness> for testing, you have several command line
options available to you. The arguments are as follows, and are in the
order that they must appear if used together.
harness -v -torture -re=pattern LIST OF FILES TO TEST
harness -v -torture -re LIST OF PATTERNS TO MATCH
If C<LIST OF FILES TO TEST> is omitted, the file list is obtained from
the manifest. The file list may include shell wildcards which will be
expanded out.
=over 4
=item * -v
Run the tests under verbose mode so you can see which tests were run,
along with any debug output.
=item * -torture
Run the torture tests as well as the normal set.
=item * -re=PATTERN
Filter the file list so that all the test files run match PATTERN.
Note that this form is distinct from the B<-re LIST OF PATTERNS> form
below in that it allows the file list to be provided as well.
=item * -re LIST OF PATTERNS
Filter the file list so that all the test files run match
/(LIST|OF|PATTERNS)/. Note that with this form the patterns are joined
by '|' and you cannot supply a list of files; instead, the test files
are obtained from the MANIFEST.
=back
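The effect of joining the patterns with '|' can be sketched as follows. The helper C<filter_tests> and the file list are hypothetical and are not the actual F<t/harness> code; they only illustrate the matching rule described above.

```perl
use strict;
use warnings;

# Hypothetical helper, not the real t/harness implementation: keep only
# the files that match at least one pattern, with the patterns joined by '|'.
sub filter_tests {
    my ($patterns, @files) = @_;
    my $re = join '|', @$patterns;
    return grep { /($re)/ } @files;
}

my @files = qw(op/sort.t op/split.t re/pat.t io/open.t);   # illustrative list
my @run   = filter_tests([qw(sort pat)], @files);

print "$_\n" for @run;    # prints op/sort.t and re/pat.t
```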
You can run an individual test by a command similar to
./perl -I../lib path/to/foo.t
except that the harnesses set up some environment variables that may
affect the execution of the test:
=over 4
=item * PERL_CORE=1
indicates that we're running this test as part of the perl core test
suite. This is useful for modules that have a dual life on CPAN.
=item * PERL_DESTRUCT_LEVEL=2
is set to 2 if it isn't set already (see
L<perlhacktips/PERL_DESTRUCT_LEVEL>).
=item * PERL
(used only by F<t/TEST>) if set, overrides the path to the perl
executable that should be used to run the tests (the default being
F<./perl>).
=item * PERL_SKIP_TTY_TEST
if set, tells the harness to skip the tests that need a terminal. It's actually
set automatically by the Makefile, but can also be forced artificially
by running 'make test_notty'.
=back
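A test file in a dual-life module might consult C<PERL_CORE> along these lines. The helper name C<lib_dirs_for> and the directory choices are assumptions made for illustration; real modules vary in how they use the variable.

```perl
use strict;
use warnings;

# Hypothetical helper for a dual-life module's test file: choose the
# library directories depending on whether the core harness set PERL_CORE.
sub lib_dirs_for {
    my ($env) = @_;
    return $env->{PERL_CORE} ? ('../lib') : ('blib/lib', 'blib/arch');
}

my @dirs = lib_dirs_for(\%ENV);
print "would search: @dirs\n";
```

Passing the environment in as a hash reference keeps the helper easy to exercise with both settings.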
=head3 Other environment variables that may influence tests
=over 4
=item * PERL_TEST_Net_Ping
Setting this variable runs all of the Net::Ping module's tests; otherwise
some tests that interact with the outside world are skipped. See
L<perl58delta>.
=item * PERL_TEST_NOVREXX
Setting this variable skips the vrexx.t tests for OS2::REXX.
=item * PERL_TEST_NUMCONVERTS
This sets a variable in op/numconvert.t.
=item * PERL_TEST_MEMORY
Setting this variable includes the tests in F<t/bigmem/>. This should
be set to the number of gigabytes of memory available for testing, e.g.
C<PERL_TEST_MEMORY=4> indicates that tests that require 4GiB of
available memory can be run safely.
=back
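A memory-hungry test could guard itself on C<PERL_TEST_MEMORY> roughly as sketched below. The helper C<enough_memory> and the placeholder test are hypothetical; the sketch only shows comparing the variable against the number of gigabytes a test needs and emitting a TAP skip-all plan otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: true if PERL_TEST_MEMORY promises at least
# $needed_gb gigabytes of memory for testing.
sub enough_memory {
    my ($needed_gb, $env) = @_;
    my $have = $env->{PERL_TEST_MEMORY};
    return (defined($have) && $have =~ /^\d+$/ && $have >= $needed_gb) ? 1 : 0;
}

if (enough_memory(4, \%ENV)) {
    print "1..1\n";
    print "ok 1 - placeholder for a test needing 4 GiB\n";
}
else {
    # A plan of 1..0 with a skip reason tells the harness to skip the file.
    print "1..0 # skip set PERL_TEST_MEMORY to at least 4 to run this test\n";
}
```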
See also the documentation for the Test and Test::Harness modules, for
more environment variables that affect testing.
=head2 Performance testing
The file F<t/perf/benchmarks> contains snippets of perl code which are
intended to be benchmarked across a range of perls by the
F<Porting/bench.pl> tool. If you fix or enhance a performance issue, you
may want to add a representative code sample to the file, then run
F<bench.pl> against the previous and current perls to see what difference
it has made, and whether anything else has slowed down as a consequence.
The file F<t/perf/opcount.t> is designed to test whether a particular
code snippet has been compiled into an optree containing specified
numbers of particular op types. This is good for testing whether
optimisations which alter ops, such as converting an C<aelem> op into an
C<aelemfast> op, are really doing that.
The files F<t/perf/speed.t> and F<t/re/speed.t> are designed to test
things that run thousands of times slower if a particular optimisation
is broken (for example, the utf8 length cache on long utf8 strings).
Add a test that will take a fraction of a second normally, and minutes
otherwise, causing the test file to time out on failure.
=head1 MORE READING FOR GUTS HACKERS
To hack on the Perl guts, you'll need to read the following things:
=over 4
=item * L<perlsource>
An overview of the Perl source tree. This will help you find the files
you're looking for.
=item * L<perlinterp>
An overview of the Perl interpreter source code and some details on how
Perl does what it does.
=item * L<perlhacktut>
This document walks through the creation of a small patch to Perl's C
code. If you're just getting started with Perl core hacking, this will
help you understand how it works.
=item * L<perlhacktips>
More details on hacking the Perl core. This document focuses on lower
level details such as how to write tests, compilation issues,
portability, debugging, etc.
If you plan on doing serious C hacking, make sure to read this.
=item * L<perlguts>
This is of paramount importance, since it's the documentation of what
goes where in the Perl source. Read it over a couple of times and it
might start to make sense - don't worry if it doesn't yet, because the
best way to study it is to read it in conjunction with poking at Perl
source, and we'll do that later on.
Gisle Aas's "illustrated perlguts", also known as I<illguts>, has very
helpful pictures:
L<http://search.cpan.org/dist/illguts/>
=item * L<perlxstut> and L<perlxs>
A working knowledge of XSUB programming is incredibly useful for core
hacking; XSUBs use techniques drawn from the PP code, the portion of
the guts that actually executes a Perl program. It's a lot gentler to
learn those techniques from simple examples and explanation than from
the core itself.
=item * L<perlapi>
The documentation for the Perl API explains what some of the internal
functions do, as well as the many macros used in the source.
=item * F<Porting/pumpkin.pod>
This is a collection of words of wisdom for a Perl porter; some of it
is only useful to the pumpkin holder, but most of it applies to anyone
wanting to go about Perl development.
=back
=head1 CPAN TESTERS AND PERL SMOKERS
The CPAN testers ( L<http://testers.cpan.org/> ) are a group of volunteers
who test CPAN modules on a variety of platforms.
Perl Smokers ( L<http://www.nntp.perl.org/group/perl.daily-build/> and
L<http://www.nntp.perl.org/group/perl.daily-build.reports/> )
automatically test Perl source releases on platforms with various
configurations.
Both efforts welcome volunteers. In order to get involved in smoke
testing of the perl itself visit
L<http://search.cpan.org/dist/Test-Smoke/>. In order to start smoke
testing CPAN modules visit
L<http://search.cpan.org/dist/CPANPLUS-YACSmoke/> or
L<http://search.cpan.org/dist/minismokebox/> or
L<http://search.cpan.org/dist/CPAN-Reporter/>.
=head1 WHAT NEXT?
If you've read all the documentation in this document and the ones
listed above, you're more than ready to hack on Perl.
Here are some more recommendations:
=over 4
=item *
Subscribe to perl5-porters, follow the patches and try and understand
them; don't be afraid to ask if there's a portion you're not clear on -
who knows, you may unearth a bug in the patch...
=item *
Do read the README associated with your operating system, e.g.
README.aix on the IBM AIX OS. Don't hesitate to supply patches to that
README if you find anything missing or changed over a new OS release.
=item *
Find an area of Perl that seems interesting to you, and see if you can
work out how it works. Scan through the source, and step over it in
the debugger. Play, poke, investigate, fiddle! You'll probably get to
understand not just your chosen area but a much wider range of
F<perl>'s activity as well, and probably sooner than you'd think.
=back
=head2 "The Road goes ever on and on, down from the door where it began."
If you can do these things, you've started on the long road to Perl
porting. Thanks for wanting to help make Perl better - and happy
hacking!
=head2 Metaphoric Quotations
If you recognized the quote about the Road above, you're in luck.
Most software projects begin each file with a literal description of
each file's purpose. Perl instead begins each with a literary allusion
to that file's purpose.
Like chapters in many books, all top-level Perl source files (along
with a few others here and there) begin with an epigrammatic
inscription that alludes, indirectly and metaphorically, to the
material you're about to read.
Quotations are taken from writings of J.R.R. Tolkien pertaining to his
Legendarium, almost always from I<The Lord of the Rings>. Chapters and
page numbers are given using the following editions:
=over 4
=item *
I<The Hobbit>, by J.R.R. Tolkien. The hardcover, 70th-anniversary
edition of 2007 was used, published in the UK by Harper Collins
Publishers and in the US by the Houghton Mifflin Company.
=item *
I<The Lord of the Rings>, by J.R.R. Tolkien. The hardcover,
50th-anniversary edition of 2004 was used, published in the UK by
Harper Collins Publishers and in the US by the Houghton Mifflin
Company.
=item *
I<The Lays of Beleriand>, by J.R.R. Tolkien and published posthumously
by his son and literary executor, C.J.R. Tolkien, being the 3rd of the
12 volumes in Christopher's mammoth I<History of Middle-earth>. Page
numbers derive from the hardcover edition, first published in 1983 by
George Allen & Unwin; no page numbers changed for the special 3-volume
omnibus edition of 2002 or the various trade-paper editions, all again
now by Harper Collins or Houghton Mifflin.
=back
Other JRRT books fair game for quotes would thus include I<The
Adventures of Tom Bombadil>, I<The Silmarillion>, I<Unfinished Tales>,
and I<The Tale of the Children of Hurin>, all but the first
posthumously assembled by CJRT. But I<The Lord of the Rings> itself is
perfectly fine and probably best to quote from, provided you can find a
suitable quote there.
So if you were to supply a new, complete, top-level source file to add
to Perl, you should conform to this peculiar practice by yourself
selecting an appropriate quotation from Tolkien, retaining the original
spelling and punctuation and using the same format the rest of the
quotes are in. Indirect and oblique is just fine; remember, it's a
metaphor, so being meta is, after all, what it's for.
=head1 AUTHOR
This document was originally written by Nathan Torkington, and is
maintained by the perl5-porters mailing list.
=encoding utf8
=head1 NAME
perl5243delta - what is new for perl v5.24.3
=head1 DESCRIPTION
This document describes differences between the 5.24.2 release and the 5.24.3
release.
If you are upgrading from an earlier release such as 5.24.1, first read
L<perl5242delta>, which describes differences between 5.24.1 and 5.24.2.
=head1 Security
=head2 [CVE-2017-12837] Heap buffer overflow in regular expression compiler
Compiling certain regular expression patterns with the case-insensitive
modifier could cause a heap buffer overflow and crash perl. This has now been
fixed.
L<[perl #131582]|https://rt.perl.org/Public/Bug/Display.html?id=131582>
=head2 [CVE-2017-12883] Buffer over-read in regular expression parser
For certain types of syntax error in a regular expression pattern, the error
message could either contain the contents of a random, possibly large, chunk of
memory, or could crash perl. This has now been fixed.
L<[perl #131598]|https://rt.perl.org/Public/Bug/Display.html?id=131598>
=head2 [CVE-2017-12814] C<$ENV{$key}> stack buffer overflow on Windows
A possible stack buffer overflow in the C<%ENV> code on Windows has been fixed
by removing the buffer completely since it was superfluous anyway.
L<[perl #131665]|https://rt.perl.org/Public/Bug/Display.html?id=131665>
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.24.2. If any exist,
they are bugs, and we request that you submit a report. See L</Reporting
Bugs> below.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<Module::CoreList> has been upgraded from version 5.20170715_24 to
5.20170922_24.
=item *
L<POSIX> has been upgraded from version 1.65 to 1.65_01.
=item *
L<Time::HiRes> has been upgraded from version 1.9733 to 1.9741.
L<[perl #128427]|https://rt.perl.org/Public/Bug/Display.html?id=128427>
L<[perl #128445]|https://rt.perl.org/Public/Bug/Display.html?id=128445>
L<[perl #128972]|https://rt.perl.org/Public/Bug/Display.html?id=128972>
L<[cpan #120032]|https://rt.cpan.org/Public/Bug/Display.html?id=120032>
=back
=head1 Configuration and Compilation
=over 4
=item *
When building with GCC 6 and link-time optimization (the B<-flto> option to
B<gcc>), F<Configure> was treating all probed symbols as present on the system,
regardless of whether they actually exist. This has been fixed.
L<[perl #128131]|https://rt.perl.org/Public/Bug/Display.html?id=128131>
=item *
F<Configure> now aborts if both C<-Duselongdouble> and C<-Dusequadmath> are
requested.
L<[perl #126203]|https://rt.perl.org/Public/Bug/Display.html?id=126203>
=item *
Fixed a bug in which F<Configure> could append C<-quadmath> to the archname
even if it was already present.
L<[perl #128538]|https://rt.perl.org/Public/Bug/Display.html?id=128538>
=item *
Clang builds with C<-DPERL_GLOBAL_STRUCT> or C<-DPERL_GLOBAL_STRUCT_PRIVATE>
have been fixed (by disabling Thread Safety Analysis for these configurations).
=back
=head1 Platform Support
=head2 Platform-Specific Notes
=over 4
=item VMS
=over 4
=item *
C<configure.com> now recognizes the VSI-branded C compiler.
=back
=item Windows
=over 4
=item *
Building XS modules with GCC 6 in a 64-bit build of Perl failed due to
incorrect mapping of C<strtoll> and C<strtoull>. This has now been fixed.
L<[perl #131726]|https://rt.perl.org/Public/Bug/Display.html?id=131726>
L<[cpan #121683]|https://rt.cpan.org/Public/Bug/Display.html?id=121683>
L<[cpan #122353]|https://rt.cpan.org/Public/Bug/Display.html?id=122353>
=back
=back
=head1 Selected Bug Fixes
=over 4
=item *
C<< /@0{0*-E<gt>@*/*0 >> and similar contortions used to crash, but now merely
produce a syntax error.
L<[perl #128171]|https://rt.perl.org/Public/Bug/Display.html?id=128171>
=item *
C<do> or C<require> with an argument which is a reference or typeglob which,
when stringified, contains a null character, started crashing in Perl 5.20, but
has now been fixed.
L<[perl #128182]|https://rt.perl.org/Public/Bug/Display.html?id=128182>
=item *
Expressions containing an C<&&> or C<||> operator (or their synonyms C<and> and
C<or>) were being compiled incorrectly in some cases. If the left-hand side
consisted of either a negated bareword constant or a negated C<do {}> block
containing a constant expression, and the right-hand side consisted of a
negated non-foldable expression, one of the negations was effectively ignored.
The same was true of C<if> and C<unless> statement modifiers, though with the
left-hand and right-hand sides swapped. This long-standing bug has now been
fixed.
L<[perl #127952]|https://rt.perl.org/Public/Bug/Display.html?id=127952>
=item *
C<reset> with an argument no longer crashes when encountering stash entries
other than globs.
L<[perl #128106]|https://rt.perl.org/Public/Bug/Display.html?id=128106>
=item *
Assignment of hashes to, and deletion of, typeglobs named C<*::::::> no longer
causes crashes.
L<[perl #128086]|https://rt.perl.org/Public/Bug/Display.html?id=128086>
=item *
Assignment variants of any bitwise ops under the C<bitwise> feature would crash
if the left-hand side was an array or hash.
L<[perl #128204]|https://rt.perl.org/Public/Bug/Display.html?id=128204>
=item *
C<socket> now leaves the error code returned by the system in C<$!> on failure.
L<[perl #128316]|https://rt.perl.org/Public/Bug/Display.html?id=128316>
=item *
Parsing bad POSIX charclasses no longer leaks memory.
L<[perl #128313]|https://rt.perl.org/Public/Bug/Display.html?id=128313>
=item *
Since Perl 5.20, line numbers have been off by one when perl is invoked with
the B<-x> switch. This has been fixed.
L<[perl #128508]|https://rt.perl.org/Public/Bug/Display.html?id=128508>
=item *
Some obscure cases of subroutines and file handles being freed at the same time
could result in crashes, but have been fixed. The crash was introduced in Perl
5.22.
L<[perl #128597]|https://rt.perl.org/Public/Bug/Display.html?id=128597>
=item *
Some regular expression parsing glitches could lead to assertion failures with
regular expressions such as C</(?E<lt>=/> and C</(?E<lt>!/>. This has now been
fixed.
L<[perl #128170]|https://rt.perl.org/Public/Bug/Display.html?id=128170>
=item *
C<gethostent> and similar functions now perform a null check internally, to
avoid crashing with the torsocks library. This was a regression from Perl
5.22.
L<[perl #128740]|https://rt.perl.org/Public/Bug/Display.html?id=128740>
=item *
Mentioning the same constant twice in a row (which is a syntax error) no longer
fails an assertion under debugging builds. This was a regression from Perl
5.20.
L<[perl #126482]|https://rt.perl.org/Public/Bug/Display.html?id=126482>
=item *
In Perl 5.24 C<fchown> was changed not to accept negative one as an argument
because on some platforms that is an error. However, on some other platforms
that is an acceptable argument. This change has been reverted.
L<[perl #128967]|https://rt.perl.org/Public/Bug/Display.html?id=128967>
=item *
C<@{x> followed by a newline where C<"x"> represents a control or non-ASCII
character no longer produces a garbled syntax error message or a crash.
L<[perl #128951]|https://rt.perl.org/Public/Bug/Display.html?id=128951>
=item *
A regression in Perl 5.24 with C<tr/\N{U+...}/foo/> when the code point was
between 128 and 255 has been fixed.
L<[perl #128734]|https://rt.perl.org/Public/Bug/Display.html?id=128734>
=item *
Many issues relating to C<printf "%a"> of hexadecimal floating point were
fixed. In addition, the "subnormals" (formerly known as "denormals") floating
point numbers are now supported both with the plain IEEE 754 floating point
numbers (64-bit or 128-bit) and the x86 80-bit "extended precision". Note that
subnormal hexadecimal floating point literals will give a warning about
"exponent underflow".
L<[perl #128843]|https://rt.perl.org/Public/Bug/Display.html?id=128843>
L<[perl #128888]|https://rt.perl.org/Public/Bug/Display.html?id=128888>
L<[perl #128889]|https://rt.perl.org/Public/Bug/Display.html?id=128889>
L<[perl #128890]|https://rt.perl.org/Public/Bug/Display.html?id=128890>
L<[perl #128893]|https://rt.perl.org/Public/Bug/Display.html?id=128893>
L<[perl #128909]|https://rt.perl.org/Public/Bug/Display.html?id=128909>
L<[perl #128919]|https://rt.perl.org/Public/Bug/Display.html?id=128919>
=item *
The parser could sometimes crash if a bareword came after C<evalbytes>.
L<[perl #129196]|https://rt.perl.org/Public/Bug/Display.html?id=129196>
=item *
Fixed a place where the regex parser was not setting the syntax error correctly
on a syntactically incorrect pattern.
L<[perl #129122]|https://rt.perl.org/Public/Bug/Display.html?id=129122>
=item *
A vulnerability in Perl's C<sprintf> implementation has been fixed by avoiding
a possible memory wrap.
L<[perl #131260]|https://rt.perl.org/Public/Bug/Display.html?id=131260>
=back
=head1 Acknowledgements
Perl 5.24.3 represents approximately 2 months of development since Perl 5.24.2
and contains approximately 3,200 lines of changes across 120 files from 23
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 1,600 lines of changes to 56 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.24.3:
Aaron Crane, Craig A. Berry, Dagfinn Ilmari Mannsåker, Dan Collins, Daniel
Dragan, Dave Cross, David Mitchell, Eric Herman, Father Chrysostomos, H.Merijn
Brand, Hugo van der Sanden, James E Keenan, Jarkko Hietaniemi, John SJ
Anderson, Karl Williamson, Ken Brown, Lukas Mai, Matthew Horsfall, Stevan
Little, Steve Hay, Steven Humphrey, Tony Cook, Yves Orton.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
L<https://rt.perl.org/> . There may also be information at
L<http://www.perl.org/> , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications which make it
inappropriate to send to a publicly archived mailing list, then see
L<perlsec/SECURITY VULNERABILITY CONTACT INFORMATION> for details of how to
report the issue.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
=head1 NAME
perl5242delta - what is new for perl v5.24.2
=head1 DESCRIPTION
This document describes differences between the 5.24.1 release and the 5.24.2
release.
If you are upgrading from an earlier release such as 5.24.0, first read
L<perl5241delta>, which describes differences between 5.24.0 and 5.24.1.
=head1 Security
=head2 Improved handling of '.' in @INC in base.pm
The handling of (the removal of) C<'.'> in C<@INC> in L<base> has been
improved. This resolves some problematic behaviour in the approach taken in
Perl 5.24.1, which is probably best described in the following two threads on
the Perl 5 Porters mailing list:
L<http://www.nntp.perl.org/group/perl.perl5.porters/2016/08/msg238991.html>,
L<http://www.nntp.perl.org/group/perl.perl5.porters/2016/10/msg240297.html>.
=head2 "Escaped" colons and relative paths in PATH
On Unix systems, Perl treats any relative paths in the PATH environment
variable as tainted when starting a new process. Previously, it was allowing a
backslash to escape a colon (unlike the OS), consequently allowing relative
paths to be considered safe if the PATH was set to something like C</\:.>. The
check has been fixed to treat C<.> as tainted in that example.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<base> has been upgraded from version 2.23 to 2.23_01.
=item *
L<Module::CoreList> has been upgraded from version 5.20170114_24 to 5.20170715_24.
=back
=head1 Selected Bug Fixes
=over 4
=item *
Fixed a crash with C<s///l> where it thought it was dealing with UTF-8 when it
wasn't.
L<[perl #129038]|https://rt.perl.org/Ticket/Display.html?id=129038>
=back
=head1 Acknowledgements
Perl 5.24.2 represents approximately 6 months of development since Perl 5.24.1
and contains approximately 2,500 lines of changes across 53 files from 18
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 960 lines of changes to 17 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.24.2:
Aaron Crane, Abigail, Aristotle Pagaltzis, Chris 'BinGOs' Williams, Dan
Collins, David Mitchell, Eric Herman, Father Chrysostomos, James E Keenan, Karl
Williamson, Lukas Mai, Renee Baecker, Ricardo Signes, Sawyer X, Stevan Little,
Steve Hay, Tony Cook, Yves Orton.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
L<https://rt.perl.org/> . There may also be information at
L<http://www.perl.org/> , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications which make it
inappropriate to send to a publicly archived mailing list, then see
L<perlsec/SECURITY VULNERABILITY CONTACT INFORMATION>
for details of how to report the issue.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=encoding utf8
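If you read this file as is in an ordinary text editor, just ignore the
funny annotation characters. This file is written in POD (Plain Old
Documentation), a format specially designed to be readable as is. For
more information on the format, see the perlpod documentation.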
=head1 NAME
perlcn - Perl guide for Simplified Chinese
=head1 DESCRIPTION
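Welcome to the world of Perl!
Starting with version 5.8.0, Perl has comprehensive Unicode support,
and with it support for many encodings beyond the Latin family; CJK
(Chinese, Japanese, Korean) is one part of that.
Unicode is an international standard that attempts to cover all the
characters of the world: the Western world, the Eastern world, and
everything in between (Greek, Syriac, Arabic, Hebrew, Indic, Native
American, and so on). It also spans multiple operating systems and
platforms (such as PC and Macintosh).
Perl itself operates on Unicode. This means that string data inside
Perl can be represented in Unicode; Perl's functions and operators
(such as regular expression matching) can also operate on Unicode.
For input and output, to handle data stored in pre-Unicode encodings,
Perl provides the Encode module, which lets you easily read and write
data in legacy encodings.
The Encode extension supports the following Simplified Chinese
encodings ('gb2312' means 'euc-cn'):
euc-cn Unix extended character set, commonly known as the GB (Guobiao) code
gb2312-raw The raw (low-bit) GB2312 character table
gb12345 The raw Traditional Chinese encoding used in mainland China
iso-ir-165 GB2312 + GB6345 + GB8565 + additional characters
cp936 Code Page 936, which can also be specified as 'GBK' (extended GB code)
hz 7-bit escaped GB2312 encoding
For example, to convert a file encoded in EUC-CN to Unicode, just type
the following command: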
perl -Mencoding=euc-cn,STDOUT,utf8 -pe1 < file.euc-cn > file.utf8
Perl 也内附了 "piconv", 一支完全以 Perl 写成的字符转换工具程序, 用法如下:
piconv -f euc-cn -t utf8 < file.euc-cn > file.utf8
piconv -f utf8 -t euc-cn < file.utf8 > file.euc-cn
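In addition, with the encoding module you can easily write code that works
in units of characters, as shown below:

    #!/usr/bin/env perl
    # Enable euc-cn string parsing; set standard input and standard
    # output to the euc-cn encoding
    use encoding 'euc-cn', STDIN => 'euc-cn', STDOUT => 'euc-cn';
    print length("骆驼");              #  2 (double quotes mean characters)
    print length('骆驼');              #  4 (single quotes mean bytes)
    print index("谆谆教诲", "蛔唤");   # -1 (does not contain this substring)
    print index('谆谆教诲', '蛔唤');   #  1 (starts at the second byte)

In the last example, the second byte of the first "谆" combines with the
first byte of the second "谆" to form the EUC-CN character "蛔"; the second
byte of the second "谆" then combines with the first byte of "教" to form
"唤". Working in characters avoids this false match, which used to be a
common problem when matching EUC-CN data.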
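The same character-versus-byte distinction can be sketched with the core
Encode module alone, without the C<encoding> pragma (which later Perl
releases deprecate). The variable names below are illustrative only, and the
sketch assumes that the script itself is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                       # assumption: this source file is UTF-8
use Encode qw(encode decode);

# Build EUC-CN byte strings from Unicode character strings.
my $camel_bytes  = encode('euc-cn', '骆驼');
my $phrase_bytes = encode('euc-cn', '谆谆教诲');
my $needle_bytes = encode('euc-cn', '蛔唤');

# Byte semantics: each of these characters occupies two bytes, and a
# search can match across a character boundary.
my $byte_len   = length $camel_bytes;
my $byte_index = index $phrase_bytes, $needle_bytes;

# Character semantics: decode first, then operate on characters.
my $char_len   = length decode('euc-cn', $camel_bytes);
my $char_index = index decode('euc-cn', $phrase_bytes),
                       decode('euc-cn', $needle_bytes);

print "bytes: length=$byte_len index=$byte_index\n";
print "chars: length=$char_len index=$char_index\n";
```

Decoding at the input boundary and encoding again on output keeps all
string operations in between safely character-based.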
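=head2 Additional Chinese encodings
If you need more Chinese encodings, you can download the Encode::HanExtra
module from CPAN (L<http://www.cpan.org/>). It currently provides the
following encoding:

    gb18030     Extended GB code, including Traditional Chinese

In addition, the Encode::HanConvert module provides two encodings for
converting between Simplified and Traditional Chinese:

    big5-simp   Maps between Big5 Traditional Chinese and Unicode Simplified Chinese
    gbk-trad    Maps between GBK Simplified Chinese and Unicode Traditional Chinese

To convert between GBK and Big5, see the b2g.pl and g2b.pl programs bundled
with that module, or use the following in your program:

    use Encode::HanConvert;
    $euc_cn = big5_to_gb($big5); # convert from Big5 to GBK
    $big5 = gb_to_big5($euc_cn); # convert from GBK to Big5
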
=head2 Further information
Please refer to the extensive documentation bundled with Perl (unfortunately
all written in English) to learn more about Perl and about using Unicode.
External resources are plentiful as well:
=head2 Web sites offering Perl resources
=over 4
=item L<http://www.perl.com/>
The Perl home page (maintained by O'Reilly)
=item L<http://www.cpan.org/>
The Comprehensive Perl Archive Network
=item L<http://lists.perl.org/>
An overview of the Perl mailing lists
=back
=head2 Web sites for learning Perl
=over 4
=item L<http://www.oreilly.com.cn/index.php?func=booklist&cat=68>
O'Reilly Perl books in Simplified Chinese
=back
=head2 Perl user groups
=over 4
=item L<http://www.pm.org/groups/asia.html>
A list of Perl Mongers groups in China
=back
=head2 Unicode-related web sites
=over 4
=item L<http://www.unicode.org/>
The Unicode Consortium (the makers of the Unicode standard)
=item L<http://www.cl.cam.ac.uk/%7Emgk25/unicode.html>
UTF-8 and Unicode FAQ for Unix/Linux
=back
=head1 SEE ALSO
L<Encode>, L<Encode::CN>, L<encoding>, L<perluniintro>, L<perlunicode>
=head1 AUTHORS
Jarkko Hietaniemi E<lt>jhi@iki.fiE<gt>
Audrey Tang (唐凤) E<lt>audreyt@audreyt.orgE<gt>
=cut
=head1 NAME
perlcheat - Perl 5 Cheat Sheet
=head1 DESCRIPTION
This 'cheat sheet' is a handy reference, meant for beginning Perl
programmers. Not everything is mentioned, but 195 features may
already be overwhelming.
=head2 The sheet

  CONTEXTS  SIGILS  ref        ARRAYS        HASHES
  void      $scalar SCALAR     @array        %hash
  scalar    @array  ARRAY      @array[0, 2]  @hash{'a', 'b'}
  list      %hash   HASH       $array[0]     $hash{'a'}
            &sub    CODE
            *glob   GLOB       SCALAR VALUES
                    FORMAT     number, string, ref, glob, undef
  REFERENCES
  \      reference       $$foo[1]       aka $foo->[1]
  $@%&*  dereference     $$foo{bar}     aka $foo->{bar}
  []     anon. arrayref  ${$$foo[1]}[2] aka $foo->[1]->[2]
  {}     anon. hashref   ${$$foo[1]}[2] aka $foo->[1][2]
  \()    list of refs
                         SYNTAX
  OPERATOR PRECEDENCE    foreach (LIST) { }     for (a;b;c) { }
  ->                     while   (e) { }        until (e)   { }
  ++ --                  if      (e) { } elsif (e) { } else { }
  **                     unless  (e) { } elsif (e) { } else { }
  ! ~ \ u+ u-            given   (e) { when (e) {} default {} }
  =~ !~
  * / % x                 NUMBERS vs STRINGS  FALSE vs TRUE
  + - .                   =          =        undef, "", 0, "0"
  << >>                   +          .        anything else
  named uops              == !=      eq ne
  < > <= >= lt gt le ge   < > <= >=  lt gt le ge
  == != <=> eq ne cmp ~~  <=>        cmp
  &
  | ^             REGEX MODIFIERS       REGEX METACHARS
  &&              /i case insensitive   ^      string begin
  || //           /m line based ^$      $      str end (bfr \n)
  .. ...          /s . includes \n      +      one or more
  ?:              /x /xx ign. wh.space  *      zero or more
  = += last goto  /p preserve           ?      zero or one
  , =>            /a ASCII    /aa safe  {3,7}  repeat in range
  list ops        /l locale   /d  dual  |      alternation
  not             /u Unicode            []     character class
  and             /e evaluate /ee rpts  \b     boundary
  or xor          /g global             \z     string end
                  /o compile pat once   ()     capture
  DEBUG                                 (?:p)  no capture
  -MO=Deparse     REGEX CHARCLASSES     (?#t)  comment
  -MO=Terse       .   [^\n]             (?=p)  ZW pos ahead
  -D##            \s  whitespace        (?!p)  ZW neg ahead
  -d:Trace        \w  word chars        (?<=p) ZW pos behind \K
                  \d  digits            (?<!p) ZW neg behind
  CONFIGURATION   \pP named property    (?>p)  no backtrack
  perl -V:ivsize  \h  horiz.wh.space    (?|p|p)branch reset
                  \R  linebreak         (?<n>p)named capture
                  \S \W \D \H negate    \g{n}  ref to named cap
                                        \K     keep left part
  FUNCTION RETURN LISTS
  stat      localtime    caller         SPECIAL VARIABLES
   0 dev    0 second      0 package     $_    default variable
   1 ino    1 minute      1 filename    $0    program name
   2 mode   2 hour        2 line        $/    input separator
   3 nlink  3 day         3 subroutine  $\    output separator
   4 uid    4 month-1     4 hasargs     $|    autoflush
   5 gid    5 year-1900   5 wantarray   $!    sys/libcall error
   6 rdev   6 weekday     6 evaltext    $@    eval error
   7 size   7 yearday     7 is_require  $$    process ID
   8 atime  8 is_dst      8 hints       $.    line number
   9 mtime                9 bitmask     @ARGV command line args
  10 ctime               10 hinthash    @INC  include paths
  11 blksz               3..10 only     @_    subroutine args
  12 blcks               with EXPR      %ENV  environment

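To make the REFERENCES rows of the sheet concrete, here is a small sketch
showing that the paired notations reach the same element. The data
structures C<$foo> and C<$conf> are made up for this example.

```perl
use strict;
use warnings;

# A hypothetical nested structure: an array ref whose element 1 is
# itself an array ref.
my $foo = [ 'first', [ 'x', 'y', 'z' ] ];

# $$foo[1] and $foo->[1] both fetch element 1 of the array behind $foo;
# comparing the two references numerically shows they are the same one.
my $same_element = $$foo[1] == $foo->[1];

# Three spellings of "element 2 of the array stored in element 1".
my @spellings = ( ${$$foo[1]}[2], $foo->[1]->[2], $foo->[1][2] );

# $$conf{bar} treats $conf as a hash ref, just like $conf->{bar}.
my $conf = { bar => 42 };
my @hash_spellings = ( $$conf{bar}, $conf->{bar} );

print "array forms: @spellings\n";
print "hash forms:  @hash_spellings\n";
```

The arrow forms are usually easier to read once structures nest more than
one level deep.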
=head1 ACKNOWLEDGEMENTS
The first version of this document appeared on Perl Monks, where several
people had useful suggestions. Thank you, Perl Monks.
A special thanks to Damian Conway, who didn't only suggest important changes,
but also took the time to count the number of listed features and make a
Perl 6 version to show that Perl will stay Perl.
=head1 AUTHOR
Juerd Waalboer <#####@juerd.nl>, with the help of many Perl Monks.
=head1 SEE ALSO
=over 4
=item *
L<http://perlmonks.org/?node_id=216602> - the original PM post
=item *
L<http://perlmonks.org/?node_id=238031> - Damian Conway's Perl 6 version
=item *
L<http://juerd.nl/site.plp/perlcheat> - home of the Perl Cheat Sheet
=back
This document is written in pod format hence there are punctuation
characters in odd places. Do not worry, you've apparently got the
ASCII->EBCDIC translation worked out correctly. You can read more
about pod in pod/perlpod.pod or the short summary in the INSTALL file.
=head1 NAME
perlbs2000 - building and installing Perl for BS2000.
B<This document needs to be updated, but we don't know what it should say.
Please email comments to L<perlbug@perl.org|mailto:perlbug@perl.org>.>
=head1 SYNOPSIS
This document will help you Configure, build, test and install Perl
on BS2000 in the POSIX subsystem.
=head1 DESCRIPTION
This is a ported perl for the POSIX subsystem in BS2000 VERSION OSD
V3.1A or later. It may work on other versions, but we started porting
and testing it with 3.1A and are currently using Version V4.0A.
You may need the following GNU programs in order to install perl:
=head2 gzip on BS2000
We used version 1.2.4, which could be installed out of the box with
one failure during 'make check'.
=head2 bison on BS2000
The yacc coming with BS2000 POSIX didn't work for us. So we had to
use bison. We had to make a few changes to perl in order to use the
pure (reentrant) parser of bison. We used version 1.25, but we had to
add a few changes due to EBCDIC. See below for more details
concerning yacc.
=head2 Unpacking Perl Distribution on BS2000
To extract an ASCII tar archive on BS2000 POSIX you need an ASCII
filesystem (we used the mountpoint /usr/local/ascii for this). Now
you extract the archive in the ASCII filesystem without
I/O-conversion:
cd /usr/local/ascii
export IO_CONVERSION=NO
gunzip < /usr/local/src/perl.tar.gz | pax -r
You may ignore the error message for the first element of the archive
(this doesn't look like a tar archive / skipping to next file...),
it's only the directory which will be created automatically anyway.
After extracting the archive you copy the whole directory tree to your
EBCDIC filesystem. B<This time you use I/O-conversion>:
cd /usr/local/src
IO_CONVERSION=YES
cp -r /usr/local/ascii/perl5.005_02 ./
=head2 Compiling Perl on BS2000
There is a "hints" file for BS2000 called hints.posix-bc (because
posix-bc is the OS name given by `uname`) that specifies the correct
values for most things. The major problem is (of course) the EBCDIC
character set. We have a German EBCDIC version.
Because of our problems with the native yacc we used GNU bison to
generate a pure (=reentrant) parser for perly.y. So our yacc is
really the following script:
-----8<-----/usr/local/bin/yacc-----8<-----
#! /usr/bin/sh
# Bison as a reentrant yacc:
# save parameters:
params=""
while [[ $# -gt 1 ]]; do
params="$params $1"
shift
done
# add flag %pure_parser:
tmpfile=/tmp/bison.$$.y
echo %pure_parser > $tmpfile
cat $1 >> $tmpfile
# call bison:
echo "/usr/local/bin/bison --yacc $params $1\t\t\t(Pure Parser)"
/usr/local/bin/bison --yacc $params $tmpfile
# cleanup:
rm -f $tmpfile
-----8<----------8<-----
We still use the normal yacc for a2p.y though!!! We made a softlink
called byacc to distinguish between the two versions:
ln -s /usr/bin/yacc /usr/local/bin/byacc
We build perl using GNU make. We tried the native make once and it
worked too.
=head2 Testing Perl on BS2000
We still got a few errors during C<make test>. Some of them are the
result of using bison. Bison prints I<parser error> instead of I<syntax
error>, so those failures may be ignored. The following list shows
our errors; your results may differ:
op/numconvert.......FAILED tests 1409-1440
op/regexp...........FAILED tests 483, 496
op/regexp_noamp.....FAILED tests 483, 496
pragma/overload.....FAILED tests 152-153, 170-171
pragma/warnings.....FAILED tests 14, 82, 129, 155, 192, 205, 207
lib/bigfloat........FAILED tests 351-352, 355
lib/bigfltpm........FAILED tests 354-355, 358
lib/complex.........FAILED tests 267, 487
lib/dumper..........FAILED tests 43, 45
Failed 11/231 test scripts, 95.24% okay. 57/10595 subtests failed, 99.46% okay.
=head2 Installing Perl on BS2000
We have no nroff on BS2000 POSIX (yet), so we ignored any errors while
installing the documentation.
=head2 Using Perl in the Posix-Shell of BS2000
BS2000 POSIX doesn't support the shebang notation
(C<#!/usr/local/bin/perl>), so you have to use the following lines
instead:
: # use perl
eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}'
if $running_under_some_shell;
=head2 Using Perl in "native" BS2000
We don't have much experience with this yet, but try the following:
Copy your Perl executable to a BS2000 LLM using bs2cp:
C<bs2cp /usr/local/bin/perl 'bs2:perl(perl,l)'>
Now you can start it with the following (SDF) command:
C</START-PROG FROM-FILE=*MODULE(PERL,PERL),PROG-MODE=*ANY,RUN-MODE=*ADV>
First you get the BS2000 commandline prompt ('*'). Here you may enter
your parameters, e.g. C<-e 'print "Hello World!\\n";'> (note the
double backslash!) or C<-w> and the name of your Perl script.
Filenames starting with C</> are searched in the Posix filesystem,
others are searched in the BS2000 filesystem. You may even use
wildcards if you put a C<%> in front of your filename (e.g. C<-w
checkfiles.pl %*.c>). Read your C/C++ manual for additional
possibilities of the commandline prompt (look for
PARAMETER-PROMPTING).
=head2 Floating point anomalies on BS2000
There appears to be a bug in the floating point implementation on BS2000 POSIX
systems such that calling int() on the product of a number and a small
magnitude number is not the same as calling int() on the quotient of
that number and a large magnitude number. For example, in the following
Perl code:
my $x = 100000.0;
my $y = int($x * 1e-5) * 1e5; # '0'
my $z = int($x / 1e+5) * 1e5; # '100000'
print "\$y is $y and \$z is $z\n"; # $y is 0 and $z is 100000
Although one would expect the quantities $y and $z to be the same and equal
to 100000 they will differ and instead will be 0 and 100000 respectively.
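One defensive pattern, offered here only as a sketch and not as something
the BS2000 port itself prescribes, is to nudge a value by a small tolerance
before truncating it. The helper name C<int_tolerant> and the default
tolerance of 1e-9 are assumptions chosen for illustration; a suitable
tolerance depends on the magnitudes involved.

```perl
use strict;
use warnings;

# Truncate toward zero, but first push the value away from zero by a
# small tolerance so that a result such as 0.9999999999 becomes 1.
# Both the helper and its default tolerance are hypothetical.
sub int_tolerant {
    my ($value, $tolerance) = @_;
    $tolerance = 1e-9 unless defined $tolerance;
    return int($value + ($value < 0 ? -$tolerance : $tolerance));
}

my $x = 100000.0;
my $y = int_tolerant($x * 1e-5) * 1e5;
my $z = int_tolerant($x / 1e+5) * 1e5;
print "\$y is $y and \$z is $z\n";
```

Values that are genuinely fractional, such as 1.5, still truncate as usual
because the tolerance is far smaller than the fractional part.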
=head2 Using PerlIO and different encodings on ASCII and EBCDIC partitions
Since version 5.8, Perl uses the new PerlIO on BS2000. This lets you
use a different encoding for each IO channel. For example, you may use
use Encode;
open($f, ">:encoding(ascii)", "test.ascii");
print $f "Hello World!\n";
open($f, ">:encoding(posix-bc)", "test.ebcdic");
print $f "Hello World!\n";
open($f, ">:encoding(latin1)", "test.latin1");
print $f "Hello World!\n";
open($f, ">:encoding(utf8)", "test.utf8");
print $f "Hello World!\n";
to get four files containing "Hello World!\n" in ASCII, EBCDIC, ISO
Latin-1 (in this example identical to ASCII), and UTF-EBCDIC (in
this example identical to normal EBCDIC), respectively. See the
documentation of Encode::PerlIO for details.
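To see that the layer, and not the text, decides what ends up on disk, the
following sketch writes one string through two encoding layers and compares
the resulting file sizes. The helper C<encoded_size> and the sample text are
invented for illustration, and the sketch assumes a Unix-like system where
no newline translation takes place.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text through the given encoding layer and return the size of
# the resulting file in bytes. (Hypothetical helper.)
sub encoded_size {
    my ($encoding, $text) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;
    open(my $out, ">:encoding($encoding)", $path) or die "open $path: $!";
    print {$out} $text;
    close $out or die "close $path: $!";
    return -s $path;
}

my $text = "caf\x{e9}\n";    # five characters, one of them non-ASCII
my %size = map { $_ => encoded_size($_, $text) } qw(latin1 UTF-8);
printf "%-6s %d bytes\n", $_, $size{$_} for sort keys %size;
```

The Latin-1 file holds one byte per character, while the UTF-8 file needs
an extra byte for the accented letter.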
As the PerlIO layer uses raw IO internally, all this totally ignores
the type of your filesystem (ASCII or EBCDIC) and the IO_CONVERSION
environment variable. If you want the old behavior, in which the
BS2000 IO functions determine conversion depending on the filesystem,
PerlIO is still your friend: set IO_CONVERSION as usual and tell
Perl to use the native IO layer:
export IO_CONVERSION=YES
export PERLIO=stdio
Now your IO would be ASCII on ASCII partitions and EBCDIC on EBCDIC
partitions. See the documentation of PerlIO (without C<Encode::>!)
for further possibilities.
=head1 AUTHORS
Thomas Dorner
=head1 SEE ALSO
L<INSTALL>, L<perlport>.
=head2 Mailing list
If you are interested in the z/OS (formerly known as OS/390)
and POSIX-BC (BS2000) ports of Perl then see the perl-mvs mailing list.
To subscribe, send an empty message to perl-mvs-subscribe@perl.org.
See also:
http://lists.perl.org/list/perl-mvs.html
There are web archives of the mailing list at:
http://www.xray.mpe.mpg.de/mailing-lists/perl-mvs/
http://archive.develooper.com/perl-mvs@perl.org/
=head1 HISTORY
This document was originally written by Thomas Dorner for the 5.005
release of Perl.
This document was podified for the 5.6 release of perl 11 July 2000.
=cut
=head1 NAME
perl561delta - what's new for perl v5.6.1
=head1 DESCRIPTION
This document describes differences between the 5.005 release and the 5.6.1
release.
=head1 Summary of changes between 5.6.0 and 5.6.1
This section contains a summary of the changes between the 5.6.0 release
and the 5.6.1 release. More details about the changes mentioned here
may be found in the F<Changes> files that accompany the Perl source
distribution. See L<perlhack> for pointers to online resources where you
can inspect the individual patches described by these changes.
=head2 Security Issues
suidperl will not run /bin/mail anymore, because some platforms have
a /bin/mail that is vulnerable to buffer overflow attacks.
Note that suidperl is neither built nor installed by default in
any recent version of perl. Use of suidperl is highly discouraged.
If you think you need it, try alternatives such as sudo first.
See http://www.courtesan.com/sudo/ .
=head2 Core bug fixes
This is not an exhaustive list. It is intended to cover only the
significant user-visible changes.
=over
=item C<UNIVERSAL::isa()>
A bug in the caching mechanism used by C<UNIVERSAL::isa()> that affected
base.pm has been fixed. The bug has existed since the 5.005 releases,
but wasn't tickled by base.pm in those releases.
=item Memory leaks
Various cases of memory leaks and attempts to access uninitialized memory
have been cured. See L</"Known Problems"> below for further issues.
=item Numeric conversions
Numeric conversions did not recognize changes in the string value
properly in certain circumstances.
In other situations, large unsigned numbers (those above 2**31) could
sometimes lose their unsignedness, causing bogus results in arithmetic
operations.
Integer modulus on large unsigned integers sometimes returned
incorrect values.
Perl 5.6.0 generated "not a number" warnings on certain conversions where
previous versions didn't.
These problems have all been rectified.
Infinity is now recognized as a number.
=item qw(a\\b)
In Perl 5.6.0, qw(a\\b) produced a string with two backslashes instead
of one, in a departure from the behavior in previous versions. The
older behavior has been reinstated.
=item caller()
caller() could cause core dumps in certain situations. Carp was sometimes
affected by this problem.
=item Bugs in regular expressions
Pattern matches on overloaded values are now handled correctly.
Perl 5.6.0 parsed m/\x{ab}/ incorrectly, leading to spurious warnings.
This has been corrected.
The RE engine found in Perl 5.6.0 accidentally pessimised certain kinds
of simple pattern matches. These are now handled better.
Regular expression debug output (whether through C<use re 'debug'>
or via C<-Dr>) now looks better.
Multi-line matches like C<"a\nxb\n" =~ /(?!\A)x/m> were flawed. The
bug has been fixed.
Use of $& could trigger a core dump under some situations. This
is now avoided.
Match variables ($1 et al.) weren't being unset when a pattern match
was backtracking, and the anomaly showed up inside C</...(?{ ... }).../>
etc. These variables are now tracked correctly.
pos() did not return the correct value within s///ge in earlier
versions. This is now handled correctly.
=item "slurp" mode
readline() on files opened in "slurp" mode could return an extra "" at
the end in certain situations. This has been corrected.
=item Autovivification of symbolic references to special variables
Autovivification of symbolic references of special variables described
in L<perlvar> (as in C<${$num}>) was accidentally disabled. This works
again now.
=item Lexical warnings
Lexical warnings now propagate correctly into C<eval "...">.
C<use warnings qw(FATAL all)> did not work as intended. This has been
corrected.
Lexical warnings could leak into other scopes in some situations.
This is now fixed.
warnings::enabled() now reports the state of $^W correctly if the caller
isn't using lexical warnings.
=item Spurious warnings and errors
Perl 5.6.0 could emit spurious warnings about redefinition of dl_error()
when statically building extensions into perl. This has been corrected.
"our" variables could result in bogus "Variable will not stay shared"
warnings. This is now fixed.
"our" variables of the same name declared in two sibling blocks
resulted in bogus warnings about "redeclaration" of the variables.
The problem has been corrected.
=item glob()
Compatibility of the builtin glob() with old csh-based glob has been
improved with the addition of GLOB_ALPHASORT option. See C<File::Glob>.
File::Glob::glob() has been renamed to File::Glob::bsd_glob()
because the name clashes with the builtin glob(). The older
name is still available for compatibility, but is deprecated.
Spurious syntax errors generated in certain situations, when glob()
caused File::Glob to be loaded for the first time, have been fixed.
=item Tainting
Some cases of inconsistent taint propagation (such as within hash
values) have been fixed.
The tainting behavior of sprintf() has been rationalized. It does
not taint the result of floating point formats anymore, making the
behavior consistent with that of string interpolation.
=item sort()
Arguments to sort() weren't being provided the right wantarray() context.
The comparison block is now run in scalar context, and the arguments to
be sorted are always provided list context.
sort() is also fully reentrant, in the sense that the sort function
can itself call sort(). This did not work reliably in previous releases.
=item #line directives
#line directives now work correctly when they appear at the very
beginning of C<eval "...">.
=item Subroutine prototypes
The (\&) prototype now works properly.
=item map()
map() could get pathologically slow when the result list it generates
is larger than the source list. The performance has been improved for
common scenarios.
=item Debugger
Debugger exit code now reflects the script exit code.
Condition C<"0"> in breakpoints is now treated correctly.
The C<d> command now checks the line number.
C<$.> is no longer corrupted by the debugger.
All debugger output now correctly goes to the socket if RemotePort
is set.
=item PERL5OPT
PERL5OPT can be set to more than one switch group. Previously,
it used to be limited to one group of options only.
=item chop()
chop(@list) in list context returned the characters chopped in reverse
order. This has been reversed to be in the right order.
=item Unicode support
Unicode support has seen a large number of incremental improvements,
but continues to be highly experimental. It is not expected to be
fully supported in the 5.6.x maintenance releases.
substr(), join(), repeat(), reverse(), quotemeta() and string
concatenation were all handling Unicode strings incorrectly in
Perl 5.6.0. This has been corrected.
Support for C<tr///CU>, C<tr///UC>, etc. has been removed since
we realized the interface is broken. For similar functionality,
see L<perlfunc/pack>.
The Unicode Character Database has been updated to version 3.0.1
with additions made available to the public as of August 30, 2000.
The Unicode character classes \p{Blank} and \p{SpacePerl} have been
added. "Blank" is like C isblank(), that is, it contains only
"horizontal whitespace" (the space character is, the newline isn't),
and the "SpacePerl" is the Unicode equivalent of C<\s> (\p{Space}
isn't, since that includes the vertical tabulator character, whereas
C<\s> doesn't.)
If you are experimenting with Unicode support in perl, the development
versions of Perl may have more to offer. In particular, I/O layers
are now available in the development track, but not in the maintenance
track, primarily due to backward compatibility issues. Unicode support
is also evolving rapidly on a daily basis in the development track--the
maintenance track only reflects the most conservative of these changes.
=item 64-bit support
Support for 64-bit platforms has been improved, but continues to be
experimental. The level of support varies greatly among platforms.
=item Compiler
The B Compiler and its various backends have had many incremental
improvements, but they continue to remain highly experimental. Use in
production environments is discouraged.
The perlcc tool has been rewritten so that the user interface is much
more like that of a C compiler.
The perlbc tool has been removed. Use C<perlcc -B> instead.
=item Lvalue subroutines
There have been various bugfixes to support lvalue subroutines better.
However, the feature still remains experimental.
=item IO::Socket
IO::Socket::INET failed to open the specified port if the service
name was not known. It now correctly uses the supplied port number
as is.
=item File::Find
File::Find now chdir()s correctly when chasing symbolic links.
=item xsubpp
xsubpp now tolerates embedded POD sections.
=item C<no Module;>
C<no Module;> does not produce an error even if Module does not have an
unimport() method. This parallels the behavior of C<use> vis-a-vis
C<import>.
=item Tests
A large number of tests have been added.
=back
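Two of the fixes listed above are easy to check from a script. The sketch
below exercises the restored C<qw(a\\b)> behavior and calls sort() from
inside a sort comparison; the helper C<smallest> and the sample data are
invented for illustration.

```perl
use strict;
use warnings;

# qw(): a doubled backslash yields a single backslash, so the word
# consists of the three characters "a", "\" and "b".
my ($word) = qw(a\\b);
my $word_length = length $word;

# sort() is reentrant: the comparison may itself call sort().
# smallest() is a hypothetical helper returning the minimum of a list.
sub smallest {
    my ($list) = @_;
    my @ascending = sort { $a <=> $b } @$list;
    return $ascending[0];
}

my @rows    = ( [ 9, 4 ], [ 7, 1 ], [ 5, 3 ] );
my @ordered = sort { smallest($a) <=> smallest($b) } @rows;
my $summary = join ' ', map { "[@$_]" } @ordered;

print "qw word length: $word_length\n";
print "rows by smallest member: $summary\n";
```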
=head2 Core features
untie() will now call an UNTIE() hook if it exists. See L<perltie>
for details.
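As a minimal sketch of the UNTIE() hook, the following tied scalar records
when untie() is called on it. The C<Counter> class and its package variable
C<$untied> are hypothetical names used only for this example.

```perl
use strict;
use warnings;

package Counter;                 # hypothetical tie class

our $untied = 0;                 # set by the UNTIE hook below

sub TIESCALAR { my ($class) = @_; my $count = 0; return bless \$count, $class }
sub FETCH     { my ($self)  = @_; return ++$$self }   # count every read
sub UNTIE     { $untied = 1 }    # called by untie() when it exists

package main;

tie my $ticket, 'Counter';
my $first  = $ticket;            # each read goes through FETCH
my $second = $ticket;
untie $ticket;

print "tickets issued: $first, $second\n";
print "UNTIE hook ran: $Counter::untied\n";
```

A hook like this is a natural place to flush or release resources that the
tie class holds.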
The C<-DT> command line switch outputs copious tokenizing information.
See L<perlrun>.
Arrays are now always interpolated in double-quotish strings. Previously,
C<"foo@bar.com"> used to be a fatal error at compile time, if an array
C<@bar> was not used or declared. This transitional behavior was
intended to help migrate perl4 code, and is deemed to be no longer useful.
See L</"Arrays now always interpolate into double-quoted strings">.
keys(), each(), pop(), push(), shift(), splice() and unshift()
can all be overridden now.
C<my __PACKAGE__ $obj> now does the expected thing.
=head2 Configuration issues
On some systems (IRIX and Solaris among them) the system malloc is demonstrably
better. While the defaults haven't been changed in order to retain binary
compatibility with earlier releases, you may be better off building perl
with C<Configure -Uusemymalloc ...> as discussed in the F<INSTALL> file.
C<Configure> has been enhanced in various ways:
=over
=item *
Minimizes use of temporary files.
=item *
By default, does not link perl with libraries not used by it, such as
the various dbm libraries. SunOS 4.x hints preserve behavior on that
platform.
=item *
Support for pdp11-style memory models has been removed due to obsolescence.
=item *
Building outside the source tree is supported on systems that have
symbolic links. This is done by running
sh /path/to/source/Configure -Dmksymlinks ...
make all test install
in a directory other than the perl source directory. See F<INSTALL>.
=item *
C<Configure -S> can be run non-interactively.
=back
=head2 Documentation
README.aix, README.solaris and README.macos have been added.
README.posix-bc has been renamed to README.bs2000. These are
installed as L<perlaix>, L<perlsolaris>, L<perlmacos>, and
L<perlbs2000> respectively.
The following pod documents are brand new:
perlclib Internal replacements for standard C library functions
perldebtut Perl debugging tutorial
perlebcdic Considerations for running Perl on EBCDIC platforms
perlnewmod Perl modules: preparing a new module for distribution
perlrequick Perl regular expressions quick start
perlretut Perl regular expressions tutorial
perlutil utilities packaged with the Perl distribution
The F<INSTALL> file has been expanded to cover various issues, such as
64-bit support.
A longer list of contributors has been added to the source distribution.
See the file C<AUTHORS>.
Numerous other changes have been made to the included documentation and FAQs.
=head2 Bundled modules
The following modules have been added.
=over
=item B::Concise
Walks Perl syntax tree, printing concise info about ops. See L<B::Concise>.
=item File::Temp
Returns name and handle of a temporary file safely. See L<File::Temp>.
=item Pod::LaTeX
Converts Pod data to formatted LaTeX. See L<Pod::LaTeX>.
=item Pod::Text::Overstrike
Converts POD data to formatted overstrike text. See L<Pod::Text::Overstrike>.
=back
The following modules have been upgraded.
=over
=item CGI
CGI v2.752 is now included.
=item CPAN
CPAN v1.59_54 is now included.
=item Class::Struct
Various bugfixes have been added.
=item DB_File
DB_File v1.75 supports newer Berkeley DB versions, among other
improvements.
=item Devel::Peek
Devel::Peek has been enhanced to support dumping of memory statistics,
when perl is built with the included malloc().
=item File::Find
File::Find now supports pre and post-processing of the files in order
to sort() them, etc.
=item Getopt::Long
Getopt::Long v2.25 is included.
=item IO::Poll
Various bug fixes have been included.
=item IPC::Open3
IPC::Open3 allows use of numeric file descriptors.
=item Math::BigFloat
The fmod() function supports modulus operations. Various bug fixes
have also been included.
=item Math::Complex
Math::Complex handles inf, NaN, etc. better.
=item Net::Ping
ping() could fail on odd number of data bytes, and when the echo service
isn't running. This has been corrected.
=item Opcode
A memory leak has been fixed.
=item Pod::Parser
Version 1.13 of the Pod::Parser suite is included.
=item Pod::Text
Pod::Text and related modules have been upgraded to the versions
in podlators suite v2.08.
=item SDBM_File
On dosish platforms, some keys went missing because of lack of support for
files with "holes". A workaround for the problem has been added.
=item Sys::Syslog
Various bug fixes have been included.
=item Tie::RefHash
Now supports Tie::RefHash::Nestable to automagically tie hashref values.
=item Tie::SubstrHash
Various bug fixes have been included.
=back
=head2 Platform-specific improvements
The following new ports are now available.
=over
=item NCR MP-RAS
=item NonStop-UX
=back
Perl now builds under Amdahl UTS.
Perl has also been verified to build under Amiga OS.
Support for EPOC has been much improved. See README.epoc.
Building perl with -Duseithreads or -Duse5005threads now works
under HP-UX 10.20 (previously it only worked under 10.30 or later).
You will need a thread library package installed. See README.hpux.
Long doubles should now work under Linux.
Mac OS Classic is now supported in the mainstream source package.
See README.macos.
Support for MPE/iX has been updated. See README.mpeix.
Support for OS/2 has been improved. See C<os2/Changes> and README.os2.
Dynamic loading on z/OS (formerly OS/390) has been improved. See
README.os390.
Support for VMS has seen many incremental improvements, including
better support for operators like backticks and system(), and better
%ENV handling. See C<README.vms> and L<perlvms>.
Support for Stratus VOS has been improved. See C<vos/Changes> and README.vos.
Support for Windows has been improved.
=over
=item *
fork() emulation has been improved in various ways, but still continues
to be experimental. See L<perlfork> for known bugs and caveats.
=item *
%SIG has been enabled under USE_ITHREADS, but its use is completely
unsupported under all configurations.
=item *
Borland C++ v5.5 is now a supported compiler that can build Perl.
However, the generated binaries continue to be incompatible with those
generated by the other supported compilers (GCC and Visual C++).
=item *
Non-blocking waits for child processes (or pseudo-processes) are
supported via C<waitpid($pid, &POSIX::WNOHANG)>.
=item *
A memory leak in accept() has been fixed.
=item *
wait(), waitpid() and backticks now return the correct exit status under
Windows 9x.
=item *
Trailing new %ENV entries weren't propagated to child processes. This
is now fixed.
=item *
Current directory entries in %ENV are now correctly propagated to child
processes.
=item *
Duping socket handles with open(F, ">&MYSOCK") now works under Windows 9x.
=item *
The makefiles now provide a single switch to bulk-enable all the features
enabled in ActiveState ActivePerl (a popular binary distribution).
=item *
Win32::GetCwd() correctly returns C:\ instead of C: when at the drive root.
Other bugs in chdir() and Cwd::cwd() have also been fixed.
=item *
fork() correctly returns undef and sets EAGAIN when it runs out of
pseudo-process handles.
=item *
ExtUtils::MakeMaker now uses $ENV{LIB} to search for libraries.
=item *
UNC path handling is better when perl is built to support fork().
=item *
A handle leak in socket handling has been fixed.
=item *
send() works from within a pseudo-process.
=back
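The non-blocking wait mentioned in the list above can be sketched as a
polling loop. The helper name C<run_and_poll>, the child's exit code, and
the polling interval are all assumptions made for illustration.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Fork a child that exits with $exit_code, then poll for it without
# blocking. Returns the child's exit status. (Hypothetical helper.)
sub run_and_poll {
    my ($exit_code) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit $exit_code if $pid == 0;              # child process

    while (1) {
        my $reaped = waitpid($pid, WNOHANG);   # returns at once
        return $? >> 8 if $reaped == $pid;     # child has finished
        die "child $pid is gone" if $reaped == -1;
        select(undef, undef, undef, 0.01);     # do other work or nap briefly
    }
}

my $status = run_and_poll(3);
print "child exit status: $status\n";
```

In a real program the brief nap would be replaced by whatever useful work
the parent has to do between polls.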
Unless specifically qualified otherwise, the remainder of this document
covers changes between the 5.005 and 5.6.0 releases.
=head1 Core Enhancements
=head2 Interpreter cloning, threads, and concurrency
Perl 5.6.0 introduces the beginnings of support for running multiple
interpreters concurrently in different threads. In conjunction with
the perl_clone() API call, which can be used to selectively duplicate
the state of any given interpreter, it is possible to compile a
piece of code once in an interpreter, clone that interpreter
one or more times, and run all the resulting interpreters in distinct
threads.
On the Windows platform, this feature is used to emulate fork() at the
interpreter level. See L<perlfork> for details about that.
This feature is still in evolution. It is eventually meant to be used
to selectively clone a subroutine and data reachable from that
subroutine in a separate interpreter and run the cloned subroutine
in a separate thread. Since there is no shared data between the
interpreters, little or no locking will be needed (unless parts of
the symbol table are explicitly shared). This is obviously intended
to be an easy-to-use replacement for the existing threads support.
Support for cloning interpreters and interpreter concurrency can be
enabled using the -Dusethreads Configure option (see win32/Makefile for
how to enable it on Windows.) The resulting perl executable will be
functionally identical to one that was built with -Dmultiplicity, but
the perl_clone() API call will only be available in the former.
-Dusethreads enables the cpp macro USE_ITHREADS by default, which in turn
enables Perl source code changes that provide a clear separation between
the op tree and the data it operates with. The former is immutable, and
can therefore be shared between an interpreter and all of its clones,
while the latter is considered local to each interpreter, and is therefore
copied for each clone.
Note that building Perl with the -Dusemultiplicity Configure option
is adequate if you wish to run multiple B<independent> interpreters
concurrently in different threads. -Dusethreads only provides the
additional functionality of the perl_clone() API call and other
support for running B<cloned> interpreters concurrently.
NOTE: This is an experimental feature. Implementation details are
subject to change.
=head2 Lexically scoped warning categories
You can now control the granularity of warnings emitted by perl at a finer
level using the C<use warnings> pragma. L<warnings> and L<perllexwarn>
have copious documentation on this feature.
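As a rough illustration of the finer granularity, the sketch below turns off a single warning category inside one block and leaves it on elsewhere. The C<$SIG{__WARN__}> handler and the variable names are only there to make the effect countable; they are assumptions of this example, not part of the pragma.

```perl
use strict;
use warnings;

# Capture warnings in an array so the effect of each scope is visible.
my @seen;
local $SIG{__WARN__} = sub { push @seen, $_[0] };

my $unset;
my $loud = "value: " . $unset;         # warns ('uninitialized' category)
{
    no warnings 'uninitialized';       # disabled for this block only
    my $quiet = "value: " . $unset;    # silent
}
my $again = "value: " . $unset;        # warns again outside the block

print scalar(@seen), " warnings captured\n";
```

Only the two concatenations outside the inner block should reach the handler.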
=head2 Unicode and UTF-8 support
Perl now uses UTF-8 as its internal representation for character
strings. The C<utf8> and C<bytes> pragmas are used to control this support
in the current lexical scope. See L<perlunicode>, L<utf8> and L<bytes> for
more information.
This feature is expected to evolve quickly to support some form of I/O
disciplines that can be used to specify the kind of input and output data
(bytes or characters). Until that happens, additional modules from CPAN
will be needed to complete the toolkit for dealing with Unicode.
NOTE: This should be considered an experimental feature. Implementation
details are subject to change.
=head2 Support for interpolating named characters
The new C<\N> escape interpolates named characters within strings.
For example, C<"Hi! \N{WHITE SMILING FACE}"> evaluates to a string
with a Unicode smiley face at the end.
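A minimal sketch of the escape in use follows. It loads the C<charnames> pragma explicitly, which this release needs for C<\N>; the variable names are illustrative only.

```perl
use strict;
use warnings;
use charnames ':full';   # makes the full Unicode character names available

my $greeting = "Hi! \N{WHITE SMILING FACE}";
my $last     = substr($greeting, -1);

# Print the code point rather than the character itself, so no output
# encoding needs to be set up for this example.
printf "U+%04X\n", ord($last);
```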
=head2 "our" declarations
An "our" declaration introduces a value that can be best understood
as a lexically scoped symbolic alias to a global variable in the
package that was current where the variable was declared. This is
mostly useful as an alternative to the C<vars> pragma, but also provides
the opportunity to introduce typing and other attributes for such
variables. See L<perlfunc/our>.
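The following sketch shows one way this might look in practice, assuming a hypothetical C<Counter> package. The short name C<$count> works under C<use strict> because C<our> has declared it as an alias for C<$Counter::count>.

```perl
use strict;
use warnings;

package Counter;

our $count = 0;          # lexically scoped alias for $Counter::count

sub bump { return ++$count }

package main;

Counter::bump() for 1 .. 3;
print "count is $Counter::count\n";
```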
=head2 Support for strings represented as a vector of ordinals
Literals of the form C<v1.2.3.4> are now parsed as a string composed
of characters with the specified ordinals. This is an alternative, more
readable way to construct (possibly Unicode) strings instead of
interpolating characters, as in C<"\x{1}\x{2}\x{3}\x{4}">. The leading
C<v> may be omitted if there are more than two ordinals, so C<1.2.3> is
parsed the same as C<v1.2.3>.
Strings written in this form are also useful to represent version "numbers".
It is easy to compare such version "numbers" (which are really just plain
strings) using any of the usual string comparison operators C<eq>, C<ne>,
C<lt>, C<gt>, etc., or perform bitwise string operations on them using C<|>,
C<&>, etc.
In conjunction with the new C<$^V> magic variable (which contains
the perl version as a string), such literals can be used as a readable way
to check if you're running a particular version of Perl:
    # this will parse in older versions of Perl also
    if ($^V and $^V gt v5.6.0) {
        # new features supported
    }
C<require> and C<use> also have some special magic to support such literals.
They will be interpreted as a version rather than as a module name:
require v5.6.0; # croak if $^V lt v5.6.0
use v5.6.0; # same, but croaks at compile-time
Alternatively, the C<v> may be omitted if there is more than one dot:
require 5.6.0;
use 5.6.0;
Also, C<sprintf> and C<printf> support the Perl-specific format flag C<%v>
to print ordinals of characters in arbitrary strings:
printf "v%vd", $^V; # prints current version, such as "v5.5.650"
printf "%*vX", ":", $addr; # formats IPv6 address
printf "%*vb", " ", $bits; # displays bitstring
See L<perldata/"Scalar value constructors"> for additional information.
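The C<%vd> flag is not tied to version strings. As a small sketch, the example below builds a four-character string from ordinals with pack() and prints it back, first with the default C<.> separator and then with a custom join string supplied through C<*>. The address and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Build a string whose characters have the ordinals 192, 168, 0 and 1.
my $addr = pack("C4", 192, 168, 0, 1);

# %vd prints each character's ordinal, joined by '.'.
my $dotted = sprintf("%vd", $addr);

# With '*', the join string is taken from the argument list instead.
my $hex = sprintf("%*vX", ":", $addr);

print "$dotted $hex\n";
```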
=head2 Improved Perl version numbering system
Beginning with Perl version 5.6.0, the version number convention has been
changed to a "dotted integer" scheme that is more commonly found in open
source projects.
Maintenance versions of v5.6.0 will be released as v5.6.1, v5.6.2 etc.
The next development series following v5.6.0 will be numbered v5.7.x,
beginning with v5.7.0, and the next major production release following
v5.6.0 will be v5.8.0.
The English module now sets $PERL_VERSION to $^V (a string value) rather
than C<$]> (a numeric value). (This is a potential incompatibility.
Send us a report via perlbug if you are affected by this.)
The v1.2.3 syntax is also now legal in Perl.
See L</Support for strings represented as a vector of ordinals> for more on that.
To cope with the new versioning system's use of at least three significant
digits for each version component, the method used for incrementing the
subversion number has also changed slightly. We assume that versions older
than v5.6.0 have been incrementing the subversion component in multiples of
10. Versions after v5.6.0 will increment them by 1. Thus, using the new
notation, 5.005_03 is the "same" as v5.5.30, and the first maintenance
version following v5.6.0 will be v5.6.1 (which should be read as being
equivalent to a floating point value of 5.006_001 in the older format,
stored in C<$]>).
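The correspondence between the two notations can be sketched as plain arithmetic on three-digit fields. The helper name C<numeric_version> is hypothetical and exists only for this illustration; it is not part of Perl.

```perl
use strict;
use warnings;

# Render revision, version and subversion in the older floating point
# style, giving the version and subversion three digits each.
sub numeric_version {
    my ($revision, $version, $subversion) = @_;
    return sprintf("%d.%03d%03d", $revision, $version, $subversion);
}

print numeric_version(5, 6, 1), "\n";    # the first maintenance release
print numeric_version(5, 5, 30), "\n";   # numerically equal to 5.00503
```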
=head2 New syntax for declaring subroutine attributes
Formerly, if you wanted to mark a subroutine as being a method call or
as requiring an automatic lock() when it is entered, you had to declare
that with a C<use attrs> pragma in the body of the subroutine.
That can now be accomplished with declaration syntax, like this:
    sub mymethod : locked method;
    ...
    sub mymethod : locked method {
        ...
    }
    sub othermethod :locked :method;
    ...
    sub othermethod :locked :method {
        ...
    }
(Note how only the first C<:> is mandatory, and whitespace surrounding
the C<:> is optional.)
F<AutoSplit.pm> and F<SelfLoader.pm> have been updated to keep the attributes
with the stubs they provide. See L<attributes>.
=head2 File and directory handles can be autovivified
Similar to how constructs such as C<< $x->[0] >> autovivify a reference,
handle constructors (open(), opendir(), pipe(), socketpair(), sysopen(),
socket(), and accept()) now autovivify a file or directory handle
if the handle passed to them is an uninitialized scalar variable. This
allows constructs such as C<open(my $fh, ...)> and C<open(local $fh,...)>
to be used to create filehandles that will conveniently be closed
automatically when the scope ends, provided there are no other references
to them. This largely eliminates the need for typeglobs when opening
filehandles that must be passed around, as in the following example:
    sub myopen {
        open my $fh, "@_"
            or die "Can't open '@_': $!";
        return $fh;
    }
    {
        my $f = myopen("</etc/motd");
        print <$f>;
        # $f implicitly closed here
    }
=head2 open() with more than two arguments
If open() is passed three arguments instead of two, the second argument
is used as the mode and the third argument is taken to be the file name.
This is primarily useful for protecting against unintended magic behavior
of the traditional two-argument form. See L<perlfunc/open>.
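A short sketch of the three-argument form follows. It writes and then reads back a scratch file; the use of File::Temp and the file name C<notes.txt> are assumptions made so that the example is self-contained.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'notes.txt');

# The mode is its own argument, so nothing in $path is parsed as a mode.
open(my $out, '>', $path) or die "Can't write '$path': $!";
print {$out} "first line\n";
close($out) or die "Can't close '$path': $!";

open(my $in, '<', $path) or die "Can't read '$path': $!";
my $line = <$in>;
close($in);

print $line;
```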
=head2 64-bit support
Any platform that has 64-bit integers either
(1) natively as longs or ints
(2) via special compiler flags
(3) using long long or int64_t
is able to use "quads" (64-bit integers) as follows:
=over 4
=item *
constants (decimal, hexadecimal, octal, binary) in the code
=item *
arguments to oct() and hex()
=item *
arguments to print(), printf() and sprintf() (flag prefixes ll, L, q)
=item *
printed as such
=item *
pack() and unpack() "q" and "Q" formats
=item *
in basic arithmetic: + - * / % (NOTE: operating close to the limits
of the integer values may produce surprising results)
=item *
in bit arithmetic: & | ^ ~ << >> (NOTE: these used to be forced
to be 32 bits wide but now operate on the full native width.)
=item *
vec()
=back
Note that unless you have case (1) you will have to configure
and compile Perl using the -Duse64bitint Configure flag.
NOTE: The Configure flags -Duselonglong and -Duse64bits have been
deprecated. Use -Duse64bitint instead.
There are actually two modes of 64-bitness: the first one is achieved
using Configure -Duse64bitint and the second one using Configure
-Duse64bitall. The difference is that the first one is minimal and
the second one maximal. The first works in more places than the second.
The C<use64bitint> option does only as much as is required to get 64-bit
integers into Perl (this may mean, for example, using "long longs"),
while your memory may still be limited to 2 gigabytes (because your
pointers could still be 32-bit). Note that the name C<64bitint> does
not imply that your C compiler will be using 64-bit C<int>s (it might,
but it doesn't have to): C<use64bitint> means only that you will be
able to have 64-bit-wide scalar values.
The C<use64bitall> option goes all the way by also attempting to switch
integers (if it can), longs, and pointers to 64 bits. This may
create an even more binary-incompatible Perl than -Duse64bitint: the
resulting executable may not run at all on a 32-bit box, or you may
have to reboot/reconfigure/rebuild your operating system to be 64-bit
aware.
Natively 64-bit systems like Alpha and Cray need neither -Duse64bitint
nor -Duse64bitall.
Last but not least: note that due to Perl's habit of always using
floating point numbers, the quads are still not true integers.
When quads overflow their limits (0...18_446_744_073_709_551_615 unsigned,
-9_223_372_036_854_775_808...9_223_372_036_854_775_807 signed), they
are silently promoted to floating point numbers, after which they will
start losing precision (in their lower digits).
NOTE: 64-bit support is still experimental on most platforms.
Existing support only covers the LP64 data model. In particular, the
LLP64 data model is not yet supported. 64-bit libraries and system
APIs on many platforms have not stabilized--your mileage may vary.
=head2 Large file support
If you have filesystems that support "large files" (files larger than
2 gigabytes), you may now also be able to create and access them from
Perl.
NOTE: The default action is to enable large file support, if
available on the platform.
If the large file support is on, and you have a Fcntl constant
O_LARGEFILE, the O_LARGEFILE is automatically added to the flags
of sysopen().
Beware that unless your filesystem also supports "sparse files", seeking
to umpteen petabytes may be inadvisable.
Note that in addition to requiring a proper file system to do large
files you may also need to adjust your per-process (or your
per-system, or per-process-group, or per-user-group) maximum filesize
limits before running Perl scripts that try to handle large files,
especially if you intend to write such files.
Finally, in addition to your process/process group maximum filesize
limits, you may have quota limits on your filesystems that stop you
(your user id or your user group id) from using large files.
Adjusting your process/user/group/file system/operating system limits
is outside the scope of Perl core language. For process limits, you
may try increasing the limits using your shell's limits/limit/ulimit
command before running Perl. The BSD::Resource extension (not
included with the standard Perl distribution) may also be of use: it
offers the getrlimit/setrlimit interface that can be used to adjust
process resource usage limits, including the maximum filesize limit.
=head2 Long doubles
In some systems you may be able to use long doubles to enhance the
range and precision of your double precision floating point numbers
(that is, Perl's numbers). Use Configure -Duselongdouble to enable
this support (if it is available).
=head2 "more bits"
You can "Configure -Dusemorebits" to turn on both the 64-bit support
and the long double support.
=head2 Enhanced support for sort() subroutines
Perl subroutines with a prototype of C<($$)>, and XSUBs in general, can
now be used as sort subroutines. In either case, the two elements to
be compared are passed as normal parameters in @_. See L<perlfunc/sort>.
For unprototyped sort subroutines, the historical behavior of passing
the elements to be compared as the global variables $a and $b remains
unchanged.
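As an illustration of the prototyped form, the sketch below sorts words by length and then alphabetically. The subroutine name C<by_length> and the word list are made up for the example.

```perl
use strict;
use warnings;

# With a ($$) prototype the two elements arrive in @_ rather than in
# the global variables $a and $b.
sub by_length($$) {
    my ($left, $right) = @_;
    return length($left) <=> length($right) || $left cmp $right;
}

my @sorted = sort by_length qw(pear fig banana kiwi);
print "@sorted\n";
```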
=head2 C<sort $coderef @foo> allowed
sort() did not accept a subroutine reference as the comparison
function in earlier versions. This is now permitted.
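A brief sketch, with illustrative names, of a comparison function held in a scalar:

```perl
use strict;
use warnings;

# An anonymous comparison routine; $a and $b are the usual sort globals.
my $descending = sub { $b <=> $a };

my @ranked = sort $descending 3, 11, 2;
print "@ranked\n";
```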
=head2 File globbing implemented internally
Perl now uses the File::Glob implementation of the glob() operator
automatically. This avoids using an external csh process and the
problems associated with it.
NOTE: This is currently an experimental feature. Interfaces and
implementation are subject to change.
=head2 Support for CHECK blocks
In addition to C<BEGIN>, C<INIT>, C<END>, C<DESTROY> and C<AUTOLOAD>,
subroutines named C<CHECK> are now special. These are queued up during
compilation and behave similarly to END blocks, except that they are called at
the end of compilation rather than at the end of execution. They cannot
be called directly.
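The relative timing can be sketched by recording the order in which the special blocks fire. This assumes the code is run as an ordinary script file, since CHECK and INIT blocks are not run for code compiled later at run time (for example through a string eval).

```perl
use strict;
use warnings;

my @order;

BEGIN { push @order, 'BEGIN' }   # runs as soon as it has been parsed
CHECK { push @order, 'CHECK' }   # runs once compilation has finished
INIT  { push @order, 'INIT'  }   # runs just before the main program starts

push @order, 'run';
print join(' -> ', @order), "\n";
```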
=head2 POSIX character class syntax [: :] supported
For example, to match alphabetic characters use C</[[:alpha:]]/>.
See L<perlre> for details.
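A minimal sketch, using a made-up word list, of a POSIX class inside a bracketed character class:

```perl
use strict;
use warnings;

# Keep only the words that consist entirely of alphabetic characters.
my @words = grep { /^[[:alpha:]]+$/ } qw(perl 5.6 v-string Larry);

print "@words\n";
```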
=head2 Better pseudo-random number generator
In 5.005_0x and earlier, perl's rand() function used the C library
rand(3) function. As of 5.005_52, Configure tests for drand48(),
random(), and rand() (in that order) and picks the first one it finds.
These changes should result in better random numbers from rand().
=head2 Improved C<qw//> operator
The C<qw//> operator is now evaluated at compile time into a true list
instead of being replaced with a run time call to C<split()>. This
removes the confusing misbehaviour of C<qw//> in scalar context, which
had inherited that behaviour from split().
Thus:
$foo = ($bar) = qw(a b c); print "$foo|$bar\n";
now correctly prints "3|a", instead of "2|a".
=head2 Better worst-case behavior of hashes
Small changes in the hashing algorithm have been implemented in
order to improve the distribution of lower order bits in the
hashed value. This is expected to yield better performance on
keys that are repeated sequences.
=head2 pack() format 'Z' supported
The new format type 'Z' is useful for packing and unpacking null-terminated
strings. See L<perlfunc/"pack">.
=head2 pack() format modifier '!' supported
The new format type modifier '!' is useful for packing and unpacking
native shorts, ints, and longs. See L<perlfunc/"pack">.
=head2 pack() and unpack() support counted strings
The template character '/' can be used to specify a counted string
type to be packed or unpacked. See L<perlfunc/"pack">.
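The 'Z' format and the '/' template character can be sketched together as follows. The field width of 8 and the sample strings are arbitrary choices for the illustration.

```perl
use strict;
use warnings;

# 'Z8': a null-padded field of 8 bytes; unpacking stops at the first null.
my $padded = pack("Z8", "perl");
my ($name) = unpack("Z8", $padded);

# 'C/a*': a one-byte length followed by that many bytes of string data.
my $counted = pack("C/a*", "hello");
my ($text)  = unpack("C/a*", $counted);

printf "%d bytes, name=%s, count byte=%d, text=%s\n",
    length($padded), $name, ord($counted), $text;
```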
=head2 Comments in pack() templates
The '#' character in a template introduces a comment up to
end of the line. This facilitates documentation of pack()
templates.
=head2 Weak references
In previous versions of Perl, you couldn't cache objects so as
to allow them to be deleted if the last reference from outside
the cache is deleted. The reference in the cache would hold a
reference count on the object and the objects would never be
destroyed.
Another familiar problem is with circular references. When an
object references itself, its reference count would never go
down to zero, and it would not get destroyed until the program
is about to exit.
Weak references solve this by allowing you to "weaken" any
reference, that is, make it not count towards the reference count.
When the last non-weak reference to an object is deleted, the object
is destroyed and all the weak references to the object are
automatically undef-ed.
To use this feature, you need the Devel::WeakRef package from CPAN, which
contains additional documentation.
NOTE: This is an experimental feature. Details are subject to change.
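The caching scenario can be sketched as below. Note the substitution: this example assumes a perl that provides the weaken() function from Scalar::Util, rather than the Devel::WeakRef package named above, whose interface is not shown here. The C<Node> class and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);   # assumption: Scalar::Util is available

my $destroyed = 0;
{
    package Node;
    sub new     { my ($class) = @_; return bless {}, $class }
    sub DESTROY { $destroyed++ }
}

my $cache;
{
    my $node = Node->new;
    $cache = $node;     # a second reference, held by the "cache"
    weaken($cache);     # it no longer counts towards the reference count
}                       # the last strong reference goes away here

print "destroyed=$destroyed cache=",
    (defined $cache ? "set" : "undef"), "\n";
```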
=head2 Binary numbers supported
Binary numbers are now supported as literals, in s?printf formats, and
C<oct()>:
$answer = 0b101010;
printf "The answer is: %b\n", oct("0b101010");
=head2 Lvalue subroutines
Subroutines can now return modifiable lvalues.
See L<perlsub/"Lvalue subroutines">.
NOTE: This is an experimental feature. Details are subject to change.
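A minimal sketch of an lvalue subroutine, using a hypothetical C<setting> accessor over a file-scoped variable:

```perl
use strict;
use warnings;

my $setting = 10;

# The :lvalue attribute lets a call to the subroutine be assigned to.
sub setting : lvalue { $setting }

setting() = 42;    # assigns through the subroutine
setting() += 1;    # modifying operators work as well

print "setting is $setting\n";
```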
=head2 Some arrows may be omitted in calls through references
Perl now allows the arrow to be omitted in many constructs
involving subroutine calls through references. For example,
C<< $foo[10]->('foo') >> may now be written C<$foo[10]('foo')>.
This is rather similar to how the arrow may be omitted from
C<< $foo[10]->{'foo'} >>. Note however, that the arrow is still
required for C<< foo(10)->('bar') >>.
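For instance, with an illustrative array of code references, both spellings make the same call:

```perl
use strict;
use warnings;

my @handlers = (sub { return "got $_[0]" });

my $with_arrow    = $handlers[0]->('x');
my $without_arrow = $handlers[0]('x');    # arrow omitted between subscripts

print "$with_arrow / $without_arrow\n";
```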
=head2 Boolean assignment operators are legal lvalues
Constructs such as C<($a ||= 2) += 1> are now allowed.
=head2 exists() is supported on subroutine names
The exists() builtin now works on subroutine names. A subroutine
is considered to exist if it has been declared (even if implicitly).
See L<perlfunc/exists> for examples.
=head2 exists() and delete() are supported on array elements
The exists() and delete() builtins now work on simple arrays as well.
The behavior is similar to that on hash elements.
exists() can be used to check whether an array element has been
initialized. This avoids autovivifying array elements that don't exist.
If the array is tied, the EXISTS() method in the corresponding tied
package will be invoked.
delete() may be used to remove an element from the array and return
it. The array element at that position returns to its uninitialized
state, so that testing for the same element with exists() will return
false. If the element happens to be the one at the end, the size of
the array also shrinks up to the highest element that tests true for
exists(), or 0 if none such is found. If the array is tied, the DELETE()
method in the corresponding tied package will be invoked.
See L<perlfunc/exists> and L<perlfunc/delete> for examples.
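The behaviour described above can be sketched on a small array; the variable names are illustrative only.

```perl
use strict;
use warnings;

my @slots = ('a', 'b', 'c');

# Probing a far-off index does not extend the array.
my $probe            = exists $slots[10] ? 1 : 0;
my $size_after_probe = @slots;

# Deleting the last element returns it and shrinks the array.
my $removed         = delete $slots[2];
my $size_after_tail = @slots;

# Deleting an element that is not at the end leaves the size alone,
# but the position no longer tests true for exists().
delete $slots[0];
my $first_exists    = exists $slots[0] ? 1 : 0;
my $size_after_head = @slots;

print "$probe $size_after_probe $removed $size_after_tail ",
      "$first_exists $size_after_head\n";
```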
=head2 Pseudo-hashes work better
Dereferencing some types of reference values in a pseudo-hash,
such as C<< $ph->{foo}[1] >>, was accidentally disallowed. This has
been corrected.
When applied to a pseudo-hash element, exists() now reports whether
the specified value exists, not merely if the key is valid.
delete() now works on pseudo-hashes. When given a pseudo-hash element
or slice it deletes the values corresponding to the keys (but not the keys
themselves). See L<perlref/"Pseudo-hashes: Using an array as a hash">.
Pseudo-hash slices with constant keys are now optimized to array lookups
at compile-time.
List assignments to pseudo-hash slices are now supported.
The C<fields> pragma now provides ways to create pseudo-hashes, via
fields::new() and fields::phash(). See L<fields>.
NOTE: The pseudo-hash data type continues to be experimental.
Limiting oneself to the interface elements provided by the
fields pragma will provide protection from any future changes.
=head2 Automatic flushing of output buffers
fork(), exec(), system(), qx//, and pipe open()s now flush buffers
of all files opened for output when the operation was attempted. This
mostly eliminates confusing buffering mishaps suffered by users unaware
of how Perl internally handles I/O.
This is not supported on some platforms like Solaris where a suitably
correct implementation of fflush(NULL) isn't available.
=head2 Better diagnostics on meaningless filehandle operations
Constructs such as C<< open(<FH>) >> and C<< close(<FH>) >>
are compile time errors. Attempting to read from filehandles that
were opened only for writing will now produce warnings (just as
writing to read-only filehandles does).
=head2 Where possible, buffered data discarded from duped input filehandle
C<< open(NEW, "<&OLD") >> now attempts to discard any data that
was previously read and buffered in C<OLD> before duping the handle.
On platforms where doing this is allowed, the next read operation
on C<NEW> will return the same data as the corresponding operation
on C<OLD>. Formerly, it would have returned the data from the start
of the following disk block instead.
=head2 eof() has the same old magic as <>
Formerly, C<eof()> would return true if no attempt to read from C<< <> >> had
yet been made. C<eof()> now has a little magic of its own: it opens the
C<< <> >> files.
=head2 binmode() can be used to set :crlf and :raw modes
binmode() now accepts a second argument that specifies a discipline
for the handle in question. The two pseudo-disciplines ":raw" and
":crlf" are currently supported on DOS-derivative platforms.
See L<perlfunc/"binmode"> and L<open>.
=head2 C<-T> filetest recognizes UTF-8 encoded files as "text"
The algorithm used for the C<-T> filetest has been enhanced to
correctly identify UTF-8 content as "text".
=head2 system(), backticks and pipe open now reflect exec() failure
On Unix and similar platforms, system(), qx() and open(FOO, "cmd |")
etc., are implemented via fork() and exec(). When the underlying
exec() fails, earlier versions did not report the error properly,
since the exec() happened to be in a different process.
The child process now communicates with the parent about the
error in launching the external command, which allows these
constructs to return with their usual error value and set $!.
=head2 Improved diagnostics
Line numbers are no longer suppressed (under most likely circumstances)
during the global destruction phase.
Diagnostics emitted from code running in threads other than the main
thread are now accompanied by the thread ID.
Embedded null characters in diagnostics now actually show up. They
used to truncate the message in prior versions.
$foo::a and $foo::b are now exempt from "possible typo" warnings only
if sort() is encountered in package C<foo>.
Unrecognized alphabetic escapes encountered when parsing quote
constructs now generate a warning, since they may take on new
semantics in later versions of Perl.
Many diagnostics now report the internal operation in which the warning
was provoked, like so:
Use of uninitialized value in concatenation (.) at (eval 1) line 1.
Use of uninitialized value in print at (eval 1) line 1.
Diagnostics that occur within eval may also report the file and line
number where the eval is located, in addition to the eval sequence
number and the line number within the evaluated text itself. For
example:
Not enough arguments for scalar at (eval 4)[newlib/perl5db.pl:1411] line 2, at EOF
=head2 Diagnostics follow STDERR
Diagnostic output now goes to whichever file the C<STDERR> handle
is pointing at, instead of always going to the underlying C runtime
library's C<stderr>.
=head2 More consistent close-on-exec behavior
On systems that support a close-on-exec flag on filehandles, the
flag is now set for any handles created by pipe(), socketpair(),
socket(), and accept(), if that is warranted by the value of $^F
that may be in effect. Earlier versions neglected to set the flag
for handles created with these operators. See L<perlfunc/pipe>,
L<perlfunc/socketpair>, L<perlfunc/socket>, L<perlfunc/accept>,
and L<perlvar/$^F>.
=head2 syswrite() ease-of-use
The length argument of C<syswrite()> has become optional.
=head2 Better syntax checks on parenthesized unary operators
Expressions such as:
print defined(&foo,&bar,&baz);
print uc("foo","bar","baz");
undef($foo,&bar);
used to be accidentally allowed in earlier versions, and produced
unpredictable behaviour. Some produced ancillary warnings
when used in this way; others silently did the wrong thing.
The parenthesized forms of most unary operators that expect a single
argument now ensure that they are not called with more than one
argument, making the cases shown above syntax errors. The usual
behaviour of:
print defined &foo, &bar, &baz;
print uc "foo", "bar", "baz";
undef $foo, &bar;
remains unchanged. See L<perlop>.
=head2 Bit operators support full native integer width
The bit operators (& | ^ ~ << >>) now operate on the full native
integral width (the exact size of which is available in $Config{ivsize}).
For example, if your platform is either natively 64-bit or if Perl
has been configured to use 64-bit integers, these operations apply
to 8 bytes (as opposed to 4 bytes on 32-bit platforms).
For portability, be sure to mask off the excess bits in the result of
unary C<~>, e.g., C<~$x & 0xffffffff>.
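A small sketch of the masking idiom follows. The unmasked value depends on C<$Config{ivsize}>, while the masked value should be the same on 32-bit and 64-bit builds.

```perl
use strict;
use warnings;
use Config;

my $x = 0x0F;

my $raw    = ~$x;                  # width depends on the native integer size
my $masked = ~$x & 0xFFFFFFFF;     # portable: keep only the low 32 bits

printf "ivsize=%d raw=%#x masked=%#x\n", $Config{ivsize}, $raw, $masked;
```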
=head2 Improved security features
More potentially unsafe operations taint their results for improved
security.
The C<passwd> and C<shell> fields returned by getpwent(), getpwnam(),
and getpwuid() are now tainted, because users can affect their own
encrypted password and login shell.
The variable modified by shmread(), and messages returned by msgrcv()
(and its object-oriented interface IPC::SysV::Msg::rcv) are also tainted,
because other untrusted processes can modify messages and shared memory
segments for their own nefarious purposes.
=head2 More functional bareword prototype (*)
Bareword prototypes have been rationalized to enable them to be used
to override builtins that accept barewords and interpret them in
a special way, such as C<require> or C<do>.
Arguments prototyped as C<*> will now be visible within the subroutine
as either a simple scalar or as a reference to a typeglob.
See L<perlsub/Prototypes>.
=head2 C<require> and C<do> may be overridden
C<require> and C<do 'file'> operations may be overridden locally
by importing subroutines of the same name into the current package
(or globally by importing them into the CORE::GLOBAL:: namespace).
Overriding C<require> will also affect C<use>, provided the override
is visible at compile-time.
See L<perlsub/"Overriding Built-in Functions">.
=head2 $^X variables may now have names longer than one character
Formerly, $^X was synonymous with ${"\cX"}, but $^XY was a syntax
error. Now variable names that begin with a control character may be
arbitrarily long. However, for compatibility reasons, these variables
I<must> be written with explicit braces, as C<${^XY}> for example.
C<${^XYZ}> is synonymous with ${"\cXYZ"}. Variable names with more
than one control character, such as C<${^XY^Z}>, are illegal.
The old syntax has not changed. As before, `^X' may be either a
literal control-X character or the two-character sequence `caret' plus
`X'. When braces are omitted, the variable name stops after the
control character. Thus C<"$^XYZ"> continues to be synonymous with
C<$^X . "YZ"> as before.
As before, lexical variables may not have names beginning with control
characters. As before, variables whose names begin with a control
character are always forced to be in package `main'. All such variables
are reserved for future extensions, except those that begin with
C<^_>, which may be used by user programs and are guaranteed not to
acquire special meaning in any future version of Perl.
=head2 New variable $^C reflects C<-c> switch
C<$^C> has a boolean value that reflects whether perl is being run
in compile-only mode (i.e. via the C<-c> switch). Since
BEGIN blocks are executed under such conditions, this variable
enables perl code to determine whether actions that make sense
only during normal running are warranted. See L<perlvar>.
=head2 New variable $^V contains Perl version as a string
C<$^V> contains the Perl version number as a string composed of
characters whose ordinals match the version numbers, i.e. v5.6.0.
This may be used in string comparisons.
See L</Support for strings represented as a vector of ordinals> for an
example.
=head2 Optional Y2K warnings
If Perl is built with the cpp macro C<PERL_Y2KWARN> defined,
it emits optional warnings when concatenating the number 19
with another number.
This behavior must be specifically enabled when running Configure.
See F<INSTALL> and F<README.Y2K>.
=head2 Arrays now always interpolate into double-quoted strings
In double-quoted strings, arrays now interpolate, no matter what. The
behavior in earlier versions of perl 5 was that arrays would interpolate
into strings if the array had been mentioned before the string was
compiled, and otherwise Perl would raise a fatal compile-time error.
In versions 5.000 through 5.003, the error was
Literal @example now requires backslash
In versions 5.004_01 through 5.6.0, the error was
In string, @example now must be written as \@example
The idea here was to get people into the habit of writing
C<"fred\@example.com"> when they wanted a literal C<@> sign, just as
they have always written C<"Give me back my \$5"> when they wanted a
literal C<$> sign.
Starting with 5.6.1, when Perl sees an C<@> sign in a
double-quoted string, it I<always> attempts to interpolate an array,
regardless of whether or not the array has been used or declared
already. The fatal error has been downgraded to an optional warning:
Possible unintended interpolation of @example in string
This warns you that C<"fred@example.com"> is going to turn into
C<fred.com> if you don't backslash the C<@>.
See http://perl.plover.com/at-error.html for more details
about the history here.
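The difference can be sketched as follows. To keep the example valid under C<use strict>, it declares an empty C<@example> array first; that declaration is an assumption of this sketch, not something the interpolation rule requires.

```perl
use strict;
use warnings;

our @example = ();    # declared so the example compiles under strict

my $mangled = "fred@example.com";     # the (empty) array is interpolated
my $literal = "fred\@example.com";    # backslash keeps the @ sign

print "$mangled\n$literal\n";
```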
=head2 @- and @+ provide starting/ending offsets of regex submatches
The new magic variables @- and @+ provide the starting and ending
offsets, respectively, of $&, $1, $2, etc. See L<perlvar> for
details.
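As a sketch, the offsets can be used with substr() to recover the same text that $& and $2 would hold. The sample string and variable names are made up.

```perl
use strict;
use warnings;

my $text = "perl 5.6.0";
my ($start, $end, $second);

if ($text =~ /(\d+)\.(\d+)/) {
    # $-[0] and $+[0] delimit the whole match, $-[N] and $+[N] group N.
    ($start, $end) = ($-[0], $+[0]);
    $second = substr($text, $-[2], $+[2] - $-[2]);
}

print "match spans $start..$end, second group is '$second'\n";
```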
=head1 Modules and Pragmata
=head2 Modules
=over 4
=item attributes
While used internally by Perl as a pragma, this module also
provides a way to fetch subroutine and variable attributes.
See L<attributes>.
=item B
The Perl Compiler suite has been extensively reworked for this
release. More of the standard Perl test suite passes when run
under the Compiler, but there is still a significant way to
go to achieve production quality compiled executables.
NOTE: The Compiler suite remains highly experimental. The
generated code may not be correct, even when it manages to execute
without errors.
=item Benchmark
Overall, Benchmark results exhibit lower average error and better timing
accuracy.
You can now run tests for I<n> seconds instead of guessing the right
number of tests to run: e.g., timethese(-5, ...) will run each
piece of code for at least 5 CPU seconds. Zero as the "number of repetitions"
means "for at least 3 CPU seconds". The output format has also
changed. For example:
use Benchmark;$x=3;timethese(-5,{a=>sub{$x*$x},b=>sub{$x**2}})
will now output something like this:
Benchmark: running a, b, each for at least 5 CPU seconds...
a: 5 wallclock secs ( 5.77 usr + 0.00 sys = 5.77 CPU) @ 200551.91/s (n=1156516)
b: 4 wallclock secs ( 5.00 usr + 0.02 sys = 5.02 CPU) @ 159605.18/s (n=800686)
New features: "each for at least N CPU seconds...", "wallclock secs",
and the "@ operations/CPU second (n=operations)".
timethese() now returns a reference to a hash of Benchmark objects containing
the test results, keyed on the names of the tests.
timethis() now returns the iterations field in the Benchmark result object
instead of 0.
timethese(), timethis(), and the new cmpthese() (see below) can also take
a format specifier of 'none' to suppress output.
A new function countit() is just like timeit() except that it takes a
TIME instead of a COUNT.
A new function cmpthese() prints a chart comparing the results of each test
returned from a timethese() call. For each possible pair of tests, the
percentage speed difference (iters/sec or seconds/iter) is shown.
For other details, see L<Benchmark>.
=item ByteLoader
The ByteLoader is a dedicated extension to generate and run
Perl bytecode. See L<ByteLoader>.
=item constant
References can now be used.
The new version also allows a leading underscore in constant names, but
disallows a double leading underscore (as in "__LINE__"). Some other names
are disallowed or warned against, including BEGIN, END, etc. Some names
which were forced into main:: used to fail silently in some cases; now they're
fatal (outside of main::) and an optional warning (inside of main::).
The ability to detect whether a constant had been set with a given name has
been added.
See L<constant>.
=item charnames
This pragma implements the C<\N> string escape. See L<charnames>.
=item Data::Dumper
A C<Maxdepth> setting can be specified to avoid venturing
too deeply into deep data structures. See L<Data::Dumper>.
The XSUB implementation of Dump() is now automatically called if the
C<Useqq> setting is not in use.
Dumping C<qr//> objects works correctly.
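A short sketch of C<Maxdepth> follows. The data structure is made up, and C<Indent> and C<Sortkeys> are set only to keep the output on one predictable line.

```perl
use strict;
use warnings;
use Data::Dumper;

local $Data::Dumper::Maxdepth = 1;   # do not descend below the top level
local $Data::Dumper::Indent   = 0;   # single-line output
local $Data::Dumper::Sortkeys = 1;   # stable key order

my $dump = Dumper({ name => 'perl', nested => { deep => [1, 2, 3] } });

# The nested hash is shown as an opaque reference instead of being expanded.
print "$dump\n";
```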
=item DB
C<DB> is an experimental module that exposes a clean abstraction
to Perl's debugging API.
=item DB_File
DB_File can now be built with Berkeley DB versions 1, 2 or 3.
See C<ext/DB_File/Changes>.
=item Devel::DProf
Devel::DProf, a Perl source code profiler, has been added. See
L<Devel::DProf> and L<dprofpp>.
=item Devel::Peek
The Devel::Peek module provides access to the internal representation
of Perl variables and data. It is a data debugging tool for the XS programmer.
=item Dumpvalue
The Dumpvalue module provides screen dumps of Perl data.
=item DynaLoader
DynaLoader now supports a dl_unload_file() function on platforms that
support unloading shared objects using dlclose().
Perl can also optionally arrange to unload all extension shared objects
loaded by Perl. To enable this, build Perl with the Configure option
C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>. (This may be useful if you are
using Apache with mod_perl.)
=item English
$PERL_VERSION now stands for C<$^V> (a string value) rather than for C<$]>
(a numeric value).
=item Env
Env now supports accessing environment variables like PATH as array
variables.
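For instance, PATH can be tied to an array and extended with ordinary list operations. This is only a sketch; the directory name is hypothetical.

```perl
use strict;
use warnings;
use Env qw(@PATH);

# Hypothetical directory, used only for illustration.
my $extra = '/opt/example/bin';

# Changes to @PATH are reflected in $ENV{PATH}.
push @PATH, $extra;

print "last PATH entry: $PATH[-1]\n";
print "PATH in %ENV was updated\n" if $ENV{PATH} =~ /\Q$extra\E\z/;
```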
=item Fcntl
More Fcntl constants added: F_SETLK64, F_SETLKW64, O_LARGEFILE for
large file (more than 4GB) access (NOTE: the O_LARGEFILE is
automatically added to sysopen() flags if large file support has been
configured, as is the default), Free/Net/OpenBSD locking behaviour
flags F_FLOCK, F_POSIX, Linux F_SHLCK, and O_ACCMODE: the combined
mask of O_RDONLY, O_WRONLY, and O_RDWR. The seek()/sysseek()
constants SEEK_SET, SEEK_CUR, and SEEK_END are available via the
C<:seek> tag. The chmod()/stat() S_IF* constants and S_IS* functions
are available via the C<:mode> tag.
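The C<:seek> and C<:mode> tags might be used along these lines. The sketch works on a temporary file created with File::Temp, which is an assumption made here so the example is self-contained.

```perl
use strict;
use warnings;
use Fcntl qw(:seek :mode);
use File::Temp qw(tempfile);

# Scratch file holding 12 bytes.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello world\n";

# SEEK_END and SEEK_SET come from the :seek tag.
seek($fh, 0, SEEK_END) or die "seek failed: $!";
my $size = tell($fh);
seek($fh, 0, SEEK_SET) or die "seek failed: $!";

# S_ISREG and S_IMODE come from the :mode tag.
my $mode = (stat $path)[2];
print "size: $size bytes\n";
print "regular file\n" if S_ISREG($mode);
printf "permissions: %04o\n", S_IMODE($mode);
```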
=item File::Compare
A compare_text() function has been added, which allows custom
comparison functions. See L<File::Compare>.
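A short sketch of a custom comparison: two temporary files that differ only in letter case are compared first with the default line comparison and then with a case-insensitive callback. The helper write_temp() is made up for this example. The callback returns true when two lines should be treated as different.

```perl
use strict;
use warnings;
use File::Compare qw(compare_text);
use File::Temp qw(tempfile);

# Hypothetical helper: write text to a scratch file and return its path.
sub write_temp {
    my ($text) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return $path;
}

my $first  = write_temp("Alpha\nBeta\n");
my $second = write_temp("alpha\nBETA\n");

# Default comparison: lines must match exactly.
my $plain = compare_text($first, $second);

# Custom comparison: ignore case.
my $nocase = compare_text($first, $second,
                          sub { lc($_[0]) ne lc($_[1]) });

print "exact comparison: ",            ($plain  ? "differ" : "same"), "\n";
print "case-insensitive comparison: ", ($nocase ? "differ" : "same"), "\n";
```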
=item File::Find
File::Find now works correctly when the wanted() function is either
autoloaded or is a symbolic reference.
A bug that caused File::Find to lose track of the working directory
when pruning top-level directories has been fixed.
File::Find now also supports several other options to control its
behavior. It can follow symbolic links if the C<follow> option is
specified. Enabling the C<no_chdir> option will make File::Find skip
changing the current directory when walking directories. The C<untaint>
flag can be useful when running with taint checks enabled.
See L<File::Find>.
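The options are passed in a hash reference as the first argument to find(). The following sketch uses C<no_chdir> on a small temporary tree (the tree layout is invented for the example), so that C<$_> holds the full path of each entry.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Build a tiny scratch tree: a.txt and sub/b.txt.
my $root = tempdir(CLEANUP => 1);
mkdir File::Spec->catdir($root, 'sub') or die "mkdir failed: $!";
for my $name ('a.txt', File::Spec->catfile('sub', 'b.txt')) {
    open my $fh, '>', File::Spec->catfile($root, $name)
        or die "open failed: $!";
    close $fh;
}

# With no_chdir, File::Find stays in the current directory and
# $_ is the same as $File::Find::name.
my @files;
find({ no_chdir => 1,
       wanted   => sub { push @files, $_ if -f $_ } }, $root);

print "$_\n" for sort @files;
```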
=item File::Glob
This extension implements BSD-style file globbing. By default,
it will also be used for the internal implementation of the glob()
operator. See L<File::Glob>.
=item File::Spec
New methods have been added to the File::Spec module: devnull() returns
the name of the null device (/dev/null on Unix) and tmpdir() the name of
the temp directory (normally /tmp on Unix). There are now also methods
to convert between absolute and relative filenames: abs2rel() and
rel2abs(). For compatibility with operating systems that specify volume
names in file paths, the splitpath(), splitdir(), and catdir() methods
have been added.
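The new methods can be tried out as follows. The paths are arbitrary Unix-style examples chosen for this sketch, and the results depend on the platform.

```perl
use strict;
use warnings;
use File::Spec;

print "null device: ", File::Spec->devnull, "\n";
print "temp dir:    ", File::Spec->tmpdir,  "\n";

# Convert between absolute and relative names.
my $rel = File::Spec->abs2rel('/usr/local/lib', '/usr');
my $abs = File::Spec->rel2abs('lib', '/usr/local');
print "relative: $rel\n";
print "absolute: $abs\n";

# Take a path apart and put the directory portion back together.
my ($volume, $dirs, $file) = File::Spec->splitpath('/usr/local/lib/Foo.pm');
my @parts   = File::Spec->splitdir($dirs);
my $rebuilt = File::Spec->catdir(@parts);
print "file: $file, directory: $rebuilt\n";
```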
=item File::Spec::Functions
The new File::Spec::Functions module provides a function interface
to the File::Spec module. It allows the shorthand
$fullname = catfile($dir1, $dir2, $file);
instead of
$fullname = File::Spec->catfile($dir1, $dir2, $file);
=item Getopt::Long
Getopt::Long licensing has changed to allow the Perl Artistic License
as well as the GPL. It used to be GPL only, which got in the way of
non-GPL applications that wanted to use Getopt::Long.
Getopt::Long encourages the use of Pod::Usage to produce help
messages. For example:
use Getopt::Long;
use Pod::Usage;
my $man = 0;
my $help = 0;
GetOptions('help|?' => \$help, man => \$man) or pod2usage(2);
pod2usage(1) if $help;
pod2usage(-exitstatus => 0, -verbose => 2) if $man;
__END__
=head1 NAME
sample - Using Getopt::Long and Pod::Usage
=head1 SYNOPSIS
sample [options] [file ...]
Options:
-help brief help message
-man full documentation
=head1 OPTIONS
=over 8
=item B<-help>
Prints a brief help message and exits.
=item B<-man>
Prints the manual page and exits.
=back
=head1 DESCRIPTION
B<This program> will read the given input file(s) and do something
useful with the contents thereof.
=cut
See L<Pod::Usage> for details.
A bug that prevented the non-option call-back <> from being
specified as the first argument has been fixed.
To specify the characters < and > as option starters, use ><. Note,
however, that changing option starters is strongly deprecated.
=item IO
write() and syswrite() will now accept a single-argument
form of the call, for consistency with Perl's syswrite().
You can now create a TCP-based IO::Socket::INET without forcing
a connect attempt. This allows you to configure its options
(like making it non-blocking) and then call connect() manually.
A bug that prevented the IO::Socket::protocol() accessor
from ever returning the correct value has been corrected.
IO::Socket::connect now uses non-blocking IO instead of alarm()
to do connect timeouts.
IO::Socket::accept now uses select() instead of alarm() for doing
timeouts.
IO::Socket::INET->new now sets $! correctly on failure. $@ is
still set for backwards compatibility.
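A sketch of creating a socket first and connecting later. To keep it self-contained it connects to a listener on the loopback address opened by the same script. The listener and the choice of SO_KEEPALIVE as the option being configured are assumptions of this example.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(pack_sockaddr_in inet_aton SO_KEEPALIVE);

# Local listener on an ephemeral port, so no outside network is needed.
my $server = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
) or die "cannot listen: $@";

# No PeerAddr given: the socket is created but not connected.
my $client = IO::Socket::INET->new(Proto => 'tcp')
    or die "cannot create socket: $@";
my $connected_before = defined $client->connected ? 1 : 0;

# Configure the socket before connecting.
$client->sockopt(SO_KEEPALIVE, 1);

# Connect manually.
$client->connect(pack_sockaddr_in($server->sockport, inet_aton('127.0.0.1')))
    or die "connect failed: $!";

print "connected to port ", $client->peerport, "\n";
```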
=item JPL
Java Perl Lingo is now distributed with Perl. See jpl/README
for more information.
=item lib
C<use lib> now weeds out any trailing duplicate entries.
C<no lib> removes all named entries.
=item Math::BigInt
The bitwise operations C<<< << >>>, C<<< >> >>>, C<&>, C<|>,
and C<~> are now supported on bigints.
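For example, the overloaded operators can be applied to values well beyond the native integer range. The particular numbers below are arbitrary.

```perl
use strict;
use warnings;
use Math::BigInt;

my $one = Math::BigInt->new(1);

my $big     = $one << 100;          # 2**100
my $flagged = $big | 0xFF;          # set the low eight bits
my $low     = $flagged & 0xFF;      # extract them again
my $shifted = $big >> 98;           # 2**2
my $not     = ~Math::BigInt->new(5);

print "1 << 100      = $big\n";
print "low byte      = $low\n";
print "2**100 >> 98  = $shifted\n";
print "~5            = $not\n";
```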
=item Math::Complex
The accessor methods Re, Im, arg, abs, rho, and theta can now also
act as mutators (accessor $z->Re(), mutator $z->Re(3)).
The class method C<display_format> and the corresponding object method
C<display_format>, in addition to accepting just one argument, now can
also accept a parameter hash. Recognized keys of a parameter hash are
C<"style">, which corresponds to the old one parameter case, and two
new parameters: C<"format">, which is a printf()-style format string
(defaults usually to C<"%.15g">, you can revert to the default by
setting the format string to C<undef>) used for both parts of a
complex number, and C<"polar_pretty_print"> (defaults to true),
which controls whether an attempt is made to try to recognize small
multiples and rationals of pi (2pi, pi/2) at the argument (angle) of a
polar complex number.
The potentially disruptive change is that in list context both methods
now I<return the parameter hash>, instead of only the value of the
C<"style"> parameter.
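The mutators and the parameter-hash form of C<display_format> might be used as in this sketch. The values are arbitrary.

```perl
use strict;
use warnings;
use Math::Complex;

my $z = cplx(1, 2);

my $old_real = $z->Re;     # accessor
$z->Re(3);                 # mutator: the real part becomes 3

# Parameter-hash form of display_format.
$z->display_format(style => 'polar', format => '%.3g');

# In list context the whole parameter hash is returned.
my %settings = $z->display_format;

print "style: $settings{style}\n";
print "z = $z\n";
```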
=item Math::Trig
A little bit of radial trigonometry (cylindrical and spherical),
radial coordinate conversions, and the great circle distance were added.
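A sketch of the great circle distance, using rough coordinates for London and Tokyo and an Earth radius of 6378 km. The helper to_spherical() is made up here. It converts longitude and latitude in degrees to the theta and phi angles the function expects, where phi is measured from the pole.

```perl
use strict;
use warnings;
use Math::Trig qw(great_circle_distance deg2rad);

# Hypothetical helper: (longitude, latitude) in degrees to (theta, phi).
sub to_spherical {
    my ($lon, $lat) = @_;
    return (deg2rad($lon), deg2rad(90 - $lat));
}

my @london = to_spherical(-0.5, 51.3);
my @tokyo  = to_spherical(139.8, 35.7);

# The last argument is the sphere radius; the result uses the same unit.
my $km = great_circle_distance(@london, @tokyo, 6378);
printf "London to Tokyo: about %.0f km\n", $km;
```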
=item Pod::Parser, Pod::InputObjects
Pod::Parser is a base class for parsing and selecting sections of
pod documentation from an input stream. This module takes care of
identifying pod paragraphs and commands in the input and hands off the
parsed paragraphs and commands to user-defined methods which are free
to interpret or translate them as they see fit.
Pod::InputObjects defines some input objects needed by Pod::Parser, and
is intended for advanced users of Pod::Parser who need more information
about a command than its name and text.
As of release 5.6.0 of Perl, Pod::Parser is now the officially sanctioned
"base parser code" recommended for use by all pod2xxx translators.
Pod::Text (pod2text) and Pod::Man (pod2man) have already been converted
to use Pod::Parser and efforts to convert Pod::HTML (pod2html) are already
underway. For any questions or comments about pod parsing and translating
issues and utilities, please use the pod-people@perl.org mailing list.
For further information, please see L<Pod::Parser> and L<Pod::InputObjects>.
=item Pod::Checker, podchecker
This utility checks pod files for correct syntax, according to
L<perlpod>. Obvious errors are flagged as such, while warnings are
printed for mistakes that can be handled gracefully. The checklist is
not complete yet. See L<Pod::Checker>.
=item Pod::ParseUtils, Pod::Find
These modules provide a set of gizmos that are useful mainly for pod
translators. L<Pod::Find|Pod::Find> traverses directory structures and
returns found pod files, along with their canonical names (like
C<File::Spec::Unix>). L<Pod::ParseUtils|Pod::ParseUtils> contains
B<Pod::List> (useful for storing pod list information), B<Pod::Hyperlink>
(for parsing the contents of C<LE<lt>E<gt>> sequences) and B<Pod::Cache>
(for caching information about pod files, e.g., link nodes).
=item Pod::Select, podselect
Pod::Select is a subclass of Pod::Parser which provides a function
named "podselect()" to filter out user-specified sections of raw pod
documentation from an input stream. podselect is a script that provides
access to Pod::Select from other scripts to be used as a filter.
See L<Pod::Select>.
=item Pod::Usage, pod2usage
Pod::Usage provides the function "pod2usage()" to print usage messages for
a Perl script based on its embedded pod documentation. The pod2usage()
function is generally useful to all script authors since it lets them
write and maintain a single source (the pods) for documentation, thus
removing the need to create and maintain redundant usage message text
consisting of information already in the pods.
There is also a pod2usage script which can be used from other kinds of
scripts to print usage messages from pods (even for non-Perl scripts
with pods embedded in comments).
For details and examples, please see L<Pod::Usage>.
=item Pod::Text and Pod::Man
Pod::Text has been rewritten to use Pod::Parser. While pod2text() is
still available for backwards compatibility, the module now has a new
preferred interface. See L<Pod::Text> for the details. The new Pod::Text
module is easily subclassed for tweaks to the output, and two such
subclasses (Pod::Text::Termcap for man-page-style bold and underlining
using termcap information, and Pod::Text::Color for markup with ANSI color
sequences) are now standard.
pod2man has been turned into a module, Pod::Man, which also uses
Pod::Parser. In the process, several outstanding bugs related to quotes
in section headers, quoting of code escapes, and nested lists have been
fixed. pod2man is now a wrapper script around this module.
=item SDBM_File
An EXISTS method has been added to this module (and sdbm_exists() has
been added to the underlying sdbm library), so one can now call exists
on an SDBM_File tied hash and get the correct result, rather than a
runtime error.
A bug that may have caused data loss when more than one disk block
happens to be read from the database in a single FETCH() has been
fixed.
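With the EXISTS method in place, code such as the following sketch should run without a runtime error. The database is created in a temporary directory, and the key names are invented.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use File::Spec;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $base = File::Spec->catfile($dir, 'demo');

tie(my %db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "cannot tie $base: $!";

$db{apple} = 'red';

# exists() on the tied hash is answered by SDBM_File's EXISTS method.
my $has_apple = exists $db{apple} ? 1 : 0;
my $has_pear  = exists $db{pear}  ? 1 : 0;

print "apple: $has_apple, pear: $has_pear\n";
untie %db;
```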
=item Sys::Syslog
Sys::Syslog now uses XSUBs to access facilities from syslog.h so it
no longer requires syslog.ph to exist.
=item Sys::Hostname
Sys::Hostname now uses XSUBs to call the C library's gethostname() or
uname() if they exist.
=item Term::ANSIColor
Term::ANSIColor is a very simple module to provide easy and readable
access to the ANSI color and highlighting escape sequences, supported by
most ANSI terminal emulators. It is now included standard.
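A brief sketch of the two most common entry points, color() and colored(). The strings are arbitrary.

```perl
use strict;
use warnings;
use Term::ANSIColor qw(color colored);

# color() returns the escape sequence for the named attributes.
print color('bold blue'), "Heading", color('reset'), "\n";

# colored() wraps a string in the sequence and a trailing reset.
my $warning = colored('careful', 'red');
print "$warning\n";
```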
=item Time::Local
The timelocal() and timegm() functions used to silently return bogus
results when the date fell outside the machine's integer range. They
now consistently croak() if the date falls in an unsupported range.
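Because these functions croak rather than return a bogus value, callers can trap the failure with C<eval>. The supported range depends on the platform, so this sketch provokes the exception with an invalid day of the month instead of an out-of-range date. That substitution is an assumption made to keep the example portable.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Midnight UTC on 1 January 2000 (month is zero-based).
my $epoch = timegm(0, 0, 0, 1, 0, 2000);
print "epoch: $epoch\n";

# An impossible date makes timegm() croak; eval traps it.
my $bad = eval { timegm(0, 0, 0, 32, 0, 2000) };
print "rejected: $@" unless defined $bad;
```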
=item Win32
The error return value in list context has been changed for all functions
that return a list of values. Previously these functions returned a list
with a single element C<undef> if an error occurred. Now these functions
return the empty list in these situations. This applies to the following
functions:
Win32::FsType
Win32::GetOSVersion
The remaining functions are unchanged and continue to return C<undef> on
error even in list context.
The Win32::SetLastError(ERROR) function has been added as a complement
to the Win32::GetLastError() function.
The new Win32::GetFullPathName(FILENAME) returns the full absolute
pathname for FILENAME in scalar context. In list context it returns
a two-element list containing the fully qualified directory name and
the filename. See L<Win32>.
=item XSLoader
The XSLoader extension is a simpler alternative to DynaLoader.
See L<XSLoader>.
=item DBM Filters
A new feature called "DBM Filters" has been added to all the
DBM modules--DB_File, GDBM_File, NDBM_File, ODBM_File, and SDBM_File.
DBM Filters add four new methods to each DBM module:
filter_store_key
filter_store_value
filter_fetch_key
filter_fetch_value
These can be used to filter key-value pairs before the pairs are
written to the database or just after they are read from the database.
See L<perldbmfilter> for further information.
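As a minimal sketch, the filters below store values in upper case and return them in lower case. Each filter receives the datum in C<$_> and modifies it in place. SDBM_File and the temporary directory are choices made for this example; the same methods are available on the other DBM modules.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use File::Spec;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $base = File::Spec->catfile($dir, 'filtered');

my $db = tie(my %h, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "cannot tie $base: $!";

# Applied just before a value is written to the database.
$db->filter_store_value(sub { $_ = uc $_ });
# Applied just after a value is read from the database.
$db->filter_fetch_value(sub { $_ = lc $_ });

$h{greeting} = 'Hello';
my $seen = $h{greeting};

# Passing undef removes a filter, exposing what was really stored.
$db->filter_fetch_value(undef);
my $raw = $h{greeting};

print "through filter: $seen\n";
print "as stored:      $raw\n";

undef $db;
untie %h;
```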
=back
=head2 Pragmata
C<use attrs> is now obsolete, and is only provided for
backward-compatibility. It's been replaced by the C<sub : attributes>
syntax. See L<perlsub/"Subroutine Attributes"> and L<attributes>.
The lexical warnings pragma, C<use warnings;>, controls optional warnings.
See L<perllexwarn>.
The C<use filetest> pragma controls the behaviour of filetests (C<-r> C<-w>
...). Currently only one subpragma is implemented, "use filetest
'access';", which uses access(2) or an equivalent to check permissions
instead of using stat(2) as usual. This matters in filesystems
where there are ACLs (access control lists): the stat(2) might lie,
but access(2) knows better.
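The pragma is lexically scoped, so it can be limited to the block that needs it. This sketch checks the same temporary file both ways. On a filesystem without ACLs the two results would normally agree.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;

# Default: the answer is derived from the stat(2) mode bits.
my $by_stat = -r $path ? 1 : 0;

my $by_access;
{
    # Within this block, -r asks access(2) or an equivalent.
    use filetest 'access';
    $by_access = -r $path ? 1 : 0;
}

print "readable according to stat:   $by_stat\n";
print "readable according to access: $by_access\n";
```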
The C<open> pragma can be used to specify default disciplines for
handle constructors (e.g. open()) and for qx//. The two
pseudo-disciplines C<:raw> and C<:crlf> are currently supported on
DOS-derivative platforms (i.e. where binmode is not a no-op).
See also L</"binmode() can be used to set :crlf and :raw modes">.
=head1 Utility Changes
=head2 dprofpp
C<dprofpp> is used to display profile data generated using C<Devel::DProf>.
See L<dprofpp>.
=head2 find2perl
The C<find2perl> utility now uses the enhanced features of the File::Find
module. The -depth and -follow options are supported. Pod documentation
is also included in the script.
=head2 h2xs
The C<h2xs> tool can now work in conjunction with C<C::Scan> (available
from CPAN) to automatically parse real-life header files. The C<-M>,
C<-a>, C<-k>, and C<-o> options are new.
=head2 perlcc
C<perlcc> now supports the C and Bytecode backends. By default,
it generates output from the simple C backend rather than the
optimized C backend.
Support for non-Unix platforms has been improved.
=head2 perldoc
C<perldoc> has been reworked to avoid possible security holes.
It will not by default let itself be run as the superuser, but you
may still use the B<-U> switch to try to make it drop privileges
first.
=head2 The Perl Debugger
Many bug fixes and enhancements were added to F<perl5db.pl>, the
Perl debugger. New commands
include C<< < ? >>, C<< > ? >>, and C<< { ? >> to list out current
actions, C<man I<docpage>> to run your doc viewer on some perl
docset, and support for quoted options. The help information was
rearranged, and should be viewable once again if you're using B<less>
as your pager. A serious security hole was plugged--you should
immediately remove all older versions of the Perl debugger as
installed in previous releases, all the way back to perl3, from
your system to avoid being bitten by this.
=head1 Improved Documentation
Many of the platform-specific README files are now part of the perl
installation. See L<perl> for the complete list.
=over 4
=item perlapi.pod
The official list of public Perl API functions.
=item perlboot.pod
A tutorial for beginners on object-oriented Perl.
=item perlcompile.pod
An introduction to using the Perl Compiler suite.
=item perldbmfilter.pod
A howto document on using the DBM filter facility.
=item perldebug.pod
All material unrelated to running the Perl debugger, plus all
low-level guts-like details that risked crushing the casual user
of the debugger, have been relocated from the old manpage to the
next entry below.
=item perldebguts.pod
This new manpage contains excessively low-level material not related
to the Perl debugger, but slightly related to debugging Perl itself.
It also contains some arcane internal details of how the debugging
process works that may only be of interest to developers of Perl
debuggers.
=item perlfork.pod
Notes on the fork() emulation currently available for the Windows platform.
=item perlfilter.pod
An introduction to writing Perl source filters.
=item perlhack.pod
Some guidelines for hacking the Perl source code.
=item perlintern.pod
A list of internal functions in the Perl source code.
(List is currently empty.)
=item perllexwarn.pod
Introduction and reference information about lexically scoped
warning categories.
=item perlnumber.pod
Detailed information about numbers as they are represented in Perl.
=item perlopentut.pod
A tutorial on using open() effectively.
=item perlreftut.pod
A tutorial that introduces the essentials of references.
=item perltootc.pod
A tutorial on managing class data for object modules.
=item perltodo.pod
Discussion of the most often wanted features that may someday be
supported in Perl.
=item perlunicode.pod
An introduction to Unicode support features in Perl.
=back
=head1 Performance enhancements
=head2 Simple sort() using { $a <=> $b } and the like are optimized
Many common sort() operations using a simple inlined block are now
optimized for faster performance.
=head2 Optimized assignments to lexical variables
Certain operations in the RHS of assignment statements have been
optimized to directly set the lexical variable on the LHS,
eliminating redundant copying overheads.
=head2 Faster subroutine calls
Minor changes in how subroutine calls are handled internally
provide marginal improvements in performance.
=head2 delete(), each(), values() and hash iteration are faster
The hash values returned by delete(), each(), values() and hashes in a
list context are the actual values in the hash, instead of copies.
This results in significantly better performance, because it eliminates
needless copying in most situations.
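One visible consequence is that modifying the elements returned by values() changes the hash itself, as this small sketch with made-up data shows.

```perl
use strict;
use warnings;

my %price = (apple => 1, pear => 2);

# values() yields the actual hash values, so $_ aliases each one.
$_ *= 10 for values %price;

print "$_: $price{$_}\n" for sort keys %price;
```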
=head1 Installation and Configuration Improvements
=head2 -Dusethreads means something different
The -Dusethreads flag now enables the experimental interpreter-based thread
support by default. To get the flavor of experimental threads that was in
5.005 instead, you need to run Configure with "-Dusethreads -Duse5005threads".
As of v5.6.0, interpreter-threads support is still lacking a way to
create new threads from Perl (i.e., C<use Thread;> will not work with
interpreter threads). C<use Thread;> continues to be available when you
specify the -Duse5005threads option to Configure, bugs and all.
NOTE: Support for threads continues to be an experimental feature.
Interfaces and implementation are subject to sudden and drastic changes.
=head2 New Configure flags
The following new flags may be enabled on the Configure command line
by running Configure with C<-Dflag>.
usemultiplicity
usethreads useithreads (new interpreter threads: no Perl API yet)
usethreads use5005threads (threads as they were in 5.005)
use64bitint (equal to now deprecated 'use64bits')
use64bitall
uselongdouble
usemorebits
uselargefiles
usesocks (only SOCKS v5 supported)
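Whether a given perl was built with one of these flags can be checked afterwards through the Config module. This is a sketch that assumes an enabled flag is recorded with the value C<define>.

```perl
use strict;
use warnings;
use Config;

my @flags = qw(usemultiplicity usethreads useithreads use64bitint
               use64bitall uselongdouble uselargefiles usesocks);

# Record "yes" for every flag this perl reports as defined.
my %built;
for my $flag (@flags) {
    my $value = $Config{$flag};
    $built{$flag} = (defined $value && $value eq 'define') ? 'yes' : 'no';
}

printf "%-16s %s\n", $_, $built{$_} for @flags;
```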
=head2 Threadedness and 64-bitness now more daring
The Configure options enabling the use of threads and the use of
64-bitness are now more daring in the sense that they no longer have an
explicit list of operating systems with known threads/64-bit
capabilities. In other words: if your operating system has the
necessary APIs and datatypes, you should be able just to go ahead and
use them, for threads by Configure -Dusethreads, and for 64 bits
either explicitly by Configure -Duse64bitint or implicitly if your
system has 64-bit wide datatypes. See also L</"64-bit support">.
=head2 Long Doubles
Some platforms have "long doubles", floating point numbers of even
larger range than ordinary "doubles". To enable using long doubles for
Perl's scalars, use -Duselongdouble.
=head2 -Dusemorebits
You can enable both -Duse64bitint and -Duselongdouble with -Dusemorebits.
See also L</"64-bit support">.
=head2 -Duselargefiles
Some platforms support system APIs that are capable of handling large files
(typically, files larger than two gigabytes). Perl will try to use these
APIs if you ask for -Duselargefiles.
See L</"Large file support"> for more information.
=head2 installusrbinperl
You can use "Configure -Uinstallusrbinperl", which causes installperl
to skip installing perl as /usr/bin/perl as well. This is useful if you
prefer not to modify /usr/bin for some reason, but it can be harmful
because many scripts expect to find Perl at /usr/bin/perl.
=head2 SOCKS support
You can use "Configure -Dusesocks" which causes Perl to probe
for the SOCKS proxy protocol library (v5, not v4). For more information
on SOCKS, see:
http://www.socks.nec.com/
=head2 C<-A> flag
You can "post-edit" the Configure variables using the Configure C<-A>
switch. The editing happens immediately after the platform specific
hints files have been processed but before the actual configuration
process starts. Run C<Configure -h> to find out the full C<-A> syntax.
=head2 Enhanced Installation Directories
The installation structure has been enriched to improve the support
for maintaining multiple versions of perl, to provide locations for
vendor-supplied modules, scripts, and manpages, and to ease maintenance
of locally-added modules, scripts, and manpages. See the section on
Installation Directories in the INSTALL file for complete details.
For most users building and installing from source, the defaults should
be fine.
If you previously used C<Configure -Dsitelib> or C<-Dsitearch> to set
special values for library directories, you might wish to consider using
the new C<-Dsiteprefix> setting instead. Also, if you wish to re-use a
config.sh file from an earlier version of perl, you should be sure to
check that Configure makes sensible choices for the new directories.
See INSTALL for complete details.
=head2 gcc automatically tried if 'cc' does not seem to be working
On many platforms the vendor-supplied 'cc' is too stripped-down to
build Perl (basically, the 'cc' doesn't do ANSI C). If this seems
to be the case and the 'cc' does not seem to be the GNU C compiler
'gcc', an automatic attempt is made to find and use 'gcc' instead.
=head1 Platform specific changes
=head2 Supported platforms
=over 4
=item *
The Mach CThreads (NEXTSTEP, OPENSTEP) are now supported by the Thread
extension.
=item *
GNU/Hurd is now supported.
=item *
Rhapsody/Darwin is now supported.
=item *
EPOC is now supported (on Psion 5).
=item *
The cygwin port (formerly cygwin32) has been greatly improved.
=back
=head2 DOS
=over 4
=item *
Perl now works with djgpp 2.02 (and 2.03 alpha).
=item *
Environment variable names are not converted to uppercase any more.
=item *
Incorrect exit codes from backticks have been fixed.
=item *
This port continues to use its own builtin globbing (not File::Glob).
=back
=head2 OS390 (OpenEdition MVS)
Support for this EBCDIC platform has not been renewed in this release.
There are difficulties in reconciling Perl's standardization on UTF-8
as its internal representation for characters with the EBCDIC character
set, because the two are incompatible.
It is unclear whether future versions will renew support for this
platform, but the possibility exists.
=head2 VMS
Numerous revisions and extensions to configuration, build, testing, and
installation process to accommodate core changes and VMS-specific options.
Expand %ENV-handling code to allow runtime mapping to logical names,
CLI symbols, and CRTL environ array.
Extension of subprocess invocation code to accept filespecs as command
"verbs".
Add to Perl command line processing the ability to use default file types and
to recognize Unix-style C<2E<gt>&1>.
Expansion of File::Spec::VMS routines, and integration into ExtUtils::MM_VMS.
Extension of ExtUtils::MM_VMS to handle complex extensions more flexibly.
Barewords at start of Unix-syntax paths may be treated as text rather than
only as logical names.
Optional secure translation of several logical names used internally by Perl.
Miscellaneous bugfixing and porting of new core code to VMS.
Thanks are gladly extended to the many people who have contributed VMS
patches, testing, and ideas.
=head2 Win32
Perl can now emulate fork() internally, using multiple interpreters running
in different concurrent threads. This support must be enabled at build
time. See L<perlfork> for detailed information.
When given a pathname that consists only of a drivename, such as C<A:>,
opendir() and stat() now use the current working directory for the drive
rather than the drive root.
The builtin XSUB functions in the Win32:: namespace are documented. See
L<Win32>.
$^X now contains the full path name of the running executable.
A Win32::GetLongPathName() function is provided to complement
Win32::GetFullPathName() and Win32::GetShortPathName(). See L<Win32>.
POSIX::uname() is supported.
system(1,...) now returns true process IDs rather than process
handles. kill() accepts any real process id, rather than strictly
return values from system(1,...).
For better compatibility with Unix, C<kill(0, $pid)> can now be used to
test whether a process exists.
The C<Shell> module is supported.
Better support for building Perl under command.com in Windows 95
has been added.
Scripts are read in binary mode by default to allow ByteLoader (and
the filter mechanism in general) to work properly. For compatibility,
the DATA filehandle will be set to text mode if a carriage return is
detected at the end of the line containing the __END__ or __DATA__
token; if not, the DATA filehandle will be left open in binary mode.
Earlier versions always opened the DATA filehandle in text mode.
The glob() operator is implemented via the C<File::Glob> extension,
which supports glob syntax of the C shell. This increases the flexibility
of the glob() operator, but there may be compatibility issues for
programs that relied on the older globbing syntax. If you want to
preserve compatibility with the older syntax, you might want to run
perl with C<-MFile::DosGlob>. For details and compatibility information,
see L<File::Glob>.
=head1 Significant bug fixes
=head2 <HANDLE> on empty files
With C<$/> set to C<undef>, "slurping" an empty file returns a string of
zero length (instead of C<undef>, as it used to) the first time the
HANDLE is read after C<$/> is set to C<undef>. Further reads yield
C<undef>.
This means that the following will append "foo" to an empty file (it used
to do nothing):
perl -0777 -pi -e 's/^/foo/' empty_file
The behaviour of:
perl -pi -e 's/^/foo/' empty_file
is unchanged (it continues to leave the file empty).
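The same behaviour can be observed from within a script. This sketch slurps an empty temporary file twice.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create an empty scratch file.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

my ($first, $second);
{
    local $/;                       # undef: slurp mode
    open my $fh, '<', $path or die "open failed: $!";
    $first  = <$fh>;                # zero-length string
    $second = <$fh>;                # undef
    close $fh;
}

print "first read is ",  (defined $first  ? "defined" : "undef"), "\n";
print "second read is ", (defined $second ? "defined" : "undef"), "\n";
```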
=head2 C<eval '...'> improvements
Line numbers (as reflected by caller() and most diagnostics) within
C<eval '...'> were often incorrect where here documents were involved.
This has been corrected.
Lexical lookups for variables appearing in C<eval '...'> within
functions that were themselves called within an C<eval '...'> were
searching the wrong place for lexicals. The lexical search now
correctly ends at the subroutine's block boundary.
The use of C<return> within C<eval {...}> caused $@ not to be reset
correctly when no exception occurred within the eval. This has
been fixed.
Parsing of here documents used to be flawed when they appeared as
the replacement expression in C<eval 's/.../.../e'>. This has
been fixed.
=head2 All compilation errors are true errors
Some "errors" encountered at compile time were by necessity
generated as warnings followed by eventual termination of the
program. This enabled more such errors to be reported in a
single run, rather than causing a hard stop at the first error
that was encountered.
The mechanism for reporting such errors has been reimplemented
to queue compile-time errors and report them at the end of the
compilation as true errors rather than as warnings. This fixes
cases where error messages leaked through in the form of warnings
when code was compiled at run time using C<eval STRING>, and
also allows such errors to be reliably trapped using C<eval "...">.
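For example, a syntax error in code compiled at run time can be trapped like any other exception. The broken statement is deliberate.

```perl
use strict;
use warnings;

# The string contains a syntax error, so compilation fails and
# eval returns undef with the message in $@.
my $result = eval 'my $x = ; 1';
my $error  = $@;

print "compile error trapped: $error" unless $result;
```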
=head2 Implicitly closed filehandles are safer
Sometimes implicitly closed filehandles (as when they are localized,
and Perl automatically closes them on exiting the scope) could
inadvertently set $? or $!. This has been corrected.
=head2 Behavior of list slices is more consistent
When taking a slice of a literal list (as opposed to a slice of
an array or hash), Perl used to return an empty list if the
result happened to be composed of all undef values.
The new behavior is to produce an empty list if (and only if)
the original list was empty. Consider the following example:
@a = (1,undef,undef,2)[2,1,2];
The old behavior would have resulted in @a having no elements.
The new behavior ensures it has three undefined elements.
Note in particular that the behavior of slices of the following
cases remains unchanged:
@a = ()[1,2];
@a = (getpwent)[7,0];
@a = (anything_returning_empty_list())[2,1,2];
@a = @b[2,1,2];
@a = @c{'a','b','c'};
See L<perldata>.
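The difference can be checked by counting elements, as in this sketch.

```perl
use strict;
use warnings;

# Slice of a non-empty literal list: three elements, all undef.
my @a = (1, undef, undef, 2)[2, 1, 2];

# Slice of an empty list: still empty.
my @b = ()[1, 2];

printf "\@a has %d elements, \@b has %d\n", scalar(@a), scalar(@b);
```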
=head2 C<(\$)> prototype and C<$foo{a}>
A scalar reference prototype now correctly allows a hash or
array element in that slot.
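A short sketch, with an invented increment() function, of passing hash and array elements to a C<(\$)> prototype:

```perl
use strict;
use warnings;

# The (\$) prototype makes Perl pass a reference to the argument.
sub increment(\$) {
    my ($ref) = @_;
    $$ref++;
    return;
}

my %foo  = (a => 1);
my @list = (10);

increment $foo{a};      # hash element
increment $list[0];     # array element

print "foo{a} = $foo{a}, list[0] = $list[0]\n";
```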
=head2 C<goto &sub> and AUTOLOAD
The C<goto &sub> construct works correctly when C<&sub> happens
to be autoloaded.
=head2 C<-bareword> allowed under C<use integer>
The autoquoting of barewords preceded by C<-> did not work
in prior versions when the C<integer> pragma was enabled.
This has been fixed.
=head2 Failures in DESTROY()
When code in a destructor threw an exception, it went unnoticed
in earlier versions of Perl, unless someone happened to be
looking in $@ just after the point the destructor happened to
run. Such failures are now visible as warnings when warnings are
enabled.
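The warning can be seen by capturing it with a C<__WARN__> handler, as in this sketch. The class name Fragile is invented for the example.

```perl
use strict;
use warnings;

{
    package Fragile;
    sub new     { return bless {}, shift }
    sub DESTROY { die "cleanup failed\n" }
}

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

{
    my $object = Fragile->new;
}   # $object is destroyed here; the exception becomes a warning

print "captured: $_" for @warnings;
```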
=head2 Locale bugs fixed
printf() and sprintf() previously reset the numeric locale
back to the default "C" locale. This has been fixed.
Numbers formatted according to the local numeric locale
(such as using a decimal comma instead of a decimal dot) caused
"isn't numeric" warnings, even while the operations accessing
those numbers produced correct results. These warnings have been
discontinued.
=head2 Memory leaks
The C<eval 'return sub {...}'> construct could sometimes leak
memory. This has been fixed.
Operations that aren't filehandle constructors used to leak memory
when used on invalid filehandles. This has been fixed.
Constructs that modified C<@_> could fail to deallocate values
in C<@_> and thus leak memory. This has been corrected.
=head2 Spurious subroutine stubs after failed subroutine calls
Perl could sometimes create empty subroutine stubs when a
subroutine was not found in the package. Such cases stopped
later method lookups from progressing into base packages.
This has been corrected.
=head2 Taint failures under C<-U>
When running in unsafe mode, taint violations could sometimes
cause silent failures. This has been fixed.
=head2 END blocks and the C<-c> switch
Prior versions used to run BEGIN B<and> END blocks when Perl was
run in compile-only mode. Since this is typically not the expected
behavior, END blocks are not executed anymore when the C<-c> switch
is used, or if compilation fails.
See L</"Support for CHECK blocks"> for how to run things when the compile
phase ends.
=head2 Potential to leak DATA filehandles
Using the C<__DATA__> token creates an implicit filehandle to
the file that contains the token. It is the program's
responsibility to close it when it is done reading from it.
This caveat is now better explained in the documentation.
See L<perldata>.
=head1 New or Changed Diagnostics
=over 4
=item "%s" variable %s masks earlier declaration in same %s
(W misc) A "my" or "our" variable has been redeclared in the current scope or statement,
effectively eliminating all access to the previous instance. This is almost
always a typographical error. Note that the earlier variable will still exist
until the end of the scope or until all closure referents to it are
destroyed.
=item "my sub" not yet implemented
(F) Lexically scoped subroutines are not yet implemented. Don't try that
yet.
=item "our" variable %s redeclared
(W misc) You seem to have already declared the same global once before in the
current lexical scope.
=item '!' allowed only after types %s
(F) The '!' is allowed in pack() and unpack() only after certain types.
See L<perlfunc/pack>.
=item / cannot take a count
(F) You had an unpack template indicating a counted-length string,
but you have also specified an explicit size for the string.
See L<perlfunc/pack>.
=item / must be followed by a, A or Z
(F) You had an unpack template indicating a counted-length string,
which must be followed by one of the letters a, A or Z
to indicate what sort of string is to be unpacked.
See L<perlfunc/pack>.
=item / must be followed by a*, A* or Z*
(F) You had a pack template indicating a counted-length string.
Currently the only things that can have their length counted are a*, A* or Z*.
See L<perlfunc/pack>.
=item / must follow a numeric type
(F) You had an unpack template that contained a '/',
but this did not follow some numeric unpack specification.
See L<perlfunc/pack>.
=item /%s/: Unrecognized escape \\%c passed through
(W regexp) You used a backslash-character combination which is not recognized
by Perl. This combination appears in an interpolated variable or a
C<'>-delimited regular expression. The character was understood literally.
=item /%s/: Unrecognized escape \\%c in character class passed through
(W regexp) You used a backslash-character combination which is not recognized
by Perl inside character classes. The character was understood literally.
=item /%s/ should probably be written as "%s"
(W syntax) You have used a pattern where Perl expected to find a string,
as in the first argument to C<join>. Perl will treat the true
or false result of matching the pattern against $_ as the string,
which is probably not what you had in mind.
=item %s() called too early to check prototype
(W prototype) You've called a function that has a prototype before the parser saw a
definition or declaration for it, and Perl could not check that the call
conforms to the prototype. You need to either add an early prototype
declaration for the subroutine in question, or move the subroutine
definition ahead of the call to get proper prototype checking. Alternatively,
if you are certain that you're calling the function correctly, you may put
an ampersand before the name to avoid the warning. See L<perlsub>.
=item %s argument is not a HASH or ARRAY element
(F) The argument to exists() must be a hash or array element, such as:
$foo{$bar}
$ref->{"susie"}[12]
=item %s argument is not a HASH or ARRAY element or slice
(F) The argument to delete() must be either a hash or array element, such as:
$foo{$bar}
$ref->{"susie"}[12]
or a hash or array slice, such as:
@foo[$bar, $baz, $xyzzy]
@{$ref->[12]}{"susie", "queue"}
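The accepted argument forms for exists() and delete() can be sketched as follows. This is only an illustration; the hash C<%foo> and its keys are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical data, used only to illustrate the argument forms.
my %foo = (a => 1, b => 2, c => 3, d => 4);

# A single hash element is a valid argument to delete().
my $gone = delete $foo{a};

# So is a hash slice; delete() returns the removed values in order.
my @also_gone = delete @foo{qw(b c)};

# exists() likewise wants an element, not a whole hash.
my $still_there = exists $foo{d} ? 1 : 0;

print "removed: $gone @also_gone; d exists: $still_there\n";
```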
=item %s argument is not a subroutine name
(F) The argument to exists() for C<exists &sub> must be a subroutine
name, and not a subroutine call. C<exists &sub()> will generate this error.
=item %s package attribute may clash with future reserved word: %s
(W reserved) A lowercase attribute name was used that had a package-specific handler.
That name might have a meaning to Perl itself some day, even though it
doesn't yet. Perhaps you should use a mixed-case attribute name, instead.
See L<attributes>.
=item (in cleanup) %s
(W misc) This prefix usually indicates that a DESTROY() method raised
the indicated exception. Since destructors are usually called by
the system at arbitrary points during execution, and often a vast
number of times, the warning is issued only once for any number
of failures that would otherwise result in the same message being
repeated.
Failure of user callbacks dispatched using the C<G_KEEPERR> flag
could also result in this warning. See L<perlcall/G_KEEPERR>.
=item <> should be quotes
(F) You wrote C<< require <file> >> when you should have written
C<require 'file'>.
=item Attempt to join self
(F) You tried to join a thread from within itself, which is an
impossible task. You may be joining the wrong thread, or you may
need to move the join() to some other thread.
=item Bad evalled substitution pattern
(F) You've used the /e switch to evaluate the replacement for a
substitution, but perl found a syntax error in the code to evaluate,
most likely an unexpected right brace '}'.
=item Bad realloc() ignored
(S) An internal routine called realloc() on something that had never been
malloc()ed in the first place. Mandatory, but can be disabled by
setting environment variable C<PERL_BADFREE> to 1.
=item Bareword found in conditional
(W bareword) The compiler found a bareword where it expected a conditional,
which often indicates that an || or && was parsed as part of the
last argument of the previous construct, for example:
open FOO || die;
It may also indicate a misspelled constant that has been interpreted
as a bareword:
use constant TYPO => 1;
if (TYOP) { print "foo" }
The C<strict> pragma is useful in avoiding such errors.
=item Binary number > 0b11111111111111111111111111111111 non-portable
(W portable) The binary number you specified is larger than 2**32-1
(4294967295) and therefore non-portable between systems. See
L<perlport> for more on portability concerns.
=item Bit vector size > 32 non-portable
(W portable) Using bit vector sizes larger than 32 is non-portable.
=item Buffer overflow in prime_env_iter: %s
(W internal) A warning peculiar to VMS. While Perl was preparing to iterate over
%ENV, it encountered a logical name or symbol definition which was too long,
so it was truncated to the string shown.
=item Can't check filesystem of script "%s"
(P) For some reason you can't check the filesystem of the script for nosuid.
=item Can't declare class for non-scalar %s in "%s"
(S) Currently, only scalar variables can be declared with a specific class
qualifier in a "my" or "our" declaration. The semantics may be extended
for other types of variables in future.
=item Can't declare %s in "%s"
(F) Only scalar, array, and hash variables may be declared as "my" or
"our" variables. They must have ordinary identifiers as names.
=item Can't ignore signal CHLD, forcing to default
(W signal) Perl has detected that it is being run with the SIGCHLD signal
(sometimes known as SIGCLD) disabled. Since disabling this signal
will interfere with proper determination of exit status of child
processes, Perl has reset the signal to its default value.
This situation typically indicates that the parent program under
which Perl may be running (e.g., cron) is being very careless.
=item Can't modify non-lvalue subroutine call
(F) Subroutines meant to be used in lvalue context should be declared as
such, see L<perlsub/"Lvalue subroutines">.
=item Can't read CRTL environ
(S) A warning peculiar to VMS. Perl tried to read an element of %ENV
from the CRTL's internal environment array and discovered the array was
missing. You need to figure out where your CRTL misplaced its environ
or define F<PERL_ENV_TABLES> (see L<perlvms>) so that environ is not searched.
=item Can't remove %s: %s, skipping file
(S) You requested an inplace edit without creating a backup file. Perl
was unable to remove the original file to replace it with the modified
file. The file was left unmodified.
=item Can't return %s from lvalue subroutine
(F) Perl detected an attempt to return illegal lvalues (such
as temporary or readonly values) from a subroutine used as an lvalue.
This is not allowed.
=item Can't weaken a nonreference
(F) You attempted to weaken something that was not a reference. Only
references can be weakened.
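A minimal sketch of the difference, assuming the C<weaken> and C<isweak> functions exported by C<Scalar::Util> are available in your installation. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my $node = { name => 'node' };
my $link = $node;      # second reference to the same hash
weaken($link);         # $link no longer keeps the hash alive

my $is_weak = isweak($link) ? 1 : 0;

# Weakening a plain scalar is a fatal error, so trap it with eval.
my $plain = 42;
my $ok    = eval { weaken($plain); 1 } ? 1 : 0;
my $error = $ok ? '' : $@;

print "weak: $is_weak, plain scalar accepted: $ok\n";
```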
=item Character class [:%s:] unknown
(F) The class in the character class [: :] syntax is unknown.
See L<perlre>.
=item Character class syntax [%s] belongs inside character classes
(W unsafe) The character class constructs [: :], [= =], and [. .] go
I<inside> character classes, the [] are part of the construct,
for example: /[012[:alpha:]345]/. Note that [= =] and [. .]
are not currently implemented; they are simply placeholders for
future extensions.
=item Constant is not %s reference
(F) A constant value (perhaps declared using the C<use constant> pragma)
is being dereferenced, but it amounts to the wrong type of reference. The
message indicates the type of reference that was expected. This usually
indicates a syntax error in dereferencing the constant value.
See L<perlsub/"Constant Functions"> and L<constant>.
=item constant(%s): %s
(F) The parser found inconsistencies either while attempting to define an
overloaded constant, or when trying to find the character name specified
in the C<\N{...}> escape. Perhaps you forgot to load the corresponding
C<overload> or C<charnames> pragma? See L<charnames> and L<overload>.
=item CORE::%s is not a keyword
(F) The CORE:: namespace is reserved for Perl keywords.
=item defined(@array) is deprecated
(D) defined() is not usually useful on arrays because it checks for an
undefined I<scalar> value. If you want to see if the array is empty,
just use C<if (@array) { # not empty }> for example.
=item defined(%hash) is deprecated
(D) defined() is not usually useful on hashes because it checks for an
undefined I<scalar> value. If you want to see if the hash is empty,
just use C<if (%hash) { # not empty }> for example.
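The emptiness checks suggested in the two entries above might look like this in practice. The variable names are placeholders.

```perl
use strict;
use warnings;

my @array;
my %hash;

# An array or hash in boolean context is true only when it has elements.
my $array_state = @array ? 'not empty' : 'empty';
my $hash_state  = %hash  ? 'not empty' : 'empty';

push @array, 'x';
$hash{key} = 'value';

my $array_after = @array ? 'not empty' : 'empty';
my $hash_after  = %hash  ? 'not empty' : 'empty';

print "before: $array_state/$hash_state, after: $array_after/$hash_after\n";
```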
=item Did not produce a valid header
See Server error.
=item (Did you mean "local" instead of "our"?)
(W misc) Remember that "our" does not localize the declared global variable.
You have declared it again in the same lexical scope, which seems superfluous.
=item Document contains no data
See Server error.
=item entering effective %s failed
(F) While under the C<use filetest> pragma, switching the real and
effective uids or gids failed.
=item false [] range "%s" in regexp
(W regexp) A character class range must start and end at a literal character, not
another character class like C<\d> or C<[:alpha:]>. The "-" in your false
range is interpreted as a literal "-". Consider quoting the "-", "\-".
See L<perlre>.
=item Filehandle %s opened only for output
(W io) You tried to read from a filehandle opened only for writing. If you
intended it to be a read/write filehandle, you needed to open it with
"+<" or "+>" or "+>>" instead of with ">" or ">>". If
you intended only to read from the file, use "<". See
L<perlfunc/open>.
=item flock() on closed filehandle %s
(W closed) The filehandle you're attempting to flock() got itself closed some
time before now. Check your logic flow. flock() operates on filehandles.
Are you attempting to call flock() on a dirhandle by the same name?
=item Global symbol "%s" requires explicit package name
(F) You've said "use strict vars", which indicates that all variables
must either be lexically scoped (using "my"), declared beforehand using
"our", or explicitly qualified to say which package the global variable
is in (using "::").
=item Hexadecimal number > 0xffffffff non-portable
(W portable) The hexadecimal number you specified is larger than 2**32-1
(4294967295) and therefore non-portable between systems. See
L<perlport> for more on portability concerns.
=item Ill-formed CRTL environ value "%s"
(W internal) A warning peculiar to VMS. Perl tried to read the CRTL's internal
environ array, and encountered an element without the C<=> delimiter
used to separate keys from values. The element is ignored.
=item Ill-formed message in prime_env_iter: |%s|
(W internal) A warning peculiar to VMS. Perl tried to read a logical name
or CLI symbol definition when preparing to iterate over %ENV, and
didn't see the expected delimiter between key and value, so the
line was ignored.
=item Illegal binary digit %s
(F) You used a digit other than 0 or 1 in a binary number.
=item Illegal binary digit %s ignored
(W digit) You may have tried to use a digit other than 0 or 1 in a binary number.
Interpretation of the binary number stopped before the offending digit.
=item Illegal number of bits in vec
(F) The number of bits in vec() (the third argument) must be a power of
two from 1 to 32 (or 64, if your platform supports that).
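As a rough illustration, the following sketch stores two 8-bit values with vec() and then traps the error raised by a width that is not a power of two. The width of 3 is chosen only to provoke the failure.

```perl
use strict;
use warnings;

my $bits = '';
vec($bits, 0, 8) = 65;   # first byte
vec($bits, 1, 8) = 66;   # second byte

# A width of 3 is not a power of two, so vec() dies at run time.
my $width = 3;
my $ok    = eval { my $v = vec($bits, 0, $width); 1 } ? 1 : 0;
my $error = $ok ? '' : $@;

print "packed: $bits, width 3 accepted: $ok\n";
```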
=item Integer overflow in %s number
(W overflow) The hexadecimal, octal or binary number you have specified either
as a literal or as an argument to hex() or oct() is too big for your
architecture, and has been converted to a floating point number. On a
32-bit architecture the largest hexadecimal, octal or binary number
representable without overflow is 0xFFFFFFFF, 037777777777, or
0b11111111111111111111111111111111 respectively. Note that Perl
transparently promotes all numbers to a floating point representation
internally--subject to loss of precision errors in subsequent
operations.
=item Invalid %s attribute: %s
(F) The indicated attribute for a subroutine or variable was not recognized
by Perl or by a user-supplied handler. See L<attributes>.
=item Invalid %s attributes: %s
(F) The indicated attributes for a subroutine or variable were not recognized
by Perl or by a user-supplied handler. See L<attributes>.
=item invalid [] range "%s" in regexp
The offending range is now explicitly displayed.
=item Invalid separator character %s in attribute list
(F) Something other than a colon or whitespace was seen between the
elements of an attribute list. If the previous attribute
had a parenthesised parameter list, perhaps that list was terminated
too soon. See L<attributes>.
=item Invalid separator character %s in subroutine attribute list
(F) Something other than a colon or whitespace was seen between the
elements of a subroutine attribute list. If the previous attribute
had a parenthesised parameter list, perhaps that list was terminated
too soon.
=item leaving effective %s failed
(F) While under the C<use filetest> pragma, switching the real and
effective uids or gids failed.
=item Lvalue subs returning %s not implemented yet
(F) Due to limitations in the current implementation, array and hash
values cannot be returned in subroutines used in lvalue context.
See L<perlsub/"Lvalue subroutines">.
=item Method %s not permitted
See Server error.
=item Missing %sbrace%s on \N{}
(F) Wrong syntax of character name literal C<\N{charname}> within
double-quotish context.
=item Missing command in piped open
(W pipe) You used the C<open(FH, "| command")> or C<open(FH, "command |")>
construction, but the command was missing or blank.
=item Missing name in "my sub"
(F) The reserved syntax for lexically scoped subroutines requires that they
have a name with which they can be found.
=item No %s specified for -%c
(F) The indicated command line switch needs a mandatory argument, but
you haven't specified one.
=item No package name allowed for variable %s in "our"
(F) Fully qualified variable names are not allowed in "our" declarations,
because that doesn't make much sense under existing semantics. Such
syntax is reserved for future extensions.
=item No space allowed after -%c
(F) The argument to the indicated command line switch must follow immediately
after the switch, without intervening spaces.
=item no UTC offset information; assuming local time is UTC
(S) A warning peculiar to VMS. Perl was unable to find the local
timezone offset, so it's assuming that local system time is equivalent
to UTC. If it's not, define the logical name F<SYS$TIMEZONE_DIFFERENTIAL>
to translate to the number of seconds which need to be added to UTC to
get local time.
=item Octal number > 037777777777 non-portable
(W portable) The octal number you specified is larger than 2**32-1 (4294967295)
and therefore non-portable between systems. See L<perlport> for more
on portability concerns.
See also L<perlport> for writing portable code.
=item panic: del_backref
(P) Failed an internal consistency check while trying to reset a weak
reference.
=item panic: kid popen errno read
(F) A forked child returned an incomprehensible message about its errno.
=item panic: magic_killbackrefs
(P) Failed an internal consistency check while trying to reset all weak
references to an object.
=item Parentheses missing around "%s" list
(W parenthesis) You said something like
my $foo, $bar = @_;
when you meant
my ($foo, $bar) = @_;
Remember that "my", "our", and "local" bind tighter than comma.
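A short sketch of the parenthesized form in use. The subroutine name C<first_two> is hypothetical.

```perl
use strict;
use warnings;

# first_two() is a made-up helper for this example.
sub first_two {
    my ($foo, $bar) = @_;   # parentheses make this a list assignment
    return ($foo, $bar);
}

my @got = first_two('a', 'b', 'c');
print "@got\n";
```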
=item Possible unintended interpolation of %s in string
(W ambiguous) It used to be that Perl would try to guess whether you
wanted an array interpolated or a literal @. It no longer does this;
arrays are now I<always> interpolated into strings. This means that
if you try something like:
print "fred@example.com";
and the array C<@example> doesn't exist, Perl is going to print
C<fred.com>, which is probably not what you wanted. To get a literal
C<@> sign in a string, put a backslash before it, just as you would
to get a literal C<$> sign.
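Two ways to keep the C<@> literal are sketched below. The address is the same placeholder used above.

```perl
use strict;
use warnings;

my $escaped = "fred\@example.com";   # backslash keeps the @ literal
my $single  = 'fred@example.com';    # single quotes never interpolate

print "$escaped\n";
```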
=item Possible Y2K bug: %s
(W y2k) You are concatenating the number 19 with another number, which
could be a potential Year 2000 problem.
=item pragma "attrs" is deprecated, use "sub NAME : ATTRS" instead
(W deprecated) You have written something like this:
sub doit
{
use attrs qw(locked);
}
You should use the new declaration syntax instead.
sub doit : locked
{
...
}
The C<use attrs> pragma is now obsolete, and is only provided for
backward-compatibility. See L<perlsub/"Subroutine Attributes">.
=item Premature end of script headers
See Server error.
=item Repeat count in pack overflows
(F) You can't specify a repeat count so large that it overflows
your signed integers. See L<perlfunc/pack>.
=item Repeat count in unpack overflows
(F) You can't specify a repeat count so large that it overflows
your signed integers. See L<perlfunc/unpack>.
=item realloc() of freed memory ignored
(S) An internal routine called realloc() on something that had already
been freed.
=item Reference is already weak
(W misc) You have attempted to weaken a reference that is already weak.
Doing so has no effect.
=item setpgrp can't take arguments
(F) Your system has the setpgrp() from BSD 4.2, which takes no arguments,
unlike POSIX setpgid(), which takes a process ID and process group ID.
=item Strange *+?{} on zero-length expression
(W regexp) You applied a regular expression quantifier in a place where it
makes no sense, such as on a zero-width assertion.
Try putting the quantifier inside the assertion instead. For example,
the way to match "abc" provided that it is followed by three
repetitions of "xyz" is C</abc(?=(?:xyz){3})/>, not C</abc(?=xyz){3}/>.
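The corrected pattern can be checked with a small sketch like this one. The test strings are invented for the example.

```perl
use strict;
use warnings;

# Quantifier placed inside the lookahead, as recommended above.
my $re = qr/abc(?=(?:xyz){3})/;

my $three = "abcxyzxyzxyz" =~ $re ? 1 : 0;   # three repetitions follow
my $two   = "abcxyzxyz"    =~ $re ? 1 : 0;   # only two repetitions follow

print "three: $three, two: $two\n";
```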
=item switching effective %s is not implemented
(F) While under the C<use filetest> pragma, we cannot switch the
real and effective uids or gids.
=item This Perl can't reset CRTL environ elements (%s)
=item This Perl can't set CRTL environ elements (%s=%s)
(W internal) Warnings peculiar to VMS. You tried to change or delete an element
of the CRTL's internal environ array, but your copy of Perl wasn't
built with a CRTL that contained the setenv() function. You'll need to
rebuild Perl with a CRTL that does, or redefine F<PERL_ENV_TABLES> (see
L<perlvms>) so that the environ array isn't the target of the change to
%ENV which produced the warning.
=item Too late to run %s block
(W void) A CHECK or INIT block is being defined during run time proper,
when the opportunity to run them has already passed. Perhaps you are
loading a file with C<require> or C<do> when you should be using
C<use> instead. Or perhaps you should put the C<require> or C<do>
inside a BEGIN block.
=item Unknown open() mode '%s'
(F) The second argument of 3-argument open() is not among the list
of valid modes: C<< < >>, C<< > >>, C<<< >> >>>, C<< +< >>,
C<< +> >>, C<<< +>> >>>, C<-|>, C<|->.
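A minimal sketch of the 3-argument form with three of the valid modes, assuming the C<File::Temp> module is installed. The scratch file exists only for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "Can't close $path: $!";

open(my $out, '>', $path) or die "Can't write $path: $!";
print {$out} "first\n";
close $out or die "Can't close $path: $!";

open(my $app, '>>', $path) or die "Can't append to $path: $!";
print {$app} "second\n";
close $app or die "Can't close $path: $!";

open(my $in, '<', $path) or die "Can't read $path: $!";
my @lines = <$in>;
close $in or die "Can't close $path: $!";

print @lines;
```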
=item Unknown process %x sent message to prime_env_iter: %s
(P) An error peculiar to VMS. Perl was reading values for %ENV before
iterating over it, and someone else stuck a message in the stream of
data Perl expected. Someone's very confused, or perhaps trying to
subvert Perl's population of %ENV for nefarious purposes.
=item Unrecognized escape \\%c passed through
(W misc) You used a backslash-character combination which is not recognized
by Perl. The character was understood literally.
=item Unterminated attribute parameter in attribute list
(F) The lexer saw an opening (left) parenthesis character while parsing an
attribute list, but the matching closing (right) parenthesis
character was not found. You may need to add (or remove) a backslash
character to get your parentheses to balance. See L<attributes>.
=item Unterminated attribute list
(F) The lexer found something other than a simple identifier at the start
of an attribute, and it wasn't a semicolon or the start of a
block. Perhaps you terminated the parameter list of the previous attribute
too soon. See L<attributes>.
=item Unterminated attribute parameter in subroutine attribute list
(F) The lexer saw an opening (left) parenthesis character while parsing a
subroutine attribute list, but the matching closing (right) parenthesis
character was not found. You may need to add (or remove) a backslash
character to get your parentheses to balance.
=item Unterminated subroutine attribute list
(F) The lexer found something other than a simple identifier at the start
of a subroutine attribute, and it wasn't a semicolon or the start of a
block. Perhaps you terminated the parameter list of the previous attribute
too soon.
=item Value of CLI symbol "%s" too long
(W misc) A warning peculiar to VMS. Perl tried to read the value of an %ENV
element from a CLI symbol table, and found a resultant string longer
than 1024 characters. The return value has been truncated to 1024
characters.
=item Version number must be a constant number
(P) The attempt to translate a C<use Module n.n LIST> statement into
its equivalent C<BEGIN> block found an internal inconsistency with
the version number.
=back
=head1 New tests
=over 4
=item lib/attrs
Compatibility tests for C<sub : attrs> vs the older C<use attrs>.
=item lib/env
Tests for new environment scalar capability (e.g., C<use Env qw($BAR);>).
=item lib/env-array
Tests for new environment array capability (e.g., C<use Env qw(@PATH);>).
=item lib/io_const
IO constants (SEEK_*, _IO*).
=item lib/io_dir
Directory-related IO methods (new, read, close, rewind, tied delete).
=item lib/io_multihomed
INET sockets with multi-homed hosts.
=item lib/io_poll
IO poll().
=item lib/io_unix
UNIX sockets.
=item op/attrs
Regression tests for C<my ($x,@y,%z) : attrs> and C<sub : attrs>.
=item op/filetest
File test operators.
=item op/lex_assign
Verify operations that access pad objects (lexicals and temporaries).
=item op/exists_sub
Verify C<exists &sub> operations.
=back
=head1 Incompatible Changes
=head2 Perl Source Incompatibilities
Beware that any new warnings that have been added or old ones
that have been enhanced are B<not> considered incompatible changes.
Since all new warnings must be explicitly requested via the C<-w>
switch or the C<warnings> pragma, it is ultimately the programmer's
responsibility to ensure that warnings are enabled judiciously.
=over 4
=item CHECK is a new keyword
All subroutine definitions named CHECK are now special. See
L</"Support for CHECK blocks"> for more information.
=item Treatment of list slices of undef has changed
There is a potential incompatibility in the behavior of list slices
that are composed entirely of undefined values.
See L</"Behavior of list slices is more consistent">.
=item Format of $English::PERL_VERSION is different
The English module now sets $PERL_VERSION to $^V (a string value) rather
than C<$]> (a numeric value). This is a potential incompatibility.
Send us a report via perlbug if you are affected by this.
See L</"Improved Perl version numbering system"> for the reasons for
this change.
=item Literals of the form C<1.2.3> parse differently
Previously, numeric literals with more than one dot in them were
interpreted as a floating point number concatenated with one or more
numbers. Such "numbers" are now parsed as strings composed of the
specified ordinals.
For example, C<print 97.98.99> used to output C<97.9899> in earlier
versions, but now prints C<abc>.
See L</"Support for strings represented as a vector of ordinals">.
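The following sketch shows the new parsing. The C<%vd> format of sprintf() is used to turn the string back into its ordinals.

```perl
use strict;
use warnings;

my $bare     = 97.98.99;     # two dots: parsed as a string of ordinals
my $explicit = v97.98.99;    # the same value with an explicit "v" prefix
my $number   = 97.98;        # one dot: still an ordinary number

# %vd prints the ordinal of each character, joined by dots.
my $ordinals = sprintf("%vd", $bare);

print "$bare $explicit $ordinals\n";
```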
=item Possibly changed pseudo-random number generator
Perl programs that depend on reproducing a specific set of pseudo-random
numbers may now produce different output due to improvements made to the
rand() builtin. You can use C<sh Configure -Drandfunc=rand> to obtain
the old behavior.
See L</"Better pseudo-random number generator">.
=item Hashing function for hash keys has changed
Even though Perl hashes are not order preserving, the apparently
random order encountered when iterating on the contents of a hash
is actually determined by the hashing algorithm used. Improvements
in the algorithm may yield a random order that is B<different> from
that of previous versions, especially when iterating on hashes.
See L</"Better worst-case behavior of hashes"> for additional
information.
=item C<undef> fails on read-only values
Using the C<undef> operator on a readonly value (such as $1) has
the same effect as assigning C<undef> to the readonly value--it
throws an exception.
=item Close-on-exec bit may be set on pipe and socket handles
Pipe and socket handles are also now subject to the close-on-exec
behavior determined by the special variable $^F.
See L</"More consistent close-on-exec behavior">.
=item Writing C<"$$1"> to mean C<"${$}1"> is unsupported
Perl 5.004 deprecated the interpretation of C<$$1> and
similar within interpolated strings to mean C<$$ . "1">,
but still allowed it.
In Perl 5.6.0 and later, C<"$$1"> always means C<"${$1}">.
=item delete(), each(), values() and C<\(%h)>
operate on aliases to values, not copies
delete(), each(), values() and hashes (e.g. C<\(%h)>)
in a list context return the actual
values in the hash, instead of copies (as they used to in earlier
versions). Typical idioms for using these constructs copy the
returned values, but this can make a significant difference when
creating references to the returned values. Keys in the hash are still
returned as copies when iterating on a hash.
See also L</"delete(), each(), values() and hash iteration are faster">.
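One visible consequence of the aliasing is sketched below: modifying the elements returned by values() changes the hash itself, while the keys are copies. The hash C<%price> is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical data for illustration.
my %price = (apple => 1, pear => 2);

# values() yields aliases, so this updates the hash in place.
$_ *= 10 for values %price;

# keys() yields copies, so changing the loop variable leaves the hash alone.
for my $key (keys %price) {
    $key = uc $key;
}

print join(', ', map { "$_=$price{$_}" } sort keys %price), "\n";
```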
=item vec(EXPR,OFFSET,BITS) enforces powers-of-two BITS
vec() generates a run-time error if the BITS argument is not
a valid power-of-two integer.
=item Text of some diagnostic output has changed
Most references to internal Perl operations in diagnostics
have been changed to be more descriptive. This may be an
issue for programs that may incorrectly rely on the exact
text of diagnostics for proper functioning.
=item C<%@> has been removed
The undocumented special variable C<%@> that used to accumulate
"background" errors (such as those that happen in DESTROY())
has been removed, because it could potentially result in memory
leaks.
=item Parenthesized not() behaves like a list operator
The C<not> operator now falls under the "if it looks like a function,
it behaves like a function" rule.
As a result, the parenthesized form can be used with C<grep> and C<map>.
The following construct used to be a syntax error before, but it works
as expected now:
grep not($_), @things;
On the other hand, using C<not> with a literal list slice may not
work. The following previously allowed construct:
print not (1,2,3)[0];
needs to be written with additional parentheses now:
print not((1,2,3)[0]);
The behavior remains unaffected when C<not> is not followed by parentheses.
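Both forms can be tried with a sketch such as the following. The list C<@things> is invented for the example.

```perl
use strict;
use warnings;

my @things = (0, 1, '', 'x', undef);

# Parenthesized not() acts like a function call inside grep.
my @false_values = grep not($_), @things;
my $false_count  = @false_values;

# A literal list slice needs its own pair of parentheses.
my $negated = not((1, 2, 3)[0]);

print "false values: $false_count\n";
```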
=item Semantics of bareword prototype C<(*)> have changed
The semantics of the bareword prototype C<*> have changed. Perl 5.005
always coerced simple scalar arguments to a typeglob, which wasn't useful
in situations where the subroutine must distinguish between a simple
scalar and a typeglob. The new behavior is to not coerce bareword
arguments to a typeglob. The value will always be visible as either
a simple scalar or as a reference to a typeglob.
See L</"More functional bareword prototype (*)">.
=item Semantics of bit operators may have changed on 64-bit platforms
If your platform is either natively 64-bit or if Perl has been
configured to use 64-bit integers, i.e., $Config{ivsize} is 8,
there may be a potential incompatibility in the behavior of bitwise
numeric operators (& | ^ ~ << >>). These operators used to strictly
operate on the lower 32 bits of integers in previous versions, but now
operate over the entire native integral width. In particular, note
that unary C<~> will produce different results on platforms that have
different $Config{ivsize}. For portability, be sure to mask off
the excess bits in the result of unary C<~>, e.g., C<~$x & 0xffffffff>.
See L</"Bit operators support full native integer width">.
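The masking idiom can be sketched as follows. The unmasked result depends on C<$Config{ivsize}>, so only the masked value is the same everywhere.

```perl
use strict;
use warnings;
use Config;

my $x      = 0;
my $raw    = ~$x;                 # width follows the native integer size
my $masked = ~$x & 0xffffffff;    # always the low 32 bits

printf "ivsize=%d raw=%s masked=%s\n", $Config{ivsize}, $raw, $masked;
```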
=item More builtins taint their results
As described in L</"Improved security features">, there may be more
sources of taint in a Perl program.
To avoid these new tainting behaviors, you can build Perl with the
Configure option C<-Accflags=-DINCOMPLETE_TAINTS>. Beware that the
ensuing perl binary may be insecure.
=back
=head2 C Source Incompatibilities
=over 4
=item C<PERL_POLLUTE>
Release 5.005 grandfathered old global symbol names by providing preprocessor
macros for extension source compatibility. As of release 5.6.0, these
preprocessor definitions are not available by default. You need to explicitly
compile perl with C<-DPERL_POLLUTE> to get these definitions. For
extensions still using the old symbols, this option can be
specified via MakeMaker:
perl Makefile.PL POLLUTE=1
=item C<PERL_IMPLICIT_CONTEXT>
This new build option provides a set of macros for all API functions
such that an implicit interpreter/thread context argument is passed to
every API function. As a result of this, something like C<sv_setsv(foo,bar)>
amounts to a macro invocation that actually translates to something like
C<Perl_sv_setsv(my_perl,foo,bar)>. While this is generally expected
to not have any significant source compatibility issues, the difference
between a macro and a real function call will need to be considered.
This means that there B<is> a source compatibility issue as a result of
this if your extensions attempt to use pointers to any of the Perl API
functions.
Note that the above issue is not relevant to the default build of
Perl, whose interfaces continue to match those of prior versions
(but subject to the other options described here).
See L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for detailed information
on the ramifications of building Perl with this option.
NOTE: PERL_IMPLICIT_CONTEXT is automatically enabled whenever Perl is built
with one of -Dusethreads, -Dusemultiplicity, or both. It is not
intended to be enabled by users at this time.
=item C<PERL_POLLUTE_MALLOC>
Enabling Perl's malloc in release 5.005 and earlier caused the namespace of
the system's malloc family of functions to be usurped by the Perl versions,
since by default they used the same names. Besides causing problems on
platforms that do not allow these functions to be cleanly replaced, this
also meant that the system versions could not be called in programs that
used Perl's malloc. Previous versions of Perl have allowed this behaviour
to be suppressed with the HIDEMYMALLOC and EMBEDMYMALLOC preprocessor
definitions.
As of release 5.6.0, Perl's malloc family of functions have default names
distinct from the system versions. You need to explicitly compile perl with
C<-DPERL_POLLUTE_MALLOC> to get the older behaviour. HIDEMYMALLOC
and EMBEDMYMALLOC have no effect, since the behaviour they enabled is now
the default.
Note that these functions do B<not> constitute Perl's memory allocation API.
See L<perlguts/"Memory Allocation"> for further information about that.
=back
=head2 Compatible C Source API Changes
=over 4
=item C<PATCHLEVEL> is now C<PERL_VERSION>
The cpp macros C<PERL_REVISION>, C<PERL_VERSION>, and C<PERL_SUBVERSION>
are now available by default from perl.h, and reflect the base revision,
patchlevel, and subversion respectively. C<PERL_REVISION> had no
prior equivalent, while C<PERL_VERSION> and C<PERL_SUBVERSION> were
previously available as C<PATCHLEVEL> and C<SUBVERSION>.
The new names cause less pollution of the B<cpp> namespace and reflect what
the numbers have come to stand for in common practice. For compatibility,
the old names are still supported when F<patchlevel.h> is explicitly
included (as required before), so there is no source incompatibility
from the change.
=back
=head2 Binary Incompatibilities
In general, the default build of this release is expected to be binary
compatible for extensions built with the 5.005 release or its maintenance
versions. However, specific platforms may have broken binary compatibility
due to changes in the defaults used in hints files. Therefore, please be
sure to always check the platform-specific README files for any notes to
the contrary.
The usethreads or usemultiplicity builds are B<not> binary compatible
with the corresponding builds in 5.005.
On platforms that require an explicit list of exports (AIX, OS/2 and Windows,
among others), purely internal symbols such as parser functions and the
run time opcodes are not exported by default. Perl 5.005 used to export
all functions irrespective of whether they were considered part of the
public API or not.
For the full list of public API functions, see L<perlapi>.
=head1 Known Problems
=head2 Localizing a tied hash element may leak memory
As of the 5.6.1 release, there is a known leak when code such as this
is executed:
use Tie::Hash;
tie my %tie_hash => 'Tie::StdHash';
...
local($tie_hash{Foo}) = 1; # leaks
=head2 Known test failures
=over
=item *
64-bit builds
Subtest #15 of lib/b.t may fail under 64-bit builds on platforms such
as HP-UX PA64 and Linux IA64. The issue is still being investigated.
The lib/io_multihomed test may hang in HP-UX if Perl has been
configured to be 64-bit. Because other 64-bit platforms do not
hang in this test, HP-UX is suspect. All other tests pass
in 64-bit HP-UX. The test attempts to create and connect to
"multihomed" sockets (sockets which have multiple IP addresses).
Note that 64-bit support is still experimental.
=item *
Failure of Thread tests
Subtests 19 and 20 of the lib/thr5005.t test are known to fail due to
fundamental problems in the 5.005 threading implementation. These are
not new failures--Perl 5.005_0x has the same bugs, but didn't have these
tests. (Note that support for 5.005-style threading remains experimental.)
=item *
NEXTSTEP 3.3 POSIX test failure
In NEXTSTEP 3.3p2 the implementation of strftime(3) in the
operating system libraries is buggy: the %j format numbers the days of
a month starting from zero, which, while being logical to programmers,
will cause subtests 19 to 27 of the lib/posix test to fail.
=item *
Tru64 (aka Digital UNIX, aka DEC OSF/1) lib/sdbm test failure with gcc
If compiled with gcc 2.95 the lib/sdbm test will fail (dump core).
The cure is to use the vendor cc, which comes with the operating system
and produces good code.
=back
=head2 EBCDIC platforms not fully supported
In earlier releases of Perl, EBCDIC environments like OS390 (also
known as Open Edition MVS) and VM-ESA were supported. Due to changes
required by the UTF-8 (Unicode) support, the EBCDIC platforms are not
supported in Perl 5.6.0.
The 5.6.1 release improves support for EBCDIC platforms, but they
are not fully supported yet.
=head2 UNICOS/mk CC failures during Configure run
In UNICOS/mk the following errors may appear during the Configure run:
Guessing which symbols your C compiler and preprocessor define...
CC-20 cc: ERROR File = try.c, Line = 3
...
bad switch yylook 79bad switch yylook 79bad switch yylook 79bad switch yylook 79#ifdef A29K
...
4 errors detected in the compilation of "try.c".
The culprit is the broken awk of UNICOS/mk. The effect is fortunately
rather mild: Perl itself is not adversely affected by the error, only
the h2ph utility coming with Perl, and that is rather rarely needed
these days.
=head2 Arrow operator and arrays
When the left argument to the arrow operator C<< -> >> is an array, or
the C<scalar> operator operating on an array, the result of the
operation must be considered erroneous. For example:
@x->[2]
scalar(@x)->[2]
These expressions will get run-time errors in some future release of
Perl.
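A minimal sketch of the spellings that avoid the problem, assuming an ordinary array: index the array directly, or take a reference and apply the arrow to that. The helper names C<third_direct> and C<third_via_ref> are illustrative only.

```perl
use strict;
use warnings;

# Index the array itself; no arrow is involved.
sub third_direct {
    my @x = @_;
    return $x[2];
}

# Take a reference first, then dereference it with the arrow.
sub third_via_ref {
    my $ref = [@_];
    return $ref->[2];
}

print third_direct(10, 20, 30), " ", third_via_ref(10, 20, 30), "\n";   # prints "30 30"
```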
=head2 Experimental features
As discussed above, many features are still experimental. Interfaces and
implementation of these features are subject to change, and in extreme cases,
even subject to removal in some future release of Perl. These features
include the following:
=over 4
=item Threads
=item Unicode
=item 64-bit support
=item Lvalue subroutines
=item Weak references
=item The pseudo-hash data type
=item The Compiler suite
=item Internal implementation of file globbing
=item The DB module
=item The regular expression code constructs:
C<(?{ code })> and C<(??{ code })>
=back
=head1 Obsolete Diagnostics
=over 4
=item Character class syntax [: :] is reserved for future extensions
(W) Within regular expression character classes ([]) the syntax beginning
with "[:" and ending with ":]" is reserved for future extensions.
If you need to represent those character sequences inside a regular
expression character class, just quote the square brackets with the
backslash: "\[:" and ":\]".
=item Ill-formed logical name |%s| in prime_env_iter
(W) A warning peculiar to VMS. A logical name was encountered when preparing
to iterate over %ENV which violates the syntactic rules governing logical
names. Because it cannot be translated normally, it is skipped, and will not
appear in %ENV. This may be a benign occurrence, as some software packages
might directly modify logical name tables and introduce nonstandard names,
or it may indicate that a logical name table has been corrupted.
=item In string, @%s now must be written as \@%s
The description of this error used to say:
(Someday it will simply assume that an unbackslashed @
interpolates an array.)
That day has come, and this fatal error has been removed. It has been
replaced by a non-fatal warning instead.
See L</Arrays now always interpolate into double-quoted strings> for
details.
=item Probable precedence problem on %s
(W) The compiler found a bareword where it expected a conditional,
which often indicates that an || or && was parsed as part of the
last argument of the previous construct, for example:
open FOO || die;
=item regexp too big
(F) The current implementation of regular expressions uses shorts as
address offsets within a string. Unfortunately this means that if
the regular expression compiles to longer than 32767, it'll blow up.
Usually when you want a regular expression this big, there is a better
way to do it with multiple statements. See L<perlre>.
=item Use of "$$<digit>" to mean "${$}<digit>" is deprecated
(D) Perl versions before 5.004 misinterpreted any type marker followed
by "$" and a digit. For example, "$$0" was incorrectly taken to mean
"${$}0" instead of "${$0}". This bug is (mostly) fixed in Perl 5.004.
However, the developers of Perl 5.004 could not fix this bug completely,
because at least two widely-used modules depend on the old meaning of
"$$0" in a string. So Perl 5.004 still interprets "$$<digit>" in the
old (broken) way inside strings; but it generates this message as a
warning. And in Perl 5.005, this special treatment will cease.
=back
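The precedence problem described under "Probable precedence problem on %s" can be sketched without touching the filesystem. In the example below, C<pick> is a hypothetical list operator standing in for C<open>; it is not part of Perl. Because C<||> binds more tightly than a list operator, the alternative is swallowed into the argument list unless parentheses are used.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a list operator such as open():
# it reports which argument it actually received.
sub pick { return "picked:$_[0]" }

# Parsed as pick(0 || 'fallback'): the "||" is part of the argument.
sub tight_binding { my $r = pick 0 || 'fallback'; return $r }

# Parentheses apply the "||" to the value returned by the call instead.
sub with_parens { my $r = pick(0) || 'fallback'; return $r }

print tight_binding(), " ", with_parens(), "\n";   # prints "picked:fallback picked:0"
```

With a real C<open>, the same reasoning suggests writing C<open(FOO, ...) || die> or using the low-precedence C<or>.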
=head1 Reporting Bugs
If you find what you think is a bug, you might check the
articles recently posted to the comp.lang.perl.misc newsgroup.
There may also be information at http://www.perl.com/ , the Perl
Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
=head1 SEE ALSO
The F<Changes> file for exhaustive details on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=head1 HISTORY
Written by Gurusamy Sarathy <F<gsar@ActiveState.com>>, with many
contributions from The Perl Porters.
Send omissions or corrections to <F<perlbug@perl.org>>.
=cut
If you read this file _as_is_, just ignore the funny characters you see.
It is written in the POD format (see pod/perlpod.pod) which is specially
designed to be readable as is.
=head1 NAME
perlhaiku - Perl version 5.10+ on Haiku
=head1 DESCRIPTION
This file contains instructions on how to build Perl for Haiku and lists
known problems.
=head1 BUILD AND INSTALL
The build procedure is completely standard:
./Configure -de
make
make install
Make perl executable and create a symlink for libperl:
chmod a+x /boot/common/bin/perl
cd /boot/common/lib; ln -s perl5/5.26.3/BePC-haiku/CORE/libperl.so .
Replace C<5.26.3> with your respective version of Perl.
=head1 KNOWN PROBLEMS
The following problems are encountered with Haiku revision 28311:
=over 4
=item *
Perl cannot be compiled with threading support at the moment.
=item *
The F<cpan/Socket/t/socketpair.t> test fails. More precisely: the subtests
using datagram sockets fail. Unix datagram sockets aren't implemented in
Haiku yet.
=item *
A subtest of the F<cpan/Sys-Syslog/t/syslog.t> test fails. This is due to Haiku
not implementing F</dev/log> support yet.
=item *
The tests F<dist/Net-Ping/t/450_service.t> and F<dist/Net-Ping/t/510_ping_udp.t>
fail. This is due to bugs in Haiku's network stack implementation.
=back
=head1 CONTACT
For Haiku specific problems contact the HaikuPorts developers:
L<http://ports.haiku-files.org/>
The initial Haiku port was done by Ingo Weinhold <ingo_weinhold@gmx.de>.
Last update: 2008-10-29
=head1 NAME
perltrap - Perl traps for the unwary
=head1 DESCRIPTION
The biggest trap of all is forgetting to C<use warnings> or use the B<-w>
switch; see L<warnings> and L<perlrun>. The second biggest trap is not
making your entire program runnable under C<use strict>. The third biggest
trap is not reading the list of changes in this version of Perl; see
L<perldelta>.
=head2 Awk Traps
Accustomed B<awk> users should take special note of the following:
=over 4
=item *
A Perl program executes only once, not once for each input line. You can
do an implicit loop with C<-n> or C<-p>.
=item *
The English module, loaded via
use English;
allows you to refer to special variables (like C<$/>) with names (like
$RS), as though they were in B<awk>; see L<perlvar> for details.
=item *
Semicolons are required after all simple statements in Perl (except
at the end of a block). Newline is not a statement delimiter.
=item *
Curly brackets are required on C<if>s and C<while>s.
=item *
Variables begin with "$", "@" or "%" in Perl.
=item *
Arrays index from 0. Likewise string positions in substr() and
index().
=item *
You have to decide whether your array has numeric or string indices.
=item *
Hash values do not spring into existence upon mere reference.
=item *
You have to decide whether you want to use string or numeric
comparisons.
=item *
Reading an input line does not split it for you. You get to split it
to an array yourself. And the split() operator has different
arguments than B<awk>'s.
=item *
The current input line is normally in $_, not $0. It generally does
not have the newline stripped. ($0 is the name of the program
executed.) See L<perlvar>.
=item *
$<I<digit>> does not refer to fields--it refers to substrings matched
by the last match pattern.
=item *
The print() statement does not add field and record separators unless
you set C<$,> and C<$\>. You can set $OFS and $ORS if you're using
the English module.
=item *
You must open your files before you print to them.
=item *
The range operator is "..", not comma. The comma operator works as in
C.
=item *
The match operator is "=~", not "~". ("~" is the one's complement
operator, as in C.)
=item *
The exponentiation operator is "**", not "^". "^" is the XOR
operator, as in C. (You know, one could get the feeling that B<awk> is
basically incompatible with C.)
=item *
The concatenation operator is ".", not the null string. (Using the
null string would render C</pat/ /pat/> unparsable, because the third slash
would be interpreted as a division operator--the tokenizer is in fact
slightly context sensitive for operators like "/", "?", and ">".
And in fact, "." itself can be the beginning of a number.)
=item *
The C<next>, C<exit>, and C<continue> keywords work differently.
=item *
The following variables work differently:
    Awk       Perl
    ARGC      scalar @ARGV (compare with $#ARGV)
    ARGV[0]   $0
    FILENAME  $ARGV
    FNR       $. - something
    FS        (whatever you like)
    NF        $#Fld, or some such
    NR        $.
    OFMT      $#
    OFS       $,
    ORS       $\
    RLENGTH   length($&)
    RS        $/
    RSTART    length($`)
    SUBSEP    $;
=item *
You cannot set $RS to a pattern, only a string.
=item *
When in doubt, run the B<awk> construct through B<a2p> and see what it
gives you.
=back
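Several of the points above (explicit splitting, zero-based fields, the unstripped newline, and C<$.> in place of C<NR>) can be seen together in a short sketch. It approximates what B<awk>'s C<{ print NR ": " $2 }> does; the helper name C<second_fields> and the sample text are illustrative, and an in-memory filehandle is assumed so that the example needs no input file.

```perl
use strict;
use warnings;

# Read lines from a string and return "NR: second field" for each one.
sub second_fields {
    my ($text) = @_;
    open my $in, '<', \$text or die "cannot open in-memory file: $!";
    my @out;
    while (my $line = <$in>) {
        chomp $line;                    # the newline is not stripped for you
        my @fld = split ' ', $line;     # fields are 0-based: awk's $2 is $fld[1]
        push @out, sprintf '%d: %s', $., $fld[1];   # $. plays the role of NR
    }
    close $in;
    return @out;
}

print "$_\n" for second_fields("alpha beta\ngamma delta\n");
```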
=head2 C/C++ Traps
Cerebral C and C++ programmers should take note of the following:
=over 4
=item *
Curly brackets are required on C<if>'s and C<while>'s.
=item *
You must use C<elsif> rather than C<else if>.
=item *
The C<break> and C<continue> keywords from C become in Perl C<last>
and C<next>, respectively. Unlike in C, these do I<not> work within a
C<do { } while> construct. See L<perlsyn/"Loop Control">.
=item *
The switch statement is called C<given>/C<when> and only available in
perl 5.10 or newer. See L<perlsyn/"Switch Statements">.
=item *
Variables begin with "$", "@" or "%" in Perl.
=item *
Comments begin with "#", not "/*" or "//". Perl may interpret C/C++
comments as division operators, unterminated regular expressions or
the defined-or operator.
=item *
You can't take the address of anything, although a similar operator
in Perl is the backslash, which creates a reference.
=item *
C<ARGV> must be capitalized. C<$ARGV[0]> is C's C<argv[1]>, and C<argv[0]>
ends up in C<$0>.
=item *
System calls such as link(), unlink(), rename(), etc. return nonzero for
success, not 0. (system(), however, returns zero for success.)
=item *
Signal handlers deal with signal names, not numbers. Use C<kill -l>
to find their names on your system.
=back
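The loop-control and C<elsif> points above might look like this in practice. This is only a sketch; C<first_negative> is an illustrative helper, not a builtin.

```perl
use strict;
use warnings;

# Return the first negative number in a list, skipping undefined entries.
sub first_negative {
    my $found;
    for my $n (@_) {
        next unless defined $n;    # C's "continue"
        if ($n < 0) {
            $found = $n;
            last;                  # C's "break"
        }
        elsif ($n == 0) {          # "elsif", not "else if"
            next;
        }
    }
    return $found;
}

# The backslash creates a reference, the nearest thing to C's "&".
my @data = (3, undef, 0, -7, -9);
my $ref  = \@data;
print first_negative(@$ref), "\n";   # prints "-7"
```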
=head2 JavaScript Traps
Judicious JavaScript programmers should take note of the following:
=over 4
=item *
In Perl, binary C<+> is always addition. C<$string1 + $string2> converts
both strings to numbers and then adds them. To concatenate two strings,
use the C<.> operator.
=item *
The C<+> unary operator doesn't do anything in Perl. It exists to avoid
syntactic ambiguities.
=item *
Unlike C<for...in>, Perl's C<for> (also spelled C<foreach>) does not allow
the left-hand side to be an arbitrary expression. It must be a variable:
for my $variable (keys %hash) {
...
}
Furthermore, don't forget the C<keys> in there, as
C<foreach my $kv (%hash) {}> iterates over the keys and values, and is
generally not useful ($kv would be a key, then a value, and so on).
=item *
To iterate over the indices of an array, use C<foreach my $i (0 .. $#array)
{}>. C<foreach my $v (@array) {}> iterates over the values.
=item *
Perl requires braces following C<if>, C<while>, C<foreach>, etc.
=item *
In Perl, C<else if> is spelled C<elsif>.
=item *
C<? :> has higher precedence than assignment. In JavaScript, one can
write:
condition ? do_something() : variable = 3
and the variable is only assigned if the condition is false. In Perl, you
need parentheses:
$condition ? do_something() : ($variable = 3);
Or just use C<if>.
=item *
Perl requires semicolons to separate statements.
=item *
Variables declared with C<my> only affect code I<after> the declaration.
You cannot write C<$x = 1; my $x;> and expect the first assignment to
affect the same variable. It will instead assign to an C<$x> declared
previously in an outer scope, or to a global variable.
Note also that the variable is not visible until the following
I<statement>. This means that in C<my $x = 1 + $x> the second $x refers
to one declared previously.
=item *
C<my> variables are scoped to the current block, not to the current
function. If you write C<{my $x;} $x;>, the second C<$x> does not refer to
the one declared inside the block.
=item *
An object's members cannot be made accessible as variables. The closest
Perl equivalent to C<with(object) { method() }> is C<for>, which can alias
C<$_> to the object:
for ($object) {
$_->method;
}
=item *
The object or class on which a method is called is passed as one of the
method's arguments, not as a separate C<this> value.
=back
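Two of the traps above, the meaning of binary C<+> and the block scope of C<my> variables, can be checked with a small sketch. The helper names are illustrative only.

```perl
use strict;
use warnings;

# "+" always adds numerically; "." concatenates.
sub add_and_join {
    my ($left, $right) = @_;
    return ($left + $right, $left . $right);
}

# A "my" variable is scoped to its enclosing block, not to the function.
sub block_scope {
    my $x = 'outer';
    {
        my $x = 'inner';    # a different variable, gone after the block
    }
    return $x;
}

my ($sum, $joined) = add_and_join('2', '3');
print "$sum $joined ", block_scope(), "\n";   # prints "5 23 outer"
```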
=head2 Sed Traps
Seasoned B<sed> programmers should take note of the following:
=over 4
=item *
A Perl program executes only once, not once for each input line. You can
do an implicit loop with C<-n> or C<-p>.
=item *
Backreferences in substitutions use "$" rather than "\".
=item *
The pattern matching metacharacters "(", ")", and "|" do not have backslashes
in front.
=item *
The range operator is C<...>, rather than comma.
=back
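As a rough illustration of the backreference and metacharacter points, here is how a B<sed> command such as C<s/\(foo\)-\(bar\)/\2-\1/> might be written in Perl. The helper name C<swap_pair> is illustrative.

```perl
use strict;
use warnings;

# Plain parentheses group, and the replacement refers to $1 and $2.
sub swap_pair {
    my ($text) = @_;
    (my $copy = $text) =~ s/(\w+)-(\w+)/$2-$1/;
    return $copy;
}

print swap_pair("foo-bar"), "\n";   # prints "bar-foo"
```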
=head2 Shell Traps
Sharp shell programmers should take note of the following:
=over 4
=item *
The backtick operator does variable interpolation without regard to
the presence of single quotes in the command.
=item *
The backtick operator does no translation of the return value, unlike B<csh>.
=item *
Shells (especially B<csh>) do several levels of substitution on each
command line. Perl does substitution in only certain constructs
such as double quotes, backticks, angle brackets, and search patterns.
=item *
Shells interpret scripts a little bit at a time. Perl compiles the
entire program before executing it (except for C<BEGIN> blocks, which
execute at compile time).
=item *
The arguments are available via @ARGV, not $1, $2, etc.
=item *
The environment is not automatically made available as separate scalar
variables.
=item *
The shell's C<test> uses "=", "!=", "<" etc for string comparisons and "-eq",
"-ne", "-lt" etc for numeric comparisons. This is the reverse of Perl, which
uses C<eq>, C<ne>, C<lt> for string comparisons, and C<==>, C<!=>, C<< < >> etc
for numeric comparisons.
=back
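The difference between string and numeric comparison is easiest to see with values that order differently under each. The sketch below also shows that environment variables are reached through C<%ENV>; the helper name C<compare_both_ways> is illustrative.

```perl
use strict;
use warnings;

# String operators compare text; numeric operators compare numbers.
sub compare_both_ways {
    my ($left, $right) = @_;
    return {
        as_strings => ($left lt $right ? 1 : 0),
        as_numbers => ($left <  $right ? 1 : 0),
    };
}

# "10" sorts before "9" as a string but is larger as a number.
my $result = compare_both_ways('10', '9');
print "lt: $result->{as_strings}, <: $result->{as_numbers}\n";   # prints "lt: 1, <: 0"

# Environment variables live in %ENV, not in scalars of the same name.
print "PATH is ", (defined $ENV{PATH} ? "set" : "unset"), "\n";
```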
=head2 Perl Traps
Practicing Perl Programmers should take note of the following:
=over 4
=item *
Remember that many operations behave differently in a list
context than they do in a scalar one. See L<perldata> for details.
=item *
Avoid barewords if you can, especially all lowercase ones.
You can't tell by just looking at it whether a bareword is
a function or a string. By using quotes on strings and
parentheses on function calls, you won't ever get them confused.
=item *
You cannot discern from mere inspection which builtins
are unary operators (like chop() and chdir())
and which are list operators (like print() and unlink()).
(Unless prototyped, user-defined subroutines can B<only> be list
operators, never unary ones.) See L<perlop> and L<perlsub>.
=item *
People have a hard time remembering that some functions
default to $_, or @ARGV, or whatever, but that others which
you might expect to do not.
=item *
The <FH> construct is not the name of the filehandle, it is a readline
operation on that handle. The data read is assigned to $_ only if the
file read is the sole condition in a while loop:
while (<FH>) { }
while (defined($_ = <FH>)) { }
<FH>; # data discarded!
=item *
Remember not to use C<=> when you need C<=~>;
these two constructs are quite different:
$x = /foo/;
$x =~ /foo/;
=item *
The C<do {}> construct isn't a real loop that you can use
loop control on.
=item *
Use C<my()> for local variables whenever you can get away with
it (but see L<perlform> for where you can't).
Using C<local()> actually gives a local value to a global
variable, which leaves you open to unforeseen side-effects
of dynamic scoping.
=item *
If you localize an exported variable in a module, its exported value will
not change. The local name becomes an alias to a new value but the
external name is still an alias for the original.
=back
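A short sketch of two of the traps above: the effect of context on an array, and the difference between C<=> and C<=~>. The helper names are illustrative only.

```perl
use strict;
use warnings;

# The same array yields different results depending on context.
sub contexts {
    my @items   = @_;
    my $count   = @items;    # scalar context: number of elements
    my ($first) = @items;    # list context: first element
    return ($count, $first);
}

# "=" stores the result of matching $_; "=~" matches the named variable.
sub assign_vs_bind {
    my ($text) = @_;
    local $_ = 'no match here';
    my $assigned = /foo/ ? 1 : 0;            # tests $_, not $text
    my $bound    = $text =~ /foo/ ? 1 : 0;   # tests $text
    return ($assigned, $bound);
}

print join(' ', contexts(qw(a b c)), assign_vs_bind('foo')), "\n";   # prints "3 a 0 1"
```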
As always, if any of these are ever officially declared as bugs,
they'll be fixed and removed.
=head1 NAME
perlcall - Perl calling conventions from C
=head1 DESCRIPTION
The purpose of this document is to show you how to call Perl subroutines
directly from C, i.e., how to write I<callbacks>.
Apart from discussing the C interface provided by Perl for writing
callbacks the document uses a series of examples to show how the
interface actually works in practice. In addition some techniques for
coding callbacks are covered.
Examples where callbacks are necessary include
=over 5
=item * An Error Handler
You have created an XSUB interface to an application's C API.
A fairly common feature in applications is to allow you to define a C
function that will be called whenever something nasty occurs. What we
would like is to be able to specify a Perl subroutine that will be
called instead.
=item * An Event-Driven Program
The classic example of where callbacks are used is when writing an
event driven program, such as for an X11 application. In this case
you register functions to be called whenever specific events occur,
e.g., a mouse button is pressed, the cursor moves into a window or a
menu item is selected.
=back
Although the techniques described here are applicable when embedding
Perl in a C program, this is not the primary goal of this document.
There are other details that must be considered and are specific to
embedding Perl. For details on embedding Perl in C refer to
L<perlembed>.
Before you launch yourself head first into the rest of this document,
it would be a good idea to have read the following two documents--L<perlxs>
and L<perlguts>.
=head1 THE CALL_ FUNCTIONS
Although this stuff is easier to explain using examples, you first need
to be aware of a few important definitions.
Perl has a number of C functions that allow you to call Perl
subroutines. They are
I32 call_sv(SV* sv, I32 flags);
I32 call_pv(char *subname, I32 flags);
I32 call_method(char *methname, I32 flags);
I32 call_argv(char *subname, I32 flags, char **argv);
The key function is I<call_sv>. All the other functions are
fairly simple wrappers which make it easier to call Perl subroutines in
special cases. At the end of the day they will all call I<call_sv>
to invoke the Perl subroutine.
All the I<call_*> functions have a C<flags> parameter which is
used to pass a bit mask of options to Perl. This bit mask operates
identically for each of the functions. The settings available in the
bit mask are discussed in L</FLAG VALUES>.
Each of the functions will now be discussed in turn.
=over 5
=item call_sv
I<call_sv> takes two parameters. The first, C<sv>, is an SV*.
This allows you to specify the Perl subroutine to be called either as a
C string (which has first been converted to an SV) or a reference to a
subroutine. The section, L</Using call_sv>, shows how you can make
use of I<call_sv>.
=item call_pv
The function, I<call_pv>, is similar to I<call_sv> except it
expects its first parameter to be a C char* which identifies the Perl
subroutine you want to call, e.g., C<call_pv("fred", 0)>. If the
subroutine you want to call is in another package, just include the
package name in the string, e.g., C<"pkg::fred">.
=item call_method
The function I<call_method> is used to call a method from a Perl
class. The parameter C<methname> corresponds to the name of the method
to be called. Note that the class that the method belongs to is passed
on the Perl stack rather than in the parameter list. This class can be
either the name of the class (for a static method) or a reference to an
object (for a virtual method). See L<perlobj> for more information on
static and virtual methods and L</Using call_method> for an example
of using I<call_method>.
=item call_argv
I<call_argv> calls the Perl subroutine specified by the C string
stored in the C<subname> parameter. It also takes the usual C<flags>
parameter. The final parameter, C<argv>, consists of a NULL-terminated
list of C strings to be passed as parameters to the Perl subroutine.
See L</Using call_argv>.
=back
All the functions return an integer. This is a count of the number of
items returned by the Perl subroutine. The actual items returned by the
subroutine are stored on the Perl stack.
As a general rule you should I<always> check the return value from
these functions. Even if you are expecting only a particular number of
values to be returned from the Perl subroutine, there is nothing to
stop someone from doing something unexpected--don't say you haven't
been warned.
=head1 FLAG VALUES
The C<flags> parameter in all the I<call_*> functions is one of G_VOID,
G_SCALAR, or G_ARRAY, which indicate the call context, OR'ed together
with a bit mask of any combination of the other G_* symbols defined below.
=head2 G_VOID
Calls the Perl subroutine in a void context.
This flag has 2 effects:
=over 5
=item 1.
It indicates to the subroutine being called that it is executing in
a void context (if it executes I<wantarray> the result will be the
undefined value).
=item 2.
It ensures that nothing is actually returned from the subroutine.
=back
The value returned by the I<call_*> function indicates how many
items have been returned by the Perl subroutine--in this case it will
be 0.
=head2 G_SCALAR
Calls the Perl subroutine in a scalar context. This is the default
context flag setting for all the I<call_*> functions.
This flag has 2 effects:
=over 5
=item 1.
It indicates to the subroutine being called that it is executing in a
scalar context (if it executes I<wantarray> the result will be false).
=item 2.
It ensures that only a scalar is actually returned from the subroutine.
The subroutine can, of course, ignore the I<wantarray> and return a
list anyway. If so, then only the last element of the list will be
returned.
=back
The value returned by the I<call_*> function indicates how many
items have been returned by the Perl subroutine - in this case it will
be either 0 or 1.
If 0, then you have specified the G_DISCARD flag.
If 1, then the item actually returned by the Perl subroutine will be
stored on the Perl stack - the section L</Returning a Scalar> shows how
to access this value on the stack. Remember that regardless of how
many items the Perl subroutine returns, only the last one will be
accessible from the stack - think of the case where only one value is
returned as being a list with only one element. Any other items that
were returned will not exist by the time control returns from the
I<call_*> function. The section L</Returning a List in Scalar
Context> shows an example of this behavior.
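The "last element only" rule is ordinary Perl behaviour, so it can be observed from Perl itself without any C code. The following is a Perl-level sketch, not a use of the I<call_*> functions; C<three_values> is an illustrative subroutine that ignores its calling context.

```perl
use strict;
use warnings;

# Returns a list regardless of the context it is called in.
sub three_values { return ('first', 'second', 'third') }

# Called in scalar context, only the last element survives.
my $only = three_values();

# Called in list context, every element is available.
my @all = three_values();

print "$only / @all\n";   # prints "third / first second third"
```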
=head2 G_ARRAY
Calls the Perl subroutine in a list context.
As with G_SCALAR, this flag has 2 effects:
=over 5
=item 1.
It indicates to the subroutine being called that it is executing in a
list context (if it executes I<wantarray> the result will be true).
=item 2.
It ensures that all items returned from the subroutine will be
accessible when control returns from the I<call_*> function.
=back
The value returned by the I<call_*> function indicates how many
items have been returned by the Perl subroutine.
If 0, then you have specified the G_DISCARD flag.
If not 0, then it will be a count of the number of items returned by
the subroutine. These items will be stored on the Perl stack. The
section L</Returning a List of Values> gives an example of using the
G_ARRAY flag and the mechanics of accessing the returned items from the
Perl stack.
=head2 G_DISCARD
By default, the I<call_*> functions place the items returned
by the Perl subroutine on the stack. If you are not interested in
these items, then setting this flag will make Perl get rid of them
automatically for you. Note that it is still possible to indicate a
context to the Perl subroutine by using either G_SCALAR or G_ARRAY.
If you do not set this flag then it is I<very> important that you make
sure that any temporaries (i.e., parameters passed to the Perl
subroutine and values returned from the subroutine) are disposed of
yourself. The section L</Returning a Scalar> gives details of how to
dispose of these temporaries explicitly and the section L</Using Perl to
Dispose of Temporaries> discusses the specific circumstances where you
can ignore the problem and let Perl deal with it for you.
=head2 G_NOARGS
Whenever a Perl subroutine is called using one of the I<call_*>
functions, it is assumed by default that parameters are to be passed to
the subroutine. If you are not passing any parameters to the Perl
subroutine, you can save a bit of time by setting this flag. It has
the effect of not creating the C<@_> array for the Perl subroutine.
Although the functionality provided by this flag may seem
straightforward, it should be used only if there is a good reason to do
so. The reason for being cautious is that, even if you have specified
the G_NOARGS flag, it is still possible for the Perl subroutine that
has been called to think that you have passed it parameters.
In fact, what can happen is that the Perl subroutine you have called
can access the C<@_> array from a previous Perl subroutine. This will
occur when the code that is executing the I<call_*> function has
itself been called from another Perl subroutine. The code below
illustrates this:
sub fred
{ print "@_\n" }
sub joe
{ &fred }
&joe(1,2,3);
This will print
1 2 3
What has happened is that C<fred> accesses the C<@_> array which
belongs to C<joe>.
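For contrast, here is a small Perl-only sketch showing that the sharing happens with the C<&name> form but not with an ordinary parenthesised call. The subroutine names are illustrative.

```perl
use strict;
use warnings;

# Report how many arguments are visible in @_.
sub show_args { return scalar @_ }

# "&show_args" with no parentheses reuses the caller's @_ ...
sub shares_args { return &show_args; }

# ... whereas "show_args()" starts with an empty @_ of its own.
sub fresh_args { return show_args(); }

print shares_args(1, 2, 3), " ", fresh_args(1, 2, 3), "\n";   # prints "3 0"
```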
=head2 G_EVAL
It is possible for the Perl subroutine you are calling to terminate
abnormally, e.g., by calling I<die> explicitly or by not actually
existing. By default, when either of these events occurs, the
process will terminate immediately. If you want to trap this
type of event, specify the G_EVAL flag. It will put an I<eval { }>
around the subroutine call.
Whenever control returns from the I<call_*> function you need to
check the C<$@> variable as you would in a normal Perl script.
The value returned from the I<call_*> function is dependent on
what other flags have been specified and whether an error has
occurred. Here are all the different cases that can occur:
=over 5
=item *
If the I<call_*> function returns normally, then the value
returned is as specified in the previous sections.
=item *
If G_DISCARD is specified, the return value will always be 0.
=item *
If G_ARRAY is specified I<and> an error has occurred, the return value
will always be 0.
=item *
If G_SCALAR is specified I<and> an error has occurred, the return value
will be 1 and the value on the top of the stack will be I<undef>. This
means that if you have already detected the error by checking C<$@> and
you want the program to continue, you must remember to pop the I<undef>
from the stack.
=back
See L</Using G_EVAL> for details on using G_EVAL.
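Since G_EVAL behaves like wrapping the call in I<eval { }>, the Perl-level equivalent may help to fix the idea. This sketch is plain Perl rather than C, and both subroutine names are illustrative.

```perl
use strict;
use warnings;

# Illustrative subroutine that fails for negative input.
sub checked_sqrt {
    my ($n) = @_;
    die "negative input: $n\n" if $n < 0;
    return sqrt $n;
}

# Trap the failure and report it instead of letting the program exit.
sub safe_sqrt {
    my ($n) = @_;
    my $value = eval { checked_sqrt($n) };
    return defined $value ? $value : "error: $@";
}

print safe_sqrt(9), "\n";   # prints "3"
print safe_sqrt(-1);        # prints "error: negative input: -1"
```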
=head2 G_KEEPERR
Using the G_EVAL flag described above will always set C<$@>: clearing
it if there was no error, and setting it to describe the error if there
was an error in the called code. This is what you want if your intention
is to handle possible errors, but sometimes you just want to trap errors
and stop them interfering with the rest of the program.
This scenario will mostly be applicable to code that is meant to be called
from within destructors, asynchronous callbacks, and signal handlers.
In such situations, where the code being called has little relation to the
surrounding dynamic context, the main program needs to be insulated from
errors in the called code, even if they can't be handled intelligently.
It may also be useful to do this with code for C<__DIE__> or C<__WARN__>
hooks, and C<tie> functions.
The G_KEEPERR flag is meant to be used in conjunction with G_EVAL in
I<call_*> functions that are used to implement such code, or with
C<eval_sv>. This flag has no effect on the C<call_*> functions when
G_EVAL is not used.
When G_KEEPERR is used, any error in the called code will terminate the
call as usual, and the error will not propagate beyond the call (as usual
for G_EVAL), but it will not go into C<$@>. Instead the error will be
converted into a warning, prefixed with the string "\t(in cleanup)".
This can be disabled using C<no warnings 'misc'>. If there is no error,
C<$@> will not be cleared.
Note that the G_KEEPERR flag does not propagate into inner evals; these
may still set C<$@>.
The G_KEEPERR flag was introduced in Perl version 5.002.
See L</Using G_KEEPERR> for an example of a situation that warrants the
use of this flag.
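The "(in cleanup)" warning can be observed from plain Perl, because exceptions raised in destructors are handled in this way. The sketch below assumes a reasonably recent perl with C<use warnings> in effect; the class C<Noisy> and the helper C<destroy_and_collect> are illustrative only.

```perl
use strict;
use warnings;

# Illustrative class whose destructor always fails.
package Noisy;
sub new     { return bless {}, shift }
sub DESTROY { die "destructor failed\n" }

package main;

# Collect warnings instead of printing them, then destroy an object.
sub destroy_and_collect {
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    local $@ = 'earlier error';
    { my $object = Noisy->new; }    # DESTROY runs at the end of this block
    return ($@, @warnings);
}

my ($error, @warnings) = destroy_and_collect();
print "\$@ after the destructor: '$error'\n";
print "warning: $_" for @warnings;
```

The error from the destructor should show up among the collected warnings rather than in C<$@>.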
=head2 Determining the Context
As mentioned above, you can determine the context of the currently
executing subroutine in Perl with I<wantarray>. The equivalent test
can be made in C by using the C<GIMME_V> macro, which returns
C<G_ARRAY> if you have been called in a list context, C<G_SCALAR> if
in a scalar context, or C<G_VOID> if in a void context (i.e., the
return value will not be used). An older version of this macro is
called C<GIMME>; in a void context it returns C<G_SCALAR> instead of
C<G_VOID>. An example of using the C<GIMME_V> macro is shown in
section L</Using GIMME_V>.
=head1 EXAMPLES
Enough of the definition talk! Let's have a few examples.
Perl provides many macros to assist in accessing the Perl stack.
Wherever possible, these macros should be used when interfacing
to Perl internals. Doing so should make the code less vulnerable
to any changes made to Perl in the future.
Another point worth noting is that in the first series of examples I
have made use of only the I<call_pv> function. This has been done
to keep the code simpler and ease you into the topic. Wherever
possible, if the choice is between using I<call_pv> and
I<call_sv>, you should always try to use I<call_sv>. See
L</Using call_sv> for details.
=head2 No Parameters, Nothing Returned
This first trivial example will call a Perl subroutine, I<PrintUID>, to
print out the UID of the process.
sub PrintUID
{
print "UID is $<\n";
}
and here is a C function to call it
static void
call_PrintUID(void)
{
dSP;
PUSHMARK(SP);
call_pv("PrintUID", G_DISCARD|G_NOARGS);
}
Simple, eh?
A few points to note about this example:
=over 5
=item 1.
Ignore C<dSP> and C<PUSHMARK(SP)> for now. They will be discussed in
the next example.
=item 2.
We aren't passing any parameters to I<PrintUID> so G_NOARGS can be
specified.
=item 3.
We aren't interested in anything returned from I<PrintUID>, so
G_DISCARD is specified. Even if I<PrintUID> was changed to
return some value(s), having specified G_DISCARD will mean that they
will be wiped by the time control returns from I<call_pv>.
=item 4.
As I<call_pv> is being used, the Perl subroutine is specified as a
C string. In this case the subroutine name has been 'hard-wired' into the
code.
=item 5.
Because we specified G_DISCARD, it is not necessary to check the value
returned from I<call_pv>. It will always be 0.
=back
=head2 Passing Parameters
Now let's make a slightly more complex example. This time we want to
call a Perl subroutine, C<LeftString>, which will take 2 parameters--a
string ($s) and an integer ($n). The subroutine will simply
print the first $n characters of the string.
So the Perl subroutine would look like this:
sub LeftString
{
my($s, $n) = @_;
print substr($s, 0, $n), "\n";
}
The C function required to call I<LeftString> would look like this:
static void
call_LeftString(char *a, int b)
{
dSP;
ENTER;
SAVETMPS;
PUSHMARK(SP);
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSVpv(a, 0)));
PUSHs(sv_2mortal(newSViv(b)));
PUTBACK;
call_pv("LeftString", G_DISCARD);
FREETMPS;
LEAVE;
}
Here are a few notes on the C function I<call_LeftString>.
=over 5
=item 1.
Parameters are passed to the Perl subroutine using the Perl stack.
This is the purpose of the code beginning with the line C<dSP> and
ending with the line C<PUTBACK>. The C<dSP> declares a local copy
of the stack pointer. This local copy should B<always> be accessed
as C<SP>.
=item 2.
If you are going to put something onto the Perl stack, you need to know
where to put it. This is the purpose of the macro C<dSP>--it declares
and initializes a I<local> copy of the Perl stack pointer.
All the other macros which will be used in this example require you to
have used this macro.
The exception to this rule is if you are calling a Perl subroutine
directly from an XSUB function. In this case it is not necessary to
use the C<dSP> macro explicitly--it will be declared for you
automatically.
=item 3.
Any parameters to be pushed onto the stack should be bracketed by the
C<PUSHMARK> and C<PUTBACK> macros. The purpose of these two macros, in
this context, is to count automatically the number of parameters you
are pushing. Then, when Perl creates the C<@_> array for the
subroutine, it knows how big to make it.
The C<PUSHMARK> macro tells Perl to make a mental note of the current
stack pointer. Even if you aren't passing any parameters (like the
example shown in the section L</No Parameters, Nothing Returned>) you
must still call the C<PUSHMARK> macro before you can call any of the
I<call_*> functions--Perl still needs to know that there are no
parameters.
The C<PUTBACK> macro sets the global copy of the stack pointer to be
the same as our local copy. If we didn't do this, I<call_pv>
wouldn't know where the two parameters we pushed were--remember that
up to now all the stack pointer manipulation we have done is with our
local copy, I<not> the global copy.
=item 4.
Next, we come to EXTEND and PUSHs. This is where the parameters
actually get pushed onto the stack. In this case we are pushing a
string and an integer.
Alternatively you can use the XPUSHs() macro, which combines a
C<EXTEND(SP, 1)> and C<PUSHs()>. This is less efficient if you're
pushing multiple values.
See L<perlguts/"XSUBs and the Argument Stack"> for details
on how the PUSH macros work.
=item 5.
Because we created temporary values (by means of sv_2mortal() calls)
we will have to tidy up the Perl stack and dispose of mortal SVs.
This is the purpose of
ENTER;
SAVETMPS;
at the start of the function, and
FREETMPS;
LEAVE;
at the end. The C<ENTER>/C<SAVETMPS> pair creates a boundary for any
temporaries we create. This means that the temporaries we get rid of
will be limited to those which were created after these calls.
The C<FREETMPS>/C<LEAVE> pair will get rid of any values returned by
the Perl subroutine (see next example), plus it will also dump the
mortal SVs we have created. Having C<ENTER>/C<SAVETMPS> at the
beginning of the code makes sure that no other mortals are destroyed.
Think of these macros as working a bit like C<{> and C<}> in Perl
to limit the scope of local variables.
See the section L</Using Perl to Dispose of Temporaries> for details of
an alternative to using these macros.
=item 6.
Finally, I<LeftString> can now be called via the I<call_pv> function.
The only flag specified this time is G_DISCARD. Because we are passing
2 parameters to the Perl subroutine this time, we have not specified
G_NOARGS.
=back
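The interplay between C<PUSHMARK>, the local stack pointer, and
C<PUTBACK> described in the notes above can be imitated in a few lines
of plain C. What follows is a toy model only, not the Perl API: it does
not include F<perl.h>, the names C<toy_stack>, C<toy_mark>,
C<toy_global_sp>, C<toy_call>, and C<toy_caller> are invented for
illustration, and the real macros differ in detail (for instance, the
real stack pointer addresses the topmost item rather than the next free
slot). It is meant to show only why the callee sees no arguments unless
the local pointer is published.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names imitate, but are not, the real Perl API. */
#define TOY_STACK_MAX 16

static int  toy_stack[TOY_STACK_MAX];   /* the shared argument stack       */
static int *toy_global_sp = toy_stack;  /* the "global" stack pointer      */
static int *toy_mark      = toy_stack;  /* position noted by the mark step */

/* Callee: the argument count is the distance from the mark to the
 * global stack pointer.  The local pointer in the caller is invisible. */
static ptrdiff_t toy_call(void)
{
    ptrdiff_t nargs = toy_global_sp - toy_mark;

    toy_global_sp = toy_mark;     /* the callee consumes its arguments */
    return nargs;
}

/* Caller: works on a local copy of the stack pointer. */
static ptrdiff_t toy_caller(int a, int b, int publish)
{
    int *sp = toy_global_sp;      /* plays the role of dSP             */

    toy_mark = sp;                /* plays the role of PUSHMARK(SP)    */
    assert(sp + 2 <= toy_stack + TOY_STACK_MAX);  /* room for 2 items  */
    *sp++ = a;                    /* plays the role of PUSHs(...)      */
    *sp++ = b;
    if (publish)
        toy_global_sp = sp;       /* plays the role of PUTBACK         */

    return toy_call();
}
```

With C<publish> set, C<toy_caller(7, 4, 1)> reports two arguments.
Without it, C<toy_caller(7, 4, 0)> reports none, because the callee
never learns that the local pointer has moved.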
=head2 Returning a Scalar
Now for an example of dealing with the items returned from a Perl
subroutine.
Here is a Perl subroutine, I<Adder>, that takes 2 integer parameters
and simply returns their sum.
sub Adder
{
my($a, $b) = @_;
$a + $b;
}
Because we are now concerned with the return value from I<Adder>, the C
function required to call it is now a bit more complex.
static void
call_Adder(int a, int b)
{
dSP;
int count;
ENTER;
SAVETMPS;
PUSHMARK(SP);
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSViv(a)));
PUSHs(sv_2mortal(newSViv(b)));
PUTBACK;
count = call_pv("Adder", G_SCALAR);
SPAGAIN;
if (count != 1)
croak("Big trouble\n");
printf ("The sum of %d and %d is %d\n", a, b, (int)POPi);
PUTBACK;
FREETMPS;
LEAVE;
}
Points to note this time are
=over 5
=item 1.
The only flag specified this time was G_SCALAR. That means that the C<@_>
array will be created and that the value returned by I<Adder> will
still exist after the call to I<call_pv>.
=item 2.
The purpose of the macro C<SPAGAIN> is to refresh the local copy of the
stack pointer. This is necessary because it is possible that the memory
allocated to the Perl stack has been reallocated during the
I<call_pv> call.
If you are making use of the Perl stack pointer in your code you must
always refresh the local copy using SPAGAIN whenever you make use
of the I<call_*> functions or any other Perl internal function.
=item 3.
Although only a single value was expected to be returned from I<Adder>,
it is still good practice to check the return code from I<call_pv>
anyway.
Expecting a single value is not quite the same as knowing that there
will be one. If someone modified I<Adder> to return a list and we
didn't check for that possibility and take appropriate action the Perl
stack would end up in an inconsistent state. That is something you
I<really> don't want to happen ever.
=item 4.
The C<POPi> macro is used here to pop the return value from the stack.
In this case we wanted an integer, so C<POPi> was used.
Here is the complete list of POP macros available, along with the types
they return.
    POPs        SV
    POPp        pointer (PV)
    POPpbytex   pointer to bytes (PV)
    POPn        double (NV)
    POPi        integer (IV)
    POPu        unsigned integer (UV)
    POPl        long
    POPul       unsigned long
Since these macros have side effects, don't use them as arguments to
macros that may evaluate their argument several times, for example:
/* Bad idea, don't do this */
STRLEN len;
const char *s = SvPV(POPs, len);
Instead, use a temporary:
STRLEN len;
SV *sv = POPs;
const char *s = SvPV(sv, len);
or a macro that guarantees it will evaluate its arguments only once:
STRLEN len;
const char *s = SvPVx(POPs, len);
=item 5.
The final C<PUTBACK> is used to leave the Perl stack in a consistent
state before exiting the function. This is necessary because when we
popped the return value from the stack with C<POPi> it updated only our
local copy of the stack pointer. Remember, C<PUTBACK> sets the global
stack pointer to be the same as our local copy.
=back
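The warning in note 4 about macros that evaluate their argument more
than once can be reproduced without Perl at all. The sketch below is a
toy model in plain C: C<toy_pop> stands in for a C<POP*> macro and
C<TOY_SQUARE> for any macro that mentions its argument twice. All of
the names are hypothetical.

```c
#include <assert.h>

/* Toy model only: a plain int stack standing in for the Perl stack. */
static int toy_values[8] = { 10, 20, 30, 40 };
static int toy_depth = 4;

/* Pops the top value; like the POP* macros, it has a side effect. */
static int toy_pop(void)
{
    assert(toy_depth > 0);
    return toy_values[--toy_depth];
}

/* A macro that evaluates its argument twice. */
#define TOY_SQUARE(x) ((x) * (x))

/* Bad: the pop is expanded twice, so two values leave the stack and
 * the result is the product of two different items. */
static int square_top_bad(void)
{
    return TOY_SQUARE(toy_pop());
}

/* Good: pop once into a temporary, then hand the temporary over. */
static int square_top_good(void)
{
    int v = toy_pop();

    return TOY_SQUARE(v);
}
```

C<square_top_good> removes exactly one item from the toy stack, whereas
C<square_top_bad> silently removes two.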
=head2 Returning a List of Values
Now, let's extend the previous example to return both the sum of the
parameters and the difference.
Here is the Perl subroutine
sub AddSubtract
{
my($a, $b) = @_;
($a+$b, $a-$b);
}
and this is the C function
static void
call_AddSubtract(int a, int b)
{
dSP;
int count;
ENTER;
SAVETMPS;
PUSHMARK(SP);
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSViv(a)));
PUSHs(sv_2mortal(newSViv(b)));
PUTBACK;
count = call_pv("AddSubtract", G_ARRAY);
SPAGAIN;
if (count != 2)
croak("Big trouble\n");
printf ("%d - %d = %d\n", a, b, (int)POPi);
printf ("%d + %d = %d\n", a, b, (int)POPi);
PUTBACK;
FREETMPS;
LEAVE;
}
If I<call_AddSubtract> is called like this
call_AddSubtract(7, 4);
then here is the output
7 - 4 = 3
7 + 4 = 11
Notes
=over 5
=item 1.
We wanted list context, so G_ARRAY was used.
=item 2.
Not surprisingly C<POPi> is used twice this time because we were
retrieving 2 values from the stack. The important thing to note is that
when using the C<POP*> macros they come off the stack in I<reverse>
order.
=back
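If the returned values are wanted in their original order, one option
is to fill an output array from its end while popping. This is a
minimal sketch in plain C, with ints in an array standing in for the
SVs on the Perl stack. The helper name C<toy_pop_list> is hypothetical
and is not part of the Perl API.

```c
#include <assert.h>

/* Toy model only: pop `count` items from a plain int stack into `out`
 * so that out[0] is the first value the callee returned. */
static void toy_pop_list(const int *stack, int *depth, int *out, int count)
{
    int i;

    assert(count >= 0 && count <= *depth);
    /* The last value returned is on top, so fill the output backwards. */
    for (i = count - 1; i >= 0; --i)
        out[i] = stack[--*depth];
}
```

For a stack holding 11 and then 3 (the sum and difference of 7 and 4),
popping two items this way leaves C<out[0]> as 11 and C<out[1]> as 3.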
=head2 Returning a List in Scalar Context
Say the Perl subroutine in the previous section was called in a scalar
context, like this
static void
call_AddSubScalar(int a, int b)
{
dSP;
int count;
int i;
ENTER;
SAVETMPS;
PUSHMARK(SP);
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSViv(a)));
PUSHs(sv_2mortal(newSViv(b)));
PUTBACK;
count = call_pv("AddSubtract", G_SCALAR);
SPAGAIN;
printf ("Items Returned = %d\n", count);
for (i = 1; i <= count; ++i)
printf ("Value %d = %d\n", i, (int)POPi);
PUTBACK;
FREETMPS;
LEAVE;
}
The other modification made is that I<call_AddSubScalar> will print the
number of items returned from the Perl subroutine and their value (for
simplicity it assumes that they are integer). So if
I<call_AddSubScalar> is called
call_AddSubScalar(7, 4);
then the output will be
Items Returned = 1
Value 1 = 3
In this case the main point to note is that only the last item in the
list returned from the subroutine I<AddSubtract> actually made it back to
I<call_AddSubScalar>.
=head2 Returning Data from Perl via the Parameter List
It is also possible to return values directly via the parameter
list--whether it is actually desirable to do it is another matter entirely.
The Perl subroutine, I<Inc>, below takes 2 parameters and increments
each directly.
sub Inc
{
++ $_[0];
++ $_[1];
}
and here is a C function to call it.
static void
call_Inc(int a, int b)
{
dSP;
int count;
SV * sva;
SV * svb;
ENTER;
SAVETMPS;
sva = sv_2mortal(newSViv(a));
svb = sv_2mortal(newSViv(b));
PUSHMARK(SP);
EXTEND(SP, 2);
PUSHs(sva);
PUSHs(svb);
PUTBACK;
count = call_pv("Inc", G_DISCARD);
if (count != 0)
croak ("call_Inc: expected 0 values from 'Inc', got %d\n",
count);
printf ("%d + 1 = %d\n", a, (int)SvIV(sva));
printf ("%d + 1 = %d\n", b, (int)SvIV(svb));
FREETMPS;
LEAVE;
}
To be able to access the two parameters that were pushed onto the stack
after control returns from I<call_pv>, it is necessary to make a note
of their addresses--thus the two variables C<sva> and C<svb>.
The reason this is necessary is that the area of the Perl stack which
held them will very likely have been overwritten by something else by
the time control returns from I<call_pv>.
=head2 Using G_EVAL
Now an example using G_EVAL. Below is a Perl subroutine which computes
the difference of its 2 parameters. If this would result in a negative
result, the subroutine calls I<die>.
sub Subtract
{
my ($a, $b) = @_;
die "death can be fatal\n" if $a < $b;
$a - $b;
}
and some C to call it
static void
call_Subtract(int a, int b)
{
dSP;
int count;
SV *err_tmp;
ENTER;
SAVETMPS;
PUSHMARK(SP);
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSViv(a)));
PUSHs(sv_2mortal(newSViv(b)));
PUTBACK;
count = call_pv("Subtract", G_EVAL|G_SCALAR);
SPAGAIN;
/* Check the eval first */
err_tmp = ERRSV;
if (SvTRUE(err_tmp))
{
printf ("Uh oh - %s\n", SvPV_nolen(err_tmp));
POPs;
}
else
{
if (count != 1)
croak("call_Subtract: wanted 1 value from 'Subtract', got %d\n",
count);
printf ("%d - %d = %d\n", a, b, (int)POPi);
}
PUTBACK;
FREETMPS;
LEAVE;
}
If I<call_Subtract> is called thus
call_Subtract(4, 5);
the following will be printed
Uh oh - death can be fatal
Notes
=over 5
=item 1.
We want to be able to catch the I<die> so we have used the G_EVAL
flag. Not specifying this flag would mean that the program would
terminate immediately at the I<die> statement in the subroutine
I<Subtract>.
=item 2.
The code
err_tmp = ERRSV;
if (SvTRUE(err_tmp))
{
printf ("Uh oh - %s\n", SvPV_nolen(err_tmp));
POPs;
}
is the direct equivalent of this bit of Perl
print "Uh oh - $@\n" if $@;
C<PL_errgv> is a perl global of type C<GV *> that points to the symbol
table entry containing the error. C<ERRSV> therefore refers to the C
equivalent of C<$@>. We use a local temporary, C<err_tmp>, since
C<ERRSV> is a macro that calls a function, and C<SvTRUE(ERRSV)> would
end up calling that function multiple times.
=item 3.
Note that the stack is popped using C<POPs> in the block where
C<SvTRUE(err_tmp)> is true. This is necessary because whenever a
I<call_*> function invoked with G_EVAL|G_SCALAR returns an error,
the top of the stack holds the value I<undef>. Because we want the
program to continue after detecting this error, it is essential that
the stack be tidied up by removing the I<undef>.
=back
=head2 Using G_KEEPERR
Consider this rather facetious example, where we have used an XS
version of the call_Subtract example above inside a destructor:
package Foo;
sub new { bless {}, $_[0] }
sub Subtract {
my($a,$b) = @_;
die "death can be fatal" if $a < $b;
$a - $b;
}
sub DESTROY { call_Subtract(5, 4); }
sub foo { die "foo dies"; }
package main;
{
my $foo = Foo->new;
eval { $foo->foo };
}
print "Saw: $@" if $@; # should be, but isn't
This example will fail to recognize that an error occurred inside the
C<eval {}>. Here's why: the call_Subtract code got executed while perl
was cleaning up temporaries when exiting the outer braced block, and because
call_Subtract is implemented with I<call_pv> using the G_EVAL
flag, it promptly reset C<$@>. This results in the failure of the
outermost test for C<$@>, and thereby the failure of the error trap.
Appending the G_KEEPERR flag, so that the I<call_pv> call in
call_Subtract reads:
count = call_pv("Subtract", G_EVAL|G_SCALAR|G_KEEPERR);
will preserve the error and restore reliable error handling.
=head2 Using call_sv
In all the previous examples I have 'hard-wired' the name of the Perl
subroutine to be called from C. Most of the time though, it is more
convenient to be able to specify the name of the Perl subroutine from
within the Perl script, and you'll want to use
L<call_sv|perlapi/call_sv>.
Consider the Perl code below
sub fred
{
print "Hello there\n";
}
CallSubPV("fred");
Here is a snippet of XSUB which defines I<CallSubPV>.
void
CallSubPV(name)
char * name
CODE:
PUSHMARK(SP);
call_pv(name, G_DISCARD|G_NOARGS);
That is fine as far as it goes. The limitation is that the Perl
subroutine can be specified only as a string, whereas Perl also allows
references to subroutines and anonymous subroutines.
This is where I<call_sv> is useful.
The code below for I<CallSubSV> is identical to I<CallSubPV> except
that the C<name> parameter is now defined as an SV* and we use
I<call_sv> instead of I<call_pv>.
void
CallSubSV(name)
SV * name
CODE:
PUSHMARK(SP);
call_sv(name, G_DISCARD|G_NOARGS);
Because we are using an SV to call I<fred> the following can all be used:
CallSubSV("fred");
CallSubSV(\&fred);
$ref = \&fred;
CallSubSV($ref);
CallSubSV( sub { print "Hello there\n" } );
As you can see, I<call_sv> gives you much greater flexibility in
how you can specify the Perl subroutine.
You should note that, if it is necessary to store the SV (C<name> in the
example above) which corresponds to the Perl subroutine so that it can
be used later in the program, it is not enough just to store a copy of the
pointer to the SV. Say the code above had been like this:
static SV * rememberSub;
void
SaveSub1(name)
SV * name
CODE:
rememberSub = name;
void
CallSavedSub1()
CODE:
PUSHMARK(SP);
call_sv(rememberSub, G_DISCARD|G_NOARGS);
The reason this is wrong is that, by the time you come to use the
pointer C<rememberSub> in C<CallSavedSub1>, it may or may not still refer
to the Perl subroutine that was recorded in C<SaveSub1>. This is
particularly true for these cases:
SaveSub1(\&fred);
CallSavedSub1();
SaveSub1( sub { print "Hello there\n" } );
CallSavedSub1();
By the time each of the C<SaveSub1> statements above has been executed,
the SV*s which corresponded to the parameters will no longer exist.
Expect an error message from Perl of the form
Can't use an undefined value as a subroutine reference at ...
for each of the C<CallSavedSub1> lines.
Similarly, with this code
$ref = \&fred;
SaveSub1($ref);
$ref = 47;
CallSavedSub1();
you can expect one of these messages (which one you actually get depends on
the version of Perl you are using)
Not a CODE reference at ...
Undefined subroutine &main::47 called ...
The variable $ref may have referred to the subroutine C<fred>
whenever the call to C<SaveSub1> was made but by the time
C<CallSavedSub1> gets called it now holds the number C<47>. Because we
saved only a pointer to the original SV in C<SaveSub1>, any changes to
$ref will be tracked by the pointer C<rememberSub>. This means that
whenever C<CallSavedSub1> gets called, it will attempt to execute the
code which is referenced by the SV* C<rememberSub>. In this case
though, it now refers to the integer C<47>, so expect Perl to complain
loudly.
A similar but more subtle problem is illustrated with this code:
$ref = \&fred;
SaveSub1($ref);
$ref = \&joe;
CallSavedSub1();
This time whenever C<CallSavedSub1> gets called it will execute the Perl
subroutine C<joe> (assuming it exists) rather than C<fred> as was
originally requested in the call to C<SaveSub1>.
To get around these problems it is necessary to take a full copy of the
SV. The code below shows C<SaveSub2> modified to do that.
/* this isn't thread-safe */
static SV * keepSub = (SV*)NULL;
void
SaveSub2(name)
SV * name
CODE:
/* Take a copy of the callback */
if (keepSub == (SV*)NULL)
/* First time, so create a new SV */
keepSub = newSVsv(name);
else
/* Been here before, so overwrite */
SvSetSV(keepSub, name);
void
CallSavedSub2()
CODE:
PUSHMARK(SP);
call_sv(keepSub, G_DISCARD|G_NOARGS);
To avoid creating a new SV every time C<SaveSub2> is called,
the function first checks to see if it has been called before. If not,
then space for a new SV is allocated and the reference to the Perl
subroutine C<name> is copied to the variable C<keepSub> in one
operation using C<newSVsv>. Thereafter, whenever C<SaveSub2> is called,
the existing SV, C<keepSub>, is overwritten with the new value using
C<SvSetSV>.
Note: using a static or global variable to store the SV isn't
thread-safe. You can either use the C<MY_CXT> mechanism documented in
L<perlxs/Safely Storing Static Data in XS> which is fast, or store the
values in perl global variables, using get_sv(), which is much slower.
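The difference between remembering a pointer (as C<SaveSub1> does) and
taking a copy (as C<SaveSub2> does) is ordinary C aliasing, and can be
seen without any Perl involved. The sketch below is a toy model in
plain C: a small struct stands in for an SV, and every name in it
(C<ToyValue>, C<toy_set>, C<save_pointer>, C<save_copy>, and so on) is
hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: a fixed-size string stands in for an SV. */
typedef struct { char name[16]; } ToyValue;

static const ToyValue *remembered;   /* SaveSub1 style: pointer only */
static ToyValue        kept;         /* SaveSub2 style: private copy */

static void toy_set(ToyValue *v, const char *name)
{
    snprintf(v->name, sizeof v->name, "%s", name);
}

static void save_pointer(const ToyValue *v) { remembered = v; }
static void save_copy(const ToyValue *v)    { kept = *v; }

/* Report what each saved form currently refers to. */
static int pointer_sees(const char *name)
{
    assert(remembered != NULL);
    return strcmp(remembered->name, name) == 0;
}

static int copy_sees(const char *name)
{
    return strcmp(kept.name, name) == 0;
}
```

If a value is set to C<"fred">, saved both ways, and then changed to
C<"joe">, the saved pointer follows the change while the copy still
holds C<"fred">. This mirrors the C<$ref = \&joe> case above.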
=head2 Using call_argv
Here is a Perl subroutine which prints whatever parameters are passed
to it.
sub PrintList
{
my(@list) = @_;
foreach (@list) { print "$_\n" }
}
And here is an example of I<call_argv> which will call
I<PrintList>.
static char * words[] = {"alpha", "beta", "gamma", "delta", NULL};
static void
call_PrintList(void)
{
call_argv("PrintList", G_DISCARD, words);
}
Note that it is not necessary to call C<PUSHMARK> in this instance.
This is because I<call_argv> will do it for you.
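The C<words> array above ends with C<NULL>. I<call_argv> takes a
NULL-terminated array of strings, so the terminator is what marks the
end of the argument list. A minimal sketch in plain C of how such a
list is walked follows. The helper name C<toy_count_args> is
hypothetical and is not part of the Perl API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the strings in a NULL-terminated array,
 * stopping at the terminator as any consumer of such a list must. */
static size_t toy_count_args(char *const *argv)
{
    size_t n = 0;

    assert(argv != NULL);
    while (argv[n] != NULL)
        ++n;
    return n;
}
```

Leaving out the final C<NULL> would let a loop like this run off the
end of the array.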
=head2 Using call_method
Consider the following Perl code:
{
package Mine;
sub new
{
my($type) = shift;
bless [@_]
}
sub Display
{
my ($self, $index) = @_;
print "$index: $$self[$index]\n";
}
sub PrintID
{
my($class) = @_;
print "This is Class $class version 1.0\n";
}
}
It implements just a very simple class to manage an array. Apart from
the constructor, C<new>, it declares two methods, one static and one
virtual. The static method, C<PrintID>, simply prints out the class
name and a version number. The virtual method, C<Display>, prints out a
single element of the array. Here is an all-Perl example of using it.
$a = Mine->new('red', 'green', 'blue');
$a->Display(1);
Mine->PrintID;
will print
1: green
This is Class Mine version 1.0
Calling a Perl method from C is fairly straightforward. The following
things are required:
=over 5
=item *
A reference to the object for a virtual method or the name of the class
for a static method
=item *
The name of the method
=item *
Any other parameters specific to the method
=back
Here is a simple XSUB which illustrates the mechanics of calling both
the C<PrintID> and C<Display> methods from C.
void
call_Method(ref, method, index)
SV * ref
char * method
int index
CODE:
PUSHMARK(SP);
EXTEND(SP, 2);
PUSHs(ref);
PUSHs(sv_2mortal(newSViv(index)));
PUTBACK;
call_method(method, G_DISCARD);
void
call_PrintID(class, method)
char * class
char * method
CODE:
PUSHMARK(SP);
XPUSHs(sv_2mortal(newSVpv(class, 0)));
PUTBACK;
call_method(method, G_DISCARD);
So the methods C<PrintID> and C<Display> can be invoked like this:
$a = Mine->new('red', 'green', 'blue');
call_Method($a, 'Display', 1);
call_PrintID('Mine', 'PrintID');
The only thing to note is that, in both the static and virtual methods,
the method name is not passed via the stack--it is used as the first
parameter to I<call_method>.
=head2 Using GIMME_V
Here is a trivial XSUB which prints the context in which it is
currently executing.
void
PrintContext()
CODE:
U8 gimme = GIMME_V;
if (gimme == G_VOID)
printf ("Context is Void\n");
else if (gimme == G_SCALAR)
printf ("Context is Scalar\n");
else
printf ("Context is Array\n");
And here is some Perl to test it.
PrintContext;
$a = PrintContext;
@a = PrintContext;
The output from that will be
Context is Void
Context is Scalar
Context is Array
=head2 Using Perl to Dispose of Temporaries
In the examples given to date, any temporaries created in the callback
(i.e., parameters passed on the stack to the I<call_*> function or
values returned via the stack) have been freed by one of these methods:
=over 5
=item *
Specifying the G_DISCARD flag with I<call_*>
=item *
Explicitly using the C<ENTER>/C<SAVETMPS>--C<FREETMPS>/C<LEAVE> pairing
=back
There is another method which can be used, namely letting Perl do it
for you automatically whenever it regains control after the callback
has terminated. This is done by simply not using the
ENTER;
SAVETMPS;
...
FREETMPS;
LEAVE;
sequence in the callback (and not, of course, specifying the G_DISCARD
flag).
If you are going to use this method you have to be aware of a possible
memory leak which can arise under very specific circumstances. To
explain these circumstances you need to know a bit about the flow of
control between Perl and the callback routine.
The examples given at the start of the document (an error handler and
an event driven program) are typical of the two main sorts of flow
control that you are likely to encounter with callbacks. There is a
very important distinction between them, so pay attention.
In the first example, an error handler, the flow of control could be as
follows. You have created an interface to an external library.
Control can reach the external library like this
perl --> XSUB --> external library
Whilst control is in the library, an error condition occurs. You have
previously set up a Perl callback to handle this situation, so it will
get executed. Once the callback has finished, control will drop back to
Perl again. Here is what the flow of control will be like in that
situation
    perl --> XSUB --> external library
                      ...
                      error occurs
                      ...
                      external library --> call_* --> perl
                                                       |
    perl <-- XSUB <-- external library <-- call_* <----+
After processing of the error using I<call_*> is completed,
control reverts back to Perl more or less immediately.
In the diagram, the further right you go the more deeply nested the
scope is. It is only when control is back with perl on the extreme
left of the diagram that you will have dropped back to the enclosing
scope and any temporaries you have left hanging around will be freed.
In the second example, an event driven program, the flow of control
will be more like this
    perl --> XSUB --> event handler
                      ...
                      event handler --> call_* --> perl
                                                    |
                      event handler <-- call_* <----+
                      ...
                      event handler --> call_* --> perl
                                                    |
                      event handler <-- call_* <----+
                      ...
                      event handler --> call_* --> perl
                                                    |
                      event handler <-- call_* <----+
In this case the flow of control can consist of only the repeated
sequence
event handler --> call_* --> perl
for practically the complete duration of the program. This means that
control may I<never> drop back to the surrounding scope in Perl at the
extreme left.
So what is the big problem? Well, if you are expecting Perl to tidy up
those temporaries for you, you might be in for a long wait. For Perl
to dispose of your temporaries, control must drop back to the
enclosing scope at some stage. In the event driven scenario that may
never happen. This means that, as time goes on, your program will
create more and more temporaries, none of which will ever be freed. As
each of these temporaries consumes some memory your program will
eventually consume all the available memory in your system--kapow!
So here is the bottom line--if you are sure that control will revert
back to the enclosing Perl scope fairly quickly after the end of your
callback, then it isn't absolutely necessary to dispose explicitly of
any temporaries you may have created. Mind you, if you are at all
uncertain about what to do, it doesn't do any harm to tidy up anyway.
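The build-up of temporaries in the event driven case can be pictured
with a simple counter. The sketch below is a toy model in plain C, not
the Perl API: C<toy_live_temps> stands in for Perl's collection of
mortal SVs, and C<toy_callback> and C<toy_event_loop> are hypothetical
names. Saving and restoring the counter plays the role that the
C<ENTER>/C<SAVETMPS> and C<FREETMPS>/C<LEAVE> pairs play in the earlier
examples.

```c
#include <assert.h>

/* Toy model only: a counter stands in for Perl's list of mortal SVs. */
static int toy_live_temps = 0;

static void toy_new_temp(void) { ++toy_live_temps; }

/* One callback invocation that creates two temporaries.  When `tidy`
 * is nonzero the level on entry is restored before returning. */
static void toy_callback(int tidy)
{
    int saved_level = toy_live_temps;   /* like ENTER; SAVETMPS; */

    toy_new_temp();
    toy_new_temp();
    if (tidy) {
        assert(toy_live_temps >= saved_level);
        toy_live_temps = saved_level;   /* like FREETMPS; LEAVE; */
    }
}

/* An event loop that never drops back to the enclosing scope.
 * Returns the number of temporaries still alive at the end. */
static int toy_event_loop(int events, int tidy)
{
    int i;

    toy_live_temps = 0;
    for (i = 0; i < events; ++i)
        toy_callback(tidy);
    return toy_live_temps;
}
```

In this model a thousand untidied callbacks leave two thousand
temporaries alive, while the tidied version leaves none.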
=head2 Strategies for Storing Callback Context Information
Potentially one of the trickiest problems to overcome when designing a
callback interface can be figuring out how to store the mapping between
the C callback function and the Perl equivalent.
To help understand why this can be a real problem first consider how a
callback is set up in an all C environment. Typically a C API will
provide a function to register a callback. This will expect a pointer
to a function as one of its parameters. Below is a call to a
hypothetical function C<register_fatal> which registers the C function
to get called when a fatal error occurs.
register_fatal(cb1);
The single parameter C<cb1> is a pointer to a function, so you must
have defined C<cb1> in your code, say something like this
static void
cb1(void)
{
printf ("Fatal Error\n");
exit(1);
}
Now change that to call a Perl subroutine instead
static SV * callback = (SV*)NULL;
static void
cb1(void)
{
dSP;
PUSHMARK(SP);
/* Call the Perl sub to process the callback */
call_sv(callback, G_DISCARD);
}
void
register_fatal(fn)
SV * fn
CODE:
/* Remember the Perl sub */
if (callback == (SV*)NULL)
callback = newSVsv(fn);
else
SvSetSV(callback, fn);
/* register the callback with the external library */
register_fatal(cb1);
where the Perl equivalent of C<register_fatal> and the callback it
registers, C<pcb1>, might look like this
# Register the sub pcb1
register_fatal(\&pcb1);
sub pcb1
{
die "I'm dying...\n";
}
The mapping between the C callback and the Perl equivalent is stored in
the global variable C<callback>.
This will be adequate if you only ever need to have one callback
registered at any time. An example could be an error handler like the
code sketched out above. Remember though, repeated calls to
C<register_fatal> will replace the previously registered callback
function with the new one.
Say for example you want to interface to a library which allows asynchronous
file i/o. In this case you may be able to register a callback whenever
a read operation has completed. To be of any use we want to be able to
call separate Perl subroutines for each file that is opened. As it
stands, the error handler example above would not be adequate as it
allows only a single callback to be defined at any time. What we
require is a means of storing the mapping between the opened file and
the Perl subroutine we want to be called for that file.
Say the i/o library has a function C<asynch_read> which associates a C
function C<ProcessRead> with a file handle C<fh>--this assumes that it
has also provided some routine to open the file and so obtain the file
handle.
asynch_read(fh, ProcessRead)
This may expect the C function I<ProcessRead> to have this form
void
ProcessRead(int fh, char *buffer)
{
...
}
To provide a Perl interface to this library we need to be able to map
between the C<fh> parameter and the Perl subroutine we want called. A
hash is a convenient mechanism for storing this mapping. The code
below shows a possible implementation
static HV * Mapping = (HV*)NULL;
void
asynch_read(fh, callback)
int fh
SV * callback
CODE:
/* If the hash doesn't already exist, create it */
if (Mapping == (HV*)NULL)
Mapping = newHV();
/* Save the fh -> callback mapping */
hv_store(Mapping, (char*)&fh, sizeof(fh), newSVsv(callback), 0);
/* Register with the C Library */
asynch_read(fh, asynch_read_if);
and C<asynch_read_if> could look like this
static void
asynch_read_if(int fh, char *buffer)
{
dSP;
SV ** sv;
/* Get the callback associated with fh */
sv = hv_fetch(Mapping, (char*)&fh , sizeof(fh), FALSE);
if (sv == (SV**)NULL)
croak("Internal error...\n");
PUSHMARK(SP);
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSViv(fh)));
PUSHs(sv_2mortal(newSVpv(buffer, 0)));
PUTBACK;
/* Call the Perl sub */
call_sv(*sv, G_DISCARD);
}
For completeness, here is C<asynch_close>. This shows how to remove
the entry from the hash C<Mapping>.
void
asynch_close(fh)
int fh
CODE:
/* Remove the entry from the hash */
(void) hv_delete(Mapping, (char*)&fh, sizeof(fh), G_DISCARD);
/* Now call the real asynch_close */
asynch_close(fh);
So the Perl interface would look like this
sub callback1
{
my($handle, $buffer) = @_;
}
# Register the Perl callback
asynch_read($fh, \&callback1);
asynch_close($fh);
The mapping between the C callback and Perl is stored in the global
hash C<Mapping> this time. Using a hash has the distinct advantage that
it allows an unlimited number of callbacks to be registered.
What if the interface provided by the C callback doesn't contain a
parameter from which the file handle to Perl subroutine mapping can be
determined? Say in the asynchronous i/o package, the callback function
gets passed only the C<buffer> parameter like this
void
ProcessRead(char *buffer)
{
...
}
Without the file handle there is no straightforward way to map from the
C callback to the Perl subroutine.
In this case a possible way around this problem is to predefine a
series of C functions to act as the interface to Perl, thus
#define MAX_CB 3
#define NULL_HANDLE -1
typedef void (*FnMap)(char *);
struct MapStruct {
FnMap Function;
SV * PerlSub;
int Handle;
};
static void fn1(char *);
static void fn2(char *);
static void fn3(char *);
static struct MapStruct Map [MAX_CB] =
{
{ fn1, NULL, NULL_HANDLE },
{ fn2, NULL, NULL_HANDLE },
{ fn3, NULL, NULL_HANDLE }
};
static void
Pcb(int index, char *buffer)
{
dSP;
PUSHMARK(SP);
XPUSHs(sv_2mortal(newSVpv(buffer, 0)));
PUTBACK;
/* Call the Perl sub */
call_sv(Map[index].PerlSub, G_DISCARD);
}
static void
fn1(char *buffer)
{
Pcb(0, buffer);
}
static void
fn2(char *buffer)
{
Pcb(1, buffer);
}
static void
fn3(char *buffer)
{
Pcb(2, buffer);
}
void
array_asynch_read(fh, callback)
int fh
SV * callback
CODE:
int index;
int null_index = MAX_CB;
/* Find the same handle or an empty entry */
for (index = 0; index < MAX_CB; ++index)
{
if (Map[index].Handle == fh)
break;
if (Map[index].Handle == NULL_HANDLE)
null_index = index;
}
if (index == MAX_CB && null_index == MAX_CB)
croak ("Too many callback functions registered\n");
if (index == MAX_CB)
index = null_index;
/* Save the file handle */
Map[index].Handle = fh;
/* Remember the Perl sub */
if (Map[index].PerlSub == (SV*)NULL)
Map[index].PerlSub = newSVsv(callback);
else
SvSetSV(Map[index].PerlSub, callback);
asynch_read(fh, Map[index].Function);
void
array_asynch_close(fh)
int fh
CODE:
int index;
/* Find the file handle */
for (index = 0; index < MAX_CB; ++ index)
if (Map[index].Handle == fh)
break;
if (index == MAX_CB)
croak ("could not close fh %d\n", fh);
Map[index].Handle = NULL_HANDLE;
SvREFCNT_dec(Map[index].PerlSub);
Map[index].PerlSub = (SV*)NULL;
asynch_close(fh);
In this case the functions C<fn1>, C<fn2>, and C<fn3> are used to
remember the Perl subroutine to be called. Each of the functions holds
a separate hard-wired index which is used in the function C<Pcb> to
access the C<Map> array and actually call the Perl subroutine.
There are some obvious disadvantages with this technique.
Firstly, the code is considerably more complex than with the previous
example.
Secondly, there is a hard-wired limit (in this case 3) to the number of
callbacks that can exist simultaneously. The only way to increase the
limit is by modifying the code to add more functions and then
recompiling. Nonetheless, as long as the number of functions is
chosen with some care, it is still a workable solution and in some
cases is the only one available.
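The same trick can be shown without any Perl API at all. The following
is a minimal plain-C sketch, assuming a library whose callback receives
only C<buffer>. The names C<RawCallback>, C<Handler>, C<claim_slot>,
C<dispatch>, C<record_hit> and C<last_slot> are invented for this
illustration; a C<Handler> stands in for the stored Perl sub.

```c
#include <stddef.h>

#define MAX_CB 3

/* What the (hypothetical) library calls: no context argument at all. */
typedef void (*RawCallback)(const char *buffer);

/* What we actually want to run; stands in for the stored Perl sub. */
typedef void (*Handler)(int slot, const char *buffer);

static Handler handlers[MAX_CB];   /* NULL means the slot is free */
static int last_slot = -1;         /* filled in by record_hit() below */

/* Shared body of every trampoline, playing the role of Pcb(). */
static void dispatch(int slot, const char *buffer)
{
    if (handlers[slot] != NULL)
        handlers[slot](slot, buffer);
}

/* One function per slot; each hard-wires its own index. */
static void tramp0(const char *buffer) { dispatch(0, buffer); }
static void tramp1(const char *buffer) { dispatch(1, buffer); }
static void tramp2(const char *buffer) { dispatch(2, buffer); }

static const RawCallback trampolines[MAX_CB] = { tramp0, tramp1, tramp2 };

/* Store `h` in the first free slot and return the trampoline to hand
   to the library, or NULL when all MAX_CB slots are taken. */
RawCallback claim_slot(Handler h)
{
    for (int i = 0; i < MAX_CB; ++i) {
        if (handlers[i] == NULL) {
            handlers[i] = h;
            return trampolines[i];
        }
    }
    return NULL;
}

/* Example handler: remembers which slot it was reached through. */
static void record_hit(int slot, const char *buffer)
{
    (void)buffer;
    last_slot = slot;
}
```

Each trampoline has a distinct address, so the library can tell the
registrations apart even though it passes no context. The cost is the
fixed C<MAX_CB> limit discussed above.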
To summarize, here are a number of possible methods for you to consider
for storing the mapping between C and the Perl callback
=over 5
=item 1. Ignore the problem - Allow only 1 callback
For a lot of situations, like interfacing to an error handler, this may
be a perfectly adequate solution.
=item 2. Create a sequence of callbacks - hard wired limit
If it is impossible to tell from the parameters passed back from the C
callback what the context is, then you may need to create a sequence of C
callback interface functions, and store pointers to each in an array.
=item 3. Use a parameter to map to the Perl callback
A hash is an ideal mechanism to store the mapping between C and Perl.
=back
=head2 Alternate Stack Manipulation
Although I have made use of only the C<POP*> macros to access values
returned from Perl subroutines, it is also possible to bypass these
macros and read the stack using the C<ST> macro (See L<perlxs> for a
full description of the C<ST> macro).
Most of the time the C<POP*> macros should be adequate; the main
problem with them is that they force you to process the returned values
in sequence. This may not be the most suitable way to process the
values in some cases. What we want is to be able to access the stack in
a random order. The C<ST> macro as used when coding an XSUB is ideal
for this purpose.
The code below is the example given in the section L</Returning a List
of Values> recoded to use C<ST> instead of C<POP*>.
static void
call_AddSubtract2(a, b)
int a;
int b;
{
dSP;
I32 ax;
int count;
ENTER;
SAVETMPS;
PUSHMARK(SP);
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSViv(a)));
PUSHs(sv_2mortal(newSViv(b)));
PUTBACK;
count = call_pv("AddSubtract", G_ARRAY);
SPAGAIN;
SP -= count;
ax = (SP - PL_stack_base) + 1;
if (count != 2)
croak("Big trouble\n");
printf ("%d + %d = %d\n", a, b, (int)SvIV(ST(0)));
printf ("%d - %d = %d\n", a, b, (int)SvIV(ST(1)));
PUTBACK;
FREETMPS;
LEAVE;
}
Notes
=over 5
=item 1.
Notice that it was necessary to define the variable C<ax>. This is
because the C<ST> macro expects it to exist. If we were in an XSUB it
would not be necessary to define C<ax> as it is already defined for
us.
=item 2.
The code
SPAGAIN;
SP -= count;
ax = (SP - PL_stack_base) + 1;
sets the stack up so that we can use the C<ST> macro.
=item 3.
Unlike the original coding of this example, the returned
values are not accessed in reverse order. So C<ST(0)> refers to the
first value returned by the Perl subroutine and C<ST(count-1)>
refers to the last.
=back
=head2 Creating and Calling an Anonymous Subroutine in C
As we've already shown, C<call_sv> can be used to invoke an
anonymous subroutine. However, our example showed a Perl script
invoking an XSUB to perform this operation. Let's see how it can be
done inside our C code:
...
SV *cvrv
= eval_pv("sub {"
" print 'You will not find me cluttering any namespace!'"
" }", TRUE);
...
call_sv(cvrv, G_VOID|G_NOARGS);
C<eval_pv> is used to compile the anonymous subroutine, which
will be the return value as well (read more about C<eval_pv> in
L<perlapi/eval_pv>). Once this code reference is in hand, it
can be mixed in with all the previous examples we've shown.
=head1 LIGHTWEIGHT CALLBACKS
Sometimes you need to invoke the same subroutine repeatedly.
This usually happens with a function that acts on a list of
values, such as Perl's built-in sort(). You can pass a
comparison function to sort(), which will then be invoked
for every pair of values that needs to be compared. The first()
and reduce() functions from L<List::Util> follow a similar
pattern.
In this case it is possible to speed up the routine (often
quite substantially) by using the lightweight callback API.
The idea is that the calling context only needs to be
created and destroyed once, and the sub can be called
arbitrarily many times in between.
It is usual to pass parameters using global variables (typically
$_ for one parameter, or $a and $b for two parameters) rather
than via @_. (It is possible to use the @_ mechanism if you know
what you're doing, though there is as yet no supported API for
it. It's also inherently slower.)
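As a rough plain-C analogy only (this is not the multicall API, and
every name in it is invented for illustration), the sketch below shows
the shape of the idea: the callee is chosen once, and each call gets
its operands through two globals that play the role of C<$a> and C<$b>
instead of through a freshly built argument list.

```c
/* Stand-ins for Perl's $a and $b package variables. */
static int a, b;

/* A "block" takes no arguments; it reads a and b directly. */
typedef int (*Block)(void);

static int add_block(void) { return a + b; }
static int max_block(void) { return a > b ? a : b; }

/* Folds `values` (n >= 1) with `block`. Between calls only the two
   globals are updated; nothing else is set up or torn down. */
int reduce(Block block, const int *values, int n)
{
    a = values[0];
    for (int i = 1; i < n; ++i) {
        b = values[i];
        a = block();
    }
    return a;
}
```

Calling C<reduce(add_block, v, 3)> on the values 3, 9 and 4 folds them
into their sum; with C<max_block> it yields the largest value.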
The pattern of macro calls is like this:
dMULTICALL; /* Declare local variables */
U8 gimme = G_SCALAR; /* context of the call: G_SCALAR,
* G_ARRAY, or G_VOID */
PUSH_MULTICALL(cv); /* Set up the context for calling cv,
and set local vars appropriately */
/* loop */ {
/* set the value(s) of your parameter variables */
MULTICALL; /* Make the actual call */
} /* end of loop */
POP_MULTICALL; /* Tear down the calling context */
For some concrete examples, see the implementation of the
first() and reduce() functions of List::Util 1.18. There you
will also find a header file that emulates the multicall API
on older versions of perl.
=head1 SEE ALSO
L<perlxs>, L<perlguts>, L<perlembed>
=head1 AUTHOR
Paul Marquess
Special thanks to the following people who assisted in the creation of
the document.
Jeff Okamoto, Tim Bunce, Nick Gianniotis, Steve Kelem, Gurusamy Sarathy
and Larry Wall.
=head1 DATE
Last updated for perl 5.23.1.
=encoding utf8
=head1 NAME
perldelta - what is new for perl v5.26.3
=head1 DESCRIPTION
This document describes differences between the 5.26.2 release and the 5.26.3
release.
If you are upgrading from an earlier release such as 5.26.1, first read
L<perl5262delta>, which describes differences between 5.26.1 and 5.26.2.
=head1 Security
=head2 [CVE-2018-12015] Directory traversal in module Archive::Tar
By default, L<Archive::Tar> doesn't allow extracting files outside the current
working directory. However, this secure extraction mode could be bypassed by
putting a symlink and a regular file with the same name into the tar file.
L<[perl #133250]|https://rt.perl.org/Ticket/Display.html?id=133250>
L<[cpan #125523]|https://rt.cpan.org/Ticket/Display.html?id=125523>
=head2 [CVE-2018-18311] Integer overflow leading to buffer overflow and segmentation fault
Integer arithmetic in C<Perl_my_setenv()> could wrap when the combined length
of the environment variable name and value exceeded around 0x7fffffff. This
could lead to writing beyond the end of an allocated buffer with
attacker-supplied data.
L<[perl #133204]|https://rt.perl.org/Ticket/Display.html?id=133204>
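The class of bug can be illustrated with a short sketch. This is not
the code from C<Perl_my_setenv()> nor the fix that was applied to it;
the function names are hypothetical. C<wrapped_size32> uses unsigned
32-bit arithmetic to show a length computation wrapping in a
well-defined way, and C<env_entry_size> shows one common way of
refusing to wrap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unchecked 32-bit size of "name=value\0": silently wraps modulo 2^32
   when the lengths are large enough. */
uint32_t wrapped_size32(uint32_t name_len, uint32_t value_len)
{
    return name_len + value_len + 2;
}

/* Checked size of "name=value\0". Stores the size in *out and returns
   true, or returns false when the sum would not fit in a size_t. */
bool env_entry_size(size_t name_len, size_t value_len, size_t *out)
{
    /* two extra bytes: '=' and the terminating NUL */
    if (name_len > SIZE_MAX - 2)
        return false;
    if (value_len > SIZE_MAX - 2 - name_len)
        return false;
    *out = name_len + value_len + 2;
    return true;
}
```

A buffer allocated from the wrapped size would be far smaller than the
data later copied into it, which is how a wrap turns into a write past
the end of the allocation.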
=head2 [CVE-2018-18312] Heap-buffer-overflow write in S_regatom (regcomp.c)
A crafted regular expression could cause a heap-buffer-overflow write during
compilation, potentially allowing arbitrary code execution.
L<[perl #133423]|https://rt.perl.org/Ticket/Display.html?id=133423>
=head2 [CVE-2018-18313] Heap-buffer-overflow read in S_grok_bslash_N (regcomp.c)
A crafted regular expression could cause a heap-buffer-overflow read during
compilation, potentially leading to sensitive information being leaked.
L<[perl #133192]|https://rt.perl.org/Ticket/Display.html?id=133192>
=head2 [CVE-2018-18314] Heap-buffer-overflow write in S_regatom (regcomp.c)
A crafted regular expression could cause a heap-buffer-overflow write during
compilation, potentially allowing arbitrary code execution.
L<[perl #131649]|https://rt.perl.org/Ticket/Display.html?id=131649>
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.26.2. If any exist,
they are bugs, and we request that you submit a report. See
L</Reporting Bugs> below.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<Archive::Tar> has been upgraded from version 2.24 to 2.24_01.
=item *
L<Module::CoreList> has been upgraded from version 5.20180414_26 to 5.20181129_26.
=back
=head1 Diagnostics
The following additions or changes have been made to diagnostic output,
including warnings and fatal error messages. For the complete list of
diagnostic messages, see L<perldiag>.
=head2 New Diagnostics
=head3 New Errors
=over 4
=item *
L<Unexpected ']' with no following ')' in (?[... in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>|perldiag/"Unexpected ']' with no following ')' in (?[... in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>">
(F) While parsing an extended character class a ']' character was encountered
at a point in the definition where the only legal use of ']' is to close the
character class definition as part of a '])'. You may have forgotten the close
paren, or otherwise confused the parser.
=item *
L<Expecting close paren for nested extended charclass in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>|perldiag/"Expecting close paren for nested extended charclass in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>">
(F) While parsing a nested extended character class like:
    (?[ ... (?flags:(?[ ... ])) ... ])
                             ^
we expected to see a close paren ')' (marked by ^) but did not.
=item *
L<Expecting close paren for wrapper for nested extended charclass in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>|perldiag/"Expecting close paren for wrapper for nested extended charclass in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>">
(F) While parsing a nested extended character class like:
    (?[ ... (?flags:(?[ ... ])) ... ])
                              ^
we expected to see a close paren ')' (marked by ^) but did not.
=back
=head2 Changes to Existing Diagnostics
=over 4
=item *
L<Syntax error in (?[...]) in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>|perldiag/"Syntax error in (?[...]) in regex; marked by E<lt>-- HERE in mE<sol>%sE<sol>">
This fatal error message has been slightly expanded (from "Syntax error in
(?[...]) in regex mE<sol>%sE<sol>") for greater clarity.
=back
=head1 Acknowledgements
Perl 5.26.3 represents approximately 8 months of development since Perl 5.26.2
and contains approximately 4,500 lines of changes across 51 files from 15
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 770 lines of changes to 10 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.26.3:
Aaron Crane, Abigail, Chris 'BinGOs' Williams, Dagfinn Ilmari Mannsåker, David
Mitchell, H.Merijn Brand, James E Keenan, John SJ Anderson, Karen Etheridge,
Karl Williamson, Sawyer X, Steve Hay, Todd Rinaldo, Tony Cook, Yves Orton.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the perl bug database
at L<https://rt.perl.org/>. There may also be information at
L<http://www.perl.org/>, the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications which make it
inappropriate to send to a publicly archived mailing list, then see
L<perlsec/SECURITY VULNERABILITY CONTACT INFORMATION>
for details of how to report the issue.
=head1 Give Thanks
If you wish to thank the Perl 5 Porters for the work we have done in Perl 5,
you can do so by running the C<perlthanks> program:
perlthanks
This will send an email to the Perl 5 Porters list with your show of thanks.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perlxs - XS language reference manual
=head1 DESCRIPTION
=head2 Introduction
XS is an interface description file format used to create an extension
interface between Perl and C code (or a C library) which one wishes
to use with Perl. The XS interface is combined with the library to
create a new library which can then be either dynamically loaded
or statically linked into perl. The XS interface description is
written in the XS language and is the core component of the Perl
extension interface.
Before writing XS, read the L</CAVEATS> section below.
An B<XSUB> forms the basic unit of the XS interface. After compilation
by the B<xsubpp> compiler, each XSUB amounts to a C function definition
which will provide the glue between Perl calling conventions and C
calling conventions.
The glue code pulls the arguments from the Perl stack, converts these
Perl values to the formats expected by a C function, calls this C function,
and transfers the return values of the C function back to Perl.
Return values here may be a conventional C return value or any C
function arguments that may serve as output parameters. These return
values may be passed back to Perl either by putting them on the
Perl stack, or by modifying the arguments supplied from the Perl side.
The above is a somewhat simplified view of what really happens. Since
Perl allows more flexible calling conventions than C, XSUBs may do much
more in practice, such as checking input parameters for validity,
throwing exceptions (or returning undef/empty list) if the return value
from the C function indicates failure, calling different C functions
based on numbers and types of the arguments, providing an object-oriented
interface, etc.
Of course, one could write such glue code directly in C. However, this
would be a tedious task, especially if one needs to write glue for
multiple C functions, and/or one is not familiar enough with the Perl
stack discipline and other such arcana. XS comes to the rescue here:
instead of writing this glue C code in long-hand, one can write
a more concise short-hand I<description> of what should be done by
the glue, and let the XS compiler B<xsubpp> handle the rest.
The XS language allows one to describe the mapping between how the C
routine is used, and how the corresponding Perl routine is used. It
also allows creation of Perl routines which are directly translated to
C code and which are not related to a pre-existing C function. In cases
when the C interface coincides with the Perl interface, the XSUB
declaration is almost identical to a declaration of a C function (in K&R
style). In such circumstances, there is another tool called C<h2xs>
that is able to translate an entire C header file into a corresponding
XS file that will provide glue to the functions/macros described in
the header file.
The XS compiler is called B<xsubpp>. This compiler creates
the constructs necessary to let an XSUB manipulate Perl values, and
creates the glue necessary to let Perl call the XSUB. The compiler
uses B<typemaps> to determine how to map C function parameters
and output values to Perl values and back. The default typemap
(which comes with Perl) handles many common C types. A supplementary
typemap may also be needed to handle any special structures and types
for the library being linked. For more information on typemaps,
see L<perlxstypemap>.
A file in XS format starts with a C language section which goes until the
first C<MODULE =Z<>> directive. Other XS directives and XSUB definitions
may follow this line. The "language" used in this part of the file
is usually referred to as the XS language. B<xsubpp> recognizes and
skips POD (see L<perlpod>) in both the C and XS language sections, which
allows the XS file to contain embedded documentation.
See L<perlxstut> for a tutorial on the whole extension creation process.
Note: For some extensions, Dave Beazley's SWIG system may provide a
significantly more convenient mechanism for creating the extension
glue code. See L<http://www.swig.org/> for more information.
=head2 On The Road
Many of the examples which follow will concentrate on creating an interface
between Perl and the ONC+ RPC bind library functions. The rpcb_gettime()
function is used to demonstrate many features of the XS language. This
function has two parameters; the first is an input parameter and the second
is an output parameter. The function also returns a status value.
bool_t rpcb_gettime(const char *host, time_t *timep);
From C this function will be called with the following
statements.
#include <rpc/rpc.h>
bool_t status;
time_t timep;
status = rpcb_gettime( "localhost", &timep );
If an XSUB is created to offer a direct translation between this function
and Perl, then this XSUB will be used from Perl with the following code.
The $status and $timep variables will contain the output of the function.
use RPC;
$status = rpcb_gettime( "localhost", $timep );
The following XS file shows an XS subroutine, or XSUB, which
demonstrates one possible interface to the rpcb_gettime()
function. This XSUB represents a direct translation between
C and Perl and so preserves the interface even from Perl.
This XSUB will be invoked from Perl with the usage shown
above. Note that the first three #include statements, for
C<EXTERN.h>, C<perl.h>, and C<XSUB.h>, will always be present at the
beginning of an XS file. This approach and others will be
expanded later in this document. A #define for C<PERL_NO_GET_CONTEXT>
should be present to fetch the interpreter context more efficiently,
see L<perlguts|perlguts/How multiple interpreters and concurrency are
supported> for details.
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include <rpc/rpc.h>
MODULE = RPC PACKAGE = RPC
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
OUTPUT:
timep
Any extension to Perl, including those containing XSUBs,
should have a Perl module to serve as the bootstrap which
pulls the extension into Perl. This module will export the
extension's functions and variables to the Perl program and
will cause the extension's XSUBs to be linked into Perl.
The following module will be used for most of the examples
in this document and should be used from Perl with the C<use>
command as shown earlier. Perl modules are explained in
more detail later in this document.
package RPC;
require Exporter;
require DynaLoader;
@ISA = qw(Exporter DynaLoader);
@EXPORT = qw( rpcb_gettime );
bootstrap RPC;
1;
Throughout this document a variety of interfaces to the rpcb_gettime()
XSUB will be explored. The XSUBs will take their parameters in different
orders or will take different numbers of parameters. In each case the
XSUB is an abstraction between Perl and the real C rpcb_gettime()
function, and the XSUB must always ensure that the real rpcb_gettime()
function is called with the correct parameters. This abstraction will
allow the programmer to create a more Perl-like interface to the C
function.
=head2 The Anatomy of an XSUB
The simplest XSUBs consist of 3 parts: a description of the return
value, the name of the XSUB routine and the names of its arguments,
and a description of types or formats of the arguments.
The following XSUB allows a Perl program to access a C library function
called sin(). The XSUB will imitate the C function which takes a single
argument and returns a single value.
double
sin(x)
double x
Optionally, one can merge the description of types and the list of
argument names, rewriting this as
double
sin(double x)
This makes this XSUB look similar to an ANSI C declaration. An optional
semicolon is allowed after the argument list, as in
double
sin(double x);
Parameters with C pointer types can have different semantics: C functions
with similar declarations
bool string_looks_as_a_number(char *s);
bool make_char_uppercase(char *c);
are used in completely incompatible ways. Parameters to these functions
could be described to B<xsubpp> like this:
char * s
char &c
Both these XS declarations correspond to the C<char*> C type, but they have
different semantics, see L<"The & Unary Operator">.
It is convenient to think that the indirection operator
C<*> should be considered as a part of the type and the address operator C<&>
should be considered part of the variable. See L<perlxstypemap>
for more info about handling qualifiers and unary operators in C types.
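To make the difference concrete, here is one hypothetical pair of
implementations of the two functions declared above; the document only
gives their prototypes, so the bodies are assumptions made for the sake
of illustration. The first treats its pointer as a whole NUL-terminated
string, which is what C<char * s> describes. The second treats its
pointer as the address of a single C<char> belonging to the caller,
which is what C<char &c> describes.

```c
#include <ctype.h>
#include <stdbool.h>

/* The pointer IS the value: a NUL-terminated string to inspect. */
bool string_looks_as_a_number(char *s)
{
    if (*s == '\0')
        return false;
    for (; *s != '\0'; ++s)
        if (!isdigit((unsigned char)*s))
            return false;
    return true;
}

/* The pointer is only a route to the caller's single char, which is
   updated in place. Returns false when nothing was changed. */
bool make_char_uppercase(char *c)
{
    if (!islower((unsigned char)*c))
        return false;
    *c = (char)toupper((unsigned char)*c);
    return true;
}
```

A C caller would pass a string to the first function and C<&ch> for
some C<char ch> to the second, even though both prototypes read
C<char *>.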
The function name and the return type must be placed on
separate lines and should be flush left-adjusted.
  INCORRECT                        CORRECT
  double sin(x)                    double
    double x                       sin(x)
                                     double x
The rest of the function description may be indented or left-adjusted. The
following example shows a function with its body left-adjusted. Most
examples in this document will indent the body for better readability.
CORRECT
double
sin(x)
double x
More complicated XSUBs may contain many other sections. Each section of
an XSUB starts with the corresponding keyword, such as INIT: or CLEANUP:.
However, the first two lines of an XSUB always contain the same data:
descriptions of the return type and the names of the function and its
parameters. Whatever immediately follows these is considered to be
an INPUT: section unless explicitly marked with another keyword.
(See L<The INPUT: Keyword>.)
An XSUB section continues until another section-start keyword is found.
=head2 The Argument Stack
The Perl argument stack is used to store the values which are
sent as parameters to the XSUB and to store the XSUB's
return value(s). In reality all Perl functions (including non-XSUB
ones) keep their values on this stack all at the same time, each limited
to its own range of positions on the stack. In this document the
first position on that stack which belongs to the active
function will be referred to as position 0 for that function.
XSUBs refer to their stack arguments with the macro B<ST(x)>, where I<x>
refers to a position in this XSUB's part of the stack. Position 0 for that
function would be known to the XSUB as ST(0). The XSUB's incoming
parameters and outgoing return values always begin at ST(0). For many
simple cases the B<xsubpp> compiler will generate the code necessary to
handle the argument stack by embedding code fragments found in the
typemaps. In more complex cases the programmer must supply the code.
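The windowing idea can be modelled in a few lines of plain C. This is a
toy, not Perl's implementation: plain C<int> values stand in for the
C<SV *> entries of the real stack, and C<stack_base> and C<sum_args>
are names chosen for this sketch. What it keeps from the real thing is
that C<ST(n)> is an offset from a per-function base index C<ax> into a
stack shared by every active function.

```c
/* Toy model: one stack shared by every active function. */
static int stack_base[16];

/* Each function sees only its own window, which starts at index ax. */
#define ST(n) stack_base[ax + (n)]

/* Sums the `items` arguments found at ST(0)..ST(items-1), leaves the
   result in ST(0), and returns the number of values returned. */
int sum_args(int ax, int items)
{
    int total = 0;
    for (int i = 0; i < items; ++i)
        total += ST(i);
    ST(0) = total;
    return 1;
}
```

Incoming arguments and outgoing return values share the same slots,
both starting at C<ST(0)>, and entries below C<ax> belong to callers
and are left alone.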
=head2 The RETVAL Variable
The RETVAL variable is a special C variable that is declared automatically
for you. The C type of RETVAL matches the return type of the C library
function. The B<xsubpp> compiler will declare this variable in each XSUB
with non-C<void> return type. By default the generated C function
will use RETVAL to hold the return value of the C library function being
called. In simple cases the value of RETVAL will be placed in ST(0) of
the argument stack where it can be received by Perl as the return value
of the XSUB.
If the XSUB has a return type of C<void> then the compiler will
not declare a RETVAL variable for that function. When using
a PPCODE: section no manipulation of the RETVAL variable is required; the
section may use direct stack manipulation to place output values on the stack.
If the PPCODE: directive is not used, a C<void> return value should be used
only for subroutines which do not return a value, I<even if> a CODE:
directive is used which sets ST(0) explicitly.
Older versions of this document recommended using a C<void> return
value in such cases. It was discovered that this could lead to
segfaults in cases when the XSUB was I<truly> C<void>. This practice is
now deprecated, and may not be supported in some future version. Use
the return value C<SV *> in such cases. (Currently C<xsubpp> contains
some heuristic code which tries to disambiguate between "truly-void"
and "old-practice-declared-as-void" functions. Hence your code is at
the mercy of this heuristic unless you use C<SV *> as the return value.)
=head2 Returning SVs, AVs and HVs through RETVAL
When you're using RETVAL to return an C<SV *>, there's some magic
going on behind the scenes that should be mentioned. When you're
manipulating the argument stack using the ST(x) macro, for example,
you usually have to pay special attention to reference counts. (For
more about reference counts, see L<perlguts>.) To make your life
easier, the typemap file automatically makes C<RETVAL> mortal when
you're returning an C<SV *>. Thus, the following two XSUBs are more
or less equivalent:
void
alpha()
PPCODE:
ST(0) = newSVpv("Hello World",0);
sv_2mortal(ST(0));
XSRETURN(1);
SV *
beta()
CODE:
RETVAL = newSVpv("Hello World",0);
OUTPUT:
RETVAL
This is quite useful as it usually improves readability. While
this works fine for an C<SV *>, it's unfortunately not as easy
to have C<AV *> or C<HV *> as a return value. You I<should> be
able to write:
AV *
array()
CODE:
RETVAL = newAV();
/* do something with RETVAL */
OUTPUT:
RETVAL
But due to an unfixable bug (fixing it would break lots of existing
CPAN modules) in the typemap file, the reference count of the C<AV *>
is not properly decremented. Thus, the above XSUB would leak memory
whenever it is being called. The same problem exists for C<HV *>,
C<CV *>, and C<SVREF> (which indicates a scalar reference, not
a general C<SV *>).
In XS code on perls starting with perl 5.16, you can override the
typemaps for any of these types with a version that has proper
handling of refcounts. In your C<TYPEMAP> section, do
AV* T_AVREF_REFCOUNT_FIXED
to get the repaired variant. For backward compatibility with older
versions of perl, you can instead decrement the reference count
manually when you're returning one of the aforementioned
types using C<sv_2mortal>:
AV *
array()
CODE:
RETVAL = newAV();
sv_2mortal((SV*)RETVAL);
/* do something with RETVAL */
OUTPUT:
RETVAL
Remember that you don't have to do this for an C<SV *>. The reference
documentation for all core typemaps can be found in L<perlxstypemap>.
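The bookkeeping behind the leak can be followed with a toy counter.
This sketch models reference counts only; C<Thing>, C<ref_inc>,
C<ref_dec> and C<leftover_refs> are invented names, and it assumes the
default typemap wraps the container in a new reference that takes its
own count. In real Perl the decrements happen later, when temporaries
are freed.

```c
typedef struct { int refcnt; } Thing;

static void ref_inc(Thing *t) { ++t->refcnt; }
static void ref_dec(Thing *t) { --t->refcnt; }  /* freed when it hits 0 */

/* Returns the count left on the container once the caller has dropped
   the reference it was handed: 0 means freed, 1 means leaked. */
int leftover_refs(int mortalize_container)
{
    Thing av = { 1 };      /* like newAV(): creator holds one count */
    ref_inc(&av);          /* the returned reference takes another */
    if (mortalize_container)
        ref_dec(&av);      /* like sv_2mortal((SV*)RETVAL): creator lets go */
    ref_dec(&av);          /* caller is finished with the reference */
    return av.refcnt;
}
```

Without the extra step one count is never released, which matches the
leak described above; with it the count reaches zero.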
=head2 The MODULE Keyword
The MODULE keyword is used to start the XS code and to specify the package
of the functions which are being defined. All text preceding the first
MODULE keyword is considered C code and is passed through to the output with
POD stripped, but otherwise untouched. Every XS module will have a
bootstrap function which is used to hook the XSUBs into Perl. The package
name of this bootstrap function will match the value of the last MODULE
statement in the XS source files. The value of MODULE should always remain
constant within the same XS file, though this is not required.
The following example will start the XS code and will place
all functions in a package named RPC.
MODULE = RPC
=head2 The PACKAGE Keyword
When functions within an XS source file must be separated into packages
the PACKAGE keyword should be used. This keyword is used with the MODULE
keyword and must follow immediately after it when used.
MODULE = RPC PACKAGE = RPC
[ XS code in package RPC ]
MODULE = RPC PACKAGE = RPCB
[ XS code in package RPCB ]
MODULE = RPC PACKAGE = RPC
[ XS code in package RPC ]
The same package name can be used more than once, allowing for
non-contiguous code. This is useful if you have a stronger ordering
principle than package names.
Although this keyword is optional and in some cases provides redundant
information it should always be used. This keyword will ensure that the
XSUBs appear in the desired package.
=head2 The PREFIX Keyword
The PREFIX keyword designates prefixes which should be
removed from the Perl function names. If the C function is
C<rpcb_gettime()> and the PREFIX value is C<rpcb_> then Perl will
see this function as C<gettime()>.
This keyword should follow the PACKAGE keyword when used.
If PACKAGE is not used then PREFIX should follow the MODULE
keyword.
MODULE = RPC PREFIX = rpc_
MODULE = RPC PACKAGE = RPCB PREFIX = rpcb_
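The renaming rule itself is simple enough to sketch in plain C. The
helper below is hypothetical and is not part of B<xsubpp>; it only
mirrors the described behaviour, removing the prefix when the C name
starts with it and leaving other names untouched.

```c
#include <string.h>

/* Returns the Perl-visible name: `c_name` without a leading `prefix`,
   or `c_name` unchanged when it does not start with `prefix`. */
const char *strip_prefix(const char *c_name, const char *prefix)
{
    size_t n = strlen(prefix);
    if (n > 0 && strncmp(c_name, prefix, n) == 0)
        return c_name + n;
    return c_name;
}
```

With the prefix C<rpcb_>, the name C<rpcb_gettime> becomes C<gettime>,
while a name such as C<sin> is returned as it is.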
=head2 The OUTPUT: Keyword
The OUTPUT: keyword indicates that certain function parameters should be
updated (new values made visible to Perl) when the XSUB terminates or that
certain values should be returned to the calling Perl function. For
simple functions which have no CODE: or PPCODE: section,
such as the sin() function above, the RETVAL variable is
automatically designated as an output value. For more complex functions
the B<xsubpp> compiler will need help to determine which variables are output
variables.
This keyword will normally be used to complement the CODE: keyword.
The RETVAL variable is not recognized as an output variable when the
CODE: keyword is present. The OUTPUT: keyword is used in this
situation to tell the compiler that RETVAL really is an output
variable.
The OUTPUT: keyword can also be used to indicate that function parameters
are output variables. This may be necessary when a parameter has been
modified within the function and the programmer would like the update to
be seen by Perl.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
OUTPUT:
timep
The OUTPUT: keyword will also allow an output parameter to
be mapped to a matching piece of code rather than to a
typemap.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
OUTPUT:
timep sv_setnv(ST(1), (double)timep);
B<xsubpp> emits an automatic C<SvSETMAGIC()> for all parameters in the
OUTPUT section of the XSUB, except RETVAL. This is the usually desired
behavior, as it takes care of properly invoking 'set' magic on output
parameters (needed for hash or array element parameters that must be
created if they didn't exist). If for some reason, this behavior is
not desired, the OUTPUT section may contain a C<SETMAGIC: DISABLE> line
to disable it for the remainder of the parameters in the OUTPUT section.
Likewise, C<SETMAGIC: ENABLE> can be used to reenable it for the
remainder of the OUTPUT section. See L<perlguts> for more details
about 'set' magic.
=head2 The NO_OUTPUT Keyword
The NO_OUTPUT keyword can be placed as the first token of the XSUB. This keyword
indicates that while the C subroutine we provide an interface to has
a non-C<void> return type, the return value of this C subroutine should not
be returned from the generated Perl subroutine.
With this keyword present, L<the RETVAL variable|/The RETVAL Variable> is
still created and is assigned the result of the generated call to the
subroutine, but its value is not used in the auto-generated code.
This keyword makes sense only if C<RETVAL> is going to be accessed by the
user-supplied code. It is especially useful to make a function interface
more Perl-like, especially when the C return value is just an error condition
indicator. For example,
NO_OUTPUT int
delete_file(char *name)
POSTCALL:
if (RETVAL != 0)
croak("Error %d while deleting file '%s'", RETVAL, name);
Here the generated XS function returns nothing on success, and will die()
with a meaningful error message on error.
=head2 The CODE: Keyword
This keyword is used in more complicated XSUBs which require
special handling for the C function. The RETVAL variable is
still declared, but it will not be returned unless it is specified
in the OUTPUT: section.
The following XSUB is for a C function which requires special handling of
its parameters. The Perl usage is given first.
$status = rpcb_gettime( "localhost", $timep );
The XSUB follows.
bool_t
rpcb_gettime(host,timep)
char *host
time_t timep
CODE:
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
=head2 The INIT: Keyword
The INIT: keyword allows initialization to be inserted into the XSUB before
the compiler generates the call to the C function. Unlike the CODE: keyword
above, this keyword does not affect the way the compiler handles RETVAL.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
INIT:
printf("# Host is %s\n", host );
OUTPUT:
timep
Another use for the INIT: section is to check for preconditions before
making a call to the C function:
long long
lldiv(a,b)
long long a
long long b
INIT:
if (a == 0 && b == 0)
XSRETURN_UNDEF;
if (b == 0)
croak("lldiv: cannot divide by 0");
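
The same "check first, call second" ordering can be sketched in plain C,
outside of XS. This is only an illustration and not what B<xsubpp>
generates: the names C<checked_div> and C<div_status> are hypothetical,
and the status codes stand in for C<XSRETURN_UNDEF> and C<croak()>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the XS-level outcomes. */
enum div_status {
    DIV_OK,      /* the division was performed                */
    DIV_UNDEF,   /* 0/0: the XSUB above returns undef here    */
    DIV_ERROR    /* x/0: the XSUB above calls croak() here    */
};

/* Check the preconditions first, and only then do the real work. */
enum div_status
checked_div(long long a, long long b, long long *quot)
{
    assert(quot != NULL);
    if (a == 0 && b == 0)
        return DIV_UNDEF;
    if (b == 0)
        return DIV_ERROR;
    *quot = a / b;
    return DIV_OK;
}
```

As in the INIT: section, the order of the two tests matters: the 0/0 case
has to be recognised before the general division-by-zero case.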
=head2 The NO_INIT Keyword
The NO_INIT keyword is used to indicate that a function
parameter is being used only as an output value. The B<xsubpp>
compiler will normally generate code to read the values of
all function parameters from the argument stack and assign
them to C variables upon entry to the function. NO_INIT
will tell the compiler that some parameters will be used for
output rather than for input and that they will be handled
before the function terminates.
The following example shows a variation of the rpcb_gettime() function.
This function uses the timep variable only as an output variable and does
not care about its initial contents.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep = NO_INIT
OUTPUT:
timep
=head2 The TYPEMAP: Keyword
Starting with Perl 5.16, you can embed typemaps into your XS code
instead of or in addition to typemaps in a separate file. Multiple
such embedded typemaps are processed in order of appearance in
the XS code and, like local typemap files, take precedence over the
default typemap. Embedded typemaps may overwrite previous
definitions of TYPEMAP, INPUT, and OUTPUT stanzas. The syntax for
embedded typemaps is
TYPEMAP: <<HERE
... your typemap code here ...
HERE
where the C<TYPEMAP> keyword must appear in the first column of a
new line.
Refer to L<perlxstypemap> for details on writing typemaps.
=head2 Initializing Function Parameters
C function parameters are normally initialized with their values from
the argument stack (which in turn contains the parameters that were
passed to the XSUB from Perl). The typemaps contain the
code segments which are used to translate the Perl values to
the C parameters. The programmer, however, is allowed to
override the typemaps and supply alternate (or additional)
initialization code. Initialization code starts with the first
C<=>, C<;> or C<+> on a line in the INPUT: section. The only
exception is a C<;> that terminates the line: it is quietly
ignored.
The following code demonstrates how to supply initialization code for
function parameters. The initialization code is eval'ed within double
quotes by the compiler before it is added to the output so anything
which should be interpreted literally [mainly C<$>, C<@>, or C<\\>]
must be protected with backslashes. The variables C<$var>, C<$arg>,
and C<$type> can be used as in typemaps.
bool_t
rpcb_gettime(host,timep)
char *host = (char *)SvPV_nolen($arg);
time_t &timep = 0;
OUTPUT:
timep
This should not be used to supply default values for parameters. One
would normally use this when a function parameter must be processed by
another library function before it can be used. Default parameters are
covered in the next section.
If the initialization begins with C<=>, then it is output in
the declaration for the input variable, replacing the initialization
supplied by the typemap. If the initialization
begins with C<;> or C<+>, then it is performed after
all of the input variables have been declared. In the C<;>
case the initialization normally supplied by the typemap is not performed.
For the C<+> case, the declaration for the variable will include the
initialization from the typemap. A global
variable, C<%v>, is available for the truly rare case where
information from one initialization is needed in another
initialization.
Here's a truly obscure example:
bool_t
rpcb_gettime(host,timep)
time_t &timep; /* \$v{timep}=@{[$v{timep}=$arg]} */
char *host + SvOK($v{timep}) ? SvPV_nolen($arg) : NULL;
OUTPUT:
timep
The construct C<\$v{timep}=@{[$v{timep}=$arg]}> used in the above
example has a two-fold purpose: first, when this line is processed by
B<xsubpp>, the Perl snippet C<$v{timep}=$arg> is evaluated. Second,
the text of the evaluated snippet is output into the generated C file
(inside a C comment)! During the processing of C<char *host> line,
C<$arg> will evaluate to C<ST(0)>, and C<$v{timep}> will evaluate to
C<ST(1)>.
=head2 Default Parameter Values
Default values for XSUB arguments can be specified by placing an
assignment statement in the parameter list. The default value may
be a number, a string or the special string C<NO_INIT>. Defaults should
always be used on the right-most parameters only.
To allow the XSUB for rpcb_gettime() to have a default host
value the parameters to the XSUB could be rearranged. The
XSUB will then call the real rpcb_gettime() function with
the parameters in the correct order. This XSUB can be called
from Perl with either of the following statements:
$status = rpcb_gettime( $timep, $host );
$status = rpcb_gettime( $timep );
The XSUB will look like the code which follows. A CODE:
block is used to call the real rpcb_gettime() function with
the parameters in the correct order for that function.
bool_t
rpcb_gettime(timep,host="localhost")
char *host
time_t timep = NO_INIT
CODE:
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
=head2 The PREINIT: Keyword
The PREINIT: keyword allows extra variables to be declared immediately
before or after the declarations of the parameters from the INPUT: section
are emitted.
If a variable is declared inside a CODE: section it will follow any typemap
code that is emitted for the input parameters. This may result in the
declaration ending up after C code, which is a syntax error in C89
(declarations may not follow statements within a block). Similar
errors may happen when an explicit C<;>-type or C<+>-type initialization of
parameters is used (see L<"Initializing Function Parameters">). Declaring
these variables in an INIT: section will not help.
In such cases, to force an additional variable to be declared together
with declarations of other variables, place the declaration into a
PREINIT: section. The PREINIT: keyword may be used one or more times
within an XSUB.
The following examples are equivalent, but if the code is using complex
typemaps then the first example is safer.
bool_t
rpcb_gettime(timep)
time_t timep = NO_INIT
PREINIT:
char *host = "localhost";
CODE:
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
For this particular case an INIT: keyword would generate the
same C code as the PREINIT: keyword. Another correct, but error-prone example:
bool_t
rpcb_gettime(timep)
time_t timep = NO_INIT
CODE:
char *host = "localhost";
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
Another way to declare C<host> is to use a C block in the CODE: section:
bool_t
rpcb_gettime(timep)
time_t timep = NO_INIT
CODE:
{
char *host = "localhost";
RETVAL = rpcb_gettime( host, &timep );
}
OUTPUT:
timep
RETVAL
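
The C-block form works because a nested block may open with its own
declarations, even when statements have already appeared in the enclosing
block. A minimal plain-C sketch of that rule follows; the function
C<scaled_sum> is hypothetical and unrelated to the code B<xsubpp> emits.

```c
#include <assert.h>

/* Adds two numbers, then doubles the result inside a nested block. */
int
scaled_sum(int a, int b)
{
    int total;

    total = a + b;      /* a statement: under C89 rules no further
                         * declarations may follow in this block */
    {
        int scale = 2;  /* legal: a new block may start with declarations */
        total *= scale;
    }
    return total;
}
```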
The ability to put additional declarations before the typemap entries are
processed is very handy in the cases when typemap conversions manipulate
some global state:
MyObject
mutate(o)
PREINIT:
MyState st = global_state;
INPUT:
MyObject o;
CLEANUP:
reset_to(global_state, st);
Here we suppose that conversion to C<MyObject> in the INPUT: section and from
C<MyObject> when processing RETVAL will modify a global variable C<global_state>.
After these conversions are performed, we restore the old value of
C<global_state> (to avoid memory leaks, for example).
There is another way to trade clarity for compactness: INPUT sections allow
declaration of C variables which do not appear in the parameter list of
a subroutine. Thus the above code for mutate() can be rewritten as
MyObject
mutate(o)
MyState st = global_state;
MyObject o;
CLEANUP:
reset_to(global_state, st);
and the code for rpcb_gettime() can be rewritten as
bool_t
rpcb_gettime(timep)
time_t timep = NO_INIT
char *host = "localhost";
C_ARGS:
host, &timep
OUTPUT:
timep
RETVAL
=head2 The SCOPE: Keyword
The SCOPE: keyword allows scoping to be enabled for a particular XSUB. If
enabled, the XSUB will invoke ENTER and LEAVE automatically.
To support potentially complex type mappings, if a typemap entry used
by an XSUB contains a comment like C</*scope*/> then scoping will
be automatically enabled for that XSUB.
To enable scoping:
SCOPE: ENABLE
To disable scoping:
SCOPE: DISABLE
=head2 The INPUT: Keyword
The XSUB's parameters are usually evaluated immediately after entering the
XSUB. The INPUT: keyword can be used to force those parameters to be
evaluated a little later. The INPUT: keyword can be used multiple times
within an XSUB and can be used to list one or more input variables. This
keyword is used with the PREINIT: keyword.
The following example shows how the input parameter C<timep> can be
evaluated late, after a PREINIT.
bool_t
rpcb_gettime(host,timep)
char *host
PREINIT:
time_t tt;
INPUT:
time_t timep
CODE:
RETVAL = rpcb_gettime( host, &tt );
timep = tt;
OUTPUT:
timep
RETVAL
The next example shows each input parameter evaluated late.
bool_t
rpcb_gettime(host,timep)
PREINIT:
time_t tt;
INPUT:
char *host
PREINIT:
char *h;
INPUT:
time_t timep
CODE:
h = host;
RETVAL = rpcb_gettime( h, &tt );
timep = tt;
OUTPUT:
timep
RETVAL
Since INPUT sections allow declaration of C variables which do not appear
in the parameter list of a subroutine, this may be shortened to:
bool_t
rpcb_gettime(host,timep)
time_t tt;
char *host;
char *h = host;
time_t timep;
CODE:
RETVAL = rpcb_gettime( h, &tt );
timep = tt;
OUTPUT:
timep
RETVAL
(We used our knowledge that input conversion for C<char *> is a "simple" one,
thus C<host> is initialized on the declaration line, and our assignment
C<h = host> is not performed too early. Otherwise one would need to have the
assignment C<h = host> in a CODE: or INIT: section.)
=head2 The IN/OUTLIST/IN_OUTLIST/OUT/IN_OUT Keywords
In the list of parameters for an XSUB, one can precede parameter names
by the C<IN>/C<OUTLIST>/C<IN_OUTLIST>/C<OUT>/C<IN_OUT> keywords.
The C<IN> keyword is the default; the other keywords indicate how the Perl
interface should differ from the C interface.
Parameters preceded by C<OUTLIST>/C<IN_OUTLIST>/C<OUT>/C<IN_OUT>
keywords are considered to be used by the C subroutine I<via
pointers>. The C<OUTLIST>/C<OUT> keywords indicate that the C subroutine
does not inspect the memory pointed to by this parameter, but will write
through this pointer to provide additional return values.
Parameters preceded by the C<OUTLIST> keyword do not appear in the usage
signature of the generated Perl function.
Parameters preceded by C<IN_OUTLIST>/C<IN_OUT>/C<OUT> I<do> appear as
parameters to the Perl function. With the exception of
C<OUT>-parameters, these parameters are converted to the corresponding
C type, then pointers to these data are given as arguments to the C
function. It is expected that the C function will write through these
pointers.
The return list of the generated Perl function consists of the C return value
from the function (unless the XSUB is of C<void> return type or
the C<NO_OUTPUT> keyword was used) followed by all the C<OUTLIST>
and C<IN_OUTLIST> parameters (in the order of appearance). On
return from the XSUB the C<IN_OUT>/C<OUT> Perl parameters will be
modified to have the values written by the C function.
For example, an XSUB
void
day_month(OUTLIST day, IN unix_time, OUTLIST month)
int day
int unix_time
int month
should be used from Perl as
my ($day, $month) = day_month(time);
The C signature of the corresponding function should be
void day_month(int *day, int unix_time, int *month);
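
For illustration, one possible C implementation with this signature is
sketched below. It is an assumption made for the example, not part of any
real library: it interprets C<unix_time> as seconds since the epoch in UTC
and reports the month as 1..12.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical implementation: writes the day of the month (1..31) and
 * the month (1..12) of a Unix timestamp, taken as UTC, through the two
 * pointers.  Neither pointer is read before it is written, which is what
 * OUTLIST/OUT promise about the C function. */
void
day_month(int *day, int unix_time, int *month)
{
    time_t t = (time_t)unix_time;
    struct tm *parts = gmtime(&t);

    assert(day != NULL && month != NULL);
    assert(parts != NULL);
    *day = parts->tm_mday;
    *month = parts->tm_mon + 1;   /* tm_mon counts from 0 */
}
```

Because the function only writes through C<day> and C<month>, the caller
may pass uninitialized storage for both, which is why the XSUB does not
need to read any Perl value for them.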
The C<IN>/C<OUTLIST>/C<IN_OUTLIST>/C<IN_OUT>/C<OUT> keywords can be
mixed with ANSI-style declarations, as in
void
day_month(OUTLIST int day, int unix_time, OUTLIST int month)
(here the optional C<IN> keyword is omitted).
The C<IN_OUT> parameters are identical to parameters introduced with
L<The & Unary Operator> and put into the C<OUTPUT:> section (see
L<The OUTPUT: Keyword>). The C<IN_OUTLIST> parameters are very similar,
the only difference being that the value the C function writes through the
pointer does not modify the Perl parameter, but is put in the output
list.
The C<OUTLIST>/C<OUT> parameters differ from C<IN_OUTLIST>/C<IN_OUT>
parameters only in that the initial value of the Perl parameter is not
read (and is not given to the C function, which gets some
garbage instead). For example, the same C function as above can be
interfaced with as
void day_month(OUT int day, int unix_time, OUT int month);
or
void
day_month(day, unix_time, month)
int &day = NO_INIT
int unix_time
int &month = NO_INIT
OUTPUT:
day
month
However, the generated Perl function is called in a very C-ish style:
my ($day, $month);
day_month($day, time, $month);
=head2 The C<length(NAME)> Keyword
If one of the input arguments to the C function is the length of a string
argument C<NAME>, one can replace the name of the length argument with
C<length(NAME)> in the XSUB declaration. This argument must be omitted when
the generated Perl function is called. E.g.,
void
dump_chars(char *s, short l)
{
short n = 0;
while (n < l) {
printf("s[%d] = \"\\%#03o\"\n", n, (int)s[n]);
n++;
}
}
MODULE = x PACKAGE = x
void dump_chars(char *s, short length(s))
should be called as C<dump_chars($string)>.
This directive is supported with ANSI-type function declarations only.
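
The effect on the caller can be sketched in plain C. Both functions below
are hypothetical, and C<strlen()> merely stands in for the length that XS
takes from the Perl string; the point is only that the wrapper exposes one
argument while the underlying function still receives two.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C function that, like dump_chars(), takes a buffer and
 * its length as two separate arguments.  It counts space characters. */
int
count_spaces(const char *s, short l)
{
    int found = 0;
    short n;

    assert(s != NULL);
    for (n = 0; n < l; n++) {
        if (s[n] == ' ')
            found++;
    }
    return found;
}

/* What the caller sees once the length argument is filled in for it:
 * one string in, with the length derived from that string. */
int
count_spaces_in(const char *s)
{
    assert(s != NULL);
    return count_spaces(s, (short)strlen(s));
}
```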
=head2 Variable-length Parameter Lists
XSUBs can have variable-length parameter lists by specifying an ellipsis
C<(...)> in the parameter list. This use of the ellipsis is similar to that
found in ANSI C. The programmer is able to determine the number of
arguments passed to the XSUB by examining the C<items> variable which the
B<xsubpp> compiler supplies for all XSUBs. By using this mechanism one can
create an XSUB which accepts a list of parameters of unknown length.
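
For comparison, here is a small sketch of the ANSI C ellipsis that the
paragraph above alludes to. The function C<sum_ints> is hypothetical. Its
explicit count argument plays the role that C<items> plays in an XSUB,
since C itself does not record how many variadic arguments were passed.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `items` int arguments passed after the count. */
int
sum_ints(int items, ...)
{
    va_list ap;
    int total = 0;
    int i;

    assert(items >= 0);
    va_start(ap, items);
    for (i = 0; i < items; i++)
        total += va_arg(ap, int);   /* caller must really pass ints */
    va_end(ap);
    return total;
}
```

In an XSUB the extra arguments are not read with C<va_arg()>; they are
fetched from the argument stack with C<ST(n)>, as the rpcb_gettime()
example in this section shows.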
The I<host> parameter for the rpcb_gettime() XSUB can be
optional so the ellipsis can be used to indicate that the
XSUB will take a variable number of parameters. Perl should
be able to call this XSUB with either of the following statements.
$status = rpcb_gettime( $timep, $host );
$status = rpcb_gettime( $timep );
The XS code, with ellipsis, follows.
bool_t
rpcb_gettime(timep, ...)
time_t timep = NO_INIT
PREINIT:
char *host = "localhost";
CODE:
if( items > 1 )
host = (char *)SvPV_nolen(ST(1));
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
=head2 The C_ARGS: Keyword
The C_ARGS: keyword allows creating XSUBs which have a different
calling sequence from Perl than from C, without the need to write a
CODE: or PPCODE: section. The contents of the C_ARGS: paragraph are
used as the argument list of the called C function without any change.
For example, suppose that a C function is declared as
symbolic nth_derivative(int n, symbolic function, int flags);
and that the default flags are kept in a global C variable
C<default_flags>. Suppose that you want to create an interface which
is called as
$second_deriv = $function->nth_derivative(2);
To do this, declare the XSUB as
symbolic
nth_derivative(function, n)
symbolic function
int n
C_ARGS:
n, function, default_flags
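
In plain C terms, the XSUB above behaves roughly like the wrapper in the
following sketch. Everything here is a stand-in chosen for illustration:
the document does not define C<symbolic>, so a small struct is assumed,
and C<wrapped_nth_derivative> is a hypothetical name for what the
generated code does.

```c
#include <assert.h>

/* Hypothetical stand-in for the undefined `symbolic` type. */
typedef struct {
    int order;   /* how many times the function has been differentiated */
    int flags;   /* flags used for the last operation */
} symbolic;

int default_flags = 4;   /* assumed value, for the example only */

/* C calling sequence: count first, then the object, then the flags. */
symbolic
nth_derivative(int n, symbolic function, int flags)
{
    assert(n >= 0);
    function.order += n;
    function.flags = flags;
    return function;
}

/* Perl-side calling sequence (function, n).  The body is the single call
 * whose argument list is the C_ARGS: text "n, function, default_flags". */
symbolic
wrapped_nth_derivative(symbolic function, int n)
{
    return nth_derivative(n, function, default_flags);
}
```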
=head2 The PPCODE: Keyword
The PPCODE: keyword is an alternate form of the CODE: keyword and is used
to tell the B<xsubpp> compiler that the programmer is supplying the code to
control the argument stack for the XSUB's return values. Occasionally one
will want an XSUB to return a list of values rather than a single value.
In these cases one must use PPCODE: and then explicitly push the list of
values on the stack. The PPCODE: and CODE: keywords should not be used
together within the same XSUB.
The actual difference between PPCODE: and CODE: sections is in the
initialization of the C<SP> macro (which stands for the I<current> Perl
stack pointer), and in the handling of data on the stack when returning
from an XSUB. In CODE: sections SP preserves the value it had on
entry to the XSUB: SP is on the function pointer (which follows the
last parameter). In PPCODE: sections SP is moved backward to the
beginning of the parameter list, which allows C<PUSH*()> macros
to place output values in the place Perl expects them to be when
the XSUB returns back to Perl.
The generated trailer for a CODE: section ensures that the number of return
values Perl will see is either 0 or 1 (depending on the C<void>ness of the
return value of the C function, and heuristics mentioned in
L<"The RETVAL Variable">). The trailer generated for a PPCODE: section
is based on the number of return values and on the number of times
C<SP> was updated by C<[X]PUSH*()> macros.
Note that macros C<ST(i)>, C<XST_m*()> and C<XSRETURN*()> work equally
well in CODE: sections and PPCODE: sections.
The following XSUB will call the C rpcb_gettime() function
and will return its two output values, timep and status, to
Perl as a single list.
void
rpcb_gettime(host)
char *host
PREINIT:
time_t timep;
bool_t status;
PPCODE:
status = rpcb_gettime( host, &timep );
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSViv(status)));
PUSHs(sv_2mortal(newSViv(timep)));
Notice that the programmer must supply the C code necessary
to have the real rpcb_gettime() function called and to have
the return values properly placed on the argument stack.
The C<void> return type for this function tells the B<xsubpp> compiler that
the RETVAL variable is not needed or used and that it should not be created.
In most scenarios the void return type should be used with the PPCODE:
directive.
The EXTEND() macro is used to make room on the argument
stack for 2 return values. The PPCODE: directive causes the
B<xsubpp> compiler to create a stack pointer available as C<SP>, and it
is this pointer which is being used in the EXTEND() macro.
The values are then pushed onto the stack with the PUSHs()
macro.
Now the rpcb_gettime() function can be used from Perl with
the following statement.
($status, $timep) = rpcb_gettime("localhost");
When handling output parameters with a PPCODE section, be sure to handle
'set' magic properly. See L<perlguts> for details about 'set' magic.
=head2 Returning Undef And Empty Lists
Occasionally the programmer will want to return simply
C<undef> or an empty list if a function fails rather than a
separate status value. The rpcb_gettime() function offers
just this situation. If the function succeeds we would like
to have it return the time and if it fails we would like to
have undef returned. In the following Perl code the value
of $timep will either be undef or it will be a valid time.
$timep = rpcb_gettime( "localhost" );
The following XSUB uses the C<SV *> return type as a mnemonic only,
and uses a CODE: block to indicate to the compiler
that the programmer has supplied all the necessary code. The
sv_newmortal() call will initialize the return value to undef, making that
the default return value.
SV *
rpcb_gettime(host)
char * host
PREINIT:
time_t timep;
bool_t x;
CODE:
ST(0) = sv_newmortal();
if( rpcb_gettime( host, &timep ) )
sv_setnv( ST(0), (double)timep);
The next example demonstrates how one would place an explicit undef in the
return value, should the need arise.
SV *
rpcb_gettime(host)
char * host
PREINIT:
time_t timep;
bool_t x;
CODE:
if( rpcb_gettime( host, &timep ) ){
ST(0) = sv_newmortal();
sv_setnv( ST(0), (double)timep);
}
else{
ST(0) = &PL_sv_undef;
}
To return an empty list one must use a PPCODE: block and
then not push return values on the stack.
void
rpcb_gettime(host)
char *host
PREINIT:
time_t timep;
PPCODE:
if( rpcb_gettime( host, &timep ) )
PUSHs(sv_2mortal(newSViv(timep)));
else{
/* Nothing pushed on stack, so an empty
* list is implicitly returned. */
}
Some people may be inclined to include an explicit C<return> in the above
XSUB, rather than letting control fall through to the end. In those
situations C<XSRETURN_EMPTY> should be used, instead. This will ensure that
the XSUB stack is properly adjusted. Consult L<perlapi> for other
C<XSRETURN> macros.
Since C<XSRETURN_*> macros can be used with CODE blocks as well, one can
rewrite this example as:
int
rpcb_gettime(host)
char *host
PREINIT:
time_t timep;
CODE:
RETVAL = rpcb_gettime( host, &timep );
if (RETVAL == 0)
XSRETURN_UNDEF;
OUTPUT:
RETVAL
In fact, one can put this check into a POSTCALL: section as well. Together
with PREINIT: simplifications, this leads to:
int
rpcb_gettime(host)
char *host
time_t timep;
POSTCALL:
if (RETVAL == 0)
XSRETURN_UNDEF;
=head2 The REQUIRE: Keyword
The REQUIRE: keyword is used to indicate the minimum version of the
B<xsubpp> compiler needed to compile the XS module. An XS module which
contains the following statement will compile only with B<xsubpp> version
1.922 or greater:
REQUIRE: 1.922
=head2 The CLEANUP: Keyword
This keyword can be used when an XSUB requires special cleanup procedures
before it terminates. When the CLEANUP: keyword is used it must follow
any CODE: or OUTPUT: blocks which are present in the XSUB. The code
specified for the cleanup block will be added as the last statements in
the XSUB.
=head2 The POSTCALL: Keyword
This keyword can be used when an XSUB requires special procedures
executed after the C subroutine call is performed. When the POSTCALL:
keyword is used it must precede OUTPUT: and CLEANUP: blocks which are
present in the XSUB.
See examples in L<"The NO_OUTPUT Keyword"> and L<"Returning Undef And Empty Lists">.
The POSTCALL: block does not make a lot of sense when the C subroutine
call is supplied by the user in a CODE: or PPCODE: section.
=head2 The BOOT: Keyword
The BOOT: keyword is used to add code to the extension's bootstrap
function. The bootstrap function is generated by the B<xsubpp> compiler and
normally holds the statements necessary to register any XSUBs with Perl.
With the BOOT: keyword the programmer can tell the compiler to add extra
statements to the bootstrap function.
This keyword may be used any time after the first MODULE keyword and should
appear on a line by itself. The first blank line after the keyword will
terminate the code block.
BOOT:
# The following message will be printed when the
# bootstrap function executes.
printf("Hello from the bootstrap!\n");
=head2 The VERSIONCHECK: Keyword
The VERSIONCHECK: keyword corresponds to B<xsubpp>'s C<-versioncheck> and
C<-noversioncheck> options. This keyword overrides the command line
options. Version checking is enabled by default. When version checking is
enabled the XS module will attempt to verify that its version matches the
version of the PM module.
To enable version checking:
VERSIONCHECK: ENABLE
To disable version checking:
VERSIONCHECK: DISABLE
Note that if the version of the PM module is an NV (a floating point
number), it will be stringified with a possible loss of precision
(currently chopping to nine decimal places) so that it may not match
the version of the XS module anymore. Quoting the $VERSION declaration
to make it a string is recommended if long version numbers are used.
=head2 The PROTOTYPES: Keyword
The PROTOTYPES: keyword corresponds to B<xsubpp>'s C<-prototypes> and
C<-noprototypes> options. This keyword overrides the command line options.
Prototypes are disabled by default. When prototypes are enabled, XSUBs will
be given Perl prototypes. This keyword may be used multiple times in an XS
module to enable and disable prototypes for different parts of the module.
Note that B<xsubpp> will nag you if you don't explicitly enable or disable
prototypes, with:
Please specify prototyping behavior for Foo.xs (see perlxs manual)
To enable prototypes:
PROTOTYPES: ENABLE
To disable prototypes:
PROTOTYPES: DISABLE
=head2 The PROTOTYPE: Keyword
This keyword is similar to the PROTOTYPES: keyword above but can be used to
force B<xsubpp> to use a specific prototype for the XSUB. This keyword
overrides all other prototype options and keywords but affects only the
current XSUB. Consult L<perlsub/Prototypes> for information about Perl
prototypes.
bool_t
rpcb_gettime(timep, ...)
time_t timep = NO_INIT
PROTOTYPE: $;$
PREINIT:
char *host = "localhost";
CODE:
if( items > 1 )
host = (char *)SvPV_nolen(ST(1));
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
If prototypes are enabled, you can disable them locally for a given
XSUB as in the following example:
void
rpcb_gettime_noproto()
PROTOTYPE: DISABLE
...
=head2 The ALIAS: Keyword
The ALIAS: keyword allows an XSUB to have two or more unique Perl names
and to know which of those names was used when it was invoked. The Perl
names may be fully-qualified with package names. Each alias is given an
index. The compiler will set up a variable called C<ix> which contains the
index of the alias which was used. When the XSUB is called with its
declared name, C<ix> will be 0.
The following example will create aliases C<FOO::gettime()> and
C<BAR::getit()> for this function.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
ALIAS:
FOO::gettime = 1
BAR::getit = 2
INIT:
printf("# ix = %d\n", ix );
OUTPUT:
timep
=head2 The OVERLOAD: Keyword
Instead of writing an overloaded interface using pure Perl, you
can also use the OVERLOAD keyword to define additional Perl names
for your functions (like the ALIAS: keyword above). However, the
overloaded functions must be defined with three parameters (except
for the nomethod() function which needs four parameters). If any
function has the OVERLOAD: keyword, several additional lines
will be defined in the C file generated by B<xsubpp> in order to
register with the overload magic.
Since blessed objects are actually stored as RVs, it is useful
to use the typemap features to preprocess parameters and extract
the actual SV stored within the blessed RV. See the sample for
T_PTROBJ_SPECIAL below.
To use the OVERLOAD: keyword, create an XS function which takes
three input parameters (or use the C-style '...' definition) like
this:
SV *
cmp (lobj, robj, swap)
My_Module_obj lobj
My_Module_obj robj
IV swap
OVERLOAD: cmp <=>
{ /* function defined here */}
In this case, the function will overload both of the three-way
comparison operators. For all overload operations using non-alpha
characters, you must type the parameter without quoting, separating
multiple overloads with whitespace. Note that "" (the stringify
overload) should be entered as \"\" (i.e. escaped).
=head2 The FALLBACK: Keyword
In addition to the OVERLOAD keyword, if you need to control how
Perl autogenerates missing overloaded operators, you can set the
FALLBACK keyword in the module header section, like this:
MODULE = RPC PACKAGE = RPC
FALLBACK: TRUE
...
where FALLBACK can take any of the three values TRUE, FALSE, or
UNDEF. If you do not set any FALLBACK value when using OVERLOAD,
it defaults to UNDEF. FALLBACK is not used except when one or
more functions using OVERLOAD have been defined. Please see
L<overload/fallback> for more details.
=head2 The INTERFACE: Keyword
This keyword declares the current XSUB as a keeper of the given
calling signature. If some text follows this keyword, it is
considered as a list of functions which have this signature, and
should be attached to the current XSUB.
For example, if you have 4 C functions multiply(), divide(), add(),
subtract() all having the signature:
symbolic f(symbolic, symbolic);
you can make them all use the same XSUB with this:
symbolic
interface_s_ss(arg1, arg2)
symbolic arg1
symbolic arg2
INTERFACE:
multiply divide
add subtract
(This is the complete XSUB code for 4 Perl functions!) The four generated
Perl functions share names with the corresponding C functions.
The advantage of this approach compared to the ALIAS: keyword is that there
is no need to code a switch statement; each Perl function (which shares
the same XSUB) knows which C function it should call. Additionally, one
can attach an extra function remainder() at runtime by using
CV *mycv = newXSproto("Symbolic::remainder",
XS_Symbolic_interface_s_ss, __FILE__, "$$");
XSINTERFACE_FUNC_SET(mycv, remainder);
say, from another XSUB. (This example supposes that there was no
INTERFACE_MACRO: section, otherwise one needs to use something else instead of
C<XSINTERFACE_FUNC_SET>, see the next section.)
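
The difference between the two dispatch styles can be sketched in plain
C. This is a simplified model rather than the code B<xsubpp> generates:
C<symbolic> is assumed to be a C<long>, only two of the four functions are
shown, and C<struct binding>, C<call_by_index> and C<call_binding> are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef long symbolic;   /* hypothetical stand-in type */

symbolic multiply(symbolic a, symbolic b) { return a * b; }
symbolic add(symbolic a, symbolic b)      { return a + b; }

/* ALIAS:-style dispatch: one body, with a switch on the alias index. */
symbolic
call_by_index(int ix, symbolic a, symbolic b)
{
    switch (ix) {
    case 0:  return multiply(a, b);
    case 1:  return add(a, b);
    default: assert(!"unknown alias index"); return 0;
    }
}

/* INTERFACE:-style dispatch: each binding carries its own function
 * pointer, so the shared body needs no switch. */
typedef symbolic (*binary_fn)(symbolic, symbolic);

struct binding {
    const char *name;   /* name the function is known by */
    binary_fn   fn;     /* C function to call for that name */
};

symbolic
call_binding(const struct binding *b, symbolic x, symbolic y)
{
    assert(b != NULL && b->fn != NULL);
    return b->fn(x, y);
}
```

Adding a fifth function in the second style means creating one more
binding; the shared body stays untouched, which mirrors attaching
remainder() at runtime.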
=head2 The INTERFACE_MACRO: Keyword
This keyword allows one to define an INTERFACE using a different way
to extract a function pointer from an XSUB. The text which follows
this keyword should give the names of the macros which extract and set a
function pointer. The extractor macro is given the return type, C<CV*>,
and C<XSANY.any_dptr> for this C<CV*>. The setter macro is given the cv
and the function pointer.
The default values are C<XSINTERFACE_FUNC> and C<XSINTERFACE_FUNC_SET>.
An INTERFACE keyword with an empty list of functions can be omitted if
the INTERFACE_MACRO keyword is used.
Suppose that in the previous example function pointers for
multiply(), divide(), add(), subtract() are kept in a global C array
C<fp[]> with offsets being C<multiply_off>, C<divide_off>, C<add_off>,
C<subtract_off>. Then one can use
#define XSINTERFACE_FUNC_BYOFFSET(ret,cv,f) \
((XSINTERFACE_CVT_ANON(ret))fp[CvXSUBANY(cv).any_i32])
#define XSINTERFACE_FUNC_BYOFFSET_set(cv,f) \
CvXSUBANY(cv).any_i32 = CAT2( f, _off )
in the C section,
symbolic
interface_s_ss(arg1, arg2)
symbolic arg1
symbolic arg2
INTERFACE_MACRO:
XSINTERFACE_FUNC_BYOFFSET
XSINTERFACE_FUNC_BYOFFSET_set
INTERFACE:
multiply divide
add subtract
in the XSUB section.
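
The table lookup that these macros rely on can be modelled in plain C as
follows. The array C<fp[]> and the C<*_off> names come from the text
above; the C<symbolic> typedef, the function bodies and C<call_by_offset>
are assumptions made so that the sketch is self-contained.

```c
#include <assert.h>

typedef long symbolic;   /* hypothetical stand-in type */
typedef symbolic (*binary_fn)(symbolic, symbolic);

static symbolic multiply(symbolic a, symbolic b) { return a * b; }
static symbolic divide(symbolic a, symbolic b)   { return a / b; }
static symbolic add(symbolic a, symbolic b)      { return a + b; }
static symbolic subtract(symbolic a, symbolic b) { return a - b; }

/* Offsets into fp[], named as in the text above. */
enum { multiply_off, divide_off, add_off, subtract_off, n_offsets };

static const binary_fn fp[n_offsets] = { multiply, divide, add, subtract };

/* Each binding stores only a small integer; the function pointer is
 * recovered from the table at the moment of the call. */
symbolic
call_by_offset(int off, symbolic a, symbolic b)
{
    assert(off >= 0 && off < n_offsets);
    return fp[off](a, b);
}
```

Storing an offset instead of a pointer is the design choice the custom
macros express: the setter records C<f_off> for function C<f>, and the
extractor turns that offset back into a callable pointer.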
=head2 The INCLUDE: Keyword
This keyword can be used to pull other files into the XS module. The other
files may have XS code. INCLUDE: can also be used to run a command to
generate the XS code to be pulled into the module.
The file F<Rpcb1.xsh> contains our C<rpcb_gettime()> function:
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
OUTPUT:
timep
The XS module can use INCLUDE: to pull that file into it.
INCLUDE: Rpcb1.xsh
If the parameters to the INCLUDE: keyword are followed by a pipe (C<|>) then
the compiler will interpret the parameters as a command. This feature is
mildly deprecated in favour of the C<INCLUDE_COMMAND:> directive, as documented
below.
INCLUDE: cat Rpcb1.xsh |
Do not use this to run perl: C<INCLUDE: perl |> will run the perl that
happens to be the first in your path and not necessarily the same perl that is
used to run C<xsubpp>. See L<"The INCLUDE_COMMAND: Keyword">.
=head2 The INCLUDE_COMMAND: Keyword
Runs the supplied command and includes its output into the current XS
document. C<INCLUDE_COMMAND> assigns special meaning to the C<$^X> token
in that it runs the same perl interpreter that is running C<xsubpp>:
INCLUDE_COMMAND: cat Rpcb1.xsh
INCLUDE_COMMAND: $^X -e ...
=head2 The CASE: Keyword
The CASE: keyword allows an XSUB to have multiple distinct parts with each
part acting as a virtual XSUB. CASE: is greedy and if it is used then all
other XS keywords must be contained within a CASE:. This means nothing may
precede the first CASE: in the XSUB and anything following the last CASE: is
included in that case.
A CASE: might switch via a parameter of the XSUB, via the C<ix> ALIAS:
variable (see L<"The ALIAS: Keyword">), or maybe via the C<items> variable
(see L<"Variable-length Parameter Lists">). The last CASE: becomes the
B<default> case if it is not associated with a conditional. The following
example shows CASE switched via C<ix> with a function C<rpcb_gettime()>
having an alias C<x_gettime()>. When the function is called as
C<rpcb_gettime()> its parameters are the usual C<(char *host, time_t *timep)>,
but when the function is called as C<x_gettime()> its parameters are
reversed, C<(time_t *timep, char *host)>.
long
rpcb_gettime(a,b)
CASE: ix == 1
ALIAS:
x_gettime = 1
INPUT:
# 'a' is timep, 'b' is host
char *b
time_t a = NO_INIT
CODE:
RETVAL = rpcb_gettime( b, &a );
OUTPUT:
a
RETVAL
CASE:
# 'a' is host, 'b' is timep
char *a
time_t &b = NO_INIT
OUTPUT:
b
RETVAL
That function can be called with either of the following statements. Note
the different argument lists.
$status = rpcb_gettime( $host, $timep );
$status = x_gettime( $timep, $host );
=head2 The EXPORT_XSUB_SYMBOLS: Keyword
The EXPORT_XSUB_SYMBOLS: keyword is likely something you will never need.
In perl versions earlier than 5.16.0, this keyword does nothing. Starting
with 5.16, XSUB symbols are no longer exported by default. That is, they
are C<static> functions. If you include
EXPORT_XSUB_SYMBOLS: ENABLE
in your XS code, the XSUBs following this line will not be declared C<static>.
You can later disable this with
EXPORT_XSUB_SYMBOLS: DISABLE
which, again, is the default that you should probably never change.
You cannot use this keyword on versions of perl before 5.16 to make
XSUBs C<static>.
=head2 The & Unary Operator
The C<&> unary operator in the INPUT: section is used to tell B<xsubpp>
that it should convert a Perl value to/from C using the C type to the left
of C<&>, but provide a pointer to this value when the C function is called.
This is useful to avoid a CODE: block for a C function which takes a parameter
by reference. Typically, the parameter should not be a pointer type (an
C<int> or C<long> but not an C<int*> or C<long*>).
The following XSUB will generate incorrect C code. The B<xsubpp> compiler will
turn this into code which calls C<rpcb_gettime()> with parameters C<(char
*host, time_t timep)>, but the real C<rpcb_gettime()> wants the C<timep>
parameter to be of type C<time_t*> rather than C<time_t>.
bool_t
rpcb_gettime(host,timep)
char *host
time_t timep
OUTPUT:
timep
That problem is corrected by using the C<&> operator. The B<xsubpp> compiler
will now turn this into code which calls C<rpcb_gettime()> correctly with
parameters C<(char *host, time_t *timep)>. It does this by carrying the
C<&> through, so the function call looks like C<rpcb_gettime(host, &timep)>.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
OUTPUT:
timep
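What the C<&> operator arranges can be seen in plain C. The sketch below uses a hypothetical stand-in, C<fake_gettime()>, rather than the real C<rpcb_gettime()>, and C<gettime_wrapper()> is an assumed name for code that plays the role of the generated glue: it holds a plain C<time_t>, passes its address to the C function, and hands the value back.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a C function that returns a value through
 * a pointer parameter; it is not the real rpcb_gettime(). */
static int fake_gettime(const char *host, time_t *timep)
{
    if (host == NULL || timep == NULL)
        return 0;              /* failure */
    *timep = (time_t)12345;    /* fixed value so the result is predictable */
    return 1;                  /* success */
}

/* Roughly what the glue does for "time_t &timep": declare a time_t,
 * call the C function with &timep, then copy the value out. */
int gettime_wrapper(const char *host, time_t *out)
{
    time_t timep = 0;
    int ok = fake_gettime(host, &timep);
    if (ok)
        *out = timep;
    return ok;
}
```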
=head2 Inserting POD, Comments and C Preprocessor Directives
C preprocessor directives are allowed within BOOT:, PREINIT:, INIT:, CODE:,
PPCODE:, POSTCALL:, and CLEANUP: blocks, as well as outside the functions.
Comments are allowed anywhere after the MODULE keyword. The compiler will
pass the preprocessor directives through untouched and will remove the
commented lines. POD documentation is allowed at any point, both in the
C and XS language sections. POD must be terminated with a C<=cut> command;
C<xsubpp> will exit with an error if it does not. It is very unlikely that
human generated C code will be mistaken for POD, as most indenting styles
result in whitespace in front of any line starting with C<=>. Machine
generated XS files may fall into this trap unless care is taken to
ensure that a space breaks the sequence "\n=".
Comments can be added to XSUBs by placing a C<#> as the first
non-whitespace of a line. Care should be taken to avoid making the
comment look like a C preprocessor directive, lest it be interpreted as
such. The simplest way to prevent this is to put whitespace in front of
the C<#>.
If you use preprocessor directives to choose one of two
versions of a function, use
#if ... version1
#else /* ... version2 */
#endif
and not
#if ... version1
#endif
#if ... version2
#endif
because otherwise B<xsubpp> will believe that you made a duplicate
definition of the function. Also, put a blank line before the
#else/#endif so it will not be seen as part of the function body.
=head2 Using XS With C++
If an XSUB name contains C<::>, it is considered to be a C++ method.
The generated Perl function will assume that
its first argument is an object pointer. The object pointer
will be stored in a variable called THIS. The object should
have been created by C++ with the new() function and should
be blessed by Perl with the sv_setref_pv() macro. The
blessing of the object by Perl can be handled by a typemap. An example
typemap is shown at the end of this section.
If the return type of the XSUB includes C<static>, the method is considered
to be a static method. It will call the C++
function using the class::method() syntax. If the method is not static
the function will be called using the THIS-E<gt>method() syntax.
The next examples will use the following C++ class.
class color {
public:
color();
~color();
int blue();
void set_blue( int );
private:
int c_blue;
};
The XSUBs for the blue() and set_blue() methods are defined with the class
name but the parameter for the object (THIS, or "self") is implicit and is
not listed.
int
color::blue()
void
color::set_blue( val )
int val
Both Perl functions will expect an object as the first parameter. In the
generated C++ code the object is called C<THIS>, and the method call will
be performed on this object. So in the C++ code the blue() and set_blue()
methods will be called as this:
RETVAL = THIS->blue();
THIS->set_blue( val );
You could also write a single get/set method using an optional argument:
int
color::blue( val = NO_INIT )
int val
PROTOTYPE: $;$
CODE:
if (items > 1)
THIS->set_blue( val );
RETVAL = THIS->blue();
OUTPUT:
RETVAL
If the function's name is B<DESTROY> then the C++ C<delete> function will be
called and C<THIS> will be given as its parameter. The generated C++ code for
void
color::DESTROY()
will look like this:
color *THIS = ...; // Initialized as in typemap
delete THIS;
If the function's name is B<new> then the C++ C<new> function will be called
to create a dynamic C++ object. The XSUB will expect the class name, which
will be kept in a variable called C<CLASS>, to be given as the first
argument.
color *
color::new()
The generated C++ code will call C<new>.
RETVAL = new color();
The following is an example of a typemap that could be used for this C++
example.
TYPEMAP
color * O_OBJECT
OUTPUT
# The Perl object is blessed into 'CLASS', which should be a
# char* having the name of the package for the blessing.
O_OBJECT
sv_setref_pv( $arg, CLASS, (void*)$var );
INPUT
O_OBJECT
if( sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG) )
$var = ($type)SvIV((SV*)SvRV( $arg ));
else{
warn("${Package}::$func_name() -- " .
"$var is not a blessed SV reference");
XSRETURN_UNDEF;
}
=head2 Interface Strategy
When designing an interface between Perl and a C library a straight
translation from C to XS (such as created by C<h2xs -x>) is often sufficient.
However, sometimes the interface will look
very C-like and occasionally nonintuitive, especially when the C function
modifies one of its parameters, or returns failure inband (as in "negative
return values mean failure"). In cases where the programmer wishes to
create a more Perl-like interface the following strategy may help to
identify the more critical parts of the interface.
Identify the C functions with input/output or output parameters. The XSUBs for
these functions may be able to return lists to Perl.
Identify the C functions which use some inband info as an indication
of failure. They may be
candidates to return undef or an empty list in case of failure. If the
failure may be detected without a call to the C function, you may want to use
an INIT: section to report the failure. For failures detectable after the C
function returns one may want to use a POSTCALL: section to process the
failure. In more complicated cases use CODE: or PPCODE: sections.
If many functions use the same failure indication based on the return value,
you may want to create a special typedef to handle this situation. Put
typedef int negative_is_failure;
near the beginning of the XS file, and create an OUTPUT typemap entry
for C<negative_is_failure> which converts negative values to C<undef>, or
maybe croak()s. After this, a return value of type C<negative_is_failure>
will create a more Perl-like interface.
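The idea behind the C<negative_is_failure> typedef can be sketched in plain C. Everything here other than the typedef itself is an assumption made for illustration: C<parse_digit()> stands in for a library call that reports failure inband, and the C<maybe_int> struct stands in for "a defined value or C<undef>", which a real OUTPUT typemap entry would express by setting the Perl return value instead.

```c
typedef int negative_is_failure;

/* Hypothetical library call: returns the digit value, or -1 on failure. */
negative_is_failure parse_digit(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Plain-C model of "value or undef"; not a Perl data structure. */
typedef struct {
    int defined;   /* 0 models undef */
    int value;     /* meaningful only when defined is nonzero */
} maybe_int;

/* The conversion an OUTPUT typemap entry for the typedef could perform:
 * negative results become "undef", everything else passes through. */
maybe_int convert_result(negative_is_failure rv)
{
    maybe_int m;
    m.defined = (rv >= 0);
    m.value   = m.defined ? rv : 0;
    return m;
}
```

Keeping the conversion in one place means every XSUB declared with the typedef as its return type gets the same failure handling.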
Identify which values are used by only the C and XSUB functions
themselves, say, when a parameter to a function should be the contents of a
global variable. If Perl does not need to access the contents of the value
then it may not be necessary to provide a translation for that value
from C to Perl.
Identify the pointers in the C function parameter lists and return
values. Some pointers may be used to implement input/output or
output parameters; they can be handled in XS with the C<&> unary operator,
and, possibly, using the NO_INIT keyword.
Some others will require handling of types like C<int *>, and one needs
to decide what a useful Perl translation will do in such a case. When
the semantic is clear, it is advisable to put the translation into a typemap
file.
Identify the structures used by the C functions. In many
cases it may be helpful to use the T_PTROBJ typemap for
these structures so they can be manipulated by Perl as
blessed objects. (This is handled automatically by C<h2xs -x>.)
If the same C type is used in several different contexts which require
different translations, C<typedef> several new types mapped to this C type,
and create separate F<typemap> entries for these new types. Use these
types in declarations of return type and parameters to XSUBs.
=head2 Perl Objects And C Structures
When dealing with C structures one should select either
B<T_PTROBJ> or B<T_PTRREF> for the XS type. Both types are
designed to handle pointers to complex objects. The
T_PTRREF type will allow the Perl object to be unblessed
while the T_PTROBJ type requires that the object be blessed.
By using T_PTROBJ one can achieve a form of type-checking
because the XSUB will attempt to verify that the Perl object
is of the expected type.
The following XS code shows the getnetconfigent() function which is used
with ONC+ TIRPC. The getnetconfigent() function will return a pointer to a
C structure and has the C prototype shown below. The example will
demonstrate how the C pointer will become a Perl reference. Perl will
consider this reference to be a pointer to a blessed object and will
attempt to call a destructor for the object. A destructor will be
provided in the XS source to free the memory used by getnetconfigent().
Destructors in XS can be created by specifying an XSUB function whose name
ends with the word B<DESTROY>. XS destructors can be used to free memory
which may have been malloc'd by another XSUB.
struct netconfig *getnetconfigent(const char *netid);
A C<typedef> will be created for C<struct netconfig>. The Perl
object will be blessed in a class matching the name of the C
type, with the tag C<Ptr> appended, and the name should not
have embedded spaces if it will be a Perl package name. The
destructor will be placed in a class corresponding to the
class of the object and the PREFIX keyword will be used to
trim the name to the word DESTROY as Perl will expect.
typedef struct netconfig Netconfig;
MODULE = RPC PACKAGE = RPC
Netconfig *
getnetconfigent(netid)
char *netid
MODULE = RPC PACKAGE = NetconfigPtr PREFIX = rpcb_
void
rpcb_DESTROY(netconf)
Netconfig *netconf
CODE:
printf("Now in NetconfigPtr::DESTROY\n");
free( netconf );
This example requires the following typemap entry. Consult
L<perlxstypemap> for more information about adding new typemaps
for an extension.
TYPEMAP
Netconfig * T_PTROBJ
This example will be used with the following Perl statements.
use RPC;
$netconf = getnetconfigent("udp");
When Perl destroys the object referenced by $netconf it will send the
object to the supplied XSUB DESTROY function. Perl cannot determine, and
does not care, that this object is a C struct and not a Perl object. In
this sense, there is no difference between the object created by the
getnetconfigent() XSUB and an object created by a normal Perl subroutine.
=head2 Safely Storing Static Data in XS
Starting with Perl 5.8, a macro framework has been defined to allow
static data to be safely stored in XS modules that will be accessed from
a multi-threaded Perl.
Although primarily designed for use with multi-threaded Perl, the macros
have been designed so that they will work with non-threaded Perl as well.
It is therefore strongly recommended that these macros be used by all
XS modules that make use of static data.
The easiest way to get a template set of macros to use is by specifying
the C<-g> (C<--global>) option with h2xs (see L<h2xs>).
Below is an example module that makes use of the macros.
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
/* Global Data */
#define MY_CXT_KEY "BlindMice::_guts" XS_VERSION
typedef struct {
int count;
char name[3][100];
} my_cxt_t;
START_MY_CXT
MODULE = BlindMice PACKAGE = BlindMice
BOOT:
{
MY_CXT_INIT;
MY_CXT.count = 0;
strcpy(MY_CXT.name[0], "None");
strcpy(MY_CXT.name[1], "None");
strcpy(MY_CXT.name[2], "None");
}
int
newMouse(char * name)
PREINIT:
dMY_CXT;
CODE:
if (MY_CXT.count >= 3) {
warn("Already have 3 blind mice");
RETVAL = 0;
}
else {
RETVAL = ++ MY_CXT.count;
strcpy(MY_CXT.name[MY_CXT.count - 1], name);
}
OUTPUT:
RETVAL
char *
get_mouse_name(index)
int index
PREINIT:
dMY_CXT;
CODE:
if (index < 1 || index > MY_CXT.count)
croak("There are only 3 blind mice.");
else
RETVAL = MY_CXT.name[index - 1];
OUTPUT:
RETVAL
void
CLONE(...)
CODE:
MY_CXT_CLONE;
=head3 MY_CXT REFERENCE
=over 5
=item MY_CXT_KEY
This macro is used to define a unique key to refer to the static data
for an XS module. The suggested naming scheme, as used by h2xs, is to
use a string that consists of the module name, the string "::_guts"
and the module version number.
#define MY_CXT_KEY "MyModule::_guts" XS_VERSION
=item typedef my_cxt_t
This struct typedef I<must> always be called C<my_cxt_t>. The other
C<CXT*> macros assume the existence of the C<my_cxt_t> typedef name.
Declare a typedef named C<my_cxt_t> that is a structure that contains
all the data that needs to be interpreter-local.
typedef struct {
int some_value;
} my_cxt_t;
=item START_MY_CXT
Always place the START_MY_CXT macro directly after the declaration
of C<my_cxt_t>.
=item MY_CXT_INIT
The MY_CXT_INIT macro initializes storage for the C<my_cxt_t> struct.
It I<must> be called exactly once, typically in a BOOT: section. If you
are maintaining multiple interpreters, it should be called once in each
interpreter instance, except for interpreters cloned from existing ones.
(But see L</MY_CXT_CLONE> below.)
=item dMY_CXT
Use the dMY_CXT macro (a declaration) in all the functions that access
MY_CXT.
=item MY_CXT
Use the MY_CXT macro to access members of the C<my_cxt_t> struct. For
example, if C<my_cxt_t> is
typedef struct {
int index;
} my_cxt_t;
then use this to access the C<index> member
dMY_CXT;
MY_CXT.index = 2;
=item aMY_CXT/pMY_CXT
C<dMY_CXT> may be quite expensive to calculate, and to avoid the overhead
of invoking it in each function it is possible to pass the declaration
onto other functions using the C<aMY_CXT>/C<pMY_CXT> macros, e.g.:
void sub1() {
dMY_CXT;
MY_CXT.index = 1;
sub2(aMY_CXT);
}
void sub2(pMY_CXT) {
MY_CXT.index = 2;
}
Analogously to C<pTHX>, there are equivalent forms for when the macro is the
first or last in multiple arguments, where an underscore represents a
comma, i.e. C<_aMY_CXT>, C<aMY_CXT_>, C<_pMY_CXT> and C<pMY_CXT_>.
=item MY_CXT_CLONE
By default, when a new interpreter is created as a copy of an existing one
(e.g. via C<< threads->create() >>), both interpreters share the same physical
my_cxt_t structure. Calling C<MY_CXT_CLONE> (typically via the package's
C<CLONE()> function), causes a byte-for-byte copy of the structure to be
taken, and any future dMY_CXT will cause the copy to be accessed instead.
=item MY_CXT_INIT_INTERP(my_perl)
=item dMY_CXT_INTERP(my_perl)
These are versions of the macros which take an explicit interpreter as an
argument.
=back
Note that these macros will only work together within the I<same> source
file; that is, a dMY_CXT in one source file will access a different structure
than a dMY_CXT in another source file.
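The pattern behind C<aMY_CXT>/C<pMY_CXT> and C<MY_CXT_CLONE> can be modelled in plain C without any Perl headers. This is a loose analogy under stated assumptions, not the real macros: C<ctx_t> plays the role of C<my_cxt_t>, the explicit pointer argument plays the role of the passed context, and C<clone_ctx()> is a hypothetical helper imitating the byte-for-byte copy.

```c
#include <string.h>

/* Stand-in for my_cxt_t: all per-interpreter data lives here. */
typedef struct {
    int index;
} ctx_t;

/* The callee receives the context instead of looking it up again,
 * which is the point of pMY_CXT. */
void sub2(ctx_t *cxt)
{
    cxt->index = 2;
}

/* The caller already has the context and passes it on, like aMY_CXT. */
void sub1(ctx_t *cxt)
{
    cxt->index = 1;
    sub2(cxt);
}

/* Imitates MY_CXT_CLONE: a byte-for-byte copy, after which the two
 * structures evolve independently. */
ctx_t clone_ctx(const ctx_t *src)
{
    ctx_t copy;
    memcpy(&copy, src, sizeof copy);
    return copy;
}
```

A byte-for-byte copy is only safe while the struct holds no pointers that need deep copying; a struct with such members would need extra work after the clone.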
=head2 Thread-aware system interfaces
Starting from Perl 5.8, at the C/C++ level Perl knows how to wrap
system/library interfaces that have thread-aware versions
(e.g. getpwent_r()) into frontend macros (e.g. getpwent()) that
correctly handle the multithreaded interaction with the Perl
interpreter. This happens transparently; the only thing
you need to do is to instantiate a Perl interpreter.
This wrapping always happens when compiling Perl core source
(PERL_CORE is defined) or the Perl core extensions (PERL_EXT is
defined). When compiling XS code outside of Perl core the wrapping
does not take place. Note, however, that intermixing the _r-forms
(as Perl compiled for multithreaded operation will do) and the _r-less
forms is neither well-defined (inconsistent results, data corruption,
or even crashes become more likely), nor is it very portable.
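The difference between an C<_r> form and its C<_r>-less counterpart can be seen in a small C sketch that does not involve Perl at all. It assumes a POSIX system that provides C<gmtime_r()>; the helper name C<utc_year()> is hypothetical. The reentrant form writes into a caller-supplied buffer, whereas plain C<gmtime()> may return a pointer to shared static storage.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Return the UTC calendar year for t, or -1 on failure.
 * gmtime_r() fills the caller's buffer, so concurrent callers
 * do not overwrite each other's result. */
int utc_year(time_t t)
{
    struct tm buf;
    if (gmtime_r(&t, &buf) == NULL)
        return -1;
    return buf.tm_year + 1900;
}
```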
=head1 EXAMPLES
File C<RPC.xs>: Interface to some ONC+ RPC bind library functions.
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include <rpc/rpc.h>
typedef struct netconfig Netconfig;
MODULE = RPC PACKAGE = RPC
SV *
rpcb_gettime(host="localhost")
char *host
PREINIT:
time_t timep;
CODE:
ST(0) = sv_newmortal();
if( rpcb_gettime( host, &timep ) )
sv_setnv( ST(0), (double)timep );
Netconfig *
getnetconfigent(netid="udp")
char *netid
MODULE = RPC PACKAGE = NetconfigPtr PREFIX = rpcb_
void
rpcb_DESTROY(netconf)
Netconfig *netconf
CODE:
printf("NetconfigPtr::DESTROY\n");
free( netconf );
File C<typemap>: Custom typemap for RPC.xs. (cf. L<perlxstypemap>)
TYPEMAP
Netconfig * T_PTROBJ
File C<RPC.pm>: Perl module for the RPC extension.
package RPC;
require Exporter;
require DynaLoader;
@ISA = qw(Exporter DynaLoader);
@EXPORT = qw(rpcb_gettime getnetconfigent);
bootstrap RPC;
1;
File C<rpctest.pl>: Perl test program for the RPC extension.
use RPC;
$netconf = getnetconfigent();
$a = rpcb_gettime();
print "time = $a\n";
print "netconf = $netconf\n";
$netconf = getnetconfigent("tcp");
$a = rpcb_gettime("poplar");
print "time = $a\n";
print "netconf = $netconf\n";
=head1 CAVEATS
XS code has full access to system calls including C library functions.
It thus has the capability of interfering with things that the Perl core
or other modules have set up, such as signal handlers or file handles.
It could mess with the memory, or any number of harmful things. Don't.
Some modules have an event loop, waiting for user-input. It is highly
unlikely that two such modules would work adequately together in a
single Perl application.
In general, the perl interpreter views itself as the center of the
universe as far as the Perl program goes. XS code is viewed as a
help-mate, to accomplish things that perl doesn't do, or doesn't do fast
enough, but always subservient to perl. The closer XS code adheres to
this model, the less likely conflicts will occur.
One area where there has been conflict is in regards to C locales. (See
L<perllocale>.) perl, with one exception and unless told otherwise,
sets up the underlying locale the program is running in to the locale
passed into it from the environment. This is an important difference from a
generic C language program, where the underlying locale is the "C"
locale unless the program changes it. As of v5.20, this underlying
locale is completely hidden from pure perl code outside the lexical
scope of C<S<use locale>> except for a couple of function calls in the
POSIX module which of necessity use it. But the underlying locale, with
that one exception, is exposed to XS code, affecting all C library routines
whose behavior is locale-dependent. Your XS code had better not assume that
the underlying locale is "C". The exception is the
L<C<LC_NUMERIC>|perllocale/Category LC_NUMERIC: Numeric Formatting>
locale category, and the reason it is an exception is that experience
has shown that it can be problematic for XS code, whereas we have not
had reports of problems with the
L<other locale categories|perllocale/WHAT IS A LOCALE>. And the reason
for this one category being problematic is that the character used as a
decimal point can vary. Many European languages use a comma, whereas
English, and hence Perl, expect a dot (U+002E: FULL STOP). Many
modules can handle only the radix character being a dot, and so perl
attempts to make it so. Up through Perl v5.20, the attempt was merely
to set C<LC_NUMERIC> upon startup to the C<"C"> locale. Any
L<setlocale()|perllocale/The setlocale function> otherwise would change
it; this caused some failures. Therefore, starting in v5.22, perl tries
to keep C<LC_NUMERIC> always set to C<"C"> for XS code.
To summarize, here's what to expect and how to handle locales in XS code:
=over
=item Non-locale-aware XS code
Keep in mind that even if you think your code is not locale-aware, it
may call a C library function that is. Hopefully the man page for such
a function will indicate that dependency, but the documentation is
imperfect.
The current locale is exposed to XS code except possibly C<LC_NUMERIC>
(explained in the next paragraph).
There have not been reports of problems with the other categories.
Perl initializes things on start-up so that the current locale is the
one which is indicated by the user's environment in effect at that time.
See L<perllocale/ENVIRONMENT>.
However, up through v5.20, Perl initialized things on start-up so that
C<LC_NUMERIC> was set to the "C" locale. But if any code anywhere
changed it, it would stay changed. This means that your module can't
count on C<LC_NUMERIC> being something in particular, and you can't
expect floating point numbers (including version strings) to have dots
in them. If you don't allow for a non-dot, your code could break if
anyone anywhere changed the locale. For this reason, v5.22 changed
the behavior so that Perl tries to keep C<LC_NUMERIC> in the "C" locale
except around the operations internally where it should be something
else. Misbehaving XS code will always be able to change the locale
anyway, but the most common instance of this is checked for and
handled.
=item Locale-aware XS code
If the locale from the user's environment is desired, there should be no
need for XS code to set the locale except for C<LC_NUMERIC>, as perl has
already set it up. XS code should avoid changing the locale, as it can
adversely affect other, unrelated, code and may not be thread safe.
However, some alien libraries that may be called do set it, such as
C<Gtk>. This can cause problems for the perl core and other modules.
Starting in v5.20.1, calling the function
L<sync_locale()|perlapi/sync_locale> from XS should be sufficient to
avoid most of these problems. Prior to this, you need a pure Perl
statement that does this:
POSIX::setlocale(LC_ALL, POSIX::setlocale(LC_ALL));
In the event that your XS code may need the underlying C<LC_NUMERIC>
locale, there are macros available to access this; see
L<perlapi/Locale-related functions and macros>.
=back
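One way for C code to find out which radix character the C library is currently using, without changing the locale, is the standard C<localeconv()> function. The sketch below is plain C and uses no Perl macros; the helper names C<current_radix()> and C<radix_is_dot()> are hypothetical.

```c
#include <locale.h>
#include <string.h>

/* Return the radix (decimal point) string of the current LC_NUMERIC
 * locale. The pointer refers to storage owned by the C library. */
const char *current_radix(void)
{
    const struct lconv *lc = localeconv();
    return lc->decimal_point;
}

/* Nonzero if numbers formatted by the C library will use a dot. */
int radix_is_dot(void)
{
    return strcmp(current_radix(), ".") == 0;
}
```

Code that parses or emits floating point text can call such a check first and fall back to a locale-independent routine when the radix is not a dot.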
=head1 XS VERSION
This document covers features supported by C<ExtUtils::ParseXS>
(also known as C<xsubpp>) 3.13_01.
=head1 AUTHOR
Originally written by Dean Roehrich <F<roehrich@cray.com>>.
Maintained since 1996 by The Perl Porters <F<perlbug@perl.org>>.
=encoding utf8
=head1 NAME
perl5262delta - what is new for perl v5.26.2
=head1 DESCRIPTION
This document describes differences between the 5.26.1 release and the 5.26.2
release.
If you are upgrading from an earlier release such as 5.26.0, first read
L<perl5261delta>, which describes differences between 5.26.0 and 5.26.1.
=head1 Security
=head2 [CVE-2018-6797] heap-buffer-overflow (WRITE of size 1) in S_regatom (regcomp.c)
A crafted regular expression could cause a heap buffer write overflow, with
control over the bytes written.
L<[perl #132227]|https://rt.perl.org/Public/Bug/Display.html?id=132227>
=head2 [CVE-2018-6798] Heap-buffer-overflow in Perl__byte_dump_string (utf8.c)
Matching a crafted locale dependent regular expression could cause a heap
buffer read overflow and potentially information disclosure.
L<[perl #132063]|https://rt.perl.org/Public/Bug/Display.html?id=132063>
=head2 [CVE-2018-6913] heap-buffer-overflow in S_pack_rec
C<pack()> could cause a heap buffer write overflow with a large item count.
L<[perl #131844]|https://rt.perl.org/Public/Bug/Display.html?id=131844>
=head2 Assertion failure in Perl__core_swash_init (utf8.c)
Control characters in a supposed Unicode property name could cause perl to
crash. This has been fixed.
L<[perl #132055]|https://rt.perl.org/Public/Bug/Display.html?id=132055>
L<[perl #132553]|https://rt.perl.org/Public/Bug/Display.html?id=132553>
L<[perl #132658]|https://rt.perl.org/Public/Bug/Display.html?id=132658>
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.26.1. If any exist,
they are bugs, and we request that you submit a report. See L</Reporting
Bugs> below.
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<Module::CoreList> has been upgraded from version 5.20170922_26 to 5.20180414_26.
=item *
L<PerlIO::via> has been upgraded from version 0.16 to 0.17.
=item *
L<Term::ReadLine> has been upgraded from version 1.16 to 1.17.
=item *
L<Unicode::UCD> has been upgraded from version 0.68 to 0.69.
=back
=head1 Documentation
=head2 Changes to Existing Documentation
=head3 L<perluniprops>
=over 4
=item *
This has been updated to note that C<\p{Word}> now includes code points
matching the C<\p{Join_Control}> property. The change to the property was made
in Perl 5.18, but not documented until now. There are currently only two code
points that match this property: U+200C (ZERO WIDTH NON-JOINER) and U+200D
(ZERO WIDTH JOINER).
=back
=head1 Platform Support
=head2 Platform-Specific Notes
=over 4
=item Windows
Visual C++ compiler version detection has been improved to work on non-English
language systems.
L<[perl #132421]|https://rt.perl.org/Public/Bug/Display.html?id=132421>
We now set C<$Config{libpth}> correctly for 64-bit builds using Visual C++
versions earlier than 14.1.
L<[perl #132484]|https://rt.perl.org/Public/Bug/Display.html?id=132484>
=back
=head1 Selected Bug Fixes
=over 4
=item *
The C<readpipe()> built-in function now checks at compile time that it has only
one parameter expression, and puts it in scalar context, thus ensuring that it
doesn't corrupt the stack at runtime.
L<[perl #4574]|https://rt.perl.org/Public/Bug/Display.html?id=4574>
=item *
Fixed a use after free bug in C<pp_list> introduced in Perl 5.27.1.
L<[perl #131954]|https://rt.perl.org/Public/Bug/Display.html?id=131954>
=item *
Parsing a C<sub> definition could cause a use after free if the C<sub> keyword
was followed by whitespace including newlines (and comments).
L<[perl #131836]|https://rt.perl.org/Public/Bug/Display.html?id=131836>
=item *
The tokenizer now correctly adjusts a parse pointer when skipping whitespace in
an C< ${identifier} > construct.
L<[perl #131949]|https://rt.perl.org/Public/Bug/Display.html?id=131949>
=item *
Accesses to C<${^LAST_FH}> no longer assert after using any of a variety of I/O
operations on a non-glob.
L<[perl #128263]|https://rt.perl.org/Public/Bug/Display.html?id=128263>
=item *
C<sort> now performs correct reference counting when aliasing C<$a> and C<$b>,
thus avoiding premature destruction and leakage of scalars if they are
re-aliased during execution of the sort comparator.
L<[perl #92264]|https://rt.perl.org/Public/Bug/Display.html?id=92264>
=item *
Some convoluted kinds of regexp no longer cause an arithmetic overflow when
compiled.
L<[perl #131893]|https://rt.perl.org/Public/Bug/Display.html?id=131893>
=item *
Fixed a duplicate symbol failure with B<-flto -mieee-fp> builds. F<pp.c>
defined C<_LIB_VERSION> which B<-lieee> already defines.
L<[perl #131786]|https://rt.perl.org/Public/Bug/Display.html?id=131786>
=item *
A NULL pointer dereference in the C<S_regmatch()> function has been fixed.
L<[perl #132017]|https://rt.perl.org/Public/Bug/Display.html?id=132017>
=item *
Failures while compiling code within other constructs, such as with string
interpolation and the right part of C<s///e> now cause compilation to abort
earlier.
Previously compilation could continue in order to report other errors, but the
failed sub-parse could leave partly parsed constructs on the parser
shift-reduce stack, confusing the parser, leading to perl crashes.
L<[perl #125351]|https://rt.perl.org/Public/Bug/Display.html?id=125351>
=back
=head1 Acknowledgements
Perl 5.26.2 represents approximately 7 months of development since Perl 5.26.1
and contains approximately 3,300 lines of changes across 82 files from 17
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 1,800 lines of changes to 36 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.26.2:
Aaron Crane, Abigail, Chris 'BinGOs' Williams, H.Merijn Brand, James E Keenan,
Jarkko Hietaniemi, John SJ Anderson, Karen Etheridge, Karl Williamson, Lukas
Mai, Renee Baecker, Sawyer X, Steve Hay, Todd Rinaldo, Tony Cook, Yves Orton,
Zefram.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the perl bug database
at L<https://rt.perl.org/> . There may also be information at
L<http://www.perl.org/> , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications which make it
inappropriate to send to a publicly archived mailing list, then see
L<perlsec/SECURITY VULNERABILITY CONTACT INFORMATION>
for details of how to report the issue.
=head1 Give Thanks
If you wish to thank the Perl 5 Porters for the work we have done in Perl 5,
you can do so by running the C<perlthanks> program:
perlthanks
This will send an email to the Perl 5 Porters list with your show of thanks.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
-*- buffer-read-only: t -*-
!!!!!!! DO NOT EDIT THIS FILE !!!!!!!
This file is built by pod/perlmodlib.PL extracting documentation from the
Perl source files.
Any changes made here will be lost!
=head1 NAME
perlmodlib - constructing new Perl modules and finding existing ones
=head1 THE PERL MODULE LIBRARY
Many modules are included in the Perl distribution. These are described
below, and all end in F<.pm>. You may discover compiled library
files (usually ending in F<.so>) or small pieces of modules to be
autoloaded (ending in F<.al>); these were automatically generated
by the installation process. You may also discover files in the
library directory that end in either F<.pl> or F<.ph>. These are
old libraries supplied so that old programs that use them still
run. The F<.pl> files will all eventually be converted into standard
modules, and the F<.ph> files made by B<h2ph> will probably end up
as extension modules made by B<h2xs>. (Some F<.ph> values may
already be available through the POSIX, Errno, or Fcntl modules.)
The B<pl2pm> file in the distribution may help in your conversion,
but it's just a mechanical process and therefore far from bulletproof.
=head2 Pragmatic Modules
Pragmatic modules work somewhat like compiler directives (pragmata): they
tend to affect the compilation of your program, and thus usually work
well only when used within a C<use> or C<no>. Most of these
are lexically scoped, so an inner BLOCK may countermand them
by saying:
    no integer;
    no strict 'refs';
    no warnings;
which lasts until the end of that BLOCK.
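The lexical scoping described above can be seen with a minimal sketch such as the following. The subroutine name C<scoped_division> is made up for illustration; only the C<integer> pragma itself comes from the text.

```perl
use strict;
use warnings;

# Return the value of 7/2 as computed in four different scopes.
sub scoped_division {
    my @results;
    push @results, 7 / 2;          # default arithmetic: 3.5
    {
        use integer;               # integer arithmetic for this BLOCK
        push @results, 7 / 2;      # 3
        {
            no integer;            # countermand in the inner BLOCK only
            push @results, 7 / 2;  # 3.5
        }
        push @results, 7 / 2;      # 3 again: "no integer" has ended
    }
    return @results;
}

print join(", ", scoped_division()), "\n";   # prints "3.5, 3, 3.5, 3"
```

The effect of C<no integer> ends with its enclosing BLOCK, so the last division is done with integer arithmetic again.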
Some pragmas are lexically scoped--typically those that affect the
C<$^H> hints variable. Others affect the current package instead,
like C<use vars> and C<use subs>, which allow you to predeclare
variables or subroutines within a particular I<file> rather than
just a block. Such declarations are effective for the entire file
for which they were declared. You cannot rescind them with C<no
vars> or C<no subs>.
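As a small illustration of a declaration that is not lexically scoped, the sketch below declares a package variable with C<use vars> inside a BLOCK and then uses it outside that BLOCK. The names C<$counter> and C<counter_value> are hypothetical.

```perl
use strict;
use warnings;

{
    # Declared inside a BLOCK, yet not lexically scoped: the declaration
    # applies to the package variable $main::counter.
    use vars qw($counter);
    $counter = 1;
}

# Still legal under "use strict" outside the BLOCK.
$counter++;

sub counter_value { return $counter }

print counter_value(), "\n";   # prints "2"
```

Had C<$counter> been declared with C<my> inside the BLOCK, the increment outside it would fail to compile under C<use strict>.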
The following pragmas are defined (and have their own documentation).
=over 12
=item arybase
Set indexing base via $[
=item attributes
Get/set subroutine or variable attributes
=item autodie
Replace functions with ones that succeed or die with lexical scope
=item autodie::exception
Exceptions from autodying functions.
=item autodie::exception::system
Exceptions from autodying system().
=item autodie::hints
Provide hints about user subroutines to autodie
=item autodie::skip
Skip a package when throwing autodie exceptions
=item autouse
Postpone load of modules until a function is used
=item base
Establish an ISA relationship with base classes at compile time
=item bigint
Transparent BigInteger support for Perl
=item bignum
Transparent BigNumber support for Perl
=item bigrat
Transparent BigNumber/BigRational support for Perl
=item blib
Use MakeMaker's uninstalled version of a package
=item bytes
Expose the individual bytes of characters
=item charnames
Access to Unicode character names and named character sequences; also define character names
=item constant
Declare constants
=item deprecate
Perl pragma for deprecating the core version of a module
=item diagnostics
Produce verbose warning diagnostics
=item encoding
Allows you to write your script in non-ASCII and non-UTF-8
=item encoding::warnings
Warn on implicit encoding conversions
=item experimental
Experimental features made easy
=item feature
Enable new features
=item fields
Compile-time class fields
=item filetest
Control the filetest permission operators
=item if
C<use> a Perl module if a condition holds (also can C<no> a module)
=item integer
Use integer arithmetic instead of floating point
=item less
Request less of something
=item lib
Manipulate @INC at compile time
=item locale
Use or avoid POSIX locales for built-in operations
=item mro
Method Resolution Order
=item ok
Alternative to Test::More::use_ok
=item open
Set default PerlIO layers for input and output
=item ops
Restrict unsafe operations when compiling
=item overload
Package for overloading Perl operations
=item overloading
Lexically control overloading
=item parent
Establish an ISA relationship with base classes at compile time
=item re
Alter regular expression behaviour
=item sigtrap
Enable simple signal handling
=item sort
Control sort() behaviour
=item strict
Restrict unsafe constructs
=item subs
Predeclare sub names
=item threads
Perl interpreter-based threads
=item threads::shared
Perl extension for sharing data structures between threads
=item utf8
Enable/disable UTF-8 (or UTF-EBCDIC) in source code
=item vars
Predeclare global variable names
=item version
Perl extension for Version Objects
=item vmsish
Control VMS-specific language features
=item warnings::register
Warnings import function
=back
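As one example of using a pragma from the list above, the following sketch sets up inheritance with C<parent>. The classes C<Animal> and C<Dog> are invented for illustration; the C<-norequire> option tells C<parent> not to load F<Animal.pm>, which is appropriate here only because both classes are assumed to live in the same file.

```perl
use strict;
use warnings;

package Animal;             # hypothetical base class
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub speak { my $self = shift; return ref($self) . " makes a sound" }

package Dog;                # hypothetical subclass
# Animal is defined in this file, so do not try to load Animal.pm.
use parent -norequire, 'Animal';

package main;

my $dog = Dog->new(name => 'Rex');
print $dog->speak, "\n";    # prints "Dog makes a sound"
```

When the base class lives in its own F<.pm> file, a plain C<use parent 'Animal';> both loads the file and sets up the ISA relationship.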
=head2 Standard Modules
Standard, bundled modules are all expected to behave in a well-defined
manner with respect to namespace pollution because they use the
Exporter module. See their own documentation for details.
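A brief sketch of what this means in practice, using List::Util as the example module: only the names requested in the import list are added to the caller's namespace.

```perl
use strict;
use warnings;

use List::Util qw(sum max);   # import only these two names

print sum(1, 2, 3), " ", max(1, 2, 3), "\n";   # prints "6 3"

# min() was not requested, so it is not defined in package main.
print defined &main::min ? "min imported\n" : "min not imported\n";
```

Functions that were not imported remain reachable by their full name, for instance C<List::Util::min(1, 2, 3)>.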
It's possible that not all modules listed below are installed on your
system. For example, the GDBM_File module will not be installed if you
don't have the gdbm library.
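One way to cope with optional modules is to test at run time whether a module can be loaded. The helper C<module_available> below is a hypothetical name and only a minimal sketch; it relies on C<require> inside an C<eval> block.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded on this system, else 0.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # File::Spec -> File/Spec.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(GDBM_File File::Spec)) {
    printf "%-10s %s\n", $module,
        module_available($module) ? "installed" : "not installed";
}
```

The result for GDBM_File depends on how perl was built on your system, so no particular output is shown for it.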
=over 12
=item Amiga::ARexx
Perl extension for ARexx support
=item Amiga::Exec
Perl extension for low-level Amiga support
=item AnyDBM_File
Provide framework for multiple DBMs
=item App::Cpan
Easily interact with CPAN from the command line
=item App::Prove
Implements the C<prove> command.
=item App::Prove::State
State storage for the C<prove> command.
=item App::Prove::State::Result
Individual test suite results.
=item App::Prove::State::Result::Test
Individual test results.
=item Archive::Tar
Module for manipulations of tar archives
=item Archive::Tar::File
A subclass for in-memory extracted file from Archive::Tar
=item Attribute::Handlers
Simpler definition of attribute handlers
=item AutoLoader
Load subroutines only on demand
=item AutoSplit
Split a package for autoloading
=item B
The Perl Compiler Backend
=item B::Concise
Walk Perl syntax tree, printing concise info about ops
=item B::Debug
Walk Perl syntax tree, printing debug info about ops
=item B::Deparse
Perl compiler backend to produce perl code
=item B::Op_private
OP op_private flag definitions
=item B::Showlex
Show lexical variables used in functions or files
=item B::Terse
Walk Perl syntax tree, printing terse info about ops
=item B::Xref
Generates cross reference reports for Perl programs
=item Benchmark
Benchmark running times of Perl code
=item IO::Socket::IP
Family-neutral IP socket supporting both IPv4 and IPv6
=item Socket
Networking constants and support functions
=item CORE
Namespace for Perl's core routines
=item CPAN
Query, download and build perl modules from CPAN sites
=item CPAN::API::HOWTO
A recipe book for programming with CPAN.pm
=item CPAN::Debug
Internal debugging for CPAN.pm
=item CPAN::Distroprefs
Read and match distroprefs
=item CPAN::FirstTime
Utility for CPAN::Config file Initialization
=item CPAN::HandleConfig
Internal configuration handling for CPAN.pm
=item CPAN::Kwalify
Interface between CPAN.pm and Kwalify.pm
=item CPAN::Meta
The distribution metadata for a CPAN dist
=item CPAN::Meta::Converter
Convert CPAN distribution metadata structures
=item CPAN::Meta::Feature
An optional feature provided by a CPAN distribution
=item CPAN::Meta::History
History of CPAN Meta Spec changes
=item CPAN::Meta::History::Meta_1_0
Version 1.0 metadata specification for META.yml
=item CPAN::Meta::History::Meta_1_1
Version 1.1 metadata specification for META.yml
=item CPAN::Meta::History::Meta_1_2
Version 1.2 metadata specification for META.yml
=item CPAN::Meta::History::Meta_1_3
Version 1.3 metadata specification for META.yml
=item CPAN::Meta::History::Meta_1_4
Version 1.4 metadata specification for META.yml
=item CPAN::Meta::Merge
Merging CPAN Meta fragments
=item CPAN::Meta::Prereqs
A set of distribution prerequisites by phase and type
=item CPAN::Meta::Requirements
A set of version requirements for a CPAN dist
=item CPAN::Meta::Spec
Specification for CPAN distribution metadata
=item CPAN::Meta::Validator
Validate CPAN distribution metadata structures
=item CPAN::Meta::YAML
Read and write a subset of YAML for CPAN Meta files
=item CPAN::Nox
Wrapper around CPAN.pm without using any XS module
=item CPAN::Plugin
Base class for CPAN shell extensions
=item CPAN::Plugin::Specfile
Proof of concept implementation of a trivial CPAN::Plugin
=item CPAN::Queue
Internal queue support for CPAN.pm
=item CPAN::Tarzip
Internal handling of tar archives for CPAN.pm
=item CPAN::Version
Utility functions to compare CPAN versions
=item Carp
Alternative warn and die for modules
=item Class::Struct
Declare struct-like datatypes as Perl classes
=item Compress::Raw::Bzip2
Low-Level Interface to bzip2 compression library
=item Compress::Raw::Zlib
Low-Level Interface to zlib compression library
=item Compress::Zlib
Interface to zlib compression library
=item Config
Access Perl configuration information
=item Config::Perl::V
Structured data retrieval of perl -V output
=item Cwd
Get pathname of current working directory
=item DB
Programmatic interface to the Perl debugging API
=item DBM_Filter
Filter DBM keys/values
=item DBM_Filter::compress
Filter for DBM_Filter
=item DBM_Filter::encode
Filter for DBM_Filter
=item DBM_Filter::int32
Filter for DBM_Filter
=item DBM_Filter::null
Filter for DBM_Filter
=item DBM_Filter::utf8
Filter for DBM_Filter
=item DB_File
Perl5 access to Berkeley DB version 1.x
=item Data::Dumper
Stringified perl data structures, suitable for both printing and C<eval>
=item Devel::PPPort
Perl/Pollution/Portability
=item Devel::Peek
A data debugging tool for the XS programmer
=item Devel::SelfStubber
Generate stubs for a SelfLoading module
=item Digest
Modules that calculate message digests
=item Digest::MD5
Perl interface to the MD5 Algorithm
=item Digest::SHA
Perl extension for SHA-1/224/256/384/512
=item Digest::base
Digest base class
=item Digest::file
Calculate digests of files
=item DirHandle
Supply object methods for directory handles
=item Dumpvalue
Provides screen dump of Perl data.
=item DynaLoader
Dynamically load C libraries into Perl code
=item Encode
Character encodings in Perl
=item Encode::Alias
Alias definitions to encodings
=item Encode::Byte
Single Byte Encodings
=item Encode::CJKConstants
Internally used by Encode::??::ISO_2022_*
=item Encode::CN
China-based Chinese Encodings
=item Encode::CN::HZ
Internally used by Encode::CN
=item Encode::Config
Internally used by Encode
=item Encode::EBCDIC
EBCDIC Encodings
=item Encode::Encoder
Object Oriented Encoder
=item Encode::Encoding
Encode Implementation Base Class
=item Encode::GSM0338
ETSI GSM 03.38 Encoding
=item Encode::Guess
Guesses encoding from data
=item Encode::JP
Japanese Encodings
=item Encode::JP::H2Z
Internally used by Encode::JP::2022_JP*
=item Encode::JP::JIS7
Internally used by Encode::JP
=item Encode::KR
Korean Encodings
=item Encode::KR::2022_KR
Internally used by Encode::KR
=item Encode::MIME::Header
MIME encoding for an unstructured email header
=item Encode::MIME::Name
Internally used by Encode
=item Encode::PerlIO
A detailed document on Encode and PerlIO
=item Encode::Supported
Encodings supported by Encode
=item Encode::Symbol
Symbol Encodings
=item Encode::TW
Taiwan-based Chinese Encodings
=item Encode::Unicode
Various Unicode Transformation Formats
=item Encode::Unicode::UTF7
UTF-7 encoding
=item English
Use nice English (or awk) names for ugly punctuation variables
=item Env
Perl module that imports environment variables as scalars or arrays
=item Errno
System errno constants
=item Exporter
Implements default import method for modules
=item Exporter::Heavy
Exporter guts
=item ExtUtils::CBuilder
Compile and link C code for Perl modules
=item ExtUtils::CBuilder::Platform::Windows
Builder class for Windows platforms
=item ExtUtils::Command
Utilities to replace common UNIX commands in Makefiles etc.
=item ExtUtils::Command::MM
Commands for the MM's to use in Makefiles
=item ExtUtils::Constant
Generate XS code to import C header constants
=item ExtUtils::Constant::Base
Base class for ExtUtils::Constant objects
=item ExtUtils::Constant::Utils
Helper functions for ExtUtils::Constant
=item ExtUtils::Constant::XS
Generate C code for XS modules' constants.
=item ExtUtils::Embed
Utilities for embedding Perl in C/C++ applications
=item ExtUtils::Install
Install files from here to there
=item ExtUtils::Installed
Inventory management of installed modules
=item ExtUtils::Liblist
Determine libraries to use and how to use them
=item ExtUtils::MM
OS adjusted ExtUtils::MakeMaker subclass
=item ExtUtils::MM::Utils
ExtUtils::MM methods without dependency on ExtUtils::MakeMaker
=item ExtUtils::MM_AIX
AIX specific subclass of ExtUtils::MM_Unix
=item ExtUtils::MM_Any
Platform-agnostic MM methods
=item ExtUtils::MM_BeOS
Methods to override UN*X behaviour in ExtUtils::MakeMaker
=item ExtUtils::MM_Cygwin
Methods to override UN*X behaviour in ExtUtils::MakeMaker
=item ExtUtils::MM_DOS
DOS specific subclass of ExtUtils::MM_Unix
=item ExtUtils::MM_Darwin
Special behaviors for OS X
=item ExtUtils::MM_MacOS
Once produced Makefiles for MacOS Classic
=item ExtUtils::MM_NW5
Methods to override UN*X behaviour in ExtUtils::MakeMaker
=item ExtUtils::MM_OS2
Methods to override UN*X behaviour in ExtUtils::MakeMaker
=item ExtUtils::MM_QNX
QNX specific subclass of ExtUtils::MM_Unix
=item ExtUtils::MM_UWIN
U/WIN specific subclass of ExtUtils::MM_Unix
=item ExtUtils::MM_Unix
Methods used by ExtUtils::MakeMaker
=item ExtUtils::MM_VMS
Methods to override UN*X behaviour in ExtUtils::MakeMaker
=item ExtUtils::MM_VOS
VOS specific subclass of ExtUtils::MM_Unix
=item ExtUtils::MM_Win32
Methods to override UN*X behaviour in ExtUtils::MakeMaker
=item ExtUtils::MM_Win95
Method to customize MakeMaker for Win9X
=item ExtUtils::MY
ExtUtils::MakeMaker subclass for customization
=item ExtUtils::MakeMaker
Create a module Makefile
=item ExtUtils::MakeMaker::Config
Wrapper around Config.pm
=item ExtUtils::MakeMaker::FAQ
Frequently Asked Questions About MakeMaker
=item ExtUtils::MakeMaker::Locale
Bundled Encode::Locale
=item ExtUtils::MakeMaker::Tutorial
Writing a module with MakeMaker
=item ExtUtils::Manifest
Utilities to write and check a MANIFEST file
=item ExtUtils::Miniperl
Write the C code for miniperlmain.c and perlmain.c
=item ExtUtils::Mkbootstrap
Make a bootstrap file for use by DynaLoader
=item ExtUtils::Mksymlists
Write linker options files for dynamic extension
=item ExtUtils::Packlist
Manage .packlist files
=item ExtUtils::ParseXS
Converts Perl XS code into C code
=item ExtUtils::ParseXS::Constants
Initialization values for some globals
=item ExtUtils::ParseXS::Eval
Clean package to evaluate code in
=item ExtUtils::ParseXS::Utilities
Subroutines used with ExtUtils::ParseXS
=item ExtUtils::Typemaps
Read/Write/Modify Perl/XS typemap files
=item ExtUtils::Typemaps::Cmd
Quick commands for handling typemaps
=item ExtUtils::Typemaps::InputMap
Entry in the INPUT section of a typemap
=item ExtUtils::Typemaps::OutputMap
Entry in the OUTPUT section of a typemap
=item ExtUtils::Typemaps::Type
Entry in the TYPEMAP section of a typemap
=item ExtUtils::XSSymSet
Keep sets of symbol names palatable to the VMS linker
=item ExtUtils::testlib
Add blib/* directories to @INC
=item Fatal
Replace functions with equivalents which succeed or die
=item Fcntl
Load the C Fcntl.h defines
=item File::Basename
Parse file paths into directory, filename and suffix.
=item File::Compare
Compare files or filehandles
=item File::Copy
Copy files or filehandles
=item File::DosGlob
DOS like globbing and then some
=item File::Fetch
A generic file fetching mechanism
=item File::Find
Traverse a directory tree.
=item File::Glob
Perl extension for BSD glob routine
=item File::GlobMapper
Extend File Glob to Allow Input and Output Files
=item File::Path
Create or remove directory trees
=item File::Spec
Portably perform operations on file names
=item File::Spec::AmigaOS
File::Spec for AmigaOS
=item File::Spec::Cygwin
Methods for Cygwin file specs
=item File::Spec::Epoc
Methods for Epoc file specs
=item File::Spec::Functions
Portably perform operations on file names
=item File::Spec::Mac
File::Spec for Mac OS (Classic)
=item File::Spec::OS2
Methods for OS/2 file specs
=item File::Spec::Unix
File::Spec for Unix, base for other File::Spec modules
=item File::Spec::VMS
Methods for VMS file specs
=item File::Spec::Win32
Methods for Win32 file specs
=item File::Temp
Return name and handle of a temporary file safely
=item File::stat
By-name interface to Perl's built-in stat() functions
=item FileCache
Keep more files open than the system permits
=item FileHandle
Supply object methods for filehandles
=item Filter::Simple
Simplified source filtering
=item Filter::Util::Call
Perl Source Filter Utility Module
=item FindBin
Locate directory of original perl script
=item GDBM_File
Perl5 access to the gdbm library.
=item Getopt::Long
Extended processing of command line options
=item Getopt::Std
Process single-character switches with switch clustering
=item HTTP::Tiny
A small, simple, correct HTTP/1.1 client
=item Hash::Util
A selection of general-utility hash subroutines
=item Hash::Util::FieldHash
Support for Inside-Out Classes
=item I18N::Collate
Compare 8-bit scalar data according to the current locale
=item I18N::LangTags
Functions for dealing with RFC3066-style language tags
=item I18N::LangTags::Detect
Detect the user's language preferences
=item I18N::LangTags::List
Tags and names for human languages
=item I18N::Langinfo
Query locale information
=item IO
Load various IO modules
=item IO::Compress::Base
Base Class for IO::Compress modules
=item IO::Compress::Bzip2
Write bzip2 files/buffers
=item IO::Compress::Deflate
Write RFC 1950 files/buffers
=item IO::Compress::FAQ
Frequently Asked Questions about IO::Compress
=item IO::Compress::Gzip
Write RFC 1952 files/buffers
=item IO::Compress::RawDeflate
Write RFC 1951 files/buffers
=item IO::Compress::Zip
Write zip files/buffers
=item IO::Dir
Supply object methods for directory handles
=item IO::File
Supply object methods for filehandles
=item IO::Handle
Supply object methods for I/O handles
=item IO::Pipe
Supply object methods for pipes
=item IO::Poll
Object interface to system poll call
=item IO::Seekable
Supply seek based methods for I/O objects
=item IO::Select
OO interface to the select system call
=item IO::Socket
Object interface to socket communications
=item IO::Socket::INET
Object interface for AF_INET domain sockets
=item IO::Socket::UNIX
Object interface for AF_UNIX domain sockets
=item IO::Uncompress::AnyInflate
Uncompress zlib-based (zip, gzip) file/buffer
=item IO::Uncompress::AnyUncompress
Uncompress gzip, zip, bzip2 or lzop file/buffer
=item IO::Uncompress::Base
Base Class for IO::Uncompress modules
=item IO::Uncompress::Bunzip2
Read bzip2 files/buffers
=item IO::Uncompress::Gunzip
Read RFC 1952 files/buffers
=item IO::Uncompress::Inflate
Read RFC 1950 files/buffers
=item IO::Uncompress::RawInflate
Read RFC 1951 files/buffers
=item IO::Uncompress::Unzip
Read zip files/buffers
=item IO::Zlib
IO:: style interface to L<Compress::Zlib>
=item IPC::Cmd
Finding and running system commands made easy
=item IPC::Msg
SysV Msg IPC object class
=item IPC::Open2
Open a process for both reading and writing using open2()
=item IPC::Open3
Open a process for reading, writing, and error handling using open3()
=item IPC::Semaphore
SysV Semaphore IPC object class
=item IPC::SharedMem
SysV Shared Memory IPC object class
=item IPC::SysV
System V IPC constants and system calls
=item Internals
Reserved special namespace for internals related functions
=item JSON::PP
JSON::XS compatible pure-Perl module.
=item JSON::PP::Boolean
Dummy module providing JSON::PP::Boolean
=item List::Util
A selection of general-utility list subroutines
=item List::Util::XS
Indicate if List::Util was compiled with a C compiler
=item Locale::Codes
A distribution of modules to handle locale codes
=item Locale::Codes::API
A description of the callable function in each module
=item Locale::Codes::Changes
Details changes to Locale::Codes
=item Locale::Codes::Country
Standard codes for country identification
=item Locale::Codes::Currency
Standard codes for currency identification
=item Locale::Codes::LangExt
Standard codes for language extension identification
=item Locale::Codes::LangFam
Standard codes for language family identification
=item Locale::Codes::LangVar
Standard codes for language variation identification
=item Locale::Codes::Language
Standard codes for language identification
=item Locale::Codes::Script
Standard codes for script identification
=item Locale::Country
Standard codes for country identification
=item Locale::Currency
Standard codes for currency identification
=item Locale::Language
Standard codes for language identification
=item Locale::Maketext
Framework for localization
=item Locale::Maketext::Cookbook
Recipes for using Locale::Maketext
=item Locale::Maketext::Guts
Deprecated module to load Locale::Maketext utf8 code
=item Locale::Maketext::GutsLoader
Deprecated module to load Locale::Maketext utf8 code
=item Locale::Maketext::Simple
Simple interface to Locale::Maketext::Lexicon
=item Locale::Maketext::TPJ13
Article about software localization
=item Locale::Script
Standard codes for script identification
=item MIME::Base64
Encoding and decoding of base64 strings
=item MIME::QuotedPrint
Encoding and decoding of quoted-printable strings
=item Math::BigFloat
Arbitrary size floating point math package
=item Math::BigInt
Arbitrary size integer/float math package
=item Math::BigInt::Calc
Pure Perl module to support Math::BigInt
=item Math::BigInt::CalcEmu
Emulate low-level math with BigInt code
=item Math::BigInt::FastCalc
Math::BigInt::Calc with some XS for more speed
=item Math::BigInt::Lib
Virtual parent class for Math::BigInt libraries
=item Math::BigRat
Arbitrary big rational numbers
=item Math::Complex
Complex numbers and associated mathematical functions
=item Math::Trig
Trigonometric functions
=item Memoize
Make functions faster by trading space for time
=item Memoize::AnyDBM_File
Glue to provide EXISTS for AnyDBM_File for Storable use
=item Memoize::Expire
Plug-in module for automatic expiration of memoized values
=item Memoize::ExpireFile
Test for Memoize expiration semantics
=item Memoize::ExpireTest
Test for Memoize expiration semantics
=item Memoize::NDBM_File
Glue to provide EXISTS for NDBM_File for Storable use
=item Memoize::SDBM_File
Glue to provide EXISTS for SDBM_File for Storable use
=item Memoize::Storable
Store Memoized data in Storable database
=item Module::CoreList
What modules shipped with versions of perl
=item Module::CoreList::Utils
What utilities shipped with versions of perl
=item Module::Load
Runtime require of both modules and files
=item Module::Load::Conditional
Looking up module information / loading at runtime
=item Module::Loaded
Mark modules as loaded or unloaded
=item Module::Metadata
Gather package and POD information from perl module files
=item NDBM_File
Tied access to ndbm files
=item NEXT
Provide a pseudo-class NEXT (et al) that allows method redispatch
=item Net::Cmd
Network Command class (as used by FTP, SMTP etc)
=item Net::Config
Local configuration data for libnet
=item Net::Domain
Attempt to evaluate the current host's internet name and domain
=item Net::FTP
FTP Client class
=item Net::FTP::dataconn
FTP Client data connection class
=item Net::NNTP
NNTP Client class
=item Net::Netrc
OO interface to user's netrc file
=item Net::POP3
Post Office Protocol 3 Client class (RFC1939)
=item Net::Ping
Check a remote host for reachability
=item Net::SMTP
Simple Mail Transfer Protocol Client
=item Net::Time
Time and daytime network client interface
=item Net::hostent
By-name interface to Perl's built-in gethost*() functions
=item Net::libnetFAQ
Libnet Frequently Asked Questions
=item Net::netent
By-name interface to Perl's built-in getnet*() functions
=item Net::protoent
By-name interface to Perl's built-in getproto*() functions
=item Net::servent
By-name interface to Perl's built-in getserv*() functions
=item O
Generic interface to Perl Compiler backends
=item ODBM_File
Tied access to odbm files
=item Opcode
Disable named opcodes when compiling perl code
=item POSIX
Perl interface to IEEE Std 1003.1
=item Params::Check
A generic input parsing/checking mechanism.
=item Parse::CPAN::Meta
Parse META.yml and META.json CPAN metadata files
=item Perl::OSType
Map Perl operating system names to generic types
=item PerlIO
On demand loader for PerlIO layers and root of PerlIO::* name space
=item PerlIO::encoding
Encoding layer
=item PerlIO::mmap
Memory mapped IO
=item PerlIO::scalar
In-memory IO, scalar IO
=item PerlIO::via
Helper class for PerlIO layers implemented in perl
=item PerlIO::via::QuotedPrint
PerlIO layer for quoted-printable strings
=item Pod::Checker
Check pod documents for syntax errors
=item Pod::Escapes
For resolving Pod EE<lt>...E<gt> sequences
=item Pod::Find
Find POD documents in directory trees
=item Pod::Functions
Group Perl's functions a la perlfunc.pod
=item Pod::Html
Module to convert pod files to HTML
=item Pod::InputObjects
Objects representing POD input paragraphs, commands, etc.
=item Pod::Man
Convert POD data to formatted *roff input
=item Pod::ParseLink
Parse an LE<lt>E<gt> formatting code in POD text
=item Pod::ParseUtils
Helpers for POD parsing and conversion
=item Pod::Parser
Base class for creating POD filters and translators
=item Pod::Perldoc
Look up Perl documentation in Pod format.
=item Pod::Perldoc::BaseTo
Base for Pod::Perldoc formatters
=item Pod::Perldoc::GetOptsOO
Customized option parser for Pod::Perldoc
=item Pod::Perldoc::ToANSI
Render Pod with ANSI color escapes
=item Pod::Perldoc::ToChecker
Let Perldoc check Pod for errors
=item Pod::Perldoc::ToMan
Let Perldoc render Pod as man pages
=item Pod::Perldoc::ToNroff
Let Perldoc convert Pod to nroff
=item Pod::Perldoc::ToPod
Let Perldoc render Pod as ... Pod!
=item Pod::Perldoc::ToRtf
Let Perldoc render Pod as RTF
=item Pod::Perldoc::ToTerm
Render Pod with terminal escapes
=item Pod::Perldoc::ToText
Let Perldoc render Pod as plaintext
=item Pod::Perldoc::ToTk
Let Perldoc use Tk::Pod to render Pod
=item Pod::Perldoc::ToXml
Let Perldoc render Pod as XML
=item Pod::PlainText
Convert POD data to formatted ASCII text
=item Pod::Select
Extract selected sections of POD from input
=item Pod::Simple
Framework for parsing Pod
=item Pod::Simple::Checker
Check the Pod syntax of a document
=item Pod::Simple::Debug
Put Pod::Simple into trace/debug mode
=item Pod::Simple::DumpAsText
Dump Pod-parsing events as text
=item Pod::Simple::DumpAsXML
Turn Pod into XML
=item Pod::Simple::HTML
Convert Pod to HTML
=item Pod::Simple::HTMLBatch
Convert several Pod files to several HTML files
=item Pod::Simple::LinkSection
Represent "section" attributes of L codes
=item Pod::Simple::Methody
Turn Pod::Simple events into method calls
=item Pod::Simple::PullParser
A pull-parser interface to parsing Pod
=item Pod::Simple::PullParserEndToken
End-tokens from Pod::Simple::PullParser
=item Pod::Simple::PullParserStartToken
Start-tokens from Pod::Simple::PullParser
=item Pod::Simple::PullParserTextToken
Text-tokens from Pod::Simple::PullParser
=item Pod::Simple::PullParserToken
Tokens from Pod::Simple::PullParser
=item Pod::Simple::RTF
Format Pod as RTF
=item Pod::Simple::Search
Find POD documents in directory trees
=item Pod::Simple::SimpleTree
Parse Pod into a simple parse tree
=item Pod::Simple::Subclassing
Write a formatter as a Pod::Simple subclass
=item Pod::Simple::Text
Format Pod as plaintext
=item Pod::Simple::TextContent
Get the text content of Pod
=item Pod::Simple::XHTML
Format Pod as validating XHTML
=item Pod::Simple::XMLOutStream
Turn Pod into XML
=item Pod::Text
Convert POD data to formatted text
=item Pod::Text::Color
Convert POD data to formatted color ASCII text
=item Pod::Text::Termcap
Convert POD data to ASCII text with format escapes
=item Pod::Usage
Print a usage message from embedded pod documentation
=item SDBM_File
Tied access to sdbm files
=item Safe
Compile and execute code in restricted compartments
=item Scalar::Util
A selection of general-utility scalar subroutines
=item Search::Dict
Look - search for key in dictionary file
=item SelectSaver
Save and restore selected file handle
=item SelfLoader
Load functions only on demand
=item Storable
Persistence for Perl data structures
=item Sub::Util
A selection of utility subroutines for subs and CODE references
=item Symbol
Manipulate Perl symbols and their names
=item Sys::Hostname
Try every conceivable way to get hostname
=item Sys::Syslog
Perl interface to the UNIX syslog(3) calls
=item Sys::Syslog::Win32
Win32 support for Sys::Syslog
=item TAP::Base
Base class that provides common functionality to L<TAP::Parser>
=item TAP::Formatter::Base
Base class for harness output delegates
=item TAP::Formatter::Color
Run Perl test scripts with color
=item TAP::Formatter::Console
Harness output delegate for default console output
=item TAP::Formatter::Console::ParallelSession
Harness output delegate for parallel console output
=item TAP::Formatter::Console::Session
Harness output delegate for default console output
=item TAP::Formatter::File
Harness output delegate for file output
=item TAP::Formatter::File::Session
Harness output delegate for file output
=item TAP::Formatter::Session
Abstract base class for harness output delegate
=item TAP::Harness
Run test scripts with statistics
=item TAP::Harness::Env
Parsing harness related environmental variables where appropriate
=item TAP::Object
Base class that provides common functionality to all C<TAP::*> modules
=item TAP::Parser
Parse L<TAP|Test::Harness::TAP> output
=item TAP::Parser::Aggregator
Aggregate TAP::Parser results
=item TAP::Parser::Grammar
A grammar for the Test Anything Protocol.
=item TAP::Parser::Iterator
Base class for TAP source iterators
=item TAP::Parser::Iterator::Array
Iterator for array-based TAP sources
=item TAP::Parser::Iterator::Process
Iterator for process-based TAP sources
=item TAP::Parser::Iterator::Stream
Iterator for filehandle-based TAP sources
=item TAP::Parser::IteratorFactory
Figures out which SourceHandler objects to use for a given Source
=item TAP::Parser::Multiplexer
Multiplex multiple TAP::Parsers
=item TAP::Parser::Result
Base class for TAP::Parser output objects
=item TAP::Parser::Result::Bailout
Bailout result token.
=item TAP::Parser::Result::Comment
Comment result token.
=item TAP::Parser::Result::Plan
Plan result token.
=item TAP::Parser::Result::Pragma
TAP pragma token.
=item TAP::Parser::Result::Test
Test result token.
=item TAP::Parser::Result::Unknown
Unknown result token.
=item TAP::Parser::Result::Version
TAP syntax version token.
=item TAP::Parser::Result::YAML
YAML result token.
=item TAP::Parser::ResultFactory
Factory for creating TAP::Parser output objects
=item TAP::Parser::Scheduler
Schedule tests during parallel testing
=item TAP::Parser::Scheduler::Job
A single testing job.
=item TAP::Parser::Scheduler::Spinner
A no-op job.
=item TAP::Parser::Source
A TAP source & meta data about it
=item TAP::Parser::SourceHandler
Base class for different TAP source handlers
=item TAP::Parser::SourceHandler::Executable
Stream output from an executable TAP source
=item TAP::Parser::SourceHandler::File
Stream TAP from a text file.
=item TAP::Parser::SourceHandler::Handle
Stream TAP from an IO::Handle or a GLOB.
=item TAP::Parser::SourceHandler::Perl
Stream TAP from a Perl executable
=item TAP::Parser::SourceHandler::RawTAP
Stream output from raw TAP in a scalar/array ref.
=item TAP::Parser::YAMLish::Reader
Read YAMLish data from iterator
=item TAP::Parser::YAMLish::Writer
Write YAMLish data
=item Term::ANSIColor
Color screen output using ANSI escape sequences
=item Term::Cap
Perl termcap interface
=item Term::Complete
Perl word completion module
=item Term::ReadLine
Perl interface to various C<readline> packages.
=item Test
Provides a simple framework for writing test scripts
=item Test2
Framework for writing test tools that all work together.
=item Test2::API
Primary interface for writing Test2 based testing tools.
=item Test2::API::Breakage
What breaks at what version
=item Test2::API::Context
Object to represent a testing context.
=item Test2::API::Instance
Object used by Test2::API under the hood
=item Test2::API::Stack
Object to manage a stack of L<Test2::Hub>
=item Test2::Event
Base class for events
=item Test2::Event::Bail
Bailout!
=item Test2::Event::Diag
Diag event type
=item Test2::Event::Encoding
Set the encoding for the output stream
=item Test2::Event::Exception
Exception event
=item Test2::Event::Generic
Generic event type.
=item Test2::Event::Info
Info event base class
=item Test2::Event::Note
Note event type
=item Test2::Event::Ok
Ok event type
=item Test2::Event::Plan
The event of a plan
=item Test2::Event::Skip
Skip event type
=item Test2::Event::Subtest
Event for subtest types
=item Test2::Event::TAP::Version
Event for TAP version.
=item Test2::Event::Waiting
Tell all procs/threads it is time to be done
=item Test2::Formatter
Namespace for formatters.
=item Test2::Formatter::TAP
Standard TAP formatter
=item Test2::Hub
The conduit through which all events flow.
=item Test2::Hub::Interceptor
Hub used by interceptor to grab results.
=item Test2::Hub::Interceptor::Terminator
Exception class used by
=item Test2::Hub::Subtest
Hub used by subtests
=item Test2::IPC
Turn on IPC for threading or forking support.
=item Test2::IPC::Driver
Base class for Test2 IPC drivers.
=item Test2::IPC::Driver::Files
Temp dir + Files concurrency model.
=item Test2::Tools::Tiny
Tiny set of tools for unfortunate souls who cannot use
=item Test2::Transition
Transition notes when upgrading to Test2
=item Test2::Util
Tools used by Test2 and friends.
=item Test2::Util::ExternalMeta
Allow third party tools to safely attach meta-data
=item Test2::Util::HashBase
Build hash based classes.
=item Test2::Util::Trace
Debug information for events
=item Test::Builder
Backend for building test libraries
=item Test::Builder::Formatter
Test::Builder subclass of Test2::Formatter::TAP
=item Test::Builder::IO::Scalar
A copy of IO::Scalar for Test::Builder
=item Test::Builder::Module
Base class for test modules
=item Test::Builder::Tester
Test testsuites that have been built with
=item Test::Builder::Tester::Color
Turn on colour in Test::Builder::Tester
=item Test::Builder::TodoDiag
Test::Builder subclass of Test2::Event::Diag
=item Test::Harness
Run Perl standard test scripts with statistics
=item Test::Harness::Beyond
Beyond make test
=item Test::More
Yet another framework for writing test scripts
=item Test::Simple
Basic utilities for writing tests.
=item Test::Tester
Ease testing test modules built with Test::Builder
=item Test::Tester::Capture
Help testing test modules built with Test::Builder
=item Test::Tester::CaptureRunner
Help testing test modules built with Test::Builder
=item Test::Tutorial
A tutorial about writing really basic tests
=item Test::use::ok
Alternative to Test::More::use_ok
=item Text::Abbrev
Abbrev - create an abbreviation table from a list
=item Text::Balanced
Extract delimited text sequences from strings.
=item Text::ParseWords
Parse text into an array of tokens or array of arrays
=item Text::Tabs
Expand and unexpand tabs like unix expand(1) and unexpand(1)
=item Text::Wrap
Line wrapping to form simple paragraphs
=item Thread
Manipulate threads in Perl (for old code only)
=item Thread::Queue
Thread-safe queues
=item Thread::Semaphore
Thread-safe semaphores
=item Tie::Array
Base class for tied arrays
=item Tie::File
Access the lines of a disk file via a Perl array
=item Tie::Handle
Base class definitions for tied handles
=item Tie::Hash
Base class definitions for tied hashes
=item Tie::Hash::NamedCapture
Named regexp capture buffers
=item Tie::Memoize
Add data to hash when needed
=item Tie::RefHash
Use references as hash keys
=item Tie::Scalar
Base class definitions for tied scalars
=item Tie::StdHandle
Base class definitions for tied handles
=item Tie::SubstrHash
Fixed-table-size, fixed-key-length hashing
=item Time::HiRes
High resolution alarm, sleep, gettimeofday, interval timers
=item Time::Local
Efficiently compute time from local and GMT time
=item Time::Piece
Object Oriented time objects
=item Time::Seconds
A simple API to convert seconds to other date values
=item Time::gmtime
By-name interface to Perl's built-in gmtime() function
=item Time::localtime
By-name interface to Perl's built-in localtime() function
=item Time::tm
Internal object used by Time::gmtime and Time::localtime
=item UNIVERSAL
Base class for ALL classes (blessed references)
=item Unicode::Collate
Unicode Collation Algorithm
=item Unicode::Collate::CJK::Big5
Weighting CJK Unified Ideographs
=item Unicode::Collate::CJK::GB2312
Weighting CJK Unified Ideographs
=item Unicode::Collate::CJK::JISX0208
Weighting JIS KANJI for Unicode::Collate
=item Unicode::Collate::CJK::Korean
Weighting CJK Unified Ideographs
=item Unicode::Collate::CJK::Pinyin
Weighting CJK Unified Ideographs
=item Unicode::Collate::CJK::Stroke
Weighting CJK Unified Ideographs
=item Unicode::Collate::CJK::Zhuyin
Weighting CJK Unified Ideographs
=item Unicode::Collate::Locale
Linguistic tailoring for DUCET via Unicode::Collate
=item Unicode::Normalize
Unicode Normalization Forms
=item Unicode::UCD
Unicode character database
=item User::grent
By-name interface to Perl's built-in getgr*() functions
=item User::pwent
By-name interface to Perl's built-in getpw*() functions
=item VMS::DCLsym
Perl extension to manipulate DCL symbols
=item VMS::Filespec
Convert between VMS and Unix file specification syntax
=item VMS::Stdio
Standard I/O functions via VMS extensions
=item Win32
Interfaces to some Win32 API Functions
=item Win32API::File
Low-level access to Win32 system API calls for files/dirs.
=item Win32CORE
Win32 CORE function stubs
=item XS::APItest
Test the perl C API
=item XS::Typemap
Module to test the XS typemaps distributed with perl
=item XSLoader
Dynamically load C libraries into Perl code
=item autodie::Scope::Guard
Wrapper class for calling subs at end of scope
=item autodie::Scope::GuardStack
Hook stack for managing scopes via %^H
=item autodie::Util
Internal Utility subroutines for autodie and Fatal
=item version::Internals
Perl extension for Version Objects
=back
To find out I<all> modules installed on your system, including
those without documentation or outside the standard release,
just use the following command (under the default win32 shell,
double quotes should be used instead of single quotes).
    % perl -MFile::Find=find -MFile::Spec::Functions -Tlwe \
      'find { wanted => sub { print canonpath $_ if /\.pm\z/ },
      no_chdir => 1 }, @INC'
(The -T is here to prevent '.' from being listed in @INC.)
They should all have their own documentation installed and accessible
via your system man(1) command. If you do not have a B<find>
program, you can use the Perl B<find2perl> program instead, which
generates Perl code as output you can run through perl. If you
have a B<man> program but it doesn't find your modules, you'll have
to fix your manpath. See L<perl> for details. If you have no
system B<man> command, you might try the B<perldoc> program.
Note also that the command C<perldoc perllocal> gives you a (possibly
incomplete) list of the modules that have been further installed on
your system. (The perllocal.pod file is updated by the standard MakeMaker
install process.)
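The one-liner above can also be written as a small script. This is a minimal sketch, not part of the standard distribution; the subroutine name C<installed_module_files> is made up for illustration, and the script skips any C<@INC> entries that are hooks (references) or missing directories.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Spec::Functions qw(canonpath);

# Return the sorted, de-duplicated paths of every .pm file found
# under the given directories (hypothetical helper name).
sub installed_module_files {
    # Skip @INC hooks (references) and directories that do not exist.
    my @dirs = grep { !ref($_) && -d $_ } @_;
    return () unless @dirs;

    my %seen;
    find(
        {
            wanted   => sub { $seen{ canonpath($_) } = 1 if /\.pm\z/ },
            no_chdir => 1,    # $_ then holds the full path of each file
        },
        @dirs,
    );
    return sort keys %seen;
}

print "$_\n" for installed_module_files(@INC);
```

Because nested C<@INC> directories can be visited twice, the sketch collects paths in a hash before printing them.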
=head2 Extension Modules
Extension modules are written in C (or a mix of Perl and C). They
are usually dynamically loaded into Perl if and when you need them,
but may also be linked in statically. Supported extension modules
include Socket, Fcntl, and POSIX.
Many popular C extension modules do not come bundled (at least, not
completely) due to their sizes, volatility, or simply lack of time
for adequate testing and configuration across the multitude of
platforms on which Perl was beta-tested. You are encouraged to
look for them on CPAN (described below), or using web search engines
like Alta Vista or Google.
=head1 CPAN
CPAN stands for Comprehensive Perl Archive Network; it's a globally
replicated trove of Perl materials, including documentation, style
guides, tricks and traps, alternate ports to non-Unix systems and
occasional binary distributions for these. Search engines for
CPAN can be found at http://www.cpan.org/
Most importantly, CPAN includes around a thousand unbundled modules,
some of which require a C compiler to build. Major categories of
modules are:
=over
=item *
Language Extensions and Documentation Tools
=item *
Development Support
=item *
Operating System Interfaces
=item *
Networking, Device Control (modems) and InterProcess Communication
=item *
Data Types and Data Type Utilities
=item *
Database Interfaces
=item *
User Interfaces
=item *
Interfaces to / Emulations of Other Programming Languages
=item *
File Names, File Systems and File Locking (see also File Handles)
=item *
String Processing, Language Text Processing, Parsing, and Searching
=item *
Option, Argument, Parameter, and Configuration File Processing
=item *
Internationalization and Locale
=item *
Authentication, Security, and Encryption
=item *
World Wide Web, HTML, HTTP, CGI, MIME
=item *
Server and Daemon Utilities
=item *
Archiving and Compression
=item *
Images, Pixmap and Bitmap Manipulation, Drawing, and Graphing
=item *
Mail and Usenet News
=item *
Control Flow Utilities (callbacks and exceptions etc)
=item *
File Handle and Input/Output Stream Utilities
=item *
Miscellaneous Modules
=back
The list of the registered CPAN sites follows.
Please note that the sorting order is alphabetical on fields:
    Continent
       |
       |-->Country
             |
             |-->[state/province]
                       |
                       |-->ftp
                       |
                       |-->[http]
and thus the North American servers happen to be listed between the
European and the South American sites.
Registered CPAN sites
=for maintainers
Generated by Porting/make_modlib_cpan.pl
=head2 Africa
=over 4
=item South Africa
http://mirror.is.co.za/pub/cpan/
ftp://ftp.is.co.za/pub/cpan/
http://cpan.mirror.ac.za/
ftp://cpan.mirror.ac.za/
http://cpan.saix.net/
ftp://ftp.saix.net/pub/CPAN/
http://ftp.wa.co.za/pub/CPAN/
ftp://ftp.wa.co.za/pub/CPAN/
=item Uganda
http://mirror.ucu.ac.ug/cpan/
=item Zimbabwe
http://mirror.zol.co.zw/CPAN/
ftp://mirror.zol.co.zw/CPAN/
=back
=head2 Asia
=over 4
=item Bangladesh
http://mirror.dhakacom.com/CPAN/
ftp://mirror.dhakacom.com/CPAN/
=item China
http://cpan.communilink.net/
http://ftp.cuhk.edu.hk/pub/packages/perl/CPAN/
ftp://ftp.cuhk.edu.hk/pub/packages/perl/CPAN/
http://mirrors.hust.edu.cn/CPAN/
http://mirrors.neusoft.edu.cn/cpan/
http://mirror.lzu.edu.cn/CPAN/
http://mirrors.163.com/cpan/
http://mirrors.sohu.com/CPAN/
http://mirrors.ustc.edu.cn/CPAN/
ftp://mirrors.ustc.edu.cn/CPAN/
http://mirrors.xmu.edu.cn/CPAN/
ftp://mirrors.xmu.edu.cn/CPAN/
http://mirrors.zju.edu.cn/CPAN/
=item India
http://cpan.excellmedia.net/
http://perlmirror.indialinks.com/
=item Indonesia
http://kambing.ui.ac.id/cpan/
http://cpan.pesat.net.id/
http://mirror.poliwangi.ac.id/CPAN/
http://kartolo.sby.datautama.net.id/CPAN/
http://mirror.wanxp.id/cpan/
=item Iran
http://mirror.yazd.ac.ir/cpan/
=item Israel
http://biocourse.weizmann.ac.il/CPAN/
=item Japan
http://ftp.jaist.ac.jp/pub/CPAN/
ftp://ftp.jaist.ac.jp/pub/CPAN/
http://mirror.jre655.com/CPAN/
ftp://mirror.jre655.com/CPAN/
ftp://ftp.kddilabs.jp/CPAN/
http://ftp.nara.wide.ad.jp/pub/CPAN/
ftp://ftp.nara.wide.ad.jp/pub/CPAN/
http://ftp.riken.jp/lang/CPAN/
ftp://ftp.riken.jp/lang/CPAN/
ftp://ftp.u-aizu.ac.jp/pub/CPAN/
http://ftp.yz.yamagata-u.ac.jp/pub/lang/cpan/
ftp://ftp.yz.yamagata-u.ac.jp/pub/lang/cpan/
=item Kazakhstan
http://mirror.neolabs.kz/CPAN/
ftp://mirror.neolabs.kz/CPAN/
=item Philippines
http://mirror.pregi.net/CPAN/
ftp://mirror.pregi.net/CPAN/
http://mirror.rise.ph/cpan/
ftp://mirror.rise.ph/cpan/
=item Qatar
http://mirror.qnren.qa/CPAN/
ftp://mirror.qnren.qa/CPAN/
=item Republic of Korea
http://cpan.mirror.cdnetworks.com/
ftp://cpan.mirror.cdnetworks.com/CPAN/
http://ftp.kaist.ac.kr/pub/CPAN/
ftp://ftp.kaist.ac.kr/CPAN/
http://ftp.kr.freebsd.org/pub/CPAN/
ftp://ftp.kr.freebsd.org/pub/CPAN/
http://mirror.navercorp.com/CPAN/
http://ftp.neowiz.com/CPAN/
ftp://ftp.neowiz.com/CPAN/
=item Singapore
http://cpan.mirror.choon.net/
http://mirror.0x.sg/CPAN/
ftp://mirror.0x.sg/CPAN/
=item Taiwan
http://cpan.cdpa.nsysu.edu.tw/Unix/Lang/CPAN/
ftp://cpan.cdpa.nsysu.edu.tw/Unix/Lang/CPAN/
http://cpan.stu.edu.tw/
ftp://ftp.stu.edu.tw/CPAN/
http://ftp.yzu.edu.tw/CPAN/
ftp://ftp.yzu.edu.tw/CPAN/
http://cpan.nctu.edu.tw/
ftp://cpan.nctu.edu.tw/
http://ftp.ubuntu-tw.org/mirror/CPAN/
ftp://ftp.ubuntu-tw.org/mirror/CPAN/
=item Turkey
http://cpan.ulak.net.tr/
ftp://ftp.ulak.net.tr/pub/perl/CPAN/
http://mirror.vit.com.tr/mirror/CPAN/
ftp://mirror.vit.com.tr/CPAN/
=item Viet Nam
http://mirrors.digipower.vn/CPAN/
http://mirror.downloadvn.com/cpan/
http://mirrors.vinahost.vn/CPAN/
=back
=head2 Europe
=over 4
=item Austria
http://cpan.inode.at/
ftp://cpan.inode.at/
http://mirror.easyname.at/cpan/
ftp://mirror.easyname.at/cpan/
http://gd.tuwien.ac.at/languages/perl/CPAN/
ftp://gd.tuwien.ac.at/pub/CPAN/
=item Belarus
http://ftp.byfly.by/pub/CPAN/
ftp://ftp.byfly.by/pub/CPAN/
http://mirror.datacenter.by/pub/CPAN/
ftp://mirror.datacenter.by/pub/CPAN/
=item Belgium
http://ftp.belnet.be/ftp.cpan.org/
ftp://ftp.belnet.be/mirror/ftp.cpan.org/
http://cpan.cu.be/
http://lib.ugent.be/CPAN/
http://cpan.weepeetelecom.be/
=item Bosnia and Herzegovina
http://cpan.mirror.ba/
ftp://ftp.mirror.ba/CPAN/
=item Bulgaria
http://mirrors.neterra.net/CPAN/
ftp://mirrors.neterra.net/CPAN/
http://mirrors.netix.net/CPAN/
ftp://mirrors.netix.net/CPAN/
=item Croatia
http://ftp.carnet.hr/pub/CPAN/
ftp://ftp.carnet.hr/pub/CPAN/
=item Czech Republic
http://mirror.dkm.cz/cpan/
ftp://mirror.dkm.cz/cpan/
ftp://ftp.fi.muni.cz/pub/CPAN/
http://mirrors.nic.cz/CPAN/
ftp://mirrors.nic.cz/pub/CPAN/
http://cpan.mirror.vutbr.cz/
ftp://mirror.vutbr.cz/cpan/
=item Denmark
http://www.cpan.dk/
http://mirrors.dotsrc.org/cpan/
ftp://mirrors.dotsrc.org/cpan/
=item Finland
ftp://ftp.funet.fi/pub/languages/perl/CPAN/
=item France
http://ftp.ciril.fr/pub/cpan/
ftp://ftp.ciril.fr/pub/cpan/
http://distrib-coffee.ipsl.jussieu.fr/pub/mirrors/cpan/
ftp://distrib-coffee.ipsl.jussieu.fr/pub/mirrors/cpan/
http://ftp.lip6.fr/pub/perl/CPAN/
ftp://ftp.lip6.fr/pub/perl/CPAN/
http://mirror.ibcp.fr/pub/CPAN/
ftp://ftp.oleane.net/pub/CPAN/
http://cpan.mirrors.ovh.net/ftp.cpan.org/
ftp://cpan.mirrors.ovh.net/ftp.cpan.org/
http://cpan.enstimac.fr/
=item Germany
http://mirror.23media.de/cpan/
ftp://mirror.23media.de/cpan/
http://artfiles.org/cpan.org/
ftp://artfiles.org/cpan.org/
http://mirror.bibleonline.ru/cpan/
http://mirror.checkdomain.de/CPAN/
ftp://mirror.checkdomain.de/CPAN/
http://cpan.noris.de/
http://mirror.de.leaseweb.net/CPAN/
ftp://mirror.de.leaseweb.net/CPAN/
http://cpan.mirror.euserv.net/
ftp://mirror.euserv.net/cpan/
http://ftp-stud.hs-esslingen.de/pub/Mirrors/CPAN/
ftp://mirror.fraunhofer.de/CPAN/
ftp://ftp.freenet.de/pub/ftp.cpan.org/pub/CPAN/
http://ftp.hosteurope.de/pub/CPAN/
ftp://ftp.hosteurope.de/pub/CPAN/
ftp://ftp.fu-berlin.de/unix/languages/perl/
http://ftp.gwdg.de/pub/languages/perl/CPAN/
ftp://ftp.gwdg.de/pub/languages/perl/CPAN/
http://ftp.hawo.stw.uni-erlangen.de/CPAN/
ftp://ftp.hawo.stw.uni-erlangen.de/CPAN/
http://cpan.mirror.iphh.net/
ftp://cpan.mirror.iphh.net/pub/CPAN/
ftp://ftp.mpi-inf.mpg.de/pub/perl/CPAN/
http://cpan.netbet.org/
http://mirror.netcologne.de/cpan/
ftp://mirror.netcologne.de/cpan/
ftp://mirror.petamem.com/CPAN/
http://www.planet-elektronik.de/CPAN/
http://ftp.halifax.rwth-aachen.de/cpan/
ftp://ftp.halifax.rwth-aachen.de/cpan/
http://mirror.softaculous.com/cpan/
http://ftp.u-tx.net/CPAN/
ftp://ftp.u-tx.net/CPAN/
http://mirror.reismil.ch/CPAN/
=item Greece
http://cpan.cc.uoc.gr/mirrors/CPAN/
ftp://ftp.cc.uoc.gr/mirrors/CPAN/
http://ftp.ntua.gr/pub/lang/perl/
ftp://ftp.ntua.gr/pub/lang/perl/
=item Hungary
http://mirror.met.hu/CPAN/
=item Ireland
http://ftp.heanet.ie/mirrors/ftp.perl.org/pub/CPAN/
ftp://ftp.heanet.ie/mirrors/ftp.perl.org/pub/CPAN/
=item Italy
http://bo.mirror.garr.it/mirrors/CPAN/
ftp://ftp.eutelia.it/CPAN_Mirror/
http://cpan.panu.it/
ftp://ftp.panu.it/pub/mirrors/perl/CPAN/
http://cpan.muzzy.it/
=item Latvia
http://kvin.lv/pub/CPAN/
=item Lithuania
http://ftp.litnet.lt/pub/CPAN/
ftp://ftp.litnet.lt/pub/CPAN/
=item Moldova
http://mirror.as43289.net/pub/CPAN/
ftp://mirror.as43289.net/pub/CPAN/
=item Netherlands
http://cpan.cs.uu.nl/
ftp://ftp.cs.uu.nl/pub/CPAN/
http://mirror.nl.leaseweb.net/CPAN/
ftp://mirror.nl.leaseweb.net/CPAN/
http://ftp.nluug.nl/languages/perl/CPAN/
ftp://ftp.nluug.nl/pub/languages/perl/CPAN/
http://mirror.transip.net/CPAN/
ftp://mirror.transip.net/CPAN/
http://cpan.mirror.triple-it.nl/
http://ftp.tudelft.nl/cpan/
ftp://ftp.tudelft.nl/pub/CPAN/
ftp://download.xs4all.nl/pub/mirror/CPAN/
=item Norway
http://cpan.uib.no/
ftp://cpan.uib.no/pub/CPAN/
ftp://ftp.uninett.no/pub/languages/perl/CPAN/
http://cpan.vianett.no/
=item Poland
http://ftp.agh.edu.pl/CPAN/
ftp://ftp.agh.edu.pl/CPAN/
http://ftp.piotrkosoft.net/pub/mirrors/CPAN/
ftp://ftp.piotrkosoft.net/pub/mirrors/CPAN/
ftp://ftp.ps.pl/pub/CPAN/
http://sunsite.icm.edu.pl/pub/CPAN/
ftp://sunsite.icm.edu.pl/pub/CPAN/
=item Portugal
http://cpan.dcc.fc.up.pt/
http://mirrors.fe.up.pt/pub/CPAN/
http://cpan.perl-hackers.net/
http://cpan.perl.pt/
=item Romania
http://mirrors.hostingromania.ro/cpan.org/
ftp://ftp.lug.ro/CPAN/
http://mirrors.m247.ro/CPAN/
http://mirrors.evowise.com/CPAN/
http://mirrors.teentelecom.net/CPAN/
ftp://mirrors.teentelecom.net/CPAN/
http://mirrors.xservers.ro/CPAN/
=item Russian Federation
ftp://ftp.aha.ru/CPAN/
http://cpan.rinet.ru/
ftp://cpan.rinet.ru/pub/mirror/CPAN/
http://cpan-mirror.rbc.ru/pub/CPAN/
http://mirror.rol.ru/CPAN/
http://cpan.uni-altai.ru/
http://cpan.webdesk.ru/
ftp://cpan.webdesk.ru/cpan/
http://mirror.yandex.ru/mirrors/cpan/
ftp://mirror.yandex.ru/mirrors/cpan/
=item Serbia
http://mirror.sbb.rs/CPAN/
ftp://mirror.sbb.rs/CPAN/
=item Slovakia
http://cpan.lnx.sk/
http://tux.rainside.sk/CPAN/
ftp://tux.rainside.sk/CPAN/
=item Slovenia
http://ftp.arnes.si/software/perl/CPAN/
ftp://ftp.arnes.si/software/perl/CPAN/
=item Spain
http://mirrors.evowise.com/CPAN/
http://osl.ugr.es/CPAN/
http://ftp.rediris.es/mirror/CPAN/
ftp://ftp.rediris.es/mirror/CPAN/
=item Sweden
http://ftp.acc.umu.se/mirror/CPAN/
ftp://ftp.acc.umu.se/mirror/CPAN/
=item Switzerland
http://www.pirbot.com/mirrors/cpan/
http://mirror.switch.ch/ftp/mirror/CPAN/
ftp://mirror.switch.ch/mirror/CPAN/
=item Ukraine
http://cpan.ip-connect.vn.ua/
ftp://cpan.ip-connect.vn.ua/mirror/cpan/
=item United Kingdom
http://cpan.mirror.anlx.net/
ftp://ftp.mirror.anlx.net/CPAN/
http://mirror.bytemark.co.uk/CPAN/
ftp://mirror.bytemark.co.uk/CPAN/
http://mirrors.coreix.net/CPAN/
http://cpan.etla.org/
ftp://cpan.etla.org/pub/CPAN/
http://cpan.cpantesters.org/
http://mirror.sax.uk.as61049.net/CPAN/
http://mirror.sov.uk.goscomb.net/CPAN/
http://www.mirrorservice.org/sites/cpan.perl.org/CPAN/
ftp://ftp.mirrorservice.org/sites/cpan.perl.org/CPAN/
http://mirror.ox.ac.uk/sites/www.cpan.org/
ftp://mirror.ox.ac.uk/sites/www.cpan.org/
http://ftp.ticklers.org/pub/CPAN/
ftp://ftp.ticklers.org/pub/CPAN/
http://cpan.mirrors.uk2.net/
ftp://mirrors.uk2.net/pub/CPAN/
http://mirror.ukhost4u.com/CPAN/
=back
=head2 North America
=over 4
=item Canada
http://CPAN.mirror.rafal.ca/
ftp://CPAN.mirror.rafal.ca/pub/CPAN/
http://mirror.csclub.uwaterloo.ca/CPAN/
ftp://mirror.csclub.uwaterloo.ca/CPAN/
http://mirrors.gossamer-threads.com/CPAN/
http://mirror.its.dal.ca/cpan/
ftp://mirror.its.dal.ca/cpan/
ftp://ftp.ottix.net/pub/CPAN/
=item Costa Rica
http://mirrors.ucr.ac.cr/CPAN/
=item Mexico
http://www.msg.com.mx/CPAN/
ftp://ftp.msg.com.mx/pub/CPAN/
=item United States
=over 8
=item Alabama
http://mirror.teklinks.com/CPAN/
=item Arizona
http://mirror.n5tech.com/CPAN/
http://mirrors.namecheap.com/CPAN/
ftp://mirrors.namecheap.com/CPAN/
=item California
http://cpan.develooper.com/
http://httpupdate127.cpanel.net/CPAN/
http://mirrors.sonic.net/cpan/
ftp://mirrors.sonic.net/cpan/
http://www.perl.com/CPAN/
http://cpan.yimg.com/
=item Idaho
http://mirrors.syringanetworks.net/CPAN/
ftp://mirrors.syringanetworks.net/CPAN/
=item Illinois
http://cpan.mirrors.hoobly.com/
http://mirror.team-cymru.org/CPAN/
ftp://mirror.team-cymru.org/CPAN/
=item Indiana
http://cpan.netnitco.net/
ftp://cpan.netnitco.net/pub/mirrors/CPAN/
ftp://ftp.uwsg.iu.edu/pub/perl/CPAN/
=item Kansas
http://mirrors.concertpass.com/cpan/
=item Massachusetts
http://mirrors.ccs.neu.edu/CPAN/
=item Michigan
http://cpan.cse.msu.edu/
ftp://cpan.cse.msu.edu/
http://httpupdate118.cpanel.net/CPAN/
http://mirrors-usa.go-parts.com/cpan/
http://ftp.wayne.edu/CPAN/
ftp://ftp.wayne.edu/CPAN/
=item New Hampshire
http://mirror.metrocast.net/cpan/
=item New Jersey
http://mirror.datapipe.net/CPAN/
ftp://mirror.datapipe.net/pub/CPAN/
http://www.hoovism.com/CPAN/
ftp://ftp.hoovism.com/CPAN/
http://cpan.mirror.nac.net/
=item New York
http://mirror.cc.columbia.edu/pub/software/cpan/
ftp://mirror.cc.columbia.edu/pub/software/cpan/
http://cpan.belfry.net/
http://cpan.erlbaum.net/
ftp://cpan.erlbaum.net/CPAN/
http://cpan.hexten.net/
ftp://cpan.hexten.net/
http://mirror.nyi.net/CPAN/
ftp://mirror.nyi.net/pub/CPAN/
http://noodle.portalus.net/CPAN/
ftp://noodle.portalus.net/CPAN/
http://mirrors.rit.edu/CPAN/
ftp://mirrors.rit.edu/CPAN/
=item North Carolina
http://httpupdate140.cpanel.net/CPAN/
http://mirrors.ibiblio.org/CPAN/
=item Oregon
http://ftp.osuosl.org/pub/CPAN/
ftp://ftp.osuosl.org/pub/CPAN/
http://mirror.uoregon.edu/CPAN/
=item Pennsylvania
http://cpan.pair.com/
ftp://cpan.pair.com/pub/CPAN/
http://cpan.mirrors.ionfish.org/
=item South Carolina
http://cpan.mirror.clemson.edu/
=item Texas
http://mirror.uta.edu/CPAN/
=item Utah
http://cpan.cs.utah.edu/
ftp://cpan.cs.utah.edu/CPAN/
ftp://mirror.xmission.com/CPAN/
=item Virginia
http://mirror.cogentco.com/pub/CPAN/
ftp://mirror.cogentco.com/pub/CPAN/
http://mirror.jmu.edu/pub/CPAN/
ftp://mirror.jmu.edu/pub/CPAN/
http://mirror.us.leaseweb.net/CPAN/
ftp://mirror.us.leaseweb.net/CPAN/
=item Washington
http://cpan.llarian.net/
ftp://cpan.llarian.net/pub/CPAN/
=item Wisconsin
http://cpan.mirrors.tds.net/
ftp://cpan.mirrors.tds.net/pub/CPAN/
=back
=back
=head2 Oceania
=over 4
=item Australia
http://mirror.as24220.net/pub/cpan/
ftp://mirror.as24220.net/pub/cpan/
http://cpan.mirrors.ilisys.com.au/
http://cpan.mirror.digitalpacific.com.au/
ftp://mirror.internode.on.net/pub/cpan/
http://mirror.optusnet.com.au/CPAN/
http://cpan.mirror.serversaustralia.com.au/
http://cpan.uberglobalmirror.com/
http://mirror.waia.asn.au/pub/cpan/
=item New Caledonia
http://cpan.lagoon.nc/pub/CPAN/
ftp://cpan.lagoon.nc/pub/CPAN/
http://cpan.nautile.nc/CPAN/
ftp://cpan.nautile.nc/CPAN/
=item New Zealand
ftp://ftp.auckland.ac.nz/pub/perl/CPAN/
http://cpan.catalyst.net.nz/CPAN/
ftp://cpan.catalyst.net.nz/pub/CPAN/
http://cpan.inspire.net.nz/
ftp://cpan.inspire.net.nz/cpan/
http://mirror.webtastix.net/CPAN/
ftp://mirror.webtastix.net/CPAN/
=back
=head2 South America
=over 4
=item Argentina
http://cpan.mmgdesigns.com.ar/
=item Brazil
http://cpan.kinghost.net/
http://linorg.usp.br/CPAN/
http://mirror.nbtelecom.com.br/CPAN/
=item Chile
http://cpan.dcc.uchile.cl/
ftp://cpan.dcc.uchile.cl/pub/lang/cpan/
=back
=head2 RSYNC Mirrors
rsync://ftp.is.co.za/IS-Mirror/ftp.cpan.org/
rsync://mirror.ac.za/CPAN/
rsync://mirror.zol.co.zw/CPAN/
rsync://mirror.dhakacom.com/CPAN/
rsync://mirrors.ustc.edu.cn/CPAN/
rsync://mirrors.xmu.edu.cn/CPAN/
rsync://kambing.ui.ac.id/CPAN/
rsync://ftp.jaist.ac.jp/pub/CPAN/
rsync://mirror.jre655.com/CPAN/
rsync://ftp.kddilabs.jp/cpan/
rsync://ftp.nara.wide.ad.jp/cpan/
rsync://ftp.riken.jp/cpan/
rsync://mirror.neolabs.kz/CPAN/
rsync://mirror.qnren.qa/CPAN/
rsync://ftp.neowiz.com/CPAN/
rsync://mirror.0x.sg/CPAN/
rsync://ftp.yzu.edu.tw/pub/CPAN/
rsync://ftp.ubuntu-tw.org/CPAN/
rsync://mirrors.digipower.vn/CPAN/
rsync://cpan.inode.at/CPAN/
rsync://ftp.byfly.by/CPAN/
rsync://mirror.datacenter.by/CPAN/
rsync://ftp.belnet.be/cpan/
rsync://cpan.mirror.ba/CPAN/
rsync://mirrors.neterra.net/CPAN/
rsync://mirrors.netix.net/CPAN/
rsync://mirror.dkm.cz/cpan/
rsync://mirrors.nic.cz/CPAN/
rsync://cpan.mirror.vutbr.cz/cpan/
rsync://rsync.nic.funet.fi/CPAN/
rsync://ftp.ciril.fr/pub/cpan/
rsync://distrib-coffee.ipsl.jussieu.fr/pub/mirrors/cpan/
rsync://cpan.mirrors.ovh.net/CPAN/
rsync://mirror.de.leaseweb.net/CPAN/
rsync://mirror.euserv.net/cpan/
rsync://ftp-stud.hs-esslingen.de/CPAN/
rsync://ftp.gwdg.de/pub/languages/perl/CPAN/
rsync://ftp.hawo.stw.uni-erlangen.de/CPAN/
rsync://cpan.mirror.iphh.net/CPAN/
rsync://mirror.netcologne.de/cpan/
rsync://ftp.halifax.rwth-aachen.de/cpan/
rsync://ftp.ntua.gr/CPAN/
rsync://mirror.met.hu/CPAN/
rsync://ftp.heanet.ie/mirrors/ftp.perl.org/pub/CPAN/
rsync://rsync.panu.it/CPAN/
rsync://mirror.as43289.net/CPAN/
rsync://rsync.cs.uu.nl/CPAN/
rsync://mirror.nl.leaseweb.net/CPAN/
rsync://ftp.nluug.nl/CPAN/
rsync://mirror.transip.net/CPAN/
rsync://cpan.uib.no/cpan/
rsync://cpan.vianett.no/CPAN/
rsync://cpan.perl-hackers.net/CPAN/
rsync://cpan.perl.pt/cpan/
rsync://mirrors.m247.ro/CPAN/
rsync://mirrors.teentelecom.net/CPAN/
rsync://cpan.webdesk.ru/CPAN/
rsync://mirror.yandex.ru/mirrors/cpan/
rsync://mirror.sbb.rs/CPAN/
rsync://ftp.acc.umu.se/mirror/CPAN/
rsync://rsync.pirbot.com/ftp/cpan/
rsync://cpan.ip-connect.vn.ua/CPAN/
rsync://rsync.mirror.anlx.net/CPAN/
rsync://mirror.bytemark.co.uk/CPAN/
rsync://mirror.sax.uk.as61049.net/CPAN/
rsync://rsync.mirrorservice.org/cpan.perl.org/CPAN/
rsync://ftp.ticklers.org/CPAN/
rsync://mirrors.uk2.net/CPAN/
rsync://CPAN.mirror.rafal.ca/CPAN/
rsync://mirror.csclub.uwaterloo.ca/CPAN/
rsync://mirrors.namecheap.com/CPAN/
rsync://mirrors.syringanetworks.net/CPAN/
rsync://mirror.team-cymru.org/CPAN/
rsync://debian.cse.msu.edu/cpan/
rsync://mirrors-usa.go-parts.com/mirrors/cpan/
rsync://rsync.hoovism.com/CPAN/
rsync://mirror.cc.columbia.edu/cpan/
rsync://noodle.portalus.net/CPAN/
rsync://mirrors.rit.edu/cpan/
rsync://mirrors.ibiblio.org/CPAN/
rsync://cpan.pair.com/CPAN/
rsync://cpan.cs.utah.edu/CPAN/
rsync://mirror.cogentco.com/CPAN/
rsync://mirror.jmu.edu/CPAN/
rsync://mirror.us.leaseweb.net/CPAN/
rsync://cpan.mirror.digitalpacific.com.au/cpan/
rsync://mirror.internode.on.net/cpan/
rsync://uberglobalmirror.com/cpan/
rsync://cpan.lagoon.nc/cpan/
rsync://mirrors.mmgdesigns.com.ar/CPAN/
For an up-to-date listing of CPAN sites,
see L<http://www.cpan.org/SITES> or L<ftp://www.cpan.org/SITES>.
=head1 Modules: Creation, Use, and Abuse
(The following section is borrowed directly from Tim Bunce's modules
file, available at your nearest CPAN site.)
Perl implements a class using a package, but the presence of a
package doesn't imply the presence of a class. A package is just a
namespace. A class is a package that provides subroutines that can be
used as methods. A method is just a subroutine that expects, as its
first argument, either the name of a package (for "static" methods),
or a reference to something (for "virtual" methods).
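The distinction between the two kinds of method can be sketched as follows. The C<Counter> package and its method names are invented for this illustration; the block-form C<package> syntax assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

package Counter {
    # "Static" (class) method: the first argument is the package name.
    sub new {
        my ($class, %args) = @_;
        return bless { count => $args{start} // 0 }, $class;
    }

    # "Virtual" (instance) method: the first argument is a blessed reference.
    sub increment {
        my ($self) = @_;
        return ++$self->{count};
    }
}

my $counter = Counter->new(start => 5);   # same as Counter::new('Counter', start => 5)
my $value   = $counter->increment;        # same as Counter::increment($counter)
print "count is now $value\n";
```

The arrow syntax does nothing more than look up the subroutine in the package and pass the invocant as the first argument.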
A module is a file that (by convention) provides a class of the same
name (sans the .pm), plus an import method in that class that can be
called to fetch exported symbols. This module may implement some of
its methods by loading dynamic C or C++ objects, but that should be
totally transparent to the user of the module. Likewise, the module
might set up an AUTOLOAD function to slurp in subroutine definitions on
demand, but this is also transparent. Only the F<.pm> file is required to
exist. See L<perlsub>, L<perlobj>, and L<AutoLoader> for details about
the AUTOLOAD mechanism.
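As a rough sketch of loading subroutine definitions on demand, the following hypothetical C<Lazy::Shapes> package keeps its subroutine bodies in a private table and installs each one the first time it is called. The package name and the table are assumptions made for the example; real modules usually rely on L<AutoLoader> or a similar mechanism instead.

```perl
use strict;
use warnings;

package Lazy::Shapes {
    our $AUTOLOAD;

    # Hypothetical table of subroutine bodies, installed only on first use.
    my %deferred = (
        square_area => sub { my ($side) = @_; return $side * $side },
        rect_area   => sub { my ($w, $h) = @_; return $w * $h },
    );

    sub AUTOLOAD {
        my $name = $AUTOLOAD;
        $name =~ s/.*:://;                # strip the package qualifier
        return if $name eq 'DESTROY';     # never autoload destructors
        my $code = $deferred{$name}
            or die "No such subroutine: $AUTOLOAD";
        no strict 'refs';
        *{$AUTOLOAD} = $code;             # install it so AUTOLOAD is skipped next time
        goto &$code;                      # run it with the original @_
    }
}

my $area = Lazy::Shapes::square_area(4);
print "area: $area\n";
```

The caller cannot tell whether C<square_area> was defined up front or installed on the first call, which is the transparency described above.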
=head2 Guidelines for Module Creation
=over 4
=item *
Do similar modules already exist in some form?
If so, please try to reuse the existing modules either in whole or
by inheriting useful features into a new class. If this is not
practical, try to get together with the module authors to work on
extending or enhancing the functionality of the existing modules.
A perfect example is the plethora of packages in perl4 for dealing
with command line options.
If you are writing a module to expand an already existing set of
modules, please coordinate with the author of the package. It
helps if you follow the same naming scheme and module interaction
scheme as the original author.
=item *
Try to design the new module to be easy to extend and reuse.
Try to C<use warnings;> (or C<use warnings qw(...);>).
Remember that you can add C<no warnings qw(...);> to individual blocks
of code that need fewer warnings.
Use blessed references. Use the two argument form of bless to bless
into the class name given as the first parameter of the constructor,
e.g.:
    sub new {
        my $class = shift;
        return bless {}, $class;
    }
or even this if you'd like it to be used as either a static
or a virtual method.
    sub new {
        my $self = shift;
        my $class = ref($self) || $self;
        return bless {}, $class;
    }
Pass arrays as references so more parameters can be added later
(it's also faster). Convert functions into methods where
appropriate. Split large methods into smaller, more flexible ones.
Inherit methods from other modules if appropriate.
Avoid class name tests like: C<die "Invalid" unless ref $ref eq 'FOO'>.
Generally you can delete the C<eq 'FOO'> part with no harm at all.
Let the objects look after themselves! Generally, avoid hard-wired
class names as far as possible.
Avoid C<< $r->Class::func() >> where using C<@ISA=qw(... Class ...)> and
C<< $r->func() >> would work.
Use autosplit so little-used or newly added functions won't be a
burden to programs that don't use them. Add test functions to
the module after __END__ either using AutoSplit or by saying:
    eval join('',<main::DATA>) || die $@ unless caller();
Does your module pass the 'empty subclass' test? If you say
C<@SUBCLASS::ISA = qw(YOURCLASS);> your applications should be able
to use SUBCLASS in exactly the same way as YOURCLASS. For example,
does your application still work if you change: C<< $obj = YOURCLASS->new(); >>
into: C<< $obj = SUBCLASS->new(); >> ?
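A minimal sketch of the 'empty subclass' test follows. The C<Stack> and C<Stack::Empty> packages are invented for the example; the point is only that the constructor blesses into whatever class it was called on rather than a hard-wired name.

```perl
use strict;
use warnings;

package Stack {
    sub new {
        my $self  = shift;
        my $class = ref($self) || $self;   # usable as a class or an instance method
        return bless { items => [] }, $class;
    }
    sub add  { my $self = shift; push @{ $self->{items} }, @_; return $self }
    sub size { my $self = shift; return scalar @{ $self->{items} } }
}

package Stack::Empty {
    our @ISA = ('Stack');    # an "empty" subclass: inherits everything, adds nothing
}

# The same code should work unchanged for the class and its empty subclass.
for my $class (qw(Stack Stack::Empty)) {
    my $obj = $class->new;
    $obj->add(1, 2, 3);
    printf "%s: ref=%s size=%d\n", $class, ref($obj), $obj->size;
}
```

Had C<new> used C<bless {}, 'Stack'>, the second iteration would have produced a plain C<Stack> object and the test would have failed.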
Avoid keeping any state information in your packages. It makes it
difficult for multiple other packages to use yours. Keep state
information in objects.
Always use B<-w>.
Try to C<use strict;> (or C<use strict qw(...);>).
Remember that you can add C<no strict qw(...);> to individual blocks
of code that need less strictness.
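As an illustration of relaxing strictness for a single block, this sketch generates accessor methods for a hypothetical C<Point> class. Only the loop body that assigns to the symbol table runs under C<no strict 'refs'>; the rest of the file stays fully strict.

```perl
use strict;
use warnings;

package Point {
    sub new {
        my ($class, %args) = @_;
        my $self = { %args };
        return bless $self, $class;
    }

    # Generate one accessor per field name.
    for my $field (qw(x y)) {
        no strict 'refs';    # relaxed inside this loop body only
        *{ __PACKAGE__ . "::$field" } = sub { $_[0]{$field} };
    }
}

my $p = Point->new(x => 3, y => 4);
print $p->x() + $p->y(), "\n";
```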
Always use B<-w>.
Follow the guidelines in L<perlstyle>.
Always use B<-w>.
=item *
Some simple style guidelines
The perlstyle manual supplied with Perl has many helpful points.
Coding style is a matter of personal taste. Many people evolve their
style over several years as they learn what helps them write and
maintain good code. Here's one set of assorted suggestions that
seem to be widely used by experienced developers:
Use underscores to separate words. It is generally easier to read
$var_names_like_this than $VarNamesLikeThis, especially for
non-native speakers of English. It's also a simple rule that works
consistently with VAR_NAMES_LIKE_THIS.
Package/Module names are an exception to this rule. Perl informally
reserves lowercase module names for 'pragma' modules like integer
and strict. Other modules normally begin with a capital letter and
use mixed case with no underscores (they need to be short and portable).
You may find it helpful to use letter case to indicate the scope
or nature of a variable. For example:
    $ALL_CAPS_HERE   constants only (beware clashes with Perl vars)
    $Some_Caps_Here  package-wide global/static
    $no_caps_here    function scope my() or local() variables
Function and method names seem to work best as all lowercase.
e.g., C<< $obj->as_string() >>.
You can use a leading underscore to indicate that a variable or
function should not be used outside the package that defined it.
=item *
Select what to export.
Do NOT export method names!
Do NOT export anything else by default without a good reason!
Exports pollute the namespace of the module user. If you must
export, try to use @EXPORT_OK in preference to @EXPORT and avoid
short or common names to reduce the risk of name clashes.
Generally anything not exported is still accessible from outside the
module using the ModuleName::item_name (or C<< $blessed_ref->method >>)
syntax. By convention you can use a leading underscore on names to
indicate informally that they are 'internal' and not for public use.
(It is actually possible to get private functions by saying:
C<my $subref = sub { ... }; &$subref;>. But there's no way to call that
directly as a method, because a method must have a name in the symbol
table.)
As a general rule, if the module is trying to be object oriented
then export nothing. If it's just a collection of functions then
@EXPORT_OK anything but use @EXPORT with caution.
=item *
Select a name for the module.
This name should be as descriptive, accurate, and complete as
possible. Avoid any risk of ambiguity. Always try to use two or
more whole words. Generally the name should reflect what is special
about what the module does rather than how it does it. Please use
nested module names to group informally or categorize a module.
There should be a very good reason for a module not to have a nested name.
Module names should begin with a capital letter.
Having 57 modules all called Sort will not make life easy for anyone
(though having 23 called Sort::Quick is only marginally better :-).
Imagine someone trying to install your module alongside many others.
If you are developing a suite of related modules/classes it's good
practice to use nested classes with a common prefix as this will
avoid namespace clashes. For example: Xyz::Control, Xyz::View,
Xyz::Model etc. Use the modules in this list as a naming guide.
If adding a new module to a set, follow the original author's
standards for naming modules and the interface to methods in
those modules.
If developing modules for private internal or project specific use,
that will never be released to the public, then you should ensure
that their names will not clash with any future public module. You
can do this either by using the reserved Local::* category or by
using a category name that includes an underscore like Foo_Corp::*.
To be portable each component of a module name should be limited to
11 characters. If it might be used on MS-DOS then try to ensure each is
unique in the first 8 characters. Nested modules make this easier.
For additional guidance on the naming of modules, please consult:
http://pause.perl.org/pause/query?ACTION=pause_namingmodules
or send mail to the <module-authors@perl.org> mailing list.
=item *
Have you got it right?
How do you know that you've made the right decisions? Have you
picked an interface design that will cause problems later? Have
you picked the most appropriate name? Do you have any questions?
The best way to know for sure, and pick up many helpful suggestions,
is to ask someone who knows. The <module-authors@perl.org> mailing list
is useful for this purpose; it's also accessible via news interface as
perl.module-authors at nntp.perl.org.
All you need to do is post a short summary of the module, its
purpose and interfaces. A few lines on each of the main methods is
probably enough. (If you post the whole module it might be ignored
by busy people - generally the very people you want to read it!)
Don't worry about posting if you can't say when the module will be
ready - just say so in the message. It might be worth inviting
others to help you; they may be able to complete it for you!
=item *
README and other Additional Files.
It's well known that software developers usually fully document the
software they write. If, however, the world is in urgent need of
your software and there is not enough time to write the full
documentation please at least provide a README file containing:
=over 10
=item *
A description of the module/package/extension etc.
=item *
A copyright notice - see below.
=item *
Prerequisites - what else you may need to have.
=item *
How to build it - possible changes to Makefile.PL etc.
=item *
How to install it.
=item *
Recent changes in this release, especially incompatibilities
=item *
Changes / enhancements you plan to make in the future.
=back
If the README file seems to be getting too large you may wish to
split out some of the sections into separate files: INSTALL,
Copying, ToDo etc.
=over 4
=item *
Adding a Copyright Notice.
How you choose to license your work is a personal decision.
The general mechanism is to assert your Copyright and then make
a declaration of how others may copy/use/modify your work.
Perl, for example, is supplied with two types of licence: The GNU GPL
and The Artistic Licence (see the files README, Copying, and Artistic,
or L<perlgpl> and L<perlartistic>). Larry has good reasons for NOT
just using the GNU GPL.
My personal recommendation, out of respect for Larry, Perl, and the
Perl community at large is to state something simply like:
    Copyright (c) 1995 Your Name. All rights reserved.
    This program is free software; you can redistribute it and/or
    modify it under the same terms as Perl itself.
This statement should at least appear in the README file. You may
also wish to include it in a Copying file and your source files.
Remember to include the other words in addition to the Copyright.
=item *
Give the module a version/issue/release number.
To be fully compatible with the Exporter and MakeMaker modules you
should store your module's version number in a non-my package
variable called $VERSION. This should be a positive floating point
number with at least two digits after the decimal (i.e., hundredths,
e.g., C<$VERSION = "0.01">). Don't use a "1.3.2" style version.
See L<Exporter> for details.
It may be handy to add a function or method to retrieve the number.
Use the number in announcements and archive file names when
releasing the module (ModuleName-1.02.tar.Z).
See L<ExtUtils::MakeMaker> for details.
=item *
How to release and distribute a module.
If possible, register the module with CPAN. Follow the instructions
and links on:
http://www.cpan.org/modules/04pause.html
and upload to:
http://pause.perl.org/
and notify <modules@perl.org>. This will allow anyone to install
your module using the C<cpan> tool distributed with Perl.
By using the WWW interface you can ask the Upload Server to mirror
your modules from your ftp or WWW site into your own directory on
CPAN!
=item *
Take care when changing a released module.
Always strive to remain compatible with previous released versions.
Otherwise try to add a mechanism to revert to the
old behavior if people rely on it. Document incompatible changes.
=back
=back
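The guidelines above on naming, exporting and versioning can be sketched together in one small module. This is only an illustration: the package name C<Local::Text::Trim> and the functions C<trim_ws> and C<_strip_ends> are invented for the example, and a real module would normally live in its own F<.pm> file and be loaded with C<use>.

```perl
# Hypothetical module; Local::Text::Trim, trim_ws and _strip_ends
# are names invented for this sketch.
package Local::Text::Trim;

use strict;
use warnings;

use Exporter 'import';

our $VERSION   = '0.01';        # two digits after the decimal point
our @EXPORT    = ();            # export nothing by default
our @EXPORT_OK = qw(trim_ws);   # exported only when asked for

# Leading underscore: informally marks this function as internal.
sub _strip_ends {
    my ($text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text;
}

sub trim_ws {
    my ($text) = @_;
    return _strip_ends($text);
}

package main;

# The package is defined in this same file, so call import directly
# instead of writing "use Local::Text::Trim qw(trim_ws);".
Local::Text::Trim->import('trim_ws');

print trim_ws("  hello  "), "\n";
print "version $Local::Text::Trim::VERSION\n";
```

Because C<_strip_ends> is never listed in C<@EXPORT_OK>, it stays out of the caller's namespace, although it remains reachable as C<Local::Text::Trim::_strip_ends> if someone insists.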
=head2 Guidelines for Converting Perl 4 Library Scripts into Modules
=over 4
=item *
There is no requirement to convert anything.
If it ain't broke, don't fix it! Perl 4 library scripts should
continue to work with no problems. You may need to make some minor
changes (like escaping non-array @'s in double quoted strings) but
there is no need to convert a .pl file into a Module for just that.
=item *
Consider the implications.
All Perl applications that make use of the script will need to
be changed (slightly) if the script is converted into a module. Is
it worth it unless you plan to make other changes at the same time?
=item *
Make the most of the opportunity.
If you are going to convert the script to a module you can use the
opportunity to redesign the interface. The guidelines for module
creation above include many of the issues you should consider.
=item *
The pl2pm utility will get you started.
This utility will read *.pl files (given as parameters) and write
corresponding *.pm files. The pl2pm utility does the following:
=over 10
=item *
Adds the standard Module prologue lines
=item *
Converts package specifiers from ' to ::
=item *
Converts die(...) to croak(...)
=item *
Several other minor changes
=back
Being a mechanical process pl2pm is not bullet proof. The converted
code will need careful checking, especially any package statements.
Don't delete the original .pl file till the new .pm one works!
=back
=head2 Guidelines for Reusing Application Code
=over 4
=item *
Complete applications rarely belong in the Perl Module Library.
=item *
Many applications contain some Perl code that could be reused.
Help save the world! Share your code in a form that makes it easy
to reuse.
=item *
Break-out the reusable code into one or more separate module files.
=item *
Take the opportunity to reconsider and redesign the interfaces.
=item *
In some cases the 'application' can then be reduced to a small
fragment of code built on top of the reusable modules. In these cases
the application could be invoked as:
    % perl -e 'use Module::Name; method(@ARGV)' ...
or
    % perl -mModule::Name ...    (in perl5.002 or higher)
=back
=head1 NOTE
Perl does not enforce private and public parts of its modules as you may
have been used to in other languages like C++, Ada, or Modula-17. Perl
doesn't have an infatuation with enforced privacy. It would prefer
that you stayed out of its living room because you weren't invited, not
because it has a shotgun.
The module and its user have a contract, part of which is common law,
and part of which is "written". Part of the common law contract is
that a module doesn't pollute any namespace it wasn't asked to. The
written contract for the module (A.K.A. documentation) may make other
provisions. But then you know when you C<use RedefineTheWorld> that
you're redefining the world and willing to take the consequences.
=cut
ex: set ro:
=encoding utf8
=head1 NAME
perl5221delta - what is new for perl v5.22.1
=head1 DESCRIPTION
This document describes differences between the 5.22.0 release and the 5.22.1
release.
If you are upgrading from an earlier release such as 5.20.0, first read
L<perl5220delta>, which describes differences between 5.20.0 and 5.22.0.
=head1 Incompatible Changes
There are no changes intentionally incompatible with 5.22.0 other than the
following single exception, which we deemed to be a sensible change to make in
order to get the new C<\b{wb}> and (in particular) C<\b{sb}> features sane
before people decided they're worthless because of bugs in their Perl 5.22.0
implementation and avoided them in the future.
If any others exist, they are bugs, and we request that you submit a report.
See L</Reporting Bugs> below.
=head2 Bounds Checking Constructs
Several bugs, including a segmentation fault, have been fixed with the bounds
checking constructs (introduced in Perl 5.22) C<\b{gcb}>, C<\b{sb}>, C<\b{wb}>,
C<\B{gcb}>, C<\B{sb}>, and C<\B{wb}>. All the C<\B{}> ones now match an empty
string; none of the C<\b{}> ones do.
L<[perl #126319]|https://rt.perl.org/Ticket/Display.html?id=126319>
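A short sketch of how these boundaries behave may help. It assumes Perl 5.22.1 or later and uses only C<\b{wb}> and C<\B{wb}>; the variable names are illustrative.

```perl
use strict;
use warnings;
use 5.022;   # the \b{...} boundaries need Perl 5.22 or later

# No word break between two adjacent letters, so \B{wb} matches there.
my $inside_word = "ab" =~ /a\B{wb}b/ ? 1 : 0;

# There is a word break between a letter and a following space.
my $at_space = "a b" =~ /a\b{wb} / ? 1 : 0;

# On an empty string, \B{wb} is documented to match and \b{wb} not to.
my $empty_B = "" =~ /\B{wb}/ ? 1 : 0;
my $empty_b = "" =~ /\b{wb}/ ? 1 : 0;

print "inside word: $inside_word, at space: $at_space\n";
print "empty string: \\B{wb}=$empty_B, \\b{wb}=$empty_b\n";
```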
=head1 Modules and Pragmata
=head2 Updated Modules and Pragmata
=over 4
=item *
L<Module::CoreList> has been upgraded from version 5.20150520 to 5.20151213.
=item *
L<PerlIO::scalar> has been upgraded from version 0.22 to 0.23.
=item *
L<POSIX> has been upgraded from version 1.53 to 1.53_01.
If C<POSIX::strerror> was passed C<$!> as its argument then it accidentally
cleared C<$!>. This has been fixed.
L<[perl #126229]|https://rt.perl.org/Ticket/Display.html?id=126229>
=item *
L<Storable> has been upgraded from version 2.53 to 2.53_01.
=item *
L<warnings> has been upgraded from version 1.32 to 1.34.
The C<warnings::enabled> example now actually uses C<warnings::enabled>.
L<[perl #126051]|https://rt.perl.org/Ticket/Display.html?id=126051>
=item *
L<Win32> has been upgraded from version 0.51 to 0.52.
This has been updated for Windows 8.1, 10 and 2012 R2 Server.
=back
=head1 Documentation
=head2 Changes to Existing Documentation
=head3 L<perltie>
=over 4
=item *
The usage of C<FIRSTKEY> and C<NEXTKEY> has been clarified.
=back
=head3 L<perlvar>
=over 4
=item *
The specific true value of C<$!{E...}> is now documented, noting that it is
subject to change and not guaranteed.
=back
=head1 Diagnostics
The following additions or changes have been made to diagnostic output,
including warnings and fatal error messages. For the complete list of
diagnostic messages, see L<perldiag>.
=head2 Changes to Existing Diagnostics
=over 4
=item *
The C<printf> and C<sprintf> builtins are now more careful about the warnings
they emit: argument reordering now disables the "redundant argument" warning in
all cases.
L<[perl #125469]|https://rt.perl.org/Ticket/Display.html?id=125469>
=back
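As a rough illustration of the C<sprintf> change, the sketch below captures warnings in an array (an assumption made for the example, not something the change requires) and formats with explicit argument indexes.

```perl
use strict;
use warnings;

# Collect any warnings instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my ($first, $second) = ("one", "two");

# Explicit indexes (%2$s, %1$s) reorder the arguments.
my $swapped = sprintf('%2$s %1$s', $first, $second);

# Only argument 1 is referenced. Because explicit indexes are in use,
# the unused second argument does not raise "Redundant argument".
my $only_first = sprintf('%1$s', $first, $second);

print "$swapped\n";
print "$only_first\n";
print scalar(@warnings), " warning(s) captured\n";
```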
=head1 Configuration and Compilation
=over 4
=item *
Using the C<NO_HASH_SEED> define in combination with the default hash algorithm
C<PERL_HASH_FUNC_ONE_AT_A_TIME_HARD> resulted in a fatal error while compiling
the interpreter, since Perl 5.17.10. This has been fixed.
=item *
Configuring with ccflags containing quotes (e.g.
C<< -Accflags='-DAPPLLIB_EXP=\"/usr/libperl\"' >>) was broken in Perl 5.22.0
but has now been fixed again.
L<[perl #125314]|https://rt.perl.org/Ticket/Display.html?id=125314>
=back
=head1 Platform Support
=head2 Platform-Specific Notes
=over 4
=item IRIX
=over
=item *
Under some circumstances IRIX stdio fgetc() and fread() set the errno to
C<ENOENT>, which made no sense according to either IRIX or POSIX docs. Errno
is now cleared in such cases.
L<[perl #123977]|https://rt.perl.org/Ticket/Display.html?id=123977>
=item *
Problems when multiplying long doubles by infinity have been fixed.
L<[perl #126396]|https://rt.perl.org/Ticket/Display.html?id=126396>
=item *
All tests pass now on IRIX with the default build configuration.
=back
=back
=head1 Selected Bug Fixes
=over 4
=item *
C<qr/(?[ () ])/> no longer segfaults, giving a syntax error message instead.
L<[perl #125805]|https://rt.perl.org/Ticket/Display.html?id=125805>
=item *
A Perl 5.20 regression in regular expression possessive quantifiers has been fixed.
C<qr/>I<PAT>C<{>I<min>,I<max>C<}+>C</> is supposed to behave identically to
C<qr/(?E<gt>>I<PAT>C<{>I<min>,I<max>C<})/>. Since Perl 5.20, this didn't work
if I<min> and I<max> were equal.
L<[perl #125825]|https://rt.perl.org/Ticket/Display.html?id=125825>
=item *
Certain syntax errors in
L<perlrecharclass/Extended Bracketed Character Classes> caused panics instead
of the proper error message. This has now been fixed.
L<[perl #126481]|https://rt.perl.org/Ticket/Display.html?id=126481>
=item *
C<< BEGIN <> >> no longer segfaults and properly produces an error message.
L<[perl #125341]|https://rt.perl.org/Ticket/Display.html?id=125341>
=item *
A regression from Perl 5.20 has been fixed, in which some syntax errors in
L<C<(?[...])>|perlrecharclass/Extended Bracketed Character Classes> constructs
within regular expression patterns could cause a segfault instead of a proper
error message.
L<[perl #126180]|https://rt.perl.org/Ticket/Display.html?id=126180>
=item *
Another problem with
L<C<(?[...])>|perlrecharclass/Extended Bracketed Character Classes>
constructs has been fixed wherein things like C<\c]> could cause panics.
L<[perl #126181]|https://rt.perl.org/Ticket/Display.html?id=126181>
=item *
In Perl 5.22.0, the logic changed when parsing a numeric parameter to the -C
option, such that the successfully parsed number was not saved as the option
value if it parsed to the end of the argument. This has now been fixed.
L<[perl #125381]|https://rt.perl.org/Ticket/Display.html?id=125381>
=item *
Warning fatality is now ignored when rewinding the stack. This prevents
infinite recursion when the now fatal error also causes rewinding of the stack.
L<[perl #123398]|https://rt.perl.org/Ticket/Display.html?id=123398>
=item *
A crash with C<< %::=(); J->${\"::"} >> has been fixed.
L<[perl #125541]|https://rt.perl.org/Ticket/Display.html?id=125541>
=item *
Nested quantifiers such as C</.{1}??/> should cause perl to throw a fatal
error, but were being silently accepted since Perl 5.20.0. This has been
fixed.
L<[perl #126253]|https://rt.perl.org/Ticket/Display.html?id=126253>
=item *
Regular expression sequences such as C</(?i/> (and similarly with other
recognized flags or combination of flags) should cause perl to throw a fatal
error, but were being silently accepted since Perl 5.18.0. This has been
fixed.
L<[perl #126178]|https://rt.perl.org/Ticket/Display.html?id=126178>
=item *
A bug in hexadecimal floating point literal support meant that high-order bits
could be lost in cases where mantissa overflow was caused by too many trailing
zeros in the fractional part. This has been fixed.
L<[perl #126582]|https://rt.perl.org/Ticket/Display.html?id=126582>
=item *
Another hexadecimal floating point bug, causing low-order bits to be lost in
cases where the last hexadecimal digit of the mantissa has bits straddling the
limit of the number of bits allowed for the mantissa, has also been fixed.
L<[perl #126586]|https://rt.perl.org/Ticket/Display.html?id=126586>
=item *
Further hexadecimal floating point bugs have been fixed: In some circumstances,
the C<%a> format specifier could variously lose the sign of the negative zero,
fail to display zeros after the radix point with the requested precision, or
even lose the radix point after the leftmost hexadecimal digit completely.
=item *
A crash caused by incomplete expressions within C<< /(?[ ])/ >> (e.g.
C<< /(?[[0]+()+])/ >>) has been fixed.
L<[perl #126615]|https://rt.perl.org/Ticket/Display.html?id=126615>
=back
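The possessive quantifier fix listed above concerns the equivalence between C<{min,max}+> and an atomic group. A minimal sketch of that equivalence, using an invented three-character test string:

```perl
use strict;
use warnings;

my $text = "aaa";

# Possessive: a{1,3}+ takes all three "a"s and never gives one back,
# so the trailing "a" in the pattern has nothing left to match.
my $possessive = $text =~ /^a{1,3}+a$/ ? 1 : 0;

# The same thing spelled as an atomic group.
my $atomic = $text =~ /^(?>a{1,3})a$/ ? 1 : 0;

# Ordinary greedy quantifier: backtracks to leave one "a" for the tail.
my $greedy = $text =~ /^a{1,3}a$/ ? 1 : 0;

# The case the fix is about: min and max are equal.
my $equal_bounds        = $text =~ /^a{2,2}+a$/    ? 1 : 0;
my $equal_bounds_atomic = $text =~ /^(?>a{2,2})a$/ ? 1 : 0;

print "possessive=$possessive atomic=$atomic greedy=$greedy\n";
print "equal bounds: $equal_bounds (atomic form: $equal_bounds_atomic)\n";
```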
=head1 Acknowledgements
Perl 5.22.1 represents approximately 6 months of development since Perl 5.22.0
and contains approximately 19,000 lines of changes across 130 files from 27
authors.
Excluding auto-generated files, documentation and release tools, there were
approximately 1,700 lines of changes to 44 .pm, .t, .c and .h files.
Perl continues to flourish into its third decade thanks to a vibrant community
of users and developers. The following people are known to have contributed
the improvements that became Perl 5.22.1:
Aaron Crane, Abigail, Andy Broad, Aristotle Pagaltzis, Chase Whitener, Chris
'BinGOs' Williams, Craig A. Berry, Daniel Dragan, David Mitchell, Father
Chrysostomos, Herbert Breunung, Hugo van der Sanden, James E Keenan, Jan
Dubois, Jarkko Hietaniemi, Karen Etheridge, Karl Williamson, Lukas Mai, Matthew
Horsfall, Peter Martini, Rafael Garcia-Suarez, Ricardo Signes, Shlomi Fish,
Sisyphus, Steve Hay, Tony Cook, Victor Adam.
The list above is almost certainly incomplete as it is automatically generated
from version control history. In particular, it does not include the names of
the (very much appreciated) contributors who reported issues to the Perl bug
tracker.
Many of the changes included in this version originated in the CPAN modules
included in Perl's core. We're grateful to the entire CPAN community for
helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see
the F<AUTHORS> file in the Perl source distribution.
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles recently
posted to the comp.lang.perl.misc newsgroup and the perl bug database at
https://rt.perl.org/ . There may also be information at
http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the L<perlbug> program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of C<perl -V>,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it
inappropriate to send to a publicly archived mailing list, then please send it
to perl5-security-report@perl.org. This points to a closed subscription
unarchived mailing list, which includes all the core committers, who will be
able to help assess the impact of issues, figure out a resolution, and help
co-ordinate the release of patches to mitigate or fix the problem across all
platforms on which Perl is supported. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.
=head1 SEE ALSO
The F<Changes> file for an explanation of how to view exhaustive details on
what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perl56delta - what's new for perl v5.6.0
=head1 DESCRIPTION
This document describes differences between the 5.005 release and the 5.6.0
release.
=head1 Core Enhancements
=head2 Interpreter cloning, threads, and concurrency
Perl 5.6.0 introduces the beginnings of support for running multiple
interpreters concurrently in different threads. In conjunction with
the perl_clone() API call, which can be used to selectively duplicate
the state of any given interpreter, it is possible to compile a
piece of code once in an interpreter, clone that interpreter
one or more times, and run all the resulting interpreters in distinct
threads.
On the Windows platform, this feature is used to emulate fork() at the
interpreter level. See L<perlfork> for details about that.
This feature is still in evolution. It is eventually meant to be used
to selectively clone a subroutine and data reachable from that
subroutine in a separate interpreter and run the cloned subroutine
in a separate thread. Since there is no shared data between the
interpreters, little or no locking will be needed (unless parts of
the symbol table are explicitly shared). This is obviously intended
to be an easy-to-use replacement for the existing threads support.
Support for cloning interpreters and interpreter concurrency can be
enabled using the -Dusethreads Configure option (see win32/Makefile for
how to enable it on Windows.) The resulting perl executable will be
functionally identical to one that was built with -Dusemultiplicity, but
the perl_clone() API call will only be available in the former.
-Dusethreads enables the cpp macro USE_ITHREADS by default, which in turn
enables Perl source code changes that provide a clear separation between
the op tree and the data it operates with. The former is immutable, and
can therefore be shared between an interpreter and all of its clones,
while the latter is considered local to each interpreter, and is therefore
copied for each clone.
Note that building Perl with the -Dusemultiplicity Configure option
is adequate if you wish to run multiple B<independent> interpreters
concurrently in different threads. -Dusethreads only provides the
additional functionality of the perl_clone() API call and other
support for running B<cloned> interpreters concurrently.
NOTE: This is an experimental feature. Implementation details are
subject to change.
=head2 Lexically scoped warning categories
You can now control the granularity of warnings emitted by perl at a finer
level using the C<use warnings> pragma. L<warnings> and L<perllexwarn>
have copious documentation on this feature.
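As a small sketch of what lexical scoping means here, the example below switches off one warning category inside a single block. Capturing warnings in C<@captured> is just a device for the example.

```perl
use strict;
use warnings;

my @captured;
$SIG{__WARN__} = sub { push @captured, $_[0] };

my $undefined;

{
    # Only this block has the 'uninitialized' category switched off.
    no warnings 'uninitialized';
    my $quiet = "value: " . $undefined;
}

# Outside the block the category is active again, so this one warns.
my $noisy = "value: " . $undefined;

print scalar(@captured), " warning(s) captured\n";
```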
=head2 Unicode and UTF-8 support
Perl now uses UTF-8 as its internal representation for character
strings. The C<utf8> and C<bytes> pragmas are used to control this support
in the current lexical scope. See L<perlunicode>, L<utf8> and L<bytes> for
more information.
This feature is expected to evolve quickly to support some form of I/O
disciplines that can be used to specify the kind of input and output data
(bytes or characters). Until that happens, additional modules from CPAN
will be needed to complete the toolkit for dealing with Unicode.
NOTE: This should be considered an experimental feature. Implementation
details are subject to change.
=head2 Support for interpolating named characters
The new C<\N> escape interpolates named characters within strings.
For example, C<"Hi! \N{WHITE SMILING FACE}"> evaluates to a string
with a unicode smiley face at the end.
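A minimal sketch of the escape in use follows. Loading C<charnames> explicitly is an assumption made for portability to older perls; newer ones load it on demand.

```perl
use strict;
use warnings;
use charnames ':full';   # needed on older perls, harmless on newer ones

my $greeting = "Hi! \N{WHITE SMILING FACE}";
my $last     = substr($greeting, -1);

# Print the code point rather than the character itself, to avoid
# depending on the output encoding.
printf "last character is U+%04X\n", ord($last);
```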
=head2 "our" declarations
An "our" declaration introduces a value that can be best understood
as a lexically scoped symbolic alias to a global variable in the
package that was current where the variable was declared. This is
mostly useful as an alternative to the C<vars> pragma, but also provides
the opportunity to introduce typing and other attributes for such
variables. See L<perlfunc/our>.
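For instance, under C<use strict> an "our" declaration lets a package refer to its own global by its short name. The package name C<Counter> and the function C<bump> below are hypothetical.

```perl
package Counter;   # hypothetical package name
use strict;
use warnings;

our $count = 0;    # lexically scoped alias for $Counter::count

sub bump { return ++$count }

package main;
use strict;
use warnings;

Counter::bump() for 1 .. 3;

# The same variable is reachable by its fully qualified name.
print "count is $Counter::count\n";
```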
=head2 Support for strings represented as a vector of ordinals
Literals of the form C<v1.2.3.4> are now parsed as a string composed
of characters with the specified ordinals. This is an alternative, more
readable way to construct (possibly unicode) strings instead of
interpolating characters, as in C<"\x{1}\x{2}\x{3}\x{4}">. The leading
C<v> may be omitted if there are more than two ordinals, so C<1.2.3> is
parsed the same as C<v1.2.3>.
Strings written in this form are also useful to represent version "numbers".
It is easy to compare such version "numbers" (which are really just plain
strings) using any of the usual string comparison operators C<eq>, C<ne>,
C<lt>, C<gt>, etc., or perform bitwise string operations on them using C<|>,
C<&>, etc.
In conjunction with the new C<$^V> magic variable (which contains
the perl version as a string), such literals can be used as a readable way
to check if you're running a particular version of Perl:
    # this will parse in older versions of Perl also
    if ($^V and $^V gt v5.6.0) {
        # new features supported
    }
C<require> and C<use> also have some special magic to support such
literals, but this particular usage should be avoided because it leads to
misleading error messages under versions of Perl which don't support vector
strings. Using a true version number will ensure correct behavior in all
versions of Perl:
    require 5.006;    # run time check for v5.6
    use 5.006_001;    # compile time check for v5.6.1
Also, C<sprintf> and C<printf> support the Perl-specific format flag C<%v>
to print ordinals of characters in arbitrary strings:
    printf "v%vd", $^V;          # prints current version, such as "v5.5.650"
    printf "%*vX", ":", $addr;   # formats IPv6 address
    printf "%*vb", " ", $bits;   # displays bitstring
See L<perldata/"Scalar value constructors"> for additional information.
=head2 Improved Perl version numbering system
Beginning with Perl version 5.6.0, the version number convention has been
changed to a "dotted integer" scheme that is more commonly found in open
source projects.
Maintenance versions of v5.6.0 will be released as v5.6.1, v5.6.2 etc.
The next development series following v5.6.0 will be numbered v5.7.x,
beginning with v5.7.0, and the next major production release following
v5.6.0 will be v5.8.0.
The English module now sets $PERL_VERSION to $^V (a string value) rather
than C<$]> (a numeric value). (This is a potential incompatibility.
Send us a report via perlbug if you are affected by this.)
The v1.2.3 syntax is also now legal in Perl.
See L</Support for strings represented as a vector of ordinals> for more on that.
To cope with the new versioning system's use of at least three significant
digits for each version component, the method used for incrementing the
subversion number has also changed slightly. We assume that versions older
than v5.6.0 have been incrementing the subversion component in multiples of
10. Versions after v5.6.0 will increment them by 1. Thus, using the new
notation, 5.005_03 is the "same" as v5.5.30, and the first maintenance
version following v5.6.0 will be v5.6.1 (which should be read as being
equivalent to a floating point value of 5.006_001 in the older format,
stored in C<$]>).
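The correspondence between the two notations can be worked through mechanically: read the fraction of the old floating point value as two groups of three digits. The helper name C<dotted_from_numeric> below is invented for this sketch.

```perl
use strict;
use warnings;

# Split a numeric version in the style of $] into three components,
# reading the fraction as two groups of three digits.
sub dotted_from_numeric {
    my ($numeric) = @_;
    my ($major, $fraction) = split /\./, sprintf("%.6f", $numeric);
    my ($minor, $patch) = unpack "A3 A3", $fraction;
    return join ".", $major + 0, $minor + 0, $patch + 0;
}

print dotted_from_numeric(5.005_03),  "\n";   # 5.5.30
print dotted_from_numeric(5.006_001), "\n";   # 5.6.1
```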
=head2 New syntax for declaring subroutine attributes
Formerly, if you wanted to mark a subroutine as being a method call or
as requiring an automatic lock() when it is entered, you had to declare
that with a C<use attrs> pragma in the body of the subroutine.
That can now be accomplished with declaration syntax, like this:
    sub mymethod : locked method;
    ...
    sub mymethod : locked method {
        ...
    }
    sub othermethod :locked :method;
    ...
    sub othermethod :locked :method {
        ...
    }
(Note how only the first C<:> is mandatory, and whitespace surrounding
the C<:> is optional.)
F<AutoSplit.pm> and F<SelfLoader.pm> have been updated to keep the attributes
with the stubs they provide. See L<attributes>.
=head2 File and directory handles can be autovivified
Similar to how constructs such as C<< $x->[0] >> autovivify a reference,
handle constructors (open(), opendir(), pipe(), socketpair(), sysopen(),
socket(), and accept()) now autovivify a file or directory handle
if the handle passed to them is an uninitialized scalar variable. This
allows the constructs such as C<open(my $fh, ...)> and C<open(local $fh,...)>
to be used to create filehandles that will conveniently be closed
automatically when the scope ends, provided there are no other references
to them. This largely eliminates the need for typeglobs when opening
filehandles that must be passed around, as in the following example:
    sub myopen {
        open my $fh, "@_"
            or die "Can't open '@_': $!";
        return $fh;
    }
    {
        my $f = myopen("</etc/motd");
        print <$f>;
        # $f implicitly closed here
    }
=head2 open() with more than two arguments
If open() is passed three arguments instead of two, the second argument
is used as the mode and the third argument is taken to be the file name.
This is primarily useful for protecting against unintended magic behavior
of the traditional two-argument form. See L<perlfunc/open>.
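A minimal sketch of the three-argument form, assuming a temporary directory from C<File::Temp> and an invented file name that ends in a space (whitespace that the two-argument form would ignore):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $dir = tempdir(CLEANUP => 1);

# Hypothetical file name with a trailing space; two-argument open would
# ignore that trailing whitespace, the three-argument form keeps it.
my $path = File::Spec->catfile($dir, "notes.txt ");

# Mode and file name are separate arguments, so nothing in the name
# is interpreted as a mode character.
open(my $out, '>', $path) or die "Can't write '$path': $!";
print {$out} "hello\n";
close($out) or die "Can't close '$path': $!";

open(my $in, '<', $path) or die "Can't read '$path': $!";
my $line = <$in>;
close($in);

print $line;
```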
=head2 64-bit support
Any platform that has 64-bit integers either
    (1) natively as longs or ints
    (2) via special compiler flags
    (3) using long long or int64_t
is able to use "quads" (64-bit integers) as follows:
=over 4
=item *
constants (decimal, hexadecimal, octal, binary) in the code
=item *
arguments to oct() and hex()
=item *
arguments to print(), printf() and sprintf() (flag prefixes ll, L, q)
=item *
printed as such
=item *
pack() and unpack() "q" and "Q" formats
=item *
in basic arithmetics: + - * / % (NOTE: operating close to the limits
of the integer values may produce surprising results)
=item *
in bit arithmetics: & | ^ ~ << >> (NOTE: these used to be forced
to be 32 bits wide but now operate on the full native width.)
=item *
vec()
=back
Note that unless you have the case (1) you will have to configure
and compile Perl using the -Duse64bitint Configure flag.
NOTE: The Configure flags -Duselonglong and -Duse64bits have been
deprecated. Use -Duse64bitint instead.
There are actually two modes of 64-bitness: the first one is achieved
using Configure -Duse64bitint and the second one using Configure
-Duse64bitall. The difference is that the first one is minimal and
the second one maximal. The first works in more places than the second.
The C<use64bitint> does only as much as is required to get 64-bit
integers into Perl (this may mean, for example, using "long longs")
while your memory may still be limited to 2 gigabytes (because your
pointers could still be 32-bit). Note that the name C<64bitint> does
not imply that your C compiler will be using 64-bit C<int>s (it might,
but it doesn't have to): the C<use64bitint> means that you will be
able to have 64-bit-wide scalar values.
The C<use64bitall> goes all the way by also attempting to switch
integers (if it can), longs, and pointers to 64 bits. This may
create an even more binary incompatible Perl than -Duse64bitint: the
resulting executable may not run at all in a 32-bit box, or you may
have to reboot/reconfigure/rebuild your operating system to be 64-bit
aware.
Natively 64-bit systems like Alpha and Cray need neither -Duse64bitint
nor -Duse64bitall.
Last but not least: note that due to Perl's habit of always using
floating point numbers, the quads are still not true integers.
When quads overflow their limits (0...18_446_744_073_709_551_615 unsigned,
-9_223_372_036_854_775_808...9_223_372_036_854_775_807 signed), they
are silently promoted to floating point numbers, after which they will
start losing precision (in their lower digits).
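The promotion described above can be observed directly. The sketch below is hedged: whether precision is actually lost depends on how this perl was built, so it reports the integer and floating point sizes from C<%Config> alongside the result.

```perl
use strict;
use warnings;
use Config;

my $max  = ~0;         # largest unsigned integer this perl can hold
my $over = $max + 1;   # overflows, so perl falls back to floating point

# If the floating point type has fewer mantissa bits than the integer
# type has bits, neighbouring integers can no longer be told apart.
my $lost_precision = ($over - 1 == $over) ? 1 : 0;

print "integer size: $Config{ivsize} bytes, float size: $Config{nvsize} bytes\n";
print "max unsigned: $max\n";
print "max + 1:      $over\n";
print "precision lost: ", ($lost_precision ? "yes" : "no"), "\n";
```

On a build with 8-byte integers and 8-byte doubles the last line is expected to report a loss; with 4-byte integers, or with wider floating point types, it may not.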
NOTE: 64-bit support is still experimental on most platforms.
Existing support only covers the LP64 data model. In particular, the
LLP64 data model is not yet supported. 64-bit libraries and system
APIs on many platforms have not stabilized--your mileage may vary.
=head2 Large file support
If you have filesystems that support "large files" (files larger than
2 gigabytes), you may now also be able to create and access them from
Perl.
NOTE: The default action is to enable large file support, if
available on the platform.
If the large file support is on, and you have a Fcntl constant
O_LARGEFILE, the O_LARGEFILE is automatically added to the flags
of sysopen().
Beware that unless your filesystem also supports "sparse files", seeking
to umpteen petabytes may be inadvisable.
Note that in addition to requiring a proper file system to do large
files you may also need to adjust your per-process (or your
per-system, or per-process-group, or per-user-group) maximum filesize
limits before running Perl scripts that try to handle large files,
especially if you intend to write such files.
Finally, in addition to your process/process group maximum filesize
limits, you may have quota limits on your filesystems that stop you
(your user id or your user group id) from using large files.
Adjusting your process/user/group/file system/operating system limits
is outside the scope of the Perl core language. For process limits, you
may try increasing the limits using your shell's limits/limit/ulimit
command before running Perl. The BSD::Resource extension (not
included with the standard Perl distribution) may also be of use: it
offers the getrlimit/setrlimit interface that can be used to adjust
process resource usage limits, including the maximum filesize limit.
=head2 Long doubles
In some systems you may be able to use long doubles to enhance the
range and precision of your double precision floating point numbers
(that is, Perl's numbers). Use Configure -Duselongdouble to enable
this support (if it is available).
=head2 "more bits"
You can "Configure -Dusemorebits" to turn on both the 64-bit support
and the long double support.
=head2 Enhanced support for sort() subroutines
Perl subroutines with a prototype of C<($$)>, and XSUBs in general, can
now be used as sort subroutines. In either case, the two elements to
be compared are passed as normal parameters in @_. See L<perlfunc/sort>.
For unprototyped sort subroutines, the historical behavior of passing
the elements to be compared as the global variables $a and $b remains
unchanged.
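A minimal sketch of a prototyped sort subroutine follows. The subroutine name C<by_length> and the word list are illustrative only; the point is that a C<($$)> prototype makes the two elements arrive in C<@_> rather than in C<$a> and C<$b>.

```perl
use strict;
use warnings;

# A ($$)-prototyped comparator receives the two elements in @_.
sub by_length ($$) {
    my ($left, $right) = @_;
    return length($left) <=> length($right) || $left cmp $right;
}

my @words  = qw(pear fig banana kiwi);
my @sorted = sort by_length @words;

print "@sorted\n";    # fig kiwi pear banana
```

Because the comparator no longer depends on the package variables C<$a> and C<$b>, it can be defined in a different package from the one that calls sort().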
=head2 C<sort $coderef @foo> allowed
sort() did not accept a subroutine reference as the comparison
function in earlier versions. This is now permitted.
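As a small illustration (the variable names are hypothetical), a comparison function held in a scalar can be passed directly to sort():

```perl
use strict;
use warnings;

# An anonymous comparator stored in a scalar; it still uses $a and $b.
my $numerically = sub { $a <=> $b };

my @numbers = (10, 9, 100, 1);
my @sorted  = sort $numerically @numbers;

print "@sorted\n";    # 1 9 10 100
```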
=head2 File globbing implemented internally
Perl now uses the File::Glob implementation of the glob() operator
automatically. This avoids using an external csh process and the
problems associated with it.
NOTE: This is currently an experimental feature. Interfaces and
implementation are subject to change.
=head2 Support for CHECK blocks
In addition to C<BEGIN>, C<INIT>, C<END>, C<DESTROY> and C<AUTOLOAD>,
subroutines named C<CHECK> are now special. These are queued up during
compilation and behave similarly to END blocks, except they are called at
the end of compilation rather than at the end of execution. They cannot
be called directly.
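The ordering can be observed with a short script such as the one below. It is only a sketch, and it assumes the code is run as an ordinary program file: blocks that are first compiled at run time (for example inside a string eval) are too late to be queued as C<CHECK> or C<INIT> blocks.

```perl
use strict;
use warnings;

our @phases;

BEGIN { push @phases, 'BEGIN' }    # runs as soon as it is compiled
INIT  { push @phases, 'INIT'  }    # runs just before run time begins
CHECK { push @phases, 'CHECK' }    # runs once compilation has finished

push @phases, 'run time';
print join(' -> ', @phases), "\n";    # BEGIN -> CHECK -> INIT -> run time

END { print "END\n" }                 # runs when the program exits
```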
=head2 POSIX character class syntax [: :] supported
For example, to match alphabetic characters use /[[:alpha:]]/.
See L<perlre> for details.
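A brief sketch of the syntax in use, with a made-up input string: the class name goes inside an ordinary bracketed character class, hence the doubled brackets.

```perl
use strict;
use warnings;

my $text = 'Perl 5.6, released 2000';

# Runs of alphabetic characters.
my @words = $text =~ /([[:alpha:]]+)/g;        # ('Perl', 'released')

# Count the individual digits.
my $digits = () = $text =~ /[[:digit:]]/g;     # 6

print "@words / $digits digits\n";
```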
=head2 Better pseudo-random number generator
In 5.005_0x and earlier, perl's rand() function used the C library
rand(3) function. As of 5.005_52, Configure tests for drand48(),
random(), and rand() (in that order) and picks the first one it finds.
These changes should result in better random numbers from rand().
=head2 Improved C<qw//> operator
The C<qw//> operator is now evaluated at compile time into a true list
instead of being replaced with a run time call to C<split()>. This
removes the confusing misbehaviour of C<qw//> in scalar context, which
had inherited that behaviour from split().
Thus:
$foo = ($bar) = qw(a b c); print "$foo|$bar\n";
now correctly prints "3|a", instead of "2|a".
=head2 Better worst-case behavior of hashes
Small changes in the hashing algorithm have been implemented in
order to improve the distribution of lower order bits in the
hashed value. This is expected to yield better performance on
keys that are repeated sequences.
=head2 pack() format 'Z' supported
The new format type 'Z' is useful for packing and unpacking null-terminated
strings. See L<perlfunc/"pack">.
=head2 pack() format modifier '!' supported
The new format type modifier '!' is useful for packing and unpacking
native shorts, ints, and longs. See L<perlfunc/"pack">.
=head2 pack() and unpack() support counted strings
The template character '/' can be used to specify a counted string
type to be packed or unpacked. See L<perlfunc/"pack">.
=head2 Comments in pack() templates
The '#' character in a template introduces a comment up to
end of the line. This facilitates documentation of pack()
templates.
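The pack() additions described above can be combined in one template. The following is a sketch with invented data: it uses C<'/'> for a counted string, C<'Z'> for a null-terminated string, C<'#'> comments inside the template, and C<'!'> to ask for a native-sized long.

```perl
use strict;
use warnings;
use Config;

my $template = q{
    n/a*    # 16-bit big-endian length, followed by that many bytes
    Z*      # null-terminated string
};

my $packed = pack($template, 'hello', 'abc');
my ($counted, $terminated) = unpack($template, $packed);

# 2 length bytes + 5 data bytes + 3 data bytes + 1 terminating NUL
printf "%d bytes: %s, %s\n", length($packed), $counted, $terminated;

# 'l!' packs a native long, whose size is platform dependent.
my $native_long = length pack('l!', 0);
print "native long: $native_long bytes\n";
```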
=head2 Weak references
In previous versions of Perl, you couldn't cache objects so as
to allow them to be deleted if the last reference from outside
the cache is deleted. The reference in the cache would hold a
reference count on the object and the objects would never be
destroyed.
Another familiar problem is with circular references. When an
object references itself, its reference count would never go
down to zero, and it would not get destroyed until the program
is about to exit.
Weak references solve this by allowing you to "weaken" any
reference, that is, make it not count towards the reference count.
When the last non-weak reference to an object is deleted, the object
is destroyed and all the weak references to the object are
automatically undef-ed.
To use this feature, you need the Devel::WeakRef package from CPAN, which
contains additional documentation.
NOTE: This is an experimental feature. Details are subject to change.
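The circular-reference case can be sketched as follows. Note the assumption: instead of the Devel::WeakRef package named above, this example uses the weaken() function from Scalar::Util, which is bundled with later Perl releases. The data structure is invented for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);    # assumption: a Perl that bundles Scalar::Util

my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;      # parent and child now reference each other

# Make the back-reference weak so it no longer counts towards the
# parent's reference count.
weaken($child->{parent});

undef $parent;                  # the last strong reference goes away

my $state = defined $child->{parent} ? 'still alive' : 'destroyed';
print "parent is $state\n";     # parent is destroyed
```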
=head2 Binary numbers supported
Binary numbers are now supported as literals, in s?printf formats, and
C<oct()>:
$answer = 0b101010;
printf "The answer is: %b\n", oct("0b101010");
=head2 Lvalue subroutines
Subroutines can now return modifiable lvalues.
See L<perlsub/"Lvalue subroutines">.
NOTE: This is an experimental feature. Details are subject to change.
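A minimal sketch, using a hypothetical subroutine named C<counter>: the C<lvalue> attribute lets a call to the subroutine appear on the left-hand side of an assignment.

```perl
use strict;
use warnings;

my $counter = 0;

# The :lvalue attribute makes the returned variable assignable.
sub counter : lvalue { $counter }

counter() = 42;
print "$counter\n";    # 42
```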
=head2 Some arrows may be omitted in calls through references
Perl now allows the arrow to be omitted in many constructs
involving subroutine calls through references. For example,
C<< $foo[10]->('foo') >> may now be written C<$foo[10]('foo')>.
This is rather similar to how the arrow may be omitted from
C<< $foo[10]->{'foo'} >>. Note however, that the arrow is still
required for C<< foo(10)->('bar') >>.
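The following sketch, with invented handler tables, shows the equivalent spellings. The arrow is optional only between subscripts; it stays mandatory after a plain function call.

```perl
use strict;
use warnings;

my @handlers = (sub { "hello, $_[0]" });
my %dispatch = (shout => [ sub { uc $_[0] } ]);

my $with_arrow    = $handlers[0]->('world');
my $without_arrow = $handlers[0]('world');          # same call, arrow omitted
my $nested        = $dispatch{shout}[0]('world');   # WORLD

print "$with_arrow | $without_arrow | $nested\n";
```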
=head2 Boolean assignment operators are legal lvalues
Constructs such as C<($a ||= 2) += 1> are now allowed.
=head2 exists() is supported on subroutine names
The exists() builtin now works on subroutine names. A subroutine
is considered to exist if it has been declared (even if implicitly).
See L<perlfunc/exists> for examples.
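The difference between a declared and a defined subroutine can be sketched like this (all subroutine names are hypothetical):

```perl
use strict;
use warnings;

sub declared_only;              # forward declaration without a body
sub with_body { return 1 }

my $declared_exists  = exists  &declared_only   ? 1 : 0;    # 1
my $declared_defined = defined &declared_only   ? 1 : 0;    # 0
my $body_defined     = defined &with_body       ? 1 : 0;    # 1
my $missing_exists   = exists  &never_mentioned ? 1 : 0;    # 0

print "$declared_exists $declared_defined $body_defined $missing_exists\n";
```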
=head2 exists() and delete() are supported on array elements
The exists() and delete() builtins now work on simple arrays as well.
The behavior is similar to that on hash elements.
exists() can be used to check whether an array element has been
initialized. This avoids autovivifying array elements that don't exist.
If the array is tied, the EXISTS() method in the corresponding tied
package will be invoked.
delete() may be used to remove an element from the array and return
it. The array element at that position returns to its uninitialized
state, so that testing for the same element with exists() will return
false. If the element happens to be the one at the end, the size of
the array also shrinks down to the highest element that tests true for
exists(), or 0 if none such is found. If the array is tied, the DELETE()
method in the corresponding tied package will be invoked.
See L<perlfunc/exists> and L<perlfunc/delete> for examples.
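A short sketch of the behavior described above, using a throwaway array. Later Perl documentation discourages delete() on array elements, so treat this as an illustration of the semantics rather than a recommended idiom.

```perl
use strict;
use warnings;

my @slots;
$slots[3] = 'last';                           # elements 0..2 stay uninitialized

my $hole_exists = exists $slots[1] ? 1 : 0;   # 0, and no autovivification
my $size_before = @slots;                     # 4

my $removed    = delete $slots[3];            # 'last'
my $size_after = @slots;                      # 0: no element tests true for exists()

print "$hole_exists $size_before $removed $size_after\n";
```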
=head2 Pseudo-hashes work better
Dereferencing some types of reference values in a pseudo-hash,
such as C<< $ph->{foo}[1] >>, was accidentally disallowed. This has
been corrected.
When applied to a pseudo-hash element, exists() now reports whether
the specified value exists, not merely if the key is valid.
delete() now works on pseudo-hashes. When given a pseudo-hash element
or slice it deletes the values corresponding to the keys (but not the keys
themselves). See L<perlref/"Pseudo-hashes: Using an array as a hash">.
Pseudo-hash slices with constant keys are now optimized to array lookups
at compile-time.
List assignments to pseudo-hash slices are now supported.
The C<fields> pragma now provides ways to create pseudo-hashes, via
fields::new() and fields::phash(). See L<fields>.
NOTE: The pseudo-hash data type continues to be experimental.
Limiting oneself to the interface elements provided by the
fields pragma will provide protection from any future changes.
=head2 Automatic flushing of output buffers
fork(), exec(), system(), qx//, and pipe open()s now flush buffers
of all files opened for output when the operation was attempted. This
mostly eliminates confusing buffering mishaps suffered by users unaware
of how Perl internally handles I/O.
This is not supported on some platforms like Solaris where a suitably
correct implementation of fflush(NULL) isn't available.
=head2 Better diagnostics on meaningless filehandle operations
Constructs such as C<< open(<FH>) >> and C<< close(<FH>) >>
are compile time errors. Attempting to read from filehandles that
were opened only for writing will now produce warnings (just as
writing to read-only filehandles does).
=head2 Where possible, buffered data discarded from duped input filehandle
C<< open(NEW, "<&OLD") >> now attempts to discard any data that
was previously read and buffered in C<OLD> before duping the handle.
On platforms where doing this is allowed, the next read operation
on C<NEW> will return the same data as the corresponding operation
on C<OLD>. Formerly, it would have returned the data from the start
of the following disk block instead.
=head2 eof() has the same old magic as <>
Formerly, C<eof()> would return true if no attempt to read from C<< <> >> had
yet been made. C<eof()> has been changed to have a little magic of its
own: it now opens the C<< <> >> files.
=head2 binmode() can be used to set :crlf and :raw modes
binmode() now accepts a second argument that specifies a discipline
for the handle in question. The two pseudo-disciplines ":raw" and
":crlf" are currently supported on DOS-derivative platforms.
See L<perlfunc/"binmode"> and L<open>.
=head2 C<-T> filetest recognizes UTF-8 encoded files as "text"
The algorithm used for the C<-T> filetest has been enhanced to
correctly identify UTF-8 content as "text".
=head2 system(), backticks and pipe open now reflect exec() failure
On Unix and similar platforms, system(), qx() and open(FOO, "cmd |")
etc., are implemented via fork() and exec(). When the underlying
exec() fails, earlier versions did not report the error properly,
since the exec() happened to be in a different process.
The child process now communicates with the parent about the
error in launching the external command, which allows these
constructs to return with their usual error value and set $!.
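On a Unix-like platform the effect can be observed as sketched here. The command path is deliberately one that is assumed not to exist, and the C<exec> warning category is silenced so that only the program's own message is printed.

```perl
use strict;
use warnings;

# Assumption: nothing is installed under this hypothetical path.
my $command = '/nonexistent/hypothetical-command';

my $status = do {
    no warnings 'exec';
    system { $command } $command, '--version';   # list form, no shell involved
};

if ($status == -1) {
    print "failed to launch $command: $!\n";
}
```

A return value of -1 means the command could not be started at all; C<$!> then carries the reason reported by the failed exec().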
=head2 Improved diagnostics
Line numbers are no longer suppressed (under most likely circumstances)
during the global destruction phase.
Diagnostics emitted from code running in threads other than the main
thread are now accompanied by the thread ID.
Embedded null characters in diagnostics now actually show up. They
used to truncate the message in prior versions.
$foo::a and $foo::b are now exempt from "possible typo" warnings only
if sort() is encountered in package C<foo>.
Unrecognized alphabetic escapes encountered when parsing quote
constructs now generate a warning, since they may take on new
semantics in later versions of Perl.
Many diagnostics now report the internal operation in which the warning
was provoked, like so:
Use of uninitialized value in concatenation (.) at (eval 1) line 1.
Use of uninitialized value in print at (eval 1) line 1.
Diagnostics that occur within eval may also report the file and line
number where the eval is located, in addition to the eval sequence
number and the line number within the evaluated text itself. For
example:
Not enough arguments for scalar at (eval 4)[newlib/perl5db.pl:1411] line 2, at EOF
=head2 Diagnostics follow STDERR
Diagnostic output now goes to whichever file the C<STDERR> handle
is pointing at, instead of always going to the underlying C runtime
library's C<stderr>.
=head2 More consistent close-on-exec behavior
On systems that support a close-on-exec flag on filehandles, the
flag is now set for any handles created by pipe(), socketpair(),
socket(), and accept(), if that is warranted by the value of $^F
that may be in effect. Earlier versions neglected to set the flag
for handles created with these operators. See L<perlfunc/pipe>,
L<perlfunc/socketpair>, L<perlfunc/socket>, L<perlfunc/accept>,
and L<perlvar/$^F>.
=head2 syswrite() ease-of-use
The length argument of C<syswrite()> has become optional.
=head2 Better syntax checks on parenthesized unary operators
Expressions such as:
print defined(&foo,&bar,&baz);
print uc("foo","bar","baz");
undef($foo,&bar);
used to be accidentally allowed in earlier versions, and produced
unpredictable behaviour. Some produced ancillary warnings
when used in this way; others silently did the wrong thing.
The parenthesized forms of most unary operators that expect a single
argument now ensure that they are not called with more than one
argument, making the cases shown above syntax errors. The usual
behaviour of:
print defined &foo, &bar, &baz;
print uc "foo", "bar", "baz";
undef $foo, &bar;
remains unchanged. See L<perlop>.
=head2 Bit operators support full native integer width
The bit operators (& | ^ ~ << >>) now operate on the full native
integral width (the exact size of which is available in $Config{ivsize}).
For example, if your platform is either natively 64-bit or if Perl
has been configured to use 64-bit integers, these operations apply
to 8 bytes (as opposed to 4 bytes on 32-bit platforms).
For portability, be sure to mask off the excess bits in the result of
unary C<~>, e.g., C<~$x & 0xffffffff>.
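The masking advice can be sketched as follows. The value of C<$x> is arbitrary; only the masked result is the same on every platform.

```perl
use strict;
use warnings;
use Config;

my $x = 0x0f;

my $native   = ~$x;                  # width depends on $Config{ivsize}
my $portable = ~$x & 0xffffffff;     # always 0xfffffff0

printf "ivsize=%d native=%x portable=%x\n",
    $Config{ivsize}, $native, $portable;
```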
=head2 Improved security features
More potentially unsafe operations taint their results for improved
security.
The C<passwd> and C<shell> fields returned by the getpwent(), getpwnam(),
and getpwuid() are now tainted, because the user can affect their own
encrypted password and login shell.
The variable modified by shmread(), and messages returned by msgrcv()
(and its object-oriented interface IPC::SysV::Msg::rcv) are also tainted,
because other untrusted processes can modify messages and shared memory
segments for their own nefarious purposes.
=head2 More functional bareword prototype (*)
Bareword prototypes have been rationalized to enable them to be used
to override builtins that accept barewords and interpret them in
a special way, such as C<require> or C<do>.
Arguments prototyped as C<*> will now be visible within the subroutine
as either a simple scalar or as a reference to a typeglob.
See L<perlsub/Prototypes>.
=head2 C<require> and C<do> may be overridden
C<require> and C<do 'file'> operations may be overridden locally
by importing subroutines of the same name into the current package
(or globally by importing them into the CORE::GLOBAL:: namespace).
Overriding C<require> will also affect C<use>, provided the override
is visible at compile-time.
See L<perlsub/"Overriding Built-in Functions">.
=head2 $^X variables may now have names longer than one character
Formerly, $^X was synonymous with ${"\cX"}, but $^XY was a syntax
error. Now variable names that begin with a control character may be
arbitrarily long. However, for compatibility reasons, these variables
I<must> be written with explicit braces, as C<${^XY}> for example.
C<${^XYZ}> is synonymous with ${"\cXYZ"}. Variable names with more
than one control character, such as C<${^XY^Z}>, are illegal.
The old syntax has not changed. As before, `^X' may be either a
literal control-X character or the two-character sequence `caret' plus
`X'. When braces are omitted, the variable name stops after the
control character. Thus C<"$^XYZ"> continues to be synonymous with
C<$^X . "YZ"> as before.
As before, lexical variables may not have names beginning with control
characters. As before, variables whose names begin with a control
character are always forced to be in package `main'. All such variables
are reserved for future extensions, except those that begin with
C<^_>, which may be used by user programs and are guaranteed not to
acquire special meaning in any future version of Perl.
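A sketch of a user-defined variable in the reserved C<^_> namespace follows. The name C<${^_APP_DEBUG}> and the package name are invented for illustration. Because such variables are forced into package C<main>, the same variable is visible from any package.

```perl
use strict;
use warnings;

# Braces are mandatory for multi-character control-character names.
${^_APP_DEBUG} = 1;

package Some::Other::Package;

# Still the same variable: it lives in package main regardless.
my $seen = ${^_APP_DEBUG};
print "debug flag: $seen\n";    # debug flag: 1
```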
=head2 New variable $^C reflects C<-c> switch
C<$^C> has a boolean value that reflects whether perl is being run
in compile-only mode (i.e. via the C<-c> switch). Since
BEGIN blocks are executed under such conditions, this variable
enables perl code to determine whether actions that make sense
only during normal running are warranted. See L<perlvar>.
=head2 New variable $^V contains Perl version as a string
C<$^V> contains the Perl version number as a string composed of
characters whose ordinals match the version numbers, i.e. v5.6.0.
This may be used in string comparisons.
See C<Support for strings represented as a vector of ordinals> for an
example.
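For instance, the version can be printed and compared like this (a sketch; the variable names are hypothetical, and the printed value depends on the running perl):

```perl
use strict;
use warnings;

# %vd prints the ordinals of each character joined by dots.
my $pretty = sprintf 'v%vd', $^V;

# String comparison against a v-string literal.
my $new_enough = $^V ge v5.6.0 ? 1 : 0;

print "running $pretty, 5.6.0 or later: $new_enough\n";
```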
=head2 Optional Y2K warnings
If Perl is built with the cpp macro C<PERL_Y2KWARN> defined,
it emits optional warnings when concatenating the number 19
with another number.
This behavior must be specifically enabled when running Configure.
See F<INSTALL> and F<README.Y2K>.
=head2 Arrays now always interpolate into double-quoted strings
In double-quoted strings, arrays now interpolate, no matter what. The
behavior in earlier versions of perl 5 was that arrays would interpolate
into strings if the array had been mentioned before the string was
compiled, and otherwise Perl would raise a fatal compile-time error.
In versions 5.000 through 5.003, the error was
Literal @example now requires backslash
In versions 5.004_01 through 5.6.0, the error was
In string, @example now must be written as \@example
The idea here was to get people into the habit of writing
C<"fred\@example.com"> when they wanted a literal C<@> sign, just as
they have always written C<"Give me back my \$5"> when they wanted a
literal C<$> sign.
Starting with 5.6.1, when Perl sees an C<@> sign in a
double-quoted string, it I<always> attempts to interpolate an array,
regardless of whether or not the array has been used or declared
already. The fatal error has been downgraded to an optional warning:
Possible unintended interpolation of @example in string
This warns you that C<"fred@example.com"> is going to turn into
C<fred.com> if you don't backslash the C<@>.
See http://perl.plover.com/at-error.html for more details
about the history here.
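The difference is easy to see with a declared array. In this sketch the array C<@example> is created purely for demonstration:

```perl
use strict;
use warnings;

my @example = ('a', 'b');

my $interpolated = "fred@example.com";     # array is interpolated
my $escaped      = "fred\@example.com";    # literal @ sign
my $single       = 'fred@example.com';     # single quotes never interpolate

print "$interpolated\n";    # freda b.com
print "$escaped\n";         # fred@example.com
print "$single\n";          # fred@example.com
```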
=head2 @- and @+ provide starting/ending offsets of regex matches
The new magic variables @- and @+ provide the starting and ending
offsets, respectively, of $&, $1, $2, etc. See L<perlvar> for
details.
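A small sketch with an invented input string: element 0 of each array describes the whole match, and element I<n> describes the I<n>th capture group.

```perl
use strict;
use warnings;

my $line = 'hello world';
my (@offsets, $whole);

if ($line =~ /(wor)ld/) {
    @offsets = ($-[0], $+[0], $-[1], $+[1]);
    # Equivalent to $&, computed from the offsets.
    $whole = substr($line, $-[0], $+[0] - $-[0]);
}

print "@offsets $whole\n";    # 6 11 6 9 world
```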
=head1 Modules and Pragmata
=head2 Modules
=over 4
=item attributes
While used internally by Perl as a pragma, this module also
provides a way to fetch subroutine and variable attributes.
See L<attributes>.
=item B
The Perl Compiler suite has been extensively reworked for this
release. More of the standard Perl test suite passes when run
under the Compiler, but there is still a significant way to
go to achieve production quality compiled executables.
NOTE: The Compiler suite remains highly experimental. The
generated code may not be correct, even when it manages to execute
without errors.
=item Benchmark
Overall, Benchmark results exhibit lower average error and better timing
accuracy.
You can now run tests for I<n> seconds instead of guessing the right
number of tests to run: e.g., timethese(-5, ...) will run each
code for at least 5 CPU seconds. Zero as the "number of repetitions"
means "for at least 3 CPU seconds". The output format has also
changed. For example:
use Benchmark; $x = 3; timethese(-5, { a => sub { $x * $x }, b => sub { $x ** 2 } });
will now output something like this:
Benchmark: running a, b, each for at least 5 CPU seconds...
a: 5 wallclock secs ( 5.77 usr + 0.00 sys = 5.77 CPU) @ 200551.91/s (n=1156516)
b: 4 wallclock secs ( 5.00 usr + 0.02 sys = 5.02 CPU) @ 159605.18/s (n=800686)
New features: "each for at least N CPU seconds...", "wallclock secs",
and the "@ operations/CPU second (n=operations)".
timethese() now returns a reference to a hash of Benchmark objects containing
the test results, keyed on the names of the tests.
timethis() now returns the iterations field in the Benchmark result object
instead of 0.
timethese(), timethis(), and the new cmpthese() (see below) can also take
a format specifier of 'none' to suppress output.
A new function countit() is just like timeit() except that it takes a
TIME instead of a COUNT.
A new function cmpthese() prints a chart comparing the results of each test
returned from a timethese() call. For each possible pair of tests, the
percentage speed difference (iters/sec or seconds/iter) is shown.
For other details, see L<Benchmark>.
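The new return value of timethese() and the C<'none'> format can be combined with cmpthese() as sketched below. The test names and the fixed iteration count are arbitrary choices made to keep the run short; with such a short run Benchmark may print a "too few iterations" warning, so the numbers themselves should not be trusted.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my $x = 3;

# 'none' suppresses the per-test timing lines; the results come back
# as a hash reference keyed on the test names.
my $results = timethese(200_000, {
    multiply => sub { $x * $x },
    power    => sub { $x ** 2 },
}, 'none');

# Print the comparison chart for the collected results.
cmpthese($results);
```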
=item ByteLoader
The ByteLoader is a dedicated extension to generate and run
Perl bytecode. See L<ByteLoader>.
=item constant
References can now be used.
The new version also allows a leading underscore in constant names, but
disallows a double leading underscore (as in "__LINE__"). Some other names
are disallowed or warned against, including BEGIN, END, etc. Some names
which were forced into main:: used to fail silently in some cases; now they're
fatal (outside of main::) and an optional warning (inside of main::).
The ability to detect whether a constant had been set with a given name has
been added.
See L<constant>.
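A sketch of reference-valued constants, using made-up names:

```perl
use strict;
use warnings;

use constant PRIMES => [2, 3, 5, 7];
use constant LIMITS => { min => 0, max => 10 };

my $third = PRIMES->[2];        # 5
my $count = @{ +PRIMES };       # 4; the + stops PRIMES being read as a variable name
my $max   = LIMITS->{max};      # 10

print "$third $count $max\n";
```

Only the reference is constant; the array or hash it points to can still be modified.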
=item charnames
This pragma implements the C<\N> string escape. See L<charnames>.
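For example, a character can be named rather than given by number. This is a sketch, and the chosen character is arbitrary.

```perl
use strict;
use warnings;
use charnames ':full';

my $alpha = "\N{GREEK SMALL LETTER ALPHA}";

printf "U+%04X\n", ord $alpha;    # U+03B1
```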
=item Data::Dumper
A C<Maxdepth> setting can be specified to avoid venturing
too deeply into deep data structures. See L<Data::Dumper>.
The XSUB implementation of Dump() is now automatically called if the
C<Useqq> setting is not in use.
Dumping C<qr//> objects works correctly.
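A sketch of the C<Maxdepth> setting with an invented nested structure: anything below the given depth is shown as a stringified reference instead of being expanded.

```perl
use strict;
use warnings;
use Data::Dumper;

my $config = { server => { ports => [80, 443] } };

my $shallow = do {
    local $Data::Dumper::Maxdepth = 1;    # expand the top level only
    Dumper($config);
};

print $shallow;    # the inner hash appears as 'HASH(0x...)'
```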
=item DB
C<DB> is an experimental module that exposes a clean abstraction
to Perl's debugging API.
=item DB_File
DB_File can now be built with Berkeley DB versions 1, 2 or 3.
See C<ext/DB_File/Changes>.
=item Devel::DProf
Devel::DProf, a Perl source code profiler, has been added. See
L<Devel::DProf> and L<dprofpp>.
=item Devel::Peek
The Devel::Peek module provides access to the internal representation
of Perl variables and data. It is a data debugging tool for the XS programmer.
=item Dumpvalue
The Dumpvalue module provides screen dumps of Perl data.
=item DynaLoader
DynaLoader now supports a dl_unload_file() function on platforms that
support unloading shared objects using dlclose().
Perl can also optionally arrange to unload all extension shared objects
loaded by Perl. To enable this, build Perl with the Configure option
C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>. (This may be useful if you are
using Apache with mod_perl.)
=item English
$PERL_VERSION now stands for C<$^V> (a string value) rather than for C<$]>
(a numeric value).
=item Env
Env now supports accessing environment variables like PATH as array
variables.
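A sketch of the array interface follows. The directory being appended is hypothetical, and the example assumes C<PATH> is set in the environment.

```perl
use strict;
use warnings;
use Config;
use Env qw(@PATH);

# Changes to @PATH are written back to $ENV{PATH}.
push @PATH, '/opt/example/bin';

my @entries = split /\Q$Config{path_sep}\E/, $ENV{PATH};
print "last entry: $entries[-1]\n";    # last entry: /opt/example/bin
```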
=item Fcntl
More Fcntl constants added: F_SETLK64, F_SETLKW64, O_LARGEFILE for
large file (more than 2GB) access (NOTE: the O_LARGEFILE is
automatically added to sysopen() flags if large file support has been
configured, as is the default), Free/Net/OpenBSD locking behaviour
flags F_FLOCK, F_POSIX, Linux F_SHLCK, and O_ACCMODE: the combined
mask of O_RDONLY, O_WRONLY, and O_RDWR. The seek()/sysseek()
constants SEEK_SET, SEEK_CUR, and SEEK_END are available via the
C<:seek> tag. The chmod()/stat() S_IF* constants and S_IS* functions
are available via the C<:mode> tag.
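The two tags can be used as sketched here; the current directory is chosen only because it is certain to exist.

```perl
use strict;
use warnings;
use Fcntl qw(:seek :mode);

my $mode   = (stat '.')[2];
my $is_dir = S_ISDIR($mode) ? 1 : 0;               # 1
my $perms  = sprintf '%04o', S_IMODE($mode);       # permission bits only

printf "SEEK_SET=%d SEEK_CUR=%d SEEK_END=%d\n", SEEK_SET, SEEK_CUR, SEEK_END;
print "directory: $is_dir, permissions: $perms\n";
```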
=item File::Compare
A compare_text() function has been added, which allows custom
comparison functions. See L<File::Compare>.
=item File::Find
File::Find now works correctly when the wanted() function is either
autoloaded or is a symbolic reference.
A bug that caused File::Find to lose track of the working directory
when pruning top-level directories has been fixed.
File::Find now also supports several other options to control its
behavior. It can follow symbolic links if the C<follow> option is
specified. Enabling the C<no_chdir> option will make File::Find skip
changing the current directory when walking directories. The C<untaint>
flag can be useful when running with taint checks enabled.
See L<File::Find>.
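Options are passed in a hash reference as the first argument. The sketch below builds a throwaway directory with File::Temp so that its result is predictable; the file name is invented.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/a.txt" or die "cannot create file: $!";
close $fh;

my @found;
find({
    # With no_chdir, $_ holds the full path, the same as $File::Find::name.
    wanted   => sub { push @found, $File::Find::name if -f $_ },
    no_chdir => 1,
}, $dir);

print "$_\n" for @found;
```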
=item File::Glob
This extension implements BSD-style file globbing. By default,
it will also be used for the internal implementation of the glob()
operator. See L<File::Glob>.
=item File::Spec
New methods have been added to the File::Spec module: devnull() returns
the name of the null device (/dev/null on Unix) and tmpdir() the name of
the temp directory (normally /tmp on Unix). There are now also methods
to convert between absolute and relative filenames: abs2rel() and
rel2abs(). For compatibility with operating systems that specify volume
names in file paths, the splitpath(), splitdir(), and catdir() methods
have been added.
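A sketch of several of these methods follows. It calls the C<File::Spec::Unix> class directly so that the results do not depend on the platform running the example; ordinary code would call C<File::Spec> instead. The paths are illustrative.

```perl
use strict;
use warnings;
use File::Spec::Unix;

my $fs = 'File::Spec::Unix';

my ($volume, $dirs, $file) = $fs->splitpath('/usr/local/lib/strict.pm');
my @parts = $fs->splitdir($dirs);                      # '', usr, local, lib, ''

my $relative = $fs->abs2rel('/usr/local/lib', '/usr');     # local/lib
my $absolute = $fs->rel2abs('lib', '/usr/local');          # /usr/local/lib
my $null     = $fs->devnull;                               # /dev/null

print "$dirs $file $relative $absolute $null\n";
```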
=item File::Spec::Functions
The new File::Spec::Functions module provides a function interface
to the File::Spec module. It allows the shorthand
$fullname = catfile($dir1, $dir2, $file);
instead of
$fullname = File::Spec->catfile($dir1, $dir2, $file);
=item Getopt::Long
Getopt::Long licensing has changed to allow the Perl Artistic License
as well as the GPL. It used to be GPL only, which got in the way of
non-GPL applications that wanted to use Getopt::Long.
Getopt::Long encourages the use of Pod::Usage to produce help
messages. For example:
use Getopt::Long;
use Pod::Usage;
my $man = 0;
my $help = 0;
GetOptions('help|?' => \$help, man => \$man) or pod2usage(2);
pod2usage(1) if $help;
pod2usage(-exitstatus => 0, -verbose => 2) if $man;
__END__
=head1 NAME
sample - Using Getopt::Long and Pod::Usage
=head1 SYNOPSIS
sample [options] [file ...]
Options:
-help brief help message
-man full documentation
=head1 OPTIONS
=over 8
=item B<-help>
Prints a brief help message and exits.
=item B<-man>
Prints the manual page and exits.
=back
=head1 DESCRIPTION
B<This program> will read the given input file(s) and do something
useful with the contents thereof.
=cut
See L<Pod::Usage> for details.
A bug that prevented the non-option call-back <> from being
specified as the first argument has been fixed.
To specify the characters < and > as option starters, use ><. Note,
however, that changing option starters is strongly deprecated.
=item IO
write() and syswrite() will now accept a single-argument
form of the call, for consistency with Perl's syswrite().
You can now create a TCP-based IO::Socket::INET without forcing
a connect attempt. This allows you to configure its options
(like making it non-blocking) and then call connect() manually.
A bug that prevented the IO::Socket::protocol() accessor
from ever returning the correct value has been corrected.
IO::Socket::connect now uses non-blocking IO instead of alarm()
to do connect timeouts.
IO::Socket::accept now uses select() instead of alarm() for doing
timeouts.
IO::Socket::INET->new now sets $! correctly on failure. $@ is
still set for backwards compatibility.
=item JPL
Java Perl Lingo is now distributed with Perl. See jpl/README
for more information.
=item lib
C<use lib> now weeds out any trailing duplicate entries.
C<no lib> removes all named entries.
=item Math::BigInt
The bitwise operations C<<< << >>>, C<<< >> >>>, C<&>, C<|>,
and C<~> are now supported on bigints.
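A short sketch of the overloaded operators on a value wider than any native integer:

```perl
use strict;
use warnings;
use Math::BigInt;

my $big = Math::BigInt->new(1) << 70;     # 2**70
my $low = ($big | 5) & 7;                 # keep the three lowest bits

print "$big\n";    # 1180591620717411303424
print "$low\n";    # 5
```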
=item Math::Complex
The accessor methods Re, Im, arg, abs, rho, and theta can now also
act as mutators (accessor $z->Re(), mutator $z->Re(3)).
The class method C<display_format> and the corresponding object method
C<display_format>, in addition to accepting just one argument, now can
also accept a parameter hash. Recognized keys of a parameter hash are
C<"style">, which corresponds to the old one parameter case, and two
new parameters: C<"format">, which is a printf()-style format string
(defaults usually to C<"%.15g">, you can revert to the default by
setting the format string to C<undef>) used for both parts of a
complex number, and C<"polar_pretty_print"> (defaults to true),
which controls whether an attempt is made to try to recognize small
multiples and rationals of pi (2pi, pi/2) at the argument (angle) of a
polar complex number.
The potentially disruptive change is that in list context both methods
now I<return the parameter hash>, instead of only the value of the
C<"style"> parameter.
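The accessor and mutator forms can be sketched as follows, with an arbitrary complex number:

```perl
use strict;
use warnings;
use Math::Complex;

my $z = cplx(1, 2);

my $before = $z->Re;    # accessor: 1
$z->Re(3);              # mutator: set the real part
my $after  = $z->Re;    # 3

print "$z\n";           # 3+2i
```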
=item Math::Trig
A little bit of radial trigonometry (cylindrical and spherical),
radial coordinate conversions, and the great circle distance were added.
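As a sketch, the great circle distance between two points on the equator a quarter of the way around a unit sphere should be pi/2. The helper C<to_spherical> is hypothetical; it converts latitude and longitude in degrees into the (theta, phi) pair that great_circle_distance() expects, where theta is the longitude and phi is measured down from the pole.

```perl
use strict;
use warnings;
use Math::Trig qw(great_circle_distance deg2rad pi);

# Hypothetical helper: degrees of latitude/longitude to (theta, phi) in radians.
sub to_spherical {
    my ($lat, $lon) = @_;
    return (deg2rad($lon), deg2rad(90 - $lat));
}

my $radius   = 1;    # unit sphere
my $distance = great_circle_distance(
    to_spherical(0, 0), to_spherical(0, 90), $radius,
);

printf "%.6f (pi/2 is %.6f)\n", $distance, pi / 2;
```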
=item Pod::Parser, Pod::InputObjects
Pod::Parser is a base class for parsing and selecting sections of
pod documentation from an input stream. This module takes care of
identifying pod paragraphs and commands in the input and hands off the
parsed paragraphs and commands to user-defined methods which are free
to interpret or translate them as they see fit.
Pod::InputObjects defines some input objects needed by Pod::Parser, and
for advanced users of Pod::Parser that need more information about a
command besides its name and text.
As of release 5.6.0 of Perl, Pod::Parser is now the officially sanctioned
"base parser code" recommended for use by all pod2xxx translators.
Pod::Text (pod2text) and Pod::Man (pod2man) have already been converted
to use Pod::Parser and efforts to convert Pod::HTML (pod2html) are already
underway. For any questions or comments about pod parsing and translating
issues and utilities, please use the pod-people@perl.org mailing list.
For further information, please see L<Pod::Parser> and L<Pod::InputObjects>.
=item Pod::Checker, podchecker
This utility checks pod files for correct syntax, according to
L<perlpod>. Obvious errors are flagged as such, while warnings are
printed for mistakes that can be handled gracefully. The checklist is
not complete yet. See L<Pod::Checker>.
=item Pod::ParseUtils, Pod::Find
These modules provide a set of gizmos that are useful mainly for pod
translators. L<Pod::Find|Pod::Find> traverses directory structures and
returns found pod files, along with their canonical names (like
C<File::Spec::Unix>). L<Pod::ParseUtils|Pod::ParseUtils> contains
B<Pod::List> (useful for storing pod list information), B<Pod::Hyperlink>
(for parsing the contents of C<LE<lt>E<gt>> sequences) and B<Pod::Cache>
(for caching information about pod files, e.g., link nodes).
=item Pod::Select, podselect
Pod::Select is a subclass of Pod::Parser which provides a function
named "podselect()" to filter out user-specified sections of raw pod
documentation from an input stream. podselect is a script that provides
access to Pod::Select from other scripts to be used as a filter.
See L<Pod::Select>.
=item Pod::Usage, pod2usage
Pod::Usage provides the function "pod2usage()" to print usage messages for
a Perl script based on its embedded pod documentation. The pod2usage()
function is generally useful to all script authors since it lets them
write and maintain a single source (the pods) for documentation, thus
removing the need to create and maintain redundant usage message text
consisting of information already in the pods.
There is also a pod2usage script which can be used from other kinds of
scripts to print usage messages from pods (even for non-Perl scripts
with pods embedded in comments).
For details and examples, please see L<Pod::Usage>.
=item Pod::Text and Pod::Man
Pod::Text has been rewritten to use Pod::Parser. While pod2text() is
still available for backwards compatibility, the module now has a new
preferred interface. See L<Pod::Text> for the details. The new Pod::Text
module is easily subclassed for tweaks to the output, and two such
subclasses (Pod::Text::Termcap for man-page-style bold and underlining
using termcap information, and Pod::Text::Color for markup with ANSI color
sequences) are now standard.
pod2man has been turned into a module, Pod::Man, which also uses
Pod::Parser. In the process, several outstanding bugs related to quotes
in section headers, quoting of code escapes, and nested lists have been
fixed. pod2man is now a wrapper script around this module.
=item SDBM_File
An EXISTS method has been added to this module (and sdbm_exists() has
been added to the underlying sdbm library), so one can now call exists
on an SDBM_File tied hash and get the correct result, rather than a
runtime error.
A bug that may have caused data loss when more than one disk block
happens to be read from the database in a single FETCH() has been
fixed.
=item Sys::Syslog
Sys::Syslog now uses XSUBs to access facilities from syslog.h so it
no longer requires syslog.ph to exist.
=item Sys::Hostname
Sys::Hostname now uses XSUBs to call the C library's gethostname() or
uname() if they exist.
=item Term::ANSIColor
Term::ANSIColor is a very simple module to provide easy and readable
access to the ANSI color and highlighting escape sequences, supported by
most ANSI terminal emulators. It is now included standard.
=item Time::Local
The timelocal() and timegm() functions used to silently return bogus
results when the date fell outside the machine's integer range. They
now consistently croak() if the date falls in an unsupported range.
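A minimal sketch of trapping the new failure mode. Which dates count as unsupported depends on the platform, so an impossible day of the month is used here as a stand-in that also makes Time::Local croak(); the variable names are illustrative only.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Arguments are (sec, min, hour, mday, mon, year); mon is zero-based.
my $epoch = timegm(0, 0, 0, 1, 0, 2000);    # 2000-01-01 00:00:00 UTC

# A bad date croaks instead of returning a bogus number,
# so wrap the call in eval to trap the exception.
my $bad   = eval { timegm(0, 0, 0, 32, 0, 2000) };
my $error = $@;

print "epoch=$epoch\n";
print "trapped: $error" unless defined $bad;
```

Because the failure is an exception rather than a bogus return value, callers that accept untrusted dates should check C<$@> after the eval.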
=item Win32
The error return value in list context has been changed for all functions
that return a list of values. Previously these functions returned a list
with a single element C<undef> if an error occurred. Now these functions
return the empty list in these situations. This applies to the following
functions:
Win32::FsType
Win32::GetOSVersion
The remaining functions are unchanged and continue to return C<undef> on
error even in list context.
The Win32::SetLastError(ERROR) function has been added as a complement
to the Win32::GetLastError() function.
The new Win32::GetFullPathName(FILENAME) returns the full absolute
pathname for FILENAME in scalar context. In list context it returns
a two-element list containing the fully qualified directory name and
the filename. See L<Win32>.
=item XSLoader
The XSLoader extension is a simpler alternative to DynaLoader.
See L<XSLoader>.
=item DBM Filters
A new feature called "DBM Filters" has been added to all the
DBM modules--DB_File, GDBM_File, NDBM_File, ODBM_File, and SDBM_File.
DBM Filters add four new methods to each DBM module:
filter_store_key
filter_store_value
filter_fetch_key
filter_fetch_value
These can be used to filter key-value pairs before the pairs are
written to the database or just after they are read from the database.
See L<perldbmfilter> for further information.
=back
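The SDBM_File and DBM Filters entries above can be sketched together as follows. This is only an illustration: the temporary directory, the database name "demo", and the upper/lower-casing filters are assumptions chosen for the example, not anything prescribed by the modules.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my %db;
my $handle = tie %db, 'SDBM_File', "$dir/demo", O_RDWR | O_CREAT, 0666
    or die "Cannot tie SDBM file: $!";

# Store keys upper-cased and hand them back lower-cased.
# Each filter receives the key in $_ and modifies it in place.
$handle->filter_store_key(sub { $_ = uc });
$handle->filter_fetch_key(sub { $_ = lc });

$db{apple} = 'red';

my $has_apple = exists $db{apple};   # relies on the EXISTS method
my $has_pear  = exists $db{pear};
my @keys      = keys %db;

undef $handle;    # drop the extra reference before untie
untie %db;
```

See L<perldbmfilter> for the value filters, which work the same way.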
=head2 Pragmata
C<use attrs> is now obsolete, and is only provided for
backward-compatibility. It's been replaced by the C<sub : attributes>
syntax. See L<perlsub/"Subroutine Attributes"> and L<attributes>.
The lexical warnings pragma, C<use warnings;>, controls optional warnings.
See L<perllexwarn>.
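A short sketch of lexical scoping for warnings. The C<$SIG{__WARN__}> handler is only there to make the effect observable; the variable names are illustrative.

```perl
use strict;
use warnings;

my @seen;
local $SIG{__WARN__} = sub { push @seen, $_[0] };

my $undef;
my $loud = "value: " . $undef;          # warns: uninitialized value

{
    no warnings 'uninitialized';        # silence one category, this block only
    my $quiet = "value: " . $undef;     # no warning here
}

print "warnings captured: ", scalar(@seen), "\n";
```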
C<use filetest> controls the behaviour of filetests (C<-r> C<-w>
...). Currently only one subpragma is implemented, "use filetest
'access';", which uses access(2) or equivalent to check permissions
instead of using stat(2) as usual. This matters in filesystems
where there are ACLs (access control lists): the stat(2) might lie,
but access(2) knows better.
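A minimal sketch comparing the two checks on a scratch file. On a filesystem without ACLs both answers should agree; the temporary file is an assumption made so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);

# Default behaviour: permissions are derived from stat(2) mode bits.
my $by_stat = -r $path ? 1 : 0;

# Inside this block the same operator asks access(2) instead.
my $by_access;
{
    use filetest 'access';
    $by_access = -r $path ? 1 : 0;
}

print "stat says $by_stat, access says $by_access\n";
```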
The C<open> pragma can be used to specify default disciplines for
handle constructors (e.g. open()) and for qx//. The two
pseudo-disciplines C<:raw> and C<:crlf> are currently supported on
DOS-derivative platforms (i.e. where binmode is not a no-op).
See also L</"binmode() can be used to set :crlf and :raw modes">.
=head1 Utility Changes
=head2 dprofpp
C<dprofpp> is used to display profile data generated using C<Devel::DProf>.
See L<dprofpp>.
=head2 find2perl
The C<find2perl> utility now uses the enhanced features of the File::Find
module. The -depth and -follow options are supported. Pod documentation
is also included in the script.
=head2 h2xs
The C<h2xs> tool can now work in conjunction with C<C::Scan> (available
from CPAN) to automatically parse real-life header files. The C<-M>,
C<-a>, C<-k>, and C<-o> options are new.
=head2 perlcc
C<perlcc> now supports the C and Bytecode backends. By default,
it generates output from the simple C backend rather than the
optimized C backend.
Support for non-Unix platforms has been improved.
=head2 perldoc
C<perldoc> has been reworked to avoid possible security holes.
It will not by default let itself be run as the superuser, but you
may still use the B<-U> switch to try to make it drop privileges
first.
=head2 The Perl Debugger
Many bug fixes and enhancements were added to F<perl5db.pl>, the
Perl debugger. New commands
include C<< < ? >>, C<< > ? >>, and C<< { ? >> to list out current
actions, C<man I<docpage>> to run your doc viewer on some perl
docset, and support for quoted options. The help information was
rearranged, and should be viewable once again if you're using B<less>
as your pager. A serious security hole was plugged--you should
immediately remove all older versions of the Perl debugger as
installed in previous releases, all the way back to perl3, from
your system to avoid being bitten by this.
=head1 Improved Documentation
Many of the platform-specific README files are now part of the perl
installation. See L<perl> for the complete list.
=over 4
=item perlapi.pod
The official list of public Perl API functions.
=item perlboot.pod
A tutorial for beginners on object-oriented Perl.
=item perlcompile.pod
An introduction to using the Perl Compiler suite.
=item perldbmfilter.pod
A howto document on using the DBM filter facility.
=item perldebug.pod
All material unrelated to running the Perl debugger, plus all
low-level guts-like details that risked crushing the casual user
of the debugger, have been relocated from the old manpage to the
next entry below.
=item perldebguts.pod
This new manpage contains excessively low-level material not related
to the Perl debugger, but slightly related to debugging Perl itself.
It also contains some arcane internal details of how the debugging
process works that may only be of interest to developers of Perl
debuggers.
=item perlfork.pod
Notes on the fork() emulation currently available for the Windows platform.
=item perlfilter.pod
An introduction to writing Perl source filters.
=item perlhack.pod
Some guidelines for hacking the Perl source code.
=item perlintern.pod
A list of internal functions in the Perl source code.
(List is currently empty.)
=item perllexwarn.pod
Introduction and reference information about lexically scoped
warning categories.
=item perlnumber.pod
Detailed information about numbers as they are represented in Perl.
=item perlopentut.pod
A tutorial on using open() effectively.
=item perlreftut.pod
A tutorial that introduces the essentials of references.
=item perltootc.pod
A tutorial on managing class data for object modules.
=item perltodo.pod
Discussion of the most often wanted features that may someday be
supported in Perl.
=item perlunicode.pod
An introduction to Unicode support features in Perl.
=back
=head1 Performance enhancements
=head2 Simple sort() using { $a <=> $b } and the like are optimized
Many common sort() operations using a simple inlined block are now
optimized for faster performance.
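For illustration, these are the kinds of simple inlined comparison blocks the heading refers to. The sample data is arbitrary.

```perl
use strict;
use warnings;

my @nums = (10, 9, 100, 1);

my @ascending  = sort { $a <=> $b } @nums;    # numeric, smallest first
my @descending = sort { $b <=> $a } @nums;    # numeric, largest first
my @by_string  = sort { $a cmp $b } qw(pear Apple fig);

print "@ascending\n@descending\n@by_string\n";
```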
=head2 Optimized assignments to lexical variables
Certain operations in the RHS of assignment statements have been
optimized to directly set the lexical variable on the LHS,
eliminating redundant copying overheads.
=head2 Faster subroutine calls
Minor changes in how subroutine calls are handled internally
provide marginal improvements in performance.
=head2 delete(), each(), values() and hash iteration are faster
The hash values returned by delete(), each(), values() and hashes in a
list context are the actual values in the hash, instead of copies.
This results in significantly better performance, because it eliminates
needless copying in most situations.
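One visible consequence, as documented for values() in L<perlfunc> for current versions of Perl, is that modifying the returned elements modifies the hash itself. A small sketch with made-up data:

```perl
use strict;
use warnings;

my %price = (apple => 1, pear => 2);

# values() yields the actual hash values, so $_ aliases each one.
$_ *= 10 for values %price;

print "$_ => $price{$_}\n" for sort keys %price;
```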
=head1 Installation and Configuration Improvements
=head2 -Dusethreads means something different
The -Dusethreads flag now enables the experimental interpreter-based thread
support by default. To get the flavor of experimental threads that was in
5.005 instead, you need to run Configure with "-Dusethreads -Duse5005threads".
As of v5.6.0, interpreter-threads support is still lacking a way to
create new threads from Perl (i.e., C<use Thread;> will not work with
interpreter threads). C<use Thread;> continues to be available when you
specify the -Duse5005threads option to Configure, bugs and all.
NOTE: Support for threads continues to be an experimental feature.
Interfaces and implementation are subject to sudden and drastic changes.
=head2 New Configure flags
The following new flags may be enabled on the Configure command line
by running Configure with C<-Dflag>.
    usemultiplicity
    usethreads useithreads      (new interpreter threads: no Perl API yet)
    usethreads use5005threads   (threads as they were in 5.005)
    use64bitint                 (equal to now deprecated 'use64bits')
    use64bitall
    uselongdouble
    usemorebits
    uselargefiles
    usesocks                    (only SOCKS v5 supported)
=head2 Threadedness and 64-bitness now more daring
The Configure options enabling the use of threads and the use of
64-bitness are now more daring in the sense that they no longer have an
explicit list of operating systems with known threads/64-bit
capabilities. In other words: if your operating system has the
necessary APIs and datatypes, you should be able just to go ahead and
use them, for threads by Configure -Dusethreads, and for 64 bits
either explicitly by Configure -Duse64bitint or implicitly if your
system has 64-bit wide datatypes. See also L</"64-bit support">.
=head2 Long Doubles
Some platforms have "long doubles", floating point numbers of even
larger range than ordinary "doubles". To enable using long doubles for
Perl's scalars, use -Duselongdouble.
=head2 -Dusemorebits
You can enable both -Duse64bitint and -Duselongdouble with -Dusemorebits.
See also L</"64-bit support">.
=head2 -Duselargefiles
Some platforms support system APIs that are capable of handling large files
(typically, files larger than two gigabytes). Perl will try to use these
APIs if you ask for -Duselargefiles.
See L</"Large file support"> for more information.
=head2 installusrbinperl
You can use "Configure -Uinstallusrbinperl", which causes installperl
to skip installing perl as /usr/bin/perl as well. This is useful if you
prefer not to modify /usr/bin for some reason or another, but harmful
because many scripts expect to find perl at /usr/bin/perl.
=head2 SOCKS support
You can use "Configure -Dusesocks" which causes Perl to probe
for the SOCKS proxy protocol library (v5, not v4). For more information
on SOCKS, see:
http://www.socks.nec.com/
=head2 C<-A> flag
You can "post-edit" the Configure variables using the Configure C<-A>
switch. The editing happens immediately after the platform specific
hints files have been processed but before the actual configuration
process starts. Run C<Configure -h> to find out the full C<-A> syntax.
=head2 Enhanced Installation Directories
The installation structure has been enriched to improve the support
for maintaining multiple versions of perl, to provide locations for
vendor-supplied modules, scripts, and manpages, and to ease maintenance
of locally-added modules, scripts, and manpages. See the section on
Installation Directories in the INSTALL file for complete details.
For most users building and installing from source, the defaults should
be fine.
If you previously used C<Configure -Dsitelib> or C<-Dsitearch> to set
special values for library directories, you might wish to consider using
the new C<-Dsiteprefix> setting instead. Also, if you wish to re-use a
config.sh file from an earlier version of perl, you should be sure to
check that Configure makes sensible choices for the new directories.
See INSTALL for complete details.
=head1 Platform specific changes
=head2 Supported platforms
=over 4
=item *
The Mach CThreads (NEXTSTEP, OPENSTEP) are now supported by the Thread
extension.
=item *
GNU/Hurd is now supported.
=item *
Rhapsody/Darwin is now supported.
=item *
EPOC is now supported (on Psion 5).
=item *
The cygwin port (formerly cygwin32) has been greatly improved.
=back
=head2 DOS
=over 4
=item *
Perl now works with djgpp 2.02 (and 2.03 alpha).
=item *
Environment variable names are not converted to uppercase any more.
=item *
Incorrect exit codes from backticks have been fixed.
=item *
This port continues to use its own builtin globbing (not File::Glob).
=back
=head2 OS390 (OpenEdition MVS)
Support for this EBCDIC platform has not been renewed in this release.
There are difficulties in reconciling Perl's standardization on UTF-8
as its internal representation for characters with the EBCDIC character
set, because the two are incompatible.
It is unclear whether future versions will renew support for this
platform, but the possibility exists.
=head2 VMS
Numerous revisions and extensions to configuration, build, testing, and
installation process to accommodate core changes and VMS-specific options.
Expand %ENV-handling code to allow runtime mapping to logical names,
CLI symbols, and CRTL environ array.
Extension of subprocess invocation code to accept filespecs as command
"verbs".
Add to Perl command line processing the ability to use default file types and
to recognize Unix-style C<2E<gt>&1>.
Expansion of File::Spec::VMS routines, and integration into ExtUtils::MM_VMS.
Extension of ExtUtils::MM_VMS to handle complex extensions more flexibly.
Barewords at start of Unix-syntax paths may be treated as text rather than
only as logical names.
Optional secure translation of several logical names used internally by Perl.
Miscellaneous bugfixing and porting of new core code to VMS.
Thanks are gladly extended to the many people who have contributed VMS
patches, testing, and ideas.
=head2 Win32
Perl can now emulate fork() internally, using multiple interpreters running
in different concurrent threads. This support must be enabled at build
time. See L<perlfork> for detailed information.
When given a pathname that consists only of a drivename, such as C<A:>,
opendir() and stat() now use the current working directory for the drive
rather than the drive root.
The builtin XSUB functions in the Win32:: namespace are documented. See
L<Win32>.
$^X now contains the full path name of the running executable.
A Win32::GetLongPathName() function is provided to complement
Win32::GetFullPathName() and Win32::GetShortPathName(). See L<Win32>.
POSIX::uname() is supported.
system(1,...) now returns true process IDs rather than process
handles. kill() accepts any real process id, rather than strictly
return values from system(1,...).
For better compatibility with Unix, C<kill(0, $pid)> can now be used to
test whether a process exists.
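A minimal sketch of the existence test, using the current process (C<$$>) as a process ID that is known to exist:

```perl
use strict;
use warnings;

# Signal 0 delivers nothing; kill() only reports whether the
# process could be signalled.
my $alive = kill(0, $$) ? 1 : 0;

print "process $$ is ", ($alive ? "running" : "gone"), "\n";
```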
The C<Shell> module is supported.
Better support for building Perl under command.com in Windows 95
has been added.
Scripts are read in binary mode by default to allow ByteLoader (and
the filter mechanism in general) to work properly. For compatibility,
the DATA filehandle will be set to text mode if a carriage return is
detected at the end of the line containing the __END__ or __DATA__
token; if not, the DATA filehandle will be left open in binary mode.
Earlier versions always opened the DATA filehandle in text mode.
The glob() operator is implemented via the C<File::Glob> extension,
which supports glob syntax of the C shell. This increases the flexibility
of the glob() operator, but there may be compatibility issues for
programs that relied on the older globbing syntax. If you want to
preserve compatibility with the older syntax, you might want to run
perl with C<-MFile::DosGlob>. For details and compatibility information,
see L<File::Glob>.
=head1 Significant bug fixes
=head2 <HANDLE> on empty files
With C<$/> set to C<undef>, "slurping" an empty file returns a string of
zero length (instead of C<undef>, as it used to) the first time the
HANDLE is read after C<$/> is set to C<undef>. Further reads yield
C<undef>.
This means that the following will append "foo" to an empty file (it used
to do nothing):
perl -0777 -pi -e 's/^/foo/' empty_file
The behaviour of:
perl -pi -e 's/^/foo/' empty_file
is unchanged (it continues to leave the file empty).
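The same behaviour can be observed from a script. This sketch creates an empty scratch file (an assumption made so the example is self-contained) and reads it twice in slurp mode:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;                              # leave the file empty

open my $in, '<', $path or die "Cannot open $path: $!";
my ($first, $second);
{
    local $/;                            # undef: slurp mode
    $first  = <$in>;                     # empty string, but defined
    $second = <$in>;                     # undef
}
close $in;

print "first read is ",  (defined $first  ? "defined" : "undef"), "\n";
print "second read is ", (defined $second ? "defined" : "undef"), "\n";
```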
=head2 C<eval '...'> improvements
Line numbers (as reflected by caller() and most diagnostics) within
C<eval '...'> were often incorrect where here documents were involved.
This has been corrected.
Lexical lookups for variables appearing in C<eval '...'> within
functions that were themselves called within an C<eval '...'> were
searching the wrong place for lexicals. The lexical search now
correctly ends at the subroutine's block boundary.
The use of C<return> within C<eval {...}> caused $@ not to be reset
correctly when no exception occurred within the eval. This has
been fixed.
Parsing of here documents used to be flawed when they appeared as
the replacement expression in C<eval 's/.../.../e'>. This has
been fixed.
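A small sketch of the C<return> case described above: after an eval block that succeeds, C<$@> should be empty even if an earlier eval had failed.

```perl
use strict;
use warnings;

eval { die "boom\n" };
my $first_error = $@;                 # "boom\n"

# return leaves the eval block (not an enclosing sub) with a value.
my $value = eval { return 42 };
my $second_error = $@;                # empty: this eval succeeded

print "value=$value, error is ", ($second_error eq '' ? "empty" : "set"), "\n";
```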
=head2 All compilation errors are true errors
Some "errors" encountered at compile time were by necessity
generated as warnings followed by eventual termination of the
program. This enabled more such errors to be reported in a
single run, rather than causing a hard stop at the first error
that was encountered.
The mechanism for reporting such errors has been reimplemented
to queue compile-time errors and report them at the end of the
compilation as true errors rather than as warnings. This fixes
cases where error messages leaked through in the form of warnings
when code was compiled at run time using C<eval STRING>, and
also allows such errors to be reliably trapped using C<eval "...">.
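A minimal sketch of trapping a compile-time error at run time. The broken snippet is arbitrary; any code with a syntax error would do.

```perl
use strict;
use warnings;

# Single quotes keep Perl from interpolating the snippet before eval sees it.
my $result = eval 'my $x = ; 1';
my $error  = $@;

print "compile failed: $error" unless defined $result;
```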
=head2 Implicitly closed filehandles are safer
Sometimes implicitly closed filehandles (as when they are localized,
and Perl automatically closes them on exiting the scope) could
inadvertently set $? or $!. This has been corrected.
=head2 Behavior of list slices is more consistent
When taking a slice of a literal list (as opposed to a slice of
an array or hash), Perl used to return an empty list if the
result happened to be composed of all undef values.
The new behavior is to produce an empty list if (and only if)
the original list was empty. Consider the following example:
@a = (1,undef,undef,2)[2,1,2];
The old behavior would have resulted in @a having no elements.
The new behavior ensures it has three undefined elements.
Note in particular that the behavior of slices of the following
cases remains unchanged:
@a = ()[1,2];
@a = (getpwent)[7,0];
@a = (anything_returning_empty_list())[2,1,2];
@a = @b[2,1,2];
@a = @c{'a','b','c'};
See L<perldata>.
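The element counts are easy to check directly. In this sketch C<@src> is a made-up array used for the unchanged array-slice case:

```perl
use strict;
use warnings;

my @a = (1, undef, undef, 2)[2, 1, 2];   # literal list: three undef elements
my @b = ()[1, 2];                        # empty original list: no elements

my @src = (10, 20);
my @c   = @src[2, 1, 2];                 # array slice: (undef, 20, undef)

printf "a=%d b=%d c=%d\n", scalar(@a), scalar(@b), scalar(@c);
```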
=head2 C<(\$)> prototype and C<$foo{a}>
A scalar reference prototype now correctly allows a hash or
array element in that slot.
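For example, with a hypothetical C<increment> function declared with a C<(\$)> prototype:

```perl
use strict;
use warnings;

# The (\$) prototype makes Perl pass a reference to the scalar argument.
sub increment(\$) {
    my ($ref) = @_;
    $$ref++;
    return;
}

my %foo  = (a => 1);
my @list = (5);

increment($foo{a});      # hash element in the scalar-reference slot
increment($list[0]);     # array element works the same way

print "$foo{a} $list[0]\n";
```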
=head2 C<goto &sub> and AUTOLOAD
The C<goto &sub> construct works correctly when C<&sub> happens
to be autoloaded.
=head2 C<-bareword> allowed under C<use integer>
The autoquoting of barewords preceded by C<-> did not work
in prior versions when the C<integer> pragma was enabled.
This has been fixed.
=head2 Failures in DESTROY()
When code in a destructor threw an exception, it went unnoticed
in earlier versions of Perl, unless someone happened to be
looking in $@ just after the point the destructor happened to
run. Such failures are now visible as warnings when warnings are
enabled.
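A sketch of how such a failure surfaces. The C<Fragile> class is hypothetical, and the C<$SIG{__WARN__}> handler only collects the warning so it can be inspected:

```perl
use strict;
use warnings;

{
    package Fragile;
    sub new     { return bless {}, shift }
    sub DESTROY { die "cleanup failed\n" }
}

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

{
    my $obj = Fragile->new;
}   # $obj is destroyed here; the die in DESTROY becomes a warning

my $still_running = 1;   # execution continues past the failed destructor
print @warnings;
```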
=head2 Locale bugs fixed
printf() and sprintf() previously reset the numeric locale
back to the default "C" locale. This has been fixed.
Numbers formatted according to the local numeric locale
(such as using a decimal comma instead of a decimal dot) caused
"isn't numeric" warnings, even while the operations accessing
those numbers produced correct results. These warnings have been
discontinued.
=head2 Memory leaks
The C<eval 'return sub {...}'> construct could sometimes leak
memory. This has been fixed.
Operations that aren't filehandle constructors used to leak memory
when used on invalid filehandles. This has been fixed.
Constructs that modified C<@_> could fail to deallocate values
in C<@_> and thus leak memory. This has been corrected.
=head2 Spurious subroutine stubs after failed subroutine calls
Perl could sometimes create empty subroutine stubs when a
subroutine was not found in the package. Such cases stopped
later method lookups from progressing into base packages.
This has been corrected.
=head2 Taint failures under C<-U>
When running in unsafe mode, taint violations could sometimes
cause silent failures. This has been fixed.
=head2 END blocks and the C<-c> switch
Prior versions used to run BEGIN B<and> END blocks when Perl was
run in compile-only mode. Since this is typically not the expected
behavior, END blocks are not executed anymore when the C<-c> switch
is used, or if compilation fails.
See L</"Support for CHECK blocks"> for how to run things when the compile
phase ends.
=head2 Potential to leak DATA filehandles
Using the C<__DATA__> token creates an implicit filehandle to
the file that contains the token. It is the program's
responsibility to close it when it is done reading from it.
This caveat is now better explained in the documentation.
See L<perldata>.
=head1 New or Changed Diagnostics
=over 4
=item "%s" variable %s masks earlier declaration in same %s
(W misc) A "my" or "our" variable has been redeclared in the current scope or statement,
effectively eliminating all access to the previous instance. This is almost
always a typographical error. Note that the earlier variable will still exist
until the end of the scope or until all closure referents to it are
destroyed.
=item "my sub" not yet implemented
(F) Lexically scoped subroutines are not yet implemented. Don't try that
yet.
=item "our" variable %s redeclared
(W misc) You seem to have already declared the same global once before in the
current lexical scope.
=item '!' allowed only after types %s
(F) The '!' is allowed in pack() and unpack() only after certain types.
See L<perlfunc/pack>.
=item / cannot take a count
(F) You had an unpack template indicating a counted-length string,
but you have also specified an explicit size for the string.
See L<perlfunc/pack>.
=item / must be followed by a, A or Z
(F) You had an unpack template indicating a counted-length string,
which must be followed by one of the letters a, A or Z
to indicate what sort of string is to be unpacked.
See L<perlfunc/pack>.
=item / must be followed by a*, A* or Z*
(F) You had a pack template indicating a counted-length string.
Currently the only things that can have their length counted are a*, A* or Z*.
See L<perlfunc/pack>.
=item / must follow a numeric type
(F) You had an unpack template that contained a '/',
but this did not follow some numeric unpack specification.
See L<perlfunc/pack>.
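The "/" template rules in the entries above can be sketched as follows, assuming the pack() and unpack() semantics documented in L<perlfunc>. The sample strings are arbitrary.

```perl
use strict;
use warnings;

# pack: "n/a*" writes a 16-bit big-endian length, then the string itself.
my $packed = pack("n/a*", "hello");

# unpack: the numeric type before "/" supplies the count for the
# string type that follows it.
my ($string) = unpack("n/a*", $packed);

# A one-byte count followed by the string; only three bytes are taken.
my ($short) = unpack("C/a", "\x03abcdef");

print "$string $short\n";
```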
=item /%s/: Unrecognized escape \\%c passed through
(W regexp) You used a backslash-character combination which is not recognized
by Perl. This combination appears in an interpolated variable or a
C<'>-delimited regular expression. The character was understood literally.
=item /%s/: Unrecognized escape \\%c in character class passed through
(W regexp) You used a backslash-character combination which is not recognized
by Perl inside character classes. The character was understood literally.
=item /%s/ should probably be written as "%s"
(W syntax) You have used a pattern where Perl expected to find a string,
as in the first argument to C<join>. Perl will treat the true
or false result of matching the pattern against $_ as the string,
which is probably not what you had in mind.
=item %s() called too early to check prototype
(W prototype) You've called a function that has a prototype before the parser saw a
definition or declaration for it, and Perl could not check that the call
conforms to the prototype. You need to either add an early prototype
declaration for the subroutine in question, or move the subroutine
definition ahead of the call to get proper prototype checking. Alternatively,
if you are certain that you're calling the function correctly, you may put
an ampersand before the name to avoid the warning. See L<perlsub>.
=item %s argument is not a HASH or ARRAY element
(F) The argument to exists() must be a hash or array element, such as:
$foo{$bar}
$ref->{"susie"}[12]
=item %s argument is not a HASH or ARRAY element or slice
(F) The argument to delete() must be either a hash or array element, such as:
$foo{$bar}
$ref->{"susie"}[12]
or a hash or array slice, such as:
@foo[$bar, $baz, $xyzzy]
@{$ref->[12]}{"susie", "queue"}
=item %s argument is not a subroutine name
(F) The argument to exists() for C<exists &sub> must be a subroutine
name, and not a subroutine call. C<exists &sub()> will generate this error.
=item %s package attribute may clash with future reserved word: %s
(W reserved) A lowercase attribute name was used that had a package-specific handler.
That name might have a meaning to Perl itself some day, even though it
doesn't yet. Perhaps you should use a mixed-case attribute name, instead.
See L<attributes>.
=item (in cleanup) %s
(W misc) This prefix usually indicates that a DESTROY() method raised
the indicated exception. Since destructors are usually called by
the system at arbitrary points during execution, and often a vast
number of times, the warning is issued only once for any number
of failures that would otherwise result in the same message being
repeated.
Failure of user callbacks dispatched using the C<G_KEEPERR> flag
could also result in this warning. See L<perlcall/G_KEEPERR>.
=item <> should be quotes
(F) You wrote C<< require <file> >> when you should have written
C<require 'file'>.
=item Attempt to join self
(F) You tried to join a thread from within itself, which is an
impossible task. You may be joining the wrong thread, or you may
need to move the join() to some other thread.
=item Bad evalled substitution pattern
(F) You've used the /e switch to evaluate the replacement for a
substitution, but perl found a syntax error in the code to evaluate,
most likely an unexpected right brace '}'.
=item Bad realloc() ignored
(S) An internal routine called realloc() on something that had never been
malloc()ed in the first place. Mandatory, but can be disabled by
setting environment variable C<PERL_BADFREE> to 1.
=item Bareword found in conditional
(W bareword) The compiler found a bareword where it expected a conditional,
which often indicates that an || or && was parsed as part of the
last argument of the previous construct, for example:
open FOO || die;
It may also indicate a misspelled constant that has been interpreted
as a bareword:
use constant TYPO => 1;
if (TYOP) { print "foo" }
The C<strict> pragma is useful in avoiding such errors.
=item Binary number > 0b11111111111111111111111111111111 non-portable
(W portable) The binary number you specified is larger than 2**32-1
(4294967295) and therefore non-portable between systems. See
L<perlport> for more on portability concerns.
=item Bit vector size > 32 non-portable
(W portable) Using bit vector sizes larger than 32 is non-portable.
=item Buffer overflow in prime_env_iter: %s
(W internal) A warning peculiar to VMS. While Perl was preparing to iterate over
%ENV, it encountered a logical name or symbol definition which was too long,
so it was truncated to the string shown.
=item Can't check filesystem of script "%s"
(P) For some reason you can't check the filesystem of the script for nosuid.
=item Can't declare class for non-scalar %s in "%s"
(S) Currently, only scalar variables can be declared with a specific class
qualifier in a "my" or "our" declaration. The semantics may be extended
for other types of variables in future.
=item Can't declare %s in "%s"
(F) Only scalar, array, and hash variables may be declared as "my" or
"our" variables. They must have ordinary identifiers as names.
=item Can't ignore signal CHLD, forcing to default
(W signal) Perl has detected that it is being run with the SIGCHLD signal
(sometimes known as SIGCLD) disabled. Since disabling this signal
will interfere with proper determination of exit status of child
processes, Perl has reset the signal to its default value.
This situation typically indicates that the parent program under
which Perl may be running (e.g., cron) is being very careless.
=item Can't modify non-lvalue subroutine call
(F) Subroutines meant to be used in lvalue context should be declared as
such, see L<perlsub/"Lvalue subroutines">.
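A minimal sketch of a subroutine declared for lvalue use. The name C<setting> and the variable behind it are illustrative only:

```perl
use strict;
use warnings;

my $setting = 'off';

# The :lvalue attribute lets a call to the sub appear on the left of "=".
# The last expression evaluated (not an explicit return) is the lvalue.
sub setting : lvalue { $setting }

setting() = 'on';

print "setting is $setting\n";
```

Without the C<:lvalue> attribute, the assignment would fail to compile with the error described above.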
=item Can't read CRTL environ
(S) A warning peculiar to VMS. Perl tried to read an element of %ENV
from the CRTL's internal environment array and discovered the array was
missing. You need to figure out where your CRTL misplaced its environ
or define F<PERL_ENV_TABLES> (see L<perlvms>) so that environ is not searched.
=item Can't remove %s: %s, skipping file
(S) You requested an inplace edit without creating a backup file. Perl
was unable to remove the original file to replace it with the modified
file. The file was left unmodified.
=item Can't return %s from lvalue subroutine
(F) Perl detected an attempt to return illegal lvalues (such
as temporary or readonly values) from a subroutine used as an lvalue.
This is not allowed.
=item Can't weaken a nonreference
(F) You attempted to weaken something that was not a reference. Only
references can be weakened.
=item Character class [:%s:] unknown
(F) The class in the character class [: :] syntax is unknown.
See L<perlre>.
=item Character class syntax [%s] belongs inside character classes
(W unsafe) The character class constructs [: :], [= =], and [. .] go
I<inside> character classes; the [] are part of the construct,
for example: /[012[:alpha:]345]/. Note that [= =] and [. .]
are not currently implemented; they are simply placeholders for
future extensions.
=item Constant is not %s reference
(F) A constant value (perhaps declared using the C<use constant> pragma)
is being dereferenced, but it amounts to the wrong type of reference. The
message indicates the type of reference that was expected. This usually
indicates a syntax error in dereferencing the constant value.
See L<perlsub/"Constant Functions"> and L<constant>.
=item constant(%s): %s
(F) The parser found inconsistencies either while attempting to define an
overloaded constant, or when trying to find the character name specified
in the C<\N{...}> escape. Perhaps you forgot to load the corresponding
C<overload> or C<charnames> pragma? See L<charnames> and L<overload>.
=item CORE::%s is not a keyword
(F) The CORE:: namespace is reserved for Perl keywords.
=item defined(@array) is deprecated
(D) defined() is not usually useful on arrays because it checks for an
undefined I<scalar> value. If you want to see if the array is empty,
just use C<if (@array) { # not empty }> for example.
=item defined(%hash) is deprecated
(D) defined() is not usually useful on hashes because it checks for an
undefined I<scalar> value. If you want to see if the hash is empty,
just use C<if (%hash) { # not empty }> for example.
=item Did not produce a valid header
See Server error.
=item (Did you mean "local" instead of "our"?)
(W misc) Remember that "our" does not localize the declared global variable.
You have declared it again in the same lexical scope, which seems superfluous.
=item Document contains no data
See Server error.
=item entering effective %s failed
(F) While under the C<use filetest> pragma, switching the real and
effective uids or gids failed.
=item false [] range "%s" in regexp
(W regexp) A character class range must start and end at a literal character, not
another character class like C<\d> or C<[:alpha:]>. The "-" in your false
range is interpreted as a literal "-". Consider quoting the "-", "\-".
See L<perlre>.
=item Filehandle %s opened only for output
(W io) You tried to read from a filehandle opened only for writing. If you
intended it to be a read/write filehandle, you needed to open it with
"+<" or "+>" or "+>>" instead of with ">" or ">>". If
you intended only to read from the file, use "<". See
L<perlfunc/open>.
=item flock() on closed filehandle %s
(W closed) The filehandle you're attempting to flock() got itself closed some
time before now. Check your logic flow. flock() operates on filehandles.
Are you attempting to call flock() on a dirhandle by the same name?
=item Global symbol "%s" requires explicit package name
(F) You've said "use strict vars", which indicates that all variables
must either be lexically scoped (using "my"), declared beforehand using
"our", or explicitly qualified to say which package the global variable
is in (using "::").
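The three accepted forms can be sketched as follows. The variable names are placeholders chosen for this example.

```perl
use strict;
use warnings;

my  $lexical  = 1;      # lexically scoped with "my"
our $declared = 2;      # package variable declared with "our"
$main::qualified = 3;   # fully qualified, so no declaration is needed

my $sum = $lexical + $declared + $main::qualified;
print "sum = $sum\n";
```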
=item Hexadecimal number > 0xffffffff non-portable
(W portable) The hexadecimal number you specified is larger than 2**32-1
(4294967295) and therefore non-portable between systems. See
L<perlport> for more on portability concerns.
=item Ill-formed CRTL environ value "%s"
(W internal) A warning peculiar to VMS. Perl tried to read the CRTL's internal
environ array, and encountered an element without the C<=> delimiter
used to separate keys from values. The element is ignored.
=item Ill-formed message in prime_env_iter: |%s|
(W internal) A warning peculiar to VMS. Perl tried to read a logical name
or CLI symbol definition when preparing to iterate over %ENV, and
didn't see the expected delimiter between key and value, so the
line was ignored.
=item Illegal binary digit %s
(F) You used a digit other than 0 or 1 in a binary number.
=item Illegal binary digit %s ignored
(W digit) You may have tried to use a digit other than 0 or 1 in a binary number.
Interpretation of the binary number stopped before the offending digit.
=item Illegal number of bits in vec
(F) The number of bits in vec() (the third argument) must be a power of
two from 1 to 32 (or 64, if your platform supports that).
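A short sketch of a valid call, assuming a width of 4 bits purely for illustration:

```perl
use strict;
use warnings;

my $bits = '';

# BITS is 4 here, a power of two, so each element is one nibble.
vec($bits, 0, 4) = 0xA;
vec($bits, 1, 4) = 0x5;

my $first  = vec($bits, 0, 4);
my $second = vec($bits, 1, 4);
my $bytes  = length($bits);

print "elements: $first $second, stored in $bytes byte(s)\n";
```

A width such as 3 would trigger the fatal error described above.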
=item Integer overflow in %s number
(W overflow) The hexadecimal, octal or binary number you have specified either
as a literal or as an argument to hex() or oct() is too big for your
architecture, and has been converted to a floating point number. On a
32-bit architecture the largest hexadecimal, octal or binary number
representable without overflow is 0xFFFFFFFF, 037777777777, or
0b11111111111111111111111111111111 respectively. Note that Perl
transparently promotes all numbers to a floating point representation
internally--subject to loss of precision errors in subsequent
operations.
=item Invalid %s attribute: %s
The indicated attribute for a subroutine or variable was not recognized
by Perl or by a user-supplied handler. See L<attributes>.
=item Invalid %s attributes: %s
The indicated attributes for a subroutine or variable were not recognized
by Perl or by a user-supplied handler. See L<attributes>.
=item invalid [] range "%s" in regexp
The offending range is now explicitly displayed.
=item Invalid separator character %s in attribute list
(F) Something other than a colon or whitespace was seen between the
elements of an attribute list. If the previous attribute
had a parenthesised parameter list, perhaps that list was terminated
too soon. See L<attributes>.
=item Invalid separator character %s in subroutine attribute list
(F) Something other than a colon or whitespace was seen between the
elements of a subroutine attribute list. If the previous attribute
had a parenthesised parameter list, perhaps that list was terminated
too soon.
=item leaving effective %s failed
(F) While under the C<use filetest> pragma, switching the real and
effective uids or gids failed.
=item Lvalue subs returning %s not implemented yet
(F) Due to limitations in the current implementation, array and hash
values cannot be returned in subroutines used in lvalue context.
See L<perlsub/"Lvalue subroutines">.
=item Method %s not permitted
See Server error.
=item Missing %sbrace%s on \N{}
(F) Wrong syntax of character name literal C<\N{charname}> within
double-quotish context.
=item Missing command in piped open
(W pipe) You used the C<open(FH, "| command")> or C<open(FH, "command |")>
construction, but the command was missing or blank.
=item Missing name in "my sub"
(F) The reserved syntax for lexically scoped subroutines requires that they
have a name with which they can be found.
=item No %s specified for -%c
(F) The indicated command line switch needs a mandatory argument, but
you haven't specified one.
=item No package name allowed for variable %s in "our"
(F) Fully qualified variable names are not allowed in "our" declarations,
because that doesn't make much sense under existing semantics. Such
syntax is reserved for future extensions.
=item No space allowed after -%c
(F) The argument to the indicated command line switch must follow immediately
after the switch, without intervening spaces.
=item no UTC offset information; assuming local time is UTC
(S) A warning peculiar to VMS. Perl was unable to find the local
timezone offset, so it's assuming that local system time is equivalent
to UTC. If it's not, define the logical name F<SYS$TIMEZONE_DIFFERENTIAL>
to translate to the number of seconds which need to be added to UTC to
get local time.
=item Octal number > 037777777777 non-portable
(W portable) The octal number you specified is larger than 2**32-1 (4294967295)
and therefore non-portable between systems. See L<perlport> for more
on portability concerns and on writing portable code.
=item panic: del_backref
(P) Failed an internal consistency check while trying to reset a weak
reference.
=item panic: kid popen errno read
(F) A forked child returned an incomprehensible message about its errno.
=item panic: magic_killbackrefs
(P) Failed an internal consistency check while trying to reset all weak
references to an object.
=item Parentheses missing around "%s" list
(W parenthesis) You said something like
my $foo, $bar = @_;
when you meant
my ($foo, $bar) = @_;
Remember that "my", "our", and "local" bind tighter than comma.
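The corrected form can be sketched inside a small subroutine. The name C<join_pair> is hypothetical and exists only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: the parentheses make "my" declare both variables
# and put the assignment in list context.
sub join_pair {
    my ($foo, $bar) = @_;
    return "$foo/$bar";
}

my $joined = join_pair("left", "right");
print "$joined\n";
```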
=item Possible unintended interpolation of %s in string
(W ambiguous) It used to be that Perl would try to guess whether you
wanted an array interpolated or a literal @. It no longer does this;
arrays are now I<always> interpolated into strings. This means that
if you try something like:
print "fred@example.com";
and the array C<@example> doesn't exist, Perl is going to print
C<fred.com>, which is probably not what you wanted. To get a literal
C<@> sign in a string, put a backslash before it, just as you would
to get a literal C<$> sign.
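Two ways of keeping the literal C<@> are sketched below; both produce the same string.

```perl
use strict;
use warnings;

my $escaped = "fred\@example.com";   # backslash keeps the literal @
my $quoted  = 'fred@example.com';    # single quotes never interpolate

print "$escaped\n";
print(($escaped eq $quoted) ? "identical\n" : "different\n");
```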
=item Possible Y2K bug: %s
(W y2k) You are concatenating the number 19 with another number, which
could be a potential Year 2000 problem.
=item pragma "attrs" is deprecated, use "sub NAME : ATTRS" instead
(W deprecated) You have written something like this:
sub doit
{
use attrs qw(locked);
}
You should use the new declaration syntax instead.
sub doit : locked
{
...
}
The C<use attrs> pragma is now obsolete, and is only provided for
backward-compatibility. See L<perlsub/"Subroutine Attributes">.
=item Premature end of script headers
See Server error.
=item Repeat count in pack overflows
(F) You can't specify a repeat count so large that it overflows
your signed integers. See L<perlfunc/pack>.
=item Repeat count in unpack overflows
(F) You can't specify a repeat count so large that it overflows
your signed integers. See L<perlfunc/unpack>.
=item realloc() of freed memory ignored
(S) An internal routine called realloc() on something that had already
been freed.
=item Reference is already weak
(W misc) You have attempted to weaken a reference that is already weak.
Doing so has no effect.
=item setpgrp can't take arguments
(F) Your system has the setpgrp() from BSD 4.2, which takes no arguments,
unlike POSIX setpgid(), which takes a process ID and process group ID.
=item Strange *+?{} on zero-length expression
(W regexp) You applied a regular expression quantifier in a place where it
makes no sense, such as on a zero-width assertion.
Try putting the quantifier inside the assertion instead. For example,
the way to match "abc" provided that it is followed by three
repetitions of "xyz" is C</abc(?=(?:xyz){3})/>, not C</abc(?=xyz){3}/>.
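The recommended pattern can be tried out as follows. The test strings are invented for the example.

```perl
use strict;
use warnings;

# The quantifier sits inside the lookahead, on the group it should repeat.
my $three_xyz = qr/abc(?=(?:xyz){3})/;

my $with_three = ("abcxyzxyzxyz" =~ $three_xyz) ? 1 : 0;
my $with_two   = ("abcxyzxyz"    =~ $three_xyz) ? 1 : 0;

print "three repetitions: $with_three, two repetitions: $with_two\n";
```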
=item switching effective %s is not implemented
(F) While under the C<use filetest> pragma, we cannot switch the
real and effective uids or gids.
=item This Perl can't reset CRTL environ elements (%s)
=item This Perl can't set CRTL environ elements (%s=%s)
(W internal) Warnings peculiar to VMS. You tried to change or delete an element
of the CRTL's internal environ array, but your copy of Perl wasn't
built with a CRTL that contained the setenv() function. You'll need to
rebuild Perl with a CRTL that does, or redefine F<PERL_ENV_TABLES> (see
L<perlvms>) so that the environ array isn't the target of the change to
%ENV which produced the warning.
=item Too late to run %s block
(W void) A CHECK or INIT block is being defined during run time proper,
when the opportunity to run them has already passed. Perhaps you are
loading a file with C<require> or C<do> when you should be using
C<use> instead. Or perhaps you should put the C<require> or C<do>
inside a BEGIN block.
=item Unknown open() mode '%s'
(F) The second argument of 3-argument open() is not among the list
of valid modes: C<< < >>, C<< > >>, C<<< >> >>>, C<< +< >>,
C<< +> >>, C<<< +>> >>>, C<-|>, C<|->.
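A brief sketch of the three-argument form with two of the valid modes. It uses in-memory filehandles (a reference to a scalar as the third argument) only so that the example needs no files on disk.

```perl
use strict;
use warnings;

# Read mode: the mode is a separate argument, never part of the file name.
my $buffer = "first\nsecond\n";
open(my $in, "<", \$buffer) or die "cannot open for reading: $!";
my @lines = <$in>;
close $in;

# Append mode.
my $log = "";
open(my $out, ">>", \$log) or die "cannot open for appending: $!";
print {$out} "appended\n";
close $out;

print "read ", scalar(@lines), " lines; log holds: $log";
```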
=item Unknown process %x sent message to prime_env_iter: %s
(P) An error peculiar to VMS. Perl was reading values for %ENV before
iterating over it, and someone else stuck a message in the stream of
data Perl expected. Someone's very confused, or perhaps trying to
subvert Perl's population of %ENV for nefarious purposes.
=item Unrecognized escape \\%c passed through
(W misc) You used a backslash-character combination which is not recognized
by Perl. The character was understood literally.
=item Unterminated attribute parameter in attribute list
(F) The lexer saw an opening (left) parenthesis character while parsing an
attribute list, but the matching closing (right) parenthesis
character was not found. You may need to add (or remove) a backslash
character to get your parentheses to balance. See L<attributes>.
=item Unterminated attribute list
(F) The lexer found something other than a simple identifier at the start
of an attribute, and it wasn't a semicolon or the start of a
block. Perhaps you terminated the parameter list of the previous attribute
too soon. See L<attributes>.
=item Unterminated attribute parameter in subroutine attribute list
(F) The lexer saw an opening (left) parenthesis character while parsing a
subroutine attribute list, but the matching closing (right) parenthesis
character was not found. You may need to add (or remove) a backslash
character to get your parentheses to balance.
=item Unterminated subroutine attribute list
(F) The lexer found something other than a simple identifier at the start
of a subroutine attribute, and it wasn't a semicolon or the start of a
block. Perhaps you terminated the parameter list of the previous attribute
too soon.
=item Value of CLI symbol "%s" too long
(W misc) A warning peculiar to VMS. Perl tried to read the value of an %ENV
element from a CLI symbol table, and found a resultant string longer
than 1024 characters. The return value has been truncated to 1024
characters.
=item Version number must be a constant number
(P) The attempt to translate a C<use Module n.n LIST> statement into
its equivalent C<BEGIN> block found an internal inconsistency with
the version number.
=back
=head1 New tests
=over 4
=item lib/attrs
Compatibility tests for C<sub : attrs> vs the older C<use attrs>.
=item lib/env
Tests for new environment scalar capability (e.g., C<use Env qw($BAR);>).
=item lib/env-array
Tests for new environment array capability (e.g., C<use Env qw(@PATH);>).
=item lib/io_const
IO constants (SEEK_*, _IO*).
=item lib/io_dir
Directory-related IO methods (new, read, close, rewind, tied delete).
=item lib/io_multihomed
INET sockets with multi-homed hosts.
=item lib/io_poll
IO poll().
=item lib/io_unix
UNIX sockets.
=item op/attrs
Regression tests for C<my ($x,@y,%z) : attrs> and C<sub : attrs>.
=item op/filetest
File test operators.
=item op/lex_assign
Verify operations that access pad objects (lexicals and temporaries).
=item op/exists_sub
Verify C<exists &sub> operations.
=back
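The environment scalar capability exercised by lib/env can be sketched as below. The variable name C<GR_DEMO_GREETING> is hypothetical; any name that is a valid Perl identifier should behave the same way.

```perl
use strict;
use warnings;
use Env qw($GR_DEMO_GREETING);   # hypothetical variable name

# Assigning to the tied scalar updates the environment entry.
$GR_DEMO_GREETING = "hello";

my $from_env = $ENV{GR_DEMO_GREETING};
print "ENV sees: $from_env\n";
```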
=head1 Incompatible Changes
=head2 Perl Source Incompatibilities
Beware that any new warnings that have been added or old ones
that have been enhanced are B<not> considered incompatible changes.
Since all new warnings must be explicitly requested via the C<-w>
switch or the C<warnings> pragma, it is ultimately the programmer's
responsibility to ensure that warnings are enabled judiciously.
=over 4
=item CHECK is a new keyword
All subroutine definitions named CHECK are now special. See
L</"Support for CHECK blocks"> for more information.
=item Treatment of list slices of undef has changed
There is a potential incompatibility in the behavior of list slices
that are composed entirely of undefined values.
See L</"Behavior of list slices is more consistent">.
=item Format of $English::PERL_VERSION is different
The English module now sets $PERL_VERSION to $^V (a string value) rather
than C<$]> (a numeric value). This is a potential incompatibility.
Send us a report via perlbug if you are affected by this.
See L</"Improved Perl version numbering system"> for the reasons for
this change.
=item Literals of the form C<1.2.3> parse differently
Previously, numeric literals with more than one dot in them were
interpreted as a floating point number concatenated with one or more
numbers. Such "numbers" are now parsed as strings composed of the
specified ordinals.
For example, C<print 97.98.99> used to output C<97.9899> in earlier
versions, but now prints C<abc>.
See L</"Support for strings represented as a vector of ordinals">.
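A small sketch of the new parsing, using the same ordinals as the example above:

```perl
use strict;
use warnings;

# A bare literal with two or more dots is a string built from ordinals.
my $vstring = 97.98.99;

# The "%vd" format prints the ordinals of a string joined by dots.
my $dotted = sprintf("%vd", $vstring);

print "$vstring\n";
print "$dotted\n";
```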
=item Possibly changed pseudo-random number generator
Perl programs that depend on reproducing a specific set of pseudo-random
numbers may now produce different output due to improvements made to the
rand() builtin. You can use C<sh Configure -Drandfunc=rand> to obtain
the old behavior.
See L</"Better pseudo-random number generator">.
=item Hashing function for hash keys has changed
Even though Perl hashes are not order preserving, the apparently
random order encountered when iterating on the contents of a hash
is actually determined by the hashing algorithm used. Improvements
in the algorithm may yield a random order that is B<different> from
that of previous versions, especially when iterating on hashes.
See L</"Better worst-case behavior of hashes"> for additional
information.
=item C<undef> fails on read only values
Using the C<undef> operator on a readonly value (such as $1) has
the same effect as assigning C<undef> to the readonly value--it
throws an exception.
=item Close-on-exec bit may be set on pipe and socket handles
Pipe and socket handles are also now subject to the close-on-exec
behavior determined by the special variable $^F.
See L</"More consistent close-on-exec behavior">.
=item Writing C<"$$1"> to mean C<"${$}1"> is unsupported
Perl 5.004 deprecated the interpretation of C<$$1> and
similar within interpolated strings to mean C<$$ . "1">,
but still allowed it.
In Perl 5.6.0 and later, C<"$$1"> always means C<"${$1}">.
=item delete(), each(), values() and C<\(%h)>
operate on aliases to values, not copies
delete(), each(), values() and hashes (e.g. C<\(%h)>)
in a list context return the actual
values in the hash, instead of copies (as they used to in earlier
versions). Typical idioms for using these constructs copy the
returned values, but this can make a significant difference when
creating references to the returned values. Keys in the hash are still
returned as copies when iterating on a hash.
See also L</"delete(), each(), values() and hash iteration are faster">.
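The aliasing can be observed directly with values(). The hash contents here are made up for the example.

```perl
use strict;
use warnings;

my %price = (apple => 1, pear => 2);

# values() returns aliases, so modifying $_ changes the hash itself.
$_ *= 10 for values %price;

print "apple: $price{apple}, pear: $price{pear}\n";
```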
=item vec(EXPR,OFFSET,BITS) enforces powers-of-two BITS
vec() generates a run-time error if the BITS argument is not
a valid power-of-two integer.
=item Text of some diagnostic output has changed
Most references to internal Perl operations in diagnostics
have been changed to be more descriptive. This may be an
issue for programs that may incorrectly rely on the exact
text of diagnostics for proper functioning.
=item C<%@> has been removed
The undocumented special variable C<%@> that used to accumulate
"background" errors (such as those that happen in DESTROY())
has been removed, because it could potentially result in memory
leaks.
=item Parenthesized not() behaves like a list operator
The C<not> operator now falls under the "if it looks like a function,
it behaves like a function" rule.
As a result, the parenthesized form can be used with C<grep> and C<map>.
The following construct used to be a syntax error before, but it works
as expected now:
grep not($_), @things;
On the other hand, using C<not> with a literal list slice may not
work. The following previously allowed construct:
print not (1,2,3)[0];
needs to be written with additional parentheses now:
print not((1,2,3)[0]);
The behavior remains unaffected when C<not> is not followed by parentheses.
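A runnable version of the C<grep> construct, with a made-up list of values:

```perl
use strict;
use warnings;

my @things = (0, 1, '', 'x');

# not() with parentheses behaves like a function call here,
# so the comma separates it from the list being filtered.
my @false = grep not($_), @things;
my $count = @false;

print "false values found: $count\n";
```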
=item Semantics of bareword prototype C<(*)> have changed
The semantics of the bareword prototype C<*> have changed. Perl 5.005
always coerced simple scalar arguments to a typeglob, which wasn't useful
in situations where the subroutine must distinguish between a simple
scalar and a typeglob. The new behavior is to not coerce bareword
arguments to a typeglob. The value will always be visible as either
a simple scalar or as a reference to a typeglob.
See L</"More functional bareword prototype (*)">.
=item Semantics of bit operators may have changed on 64-bit platforms
If your platform is either natively 64-bit or if Perl has been
configured to use 64-bit integers, i.e., $Config{ivsize} is 8,
there may be a potential incompatibility in the behavior of bitwise
numeric operators (& | ^ ~ << >>). These operators used to strictly
operate on the lower 32 bits of integers in previous versions, but now
operate over the entire native integral width. In particular, note
that unary C<~> will produce different results on platforms that have
different $Config{ivsize}. For portability, be sure to mask off
the excess bits in the result of unary C<~>, e.g., C<~$x & 0xffffffff>.
See L</"Bit operators support full native integer width">.
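The masking idiom can be sketched as follows. The masked result should be the same whether $Config{ivsize} is 4 or 8.

```perl
use strict;
use warnings;
use Config;

my $x = 0;

# Keep only the low 32 bits of the complement.
my $masked = ~$x & 0xffffffff;

print "ivsize is $Config{ivsize}, masked complement is $masked\n";
```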
=item More builtins taint their results
As described in L</"Improved security features">, there may be more
sources of taint in a Perl program.
To avoid these new tainting behaviors, you can build Perl with the
Configure option C<-Accflags=-DINCOMPLETE_TAINTS>. Beware that the
ensuing perl binary may be insecure.
=back
=head2 C Source Incompatibilities
=over 4
=item C<PERL_POLLUTE>
Release 5.005 grandfathered old global symbol names by providing preprocessor
macros for extension source compatibility. As of release 5.6.0, these
preprocessor definitions are not available by default. You need to explicitly
compile perl with C<-DPERL_POLLUTE> to get these definitions. For
extensions still using the old symbols, this option can be
specified via MakeMaker:
perl Makefile.PL POLLUTE=1
=item C<PERL_IMPLICIT_CONTEXT>
This new build option provides a set of macros for all API functions
such that an implicit interpreter/thread context argument is passed to
every API function. As a result of this, something like C<sv_setsv(foo,bar)>
amounts to a macro invocation that actually translates to something like
C<Perl_sv_setsv(my_perl,foo,bar)>. While this is generally expected
to not have any significant source compatibility issues, the difference
between a macro and a real function call will need to be considered.
This means that there B<is> a source compatibility issue as a result of
this if your extensions attempt to use pointers to any of the Perl API
functions.
Note that the above issue is not relevant to the default build of
Perl, whose interfaces continue to match those of prior versions
(but subject to the other options described here).
See L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for detailed information on the
ramifications of building Perl with this option.
NOTE: PERL_IMPLICIT_CONTEXT is automatically enabled whenever Perl is built
with one of -Dusethreads, -Dusemultiplicity, or both. It is not
intended to be enabled by users at this time.
=item C<PERL_POLLUTE_MALLOC>
Enabling Perl's malloc in release 5.005 and earlier caused the namespace of
the system's malloc family of functions to be usurped by the Perl versions,
since by default they used the same names. Besides causing problems on
platforms that do not allow these functions to be cleanly replaced, this
also meant that the system versions could not be called in programs that
used Perl's malloc. Previous versions of Perl have allowed this behaviour
to be suppressed with the HIDEMYMALLOC and EMBEDMYMALLOC preprocessor
definitions.
As of release 5.6.0, Perl's malloc family of functions have default names
distinct from the system versions. You need to explicitly compile perl with
C<-DPERL_POLLUTE_MALLOC> to get the older behaviour. HIDEMYMALLOC
and EMBEDMYMALLOC have no effect, since the behaviour they enabled is now
the default.
Note that these functions do B<not> constitute Perl's memory allocation API.
See L<perlguts/"Memory Allocation"> for further information about that.
=back
=head2 Compatible C Source API Changes
=over 4
=item C<PATCHLEVEL> is now C<PERL_VERSION>
The cpp macros C<PERL_REVISION>, C<PERL_VERSION>, and C<PERL_SUBVERSION>
are now available by default from perl.h, and reflect the base revision,
patchlevel, and subversion respectively. C<PERL_REVISION> had no
prior equivalent, while C<PERL_VERSION> and C<PERL_SUBVERSION> were
previously available as C<PATCHLEVEL> and C<SUBVERSION>.
The new names cause less pollution of the B<cpp> namespace and reflect what
the numbers have come to stand for in common practice. For compatibility,
the old names are still supported when F<patchlevel.h> is explicitly
included (as required before), so there is no source incompatibility
from the change.
=back
=head2 Binary Incompatibilities
In general, the default build of this release is expected to be binary
compatible for extensions built with the 5.005 release or its maintenance
versions. However, specific platforms may have broken binary compatibility
due to changes in the defaults used in hints files. Therefore, please be
sure to always check the platform-specific README files for any notes to
the contrary.
The usethreads or usemultiplicity builds are B<not> binary compatible
with the corresponding builds in 5.005.
On platforms that require an explicit list of exports (AIX, OS/2 and Windows,
among others), purely internal symbols such as parser functions and the
run time opcodes are not exported by default. Perl 5.005 used to export
all functions irrespective of whether they were considered part of the
public API or not.
For the full list of public API functions, see L<perlapi>.
=head1 Known Problems
=head2 Thread test failures
Subtests 19 and 20 of the lib/thr5005.t test are known to fail due to
fundamental problems in the 5.005 threading implementation. These are
not new failures--Perl 5.005_0x has the same bugs, but didn't have these
tests.
=head2 EBCDIC platforms not supported
In earlier releases of Perl, EBCDIC environments like OS390 (also
known as Open Edition MVS) and VM-ESA were supported. Due to changes
required by the UTF-8 (Unicode) support, the EBCDIC platforms are not
supported in Perl 5.6.0.
=head2 In 64-bit HP-UX the lib/io_multihomed test may hang
The lib/io_multihomed test may hang in HP-UX if Perl has been
configured to be 64-bit. Because other 64-bit platforms do not
hang in this test, HP-UX is suspect. All other tests pass
in 64-bit HP-UX. The test attempts to create and connect to
"multihomed" sockets (sockets which have multiple IP addresses).
=head2 NEXTSTEP 3.3 POSIX test failure
In NEXTSTEP 3.3p2 the implementation of strftime(3) in the
operating system libraries is buggy: the %j format numbers the days of
a month starting from zero, which, while being logical to programmers,
will cause subtests 19 to 27 of the lib/posix test to fail.
=head2 Tru64 (aka Digital UNIX, aka DEC OSF/1) lib/sdbm test failure with gcc
If compiled with gcc 2.95 the lib/sdbm test will fail (dump core).
The cure is to use the vendor cc, which comes with the operating system
and produces good code.
=head2 UNICOS/mk CC failures during Configure run
In UNICOS/mk the following errors may appear during the Configure run:
Guessing which symbols your C compiler and preprocessor define...
CC-20 cc: ERROR File = try.c, Line = 3
...
bad switch yylook 79bad switch yylook 79bad switch yylook 79bad switch yylook 79#ifdef A29K
...
4 errors detected in the compilation of "try.c".
The culprit is the broken awk of UNICOS/mk. The effect is fortunately
rather mild: Perl itself is not adversely affected by the error, only
the h2ph utility coming with Perl, and that is rather rarely needed
these days.
=head2 Arrow operator and arrays
When the left argument to the arrow operator C<< -> >> is an array, or
the C<scalar> operator operating on an array, the result of the
operation must be considered erroneous. For example:
@x->[2]
scalar(@x)->[2]
These expressions will get run-time errors in some future release of
Perl.
=head2 Experimental features
As discussed above, many features are still experimental. Interfaces and
implementation of these features are subject to change, and in extreme cases,
even subject to removal in some future release of Perl. These features
include the following:
=over 4
=item Threads
=item Unicode
=item 64-bit support
=item Lvalue subroutines
=item Weak references
=item The pseudo-hash data type
=item The Compiler suite
=item Internal implementation of file globbing
=item The DB module
=item The regular expression code constructs:
C<(?{ code })> and C<(??{ code })>
=back
=head1 Obsolete Diagnostics
=over 4
=item Character class syntax [: :] is reserved for future extensions
(W) Within regular expression character classes ([]) the syntax beginning
with "[:" and ending with ":]" is reserved for future extensions.
If you need to represent those character sequences inside a regular
expression character class, just quote the square brackets with the
backslash: "\[:" and ":\]".
=item Ill-formed logical name |%s| in prime_env_iter
(W) A warning peculiar to VMS. A logical name was encountered when preparing
to iterate over %ENV which violates the syntactic rules governing logical
names. Because it cannot be translated normally, it is skipped, and will not
appear in %ENV. This may be a benign occurrence, as some software packages
might directly modify logical name tables and introduce nonstandard names,
or it may indicate that a logical name table has been corrupted.
=item In string, @%s now must be written as \@%s
The description of this error used to say:
(Someday it will simply assume that an unbackslashed @
interpolates an array.)
That day has come, and this fatal error has been removed. It has been
replaced by a non-fatal warning instead.
See L</Arrays now always interpolate into double-quoted strings> for
details.
=item Probable precedence problem on %s
(W) The compiler found a bareword where it expected a conditional,
which often indicates that an || or && was parsed as part of the
last argument of the previous construct, for example:
open FOO || die;
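The usual remedy is to parenthesize the arguments or to use the low-precedence C<or>, so that the C<die> applies to the result of open() and not to its last argument. A sketch, assuming the path below does not exist:

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on the machine running this.
my $missing = "/nonexistent/gr-demo/input.txt";

my $ok = eval {
    # "or" binds more loosely than the argument list of open().
    open(my $fh, "<", $missing) or die "cannot open $missing: $!\n";
    close $fh;
    1;
};
my $error = $ok ? "" : $@;

print $ok ? "opened\n" : "failed as expected: $error";
```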
=item regexp too big
(F) The current implementation of regular expressions uses shorts as
address offsets within a string. Unfortunately this means that if
the regular expression compiles to longer than 32767, it'll blow up.
Usually when you want a regular expression this big, there is a better
way to do it with multiple statements. See L<perlre>.
=item Use of "$$<digit>" to mean "${$}<digit>" is deprecated
(D) Perl versions before 5.004 misinterpreted any type marker followed
by "$" and a digit. For example, "$$0" was incorrectly taken to mean
"${$}0" instead of "${$0}". This bug is (mostly) fixed in Perl 5.004.
However, the developers of Perl 5.004 could not fix this bug completely,
because at least two widely-used modules depend on the old meaning of
"$$0" in a string. So Perl 5.004 still interprets "$$<digit>" in the
old (broken) way inside strings; but it generates this message as a
warning. And in Perl 5.005, this special treatment will cease.
=back
=head1 Reporting Bugs
If you find what you think is a bug, you might check the
articles recently posted to the comp.lang.perl.misc newsgroup.
There may also be information at http://www.perl.com/perl/ , the Perl
Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
=head1 SEE ALSO
The F<Changes> file for exhaustive details on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=head1 HISTORY
Written by Gurusamy Sarathy <F<gsar@activestate.com>>, with many
contributions from The Perl Porters.
Send omissions or corrections to <F<perlbug@perl.org>>.
=cut
=encoding utf8
=for comment
Consistent formatting of this file is achieved with:
perl ./Porting/podtidy pod/perlhacktut.pod
=head1 NAME
perlhacktut - Walk through the creation of a simple C code patch
=head1 DESCRIPTION
This document takes you through a simple patch example.
If you haven't read L<perlhack> yet, go do that first! You might also
want to read through L<perlsource> too.
Once you're done here, check out L<perlhacktips> next.
=head1 EXAMPLE OF A SIMPLE PATCH
Let's take a simple patch from start to finish.
Here's something Larry suggested: if a C<U> is the first active format
during a C<pack>, (for example, C<pack "U3C8", @stuff>) then the
resulting string should be treated as UTF-8 encoded.
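On a perl that already has this behaviour (an assumption, though any reasonably recent release should qualify), the intended result can be observed from Perl code. The sketch below uses C<utf8::is_utf8> and the C<%vd> format to inspect the packed string.

```perl
use strict;
use warnings;

# U is the first active format in both patterns; the second has a leading space.
my $plain  = pack("U*",  1, 20, 300, 4000);
my $spaced = pack(" U*", 1, 20, 300, 4000);

# "%vd" prints the ordinal of each character, joined by dots.
my $plain_dotted  = sprintf("%vd", $plain);
my $spaced_dotted = sprintf("%vd", $spaced);

# True when perl treats the string as UTF-8 encoded internally.
my $flagged = utf8::is_utf8($plain) ? 1 : 0;

print "plain:  $plain_dotted\n";
print "spaced: $spaced_dotted\n";
print "flagged as UTF-8: $flagged\n";
```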
If you are working with a git clone of the Perl repository, you will
want to create a branch for your changes. This will make creating a
proper patch much simpler. See L<perlgit> for details on how to do
this.
=head2 Writing the patch
How do we prepare to fix this up? First we locate the code in question
- the C<pack> happens at runtime, so it's going to be in one of the
F<pp> files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going
to be altering this file, let's copy it to F<pp.c~>.
[Well, it was in F<pp.c> when this tutorial was written. It has now
been split off with C<pp_unpack> to its own file, F<pp_pack.c>]
Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then
loop over the pattern, taking each format character in turn into
C<datumtype>. Then for each possible format character, we swallow up
the other arguments in the pattern (a field width, an asterisk, and so
on) and convert the next chunk of input into the specified format, adding
it onto the output SV C<cat>.
How do we know if the C<U> is the first format in the C<pat>? Well, if
we have a pointer to the start of C<pat> then, if we see a C<U> we can
test whether we're still at the start of the string. So, here's where
C<pat> is set up:
STRLEN fromlen;
char *pat = SvPVx(*++MARK, fromlen);
char *patend = pat + fromlen;
I32 len;
I32 datumtype;
SV *fromstr;
We'll have another string pointer in there:
STRLEN fromlen;
char *pat = SvPVx(*++MARK, fromlen);
char *patend = pat + fromlen;
+ char *patcopy;
I32 len;
I32 datumtype;
SV *fromstr;
And just before we start the loop, we'll set C<patcopy> to be the start
of C<pat>:
items = SP - MARK;
MARK++;
SvPVCLEAR(cat);
+ patcopy = pat;
while (pat < patend) {
Now if we see a C<U> which was at the start of the string, we turn on
the C<UTF8> flag for the output SV, C<cat>:
+ if (datumtype == 'U' && pat==patcopy+1)
+ SvUTF8_on(cat);
if (datumtype == '#') {
while (pat < patend && *pat != '\n')
pat++;
Remember that it has to be C<patcopy+1> because the first character of
the string is the C<U> which has been swallowed into C<datumtype>!
Oops, we forgot one thing: what if there are spaces at the start of the
pattern? C<pack(" U*", @stuff)> will have C<U> as the first active
character, even though it's not the first thing in the pattern. In this
case, we have to advance C<patcopy> along with C<pat> when we see
spaces:
if (isSPACE(datumtype))
continue;
needs to become
if (isSPACE(datumtype)) {
patcopy++;
continue;
}
OK. That's the C part done. Now we must do two additional things before
this patch is ready to go: we've changed the behaviour of Perl, and so
we must document that change. We must also provide some more regression
tests to make sure our patch works and doesn't create a bug somewhere
else along the line.
=head2 Testing the patch
The regression tests for each operator live in F<t/op/>, and so we make
a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our tests
to the end. First, we'll test that the C<U> does indeed create Unicode
strings.
t/op/pack.t has a sensible ok() function, but if it didn't we could use
the one from t/test.pl.
require './test.pl';
plan( tests => 159 );
so instead of this:
print 'not ' unless "1.20.300.4000" eq sprintf "%vd",
pack("U*",1,20,300,4000);
print "ok $test\n"; $test++;
we can write the more sensible version below (see L<Test::More> for a
full explanation of is() and other testing functions):
is( "1.20.300.4000", sprintf("%vd", pack("U*",1,20,300,4000)),
"U* produces Unicode" );
Now we'll test that we got that space-at-the-beginning business right:
is( "1.20.300.4000", sprintf("%vd", pack(" U*",1,20,300,4000)),
" with spaces at the beginning" );
And finally we'll test that we don't make Unicode strings if C<U> is
B<not> the first active format:
isnt( v1.20.300.4000, sprintf("%vd", pack("C0U*",1,20,300,4000)),
"U* not first isn't Unicode" );
Mustn't forget to change the number of tests which appears at the top,
or else the automated tester will get confused. This will either look
like this:
print "1..156\n";
or this:
plan( tests => 156 );
We now compile up Perl, and run it through the test suite. Our new
tests pass, hooray!
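The behaviour can also be checked by hand, outside the core test suite. The following is only a sketch, assuming a reasonably recent perl; the variable names are illustrative and not part of F<t/op/pack.t>, and C<utf8::is_utf8> is used merely to peek at the internal flag that the patch switches on.

```perl
use strict;
use warnings;

# Pack four code points with a leading U; "%vd" joins the ordinal of
# each character in the string with dots.
my $packed = pack("U*", 1, 20, 300, 4000);
my $dotted = sprintf("%vd", $packed);

# Peek at the internal UTF8 flag of the result (1 if on, 0 if off).
my $flagged = utf8::is_utf8($packed) ? 1 : 0;

print "dotted:  $dotted\n";
print "flagged: $flagged\n";
```

Note the parentheses around the arguments of C<sprintf>: without them, anything that follows in the same argument list would be passed to C<sprintf> as well.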
=head2 Documenting the patch
Finally, the documentation. The job is never done until the paperwork
is over, so let's describe the change we've just made. The relevant
place is F<pod/perlfunc.pod>; again, we make a copy, and then we'll
insert this text in the description of C<pack>:
=item *
If the pattern begins with a C<U>, the resulting string will be treated
as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string
with an initial C<U0>, and the bytes that follow will be interpreted as
Unicode characters. If you don't want this to happen, you can begin
your pattern with C<C0> (or anything else) to force Perl not to UTF-8
encode your string, and then follow this with a C<U*> somewhere in your
pattern.
=head2 Submit
See L<perlhack> for details on how to submit this patch.
=head1 AUTHOR
This document was originally written by Nathan Torkington, and is
maintained by the perl5-porters mailing list.
=head1 NAME
perlretut - Perl regular expressions tutorial
=head1 DESCRIPTION
This page provides a basic tutorial on understanding, creating and
using regular expressions in Perl. It serves as a complement to the
reference page on regular expressions L<perlre>. Regular expressions
are an integral part of the C<m//>, C<s///>, C<qr//> and C<split>
operators and so this tutorial also overlaps with
L<perlop/"Regexp Quote-Like Operators"> and L<perlfunc/split>.
Perl is widely renowned for excellence in text processing, and regular
expressions are one of the big factors behind this fame. Perl regular
expressions display an efficiency and flexibility unknown in most
other computer languages. Mastering even the basics of regular
expressions will allow you to manipulate text with surprising ease.
What is a regular expression? At its most basic, a regular expression
is a template that is used to determine if a string has certain
characteristics. The string is most often some text, such as a line,
sentence, web page, or even a whole book, but less commonly it could be
some binary data as well.
Suppose we want to determine if the text in the variable C<$var> contains
the sequence of characters S<C<m u s h r o o m>>
(blanks added for legibility). We can write in Perl
$var =~ m/mushroom/
The value of this expression will be TRUE if C<$var> contains that
sequence of characters, and FALSE otherwise. The portion enclosed in
C<'E<sol>'> characters denotes the characteristic we are looking for.
We use the term I<pattern> for it. The process of looking to see if the
pattern occurs in the string is called I<matching>, and the C<"=~">
operator along with the C<m//> tell Perl to try to match the pattern
against the string. Note that the pattern is also a string, but a very
special kind of one, as we will see. Patterns are in common use these
days;
examples are the patterns typed into a search engine to find web pages
and the patterns used to list files in a directory, I<e.g.>, "C<ls *.txt>"
or "C<dir *.*>". In Perl, the patterns described by regular expressions
are used not only to search strings, but to also extract desired parts
of strings, and to do search and replace operations.
Regular expressions have the undeserved reputation of being abstract
and difficult to understand. This reputation arises simply because the
notation used to express them tends to be terse and dense, and not
because of any inherent complexity. We recommend using the C</x> regular
expression modifier (described below) along with plenty of white space
to make them less dense, and easier to read. Regular expressions are
constructed using
simple concepts like conditionals and loops and are no more difficult
to understand than the corresponding C<if> conditionals and C<while>
loops in the Perl language itself.
This tutorial flattens the learning curve by discussing regular
expression concepts, along with their notation, one at a time and with
many examples. The first part of the tutorial will progress from the
simplest word searches to the basic regular expression concepts. If
you master the first part, you will have all the tools needed to solve
about 98% of your needs. The second part of the tutorial is for those
comfortable with the basics and hungry for more power tools. It
discusses the more advanced regular expression operators and
introduces the latest cutting-edge innovations.
A note: to save time, "regular expression" is often abbreviated as
regexp or regex. Regexp is a more natural abbreviation than regex, but
is harder to pronounce. The Perl pod documentation is evenly split on
regexp vs regex; in Perl, there is more than one way to abbreviate it.
We'll use regexp in this tutorial.
New in v5.22, L<C<use re 'strict'>|re/'strict' mode> applies stricter
rules than otherwise when compiling regular expression patterns. It can
find things that, while legal, may not be what you intended.
=head1 Part 1: The basics
=head2 Simple word matching
The simplest regexp is simply a word, or more generally, a string of
characters. A regexp consisting of just a word matches any string that
contains that word:
"Hello World" =~ /World/; # matches
What is this Perl statement all about? C<"Hello World"> is a simple
double-quoted string. C<World> is the regular expression and the
C<//> enclosing C</World/> tells Perl to search a string for a match.
The operator C<=~> associates the string with the regexp match and
produces a true value if the regexp matched, or false if the regexp
did not match. In our case, C<World> matches the second word in
C<"Hello World">, so the expression is true. Expressions like this
are useful in conditionals:
if ("Hello World" =~ /World/) {
print "It matches\n";
}
else {
print "It doesn't match\n";
}
There are useful variations on this theme. The sense of the match can
be reversed by using the C<!~> operator:
if ("Hello World" !~ /World/) {
print "It doesn't match\n";
}
else {
print "It matches\n";
}
The literal string in the regexp can be replaced by a variable:
my $greeting = "World";
if ("Hello World" =~ /$greeting/) {
print "It matches\n";
}
else {
print "It doesn't match\n";
}
If you're matching against the special default variable C<$_>, the
C<$_ =~> part can be omitted:
$_ = "Hello World";
if (/World/) {
print "It matches\n";
}
else {
print "It doesn't match\n";
}
And finally, the C<//> default delimiters for a match can be changed
to arbitrary delimiters by putting an C<'m'> out front:
"Hello World" =~ m!World!; # matches, delimited by '!'
"Hello World" =~ m{World}; # matches, note the matching '{}'
"/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
# '/' becomes an ordinary char
C</World/>, C<m!World!>, and C<m{World}> all represent the
same thing. When, I<e.g.>, the quote (C<'"'>) is used as a delimiter, the forward
slash C<'/'> becomes an ordinary character and can be used in this regexp
without trouble.
Let's consider how different regexps would match C<"Hello World">:
"Hello World" =~ /world/; # doesn't match
"Hello World" =~ /o W/; # matches
"Hello World" =~ /oW/; # doesn't match
"Hello World" =~ /World /; # doesn't match
The first regexp C<world> doesn't match because regexps are
case-sensitive. The second regexp matches because the substring
S<C<'o W'>> occurs in the string S<C<"Hello World">>. The space
character C<' '> is treated like any other character in a regexp and is
needed to match in this case. The lack of a space character is the
reason the third regexp C<'oW'> doesn't match. The fourth regexp
"C<World >" doesn't match because there is a space at the end of the
regexp, but not at the end of the string. The lesson here is that
regexps must match a part of the string I<exactly> in order for the
statement to be true.
If a regexp matches in more than one place in the string, Perl will
always match at the earliest possible point in the string:
"Hello World" =~ /o/; # matches 'o' in 'Hello'
"That hat is red" =~ /hat/; # matches 'hat' in 'That'
With respect to character matching, there are a few more points you
need to know about. First of all, not all characters can be used "as
is" in a match. Some characters, called I<metacharacters>, are reserved
for use in regexp notation. The metacharacters are
{}[]()^$.|*+?-\
The significance of each of these will be explained
in the rest of the tutorial, but for now, it is important only to know
that a metacharacter can be matched as-is by putting a backslash before
it:
"2+2=4" =~ /2+2/; # doesn't match, + is a metacharacter
"2+2=4" =~ /2\+2/; # matches, \+ is treated like an ordinary +
"The interval is [0,1)." =~ /[0,1)./ # is a syntax error!
"The interval is [0,1)." =~ /\[0,1\)\./ # matches
"#!/usr/bin/perl" =~ /#!\/usr\/bin\/perl/; # matches
In the last regexp, the forward slash C<'/'> is also backslashed,
because it is used to delimit the regexp. This can lead to LTS
(leaning toothpick syndrome), however, and it is often more readable
to change delimiters.
"#!/usr/bin/perl" =~ m!#\!/usr/bin/perl!; # easier to read
The backslash character C<'\'> is a metacharacter itself and needs to
be backslashed:
'C:\WIN32' =~ /C:\\WIN/; # matches
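When the text to be matched literally comes from a variable, backslashing by hand is not practical. A minimal sketch, assuming the built-in C<quotemeta> function and the C<\Q...\E> escape (neither has been introduced in this tutorial yet); the variable names are illustrative only:

```perl
use strict;
use warnings;

my $text = "The interval is [0,1).";

# quotemeta() backslashes every ASCII non-word character, so the result
# can be interpolated into a regexp and matched literally.
my $literal = quotemeta("[0,1).");

my $found_interval = $text =~ /$literal/ ? 1 : 0;

# \Q...\E applies the same quoting inline to an interpolated variable.
my $sum       = "2+2";
my $found_sum = "2+2=4" =~ /\Q$sum\E/ ? 1 : 0;

print "quoted form: $literal\n";
```

Both matches succeed here, because every metacharacter in the searched-for text has been turned into an ordinary character.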
In situations where it doesn't make sense for a particular metacharacter
to mean what it normally does, it automatically loses its
metacharacter-ness and becomes an ordinary character that is to be
matched literally. For example, the C<'}'> is a metacharacter only when
it is the mate of a C<'{'> metacharacter. Otherwise it is treated as a
literal RIGHT CURLY BRACKET. This may lead to unexpected results.
L<C<use re 'strict'>|re/'strict' mode> can catch some of these.
In addition to the metacharacters, there are some ASCII characters
which don't have printable character equivalents and are instead
represented by I<escape sequences>. Common examples are C<\t> for a
tab, C<\n> for a newline, C<\r> for a carriage return and C<\a> for a
bell (or alert). If your string is better thought of as a sequence of arbitrary
bytes, the octal escape sequence, I<e.g.>, C<\033>, or hexadecimal escape
sequence, I<e.g.>, C<\x1B> may be a more natural representation for your
bytes. Here are some examples of escapes:
"1000\t2000" =~ m(0\t2) # matches
"1000\n2000" =~ /0\n20/ # matches
"1000\t2000" =~ /\000\t2/ # doesn't match, "0" ne "\000"
"cat" =~ /\o{143}\x61\x74/ # matches in ASCII, but a weird way
# to spell cat
If you've been around Perl a while, all this talk of escape sequences
may seem familiar. Similar escape sequences are used in double-quoted
strings and in fact the regexps in Perl are mostly treated as
double-quoted strings. This means that variables can be used in
regexps as well. Just like double-quoted strings, the values of the
variables in the regexp will be substituted in before the regexp is
evaluated for matching purposes. So we have:
$foo = 'house';
'housecat' =~ /$foo/; # matches
'cathouse' =~ /cat$foo/; # matches
'housecat' =~ /${foo}cat/; # matches
So far, so good. With the knowledge above you can already perform
searches with just about any literal string regexp you can dream up.
Here is a I<very simple> emulation of the Unix grep program:
% cat > simple_grep
#!/usr/bin/perl
$regexp = shift;
while (<>) {
print if /$regexp/;
}
^D
% chmod +x simple_grep
% simple_grep abba /usr/dict/words
Babbage
cabbage
cabbages
sabbath
Sabbathize
Sabbathizes
sabbatical
scabbard
scabbards
This program is easy to understand. C<#!/usr/bin/perl> is the standard
way to invoke a perl program from the shell.
S<C<$regexp = shift;>> saves the first command line argument as the
regexp to be used, leaving the rest of the command line arguments to
be treated as files. S<C<< while (<>) >>> loops over all the lines in
all the files. For each line, S<C<print if /$regexp/;>> prints the
line if the regexp matches the line. In this line, both C<print> and
C</$regexp/> use the default variable C<$_> implicitly.
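The same idea can be sketched as a subroutine that filters a list held in memory instead of reading files. This is an illustration only; the subroutine name mirrors the script above, and the word list is made up for the example:

```perl
use strict;
use warnings;

# Return the lines that match $regexp, in their original order.
sub simple_grep {
    my ($regexp, @lines) = @_;
    return grep { /$regexp/ } @lines;
}

# A hypothetical stand-in for /usr/dict/words.
my @words = qw(Babbage cabbage rabbit sabbath abbey);
my @hits  = simple_grep('abba', @words);

print "$_\n" for @hits;
```

Inside the C<grep> block, C</$regexp/> is matched against C<$_>, just as it is in the C<while> loop of the script.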
With all of the regexps above, if the regexp matched anywhere in the
string, it was considered a match. Sometimes, however, we'd like to
specify I<where> in the string the regexp should try to match. To do
this, we would use the I<anchor> metacharacters C<'^'> and C<'$'>. The
anchor C<'^'> means match at the beginning of the string and the anchor
C<'$'> means match at the end of the string, or before a newline at the
end of the string. Here is how they are used:
"housekeeper" =~ /keeper/; # matches
"housekeeper" =~ /^keeper/; # doesn't match
"housekeeper" =~ /keeper$/; # matches
"housekeeper\n" =~ /keeper$/; # matches
The second regexp doesn't match because C<'^'> constrains C<keeper> to
match only at the beginning of the string, but C<"housekeeper"> has
keeper starting in the middle. The third regexp does match, since the
C<'$'> constrains C<keeper> to match only at the end of the string.
When both C<'^'> and C<'$'> are used at the same time, the regexp has to
match both the beginning and the end of the string, I<i.e.>, the regexp
matches the whole string. Consider
"keeper" =~ /^keep$/; # doesn't match
"keeper" =~ /^keeper$/; # matches
"" =~ /^$/; # ^$ matches an empty string
The first regexp doesn't match because the string has more to it than
C<keep>. Since the second regexp is exactly the string, it
matches. Using both C<'^'> and C<'$'> in a regexp forces the complete
string to match, so it gives you complete control over which strings
match and which don't. Suppose you are looking for a fellow named
bert, off in a string by himself:
"dogbert" =~ /bert/; # matches, but not what you want
"dilbert" =~ /^bert/; # doesn't match, but ..
"bertram" =~ /^bert/; # matches, so still not good enough
"bertram" =~ /^bert$/; # doesn't match, good
"dilbert" =~ /^bert$/; # doesn't match, good
"bert" =~ /^bert$/; # matches, perfect
Of course, in the case of a literal string, one could just as easily
use the string comparison S<C<$string eq 'bert'>> and it would be
more efficient. The C<^...$> regexp really becomes useful when we
add in the more powerful regexp tools below.
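One detail worth seeing in action is that C<'$'> also matches before a newline at the end of the string, so a C<^...$> regexp and C<eq> are not quite interchangeable. A small sketch, with a hypothetical helper name:

```perl
use strict;
use warnings;

# True when the whole string is 'bert'; a single trailing newline is
# tolerated, because '$' can match just before it.
sub is_bert {
    my ($string) = @_;
    return $string =~ /^bert$/ ? 1 : 0;
}

my %result;
$result{$_} = is_bert($_) for ("bert", "bert\n", "dogbert", "bertram");

# String comparison, by contrast, treats the newline as significant.
my $eq_with_newline = ("bert\n" eq 'bert') ? 1 : 0;
```

Here C<"bert"> and C<"bert\n"> are both accepted by the regexp, while C<eq> rejects the second.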
=head2 Using character classes
Although one can already do quite a lot with the literal string
regexps above, we've only scratched the surface of regular expression
technology. In this and subsequent sections we will introduce regexp
concepts (and associated metacharacter notations) that will allow a
regexp to represent not just a single character sequence, but a I<whole
class> of them.
One such concept is that of a I<character class>. A character class
allows a set of possible characters, rather than just a single
character, to match at a particular point in a regexp. You can define
your own custom character classes. These
are denoted by brackets C<[...]>, with the set of characters
to be possibly matched inside. Here are some examples:
/cat/; # matches 'cat'
/[bcr]at/; # matches 'bat', 'cat', or 'rat'
/item[0123456789]/; # matches 'item0' or ... or 'item9'
"abc" =~ /[cab]/; # matches 'a'
In the last statement, even though C<'c'> is the first character in
the class, C<'a'> matches because the first character position in the
string is the earliest point at which the regexp can match.
/[yY][eE][sS]/; # match 'yes' in a case-insensitive way
# 'yes', 'Yes', 'YES', etc.
This regexp displays a common task: perform a case-insensitive
match. Perl provides a way of avoiding all those brackets by simply
appending an C<'i'> to the end of the match. Then C</[yY][eE][sS]/;>
can be rewritten as C</yes/i;>. The C<'i'> stands for
case-insensitive and is an example of a I<modifier> of the matching
operation. We will meet other modifiers later in the tutorial.
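To see that the two spellings accept the same inputs, they can be run over a few sample strings. This is a sketch with made-up sample data; C<grep> in scalar context simply counts the matching elements:

```perl
use strict;
use warnings;

my @answers = ("yes", "Yes", "YES", "yEs", "no");

# Count the answers accepted by each spelling of the regexp.
my $by_classes  = grep { /[yY][eE][sS]/ } @answers;
my $by_modifier = grep { /yes/i } @answers;

print "classes: $by_classes, modifier: $by_modifier\n";
```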
We saw in the section above that there were ordinary characters, which
represented themselves, and special characters, which needed a
backslash C<'\'> to represent themselves. The same is true in a
character class, but the sets of ordinary and special characters
inside a character class are different than those outside a character
class. The special characters for a character class are C<-]\^$> (and
the pattern delimiter, whatever it is).
C<']'> is special because it denotes the end of a character class. C<'$'> is
special because it denotes a scalar variable. C<'\'> is special because
it is used in escape sequences, just like above. Here is how the
special characters C<]$\> are handled:
/[\]c]def/; # matches ']def' or 'cdef'
$x = 'bcr';
/[$x]at/; # matches 'bat', 'cat', or 'rat'
/[\$x]at/; # matches '$at' or 'xat'
/[\\$x]at/; # matches '\at', 'bat', 'cat', or 'rat'
The last two are a little tricky. In C<[\$x]>, the backslash protects
the dollar sign, so the character class has two members C<'$'> and C<'x'>.
In C<[\\$x]>, the backslash is protected, so C<$x> is treated as a
variable and substituted in double quote fashion.
The special character C<'-'> acts as a range operator within character
classes, so that a contiguous set of characters can be written as a
range. With ranges, the unwieldy C<[0123456789]> and C<[abc...xyz]>
become the svelte C<[0-9]> and C<[a-z]>. Some examples are
/item[0-9]/; # matches 'item0' or ... or 'item9'
/[0-9bx-z]aa/; # matches '0aa', ..., '9aa',
# 'baa', 'xaa', 'yaa', or 'zaa'
/[0-9a-fA-F]/; # matches a hexadecimal digit
/[0-9a-zA-Z_]/; # matches a "word" character,
# like those in a Perl variable name
If C<'-'> is the first or last character in a character class, it is
treated as an ordinary character; C<[-ab]>, C<[ab-]> and C<[a\-b]> are
all equivalent.
The special character C<'^'> in the first position of a character class
denotes a I<negated character class>, which matches any character but
those in the brackets. Both C<[...]> and C<[^...]> must match a
character, or the match fails. Then
/[^a]at/; # doesn't match 'aat' or 'at', but matches
# all other 'bat', 'cat', '0at', '%at', etc.
/[^0-9]/; # matches a non-numeric character
/[a^]at/; # matches 'aat' or '^at'; here '^' is ordinary
Now, even C<[0-9]> can be a bother to write multiple times, so in the
interest of saving keystrokes and making regexps more readable, Perl
has several abbreviations for common character classes, as shown below.
Since the introduction of Unicode, unless the C</a> modifier is in
effect, these character classes match more than just a few characters in
the ASCII range.
=over 4
=item *
C<\d> matches a digit, not just C<[0-9]> but also digits from non-roman scripts
=item *
C<\s> matches a whitespace character, the set C<[\ \t\r\n\f]> and others
=item *
C<\w> matches a word character (alphanumeric or C<'_'>), not just C<[0-9a-zA-Z_]>
but also digits and characters from non-roman scripts
=item *
C<\D> is a negated C<\d>; it represents any other character than a digit, or C<[^\d]>
=item *
C<\S> is a negated C<\s>; it represents any non-whitespace character C<[^\s]>
=item *
C<\W> is a negated C<\w>; it represents any non-word character C<[^\w]>
=item *
The period C<'.'> matches any character but C<"\n"> (unless the modifier C</s> is
in effect, as explained below).
=item *
C<\N>, like the period, matches any character but C<"\n">, but it does so
regardless of whether the modifier C</s> is in effect.
=back
The C</a> modifier, available starting in Perl 5.14, is used to
restrict the matches of C<\d>, C<\s>, and C<\w> to just those in the ASCII range.
It is useful to keep your program from being needlessly exposed to full
Unicode (and its accompanying security considerations) when all you want
is to process English-like text. (The "a" may be doubled, C</aa>, to
provide even more restrictions, preventing case-insensitive matching of
ASCII with non-ASCII characters; otherwise a Unicode "Kelvin Sign"
would caselessly match a "k" or "K".)
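The effect of C</a> and C</aa> can be sketched with two non-ASCII characters. The characters are written as code points so that the example stays plain ASCII; the variable names are illustrative:

```perl
use strict;
use warnings;

# ARABIC-INDIC DIGIT ONE and KELVIN SIGN, given by code point.
my $arabic_one = "\x{0661}";
my $kelvin     = "\x{212A}";

my $digit_unicode = $arabic_one =~ /\d/  ? 1 : 0;  # Unicode rules for \d
my $digit_ascii   = $arabic_one =~ /\d/a ? 1 : 0;  # \d restricted to [0-9]

my $kelvin_i   = $kelvin =~ /k/i   ? 1 : 0;  # caseless, Unicode folding
my $kelvin_aai = $kelvin =~ /k/aai ? 1 : 0;  # ASCII never matches non-ASCII
```

With the default rules both characters match; with C</a> the digit is rejected, and with C</aa> the Kelvin Sign no longer matches C<k> caselessly.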
The C<\d\s\w\D\S\W> abbreviations can be used both inside and outside
of bracketed character classes. Here are some in use:
/\d\d:\d\d:\d\d/; # matches a hh:mm:ss time format
/[\d\s]/; # matches any digit or whitespace character
/\w\W\w/; # matches a word char, followed by a
# non-word char, followed by a word char
/..rt/; # matches any two chars, followed by 'rt'
/end\./; # matches 'end.'
/end[.]/; # same thing, matches 'end.'
Because a period is a metacharacter, it needs to be escaped to match
as an ordinary period. Because, for example, C<\d> and C<\w> are sets
of characters, it is incorrect to think of C<[^\d\w]> as C<[\D\W]>; in
fact C<[^\d\w]> is the same as C<[^\w]>, which is the same as
C<[\W]>. Think DeMorgan's laws.
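The difference between C<[^\d\w]> and C<[\D\W]> is easy to check on a handful of characters. A sketch, with an arbitrary set of sample characters:

```perl
use strict;
use warnings;

my @chars = ("a", "7", "_", " ", "%");

# Characters accepted by each class.
my @neg_both = grep { /[^\d\w]/ } @chars;  # neither a digit nor a word char
my @non_word = grep { /\W/ }      @chars;  # the same set as above
my @either   = grep { /[\D\W]/ }  @chars;  # a non-digit OR a non-word char

print join("|", @neg_both), "\n";
print join("|", @either), "\n";
```

The first two classes accept only the space and the percent sign, whereas C<[\D\W]> rejects nothing but the digit, since every other character is at least a non-digit.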
In actuality, the period and C<\d\s\w\D\S\W> abbreviations are
themselves types of character classes, so the ones surrounded by
brackets are just one type of character class. When we need to make a
distinction, we refer to them as "bracketed character classes."
An anchor useful in basic regexps is the I<word anchor>
C<\b>. This matches a boundary between a word character and a non-word
character C<\w\W> or C<\W\w>:
$x = "Housecat catenates house and cat";
$x =~ /cat/; # matches cat in 'Housecat'
$x =~ /\bcat/; # matches cat in 'catenates'
$x =~ /cat\b/; # matches cat in 'Housecat'
$x =~ /\bcat\b/; # matches 'cat' at end of string
Note in the last example, the end of the string is considered a word
boundary.
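To confirm where each of these regexps matches, the start offset of the match can be inspected. The sketch below assumes the special array C<@->, whose first element C<$-[0]> holds the offset at which the most recent successful match began; the hash keys are arbitrary labels:

```perl
use strict;
use warnings;

my $x = "Housecat catenates house and cat";

# Record the start offset of the first match of each regexp.
my %offset;
$offset{plain} = $-[0] if $x =~ /cat/;      # inside 'Housecat'
$offset{left}  = $-[0] if $x =~ /\bcat/;    # start of 'catenates'
$offset{right} = $-[0] if $x =~ /cat\b/;    # end of 'Housecat'
$offset{both}  = $-[0] if $x =~ /\bcat\b/;  # the final word 'cat'

print "$_: $offset{$_}\n" for sort keys %offset;
```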
For natural language processing (so that, for example, apostrophes are
included in words), use instead C<\b{wb}>
"don't" =~ / .+? \b{wb} /x; # matches the whole string
You might wonder why C<'.'> matches everything but C<"\n"> - why not
every character? The reason is that often one is matching against
lines and would like to ignore the newline characters. For instance,
while the string C<"\n"> represents one line, we would like to think
of it as empty. Then
"" =~ /^$/; # matches
"\n" =~ /^$/; # matches, $ anchors before "\n"
"" =~ /./; # doesn't match; it needs a char
"" =~ /^.$/; # doesn't match; it needs a char
"\n" =~ /^.$/; # doesn't match; it needs a char other than "\n"
"a" =~ /^.$/; # matches
"a\n" =~ /^.$/; # matches, $ anchors before "\n"
This behavior is convenient, because we usually want to ignore
newlines when we count and match characters in a line. Sometimes,
however, we want to keep track of newlines. We might even want C<'^'>
and C<'$'> to anchor at the beginning and end of lines within the
string, rather than just the beginning and end of the string. Perl
allows us to choose between ignoring and paying attention to newlines
by using the C</s> and C</m> modifiers. C</s> and C</m> stand for
single line and multi-line and they determine whether a string is to
be treated as one continuous string, or as a set of lines. The two
modifiers affect two aspects of how the regexp is interpreted: 1) how
the C<'.'> character class is defined, and 2) where the anchors C<'^'>
and C<'$'> are able to match. Here are the four possible combinations:
=over 4
=item *
no modifiers: Default behavior. C<'.'> matches any character
except C<"\n">. C<'^'> matches only at the beginning of the string and
C<'$'> matches only at the end or before a newline at the end.
=item *
s modifier (C</s>): Treat string as a single long line. C<'.'> matches
any character, even C<"\n">. C<'^'> matches only at the beginning of
the string and C<'$'> matches only at the end or before a newline at the
end.
=item *
m modifier (C</m>): Treat string as a set of multiple lines. C<'.'>
matches any character except C<"\n">. C<'^'> and C<'$'> are able to match
at the start or end of I<any> line within the string.
=item *
both s and m modifiers (C</sm>): Treat string as a single long line, but
detect multiple lines. C<'.'> matches any character, even
C<"\n">. C<'^'> and C<'$'>, however, are able to match at the start or end
of I<any> line within the string.
=back
Here are examples of C</s> and C</m> in action:
$x = "There once was a girl\nWho programmed in Perl\n";
$x =~ /^Who/; # doesn't match, "Who" not at start of string
$x =~ /^Who/s; # doesn't match, "Who" not at start of string
$x =~ /^Who/m; # matches, "Who" at start of second line
$x =~ /^Who/sm; # matches, "Who" at start of second line
$x =~ /girl.Who/; # doesn't match, "." doesn't match "\n"
$x =~ /girl.Who/s; # matches, "." matches "\n"
$x =~ /girl.Who/m; # doesn't match, "." doesn't match "\n"
$x =~ /girl.Who/sm; # matches, "." matches "\n"
Most of the time, the default behavior is what is wanted, but C</s> and
C</m> are occasionally very useful. If C</m> is being used, the start
of the string can still be matched with C<\A> and the end of the string
can still be matched with the anchors C<\Z> (matches both the end and
the newline before, like C<'$'>), and C<\z> (matches only the end):
$x =~ /^Who/m; # matches, "Who" at start of second line
$x =~ /\AWho/m; # doesn't match, "Who" is not at start of string
$x =~ /girl$/m; # matches, "girl" at end of first line
$x =~ /girl\Z/m; # doesn't match, "girl" is not at end of string
$x =~ /Perl\Z/m; # matches, "Perl" is at newline before end
$x =~ /Perl\z/m; # doesn't match, "Perl" is not at end of string
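A typical use of C</m> is to pick something off the start of every line. The sketch below assumes two features that are covered later in this tutorial, capturing parentheses and the C</g> modifier, which together return every captured piece in list context:

```perl
use strict;
use warnings;

my $x = "There once was a girl\nWho programmed in Perl\n";

# With /m, '^' matches at the start of every line; /g collects them all.
my @first_words = $x =~ /^(\w+)/mg;

# \Z tolerates one trailing newline, \z does not.
my $ends_Z = $x =~ /Perl\Z/ ? 1 : 0;
my $ends_z = $x =~ /Perl\z/ ? 1 : 0;

print "first words: @first_words\n";
```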
We now know how to create choices among classes of characters in a
regexp. What about choices among words or character strings? Such
choices are described in the next section.
=head2 Matching this or that
Sometimes we would like our regexp to be able to match different
possible words or character strings. This is accomplished by using
the I<alternation> metacharacter C<'|'>. To match C<dog> or C<cat>, we
form the regexp C<dog|cat>. As before, Perl will try to match the
regexp at the earliest possible point in the string. At each
character position, Perl will first try to match the first
alternative, C<dog>. If C<dog> doesn't match, Perl will then try the
next alternative, C<cat>. If C<cat> doesn't match either, then the
match fails and Perl moves to the next position in the string. Some
examples:
"cats and dogs" =~ /cat|dog|bird/; # matches "cat"
"cats and dogs" =~ /dog|cat|bird/; # matches "cat"
Even though C<dog> is the first alternative in the second regexp,
C<cat> is able to match earlier in the string.
"cats" =~ /c|ca|cat|cats/; # matches "c"
"cats" =~ /cats|cat|ca|c/; # matches "cats"
Here, all the alternatives match at the first string position, so the
first alternative is the one that matches. If some of the
alternatives are truncations of the others, put the longest ones first
to give them a chance to match.
"cab" =~ /a|b|c/ # matches "c"
# /a|b|c/ == /[abc]/
The last example points out that character classes are like
alternations of characters. At a given character position, the first
alternative that allows the regexp match to succeed will be the one
that matches.
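Which alternative actually matched can be checked by looking at the matched text. The sketch below assumes the special variable C<$&>, which holds the text of the most recent successful match; the helper name is hypothetical:

```perl
use strict;
use warnings;

# Return the text matched by $regexp in $string, or undef on failure.
sub matched_text {
    my ($string, $regexp) = @_;
    return $string =~ /$regexp/ ? $& : undef;
}

my $short_first = matched_text("cats", "c|ca|cat|cats");
my $long_first  = matched_text("cats", "cats|cat|ca|c");
my $earliest    = matched_text("cats and dogs", "dog|cat|bird");

print "$short_first, $long_first, $earliest\n";
```

The first two calls differ only in the order of the alternatives, which is enough to change what is matched.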
=head2 Grouping things and hierarchical matching
Alternation allows a regexp to choose among alternatives, but by
itself it is unsatisfying. The reason is that each alternative is a whole
regexp, but sometimes we want alternatives for just part of a
regexp. For instance, suppose we want to search for housecats or
housekeepers. The regexp C<housecat|housekeeper> fits the bill, but is
inefficient because we had to type C<house> twice. It would be nice to
have parts of the regexp be constant, like C<house>, and some
parts have alternatives, like C<cat|keeper>.
The I<grouping> metacharacters C<()> solve this problem. Grouping
allows parts of a regexp to be treated as a single unit. Parts of a
regexp are grouped by enclosing them in parentheses. Thus we could solve
the C<housecat|housekeeper> problem by forming the regexp as
C<house(cat|keeper)>. The regexp C<house(cat|keeper)> means match
C<house> followed by either C<cat> or C<keeper>. Some more examples
are
/(a|b)b/; # matches 'ab' or 'bb'
/(ac|b)b/; # matches 'acb' or 'bb'
/(^a|b)c/; # matches 'ac' at start of string or 'bc' anywhere
/(a|[bc])d/; # matches 'ad', 'bd', or 'cd'
/house(cat|)/; # matches either 'housecat' or 'house'
/house(cat(s|)|)/; # matches either 'housecats' or 'housecat' or
# 'house'. Note groups can be nested.
/(19|20|)\d\d/; # match years 19xx, 20xx, or the Y2K problem, xx
"20" =~ /(19|20|)\d\d/; # matches the null alternative '()\d\d',
# because '20\d\d' can't match
Alternations behave the same way in groups as out of them: at a given
string position, the leftmost alternative that allows the regexp to
match is taken. So in the last example at the first string position,
C<"20"> matches the second alternative, but there is nothing left over
to match the next two digits C<\d\d>. So Perl moves on to the next
alternative, which is the null alternative and that works, since
C<"20"> is two digits.
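The null alternative can be observed directly by looking at what the group matched. This sketch assumes the capture variable C<$1>, which holds the text matched by the first group and is described more fully later in this tutorial; the helper name is made up:

```perl
use strict;
use warnings;

# Return the century prefix matched by the first group (possibly the
# empty string), or undef if the regexp fails altogether.
sub century_prefix {
    my ($string) = @_;
    return $string =~ /(19|20|)\d\d/ ? $1 : undef;
}

my $full  = century_prefix("1999");  # group takes the '19' alternative
my $short = century_prefix("20");    # group falls back to the null one
my $none  = century_prefix("x");     # no two digits anywhere
```

For C<"20"> the group ends up matching the empty string, which is a successful match and therefore different from the undefined value returned on failure.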
The process of trying one alternative, seeing if it matches, and
moving on to the next alternative, while going back in the string
from where the previous alternative was tried, if it doesn't, is called
I<backtracking>. The term "backtracking" comes from the idea that
matching a regexp is like a walk in the woods. Successfully matching
a regexp is like arriving at a destination. There are many possible
trailheads, one for each string position, and each one is tried in
order, left to right. From each trailhead there may be many paths,
some of which get you there, and some which are dead ends. When you
walk along a trail and hit a dead end, you have to backtrack along the
trail to an earlier point to try another trail. If you hit your
destination, you stop immediately and forget about trying all the
other trails. You are persistent, and only if you have tried all the
trails from all the trailheads and not arrived at your destination, do
you declare failure. To be concrete, here is a step-by-step analysis
of what Perl does when it tries to match the regexp
"abcde" =~ /(abd|abc)(df|d|de)/;
=over 4
=item Z<>0. Start with the first letter in the string C<'a'>.
E<nbsp>
=item Z<>1. Try the first alternative in the first group C<'abd'>.
E<nbsp>
=item Z<>2. Match C<'a'> followed by C<'b'>. So far so good.
E<nbsp>
=item Z<>3. C<'d'> in the regexp doesn't match C<'c'> in the string - a
dead end. So backtrack two characters and pick the second alternative
in the first group C<'abc'>.
E<nbsp>
=item Z<>4. Match C<'a'> followed by C<'b'> followed by C<'c'>. We are on a roll
and have satisfied the first group. Set C<$1> to C<'abc'>.
E<nbsp>
=item Z<>5. Move on to the second group and pick the first alternative C<'df'>.
E<nbsp>
=item Z<>6. Match the C<'d'>.
E<nbsp>
=item Z<>7. C<'f'> in the regexp doesn't match C<'e'> in the string, so a dead
end. Backtrack one character and pick the second alternative in the
second group C<'d'>.
E<nbsp>
=item Z<>8. C<'d'> matches. The second grouping is satisfied, so set
C<$2> to C<'d'>.
E<nbsp>
=item Z<>9. We are at the end of the regexp, so we are done! We have
matched C<'abcd'> out of the string C<"abcde">.
=back
There are a couple of things to note about this analysis. First, the
third alternative in the second group C<'de'> also allows a match, but we
stopped before we got to it - at a given character position, leftmost
wins. Second, we were able to get a match at the first character
position of the string C<'a'>. If there were no matches at the first
position, Perl would move to the second character position C<'b'> and
attempt the match all over again. Only when all possible paths at all
possible character positions have been exhausted does Perl give
up and declare S<C<$string =~ /(abd|abc)(df|d|de)/;>> to be false.
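The walk-through above can be checked directly. The following is a
minimal sketch; the variable names C<$first>, C<$second> and C<$whole>
are made up for illustration. Because the regexp consists of exactly
two groups, joining them gives the whole matched text.

```perl
use strict;
use warnings;

my $string = "abcde";
my ($first, $second, $whole);

if ($string =~ /(abd|abc)(df|d|de)/) {
    ($first, $second) = ($1, $2);
    $whole = $first . $second;       # the regexp is just the two groups
    print "group 1: '$first'\n";     # 'abc': 'abd' was a dead end
    print "group 2: '$second'\n";    # 'd': tried before 'de', so it wins
    print "matched: '$whole'\n";     # 'abcd'
}
```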
Even with all this work, regexp matching happens remarkably fast. To
speed things up, Perl compiles the regexp into a compact sequence of
opcodes that can often fit inside a processor cache. When the code is
executed, these opcodes can then run at full throttle and search very
quickly.
=head2 Extracting matches
The grouping metacharacters C<()> also serve another completely
different function: they allow the extraction of the parts of a string
that matched. This is very useful to find out what matched and for
text processing in general. For each grouping, the part that matched
inside goes into the special variables C<$1>, C<$2>, I<etc>. They can be
used just as ordinary variables:
# extract hours, minutes, seconds
if ($time =~ /(\d\d):(\d\d):(\d\d)/) { # match hh:mm:ss format
$hours = $1;
$minutes = $2;
$seconds = $3;
}
Now, we know that in scalar context,
S<C<$time =~ /(\d\d):(\d\d):(\d\d)/>> returns a true or false
value. In list context, however, it returns the list of matched values
C<($1,$2,$3)>. So we could write the code more compactly as
# extract hours, minutes, seconds
($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
If the groupings in a regexp are nested, C<$1> gets the group with the
leftmost opening parenthesis, C<$2> the next opening parenthesis,
I<etc>. Here is a regexp with nested groups:
    /(ab(cd|ef)((gi)|j))/;
     1  2      34
If this regexp matches, C<$1> contains a string starting with
C<'ab'>, C<$2> is either set to C<'cd'> or C<'ef'>, C<$3> equals either
C<'gi'> or C<'j'>, and C<$4> is either set to C<'gi'>, just like C<$3>,
or it remains undefined.
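As a small illustration of the numbering, the sketch below runs the
nested regexp against two made-up strings and records which groups were
set. The array C<@rows> is only a helper for this example.

```perl
use strict;
use warnings;

my @rows;
for my $str ("abcdgi", "abefj") {
    if ($str =~ /(ab(cd|ef)((gi)|j))/) {
        push @rows, [$1, $2, $3, $4];
        my $four = defined $4 ? "'$4'" : 'undef';
        print "$str: \$1='$1' \$2='$2' \$3='$3' \$4=$four\n";
    }
}
```

For the second string the C<'j'> alternative is taken, so group 4 never
takes part in the match and C<$4> stays undefined.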
For convenience, Perl sets C<$+> to the string held by the highest numbered
C<$1>, C<$2>,... that got assigned (and, somewhat related, C<$^N> to the
value of the C<$1>, C<$2>,... most-recently assigned; I<i.e.> the C<$1>,
C<$2>,... associated with the rightmost closing parenthesis used in the
match).
=head2 Backreferences
Closely associated with the matching variables C<$1>, C<$2>, ... are
the I<backreferences> C<\g1>, C<\g2>,... Backreferences are simply
matching variables that can be used I<inside> a regexp. This is a
really nice feature; what matches later in a regexp is made to depend on
what matched earlier in the regexp. Suppose we wanted to look
for doubled words in a text, like "the the". The following regexp finds
all 3-letter doubles with a space in between:
/\b(\w\w\w)\s\g1\b/;
The grouping assigns a value to C<\g1>, so that the same 3-letter sequence
is used for both parts.
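A minimal sketch of this regexp at work, using three made-up test
strings; the hash C<%doubled> is just a place to keep the results:

```perl
use strict;
use warnings;

my %doubled;
for my $text ("the the cat", "cat and dog", "and and then") {
    if ($text =~ /\b(\w\w\w)\s\g1\b/) {
        $doubled{$text} = $1;
        print "'$text' contains a doubled '$1'\n";
    }
}
```

The middle string has no doubled word, so it produces no output.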
A similar task is to find words consisting of two identical parts:
% simple_grep '^(\w\w\w\w|\w\w\w|\w\w|\w)\g1$' /usr/dict/words
beriberi
booboo
coco
mama
murmur
papa
The regexp has a single grouping which considers 4-letter
combinations, then 3-letter combinations, I<etc>., and uses C<\g1> to look for
a repeat. Although C<$1> and C<\g1> represent the same thing, care should be
taken to use matched variables C<$1>, C<$2>,... only I<outside> a regexp
and backreferences C<\g1>, C<\g2>,... only I<inside> a regexp; not doing
so may lead to surprising and unsatisfactory results.
=head2 Relative backreferences
Counting the opening parentheses to get the correct number for a
backreference is error-prone as soon as there is more than one
capturing group. A more convenient technique became available
with Perl 5.10: relative backreferences. To refer to the immediately
preceding capture group one now may write C<\g{-1}>, the next but
last is available via C<\g{-2}>, and so on.
Another good reason in addition to readability and maintainability
for using relative backreferences is illustrated by the following example,
where a simple pattern for matching peculiar strings is used:
$a99a = '([a-z])(\d)\g2\g1'; # matches a11a, g22g, x33x, etc.
Now that we have this pattern stored as a handy string, we might feel
tempted to use it as a part of some other pattern:
$line = "code=e99e";
if ($line =~ /^(\w+)=$a99a$/){ # unexpected behavior!
print "$1 is valid\n";
} else {
print "bad line: '$line'\n";
}
But this doesn't match, at least not the way one might expect. Only
after inserting the interpolated C<$a99a> and looking at the resulting
full text of the regexp is it obvious that the backreferences have
backfired. The subexpression C<(\w+)> has snatched number 1 and
demoted the groups in C<$a99a> by one rank. This can be avoided by
using relative backreferences:
$a99a = '([a-z])(\d)\g{-1}\g{-2}'; # safe for being interpolated
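The difference can be seen side by side in the following sketch. The
names C<$absolute> and C<$relative> are chosen for this example only;
they hold the two versions of the pattern discussed above.

```perl
use strict;
use warnings;

my $line     = "code=e99e";
my $absolute = '([a-z])(\d)\g2\g1';          # group numbers shift when embedded
my $relative = '([a-z])(\d)\g{-1}\g{-2}';    # counts back from where it stands

my $abs_ok = $line =~ /^(\w+)=$absolute$/ ? 1 : 0;
my $rel_ok = $line =~ /^(\w+)=$relative$/ ? 1 : 0;

print "absolute: ", ($abs_ok ? "match" : "no match"), "\n";   # no match
print "relative: ", ($rel_ok ? "match" : "no match"), "\n";   # match
```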
=head2 Named backreferences
Perl 5.10 also introduced named capture groups and named backreferences.
To attach a name to a capturing group, you write either
C<< (?<name>...) >> or C<< (?'name'...) >>. The backreference may
then be written as C<\g{name}>. It is permissible to attach the
same name to more than one group, but then only the leftmost one of the
eponymous set can be referenced. Outside of the pattern a named
capture group is accessible through the C<%+> hash.
Assuming that we have to match calendar dates which may be given in one
of the three formats yyyy-mm-dd, mm/dd/yyyy or dd.mm.yyyy, we can write
three suitable patterns where we use C<'d'>, C<'m'> and C<'y'> respectively as the
names of the groups capturing the pertaining components of a date. The
matching operation combines the three patterns as alternatives:
$fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
$fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
$fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';
for my $d (qw( 2006-10-21 15.01.2007 10/31/2005 )) {
if ( $d =~ m{$fmt1|$fmt2|$fmt3} ){
print "day=$+{d} month=$+{m} year=$+{y}\n";
}
}
If any of the alternatives matches, the hash C<%+> is bound to contain the
three key-value pairs.
=head2 Alternative capture group numbering
Yet another capturing group numbering technique (also as from Perl 5.10)
deals with the problem of referring to groups within a set of alternatives.
Consider a pattern for matching a time of the day, civil or military style:
if ( $time =~ /(\d\d|\d):(\d\d)|(\d\d)(\d\d)/ ){
# process hour and minute
}
Processing the results requires an additional if statement to determine
whether C<$1> and C<$2> or C<$3> and C<$4> contain the goodies. It would
be easier if we could use group numbers 1 and 2 in the second alternative as
well, and this is exactly what the parenthesized construct C<(?|...)>,
set around an alternative achieves. Here is an extended version of the
previous pattern:
if($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z][A-Z][A-Z])/){
print "hour=$1 minute=$2 zone=$3\n";
}
Within the alternative numbering group, group numbers start at the same
position for each alternative. After the group, numbering continues
with one higher than the maximum reached across all the alternatives.
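The following sketch applies the extended pattern to a few made-up
time strings. Whichever alternative matches, the hour ends up in
C<$1>, the minute in C<$2> and the zone in C<$3>. The array C<@parsed>
is only used to collect the output.

```perl
use strict;
use warnings;

my @parsed;
for my $time ("10:30 UTC", "1030 UTC", "7:05 CET") {
    if ($time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z][A-Z][A-Z])/) {
        push @parsed, "hour=$1 minute=$2 zone=$3";
        print "$parsed[-1]\n";
    }
}
```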
=head2 Position information
In addition to what was matched, Perl also provides the
positions of what was matched as contents of the C<@-> and C<@+>
arrays. C<$-[0]> is the position of the start of the entire match and
C<$+[0]> is the position of the end. Similarly, C<$-[n]> is the
position of the start of the C<$n> match and C<$+[n]> is the position
of the end. If C<$n> is undefined, so are C<$-[n]> and C<$+[n]>. Then
this code
$x = "Mmm...donut, thought Homer";
$x =~ /^(Mmm|Yech)\.\.\.(donut|peas)/; # matches
foreach $exp (1..$#-) {
print "Match $exp: '${$exp}' at position ($-[$exp],$+[$exp])\n";
}
prints
Match 1: 'Mmm' at position (0,3)
Match 2: 'donut' at position (6,11)
Even if there are no groupings in a regexp, it is still possible to
find out what exactly matched in a string. Perl sets three special
variables, if your program uses them: C<$`> holds the part of the string
before the match, C<$&> holds the part of the string that matched, and
C<$'> holds the part of the string after the match. An example:
$x = "the cat caught the mouse";
$x =~ /cat/; # $` = 'the ', $& = 'cat', $' = ' caught the mouse'
$x =~ /the/; # $` = '', $& = 'the', $' = ' cat caught the mouse'
In the second match, C<$`> equals C<''> because the regexp matched at the
first character position in the string and stopped; it never saw the
second "the".
If your code is to run on Perl versions earlier than
5.20, it is worthwhile to note that using C<$`> and C<$'>
slows down regexp matching quite a bit, while C<$&> slows it down to a
lesser extent, because if they are used in one regexp in a program,
they are generated for I<all> regexps in the program. So if raw
performance is a goal of your application, they should be avoided.
If you need to extract the corresponding substrings, use C<@-> and
C<@+> instead:
$` is the same as substr( $x, 0, $-[0] )
$& is the same as substr( $x, $-[0], $+[0]-$-[0] )
$' is the same as substr( $x, $+[0] )
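As a short sketch, the three C<substr> expressions can be applied to the
earlier example string. The variable names C<$pre>, C<$match> and
C<$post> are chosen for this illustration.

```perl
use strict;
use warnings;

my $x = "the cat caught the mouse";
my ($pre, $match, $post);

if ($x =~ /cat/) {
    $pre   = substr($x, 0, $-[0]);                # text before the match
    $match = substr($x, $-[0], $+[0] - $-[0]);    # the matched text
    $post  = substr($x, $+[0]);                   # text after the match
    print "before='$pre' match='$match' after='$post'\n";
}
```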
As of Perl 5.10, the C<${^PREMATCH}>, C<${^MATCH}> and C<${^POSTMATCH}>
variables may be used. These are only set if the C</p> modifier is
present. Consequently they do not penalize the rest of the program. In
Perl 5.20, C<${^PREMATCH}>, C<${^MATCH}> and C<${^POSTMATCH}> are available
whether the C</p> has been used or not (the modifier is ignored), and
C<$`>, C<$'> and C<$&> do not cause any speed difference.
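A minimal sketch of the C</p> variables, reusing the example string from
above; the C</p> modifier is written out so that the code also behaves
as described on Perl 5.10 through 5.18.

```perl
use strict;
use warnings;

my $x = "the cat caught the mouse";
my ($pre, $match, $post);

if ($x =~ /cat/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "before='$pre' match='$match' after='$post'\n";
}
```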
=head2 Non-capturing groupings
A group that is required to bundle a set of alternatives may or may not be
useful as a capturing group. If it isn't, it just creates a superfluous
addition to the set of available capture group values, inside as well as
outside the regexp. Non-capturing groupings, denoted by C<(?:regexp)>,
still allow the regexp to be treated as a single unit, but don't establish
a capturing group at the same time. Both capturing and non-capturing
groupings are allowed to co-exist in the same regexp. Because there is
no extraction, non-capturing groupings are faster than capturing
groupings. Non-capturing groupings are also handy for choosing exactly
which parts of a regexp are to be extracted to matching variables:
# match a number, $1-$4 are set, but we only want $1
/([+-]?\ *(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)/;
# match a number faster, only $1 is set
/([+-]?\ *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)/;
# match a number, get $1 = whole number, $2 = exponent
/([+-]?\ *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE]([+-]?\d+))?)/;
Non-capturing groupings are also useful for removing nuisance
elements gathered from a split operation where parentheses are
required for some reason:
$x = '12aba34ba5';
@num = split /(a|b)+/, $x; # @num = ('12','a','34','a','5')
@num = split /(?:a|b)+/, $x; # @num = ('12','34','5')
In Perl 5.22 and later, all groups within a regexp can be set to
non-capturing by using the new C</n> flag:
"hello" =~ /(hi|hello)/n; # $1 is not set!
See L<perlre/"n"> for more information.
=head2 Matching repetitions
The examples in the previous section display an annoying weakness. We
were only matching 3-letter words, or chunks of words of 4 letters or
less. We'd like to be able to match words or, more generally, strings
of any length, without writing out tedious alternatives like
C<\w\w\w\w|\w\w\w|\w\w|\w>.
This is exactly the problem the I<quantifier> metacharacters C<'?'>,
C<'*'>, C<'+'>, and C<{}> were created for. They allow us to delimit the
number of repeats for a portion of a regexp we consider to be a
match. Quantifiers are put immediately after the character, character
class, or grouping that we want to specify. They have the following
meanings:
=over 4
=item *
C<a?> means: match C<'a'> 1 or 0 times
=item *
C<a*> means: match C<'a'> 0 or more times, I<i.e.>, any number of times
=item *
C<a+> means: match C<'a'> 1 or more times, I<i.e.>, at least once
=item *
C<a{n,m}> means: match at least C<n> times, but not more than C<m>
times.
=item *
C<a{n,}> means: match at least C<n> times
=item *
C<a{n}> means: match exactly C<n> times
=back
Here are some examples:
/[a-z]+\s+\d*/; # match a lowercase word, at least one space, and
# any number of digits
/(\w+)\s+\g1/; # match doubled words of arbitrary length
/y(es)?/i; # matches 'y', 'Y', or a case-insensitive 'yes'
$year =~ /^\d{2,4}$/; # make sure year is at least 2 but not more
# than 4 digits
$year =~ /^\d{4}$|^\d{2}$/; # better match; throw out 3-digit dates
$year =~ /^\d{2}(\d{2})?$/; # same thing written differently.
# However, this captures the last two
# digits in $1 and the other does not.
% simple_grep '^(\w+)\g1$' /usr/dict/words # isn't this easier?
beriberi
booboo
coco
mama
murmur
papa
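To see the counted quantifiers accept and reject input, here is a small
sketch that runs the stricter year regexp from the examples over a few
made-up values. The hash C<%verdict> only collects the results.

```perl
use strict;
use warnings;

my %verdict;
for my $year (qw(07 1999 199 19999)) {
    $verdict{$year} = $year =~ /^\d{4}$|^\d{2}$/ ? "ok" : "rejected";
    print "$year: $verdict{$year}\n";
}
```

Only the 2-digit and 4-digit values are accepted.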
For all of these quantifiers, Perl will try to match as much of the
string as possible, while still allowing the regexp to succeed. Thus
with C</a?.../>, Perl will first try to match the regexp with the C<'a'>
present; if that fails, Perl will try to match the regexp without the
C<'a'> present. For the quantifier C<'*'>, we get the following:
$x = "the cat in the hat";
$x =~ /^(.*)(cat)(.*)$/; # matches,
# $1 = 'the '
# $2 = 'cat'
# $3 = ' in the hat'
This is what we might expect: the match finds the only C<cat> in the
string and locks onto it. Consider, however, this regexp:
$x =~ /^(.*)(at)(.*)$/; # matches,
# $1 = 'the cat in the h'
# $2 = 'at'
# $3 = '' (0 characters match)
One might initially guess that Perl would find the C<at> in C<cat> and
stop there, but that wouldn't give the longest possible string to the
first quantifier C<.*>. Instead, the first quantifier C<.*> grabs as
much of the string as possible while still having the regexp match. In
this example, that means matching the C<at> element against the final C<at>
in the string. The other important principle illustrated here is that,
when there are two or more elements in a regexp, the I<leftmost>
quantifier, if there is one, gets to grab as much of the string as
possible, leaving the rest of the regexp to fight over scraps. Thus in
our example, the first quantifier C<.*> grabs most of the string, while
the second quantifier C<.*> gets the empty string. Quantifiers that
grab as much of the string as possible are called I<maximal match> or
I<greedy> quantifiers.
When a regexp can match a string in several different ways, we can use
the principles above to predict which way the regexp will match:
=over 4
=item *
Principle 0: Taken as a whole, any regexp will be matched at the
earliest possible position in the string.
=item *
Principle 1: In an alternation C<a|b|c...>, the leftmost alternative
that allows a match for the whole regexp will be the one used.
=item *
Principle 2: The maximal matching quantifiers C<'?'>, C<'*'>, C<'+'> and
C<{n,m}> will in general match as much of the string as possible while
still allowing the whole regexp to match.
=item *
Principle 3: If there are two or more elements in a regexp, the
leftmost greedy quantifier, if any, will match as much of the string
as possible while still allowing the whole regexp to match. The next
leftmost greedy quantifier, if any, will try to match as much of the
string remaining available to it as possible, while still allowing the
whole regexp to match. And so on, until all the regexp elements are
satisfied.
=back
As we have seen above, Principle 0 overrides the others. The regexp
will be matched as early as possible, with the other principles
determining how the regexp matches at that earliest character
position.
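The first three principles can each be seen in isolation in the sketch
below. The test strings and variable names are made up for this
illustration.

```perl
use strict;
use warnings;

# Principle 0: the earliest starting position wins, whichever
# alternative happens to match there.
my ($earliest) = "the cat" =~ /(cat|the)/;

# Principle 1: at a given position, the leftmost alternative that
# lets the regexp succeed is used.
my ($leftmost) = "cats" =~ /(cat|cats)/;

# Principle 2: a greedy quantifier takes as much as it is allowed to.
my ($greedy) = "aaaa" =~ /(a{1,3})/;

print "earliest='$earliest' leftmost='$leftmost' greedy='$greedy'\n";
```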
Here is an example of these principles in action:
$x = "The programming republic of Perl";
$x =~ /^(.+)(e|r)(.*)$/; # matches,
# $1 = 'The programming republic of Pe'
# $2 = 'r'
# $3 = 'l'
This regexp matches at the earliest string position, C<'T'>. One
might think that C<'e'>, being leftmost in the alternation, would be
matched, but C<'r'> produces the longest string in the first quantifier.
$x =~ /(m{1,2})(.*)$/; # matches,
# $1 = 'mm'
# $2 = 'ing republic of Perl'
Here, the earliest possible match is at the first C<'m'> in
C<programming>. C<m{1,2}> is the first quantifier, so it gets to match
a maximal C<mm>.
$x =~ /.*(m{1,2})(.*)$/; # matches,
# $1 = 'm'
# $2 = 'ing republic of Perl'
Here, the regexp matches at the start of the string. The first
quantifier C<.*> grabs as much as possible, leaving just a single
C<'m'> for the second quantifier C<m{1,2}>.
$x =~ /(.?)(m{1,2})(.*)$/; # matches,
# $1 = 'a'
# $2 = 'mm'
# $3 = 'ing republic of Perl'
Here, C<.?> eats its maximal one character at the earliest possible
position in the string, C<'a'> in C<programming>, leaving C<m{1,2}>
the opportunity to match both C<'m'>'s. Finally,
"aXXXb" =~ /(X*)/; # matches with $1 = ''
because it can match zero copies of C<'X'> at the beginning of the
string. If you definitely want to match at least one C<'X'>, use
C<X+>, not C<X*>.
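A short sketch comparing the two quantifiers on the same string; the
variable names are only for illustration:

```perl
use strict;
use warnings;

my ($star) = "aXXXb" =~ /(X*)/;   # matches zero X's at position 0
my ($plus) = "aXXXb" =~ /(X+)/;   # has to move on to the first real 'X'

print "X* captured '$star'\n";
print "X+ captured '$plus'\n";
```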
Sometimes greed is not good. At times, we would like quantifiers to
match a I<minimal> piece of string, rather than a maximal piece. For
this purpose, Larry Wall created the I<minimal match> or
I<non-greedy> quantifiers C<??>, C<*?>, C<+?>, and C<{}?>. These are
the usual quantifiers with a C<'?'> appended to them. They have the
following meanings:
=over 4
=item *
C<a??> means: match C<'a'> 0 or 1 times. Try 0 first, then 1.
=item *
C<a*?> means: match C<'a'> 0 or more times, I<i.e.>, any number of times,
but as few times as possible
=item *
C<a+?> means: match C<'a'> 1 or more times, I<i.e.>, at least once, but
as few times as possible
=item *
C<a{n,m}?> means: match at least C<n> times, not more than C<m>
times, as few times as possible
=item *
C<a{n,}?> means: match at least C<n> times, but as few times as
possible
=item *
C<a{n}?> means: match exactly C<n> times. Because we match exactly
C<n> times, C<a{n}?> is equivalent to C<a{n}> and is just there for
notational consistency.
=back
Let's look at the example above, but with minimal quantifiers:
$x = "The programming republic of Perl";
$x =~ /^(.+?)(e|r)(.*)$/; # matches,
# $1 = 'Th'
# $2 = 'e'
# $3 = ' programming republic of Perl'
The minimal string that will allow both the start of the string C<'^'>
and the alternation to match is C<Th>, with the alternation C<e|r>
matching C<'e'>. The second quantifier C<.*> is free to gobble up the
rest of the string.
$x =~ /(m{1,2}?)(.*?)$/; # matches,
# $1 = 'm'
# $2 = 'ming republic of Perl'
The first string position that this regexp can match is at the first
C<'m'> in C<programming>. At this position, the minimal C<m{1,2}?>
matches just one C<'m'>. Although the second quantifier C<.*?> would
prefer to match no characters, it is constrained by the end-of-string
anchor C<'$'> to match the rest of the string.
$x =~ /(.*?)(m{1,2}?)(.*)$/; # matches,
# $1 = 'The progra'
# $2 = 'm'
# $3 = 'ming republic of Perl'
In this regexp, you might expect the first minimal quantifier C<.*?>
to match the empty string, because it is not constrained by a C<'^'>
anchor to match the beginning of the word. Principle 0 applies here,
however. Because it is possible for the whole regexp to match at the
start of the string, it I<will> match at the start of the string. Thus
the first quantifier has to match everything up to the first C<'m'>. The
second minimal quantifier matches just one C<'m'> and the third
quantifier matches the rest of the string.
$x =~ /(.??)(m{1,2})(.*)$/; # matches,
# $1 = 'a'
# $2 = 'mm'
# $3 = 'ing republic of Perl'
Just as in the previous regexp, the first quantifier C<.??> can match
earliest at position C<'a'>, so it does. The second quantifier is
greedy, so it matches C<mm>, and the third matches the rest of the
string.
We can modify principle 3 above to take into account non-greedy
quantifiers:
=over 4
=item *
Principle 3: If there are two or more elements in a regexp, the
leftmost greedy (non-greedy) quantifier, if any, will match as much
(little) of the string as possible while still allowing the whole
regexp to match. The next leftmost greedy (non-greedy) quantifier, if
any, will try to match as much (little) of the string remaining
available to it as possible, while still allowing the whole regexp to
match. And so on, until all the regexp elements are satisfied.
=back
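A common place where the difference matters is text between a pair of
delimiters. The following sketch uses a made-up string with two bold
tags and compares a greedy and a non-greedy quantifier.

```perl
use strict;
use warnings;

my $html = "<b>one</b> and <b>two</b>";

my ($greedy) = $html =~ /<b>(.*)<\/b>/;     # runs on to the last </b>
my ($lazy)   = $html =~ /<b>(.*?)<\/b>/;    # stops at the first </b>

print "greedy: '$greedy'\n";
print "lazy:   '$lazy'\n";
```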
Just like alternation, quantifiers are also susceptible to
backtracking. Here is a step-by-step analysis of the example
$x = "the cat in the hat";
$x =~ /^(.*)(at)(.*)$/; # matches,
# $1 = 'the cat in the h'
# $2 = 'at'
# $3 = '' (0 characters match)
=over 4
=item Z<>0. Start with the first letter in the string C<'t'>.
E<nbsp>
=item Z<>1. The first quantifier C<'.*'> starts out by matching the whole
string "C<the cat in the hat>".
E<nbsp>
=item Z<>2. C<'a'> in the regexp element C<'at'> doesn't match the end
of the string. Backtrack one character.
E<nbsp>
=item Z<>3. C<'a'> in the regexp element C<'at'> still doesn't match
the last letter of the string C<'t'>, so backtrack one more character.
E<nbsp>
=item Z<>4. Now we can match the C<'a'> and the C<'t'>.
E<nbsp>
=item Z<>5. Move on to the third element C<'.*'>. Since we are at the
end of the string and C<'.*'> can match 0 times, assign it the empty
string.
E<nbsp>
=item Z<>6. We are done!
=back
Most of the time, all this moving forward and backtracking happens
quickly and searching is fast. There are some pathological regexps,
however, whose execution time grows exponentially with the size of the
string. A typical structure that blows up in your face is of the form
/(a|b+)*/;
The problem is the nested indeterminate quantifiers. There are many
different ways of partitioning a string of length n between the C<'+'>
and C<'*'>: one repetition with C<b+> of length n, two repetitions with
the first C<b+> length k and the second with length n-k, m repetitions
whose bits add up to length n, I<etc>. In fact there are an exponential
number of ways to partition a string as a function of its length. A
regexp may get lucky and match early in the process, but if there is
no match, Perl will try I<every> possibility before giving up. So be
careful with nested C<'*'>'s, C<{n,m}>'s, and C<'+'>'s. The book
I<Mastering Regular Expressions> by Jeffrey Friedl gives a wonderful
discussion of this and other efficiency issues.
=head2 Possessive quantifiers
Backtracking during the relentless search for a match may be a waste
of time, particularly when the match is bound to fail. Consider
the simple pattern
/^\w+\s+\w+$/; # a word, spaces, a word
Whenever this is applied to a string which doesn't quite meet the
pattern's expectations such as S<C<"abc ">> or S<C<"abc def ">>,
the regexp engine will backtrack, approximately once for each character
in the string. But we know that there is no way around taking I<all>
of the initial word characters to match the first repetition, that I<all>
spaces must be eaten by the middle part, and the same goes for the second
word.
With the introduction of the I<possessive quantifiers> in Perl 5.10, we
have a way of instructing the regexp engine not to backtrack, with the
usual quantifiers with a C<'+'> appended to them. This makes them greedy as
well as stingy; once they succeed they won't give anything back to permit
another solution. They have the following meanings:
=over 4
=item *
C<a{n,m}+> means: match at least C<n> times, not more than C<m> times,
as many times as possible, and don't give anything up. C<a?+> is short
for C<a{0,1}+>
=item *
C<a{n,}+> means: match at least C<n> times, but as many times as possible,
and don't give anything up. C<a*+> is short for C<a{0,}+> and C<a++> is
short for C<a{1,}+>.
=item *
C<a{n}+> means: match exactly C<n> times. It is just there for
notational consistency.
=back
These possessive quantifiers represent a special case of a more general
concept, the I<independent subexpression>; see below.
As an example where a possessive quantifier is suitable we consider
matching a quoted string, as it appears in several programming languages.
The backslash is used as an escape character that indicates that the
next character is to be taken literally, as another character for the
string. Therefore, after the opening quote, we expect a (possibly
empty) sequence of alternatives: either some character except an
unescaped quote or backslash or an escaped character.
/"(?:[^"\\]++|\\.)*+"/;
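The sketch below tries this regexp on a made-up line of source code,
with an extra pair of parentheses added to capture the literal. It then
shows on a toy example that a possessive quantifier really does refuse
to give characters back. All variable names are chosen for this
illustration.

```perl
use strict;
use warnings;

# Single quotes keep the backslashes in the sample text as they are.
my $src = 'print "a \"quoted\" word", "\n";';
my ($literal) = $src =~ /("(?:[^"\\]++|\\.)*+")/;
print "first string literal: $literal\n";

# a+ backs off one 'a' so that the final 'a' can match;
# a++ keeps all three, so the final 'a' has nothing left.
my $plain      = "aaa" =~ /^a+a$/  ? 1 : 0;
my $possessive = "aaa" =~ /^a++a$/ ? 1 : 0;
print "a+a matches: $plain, a++a matches: $possessive\n";
```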
=head2 Building a regexp
At this point, we have all the basic regexp concepts covered, so let's
give a more involved example of a regular expression. We will build a
regexp that matches numbers.
The first task in building a regexp is to decide what we want to match
and what we want to exclude. In our case, we want to match both
integers and floating point numbers and we want to reject any string
that isn't a number.
The next task is to break the problem down into smaller problems that
are easily converted into a regexp.
The simplest case is integers. These consist of a sequence of digits,
with an optional sign in front. The digits we can represent with
C<\d+> and the sign can be matched with C<[+-]>. Thus the integer
regexp is
/[+-]?\d+/; # matches integers
A floating point number potentially has a sign, an integral part, a
decimal point, a fractional part, and an exponent. One or more of these
parts is optional, so we need to check out the different
possibilities. Floating point numbers which are in proper form include
123., 0.345, .34, -1e6, and 25.4E-72. As with integers, the sign out
front is completely optional and can be matched by C<[+-]?>. We can
see that if there is no exponent, floating point numbers must have a
decimal point, otherwise they are integers. We might be tempted to
model these with C<\d*\.\d*>, but this would also match just a single
decimal point, which is not a number. So the three cases of floating
point number without exponent are
/[+-]?\d+\./; # 1., 321., etc.
/[+-]?\.\d+/; # .1, .234, etc.
/[+-]?\d+\.\d+/; # 1.0, 30.56, etc.
These can be combined into a single regexp with a three-way alternation:
/[+-]?(\d+\.\d+|\d+\.|\.\d+)/; # floating point, no exponent
In this alternation, it is important to put C<'\d+\.\d+'> before
C<'\d+\.'>. If C<'\d+\.'> were first, the regexp would happily match that
and ignore the fractional part of the number.
Now consider floating point numbers with exponents. The key
observation here is that I<both> integers and numbers with decimal
points are allowed in front of an exponent. Then exponents, like the
overall sign, are independent of whether we are matching numbers with
or without decimal points, and can be "decoupled" from the
mantissa. The overall form of the regexp now becomes clear:
/^(optional sign)(integer | f.p. mantissa)(optional exponent)$/;
The exponent is an C<'e'> or C<'E'>, followed by an integer. So the
exponent regexp is
/[eE][+-]?\d+/; # exponent
Putting all the parts together, we get a regexp that matches numbers:
/^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$/; # Ta da!
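A regexp like this is worth testing against both good and bad input.
The following sketch does so with a handful of made-up strings; the hash
C<%is_number> just records the outcome.

```perl
use strict;
use warnings;

my %is_number;
for my $s ('123', '123.', '0.345', '.34', '-1e6', '25.4E-72',
           '.', '1e', 'abc', '1.2.3') {
    $is_number{$s} =
        $s =~ /^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$/ ? 1 : 0;
    printf "%-10s %s\n", "'$s'",
        $is_number{$s} ? "number" : "not a number";
}
```

The first six strings are accepted; the last four are rejected.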
Long regexps like this may impress your friends, but can be hard to
decipher. In complex situations like this, the C</x> modifier for a
match is invaluable. It allows one to put nearly arbitrary whitespace
and comments into a regexp without affecting its meaning. Using it,
we can rewrite our "extended" regexp in the more pleasing form
/^
[+-]? # first, match an optional sign
( # then match integers or f.p. mantissas:
\d+\.\d+ # mantissa of the form a.b
|\d+\. # mantissa of the form a.
|\.\d+ # mantissa of the form .b
|\d+ # integer of the form a
)
( [eE] [+-]? \d+ )? # finally, optionally match an exponent
$/x;
If whitespace is mostly irrelevant, how does one include space
characters in an extended regexp? The answer is to backslash it
S<C<'\ '>> or put it in a character class S<C<[ ]>>. The same thing
goes for pound signs: use C<\#> or C<[#]>. For instance, Perl allows
a space between the sign and the mantissa or integer, and we could add
this to our regexp as follows:
/^
[+-]?\ * # first, match an optional sign *and space*
( # then match integers or f.p. mantissas:
\d+\.\d+ # mantissa of the form a.b
|\d+\. # mantissa of the form a.
|\.\d+ # mantissa of the form .b
|\d+ # integer of the form a
)
( [eE] [+-]? \d+ )? # finally, optionally match an exponent
$/x;
In this form, it is easier to see a way to simplify the
alternation. Alternatives 1, 2, and 4 all start with C<\d+>, so it
could be factored out:
/^
[+-]?\ * # first, match an optional sign
( # then match integers or f.p. mantissas:
\d+ # start out with a ...
(
\.\d* # mantissa of the form a.b or a.
)? # ? takes care of integers of the form a
|\.\d+ # mantissa of the form .b
)
( [eE] [+-]? \d+ )? # finally, optionally match an exponent
$/x;
Starting in Perl v5.26, specifying C</xx> changes the square-bracketed
portions of a pattern to ignore tabs and space characters unless they
are escaped by preceding them with a backslash. So, we could write
/^
[ + - ]?\ * # first, match an optional sign
( # then match integers or f.p. mantissas:
\d+ # start out with a ...
(
\.\d* # mantissa of the form a.b or a.
)? # ? takes care of integers of the form a
|\.\d+ # mantissa of the form .b
)
( [ e E ] [ + - ]? \d+ )? # finally, optionally match an exponent
$/xx;
This doesn't really improve the legibility of this example, but it's
available in case you want it. Squashing the pattern down to the
compact form, we have
/^[+-]?\ *(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$/;
This is our final regexp. To recap, we built a regexp by
=over 4
=item *
specifying the task in detail,
=item *
breaking down the problem into smaller parts,
=item *
translating the small parts into regexps,
=item *
combining the regexps,
=item *
and optimizing the final combined regexp.
=back
These are also the typical steps involved in writing a computer
program. This makes perfect sense, because regular expressions are
essentially programs written in a little computer language that specifies
patterns.
=head2 Using regular expressions in Perl
The last topic of Part 1 briefly covers how regexps are used in Perl
programs. Where do they fit into Perl syntax?
We have already introduced the matching operator in its default
C</regexp/> and arbitrary delimiter C<m!regexp!> forms. We have used
the binding operator C<=~> and its negation C<!~> to test for string
matches. Associated with the matching operator, we have discussed the
single line C</s>, multi-line C</m>, case-insensitive C</i> and
extended C</x> modifiers. There are a few more things you might
want to know about matching operators.
=head3 Prohibiting substitution
Variables in a regexp are normally substituted (interpolated) before the
match is attempted. If you don't want any substitutions at all, use the
special delimiter C<m''>:
@pattern = ('Seuss');
while (<>) {
print if m'@pattern'; # matches literal '@pattern', not 'Seuss'
}
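The same effect can be seen without reading any input. In this sketch
the test strings are made up; the array C<@pattern> is the one from the
example above.

```perl
use strict;
use warnings;

my @pattern = ('Seuss');

# Ordinary delimiters interpolate @pattern, so the regexp is /Seuss/.
my $interpolated = "Dr. Seuss" =~ m/@pattern/ ? 1 : 0;

# With m'' nothing is interpolated: the regexp looks for the
# nine characters '@pattern' themselves.
my $literal_miss = "Dr. Seuss"   =~ m'@pattern' ? 1 : 0;
my $literal_hit  = 'cat@pattern' =~ m'@pattern' ? 1 : 0;

print "m/@.../ on 'Dr. Seuss':   $interpolated\n";
print "m'\@...' on 'Dr. Seuss':   $literal_miss\n";
print "m'\@...' on 'cat\@pattern': $literal_hit\n";
```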
Similar to strings, C<m''> acts like apostrophes on a regexp; all other
C<'m'> delimiters act like quotes. If the regexp evaluates to the empty string,
the regexp in the I<last successful match> is used instead. So we have
"dog" =~ /d/; # 'd' matches
"dogbert" =~ //; # this matches the 'd' regexp used before
=head3 Global matching
The final two modifiers we will discuss here,
C</g> and C</c>, concern multiple matches.
The modifier C</g> stands for global matching and allows the
matching operator to match within a string as many times as possible.
In scalar context, successive invocations against a string will have
C</g> jump from match to match, keeping track of position in the
string as it goes along. You can get or set the position with the
C<pos()> function.
The use of C</g> is shown in the following example. Suppose we have
a string that consists of words separated by spaces. If we know how
many words there are in advance, we could extract the words using
groupings:
$x = "cat dog house"; # 3 words
$x =~ /^\s*(\w+)\s+(\w+)\s+(\w+)\s*$/; # matches,
# $1 = 'cat'
# $2 = 'dog'
# $3 = 'house'
But what if we had an indeterminate number of words? This is the sort
of task C</g> was made for. To extract all words, form the simple
regexp C<(\w+)> and loop over all matches with C</(\w+)/g>:
while ($x =~ /(\w+)/g) {
print "Word is $1, ends at position ", pos $x, "\n";
}
prints
Word is cat, ends at position 3
Word is dog, ends at position 7
Word is house, ends at position 13
A failed match or changing the target string resets the position. If
you don't want the position reset after failure to match, add the
C</c>, as in C</regexp/gc>. The current position in the string is
associated with the string, not the regexp. This means that different
strings have different positions and their respective positions can be
set or read independently.
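The position rules can be checked directly with C<pos()>. This is a minimal sketch using the same sample string; the variable names are only for illustration:

```perl
use strict;
use warnings;

my $x = "cat dog house";

my $ok = $x =~ /\w+/g;          # matches 'cat'
my $after_match = pos $x;       # 3

$ok = $x =~ /\d+/g;             # fails, so the position is reset
my $after_failure = pos $x;     # undef

$ok = $x =~ /\w+/g;             # matches 'cat' again, from the start
$ok = $x =~ /\d+/gc;            # fails, but /c keeps the position
my $after_failure_c = pos $x;   # still 3

pos($x) = 8;                    # the position can also be set by hand
$ok = $x =~ /(\w+)/g;           # resumes at offset 8
my $word = $1;                  # 'house'
```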
In list context, C</g> returns a list of matched groupings, or if
there are no groupings, a list of matches to the whole regexp. So if
we wanted just the words, we could use
@words = ($x =~ /(\w+)/g); # matches,
# $words[0] = 'cat'
# $words[1] = 'dog'
# $words[2] = 'house'
Closely associated with the C</g> modifier is the C<\G> anchor. The
C<\G> anchor matches at the point where the previous C</g> match left
off. C<\G> allows us to easily do context-sensitive matching:
$metric = 1; # use metric units
...
$x = <FILE>; # read in measurement
$x =~ /^([+-]?\d+)\s*/g; # get magnitude
$weight = $1;
if ($metric) { # error checking
print "Units error!" unless $x =~ /\Gkg\./g;
}
else {
print "Units error!" unless $x =~ /\Glbs\./g;
}
$x =~ /\G\s+(widget|sprocket)/g; # continue processing
The combination of C</g> and C<\G> allows us to process the string a
bit at a time and use arbitrary Perl logic to decide what to do next.
Currently, the C<\G> anchor is only fully supported when used to anchor
to the start of the pattern.
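The same technique scales to a small tokenizer. The following is a minimal sketch, not part of the measurement example: the input string and the C<NUM>/C<OP> token labels are invented for illustration. Each branch anchors with C<\G> and uses C</gc>, so a failed alternative leaves the position untouched for the next one to try.

```perl
use strict;
use warnings;

my $input = "12 + 34*5";
my @tokens;

pos($input) = 0;                       # start scanning at the beginning
while (pos($input) < length $input) {
    if    ($input =~ /\G\s+/gc)       { next }                       # skip blanks
    elsif ($input =~ /\G(\d+)/gc)     { push @tokens, "NUM:$1" }     # a number
    elsif ($input =~ /\G([-+*\/])/gc) { push @tokens, "OP:$1" }      # an operator
    else  { die "unexpected character at position ", pos($input), "\n" }
}

print "@tokens\n";
```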
C<\G> is also invaluable in processing fixed-length records with
regexps. Suppose we have a snippet of coding region DNA, encoded as
base pair letters C<ATCGTTGAAT...> and we want to find all the stop
codons C<TGA>. In a coding region, codons are 3-letter sequences, so
we can think of the DNA snippet as a sequence of 3-letter records. The
naive regexp
# expanded, this is "ATC GTT GAA TGC AAA TGA CAT GAC"
$dna = "ATCGTTGAATGCAAATGACATGAC";
$dna =~ /TGA/;
doesn't work; it may match a C<TGA>, but there is no guarantee that
the match is aligned with codon boundaries, I<e.g.>, the substring
S<C<GTT GAA>> gives a match. A better solution is
while ($dna =~ /(\w\w\w)*?TGA/g) { # note the minimal *?
print "Got a TGA stop codon at position ", pos $dna, "\n";
}
which prints
Got a TGA stop codon at position 18
Got a TGA stop codon at position 23
Position 18 is good, but position 23 is bogus. What happened?
The answer is that our regexp works well until we get past the last
real match. Then the regexp will fail to match a synchronized C<TGA>
and start stepping ahead one character position at a time, not what we
want. The solution is to use C<\G> to anchor the match to the codon
alignment:
while ($dna =~ /\G(\w\w\w)*?TGA/g) {
print "Got a TGA stop codon at position ", pos $dna, "\n";
}
This prints
Got a TGA stop codon at position 18
which is the correct answer. This example illustrates that it is
important not only to match what is desired, but to reject what is not
desired.
(There are other regexp modifiers that are available, such as
C</o>, but their specialized uses are beyond the
scope of this introduction.)
=head3 Search and replace
Regular expressions also play a big role in I<search and replace>
operations in Perl. Search and replace is accomplished with the
C<s///> operator. The general form is
C<s/regexp/replacement/modifiers>, with everything we know about
regexps and modifiers applying in this case as well. The
I<replacement> is a Perl double-quoted string that replaces in the
string whatever is matched with the C<regexp>. The operator C<=~> is
also used here to associate a string with C<s///>. If matching
against C<$_>, the S<C<$_ =~>> can be dropped. If there is a match,
C<s///> returns the number of substitutions made; otherwise it returns
false. Here are a few examples:
$x = "Time to feed the cat!";
$x =~ s/cat/hacker/; # $x contains "Time to feed the hacker!"
if ($x =~ s/^(Time.*hacker)!$/$1 now!/) {
$more_insistent = 1;
}
$y = "'quoted words'";
$y =~ s/^'(.*)'$/$1/; # strip single quotes,
# $y contains "quoted words"
In the last example, the whole string was matched, but only the part
inside the single quotes was grouped. With the C<s///> operator, the
matched variables C<$1>, C<$2>, I<etc>. are immediately available for use
in the replacement expression, so we use C<$1> to replace the quoted
string with just what was quoted. With the global modifier, C<s///g>
will search and replace all occurrences of the regexp in the string:
$x = "I batted 4 for 4";
$x =~ s/4/four/; # doesn't do it all:
# $x contains "I batted four for 4"
$x = "I batted 4 for 4";
$x =~ s/4/four/g; # does it all:
# $x contains "I batted four for four"
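Because C<s///> returns the number of substitutions made, the return value can serve as a counter or as a success test. A short sketch, with made-up variable names:

```perl
use strict;
use warnings;

my $x = "I batted 4 for 4";
my $count = ($x =~ s/4/four/g);    # two replacements were made

my $y = "I struck out";
my $none = ($y =~ s/4/four/g);     # nothing matched, so a false value comes back

print "replaced $count occurrences\n" if $count;
print "left '$y' alone\n" unless $none;
```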
If you prefer "regex" over "regexp" in this tutorial, you could use
the following program to replace it:
% cat > simple_replace
#!/usr/bin/perl
$regexp = shift;
$replacement = shift;
while (<>) {
s/$regexp/$replacement/g;
print;
}
^D
% simple_replace regexp regex perlretut.pod
In C<simple_replace> we used the C<s///g> modifier to replace all
occurrences of the regexp on each line. (Even though the regular
expression appears in a loop, Perl is smart enough to compile it
only once.) As with C<simple_grep>, both the
C<print> and the C<s/$regexp/$replacement/g> use C<$_> implicitly.
If you don't want C<s///> to change your original variable you can use
the non-destructive substitute modifier, C<s///r>. This changes the
behavior so that C<s///r> returns the final substituted string
(instead of the number of substitutions):
$x = "I like dogs.";
$y = $x =~ s/dogs/cats/r;
print "$x $y\n";
That example will print "I like dogs. I like cats." Notice the original
C<$x> variable has not been affected. The overall
result of the substitution is instead stored in C<$y>. If the
substitution doesn't affect anything then the original string is
returned:
$x = "I like dogs.";
$y = $x =~ s/elephants/cougars/r;
print "$x $y\n"; # prints "I like dogs. I like dogs."
One other interesting thing that the C<s///r> flag allows is chaining
substitutions:
$x = "Cats are great.";
print $x =~ s/Cats/Dogs/r =~ s/Dogs/Frogs/r =~
s/Frogs/Hedgehogs/r, "\n";
# prints "Hedgehogs are great."
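Since C<s///r> returns a value instead of modifying its target, it also fits naturally inside C<map>. The sketch below (sample data invented; it assumes a Perl new enough to support C</r>, which means 5.14 or later) trims whitespace from a list without touching the originals:

```perl
use strict;
use warnings;

my @raw = ("  cat ", "dog  ", " hedgehog");

# Each element is copied, trimmed, and returned; @raw itself is unchanged.
my @trimmed = map { s/^\s+|\s+$//gr } @raw;

print join(',', @trimmed), "\n";
```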
A modifier available specifically to search and replace is the
C<s///e> evaluation modifier. C<s///e> treats the
replacement text as Perl code, rather than a double-quoted
string. The value that the code returns is substituted for the
matched substring. C<s///e> is useful if you need to do a bit of
computation in the process of replacing text. This example counts
character frequencies in a line:
$x = "Bill the cat";
$x =~ s/(.)/$chars{$1}++;$1/eg; # final $1 replaces char with itself
print "frequency of '$_' is $chars{$_}\n"
foreach (sort {$chars{$b} <=> $chars{$a}} keys %chars);
This prints
frequency of ' ' is 2
frequency of 't' is 2
frequency of 'l' is 2
frequency of 'B' is 1
frequency of 'c' is 1
frequency of 'e' is 1
frequency of 'h' is 1
frequency of 'i' is 1
frequency of 'a' is 1
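Two further small uses of C<s///e>, offered as a sketch with invented sample strings: arithmetic on each match, and reformatting each match through C<sprintf>.

```perl
use strict;
use warnings;

my $x = "3 apples and 10 pears";
(my $doubled = $x) =~ s/(\d+)/$1 * 2/ge;      # evaluate $1 * 2 for each number

my $prices = "cost: 2.5 and 10";
$prices =~ s/(\d+(?:\.\d+)?)/sprintf("%.2f", $1)/ge;   # two decimal places

print "$doubled\n$prices\n";
```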
As with the match C<m//> operator, C<s///> can use other delimiters,
such as C<s!!!> and C<s{}{}>, and even C<s{}//>. If single quotes are
used C<s'''>, then the regexp and replacement are
treated as single-quoted strings and there are no
variable substitutions. C<s///> in list context
returns the same thing as in scalar context, I<i.e.>, the number of
matches.
=head3 The split function
The C<split()> function is another place where a regexp is used.
C<split /regexp/, string, limit> separates the C<string> operand into
a list of substrings and returns that list. The regexp must be designed
to match whatever constitutes the separators for the desired substrings.
The C<limit>, if present, constrains splitting into no more than C<limit>
number of strings. For example, to split a string into words, use
$x = "Calvin and Hobbes";
@words = split /\s+/, $x; # $words[0] = 'Calvin'
# $words[1] = 'and'
# $words[2] = 'Hobbes'
If the empty regexp C<//> is used, the regexp always matches and
the string is split into individual characters. If the regexp has
groupings, then the resulting list contains the matched substrings from the
groupings as well. For instance,
$x = "/usr/bin/perl";
@dirs = split m!/!, $x; # $dirs[0] = ''
# $dirs[1] = 'usr'
# $dirs[2] = 'bin'
# $dirs[3] = 'perl'
@parts = split m!(/)!, $x; # $parts[0] = ''
# $parts[1] = '/'
# $parts[2] = 'usr'
# $parts[3] = '/'
# $parts[4] = 'bin'
# $parts[5] = '/'
# $parts[6] = 'perl'
Since the first character of C<$x> matched the regexp, C<split> prepended
an empty initial element to the list.
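The C<limit> argument and the empty regexp are easy to try out. A brief sketch, using an invented colon-separated record:

```perl
use strict;
use warnings;

my $record = "root:x:0:0:root:/root:/bin/sh";

my @all    = split /:/, $record;      # every field
my @first3 = split /:/, $record, 3;   # at most 3 pieces; the last keeps the rest
my @chars  = split //, "perl";        # empty regexp: individual characters

print scalar(@all), " fields; third piece of the limited split: $first3[2]\n";
```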
If you have read this far, congratulations! You now have all the basic
tools needed to use regular expressions to solve a wide range of text
processing problems. If this is your first time through the tutorial,
why not stop here and play around with regexps a while.... S<Part 2>
concerns the more esoteric aspects of regular expressions and those
concepts certainly aren't needed right at the start.
=head1 Part 2: Power tools
OK, you know the basics of regexps and you want to know more. If
matching regular expressions is analogous to a walk in the woods, then
the tools discussed in Part 1 are analogous to topo maps and a
compass, basic tools we use all the time. Most of the tools in Part 2
are analogous to flare guns and satellite phones. They aren't used
too often on a hike, but when we are stuck, they can be invaluable.
What follows are the more advanced, less used, or sometimes esoteric
capabilities of Perl regexps. In Part 2, we will assume you are
comfortable with the basics and concentrate on the advanced features.
=head2 More on characters, strings, and character classes
There are a number of escape sequences and character classes that we
haven't covered yet.
There are several escape sequences that convert characters or strings
between upper and lower case, and they are also available within
patterns. C<\l> and C<\u> convert the next character to lower or
upper case, respectively:
$x = "perl";
$string =~ /\u$x/; # matches 'Perl' in $string
$x = "M(rs?|s)\\."; # note the double backslash
$string =~ /\l$x/; # matches 'mr.', 'mrs.', and 'ms.'
A C<\L> or C<\U> indicates a lasting conversion of case, until
terminated by C<\E> or thrown over by another C<\U> or C<\L>:
$x = "This word is in lower case:\L SHOUT\E";
$x =~ /shout/; # matches
$x = "I STILL KEYPUNCH CARDS FOR MY 360";
$x =~ /\Ukeypunch/; # matches punch card string
If there is no C<\E>, case is converted until the end of the
string. The regexps C<\L\u$word> or C<\u\L$word> convert the first
character of C<$word> to uppercase and the rest of the characters to
lowercase.
Control characters can be escaped with C<\c>, so that a control-Z
character would be matched with C<\cZ>. The escape sequence
C<\Q>...C<\E> quotes, or protects most non-alphabetic characters. For
instance,
$x = "That !^*&%~& cat!";
$x =~ /\Q!^*&%~&\E/; # check for rough language
It does not protect C<'$'> or C<'@'>, so that variables can still be
substituted.
C<\Q>, C<\L>, C<\l>, C<\U>, C<\u> and C<\E> are actually part of
double-quotish syntax, and not part of regexp syntax proper. They will
work if they appear in a regular expression embedded directly in a
program, but not when contained in a string that is interpolated in a
pattern.
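When the text to be matched literally arrives in a variable, C<\Q>...C<\E> in the pattern, or the equivalent C<quotemeta> function applied beforehand, keeps its metacharacters from being interpreted. A minimal sketch, with made-up sample strings:

```perl
use strict;
use warnings;

my $user_input = "a.b*c";

# Interpolated as is, '.' and '*' act as regexp metacharacters.
my $as_regexp  = "aXbbbc" =~ /$user_input/;          # true

# Quoted, only the literal text 'a.b*c' can match.
my $as_literal = "aXbbbc" =~ /\Q$user_input\E/;      # false
my $found      = "see a.b*c here" =~ /\Q$user_input\E/;   # true

# quotemeta() does the same job on a plain string.
my $quoted = quotemeta $user_input;                  # a\.b\*c
my $via_function = "see a.b*c here" =~ /$quoted/;    # true
```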
Perl regexps can handle more than just the
standard ASCII character set. Perl supports I<Unicode>, a standard
for representing the alphabets from virtually all of the world's written
languages, and a host of symbols. Perl's text strings are Unicode strings, so
they can contain characters with a value (codepoint or character number) higher
than 255.
What does this mean for regexps? Well, regexp users don't need to know
much about Perl's internal representation of strings. But they do need
to know 1) how to represent Unicode characters in a regexp and 2) that
a matching operation will treat the string to be searched as a sequence
of characters, not bytes. The answer to 1) is that Unicode characters
greater than C<chr(255)> are represented using the C<\x{hex}> notation, because
C<\x>I<XY> (without curly braces and I<XY> are two hex digits) doesn't
go further than 255. (Starting in Perl 5.14, if you're an octal fan,
you can also use C<\o{oct}>.)
/\x{263a}/; # match a Unicode smiley face :)
B<NOTE>: In Perl 5.6.0 it used to be that one needed to say C<use
utf8> to use any Unicode features. This is no longer the case: for
almost all Unicode processing, the explicit C<utf8> pragma is not
needed. (The only case where it matters is if your Perl script is in
Unicode and encoded in UTF-8, then an explicit C<use utf8> is needed.)
Figuring out the hexadecimal sequence of a Unicode character you want
or deciphering someone else's hexadecimal Unicode regexp is about as
much fun as programming in machine code. So another way to specify
Unicode characters is to use the I<named character> escape
sequence C<\N{I<name>}>. I<name> is a name for the Unicode character, as
specified in the Unicode standard. For instance, if we wanted to
represent or match the astrological sign for the planet Mercury, we
could use
$x = "abc\N{MERCURY}def";
$x =~ /\N{MERCURY}/; # matches
One can also use "short" names:
print "\N{GREEK SMALL LETTER SIGMA} is called sigma.\n";
print "\N{greek:Sigma} is an upper-case sigma.\n";
You can also restrict names to a certain alphabet by specifying the
L<charnames> pragma:
use charnames qw(greek);
print "\N{sigma} is Greek sigma\n";
An index of character names is available on-line from the Unicode
Consortium, L<http://www.unicode.org/charts/charindex.html>; explanatory
material with links to other resources at
L<http://www.unicode.org/standard/where>.
The answer to requirement 2) is that a regexp (mostly)
uses Unicode characters. The "mostly" is for messy backward
compatibility reasons, but starting in Perl 5.14, any regexp compiled in
the scope of a C<use feature 'unicode_strings'> (which is automatically
turned on within the scope of a C<use 5.012> or higher) will turn that
"mostly" into "always". If you want to handle Unicode properly, you
should ensure that C<'unicode_strings'> is turned on.
Internally, this is encoded to bytes using either UTF-8 or a native 8
bit encoding, depending on the history of the string, but conceptually
it is a sequence of characters, not bytes. See L<perlunitut> for a
tutorial about that.
Let us now discuss Unicode character classes, most usually called
"character properties". These are represented by the C<\p{I<name>}>
escape sequence. The negation of this is C<\P{I<name>}>. For example,
to match lower and uppercase characters,
$x = "BOB";
$x =~ /^\p{IsUpper}/; # matches, uppercase char class
$x =~ /^\P{IsUpper}/; # doesn't match, char class sans uppercase
$x =~ /^\p{IsLower}/; # doesn't match, lowercase char class
$x =~ /^\P{IsLower}/; # matches, char class sans lowercase
(The "C<Is>" is optional.)
There are many, many Unicode character properties. For the full list
see L<perluniprops>. Most of them have synonyms with shorter names,
also listed there. Some synonyms are a single character. For these,
you can drop the braces. For instance, C<\pM> is the same thing as
C<\p{Mark}>, meaning things like accent marks.
The Unicode C<\p{Script}> and C<\p{Script_Extensions}> properties are
used to categorize every Unicode character into the language script it
is written in. (C<Script_Extensions> is an improved version of
C<Script>, which is retained for backward compatibility, and so you
should generally use C<Script_Extensions>.)
For example,
English, French, and a bunch of other European languages are written in
the Latin script. But there is also the Greek script, the Thai script,
the Katakana script, I<etc>. You can test whether a character is in a
particular script (based on C<Script_Extensions>) with, for example
C<\p{Latin}>, C<\p{Greek}>, or C<\p{Katakana}>. To test if it isn't in
the Balinese script, you would use C<\P{Balinese}>.
What we have described so far is the single form of the C<\p{...}> character
classes. There is also a compound form which you may run into. These
look like C<\p{I<name>=I<value>}> or C<\p{I<name>:I<value>}> (the equals sign and colon
can be used interchangeably). These are more general than the single form,
and in fact most of the single forms are just Perl-defined shortcuts for common
compound forms. For example, the script examples in the previous paragraph
could be written equivalently as C<\p{Script_Extensions=Latin}>, C<\p{Script_Extensions:Greek}>,
C<\p{script_extensions=katakana}>, and C<\P{script_extensions=balinese}> (case is irrelevant
between the C<{}> braces). You may
never have to use the compound forms, but sometimes it is necessary, and their
use can make your code easier to understand.
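The single and compound forms can be compared side by side. This is a small sketch; the test character (GREEK SMALL LETTER ALPHA, U+03B1) is chosen only for illustration, and C<Script_Extensions> assumes a reasonably recent Perl:

```perl
use strict;
use warnings;

my $alpha = "\x{3B1}";   # GREEK SMALL LETTER ALPHA

my $single   = $alpha =~ /\p{Greek}/;                      # single form
my $compound = $alpha =~ /\p{Script_Extensions=Greek}/;    # compound form
my $lower    = $alpha =~ /\p{IsLower}/;                    # a lowercase letter
my $latin    = $alpha =~ /\p{Latin}/;                      # not Latin
my $negated  = "a"    =~ /\P{Greek}/;                      # 'a' is not Greek

print "alpha is Greek\n" if $single && $compound;
```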
C<\X> is an abbreviation for a character class that comprises
a Unicode I<extended grapheme cluster>. This represents a "logical character":
what appears to be a single character, but may be represented internally by more
than one. As an example, using the Unicode full names, "S<A + COMBINING
RING>" is a grapheme cluster with base character "A" and combining character
"S<COMBINING RING>", which translates in Danish to "A" with the circle atop it,
as in the word E<Aring>ngstrom.
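The difference between characters and grapheme clusters shows up when counting. A short sketch, assuming the word is spelled with a separate combining character (U+030A, COMBINING RING ABOVE):

```perl
use strict;
use warnings;

my $word = "A\x{30A}ngstrom";          # 'A' followed by COMBINING RING ABOVE

my $chars    = length $word;           # counts code points
my @clusters = $word =~ /(\X)/g;       # one element per logical character

printf "%d characters, %d grapheme clusters\n", $chars, scalar @clusters;
```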
For the full and latest information about Unicode see the latest
Unicode standard, or the Unicode Consortium's website L<http://www.unicode.org>.
As if all those classes weren't enough, Perl also defines POSIX-style
character classes. These have the form C<[:I<name>:]>, with I<name> the
name of the POSIX class. The POSIX classes are C<alpha>, C<alnum>,
C<ascii>, C<cntrl>, C<digit>, C<graph>, C<lower>, C<print>, C<punct>,
C<space>, C<upper>, and C<xdigit>, and two extensions, C<word> (a Perl
extension to match C<\w>), and C<blank> (a GNU extension). The C</a>
modifier restricts these to matching just in the ASCII range; otherwise
they can match the same as their corresponding Perl Unicode classes:
C<[:upper:]> is the same as C<\p{IsUpper}>, I<etc>. (There are some
exceptions and gotchas with this; see L<perlrecharclass> for a full
discussion.) The C<[:digit:]>, C<[:word:]>, and
C<[:space:]> correspond to the familiar C<\d>, C<\w>, and C<\s>
character classes. To negate a POSIX class, put a C<'^'> in front of
the name, so that, I<e.g.>, C<[:^digit:]> corresponds to C<\D> and, under
Unicode, C<\P{IsDigit}>. The Unicode and POSIX character classes can
be used just like C<\d>, with the exception that POSIX character
classes can only be used inside of a character class:
/\s+[abc[:digit:]xyz]\s*/; # match a,b,c,x,y,z, or a digit
/^=item\s[[:digit:]]/; # match '=item',
# followed by a space and a digit
/\s+[abc\p{IsDigit}xyz]\s+/; # match a,b,c,x,y,z, or a digit
/^=item\s\p{IsDigit}/; # match '=item',
# followed by a space and a digit
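The effect of the C</a> modifier can be seen with a digit from outside the ASCII range. This sketch uses ARABIC-INDIC DIGIT THREE (U+0663) as an arbitrary test character and assumes a Perl that supports C</a> (5.14 or later):

```perl
use strict;
use warnings;

my $digit = "\x{663}";   # ARABIC-INDIC DIGIT THREE

my $unicode_d     = $digit =~ /^\d$/;              # true: \d follows Unicode
my $unicode_posix = $digit =~ /^[[:digit:]]$/;     # true: so does [:digit:]
my $ascii_d       = $digit =~ /^\d$/a;             # false: /a restricts to 0-9
my $ascii_posix   = $digit =~ /^[[:digit:]]$/a;    # false for the same reason
my $plain_seven   = "7"    =~ /^[[:digit:]]$/a;    # true: ASCII digits still match
```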
Whew! That is all the rest of the characters and character classes.
=head2 Compiling and saving regular expressions
In Part 1 we mentioned that Perl compiles a regexp into a compact
sequence of opcodes. Thus, a compiled regexp is a data structure
that can be stored once and used again and again. The regexp quote
C<qr//> does exactly that: C<qr/string/> compiles the C<string> as a
regexp and transforms the result into a form that can be assigned to a
variable:
$reg = qr/foo+bar?/; # reg contains a compiled regexp
Then C<$reg> can be used as a regexp:
$x = "fooooba";
$x =~ $reg; # matches, just like /foo+bar?/
$x =~ /$reg/; # same thing, alternate form
C<$reg> can also be interpolated into a larger regexp:
$x =~ /(abc)?$reg/; # still matches
As with the matching operator, the regexp quote can use different
delimiters, I<e.g.>, C<qr!!>, C<qr{}> or C<qr~~>. Apostrophes
as delimiters (C<qr''>) inhibit any interpolation.
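A compiled regexp also carries its own modifiers along when it is interpolated, and compiled regexps can be stored in ordinary data structures. The following is a sketch only; the C<@rules> table and the C<classify> helper are hypothetical names invented for this example:

```perl
use strict;
use warnings;

my $greeting = qr/hello/i;     # /i is stored with the compiled regexp

my $mixed  = "Say HELLO" =~ /Say $greeting/;   # true: /i applies to 'hello' only
my $prefix = "say HELLO" =~ /Say $greeting/;   # false: 'Say' is case-sensitive

# A table of named, pre-compiled regexps.
my @rules = (
    [ integer => qr/^[+-]?\d+$/ ],
    [ word    => qr/^[a-z]+$/i  ],
);

sub classify {
    my ($text) = @_;
    for my $rule (@rules) {
        my ($name, $re) = @$rule;
        return $name if $text =~ $re;
    }
    return 'unknown';
}

print join(' ', map { classify($_) } '-42', 'Perl', '3.14'), "\n";
```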
Pre-compiled regexps are useful for creating dynamic matches that
don't need to be recompiled each time they are encountered. Using
pre-compiled regexps, we write a C<grep_step> program which greps
for a sequence of patterns, advancing to the next pattern as soon
as one has been satisfied.
% cat > grep_step
#!/usr/bin/perl
# grep_step - match <number> regexps, one after the other
# usage: grep_step <number> regexp1 regexp2 ... file1 file2 ...
$number = shift;
$regexp[$_] = shift foreach (0..$number-1);
@compiled = map qr/$_/, @regexp;
while ($line = <>) {
if ($line =~ /$compiled[0]/) {
print $line;
shift @compiled;
last unless @compiled;
}
}
^D
% grep_step 3 shift print last grep_step
$number = shift;
print $line;
last unless @compiled;
Storing pre-compiled regexps in an array C<@compiled> allows us to
simply loop through the regexps without any recompilation, thus gaining
flexibility without sacrificing speed.
=head2 Composing regular expressions at runtime
Backtracking is more efficient than repeated tries with different regular
expressions. If there are several regular expressions and a match with
any of them is acceptable, then it is possible to combine them into a set
of alternatives. If the individual expressions are input data, this
can be done by programming a join operation. We'll exploit this idea in
an improved version of the C<simple_grep> program: a program that matches
multiple patterns:
% cat > multi_grep
#!/usr/bin/perl
# multi_grep - match any of <number> regexps
# usage: multi_grep <number> regexp1 regexp2 ... file1 file2 ...
$number = shift;
$regexp[$_] = shift foreach (0..$number-1);
$pattern = join '|', @regexp;
while ($line = <>) {
print $line if $line =~ /$pattern/;
}
^D
% multi_grep 2 shift for multi_grep
$number = shift;
$regexp[$_] = shift foreach (0..$number-1);
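If the alternatives are meant to be literal strings rather than regexps, each one can be quoted before joining so that its metacharacters are not interpreted. A minimal sketch of that variation, with an invented word list:

```perl
use strict;
use warnings;

my @words = ('C++', 'perl', 'a.out');

# Quote each word, join the results with '|', and compile once.
my $alternation = join '|', map { quotemeta } @words;
my $re = qr/(?:$alternation)/;

my @lines = ('I like C++', 'axout', 'perl rocks', 'python');
my @hits  = grep { /$re/ } @lines;

print "$_\n" for @hits;
```

Here C<axout> is rejected because the dot in C<a.out> was quoted and so matches only a literal dot.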
Sometimes it is advantageous to construct a pattern from the I<input>
that is to be analyzed and use the permissible values on the left
hand side of the matching operations. As an example for this somewhat
paradoxical situation, let's assume that our input contains a command
verb which should match one out of a set of available command verbs,
with the additional twist that commands may be abbreviated as long as
the given string is unique. The program below demonstrates the basic
algorithm.
% cat > keymatch
#!/usr/bin/perl
$kwds = 'copy compare list print';
while( $cmd = <> ){
$cmd =~ s/^\s+|\s+$//g; # trim leading and trailing spaces
if( ( @matches = $kwds =~ /\b$cmd\w*/g ) == 1 ){
print "command: '@matches'\n";
} elsif( @matches == 0 ){
print "no such command: '$cmd'\n";
} else {
print "not unique: '$cmd' (could be one of: @matches)\n";
}
}
^D
% keymatch
li
command: 'list'
co
not unique: 'co' (could be one of: copy compare)
printer
no such command: 'printer'
Rather than trying to match the input against the keywords, we match the
combined set of keywords against the input. The pattern matching
operation S<C<$kwds =~ /\b$cmd\w*/g>> does several things at the
same time. It makes sure that the given command begins where a keyword
begins (C<\b>). It tolerates abbreviations due to the added C<\w*>. It
tells us the number of matches (C<scalar @matches>) and all the keywords
that were actually matched. You could hardly ask for more.
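The same idea can be wrapped in a subroutine. This is a sketch under assumptions: C<match_command> is a hypothetical helper name, and C<\Q>...C<\E> is added here (it is not in the program above) so that metacharacters typed by the user are matched literally.

```perl
use strict;
use warnings;

# Return every keyword that starts with the (possibly abbreviated) command.
sub match_command {
    my ($cmd, @keywords) = @_;
    my $kwds = join ' ', @keywords;
    return $kwds =~ /\b(\Q$cmd\E\w*)/g;
}

my @keywords = qw(copy compare list print);

my @one  = match_command('li',      @keywords);   # unique
my @two  = match_command('co',      @keywords);   # ambiguous
my @none = match_command('printer', @keywords);   # no such command

print "li -> @one; co -> @two; printer -> ", scalar(@none), " matches\n";
```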
=head2 Embedding comments and modifiers in a regular expression
Starting with this section, we will be discussing Perl's set of
I<extended patterns>. These are extensions to the traditional regular
expression syntax that provide powerful new tools for pattern
matching. We have already seen extensions in the form of the minimal
matching constructs C<??>, C<*?>, C<+?>, C<{n,m}?>, and C<{n,}?>. Most
of the extensions below have the form C<(?char...)>, where the
C<char> is a character that determines the type of extension.
The first extension is an embedded comment C<(?#text)>. This embeds a
comment into the regular expression without affecting its meaning. The
comment should not have any closing parentheses in the text. An
example is
/(?# Match an integer:)[+-]?\d+/;
This style of commenting has been largely superseded by the raw,
freeform commenting that is allowed with the C</x> modifier.
Most modifiers, such as C</i>, C</m>, C</s> and C</x> (or any
combination thereof) can also be embedded in
a regexp using C<(?i)>, C<(?m)>, C<(?s)>, and C<(?x)>. For instance,
/(?i)yes/; # match 'yes' case insensitively
/yes/i; # same thing
/(?x)( # freeform version of an integer regexp
[+-]? # match an optional sign
\d+ # match a sequence of digits
)
/x;
Embedded modifiers can have two important advantages over the usual
modifiers. Embedded modifiers allow a custom set of modifiers for
I<each> regexp pattern. This is great for matching an array of regexps
that must have different modifiers:
$pattern[0] = '(?i)doctor';
$pattern[1] = 'Johnson';
...
while (<>) {
foreach $patt (@pattern) {
print if /$patt/;
}
}
The second advantage is that embedded modifiers (except C</p>, which
modifies the entire regexp) only affect the regexp
inside the group the embedded modifier is contained in. So grouping
can be used to localize the modifier's effects:
/Answer: ((?i)yes)/; # matches 'Answer: yes', 'Answer: YES', etc.
Embedded modifiers can also turn off any modifiers already present
by using, I<e.g.>, C<(?-i)>. Modifiers can also be combined into
a single expression, I<e.g.>, C<(?s-i)> turns on single line mode and
turns off case insensitivity.
Embedded modifiers may also be added to a non-capturing grouping.
C<(?i-m:regexp)> is a non-capturing grouping that matches C<regexp>
case insensitively and turns off multi-line mode.
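A quick way to see the scoping is to test a few strings. In this sketch (sample strings invented), the first regexp turns case insensitivity on for one group only, and the second turns it off for one group:

```perl
use strict;
use warnings;

my $answer = qr/Answer: (?i:yes)/;        # only 'yes' ignores case
my @answer_checks = map { $_ =~ $answer ? 1 : 0 }
    ('Answer: YES', 'Answer: yes', 'answer: yes');

my $monks = qr/perl (?-i:Monks)/i;        # everything but 'Monks' ignores case
my @monk_checks = map { $_ =~ $monks ? 1 : 0 }
    ('PERL Monks', 'perl monks');

print "@answer_checks / @monk_checks\n";
```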
=head2 Looking ahead and looking behind
This section concerns the lookahead and lookbehind assertions. First,
a little background.
In Perl regular expressions, most regexp elements "eat up" a certain
amount of string when they match. For instance, the regexp element
C<[abc]> eats up one character of the string when it matches, in the
sense that Perl moves to the next character position in the string
after the match. There are some elements, however, that don't eat up
characters (advance the character position) if they match. The examples
we have seen so far are the anchors. The anchor C<'^'> matches the
beginning of the line, but doesn't eat any characters. Similarly, the
word boundary anchor C<\b> matches wherever a character matching C<\w>
is next to a character that doesn't, but it doesn't eat up any
characters itself. Anchors are examples of I<zero-width assertions>:
zero-width, because they consume
no characters, and assertions, because they test some property of the
string. In the context of our walk in the woods analogy to regexp
matching, most regexp elements move us along a trail, but anchors have
us stop a moment and check our surroundings. If the local environment
checks out, we can proceed forward. But if the local environment
doesn't satisfy us, we must backtrack.
Checking the environment entails either looking ahead on the trail,
looking behind, or both. C<'^'> looks behind, to see that there are no
characters before. C<'$'> looks ahead, to see that there are no
characters after. C<\b> looks both ahead and behind, to see if the
characters on either side differ in their "word-ness".
The lookahead and lookbehind assertions are generalizations of the
anchor concept. Lookahead and lookbehind are zero-width assertions
that let us specify which characters we want to test for. The
lookahead assertion is denoted by C<(?=regexp)> and the lookbehind
assertion is denoted by C<< (?<=fixed-regexp) >>. Some examples are
$x = "I catch the housecat 'Tom-cat' with catnip";
$x =~ /cat(?=\s)/; # matches 'cat' in 'housecat'
@catwords = ($x =~ /(?<=\s)cat\w+/g); # matches,
# $catwords[0] = 'catch'
# $catwords[1] = 'catnip'
$x =~ /\bcat\b/; # matches 'cat' in 'Tom-cat'
$x =~ /(?<=\s)cat(?=\s)/; # doesn't match; no isolated 'cat' in
# middle of $x
Note that the parentheses in C<(?=regexp)> and C<< (?<=regexp) >> are
non-capturing, since these are zero-width assertions. Thus in the
second regexp, the substrings captured are those of the whole regexp
itself. Lookahead C<(?=regexp)> can match arbitrary regexps, but
lookbehind C<< (?<=fixed-regexp) >> only works for regexps of fixed
width, I<i.e.>, a fixed number of characters long. Thus
C<< (?<=(ab|bc)) >> is fine, but C<< (?<=(ab)*) >> is not. The
negated versions of the lookahead and lookbehind assertions are
denoted by C<(?!regexp)> and C<< (?<!fixed-regexp) >> respectively.
They evaluate true if the regexps do I<not> match:
$x = "foobar";
$x =~ /foo(?!bar)/; # doesn't match, 'bar' follows 'foo'
$x =~ /foo(?!baz)/; # matches, 'baz' doesn't follow 'foo'
$x =~ /(?<!\s)foo/; # matches, there is no \s before 'foo'
Here is an example where a string containing blank-separated words,
numbers and single dashes is to be split into its components.
Using C</\s+/> alone won't work, because spaces are not required between
dashes, or between a word and a dash. Additional places for a split are established
by looking ahead and behind:
$str = "one two - --6-8";
@toks = split / \s+ # a run of spaces
| (?<=\S) (?=-) # any non-space followed by '-'
| (?<=-) (?=\S) # a '-' followed by any non-space
/x, $str; # @toks = qw(one two - - - 6 - 8)
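Because lookaround assertions match positions rather than characters, they also work well with C<s///g> to insert text. As a sketch (the C<commify> helper is a made-up name and handles only plain integers), commas are inserted wherever a digit precedes and a whole number of three-digit groups follows:

```perl
use strict;
use warnings;

sub commify {
    my ($number) = @_;
    # A digit behind us, and ahead only complete groups of three digits.
    $number =~ s/(?<=\d)(?=(?:\d{3})+(?!\d))/,/g;
    return $number;
}

print commify("1234567"), "\n";
```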
=head2 Using independent subexpressions to prevent backtracking
I<Independent subexpressions> are regular expressions, in the
context of a larger regular expression, that function independently of
the larger regular expression. That is, they consume as much or as
little of the string as they wish without regard for the ability of
the larger regexp to match. Independent subexpressions are represented
by C<< (?>regexp) >>. We can illustrate their behavior by first
considering an ordinary regexp:
$x = "ab";
$x =~ /a*ab/; # matches
This obviously matches, but in the process of matching, the
subexpression C<a*> first grabbed the C<'a'>. Doing so, however,
wouldn't allow the whole regexp to match, so after backtracking, C<a*>
eventually gave back the C<'a'> and matched the empty string. Here, what
C<a*> matched was I<dependent> on what the rest of the regexp matched.
Contrast that with an independent subexpression:
$x =~ /(?>a*)ab/; # doesn't match!
The independent subexpression C<< (?>a*) >> doesn't care about the rest
of the regexp, so it sees an C<'a'> and grabs it. Then the rest of the
regexp C<ab> cannot match. Because C<< (?>a*) >> is independent, there
is no backtracking and the independent subexpression does not give
up its C<'a'>. Thus the match of the regexp as a whole fails. A similar
behavior occurs with completely independent regexps:
$x = "ab";
$x =~ /a*/g; # matches, eats an 'a'
$x =~ /\Gab/g; # doesn't match, no 'a' available
Here C</g> and C<\G> create a "tag team" handoff of the string from
one regexp to the other. Regexps with an independent subexpression are
much like this, with a handoff of the string to the independent
subexpression, and a handoff of the string back to the enclosing
regexp.
The ability of an independent subexpression to prevent backtracking
can be quite useful. Suppose we want to match a non-empty string
enclosed in parentheses up to two levels deep. Then the following
regexp matches:
$x = "abc(de(fg)h"; # unbalanced parentheses
$x =~ /\( ( [ ^ () ]+ | \( [ ^ () ]* \) )+ \)/xx;
The regexp matches an open parenthesis, one or more copies of an
alternation, and a close parenthesis. The alternation is two-way, with
the first alternative C<[^()]+> matching a substring with no
parentheses and the second alternative C<\([^()]*\)> matching a
substring delimited by parentheses. The problem with this regexp is
that it is pathological: it has nested indeterminate quantifiers
of the form C<(a+|b)+>. We discussed in Part 1 how nested quantifiers
like this could take an exponentially long time to execute if there
was no match possible. To prevent the exponential blowup, we need to
prevent useless backtracking at some point. This can be done by
enclosing the inner quantifier as an independent subexpression:
$x =~ /\( ( (?> [ ^ () ]+ ) | \([ ^ () ]* \) )+ \)/xx;
Here, C<< (?>[^()]+) >> breaks the degeneracy of string partitioning
by gobbling up as much of the string as possible and keeping it. Then
match failures fail much more quickly.
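As a rough sketch of how the protected regexp behaves, the snippet below runs it against a balanced and an unbalanced string. The outer capture group and the names C<$re>, C<$whole> and C<%matched> are additions made for this illustration.

```perl
use strict;
use warnings;

# Same pattern as above, written with /x only, so the bracketed classes
# contain no spaces (spaces inside [...] need /xx, Perl 5.26 or later).
my $re = qr/\( ( (?> [^()]+ ) | \( [^()]* \) )+ \)/x;

my %matched;
for my $s ("abc(de(fg)h)", "abc(de(fg)h") {
    # The extra outer group captures the whole match without using $&.
    my ($whole) = $s =~ /($re)/;
    $matched{$s} = $whole;
    print "$s => ", (defined $whole ? $whole : "no match"), "\n";
}
```

With the unbalanced string, the attempt starting at the first parenthesis fails quickly, and the regexp then settles for the inner C<(fg)>.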
=head2 Conditional expressions
A I<conditional expression> is a form of if-then-else statement
that allows one to choose which patterns are to be matched, based on
some condition. There are two types of conditional expression:
C<(?(I<condition>)I<yes-regexp>)> and
C<(?(I<condition>)I<yes-regexp>|I<no-regexp>)>.
C<(?(I<condition>)I<yes-regexp>)> is
like an S<C<'if () {}'>> statement in Perl. If the I<condition> is true,
the I<yes-regexp> will be matched. If the I<condition> is false, the
I<yes-regexp> will be skipped and Perl will move onto the next regexp
element. The second form is like an S<C<'if () {} else {}'>> statement
in Perl. If the I<condition> is true, the I<yes-regexp> will be
matched, otherwise the I<no-regexp> will be matched.
The I<condition> can have several forms. The first form is simply an
integer in parentheses C<(I<integer>)>. It is true if the corresponding
backreference C<\I<integer>> matched earlier in the regexp. The same
thing can be done with a name associated with a capture group, written
as C<<< (E<lt>I<name>E<gt>) >>> or C<< ('I<name>') >>. The second form is a bare
zero-width assertion C<(?...)>, either a lookahead, a lookbehind, or a
code assertion (discussed in the next section). The third set of forms
provides tests that return true if the expression is executed within
a recursion (C<(R)>) or is being called from some capturing group,
referenced either by number (C<(R1)>, C<(R2)>,...) or by name
(C<(R&I<name>)>).
The integer or name form of the C<condition> allows us to choose,
with more flexibility, what to match based on what matched earlier in the
regexp. This searches for words of the form C<"$x$x"> or C<"$x$y$y$x">:
% simple_grep '^(\w+)(\w+)?(?(2)\g2\g1|\g1)$' /usr/dict/words
beriberi
coco
couscous
deed
...
toot
toto
tutu
The lookbehind C<condition> allows, along with backreferences,
an earlier part of the match to influence a later part of the
match. For instance,
/[ATGC]+(?(?<=AA)G|C)$/;
matches a DNA sequence such that it either ends in C<AAG>, or some
other base pair combination and C<'C'>. Note that the form is
C<< (?(?<=AA)G|C) >> and not C<< (?((?<=AA))G|C) >>; for the
lookahead, lookbehind or code assertions, the parentheses around the
conditional are not needed.
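The integer form of the condition is perhaps easiest to see in a small, self-contained case. The sketch below is not taken from the examples above: it assumes we want a word that is either bare or wrapped in one pair of parentheses, and the names C<$re> and C<%result> are illustrative.

```perl
use strict;
use warnings;

# Group 1 captures an optional opening parenthesis; the conditional
# (?(1)\)) demands a closing one only if group 1 took part in the match.
my $re = qr/^(\()?\w+(?(1)\))$/;

my %result;
for my $s ("(abc)", "abc", "(abc", "abc)") {
    $result{$s} = ($s =~ $re) ? 1 : 0;
    print "$s: ", ($result{$s} ? "match" : "no match"), "\n";
}
```

Only the first two strings match: the closing parenthesis is required exactly when the opening one was seen.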
=head2 Defining named patterns
Some regular expressions use identical subpatterns in several places.
Starting with Perl 5.10, it is possible to define named subpatterns in
a section of the pattern so that they can be called up by name
anywhere in the pattern. The syntactic pattern for this definition
group is C<< (?(DEFINE)(?<I<name>>I<pattern>)...) >>. An insertion
of a named pattern is written as C<(?&I<name>)>.
The example below illustrates this feature using the pattern for
floating point numbers that was presented earlier on. The three
subpatterns that are used more than once are the optional sign, the
digit sequence for an integer and the decimal fraction. The C<DEFINE>
group at the end of the pattern contains their definition. Notice
that the decimal fraction pattern is the first place where we can
reuse the integer pattern.
/^ (?&osg)\ * ( (?&int)(?&dec)? | (?&dec) )
(?: [eE](?&osg)(?&int) )?
$
(?(DEFINE)
(?<osg>[-+]?) # optional sign
(?<int>\d++) # integer
(?<dec>\.(?&int)) # decimal fraction
)/x
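To try the pattern out, it can be stored with C<qr//> and applied to a few candidate strings. This is only a sketch; the test strings and the names C<$float> and C<%ok> are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $float = qr/^ (?&osg)\ * ( (?&int)(?&dec)? | (?&dec) )
                 (?: [eE](?&osg)(?&int) )?
               $
               (?(DEFINE)
                 (?<osg>[-+]?)         # optional sign
                 (?<int>\d++)          # integer
                 (?<dec>\.(?&int))     # decimal fraction
               )/x;

my %ok;
for my $s ("3.14", "-.5e-3", "+ 42", "1e", "abc") {
    $ok{$s} = ($s =~ $float) ? 1 : 0;
    printf "%-10s %s\n", "'$s'", $ok{$s} ? "number" : "not a number";
}
```

Because the C<DEFINE> group is never matched directly, it adds nothing to the match itself; it only supplies the named subpatterns.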
=head2 Recursive patterns
This feature (introduced in Perl 5.10) significantly extends the
power of Perl's pattern matching. By referring to some other
capture group anywhere in the pattern with the construct
C<(?I<group-ref>)>, the I<pattern> within the referenced group is used
as an independent subpattern in place of the group reference itself.
Because the group reference may be contained I<within> the group it
refers to, it is now possible to apply pattern matching to tasks that
hitherto required a recursive parser.
To illustrate this feature, we'll design a pattern that matches if
a string contains a palindrome. (This is a word or a sentence that,
while ignoring spaces, punctuation and case, reads the same backwards
as forwards.) We begin by observing that the empty string or a string
containing just one word character is a palindrome. Otherwise it must
have a word character up front and the same at its end, with another
palindrome in between.
/(?: (\w) (?...Here be a palindrome...) \g{-1} | \w? )/x
Adding C<\W*> at either end to eliminate what is to be ignored, we already
have the full pattern:
my $pp = qr/^(\W* (?: (\w) (?1) \g{-1} | \w? ) \W*)$/ix;
for $s ( "saippuakauppias", "A man, a plan, a canal: Panama!" ){
print "'$s' is a palindrome\n" if $s =~ /$pp/;
}
In C<(?...)> both absolute and relative backreferences may be used.
The entire pattern can be reinserted with C<(?R)> or C<(?0)>.
If you prefer to name your groups, you can use C<(?&I<name>)> to
recurse into that group.
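Recursion by group number is also a natural fit for nested delimiters. The following sketch, which is not part of the palindrome example, checks whether a string is a single balanced parenthesized group of any depth; C<$balanced> and C<%ok> are names invented here.

```perl
use strict;
use warnings;

# Group 1 matches '(' ... ')' whose contents are runs of non-parentheses
# or, via (?1), further balanced groups.
my $balanced = qr/^ ( \( (?: [^()]++ | (?1) )* \) ) $/x;

my %ok;
for my $s ("(a(b)c)", "()", "(a(b)c", "(()") {
    $ok{$s} = ($s =~ $balanced) ? 1 : 0;
    print "$s: ", ($ok{$s} ? "balanced" : "not balanced"), "\n";
}
```

Note that C<(?1)> refers to a group by its absolute number, so interpolating such a C<qr//> object into a larger pattern with earlier groups would point it at the wrong group; a named group with C<(?&I<name>)> avoids that.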
=head2 A bit of magic: executing Perl code in a regular expression
Normally, regexps are a part of Perl expressions.
I<Code evaluation> expressions turn that around by allowing
arbitrary Perl code to be a part of a regexp. A code evaluation
expression is denoted C<(?{I<code>})>, with I<code> a string of Perl
statements.
Code expressions are zero-width assertions, and the value they return
depends on their environment. There are two possibilities: either the
code expression is used as a conditional in a conditional expression
C<(?(I<condition>)...)>, or it is not. If the code expression is a
conditional, the code is evaluated and the result (I<i.e.>, the result of
the last statement) is used to determine truth or falsehood. If the
code expression is not used as a conditional, the assertion always
evaluates true and the result is put into the special variable
C<$^R>. The variable C<$^R> can then be used in code expressions later
in the regexp. Here are some silly examples:
$x = "abcdef";
$x =~ /abc(?{print "Hi Mom!";})def/; # matches,
# prints 'Hi Mom!'
$x =~ /aaa(?{print "Hi Mom!";})def/; # doesn't match,
# no 'Hi Mom!'
Pay careful attention to the next example:
$x =~ /abc(?{print "Hi Mom!";})ddd/; # doesn't match,
# no 'Hi Mom!'
# but why not?
At first glance, you'd think that it shouldn't print, because obviously
the C<ddd> isn't going to match the target string. But look at this
example:
$x =~ /abc(?{print "Hi Mom!";})[dD]dd/; # doesn't match,
# but _does_ print
Hmm. What happened here? If you've been following along, you know that
the above pattern should be effectively (almost) the same as the last one;
enclosing the C<'d'> in a character class isn't going to change what it
matches. So why does the first not print while the second one does?
The answer lies in the optimizations the regexp engine makes. In the first
case, all the engine sees are plain old characters (aside from the
C<?{}> construct). It's smart enough to realize that the string C<'ddd'>
doesn't occur in our target string before actually running the pattern
through. But in the second case, we've tricked it into thinking that our
pattern is more complicated. It takes a look, sees our
character class, and decides that it will have to actually run the
pattern to determine whether or not it matches, and in the process of
running it hits the print statement before it discovers that we don't
have a match.
To take a closer look at how the engine does optimizations, see the
section L</"Pragmas and debugging"> below.
More fun with C<?{}>:
$x =~ /(?{print "Hi Mom!";})/; # matches,
# prints 'Hi Mom!'
$x =~ /(?{$c = 1;})(?{print "$c";})/; # matches,
# prints '1'
$x =~ /(?{$c = 1;})(?{print "$^R";})/; # matches,
# prints '1'
The bit of magic mentioned in the section title occurs when the regexp
backtracks in the process of searching for a match. If the regexp
backtracks over a code expression and if the variables used within are
localized using C<local>, the changes in the variables produced by the
code expression are undone! Thus, if we wanted to count how many times
a character got matched inside a group, we could use, I<e.g.>,
$x = "aaaa";
$count = 0; # initialize 'a' count
$c = "bob"; # test if $c gets clobbered
$x =~ /(?{local $c = 0;}) # initialize count
( a # match 'a'
(?{local $c = $c + 1;}) # increment count
)* # do this any number of times,
aa # but match 'aa' at the end
(?{$count = $c;}) # copy local $c var into $count
/x;
print "'a' count is $count, \$c variable is '$c'\n";
This prints
'a' count is 2, $c variable is 'bob'
If we replace the S<C< (?{local $c = $c + 1;})>> with
S<C< (?{$c = $c + 1;})>>, the variable changes are I<not> undone
during backtracking, and we get
'a' count is 4, $c variable is 'bob'
Note that only localized variable changes are undone. Other side
effects of code expression execution are permanent. Thus
$x = "aaaa";
$x =~ /(a(?{print "Yow\n";}))*aa/;
produces
Yow
Yow
Yow
Yow
The result C<$^R> is automatically localized, so that it will behave
properly in the presence of backtracking.
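Here is a small sketch of C<$^R> carrying a value from one code expression to the next. The package variable C<$total> is an assumption of this example, used only to get the final value out of the regexp.

```perl
use strict;
use warnings;

our $total;   # package variable that the last code block assigns to

"abc" =~ /a (?{ 1 })             # result so far: 1
          b (?{ $^R + 1 })       # result so far: 2
          c (?{ $total = $^R })  # copy the final value out
         /x;

print "total: $total\n";
```

Each code expression sees in C<$^R> the value returned by the previous one, so the last block receives 2.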
This example uses a code expression in a conditional to match a
definite article, either C<'the'> in English or C<'der|die|das'> in
German:
$lang = 'DE'; # use German
...
$text = "das";
print "matched\n"
if $text =~ /(?(?{
$lang eq 'EN'; # is the language English?
})
the | # if so, then match 'the'
(der|die|das) # else, match 'der|die|das'
)
/xi;
Note that the syntax here is C<(?(?{...})I<yes-regexp>|I<no-regexp>)>, not
C<(?((?{...}))I<yes-regexp>|I<no-regexp>)>. In other words, in the case of a
code expression, we don't need the extra parentheses around the
conditional.
If you try to use code expressions where the code text is contained within
an interpolated variable, rather than appearing literally in the pattern,
Perl may surprise you:
$bar = 5;
$pat = '(?{ 1 })';
/foo(?{ $bar })bar/; # compiles ok, $bar not interpolated
/foo(?{ 1 })$bar/; # compiles ok, $bar interpolated
/foo${pat}bar/; # compile error!
$pat = qr/(?{ $foo = 1 })/; # precompile code regexp
/foo${pat}bar/; # compiles ok
If a regexp has a variable that interpolates a code expression, Perl
treats the regexp as an error. If the code expression is precompiled into
a variable, however, interpolating is ok. The question is, why is this an
error?
The reason is that variable interpolation and code expressions
together pose a security risk. The combination is dangerous because
many programmers who write search engines often take user input and
plug it directly into a regexp:
$regexp = <>; # read user-supplied regexp
chomp $regexp; # get rid of possible newline
$text =~ /$regexp/; # search $text for the $regexp
If the C<$regexp> variable contains a code expression, the user could
then execute arbitrary Perl code. For instance, some joker could
search for S<C<(?{ system('rm -rf *'); })>> to erase your files. In this
sense, the combination of interpolation and code expressions I<taints>
your regexp. So by default, using both interpolation and code
expressions in the same regexp is not allowed. If you're not
concerned about malicious users, it is possible to bypass this
security check by invoking S<C<use re 'eval'>>:
use re 'eval'; # throw caution out the door
$bar = 5;
$pat = '(?{ 1 })';
/foo${pat}bar/; # compiles ok
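When the user's input is meant to be a literal search string rather than a regexp, quoting it with C<\Q...\E> sidesteps the problem altogether. The sketch below assumes a hostile input string and illustrative variable names; it first shows perl rejecting the interpolated code expression, then the quoted search succeeding.

```perl
use strict;
use warnings;

my $input = '(?{ print "gotcha\n" })';   # hostile "search term"
my $text  = 'log line: (?{ print "gotcha\n" }) was typed';

# Interpolated as a pattern: perl refuses at run time (no "use re 'eval'").
my $compiled = eval { $text =~ /$input/; 1 } ? 1 : 0;
my $error    = $@;

# Interpolated as a literal string: \Q...\E quotes every metacharacter.
my $found = ($text =~ /\Q$input\E/) ? 1 : 0;

print "as pattern: ", ($compiled ? "compiled" : "rejected"), "\n";
print "as literal: ", ($found ? "found" : "not found"), "\n";
```

In neither case is the embedded code executed.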
Another form of code expression is the I<pattern code expression>.
The pattern code expression is like a regular code expression, except
that the result of the code evaluation is treated as a regular
expression and matched immediately. A simple example is
$length = 5;
$char = 'a';
$x = 'aaaaabb';
$x =~ /(??{$char x $length})/x; # matches, there are 5 of 'a'
This final example contains both ordinary and pattern code
expressions. It detects whether a binary string C<1101010010001...> has a
Fibonacci spacing 0,1,1,2,3,5,... of the C<'1'>'s:
$x = "1101010010001000001";
$z0 = ''; $z1 = '0'; # initial conditions
print "It is a Fibonacci sequence\n"
if $x =~ /^1 # match an initial '1'
(?:
((??{ $z0 })) # match some '0'
1 # and then a '1'
(?{ $z0 = $z1; $z1 .= $^N; })
)+ # repeat as needed
$ # that is all there is
/x;
printf "Largest sequence matched was %d\n", length($z1)-length($z0);
Remember that C<$^N> is set to whatever was matched by the last
completed capture group. This prints
It is a Fibonacci sequence
Largest sequence matched was 5
Ha! Try that with your garden variety regexp package...
Note that the variables C<$z0> and C<$z1> are not substituted when the
regexp is compiled, as happens for ordinary variables outside a code
expression. Rather, the whole code block is parsed as perl code at the
same time as perl is compiling the code containing the literal regexp
pattern.
This regexp without the C</x> modifier is
/^1(?:((??{ $z0 }))1(?{ $z0 = $z1; $z1 .= $^N; }))+$/
which shows that spaces are still possible in the code parts. Nevertheless,
when working with code and conditional expressions, the extended form of
regexps is almost necessary in creating and debugging regexps.
=head2 Backtracking control verbs
Perl 5.10 introduced a number of control verbs intended to provide
detailed control over the backtracking process, by directly influencing
the regexp engine and by providing monitoring techniques. See
L<perlre/"Special Backtracking Control Verbs"> for a detailed
description.
Below is just one example, illustrating the control verb C<(*FAIL)>,
which may be abbreviated as C<(*F)>. If this is inserted in a regexp
it will cause it to fail, just as it would at some
mismatch between the pattern and the string. Processing
of the regexp continues as it would after any "normal"
failure, so that, for instance, the next position in the string or another
alternative will be tried. As failing to match doesn't preserve capture
groups or produce results, it may be necessary to use this in
combination with embedded code.
%count = ();
"supercalifragilisticexpialidocious" =~
/([aeiou])(?{ $count{$1}++; })(*FAIL)/i;
printf "%3d '%s'\n", $count{$_}, $_ for (sort keys %count);
The pattern begins with a class matching a subset of letters. Whenever
this matches, a statement like C<$count{'a'}++;> is executed, incrementing
the letter's counter. Then C<(*FAIL)> does what it says, and
the regexp engine proceeds according to the book: as long as the end of
the string hasn't been reached, the position is advanced before looking
for another vowel. Thus, match or no match makes no difference, and the
regexp engine proceeds until the entire string has been inspected.
(It's remarkable that an alternative solution using something like
$count2{lc($_)}++ for split('', "supercalifragilisticexpialidocious");
printf "%3d '%s'\n", $count2{$_}, $_ for ( qw{ a e i o u } );
is considerably slower.)
=head2 Pragmas and debugging
Speaking of debugging, there are several pragmas available to control
and debug regexps in Perl. We have already encountered one pragma in
the previous section, S<C<use re 'eval';>>, that allows variable
interpolation and code expressions to coexist in a regexp. The other
pragmas are
use re 'taint';
$tainted = <>;
@parts = ($tainted =~ /(\w+)\s+(\w+)/); # @parts is now tainted
The C<taint> pragma causes any substrings from a match with a tainted
variable to be tainted as well. This is not normally the case, as
regexps are often used to extract the safe bits from a tainted
variable. Use C<taint> when you are not extracting safe bits, but are
performing some other processing. Both C<taint> and C<eval> pragmas
are lexically scoped, which means they are in effect only until
the end of the block enclosing the pragmas.
use re '/m'; # or any other flags
$multiline_string =~ /^foo/; # /m is implied
The C<re '/flags'> pragma (introduced in Perl
5.14) turns on the given regular expression flags
until the end of the lexical scope. See
L<re/"'E<sol>flags' mode"> for more
detail.
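The lexical scope of the flags pragma can be seen by matching the same string inside and outside a block. This is a minimal sketch with made-up variable names.

```perl
use strict;
use warnings;

my $s = "bar\nfoo";
my ($inside, $outside);

{
    use re '/m';                         # /m applies until the closing brace
    $inside = ($s =~ /^foo/) ? 1 : 0;
}
$outside = ($s =~ /^foo/) ? 1 : 0;       # back to the default: ^ is start of string

print "inside block: $inside, outside block: $outside\n";
```

Inside the block C<^> matches after the embedded newline; outside it does not.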
use re 'debug';
/^(.*)$/s; # output debugging info
use re 'debugcolor';
/^(.*)$/s; # output debugging info in living color
The global C<debug> and C<debugcolor> pragmas allow one to get
detailed debugging info about regexp compilation and
execution. C<debugcolor> is the same as debug, except the debugging
information is displayed in color on terminals that can display
termcap color sequences. Here is example output:
% perl -e 'use re "debug"; "abc" =~ /a*b+c/;'
Compiling REx 'a*b+c'
size 9 first at 1
1: STAR(4)
2: EXACT <a>(0)
4: PLUS(7)
5: EXACT <b>(0)
7: EXACT <c>(9)
9: END(0)
floating 'bc' at 0..2147483647 (checking floating) minlen 2
Guessing start of match, REx 'a*b+c' against 'abc'...
Found floating substr 'bc' at offset 1...
Guessed: match at offset 0
Matching REx 'a*b+c' against 'abc'
Setting an EVAL scope, savestack=3
0 <> <abc> | 1: STAR
EXACT <a> can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
1 <a> <bc> | 4: PLUS
EXACT <b> can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
2 <ab> <c> | 7: EXACT <c>
3 <abc> <> | 9: END
Match successful!
Freeing REx: 'a*b+c'
If you have gotten this far into the tutorial, you can probably guess
what the different parts of the debugging output tell you. The first
part
Compiling REx 'a*b+c'
size 9 first at 1
1: STAR(4)
2: EXACT <a>(0)
4: PLUS(7)
5: EXACT <b>(0)
7: EXACT <c>(9)
9: END(0)
describes the compilation stage. C<STAR(4)> means that there is a
starred object, in this case C<'a'>, and if it matches, goto line 4,
I<i.e.>, C<PLUS(7)>. The middle lines describe some heuristics and
optimizations performed before a match:
floating 'bc' at 0..2147483647 (checking floating) minlen 2
Guessing start of match, REx 'a*b+c' against 'abc'...
Found floating substr 'bc' at offset 1...
Guessed: match at offset 0
Then the match is executed and the remaining lines describe the
process:
Matching REx 'a*b+c' against 'abc'
Setting an EVAL scope, savestack=3
0 <> <abc> | 1: STAR
EXACT <a> can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
1 <a> <bc> | 4: PLUS
EXACT <b> can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
2 <ab> <c> | 7: EXACT <c>
3 <abc> <> | 9: END
Match successful!
Freeing REx: 'a*b+c'
Each step is of the form S<C<< n <x> <y> >>>, with C<< <x> >> the
part of the string matched and C<< <y> >> the part not yet
matched. The S<C<< | 1: STAR >>> says that Perl is at line number 1
in the compilation list above. See
L<perldebguts/"Debugging Regular Expressions"> for much more detail.
An alternative method of debugging regexps is to embed C<print>
statements within the regexp. This provides a blow-by-blow account of
the backtracking in an alternation:
"that this" =~ m@(?{print "Start at position ", pos, "\n";})
t(?{print "t1\n";})
h(?{print "h1\n";})
i(?{print "i1\n";})
s(?{print "s1\n";})
|
t(?{print "t2\n";})
h(?{print "h2\n";})
a(?{print "a2\n";})
t(?{print "t2\n";})
(?{print "Done at position ", pos, "\n";})
@x;
prints
Start at position 0
t1
h1
t2
h2
a2
t2
Done at position 4
=head1 SEE ALSO
This is just a tutorial. For the full story on Perl regular
expressions, see the L<perlre> regular expressions reference page.
For more information on the matching C<m//> and substitution C<s///>
operators, see L<perlop/"Regexp Quote-Like Operators">. For
information on the C<split> operation, see L<perlfunc/split>.
For an excellent all-around resource on the care and feeding of
regular expressions, see the book I<Mastering Regular Expressions> by
Jeffrey Friedl (published by O'Reilly, ISBN 1-56592-257-3).
=head1 AUTHOR AND COPYRIGHT
Copyright (c) 2000 Mark Kvale.
All rights reserved.
Now maintained by Perl porters.
This document may be distributed under the same terms as Perl itself.
=head2 Acknowledgments
The inspiration for the stop codon DNA example came from the ZIP
code example in chapter 7 of I<Mastering Regular Expressions>.
The author would like to thank Jeff Pinyan, Andrew Johnson, Peter
Haworth, Ronald J Kimball, and Joe Smith for all their helpful
comments.
=cut
=head1 NAME
perlvar - Perl predefined variables
=head1 DESCRIPTION
=head2 The Syntax of Variable Names
Variable names in Perl can have several formats. Usually, they
must begin with a letter or underscore, in which case they can be
arbitrarily long (up to an internal limit of 251 characters) and
may contain letters, digits, underscores, or the special sequence
C<::> or C<'>. In this case, the part before the last C<::> or
C<'> is taken to be a I<package qualifier>; see L<perlmod>.
A Unicode letter that is not ASCII is not considered to be a letter
unless S<C<"use utf8">> is in effect, and somewhat more complicated
rules apply; see L<perldata/Identifier parsing> for details.
Perl variable names may also be a sequence of digits, a single
punctuation character, or the two-character sequence: C<^> (caret or
CIRCUMFLEX ACCENT) followed by any one of the characters C<[][A-Z^_?\]>.
These names are all reserved for
special uses by Perl; for example, the all-digits names are used
to hold data captured by backreferences after a regular expression
match.
Since Perl v5.6.0, Perl variable names may also be alphanumeric strings
preceded by a caret. These must all be written in the form C<${^Foo}>;
the braces are not optional. C<${^Foo}> denotes the scalar variable
whose name is considered to be a control-C<F> followed by two C<o>'s.
These variables are
reserved for future special uses by Perl, except for the ones that
begin with C<^_> (caret-underscore). No
name that begins with C<^_> will acquire a special
meaning in any future version of Perl; such names may therefore be
used safely in programs. C<$^_> itself, however, I<is> reserved.
Perl identifiers that begin with digits or
punctuation characters are exempt from the effects of the C<package>
declaration and are always forced to be in package C<main>; they are
also exempt from C<strict 'vars'> errors. A few other names are also
exempt in these ways:
ENV STDIN
INC STDOUT
ARGV STDERR
ARGVOUT
SIG
In particular, the special C<${^_XYZ}> variables are always taken
to be in package C<main>, regardless of any C<package> declarations
presently in scope.
=head1 SPECIAL VARIABLES
The following names have special meaning to Perl. Most punctuation
names have reasonable mnemonics, or analogs in the shells.
Nevertheless, if you wish to use long variable names, you need only say:
use English;
at the top of your program. This aliases all the short names to the long
names in the current package. Some even have medium names, generally
borrowed from B<awk>. For more info, please see L<English>.
Before you continue, note the sort order for variables. In general, we
first list the variables in case-insensitive, almost-lexicographical
order (ignoring the C<{> or C<^> preceding words, as in C<${^UNICODE}>
or C<$^T>), although C<$_> and C<@_> move up to the top of the pile.
For variables with the same identifier, we list them in order of scalar,
array, hash, and bareword.
=head2 General Variables
=over 8
=item $ARG
=item $_
X<$_> X<$ARG>
The default input and pattern-searching space. The following pairs are
equivalent:
while (<>) {...} # equivalent only in while!
while (defined($_ = <>)) {...}
/^Subject:/
$_ =~ /^Subject:/
tr/a-z/A-Z/
$_ =~ tr/a-z/A-Z/
chomp
chomp($_)
Here are the places where Perl will assume C<$_> even if you don't use it:
=over 3
=item *
The following functions use C<$_> as a default argument:
abs, alarm, chomp, chop, chr, chroot,
cos, defined, eval, evalbytes, exp, fc, glob, hex, int, lc,
lcfirst, length, log, lstat, mkdir, oct, ord, pos, print, printf,
quotemeta, readlink, readpipe, ref, require, reverse (in scalar context only),
rmdir, say, sin, split (for its second
argument), sqrt, stat, study, uc, ucfirst,
unlink, unpack.
=item *
All file tests (C<-f>, C<-d>) except for C<-t>, which defaults to STDIN.
See L<perlfunc/-X>.
=item *
The pattern matching operations C<m//>, C<s///> and C<tr///> (aka C<y///>)
when used without an C<=~> operator.
=item *
The default iterator variable in a C<foreach> loop if no other
variable is supplied.
=item *
The implicit iterator variable in the C<grep()> and C<map()> functions.
=item *
The implicit variable of C<given()>.
=item *
The default place to put the next value or input record
when a C<< <FH> >>, C<readline>, C<readdir> or C<each>
operation's result is tested by itself as the sole criterion of a C<while>
test. Outside a C<while> test, this will not happen.
=back
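A few of the implicit uses listed above, gathered into one short sketch. The word list and variable names are illustrative.

```perl
use strict;
use warnings;

my @words = qw(apple Banana cherry);

my @upper  = map  { uc } @words;     # uc() with no argument works on $_
my @with_a = grep { /a/ } @words;    # a bare m// is matched against $_

my $total = 0;
$total += length for @words;         # foreach aliases $_; length() reads it

print "@upper\n";
print "@with_a\n";
print "$total\n";
```

In each case C<$_> is set and read without ever being named.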
C<$_> is by default a global variable. However, as
of perl v5.10.0, you can use a lexical version of
C<$_> by declaring it in a file or in a block with C<my>. Moreover,
declaring C<our $_> restores the global C<$_> in the current scope. Though
this seemed like a good idea at the time it was introduced, lexical C<$_>
actually causes more problems than it solves. If you call a function that
expects to be passed information via C<$_>, it may or may not work,
depending on how the function is written, and there is no easy way to
solve this. Just avoid lexical C<$_>, unless you are feeling particularly
masochistic. For this reason lexical C<$_> is still experimental and will
produce a warning unless warnings have been disabled. As with other
experimental features, the behavior of lexical C<$_> is subject to change
without notice, including change into a fatal error.
Mnemonic: underline is understood in certain operations.
=item @ARG
=item @_
X<@_> X<@ARG>
Within a subroutine the array C<@_> contains the parameters passed to
that subroutine. Inside a subroutine, C<@_> is the default array for
the array operators C<pop> and C<shift>.
See L<perlsub>.
=item $LIST_SEPARATOR
=item $"
X<$"> X<$LIST_SEPARATOR>
When an array or an array slice is interpolated into a double-quoted
string or a similar context such as C</.../>, its elements are
separated by this value. Default is a space. For example, this:
print "The array is: @array\n";
is equivalent to this:
print "The array is: " . join($", @array) . "\n";
Mnemonic: works in double-quoted context.
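Because C<$"> is global, it is usually changed with C<local> inside a small scope. A minimal sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

my $default = "@array";                              # elements joined by a space
my $custom  = do { local $" = ', '; "@array" };      # joined by ', ' in this block only
my $after   = "@array";                              # the space is back

print "$default\n$custom\n$after\n";
```

Once the C<do> block ends, the previous separator is restored automatically.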
=item $PROCESS_ID
=item $PID
=item $$
X<$$> X<$PID> X<$PROCESS_ID>
The process number of the Perl running this script. Though you I<can> set
this variable, doing so is generally discouraged, although it can be
invaluable for some testing purposes. It will be reset automatically
across C<fork()> calls.
Note for Linux and Debian GNU/kFreeBSD users: Before Perl v5.16.0 perl
would emulate POSIX semantics on Linux systems using LinuxThreads, a
partial implementation of POSIX Threads that has since been superseded
by the Native POSIX Thread Library (NPTL).
LinuxThreads is now obsolete on Linux, and caching C<getpid()>
like this made embedding perl unnecessarily complex (since you'd have
to manually update the value of $$), so now C<$$> and C<getppid()>
will always return the same values as the underlying C library.
Debian GNU/kFreeBSD systems also used LinuxThreads up until and
including the 6.0 release, but after that moved to FreeBSD thread
semantics, which are POSIX-like.
To see if your system is affected by this discrepancy, check whether
C<getconf GNU_LIBPTHREAD_VERSION | grep -q NPTL> returns a false
value. NPTL threads preserve the POSIX semantics.
Mnemonic: same as shells.
=item $PROGRAM_NAME
=item $0
X<$0> X<$PROGRAM_NAME>
Contains the name of the program being executed.
On some (but not all) operating systems assigning to C<$0> modifies
the argument area that the C<ps> program sees. On some platforms you
may have to use special C<ps> options or a different C<ps> to see the
changes. Modifying the C<$0> is more useful as a way of indicating the
current program state than it is for hiding the program you're
running.
Note that there are platform-specific limitations on the maximum
length of C<$0>. In the most extreme case it may be limited to the
space occupied by the original C<$0>.
On some platforms there may be an arbitrary amount of padding, for
example space characters, after the modified name as shown by C<ps>.
On some platforms this padding may extend all the way to the original
length of the argument area, no matter what you do (this is the case
for example with Linux 2.2).
Note for BSD users: setting C<$0> does not completely remove "perl"
from the ps(1) output. For example, setting C<$0> to C<"foobar"> may
result in C<"perl: foobar (perl)"> (whether both the C<"perl: "> prefix
and the " (perl)" suffix are shown depends on your exact BSD variant
and version). This is an operating system feature; Perl cannot help it.
In multithreaded scripts Perl coordinates the threads so that any
thread may modify its copy of the C<$0> and the change becomes visible
to ps(1) (assuming the operating system plays along). Note that
the view of C<$0> the other threads have will not change since they
have their own copies of it.
If the program has been given to perl via the switches C<-e> or C<-E>,
C<$0> will contain the string C<"-e">.
On Linux as of perl v5.14.0 the legacy process name will be set with
C<prctl(2)>, in addition to altering the POSIX name via C<argv[0]> as
perl has done since version 4.000. Now system utilities that read the
legacy process name such as ps, top and killall will recognize the
name you set when assigning to C<$0>. The string you supply will be
cut off at 16 bytes; this is a limitation imposed by Linux.
Mnemonic: same as B<sh> and B<ksh>.
=item $REAL_GROUP_ID
=item $GID
=item $(
X<$(> X<$GID> X<$REAL_GROUP_ID>
The real gid of this process. If you are on a machine that supports
membership in multiple groups simultaneously, this gives a space-separated
list of groups you are in. The first number is the one returned by
C<getgid()>, and the subsequent ones by C<getgroups()>, one of which may be
the same as the first number.
However, a value assigned to C<$(> must be a single number used to
set the real gid. So the value given by C<$(> should I<not> be assigned
back to C<$(> without being forced numeric, such as by adding zero. Note
that this is different to the effective gid (C<$)>) which does take a
list.
You can change both the real gid and the effective gid at the same
time by using C<POSIX::setgid()>. Changes
to C<$(> require a check to C<$!>
to detect any possible errors after an attempted change.
Mnemonic: parentheses are used to I<group> things. The real gid is the
group you I<left>, if you're running setgid.
=item $EFFECTIVE_GROUP_ID
=item $EGID
=item $)
X<$)> X<$EGID> X<$EFFECTIVE_GROUP_ID>
The effective gid of this process. If you are on a machine that
supports membership in multiple groups simultaneously, this gives a
space-separated list of groups you are in. The first number is the one
returned by C<getegid()>, and the subsequent ones by C<getgroups()>,
one of which may be the same as the first number.
Similarly, a value assigned to C<$)> must also be a space-separated
list of numbers. The first number sets the effective gid, and
the rest (if any) are passed to C<setgroups()>. To get the effect of an
empty list for C<setgroups()>, just repeat the new effective gid; that is,
to force an effective gid of 5 and an effectively empty C<setgroups()>
list, say C< $) = "5 5" >.
You can change both the effective gid and the real gid at the same
time by using C<POSIX::setgid()> (use only a single numeric argument).
Changes to C<$)> require a check to C<$!> to detect any possible errors
after an attempted change.
C<< $< >>, C<< $> >>, C<$(> and C<$)> can be set only on
machines that support the corresponding I<set[re][ug]id()> routine. C<$(>
and C<$)> can be swapped only on machines supporting C<setregid()>.
Mnemonic: parentheses are used to I<group> things. The effective gid
is the group that's I<right> for you, if you're running setgid.
=item $REAL_USER_ID
=item $UID
=item $<
X<< $< >> X<$UID> X<$REAL_USER_ID>
The real uid of this process. You can change both the real uid and the
effective uid at the same time by using C<POSIX::setuid()>. Since
changes to C<< $< >> require a system call, check C<$!> after a change
attempt to detect any possible errors.
Mnemonic: it's the uid you came I<from>, if you're running setuid.
=item $EFFECTIVE_USER_ID
=item $EUID
=item $>
X<< $> >> X<$EUID> X<$EFFECTIVE_USER_ID>
The effective uid of this process. For example:
$< = $>; # set real to effective uid
($<,$>) = ($>,$<); # swap real and effective uids
You can change both the effective uid and the real uid at the same
time by using C<POSIX::setuid()>. Changes to C<< $> >> require a check
to C<$!> to detect any possible errors after an attempted change.
C<< $< >> and C<< $> >> can be swapped only on machines
supporting C<setreuid()>.
Mnemonic: it's the uid you went I<to>, if you're running setuid.
=item $SUBSCRIPT_SEPARATOR
=item $SUBSEP
=item $;
X<$;> X<$SUBSEP> X<$SUBSCRIPT_SEPARATOR>
The subscript separator for multidimensional array emulation. If you
refer to a hash element as
$foo{$x,$y,$z}
it really means
$foo{join($;, $x, $y, $z)}
But don't put
@foo{$x,$y,$z} # a slice--note the @
which means
($foo{$x},$foo{$y},$foo{$z})
Default is "\034", the same as SUBSEP in B<awk>. If your keys contain
binary data there might not be any safe value for C<$;>.
Consider using "real" multidimensional arrays as described
in L<perllol>.
Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
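The following sketch shows the emulation at work: an element stored with a comma-separated subscript can be found again with an explicit C<join>, and the key can be split back into its parts. The hash name C<%cell> and the sample subscripts are made up for illustration.

```perl
use strict;
use warnings;

my %cell;
$cell{2, 3} = 'x';                  # stored under join($;, 2, 3)

my $sep   = $;;                     # copy the separator for use in a pattern
my ($key) = keys %cell;
my @subs  = split /\Q$sep\E/, $key; # recover the original subscripts

print "subscripts: @subs\n";
print "same element\n" if $cell{ join $;, 2, 3 } eq 'x';
```

Splitting the key this way is only reliable when the subscripts themselves cannot contain the separator.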
=item $a
=item $b
X<$a> X<$b>
Special package variables used by C<sort()>; see L<perlfunc/sort>.
Because of this specialness C<$a> and C<$b> don't need to be declared
(using C<use vars>, or C<our()>) even when using the C<strict 'vars'>
pragma. Don't lexicalize them with C<my $a> or C<my $b> if you want to
be able to use them in the C<sort()> comparison block or function.
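A minimal sketch of both ways C<$a> and C<$b> are typically used, under C<strict>, without declaring them. The subroutine name C<by_length_then_alpha> and the sample data are illustrative.

```perl
use strict;
use warnings;

my @nums = (10, 9, 100, 1);

# sort() sets $a and $b for each comparison; no declaration is needed.
my @ascending = sort { $a <=> $b } @nums;

# A named comparison routine sees the same package variables.
sub by_length_then_alpha { length($a) <=> length($b) or $a cmp $b }
my @words = sort by_length_then_alpha qw(pear fig apple kiwi);

print "@ascending\n";   # 1 9 10 100
print "@words\n";       # fig kiwi pear apple
```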
=item %ENV
X<%ENV>
The hash C<%ENV> contains your current environment. Setting a
value in C<%ENV> changes the environment for any child processes
you subsequently C<fork()> off.
As of v5.18.0, both keys and values stored in C<%ENV> are stringified.
my $foo = 1;
$ENV{'bar'} = \$foo;
if( ref $ENV{'bar'} ) {
say "Pre 5.18.0 Behaviour";
} else {
say "Post 5.18.0 Behaviour";
}
Previously, only child processes received stringified values:
my $foo = 1;
$ENV{'bar'} = \$foo;
# Always printed 'non ref'
system($^X, '-e',
q/print ( ref $ENV{'bar'} ? 'ref' : 'non ref' ) /);
This happens because you can't really share arbitrary data structures with
foreign processes.
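To limit an environment change to the child processes started from one block, a single element of C<%ENV> can be localized. This is a sketch only: C<DEMO_SETTING> is a made-up variable name, and the list form of a piped C<open> is assumed to be available on the platform.

```perl
use strict;
use warnings;

# DEMO_SETTING is a hypothetical name used only for this sketch.
delete $ENV{DEMO_SETTING};

my $seen;
{
    local $ENV{DEMO_SETTING} = 42;      # stringified to "42"

    # Start a child perl and read back what it sees in its environment.
    open my $child, '-|', $^X, '-e', 'print $ENV{DEMO_SETTING}'
        or die "cannot run $^X: $!";
    $seen = <$child>;
    close $child;
}

print "child saw: $seen\n";
print "parent restored\n" unless exists $ENV{DEMO_SETTING};
```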
=item $OLD_PERL_VERSION
=item $]
X<$]> X<$OLD_PERL_VERSION>
The revision, version, and subversion of the Perl interpreter, represented
as a decimal of the form 5.XXXYYY, where XXX is the version / 1e3 and YYY
is the subversion / 1e6. For example, Perl v5.10.1 would be "5.010001".
This variable can be used to determine whether the Perl interpreter
executing a script is in the right range of versions:
warn "No PerlIO!\n" if $] lt '5.008';
When comparing C<$]>, string comparison operators are B<highly
recommended>. The inherent limitations of binary floating point
representation can sometimes lead to incorrect comparisons for some
numbers on some architectures.
See also the documentation of C<use VERSION> and C<require VERSION>
for a convenient way to fail if the running Perl interpreter is too old.
See L</$^V> for a representation of the Perl version as a L<version>
object, which allows more flexible string comparisons.
The main advantage of C<$]> over C<$^V> is that it works the same on any
version of Perl. The disadvantages are that it can't easily be compared
to versions in other formats (e.g. literal v-strings, "v1.2.3" or
version objects) and numeric comparisons can occasionally fail; it's good
for string literal version checks and bad for comparing to a variable
that hasn't been sanity-checked.
The C<$OLD_PERL_VERSION> form was added in Perl v5.20.0 for historical
reasons but its use is discouraged. (If your reason to use C<$]> is to
run code on old perls then referring to it as C<$OLD_PERL_VERSION> would
be self-defeating.)
Mnemonic: Is this version of perl in the right bracket?
=item $SYSTEM_FD_MAX
=item $^F
X<$^F> X<$SYSTEM_FD_MAX>
The maximum system file descriptor, ordinarily 2. System file
descriptors are passed to C<exec()>ed processes, while higher file
descriptors are not. Also, during an
C<open()>, system file descriptors are
preserved even if the C<open()> fails (ordinary file descriptors are
closed before the C<open()> is attempted). The close-on-exec
status of a file descriptor will be decided according to the value of
C<$^F> when the corresponding file, pipe, or socket was opened, not the
time of the C<exec()>.
=item @F
X<@F>
The array C<@F> contains the fields of each line read in when autosplit
mode is turned on. See L<perlrun> for the B<-a> switch. This array
is package-specific, and must be declared or given a full package name
if not in package main when running under C<strict 'vars'>.
=item @INC
X<@INC>
The array C<@INC> contains the list of places that the C<do EXPR>,
C<require>, or C<use> constructs look for their library files. It
initially consists of the arguments to any B<-I> command-line
switches, followed by the default Perl library, probably
F</usr/local/lib/perl>, followed by ".", to represent the current
directory. ("." will not be appended if taint checks are enabled,
either by C<-T> or by C<-t>, or if configured not to do so by the
C<-Ddefault_inc_excludes_dot> compile time option.) If you need to
modify this at runtime, you should use the C<use lib> pragma to get
the machine-dependent library properly loaded also:
use lib '/mypath/libdir/';
use SomeMod;
You can also insert hooks into the file inclusion system by putting Perl
code directly into C<@INC>. Those hooks may be subroutine references,
array references or blessed objects. See L<perlfunc/require> for details.
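As an illustration of the subroutine-reference form, the sketch below serves a module from a string held in memory. The module name C<My::Virtual> and the C<%virtual> table are invented for this example; see L<perlfunc/require> for the full set of values a hook may return.

```perl
use strict;
use warnings;

# Hypothetical module source, keyed by the path that require() looks for.
my %virtual = (
    'My/Virtual.pm' => "package My::Virtual; sub hello { 'hi' } 1;\n",
);

unshift @INC, sub {
    my ($self, $file) = @_;
    my $source = $virtual{$file} or return;   # decline: keep searching @INC
    open my $fh, '<', \$source or die "in-memory open failed: $!";
    return $fh;                               # require() reads the code from here
};

require My::Virtual;
print My::Virtual::hello(), "\n";
```

Returning an empty list tells C<require> to carry on with the remaining C<@INC> entries, so ordinary modules still load as usual.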
=item %INC
X<%INC>
The hash C<%INC> contains entries for each filename included via the
C<do>, C<require>, or C<use> operators. The key is the filename
you specified (with module names converted to pathnames), and the
value is the location of the file found. The C<require>
operator uses this hash to determine whether a particular file has
already been included.
If the file was loaded via a hook (e.g. a subroutine reference, see
L<perlfunc/require> for a description of these hooks), this hook is
by default inserted into C<%INC> in place of a filename. Note, however,
that the hook may have set the C<%INC> entry by itself to provide some more
specific info.
=item $INPLACE_EDIT
=item $^I
X<$^I> X<$INPLACE_EDIT>
The current value of the inplace-edit extension. Use C<undef> to disable
inplace editing.
Mnemonic: value of B<-i> switch.
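In-place editing can also be enabled from inside a program by localizing C<$^I> together with C<@ARGV>. The sketch below is one way to do it; the scratch file, its contents and the C<.bak> suffix are arbitrary choices for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file to edit; the contents are sample text only.
my ($tmp, $path) = tempfile();
print {$tmp} "colour one\ncolour two\n";
close $tmp or die "close failed: $!";

{
    # Roughly what running with -i.bak on this one file would do.
    local $^I   = '.bak';
    local @ARGV = ($path);
    while (<>) {
        s/colour/color/;
        print;              # written to the edited file, not the terminal
    }
}

open my $in, '<', $path or die "cannot reopen $path: $!";
my @lines = <$in>;
close $in;
print @lines;

unlink $path, "$path.bak";  # remove the scratch file and its backup
```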
=item @ISA
X<@ISA>
Each package contains a special array called C<@ISA> which contains a list
of that class's parent classes, if any. This array is simply a list of
scalars, each of which is a string that corresponds to a package name. The
array is examined when Perl does method resolution, which is covered in
L<perlobj>.
To load packages while adding them to C<@ISA>, see the L<parent> pragma. The
discouraged L<base> pragma does this as well, but should not be used except
when compatibility with the discouraged L<fields> pragma is required.
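A short sketch of C<@ISA> driving method resolution. C<Animal> and C<Dog> are hypothetical classes defined in the same file, so no loading pragma is needed here.

```perl
use strict;
use warnings;

package Animal;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub speak { my $self = shift; return ref($self) . ' makes a sound' }

package Dog;
our @ISA = ('Animal');      # Dog inherits from Animal
sub speak { my $self = shift; return $self->SUPER::speak() . ': Woof' }

package main;

my $dog = Dog->new(name => 'Rex');
print $dog->speak, "\n";            # Dog makes a sound: Woof
print "is an Animal\n" if $dog->isa('Animal');
```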
=item $^M
X<$^M>
By default, running out of memory is an untrappable, fatal error.
However, if suitably built, Perl can use the contents of C<$^M>
as an emergency memory pool after C<die()>ing. Suppose that your Perl
were compiled with C<-DPERL_EMERGENCY_SBRK> and used Perl's malloc.
Then
$^M = 'a' x (1 << 16);
would allocate a 64K buffer for use in an emergency. See the
F<INSTALL> file in the Perl distribution for information on how to
add custom C compilation flags when compiling perl. To discourage casual
use of this advanced feature, there is no L<English|English> long name for
this variable.
This variable was added in Perl 5.004.
=item $OSNAME
=item $^O
X<$^O> X<$OSNAME>
The name of the operating system under which this copy of Perl was
built, as determined during the configuration process. For examples
see L<perlport/PLATFORMS>.
The value is identical to C<$Config{'osname'}>. See also L<Config>
and the B<-V> command-line switch documented in L<perlrun>.
On Windows platforms, C<$^O> is not very helpful: since it is always
C<MSWin32>, it doesn't distinguish between
95/98/ME/NT/2000/XP/CE/.NET. Use C<Win32::GetOSName()> or
C<Win32::GetOSVersion()> (see L<Win32> and L<perlport>) to distinguish
between the variants.
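A typical use of C<$^O> is a simple platform switch. The sketch below is illustrative only; the C<$null_device> choice is an example of a platform-dependent value, not a recommendation over L<File::Spec>.

```perl
use strict;
use warnings;
use Config;

my $is_windows  = $^O eq 'MSWin32';
my $null_device = $is_windows ? 'NUL' : '/dev/null';

print "perl was built for: $^O\n";
print "null device:        $null_device\n";
print "matches Config\n" if $^O eq $Config{osname};
```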
This variable was added in Perl 5.003.
=item %SIG
X<%SIG>
The hash C<%SIG> contains signal handlers for signals. For example:
sub handler { # 1st argument is signal name
my($sig) = @_;
print "Caught a SIG$sig--shutting down\n";
close(LOG);
exit(0);
}
$SIG{'INT'} = \&handler;
$SIG{'QUIT'} = \&handler;
...
$SIG{'INT'} = 'DEFAULT'; # restore default action
$SIG{'QUIT'} = 'IGNORE'; # ignore SIGQUIT
Using a value of C<'IGNORE'> usually has the effect of ignoring the
signal, except for the C<CHLD> signal. See L<perlipc> for more about
this special case.
Here are some other examples:
$SIG{"PIPE"} = "Plumber"; # assumes main::Plumber (not
# recommended)
$SIG{"PIPE"} = \&Plumber; # just fine; assume current
# Plumber
$SIG{"PIPE"} = *Plumber; # somewhat esoteric
$SIG{"PIPE"} = Plumber(); # oops, what did Plumber()
# return??
Be sure not to use a bareword as the name of a signal handler,
lest you inadvertently call it.
If your system has the C<sigaction()> function then signal handlers
are installed using it. This means you get reliable signal handling.
The default delivery policy of signals changed in Perl v5.8.0 from
immediate (also known as "unsafe") to deferred, also known as "safe
signals". See L<perlipc> for more information.
Certain internal hooks can also be set using the C<%SIG> hash. The
routine indicated by C<$SIG{__WARN__}> is called when a warning
message is about to be printed. The warning message is passed as the
first argument. The presence of a C<__WARN__> hook causes the
ordinary printing of warnings to C<STDERR> to be suppressed. You can
use this to save warnings in a variable, or turn warnings into fatal
errors, like this:
local $SIG{__WARN__} = sub { die $_[0] };
eval $proggie;
As the C<'IGNORE'> hook is not supported by C<__WARN__>, you can
disable warnings using the empty subroutine:
local $SIG{__WARN__} = sub {};
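The other use mentioned, saving warnings in a variable, might look like the following sketch. The array name C<@captured> and the messages are illustrative.

```perl
use strict;
use warnings;

my @captured;
{
    # Collect warnings instead of letting them reach STDERR.
    local $SIG{__WARN__} = sub { push @captured, $_[0] };
    warn "disk nearly full\n";
    warn "retrying\n";
}

# Outside the block the previous behaviour is back in effect.
print "captured ", scalar(@captured), " warnings\n";   # captured 2 warnings
print "first: $captured[0]";
```

Because the hook is installed with C<local>, it cannot leak into code that runs after the block.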
The routine indicated by C<$SIG{__DIE__}> is called when a fatal
exception is about to be thrown. The error message is passed as the
first argument. When a C<__DIE__> hook routine returns, the exception
processing continues as it would have in the absence of the hook,
unless the hook routine itself exits via a C<goto &sub>, a loop exit,
or a C<die()>. The C<__DIE__> handler is explicitly disabled during
the call, so that you can die from a C<__DIE__> handler. Similarly
for C<__WARN__>.
The C<$SIG{__DIE__}> hook is called even inside an C<eval()>. It was
never intended to happen this way, but an implementation glitch made
this possible. This used to be deprecated, as it allowed strange action
at a distance like rewriting a pending exception in C<$@>. Plans to
rectify this have been scrapped, as users found that rewriting a
pending exception is actually a useful feature, and not a bug.
C<__DIE__>/C<__WARN__> handlers are very special in one respect: they
may be called to report (probable) errors found by the parser. In such
a case the parser may be in an inconsistent state, so any attempt to
evaluate Perl code from such a handler will probably result in a
segfault. This means that warnings or errors that result from parsing
Perl should be used with extreme caution, like this:
require Carp if defined $^S;
Carp::confess("Something wrong") if defined &Carp::confess;
die "Something wrong, but could not load Carp to give "
. "backtrace...\n\t"
. "To see backtrace try starting Perl with -MCarp switch";
Here the first line will load C<Carp> I<unless> it is the parser who
called the handler. The second line will print a backtrace and die if
C<Carp> was available. The third line will be executed only if C<Carp> was
not available.
Having to even think about the C<$^S> variable in your exception
handlers is simply wrong. C<$SIG{__DIE__}> as currently implemented
invites grievous and difficult to track down errors. Avoid it
and use an C<END{}> or C<CORE::GLOBAL::die> override instead.
See L<perlfunc/die>, L<perlfunc/warn>, L<perlfunc/eval>, and
L<warnings> for additional information.
=item $BASETIME
=item $^T
X<$^T> X<$BASETIME>
The time at which the program began running, in seconds since the
epoch (beginning of 1970). The values returned by the B<-M>, B<-A>,
and B<-C> filetests are based on this value.
=item $PERL_VERSION
=item $^V
X<$^V> X<$PERL_VERSION>
The revision, version, and subversion of the Perl interpreter,
represented as a L<version> object.
This variable first appeared in perl v5.6.0; earlier versions of perl
will see an undefined value. Before perl v5.10.0 C<$^V> was represented
as a v-string rather than a L<version> object.
C<$^V> can be used to determine whether the Perl interpreter executing
a script is in the right range of versions. For example:
warn "Hashes not randomized!\n" if !$^V or $^V lt v5.8.1;
While version objects overload stringification, to portably convert
C<$^V> into its string representation, use C<sprintf()>'s C<"%vd">
conversion, which works for both v-strings and version objects:
printf "version is v%vd\n", $^V; # Perl's version
See the documentation of C<use VERSION> and C<require VERSION>
for a convenient way to fail if the running Perl interpreter is too old.
See also C<L</$]>> for a decimal representation of the Perl version.
The main advantage of C<$^V> over C<$]> is that, for Perl v5.10.0 or
later, it overloads operators, allowing easy comparison against other
version representations (e.g. decimal, literal v-string, "v1.2.3", or
objects). The disadvantage is that prior to v5.10.0, it was only a
literal v-string, which can't be easily printed or compared, whereas
the behavior of C<$]> is unchanged on all versions of Perl.
Mnemonic: use ^V for a version object.
=item ${^WIN32_SLOPPY_STAT}
X<${^WIN32_SLOPPY_STAT}> X<sitecustomize> X<sitecustomize.pl>
If this variable is set to a true value, then C<stat()> on Windows will
not try to open the file. This means that the link count cannot be
determined and file attributes may be out of date if additional
hardlinks to the file exist. On the other hand, not opening the file
is considerably faster, especially for files on network drives.
This variable could be set in the F<sitecustomize.pl> file to
configure the local Perl installation to use "sloppy" C<stat()> by
default. See the documentation for B<-f> in
L<perlrun|perlrun/"Command Switches"> for more information about site
customization.
This variable was added in Perl v5.10.0.
=item $EXECUTABLE_NAME
=item $^X
X<$^X> X<$EXECUTABLE_NAME>
The name used to execute the current copy of Perl, from C's
C<argv[0]> or (where supported) F</proc/self/exe>.
Depending on the host operating system, the value of C<$^X> may be
a relative or absolute pathname of the perl program file, or may
be the string used to invoke perl but not the pathname of the
perl program file. Also, most operating systems permit invoking
programs that are not in the PATH environment variable, so there
is no guarantee that the value of C<$^X> is in PATH. For VMS, the
value may or may not include a version number.
You usually can use the value of C<$^X> to re-invoke an independent
copy of the same perl that is currently running, e.g.,
@first_run = `$^X -le "print int rand 100 for 1..100"`;
But recall that not all operating systems support forking or
capturing of the output of commands, so this complex statement
may not be portable.
It is not safe to use the value of C<$^X> as a path name of a file,
as some operating systems that have a mandatory suffix on
executable files do not require use of the suffix when invoking
a command. To convert the value of C<$^X> to a path name, use the
following statements:
# Build up a set of file names (not command names).
use Config;
my $this_perl = $^X;
if ($^O ne 'VMS') {
$this_perl .= $Config{_exe}
unless $this_perl =~ m/$Config{_exe}$/i;
}
Because many operating systems permit anyone with read access to
the Perl program file to make a copy of it, patch the copy, and
then execute the copy, the security-conscious Perl programmer
should take care to invoke the installed copy of perl, not the
copy referenced by C<$^X>. The following statements accomplish
this goal, and produce a pathname that can be invoked as a
command or referenced as a file.
use Config;
my $secure_perl_path = $Config{perlpath};
if ($^O ne 'VMS') {
$secure_perl_path .= $Config{_exe}
unless $secure_perl_path =~ m/$Config{_exe}$/i;
}
=back
=head2 Variables related to regular expressions
Most of the special variables related to regular expressions are side
effects. Perl sets these variables when it has a successful match, so
you should check the match result before using them. For instance:
if( /P(A)TT(ER)N/ ) {
print "I found $1 and $2\n";
}
These variables are read-only and dynamically-scoped, unless we note
otherwise.
The dynamic nature of the regular expression variables means that
their value is limited to the block that they are in, as demonstrated
by this bit of code:
my $outer = 'Wallace and Grommit';
my $inner = 'Mutt and Jeff';
my $pattern = qr/(\S+) and (\S+)/;
sub show_n { print "\$1 is $1; \$2 is $2\n" }
{
OUTER:
show_n() if $outer =~ m/$pattern/;
INNER: {
show_n() if $inner =~ m/$pattern/;
}
show_n();
}
The output shows that while in the C<OUTER> block, the values of C<$1>
and C<$2> are from the match against C<$outer>. Inside the C<INNER>
block, the values of C<$1> and C<$2> are from the match against
C<$inner>, but only until the end of the block (i.e. the dynamic
scope). After the C<INNER> block completes, the values of C<$1> and
C<$2> return to the values for the match against C<$outer> even though
we have not made another match:
$1 is Wallace; $2 is Grommit
$1 is Mutt; $2 is Jeff
$1 is Wallace; $2 is Grommit
=head3 Performance issues
Traditionally in Perl, any use of any of the three variables C<$`>, C<$&>
or C<$'> (or their C<use English> equivalents) anywhere in the code caused
all subsequent successful pattern matches to make a copy of the matched
string, in case the code might subsequently access one of those variables.
This imposed a considerable performance penalty across the whole program,
so generally the use of these variables has been discouraged.
In Perl 5.6.0 the C<@-> and C<@+> dynamic arrays were introduced that
supply the indices of successful matches. So you could for example do
this:
$str =~ /pattern/;
print $`, $&, $'; # bad: performance hit
print # good: no performance hit
substr($str, 0, $-[0]),
substr($str, $-[0], $+[0]-$-[0]),
substr($str, $+[0]);
In Perl 5.10.0 the C</p> match operator flag and the C<${^PREMATCH}>,
C<${^MATCH}>, and C<${^POSTMATCH}> variables were introduced, which allowed
you to suffer the penalties only on patterns marked with C</p>.
In Perl 5.18.0 onwards, perl started noting the presence of each of the
three variables separately, and only copied that part of the string
required; so in
$`; $&; "abcdefgh" =~ /d/
perl would only copy the "abcd" part of the string. That could make a big
difference in something like
$str = 'x' x 1_000_000;
$&; # whoops
$str =~ /x/g # one char copied a million times, not a million chars
In Perl 5.20.0 a new copy-on-write system was enabled by default, which
finally fixes all performance issues with these three variables, and makes
them safe to use anywhere.
The C<Devel::NYTProf> and C<Devel::FindAmpersand> modules can help you
find uses of these problematic match variables in your code.
=over 8
=item $<I<digits>> ($1, $2, ...)
X<$1> X<$2> X<$3> X<$I<digits>>
Contains the subpattern from the corresponding set of capturing
parentheses from the last successful pattern match, not counting patterns
matched in nested blocks that have been exited already.
Note there is a distinction between a capture buffer which matches
the empty string and a capture buffer which is optional, e.g. C<(x?)> and
C<(x)?>. The latter may be undef, the former not.
These variables are read-only and dynamically-scoped.
Mnemonic: like \digits.
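The difference between C<(x?)> and C<(x)?> can be seen by matching both against a string that contains no C<x>. This is a minimal sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = 'ab';

$text =~ /a(x?)b/ or die "no match";
my $inner = $1;      # the group matched the empty string

$text =~ /a(x)?b/ or die "no match";
my $outer = $1;      # the group took no part in the match

print defined $inner ? "inner: defined, length " . length($inner) . "\n"
                     : "inner: undef\n";
print defined $outer ? "outer: defined\n" : "outer: undef\n";
```

Copying C<$1> into a lexical straight after each match keeps the two results apart.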
=item @{^CAPTURE}
X<@{^CAPTURE}> X<@^CAPTURE>
An array which exposes the contents of the capture buffers, if any, of
the last successful pattern match, not counting patterns matched
in nested blocks that have been exited already.
Note that the 0 index of C<@{^CAPTURE}> is equivalent to C<$1>, the 1 index
is equivalent to C<$2>, etc.
if ("foal"=~/(.)(.)(.)(.)/) {
print join "-", @{^CAPTURE};
}
should output "f-o-a-l".
See also L</$I<digits>>, L</%{^CAPTURE}> and L</%{^CAPTURE_ALL}>.
Note that unlike most other regex magic variables there is no single
letter equivalent to C<@{^CAPTURE}>.
This variable was added in Perl v5.25.7.
=item $MATCH
=item $&
X<$&> X<$MATCH>
The string matched by the last successful pattern match (not counting
any matches hidden within a BLOCK or C<eval()> enclosed by the current
BLOCK).
See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.
This variable is read-only and dynamically-scoped.
Mnemonic: like C<&> in some editors.
=item ${^MATCH}
X<${^MATCH}>
This is similar to C<$&> (C<$MATCH>) except that it does not incur the
performance penalty associated with that variable.
See L</Performance issues> above.
In Perl v5.18 and earlier, it is only guaranteed
to return a defined value when the pattern was compiled or executed with
the C</p> modifier. In Perl v5.20, the C</p> modifier does nothing, so
C<${^MATCH}> does the same thing as C<$MATCH>.
This variable was added in Perl v5.10.0.
This variable is read-only and dynamically-scoped.
=item $PREMATCH
=item $`
X<$`> X<$PREMATCH> X<${^PREMATCH}>
The string preceding whatever was matched by the last successful
pattern match, not counting any matches hidden within a BLOCK or C<eval>
enclosed by the current BLOCK.
See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.
This variable is read-only and dynamically-scoped.
Mnemonic: C<`> often precedes a quoted string.
=item ${^PREMATCH}
X<$`> X<${^PREMATCH}>
This is similar to C<$`> (C<$PREMATCH>) except that it does not incur the
performance penalty associated with that variable.
See L</Performance issues> above.
In Perl v5.18 and earlier, it is only guaranteed
to return a defined value when the pattern was compiled or executed with
the C</p> modifier. In Perl v5.20, the C</p> modifier does nothing, so
C<${^PREMATCH}> does the same thing as C<$PREMATCH>.
This variable was added in Perl v5.10.0.
This variable is read-only and dynamically-scoped.
=item $POSTMATCH
=item $'
X<$'> X<$POSTMATCH> X<${^POSTMATCH}> X<@->
The string following whatever was matched by the last successful
pattern match (not counting any matches hidden within a BLOCK or C<eval()>
enclosed by the current BLOCK). Example:
local $_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.
This variable is read-only and dynamically-scoped.
Mnemonic: C<'> often follows a quoted string.
=item ${^POSTMATCH}
X<${^POSTMATCH}> X<$'> X<$POSTMATCH>
This is similar to C<$'> (C<$POSTMATCH>) except that it does not incur the
performance penalty associated with that variable.
See L</Performance issues> above.
In Perl v5.18 and earlier, it is only guaranteed
to return a defined value when the pattern was compiled or executed with
the C</p> modifier. In Perl v5.20, the C</p> modifier does nothing, so
C<${^POSTMATCH}> does the same thing as C<$POSTMATCH>.
This variable was added in Perl v5.10.0.
This variable is read-only and dynamically-scoped.
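The three C<${^...}> variables are normally used together with the C</p> modifier, as in this sketch (the sample string is arbitrary):

```perl
use strict;
use warnings;

my $line = 'hello world';
my ($pre, $match, $post);

if ($line =~ /o w/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "pre:   $pre\n";      # hell
    print "match: $match\n";    # o w
    print "post:  $post\n";     # orld
}
```

Keeping the C</p> modifier costs nothing on newer perls and keeps the code working on v5.18 and earlier.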
=item $LAST_PAREN_MATCH
=item $+
X<$+> X<$LAST_PAREN_MATCH>
The text matched by the last bracket of the last successful search pattern.
This is useful if you don't know which one of a set of alternative patterns
matched. For example:
/Version: (.*)|Revision: (.*)/ && ($rev = $+);
This variable is read-only and dynamically-scoped.
Mnemonic: be positive and forward looking.
=item $LAST_SUBMATCH_RESULT
=item $^N
X<$^N> X<$LAST_SUBMATCH_RESULT>
The text matched by the used group most-recently closed (i.e. the group
with the rightmost closing parenthesis) of the last successful search
pattern.
This is primarily used inside C<(?{...})> blocks for examining text
recently matched. For example, to effectively capture text to a variable
(in addition to C<$1>, C<$2>, etc.), replace C<(...)> with
(?:(...)(?{ $var = $^N }))
Setting and then using C<$var> in this way relieves you from having to
worry about exactly which numbered set of parentheses is involved.
This variable was added in Perl v5.8.0.
Mnemonic: the (possibly) Nested parenthesis that most recently closed.
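A complete, if contrived, instance of that replacement is sketched below. The variable C<$name> and the sample string are illustrative; a package variable is used so the sketch does not depend on how lexicals are captured inside code blocks on older perls.

```perl
use strict;
use warnings;

our $name;      # assigned from inside the pattern

if ('key=value' =~ /(?:(\w+)(?{ $name = $^N }))=/) {
    print "name: $name\n";      # name: key
}
```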
=item @LAST_MATCH_END
=item @+
X<@+> X<@LAST_MATCH_END>
This array holds the offsets of the ends of the last successful
submatches in the currently active dynamic scope. C<$+[0]> is
the offset into the string of the end of the entire match. This
is the same value as what the C<pos> function returns when called
on the variable that was matched against. The I<n>th element
of this array holds the offset of the I<n>th submatch, so
C<$+[1]> is the offset past where C<$1> ends, C<$+[2]> the offset
past where C<$2> ends, and so on. You can use C<$#+> to determine
how many subgroups were in the last successful match. See the
examples given for the C<@-> variable.
This variable was added in Perl v5.6.0.
=item %{^CAPTURE}
=item %LAST_PAREN_MATCH
=item %+
X<%+> X<%LAST_PAREN_MATCH> X<%{^CAPTURE}>
Similar to C<@+>, the C<%+> hash allows access to the named capture
buffers, should they exist, in the last successful match in the
currently active dynamic scope.
For example, C<$+{foo}> is equivalent to C<$1> after the following match:
'foo' =~ /(?<foo>foo)/;
The keys of the C<%+> hash list only the names of buffers that have
captured (and that are thus associated to defined values).
The underlying behaviour of C<%+> is provided by the
L<Tie::Hash::NamedCapture> module.
B<Note:> C<%-> and C<%+> are tied views into a common internal hash
associated with the last successful regular expression. Therefore mixing
iterative access to them via C<each> may have unpredictable results.
Likewise, if the last successful match changes, then the results may be
surprising.
This variable was added in Perl v5.10.0. The C<%{^CAPTURE}> alias was
added in 5.25.7.
This variable is read-only and dynamically-scoped.
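A slightly larger sketch of named captures follows. The group names C<year>, C<month> and C<day> and the sample date are illustrative; the hash is copied so the values survive later matches.

```perl
use strict;
use warnings;

my $date = '2024-05-17';
my %parts;

if ($date =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
    %parts = %+;                # snapshot of the named captures
    print "year:  $parts{year}\n";
    print "month: $parts{month}\n";
    print "names: ", join(', ', sort keys %parts), "\n";
}
```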
=item @LAST_MATCH_START
=item @-
X<@-> X<@LAST_MATCH_START>
C<$-[0]> is the offset of the start of the last successful match.
C<$-[>I<n>C<]> is the offset of the start of the substring matched by
the I<n>-th subpattern, or undef if the subpattern did not match.
Thus, after a match against C<$_>, C<$&> coincides with C<substr $_, $-[0],
$+[0] - $-[0]>. Similarly, C<$>I<n> coincides with C<substr $_, $-[n],
$+[n] - $-[n]> if C<$-[n]> is defined, and C<$+> coincides with
C<substr $_, $-[$#-], $+[$#-] - $-[$#-]>. One can use C<$#-> to find the
last matched subgroup in the last successful match. Contrast with
C<$#+>, the number of subgroups in the regular expression. Compare
with C<@+>.
This array holds the offsets of the beginnings of the last
successful submatches in the currently active dynamic scope.
C<$-[0]> is the offset into the string of the beginning of the
entire match. The I<n>th element of this array holds the offset
of the I<n>th submatch, so C<$-[1]> is the offset where C<$1>
begins, C<$-[2]> the offset where C<$2> begins, and so on.
After a match against some variable C<$var>:
=over 5
=item C<$`> is the same as C<substr($var, 0, $-[0])>
=item C<$&> is the same as C<substr($var, $-[0], $+[0] - $-[0])>
=item C<$'> is the same as C<substr($var, $+[0])>
=item C<$1> is the same as C<substr($var, $-[1], $+[1] - $-[1])>
=item C<$2> is the same as C<substr($var, $-[2], $+[2] - $-[2])>
=item C<$3> is the same as C<substr($var, $-[3], $+[3] - $-[3])>
=back
This variable was added in Perl v5.6.0.
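The equivalences listed for C<@-> and C<@+> can be checked with a short loop. This is a sketch; the sample string and the C<@report> array are illustrative.

```perl
use strict;
use warnings;

my $str = 'The quick brown fox';
my @report;

if ($str =~ /(quick) (brown)/) {
    for my $n (0 .. $#-) {
        # Rebuild each (sub)match from its start and end offsets.
        my $text = substr $str, $-[$n], $+[$n] - $-[$n];
        push @report,
            sprintf "group %d: offsets %d-%d => '%s'\n",
                    $n, $-[$n], $+[$n], $text;
    }
}

print @report;
```

Group 0 is the whole match, so its text is the same as C<$&> without any use of that variable.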
=item %{^CAPTURE_ALL}
X<%{^CAPTURE_ALL}>
=item %-
X<%->
Similar to C<%+>, this variable allows access to the named capture groups
in the last successful match in the currently active dynamic scope. To
each capture group name found in the regular expression, it associates a
reference to an array containing the list of values captured by all
buffers with that name (should there be several of them), in the order
in which they appear.
Here's an example:
if ('1234' =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/) {
foreach my $bufname (sort keys %-) {
my $ary = $-{$bufname};
foreach my $idx (0..$#$ary) {
print "\$-{$bufname}[$idx] : ",
(defined($ary->[$idx])
? "'$ary->[$idx]'"
: "undef"),
"\n";
}
}
}
would print out:
$-{A}[0] : '1'
$-{A}[1] : '3'
$-{B}[0] : '2'
$-{B}[1] : '4'
The keys of the C<%-> hash correspond to all buffer names found in
the regular expression.
The behaviour of C<%-> is implemented via the
L<Tie::Hash::NamedCapture> module.
B<Note:> C<%-> and C<%+> are tied views into a common internal hash
associated with the last successful regular expression. Therefore mixing
iterative access to them via C<each> may have unpredictable results.
Likewise, if the last successful match changes, then the results may be
surprising.
This variable was added in Perl v5.10.0. The C<%{^CAPTURE_ALL}> alias was
added in 5.25.7.
This variable is read-only and dynamically-scoped.
=item $LAST_REGEXP_CODE_RESULT
=item $^R
X<$^R> X<$LAST_REGEXP_CODE_RESULT>
The result of evaluation of the last successful C<(?{ code })>
regular expression assertion (see L<perlre>). May be written to.
This variable was added in Perl 5.005.
=item ${^RE_DEBUG_FLAGS}
X<${^RE_DEBUG_FLAGS}>
The current value of the regex debugging flags. Set to 0 for no debug output
even when the C<re 'debug'> module is loaded. See L<re> for details.
This variable was added in Perl v5.10.0.
=item ${^RE_TRIE_MAXBUF}
X<${^RE_TRIE_MAXBUF}>
Controls how certain regex optimisations are applied and how much memory they
utilize. This value by default is 65536 which corresponds to a 512kB
temporary cache. Set this to a higher value to trade
memory for speed when matching large alternations. Set
it to a lower value if you want the optimisations to
be as conservative of memory as possible but still occur, and set it to a
negative value to prevent the optimisation and conserve the most memory.
Under normal situations this variable should be of no interest to you.
This variable was added in Perl v5.10.0.
=back
=head2 Variables related to filehandles
Variables that depend on the currently selected filehandle may be set
by calling an appropriate object method on the C<IO::Handle> object,
although this is less efficient than using the regular built-in
variables. (Summary lines below for this contain the word HANDLE.)
First you must say
use IO::Handle;
after which you may use either
method HANDLE EXPR
or more safely,
HANDLE->method(EXPR)
Each method returns the old value of the C<IO::Handle> attribute. The
methods each take an optional EXPR, which, if supplied, specifies the
new value for the C<IO::Handle> attribute in question. If not
supplied, most methods do nothing to the current value--except for
C<autoflush()>, which will assume a 1 for you, just to be different.
Because loading in the C<IO::Handle> class is an expensive operation,
you should learn how to use the regular built-in variables.
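As a sketch of the method form next to the built-in variable, the following turns on autoflush for C<STDOUT> and then inspects C<$|>, which applies to the currently selected handle (C<STDOUT> unless C<select()> has changed it).

```perl
use strict;
use warnings;
use IO::Handle;

# Method form: returns the previous setting of the attribute.
my $previous = STDOUT->autoflush(1);

# Built-in form: read the same setting through $|.
print "autoflush was ", ($previous ? "on" : "off"), "\n";
print "autoflush is now ", ($| ? "on" : "off"), "\n";   # on
```

The saved C<$previous> value can be passed back to C<autoflush()> later to restore the earlier setting.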
A few of these variables are considered "read-only". This means that
if you try to assign to this variable, either directly or indirectly
through a reference, you'll raise a run-time exception.
You should be very careful when modifying the default values of most
special variables described in this document. In most cases you want
to localize these variables before changing them, since if you don't,
the change may affect other modules which rely on the default values
of the special variables that you have changed. This is one of the
correct ways to read the whole file at once:
open my $fh, "<", "foo" or die $!;
local $/; # enable localized slurp mode
my $content = <$fh>;
close $fh;
But the following code is quite bad:
open my $fh, "<", "foo" or die $!;
undef $/; # enable slurp mode
my $content = <$fh>;
close $fh;
since some other module may want to read data from some file in the
default "line mode". Once the code above has been executed, the global
value of C<$/> is changed for any other code running inside the same
Perl interpreter.
Usually when a variable is localized you want to make sure that this
change affects the shortest scope possible. So unless you are already
inside some short C<{}> block, you should create one yourself. For
example:
my $content = '';
open my $fh, "<", "foo" or die $!;
{
local $/;
$content = <$fh>;
}
close $fh;
Here is an example of how your own code can break:
for ( 1..3 ){
$\ = "\r\n";
nasty_break();
print "$_";
}
sub nasty_break {
$\ = "\f";
# do something with $_
}
You probably expect this code to print the equivalent of
"1\r\n2\r\n3\r\n"
but instead you get:
"1\f2\f3\f"
Why? Because C<nasty_break()> modifies C<$\> without localizing it
first. The value you set in C<nasty_break()> is still there when you
return. The fix is to add C<local()> so the value doesn't leak out of
C<nasty_break()>:
local $\ = "\f";
It's easy to notice the problem in such a short example, but in more
complicated code you are looking for trouble if you don't localize
changes to the special variables.
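The repaired version can be sketched as follows. To make the result easy to inspect, this sketch prints to an in-memory filehandle instead of C<STDOUT>; that choice, and the name C<tidy_break()>, are assumptions made for illustration.

```perl
use strict;
use warnings;

my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";

# A well-behaved helper: its change to $\ ends when the sub returns.
sub tidy_break {
    local $\ = "\f";
    # do something that relies on the form-feed separator
    return;
}

for my $n (1 .. 3) {
    local $\ = "\r\n";    # also localized, so it does not outlive the loop body
    tidy_break();
    print {$fh} $n;
}
close $fh;

# $out now holds "1\r\n2\r\n3\r\n"
```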
=over 8
=item $ARGV
X<$ARGV>
Contains the name of the current file when reading from C<< <> >>.
=item @ARGV
X<@ARGV>
The array C<@ARGV> contains the command-line arguments intended for
the script. C<$#ARGV> is generally the number of arguments minus
one, because C<$ARGV[0]> is the first argument, I<not> the program's
command name itself. See L</$0> for the command name.
=item ARGV
X<ARGV>
The special filehandle that iterates over command-line filenames in
C<@ARGV>. Usually written as the null filehandle in the angle operator
C<< <> >>. Note that currently C<ARGV> only has its magical effect
within the C<< <> >> operator; elsewhere it is just a plain filehandle
corresponding to the last file opened by C<< <> >>. In particular,
passing C<\*ARGV> as a parameter to a function that expects a filehandle
may not cause your function to automatically read the contents of all the
files in C<@ARGV>.
=item ARGVOUT
X<ARGVOUT>
The special filehandle that points to the currently open output file
when doing edit-in-place processing with B<-i>. Useful when you have
to do a lot of inserting and don't want to keep modifying C<$_>. See
L<perlrun> for the B<-i> switch.
=item IO::Handle->output_field_separator( EXPR )
=item $OUTPUT_FIELD_SEPARATOR
=item $OFS
=item $,
X<$,> X<$OFS> X<$OUTPUT_FIELD_SEPARATOR>
The output field separator for the print operator. If defined, this
value is printed between each of print's arguments. Default is C<undef>.
You cannot call C<output_field_separator()> on a handle, only as a
static method. See L<IO::Handle|IO::Handle>.
Mnemonic: what is printed when there is a "," in your print statement.
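A short sketch of the effect, assuming an in-memory filehandle so that the printed text can be examined afterwards:

```perl
use strict;
use warnings;

my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";

{
    local $, = '-';     # printed between the arguments of print
    local $\ = "\n";    # printed after the last argument
    print {$fh} 'a', 'b', 'c';
}

# Outside the block both separators are back to their previous values.
print {$fh} 'd', 'e';
close $fh;

# $out now holds "a-b-c\nde"
```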
=item HANDLE->input_line_number( EXPR )
=item $INPUT_LINE_NUMBER
=item $NR
=item $.
X<$.> X<$NR> X<$INPUT_LINE_NUMBER> X<line number>
Current line number for the last filehandle accessed.
Each filehandle in Perl counts the number of lines that have been read
from it. (Depending on the value of C<$/>, Perl's idea of what
constitutes a line may not match yours.) When a line is read from a
filehandle (via C<readline()> or C<< <> >>), or when C<tell()> or
C<seek()> is called on it, C<$.> becomes an alias to the line counter
for that filehandle.
You can adjust the counter by assigning to C<$.>, but this will not
actually move the seek pointer. I<Localizing C<$.> will not localize
the filehandle's line count>. Instead, it will localize perl's notion
of which filehandle C<$.> is currently aliased to.
C<$.> is reset when the filehandle is closed, but B<not> when an open
filehandle is reopened without an intervening C<close()>. For more
details, see L<perlop/"IE<sol>O Operators">. Because C<< <> >> never does
an explicit close, line numbers increase across C<ARGV> files (but see
examples in L<perlfunc/eof>).
You can also use C<< HANDLE->input_line_number(EXPR) >> to access the
line counter for a given filehandle without having to worry about
which handle you last accessed.
Mnemonic: many programs use "." to mean the current line number.
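The following sketch records C<$.> while reading three lines from an in-memory filehandle, then asks the handle for its counter directly. The data and variable names are made up for the example.

```perl
use strict;
use warnings;
use IO::Handle;

my $data = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my @numbers;
while (my $line = <$fh>) {
    push @numbers, $.;    # line counter of the handle just read from
}

# The counter is kept until the handle is closed, so it can still be
# queried here without depending on which handle was read last.
my $total = $fh->input_line_number;
close $fh;
```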
=item IO::Handle->input_record_separator( EXPR )
=item $INPUT_RECORD_SEPARATOR
=item $RS
=item $/
X<$/> X<$RS> X<$INPUT_RECORD_SEPARATOR>
The input record separator, newline by default. This influences Perl's
idea of what a "line" is. Works like B<awk>'s RS variable, including
treating empty lines as a terminator if set to the null string (an
empty line cannot contain any spaces or tabs). You may set it to a
multi-character string to match a multi-character terminator, or to
C<undef> to read through the end of file. Setting it to C<"\n\n">
means something slightly different than setting to C<"">, if the file
contains consecutive empty lines. Setting to C<""> will treat two or
more consecutive empty lines as a single empty line. Setting to
C<"\n\n"> will blindly assume that the next input character belongs to
the next paragraph, even if it's a newline.
local $/; # enable "slurp" mode
local $_ = <FH>; # whole file now here
s/\n[ \t]+/ /g;
Remember: the value of C<$/> is a string, not a regex. B<awk> has to
be better for something. :-)
Setting C<$/> to a reference to an integer, scalar containing an
integer, or scalar that's convertible to an integer will attempt to
read records instead of lines, with the maximum record size being the
referenced integer number of characters. So this:
local $/ = \32768; # or \"32768", or \$var_containing_32768
open my $fh, "<", $myfile or die $!;
local $_ = <$fh>;
will read a record of no more than 32768 characters from C<$fh>. If you're
not reading from a record-oriented file (or your OS doesn't have
record-oriented files), then you'll likely get a full chunk of data
with every read. If a record is larger than the record size you've
set, you'll get the record back in pieces. Trying to set the record
size to zero or less is deprecated and will cause C<$/> to have the value
of C<undef>, which will cause reading in the (rest of the) whole file.
As of 5.19.9 setting C<$/> to any other form of reference will throw a
fatal exception. This is in preparation for supporting new ways to set
C<$/> in the future.
On VMS only, record reads bypass PerlIO layers and any associated
buffering, so you must not mix record and non-record reads on the
same filehandle. Record mode mixes with line mode only when the
same buffering layer is in use for both modes.
You cannot call C<input_record_separator()> on a handle, only as a
static method. See L<IO::Handle|IO::Handle>.
See also L<perlport/"Newlines">. Also see L</$.>.
Mnemonic: / delimits line boundaries when quoting poetry.
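The difference between paragraph mode, a literal C<"\n\n"> separator, and fixed-size records can be seen in a small sketch. It reads the same in-memory text three times; the helper C<read_records()> is hypothetical and exists only for this example.

```perl
use strict;
use warnings;

# Two paragraphs separated by three empty lines.
my $text = "one\n\n\n\ntwo\n";

# Read $text in list context with $/ set to the given separator.
sub read_records {
    my ($separator) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = $separator;
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @paragraphs = read_records("");        # runs of empty lines collapse
my @blind      = read_records("\n\n");    # every "\n\n" ends a record
my @chunks     = read_records(\4);        # records of at most 4 characters
```

Under these assumptions paragraph mode yields two records, while the literal separator yields an extra record that holds nothing but newlines.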
=item IO::Handle->output_record_separator( EXPR )
=item $OUTPUT_RECORD_SEPARATOR
=item $ORS
=item $\
X<$\> X<$ORS> X<$OUTPUT_RECORD_SEPARATOR>
The output record separator for the print operator. If defined, this
value is printed after the last of print's arguments. Default is C<undef>.
You cannot call C<output_record_separator()> on a handle, only as a
static method. See L<IO::Handle|IO::Handle>.
Mnemonic: you set C<$\> instead of adding "\n" at the end of the print.
Also, it's just like C<$/>, but it's what you get "back" from Perl.
=item HANDLE->autoflush( EXPR )
=item $OUTPUT_AUTOFLUSH
=item $|
X<$|> X<autoflush> X<flush> X<$OUTPUT_AUTOFLUSH>
If set to nonzero, forces a flush right away and after every write or
print on the currently selected output channel. Default is 0
(regardless of whether the channel is really buffered by the system or
not; C<$|> tells you only whether you've asked Perl explicitly to
flush after each write). STDOUT will typically be line buffered if
output is to the terminal and block buffered otherwise. Setting this
variable is useful primarily when you are outputting to a pipe or
socket, such as when you are running a Perl program under B<rsh> and
want to see the output as it's happening. This has no effect on input
buffering. See L<perlfunc/getc> for that. See L<perlfunc/select> on
how to select the output channel. See also L<IO::Handle>.
Mnemonic: when you want your pipes to be piping hot.
=item ${^LAST_FH}
X<${^LAST_FH}>
This read-only variable contains a reference to the last-read filehandle.
This is set by C<< <HANDLE> >>, C<readline>, C<tell>, C<eof> and C<seek>.
This is the same handle that C<$.> and C<tell> and C<eof> without arguments
use. It is also the handle used when Perl appends ", <STDIN> line 1" to
an error or warning message.
This variable was added in Perl v5.18.0.
=back
=head3 Variables related to formats
The special variables for formats are a subset of those for
filehandles. See L<perlform> for more information about Perl's
formats.
=over 8
=item $ACCUMULATOR
=item $^A
X<$^A> X<$ACCUMULATOR>
The current value of the C<write()> accumulator for C<format()> lines.
A format contains C<formline()> calls that put their result into
C<$^A>. After calling its format, C<write()> prints out the contents
of C<$^A> and empties it. So you never really see the contents of C<$^A>
unless you call C<formline()> yourself and then look at it. See
L<perlform> and L<perlfunc/"formline PICTURE,LIST">.
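To look at the accumulator yourself, you can call C<formline()> directly, as in this sketch. The picture string and values are arbitrary examples.

```perl
use strict;
use warnings;

$^A = '';    # start from an empty accumulator

# One 4-column left-justified field, a space, one 4-column
# right-justified field. Single quotes keep '@' from interpolating.
formline('@<<< @>>>', 'ab', 12);

my $line = $^A;    # "ab" padded to 4, a space, "12" padded to 4
$^A = '';          # empty it again, as write() would
```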
=item IO::Handle->format_formfeed(EXPR)
=item $FORMAT_FORMFEED
=item $^L
X<$^L> X<$FORMAT_FORMFEED>
What formats output as a form feed. The default is C<\f>.
You cannot call C<format_formfeed()> on a handle, only as a static
method. See L<IO::Handle|IO::Handle>.
=item HANDLE->format_page_number(EXPR)
=item $FORMAT_PAGE_NUMBER
=item $%
X<$%> X<$FORMAT_PAGE_NUMBER>
The current page number of the currently selected output channel.
Mnemonic: C<%> is page number in B<nroff>.
=item HANDLE->format_lines_left(EXPR)
=item $FORMAT_LINES_LEFT
=item $-
X<$-> X<$FORMAT_LINES_LEFT>
The number of lines left on the page of the currently selected output
channel.
Mnemonic: lines_on_page - lines_printed.
=item IO::Handle->format_line_break_characters EXPR
=item $FORMAT_LINE_BREAK_CHARACTERS
=item $:
X<$:> X<FORMAT_LINE_BREAK_CHARACTERS>
The current set of characters after which a string may be broken to
fill continuation fields (starting with C<^>) in a format. The default is
S<" \n-">, to break on a space, newline, or a hyphen.
You cannot call C<format_line_break_characters()> on a handle, only as
a static method. See L<IO::Handle|IO::Handle>.
Mnemonic: a "colon" in poetry is a part of a line.
=item HANDLE->format_lines_per_page(EXPR)
=item $FORMAT_LINES_PER_PAGE
=item $=
X<$=> X<$FORMAT_LINES_PER_PAGE>
The current page length (printable lines) of the currently selected
output channel. The default is 60.
Mnemonic: = has horizontal lines.
=item HANDLE->format_top_name(EXPR)
=item $FORMAT_TOP_NAME
=item $^
X<$^> X<$FORMAT_TOP_NAME>
The name of the current top-of-page format for the currently selected
output channel. The default is the name of the filehandle with C<_TOP>
appended. For example, the default format top name for the C<STDOUT>
filehandle is C<STDOUT_TOP>.
Mnemonic: points to top of page.
=item HANDLE->format_name(EXPR)
=item $FORMAT_NAME
=item $~
X<$~> X<$FORMAT_NAME>
The name of the current report format for the currently selected
output channel. The default format name is the same as the filehandle
name. For example, the default format name for the C<STDOUT>
filehandle is just C<STDOUT>.
Mnemonic: brother to C<$^>.
=back
=head2 Error Variables
X<error> X<exception>
The variables C<$@>, C<$!>, C<$^E>, and C<$?> contain information
about different types of error conditions that may appear during
execution of a Perl program. The variables are shown ordered by
the "distance" between the subsystem which reported the error and
the Perl process. They correspond to errors detected by the Perl
interpreter, C library, operating system, or an external program,
respectively.
To illustrate the differences between these variables, consider the
following Perl expression, which uses a single-quoted string. After
execution of this statement, perl may have set all four special error
variables:
eval q{
open my $pipe, "/cdrom/install |" or die $!;
my @res = <$pipe>;
close $pipe or die "bad pipe: $?, $!";
};
When perl executes the C<eval()> expression, it translates the
C<open()>, C<< <$pipe> >>, and C<close> calls into calls to the C
run-time library and thence to the operating system kernel. perl sets
C<$!> to the C library's C<errno> if one of these calls fails.
C<$@> is set if the string to be C<eval>-ed did not compile (this may
happen if C<open> or C<close> were imported with bad prototypes), or
if Perl code executed during evaluation C<die()>d. In these cases the
value of C<$@> is the compile error, or the argument to C<die> (which
will interpolate C<$!> and C<$?>). (See also L<Fatal>, though.)
Under a few operating systems, C<$^E> may contain a more verbose error
indicator, such as in this case, "CDROM tray not closed." Systems that
do not support extended error messages leave C<$^E> the same as C<$!>.
Finally, C<$?> may be set to a non-0 value if the external program
F</cdrom/install> fails. The upper eight bits reflect specific error
conditions encountered by the program (the program's C<exit()> value).
The lower eight bits reflect mode of failure, like signal death and
core dump information. See L<wait(2)> for details. In contrast to
C<$!> and C<$^E>, which are set only if an error condition is detected,
the variable C<$?> is set on each C<wait> or pipe C<close>,
overwriting the old value. This is more like C<$@>, which on every
C<eval()> is always set on failure and cleared on success.
For more details, see the individual descriptions at C<$@>, C<$!>,
C<$^E>, and C<$?>.
=over 8
=item ${^CHILD_ERROR_NATIVE}
X<$^CHILD_ERROR_NATIVE>
The native status returned by the last pipe close, backtick (C<``>)
command, successful call to C<wait()> or C<waitpid()>, or from the
C<system()> operator. On POSIX-like systems this value can be decoded
with the WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG, WIFSTOPPED,
WSTOPSIG and WIFCONTINUED functions provided by the L<POSIX> module.
Under VMS this reflects the actual VMS exit status; i.e. it is the
same as C<$?> when the pragma C<use vmsish 'status'> is in effect.
This variable was added in Perl v5.10.0.
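On a POSIX-like system the decoding might look like the following sketch. It runs the current perl binary (C<$^X>) as the child process, which is an assumption chosen so that the example needs no other program.

```perl
use strict;
use warnings;
use POSIX qw(WIFEXITED WEXITSTATUS);

# Run a child that exits with status 3.
system($^X, '-e', 'exit 3');

my $native = ${^CHILD_ERROR_NATIVE};

# Decode the native status with the POSIX helper functions.
my $exited = WIFEXITED($native) ? 1 : 0;
my $status = $exited ? WEXITSTATUS($native) : undef;
```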
=item $EXTENDED_OS_ERROR
=item $^E
X<$^E> X<$EXTENDED_OS_ERROR>
Error information specific to the current operating system. At the
moment, this differs from C<L</$!>> under only VMS, OS/2, and Win32 (and
for MacPerl). On all other platforms, C<$^E> is always just the same
as C<$!>.
Under VMS, C<$^E> provides the VMS status value from the last system
error. This is more specific information about the last system error
than that provided by C<$!>. This is particularly important when C<$!>
is set to B<EVMSERR>.
Under OS/2, C<$^E> is set to the error code of the last call to OS/2
API either via CRT, or directly from perl.
Under Win32, C<$^E> always returns the last error information reported
by the Win32 call C<GetLastError()> which describes the last error
from within the Win32 API. Most Win32-specific code will report errors
via C<$^E>. ANSI C and Unix-like calls set C<errno> and so most
portable Perl code will report errors via C<$!>.
Caveats mentioned in the description of C<L</$!>> generally apply to
C<$^E>, also.
This variable was added in Perl 5.003.
Mnemonic: Extra error explanation.
=item $EXCEPTIONS_BEING_CAUGHT
=item $^S
X<$^S> X<$EXCEPTIONS_BEING_CAUGHT>
Current state of the interpreter.
    $^S         State
    ---------   -------------------------------------
    undef       Parsing module, eval, or main program
    true (1)    Executing an eval
    false (0)   Otherwise
The first state may happen in C<$SIG{__DIE__}> and C<$SIG{__WARN__}>
handlers.
The English name $EXCEPTIONS_BEING_CAUGHT is slightly misleading, because
the C<undef> value does not indicate whether exceptions are being caught,
since compilation of the main program does not catch exceptions.
This variable was added in Perl 5.004.
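A small sketch of reading C<$^S>, both directly inside an C<eval> block and from a C<$SIG{__DIE__}> handler that fires while an C<eval> is executing. The variable names are illustrative.

```perl
use strict;
use warnings;

# Inside an eval block the state is true.
my $inside_eval = eval { $^S };

# A __DIE__ handler can consult $^S to learn whether the exception
# it is seeing will be caught by an enclosing eval.
my @states;
{
    local $SIG{__DIE__} = sub { push @states, $^S };
    eval { die "caught\n" };
}
```

Handlers commonly return early when C<$^S> is true, so that exceptions which are about to be caught are left alone.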
=item $WARNING
=item $^W
X<$^W> X<$WARNING>
The current value of the warning switch, initially true if B<-w> was
used, false otherwise, but directly modifiable.
See also L<warnings>.
Mnemonic: related to the B<-w> switch.
=item ${^WARNING_BITS}
X<${^WARNING_BITS}>
The current set of warning checks enabled by the C<use warnings> pragma.
It has the same scoping as the C<$^H> and C<%^H> variables. The exact
values are considered internal to the L<warnings> pragma and may change
between versions of Perl.
This variable was added in Perl v5.6.0.
=item $OS_ERROR
=item $ERRNO
=item $!
X<$!> X<$ERRNO> X<$OS_ERROR>
When referenced, C<$!> retrieves the current value
of the C C<errno> integer variable.
If C<$!> is assigned a numerical value, that value is stored in C<errno>.
When referenced as a string, C<$!> yields the system error string
corresponding to C<errno>.
Many system or library calls set C<errno> if they fail,
to indicate the cause of failure. They usually do B<not>
set C<errno> to zero if they succeed. This means C<errno>,
hence C<$!>, is meaningful only I<immediately> after a B<failure>:
if (open my $fh, "<", $filename) {
# Here $! is meaningless.
...
}
else {
# ONLY here is $! meaningful.
...
# Already here $! might be meaningless.
}
# Since here we might have either success or failure,
# $! is meaningless.
Here, I<meaningless> means that C<$!> may be unrelated to the outcome
of the C<open()> operator. Assignment to C<$!> is similarly ephemeral.
It can be used immediately before invoking the C<die()> operator,
to set the exit value, or to inspect the system error string
corresponding to error I<n>, or to restore C<$!> to a meaningful state.
Mnemonic: What just went bang?
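Because any later call may overwrite C<errno>, a common approach is to copy C<$!> as soon as the failure is detected. This sketch assumes that the path shown does not exist on your system.

```perl
use strict;
use warnings;

my $path = '/no/such/dir/example.txt';    # hypothetical, assumed missing

my ($errno, $message);
if (open my $fh, '<', $path) {
    close $fh;
}
else {
    # Copy both views of $! at once: numeric errno and error string.
    ($errno, $message) = ($! + 0, "$!");
}
```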
=item %OS_ERROR
=item %ERRNO
=item %!
X<%!> X<%OS_ERROR> X<%ERRNO>
Each element of C<%!> has a true value only if C<$!> is set to that
value. For example, C<$!{ENOENT}> is true if and only if the current
value of C<$!> is C<ENOENT>; that is, if the most recent error was "No
such file or directory" (or its moral equivalent: not all operating
systems give that exact error, and certainly not all languages). The
specific true value is not guaranteed, but in the past has generally
been the numeric value of C<$!>. To check if a particular key is
meaningful on your system, use C<exists $!{the_key}>; for a list of legal
keys, use C<keys %!>. See L<Errno> for more information, and also see
L</$!>.
This variable was added in Perl 5.005.
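The following sketch sets C<$!> by hand so the result does not depend on a real failure. It assumes a platform on which L<Errno> defines both C<ENOENT> and C<EACCES>.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

my ($hit, $miss);
{
    local $! = ENOENT;                # pretend the last failure was ENOENT
    $hit  = $!{ENOENT} ? 1 : 0;       # true: $! currently equals ENOENT
    $miss = $!{EACCES} ? 1 : 0;       # false: a different error code
}

# exists tells you whether the key is meaningful on this system.
my $known = exists $!{ENOENT} ? 1 : 0;
```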
=item $CHILD_ERROR
=item $?
X<$?> X<$CHILD_ERROR>
The status returned by the last pipe close, backtick (C<``>) command,
successful call to C<wait()> or C<waitpid()>, or from the C<system()>
operator. This is just the 16-bit status word returned by the
traditional Unix C<wait()> system call (or else is made up to look
like it). Thus, the exit value of the subprocess is really (C<<< $? >>
8 >>>), and C<$? & 127> gives which signal, if any, the process died
from, and C<$? & 128> reports whether there was a core dump.
Additionally, if the C<h_errno> variable is supported in C, its value
is returned via C<$?> if any C<gethost*()> function fails.
If you have installed a signal handler for C<SIGCHLD>, the
value of C<$?> will usually be wrong outside that handler.
Inside an C<END> subroutine C<$?> contains the value that is going to be
given to C<exit()>. You can modify C<$?> in an C<END> subroutine to
change the exit status of your program. For example:
END {
$? = 1 if $? == 255; # die would make it 255
}
Under VMS, the pragma C<use vmsish 'status'> makes C<$?> reflect the
actual VMS exit status, instead of the default emulation of POSIX
status; see L<perlvms/$?> for details.
Mnemonic: similar to B<sh> and B<ksh>.
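The bit arithmetic above can be sketched as follows on a Unix-like system. Using C<$^X> to run the current perl as the child process is an assumption made to keep the example self-contained.

```perl
use strict;
use warnings;

# Run a child that exits with status 3.
system($^X, '-e', 'exit 3');

my $raw        = $?;                       # the 16-bit status word
my $exit_value = $raw >> 8;                # upper eight bits
my $signal     = $raw & 127;               # signal number, 0 if none
my $dumped     = ($raw & 128) ? 1 : 0;     # core dump flag
```

Copying C<$?> into a lexical first protects the value from being overwritten by a later C<system()>, backtick command, or pipe close.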
=item $EVAL_ERROR
=item $@
X<$@> X<$EVAL_ERROR>
The Perl error from the last C<eval> operator, i.e. the last exception that
was caught. For C<eval BLOCK>, this is either a runtime error message or the
string or reference C<die> was called with. The C<eval STRING> form also
catches syntax errors and other compile time exceptions.
If no error occurs, C<eval> sets C<$@> to the empty string.
Warning messages are not collected in this variable. You can, however,
set up a routine to process warnings by setting C<$SIG{__WARN__}> as
described in L</%SIG>.
Mnemonic: Where was the error "at"?
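A minimal sketch of how C<$@> is set on failure and cleared on success:

```perl
use strict;
use warnings;

# The block dies, so eval returns undef and $@ holds the message.
my $ok    = eval { die "boom\n"; 1 };
my $error = $@;

# A later successful eval resets $@ to the empty string.
my $second  = eval { 1 };
my $cleared = $@;
```

Since C<$@> can be reset by other code before you inspect it, many programs test the return value of C<eval> (the trailing C<1> above) and copy C<$@> immediately.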
=back
=head2 Variables related to the interpreter state
These variables provide information about the current interpreter state.
=over 8
=item $COMPILING
=item $^C
X<$^C> X<$COMPILING>
The current value of the flag associated with the B<-c> switch.
Mainly of use with B<-MO=...> to allow code to alter its behavior
when being compiled, such as for example to C<AUTOLOAD> at compile
time rather than normal, deferred loading. Setting
C<$^C = 1> is similar to calling C<B::minus_c>.
This variable was added in Perl v5.6.0.
=item $DEBUGGING
=item $^D
X<$^D> X<$DEBUGGING>
The current value of the debugging flags. May be read or set. Like its
L<command-line equivalent|perlrun/B<-D>I<letters>>, you can use numeric
or symbolic values, e.g. C<$^D = 10> or C<$^D = "st">. See
L<perlrun/B<-D>I<number>>. The contents of this variable also affect the
debugger operation. See L<perldebguts/Debugger Internals>.
Mnemonic: value of B<-D> switch.
=item ${^ENCODING}
X<${^ENCODING}>
This variable is no longer supported.
It used to hold the I<object reference> to the C<Encode> object that was
used to convert the source code to Unicode.
Its purpose was to allow your non-ASCII Perl
scripts not to have to be written in UTF-8; this was
useful before editors that worked on UTF-8 encoded text were common, but
that was long ago. It caused problems, such as affecting the operation
of other modules that weren't expecting it, causing general mayhem.
If you need something like this functionality, it is recommended that you
use a simple source filter, such as L<Filter::Encoding>.
If you are coming here because code of yours is being adversely affected
by someone's use of this variable, you can usually work around it by
doing this:
local ${^ENCODING};
near the beginning of the functions that are getting broken. This
undefines the variable during the scope of execution of the including
function.
This variable was added in Perl 5.8.2 and removed in 5.26.0.
=item ${^GLOBAL_PHASE}
X<${^GLOBAL_PHASE}>
The current phase of the perl interpreter.
Possible values are:
=over 8
=item CONSTRUCT
The C<PerlInterpreter*> is being constructed via C<perl_construct>. This
value is mostly there for completeness and for use via the
underlying C variable C<PL_phase>. It's not really possible for Perl
code to be executed unless construction of the interpreter is
finished.
=item START
This is the global compile-time. That includes, basically, every
C<BEGIN> block executed directly or indirectly during the
compile-time of the top-level program.
This phase is not called "BEGIN" to avoid confusion with
C<BEGIN>-blocks, as those are executed during compile-time of any
compilation unit, not just the top-level program. A new, localised
compile-time entered at run-time, for example by a construct such as
C<eval "use SomeModule">, is not a global interpreter phase, and
therefore isn't reflected by C<${^GLOBAL_PHASE}>.
=item CHECK
Execution of any C<CHECK> blocks.
=item INIT
Similar to "CHECK", but for C<INIT>-blocks, not C<CHECK> blocks.
=item RUN
The main run-time, i.e. the execution of C<PL_main_root>.
=item END
Execution of any C<END> blocks.
=item DESTRUCT
Global destruction.
=back
Also note that there's no value for UNITCHECK-blocks. That's because
those are run for each compilation unit individually, and therefore do
not constitute a global interpreter phase.
Not every program has to go through each of the possible phases, but
transition from one phase to another can only happen in the order
described in the above list.
An example of all of the phases Perl code can see:
BEGIN { print "compile-time: ${^GLOBAL_PHASE}\n" }
INIT { print "init-time: ${^GLOBAL_PHASE}\n" }
CHECK { print "check-time: ${^GLOBAL_PHASE}\n" }
{
package Print::Phase;
sub new {
my ($class, $time) = @_;
return bless \$time, $class;
}
sub DESTROY {
my $self = shift;
print "$$self: ${^GLOBAL_PHASE}\n";
}
}
print "run-time: ${^GLOBAL_PHASE}\n";
my $runtime = Print::Phase->new(
"lexical variables are garbage collected before END"
);
END { print "end-time: ${^GLOBAL_PHASE}\n" }
our $destruct = Print::Phase->new(
"package variables are garbage collected after END"
);
This will print out
compile-time: START
check-time: CHECK
init-time: INIT
run-time: RUN
lexical variables are garbage collected before END: RUN
end-time: END
package variables are garbage collected after END: DESTRUCT
This variable was added in Perl 5.14.0.
=item $^H
X<$^H>
WARNING: This variable is strictly for
internal use only. Its availability,
behavior, and contents are subject to change without notice.
This variable contains compile-time hints for the Perl interpreter. At the
end of compilation of a BLOCK the value of this variable is restored to the
value when the interpreter started to compile the BLOCK.
When perl begins to parse any block construct that provides a lexical scope
(e.g., eval body, required file, subroutine body, loop body, or conditional
block), the existing value of C<$^H> is saved, but its value is left unchanged.
When the compilation of the block is completed, it regains the saved value.
Between the points where its value is saved and restored, code that
executes within BEGIN blocks is free to change the value of C<$^H>.
This behavior provides the semantic of lexical scoping, and is used in,
for instance, the C<use strict> pragma.
The contents should be an integer; different bits of it are used for
different pragmatic flags. Here's an example:
sub add_100 { $^H |= 0x100 }
sub foo {
BEGIN { add_100() }
bar->baz($boon);
}
Consider what happens during execution of the BEGIN block. At this point
the BEGIN block has already been compiled, but the body of C<foo()> is still
being compiled. The new value of C<$^H>
will therefore be visible only while
the body of C<foo()> is being compiled.
Substituting the C<BEGIN { add_100() }> block with:
BEGIN { require strict; strict->import('vars') }
demonstrates how C<use strict 'vars'> is implemented. Here's a conditional
version of the same lexical pragma:
BEGIN {
require strict; strict->import('vars') if $condition
}
This variable was added in Perl 5.003.
=item %^H
X<%^H>
The C<%^H> hash provides the same scoping semantic as C<$^H>. This makes
it useful for implementation of lexically scoped pragmas. See
L<perlpragma>. All the entries are stringified when accessed at
runtime, so only simple values can be accommodated. This means no
pointers to objects, for example.
When putting items into C<%^H>, in order to avoid conflicting with other
users of the hash there is a convention regarding which keys to use.
A module should use only keys that begin with the module's name (the
name of its main package) and a "/" character. For example, a module
C<Foo::Bar> should use keys such as C<Foo::Bar/baz>.
This variable was added in Perl v5.6.0.
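The key convention and the lexical scoping can be sketched with a tiny pragma defined in the same file. The package name C<Demo::Pragma> and its C<in_effect()> helper are hypothetical, and the sketch assumes Perl v5.10.0 or later, where C<caller> returns the hint hash.

```perl
use strict;
use warnings;

package Demo::Pragma;

# Keys start with the module's name and a "/" character.
sub import   { $^H{'Demo::Pragma/on'} = 1 }
sub unimport { $^H{'Demo::Pragma/on'} = 0 }

# Report whether the pragma was in effect where in_effect() was called.
sub in_effect {
    my $hints = (caller(0))[10];    # hint hash of the calling statement
    return $hints && $hints->{'Demo::Pragma/on'} ? 1 : 0;
}

package main;

my ($inner, $outer);
{
    # Equivalent to "use Demo::Pragma" for a package in the same file.
    BEGIN { Demo::Pragma->import }
    $inner = Demo::Pragma::in_effect();
}
$outer = Demo::Pragma::in_effect();
```

The setting is visible only to code compiled inside the block, so under these assumptions C<$inner> is true and C<$outer> is false.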
=item ${^OPEN}
X<${^OPEN}>
An internal variable used by PerlIO. A string in two parts, separated
by a C<\0> byte, the first part describes the input layers, the second
part describes the output layers.
This variable was added in Perl v5.8.0.
=item $PERLDB
=item $^P
X<$^P> X<$PERLDB>
The internal variable for debugging support. The meanings of the
various bits are subject to change, but currently indicate:
=over 6
=item 0x01
Debug subroutine enter/exit.
=item 0x02
Line-by-line debugging. Causes C<DB::DB()> subroutine to be called for
each statement executed. Also causes saving source code lines (like
0x400).
=item 0x04
Switch off optimizations.
=item 0x08
Preserve more data for future interactive inspections.
=item 0x10
Keep info about source lines on which a subroutine is defined.
=item 0x20
Start with single-step on.
=item 0x40
Use subroutine address instead of name when reporting.
=item 0x80
Report C<goto &subroutine> as well.
=item 0x100
Provide informative "file" names for evals based on the place they were compiled.
=item 0x200
Provide informative names to anonymous subroutines based on the place they
were compiled.
=item 0x400
Save source code lines into C<@{"_<$filename"}>.
=item 0x800
When saving source, include evals that generate no subroutines.
=item 0x1000
When saving source, include source that did not compile.
=back
Some bits may be relevant at compile-time only, some at
run-time only. This is a new mechanism and the details may change.
See also L<perldebguts>.
=item ${^TAINT}
X<${^TAINT}>
Reflects if taint mode is on or off. 1 for on (the program was run with
B<-T>), 0 for off, -1 when only taint warnings are enabled (i.e. with
B<-t> or B<-TU>).
This variable is read-only.
This variable was added in Perl v5.8.0.
=item ${^UNICODE}
X<${^UNICODE}>
Reflects certain Unicode settings of Perl. See L<perlrun>
documentation for the C<-C> switch for more information about
the possible values.
This variable is set during Perl startup and is thereafter read-only.
This variable was added in Perl v5.8.2.
=item ${^UTF8CACHE}
X<${^UTF8CACHE}>
This variable controls the state of the internal UTF-8 offset caching code.
1 for on (the default), 0 for off, -1 to debug the caching code by checking
all its results against linear scans, and panicking on any discrepancy.
This variable was added in Perl v5.8.9. It is subject to change or
removal without notice, but is currently used to avoid recalculating the
boundaries of multi-byte UTF-8-encoded characters.
=item ${^UTF8LOCALE}
X<${^UTF8LOCALE}>
This variable indicates whether a UTF-8 locale was detected by perl at
startup. This information is used by perl when it's in
adjust-utf8ness-to-locale mode (as when run with the C<-CL> command-line
switch); see L<perlrun> for more info on this.
This variable was added in Perl v5.8.8.
=back
=head2 Deprecated and removed variables
Deprecating a variable announces the intent of the perl maintainers to
eventually remove the variable from the language. It may still be
available despite its status. Using a deprecated variable triggers
a warning.
Once a variable is removed, its use triggers an error telling you
the variable is unsupported.
See L<perldiag> for details about error messages.
=over 8
=item $#
X<$#>
C<$#> was a variable that could be used to format printed numbers.
After a deprecation cycle, its magic was removed in Perl v5.10.0 and
using it now triggers a warning: C<$# is no longer supported>.
This is not the sigil you use in front of an array name to get the
last index, like C<$#array>. That's still how you get the last index
of an array in Perl. The two have nothing to do with each other.
Deprecated in Perl 5.
Removed in Perl v5.10.0.
=item $*
X<$*>
C<$*> was a variable that you could use to enable multiline matching.
After a deprecation cycle, its magic was removed in Perl v5.10.0.
Using it now triggers a warning: C<$* is no longer supported>.
You should use the C</s> and C</m> regexp modifiers instead.
Deprecated in Perl 5.
Removed in Perl v5.10.0.
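For comparison, here is a brief sketch of what the two modifiers do on a two-line string. The sample text is arbitrary.

```perl
use strict;
use warnings;

my $text = "first\nsecond";

# Without /m, ^ and $ match only at the ends of the whole string.
my $plain = ($text =~ /^second$/)  ? 1 : 0;
# With /m, ^ and $ also match next to embedded newlines.
my $multi = ($text =~ /^second$/m) ? 1 : 0;

# Without /s, "." does not match a newline.
my $dot   = ($text =~ /first.second/)  ? 1 : 0;
# With /s, "." matches any character, newline included.
my $dot_s = ($text =~ /first.second/s) ? 1 : 0;
```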
=item $[
X<$[>
This variable stores the index of the first element in an array, and
of the first character in a substring. The default is 0, but you could
theoretically set it to 1 to make Perl behave more like B<awk> (or Fortran)
when subscripting and when evaluating the index() and substr() functions.
As of release 5 of Perl, assignment to C<$[> is treated as a compiler
directive, and cannot influence the behavior of any other file.
(That's why you can only assign compile-time constants to it.)
Its use is highly discouraged.
Prior to Perl v5.10.0, assignment to C<$[> could be seen from outer lexical
scopes in the same file, unlike other compile-time directives (such as
L<strict>). Using local() on it would bind its value strictly to a lexical
block. Now it is always lexically scoped.
As of Perl v5.16.0, it is implemented by the L<arybase> module. See
L<arybase> for more details on its behaviour.
Under C<use v5.16>, or C<no feature "array_base">, C<$[> no longer has any
effect, and always contains 0. Assigning 0 to it is permitted, but any
other value will produce an error.
Mnemonic: [ begins subscripts.
Deprecated in Perl v5.12.0.
=back
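The C<$*> entry above points to the C</s> and C</m> modifiers as the replacement. The following is a minimal sketch of what each modifier changes, using an illustrative two-line string; the variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ matches only at the very start of the string.
my $plain = () = $text =~ /^\w+/g;

# With /m, ^ also matches just after each embedded newline.
my $multi = () = $text =~ /^\w+/mg;

# Without /s, "." does not match a newline; with /s it does.
my $dot_plain = $text =~ /first.*second/  ? 1 : 0;
my $dot_s     = $text =~ /first.*second/s ? 1 : 0;

print "line starts without /m: $plain, with /m: $multi\n";
print "spans newline without /s: $dot_plain, with /s: $dot_s\n";
```

Unlike C<$*>, the modifiers apply per pattern, so one match can be multiline without affecting any other.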
=cut
=head1 NAME
perl581delta - what is new for perl v5.8.1
=head1 DESCRIPTION
This document describes differences between the 5.8.0 release and
the 5.8.1 release.
If you are upgrading from an earlier release such as 5.6.1, first read
L<perl58delta>, which describes differences between 5.6.0 and
5.8.0.
In case you are wondering about 5.6.1, in terms of bug fixes it was
nearly identical to the development release 5.7.1. Confused? This timeline
should help a bit: it lists the new major releases, their maintenance
releases, and the development releases.
    New     Maintenance  Development  Date
    5.6.0                             2000-Mar-22
                         5.7.0        2000-Sep-02
            5.6.1                     2001-Apr-08
                         5.7.1        2001-Apr-09
                         5.7.2        2001-Jul-13
                         5.7.3        2002-Mar-05
    5.8.0                             2002-Jul-18
            5.8.1                     2003-Sep-25
=head1 Incompatible Changes
=head2 Hash Randomisation
Mainly for security reasons, the "random ordering" of hashes
has been made even more random. Previously while the order of hash
elements from keys(), values(), and each() was essentially random,
it was still repeatable. Now, however, the order varies between
different runs of Perl.
B<Perl has never guaranteed any ordering of the hash keys>, and the
ordering has already changed several times during the lifetime of
Perl 5. Also, the ordering of hash keys has always been, and
continues to be, affected by the insertion order.
The added randomness may affect applications.
One possible scenario is when output of an application has included
hash data. For example, if you have used the Data::Dumper module to
dump data into different files, and then compared the files to see
whether the data has changed, now you will have false positives since
the order in which hashes are dumped will vary. In general the cure
is to sort the keys (or the values); in particular for Data::Dumper to
use the C<Sortkeys> option. If some particular order is really
important, use tied hashes: for example the Tie::IxHash module
which by default preserves the order in which the hash elements
were added.
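The cure described above, sorting the keys and using the C<Sortkeys> option of Data::Dumper, can be sketched as follows. The hash contents and variable names are illustrative assumptions, not taken from any particular application.

```perl
use strict;
use warnings;
use Data::Dumper;

my %stock = (pear => 3, apple => 7, fig => 1);

# Iterate in a defined order instead of relying on the order of keys().
my @report = map { "$_=$stock{$_}" } sort keys %stock;
my $line   = join(",", @report);
print "$line\n";

# Ask Data::Dumper to sort hash keys so that repeated dumps compare equal.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 0;
my $dump = Dumper(\%stock);
print "$dump\n";
```

With the keys sorted, two dumps of the same data should be textually identical from run to run, so comparing the files no longer gives false positives.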
A more subtle problem is reliance on the order of "global destruction".
That is what happens at the end of execution: Perl destroys all data
structures, including user data. If your destructors (the DESTROY
subroutines) have assumed any particular ordering to the global
destruction, there might be problems ahead. For example, in a
destructor of one object you cannot assume that objects of any other
class are still available, unless you hold a reference to them.
If the environment variable PERL_DESTRUCT_LEVEL is set to a non-zero
value, or if Perl is exiting a spawned thread, it will also destruct
the ordinary references and the symbol tables that are no longer in use.
You can't call a class method or an ordinary function on a class that
has been collected that way.
The hash randomisation is certain to reveal hidden assumptions about
some particular ordering of hash elements, and outright bugs: it
revealed a few bugs in the Perl core and core modules.
To disable the hash randomisation at run time, set the environment
variable PERL_HASH_SEED to 0 (zero) before running Perl (for more
information see L<perlrun/PERL_HASH_SEED>). To disable the feature
completely at compile time, compile with C<-DNO_HASH_SEED> (see F<INSTALL>).
See L<perlsec/"Algorithmic Complexity Attacks"> for the original
rationale behind this change.
=head2 UTF-8 On Filehandles No Longer Activated By Locale
In Perl 5.8.0 all filehandles, including the standard filehandles,
were implicitly set to be in Unicode UTF-8 if the locale settings
indicated the use of UTF-8. This feature caused too many problems,
so the feature was turned off and redesigned: see L</"Core Enhancements">.
=head2 Single-number v-strings are no longer v-strings before "=>"
The version strings or v-strings (see L<perldata/"Version Strings">)
feature introduced in Perl 5.6.0 has been a source of some confusion--
especially when the user did not want to use it, but Perl thought it
knew better. Especially troublesome has been the feature that before
a "=>" a version string (a "v" followed by digits) has been interpreted
as a v-string instead of a string literal. In other words:
%h = ( v65 => 42 );
has meant since Perl 5.6.0
%h = ( 'A' => 42 );
(at least on ASCII-based platforms). Perl 5.8.1 restores the
more natural interpretation
%h = ( 'v65' => 42 );
The multi-number v-strings like v65.66 and 65.66.67 still continue to
be v-strings in Perl 5.8.
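A small sketch of the two cases described above, assuming Perl 5.8.1 or later on an ASCII-based platform: a single-number C<v65> before C<< => >> is an ordinary string key, while a multi-number v-string is still built from character codes.

```perl
use strict;
use warnings;

# A single-number "v65" before => is treated as the plain string 'v65'.
my %h    = ( v65 => 42 );
my @keys = keys %h;
print "key: $keys[0]\n";

# A multi-number v-string still yields one character per number.
my $vs = v65.66;
print "v65.66 is: $vs\n";
```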
=head2 (Win32) The -C Switch Has Been Repurposed
The -C switch has changed in an incompatible way. The old semantics
of this switch only made sense in Win32 and only in the "use utf8"
universe in 5.6.x releases, and do not make sense for the Unicode
implementation in 5.8.0. Since this switch could not have been used
by anyone, it has been repurposed. The behavior that this switch
enabled in 5.6.x releases may be supported in a transparent,
data-dependent fashion in a future release.
For the new life of this switch, see L</"UTF-8 no longer default under
UTF-8 locales">, and L<perlrun/-C>.
=head2 (Win32) The /d Switch Of cmd.exe
Perl 5.8.1 uses the /d switch when running the cmd.exe shell
internally for system(), backticks, and when opening pipes to external
programs. The extra switch disables the execution of AutoRun commands
from the registry, which is generally considered undesirable when
running external programs. If you wish to retain compatibility with
the older behavior, set PERL5SHELL in your environment to C<cmd /x/c>.
=head1 Core Enhancements
=head2 UTF-8 no longer default under UTF-8 locales
In Perl 5.8.0 many Unicode features were introduced. One of them
was found to be more of a nuisance than a benefit: the automagic
(and silent) "UTF-8-ification" of filehandles, including the
standard filehandles, if the user's locale settings indicated
use of UTF-8.
For example, if you had C<en_US.UTF-8> as your locale, your STDIN and
STDOUT were automatically "UTF-8", in other words an implicit
binmode(..., ":utf8") was made. This meant that trying to print, say,
chr(0xff), ended up printing the bytes 0xc3 0xbf. Hardly what
you had in mind unless you were aware of this feature of Perl 5.8.0.
The problem is that the vast majority of people weren't: for example
in RedHat releases 8 and 9 the B<default> locale setting is UTF-8, so
all RedHat users got UTF-8 filehandles, whether they wanted it or not.
The pain was intensified by the Unicode implementation of Perl 5.8.0
(still) having nasty bugs, especially related to the use of s/// and
tr///. (Bugs that have been fixed in 5.8.1.)
Therefore a decision was made to backtrack the feature and change it
from implicit silent default to explicit conscious option. The new
Perl command line option C<-C> and its counterpart environment
variable PERL_UNICODE can now be used to control how Perl and Unicode
interact at interfaces like I/O and for example the command line
arguments. See L<perlrun/-C> and L<perlrun/PERL_UNICODE> for more
information.
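To make the byte-level effect described above concrete, here is a hedged sketch that writes chr(0xff) to two in-memory filehandles, one plain and one with an explicit C<:utf8> layer. It assumes no C<-C> switch, PERL_UNICODE setting, or C<open> pragma is changing the default layers; the variable names are illustrative.

```perl
use strict;
use warnings;

my $char = chr(0xff);

# Plain handle: the character is written as the single byte 0xff.
my $raw = '';
open(my $out_raw, '>', \$raw) or die "open: $!";
print {$out_raw} $char;
close($out_raw);

# Explicit :utf8 layer: the same character is written as two bytes.
my $utf8 = '';
open(my $out_utf8, '>:utf8', \$utf8) or die "open: $!";
print {$out_utf8} $char;
close($out_utf8);

# Show each buffer as a list of hexadecimal byte values.
my $raw_hex  = join(' ', map { sprintf('0x%02x', ord($_)) } split(//, $raw));
my $utf8_hex = join(' ', map { sprintf('0x%02x', ord($_)) } split(//, $utf8));
print "raw:  $raw_hex\n";
print "utf8: $utf8_hex\n";
```

The point of the 5.8.1 change is that the second behaviour now happens only when you ask for it.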
=head2 Unsafe signals again available
In Perl 5.8.0 the so-called "safe signals" were introduced. This
means that Perl no longer handles signals immediately but instead
"between opcodes", when it is safe to do so. The earlier immediate
handling easily could corrupt the internal state of Perl, resulting
in mysterious crashes.
However, the new safer model has its problems too. Because an
opcode, a basic unit of Perl execution, is now never interrupted but
instead left to run to completion, certain operations that can take a
long time now really do take a long time. For example, certain
network operations have their own blocking and timeout mechanisms, and
being able to interrupt them immediately would be nice.
Therefore perl 5.8.1 introduces a "backdoor" to restore the pre-5.8.0
(pre-5.7.3, really) signal behaviour. Just set the environment variable
PERL_SIGNALS to C<unsafe>, and the old immediate (and unsafe)
signal handling behaviour returns. See L<perlrun/PERL_SIGNALS>
and L<perlipc/"Deferred Signals (Safe Signals)">.
In completely unrelated news, you can now use safe signals with
POSIX::SigAction. See L<POSIX/POSIX::SigAction>.
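As a rough sketch of the POSIX::SigAction usage mentioned above, the following installs a handler with the C<safe> flag set and then signals the current process. It assumes a Unix-like platform where C<SIGUSR1> and sending a signal to yourself are available; the counter variable and the short polling loop are illustrative choices only.

```perl
use strict;
use warnings;
use POSIX qw(SIGUSR1 sigaction);

my $received = 0;

# Build the action and mark it "safe" so that the handler is deferred
# until Perl is between opcodes.
my $action = POSIX::SigAction->new(sub { $received++ });
$action->safe(1);
sigaction(SIGUSR1, $action) or die "sigaction failed: $!";

# Send the signal to ourselves and wait briefly for the handler to run.
kill 'USR1', $$;
for (1 .. 50) {
    last if $received;
    select(undef, undef, undef, 0.01);
}

print "handler ran $received time(s)\n";
```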
=head2 Tied Arrays with Negative Array Indices
Formerly, the indices passed to C<FETCH>, C<STORE>, C<EXISTS>, and
C<DELETE> methods in a tied array class were always non-negative. If
the actual argument was negative, Perl would call FETCHSIZE implicitly
and add the result to the index before passing the result to the tied
array method. This behaviour is now optional. If the tied array class
contains a package variable named C<$NEGATIVE_INDICES> which is set to
a true value, negative values will be passed to C<FETCH>, C<STORE>,
C<EXISTS>, and C<DELETE> unchanged.
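A minimal sketch of a tied array class that opts in to receiving negative indices. The class name C<RecordingArray> and its C<seen> field are hypothetical, introduced only to show which index C<FETCH> is given.

```perl
use strict;
use warnings;

package RecordingArray;    # hypothetical class name for illustration

# Opt in: negative indices reach FETCH and friends unchanged.
our $NEGATIVE_INDICES = 1;

sub TIEARRAY  { my ($class) = @_; return bless { seen => [] }, $class }
sub FETCHSIZE { return 3 }
sub FETCH {
    my ($self, $index) = @_;
    push @{ $self->{seen} }, $index;    # remember what Perl passed in
    return "element $index";
}

package main;

my $object = tie my @array, 'RecordingArray';
my $last   = $array[-1];

print "FETCH saw index: $object->{seen}[0]\n";
```

Without the C<$NEGATIVE_INDICES> variable, Perl would instead call C<FETCHSIZE> and pass the adjusted, non-negative index.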
=head2 local ${$x}
The syntaxes
local ${$x}
local @{$x}
local %{$x}
now do localise variables, provided that $x holds a valid variable name.
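A short sketch of localising a package variable through its name, assuming C<$x> holds the name as a string (so C<strict refs> must be relaxed for that block). The variable C<$setting> and the helper C<current> are illustrative.

```perl
use strict;
use warnings;

our $setting = 'outer';
my $x = 'main::setting';    # name of the package variable to localise

sub current { return $main::setting }

my $inside;
{
    no strict 'refs';          # $x holds a name, not a reference
    local ${$x} = 'inner';     # temporarily replace $main::setting
    $inside = current();
}
my $after = current();

print "inside: $inside, after: $after\n";
```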
=head2 Unicode Character Database 4.0.0
The copy of the Unicode Character Database included in Perl 5.8 has
been updated to 4.0.0 from 3.2.0. This means for example that the
Unicode character properties are as in Unicode 4.0.0.
=head2 Deprecation Warnings
There is one new feature deprecation. Perl 5.8.0 forgot to add
some deprecation warnings; these warnings have now been added.
Finally, there is a reminder of an impending feature removal.
=head3 (Reminder) Pseudo-hashes are deprecated (really)
Pseudo-hashes were deprecated in Perl 5.8.0 and will be removed in
Perl 5.10.0, see L<perl58delta> for details. Each attempt to access
pseudo-hashes will trigger the warning C<Pseudo-hashes are deprecated>.
If you really want to continue using pseudo-hashes but not to see the
deprecation warnings, use:
no warnings 'deprecated';
Or you can continue to use the L<fields> pragma, but please don't
expect the data structures to be pseudohashes any more.
=head3 (Reminder) 5.005-style threads are deprecated (really)
5.005-style threads (activated by C<use Thread;>) were deprecated in
Perl 5.8.0 and will be removed after Perl 5.8, see L<perl58delta> for
details. Each 5.005-style thread creation will trigger the warning
C<5.005 threads are deprecated>. If you really want to continue
using the 5.005 threads but not to see the deprecation warnings, use:
no warnings 'deprecated';
=head3 (Reminder) The $* variable is deprecated (really)
The C<$*> variable controlling multi-line matching has been deprecated
and will be removed after 5.8. The variable has been deprecated for a
long time, and a deprecation warning C<Use of $* is deprecated> is given;
now the variable will finally be removed. The functionality has
been supplanted by the C</s> and C</m> modifiers on pattern matching.
If you really want to continue using the C<$*>-variable but not to see
the deprecation warnings, use:
no warnings 'deprecated';
=head2 Miscellaneous Enhancements
C<map> in void context is no longer expensive. C<map> is now context
aware, and will not construct a list if called in void context.
If a socket gets closed by the server while printing to it, the client
now gets a SIGPIPE. While this new feature was not planned, it fell
naturally out of PerlIO changes, and is to be considered an accidental
feature.
PerlIO::get_layers(FH) returns the names of the PerlIO layers
active on a filehandle.
PerlIO::via layers can now have an optional UTF8 method to
indicate whether the layer wants to "auto-:utf8" the stream.
utf8::is_utf8() has been added as a quick way to test whether
a scalar is encoded internally in UTF-8 (Unicode).
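The two introspection functions mentioned above can be tried out with a sketch like the following. The strings and the in-memory filehandle are illustrative assumptions; the exact list of layers reported may vary with platform and configuration.

```perl
use strict;
use warnings;

# A character above 0xff has to be stored internally as UTF-8.
my $wide   = chr(0x263A);
my $narrow = "abc";
my $wide_flag   = utf8::is_utf8($wide)   ? 1 : 0;
my $narrow_flag = utf8::is_utf8($narrow) ? 1 : 0;
print "wide: $wide_flag, narrow: $narrow_flag\n";

# List the layers on an in-memory handle opened with :utf8.
my $buffer = '';
open(my $fh, '>:utf8', \$buffer) or die "open: $!";
my @layers = PerlIO::get_layers($fh);
close($fh);
print "layers: @layers\n";
```

Note that utf8::is_utf8() reports how the scalar is stored internally, not whether its contents are valid or meaningful Unicode text.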
=head1 Modules and Pragmata
=head2 Updated Modules And Pragmata
The following modules and pragmata have been updated since Perl 5.8.0:
=over 4
=item base
=item B::Bytecode
In much better shape than it used to be. Still far from perfect, but
maybe worth a try.
=item B::Concise
=item B::Deparse
=item Benchmark
An optional feature, C<:hireswallclock>, now allows for high
resolution wall clock times (uses Time::HiRes).
=item ByteLoader
See B::Bytecode.
=item bytes
Now has bytes::substr.
=item CGI
=item charnames
One can now have custom character name aliases.
=item CPAN
There is now a simple command line frontend to the CPAN.pm
module called F<cpan>.
=item Data::Dumper
A new option, Pair, allows choosing the separator between hash keys
and values.
=item DB_File
=item Devel::PPPort
=item Digest::MD5
=item Encode
Significant updates on the encoding pragma functionality
(tr/// and the DATA filehandle, formats).
If a filehandle has been marked as having an encoding, unmappable
characters are detected already during input, not later (when the
corrupted data is being used).
The ISO 8859-6 conversion table has been corrected (the 0x30..0x39
erroneously mapped to U+0660..U+0669, instead of U+0030..U+0039). The
GSM 03.38 conversion did not handle escape sequences correctly. The
UTF-7 encoding has been added (making Encode feature-complete with
Unicode::String).
=item fields
=item libnet
=item Math::BigInt
A lot of bugs have been fixed since v1.60, the version included in Perl
v5.8.0. Especially noteworthy are the bug in Calc that caused div and mod to
fail for some large values, and the fixes to the handling of bad inputs.
Some new features were added, e.g. the broot() method, you can now pass
parameters to config() to change some settings at runtime, and it is now
possible to trap the creation of NaN and infinity.
As usual, some optimizations took place and made the math overall a tad
faster. In some cases, quite a lot faster, actually. Especially alternative
libraries like Math::BigInt::GMP benefit from this. In addition, a lot of the
quite clunky routines like fsqrt() and flog() are now much much faster.
=item MIME::Base64
=item NEXT
Diamond inheritance now works.
=item Net::Ping
=item PerlIO::scalar
Reading from non-string scalars (like the special variables, see
L<perlvar>) now works.
=item podlators
=item Pod::LaTeX
=item PodParsers
=item Pod::Perldoc
Complete rewrite. As a side-effect, no longer refuses to start up when
run by root.
=item Scalar::Util
New utilities: refaddr, isvstring, looks_like_number, set_prototype.
=item Storable
Can now store code references (via B::Deparse, so not foolproof).
=item strict
Earlier versions of the strict pragma did not check the parameters
implicitly passed to its "import" (use) and "unimport" (no) routine.
This allowed false idioms such as:
use strict qw(@ISA);
@ISA = qw(Foo);
This however (probably) raised the false expectation that the strict
refs, vars and subs were being enforced (and that @ISA was somehow
"declared"). But the strict refs, vars, and subs are B<not> enforced
when using this false idiom.
Starting from Perl 5.8.1, the above B<will> cause an error to be
raised. This may cause programs which used to execute seemingly
correctly without warnings and errors to fail when run under 5.8.1.
This happens because
use strict qw(@ISA);
will now fail with the error:
Unknown 'strict' tag(s) '@ISA'
The remedy to this problem is to replace this code with the correct idiom:
use strict;
use vars qw(@ISA);
@ISA = qw(Foo);
=item Term::ANSIColor
=item Test::Harness
Now much more picky about extra or missing output from test scripts.
=item Test::More
=item Test::Simple
=item Text::Balanced
=item Time::HiRes
Use of nanosleep(), if available, allows mixing subsecond sleeps with
alarms.
=item threads
Several fixes, for example for join() problems and memory
leaks. In some platforms (like Linux) that use glibc the minimum memory
footprint of one ithread has been reduced by several hundred kilobytes.
=item threads::shared
Many memory leaks have been fixed.
=item Unicode::Collate
=item Unicode::Normalize
=item Win32::GetFolderPath
=item Win32::GetOSVersion
Now returns extra information.
=back
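Three of the new Scalar::Util utilities listed above can be sketched as follows. The data, the class name C<Some::Class>, and the sample strings are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr looks_like_number isvstring);

# refaddr: the numeric address of a reference, unaffected by blessing.
my $data   = { name => 'example' };
my $before = refaddr($data);
bless $data, 'Some::Class';    # hypothetical class name
my $after  = refaddr($data);

# looks_like_number: would Perl treat this string as a number?
my @numeric = grep { looks_like_number($_) } ('42', '1e3', 'abc', '0x10');

# isvstring: was this value created from a v-string literal?
my $version = v1.22.333;
my $is_v    = isvstring($version)   ? 1 : 0;
my $not_v   = isvstring('1.22.333') ? 1 : 0;

print "same address after bless: ", ($before == $after ? "yes" : "no"), "\n";
print "numeric: @numeric\n";
print "v-string: $is_v, plain string: $not_v\n";
```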
=head1 Utility Changes
The C<h2xs> utility now produces a more modern layout:
F<Foo-Bar/lib/Foo/Bar.pm> instead of F<Foo/Bar/Bar.pm>.
Also, the boilerplate test is now called F<t/Foo-Bar.t>
instead of F<t/1.t>.
The Perl debugger (F<lib/perl5db.pl>) has now been extensively
documented and bugs found while documenting have been fixed.
C<perldoc> has been rewritten from scratch to be more robust and
feature rich.
C<perlcc -B> works now at least somewhat better, while C<perlcc -c>
is rather more broken. (The Perl compiler suite as a whole continues
to be experimental.)
=head1 New Documentation
perl573delta has been added to list the differences between the
(now quite obsolete) development releases 5.7.2 and 5.7.3.
perl58delta has been added: it is the perldelta of 5.8.0, detailing
the differences between 5.6.0 and 5.8.0.
perlartistic has been added: it is the Artistic License in pod format,
making it easier for modules to refer to it.
perlcheat has been added: it is a Perl cheat sheet.
perlgpl has been added: it is the GNU General Public License in pod
format, making it easier for modules to refer to it.
perlmacosx has been added to tell about the installation and use
of Perl in Mac OS X.
perlos400 has been added to tell about the installation and use
of Perl in OS/400 PASE.
perlreref has been added: it is a regular expressions quick reference.
=head1 Installation and Configuration Improvements
The Unix standard Perl location, F</usr/bin/perl>, is no longer
overwritten by default if it exists. This change was very prudent
because so many Unix vendors already provide a F</usr/bin/perl>,
but simultaneously many system utilities may depend on that
exact version of Perl, so better not to overwrite it.
One can now specify installation directories for site and vendor man
and HTML pages, and site and vendor scripts. See F<INSTALL>.
One can now specify a destination directory for Perl installation
by specifying the DESTDIR variable for C<make install>. (This feature
is slightly different from the previous C<Configure -Dinstallprefix=...>.)
See F<INSTALL>.
gcc versions 3.x introduced a new warning that caused a lot of noise
during Perl compilation: C<gcc -Ialreadyknowndirectory (warning:
changing search order)>. This warning has now been avoided by
Configure weeding out such directories before the compilation.
One can now build subsets of Perl core modules by using the
Configure flags C<-Dnoextensions=...> and C<-Donlyextensions=...>,
see F<INSTALL>.
=head2 Platform-specific enhancements
In Cygwin Perl can now be built with threads (C<Configure -Duseithreads>).
This works with both Cygwin 1.3.22 and Cygwin 1.5.3.
In newer FreeBSD releases Perl 5.8.0 compilation failed because of
trying to use F<malloc.h>, which in FreeBSD is just a dummy file that
it is a fatal error even to try to use. Now F<malloc.h> is not used.
Perl is now known to build also in Hitachi HI-UXMPP.
Perl is now known to build again in LynxOS.
Mac OS X now installs with Perl version number embedded in
installation directory names for easier upgrading of user-compiled
Perl, and the installation directories in general are more standard.
In other words, the default installation no longer breaks the
Apple-provided Perl. On the other hand, with C<Configure -Dprefix=/usr>
you can now really replace the Apple-supplied Perl (B<please be careful>).
Mac OS X now builds Perl statically by default. This change was done
mainly for faster startup times. The Apple-provided Perl is still
dynamically linked and shared, and you can enable the sharedness for
your own Perl builds by C<Configure -Duseshrplib>.
Perl has been ported to IBM's OS/400 PASE environment. The best way
to build a Perl for PASE is to use an AIX host as a cross-compilation
environment. See README.os400.
Yet another cross-compilation option has been added: now Perl builds
on OpenZaurus, a Linux distribution based on Mandrake + Embedix for
the Sharp Zaurus PDA. See the Cross/README file.
Tru64 when using gcc 3 drops the optimisation for F<toke.c> to C<-O2>
because of gigantic memory use with the default C<-O3>.
Tru64 can now build Perl with the newer Berkeley DBs.
Building Perl on WinCE has been much enhanced, see F<README.ce>
and F<README.perlce>.
=head1 Selected Bug Fixes
=head2 Closures, eval and lexicals
There have been many fixes in the area of anonymous subs, lexicals and
closures. Although this means that Perl is now more "correct", it is
possible that some existing code will break that happens to rely on
the faulty behaviour. In practice this is unlikely unless your code
contains a very complex nesting of anonymous subs, evals and lexicals.
=head2 Generic fixes
If an input filehandle is marked C<:utf8> and Perl sees illegal UTF-8
coming in when doing C<< <FH> >>, if warnings are enabled a warning is
immediately given - instead of being silent about it and Perl being
unhappy about the broken data later. (The C<:encoding(utf8)> layer
also works the same way.)
binmode(SOCKET, ":utf8") only worked on the input side, not on the
output side of the socket. Now it works both ways.
For threaded Perls certain system database functions like getpwent()
and getgrent() now grow their result buffer dynamically, instead of
failing. This means that at sites with lots of users and groups the
functions no longer fail by returning only partial results.
Perl 5.8.0 had accidentally broken the capability for users
to define their own uppercase<->lowercase Unicode mappings
(as advertised by the Camel). This feature has been fixed and
is also documented better.
In 5.8.0 this
$some_unicode .= <FH>;
didn't work correctly but instead corrupted the data. This has now
been fixed.
Tied methods like FETCH etc. may now safely access tied values, i.e.
resulting in a recursive call to FETCH etc. Remember to break the
recursion, though.
At startup Perl blocks the SIGFPE signal away since there isn't much
Perl can do about it. Previously this blocking was in effect also for
programs executed from within Perl. Now Perl restores the original
SIGFPE handling routine, whatever it was, before running external
programs.
Line numbers in Perl scripts may now be greater than 65536, or 2**16.
(Perl scripts have always been able to be larger than that; it's just
that the line numbers in reported errors and warnings "wrapped
around".) While scripts that large usually indicate a need to rethink
your code a bit, such Perl scripts do exist, for example as results
from generated code. Now line numbers can go all the way to
4294967296, or 2**32.
=head2 Platform-specific fixes
Linux
=over 4
=item *
Setting $0 works again (with certain limitations that
Perl cannot do much about: see L<perlvar/$0>)
=back
HP-UX
=over 4
=item *
Setting $0 now works.
=back
VMS
=over 4
=item *
Configuration now tests for the presence of C<poll()>, and IO::Poll
now uses the vendor-supplied function if detected.
=item *
A rare access violation at Perl start-up could occur if the Perl image was
installed with privileges or if there was an identifier with the
subsystem attribute set in the process's rightslist. Either of these
circumstances triggered tainting code that contained a pointer bug.
The faulty pointer arithmetic has been fixed.
=item *
The length limit on values (not keys) in the %ENV hash has been raised
from 255 bytes to 32640 bytes (except when the PERL_ENV_TABLES setting
overrides the default use of logical names for %ENV). If it is
necessary to access these long values from outside Perl, be aware that
they are implemented using search list logical names that store the
value in pieces, each 255-byte piece (up to 128 of them) being an
element in the search list. When doing a lookup in %ENV from within
Perl, the elements are combined into a single value. The existing
VMS-specific ability to access individual elements of a search list
logical name via the $ENV{'foo;N'} syntax (where N is the search list
index) is unimpaired.
=item *
The piping implementation now uses local rather than global DCL
symbols for inter-process communication.
=item *
File::Find could become confused when navigating to a relative
directory whose name collided with a logical name. This problem has
been corrected by adding directory syntax to relative path names, thus
preventing logical name translation.
=back
Win32
=over 4
=item *
A memory leak in the fork() emulation has been fixed.
=item *
The return value of the ioctl() built-in function was accidentally
broken in 5.8.0. This has been corrected.
=item *
The internal message loop executed by perl during blocking operations
sometimes interfered with messages that were external to Perl.
This often resulted in blocking operations terminating prematurely or
returning incorrect results, when Perl was executing under environments
that could generate Windows messages. This has been corrected.
=item *
Pipes and sockets are now automatically in binary mode.
=item *
The four-argument form of select() did not preserve $! (errno) properly
when there were errors in the underlying call. This is now fixed.
=item *
The "CR CR LF" problem has been fixed; binmode(FH, ":crlf")
is now effectively a no-op.
=back
=head1 New or Changed Diagnostics
All the warnings related to pack() and unpack() were made more
informative and consistent.
=head2 Changed "A thread exited while %d threads were running"
The old version
A thread exited while %d other threads were still running
was misleading because the "other" included also the thread giving
the warning.
=head2 Removed "Attempt to clear a restricted hash"
It is not illegal to clear a restricted hash, so the warning
was removed.
=head2 New "Illegal declaration of anonymous subroutine"
You must specify the block of code for C<sub>.
=head2 Changed "Invalid range "%s" in transliteration operator"
The old version
Invalid [] range "%s" in transliteration operator
was simply wrong because there are no "[] ranges" in tr///.
=head2 New "Missing control char name in \c"
Self-explanatory.
=head2 New "Newline in left-justified string for %s"
The padding spaces would appear after the newline, which is
probably not what you had in mind.
=head2 New "Possible precedence problem on bitwise %c operator"
If you think this
$x & $y == 0
tests whether the bitwise AND of $x and $y is zero,
you will like this warning.
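The precedence issue behind this warning can be sketched with explicit parentheses, which show how Perl groups the expression versus what was probably intended. The values of C<$x> and C<$y> are illustrative.

```perl
use strict;
use warnings;

my ($x, $y) = (4, 3);

# How Perl parses "$x & $y == 0": the comparison binds first.
my $as_parsed = $x & ($y == 0);

# What was probably intended: test the result of the bitwise AND.
my $intended = ($x & $y) == 0;

print "as parsed: ", ($as_parsed ? "true" : "false"), "\n";
print "intended:  ", ($intended  ? "true" : "false"), "\n";
```

Writing the parentheses explicitly, as in either line above, also keeps the warning from being issued.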
=head2 New "Pseudo-hashes are deprecated"
This warning should already have been in 5.8.0, since pseudo-hashes are deprecated.
=head2 New "read() on %s filehandle %s"
You cannot read() (or sysread()) from a closed or unopened filehandle.
=head2 New "5.005 threads are deprecated"
This warning should already have been in 5.8.0, since 5.005 threads are deprecated.
=head2 New "Tied variable freed while still in use"
Something pulled the plug on a live tied variable, so Perl plays
safe by bailing out.
=head2 New "To%s: illegal mapping '%s'"
An illegal user-defined Unicode casemapping was specified.
=head2 New "Use of freed value in iteration"
Something modified the values being iterated over. This is not good.
=head1 Changed Internals
This news matters to you only if you either write XS code or like to
know about or hack Perl internals (using Devel::Peek or any of the
C<B::> modules counts), or like to run Perl with the C<-D> option.
The embedding examples of L<perlembed> have been reviewed to be
up to date and consistent: for example, the correct use of
PERL_SYS_INIT3() and PERL_SYS_TERM().
Extensive reworking of the pad code (the code responsible
for lexical variables) has been conducted by Dave Mitchell.
Extensive work on the v-strings by John Peacock.
UTF-8 length and position cache: to speed up the handling of Unicode
(UTF-8) scalars, a cache was introduced. Potential problems exist if
an extension bypasses the official APIs and directly modifies the PV
of an SV: the UTF-8 cache does not get cleared as it should.
APIs obsoleted in Perl 5.8.0, like sv_2pv, sv_catpvn, sv_catsv,
sv_setsv, are again available.
Certain Perl core C APIs like cxinc and regatom are no longer
available at all to code outside the Perl core or the Perl core
extensions. This is intentional. They never should have been
available with the shorter names, and if your application depends on
them, you should (be ashamed and) contact perl5-porters to discuss
what the proper APIs are.
Certain Perl core C APIs like C<Perl_list> are no longer available
without their C<Perl_> prefix. If your XS module stops working
because some functions cannot be found, in many cases a simple fix is
to add the C<Perl_> prefix to the function and the thread context
C<aTHX_> as the first argument of the function call. This is also how
it should always have been done: letting the Perl_-less forms leak
from the core was an accident. For cleaner embedding you can also
force this for all APIs by defining at compile time the cpp define
PERL_NO_SHORT_NAMES.
Perl_save_bool() has been added.
Regexp objects (those created with C<qr>) now have S-magic rather than
R-magic. This fixed regexps of the form /...(??{...;$x})/ to no
longer ignore changes made to $x. The S-magic avoids dropping
the caching optimization and making (??{...}) constructs obscenely
slow (and consequently useless). See also L<perlguts/"Magic Variables">.
Regexp::Copy was affected by this change.
The Perl internal debugging macros DEBUG() and DEB() have been renamed
to PERL_DEBUG() and PERL_DEB() to avoid namespace conflicts.
C<-DL> removed (the leaktest had been broken and unsupported for years,
use alternative debugging mallocs or tools like valgrind and Purify).
Verbose modifier C<v> added for C<-DXv> and C<-Dsv>, see L<perlrun>.
=head1 New Tests
In Perl 5.8.0 there were about 69000 separate tests in about 700 test files,
in Perl 5.8.1 there are about 77000 separate tests in about 780 test files.
The exact numbers depend on the Perl configuration and on the operating
system platform.
=head1 Known Problems
The hash randomisation mentioned in L</Incompatible Changes> is definitely
problematic: it will wake dormant bugs and shake out bad assumptions.
If you want to use mod_perl 2.x with Perl 5.8.1, you will need
mod_perl-1.99_10 or higher. Earlier versions of mod_perl 2.x
do not work with the randomised hashes. (mod_perl 1.x works fine.)
You will also need Apache::Test 1.04 or higher.
Many of the rarer platforms that worked 100% or pretty close to it
with perl 5.8.0 have been left a little bit untended since their
maintainers have been otherwise busy lately, and therefore there will
be more failures on those platforms. Such platforms include Mac OS
Classic, IBM z/OS (and other EBCDIC platforms), and NetWare. The most
common Perl platforms (Unix and Unix-like, Microsoft platforms, and
VMS) have large enough testing and expert population that they are
doing well.
=head2 Tied hashes in scalar context
Tied hashes do not currently return anything useful in scalar context,
for example when used as boolean tests:
if (%tied_hash) { ... }
The current nonsensical behaviour is always to return false,
regardless of whether the hash is empty or has elements.
The root cause is that there is no interface for the implementors of
tied hashes to implement the behaviour of a hash in scalar context.
=head2 Net::Ping 450_service and 510_ping_udp failures
The subtests 9 and 18 of lib/Net/Ping/t/450_service.t, and the
subtest 2 of lib/Net/Ping/t/510_ping_udp.t might fail if you have
an unusual networking setup. For example in the latter case the
test is trying to send a UDP ping to the IP address 127.0.0.1.
=head2 B::C
The C-generating compiler backend B::C (the frontend being
C<perlcc -c>) is even more broken than it used to be because of
the extensive lexical variable changes. (The good news is that
B::Bytecode and ByteLoader are better than they used to be.)
=head1 Platform Specific Problems
=head2 EBCDIC Platforms
IBM z/OS and other EBCDIC platforms continue to be problematic
regarding Unicode support. Many Unicode tests are skipped when
they really should be fixed.
=head2 Cygwin 1.5 problems
In Cygwin 1.5 the F<io/tell> and F<op/sysio> tests have failures for
some as yet unknown reason. In 1.5.5 the threads tests stress_cv,
stress_re, and stress_string are failing unless the environment
variable PERLIO is set to "perlio" (which makes also the io/tell
failure go away).
Perl 5.8.1 does build and work well with Cygwin 1.3: with (uname -a)
C<CYGWIN_NT-5.0 ... 1.3.22(0.78/3/2) 2003-03-18 09:20 i686 ...>
a 100% "make test" was achieved with C<Configure -des -Duseithreads>.
=head2 HP-UX: HP cc warnings about sendfile and sendpath
With certain HP C compiler releases (e.g. B.11.11.02) you will
get many warnings like this (lines wrapped for easier reading):
cc: "/usr/include/sys/socket.h", line 504: warning 562:
Redeclaration of "sendfile" with a different storage class specifier:
"sendfile" will have internal linkage.
cc: "/usr/include/sys/socket.h", line 505: warning 562:
Redeclaration of "sendpath" with a different storage class specifier:
"sendpath" will have internal linkage.
The warnings show up both during the build of Perl and during certain
lib/ExtUtils tests that invoke the C compiler. The warning, however,
is not serious and can be ignored.
=head2 IRIX: t/uni/tr_7jis.t falsely failing
The test t/uni/tr_7jis.t is known to report failure under 'make test'
or the test harness with certain releases of IRIX (at least IRIX 6.5
and MIPSpro Compilers Version 7.3.1.1m), but if run manually the test
fully passes.
=head2 Mac OS X: no usemymalloc
The Perl malloc (C<-Dusemymalloc>) does not work at all in Mac OS X.
This is not that serious, though, since the native malloc works just
fine.
=head2 Tru64: No threaded builds with GNU cc (gcc)
In the latest Tru64 releases (e.g. v5.1B or later) gcc cannot be used
to compile a threaded Perl (-Duseithreads) because the system
C<< <pthread.h> >> file doesn't know about gcc.
=head2 Win32: sysopen, sysread, syswrite
As of the 5.8.0 release, sysopen()/sysread()/syswrite() do not behave
like they used to in 5.6.1 and earlier with respect to "text" mode.
These built-ins now always operate in "binary" mode (even if sysopen()
was passed the O_TEXT flag, or if binmode() was used on the file
handle). Note that this issue should only make a difference for disk
files, as sockets and pipes have always been in "binary" mode in the
Windows port. As this behavior is currently considered a bug,
compatible behavior may be re-introduced in a future release. Until
then, the use of sysopen(), sysread() and syswrite() is not supported
for "text" mode operations.
=head1 Future Directions
The following things B<might> happen in future. The first publicly
available releases having these characteristics will be the developer
releases Perl 5.9.x, culminating in the Perl 5.10.0 release. These
are our best guesses at the moment: we reserve the right to rethink.
=over 4
=item *
PerlIO will become The Default. Currently (in Perl 5.8.x) the stdio
library is still used if Perl thinks it can use certain tricks to
make stdio go B<really> fast. For future releases our goal is to
make PerlIO go even faster.
=item *
A new feature called I<assertions> will be available. This means that
one can have code called assertions sprinkled in the code: usually
they are optimised away, but they can be enabled with the C<-A> option.
=item *
A new operator C<//> (defined-or) will be available. This means that
one will be able to say
$a // $b
instead of
defined $a ? $a : $b
and
$c //= $d;
instead of
$c = $d unless defined $c;
The operator will have the same precedence and associativity as C<||>.
A source code patch against the Perl 5.8.1 sources will be available
in CPAN as F<authors/id/H/HM/HMBRAND/dor-5.8.1.diff>.
=item *
C<unpack()> will default to unpacking the C<$_>.
=item *
Various Copy-On-Write techniques will be investigated in hopes
of speeding up Perl.
=item *
CPANPLUS, Inline, and Module::Build will become core modules.
=item *
The ability to write true lexically scoped pragmas will be introduced.
=item *
Work will continue on the bytecompiler and byteloader.
=item *
v-strings as they currently exist are scheduled to be deprecated. The
v-less form (1.2.3) will become a "version object" when used with C<use>,
C<require>, and C<$VERSION>. $^V will also be a "version object" so the
printf("%vd",...) construct will no longer be needed. The v-ful version
(v1.2.3) will become obsolete. The equivalence of strings and v-strings (e.g.
that currently 5.8.0 is equal to "\5\8\0") will go away. B<There may be no
deprecation warning for v-strings>, though: it is quite hard to detect when
v-strings are being used safely, and when they are not.
=item *
5.005 Threads Will Be Removed
=item *
The C<$*> Variable Will Be Removed
(it was deprecated a long time ago)
=item *
Pseudohashes Will Be Removed
=back
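As an illustration of the proposed defined-or operator from the list above, here is a small sketch. It assumes a perl in which C<//> and C<//=> are available (they shipped in Perl 5.10.0); the hash C<%config> and its keys are made up for the example.

```perl
use strict;
use warnings;

my %config = (retries => 0);    # hypothetical settings

# 0 is defined, so // keeps it; || would discard it as false.
my $retries_dor = $config{retries} // 3;    # 0
my $retries_or  = $config{retries} || 3;    # 3

# A missing key is undefined, so the default is used.
my $timeout = $config{timeout} // 30;       # 30

# //= assigns only when the left side is undefined.
$config{retries} //= 5;                     # stays 0
$config{verbose} //= 1;                     # becomes 1

print "$retries_dor $retries_or $timeout $config{retries} $config{verbose}\n";
```

The difference from C<||> only matters for values that are defined but false, such as C<0> or the empty string.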
=head1 Reporting Bugs
If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://bugs.perl.org/ . There may also be
information at http://www.perl.com/ , the Perl Home Page.
If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team. You can browse and search
the Perl 5 bugs at http://bugs.perl.org/
=head1 SEE ALSO
The F<Changes> file for exhaustive details on what changed.
The F<INSTALL> file for how to build Perl.
The F<README> file for general stuff.
The F<Artistic> and F<Copying> files for copyright information.
=cut
=head1 NAME
perluniintro - Perl Unicode introduction
=head1 DESCRIPTION
This document gives a general idea of Unicode and how to use Unicode
in Perl. See L</Further Resources> for references to more in-depth
treatments of Unicode.
=head2 Unicode
Unicode is a character set standard which plans to codify all of the
writing systems of the world, plus many other symbols.
Unicode and ISO/IEC 10646 are coordinated standards that unify
almost all other modern character set standards,
covering more than 80 writing systems and hundreds of languages,
including all commercially-important modern languages. All characters
in the largest Chinese, Japanese, and Korean dictionaries are also
encoded. The standards will eventually cover almost all characters in
more than 250 writing systems and thousands of languages.
Unicode 1.0 was released in October 1991, and 6.0 in October 2010.
A Unicode I<character> is an abstract entity. It is not bound to any
particular integer width, especially not to the C language C<char>.
Unicode is language-neutral and display-neutral: it does not encode the
language of the text, and it does not generally define fonts or other graphical
layout details. Unicode operates on characters and on text built from
those characters.
Unicode defines characters like C<LATIN CAPITAL LETTER A> or C<GREEK
SMALL LETTER ALPHA> and unique numbers for the characters, in this
case 0x0041 and 0x03B1, respectively. These unique numbers are called
I<code points>. A code point is essentially the position of the
character within the set of all possible Unicode characters, and thus in
Perl, the term I<ordinal> is often used interchangeably with it.
The Unicode standard prefers using hexadecimal notation for the code
points. If numbers like C<0x0041> are unfamiliar to you, take a peek
at a later section, L</"Hexadecimal Notation">. The Unicode standard
uses the notation C<U+0041 LATIN CAPITAL LETTER A>, to give the
hexadecimal code point and the normative name of the character.
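To make the relationship between characters and code points concrete, here is a small sketch using C<ord()> and C<chr()>. It assumes an ASCII platform, and the helper name C<code_point_label> is hypothetical.

```perl
use strict;
use warnings;

# Format a character's code point in the U+XXXX style used by the
# Unicode standard. (code_point_label is a hypothetical helper name.)
sub code_point_label {
    my ($char) = @_;
    return sprintf("U+%04X", ord($char));
}

my $capital_a = chr(0x0041);    # LATIN CAPITAL LETTER A
my $alpha     = chr(0x03B1);    # GREEK SMALL LETTER ALPHA

print code_point_label($capital_a), "\n";    # U+0041
print code_point_label($alpha), "\n";        # U+03B1
```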
Unicode also defines various I<properties> for the characters, like
"uppercase" or "lowercase", "decimal digit", or "punctuation";
these properties are independent of the names of the characters.
Furthermore, various operations on the characters like uppercasing,
lowercasing, and collating (sorting) are defined.
A Unicode I<logical> "character" can actually consist of more than one internal
I<actual> "character" or code point. For Western languages, this is adequately
modelled by a I<base character> (like C<LATIN CAPITAL LETTER A>) followed
by one or more I<modifiers> (like C<COMBINING ACUTE ACCENT>). This sequence of
base character and modifiers is called a I<combining character
sequence>. Some non-western languages require more complicated
models, so Unicode created the I<grapheme cluster> concept, which was
later further refined into the I<extended grapheme cluster>. For
example, a Korean Hangul syllable is considered a single logical
character, but most often consists of three actual
Unicode characters: a leading consonant followed by an interior vowel followed
by a trailing consonant.
Whether to call these extended grapheme clusters "characters" depends on your
point of view. If you are a programmer, you probably would tend towards seeing
each element in the sequences as one unit, or "character". However from
the user's point of view, the whole sequence could be seen as one
"character" since that's probably what it looks like in the context of the
user's language. In this document, we take the programmer's point of
view: one "character" is one Unicode code point.
For some combinations of base character and modifiers, there are
I<precomposed> characters. There is a single character equivalent, for
example, for the sequence C<LATIN CAPITAL LETTER A> followed by
C<COMBINING ACUTE ACCENT>. It is called C<LATIN CAPITAL LETTER A WITH
ACUTE>. These precomposed characters are, however, only available for
some combinations, and are mainly meant to support round-trip
conversions between Unicode and legacy standards (like ISO 8859). Using
sequences, as Unicode does, allows for needing fewer basic building blocks
(code points) to express many more potential grapheme clusters. To
support conversion between equivalent forms, various I<normalization
forms> are also defined. Thus, C<LATIN CAPITAL LETTER A WITH ACUTE> is
in I<Normalization Form Composed>, (abbreviated NFC), and the sequence
C<LATIN CAPITAL LETTER A> followed by C<COMBINING ACUTE ACCENT>
represents the same character in I<Normalization Form Decomposed> (NFD).
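The two normalization forms can be compared directly with the core C<Unicode::Normalize> module. This is only an illustrative sketch; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

my $precomposed = "\x{C1}";      # LATIN CAPITAL LETTER A WITH ACUTE
my $sequence    = "A\x{301}";    # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT

# The two spellings differ code point by code point...
print "raw equal: ", ($precomposed eq $sequence ? "yes" : "no"), "\n";

# ...but agree once both are brought to the same normalization form.
print "NFC equal: ", (NFC($precomposed) eq NFC($sequence) ? "yes" : "no"), "\n";
print "NFD equal: ", (NFD($precomposed) eq NFD($sequence) ? "yes" : "no"), "\n";

printf "NFC length %d, NFD length %d\n",
    length(NFC($sequence)), length(NFD($precomposed));
```

Normalizing both operands to the same form before comparing is the usual way to treat such spellings as equal.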
Because of backward compatibility with legacy encodings, the "a unique
number for every character" idea breaks down a bit: instead, there is
"at least one number for every character". The same character could
be represented differently in several legacy encodings. The
converse is not true: some code points do not have an assigned
character. Firstly, there are unallocated code points within
otherwise used blocks. Secondly, there are special Unicode control
characters that do not represent true characters.
When Unicode was first conceived, it was thought that all the world's
characters could be represented using a 16-bit word; that is a maximum of
C<0x10000> (or 65,536) characters would be needed, from C<0x0000> to
C<0xFFFF>. This soon proved to be wrong, and since Unicode 2.0 (July
1996), Unicode has been defined all the way up to 21 bits (C<0x10FFFF>),
and Unicode 3.1 (March 2001) defined the first characters above C<0xFFFF>.
The first C<0x10000> characters are called the I<Plane 0>, or the
I<Basic Multilingual Plane> (BMP). With Unicode 3.1, 17 (yes,
seventeen) planes in all were defined--but they are nowhere near full of
defined characters, yet.
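Since each plane holds C<0x10000> code points, the plane of a code point can be computed with a shift. The sketch below is illustrative only; C<plane_of> is a hypothetical helper name.

```perl
use strict;
use warnings;

# Each plane holds 0x10000 code points, so the plane number is the
# code point shifted right by 16 bits. (plane_of is a hypothetical name.)
sub plane_of {
    my ($code_point) = @_;
    die "not a Unicode code point\n"
        if $code_point < 0 || $code_point > 0x10FFFF;
    return $code_point >> 16;
}

# Plane 0 is the BMP; 0x10FFFF is the last code point of plane 16.
printf "U+%04X is in plane %d\n", $_, plane_of($_)
    for 0x0041, 0xFFFF, 0x10000, 0x10FFFF;
```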
When a new language is being encoded, Unicode generally will choose a
C<block> of consecutive unallocated code points for its characters. So
far, the number of code points in these blocks has always been evenly
divisible by 16. Extras in a block, not currently needed, are left
unallocated, for future growth. But there have been occasions when
a later release needed more code points than the available extras, and a
new block had to be allocated somewhere else, not contiguous to the initial
one, to handle the overflow. Thus, it became apparent early on that
"block" wasn't an adequate organizing principle, and so the C<Script>
property was created. (Later an improved script property was added as
well, the C<Script_Extensions> property.) Those code points that are in
overflow blocks can still
have the same script as the original ones. The script concept fits more
closely with natural language: there is C<Latin> script, C<Greek>
script, and so on; and there are several artificial scripts, like
C<Common> for characters that are used in multiple scripts, such as
mathematical symbols. Scripts usually span varied parts of several
blocks. For more information about scripts, see L<perlunicode/Scripts>.
The division into blocks exists, but it is almost completely
accidental--an artifact of how the characters have been and still are
allocated. (Note that this paragraph has oversimplified things for the
sake of this being an introduction. Unicode doesn't really encode
languages, but the writing systems for them--their scripts; and one
script can be used by many languages. Unicode also encodes things that
aren't really about languages, such as symbols like C<BAGGAGE CLAIM>.)
The Unicode code points are just abstract numbers. To input and
output these abstract numbers, the numbers must be I<encoded> or
I<serialised> somehow. Unicode defines several I<character encoding
forms>, of which I<UTF-8> is the most popular. UTF-8 is a
variable length encoding that encodes Unicode characters as 1 to 4
bytes. Other encodings
include UTF-16 and UTF-32 and their big- and little-endian variants
(UTF-8 is byte-order independent). The ISO/IEC 10646 defines the UCS-2
and UCS-4 encoding forms.
For more information about encodings--for instance, to learn what
I<surrogates> and I<byte order marks> (BOMs) are--see L<perlunicode>.
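The variable-length nature of UTF-8 can be seen by encoding a few characters with the core C<Encode> module and counting the resulting bytes. This is a sketch for illustration; C<encoded_length> is a hypothetical helper, and the sample characters were chosen to cover each UTF-8 length.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One sample character for each UTF-8 length, 1 to 4 bytes.
my @samples = (
    "\x{41}",       # LATIN CAPITAL LETTER A
    "\x{3B1}",      # GREEK SMALL LETTER ALPHA
    "\x{263A}",     # WHITE SMILING FACE
    "\x{1F600}",    # a code point above the BMP
);

# Hypothetical helper: number of bytes a string takes in an encoding.
sub encoded_length {
    my ($encoding, $string) = @_;
    return length(encode($encoding, $string));
}

for my $char (@samples) {
    printf "U+%04X: UTF-8 %d, UTF-16BE %d, UTF-32BE %d bytes\n",
        ord($char),
        encoded_length('UTF-8',    $char),
        encoded_length('UTF-16BE', $char),
        encoded_length('UTF-32BE', $char);
}
```

UTF-32 always uses four bytes per code point, while UTF-16 needs four bytes (a surrogate pair) only for code points above C<0xFFFF>.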
=head2 Perl's Unicode Support
Starting from Perl v5.6.0, Perl has had the capacity to handle Unicode
natively. Perl v5.8.0, however, is the first recommended release for
serious Unicode work. The maintenance release 5.6.1 fixed many of the
problems of the initial Unicode implementation, but for example
regular expressions still do not work with Unicode in 5.6.1.
Perl v5.14.0 is the first release where Unicode support is
(almost) seamlessly integrable without some gotchas. (There are a few
exceptions. Firstly, some differences in L<quotemeta|perlfunc/quotemeta>
were fixed starting in Perl 5.16.0. Secondly, some differences in
L<the range operator|perlop/Range Operators> were fixed starting in
Perl 5.26.0. Thirdly, some differences in L<split|perlfunc/split> were fixed
starting in Perl 5.28.0.)
To enable this
seamless support, you should C<use feature 'unicode_strings'> (which is
automatically selected if you C<use 5.012> or higher). See L<feature>.
(5.14 also fixes a number of bugs and departures from the Unicode
standard.)
Before Perl v5.8.0, C<use utf8> was used to declare
that operations in the current block or file would be Unicode-aware.
This model was found to be wrong, or at least clumsy: the "Unicodeness"
is now carried with the data, instead of being attached to the
operations.
Starting with Perl v5.8.0, only one case remains where an explicit C<use
utf8> is needed: if your Perl script itself is encoded in UTF-8, you can
use UTF-8 in your identifier names, and in string and regular expression
literals, by saying C<use utf8>. This is not the default because
scripts with legacy 8-bit data in them would break. See L<utf8>.
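The effect of C<use utf8> on a literal can be sketched as follows. The example assumes that the source file containing it is itself saved as UTF-8; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: this source file is saved as UTF-8, so the literal
# below is stored on disk as the two bytes 0xC3 0xA9.

# With "use utf8" the literal is decoded into one character.
my $chars_with_utf8 = do { use utf8; length("é") };

# Without it the literal is taken byte by byte: two characters.
my $chars_without = do { no utf8; length("é") };

print "with use utf8: $chars_with_utf8, without: $chars_without\n";
```

The pragma is lexically scoped, which is why each C<do> block can choose its own setting.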
=head2 Perl's Unicode Model
Perl supports both pre-5.6 strings of eight-bit native bytes, and
strings of Unicode characters. The general principle is that Perl tries
to keep its data as eight-bit bytes for as long as possible, but as soon
as Unicodeness cannot be avoided, the data is transparently upgraded
to Unicode. Prior to Perl v5.14.0, the upgrade was not completely
transparent (see L<perlunicode/The "Unicode Bug">), and for backwards
compatibility, full transparency is not gained unless C<use feature
'unicode_strings'> (see L<feature>) or C<use 5.012> (or higher) is
selected.
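The difference that C<unicode_strings> makes can be sketched with C<uc()> on a character in the range 128 to 255. The example assumes Perl 5.12 or later, an ASCII platform, and no locale in effect.

```perl
use strict;
use warnings;

# LATIN SMALL LETTER E WITH ACUTE, stored internally as one native byte.
my $e_acute = "\xE9";

# Without the feature, a non-upgraded string gets byte semantics:
# uc() leaves the character alone.
my $byte_rules = do { no feature 'unicode_strings'; uc($e_acute) };

# With the feature, Unicode rules apply regardless of internal storage.
my $unicode_rules = do { use feature 'unicode_strings'; uc($e_acute) };

printf "without feature: %#x, with feature: %#x\n",
    ord($byte_rules), ord($unicode_rules);
```

With the feature enabled the result is C<0xC9>, LATIN CAPITAL LETTER E WITH ACUTE; without it the character comes back unchanged.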
Internally, Perl currently encodes each Unicode string in one of two
ways: in the native eight-bit character set of the platform (for
example Latin-1), or in UTF-8. Specifically, if all code points in
the string are C<0xFF> or less, Perl uses the native eight-bit
character set. Otherwise, it uses UTF-8.
A user of Perl does not normally need to know nor care how Perl
happens to encode its internal strings, but it becomes relevant when
outputting Unicode strings to a stream without a PerlIO layer (one with
the "default" encoding). In such a case, the raw bytes used internally
(the native character set or UTF-8, as appropriate for each string)
will be used, and a "Wide character" warning will be issued if those
strings contain a character beyond 0x00FF.
For example,
perl -e 'print "\x{DF}\n", "\x{0100}\x{DF}\n"'
produces a fairly useless mixture of native bytes and UTF-8, as well
as a warning:
Wide character in print at ...
To output UTF-8, use the C<:encoding> or C<:utf8> output layer. Prepending
binmode(STDOUT, ":utf8");
to this sample program ensures that the output is completely UTF-8,
and removes the program's warning.
You can enable automatic UTF-8-ification of your standard file
handles, default C<open()> layer, and C<@ARGV> by using either
the C<-C> command line switch or the C<PERL_UNICODE> environment
variable; see L<perlrun> for the documentation of the C<-C> switch.
Note that this means that Perl expects other software to work the same
way:
if Perl has been led to believe that STDIN should be UTF-8, but then
STDIN coming in from another command is not UTF-8, Perl will likely
complain about the malformed UTF-8.
All features that combine Unicode and I/O also require using the new
PerlIO feature. Almost all Perl 5.8 platforms do use PerlIO, though:
you can see whether yours is by running "perl -V" and looking for
C<useperlio=define>.
=head2 Unicode and EBCDIC
Perl 5.8.0 added support for Unicode on EBCDIC platforms. This support
was allowed to lapse in later releases, but was revived in 5.22.
Unicode support is somewhat more complex to implement since additional
conversions are needed. See L<perlebcdic> for more information.
On EBCDIC platforms, the internal Unicode encoding form is UTF-EBCDIC
instead of UTF-8. The difference is that UTF-8 is "ASCII-safe", in
that ASCII characters encode to UTF-8 as-is, while UTF-EBCDIC is
"EBCDIC-safe", in that all the basic characters (which include all
those that have ASCII equivalents, like C<"A">, C<"0">, C<"%">, I<etc.>)
are the same in both EBCDIC and UTF-EBCDIC. Often, documentation
will use the term "UTF-8" to mean UTF-EBCDIC as well. This is the case
in this document.
=head2 Creating Unicode
This section applies fully to Perls starting with v5.22. Various
caveats for earlier releases are in the L</Earlier releases caveats>
subsection below.
To create Unicode characters in literals,
use the C<\N{...}> notation in double-quoted strings:
my $smiley_from_name = "\N{WHITE SMILING FACE}";
my $smiley_from_code_point = "\N{U+263a}";
Similarly, they can be used in regular expression literals:
$smiley =~ /\N{WHITE SMILING FACE}/;
$smiley =~ /\N{U+263a}/;
At run-time you can use:
use charnames ();
my $hebrew_alef_from_name
= charnames::string_vianame("HEBREW LETTER ALEF");
my $hebrew_alef_from_code_point = charnames::string_vianame("U+05D0");
Naturally, C<ord()> will do the reverse: it turns a character into
a code point.
There are other runtime options as well. You can use C<pack()>:
my $hebrew_alef_from_code_point = pack("U", 0x05d0);
Or you can use C<chr()>, though it is less convenient in the general
case:
$hebrew_alef_from_code_point = chr(utf8::unicode_to_native(0x05d0));
utf8::upgrade($hebrew_alef_from_code_point);
The C<utf8::unicode_to_native()> and C<utf8::upgrade()> aren't needed if
the argument is above 0xFF, so the above could have been written as
$hebrew_alef_from_code_point = chr(0x05d0);
since 0x5d0 is above 255.
C<\x{}> and C<\o{}> can also be used to specify code points at compile
time in double-quotish strings, but, for backward compatibility with
older Perls, the same rules apply as with C<chr()> for code points less
than 256.
C<utf8::unicode_to_native()> is used so that the Perl code is portable
to EBCDIC platforms. You can omit it if you're I<really> sure no one
will ever want to use your code on a non-ASCII platform. Starting in
Perl v5.22, calls to it on ASCII platforms are optimized out, so there's
no performance penalty at all in adding it. Or you can simply use the
other constructs that don't require it.
See L</"Further Resources"> for how to find all these names and numeric
codes.
=head3 Earlier releases caveats
On EBCDIC platforms, prior to v5.22, using C<\N{U+...}> doesn't work
properly.
Prior to v5.16, using C<\N{...}> with a character name (as opposed to a
C<U+...> code point) required a S<C<use charnames ':full'>>.
Prior to v5.14, there were some bugs in C<\N{...}> with a character name
(as opposed to a C<U+...> code point).
C<charnames::string_vianame()> was introduced in v5.14. Prior to that,
C<charnames::vianame()> should work, but only if the argument is of the
form C<"U+...">. Your best bet there for runtime Unicode by character
name is probably:
use charnames ();
my $hebrew_alef_from_name
= pack("U", charnames::vianame("HEBREW LETTER ALEF"));
=head2 Handling Unicode
Handling Unicode is for the most part transparent: just use the
strings as usual. Functions like C<index()>, C<length()>, and
C<substr()> will work on the Unicode characters; regular expressions
will work on the Unicode characters (see L<perlunicode> and L<perlretut>).
Note that Perl treats each code point in a grapheme cluster as a separate
character, so for example
print length("\N{LATIN CAPITAL LETTER A}\N{COMBINING ACUTE ACCENT}"),
"\n";
will print 2, not 1. The only exception is that regular expressions
have C<\X> for matching an extended grapheme cluster. (Thus C<\X> in a
regular expression would match the entire sequence of both the example
characters.)
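Counting with C<length()> and counting with C<\X> can be put side by side. This is a small sketch; the sample string is made up for illustration.

```perl
use strict;
use warnings;

# A + COMBINING ACUTE ACCENT, followed by "b" and "c".
my $string = "A\x{301}bc";

my $code_points = length($string);          # counts code points
my $clusters    = () = $string =~ /\X/g;    # counts extended grapheme clusters

print "$code_points code points, $clusters grapheme clusters\n";
```

Here C<length()> reports 4 while the C<\X> count is 3, because the base letter and its combining accent form one cluster.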
Life is not quite so transparent, however, when working with legacy
encodings, I/O, and certain special cases:
=head2 Legacy Encodings
When you combine legacy data and Unicode, the legacy data needs
to be upgraded to Unicode. Normally the legacy data is assumed to be
ISO 8859-1 (or EBCDIC, if applicable).
The C<Encode> module knows about many encodings and has interfaces
for doing conversions between those encodings:
use Encode 'decode';
$data = decode("iso-8859-3", $data); # convert from legacy
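A complete round trip, from legacy bytes to Perl characters and out to UTF-8 bytes, might look like the following sketch. ISO 8859-15 is used here only because its byte C<0xA4> (EURO SIGN) maps to a code point above C<0xFF>, which makes the conversion visible; the variable names are made up.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $legacy_bytes = "\xA4";    # EURO SIGN in ISO 8859-15

# Legacy bytes -> Perl characters.
my $characters = decode("iso-8859-15", $legacy_bytes);

# Perl characters -> bytes in another encoding.
my $utf8_bytes = encode("UTF-8", $characters);

printf "decoded to U+%04X, re-encoded as %s\n",
    ord($characters),
    join(" ", unpack("(H2)*", $utf8_bytes));
```

The decoded string holds the single character U+20AC, and encoding it as UTF-8 yields the three bytes C<e2 82 ac>.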
=head2 Unicode I/O
Normally, writing out Unicode data
print FH $some_string_with_unicode, "\n";
produces raw bytes that Perl happens to use to internally encode the
Unicode string. Perl's internal encoding depends on the system as
well as what characters happen to be in the string at the time. If
any of the characters are at code points C<0x100> or above, you will get
a warning. To ensure that the output is explicitly rendered in the
encoding you desire--and to avoid the warning--open the stream with
the desired encoding. Some examples:
open FH, ">:utf8", "file";
open FH, ">:encoding(ucs2)", "file";
open FH, ">:encoding(UTF-8)", "file";
open FH, ">:encoding(shift_jis)", "file";
and on already open streams, use C<binmode()>:
binmode(STDOUT, ":utf8");
binmode(STDOUT, ":encoding(ucs2)");
binmode(STDOUT, ":encoding(UTF-8)");
binmode(STDOUT, ":encoding(shift_jis)");
The matching of encoding names is loose: case does not matter, and
many encodings have several aliases. Note that the C<:utf8> layer
must always be specified exactly like that; it is I<not> subject to
the loose matching of encoding names. Also note that currently C<:utf8> is unsafe for
input, because it accepts the data without validating that it is indeed valid
UTF-8; you should instead use C<:encoding(UTF-8)> (with or without a
hyphen).
See L<PerlIO> for the C<:utf8> layer, L<PerlIO::encoding> and
L<Encode::PerlIO> for the C<:encoding()> layer, and
L<Encode::Supported> for many encodings supported by the C<Encode>
module.
Reading in a file that you know happens to be encoded in one of the
Unicode or legacy encodings does not magically turn the data into
Unicode in Perl's eyes. To do that, specify the appropriate
layer when opening files
open(my $fh,'<:encoding(UTF-8)', 'anything');
my $line_of_unicode = <$fh>;
open(my $fh,'<:encoding(Big5)', 'anything');
my $line_of_unicode = <$fh>;
The I/O layers can also be specified more flexibly with
the C<open> pragma. See L<open>, or look at the following example.
use open ':encoding(UTF-8)'; # input/output default encoding will be
# UTF-8
open X, ">file";
print X chr(0x100), "\n";
close X;
open Y, "<file";
printf "%#x\n", ord(<Y>); # this should print 0x100
close Y;
With the C<open> pragma you can use the C<:locale> layer
BEGIN { $ENV{LC_ALL} = $ENV{LANG} = 'ru_RU.KOI8-R' }
# the :locale will probe the locale environment variables like
# LC_ALL
use open OUT => ':locale'; # russki parusski
open(O, ">koi8");
print O chr(0x430); # Unicode CYRILLIC SMALL LETTER A = KOI8-R 0xc1
close O;
open(I, "<koi8");
printf "%#x\n", ord(<I>); # this should print 0xc1
close I;
These methods install a transparent filter on the I/O stream that
converts data from the specified encoding when it is read in from the
stream. The result is always Unicode.
The L<open> pragma affects all the C<open()> calls after the pragma by
setting default layers. If you want to affect only certain
streams, use explicit layers directly in the C<open()> call.
You can switch encodings on an already opened stream by using
C<binmode()>; see L<perlfunc/binmode>.
The C<:locale> layer does not currently work with
C<open()> and C<binmode()>, only with the C<open> pragma. The
C<:utf8> and C<:encoding(...)> methods do work with all of C<open()>,
C<binmode()>, and the C<open> pragma.
Similarly, you may use these I/O layers on output streams to
automatically convert Unicode to the specified encoding when it is
written to the stream. For example, the following snippet copies the
contents of the file "text.jis" (encoded as ISO-2022-JP, aka JIS) to
the file "text.utf8", encoded as UTF-8:
open(my $nihongo, '<:encoding(iso-2022-jp)', 'text.jis');
open(my $unicode, '>:utf8', 'text.utf8');
while (<$nihongo>) { print $unicode $_ }
The naming of encodings, both by C<open()> and by the C<open>
pragma, allows for flexible names: C<koi8-r> and C<KOI8R> will both be
understood.
Common encodings recognized by ISO, MIME, IANA, and various other
standardisation organisations are recognised; for a more detailed
list see L<Encode::Supported>.
C<read()> reads characters and returns the number of characters.
C<seek()> and C<tell()> operate on byte counts, as do C<sysread()>
and C<sysseek()>.
Notice that because of the default behaviour of not doing any
conversion upon input if there is no default layer,
it is easy to mistakenly write code that keeps on expanding a file
by repeatedly encoding the data:
# BAD CODE WARNING
open F, "file";
local $/; ## read in the whole file of 8-bit characters
$t = <F>;
close F;
open F, ">:encoding(UTF-8)", "file";
print F $t; ## convert to UTF-8 on output
close F;
If you run this code twice, the contents of the F<file> will be twice
UTF-8 encoded. Either a C<use open ':encoding(UTF-8)'> or explicitly
opening the F<file> for input as UTF-8 would have avoided the bug.
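A corrected version of the read-and-rewrite pattern might look like the sketch below. It is not taken from the original text: the helper name C<rewrite_as_utf8> is hypothetical, and a scratch file from the core C<File::Temp> module stands in for F<file>.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a UTF-8 file and write it back as UTF-8.
# Decoding on input is what keeps repeated runs from re-encoding the data.
sub rewrite_as_utf8 {
    my ($path) = @_;

    open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };    # slurp the decoded characters
    close $in;

    open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
    return;
}

# Seed a scratch file with one character, U+0100, which is two bytes in UTF-8.
my ($seed, $path) = tempfile(UNLINK => 1);
binmode $seed, ':encoding(UTF-8)';
print {$seed} "\x{100}";
close $seed or die "Cannot close $path: $!";

rewrite_as_utf8($path) for 1 .. 2;
printf "size after two rewrites: %d bytes\n", (-s $path);
```

Because the input is decoded with the same encoding used for output, the file stays at two bytes however many times it is rewritten.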
B<NOTE>: the C<:utf8> and C<:encoding> features work only if your
Perl has been built with L<PerlIO>, which is the default
on most systems.
=head2 Displaying Unicode As Text
Sometimes you might want to display Perl scalars containing Unicode as
simple ASCII (or EBCDIC) text. The following subroutine converts
its argument so that Unicode characters with code points greater than
255 are displayed as C<\x{...}>, control characters (like C<\n>) are
displayed as C<\x..>, and the rest of the characters as themselves:
sub nice_string {
join("",
map { $_ > 255 # if wide character...
? sprintf("\\x{%04X}", $_) # \x{...}
: chr($_) =~ /[[:cntrl:]]/ # else if control character...
? sprintf("\\x%02X", $_) # \x..
: quotemeta(chr($_)) # else quoted or as themselves
} unpack("W*", $_[0])); # unpack Unicode characters
}
For example,
nice_string("foo\x{100}bar\n")
returns the string
'foo\x{0100}bar\x0A'
which is ready to be printed.
(C<\\x{}> is used here instead of C<\\N{}>, since it's most likely that
you want to see what the native values are.)
=head2 Special Cases
=over 4
=item *
Bit Complement Operator ~ And vec()
The bit complement operator C<~> may produce surprising results if
used on strings containing characters with ordinal values above
255. In such a case, the results are consistent with the internal
encoding of the characters, but not with much else. So don't do
that. Similarly for C<vec()>: you will be operating on the
internally-encoded bit patterns of the Unicode characters, not on
the code point values, which is very probably not what you want.
=item *
Peeking At Perl's Internal Encoding
Normal users of Perl should never care how Perl encodes any particular
Unicode string (because the normal ways to get at the contents of a
string with Unicode--via input and output--should always be via
explicitly-defined I/O layers). But if you must, there are two
ways of looking behind the scenes.
One way of peeking inside the internal encoding of Unicode characters
is to use C<unpack("C*", ...)> to get the bytes of whatever the string
encoding happens to be, or C<unpack("U0..", ...)> to get the bytes of the
UTF-8 encoding:
# this prints c4 80 for the UTF-8 bytes 0xc4 0x80
print join(" ", unpack("U0(H2)*", pack("U", 0x100))), "\n";
The other way is to use the Devel::Peek module:
perl -MDevel::Peek -e 'Dump(chr(0x100))'
That shows the C<UTF8> flag in FLAGS and both the UTF-8 bytes
and Unicode characters in C<PV>. See also later in this document
the discussion about the C<utf8::is_utf8()> function.
=back
=head2 Advanced Topics
=over 4
=item *
String Equivalence
The question of string equivalence turns somewhat complicated
in Unicode: what do you mean by "equal"?
(Is C<LATIN CAPITAL LETTER A WITH ACUTE> equal to
C<LATIN CAPITAL LETTER A>?)
The short answer is that by default Perl compares equivalence (C<eq>,
C<ne>) based only on code points of the characters. In the above
case, the answer is no (because 0x00C1 != 0x0041). But sometimes, any
CAPITAL LETTER A's should be considered equal, or even A's of any case.
The long answer is that you need to consider character normalization
and casing issues: see L<Unicode::Normalize>, Unicode Technical Report #15,
L<Unicode Normalization Forms|http://www.unicode.org/unicode/reports/tr15> and
sections on case mapping in the L<Unicode Standard|http://www.unicode.org>.
As of Perl 5.8.0, the "Full" case-folding of I<Case
Mappings/SpecialCasing> is implemented, but bugs remained in C<qr//i> with it;
these were mostly fixed by 5.14, and essentially entirely by 5.18.
=item *
String Collation
People like to see their strings nicely sorted--or as Unicode
parlance goes, collated. But again, what do you mean by collate?
(Does C<LATIN CAPITAL LETTER A WITH ACUTE> come before or after
C<LATIN CAPITAL LETTER A WITH GRAVE>?)
The short answer is that by default, Perl compares strings (C<lt>,
C<le>, C<cmp>, C<ge>, C<gt>) based only on the code points of the
characters. In the above case, the answer is "after", since
C<0x00C1> > C<0x00C0>.
The long answer is that "it depends", and a good answer cannot be
given without knowing (at the very least) the language context.
See L<Unicode::Collate>, and I<Unicode Collation Algorithm>
L<http://www.unicode.org/unicode/reports/tr10/>
=back
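The difference between code point order and collation order can be sketched with the core C<Unicode::Collate> module. The example uses the default collation with no language tailoring; the word list and the helper C<as_code_points> are made up for illustration.

```perl
use strict;
use warnings;
use Unicode::Collate;

# \x{E1} is LATIN SMALL LETTER A WITH ACUTE.
my @words = ("b", "\x{E1}", "a", "B");

# Code point order: "B" (0x42), "a" (0x61), "b" (0x62), a-acute (0xE1).
my @by_code_point = sort @words;

# Default Unicode Collation Algorithm order: a, a-acute, b, B.
my $collator     = Unicode::Collate->new;
my @by_collation = $collator->sort(@words);

# Hypothetical helper: show code points rather than the characters
# themselves, which keeps the output plain ASCII.
sub as_code_points {
    return join(" ", map { sprintf("U+%04X", ord($_)) } @_);
}

print "code point order: ", as_code_points(@by_code_point), "\n";
print "collation order:  ", as_code_points(@by_collation), "\n";
```

For language-specific ordering, a tailored collator is needed; the default order is only a language-neutral baseline.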
=head2 Miscellaneous
=over 4
=item *
Character Ranges and Classes
Character ranges in regular expression bracketed character classes (e.g.,
C</[a-z]/>) and in the C<tr///> (also known as C<y///>) operator are not
magically Unicode-aware. What this means is that C<[A-Za-z]> will not
magically start to mean "all alphabetic letters" (not that it does mean that
even for 8-bit characters; for those, if you are using locales (L<perllocale>),
use C</[[:alpha:]]/>; and if not, use the 8-bit-aware property C<\p{alpha}>).
All the properties that begin with C<\p> (and its inverse C<\P>) are actually
character classes that are Unicode-aware. There are dozens of them, see
L<perluniprops>.
Starting in v5.22, you can use Unicode code points as the end points of
regular expression pattern character ranges, and the range will include
all Unicode code points that lie between those end points, inclusive.
qr/ [ \N{U+03} - \N{U+20} ] /xx
includes the code points
C<\N{U+03}>, C<\N{U+04}>, ..., C<\N{U+20}>.
This also works for ranges in C<tr///> starting in Perl v5.24.
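A small sketch of the point about C<[A-Za-z]>, using GREEK CAPITAL LETTER
OMEGA as an arbitrary test character chosen for this example:

```perl
use strict;
use warnings;

my $omega = "\x{3A9}";    # GREEK CAPITAL LETTER OMEGA

# An ASCII range only covers the code points between its end points.
my $in_ascii_range = $omega =~ /\A[A-Za-z]\z/ ? 1 : 0;

# A Unicode property matches on the character's Unicode properties.
my $is_alphabetic = $omega =~ /\A\p{Alphabetic}\z/ ? 1 : 0;

print "[A-Za-z]: $in_ascii_range, \\p{Alphabetic}: $is_alphabetic\n";
```

The range test fails and the property test succeeds, so the script prints
C<[A-Za-z]: 0, \p{Alphabetic}: 1>.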
=item *
String-To-Number Conversions
Unicode does define several other decimal--and numeric--characters
besides the familiar 0 to 9, such as the Arabic and Indic digits.
Perl does not support string-to-number conversion for digits other
than ASCII C<0> to C<9> (and ASCII C<a> to C<f> for hexadecimal).
To get safe conversions from any Unicode string, use
L<Unicode::UCD/num()>.
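The following sketch contrasts the two conversions. It assumes a Perl recent
enough to ship C<num()> in L<Unicode::UCD> (5.14 or later); the variable names
are illustrative only.

```perl
use strict;
use warnings;
use Unicode::UCD 'num';

# ARABIC-INDIC DIGIT ONE, TWO, THREE
my $digits = "\x{661}\x{662}\x{663}";

# Plain numeric conversion only understands ASCII digits.
my $implicit = do { no warnings 'numeric'; $digits + 0 };

# num() returns the numeric value, or undef if the string is not
# a safe, complete number.
my $explicit = num($digits);

print "implicit: $implicit, num(): $explicit\n";
```

Here the implicit conversion yields 0, while C<num()> returns 123.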
=back
=head2 Questions With Answers
=over 4
=item *
Will My Old Scripts Break?
Very probably not. Unless you are generating Unicode characters
somehow, old behaviour should be preserved. About the only behaviour
that has changed and which could start generating Unicode is the old
behaviour of C<chr()> where supplying an argument more than 255
produced a character modulo 255. C<chr(300)>, for example, was equal
to C<chr(45)> or "-" (in ASCII); now it is LATIN CAPITAL LETTER I WITH
BREVE.
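The current behaviour can be checked with the core C<charnames> module, as in
this minimal sketch:

```perl
use strict;
use warnings;
use charnames ();

my $char = chr(300);

# Look up the official Unicode name for the code point.
my $name = charnames::viacode(ord $char);

printf "U+%04X %s\n", ord($char), $name;
```

This prints C<U+012C LATIN CAPITAL LETTER I WITH BREVE>.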
=item *
How Do I Make My Scripts Work With Unicode?
Very little work should be needed since nothing changes until you
generate Unicode data. The most important thing is getting input as
Unicode; for that, see the earlier I/O discussion.
To get full seamless Unicode support, add
C<use feature 'unicode_strings'> (or C<use 5.012> or higher) to your
script.
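To see what the feature changes, the sketch below uppercases the same
one-byte string with and without it. It assumes an ASCII platform, Perl 5.12
or later, and no C<use locale> in effect.

```perl
use strict;
use warnings;

# LATIN SMALL LETTER A WITH GRAVE, stored as a single byte.
my $a_grave = "\xE0";

# Without the feature, only ASCII case rules apply to this string.
my $without = do { no feature 'unicode_strings'; uc $a_grave };

# With the feature, Unicode case rules apply regardless of storage.
my $with = do { use feature 'unicode_strings'; uc $a_grave };

printf "without: %02X, with: %02X\n", ord($without), ord($with);
```

The first call leaves the character unchanged (C<E0>); the second maps it to
LATIN CAPITAL LETTER A WITH GRAVE (C<C0>).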
=item *
How Do I Know Whether My String Is In Unicode?
You shouldn't have to care. But you may need to if your Perl is older than
5.14.0, or if you haven't specified C<use feature 'unicode_strings'> or C<use
5.012> (or higher), because in those cases the rules for the code points
in the range 128 to 255 differ depending on
whether the string that contains them is in Unicode or not.
(See L<perlunicode/When Unicode Does Not Happen>.)
To determine if a string is in Unicode, use:
print utf8::is_utf8($string) ? 1 : 0, "\n";
But note that this doesn't mean that any of the characters in the
string are necessarily UTF-8 encoded, or that any of the characters have
code points greater than 0xFF (255) or even 0x80 (128), or that the
string has any characters at all. All that C<is_utf8()> does is
return the value of the internal "utf8ness" flag attached to the
C<$string>. If the flag is off, the bytes in the scalar are interpreted
as a single byte encoding. If the flag is on, the bytes in the scalar
are interpreted as the (variable-length, potentially multi-byte) UTF-8 encoded
code points of the characters. Bytes added to a UTF-8 encoded string are
automatically upgraded to UTF-8. If mixed non-UTF-8 and UTF-8 scalars
are merged (double-quoted interpolation, explicit concatenation, or
printf/sprintf parameter substitution), the result will be UTF-8 encoded
as if copies of the byte strings were upgraded to UTF-8: for example,
$a = "ab\x80c";
$b = "\x{100}";
print "$a = $b\n";
the output string will be UTF-8-encoded C<ab\x80c = \x{100}\n>, but
C<$a> will stay byte-encoded.
Sometimes you might really need to know the byte length of a string
instead of the character length. For that use the C<bytes> pragma
and the C<length()> function:
my $unicode = chr(0x100);
print length($unicode), "\n"; # will print 1
use bytes;
print length($unicode), "\n"; # will print 2
# (the 0xC4 0x80 of the UTF-8)
no bytes;
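The sketch below shows why the flag alone says little about a string's
contents: two strings that compare equal can carry different flags. The
variable names are illustrative.

```perl
use strict;
use warnings;

my $bytes    = "\xE9";    # one character, stored as one byte
my $upgraded = $bytes;
utf8::upgrade($upgraded); # same character, now stored as UTF-8

# The two strings are equal as far as Perl's string operators go...
my $same_text = $bytes eq $upgraded ? 1 : 0;

# ...but the internal flag differs.
my @flags = map { utf8::is_utf8($_) ? 1 : 0 } $bytes, $upgraded;

print "equal: $same_text, flags: @flags\n";
```

This prints C<equal: 1, flags: 0 1>.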
=item *
How Do I Find Out What Encoding a File Has?
You might try L<Encode::Guess>, but it has a number of limitations.
=item *
How Do I Detect Data That's Not Valid In a Particular Encoding?
Use the C<Encode> package to try converting it.
For example,
use Encode 'decode';
if (eval { decode('UTF-8', $string, Encode::FB_CROAK); 1 }) {
# $string is valid UTF-8
} else {
# $string is not valid UTF-8
}
Or use C<unpack> to try decoding it:
use warnings;
@chars = unpack("C0U*", $string_of_bytes_that_I_think_is_utf8);
If invalid, a C<Malformed UTF-8 character> warning is produced. The "C0" means
"process the string character per character". Without that, the
C<unpack("U*", ...)> would work in C<U0> mode (the default if the format
string starts with C<U>) and it would return the bytes making up the UTF-8
encoding of the target string, something that will always work.
=item *
How Do I Convert Binary Data Into a Particular Encoding, Or Vice Versa?
This probably isn't as useful as you might think.
Normally, you shouldn't need to.
In one sense, what you are asking doesn't make much sense: encodings
are for characters, and binary data are not "characters", so converting
"data" into some encoding isn't meaningful unless you know what
character set and encoding the binary data is in, in which case it's
not just binary data, now is it?
If you have a raw sequence of bytes that you know should be
interpreted via a particular encoding, you can use C<Encode>:
use Encode 'from_to';
from_to($data, "iso-8859-1", "UTF-8"); # from latin-1 to UTF-8
The call to C<from_to()> changes the bytes in C<$data>, but nothing
material about the nature of the string has changed as far as Perl is
concerned. Both before and after the call, the string C<$data>
contains just a bunch of 8-bit bytes. As far as Perl is concerned,
the encoding of the string remains as "system-native 8-bit bytes".
You might relate this to a fictional 'Translate' module:
use Translate;
my $phrase = "Yes";
Translate::from_to($phrase, 'english', 'deutsch');
## phrase now contains "Ja"
The contents of the string change, but not the nature of the string.
Perl knows no more after the call than before that the
contents of the string indicate the affirmative.
Back to converting data. If you have (or want) data in your system's
native 8-bit encoding (e.g. Latin-1, EBCDIC, etc.), you can use
pack/unpack to convert to/from Unicode.
$native_string = pack("W*", unpack("U*", $Unicode_string));
$Unicode_string = pack("U*", unpack("W*", $native_string));
If you have a sequence of bytes you B<know> is valid UTF-8,
but Perl doesn't know it yet, you can make Perl a believer, too:
$Unicode = $bytes;
utf8::decode($Unicode);
or:
$Unicode = pack("U0a*", $bytes);
You can find the bytes that make up a UTF-8 sequence with
@bytes = unpack("C*", $Unicode_string)
and you can create well-formed Unicode with
$Unicode_string = pack("U*", 0xff, ...)
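As a worked example of the difference between re-encoding bytes and decoding
them into characters, this sketch runs C<from_to()> and then C<decode()> on a
short Latin-1 string. The sample data is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(from_to decode);

my $data = "caf\xE9";                   # 4 bytes of Latin-1

# from_to() rewrites the bytes in place; $data is still a byte string.
from_to($data, "iso-8859-1", "UTF-8");
my $byte_count = length $data;          # E9 has become C3 A9

# decode() returns a character string built from those bytes.
my $text       = decode("UTF-8", $data);
my $char_count = length $text;

print "bytes: $byte_count, characters: $char_count\n";
```

After C<from_to()> the data is five bytes long; after C<decode()> the result
is four characters long, so the script prints C<bytes: 5, characters: 4>.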
=item *
How Do I Display Unicode? How Do I Input Unicode?
See L<http://www.alanwood.net/unicode/> and
L<http://www.cl.cam.ac.uk/~mgk25/unicode.html>
=item *
How Does Unicode Work With Traditional Locales?
If your locale is a UTF-8 locale, starting in Perl v5.26, Perl works
well for all categories; before this, starting with Perl v5.20, it works
for all categories but C<LC_COLLATE>, which deals with
sorting and the C<cmp> operator. But note that the standard
C<L<Unicode::Collate>> and C<L<Unicode::Collate::Locale>> modules offer
much more powerful solutions to collation issues, and work on earlier
releases.
For other locales, starting in Perl 5.16, you can specify
use locale ':not_characters';
to get Perl to work well with them. The catch is that you
have to translate from the locale character set to/from Unicode
yourself. See L</Unicode IE<sol>O> above for how to
use open ':locale';
to accomplish this, but full details are in L<perllocale/Unicode and
UTF-8>, including gotchas that happen if you don't specify
C<:not_characters>.
=back
=head2 Hexadecimal Notation
The Unicode standard prefers using hexadecimal notation because
that more clearly shows the division of Unicode into blocks of 256 characters.
Hexadecimal is also simply shorter than decimal. You can use decimal
notation, too, but learning to use hexadecimal just makes life easier
with the Unicode standard. The C<U+HHHH> notation uses hexadecimal,
for example.
The C<0x> prefix means a hexadecimal number; the digits are 0-9 I<and>
a-f (or A-F, case doesn't matter). Each hexadecimal digit represents
four bits, or half a byte. C<print 0x..., "\n"> will show a
hexadecimal number in decimal, and C<printf "%x\n", $decimal> will
show a decimal number in hexadecimal. If you have just the
"hex digits" of a hexadecimal number, you can use the C<hex()> function.
print 0x0009, "\n"; # 9
print 0x000a, "\n"; # 10
print 0x000f, "\n"; # 15
print 0x0010, "\n"; # 16
print 0x0011, "\n"; # 17
print 0x0100, "\n"; # 256
print 0x0041, "\n"; # 65
printf "%x\n", 65; # 41
printf "%#x\n", 65; # 0x41
print hex("41"), "\n"; # 65
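The same functions convert between a character and its C<U+HHHH> label. A
minimal sketch, using WHITE SMILING FACE as an arbitrary example character:

```perl
use strict;
use warnings;

my $smiley = "\x{263A}";    # WHITE SMILING FACE

# Character to U+HHHH label.
my $label = sprintf "U+%04X", ord $smiley;

# U+HHHH label back to a character: strip the prefix, then hex() and chr().
(my $digits = $label) =~ s/^U\+//;
my $round_trip = chr hex $digits;

print "$label\n";
```

This prints C<U+263A>, and C<$round_trip> holds the original character again.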
=head2 Further Resources
=over 4
=item *
Unicode Consortium
L<http://www.unicode.org/>
=item *
Unicode FAQ
L<http://www.unicode.org/unicode/faq/>
=item *
Unicode Glossary
L<http://www.unicode.org/glossary/>
=item *
Unicode Recommended Reading List
The Unicode Consortium has a list of articles and books, some of which
give a much more in-depth treatment of Unicode:
L<http://unicode.org/resources/readinglist.html>
=item *
Unicode Useful Resources
L<http://www.unicode.org/unicode/onlinedat/resources.html>
=item *
Unicode and Multilingual Support in HTML, Fonts, Web Browsers and Other Applications
L<http://www.alanwood.net/unicode/>
=item *
UTF-8 and Unicode FAQ for Unix/Linux
L<http://www.cl.cam.ac.uk/~mgk25/unicode.html>
=item *
Legacy Character Sets
L<http://www.czyborra.com/>
L<http://www.eki.ee/letter/>
=item *
You can explore various information from the Unicode data files using
the C<Unicode::UCD> module.
=back
=head1 UNICODE IN OLDER PERLS
If you cannot upgrade your Perl to 5.8.0 or later, you can still
do some Unicode processing by using the modules C<Unicode::String>,
C<Unicode::Map8>, and C<Unicode::Map>, available from CPAN.
If you have the GNU recode installed, you can also use the
Perl front-end C<Convert::Recode> for character conversions.
The following are fast conversions from ISO 8859-1 (Latin-1) bytes
to UTF-8 bytes and back. The code works even with older Perl 5 versions.
# ISO 8859-1 to UTF-8
s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
# UTF-8 to ISO 8859-1
s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
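To check the two substitutions on a concrete string, they can be wrapped in
small helper subroutines. The subroutine names below are hypothetical and
exist only for this sketch; the regular expressions are the ones shown above.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two substitutions.
sub latin1_to_utf8 {
    my ($s) = @_;
    $s =~ s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
    return $s;
}

sub utf8_to_latin1 {
    my ($s) = @_;
    $s =~ s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
    return $s;
}

my $latin1 = "caf\xE9";
my $utf8   = latin1_to_utf8($latin1);
my $back   = utf8_to_latin1($utf8);

# Show the UTF-8 bytes in hexadecimal.
print join(" ", map { sprintf "%02X", ord } split //, $utf8), "\n";
```

The byte E9 becomes the two bytes C3 A9, so the script prints
C<63 61 66 C3 A9>, and C<$back> is equal to the original string.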
=head1 SEE ALSO
L<perlunitut>, L<perlunicode>, L<Encode>, L<open>, L<utf8>, L<bytes>,
L<perlretut>, L<perlrun>, L<Unicode::Collate>, L<Unicode::Normalize>,
L<Unicode::UCD>
=head1 ACKNOWLEDGMENTS
Thanks to the kind readers of the perl5-porters@perl.org,
perl-unicode@perl.org, linux-utf8@nl.linux.org, and unicore@unicode.org
mailing lists for their valuable feedback.
=head1 AUTHOR, COPYRIGHT, AND LICENSE
Copyright 2001-2011 Jarkko Hietaniemi E<lt>jhi@iki.fiE<gt>.
Now maintained by Perl 5 Porters.
This document may be distributed under the same terms as Perl itself.
=head1 NAME
perldbmfilter - Perl DBM Filters
=head1 SYNOPSIS
$db = tie %hash, 'DBM', ...
$old_filter = $db->filter_store_key ( sub { ... } );
$old_filter = $db->filter_store_value( sub { ... } );
$old_filter = $db->filter_fetch_key ( sub { ... } );
$old_filter = $db->filter_fetch_value( sub { ... } );
=head1 DESCRIPTION
The four C<filter_*> methods shown above are available in all the DBM
modules that ship with Perl, namely DB_File, GDBM_File, NDBM_File,
ODBM_File and SDBM_File.
Each of the methods works identically, and is used to install (or
uninstall) a single DBM Filter. The only difference between them is the
place that the filter is installed.
To summarise:
=over 5
=item B<filter_store_key>
If a filter has been installed with this method, it will be invoked
every time you write a key to a DBM database.
=item B<filter_store_value>
If a filter has been installed with this method, it will be invoked
every time you write a value to a DBM database.
=item B<filter_fetch_key>
If a filter has been installed with this method, it will be invoked
every time you read a key from a DBM database.
=item B<filter_fetch_value>
If a filter has been installed with this method, it will be invoked
every time you read a value from a DBM database.
=back
You can use any combination of the methods from none to all four.
All filter methods return the existing filter, if present, or C<undef>
if not.
To delete a filter, pass C<undef> to the method that installed it.
=head2 The Filter
When each filter is called by Perl, a local copy of C<$_> will contain
the key or value to be filtered. Filtering is achieved by modifying
the contents of C<$_>. The return code from the filter is ignored.
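The sketch below walks through installing a filter, using the returned value,
and deleting the filter again. It assumes SDBM_File is available and creates
its database in a temporary directory; the file name, keys and the
uppercasing filter are invented for illustration.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use File::Temp 'tempdir';

my $dir = tempdir(CLEANUP => 1);
my %hash;
my $db = tie(%hash, 'SDBM_File', "$dir/demo", O_RDWR|O_CREAT, 0640)
    or die "Cannot open $dir/demo: $!\n";

# Install a store filter; no filter existed before, so undef comes back.
my $previous = $db->filter_store_value( sub { $_ = uc $_ } );
$hash{greeting} = "hello";    # stored through the filter

# Delete the filter; the filter that was installed is returned.
my $removed = $db->filter_store_value(undef);
$hash{farewell} = "bye";      # stored unfiltered

my ($greeting, $farewell) = @hash{qw(greeting farewell)};
print "$greeting $farewell\n";

undef $db;
untie %hash;
```

Only the value written while the filter was installed is transformed, so the
script prints C<HELLO bye>.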
=head2 An Example: the NULL termination problem.
DBM Filters are useful for a class of problems where you I<always>
want to make the same transformation to all keys, all values or both.
For example, consider the following scenario. You have a DBM database
that you need to share with a third-party C application. The C application
assumes that I<all> keys and values are NULL terminated. Unfortunately
when Perl writes to DBM databases it doesn't use NULL termination, so
your Perl application will have to manage NULL termination itself. When
you write to the database you will have to use something like this:
$hash{"$key\0"} = "$value\0";
Similarly the NULL needs to be taken into account when you are considering
the length of existing keys/values.
It would be much better if you could ignore the NULL termination issue
in the main application code and have a mechanism that automatically
added the terminating NULL to all keys and values whenever you write to
the database and have them removed when you read from the database. As I'm
sure you have already guessed, this is a problem that DBM Filters can
fix very easily.
use strict;
use warnings;
use SDBM_File;
use Fcntl;
my %hash;
my $filename = "filt";
unlink $filename;
my $db = tie(%hash, 'SDBM_File', $filename, O_RDWR|O_CREAT, 0640)
or die "Cannot open $filename: $!\n";
# Install DBM Filters
$db->filter_fetch_key ( sub { s/\0$// } );
$db->filter_store_key ( sub { $_ .= "\0" } );
$db->filter_fetch_value(
sub { no warnings 'uninitialized'; s/\0$// } );
$db->filter_store_value( sub { $_ .= "\0" } );
$hash{"abc"} = "def";
my $a = $hash{"ABC"};
# ...
undef $db;
untie %hash;
The code above uses SDBM_File, but it will work with any of the DBM
modules.
The contents of each of the filters should be
self-explanatory. Both "fetch" filters remove the terminating NULL,
and both "store" filters add a terminating NULL.
=head2 Another Example: Key is a C int.
Here is another real-life example. By default, whenever Perl writes to
a DBM database it always writes the key and value as strings. So when
you use this:
$hash{12345} = "something";
the key 12345 will get stored in the DBM database as the 5 byte string
"12345". If you actually want the key to be stored in the DBM database
as a C int, you will have to use C<pack> when writing, and C<unpack>
when reading.
Here is a DBM Filter that does it:
use strict;
use warnings;
use DB_File;
my %hash;
my $filename = "filt";
unlink $filename;
my $db = tie %hash, 'DB_File', $filename, O_CREAT|O_RDWR, 0666,
$DB_HASH or die "Cannot open $filename: $!\n";
$db->filter_fetch_key ( sub { $_ = unpack("i", $_) } );
$db->filter_store_key ( sub { $_ = pack ("i", $_) } );
$hash{123} = "def";
# ...
undef $db;
untie %hash;
The code above uses DB_File, but again it will work with any of the
DBM modules.
This time only two filters have been used; we only need to manipulate
the contents of the key, so it wasn't necessary to install any value
filters.
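What the two key filters do to C<$_> can be checked without a database. This
sketch packs and unpacks the key from the earlier example; the size of a C int
is platform-dependent, so it is read from L<Config> rather than assumed.

```perl
use strict;
use warnings;
use Config;

# What filter_store_key does: turn the number into a native C int.
my $packed = pack("i", 12345);

# What filter_fetch_key does: turn the native C int back into a number.
my $unpacked = unpack("i", $packed);

printf "packed length: %d bytes (intsize %d), round trip: %d\n",
    length($packed), $Config{intsize}, $unpacked;
```

The stored key is C<$Config{intsize}> bytes long instead of the five bytes of
the string "12345", and its byte order is that of the machine writing it.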
=head1 SEE ALSO
L<DB_File>, L<GDBM_File>, L<NDBM_File>, L<ODBM_File> and L<SDBM_File>.
=head1 AUTHOR
Paul Marquess