Intro

This is a rough Perl 5 quick-reference that contains some brief overview material, recommendations, and reminders regarding how to use Perl 5. It assumes you’ve used Perl 5 before, but maybe have gotten a little rusty. :)

A good place to look for more info on using Perl 5 is the Modern Perl book.

Other good documentation, articles, and books:

Installation

You may wish to install your own Perl 5 (ex. ~/opt/perl/bin/perl) instead of using the one that comes with your system (/usr/bin/perl). The system Perl is often used heavily by various system processes, and it’s probably best not to tinker with it too much.

Tip: Use only your OS package installer (ex. apt-get or yum) to install Perl modules into the system Perl. Use only cpanm (or similar) for installing modules into your own Perl.

Of particular importance: don’t use cpanm with your system Perl. For example, on Debian, apt keeps track of what it installs, but it wouldn’t know about modules installed with cpanm.

Installing your very own Perl 5, OTOH, lets you experiment, tweak, and even break things — you can always just wipe it out and start fresh again, if that becomes necessary.

To install your very own Perl 5, grab the Perl 5 source code and manually install into ~/opt like so:

cd ~/opt
mkdir perl-5.n.m
mkdir perl-5.n.m/bin

cd src   # Make this dir if you need to.
cp path/to/perl-5.n.m.tar.gz .
tar xzf perl-5.n.m.tar.gz

cd perl-5.n.m
# rm -f config.sh Policy.sh  ## Not necessary for a newly-unpacked kit.
./Configure -Dprefix=/home/<you>/opt/perl-5.n.m
# Answer a lot of questions, accepting the defaults.
# If it complains that it can't find "/usr/bin/less -R",
# tell it to use "/usr/bin/less" instead.
make
make test
make install

# Final step (to be put on your $PATH).
cd ~/opt
ln -s perl-5.n.m perl

(where n and m are the version and sub-version numbers, and <you> is your home directory name).

Now add the following to your ~/.profile:

export PATH="$HOME/opt/perl/bin:$PATH"

and log out/in again for the change to take effect. Check that you’re getting the perl you think you’re getting:

which perl        # should be ~/opt/perl/bin/perl
perl -v           # should be your newly-installed v5.n.m

# Alternatively:
perl -e 'print "$]\n"'    # or
perl -e 'print "$^V\n"'

As another option, you can instead use perlbrew to automatically install one or more Perls into your home directory.


* * *

Before doing anything else (but after you’ve logged out and back in again so the new perl is first on your PATH), install the cpanminus tool. I suggest using the clever “bootstrap method”, whereby cpanm installs itself:

curl -L https://cpanmin.us | perl - App::cpanminus

You should now have ~/opt/perl/bin/cpanm present.

As a quick test, have cpanminus install Modern::Perl:

cpanm Modern::Perl

This will install Modern::Perl into your own Perl’s site_perl dir.

Perl doesn’t come with a REPL out of the box. Install this one:

cpanm Devel::REPL

After that completes, you may also need to install a handful of supporting modules for the REPL to work smoothly, such as File::Next, B::Keywords, and Term::ReadLine::Gnu. After that’s done, try it out:

$ re.pl

Here are some commonly-used modules you also might install right off the bat:

DateTime  Moose  Config::Tiny  Template  Try::Tiny
IPC::System::Simple  Capture::Tiny  File::Slurp

and for database use, probably also:

DBI  DBD::SQLite

For creating and distributing your own Perl 5 modules:

Module::Starter  Dist::Zilla

Finally, you may want to set up a local Perl module directory in your home directory for installing modules which you’d rather not put into your Perl’s site_perl dir (for example, more specialized, experimental, or project-specific modules). For this, use local::lib.

Local Perl docs

You can get at your perl docs using the perldoc command. They are also available online as HTML at https://perldoc.perl.org/.

To see the master ToC, see perldoc perltoc.

To see all the built-in functions grouped by category, see perldoc perlfunc and page down a couple of times.

You can always jump straight to the docs for a given function using the -f option, for example: perldoc -f sort. You can read the perldocs of a local file like so: perldoc ./myfile.pl (you can use the -F option here to speed things up).

See perldoc perl to get the big table of contents of all the available perldoc pages.

At the top of every script/module

Always (unless you have a good reason not to) start your scripts and modules with use strict and use warnings, or else use Modern::Perl (which does those for you, plus a bit more).

Language fundamentals

numerical op   string op
------------   ---------
==             eq
!=             ne
<              lt
<=             le
>              gt
>=             ge
<=>            cmp
*              x
+              .

Context

When perl evaluates an expression, the kind of value it expects to find (scalar or list) depends on the context in which the expression appears.

Context is determined at compile-time when perl parses your source code.

If a scalar is put into a list context, it usually produces a one-element list.

If a list-producing expression is put into a scalar context, it will hopefully evaluate to something useful. For example, an array will yield the number of elements it contains.

You can force a scalar context by using the scalar operator.

In the docs, when you see something like “sort LIST”, it means that the sort operator provides a list context to its arguments. Furthermore, if the operator provides a list context to an argument, it also provides a list context to the elements of that list argument.
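
The context rules above can be seen in a few lines. This is a minimal sketch using only built-ins; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';

my @fruits = ('apple', 'banana', 'cherry');

# List context: the array yields its elements.
my @copy = @fruits;

# Scalar context: the array yields its element count.
my $count = @fruits;

# Parentheses on the left make this a list assignment,
# so $first receives the first element rather than the count.
my ($first) = @fruits;

# say provides list context to its arguments, so `scalar`
# is needed here to get the count instead of the elements.
say 'Count: ', scalar(@fruits);

say "copy=@copy count=$count first=$first";
```

The only difference between the `$count` and `$first` assignments is the pair of parentheses, which is an easy thing to get wrong when unpacking `@_`.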

Lexicals, globals, and scoping

Perl provides 2 kinds of namespaces for variables to live in: “package” (i.e. “symbol tables”), and lexical. Package variables are globals (aka “package globals”), are dynamically scoped, and live in named symbol tables. Lexicals are locals and live in unnamed lexical scopes. File scope is the largest possible lexical scope.

When one subroutine calls another, the one being called is in the dynamic scope of the caller. When one block is inside another, the inner block is in the lexical scope of the outer one. Note, however, that when you get to the end of a block where you started, you leave the current lexical and dynamic scopes.

If you like, you can create your very own secluded lexical scope with an empty block:

say "Noisy outer scope!";

{
    # Nice quiet local scope in here.
    my $i = 3;
    say $i;
}

# say $i;  ERROR

Symbol tables are actually global hashes, and contain the names of the variables in them. All the built-in globals (like @ARGV, %ENV, $$, etc.) are located in the main symbol table.

Incidentally, within each namespace there’s a sub-namespace for each sigil (that’s why $foo and @foo are 2 different variables). If you want to refer to every “foo” in a symbol table — regardless of sigil — you use a “typeglob”. Perl uses typeglobs to implement the importing of modules.

A fully-qualified package variable name like $Foods::Veg::Tomato::variety shows you the structure of the nested symbol tables — the most deeply-nested of which contains the global $variety scalar. The fully-qualified name also indicates that the file path leading to Tomato.pm is Foods/Veg/Tomato.pm.

Note: you can’t see local/lexical/my variables in a module from outside that module.

When you call use, it often imports package symbols (or else gives the compiler some hints (as “pragmas”)) for use in the current lexical scope.

A package declaration (usually at the top of a module) is lexically scoped, and declares the name of the current default package until the end of the current lexical scope (usually the file).

Recall that use happens at compile-time. require happens at run-time.
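
As a small illustration of the difference, here is a sketch that loads one core module each way. List::Util and POSIX are used only because they ship with Perl; any module would do.

```perl
use strict;
use warnings;
use feature 'say';

# `use` runs at compile time and imports the requested names,
# so sum() can be called like a built-in below.
use List::Util qw/sum/;

my $total = sum(1, 2, 3);

# `require` loads the module at run time and imports nothing,
# so the function is called by its fully-qualified name.
require POSIX;
my $floor = POSIX::floor(3.7);

say "total=$total floor=$floor";
```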

Aside from package and use (and require), the three operators dealing with lexicals and package globals are my, our, and local:

my declares a lexically-scoped variable. Its name and value are both stored locally, only.

our declares a lexically scoped name that refers to a package global. Using the above example, the Tomato.pm file would contain our $variety; in it. our does not create values; it just gives you access to the global. Usually, though, you declare the name and assign a value at the same time, ex.:

our $foo = 7;

Note: “our” replaces “use vars”.

local sets up a temporary value for a package global, but only for the current dynamic scope. That is, if you use local in a sub, and then call another sub, in that 2nd sub you’re still in the same dynamic scope, and so will still see that localized value. Once the 1st sub returns, you’re back to the pre-localized value. The same thing happens when you leave the block in which local was used: the package global goes back to the value it had before you entered that block.

Only use local if you have a good reason to.
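
To make the behaviour of our and local concrete, here is a minimal sketch. The Demo package and its report and noisy subs are hypothetical names invented for this example.

```perl
use strict;
use warnings;
use feature 'say';

package Demo;

our $level = 'normal';   # Package global, also reachable as $Demo::level.

sub report { return "level is $level" }

sub noisy {
    # Temporary value, seen by everything called from here
    # until noisy() returns.
    local $level = 'verbose';
    return report();
}

package main;

my $inside  = Demo::noisy();    # report() sees the localized value.
my $outside = Demo::report();   # The original value is back.

say $inside;
say $outside;
```

Had `noisy` used `my $level` instead, `report` would not have seen the change, because `report` refers to the package global and not to a lexical in another sub.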

More on operators

A little more on operators:

For more, read perldoc perlop.

Keywords

The details are in perldoc perlsyn. Among others, you’ll find info on:

Note, in Perl you can label loops, if you like.

LINE:
for my $line (@lines) {
    #...
    next LINE if $thus_and_so;
}

BTW, $line is lexically scoped to that for-loop block. Same thing with while loops like this:

while (my $r = get_thingy()) {
    # do something
}

say "it was $r!";  # ERROR: $r is out of scope here.

Incidentally, a bare block:

{
    print "hi\n";
}

is the same as a loop that only loops once. Its contents have their own lexical scope, and you can exit it using last (or even start it over using redo). if and unless blocks are, of course, not loops.

When looping over a list, the variable used for each item ($_ being the default) actually aliases the item. So, you can change the list items on the fly. However, don’t delete items on the fly — for that, append to a separate array instead.
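
A short sketch of both halves of that advice, with made-up data: modifying items through the alias, and building a second array rather than deleting in place.

```perl
use strict;
use warnings;
use feature 'say';

my @prices = (10, 20, 30);

# $_ aliases each element, so this changes @prices itself.
$_ *= 2 for @prices;

# Rather than deleting from @prices while looping over it,
# collect the items worth keeping in a separate array.
my @keep;
for my $price (@prices) {
    push @keep, $price if $price > 25;
}

say "prices: @prices";
say "keep:   @keep";
```

For simple filtering like this, `my @keep = grep { $_ > 25 } @prices;` does the same job in one line.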

Built-ins

Perl comes with a good number of built-in functions. They’re also sometimes referred to as operators when you don’t put parentheses around their arguments. When you use a built-in that only takes one arg, and you don’t use parentheses, it’s called a named unary operator.

See a nicely-organized list of them at perldoc perlfunc.

Standard library

Perl comes out-of-the-box with many modules in its standard library. To see the list of them, run perldoc perlmodlib.

Commonly-used variables

Besides $_, a few of them are:

See chap. 28 of the Camel for more, or read perldoc perlvar.

Quoting

See the perldoc perlop section “Quote and Quote-like Operators” for more info.

If you need a multi-line string, use the heredoc syntax:

my $long_string = <<"END_OF_STRING";   # Note explicit quoting.
la dee da
va va va $voom
ok, done
END_OF_STRING

process_long_string($long_string);

process_long_string(<<'EOS');
line one
line two
line three
EOS

For dealing with regexes

Remember that, inside the current regex (i.e. during the match), you use \1. Outside the match, you use $1. For s/foo/bar/, you’d use $1 (not \1) where “bar” is.

The /g regex modifier is for globally finding matches. What the pattern match yields is described in the following sections.

It’s probably best practice to always use “/xms” at the end of your regexen.

Mostly, what you’ll be doing with regexes is:

Resulting value after an attempted match (m//)

In scalar context

In list context

Regarding s///

You can use /g with s/// as well, and it does what you’d expect. Regardless, s/// returns a number telling how many times it succeeded in doing the replacement. But note, s///g in scalar context is not progressive like m//g is — you need to manually loop for something like that.
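
The points above about \1 versus $1, the return value of s///, and the behaviour of m//g can be sketched as follows. The sample strings are invented for illustration.

```perl
use strict;
use warnings;
use feature 'say';

my $text = 'the the quick brown fox';

# Inside the pattern, \1 refers back to the first capture group.
my $doubled;
if ($text =~ m/\b (\w+) \s+ \1 \b/xms) {
    $doubled = $1;   # Outside the match, use $1.
}

# In the replacement part of s///, use $1 as well.
(my $fixed = $text) =~ s/\b (\w+) \s+ \1 \b/$1/xms;

# s///g returns the number of replacements it made.
my $spaced = 'a b c d';
my $count  = ($spaced =~ s/\s/_/xmsg);

# m//g in list context returns all the captures at once...
my @words = ('one two three' =~ m/(\w+)/xmsg);

# ...while in scalar context it steps through the matches one by one.
my @pairs;
my $settings = 'x=1, y=2';
while ($settings =~ m/(\w)=(\d)/xmsg) {
    push @pairs, "$1:$2";
}

say "doubled=$doubled fixed='$fixed' count=$count";
say "words=@words pairs=@pairs";
```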

There’s a lot more to regexes, of course. For details, see chapter 5 of the Camel, and/or perldoc perlre.

Object Oriented Programming

Use Moose. Or, for something much smaller, faster to start up, and less featureful, see Moo.

Basic Moose usage: To create a WoodStove class, put the following into WoodStove.pm:

package WoodStove;

use OtherModules;
use YouMightNeed;

use Moose;
use namespace::autoclean;

# extends, roles, attributes, etc.

# methods

no Moose;
__PACKAGE__->meta->make_immutable;

1;

Files

Some functions: open, close, chdir, glob, unlink, rename, mkdir, rmdir, …

Regarding open:

use autodie qw/:all/;

open( my $in_file,  '<', 'input.txt' );
open( my $out_file, '>', 'output.txt' );

my $one_line  = <$in_file>;   # Scalar context: reads the next line.
my @all_lines = <$in_file>;   # List context: reads all remaining lines.

print {$out_file} @all_lines;   # Note extra braces and no comma.

close $in_file;
close $out_file;

# Easiest way to read in all lines of a file:
use File::Slurp qw/slurp/;
my $one_big_string = slurp 'input.txt';
my @slurped_lines  = slurp 'input.txt';

Note: you don’t need to check for success when opening files (those first 2 lines) if you’re using autodie.

And you should indeed be using autodie.

You can do tests on files using their filename and the various “-x” tests such as -r (is readable), -w (is writable), -e (exists), and so on. For the full list of tests, see perldoc perlfunc — they’re listed near the top.
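
Here is a small sketch of a few of those file tests. It creates a scratch file with the core File::Temp module so that it can run anywhere; the variable names are illustrative only.

```perl
use strict;
use warnings;
use feature 'say';
use File::Temp qw/tempfile/;

# Create a scratch file that is removed when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close $fh;

my $exists   = -e $path ? 1 : 0;   # Does it exist?
my $readable = -r $path ? 1 : 0;   # Can we read it?
my $is_dir   = -d $path ? 1 : 0;   # Is it a directory?
my $size     = -s $path;           # Size in bytes (false if empty).

say "$path: exists=$exists readable=$readable dir=$is_dir size=$size";
```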

Note that glob has some special magic: if it’s in the condition of a while, for, or until loop, each time through it’ll give you the next filename.

while (my $fn = glob '*.txt') {
    say "Found $fn";
}

This is analogous to the magic of the line input operator.

Processes

Aka, “shelling out”.

See also the docs for IPC::System::Simple and Capture::Tiny.
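
As a rough sketch of the built-in ways to shell out, the following uses backticks and system. It runs the current perl (via $^X) as the external command purely so the example does not depend on any other program being installed, and it assumes the path in $^X contains no spaces.

```perl
use strict;
use warnings;
use feature 'say';

# Backticks (or qx//) run a command through the shell and
# capture its standard output.
my $output = `$^X -e "print 6 * 7"`;

# system runs a command and returns its wait status; 0 means success.
# Passing a list of arguments bypasses the shell.
my $status = system($^X, '-e', 'exit 0');

# After system, the command's exit code is in the high byte of $?.
system($^X, '-e', 'exit 3');
my $exit_code = $? >> 8;

say "output=$output status=$status exit_code=$exit_code";
```

With autodie qw/:all/ in effect, a failing system call would throw an exception instead of returning a non-zero status, so the manual check of $? shown here would not be reached.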

Exception handling

See perldoc -f eval and perldoc -f die.
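
A minimal sketch of the usual eval/die pattern follows. The risky sub is a hypothetical stand-in for any code that might fail.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical function that throws on bad input.
sub risky {
    my ($n) = @_;
    # The trailing newline stops perl appending "at FILE line N."
    die "negative input: $n\n" if $n < 0;
    return sqrt $n;
}

my $value;
my $succeeded = eval {
    $value = risky(-4);
    1;   # Reached only if nothing died.
};

# Copy $@ right away; later evals can overwrite it.
my $error = $succeeded ? '' : $@;

if ($succeeded) {
    say "Got $value";
}
else {
    print "Failed: $error";
}
```

Ending the eval block with a true value and testing the result of eval, rather than testing $@ directly, is a common defensive habit. Try::Tiny wraps up the same idea with a friendlier syntax.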

POD

Perl 5 POD is pretty no-frills. Blank lines separate paragraphs. You indent what you want rendered verbatim. You can get I<italic>, B<bold>, and C<monospace>. Headings are made with =headn where n is 1, 2, 3, or 4. Lists are made with =over, =item * (or =item foo), and then with a =back at the end of the list. End POD with =cut. It works ok for manpage-style docs.
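
For illustration, here is a sketch of a tiny module that uses each of those POD features. The Greeter package and its greet function are hypothetical.

```perl
package Greeter;

use strict;
use warnings;

=head1 NAME

Greeter - say hello

=head1 SYNOPSIS

    use Greeter;
    print Greeter::greet('World');

=head1 FUNCTIONS

=over

=item * C<greet($name)>

Returns a greeting for I<$name>. The greeting B<always> ends in a newline.

=back

=cut

sub greet {
    my ($name) = @_;
    return "Hello, $name!\n";
}

1;
```

Running perldoc on a file like this renders the POD, while perl itself skips everything between the first =head1 and =cut.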

Perl 6 Pod (see Synopsis 26) is the newest standard for Pod.

That said, this quick reference is written in Pandoc-Markdown.

Packages, Modules, and Distributions

A distribution is the tar.gz file you can download from the CPAN (which cpanm downloads for you). These are typically named “Like-This-1.02.tar.gz”.

A module is a .pm file that you can use in your code. It may contain zero or more packages (described below).

A package is a namespace. At the top of your module you can specify the current package name which sets the name of the default package for whatever follows. Packages usually have a version number as well.

package MyPackage;
our $VERSION = 0.01;

(Think of version number i.NNNN like version i.j.k where j and k can be up to 2 digits and are padded with a zero if < 10.)

You generally use CamelCase for package and module names. You can specify more than one package per file, but it’s simpler to have just one per file. That is, one .pm file == one module == one package. Keep it simple.

You name your module the same as the tail end of the package name, but with “.pm” at the end. Further, if the package name contains ::, you place the file in the corresponding nested directory. So, for package Foo::Bar::Baz;, you’d have ~/perl5lib/Foo/Bar/Baz.pm. For your own simple modules, there’ll usually be no colons in the package name, and you’ll just drop the files directly into your ~/perl5lib (no subdirs needed).

You’ll need to have use lib '/home/you/perl5lib' in your source file for it to find your own modules.

Checking if you have a module installed

# If you can `use` it, it's installed.
perl -MSome::Module -e 1

# If you can read its docs, it's installed.
perldoc Some::Module

List all core modules installed

To see the full list of core modules for a given version of Perl:

corelist -v 5.n.m

(See corelist -v for the full list of versions that corelist knows about.)

List all extra modules you’ve installed

perldoc perllocal

Checking if a given module comes standard with Perl

Use the corelist script that comes with Perl, ex.:

corelist List::Util

Find out where a given module is installed

perldoc -l Some::Module

More Module Management Tools

See pmtools.

Uninstalling modules

Tricky.

You might opt to always install modules into your own ~/perl5lib (say, via local::lib). This way, you keep your Perl’s site_perl dir relatively clean, and you can tinker around with your ~/perl5lib dir as you wish. Then, if you really botch up your ~/perl5lib, you can always wipe it and start over without harming your Perl installation.

Using modules

To use modules, whether they come with the standard library or from elsewhere, you just use them near the top of your file like so:

use Foo::Bar;                  # Import whatever Foo::Bar exports.
use Foo::Baz qw/func1 func2/;  # Only import func1 and func2.
use Foo::Moo ();               # Do not import anything.

You’ll sometimes see something like:

use Foo::Bar -moo;

The two things going on there are:

  1. Putting a dash in front of a bareword does a little magic and makes it into a string which starts with a dash.
  2. Putting a lone string where a list is expected evaluates to a one-item list.
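
Both points can be checked directly. In this sketch, show_import_args is a hypothetical sub standing in for whatever a module does with the arguments that use passes to it.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-in for a module's import routine:
# it just reports the arguments it was given.
sub show_import_args {
    my @args = @_;
    return join ', ', map { "'$_'" } @args;
}

my $flag = -moo;                      # Same as the string '-moo'.
my $seen = show_import_args(-moo);    # A one-item list.

say $flag;
say $seen;
```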

Where does Perl search for modules?

Paths are stored in @INC. See them yourself: perl -MModern::Perl -e 'for (@INC) { say; }'

Installing modules in a local directory

You can use cpanm to install modules into your own local ~/perl5lib directory (just like how it normally installs into the site_perl dir) by first installing and setting up local::lib. Follow the “bootstrap method” to install local::lib into your own ~/perl5lib dir.

TODO: more instructions here.

Writing your own procedural modules

For simple modules, just put this into your /home/<me>/perl5lib/Foo.pm file:

package Foo;

our $VERSION = 1.01;

my $whatever;  # Only visible inside this module.
our $bar;      # Can be accessed outside via `$Foo::bar`.

# Access outside via `Foo::baz()`.
sub baz {
    #...
}

1;

No need to use Exporter if it’s just a simple module that you don’t wish to export anything from.

From other scripts, to access Foo.pm’s globals and functions:

use lib '/home/me/perl5lib';
use Foo;

say $Foo::bar;
Foo::baz();

For anything more complex, or for modules you wish to distribute, see Module::Starter and Dist::Zilla.

Some Perl idioms

Some common tasks

Files, line-by-line

To do something with a file, line-by-line:

use autodie qw/:all/;
open(my $fh, '<', 'foo.txt');

while (my $line = <$fh>) {
    chomp $line;
    # ...
}

close $fh;

or

use File::Slurp qw/slurp/;

my $one_big_line = slurp 'foo.txt';

my @lines = slurp 'bar.txt';
chomp @lines;

Shell output line-by-line

for my $line (`ls -l *.txt`) {
    chomp $line;
    say ">>>$line<<<";
}

Date and time

my $sse = time;  # Seconds since epoch.

my ($hours, $minutes, $seconds) = (localtime)[2,1,0];

my ($year, $month, $day) = (localtime)[5,4,3];
$month++;       # Necessary.
$year += 1900;  # Necessary.
printf "%u-%02u-%02u\n", $year, $month, $day;

# See `perldoc -f sprintf` for details on that format string.

my $nice_string = localtime;

# You can also pass `localtime()` an sse value.
my $earlier = localtime($sse - 120);  # 2 minutes ago.

For anything more complicated than that, use DateTime.

Random

rand()         # 0.0 to < 1.0
rand(10)       # 0.0 to < 10.0
int(rand(10))  # 0 to 9 (`int` truncates)
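
Two common uses of rand, as a short sketch with made-up data: picking a random array element and rolling a die.

```perl
use strict;
use warnings;
use feature 'say';

my @colors = qw/red green blue/;

# rand @colors gives 0 to < 3; the array index is truncated
# to an integer, so this picks one element at random.
my $pick = $colors[ rand @colors ];

# Shift the 0..5 range up by one to get 1..6.
my $die_roll = 1 + int(rand(6));

say "pick=$pick roll=$die_roll";
```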

Database access

SQLite

Make sure you have SQLite installed:

sudo apt-get install sqlite3 libsqlite3-dev

Then:

cpanm DBI
cpanm DBD::SQLite

XML

Use XML::Twig. See also XML::LibXML.

Other tips and best practices

One good place to look for a list of Perl best practices is the book Perl Best Practices, by Damian Conway. If you haven’t already read it, you probably want to. Here are some tips (some of which are from PBP):

Modules to make use of

Aside from many useful built-in modules, including:

here are a few (in no particular order):

For more ideas, check out