Minimal Perl typesetting-module



A while ago I was looking for a simple framework for laying out printable text on a page: something that could position text absolutely or relatively, wrap text automatically, and do a few more simple tricks.

Mostly out of frustration with overly graphical software and an unwillingness to learn a proper typesetting system, I rolled my own, and figured I might as well share it on the Internet.

I use this very often: basically every letter I send goes through this module.

Basic idea

Usage is non-interactive: you mix the text to be rendered with statements about positioning, size, font etc. in a single file, process that file, and out comes a PDF file.

Page-syntax

(Syntax described here might not be exact - see source for details :-)

Plaintext

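Plaintext is rendered as expected, with reasonable margins from the left, top and right of the page by default (20 mm each). Words are wrapped automatically at the right edge of the current column. Single line breaks in the source are not preserved, so consecutive text lines flow together; an empty line starts a new paragraph, leaving one blank line.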

Special inline plaintext-sequences

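The following sequences can occur within plaintext. They are only recognised as separate words; since < and > also count as word separators, they can be attached directly to the preceding text, as in text:<_BRK_>.

- _BRK_: line break; the cursor moves to the start of the next line.
- _TAB_: moves the cursor to the right by the width of four spaces (this does not trigger wrapping).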
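
To illustrate how _BRK_ and _TAB_ are picked out, here is a minimal stand-alone sketch of the word splitting done in add_text() (see the module source further down). The sub name split_words is hypothetical, and unlike the module as listed, it drops the empty field that leading whitespace produces:

```perl
use strict;
use warnings;

# Hypothetical helper: split one plaintext line into words the way
# add_text() does. Whitespace, '<' and '>' all act as separators, so
# "text:<_BRK_>" yields the two words "text:" and "_BRK_".
sub split_words {
    my ($line) = @_;
    return grep { length } split /[\s<>]+/, $line;   # drop empty fields
}

my @words = split_words('  part of the page with some text:<_BRK_>');
print join( '|', @words ), "\n";   # prints: part|of|the|page|with|some|text:|_BRK_
```

Each resulting word is then compared against _BRK_ and _TAB_ as a whole, so text:_BRK_ (without the <) would be printed literally.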

Comment

Any line starting with ## (2 hash-marks) starts a comment, which runs to the end of that line.

Statements

Any line starting with a single # (single hash-mark) contains one or more statements. A statement is either a simple command-directive (e.g. home, dump) or an expression using predefined properties or user-defined variables.
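
The line types are told apart purely by their first non-blank characters. A minimal sketch of that classification, using the same regular expressions as the module's _interpret_line() (classify_line is a hypothetical name):

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the regexes in _interpret_line().
# Order matters: '##' has to be tested before a single '#'.
sub classify_line {
    my ($line) = @_;
    return 'comment'   if $line =~ /^\s*##/;
    return 'statement' if $line =~ /^\s*#/;
    return 'blank'     if $line =~ /^\s*$/;    # paragraph break
    return 'text';
}

for my $line ( '## a note to self', '# l = 140; home', '', 'Dear Sir or Madam,' ) {
    printf "%-22s %s\n", "'$line'", classify_line($line);
}
```

Leading whitespace is allowed before the hash marks, and a # in the middle of a text line has no special meaning.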

'Properties' are sometimes referred to as 'options' in the following text; that was the term used in the original script. I think 'properties' makes more sense for predefined variables, so that is what I use now.

Under the hood, properties and variables are simply entries in a Perl hash inside the module object; some of these (like boldfont) are predefined, while the others have no special meaning and merely serve as a scratchpad for the user. One common use of a user-defined variable is to save/restore the value of a predefined property.

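Multiple expressions on a single line are separated by semicolons. Terms in an expression must be surrounded by whitespace to humour the parser: before evaluation, each whitespace-separated term that starts with a name is rewritten into a property lookup, so in W-r only W would be recognised.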
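
A minimal sketch of how a statement line is turned into Perl, modelled on _interpret_expr() and interpret_directives() but working on a plain hash instead of the object. eval_statements is a hypothetical name, and unlike the module as listed, it reports failures via $@:

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of _interpret_expr(): %$props plays the
# role of the module's $self->{ opts }.
sub eval_statements {
    my ( $props, $line ) = @_;
    for my $expr ( split /;/, $line ) {
        next if $expr =~ /^\s*$/;                    # ignore empty expressions
        my @tokens = grep { length } split /\s+/, $expr;
        # A term starting with a name becomes a hash lookup: "W" -> $props->{W}
        s/^([a-z_]\w*)(.*)$/\$props->{$1}$2/i for @tokens;
        my $code = join ' ', @tokens;
        eval "$code; 1" or die "malformed expression '$expr': $@";
    }
}

my %props = ( W => 210 );
eval_statements( \%props, 't = l = 10 ; r = W - 30' );
print "t=$props{t} l=$props{l} r=$props{r}\n";   # prints: t=10 l=10 r=180
```

Because the rewrite works term by term, chained assignments such as x = X = X + 5 (area 9) work unchanged.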

Command-directives

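The following command-directives exist:

- home: move the cursor back to the top-left corner of the current column (sets x and y to 0)
- dump: print all properties and user-defined variables to stdout (using Data::Dumper)
- bold: switch to the bold font (boldfont)
- norm: switch back to the normal font (normfont)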
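
For reference, the four directives home, dump, bold and norm could also be expressed as a dispatch table. This is a hypothetical restructuring, not the module's code (which uses an if/elsif chain in interpret_directives()); the %state hash and its font key stand in for the object and its current-font field:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical state standing in for the MyPDF object; 'font' records
# which font is currently selected.
my %state = (
    opts => { x => 12, y => 30, normfont => 'Helvetica', boldfont => 'Helvetica-Bold' },
    font => 'Helvetica',
);

# One code reference per command-directive.
my %directive = (
    home => sub { $_[0]{opts}{x} = $_[0]{opts}{y} = 0 },
    dump => sub { local $Data::Dumper::Sortkeys = 1; print Dumper( $_[0]{opts} ) },
    bold => sub { $_[0]{font} = $_[0]{opts}{boldfont} },
    norm => sub { $_[0]{font} = $_[0]{opts}{normfont} },
);

# Returns 1 if $word is a known directive, 0 if it should instead be
# evaluated as an expression (which this sketch does not do).
sub run_directive {
    my ( $state, $word ) = @_;
    my $handler = $directive{$word} or return 0;
    $handler->($state);
    return 1;
}

run_directive( \%state, $_ ) for qw( bold home );
print "font=$state{font} x=$state{opts}{x} y=$state{opts}{y}\n";   # prints: font=Helvetica-Bold x=0 y=0
```

Adding a new directive then means adding one entry to the table rather than another elsif branch.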

Predefined properties

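The following predefined properties exist (lengths in mm, defaults in parentheses):

- t: top edge of the current column, measured from the top of the page (20)
- l: left edge of the current column, measured from the left edge of the page (20)
- r: right edge of the current column, measured from the right edge of the page (20)
- x: horizontal cursor position, relative to l (0)
- y: vertical cursor position, relative to t (0)
- pt: font size in points (10)
- ydist: extra vertical space between lines (1)
- W: page width (210, i.e. A4), handy in expressions such as r = W - 30
- normfont: font used for normal text (Helvetica)
- boldfont: font used after bold (Helvetica-Bold)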
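
Since r is measured from the right edge of the page rather than from the left, it is easy to lose track of where a column actually ends. A small hypothetical helper, column_edges, that converts l, r and W into absolute positions, mirroring _abs_r() and the box-width calculation in add_word():

```perl
use strict;
use warnings;

# Hypothetical helper: absolute column edges in mm from the left page edge.
# l is measured from the left edge, r from the right edge, W is the page width.
sub column_edges {
    my ( $l, $r, $W ) = @_;
    $W //= 210;                              # A4 width in mm, the default for W
    my $right = $W - $r;                     # absolute right edge, as in _abs_r()
    return ( $l, $right, $right - $l );      # left edge, right edge, usable width
}

for my $col ( [ 10, 130 ], [ 140, 40 ], [ 190, 20 ] ) {
    my ( $left, $right, $width ) = column_edges(@$col);
    print "l=$col->[0] r=$col->[1]: $left..$right mm, width $width mm\n";
}
```

The last case, l = 190 and r = 20, gives the zero-width column used for the vertical text in area 8.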

Learn by example

An example of a rendered page - each numbered area is discussed in one of the following sections.

(I don't know why the font looks so crappy; I used ImageMagick's convert to convert from PDF to PNG.)

Define column dimensions and add text (areas 1 and 2)

A normal column in the top-left area of the page. This is what I use most of the time, to confine text to a specific region/block on a page, e.g. an address-area.

    # t = l = 10
    # r = 130
    # home

    # bold
    (1) Let's fill a column somewhere on the top-left
    part of the page with some text:<_BRK_>
    # norm

    0 1 2 3 4 5 6 7 8 9 a b c d e f
    0 1 2 3 4 5 6 7 8 9 a b c d e f
    0 1 2 3 4 5 6 7 8 9 a b c d e f
    0 1 2 3 4 5 6 7 8 9 a b c d e f

Same idea, but smaller column elsewhere on page:

    # l = 140
    # r = 40
    # t = 100
    # home

    # bold
    (2) the same as before, but for a smaller column:<_BRK_>
    # norm

    0 1 2 3 4 5 6 7 8 9 a b c d e f
    0 1 2 3 4 5 6 7 8 9 a b c d e f

Change column width while adding text (area 3)

Not really spectacular, but changes to the column offsets take effect immediately (that is, from the next word on).

    # l = 50
    # r = 130
    # t = 65
    # home

    # bold
    (3) Make this existing column smaller and smaller,
    while adding characters to it:<_BRK_>
    # norm

    0 1 2 3 4 5 6 7 8 9
    # r += 4
    0 1 2 3 4 5 6 7 8 9
    # r += 4
    0 1 2 3 4 5 6 7 8 9
    # r += 4
    0 1 2 3 4 5 6 7 8 9
    # r += 4
    0 1 2 3 4 5 6 7 8 9
    # r += 4
    0 1 2 3 4 5 6 7 8 9
    # r += 4
    0 1 2 3 4 5 6 7 8 9
    # r += 4
    0 1 2 3 4
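
The wrapping rule behind this is simple: before a word is placed, add_word() recomputes the column width from the current l and r and starts a new line if the word would not fit. A minimal sketch of that greedy rule, assuming (hypothetically) a fixed width of 2 mm per character instead of real font metrics, and using [N] entries as a stand-in for # r += N lines; layout is a hypothetical name:

```perl
use strict;
use warnings;

use constant {
    CHAR_MM => 2,      # hypothetical fixed character width in mm
    PAGE_W  => 210,    # A4 width in mm
};

# Greedy wrapping as in add_word(): the column width is recomputed for
# every word, so changing r takes effect from the next word on.
sub layout {
    my ( $opts, @items ) = @_;    # $opts holds l, r and x in mm
    my @lines = ( [] );
    for my $item (@items) {
        if ( ref $item ) { $opts->{r} += $item->[0]; next }   # like "# r += N"
        my $width = CHAR_MM * length($item);
        my $box   = ( PAGE_W - $opts->{r} ) - $opts->{l};
        if ( $opts->{x} + $width > $box ) {    # word would overflow: new line
            $opts->{x} = 0;
            push @lines, [];
        }
        push @{ $lines[-1] }, $item;
        $opts->{x} += $width + CHAR_MM;        # the word plus one space
    }
    return map { join ' ', @$_ } @lines;
}

my %opts = ( l => 50, r => 130, x => 0 );    # a 30 mm wide column, as in area 3
print "$_\n" for layout( \%opts, 0 .. 9, [4], 0 .. 9, [4], 0 .. 9 );
```

With a zero-width column (l = 190, r = 20, as in area 8 below) every word fails the check, including the first, which is why each letter ends up on its own line.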

Use different text-sizes (area 4)

Using the pt property, different text sizes can be selected. Below, the original text size is saved and restored using the user-defined variable old_pt.

    # l = 110
    # t = 25
    # r = 40
    # home

    # bold
    (4) You can vary the text-size:<_BRK_>
    # norm

    # old_pt = pt
    # pt = 10
    pt 10<_BRK_>
    # pt = 20
    pt 20<_BRK_>
    # pt = 30
    pt 30<_BRK_>
    # pt = 40
    pt 40<_BRK_>
    # pt = old_pt

    ...and back to normal.

Debugging-help: dump current properties (area 5)

Generate a properties-dump from page-source:

    # l = 10
    # t = 120
    # r = W - 30
    # home

    # bold
    (5) dumping all options to stdout<_BRK_>
    # norm

    # dump
    (nothing is visible on this page itself)

The resulting dump is output to stdout (no debugging-text is added to the rendered page itself), and looks something like this:

    $VAR1 = {
              'r' => 180,
              'x' => 0,
              't' => 120,
              'W' => 210,
              'normfont' => 'Helvetica',
              'l' => 10,
              'y' => '27.1764705882353',
              'pt' => 10,
              'boldfont' => 'Helvetica-Bold',
              'old_pt' => 10,
              'ydist' => 1
            };

(Both predefined and user-defined variables like old_pt will be shown.)
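
Where does a value like y = 27.1764705882353 come from? Each line break in add_newline() moves y down by the font size converted to mm, plus ydist. A small sketch of that arithmetic, assuming that PDF::Create reports the A4 page width as 595 pt (the exact A4 width is about 595.28 pt; 595 is consistent with the dump above); line_advance_mm is a hypothetical name:

```perl
use strict;
use warnings;

use constant {
    PAGE_W_PT => 595,    # assumed A4 width as reported by PDF::Create
    PAGE_W_MM => 210,
};

# Vertical advance of one line break, as computed by add_newline():
# the font size converted from points to mm, plus the extra spacing ydist.
sub line_advance_mm {
    my ( $pt, $ydist ) = @_;
    my $pt_per_mm = PAGE_W_PT / PAGE_W_MM;    # about 2.83 pt per mm
    return $pt / $pt_per_mm + $ydist;
}

my $y = 0;
$y += line_advance_mm( 10, 1 ) for 1 .. 6;
print "y after six line breaks at 10 pt: $y\n";
printf "%2d pt: %5.2f mm per line\n", $_, line_advance_mm( $_, 1 ) for 10, 20, 30, 40;
```

Six such line breaks at pt = 10 and ydist = 1 add up to the 27.176... mm shown in the dump.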

Poor man's bullet-list / indenting (area 6)

This is how I indent all my paragraphs, and make simple bullet-lists:

    # l = 30
    # t = 220
    # r = 100
    # home

    # bold
    (6) this is how you can make simple indenting/bullets work:<_BRK_>
    # norm

    As you can see, ...<_BRK_>
    # L = l
    # l += 10
    * this line is indented<_BRK_>
    * and so is this one<_BRK_>
    # l += 10
    - and this one even 2x!<_BRK_>
    - this one also 2x.<_BRK_>
    # l = L

    (this line is left-aligned again)

Toying around: simple drawing and character positioning (area 7)

Combining absolute and relative positioning (l vs x for horizontal position of text) to make simple drawings:

    # l = W / 2 - 20
    # t = 150
    # r = 60
    # home

    # bold
    (7) you can even make simple drawings. Note that properties
    (including any newly defined ones) retain their value
    when entering text:
    # norm

    0 1 2 3 4 5 6 7 8 9 a b c d e f
    0 1 2 3 4 5 6 7 8 9 a b c d e f
    0 1 2 3 4 5 6 7 8 9 a b c d e f
    0 1 2 3 4 5 6 7 8 9 a b c d e f

    (see the pretty lines to the left and right!)

    # l -= 5
    # home

    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>

    # l = W - r + 5
    # r -= 10
    # home

    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
    |<_BRK_>
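
Behind this, the absolute cursor position is l + x horizontally and t + y vertically, in mm from the top-left corner of the page, while PDF places text in points measured from the bottom-left corner. A sketch of the conversion done by _cursor_pdfpos() and _pos_mm2pt(), again assuming PDF::Create's 595 pt A4 width; like the module, it uses the same width-based factor for both axes. cursor_to_pdf is a hypothetical name:

```perl
use strict;
use warnings;

use constant {
    PT_PER_MM => 595 / 210,    # assumed A4 width in pt divided by width in mm
    PAGE_H_MM => 297,
};

# Convert the cursor (column offsets l/t plus cursor offsets x/y, all in mm
# from the top-left corner) into PDF points from the bottom-left corner.
sub cursor_to_pdf {
    my (%o) = @_;
    my $abs_x = $o{l} + $o{x};
    my $abs_y = $o{t} + $o{y};
    return ( $abs_x * PT_PER_MM, ( PAGE_H_MM - $abs_y ) * PT_PER_MM );
}

# Area 7: l = W / 2 - 20 = 85 mm, t = 150 mm, cursor at home
my ( $px, $py ) = cursor_to_pdf( l => 85, x => 0, t => 150, y => 0 );
printf "PDF position: (%.1f, %.1f) pt\n", $px, $py;
```

add_word() additionally lowers the vertical coordinate by pt, so that the cursor marks the top of the text rather than its baseline.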

Horsing around: vertical text, lame version (area 8)

Using l == W - r (i.e. a zero-width column) and single-letter words to fake vertical text flow:

    # l = W - 20
    # r = 20
    # t = 30
    # home

    # bold
    (8) 

    v e r t i c a l

    t e x t
    # norm

    ( s o r t

    o f )

Brains not included: overlapping text (area 9)

I don't like software getting in the way of my fail, so text can be positioned anywhere, totally disregarding whatever might have been rendered there earlier:

    # l = 140
    # r = 20
    # t = 230
    # home

    # bold
    (9) text is simply output at the cursor-position,
    without checking if there was anything there already:<_BRK_>
    # norm

    # X = x
    # Y = y
    help!
    # x = X = X + 5
    # y = Y = Y + 5
    help!
    # x = X = X + 4
    # y = Y = Y + 4
    help!
    # x = X = X + 3
    # y = Y = Y + 3
    help!
    # x = X = X + 2
    # y = Y = Y + 2
    help!
    # x = X = X + 1
    # y = Y = Y + 1
    help!

Perl-source

The software consists of a module (basically a wrapper around PDF::Create, with PDF::API2 used only for overlaying PDF files) and a simple driver script which calls the module's methods.

Driver

The driver does little more than check its arguments and call the module listed below:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use MyPDF;

    my ( $infile, $outfile ) = @ARGV;
    $outfile or die "usage: $0 <infile> <outfile>\n";

    my $pdf = MyPDF->new( $outfile ) or die "cannot create PDF-object\n";

    open( my $fh, '<', $infile ) or die "cannot open input-file '$infile': $!\n";
    my $contents = join( "", <$fh> );
    close( $fh );

    $pdf->interpret( $contents );
    $pdf->close() or die "cannot create PDF-file";

Module itself

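(The main method is interpret(). Since the driver loads the module with use MyPDF, save it as MyPDF.pm somewhere on Perl's module search path, @INC.)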

    {
      package MyPDF;

      use strict;
      use warnings;
      use Data::Dumper;   # used by the 'dump' directive
      use PDF::Create;
      use PDF::API2;

      use constant {
        PAGEWIDTH_A4_MM  => 210,
        PAGEHEIGHT_A4_MM => 297,
      };

      sub set_opts
      {
        my ( $self, %opts ) = @_;

        use constant {
          DEFAULT_NORMFONT => 'Helvetica',
          DEFAULT_BOLDFONT => 'Helvetica-Bold',
        };

        my %opt2default = (
            t        => 20,
            l        => 20,
            r        => 20,
            x        => 0,
            y        => 0,
            pt       => 10,
            ydist    => 1,
            W        => PAGEWIDTH_A4_MM,
            normfont => DEFAULT_NORMFONT,
            boldfont => DEFAULT_BOLDFONT,
            );

        # Init uninitialised properties to their default values
        exists $self->{ opts }{ $_ } or $self->{ opts }{ $_ } = $opt2default{ $_ } foreach keys %opt2default;

        # Override properties from user
        $self->{ opts }{ $_ } = $opts{ $_ } foreach keys %opts;
      }

      sub _set_font { $_[ 0 ]->{ _curr_font } = $_[ 0 ]->{ pdf }->font( BaseFont => $_[ 1 ] ) or die }

      sub norm_font { $_[ 0 ]->_set_font( $_[ 0 ]->{ opts }{ normfont } ) }

      sub bold_font { $_[ 0 ]->_set_font( $_[ 0 ]->{ opts }{ boldfont } ) }

      sub close { $_[ 0 ]->{ pdf }->close(); 1 }

      sub new
      {
        my ( $this, $filename, %opts ) = @_;

        my $class = ref $this || $this;
        my $self = bless {}, $class;

        $self->{ pdf } = PDF::Create->new( filename => $filename ) or die;
        $self->{ a4 } = $self->{ pdf }->new_page( MediaBox => $self->{ pdf }->get_page_size( 'A4' ) ) or die;
        $self->{ page } = $self->{ a4 }->new_page() or die;

        $self->set_opts( %opts );

        $self->norm_font();

        $self;
      }

      sub add_newline 
      {
        my ( $self ) = @_;

        $self->{ opts }{ x } = 0;

        $self->{ opts }{ y } += $self->_pt2mm( $self->{ opts }{ pt } );
        $self->{ opts }{ y } += $self->{ opts }{ ydist };
      }

      # Advances the cursor by the width of four spaces (unlike adding the word "    ", this doesn't wrap yet)
      sub add_tab { $_[ 0 ]{ opts }{ x } += $_[ 0 ]->text_width( "    " ) }

      sub _page_size_elt { $_[ 0 ]->{ pdf }->get_page_size( 'A4' )->[ $_[ 1 ] ] or die }

      sub _pagewidth_pt { $_[ 0 ]->_page_size_elt( 2 ) }

      sub _pageheight_pt { $_[ 0 ]->_page_size_elt( 3 ) }

      sub _pt_per_mm { ( $_[ 0 ]->_pagewidth_pt() / PAGEWIDTH_A4_MM ) }

      sub _pt2mm { $_[ 1 ] / $_[ 0 ]->_pt_per_mm() }

      sub _mm2pt { $_[ 1 ] * $_[ 0 ]->_pt_per_mm() }

      sub text_width
      {
        my ( $self, $text ) = @_;

        my $width_chr = $self->{ page }->string_width( $self->{ _curr_font }, $text );
        my $width_pt = ( $width_chr * $self->{ opts }{ pt } );

        $self->_pt2mm( $width_pt );
      }

      sub _abs_r { PAGEWIDTH_A4_MM - $_[ 0 ]->{ opts }{ r } }

      # Takes into account different coordinate systems between user/pdf
      sub _pos_mm2pt
      {
        my ( $self, $abs_x, $abs_y ) = @_;

        my $abs_x_pt = $self->_mm2pt( $abs_x );
        my $abs_y_pt = $self->_mm2pt( PAGEHEIGHT_A4_MM - $abs_y );

        ( $abs_x_pt, $abs_y_pt );
      }

      sub _cursor_pdfpos
      {
        my ( $self ) = @_;

        my $pos_x = ( $self->{ opts }{ l } + $self->{ opts }{ x } );
        my $pos_y = ( $self->{ opts }{ t } + $self->{ opts }{ y } );

        $self->_pos_mm2pt( $pos_x, $pos_y );
      }

      # Recognises special escapes (_BRK_, _TAB_, ...)
      sub add_word
      {
        my ( $self, $word, %opts ) = @_;

        $self->set_opts( %opts );

        if    ( $word =~ /^_BRK_$/ ) { $self->add_newline() }
        elsif ( $word =~ /^_TAB_$/ ) { $self->add_tab()     }
        else {

          my $word_width = $self->text_width( $word );

          # Adjust cursor-pos to start of (wrapped) word in box
          my $r = ( PAGEWIDTH_A4_MM - $self->{ opts }{ r } );
          my $box_width = ( $r - $self->{ opts }{ l } );
          my $wrap = ( ( $self->{ opts }{ x } + $word_width ) > $box_width );
          $wrap and $self->add_newline();

          # Put word at, and advance cursor
          my @pdfpos = $self->_cursor_pdfpos();
          $pdfpos[ 1 ] -= $self->{ opts }{ pt }; # correct position so that it's at underside of char
          $self->{ page }->string( $self->{ _curr_font }, $self->{ opts }{ pt }, @pdfpos, $word );
          $self->{ opts }{ x } += $word_width;
          $self->{ opts }{ x } += $self->text_width( ' ' );
        }

        $self;
      }

      # Adds/wraps text at, and updates cursor-position
      sub add_text
      {
        my ( $self, $text, %opts ) = @_;

        $self->set_opts( %opts );

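        # Skip empty fields (e.g. from leading indentation), which would otherwise add a stray space
        $self->add_word( $_ ) foreach grep { length } split( /[\s<>]+/, $text );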

        $self;
      }

      # Utility: overlay one PDF-page onto another (e.g. text onto a picture).
      sub overlay_pdf_files
      {
        my ( $fg_path, $bg_path, $out_path ) = @_;
        $out_path or return undef;

        my $fg_pdf = PDF::API2->open( $fg_path );
        my $bg_pdf = PDF::API2->open( $bg_path );
        my $out_pdf = PDF::API2->new;

        # Bring in the template page
        my $page = $out_pdf->importpage( $bg_pdf, 1 ); 

        # Overlay the second input page over the first
        $page = $out_pdf->importpage( $fg_pdf, 1, $out_pdf->openpage( 1 ) ); 

        #Save the new file
        $out_pdf->saveas( $out_path );

        1;
      }

      # Input: line-oriented annotated plaintext, as per instructions/properties/escapes listed
      sub interpret
      { 
        my ( $self, $contents ) = @_;

        my $linenr = 1;

        foreach my $line ( split( /\n/, $contents ) ) {
          $line =~ s/\r$//;   # tolerate DOS (CRLF) line endings
          $self->_interpret_line( $line, $linenr ) or die;
          $linenr++;
        }

        $self;
      }

      sub _interpret_expr
      {
        my ( $self, $expr, $linenr ) = @_;

        $expr =~ /^\s*$/ and return;   # ignore empty expressions

        my $ev = "";

        foreach my $tok ( split( /\s+/, $expr ) ) {
          $tok = "\$self->{ opts }{ $1 }$2" if ( $tok =~ /^([a-z_]\w*)(.*)$/i );
          $ev .= "$tok ";
        }

        eval( $ev );
        $@ and die "malformed expression '$expr' in line $linenr: $@";
      }

      sub interpret_directives
      {
        my ( $self, $directives, $linenr ) = @_;

        foreach my $dir ( split( /;+/, $directives ) ) {

          $dir =~ s/^\s*(\S.*\S)\s*$/$1/;   # strip leading/trailing whitespace

          if    ( $dir eq 'dump' ) { print Data::Dumper::Dumper( $self->{ opts } )   }
          elsif ( $dir eq 'home' ) { $self->{ opts }{ x } = $self->{ opts }{ y } = 0 }
          elsif ( $dir eq 'bold' ) { $self->bold_font()                              }
          elsif ( $dir eq 'norm' ) { $self->norm_font()                              }
          else                     { $self->_interpret_expr( $dir, $linenr )         }
        }

        $self;
      }

      sub _interpret_line
      {
        my ( $self, $line, $linenr ) = @_;

        if    ( $line =~ /^\s*##/ )    {                                                                    }
        elsif ( $line =~ /^\s*#(.*)/ ) { $self->interpret_directives( $1, $linenr )                         }
        elsif ( $line =~ /^\s*$/ )     { $self->add_newline() if $self->{ opts }{ x }; $self->add_newline() }
        else                           { $self->add_text( $line )                                           }

        $self;
      }
    }

    1;


Delivered to you by Vim, GNU Make, MultiMarkdown, bozohttpd, NetBSD, and 1 human.