Tuesday, September 28, 2010

Well, after a year-long layoff I'm back in the saddle again. I've been doing a lot of reading lately, so I thought I'd make my comeback with a review of Manning Publications' iText in Action.

Review of: iText in Action 2nd edition

Author: Bruno Lowagie
Publisher: Manning Publications
ISBN-10: 1935182617
ISBN-13: 9781935182610

Price: $37.79 (amazon.com)
Pages: 700
Reviewer: John S. Griffin

iText makes it possible to enhance applications with dynamic PDF solutions. It is one of the world's leading F/OSS (Free/Open Source Software) PDF libraries and is released under the Affero General Public License (AGPL). iText is available in two versions: the original Java version, and the C# port, iTextSharp. The iText web site is located at http://itextpdf.com/.

Both editions were written by the man who created the API. In the first edition of "iText in Action", readers learned why things work the way they do in iText, with simple examples to illustrate each point. This second edition takes readers further, presenting more real-life examples and comprehensive code samples that you can use to solve everyday problems.

As is usual for a Manning book, the examples share a running application: a movie database created for a (fictional) film festival. The database is accessed from a series of simple programs, and PDF files are created and manipulated in different ways that could be useful to visitors of the imaginary film festival.

Summary of Contents
Part 1: Creating PDF documents from scratch
This section demonstrates how to create a document from scratch. Concepts such as iText's basic building blocks and direct content are introduced, and adding columns and tables to a document is discussed in great detail. Part one concludes by explaining how to add the finishing touches to your document using page events for headers, footers, page numbers, and watermarks.

Chapter 1: Introduction
The book begins by creating a series of PDF documents from scratch. SQL statements are used to query a movie database; the examples loop over the ResultSet and add the data from each record to a PDF document using high-level objects such as Chunks, Phrases, and Paragraphs. The goal here is to create PDF documents without having to know anything about the PDF specification.

Chapter 2: Composing a document using iText's Basic Building Blocks
Lines, shapes, and text are used to create a timetable visualizing movie screenings, with a different color for every film festival category. To achieve this, the examples rely on low-level operations that demand a sound understanding of how PDFs work.

Chapter 3: Adding content at absolute positions
This chapter covers the basics of adding content to a page using methods that are referred to as low-level operations because they write PDF syntax directly to the content stream of the page. The ColumnText object is covered, and reusing content with the PdfTemplate object is also discussed.

Chapter 4: Organizing content in tables
In one of the most important chapters of the first part, documents containing tabular data are created from the database. Almost everything there is to know about the PdfPTable and PdfPCell objects is presented.

Chapter 5: Completing your layout using table, cell, and page events
Your knowledge of tables and cells is completed by learning how to add custom behavior to a table and its cells using events. Page events are discussed as well, and the finishing touches to documents, in the form of headers, footers, page numbers, and a watermark, are presented.

Part 2: Manipulating existing PDF documents
Part two deals with existing PDF files, whether they are documents created with iText as discussed in part one, or PDFs created with Adobe Acrobat, OpenOffice, or any other PDF producer. Different ways to copy, stamp, split, and merge documents are examined. Actions and JavaScript are discussed, along with everything there is to know about filling out interactive forms.

Chapter 6: Working with existing PDFs
PdfReader is used to access an existing PDF file. Depending on the circumstances, one or more of the following document manipulation classes is then chosen to make further changes:
  • PdfWriter in combination with PdfImportedPage objects - taking "photocopies" of specific pages
  • PdfStamper - adding content to one existing PDF document
  • PdfCopy, PdfSmartCopy, or PdfCopyFields - combining a selection of pages from different, existing documents into a new PDF document

Chapter 7: Making documents interactive
This chapter takes a closer look at the PdfStamper class, which is used to annotate a single document.

Chapter 8: Filling out interactive forms
Interactive form fields are a special type of annotation in PDFs. They are used in forms that implement AcroForm technology. Another type of PDF form is based on the XML Forms Architecture (XFA). Both types of interactive forms are discussed in chapter 8.

Part 3: Essential iText Skills
Parts one and two showed how to build a standalone application that is able to create and/or manipulate a PDF document in black and white, using standard fonts, and so on. Part three tells you how to integrate iText into a web application, how to add images and color, how to choose and use different fonts, and finally how to protect your document.

Chapter 9: Integrate iText in your web applications
For the sake of simplicity, most of the examples in this book are standalone applications, but a majority of projects use iText as a PDF engine in server-side web applications. You'll certainly benefit from chapter 9 if you want to avoid the pitfalls you might encounter while integrating your iText application into a Java Servlet.
Once your proof of concept is online, you'll probably be confronted with many extra user requirements, some of which are covered in the next three chapters.

Chapter 10: Brighten up your PDF with color and images
There are eleven different color spaces available in PDF. Apart from the simple device color spaces, which are identified by name alone, a color space is defined by a PDF object that is referenced from the resources dictionary of a content stream. This is explained in great detail in the PDF reference; the chapter discusses how iText hides this complex theory behind its color classes.

Chapter 11: Choose the right font
How does one choose a particular font? Which font files can be used with iText? How about special writing systems? In some languages, text is written from right to left, or from top to bottom. This chapter demonstrates that iText contains convenience classes that make it easier to select a font.

Chapter 12: Protect your PDF
Several important topics are presented in this chapter:

  • Providing metadata - data about the document, such as who owns it, and how to add and access it
  • Compressing and decompressing PDFs - the content of a document is compressed by default. iText can decompress content streams to read the PDF syntax that makes up a page or a form XObject.
  • Encrypting documents - PDF protection via passwords and encryption
  • Adding digital signatures - using public and private keys
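The point about compression is easier to picture once you know that PDF content streams are usually compressed with the FlateDecode filter, which is plain zlib/deflate. The sketch below is not iText code; it uses only the JDK's java.util.zip classes to compress and decompress a small, made-up content stream, which is roughly what happens when a page's content is decompressed so its PDF syntax can be read. The class name FlateDemo and the sample stream are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateDemo {

    // Compress bytes with zlib, the format used by PDF's /FlateDecode filter.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress zlib data, e.g. to read the PDF syntax of a content stream.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported zlib data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A tiny, made-up page content stream: stroke a line, then show some text.
        String content = "100 700 m 300 700 l S\nBT /F1 12 Tf 100 680 Td (Hello) Tj ET\n";
        byte[] compressed = deflate(content.getBytes(StandardCharsets.US_ASCII));
        String restored = new String(inflate(compressed), StandardCharsets.US_ASCII);
        System.out.println(compressed.length + " compressed bytes");
        System.out.println(restored.equals(content)); // prints true
    }
}
```

For a stream this short the zlib overhead can make the "compressed" version larger than the original; the savings only show up on realistic page content.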

Part 4: Under the hood
An overview of the history of PDF is presented in chapter 13. It takes a look at the inner workings of a PDF document. It explains the different parts of a PDF file: the header, the body, the cross-reference table, and the trailer. Since the body of a PDF file consists of a series of objects, an examination of the different types of objects in the Carousel Object System is presented. The three final chapters focus on stream objects.

Chapter 13: PDF files inside-out
One of Adobe's important goals was that every new version of the PDF specification had to be backward compatible. This was possible thanks to the well-designed architecture of a PDF file, a.k.a. the Carousel Object System. By studying the different objects that make up a PDF document, you'll learn how iText creates a PDF file.
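To make the header, body, cross-reference table, and trailer from the Part 4 overview concrete, here is a minimal sketch that writes a one-page PDF by hand using only the JDK. It is not how iText is implemented; it is an illustration under simplifying assumptions: ASCII-only content (so character offsets equal byte offsets), an uncompressed content stream, one of the standard Type 1 fonts, and a hypothetical class name, MinimalPdf.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MinimalPdf {

    // Build a one-page PDF by hand: header, body (numbered indirect objects),
    // cross-reference table, and trailer. ASCII only, so char offsets == byte offsets.
    static byte[] build() {
        String content = "BT /F1 24 Tf 72 720 Td (Hello World) Tj ET";
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
                + "/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "\nendstream",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
        };

        StringBuilder pdf = new StringBuilder("%PDF-1.4\n");      // header
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < objects.length; i++) {                // body
            offsets.add(pdf.length());                            // where "N 0 obj" starts
            pdf.append(i + 1).append(" 0 obj\n").append(objects[i]).append("\nendobj\n");
        }

        int xrefOffset = pdf.length();                            // cross-reference table
        pdf.append("xref\n0 ").append(objects.length + 1).append('\n');
        pdf.append("0000000000 65535 f \n");                      // entry for object 0
        for (int offset : offsets) {
            // Each entry is exactly 20 bytes: 10-digit offset, generation, 'n', EOL.
            pdf.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }

        pdf.append("trailer\n<< /Size ").append(objects.length + 1)  // trailer
           .append(" /Root 1 0 R >>\nstartxref\n").append(xrefOffset).append("\n%%EOF\n");
        return pdf.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] pdf = build();
        System.out.write(pdf, 0, pdf.length);   // redirect stdout to a .pdf file
        System.out.flush();
    }
}
```

Running `java MinimalPdf > minimal.pdf` should give a page reading "Hello World" in a typical viewer. The trailer's `startxref` value points at the cross-reference table, and each table entry records the byte offset of an `N 0 obj` line, which is what lets a reader jump straight to any object without parsing the whole file.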

Chapter 14: The imaging model
The streams holding the content of a page in a PDF document are the focus of this chapter. All the methods to draw lines and shapes (graphics state) and to write letters and words (text state) are listed.
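As a rough picture of what those operators look like inside a content stream, the sketch below builds one as text. The class ContentStreamSketch and its method names are hypothetical and are not iText's API; only the emitted operators are standard PDF syntax: `w` and `rg` set graphics state, `m`, `l`, and `re` construct paths, `S` and `f` paint them, and `BT`/`ET` bracket a text object in which `Tf` sets the text state, `Td` positions the text, and `Tj` shows it.

```java
import java.math.BigDecimal;

public class ContentStreamSketch {
    private final StringBuilder ops = new StringBuilder();

    // Print whole numbers without ".0" and other values in plain decimal form.
    private static String num(double v) {
        if (v == Math.rint(v)) {
            return Long.toString((long) v);
        }
        return BigDecimal.valueOf(v).stripTrailingZeros().toPlainString();
    }

    private ContentStreamSketch op(String line) {
        ops.append(line).append('\n');
        return this;
    }

    // Graphics state, path construction, and path painting.
    ContentStreamSketch setLineWidth(double w) { return op(num(w) + " w"); }
    ContentStreamSketch setFillRgb(double r, double g, double b) {
        return op(num(r) + " " + num(g) + " " + num(b) + " rg");
    }
    ContentStreamSketch moveTo(double x, double y) { return op(num(x) + " " + num(y) + " m"); }
    ContentStreamSketch lineTo(double x, double y) { return op(num(x) + " " + num(y) + " l"); }
    ContentStreamSketch rectangle(double x, double y, double w, double h) {
        return op(num(x) + " " + num(y) + " " + num(w) + " " + num(h) + " re");
    }
    ContentStreamSketch stroke() { return op("S"); }
    ContentStreamSketch fill() { return op("f"); }

    // Text objects: text state (Tf), positioning (Td), and showing text (Tj).
    ContentStreamSketch beginText() { return op("BT"); }
    ContentStreamSketch setFont(String resourceName, double size) {
        return op("/" + resourceName + " " + num(size) + " Tf");
    }
    ContentStreamSketch moveText(double tx, double ty) { return op(num(tx) + " " + num(ty) + " Td"); }
    ContentStreamSketch showText(String text) {
        // Literal strings must escape backslashes and parentheses.
        String escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
        return op("(" + escaped + ") Tj");
    }
    ContentStreamSketch endText() { return op("ET"); }

    @Override
    public String toString() { return ops.toString(); }

    public static void main(String[] args) {
        String content = new ContentStreamSketch()
            .setFillRgb(1, 0.5, 0)                       // orange fill in DeviceRGB
            .rectangle(36, 700, 120, 20).fill()
            .setLineWidth(2)
            .moveTo(36, 690).lineTo(156, 690).stroke()
            .beginText().setFont("F1", 12).moveText(36, 670)
            .showText("Screening (Room 1)").endText()
            .toString();
        System.out.print(content);
    }
}
```

Running `main` prints one operator per line, starting with `1 0.5 0 rg` and ending with `ET`; the parentheses in the shown text come out escaped as `\(` and `\)` inside the literal string.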

Chapter 15: Page content and structure
Optional content is discussed, and structure in the content stream of a page is introduced. A fair attempt is made to parse the content streams of existing PDF pages, and a closer look is taken at the other streams that can be found in a PDF document: images, fonts, file attachments, and rich media.

Chapter 16: PDF streams
In this final chapter, streams that contain images, fonts, file attachments, and rich media are covered.

This book is a rewrite of the first edition. When this reviewer first started the book and saw what the author had done with it compared to the first edition, he was a little worried that interested readers might have to buy both books in order to completely understand the iText library and API. Having read deeper into it, this reviewer can assure you that this is not the case. iText beginners will have no problem understanding the API.

The chapters concerning basic building blocks (to get you started), fonts, tables, colors, web applications, and all the others are still there, with better examples and explanations. Speaking of examples, there is a huge number of them in this book, which is an improvement over the first edition. Many of them can be used right out of the book to solve problems that the author is sure you will run into. They have also been updated from Java 1.4 to Java 5.

New topics include PDF security (public/private keys, encryption, compression, passwords, and the like), manipulating existing PDFs (what you can and cannot do), and a deep dive into the imaging model and the actual inner workings of a page.

In this reviewer's previous job, there were many copies of the first edition on the desks of report developers. Reporting is one of the most important niches for iText. Yes, there are more pieces of PDF generation software out there today than you can count, but the problem is that they are very difficult to automate for on-demand, detailed, ad hoc reports. That leaves a very large space for iText. This reviewer expects to see many more copies of the new edition out there than there were of the original.

All in all, the book is well written and the examples are vastly improved over the first book. If you are creating a system for generating ad hoc reports, programmatically generating PDFs on the fly for web or other applications, modifying existing PDFs without their original documentation or doing just about anything else with PDFs, this is the book you need. Highly recommended.

It won't be another year before I write again. I'm going to put another review here soon on a book I'm extremely impressed with, Manning's DSLs in Action.

1 comment:

Stefan said...

Thanks for the review, John! This was very helpful. Do you by chance know the iText version the second edition is based on?