Parsers, Part II: Building a Java Class Browser
  
 
 


If you're a typical programmer--with a background in Computer Science--you look upon parser technology as fondly as your last trip to the dentist. Unless you are a YACC freak, you were only too happy to leave behind the semester or two you spent building a simple Pascal or C compiler, along with memories of dorm food and bad roommates.

If you come from a less traditional programming background, you are in good company. Many Java programmers are self-taught. But you are just as likely as any CS grad to run screaming at the mention of Backus-Naur forms, LALR grammars, and symbol table management.

Beyond Grep, AWK, and Perl

In truth, parsers, parser generators, and BNF grammars are no more obscure than common text processing tools like grep, sed, and AWK.

In fact, many programs based on pattern recognition are easier to implement (and more powerful) when built from hand-written grammars that are fed as input to parser generators. For those who understand the basics of compiler technology, it comes as little surprise that grep is itself a parser generator whose grammar is the language of regular expressions. AWK, as an extension of grep and sed, takes regular expression technology a step further to implement a general programming language based on text processing and regular expression pattern matching. Perl, like AWK, is yet another attempt to build a generalized text processing language based on regular expressions. Small wonder that Perl is frequently used to parse a variety of file types, from HTML to C.
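To make the difference concrete, here is a minimal sketch in Java. It is not taken from the browser source; the class name BalancedBraces and its tiny grammar are assumptions made up for illustration. A classical regular expression is fine for recognizing a single identifier, but it cannot recognize braces nested to arbitrary depth. A few lines of hand-written recursive descent, following a two-rule grammar, can.

```java
import java.util.regex.Pattern;

// Hypothetical example class, not part of the browser source.
class BalancedBraces {
    // A grep-style regular expression: recognizes one identifier, nothing more.
    public static final Pattern IDENTIFIER =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    private final String text;
    private int pos = 0;

    private BalancedBraces(String text) {
        this.text = text;
    }

    // Grammar (illustrative only):
    //   block := '{' item* '}'
    //   item  := block | any character other than '{' or '}'
    public static boolean isBlock(String text) {
        BalancedBraces parser = new BalancedBraces(text);
        return parser.block() && parser.pos == text.length();
    }

    private boolean block() {
        if (!accept('{')) {
            return false;
        }
        while (pos < text.length() && text.charAt(pos) != '}') {
            if (text.charAt(pos) == '{') {
                if (!block()) {
                    return false;   // a nested block failed to close
                }
            } else {
                pos++;              // ordinary character, skip it
            }
        }
        return accept('}');
    }

    // Consumes the expected character if it is next in the input.
    private boolean accept(char expected) {
        if (pos < text.length() && text.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(IDENTIFIER.matcher("Browser").matches());
        System.out.println(isBlock("{ a { b } c }"));   // properly nested
        System.out.println(isBlock("{ a { b } c"));     // missing final brace
    }
}
```

The recursive call in block() is what the regular expression lacks: each nested brace gets its own invocation, so the nesting depth is tracked by the call stack rather than by the pattern.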

How Good is Your Grammar?

The intent of this series is to give you just enough background in compiler technology to make you "dangerous." This is not a course for building compilers. Rather, it is a course in the construction of powerful programming tools that should be useful to any Java programmer. Most of the tools share a common approach based upon lexical analysis of Java (and other languages) and pattern recognition. Many of these tools might be based upon a component written in grep, AWK, or Perl. However, having access to the source code for a parser, whether written in C, C++, or Java, can give you significant advantages in program design over using more standard pattern recognition tools. It's no coincidence that the term recognition in pattern recognition corresponds to the term recognizer, which is another word for parser.

The main subjects presented will include parsers, parser generators, and grammars. Tools presented in the series include source code browsers (both class browsers and class hierarchy browsers), source code migration tools (useful for adapting Java 1.0.2 source code to a Java 1.1 environment), HTML verifiers (syntax checkers, link checkers), and code mungers used to protect your valuable class files from reverse engineering with decompilers like Mocha. The more nerdly among you may be interested to know that combining a good text editor with a specialized parser and an object-oriented database will give you most of the technology required to build an incremental compiler. Given the right parts, incremental compilers are not nearly as hard to implement as widely believed.

Next week we will post a complete Java grammar along with two binary executables: a parser generator and a parser. The parser is based upon the Java grammar. We will explain the process of generating a parser from a parser generator and a specific grammar.

Even if you have little interest in parser technology today, you may quickly come to see parser construction as a simple extension to more typical programming tasks. If you are capable of writing moderately complex regular expressions for grep and sed, or of writing AWK and Perl programs, you should have little trouble learning to appreciate the power of generating your own custom parsers for specific language processing tasks. As Terence is fond of saying, once you understand parsers, most programming tasks boil down to language processing problems.

This week we present the first of several Java applications that depend upon the output of a customized Java parser. This first application is a Java source code browser that parses an input file and locates class, variable, and method declarations. All such declarations are displayed in a list box at the top of the application's frame, while a text display shows the source code of the file being parsed. Selecting any of the class, variable, or method declarations in the list box will automatically position the cursor in the text window to the corresponding location in the source file.
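The idea of a declaration list that maps back to positions in the source can be sketched in a few lines of Java. This is not the browser's code and it does not use the generated parser; DeclarationIndex, Entry, and scan are hypothetical names, and the regular expression is a rough heuristic that will misfire on real programs (it would, for instance, mistake `else if (x) {` for a method). It handles only classes and methods, and it shows only the shape of the data a list box needs: a kind, a name, and a character offset at which to place the cursor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the browser's declaration list; not the article's code.
class DeclarationIndex {

    /** One list-box entry: what was declared and where it starts in the source. */
    static final class Entry {
        final String kind;   // "class" or "method"
        final String name;
        final int offset;    // character offset of the declaration in the source

        Entry(String kind, String name, int offset) {
            this.kind = kind;
            this.name = name;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return kind + " " + name + " @" + offset;
        }
    }

    // Heuristic only, NOT a Java grammar:
    // group 1 captures a class name, group 2 captures a method name.
    private static final Pattern DECLARATION = Pattern.compile(
            "\\bclass\\s+(\\w+)"
            + "|\\b[\\w\\[\\]]+\\s+(\\w+)\\s*\\([^)]*\\)\\s*\\{");

    /** Scans the source text and returns the entries in order of appearance. */
    static List<Entry> scan(String source) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = DECLARATION.matcher(source);
        while (m.find()) {
            if (m.group(1) != null) {
                entries.add(new Entry("class", m.group(1), m.start()));
            } else {
                entries.add(new Entry("method", m.group(2), m.start()));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String source = "class P {\n    int count;\n    void run() { count++; }\n}\n";
        for (Entry e : scan(source)) {
            System.out.println(e);   // one line per declaration found
        }
    }
}
```

A real parser earns its keep exactly where this heuristic breaks down: it knows whether a name appears in a declaration or in a statement, so it can also report variables and ignore control-flow keywords.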

Figure 1. A simple Java Class Browser

This simple source code browser, or class browser, is a model for more sophisticated class browsers, class hierarchy browsers, and project browsers found in powerful development environments like Smalltalk, OpenStep, NextStep, and Lisp workstations.

Download the source code.

Running the Browser

To run the browser from the command line, type

java Browser P.java

Here java launches the Java interpreter, Browser names the Browser class (compiled to Browser.class), whose main method is the application's entry point, and P.java is the source file to be parsed by the browser.
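Inside a Java program, the args array passed to main holds only what follows the class name on the command line, so P.java arrives as args[0]. The following sketch shows one way a launcher might pick up the file name and read the file. BrowserLauncher and sourceFileFrom are hypothetical names chosen for illustration, and the real Browser class may handle its arguments differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical launcher, not the article's Browser class.
class BrowserLauncher {

    /** Returns the source file name, or throws if exactly one was not given. */
    static String sourceFileFrom(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException(
                    "usage: java BrowserLauncher <file.java>");
        }
        return args[0];   // "P.java" in the command shown above
    }

    public static void main(String[] args) throws IOException {
        String fileName = sourceFileFrom(args);
        // Assumes the source file is UTF-8 encoded.
        String source = new String(
                Files.readAllBytes(Paths.get(fileName)), StandardCharsets.UTF_8);
        System.out.println(fileName + ": " + source.length() + " characters read");
    }
}
```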

Until We Grep Again

For now, we simply present the browser application code for your study. Future installments will walk you through the details of this code. In the meantime, you can run the browser application to get a sense of its usage. Then you can peruse the code to see how the output of the parser can be used as input to the browser. You can literally build dozens of seemingly unrelated tools using this approach. Who knows, you may even come to think of parser construction as a vital component of your programming toolkit.

Until next time, enjoy.


copyright © Sun Microsystems, Inc