By Greg Voss, JavaSoft
Terence Parr, MageLang Institute
January 1997
Java was specifically designed to simplify the complexities of C++
syntax and semantics. This simplification benefits not only the
programmers who write programs in Java, but also the toolsmiths who
build Java development tools. C++ is generally considered to be the
most difficult language for which to write a parser, also known as a
recognizer. Consider that just building a correct symbol table manager
for C++ can take a language expert a month of programming effort,
whereas a language expert, given the right tools, can build an entire
Java parser in a few days.
While a Java parser could be built by hand, tools called parser generators
exist that can write recognizers automatically, given a description of a
language's grammatical structure. The ANTLR parser generator is one such tool
that has become popular due to its power, simplicity, and flexibility.
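To make the term "recognizer" concrete, here is a minimal hand-written sketch for a toy expression grammar. It is neither a Java parser nor ANTLR output; the class name `TinyRecognizer` and the grammar in its comment are assumptions chosen only for illustration. A parser generator produces code of roughly this shape from the grammar description alone.

```java
// Hypothetical example: a hand-written recognizer for the toy grammar
//   expr : term ('+' term)* ;
//   term : DIGITS | '(' expr ')' ;
// Whitespace is not handled, to keep the sketch short.
final class TinyRecognizer {
    private final String input;
    private int pos = 0;

    private TinyRecognizer(String input) {
        this.input = input;
    }

    /** Returns true only if the entire input matches the rule expr. */
    static boolean recognize(String input) {
        TinyRecognizer r = new TinyRecognizer(input);
        return r.expr() && r.pos == r.input.length();
    }

    // expr : term ('+' term)* ;
    private boolean expr() {
        if (!term()) return false;
        while (peek() == '+') {
            pos++;                       // consume '+'
            if (!term()) return false;
        }
        return true;
    }

    // term : DIGITS | '(' expr ')' ;
    private boolean term() {
        if (peek() == '(') {
            pos++;                       // consume '('
            if (!expr() || peek() != ')') return false;
            pos++;                       // consume ')'
            return true;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        return pos > start;              // at least one digit is required
    }

    // Returns the current character, or '\0' at end of input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(recognize("1+(2+3)")); // prints "true"
        System.out.println(recognize("1+"));      // prints "false"
    }
}
```

Each grammar rule becomes one method, which is why writing such code by hand for a language the size of Java is tedious and why generating it from a grammar is attractive.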
This article introduces a four-part series on parsing Java source
files as parsing applies to development tool construction, including a
discussion of ANTLR and general language recognition principles. Each
article in the series will focus on one of the following four
subjects:
- An introduction to the Java parser and its application. A working source
  code browser application is presented.
- General principles behind language recognition and translation and how
  to use the ANTLR parser generator. Source code and binary executables are
  provided for ANTLR.
- Symbol table management for Java and how symbolic information can be
  used to answer questions about Java source code.
- A discussion of the latest version of ANTLR that generates Java rather
  than C++. Source and binary executables are provided.
Many programmers dismiss language recognition as purely a compiler
writer's problem; however, a number of interesting noncompiler tools
can be built using a parser as base. For example, a Java parser lies
at the base of the following useful tools:
- JDK 1.0.2 to 1.1 code migration tool. A translator can be built
  to detect and change obsolete, discouraged, or renamed method names.
- Java source code browser. Many companies are building code
  browsers and debuggers for Java. Being able to examine the source and
  access symbol table information is crucial.
Java source code obfuscator ("munger"). The portability of
Java
.class files comes at a price at the moment. Byte-code
decompilers can reverse the compilation process and obtain essentially
the exact Java source code (including variable names) for any compiled Java
program. A code obfuscator could simply rename all of a program's classes,
variables, and methods to be a1 , a2 , a3 , and so on
effectively rendering any decompilation unreadable. A munger can be very useful for
protecting your intellectual property by causing meaningless output to be generated
when someone tries to decompile your class files with tools like Mocha.
- In-house Java extensions to aid debugging. A Java translator could
  be built that accepted debugging extensions (such as "run method
  x after each access to this object to ensure consistency") or
  that automatically added extra debugging information to your program.
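The renaming step behind the obfuscator idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the class `NameMunger` and its methods are hypothetical, and the list of declared names is assumed to come from a parser's symbol table. A real munger would also have to respect scoping, keywords, and names that must stay public.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the renaming step of a source code "munger".
final class NameMunger {

    /**
     * Assigns a1, a2, a3, ... to each distinct declared name, in order of
     * first appearance. The input is assumed to be the program's own
     * classes, variables, and methods, as reported by a symbol table.
     */
    static Map<String, String> buildRenameMap(List<String> declaredNames) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (String name : declaredNames) {
            if (!renames.containsKey(name)) {
                renames.put(name, "a" + (renames.size() + 1));
            }
        }
        return renames;
    }

    /** Rewrites a token stream, leaving tokens without a mapping untouched. */
    static List<String> apply(List<String> tokens, Map<String, String> renames) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(renames.getOrDefault(token, token));
        }
        return out;
    }
}
```

Working on tokens rather than raw text matters: a plain search-and-replace would also rewrite matching text inside string literals and comments, which is one reason a parser sits at the base of such a tool.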
For more information on the ANTLR parser generator on which this series is based,
see the "getting started in ANTLR" page.