A compiler is a strange beast and seen by many as one of the last domains of black magic. And in earlier times it definitely was one of the most complex fields in Computer Science. But today compilers are things to play with as easily as a graphics program (some would say graphics programs are more complex).
There are two basic tricks that have moved compilers from the area of immensely complex problems to something that can be understood without a degree in Rocket Science. The first is to divide the work in different phases that are executed sequentially. The second is to use a tree-based representation of the program that is to be translated. These syntax trees are an abstraction of the real input and are thus named AST (Abstract Syntax Tree).
If we want to create AST transformations we should understand both, the different compiler phases and the tree-based representation.
A compiler's work can be divided in three different stages, first of all the frontend stage in which the program is read and transformed into the tree-based representation, the second middle-end stage in which the tree then is transformed and optimized, and the third and final backend stage in which the tree is instrumented with the instructions (in our case the byte code) and then written out (to memory or disk).
In the case of a compiler that creates byte code for the Java Virtual Machine the second stage is not really that important because most of the optimization work is done by the JIT compiler. In Groovy this holds true as well. Groovy++ though is a bit of an exception; the static compiler for Groovy does much work in this second area.
The Groovy compiler has nine phases. Six of these phases can be seen as part of the frontend stage; they focus on creating the full abstract syntax tree, check its correctness with regard to syntax and semantics and complete it (e.g., add additional nodes where needed). These phases are:
In this phase the necessary files are opened, the needed resources are allocated and the environment is set up.
Here a first tree representation of the input data is created. This includes the lexical analysis, identifying words from the language and the parsing of these words according to the Groovy grammar. This representation is still very much based on the source code i.e., a concrete representation or Concrete Syntax Tree (CST).
This phase takes the CST and transforms it to a more abstract representation, the Abstract Syntax Tree or AST.
- Semantic Analysis
In every language there are statements for which the normal grammar cannot decide whether or not they are correct. So in this phase the Groovy compiler, by hand, checks all these cases. This includes resolving classes, static imports and scope of variables. When this phase is complete the input is seen as valid source.
In this phase access from inner classes to the surrounding scope is resolved. Interestingly, the Groovy++ compiler does most of its work here.
The Backend stage has to create the bytecode representation from the AST. This again is done in different phases:
- Instruction SelectionClass Generation
These phases are also called "Class Generation 1" and "Class Generation 2". They are responsible for two major steps on the way to the final class. The first, and more important step (at least for the AST transformations) completes the AST and fills in everything which the programmer can so conveniently ignore (e.g. optional return statements). The second is the selection of the target bytecode set (which could be "Pre JDK5" or "Post JDK5" and following that the instrumentation of the AST with the bytecodes. Now that the AST is complete and instrumented with the correct instruction set the bytecode can be generated in memory.
The name is the game. This phase writes the bytecode that has been generated in the last phase to the output.
This phase is simply for housekeeping purposes. Allocated resources are freed, the environment is cleaned, files are closed etc., to guarantee that everything is in order when the compiler terminates. This is especially important since the compiler might not have been called from the command line, but instead e.g. inside an application server that tries to deliver a groovy resource through the template engine.
Before you think that I know every last detail of the Groovy compiler, let me tell you that Jochen "Blackdrag" Theodorou helped me and corrected the detail information in this entry.
Now that you know about the different phases of the compiler and their respective use, stay tuned for the next entries in which we actually start to play with the compiler.
Jochens presentation of working with the compiler.
Some Interesting Classes (see Groovy sourcecode)
|Selection of Instruction Set||org.codehaus.groovy.classgen.AsmClassGenerator|
|Adding Return Statements||org.codehaus.groovy.classgen.ReturnAdder|