Sunday, September 25, 2011

AST Transformations: Testing and Error Messages

When testing AST transformations we have different potential ways to do it, with respective advantages and disadvantages. In this entry we will look at direct testing, testing using a GroovyClassLoader and how global AST transformations can be tested. We also examine the last missing puzzle piece to our jigsaw puzzle of building local AST transformations, namely how to notify the user of errors in the way the transformation is used.

Direct Testing
Implementing a test directly is the simplest way to test whether a transformation does what we expect. The one disadvantage of this kind of testing is that you cannot watch the compiler do its work i.e., you cannot debug through your AST transformation (at least not without additional effort). Why? When the test class is loaded and compiled, then, in the phase "Semantic Analysis" the referenced classes are resolved, loaded, and compiled. So, when the actual test run starts, the whole compilation we are interested in has already ended. Additionally, the compiled transformation and annotation have to be on the classpath. If you are still developing the AST transformation this is not the best way to go.

Nonethelesse it is ok to simply have this kind of test in your test suite to ensure that the AST transformation still works (as a regression test). Here is an example that verifies the behavior (and yes, it is not perfect in that it does not test all edge cases).

class RequiresDirectTest extends GroovyTestCase {

    public void testInvokeUnitTest() {
        def out = new RequiresExample()

        assert out.divide10By(2) == 5

        def exceptionMessage = shouldFail { out.divide10By(0) }
        assert exceptionMessage =~ /Precondition violated/
    }
}


Testing using a Classloader
The better way to test during development is by using a GroovyClassLoader that parses the source code. Since this includes compiling the code you have every option to debug your AST transformation and watch exactly what happens.

We start by either defining a String containing the source class, or by loading the containing file from the file system. I prefer the second way, because then you can edit the class with a normal editor, and you can use the direct testing in parallel.

Then we create a new GroovyClassLoader, parse the class and create a new instance, on which we again execute our tests. And, as said above, during parseClass() the compiler is called and your AST transformation is executed during the "normal" program run.

class RequiresClassloaderTest extends GroovyTestCase {

    public void testInvokeUnitTest() {
        def file = new File('./RequiresExample.groovy')
        assert file.exists()

        GroovyClassLoader invoker = new GroovyClassLoader()
        def clazz = invoker.parseClass(file)
        def out = clazz.newInstance()

        assert out.divide10By(2) == 5

        def exceptionMessage = shouldFail { out.divide10By(0) }
        assert exceptionMessage =~ /Precondition violated/
    }
}


Global AST transformations
Global AST transformations have to be placed in the class path, in a jar with an additional text file containing the class names of the classes implementing the transformations (see references). Luckily, Hamlet D'Arcy has written a helper class that creates a new GroovyClassLoader and adds the transformation.

The interesting thing is, that this also registers a local transformation correctly. There is one caveat though: the transformation now is registered as a local and a global transformation that thus gets triggered twice. One call is as expected (the call for the local transformation), the other call (as a global transformation) provides only the source unit and no AST nodes i.e., the first argument is null. If you followed the defensive programming style, then you already check whether the AST nodes are provided and stop processing if not. This means you can use the helper class as well, and you can find many examples on the web. The question is: Should you? My opinion is that it helps if you ultimately try to create a global transformation and want to start by implementing a local transformation. In all other cases I would prefer the class loader approach.

class RequiresTransformTestHelperTest extends GroovyTestCase {

    public void testInvokeUnitTest() {
        def file = new File('./RequiresExample.groovy')
        assert file.exists()

        def invoker = new TransformTestHelper(new RequiresTransformation(),
                                              CompilePhase.CONVERSION)

        def clazz = invoker.parse(file)
        def out = clazz.newInstance()

        assert out.divide10By(2) == 5
        def exceptionMessage = shouldFail { out.divide10By(0) }
        assert exceptionMessage =~ /Precondition violated/
    }
}


Signalling Errors to the Compiler
If everything works as expected our transformation transforms the AST. But if something goes wrong we should try to communicate our problems to the user. The problem: There is no printwriter and no logger that can be used (and it wouldn't be a good way to do this anyway).

We could throw an exception or even an error, but we shouldn't do this either. Now, if you look at many of the transformations provided by Groovy, you could get the impression that throwing exceptions and errors is good manners. But the point is that these transformation, being part of Groovy, have a totally different implicit contract with the compiler. When we build an AST transformation, and some internals of the compiler on which we depend changes, than this is not the fault of the compiler but ours. So, no exceptions or errors; the compiler should execute without disruption, collecting errors from the different phases and different transformations to give the user as complete a list of the errors as it could determine.

The compiler, in fact, offers us a functionality to add our errors to a list of errors and continue the processing. Since errors are specific to a source, the SourceUnit provided as an argument to the visit()-method has an error collector that we can use to do this. The error collector can be accessed using the method getErrorCollector(), and offers a method addErrorAndContinue() that we can use. It takes an object of the class SyntaxErrorMessage, which in turn needs a SyntaxException as argument. In the following method, which can be found in the LogASTTransformation (and which is a variation of another method found in the class ClassCodeVisitorSupport), all these objects are created and fed to the error collector.

So, why not steal from the best? Here is the method, slightly modified for readability

public void addError(String msg, ASTNode expr, SourceUnit source) {
    int line = expr.lineNumber
    int col = expr.columnNumber

    SyntaxException se = new SyntaxException(msg + '\n', line, col)
    SyntaxErrorMessage sem = new SyntaxErrorMessage(se, source)
    source.errorCollector.addErrorAndContinue(sem)
}


And you simply use it like this (using our example):


public void visit(ASTNode[] astNodes, SourceUnit sourceUnit) {

    if (!checkNode(astNodes, annotationType)) {
        addError("Internal error on annotation", astNodes[0], sourceUnit);
        return
    }
    ...
}


This presents an error message to the user with line and column and, if possible, the source line where we encountered the error. Since we used the first node in the array of nodes, we associate the error with the annotation itself (the second node would be the annotated element).

Finally we can fill in those parts of the AST transformation where we had a comment "better error handling" and signal those errors to the user.

End
Now you have everything you need to successfully create your own, interesting AST transformations. I'm looking forward to a lot of new and interesting ideas being realized as AST transformations.

There is one loose end we still have to cover, though. The AstBuilder provides two additional methods for creating an AST, buildFromCode() and buildFromSpec(), which we will revisit soon. So stay tuned ...

References
Global AST Transformations Groovy documentation on global AST transformations

Sourcecode
RequiresDirectTest.groovyThe direct test
RequiresClassloaderTest.groovy    The test using a Groovy Classloader
RequiresTransformTestHelperTest.groovy    The test using a TransformTestHelper

Interesting Classes
org.codehaus.groovy.tools.ast.TransformTestHelperThe helper class that can test global AST transformations
org.codehaus.groovy.transformLogASTTransformation  The transformation containing our addError()-method
org.codehaus.groovy.ast.ClassCodeVisitorSupportThe class containing the original addError()-method


Thursday, September 22, 2011

AST Transformations: The transformation itself

So far we have seen the different compile phases, the form that an annotation has to take to wire the AST transformation into the compiler process, had a look at the classes that create the AST structure and we have discussed the idea of a transformation that implements a precondition as an annotation. Now we get to the transformation itself.

Access to the AST
We know that the compiler, during its work, creates an abstract syntax tree that is the basis for the finally created bytecode. The compiler provides an ordered access by allowing us to register objects implementing the interface ASTTransformation for the execution in a specific phase of the compiler.

The idea is that for each phase a tree walker is executed that, for every node it encounters, calls the method visit() of the registered objects to add information to the nodes and / or change the respective subtree. For local AST transformations this method is always called with two arguments. The first argument is an ASTNode array containing an AnnotationNode that triggered the call and a node representing the source code element that has been annotated e.g., if you have annotated a method then the second element in that array is an object of type MethodNode. The second argument is the SourceUnit instance that represents the actual source code.

Interlude - Defensive Programming
Now, this is more or less the description of the informal contract between the compiler and your ASTTransformation. So nothing has to be done and we can directly start with modifying the abstract syntax tree. But wait, do you really trust the compiler? What if, through some mysterious ways (and I don't mean the X Files kind, but the "oh, someone upgraded the compiler to a new major version with different semantics for the interface" way) the compiler decides to call your transformation with some other arguments? Then your code fails and with it, the compiler as a whole, without giving any sensible information about what happened to the user.

In addition to that Hamlet D'Arcy pointed out "that the contract [between AST transformation and compiler] is enforced by unit tests, not the type system and not a language specification. Who knows what will change in Groovy 1.9 and 2.0. It is better to be defensive and fail *without causing an exception* because that will make the compilation proceed even if your transform is broken in the future."

So, if we are fooling around with the compiler, a defensive programming style is the first thing you should adopt. This means, that the following code,

def annotationType = Requires.class.name

private boolean checkNode(astNodes, annotationType) {
    if (! astNodes)    return false
    if (! astNodes[0]) return false
    if (! astNodes[1]) return false
    if (!(astNodes[0] instanceof AnnotationNode))        return false
    if (! astNodes[0].classNode?.name == annotationType) return false
    if (!(astNodes[1] instanceof MethodNode))            return false
    true
}

public void visit(ASTNode[] astNodes, SourceUnit sourceUnit) {
    if (!checkNode(astNodes, annotationType)) {
        // add an error message or a warning
        return
    }
    ...
}


while totally redundant and needless, still is a guard against the worst. Look up Byzantine faults if you truly want to become paranoid; this is actually a good thing if you are writing AST transformations (remember the old saying, that being paranoid does not mean that your are not being followed).

It is important to note, that in principle the visit()-method of your AST transformation could be called by the compiler in parallel for different parts of the source code that have been annotated with our annotation. While the current compiler is not parallelizing (at least with Groovy version until 1.8.2), you should be aware of this and not only program defensively, but allow for parallel execution as well.

The AST Transformation
Actually implementing the visit()-method is straightforward. First of all we verify that we have been called with the correct arguments, then we check that the value provided by the annotation is of the correct type (since we are using a simple String constant this will be a ConstantExpression), call a method createStatements() that creates the new AST subtree that we want to add at the beginning of the method, and insert the statements at the beginning of the subtree of statements that represents the method that has been annotated.

public void visit(ASTNode[] astNodes, SourceUnit sourceUnit) {
    if (!checkNode(astNodes, annotationType)) {
    
    // add an error message or a warning
        return
    }

    MethodNode annotatedMethod = astNodes[1]
    def annotationExpression = astNodes[0].members.value

    // add better error handling
    if (annotationExpression.class != ConstantExpression) return

    String annotationValueString = annotationExpression.value
    BlockStatement block = createStatements(annotationValueString)

    def methodStatements = annotatedMethod.code.statements
    methodStatements.add(0, block)
}


Piece of cake... If there wasn't the method createStatements(). In this method we could manually create the different node objects and assemble an AST representing the code that we want to add to the beginning of the method (see the previous blog entry). This works, but is tedious. Or...

Enter The ASTBuilder
The ASTBuilder has been written by Hamlet D'arcy expressly to easily create ASTs using a simple builder DSL. This is done using the method createFromSpec() that takes a closure (containing the DSL statements) as an argument. Additionally, the ASTBuilder provides a method buildFromCode() to create an AST directly from code expressed as a closure and a method buildFromString() that creates the AST from source code provided as a String. Interestingly the method buildFromCode() recreates a String representation from the supplied code and feeds that to the method buildFromString() to do its work.

Completing the Transformation
We will use the method buildFromString() to create our AST tree representing the if statement.


def createStatements(String clause) {
    def statements = """
        if(!($clause)) {
            throw new Exception('Precondition violated: {$clause}')
        }
    """


    AstBuilder ab = new AstBuilder()
     List<ASTNode> res = ab.buildFromString(CompilePhase.SEMANTIC_ANALYSIS, statements)
    BlockStatement bs = res[0]
    return bs
}


First of all we define a GString that contains the static parts of the code and into which we embed the precondition as the if-clause and in the message of the created exception. Then we use an ASTBuilder-object to call the buildFromString()-method on our GString and let it create the AST for the phase "Semantic Analysis" (as explained in the blog entry).
The AstBuilder returns a list of ASTNodes. The method buildFromString() offers an additional parameter statementsOnly (which is set to true by default). If this additional parameter is set to false, then the method, in addiition to the AST representing the statements, returns an AST representing the whole Script class surrounding the code fragment as the second entry in the list of ASTNodes. Regardless of the way you call it, the first entry is always the AST representing your statements. In our case this is a block statement that in turn contains our if statement.

Tying it all Together
The final step is to tell the compiler in which phase the AST transformation will run. We choose the same that we used for our AST builder, namely "Semantic Analysis". This is done with another annotation as follows:


@GroovyASTTransformation(phase = CompilePhase.SEMANTIC_ANALYSIS)
public class RequiresTransformation implements ASTTransformation {
...
}


End
What remains? Testing and better error signalling. Stay tuned for the next blog entries.

Sourcecode
Requires.groovyThe Requires-Annotation
RequiresTransformation.groovy    The AST Transformation for the Requires-Annotation

Interesting Classes
org.codehaus.groovy.ast.builder.AstBuilderThe Builder that allows to simply create ASTs.
org.codehaus.groovy.ast.builder.AstStringCompiler    The class responsible for creating ASTs from Strings.
org.codehaus.groovy.transform.ASTTransformation   The interface between compiler and transformation.

Saturday, September 17, 2011

AST Transformations: Prerequisites and Annotations

Before we are starting with the transformations themselves, we need to be able to execute them on the source code. Two ways exist to do this: global or local AST Transformations are possible. Since local transformations are much easier to test we will focus on these first.

Annotations
The usual way to execute a transformation on the source code locally is using an annotation that marks the part of the abstract syntax tree to be transformed. So the first step is to create an annotation class for our use. If you have never heard of annotations before, now would be a good time to take a short break and read up on them.

Annotations allow us to annotate (nomen est omen) a part of the source code, providing additional information or instructions to either compiler or runtime system. An annotation is defined using the keyword @interface (the normal Java keyword interface prefixed with an @-sign). You can add elements that can contain additional information. If you use only one element it should be named value, which allows to leave out the element name when using the annotation, thus providing an elegant shortcut for providing the information. Additionally you can provide a default value for every element in case it is not defined by the user.

The parts of the source code that can be annotated are themselves defined using an annotation @TARGET that can take as different values TYPE, FIELD, METHOD, PARAMETER, CONSTRUCTOR, LOCAL_VARIABLE, ANNOTATION TYPE and PACKAGE or combinations of them. The names speak for themselves.

You can decide how long the annotation is kept using another annotation @RETENTION that takes three values. If SOURCE is used, then the annotation is discarded after the compiler run, CLASS is kept until the class is loaded and only RUNTIME is kept until the program is actually executed. So if you start experimenting and try to read an annotation using reflection be sure to use RUNTIME as the retention type.

Now let us use this information to create a first simple annotation that takes a value, annotates methods and is only used during compile-time (after all this is where the AST will be changed).

@Retention (RetentionPolicy.SOURCE)
@Target ([ElementType.METHOD])
public @interface FirstAnnotation {
    String value() default "this is the default value";
}


Seems simple? It actually is.

So far this is simple Java. Now if we want to use this annotation to signal an AST transformation we have to somehow wire the annotation to that transformation. This is done using another, groovy-specific annotation named @GroovyASTTransformationClass. This annotation takes as a value the fully qualified class name of the class implementing the AST transformation as a String. Now, if we put this together we get something like the following:


@Retention (RetentionPolicy.SOURCE)
@Target ([ElementType.METHOD])
@GroovyASTTransformationClass ("RequiresTransformation")
public @interface Requires {
String value() default "true";
}


If you get the distinct feeling that this might be part of the actual later example, you might not be wrong. Let me add, though, that for the sake of simplicity the whole example is put into the default package, which you probably shouldn't do at home.

Short Interlude - AST Structure
The structure of the AST can be arbitrarily complex (which is quite natural if you remember that it is a representation of the source code you just fed the compiler). I could now start to discuss all the gory details and walk you through the grammar and everything tied to it. But that is tedious. Instead, first of all, look at a few Groovy packages using your favorite IDE.

The first package is org.codehaus.groovy.ast, in which you find the structural elements that reflect the class level structure of your source code. FieldNode, MethodNode, Parameter, ConstructorNode ... You can compare these names to the possible @TARGET values above. The different values explicitly target these nodes, allowing you to focus on substructures instead of the AST as a whole.


Additionally to the structural elements we need the actual expressions and statements, which are built from the node type found in the packages org.codehaus.groovy.ast.expr and org.codehaus.groovy.ast.stmt (e.g. BooleanExpression in the first or IfStatement in the second package).


Please note that the screenshots of the different packages do not show the whole package, only a few entries are shown.

This all seems a bit theoretical, but now fire up your GroovyConsole. Enter the following short program:

def val = 1


and execute it (this is a workaround for a bug in Groovyconsole that otherwise renders the AST browser nonfunctional). As the next step choose the menu "Script  -> Inspect AST" and voilá, you can examine the live AST of the source code you entered. In fact you get two ASTs, the first being only the source code you entered, the second being the full Script class that is automatically generated from groovy scripts. Addtionally you can examine the AST in different phases (choose from the Dropdown menu). By default the compiler phase is "Instruction Selection".


For our purposes "Semantic Analysis" is the more interesting one.




If you compare the ASTs of the two phases shown above, you can see that, as described in the previous blog entry, the ReturnAdder inserts the needed return statements in the "Instruction Selection" phase. Now try an if-statement, throw an exception and define a method in the Groovy Console and view the respective ASTs to get a first understanding of how the AST is structured.

The AST Browser built into the Groovy Console is an indispensable tool if you want to understand how the AST is built, and I quite often use it when experimenting with AST transformations.

Transformation - The Example 
A nice example for a little AST transformation is an annotation that ensures that some preconditions are met when calling a method. We use the above defined annotation "Requires" and provide a boolean expression as a String. This expression is evaluated as the first statement in the method and if the condition is not met then an exception is thrown (making an assertion would be an alternative). So using this annotation would look like the following:

@Requires("divisor != 0")
public int divide10By(divisor) {
    10/divisor
}


And the AST transformation should then create the following code:


public int divide10By(divisor) {
    if( ! (divisor != 0) )
        throw new Exception('Precondition violated: {divisor != 0}')
    10/divisor
}


End
The idea is pretty simple, and even the implementation will not be complicated. First we have to create an AST for a static code into which we inject the boolean expression and part of the string. This AST has then to be added to the beginning of the method AST.

Nonetheless I find the functionality quite interesting and the implementation can be used as a basis to generate very interesting AST transformations. So stay tuned for the next blog entry where I will describe how to create the AST transformation itself.

Interesting Classes
java.lang.annotation.ElementType contains definitions for allowed target types
java.lang.annotation.RetentionPolicy contains definitions for the existing retention types.
package org.codehaus.groovy.ast contains the structural elements of the AST
package org.codehaus.groovy.ast.expr   contains every node type having to do with expressions
package org.codehaus.groovy.ast.stmt contains the node types creating statements

Wednesday, September 14, 2011

AST Transformations: Compiler Phases and Syntax Trees


A compiler is a strange beast and seen by many as one of the last domains of black magic. And in earlier times it definitely was one of the most complex fields in Computer Science. But today compilers are things to play with as easily as a graphics program (some would say graphics programs are more complex).

There are two basic tricks that have moved compilers from the area of immensely complex problems to something that can be understood without a degree in Rocket Science. The first is to divide the work in different phases that are executed sequentially. The second is to use a tree-based representation of the program that is to be translated. These syntax trees are an abstraction of the real input and are thus named AST (Abstract Syntax Tree).

If we want to create AST transformations we should understand both, the different compiler phases and the tree-based representation.

Phases
A compiler's work can be divided in three different stages, first of all the frontend stage in which the program is read and transformed into the tree-based representation, the second middle-end stage in which the tree then is transformed and optimized, and the third and final backend stage in which the tree is instrumented with the instructions (in our case the byte code) and then written out (to memory or disk).

In the case of a compiler that creates byte code for the Java Virtual Machine the second stage is not really that important because most of the optimization work is done by the JIT compiler. In Groovy this holds true as well. Groovy++ though is a bit of an exception; the static compiler for Groovy does much work in this second area.

Frontend
The Groovy compiler has nine phases. Six of these phases can be seen as part of the frontend stage; they focus on creating the full abstract syntax tree, check its correctness with regard to syntax and semantics  and complete it (e.g., add additional nodes where needed). These phases are:

  • Initialization
    In this phase the necessary files are opened, the needed resources are allocated and the environment is set up.
  • Parsing
    Here a first tree representation of the input data is created. This includes the lexical analysis, identifying words from the language and the parsing of these words according to the Groovy grammar.  This representation is still very much based on the source code i.e., a concrete representation or Concrete Syntax Tree (CST).
  • Conversion
    This phase takes the CST and transforms it to a more abstract representation, the Abstract Syntax Tree or AST.
  • Semantic Analysis
    In every language there are statements for which the normal grammar cannot decide whether or not they are correct. So in this phase the Groovy compiler, by hand, checks all these cases. This includes resolving classes, static imports and scope of variables. When this phase is complete the input is seen as valid source.
  • Canonicalization
    In this phase access from inner classes to the surrounding scope is resolved. Interestingly, the Groovy++ compiler does most of its work here. 
Which phase is suited best depends on what we want to do with the AST transformation. If we need information about local variables or need to access the already resolved classes, then "Semantic Analysis" is the earliest phase that we can use. "Canonicalization" could also be suitable in this case, because this phase could be seen as the phase representing the second middle-end stage. If you do not need the information provided by these phases or if you would like to introduce and use local variables without having to think about scope then the earlier "Conversion" phase would be the best choice. I personally prefer the "Semantic Analysis" phase.

Backend
The Backend stage has to create the bytecode representation from the AST. This again is done in different phases:

  • Instruction SelectionClass Generation
    These phases are also called "Class Generation 1" and "Class Generation 2". They are responsible for two major steps on the way to the final class. The first, and more important step (at least for the AST transformations) completes the AST and fills in everything which the programmer can so conveniently ignore (e.g. optional return statements). The second is the selection of the target bytecode set (which could be "Pre JDK5" or "Post JDK5" and following that the instrumentation of the AST with the bytecodes. Now that the AST is complete and instrumented with the correct instruction set the bytecode can be generated in memory.
  • Output
    The name is the game. This phase writes the bytecode that has been generated in the last phase to the output.
  • Finalization
    This phase is simply for housekeeping purposes. Allocated resources are freed, the environment is cleaned, files are closed etc., to guarantee that everything is in order when the compiler terminates. This is especially important since the compiler might not have been called from the command line, but instead e.g. inside an application server that tries to deliver a groovy resource through the template engine.
The default compiler phase when creating AST Transformations with the ASTBulder (which is absolutely preferrable to the alternatives) is "Class Generation", which is quite late in the process.  Furthermore, if you create only fragments of code with the AST Builder that you later want to introduce into another AST as part of your transformation there might be unwanted instructions that you do not get if you choose an earlier compiler phase. We will revisit the ASTBuilder in future blog entries.

End
Before you think that I know every last detail of the Groovy compiler, let me tell you that Jochen "Blackdrag" Theodorou helped me and corrected the detail information in this entry.

Now that you know about the different phases of the compiler and their respective use, stay tuned for the next entries in which we actually start to play with the compiler.

References
Jochens presentation of working with the compiler.
http://blackdragsview.blogspot.com/2006/11/chitchat-with-groovy-compiler.html

Some Interesting Classes (see Groovy sourcecode)
Groovy Grammar groovy/trunk/groovy/groovy-core/src/main/org/codehaus/groovy/antlr/groovy.g
InnerClasseVisitor org.codehaus.groovy.classgen.InnerClassVisitor
Selection of Instruction Set   org.codehaus.groovy.classgen.AsmClassGenerator
Adding Return Statements org.codehaus.groovy.classgen.ReturnAdder
Compiler Phases org.codehaus.groovy.control.Phases



Monday, September 12, 2011

There and Back Again

Four years without a blog entry isn't exactly a short tea break. But I took the time to focus on different things, which all in all, were very interesting and fulfilling.

But when last week I gave a presentation about the new features in Groovy 1.8, I got sidetracked talking about AST transformations, and in my youthful folly said something about writing about it in my blog.

So here we are.

After four years.

Well...

And whither then? I cannot say.