Sunday, October 2, 2011

AST Transformations: Creating a Complex AST Transformation

In this (quite long) blog entry we will examine a full implementation of an AST transformation using a closure as an annotation parameter. So be prepared to either be bored out of your mind or alternatively learn a bit about creating AST transformation. Your choice.

The Example Revisited
Until now we have used the AstBuilder to create an AST from a String containing the source code. But the AstBuilder offers two other methods buildFromCode() and buildFromSpec() that are equally interesting and have their respective advantages.

To demonstrate the use of the two methods we change our example a little bit. Until now we have used an annotation that took a String as an argument. We now want to provide a closure instead, which has the advantage that we can provided arbitrary code that results in a boolean expression. First of all we need the changed annotation. We define the type of the provided value (our closure) as Class. This way we can provide Closure objects without colliding with the limits regarding the allowed types of annotation values.

@Retention (RetentionPolicy.SOURCE)
@Target ([ElementType.METHOD])
@GroovyASTTransformationClass ("Requires2Transformation")
public @interface Requires2 {
    Class value() default {true};
}


Our annotated example looks still very much the same, only instead of a string we provide a closure as argument to the annotation. To show that we can access local variables inside the closure scope we use the field a for the comparison.

public class Requires2Example {
    def a = 0

    @Requires2({divisor != a})
    public int divide10By(divisor) {
        10/divisor
    }
}


The transformation should introduce code that checks for the precondition defined as a closure, and if it does not evaluate successfully, throw an exception:

    public int divide10By(divisor) {
        if(! {divisor != a}() )
            throw new Exception('Precondition violated')
        10/divisor
    }


The test looks nearly the same as in the original example.

class Requires2ClassloaderTest extends GroovyTestCase {

    public void testInvokeUnitTest() {
        def file = new File('./Requires2Example.groovy')
        assert file.exists()

        GroovyClassLoader invoker = new GroovyClassLoader()
        def clazz = invoker.parseClass(file)
        def out = clazz.newInstance()

        assert out.divide10By(2) == 5

        def exceptionMessage = shouldFail { out.divide10By(0) }
        assert exceptionMessage =~ /Precondition violated/
    }
}


The AST Transformation
Next is the transformation itself. Here we have more changes. As before we start with defensive programming and check the correct types for the arguments with which the method visit()has been called:

def annotationType = Requires2.class.name

private boolean checkNode(astNodes, annotationType) {
    if (! astNodes) return false
    if (! astNodes[0]) return false
    if (! astNodes[1]) return false
    if (!(astNodes[0] instanceof AnnotationNode)) return false
    if (! astNodes[0].classNode?.name == annotationType) return false
    if (!(astNodes[1] instanceof MethodNode)) return false
    true
}

public void visit(ASTNode[] astNodes, SourceUnit sourceUnit) {

    if (!checkNode(astNodes, annotationType)) {
        addError("Internal Error: wrong arguments", astNodes[0], sourceUnit)
        return
    }
    ...


After the initial check of the arguments we check for the correct type of the expression in the annotation, but now instead of hoping for a ConstantExpression ( the String) we now look for a ClosureExpression. If the type is not correct, we add an error to the error collector using the method addError() (see blog entry).

    MethodNode annotatedMethod = astNodes[1]
    def annotationExpression = astNodes[0].members.value

    if (annotationExpression.class != ClosureExpression) {
         addError("Internal Error: annotation doesn't contain closure", 

                 astNodes[0], sourceUnit)
        return
    }


The remaining part of the visit()-method is nearly the same as before. We call the method createStatements() that returns the if-statement as an AST and add that to the beginning of the AST of the annotated method.

    IfStatement block = createStatements(annotationExpression)

    def methodStatements = annotatedMethod.code.statements
    methodStatements.add(0, block)
}


In the first example we used the buildFromString()-method provided by the AstBuilder to create our AST (see blog entry). This time we want to test the other methods' usefulness to create the AST. Please remember that all AstBuilder-methods return an array of ASTNodes that contains as first element the AST describing the provided fragment, and as second element the whole script class that has been created from it. We normally simply access the first entry for our uses.

AstBuilder - buildFromCode()
This method is very interesting because it can only be called at compile time. If you examine the implementation of the method you only see an IllegalStateException being thrown. The trick here is that it has an AST transformation of its own (class AstBuilderTransformation), that, if it detects a call to this method at compile time, reroutes it to a private class AstBuilderInvocationTrap. This in turn identifies, from the given code, the associated source code, extracts that and calls the AstBuilder method buildFromString(). At first you might think this to be a bit convoluted, but in fact it is brilliant, because this way, the already partway compiled code is reevaluated in the context of the target class.

The following code fragment generates part of the AST that we need for the example, namely throwing the exception if the precondition is not met. Consider the following:

exception = ab.buildFromCode 
                { throw new Exception("Precondition violated in ${this.class}") }


The reference to the class will return the class in which the annotation is found, not our transformation class in which the code has been originally defined. This provides interesting opportunities with the caveat that our beforehand-knowledge of where the annotation might be used is limited at best.

The disadvantage of this approach though is that with buildFromCode() you cannot access the surrounding scope e.g., to add values from local variables to the generated code.

AstBuilder - buildFromSpec()
The third of the methods that AstBuilder offers to create ASTs provides a DSL (domain-specific language) to define your AST. There is no formal specification for the DSL, but the keywords are defined as method names in the class AstSpecificationCompiler (this is the delegate for the closure specifying the AST, and thus can implement the keywords).
Here is the example that we will use in our method createStatements() to create the AST representing the if-statement.

List<ASTNode> res = ab.buildFromSpec {
    ifStatement {
        booleanExpression {
            not {
                methodCall {
                    expression.add(closureAST)
                    constant "call"
                    argumentList {
                    }
                }
            }
        }
        block {
            expression.add(exception)
        }
        empty()
    }
}


Now examine the builder statements and compare them to the AST in the GroovyConsole (Script->Inspect AST)  using the example code of what the transformation should produce and you see the similarities. Two times I use the code expression.add(<local variable>) to insert ASTs held in the respective local variables. The AST in the variable exception for instance can be created using the call to buildFromCode() shown above. This way we can inject arbitrary sub-ASTs into the one we are just building using the DSL.

AstBuilder - Comparing the Provided Methods
We now have worked with each of the three methods that are provided by AstBuilder, and each of these methods has respective advantages and disadvantages. buildFromString() allows you to construct your code right before creating the AST using simple operations on the String representing the code, but this comes at the disadvantage that the compiler does not check the validity of the provided code beforehand. Thus you might introduce some subtle errors that can be difficult to trace. buildFromCode() allows you to facilitate your development environment and some parts of the compiler to ensure that you are provided with syntactically correct code, but has the disadvantage that you cannot access local variables or splice in some sub-AST. buildFromSpec() provides the functionality to do this, but is a little bit cumbersome. Thus, each of the methods has their merits and in every nontrivial case you will combine them.

The Source Code of The Closure
Now we have everything we need to create the transformation. The only thing that is less than perfect is the exception message we create using buildFromCode(). It is much more elegant to use buildFromString() as we did in our first example and use the source code of the closure expression provided in the annotation. Something like the following:

    def statements = "throw new Exception('Precondition violated: $source')"
    AstBuilder ab = new AstBuilder()
    BlockStatement exception
    exception = ab.buildFromString(CompilePhase.SEMANTIC_ANALYSIS, false, statements)[0]


Problem: We do not have the source of the closure. What we have is the sourceUnit, which contains the full source code that is associated with the compilation and we have information in the AST about the lines and columns in which the source code is found. This means, that, while not readily accessible, we can extract the source code from the source unit. The following code shows how this is done. It is derived from Hamlet D'Arcy's code in the class AstBuilderTransformation, where he has to solve the exact same problem.

private String convertToSource(ASTNode node, SourceUnit sourceUnit) {
    if (node == null) {
        addError("Error in convertToSource: node is null", node, sourceUnit)
        return ""
    }

    FileReaderSource sourceFileReader = sourceUnit.getSource()
    int first = node.getLineNumber()
    int last  = node.getLastLineNumber()
    StringBuilder result = new StringBuilder();

    for (int line in first..last) {
        String content = sourceFileReader.getLine(line, null);
        if (content == null) {
            addError("Error accessing source for closure.", node, sourceUnit);
            content = ""
        }
        if (line == last) content = content[0 .. node.getLastColumnNumber()-1]
        if (line == first) content = content[node.getColumnNumber()-1 .. -1]

        result.append(content).append('\n');
    }
    result.toString().trim()
}


The Final Solution
Here is the full implementation of the method createStatements(). We use both buildFromString() and buildFromSpec() to create our final AST and use the method convertToSource() to access the source code associated with the closure provided in the annotation. And this is it.

IfStatement createStatements(ClosureExpression closureAST, SourceUnit sourceUnit) {
    String source = convertToSource(closureAST, sourceUnit)
    def statements = "throw new Exception('Precondition violated: $source')"
    AstBuilder ab = new AstBuilder()
    BlockStatement exception 

    exception = ab.buildFromString(CompilePhase.SEMANTIC_ANALYSIS, false, statements)[0]

    List<ASTNode> res = ab.buildFromSpec {
        ifStatement {
            booleanExpression {
                not {
                    methodCall {
                        expression.add(closureAST)
                        constant "call"
                        argumentList {
                        }
                    }
                }
            }
            block {
                expression.add(exception)
            }
            empty()
        }
    }
    IfStatement is = res[0]
    return is
}


End 
Now we have covered all you need to easily create even complex AST transformations. There are still some dark and murky corners in the compiler and the AST generation, but using the tools that we have been provided with is better than a flashlight in the dark. So, have fun playing with the compiler.

Sourcecode
Requires2.groovyThe annotation for our example using a closure
Requires2Example.groovyThe example class using the annotation
Requires2ClassLoaderTest.groovy    The test using a Groovy Classloader
Requires2Transformation.groovy    The implementation of the AST transformation

Interesting Classes

org.codehaus.groovy.ast.builder.AstBuilderTransformation The global AST transformation that identifies buildFromCode() calls and creates the AST.
org.codehaus.groovy.ast.builder.AstSpecificationCompiler The helper class implementing the DSL for the AstBuilder


Sunday, September 25, 2011

AST Transformations: Testing and Error Messages

When testing AST transformations we have different potential ways to do it, with respective advantages and disadvantages. In this entry we will look at direct testing, testing using a GroovyClassLoader and how global AST transformations can be tested. We also examine the last missing puzzle piece to our jigsaw puzzle of building local AST transformations, namely how to notify the user of errors in the way the transformation is used.

Direct Testing
Implementing a test directly is the simplest way to test whether a transformation does what we expect. The one disadvantage of this kind of testing is that you cannot watch the compiler do its work i.e., you cannot debug through your AST transformation (at least not without additional effort). Why? When the test class is loaded and compiled, then, in the phase "Semantic Analysis" the referenced classes are resolved, loaded, and compiled. So, when the actual test run starts, the whole compilation we are interested in has already ended. Additionally, the compiled transformation and annotation have to be on the classpath. If you are still developing the AST transformation this is not the best way to go.

Nonethelesse it is ok to simply have this kind of test in your test suite to ensure that the AST transformation still works (as a regression test). Here is an example that verifies the behavior (and yes, it is not perfect in that it does not test all edge cases).

class RequiresDirectTest extends GroovyTestCase {

    public void testInvokeUnitTest() {
        def out = new RequiresExample()

        assert out.divide10By(2) == 5

        def exceptionMessage = shouldFail { out.divide10By(0) }
        assert exceptionMessage =~ /Precondition violated/
    }
}


Testing using a Classloader
The better way to test during development is by using a GroovyClassLoader that parses the source code. Since this includes compiling the code you have every option to debug your AST transformation and watch exactly what happens.

We start by either defining a String containing the source class, or by loading the containing file from the file system. I prefer the second way, because then you can edit the class with a normal editor, and you can use the direct testing in parallel.

Then we create a new GroovyClassLoader, parse the class and create a new instance, on which we again execute our tests. And, as said above, during parseClass() the compiler is called and your AST transformation is executed during the "normal" program run.

class RequiresClassloaderTest extends GroovyTestCase {

    public void testInvokeUnitTest() {
        def file = new File('./RequiresExample.groovy')
        assert file.exists()

        GroovyClassLoader invoker = new GroovyClassLoader()
        def clazz = invoker.parseClass(file)
        def out = clazz.newInstance()

        assert out.divide10By(2) == 5

        def exceptionMessage = shouldFail { out.divide10By(0) }
        assert exceptionMessage =~ /Precondition violated/
    }
}


Global AST transformations
Global AST transformations have to be placed in the class path, in a jar with an additional text file containing the class names of the classes implementing the transformations (see references). Luckily, Hamlet D'Arcy has written a helper class that creates a new GroovyClassLoader and adds the transformation.

The interesting thing is, that this also registers a local transformation correctly. There is one caveat though: the transformation now is registered as a local and a global transformation that thus gets triggered twice. One call is as expected (the call for the local transformation), the other call (as a global transformation) provides only the source unit and no AST nodes i.e., the first argument is null. If you followed the defensive programming style, then you already check whether the AST nodes are provided and stop processing if not. This means you can use the helper class as well, and you can find many examples on the web. The question is: Should you? My opinion is that it helps if you ultimately try to create a global transformation and want to start by implementing a local transformation. In all other cases I would prefer the class loader approach.

class RequiresTransformTestHelperTest extends GroovyTestCase {

    public void testInvokeUnitTest() {
        def file = new File('./RequiresExample.groovy')
        assert file.exists()

        def invoker = new TransformTestHelper(new RequiresTransformation(),
                                              CompilePhase.CONVERSION)

        def clazz = invoker.parse(file)
        def out = clazz.newInstance()

        assert out.divide10By(2) == 5
        def exceptionMessage = shouldFail { out.divide10By(0) }
        assert exceptionMessage =~ /Precondition violated/
    }
}


Signalling Errors to the Compiler
If everything works as expected our transformation transforms the AST. But if something goes wrong we should try to communicate our problems to the user. The problem: There is no printwriter and no logger that can be used (and it wouldn't be a good way to do this anyway).

We could throw an exception or even an error, but we shouldn't do this either. Now, if you look at many of the transformations provided by Groovy, you could get the impression that throwing exceptions and errors is good manners. But the point is that these transformation, being part of Groovy, have a totally different implicit contract with the compiler. When we build an AST transformation, and some internals of the compiler on which we depend changes, than this is not the fault of the compiler but ours. So, no exceptions or errors; the compiler should execute without disruption, collecting errors from the different phases and different transformations to give the user as complete a list of the errors as it could determine.

The compiler, in fact, offers us a functionality to add our errors to a list of errors and continue the processing. Since errors are specific to a source, the SourceUnit provided as an argument to the visit()-method has an error collector that we can use to do this. The error collector can be accessed using the method getErrorCollector(), and offers a method addErrorAndContinue() that we can use. It takes an object of the class SyntaxErrorMessage, which in turn needs a SyntaxException as argument. In the following method, which can be found in the LogASTTransformation (and which is a variation of another method found in the class ClassCodeVisitorSupport), all these objects are created and fed to the error collector.

So, why not steal from the best? Here is the method, slightly modified for readability

public void addError(String msg, ASTNode expr, SourceUnit source) {
    int line = expr.lineNumber
    int col = expr.columnNumber

    SyntaxException se = new SyntaxException(msg + '\n', line, col)
    SyntaxErrorMessage sem = new SyntaxErrorMessage(se, source)
    source.errorCollector.addErrorAndContinue(sem)
}


And you simply use it like this (using our example):


public void visit(ASTNode[] astNodes, SourceUnit sourceUnit) {

    if (!checkNode(astNodes, annotationType)) {
        addError("Internal error on annotation", astNodes[0], sourceUnit);
        return
    }
    ...
}


This presents an error message to the user with line and column and, if possible, the source line where we encountered the error. Since we used the first node in the array of nodes, we associate the error with the annotation itself (the second node would be the annotated element).

Finally we can fill in those parts of the AST transformation where we had a comment "better error handling" and signal those errors to the user.

End
Now you have everything you need to successfully create your own, interesting AST transformations. I'm looking forward to a lot of new and interesting ideas being realized as AST transformations.

There is one loose end we still have to cover, though. The AstBuilder provides two additional methods for creating an AST, buildFromCode() and buildFromSpec(), which we will revisit soon. So stay tuned ...

References
Global AST Transformations Groovy documentation on global AST transformations

Sourcecode
RequiresDirectTest.groovyThe direct test
RequiresClassloaderTest.groovy    The test using a Groovy Classloader
RequiresTransformTestHelperTest.groovy    The test using a TransformTestHelper

Interesting Classes
org.codehaus.groovy.tools.ast.TransformTestHelperThe helper class that can test global AST transformations
org.codehaus.groovy.transformLogASTTransformation  The transformation containing our addError()-method
org.codehaus.groovy.ast.ClassCodeVisitorSupportThe class containing the original addError()-method


Thursday, September 22, 2011

AST Transformations: The transformation itself

So far we have seen the different compile phases, the form that an annotation has to take to wire the AST transformation into the compiler process, had a look at the classes that create the AST structure and we have discussed the idea of a transformation that implements a precondition as an annotation. Now we get to the transformation itself.

Access to the AST
We know that the compiler, during its work, creates an abstract syntax tree that is the basis for the finally created bytecode. The compiler provides an ordered access by allowing us to register objects implementing the interface ASTTransformation for the execution in a specific phase of the compiler.

The idea is that for each phase a tree walker is executed that, for every node it encounters, calls the method visit() of the registered objects to add information to the nodes and / or change the respective subtree. For local AST transformations this method is always called with two arguments. The first argument is an ASTNode array containing an AnnotationNode that triggered the call and a node representing the source code element that has been annotated e.g., if you have annotated a method then the second element in that array is an object of type MethodNode. The second argument is the SourceUnit instance that represents the actual source code.

Interlude - Defensive Programming
Now, this is more or less the description of the informal contract between the compiler and your ASTTransformation. So nothing has to be done and we can directly start with modifying the abstract syntax tree. But wait, do you really trust the compiler? What if, through some mysterious ways (and I don't mean the X Files kind, but the "oh, someone upgraded the compiler to a new major version with different semantics for the interface" way) the compiler decides to call your transformation with some other arguments? Then your code fails and with it, the compiler as a whole, without giving any sensible information about what happened to the user.

In addition to that Hamlet D'Arcy pointed out "that the contract [between AST transformation and compiler] is enforced by unit tests, not the type system and not a language specification. Who knows what will change in Groovy 1.9 and 2.0. It is better to be defensive and fail *without causing an exception* because that will make the compilation proceed even if your transform is broken in the future."

So, if we are fooling around with the compiler, a defensive programming style is the first thing you should adopt. This means, that the following code,

def annotationType = Requires.class.name

private boolean checkNode(astNodes, annotationType) {
    if (! astNodes)    return false
    if (! astNodes[0]) return false
    if (! astNodes[1]) return false
    if (!(astNodes[0] instanceof AnnotationNode))        return false
    if (! astNodes[0].classNode?.name == annotationType) return false
    if (!(astNodes[1] instanceof MethodNode))            return false
    true
}

public void visit(ASTNode[] astNodes, SourceUnit sourceUnit) {
    if (!checkNode(astNodes, annotationType)) {
        // add an error message or a warning
        return
    }
    ...
}


while totally redundant and needless, still is a guard against the worst. Look up Byzantine faults if you truly want to become paranoid; this is actually a good thing if you are writing AST transformations (remember the old saying, that being paranoid does not mean that your are not being followed).

It is important to note, that in principle the visit()-method of your AST transformation could be called by the compiler in parallel for different parts of the source code that have been annotated with our annotation. While the current compiler is not parallelizing (at least with Groovy version until 1.8.2), you should be aware of this and not only program defensively, but allow for parallel execution as well.

The AST Transformation
Actually implementing the visit()-method is straightforward. First of all we verify that we have been called with the correct arguments, then we check that the value provided by the annotation is of the correct type (since we are using a simple String constant this will be a ConstantExpression), call a method createStatements() that creates the new AST subtree that we want to add at the beginning of the method, and insert the statements at the beginning of the subtree of statements that represents the method that has been annotated.

public void visit(ASTNode[] astNodes, SourceUnit sourceUnit) {
    if (!checkNode(astNodes, annotationType)) {
    
    // add an error message or a warning
        return
    }

    MethodNode annotatedMethod = astNodes[1]
    def annotationExpression = astNodes[0].members.value

    // add better error handling
    if (annotationExpression.class != ConstantExpression) return

    String annotationValueString = annotationExpression.value
    BlockStatement block = createStatements(annotationValueString)

    def methodStatements = annotatedMethod.code.statements
    methodStatements.add(0, block)
}


Piece of cake... If there wasn't the method createStatements(). In this method we could manually create the different node objects and assemble an AST representing the code that we want to add to the beginning of the method (see the previous blog entry). This works, but is tedious. Or...

Enter The ASTBuilder
The ASTBuilder has been written by Hamlet D'arcy expressly to easily create ASTs using a simple builder DSL. This is done using the method createFromSpec() that takes a closure (containing the DSL statements) as an argument. Additionally, the ASTBuilder provides a method buildFromCode() to create an AST directly from code expressed as a closure and a method buildFromString() that creates the AST from source code provided as a String. Interestingly the method buildFromCode() recreates a String representation from the supplied code and feeds that to the method buildFromString() to do its work.

Completing the Transformation
We will use the method buildFromString() to create our AST tree representing the if statement.


def createStatements(String clause) {
    def statements = """
        if(!($clause)) {
            throw new Exception('Precondition violated: {$clause}')
        }
    """


    AstBuilder ab = new AstBuilder()
     List<ASTNode> res = ab.buildFromString(CompilePhase.SEMANTIC_ANALYSIS, statements)
    BlockStatement bs = res[0]
    return bs
}


First of all we define a GString that contains the static parts of the code and into which we embed the precondition as the if-clause and in the message of the created exception. Then we use an ASTBuilder-object to call the buildFromString()-method on our GString and let it create the AST for the phase "Semantic Analysis" (as explained in the blog entry).
The AstBuilder returns a list of ASTNodes. The method buildFromString() offers an additional parameter statementsOnly (which is set to true by default). If this additional parameter is set to false, then the method, in addiition to the AST representing the statements, returns an AST representing the whole Script class surrounding the code fragment as the second entry in the list of ASTNodes. Regardless of the way you call it, the first entry is always the AST representing your statements. In our case this is a block statement that in turn contains our if statement.

Tying it all Together
The final step is to tell the compiler in which phase the AST transformation will run. We choose the same that we used for our AST builder, namely "Semantic Analysis". This is done with another annotation as follows:


@GroovyASTTransformation(phase = CompilePhase.SEMANTIC_ANALYSIS)
public class RequiresTransformation implements ASTTransformation {
...
}


End
What remains? Testing and better error signalling. Stay tuned for the next blog entries.

Sourcecode
Requires.groovyThe Requires-Annotation
RequiresTransformation.groovy    The AST Transformation for the Requires-Annotation

Interesting Classes
org.codehaus.groovy.ast.builder.AstBuilderThe Builder that allows to simply create ASTs.
org.codehaus.groovy.ast.builder.AstStringCompiler    The class responsible for creating ASTs from Strings.
org.codehaus.groovy.transform.ASTTransformation   The interface between compiler and transformation.

Saturday, September 17, 2011

AST Transformations: Prerequisites and Annotations

Before we are starting with the transformations themselves, we need to be able to execute them on the source code. Two ways exist to do this: global or local AST Transformations are possible. Since local transformations are much easier to test we will focus on these first.

Annotations
The usual way to execute a transformation on the source code locally is using an annotation that marks the part of the abstract syntax tree to be transformed. So the first step is to create an annotation class for our use. If you have never heard of annotations before, now would be a good time to take a short break and read up on them.

Annotations allow us to annotate (nomen est omen) a part of the source code, providing additional information or instructions to either compiler or runtime system. An annotation is defined using the keyword @interface (the normal Java keyword interface prefixed with an @-sign). You can add elements that can contain additional information. If you use only one element it should be named value, which allows to leave out the element name when using the annotation, thus providing an elegant shortcut for providing the information. Additionally you can provide a default value for every element in case it is not defined by the user.

The parts of the source code that can be annotated are themselves defined using an annotation @TARGET that can take as different values TYPE, FIELD, METHOD, PARAMETER, CONSTRUCTOR, LOCAL_VARIABLE, ANNOTATION TYPE and PACKAGE or combinations of them. The names speak for themselves.

You can decide how long the annotation is kept using another annotation @RETENTION that takes three values. If SOURCE is used, then the annotation is discarded after the compiler run, CLASS is kept until the class is loaded and only RUNTIME is kept until the program is actually executed. So if you start experimenting and try to read an annotation using reflection be sure to use RUNTIME as the retention type.

Now let us use this information to create a first simple annotation that takes a value, annotates methods and is only used during compile-time (after all this is where the AST will be changed).

@Retention (RetentionPolicy.SOURCE)
@Target ([ElementType.METHOD])
public @interface FirstAnnotation {
    String value() default "this is the default value";
}


Seems simple? It actually is.

So far this is simple Java. Now if we want to use this annotation to signal an AST transformation we have to somehow wire the annotation to that transformation. This is done using another, groovy-specific annotation named @GroovyASTTransformationClass. This annotation takes as a value the fully qualified class name of the class implementing the AST transformation as a String. Now, if we put this together we get something like the following:


@Retention (RetentionPolicy.SOURCE)
@Target ([ElementType.METHOD])
@GroovyASTTransformationClass ("RequiresTransformation")
public @interface Requires {
String value() default "true";
}


If you get the distinct feeling that this might be part of the actual later example, you might not be wrong. Let me add, though, that for the sake of simplicity the whole example is put into the default package, which you probably shouldn't do at home.

Short Interlude - AST Structure
The structure of the AST can be arbitrarily complex (which is quite natural if you remember that it is a representation of the source code you just fed the compiler). I could now start to discuss all the gory details and walk you through the grammar and everything tied to it. But that is tedious. Instead, first of all, look at a few Groovy packages using your favorite IDE.

The first package is org.codehaus.groovy.ast, in which you find the structural elements that reflect the class level structure of your source code. FieldNode, MethodNode, Parameter, ConstructorNode ... You can compare these names to the possible @TARGET values above. The different values explicitly target these nodes, allowing you to focus on substructures instead of the AST as a whole.


Additionally to the structural elements we need the actual expressions and statements, which are built from the node type found in the packages org.codehaus.groovy.ast.expr and org.codehaus.groovy.ast.stmt (e.g. BooleanExpression in the first or IfStatement in the second package).


Please note that the screenshots of the different packages do not show the whole package, only a few entries are shown.

This all seems a bit theoretical, but now fire up your GroovyConsole. Enter the following short program:

def val = 1


and execute it (this is a workaround for a bug in Groovyconsole that otherwise renders the AST browser nonfunctional). As the next step choose the menu "Script  -> Inspect AST" and voilá, you can examine the live AST of the source code you entered. In fact you get two ASTs, the first being only the source code you entered, the second being the full Script class that is automatically generated from groovy scripts. Addtionally you can examine the AST in different phases (choose from the Dropdown menu). By default the compiler phase is "Instruction Selection".


For our purposes "Semantic Analysis" is the more interesting one.




If you compare the ASTs of the two phases shown above, you can see that, as described in the previous blog entry, the ReturnAdder inserts the needed return statements in the "Instruction Selection" phase. Now try an if-statement, throw an exception and define a method in the Groovy Console and view the respective ASTs to get a first understanding of how the AST is structured.

The AST Browser built into the Groovy Console is an indispensable tool if you want to understand how the AST is built, and I quite often use it when experimenting with AST transformations.

Transformation - The Example 
A nice example for a little AST transformation is an annotation that ensures that some preconditions are met when calling a method. We use the above defined annotation "Requires" and provide a boolean expression as a String. This expression is evaluated as the first statement in the method and if the condition is not met then an exception is thrown (making an assertion would be an alternative). So using this annotation would look like the following:

@Requires("divisor != 0")
public int divide10By(divisor) {
    10/divisor
}


And the AST transformation should then create the following code:


public int divide10By(divisor) {
    if( ! (divisor != 0) )
        throw new Exception('Precondition violated: {divisor != 0}')
    10/divisor
}


End
The idea is pretty simple, and even the implementation will not be complicated. First we have to create an AST for a static code into which we inject the boolean expression and part of the string. This AST has then to be added to the beginning of the method AST.

Nonetheless I find the functionality quite interesting and the implementation can be used as a basis to generate very interesting AST transformations. So stay tuned for the next blog entry where I will describe how to create the AST transformation itself.

Interesting Classes
java.lang.annotation.ElementType contains definitions for allowed target types
java.lang.annotation.RetentionPolicy contains definitions for the existing retention types.
package org.codehaus.groovy.ast contains the structural elements of the AST
package org.codehaus.groovy.ast.expr   contains every node type having to do with expressions
package org.codehaus.groovy.ast.stmt contains the node types creating statements

Wednesday, September 14, 2011

AST Transformations: Compiler Phases and Syntax Trees


A compiler is a strange beast and seen by many as one of the last domains of black magic. And in earlier times it definitely was one of the most complex fields in Computer Science. But today compilers are things to play with as easily as a graphics program (some would say graphics programs are more complex).

There are two basic tricks that have moved compilers from the area of immensely complex problems to something that can be understood without a degree in Rocket Science. The first is to divide the work in different phases that are executed sequentially. The second is to use a tree-based representation of the program that is to be translated. These syntax trees are an abstraction of the real input and are thus named AST (Abstract Syntax Tree).

If we want to create AST transformations we should understand both, the different compiler phases and the tree-based representation.

Phases
A compiler's work can be divided in three different stages, first of all the frontend stage in which the program is read and transformed into the tree-based representation, the second middle-end stage in which the tree then is transformed and optimized, and the third and final backend stage in which the tree is instrumented with the instructions (in our case the byte code) and then written out (to memory or disk).

In the case of a compiler that creates byte code for the Java Virtual Machine the second stage is not really that important because most of the optimization work is done by the JIT compiler. In Groovy this holds true as well. Groovy++ though is a bit of an exception; the static compiler for Groovy does much work in this second area.

Frontend
The Groovy compiler has nine phases. Six of these phases can be seen as part of the frontend stage; they focus on creating the full abstract syntax tree, check its correctness with regard to syntax and semantics  and complete it (e.g., add additional nodes where needed). These phases are:

  • Initialization
    In this phase the necessary files are opened, the needed resources are allocated and the environment is set up.
  • Parsing
    Here a first tree representation of the input data is created. This includes the lexical analysis, identifying words from the language and the parsing of these words according to the Groovy grammar.  This representation is still very much based on the source code i.e., a concrete representation or Concrete Syntax Tree (CST).
  • Conversion
    This phase takes the CST and transforms it to a more abstract representation, the Abstract Syntax Tree or AST.
  • Semantic Analysis
    In every language there are statements for which the normal grammar cannot decide whether or not they are correct. So in this phase the Groovy compiler, by hand, checks all these cases. This includes resolving classes, static imports and scope of variables. When this phase is complete the input is seen as valid source.
  • Canonicalization
    In this phase access from inner classes to the surrounding scope is resolved. Interestingly, the Groovy++ compiler does most of its work here. 
Which phase is suited best depends on what we want to do with the AST transformation. If we need information about local variables or need to access the already resolved classes, then "Semantic Analysis" is the earliest phase that we can use. "Canonicalization" could also be suitable in this case, because this phase could be seen as the phase representing the second middle-end stage. If you do not need the information provided by these phases or if you would like to introduce and use local variables without having to think about scope then the earlier "Conversion" phase would be the best choice. I personally prefer the "Semantic Analysis" phase.

Backend
The Backend stage has to create the bytecode representation from the AST. This again is done in different phases:

  • Instruction SelectionClass Generation
    These phases are also called "Class Generation 1" and "Class Generation 2". They are responsible for two major steps on the way to the final class. The first, and more important step (at least for the AST transformations) completes the AST and fills in everything which the programmer can so conveniently ignore (e.g. optional return statements). The second is the selection of the target bytecode set (which could be "Pre JDK5" or "Post JDK5" and following that the instrumentation of the AST with the bytecodes. Now that the AST is complete and instrumented with the correct instruction set the bytecode can be generated in memory.
  • Output
    The name is the game. This phase writes the bytecode that has been generated in the last phase to the output.
  • Finalization
    This phase is simply for housekeeping purposes. Allocated resources are freed, the environment is cleaned, files are closed etc., to guarantee that everything is in order when the compiler terminates. This is especially important since the compiler might not have been called from the command line, but instead e.g. inside an application server that tries to deliver a groovy resource through the template engine.
The default compiler phase when creating AST Transformations with the ASTBulder (which is absolutely preferrable to the alternatives) is "Class Generation", which is quite late in the process.  Furthermore, if you create only fragments of code with the AST Builder that you later want to introduce into another AST as part of your transformation there might be unwanted instructions that you do not get if you choose an earlier compiler phase. We will revisit the ASTBuilder in future blog entries.

End
Before you think that I know every last detail of the Groovy compiler, let me tell you that Jochen "Blackdrag" Theodorou helped me and corrected the detail information in this entry.

Now that you know about the different phases of the compiler and their respective use, stay tuned for the next entries in which we actually start to play with the compiler.

References
Jochens presentation of working with the compiler.
http://blackdragsview.blogspot.com/2006/11/chitchat-with-groovy-compiler.html

Some Interesting Classes (see Groovy sourcecode)
Groovy Grammar groovy/trunk/groovy/groovy-core/src/main/org/codehaus/groovy/antlr/groovy.g
InnerClasseVisitor org.codehaus.groovy.classgen.InnerClassVisitor
Selection of Instruction Set   org.codehaus.groovy.classgen.AsmClassGenerator
Adding Return Statements org.codehaus.groovy.classgen.ReturnAdder
Compiler Phases org.codehaus.groovy.control.Phases



Monday, September 12, 2011

There and Back Again

Four years without a blog entry isn't exactly a short tea break. But I took the time to focus on different things, which all in all, were very interesting and fulfilling.

But when last week I gave a presentation about the new features in Groovy 1.8, I got sidetracked talking about AST transformations, and in my youthful folly said something about writing about it in my blog.

So here we are.

After four years.

Well...

And whither then? I cannot say.


Saturday, September 15, 2007

... And miles to go before I sleep.

Domain-specific languages are quite important these days, and Groovy is suited extremely well for implementing them.
A DSL is a language that allows expressions in a domain-specific manner (you could be more precise and more theoretical, but let us be pragmatic). Examples for DSLs are music scores, chess moves, or expressions regarding time and duration or space and distance.

A Domain-Specific Language

Let us take the last example, and think about possible uses of working with distances. What we would like is something like this:
3.m + 2.yd + 2.ml - 1.km
Or like this:
3.mi > 3.km

And a notation to simply convert distances to different units would be nice:
3.mi.km
meaning 3 miles as kilometers.

The interesting point is that with Groovy, this is quite simple to implement.

The Plumbing

We need support for the notation .unit, we need support for plus() and minus() operations, and we have to support comparisons between different distances. This involves converting from one unit to others. And we need a Distance and a Unit type. Before you say, oh God, way too much, let me tell you that the total number of lines to implement this DSL is less than 90 lines of source code.

Let us start with the Distance type. We need a reference to a Unit object and a length, an implementation of the methods plus() and minus(), an implementation of the methode compareTo() (which implies that Distance has to implement the interface Comparable) and a toString()-method. The conversion of distances in different units is simply delegated to the Unit class, since that class should know best.
class Distance implements Comparable {
BigDecimal length
Unit unit

Distance plus(Distance operand) {
def newLength = this.length + Unit.convertUnit(operand, this.unit)
new Distance(length : newLength, unit : this.unit)
}
Distance minus(Distance operand) {
def newLength = this.length - Unit.convertUnit(operand, this.unit)
new Distance(length : newLength, unit : this.unit)
}
int compareTo(other) {
if(this.unit == other.unit)
return this.length <=> other.length
return this.length <=> Unit.convertUnit(other, this.unit)
}
String toString() {
"$length $unit.name"
}
}

Nothing special here.

Now our Unit class should contain some predefined Unit objects representing the supported units. We have already seen that the Unit object needs a name. Furthermore it gets a ratio compared to the meter.
In principle the ratio would be a single value and we could compute different unit simply by dividing and multiplying in the correct order. But we have the problem that values are represented with finite precision. So to get our conversion as precise as possible we define a table that allows direct conversion with only one operation instead of two or three.

The ratio now is the row index in the table. To convert between different unit in principle would simply mean to access the ratio table for the coefficient and then compute the conversion. Since the conversion coefficients for different directions of the same unit combinations (e.g. m -> mi and mi -> m) are reciprocal, the table contains the coefficient only for one direction and a 0 for the other direction. If we access the table and get a 0, we do the inverse operation with the other coefficient.
There might be better and easier to extend ways to implement the conversion, but this one is fairly simple.

class Unit {
def ratio
String name

static def convertUnit(Distance d, Unit newUnit) {
def factor = ratioTable[d.unit.ratio][newUnit.ratio]
if(factor)
return d.length * factor
else
return d.length / ratioTable[newUnit.ratio][d.unit.ratio]
}
static ratioTable = [
// mm, cm, m, km, y, mi
[ 1, 0, 0, 0, 0, 0 ], // mm
[ 10, 1, 0, 0, 0, 0 ], // cm
[ 1e3, 1e2, 1, 0, 0, 0 ], // m
[ 1e6, 1e5, 1e3, 1, 0, 0 ], // km
[ 914.4, 91.44, 0.9144, 0.9144e-3, 1, 0 ], // yd
[ 1.609344e6, 1.609344e5, 1.609344e3, 1.609344, 1760, 1 ], // mi
]

public static final mm = new Unit(ratio : 0, name : "millimeter")
public static final cm = new Unit(ratio : 1, name : "centimeter")
public static final m = new Unit(ratio : 2, name : "meter")
public static final km = new Unit(ratio : 3, name : "kilometer")
public static final yd = new Unit(ratio : 4, name : "yard")
public static final mi = new Unit(ratio : 5, name : "mile(s)")
}

Finally we define constants for all the units we support (mm, cm, m, km, yd, mi).

Now we have everything we need as plumbing. We can add and subtract different Distance objects using the + and - operator, we can compare distances with the <, > and == operators and we can convert different distances.

Creating the DSL

To create our DSL we use a category. For each of the supported units we implement the method get() for the types Number and Distance. Be defining the method for Number objects we can use the notation "3.2.mi", by defining the method for Distance objects as well we can convert on the fly using the notation "3.2.mi.km". The respective implementations are straightforward.

class DistanceCategory {
static Distance getMm(Number n) { new Distance(length : n, unit : Unit.mm) }
static Distance getMm(Distance d) {
new Distance(length : Unit.convertUnit(d, Unit.mm), unit : Unit.mm)
}
... (repeat for cm, m, km, yd, mi)
}

And that's it. We could also try to generalize the get()-methods, but for such a simple DSL this is not worth the time.

Using the DSL

Using the DSL is as simple as using any other category:
use(DistanceCategory.class) {
def d1 = 1.m
def d2 = 1.yd
def d3 = 1760.yd
def d4 = 100.cm
println d1 + 1.yd
println 1.yd + 1.mi
println 1.m - 1.yd
println d2.m
println d3.mi
println d4.m
println 1000.yd.km
println 1000.yd
The result of the different println statements is the following:
1.9144 meter
1761 yard
0.0856 meter
0.9144 meter
1 mile(s)
1 meter
0.9144000 kilometer
true
Cool, isn't it? In less than 90 lines of source code we have a full DSL for describing distances, for adding and subtracting them, and for comparing them.

Next steps could be to allow multiplication and division as well, but then it gets interesting. Imagine the product of 1.m, 2.yd and 3.mi.

Credits

I took the idea for this DSL from a talk given by Bernd Schiffer at the JFS 2007. Bernd, I hope you don't mind.

This blog's title is the last line of a poem by Robert Frost (in case you wondered).