Welcome to MetaCC framework



MetaCC stands for Meta Compiler Construction framework. Its aim is to aid in the various stages of programming language processing: tokenizing, parsing, semantic analysis, code generation, etc. For now it is an annotation-based lexer and parser generator suitable for processing a wide range of languages, from small domain-specific languages to large ones like Java. Actions may be written directly as annotated Java methods.

The short example below shows how a simple calculator may be written using the MetaCC framework:

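import java.math.BigDecimal;
import java.math.RoundingMode;
// imports of the MetaCC annotation and engine classes are omitted here

public class Calculator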
{
	public Calculator()
	{
		// call lexer and parser building from annotations, omitted here
	}
  
	@PRuleAn(lhs="start", rhs="addExp")
	// annotations anchored on dummy fields may be as well written at class level
	// but they are here just for the grammar clarity
	public int dummy1;
	
	@PRuleAn(lhs="addExp", rhs="(addExp '+' mulExp)")
	public static BigDecimal onPlus(PEngine parser, PBnfRule rule)
	{
		BigDecimal left = (BigDecimal) parser.getRhsValue(0);
		BigDecimal right = (BigDecimal) parser.getRhsValue(2);
		return left.add(right);		
	}
	
	@PRuleAn(lhs="addExp", rhs="(addExp '-' mulExp)")
	public static BigDecimal onMinus(PEngine parser, PBnfRule rule)
	{
		BigDecimal left = (BigDecimal) parser.getRhsValue(0);
		BigDecimal right = (BigDecimal) parser.getRhsValue(2);
		return left.subtract(right).stripTrailingZeros();
	}
	
	@PRuleAn(lhs="addExp", rhs="mulExp")
	public int dummy2;
	
	@PRuleAn(lhs="mulExp", rhs="(mulExp '*' topExp)")
	public static BigDecimal onMul(PEngine parser, PBnfRule rule)
	{
		BigDecimal left = (BigDecimal) parser.getRhsValue(0);
		BigDecimal right = (BigDecimal) parser.getRhsValue(2);
		return left.multiply(right);
	}
	
	@PRuleAn(lhs="mulExp", rhs="(mulExp '/' topExp)")
	public static BigDecimal onDiv(PEngine parser, PBnfRule rule)
	{
		BigDecimal left = (BigDecimal) parser.getRhsValue(0);
		BigDecimal right = (BigDecimal) parser.getRhsValue(2);
		return left.divide(right, 10, RoundingMode.HALF_UP).stripTrailingZeros();
	}
	
	@PRuleAn(lhs="topExp", rhs="'number'")
	public int dummy3;
	
	@PRuleAn(lhs="topExp", rhs="('(' addExp ')')")
	public static Object onParenthesis(PEngine parser, PBnfRule rule)
	{
		return parser.getRhsValue(1);
	}
	
	@LHelperAn(lhs = "decimalDigit", rhs = "['0'-'9']")
	@LRuleAn(startCtx = "INIT", lhs = "whiteSpace", rhs = "[' ' '\t' '\f' '\r' '\n']+")
	public int dummy4;
	
	@LRuleAn(startCtx="INIT", lhs="number", rhs="(decimalDigit+ ('.' decimalDigit*)? (['e' 'E'] ['+' '-']? decimalDigit+)?)")
	public static void onNumber(LEngine lexer, LRule rule)
	{
		lexer.setTokenType(rule.getLhs());		
		lexer.setTokenValue(new BigDecimal(lexer.getBuf().toString()).stripTrailingZeros());
	}
}
See also the examples (provided with the distribution) of an expression language and a Java 5 grammar.

MetaCC itself is implemented in Java and must run on a JVM, but of course the language being processed is not limited to Java.

The ultimate goal of the MetaCC framework is to make a programming language (whichever one is processed by the framework) extendable. Extendable means that the language itself, or a programming environment, can influence tokenizing, parsing, semantic analysis and the other phases of the language's processing. This influence can show up, for example, as language constructs being added or modified on the fly, at compile time.

To reach this goal, the rules for tokenizing, parsing and code generation cannot be static or precompiled by some other tool; instead they must be dynamic and embeddable into the language. Rich metainformation (that is, information about the compiler environment) must be available to and configurable by the programming environment.

For now the lexer and parser implementations are ready. The lexer is based on a DFA (Deterministic Finite Automaton) and the parser uses the LALR(1) (Look-Ahead LR) algorithm. Additionally, the parser is able to do "forked parsing", which allows it to parse grammars that are not LALR(1). The parser stack is "forked" at the place where a shift/reduce or reduce/reduce conflict occurs, and both alternatives are parsed "in parallel". One of them eventually ends with an error and is silently removed.

Before it can work, the parser (or lexer) must be fed with appropriate rules, from which the internal tables are computed on the fly. Rules come in the form of Java classes (or annotations) with appropriate contents. Different implementations of such rules are possible: for example the simplest one with fields set in a constructor, an implementation which reads rules from a string, an implementation which reads them from a file, an XML file, etc. Because the internal tables are computed on the fly, they may easily be changed or extended at run time.

Implementations of a Java 5 lexer and parser are provided as an example. Consult the javadoc documentation, source code and examples to see the framework in action.
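To make the DFA idea concrete, here is a minimal hand-written sketch of an automaton for the calculator's "number" rule (decimalDigit+ ('.' decimalDigit*)? (['e' 'E'] ['+' '-']? decimalDigit+)?). It is not MetaCC code and does not use the MetaCC API: the class name NumberDfa, the method longestMatch and the state numbering are assumptions made purely for illustration. MetaCC computes its tables from the rules, whereas this table is written by hand.

```java
public class NumberDfa {
    // Character classes used as column indices of the transition table.
    private static final int DIGIT = 0, DOT = 1, EXP = 2, SIGN = 3, OTHER = 4;
    private static final int DEAD = -1;

    // TRANSITIONS[state][characterClass] gives the next state, or DEAD.
    private static final int[][] TRANSITIONS = {
        /* 0: start            */ {1, DEAD, DEAD, DEAD, DEAD},
        /* 1: integer digits   */ {1, 2, 3, DEAD, DEAD},
        /* 2: after '.'        */ {2, DEAD, 3, DEAD, DEAD},
        /* 3: after 'e' or 'E' */ {5, DEAD, DEAD, 4, DEAD},
        /* 4: after the sign   */ {5, DEAD, DEAD, DEAD, DEAD},
        /* 5: exponent digits  */ {5, DEAD, DEAD, DEAD, DEAD},
    };

    // States in which the input read so far is a complete number.
    private static final boolean[] ACCEPTING = {false, true, true, false, false, true};

    private static int classify(char c) {
        if (c >= '0' && c <= '9') return DIGIT;
        if (c == '.') return DOT;
        if (c == 'e' || c == 'E') return EXP;
        if (c == '+' || c == '-') return SIGN;
        return OTHER;
    }

    /**
     * Returns the length of the longest number starting at index start,
     * or -1 if no number starts there.
     */
    public static int longestMatch(CharSequence input, int start) {
        int state = 0;
        int lastAcceptEnd = -1;
        for (int i = start; i < input.length(); i++) {
            state = TRANSITIONS[state][classify(input.charAt(i))];
            if (state == DEAD) {
                break;
            }
            if (ACCEPTING[state]) {
                lastAcceptEnd = i + 1; // remember the longest match so far
            }
        }
        return lastAcceptEnd == -1 ? -1 : lastAcceptEnd - start;
    }
}
```

The automaton remembers the last accepting position, so for the input "12e" it reports a match of length 2: the dangling "e" is left for the next token.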

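The fork-and-prune idea behind forked parsing can be sketched with a toy grammar that is unambiguous but not LALR(1): S -> A x a | B x b, A -> n, B -> n. After reading "n" with lookahead "x", the parser cannot know whether to reduce to A or to B. The code below is a simplified illustration, not MetaCC's implementation. The class ForkedParseSketch, its hand-written tables and the "sN"/"rN"/"acc" action encoding are all assumptions, and it requires Java 9 or later for List.of.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ForkedParseSketch {
    // Rules: 0: S' -> S, 1: S -> A x a, 2: S -> B x b, 3: A -> n, 4: B -> n
    private static final String[] LHS = {"S'", "S", "S", "A", "B"};
    private static final int[] RHS_LEN = {1, 3, 3, 1, 1};

    // One live alternative: its own copy of the state stack plus an input position.
    private static final class Fork {
        final Deque<Integer> stack;
        final int pos;
        Fork(Deque<Integer> stack, int pos) { this.stack = stack; this.pos = pos; }
    }

    // Hand-written action table: "sN" = shift to state N, "rN" = reduce by rule N.
    private static List<String> actions(int state, String token) {
        switch (state + ":" + token) {
            case "0:n": return List.of("s4");
            case "1:$": return List.of("acc");
            case "2:x": return List.of("s5");
            case "3:x": return List.of("s6");
            case "4:x": return List.of("r3", "r4"); // reduce/reduce conflict
            case "5:a": return List.of("s7");
            case "6:b": return List.of("s8");
            case "7:$": return List.of("r1");
            case "8:$": return List.of("r2");
            default:    return List.of();           // syntax error for this fork
        }
    }

    private static int goTo(int state, String nonterminal) {
        switch (state + ":" + nonterminal) {
            case "0:S": return 1;
            case "0:A": return 2;
            case "0:B": return 3;
            default: throw new IllegalStateException("no goto for " + state + ":" + nonterminal);
        }
    }

    public static boolean accepts(List<String> tokens) {
        List<String> input = new ArrayList<>(tokens);
        input.add("$"); // end-of-input marker
        Deque<Integer> initial = new ArrayDeque<>();
        initial.push(0);
        Deque<Fork> live = new ArrayDeque<>();
        live.add(new Fork(initial, 0));
        while (!live.isEmpty()) {
            Fork fork = live.poll();
            // An empty action list means this fork hit an error and is dropped silently.
            for (String action : actions(fork.stack.peek(), input.get(fork.pos))) {
                if (action.equals("acc")) {
                    return true;
                }
                Deque<Integer> stack = new ArrayDeque<>(fork.stack); // fork = copy the stack
                int n = Integer.parseInt(action.substring(1));
                if (action.charAt(0) == 's') {
                    stack.push(n);
                    live.add(new Fork(stack, fork.pos + 1));
                } else {
                    for (int i = 0; i < RHS_LEN[n]; i++) {
                        stack.pop();
                    }
                    stack.push(goTo(stack.peek(), LHS[n]));
                    live.add(new Fork(stack, fork.pos));
                }
            }
        }
        return false; // every fork ended with an error
    }
}
```

For the input n x a, both reductions are tried on separate stack copies. The copy that chose B has no action for "a" and disappears, while the copy that chose A goes on to accept.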
MetaCC may evolve into something similar to the GCC compiler. GCC consists of various frontends (C, C++, Java and many more) which parse the source code and check its semantics. Then an intermediate representation is created. This representation is common to all languages and may be processed and optimized in a uniform way. From the intermediate representation, code is generated for various target architectures (Intel, SPARC, etc.). Similarly, MetaCC may in the future have various extendable frontends and a common intermediate representation which compiles to the JVM in the backend. This is a very complicated task, so the framework's evolution may just as well stop at the present stage, where it contains lexer and parser generators (that is, the frontend part) and is suited mainly to processing embedded languages.

Many compiler construction frameworks exist, among them the very popular ANTLR, SableCC and JavaCC, but they are rather static in nature. They can produce lexers, parsers and tree walkers from given rules, but what they generate cannot be modified dynamically at run time.

TODO

Examples where a dynamic compiler construction framework can be used

MetaCC is available under the LGPL license.

Java is a registered trademark of Sun Microsystems. Other names may be trademarks of their respective owners.