MetaCC stands for Meta Compiler Construction framework. It aims to support the various stages of programming language processing: tokenizing, parsing, semantic analysis, code generation and so on. At present it is an annotation-based lexer and parser generator suitable for a wide range of languages, from small domain-specific languages to large ones like Java. Semantic actions may be written directly as annotated Java methods.
The short example below shows how a simple calculator may be written using the MetaCC framework:
    public class Calculator {

        public Calculator() {
            // call lexer and parser building from annotations, omitted here
        }

        // Annotations anchored on dummy fields may as well be written at class
        // level, but they are kept here for grammar clarity.
        @PRuleAn(lhs="start", rhs="addExp")
        public int dummy1;

        @PRuleAn(lhs="addExp", rhs="(addExp '+' mulExp)")
        public static BigDecimal onPlus(PEngine parser, PBnfRule rule) {
            BigDecimal left = (BigDecimal) parser.getRhsValue(0);
            BigDecimal right = (BigDecimal) parser.getRhsValue(2);
            return left.add(right);
        }

        @PRuleAn(lhs="addExp", rhs="(addExp '-' mulExp)")
        public static BigDecimal onMinus(PEngine parser, PBnfRule rule) {
            BigDecimal left = (BigDecimal) parser.getRhsValue(0);
            BigDecimal right = (BigDecimal) parser.getRhsValue(2);
            return left.subtract(right).stripTrailingZeros();
        }

        @PRuleAn(lhs="addExp", rhs="mulExp")
        public int dummy2;

        @PRuleAn(lhs="mulExp", rhs="(mulExp '*' topExp)")
        public static BigDecimal onMul(PEngine parser, PBnfRule rule) {
            BigDecimal left = (BigDecimal) parser.getRhsValue(0);
            BigDecimal right = (BigDecimal) parser.getRhsValue(2);
            return left.multiply(right);
        }

        @PRuleAn(lhs="mulExp", rhs="(mulExp '/' topExp)")
        public static BigDecimal onDiv(PEngine parser, PBnfRule rule) {
            BigDecimal left = (BigDecimal) parser.getRhsValue(0);
            BigDecimal right = (BigDecimal) parser.getRhsValue(2);
            return left.divide(right, 10, RoundingMode.HALF_UP).stripTrailingZeros();
        }

        @PRuleAn(lhs="topExp", rhs="'number'")
        public int dummy3;

        @PRuleAn(lhs="topExp", rhs="('(' addExp ')')")
        public static Object onParenthesis(PEngine parser, PBnfRule rule) {
            return parser.getRhsValue(1);
        }

        @LHelperAn(lhs="decimalDigit", rhs="['0'-'9']")
        @LRuleAn(startCtx="INIT", lhs="whiteSpace", rhs="[' ' '\t' '\f' '\r' '\n']+")
        public int dummy4;

        @LRuleAn(startCtx="INIT", lhs="number",
                 rhs="(decimalDigit+ ('.' decimalDigit*)? (['e' 'E'] ['+' '-']? decimalDigit+)?)")
        public static void onNumber(LEngine lexer, LRule rule) {
            lexer.setTokenType(rule.getLhs());
            lexer.setTokenValue(new BigDecimal(lexer.getBuf().toString()).stripTrailingZeros());
        }
    }

See also the examples (provided with the distribution) of an expression language and a Java 5 grammar.
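The semantic actions above do plain BigDecimal arithmetic; the only subtle spot is onDiv, where a fixed scale with rounding avoids the ArithmeticException that exact division (e.g. 1/3) would throw, and stripTrailingZeros() keeps results tidy. A stand-alone sketch of that behaviour, independent of MetaCC:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class BigDecimalSemantics {

    // Mirrors the calculator's onDiv action: divide at scale 10 with
    // HALF_UP rounding, then drop trailing zeros from the result.
    static BigDecimal div(BigDecimal left, BigDecimal right) {
        return left.divide(right, 10, RoundingMode.HALF_UP).stripTrailingZeros();
    }

    public static void main(String[] args) {
        // 1 / 4 yields 0.25, not 0.2500000000
        System.out.println(div(new BigDecimal("1"), new BigDecimal("4")));
        // 1 / 3 is rounded at scale 10; an exact divide() would throw
        System.out.println(div(new BigDecimal("1"), new BigDecimal("3")));
    }
}
```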
MetaCC itself is implemented in Java and must run on a JVM, but the languages it processes are of course not limited to Java.
The ultimate goal of the MetaCC framework is to make a programming language (whichever one is processed by the framework) extendable. Extendable means that the language itself, or a programming environment, can influence tokenizing, parsing, semantic analysis and the other phases of its processing. This influence can manifest itself, for example, in adding or modifying language constructs on the fly, at compile time.
To reach this goal, the rules for tokenizing, parsing and code generation cannot be static or precompiled by some other tool; instead they must be dynamic and embeddable into the language. Rich meta-information (that is, information about the compiler environment) must be available and configurable by the programming environment.
For now the lexer and parser implementations are ready. The lexer is based on a DFA (Deterministic Finite Automaton) and the parser uses the LALR(1) (Look-Ahead, Left-to-Right) algorithm. Additionally, the parser can do "forked parsing", which allows it to parse ambiguous (non-LALR(1)) grammars: the parser stack is "forked" at the place where a shift/reduce or reduce/reduce conflict occurs, and both alternatives are parsed "in parallel". One of them eventually ends with an error and is silently removed. To be ready for work, the parser (or lexer) must be fed with appropriate rules, from which the internal tables are computed on the fly. Rules come in the form of Java classes (or annotations) with appropriate contents. Different implementations of such rules are possible, for example the simplest with fields set in a constructor, an implementation which reads rules from a string, one which reads from a file, an XML file, etc. Because the internal tables are computed on the fly, they may easily be changed or extended at run time. Implementations of a Java 5 lexer and parser are provided as an example. Consult the javadoc documentation, source code and examples to see the framework in action.
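To make the DFA idea concrete: the tables MetaCC computes for a rule like the number token above boil down to a transition function over a handful of states. A hand-written equivalent (purely illustrative, not MetaCC's actual internal representation) for the rule decimalDigit+ ('.' decimalDigit*)? (['e' 'E'] ['+' '-']? decimalDigit+)? looks like this:

```java
public class NumberDfa {

    // States for: digit+ ('.' digit*)? ([eE] [+-]? digit+)?
    static final int START = 0, INT = 1, FRAC = 2,
                     EXP_MARK = 3, EXP_SIGN = 4, EXP = 5, DEAD = -1;

    // One step of the automaton: current state + input character -> next state.
    static int step(int s, char c) {
        boolean d = c >= '0' && c <= '9';
        switch (s) {
            case START:    return d ? INT : DEAD;
            case INT:      return d ? INT
                                    : c == '.' ? FRAC
                                    : (c == 'e' || c == 'E') ? EXP_MARK : DEAD;
            case FRAC:     return d ? FRAC
                                    : (c == 'e' || c == 'E') ? EXP_MARK : DEAD;
            case EXP_MARK: return d ? EXP
                                    : (c == '+' || c == '-') ? EXP_SIGN : DEAD;
            case EXP_SIGN: return d ? EXP : DEAD;
            case EXP:      return d ? EXP : DEAD;
        }
        return DEAD;
    }

    static boolean accepting(int s) {
        return s == INT || s == FRAC || s == EXP;
    }

    // Length of the longest prefix of input matching the number rule; 0 if none.
    static int match(String input) {
        int s = START, best = 0;
        for (int i = 0; i < input.length() && s != DEAD; i++) {
            s = step(s, input.charAt(i));
            if (s != DEAD && accepting(s)) best = i + 1;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(match("3.14e+5*2")); // matches the prefix "3.14e+5"
    }
}
```

MetaCC derives such tables automatically from the rule annotations, which is why they can be recomputed when rules change at run time.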
MetaCC may evolve into something similar to the GCC compiler. GCC consists of various frontends (C, C++, Java and many more) which parse and semantically check source code. An intermediate representation is then created; this representation is common to all languages and may be processed and optimized in a uniform way. From the intermediate representation, code for various target architectures (Intel, SPARC, etc.) is generated. Similarly, MetaCC may in the future have various extendable frontends and a common intermediate representation which compiles to the JVM in the backend. This is a very complicated task, so the framework's evolution may also stop at the present stage, where it contains lexer and parser generators (that is, the frontend part) and is suitable mainly for processing embedded languages.
Many compiler construction frameworks exist; among the most popular are ANTLR, SableCC and JavaCC, but they are rather static in nature. They can produce lexers, parsers and tree walkers from given rules, but the results are static and cannot be modified after generation or at run time.
TODO
- attribute grammar system (or visitor pattern) for abstract syntax tree processing
- error recovery
- so-called "dynamic programming" for code generation
- libraries for common tasks like name analysis, type analysis
- event logging and broadcast
- more examples
Examples where a dynamic compiler construction framework can be used
- Abstracting common language patterns
- In Java, a common pattern of stream usage is as follows:
    InputStream is = new FileInputStream(...);
    try {
        // do something with the stream
    } finally {
        if (is != null) {
            try { is.close(); } catch (Exception e) {}
        }
    }
Why not have a language construct which allows this pattern to be written less verbosely:

    with (InputStream is = new FileInputStream(...)) {
        // do something with the stream
    }
In a dynamic environment it is possible to add new syntax rules which translate such an introduced construct into "known" ones. We do not have such an extendable Java parser yet, but maybe it will be available one day.
- The extended "for" loop. It was finally introduced in Java 5, but it took well over a year from the idea, through community acceptance and implementation, to the point where it could be used. With an extendable compiler environment, anyone could experiment with similar constructs, and some of them would then be incorporated into the language standard. That is how CLOS (the Common Lisp Object System) was implemented: as Lisp macros, without any need to modify the Lisp compiler, and thus immediately ready to try. Eventually it became a standard.
- Listening to language processing events and translating them into some useful code.
For example, if data is kept in a list, somebody can filter it down with (pseudo code):
    boolean accept(Customer c) {
        return c.getAge() > 18 && c.getCity().startsWith("B");
    }

    List filtered = mylist.filter(accept);
Alternatively, the data may be kept in a database. The construction above might then be translated under the covers into an SQL query and executed, returning the result without any code change.
- Small new languages for special purposes (e.g. logical reasoning, mathematical computation) which compile to the JVM and cooperate easily with existing languages.
- Embedded languages (e.g. business-rule scripting inside a larger application, web-application control-flow rules).
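The predicate-to-SQL translation mentioned above can be sketched independently of MetaCC: once a compiler exposes the predicate as a syntax tree, generating SQL from it is a simple tree walk. The Customer predicate and the translation rules below are hypothetical illustrations, not part of the framework:

```java
import java.util.List;

public class PredicateToSql {

    // A minimal predicate "syntax tree": comparisons joined by AND.
    static final class Cmp {
        final String column, op, literal;
        Cmp(String column, String op, String literal) {
            this.column = column; this.op = op; this.literal = literal;
        }
    }

    // Walk the comparisons and render them as a SQL query.
    static String toSql(String table, List<Cmp> conditions) {
        StringBuilder sql = new StringBuilder("SELECT * FROM " + table + " WHERE ");
        for (int i = 0; i < conditions.size(); i++) {
            Cmp c = conditions.get(i);
            if (i > 0) sql.append(" AND ");
            sql.append(c.column).append(' ').append(c.op).append(' ').append(c.literal);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // The accept() predicate from the filter example, reified as data:
        String sql = toSql("customers", List.of(
                new Cmp("age", ">", "18"),
                new Cmp("city", "LIKE", "'B%'")));
        System.out.println(sql);
    }
}
```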
MetaCC is available under the LGPL license.
Java is a registered trademark of Sun Microsystems. Other names may be trademarks of their respective owners.