
The Koopa Cobol Parser Generator

Koopa is a Cobol parser generator with a plan for growth. It parses Cobol source files (fixed and free format) in isolation, with no preprocessing required, and accepts CICS/SQL fragments. Its design makes it easy to extend while limiting the impact of each extension on the rest of the project. It achieves this through a custom DSL for specifying Cobol island grammars concisely, and through a unit testing framework for those grammars which aids in rapid and accurate fault detection. This is complemented by support for handling the generated syntax trees in a structure-shy manner.
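To illustrate what an island grammar does, here is a minimal sketch in Python. It is not Koopa's implementation and does not use Koopa's DSL; the two statement rules, the tokeniser, and all function names are hypothetical and chosen only for illustration. The idea is that a few statement shapes are recognised as "islands", and every other token is skipped as "water".

```python
import re

# Hypothetical tokeniser: Cobol-like words, or any other single non-space character.
TOKEN = re.compile(r"[A-Za-z0-9-]+|\S")


def match_move(tokens, i):
    """Island rule (illustrative): MOVE <operand> TO <operand>."""
    if i + 4 <= len(tokens) and tokens[i].upper() == "MOVE" and tokens[i + 2].upper() == "TO":
        return 4  # number of tokens consumed
    return 0


def match_display(tokens, i):
    """Island rule (illustrative): DISPLAY <operand>."""
    if i + 2 <= len(tokens) and tokens[i].upper() == "DISPLAY":
        return 2
    return 0


# The only constructs this toy grammar knows about.
RULES = [("move", match_move), ("display", match_display)]


def parse(source):
    """Return a list of (kind, tokens) pieces; kind is a rule name or 'water'."""
    tokens = TOKEN.findall(source)
    result = []
    i = 0
    while i < len(tokens):
        for kind, rule in RULES:
            consumed = rule(tokens, i)
            if consumed:
                result.append((kind, tokens[i:i + consumed]))
                i += consumed
                break
        else:
            # No island starts here, so this token is water.
            # Consecutive water tokens are merged into one piece.
            if result and result[-1][0] == "water":
                result[-1][1].append(tokens[i])
            else:
                result.append(("water", [tokens[i]]))
            i += 1
    return result


pieces = parse("EXEC FOO BAR. MOVE A TO B. DISPLAY B.")
for kind, toks in pieces:
    print(kind, toks)
```

The unknown `EXEC FOO BAR.` fragment does not cause a failure here; it is simply skipped. Adding support for a new construct means adding one rule, which is the property that makes this style of grammar easy to grow.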

Testsuite coverage

To give you an idea of the extent of the current implementation of the parser, the graph below shows Koopa's coverage of the Cobol 85 testsuite. This testsuite is publicly available from here, but is also included with the source code. The horizontal axis represents the 513 test files, including both source files and copy books. The vertical axis shows the coverage for each file.

Koopa's coverage of the Cobol 85 testsuite.

For clarity: the height of each bar represents the number of tokens found in the source file. The green part represents the tokens which were categorised by Koopa. The part in blue represents the water, or the tokens which were skipped. A red bar means that the file failed to parse, in which case we don't know the exact number of tokens. In that case the height of the bar is not representative of the size of the source file.

For revision 97 the numbers are: 493 out of 510 files parse successfully (or about 96.7%), and of those 493 Koopa is able to categorise 98.92% of all tokens.
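The two figures above can be reproduced from per-file results along the following lines. This is a sketch under assumptions: the `FileResult` record and its field names are hypothetical, the sample data is made up, and the token percentage is assumed to be pooled over all successfully parsed files rather than averaged per file.

```python
from dataclasses import dataclass


@dataclass
class FileResult:
    """Hypothetical per-file record; not Koopa's actual export format."""
    name: str
    parsed: bool
    categorised: int = 0  # tokens matched by a grammar rule (green)
    water: int = 0        # tokens skipped (blue)


def summarise(results):
    """Return (percentage of files parsed, percentage of tokens categorised)."""
    ok = [r for r in results if r.parsed]
    file_rate = 100.0 * len(ok) / len(results)
    # Failed files are excluded: their token counts are unknown.
    total_tokens = sum(r.categorised + r.water for r in ok)
    categorised = sum(r.categorised for r in ok)
    token_rate = 100.0 * categorised / total_tokens if total_tokens else 0.0
    return file_rate, token_rate


# Made-up sample data, for illustration only.
sample = [
    FileResult("A.CBL", True, categorised=90, water=10),
    FileResult("B.CBL", True, categorised=100, water=0),
    FileResult("C.CPY", False),
]
file_rate, token_rate = summarise(sample)
print(f"files parsed: {file_rate:.1f}%, tokens categorised: {token_rate:.2f}%")

# The file figure reported for revision 97: 493 out of 510.
reported_file_rate = 100.0 * 493 / 510
print(f"revision 97: {reported_file_rate:.1f}% of files parsed")
```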

End-user application

Overview of parse results in the Koopa application. Source code view with syntax highlighting. XPath querying of the syntax tree. Exploration of the Cobol grammar.

The GUI allows you to parse a batch of Cobol source files. It reports the success or failure of doing so, showing detailed errors and warnings. You get the option of exporting the parse results to CSV, which is how we got the graph above.

Koopa also provides a Cobol code visualiser. This application takes the parse tree and uses the information found in it to do syntax highlighting, as well as setting up an outline of the structure of the parsed file. You can also export the syntax tree to XML.

The syntax tree can be exposed to an XPath engine, allowing you to query by means of XPath expressions. The code visualiser integrates this functionality, so you can do interactive queries and inspect the result directly in the source code.
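As a rough illustration of querying an exported syntax tree, the sketch below runs path queries over a small hand-written XML document using Python's standard library. The element and attribute names (`tree`, `paragraph`, `statement`, `kind`, `water`, `t`) are hypothetical and do not reflect Koopa's actual XML export format. Note also that `xml.etree.ElementTree` supports only a subset of XPath; Koopa's own XPath integration is a separate thing.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an exported syntax tree (hypothetical structure).
XML = """
<tree>
  <paragraph name="MAIN">
    <statement kind="move"><t>MOVE</t><t>A</t><t>TO</t><t>B</t></statement>
    <water><t>.</t></water>
    <statement kind="display"><t>DISPLAY</t><t>B</t></statement>
    <water><t>.</t></water>
  </paragraph>
</tree>
"""

root = ET.fromstring(XML)

# All statements, anywhere in the tree.
statements = root.findall(".//statement")

# Only MOVE statements, selected with an attribute predicate.
moves = root.findall(".//statement[@kind='move']")
move_texts = [" ".join(t.text for t in s.findall("t")) for s in moves]

# Tokens which were skipped as water.
water_tokens = [t.text for t in root.findall(".//water/t")]

print(len(statements), move_texts, water_tokens)
```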

You can also navigate straight from the source code to the relevant rule of the Cobol grammar. A breadcrumb trail through the grammar rules is shown for the current selection in the source view. Clicking any part of that breadcrumb trail will let you browse the Cobol grammar for that grammar rule.

It is also possible to invoke Koopa from the command line. Right now there is only a single target for parsing Cobol files and dumping the syntax tree to XML.

For maximum flexibility you can, of course, interact directly with Koopa's code. There is support for partial ANTLR grammars, as well as XPath expressions for processing and querying the syntax trees. Or you can simply capture the raw trees and process them however you want.

Download

Ready-to-run executables are available on the project's files page. As long as you have a Java 1.5+ environment, nothing else is needed. This is the fastest way to give Koopa a try.

All code for this project is available through the Subversion repository hosted by SourceForge.net. Detailed instructions for accessing it can be found on the project's developer page.

The code is offered under a BSD licensing scheme.

Documentation

If you're interested in a high-level overview of the concepts and structure of Koopa, there is a guide [PDF] which provides this. I recommend reading it if you're considering using Koopa in your own projects. It should help you find your way around the code.

For some examples of the possible XPath queries, see here.

Support

Please check the FAQ first. If you don't find an answer there, feel free to contact the project administrators through the project's summary page.

Publications

Andy Kellens, Kris De Schutter, Theo D'Hondt, Luc Jorissen, Bart Van Passel; Cognac: a framework for documenting and verifying the design of Cobol systems; Conference on Software Maintenance and Reengineering (CSMR), 2009.

This paper has some details on the philosophy behind Koopa. The technical details, however, are somewhat out of date, as the paper concerns a previous version of Koopa. The main difference lies in the current version's use of a custom DSL for defining the island grammars. The version in the paper made use of standard ANTLR. I moved away from a pure ANTLR approach as it turned out to be difficult to evolve island grammars in that format.

Contact

Feel free to contact the project administrators through the project's summary page.