OpenToken Package Readme

Version 1.3.1


The OpenToken package is a facility for performing token analysis within the Ada language. It is designed to provide all the functionality of a traditional lexical analyzer generator, such as lex. But due to the magic of inheritance and runtime polymorphism it is implemented entirely in Ada as withed-in code. No precompilation step is required, and no messy tool-generated source code is created.

Additionally, the technique of using classes of recognizers promises to make most token specifications as simple as an easy-to-read procedure call. The most error-prone part of generating analyzers, the token pattern matching, has been taken out of the typical user's hands and placed into reusable classes. Over time I hope to see enough reusable recognizer classes added that very few users will ever need to write a custom one.
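The recognizer-class approach can be sketched in a few dozen lines of plain Ada. This is a minimal illustration, not OpenToken's actual interface. The package Recognizers, the types Recognizer, Keyword and Integer_Literal, the function Match and the procedure Find are all hypothetical names. Each token kind is described by an object derived from an abstract parent. The analyzer makes a dispatching call to every recognizer and keeps the longest match.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Recognizer_Sketch is

   --  Hypothetical recognizer classes for illustration only; this is
   --  not the OpenToken API.
   package Recognizers is

      --  Abstract parent: every recognizer reports how much input it accepts.
      type Recognizer is abstract tagged null record;
      type Handle is access Recognizer'Class;

      --  Length of the longest prefix of Text accepted, or 0 for no match.
      function Match (R : Recognizer; Text : String) return Natural
        is abstract;

      --  Accepts one fixed word, ignoring case.
      type Keyword (Length : Positive) is new Recognizer with record
         Word : String (1 .. Length);
      end record;
      function Match (R : Keyword; Text : String) return Natural;

      --  Accepts a run of decimal digits.
      type Integer_Literal is new Recognizer with null record;
      function Match (R : Integer_Literal; Text : String) return Natural;

   end Recognizers;

   package body Recognizers is

      function Match (R : Keyword; Text : String) return Natural is
         use Ada.Characters.Handling;
      begin
         if Text'Length >= R.Length
           and then To_Lower (Text (Text'First .. Text'First + R.Length - 1))
                      = To_Lower (R.Word)
         then
            return R.Length;
         end if;
         return 0;
      end Match;

      function Match (R : Integer_Literal; Text : String) return Natural is
         Count : Natural := 0;
      begin
         for I in Text'Range loop
            exit when Text (I) not in '0' .. '9';
            Count := Count + 1;
         end loop;
         return Count;
      end Match;

   end Recognizers;

   use Recognizers;

   --  The "syntax": one recognizer object per token kind.
   type Token_ID is (If_ID, Then_ID, Int_ID);
   Syntax : constant array (Token_ID) of Handle :=
     (If_ID   => new Keyword'(Length => 2, Word => "if"),
      Then_ID => new Keyword'(Length => 4, Word => "then"),
      Int_ID  => new Integer_Literal);

   --  Calls Match on every recognizer (a dispatching call, since each
   --  element designates a Recognizer'Class object) and keeps the
   --  longest match.
   procedure Find (Text : String; ID : out Token_ID; Best : out Natural) is
      L : Natural;
   begin
      ID   := Token_ID'First;
      Best := 0;
      for T in Syntax'Range loop
         L := Match (Syntax (T).all, Text);
         if L > Best then
            ID   := T;
            Best := L;
         end if;
      end loop;
   end Find;

   Input : constant String := "then 42";
   ID    : Token_ID;
   Len   : Natural;
begin
   Find (Input, ID, Len);
   Ada.Text_IO.Put_Line (Token_ID'Image (ID) & " length" & Natural'Image (Len));
   Find (Input (6 .. Input'Last), ID, Len);
   Ada.Text_IO.Put_Line (Token_ID'Image (ID) & " length" & Natural'Image (Len));
end Recognizer_Sketch;
```

Built with GNAT (for example, gnatmake recognizer_sketch.adb), it should print "THEN_ID length 4" followed by "INT_ID length 2". To support a new kind of token, you only derive a new type and override Match. Find itself never changes, so each token specification stays down to a single declaration.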

Ada's type safety features should also make misbehaving analyzers easier to debug. All this will hopefully add up to token analyzers that are much simpler and faster to create, easier to get working properly, and easier to understand.
 

Manifest

This version of the OpenToken package should come with the following files:
 
gpl.html - The license terms for this software. Please read them before using it.
gpl.txt - The plain-text version of the licensing terms
philosophical-gnu-sm.jpg - A picture that goes with the licensing terms
Readme.html - This file
Readme.txt - The plain-text version of this file
token-analyzer.ads, token-analyzer.adb - The token analyzer class
token-based_integer_ada_style.ads, token-based_integer_ada_style.adb - Recognizer for an Ada integer literal with a base designation (e.g. 16#123abc#)
token-based_integer_java_style.ads, token-based_integer_java_style.adb - Recognizer for a Java integer literal with a base designation
token-based_real_ada_style.ads, token-based_real_ada_style.adb - Recognizer for an Ada real literal with a base designation
token-bracketed_comment.ads, token-bracketed_comment.adb - Recognizer for bracketed comments (e.g. C's /* */ pairs)
token-character_set.ads, token-character_set.adb - Recognizer for a string consisting only of characters in a given set
token-csv_field.ads, token-csv_field.adb - Recognizer for a field in a comma-separated value (CSV) file
token-end_of_file.ads, token-end_of_file.adb - Recognizer for the end of the input
token-extended_digits.ads, token-extended_digits.adb - Recognizer for hexadecimal digits; mostly useful as a building block for other recognizers
token-escape_sequence.ads, token-escape_sequence.adb - Recognizer for character escape sequences
token-graphic_character.ads, token-graphic_character.adb - Recognizer for a character literal
token-identifier.ads, token-identifier.adb - Recognizer for a typical space-delimited identifier
token-integer.ads, token-integer.adb - Recognizer for an integer literal
token-keyword.ads, token-keyword.adb - Recognizer for a given specific keyword
token-line_comment.ads, token-line_comment.adb - Recognizer for a line comment with a specified introducer
token-octal_escape.ads, token-octal_escape.adb - Recognizer for an octal escape sequence (e.g. \003)
token-nothing.ads, token-nothing.adb - Recognizer for nothing (useful as a default token)
token-real.ads, token-real.adb - Recognizer for a real (floating- or fixed-point) literal
token-separator.ads, token-separator.adb - Recognizer for a non-letter separator. Similar to keyword, but ignores the token's case.
token.ads - Abstract parent class from which new token recognizers may be derived
Language_Lexers/ada_lexer.ads - A lexical analyzer for Ada
Language_Lexers/java_lexer.ads - A lexical analyzer for Java
Examples/ASU_Example_3_6/asu.txt - A sample input file for the asu_example_3_6 program
Examples/ASU_Example_3_6/asu_example_3_6.adb - An example program that implements Example 3.6 from the Aho/Sethi/Ullman Compilers text
Examples/ASU_Example_3_6/Makefile - A makefile for building the example program from the sources
Examples/ASU_Example_3_6/relop_example_token.ads, Examples/ASU_Example_3_6/relop_example_token.adb - A token recognizer for a relational operator
Examples/Language_Lexer_Examples/test_ada_lexer.adb - Testing routine for the Ada lexer
Examples/Language_Lexer_Examples/test_java_lexer.adb - Testing routine for the Java lexer
Examples/Test/makefile - A makefile for building and running the test programs from their sources
Examples/Test/string_test.adb - Test driver for the string token recognizer
Examples/Test/token_analyzer_ctd.adb - Test driver for the token analyzer
Docs/UsersGuide.html, Docs/UsersGuide.txt - The OpenToken User's Guide

History

Version 1.3

This version adds the default token capability to the Analyzer package. This gives the analyzer a more flexible (if somewhat inefficient) means of error handling. The default token can be used as an error token, or it can be made non-reportable so that unknown elements are ignored entirely.
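Here is a minimal sketch of how a default token can act as a fallback, assuming a longest-match analyzer. The token kinds, the Reportable table and Match are hypothetical and much simpler than OpenToken's. The sketch only shows the control flow. When no regular recognizer matches, one character is consumed as the default token. Whether that token is reported depends on its reportable flag.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Default_Token_Sketch is

   --  Hypothetical token kinds; Unknown plays the role of the default
   --  token and is used only when no other recognizer matches.
   type Token_ID is (Number, Whitespace, Unknown);

   --  Non-reportable tokens are consumed silently. Setting Unknown to
   --  False here would make the analyzer ignore unrecognized input.
   Reportable : constant array (Token_ID) of Boolean :=
     (Number => True, Whitespace => False, Unknown => True);

   --  Length of the prefix of Text matched by token kind ID (0 = none).
   function Match (ID : Token_ID; Text : String) return Natural is
      Count : Natural := 0;
   begin
      for I in Text'Range loop
         case ID is
            when Number     => exit when Text (I) not in '0' .. '9';
            when Whitespace => exit when Text (I) /= ' ';
            when Unknown    => exit;  --  never matches on its own
         end case;
         Count := Count + 1;
      end loop;
      return Count;
   end Match;

   Input    : constant String := "12 ?7";
   Pos      : Positive := Input'First;
   ID       : Token_ID;
   Best     : Natural;
   Reported : Natural := 0;
begin
   while Pos <= Input'Last loop
      --  Try every regular recognizer and keep the longest match.
      ID   := Unknown;
      Best := 0;
      for T in Number .. Whitespace loop
         declare
            L : constant Natural := Match (T, Input (Pos .. Input'Last));
         begin
            if L > Best then
               ID   := T;
               Best := L;
            end if;
         end;
      end loop;

      --  Nothing matched: fall back to the default token, one character long.
      if Best = 0 then
         Best := 1;
      end if;

      if Reportable (ID) then
         Put_Line (Token_ID'Image (ID) & " """ & Input (Pos .. Pos + Best - 1) & """");
         Reported := Reported + 1;
      end if;
      Pos := Pos + Best;
   end loop;
end Default_Token_Sketch;
```

On the input "12 ?7" it should print NUMBER "12", UNKNOWN "?" and NUMBER "7" on separate lines. The blank is consumed silently because Whitespace is not reportable.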

Identifier tokens were generalized a bit to allow user-defined character sets for the first and subsequent characters. This not only lets them handle syntaxes that don't exactly match Ada's, but also allows one to define identifiers for languages that aren't Latin-1 based. Also, the rule that underscores may not appear consecutively can now be turned off.
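The generalized identifier can be pictured as two character sets, one for the first character and one for the rest. The sketch below assumes that model and uses the standard Ada.Strings.Maps package. Identifier_Length and its Repeatable_Underscores parameter are hypothetical and are not the actual Token.Identifier interface.

```ada
with Ada.Text_IO;                use Ada.Text_IO;
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;

procedure Identifier_Sketch is

   --  Hypothetical helper, not OpenToken's API: returns the length of the
   --  identifier at the start of Text. The first character must be in
   --  First and the rest in Rest. Unless Repeatable_Underscores is True,
   --  matching stops before a second consecutive underscore.
   function Identifier_Length
     (Text                   : String;
      First                  : Character_Set;
      Rest                   : Character_Set;
      Repeatable_Underscores : Boolean := False) return Natural
   is
      Count : Natural;
   begin
      if Text'Length = 0 or else not Is_In (Text (Text'First), First) then
         return 0;
      end if;
      Count := 1;
      for I in Text'First + 1 .. Text'Last loop
         exit when not Is_In (Text (I), Rest);
         exit when not Repeatable_Underscores
           and then Text (I) = '_' and then Text (I - 1) = '_';
         Count := Count + 1;
      end loop;
      return Count;
   end Identifier_Length;

   --  Ada-like rules: a letter first, then letters, digits or underscores.
   Ada_First : constant Character_Set := Letter_Set;
   Ada_Rest  : constant Character_Set := Alphanumeric_Set or To_Set ('_');
   --  C-like rules also allow a leading underscore.
   C_First   : constant Character_Set := Letter_Set or To_Set ('_');
begin
   Put_Line (Natural'Image (Identifier_Length ("Foo_Bar baz", Ada_First, Ada_Rest)));  --  7
   Put_Line (Natural'Image (Identifier_Length ("_tmp", Ada_First, Ada_Rest)));  --  0
   Put_Line (Natural'Image (Identifier_Length ("_tmp", C_First, Ada_Rest)));  --  4
   Put_Line (Natural'Image (Identifier_Length ("a__b", Ada_First, Ada_Rest)));  --  2
   Put_Line (Natural'Image (Identifier_Length
     ("a__b", Ada_First, Ada_Rest, Repeatable_Underscores => True)));  --  4
end Identifier_Sketch;
```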

Integer and Real tokens gained an option to support signed literals. This option is on by default, which causes a minor backward incompatibility. Syntaxes that have addition or subtraction operators will need to turn this option off. Otherwise an expression such as a-1 could be split into a and the signed literal -1.

A test to verify proper handling of default parameters was added to the Test directory. A makefile was also added to the same directory to automate compiling and running the tests. This makefile will not work outside a GNAT/NT environment without some modification.

New recognizers were added for enclosed comments (e.g. C's /* */ comments) and single-character escape sequences. Also, a "null" recognizer was added for use as a default token.
 

Version 1.2.1

This version adds the CSV field token recognizer that was inadvertently left out of 1.2. This recognizer was designed to match fields in comma-separated value (CSV) files, which is a somewhat standard file format for databases and spreadsheets. Also, the extraneous CVS directories in the zip version of the distribution were removed.

Version 1.2

The long-awaited string recognizer has been added. It can recognize both C-style and Ada-style strings. In addition, this release contains a great many submissions by Christoph Grein. He contributed mostly complete lexical analyzers for both Java and Ada, along with all the extra token recognizers he needed to accomplish this feat. He didn't need as many extra recognizers as I would have expected. Even so, slightly fewer than half of the recognizers in this release were contributed by Chris (with a broken arm, no less!).

Version 1.1

The main code change in this version is a default text feeder function added to the analyzer. It reads its input from Ada.Text_IO.Current_Input, so you can easily point it at a different file with Ada.Text_IO.Set_Input. The capability to create and use your own feeder function still exists, but it should not be necessary in most cases. If you already have code that does this, it should still compile and work properly.

The other addition is the first version of the OpenToken user's guide. All it contains right now is a user manual walking through the steps needed to make a simple token analyzer. Feedback and/or ideas on this are welcome.

Version 1.0

This is the very first publicly released version. This package is based on work I did on the JPATS trainer for FlightSafety International. The germ of this idea came while I was trying to port a fairly ambitious, but fatally buggy, Ada 83 token recognition package written for a previous simulator. Once I was done, I was rather surprised at the flexibility of the final product. I saw a possible benefit to the community, and to the company through user-submitted enhancements and debugging, so I suggested that this code be released as Open Source. They were open-minded enough to agree. Bravo!
 

Future

As it stands, I am developing and maintaining this package as part of my master's thesis, so you can count on a certain amount of progress in the next few months.

Things on my plate for the next release:

Things you can help with:

Again, I hope you find this package useful for your needs.

T.E.D.  - dennison@telepath.com