The OpenToken package is a facility for performing token analysis
within the Ada language. It is designed to provide all the functionality
of a traditional lexical analyzer generator, such as lex. Thanks to
inheritance and runtime polymorphism, however, it is implemented entirely
in Ada as withed-in code. No precompilation step is required, and no messy
tool-generated source code is created.
Additionally, the technique of using classes of recognizers promises to make most token specifications as simple as an easy-to-read procedure call. The most error-prone part of generating analyzers, the token pattern matching, has been taken out of the typical user's hands and placed into reusable classes. Over time I hope to see enough reusable recognizer classes added that very few users will ever need to write a custom one.
Ada's type safety features should also make misbehaving analyzers easier
to debug. All this will hopefully add up to token analyzers that are much
simpler and faster to create, easier to get working properly, and easier
to understand.
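To give a feel for the approach, here is a minimal, self-contained sketch of recognizer classes. Every name in it (Recognizers, Instance, Match, Get_Keyword, Digit_Run) is made up for illustration and is not taken from the OpenToken sources; see the User's Guide and the Examples directory for the real interfaces. It only shows the general idea: an abstract parent type, a couple of reusable children, and a "syntax" that is just a table of recognizer objects built with ordinary calls.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Recognizer_Sketch is

   --  Hypothetical package; NOT the OpenToken API.
   package Recognizers is
      --  Abstract parent: a recognizer reports how many leading
      --  characters of Input it matches (0 means no match).
      type Instance is abstract tagged null record;
      function Match (R : Instance; Input : String) return Natural
        is abstract;
      type Handle is access Instance'Class;

      --  Reusable recognizer for one specific keyword.
      type Keyword (Length : Positive) is new Instance with record
         Text : String (1 .. Length);
      end record;
      function Match (R : Keyword; Input : String) return Natural;
      function Get_Keyword (Word : String) return Handle;

      --  Reusable recognizer for a run of decimal digits.
      type Digit_Run is new Instance with null record;
      function Match (R : Digit_Run; Input : String) return Natural;
   end Recognizers;

   package body Recognizers is
      function Match (R : Keyword; Input : String) return Natural is
      begin
         if Input'Length >= R.Length
           and then Input (Input'First .. Input'First + R.Length - 1)
                      = R.Text
         then
            return R.Length;
         end if;
         return 0;
      end Match;

      function Get_Keyword (Word : String) return Handle is
      begin
         return new Keyword'(Length => Word'Length, Text => Word);
      end Get_Keyword;

      function Match (R : Digit_Run; Input : String) return Natural is
         Count : Natural := 0;
      begin
         for I in Input'Range loop
            exit when not Ada.Characters.Handling.Is_Digit (Input (I));
            Count := Count + 1;
         end loop;
         return Count;
      end Match;
   end Recognizers;

   use Recognizers;

   --  The whole token specification is a table of recognizer objects.
   Syntax : constant array (1 .. 3) of Handle :=
     (Get_Keyword ("begin"), Get_Keyword ("end"), new Digit_Run);

   Input : constant String := "begin 42";
begin
   --  Each call dispatches at run time to the matching child's Match.
   for I in Syntax'Range loop
      Ada.Text_IO.Put_Line
        ("recognizer" & Integer'Image (I) & " matched" &
         Natural'Image (Match (Syntax (I).all, Input)) & " character(s)");
   end loop;
end Recognizer_Sketch;
```

Adding a new kind of token in this scheme means deriving one more child type and overriding Match; the table and the loop that drives it stay unchanged.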
| gpl.html | The license terms for this software. Please read it before using. |
| gpl.txt | The plaintext version of the licensing terms |
| philosophical-gnu-sm.jpg | A picture that goes with the licensing terms |
| Readme.html | This file |
| Readme.txt | The plaintext version of this file |
| token-analyzer.adb, token-analyzer.ads | The token analyzer class |
| token-based_integer_ada_style.ads, token-based_integer_ada_style.adb | Ada integer literal with base designation (eg: 16#123abc#) |
| token-based_integer_java_style.ads, token-based_integer_java_style.adb | Java integer literal with base designation |
| token-based_real_ada_style.ads, token-based_real_ada_style.adb | Ada real literal with base designation |
| token-bracketed_comment.ads, token-bracketed_comment.adb | Token recognizer for inclusive comments (eg: C's /* */ pairs) |
| token-character_set.ads, token-character_set.adb | Token recognizer for a string consisting of only characters in a given set |
| token-csv_field.ads, token-csv_field.adb | Token recognizer for a field in a comma-separated value file (CSV) |
| token-end_of_file.adb, token-end_of_file.ads | Token recognizer for the end of the input |
| token-extended_digits.ads, token-extended_digits.adb | Token recognizer for hexadecimal digits. Mostly useful as a building block for other recognizers. |
| token-escape_sequence.ads, token-escape_sequence.adb | Recognizer for character escape sequences |
| token-graphic_character.ads, token-graphic_character.adb | Recognizer for a character literal |
| token-identifier.adb, token-identifier.ads | Token recognizer for a typical space-delimited identifier |
| token-integer.adb, token-integer.ads | Recognizer for an integer literal |
| token-keyword.adb, token-keyword.ads | Recognizer for a given specific keyword |
| token-line_comment.adb, token-line_comment.ads | Recognizer for a line comment with a specified introducer |
| token-octal_escape.ads, token-octal_escape.adb | Recognizer for an octal escape sequence (eg: \003) |
| token-nothing.ads, token-nothing.adb | Recognizer for nothing (useful as a default token). |
| token-real.adb, token-real.ads | Recognizer for a real (floating or fixed point) literal |
| token-separator.ads, token-separator.adb | Recognizer for a non-letter separator. Similar to keyword, but does not worry about the token's case. |
| token.ads | Abstract parent class from which new token recognizers may be derived. |
| Language_Lexers/ada_lexer.ads | A lexical analyzer for Ada |
| Language_Lexers/java_lexer.ads | A lexical analyzer for Java |
| Examples/ASU_Example_3_6/asu.txt | A sample input file for the asu_example_3_6 program |
| Examples/ASU_Example_3_6/asu_example_3_6.adb | An example program that implements Example 3.6 from the Aho/Sethi/Ullman Compilers text. |
| Examples/ASU_Example_3_6/Makefile | A makefile for building the example program from the sources |
| Examples/ASU_Example_3_6/relop_example_token.adb, Examples/ASU_Example_3_6/relop_example_token.ads | A token recognizer for a relational operator |
| Examples/Language_Lexer_Examples/test_ada_lexer.adb | Testing routine for the Ada lexer |
| Examples/Language_Lexer_Examples/test_java_lexer.adb | Testing routine for the Java lexer |
| Examples/Test/makefile | A makefile for building and running the test programs from their sources |
| Examples/Test/string_test.adb | Test driver for the string token recognizer |
| Examples/Test/token_analyzer_ctd.adb | Test driver for the token analyzer |
| Docs/UsersGuide.html, Docs/UsersGuide.txt | The OpenToken User's Guide |
Identifier tokens were generalized a bit to allow user-defined character sets for the first and subsequent characters. This not only lets them handle syntaxes that don't exactly match Ada's, but also allows one to define identifiers for languages that aren't Latin-1 based. Also, the ability to turn off the rule against repeated underscores was added.
Integer and Real tokens had an option added to support signed literals. This option is on by default (which causes a minor backward incompatibility). Syntaxes that have addition or subtraction operators will need to turn this option off.
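The following sketch shows why the setting matters for such syntaxes. The helper Match_Integer and its Allow_Sign parameter are hypothetical stand-ins written for this illustration, not the actual Integer recognizer or its option name. Given the text that remains of "a-1" after the identifier "a" has been consumed, a sign-accepting matcher claims both characters of "-1", so a subtraction operator would never be seen; with signs disallowed it matches nothing at the "-", leaving it for an operator recognizer.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Signed_Literal_Sketch is

   --  Hypothetical helper, NOT OpenToken's interface: length of the
   --  integer literal at the start of Input, or 0 if there is none.
   function Match_Integer
     (Input : String; Allow_Sign : Boolean) return Natural
   is
      Pos         : Integer := Input'First;
      First_Digit : Integer;
   begin
      --  Optionally consume one leading sign.
      if Allow_Sign
        and then Input'Length > 0
        and then (Input (Pos) = '-' or else Input (Pos) = '+')
      then
         Pos := Pos + 1;
      end if;

      --  Consume the run of decimal digits that follows.
      First_Digit := Pos;
      while Pos <= Input'Last
        and then Ada.Characters.Handling.Is_Digit (Input (Pos))
      loop
         Pos := Pos + 1;
      end loop;

      if Pos = First_Digit then
         return 0;  --  no digits, so no literal (a lone sign does not count)
      end if;
      return Pos - Input'First;
   end Match_Integer;

   --  What is left of "a-1" once the identifier "a" has been consumed.
   Rest : constant String := "-1";
begin
   Ada.Text_IO.Put_Line
     ("signed literals on: matched" &
      Natural'Image (Match_Integer (Rest, Allow_Sign => True)));
   Ada.Text_IO.Put_Line
     ("signed literals off: matched" &
      Natural'Image (Match_Integer (Rest, Allow_Sign => False)));
end Signed_Literal_Sketch;
```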
A test to verify proper handling of default parameters was added to the Test directory. A makefile was also added to the same directory to facilitate automatic compiling and running of the tests. This makefile will not work outside a GNAT/NT environment without some modification.
New recognizers were added for enclosed comments (eg: C's /* */ comments) and
single character escape sequences. Also a "null" recognizer was added for
use as a default token.
The other addition is the first version of the OpenToken user's guide. All it contains right now is a user manual walking through the steps needed to make a simple token analyzer. Feedback and/or ideas on this are welcome.
Things on my plate for the next release:
T.E.D. - dennison@telepath.com