                          OpenToken Package Readme

                               Version 1.3.1

The OpenToken package is a facility for performing token analysis within
the Ada language. It is designed to provide all the functionality of a
traditional lexical analyzer generator, such as lex. But due to the magic
of inheritance and runtime polymorphism it is implemented entirely in Ada
as withed-in code. No precompilation step is required, and no messy
tool-generated source code is created.

Additionally, the technique of using classes of recognizers promises to
make most token specifications as simple as making an easy-to-read
procedure call. The most error-prone part of generating analyzers, the
token pattern matching, has been taken from the typical user's hands and
placed into reusable classes. Over time I hope to see the addition of
enough reusable recognizer classes that very few users will ever need to
write a custom one.

Ada's type safety features should also make misbehaving analyzers easier to
debug. All this will hopefully add up to token analyzers that are much
simpler and faster to create, easier to get working properly, and easier to
understand.


Manifest

This version of the OpenToken package should come with the following files:



 gpl.html                             The license terms for this software.
                                      Please read it before using.

 gpl.txt                              The plaintext version of the
                                      licensing terms

 philosophical-gnu-sm.jpg             A picture that goes with the
                                      licensing terms

 Readme.html                          This file

 Readme.txt                           The plaintext version of this file

 token-analyzer.adb
 token-analyzer.ads                   The token analyzer class

 token-based_integer_ada_style.ads    Ada integer literal with base
 token-based_integer_ada_style.adb    designation (eg: 16#123abc#)

 token-based_integer_java_style.ads   Java integer literal with base
 token-based_integer_java_style.adb   designation

 token-based_real_ada_style.ads       Ada real literal with base
 token-based_real_ada_style.adb       designation

 token-bracketed_comment.ads          Token recognizer for inclusive
 token-bracketed_comment.adb          comments (eg: C's /* */ pairs)

 token-character_set.ads              Token recognizer for a string
 token-character_set.adb              consisting of only characters in a
                                      given set

 token-csv_field.ads                  Token recognizer for a field in a
 token-csv_field.adb                  comma-separated value file (CSV)

 token-end_of_file.adb                Token recognizer for the end of the
 token-end_of_file.ads                input

 token-extended_digits.ads            Token recognizer for hexadecimal
 token-extended_digits.adb            digits. Mostly useful as a building
                                      block for other recognizers.

 token-escape_sequence.ads            Recognizer for character escape
 token-escape_sequence.adb            sequences

 token-graphic_character.ads
 token-graphic_character.adb          Recognizer for a character literal

 token-identifier.adb                 Token recognizer for a typical
 token-identifier.ads                 space-delimited identifier

 token-integer.adb
 token-integer.ads                    Recognizer for an integer literal

 token-keyword.adb                    Recognizer for a given specific
 token-keyword.ads                    keyword

 token-line_comment.adb               Recognizer for a line comment with a
 token-line_comment.ads               specified introducer

 token-octal_escape.ads               Recognizer for an octal escape
 token-octal_escape.adb               sequence (eg: \003)

 token-nothing.ads                    Recognizer for nothing (useful as a
 token-nothing.adb                    default token).

 token-real.adb                       Recognizer for a real (floating or
 token-real.ads                       fixed point) literal

                                      Recognizer for a non-letter
 token-separator.ads                  separator. Similar to keyword, but
 token-separator.adb                  does not worry about the token's
                                      case.

 token.ads                            Abstract parent class from which new
                                      token recognizers may be derived.

 Language_Lexers/ada_lexer.ads        A lexical analyzer for Ada

 Language_Lexers/java_lexer.ads       A lexical analyzer for Java

 Examples/ASU_Example_3_6/            A sample input file for the
 asu.txt                              asu_example_3_6 program

 Examples/ASU_Example_3_6/            An example program that implements
 asu_example_3_6.adb                  Example 3.6 from the
                                      Aho/Sethi/Ullman Compilers text.

 Examples/ASU_Example_3_6/            A makefile for building the example
 Makefile                             program from the sources

 Examples/ASU_Example_3_6/
 relop_example_token.adb              A token recognizer for a relational
 Examples/ASU_Example_3_6/            operator
 relop_example_token.ads

 Examples/Language_Lexer_Examples/
 test_ada_lexer.adb                   Testing routine for the Ada lexer

 Examples/Language_Lexer_Examples/
 test_java_lexer.adb                  Testing routine for the Java lexer

 Examples/Test/makefile               A makefile for building and running
                                      the test programs from their sources

 Examples/Test/string_test.adb        Test driver for the string token
                                      recognizer

 Examples/Test/token_analyzer_ctd.adb Test driver for the token analyzer

 Docs/UsersGuide.html
 Docs/UsersGuide.txt                  The OpenToken User's Guide

History

Version 1.3

This version adds the default token capability to the Analyzer package.
This gives the analyzer a more flexible (if somewhat inefficient) means of
error handling. The default token can be used as an error token, or it can
be made into a non-reportable token to ignore unknown elements entirely.

Identifier tokens were generalized a bit to allow user-defined character
sets for the first and subsequent characters. This not only gives them the
ability to handle syntaxes that don't exactly match Ada's, but also allows
one to define identifiers for languages that aren't Latin-1 based. Also, the
ability to turn off non-repeatable underscores was added.

Integer and Real tokens had an option added to support signed literals.
This option is on by default (which causes a minor backward
incompatibility). Syntaxes that have addition or subtraction operators will
need to turn this option off.

A test to verify proper handling of default parameters was added to the
Test directory. A makefile was also added to the same directory to
facilitate automatic compiling and running of the tests. This makefile will
not work in a non-Gnat/NT environment without some modification.

New recognizers were added for enclosed comments (eg: C's /* */
comments) and single character escape sequences. Also, a "null" recognizer
was added for use as a default token.


Version 1.2.1

This version adds the CSV field token recognizer that was inadvertently
left out of 1.2. This recognizer was designed to match fields in
comma-separated value (CSV) files, which is a somewhat standard file format
for databases and spreadsheets. Also, the extraneous CVS directories in the
zip version of the distribution were removed.

Version 1.2

The long-awaited string recognizer has been added. It is capable of
recognizing both C and Ada-style strings. In addition, there are a great
many submissions by Christoph Grein in this release. He contributed mostly
complete lexical analyzers for both Java and Ada, along with all the extra
token recognizers he needed to accomplish this feat. He didn't need as many
extra recognizers as I would have thought he'd need. But even so, slightly
less than 1/2 of the recognizers in this release were contributed by Chris
(with a broken arm, no less!)

Version 1.1

The main code change to this version is a default text feeder function that
has been added to the analyzer. It reads its input from
Ada.Text_IO.Current_Input, so you can change the file to whatever you want
fairly easily. The capability to create and use your own feeder function
still exists, but it should not be necessary in most cases. If you already
have code that does this, it should still compile and work properly.
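
As a minimal sketch of that redirection (not taken from the OpenToken
sources): the standard Ada.Text_IO.Set_Input call changes Current_Input, so
anything that reads from Current_Input afterwards sees the new file. The
procedure name and the file name "input.txt" are placeholders, and the
read loop merely stands in for the point where an analyzer would consume
its input.

```ada
with Ada.Text_IO;

procedure Redirect_Input_Sketch is
   Source : Ada.Text_IO.File_Type;
   Line   : String (1 .. 256);
   Last   : Natural;
begin
   --  "input.txt" is a placeholder; the file must already exist.
   Ada.Text_IO.Open (File => Source,
                     Mode => Ada.Text_IO.In_File,
                     Name => "input.txt");

   --  From here on, Current_Input refers to Source.
   Ada.Text_IO.Set_Input (Source);

   --  Stand-in for the analyzer: echo whatever Current_Input supplies.
   --  Lines longer than the buffer are read (and echoed) in pieces.
   while not Ada.Text_IO.End_Of_File loop
      Ada.Text_IO.Get_Line (Line, Last);
      Ada.Text_IO.Put_Line (Line (1 .. Last));
   end loop;

   --  Restore the default before closing the file that was Current_Input.
   Ada.Text_IO.Set_Input (Ada.Text_IO.Standard_Input);
   Ada.Text_IO.Close (Source);
end Redirect_Input_Sketch;
```

Restoring Standard_Input before the Close keeps Current_Input from being
left pointing at a closed file.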

The other addition is the first version of the OpenToken user's guide. All
it contains right now is a user manual walking through the steps needed to
make a simple token analyzer. Feedback and/or ideas on this are welcome.

Version 1.0

This is the very first publicly released version. This package is based on
work I did while working on the JPATS trainer for FlightSafety
International. The germ of this idea came while I was trying to port a
fairly ambitious, but fatally buggy Ada 83 token recognition package
written for a previous simulator. But once I was done, I was rather
surprised at the flexibility of the final product. Seeing the possible
benefit to the community, and to the company through user-submitted
enhancement and debugging, I suggested that this code be released as Open
Source. They were open-minded enough to agree. Bravo!


Future

As it stands, I am developing and maintaining this package as part of my
master's thesis. Thus you can count on a certain amount of progress in the
next few months.

Things on my plate for the next release:

   * Look into changing the feeder function into a stream reference. I was
     unfamiliar with streams when I wrote this package. It looks like they
     would make several things much easier to deal with, but the devil's
     always in the details...
   * The Biggie: A parsing facility in the same vein as this token analysis
     facility!

Things you can help with:

   * More recognizers - The more of these there are, the more useful this
     facility is. If you make 'em, please send 'em in!
   * Well-isolated bug reports (or even fixes). Version 1.0 had been fairly
     thoroughly wrung out already. But there's a lot of newer code, so it's
     quite likely you will find problems.

Again, I hope you find this package useful for your needs.

T.E.D.  - dennison@telepath.com
