type Example_Token_ID is (If_ID, Then_ID, Else_ID, ID_ID, Num, Relop, Whitespace);

Again, this is a very simple step once you know the list of tokens you need. But of course figuring that out is not always so simple!
package Tokenizer is new Token.Analyzer (Example_Token_ID);
If_Recognizer : constant Token.Handle := new Token.Keyword.Instance'
(Token.Keyword.Get ("if"));
Then_Recognizer : constant Token.Handle := new Token.Keyword.Instance'
(Token.Keyword.Get ("then"));
Else_Recognizer : constant Token.Handle := new Token.Keyword.Instance'
(Token.Keyword.Get ("else"));
ID_Recognizer : constant Token.Handle := new Token.Identifier.Instance'
(Token.Identifier.Get);
Num_Recognizer : constant Token.Handle := new Token.Real.Instance'
(Token.Real.Get);
Whitesp_Recognizer : constant Token.Handle := new Token.Character_Set.Instance'
(Token.Character_Set.Get (Token.Character_Set.Standard_Whitespace));
A recognizer is a tagged type that is derived from the type Token.Instance.
You should extend the type to provide your own state information and to
keep track of any settings that your recognizer type may allow. Other routines
and information about this specific type of token may be placed in there
too. In our example the token Relop cannot be recognized by any
of the provided token recognizers, so we declare one as follows. The part
that is cut-and-paste is in black; the part that is custom for this recognizer
is in blue.
Note that very little code is in blue: just the name of the package and the states between the first and last state. Of course, more routines and fields in Instance may be added at your discretion, depending on the needs of the recognizer.

with Token;
package Relop_Example_Token is

type Instance is new Token.Instance with private;
---------------------------------------------------------------------------
-- This function will be called to create an Identifier token. Note that
-- this is a simple recognizer, so Get doesn't need any parameters.
---------------------------------------------------------------------------
function Get return Instance;

private

type State_ID is (First_Char, Equal_Or_Greater, Equal, Done);

type Instance is new Token.Instance with record
State : State_ID := First_Char;
end record;

---------------------------------------------------------------------------
-- This procedure will be called when analysis on a new candidate string
-- is started. The Token needs to clear its state (if any).
---------------------------------------------------------------------------
procedure Clear (The_Token : in out Instance);
---------------------------------------------------------------------------
-- This procedure will be called to perform further analysis on a token
-- based on the given next character.
---------------------------------------------------------------------------
procedure Analyze (The_Token : in out Instance;
Next_Char : in Character;
Verdict : out Token.Analysis_Verdict);

end Relop_Example_Token;
When naming states, I have found it easiest to stick to the following standard:
The result will be one of the enumeration values in Token.Analysis_Verdict. Matches indicates that the string you have been fed so far (since the last Clear call) fully qualifies as a token. So_Far_So_Good indicates that the string in its current state does not match a token, but could possibly match in the future, depending on the next characters that are fed in. Note that it is quite possible for the verdict to be Matches on one call and So_Far_So_Good on a later call, depending on the definition of the token. The final verdict, Failed, is different: you return it to indicate that the string is not a legal token of your type, and can never be one no matter how many more characters are fed in. Whenever you return this, you should set the recognizer's state to Done as well.
Now the only things that remain are to implement the package body, and then to create a token recognizer object of your new recognizer type, just like you did for the predefined recognizer types.

package body Relop_Example_Token is

---------------------------------------------------------------------------
-- This procedure will be called when analysis on a new candidate string
-- is started. The Token needs to clear its state (if any).
---------------------------------------------------------------------------
procedure Clear (The_Token : in out Instance) is
begin
The_Token.State := First_Char;
end Clear;

---------------------------------------------------------------------------
-- This procedure will be called to create a Relop token recognizer
---------------------------------------------------------------------------
function Get return Instance is
begin
return (Report => True,
State => First_Char);
end Get;

---------------------------------------------------------------------------
-- This procedure will be called to perform further analysis on a token
-- based on the given next character.
---------------------------------------------------------------------------
procedure Analyze (The_Token : in out Instance;
Next_Char : in Character;
Verdict : out Token.Analysis_Verdict) is
begin
case The_Token.State is
when First_Char =>
-- If the first char is a <, =, or >, it's a match
case Next_Char is
when '<' =>
Verdict := Token.Matches;
The_Token.State := Equal_Or_Greater;
when '>' =>
Verdict := Token.Matches;
The_Token.State := Equal;
when '=' =>
Verdict := Token.Matches;
The_Token.State := Done;
when others =>
Verdict := Token.Failed;
The_Token.State := Done;
end case;

when Equal_Or_Greater =>
-- If the next char is a > or =, it's a match
case Next_Char is
when '>' | '=' =>
Verdict := Token.Matches;
The_Token.State := Done;
when others =>
Verdict := Token.Failed;
The_Token.State := Done;
end case;

when Equal =>
-- If the next char is a =, it's a match
if Next_Char = '=' then
Verdict := Token.Matches;
The_Token.State := Done;
else
Verdict := Token.Failed;
The_Token.State := Done;
end if;

when Done =>
Verdict := Token.Failed;
end case;
end Analyze;

end Relop_Example_Token;
Relop_Recognizer : constant Token.Handle := new Relop_Example_Token.Instance'
(Relop_Example_Token.Get);
Syntax : constant Tokenizer.Syntax :=
(If_ID => If_Recognizer,
Then_ID => Then_Recognizer,
Else_ID => Else_Recognizer,
ID_ID => ID_Recognizer,
Num => Num_Recognizer,
Relop => Relop_Recognizer,
Whitespace => Whitesp_Recognizer
);

Note that steps 3 and 4 could easily be combined into one step, e.g.:

Syntax : constant Tokenizer.Syntax :=
(If_ID => new Token.Keyword.Instance'(Token.Keyword.Get ("if")),
Then_ID => new Token.Keyword.Instance'(Token.Keyword.Get ("then")),
Else_ID => new Token.Keyword.Instance'(Token.Keyword.Get ("else")),
ID_ID => new Token.Identifier.Instance'(Token.Identifier.Get),
Num => new Token.Real.Instance'(Token.Real.Get),
Relop => new Relop_Example_Token.Instance'(Relop_Example_Token.Get),
Whitespace => new Token.Character_Set.Instance'(Token.Character_Set.Get
(Token.Character_Set.Standard_Whitespace))
);
Analyzer : Tokenizer.Instance := Tokenizer.Initialize (Syntax);

This creates an analyzer that will read input from Ada.Text_IO.Current_Input and attempt to match it to the given syntax. By default this will be standard input, but it can be redirected to the file of your choice using Ada.Text_IO.Set_Input.
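For instance, redirecting the analyzer's input to a file takes only standard Ada.Text_IO calls; the file name below is just an illustration, not part of the example distribution:

```ada
with Ada.Text_IO;
procedure Redirect_Input_Example is
   Input_File : Ada.Text_IO.File_Type;
begin
   --  Open the file and make it the current input. The analyzer's default
   --  text feeder reads from Ada.Text_IO.Current_Input, so it will now
   --  draw its characters from this file instead of standard input.
   --  ("tokens.txt" is a hypothetical file name.)
   Ada.Text_IO.Open (Input_File, Ada.Text_IO.In_File, "tokens.txt");
   Ada.Text_IO.Set_Input (Input_File);
end Redirect_Input_Example;
```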
function My_Text_Feeder return String;

The text feeder function is just a pointer to a function that will be called to retrieve a string of data to be analyzed. Whenever the analyzer runs out of characters to process, it will request more from the feeder function. If you do not supply one, a default is used which reads input from the standard input stream:

Analyzer : Tokenizer.Instance := Tokenizer.Initialize
(Language_Syntax => Syntax,
Feeder => My_Text_Feeder'Access);

If you want to change the feeder function during analysis, use the procedure Set_Text_Feeder:

Tokenizer.Set_Text_Feeder (Analyzer => Analyzer,
Feeder => My_New_Text_Feeder'Access
);
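As an illustration, a feeder function might simply hand back the next line of the current input. This is only a sketch built from standard Ada.Text_IO calls; whether the analyzer expects the line terminator to be re-appended is an assumption, not something stated above:

```ada
with Ada.Text_IO;
function My_Text_Feeder return String is
begin
   --  When the input is exhausted, hand back an empty string.
   if Ada.Text_IO.End_Of_File then
      return "";
   end if;
   --  Return the next line of the current input, re-appending a line
   --  terminator so tokens cannot run across the line boundary
   --  (an assumption about what the analyzer needs to see).
   return Ada.Text_IO.Get_Line & Character'Val (10);
end My_Text_Feeder;
```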
The full source that was used for this tutorial is available in the Examples/ASU_Example_3_6 directory, along with a sample input file. To run it, issue the "make" command in that directory. When the command completes, type in "asu_example_3_6" to run it. You should see the following list of tokens recognized:
Found IF_ID
Found ID_ID
Found RELOP
Found ID_ID
Found THEN_ID
Found ELSE_ID
Found RELOP
Found REAL
Found RELOP
Found INT
integer -> (+ | -)? digit+
real -> (+ | -)? (digit | _)* digit . (digit | _)* ( (e | E) (- | +)? (digit)+ )?

This change has been made simply because it matches the definition used for the Integer and Real tokens provided with the OpenToken package. A joint "num" token could have been created to exactly match the num specified in ASU, but we will leave that as an exercise for the reader.