bisonc++ grammar file organization



bisonc++input - Organization of bisonc++'s grammar file(s)


Bisonc++ derives from bison++(1), which was itself derived from bison(1). Like these programs, bisonc++ generates a parser for an LALR(1) grammar. Bisonc++ generates C++ code: an expandable C++ class.

Refer to bisonc++(1) for a general overview. This manual page covers the structure and organization of bisonc++'s grammar file(s).

Bisonc++'s grammar file has the following generic outline:

    directives (see the next section)
    grammar rules

Grammar rules have the following generic form:


Production rules consist of zero or more sequences of terminal tokens, nonterminal tokens and/or action blocks. When multiple production rules are used they must be separated from each other by vertical bars. Action blocks are C++ compound statements.

This manual page contains the following sections:


Starting with version 6.02.00 bisonc++ reserved identifiers no longer end in two underscore characters, but in one. This modification was necessary because according to the C++ standard identifiers having two or more consecutive underscore characters are reserved by the language. In practice this could require some minor modifications of existing source files using bisonc++'s facilities, most likely limited to changing Tokens__ into Tokens_ and changing Meta__ into Meta_.

The complete list of affected names is:

Enums:
DebugMode_, ErrorRecovery_, Return_, Tag_, Tokens_
Enum values:
Type / namespace designators:
Meta_, PI_, STYPE_
Member functions:
clearin_, errorRecovery_, errorVerbose_, executeAction_, lex_, lookup_, nextCycle_, nextToken_, popToken_, pop_, print_, pushToken_, push_, recovery_, redoToken_, reduce_, savedToken_, shift_, stackSize_, startRecovery_, state_, token_, top_, vs_
Protected data members:
d_acceptedTokens_, d_actionCases_, d_debug_, d_nErrors_, d_requiredTokens_, d_val_, idOfTag_, s_nErrors_


Quite a few directives can be specified in the initial section of the grammar specification file. If command-line options for directives are available, then their specifications take precedence over the corresponding directives in the grammar file. Once class header or implementation header files exist, directives affecting those files are ignored.

Directives accepting a `filename' do not accept path names, i.e., they cannot contain directory separators (/); directives accepting a `pathname' may contain directory separators. A `pathname' containing blank characters should be surrounded by double quotes.

Some directives may generate errors. This happens when their specifications conflict with the contents of files bisonc++ cannot modify (e.g., a parser class header file exists but doesn't define a namespace, while a %namespace directive is provided in a later run).

To resolve such errors the offending directive could be omitted, the existing file could be removed, or the existing file could be hand-edited according to the directive's specification.


Like bison(1), bisonc++ by default uses int semantic values, and also supports the %stype and %union directives for using single-type or traditional C-type unions as semantic values. These types of semantic values are covered in bisonc++'s manual.

In addition, the %polymorphic directive can be specified to generate a parser using `polymorphic' semantic values. In this case semantic values are specified as pairs, consisting of tags (which are C++ identifiers), and C++ (pointer or value) type names. Tags and type names are separated by colons. Multiple tag and type name combinations are separated by semicolons, and an optional semicolon ends the final tag/type pair.

Here is an example, defining three semantic values: an int, a std::string and a std::vector<double>:

    %polymorphic INT: int; STRING: std::string; 
                 VECT: std::vector<double>
The identifier to the left of the colon is called the tag-identifier (or simply tag), and the type name to the right of the colon is called the type-name. Starting with bisonc++ version 4.12.00 the types no longer have to provide default constructors.
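The pairing of tags and type names can be pictured with a small stand-alone C++ sketch. It does not use bisonc++'s generated Meta_::SType: std::variant is assumed here as a stand-in, and the names Tag, SemVal, tagOf and asInt are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical tags, mirroring
//   %polymorphic INT: int; STRING: std::string; VECT: std::vector<double>
enum class Tag { INT, STRING, VECT };

// Stand-in semantic value: the alternatives are listed in tag order.
using SemVal = std::variant<int, std::string, std::vector<double>>;

// The tag follows from the alternative that is currently stored.
Tag tagOf(SemVal const &value)
{
    return static_cast<Tag>(value.index());
}

// Retrieving a value requires naming the type that belongs to its tag.
int asInt(SemVal const &value)
{
    assert(tagOf(value) == Tag::INT);
    return std::get<int>(value);
}
```

With these definitions, a SemVal initialized with 12 reports Tag::INT and asInt returns 12, while a SemVal holding a std::string reports Tag::STRING. The sketch only shows that each value carries exactly one tag together with the type that goes with it.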

When polymorphic type-names refer to types that have not yet been declared by the parser's base class header, then these types must be (directly or indirectly) declared in a header file whose location is specified using the %baseclass-preinclude directive.

%type directives are used to associate (non-)terminals with semantic value types. E.g., after:

    %polymorphic INT: int; TEXT: std::string
    %type <INT> expr
the expr nonterminal returns int semantic values. In a rule like:

    expr:
        expr '+' expr
            { /* Action block: C++ statements here. */ }
    ;
symbols $$, $1, and $3 represent int values, and can be used that way in the C++ action block.
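As a conceptual model of the dollar-notations in such a rule, the following C++ sketch performs the work of an action `$$ = $1 + $3` on a plain stack of int values. The generated parser manages its own stacks; ValueStack and reduceAddition are hypothetical names, and the layout shown here is an assumption made for the sake of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical value stack: one int semantic value per recognized element.
using ValueStack = std::vector<int>;

// Sketch of reducing  expr '+' expr  with the action  $$ = $1 + $3.
// The three topmost values belong to the rule's elements, $3 being on top;
// the value belonging to '+' ($2) is on the stack but is not used.
void reduceAddition(ValueStack &stack)
{
    assert(stack.size() >= 3);
    std::size_t const top = stack.size() - 1;
    int const dollar1 = stack[top - 2];     // $1: the left-hand expr
    int const dollar3 = stack[top];         // $3: the right-hand expr
    int const result = dollar1 + dollar3;   // $$ = $1 + $3
    stack.resize(stack.size() - 3);         // remove the rule's three elements
    stack.push_back(result);                // $$ becomes the value of expr
}
```

Starting from a stack holding 9, 3, 0, 4 (where 9 belongs to an earlier, unrelated element), calling reduceAddition leaves 9, 7 on the stack.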

Definitions and declarations

The %polymorphic directive adds the following definitions and declarations to the generated base class header and parser source file (if the %namespace directive was used then all declared/defined elements are placed inside the namespace that is specified by the %namespace directive):

The namespace Meta_ contains, among other classes, the class SType. The parser's semantic value type STYPE_ is equal to Meta_::SType.

Meta_::SType provides the standard user interface for using polymorphic semantic data types. It declares the following public interface:


Inside action blocks dollar-notations can be used to retrieve and assign values from/to the elements of production rules. Type directives are used to associate dollar-notations with semantic types.

When %stype is specified (and with the default int semantic value type) the following dollar-notations are available:

When %union is specified these dollar-notations are available:

When %polymorphic is specified these dollar-notations can be used:


To avoid collisions with names defined by the parser's (base) class, the following identifiers should not be used as token names:


All DECLARATIONS and DEFINE symbols not listed above but defined in bison++ are obsolete with bisonc++. In particular, there is no %header{ ... %} section anymore. Also, all DEFINE symbols related to member functions are now obsolete. There is no need for these symbols anymore as they can simply be declared in the class header file and defined elsewhere.


Using a fairly worn-out example, we'll construct a simple calculator below. The basic operators as well as parentheses can be used to specify expressions, and each expression should be terminated by a newline. The program terminates when a q is entered. Empty lines result in a mere prompt.

First an associated grammar is constructed. When a syntactic error is encountered all tokens are skipped until the next newline, and a simple message is printed using the default error function. It is assumed that no semantic errors occur (in particular, no divisions by zero). The grammar is decorated with actions that are performed when the corresponding production rule is recognized.

The grammar itself is rather standard and straightforward, but note the first part of the specification file, which contains various other directives. Among these is the %scanner directive, which results in a composed d_scanner object as well as an implementation of the member function int lex. In this example a common Scanner class construction strategy was used: the class Scanner was derived from the class yyFlexLexer generated by flex++(1). The actual process of constructing a class using flex++(1) is beyond the scope of this man-page, but flex++(1)'s specification file is mentioned below, to further complete the example.

Here is bisonc++'s input file:

%filenames parser
%scanner ../scanner/scanner.h

                                // lowest precedence
%token  NUMBER                  // integral numbers
        EOLN                    // newline

%left   '+' '-' 
%left   '*' '/' 
%right  UNARY
                                // highest precedence 


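%%

expressions:
    expressions  evaluate
|
    prompt
;

evaluate:
    alternative prompt
;

prompt:
    { prompt(); }           // prompt(): a member declared in the Parser class
;

alternative:
    expression EOLN
        { cout << $1 << endl; }
|
    'q' done
|
    EOLN
|
    error EOLN
;

done:
    { cout << "Done.\n"; ACCEPT(); }    // ACCEPT(): ends parsing successfully
;

expression:
    expression '+' expression
        { $$ = $1 + $3; }
|
    expression '-' expression
        { $$ = $1 - $3; }
|
    expression '*' expression
        { $$ = $1 * $3; }
|
    expression '/' expression
        { $$ = $1 / $3; }
|
    '-' expression      %prec UNARY
        { $$ = -$2; }
|
    '+' expression      %prec UNARY
        { $$ = $2; }
|
    '(' expression ')'
        { $$ = $2; }
|
    NUMBER
        { $$ = stoul(d_scanner.matched()); }
;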
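To make the effect of the precedence directives concrete, here is a hand-written C++ sketch that evaluates the same kind of expressions. It is not what bisonc++ generates: it uses precedence climbing instead of an LALR(1) parser, and the names Input, precedence, primary, expression and evaluate are hypothetical. Like the grammar, it assumes that no division by zero occurs.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical input cursor over a single line of text.
struct Input
{
    std::string text;
    std::size_t pos = 0;

    // Skips blanks; returns the next character, or 0 at the end of the text.
    char peek()
    {
        while (pos < text.size() && text[pos] == ' ')
            ++pos;
        return pos < text.size() ? text[pos] : '\0';
    }
};

// Binding strength of the binary operators, mirroring the directive order:
// %left '+' '-' (lower) comes before %left '*' '/' (higher).
int precedence(char op)
{
    if (op == '+' || op == '-')
        return 1;
    if (op == '*' || op == '/')
        return 2;
    return 0;                               // not a binary operator
}

int expression(Input &in, int minPrec);

// Numbers, parenthesized expressions, and the UNARY operators, which bind
// more tightly than any binary operator.
int primary(Input &in)
{
    char const ch = in.peek();
    if (ch == '-' || ch == '+')
    {
        ++in.pos;
        int const value = primary(in);
        return ch == '-' ? -value : value;
    }
    if (ch == '(')
    {
        ++in.pos;
        int const value = expression(in, 1);
        char const close = in.peek();
        assert(close == ')');
        (void)close;
        ++in.pos;
        return value;
    }
    assert(std::isdigit(static_cast<unsigned char>(ch)));
    int value = 0;
    while (in.pos < in.text.size() &&
           std::isdigit(static_cast<unsigned char>(in.text[in.pos])))
        value = value * 10 + (in.text[in.pos++] - '0');
    return value;
}

int expression(Input &in, int minPrec)
{
    int lhs = primary(in);
    while (true)
    {
        char const op = in.peek();
        int const prec = precedence(op);
        if (prec == 0 || prec < minPrec)
            return lhs;
        ++in.pos;
        // prec + 1: an operator of the same level ends the right operand,
        // which makes the binary operators left-associative (%left).
        int const rhs = expression(in, prec + 1);
        switch (op)
        {
            case '+': lhs += rhs; break;
            case '-': lhs -= rhs; break;
            case '*': lhs *= rhs; break;
            case '/': lhs /= rhs; break;
        }
    }
}

int evaluate(std::string const &text)
{
    Input in{text};
    return expression(in, 1);
}
```

For example, evaluate("1 + 2 * 3") yields 7 because '*' binds more tightly than '+', and evaluate("2 - 3 - 4") yields -5 because '-' associates to the left.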

Next, bisonc++ processes this file. In the process, bisonc++ generates the following files from its skeletons:


Although scanner.ih includes the file parserbase.h, which defines the parser class's base class, rather than the header file parser.h, which defines the parser class itself, the lexical scanner may simply return tokens of the class Parser (e.g., Parser::NUMBER rather than ParserBase::NUMBER). The former specification is considered somewhat more intuitively appealing than the latter. It is realized by a simple #define - #undef pair generated by bisonc++ near the end of parserbase.h and just before the definition of the parser class itself in the file parser.h. Note that this feature can only be used to access base class types and enum values. The actual parser class is not available by the time the lexical scanner is being defined, which avoids circular class dependencies.
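The mechanism can be imitated in a single C++ source file. The sketch below is an assumption about how such a #define - #undef pair may work, not a copy of the generated headers: the token values and the function tokenForDigits are made up for the illustration.

```cpp
#include <cassert>

// Stand-in for the generated base class (cf. parserbase.h): it owns the tokens.
class ParserBase
{
    public:
        enum Tokens_
        {
            NUMBER = 257,       // arbitrary values, for illustration only
            EOLN
        };
};

// While this define is active the compiler reads Parser::NUMBER as
// ParserBase::NUMBER.
#define Parser ParserBase

// Scanner-side code: it names Parser although class Parser is not yet defined.
int tokenForDigits()
{
    return Parser::NUMBER;
}

// The define is removed again just before the parser class itself is defined.
#undef Parser

// Stand-in for the parser class (cf. parser.h).
class Parser: public ParserBase
{
};
```

Since Parser inherits the enumeration from ParserBase, tokenForDigits() returns the same value whether it is compared with ParserBase::NUMBER or with Parser::NUMBER. Only base class types and enum values can be reached this way, because the substitution merely renames the base class.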


bison(1), bison++(1), bisonc++(1), bisonc++api(3), (using texinfo), flexc++(1)

Lakos, J. (2001) Large Scale C++ Software Design, Addison Wesley.
Aho, A.V., Sethi, R., Ullman, J.D. (1986) Compilers, Addison Wesley.


Frank B. Brokken (