Chapter 6: Technical information

This chapter consists of several sections. The first section describes Yodl from the point of view of the systems administrator, addressing issues such as the installation of the package. The second section describes Yodl's technical implementation in some detail. Apart from the documentation about Yodl given here, much more can be found in the individual source files. However, section 6.2 describes `the broad picture'. Having read section 6.2, it should be relatively easy to determine what happens where inside the Yodl program and the yodl-post post-processor.

6.1: Obtaining Yodl

Yodl and the distributed macro package can be obtained from https://fbb-git.github.io/yodl/. The associated git-repository page is located at https://github.com/fbb-git/yodl.

Yodl is also available as a binary Debian package. To find it, visit http://www.debian.org/distrib/packages and enter yodl (in any distribution) in the form fields on that page.

6.1.1: Installing Yodl

The binary package, distributed as yodl-X.Y.Z_a.b.c.deb, can be installed using dpkg --install yodl-X.Y.Z_a.b.c.deb. It installs:

Local installations, not using the Debian installation process, can be obtained using the provided icmake build script (see below). An alternative (cf. contrib/build.pl) is to use the contributed build.pl script. Note that build.pl is not maintained by me: if you run into problems when using it, I'll probably not be able to help you out.

If a local installation is preferred or required, unpack the file yodl-X.Y.Z.tar.gz. Then change to the directory yodl-X.Y.Z, and optionally tweak the file config to your needs. Next, issue the commands:


    build programs    (or use `build programs strip' to strip the binaries)
    build macros
    build man
    build manual
        
followed by

   install programs WHERE        (install the programs under WHERE
                                  e.g., install programs /usr/bin)

   install macros WHERE          (install the macros under WHERE
                                  e.g., install macros /usr/share/yodl)

   install man WHERE             (install the man-pages under WHERE
                                  e.g., install man /usr/share/man)
                                
   install manual WHERE          (install the manual under WHERE
                                  e.g., install manual /usr/share/doc/yodl)

   install docs WHERE            (install additional docs under WHERE
                                  e.g., install docs /usr/share/doc/yodl)
        

The installation process installs the binaries, manual pages, other documentation and macro files under the indicated directories.

However, by far the easiest way to install a binary distribution is to use Debian's dpkg --install yodl*.deb command. Dpkg installs the various parts under /usr/, according to Debian's conventions.

Installation from source requires you to have the following programs installed on your system:

6.2: Organization of the software

This section describes the organization of the source files. Its contents are not necessarily relevant to the binary distribution. The section is probably most useful to readers who want to extend or maintain Yodl's sources, or who simply want to understand what's happening inside the Yodl program.

Much of the documentation is provided in the individual source files themselves. This section, however, should offer the `broad picture', allowing you to understand the logic behind Yodl relatively fast.

6.2.1: Subdirectories and their meanings

After unpacking Yodl's source archive, the following directories are available:

6.3: Yodl's component interrelations and component setup

Yodl's components show a strict hierarchical ordering. This allows components placed nearer to the root of the tree to be developed and tested without considering anything that's placed farther away from it.

The following piece of `ascii-art' shows the relationships for the Yodl program. The tree starts at the top, at the root component, and can be read from top to bottom: each horizontal line starts a level of components mentioned immediately below it, and each vertical route through the figure represents a series of components whose functioning depends on at least the components mentioned earlier.

However, a more natural way to look at it is to start somewhere in the tree and see what's encountered going up. Doing so, all required components are visited. Wherever the figure shows a


        |
    --- | ---
        |
        
construction. This means that the horizontal line is not related to the vertical dependency crossing (but not touching) it.


                                root
                                |                        
                                message
                                |
                                new
                                |                             
                    +-------+---+-------+
                    |       |           |                    
                    string  queue       stack                
                    |       |           |                    
    +--+--+---------+       |           hashitem               
    |  |  |         |       |           |                    
    |  |  strvector |       |           |                    
    |  |  |         |       |           |                    
    |  |  +---------+       |           |                    
    |  |            |       |           |                    
    |  args         subst   |           hashmap              
    |  |            |       |           |                    
    |  |            +-------+       +---+-------+
    |  |                    |       |           |
    |  |                    |       symbol  +---+----+-------+-------+
    |  |                    |       |       |        |       |       |
    |  +------------+------ | ------+       chartab  counter macro   builtin
    |               |       |               |        |       |       |     
    |               file    |               +---+----+-------+-------+
    |               |       |                   |
    |               +---+---+                   |                         
    |                   |                       |
    |               +---+---+                   |
    |               |       |                   |
    process         lexer   ostream             |
    |               |       |                   |
    |               +-------+-------+-----------+
    |                               |
    |                               parser 
    |                               |
    +-------------------------------+
                                    |
                                    (yodl)   
    

A similar, albeit much simpler, tree can be drawn for yodl-post. Here is the organization of the components for the yodl-post program:


                                root
                                |                        
                                message
                                |
                                new
                                |                             
                      +-----+---+---+
                      |     |       |
                      |     |       |
                      lines string  hashitem
                      |     |       |
                      |     args    hashmap
                      |     |       |
                      |     +-------+
                      |     |
                      |     file
                      |     |
                      +-----+
                            |
                            postqueue
                            |
                            yodl2html-post
        

The source files of each component are organized as follows:

6.4: The token-producer `lexer_lex()'

Tokens are produced by the lexical scanner. The function lexer_lex() produces the next token, which is always an element of the following set:

    TOKEN_UNKNOWN,          /* should never be returned */

    TOKEN_SYMBOL,     
    TOKEN_TEXT,         
    TOKEN_PLAINCHAR,        /* formerly: anychar */
    TOKEN_OPENPAR,
    TOKEN_CLOSEPAR,
    TOKEN_PLUS,             /* it's semantics what we do with a +, not      */
                            /* something for the lexer to worry about       */

    TOKEN_SPACE,            /* Blanks should be at the end                  */
    TOKEN_NEWLINE,

    TOKEN_EOR,              /* end of record: ends pushed strings           */
    TOKEN_EOF,              /* at the end of nested evaluations/eof         */
        

In particular, note the existence of the TOKEN_EOR token: it indicates the end of a piece of text (a string) inserted into the input stream by the parser's actions when it calls lexer_push_str(). This occurs in particular when a macro is evaluated: having read a macro and replaced its parameters ARG1, ARG2, ... ARGn by their respective arguments, the resulting string is pushed back into the input stream by lexer_push_str(). This happens, e.g., inside the function p_expand_macro(). An excerpt from this function shows this call:


    void p_expand_macro(register Parser *pp, register HashItem *item)
    {
        ...
            if (argc)                           /* macro with arguments     */
                p_macro_args(pp, &expansion, argc);
            ...
            lexer_push_str(&pp->d_lexer, string_str(&expansion));
        ...
    }
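The interplay between lexer_push_str() and TOKEN_EOR can be illustrated by a small, self-contained sketch. It is not Yodl's lexer: the names MiniLexer, mini_init(), mini_push_str() and mini_lex() and the reduced token set are hypothetical, and the sketch returns single characters rather than Yodl's real tokens. It only shows how a pushed string is read before the remaining input, and how exhausting it produces an end-of-record token before reading resumes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced token set: only what this sketch needs. */
typedef enum { MINI_CHAR, MINI_EOR, MINI_EOF } MiniToken;

#define MINI_MAX_DEPTH 16

/* Stack of buffers being read; buf[0] is the original input. */
typedef struct {
    const char *buf[MINI_MAX_DEPTH];
    size_t pos[MINI_MAX_DEPTH];
    int depth;                  /* number of stacked buffers */
    char ch;                    /* character of the last MINI_CHAR token */
} MiniLexer;

void mini_init(MiniLexer *lx, const char *input)
{
    lx->buf[0] = input;
    lx->pos[0] = 0;
    lx->depth = 1;
}

/* Push a string (e.g., an expanded macro): it is read before the
   remaining input. */
void mini_push_str(MiniLexer *lx, const char *str)
{
    assert(lx->depth < MINI_MAX_DEPTH);
    lx->buf[lx->depth] = str;
    lx->pos[lx->depth] = 0;
    lx->depth++;
}

MiniToken mini_lex(MiniLexer *lx)
{
    int top = lx->depth - 1;
    const char *buf = lx->buf[top];

    if (buf[lx->pos[top]] != '\0') {    /* more text in the top buffer */
        lx->ch = buf[lx->pos[top]++];
        return MINI_CHAR;
    }
    if (top == 0)                       /* original input exhausted */
        return MINI_EOF;
    lx->depth--;                        /* drop the exhausted pushed string */
    return MINI_EOR;                    /* and report its end */
}
```

For example, after mini_init(&lx, "ab") and reading `a', pushing "XY" makes the following tokens X, Y, EOR, b and finally EOF.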
        

The parser repeatedly calls the lexer's function lexer_lex(). This happens most prominently inside the function p_parse(), which consists of a single statement:


    void p_parse(register Parser *pp)
    {
        while ((*pp->d_handler[lexer_lex(&pp->d_lexer)])(pp))
            ;
    }
        
Here, in a loop continuing until the handler indicates that the loop should terminate, lexer_lex() is called to produce the next token. The finite state automaton (FSA) implemented here is described in more detail in section 6.5.

Apart from p_parse(), lexer_lex() is called from four other locations inside the parser component:

So, lexer_lex() is the parser's `window to the outside world'. The lexer_lex() function, however, is a fairly complex animal:

6.5: The Parser's Finite State Automaton

The input files are parsed by the function parser_process(), which is called by Yodl's main() function.

This function pushes all input files that were specified onto the input stack in reverse order, so that the first file ends up on top, and then calls the support function p_parse() to process each of them in turn.
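Pushing the files in reverse order ensures that the first file specified ends up on top of the stack and is therefore processed first. A minimal sketch of this idea, using a hypothetical FileStack type holding only file names (Yodl's real input stack holds considerably more state) and a hypothetical push_files() function:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILES 32

/* Hypothetical input stack holding file names only. */
typedef struct {
    const char *name[MAX_FILES];
    size_t size;
} FileStack;

static void push(FileStack *st, const char *name)
{
    assert(st->size < MAX_FILES);
    st->name[st->size++] = name;
}

/* Returns the top name, or NULL when the stack is empty. */
static const char *pop(FileStack *st)
{
    return st->size ? st->name[--st->size] : NULL;
}

/* Push the names last-to-first, so the first name ends up on top. */
void push_files(FileStack *st, int argc, const char *const argv[])
{
    for (int idx = argc; idx-- > 0; )
        push(st, argv[idx]);
}
```

Popping the stack then returns the files in the order in which they were specified.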

p_parse() is a very short function: it contains one while statement, repeatedly calling the handler appropriate for the next token returned by the lexical scanner. Therefore, the parser can be considered a table-driven finite state automaton (FSA).

The table itself is initialized in parser/psetuphandlerset.c by the function p_setup_handlerSet(). It fills the two-dimensional array ps_handlerSet with the addresses of the functions that must be called for each combination of parser state (as defined in the HANDLER_SET_ELEMENTS enum in parser/parser.h) and token that may be produced by the lexical scanner (as defined in the LEXER_TOKEN enum in lexer/lexer.h). Depending on the situation the parser encounters, it may point its pointer d_handler to a particular row in this table. Since the rows represent the parser's states, states can be switched easily by reassigning this pointer, and this happens all the time. For example, when a name must be retrieved from a parameter list in parsernameparlist.c, it calls parser_parlist(pp, COLLECT_SET), a function that temporarily switches the parser's state to COLLECT_SET and returns the parameter list's contents to its caller.
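The table-driven design can be illustrated in miniature. All names below (State, Token, Parser, handlerSet, setup_handlerSet(), parse() and the h_... handlers) are hypothetical reductions of ps_handlerSet, p_setup_handlerSet() and p_parse(), and the token source is a plain array rather than a lexer. As described above, d_handler points to a row of the table, and switching states merely reassigns that pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the state and token enums. */
typedef enum { ST_DEFAULT, ST_COLLECT, N_STATES } State;
typedef enum { TK_TEXT, TK_OPENPAR, TK_CLOSEPAR, TK_EOF, N_TOKENS } Token;

typedef struct Parser Parser;
typedef int (*Handler)(Parser *);   /* returns 0 to end the parse loop */

struct Parser {
    Handler const *d_handler;       /* current row = current state */
    const Token *tokens;            /* token source replacing a real lexer */
    size_t next;
    int inserted;                   /* TEXT tokens seen in ST_DEFAULT */
    int collected;                  /* TEXT tokens seen in ST_COLLECT */
};

static Handler handlerSet[N_STATES][N_TOKENS];

static int h_default_text(Parser *pp) { pp->inserted++; return 1; }
static int h_collect_text(Parser *pp) { pp->collected++; return 1; }
static int h_eof(Parser *pp)          { (void)pp; return 0; }
static int h_unknown(Parser *pp)      { (void)pp; return 0; } /* emergency exit */

static int h_default_openpar(Parser *pp)
{
    pp->d_handler = handlerSet[ST_COLLECT];     /* switch state */
    return 1;
}

static int h_collect_closepar(Parser *pp)
{
    pp->d_handler = handlerSet[ST_DEFAULT];     /* switch back */
    return 1;
}

void setup_handlerSet(void)
{
    for (int s = 0; s < N_STATES; ++s)          /* default: unknown */
        for (int t = 0; t < N_TOKENS; ++t)
            handlerSet[s][t] = h_unknown;

    handlerSet[ST_DEFAULT][TK_TEXT]     = h_default_text;
    handlerSet[ST_DEFAULT][TK_OPENPAR]  = h_default_openpar;
    handlerSet[ST_DEFAULT][TK_EOF]      = h_eof;
    handlerSet[ST_COLLECT][TK_TEXT]     = h_collect_text;
    handlerSet[ST_COLLECT][TK_CLOSEPAR] = h_collect_closepar;
    handlerSet[ST_COLLECT][TK_EOF]      = h_eof;
}

static Token next_token(Parser *pp)
{
    return pp->tokens[pp->next++];
}

/* Same shape as p_parse(): call the handler selected by state and token. */
void parse(Parser *pp)
{
    while (pp->d_handler[next_token(pp)](pp))
        ;
}
```

Feeding it the tokens TEXT ( TEXT TEXT ) TEXT EOF counts two TEXT tokens in each state. Any combination not explicitly set up ends up in h_unknown(), which stops the loop, much like the emergency exit described for p_handle_unknown().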

The functions whose addresses are stored in the various column elements of the array ps_handlerSet are called handlers. Most handlers are named p_handle_<state>_<lextoken>(), where <state> is the name of the associated parser state, and <lextoken> is the name of the appropriate lexical scanner token. For example, p_handle_default_symbol() is the handler designed for the situation where the parser is in its initial, or default, state, and the lexical scanner returns a TOKEN_SYMBOL token. Some handlers have more generic names, like p_handle_unknown(), which is a kind of emergency exit, called when the parser doesn't know what to do with the received lexical scanner token (a situation which should, of course, not happen).

In version 2.00, the following handler functions are available:

The parser has the following states:

COLLECT_SET

In this state parameter lists are retrieved as they are encountered on the input. The parameter list is not processed in any way, and its surrounding parentheses are omitted. So, when entering this state (e.g., in the function parser_parlist()), a parameter list is completely consumed, but only its contents (and not its surrounding parentheses) become available. In fact, when entering a state, p_parse() can be called again to process the information in this state. Eventually the state encounters some stopping signal that terminates it (e.g., in the collect state a non-nested closing parenthesis causes p_handle_parlist_closepar() to return false, thus terminating p_parse()). The function parser_parlist() shows this process in further detail.
DEFAULT_SET

In this state macros, builtins, etc. are processed. For most of the tokens returned by the lexical scanner, p_handle_insert() is called.
IGNORE_SET

In this state a parameter list is completely skipped. This state is used, for example, when processing COMMENT().
NOEXPAND_SET

The contents of a parameter list are not expanded, but CHAR builtins are processed. In Yodl version 2.00 there is only one situation where this state (and its companion state NOTRANS_SET) is actively used: Yodl's function gram_NOEXPAND() uses these states to retrieve the contents of a non-expanded or non-translated parameter list.
NOTRANS_SET

When the parser is in this state, a parameter list is inserted using the currently active insertion function (inserting to file or to memory). This state is identical to the NOEXPAND_SET state, except that the character translation table is not used in the NOTRANS_SET state, whereas it is used in the NOEXPAND_SET state.
SKIPWS_SET

In this state all white-space characters are consumed. The lexical scanner merely returns the next non-whitespace character. This state is used, e.g., to skip the white space between multiple parameter lists when they are defined for macros.
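The essence of the COLLECT_SET state, consuming a parameter list while keeping track of nested parentheses and making only its contents available, can be sketched as a plain string function. collect_parlist() is a hypothetical helper working on a C string; Yodl itself does this on a stream of lexer tokens:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mimicking COLLECT_SET: `in' must point at '('.
   The list's contents, without the surrounding parentheses, are copied
   to `out'. Returns a pointer just beyond the matching ')', or NULL if
   the list is unterminated or does not fit (`out' is then unspecified). */
const char *collect_parlist(const char *in, char *out, size_t outsize)
{
    size_t len = 0;
    int nesting = 0;

    assert(*in == '(' && outsize > 0);
    for (++in; *in != '\0'; ++in) {
        if (*in == '(')
            ++nesting;
        else if (*in == ')') {
            if (nesting == 0) {         /* non-nested ')' ends the list */
                out[len] = '\0';
                return in + 1;
            }
            --nesting;
        }
        if (len + 1 >= outsize)         /* no room for this char and '\0' */
            return NULL;
        out[len++] = *in;               /* nested parentheses are kept */
    }
    return NULL;                        /* end of input before ')' */
}
```

For example, collecting from "(a(b)c)(d)" yields "a(b)c" and returns a pointer to the second parameter list "(d)".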

6.6: Adding a new macro

Yodl version 2.00 introduced raw macro files. A raw macro file defines one macro and all of its conversions. Raw macro files must be organized as follows:

    <STARTDOC>
    macro(name(arg1)(arg2)(etc))
    ( 

        Description of the macro `name', having arguments `arg1', `arg2',
        `etc', each argument is given its own parameter list. The names of the
        arguments in this description should be chosen in such a way that they
        suggest their function or purpose. All macro descriptions starting
        with tt(<STARTDOC>) are included in both the `man yodlmacros'
        manpage and the description of the macro in the user guide. If this is
        not considered appropriate (e.g., tt(XX...()) macros are not described
        in these documents) then use tt(<COMMENT>) rather than
        tt(<STARTDOC>). 
    )
    <>
    DEFINEMACRO(name)(#)(
        statements of macro `name' expecting `#' arguments used by all
        conversions. This section is optional
    <html>
        statements that should be executed by the HTML converter
    <man ms>
        statements that should be executed by two converters. In this case,
        the `man' and `ms' converters
    <else>
        statements that should be executed by all converters not explicitly
        mentioned above
    <>
        statements of macro `name' expecting `#' arguments used by all
        conversions, having processed their specific statements. 
        This section is also optional
    )
        
When setting up these macro definitions, the <> tag must follow the initial documentation section. It must also appear when at least one converter-specific tag is used. For a macro that is converter-independent, the macro definition doesn't contain these angle-bracket tags.

When writing standard Yodl macros, each macro should be stored in a file `name'.raw, where `name' is the lower-case name of the macro. This file should then be kept in the macros/rawmacros directory. The macros/build std call then adds the macro (filtering only the required statements per conversion) to each of the standard conversion formats.
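The per-conversion filtering of a raw macro file could be sketched as follows. This is a hypothetical reconstruction in C, not the actual macros/build implementation; it assumes that each tag appears on a line of its own, starting in the first column, and that only the DEFINEMACRO body is passed in. Lines outside tagged sections are kept for every conversion, a <name ...> tag selects lines for the listed converters, <else> selects lines for converters not mentioned by an earlier tag, and <> returns to statements shared by all conversions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Does the tag line "<a b c>" list `target' as one of its words? */
static int tag_mentions(const char *line, const char *target)
{
    size_t tlen = strlen(target);
    const char *p = line + 1;                   /* skip '<' */

    while (*p != '>' && *p != '\0') {
        while (*p == ' ')
            ++p;
        const char *word = p;
        while (*p != ' ' && *p != '>' && *p != '\0')
            ++p;
        if ((size_t)(p - word) == tlen && strncmp(word, target, tlen) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical filter: stores in `out' the body lines that apply to
   converter `target' and returns their number. Tag lines are dropped. */
size_t filter_raw(const char *const lines[], size_t n, const char *target,
                  const char *out[])
{
    size_t count = 0;
    int keep = 1;           /* outside tagged sections: shared statements */
    int mentioned = 0;      /* was `target' named by an earlier tag? */

    assert(target != NULL && *target != '\0');
    for (size_t idx = 0; idx < n; ++idx) {
        const char *line = lines[idx];
        size_t len = strlen(line);

        if (len >= 2 && line[0] == '<' && line[len - 1] == '>') {
            if (strcmp(line, "<>") == 0)
                keep = 1;
            else if (strcmp(line, "<else>") == 0)
                keep = !mentioned;
            else {
                keep = tag_mentions(line, target);
                mentioned |= keep;
            }
            continue;
        }
        if (keep)
            out[count++] = line;
    }
    return count;
}
```

With a body containing <html>, <man ms> and <else> sections, the html, ms and (say) latex conversions each receive the shared lines plus exactly one of the three tagged sections.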

If the macro requires a counter or symbol, consider defining the counter or symbol in @counters and @symbols, respectively. Furthermore, consider pushing and popping these `variables', rather than plainly assigning them, to allow other macros to use the variables as well. A case in point is the counter XXone, which was added to the set of counters to serve as a local counter. Macros may always push XXone and pop XXone, but should never reassign XXone before its value has been pushed. For Yodl version 2.00 only XXone was required, but other local counters might be considered useful in the future. In that case, XXtwo, XXthree, etc. should be used. For local symbols the XXs prefix is used: XXsone, XXstwo, etc.
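The push-before-use discipline can be pictured with a hypothetical Counter type that keeps a stack of saved values. It is not Yodl's implementation of counters; it merely shows why a macro that pushes XXone and pops it again leaves the value seen by its caller intact, even when macros using XXone are nested:

```c
#include <assert.h>
#include <stddef.h>

#define COUNTER_STACK 16

/* Hypothetical counter: its current value plus a stack of saved values. */
typedef struct {
    int value;
    int saved[COUNTER_STACK];
    size_t depth;
} Counter;

/* Save the current value, then continue with `value'. */
void counter_push(Counter *c, int value)
{
    assert(c->depth < COUNTER_STACK);
    c->saved[c->depth++] = c->value;
    c->value = value;
}

/* Restore the value that was active before the matching push. */
void counter_pop(Counter *c)
{
    assert(c->depth > 0);
    c->value = c->saved[--c->depth];
}
```

An outer macro may push XXone, modify it, and call an inner macro that pushes and pops XXone in turn; after the inner pop the outer macro sees its own value again, and after the outer pop the original value is restored.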

6.7: The Yodl post-processor

With Yodl version 2.00 the old-style post-processor has ceased to exist. Also, the .YodlTAGSTART. and .YodlTAGEND. symbols no longer appear in yodl's output.

Instead, a system using an index file was adopted. When converting information, yodl produces an output file and an associated index file. The index file defines offsets in the output file up to where certain actions are to be performed. Each line in the index file contains the information required for one yodlpost directive. For example:


    0 set extension man
    53 ignorews
    2112 verb on
    2166 verb off
    80007 ignorews
    80065 copy
    80065 mandone
        
Entries can be written into the index file using the INTERNALINDEX builtin function. This function has one argument: the information that is written following the offset at which it is called. So, the macro definitions for the man conversion contain an INTERNALINDEX(set extension man) call (it is found in the standard man.yo macro definition file).
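A post-processor reading the index file must split each line into its offset, its directive and the directive's remaining arguments. A minimal sketch, assuming the whitespace-separated layout of the index example shown earlier in this section; the IndexEntry type and parse_index_line() are hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one index-file line. */
typedef struct {
    unsigned long offset;   /* offset in yodl's output file */
    char directive[32];     /* e.g., "set", "ignorews", "verb" */
    const char *args;       /* rest of the line, possibly "" */
} IndexEntry;

/* Returns 1 on success, 0 if the line is malformed. */
int parse_index_line(const char *line, IndexEntry *entry)
{
    char *end;
    const char *word;
    size_t len;

    entry->offset = strtoul(line, &end, 10);
    if (end == line || !isspace((unsigned char)*end))
        return 0;                       /* no offset, or no directive */

    word = end;
    while (isspace((unsigned char)*word))
        ++word;
    len = strcspn(word, " \t\n");
    if (len == 0 || len >= sizeof entry->directive)
        return 0;
    memcpy(entry->directive, word, len);
    entry->directive[len] = '\0';

    entry->args = word + len;           /* skip blanks before arguments */
    while (isspace((unsigned char)*entry->args))
        ++entry->args;
    return 1;
}
```

For "0 set extension man" this yields offset 0, directive "set" and arguments "extension man". A trailing newline (as left by fgets()) remains at the end of args, so strip it first if that matters.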

When yodlpost is called, it processes the directives in the index file in two steps:

For example, when INTERNALINDEX(htmllabel ...) is specified, the function construct_label() is called. This function receives a line like


    432 label Overview
        
meaning that this label has been defined at offset 432 in the file generated by yodl. The construct_label() function now:

Then, when the queue is processed, a reference to this label may be encountered. This is signalled by an INTERNALINDEX(ref Overview) call. In this case the construct_ref() function doesn't have to do much, as the handler does all the work:

When references are resolved in text files, the INTERNALINDEX(txtref ...) command is used. Here, construct_ref() can still be used, but a specific handle_txt_ref() function is required.
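The label/reference mechanism can be sketched as two phases: first every label is recorded, then references are looked up. The Entry and LabelTable types, construct_labels() and resolve_ref() are hypothetical, and a label is simply mapped to its offset, whereas the post-processor's real handlers produce the actual output text. Recording all labels first means that a reference may even precede the label it refers to:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 64

typedef enum { ENTRY_LABEL, ENTRY_REF } EntryKind;

/* Hypothetical, already parsed index entry. */
typedef struct {
    unsigned long offset;
    EntryKind kind;
    const char *name;
} Entry;

typedef struct {
    const char *name[MAX_LABELS];
    unsigned long offset[MAX_LABELS];
    size_t size;
} LabelTable;

/* Phase 1: record every label together with its offset. */
void construct_labels(const Entry *entries, size_t n, LabelTable *table)
{
    table->size = 0;
    for (size_t idx = 0; idx < n; ++idx)
        if (entries[idx].kind == ENTRY_LABEL) {
            assert(table->size < MAX_LABELS);
            table->name[table->size] = entries[idx].name;
            table->offset[table->size++] = entries[idx].offset;
        }
}

/* Phase 2: resolve a reference; returns 1 and sets *offset if found. */
int resolve_ref(const LabelTable *table, const char *name,
                unsigned long *offset)
{
    for (size_t idx = 0; idx < table->size; ++idx)
        if (strcmp(table->name[idx], name) == 0) {
            *offset = table->offset[idx];
            return 1;
        }
    return 0;
}
```

With a reference to Overview at offset 100 and the label Overview at offset 432, the first phase records the label and the second resolves the earlier reference to offset 432; an unknown label is reported by a return value of 0.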

New postprocessing labels can be constructed easily: