Macro-assembler for an imaginary single-instruction CPU



This page demonstrates a macro-assembler for a CPU that does not exist.

Background

I'm (slowly) making a hardware-implementation alongside a simple simulator for a 1-instruction CPU, simple enough to make from scratch, using only transistors and passives.

The main goal is to have fun along the way. Eventually, perhaps, this could be made into an educational tool - every signal is accessible, and there is no "black box" element, unlike pretty much every devboard I have seen.

Because of the primitive nature of the CPU (1 instruction operating on 1 data-bit), there's a real need for a macro-assembler to create compound statements to implement something resembling native instructions in "normal" CPUs, in a similar way to micro-code being used to implement more complex instructions.

CPU

From an assembly point of view, about the only thing worth mentioning about the CPU is the way it interfaces with the outside world, to give a bit of context for the semantics of the only instruction available.

Other hardware-properties such as timing, power-supply, subsystems and pin-out are not discussed here.

Summary

The CPU is based around a D-latch and a mux:

(Additional latches and signals are omitted for convenience. The "+1"-block is meant to increment a word. Visible I/O lines are discussed in the following section.)

The latch reads and writes data, 1 bit at a time: it outputs its inverted input-value on the shared input-/output-pin, using a relatively high output-impedance.

The mux uses the latch-output to decide between incremented program-counter and branch-address from a dedicated input: if the data-bit is clear after inverting, take the branch, else resume execution at the next instruction.

Peripheral memory

Looking at the previous figure, there are 3 buses: the address-output, the branch-address input, and the 1-bit data-input/-output.

Note that, contrary to the ROM, there is no bus for indexing the RAM.

It would technically be possible to use a stand-alone CPU to create a simple state-machine or encoder. (For example, a switch could be used on the 1-bit data-line, and address-output could be connected to branch-address input in a clever way.)

However, the intended use for the CPU is to cooperate with an external RAM as well as a ROM, where part of the ROM-data indexes the RAM:

NOTE: I screwed up the ROM's bitfields in this figure - lower 16 bits are connected to CPU's branch-address input, while higher 16 bits are connected to the RAM's address-input.

As can be seen, the ROM uses half of its data-width to index a RAM, the data of which connects back to the CPU. Reads and writes on the CPU's data-input/-output line thus take place on the RAM.

Bitness..?

To be honest, I don't know how to classify this thing in terms of bitness (i.e. "8-bit", "16-bit", etc.).

If bitness depends on the total width of data-lines going in or out, well, then this would be a 17-bit CPU (16 bits for branch-address input, and 1 bit for true data-input/-output).

Alternatively, one could probably say that the 16 branch-address input bits are a dedicated part of the addressing scheme - they can not be used for anything else. Furthermore, if the latch is considered the only item connected to the data-bus, then this would be a 1-bit CPU.

What do you think..?

Native instruction: "IBC" (Invert and Branch if Clear)

The only "native instruction" implements a conditional branch, depending on the state of the data-input/-output after being inverted by the latch. (There is nothing else this CPU can do, and therefore, the term "instruction" may make little sense - the term "opcode" makes no sense at all here.)

I call this instruction IBC, with semantics "invert data-bit, and take branch if data-bit is clear/zero".

If you play along at home, the format of the IBC-instruction in the examples below is:

ibc1 <data> <branch>

...where data evaluates to the RAM-location of the data to operate on (i.e. the bit to flip), and branch evaluates to the ROM's branch-address to take, in case the bit is clear/zero after inverting.
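
To make these semantics concrete, here is a minimal Tcl model of a single IBC step. It is only a sketch and not part of the assembler: the name "ibc_step" is made up, and RAM is assumed to be a plain Tcl list of bits.

```tcl
# Hypothetical model of one IBC step (not part of the assembler).
#
#   ramvar       name of a variable holding a list of bits (the RAM)
#   pc           program-address of the instruction being executed
#   data, branch the two fields of that instruction
#
# Returns the next program-address.

proc ibc_step { ramvar pc data branch } {

    upvar 1 $ramvar ram

    # Invert the addressed bit in place.
    set bit [ expr { ! [ lindex $ram $data ] } ]
    lset ram $data $bit

    # Take the branch if the bit is clear after inverting, else fall through.
    return [ expr { $bit ? $pc + 1 : $branch } ]
}

set ram { 1 0 }

puts [ ibc_step ram 0 0 7 ]   ;# bit 0 was set, is now clear: branch to 7
puts [ ibc_step ram 1 1 7 ]   ;# bit 1 was clear, is now set: continue at 2
puts $ram                     ;# both bits have been flipped
```

Note that the bit is always inverted, whether or not the branch is taken - this is why most compound instructions further down have to "repair" the bits they test.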

The data-argument can be any Tcl-expression, while the branch-argument can be any Tcl-expression or label-name. Labels are resolved when the macro containing the instruction goes out of scope.

A program for this CPU ultimately consists of a list of tuples: the ROM's branch-address (branch) and RAM's data-address (data), 16 bits each, making for 32-bit program-words.

Assembler

Although possible, it quickly becomes tedious to write programs using only IBC. However, more complex pseudo-instructions can be defined in terms of IBC.

To do this, I made a macro-assembler in Tcl (wiki).

I find Tcl to be a brilliant language to implement DSLs. The DSL used here is itself Tcl, exploiting its very minimal syntax. Therefore, parsing and ability to use inline Tcl-snippets come for free.

Using this assembler, compound instructions can be defined as macros, to be instantiated or nested at will. Expansion of all macros in a program, recursively, would yield a list of IBC-instructions.

The following sections give a summary of its features. This is by no means a professional or stable product - I'm just toying around, and pretty new to Tcl myself. Error-checking is currently not implemented (other than what Tcl's parser and runtime provide), and testing has been pretty minimal so far.

Labels

Labels are placeholders for program-addresses.

Below follows a typical macro-definition, making use of a local label "done":

The label can be referenced in branch-like compound instructions, or in IBC itself. Forward references are allowed. Labels are defined using a colon, e.g. ": here" (note the whitespace).

Variables - discussed hereafter - can be dereferenced using the normal Tcl "$", while labels can not. This has to do with the fact that it's impossible to know the address corresponding to a label in a forward reference.
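
Forward references are commonly handled by back-patching: emit a placeholder, remember where it went, and fill in the real address once the label is known. The sketch below shows that idea in isolation. The names "emit", "define" and "resolve" are made up, and macro-nesting is ignored entirely, so this is a simplification and not the assembler's actual code.

```tcl
# Minimal back-patching sketch (hypothetical names, no macro-nesting).

set pmem  {}                ;# list of { data branch } pairs
set addrs [ dict create ]   ;# label-name -> program-address
set refs  [ dict create ]   ;# label-name -> offsets referencing it

proc emit { data branch } {

    global pmem refs

    if { ! [ string is integer -strict $branch ] } {

        # A label: remember this location, emit a placeholder for now.
        dict lappend refs $branch [ llength $pmem ]
        set branch 0xffff
    }

    lappend pmem [ list $data $branch ]
}

proc define label {

    global pmem addrs

    dict set addrs $label [ llength $pmem ]
}

proc resolve {} {

    global pmem addrs refs

    dict for { label offsets } $refs {
        foreach offs $offsets {
            # Overwrite the branch-field (index 1) of the referencing word.
            lset pmem $offs 1 [ dict get $addrs $label ]
        }
    }
}

emit 3 done     ;# forward reference to "done"
emit 3 0
define done     ;# "done" turns out to be address 2
resolve

puts $pmem
```

After "resolve", the placeholder in the first word has been replaced by address 2.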

Variables

Each variable must be declared inside a macro-body, ahead of its use as (part of) a data-address in an instruction. Variable-declaration is done using a period, e.g. ". mydata" (note the whitespace). Variables are typeless.

An example of declaration and use:

Note that, since the DSL is Tcl, it's perfectly valid to introduce additional (helper-)variables using the normal "set" constructs.

Whereas "DSL-declared" variables (declared using a period) get assigned the next free data-address, normal "Tcl-variables" can be set to arbitrary values and used as instruction-fields.

This can for example be useful for specifying address-constants or offsets.

Declaration of a DSL-declared variable results in a Tcl-variable with the same name coming into existence at that point. Therefore, there is no difference in referencing a DSL-declared variable or Tcl-variable: both can be used with the usual "$" notation, e.g. "$myvar".
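
The mechanism can be sketched in a few lines of plain Tcl. This is a simplified stand-in: it uses a single global counter ("next_free", a made-up name) where the real assembler keeps a counter per macro-instance, and "demo" is just an ordinary proc playing the role of a macro-body.

```tcl
# Sketch of a "."-like declaration command (simplified: one global counter).

set next_free 0

proc . { varname { num_bit 1 } } {

    global next_free

    set addr $next_free
    incr next_free $num_bit

    # Create a Tcl-variable with the same name in the caller's scope.
    uplevel 1 [ list set $varname $addr ]
}

proc demo {} {

    . flag          ;# 1 bit
    . nibble 4      ;# 4 bits
    . other         ;# 1 bit, placed after the nibble

    set offset 2    ;# ordinary Tcl-variable, no address allocated

    return [ list $flag $nibble $other [ expr { $nibble + $offset } ] ]
}

set result [ demo ]
puts $result    ;# addresses of flag, nibble, other, and of bit 2 of nibble
```

Both kinds of variable end up as ordinary Tcl-variables holding a number, which is why they can be mixed freely in expressions.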

Macro-arguments

Macros can take arguments, if so specified in the corresponding macro-definition.

Actual arguments (at the time of macro-instantiation) can be literals, variables, expressions containing either or both, or labels - the latter only for address-arguments. Expressions using labels are not allowed. (This has to do with the fact that labels have to be explicitly parsed, instead of having the Tcl-parser do the work.)
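
The difference between the two kinds of argument can be illustrated as follows. An expression is evaluated by Tcl before the macro ever sees it, so it arrives as a number; a label-name arrives verbatim and has to be recognised as such. The helper names below are made up, and the check is a simplified sketch that ignores qualified labels.

```tcl
# Sketch: telling address-literals from label-names (hypothetical helpers).

proc is_literal x { expr { [ string is integer -strict $x ] && $x >= 0 } }

proc is_label x { expr { ! [ is_literal $x ] } }

foreach arg { 12 0x20 done ci_0 } {

    set kind [ expr { [ is_label $arg ] ? "label" : "literal" } ]

    puts [ format "%-6s %s" $arg $kind ]
}
```

Here "12" and "0x20" are classified as literals, while "done" and "ci_0" are classified as labels.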

Use of an argument passed to a macro is shown below:

Expressions

Not much to say here - everything that's allowed in Tcl, is allowed here, since the DSL is Tcl. Furthermore, most binary numeric operators are made available in Polish notation / prefix-form; e.g. ">= 3 2" is allowed.
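
As an illustration, prefix-form operators can be generated with a one-line loop, much like the helper definitions in the assembler's source at the end of this page. Only a handful of operators are defined in this sketch.

```tcl
# Prefix-form wrappers around a few binary expr-operators.

foreach op { + - * >= } {

    # The generated proc-body reads, e.g.:   expr {$a} + {$b}
    proc $op { a b } [ list expr \$a $op \$b ]
}

puts [ >= 3 2 ]           ;# is 3 >= 2?
puts [ + 8 [ * 2 3 ] ]    ;# nested: 8 + ( 2 * 3 )
```

Nesting works through normal Tcl command-substitution, which is what makes address-arithmetic such as "[ + $from 1 ]" possible in macro-bodies.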

Once again, beware that it's not possible to use expressions containing labels.

Result/output and possible follow-up

At this point, assembler-output is a (debug-)dump of the program-memory resulting from expanding all macros, recursively. Generated program-memory contents start at offset 0.

In its current form, the assembler cannot be used for anything useful - output is only generated to verify it would produce a valid ROM-image, if it could.

Simulation/validation

The next step could be to make a simple PC-side simulator/validator for generated ROM-images (after assembly) and RAM-images (after running a program): expected behaviour of code-snippets could be described in some form, and checked against the generated data.

Examples of checks could be:

Such a tool is IMHO pretty much necessary before attempting to run programs on actual hardware - nobody needs assembler- or user-bugs at that point.
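
A very rough sketch of what such a tool could look like is given below. It reads the assembler's debug-output format, runs the program on a list of bits, and checks the result. Several things are assumptions on my part and not properties of the hardware: the program "halts" when the program-counter leaves the program, and a step-limit guards against endless loops. The five program-words are the first five of the listing shown later on this page, which appear to correspond to "mov1 $a $sum" with a at data-address 1 and sum at data-address 3.

```tcl
# Sketch of a simulator/validator (hypothetical names and conventions).

# Parse "offs: daddr baddr" lines (hex) into a list of { daddr baddr } pairs.
proc load_listing text {

    set prog {}

    foreach line [ split $text "\n" ] {
        if { [ scan $line "%x: %x %x" offs daddr baddr ] == 3 } {
            lappend prog [ list $daddr $baddr ]
        }
    }

    return $prog
}

# Run until the program-counter leaves the program; return the final RAM.
proc run { prog ram { max_steps 1000 } } {

    set pc 0

    while { $pc < [ llength $prog ] } {

        if { [ incr max_steps -1 ] < 0 } { error "step-limit exceeded" }

        lassign [ lindex $prog $pc ] daddr baddr

        set bit [ expr { ! [ lindex $ram $daddr ] } ]
        lset ram $daddr $bit

        set pc [ expr { $bit ? $pc + 1 : $baddr } ]
    }

    return $ram
}

set mov1 [ load_listing {
    0000: 0001 0003
    0001: 0003 0004
    0002: 0003 0004
    0003: 0003 0003
    0004: 0001 0005
} ]

# Check: afterwards, sum equals a, and a itself is unchanged.
foreach a { 0 1 } {
    foreach sum { 0 1 } {

        set ram [ run $mov1 [ list 0 $a 0 $sum ] ]

        if { [ lindex $ram 1 ] != $a || [ lindex $ram 3 ] != $a } {
            error "mov1 failed for a=$a sum=$sum"
        }
    }
}

puts "mov1 ok"
```

The same approach should scale to larger snippets: describe the expected RAM-contents for each input, run, and compare.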

Higher-level language

Although assembly is fine for writing smaller programs and proof-of-concepts, I would like to make a higher-level language.

Given its limited code-space, combined with the extreme RISC nature of the CPU, it's unlikely that complex programs will ever be made for it.

This puts something of a practical limit on the need for features in a higher-level language. For example, it probably makes little sense to implement complex types, classes, functional programming, memory-management, etc.

I played around with a small imperative language with one fixed-size variable-type, simple looping-constructs and conditionals.

An interesting challenge is to implement a stack. The CPU has nothing like indirect addressing or computed addresses. However, something resembling a function-call mechanism with data-passing back and forth is possible, and it should be fun to see something come alive that now only exists on paper.

Oh well, we'll see.

Examples: compound instructions

As mentioned before, compound pseudo-instructions can be defined in terms of IBC.

I made a minimal instruction-set using macros, somewhat resembling instructions on simple existing CPUs.

This instruction-set still operates on single bits at a time. This is inconvenient for any practical use. Later on, some examples of 4-bit instructions are given, defined in terms of 1-bit instructions (themselves defined in terms of IBC).

The current 1-bit set of pseudo-instructions is displayed below. An arrow indicates dependence, i.e. every instruction eventually depends on the canonical instruction IBC:

A few instructions "close to IBC" are displayed in grey. These instructions only make use of IBC:

    # Invert a bit.

    macro not1 reg {

                ibc1    $reg    done
        : done

    }



    # Set bit.

    macro set1 reg {

        : repeat
                ibc1    $reg    repeat
    }



    # Clear and unconditionally branch.

    macro cb1 reg branch {

        ibc1    $reg    $branch
        ibc1    $reg    $branch
    }

Apart from that, they are in no way special.

For each of the following examples, a sub-graph of instruction-dependency is given. The implemented compound-instruction is coloured green, IBC-instruction is coloured yellow, and direct dependencies of the implemented instruction in question are given in salmon.

Ross agrees - salmon is definitely not pink!

In the implementation of each such compound instruction, macros taking data-arguments have a "1" suffixed to their name (e.g. "add" becomes "add1"), to indicate 1-bit width.

Each sub-instruction in each example is followed by a comment stating the number of underlying IBC-instructions. (This number applies to code-space, not to execution-time.)

"B": unconditional Branch

    macro b branch {

        . tmp

        cb1     $tmp    $branch     ;# (2)
    }

"BC": Branch if bit is Clear

    macro bc1 reg branch {

        not1    $reg                ;# (1)
        ibc1    $reg    $branch     ;# (1)
    }

"CLR": clear bit

    macro clr1 reg {

                ibc1    $reg    done    ;# (1)
                not1    $reg            ;# (1)
        : done
    }

"AND": bitwise AND

    macro and1 reg mask {

                ibc1    $mask   x           ;# (1)
                not1    $mask               ;# (1)
                cb1     $reg    done        ;# (2)
        : x
                not1    $mask               ;# (1)
        : done
    }

"OR": bitwise OR

    macro or1 reg mask {

                ibc1    $mask   x       ;# (1)
                ibc1    $mask   done    ;# (1)
        : x
                not1    $mask           ;# (1)
                set1    $reg            ;# (1)
        : done
    }

"MOV": move (copy) bit

    macro mov1 from to {

                ibc1    $from   a   ;# (1)
                cb1     $to     b   ;# (2)
        : a
                set1    $to         ;# (1)
        : b
                not1    $from       ;# (1)
    }

"BS": Branch if bit is Set

    macro bs1 reg branch {

                bc1     $reg    done        ;# (2)
                b               $branch     ;# (2)
        : done
    }

"XOR": bitwise exclusive-OR

    macro xor1 reg mask {

                bc1  $mask done      ;# (2)

                ibc1 $reg  done      ;# (1)
        : done
    }

"ADD": full adder

    macro add1 ci a b sum co {

                # Pre-calculate   sum  =  a ^ b   (need to invert in case ci == 1)

                mov1   $a     $sum    ;# (5)
                xor1   $sum   $b      ;# (3)

                bc1    $ci    ci_0    ;# (2)

        : ci_1

                #   ci  a   b       sum co
                #   
                #   1   0   0       1   0
                #   1   0   1       0   1
                #   1   1   0       0   1
                #   1   1   1       1   1

                # Invert pre-calculated "sum".

                not1 $sum             ;# (1)

                # co  =  a | b

                mov1   $a     $co     ;# (5)
                or1    $co    $b      ;# (4)

                b      done           ;# (2)

        : ci_0

                #   ci  a   b       sum co
                #   
                #   0   0   0       0   0
                #   0   0   1       1   0
                #   0   1   0       1   0
                #   0   1   1       0   1

                # (We pre-calculated "sum" correctly.)

                # co  =  a & b

                mov1   $a     $co     ;# (5)
                and1   $co    $b      ;# (5)

        : done
    }

This implementation is perhaps not the most straight-forward - see the note about generated code size later on.

Assembling an example

An example code-snippet is given below:

    include 1bit.asm

    main {

        . ci
        . a
        . b
        . sum
        . co

        add1  $ci $a $b $sum $co  
    }

As can be seen, a single full-adder is instantiated.

All the 1-bit pseudo-instructions implemented in the previous examples are bundled up and included through file "1bit.asm".

Assembling goes as follows:

    $ ./mas.tcl in.asm > out.lst

(Debug-)output looks like this:

    0000: 0001 0003
    0001: 0003 0004
    0002: 0003 0004
    0003: 0003 0003
    0004: 0001 0005
    0005: 0002 0006
    0006: 0002 0008
    0007: 0003 0008
    0008: 0000 0009
    0009: 0000 0016
    000a: 0003 000b
    000b: 0001 000e
    000c: 0004 000f
    000d: 0004 000f
    000e: 0004 000e
    000f: 0001 0010
    0010: 0002 0012
    0011: 0002 0014
    0012: 0002 0013
    0013: 0004 0013
    0014: 0005 0020
    0015: 0005 0020
    0016: 0001 0019
    0017: 0004 001a
    0018: 0004 001a
    0019: 0004 0019
    001a: 0001 001b
    001b: 0002 001f
    001c: 0002 001d
    001d: 0004 0020
    001e: 0004 0020
    001f: 0002 0020

The amount of generated code is quite big: 32 words. (Note that this is just a single full adder.) Therefore, 64k program-words may not be too much of a luxury...
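
The figure of 32 can be cross-checked by adding up the per-line word-counts given in the comments of the "add1"-macro. A throwaway Tcl snippet for that tally:

```tcl
# Code-space of add1, summed from the per-line comments in its definition.

set parts {
    mov1 5   xor1 3   bc1 2
    not1 1   mov1 5   or1 4   b 2
    mov1 5   and1 5
}

set total 0

foreach { name words } $parts { incr total $words }

puts $total    ;# 32
```

The first row is the common part, the second row the "ci_1"-branch, and the third row the "ci_0"-branch. Both branches occupy code-space, even though only one of them is executed.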

A note on generated code size

Assembling the aforementioned code-snippet resulted in 32 program-words. It used to be 86 with earlier XOR- and ADD-implementations. It can likely be made shorter than 32.

Sometimes, using a simple sequence or if-then-else-selection will generate smaller code than implementing the instruction in the "classic" way.

As an example, "A XOR B" can be defined as

    ( A & ~B ) | ( ~A & B )

...where "&", "|" and "~" mean conjunction/AND, disjunction/OR and negation/NOT, respectively.

An implementation of "XOR" can strictly follow this definition:

    macro xor1 reg mask {

        . tmp1
        . tmp2

        mov1    $reg    $tmp1
        not1    $tmp1           ;#  tmp1 = ~reg
        and1    $tmp1   $mask   ;#  tmp1 = ~reg & mask

        mov1    $mask   $tmp2
        not1    $tmp2           ;#  tmp2 = ~mask
        and1    $reg    $tmp2   ;#  reg' = reg & ~mask

        or1     $reg    $tmp1   ;#  reg' = ( reg & ~mask ) | ( ~reg & mask )
    }

This works fine, but is quite costly in terms of generated code.

An alternate implementation follows from the fact that "A XOR B" can be defined as "if ~B, then A, else ~A":

    macro xor1 reg mask {

                bc1 $mask done      ;# (2)

                ibc1 $reg done      ;# (1)

        : done
    }

The latter implementation would generate only 3 program-words.
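
That both formulations agree can be checked exhaustively in plain Tcl. The proc-names below are made up for this check, and they model the logic only, not the generated code.

```tcl
# Compare the two formulations of XOR over all inputs (bits are 0 or 1).

proc xor_classic { a b } { expr { ( $a & !$b ) | ( !$a & $b ) } }

proc xor_conditional { a b } { expr { $b ? !$a : $a } }

foreach a { 0 1 } {
    foreach b { 0 1 } {

        set c1 [ xor_classic     $a $b ]
        set c2 [ xor_conditional $a $b ]

        puts "$a xor $b:  classic=$c1  conditional=$c2"

        if { $c1 != $c2 } { error "mismatch for a=$a b=$b" }
    }
}
```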

Examples of some 4-bit instructions

Using normal Tcl math-operators (or their shorthands in Polish notation), an offset can be given to actual macro data-arguments (but not to branch-arguments), to form macros operating on subsequent bits in memory.

A simple example is a 4-bit "MOV"-instruction:

    macro mov4 from to { 

        mov1  [ + $from 0 ]  [ + $to 0 ]
        mov1  [ + $from 1 ]  [ + $to 1 ]
        mov1  [ + $from 2 ]  [ + $to 2 ]
        mov1  [ + $from 3 ]  [ + $to 3 ]
    }

...where actual macro-arguments "from" and "to" represent 4-bit entities.
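
To see which 1-bit moves this boils down to, the sketch below replaces "mov1" by a stand-in that records its arguments instead of emitting code, and writes "mov4" as an ordinary proc with a loop in place of the four explicit lines. Everything here is plain Tcl for illustration; none of it needs the assembler.

```tcl
# Sketch: which 1-bit moves does a 4-bit move consist of?

proc + { a b } { expr { $a + $b } }

set calls {}

# Stand-in for the real mov1-macro: just record the addresses.
proc mov1 { from to } { lappend ::calls [ list $from $to ] }

proc mov4 { from to } {

    for { set i 0 } { $i < 4 } { incr i } {
        mov1  [ + $from $i ]  [ + $to $i ]
    }
}

mov4 8 12

puts $calls    ;# {8 12} {9 13} {10 14} {11 15}
```

Since macro-bodies are Tcl, a loop like this would presumably work inside a real macro as well, but I have only shown the unrolled form above.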

To use this macro, you can declare variables of 4 bits in size:

    . source 4      ;# "source" and "dest" each allocate 4 bits
    . dest   4      ;#

    ...

    mov4  $source  $dest

Another example is a 4-bit "ADD"-instruction, where a carry ripples through individual 1-bit add-operations:

    macro add4 a b sum {

        . carry1
        . carry2

                clr1 $carry1

                add1  $carry1  [ + $a 0 ]  [ + $b 0 ]  [ + $sum 0 ]  $carry2
                add1  $carry2  [ + $a 1 ]  [ + $b 1 ]  [ + $sum 1 ]  $carry1
                add1  $carry1  [ + $a 2 ]  [ + $b 2 ]  [ + $sum 2 ]  $carry2
                add1  $carry2  [ + $a 3 ]  [ + $b 3 ]  [ + $sum 3 ]  $carry1
    }

(The 2 temporary carry-variables alternately take the role of input- and output-carry.)
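
The rippling can be modelled at bit-level in plain Tcl. The sketch below is not assembler-input; "add1_model" and "add4_model" are made-up names. The 1-bit model follows the same strategy as the "add1"-macro (pre-calculate a ^ b, then pick sum and carry-out depending on carry-in), and operands are lists of 4 bits with the least significant bit first, matching offset 0 receiving the cleared carry.

```tcl
# Bit-level model of add1/add4 (plain Tcl, not assembler-input).

proc add1_model { ci a b } {

    set sum [ expr { $a ^ $b } ]

    if { $ci } {
        # Carry-in set: invert the pre-calculated sum, co = a | b.
        return [ list [ expr { ! $sum } ] [ expr { $a | $b } ] ]
    }

    # Carry-in clear: sum is already correct, co = a & b.
    return [ list $sum [ expr { $a & $b } ] ]
}

# a and b are lists of 4 bits, least significant bit first.
proc add4_model { a b } {

    set carry 0
    set sum   {}

    foreach abit $a bbit $b {

        lassign [ add1_model $carry $abit $bbit ] s carry
        lappend sum $s
    }

    return [ list $sum $carry ]
}

puts [ add4_model { 1 0 1 0 } { 1 1 0 0 } ]   ;# 5 + 3: sum-bits {0 0 0 1}, carry 0
```

The model returns the final carry as well; in the "add4"-macro above it ends up in "carry1" and is simply not passed on.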

Source

The assembler's own source-code is pasted below.

    #!/usr/bin/env tclsh



    foreach op { + - * / & | << >> < <= > >= && || } { proc $op { a b } [ list expr \$a $op \$b ] }

    proc loop { N code } { for { set i 0 } { $i < $N } { incr i } { uplevel $code } }

    proc ! x { expr { ! $x } }

    proc unless { cond script } { if { ! $cond } { uplevel $script } }

    proc die msg { puts "FATAL: $msg"; exit 1 }

    proc llast li { lindex $li end }

    proc lleader li { lrange $li 0 end-1 }



    # Only-do-once alternative to "source <filename>".

    proc include fname {

        global sources

        if { ! [ info exists sources ] } { set sources {} }

        if { ! [ dict exists $sources $fname ] } {

            dict set sources $fname 1
            source $fname
        }
    }



    # "With-for-lists": execute a script operating on a list-item. 
    #
    # The item is available as "objvar", and can be changed by the script.
    # Upon successful execution of "script", the item will be replaced by the
    # possibly changed item.
    #
    # Return-value is the script's result.

    proc lwith { objvar listvar index script } {

        upvar $objvar  obj
        upvar $listvar li

        set obj [ lindex $li $index ]

        try        { set ret [ uplevel $script ]
        } on ok {} { lset li $index $obj 
        }

        return $ret
    }



    # Generates a proc, replacing tags in the given proc-body. 
    #
    # (This is probably more convenient than use of "list" or using escapes for longer proc-bodies.)
    #
    # Example:   makeproc myproc { a b } { puts $a<SEP>$b } { <SEP> *** }
    #               -->   myproc { a b } { puts $a***$b   }

    proc makeproc { name arglist body replace } {

        set body [ string map $replace $body ]

        proc $name $arglist $body
    }



    # "pword": operations on program-words (program-mem items)
    #
    # A program-word is a 32-bit entity, consisting of:
    #   - a data-address "daddr" (bits 31..16)
    #   - a branch-address "baddr" (bits 15..0)
    #
    # Semantics of these fields are described at the definition of the only instruction, "ibc".

    namespace eval pword {

        namespace export *
        namespace ensemble create

        proc daddr w { >> [ & $w 0xffff0000 ] 16 }

        proc baddr w {      & $w 0x0000ffff      }

        proc create { daddr baddr }  { | [ << $daddr 16 ] $baddr }
    }



    # "pmem": program-memory singleton and operations thereon
    #
    # Program-memory consists of 32-bit program-word ("pword") entries. 

    namespace eval pmem {

        namespace export *
        namespace ensemble create

        namespace eval our { variable pmem {} }



        proc dump {} { 

            set offs 0

            foreach w $our::pmem {

                puts  [ format "%04x: %04x %04x"  $offs  [ pword daddr $w ]  [ pword baddr $w ] ]

                incr offs
            }
        }



        proc set-baddr { offs baddr } {

            set daddr [ pword daddr [ lindex $our::pmem $offs ] ]

            lset our::pmem $offs [ pword create $daddr $baddr ]
        }



        proc here {} { 

            return [ llength $our::pmem ]
        }



        proc append { daddr baddr } { 

            lappend  our::pmem  [ pword create $daddr $baddr ] 
        }
    }



    # "label" - placeholder for program-memory address
    #
    # An instruction or macro-instance can refer to a label instead of an address-literal as 
    # branch-target. Each such reference is eventually resolved to the actual program-address 
    # corresponding to the label.
    #
    # A label can either be 
    #   - unqualified (label consists of label-name only)
    #   - qualified (label contains macro nesting-level in which it was first defined, as well as label-name)
    #
    # Label-qualification is necessary to be able to reuse label-names across macros, which is extremely
    # convenient. 
    #
    # Labels are automatically qualified right before they are passed via arguments to nested macro-instances.
    #
    # label-layout:   [         $name ]   (unqualified), or
    #                 [ $level, $name ]   (qualified)
    #
    # (This namespace also contains some functionality to distinguish labels from literals.)

    namespace eval label {

        namespace export *
        namespace ensemble create

        proc is-literal x { && [ string is integer $x ] [ >= $x 0 ] }

        proc is-label x { ! [ is-literal [ lindex $x end ] ] }

        proc qualify label { lrange [ concat [ context top ] $label ] end-1 end }

        proc is-qualified label { expr { [ llength $label ] > 1 } }



        proc unpack { label "->" levelvar namevar } { 

            upvar  $levelvar level  $namevar name

            set label [ qualify $label ]

            set level [ lindex $label 0 ]
            set name  [ lindex $label 1 ]
        }
    }



    # "context" - macro-frame, containing housekeeping for currently instantiated macro
    #
    # Macros can be nested. That is, a macro-definition can contain macro-instantiations. Contexts of
    # nested macro-instances form a stack.
    #
    # All code is contained inside _some_ macro-body. There is no "global" scope, but instead a top-level
    # macro-instance called "main", which is the only macro explicitly instantiated by the assembler.
    #
    # Each context contains:
    #
    #   - macro-name, mainly for debugging
    #   - data-address of the next to-be-declared macro-local variable (variable-offset, or "varoffs")
    #   - name ("name") and program-memory ("addr") corresponding to each label defined in this macro
    #   - for each such label, all instruction-addresses where the label is referenced ("reflocs"), to be resolved 
    #     later (see comment with "resolve-refs" hereafter)

    namespace eval context {

        namespace export *
        namespace ensemble create



        # Layout:
        #
        #   "stack": 
        #   [
        #       "name"    : $name,
        #       "varoffs" : $varoffs,
        #       "labels"  : 
        #       {
        #           "name" : 
        #           {
        #               "addr"    : $addr,
        #               "reflocs" : [ ... ]
        #           }
        #       }
        #   ]

        namespace eval our { variable stack {} }



        proc nesting {} { llength $our::stack }

        proc top {} { - [ nesting ] 1 }

        proc at level { lindex $our::stack $level }

        proc from { level "get" propname } { dict get [ context at $level ] $propname }

        proc varoffs {} { expr { [ nesting ] ? [ from [ top ] get varoffs ] : 0 } }

        proc new { "with" "name" name "with" "vars" "at" varoffs } { dict create  name $name  varoffs $varoffs  labels {} }

        proc enter { "new" "with" "name" name } { lappend our::stack [ new with name $name with vars at [ varoffs ] ] }

        proc leave {} { set our::stack [ lleader $our::stack ] }

        proc new-label {} { dict create  addr {}  reflocs {} }

        proc with-context { objname "at" level script } { uplevel [ list lwith $objname our::stack $level $script ] }

        proc incr-varoffs { { N 1 } } { with-context c at [ top ] { dict incr c varoffs  $N } }



        proc with-label { label script } { 

            label unpack $label -> level name

            uplevel  [ list with-context c at $level  [ list dict with c labels $name $script ] ] 
        }



        proc label-exists label { label unpack $label -> level name; dict exists [ from $level get labels ] $name }



        proc touch-label label {

            unless [ label-exists $label ] {

                label unpack $label -> level name

                with-context c at $level {

                    dict set c labels $name [ new-label ]
                }
            }
        }



        proc set-labeladdr { label addr } { touch-label $label; with-label $label [ list set addr $addr ] } 

        proc add-refloc { label paddr } { touch-label $label; with-label $label [ list lappend reflocs $paddr ] }

        proc get-labeladdr label { with-label $label { set addr } }



        # Resolve label-references.
        #
        # Labels (defined using a colon, e.g. ": branch_here" - note the whitespace) can be forward-referenced.
        #
        # Since the address corresponding to a label may not be known before it is used, all 
        # references to that label must be replaced with its actual address at some point.
        #
        # By definition, the address corresponding to each label defined in a macro-instance is known 
        # when that instance goes out of scope (since all code corresponding to the macro-instance
        # and all nested macro-instances has been emitted at that point, and all code-locations within the
        # macro-instance are thus known).
        #
        # Therefore, when a macro-instance goes out of scope, all references to its labels can be resolved.

        proc resolve-refs {} {

            set labels [ from [ top ] get labels ]

            dict for { name props } $labels {

                set baddr [ dict get $props addr ]

                foreach refloc [ dict get $props reflocs ] {

                    pmem set-baddr $refloc $baddr
                }
            }
        }
    }



    # Canonical instruction: Invert & Branch if Clear.
    #
    # Semantics are roughly as follows: invert the bit at the data-address corresponding to "data", 
    # and if the bit is clear/zero after being inverted, transfer execution to program-address or label 
    # "branch", else resume execution at the next program-instruction.
    #
    # (Execution is done by actual hardware or emulator.)
    #
    # The "data"-argument must be a Tcl-style expression containing literals and/or declared variables. 
    # (For example, a declared ". my_var" would be referenced as "$my_var" in the expression.)
    #
    # The "branch"-argument must either be a Tcl-style expression or a label, but not an expression 
    # containing labels. 

    proc ibc1 { data branch } {

        if [ label is-label $branch ] { 

            # Labels are resolved later, when leaving the current context.

            context add-refloc $branch [ pmem here ]

            set branch 0xffff
        }


        pmem append $data $branch
    }



    # Label-definition.

    proc : labelname { context set-labeladdr $labelname [ pmem here ] }



    # Variable-declaration.
    #
    # When we accept the fact that (macro-local) variables need to be declared before they are referenced, 
    # it becomes possible to substitute variable-addresses for references on-the-fly (their address is
    # already known at the point of declaration). 
    #
    # Therefore, variables don't need to be resolved to data-addresses later on.
    #
    # For each declared variable, a Tcl-variable will be created at the stack-level of the macro-body.
    # Macro-variables can then be referred to in the normal Tcl way (i.e. "$my_var").

    proc . { varname { num_bit 1 } } { 

        set addr [ context varoffs ]
        context incr-varoffs $num_bit

        uplevel set $varname $addr
    }



    # Macro-definition: use this to define compound instructions. 
    #
    # Macros can take arguments if so specified in the corresponding macro-definition, where each argument is bound
    # to a formal parameter. 
    #
    # An actual parameter (at the time of macro-instantiation) can either be anything that can be passed to Tcl
    # procedures (e.g. literals or more complex expressions), or a label-name. 
    #
    # Labels are resolved only when the macro has been completely instantiated, and right before the corresponding 
    # context is destroyed.
    #
    # (A typical use of arguments is to have macro-instances work with data-variables declared in upper macro-instances.)
    #
    # Defining a macro creates a generator-proc with the same name as the macro.
    #
    # This proc basically does the following, in this order:
    #
    #   1) fully qualify all labels passed to it through arguments, while still in the parent-macro's context/nesting-level
    #   2) create and enter new context
    #   3) emit the macro-body (which may itself instantiate nested macros)
    #   4) resolve pending references to labels defined in this macro
    #   5) destroy the created context, i.e. exit the macro-instance
    #
    # (Last argument in "args" is the macro-body.)

    proc macro { name args } {

        set script   [ llast   $args ]
        set argnames [ lleader $args ]

        makeproc $name $argnames {

            # Before entering new context, qualify all label-type arguments:
            #
            # unqualified label-type formal parameters get re-assigned with their fully qualified form 
            # (using the nesting-level of the parent-context to qualify them), while address-literal 
            # parameters are left untouched.
            #
            # (There is probably interp-magic for this, but we use the formal parameter-list and 
            # string-substitution instead.)

            foreach _argname [ list <ARGNAMES> ] {

                set _argval [ set $_argname ]

                if [ label is-label $_argval ] {

                    context touch-label $_argval

                    set $_argname [ label qualify $_argval ]
                }
            }

            # Enter new context and expand macro-contents.

            context enter new with name <NAME>

            <SCRIPT>

            # Addresses of all encountered labels in this context are known at this point, so resolve them before leaving.

            context resolve-refs

            context leave

        }  [ list <SCRIPT>    $script    \
                  <ARGNAMES>  $argnames  \
                  <NAME>      $name      ]
    } 



    ########################################################################################################################



    unless  [ llength $argv ]  { die "need argument <infile>" }
    lassign $argv infile



    # (All code to be assembled should occur within a "main"-block in order to be assembled.)

    proc main block { macro code_in_main $block }

    source $infile

    code_in_main



    pmem dump

For reference: graph-generation

Graphs in this text were made using Graphviz' "Dot"-tool, using something like this (for the complete graph at the top):

    digraph G {

        node [ fontname = Helvetica ]

        NOT -> IBC
        SET -> IBC
        CLR -> IBC, NOT
        CB  -> IBC
        B   -> CB
        BC  -> NOT, IBC
        BS  -> BC, B
        MOV -> IBC, CB, SET, NOT
        AND -> IBC, NOT, CB
        OR  -> IBC, NOT, SET
        XOR -> BC, IBC
        ADD -> MOV, XOR, BC, NOT, OR, B, AND

        IBC          [ fillcolor = yellow,    style = filled ]
        NOT, SET, CB [ fillcolor = "#e0e0e0", style = filled ]
    }

and processed as follows:

    dot -Tpng in.dot -o out.png

    convert -scale 60% out.png{,}

That's all!


Delivered to you by Vim, GNU Make, MultiMarkdown, bozohttpd, NetBSD, and 1 human.