Universal preprocessor & 6502 assembly coroutines



This page presents a simple universal text-preprocessor, which is then used to implement coroutines / cooperative threads for MOS 6502 assembly-language source.

In a nutshell

Coroutines are used a lot over here, allowing event-driven software to be written in a sequential way.

Most software I write is in C, where coroutines can for instance be implemented using a switch-statement based on Duff's device, or GCC's computed gotos.
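A minimal sketch of the switch-based approach, for illustration only. The macro names (CR_BEGIN, CR_YIELD, CR_END) and the "blink" coroutine are made up for this example; the saved resume point is simply a source line number, and the switch jumps back into the middle of the function body, Duff's-device style.

```c
/* Hypothetical macro names; the saved resume point is a source line number. */
#define CR_BEGIN(state)  switch (state) { case 0:
#define CR_YIELD(state)  do { (state) = __LINE__; return; case __LINE__:; } while (0)
#define CR_END()         }

int blink_state = 0;   /* resume point of the coroutine below */
int led = -1;          /* made-up output; -1 means "not yet written" */

/* Each call runs the body up to its next yield-point, then returns. */
void blink(void)
{
    CR_BEGIN(blink_state);
    for (;;) {
        led = 1;
        CR_YIELD(blink_state);
        led = 0;
        CR_YIELD(blink_state);
    }
    CR_END();
}
```

Calling blink() repeatedly (from a main loop, say) alternates "led" between 1 and 0, although the function body reads as one sequential loop. Note that this sketch allows only one CR_YIELD per source line.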

Recently I've been toying with a Commodore 64 in assembly, where a number of state-machines (each implementing a visual effect) must run simultaneously.

Instead of applying classic state-variables and switch-like statements, I thought this would be a nice opportunity to try to make coroutines in 6502 assembly.

The idea was to extend assembly-source with thread-control directives, such as "create thread" and "yield thread".

Universal preprocessor (Tcl)

Inspired by the very nice PerlPP, I would like to allow for arbitrary Tcl code within tags to dynamically generate content.

Because Tcl rocks. :-)

The standard output of this code would then be inserted instead of the code/tags. The program to do this would - like PerlPP - be a preprocessor filtering stdin (or a given file) to stdout.

As an example, the following code snippet...

    ...               ; assembler-directives (handled by tass64)

    <? puts nop ?>    ; preprocessor-directives (handled by Tcl)

    ...               ; more assembler-directives

...when run through the preprocessor, would effectively result in:

    ...
    nop
    ...

A simple Tcl implementation of such a preprocessor is given here:

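    #!/usr/bin/env tclsh

    # Read the whole input: the file named by the first argument, or stdin.
    set fname [ lindex $argv 0 ]

    set ch [ expr { [ string length $fname ] ? [ open $fname ] : "stdin" } ]
    set rest [ read $ch ]
    close $ch

    # Repeatedly split the remaining text at the next opening tag.
    while { [ regexp {^(.*?)<\?(.*)$} $rest -> literal rest ] } {

        # Text before the tag goes to stdout (puts appends a newline).
        puts $literal

        # Everything up to the closing tag is a Tcl script.
        if { ! [ regexp {^(.*?)\?>(.*)$} $rest -> script rest ] } {
            puts stderr "cannot find closing tag"
            exit 1
        }

        # Run the script; whatever it writes to stdout replaces the tag.
        eval $script
    }

    # Copy the text after the last tag.
    puts $rest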

Note that for this to run, you will need a Tcl interpreter on your machine.

(As an exercise, I wrapped the above code in C using the embeddable Jim Tcl implementation, which is a very nice project in itself. The result was a stand-alone binary, not needing a Tcl interpreter to run.)

Learn by example: threaded colour-cycler (tass64 & preprocessor)

Back to the original challenge - to allow for several state-machine-like visual effects to occur simultaneously.

Imagine 2 effects: one toggles the screen colour, the other cycles the border-colour of the C64 through red, green and blue.

The following code implements these effects as 2 threads ("toggleScreenColour" and "cycleBorderColour").

To allow other threads to run, a thread can "yield" its turn. In the example below, each thread yields after every colour change.

(Note that this is the only way of switching between active threads - there is no preemption. A thread chooses if and when to give up control.)

    ;
    ; Include code to enable Tcl thread-directives like "threadBegin".
    ;

    <? source thread.tcl ?>



    ;
    ; A single line of BASIC ("1 sys xxxx") making this program runnable.
    ;

    * = $0801

                        .word listend, 1
                        .null $9e, ^start
    listend:            .word 0



    ;
    ; Main loop calling both threads in turn.
    ;

    start:
                jsr toggleScreenColour
                jsr cycleBorderColour

                jmp start



    ;
    ; Thread: toggle screen-colour between black and white.
    ;

    <? threadBegin toggleScreenColour ?>

                lda #0                      ; black
                sta $d021                   ;

                threadYield

                lda #1                      ; white
                sta $d021                   ;

                threadYield

    <? threadEnd ?>



    ;
    ; Thread: cycle border-colour through red, green and blue.
    ;

    <? threadBegin cycleBorderColour ?>

                lda #2                      ; red
                sta $d020                   ;

                threadYield

                lda #5                      ; green
                sta $d020                   ;

                threadYield

                lda #6                      ; blue
                sta $d020                   ;

                threadYield

    <? threadEnd ?>



    * = $2000
    .dsection data

Interesting things:

Although this doesn't follow from the above source, each thread runs in a loop: when its code is finished, a thread resumes execution at its top.

(This too is an implementation-choice; nearly all threads I use are loop-like in nature, so I chose to not duplicate that choice everywhere in code.)

Imagine each thread starting execution at its top. The observed behaviour would then be:

  1. toggleScreenColour: turn screen black
  2. cycleBorderColour: turn border red
  3. toggleScreenColour: turn screen white
  4. cycleBorderColour: turn border green
  5. toggleScreenColour: turn screen black
  6. cycleBorderColour: turn border blue
  7. toggleScreenColour: turn screen white
  8. cycleBorderColour: turn border red
  9. toggleScreenColour: turn screen black
  10. ...

The threads thus seem to run in pseudo-parallel.
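The sequence above can be reproduced with a small C model of the same mechanism. This is only a sketch: the integer context variables stand in for each thread's "threadContext" word, the switch stands in for the indirect jump at the top of each thread, and the event log and "run_main_loop" are made up so the order of colour changes can be inspected.

```c
#include <stddef.h>

enum event { SCREEN_BLACK, SCREEN_WHITE, BORDER_RED, BORDER_GREEN, BORDER_BLUE };

enum { LOG_MAX = 32 };
enum event event_log[LOG_MAX];   /* hypothetical trace of colour changes */
size_t     event_count = 0;

void log_event(enum event e)
{
    if (event_count < LOG_MAX)
        event_log[event_count++] = e;
}

/* One saved resume point per thread, like the "threadContext" word. */
int screen_ctx = 0;
int border_ctx = 0;

void toggleScreenColour(void)
{
    switch (screen_ctx) {                 /* resume where we left off */
    case 0:
        for (;;) {                        /* thread end: back to the top */
            log_event(SCREEN_BLACK);
            screen_ctx = 1; return;       /* yield: save resume point, return */
    case 1:
            log_event(SCREEN_WHITE);
            screen_ctx = 2; return;       /* yield */
    case 2:;
        }
    }
}

void cycleBorderColour(void)
{
    switch (border_ctx) {
    case 0:
        for (;;) {
            log_event(BORDER_RED);
            border_ctx = 1; return;       /* yield */
    case 1:
            log_event(BORDER_GREEN);
            border_ctx = 2; return;       /* yield */
    case 2:
            log_event(BORDER_BLUE);
            border_ctx = 3; return;       /* yield */
    case 3:;
        }
    }
}

/* The main loop: call both threads in turn, a given number of rounds. */
void run_main_loop(int rounds)
{
    for (int i = 0; i < rounds; i++) {
        toggleScreenColour();
        cycleBorderColour();
    }
}
```

After run_main_loop(4), the log holds the first eight steps of the list, in the same order.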

A screenshot of this program in the VICE emulator:

When adding a delay between thread-switches, like this...

    start:
                jsr toggleScreenColour
                jsr delay                   ; <--
                jsr cycleBorderColour
                jsr delay                   ; <--

                jmp start

...the pseudo-parallel and cyclic behaviour can be seen more clearly:

Functions for thread-definition & -manipulation (Tcl)

The preprocessor itself is generic, and has nothing to do with visual effects, coroutines, and in fact doesn't care much about the language being preprocessed. (I have used it for C code too.)

The thread-specific functions (in file "thread.tcl", sourced at the top of the example-code) emit the glue-assembly that makes the coroutines work:

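    # Open a thread: emit a .proc named after the thread, a context word in
    # the data section holding the resume address, and an indirect jump
    # through that word.
    proc threadBegin { threadName } {

        puts ";--- threadBegin ---"

        puts "$threadName .proc"

        puts {
            .section data
            threadContext:  .word threadStartAddr
            .send

            jmp ( threadContext )
            threadStartAddr:
        }
    }



    # Emitted once, when this file is sourced: the threadYield macro stores
    # the address just past itself in threadContext and returns to the caller.
    puts {

        ;--- generated threadYield macro ---

        threadYield         .macro

        .block
        lda #<after
        sta threadContext
        lda #>after
        sta threadContext + 1
        rts

        after:
        .bend

        .endm
    }



    # Close a thread: jump back to its top, so that every thread loops.
    proc threadEnd {} {

        puts {
            ;--- threadEnd ---

            jmp threadStartAddr

            .pend
        }
    }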

(If interested, look at the generated assembly-source and -listing below to see how this works.)

More real-life fun with yield-points

As said, I use this mechanism to implement state-machine-like visual effects on the C64.

In short, visual effects on the C64 can be implemented as raster-interrupt handlers for better control w.r.t. timing. A raster-interrupt can be made to fire each time the top-to-bottom buildup of the screen reaches a certain position. This allows you to execute code synchronised to the screen-buildup.

To do this, you program a value N into a certain video-register (address $d012), and each time the Nth horizontal raster-line is reached, an interrupt occurs.

You can use multiple interrupts within a single vertical period, by reprogramming the video-register while still handling a raster-interrupt.

Here's an example of 2 pieces of code - "spriteAnimate" and "someOtherEffect" - being called when the screen-buildup reaches raster-lines 80 and 160, respectively:

    <? threadBegin rasterIntHandler ?>

               lda #80                 ; Configure next IRQ to occur at line 80.
               sta $d012               ;

               threadYield             ; Exit IRQ-handler...

               ;
               ; (Vertical retrace
               ; happens here.)
               ;

               inc $d021               ; ...and we're back, at line 80. 
                                       ;
                                       ; (Turn screen yellow as indication.)

               jsr spriteAnimate       ; Do sprite-animation.

               lda #160                ; Configure next IRQ to occur at line 160.
               sta $d012               ;

               threadYield             ; Exit IRQ-handler...

               dec $d021               ; ...and we're back again, at line 160 this time.
                                       ;
                                       ; (Restore screen-colour to blue as indication.)

               jsr someOtherEffect

    <? threadEnd ?>

(The parent-thread "rasterIntHandler" is assumed to be executed in interrupt-context.)

Note the sequential nature of this snippet, although corresponding execution is not sequential at all:

  1. screen-buildup reaches raster-line 80
  2. raster-interrupt fires
  3. interrupt-handler is resumed right after its 1st yield-point
  4. "spriteAnimate" code is executed
  5. interrupt-handler is exited
  6. non-interrupt code is executed until screen-buildup reaches raster-line 160
  7. raster-interrupt fires
  8. interrupt-handler is resumed after its 2nd yield-point
  9. "someOtherEffect" code is executed
  10. interrupt-handler is exited
  11. non-interrupt code is executed until screen-buildup reaches raster-line 80 during the next vertical period
  12. raster-interrupt fires
  13. ...
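The steps above can be played through in a small C simulation. This is a sketch under several assumptions: plain variables stand in for the registers at $d012 and $d021, a frame is taken to be 312 raster-lines (PAL), the handler is run once up front to arm the first interrupt, and "simulate_frames" and "colour_at_line" are made-up helpers that record the screen colour seen on every line.

```c
enum { LINES_PER_FRAME = 312 };      /* PAL assumed */

int raster_compare = 0;              /* stands in for $d012 */
int screen_colour  = 6;              /* stands in for $d021; 6 = blue */
int handler_ctx    = 0;              /* saved resume point of the handler */

int sprite_calls = 0;
int other_calls  = 0;
int colour_at_line[LINES_PER_FRAME]; /* colour observed on each line */

void spriteAnimate(void)   { sprite_calls++; }
void someOtherEffect(void) { other_calls++; }

/* The coroutine: reads top to bottom, but runs in slices per interrupt. */
void rasterIntHandler(void)
{
    switch (handler_ctx) {
    case 0:
        for (;;) {
            raster_compare = 80;         /* next IRQ at line 80 */
            handler_ctx = 1; return;     /* yield: exit IRQ-handler */
    case 1:
            screen_colour++;             /* blue -> yellow */
            spriteAnimate();
            raster_compare = 160;        /* next IRQ at line 160 */
            handler_ctx = 2; return;     /* yield: exit IRQ-handler */
    case 2:
            screen_colour--;             /* yellow -> blue */
            someOtherEffect();
        }
    }
}

/* Walk the raster beam down the screen, firing the "interrupt" on a match. */
void simulate_frames(int frames)
{
    rasterIntHandler();                  /* first slice: arm line 80 */

    for (int f = 0; f < frames; f++) {
        for (int line = 0; line < LINES_PER_FRAME; line++) {
            if (line == raster_compare)
                rasterIntHandler();
            colour_at_line[line] = screen_colour;
        }
    }
}
```

After simulate_frames(2), lines 80 up to and including 159 have recorded yellow (7) and all other lines blue (6), with each effect having been called once per frame.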

A screenshot is shown here. Note that the screen is coloured yellow from raster-line 80 until 160:

By using coroutines, the programmer's intention remains clear and the code remains readable, compared to a classic approach using an explicit state-machine.

Extending the thread-functionality

Up to now, the only available operation in thread-context is to yield, using "threadYield". By wrapping "threadYield" in macros or preprocessor-snippets, you can make more interesting primitives.

For example, the next macro implements a simple "sleep":

    threadSleep         .macro numCyc
                        .block
    .section data
        countdn:        .byte ?
    .send
                        lda \numCyc
                        sta countdn
        redo:
                        threadYield

                        dec countdn
                        bne redo
                        .bend
                        .endm

It can be used like this, in a thread-context:

    ...                 ; Execute some code, ...

    threadSleep #4      ; ...wait here for 4 cycles/iterations, ...

    ...                 ; ...and execute some more code.

As can be seen, "sleeping" is basically the same as executing "threadYield" in a loop using a down-counter.

The duration of sleep is specified in "cycles". The parent-thread's calling code determines what that means - in the above example using raster-interrupts, a "cycle" would typically be 1/50 second (the vertical screen-period, assuming PAL).

Note that although the above use of "threadSleep" looks sequential, the corresponding execution is not: during this thread's sleep, other threads may run.
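For comparison, here is the same countdown-and-yield loop as a C sketch. The thread "sleeper" and its "work_done" counter are invented for the example; the byte-sized "countdn" mirrors the macro's counter, and the do/while loop mirrors the "redo" loop (yield first, then decrement and test).

```c
int           sleeper_ctx = 0;   /* saved resume point */
unsigned char countdn     = 0;   /* byte-sized down-counter, as in the macro */
int           work_done   = 0;   /* made-up stand-in for "some code" */

void sleeper(void)
{
    switch (sleeper_ctx) {
    case 0:
        for (;;) {
            work_done++;                 /* execute some code, ... */

            countdn = 4;                 /* ...sleep for 4 cycles, ... */
            do {
                sleeper_ctx = 1; return; /* yield */
    case 1:
                countdn--;
            } while (countdn != 0);
                                         /* ...and carry on at the top. */
        }
    }
}
```

The first call does the work and goes to sleep; calls 2 to 4 only count down; the 5th call wakes up and does the work again. As with the byte-sized counter in the macro, a count of 0 in this sketch wraps around and sleeps for 256 cycles.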

Another way in which to extend the thread-functionality is to add "wait"-like statements, e.g. to wait for a condition, loop while a condition holds, etc. This can form the basis for a simple IPC-mechanism, where multiple threads communicate through signals.

I use this a lot in embedded C code.
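A rough sketch of such a "wait" primitive in C, with two threads communicating through a flag. Everything here is hypothetical (the thread names, the "data_ready" flag and "run_rounds"); the point is only that waiting for a condition is a yield inside a loop that tests the condition.

```c
int data_ready   = 0;   /* the signal shared between the two threads */
int consumed     = 0;   /* how many signals the consumer has handled */
int producer_ctx = 0;
int consumer_ctx = 0;

/* Raises the signal on every third turn. */
void producer(void)
{
    switch (producer_ctx) {
    case 0:
        for (;;) {
            producer_ctx = 1; return;    /* yield: still busy */
    case 1:
            producer_ctx = 2; return;    /* yield: still busy */
    case 2:
            data_ready = 1;              /* raise the signal */
            producer_ctx = 3; return;    /* yield */
    case 3:;
        }
    }
}

/* Waits for the signal, handles it, and waits again. */
void consumer(void)
{
    switch (consumer_ctx) {
    case 0:
        for (;;) {
            while (!data_ready) {        /* wait until the condition holds */
                consumer_ctx = 1; return; /* yield */
    case 1:;
            }
            data_ready = 0;              /* acknowledge the signal */
            consumed++;
        }
    }
}

/* Main loop: give each thread one turn per round. */
void run_rounds(int rounds)
{
    for (int i = 0; i < rounds; i++) {
        producer();
        consumer();
    }
}
```

The consumer reads as "wait, then handle", yet it never blocks the main loop: while the flag is down it gives up its turn every round.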

Makefile

For completeness' sake, here's a (GNU) Makefile for building and running. File "asm.p.asm" is the example shown above, and "asm.asm" is the preprocessed version, shown later on.

    TARG = prg.prg

    $(TARG): asm.asm
        tass64 -q -B -L lst.lst --tab-size=1 -o $@ $<

    asm.asm: asm.p.asm
        ./pp $< > $@

    clean:
        -rm $(TARG)

    run: $(TARG)
        x64 -autostart-warp -autostartprgmode 1 +truedrive +cart -VICIIfulldevice Vidmode -VICIIfilter 0 +sound $(TARG)

Caveat

It's easy to shoot yourself in the foot using assembly, and coroutines only hand you a bigger gun.

Know at any point which code is being emitted. There's no magic involved, obviously - coroutines are only intended to make code more readable, not to make it do things it couldn't do otherwise.

One known issue is that register- and flag-values are not preserved across a yield. That is, assume nothing right after a yieldpoint.

(Saving registers and status involves considerably more complex code, and IMHO that's not worth the effort.)

Appendix - preprocessed assembly

Here are both thread-bodies after preprocessing (in file "asm.asm"):

    ;
    ; Thread: toggle screen-colour between black and white.
    ;


    ;--- threadBegin ---
    toggleScreenColour .proc
    .section data
    threadContext .word threadStartAddr
    .send
    jmp ( threadContext )
    threadStartAddr:


                lda #0                      ; black
                sta $d021                   ;

                threadYield

                lda #1                      ; white
                sta $d021                   ;

                threadYield


    ;--- threadEnd ---
    jmp threadStartAddr
    .pend




    ;
    ; Thread: cycle border-colour through red, green and blue.
    ;


    ;--- threadBegin ---
    cycleBorderColour .proc
    .section data
    threadContext .word threadStartAddr
    .send
    jmp ( threadContext )
    threadStartAddr:


                lda #2                      ; red
                sta $d020                   ;

                threadYield

                lda #5                      ; green
                sta $d020                   ;

                threadYield

                lda #6                      ; blue
                sta $d020                   ;

                threadYield


    ;--- threadEnd ---
    jmp threadStartAddr
    .pend

(Displayed indent may differ slightly from what would be expected from the "thread.tcl" code, because "asm.asm" as well as the following list-file were generated from a previous, functionally equivalent version of "thread.tcl".)

Appendix - assembly list-file

To play along at home, here's the assembly-listing. The interesting part is how the resume-address is saved at each yield-point and restored on entry to the thread. (Compare this to the above code, where macro-references have not yet been resolved.)

    ; 64tass Turbo Assembler Macro V1.51.992? listing file
    ; tass64 -q -B -L lst.lst --tab-size=1 -o prg.prg asm.asm
    ; Fri Feb 24 18:25:47 2017

    ;******  Processing input file: asm.asm

    >0801            0b 08 01 00                                        .word listend, 1
    >0805            9e 32 30 36 31 00                                  .null $9e, ^start
    >080b            00 00                          listend:            .word 0
    .080d                                           start:
    .080d            20 16 08       jsr $0816                   jsr toggleScreenColour
    .0810            20 3c 08       jsr $083c                   jsr cycleBorderColour
    .0813            4c 0d 08       jmp $080d                   jmp start
    .0816                                           toggleScreenColour
    >2000            19 08                          threadContext .word threadStartAddr
    .0816            6c 00 20       jmp ($2000)     jmp ( threadContext )
    .0819                                           threadStartAddr:
    .0819            a9 00          lda #$00                    lda #0                      ; black
    .081b            8d 21 d0       sta $d021                   sta $d021                   ;
    .081e            a9 29          lda #$29                                lda #<after
    .0820            8d 00 20       sta $2000                               sta threadContext
    .0823            a9 08          lda #$08                                lda #>after
    .0825            8d 01 20       sta $2001                               sta threadContext + 1
    .0828            60             rts                                     rts
    .0829                                                   after:
    .0829            a9 01          lda #$01                    lda #1                      ; white
    .082b            8d 21 d0       sta $d021                   sta $d021                   ;
    .082e            a9 39          lda #$39                                lda #<after
    .0830            8d 00 20       sta $2000                               sta threadContext
    .0833            a9 08          lda #$08                                lda #>after
    .0835            8d 01 20       sta $2001                               sta threadContext + 1
    .0838            60             rts                                     rts
    .0839                                                   after:
    .0839            4c 19 08       jmp $0819       jmp threadStartAddr
    .083c                                           cycleBorderColour
    >2002            3f 08                          threadContext .word threadStartAddr
    .083c            6c 02 20       jmp ($2002)     jmp ( threadContext )
    .083f                                           threadStartAddr:
    .083f            a9 02          lda #$02                    lda #2                      ; red
    .0841            8d 20 d0       sta $d020                   sta $d020                   ;
    .0844            a9 4f          lda #$4f                                lda #<after
    .0846            8d 02 20       sta $2002                               sta threadContext
    .0849            a9 08          lda #$08                                lda #>after
    .084b            8d 03 20       sta $2003                               sta threadContext + 1
    .084e            60             rts                                     rts
    .084f                                                   after:
    .084f            a9 05          lda #$05                    lda #5                      ; green
    .0851            8d 20 d0       sta $d020                   sta $d020                   ;
    .0854            a9 5f          lda #$5f                                lda #<after
    .0856            8d 02 20       sta $2002                               sta threadContext
    .0859            a9 08          lda #$08                                lda #>after
    .085b            8d 03 20       sta $2003                               sta threadContext + 1
    .085e            60             rts                                     rts
    .085f                                                   after:
    .085f            a9 06          lda #$06                    lda #6                      ; blue
    .0861            8d 20 d0       sta $d020                   sta $d020                   ;
    .0864            a9 6f          lda #$6f                                lda #<after
    .0866            8d 02 20       sta $2002                               sta threadContext
    .0869            a9 08          lda #$08                                lda #>after
    .086b            8d 03 20       sta $2003                               sta threadContext + 1
    .086e            60             rts                                     rts
    .086f                                                   after:
    .086f            4c 3f 08       jmp $083f       jmp threadStartAddr

    ;******  End of listing

That's all - have fun!


Delivered to you by Vim, GNU Make, MultiMarkdown, bozohttpd, NetBSD, and 1 human.