For those of you who absolutely MUST hack the brick to bits, the pbForth assembler lets you get right down to the metal. The assembler is itself written in pbForth, and uses postfix notation, just like pbForth. This means that you'll need to get your head around writing down the operands and then the instruction, which is backwards from the "normal" way of doing things.
This article takes you through a series of successively more complex examples, starting with a no-op word that does nothing, then a new word that splits the value on the top of the stack into its low and high bytes. Finally, we show how the conditionals work in the pbForth assembler and make an improved version of the built-in RSHIFT that is at least an order of magnitude faster.
Fortunately, pbForth is fast enough that you won't often need to write anything in assembler, but it's nice to know you can. The assembler uses a little less than 4K of the available 18K of memory. I'm assuming that you are familiar with writing software in assembler, and that you won't mind crashing your RCX once in a while. This is not an inherent problem with pbForth. It's just that programming in assembler is a little bit like driving a car by pushing the gas and brake pedals with your hands.
The assembler would not be available at all without the efforts of Darin Johnson. He emailed the source to me early in March of 2002 and I didn't really get to looking at it until late April 2002. It is based on the same principles as many other Forth assemblers, in that a careful examination of the target processor's opcode table brings out some patterns in the composition of the instructions. Extensive use of Forth's CREATE DOES> mechanism in the assembler makes it easy to get the instruction generator right.
The current version of the assembler does not support the bit or CCR instructions yet, but the support will be available soon.
To get the most up-to-date version of the pbForth firmware and the example scripts, download the pbForth Scripts as either a zip archive or a tar.gz archive. The scripts you'll need are in the h8300 directory.
The syntax of the pbForth assembler is a bit cryptic, especially if you are used to "normal" assemblers. To ease some of the confusion, this section walks through the different types of instructions and addressing modes, comparing a conventional assembler such as GNU as with the pbForth assembler.
The H8/300 has eight 16-bit registers (r0 to r7) that can be split into high and low bytes using standard instructions. The pbForth system uses r6 to hold the top of the stack and uses r4, r5, and r7 for internal purposes. You have complete access to r0, r1, r2, and r3 for transient data. If you do touch r4 to r7, save them first and restore them when you are done.
There is one additional Forth convention to keep in mind when writing code. Whenever we put constant values into the dictionary space, we are "compiling" bytes. By convention, compiling words end in a , (comma). Because the assembler puts values into the dictionary space, the opcodes end in a comma as well. The other reason this is useful is that otherwise the built-in AND would interfere with the assembler mnemonic of the same name!
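The comma convention can be seen with the ordinary compiling words before any assembler is loaded. This is a minimal sketch in standard Forth; the names SQUARES and FLAGS are made up for illustration and are not part of the pbForth scripts.

```forth
\ SQUARES and FLAGS are hypothetical names used only for this illustration
CREATE SQUARES  1 , 4 , 9 ,      \ , compiles one cell into the dictionary
CREATE FLAGS    1 C, 0 C, 1 C,   \ C, compiles one byte into the dictionary
ALIGN                            \ re-align the dictionary after an odd number of bytes

SQUARES CELL+ @ .                \ fetch and print the second cell
FLAGS 2 + C@ .                   \ fetch and print the third byte
```

The assembler mnemonics follow the same pattern: each one ends in a comma because it lays its opcode bytes down in the dictionary in the same way.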
Instructions with no operands:

| GNU as | Forth Assembler |
| --- | --- |
| NOP | NOP, |
| RTE | RTE, |

Instructions with one operand:

| GNU as | Forth Assembler |
| --- | --- |
| INC.B r0h | r0h INC, |
| NEG.B r1h | r1h NEG, |

Instructions with two operands:

| GNU as | Forth Assembler |
| --- | --- |
| MOV.B r0h,r1h | r0h r1h MOV, |
| ADDS.W #1, r0 | 1 ## r0 ADDS, |

Addressing modes:

| Address Mode | GNU as | Forth Assembler | Comment |
| --- | --- | --- | --- |
| Register Direct | r0 | r0 | |
| Register Indirect | @r0 | r0 ) | 0 @r0 is an acceptable short form |
| Register Indirect Offset | @(4,r0) | 4 r0 )@ | 4 @r0 is an acceptable short form |
| Register Indirect PostIncrement | @r0+ | r0 @+ | |
| Register Indirect PreDecrement | @-r0 | r0 -@ | |
| Indirect Address | @1234 | 1234 () | The assembler figures out if the value is 8 or 16 bits automatically |
| Immediate Value | #1234 | 1234 ## | The assembler figures out if the value is 8 or 16 bits automatically |
| Double Indirect Address | @@12:8 | 12 @@ | Only used by JMP/JSR |
There is no need to specify :8 or :16 after the values. The assembler is smart enough to figure out if there will be a range error and complains as needed. There is also no explicit @aa:8 addressing mode. Instead, provide a 16-bit value and the assembler will optimize the instruction. For example, use FFC0 () instead of @C0:8.
With that behind us, let's go through some examples to demonstrate the use of assembly language within pbForth.
I'm assuming that you know the basics of Forth and how postfix notation works. If not, I suggest you get into the Zen of pbForth a bit before tackling something like writing new words in assembler. You'll need to upload the h8300/assembler.txt script to the RCX before trying any of the following examples. If you don't want to reload the firmware and the assembler every time you crash the RCX - and you will crash it - then use the SAVE-SYSTEM facility to save the firmware including the assembler in a new firmware file.
Part of the Zen of pbForth is that we do the simple things first, and what could be simpler than writing a No-Op word for pbForth? This will let us see the basic structure that we can fit new words into. There is already a NoOp word in pbForth, so the new one will be called myNoOp, and here it is:
\ -----------------------------------------------------------------------------
\ h8300/myNoOp.txt - simple NoOp instruction in assembler
\
\ Requires: h8300/assembler.txt
\
\ This routine is just a new version of NoOp for the RCX in assembly language.
\ -----------------------------------------------------------------------------
\ Revision History
\
\ R. Hempel 2002-04-25 - Original
\ -----------------------------------------------------------------------------
BASE @
DECIMAL
CODE myNoOp \ Creates a new assembly language word called myNoOp
NEXT, \ Compiles a JMP to NEXT which must end every assembler word
END-CODE \ marks the end of the assembler word
BASE !
\ -----------------------------------------------------------------------------
To test it, all we need to do is type myNoOp at the console. Verify that the top value on the stack is unchanged and that no values are consumed or generated on the stack - it is a NoOp after all!
If you're really feeling daring, move it to the 'UserIdle value and let it run in the background. Because it does absolutely nothing by itself, it does nothing in the background and will not interfere with the normal operation of the system.
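With the skeleton in place, the next step up is a word with a single real instruction in it. The following is a sketch only, assuming the register conventions described earlier (r6 holds the top of the stack, and PUSH, works on the data stack as it does in the later examples). The names my1+ and myDup are hypothetical and are not part of the pbForth scripts.

```forth
\ Requires: h8300/assembler.txt
\ my1+ and myDup are hypothetical names used only for this illustration

CODE my1+ ( n -- n+1 )
1 ## r6 ADDS, \ Add 1 to the top of stack, which lives in r6
NEXT, \ Compile jump to NEXT
END-CODE

CODE myDup ( n -- n n )
r6 PUSH, \ Push a copy of TOS; r6 still holds the new TOS
NEXT, \ Compile jump to NEXT
END-CODE
```

If the assumptions hold, typing 5 my1+ . should show the incremented value, and 7 myDup . . should show the same value twice. Compare the results with the built-in 1+ and DUP before relying on either word.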
The neat thing about Forth is that if the language is missing a word, you can just add your own. This example will let us write a new word that splits the value on the top of the stack into two bytes. Given a 16-bit value on the top of the stack, this word returns two values. The most significant byte will be on the top of the stack, the least significant byte will be just below it. The original value is destroyed. Here's what the word might look like in conventional Forth:
\ -----------------------------------------------------------------------------
\ h8300/splitWord1.txt - split the word on top of stack into bytes
\
\ Requires: nothing
\
\ -----------------------------------------------------------------------------
\ Revision History
\
\ R. Hempel 2002-04-25 - Original
\ -----------------------------------------------------------------------------
BASE @
HEX
: SPLIT-WORD ( u -- lsb msb )
DUP FF AND SWAP \ ( -- lsb u ) Isolate the lsb
8 RSHIFT ; \ ( -- lsb msb ) Isolate the msb
: TEST 0 DO I SPLIT-WORD 2DROP LOOP ;
BASE !
\ -----------------------------------------------------------------------------
You can check the execution time of this word, including the loop overhead, by typing 10000 TEST and noting that it takes about 20 seconds to do the 10,000 iterations. You are about to see the difference a little assembler code makes. Here is the same routine coded in assembler:
\ -----------------------------------------------------------------------------
\ h8300/splitWord2.txt - split the word on top of stack into bytes
\
\ Requires: h8300/assembler.txt
\
\ -----------------------------------------------------------------------------
\ Revision History
\
\ R. Hempel 2002-04-25 - Original
\ -----------------------------------------------------------------------------
BASE @
HEX
CODE SPLIT-WORD ( u -- lsb msb )
0 ## r0 MOV, \ Clear out a temporary word value
r6l r0l MOV, \ Copy the LSB of TOS to LSB of temp word
r0 PUSH, \ Push the LSB as a word on the stack
r6h r6l MOV, \ Copy the MSB of TOS to LSB
0 ## r6h MOV, \ Clear the MSB of TOS
NEXT, \ Compile jump to NEXT
END-CODE
: TEST 0 DO I SPLIT-WORD 2DROP LOOP ;
BASE !
\ -----------------------------------------------------------------------------
You can check the execution time of this word, including the loop overhead, by typing 10000 TEST and noting that it takes about 1 second to do the 10,000 iterations. Is something wrong? No, it's just that the RSHIFT operation in the first word is implemented in pure Forth, so it's running a ton of loops. The assembler version does the work without shifts; it just juggles the bytes around naturally!
In fact, even if you do the test 30,000 times, it takes only about 3 seconds to run the SPLIT-WORD definition, which means that you can often get an order-of-magnitude improvement by writing in assembler. This does not mean that pbForth is inherently slow. It just means that in the pure Forth implementation of SPLIT-WORD, the RSHIFT word is probably the bottleneck.
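Timing only matters if both versions give the same answer, so it may be worth checking the result by hand before comparing speeds. A minimal check, assuming either definition of SPLIT-WORD from the scripts above has been loaded:

```forth
HEX
1234 SPLIT-WORD . . \ msb is on top, so this prints 12 and then 34
DECIMAL
```

Running the same line against the pure Forth and the assembler version is a quick way to confirm that the byte juggling matches the ( u -- lsb msb ) stack comment.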
If you've read this far, you probably know what's coming. We're going to rewrite RSHIFT in assembler. This will use one of the coolest features in the Forth assembler, which is that you can use the same looping constructs in Forth assembler as in regular Forth. It's a bit ugly, but here's the code for RSHIFT written in assembler:
\ -----------------------------------------------------------------------------
\ h8300/rShift.txt - shift word n bits to the right
\
\ Requires: h8300/assembler.txt
\
\ -----------------------------------------------------------------------------
\ Revision History
\
\ R. Hempel 2002-04-25 - Original
\ -----------------------------------------------------------------------------
BASE @
HEX
CODE newRSHIFT ( u1 n -- u2 )
r0 POP, \ Pull u1 into R0 - it's the number we're shifting
0F ## r6l AND, \ Limit the shift count to 0-15 (low 4 bits)
BEGIN,
NE \ Previous operation sets Z if R6L is 0
WHILE,
r0h SHLR, \ Logical shift MSB right, puts zero in high bit
r0l ROTXR, \ Rotate right, putting carry in high bit
r6l DEC, \ Decrement count of bits to shift
REPEAT,
r0 r6 MOV, \ Copy the result to TOS
NEXT, \ Compile jump to NEXT
END-CODE
: TEST0 0 DO I 8 RSHIFT DROP LOOP ;
: TEST1 0 DO I 8 newRSHIFT DROP LOOP ;
: newSPLIT-WORD ( u -- lsb msb )
DUP FF AND SWAP \ ( -- lsb u ) Isolate the lsb
8 newRSHIFT ; \ ( -- lsb msb ) Isolate the msb
: TEST2 0 DO I newSPLIT-WORD 2DROP LOOP ;
BASE !
\ -----------------------------------------------------------------------------
The result of running the old RSHIFT 10,000 times by typing 10000 TEST0 is no surprise, at least not to me: it takes almost 20 seconds! This means that the run time of the original pure Forth SPLIT-WORD was dominated by RSHIFT. The improvement in performance due to rewriting this critical word in assembler can be verified by typing 10000 TEST1 and noting that it returns in about 1 second.
The final test is to run the pure Forth version of newSPLIT-WORD, which uses newRSHIFT to speed things up. If you type 10000 TEST2 the prompt returns in just over 1 second.
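A faster word is only useful if it still gives the right answer, so a quick comparison against the built-in RSHIFT is a sensible companion to the timing tests. The following is a sketch, assuming h8300/rShift.txt has been loaded; CHECK-RSHIFT is a hypothetical helper and is not part of the pbForth scripts. Only shift counts 0 to 15 are compared, because newRSHIFT masks the count to its low four bits.

```forth
\ Requires: h8300/rShift.txt (for newRSHIFT)
\ CHECK-RSHIFT is a hypothetical helper used only for this illustration
BASE @
HEX
: CHECK-RSHIFT ( u -- )
10 0 DO \ Shift counts 0 to F
DUP I RSHIFT \ ( u r1 ) Result from the built-in word
OVER I newRSHIFT \ ( u r1 r2 ) Result from the assembler word
<> IF ." mismatch at shift " I . CR THEN
LOOP DROP ;
8421 CHECK-RSHIFT
FFFF CHECK-RSHIFT
BASE !
```

If the two words agree, nothing is printed and the prompt simply returns.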
This introduction only scratches the surface of what we can do with assembly language in pbForth. It is important to note that the examples in this article were specifically chosen to illustrate the fact that rewriting key words in assembler can produce significant performance gains - but you need to be aware of exactly what the bottlenecks are.
Before you optimize your code by rewriting in assembler, you should measure the performance of specific bits of code to determine exactly where the speed issues are. That goes for any language - not just Forth!