Tips for surviving 6502 assembly-language programming
[Joe Holt is a freelance technical writer. He can be reached at 476 West Main Rd., El Centro, CA 92243. This article is from 1985.]
WITH THE ADVENT of complex microprocessors whose operation codes (op codes) begin to resemble some high-level languages, the days of the 6502 seem numbered. Indeed, a microprocessor with only three 8-bit registers and a 64 K-byte address space is apparently no match for a piece of silicon that can walk through 4 gigabytes of memory in 32-bit strides and work with hundreds of bits worth of registers. I won’t try to fool you: The 6502 will not be around forever. But when the last of its species emerges from the forge, it will be joining an installed base of more than 3 million 6502-based computers. There is merit, therefore, in discussing the peculiarities of this dying breed, the ins and outs of this most nonorthogonal microprocessor.
The 6502 has three 8-bit registers (only one of which can be used in arithmetic and Boolean operations), a single-page (256-byte) fixed stack, and an optimized performance when dealing with the first 256 bytes of memory. But what may be considered a tight architecture is befuddled by an instruction set full of inconsistency and seeming favoritism to certain combinations of addressing modes and op codes. Steve Wozniak even admitted that the only reason he put a 6502 in his Apple II was because it was cheap.
In order to gain the most benefit from the 6502, an assembly-language programmer must understand these idiosyncrasies and use them to his or her advantage.
I will assume for the sake of brevity that you are already familiar with the 6502’s architecture and operation codes (and assembly language in general) and perhaps have access to an Apple or Atari with a decent assembler. If not, you might want to pick up a good reference book on the 6502 and sit down with an assembler and experiment. An intimate knowledge of your machine is the greatest boon to any programming task.
The first two pages (512 bytes) of address space have special meaning to the 6502. The first page, called the Zero page, consists of memory addresses 0000 to 00FF [Editor’s note: All addresses are in hexadecimal unless otherwise specified.] and is considered prime real estate for variable storage. Memory references to the Zero page by most op codes can be reduced to 1 byte because the 6502 has a special addressing mode strictly for this area of memory. Not only are programs that place variables on the Zero page shorter, but they also run faster because the microprocessor need only fetch 1 byte from the program to determine the memory address. The upper byte of the address (00xx) is supplied internally. For example, the following sequence assembles to 6 bytes and executes in eight clock cycles:
$0300 LDA $380 ;get value
$0303 STA $381 ;and stuff it elsewhere
The equivalent sequence using Zero-page variables assembles to only 4 bytes and executes in six clock cycles:
$0300 LDA $80 ;get Zero-page value
$0302 STA $81 ;and stuff it elsewhere
Where speed and space are critical, there is no better solution than to put oft-used variables on the Zero page. But beware: Other programmers before you have done the same, so it’s important not to alter Zero-page memory locations already used by your computer’s ROM (read-only memory) routines or DOS (disk operating system).
The second page consists of memory addresses 0100 to 01FF and is the location of the 6502’s stack. Because the entire stack can be addressed in 9 bits, the stack pointer is 8 bits wide with the upper bit (01xx) supplied internally. You can only set this 8-bit stack pointer via the X register; you must place the value in the X register and transfer it to the stack pointer with the TXS (transfer X register to stack pointer) op code. Because the stack grows downward, it is good practice to initialize the stack pointer at the beginning of any 6502 program with the following code:
$0300 LDX #$FF ;set pointer to very top of stack
$0302 TXS
Conversely, the only way to read the stack pointer is through the X register, with the TSX (transfer stack pointer to X register) op code. This instruction is handy for “locating yourself” in a relocatable program, a topic I’ll describe later.
The stack never grows so low that it clobbers the Zero page. Instead, the stack pointer wraps from 0100 back up to 01FF, possibly causing confusion if you’ve already got a page full of variables or return addresses pushed onto the stack. This situation can be kept in check by limiting the use of recursive subroutines.
The 6502 is notorious for how it handles its flags, especially the Carry flag. Where most processors set the Carry flag when a borrow operation occurs from a subtract operation, the 6502 produces a clear Carry to indicate a borrow. The following is an illustration with the SBC (subtract with Carry) instruction:
$0300 SEC ;be sure the Carry's set
$0301 SBC #1 ;decrement the accumulator
$0303 BCC BORROW ;a borrow occurred
You’ve probably noticed the lack of add and subtract op codes on the 6502 that do not include the Carry flag. This oversight necessitates setting (for SBC, as above) or clearing (for ADC – add with Carry) the Carry before performing one of these operations.
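The flip side of this requirement is that multibyte arithmetic needs no special instructions: the Carry left by one ADC feeds the next. The following is a minimal sketch of a 16-bit addition, written in ca65 assembler syntax as an assumption; NUM1, NUM2, and SUM are hypothetical Zero-page locations chosen only for illustration.

```asm
NUM1    = $80           ;hypothetical 16-bit operand, low byte first
NUM2    = $82           ;hypothetical 16-bit operand, low byte first
SUM     = $84           ;hypothetical 16-bit result, low byte first

        CLC             ;no carry into the low byte
        LDA NUM1
        ADC NUM2        ;add low bytes; Carry holds any overflow
        STA SUM
        LDA NUM1+1
        ADC NUM2+1      ;add high bytes plus the Carry from below
        STA SUM+1
```

A 16-bit subtraction follows the same pattern with SEC in place of CLC and SBC in place of ADC.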
The backward Carry for subtract operations is not in itself confusing, that is, until you realize that compare operations (CMP, CPX, CPY) are actually subtracts in compares’ clothing. Most microprocessors perform a compare by simply subtracting the operand from the value being compared and tossing away the result. In other words, comparing 5 to 6 would be a matter of subtracting 6 from 5 and setting the appropriate status flags. In this situation (5 minus 6) a borrow would occur, indicated on the 6502 by a clear Carry. Therefore, after a compare operation, a clear Carry indicates less than, and a set Carry indicates greater than or equal to. This is backward from the logic of most other popular microprocessors.
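As a sketch of how these rules look in practice, the following fragment sorts an unsigned value into less than, equal to, or greater than a limit. It is written in ca65 syntax as an assumption; VALUE, LIMIT, and the three outcome codes left in X are hypothetical.

```asm
VALUE   = $80           ;hypothetical Zero-page variable
LIMIT   = 10            ;hypothetical threshold

        LDA VALUE
        CMP #LIMIT      ;computes VALUE - LIMIT, discards the result
        BCC BELOW       ;Carry clear (a borrow): VALUE < LIMIT
        BEQ EQUAL       ;Carry set and Zero set: VALUE = LIMIT
        LDX #2          ;Carry set, Zero clear: VALUE > LIMIT
        RTS
BELOW:  LDX #0
        RTS
EQUAL:  LDX #1
        RTS
```

Note that this ordering applies to unsigned values only; the Carry says nothing useful about signed comparisons.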
The other flag that you should be wary of is the Decimal flag. When set, all arithmetic operations are performed in BCD (binary-coded decimal). This is wonderful if you intend to perform BCD mathematics, but it can cause all kinds of unexplainable problems if not. Consequently, you should start any 6502 program with a CLD (clear Decimal flag) operation. Set the Decimal flag only when necessary and clear it immediately afterward.
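The advice above might be sketched as follows, in ca65 syntax as an assumption. TOTAL is a hypothetical Zero-page byte holding a two-digit BCD count, and START and BUMP are hypothetical routine names.

```asm
TOTAL   = $80           ;hypothetical BCD counter, 00 to 99

START:  CLD             ;first thing: make sure arithmetic is binary
        JSR BUMP
        RTS

BUMP:   SED             ;BCD on, for this one addition only
        CLC
        LDA TOTAL
        ADC #$01        ;in BCD, 19 plus 01 gives 20 rather than 1A
        STA TOTAL
        CLD             ;and straight back to binary
        RTS
```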
The way the 6502 decides to set or clear its flags after different operations is not immediately obvious, and the logic behind this is somewhat different from that of most other microprocessors. For example, the Carry flag is only affected by arithmetic (ADC, SBC, and compares) and shift operations (ASL, LSR, ROL, and ROR), except when explicitly changed (with CLC, SEC, or PLP). Operations that affect the Carry flag on many other processors, such as Boolean operations (AND, EOR, and ORA), do not modify the Carry flag at all. Increment and decrement operations do not affect the Carry either, but this is only a problem when testing if a register or memory value was decremented past zero, in which case you have to execute an explicit comparison:
$0300 DEX ;count down
$0301 CPX #$FF ;past zero?
$0303 BEQ ROLLUNDER ;yes
The two flags that are set or cleared in conjunction with most operations are the Negative and Zero flags. The Negative flag reflects the state of the eighth (most significant) bit of the result of the operation, and the Zero flag is set whenever the result is equal to 0 (all bits cleared). Nearly all operations that load or compute a value set these flags, including the load (LDA, LDX, and LDY) and transfer (TAX, TAY, TSX, TXA, and TYA) op codes. The one exception among the transfers is TXS, which leaves the flags alone; the store op codes (STA, STX, and STY) do not change them either.
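Because DEX and DEY update the Negative flag themselves, a countdown loop whose index starts at 7F or below can usually drop the explicit comparison and branch on the sign instead. The following sketch, in ca65 syntax as an assumption, clears a hypothetical 16-byte BUFFER this way.

```asm
BUFFER  = $0380         ;hypothetical 16-byte buffer
COUNT   = 16            ;hypothetical length (must not exceed 128)

        LDA #0
        LDX #COUNT-1    ;index of the last byte
LOOP:   STA BUFFER,X    ;a store leaves the flags alone
        DEX             ;Negative is set when X wraps from 00 to FF
        BPL LOOP        ;so no CPX #$FF is needed
        RTS
```

The saving is 2 bytes and two cycles per pass, at the cost of limiting the index to values whose most significant bit is clear.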
Although the 6502’s instruction set includes no operations for bit manipulation (short of the Boolean operations), a special instruction can be used to examine the eighth (most significant) and seventh bits of a value in memory. This is the BIT op code, and when used it will place the eighth bit of the memory referenced in the Negative flag and the seventh bit in the Overflow flag without affecting the contents of any register. This has the most benefit when you use it to test if a Boolean variable (that is, one that is either 00 or FF) is set or not:
$0300 BIT ALLDONE ;are we done?
$0302 BMI DONE ;yes
The BIT operation also has the side effect of setting or clearing the Zero flag depending on the result of a logical AND operation between the memory value and the accumulator. This feature (really the main purpose of the BIT operation) has little use outside of testing status bits in a memory-mapped I/O machine.
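A status-bit test of that kind might look like the following sketch, in ca65 syntax as an assumption. The STATUS address and the READY mask are hypothetical and do not describe any particular machine.

```asm
STATUS  = $C0F0         ;hypothetical memory-mapped status register
READY   = %00000100     ;hypothetical "ready" bit in that register

WAIT:   LDA #READY      ;the mask selects the bit to test
POLL:   BIT STATUS      ;Zero is set if (A AND STATUS) is 0
        BEQ POLL        ;bit still clear: keep waiting
        RTS             ;A still holds the mask, untouched by BIT
```

Because BIT does not alter the accumulator, the mask needs to be loaded only once, outside the polling loop.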
To complete the discussion of the 6502’s peculiarities, I should mention that there is no provision to push or pop the contents of the X or Y registers. If you need to preserve the contents of either of these registers, there are two paths you can take. The first and most logical one is to save the contents on the stack, but you can only do this by transferring the register to the accumulator and then pushing the accumulator onto the stack. To restore the register, you must pop the accumulator from the stack and transfer its contents back to the appropriate register:
$0300 TXA ;save the X register
$0301 PHA
...
$0325 PLA ;now restore the X register
$0326 TAX
The disadvantage of this method is obvious: The original value in the accumulator is destroyed, and it is not a simple matter of pushing and popping the accumulator around each sequence to preserve it. You must temporarily save the contents of the accumulator either in the remaining register or in some memory location. Neither situation is desirable. (Of course, if the accumulator isn’t holding anything of consequence, then this disadvantage can be ignored.)
Your alternative is to store the register in memory somewhere and then load it back when required. This has the advantage of placing the register’s content where it can be easily accessed (by LDX, STX, LDY, and STY op codes), but it forces you to set aside a specific location just for preserving a register. Things get really messy when this occurs in a recursive routine or if the location for saving the register is also used someplace else for the same purpose. In either case, havoc will ensue. The bottom line is that no solution is perfect, and you must examine the situation carefully to decide what will work best.
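For comparison with the stack method, here is a minimal sketch of the memory method, in ca65 syntax as an assumption. SAVEX is a hypothetical Zero-page scratch byte, and WORKER is a stand-in for any routine that clobbers X.

```asm
SAVEX   = $FA           ;hypothetical Zero-page scratch byte

OUTER:  STX SAVEX       ;park X; the accumulator is untouched
        JSR WORKER      ;WORKER is free to clobber X
        LDX SAVEX       ;X is back
        RTS

WORKER: LDX #0          ;stand-in routine that uses X
        RTS
```

If WORKER itself called OUTER, the second STX would overwrite the first saved value, which is exactly the recursion hazard described above.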
Tricks (or “hacks”) are techniques that use a feature or deficiency in the programming environment to an advantage not anticipated by the system designer. If you use them with caution, they can result in faster, more compact object code.
In the 6502’s case, there are dozens of these optimization tricks, each of which saves bytes and cycle time. Some of the tricks described here are not necessarily specific to the 6502 but are simply good programming practice. Nonetheless, they are essential to using the 6502’s full potential.
One of the glaring holes in the 6502’s instruction set is the absence of an unconditional relative jump (or branch). This makes writing relocatable code difficult, and many times it seems a waste to have to use 3 bytes for a jump instruction just to skip over 1 or 2 bytes. One way you can get around this is by using a conditional branch when the state of a flag is guaranteed. For example, because a load operation always modifies the Negative and Zero flags, this type of sequence is possible:
$0300 CMP #5 ;less than 5?
$0302 BCS NOTLESS ;no
$0304 LDA #1 ;yes, set to 1
$0306 BNE CONTINUE ;-- always taken
Because the accumulator is loaded with 1, the Zero flag will always be clear, in which case the BNE (branch if not equal) will always be taken.
If you do not know the state of a flag for certain, but you must use a branch instead of a jump (perhaps you are writing some relocatable code), you can always force one of the flags to a known state and branch on that condition. Because the Overflow flag is seldom used, it finds itself the most likely candidate:
$0300 CLV ;clear overflow for branch
$0301 BVC SMORE ;-- always taken
This sequence takes just as many bytes as an equivalent jump (JMP) instruction, with the one disadvantage common to all 6502 relative branches: They can only jump forward 127 bytes or back 128 bytes.
One other trick involves the misuse of the BIT operation. Because the BIT operation does not affect any registers and only the Zero, Negative and Overflow flags, it can be put to good use as a “skip over the next 2 bytes” instruction. For example:
$0300 CMP #5 ;less than 5?
$0302 BCS NO ; no
$0304 LDX #$FF ;yes, set to true
$0306 DFB $2C ;-- BIT trick
$0307 NO LDX #0 ;no, set to false
$0309 STX AFLAG ;save true/false status
If the condition is true (less than 5), the X register will be loaded with the value FF, then a nonsense BIT instruction occurs, after which execution continues. If the condition is false, NO will be branched to, which loads the X register with 00. In one sense, the operation immediately following the LDX #$FF is a BIT operation with the memory address 00A2, but in another sense, that memory address operand (00A2) disassembles to the instruction LDX #0.
This trick of hiding code within the operand of other code is an old one but should nonetheless be used with caution.
This BIT technique can also be used to skip over one byte by using the Zero-page addressing mode for BIT, in which case the value for the DFB (define byte) pseudo-operation would be 24.
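A sketch of the one-byte variant follows, in ca65 syntax as an assumption (.byte plays the role of DFB). Two hypothetical entry points, FROMX and FROMY, share a common tail, and RESULT is a hypothetical Zero-page variable.

```asm
RESULT  = $80           ;hypothetical Zero-page variable

FROMX:  TXA             ;entry 1: take the value from X
        .byte $24       ;BIT Zero-page op code swallows the TYA below
FROMY:  TYA             ;entry 2: take the value from Y
        STA RESULT      ;common tail
        RTS
```

Entering at FROMX, the processor sees TXA, then BIT $98 (98 being the op code for TYA), then the STA. The accumulator survives, but the Negative, Overflow, and Zero flags reflect the meaningless BIT, so this only works when the common tail does not depend on them.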
Because of the lack of add and subtract operations that do not include the Carry, the sequences CLC, ADC and SEC, SBC are common ones. There is one way to avoid having to explicitly set or clear the Carry, but only if the Carry is in a known state. For example:
$0300 CMP #5 ;less than 5?
$0302 BCS NOTLESS ;no
$0304 CLC ;redundant: Carry is already clear here
$0305 ADC #5 ;yes, put it above 5
Because the Carry flag will always be clear if the branch is not taken, the CLC before the ADC instruction is unnecessary. You must be careful to ensure that the condition of the branch does not change, however.
If the Carry flag is in the wrong (but known) state, it can still be used to your advantage if the operand for the ADC or SBC is immediate:
$0300 CMP #5 ;less than 5?
$0302 BCC LESS ;yes!
$0304 ADC #5-1 ;(or 4)
In this example, the ADC operation will still add 5 to the accumulator, because the Carry will always be set and will therefore be added along with the 4.
There are many places in a program where a number of variables must be set to certain values. Usually the code looks like this:
$0300 LDA #$FF ;set a few true/false vars
$0302 STA VAR1
$0304 STA VAR2
$0306 LDA #0 ;clear this one
$0308 STA VAR3
$030A LDA #1 ;and initialize another
$030C STA VAR4
If the X or Y register can be sacrificed, the fact that the values being stored are consecutive can be exploited:
$0300 LDX #$FF ;set a few true/false vars
$0302 STX VAR1
$0304 STX VAR2
$0306 INX ;now we’re up to 0
$0307 STX VAR3 ;clear this one
$0309 INX ;now we’re up to 1
$030A STX VAR4 ;and initialize another
Note that this only reduces the size of the code; there is no savings in speed (an INX op code takes just as many clock cycles as an LDA #).
Often it is necessary to know the location of code that is executing in a relocatable environment. If there is within the system the location of a known RTS instruction (perhaps in ROM), this can be accomplished by calling this RTS, then examining the remains on the stack. The following example will determine what page the executing code is on:
$0300 JSR KNOWNRTS ;get return address on the stack
$0303 TSX
$0304 LDA $100,X ;now get what page we're on
There is a special consideration when using this type of code. If there is the possibility of a 6502 interrupt occurring (from a real-time clock or communications device), the return address left just beyond the top of the stack would be destroyed by the 6502’s interrupt processing. It would be wise in this case to turn off interrupts before this bit of code (SEI), then reenable them immediately after the LDA $100,X (CLI). [Editor’s note: These instructions are two more examples of the 6502’s confusing mnemonics. At first glance, you would think that SEI would mean “set interrupts.” However, SEI means “set interrupt disable flag”; its execution shuts off the 6502 interrupts. CLI means “clear interrupt disable flag,” and a CLI instruction actually enables interrupts.]
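With the interrupt guard in place, the whole sequence might be sketched as follows, in ca65 syntax as an assumption. The address given for KNOWNRTS is only a placeholder; substitute the address of an RTS known to exist on your machine.

```asm
KNOWNRTS = $FF58        ;assumed address of an RTS in ROM; check yours

WHERE:  SEI             ;interrupts off: keep the stale bytes intact
        JSR KNOWNRTS    ;pushes the return address, comes right back
        TSX
        LDA $0100,X     ;high byte (page) of that return address
        CLI             ;interrupts back on
        RTS             ;the page number is in the accumulator
```

Two cautions apply to this sketch: CLI enables interrupts even if they were already disabled on entry, and SEI has no effect on nonmaskable interrupts.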
One of the nicest features of the 6502 is its abundance of addressing modes, but there are times when even these are not enough. A prime example is the situation when a nonindexed indirect reference must be made. Most of the time, the Y register is sacrificed in this way:
$0300 LDY #0 ;index of zero
$0302 LDA (POINTER),Y ;nonindexed
But if the Y register is already being used, a similar method involving the seldom (if ever) used preindexed indirect addressing mode can be used, at the expense of the X register:
$0300 LDX #0 ;use some register
$0302 LDA (POINTER,X)
Of course, you should take advantage of an index register that already contains 00.
Last but not least is a technique for branching to different routines depending on an index value. This can be used for interpreting keyboard commands or executing alternate parts of code based on a certain number. The only limitation is that all the routines must reside on the same page in memory, but a little extra programming can overcome this. This technique uses an RTS instruction as a kind of indirect jump:
$0300 LDA #HIPAGE ;high byte of address routines are on
$0302 PHA ;stick it on the stack
$0303 LDA ADDRTAB,X ;X contains the function number
$0306 PHA ;fake a 'return' address
$0307 RTS ;go to the routine
The table ADDRTAB would contain the low byte of each routine minus one due to the fact that the 6502’s program counter is incremented after the address is obtained for the RTS. This portion of code expects the function number in the X register. First, the high byte of the routines’ addresses is pushed onto the stack, then the low byte is obtained from the table and pushed onto the stack. At this point, the stack contains the address of a routine to execute just as if an instruction right before that routine had been a JSR (jump to subroutine; note: JSR actually stands for “jump and save return address”). When the RTS executes, it pulls the address off the stack, increments it, then continues program execution at that address.
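One way the same-page limitation might be lifted is to keep a second table for the high bytes. The following is a sketch only, in ca65 syntax as an assumption; DISPATCH, LOTAB, HITAB, CMD0, and CMD1 are hypothetical names, and the two routines are stand-ins.

```asm
DISPATCH:
        LDA HITAB,X     ;X holds the function number
        PHA             ;high byte of (routine address - 1)
        LDA LOTAB,X
        PHA             ;low byte of (routine address - 1)
        RTS             ;"return" into the chosen routine

LOTAB:  .byte <(CMD0-1), <(CMD1-1)
HITAB:  .byte >(CMD0-1), >(CMD1-1)

CMD0:   LDA #0          ;stand-in routines
        RTS
CMD1:   LDA #1
        RTS
```

Taking the high byte of the address minus one, rather than of the address itself, also covers a routine that happens to start exactly on a page boundary. If DISPATCH is reached by a JSR, the RTS at the end of each routine returns to that caller.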
Just when you get used to the idea of having to use all these tricks and have plastered your wall with 6502 peculiarities (as I’ve done), along comes the 65C02. The 65C02 is a revision of the 6502 that is built into the Apple IIc, is offered as an upgrade to the Apple IIe, and solves many of the problems and voids many of the hacks described above. For example, the 65C02 includes a branch-always instruction, eliminating the need for the known-condition branching tricks. Also included are push and pop X and Y instructions, which eliminate the need for all the tricky register transfers or loading and storing. Another nice addition is the inclusion of increment- and decrement-accumulator instructions, so that it is no longer necessary to resort to arithmetic just for this. A few bugs within the 6502 have also been fixed in the 65C02.
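For a sense of how the new op codes read, here is a short sketch in ca65 syntax as an assumption. The .setcpu directive belongs to that assembler, other assemblers enable the 65C02 differently, and WORKER and DONE are hypothetical labels.

```asm
        .setcpu "65C02" ;ca65 directive enabling the extra op codes

        PHX             ;push X directly; the accumulator is untouched
        PHY
        JSR WORKER      ;free to clobber X and Y
        PLY             ;pull in the reverse order
        PLX
        INC A           ;increment the accumulator, no CLC and ADC #1
        BRA DONE        ;branch always; still a relative branch
WORKER: RTS
DONE:   RTS
```

BRA has the same reach as the conditional branches, so a JMP is still needed for longer distances.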
After all is said and done, has all this been purely academic? I think not. Despite its simplicity and peculiarities, the 6502 is still an attractive microprocessor for semi-dedicated machines (that is, games machines, low-priced home computers, etc.). It is a fast microprocessor with a very efficient design. And to tell you the truth, I enjoy programming the 6502 for the very quirks I’ve been complaining about. I believe there are others who feel the same way, and this if nothing else should guarantee a long and prosperous life for the 6502.
Article by Joe Holt, published by Byte, June 1985 issue
Scan and OCR by richardtoohey