All times are UTC

Assembler and coding conventions

Mats

Post subject: Assembler and coding conventions

Posted: Sun Jan 18, 2004 10:00 pm

Joined: Sun Aug 24, 2003 7:17 pm
Posts: 111

An assembler statement for the original KIM ASSEMBLER for the 6502 from 1977 had the format

(label) opcode (operand) (comments)

as follows:

Blanks are used as separators between the fields.
The label did NOT have to begin in column 1. Instead it had to be different from all the opcodes (ADC, AND,……,TYA) and could not be any of the special single characters A, S, P, X, Y (reserved). Furthermore, the first character had to be alphabetic.
In the operand field, expressions consisting of symbols and constants separated by the operators +, -, *, / were allowed. Constants could be in hex, octal, binary or decimal format.

There was certainly little practical advantage in allowing a label to start in any column. I think that nowadays it is generally accepted that a label should always start in column 1 and that this is the way to distinguish a label from an opcode that starts in column 2 or later. This way any string of characters (not containing the field separator ' ' or any of +, -, *, /) can be used as a label, the coding gets less error-prone, and the design of the assembler program itself becomes simpler and more elegant.
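
As a rough illustration of the column-1 rule described above, here is a minimal Python sketch of how an assembler might split a statement into its fields. The function name `split_statement` and the use of `;` to start a comment are assumptions made for this example; they are not taken from the KIM assembler or from any assembler discussed in this thread.

```python
def split_statement(line):
    """Split a source line into (label, opcode, operand) using the column-1 rule."""
    # Assumption: ';' starts a comment that runs to the end of the line.
    code = line.split(";", 1)[0].rstrip()

    label = None
    if code and not code[0].isspace():
        # Anything that starts in column 1 is a label, whatever it looks like.
        parts = code.split(None, 1)
        label = parts[0]
        code = parts[1] if len(parts) > 1 else ""

    # The first remaining field is the opcode; the rest of the line is the operand.
    parts = code.split(None, 1)
    opcode = parts[0] if parts else None
    operand = parts[1].strip() if len(parts) > 1 else None
    return label, opcode, operand
```

Because the column decides what is a label, the sketch never has to compare a label against the opcode list, so names such as `A` or `1` need no special handling at this stage.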

I also think that experience has shown that it is better to always stick to hex for addresses, and that offsets following + or - should always be in decimal, i.e.

LDA $ABAB (not LDA 43947 , not LDA @125653, not LDA %10111…..)

and

LDA $ABAB-13, X (not $ABAB - $D,X)

With symbols one then writes:

LDA MYLABEL

LDA MYLABEL+3,Y

It is very seldom useful to form the difference between labels referring to positions in memory, i.e.

LABEL2 – LABEL1

( but see example below)

and I think that the construction LABEL2 + LABEL1 where LABEL2 and LABEL1 are actual (2 byte) addresses NEVER makes sense!

If such differences/sums of symbols are not allowed, numerical labels not starting with an alphabetic character can be allowed, i.e.

Code:

        *=$6000
1       NOP
        .BYTE $AB
2       LDA 1 + 1

results in the byte $AB at address $6001 being loaded into the accumulator. With the original KIM ASSEMBLER

LDA 1 + 1

would instead result in the byte at address $0002 being loaded, and a number like 1 is not allowed as a label.
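
To make the difference concrete, the following Python sketch evaluates an operand under the convention proposed in this post: the base is either a `$`-prefixed hex constant or a symbol (which may be purely numeric), and an optional offset after + or - is read as decimal. The name `eval_operand`, the regular expression and the symbol table are assumptions made for illustration only; this is not the code of any assembler mentioned here.

```python
import re

# Base: a $-prefixed hex constant or a symbol. Optional offset: decimal digits.
_OPERAND = re.compile(r"\s*(\$[0-9A-Fa-f]+|[^\s$+\-]+)\s*(?:([+-])\s*([0-9]+))?\s*")


def eval_operand(text, symbols):
    """Evaluate 'BASE', 'BASE+n' or 'BASE-n' to a 16-bit address."""
    match = _OPERAND.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported operand: {text!r}")
    base, sign, offset = match.groups()

    if base.startswith("$"):
        value = int(base[1:], 16)      # hex constant
    else:
        value = symbols[base]          # a bare "1" is looked up as a label

    if sign is not None:
        delta = int(offset, 10)        # offsets are always decimal
        value += delta if sign == "+" else -delta
    return value & 0xFFFF
```

With the label `1` defined at $6000, as in the listing above, `eval_operand("1 + 1", {"1": 0x6000})` yields $6001, whereas an assembler that reads bare digits as decimal constants would compute 2.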

For immediate addressing one would in the same way use (only)

LDA #$FF
LDA #<MYLABEL
LDA #>MYLABEL
LDX #<MYLABEL - 1

i.e. using hexadecimal format for addresses and decimal notation for offsets. This way the first character ($ or something else) is what distinguishes constants from symbols. These examples should cover everything that is needed or useful.

Sometimes (but seldom) it can be useful to form statements of type

LDX # LABEL2 - LABEL1

LDX # LABEL2 - LABEL1 + 1

This only makes sense if 0 <= LABEL2 - LABEL1 <= 255, and the only application I can imagine is moving a portion of the code to a new area (for example from ROM to RAM) as follows:
(This code only works if LABEL2-LABEL1<=128; otherwise only one value would be moved!)

Code:

LABEL1  .BYTE $0A
        .BYTE $0B
…..
LABEL2  .BYTE $FF
START   LDX # LABEL2-LABEL1
LOOP    LDA LABEL1,X
        STA $0200,X
        DEX
        BPL LOOP

This type of coding is used in Lee Davidson's EhBASIC, and it is one of the reasons why that program could not be assembled with my assembler or Daryl Richter's (Lee uses TASS).
But this could alternatively have been coded as follows:
(ZP is an address in the zero page)

Code:

        LDA #<LABEL1
        STA ZP
        LDA #>LABEL1
        STA ZP+1
        LDA #<LABEL2
        PHA
        LDX #$00
LOOP    LDA (ZP)
        STA $0200,X
        INX
        INC ZP
        BNE 1
        INC ZP+1
1       PLA
        CMP ZP
        BNE LOOP

This is a bit longer, but considering how infrequent this situation is, I think this is a worthwhile price for allowing numerical labels. And this code is good for transferring up to 256 values instead of only 128.
Note the use of the numerical label 1!
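
The limit on the first (DEX/BPL) loop can be checked with a small simulation. The Python sketch below mimics only the 8-bit X register and the sign test that BPL performs; the function name `copy_with_dex_bpl` and the dictionary standing in for the destination area are assumptions for illustration.

```python
def copy_with_dex_bpl(source, x_start):
    """Mimic: LDX #x_start / LOOP LDA src,X / STA dst,X / DEX / BPL LOOP."""
    dest = {}
    x = x_start & 0xFF
    while True:
        dest[x] = source[x]        # LDA src,X followed by STA dst,X
        x = (x - 1) & 0xFF         # DEX wraps around within 8 bits
        if x & 0x80:               # bit 7 set: N flag set, BPL falls through
            break
    return dest


table = list(range(256))
moved_ok = copy_with_dex_bpl(table, 128)     # indices 128 down to 0
moved_bad = copy_with_dex_bpl(table, 129)    # stops after the first byte
```

With a starting value of 128 the loop moves indices 128 down to 0. With 129, the first DEX leaves $80 in X, which has bit 7 set, so BPL is not taken and only one byte is moved.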

These are the coding conventions I use and my assembler program is designed to these conventions.

Does anybody have any comment on this design?


GARTHWILSON

Post subject:

Posted: Mon Jan 19, 2004 4:35 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California

Some text editors have a "condense" feature which shows only the lines that have something in column 1 and omits all the rest. One of the values of this feature is that it makes it much easier to go down and find something like a certain label you may want where you don't remember the exact spelling to do a search for it. A routine may have labels that are only of local interest. For example, you may have a 20-line routine with two or three places to which the routine branches to another part of itself, and no other routine refers to those labels. It is nice then to be able to indent these labels one space primarily so the text editor's "condense display" feature doesn't show them. If you're writing an assembler, I would recommend that the labels not be required to start in column 1. It will still be able to know it's a label by the colon at the end of the label. For those times that you want to find the next so many uses of that label, you leave the colon out of the search string. If you want the text editor to go directly to that label and not stop at any other uses of it, you put the colon after it in the search string.

Sometimes it's a nuisance when an assembler won't accept a label that starts with a digit. In that case usually preceding the digit with an underscore character will do the job.

After working in Forth a lot (which does allow mixing assembly language in) and then going back to straight assembly, it's always a frustration to me that labels in assembly are so restrictive. Forth labels can include any characters other than the obvious LF, FF, CR, Space, and NUL, and a few others that would be highly impractical like BS. The flexibility makes it nice for using the characters that were so easily accessible in DOS like ° ± √ and the Greek letters that are used so much in engineering work. What would be considered operators (+, -, *, /, etc.) if they were separated from other things with a space, are also legal in names where it's all run together. This means you can have labels like /10 or HMS+ or seconds-HH:MM:SS.

Labels in assembly are for more than just program and variable addresses though. A label might be for a constant, a macro, an assembler variable (similar to an EQUate constant but it can have different values at different points in the program), and probably something else I'm forgetting right now.

I do like the assembler to be case-sensitive, or at least have the option to turn the case sensitivity on.

Which number base is most practical to use in operands depends on what it's for. It would be too restrictive to say that anything other than an address should be in decimal. I use very few numerals in my operands though, and sometimes I should probably use even less and resort more to names. This is not just for making it more descriptive of what the number is, but also to make future modification easier. For example, if adding a new hardware feature dictates that a particular output bit get moved to another port bit, you can just change it at the label, and then every mention of it throughout the code is automatically fixed. If several places in your program checked the status of a particular input bit, say VIA2 port A bit 2, and you had to change it to bit 4 or even a different port or a different VIA altogether, the ORing and ANDing you'd use to set and clear that bit will all be automatically taken care of if you use EQUates instead of numerical bit masks and addresses.

> It is very seldom useful to form the difference between labels referring to positions in memory, i.e.
>
> LABEL2 - LABEL1

Actually this (taking the address difference) is often useful for getting something like the length of a table you want to search and putting it into an operand to set up a routine's looping. I've used it for other things too. For the 65816's MVP and MVN memory-move instructions, you actually put the 16-bit length of the block to move into the accumulator, possibly with an LDA#.

In your alternate looping example for a block move on the 6502, you'll want to follow your PLA with PHA to avoid crashing with a load of PLAs and only one PHA. Then remember to do PLA at the end of the looping. OTOH, why not skip the PHAs and PLAs altogether, and do an LDA #<LABEL2 followed by CMP ZP, which is faster anyway? The PLA takes 4 clocks, and then you still have to do the PHA with another three clocks, meaning seven for each loop, whereas the LDA# only takes two clocks and no overhead before or after the loop.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

Last edited by GARTHWILSON on Mon Jan 19, 2004 3:36 pm, edited 1 time in total.


John West

Post subject: Re: Assembler and coding conventions

Posted: Mon Jan 19, 2004 12:52 pm

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 296

I'm almost struggling to find a statement that I agree with.

Mats wrote:

I think that nowadays it is generally accepted that a label should always start in column 1 and that this is the way to distinguish a label from an opcode that starts in column 2 or later.

Absolutely not. I've found that it's very useful to have only the main routine label (and its comments) in column 1. Everything else gets indented - local labels once, code twice.

Mats wrote:

I also think that experience has shown that it is better to always stick to hex for addresses, and that offsets following + or - should always be in decimal

My experience has shown that it's a bad idea to place unnecessary restrictions on the programmer, for no other reason than the assembler writer "thought it wasn't useful". It's trivial for the assembler to deal with numbers in multiple bases, and outrageous to force the programmer to use a fixed set, remembering which base to use where.

Mats wrote:

It is very seldom useful to form the difference between labels referring to positions in memory

For "seldom", read "often". Do you never need to know the size of a table?

Mats wrote:

and I think that the construction LABEL2 + LABEL1 where LABEL2 and LABEL1 are actual (2 byte) addresses NEVER makes sense!

You're placing unnecessary restrictions on the programmer again. An assembler needs a general expression evaluator anyway - why not use it?

Adding addresses is how I do double-buffering:

addr = buffer1
(do stuff at addr)
addr = (buffer1 + buffer2) - addr
(do stuff at addr)
addr = (buffer1 + buffer2) - addr
(do stuff at addr)
...
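
The arithmetic behind this toggle can be checked in a few lines of Python. The addresses `BUFFER1` and `BUFFER2` below are made-up values chosen for the example; in an assembler the sum would be a constant folded by the expression evaluator at assembly time.

```python
BUFFER1 = 0x0400   # hypothetical buffer addresses
BUFFER2 = 0x0800


def other_buffer(addr):
    """Return the buffer that is not currently in use."""
    # (b1 + b2) - b1 == b2 and (b1 + b2) - b2 == b1, so one subtraction toggles.
    return (BUFFER1 + BUFFER2) - addr


addr = BUFFER1
addr = other_buffer(addr)   # now BUFFER2
addr = other_buffer(addr)   # back to BUFFER1
```

Note that the sum of two addresses can exceed 16 bits, so an evaluator supporting this needs intermediate values wider than an address, or arithmetic that wraps consistently.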

Mats wrote:

If such differences/sums of symbols are not allowed, numerical labels not starting with an alphabetic character can be allowed

It isn't hard to write a tokeniser that allows that in any case. As long as the symbol has a non-digit somewhere in it, it can be recognised.
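
A minimal Python sketch of that idea follows: a token made only of digits is a decimal constant, a token with a `$`, `%` or `@` prefix is a constant in another base, and anything else is a symbol, even if it begins with a digit. The function name `classify` and the returned tuples are assumptions for illustration; the prefixes are the ones used earlier in this thread.

```python
import re

_PREFIX_BASE = {"$": 16, "%": 2, "@": 8}


def classify(token):
    """Return ('number', value) or ('symbol', name) for one operand token."""
    if token and token[0] in _PREFIX_BASE:
        # int() raises ValueError for digits that are invalid in the given base.
        return ("number", int(token[1:], _PREFIX_BASE[token[0]]))
    if re.fullmatch(r"[0-9]+", token):
        return ("number", int(token, 10))
    # At least one non-digit somewhere in the token: treat it as a symbol.
    return ("symbol", token)
```

Under this rule `1234` stays a decimal constant, while `12a` or `1_loop` can be used as labels without restricting the rest of the expression syntax.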

Mats wrote:

LDA 1 + 1

would instead result in the byte at address $0002 being loaded, and a number like 1 is not allowed as a label

How do you know that this isn't what the programmer wanted? Consistency is a good thing - any programming environment should follow the Law of Least Astonishment. Changing the rules for every addressing mode is not going to help anyone.

Mats wrote:

These are the coding conventions I use and my assembler program is designed to these conventions.

Does anybody have any comment on this design?

There is no chance that I would ever use your assembler. Sorry.


Mats

Post subject:

Posted: Mon Jan 19, 2004 9:42 pm

Joined: Sun Aug 24, 2003 7:17 pm
Posts: 111

To support the use of binary and octal constants in the Assembler is straightforward but why? My pocket calculator has a button to convert between binary, octal, hex and decimal in half a second. When the KIM assembler was designed in the 1970s, not everybody had such a capable pocket calculator! And when I look into programs written by others I hardly ever find anything other than hexadecimal constants. Nothing else makes sense!

The problem with decimal is just that it does not have a fixed conventional prefix like $ @ %. The conflict is between the use of constants of type 1234 and labels such as 1234. That's why the KIM assembler required labels to start with a letter! But I like labels of type 1234 (as used in FORTRAN) and do not need decimal constants. One has to choose between these two possibilities!

I also think that

ASL A
DEC A
INC A
LSR A
ROL A
ROR A

that are one byte operations with implied addressing should be written

ASLA
DECA
etc

Arguments:

-One writes DEX, DEY,INX,INY
-Operations should be 3 or 4 characters (RMBx, SMBx,BBSx,BBRx use 4!)
-A should be allowed as label

These details one sees clearly when one writes an Assembler! And it is not a matter of being lazy. Only if the assembler language syntax is as straightforward and logical as possible will the resultant Assembler be good, reliable and maintainable! This all pays off not only for the developer but just as much for the user!

And I don't think there exists any real standard yet. The MOS, the Daryl and the TASS assemblers all use (slightly) different syntax and a program must be modified (slightly) when you switch assembler!


GARTHWILSON

Post subject:

Posted: Tue Jan 20, 2004 3:34 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California

> To support the use of binary and octal constants in the Assembler is
> straightforward but why? My pocket calculator has a button to convert
> between binary, octal, hex and decimal in half a second.

The octal thing is one I never could figure out, and I doubt that anyone here uses it. I suppose it's left over from before someone thought of using the letters A-F to make it convenient to group the bits in fours. To group them in threes for octal when a byte has 8 bits is awkward to say the least.

Binary, OTOH, is convenient sometimes when you want an operand to show individual bits where the overall resulting number is of no interest. It might be a byte to store in the data direction register of an I/O port for example, and you want an almost graphic representation of which bits are inputs and which are outputs.

> The problem with decimal is just that it does not have a fixed conventional prefix like $ @ %

The one notation I really dislike is C's 0xF2 for example. It's like saying I'm 0044. years old, or makes it look like an odometer where the hundred thousands digit is unknown but you assume that at least the car has not reached a million miles.

> But I like labels of type 1234 (as used in FORTRAN)

Wow— I haven't looked at FORTRAN in so long I had forgotten about that! As I mentioned earlier, it would sometimes be nice if labels were allowed to start with a digit, but the whole idea is to make the label both descriptive and short. If you use just a number for a label, will you know for sure that you haven't already used that one somewhere else in the program? If it's just for a really short branch and a description of what that point does is not necessary, sometimes you can simulate higher-level language structures by using macros, and do away with the need for a label altogether.

[Edit: I have an article on forming and using structure macros, with accompanying C32 assembler source code for the 65c02, at http://wilsonminesco.com/StructureMacros/index.html. Most of the local branching labels are gone, as the macros take care of the addresses to branch to, and the structure is more visible just from indentation and descriptive names like "BEGIN...WHILE_EQ...REPEAT" or "IF_ZERO...ELSE_...END_IF". The benefits are: more clarity, better control, fewer bugs, usually with absolutely no memory or performance penalties.]

> I also think that
>
> ASL A
> DEC A
> INC A
> LSR A
> ROL A
> ROR A
>
> that are one byte operations with implied addressing should be written. . .

Some of these do already have "official" aliases, like INA and DEA.

> -Operations should be 3 or 4 characters (RMBx, SMBx,BBSx,BBRx use 4!)

Personally, I prefer that the mnemonic be kept at 3 characters and the "x" here be specified in the operands. Going back to my earlier example of several places in the program having to check an input bit of a port and then you find you have to move that input to a different bit, the change of the constant will then fix all the op code occurrences to specify the right bit, without requiring a change of mnemonic. If a particular assembler doesn't support this, it can probably be synthesized again with macros.

Also a personal preference, not necessarily the only way to do it—I also like to keep the mnemonics all at 3 characters so anything that could otherwise look like a mnemonic but has a different number of characters is automatically understood to be a macro call.

One feature I like on the 2500AD assembler that I had to jury-rig with macros in my C32 assembler is the directives for a large piece of comments. For example, if a particular routine needs to be prefaced with a couple of paragraphs and maybe a diagram of the what's, why's, and wherefore's, it's nice to not have to put semicolons down the left edge, especially if editing a paragraph changes all the line lengths! The comment directives allow you to just tell it where a comment block starts so the assembler skips everything up to the end-of-comments mark many lines later.

Yes, it would be nice if there were a standard such that going from one assembler to another did not require modifying the code a little here and there; but as soon as someone tries to impose a standard, I suppose someone will jump up and down about some nice feature that got ruled out. I guess this is why committees work on these things before giving them to ANS to standardize.

Different people will have different needs based on the kind of work they do, requirements of the team they work with, how the source code may need to be INCLuded with other files, how it's linked, and what kinds of programmer's text editors they use. The best way to help the greatest number of users is to make things ultra-flexible and then do an outstanding job of documenting and supporting the assembler.



Mats

Post subject: Multipass Assembler

Posted: Thu Jan 22, 2004 10:14 pm

Joined: Sun Aug 24, 2003 7:17 pm
Posts: 111

Probably a unique feature of my assembler!

Normally an assembler does its job in two passes. In the first pass the forward referring symbols cannot be resolved but when this first pass is done all symbols should be defined and available for the second and final pass.

This is obviously true for genuine labels but not necessarily for other symbols. There could possibly be recursive forward referencing of type

a=b
b=c
c=$AB

for which 3 passes are needed.

The logical approach:

Make successive passes until either
-all symbols are defined (success)
or
-not all symbols are defined but the last pass did not increase the number of defined symbols (failure)

This way it could even happen that only one pass is needed, namely when there is no forward branch or other forward reference.
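
The stopping rule above (repeat until everything is defined, or until a pass makes no progress) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: each definition is either an integer or the name of one other symbol, names are unique, and the function name `resolve_by_passes` is made up for the example.

```python
def resolve_by_passes(definitions):
    """Resolve (name, rhs) pairs in source order; return (values, passes used)."""
    values = {}
    passes = 0
    while True:
        passes += 1
        defined_before = len(values)
        for name, rhs in definitions:
            if name in values:
                continue
            if isinstance(rhs, int):
                values[name] = rhs              # plain constant
            elif rhs in values:
                values[name] = values[rhs]      # reference that is now known
        if len(values) == len(definitions):
            return values, passes               # success: all symbols defined
        if len(values) == defined_before:
            # Failure: this pass defined nothing new, so further passes cannot help.
            raise ValueError("unresolvable symbols remain")


values, passes = resolve_by_passes([("a", "b"), ("b", "c"), ("c", 0xAB)])
```

For the `a=b`, `b=c`, `c=$AB` example this takes three passes, and a single pass suffices when the same definitions appear in the opposite order.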

Is this a new idea?

For the original MOS assembler no symbols could be defined using forward referencing!


John West

Post subject: Re: Multipass Assembler

Posted: Fri Jan 23, 2004 3:03 pm

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 296

Mats wrote:

a=b
b=c
c=$AB

for which 3 passes are needed.

You don't need three passes for that. The first pass finds the definitions of all symbols (c's definition will be a value. a and b will be expressions). The second pass uses the definitions from the first pass to evaluate expressions. The expression evaluator will be recursive, but it will be anyway. It doesn't need multiple passes over the entire source.
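
A minimal Python sketch of this two-pass scheme, under the simplifying assumption that a definition is either an integer or the name of another symbol: the dictionary plays the role of the table built in pass one, and `evaluate` is the recursive step used in pass two. The names `evaluate` and `definitions` and the circular-reference check are additions for illustration.

```python
def evaluate(name, definitions, values, active=()):
    """Recursively evaluate one symbol, caching results in `values`."""
    if name in values:
        return values[name]
    if name in active:
        raise ValueError(f"circular definition involving {name!r}")
    if name not in definitions:
        raise KeyError(f"undefined symbol {name!r}")
    rhs = definitions[name]
    if isinstance(rhs, int):
        result = rhs
    else:
        # Follow the reference, remembering which symbols are being evaluated.
        result = evaluate(rhs, definitions, values, active + (name,))
    values[name] = result
    return result


# Pass one would collect these while reading the source once.
definitions = {"a": "b", "b": "c", "c": 0xAB}
values = {}
for symbol in definitions:          # pass two: evaluate on demand
    evaluate(symbol, definitions, values)
```

The order of the definitions in the source does not matter here, since each symbol is evaluated on demand from the stored definitions.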

The first pass needs to keep track of PC, so it can assign values to labels. Some instructions can be different sizes depending on the value of their operands (zero page or absolute addressing, for example). If the value is not yet known, it's traditional to assume the larger instruction. You could use multiple passes to resolve these, but in practice there's little point.

Pass two then has to remember that the larger instruction was used, even if it now knows that the smaller one is suitable. Probably everyone's first assembler has this bug.
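
The bug can be shown with a toy layout calculation in Python. Everything here is a simplified assumption for illustration: statements are `(label, mnemonic, operand)` tuples, `EQU` stands in for an equate, and instructions are 1 byte (no operand), 2 bytes (zero page) or 3 bytes (absolute).

```python
def statement_size(operand, symbols):
    """Size in bytes, given what is currently known about the operand."""
    if operand is None:
        return 1                       # no operand, e.g. RTS
    if operand in symbols and symbols[operand] < 0x100:
        return 2                       # zero-page form
    return 3                           # absolute form; also assumed when unknown


def pass_one(source, origin):
    """Assign label addresses and record the size chosen for each statement."""
    symbols, sizes, pc = {}, [], origin
    for label, mnemonic, operand in source:
        if mnemonic == "EQU":          # hypothetical equate: label = constant
            symbols[label] = operand
            sizes.append(0)
            continue
        if label is not None:
            symbols[label] = pc
        size = statement_size(operand, symbols)
        sizes.append(size)
        pc += size
    return symbols, sizes


def pass_two_addresses(source, origin, symbols, sizes, reuse_sizes):
    """Address at which pass two emits each statement."""
    addresses, pc = [], origin
    for (label, mnemonic, operand), recorded in zip(source, sizes):
        addresses.append(pc)
        if mnemonic != "EQU":
            pc += recorded if reuse_sizes else statement_size(operand, symbols)
    return addresses


SOURCE = [
    ("START", "LDA", "PTR"),   # PTR is not yet known during pass one
    ("NEXT", "RTS", None),
    ("PTR", "EQU", 0x10),
]
symbols, sizes = pass_one(SOURCE, 0x6000)
correct = pass_two_addresses(SOURCE, 0x6000, symbols, sizes, reuse_sizes=True)
buggy = pass_two_addresses(SOURCE, 0x6000, symbols, sizes, reuse_sizes=False)
```

Pass one places `NEXT` at $6003 because it assumed three bytes for `LDA PTR`. If pass two re-decides the size now that `PTR` is known to be in zero page, it emits `NEXT` at $6002, which no longer matches the symbol table.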
