
6502 binary relocation format

V1.3 as of 31 mar 2005

(c) André Fachat (afachat@gmx.de)


Changes from V1.2

These changes have been lingering in my inbox for quite a while; only now did I find the time to update the document.

  1. New Operating System ID for CC65 generic and opencbm
  2. New mode bit for simple address schema
  3. A note about exporting an undefined reference
  4. An example for late binding
  5. An improved explanation of the relocation table entries
  6. Additional notes about character encodings and name lengths for undefined references and exported labels.
  7. A new header bit to chain o65 files.
  8. New header bits to detect 65xx CPU types.
  9. A new header bit to signal bss zeroing.

Changes from V1.1

The order for saving the undefined reference and the low byte of a high byte relocation entry has changed. This makes the OS/A65 lib6502 implementation easier.


0) Preface

With some new 6502/C64/C128 operating systems comes the need for a new binary format. In multitasking operating systems like Lunix, SMOS, or OS/A65, a binary file cannot be loaded to a fixed location that might already be used by another program. Therefore it must be possible to relocate the program to an arbitrary address at least at load time. In addition to that, more specific information might be stored in a binary executable file, like interrupt vectors for example.

This text gives a good solution to this problem for the 6502 CPU and an assembler source format to use this format in a general manner. The file format can even be used as an object file format, i.e. a format a linker can use as an input file. It is also usable as a 65816 file format. Instead of zeropage addressing modes, the 65816 has direct addressing modes, which add the contents of the direct register to the zeropage address in the opcode.

1) 6502/65816 specifics

The 6502 has the special feature of a 'zeropage', i.e. a very limited memory address range used for special addressing modes. So the format should not only provide a means to relocate absolute addresses but also zeropage addresses. The 65816 replaces zeropage addressing with direct addressing modes.

The stack space is also very limited. A binary format has to provide a measure of how much stack space is needed for the application.

Such limits should be defined as 2 byte values, even if the 6502 only has a range of 8 address bits for zeropage and stack. The 65816 behaves differently, though: it has a 16 bit stack pointer, for example. For further expandability, a 32 bit format should be provided, although the 16 bit format already suffices for the 65816.

Another problem is that an address can be 'split', i.e. you can use just the high byte or the low byte separately in an opcode. This creates the need for a special relocation table format that can cope with half-address references. The 65816 can even have three byte addresses, i.e. address in a segment and segment number.

2) binary format

2.1) General

The file differs from the known Commodore file formats, in that a lot more information is stored in the file. First the data is structured in separate segments to allow different handling of text (program code), data (like tables) and bss (uninitialized data).

Also tables are included to allow late binding, i.e. linking the file with other files at load time, and relocation, i.e. executing the file at different addresses in 6502 address space.

2.2) Segments

As already used in other formats, the assembler uses three different segment types, i.e. text (the actual program code), data (initialized variables), and bss (uninitialized variables). To have these different segments seems to be 'overdesigned', but they actually make memory handling easier in more complex operating systems or systems with virtual addresses (OS/A65, for example).

The text segment is defined to be read-only memory. This doesn't allow self-modifying code in this segment, but allows memory sharing in virtual memory architectures. The data segment actually is like the text segment, only it is allocated writable. This segment might not be shared between different processes. The contents of these two segments are loaded from the file. The bss segment is uninitialized data, i.e. upon program start, it is not defined - and not loaded from the file. This area is read-write and can be used during program execution. It is also not shared between processes. In addition to these segments, the 6502 format also includes a zeropage segment type, to allow zeropage variables to be relocated. This zeropage segment is like a bss segment, in that only the length, but not the data is saved. For the 65816 the zeropage segment changes its meaning to a bank zero segment.

The different segments hold different types of data and can be located anywhere in memory (except the zero segment, which has to be in the zeropage or bank zero, respectively). The program must therefore not assume anything about the relative addresses between different segments.

2.3) Relocation

In general, there are three ways to handle the relocation problem so far:

- Tables: have a relocation table for a text segment.
  If the relocation table is put in front of the code,
  you have to save the table in a side-storage.
  If the table is behind the code, you still cannot relocate 'on the fly'.

- Disassembling: go through the code, disassemble it and change all absolute
  addresses. Problem: needs to know or have hints about where some
  data is in the code.

- Relocation info in the code: here each address is preceded by an
  'escape' code and is relocated when loading. But this disallows block
  oriented transfer from storage media to memory.

This binary format uses the first method, with the table after the code/data. This way block oriented transfer for the text/data segment can be used. And while reading the relocation tables bytewise, the relocation can be done without the need to save the table somewhere.

2.4) External References & Exported Globals

As this file format should not only be used as an executable format, but also as object file format, it must provide a way to define references - references exported from this object and labels referenced in this object. The external references list (also called 'undefined list') lists the addresses where labels not defined in this object are referenced. The exported globals list lists the addresses that are available for other objects. The labels are named by null-terminated ASCII strings.

Even an executable file can have non-empty globals and externals lists, but only if the operating system allows this. In this case, so called 'late binding' is used to link the object with some global libraries at load time.

2.5) File extension

The proposed standard extension for the described format is ".o65" when used as an object file.

2.6) Format description

The binary format is the following:

   (
	header

	text segment

	data segment

	external references list

	relocation table for text segment

	relocation table for data segment

	exported globals list
   )
The description of the parts follows:

2.6.1) Header

The header contains the minimum needed data in a fixed struct. The rest of the necessary information is put into the header options. [Note: .word is a 16 bit value, low byte first, .byt is a simple byte. .long is a 32 bit value, low byte first. .size is a 16 or 32 bit value according to .word and .long, depending on the size bit in the mode field ]

This is the fixed struct:

   (
	.byt $01,$00		; non-C64 marker

	.byt $6f, $36, $35	; "o65" MAGIC number!
	.byt 0			; version

	.word mode		; mode word

	.size tbase		; address to which text is assembled to 
				; originally
	.size tlen		; length of text segment
	.size dbase		; originating address for data segment
	.size dlen		; length of data segment
	.size bbase		; originating address for bss segment
	.size blen		; length of bss segment
	.size zbase		; originating address for zero segment
	.size zlen		; length of zero segment
	.size stack		; minimum needed stack size, 0= not known.
				; the OS should add reasonable values for
				; interrupt handling before allocating
				; stack space
   )
The mode word currently has these defined bits:
	mode.15 :	CPU	0= 6502 	1= 65816 
	mode.14	:	reloc	0= bytewise... 	1= page(256byte)wise relocation
						   allowed
	mode.13	:	size	0= size=16 bit,	1= size=32 bit
	mode.12 :	obj	0= executable	1= object file
	mode.11 :	simple	0= (ignored)	1= simple file addresses
	mode.10 :	chain	0= (ignored)	1= another file follows this one
	mode.9  :	bsszero 0= (ignored)	1= the bss segment must be zeroed out for this file

	mode.4-7 :	CPU2	0000 = 6502 core (no undocumented opcodes)
				0001 = 65C02 /w some bugfix, no illegal opcodes
				0010 = 65SC02 (enhanced 65C02), some new opcodes
				0011 = 65CE02 some 16bit ops/branches, Z register is modifiable
				0100 = NMOS 6502 (including undocumented opcodes)
				0101 = 65816 in 6502 emulation mode
				011x = reserved
				1xxx = reserved
				^^^^ 
				|||+- Bit 4
				||+-- Bit 5
				|+--- Bit 6
				+---- Bit 7

	mode.0-1:	align	0= byte align,	
				1= word (i.e. 2 byte) align
				2= long (4 byte) align
				3= block (256 byte) align
The CPU bit tells the loader for which CPU the file was made. This has implications on the zero segment, for example. Also a system can check if the program will run at all (on a 6502 that is). The reloc bit defines if an object file can be relocated bytewise, or if it must be page-aligned. A page has 256 bytes. The restriction to pagewise relocation simplifies the relocation table and also allows simpler compilers/assemblers. The size bit determines the size of the segment base address and length entries. Currently the 16 bit size (size bit = 0) works for 6502 and 65816 CPUs.

The obj bit distinguishes between object files and executables. An object file is used as assembler output that can be linked with other object files to build an executable or an object library.

The simple bit signals the loader that the load addresses have a specific form. This form fulfills the following conditions:

        dbase   = tbase + tlen
        bbase   = dbase + dlen
These conditions ensure that the loader can actually load the text and data segments in one block, and can then use the same base address for the relocation of all three, the text, data and bss segments. The simple mode bit is optional: when it is set the conditions must be fulfilled, but if it is not set the conditions may or may not be fulfilled.
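
As an illustration only, a loader could verify the simple bit as sketched below in Python. The function name check_simple and its argument order are made up for this sketch; only the two conditions and the bit position (mode.11) come from the text above.

```python
def check_simple(mode, tbase, tlen, dbase, dlen, bbase):
    """Return True if the header is consistent with its simple bit (mode.11)."""
    contiguous = dbase == tbase + tlen and bbase == dbase + dlen
    if mode & 0x0800:
        return contiguous      # bit set: both conditions must hold
    return True                # bit clear: nothing is promised either way
```

A loader that relies on the single-block shortcut would presumably reject a file for which this check fails.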

The chain bit signals the loader that after the current o65 "file" there is another "file" appended to the actual file on disk. This way "multi-o65" files can be built. An "o65" file in a multi-o65 file is here now called "section". Chaining allows the following scenarios:

  1. Init code in a separate segment - the chain contains a first o65 section with the code to run the program, and a second o65 with initialization code that can be thrown away after init. As the init code may just as any program need zero-, data- and bss segments, a full o65 file structure is provided in the section.
  2. Larger systems have mapped memory. The chain bit allows to provide different sections to be loaded in different memory mappings in a single file.
  3. Fat binaries: A single file could hold different o65 sections, one for each different type of CPU. The loader could ignore the parts that do not fit the CPU that it is running on.
The loader may support binding undefined references in a later section to global labels exported from an earlier section. Otherwise the operating system should provide calls to access the separate sections, e.g. when they are loaded into different memory mappings. The next o65 section starts again with the header (including non-C64 marker and magic number), so sections with different characteristics may be chained. The last section must have chain=0. The chain bit is optional, if it is set and a loader does not support it, the file may be rejected right away. It is recognized that for these purposes the loader must have a means of identifying different sections and their purposes. Currently there is no simple way except using the order of the sections in the file. A more complicated way would be to use optional headers in each section.

The bsszero bit tells the loader that the executable to be loaded requires the bss segment to be zeroed out. If it is not set, then the code must not assume any special value in the bss segment (which is the default behaviour for o65 version 1.2 and below). A loader that does not support zeroing out the bss segment must reject a file with this bit set.

The CPU2 bits determine the type of 6502 CPU. 6502 core means that only the originally documented 6502 opcodes are used. The NMOS 6502 value signals that, in addition to the core opcodes, some undocumented opcodes of the NMOS version are used. The other values indicate other versions of 6502 CPUs. Please see the appendix for an additional note.

The two align bits give the address boundary at which the segments can be placed. Even the 6502 needs this, as, for example, "jmp ($xxFF)" is broken. The align bits are valid for all of the segments. [Note: if reloc=1, then align should be 3. But if align=3, reloc need not be 1, because reloc switches to a simpler version of the relocation table. The reloc bit might be obsoleted in newer versions of this format, though it should be set if necessary.]

All unused bits in the mode field must be zero.
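
The following Python sketch shows one way the mode word could be split into its fields. It is not part of the format; the name decode_mode and the dictionary keys are invented here, and the mask for the unused bits (8, 3 and 2) is simply derived from the bit table above.

```python
# CPU2 values as listed in the mode word table (others are reserved).
CPU2_NAMES = {
    0b0000: "6502 core",
    0b0001: "65C02",
    0b0010: "65SC02",
    0b0011: "65CE02",
    0b0100: "NMOS 6502",
    0b0101: "65816 in 6502 emulation mode",
}
ALIGN_BYTES = {0: 1, 1: 2, 2: 4, 3: 256}
UNUSED_MASK = 0x010C            # bits 8, 3 and 2 have no defined meaning


def decode_mode(mode):
    """Split a 16 bit mode word into named fields (hypothetical helper)."""
    return {
        "cpu_65816": bool(mode & 0x8000),    # mode.15
        "pagewise": bool(mode & 0x4000),     # mode.14
        "size32": bool(mode & 0x2000),       # mode.13
        "object": bool(mode & 0x1000),       # mode.12
        "simple": bool(mode & 0x0800),       # mode.11
        "chain": bool(mode & 0x0400),        # mode.10
        "bsszero": bool(mode & 0x0200),      # mode.9
        "cpu2": CPU2_NAMES.get((mode >> 4) & 0x0F, "reserved"),
        "align": ALIGN_BYTES[mode & 0x03],   # mode.0-1, in bytes
        "unused_clear": (mode & UNUSED_MASK) == 0,
    }
```

For example, decode_mode(0x1003) describes an object file for the 6502 core with block (256 byte) alignment.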

Note that the header size is 26 if the size bit is zero and 44 if the size bit is one.
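
To make the layout concrete, here is a minimal Python sketch that reads the fixed struct. The function name parse_header and the returned dictionary are assumptions of this sketch; the field order, the byte order and the two header sizes are taken from the description above.

```python
import struct

MAGIC = b"\x01\x00o65"   # non-C64 marker followed by "o65"
FIELDS = ("tbase", "tlen", "dbase", "dlen", "bbase", "blen",
          "zbase", "zlen", "stack")


def parse_header(data):
    """Return (fields, header_size) for the fixed struct at the start of data."""
    if data[:5] != MAGIC:
        raise ValueError("not an o65 file")
    version = data[5]
    (mode,) = struct.unpack_from("<H", data, 6)
    size32 = bool(mode & 0x2000)              # mode.13 selects the .size width
    fmt = "<9I" if size32 else "<9H"          # nine .size fields, low byte first
    values = struct.unpack_from(fmt, data, 8)
    header = dict(zip(FIELDS, values), version=version, mode=mode)
    return header, 8 + struct.calcsize(fmt)
```

The returned size is the offset at which the header option list begins.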

The fixed sized struct is immediately followed by a list of header options. Each header option consists of a single byte total length, a type byte and some data bytes if needed. A single length byte of $00 ends the header option list.

   (
  	{			; optional options, more than one allowed
	   .byt olen		; overall length (including length and type
				; byte)
	   .byt otype		; option type
	   [ .byt option_bytes ]
	}
	.byt $00		; end of options marker (i.e. option len=0)
   )
The header options currently defined/proposed are:
- Filename:
  type=0; len=strlen(filename_in_ascii)+3; content="filename_in_ascii",0
  The string contains the name of the object.

- Operating System Header
  type=1; len=?
  the first data byte is the OS type:
    	1 	OS/A65 header supplement
	2	Lunix header supplement
	3	CC65 generic module (new in v1.3)
	4	opencbm floppy modules (new in v1.3)
	[others to follow?]
  the following data contains OS specific information.
  A suggested data byte is the OS version as second byte.

- Assembler program:
  type=2; len=strlen(ass)+3; content="ass",0
  The string contains the name of the assembler resp. linker that produced 
  this file/object.
  For example (syntax see below)
     .fopt 2, "xa 2.1.1g",0
  becomes
     0c 02 78 61 20 32 2e 31 2e 31 67 00
  in the file.

- Author:
  type=3; len=strlen(author)+3; content="author",0
  The string contains the author of the file. 

- Creation date:
  type=4; len=strlen(date)+3; content="date_string",0
  The string contains the creation date in format like:
  "Sat Dec 21 14:00:23 MET 1996", where we have the day, Month, date,
  time, timezone and year. See output of `date`...
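
The option layout described above can be sketched in Python as follows. The helper names encode_option and parse_options are made up for this illustration; the length rule (length byte and type byte are counted) and the $00 terminator are from the text.

```python
def encode_option(otype, payload):
    """Build one header option: length byte, type byte, then the data."""
    olen = len(payload) + 2            # length counts itself and the type byte
    if olen > 255:
        raise ValueError("option too long for a single length byte")
    return bytes([olen, otype]) + payload


def parse_options(data, pos):
    """Return (list of (type, payload) pairs, position after the end marker)."""
    options = []
    while data[pos] != 0:              # a $00 length byte ends the list
        olen = data[pos]
        if olen < 2:
            raise ValueError("option length must cover length and type byte")
        options.append((data[pos + 1], data[pos + 2:pos + olen]))
        pos += olen
    return options, pos + 1
```

Calling encode_option(2, b"xa 2.1.1g\x00") reproduces the twelve bytes shown for the assembler option above.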

2.6.2) text and data segments

The text and data segments are just the assembled code. The only difference between text and data segments is the read/write mode of the two segments. Therefore, to be compliant to this file format, self-modifying code goes into the data segment.

2.6.3) Undefined references list

The next list is an ASCII list of labels that are referenced in this file but not defined. The list is preceded by the number of undefined labels (16 or 32 bits, according to the mode.size bit).

undef_list:	number_of_undefined_labels.s
		"undefined_label1",0
		"undefined_label2",0
		...

The character encoding and length of the names of the undefined labels should be appropriate for the target platform, that may define additional constraints. The encoding must allow zero-terminated byte arrays as string representations. To allow short loading times, the names should not be exceedingly long.
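
A reader for this list might look like the Python sketch below. The function name parse_undefined_list is hypothetical, and the names are deliberately kept as raw bytes because the character encoding is left to the target platform.

```python
def parse_undefined_list(data, pos, size32=False):
    """Return (list of label names as bytes, position after the list)."""
    width = 4 if size32 else 2                 # count follows the size bit
    count = int.from_bytes(data[pos:pos + width], "little")
    pos += width
    names = []
    for _ in range(count):
        end = data.index(0, pos)               # names are zero-terminated
        names.append(data[pos:end])
        pos = end + 1
    return names, pos
```

The position of a name in the returned list is the index used by the relocation table entries.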

2.6.4) Relocation tables

The relocation tables have the same format for the two segments, text and data. In general a relocation entry consists of the offset from the previous relocation address to the next one, the type of the relocation and additional info. Relocation covers not only adjusting the object code when it is moved to a different address, but also filling in the undefined references.

Each table starts at relocation address = segment base address -1. I.e. if the segment base address is $1000 for example, the first entry has an offset computed from base address-1 = $0fff. The offset to the next relocation address is the first byte of each entry. If the offset is larger than 254 (i.e. 255 or above), then a 255 is set as offset byte, the offset is decremented by 254 (note the difference) and the entry is started again.

{ [255,...,255,] offset of next relocation (b), typebyte|segmentID [, low_byte] }+
where typebyte has the bits 5, 6 and 7 and is one of
WORD	$80	2 byte address
HIGH	$40	high byte of an address
LOW	$20	low byte of an address
SEGADR	$c0	3 byte address (65816)
SEG	$a0	segment byte of 3 byte address
The segmentID stands for the segment the reference points to:
0		undefined
1		absolute value
2		text segment
3		data segment
4		bss segment
5		zero segment
(Of course the absolute value will never appear in a relocation table, but this value is necessary for the exported list)

If the type is HIGH, the low byte of the value is stored behind the relocation table entry, if bytewise relocation is allowed (header mode field bit 14). If only pagewise relocation is allowed, then only HIGH relocation entries can occur, and the low byte is implicitly set to zero (i.e. it is _not_ saved in the relocation table).

If the type is SEG, then the two lower bytes of the three byte segment address are stored behind the entry in the relocation table, lower byte first.

If the segment is "undefined", the typebyte is immediately followed by the two (mode size=0) or four (mode size=1) byte value index in the undefined references list. If it is a high byte relocation, the low byte is saved behind the index value. The index value determines the undefined reference, which must be looked up by the loader.

The value taken from the relocation address in the segment, together with the low byte from the relocation table (if HIGH entry) form the address used if the segment would be used unrelocated. To relocate the segment, the difference between the relocated segment base address and the segment base address from the file is then added to the above address. The result is again saved in the segment.

A zero offset byte ends the relocation table. The first offset is computed from the segment base address-1, to avoid a 0 value in the first entry.
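
The entry format can be walked through with a decoder like the following Python sketch. The function name parse_reloc_table and the tuple layout are inventions of this sketch. It also assumes that the segmentID occupies the bits below the type bits (mask $1f), which the text above does not spell out.

```python
WORD, HIGH, LOW, SEGADR, SEG = 0x80, 0x40, 0x20, 0xC0, 0xA0


def parse_reloc_table(data, pos, base, pagewise=False, size32=False):
    """Decode one relocation table; return (entries, position after the table).

    Each entry is (address, type, segment, undef_index, extra_bytes), where
    address is the unrelocated address of the byte(s) to patch.
    """
    entries = []
    addr = base - 1                     # tables start one below the base
    while data[pos] != 0:               # a zero offset byte ends the table
        while data[pos] == 255:         # each 255 stands for a step of 254
            addr += 254
            pos += 1
        addr += data[pos]
        typebyte = data[pos + 1]
        pos += 2
        rtype = typebyte & 0xE0         # bits 5, 6 and 7
        segment = typebyte & 0x1F       # assumed: remaining low bits
        undef_index = None
        if segment == 0:                # undefined: list index follows
            width = 4 if size32 else 2
            undef_index = int.from_bytes(data[pos:pos + width], "little")
            pos += width
        extra = b""
        if rtype == HIGH and not pagewise:
            extra = data[pos:pos + 1]   # low byte of the address
        elif rtype == SEG:
            extra = data[pos:pos + 2]   # two lower bytes, low byte first
        pos += len(extra)
        entries.append((addr, rtype, segment, undef_index, extra))
    return entries, pos + 1
```

Because the table is consumed strictly front to back, a loader can apply each entry as soon as it is decoded instead of collecting a list first.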

Note that direct addressing modes do not generate entries in the relocation table. Instead it is assumed that the 65816 direct register holds the correct value (i.e. zero segment base address) when running this program.

Example (for file contents see appendix C.1):

Segment Base address in file (header.tbase) is $1000. The start address of the text segment after relocation is real.tbase = $1234.

Now the first (unrelocated) address at which a relocation should take place is here:

$1222	A9 23 		lda #>vector
To compute the relocation table entry, we have to identify the address that must be relocated. This is not the opcode address $1222, but the address of the parameter of the opcode, i.e. $1223. The first relocation table entry offset is calculated from the start of the segment minus one, i.e. $0fff in this case. The offset to be stored in the relocation table therefore is $1223-$0fff=$224. This is larger than $fe, therefore the first byte in the relocation table entry is $ff, and the offset is decremented by $fe, which results in $126. This again is larger than $fe, so the next byte in the relocation table entry is $ff again and the offset is decremented by $fe, resulting in $28. This offset becomes the next byte in the relocation table entry. The offset for the next relocation table entry is then computed from $1223, because this is the last relocation address.

Now we reference the high byte of an address, let's say vector=$23d0 (not relocated), in the text segment. Therefore the relocation type becomes 'HIGH | text_segmentID = $42', which is the next byte. Because we are referencing a high byte of an address, the low byte of the unrelocated address is saved behind the typebyte in the relocation entry. This byte is missing when referencing a low byte or a full address.

The relocation table entry is now:

$ff, $ff, $28, $42, $d0.
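
The offset encoding used for this entry can be expressed as a short Python sketch. The name encode_offset is hypothetical; the rule itself (emit 255, subtract 254, repeat) is the one described above.

```python
def encode_offset(offset):
    """Encode the distance to the next relocation address as offset bytes."""
    if offset < 1:
        raise ValueError("relocation addresses must strictly increase")
    out = bytearray()
    while offset > 254:        # 255 or above: emit 255, subtract 254
        out.append(255)
        offset -= 254
    out.append(offset)         # remaining value is 1..254
    return bytes(out)


# The entry from the example: offset bytes, HIGH | text segment, low byte.
entry = encode_offset(0x1223 - 0x0FFF) + bytes([0x40 | 2, 0xD0])
```

Here entry holds the five bytes of the relocation table entry shown above.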
When actually doing the relocation, the relocation pointer is initialized to real.tbase-1 = $1233 (this value correlates to the unrelocated text segment start minus one, $0fff). Then we add the offset of $224 from the first relocation table entry, which brings us to $1457, where the parameter byte of the opcode is after loading the file to $1234. We now have to compute the new address, where vector is after relocation. So we take the unrelocated low byte from the relocation table ($d0) and the high byte from $1457 ($23).
vector_file = ($23 << 8) + $d0 = $23d0
To this value we add the difference between the address the program is assembled to and the real load address:
vector_relocated = vector_file + (real.tbase - header.tbase)
		 = $23d0 + ($1234 - $1000)
		 = $23d0 + $234
		 = $2604
From this value the high byte is then written back to the address $1457. Had we not saved the low byte in the relocation table, and only added the high bytes, we would have missed the carry bit that increments the high byte in this case!
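
The same computation, written as a small Python sketch (the function name relocate_high is made up for this illustration):

```python
def relocate_high(high_in_segment, low_in_table, file_base, real_base):
    """Return the new high byte for a HIGH relocation entry."""
    address = (high_in_segment << 8) | low_in_table      # unrelocated address
    address = (address + (real_base - file_base)) & 0xFFFF
    return address >> 8


new_high = relocate_high(0x23, 0xD0, 0x1000, 0x1234)     # $26
naive_high = 0x23 + ((0x1234 - 0x1000) >> 8)             # $25, carry lost
```

The second line shows the result of adding only the high bytes, which is off by one because the carry from $d0 + $34 is dropped.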

Had "vector" been an undefined reference instead, with index 2 in the undefined references list (the third label, as indices count from zero in the example in appendix B), we would get the following relocation table entry (assuming mode.size=0):

$ff, $ff, $28, $40, $02, $00, $00
The value computed with the above formula for vector_file is now added to the address the label "vector" really has (this must of course be looked up in an external table or list). Had the opcode been "LDA #>vector+$567", then the low byte in the relocation table would be $67, while the high byte in the opcode would be $05. Together these give vector_file = $0567, and the real address of "vector" would be added before writing back the high byte to the opcode.

2.6.5) exported globals list

The global list is a list of names, together with the target segment and the offset in the segment for each name. It is preceded by the number of exported labels. This allows the loader to allocate a table large enough, if needed. The number of labels and the offset value are 16 bit or 32 bit values according to the size bit in the header mode field. The segmentID is a byte value and the same as in the relocation table entry (see section 2.6.4).

	number_of_exported_labels.s
        "global_label_name_in_asc1",0, segmentID.b, value.s
	...

Note: an undefined reference cannot be exported. Doing this would lead to circular references, for example when linking multiple object files; therefore it is not allowed.

The character encoding and length of the names of the exported labels should be appropriate for the target platform, which may define additional constraints. The encoding must allow zero-terminated byte arrays as string representations. To allow short loading times, the names should not be exceedingly long.
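
As a sketch only, the exported globals list could be read like this in Python. The function name parse_globals is hypothetical, and the sample bytes in the usage line are constructed for illustration rather than taken from a real file.

```python
def parse_globals(data, pos, size32=False):
    """Return (list of (name, segmentID, value), position after the list)."""
    width = 4 if size32 else 2                 # count and value follow the size bit
    count = int.from_bytes(data[pos:pos + width], "little")
    pos += width
    labels = []
    for _ in range(count):
        end = data.index(0, pos)               # name is zero-terminated
        name = data[pos:end]
        segment = data[end + 1]                # one byte segmentID
        value = int.from_bytes(data[end + 2:end + 2 + width], "little")
        labels.append((name, segment, value))
        pos = end + 2 + width
    return labels, pos


labels, end = parse_globals(b"\x01\x00vector\x00\x02\xd0\x23", 0)
```

With these made-up bytes, labels contains one entry for "vector" in the text segment (segmentID 2) with the value $23d0.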

3) assembler source format

The assembler source format is a suggestion only. It will be implemented in xa65, a cross assembler for 6502 CPUs running on Unix/Atari ST/Amiga as a reference platform.

The assembler provides a way to embed absolute address code in relocatable code. This is needed when code should be copied to a specific location known at assemble time. There also is a way to make a file 'romable'. You can give the start address of the _file_ in ROM, and the assembler automatically sets the text segment start address to where the code will be in the ROM. Of course, the other segments must be taken care of with the -b? command line parameters, which set the segment start addresses.

3.1) embed absolute code in relocatable files

When the assembler is started in relocatable mode, everything is put into a .o65 relocatable file. All address references generate relocation table entries. If a "*= value" pseudo opcode is encountered, then the assembler switches to absolute mode. The following opcodes don't generate relocation table entries. If a "*=" without a value is read, then the assembler switches back to relocatable mode. The relocation program counter is increased with the length of the absolute part and the absolute code is embedded between the relocatable parts.

3.2) embed relocatable code in absolute files

This is dropped - too complicated. Should better be done with some objdump or linker programs or so.

3.2) Header options

Before any opcode (after starting in relocatable mode, or after a .reloc opcode), a header option can be set by:

	.fopt byte1, byte2, ...
The header option length is automatically set by the assembler. An example for a file author entry:
	.fopt 3, "Andre Fachat",0
The 3 is the type byte for the author header option. The last zero ends the name. The assembler can be configured to automatically include an assembler header option into a file header.

3.3) allocation of data segment/zeropage segment address space

The assembler switches between the different segments by the means of ".text", ".data", ".bss" and ".zero" pseudo opcodes. After starting in relocatable mode, the assembler is in the text segment.

The text segment contains the program code. Data holds the initialized data, while bss and zero segments contain uninitialized data for normal/zeropage address space. Everything that is between one of these segment opcodes and the next segment opcode gets into the corresponding segment, i.e. labels, assembled code etc. The text and data segments are saved in the file, while for the bss and zero segments only the length is saved in the file.

The assembler should issue a warning when a direct addressing mode is used without a zero segment address and vice versa for 65816 CPUs.

3.4) referencing data/bss/zeropage addresses

One problem with the 6502 is that it cannot load an address within one step or assembler opcode. So an address is loaded with standard byte opcodes, like "lda #<label". But how do we decide whether "label" is an address or not, and what do we do if we get something like "lda #zp_label + 12 * label2"?

The assembler is now intelligent enough to evaluate such expressions and check for:

- no address label			: ok, absolute
- one address label, only add to label	: ok, relocate
- difference between two addresses 	: If addresses in same segment, compute
					  diff and set absolute, otherwise bail
- everything else			: warning
This way there is no change in syntax. Address labels are distinguished by using the "label:" syntax, as opposed to "label = value". Also, if the assembler is capable of doing so, an address label may be defined by "label opcode", i.e. without a colon.

3.5) aligning code

The 6502 has the problem that some opcodes (e.g. "JMP ($xxFF)") are broken if the address given is at some (odd) address. But when loading a relocatable file, one cannot know if an address will be odd or even. Therefore there is a new opcode,

	.align 2
that aligns the next address at the given address boundary. Valid values are 2, 4, and 256. For the 6502 the opcode may insert NOP operations ($EA opcodes) until the alignment is reached. In addition the header align bits must be set appropriately.
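
How many filler bytes an assembler would have to insert can be computed as in this Python sketch. The function name align_padding is made up; the set of valid boundaries is the one given above.

```python
def align_padding(address, boundary):
    """Number of filler bytes needed so that address lands on boundary."""
    if boundary not in (2, 4, 256):
        raise ValueError("valid alignment values are 2, 4 and 256")
    return -address % boundary     # 0 if the address is already aligned


filler = bytes([0xEA]) * align_padding(0x1003, 4)   # NOP opcodes as padding
```

This only helps at run time if the loader honours the align bits in the header, since the padding is computed relative to the unrelocated segment base.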

4) Additional Notes

4.1 Clearance

This file is surely not the optimum and could be improved. Also the header option "assigned numbers" should be added here.

For this reason the author, André Fachat, will function as a clearing point, where problems can be discussed and numbers can be assigned.

4.2 Character Sets

Appendix

A) Additional note

A.1) Unofficially supported CPUs.

As this format has already been used for other CPUs than the 6502 or 65816, there are CPU codes that are reserved for these CPUs. Please note that these codes are derived from the current use of the file format and not any preference of the author.
        mode.4-7 :      CPU2    0000 = 6502 core (no undocumented opcodes)
                                0001 = 65C02 /w some bugfix, no illegal opcodes
                                0010 = 65SC02 (enhanced 65C02), some new opcodes
                                0011 = 65CE02 some 16bit ops/branches, Z register is modifiable
                                0100 = NMOS 6502 (including undocumented opcodes)
                                0101 = 65816 in 6502 emulation mode
                                011x = reserved

                                1000 = 6809

				1010 = Z80

				1101 = 8086
				1110 = 80286 
                                ^^^^
                                |||+- Bit 4
                                ||+-- Bit 5
                                |+--- Bit 6
                                +---- Bit 7

B) Late binding

Late binding means that during the assembler run the values of some variables are not known. Instead these variable values are filled in when the program file is loaded into the system.

As an example, consider a program that needs to access some hardware at the expansion port of the C64. The hardware is located either at IO1 ($de00) or IO2 ($df00) depending on some hardware switch. To allow a single executable to serve both cases, the program uses a variable "IOPORT" that is not defined in the program itself, but set by the o65 loader using late binding.

The program accesses the I/O port using the variable:

	lda IOPORT
When assembling this one line program, the assembler is told to accept the variable IOPORT as undefined. In xa this is done using the -L option:
	xa -R -LIOPORT -o program.o65 program.a65 
Then program.o65 contains relocatable code with an undefined reference named IOPORT. Every time the code uses this variable, the relocation table contains an entry with a reference to the label in the undefined reference table.

The resulting file looks like:

00000000  01 00 6f 36 35 00 00 00  00 10 03 00 00 04 00 00  |..o65...........|
00000010  00 40 00 00 04 00 00 00  00 00 00 ad 00 00 01 00  |.@..............|
00000020  49 4f 50 4f 52 54 00 02  80 00 00 00 00 00 00     |IOPORT.........|
The first six bytes are the non-C64 marker, the magic number and the version byte. After that the mode bits (bytes seven and eight) are all zero. The text segment starts at $1000 and has a length of 3. The data segment starts at $0400 with a length of zero, and the bss segment starts at $4000 with a length of zero too. The zero segment starts at $0004, but also with a length of zero. The minimum stack size needed is zero too.

The list of header options starts at file offset $001a. As the first byte is zero, there is no header option. After this follows the text segment containing the bytes $ad $00 $00. If the data segment size were not zero, the data segment would come here.

Then the undefined references list follows. The first two bytes (at file offset $001e) state that there is a single undefined reference, and the name of the undefined reference "IOPORT" followed by the ending zero byte is stored after this number.

Then the relocation table follows. The first byte in the relocation table is $02. As the relocation table offset starts at tbase-1 this means that the first relocation position is at the second byte (offset 1) in the text segment. The type byte $80 defines that it is a WORD (2 byte address) reference to an undefined label (segmentID 0). The next two bytes define the index in the undefined reference table, in this case $0000, which means that the reference IOPORT is referenced. The next byte (at file offset $002b) is zero, signalling the end of the text segment relocation table.

At file offset $002c the relocation table for the data segment starts. As the first byte is zero, there is no relocation entry for the data segment (obviously, as the data segment is empty). After this relocation table the number of exported globals follows. This is zero, as there is no exported global variable.

When loading the file, the loader must know in advance what value IOPORT should be assigned. This is not further discussed here. When the loader loads the file, and recognizes the name IOPORT in the undefined references table, it remembers the index of this name in the table. Then, when the relocation table contains a reference to the undefined label with the index value for IOPORT, the value for that variable is then used in the relocation.

If, for example the loader knows that IOPORT=$de00, then the text segment is relocated to

	$ad $00 $de
If the source is changed for example to
	lda IOPORT+1
one byte in the file changes:
00000010  00 40 00 00 04 00 00 00  00 00 00 ad 01 00 01 00  |.@..............|
                                               ^^
If the file is then relocated, the loader adds the value in the opcode in the text segment ($0001 in this case) to the value resulting from the reference resolution (IOPORT in this case). The resulting code then becomes:
	$ad $01 $de
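
The binding step for a WORD entry can be sketched in Python as follows. The function name bind_word is hypothetical, and how the loader obtains the value of IOPORT is left open, as in the text above.

```python
def bind_word(text, offset, value):
    """Add value to the 16 bit word stored at offset in a copy of the segment."""
    patched = bytearray(text)
    word = int.from_bytes(patched[offset:offset + 2], "little")
    word = (word + value) & 0xFFFF          # addend from the file plus the label
    patched[offset:offset + 2] = word.to_bytes(2, "little")
    return bytes(patched)


plain = bind_word(bytes.fromhex("ad 00 00"), 1, 0xDE00)     # lda IOPORT
plus_one = bind_word(bytes.fromhex("ad 01 00"), 1, 0xDE00)  # lda IOPORT+1
```

With IOPORT=$de00 the two results correspond to the byte sequences $ad $00 $de and $ad $01 $de given above.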

C) File examples

C.1 Example from section 2.6.4

The example source file is

.text
        .dsb $222,$aa

        lda #>vector

        .dsb $23d0-$1224,$55

        vector = *;

Using the command
	xa -R -o test2.o65 test2.a65
results in this file
00000000  01 00 6f 36 35 00 00 00  00 10 d0 13 00 04 00 00  |..o65...........|
00000010  00 40 00 00 04 00 00 00  00 00 00 aa aa aa aa aa  |.@..............|
00000020  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
*
00000230  aa aa aa aa aa aa aa aa  aa aa aa aa aa a9 23 55  |..............#U|
00000240  55 55 55 55 55 55 55 55  55 55 55 55 55 55 55 55  |UUUUUUUUUUUUUUUU|
*
000013e0  55 55 55 55 55 55 55 55  55 55 55 00 00 ff ff 28  |UUUUUUUUUUU....(|
000013f0  42 d0 00 00 01 00 76 65  63 74 6f 72 00 82 d0 23  |B.....vector...#|
00001400
After relocating the file with
	ld65 -bt 4660 test2.o65
to the new address $1234 (using ld65 from the xa package), the resulting file is:
00000000  01 00 6f 36 35 00 00 00  34 12 d0 13 00 10 00 00  |..o65...4.......|
00000010  00 40 00 00 02 00 00 00  00 00 00 aa aa aa aa aa  |.@..............|
00000020  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
*
00000230  aa aa aa aa aa aa aa aa  aa aa aa aa aa a9 26 55  |..............&U|
00000240  55 55 55 55 55 55 55 55  55 55 55 55 55 55 55 55  |UUUUUUUUUUUUUUUU|
*
000013e0  55 55 55 55 55 55 55 55  55 55 55 00 00 ff ff 28  |UUUUUUUUUUU....(|
000013f0  42 04 00 00 01 00 76 65  63 74 6f 72 00 82 d0 23  |B.....vector...#|
which confirms the addresses computed above.

C.x more examples

(to be done with reference assembler)