OS/A65 - a Multitasking/Multithreading Operating System for 6502 computers

This page contains an article originally published in Commodore Hacking Issue#13 in July 1996. It describes the GeckOS operating system.

Table of content

Introduction
What Should a Real OS Do?
Modern Kernel Design
The OS/A65 Operating System
Operating without an MMU
Port to the C64
Port to the C128
Conclusion

Introduction

In 1989, I first thought about building a self-designed computer. I already had some experience with 6502 based computers. A friend of mine and I had been trying to build a telephone line switch computer based on the 6502. Although the project never succeeded (well, to a certain extent it worked, but then we always got new ideas...), the project gave me an idea of what an OS should be capable of.

With my homebrew computer, I not only wanted to implement one of those 'simple' OSes as in the C64 or other 6502 based computer, but I also wanted to go a step further and do a real multitasking, microkernel design OS. This constrained the hardware design to allow memory mapping of key memory locations, including the 6502 zero-page and stack.

What Should a Real OS Do?

A real operating system has four major parts that handle the input/output, filesystems, memory management and process handling. At the very least, a "real" OS includes some form of multitasking :-)

Process Management

Process management forms one block of an OS. A multitasking operating system requires more administration than a single-tasking OS. A process, or task, can be seen as a set of allocated resources. These resources include memory pages, swap pages, open files, and even the CPU, if the task is active. The CPU is the processing element that executes the given program using the allocated resources. Therefore, the CPU state has to be saved if a task is interrupted. This allows undisturbed continuation after the interruption is handled. For each task, the allocated resources have to be registered and freed. As the CPU can be allocated to only a single task at a given time, it must be shared among all the active processes. So, in order to create the illusion of executing multiple processes at the same time (pseudo-parallelism), the CPU has to be assigned to one task after another, at a speed that achieves this illusion. If the assignments happen too slow, the illusion is lost, but if the speed is too fast, the CPU spends all of its time administering the tasks and not enough time executing the tasks. The same concepts hold true for multiprocessor computers, except that such a machine can achieve parallel operation on as many tasks as there are CPUs in the system.

A scheduler interrupts the CPU after a certain time to allow the CPU to be assigned to another task. If the scheduler interrupts the task itself to schedule a new task, the system is called preemptive. If the task has to give the CPU back to the system, it is called cooperative multitasking, like in MS Windows (tm). of the two, preemptive is preferred, as cooperative multitasking fails when a single process forgets or is unable to relinquish control of the CPU. If such a scenario occurs, the computer is "blocked".

I/O

As the second part, I/O provides a uniform interface to all peripherals, including character devices (serial lines, parallel printers), or block devices (disk drives). These services are normally provided by device drivers, which, in some operating systems, are even loadable. One problem is the communication between device interrupt routines and the rest of the system. Andrew Tanenbaum, in _Operating Systems, Design and Implementation_, says that, "Interrupts are an unpleasant fact of life. They should be hidden away, deep in the bowels of the system, so that as little of the system as possible knows about them." Nevertheless, interrupts are necessary to handle time critical operations, like providing new data to serial lines. Provisions must be taken to avoid data corruption by an interrupt routine and a program (or the kernel) using the same memory locations at the same time. So, even if you don't like interrupts, you have to use them.

Filesystem

As the third part, the filesystem provides user-level abstraction of I/O. Files store information of any kind. It is the most visible part of the OS. The naming conventions make a big part of the OS view for the normal user. (Remember the 8+3 character filename length restriction in MS-DOS filesystems?) The filesystem itself provides a standard interface to the user, although the underlying structure (i.e. how files are stored) may differ on different devices. In UNIX operating systems, even devices can be used as files and are represented by special entries in the directory structure (on the newest version of Linux (pre2.0.) even files can be used as filesystem (that hold files that can be used as filesystem (that hold files.. Ooops ;-))). I will not go further into this issue, but how a filesystem is organized can sometimes become a religious war among their respective followers. Since a filesystem keeps all internal structures to itself, it is possible to mount differently structured filesystems in one system.

Memory Management

As the final part, memory management keeps track of which parts of the memory are in use and which are not. Memory can be allocated when needed and is freed for other uses when no longer needed. Modern systems use the concept of virtual memory. Virtual memory specifies a system that uses a translation table between the CPU and the real memory locations. When the CPU tries to access a certain memory address, the address given in the opcode does not reflect the real address used to access the memory chips. Instead, the translation table is used to look up the real memory address from the `virtual' address given in the opcode. So, if there is no appropriately sized contiguous memory block available in real memory, such a block can be built using smaller chunks by setting up the translation table for the task. The lookup is done by the memory management unit (MMU). Software called a memory mapper is used to load and change the table. It loads the table with the values set up for each task. So the same opcode address in two different tasks accesses very different memory locations in the RAM.

More sophisticated memory managers even do swapping. The memory manager allows a task to allocate more memory than actually available. If a memory location that is not available is accessed, the CPU is trapped (the ability to do this cleanly was one of the (IMHO very few) additions from the Motorola 68000 to the 68010 CPU). The memory manager then saves (swaps out) another memory page to disk and uses the now free memory. The CPU can then continue. If a swapped out memory address is accessed, the CPU is halted again and the page is swapped in again - swapping out another page if necessary. Clearly this slows the whole thing down, but then virtual addresses are a very nice feature. You can hide the pages used by other tasks or map the same memory to several tasks, making it shared memory.

In the CBMs...

These inclusion of these features implies that all resources can be assigned equally to each task. As there are problems with this in the 6502 (think of the stack), another concept should at least be mentioned. The IBM `Virtual Machine' (VM/*) series of operating systems emulates the entire computer's hardware resources for a single task (i.e. a task doesn't talk to the system via system calls, but by writing data into some I/O registers). These register accesses are trapped and appropriate action is taken. This means that the task can behave as if it owns the entire machine. This also means it must load its own OS to handle disk and other I/O (the second part of the "VM/*" naming scheme).

The Commodore PET and its successors, the VIC, C64 and 128, already contain some functionality of a "real" OS. On these machines, a single interface allows uniform file access across different devices (tape, disk, console). All of them are accessed via the standard OPEN / CKOUT / CHKIN / CLOSE system calls. However, I/O comprises only one part of an OS, as defined above. The Commodore 8 bit computers are single CPU, singletasking systems (for exceptions see below). Therefore, no process management is necessary. In addition, there is no memory management. All memory is assigned to the single running process. (Although sometimes the need for multiple $cXXX pages seems pressing.) The filesystem, an important part of an OS, is put into the floppy drive on Commodore 8-bit computers and is accessed via standard I/O over the IEEE bus.

One interesting exception is the old (IEEE488) Commodore disk drives. These drives have not one but two processors: one 6502 and a 6504 that run in parallel and share some memory. The 6504 is used as a floppy drive controller that handles the low level disk I/O. The 6502 gets the commands from the bus and processes the `filesystem' task. By writing low level commands to certain memory locations, it sends commands to the floppy drive controller (the 6504) that in turn reads and writes the disk blocks. If you look at the 1541, for example, you can see that this concept still holds true. However, in the 1541, the interrupt routine takes the role of the drive controller. Ironically, this reduction in CPUs was done to save 1541. In its effort to cut costs, Commodore forced the single CPU of the 1541 to multitask, creating a bare operating system to support drive operation.

Modern Kernel Design

Early operating systems started with a monolithic approach. i.e. all the system functions were provided with one big binary. Modern UNIX systems- even Linux, which is not derived from the original UNIX source- use this concept.

A modern kernel instead has a microkernel design. A microkernel only provides the means of communication between different processes, not doing much itself. Some implementations even have the scheduler (!) or memory manager (!) running as a separate task. The kernel calls these processes to find out about free memory pages and which task to start next. This reduces the size of the kernel and allows greater flexibility. On the downside, the microkernel designs forces more messages to be transferred, slowing down operation somewhat.

One `famous' microkernel implementation is the current Mach microkernel. This kernel, and its derivatives, has been ported to many platforms. The PowerPC Platform OS/2 is based on a mach derived microkernel, as well as Linux for PowerPC Macintosh (mklinux). But, these are relatively simple ports of already existing operating systems. These mach `single servers' don't allow alternate OS system to run alongside or instead of themselves. On the other hand, the GNU Hurd operating system exploits the mach design to allow any server to be replaced by another.

The OS/A65 Operating System

Now let's get from theory to practice...

The Kernel Implementation

When it comes to hardware design, the 6502 has a big advantage: It is a very simple CPU. With only a few support ICs, it is possible to build a fully functional computer (neglecting video and sound capabilities). On the other hand, the simplicity of the CPU has drawbacks. The 6502 has only three multi-purpose registers, and all are 8 bits. As such, none can hold a complete 16 bit 6502 memory location. Even the stack pointer is 8 bits, restricting the stack to the 256 bytes from $0100 to $01ff. The stack size and the absolute addresses are a severe limitation if you intend to develop a multitasking OS on this machine.

Because I was developing a new system, I could do anything I wanted to get around this problem. I solved the stack problem by using an MMU, a Memory Management Unit. (Although the used chip, the 74ls610 is stated to be a `Memory Mapper' for paged memory mapping, I call it a `Memory Management Unit'...). The upper 4 address bits are used to select one of 16 8-bit registers. (The 74ls610 has 12-bit registers, but only 8 bits are used, for obvious reasons.) The output of the registers were then used as the upper 8 address bits, extending the total accessible memory to 1 MByte. The CPU could switch each 4 kByte page to any of the 256 pages available by changing the register values in the MMU. Oops - just introduced virtual addresses to the 6502 ;-)

For each task, new memory is allocated and saved in the task's page table. When a task is activated, the MMU registers are loaded with these values, giving each task its own memory environment. In the described OS, the memory `manager' is part of the kernel, although a quite independent part. The virtual addresses in the opcodes are translated to the real addresses through the contents of the MMU registers.

The tasks are handled by the environment routines. These routines set up the environment tables used by the scheduler. The (round robin) scheduler performs the task switching and decides which task to run next. Preemptive multitasking is achieved by using the interrupt to switch between different tasks. The most important routines are the two kernel entry and exit routines. These sub-routines have to switch the pages and the stack pointer as well as preserve all other register values.

The tasks providing filesystem services register with the filesystem manager. They are then assigned drive numbers. Although UNIX filesystems are virtual, where a user can reconfigure the system at any time, developing such a system for the 6502 would overly complicate matters. Different filesystems can then be used at the same time with different drive numbers. The drive numbers are translated by the filesystem manager when passing the message through to the filesystem task. Currently `fsiec' for IEEE488 (parallel IEC-bus) interfaced CBM disk drives, `fsibm' (for PC style disks) and `fsdev' for using devices as files are provided.

The interface to the hardware is provided by the devices. Devices are simply stripped off tasks and are called as subroutines only. A device- filesystem (`fsdev') task translates filesystem requests to the device interface, so that any device can be used like a file. The general structure can be seen in Fig.1.

    ---------- --------- --------- ------ -------
    |  fsdev | | fsiec | | fsibm | | sh | | mon | tasks...
    ---------- --------- --------- ------ -------
    ---------------------------- -------------- ----- ---------- --------
    |          |               | |    fsm     | |   | |        | |      |
    |          |      env      | -------------- |   | |        | |      |
    |          |               ------------------   | | stream | | mem  |
    |          |                                    | |        | |      |
    |          -------------------------------------- |        | |      |
    |             devices                           | |        | |      |
    ------------------------------------------------- ---------- --------
    --------- ------- ----------- ----------
    | video | | par | | spooler | | serial | devices...
    --------- ------- ----------- ----------

Fig.1: General OS structure. The devices and tasks make up the features of the system, while the kernel provides communications. (fsm = filesystem manager, env = environment handling, task switcher)

In addition to executing code within the task, tasks also need to execute to communicate with other tasks or components of the OS. To communicate between tasks, a send/receive interface is provided. Using a rendezvous technique (the sender blocks till the message can directly be copied to the receiver and vice versa) the mechanism is kept simple, as no buffering is involved. Semaphores can be used for synchronization between different tasks. Data streams are used to pass data between tasks, and even between tasks and devices. Each task has a standard input, output, and error streams opened upon creation, analogous to the stream in UNIX systems. The shell can even redirect or pipe the output.

Program examples

The shell is a good example to show some of the capabilities of the system. As already mentioned, each task has three specially assigned streams. Filesystem tasks don't use them (and have them set to an ignored stream), but shells normally get started with these streams connected to a terminal device or a serial line device. The streams are normally opened by the task that `forks' the new task. On boot, the ROM contains some hints about which device number to open for a program. When a new task is started with a shell command, the shell has to open the devices. Normally the standard input and output streams used by the shell itself are registered for the new task. However, if given on the command line, other files can be opened and the streams for these files used as stdio streams.

When a file has to be opened, an OPEN message is sent to the filesystem manager. This part of the kernel translates the drive number and forwards the message to the filesystem task. The filesystem then tries to open the file and sends a reply message. The originating task provides a stream number with its first message. If the filesystem task succeeds in opening the file, it uses the provided stream to read or write the data to. If the file ends, the writing task closes the stream, which is recognized by the other end when there's nothing more to read. This works for read only and write only opens, but not for read/write opens.

Problems

Bootstrapping was the first major problem. How do you start a new computer and debug its OS if don't have an OS on the computer? From earlier systems I already had a small monitor program - directly burned into an EPROM - able to load binaries through a serial line. Getting the MMU (74ls610) was the second problem, because it was on the CoCom list, and it was not allowed to export to eastern countries. (Although I don't live in an eastern country, this posed some difficulties...)

After defining the necessary interfaces between kernel and tasks and kernel and devices, the design was quite straightforward, actually. One problem was the small number of registers in the 6502. For some of the kernel routines, as well as for the send/receive interface it was necessary to define a special buffer. This buffer is at an absolute address at $02XX, which is the same for each task. For systems with an MMU, this is not a problem after all. But it showed out to be a significant problem when porting the OS to systems without MMU, like the C64 (see below).

Operating without an MMU

After the system worked well with an MMU, I decided to build a stripped down version for systems without an MMU to better fit some `embedded applications' I had in mind. The system without an MMU is much more a multithreading than a multitasking system. Threads, as opposed to tasks, share the same memory, thus being able to change variables and data of other threads. But, on the other hand, two identical programs cannot run at the same time as with an MMU, unless they know they will together ahead of time.

The problem lies within the limited stack size of the 6502. Without an MMU, it is not possible to remap memory pages, especially the page with the stack in it. So the stack is divided into several parts, limiting the stack size of each thread, of course. Another problem is global, absolute addresses - like the send/receive buffer for example. As it would be too much of a rewrite and memory wastage to give each thread its own buffer, the send/receive buffer is now protected by a semaphore. A sempahore is a construct that allows exactly one thread to be in a certain routine or manipulate the protected data at a time. Semaphores originate from the railways, where it is important not to have two trains on the same rail, running in opposite directions...

Port to the C64

In addition to lacking an MMU, the Commodore 64 posed other porting problems. Only small changes had to be made to the kernel. The C64 kernel required an interrupt source for task switching. The video device had to be changed to support the C64 keyboard map and video interface. The hardware cursor used in my homebrew computer was replaced by a software cursor. The IEEE488 filesystem was first ported to the IEEE488 interface for the C64 and then to the C64 serial port. When stress testing the system I realized that I still hadn't ported the STDIO library - a few low level subroutines that make life easier. The library was mapped to most tasks and was called from the task environment, not from inside the kernel. Unfortunately, it used global variables - which broke the library when running on a multithreaded system without an MMU. Therefore, some routines have been changed, while others can only be protected by a semaphore.

Port to the C128

Well, the C128 has more memory and even the capability of remapping the stack and zero page to other locations. In a simple expansion of the C64 version, this could be a way to raise the limited stack size to the full possible 256 bytes. Then, other ideas come to mind. The original memory management is made for a system with MMU and is quite useless without an MMU. What is missing is a call to get a contiguous memory block of more than a memory page in size. Then such a large block could be allocated for a new task to load the binary. The binary itself must then be relocated to fit the new address range. Unfortunately, plans to extend the system calls or add relocation capabilities do not exist at this time.

Conclusion

The OS/A65 operating system provides multitasking and multithreading capabilities with a modern kernel design for a 6502 CPU. The OS can be used from embedded applications to desktop systems. A shell provides modern I/O redirection and piping capabilities. Filesystems for Commodore disk drives and PC-style floppies are available. For me, it was a real adventure to design a completely new computer and operating system the way I wanted them designed. I also learned a lot about operating system design - maybe you have learned a bit as well. If you are interested in it, more information is available at the GeckOS page

Disclaimer

All Copyrights are acknowledged. The information here is provided under the terms of the GNU Public License version 2 unless noted otherwise.

Return to Homepage

Last modified: 2010-01-07