6502.org • View topic - 65816 Hardware Memory Protection

All times are UTC

65816 Hardware Memory Protection

Page 1 of 2

[ 17 posts ]

daivox

Post subject: 65816 Hardware Memory Protection

Posted: Sun May 01, 2005 7:23 am

Joined: Sat Sep 04, 2004 4:17 am
Posts: 30
Location: Last Ninja 2: Basement

I don't know a ton so this is all theoretical, but I believe I have generated a rough method of adding memory protection to a 65816 and would like to spark discussion on this.

This won't work on the 6502 because it lacks an ABORTB pin and interrupt that cancels the currently executing instruction.

The basic idea is to intercept the address bus and compare it, via some gates, to an eight-bit latch or to the outputs of a port on a CIA or VIA. If the bank on the address bus (A16-A23) matches the set protection range while protection is enabled (probably by cutting out A16 and protecting 256K or by using another I/O port), then the R/W line (which this mechanism intercepts on all transactions) would be disconnected, the ABORTB pin triggered, and the protect enable toggled off. This allows the ABORT interrupt to take effect without re-triggering the protection until it is enabled again. The ABORT handler can terminate the process that was executing at the time of the abort and continue OS operation. This protection mechanism would provide a "user mode" and "kernel mode" like modern i386, SPARC, 680x0, PPC, etc. systems.

Actual implementation would be a bit heavy on gates, probably best to do with FPGA(?). NAND gates on the address bus and the latch or port can provide the ABORT + disable protection + cut R/W for one cycle signal. Throw some flip-flops in there for toggling. I'm sure it can be done but I am a bit more of a software guy.
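As a rough illustration of the compare-and-abort logic described above, here is a minimal Python sketch of one bus cycle. It models only the behavior, not the gates; the class name, the `enabled` flag and the return convention are assumptions made for the example.

```python
class BankProtector:
    """Toy model of the bank-compare protection latch (all names hypothetical)."""

    def __init__(self, protected_bank):
        self.protected_bank = protected_bank & 0xFF  # value held in the 8-bit latch
        self.enabled = False                         # protect-enable flip-flop

    def bus_cycle(self, address, is_write):
        """Return (abort, write_allowed) for one bus cycle on a 24-bit address."""
        bank = (address >> 16) & 0xFF                # A16-A23
        hit = self.enabled and bank == self.protected_bank
        if hit:
            self.enabled = False                     # so the ABORT handler can run
            return True, False                       # assert ABORTB, block the write
        return False, is_write
```

The OS would set `enabled` just before handing control to a user task; the first access to the protected bank then aborts and leaves protection off until the OS re-enables it.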

Let's talk about this, or shoot it down in flames. Either way I'd like to know if this thinking was worth the time :)

GARTHWILSON

Post subject:

Posted: Wed May 04, 2005 4:59 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California

We have a few people on this forum who would know about memory protection, but maybe they have not been checking in regularly. It goes in waves for many of us as work and other demands fluctuate. I can't answer your question, but I would wonder if you really need memory protection since I doubt that you're going to be running a multi-user setup on an '816, and each task can have its own bank or set of banks anyway. Although the '816 was never intended to be a workstation-class processor or powerful graphics machine, I think that on the lower scale it was intended for, it looks quite capable of true multitasking and address-agnostic code. If your OS controls the bank and DP usage, each task can start at address 0 in its bank. It's not the best use of memory, but you probably won't have more than a few tasks running at the same time anyway. Putting multiple tasks in one bank can be done too, but it might defeat some of your purposes. I'd be interested to hear what you would like to do with it.

daivox

Post subject: 65816 Hardware Memory Protection

Posted: Wed May 04, 2005 2:59 pm

Joined: Sat Sep 04, 2004 4:17 am
Posts: 30
Location: Last Ninja 2: Basement

Despite the fact that you can stick programs in separate banks, they can still access the entire system as they please. I was just thinking that any program that happens to have a bug that mangles the OS or uses I/O ports directly without permission should be blocked from making those accesses without doing so through what would become the supervisor-mode part of the system through this circuitry. If the system had a buffer overflow in a small web server application, for example, the exploitation of this buffer overflow could be limited in scope with memory protection mechanisms.

It may be a bit silly-sounding, but it was a random thought for me and I wanted to see where it could go.

TMorita

Post subject:

Posted: Fri May 06, 2005 6:42 am

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214

You've basically described how the read-enable, write-enable, and execute-enable bits on an MMU work.

Toshi

wirehead

Post subject:

Posted: Fri May 06, 2005 5:41 pm

Joined: Wed Mar 24, 2004 6:32 pm
Posts: 59
Location: Bay Area, CA

If I'm understanding stuff correctly, the 65816 can do everything but swap. And I know that processes will always be limited to 16 megs per process (Excluding software-driven paging)

My current thinking on the "best" way to do a cheap MMU is probably a piece of SRAM, run off of the top lines of the address bus. This lets you do all kinds of things. So, for the point of argument, let's say that you run the top 8 bits of the address bus, giving you 64k "pages".

You can set up some logic on the outputs of the SRAM to compare one of the bits to various signals. This gives you the ability to make a page read-only, write-only, or no-execute.

You no longer need to map the address space linearly, which then means that IO can go anywhere. This also means that you can move around the zero page and stack much better. Remember, on the 65816 the zero page and stack must live in the first 64k of memory, which tends to get in the way of multiple processes.

There's nothing preventing you from having a 16 bit wide SRAM, instead of an 8 bit wide SRAM. Say we use 4 bits for flags. That means that we have 4 more bits of address space -- now we can fit 256 megs of RAM, but can only see 16 megs at a time. This shouldn't be much of a surprise; just about every z80 or 6502 based system did at least some of that.
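To make the arithmetic concrete, here is a small Python sketch of the lookup, assuming a 16-bit-wide SRAM whose entries hold 4 flag bits in the top nibble and a 12-bit physical page number below them. The flag names and the bit layout are assumptions for illustration; the post does not specify them.

```python
READ_OK, WRITE_OK, EXEC_OK, SUPERVISOR_ONLY = 0x1, 0x2, 0x4, 0x8  # hypothetical flag bits

def make_entry(physical_page, flags):
    """Pack a 12-bit physical page number and 4 flag bits into a 16-bit word."""
    return ((flags & 0xF) << 12) | (physical_page & 0xFFF)

def translate(mmu_sram, address):
    """Map a 24-bit CPU address to (28-bit physical address, flags)."""
    entry = mmu_sram[(address >> 16) & 0xFF]   # top 8 address lines index the SRAM
    physical_page = entry & 0xFFF
    flags = entry >> 12
    return (physical_page << 16) | (address & 0xFFFF), flags
```

With 12 bits of page number and 16 bits of offset, the physical address is 28 bits wide, which is where the 256 megs comes from, while the CPU still sees only 256 pages of 64K at any moment.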

Now, what about multiple processes having their own address space? Well, there are a few ways to go about it, depending on how fancy you want to be. The safe way to make sure that the modes are separated is to have a flip-flop that is set whenever an interrupt fires (use the vector pull line, I think) and clearable by accessing an IO address.

You can use an AND gate to compare that signal to an MMU line, to mark parts of the memory as supervisor-only. This way, you can have the MMU SRAM appear somewhere in memory always, but only be changeable in supervisor mode.
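A behavioral sketch of that flip-flop and gate, in Python, might look like the following. The I/O address used to drop back to user mode and the reset state are assumptions made for the example, not part of the proposal.

```python
class ModeFlipFlop:
    """Supervisor flip-flop: set by vector pull, cleared by a write to an I/O address."""

    CLEAR_ADDRESS = 0x00DF00  # hypothetical I/O location that drops to user mode

    def __init__(self):
        self.supervisor = True  # come out of reset in supervisor mode (assumption)

    def bus_cycle(self, address, vector_pull, is_write=False):
        if vector_pull:                       # vector pull asserted: an interrupt is being taken
            self.supervisor = True
        elif is_write and address == self.CLEAR_ADDRESS:
            self.supervisor = False

def access_allowed(supervisor, page_is_supervisor_only):
    """A supervisor-only page is reachable only while the flip-flop is set."""
    return supervisor or not page_is_supervisor_only
```

Note that the only way back into supervisor mode in this model is an interrupt, so user code cannot simply write a register to promote itself.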

Now, you can have the supervisor do a copy every time the memory map changes. This means that the first thing you do, right after receiving an interrupt, is change to the supervisor mapping. And then, before you can go back to the process, you need to change back. You can make this less painful by having your supervisor always mapped to memory, but that will be memory not usable by the user apps.

Remember, I said only the top 8 lines -- that's 256 bytes. So one possible avenue is to have a 1k SRAM chip and a latch for process number, so now you have four possible processes running at the same time. Or you can have a 512 byte SRAM chip and switch between user and supervisor modes. Remember, no matter what you do, you will run out of MMU SRAM if you have enough processes. So you can just write some logic that copies stuff in and out of the MMU SRAM but keeps the 4 most frequently accessed processes available, say. Or you just say, "There are 4 threads. Deal"
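The process-number latch amounts to supplying extra high-order address bits to the MMU SRAM. A minimal Python sketch, assuming one byte-wide entry per bank and four maps in a 1K SRAM (the function and parameter names are made up for the example):

```python
ENTRIES_PER_MAP = 256  # one entry per value of the top 8 address lines

def sram_index(pid_latch, address, num_maps=4):
    """Index into a 1K-entry MMU SRAM holding num_maps maps of 256 entries."""
    if not 0 <= pid_latch < num_maps:
        raise ValueError("PID latch value out of range")
    return pid_latch * ENTRIES_PER_MAP + ((address >> 16) & 0xFF)
```

Switching processes then means writing a new value to the latch rather than rewriting 256 entries.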

So, what is your PC doing differently? Well, a few things. The most important thing to remember is that all modern processors can restart all stopped instructions, which the 65816 cannot do. With this scheme, there's no way to recover if you reach an invalid address. On a PC, you can take a page of memory, write it to disk and let the process run. If it needs that memory again, it'll fire an interrupt, the supervisor retrieves that page from disk, and restarts execution right there. They also use this to avoid storing the entire memory map at once. So they'll have, say, the last 7 mappings stored on the chip. They compare each of the 7 mappings and then pull from memory if they can't find it in those 7 mappings. The x86 magically does this for you, but some of the other architectures don't.

You can do a hardware assisted mapping-pull, although I bet that the logic would be pretty hefty and require either a CPLD or at least 2 chips per mapping item in your cache.

The x86 also has 4k pages. Which is fine, you can trap the top 12 address lines if you want, but remember that's 4k of mappings to copy every context switch.
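The cost of smaller pages is easy to quantify. As a quick worked example in Python (the helper name is just for illustration), the number of map entries is two to the power of the address bits left over after the page offset:

```python
def map_entries(address_bits, page_bits):
    """Number of map entries needed to cover the address space."""
    return 1 << (address_bits - page_bits)

# 65816: 24 address lines
banks = map_entries(24, 16)   # 64K pages: top 8 lines trapped
small = map_entries(24, 12)   # 4K pages: top 12 lines trapped
```

So moving from 64K pages to 4K pages grows the map from 256 entries to 4096, a sixteenfold increase in what has to be copied if maps are swapped by copying.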

You can swap stuff to disk. There are two ways to do it, both of which have been done, but neither of which is nearly as nice as how a modern PC's MMU does things.

The first way is to swap out entire processes. The second way is to have a mostly-cooperative memory manager. You can allocate "pages" and then lock and unlock them. Unlocked pages may end up on disk.

The biggest problem with the 65816, other than restarting stopped instructions, is that all instructions are equally privileged. You can make sure that interrupts go where they should by taking advantage of the vector pull line. But the stack pointer can be moved around, which means that a user process can really gum things up for the supervisor if it isn't careful. I think that you can come to an acceptable solution if you have a blank page specifically to recover the location of the stack, post interrupt. You can just not move the stack in your client application, but that won't defeat a determined user.

daivox

Post subject: 65816 Hardware Memory Protection

Posted: Fri May 06, 2005 7:01 pm

Joined: Sat Sep 04, 2004 4:17 am
Posts: 30
Location: Last Ninja 2: Basement

How do you propose to write to the SRAM from the 65816 if the SRAM starts in an uninitialized state?

Since the '816 wasn't designed specifically to have fancy memory allocation, there will obviously be some bumps in the road to an MMU, and the inability to continue an aborted instruction is not a problem in the limited system I was describing: where the circuit only performs memory access protection and not actual memory re-mapping. If a program accesses a protected bank of memory, that program will be immediately terminated and de-allocated from memory, without further thought, end of story, problem solved. This keeps a rogue program (bug or buffer overflow exploits or whatever) from causing damage to the protected area(s) and keeps the system secure and relatively stable (assuming the program was not critical to system operation).

You really went all out with that last idea, though, and it gave my mind something to think about while I'm at work for the next few nights. :) Thanks!

GARTHWILSON

Post subject:

Posted: Fri May 06, 2005 8:17 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California

> and the inability to continue an aborted instruction

I think you can use the ABORT\ input for what you're talking about. The aborted instruction's address is put on the stack before the processor takes the ABORT vector so you can come back and try again when the problem is taken care of. This input was designed for virtual memory systems.
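The abort-and-retry flow can be sketched as a toy Python loop. This is not a 65816 model: each "instruction" is reduced to the page it touches, and `load_page` stands in for whatever the ABORT handler does to make the page available. Both are assumptions for illustration.

```python
def run(program, resident, load_page):
    """Run a list of page accesses; a miss aborts the access and retries it."""
    pc = 0
    aborts = 0
    trace = []
    while pc < len(program):
        page = program[pc]
        if page not in resident:
            aborts += 1
            load_page(page, resident)   # ABORT handler fixes the mapping
            continue                    # return from handler: retry the same instruction
        trace.append(page)
        pc += 1
    return trace, aborts
```

Because the saved address is that of the aborted instruction, `pc` is not advanced on a miss, so the access is simply attempted again once the handler returns. The loop relies on `load_page` actually making the page resident.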

As a side note, the move instructions MVN and MVP can be interrupted, and then resumed after the interrupt is serviced.

wirehead

Post subject:

Posted: Fri May 06, 2005 8:42 pm

Joined: Wed Mar 24, 2004 6:32 pm
Posts: 59
Location: Bay Area, CA

Ah, put another gate in at startup so that it "boots up" with the MMU SRAM and the boot ROM mapped in a defined configuration. That way, you start up and have a chance to get things in order, first.

It's probably better that way and besides, it's easier to find fast RAM than fast ROM. For the 65816 design I'm considering, it will have a clock divider so that it starts up at 1 MHz and then goes to full-speed only after the bootstrap ROM has been copied into main memory and the ROM disabled.

I'm betting that, after you add enough gates, it'll probably end up having too much latency, therefore requiring either a slower clock speed or a CPLD/FPGA. Or, sadly, both.

I'm going on what Toshi said earlier about the ABORT being buggy. I have no illusions about being right here.

wirehead

Post subject:

Posted: Fri May 06, 2005 8:52 pm

Joined: Wed Mar 24, 2004 6:32 pm
Posts: 59
Location: Bay Area, CA

GARTHWILSON wrote:

I think you can use the ABORT\ input for what you're talking about. The aborted instruction's address is put on the stack before the processor takes the ABORT vector so you can come back and try again when the problem is taken care of.

Tee hee. I just realized..... You can make sure that you can't be screwed over by the client process messing around with where the stack lives if you set up a timer and catch the processor writing the return address before calling the ISR, and then storing that value in a latch, for later read-back.

Or you can wait until the ISR transfers control to the supervisor and be very careful to never do any stack operations, just write the current stack pointer to a particular location in memory.

Reentrancy will now rear its ugly head here. If the kernel is interrupted, you need to be able to handle that differently from if the user application is interrupted -- probably the same place that you make sure that the ISR vectors are where you want them to be and that the MMU SRAM is changed over to supervisor mode. Just supply two different sets of ISR vectors.

daivox

Post subject: 65816 Hardware Memory Protection

Posted: Sat May 07, 2005 4:18 am

Joined: Sat Sep 04, 2004 4:17 am
Posts: 30
Location: Last Ninja 2: Basement

I think everyone is jumping back into the full-blown MMU discussions and why they will/won't work. I was merely discussing a system for aborting a process that causes a memory protection fault before the bus transaction that would be illegal can finish taking place. Regardless of the ABORT ability's glitchiness, one thing is for certain: it aborts the instruction and causes a special NMI with a different vector address to be invoked, and if the kernel scheduler keeps track of the PID that is currently executing, the current task PID upon an ABORT will be the one to destroy. The hardware implementation of the protection mechanism is where I wish to focus; the paging/swapping can leave the discussion completely, especially if the ABORT has a glitch that prevents restart (I assumed that it aborted and jumped without possibility of return, probably a stupid mistake but then again I suppose I don't care that much as long as it does what I read it does).
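On the software side, the kill-on-abort policy is small. A minimal Python sketch, assuming the scheduler records the running PID before dispatching a task (the class and method names are hypothetical):

```python
class Kernel:
    """Minimal process table; all names here are hypothetical."""

    def __init__(self):
        self.processes = {}      # pid -> name
        self.current_pid = None  # set by the scheduler before a task runs

    def spawn(self, pid, name):
        self.processes[pid] = name

    def schedule(self, pid):
        self.current_pid = pid

    def on_abort(self):
        """ABORT vector handler: destroy whichever task was running."""
        victim = self.current_pid
        self.processes.pop(victim, None)
        self.current_pid = None
        return victim
```

Nothing here depends on being able to restart the aborted instruction, which is the point being made: the handler only needs to know which task to remove.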

bogax

Post subject:

Posted: Sat May 07, 2005 6:35 pm

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250

wirehead wrote:

Remember, I said only the top 8 lines -- that's 256 bytes. So one possible avenue is to have a 1k SRAM chip and a latch for process number, so now you have four possible processes running at the same time. Or you can have a 512 byte SRAM chip and switch between user and supervisor modes. Remember, no matter what you do, you will run out of MMU SRAM if you have enough processes. So you can just write some logic that copies stuff in and out of the MMU SRAM but keeps the 4 most frequently accessed processes available, say. Or you just say, "There are 4 threads. Deal"

Probably I'm misunderstanding what you're saying but ..

Memory is (relatively) cheap. You're talking about 256 bytes or 512 bytes etc
for a map but fast SRAM comes in 32k, 128k, 256k, 512k (etc) chunks and this
is not something you're going to try to squeeze on to the die with the cpu.
ie it won't cost much (more) to use 512k for your maps and flags than to
use 512 bytes.

So why not step back one remove and map the maps so that you can swap
maps with pointers?

ie instead of saving and swapping 512 bytes of map just handle a one or
two byte pointer to the map(s). If you're using 512 bytes/map and you've
got 512k of fast SRAM, that's 1024 processes.

How many processes do you need? (that's not a rhetorical question)

Or to put it another way, I don't think it would be impractical to arrange
things so that you ran out of time to run processes before you ran out of
room for their mappings.
ie I think you could have enough SRAM/mappings for any reasonable/practical
number of processes. (but it would depend on the "granularity" hence the
question 'how many processes do you need')

It would cost time to set up the maps, plus whatever latency it added to the
address decoding, of course.

wirehead

Post subject:

Posted: Sun May 08, 2005 5:27 pm

Joined: Wed Mar 24, 2004 6:32 pm
Posts: 59
Location: Bay Area, CA

I don't think it's necessary to "map the maps" and it would cost you a LOT of latency. Two additional SRAM lookups is nothing to sneeze at. One is tolerable. Besides, you can do a lot with just setting your PID with a latch.

But, yeah, I simplified things. You can definitely get away with a huge 64k SRAM and map pages smaller than 64k and have a bunch of processes and probably not worry about stuff.

bogax

Post subject:

Posted: Mon May 09, 2005 6:20 am

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250

wirehead wrote:

I didn't mean an additional layer of SRAM; I mean a larger SRAM that you
only use part of at one time.
E.g. if your mapping takes 512 bytes, use an SRAM of, say, 256 x 512 bytes
so that you have 256 maps and select amongst them with a one-byte pointer.
Instead of stashing and restoring the map when you switch contexts, set the
maps up in advance and then switch among them by writing a single byte.
So you're limited to 256 different contexts.
How many is enough?
I'm guessing that whatever the number, it'll be cheap enough in terms of
SRAM to be practical.

(is that what you mean by "setting your PID with a latch" ? if so, then I guess
I misunderstood you)
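A minimal Python sketch of that arrangement, assuming 512-byte maps held in one larger SRAM and a single select byte that supplies the upper address bits (the class and attribute names are invented for the example):

```python
MAP_BYTES = 512  # size of one map, as in the post

class MapBank:
    def __init__(self, num_maps=256):
        self.sram = bytearray(num_maps * MAP_BYTES)  # 256 x 512 bytes = 128K
        self.select = 0                              # the one-byte map pointer

    def write_entry(self, map_number, offset, value):
        """Set up any map in advance, whether or not it is selected."""
        self.sram[map_number * MAP_BYTES + offset] = value

    def lookup(self, offset):
        """What the hardware sees: the selected map only."""
        return self.sram[self.select * MAP_BYTES + offset]
```

A context switch is then a single write to `select`; no map data moves. There is still only one SRAM lookup per access, since the select byte just forms part of the SRAM address.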

TMorita

Post subject:

Posted: Mon May 09, 2005 9:41 am

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214

wirehead wrote:

If I'm understanding stuff correctly, the 65816 can do everything but swap. And I know that processes will always be limited to 16 megs per process (Excluding software-driven paging)

No...most processors can swap. The PDP-11s running early versions of Unix swapped instead of paging.

wirehead wrote:

There's nothing preventing you from having a 16 bit wide SRAM, instead of an 8 bit wide SRAM. Say we use 4 bits for flags. That means that we have 4 more bits of address space -- now we can fit 256 megs of RAM, but can only see 16 megs at a time. This shouldn't be much of a surprise; just about every z80 or 6502 based system did at least some of that.

Yes, you've reinvented how an MMU works.

wirehead wrote:

The x86 also has 4k pages. Which is fine, you can trap the top 12 address lines if you want, but remember that's 4k of mappings to copy every context switch.

Nope. IIRC, the x86 supports multiple page sizes from 1k to about 32k or so, and there's a special 4m mode for I/O.

It's not just 4k, although it's the most common size used by OSes.

Incidentally, I just looked at the 65816 datasheet again, and I caught something I didn't notice before - it pushes the address of the current instruction onto the stack. This, plus the inhibiting of registers being modified, should allow an external MMU to work properly for paging.

Toshi

wirehead

Post subject:

Posted: Mon May 09, 2005 1:42 pm

Joined: Wed Mar 24, 2004 6:32 pm
Posts: 59
Location: Bay Area, CA

Yeah, Toshi, I know I was just describing an MMU.

So.... yeah, if the ABORT line works... you'll run out of latency before you'll run out of things to do with your 65816.

This also lets you do a software-filled TLB cache (picture 8 latches, 8 comparators. If one of the comparators matches the address lines to the latch, it will put the address on the bus. Otherwise, it fires an ABORT interrupt, to force the software to re-fill the TLB.) and other stunts.
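The software-filled TLB idea can be modeled in a few lines of Python. In this sketch an exception stands in for the ABORT interrupt, the page table is a plain dictionary, and replacement is round-robin; all three are assumptions made for the example rather than anything specified in the thread.

```python
class TlbMiss(Exception):
    """Stands in for the ABORT interrupt."""

class SoftTlb:
    def __init__(self, slots=8):
        self.entries = [None] * slots   # each slot: (virtual_bank, physical_bank)
        self.next_victim = 0            # round-robin replacement (an assumption)

    def translate(self, address):
        bank = (address >> 16) & 0xFF
        for entry in self.entries:      # the 8 comparators, checked one by one here
            if entry is not None and entry[0] == bank:
                return (entry[1] << 16) | (address & 0xFFFF)
        raise TlbMiss(bank)

    def refill(self, virtual_bank, physical_bank):
        self.entries[self.next_victim] = (virtual_bank, physical_bank)
        self.next_victim = (self.next_victim + 1) % len(self.entries)

def access(tlb, page_table, address):
    """Translate, letting 'software' refill the TLB on a miss and retry."""
    try:
        return tlb.translate(address)
    except TlbMiss as miss:
        bank = miss.args[0]
        tlb.refill(bank, page_table[bank])
        return tlb.translate(address)
```

The retry in `access` corresponds to returning from the ABORT handler and re-running the aborted instruction, so this scheme only works if that restart is reliable.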
