The National Semiconductor 32K family

"Elegance and regular design was a main goal of this processor, as well as completeness. It was similar to the 68000 in basic features, such as byte addressing, 24-bit address bus in the first version, memory to memory instructions, and so on (The 320xx also includes a string and array instruction). Unlike the 68000, the 320xx had eight instead of sixteen 32-bit registers, and they were all general purpose, not split into data and address registers. There was also a useful scaled-index addressing mode, and unlike other CPUs of the time, only a few operations affected the condition codes (as in more modern CPUs).

Also different, the PC and stack registers were separate from the general register set - they were special purpose registers, along with the interrupt stack, and several "base registers" to provide multitasking support - the base data register pointed to the working memory of the current module (or process), the interrupt base register pointed to a table of interrupt handling procedures anywhere in memory (rather than a fixed location), and the module register pointed to a table of active modules.

The 320xx also had a coprocessor bus, similar to the 8-bit Ferranti F100-L CPU, and coprocessor instructions. Coprocessors included an MMU, and a Floating Point unit which included eight 32-bit registers, which could be used as four 64-bit registers.

The series found use mainly in embedded applications, and was expanded to that end, with timers, graphics enhancements, and even a Digital Signal Processor unit in the Swordfish version (1991, also known as 32732 and 32764). The Swordfish was among the first truly superscalar microprocessors, with two 5-stage pipelines (integer A, and B, which consisted of an integer and floating point pipeline - an instruction dispatched to B would execute in the appropriate pipe, leaving the other with an empty slot. The integer pipe could cycle twice in the memory stage to synchronise with the result of the floating point pipe, to ensure in-order completion when floating point operations could trap. B could also execute branches). This strategy was influenced by the Multiflow VLIW design. Instructions were always fetched two at a time from the instruction cache which partially decoded the instruction pairs and set a bit to indicate whether they were dependent or could be issued simultaneously (effectively generating two-word VLIWs in the cache from an external stream of instructions). The cache decoder also generated branch target addresses to reduce branch latency as in the AT&T CRISP/Hobbit CPU.

The Swordfish implemented the NS32K instruction set using a reduced instruction core - NS32K instructions were translated by the cache decoder into either: one internal instruction, a pair of internal instructions in the cache, or a partially decoded NS32K instruction which would be fully decoded into internal instructions after being fetched by the CPU. The Swordfish also had dynamic bus resizing (8, 16, 32, or 64 bits, allowing 2 instructions to be fetched at once) and clock doubling, 2 DMA channels, and in circuit emulation (ICE) support for debugging.

The Swordfish was later simplified into a load-store design and used to implement an instruction set called CompactRISC (also known as Pirhana, an implementation independent instruction set supporting designs from 8 to 64 bits)." Great Microprocessors of the Past and Present (V 12.1.2)
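
The pairing step described in the quote above (the instruction cache marks each fetched pair as either dependent or issuable together) can be illustrated with a small sketch. This is not the Swordfish logic: the `Instr` record, the register names, and the dependency rule (a pair is dependent when the second instruction reads or writes a register the first one writes) are assumptions chosen only to show the idea of computing a pairing bit once, at predecode time.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction: a label plus the registers it reads and writes."""
    name: str
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()


def can_dual_issue(first: Instr, second: Instr) -> bool:
    """Return True if the pair has no register dependency.

    Assumed rule (not taken from Swordfish documentation): the pair is
    dependent if the second instruction reads or writes a register that
    the first instruction writes.
    """
    read_after_write = first.writes & second.reads
    write_after_write = first.writes & second.writes
    return not (read_after_write or write_after_write)


def predecode(stream: List[Instr]) -> List[Tuple[Instr, Instr, bool]]:
    """Group a sequential stream into pairs and tag each pair with a pairing bit.

    A trailing unpaired instruction is ignored in this sketch.
    """
    pairs = []
    for i in range(0, len(stream) - 1, 2):
        a, b = stream[i], stream[i + 1]
        pairs.append((a, b, can_dual_issue(a, b)))
    return pairs


# Illustrative program: the first pair is independent, the second pair
# has a read-after-write dependency on r4.
program = [
    Instr("add r0,r1", reads=frozenset({"r0", "r1"}), writes=frozenset({"r1"})),
    Instr("mov r2,r3", reads=frozenset({"r2"}), writes=frozenset({"r3"})),
    Instr("add r0,r4", reads=frozenset({"r0", "r4"}), writes=frozenset({"r4"})),
    Instr("sub r4,r5", reads=frozenset({"r4", "r5"}), writes=frozenset({"r5"})),
]
pairing_bits = [bit for _, _, bit in predecode(program)]
```

Storing the bit alongside the pair means the issue stage only has to test one flag per fetch instead of comparing register fields every cycle, which is the sense in which the cache holds "two-word VLIWs".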

One additional note, since Wikipedia, the internet magazine for modern myths, tries to cast these processors in a bad light: in those days you had to build all the logic needed for communication between the various chips and memory yourself. This was usually done with ASICs, and it is a complicated business, as anyone who was in it in those days can tell you. But this was the case with all processors at that time, at least those with many support chips. What's more, all processors at that time needed a very careful layout and a very sophisticated power supply and power supply decoupling scheme. The NS processor was a little bit - if not to say very - delicate in this respect, that's true. So in this respect, too, it was very similar to more modern CPUs. You almost had to observe the rules of analog design if you wanted a layout that ran stably. Admittedly, those who were used to the rock-steady behaviour of a Z-80 or the 68000 certainly had problems when working with the 32000 family. But those two were the exception to the rule: they worked in nearly every design.

What is certainly true is that the 16032 had many bugs in its infancy. But in those years this was by no means unusual, especially with the more complex designs. As an example: when the Intel 387 FPU appeared (delayed by 2 years), it had so many bugs that it was almost useless. To this day there are test programs on the web that check for these errors. The problem in those days was not the bugs themselves but the lack of knowledge about them (the manufacturers did not communicate them), knowledge you needed in order to program around them.

Developers (the whole design phase was accompanied by former developers of the VAX team as consultants):
National Semiconductor 32K - Dan O'Dowd and Les Kohn
32016/16032 - Avraham Menachem (microarchitecture and chip design), Asher Kaminker (microcode), and Yoav Lavy (BIU, processor buses, external MMU, and interrupt controller)
32332 - Ran Talmudi
32532, 1987 - Uri Weiser, Don Alpert, Gigi Licht, Jonathan Levy (BIU, MMU, and dcache), and Sidi Yom Tov (design manager)
See B. Maytal, S. Iacobovici, D. Alpert, D. Biran, J. Levy, and S.Y. Tov, "Design Considerations for a General Purpose Microprocessor," IEEE Computer, January 1989, pp. 66-76.
See D. Alpert, J. Levy, and B. Maytal, "Architecture of the NS32532 Microprocessor," Proceedings ICCD, October 1987, pp. 168-172.
32732 (a.k.a. 32764 and Swordfish, superscalar design, not delivered as NS32K family member), 1991 - Don Alpert (see Swordfish web page and CompactRISC)