This CPU was designed from scratch for a project class during my 3rd year at UCSD. The goal was to design an 8-bit single-cycle CPU and corresponding assembly instruction set capable of executing these 3 specific programs in as few dynamic instructions as possible:
- Read an array of 63 bytes starting at memory address 128, and determine the median value.
- Read an array of 50 bytes starting at memory address 64, and report the number of strings of consecutive 1 bits of length 2, 3, and 4, counting overlapping strings separately. For example, 01111011 reports 1 string of length 4, 2 strings of length 3, and 4 strings of length 2.
- Search an array of 75 bytes starting at memory address 40 for the values in memory addresses 9 and 10, and return the locations at which these 2 values are found in sequence.
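For reference, the three tasks can be sketched in a high-level language. This is only a minimal sketch, not the assembly that ran on the CPU: the function names are hypothetical, strings of 1 bits are assumed to be counted within each byte (not across byte boundaries), and a match in the third task is assumed to be reported as the address of the first value of the pair.

```python
def median(values):
    """Median of an odd-length list: the middle element after sorting."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def count_one_strings(byte, length):
    """Count overlapping windows of `length` consecutive 1 bits in one byte."""
    mask = (1 << length) - 1
    return sum(
        1
        for shift in range(8 - length + 1)
        if (byte >> shift) & mask == mask
    )


def find_pairs(memory, start, count, first, second):
    """Addresses in [start, start + count) where `first` is followed by `second`."""
    end = start + count
    return [
        addr
        for addr in range(start, end - 1)
        if memory[addr] == first and memory[addr + 1] == second
    ]


# Example data (made up for illustration) in a 256-byte memory.
memory = [0] * 256
memory[128:191] = reversed(range(63))      # 63 bytes at address 128
memory[9], memory[10] = 7, 9               # the two values to search for
memory[40], memory[41] = 7, 9              # one occurrence
memory[50], memory[51] = 7, 9              # another occurrence

program1 = median(memory[128:128 + 63])
program2 = [count_one_strings(0b01111011, n) for n in (2, 3, 4)]
program3 = find_pairs(memory, 40, 75, memory[9], memory[10])
```

With the example byte from the task description, `program2` holds the counts for lengths 2, 3, and 4 in that order.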
My concept was to design a CPU that would perform a large number of tasks in a single instruction, resulting in drastically lower dynamic instruction counts than our competitors at the cost of a complex hardware design and slower cycle times. To execute many of these instructions, the CPU had to be capable of performing more than one task in sequence within a single cycle. This led to a dual-ALU design, where the results of ALU1 could be used in ALU2’s operations during the same cycle.
The accompanying sketch shows my rather unconventional chip design. You will notice that each of the two ALUs has two outputs: a data output and a “truth-value” output. The truth outputs are used for operations such as conditional branches or conditional register writes. The data outputs are used as memory addresses, operands for the other ALU, register specifiers, and so forth.
ALU1 is much simpler than ALU2; it has no opcode and supports only two operations: add (outputs to the data bus) and set-less-than (outputs to the truth bit). ALU2’s operations can be seen in the chart that follows.
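ALU1's behavior can be modeled in a few lines. This is a sketch under assumptions the text does not confirm: that both outputs are produced on every cycle (since there is no opcode to choose between them), that the sum wraps around at 8 bits, and that set-less-than compares unsigned values. The function name `alu1` is hypothetical.

```python
def alu1(a, b):
    """Model of ALU1: returns (data, truth) for two 8-bit operands."""
    data = (a + b) & 0xFF        # add, truncated to 8 bits (assumed wraparound)
    truth = 1 if a < b else 0    # set-less-than (unsigned compare assumed)
    return data, truth


# The data output could serve as a memory address (base + offset),
# while the truth bit could gate a conditional branch or register write.
address, is_less = alu1(40, 3)
wrapped, _ = alu1(200, 100)
```

Here `address` is the sum used as an address and `wrapped` shows the assumed 8-bit truncation.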
Our CPU has eight general-use registers in its register bank. Additionally, ALU2 has four internal registers, which it uses for its string comparator module. The CPU also has one register that acts as a dynamic instruction counter, and one program-counter register, for a total of fourteen registers.
Here is a summary of the control lines used by the CPU:
ALU2’s control signals are simplified by a smaller ALU-control box. This box’s input (4 bits) and output (6 bits) are given below:
To execute the three programs in as few instructions as possible, we created a set of 26 very specific instructions. These instructions have only one field: the opcode. Each instruction therefore has its register operands, branch destinations, memory addresses, and any other necessary information hard-coded into it, which greatly simplifies the task of translating instructions into machine code and then into control signals. Of course, this design decision makes the CPU useless for executing any programs other than the three given for this assignment, but it lets those three programs execute as efficiently as possible. The supported instructions are listed below:
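The opcode-only idea can be illustrated with a toy decoder. This is a hedged sketch, not the actual instruction set: the two instructions, their control-word fields, and the `run` function are all hypothetical. It shows how every operand lives in a lookup table keyed by the opcode, and how a dynamic instruction count differs from the number of instructions stored in the program.

```python
NUM_INSTRUCTIONS = 26
# Smallest opcode width that can distinguish 26 instructions.
OPCODE_BITS = (NUM_INSTRUCTIONS - 1).bit_length()

# Hypothetical control ROM: opcode -> control word with operands baked in.
CONTROL_ROM = {
    # "increment register 1" (no branch)
    0b00000: {"inc_reg": 1, "branch_if_lt": None},
    # "branch to address 0 while register 1 < 3" (register, limit, target)
    0b00001: {"inc_reg": None, "branch_if_lt": (1, 3, 0)},
}


def run(program):
    """Execute a list of opcodes; return (registers, dynamic instruction count)."""
    regs = [0] * 8               # eight general-use registers
    pc = 0                       # program counter
    dynamic_count = 0            # counts every instruction actually executed
    while pc < len(program):
        ctrl = CONTROL_ROM[program[pc]]
        dynamic_count += 1
        if ctrl["inc_reg"] is not None:
            reg = ctrl["inc_reg"]
            regs[reg] = (regs[reg] + 1) & 0xFF
        cond = ctrl["branch_if_lt"]
        if cond is not None and regs[cond[0]] < cond[1]:
            pc = cond[2]         # taken branch: target is hard-coded
        else:
            pc += 1
    return regs, dynamic_count


regs, dynamic_count = run([0b00000, 0b00001])
```

The program above stores only two instructions, but the loop executes them repeatedly, so `dynamic_count` is larger than the program length. Fewer, more specialized instructions per loop iteration is what drives that count down.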
The control signals corresponding to each of these instructions are:
The CPU itself was constructed in Xilinx, with the following schematic hierarchy:
- A: Main Control System (with single-bit outputs)
  - A-1) MUX-Tree (Contains M-5)
  - A-2) Control-Bit Pairs (Contains M-5)
- B: Register File (with built-in control MUXes) (Contains M-2, M-4)
  - B-1) Main Register File (without controls) (Contains M-3)
- C: ALU1 (Contains D-1)
- D: ALU2 (Contains M-2)
  - D-1) 8-bit Set-Less-Than module
  - D-2) Bit-Check-2
  - D-3) Bit-Check-3
  - D-4) Bit-Check-4
  - D-5) String Comparator (Outer)
    - D-5-1) String Comparator (Inner)
- E: Program Counter (with built-in control MUXes) (Contains M-1)
- F: Dynamic Instruction Counter (with built-in control MUXes) (Contains M-1)
- M: Multiplexor
  - M-1) 2-bus MUX (for 8-bit busses)
  - M-2) 4-bus MUX (for 8-bit busses)
  - M-3) 8-bus MUX (for 8-bit busses)
  - M-4) 3-bus MUX (for 3-bit busses)
  - M-5) 2-bus MUX (for 72-bit busses)
  - M-6) 2-bus MUX (for 6-bit busses)
- O: Other
  - O-1) ALU2-Control (Contains M-6)
  - O-2) Not (Inverter with ENABLE)
  - O-3) RAM Module (LOGIBLOX module)
  - O-4) ROM Modules (LOGIBLOX module)
Because of the large number of modules and the relative complexity of each, I won’t include all of the schematics here, but here is a sample of what one looks like. This is a snapshot of module D-5-1, one of the inner string comparators:
I won't include the actual code for the 3 programs here, but the results were:
- 144 dynamic instructions.
- 207 dynamic instructions.
- 184 dynamic instructions.
Far better than the second-best CPU in the class! Mission accomplished.