Malhar Chaudhari, Akshay Mote, Suhit Pai, Mayuresh Mhase, Aishwarya Natarajan, Shweta Shetty
The front-end design of this processor was undertaken as part of the capstone project for the VLSI front-end design course that I was enrolled in from August to October 2013 (not part of the college curriculum).
The five stages of the pipelined processor are:
- Instruction fetch stage
- Decoding stage
- Execution stage
- Memory stage
- Write-back stage
The processor is required to execute the following instructions:
- Shift Left/Right
- Multiplication (8-bit, higher-higher bytes or lower-lower bytes)
- LOAD from Memory
- STORE to Memory
- MOV (move data between GPRs as well as immediate data) (16-bit, 8-bit higher/lower byte)
- JUMP (unconditional forward jump, incrementing Program counter by specified jump value)
- NOP (no operation, disallowing write operations)
- EDP (end of program, disabling of Program counter)
All instructions that use the Arithmetic and Logic Unit (ALU) can operate on either 16-bit or 8-bit data, with the option of choosing the lower or higher byte. In total, 47 instructions can be executed on this processor.
This processor was designed using Xilinx ISE. The memory units were taken from the built-in libraries, and the rest of the processor components were written in Verilog.
We also recognized that certain instructions, if executed one after the other, could yield incorrect results because of dependencies between them. For that reason, the processor needs a hazard-avoidance scheme.
The following is the list of components in the processor that were designed in Verilog:
- Instruction Memory: A 256X16 bit memory is used as the instruction memory, with an 8-bit instruction address as input and a 16-bit instruction as output.
- REG file: An 8X16 bit register file that takes as inputs the 3-bit addresses of two sources, the address of the destination, the 16-bit destination data, 2 reconfiguration bits, 2 write enable bits and the clock. Its outputs are the two 16-bit sources.
- Control unit: The control unit is designed according to the opcodes chosen for the instructions.
- ALU: The ALU is designed according to the ALU control logic issued by the Control unit.
- Memory unit: An 8X16 bit data memory unit is already provided to store the data.
- Buffers: Since this is a 5-stage pipeline, we use buffers for the 5 stages. The buffers are simply flip-flops that give out the data they hold at the positive clock edge.
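The behaviour of the buffers can be illustrated with a small Python model. This is only a behavioural sketch, not the Verilog used in the project: the `Pipeline` class and its `posedge` method are hypothetical names, and the model treats the five buffers as a plain chain that passes one value along per clock edge.

```python
class Pipeline:
    """Behavioural model of a chain of positive-edge-triggered buffers."""

    def __init__(self, depth=5, reset_value=None):
        # One entry per buffer; index 0 is the buffer nearest the fetch stage.
        self.buffers = [reset_value] * depth

    def posedge(self, new_value):
        """Model one positive clock edge.

        Every buffer latches the value its predecessor held before the edge,
        and the first buffer latches the newly presented value.
        Returns the value now held by the last buffer.
        """
        self.buffers = [new_value] + self.buffers[:-1]
        return self.buffers[-1]


if __name__ == "__main__":
    pipe = Pipeline()
    for cycle, item in enumerate(["I1", "I2", "I3", "I4", "I5", "I6"], start=1):
        print(cycle, pipe.posedge(item))
```

In this model a value presented on one clock edge reaches the last buffer four edges later, which is why the first results appear only after the pipeline has filled.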
The 16 Bit Instruction Format is as follows:
Fig: 16 Bit Instruction Format
Initially, we designed the datapaths for individual instructions. We found that the datapaths for most of the arithmetic and logical instructions were the same, and that most of the other instructions had similar datapaths. In all, only 4-5 distinct datapaths were needed. Later, we merged these into a single datapath for all the instructions.
Fig: Final Datapath of the designed Processor
The components that we have used in this datapath are as follows:
Buffer0: Buffer0 is a component of the Program counter (PC) loop. It holds the Program counter value until the next positive clock edge. Its output acts as a pointer to the desired instruction in the instruction memory.
Instruction memory (256X16): The instruction memory stores the set of instructions to be executed.
MUX and adder combinational logic for updating PC: The PC loop needs to be modified for the EDP and JUMP instructions. On receiving EDP, the PC must stop or execute dummy cycles, and on receiving JUMP it must increment by the specified count. For this we included an additional adder in the loop, which computes the new program count when a JUMP is encountered. The JUMP 0 hazard is also handled here: JUMP 0 would make the instruction jump to itself indefinitely, so combinational logic detects it, breaks the loop and continues by incrementing the Program counter by 1. This increment by 1 shows that a JUMP 0 occurred but was ignored. For the EDP instruction, we keep checking for the EDP opcode and, once it appears, clear Buffer0, which disallows further PC operation. Note that the program counter does not completely terminate its operations.
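The PC update rules described above can be sketched as a small behavioural Python function. This is an illustration rather than the project's Verilog: the function name `next_pc` and its parameters are hypothetical, clearing Buffer0 is assumed to mean the PC value becomes 0, and wrap-around at 8 bits is assumed from the 256-word instruction memory.

```python
PC_MASK = 0xFF  # assumed 8-bit PC, matching the 256-word instruction memory


def next_pc(pc, is_jump=False, jump_value=0, is_edp=False):
    """Return the value Buffer0 would latch on the next positive clock edge."""
    if is_edp:
        # EDP: Buffer0 is cleared (assumed here to mean the PC becomes 0).
        return 0
    if is_jump and jump_value != 0:
        # Forward jump: the extra adder adds the jump value to the PC.
        return (pc + jump_value) & PC_MASK
    # Normal instructions, and JUMP 0 (which is ignored), step the PC by 1.
    return (pc + 1) & PC_MASK


if __name__ == "__main__":
    print(next_pc(10))                             # ordinary increment
    print(next_pc(10, is_jump=True, jump_value=5)) # forward jump
    print(next_pc(10, is_jump=True, jump_value=0)) # JUMP 0 is ignored
    print(next_pc(10, is_edp=True))                # end of program
```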
Buffer1: This buffer forwards the 16-bit instruction to the next stage on the positive clock edge.
Register file (8X16): The register file has 8 general purpose registers (GPRs), of which the register at address 000 is special because it holds the value received from the MVI instruction. The register file can output 16-bit as well as 8-bit data; the 8-bit data is really the 16-bit data with the appropriate byte masked. The reconfig bits are active low and decide whether 16-bit or 8-bit data (higher or lower byte) is given at the output. The write enable bits are also active low and decide whether the data to be written is 16-bit or 8-bit (higher or lower byte). The register file also receives the data to be written and the address of the destination GPR.
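The byte masking can be illustrated with a behavioural Python model. The exact meaning of each reconfig and write enable bit is not given here, so the encoding below is an assumption: bit 1 (active low) enables the higher byte and bit 0 (active low) enables the lower byte, so `0b00` means the full 16-bit word. The class and method names are hypothetical.

```python
class RegisterFile:
    """Behavioural model of an 8 x 16-bit register file with byte masking."""

    def __init__(self):
        self.regs = [0] * 8

    @staticmethod
    def _byte_mask(active_low_bits):
        # Assumed encoding: a 0 in bit 1 enables the higher byte,
        # a 0 in bit 0 enables the lower byte.
        mask = 0
        if not active_low_bits & 0b10:
            mask |= 0xFF00
        if not active_low_bits & 0b01:
            mask |= 0x00FF
        return mask

    def read(self, addr, reconfig):
        """Return the register contents with the unselected byte masked to 0."""
        return self.regs[addr & 0b111] & self._byte_mask(reconfig)

    def write(self, addr, data, wren):
        """Overwrite only the byte(s) selected by the write enable bits."""
        addr &= 0b111
        mask = self._byte_mask(wren)
        kept = self.regs[addr] & ~mask & 0xFFFF
        self.regs[addr] = kept | (data & mask)


if __name__ == "__main__":
    rf = RegisterFile()
    rf.write(3, 0xABCD, 0b00)        # full 16-bit write
    print(hex(rf.read(3, 0b01)))     # higher byte only
    print(hex(rf.read(3, 0b10)))     # lower byte only
```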
Control unit: The control unit issues commands in response to the received instruction. It provides the signal that tells the ALU which operation to perform. It also provides the ctrl bit output, which distinguishes between two operations that share the same opcode, and sel[1:0], which is the select line for the MUX in the write-back stage. The register write enable (regwren) output supplies the write enable bits to the Regfile. The memory read/write output issues the command for reading from or writing to the memory. We added a one-bit shift output, which instructs the shifting block in the logical unit of the ALU to perform either a left shift or a right shift.
Hazard unit: This unit contains a 3-stage pipeline of its own. It stores each received instruction word and pushes it along this pipeline, which makes the source addresses, destination address and opcode of up to 3 instructions available at once. We use an if-else ladder to detect the various hazards. Depending on the hazard, the unit drives the corresponding select line outputs. These outputs act as select line inputs to the two MUXes in the execution stage (one at each ALU input) and to a MUX in the memory stage.
Fig: Hazard Unit Component Design
The hazards detected by the Hazard unit are:
- ADD ADD
- ADD X ADD
- LOAD ADD
- LOAD X ADD
- ADD STORE
- ADD X STORE
- LOAD MOV
- MOV STORE
- LOAD STORE
- MVIH MVIL MOV
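The detection of the first two hazards in this list can be sketched in Python as an if-else ladder. This is a simplified illustration, not the project's hazard unit: it reads "ADD X ADD" as two dependent instructions separated by one unrelated instruction, it ignores the load, store and MVI cases, and the `Instr` fields and select-line values are hypothetical.

```python
from collections import namedtuple

# Hypothetical view of a decoded instruction held in the hazard unit's pipeline.
Instr = namedtuple("Instr", "opcode dest src_a src_b writes_reg")

# Hypothetical select-line values for the MUX at each ALU input.
NO_FORWARD, FROM_PREV, FROM_PREV2 = 0, 1, 2


def forward_select(src, prev, prev2):
    """If-else ladder that picks the data source for one ALU input."""
    if prev is not None and prev.writes_reg and prev.dest == src:
        return FROM_PREV      # "ADD ADD" style: result of the previous instruction
    elif prev2 is not None and prev2.writes_reg and prev2.dest == src:
        return FROM_PREV2     # "ADD X ADD" style: result from two instructions back
    else:
        return NO_FORWARD     # no hazard: use the register file value


def hazard_selects(current, prev, prev2):
    """Return the select lines for ALU inputs A and B."""
    return (forward_select(current.src_a, prev, prev2),
            forward_select(current.src_b, prev, prev2))


if __name__ == "__main__":
    add1 = Instr("ADD", dest=1, src_a=2, src_b=3, writes_reg=True)
    nop = Instr("NOP", dest=0, src_a=0, src_b=0, writes_reg=False)
    add2 = Instr("ADD", dest=4, src_a=1, src_b=5, writes_reg=True)
    print(hazard_selects(add2, add1, None))  # ADD ADD
    print(hazard_selects(add2, nop, add1))   # ADD X ADD
```

Checking the most recent instruction first matters: if both earlier instructions write the same register, the newer result is the one the current instruction should see.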
Black Box: The black box extracts the 8-bit data provided in the instruction word in the case of an MVI instruction. It places the data in the higher byte for MVIH or in the lower byte for MVIL. It also issues the address to the Regfile so that the data at the specified address is made available. The black box includes a 16-bit 2:1 MUX whose inputs are the 8-bit immediate data (with the other byte set to 0, giving 16 bits in total) and the data read from the Regfile; the appropriate MUX input is selected depending on the received instruction.
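A behavioural Python sketch of the immediate placement and the 2:1 MUX is given below. The function and parameter names are hypothetical, and where the immediate sits inside the instruction word is not modelled, since the instruction format is not reproduced here.

```python
def mvi_immediate(imm8, high):
    """Place an 8-bit immediate in the higher (MVIH) or lower (MVIL) byte.

    The other byte is zero, so the result is a full 16-bit word.
    """
    imm8 &= 0xFF
    return (imm8 << 8) if high else imm8


def black_box_mux(is_mvi, imm8, high, reg_data):
    """16-bit 2:1 MUX: immediate data for MVI, register file data otherwise."""
    if is_mvi:
        return mvi_immediate(imm8, high)
    return reg_data & 0xFFFF


if __name__ == "__main__":
    print(hex(black_box_mux(True, 0xAB, True, 0x1234)))    # MVIH
    print(hex(black_box_mux(True, 0xAB, False, 0x1234)))   # MVIL
    print(hex(black_box_mux(False, 0xAB, False, 0x1234)))  # register data
```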
Buffer2: The outputs of the Control unit, Hazard unit and Register file are passed on to the next stage at the positive clock edge.
ALU: The ALU is designed to perform addition, subtraction, multiplication, increment, decrement, logical operations (AND/NAND, OR/NOR, XOR/XNOR) and shift. It also handles the buffer and negate operations. Here’s the block diagram for ALU implementation.
ADD, SUB, INC and DEC share a common adder/subtractor. Input A is always fixed. Input B of the adder/subtractor can be either ALU input B or a hardwired 1. ALU input B is used for the ADD/SUB (A+B/A-B) operations and the hardwired 1 is used for the INC/DEC (A+1/A-1) operations. This explains the second input selection logic shown above.
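The sharing of one adder/subtractor between the four operations can be illustrated in Python. This is a behavioural sketch with hypothetical control names (`subtract`, `use_one`); subtraction is written as two's complement addition, which is one common way to build such a unit and is an assumption about the hardware.

```python
MASK16 = 0xFFFF


def add_sub_unit(a, b, subtract, use_one):
    """Shared 16-bit adder/subtractor for ADD, SUB, INC and DEC.

    use_one selects the hardwired 1 instead of ALU input B (INC/DEC).
    subtract selects A - operand instead of A + operand (SUB/DEC).
    """
    operand = 1 if use_one else (b & MASK16)
    if subtract:
        # Two's complement subtraction: A + (~operand) + 1.
        result = (a & MASK16) + (~operand & MASK16) + 1
    else:
        result = (a & MASK16) + operand
    return result & MASK16  # the carry out of bit 15 is dropped


if __name__ == "__main__":
    print(add_sub_unit(5, 3, subtract=False, use_one=False))  # ADD
    print(add_sub_unit(5, 3, subtract=True, use_one=False))   # SUB
    print(add_sub_unit(5, 3, subtract=False, use_one=True))   # INC
    print(add_sub_unit(5, 3, subtract=True, use_one=True))    # DEC
```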
Logical operations include AND, OR, XOR and shift. All the logical operations are computed, but only the desired output is selected using a 4:1 MUX. To generate NAND, NOR and XNOR, the ctrl bit is used as the select line of the 2:1 MUX shown above, which selects the proper output (buffered or negated). An additional shift input is present, in which one bit gives the direction of the shift and shift[2:0] gives the amount of shift.
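The logical unit can be sketched behaviourally in Python as follows. The operation codes, the function name and the parameter names are hypothetical, and the shift is assumed to be a logical shift of input A; none of these details are confirmed by the description.

```python
MASK16 = 0xFFFF
AND, OR, XOR, SHIFT = range(4)  # hypothetical select values for the 4:1 MUX


def logic_unit(a, b, op, ctrl=0, shift_left=True, shift_amount=0):
    """Compute every logical result, pick one, then optionally negate it."""
    a &= MASK16
    b &= MASK16
    amount = shift_amount & 0b111  # 3-bit shift amount, as in shift[2:0]
    results = [
        a & b,
        a | b,
        a ^ b,
        (a << amount) if shift_left else (a >> amount),
    ]
    selected = results[op] & MASK16          # 4:1 MUX
    # 2:1 MUX driven by ctrl: buffered output or negated output.
    return (~selected & MASK16) if ctrl else selected


if __name__ == "__main__":
    print(hex(logic_unit(0xF0F0, 0x0FF0, AND)))          # AND
    print(hex(logic_unit(0xF0F0, 0x0FF0, AND, ctrl=1)))  # NAND
    print(hex(logic_unit(0x8001, 0, SHIFT, shift_left=False, shift_amount=1)))
```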
The multiplication block performs 8-bit multiplication. The byte selection logic chooses the appropriate byte from each input, and only that byte is passed on to the multiplier. We pass either both lower bytes or both higher bytes; the higher/lower and lower/higher byte combinations are not included, although they could have been supported using the byte selection logic.
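A behavioural Python sketch of the byte selection followed by the 8-bit multiply is shown below. The name `mul8` and the `use_high_bytes` flag are hypothetical, and the operands are treated as unsigned, which is an assumption.

```python
def mul8(a, b, use_high_bytes):
    """Multiply the two higher bytes or the two lower bytes of 16-bit words.

    The product of two unsigned 8-bit values always fits in 16 bits.
    """
    shift = 8 if use_high_bytes else 0
    byte_a = (a >> shift) & 0xFF  # byte selection for input A
    byte_b = (b >> shift) & 0xFF  # byte selection for input B
    return byte_a * byte_b


if __name__ == "__main__":
    print(mul8(0x1203, 0x0504, use_high_bytes=False))  # 0x03 * 0x04
    print(mul8(0x1203, 0x0504, use_high_bytes=True))   # 0x12 * 0x05
```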
Direct data (input A of the ALU) is passed through to execute the buffer instruction or the negate instruction.
Input selection logic for ALU inputs: This logic consists of one MUX for each ALU input. It ensures that the ALU receives the proper input in case of a hazard. These MUXes receive their select line inputs from the Hazard unit.
Buffer3: The outputs of the ALU, the immediate data of MVI, the data to be stored in memory and some control logic outputs are passed on to the next stage at the positive clock edge.
Data Memory (8X16): It stores a maximum of eight 16-bit words. Its data input comes from MUX logic that selects the appropriate input in order to avoid hazards. It receives its read/write input from the control unit.
Buffer4: This buffer forwards the ALU outputs, the data read from memory and the immediate 8-bit data on the positive clock edge.
MUX logic for write-back input selection: This MUX selects the proper data for write-back. It receives its select line input from the control unit; these select line values are also passed through buffer2, buffer3 and buffer4. It selects which of three data values is output: the data read from memory, the ALU output or the direct data (immediate).
MUX logic for write-back address selection: The special handling of MVI creates the need for this MUX. The MVI instruction requires that the immediate data be written into the register with address 000 in the Register file, and this MUX takes care of that condition. For the rest of the operations, the desired address is passed through directly.
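The two write-back MUXes can be sketched together in Python. The values chosen for sel[1:0] and the function and parameter names are hypothetical; only the three data sources and the fixed address 000 for MVI come from the description.

```python
# Hypothetical encoding of sel[1:0] for the write-back data MUX.
SEL_ALU, SEL_MEM, SEL_IMM = 0, 1, 2


def write_back(sel, alu_out, mem_data, imm_data, dest_addr, is_mvi):
    """Return the (register address, data) pair used in the write-back stage."""
    # Data MUX: ALU output, data read from memory, or direct (immediate) data.
    if sel == SEL_ALU:
        data = alu_out
    elif sel == SEL_MEM:
        data = mem_data
    elif sel == SEL_IMM:
        data = imm_data
    else:
        raise ValueError("unused select value in this sketch")
    # Address MUX: MVI always targets register 000.
    addr = 0b000 if is_mvi else (dest_addr & 0b111)
    return addr, data & 0xFFFF


if __name__ == "__main__":
    print(write_back(SEL_ALU, 0x0008, 0x1111, 0xAB00, dest_addr=5, is_mvi=False))
    print(write_back(SEL_IMM, 0x0008, 0x1111, 0xAB00, dest_addr=5, is_mvi=True))
```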
The processor design was a challenging job in terms of area and speed optimization as well as hazard avoidance. There are still more hazards with a dependency similar to the ADD ADD hazard or the ADD X ADD hazard. These could be avoided with a more detailed implementation of the hazard unit. We also identified some instructions that could have been added, such as XCHG, which interchanges the lower and higher bytes of a word; it could be implemented by inserting a 2:1 MUX in the direct data line, where both the direct data and the byte-reversed data would be available. The multiplication instruction could also be enhanced to multiply the higher and lower, or lower and higher, bytes of two words.
You might wonder where the Verilog code I mentioned is. If you want to know the complete details of my implementation, do drop me a mail at email@example.com