SPC563Mxx Device overview
Doc ID 13850 Rev 6 19/48
3.3 Feature details
3.3.1 e200z335 core
The e200z335 processor uses a four-stage pipeline for instruction execution. The
Instruction Fetch (stage 1), Instruction Decode/Register File Read/Effective Address
Calculation (stage 2), Execute/Memory Access (stage 3), and Register Writeback (stage 4)
stages operate in an overlapped fashion, allowing single-clock instruction execution for most
instructions.
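The effect of overlapping the four stages can be illustrated with a small idealized model. The sketch below is not taken from the device documentation: the function name ideal_pipeline_clocks is hypothetical, and the model assumes every instruction is single-cycle and that no stalls, divides, or branch penalties occur.

```c
#include <stdint.h>

/* Stage count from the text: fetch, decode/EA calculation,
 * execute/memory access, register writeback. */
enum { PIPELINE_STAGES = 4 };

/*
 * Idealized clock count for n back-to-back single-cycle instructions.
 * Assumption (not from the datasheet): no stalls of any kind. The first
 * instruction needs PIPELINE_STAGES clocks to complete; each later one
 * completes one clock after its predecessor.
 */
static inline uint64_t ideal_pipeline_clocks(uint32_t n_instructions)
{
    if (n_instructions == 0) {
        return 0;
    }
    return (uint64_t)PIPELINE_STAGES + (n_instructions - 1u);
}
```

Under this model the fill cost of the pipeline is paid once, so the average cost per instruction approaches one clock as the instruction count grows.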
The integer execution unit consists of a 32-bit Arithmetic Unit (AU), a Logic Unit (LU), a
32-bit Barrel shifter (Shifter), a Mask-Insertion Unit (MIU), a Condition Register manipulation
Unit (CRU), a Count-Leading-Zeros unit (CLZ), a 32×32 Hardware Multiplier array, result
feed-forward hardware, and support hardware for division.
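As an illustration of what the count-leading-zeros and mask-insertion units compute, the following is a behavioral sketch in portable C. It models results only, not how the hardware is built; the function names are hypothetical, and passing the mask as an explicit 32-bit value is a simplification of the begin/end bit fields used by the Power Architecture rotate-and-insert instructions.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit word. Returns 32 for a zero input,
 * which matches the result of the Power Architecture cntlzw instruction. */
static inline uint32_t clz32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0) {
        return 32;
    }
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Rotate a 32-bit word left by sh bits (sh is taken modulo 32). */
static inline uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31u;
    return sh ? (x << sh) | (x >> (32u - sh)) : x;
}

/* Rotate src left by sh, then insert it into dst under mask:
 * bits set in mask come from the rotated src, all others from dst. */
static inline uint32_t rotate_mask_insert(uint32_t dst, uint32_t src,
                                          unsigned sh, uint32_t mask)
{
    return (rotl32(src, sh) & mask) | (dst & ~mask);
}
```

For example, rotate_mask_insert(0xAAAAAAAA, 0x0000000F, 8, 0x00000F00) replaces only bits 8 to 11 of the destination and leaves the remaining bits unchanged.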
Most arithmetic and logical operations execute in a single cycle, with the exception of
the divide instructions. The Count-Leading-Zeros unit operates in a single clock cycle.
The Instruction Unit contains a PC incrementer and a dedicated Branch Address adder to
minimize delays during change-of-flow operations. Sequential prefetching is performed to
ensure a supply of instructions into the execution pipeline. Branch target prefetching is
performed to accelerate taken branches. Prefetched instructions are placed into an
instruction buffer capable of holding six instructions.
Branches can also be decoded at the instruction buffer and branch target addresses
calculated prior to the branch reaching the instruction decode stage, allowing the branch
target to be prefetched early. When a branch is detected at the instruction buffer, a
prediction may be made on whether the branch is taken or not. If the branch is predicted to
be taken, a target fetch is initiated and its target instructions are placed in the instruction
buffer following the branch instruction. Many branches take zero cycles to execute by using
branch folding. Branches are folded out of the instruction execution pipeline whenever
possible. Foldable branches include unconditional branches and conditional branches whose
condition codes can be resolved early.
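The branch timing figures given in this section (zero clocks for folded branches, one clock for unfolded not-taken branches and for unfolded branches with a successful target prefetch, two clocks for all other taken branches) can be summarized as a small lookup. This is a minimal sketch for cycle estimation only; the function name and its three flags are hypothetical and do not correspond to any register or signal of the device.

```c
#include <stdbool.h>

/*
 * Effective execution time of a branch, in clocks, following the figures
 * quoted in this section. The flags are illustrative inputs, not hardware
 * state:
 *   folded            - branch was folded out of the execution pipeline
 *   taken             - branch changes the flow of control
 *   target_prefetched - target fetch completed successfully in advance
 */
static inline unsigned branch_clocks(bool folded, bool taken,
                                     bool target_prefetched)
{
    if (folded) {
        return 0;               /* branch folding: zero clocks */
    }
    if (!taken) {
        return 1;               /* not taken, not folded */
    }
    return target_prefetched ? 1 : 2;
}
```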
Conditional branches that are not taken and not folded execute in a single clock. Branches
with successful target prefetching that are not folded have an effective execution time of
one clock. All other taken branches have an execution time of two clocks.
Memory load and store operations are provided for byte, halfword, and word (32-bit) data,
with automatic zero or sign extension of byte and halfword load data as well as optional
byte reversal of data. These instructions can be pipelined to allow effective single-cycle
throughput. Load and store multiple word instructions allow low-overhead context save and
restore operations. The load/store unit contains a dedicated effective address adder to
optimize effective address generation. In addition, a load-to-use dependency does not incur
any pipeline bubbles in most cases.
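The load variants described above (zero extension, sign extension, and byte reversal) can be modeled in portable C as follows. This is a behavioral sketch only: memory is represented as a byte array in big-endian order, the function names are hypothetical, and the instruction mnemonics in the comments (lhz, lha, lhbrx, lwbrx) are the Power Architecture instructions these helpers roughly correspond to.

```c
#include <stdint.h>

/* Halfword load with zero extension (roughly lhz), big-endian memory. */
static inline uint32_t load_half_zero(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

/* Halfword load with sign extension to 32 bits (roughly lha). */
static inline int32_t load_half_signed(const uint8_t *p)
{
    int32_t v = (int32_t)load_half_zero(p);
    if (v & 0x8000) {
        v -= 0x10000;   /* propagate the sign bit without relying on casts */
    }
    return v;
}

/* Halfword load with byte reversal, zero extended (roughly lhbrx). */
static inline uint32_t load_half_reversed(const uint8_t *p)
{
    return ((uint32_t)p[1] << 8) | p[0];
}

/* Word load with byte reversal (roughly lwbrx). */
static inline uint32_t load_word_reversed(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Byte-reversed loads are typically useful when reading little-endian data structures, since the reversal happens as part of the load rather than in separate instructions.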
The Condition Register unit supports the condition register (CR) and condition register
operations defined by the Power Architecture. The condition register consists of eight 4-bit
fields that reflect the results of certain operations, such as move, integer and floating-point
compare, arithmetic, and logical instructions, and provide a mechanism for testing and
branching.
Vectored and autovectored interrupts are supported by the CPU. Vectored interrupt support
allows multiple interrupt sources to have unique interrupt handlers invoked with no
software overhead.
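The condition register layout described above (eight 4-bit fields) can be illustrated with a short sketch. It assumes the standard Power Architecture convention in which field 0 (CR0) occupies the most significant four bits and each field holds the LT, GT, EQ, and SO flags in that order; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag positions within one 4-bit CR field (most significant bit first). */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* Extract CR field 0..7 from a 32-bit condition register image.
 * Field 0 is the most significant nibble. */
static inline uint32_t cr_field(uint32_t cr, unsigned field)
{
    assert(field < 8);
    return (cr >> (28u - 4u * field)) & 0xFu;
}

/* Field value produced by a signed integer compare of a and b.
 * so models the summary overflow bit that is copied into the field. */
static inline uint32_t cr_compare_signed(int32_t a, int32_t b, bool so)
{
    uint32_t f = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return so ? (f | CR_SO) : f;
}
```

A conditional branch then tests a single flag of a single field, for example (cr_field(cr, 0) & CR_EQ) != 0 for a branch-if-equal on CR0.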
The hardware floating-point unit utilizes the IEEE-754 single-precision floating-point format
and supports single-precision floating-point operations in a pipelined fashion. The general
purpose register file is used for source and destination operands, thus there is a unified