Safe IO sharing can be accomplished through the use of a hypervisor; however, there is a performance penalty associated
with virtual IO, as the hypervisor must consume CPU cycles to schedule the IO requests and get the results back to the right
software partition.
The DPAA (described in "Data Path Acceleration Architecture (DPAA)") was designed to allow multiple partitions to
efficiently share accelerators and IOs, with its major capabilities centered around sharing Ethernet ports. These capabilities
were enhanced in the chip with the addition of FMan storage profiles. The chip's FMans perform classification prior to buffer
pool selection, allowing Ethernet frames arriving on a single port to be written to the dedicated memory of a single software
partition. This capability is fully described in Receiver functionality: parsing, classification, and distribution."
The addition of the RMan extends the chip's IO virtualization by allowing many types of traffic arriving on Serial RapidIO to
enter the DPAA and take advantage of its inherent virtualization and partitioning capabilities.
The PCI Express protocol lacks the PDU semantics found in Serial RapidIO, making it difficult to interwork between PCI
Express controllers and the DPAA; however, PCI Express has made progress in other areas of partitioning. The Single Root IO
Virtualization (SR-IOV) specification, which the chip supports as an endpoint, allows external hosts to view the chip as four
physical functions (PFs), where each PF supports up to 64 virtual functions (VFs). Having multiple VFs on a PCI Express
port effectively channelizes it, so that each transaction through the port is identified as belonging to a specific PF/VF
combination (with associated and potentially dedicated memory regions). Message Signaled Interrupts (MSIs) allow the
external host to generate interrupts associated with a specific VF.
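For illustration, the sketch below shows how a Linux-based external host might enable virtual functions on one of the chip's PFs through the kernel's standard sriov_numvfs sysfs attribute; the PCI address (0000:01:00.0) and the VF count of 8 are placeholder values for this example, not T2080-specific details.

/* Minimal sketch: an external Linux host enabling SR-IOV virtual
 * functions on the endpoint's physical function. The PCI address and
 * VF count below are placeholders for illustration only. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Hypothetical bus/device/function of the endpoint's PF. */
    const char *path = "/sys/bus/pci/devices/0000:01:00.0/sriov_numvfs";
    FILE *f = fopen(path, "w");
    if (!f) {
        perror("sriov_numvfs");
        return EXIT_FAILURE;
    }
    fprintf(f, "%d\n", 8); /* request 8 VFs (each PF supports up to 64) */
    fclose(f);
    return EXIT_SUCCESS;
}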
4.13.4 Secure boot and sensitive data protection
The core MMUs and PAMU allow the SoC to enforce a consistent set of memory access permissions on a per-partition basis.
When combined with an embedded hypervisor for safe sharing of resources, the SoC becomes highly resilient to poorly
tested or malicious code. For system developers building high-reliability/high-security platforms, running only rigorously
tested code of known origin is the norm.
For this reason, the SoC offers a secure boot option, in which the system developer digitally signs the code to be executed by
the CPUs, and the SoC ensures that only an unaltered version of that code runs on the platform. The SoC offers both boot
time and run time code authenticity checking, with configurable consequences when the authenticity check fails. The SoC
also supports protected internal and external storage of developer-provisioned sensitive instructions and data. For example, a
system developer may provision each system with a number of RSA private keys to be used in mutual authentication and key
exchange. These values would initially be stored as encrypted blobs in external non-volatile memory, but following secure
boot, these values can be decrypted into on-chip protected memory (a portion of the platform cache dedicated as SRAM). Session
keys, which may number in the thousands to tens of thousands, are not good candidates for on-chip storage, so the SoC offers
session key encryption. Session keys are stored in main memory, and are decrypted (transparently to software and without
impacting SEC throughput) as they are brought into the SEC for decryption of session traffic.
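As a rough sketch of the flow just described, the fragment below distinguishes where each class of key is allowed to live; the helper functions and the protected-SRAM allocator are hypothetical placeholders, not the actual SEC driver API.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator for the on-chip protected memory (the portion
 * of the platform cache dedicated as SRAM); usable only after a
 * successful secure boot. */
extern void *protected_sram_alloc(size_t len);

/* Hypothetical SEC helpers for blob decapsulation/encapsulation. */
extern int sec_decrypt_blob(const void *blob, size_t blob_len,
                            void *out, size_t out_len);
extern int sec_encrypt_session_key(const uint8_t *key, size_t key_len,
                                   void *blob_out, size_t blob_len);

int provision_private_key(const void *nv_blob, size_t blob_len, size_t key_len)
{
    /* Long-lived RSA private keys are decrypted from their external NV
     * blobs directly into on-chip protected memory, never into DDR. */
    void *key = protected_sram_alloc(key_len);
    if (!key)
        return -1;
    return sec_decrypt_blob(nv_blob, blob_len, key, key_len);
}

int store_session_key(const uint8_t *key, size_t key_len,
                      void *ddr_blob, size_t blob_len)
{
    /* Session keys (potentially tens of thousands) remain in main
     * memory in encrypted form only; the SEC decrypts them on the fly
     * as session traffic is processed. */
    return sec_encrypt_session_key(key, key_len, ddr_blob, blob_len);
}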
4.14 Advanced power management
Power dissipation is always a major design consideration in embedded applications; system designers need to balance the
desire for maximum compute and IO density against single-chip and board-level thermal limits.
Advances in chip and board level cooling have allowed many OEMs to exceed the traditional 30 W limit for a single chip,
and Freescale's flagship T4240 multicore chip has consequently retargeted its maximum power dissipation. A top-speed-bin
T4240 dissipates approximately 2x the power of the P4080; however, the T4240 increases computing
performance by ~4x, yielding a 2x improvement in DMIPS per watt.
Junction temperature is a critical factor in comparing embedded processor specifications. Freescale specifies maximum power
at a 105°C junction temperature, the standard for commercial embedded operating conditions. Not all multicore chips adhere
to a 105°C junction temperature when specifying worst-case power. In the interest of normalizing power comparisons, the
chip's typical and worst-case power (all CPUs at 1.8 GHz) are shown at alternate junction temperatures.
To achieve the previously stated 2x increase in performance per watt, the chip implements a number of software-transparent
and performance-transparent power management features. Non-transparent power management features are also available,
allowing for significant reductions in power consumption when the chip is under lighter loads; however, non-transparent
power savings are not assumed in chip power specifications.
4.14.1 Transparent power management
This chip's commitment to low power begins with the decision to fabricate the chip in 28 nm bulk CMOS. This process
technology offers low leakage, reducing both static and dynamic power. While 28 nm offers inherent power savings,
transistor leakage varies from lot to lot and device to device. Leakier parts are capable of faster transistor switching, but they
also consume more power. By running devices from the leakier end of the process spectrum at less than nominal voltage and
devices from the slower end of the process spectrum at a higher voltage, T2080-based systems can achieve the
required operating frequency within the specified max power. During manufacturing, Freescale will determine the voltage
required to achieve the target frequency bin and program this Voltage ID into each device, so that initialization software can
program the system's voltage regulator to the appropriate value.
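A minimal sketch of that initialization step is shown below, assuming a fuse-read helper and a board-level regulator call; the register access, the 5-bit VID field, and the VID-to-millivolt mapping are all placeholders rather than T2080 definitions.

#include <stdint.h>

extern uint32_t read_vid_fuse(void);           /* hypothetical fuse read      */
extern int      regulator_set_mv(unsigned mv); /* hypothetical board/PMIC API */

int apply_voltage_id(void)
{
    /* Assume the factory-programmed VID sits in the low bits of a fuse
     * status register; the real field location is in the reference manual. */
    uint32_t vid = read_vid_fuse() & 0x1Fu;

    /* Illustrative linear mapping from VID code to core voltage; the
     * actual encoding and step size come from the data sheet. */
    unsigned mv = 900u + vid * 25u;

    /* Program the external regulator before ramping the cores to the
     * target frequency bin. */
    return regulator_set_mv(mv);
}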
Dynamic power is further reduced through fine-grained clock control. Many components and subcomponents in the chip
automatically sleep (turn off their clocks) when they are not actively processing data. Such blocks can return to full operating
frequency on the clock cycle after work is dispatched to them. A portion of these dynamic power savings is built into the
chip's max power specification, on the basis that it is impossible for all processing elements and interfaces in the chip to
switch concurrently. The assumed percentage switching factors are considered quite conservative, and measured typical power
consumption on QorIQ chips is well below the data sheet maximum.
As noted in "Frame Manager and network interfaces," the chip supports Energy-Efficient Ethernet (EEE). During periods of extended
inactivity on the transmit side, the chip transparently sends a low power idle (LPI) signal to the external PHY, effectively
telling it to sleep.
Additional power savings can be achieved by users statically disabling unused components. Developers can turn off the
clocks to individual logic blocks (including CPUs) within the chip that the system is not using. Because the number of SerDes
lanes is finite, it is expected that any given application will leave some Ethernet MACs, PCI Express controllers, or Serial
RapidIO controllers inactive. Re-enabling clocks to a logic block generally requires a chip reset, which makes this type of
power management infrequent (effectively static) and transparent to runtime software.
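A sketch of that static configuration, under the assumption of a memory-mapped device-disable register, might look like the following; the address and bit assignments are placeholders, and the real controls are documented in the reference manual.

#include <stdint.h>

/* Placeholder address and bit masks for a device-disable control. */
#define DEVDISR        ((volatile uint32_t *)0xFE0E0070u)
#define DISR_SRIO      (1u << 3)   /* unused Serial RapidIO controller */
#define DISR_PCIE3     (1u << 5)   /* unused PCI Express controller    */
#define DISR_MAC10     (1u << 12)  /* unused Ethernet MAC              */

void disable_unused_blocks(void)
{
    /* Done once at boot; because re-enabling generally requires a chip
     * reset, this is effectively a static, software-transparent setting. */
    *DEVDISR |= DISR_SRIO | DISR_PCIE3 | DISR_MAC10;
}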
4.14.2 Non-transparent power management
Many load-based power savings are use-case-specific static configurations (and thereby software transparent); these were described
in the previous section. This section focuses on SoC power management mechanisms that software can dynamically
leverage to reduce power when the system is lightly loaded. The most important of these mechanisms involves the cores.
A full description of core low power states, with their proper names, is provided in the SoC reference manual. At a high level, the
most important of these states can be viewed as "PH10" and "PH20," described as follows. Note that these are relative terms
that do not perfectly correlate with previous uses of these terms in Power Architecture and other ISAs:
• In the PH10 state, the CPU stops instruction fetches but still performs L1 snoops. The CPU retains all state, and instruction
fetching can be restarted instantly.
• In the PH20 state, the CPU stops instruction fetches and L1 snooping, and turns off all clocks. Supply voltage is reduced, using
a technique Freescale calls State Retention Power Gating (SRPG). In this "napping" state, a CPU uses ~75% less power
than a fully operational CPU, but can still return to full operation quickly (~100 platform clocks).
The core offers two ways to enter these (and other) low power states: registers and instructions.
As the name implies, register-based power management means that software writes to registers to select the CPU and its low
power state. Any CPU with write access to power management registers can put itself, or another CPU, into a low power
state; however, a CPU put into a low power state by way of register write cannot wake itself up.
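The register-based path can be pictured with the sketch below, in which one CPU requests a low power state for another; the register names, base address, and bit layout are placeholders standing in for the chip's run control/power management block.

#include <stdint.h>

#define RCPM_BASE        0xFE0E2000u           /* placeholder base address */
#define RCPM_PH20_SET    (RCPM_BASE + 0x0D0u)  /* placeholder offsets      */
#define RCPM_PH20_CLR    (RCPM_BASE + 0x0D4u)

static inline void reg_write(uint32_t addr, uint32_t val)
{
    *(volatile uint32_t *)(uintptr_t)addr = val;
}

/* Request PH20 for the given CPU. A CPU put to sleep this way cannot
 * wake itself; another running CPU must clear the request. */
void core_request_ph20(unsigned cpu) { reg_write(RCPM_PH20_SET, 1u << cpu); }
void core_release_ph20(unsigned cpu) { reg_write(RCPM_PH20_CLR, 1u << cpu); }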
Instruction-based power management means that software executes a special WAIT instruction to enter a low power state.
CPUs exit the low power state in response to external triggers: interrupts, doorbells, stashes into the L1 data cache, or a
reservation cleared by a snoop. Each vCPU (thread) can independently execute WAIT instructions; however, the physical CPU
enters the PH20 state only after the second vCPU executes its WAIT. This instruction-based entry into PH20 is particularly well-suited for
use in conjunction with Freescale's patented Cascade Power Management, which is described in the next section.
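A minimal sketch of an instruction-based idle loop is shown below; the inline assembly simply issues the Power ISA wait instruction with no hint operands, and the work-queue callbacks are placeholders for whatever dispatch mechanism (for example, a QMan portal) the application uses.

/* Idle loop for one vCPU: drain available work, then execute "wait".
 * Execution resumes after the wait when an external event (interrupt,
 * doorbell, L1 data cache stash, or reservation clear) arrives. The
 * physical CPU only drops to PH20 once both of its vCPUs are waiting. */
static inline void cpu_wait(void)
{
    __asm__ volatile("wait" ::: "memory");
}

void idle_loop(int (*have_work)(void), void (*do_work)(void))
{
    for (;;) {
        while (have_work())
            do_work();
        cpu_wait(); /* no work queued: nap until something wakes us */
    }
}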
While significant power savings can be achieved through individual CPU low power states, the SoC also supports a register-
based cluster-level low power state. After software puts all CPUs in a cluster into the PH10 state, it can additionally flush the L2
cache and have the entire cluster enter the PH20 state. Because the L2 arrays have relatively low static power dissipation, this
state provides incremental additional savings over having four napping CPUs with the L2 on.
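The cluster-level sequence can be summarized with the sketch below; all three helpers are hypothetical placeholders for the register writes and cache-flush operations documented in the reference manual.

/* Hypothetical helpers for the per-core, L2, and cluster controls. */
extern void core_enter_ph10(unsigned cpu);
extern void l2_flush_and_disable(unsigned cluster);
extern void cluster_enter_ph20(unsigned cluster);

void cluster_sleep(unsigned cluster, const unsigned *cpus, unsigned ncpus)
{
    /* 1. Park every CPU in the cluster in PH10. */
    for (unsigned i = 0; i < ncpus; i++)
        core_enter_ph10(cpus[i]);

    /* 2. Push dirty L2 lines to memory so the arrays can be gated. */
    l2_flush_and_disable(cluster);

    /* 3. Drop the whole cluster into PH20 for the incremental savings. */
    cluster_enter_ph20(cluster);
}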
4.14.3 Cascade power management
Cascade power management refers to the concept of allowing SoC load, as defined by the depth of queues managed by the
Queue Manager, to determine how many vCPUs need to be awake to handle the load. Recall from "Queue Manager" that the
QMan supports both dedicated and pool channels. Pool channels are channels of frame queues consumed by parallel workers
(vCPUs), where any worker can process any packet dequeued from the channel.
Cascade Power Management exploits the QMan's awareness of vCPU membership in a pool channel and overall pool
channel queue depth. The QMan uses this information to tell vCPUs in a pool channel (starting with the highest-numbered
vCPU) that they can execute instructions to enter PH10 mode. When pool channel queue depth exceeds configurable
thresholds, the QMan wakes up the lowest-numbered vCPU.
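From the software side, enabling the scheme amounts to configuring the pool channel's queue-depth thresholds; the structure and function in the sketch below are hypothetical placeholders rather than the actual QMan driver interface, and the threshold values are illustrative only.

#include <stdint.h>

struct cascade_cfg {
    uint16_t pool_channel;   /* channel shared by the worker vCPUs        */
    uint32_t nap_threshold;  /* depth below which surplus vCPUs may nap   */
    uint32_t wake_threshold; /* depth above which a napping vCPU is woken */
};

extern int qman_configure_cascade(const struct cascade_cfg *cfg); /* hypothetical */

int enable_cascade(uint16_t pool_channel)
{
    struct cascade_cfg cfg = {
        .pool_channel   = pool_channel,
        .nap_threshold  = 64,    /* illustrative values only */
        .wake_threshold = 512,
    };
    return qman_configure_cascade(&cfg);
}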
The SoC's dynamic power management capabilities, whether using the Cascade scheme or a master control CPU running load-to-power
matching software, enable up to a 75% reduction in power consumption versus the data sheet max power.
4.15 Debug support
The reduced number of external buses enabled by the move to multicore chips greatly simplifies board-level layout and
eliminates many concerns over signal integrity. While the board designer may embrace multicore CPUs, software engineers
have real concerns over the potential to lose debug visibility.
Processing on a multicore chip with shared caches and peripherals also leads to greater concurrency and an increased
potential for unintended interactions between device components. To ensure that software developers have the same or better
visibility into the device as they would with multiple discrete communications processors, Freescale developed an Advanced
Multicore Debug Architecture.
The debugging and performance monitoring capability enabled by the device hardware coexists within a debug ecosystem
that offers a rich variety of tools at different levels of the hardware/software stack. Software development and debug tools
from Freescale (CodeWarrior), as well as third-party vendors, provide a rich set of options for configuring, controlling, and
analyzing debug and performance related events.
Appendix A T2081
A.1 Introduction
The T2081 QorIQ advanced multicore processor combines four dual-threaded e6500 Power Architecture® processor cores
with high-performance datapath acceleration logic and the network and peripheral bus interfaces required for networking,
telecom/datacom, wireless infrastructure, and mil/aerospace applications.
This figure shows the major functional units within the chip.