XC4000, XC4000A, XC4000H Logic Cell Array Families
independently for each of the two registers; this input also
can be disabled for either flip-flop. A separate global
Set/Reset line (not shown in Figure 1) sets or clears each
register during power-up, reconfiguration, or when a
dedicated Reset net is driven active. This Reset net does not
compete with other routing resources; it can be connected
to any package pin as a global reset input.
Each flip-flop can be triggered on either the rising or falling
clock edge. The source of a flip-flop data input is
programmable: it is driven either by the functions F', G', and H', or
by the Direct In (DIN) block input. The flip-flops drive the XQ
and YQ CLB outputs.
In addition, each CLB F' and G' function generator con-
tains dedicated arithmetic logic for the fast generation of
carry and borrow signals, greatly increasing the efficiency
and performance of adders, subtracters, accumulators,
comparators and even counters.
Multiplexers in the CLB map the four control inputs, la-
beled C1 through C4 in Figure 1, into the four internal
control signals (H1, DIN, S/R, and EC) in any arbitrary
manner.
The flexibility and symmetry of the CLB architecture
facilitate the placement and routing of a given application.
Since the function generators and flip-flops have
independent inputs and outputs, each can be treated as a
separate entity during placement to achieve high packing
density. Inputs, outputs, and the functions themselves can
freely swap positions within a CLB to avoid routing
congestion during the placement and routing operation.
Figure 1. Simplified Block Diagram of XC4000-Families Configurable Logic Block
[Block diagram: function generator G' (logic function of G1-G4); function generator F' (logic function of F1-F4); function generator H' (logic function of F', G', and H1); control inputs C1-C4 mapped to H1, DIN, S/R, and EC; two flip-flops, each with S/R control and D, EC, SD, and RD inputs, clocked by K and driving outputs YQ and XQ; combinatorial outputs Y and X with bypass paths; multiplexers controlled by the configuration program. Drawing X6099.]
Speed Is Enhanced Two Ways
Delays in LCA-based designs are layout dependent. While
this makes it hard to predict a worst-case guaranteed
performance, designers can apply a rule of thumb: the
system clock rate should not exceed one third to one half
of the specified toggle rate. Critical portions of a design,
such as shift registers and simple counters, can run faster,
at approximately two thirds of the specified toggle rate.
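The rule of thumb above can be written out as a small calculation. This is only a sketch: the function name is made up for illustration, and the 120 MHz toggle rate is a hypothetical input, not a value taken from this data sheet.

```python
def rule_of_thumb_clock_limits(toggle_rate_mhz):
    """Apply the one-third / one-half / two-thirds rule of thumb
    to a specified toggle rate (MHz)."""
    return {
        # General system clock: one third to one half of the toggle rate
        "system_clock_low_mhz": toggle_rate_mhz / 3,
        "system_clock_high_mhz": toggle_rate_mhz / 2,
        # Shift registers and simple counters: about two thirds
        "simple_structure_mhz": toggle_rate_mhz * 2 / 3,
    }


# Hypothetical toggle rate, chosen only to make the arithmetic easy to follow
limits = rule_of_thumb_clock_limits(120.0)
for name, value in limits.items():
    print(f"{name}: {value:.1f} MHz")
```

These figures are planning estimates only; the text above is explicit that actual delays depend on the layout.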
The XC4000 family can run at synchronous system clock
rates of up to 60 MHz. This increase in performance over
the previous families stems from two basic improve-
ments: improved architecture and more abundant routing
resources.
Improved Architecture
More Inputs:
The versatility of the CLB function generators
improves system speed significantly. Table 3 shows
how the XC4000 families implement many functions more
efficiently and faster than is possible with XC3000 devices.
A 9-bit parity checker, for example, can be implemented in
one CLB with a propagation delay of 7 ns. Using an
XC3000-family device, the same function requires two
CLBs with a propagation delay of 2 x 5.5 ns = 11 ns. One
XC4000 CLB can determine whether two 4-bit words are
identical, again with a 7-ns propagation delay. The ninth
input can be used for simple ripple expansion of this
identity comparator (25.5 ns over 16 bits, 51.5 ns over
32 bits), or a 2-layer identity comparator can generate the
result of a 32-bit comparison in 15 ns, at the cost of a single
extra CLB. Simpler functions like multiplexers also benefit
from the greater flexibility of the XC4000-families CLB. A
16-input multiplexer uses 5 CLBs and has a delay of only
13.5 ns.
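As an illustration of how nine inputs fit into one CLB, the sketch below models the 9-bit parity checker and the expandable 4-bit identity comparator as two 4-input functions (F', G') combined with a ninth input (H1) in a third function (H'). It is a functional model only. The assignment of particular bits to F' and G' is an assumption made for the example, and the function names are hypothetical.

```python
def parity9(bits):
    """Return 1 if an odd number of the nine input bits are 1."""
    assert len(bits) == 9
    f = bits[0] ^ bits[1] ^ bits[2] ^ bits[3]  # F': function of four inputs
    g = bits[4] ^ bits[5] ^ bits[6] ^ bits[7]  # G': function of four inputs
    return f ^ g ^ bits[8]                     # H': function of F', G', H1


def equal4(a, b, expand_in=1):
    """Compare two 4-bit words; expand_in plays the role of the ninth input."""
    eq = [1 - (((a >> i) ^ (b >> i)) & 1) for i in range(4)]
    f = eq[0] & eq[1]          # F' sees a0, b0, a1, b1
    g = eq[2] & eq[3]          # G' sees a2, b2, a3, b3
    return f & g & expand_in   # H' combines F', G', and the ninth input


def equal_ripple(a, b, width):
    """Ripple expansion: each 4-bit stage feeds the next stage's ninth input."""
    result = 1
    for shift in range(0, width, 4):
        result = equal4((a >> shift) & 0xF, (b >> shift) & 0xF, result)
    return result


print(parity9([1, 0, 0, 0, 0, 0, 0, 0, 0]))
print(equal_ripple(0xBEEF, 0xBEEF, 16), equal_ripple(0xBEEF, 0xBEEE, 16))
```

The model says nothing about timing; the delays quoted in the text come from the data sheet, not from this sketch.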
More Outputs:
The CLB can pass the combinatorial
output(s) to the interconnect network, but can also store
the combinatorial result(s) or other incoming data in one or
two flip-flops, and connect their outputs to the interconnect
network as well. With XC3000-families CLBs, the designer
must choose between outputting the combinatorial
function and outputting the stored value. In the XC4000 families,
the flip-flops can be used as registers or shift registers without
blocking the function generators from performing a different,
perhaps unrelated task. This increases the functional
density of the devices.
When a function generator drives a flip-flop in a CLB, the
combinatorial propagation delay overlaps completely with
the set-up time of the flip-flop. The set-up time is specified
between the function generator inputs and the clock input.
This represents a performance advantage over competing
technologies where combinatorial delays must be added
to the flip-flop set-up time.
Fast Carry:
As described earlier, each CLB includes high-
speed carry logic that can be activated by configuration.
The two 4-input function generators can be configured as
a 2-bit adder with built-in hidden carry that can be ex-
panded to any length. This dedicated carry circuitry is so
fast and efficient that conventional speed-up methods like
carry generate/propagate are meaningless even at the
16-bit level, and of marginal benefit at the 32-bit level.
A 16-bit adder requires nine CLBs and has a combinatorial
carry delay of 20.5 ns. Compare that to the 30 CLBs and
50 ns, or 41 CLBs and 30 ns in the XC3000 family.
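The 2-bit-per-CLB adder organization can be modeled functionally as follows. This is a sketch under stated assumptions: each CLB is treated as a black box that adds two 2-bit operands plus a carry-in, and the function names are hypothetical. Eight such slices cover the 16 data bits; the sketch does not model what the ninth CLB cited above is used for, nor any timing.

```python
def clb_add2(a, b, carry_in):
    """Model one CLB configured as a 2-bit adder.

    Returns (2-bit sum, carry out)."""
    total = (a & 0b11) + (b & 0b11) + (carry_in & 1)
    return total & 0b11, total >> 2


def ripple_add(a, b, width=16):
    """Chain 2-bit slices, passing each carry out to the next slice."""
    carry, result = 0, 0
    for shift in range(0, width, 2):
        s, carry = clb_add2(a >> shift, b >> shift, carry)
        result |= s << shift
    return result, carry


print(ripple_add(1234, 4321))
print(ripple_add(0xFFFF, 1))
```

In the device the carry travels on dedicated circuitry rather than general interconnect, which is what the text credits for the speed; the loop above only reproduces the logical result.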
The fast-carry logic opens the door to many new
applications involving arithmetic operations, where the previous
generations of FPGAs were not fast and/or not efficient
enough. High-speed address offset calculations in
microprocessor or graphics systems, and high-speed addition in
digital signal processing are two typical applications.
Faster and More Efficient Counters:
The XC4000-families fast-carry logic puts two counter bits
into each CLB and runs them at a clock rate of up to 42 MHz
for 16 bits, whether the counters are loadable or not. For a 16-bit
Table 3. Density and Performance for Several Common Circuit Functions

Function                                Variant      XC3000 (-125)      XC4000 (-5)
16-bit Decoder From Input Pad                        15 ns     4 CLBs   12 ns     0 CLBs
24-bit Accumulator                                   17 MHz   46 CLBs   32 MHz   13 CLBs
State Machine Benchmark*                             18 MHz   34 CLBs   30 MHz   26 CLBs
16:1 Multiplexer                                     16 ns     8 CLBs   16 ns     5 CLBs
16-bit Unidirectional Loadable Counter  Max Density  20 MHz   16 CLBs   40 MHz    8 CLBs
                                        Max Speed    34 MHz   23 CLBs   42 MHz    9 CLBs
16-bit U/D Counter                      Max Density  20 MHz   16 CLBs   40 MHz    8 CLBs
                                        Max Speed    30 MHz   27 CLBs   40 MHz    8 CLBs
16-bit Adder                            Max Density  50 ns    30 CLBs   20.5 ns   9 CLBs
                                        Max Speed    30 ns    41 CLBs   20.5 ns   9 CLBs

* 16 states, 40 transitions, 10 inputs, 8 outputs
decoder outputs in a CLB. This decoding feature covers
what has long been considered a weakness of FPGAs.
Users often resorted to external PALs for simple but fast
decoding functions. Now, the dedicated decoders in the
XC4000 can implement these functions efficiently and
fast.
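A functional sketch of this PAL-style use of the wide decoders: each decoder is modeled as a wired-AND over selected address bits, and several decoder outputs are then ORed together, as a CLB function generator could do. The mask-and-match representation, the assumption that each connected input can be used in true or inverted form, and all names and address ranges are illustrative assumptions, not details from the data sheet.

```python
def wide_decoder(address, match_value, care_mask):
    """Wired-AND decoder: output is 1 only when every connected input
    (the bits set in care_mask) matches match_value."""
    return int((address & care_mask) == (match_value & care_mask))


def or_of_decoders(address, terms):
    """OR several decoder outputs together (one product term per decoder)."""
    return int(any(wide_decoder(address, value, mask) for value, mask in terms))


# Hypothetical address map: select 0x3Fxx or anything in 0x8000-0x8FFF
terms = [(0x3F00, 0xFF00), (0x8000, 0xF000)]
for addr in (0x3F42, 0x8ABC, 0x4000):
    print(hex(addr), or_of_decoders(addr, terms))
```

Each `(value, mask)` pair corresponds to one product term, so the structure is a sum of products, which is the form a PAL implements.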
Higher Output Current:
The 4-mA maximum output
current specification of today’s FPGAs often forces the
user to add external buffers, which is especially cumbersome on
bidirectional I/O lines. The XC4000 families solve many of
these problems by increasing the maximum output sink
current to 12 mA. Two adjacent outputs may be
interconnected to increase the output sink current to 24 mA. The
FPGA can thus drive short buses on a PC board. The
XC4000A and XC4000H outputs can sink 24 mA per
output and can double up for 48 mA.
While the XC2000 and XC3000 families used complemen-
tary output transistors, the XC4000 outputs are n-channel
for both pull-down and pull-up, somewhat analogous to the
classical totem pole used in TTL. The reduced output High
level (VOH) makes circuit delays more symmetrical for
TTL-threshold systems. The XC4000H outputs have an
optional p-channel output transistor.
Abundant Routing Resources
Connections between blocks are made by metal lines with
programmable switching points and switching matrices.
Compared to the previous LCA families, these routing
resources have been increased dramatically. The number
of globally distributed signals has been increased from two
to eight, and these lines have access to any clock or logic
input. The designer of synchronous systems can now
distribute not only several clocks, but also control signals,
all over the chip, without having to worry about any skew.
There are more than twice as many horizontal and vertical
Longlines that can carry signals across the length or width
of the chip with minimal delay and negligible skew. The
horizontal Longlines can be driven by 3-state buffers, and
can thus be used as unidirectional or bidirectional data
buses; or they can implement wide multiplexers or
wired-AND functions.
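The two Longline uses mentioned above can be sketched behaviorally. This model rests on assumptions the text does not spell out: for the wired-AND, the line is taken to rest High and to be driven Low by any buffer whose input is 0; for the multiplexer, exactly one 3-state buffer is taken to be enabled at a time. Function names are hypothetical.

```python
def wired_and(inputs):
    """Wired-AND on a shared line: assumed to rest High, and pulled Low
    by any buffer whose input is 0."""
    line = 1
    for bit in inputs:
        if bit == 0:
            line = 0
    return line


def bus_mux(data, enables):
    """Wide multiplexer: the single enabled 3-state buffer drives the line."""
    drivers = [d for d, en in zip(data, enables) if en]
    if len(drivers) != 1:
        raise ValueError("exactly one 3-state buffer must be enabled")
    return drivers[0]


print(wired_and([1, 1, 1]), wired_and([1, 0, 1]))
print(bus_mux([0, 1, 0, 1], [0, 0, 0, 1]))
```

Either function grows with the number of buffers on the line rather than with function-generator inputs, which is why these structures suit wide functions.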
Single-length lines connect the switching matrices that are
located at every intersection of a row and a column of
CLBs. These lines provide the greatest interconnect flexi-
bility, but cause a delay whenever they go through a
switching matrix. Double-length lines bypass every other
matrix, and provide faster signal routing over intermediate
distances.
Compared to the XC3000 family, the XC4000 families
have more than double the routing resources, and they are
arranged in a far more regular fashion. In older devices,
Figure 2. Fast Carry Logic in Each CLB
up/down counter, this means twice the speed in half the
number of CLBs, compared with the XC3000 families.
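The "twice the speed in half the number of CLBs" statement can be checked against the 16-bit U/D Counter rows of Table 3. The short calculation below uses only the figures from that table; the helper name is made up for the example.

```python
def compare(xc3000, xc4000):
    """Return (speed ratio, CLB ratio) of XC4000 relative to XC3000."""
    return xc4000["mhz"] / xc3000["mhz"], xc4000["clbs"] / xc3000["clbs"]


# 16-bit U/D Counter rows from Table 3
max_density = compare({"mhz": 20, "clbs": 16}, {"mhz": 40, "clbs": 8})
max_speed = compare({"mhz": 30, "clbs": 27}, {"mhz": 40, "clbs": 8})

print("Max Density: %.2fx speed, %.2fx CLBs" % max_density)
print("Max Speed:   %.2fx speed, %.2fx CLBs" % max_speed)
```

The Max Density row gives exactly 2x the speed in 0.5x the CLBs. The Max Speed row shows a smaller speed gain but a larger saving in CLBs.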
Pipelining Speeds Up The System:
The abundance of
flip-flops in the CLBs invites pipelined designs. This is a
powerful way of increasing performance by breaking the
function into smaller subfunctions and executing them
in parallel, passing on the results through pipeline flip-
flops. This method should be seriously considered wher-
ever total performance is more important than simple
through-delay.
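The trade-off between throughput and through-delay can be illustrated with a small calculation. The stage delays below are hypothetical numbers chosen for the example, and the model is deliberately simple: the clock period is set by the slowest stage plus an optional register overhead, which is an assumption of this sketch rather than a figure from the data sheet.

```python
def pipeline_timing(stage_delays_ns, register_overhead_ns=0.0):
    """Compare an unpipelined path with the same logic split into stages."""
    unpipelined_period = sum(stage_delays_ns)
    # The slowest stage sets the pipelined clock period
    pipelined_period = max(stage_delays_ns) + register_overhead_ns
    return {
        "unpipelined_period_ns": unpipelined_period,
        "pipelined_period_ns": pipelined_period,
        # A result needs one clock per stage to pass through the pipeline
        "pipelined_latency_ns": pipelined_period * len(stage_delays_ns),
        "pipelined_clock_mhz": 1000.0 / pipelined_period,
    }


# Hypothetical three-stage split of a 30 ns combinatorial function
timing = pipeline_timing([10.0, 12.0, 8.0])
for name, value in timing.items():
    print(f"{name}: {value:.1f}")
```

With these numbers the clock period drops from 30 ns to 12 ns while the through-delay rises from 30 ns to 36 ns, which is why the method pays off where total performance matters more than simple through-delay.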
Wide Edge Decoding:
For years, FPGAs have suffered
from the lack of wide decoding circuitry. When the address
or data field is wider than the function generator inputs (five
bits in the XC3000 families), FPGAs need multi-level
decoding and are thus slower than PALs. The XC4000-family
CLBs have nine inputs; any decoder of up to nine
inputs is, therefore, compact and fast. But there is also a
need for much wider decoders, especially for address
decoding in large microprocessor systems. The XC4000
family has four programmable decoders located on each
edge of each device. Each of these wired-AND gates is
capable of accepting up to 42 inputs on the XC4005 and 72
on the XC4013. These decoders may also be split in two
when a large number of narrower decoders are required,
for a maximum of 32 per device. These dedicated decoders
accept I/O signals and internal signals as inputs and
generate a decoded internal signal in 18 ns, pin-to-pin. The
XC4000A family has only two decoder AND gates per
edge which, when split, provide a maximum of 16 per
device. Very large PALs can be emulated by ORing the
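[Figure 2 diagram: function generators G' (logic function of G1-G4) and F' (logic function of F1-F4), each paired with a carry logic block; operand inputs A0, B0, A1, B1; carry inputs CIN 1 and CIN 2; carry output COUT; outputs SUM 0 and SUM 1. Drawing X5373.]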
