Library Design
The basic logic elements of an IC design, such as gates and flip flops, are not made by the designers themselves but are mostly taken from libraries. The libraries provide a set of common, frequently used basic building blocks. In digital design, full custom design at the transistor level can thus be avoided, which significantly reduces both the risk of the design process and the verification effort. For analogue integrated circuits, analogue libraries are used where components are available. Digital circuits are composed almost entirely of library components, whilst analogue circuits consist of 80 % or more library components. These library components are assembled and combined into the intended design. Only cells with special requirements or performance that are unavailable as stock library components are custom designed.
Library components are building blocks which are combined into IC designs by interconnection. In the sense of the Gajski Y (see chapter 2), each library component has three views:
• Symbol (structure);
• Model (behaviour); and
• Form (geometry).
See fig. 17.1. In digital designs the library comprises a relatively small number of basic components, from which all more complex circuits are assembled.
This concept need not be limited to the design of integrated VLSI circuits. Discrete components like transistors, resistors, and capacitors, arranged on a printed circuit board, have similar symbol views, behaviour descriptions, and housing geometries. In the case of discrete circuits, the number of components making up the library may be remarkably large.
The manner in which the library components are connected is described in the netlist. Netlists and the library together describe the entire design.
Layout design starts with the arrangement of the cells on the silicon surface, the so-called placement of the cells listed in the netlist. The electrical connections between the cell pins, which are created by routing, are likewise listed in the netlist. Placement and routing are governed by the process, which defines the design rules and layer usage and which is also the basis for DRC verification.
For the purpose of simulation, the models contained in the library are used together with the netlist. In this way netlist and library are the center of the design, with the library components combining in themselves the three arms of the Gajski Y. A further decomposition of the design is not needed: the library components form the lowest level of abstraction in the hierarchy.
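The relationship between library and netlist can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `LibraryCell`, the two-cell `LIBRARY`, the cell widths, and the net names are all hypothetical and are not taken from any real library. Only the 25 μm cell height follows the standard cell example discussed later in this chapter.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class LibraryCell:
    """One library component with its three views (hypothetical structure)."""
    name: str
    symbol: Tuple[str, ...]      # structure view: pin names, output last
    model: Callable[..., bool]   # behaviour view: logic function of the inputs
    width_um: float              # geometry view: cell width
    height_um: float             # geometry view: cell height


# Hypothetical two-cell library; widths are invented for illustration.
LIBRARY = {
    "INV": LibraryCell("INV", ("A", "Q"), lambda a: not a, 3.0, 25.0),
    "NAND2": LibraryCell("NAND2", ("A", "B", "Q"),
                         lambda a, b: not (a and b), 4.5, 25.0),
}

# Netlist entries: (instance name, cell type, input nets, output net).
# The list is assumed to be in topological order (drivers before loads).
NETLIST = [
    ("U1", "NAND2", ("x", "y"), "n1"),
    ("U2", "INV", ("n1",), "z"),
]


def simulate(netlist, library, inputs):
    """Evaluate the netlist using the behaviour view of each cell."""
    nets = dict(inputs)
    for _inst, cell, in_nets, out_net in netlist:
        nets[out_net] = library[cell].model(*(nets[n] for n in in_nets))
    return nets


def total_cell_area_um2(netlist, library):
    """Sum the cell areas using the geometry view of each cell."""
    return sum(library[cell].width_um * library[cell].height_um
               for _inst, cell, _ins, _out in netlist)


if __name__ == "__main__":
    print(simulate(NETLIST, LIBRARY, {"x": True, "y": True}))
    print(total_cell_area_um2(NETLIST, LIBRARY))
```

The same netlist is used twice here: once with the behaviour view for simulation and once with the geometry view for an area estimate, which mirrors how netlist and library together describe the design.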
The choice of library defines the further development process. FPGA libraries are used to design FPGAs, ASIC libraries to design standard cell or gate array circuits, and discrete libraries of, e.g., 74xx TTL components to design discrete boards. If different libraries share common symbols and functions, a one-to-one mapping of netlist components may be possible. If a direct replacement or mapping is not possible, modern synthesis software can deliver functionally equivalent combinations of basic gates. The basic logic functions AND, OR and INV are available in all digital libraries and can be used directly; more complex AOI gates (AND-OR-INV gates) may not always be available, and may be decomposed and reconstituted by the program.
Because of the close connection with the IC manufacturing process, libraries are typically provided by the ASIC manufacturer. This means that detailed design information about the cell is often kept secret by the company. Only the physical shapes of the cells, the external interface contour, and the positions of the pins are published. In most cases this is sufficient to design the circuit in a standard cell design style. The internal structures are inserted into the shapes of the design at the manufacturer before mask generation.
In reality the CMOS manufacturing processes of most plants are not very different. For this reason there are companies which specialize in providing manufacturer independent libraries, designed for a superset of process rules. Such libraries (e.g. those by Compass Design Automation Inc., and now Avant! Inc.) can be mapped with little effort onto many CMOS processes. Such ‘passport’ libraries allow one to change the manufacturer without major redesign of the IC. Process independent libraries ease the maintenance and further development of library cells and the porting onto new advanced technologies.
In gate arrays and FPGAs the physical or geometrical view is not included. Place and route is completely carried out by the manufacturer. Placement and routing with FPGAs is generally not transparent to the user, and programming of the interconnections is carried out by automatic tools.
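The replacement of a missing complex gate by basic gates can be illustrated with a small equivalence check. The following Python sketch assumes a three-input AOI21 gate, Q = NOT((A AND B) OR C), as the gate to be replaced; the function names are hypothetical and the exhaustive truth table comparison merely stands in for what a synthesis or verification tool does far more efficiently.

```python
from itertools import product


def aoi21(a, b, c):
    """Reference behaviour of an AOI21 gate: Q = NOT((A AND B) OR C)."""
    return not ((a and b) or c)


# Basic gates assumed to be available in every digital library.
def and2(a, b):
    return a and b


def or2(a, b):
    return a or b


def inv(a):
    return not a


def aoi21_from_basic(a, b, c):
    """The same function rebuilt from AND, OR and INV only."""
    return inv(or2(and2(a, b), c))


def equivalent(f, g, n_inputs):
    """Compare two logic functions over all 2**n_inputs input combinations."""
    return all(f(*v) == g(*v)
               for v in product((False, True), repeat=n_inputs))


if __name__ == "__main__":
    print(equivalent(aoi21, aoi21_from_basic, 3))
```

The decomposition costs three basic gates and two internal nets instead of one cell, which is the area and routing penalty a library without AOI gates has to accept.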
The libraries provided for FPGA are very similar to ASIC libraries, so that there is an easy mapping process from one technology to another.
In this chapter the typical content of concrete libraries is demonstrated, together with how it can be used to design chips.
Digital Libraries
Digital libraries require a compromise between specialization and standardization. Basically, only a few gates are sufficient for building up all logic constructs. Logic equations can be written in normal forms, so the three gate types AND, OR and INV are fully sufficient for mapping every logic equation. It can be shown that even one gate type, NAND or NOR, would allow complete coverage.
Such extreme concepts have applications in the gate array field, but they are not of interest to the ordinary user, who needs a broad set of NAND, NOR, AND, OR, etc. gates, as well as memory elements such as flip flops in different types and configurations. A minimal library for digital designs should have about 30 elements (listed in table 17.1).
The first class lists the basic combinational gates INV, AND, and OR. In order to drive a certain number of interconnected gate inputs, buffers and inverters should be available with large drive capabilities. NAND and NOR should be available with more than two inputs to avoid complex logic statements with many levels of internal nodes. For arithmetic applications and data path structures a full adder should be included in the library, as well as multiplexers and gates of the AND-OR-INV type. Busses may be built up with tristate buffers. These buffers should have sufficient drive capability to drive a large number of inputs and still achieve good frequency performance.
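The claim that a single gate type is sufficient can be checked directly. The Python sketch below builds INV, AND and OR from a two-input NAND only and compares them with Python's own Boolean operators over all input combinations; the function names are chosen here for illustration.

```python
from itertools import product


def nand(a, b):
    """The only primitive used in this sketch."""
    return not (a and b)


def inv(a):
    # NAND with both inputs tied together acts as an inverter.
    return nand(a, a)


def and2(a, b):
    # AND is NAND followed by an inverter.
    return inv(nand(a, b))


def or2(a, b):
    # De Morgan: a OR b = NOT(NOT a AND NOT b).
    return nand(inv(a), inv(b))


def check_all():
    """Return True if the NAND-only gates match the reference operators."""
    for a, b in product((False, True), repeat=2):
        if inv(a) != (not a):
            return False
        if and2(a, b) != (a and b):
            return False
        if or2(a, b) != (a or b):
            return False
    return True


if __name__ == "__main__":
    print(check_all())
```

The OR gate already needs three NAND cells, which shows why such a minimal library is complete in principle but wasteful for the ordinary user.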
Memory must be provided in the form of latches and flip flops, some with asynchronous reset/set. For test purposes flip flops with scan input are needed. For the busses further bus keeper flip flops are provided, keeping the bus in a defined state when all tristate outputs are in the ‘off’ state.
The minimum digital library defined above contains 30 cells; even fewer are sufficient. The figure shows some statistics on the numbers of cell types used for about 150 ASIC designs [17.10]. Smith demonstrates that 80 % of the designs analysed use only 20 % of the available cell library.
The arrangement of the optimal set of basic components depends on:
• the design style (manual or synthesis); and
• the design object (control dominated or data dominated).
Design and maintenance of libraries require considerable effort because every cell has to be simulated at the lowest transistor level (e.g., with SPICE) and has to be characterized. Prototypes have to be manufactured, verified, and qualified for a certain process technology. Modelling has to be carried out in several simulator dialects. The lifetime of a process technology has decreased to less than two years, so adaptations to new processes are made continuously. This effort can be justified by the manufacturer, who provides the cells for many applications, because the cost of library maintenance can be distributed over many customers.
As a result of the competition between manufacturers and the ability to map designs using different libraries and processes, a certain standardisation has been established. Today about 300 library components are provided for the IC core, another 100 for the periphery pad cells.
Table 17.2 shows an overview of available libraries, in which only newer technologies are listed. The libraries offered open up the current technology field to designers, down to 0.11 μm CMOS processes. Many manufacturers offer both standard cell libraries and gate array libraries for the same process and several masters. Far eastern manufacturers and smaller companies primarily offer gate array processes. FPGA manufacturers are not included in table 17.2; the FPGA libraries offered are very similar to gate array libraries. The following discussion focuses on standard cell libraries.
As an example of the content of such a digital library, table 17.3 gives a summarized view of the CMOS library of ALCATEL MIETEC [17.8].
The large number of different cells arises because:
• all important cells are offered with up to four different driver capabilities;
• more than 1/3 of all cells are complex gates, e.g., AOI gates.
The diversification of basic cells into cells with larger driver capability is relatively simple, because only the driver stage is replaced, so these cells are designed quickly. At the system level these cells save an additional buffer as well as the interconnection in between.
The same is true for AOI gates, which are area optimal, save interconnections, and ease the routing of logic composed of several levels. Modern logic synthesis supports AOI gates effectively. A disadvantage is the need to supply and support a whole zoo of AOI gates; this is accepted.
Table 17.4 shows technical data of selected cells, shortened and simplified.
The cells have, as is typical for standard cells, the same height (25 μm) and vary only in width. In many libraries it is permitted to place the cells horizontally, mirrored, or vertically. Voltage is supplied via VDD and VSS supply busses which run horizontally through the cell with sufficient width in metal_1. Signal ports or ‘pins’ are located on metal_1 or metal_2 and are generally routed vertically through the cell. During the design phase of the cells, internal connections using metal_2 are avoided. In more complex cells, such as flip flops, metal_2 may be used, and blockage layers are defined which tell the automatic routing tool where no connections are allowed. With three or more metal layers, routing over the cells becomes possible, allowing high density channel free routing.
For an example of the internals of a standard cell, see fig. 17.3.
In addition to the assembly of standard cells in rows, there is another concept of an array-like assembly of gates using specialized libraries. These libraries optimise the layout of data paths, adders with wide inputs, as well as multipliers and similar systolic structures. The libraries are used together with data path generators. The cells of these libraries are characterised by the signal input being on one side, the output on the other, and the supply rails and clock lines passing horizontally and/or vertically through the cell. The cells are designed for direct interconnection by abutment with the neighbouring cells.
Figure 17.4 shows the external interface (abstract) of a cell for data path applications. The cell is designed in such a way that electrical interconnections are made by direct abutment. With synthesis gaining more importance today, data path generators are used less, sometimes only for the generation of block cells, which are placed in block design style into the IC core. Block design has the disadvantage of inefficient area usage, so the improvement gained by using high density data path generators may be cancelled out or even outweighed by the block placement.
The set of cells which may be used by data path generators is quite small and limited to a few basic gates and so-called systolic cells for multipliers and similar structures. In many cases the cells are defined in a layout language and are designed in detail by automatic generator tools with the help of the design rules of the process [17.1].
Table 17.4 provides the geometry and area data as well as the driving capability, given in standard loads (SL). One standard load is defined as the input capacitance of the smallest inverter in the library. The data relate to an abstract behavioural model of the cell obtained by characterization, for which the cell is simulated at the transistor level with all parasitic effects. Characterization is a technical area of its own and will not be discussed further here.
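Because all cells share one height and differ only in width, arranging them in rows reduces to a one-dimensional packing problem. The Python sketch below fills rows greedily in netlist order; it is not a real placement algorithm (it ignores connectivity and wire length), and the cell widths and the row width are hypothetical values. Only the 25 μm cell height is taken from the text.

```python
CELL_HEIGHT_UM = 25.0  # common cell height quoted in the text


def place_in_rows(widths_um, row_width_um):
    """Greedily fill rows of fixed width with cells of varying width.

    Cells are taken in the given order; a new row is started whenever the
    next cell no longer fits. Returns a list of rows (lists of widths).
    """
    rows, current, used = [], [], 0.0
    for w in widths_um:
        if w > row_width_um:
            raise ValueError("cell is wider than a row")
        if used + w > row_width_um:
            rows.append(current)
            current, used = [], 0.0
        current.append(w)
        used += w
    if current:
        rows.append(current)
    return rows


def core_area_um2(rows, row_width_um, cell_height_um=CELL_HEIGHT_UM):
    """Area of the cell rows only, without routing channels or pads."""
    return len(rows) * row_width_um * cell_height_um


if __name__ == "__main__":
    # Hypothetical cell widths in micrometres.
    rows = place_in_rows([10.0, 20.0, 15.0, 30.0, 5.0], row_width_um=40.0)
    print(rows)
    print(core_area_um2(rows, 40.0))
```

The uniform height is what makes this simplification possible: the supply rails of neighbouring cells line up automatically, so only the horizontal position of each cell has to be decided.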
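The use of standard loads can be illustrated with a short calculation. In the Python sketch below, the input capacitance of the smallest inverter (20 fF) and all other capacitance and drive values are hypothetical and are not taken from table 17.4; the point is only how a load expressed in SL is compared with the driving capability of a cell.

```python
# Hypothetical input capacitance of the smallest inverter, in femtofarads.
# By definition this is one standard load (SL).
STANDARD_LOAD_FF = 20


def load_in_sl(input_caps_ff, standard_load_ff=STANDARD_LOAD_FF):
    """Total load of a net in standard loads.

    input_caps_ff: input capacitances (fF) of all gate inputs on the net.
    """
    return sum(input_caps_ff) / standard_load_ff


def can_drive(drive_capability_sl, input_caps_ff):
    """True if the driver's capability (in SL) covers the connected inputs."""
    return load_in_sl(input_caps_ff) <= drive_capability_sl


if __name__ == "__main__":
    # A net with three gate inputs of 20 fF, 40 fF and 20 fF (hypothetical).
    net = [20, 40, 20]
    print(load_in_sl(net))       # load in SL
    print(can_drive(8, net))     # hypothetical driver rated for 8 SL
    print(can_drive(2, net))     # hypothetical driver rated for 2 SL
```

Wiring capacitance is left out of this sketch; in practice it is added to the gate input loads before the comparison.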
The propagation delay times measured from one input port to an output port are modelled in many libraries by the following linear timing equation:
Delay = Tintr + Rramp · Cload    (17.1)

Delay here means the time for the propagation of a signal transition (the rising and falling edges may have different values) from the chosen input to the assigned output. Tintr is the intrinsic delay, or basic delay, of the element, Cload is the capacitive load at the output, and Rramp is a load-proportional delay coefficient with the dimension nsec/pF. Rramp models the increase of delay with the number of gates connected.
The intrinsic delay is the basic delay of the cell without any external load. It results from internal nodes and loads which are not visible to the user. Rramp depends on the load capability and design of the driver stage, and indirectly represents the output resistance (pull up or pull down resistance). Because the pull up resistance is formed by a p-channel MOS transistor and the pull down resistance by an n-channel MOS transistor, the timing behaviour differs for the rising and falling edges of the output signal. This is true even in a so-called balanced design, in which the output transistors are sized in such a way that the delay performance is similar for both slopes.
In the modelling of gates with more than one input and output, time delays have to be evaluated for each path and for each type of slope, which yields a large number of values even for a simple gate. Gates with larger driver capability, or fan out (IV4), show significantly lower values for Rramp than standard gates (IV), but the larger output transistors increase the cell area and the power consumption. Equation (17.1) is only a simplified description of the measured behaviour shown in fig. 17.5. Instead of the simple linear equation, newer libraries and simulators use tables which are interpolated. Loads, sometimes called fan in, are calculated from the netlist and library only once, just before simulation, and are not changed dynamically during the digital simulation. Loads are combined with some wire load model, which estimates the capacitive load of the interconnection wiring.
The current drive performance of an output transistor depends not only on the load but also on:
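Equation (17.1) with separate coefficients for the rising and the falling edge can be written down directly. The following Python sketch uses hypothetical coefficients for a normal inverter and for one with a stronger driver stage; none of the numbers are taken from a real data sheet, and the class name `TimingArc` is chosen here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingArc:
    """Linear delay model of one input-to-output path and one edge type."""
    t_intr_ns: float          # intrinsic delay Tintr in ns
    r_ramp_ns_per_pf: float   # load coefficient Rramp in ns/pF

    def delay_ns(self, c_load_pf):
        # Equation (17.1): Delay = Tintr + Rramp * Cload
        return self.t_intr_ns + self.r_ramp_ns_per_pf * c_load_pf


# Hypothetical coefficients: the stronger driver has a lower Rramp.
IV_RISE = TimingArc(t_intr_ns=0.10, r_ramp_ns_per_pf=4.0)
IV_FALL = TimingArc(t_intr_ns=0.08, r_ramp_ns_per_pf=3.0)
IV4_RISE = TimingArc(t_intr_ns=0.12, r_ramp_ns_per_pf=1.0)


def worst_edge_delay_ns(rise, fall, c_load_pf):
    """The slower of the two edges, as used in a conservative analysis."""
    return max(rise.delay_ns(c_load_pf), fall.delay_ns(c_load_pf))


if __name__ == "__main__":
    load_pf = 0.5
    print(IV_RISE.delay_ns(load_pf))
    print(IV_FALL.delay_ns(load_pf))
    print(IV4_RISE.delay_ns(load_pf))
    print(worst_edge_delay_ns(IV_RISE, IV_FALL, load_pf))
```

With these assumed values the stronger driver is slightly slower without load but much faster at 0.5 pF, which is the trade-off the load coefficient is meant to capture.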
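The table-based approach can be sketched as a piecewise linear interpolation over the load. The Python example below uses a one-dimensional table indexed by load capacitance only; real library tables are often indexed by further parameters as well, and all table values here are hypothetical.

```python
from bisect import bisect_right


def interpolate(xs, ys, x):
    """Piecewise linear interpolation in a table with ascending xs.

    Raises ValueError outside the characterized range instead of
    extrapolating.
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("value outside the characterized table range")
    # Index of the right-hand neighbour, clamped to the last table entry.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical characterization data: load (pF) versus delay (ns).
LOAD_PF = [0.0, 0.1, 0.5, 1.0]
DELAY_NS = [0.10, 0.45, 1.60, 2.80]


if __name__ == "__main__":
    print(interpolate(LOAD_PF, DELAY_NS, 0.3))
    print(interpolate(LOAD_PF, DELAY_NS, 0.0))
```

Unlike the linear equation, the table can follow a curved delay characteristic, at the price of storing several points per path and edge.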
• the temperature of the crystal;
• the driving voltage; and
• the transconductance of the transistor.
With increasing temperature the mobility of the charge carriers in the channel decreases (negative temperature dependency), so the resistance and, as a consequence, the time constant increase. The gate works more slowly. With lower supply voltage the threshold voltage, which is more or less fixed by the process, is reached later because the rising slope of the voltage is less steep. The gate reacts more slowly. Last but not least, all switching performance includes the transconductance of the transistor, which depends mainly on the thickness of the gate oxide. With a thickness below 5 nm today, there may be significant tolerances in thickness, so there are fast gates for low thickness, typical gates for normal thickness, and slow gates for increased thickness of the gate oxide, depending on the manufacturing lot. Of course there may be other effects too, which are not of interest here. Manufacturing tolerances are strongly correlated for all transistors in a lot and on one chip, because the thickness of the oxide layer is about the same everywhere.
Switching performance depends strongly on the slope rate of the input signal, because the input capacitance varies non-linearly with the voltage and there are many internal effects [17.10]. Slow input slope rates have to be avoided in every case, because the current passing through the driver stage is high while both transistors are in the ‘on’ condition. For all input signals, a minimum steepness of the slope is therefore required. This is a critical requirement for clock signals because of the internal dynamic structures of clock controlled master slave flip flops. The steepness of the signal slopes is guaranteed by matching the load with an output stage of sufficient driving capability. The listed delay times are only valid for input signal rise times of less than 1 nsec.
It is common to list all timing data for the worst case conditions:
• 85 °C crystal temperature (highest operating temperature);
• slowest process (slow); and
• lowest specified supply voltage (2.7 V).
For the application at hand, the timing behaviour must be calculated by applying de-rating factors to the listed worst case values. The de-rating factors may be taken from tables; see tables 17.5 to 17.7 as examples.
The tables show how strong the influence of these parameters can be. The process may have a typical tolerance of ±20 %. The effect of temperature is even larger at constant supply voltage. To keep the number of combinations of extremes that must be verified as small as possible, it is common to simulate with the worst case conditions, which is a conservative approach. The delay times for a normal environment of 25 °C, 3 V and a typical process are only 60 % of the worst case values. Minimal delay times are only seldom requested. A design requiring minimum timing should, in principle, always be avoided. In bipolar and BiCMOS processes additional factors need to be considered, but this will not be discussed further.
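The use of de-rating factors can be sketched as follows. Two assumptions are made here and should be read as such: the factors for process, supply voltage and temperature are combined by simple multiplication, and the three factor values are hypothetical. They were chosen only so that their product lands near the 60 % figure quoted above, and they are not taken from tables 17.5 to 17.7.

```python
from math import prod

# Hypothetical de-rating factors relative to the worst case corner
# (slow process, 2.7 V, 85 °C). A value of 1.0 means "worst case".
TYPICAL_CORNER = {
    "process": 0.83,      # typical instead of slow process (assumed)
    "voltage": 0.90,      # 3 V instead of 2.7 V (assumed)
    "temperature": 0.80,  # 25 °C instead of 85 °C (assumed)
}


def derate(worst_case_delay_ns, factors):
    """Scale a worst case delay by the product of all de-rating factors."""
    return worst_case_delay_ns * prod(factors.values())


if __name__ == "__main__":
    wc_delay_ns = 10.0  # hypothetical worst case path delay
    print(derate(wc_delay_ns, TYPICAL_CORNER))
    # At the worst case corner itself every factor is 1.0.
    print(derate(wc_delay_ns, {"process": 1.0, "voltage": 1.0,
                               "temperature": 1.0}))
```

Verifying only the worst case corner corresponds to calling `derate` with all factors equal to 1.0, which avoids simulating every combination of process, voltage and temperature.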
On the following pages, two example designs made by the author in a 0.5 μm CMOS technology and mapped onto the library of table 17.3 are discussed in more detail. The designs differ in content and design style.
Table 17.8 lists the statistics of cells used for a 16 bit microprocessor core design with micro-code ROM, which was originally designed by hand (schematics) and was optimised by synthesis programs only in sub-blocks. The design uses only 20 different cells. The only complex gates used are a multiplexer and a full adder. AOI gates were not originally available. With 420 cells the design is very compact; together with the micro-code ROM it occupies only 1.5 mm², and it is an example of a typical hard macro (hard IP).
The same processor core with some extended capabilities and an integrated 8 bit × 8 bit multiplier, but without micro-code ROM, is represented as a soft macro (soft IP), designed in VHDL and then synthesized using Synopsys Design Compiler. It is listed in table 17.9 for comparison. The core now uses 4258 cells of 108 different types. The synthesized logic contains many complex gates of AOI type and several buffers. During synthesis the operator library from Synopsys for addition and multiplication was used. The multiplier itself amounts to about one third of the entire design. The core in the same technology is only 10 % larger in area but has higher performance, and may now be mapped to any technology and size requirement.
Designs synthesized from VHDL make much better use of broad libraries (via synthesis) than manual designs, but they need not be optimal for area and power in each case.