Design for Testability
All methods presented so far for improving testability are generally applicable circuit modifications that can be made after a circuit design has been completed. This section presents techniques for taking testability into account at an early phase of circuit design.
In section 15.6.1 circuits are designed in such a way that simple, well-known tests can be applied. Sections 15.6.2, 15.6.3, and 15.6.4 then consider techniques for deriving self-checking circuits. Such circuits often have a special mode of operation for generating test inputs and checking the computed outputs for correctness. In this way, for example, chips may test themselves at power-on reset. A self-test may also be performed during the normal operation of a circuit; for this purpose one uses codes to simplify the detection of faulty results.
Universal Test
A universal test is a test which can be applied to a large class of circuits. A typical example is testing RAM. Here the basic form of the test sequence is always the same, and there are only slight differences depending on the actual size of a RAM block. The test pattern sequence is therefore defined in a parameterized way, so that suitable test patterns are easy to generate for a given data length and storage size. In a similar way it is also possible to derive test patterns for the class of carry ripple adders or carry look-ahead adders, parameterized by the data length.
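The idea of a parameterized test sequence can be sketched in a few lines. The text does not name a particular RAM test, so the sequence below is an assumption: it follows the commonly used march scheme usually called MATS+ (write 0 everywhere, then ascending read 0 / write 1, then descending read 1 / write 0). The class `FaultyRam` and the function names are hypothetical helpers for illustration only.

```python
def mats_plus(num_words, width):
    """Yield (op, address, data) triples of a MATS+ style march test."""
    zeros, ones = 0, (1 << width) - 1
    for addr in range(num_words):             # initialise: write 0
        yield ("w", addr, zeros)
    for addr in range(num_words):             # ascending: read 0, write 1
        yield ("r", addr, zeros)
        yield ("w", addr, ones)
    for addr in reversed(range(num_words)):   # descending: read 1, write 0
        yield ("r", addr, ones)
        yield ("w", addr, zeros)


class FaultyRam:
    """Toy RAM model; optionally one bit of one word is stuck at a value."""

    def __init__(self, num_words, stuck=None):
        self.cells = [0] * num_words
        self.stuck = stuck                    # (address, bit, value) or None

    def write(self, addr, data):
        self.cells[addr] = data

    def read(self, addr):
        data = self.cells[addr]
        if self.stuck and self.stuck[0] == addr:
            _, bit, value = self.stuck
            data = (data & ~(1 << bit)) | (value << bit)
        return data


def run_test(ram, num_words, width):
    """Apply the march test; return the first failing address or None."""
    for op, addr, data in mats_plus(num_words, width):
        if op == "w":
            ram.write(addr, data)
        elif ram.read(addr) != data:
            return addr
    return None


print(run_test(FaultyRam(8), 8, 8))                    # fault-free RAM
print(run_test(FaultyRam(8, stuck=(5, 2, 1)), 8, 8))   # bit 2 of word 5 stuck at 1
```

Only `num_words` and `width` change from one RAM block to the next; the shape of the sequence, and its length of five operations per word, stays the same.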
It is more interesting that such universal tests also exist for a class of easily testable PLAs (PLA: Programmable Logic Array) [15.2], [15.6], [15.11], [15.46]. Since PLAs can be used to implement arbitrary Boolean functions they represent a very powerful class of circuits. In particular, PLAs are also used to implement state transition functions of finite state machines (FSM: Finite State Machine). When an arbitrary combinatorial circuit is implemented by an easily testable PLA, no explicit test pattern generation is necessary because, depending on the actual size of the PLA, a pre-computed universal test can be used.
First we describe the structure of a conventional PLA (not designed for testability) for a Boolean function (y1, ... , ym) = f (x1, ... , xn). The normalized representation for each signal yi is defined as a sum of products over input variables and inverted input variables. Thus the function can be implemented using two levels of gates. As shown in figure 15.45 a PLA consists of an AND block and an OR block. In the AND block there are k columns, each computing one term of the normalized representation. Then the OR block combines such terms obtaining signals yi.
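The two-level structure described above can be modelled directly. The following is a minimal sketch, not the circuit of figure 15.45: the data layout (`and_plane`, `or_plane`) and the example function are hypothetical and chosen only to show how the columns of the AND block feed the OR block.

```python
def pla_eval(and_plane, or_plane, x):
    """Evaluate a two-level PLA for the input bit list x.

    and_plane: one dict per column, mapping input index -> required bit
               (1 = variable used directly, 0 = variable used inverted)
    or_plane:  one list of column indices per output signal
    """
    # AND block: each column computes one product term.
    terms = [all(x[i] == bit for i, bit in column.items())
             for column in and_plane]
    # OR block: each output is the sum of its selected product terms.
    return [int(any(terms[c] for c in columns)) for columns in or_plane]


# Hypothetical example with n = 3 inputs, k = 3 columns, m = 2 outputs:
#   y1 = x1 x2 + (not x2) x3
#   y2 = (not x2) x3 + (not x1)(not x3)
AND_PLANE = [{0: 1, 1: 1}, {1: 0, 2: 1}, {0: 0, 2: 0}]
OR_PLANE = [[0, 1], [1, 2]]

print(pla_eval(AND_PLANE, OR_PLANE, [1, 1, 0]))
```

Note that the product term in the second column is shared by both outputs, which is what makes a PLA more compact than two independent sum-of-products circuits.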
Possible faults are a stuck-at fault on a line, or a transistor that is permanently conducting or permanently blocking.
To improve the testability of a PLA one extends the circuit such that each column of the AND block becomes separately addressable and faults can be observed at additional test outputs v1 and v2 (figure 15.46). For that purpose a shift register with input z0 is inserted above the AND block. It will be serially initialized to (s1, s2, ..., sk) such that for si = 1 the AND operation is performed for column i and for si = 0 column i always computes signal ‘0’.
Furthermore, for the horizontal lines of the AND block the input signals are combined with test inputs z1 and z2, respectively, such that in test mode horizontal lines can easily be set to ‘1’. Finally, there is an additional column in the AND block with transistors placed in such a way that every row contains an odd number of transistors. Accordingly, there is an additional row in the OR block such that every column of the OR block contains an odd number of transistors. Figure 15.46 shows such an easily testable PLA for an example pair of functions.
The circuit needs n = 4 data inputs (x1, x2, x3, x4) and computes m = 2 output bits y1 and y2 using six intermediate product terms. Thus the AND block consists of k = 6 + 1 = 7 columns. In normal mode one uses z1 = z2 = 0 and s1 = s2 = ... = sk = 1. The circuit is easy to test because each transistor can be selected separately, and because the additional column and row are used to calculate parity bits v1 and v2 so that several transistors can be tested simultaneously.
This way a complete test for single faults needs only the 3k + 2n test patterns given in table 15.17. Because of the computed parity bits the test patterns do not depend on the Boolean functions implemented. Thus the given test is a universal one and can be applied to any PLA of the type described.
Signature Analysis
Signature analysis is a method developed for self-testing circuits. One collects, for example, intermediate data at certain lines of the circuit to compute a signature. For external evaluation the signature may be written to primary outputs, or its correctness can be checked immediately on the chip.
A simple computation of signatures can be achieved using linear feedback shift registers (LFSR). Figure 15.47 presents an example of an LFSR of length l = 4 bits. It is used to convert a long sequence of observed signals into a comparatively short signature.
Test sequence:
1. reset signature analysis;
2. input of a long sequence of test patterns;
3. output of a computed short signature;
4. compare signature to the desired value.
In table 15.18 this is demonstrated by a small example of converting an input sequence of length eight bits into a signature of length four bits.
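The four steps of the test sequence can be sketched as follows. The feedback connections of the LFSR in figure 15.47 are not reproduced here, so the polynomial x^4 + x + 1 used below is an assumption, and the bit sequences are made-up examples rather than the data of table 15.18. The register is modelled as a polynomial division over GF(2), whose remainder is the signature.

```python
POLY = 0b10011   # assumed feedback polynomial x^4 + x + 1 (not from figure 15.47)
LENGTH = 4       # register length l


def signature(bits, poly=POLY, length=LENGTH):
    """Compress a serial bit sequence into an l-bit signature."""
    state = 0                          # step 1: reset signature analysis
    for bit in bits:                   # step 2: shift in the long sequence
        state = (state << 1) | bit
        if state >> length:            # a 1 leaves the register: apply feedback
            state ^= poly
    return state                       # step 3: the short signature


expected = signature([1, 0, 1, 1, 0, 0, 1, 0])    # value from a fault-free run
observed = signature([1, 0, 1, 1, 0, 1, 1, 0])    # one observed bit is wrong
print(format(expected, "04b"), format(observed, "04b"))
print("defective" if observed != expected else "pass")   # step 4: compare
```

In a real self-test the expected value would come from circuit simulation, as described in the next paragraph.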
The correct signature for a given sequence of test patterns can be determined by a circuit simulation and may be coded in hardware within the circuit in such a way that the chip itself can finally perform a signature check. If a wrong signature is observed then obviously the chip is defective. If the signature is correct, either the chip is correct or several incorrect bits have compensated each other and produced the correct signature. To estimate the probability of such a hidden fault we assume, for the sake of simplicity, that after the first faulty bit we obtain a random sequence of states such that all possible signatures are generated with equal probability. Then the probability of obtaining a correct signature of length l, despite faulty input bits, becomes 1/2^l. Thus for a sufficiently long signature the probability of an undetected fault becomes extremely small.
However, for a complete self-test, instead of applying external test patterns the patterns should be created on the chip. Since this is very difficult for complicated sequences of test patterns one often uses a pseudo-random sequence of test patterns which can be generated by a linear feedback shift register. In section 15.6.3 this technique will be considered in more detail.
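For a small register the estimate can be checked by brute force. The sketch below again assumes the polynomial x^4 + x + 1, which is not taken from figure 15.47, and counts how many non-zero error patterns of length eight leave the signature unchanged. Because the register is linear, an error pattern is masked exactly when its own signature is zero.

```python
from itertools import product


def signature(bits, poly=0b10011, length=4):
    """Serial signature register modelled as polynomial division over GF(2)."""
    state = 0
    for bit in bits:
        state = (state << 1) | bit
        if state >> length:
            state ^= poly
    return state


def count_masked_errors(num_bits, poly=0b10011, length=4):
    """Count non-zero error patterns that do not change the signature."""
    return sum(
        1
        for error in product((0, 1), repeat=num_bits)
        if any(error) and signature(error, poly, length) == 0
    )


masked = count_masked_errors(8)
total = 2**8 - 1
print(masked, "of", total, "error patterns are masked")
print("fraction:", masked / total, "estimate 1/2^l:", 1 / 2**4)
```

For this example 15 of the 255 possible error patterns are masked, a fraction of about 0.059, which is close to the estimate 1/2^4 = 0.0625.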
In addition to the signature analysis described with sequential input there are other versions with parallel input (MISR: Multiple Input Signature Register). Furthermore, there are versions like BILBO (Built In Logic Block Observation) with additional multiplexers at shift register cells such that signature analysis can operate in different modes. This way, for example, a serial output of the computed signature can also be obtained.
On-Chip Generation of Test Patterns
To generate test patterns locally for a self-test on the chip one uses linear feedback shift registers as applied for signature analysis. If, for example, the input of the LFSR shown in figure 15.47 is fixed to logical 1 and the register is initialized to ‘0000’ then the LFSR generates the state sequence given in table 15.19. Of course a sequence generated in the manner described has to become periodic. For the example the period is of length 12.
Such arbitrary-looking sequences of numbers are called pseudo-random sequences. In the case of a self-test they are used to replace random test patterns. The quality of a pseudo-random sequence depends on the length of its period. For a feedback shift register of length l the maximal attainable period length is 2^l − 1. It can be obtained by selecting special feedbacks, such that the operation of the LFSR is equivalent to a polynomial division by some primitive polynomial. For example, in [15.44] minimal primitive polynomials are listed such that only a few feedback loops with XOR gates are needed.
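The influence of the feedback polynomial on the period can be tried out with a small model. This is a sketch of an autonomous LFSR (no external input) in which each clock multiplies the state by x modulo the feedback polynomial; the two polynomials are illustrative choices and are not taken from [15.44].

```python
def lfsr_states(poly, length, start=1):
    """Return the state sequence of an autonomous LFSR until it repeats.

    Assumes the polynomial has constant term 1, so that every non-zero
    start state lies on a cycle and the loop terminates.
    """
    states = [start]
    state = start
    while True:
        state <<= 1
        if state >> length:       # overflow bit: apply the feedback
            state ^= poly
        if state == start:
            return states
        states.append(state)


# x^4 + x + 1 is primitive: all 2^4 - 1 non-zero states are visited.
print(len(lfsr_states(0b10011, 4)))
# x^4 + x^3 + x^2 + x + 1 cannot be factored, but it is not primitive,
# so the period is much shorter.
print(len(lfsr_states(0b11111, 4)))
```

The first register runs through 15 states, the second through only 5, which shows that a polynomial without factors is not enough: the feedback has to correspond to a primitive polynomial.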
In section 15.4.4 it was mentioned that with random test patterns only ‘easily detectable’ faults can be observed. Therefore the fault coverage of self-testing circuits can be improved by combining deterministic sequences and random sequences. For this purpose one uses a test pattern generator to derive additional deterministic test patterns for those faults that are not detected by a random pattern sequence. Such deterministic patterns may then be stored in a ROM on the chip to be used as additional patterns for a self-test. In [15.8] a modified method is presented that uses an OR matrix for re-coding the state of a shift register in such a way that the generated pseudo-random sequence also includes deterministic patterns. Figure 15.48 gives a sketch of the hardware used. At first a logic 1 is shifted through the register without using feedback loops, thus creating test patterns which correspond to single columns of the OR matrix. Afterwards feedback is used to create pseudo-random patterns using the OR operation on columns.
Other approaches try to create all desired deterministic test patterns using the LFSR method. In this case the fault coverage obtained is defined by a deterministic test. For that purpose one can re-initialize the shift register, such that several sequences with different initial values are created in succession. Alternatively, the type of feedback can be dynamically modified to obtain sequences with special properties [15.16]. Such implementations are sketched in figures 15.49 and 15.50. The ROM data needed can be derived from a deterministic test by solving a linear system of equations.
To test a circuit for delay faults or stuck-open faults one needs pairs of test patterns. In this case the arrangement of test patterns within a sequence is also important. Indeed, the LFSR method can also be used to create test pattern sequences that include predefined pairs of test patterns [15.45], [15.9].
An alternative to generating pseudo-random sequences of test patterns by LFSR is to use a field of cellular automata [15.20], [15.14]. In such a field, communication between adjacent automata ensures that input patterns are modified in a complicated way. For a suitable initialization and definition of interaction between adjacent cells one also obtains test pattern sequences containing predefined patterns or even predefined pairs of test patterns.
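How an initial value can be obtained from a linear system may be illustrated by a small sketch. Every output bit of an LFSR is an XOR of some seed bits, so each specified bit of a deterministic pattern gives one linear equation over GF(2). The register structure, the tap positions and the pattern below are hypothetical and do not correspond to figures 15.49 or 15.50.

```python
def lfsr_stream(state, taps, steps):
    """Shift an LFSR `steps` times and return the serial output bits.

    Works for concrete bits (0/1) and for symbolic bits alike: a symbolic
    bit is stored as an integer mask of the seed bits that it XORs.
    """
    state = list(state)
    out = []
    for _ in range(steps):
        out.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]
    return out


def solve_seed(care_bits, taps, length, steps):
    """Return a seed whose output matches care_bits {position: bit}, or None."""
    symbolic = lfsr_stream([1 << i for i in range(length)], taps, steps)
    pivots = {}                                    # pivot variable -> (mask, rhs)
    for pos, value in sorted(care_bits.items()):
        mask, rhs = symbolic[pos], value
        for bit in sorted(pivots, reverse=True):   # Gaussian elimination
            if (mask >> bit) & 1:
                mask ^= pivots[bit][0]
                rhs ^= pivots[bit][1]
        if mask:
            pivots[mask.bit_length() - 1] = (mask, rhs)
        elif rhs:
            return None                            # equations are inconsistent
    seed = 0                                       # free variables are set to 0
    for bit in sorted(pivots):                     # back substitution
        mask, rhs = pivots[bit]
        parity = bin(mask & seed).count("1") & 1
        seed |= (rhs ^ parity) << bit
    return seed


TAPS, LENGTH, STEPS = (0, 3), 4, 8                 # hypothetical 4-bit register
CARE = {1: 1, 4: 0, 5: 1, 7: 0}                    # specified bits of the pattern
seed = solve_seed(CARE, TAPS, LENGTH, STEPS)
start = [(seed >> i) & 1 for i in range(LENGTH)]
print(start, lfsr_stream(start, TAPS, STEPS))
```

Unspecified positions of the pattern impose no equation, which is why test patterns with many don't-care bits can often be encoded in a seed that is much shorter than the pattern itself.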
All the methods mentioned gain significance when combined with the Boundary Scan method considered in section 15.7. Chips equipped with that test interface already have additional hardware at primary inputs which can also be used for test pattern generation. In this way only a little additional hardware is needed to achieve on-chip generation of test patterns.
Application of Codes
Special codes are used, for example, to transmit data such that a receiver is able to detect transmission errors. A simple example is the introduction of a parity bit. With other codes it is even possible not only to detect errors, but also to perform an automatic error correction. As an example we consider code C = {(000), (111)} which only consists of two code words selected from eight possible binary words of length 3 bits. Figure 15.51 gives a graphical representation of this code. It is easy to see that any path between both code words needs three edges. Thus the code is of Hamming distance 3. If, owing to an error, one or two bits of a word are faulty this can be detected because the resulting word is not a code word. Assuming that only one bit is faulty the error can be corrected by replacing the wrong word by the adjacent code word.
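The properties of the code C can be checked with a few lines of code. This is only an illustrative sketch; the function names are chosen freely.

```python
from itertools import combinations

CODE = [(0, 0, 0), (1, 1, 1)]


def hamming(a, b):
    """Number of positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(code):
    """Smallest Hamming distance between two different code words."""
    return min(hamming(a, b) for a, b in combinations(code, 2))


def has_error(word):
    """Error detection: any word outside the code is faulty."""
    return word not in CODE


def correct(word):
    """Error correction: replace the word by the nearest code word."""
    return min(CODE, key=lambda c: hamming(c, word))


print(min_distance(CODE))                              # distance of the code
print(has_error((0, 1, 0)), correct((0, 1, 0)))        # single error in (000)
print(has_error((1, 1, 0)), correct((1, 1, 0)))        # double error in (000)
```

The last line shows the limit of correction: a double error in (000) is still detected, but correcting it yields (111), so detection of two faulty bits and correction of one faulty bit cannot both be used at the same time.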
The redundancy of a code can also be used to detect errors or even to correct errors for signals on a chip. For the following application we assume that a circuit S is implemented in such a way that intermediate data words or results have to be code words. Then a self-test can be performed as shown in figure 15.52 by using an additional test circuit to check whether indeed all data words belong to the code.
For this approach the problem arises that a fault may also occur within the test circuit, in such a way that the self-test will not work correctly. Thus the test circuit should be implemented to be totally self-checking in order also to detect its own faults.
As an example the circuit S may use an ‘m of n code’. Such a code is defined by the set of all binary words of length n with exactly m bits set to 1, so it contains n!/(m!(n − m)!) code words. As an example the 3 of 6 code consisting of 20 code words is given in table 15.20. With such a code one can detect single errors (one bit of the code word is wrong) and all multiple errors in which all faulty bits of a code word have the same logic value. A fault causing erroneous data words of this type is called a unidirectional fault. For example, it may be due to either stuck-at-0 faults or stuck-at-1 faults at several lines.
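The definition of an m of n code and its behaviour under unidirectional faults can be checked by enumeration. The helper names below are hypothetical; the sketch only relies on the fact that a word belongs to the code exactly when it contains m ones.

```python
from itertools import combinations
from math import comb


def m_of_n_code(m, n):
    """All binary words of length n with exactly m bits set to 1."""
    return [
        tuple(1 if i in ones else 0 for i in range(n))
        for ones in combinations(range(n), m)
    ]


def is_code_word(word, m):
    return sum(word) == m


def force_bits(word, positions, value):
    """Model stuck-at faults: the given lines all take the same value."""
    return tuple(value if i in positions else bit for i, bit in enumerate(word))


code = m_of_n_code(3, 6)
print(len(code), comb(6, 3))          # number of code words of the 3 of 6 code

# A unidirectional fault changes several bits in the same direction, so the
# number of ones can only decrease (or only increase) and the result is
# never a code word.
word = (1, 0, 1, 1, 0, 0)
faulty = force_bits(word, {0, 2}, 0)
print(faulty, is_code_word(faulty, 3))
```

An error that flips one bit from 1 to 0 and another from 0 to 1 keeps the number of ones unchanged and is therefore not detected; this is why the detection guarantee is restricted to single and unidirectional errors.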
In the following, A denotes the code used at the inputs of the test circuit and B the code used at its outputs. As an example A may be the 3 of 6 code and for B we use the 1 of 2 code (B = {(0, 1), (1, 0)}). Then the test circuit computes a function g accepting six input bits and producing two output bits y1 and y2. If the input x contains exactly three ones then we obtain an output with y1 ≠ y2. Otherwise, the output satisfies y1 = y2. The circuit implementation of g shall be done in such a way that the following definition is satisfied.
A test circuit is called self-checking for a fault set F if for each fault f ∈ F there is at least one input x ∈ A such that the circuit produces an output y ∉ B.
Because of this definition every fault within a test circuit can be detected using only the available inputs x ∈ A. This is an important property, because
code words (0, 1) and (1, 0) are never interchanged because of a fault.
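The input/output behaviour required of g can be sketched with a behavioural model. The construction below is one possible realization built from threshold functions on the two halves of the input word; it is an assumption for illustration and not necessarily the circuit of the cited references. It models only the function g and says nothing about which gate-level faults the hardware detects.

```python
from itertools import product


def at_least(bits, i):
    """Threshold function: true if at least i of the bits are 1."""
    return sum(bits) >= i


def checker(x):
    """Map a word of length 2m to (y1, y2); y1 != y2 iff x has m ones."""
    m = len(x) // 2
    a, b = x[:m], x[m:]
    # Term i is true if half a holds at least i ones and half b at least
    # m - i ones. Odd-numbered terms feed y1, even-numbered terms feed y2.
    y1 = any(at_least(a, i) and at_least(b, m - i) for i in range(1, m + 1, 2))
    y2 = any(at_least(a, i) and at_least(b, m - i) for i in range(0, m + 1, 2))
    return int(y1), int(y2)


print(checker((1, 0, 1, 1, 0, 0)))    # three ones: a code word of A
print(checker((1, 0, 0, 1, 0, 0)))    # too few ones
print(checker((1, 1, 1, 1, 0, 0)))    # too many ones

# Exhaustive check of the specification of g for all 64 input words.
assert all((checker(x)[0] != checker(x)[1]) == (sum(x) == 3)
           for x in product((0, 1), repeat=6))
```

For a code word exactly one term is true, so the output is (0, 1) or (1, 0). With too few ones no term is true and the output is (0, 0); with too many ones at least two neighbouring terms are true and the output is (1, 1).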
Test circuits for an m of n code with n ≠ 2m can be designed by re-coding the inputs to reduce the problem to the case of an m of 2m code considered above [15.38], [15.26], [15.33].
Using other codes, arbitrary combinatorial circuits can be made totally self-checking. For example, in [15.4] a circuit modification is presented such that any fault within a circuit can be observed at an odd number of output bits. In this case faults are detected by parity checking at primary outputs.
Another variant of self-checking circuits uses fault indicators that register whether a given test property has been violated at any time. In this way it is also possible to detect faults that occur only for short periods and do not produce wrong results. For example, this technique can be used for path delay faults. In [15.35] such a fault indicator is presented for the 1 of 3 code.
Strongly fault-secure circuits [15.41], [15.33] generalize totally self-checking circuits and are defined to also handle multiple faults.
A further improvement leads to fault-tolerant circuits. Such circuits will produce correct results even if there are ‘small’ faults within the circuit. To derive such circuits one can use a code with a large Hamming distance such that single erroneous bits can be automatically corrected. For example, fault-tolerant circuits are of interest for safety-relevant applications where a fault must not lead to a total failure of a complete system. Also, if the hardware overhead for fault tolerance is not too large this technique can be used to improve the yield of chip production. This is because, in spite of a fault, such chips may still be used as correct chips without fault tolerance.
Principle of Multiple Computation
A very simple method of using redundancy to detect faults is to perform the desired function several times using separate hardware and to compare the results obtained. This is a special case of a code such that each bit of the correct result is represented several times. However, this simple approach causes a large overhead of hardware. Sometimes there are more skilful implementations of multiple computations.
For example, when using RNS arithmetic (RNS: Residue Number System) for fast addition and multiplication of large numbers all numbers are represented as vectors with k relatively small components. Then the actual computation can be performed componentwise using k relatively small arithmetic units, and afterwards the final result is assembled from k independent intermediate results y1, ..., yk. In order to detect faults within arithmetic units one can extend the number representation by an additional redundant component. Then using k+1 instead of k arithmetic units one obtains a redundant representation y1, ..., yk, yk+1 of the intermediate result.
The final result y can be determined from any k components of the intermediate result. Thus, using the same hardware but different data, the result can be determined twice, as y and as ŷ ((y1, ..., yk) → y and (y2, ..., yk+1) → ŷ). Whenever one of the components yi is erroneous, owing to a fault in an arithmetic unit, both results will differ and the fault is detected.
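The double decoding can be tried out with small numbers. The sketch below makes several assumptions that are not in the text: the moduli (3, 5, 7) with redundant modulus 11 are chosen only for illustration, the redundant modulus is the largest one, results are required to be smaller than 3·5·7 = 105, and the final result is assembled with the Chinese remainder theorem. The three-argument `pow` with exponent -1 needs Python 3.8 or later.

```python
from math import prod

MODULI = (3, 5, 7, 11)    # assumed: k = 3 moduli plus one redundant modulus
K = 3


def to_rns(value):
    return [value % m for m in MODULI]


def crt(residues, moduli):
    """Assemble the number below prod(moduli) that has the given residues."""
    total = prod(moduli)
    result = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        result += r * partial * pow(partial, -1, m)
    return result % total


def rns_multiply(a, b, fault=None):
    """Multiply componentwise; fault = (unit, wrong value) models a bad unit."""
    y = [(ra * rb) % m for ra, rb, m in zip(to_rns(a), to_rns(b), MODULI)]
    if fault is not None:
        y[fault[0]] = fault[1]
    return y


def decode_and_check(y):
    """Decode twice, from (y1..yk) and from (y2..yk+1), and compare."""
    first = crt(y[:K], MODULI[:K])
    second = crt(y[1:], MODULI[1:])
    return first, second, first == second


print(decode_and_check(rns_multiply(9, 11)))                  # fault-free
print(decode_and_check(rns_multiply(9, 11, fault=(1, 0))))    # unit 2 is faulty
```

In the fault-free case both decodings give 99. With the faulty second unit they give 15 and 330, so the mismatch reveals the fault even though the faulty component enters both decodings.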