SCSI Troubleshooting Tips

When you are responsible for the operation of a SCSI configuration, you sometimes have trouble finding the root of a (usually) intermittent fault. Like any other bus, the SCSI bus has several devices attached to it that need to work together. Any failure in one of the devices or in the cabling/harness can affect the operation of all devices on the bus. So, how do you locate faults quickly?

The answer is simple: Use some simple testing tools and, mostly, your brains. For the latter to be usable, you need a thorough understanding of the principles of the SCSI bus. A short tutorial on this site gives you some of the most basic facts. But if all of that is new to you, you had better read a book about the SCSI bus or attend a course. (We offer courses in SCSI basics and SCSI troubleshooting, see our training partner Onsite Computer.)

This article describes some easy but powerful ways to find most of the problems on a parallel SCSI bus. We do not consider the serial versions here. The procedures described here do not need any specialized equipment - a simple multimeter is all you need. But, of course, this short essay cannot be a substitute for thorough training.

The most common faults on the parallel SCSI bus

Believe it or not, the most common faults are in the cables, connectors and terminators. Not that they fail more often than the rest, but they are typically disregarded and/or mistreated. Let us start with the terminators.

The SCSI bus has a strict definition of where terminators are to be placed:

  1. There are always two and only two on the SCSI bus. There may never be more than two, nor only one. (Without any terminator the bus will cease to function entirely, but this fault is easy to detect.) What will the symptoms be when there are not exactly two terminators? In most cases the symptoms will be strange error messages in the log file of the type "SCSI phase sequence error" or "SCSI unexpected bus free", or even a lot of entries indicating that just one specific device produces intermittent errors. Don't blame the device! It was designed to work on a properly configured bus, not on one with more noise or a lower impedance than specified.

    You can easily check how many terminators there are on the bus, at least for the single-ended version of the bus (the one most commonly used to attach cheap peripherals):

    Take your multimeter, switch it to milliamps, and measure the short-circuit current from any data or control line (but not TermPWR!) to ground. On the 50-pin flat cable with the pin-header connectors, simply take a pair of opposite contacts at either end of the connector. These are either pins 1 and 2 or pins 49 and 50. Both pairs have a ground and a signal line. If you read about 24mA, there are exactly two terminators on the cable and you have to investigate further. If you read less than 18mA, there is only one terminator present, and if the value is larger than 30mA, three or more terminators are there. Go and look for them, now. Then proceed to the next step.

  2. The two terminators must be located at the physical ends of the cable. Not somewhere near them, but really at the ends. What did your test in the first step show? Is the number correct? If yes, are both where they are expected to be (at the ends)?

    Don't underestimate the importance of the correct location of the terminators! As in the first case, the symptoms can be very misleading. When the bus is operating at higher speeds, the cable becomes a transmission line with all its intricacies. And transmission lines must be properly terminated, period. Typical error messages look like the ones mentioned above, possibly blaming an innocent device that simply has the bad luck to sit at an inconvenient place on the cable.

    So, check that the two terminators are really at the ends of the cable. Do not allow any stub of cable to protrude beyond a terminator. OK, checked it, but the intermittent faults are still there? Try the next step.

  3. Do you use active or passive terminators? The standard urges you to use active terminators at the higher transfer speeds (>10MT/s) and not to mix active and passive termination. Although this usually presents no big problem, it sometimes does. So check it and use active termination whenever possible.

  4. How long are all cables combined, and how fast do you run the bus? (Assuming that you have neither a differential nor a SCSI-3 LVD bus.) There are four ranges: <1.5m, 1.5m to 3.0m, 3.0m to 6.0m and >6m. If your figure falls into the fourth range, you must find a way to reduce the cable length - there is no easy way around it, unless you spend money on converters. (Contact your dealer to get more info about the availability and pricing of these converters.)

    If your cables are less than 6m, but more than 1.5m in length, you have two options: either reduce the total length to less than 1.5m, or limit the speed of the bus to a maximum of 5 million transfers per second (5MT/s); up to 3.0m, the table below still allows 10MT/s. On an 8-bit bus 5MT/s means 5MB/s, on a 16-bit (WIDE) one 10MB/s. Probably not exactly what you would like, but unless you can shorten your cables, reliable operation is not guaranteed at higher speeds. In tabular form, the dependency between the allowable cable length and the maximum transfer speed looks like this:

    Cable length    < 1.5 m    1.5 m ... 3.0 m    3.0 m ... 6.0 m
    Max. speed      20 MT/s    10 MT/s            5 MT/s

    As with the misconfigurations in the previous steps, the error messages generated when the cables are too long are more confusing than helpful, because overlong cables cause false signal transitions at unexpected times.

    Be aware of one common pitfall: Your old SCSI controller is replaced by a modern one, and all of a sudden some devices (possibly including the brand-new controller) produce errors. This "upgrade" trap is very common. Remember that on the SCSI bus all the controllers check all the devices after a bus reset and will mutually agree on a transfer rate to use. Suppose you had an AHA12xx in your system and replaced it with an AHA29xx. Additionally, you bought a "Hawk" disk a couple of weeks ago. With the old controller everything worked fine, but with the new one the disk and/or the controller produce error messages. This is because now both the controller and the drive can speak faster, and they will do so unless you limit the maximum transfer rate that the controller negotiates.

  5. Are the cables and the terminators still in good shape? No sharp bends or the like? Especially when a connection is plugged and unplugged very often, the contacts degrade and will eventually be a reason for failures. Remember that only some special connectors are designed for more than a few hundred mating cycles. But these are usually not used here. So if you disconnect a device once per day, you will have worn out the connectors within a year!

    And the cables can break, too. Route them so that they do not experience any strain, and do not bend them with a radius of less than 5 times the diameter of the cable. For a cable with 10mm diameter this means a bend radius of at least 50mm.
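
The numeric rules from the checklist above can be collected in a small helper. The following Python sketch is only an illustration: all function names are made up for this example, treating every reading between 18mA and 30mA as "two terminators" is an assumption (the text only names "about 24mA"), and so is the handling of lengths that fall exactly on a range boundary (the slower speed is chosen).

```python
from typing import Optional


def classify_terminators(current_ma: float) -> str:
    """Map a short-circuit current reading (signal line to ground, in mA)
    to a terminator count, using the thresholds from step 1."""
    if current_ma < 0:
        raise ValueError("current must not be negative")
    if current_ma < 18.0:
        return "one"
    if current_ma > 30.0:
        return "three or more"
    # Assumption: the whole 18-30 mA band is reported as "two";
    # the article itself only names "about 24 mA".
    return "two"


def max_speed_mt_per_s(total_cable_m: float) -> Optional[int]:
    """Return the maximum transfer rate in MT/s for a single-ended bus of
    the given total cable length, or None if the cable is too long."""
    if total_cable_m <= 0:
        raise ValueError("cable length must be positive")
    if total_cable_m < 1.5:
        return 20
    # Assumption: a length exactly on a boundary gets the slower rate.
    if total_cable_m < 3.0:
        return 10
    if total_cable_m <= 6.0:
        return 5
    return None  # more than 6 m: shorten the cables or use converters


def throughput_mb_per_s(speed_mt_per_s: float, bus_width_bits: int) -> float:
    """One transfer moves bus_width_bits / 8 bytes."""
    return speed_mt_per_s * bus_width_bits / 8


def min_bend_radius_mm(cable_diameter_mm: float) -> float:
    """Step 5: bend radius of at least 5 times the cable diameter."""
    return 5 * cable_diameter_mm


if __name__ == "__main__":
    print(classify_terminators(24.0))   # two
    print(max_speed_mt_per_s(2.5))      # 10
    print(throughput_mb_per_s(5, 16))   # 10.0
    print(min_bend_radius_mm(10))       # 50
```

Such a helper does not replace the measurements; it merely keeps the thresholds in one place so that a reading or a cable length can be checked against them quickly.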

Your cables and terminators are all OK, but your system still shows problems? Now you are about to enter a swamp if you do not have access to some good equipment. Sure, some errors can be found by carefully evaluating the error log entries (if they contain more information than those of Windows NT). The rest are very hard to track down. But some tips are available.

More tricky faults

If your problems remain, you should consult a specialist who has access to special equipment. Trying to find an intermittent fault without the right tools is too time-consuming and frustrating.

© Paul Elektronik, 1998-2002