Troubleshooting PC hardware problems can seem daunting to the uninitiated, but in reality it is much simpler than it seems. Most problems can be diagnosed and corrected using few, if any, special tools, and the work can be done by anybody who can apply simple deductive reasoning and logical thinking. PCs have become more complicated and yet simpler at the same time. More and more complex internal circuits mean that there are potentially more things that can go wrong and more ways the system can fail. On the other hand, today's complex circuits are embedded into fewer boards, with fewer chips on each board and more serial interconnections using fewer pins (fewer wires). The internal consolidation means that isolating which replaceable component has failed is in many ways simpler than ever before. An understanding of the basics of how PCs work, combined with some very simple tools, some basic troubleshooting tips, and logical thinking and common sense, will enable you to effectively diagnose and repair your own systems, saving a tremendous amount of money over taking them to a shop. In some cases, you can save enough money to practically pay for an entire new system. The bottom line with troubleshooting PC problems is that a solution exists for every problem, and through simple practices combined with deductive reasoning, that solution can easily be found.
Modern PCs: More Complicated and More Reliable
Consider this: The modern PC is an incredible collection of hardware and software. Focusing specifically on the hardware, modern processors contain between 50 million and more than 400 million transistors. In addition, nearly 8.6 billion transistors are in 1GB of RAM; hundreds of millions of transistors exist in the motherboard chipset, video processor, and video RAM; and millions more are in the other adapter cards or logic boards in the system. Each of these billions of interconnected transistors must not only function properly, but also operate in an orderly fashion within strictly enforced timing windows, some of which are measured in picoseconds (trillionths of a second). When you realize that your PC will lock up or crash if any one of these transistors fails to operate properly and on time (and/or any one of the billions of circuit paths and interconnections between the transistors or devices containing them fails in any way), it is a wonder that PCs work at all!
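The RAM figure can be checked with simple arithmetic. The sketch below assumes, beyond what the text states, that each stored bit uses one transistor (as in a typical DRAM cell) and that 1GB means 2**30 bytes; the function name is made up for this illustration.

```python
# Rough count of storage transistors in DRAM.
# Assumptions (not stated in the text): one transistor per stored bit,
# and 1GB taken as 2**30 bytes.
BITS_PER_BYTE = 8
BYTES_PER_GB = 2**30


def dram_cell_transistors(gigabytes: int) -> int:
    """Return the number of one-transistor cells needed for the given capacity."""
    return gigabytes * BYTES_PER_GB * BITS_PER_BYTE


count = dram_cell_transistors(1)
print(f"{count:,} transistors (about {count / 1e9:.2f} billion)")
# prints: 8,589,934,592 transistors (about 8.59 billion)
```

Support circuitry on the memory chips would add to this count, so the result is best read as a lower bound that matches the "nearly 8.6 billion" quoted above.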
Every time I turn on one of my systems and watch it boot up, I think about the billions upon billions of components and trillions upon trillions of machine/program steps and sequences that have to function properly to get there. As you can now see, many opportunities exist for problems to arise.
Although modern PCs are exponentially more complicated than their predecessors, from another point of view they have become simpler and more reliable. When you consider the complexity of the modern PC, it is not surprising that problems occasionally do arise. However, modern design and manufacturing techniques have made PCs more reliable and easier to service despite their ever-increasing internal complexity. Today's systems have fewer and fewer replaceable components and individual parts, which is a bit of a paradox. The truth is that, as PCs have become more complex, they have also become simpler and easier to service in many ways.
Industry-Standard Replaceable Components
The use of industry-standard components is one of the key features of a PC. This means that virtually all the parts that make up a system are interchangeable with other systems in some manner. This also means that the parts are plentiful, inexpensive, and generally very easy to install. A typical PC contains the following replaceable components, most of which are made to industry standards for design and form factor:
- Motherboard
- Processor
- CPU heatsink/fan
- RAM
- CMOS battery
- Chassis with optional fan
- Power supply
- Video card[*]
- Monitor
- Sound card[*]
- Speakers
- Network card[*]
- Hard drive
- CD-ROM/RW drive
- DVD-ROM/+RW drive
- Floppy drive
- Drive cables
- Keyboard
- Mouse
[*] May be integrated into the motherboard in some systems.
Although some of the more well-optioned systems might have even more components than listed here, you can see that most PCs have fewer than 20 replaceable "parts." Some can have as few as 10 to 15, depending on how many options are present and how they are integrated. From a hardware troubleshooting or repair perspective, one of these components is either improperly installed (configured) or defective. If it's improperly installed or configured, the component can be repaired by merely reinstalling it or configuring it properly. If it's truly defective, the component must simply be replaced. When a PC is broken down to the basic replaceable parts, you can see that it really isn't that complicated, which is why I've spent my career helping people to easily perform their own repairs or upgrades and even build entire systems from scratch.
Reinstall or Replace?
When dealing with hardware problems, the first simple truth to understand is that you do not usually repair anything; you reinstall or replace it instead. You reinstall because the majority of PC hardware problems are caused by a particular component being improperly installed or configured. I remember hearing from IBM many years ago that it had found that 60% or more of the problems handled by its service technicians were due to improper installation or configuration, meaning the hardware was not actually defective. This was, in fact, the major impetus behind the plug-and-play revolution, which has eliminated the need to manually configure jumpers and switches on most hardware devices. This has minimized the expertise necessary to install hardware properly and has also minimized installation, configuration, and resource conflict problems. Still, plug and play has sometimes been called plug and pray because it does not always work perfectly, sometimes requiring manual intervention to make it work properly.
You replace because of the economics of the situation with computer hardware. The bottom line is that it is much cheaper to replace a failed circuit board with a new one than to repair it. For example, you can purchase a new, state-of-the-art motherboard for around $100, but repairing an existing board normally costs much more than that. Modern boards use surface-mounted chips that have pin spacings measured in hundredths of an inch, requiring sophisticated and expensive equipment to attach and solder the chip. Even if you could figure out which chip had failed and had the equipment to replace it, the chips themselves are usually sold in quantities of thousands and obsolete chips are usually not available. The net effect of all of this is that the replaceable components in your PC have become disposable technology. Even a component as large and comprehensive as the motherboard is replaced rather than repaired.
Troubleshooting by Replacing Parts
You can troubleshoot a PC in several ways, but in the end it often comes down to simply reinstalling or replacing parts. That is why I normally use a simple "known-good spare" technique that requires very little in the way of special tools or sophisticated diagnostics. In its simplest form, say you have two identical PCs sitting side by side. One of them has a hardware problem; in this example let's say the memory module (DIMM) is defective. Depending on how and where the defect lies, this could manifest itself in symptoms ranging from a completely dead system to one that boots up normally but crashes when running Windows or software applications. You observe that the system on the left has the problem but the system on the right works perfectly; they are otherwise identical. The simplest technique for finding the problem is to swap parts from one system to the other, one at a time, retesting after each swap. When the DIMMs are swapped and the systems are powered up and tested (in this case testing is nothing more than allowing the system to boot up and run some of the installed applications), the problem moves from one system to the other. Knowing that the last item swapped over was the DIMM, you have just identified the source of the problem! This did not require an expensive ($2,000 or more) DIMM test machine or any diagnostics software. Because components such as DIMMs are not economical to repair, replacing the defective DIMM would be the final solution.
Although this is very simplistic, it is often the quickest and easiest way to identify a problem component as opposed to specifically testing each item with diagnostics. Instead of having an identical system standing by to borrow parts from, most technicians have an inventory of what they call "known-good spare" parts. These are parts that have been previously used, are known to be functional, and can be used to replace a suspicious part in a problem machine. However, this is different from new replacement parts because, when you open a box containing a new component, you really can't be 100% sure that it works. I've been in situations in which I've had a defective component and replaced it with another (unknown to me) defective new component and the problem remained. Not knowing that the new part I just installed was also defective, I wasted a lot of time checking other parts that were not the problem. This technique is also effective because so few parts are needed to make up a PC and the known-good parts don't always have to be the same (for example, a lower-end video card can be substituted in a system to verify that the original card had failed).
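The swap-and-retest procedure can be sketched as a small simulation. Everything here is hypothetical: the part labels, the `find_faulty_part` helper, and the `system_works` test, which in practice would be you powering up the machine and running some applications.

```python
from typing import Callable, Dict, Optional

Config = Dict[str, str]  # slot name -> label of the part installed there


def find_faulty_part(
    suspect: Config,
    spares: Config,
    system_works: Callable[[Config], bool],
) -> Optional[str]:
    """Swap in one known-good spare at a time, retesting after each swap.

    Returns the slot whose replacement cured the fault, or None if no single
    swap helped (for example, when two parts are bad or the problem is a
    configuration error rather than a defective part).
    """
    if system_works(suspect):
        return None  # the system already works; nothing to isolate
    for slot, spare in spares.items():
        if slot not in suspect:
            continue
        trial = dict(suspect)  # change only one component at a time
        trial[slot] = spare
        if system_works(trial):
            return slot
    return None


# Illustrative data only: the DIMM in the problem machine is the bad part.
suspect = {"motherboard": "mb-1", "ram": "dimm-1",
           "video card": "vga-1", "power supply": "psu-1"}
spares = {"power supply": "psu-spare", "video card": "vga-spare",
          "ram": "dimm-spare"}
defective = {"dimm-1"}


def system_works(config: Config) -> bool:
    """Stand-in for booting the PC and running a few applications."""
    return not any(part in defective for part in config.values())


print(find_faulty_part(suspect, spares, system_works))  # prints: ram
```

The sketch restores the original part before trying the next spare, which keeps each test down to a single changed variable.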
Troubleshooting by the Bootstrap Approach
Another variation on this theme is the "bootstrap approach," which is especially good for what seems to be a dead system. In this approach, you take the system apart to strip it down to the bare minimum of necessary, functional components and test it to see whether it works. For example, you might strip down a system to the chassis/power supply, bare motherboard, CPU (with heatsink), one bank of RAM, and a video card with display and then power it up to see whether it works. In that stripped configuration, you should see the POST or splash (logo) screen on the display, verifying that the motherboard, CPU, RAM, video card, and display are functional. If a keyboard is connected, you should see the three LEDs (Caps Lock, Scroll Lock, and Num Lock) flash within a few seconds after powering on. This indicates that the CPU and motherboard are functioning because the POST routines are testing the keyboard. After you get the system to a minimum of components that are functional, you should reinstall or add one part at a time, testing the system each time you make a change to verify it still works and that the part you added or changed is not the cause of a problem. Essentially, you are rebuilding the system from scratch using the existing parts, but doing it one step at a time.
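As a rough illustration, the bootstrap approach amounts to the loop below. The `bootstrap` function and the `system_works` callback are invented for this sketch; the callback stands in for powering up the machine and watching for the POST or splash screen.

```python
from typing import Callable, List, Optional, Sequence

# Minimum configuration taken from the example in the text.
MINIMUM = ["chassis/power supply", "motherboard", "CPU with heatsink",
           "one bank of RAM", "video card", "display"]


def bootstrap(
    minimum: Sequence[str],
    remaining: Sequence[str],
    system_works: Callable[[List[str]], bool],
) -> Optional[str]:
    """Rebuild from a stripped-down system, testing after each added part.

    Returns the first part whose installation makes the system fail, or None
    if the fully rebuilt system works (as often happens when reseating the
    connections was all that was needed).
    """
    installed = list(minimum)
    if not system_works(installed):
        raise ValueError("the fault lies within the minimum configuration")
    for part in remaining:
        installed.append(part)          # add exactly one part...
        if not system_works(installed):  # ...and test after each change
            return part
    return None


# Illustrative run: assume the sound card is what brings the system down.
culprit = "sound card"
result = bootstrap(
    MINIMUM,
    ["hard drive", "sound card", "network card"],
    lambda installed: culprit not in installed,
)
print(result)  # prints: sound card
```

If the stripped-down system itself fails, the known-good spare technique can then be applied to the handful of parts that remain.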
Many times problems are caused by corrosion on contacts or connectors, so the mere act of disassembling and reassembling a PC will "magically" repair it. Over the years, I've disassembled, tested, and reassembled many systems only to find no problems after the reassembly. How can merely taking it apart and reassembling repair a problem? Although it might seem that nothing was changed and everything is installed exactly like it was before, in reality simply unplugging and replugging renews all the slot and cable connections between devices, which is often all the system needs. Some useful troubleshooting tips include
- Eliminate unnecessary variables or components that are not pertinent to the problem.
- Reinstall, reconfigure, or replace only one component at a time.
- Test after each change you make.
- Keep a detailed record (write it down) of each step you take.
- Don't give up! Every problem has a solution.
- If you hit a roadblock, take a break or work on another problem. A fresh approach the next day often reveals things you overlooked.
- Don't overlook the simple or obvious. Double- and triple-check the installation and configuration of each component.
- Keep in mind that the power supply is one of the most failure-prone parts in a PC, as well as one of the most overlooked components. A high-output "known-good" spare power supply is highly recommended to use for testing suspect systems.
- Cables and connections are also a major cause of problems, so keep replacements of all types on hand.
Before starting any system troubleshooting, a few basic steps should be performed to ensure a consistent starting point and to enable isolating the failed component:
- Turn off the system and any peripheral devices. Disconnect all external peripherals from the system, except for the keyboard and video display.
- Make sure the system is plugged in to a properly grounded power outlet.
- Make sure the keyboard and video display are connected to the system. Turn on the video display, and turn up the brightness and contrast controls to at least two-thirds of the maximum. Some displays have onscreen controls that might not be intuitive. Consult the display documentation for more information on how to adjust these settings. If you can't get any video display but the system seems to be working, try moving the card to a different slot (not possible with AGP or PCI Express video adapters) or try a different video card or monitor.
- Turn on the system. Observe the power supply, chassis fans (if any), and lights on either the system front panel or power supply. If the fans don't spin and the lights don't light, the power supply or motherboard might be defective.
- Observe the power-on self test (POST). If no errors are detected, the system beeps once and boots up. Errors that display onscreen (nonfatal errors) and that do not lock up the system display a text message that varies according to BIOS type and version. Record any errors that occur and refer to the disc accompanying this book for a list of BIOS error codes for more information on any specific codes you see. Errors that lock up the system (fatal errors) are indicated by a series of audible beeps. Refer to the disc for a list of beep error codes.
- Confirm that the operating system loads successfully.
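The observations in the last three steps can be summarized as a simple decision sequence. This is only a sketch of the reasoning in the list above: the `triage` function and its parameters are hypothetical, and it treats more than one beep as a "series," which is a simplification because beep patterns vary by BIOS.

```python
from typing import Optional


def triage(
    fans_spin: bool,
    lights_on: bool,
    beep_count: int,
    onscreen_error: Optional[str],
    os_loaded: bool,
) -> str:
    """Map what you observe at power-on to the next thing to check."""
    if not fans_spin and not lights_on:
        return "Suspect the power supply or motherboard."
    if beep_count > 1:
        # A series of beeps signals a fatal error that locks up the system.
        return "Fatal POST error: look up the beep code for this BIOS."
    if onscreen_error:
        # Nonfatal errors show a text message that varies by BIOS.
        return f"Nonfatal POST error: record and look up '{onscreen_error}'."
    if not os_loaded:
        return "POST passed: find out why the operating system does not load."
    return "System starts normally."


print(triage(fans_spin=True, lights_on=True, beep_count=1,
             onscreen_error=None, os_loaded=True))
# prints: System starts normally.
```

The checks run in the same order as the steps, so the earliest failure observed decides what to look at first.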
Problems During the POST
Problems that occur during the POST are usually caused by incorrect hardware configuration or installation. Actual hardware failure is a far less-frequent cause. If you have a POST error, check the following:
- Are all cables correctly connected and secured?
- Are the configuration settings correct in Setup for the devices you have installed? In particular, ensure the processor, memory, and hard drive settings are correct.
- Are all drivers properly installed?
- Are switches and jumpers on the baseboard correct, if changed from the default settings?
- Are all resource settings on add-in boards and peripheral devices set so that no conflicts exist (for example, two add-in boards sharing the same interrupt)?
- Is the power supply set to the proper input voltage (110V to 120V or 220V to 240V)?
- Are adapter boards and disk drives installed correctly?
- Is a keyboard attached?
- Is a bootable hard disk (properly partitioned and formatted) installed?
- Does the BIOS support the drive you have installed, and if so, are the parameters entered correctly?
- Is a bootable floppy disk installed in drive A:?
- Are all memory SIMMs or DIMMs installed correctly? Try reseating them.
- Is the operating system properly installed?
Hardware Problems After Booting
If problems occur after the system has been running, and you have not made any hardware or software changes, a hardware fault might have occurred. Here is a list of items to check in that case:
- Try reinstalling the software that has crashed or refuses to run.
- Try clearing CMOS RAM and running Setup.
- Check for loose cables, a marginal power supply, or other random component failures.
- Try reseating the memory modules (SIMMs, DIMMs, or RIMMs).
Problems Running Software
Problems running application software (especially new software) are usually caused by or related to the software itself, or are due to the fact that the software is incompatible with the system. Here is a list of items to check in that case:
- Does the system meet the minimum hardware requirements for the software? Check the software documentation to be sure.
- Check to see that the software is correctly installed. Reinstall if necessary.
- Check to see that the latest drivers are installed.
- Scan the system for viruses using the latest antivirus software.
Problems with Adapter Cards
Problems with add-in boards are usually caused by improper board installation or resource (interrupt, DMA, or I/O address) conflicts. Chapter 4, "Motherboards and Buses," has a detailed discussion of these system resources, what they are, how to configure them, and how to troubleshoot them. Also be sure to check drivers for the latest versions and ensure that the card is compatible with your system and the operating system version you are using.
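To make the idea of a resource conflict concrete, the sketch below looks for any interrupt, DMA channel, or I/O address claimed by more than one card. The card names and resource values are invented for illustration and are not read from a real system; `find_conflicts` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Resources = Dict[str, int]  # resource kind ("irq", "dma", "io") -> value


def find_conflicts(cards: Dict[str, Resources]) -> Dict[Tuple[str, int], List[str]]:
    """Return every (kind, value) resource claimed by more than one card."""
    claims: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for name, resources in cards.items():
        for kind, value in resources.items():
            claims[(kind, value)].append(name)
    return {res: names for res, names in claims.items() if len(names) > 1}


# Illustrative settings only.
cards = {
    "sound card": {"irq": 5, "dma": 1, "io": 0x220},
    "network card": {"irq": 5, "io": 0x300},
    "modem": {"irq": 3, "io": 0x2F8},
}

print(find_conflicts(cards))
# prints: {('irq', 5): ['sound card', 'network card']}
```

Here the sound card and network card both claim IRQ 5, so one of them would need to be reconfigured or moved.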
Sometimes adapter cards can be picky about which slot they are running in. Despite the fact that, technically, a PCI or ISA adapter should be able to run in any of the slots, minor timing or signal variations sometimes occur from slot to slot. I have found on numerous occasions that simply moving a card from one slot to another can make a failing card begin to work properly. Sometimes moving a card works just by the inadvertent cleaning (wiping) of the contacts that takes place when removing and reinstalling the card, but in other cases I can duplicate the problem by inserting the card back into its original slot. When all else fails, try moving the cards around! Because some motherboards share a single IRQ between two PCI slots or between a PCI and an AGP slot, changing one of the PCI cards to another slot can resolve conflicts.
Caution
Note that PCI cards become slot specific after their drivers are installed. By this I mean that if you move the card to another slot, the plug-and-play resource manager sees it as if you have removed one card and installed a new one. You therefore must install the drivers all over again for that card. Don't move a PCI card to a different slot unless you are prepared with all the drivers at hand to perform the driver installation. ISA cards don't share this quirk because the system is not aware of which slot an ISA card is in.