Do Computers Make Mistakes?

Posted by Josh Fewell on Dec 2nd 2020

At some point when learning about computer hardware, you've probably wondered how these microscopic parts operate so flawlessly. In reality, computer hardware makes errors routinely, but error-detecting and error-correcting technology lets us catch and repair faulty data, so the system as a whole has a high fault tolerance. We accomplish this through several techniques: redundancy, parity, and error-correcting codes (ECC). Fault tolerance is a system's ability to keep operating correctly, with its data intact, when some of its components fail.

A simple, yet expensive, way we raise the fault tolerance of our hardware is through redundancy. Redundancy is the inclusion of extra components that are not strictly necessary for functioning, in case other components fail. It is implemented in various pieces of computer hardware. We've covered examples of redundancy in data storage in the past, and a simple demonstration of it is RAID. RAID stands for redundant array of independent disks. In a RAID 1 configuration, two drives carry duplicate information to insure against the loss of a disk: if one drive fails, the other still holds a complete copy. More info on RAID configurations and redundancy can be found here: https://centralvalleycomputerparts.com/articles/what-is-raid-0-1-5-6-and-10/
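To make the RAID 1 idea concrete, here is a minimal sketch in Python. It is a toy model, not how a real RAID controller works: the class name MirroredPair and its methods are made up for illustration, and each "drive" is just a dictionary in memory.

```python
class MirroredPair:
    """Toy model of RAID 1 mirroring (hypothetical, for illustration only)."""

    def __init__(self):
        # Each pretend drive maps a block number to the bytes stored there.
        self.drives = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        # Mirroring: the same data goes to every drive that is still working.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def fail(self, index):
        # Simulate a drive dying and taking its contents with it.
        self.failed[index] = True
        self.drives[index].clear()

    def read(self, block):
        # Read from the first surviving drive that holds the block.
        for i, drive in enumerate(self.drives):
            if not self.failed[i] and block in drive:
                return drive[block]
        raise IOError("no surviving copy of block %d" % block)


pair = MirroredPair()
pair.write(0, b"hello")
pair.fail(0)          # lose the first drive
print(pair.read(0))   # the second drive still has the data
```

The cost is visible in the sketch: every block is stored twice, so half of the total capacity goes to the duplicate copy.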

Another way we raise fault tolerance is by the use of parity. Parity is the property of a number being odd or even. In error detection and correction, our data is protected by parity bits (a simple kind of checksum) placed throughout our storage and memory. Each parity bit double checks a group of surrounding bits so that errors can be caught in real time. The word ‘checksum’ can be used as a mnemonic device to remember what parity bits do: they check the sum of the surrounding bits, and if that sum is even or odd, the parity bit will be 0 or 1, respectively. A single parity bit can only reveal that a bit in its group has flipped, not which one; correcting the error takes several parity bits that cover overlapping groups. A helpful tool to understand error correction using parity can be found here: https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/testtest , and more information can be found on binary and bits here: https://centralvalleycomputerparts.com/articles/cpus-explained/.
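The parity rule described above fits in a few lines of Python. This is only a sketch of the idea (real hardware does this in circuitry), and the function names parity_bit and looks_intact are our own, chosen for illustration.

```python
def parity_bit(bits):
    """Return 0 if the bits contain an even number of 1s, or 1 if odd."""
    return sum(bits) % 2


def looks_intact(bits, stored_parity):
    """Recompute the parity and compare it with the stored parity bit."""
    return parity_bit(bits) == stored_parity


data = [1, 0, 1, 1, 0, 0, 1, 0]   # four 1s, so the parity bit is 0
stored = parity_bit(data)

one_flip = data.copy()
one_flip[2] ^= 1                  # a single bit flips in storage

two_flips = one_flip.copy()
two_flips[5] ^= 1                 # a second bit flips as well

print(stored)                           # 0
print(looks_intact(data, stored))       # True
print(looks_intact(one_flip, stored))   # False: the error is detected
print(looks_intact(two_flips, stored))  # True: two flips cancel out and slip through
```

The last line shows the limit of a lone parity bit: it catches an odd number of flipped bits, but an even number goes unnoticed.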

Parity is the building block of many error-correcting codes (ECC). ECC can be found in many parts of computer hardware, in several variations. One of the most widely used is Reed-Solomon error correction. Reed-Solomon codes have been instrumental not only in the development of computer storage devices, but also in smaller applications like barcodes, such as QR codes. This is why we are able to scan these codes even when part of the code is damaged.
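Reed-Solomon is too involved to show in a short example, so as a stand-in here is a sketch of a much simpler error-correcting code, Hamming(7,4). It is not the code used in barcodes, but it shows the same principle: by adding three parity bits to four data bits, each covering an overlapping group, we can locate and repair any single flipped bit. The function names are our own, chosen for illustration.

```python
def hamming_encode(d):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    # Positions 1..7 hold: p1 p2 d1 p4 d2 d3 d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming_decode(codeword):
    """Repair up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three checks together spell out the position (1..7) of the bad bit;
    # 0 means every check passed.
    bad_position = s1 + 2 * s2 + 4 * s4
    if bad_position:
        c[bad_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming_encode([1, 0, 1, 1])
damaged = word.copy()
damaged[4] ^= 1                    # flip one bit "in transit"
print(hamming_decode(damaged))     # [1, 0, 1, 1]
```

This code repairs only one bad bit per seven. Codes like Reed-Solomon build on the same idea of structured extra data, but are designed to survive whole runs of damaged symbols.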

Hopefully, this provides a basic understanding of how, even though our hardware is faulty, we are able to rely on highly fault-tolerant components.

Resources:

https://www.khanacademy.org/computing/computer-sci...

https://en.wikipedia.org/wiki/ECC_memory

https://en.wikipedia.org/wiki/Error_detection_and_...

https://www.researchgate.net/publication/328578644...

https://en.wikipedia.org/wiki/Fault_tolerance#Faul...

https://computersciencewiki.org/index.php/Redundan...

https://en.wikipedia.org/wiki/Redundancy_(engineer...

https://www.router-switch.com/faq/what-is-parity-i...

https://en.wikipedia.org/wiki/Error_correction_cod...

https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon...