Can we trust computers' calculations?

### Introduction

The **IEEE 754** specification is the standard used for floating-point calculations in engineering and scientific fields. In my humble opinion it is a perfect example of how the computer industry and software development are based on a **fundamental illusion of mankind:** the possibility of attaining and manipulating infinity (infinitely small, infinitely big, it doesn’t matter).

I don’t really want to discuss the value of this standard (which is used by virtually __all__ processors and programming languages, among others), but to explain the pain I feel when I look at results like these:

32 bit calculation:

1.93 - 0.93 = **0.99999994**

64 bit calculation:

1.93 - 0.93 = **0.9999999999999999**

We would expect a nice **1.0**, but the damned *calculator machine* seems to have gone nuts! Don’t they use computers to do weather forecasts, to calculate rocket paths and bank account interest? How come after all these years nobody has noticed anything?
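The two results above can be reproduced with a few lines of Java. This is only a minimal sketch: the class name `SubtractionDemo` and its two helper methods are made up for illustration.

```java
class SubtractionDemo {
    // 32 bit subtraction: operands and result are floats
    static float subtract32() {
        return 1.93f - 0.93f;
    }

    // 64 bit subtraction: operands and result are doubles
    static double subtract64() {
        return 1.93 - 0.93;
    }

    public static void main(String[] args) {
        System.out.println("32 bit: " + subtract32()); // 0.99999994
        System.out.println("64 bit: " + subtract64()); // 0.9999999999999999
        System.out.println("equal to 1.0? " + (subtract64() == 1.0)); // false
    }
}
```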

### The hard truth

To be precise, the IEEE 754 standard just defines how to obtain __the best approximation possible__ of decimal numbers in a fixed number of bits, and the examples above adhere perfectly to the specification. Strange as it seems, “0.9999999999999999” is the correct answer in a 64 bit calculation: neither 1.93 nor 0.93 can be stored exactly, and the difference between the two values that actually get stored is closer to “0.9999999999999999” than to “1.0”!
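To see where “0.9999999999999999” comes from, it helps to print the values that are actually stored for 1.93 and 0.93. The sketch below does this with `java.math.BigDecimal`, whose `BigDecimal(double)` constructor converts a binary value exactly. The class name `ExactDifference` and the helper `exact` are illustrative only.

```java
import java.math.BigDecimal;

class ExactDifference {
    // new BigDecimal(double) expands the stored binary value exactly, with no rounding
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        BigDecimal x = exact(1.93); // stored value is slightly below 1.93
        BigDecimal y = exact(0.93); // stored value is slightly above 0.93
        System.out.println("stored 1.93 = " + x);
        System.out.println("stored 0.93 = " + y);

        // Exact difference of the two stored values, computed without rounding
        BigDecimal exactDiff = x.subtract(y);
        System.out.println("exact diff  = " + exactDiff);

        // What the 64 bit subtraction returns, expanded exactly
        BigDecimal doubleDiff = exact(1.93 - 0.93);
        System.out.println("double diff = " + doubleDiff);
        System.out.println("same value? " + (exactDiff.compareTo(doubleDiff) == 0));
    }
}
```

For this particular pair of inputs the subtraction itself adds no further error: all of the discrepancy comes from storing 1.93 and 0.93 in binary.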

Decimal numbers (in Math terminology “real” numbers) cannot in fact always be represented by a __finite__ number of digits, because they can have __infinite__ digits. Here comes the approximation done by the *calculation machines*:

- with 64 bits we can represent at most 2^64 distinct values (18,446,744,073,709,551,616, a bit over eighteen billion billion)
- 1 bit is used for the *sign* (+/-)
- 11 bits are used for the *exponent*
- 52 bits are used for the *mantissa*
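The three fields listed above can be pulled out of any `double` with `Double.doubleToLongBits`. The following is a small sketch; the class `DoubleFields` and its method names are made up for illustration. Note that the exponent is stored with a bias of 1023, and that for normal numbers the mantissa has an implicit leading 1 that is not stored.

```java
class DoubleFields {
    // Top bit: 0 for positive, 1 for negative
    static long sign(double d) {
        return Double.doubleToLongBits(d) >>> 63;
    }

    // Next 11 bits: the exponent, stored with a bias of 1023
    static long exponent(double d) {
        return (Double.doubleToLongBits(d) >>> 52) & 0x7FFL;
    }

    // Lowest 52 bits: the mantissa (without the implicit leading 1)
    static long mantissa(double d) {
        return Double.doubleToLongBits(d) & ((1L << 52) - 1);
    }

    public static void main(String[] args) {
        double d = 1.93;
        System.out.println("sign     = " + sign(d));
        System.out.println("exponent = " + exponent(d) + " (unbiased: " + (exponent(d) - 1023) + ")");
        System.out.println("mantissa = 0x" + Long.toHexString(mantissa(d)));
    }
}
```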

### Examples

Let’s now look at a simple example using the Java programming language. To show the difference between a **real number** and a **32 bit floating-point number** we will use two different data types:

- **float**: a primitive Java type that represents a floating-point value coded according to the IEEE 754 specification for 32 bit floating-point numbers;
- **java.math.BigDecimal**: the BigDecimal class handles “arbitrary precision” math (with limits, of course); it can handle numbers with an “arbitrary” number of digits (well, no more than Integer.MAX_VALUE). For an introduction see “java.math.BigDecimal” on this site.
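As a quick illustration of the difference between the two types, here is the subtraction from the introduction done once with `float` and once with `BigDecimal` values built from strings, so that the decimal digits are taken literally. This is only a sketch; the class name `DecimalSubtraction` and its `subtract` helper are hypothetical.

```java
import java.math.BigDecimal;

class DecimalSubtraction {
    // Operands are parsed from strings, so "1.93" means exactly 1.93
    static BigDecimal subtract(String x, String y) {
        return new BigDecimal(x).subtract(new BigDecimal(y));
    }

    public static void main(String[] args) {
        System.out.println("float:      " + (1.93f - 0.93f));           // 0.99999994
        System.out.println("BigDecimal: " + subtract("1.93", "0.93")); // 1.00
    }
}
```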

The program does this:

- assigns the value “0.1” to the variable `float a`;
- creates a BigDecimal `b` based on the value of `a`;
- shows the values of `a` and `b`.

Because `b` has a far greater precision than `a`, it shows the weird approximation that has afflicted the value “0.1”, that is “0.100000001490116119384765625”.
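Weird as it looks, this value really is the best a 32 bit float can do. A hedged way to check it is to compare `0.1f` with its two neighbouring floats, obtained with `Math.nextDown` and `Math.nextUp`, and measure how far each one is from the decimal 0.1. The class `NearestFloat` and the helper `distanceFromTenth` are illustrative names, not part of the original program.

```java
import java.math.BigDecimal;

class NearestFloat {
    // Absolute distance between the exact value of f and the decimal 0.1
    static BigDecimal distanceFromTenth(float f) {
        return new BigDecimal(f).subtract(new BigDecimal("0.1")).abs();
    }

    public static void main(String[] args) {
        float a = 0.1f;
        float below = Math.nextDown(a); // closest float smaller than a
        float above = Math.nextUp(a);   // closest float larger than a
        System.out.println("below: off by " + distanceFromTenth(below).toPlainString());
        System.out.println("a    : off by " + distanceFromTenth(a).toPlainString());
        System.out.println("above: off by " + distanceFromTenth(above).toPlainString());
    }
}
```

Both neighbours are further away from 0.1 than `a` is, so no other float would have been a better choice.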

If we now multiply `a` by greater and greater numbers, we can understand the amusement God has looking at us __pursuing numbers always a bit longer but never precise enough…__

```
package megadix;

import java.math.*;

/** Program that shows the limits of Java floating-points calculations
 * @author De Franciscis Dimitri megadix@yahoo.it */
public class MadNumbers {

    /** Prints a, its exact value b, and the difference between the two. */
    private static void display(float a, BigDecimal b) {
        System.out.println("--------------------------------------------------");
        System.out.println("a = " + a);
        System.out.println("b = " + b);
        // Parse the printed (rounded) form of a, so that b - a exposes the hidden digits
        String stringVal = Float.toString(a);
        // toPlainString() keeps very small differences out of scientific notation
        System.out.println("b - a = " + b.subtract(new BigDecimal(stringVal)).toPlainString());
    }

    public static void main(String[] args) {
        System.out.println("Variables");
        System.out.println("a : 32 bit floating point number");
        System.out.println("b : arbitrary-precision representation of a");
        float a = 0.1f;
        // BigDecimal(double) converts the binary value of a exactly
        BigDecimal b = new BigDecimal(a);
        display(a, b);
        display(a * 1000, b.multiply(new BigDecimal(1000)));
        display(a * 1000000, b.multiply(new BigDecimal(1000000)));
        display(a * 1000000000, b.multiply(new BigDecimal(1000000000)));
    }
}
```

**Output:**

    Variables
    a : 32 bit floating point number
    b : arbitrary-precision representation of a
    --------------------------------------------------
    a = 0.1
    b = 0.100000001490116119384765625
    b - a = 0.000000001490116119384765625
    --------------------------------------------------
    a = 100.0
    b = 100.000001490116119384765625000
    b - a = 0.000001490116119384765625000
    --------------------------------------------------
    a = 100000.0
    b = 100000.001490116119384765625000000
    b - a = 0.001490116119384765625000000
    --------------------------------------------------
    a = 1.0E8
    b = 100000001.490116119384765625000000000
    b - a = 1.490116119384765625000000000