Can we trust computers' calculations?

The IEEE 754 specification is the standard used for floating-point calculations in engineering and scientific fields. In my humble opinion it is a perfect example of how the computer industry and software development are built on a fundamental illusion of mankind: the possibility of attaining and manipulating infinity (infinitely small or infinitely big, it doesn't matter).

I don't really want to discuss the merits of this standard (which is implemented by practically every modern processor and programming language), but rather to explain the pain I feel when I look at results like these:

32 bit calculation:

1.93 - 0.93 = **0.99999994**

64 bit calculation:

1.93 - 0.93 = **0.9999999999999999**
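Both results can be reproduced in Java, the language used later in this article. A minimal sketch (the class FloatVsDouble and its two helper methods are illustrative names, not part of any library):

```java
/** Reproduces the 32 bit and 64 bit results of 1.93 - 0.93 shown above. */
public class FloatVsDouble {
    /** Subtraction in IEEE 754 single precision (Java float, 32 bit). */
    static float diff32() {
        return 1.93f - 0.93f;
    }

    /** Subtraction in IEEE 754 double precision (Java double, 64 bit). */
    static double diff64() {
        return 1.93 - 0.93;
    }

    public static void main(String[] args) {
        System.out.println("32 bit: 1.93 - 0.93 = " + diff32()); // 0.99999994
        System.out.println("64 bit: 1.93 - 0.93 = " + diff64()); // 0.9999999999999999
    }
}
```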

We would expect a nice 1.0, but the damned calculating machine seems to have gone nuts! Don't we use computers for weather forecasts, rocket trajectories and bank interest? How come, after all these years, nobody has noticed anything?

The hard truth

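To be precise, the IEEE 754 standard defines how real numbers are approximated by a finite set of binary values, and requires each basic operation (addition, subtraction, multiplication, division, square root) to return the correctly rounded result; the examples above adhere perfectly to the specification. Strange as it seems, "0.9999999999999999" is not a bug: 1.93 and 0.93 cannot be stored exactly, so the machine subtracts their nearest binary approximations, and the difference between those two approximations is exactly 1 - 2^-53, which Java prints as 0.9999999999999999 (in 32 bit the exact difference is 1 - 2^-24). The error was introduced when the decimal inputs were converted to binary, not by the subtraction!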
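This claim can be checked by converting the stored doubles to BigDecimal: the BigDecimal(double) constructor keeps every binary digit of its argument, so BigDecimal arithmetic on those values involves no further rounding. A sketch (class and method names are illustrative):

```java
import java.math.BigDecimal;

/** Shows that the 64 bit subtraction is exact once the inputs are stored. */
public class ExactSubtraction {
    /** Exact difference of the binary values actually stored for 1.93 and 0.93. */
    static BigDecimal storedDifference() {
        return new BigDecimal(1.93).subtract(new BigDecimal(0.93));
    }

    /** Exact value of the double produced by the hardware subtraction. */
    static BigDecimal computedDifference() {
        return new BigDecimal(1.93 - 0.93);
    }

    /** Difference of the decimal strings: no binary conversion at all. */
    static BigDecimal decimalDifference() {
        return new BigDecimal("1.93").subtract(new BigDecimal("0.93"));
    }

    public static void main(String[] args) {
        System.out.println("stored 1.93   = " + new BigDecimal(1.93));
        System.out.println("stored 0.93   = " + new BigDecimal(0.93));
        System.out.println("exact diff    = " + storedDifference());
        System.out.println("double result = " + computedDifference());
        // compareTo ignores scale, so this is a purely numerical comparison
        System.out.println("same value?   = "
                + (storedDifference().compareTo(computedDifference()) == 0)); // true
        System.out.println("decimal diff  = " + decimalDifference()); // 1.00
    }
}
```

The first two lines print the long binary approximations the machine really works with; the subtraction of those two values is reproduced exactly by the double result.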

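Real numbers, in fact, cannot always be represented with a finite number of digits, because they may have infinitely many. Worse, a number with a short decimal expansion such as 0.1 or 1.93 has an infinite expansion in base 2 (0.1 in binary is 0.0001100110011..., with the group 0011 repeating forever), so the machine has to cut it off somewhere. This is where the approximation done by calculating machines comes in. A 64 bit IEEE 754 number (a Java double) is laid out as follows: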

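  • with 64 bits we can represent at most 2^64 distinct values (18,446,744,073,709,551,616, a little over eighteen billion billion): a huge but finite set, so almost every real number has to be rounded to one of them;
  • 1 bit is used for the sign (+/-);
  • 11 bits are used for the exponent;
  • 52 bits are used for the mantissa (for normal numbers an implicit leading 1 bit adds one more, giving 53 bits of precision).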
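The three fields listed above can be inspected with the standard Double.doubleToLongBits method. A sketch, assuming only normal numbers (the reconstruction formula below does not cover zero, subnormals, infinities or NaN); the class and method names are illustrative:

```java
/** Splits a 64 bit IEEE 754 double into its sign, exponent and mantissa fields. */
public class DoubleFields {
    /** Sign field: 1 bit (0 = positive, 1 = negative). */
    static long sign(double x) {
        return Double.doubleToLongBits(x) >>> 63;
    }

    /** Exponent field: 11 bits, stored with a bias of 1023. */
    static int biasedExponent(double x) {
        return (int) ((Double.doubleToLongBits(x) >>> 52) & 0x7FFL);
    }

    /** Mantissa field: the low 52 bits (the leading 1 is implicit). */
    static long mantissa(double x) {
        return Double.doubleToLongBits(x) & 0x000F_FFFF_FFFF_FFFFL;
    }

    /** Rebuilds x from its fields; valid for normal numbers only. */
    static double reconstruct(double x) {
        // 1.mantissa as a binary fraction, then scaled by 2^(exponent - 1023)
        double significand = 1.0 + mantissa(x) / (double) (1L << 52);
        double magnitude = Math.scalb(significand, biasedExponent(x) - 1023);
        return sign(x) == 1 ? -magnitude : magnitude;
    }

    public static void main(String[] args) {
        for (double x : new double[] {1.93, 0.93, -2.5}) {
            System.out.printf("%-5s sign=%d exponent=%d (unbiased %d) mantissa=0x%013X rebuilt=%s%n",
                    x, sign(x), biasedExponent(x), biasedExponent(x) - 1023,
                    mantissa(x), reconstruct(x));
        }
    }
}
```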


Let's now look at a simple example in Java. To show the difference between a real number and its 32 bit floating-point approximation, we will use two data types:

  • float: a primitive Java type that represents a floating-point value encoded according to the IEEE 754 32 bit (single precision) format: 1 sign bit, 8 exponent bits and 23 mantissa bits;
  • java.math.BigDecimal: a class that handles "arbitrary precision" arithmetic (with limits, of course): it can handle numbers with an "arbitrary" number of digits (well, no more than Integer.MAX_VALUE of them). For an introduction see "java.math.BigDecimal" on this site.
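How a BigDecimal is created matters a lot when floating-point values are involved. A small sketch comparing the standard constructors and factory method (the class and method names are illustrative); the value printed for new BigDecimal(0.1) is the one quoted in the BigDecimal(double) Javadoc:

```java
import java.math.BigDecimal;

/** Compares different ways of turning 0.1 into a BigDecimal. */
public class BigDecimalFromFloat {
    // float widened (exactly) to double, then every binary digit is kept
    static BigDecimal fromFloat() { return new BigDecimal(0.1f); }

    // the double nearest to 0.1, converted exactly
    static BigDecimal fromDouble() { return new BigDecimal(0.1); }

    // the decimal string is parsed as written: no binary rounding at all
    static BigDecimal fromString() { return new BigDecimal("0.1"); }

    // valueOf goes through Double.toString(0.1), which is "0.1"
    static BigDecimal fromValueOf() { return BigDecimal.valueOf(0.1); }

    public static void main(String[] args) {
        System.out.println(fromFloat());   // 0.100000001490116119384765625
        System.out.println(fromDouble());  // 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(fromString());  // 0.1
        System.out.println(fromValueOf()); // 0.1
    }
}
```

The program in this article deliberately uses the first form, so that b exposes the exact binary value hidden inside the float.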

The program below does the following:

  • assigns the value 0.1 to the float variable a;
  • creates a BigDecimal b holding the exact binary value of a;
  • prints a, b and the difference b - a.

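Because b has far greater precision than a, it reveals the weird approximation that afflicts the value 0.1: what a really stores is 0.100000001490116119384765625. Printing a still shows "0.1" because Float.toString produces only as many digits as are needed to distinguish the float from its neighbors, not its exact value.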
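That number can also be worked out by hand. A float stores 24 significant bits (23 plus the implicit leading 1), and 0.1 lies between 2^-4 and 2^-3, so the float holds an integer m with 2^23 ≤ m < 2^24 multiplied by 2^-27; since 0.1 × 2^27 is not an integer, m must be rounded:

$$
0.1 \times 2^{27} = 13421772.8 \;\longrightarrow\; m = 13421773
$$

$$
0.1_{\mathrm{float}} = \frac{13421773}{2^{27}} = 0.1 + \frac{0.2}{2^{27}} = 0.1 + 1.490116119384765625 \times 10^{-9} = 0.100000001490116119384765625
$$

The rounding error 0.2 / 2^27 is exactly the first "b - a" value the program prints.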

If we now multiply a by larger and larger numbers, we can appreciate how amused God must be, watching us chase numbers that are always a little longer but never precise enough...

```java
package megadix;

import java.math.BigDecimal;

/** Program that shows the limits of Java floating-point calculations
 * @author De Franciscis Dimitri */
public class MadNumbers {
  /** Prints a, its exact value b, and b minus the decimal string Java prints for a. */
  private static void display(float a, BigDecimal b) {
    System.out.println("a     = " + a);
    System.out.println("b     = " + b);
    // Float.toString gives the shortest decimal that identifies a, not its exact value
    String stringVal = Float.toString(a);
    System.out.println("b - a = " + b.subtract(new BigDecimal(stringVal)));
  }

  public static void main(String[] args) {
    System.out.println("a : 32 bit floating point number");
    System.out.println("b : arbitrary-precision representation of a");

    float a = 0.1f;
    // new BigDecimal(double) keeps the exact binary value of a (widened to double)
    BigDecimal b = new BigDecimal(a);

    display(a, b);
    // a * k is rounded to the nearest float; b * k is computed exactly
    display(a * 1000, b.multiply(new BigDecimal(1000)));
    display(a * 1000000, b.multiply(new BigDecimal(1000000)));
    display(a * 1000000000, b.multiply(new BigDecimal(1000000000)));
  }
}
```


The program prints:

```
a : 32 bit floating point number
b : arbitrary-precision representation of a
a     = 0.1
b     = 0.100000001490116119384765625
b - a = 0.000000001490116119384765625
a     = 100.0
b     = 100.000001490116119384765625000
b - a = 0.000001490116119384765625000
a     = 100000.0
b     = 100000.001490116119384765625000000
b - a = 0.001490116119384765625000000
a     = 1.0E8
b     = 100000001.490116119384765625000000000
b - a = 1.490116119384765625000000000
```
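In this output the absolute error grows with the multiplier, while relative to the numbers involved it stays at about 1.5 × 10^-8. One way to see why the float results cannot do better is to look at how far apart neighboring floats are at each magnitude, using the standard Math.nextUp and Math.ulp methods. A sketch (FloatGaps and its gap helper are illustrative names):

```java
/** Shows that the gap between neighboring floats grows with their magnitude. */
public class FloatGaps {
    /** Distance from v to the next float above it (Math.ulp(v) for these values). */
    static float gap(float v) {
        return Math.nextUp(v) - v;
    }

    public static void main(String[] args) {
        // the same magnitudes that appear in the MadNumbers output
        float[] values = {0.1f, 100.0f, 100000.0f, 1.0e8f};
        for (float v : values) {
            System.out.println(v + ": gap to next float = " + gap(v)
                    + ", Math.ulp = " + Math.ulp(v)
                    + ", relative gap = " + gap(v) / v);
        }
        // 0.1f * 1e9 is exactly 100000001.490116119384765625, but near 1.0E8
        // floats are 8 apart (99999992, 100000000, 100000008), so it rounds to 1.0E8
        System.out.println(0.1f * 1000000000 == 1.0e8f); // true
    }
}
```

The relative gap stays between 2^-24 and 2^-23 for every normal float, which is why the relative error never improves no matter how large the numbers get.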