IEEE 754 Standard for Floating-Point Arithmetic


The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard established by the Institute of Electrical and Electronics Engineers (IEEE). It is the most widely used standard for floating-point computation, and most modern hardware and software floating-point implementations follow its formats and operations. It was developed so that numerical calculations give consistent results across different hardware platforms. Among the formats the standard defines, the two most commonly used are 32-bit single precision and 64-bit double precision. A floating-point number in these formats consists of three bit fields (sign, exponent and significand), presented in the following figure:

For 32-bit single precision numbers the standard specifies:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
The sign bit determines whether the number is positive (value “0”) or negative (value “1”). The exponent is stored as an 8-bit unsigned integer from 0 to 255 in biased form: the true exponent is the stored value minus 127. The stored values 0 and 255 are reserved for special cases (zero and subnormal numbers, and infinities and NaN, respectively), so normal numbers have true exponents from −126 to 127. The true significand includes 23 fraction bits to the right of the binary point and an implicit leading bit with value 1 (unless the exponent is stored with all zeros). Thus only 23 fraction bits of the significand appear in the memory format, but the total precision is 24 bits.
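
The field layout described above can be inspected directly. The following is a minimal Python sketch, assuming only the standard struct module; the helper name float32_fields is hypothetical and chosen for illustration. It rounds a value to single precision and splits the resulting 32 bits into the three fields.

```python
import struct

def float32_fields(x):
    """Split x, rounded to single precision, into (sign, exponent, fraction).

    Hypothetical helper for illustration; the exponent is returned as stored,
    that is, still including the bias of 127.
    """
    # Pack as a big-endian 32-bit float, then reinterpret the same 4 bytes
    # as an unsigned 32-bit integer so the bits can be masked and shifted.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 stored fraction bits
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(-0.15625)
# -0.15625 = -1.25 * 2^-3, so the stored exponent is 124 and the fraction is .01
print(sign, exponent - 127, format(fraction, "023b"))
# prints: 1 -3 01000000000000000000000
```

The implicit leading 1 never appears in the fraction field, which is why 1.25 (binary 1.01) shows up as the fraction bits 01 followed by zeros.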

For 64-bit double precision numbers the standard specifies:
Sign bit: 1 bit
Exponent width: 11 bits
Significand precision: 53 bits (52 explicitly stored)
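
One way to see what the two significand precisions mean in practice is to look for the first integer each format cannot hold exactly. The sketch below assumes that Python's float is a 64-bit IEEE 754 double (true for CPython on common platforms, but an assumption nonetheless) and uses a hypothetical helper, round_to_float32, to emulate single precision through the struct module.

```python
import struct

def round_to_float32(x):
    """Round a Python float to the nearest single precision value.

    Hypothetical helper for illustration: packing as a 32-bit float and
    unpacking again discards the bits that single precision cannot store.
    """
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 2^24 + 1 needs 25 significant bits: exact in double, rounded in single.
print(2.0**24 + 1 == 16777217.0)                  # True
print(round_to_float32(2.0**24 + 1) == 2.0**24)   # True, the +1 is lost

# 2^53 + 1 needs 54 significant bits: it is rounded even in double.
print(float(2**53 + 1) == 2.0**53)                # True, the +1 is lost
```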

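The value N of a normal number in the IEEE 754 format is given by:

N = (-1)^s * 2^(e - bias) * m

where s is the sign bit, e is the stored exponent read as an unsigned integer, bias is 127 for single precision and 1023 for double precision, and m is the significand 1.f formed from the implicit leading 1 followed by the stored fraction bits f.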

Example:
01000010110010000000000000000000
s = 0 so the number is positive
e = 10000101 (biased) = 133, so the true exponent is 133 - 127 = 6
m = 1.10010000000000000000000 = 1 9/16
N = (-1)^0 * 2^6 * 1 9/16 = 64 * 25/16 = 4 * 25 = 100
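
The same steps can be reproduced in code. This is a minimal Python sketch rather than a complete decoder: the function name decode_float32 is hypothetical, and it handles only normal numbers (stored exponent neither all zeros nor all ones). The result is cross-checked against the interpretation given by the standard struct module.

```python
import struct

def decode_float32(bit_string):
    """Decode a 32-character string of 0s and 1s as a single precision number.

    Hypothetical helper for illustration; normal numbers only.
    """
    s = int(bit_string[0], 2)      # sign bit
    e = int(bit_string[1:9], 2)    # stored (biased) exponent
    f = int(bit_string[9:], 2)     # 23 stored fraction bits
    if e == 0 or e == 255:
        raise ValueError("zero, subnormal, infinity and NaN are not handled here")
    m = 1 + f / 2**23              # restore the implicit leading 1
    return (-1) ** s * 2.0 ** (e - 127) * m

bits = "01000010110010000000000000000000"
value = decode_float32(bits)

# Cross-check: let struct interpret the same 4 bytes as a 32-bit float.
(reference,) = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))
print(value, reference)  # prints: 100.0 100.0
```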
