Read the source Luke!: Replacing Floating Point multiplication with Integer shifting (or ... "optimization witchcraft")

Updated 2014/08/09 (see update)

NOTE: Throughout this article I try to use lowercase names (i.e. float or integer) for the types and uppercase names (i.e. Float or Integer) for class names.

I’m currently working with some professional vision mixers and one of the tasks is loading and saving images from and to the mixer. The mixer expects everything in 8-bit (4:2:2) encoded YCbCr. Converting between RGB and YCbCr is pretty straightforward using a transformation matrix:

$\left[ \begin{matrix} R \\ G \\ B \end{matrix} \right] = \left[ \begin{matrix} 1.164 & 0 & 1.793 \\ 1.164 & -0.213 & -0.533 \\ 1.164 & 2.112 & 0 \\ \end{matrix} \right] \cdot \left[ \begin{matrix} Y-16 \\ Cb -128 \\ Cr-128 \end{matrix} \right]$

floatRgbFromY: y Cb: cb Cr: cr
    "YCrCb to RGB
    RGB 0-255
    Y 16-235
    CbCr 16-240"

    | r g b |
    r := (y - 16) * 1.164 + ((cr - 128) * 1.793).
    g := (y - 16) * 1.164 + ((cb - 128) * -0.213) + ((cr - 128) * -0.533).
    b := (y - 16) * 1.164 + ((cb - 128) * 2.112).
    ^ {r.g.b}

However, using this code to convert a 1280x720 image from YCbCr takes around 20 seconds on my machine … not really fast. After some surfing I found two different ways to convert the data, one a bit faster and the other much faster:

integerRgbFromY: y Cb: cb Cr: cr
    "YCrCb to RGB
    RGB 0-255
    Y 16-235
    CbCr 16-240"

    | ym cbm crm r g b |
    ym := (y - 16) * 149 / 128.
    cbm := cb - 128.
    crm := cr - 128.
    r := ym + (crm * 459 / 256).
    g := ym - (cbm * 109 / 512) - (crm * 17 / 32).
    b := ym + (cbm * 135 / 64).
    ^ {r.g.b}

shiftRgbFromY: y Cb: cb Cr: cr
    "YCrCb to RGB
    RGB 0-255
    Y 16-235
    CbCr 16-240"

    | ym r g b cbm crm |
    ym := y - 16.
    ym := ym + (ym >> 3) + (ym >> 5) + (ym >> 7).
    cbm := cb - 128.
    crm := cr - 128.
    "1.793 is approximated as 57/32 = 1 + 1/2 + 1/4 + 1/32"
    r := ym + crm + (crm >> 1) + (crm >> 2) + (crm >> 5).
    g := ym - ((cbm >> 3) + (cbm >> 4) + (cbm >> 5)) - ((crm >> 1) + (crm >> 5)).
    b := ym + ((cbm << 1) + (cbm >> 4) + (cbm >> 5) + (cbm >> 6)).
    ^ {r.g.b}

To be honest I thought I understood the integer approach but had no idea how the shift approach worked. Still the results were practically the same numerically - just a lot faster. So my adventure into “Integer witchcraft” started and I decided to keep my findings here in the blog.

I’ll start with a simple example. Let’s take the term $x \cdot 1.164$ . This is obviously a term which requires floating point operations. Let’s try to convert it into an integer shift expression step by step.

Eliminate floating point numbers (using powers of 10)

The float $1.164$ is something we want to eliminate. Because it has a limited number of decimal places it’s easy to rewrite it as a combination of one multiplication and one division. I.e.

$x \cdot 1.164 = x \cdot \frac{1164}{1000} = x \cdot 1164 \div 1000 = x \div 1000 \cdot 1164$

The last two terms are mathematically equal – however if you work in a language where integer operations do not coerce to floats “automagically” (e.g. C or Ruby) you may want to multiply first. Why? Let’s assume $x=10$ then

$\begin{eqnarray} (int)x \div (int)1000 \cdot (int)1164 & \neq & (int)x \cdot (int)1164 \div (int)1000 \\ (int)10 \div (int)1000 \cdot (int)1164 & \neq & (int)10 \cdot (int)1164 \div (int)1000 \\ (int)0 \cdot (int)1164 & \neq & (int)11640 \div (int)1000 \\ (int)0 & \neq & (int)11 \\ \end{eqnarray}$

So multiplying first doesn’t hurt in Smalltalk (unless you sprinkle in a lot of asInteger calls) or other languages which automatically coerce to floats if needed … but might save your bu.. in C, Ruby and maybe a few others.
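
The effect can be reproduced in Smalltalk as well, as long as the integer division `//` is used instead of `/`. The following is only a minimal sketch for a Playground; the variable `x` is just an illustration:

```smalltalk
| x |
x := 10.

"Divide first: 10 // 1000 is already 0, so the product is 0 as well."
x // 1000 * 1164.    "0"

"Multiply first: 11640 // 1000 keeps the integer part of 11.64."
x * 1164 // 1000.    "11"

"With plain / both orders answer the same exact Fraction."
x / 1000 * 1164 = (x * 1164 / 1000).    "true"
```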

Eliminate floating point numbers (using powers of 2)

Instead of using powers of 10 we can also use powers of 2. This will allow us to convert the expression $x \cdot 1.164$ into an Integer-only/shifting expression.

We can use any power of 2 – but let’s just use 32 ( $2^5$ ) for now. We can/will/should/must use others as well; but that’s discussed later.

$x \cdot 1.164 = x \cdot 1.164 \cdot 1 = x \cdot 1.164 \cdot \frac{32}{32} = x \cdot 1.164 \cdot 32 \div 32$

$1.164$ is equal to $1.164 \cdot 32 \div 32$ and with a bit of integer magic we can come up with an integer expression which is “kind of” equal:

$\begin{eqnarray} 1.164 & = & 1.164 \cdot 32 \div 32 \\ & \approx & (int)(1.164 \cdot 32) \div (int)32\\ & \approx & 37 \div 32 = 1.15625 \end{eqnarray}$

As you can see the difference between the float value $1.164$ and our approximate “equivalent” expression value $1.15625$ is quite small. But it’s a lot easier (and faster) to calculate $x \cdot 37 \div 32$ than $x \cdot 1.164$ .
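
As a quick check, the approximation can be tried out in a Playground. This is a small sketch under the assumption that truncating (like the `(int)` cast above) is good enough; the sample value 100 is arbitrary:

```smalltalk
| numerator |
"Scale the constant by 32 and drop the fractional part (37.248 becomes 37)."
numerator := (1.164 * 32) truncated.    "37"

"The integer-only replacement for x * 1.164, here with x = 100."
100 * numerator // 32.    "115"

"For comparison: the float result with its fractional part dropped."
(100 * 1.164) truncated.    "116"
```

The two results differ by one here, which is the price of the smaller numerator.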

When working in languages which do not automatically coerce the result of integer operations to floats when needed you can stop reading here. Integer arithmetic is one of the most heavily optimized areas in today’s compilers and CPUs.
It’s different in Smalltalk though: A simple expression like 3 * 4 / 5 (which would be 2 with integer arithmetic and 2.4 with floating point) results in a Fraction ((12/5))! Although great from a “keep the precision” standpoint it’s a speed nightmare. Sprinkling in some asFloat calls might speed things up a bit … but for some calculations it can be even faster … just read on.
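
The different results are easy to see in a Playground (a short illustration, print each line separately):

```smalltalk
3 * 4 / 5.             "(12/5), an exact Fraction"
(3 * 4 / 5) class.     "Fraction"
(3 * 4 / 5) asFloat.   "2.4"
3 * 4 // 5.            "2, integer division"
```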

Replace multiplication of floating point numbers with integer shifts

So how to convert the expression $x \cdot 37 \div 32$ into something which only uses integer shifts? The “trick” is to replace the numerator $37$ with expressions which are powers of $2$ (the denominator is already a power of $2$ ). Each of these expressions is either a multiplication or division with a power of $2$ .

QUICK REFRESH
$x \ll s = x \cdot 2^s$ : Shifting an integer $x$ by $s$ bits to the left is equal to multiplying $x$ with $2^s$ .
$x \gg s = x \cdot 2^{-s} = x \div 2^s$ : Shifting an integer $x$ by $s$ bits to the right is equal to multiplying $x$ with $2^{-s}$ or dividing by $2^s$ (any remainder is dropped).
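
A few Playground expressions illustrate both shift directions. The values are arbitrary; the negative example matters for the conversion code above because `cb - 128` and `cr - 128` can be negative:

```smalltalk
5 << 3.      "40, same as 5 * 8"
40 >> 3.     "5, same as 40 // 8"
41 >> 3.     "5, the low bits are dropped"
-41 >> 3.    "-6, rounds toward negative infinity like -41 // 8"
```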

$\begin{eqnarray} x \cdot 37 \div 32 & = & x \cdot (32 + 4 + 1 ) \div 32 \\ & = & x \cdot (2^5 +2^2 + 2^0) \div 2^5 \\ & = & x \cdot \left( \frac{2^5 +2^2 + 2^0}{2^5} \right) \\ & = & x \cdot \left( \frac{2^5}{2^5} + \frac{2^2}{2^5} + \frac{2^0}{2^5} \right) \\ & = & x \cdot \left( 1 + \frac{1}{2^3} + \frac{1}{2^5} \right) \\ & = & x \cdot \left( 2^0 + 2^{-3} + 2^{-5} \right) \\ & = & \left(x \cdot 2^0\right) + \left(x \cdot 2^{-3}\right) + \left(x \cdot 2^{-5}\right) \\ & = & (x \gg 0) + (x \gg 3) + (x \gg 5) \\ & = & x + (x \gg 3) + (x \gg 5) \\ \end{eqnarray}$

So we can replace the expression $x \cdot 1.164$ (which “forces” float values and operations) with the approximation $x + (x \gg 3) + (x \gg 5)$ (which only uses integer values and operations).
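
The following sketch compares the shift form with the single multiply-and-divide form. The block names `shifted` and `scaled` are made up for this illustration:

```smalltalk
| shifted scaled |
shifted := [ :x | x + (x >> 3) + (x >> 5) ].
scaled := [ :x | x * 37 // 32 ].

shifted value: 200.         "231"
scaled value: 200.          "231"
(200 * 1.164) truncated.    "232"

"Each shift drops its own remainder, so the two integer forms can differ slightly."
shifted value: 31.    "34"
scaled value: 31.     "35"
```

So the shift form is not bit-for-bit identical to $x \cdot 37 \div 32$ for every $x$; it can come out a little lower.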

Conclusion

Maybe everybody knew this “trick” and I have simply been living under a rock for the last decades. If this is the case this blog still serves as a reminder for myself. If you indeed learned something that’s even better!

Limitations

IT ONLY WORKS FOR INTEGER VALUES!!

An expression like $x \cdot 1.164$ can be converted to $x + (x \gg 3) + (x \gg 5)$ if and only if $x$ is an integer!

PRECISION

Some constants are rational ( $x \in \mathbb{Q}$ ) and can be represented exactly as a fraction ( $\frac{a}{b}$ ). Others (e.g. $\pi$ ) cannot be represented as a fraction. The method presented above however always approximates the value $x$ with a fraction whose denominator is a power of $2$ . So even if the value can be represented as a fraction ( $x \in \mathbb{Q}$ ) you might lose precision, because the denominator is always a power of $2$ and because the achievable precision depends on the exponent used:

Denominator $16 = 2^4$

Approximated fraction: $1.164 \approx \frac{19}{16} = 1.1875$ ; Error $\approx 0.02350$
Shift Expression: x + (x >> 3) + (x >> 4)

Denominator $32 = 2^5$

Approximated fraction: $1.164 \approx \frac{37}{32} = 1.15625$ ; Error $\approx 0.00775$
Shift Expression: x + (x >> 3) + (x >> 5)

Denominator $64 = 2^6$

Approximated fraction: $1.164 \approx \frac{74}{64} = 1.15625$ ; Error $\approx 0.00775$
Shift Expression: x + (x >> 3) + (x >> 5)

Denominator $128 = 2^7$

Approximated fraction: $1.164 \approx \frac{149}{128} = 1.1640625$ ; Error $\approx 0.00006$
Shift Expression: x + (x >> 3) + (x >> 5) + (x >> 7)
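
The numerators listed above can be recomputed with a few lines in a Playground. This is only a sketch; it rounds to the nearest integer, and the variable names are illustrative:

```smalltalk
(4 to: 7) collect: [ :exponent |
    | denominator numerator |
    denominator := 2 raisedTo: exponent.
    numerator := (1.164 * denominator) rounded.
    numerator -> denominator ].
"answers the pairs 19->16, 37->32, 74->64 and 149->128"

"74/64 reduces to 37/32, so the step from 32 to 64 gains nothing."
(74 / 64) = (37 / 32).    "true"
```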

So in theory the bigger the denominator the higher the precision – although not every step means more precision (compare 32 and 64). Another limiting factor is the numeric range of the variable. If the variable is an unsigned byte (0-255) it’s pointless to shift it by 8 or more bits to the right: the result is always $0$ .

So given the example above ( $1.164$ ) using $128$ for byte-sized values is pointless: The largest shift is 7 bits – so only values with the highest bit set (128+) will “survive” and result in a value different than $0$ . Using $64$ or $32$ will result in the same shift operations with a maximum shift of 5 bits - so the highest 3 bits will “survive”. That’s quite usable for most cases.
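
To get a feeling for the trade-off, the worst deviation over all byte values can be measured. The snippet below is a rough sketch with made-up block names; it compares against the float product and makes no claim about which variant is “good enough” for a given use case:

```smalltalk
| exact threeShifts fourShifts worstError |
exact := [ :x | x * 1.164 ].
threeShifts := [ :x | x + (x >> 3) + (x >> 5) ].
fourShifts := [ :x | x + (x >> 3) + (x >> 5) + (x >> 7) ].

"Largest absolute deviation from the float result over all byte values."
worstError := [ :approximation |
    (0 to: 255)
        inject: 0
        into: [ :worst :x | worst max: ((exact value: x) - (approximation value: x)) abs ] ].

worstError value: threeShifts.
worstError value: fourShifts.

"The extra term only contributes for the upper half of the byte range."
127 >> 7.    "0"
255 >> 7.    "1"
```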

Finding the best compromise between precision and number of shifts is the actual hard decision you have to make. Check the code below to make an informed decision.

Code

NOTE: The code below is neither nice nor fast. But it does its job :-)

Number>>#multiplyAsShiftReportWith
    ^ self multiplyAsShiftReportWith: 'x'

Number>>#multiplyAsShiftReportWith: variableName
    "Answer a report which approximates the receiver as numerator / (2 raisedTo: n) for n = 4..9
    and prints the matching integer and shift expressions for variableName."
    ^ String
        streamContents: [ :report | 
            | denominator numerator approximatedFloat bitString numberOfSetBits currentShiftOffset shiftPositions |
            report
                nextPutAll: 'Float: ' , self printString;
                cr.
            4 to: 9 do: [ :denominatorExponent | 
                denominator := 2 raisedTo: denominatorExponent.
                numerator := (self * denominator) rounded.
                approximatedFloat := (numerator / denominator) asFloat.
                bitString := numerator bitString last: 16.
                "Every set bit in the numerator becomes one shift term."
                numberOfSetBits := bitString asBag occurrencesOf: $1.
                report
                    cr;
                    nextPutAll: ('<1p> / <2p> = <3p>' expandMacrosWith: numerator with: denominator with: approximatedFloat);
                    cr;
                    nextPutAll:
                            ('|Error| = <1s>; (log10 |Error| = <2s>)'
                                    expandMacrosWith: ((approximatedFloat - self) abs printShowingDecimalPlaces: 5)
                                    with: ((approximatedFloat - self) abs log printShowingDecimalPlaces: 3));
                    cr;
                    nextPutAll: ('<1p> = Binary <2s>' expandMacrosWith: numerator with: bitString);
                    nextPutAll: ('; required shifts: <1p>' expandMacrosWith: numberOfSetBits);
                    cr;
                    nextPutAll:
                            ('Precision/Shifts Ratio: <1s>'
                                    expandMacrosWith: ((approximatedFloat - self) abs log abs / numberOfSetBits printShowingDecimalPlaces: 3));
                    cr;
                    nextPutAll: ('Orignal Expression: <1s> * <2p>' expandMacrosWith: variableName with: self);
                    cr;
                    nextPutAll:
                            ('Integer Expression: <1s> * <2p> / <3p>' expandMacrosWith: variableName with: numerator with: denominator);
                    cr.
                "Walk the bits from left to right. The bit at index i has the weight
                2 raisedTo: (bitString size - i), so it maps to a right shift by
                denominatorExponent - bitString size + i. Negative offsets are left shifts."
                currentShiftOffset := denominatorExponent - bitString size.
                shiftPositions := OrderedCollection new.
                bitString asArray
                    do: [ :bit | 
                        currentShiftOffset := currentShiftOffset + 1.
                        bit = $1
                            ifTrue: [ shiftPositions add: currentShiftOffset ] ].
                report nextPutAll: 'Shift Expression: '.
                shiftPositions
                    do: [ :position | 
                        position = 0
                            ifTrue: [ report nextPutAll: variableName ]
                            ifFalse: [ 
                                position negative
                                    ifTrue: [ report nextPutAll: ('(<1s> %<%< <2p>)' expandMacrosWith: variableName with: position negated) ]
                                    ifFalse: [ report nextPutAll: ('(<1s> >> <2p>)' expandMacrosWith: variableName with: position) ] ] ]
                    separatedBy: [ report nextPutAll: ' + ' ].
                report cr ] ]

Code (Usage)

I’m using the same float ( $1.164$ ) I used throughout this article to demonstrate the functionality. The report shows the different values incl. their precision, number of shifts and expressions which can be directly copied and pasted.

1.164 multiplyAsShiftReportWith: 'x'. "print-it"
 'Float: 1.164

19 / 16 = 1.1875
|Error| = 0.02350; (log10 |Error| = -1.629)
19 = Binary 0000000000010011; required shifts: 3
Precision/Shifts Ratio: 0.543
Orignal Expression: x * 1.164
Integer Expression: x * 19 / 16
Shift Expression: x + (x >> 3) + (x >> 4)

37 / 32 = 1.15625
|Error| = 0.00775; (log10 |Error| = -2.111)
37 = Binary 0000000000100101; required shifts: 3
Precision/Shifts Ratio: 0.704
Orignal Expression: x * 1.164
Integer Expression: x * 37 / 32
Shift Expression: x + (x >> 3) + (x >> 5)

74 / 64 = 1.15625
|Error| = 0.00775; (log10 |Error| = -2.111)
74 = Binary 0000000001001010; required shifts: 3
Precision/Shifts Ratio: 0.704
Orignal Expression: x * 1.164
Integer Expression: x * 74 / 64
Shift Expression: x + (x >> 3) + (x >> 5)

149 / 128 = 1.1640625
|Error| = 0.00006; (log10 |Error| = -4.204)
149 = Binary 0000000010010101; required shifts: 4
Precision/Shifts Ratio: 1.051
Orignal Expression: x * 1.164
Integer Expression: x * 149 / 128
Shift Expression: x + (x >> 3) + (x >> 5) + (x >> 7)

298 / 256 = 1.1640625
|Error| = 0.00006; (log10 |Error| = -4.204)
298 = Binary 0000000100101010; required shifts: 4
Precision/Shifts Ratio: 1.051
Orignal Expression: x * 1.164
Integer Expression: x * 298 / 256
Shift Expression: x + (x >> 3) + (x >> 5) + (x >> 7)

596 / 512 = 1.1640625
|Error| = 0.00006; (log10 |Error| = -4.204)
596 = Binary 0000001001010100; required shifts: 4
Precision/Shifts Ratio: 1.051
Orignal Expression: x * 1.164
Integer Expression: x * 596 / 512
Shift Expression: x + (x >> 3) + (x >> 5) + (x >> 7)
'

Update 1

This approach of course works for every kind of Number, ScaledDecimals included (thanks Tobias!). So I moved the method to class Number and updated the code.

Update 2

Thanks to a Twitter conversation with Hwa Jong Oh there is now a FastConstantMultiplier. So if you need to multiply an Integer with a constant quite often this might be interesting for you:

Load package

Gofer it
    smalltalkhubUser: 'UdoSchneider' project: 'Playground';
    package: 'FastConstantMultiplier';
    load.

Using the package

m := FastConstantMultiplier  constant: 0.123456789 significantDigits: 11.

m * 10000. "1232 "
m * 10000000000000." 1234567890054"

Written with StackEdit.


Friday, August 08, 2014
