Simple Encryption in C#


  • I have created a simple C# Windows application that encrypts/saves/loads up to 128 characters of text. It is intended to serve as a rudimentary password manager.
  • You can view/download the full source on my Github repository.


It seems, these days, that an online livelihood comprises too many disparate services, profiles, and accounts, and far, far too many passwords to all keep straight. Common strategies include passwords that are the names of beloved pets or next of kin, simple alphanumeric sequences, or to use the same password for every account. Of course, these strategies are security no-nos. Secure passwords are complex, hard to guess, and consequently, hard to memorize.

There are many, much better password managers out there, but I thought I’d try to make my own.. I chose to do this project in MSVS C# over, say, Java or Python for some of C#s readymade features, like save and load Windows dialogs, and its generally easy GUI development process. This decision, however, was not without some shortcomings; unlike the Java language BigInteger class, the C# BigInteger class is scant on features. There is no C# BigInteger primality testing method, no inverse mod n method, and there is no method that generates random BigIntegers. These are all necessary components of proper encryption schemes.


Another hurdle to this project was that instead of using a static encryption key, saving the encrypted password or password hash associated with each ciphertext, and verifying that a user has entered a correct password before decrypting, I designed a method to derive a unique encryption key from the user’s password. In this way, it is not necessary to save or store the user’s password or password hash at all. Attempting to decrypt with an incorrect password will (likely) yield a different encryption key, and so, not properly decipher an encrypted text.


While very famous, RSA is an asymmetric encryption scheme, where users have both a public key, and a separate, private key. It is meant to facilitate secure communication among users on a network, and so, it isn’t an appropriate encryption scheme for this project. However, a very similar symmetric encryption scheme, modeled after RSA will suffice. Whereas Euler’s Theorem underpins the RSA encryption scheme, the less general Fermat’s Little Theorem is all that is needed here.

Fermat’s Little Theorem:
where p is prime, and a \not\equiv 0\ (\textrm{mod}\ p)

a^{p-1} \equiv 1\ (\textrm{mod}\ p)

This can be read to say that any whole number that is not zero, and not a multiple of some particular prime, when raised to the power of that particular prime minus one, will always have a remainder of one when divided by that particular prime.

I think that the best way to prove Fermat’s Little Theorem, or to understand why it is true is via the group theoretical perspective. For some whole number, p, whole numbers less than p that have no divisors in common with p form a finite group under the operation of multiplication. If p is a prime number, then no whole number less than p can have any divisors in common with p, and so, such a group must have p-1 elements. Lagrange’s Theorem states that for every finite group, G, the order (the number of elements) of every subgroup of G divides the order of G. Hence if the order of G is p-1, where p is prime, all of the cyclic subgroups generated by the elements of G must have an order that divides p-1.

A couple example tables might help illustrate Fermat’s Little Theorem.

\\ \begin{array}{|c|c|c|c|c|} \hline a^b (\textrm{mod}\ 5) & 1 & 2 & 3 & 4 \\ \hline 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 2 & 4 & 3 & 1 \\ \hline 3 & 3 & 4 & 2 & 1 \\ \hline 4 & 4 & 1 & 4 & 1 \\ \hline \end{array}

\begin{array}{|c|c|c|c|c|c|c|} \hline a^b (\textrm{mod}\ 7) & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \hline 2 & 2 & 4 & 1 & 2 & 4 & 1 \\ \hline 3 & 3 & 2 & 6 & 4 & 5 & 1 \\ \hline 4 & 4 & 2 & 1 & 4 & 2 & 1 \\ \hline 5 & 5 & 3 & 1 & 5 & 3 & 1 \\ \hline \end{array}

In the first table above, I’ve taken all the numbers in the leftmost column, raised them to the power of the numbers in the top row, and then taken their remainders after dividing by the prime number, 5. In the second table I did the same, but divided by the prime number 7 instead of 5. Each row can be taken to represent the cyclic subgroups generated by the elements of (\mathbb{Z}_5, *) and (\mathbb{Z}_7, *), respectively. Note that in both tables, the rightmost column has only ones. This result is precisely what is claimed by Fermat’s Little Theorem.

Multiplying both sides of the congruence given in Fermat’s Little theorem by the number, a, gives

a^{(p-1)+1} \equiv a\ (\textrm{mod}\ p)

This congruence suggests the following encryption scheme:

  • Choose any number, e, from the group U_{p-1}. i.e. the units of p-1, a group formed by multiplication of whole numbers less than p-1 which have no divisors in common with p-1.
  • find d, the inverse of e in U_{p-1}.
  • encrypt message, a, as a^e (\textrm{mod}\ p) = c where c is the cipher message.
  • decrypt cipher message, c, with c^d (\textrm{mod}\ p) = a to recover plaintext message, a.

To explain why this works, begin with e, and its inverse d in U_{p-1}. This means that for some whole number, k,

\begin{aligned} ed &\equiv 1\ (\textrm{mod}\ p-1) \\ &= 1 + k(p-1) \end{aligned}

and so,

\begin{aligned} a^{ed} &\equiv a^{k(p-1) + 1}\ (\textrm{mod}\ p) \\ &\equiv a^{k(p-1)}a^1\ (\textrm{mod}\ p) \\ &\equiv (a^{(p-1)})^k a^1\ (\textrm{mod}\ p) \\ &\equiv (1)^k a\ (\textrm{mod}\ p) \\ &\equiv a\ (\textrm{mod}\ p) \\ \end{aligned}

As stated earlier, the C# BigInteger class is not equipped with an inverse mod n method, so I had to write my own. Solving this problem is equivalent to solving a linear Diophantine equation in two variables: Provided two whole numbers, a and b, find whole numbers, x and y, for which

ax + by = 1

Subtracting the term, b*y, from both sides of this equation, demonstrates the equivalence of solving the Diophantine equation to finding modular inverses.

\begin{aligned} ax &= 1 - by \\ &\equiv 1\ (\textrm{mod}\ b) \\ \end{aligned}

Thus, numbers a and x are inverses of one another, mod b.

Why groups, U(n), can only contain elements that are relatively prime to n is illustrated by the equivalence of these two problems. If it were desired to find the inverse of a, mod b, but the greatest common divisor of a and b is not one, the Diophantine equation yields a contradiction; the right-hand side is divisible only by one, whereas the left-hand side is divisible by GCD(a,b).

\begin{aligned} 1 &= ax + by \\ &= GCD(a,b)(a'x + b'y) \\ \end{aligned}

In such an instance, a has no inverse, mod b, but groups require that all its elements have inverses.

As the Diophantine equation is linear, its general solution will consist of the solution to its homogeneous equation, plus its particular solution.

a[x_h + x_p] + b[y_h + y_p] = 1

A solution solution of the homogeneous equation, (x_h, y_h) = (-bk, ak), \ \forall k \in \mathbb{Z}, is easy to spot:

a[-bk] + b[ak] = 0

The particular solution is commonly found using the Extended Euclidean Algorithm. That is to use the regular Euclidean Algorithm to find GCD(a,b), and then work backwards to find an x and y that satisfy the Diophantine equation.

Find x,y \in \mathbb{Z} for which 5344x + 2077y = GCD(a,b) .

\begin{aligned} 5344 &= (2)2077+ 1190 \\ 2077 &= (1)1190 + 887 \\ 1190 &= (1)887 + 303 \\ 887 &= (2)303 + 281 \\ 303 &= (1)281 + 22 \\ 281 &= (12)22 + 17 \\ 22 &= (1)17 + 5 \\ 17 &= (3)5 + 2 \\ 5 &= (2)2 + 1 \\ 2 &= (2)1 + 0 \\ \end{aligned}

Thus the Euclidean Algorithm shows (second to last row) that 5344 and 2077 are relatively prime, i.e. GCD(5344, 2077) = 1. Now work backwards to find x and y in terms of 5344, 2077.

\begin{aligned} 1 &= 5 - (2)2 \\ &= [22-(1)17] - (2)[17-(3)5] \\ &= [(303-(1)281)-(1)(281-(12)22)] - (2)[(281-(12)22)-(3)(22-(1)17)] \\ &\vdots \\ \end{aligned}

Working backwards, as done above, can become a cumbersome and tricky procedure, requiring a sort of recursive back-substitution process, but with intermittent times when it is advantageous to simplify the expression in a strategic fashion. Thankfully, my Number Theory textbook offers an alternative to this “working backwards” procedure that is much more easily codified. The process involves writing the problem as a system of two equations in two unknowns, then producing new equations by adding multiples of those equations together until an equation is produced that is equal to GCD(a,b).

\begin{aligned} 5344(1) + 2077(0) &= 5344 \\ 5344(0) + 2077(1) &= 2077 \\ 5344(1) + 2077(-2) &= 1190 \\ 5344(-1) + 2077(3) &= 887 \\ 5344(2) + 2077(-5) &= 303 \\ 5344(-5) + 2077(13) &= 281 \\ 5344(7) + 2077(-18) &= 22 \\ 5344(-89) + 2077(229) &= 17 \\ 5344(96) + 2077(-247) &= 5 \\ 5344(-377) + 2077(970) &= 2 \\ 5344(850) + 2077(-2187) &= 1 \\ \end{aligned}

With the solution to the homogeneous equation from earlier, and now the particular solution, the example problem is solved: \forall k \in \mathbb{Z} ,

5344(2077k + 850) + 2077(-5344k - 2187) = 1 \\

Furthermore, with the particular solution, the expression shows that the inverse of 5344 (mod 2077) is 850, and the inverse of 2077 (mod 5344) is 3157.

\begin{aligned} 5344(850) = 1 + 2077(2187) \\ \equiv 1\ (\textrm{mod}\ 2077) \\ \end{aligned}


\begin{aligned} 2077(3157) = 1 + 5344(1227) \\ \equiv 1\ (\textrm{mod}\ 5344) \\ \end{aligned}

I implemented the procedure used above in a C# method that operates on C# BigIntegers. Provided two BigIntegers, x and y, ExtendedEuclideanAlg returns a three-tuple (i, j, GCD(x,y)), where xi + yj = GCD(x,y). The method works by first initializing two equations that equal abs(x), and abs(y) on the right-hand side, respectively. If x or y are negative, that information is preserved on the left-hand side. This step simplifies the algorithm because the aim can be narrowed to decrease the right-hand side until equaling GCD(x,y). At every iteration of the main loop, a new equation is produced by subtracting the largest multiple the last equation from the second to last equation for which the right-hand side of the equation remains positive. Once the right-hand side is equal to GCD(x,y) the process terminates. I put a little check to also terminate the process accordingly for the case when GCD(x,y) is not one, so that the last entry is always equal to GCD(x,y). Also, I made the method track every step of the problem in lists (u,v,n). While the entirety of the lists are not returned, they are still sometimes desirable, so I left them in. A more efficient method should only require storing two equations at once.

public static BigInteger[] ExtendedEuclideanAlg(BigInteger x, BigInteger y)
	List<BigInteger> n = new List<BigInteger>();
	List<BigInteger> u = new List<BigInteger>();
	List<BigInteger> v = new List<BigInteger>();
	BigInteger q, r;

	//         [_____u____________v____________n_____]
	//    R0:  [  sign(x)*x  +    0y       =  abs(x) ]
	//    R1:  [     0x      +  sign(y)*y  =  abs(y) ]
	n.Add(x * x.Sign);
	n.Add(y * y.Sign);

	// add multiples of rows together until n.last == gcd(x,y)
	while ((!(BigInteger.Equals(n[n.Count - 1], BigInteger.One))) && (!BigInteger.Equals(n[n.Count - 1], n[n.Count - 2])))
		q = BigInteger.DivRem(n[n.Count - 2], n[n.Count - 1], out r);

		//if n.last divides n.secondToLast,
		//     n.last == gcd(x,y)
		if (BigInteger.Equals(r, BigInteger.Zero))
			q = q - BigInteger.One;

		n.Add(n[n.Count - 2] - q * n[n.Count - 1]);
		u.Add(u[u.Count - 2] - q * u[u.Count - 1]);
		v.Add(v[v.Count - 2] - q * v[v.Count - 1]);

	// u.last*x + v.last*y = n.last = gcd(x, y)
	return new BigInteger[3] { u.Last(), v.Last(), n.Last() };

The other requirement for this project is the generation of large primes. Although I took an introductory class in Number Theory, I am admittedly not on the level to understand why the Miller-Rabin primality test works. Wikipedia states that its correctness relies on the unproven Extended Riemann Hypothesis. The algorithm offers a probabilistic measure of the likelihood that it will incorrectly decide a composite number is a prime (false-positive). This probability is a function of the number of iterations that the algorithm is allowed to iterate, with that probability stated as at most 4^{-k}, k iterations.
Here is the pseudo-code that Wikipedia provides:

write n − 1 as 2r·d with d odd by factoring powers of 2 from n − 1
WitnessLoop: repeat k times:
   pick a random integer a in the range [2, n − 2]
   x ← ad mod n
   if x = 1 or x = n − 1 then
      continue WitnessLoop
   repeat r − 1 times:
      x ← x2 mod n
      if x = 1 then
         return composite
      if x = n − 1 then
         continue WitnessLoop
   return composite
return probably prime

Note that this procedure relies on the ability to generate random numbers. Since the C# BigInteger class does not provide such a method, it is necessary to write one. I wasn’t very interested in doing so, and was able to quickly find someone else’s solution on stack overflow.

Though I don’t understand the Miller-Rabin primality test, I was able to work through the pseudocode and produce a working implementation in C#:

public static double MillerRabinPrimalityTest(BigInteger n, int numTrials, Random random)
	//Probability that n passes test but is not prime ~ 4^(-numTrials)
	if ((n == 2) || (n == 3) || (n == 5))
		return 1.0;
	if (n.IsEven)
		return 0.0;
		//find d, r, for which d*(2^r) = n-1
		BigInteger d = n - 1;
		int r = 0;
		while (d.IsEven)
			d = d / 2;
		BigInteger nMinusThree = n - 3;
		for (int i = 0; i < numTrials; i++)
			BigInteger a = RandomBigIntegerLessThan(nMinusThree, random);
			while (a < 2 || a > nMinusThree)
				a = RandomBigIntegerLessThan(nMinusThree, random);
			BigInteger x = BigInteger.ModPow(a, d, n);
			if (!(x.IsOne || (x == (n - 1))))
				int j = 1;
				while ((j < r) && (x != (n - 1)))
					x = BigInteger.ModPow(x, 2, n);
					if (x.IsOne)
						return 0.0;
				if (x != (n - 1))
					return 0.0;
		return Math.Pow(4.0, Convert.ToDouble(-numTrials));

Finally, I devised of a method to generate (e, d, n) encryption keys from the hashcode of whatever password a user might provide:

public static BigInteger[] PasswordToEDN(string password, int bytewidth)
	//derives from password string
	//numbers e, d, n
	// for which
	//        ed ~ 1 (mod (n-1))
	// and
	//        n > 256^bytewidth
	// and
	//        n is prime

	byte[] temp = new byte[bytewidth + 1];
	temp[bytewidth] = 1;
	Random rnd = new Random();

	//derive a large prime number, n, from password hashcode
	BigInteger hashcode = new BigInteger(Math.Abs(password.GetHashCode()));
	BigInteger modMax = BigInteger.Parse("2147483647");

	for (int i = 0; i < bytewidth; i++)
		temp[i] = (byte)(hashcode % 256);
		hashcode = BigInteger.ModPow(hashcode, 2, modMax);
	BigInteger n = new BigInteger(temp);
	n = NextProbablePrime(n, 100, rnd);

	//find e, the jth number that is
	//relatively prime to n-1.
	//       GCD(e_j, n-1) == 1
	//where j is derived from
	//the password hashcode
	BigInteger nMinusOne = n - 1;
	BigInteger e = new BigInteger(1);
	int j = (int)(hashcode % 1000);
	for (int k = 0; k < j; k++)
		e = nNextRelativePrimeToM(e, nMinusOne);

	//find e inverse, d
	BigInteger d = xInverseModN(e, nMinusOne);

	return new BigInteger[] { e, d, n };

The same method is used for generating keys (e, d, n) when encrypting and decrypting, but the only the numbers e and n are used to encrypt with modular exponentiation of the message, whereas the numbers d and n are used to decrypt with modular exponentiation.

This application restricts the message length to 128 or fewer characters (bytes). That is because my key generating algorithm yields a prime modulus, n, that is 129 bytes in length, where at first, the highest order byte is explicitly set to one, and the remaining bytes are chosen based on the user’s password. The BigInteger represented by these bytes is then increased until it is found to be prime by the Miller-Rabin primality test. This guarantees that n is at least 256^{129} (a one with about 300 zeros after it), but it is very probably less than 256^{130}.  Thus, a message that is more than 128 characters might encode to a number that is larger than the modulus n, and would be corrupted by the encryption process.

Further complicating the matter, once encrypted, the message can sometimes become mapped to a number that, though still less than the the modulus n, is large enough to require 129 bytes, or characters to represent. And so I decided on a simple, but inelegant rule for the application: messages no larger than 128 characters may be encrypted, but messages no larger than 129 characters may be decrypted. This is not a great solution because it can admit messages larger than the modulus for decryption.

It is computationally expensive to generate primes that are much larger than 128 bytes in length (by methods known to me, at least..). I wondered for a while about how to encrypt/decrypt larger messages but could only come up with ideas that involved introducing padding to the message. But as this strategy would seem to introduce vulnerabilities, and having been eager to move on to my next project, I gave up on pursuing the matter.

Of course, once the project was out of mind, and well behind me, I had the idea that messages larger than the modulus, n, could be expressed as polynomials in base n via the division algorithm (just as numbers can be expressed in base 2 for binary, or base 16 for hexadecimal), then the coefficients of this polynomial could be encrypted, and then the polynomial with encrypted coefficients can be evaluated to yield the encrypted message that is larger than the modulus, n. This process could clearly be inverted in order to decrypt and recover the original message. While I’m pretty confident in this idea, I can’t guarantee that it will work without having implemented it, and I’m more interested in moving along with my upcoming project.



Assorted Derivations in 2D


  • As a final project for one of my Computer Science courses I made a Google App Engine game of billiards where you can interact with the mathematical derivations from this post. It requires two players, so you’ll need to navigate to the web page in two separate browser tabs if you don’t have an opponent. Visit to play.
  • Additionally, all of the images from this post were rendered using some of the components I developed for this project. You can view the full source on my Github repository.

I’ve been allowed some creative freedom with a final project for one of my computer science courses. Naturally, I’m going to make a video game. This has led me to think lately about some of the ensuing technical hurdles and mathematical tools I’ll need to develop in order to get the project underway.

Matrix determinant and parallelogram area


I’ll begin by defining a vector operation, \circ, where

a \circ b = a_x b_y - a_y b_x.

The operation is just a 2×2 matrix determinant, but without getting too technical, I’ll just think of it as an operation on two vectors that gives a number, the absolute value of which is the area of the parallelogram that they subtend.


Note that the operation is not commutative, it is anti-commutative.

\begin{aligned} a \circ b &= a_x b_y - a_y b_x \\ &= -(a_y b_x - a_x b_y) \\ &= -(b_x a_y - b_y a_x) \\ &= - (b \circ a) \end{aligned}

This is a useful property of the operation. if a \circ b is positive, then I know the area of the parallelogram that they subtend, but also that vector b is oriented counter-clockwise with respect to vector a. On the other hand, if a \circ b is negative, then I know that vector b is oriented clockwise with respect to vector a.

Also useful to know, the operation is left and right distributive over addition/subtraction:

\begin{aligned} a \circ (b + c) &= a_x (b_y + c_y) - a_y (b_x + c_x) \\ &= a_x b_y  + a_x c_y- a_y b_x - a_y c_x \\ &= a_x b_y - a_y b_x + a_x c_y - a_y c_x\\ &=  a \circ b + a \circ c \\ \end{aligned}


(a+b) \circ c = a \circ c + b \circ c

I will be using a \circ b notation throughout this post to save typing. In addition to this, I will use more familiar notation for other common operations, such as:

Dot Product

a \cdot b = a_x b_x + a_y b_y

and Euclidean norm

|a| = \sqrt{a_x ^ 2 + a_y ^ 2}

What is the height of the blue vector over the red vector?


To find the length of the yellow line, I’ll begin by finding the area of the parallelogram subtended by the blue and red vectors.

\begin{aligned} A_p &= a_x b_y - a_y b_x \\ &= a \circ b \end{aligned}


Half of this area gives the area of the triangle defined by the blue and red vectors.

\begin{aligned} A_t &= \frac{a_x b_y - a_y b_x}{2} \\ &= \frac{a \circ b}{2} \end{aligned}


The length of the yellow line is the height of this triangle, with base equal to the length of the red vector. I can find the height of the yellow line with the familiar formula,

\begin{aligned} A_t &= \frac{(base)(height)}{2}\\ height &= \frac{2 A_t}{base}. \end{aligned}

Substituting for A_t and the base of the triangle gives

\begin{aligned} h &= \frac{2[\frac{a_x b_y - a_y b_x}{2}]}{[\sqrt{b_x^2 + b_y^2}]}\\ &= \frac{a_x b_y - a_y b_x}{\sqrt{b_x^2 + b_y^2}} \\ &= \frac{a \circ b}{|b|} \end{aligned}

Note that this operation is related to the formula for vector projection,

\begin{aligned} aProjB &= \frac{a_x b_x + a_y b_y}{\sqrt{b_x^2 + b_y^2}}\\ &= \frac{a \cdot b}{|b|} \end{aligned}

When paired together, the two operations satisfy the Pythagorean theorem:

\begin{aligned} |a|^2 &= (\frac{a \cdot b}{|b|})^2 + (\frac{a \circ b}{|b|})^2\\ &= (\frac{a_x b_x + a_y b_y}{\sqrt{b_x^2 + b_y^2}})^2 + (\frac{a_x b_y - a_y b_x}{\sqrt{b_x^2 + b_y^2}})^2\\ &= \frac{a_x^2 b_x^2 + 2a_x b_x a_y b_y + a_y^2 b_y^2 + a_x^2 b_y^2 - 2a_x b_y a_y b_x+ a_y^2 b_x^2}{b_x^2 + b_y^2}\\ &= \frac{a_x^2 b_x^2 + a_y^2 b_y^2 + a_x^2 b_y^2 + a_y^2 b_x^2}{b_x^2 + b_y^2}\\ &= \frac{a_x^2(b_x^2 + b_y^2) + a_y^2(b_x^2 + b_y^2)}{b_x^2 + b_y^2}\\ &= \frac{(a_x^2+a_y^2)(b_x^2 + b_y^2)}{b_x^2 + b_y^2}\\ &= a_x^2+a_y^2\\ &= |a|^2 \end{aligned}

Together they give the base and height, respectively, of the generalized right triangle formed by two arbitrary vectors.

\begin{aligned} base &= \frac{a \cdot b}{|b|} & height &= \frac{a \circ b}{|b|} \end{aligned}


When will the circle run into the edge?


The circle will run into the edge when a line (yellow) drawn perpendicular from the edge to the center of the circle has a length that is equal to the circle’s radius.P5Image5

I’ll use the operation defined earlier to get the height of the circle, c, above the edge, e

\begin{aligned} h &= \frac{c \circ e}{|e|}\\ &= \frac{c_x e_y - c_y e_x}{\sqrt{e_x^2 + e_y^2}} \end{aligned}


As circle, c, travels along its velocity vector, v, through time,

\begin{aligned} c_x &= v_x t + c_{x0}\\ c_y &= v_y t + c_{y0} \end{aligned}

I’ll make this substitution for c_x and c_y and solve for time t, when the height of the yellow line is equal to the radius of the circle, c_r.

h = \pm c_r = \frac{[v_x t + c_{x0}]e_y-[v_y t + c_{y0}]e_x}{\sqrt{e_x^2 + e_y^2}}

With a little algebra, the time at which the circle runs into the edge,

t = \frac{c_{x0}e_y-c_{y0}e_x \pm c_r\sqrt{e_x^2 + e_y^2}}{e_x v_y - e_y v_x}

This can be written more compactly:

t = \frac{c \circ e \pm c_r |e|}{e \circ v}

The plus or minus symbol in the formula arises because the circle can be in contact with the line segment from both above and below, or more accurately, when the circle is oriented counter-clockwise and when it is oriented clockwise with respect to the line segment.

In what direction should the circle be deflected off of the edge?
The circle should be deflected in the direction of a new velocity vector whose component that is perpendicular to the edge has been negated, but whose component that is parallel to the edge has remained unchanged.
With respect to the edge, e, the perpendicular and parallel components of the velocity vector before deflection, v_i , are respectively

\begin{aligned} &\frac{v_i \circ e}{|e|}, &\frac{v_i \cdot e}{|e|} \end{aligned}

And so the logic above gives the following system of equations for the velocity after deflection, v_f:

\begin{cases} \frac{v_f \circ e}{|e|} &= -\frac{v_i \circ e}{|e|}\\ \frac{v_f \cdot e}{|e|} &= \frac{v_i \cdot e}{|e|}\\ \end{cases}


\begin{cases} v_{fx}e_y - v_{fy}e_x &= -(v_{ix}e_y - v_{iy}e_x)\\ v_{fx}e_x + v_{fy}e_y &= v_{ix}e_x + v_{iy}e_y \end{cases}

Because it will appear in other calculations, for the sake of reuse I’ll state that the coefficient matrix of this system has inverse,

\frac{1}{|e|^2} \begin{bmatrix} e_y & e_x\\ -e_x & e_y \end{bmatrix}

Multiplying both sides of the system gives the solution,

\begin{cases} v_{fx} &= \frac{-e_y(v_{ix}e_y - v_{iy}e_x) + e_x(v_{ix}e_x + v_{iy}e_y)}{e_x^2 + e_y^2}\\ v_{fy} &= \frac{e_x(v_{ix}e_y - v_{iy}e_x) + e_y(v_{ix}e_x + v_{iy}e_y)}{e_x^2 + e_y^2} \end{cases}


\begin{cases} v_{fx} &= \frac{-e_y(v_i \circ e) + e_x(v_i \cdot e)}{|e|^2}\\ v_{fy} &= \frac{e_x(v_i \circ e) + e_y(v_i \cdot e)}{|e|^2} \end{cases}


When will the two circles collide with one another?

The two circles will collide with one another when the distance between the centers of the two circles is equal to the sum of their radii.

\sqrt{(A_x - B_x)^2 + (A_y - B_y)^2} = A_r + B_r

As circles A and B move in accordance of their velocity vectors,

\begin{cases} A_x &= A_{x0} + A_{vx}t \\ A_y &= A_{y0} + A_{vy}t \\ \end{cases} \\ \begin{cases} B_x &= B_{x0} + B_{vx}t \\ B_y &= B_{y0} + B_{vy}t \\ \end{cases}


\begin{aligned} ([A_{x0} + A_{vx}t] - [B_{x0} + B_{vx}t])^2 + ([A_{y0} + A_{vy}t] - [B_{y0} + B_{vy}t])^2 &= (A_r + B_r)^2 \\ ([A_{x0} - B_{x0}] + [A_{vx}-B_{vx}]t])^2 + ([A_{y0} - B_{y0}] + [A_{vy}-B_{vy}]t)^2 &= (A_r + B_r)^2 \\ \end{aligned}


\begin{aligned} &(A_{x0} - B_{x0})^2 + 2*(A_{x0} - B_{x0})(A_{vx}-B_{vx})t + (A_{vx}-B_{vx})^2 t^2 \\ + &(A_{y0} - B_{y0})^2 + 2*(A_{y0} - B_{y0})(A_{vy}-B_{vy})t + (A_{vy}-B_{vy})^2 t^2\\ - &(A_r + B_r)^2 \\= &0 \end{aligned}


\begin{cases} \alpha &= (A_{vx}-B_{vx})^2 + (A_{vy}-B_{vy})^2 \\ \beta &= 2[(A_{x0} - B_{x0})(A_{vx}-B_{vx}) + (A_{y0} - B_{y0})(A_{vy}-B_{vy})]\\ \gamma &= (A_{x0} - B_{x0})^2 + (A_{y0} - B_{y0})^2 - (A_r + B_r)^2 \end{cases}

Solving for t with the quadratic formula gives two solutions.

\begin{aligned} t = \frac{-\beta \pm \sqrt{\beta^2 - 4\alpha\gamma}}{2\alpha} \end{aligned}

The solution that is the smallest positive value gives the time at which the soonest collision occurs.
The solution that is the largest positive value gives the time at which another collision would occur should the circles pass through one another, rather than bounce away from each other.
If the two circles are not going to collide, t will either be a complex number or infinity.

How should the two circles be deflected away from one another?


The component of the circle’s velocity vector that is perpendicular to an edge connecting the center of the two circles should remain constant, while the component of the circle’s velocity vector that is parallel to the edge connecting the centers should be the result of a one-dimensional elastic collision.


Let a vector, e, extend from the center of circle A, to the center of circle B.


Obtain the projection, A_{pi}, of A’s velocity vector on vector e.


Also, obtain the projection, B_{pi}, of B’s velocity vector onto vector e.


In an elastic collision, where circles have mass, A_m and B_m the final velocities will have projections onto edge, e,

\begin{cases} A_{pf} &= \frac{ 2B_m B_{pi} + A_{pi}(A_m - B_m)}{A_m + B_m} \\ B_{pf} &= \frac{ 2A_m A_{pi} + A_{pi}(B_m - A_m)}{A_m + B_m} \end{cases}

Now Circle A’s velocity after deflection can be described with the system,

\begin{cases} \frac{A_{vf} \circ e}{|e|} &= \frac{A_{vi} \circ e}{|e|} \\ \frac{A_{vf} \cdot e}{|e|} &= A_{pf} \end{cases}


\begin{cases} A_{vfx}e_y - A_{vfy}e_x &= A_{vix}e_y - A_{viy}e_x\\ A_{vfx}e_x + A_{vfy}e_y &= A_{pf}|e| \end{cases}

Again, the coefficient matrix of this system has inverse,

\frac{1}{|e|^2} \begin{bmatrix} e_y & e_x\\ -e_x & e_y \end{bmatrix}

multiplying both sides gives

\begin{cases} A_{vfx} &= \frac{e_y(A_{vix}e_y - A_{viy}e_x) + e_x(A_{pf}|e|)}{e_x^2 + e_y^2}\\ A_{vfy} &= \frac{-e_x(A_{vix}e_y - A_{viy}e_x) + e_y(A_{pf}|e|)}{e_x^2 + e_y^2} \end{cases}


\begin{cases} A_{vfx} &= \frac{e_y(A_{vi} \circ e) + e_x(A_{pf}|e|)}{|e|^2}\\ A_{vfy} &= \frac{-e_x(A_{vi} \circ e) + e_y(A_{pf}|e|)}{|e|^2} \end{cases}

Use the same process to determine the velocity of the other circle after deflection.

Parametric Lines
A useful parameterization gives points along a line defined by endpoints (x_1, y_1) and (x_2, y_2)

\begin{aligned} x(t) &= x_1 + t(x_2 - x_1)\\ y(t) &= y_1 + t(y_2 - y_1) \end{aligned}

This parameterization has the property that for values 0 \leq t \leq 1, (x(t), y(t)) is on the line segment, between the two endpoints. On the other hand, if t is less than zero, or t is greater than one, (x(t), y(t)) is outside of the two endpoints.

Below are points (from left to right) for t = -0.5 in red, t = 0, 0.25, 0.5, 0.75, and 1 in green, and for t = 1.5 in red again:

At what point do two line segments intersect?


Lines A and B intersect for parametric values s and t, such that

\begin{aligned} A_x(s) &= B_x(t)\\ A_y(s) &= B_y(t) \end{aligned}

I’ll use the parametric equation of lines A and B and solve for the values s and t for which the equality is satisfied.

\begin{aligned} A_{x1} + s(A_{x2} - A_{x1}) &= B_{x1} + t(B_{x2} - B_{x1})\\ A_{y1} + s(A_{y2} - A_{y1}) &= B_{y1} + t(B_{y2} - B_{y1}) \end{aligned}


\begin{aligned} s(A_{x2} - A_{x1}) - t(B_{x2} - B_{x1}) &= B_{x1} - A_{x1}\\ s(A_{y2} - A_{y1}) - t(B_{y2} - B_{y1}) &= B_{y1} - A_{y1} \end{aligned}

In matrix form

\begin{bmatrix} (A_{x2} - A_{x1}) & -(B_{x2} - B_{x1}) \\ (A_{y2} - A_{y1}) & -(B_{y2} - B_{y1}) \end{bmatrix} \begin{bmatrix} s\\ t \end{bmatrix} = \begin{bmatrix} B_{x1} - A_{x1}\\ B_{y1} - A_{y1} \end{bmatrix}

If I define vectors

\begin{aligned} a &= ( A_{x2} - A_{x1}, A_{y2} - A_{y1} )\\ b &= ( B_{x2} - B_{x1}, B_{y2} - B_{y1} )\\ c &= ( B_{x1} - A_{x1}, B_{y1} - A_{y1} ) \end{aligned}

The solutions can be expressed as simple ratios of determinants.

\begin{array}{cc} s = \frac{b \circ c}{b \circ a} & t = \frac{a \circ c}{b \circ a} \end{array}

If the line segments do intersect, they do at point

A(\frac{b \circ c}{b \circ a}) = B(\frac{a \circ c}{b \circ a})

If either s or t are not in the range of 0 to 1, then the line segments do not cross. If the line segments do not cross because they run parallel to one another, the determinant a \circ b must equal zero.

How can I tell if an object is enclosed within a region?


If, by protocol, regions are polygons defined with counter-clockwise directed edges, an object inside the region is then oriented counter-clockwise with respect to all the edges that define the region, whereas an object that is outside of the region must be oriented clockwise with at least one of the region’s edges.

With the operation defined earlier, it is possible to determine this property. While a directed edges aren’t vectors, the coordinates of the involved objects can be transformed to the origin and thought of as such.

bool aIsCounterClockwiseToB(point a, edge b) {

    int u_x <- b.x2 - b.x1
    int u_y <- b.y2 - b.y1
    int v_x <- a.x - b.x1
    int v_y <- a.y - b.y1
    int A <- u_x*v_y - u_y*v_x     return A > 0
bool aIsWithinRegionB(point a, edge[] b) {

    bool isInRegion <- true
    for edge in b {
        if !aIsCounterClockwiseToB(a, edge) {
            isInRegion <- false

    return isInRegion

Of course, regions could be defined with clockwise oriented edges and the reverse of the above algorithms could determine if an object is enclosed or not, just as well. But note that this algorithm will only work for convex polygonal regions. In order to determine if a point is inside of a concave polygonal region, it would have to first be broken down into several convex polygonal regions.

More to follow…

Code Optimization: Computer Brains vs. People Brains


This project consists of two components which you can view/download from my Github repository:

The Octave/Matlab component contains two .m script files. They generate the data that the other project component relies on. I created the Octave/Matlab .m files using Octave, a free, open source alternative to Matlab. Though Octave is designed to be compatible with Matlab, it sometimes fails to be so.

The C++ component relies on special (SSE3) instructions that are not available on all machine architectures. The code should compile and run as expected on Intel or AMD CPUs from 2010 or newer. Running the code on older machines will result in errors, or worse, unexpected behavior.

I had been leafing through some detailed documentation on assembly language instructions (that’s my idea of a good time, I guess…) when I noticed that the floating point trigonometric instructions, fsin, fcos, fsincos, fptan, and fpatan all incur latencies that are considerably larger than do other instructions. This appears to be the case for every machine architecture described in the documentation. Why? And, by the way, how do computers calculate sin(x) and other such functions?

The above linked to stackoverflow question leads down a pretty deep rabbit hole; the topic is apparently a controversial and interesting one to more than just myself. To summarize some of the responses from the stackoverflow page, CPUs have hardware implementations of the CORDIC algorithm for such computations. This is what is used to evaluate the fsin, fcos, etc. assembly instructions–the ones that incur so much latency. And as someone else has uncovered, this algorithm has potential issues with accuracy, as well. But that’s only half the story. Compilers (mostly) won’t map your high level language sin(x) instruction to the fsin assembly instruction, but will instead map it to an elaborate software algorithm that evaluates the function with a truncated Taylor series, or Chebyshev polynomial.

This inspired me to try my own hand at a sin(x) approximation algorithm. I wondered if I could write an algorithm that was faster than the CPU or compiler software implementations at the expense of a reasonable degree of accuracy. Also, I opted to restrict the domain of my approximation to the interval, (0,\pi), since other values may be inferred by the function’s symmetries (actually, (0,\frac{\pi}{2}) would suffice). I felt that, while involving the least amount of work on my behalf, a Taylor Series approximation would not yield a good accuracy at a low enough order of truncation. A continuous least squares method will yield a polynomial approximation that minimizes the total error and take a reasonable amount of work. On the other hand, a Chebyshev polynomial approximation would minimize the maximum error of the approximation, but this would involve the most work on my behalf. And so, a continuous least squares polynomial approximation seemed to me the goldilocks-zone method for a project of this scope.


A continuous least squares polynomial approximation begins with a polynomial of degree n, with unknown coefficients,C_k,

P_n(x) = C_n x^n + C_{n-1} x^{n-1} + \dots + C_1 x + C_0

the function it is to approximate, f(x), and the domain, (a,b), over which it is to do so. The objective is to find the particular coefficients, C_k, which minimize the error of the approximation. The error of the approximation is measured as the square of the difference between the polynomial, and the function it is to approximate over the interval (a,b).

\begin{aligned} E &= \int_a^b (f(x) - P_n{x})^2 dx \\ &= \int_a^b (f(x))^2 dx - 2\int_a^b f(x)P_n(x) dx + \int_a^b (P_n(x))^2 dx \end{aligned}

The error function is optimized by taking a partial derivative with respect to each coefficient and solving when equal to zero

\frac{\partial E}{\partial C_{k}} = -2 \int_a^b x^k f(x) dx + 2 \int_a^b x^k P_n(x) dx = 0

This yields an equation for each coefficient, C_k,

\int_a^b x^k P_n(x) dx = \int_a^b x^k f(x) dx


\begin{bmatrix} \int_a^b x^{2n} dx & \int_a^b x^{2n-1} dx & \dots & \int_a^b x^{n+1} dx & \int_a^b x^n dx \\ \int_a^b x^{2n-1} dx & \int_a^b x^{2n-2} dx & \dots & \int a^b x^n dx & \int a^b x^{n-1} dx \\ \vdots & & \ddots & & \vdots \\ \int_a^b x^{n+1} dx & \int_a^b x^n dx & \dots & \int_a^b x^2 dx & \int_a^b x dx \\ \int_a^b x^n dx & \int_a^b x^{n-1} dx & \dots & \int_a^b x dx & \int_a^b 1 dx \end{bmatrix} \begin{bmatrix} C_n \\ C_{n-1} \\ \vdots \\ C_1 \\ C_0 \end{bmatrix} = \begin{bmatrix} \int_a^b x^n f(x) dx \\ \int_a^b x^{n-1} f(x) dx \\ \vdots \\ \int_a^b x f(x) dx \\ \int_a^b f(x) dx \end{bmatrix}

For this project, the range in question is (0, \pi). This gives matrix entries of the form

\int_0^{\pi} x^p dx = \frac{\pi^p}{p}

The vector on the right hand side has entries of the form

\int_0^{\pi} x^p sin(x) dx

While these integrals can be solved with the product rule without any difficulty, they are tedious. I used Wolfram Alpha to get their closed form solutions. You can refer to my Octave sinApproxData.m source file to view them.

I solved systems for least square approximating polynomials of degrees two through eight. Here are my results:

\begin{array}{|c|c|} \hline & P_2(x) \\ \hline C_0 & -0.0504654977784461\\ \hline C_1 & 1.3122362048324483\\ \hline C_2 & -0.4176977570064662\\ \hline \end{array}

\begin{array}{|c|c|} \hline & P_3(x) \\ \hline C_0 & -5.04654977784000e-002\\ \hline C_1 & 1.31223620483227e+000\\ \hline C_2 & -4.17697757006328e-001\\ \hline C_3 & -2.92305840062896e-014\\ \hline \end{array}

\begin{array}{|c|c|} \hline & P_4(x) \\ \hline C_0 & 0.00131345589768425\\ \hline C_1 & 0.98260114780498042\\ \hline C_2 & 0.05446968167436379\\ \hline C_3 & -0.23379309903632944\\ \hline C_4 & 0.03720932737240900\\ \hline \end{array}

\begin{array}{|c|c|} \hline & P_5(x) \\ \hline C_0 & 1.31345589736404e-003\\ \hline C_1 & 9.82601147808036e-001\\ \hline C_2 & 5.44696816675576e-002\\ \hline C_3 & -2.33793099030552e-001\\ \hline C_4 & 3.72093273703402e-002\\ \hline C_5 & 2.63408359623400e-013\\ \hline \end{array}

\begin{array}{|c|c|} \hline & P_6(x) \\ \hline C_0 & -1.70004824988927e-005\\ \hline C_1 & 1.00038803940859e+000\\ \hline C_2 & -2.14775276097336e-003\\ \hline C_3 & -1.61705542577131e-001\\ \hline C_4 & -5.81476368125425e-003\\ \hline C_5 & 1.20515943047020e-002\\ \hline C_6 & -1.27871387060836e-003\\ \hline \end{array}

\begin{array}{|c|c|} \hline & P_7(x) \\ \hline C_0 & -1.70004984191225e-005\\ \hline C_1 & 1.00038803968086e+000\\ \hline C_2 & -2.14775389497628e-003\\ \hline C_3 & -1.61705540620288e-001\\ \hline C_4 & -5.81476536069728e-003\\ \hline C_5 & 1.20515950620807e-002\\ \hline C_6 & -1.27871404236276e-003\\ \hline C_7 & 1.54399345600267e-011\\ \hline \end{array}

\begin{array}{|c|c|} \hline & P_8(x) \\ \hline C_0 & 1.31570823227546e-007\\ \hline C_1 & 9.99995401622597e-001\\ \hline C_2 & 3.94059931985509e-005\\ \hline C_3 & -1.66810967472702e-001\\ \hline C_4 & 2.79388739405372e-004\\ \hline C_5 & 8.01674986535062e-003\\ \hline C_6 & 2.19672210664518e-004\\ \hline C_7 & -2.92010889608420e-004\\ \hline C_8 & 2.32374889636375e-005\\ \hline \end{array}

I noticed right away that the polynomials of odd degree are almost identical to their predecessors; their leading coefficients are nearly zero. This result surprised me. I at first speculated that this is to do with the Taylor series expansion of sin(x), since every other coefficient in the expansion is zero. But in its taylor series expansion, it’s the even coefficients that are zero, not the odd ones.

sin(x) = \sum_{n = 0}^{\infty} \frac{(-1)^n}{(2n+1)!}x^{2n+1}

On the other hand, maybe its just that sin(x) on (0,x) is entirely concave-down in shape. It wouldn’t make sense for the approximation to alternate its concavity over the interval unless it could change it back once more to concave down, still within the interval. This behavior is possible only with even polynomials.

Here are some plots I made in octave of each approximating polynomial that has even degree, along with its squared difference from sin(x).

E_n(x) = (sin(x) - P_n(x))^2

Since the polynomials of odd degree offer no improvement, their plots are omitted, however they can be found in the github repository associated with this project.




It’s clear that my chosen least squares method results in a lot of error near the boundaries of the approximation interval. While a least squares method minimizes the total error of the approximation, a Chebyshev polynomial approximation reduces error near the interval boundaries in exchange for a small increase in total error. In this regard, a Chebyshev polynomial approximation is probably a superior choice for practical applications.

I calculated the root mean square error of each approximating polynomial to get an idea of how accurate I can expect it to be.

rmsE_n = \sqrt{\frac{\int_0^{\pi} (sin(x)-P_n(x))^2 dx}{\pi}}

\begin{array}{|c|c|} \hline rmsE_2 & 1.72635958108317e-002\\ \hline rmsE_3 & 1.72635958108316e-002\\ \hline rmsE_4 & 3.69025604229967e-004\\ \hline rmsE_5 & 3.69025604229946e-004\\ \hline rmsE_6 & 4.15526427715294e-006\\ \hline rmsE_7 & 4.15526427714237e-006\\ \hline rmsE_8 & 2.88859802322048e-008\\ \hline \end{array}


When it comes to evaluating polynomials, in general, Horner’s method is the most computationally efficient.

\begin{aligned} P_n(x) &= C_0 + C_1x + C_2x^2 + \dots + C_{n-1}x^{n-1} + C_n x^n \\ &= C_0 + x(C_1 + x(C_2 + x(\dots x(C_{n-1} + xC_n)))) \end{aligned}

Straightforward evaluation of a polynomial of degree n requires n additions, n products, and n-1 exponentiations, whereas Horner’s method requires n additions, n products, and no exponentiation.

Breaking the process into partial steps yields the recurrence relation:

y_k = C_{n-k} + xy_{k-1}

This translates easily into an algortihm. Here is my c++ implementation from this project. Note that in my implementation the coefficients are ordered in reverse.

double hornersMethod(double x, double* c, int length) {

	double y = 0;
	for (int i = 0; i < length; i++) {
		y = c[i] + x*y;

	return y;

But this function is way too slow to compete with the built-in MSVC sin(x) function. First of all, it’s too general; coefficients and the number of them have to be passed to the function each time it is called. To remedy this issue, it’s necessary to make specialized functions for each particular polynomial that have the coefficients defined locally. I only did this for the polynomials of degree 4 and 8. I made another slight improvement with a technique called loop unrolling. Rather than iterating over a loop n times, the code within the loop are simply written n times. Normally this technique is redundant because a compiler will make this sort of optimization on it’s own. Here is my unrolled P_8(x) function:

double unrolledHornerSinP8(double x) {

	__declspec(align(16)) double c8[] = { 2.32374889636375e-005, -2.92010889608420e-004, 2.19672210664518e-004, 8.01674986535062e-003, 2.79388739405372e-004, -1.66810967472702e-001, 3.94059931985509e-005, 9.99995401622597e-001, 1.31570823227546e-007 };
	__declspec(align(16)) double y;

	y = c8[0];
	y = c8[1] + x*y;
	y = c8[2] + x*y;
	y = c8[3] + x*y;
	y = c8[4] + x*y;
	y = c8[5] + x*y;
	y = c8[6] + x*y;
	y = c8[7] + x*y;
	y = c8[8] + x*y;

	return y;

Having conducted some timings at this stage in development, I found that even this approach can’t compete. Then it dawned on me: factor–back to Octave!

\begin{array}{|c|c|} \hline & P_4(x) \\ \hline r_1 & 4.86760270318095678 \\ \hline r_2 & 3.14292946639567994 \\ \hline r_3 & -1.72601004959195414 \\ \hline r_4 & -0.00133681280580084 \\ \hline \end{array}

\begin{array}{|c|c|} \hline & P_8(x) \\ \hline r_1 & 6.559254850158984 + 2.377563907402060i \\ \hline r_2 & 6.559254850158984 - 2.377563907402060i \\ \hline r_3 & 5.895452530035389 + 0.000000000000000i \\ \hline r_4 & 3.141592785174156 + 0.000000000000000i \\ \hline r_5 & -3.417662278854166 + 2.377563613450983i \\ \hline r_6 & -3.417662278854166 - 2.377563613450983i \\ \hline r_7 & -2.753860092985270 + 0.000000000000000i \\ \hline r_8 & -0.000000131571428 + 0.000000000000000i \\ \hline \end{array}

While P_4(x) has four deliciously real roots, P_8(x) has two pairs of complex conjugate roots. To evaluate P_8(x), I expanded the complex roots and expressed them as two irreducibly real quadratics.

\begin{aligned} (x-r_1)(x-r_2) &= x^2 - (r_1 + r_2)x + r_1 r_2 \\ &= x^2 - ([a+bi] + [a-bi])x + [a+bi][a-bi] \\ &= x^2 - 2ax + a^2+b^2 \end{aligned}

Here’s my factored P_8(x) function:

double factoredP8Sin(double x) {

	__declspec(align(16)) double r4[] = { 5.895452530035389, 3.141592785174156, -2.753860092985270, -0.000000131571428 };
	__declspec(align(16)) double q1[] = { -13.1185097003180, 48.6766343231151 };
	__declspec(align(16)) double q2[] = { 6.83532455770833, 17.3332241883087 };
	__declspec(align(16)) double r8Magnitude = 2.32374889636375e-005;

	return r8Magnitude*(x - r4[0])*(x - r4[1])*(x - r4[2])*(x - r4[3])*(x*x + q1[0] * x + q1[1])*(x*x + q2[0] * x + q2[1]);

While the public intrigue and controversy over the CPU trigonometric functions and their various compiler software alternatives that I stumbled upon initially got me thinking about this project, I decided to go ahead with it because I wanted an excuse to tinker around with SIMD (Single Instruction Multiple Data) assembly instructions. I had originally intended to write full on inline assembly routines, but I didn’t have the patience to figure out how to use some of the instructions necessary to do so (I found the shuffle instructions rather complex and tedious). Visual Studio provides an alternative means of accessing these instructions that allowed me to skirt around my lack of know how: intrinsics. As an afterthought, had I gone for full on inline assembly implementations of these algorithms, I probably would have gotten poor results in their timings; the MSVS compiler won’t optimize code within asm blocks, but it can optimize around intrinsics.

The SIMD instructions do as the acronym says; more than one variable can be loaded into a single SIMD register, a mathematical operation is performed with that register and another SIMD register, and the result is that operation performed on each individual variable within the register. This process is sometimes called vectorization. Since I was using 64-bit double precision variables in this project, there’s room for only two of them per SIMD register, but they could accommodate more than just two variables of lesser bit-width.

There are a ton of these SIMD instructions, but I wound up using just a few of them:

  • _mm_add_pd(u, v)
    • Add packed double precision values in registers u and v.
  • _mm_mul_pd(u, v)
    • Multiply packed double precision values in registers u and v.
  • _mm_hadd_pd(u, v)
    • Add packed doubles in register v together and place in lower half of result register, and add packed doubles in register u together and place in upper half of result register.

I wrote vectorized SIMD alternates of all my c++ functions. Most of these were boring, because they were just straightforward O(1) number crunching procedures. The general Horner’s method algorithm was interesting to write because it was a little more complex to begin with. I devised of splitting the polynomial in half, whereby the even terms of the polynomial are all evaluated in the lower half of a SIMD register, while the odd terms are evaluated in its upper half.

double hornersMethodSIMD(double x, double* c, unsigned int length) {

	__m128d u, v, w;

	u.m128d_f64[0] = x*x;
	u.m128d_f64[1] = x*x;						//u = | x^2 | x^2 |

	w.m128d_f64[0] = c[0];
	w.m128d_f64[1] = c[1];						//w = | c[1] | c[0] |

	for (int i = 2; i < (length - 1); i += 2) {
		v.m128d_f64[0] = c[i];
		v.m128d_f64[1] = c[i + 1];				//v = | c[i+1] | c[i] |
		w = _mm_mul_pd(u, w);					//w = | (x^2)*w[1] | (x^2)*w[0] |
		w = _mm_add_pd(v, w);					//w = | c[i+1] + x*w[1] | c[i] + x*w[0] |

	if ((length % 2) == 0) {
		w.m128d_f64[0] *= x;					//w = | w[1] | x*w[0] |
	else {
		u.m128d_f64[0] = x*x;
		u.m128d_f64[1] = x;						//u = | x | x^2 |
		w = _mm_mul_pd(u, w);					//w = | x*w[1] | (x^2)*w[0] |
		w.m128d_f64[0] += c[length - 1];		//w = | w[1] | c[n] + w[0] |
	w = _mm_hadd_pd(w, w);						//w = | w[0]+w[1] | w[0]+w[1] |

	return w.m128d_f64[0];

I first conducted timings on my computer with the compiler set to debug mode. This prevents the compiler from conducting optimization. Then I conducted the same timings again with the compiler set to release mode. This allows MSVS to optimize.

\begin{array}{|c|c|c|} \hline Debug & ns & error \\ \hline MSVS sin(x) & 29.1142 &0\\ \hline horner Method & 71.4804 &-\\ \hline unrolled P_4(x) & 56.6324 &e-4\\ \hline unrolled P_8(x) & 65.6092 &e-8\\ \hline factored P_4(x) & 43.6212 &e-4\\ \hline factored P_8(x) & 40.2231 &e-8\\ \hline horner Method SIMD & 96.3103 &-\\ \hline unrolled P_4(x) SIMD & 80.6376 &e-4\\ \hline unrolled P_8(x) SIMD & 117.434 &e-8\\ \hline factored P_4(x) SIMD & 62.6635 &e-4\\ \hline factored P_8(x) SIMD & 113.347 &e-8\\ \hline \end{array}

\begin{array}{|c|c|c|} \hline Release & ns & error \\ \hline MSVS sin(x) & 22.5987 & \\ \hline horner Method & 25.1078 &\\ \hline unrolled P_4(x) & 11.9099 &\\ \hline unrolled P_8(x) & 22.1605 &\\ \hline factored P_4(x) & 7.98517 &\\ \hline factored P_8(x) & 12.1009 &\\ \hline horner Method SIMD & 15.8491 &\\ \hline unrolled P_4(x) SIMD & 15.8491 &\\ \hline unrolled P_8(x) SIMD & 20.9332 &\\ \hline factored P_4(x) SIMD & 7.2142 &\\ \hline factored P_8(x) SIMD & 10.5768 &\\ \hline \end{array}

As for algorithm accuracy, I concede that my least square method is probabbly less desireable than what a Chebychev polynomial approximation could offer. Near the boundaries of the approximation, my approximations loose about a digit of accuracy. As for algorithm speed, the optimal code from this project came from both my efforts and those of the MSVS compiler working in tandem. When the compiler is allowed to optimize my c++ code, it actually does so with vectorized SIMD instructions. What it can’t do, however, is recognize the underlying task and design a vectorized algorithm to solve it. At least for now, that’s something only humans brains can do.

Binary and AVL Search Trees


You can view/download my Binary and AVL tree Python implementations from my github repository.

The python file relies on the simple 3rd party library which you can download here.

This project was developed using the 3.x branch of python, though it should also work with python 2.x.

I originally had a more unique project in mind for this post but because the project involves processing large amounts of data I wound up implementing a crude binary search tree to store and sort the data. Dissatisfied with my crude binary search tree, I became interested in developing a more robust one, and eventually, in developing a balanced AVL tree. I decided that the topic is interesting and elaborate enough to make an intermediate project of it, and to put the project I had originally planned off to a later date.

Since an AVL tree is just a binary search tree with additional features, I’ll first describe what a binary search tree is, its advantages, and some of its inner machinery, and then I’ll describe the additional properties and features that make a binary search tree an AVL tree.

From an outside perspective, a binary search tree is just a data container. I can put data into it, check if it already contains a particular data element, remove data from it, and maybe list or iterate over all the data it contains. An advantage of binary trees is that all of these tasks can be done quickly, even when they contain large amounts of data. If I needed to store a large number of objects in some hypothetical container, I could do so very quickly by haphazardly throwing them in, one by one. This creates problems down the line, however. If I later wanted to retrieve some particular object from the container, it could take a very long time to dig through all the contents of the container until I found it. A binary search tree mediates these issues. Due to the internal structure of a binary search tree, slightly more time and care is taken with how objects (data) are placed in it, but in this way, finding or retrieving an object from a binary tree takes less time because its contents are always sorted.

When data is inserted into a binary search tree it is first encapsulated in an intermediate data structure called a node. Along with the data it encapsulates, each node has a unique number assigned to it called a key (actually, the key doesn’t necessarily have to be a number, but has to at least be some object that can be comparatively less than, greater than, or equal to other keys). Keys are often derived from the data via a hash-code, or are some serial number associated with the data. The Node’s key determines how it is sorted, and where it is placed in the binary tree. Though the data placed in a binary tree can be complicated in nature, e.g. image files, financial records, etc., for this project I’m just going to have my data elements be numbers. That way, the data in a node can also function as the key of the node. Also, some binary trees treat instances of duplicates differently; in my implementation, if a data element that is already present in the tree is again inserted into the tree, the node pertaining to that data will track the count of how many such elements have been inserted.

As well as the data and key, a node consists of three edges, named the right, left, and parent edges. The edges of a node connect to other nodes in the binary tree. The data structure is aptly named, as it tends to take on a tree-like quality when multiple nodes are connected together.

Inserting Nodes:

A binary search tree always inserts data beginning at a special node called its root node. If the tree is empty, whatever data is first inserted becomes the root node. Otherwise, the key associated with the new data is compared with the key associated with the root node. If the key associated with the new data is greater than that of the root node, the new data is inserted to the right of the root node, otherwise, if the key associated with the new data is less than that of the root node, it is inserted to the left of the root node. If there’s already a node present to the left or right of the root node, the process is repeated and the the new data is inserted to the right or left of that node, and so on, until an empty space is come upon.

    if (newNode.key > self.key)
        if (self.right == null)
            self.right = newNode
            newNode.parent = self
        if (self.left == null)
            self.left = newNode
            newNode.parent = self


>>> myTree.insert(100)


The number 100, serving as both data and the key of the new node is inserted. Since the tree had been empty, this node becomes the root node of the tree.

>>> myTree.insert(300)


Because 300 is greater than 100, it is inserted to the right of 100.

>>> myTree.insert(200)


Because 200 is greater than 100, it is inserted to the right of 100, but since 200 is less than 300, it is inserted to the left of 300.

>>> myTree.insert(0)


Because 0 is less than 100 it is inserted to the left of 100.

>>> myTree.insert(250)


Because 250 is greater than 100, it is inserted to the right of 100, but since 250 is less than 300, it is inserted to the left of 300, but since 250 is greater than 200, it is inserted to the right of 200.

By convention, computer scientists draw trees upside down. Also note a fundamental property of binary search trees: that for every node in a binary search tree, nodes with smaller key values appear to the left of that node, whereas nodes with greater key values appear to the right of that node. If I list the data in the tree beginning with the left-most node, then the second left-most, then the third, and so on, I end up listing all the data in a sorted order.

Now that I have a nice binary tree to look at, I’ll refer to it to specify some terminology.

  • Node: Nodes encapsulate data and keys and connect to other nodes. In my tree diagrams, they are drawn as circles.
  • Edge: Each node has a left, right and parent edge. In my tree diagrams, they are drawn as lines connecting nodes together.
  • Parent: A node is a parent node to the nodes below it that it has edges to. In the above diagram, 100 is a parent to 0 and 300, 300 is a parent to 200 and 200 is a parent to 250.
  • Child: A node is a child node to the node above it that it has an edge to. In the above diagram, 250 is a child of 200, 200 is a child of 300, both 0 and 300 are children of 100.
  • Left Child: A node is a left child if it is to the left of its parent. In the above diagram, 0 is a left child of 100, and 200 is a left child of 300.
  • Right Child: A node is a right child if it is to the right of its parent. In the above diagram, 300 is a right child of 100, and 250 is a right child of 200.
  • Left Subtree: A left subtree consists of all the nodes that can be reached from the left child of a particular node. In the above diagram, the left subtree of 100 consists of 0, the left subtree of 300 consists of both 200 and 250.
  • Right Subtree: A right subtree consists of all the nodes that can be reached from the right child of a particular node. In the above diagram, the right subtree of 100 consists of 300, 200, and 250. The right subtree of 200 consists only of 250.
  • Leaf Node: A leaf node is a node with no children. In the above diagram, 0 and 250 are leaf nodes. A non-empty binary tree always contains at least one leaf node.
  • Ancestor: A node is an ancestor to another if it is its parent, its parent’s parent, parent’s parent’s parent, etc.. In the above diagram, 200, 300, and 100 are all ancestors of 250, 300 and 100 are ancestors of 200, etc. The root node is an ancestor of every node in the tree.

Searching and Retrieval:


Suppose I would like to determine if the above tree contains a node with the key 76 (of course I know it does because I can see it in the diagram). Similar to the insert algorithm, beginning at the root node, 94 is greater than 76, so 76 would have to be in the left subtree. 67 is less than 76, so it would have to be in the right subtree from there. 79 is greater than 76, so it would have to be in the left subtree from there. 69 is less than 76, and there is 76, to the right of 69. It took five steps to find the node with a key of 76. That is the vertical distance from the root node to the searched for node. In the worst case, it would take as many steps as the height of the tree to find a node with a particular key. Note that the above tree is much wider than it is tall. That’s no coincidence. On average, a random binary tree containing n nodes will have a height of log_2 n .

Just as well, If I search for a node that isn’t in the tree, I can follow the same search pattern. If I haven’t found it after reaching a leaf node, I can conclude that it’s not in the tree.

    if (self == Null)
        return Null
    if (self.key == searchNode.key)
        return self
    else if (self.key > searchNode.key)
        return self.left.find(searchNode)
        return self.right.find(searchNode)


Navigating from an arbitrary node to its successor or predecessor is vital to the functionality of a binary tree. This process serves as a building block for iteration over a series of nodes, and for more robust binary tree functionality, like node removal.

The definition of a node’s successor and predecessor can actually vary. A binary tree has three canonical sorts of recursive tours: pre-order, in-order, and post-order. To each of these sorts of tours, there is a different definition of the next and previous node. I won’t get into the specifics of these tours, but just specify that the definition of successor and predecessor that I’m describing is that of the in-order tour. This pertains to a more expected definition of next and previous: a next node has the key that follows that of the current node in numerical order, and a previous node has the key that precedes it.

Navigating from one node to it’s next or previous node can be confusing; the correct path is dependent on the type of the current node. A node can either be a root node, a left child, or a right child. Furthermore, a node can either be a leaf node, have a left subtree only, a right subtree only, or have both a left and right subtree. This produces 3 \times 4 = 12 possible cases. For the sake of elucidation, I’ll enumerate over each of these cases and describe in words, the path from the current node to it’s next and previous node. If drawn carefully, such that for each node in the binary tree, nodes with smaller key values appear to the left of that node, and those with larger key values appear to its right, a node’s previous node will always appear in the column to the immediate left of the current node, and the next node will always appear in the column to the immediate right of the current node.  In the following diagrams, I’ll always highlight the current node in green, it’s next node in blue, and it’s previous node in red.

Case 1: (Root Node, Leaf Node)
This is the simplest case and I wont bother providing an image. If the current node is the root node, and is also a leaf node, then it must be the only node in the tree and it’s next and previous nodes are both Null.

\begin{array}{ccl} & \to & \text{Null} \\ currentNode.prev() & \to & \text{Null} \end{array}

Case 2: (Root Node, Left Subtree Only)


\begin{array}{ccl} & \to & \text{Null} \\ currentNode.prev() & \to & \text{rightmost node in left subtree} \end{array}

Case 3: (Root Node, Right Subtree Only)


\begin{array}{ccl} & \to & \text{leftmost node in the right subtree} \\ currentNode.prev() & \to & \text{Null} \end{array}

Case 4: (Root Node, Both Left and Right Subtrees)


\begin{array}{ccl} & \to & \text{leftmost node in the right subtree} \\ currentNode.prev() & \to & \text{rightmost node in the left subtree} \end{array}

Case 5: (Left Child, Leaf Node)


\begin{array}{ccl} & \to & \text{parent} \\ currentNode.prev() & \to & \text{parent of nearest ancestor that is a right child} \end{array}

Case 6: (Left Child, Left Subtree Only)


\begin{array}{ccl} & \to & \text{parent} \\ currentNode.prev() & \to & \text{rightmost node in the left subtree} \end{array}

Case 7: (Left Child, Right Subtree Only)


\begin{array}{ccl} & \to & \text{leftmost node in right subtree} \\ currentNode.prev() & \to & \text{parent of nearest ancestor that is a right child} \end{array}

Case 8: (Left Child, Both Left and Right Subtree)


\begin{array}{ccl} & \to & \text{leftmost node in right subtree} \\ currentNode.prev() & \to & \text{rightmost node in left subtree} \end{array}

Case 9: (Right Node, Leaf Node)


\begin{array}{ccl} & \to & \text{parent of nearest ancestor that is a left child} \\ currentNode.prev() & \to & \text{parent} \end{array}

Case 10: (Right Child, Left Subtree Only)


\begin{array}{ccl} & \to & \text{parent of nearest ancestor that is a left child} \\ currentNode.prev() & \to & \text{rightmost node in left subtree} \end{array}

Case 11: (Right Child, Right Subtree Only)


\begin{array}{ccl} & \to & \text{leftmost node in right subtree} \\ currentNode.prev() & \to & \text{parent} \end{array}

Case 12: (Right Child, Both left and Right Subtree)


\begin{array}{ccl} & \to & \text{leftmost node in right subtree} \\ currentNode.prev() & \to & \text{rightmost node in left subtree} \end{array}

Although there are 12 possible types of nodes, next and previous nodes are always reached via one of only 6 paths: the leftmost node in the right subtree, the rightmost node in the left subtree, the nearest ancestor that is a left child, the nearest ancestor that is a right child, the parent, or null. Of these, only four will require some code to accomplish, and they are rather simple to implement. Then implementing a next and previous function, involves only identifying the type of the current node, and returning the node obtained by traversing the appropriate path.

Removing Nodes:

It’s sometimes desirable to remove nodes from a binary tree. The simplest case is to remove a leaf node: set its parent edge to Null, and set its parent’s left or right connecting edge to Null. However, if a non-leaf node is up for removal, it’s necessary to swap in a replacement node from somewhere else in the tree. Care must be taken in selecting replacement nodes so as to maintain a properly sorted binary tree. It so happens that a node’s next or previous node can always be swapped in. In this way, the tree retains the property that for each node in the tree, nodes with larger keys are to its right, while nodes with lesser keys are to its left. To further complicate matters, a node’s next or previous node is not necessarily a leaf node either. This gives rise to a recursive removal process: If a node is a leaf node, then it can simply be removed, otherwise replace with a next or previous node by applying the remove process to it. It’s important to always select a next or previous node that appears further down in the tree. That way, the process is guaranteed to eventually reach a leaf node and the recursion terminates. Since a non leaf node will always have a next or previous node that appears further down the tree, this algorithm always works.

    if (self.isLeaf())
        if (self.isLeftChild())
            self.parent.left = null
        if (self.isRightChild())
            self.parent.right = null
        if (self.left != null) and (self.right != null)
            if (self.prev().isLeaf())
            if (self.right != null)
    return self


Initial Tree:


>>> myTree.remove(46)


Since the node with the key 46 was a leaf node, removal is straightforward.

>>> myTree.remove(83)


The Node with key 83 was not a leaf node, however its previous node, 81, was a leaf node. That node was removed from its previous location, and swapped in to where 83 had been.

>>> myTree.remove(27)


Neither of 27’s previous or next nodes were leaf nodes. This removal involved swapping 42 in for 33, which was swapped in for 28, which was finally swapped in for 27.

AVL Trees:

The number of steps involved in binary tree operations such as insertion, searching, and removal all depend on the height of the tree. As stated earlier, the height of a random binary tree containing n nodes is O(log_2{n}), on average, and thus, the previously mentioned operations will, on average, run in O(log_2{n}) time. Average tree heights, however are not guaranteed. In the worst case, a binary tree can end up with it’s height equal to the number of nodes it contains. This would occur if data that is already sorted is inserted into a binary tree. An AVL tree (inventors Adelson-Velsky and Landis) is a self-balancing tree. It maintains, in addition to properties of regular binary trees, the property that for each node in the tree,  the heights of its left and right subtrees differ by at most one. This property is enforced by checking the tree for infractions whenever a node is inserted or removed. If a node is found to be in violation, the tree is restructured around that node with a maneuver called a tree rotation. These rotations restore the height balance property where preformed.

leftleft rotate() \to   leftleftavl

A rotation rearranges three nodes, the node at which the height violation is found, its child with the greater height, and its child’s child with the greater height, and any subtrees of these three nodes. Note that in the above image on the left, the node with key 300 is in violation of the height balance rule. The height of its left subtree is two, while the height of its right subtree is zero.

Infractions can involve nodes in four possible configurations: left-left (as depicted above, the node at which the infraction occurs, it’s left child, and it’s left child’s left child), left-right, right-right, and right-left. No matter the initial configuration, the infraction is always corrected by making the middle node the highest node, the left-most node its left child, and the right-most node its right child. If any of these nodes have subtrees, they are reattached to the child nodes in the new configuration in the same left to right order as they had been in before the rotation.

I used my,, and Python files to generate the trees appearing in this post. Depicted below are two trees, the first, a binary search tree, and the second, an AVL tree. They were generated by inserting the same numbers in both trees, in the same order with the following Python script:

>>> from drawTree import *
>>> a = AVLTree()
>>> b = BTree()
>>> for i in range(0,25):
>>>      n = random.randint(0,100)
>>>      a.insert(n)
>>>      b.insert(n)
>>> drawTree(a, "AVL Tree")
>>> drawTree(b, "Binary Tree")



The Binary tree has a height of 8 while the AVL tree has a height of only 5. You can observe that for every node in the AVL tree, the height of its left and right subtrees differ by at most one.

Why Does Bresenham’s Line Drawing Algorithm Work?

I created a C# implementation of Bresenham’s line drawing algorithm in Visual Studio. You can view/download the source code from my github.


Given two integer endpoint coordinates, Bresenham’s line drawing algorithm determines what other integer coordinates best approximate the ideal line segment that they define.

Here is the (modified) pseudocode from Wikipedia:

plotLine(x0,y0, x1,y1)
    dx = x1 - x0
    dy = y1 - y0
    D = 2*dy - dx
    y = y0

    for x from x0 to x1
        if D > 0
            y = y + 1
            D = D - 2*dx
        end if
        D = D + 2*dy

Remarkably, this algorithm uses only integer arithmetic. Earlier computers motivated this property because non whole number operations, like division, were then relatively slow to compute.

The algorithm encodes the following intuitive logic:

  • Provided I am currently on a pixel that best approximates the ideal line, I should proceed to the adjacent pixel that also best approximates the ideal line. That pixel is either straight ahead, or diagonal from the current pixel.
  • By repeating this logic I can always begin at one endpoint and, selecting the pixel that best approximates the ideal line at each step, arrive at the other endpoint.
  • Since the pixels that best approximate the ideal line are always selected at each step, the entire line approximates the ideal line segment as best as possible.


But from looking at the code, that’s not at all apparent. So why does Bresenham’s line drawing algorithm work?


Endpoints, (x_0,y_0) and (x_1,y_1) define the ideal line:

\begin{aligned} y(x)&=mx+b \\ m &= \frac{(y_1-y_0)}{(x_1-x_0)} = \frac{\Delta y}{\Delta x} \end{aligned}

Denote the current coordinate:


To measure \alpha_i, the error in proceeding from the current coordinate the diagonal coordinate, and \beta_i, the error in proceeding instead to the adjacent coordinate:

\begin{aligned} \alpha_i &= y_i+1-y(x_i+1) \\ &= y_i+1-m(x_i+1)-b \\ \beta_i &= y(x_i+1)-y_i \\ &= m(x_i+1)+b-y_i \end{aligned}

If the error in proceeding to the diagonal coordinate is less than the error in proceeding to the adjacent coordinate, then I should proceed to the diagonal coordinate, otherwise I should proceed to the adjacent coordinate.

\begin{aligned} \alpha_i < \beta_i &\implies diagonal \\ \alpha_i \geq \beta_i &\implies adjacent \end{aligned}

If I instead take the difference of the two errors,

\begin{aligned} \beta_i - \alpha_i = &m(x_i+1)+b-y_i \\ +&m(x_i+1)+b-y_i-1 \\ = &2m(x_i+1)+2b-2y_i-1 \\ = &2mx_i+2m+2b-2y_i-1 \end{aligned}

I can follow the equivalent logic:

\begin{aligned} \beta_i - \alpha_i < 0 &\implies adjacent \\ \beta_i -\alpha_i \geq 0 &\implies diagonal \end{aligned}

Since the slope of the line, m, is a rational number, the decision still can not be made using just integer arithmetic.

\begin{aligned} \beta_i - \alpha_i &= 2mx_i+2m+2b-2y_i-1 \\ &= 2[\frac{\Delta y}{\Delta x}]x_i+2[\frac{\Delta y}{\Delta x}]+2b-2y_i-1\end{aligned}

But I can multiply the equality by \Delta x

\Delta x(\beta_i - \alpha_i) = 2\Delta y x_i + 2\Delta y + 2\Delta x b - 2 \Delta x y_i - \Delta x

and now follow the equivalent, integer arithmetic only logic:

\begin{aligned} \Delta x(\beta_i - \alpha_i) < 0 &\implies adjacent \\ \Delta x(\beta_i -\alpha_i) \geq 0 &\implies diagonal \end{aligned}

I’ll give this expression a name:

\begin{aligned} D_i &= \Delta x(\beta_i - \alpha_i) \\ &= 2\Delta y x_i + 2\Delta y + 2\Delta x b - 2 \Delta x y_i - \Delta x \end{aligned}

But rather than making all these calculations at every iteration of the algorithm, observe that the difference between successive terms is

\begin{aligned} D_{i+1} - D_i &= 2\Delta y x_{i+1} + 2\Delta y + 2\Delta x b - 2 \Delta x y_{i+1} - \Delta x \\ &-(2\Delta y x_i + 2\Delta y + 2\Delta x b - 2 \Delta x y_i - \Delta x) \\ &= 2\Delta y(x_{i+1} - x_i) - 2 \Delta x(y_{i+1} - y_i) \\ &= 2\Delta y - 2 \Delta x(y_{i+1} - y_i)\end{aligned}

Then I only have to track changes to this quantity at each iteration. This requires fewer calculations. Since at every iteration, the x-coordinate increases by one, x_{i+1} - x_i is always one. Since the y-coordinate increases by one only when D_i \geq 0

\begin{cases} & D_i < 0 \implies D_{i+1} = D_i + 2\Delta y \\ &D_i \geq 0 \implies D_{i+1} = D_i + 2\Delta y - 2 \Delta x \end{cases}

So when D is less than zero, add 2\Delta y to D and keep the y-coordinate the same. Otherwise, if D is greater than zero, add 2\Delta y - 2 \Delta x to D and increase the y-coordinate by one.
Now all that remains is to find the initial value, D_0:

\begin{aligned} D_0 &= 2\Delta y x_0 + 2\Delta y + 2\Delta x b - 2 \Delta x y_0 - \Delta x \\ &= 2\Delta y x_0 + 2\Delta y + 2\Delta x [y_0 - \frac{\Delta y}{\Delta x} x_0] - 2 \Delta x y_0 - \Delta x \\ &= 2 \Delta y - \Delta x \end{aligned}

And that’s all there is to it!

Counting in N-Dimensions


Using the Eclipse IDE, I have created a Java project that implements what is discussed in this post. You can view or download it from my github repository.

For convenience, this article includes the number 0 in the set of natural numbers, \mathbb{N}, which by convention usually does not include the number 0.

Definition: A set S is a countable set if and only if there exists a bijective mapping f: S \to \mathbb{N}, where \mathbb{N} is the set of natural numbers
(i.e. for any pair of elements, a and b in S for which a \neq b, f(a) \neq f(b), and for any element, n in \mathbb{N}, there exists an element c in S, such that f(c) = n).

\begin{array}{c |ccccc} \mathbb{N}^2 & 0 & 1 & 2 & 3 & \dots \\ \hline 0 & (0,0) & (0,1) & (0,2) & (0,3) \\ 1 & (1,0) & (1,1) & (1,2) & (1,3) \\ 2 & (2,0) & (2,1) & (2,2) & (2,3) \\ 3 & (3,0) & (3,1) & (3,2) & (3,3) \\ \vdots & & & & & \ddots \end{array}

By creating a table with rows and columns both labeled by the natural numbers in ascending order, and with entries consisting of these ordered row and height number pairs, each entry can be counted while traversing along the diagonals of the table in such a manner as to yield the following sequence listing the elements of \mathbb{N}^2:

\mathbb{N}^2 = \{(0,0), (1,0), (0,1),(2,0),(1,1),(0,2),(3,0),(2,1), ...\}

The reason for traversing along the diagonals of the table instead of its rows or columns is that, although they grow larger as the process continues, the diagonals are always finite in length. If I needed to count up to a particular ordered pair, I could always reach it in a finite number of steps by counting along the table’s diagonals. On the other hand, the rows and columns of the table are infinite in length. Were I to count along them, I could never get to an element occurring in a subsequent row or column.

Clearly, every possible ordered pair of natural numbers occurs in this sequence exactly once. Since there are an infinite number of possible ordered pairs of natural numbers, the sequence must be infinite in length. In this way, the set \mathbb{N}^2 satisfies the definition of a countable set (a similar argument demonstrates that \mathbb{Q}, the set of rational numbers, is also a countable set).

Repeating this process, the elements of \mathbb{N}^{3} can be counted by constructing a table with rows in \mathbb{N} and with columns in \mathbb{N}^{2}

\begin{array}{c |ccccc} \mathbb{N}^3 & \mathbb{N}^2_{0} & \mathbb{N}^2_{1} & \mathbb{N}^2_{2} & \mathbb{N}^2_{3} & \dots \\ \hline 0 & (0,\mathbb{N}^2_{0}) & (0,\mathbb{N}^2_{1}) & (0,\mathbb{N}^2_{2}) & (0,\mathbb{N}^2_{3}) \\ 1 & (1,\mathbb{N}^2_{0}) & (1,\mathbb{N}^2_{1}) & (1,\mathbb{N}^2_{2}) & (1,\mathbb{N}^2_{3}) \\ 2 & (2,\mathbb{N}^2_{0}) & (2,\mathbb{N}^2_{1}) & (2,\mathbb{N}^2_{2}) & (2,\mathbb{N}^2_{3}) \\ 3 & (3,\mathbb{N}^2_{0}) & (3,\mathbb{N}^2_{1}) & (3,\mathbb{N}^2_{2}) & (3,\mathbb{N}^2_{3}) \\ \vdots & & & & & \ddots \end{array} = \begin{array}{c |ccccc} \mathbb{N}^3 & (0,0) & (1,0) & (0,1) & (2,0) & \dots \\ \hline 0 & (0,0,0) & (0,1,0) & (0,0,1) & (0,2,0) \\ 1 & (1,0,0) & (1,1,0) & (1,0,1) & (1,2,0) \\ 2 & (2,0,0) & (2,1,0) & (2,0,1) & (2,2,0) \\ 3 & (3,0,0) & (3,1,0) & (3,0,1) & (3,2,0) \\ \vdots & & & & & \ddots \end{array}

which gives the sequence of elements in \mathbb{N}^3:

\mathbb{N}^3 = \{(0,0,0), (1,0,0), (0,1,0),(2,0,0),(1,1,0),(0,0,1),(3,0,0),(2,1,0), ...\}

This sequence can in turn be crossed with \mathbb{N} once more to count the elements of \mathbb{N}^{4}, and so on.

If I am given a particular n-tuple, can I find its sequence index number without counting up to it? For instance, at what index in \mathbb{N}^3 does the 3-tuple (13, 5, 7) occur? Obversely, if I am given a particular index number, can I find its corresponding n-tuple? for instance what is the 47th element in the \mathbb{N}^5 sequence?

It is best to first solve the simplest case in \mathbb{N}^2, and then generalize to \mathbb{N}^p

As elements in \mathbb{N}^2 are two-dimensional, this suggests a function of two variables, f: \mathbb{N}\times\mathbb{N} \to \mathbb{N}. As it turns out, it suffices that f be quadratic.

Thus, with as yet unknown constants C_i,

f(x,y) = C_0 x^2 + C_1 y^2 + C_2 xy + C_3 x + C_4 y + C_5

As there are six unknown constants, a system of six equations suffices to determine what they are. We know that (0,0) is the zeroth element in \mathbb{N}^2, (1,0) is the 1st element, (0,1) is the 2nd, and so on…

\begin{array}{c} f(0,2) = C_0 (0)^2 + C_1 (2)^2 + C_2 (0)(2) + C_3 (0) + C_4 (2) + C_5 = 5 \\ f(1,1) = C_0 (1)^2 + C_1 (1)^2 + C_2 (1)(1) + C_3 (1) + C_4 (1) + C_5 = 4 \\ f(2,0) = C_0 (2)^2 + C_1 (0)^2 + C_2 (2)(0) + C_3 (2) + C_4 (0) + C_5 = 3 \\ f(0,1) = C_0 (0)^2 + C_1 (1)^2 + C_2 (0)(1) + C_3 (0) + C_4 (1) + C_5 = 2 \\ f(1,0) = C_0 (1)^2 + C_1 (0)^2 + C_2 (1)(0) + C_3 (1) + C_4 (0) + C_5 = 1 \\ f(0,0) = C_0 (0)^2 + C_1 (0)^2 + C_2 (0)(0) + C_3 (0) + C_4 (0) + C_5 = 0 \\ \end{array}

Solving the system,

\begin{bmatrix}0&4&0&0&2&1\\ 1&1&1&1&1&1\\ 4&0&0&2&0&1\\ 0&1&0&0&1&1\\ 1&0&0&1&0&1\\ 0&0&0&0&0&1 \end{bmatrix} \begin{bmatrix}C_0\\C_1\\C_2\\C_3\\C_4\\C_5\end{bmatrix} = \begin{bmatrix}5\\4\\3\\2\\1\\0\end{bmatrix}

gives C_0 = \frac{1}{2}, C_1 = \frac{1}{2}, C_2 = 1, C_3 = \frac{1}{2}, C_4 = \frac{3}{2}, C_5 = 0.

and so,

\begin{aligned} f(x,y) &= \frac{1}{2}x^2 + \frac{1}{2}y^2 + xy + \frac{1}{2}x + \frac{3}{2}y \\ &= \frac{1}{2}(x+y)(x+y+1)+y \end{aligned}

This is known as the Cantor pairing function.

Not only can this function give the index of a particular 2-tuple, but by composing it recursively, it can give the index of a general n-tuple.

f(x_1,x_2,...,x_n) = f(x_1,f(x_2,f(...,f(x_{n-1},x_n)))

Now I can find the index of (13, 5, 7) in \mathbb{N}^3:

\begin{aligned} f(13,5,7) &= f(13,f(5,7)) \\ &= f(13, \frac{1}{2}(5+7)(5+7+1)+7) \\ &= f(13,85) \\ &= \frac{1}{2}(13+85)(13+85+1)+85 \\ &= 4,936 \end{aligned}

What about the inverse of this function, f^{-1}: \mathbb{N} \to \mathbb{N}^p? Given an index, can I calculate its corresponding n-tuple?

I will first show how to begin with a particular index in \mathbb{N}^2, i, and find the 2-tuple, (x(i),y(i)), that it corresponds to. Observe that the first element of the nth diagonal always begins with (n,0). Now also observe that the nth diagonal of the \mathbb{N}^2 table has n+1 elements. For instance, the 0th diagonal has one element, the 1st diagonal has 2 elements, the 2nd diagonal has 3 elements, and so on. This suggests the following mappings:

\begin{array}{ccccc} f^{-1}(0) & = & f^{-1}(0) & = & (0,0) \\ f^{-1}(0+1) & = & f^{-1}(1) & = & (1,0) \\ f^{-1}(0+1+2) & = & f^{-1}(3) & = & (2,0) \\ f^{-1}(0+1+2+3) & = & f^{-1}(6) & = & (3,0) \\ \vdots & & \vdots & & \vdots \\ f^{-1}(0+1+2+3+...+n) & = & f^{-1}(\frac{n(n+1)}{2}) & = & (n,0) \\ \end{array}

If the ith element is also the \frac{n(n+1)}{2}th element, for some n, then

\begin{array}{c} i = \frac{n(n+1)}{2} \\ \to 0 = n^2 + n - 2i \\ \to n = \frac{ \sqrt{1+8i}-1 }{2} \\ \end{array}

If i is larger than the \frac{n(n+1)}{2}th element, but still in the same diagonal, and hence, at most the \frac{n(n+1)}{2} + nth element

\lfloor\frac{\sqrt{1+8i}-1}{2}\rfloor < n

I can find the x(i) coordinate by substituting for n, this expression into \frac{n(n+1)}{2} + n - i= \frac{n(n+3)}{2} - i. The y(i) coordinate is derived similarly.

\begin{aligned} x(i) &= \frac{\lfloor\frac{\sqrt{1+8i}-1}{2}\rfloor(\lfloor\frac{\sqrt{1+8i}-1}{2}\rfloor+3)}{2} - i \\ y(i) &= i - \frac{\lfloor\frac{\sqrt{1+8i}-1}{2}\rfloor(\lfloor\frac{\sqrt{1+8i}-1}{2}\rfloor+1)}{2}\end{aligned}

I can extend the use of these equations to n-tuples by composing them in the following manner.

\begin{aligned} \mathbb{N}^q_{i} &= \mathbb{N}_{x(i)}\times\mathbb{N}^{q-1}_{y(i)} \\ &=\mathbb{N}_{x(i)}\times\mathbb{N}_{x(y(i))}\times\mathbb{N}^{q-2}_{y(y(i))} \\ &=\mathbb{N}_{x(i)}\times\mathbb{N}_{x(y(i))}\times\mathbb{N}_{x(y(y(i)))}\times\mathbb{N}^{q-3}_{y(y(y(i)))} \\ &= \dots \end{aligned}

Now I can answer the second question, what is the 47th element in the \mathbb{N}^5 sequence?

\begin{aligned} \mathbb{N}^5_{47} &= (\mathbb{N}_{x(47)},\mathbb{N}^{4}_{y(47)}) \\ &=(7,\mathbb{N}^{4}_{2}) \\ &=(7,\mathbb{N}_{x(2)},\mathbb{N}^{3}_{y(2)}) \\ &=(7,0,\mathbb{N}^{3}_{1}) \\ &=(7,0,1, \mathbb{N}^{2}_{0}) \\ &=(7,0,1, \mathbb{N}_{x(0)}, \mathbb{N}_{y(0)}) \\ &=(7,0,1, 0, 0) \end{aligned}

I have created a Java class, RecursiveIntTupleGenerator, that generates these sequences. An instance of this class has a specified number of dimensions, and is initialized to the 0th element of its corresponding sequence. You can view/download the Eclipse project source here.

\begin{tabular}{|l|} \hline RecursiveIntTupleGenerator \\ \hline -x: int \\ -subTuple: RecursiveIntTupleGenerator \\ -savedX: int \\ -dimensions: int \\ \hline +constructor(int numDimensions) \\ +getDimensions() \\ +getVal(int index) \\ +setZero() \\ +next() \\ +toString() \\ ... \\ \hline\end{tabular}

As discussed above, the Cantor pairing function directly associates the natural numbers to the respective natural number n-tuple, and vice versa, but I was also interested in capturing the recursive nature of the n-tuple counting process. This is implemented in the next() method.

	public void next() {
		if (dimensions == 1) {
			x += 1;
		} else {
			if (x > 0) {
				x -= 1;;
			} else {
				savedX += 1;
				x = savedX;

I’d like to expand on this project in the future. It must be possible to iterate over the integers (including negative numbers) in multiple dimensions just as well as it is possible to iterate over the natural numbers. Also, since this procedure may theoretically carry on indefinitely, it would make more sense to use Java’s BigInteger class instead of just integers.

Being able to generate a sequence of every possible positive whole number n-tuple affords an easy, brute-force solution search for multivariate Diophantine equations, and various other problems in number theory. I put my Java class to the test and verified Lagrange’s four-square theorem for the first 100 whole numbers.

\begin{array}{c} 0 = 0^{2} + 0^{2} + 0^{2} + 0^{2} \\ 1 = 1^{2} + 0^{2} + 0^{2} + 0^{2} \\ 2 = 1^{2} + 1^{2} + 0^{2} + 0^{2} \\ 3 = 1^{2} + 1^{2} + 1^{2} + 0^{2} \\ 4 = 2^{2} + 0^{2} + 0^{2} + 0^{2} \\ 5 = 2^{2} + 1^{2} + 0^{2} + 0^{2} \\ \vdots \\ 95 = 9^{2} + 3^{2} + 2^{2} + 1^{2} \\ 96 = 8^{2} + 4^{2} + 4^{2} + 0^{2} \\ 97 = 9^{2} + 4^{2} + 0^{2} + 0^{2} \\ 98 = 9^{2} + 4^{2} + 1^{2} + 0^{2} \\ 99 = 7^{2} + 7^{2} + 1^{2} + 0^{2} \\ \end{array}

I let my program run longer and verified the theorem for numbers up to 5,000. My program took a much, much longer time to verify the solution, 3,072 = 32^{2} + 32^{2} + 32^{2} + 0^{2}  than it did for any other number in this range. It must take many calls of the next() function to find this solution. Starting from (0,0,0,0), how many times does the next() function have to be called in order to find these four squares that add up to 3,072?