Explaining the Normal Equation matrix formula

I’m currently taking Andrew Ng’s Machine Learning course on Coursera, and in the Normal Equation lesson I encountered a matrix formula for computing the regression coefficients $latex \Theta$, with no explanation of how to arrive at it.

There are blogs on the Internet that prove this formula in great detail. I just want to share an easy, handy way to explain the formula and to memorize it.

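So we have X as the design matrix, with m rows (one per training example) and n+1 columns (the n features plus the intercept column), and Y as the output vector of size m. We want to find the vector $latex \Theta$ (of size n+1) so that: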

$latex X \Theta = Y$

All we want to do is isolate $latex \Theta$ on the left side (just like we always isolate x when solving an equation). To do that, we want to “bring” whatever multiplies $latex \Theta$ over to the right side, and we can do that only when that “whatever” is a square (and invertible) matrix (makes sense, right?).

In general X is not a square matrix, so we multiply both sides on the left by its transpose, $latex X^{T}$. Now we have:

$latex X^{T}X\Theta = X^{T}Y$
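A quick size check shows why this helps, assuming the course’s convention of m training examples and n features (so X has m rows and n+1 columns, counting the intercept column):

$latex \underbrace{X^{T}}_{(n+1)\times m}\;\underbrace{X}_{m\times(n+1)} = \underbrace{X^{T}X}_{(n+1)\times(n+1)}$

Whatever m and n are, the product has as many rows as columns.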

Now $latex X^{T}X$ is a square matrix, so (as long as it is invertible) we can “bring” it to the right side:

$latex \Theta = (X^{T}X)^{-1}X^{T}Y$

And that’s it!
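To see the formula at work, here is a minimal sketch in plain Python. It assumes the simplest case of one feature plus an intercept, so that $latex X^{T}X$ is 2×2 and can be inverted by hand. The helper names (`transpose`, `matmul`, `inverse_2x2`, `normal_equation`) and the toy data are made up for illustration and do not come from the course.

```python
# Sketch of Theta = (X^T X)^{-1} X^T Y for one feature plus an intercept.
# Helper names and toy data are hypothetical, chosen only for illustration.

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse_2x2(M):
    """Invert a 2x2 matrix using the closed-form adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("X^T X is singular, so it cannot be inverted")
    return [[d / det, -b / det], [-c / det, a / det]]

def normal_equation(X, Y):
    """Compute Theta = (X^T X)^{-1} X^T Y (X must have exactly 2 columns)."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)   # square: 2 x 2
    XtY = matmul(Xt, Y)   # 2 x 1
    return matmul(inverse_2x2(XtX), XtY)

# Toy data generated from y = 1 + 2x, with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

X = [[1.0, x] for x in xs]   # first column of ones is the intercept term
Y = [[y] for y in ys]        # column vector

theta = normal_equation(X, Y)
print(theta)  # intercept and slope, close to 1 and 2
```

Because the toy data lies exactly on a line, the recovered coefficients match the intercept and slope used to generate it, up to floating-point rounding. With more features you would need a general matrix inverse or a linear solver instead of the 2×2 shortcut used here.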


