Least squares error is used as the cost function in linear regression. But why should one choose squared error instead of absolute error, or some other choice? There is a simple argument showing that least squares error is a reasonable and natural choice.

Assume the target variable and inputs are related as below:

\[ y^{(i)} = \theta^{T}x^{(i)} + \epsilon^{(i)}, \]

where \(\epsilon^{(i)}\sim\mathcal{N}(0,\,\sigma^{2})\) is a zero-mean Gaussian noise term.

i.e.

\[ p(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\epsilon^{(i)})^{2}}{2\sigma^{2}}\right), \]

which implies that

\[ p(y^{(i)}\mid x^{(i)};\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\left(y^{(i)}-\theta^{T}x^{(i)}\right)^{2}}{2\sigma^{2}}\right). \]

We would like to minimize the error by maximizing the log likelihood. Assuming the \(m\) training examples are independent, the likelihood function is

\[ L(\theta) = \prod_{i=1}^{m} p(y^{(i)}\mid x^{(i)};\theta) = \prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\left(y^{(i)}-\theta^{T}x^{(i)}\right)^{2}}{2\sigma^{2}}\right). \]

Maximizing the log likelihood function

\[ \ell(\theta) = \log L(\theta) = m\log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^{2}}\sum_{i=1}^{m}\left(y^{(i)}-\theta^{T}x^{(i)}\right)^{2} \]

is equivalent to minimizing

\[ \frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)}-\theta^{T}x^{(i)}\right)^{2}, \]

which is exactly the least squares cost function; note that \(\sigma^{2}\) is irrelevant to the choice of \(\theta\) here.

Note that the least-squares method corresponds to maximum likelihood estimation. Hence, one can justify the least-squares method with the natural assumption \(\epsilon\sim\mathcal{N}(0,\,\sigma^{2})\).
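This equivalence can also be checked numerically: fit \(\theta\) by ordinary least squares, then verify that any small perturbation of that fit increases the Gaussian negative log likelihood. Below is a minimal sketch; the true parameters, noise level, and sample size are illustrative choices, not anything from the derivation above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 1.0 + 2.0*x + eps, with eps ~ N(0, 0.5^2) (toy values)
n = 200
x = rng.uniform(-3, 3, n)
X = np.column_stack([np.ones(n), x])      # design matrix with intercept column
theta_true = np.array([1.0, 2.0])
y = X @ theta_true + rng.normal(0.0, 0.5, n)

# Least squares fit (minimizes the squared-error cost)
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

def neg_log_likelihood(theta, sigma2=0.25):
    """Gaussian negative log likelihood of the data under parameters theta."""
    r = y - X @ theta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + (r @ r) / (2 * sigma2)

# Perturbing the least-squares solution in any direction raises the NLL,
# so the least-squares fit is also the maximum-likelihood fit.
base = neg_log_likelihood(theta_ls)
for d in [np.array([1e-3, 0.0]), np.array([0.0, 1e-3]), np.array([-1e-3, 1e-3])]:
    assert neg_log_likelihood(theta_ls + d) > base
```

Because the negative log likelihood is a positive-definite quadratic in \(\theta\) (for a full-rank design matrix), the least-squares solution is its unique minimizer, which is what the perturbation check confirms.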

*It’s my first time using KaTeX, and writing mathematical equations in markdown files is tough, so I chose a simple example as practice. Here is the GitHub repo of KaTeX: https://github.com/Khan/KaTeX