What is KDE
Kernel Density Estimation is a way of estimating an unknown probability density function from data, similar in spirit to a histogram.
The basic idea: define a kernel function $K$, center a copy of it on each sample data point $x_i$, and sum these functions together. The result is the kernel density estimate $\hat{f}(x) = \frac{1}{N}\sum_{i=1}^{N} K(x - x_i)$.
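As a minimal sketch of this formula (the data points here are made up for illustration, and a standard Gaussian kernel is assumed):

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

x_i = np.array([1.0, 1.5, 3.2, 4.0, 4.1])   # hypothetical data points
grid = np.linspace(-1, 7, 200)               # where to evaluate the estimate

# f_hat(x) = (1/N) * sum_i K(x - x_i): one kernel per data point, averaged
f_hat = gaussian_kernel(grid[:, None] - x_i[None, :]).mean(axis=1)
```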
Choice of Kernel
Attributes of the Kernel
The kernel function $K$ is typically:
- Non-negative: $K(x) \geq 0$ for every $x$, since it describes a probability density.
- Symmetric: $K(x) = K(-x)$ for every $x$.
- Decreasing: $K'(x) \leq 0$ for every $x > 0$, so the value decays toward 0 as $x$ grows.
- Bounded or unbounded in support: the Gaussian kernel, for example, only approaches 0 asymptotically without ever reaching it, while the triangular kernel is exactly 0 outside its support.
Four kinds of kernel
Gaussian/Box/Triangle/Triweight
The choice of kernel is not that important: as the amount of data grows, the estimates produced by different kernels look essentially the same. A Gaussian kernel is a fine default.
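For reference, a sketch of these four kernels as NumPy functions (the normalizing constants are the standard ones that make each integrate to 1; treat the code as illustrative):

```python
import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def box(u):
    return np.where(np.abs(u) <= 1, 0.5, 0.0)

def triangle(u):
    return np.where(np.abs(u) <= 1, 1 - np.abs(u), 0.0)

def triweight(u):
    return np.where(np.abs(u) <= 1, (35 / 32) * (1 - u**2) ** 3, 0.0)
```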
Choice of Bandwidth
Use $h$ to control the bandwidth of each kernel: $\hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)$.
Intuitively, the bandwidth sets the dispersion $\sigma$ of the kernel's distribution placed on each data point: a small $h$ gives narrow, spiky bumps, while a large $h$ gives wide, smooth ones.
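A sketch of this scaled estimator, comparing two hypothetical bandwidths on made-up data:

```python
import numpy as np

def kde(grid, data, h):
    u = (grid[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    return K.mean(axis=1) / h                       # (1/(N*h)) * sum_i K(...)

data = np.array([1.0, 1.5, 3.2, 4.0, 4.1])
grid = np.linspace(-1, 7, 200)
wiggly = kde(grid, data, h=0.1)   # small h: spiky, undersmoothed
smooth = kde(grid, data, h=1.0)   # large h: flat, oversmoothed
```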
Silverman’s rule of thumb
It computes an optimal $h$ by assuming that the data are normally distributed.
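For a Gaussian kernel in one dimension, the rule has a simple closed form. A sketch of one common formulation, based on the sample standard deviation (SciPy's `scipy.stats.gaussian_kde(bw_method='silverman')` uses a closely related variant):

```python
import numpy as np

def silverman_bandwidth(data):
    # h = (4 * sigma^5 / (3 * n))^(1/5), roughly 1.06 * sigma * n^(-1/5)
    n = len(data)
    sigma = np.std(data, ddof=1)
    return (4 * sigma**5 / (3 * n)) ** 0.2
```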
Improved Sheather-Jones (ISJ)
ISJ is a better choice for multimodal data with several modes, such as a mixture of two normal distributions, where the normality assumption behind Silverman's rule breaks down.
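The KDEpy library implements the ISJ bandwidth; a usage sketch, assuming `pip install KDEpy` and following the library's documented API:

```python
import numpy as np
from KDEpy import FFTKDE

# Hypothetical bimodal data: a mixture of two normal distributions.
data = np.concatenate([np.random.randn(200), np.random.randn(200) + 5])

# Fit a Gaussian KDE with the ISJ bandwidth and evaluate on an automatic grid.
x, y = FFTKDE(kernel='gaussian', bw='ISJ').fit(data).evaluate()
```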
Tips
Weighting data
It is possible to add weights $w_i$ to the data points $x_i$ by writing $\hat{f}(x) = \frac{1}{h}\sum_{i=1}^{N} w_i K\!\left(\frac{x - x_i}{h}\right)$, where the weights are non-negative and sum to one.
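A sketch of this weighted estimator, with made-up weights that sum to 1:

```python
import numpy as np

def weighted_kde(grid, data, weights, h):
    u = (grid[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    return (K * weights[None, :]).sum(axis=1) / h   # weighted sum, scaled by 1/h

data = np.array([1.0, 1.5, 3.2, 4.0, 4.1])
weights = np.array([0.1, 0.1, 0.4, 0.2, 0.2])      # hypothetical, sum to 1
grid = np.linspace(-1, 7, 200)
f_hat = weighted_kde(grid, data, weights, h=0.5)
```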
Bounded domains
A simple trick to overcome bias at boundaries is to mirror the data.
How is the value near the boundary determined? Mirror the data from one side of the boundary to the other, add the two together to obtain a new original+mirrored distribution, and then cut this distribution at the boundary: the mirrored side is set to probability 0, while the original side takes the values of the original+mirrored distribution.
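A sketch of this mirroring trick for data restricted to $x \geq 0$ (a hypothetical boundary at 0):

```python
import numpy as np

def kde(grid, data, h):
    u = (grid[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.mean(axis=1) / h

data = np.array([0.1, 0.3, 0.5, 1.2, 2.0])   # all samples lie in [0, inf)
mirrored = np.concatenate([data, -data])      # reflect across the boundary
grid = np.linspace(0, 4, 200)                 # evaluate only on the valid side

# The factor 2 compensates for the doubled sample size; for x < 0 the
# corrected density is simply 0.
f_hat = 2 * kde(grid, mirrored, h=0.4)
```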
Extension to d dimensions
An approach to $d$-dimensional estimates is to write $\hat{f}(x) = \frac{1}{N}\sum_{i=1}^{N} K\!\left(\left\|x - x_i\right\|_p\right)$, where $\left\|\cdot\right\|_p$ denotes the $p$-norm.
The choice of norm
The shape of the kernel function in higher dimensions depends on the value of $p$ in the $p$-norm.
As the number of samples grows, the choice of both the kernel $K$ and the norm $p$ becomes unimportant. The bandwidth $H$ is still important.
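A sketch of such a $d$-dimensional estimate in NumPy, applying a Gaussian profile to the $p$-norm distance (the normalizing constant depends on $d$ and $p$ and is omitted here, so the result is unnormalized):

```python
import numpy as np

def kde_nd(grid, data, h, p=2):
    # grid: (M, d) evaluation points, data: (N, d) samples
    dists = np.linalg.norm(grid[:, None, :] - data[None, :, :], ord=p, axis=2)
    K = np.exp(-0.5 * (dists / h) ** 2)   # Gaussian profile of the p-norm distance
    return K.mean(axis=1)

data = np.random.randn(100, 2)                     # hypothetical 2-D samples
xs = np.linspace(-3, 3, 50)
grid = np.array([[x, y] for x in xs for y in xs])  # (2500, 2) evaluation grid
f_hat = kde_nd(grid, data, h=0.5, p=2)
```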