Machine Learning Notes (5) - High-Dimensional Volume Computation, the Gamma and Beta Functions, and Gaussian Integrals

High-Dimensional Volume Computation

In an \(n\)-dimensional space \(\mathrm{R}^n\), suppose we want the volume of the parallelepiped spanned by \(k\ (k\le n)\) linearly independent vectors \(\{\vec{a_1}, \vec{a_2}, \cdots, \vec{a_k}\}\) (strictly speaking, "parallelepiped" is the three-dimensional term; in two dimensions the object is a parallelogram, and in higher dimensions it is the more abstract parallelotope). Let \(A = \begin{bmatrix} \vec{a_1}& \vec{a_2}& \cdots& \vec{a_k}\end{bmatrix}\):

\[\text{then the volume spanned by these \(k\) vectors is}\ V = \sqrt{\text{det}(A^TA)}\]

Proof:
The simplest way to compute the area of a rectangle or parallelogram is (base × height), and a volume is usually computed as (length × width × height). Generalizing to higher dimensions, the volume can be understood as the product of the lengths of \(k\) mutually orthogonal vectors. Gram-Schmidt orthogonalization yields such an orthogonal basis:

\[\begin{array}{lll} \vec{b_1} = \vec{a_1}& &\vec{a_1} = \vec{b_1} \\ \vec{b_2} = \vec{a_2} - \frac{\vec{a_2}\cdot\vec{b_1}}{\vec{b_1}\cdot\vec{b_1}}\vec{b_1}& \Rightarrow& \vec{a_2} = \vec{b_2} + \frac{\vec{a_2}\cdot\vec{b_1}}{\vec{b_1}\cdot\vec{b_1}}\vec{b_1} \\ \qquad\vdots& & \qquad\vdots \\ \vec{b_k} = \vec{a_k} - \sum_{i = 1}^{k-1}\frac{\vec{a_k}\cdot\vec{b_i}}{\vec{b_i}\cdot\vec{b_i}}\vec{b_i}& & \vec{a_k} = \vec{b_k} + \sum_{i = 1}^{k-1}\frac{\vec{a_k}\cdot\vec{b_i}}{\vec{b_i}\cdot\vec{b_i}}\vec{b_i} \\ \end{array}\]

Iterating these substitutions eliminates the \(\vec{a}\) terms on the right-hand side. Consider an orthonormal basis \(\{\vec{q_1}, \vec{q_2}, \cdots, \vec{q_k}\}\) with \(\vec{b_i} = l_i\vec{q_i}\) for all \(1\le i \le k\), and write \(\vec{a_i}=\sum_{j = 1}^{i}r_{ji}\vec{q_j}\). In matrix form:

\[A = \begin{bmatrix} \vec{a_1}& \vec{a_2}& \cdots \vec{a_k} \end{bmatrix} = \begin{bmatrix} \vec{q_1} & \vec{q_2}& \cdots& \vec{q_k} \end{bmatrix} \times \begin{bmatrix} r_{11}& r_{12}& \cdots& r_{1k}&\\ 0& r_{22}& \cdots& r_{2k}&\\ \vdots& \vdots& \ddots& \vdots& \\ 0& 0& \cdots& r_{kk}& \end{bmatrix} \triangleq QR\]

Let \(Q_i = \begin{bmatrix}\vec{q_1} & \vec{q_2}& \cdots& \vec{q_i}\end{bmatrix}\) and let \(P_i\) be the projection matrix onto \(\text{R}(Q_i)\), the column space of \(Q_i\). The geometric meaning of orthogonalization gives:

\begin{align}\vec{b_i} &= l_i\vec{q_i} \\ &= \vec{a_i} - P_{i-1}\vec{a_i} \\ &= \vec{a_i} - Q_{i-1}(Q_{i-1}^TQ_{i-1})^{-1}Q_{i-1}^T\vec{a_i} \\ &= \vec{a_i} - Q_{i-1}Q_{i-1}^T\vec{a_i} \\ &= \vec{a_i} - \begin{bmatrix}\vec{q_1}& \vec{q_2}& \cdots& \vec{q_{i-1}} \end{bmatrix}\begin{bmatrix}\vec{q_1}^T\\ \vec{q_2}^T\\ \vdots\\ \vec{q_{i-1}}^T \end{bmatrix}\vec{a_i} \\ &= \vec{a_i} - \begin{bmatrix}\vec{q_1}& \vec{q_2}& \cdots& \vec{q_{i-1}} \end{bmatrix}\begin{bmatrix}\vec{q_1}^T\\ \vec{q_2}^T\\ \vdots\\ \vec{q_{i-1}}^T \end{bmatrix}\left(\sum_{j = 1}^{i}r_{ji}\vec{q_j}\right) \\ &= \vec{a_i} - \begin{bmatrix}\vec{q_1}& \vec{q_2}& \cdots& \vec{q_{i-1}} \end{bmatrix}\begin{bmatrix}\vec{q_1}^T\left(\sum_{j = 1}^{i}r_{ji}\vec{q_j}\right)\\ \vec{q_2}^T\left(\sum_{j = 1}^{i}r_{ji}\vec{q_j}\right)\\ \vdots\\ \vec{q_{i-1}}^T\left(\sum_{j = 1}^{i}r_{ji}\vec{q_j}\right) \end{bmatrix} \\ &= \vec{a_i} - \begin{bmatrix}\vec{q_1}& \vec{q_2}& \cdots& \vec{q_{i-1}} \end{bmatrix}\begin{bmatrix}r_{1i}\\ r_{2i}\\ \vdots\\ r_{(i-1)i} \end{bmatrix} \\ &= \sum_{j = 1}^{i}r_{ji}\vec{q_j} - \sum_{j = 1}^{i-1}r_{ji}\vec{q_j} \\ &= r_{ii}\vec{q_i} \end{align}

Substituting back into the volume we want:

\begin{align} V &= \prod_{i = 1}^k\Vert\vec{b_i}\Vert \\ &= \prod_{i = 1}^k\Vert r_{ii}\vec{q_i}\Vert \\ &= \prod_{i = 1}^k \vert r_{ii}\vert \\ &= \vert\text{det}(R)\vert\end{align}

Now compare with the volume formula claimed at the outset:

\begin{align} \sqrt{\text{det}(A^TA)} &= \sqrt{\text{det}((QR)^TQR)} \\ &= \sqrt{\text{det}(R^TQ^TQR)} \\ &= \sqrt{\text{det}(R^TR)} \\ &= \sqrt{\text{det}(R^T)\text{det}(R)} \\ &= \vert\text{det}(R)\vert \\ &= V \end{align}
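The QR factorization underlying this proof can be sketched numerically. Below is a minimal classical Gram-Schmidt QR in Python (the column vectors are made-up example values); the product of the diagonal entries of \(R\) equals the area spanned by the columns:

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt_qr(cols):
    """Classical Gram-Schmidt on a list of column vectors; returns (Q, R)."""
    q, r = [], [[0.0] * len(cols) for _ in cols]
    for i, a in enumerate(cols):
        v = list(a)
        for j in range(i):
            r[j][i] = dot(q[j], a)                      # r_ji = q_j . a_i
            v = [x - r[j][i] * y for x, y in zip(v, q[j])]
        r[i][i] = math.sqrt(dot(v, v))                  # length of the orthogonalized b_i
        q.append([x / r[i][i] for x in v])
    return q, r

# Two example vectors in R^3 spanning a parallelogram of area sqrt(3)
cols = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
q, r = gram_schmidt_qr(cols)
vol = abs(r[0][0] * r[1][1])   # |det R| = product of diagonal entries
print(vol)                     # ≈ 1.7320508 (sqrt(3))
```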

A special case occurs when \(k = n\); then \(\text{rank}\,A = n\) and \(A\) is square:

\begin{align}\Rightarrow \sqrt{\text{det}(A^TA)} &= \sqrt{\text{det}(A^T)\text{det}(A)} \\ &= \sqrt{\text{det}(A)^2} \\ &= \vert\text{det}(A)\vert\end{align}
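The formula \(V=\sqrt{\text{det}(A^TA)}\) is easy to check numerically. A small Python sketch (the example vectors are arbitrary): for two vectors in \(\mathrm{R}^3\), the result should match the cross-product magnitude:

```python
import math

# Two linearly independent vectors in R^3 (arbitrary example values)
a1 = [1.0, 2.0, 2.0]
a2 = [3.0, 0.0, 4.0]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Gram matrix A^T A is 2x2 when k = 2
g = [[dot(a1, a1), dot(a1, a2)],
     [dot(a2, a1), dot(a2, a2)]]
vol = math.sqrt(g[0][0] * g[1][1] - g[0][1] * g[1][0])

# Cross-check: for k = 2 in R^3 the spanned area equals |a1 x a2|
cross = [a1[1] * a2[2] - a1[2] * a2[1],
         a1[2] * a2[0] - a1[0] * a2[2],
         a1[0] * a2[1] - a1[1] * a2[0]]
assert abs(vol - math.sqrt(dot(cross, cross))) < 1e-9
print(vol)   # sqrt(104) ≈ 10.198039
```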

The Jacobian Determinant

Calculus computations often call for a change of variables; the most common example is converting from the Cartesian coordinate system to the polar coordinate system. Consider the area near a point \(p\) in the plane, with Cartesian coordinates \((x,y)\) and polar coordinates \((r,\theta)\). In calculus notation:

\[\text{d}x\ \text{d}y = \lim\limits_{\Delta s\to 0\\\Delta t \to 0} \left\vert\text{det} \left( \begin{bmatrix} \Delta s& 0 \\ 0& \Delta t \\ \end{bmatrix} \right) \right\vert \]

This can be viewed as the rectangle formed by the four points \((x,y),\ (x+\Delta s,y),\ (x,y+\Delta t),\ (x+\Delta s,y+\Delta t)\) in the plane:

It can also be seen as the area spanned by the two vectors \(\vec{a} = (\Delta s, 0)\) and \(\vec{b} = (0, \Delta t)\). Next, map the rectangle's four corners into the polar coordinate system to obtain four image points. When \(\Delta s\) and \(\Delta t\) are small enough, the region bounded by these four image points is approximately a parallelogram. Taking \((r(x, y), \theta(x, y))\) as the base point, the edge vectors are

\[\vec{a'} = (r(x +\Delta s, y), \theta(x + \Delta s, y)) - (r(x, y), \theta(x, y))\approx(r_x(x, y), \theta_x(x, y))\Delta s\\ \vec{b'} = (r(x,\ y+\Delta t), \theta(x,\ y+\Delta t)) - (r(x, y), \theta(x, y))\approx(r_y(x, y), \theta_y(x, y))\Delta t\]

where \(r_x\) denotes \(\partial r / \partial x\), and similarly for the others.

In matrix form:

\begin{align}\text{d}r\ \text{d}\theta &= \lim\limits_{\Delta s\to 0\\\Delta t \to 0}\left\vert\text{det}\left(\begin{bmatrix}\vec{a'}&\vec{b'}\end{bmatrix}\right)\right\vert \\ &= \lim\limits_{\Delta s\to 0\\\Delta t \to 0}\left\vert\text{det}\left(\begin{bmatrix}\frac{\partial r}{\partial x}\Delta s& \frac{\partial r}{\partial y}\Delta t \\ \frac{\partial \theta}{\partial x}\Delta s& \frac{\partial \theta}{\partial y}\Delta t\end{bmatrix}\right)\right\vert \\ &= \lim\limits_{\Delta s\to 0\\\Delta t \to 0}\left\vert\text{det}\left(\begin{bmatrix}\frac{\partial r}{\partial x}& \frac{\partial r}{\partial y} \\ \frac{\partial \theta}{\partial x}& \frac{\partial \theta}{\partial y}\end{bmatrix} \begin{bmatrix} \Delta s& 0 \\ 0& \Delta t \\ \end{bmatrix}\right)\right\vert \\ &= \left\vert\text{det}\left(\begin{bmatrix}\frac{\partial r}{\partial x}& \frac{\partial r}{\partial y} \\ \frac{\partial \theta}{\partial x}& \frac{\partial \theta}{\partial y}\end{bmatrix}\right)\right\vert \text{d}x\ \text{d}y \\ \end{align}

From another point of view, consider an \(m\times n\) Jacobian matrix: it is the matrix of the linear map from \(\mathrm{R^n}\) to \(\mathrm{R^m}\) that best approximates the transformation \(T\) near \(p\):

\[T(x) \approx T(p) + J(p)(x-p)\] \[\Rightarrow T(x) -T(p) \approx J(p)(x-p)\]

That is, a displacement in the transformed coordinate system can be approximated by a linear transformation of the corresponding displacement in the original coordinate system. Below we convert from polar to Cartesian coordinates:

\[\text{d}r\ \text{d}\theta = \lim\limits_{\Delta s\to 0\\\Delta t \to 0} \left\vert\text{det} \left( \begin{bmatrix} \Delta s& 0 \\ 0& \Delta t \\ \end{bmatrix} \right) \right\vert \] \begin{align}J(r,\theta) \begin{bmatrix}\Delta s \\ 0\end{bmatrix} &= \begin{bmatrix} \frac{\partial x}{\partial r}& \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r}& \frac{\partial y}{\partial \theta} \end{bmatrix} \begin{bmatrix}\Delta s \\ 0\end{bmatrix} \\ &= \begin{bmatrix}\frac{\partial x}{\partial r}\Delta s \\ \frac{\partial y}{\partial r}\Delta s\end{bmatrix} \\ J(r,\theta) \begin{bmatrix} 0 \\ \Delta t\end{bmatrix} &= \begin{bmatrix} \frac{\partial x}{\partial r}& \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r}& \frac{\partial y}{\partial \theta} \end{bmatrix} \begin{bmatrix}0 \\ \Delta t\end{bmatrix} \\ &= \begin{bmatrix}\frac{\partial x}{\partial \theta}\Delta t \\ \frac{\partial y}{\partial \theta}\Delta t\end{bmatrix} \end{align}

After transforming to Cartesian coordinates:

\begin{align} \lim\limits_{\Delta s\to 0\\\Delta t \to 0} \left\vert\text{det} \left( \begin{bmatrix} \frac{\partial x}{\partial r}\Delta s& \frac{\partial x}{\partial \theta}\Delta t \\ \frac{\partial y}{\partial r}\Delta s& \frac{\partial y}{\partial \theta}\Delta t \\ \end{bmatrix} \right) \right\vert &= \lim\limits_{\Delta s\to 0\\\Delta t \to 0} \left\vert\text{det} \left( \begin{bmatrix} \frac{\partial x}{\partial r}& \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r}& \frac{\partial y}{\partial \theta} \end{bmatrix} \begin{bmatrix} \Delta s& 0 \\ 0& \Delta t \\ \end{bmatrix} \right) \right\vert \\ &= \left\vert\text{det} \left( \begin{bmatrix} \frac{\partial x}{\partial r}& \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r}& \frac{\partial y}{\partial \theta} \end{bmatrix} \right) \right\vert \lim\limits_{\Delta s\to 0\\\Delta t \to 0} \left\vert\text{det} \left( \begin{bmatrix} \Delta s& 0 \\ 0& \Delta t \\ \end{bmatrix} \right) \right\vert \\ &= \vert J(r,\theta)\vert \text{d}r\ \text{d}\theta \\ &=\begin{vmatrix}\cos\theta& -r\sin\theta \\ \sin\theta& r\cos\theta\end{vmatrix}\text{d}r\ \text{d}\theta \\ &= r\,\text{d}r\ \text{d}\theta\qquad (\because r \ge 0\text{, no absolute value is needed}) \end{align}
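The result \(\vert\det J\vert = r\) for polar coordinates can be verified with finite differences. A small Python sketch (the step size and sample point are arbitrary choices):

```python
import math

def to_xy(r, theta):
    """Polar to Cartesian map (x, y) = (r cos(theta), r sin(theta))."""
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_det(r, theta, h=1e-6):
    """Approximate det of d(x, y)/d(r, theta) by central differences."""
    x_r = (to_xy(r + h, theta)[0] - to_xy(r - h, theta)[0]) / (2 * h)
    y_r = (to_xy(r + h, theta)[1] - to_xy(r - h, theta)[1]) / (2 * h)
    x_t = (to_xy(r, theta + h)[0] - to_xy(r, theta - h)[0]) / (2 * h)
    y_t = (to_xy(r, theta + h)[1] - to_xy(r, theta - h)[1]) / (2 * h)
    return x_r * y_t - x_t * y_r

r, theta = 2.5, 0.7
# The determinant should be r, independent of theta
assert abs(jacobian_det(r, theta) - r) < 1e-6
```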

The Gamma and Beta Functions

With the Jacobian change of variables in hand, we can derive the relation between the Gamma and Beta functions mentioned in the previous post. Recall that

\[\Gamma(z) = \int_{0}^{\infty} \frac{t^{z-1}}{\mathrm{e}^t} \,{\rm{d}}t\] \[\Rightarrow\Gamma (x)\Gamma (y)=\int _{u=0}^{\infty }\ e^{-u}u^{x-1}\,du\cdot \int _{v=0}^{\infty }\ e^{-v}v^{y-1}\,dv\]

By Fubini's theorem:

\[\qquad\qquad\quad\ =\int _{v=0}^{\infty }\int _{u=0}^{\infty }\ e^{-u-v}u^{x-1}v^{y-1}\,du\,dv.\]

Let \(z=u+v,\ t = u / (u+v)\):

\begin{aligned}\qquad\qquad\quad&=\int _{z=0}^{\infty }\int _{t=0}^{1}e^{-z}(zt)^{x-1}(z(1-t))^{y-1}{\big |}J(z,t){\big |}\,dt\,dz\\[6pt] &=\int _{z=0}^{\infty }\int _{t=0}^{1}e^{-z}(zt)^{x-1}(z(1-t))^{y-1}\left\vert\text{det}\left(\begin{bmatrix}\frac{\partial u}{\partial z}&\frac{\partial u}{\partial t}\\\frac{\partial v}{\partial z}&\frac{\partial v}{\partial t}\end{bmatrix}\right)\right\vert\,dt\,dz\\[6pt] &=\int _{z=0}^{\infty }\int _{t=0}^{1}e^{-z}(zt)^{x-1}(z(1-t))^{y-1}\left\vert\text{det}\left(\begin{bmatrix}t&z\\1-t&-z\end{bmatrix}\right)\right\vert\,dt\,dz\\[6pt] &=\int _{z=0}^{\infty }\int _{t=0}^{1}e^{-z}(zt)^{x-1}(z(1-t))^{y-1}z\,dt\,dz\\[6pt] &=\int _{z=0}^{\infty }e^{-z}z^{x+y-1}\,dz\cdot \int _{t=0}^{1}t^{x-1}(1-t)^{y-1}\,dt\\[6pt] &=\Gamma (x+y)\,\mathrm {B} (x,y),\end{aligned}
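The identity \(\Gamma(x)\Gamma(y)=\Gamma(x+y)\,\mathrm{B}(x,y)\) can be checked numerically with Python's standard library (`math.gamma` is the built-in Gamma function; the midpoint-rule quadrature and the sample values \(x=2.5,\ y=3.5\) are arbitrary choices):

```python
import math

def beta_numeric(x, y, n=200_000):
    """Midpoint rule on (0, 1) for B(x, y) = ∫ t^(x-1) (1-t)^(y-1) dt."""
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** (x - 1) * (1 - (i + 0.5) * h) ** (y - 1)
                   for i in range(n))

x, y = 2.5, 3.5
lhs = math.gamma(x) * math.gamma(y)
rhs = math.gamma(x + y) * beta_numeric(x, y)
assert abs(lhs - rhs) / lhs < 1e-6   # both sides agree to quadrature accuracy
```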

Gaussian Integrals

If a random variable \(X\) follows a normal distribution with location parameter (mean) \(\mu\) and scale parameter (standard deviation) \(\sigma\), written:

\[X\sim N(\mu,\sigma)\]

then its probability density function is

\[f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{- (x-\mu )^2 / (2\sigma^2)}\]

Consider the integral of the exponential term:

\[\text{Let}\quad I=\int_{-\infty}^{\infty}e^{-x^2}dx\] \[\text{Then}\quad I^2=\int_{-\infty}^{\infty}e^{-x^2}dx\cdot\int_{-\infty}^{\infty}e^{-y^2}dy\]

By Fubini's theorem:

\begin{align}\qquad\quad&= \int_{x = -\infty}^{\infty}\int_{y = -\infty}^{\infty}e^{-(x^2+y^2)}dx\,dy \\ &= \iint_{\mathrm{R}^2}e^{-(x^2+y^2)}dx\,dy \end{align}

Converting to polar coordinates:

\begin{align}\qquad\quad &= \int_{r = 0}^{\infty}\int_{\theta =0}^{2\pi}e^{-r^2}r\,d\theta\,dr \\ &= \frac{1}{2}\int_{r = 0}^{\infty}\int_{\theta =0}^{2\pi}e^{-r^2}d\theta\,d(r^2) \\ &= \frac{1}{2}\times 2\pi \times\left.\left( -e^{-r^2}\right)\right |_{r=0}^{\infty} \\ &= \pi\end{align} \[\Rightarrow \int_{-\infty}^{\infty}e^{-x^2}dx = I= \sqrt{\pi}\]
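The value \(I=\sqrt{\pi}\) is easy to confirm numerically. A Python sketch using the trapezoidal rule on a truncated interval (the truncation limit and step count are arbitrary choices; the integrand decays so fast that the tails are negligible):

```python
import math

def gauss_integral(n=20_000, lim=8.0):
    """Trapezoidal rule for ∫ exp(-x^2) dx over [-lim, lim]."""
    h = 2 * lim / n
    xs = (-lim + i * h for i in range(n + 1))
    weights = [0.5] + [1.0] * (n - 1) + [0.5]   # endpoint weights are halved
    return h * sum(w * math.exp(-x * x) for w, x in zip(weights, xs))

assert abs(gauss_integral() - math.sqrt(math.pi)) < 1e-8
```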

Integrals of this type have the following variants (\(a \gt 0\)):

\begin{align} \int_{-\infty}^{\infty}e^{-(ax^2+bx+c)}dx &= \int_{-\infty}^{\infty}e^{-a(x^2+\frac{b}{a}x+\frac{c}{a})}dx \\ &= \int_{-\infty}^{\infty}e^{-a(x+\frac{b}{2a})^2+\frac{b^2}{4a}-c}dx \\ &= e^{\frac{b^2}{4a}-c}\int_{-\infty}^{\infty}e^{-a(x+\frac{b}{2a})^2}dx \\ &= \frac{1}{\sqrt{a}}e^{\frac{b^2}{4a}-c}\int_{-\infty}^{\infty}e^{-(\sqrt{a}x+\frac{b}{2\sqrt{a}})^2}d(\sqrt{a}x+\frac{b}{2\sqrt{a}}) \\ &= \sqrt{\frac{\pi}{a}}e^{\frac{b^2}{4a}-c} \end{align} \begin{align} \iint_{\mathrm{R}^2}e^{-(\lambda_1{x_1}^2+\lambda_2{x_2}^2)}dx_1dx_2 &=\int_{-\infty}^{\infty}e^{-\lambda_1{x_1}^2}dx_1\cdot\int_{-\infty}^{\infty}e^{-\lambda_2{x_2}^2}dx_2 \\ &= \sqrt{\frac{\pi}{\lambda_1}}\cdot\sqrt{\frac{\pi}{\lambda_2}} \\ &= \frac{\pi}{\sqrt{\lambda_1\lambda_2}} \end{align}
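The first variant can also be checked numerically. A Python sketch (the coefficient values are arbitrary) comparing a trapezoidal-rule approximation against the closed form \(\sqrt{\pi/a}\,e^{b^2/(4a)-c}\):

```python
import math

def integral_numeric(a, b, c, n=40_000, lim=12.0):
    """Trapezoidal rule for ∫ exp(-(a x^2 + b x + c)) dx, a > 0."""
    h = 2 * lim / n
    f = lambda x: math.exp(-(a * x * x + b * x + c))
    s = 0.5 * (f(-lim) + f(lim))
    for i in range(1, n):
        s += f(-lim + i * h)
    return s * h

a, b, c = 1.5, 0.8, 0.3
closed = math.sqrt(math.pi / a) * math.exp(b * b / (4 * a) - c)
assert abs(integral_numeric(a, b, c) - closed) / closed < 1e-6
```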

It is also worth mentioning that

\[\int e^{-x^2}dx = \frac{\sqrt{\pi}}{2}\operatorname {erf}(x)\]

where

\[\operatorname {erf}(x)={\frac {2}{{\sqrt {\pi }}}}\int _{0}^{x}e^{{-t^{2}}}\,{\mathrm d}t.\]

Hence the integral above has no elementary closed form (the error function is not an elementary function), but the properties of the error function can still be used to evaluate the Gaussian integral:

\begin{align}\int_{-\infty}^{\infty} e^{-x^2}dx &= \frac{\sqrt{\pi}}{2}\left.\operatorname {erf}(x)\right|_{-\infty}^{\infty} \\ &= \frac{\sqrt{\pi}}{2}(\operatorname {erf}(\infty)-\operatorname {erf}(-\infty)) \\ &= \frac{\sqrt{\pi}}{2}(1 - (-1)) \\ &= \sqrt{\pi}\end{align}
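Python's standard library exposes the error function as `math.erf`, which makes the limiting values easy to confirm (evaluating at ±10 stands in for ±∞, since erf saturates to ±1 well before that):

```python
import math

# math.erf is the built-in error function; erf(x) -> ±1 as x -> ±infinity,
# so the two-sided Gaussian integral evaluates to sqrt(pi).
val = (math.sqrt(math.pi) / 2) * (math.erf(10.0) - math.erf(-10.0))
assert abs(val - math.sqrt(math.pi)) < 1e-12
```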
