In machine learning and Bayesian statistics, which algorithms do people prefer for approximating intractable integrals?

Suggestions from user zi-yuan-35:

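Suppose we have a prior $p(\theta)$ and a likelihood $p(x \mid \theta)$, and we want the posterior $p(\theta \mid x) = \dfrac{p(x \mid \theta)\,p(\theta)}{\int p(x \mid \theta')\,p(\theta')\,d\theta'}$, but the integral in the denominator cannot be computed directly. That is the problem to solve.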

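Gaussian quadrature is a grid-based numerical algorithm for one-dimensional integrals. One-dimensional rules like this are hard to extend directly to high-dimensional integrals, because the number of grid points required grows exponentially with the dimension. There are also deterministic numerical algorithms based on sparse grids, but they come with quite a few restrictions, and for this problem people generally do not care about this class of methods.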
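To make the grid-growth point concrete, here is a minimal sketch using the standard 3-point Gauss-Legendre rule. The function names `gauss_legendre_3` and `tensor_grid_size` are made up for this illustration and do not come from the original answer.

```python
import math

# 3-point Gauss-Legendre rule on [-1, 1]; exact for polynomials up to degree 5.
_NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def gauss_legendre_3(f, a, b):
    """Approximate the integral of f over [a, b] using 3 quadrature nodes."""
    half = 0.5 * (b - a)
    mid = 0.5 * (a + b)
    # Map the reference nodes from [-1, 1] to [a, b] and rescale the weights.
    return half * sum(w * f(mid + half * x) for x, w in zip(_NODES, _WEIGHTS))


def tensor_grid_size(points_per_dim, dim):
    """Number of nodes in a full tensor-product grid (hypothetical helper)."""
    return points_per_dim ** dim


print(gauss_legendre_3(lambda x: x ** 4, -1.0, 1.0))
print([tensor_grid_size(10, d) for d in (1, 2, 5, 10)])
```

With 10 nodes per dimension, a full tensor grid in 10 dimensions already needs 10^10 function evaluations, which is why this family of methods drops out for high-dimensional posteriors.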

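The idea of the Laplace (Gaussian) approximation is that, since $\log\big(p(x \mid \theta)\,p(\theta)\big) \approx \log\big(p(x \mid \hat\theta)\,p(\hat\theta)\big) - \tfrac{1}{2}(\theta - \hat\theta)^\top H\,(\theta - \hat\theta)$ (where $\hat\theta$ is the MAP estimate of $\theta$, and $H$ is the negative Hessian of the log joint at $\hat\theta$), the integral $\int p(x \mid \theta)\,p(\theta)\,d\theta$ can be evaluated as a Gaussian integral. The upside is that it is simple; the downside is that obviously not every distribution can be approximated by a normal distribution. In practice it is mostly used as a trick. For examples, search for "Gaussian approximation" in Machine Learning: A Probabilistic Perspective, which also includes the derivation of the Bayesian information criterion.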
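A one-dimensional sketch of the Laplace approximation, under some assumptions of my own: the mode is supplied by the caller, the curvature is estimated by a finite difference, and the Gamma-kernel target is chosen only because its integral is known in closed form. `laplace_log_integral` is an illustrative name, not something from the original answer.

```python
import math


def laplace_log_integral(log_f, mode, h=1e-3):
    """Laplace approximation to log( integral of exp(log_f(theta)) d theta ) in 1D.

    `mode` must be the maximizer of log_f; the curvature at the mode is
    estimated with a central finite difference of step h.
    """
    second = (log_f(mode + h) - 2.0 * log_f(mode) + log_f(mode - h)) / (h * h)
    precision = -second  # negative second derivative, positive at a maximum
    if precision <= 0.0:
        raise ValueError("log_f is not concave at the supplied mode")
    # Gaussian integral: exp(log_f(mode)) * sqrt(2 * pi / precision)
    return log_f(mode) + 0.5 * math.log(2.0 * math.pi / precision)


# Illustrative target: Gamma kernel theta^(a-1) * exp(-b * theta),
# whose exact integral over (0, inf) is Gamma(a) / b^a.
a, b = 10.0, 2.0
log_kernel = lambda t: (a - 1.0) * math.log(t) - b * t
approx = laplace_log_integral(log_kernel, mode=(a - 1.0) / b)
exact = math.lgamma(a) - a * math.log(b)
print(approx, exact)
```

For this target the approximation amounts to Stirling's formula, so it is accurate here; for skewed or multimodal targets the error can be much larger, which is the downside mentioned above.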

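The goal of variational inference is to find a $q(\theta)$ that minimizes $\mathrm{KL}\big(q(\theta)\,\|\,p(\theta \mid x)\big)$; in practice it most often takes the form of a mean field approximation. Since this is an optimization problem, one line of work applies large-scale optimization techniques to variational inference. Example: Stochastic Variational Inference.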
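As a small mean-field example, here is a sketch of coordinate-ascent updates for the textbook model $x_n \sim \mathcal{N}(\mu, \tau^{-1})$ with a Normal-Gamma prior, using the factorization $q(\mu, \tau) = q(\mu)\,q(\tau)$. The model, the default hyperparameters and the name `cavi_gaussian` are assumptions picked for illustration; the original answer does not specify a model.

```python
def cavi_gaussian(data, mu0=0.0, lam0=1e-3, a0=1e-3, b0=1e-3, iters=50):
    """Mean-field VI for x_n ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0*tau)),
    tau ~ Gamma(a0, b0).

    Returns (mu_n, lam_n, a_n, b_n) with q(mu) = N(mu_n, 1/lam_n)
    and q(tau) = Gamma(a_n, b_n).
    """
    n = len(data)
    xbar = sum(data) / n
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)  # does not depend on E[tau]
    a_n = a0 + 0.5 * (n + 1)                     # fixed across iterations
    e_tau = 1.0                                  # initial guess for E_q[tau]
    for _ in range(iters):
        # Update q(mu) given the current E_q[tau].
        lam_n = (lam0 + n) * e_tau
        # Update q(tau); E_q[(x - mu)^2] = (x - mu_n)^2 + 1/lam_n.
        sq = sum((x - mu_n) ** 2 + 1.0 / lam_n for x in data)
        b_n = b0 + 0.5 * (sq + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n


mu_n, lam_n, a_n, b_n = cavi_gaussian([1.0, 2.0, 3.0, 4.0, 5.0])
print(mu_n, a_n / b_n)  # approximate posterior mean of mu and of tau
```

Each update is closed-form and deterministic, which is what makes variational methods easy to monitor: you just iterate until the parameters stop changing.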

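Markov chain Monte Carlo simply samples from $p(\theta \mid x)$ directly. The procedure is:

1. Suppose the current position is $\theta_t$; draw a candidate $\theta'$ from some easy-to-sample proposal distribution $q(\theta' \mid \theta_t)$;

2. Accept or reject $\theta'$ with probability $\alpha$ (look it up in a textbook). If accepted, $\theta_{t+1} = \theta'$; otherwise $\theta_{t+1} = \theta_t$. The stationary distribution of the resulting Markov chain $\{\theta_t\}$ is then exactly $p(\theta \mid x)$.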
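The two steps above can be sketched as a random-walk Metropolis sampler. The original leaves the acceptance probability to a textbook; the version below assumes a symmetric Gaussian proposal, for which the standard Metropolis rule is min(1, p(candidate) / p(current)). The function name, step size and target are illustrative choices.

```python
import math
import random


def metropolis(log_target, theta0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    `log_target` is the log target density up to an additive constant,
    so the intractable normalizing integral is never needed.
    """
    rng = random.Random(seed)
    theta = theta0
    log_p = log_target(theta)
    chain = []
    for _ in range(n_samples):
        # Step 1: propose a candidate near the current position.
        candidate = theta + rng.gauss(0.0, step)
        log_p_new = log_target(candidate)
        # Step 2: accept with probability min(1, p(candidate) / p(theta)).
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            theta, log_p = candidate, log_p_new
        chain.append(theta)  # a rejected move repeats the current position
    return chain


# Illustrative target: standard normal, known only up to a constant.
chain = metropolis(lambda t: -0.5 * t * t, theta0=0.0, n_samples=20000)
kept = chain[1000:]  # discard an (arbitrary) burn-in period
mean = sum(kept) / len(kept)
var = sum((t - mean) ** 2 for t in kept) / len(kept)
print(mean, var)
```

Only the ratio of target densities appears in the acceptance test, so the denominator of Bayes' rule cancels. That cancellation is the whole reason MCMC sidesteps the intractable integral.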

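In practice all sorts of problems can come up, for example it can be hard to jump from one mode of the distribution to another. So someone proposed The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo (roughly: the farther the next sample travels the better, with no U-turn), and built Stan on top of it. Stan implements fully automatic, NUTS-based full Bayesian inference: you give it a description of the generative model plus the data, and it outputs samples of the model parameters, with some degree of visualization.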

There is also some research on parallelization, which you can search for yourself.

(Side note: other packages for Bayesian inference include BayesPy, PyMC, and so on.)

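The acceptance probability in MCMC depends on the likelihood function. If even that is hard to evaluate, consider approximate Bayesian computation, which does so-called likelihood-free inference: after drawing a candidate $\theta'$, use it to generate data $x'$; accept if $\rho\big(S(x'), S(x)\big) \le \epsilon$ and reject otherwise, where $S$ is some summary statistic and $\rho$ is some distance.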
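A minimal sketch of the accept/reject rule just described, in its simplest form: rejection ABC with candidates drawn from the prior rather than from an MCMC proposal. The simulator, prior, summary statistic, tolerance and observed data below are all hypothetical choices for illustration.

```python
import random


def abc_rejection(observed, simulate, summary, distance, prior_sample,
                  eps, n_draws, seed=0):
    """Rejection ABC: keep each theta whose simulated summary is within eps
    of the observed summary. The likelihood is never evaluated."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)       # draw a candidate parameter
        fake = simulate(theta, rng)     # generate data from the candidate
        if distance(summary(fake), s_obs) <= eps:
            accepted.append(theta)
    return accepted


observed = [2.5, 3.5, 3.0, 2.0, 4.0]  # hypothetical data, sample mean 3.0
posterior = abc_rejection(
    observed,
    simulate=lambda th, rng: [rng.gauss(th, 1.0) for _ in range(len(observed))],
    summary=lambda xs: sum(xs) / len(xs),         # S: sample mean
    distance=lambda s1, s2: abs(s1 - s2),         # rho: absolute difference
    prior_sample=lambda rng: rng.uniform(-10.0, 10.0),
    eps=0.2,
    n_draws=20000,
)
print(len(posterior), sum(posterior) / len(posterior))
```

Shrinking `eps` makes the accepted samples closer to the true posterior but lowers the acceptance rate, so the tolerance is a direct accuracy-versus-cost knob.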

See also the Handbook of Markov Chain Monte Carlo, which covers all sorts of extensions and improvements of MCMC.

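MCEM simply uses MCMC in the E-step of EM. Correspondingly, there is variational EM. Section 11.4.9 of Machine Learning: A Probabilistic Perspective gives a brief introduction to both, along with some other EM variants.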

As for comparing variational inference with sampling algorithms, here is a verbatim quote from the introduction to Chapter 24 of Machine Learning: A Probabilistic Perspective:

It is worth briefly comparing MCMC to variational inference (Chapter 21). The advantages of variational inference are (1) for small to medium problems, it is usually faster; (2) it is deterministic; (3) is it easy to determine when to stop; (4) it often provides a lower bound on the log likelihood. The advantages of sampling are: (1) it is often easier to implement; (2) it is applicable to a broader range of models, such as models whose size or structure changes depending on the values of certain variables (e.g., as happens in matching problems), or models without nice conjugate priors; (3) sampling can be faster than variational methods when applied to really huge models or datasets.

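(What this does not mention is that judging whether MCMC has converged is far from obvious, which counts as a drawback. For details see Chapter 6 of the Handbook of Markov Chain Monte Carlo.)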
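To give a feel for what a convergence check can look like, here is a sketch of one widely used diagnostic, the Gelman-Rubin potential scale reduction factor (R-hat). The original answer does not name a specific diagnostic, so treat this as one possible choice; `gelman_rubin` is an illustrative name.

```python
import math


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length scalar chains.

    Values well above 1 suggest the chains have not mixed; values near 1 are
    necessary for convergence but do not prove it.
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # B: between-chain variance; W: average within-chain variance.
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b / n  # pooled estimate of the target variance
    return math.sqrt(var_plus / w)


# Two toy chains stuck in different regions: R-hat is far above 1.
print(gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]))
```

The diagnostic needs several chains started from dispersed points, and it can still be fooled when every chain misses the same mode, which is part of why convergence is not an obvious thing to judge.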

Also, Markov Chain Monte Carlo and Variational Inference: Bridging the Gap may be worth a look.
