Keno~'s Blog

Revolutionizing the education world by doing deep learning with Gakky

About Model Ensembles


 

In practice, once you have finished a full round of training and want to squeeze out a bit more accuracy, a technique called model ensembling is often used.

In this technique, the final prediction is the average of the outputs of several models.
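A minimal sketch of what "average the outputs" means in code (the three toy models and the 3-class setup here are made up purely for illustration):

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the predicted class probabilities of several models."""
    probs = [model(x) for model in models]
    return np.mean(probs, axis=0)

# Toy stand-ins for trained models: each maps an input to a
# probability distribution over 3 classes.
model_a = lambda x: np.array([0.7, 0.2, 0.1])
model_b = lambda x: np.array([0.5, 0.4, 0.1])
model_c = lambda x: np.array([0.6, 0.3, 0.1])

avg = ensemble_predict([model_a, model_b, model_c], x=None)
print(avg)          # ≈ [0.6 0.3 0.1]
print(avg.argmax())
```

The same idea works with logits instead of probabilities; averaging probabilities is just the most common convention.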

Model Ensembles

In practice, one reliable approach to improving the performance of Neural Networks by a few percent is to train multiple independent models, and at test time average their predictions. As the number of models in the ensemble increases, the performance typically monotonically improves (though with diminishing returns). Moreover, the improvements are more dramatic with higher model variety in the ensemble. There are a few approaches to forming an ensemble:

  • Same model, different initializations. Use cross-validation to determine the best hyperparameters, then train multiple models with the best set of hyperparameters but with different random initialization. The danger with this approach is that the variety is only due to initialization.
  • Top models discovered during cross-validation. Use cross-validation to determine the best hyperparameters, then pick the top few (e.g. 10) models to form the ensemble. This improves the variety of the ensemble but has the danger of including suboptimal models. In practice, this can be easier to perform since it doesn’t require additional retraining of models after cross-validation.
  • Different checkpoints of a single model. If training is very expensive, some people have had limited success in taking different checkpoints of a single network over time (for example after every epoch) and using those to form an ensemble. Clearly, this suffers from some lack of variety, but can still work reasonably well in practice. The advantage of this approach is that it is very cheap.
  • Running average of parameters during training. Related to the last point, a cheap way of almost always getting an extra percent or two of performance is to maintain a second copy of the network’s weights in memory that maintains an exponentially decaying sum of previous weights during training. This way you’re averaging the state of the network over the last several iterations. You will find that this “smoothed” version of the weights over the last few steps almost always achieves better validation error. The rough intuition to have in mind is that the objective is bowl-shaped and your network is jumping around the mode, so the average has a higher chance of being somewhere nearer the mode.
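The last bullet, the exponentially decaying average of the weights, can be sketched as follows. The noisy "SGD iterate" here is simulated by jittering around a fixed optimum, just to show why the smoothed copy ends up nearer the mode:

```python
import numpy as np

def update_ema(ema_weights, weights, decay=0.99):
    """One step of an exponentially decaying running average of the weights."""
    return decay * ema_weights + (1 - decay) * weights

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])          # the "mode" of the bowl-shaped objective

w = true_w + rng.normal(scale=0.5, size=2)
ema = w.copy()                          # second copy of the weights kept in memory
for _ in range(2000):
    # Simulated training iterate bouncing around the optimum.
    w = true_w + rng.normal(scale=0.5, size=2)
    ema = update_ema(ema, w)

# The smoothed weights sit much closer to the optimum than any single
# raw iterate typically does.
print(np.linalg.norm(ema - true_w))
```

In a real training loop the update would be applied after each optimizer step, with `w` being the network's current parameters.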

 

To summarize, the approaches are:

  • Use the same model but vary the initialization values.
  • Take the top few models by accuracy from cross-validation.
  • When training is expensive, save checkpoints of the model at certain epochs and average those.

 

However, model ensembles apparently take longer to produce predictions on test data.

There also seems to be research on making this more efficient.

One disadvantage of model ensembles is that they take longer to evaluate on test examples. An interested reader may find the recent work from Geoff Hinton on “Dark Knowledge” inspiring, where the idea is to “distill” a good ensemble back to a single model by incorporating the ensemble log likelihoods into a modified objective.
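A rough sketch of the distillation idea: train the single "student" model to match the ensemble's softened output distribution via a cross-entropy term. This is my own simplified illustration (the temperature value and logits are made up), not Hinton's exact formulation:

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T gives a softer distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                 # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the softened teacher (ensemble) and student outputs."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return -np.sum(p_teacher * log_p_student)

teacher = np.array([2.0, 0.5, 0.1])     # e.g. averaged ensemble logits
student = np.array([0.1, 2.0, 0.5])     # a student that disagrees with the teacher
print(distillation_loss(student, teacher))
print(distillation_loss(teacher, teacher))  # smaller: loss is minimized when they match
```

The loss is lowest when the student reproduces the teacher's distribution, which is what pushes the single model toward ensemble-level behavior.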

 

cs231n.github.io