Hyperparameter optimization: setting hyperparameters so that training goes well
How should we set the hyperparameters that need tuning, such as the learning rate, learning rate schedule, and regularization strength?
These notes use the learning rate as the running example.
First, run a random search over a rough range.
For example, a typical sampling of the learning rate would look as follows:
from random import uniform
learning_rate = 10 ** uniform(-6, 1)  # exponent sampled uniformly, i.e. log-uniform over [1e-6, 10]
After picking rough values like this,
Prefer random search to grid search. As argued by Bergstra and Bengio in Random Search for Hyper-Parameter Optimization, “randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid”. As it turns out, this is also usually easier to implement.
narrow down to the range that yields good accuracy.
How should that narrowing process proceed?
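As a minimal sketch of such a random search, the snippet below samples learning rates log-uniformly and keeps the best one. The `evaluate` function is a hypothetical placeholder for a short (~1 epoch) training run returning a validation score; its dummy scoring rule is an assumption for illustration only.

```python
import math
from random import uniform

def evaluate(lr):
    # Hypothetical stand-in for a short training run that returns a
    # validation score; this dummy version simply peaks near lr = 1e-3.
    return -abs(math.log10(lr) + 3)

# Random search: sample learning rates log-uniformly over 10 ** [-6, 1].
samples = [10 ** uniform(-6, 1) for _ in range(20)]
best_lr = max(samples, key=evaluate)
```

Sampling the exponent (not the learning rate itself) is what makes the search log-uniform, so each order of magnitude gets equal coverage.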
Stage your search from coarse to fine. In practice, it can be helpful to first search in coarse ranges (e.g. 10 ** [-6, 1]), and then depending on where the best results are turning up, narrow the range. Also, it can be helpful to perform the initial coarse search while only training for 1 epoch or even less, because many hyperparameter settings can lead the model to not learn at all, or immediately explode with infinite cost. The second stage could then perform a narrower search with 5 epochs, and the last stage could perform a detailed search in the final range for many more epochs (for example).
When searching over the range 10 ** [-6, 1], narrow it down using at most 1 epoch per trial;
in the next narrowing stage, use 5 epochs;
in the stage after that, narrow the range further with many more epochs.
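The staged schedule above could be sketched as follows. The `validation_score` helper, the epoch counts per stage, and the widths of the narrowed ranges are all illustrative assumptions, not a prescribed recipe.

```python
import math
from random import uniform

def validation_score(lr, epochs):
    # Hypothetical short training run; this dummy score peaks near
    # lr = 1e-3 and ignores epochs, standing in for real training.
    return -abs(math.log10(lr) + 3)

def search(lo_exp, hi_exp, epochs, n_trials=20):
    # Random search over learning rates in 10 ** [lo_exp, hi_exp];
    # returns the best-scoring learning rate found.
    samples = [10 ** uniform(lo_exp, hi_exp) for _ in range(n_trials)]
    return max(samples, key=lambda lr: validation_score(lr, epochs))

# Stage 1: coarse range 10 ** [-6, 1], at most 1 epoch per trial.
lr1 = search(-6, 1, epochs=1)
# Stage 2: narrow to +/- 1 order of magnitude around the winner, 5 epochs.
lr2 = search(math.log10(lr1) - 1, math.log10(lr1) + 1, epochs=5)
# Stage 3: detailed search in the final range with many more epochs.
lr3 = search(math.log10(lr2) - 0.5, math.log10(lr2) + 0.5, epochs=50)
```

Each stage re-centers the search window on the previous winner while shrinking its width, so cheap early trials eliminate hopeless regions before expensive long runs begin.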