
Batch Normalization for Making Training Go Well

About Batch Normalization


Batch Normalization is applied after convolutional or fully connected layers: it takes the pre-activation values W*X, normalizes them so that they are forced into (roughly) a unit Gaussian distribution, and passes the adjusted values on to the next layer.
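
The post itself contains no code, so as a minimal sketch of what this normalization step does (the names batchnorm_forward, gamma, beta, and eps are my own, not from the post), the NumPy function below normalizes each feature over the mini-batch and then applies a learned scale and shift:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Minimal batch-normalization forward pass (training time).

    x:     (N, D) mini-batch of pre-activations, e.g. the output of W*X
    gamma: (D,)   learned scale
    beta:  (D,)   learned shift
    """
    # Per-feature statistics computed over the mini-batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)

    # Normalize so each feature has roughly zero mean and unit variance
    x_hat = (x - mu) / np.sqrt(var + eps)

    # The learned scale and shift let the network undo the normalization if useful
    return gamma * x_hat + beta

# Example: normalize the badly scaled pre-activations of a 100-unit layer
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=10.0, size=(64, 100))
    out = batchnorm_forward(x, gamma=np.ones(100), beta=np.zeros(100))
    print(out.mean(axis=0)[:3], out.std(axis=0)[:3])  # close to 0 and 1
```

(At test time the batch statistics are replaced by running averages collected during training, which the sketch above omits.)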


This improves the flow of gradients through the network,

allows the learning rate to be set higher,

removes the need for carefully designed initialization,

and acts as a kind of regularization, reducing the need for dropout.


Batch Normalization. A recently developed technique by Ioffe and Szegedy called Batch Normalization alleviates a lot of headaches with properly initializing neural networks by explicitly forcing the activations throughout a network to take on a unit gaussian distribution at the beginning of the training. The core observation is that this is possible because normalization is a simple differentiable operation. In the implementation, applying this technique usually amounts to inserting the BatchNorm layer immediately after fully connected layers (or convolutional layers, as we’ll soon see), and before non-linearities. We do not expand on this technique here because it is well described in the linked paper, but note that it has become a very common practice to use Batch Normalization in neural networks. In practice networks that use Batch Normalization are significantly more robust to bad initialization. Additionally, batch normalization can be interpreted as doing preprocessing at every layer of the network, but integrated into the network itself in a differentiable manner. Neat!

By using Batch Normalization, the network becomes much more robust to bad initialization, and the operation can be interpreted as performing preprocessing at every layer of the network.
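
The post does not name a framework, but as an illustration of the placement rule described above (BatchNorm inserted immediately after the affine layer and before the non-linearity), a small PyTorch sketch with arbitrary layer sizes might look like this:

```python
import torch.nn as nn

# Hypothetical layer sizes; the point is only the ordering:
# fully connected -> BatchNorm -> non-linearity
model = nn.Sequential(
    nn.Linear(784, 100),
    nn.BatchNorm1d(100),  # inserted right after the affine layer, before ReLU
    nn.ReLU(),
    nn.Linear(100, 10),
)
```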

cs231n.github.io