Weight Initialization: setting the initial weights so that training goes well


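If all the initial weights are identical, that is, if the weight filters w0 through wn all hold the same values, there is no point in preparing multiple filters (each one just computes the same value).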

So the weight filters have to start from different values.
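To see this concretely, here is a minimal NumPy sketch. The layer sizes, the constant 0.5, and the use of tanh are my own choices for illustration, and it only checks the forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))           # 4 samples, 3 input features (arbitrary sizes)

# Every hidden unit starts with the same weights: all columns of W_same are equal.
W_same = np.full((3, 5), 0.5)
h_same = np.tanh(x @ W_same)              # shape (4, 5)

# Compare every column with the first one: all 5 units give the same output.
all_units_identical = np.allclose(h_same, h_same[:, :1])

# With random initial weights the units compute different features.
W_rand = rng.standard_normal((3, 5)) * 0.5
h_rand = np.tanh(x @ W_rand)
units_differ = not np.allclose(h_rand, h_rand[:, :1])

print(all_units_identical, units_differ)
```

With identical columns, the five hidden units are effectively one unit copied five times.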

However, if the weights are too small, the activations get closer and closer to 0 as the layers are stacked.

If they are too large, the values keep growing (with activation functions such as tanh the units saturate and the gradient is killed).

Either way, the weights can no longer be updated efficiently.
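Both failure modes can be reproduced with a small experiment. This is only a sketch under my own assumptions: 10 fully connected tanh layers of 500 units, standard normal inputs, and the scales 0.01 and 1.0 picked as "too small" and "too large".

```python
import numpy as np

def tanh_forward_stats(weight_scale, n_layers=10, n_units=500, seed=0):
    """Push random data through tanh layers and record the activation std per layer."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((1000, n_units))
    stds = []
    for _ in range(n_layers):
        W = rng.standard_normal((n_units, n_units)) * weight_scale
        h = np.tanh(h @ W)
        stds.append(float(h.std()))
    # Fraction of last-layer units pushed into the flat part of tanh.
    saturated = float(np.mean(np.abs(h) > 0.99))
    return stds, saturated

small_stds, small_sat = tanh_forward_stats(0.01)
large_stds, large_sat = tanh_forward_stats(1.0)

print("scale 0.01: last-layer std =", small_stds[-1], "saturated =", small_sat)
print("scale 1.0 : last-layer std =", large_stds[-1], "saturated =", large_sat)
```

With the small scale the activation std collapses toward 0 with depth. With the large scale most units sit near -1 or +1, where the slope of tanh is almost 0.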

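For activation functions such as tanh, initializing with Xavier initialization apparently keeps the scale of the values preserved from layer to layer, which is what we want.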
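A rough check of this, assuming the simple variant of Xavier initialization that divides standard normal weights by sqrt(n_in) (other variants also take the number of outputs into account). The depth, width, and input data are arbitrary choices of mine.

```python
import numpy as np

def xavier_tanh_stds(n_layers=10, n_units=500, seed=0):
    """Record the activation std per layer for tanh layers with Xavier-style weights."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((1000, n_units))
    stds = []
    for _ in range(n_layers):
        # Assumed variant: std of the weights is 1 / sqrt(fan_in).
        W = rng.standard_normal((n_units, n_units)) / np.sqrt(n_units)
        h = np.tanh(h @ W)
        stds.append(float(h.std()))
    return stds

xavier_stds = xavier_tanh_stds()
print([round(s, 2) for s in xavier_stds])
```

The std still drifts down a little from layer to layer, but it stays on the same order of magnitude instead of collapsing to 0 or saturating.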

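However, with activation functions such as ReLU this is not enough: negative values are returned as 0, which roughly halves the variance at every layer, so training still does not go well.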

So the initial setting of the weights is very important for training.

In practice, the current recommendation is to use ReLU units and use the w = np.random.randn(n) * sqrt(2.0/n), as discussed in He et al.

In practice, when ReLU is used as the activation function, it is apparently preferable to initialize the weights this way.
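The effect of the factor 2.0 can be checked numerically. The sketch below compares weights scaled by sqrt(1.0/n) with weights scaled by sqrt(2.0/n) in a stack of ReLU layers. The depth, width, input data, and the choice to track the root mean square of the activations are my own assumptions.

```python
import numpy as np

def relu_activation_rms(weight_std_fn, n_layers=10, n_units=500, seed=0):
    """Record the root mean square of the activations after each ReLU layer."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((1000, n_units))
    rms = []
    for _ in range(n_layers):
        W = rng.standard_normal((n_units, n_units)) * weight_std_fn(n_units)
        h = np.maximum(0.0, h @ W)        # ReLU: negative values become 0
        rms.append(float(np.sqrt(np.mean(h ** 2))))
    return rms

xavier_rms = relu_activation_rms(lambda n: np.sqrt(1.0 / n))  # Xavier-style scale
he_rms = relu_activation_rms(lambda n: np.sqrt(2.0 / n))      # scale from the quote above

print("sqrt(1/n):", round(xavier_rms[-1], 4))
print("sqrt(2/n):", round(he_rms[-1], 4))
```

With sqrt(1.0/n) the activations shrink by a roughly constant factor per layer, while with sqrt(2.0/n) their scale stays about the same through all 10 layers.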

Initializing the biases. It is possible and common to initialize the biases to be zero, since the asymmetry breaking is provided by the small random numbers in the weights. For ReLU non-linearities, some people like to use small constant value such as 0.01 for all biases because this ensures that all ReLU units fire in the beginning and therefore obtain and propagate some gradient. However, it is not clear if this provides a consistent improvement (in fact some results seem to indicate that this performs worse) and it is more common to simply use 0 bias initialization.

For the biases, it seems customary to initialize them to 0.
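Putting the two recommendations together, one layer could be initialized roughly like this. The helper name init_layer, its bias_value parameter, and the layer sizes are hypothetical, made up for this note.

```python
import numpy as np

def init_layer(n_in, n_out, rng, bias_value=0.0):
    """Hypothetical helper: sqrt(2/n_in)-scaled weights for ReLU, constant biases."""
    W = rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)
    b = np.full(n_out, bias_value)        # 0 by default; 0.01 is the alternative mentioned above
    return W, b

rng = np.random.default_rng(0)
W, b = init_layer(256, 128, rng)
print(W.shape, b.shape, float(W.std()))
```

The random weights already break the symmetry between units, so the biases can all start from the same value.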