The dropout technique is a data-driven regularization method for neural networks. It consists of randomly setting some activations of a given hidden layer to zero during training. Repeating this procedure for each training example is equivalent to sampling a network from an exponential number of architectures that share weights. The goal of dropout is to prevent feature detectors from relying on each other.
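To illustrate the basic mechanism, here is a minimal NumPy sketch of dropout applied to one hidden layer's activations; the function name, the drop probability of 0.5, and the test-time rescaling convention are our own illustrative choices, not prescriptions from this paper.

```python
import numpy as np

def dropout(activations, drop_prob=0.5, train=True, rng=None):
    """Randomly zero out hidden activations during training.

    At test time no units are dropped; activations are scaled by the
    keep probability so their expected value matches training.
    (drop_prob=0.5 is only an illustrative default.)
    """
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    if train:
        mask = rng.random(activations.shape) < keep_prob  # Bernoulli mask
        return activations * mask
    return activations * keep_prob

# Example: apply dropout to the activations of one hidden layer.
h = np.tanh(np.random.default_rng(0).normal(size=(4, 8)))  # toy hidden layer
h_train = dropout(h, drop_prob=0.5, train=True)   # some units zeroed
h_test = dropout(h, drop_prob=0.5, train=False)   # rescaled, none zeroed
```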
Dropout has been successfully applied to deep MLPs and to convolutional neural networks for various speech recognition and computer vision tasks. We recently proposed a way to use dropout in MDLSTM-RNNs for handwritten word and line recognition.
In this paper, we show that further improvement can be achieved by implementing dropout differently, more specifically by applying it at better positions relative to the LSTM units.
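For concreteness, the sketch below contrasts two possible placements of dropout relative to a recurrent layer, on its input sequence versus on its output sequence. It is our own toy illustration using standard PyTorch modules and a one-dimensional LSTM; it does not reproduce the MDLSTM architecture or the specific dropout positions evaluated in this paper, and all dimensions and names are hypothetical.

```python
import torch
import torch.nn as nn

class LSTMWithDropout(nn.Module):
    """Toy sequence classifier showing two possible dropout placements.

    `position` selects whether dropout is applied to the LSTM's input
    sequence or to its output sequence; both choices are illustrative
    only and do not correspond to the positions studied in this paper.
    """
    def __init__(self, in_dim=32, hidden=64, n_classes=10,
                 drop_prob=0.5, position="output"):
        super().__init__()
        self.position = position
        self.dropout = nn.Dropout(drop_prob)
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, in_dim)
        if self.position == "input":
            x = self.dropout(x)                # dropout before the LSTM
        out, _ = self.lstm(x)
        if self.position == "output":
            out = self.dropout(out)            # dropout after the LSTM
        return self.classifier(out)            # per-timestep class scores

# Usage example with random data.
x = torch.randn(2, 50, 32)                     # batch of 2 sequences
model = LSTMWithDropout(position="output")
scores = model(x)                              # shape: (2, 50, 10)
```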