Dense LSTMs for Speech Recognition

29th January, 2018

In this post, we introduce densely connected LSTM (or dense LSTM), a new neural network architecture for speech recognition. At Capio, we recently achieved state-of-the-art performance on the industry-standard NIST 2000 Hub5 English evaluation data, a significant jump from our first work in the domain. These results came from a combination of multiple systems, including a few that benefit from dense LSTM acoustic models. Let’s start with what motivated this work.

Gradient vanishing is a phenomenon in which error signals shrink toward zero during backpropagation as they pass through the many stacked layers of a deep neural network. This prevents deep neural networks from being trained properly. Deep residual learning, which exploits skip connections between neural layers, was proposed to mitigate the gradient vanishing phenomenon.
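To make the effect concrete, here is a minimal sketch in plain Python. It is a toy illustration, not Capio's model: the scalar tanh layers, the shared weight of 0.5, the depth of 30, and the helper name backprop_gain are all assumptions chosen for the example. It compares how much of an error signal survives a plain stack versus a stack with skip connections.

```python
import math

def backprop_gain(weights, x, residual):
    """Return d(output)/d(input) for a stack of scalar tanh layers.

    This factor is what scales an error signal as it is propagated
    from the top of the stack back to the input.
    """
    grad = 1.0
    h = x
    for w in weights:
        pre = w * h
        # Derivative of tanh(w * h) with respect to h.
        local = w * (1.0 - math.tanh(pre) ** 2)
        if residual:
            # Skip connection: h_next = h + tanh(w * h),
            # so the local factor becomes 1 + local.
            h = h + math.tanh(pre)
            grad *= 1.0 + local
        else:
            # Plain stack: h_next = tanh(w * h).
            h = math.tanh(pre)
            grad *= local
    return grad

weights = [0.5] * 30  # hypothetical 30-layer stack
plain = backprop_gain(weights, 1.0, residual=False)
skip = backprop_gain(weights, 1.0, residual=True)
print(f"plain stack gain:    {plain:.3e}")
print(f"residual stack gain: {skip:.3e}")
```

In the plain stack every layer multiplies the signal by a factor of at most 0.5, so the product collapses toward zero. With the skip connection each factor is at least 1, so the signal is not attenuated on its way back to the early layers.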


Capio researchers improve conversational speech recognition accuracy with Dense Connection Networks

16th December, 2017

Over the last few months, researchers at Capio have focused on new neural network techniques for conversational speech recognition. Inspired by the densely connected convolutional networks recently introduced for image classification tasks, we have been exploring densely connected bidirectional LSTMs for acoustic modeling.
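The dense connection pattern borrowed from densely connected convolutional networks can be sketched as follows: each layer receives the concatenation of the original input and the outputs of all earlier layers. This is only a sketch of the wiring under stated assumptions. The names make_toy_layer and dense_stack are hypothetical, and the toy layer is a placeholder standing in for a bidirectional LSTM layer, not an LSTM itself; the exact wiring of Capio's models may differ.

```python
import math

def make_toy_layer(out_dim):
    """Hypothetical stand-in for a BLSTM layer: maps any feature vector to out_dim values."""
    def layer(features):
        mean = sum(features) / len(features)
        return [math.tanh(mean + 0.1 * i) for i in range(out_dim)]
    return layer

def dense_stack(x, layers):
    """Run layers with dense connections.

    Each layer sees the input features plus the outputs of every
    earlier layer, concatenated into one vector.
    """
    features = list(x)
    input_widths = []
    for layer in layers:
        input_widths.append(len(features))      # width this layer receives
        features = features + layer(features)   # append, never overwrite
    return features, input_widths

frame = [0.1, 0.2, 0.3]  # hypothetical acoustic features for one frame
layers = [make_toy_layer(2) for _ in range(3)]
features, input_widths = dense_stack(frame, layers)
print(input_widths)   # input width grows with depth
print(len(features))  # input plus all layer outputs
```

Because earlier outputs are carried forward unchanged, every layer has a direct path to the input and to each preceding layer, which is the property that helps error signals reach the lower layers during training.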

By combining these densely connected models with CNN-BLSTM acoustic models, Capio’s researchers obtained accuracies of 95.0% and 90.9% on the Switchboard and CallHome test sets, two NIST-standard evaluation sets for US English. Capio’s single best system, which uses one acoustic model and one language model, obtained accuracies of 94.4% on Switchboard and 89.5% on CallHome. So that the performance of Capio’s systems can be compared with that of other research groups, we limited the training data to data that is publicly available from LDC.

Capio continues to work hard to improve the accuracy of our speech recognition systems across different languages, acoustic conditions and accents. This fundamental work on new neural network architectures for speech recognition modeling is an important part of that process.

The CAPIO 2017 Conversational Speech Recognition System.pdf