Capio researchers improve conversational speech recognition accuracy with Dense Connection Networks

16th December, 2017

Over the last few months, researchers at Capio have been investigating new neural network techniques for conversational speech recognition. Inspired by the densely connected convolutional networks recently introduced for image classification, we have been exploring densely connected bidirectional LSTMs (BLSTMs) for acoustic modeling.
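To make the idea of dense connections concrete, here is a minimal sketch of how layer input widths grow when each BLSTM layer receives the input features concatenated with the outputs of all earlier layers, as in DenseNet-style connectivity. The function name, the feature and hidden sizes, and the exact concatenation scheme are illustrative assumptions, not the configuration of Capio's system.

```python
def dense_blstm_input_dims(feature_dim, hidden_dim, num_layers):
    """Input width of each layer under dense (concatenative) connectivity.

    Assumption: layer k sees the raw features plus the outputs of all
    k earlier layers, and each BLSTM layer emits 2 * hidden_dim values
    per frame (forward and backward hidden states).
    """
    blstm_out = 2 * hidden_dim
    return [feature_dim + k * blstm_out for k in range(num_layers)]


# Hypothetical sizes, chosen only for illustration.
dims = dense_blstm_input_dims(feature_dim=40, hidden_dim=128, num_layers=4)
print(dims)
```

Under these assumptions the input width grows linearly with depth, which is the main cost of dense connectivity compared with a plain stacked BLSTM.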

By combining these densely connected models with CNN-BLSTM acoustic models, Capio’s researchers obtained accuracies of 95.0% and 90.9% on the Switchboard and CallHome test sets, two NIST-standard evaluation sets for US English. Capio’s single best system, which uses one acoustic model and one language model, obtained accuracies of 94.4% on Switchboard and 89.5% on CallHome. To make Capio’s results comparable with those of other research groups, we limited training to data that is publicly available from the LDC.
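The gap between the combined and single systems is easier to judge as a relative reduction in errors. The sketch below assumes that accuracy here means 100% minus the word error rate, a common convention that this post does not state explicitly. The function names are illustrative.

```python
def error_rate(accuracy_pct):
    """Error rate in percent, assuming accuracy = 100 - error rate."""
    return round(100.0 - accuracy_pct, 1)


def relative_error_reduction(baseline_acc, improved_acc):
    """Percentage of the baseline's errors removed by the improved system."""
    base = 100.0 - baseline_acc
    new = 100.0 - improved_acc
    return round(100.0 * (base - new) / base, 1)


# Accuracies reported above: (single best system, combined system).
results = {"Switchboard": (94.4, 95.0), "CallHome": (89.5, 90.9)}

for name, (single, combined) in results.items():
    print(name, error_rate(single), error_rate(combined),
          relative_error_reduction(single, combined))
```

Under that assumption, system combination removes roughly a tenth of the single system's errors on Switchboard and somewhat more on CallHome.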

Capio continues to work hard to improve the accuracy of our speech recognition systems across different languages, acoustic conditions and accents. This fundamental work on new neural network architectures for speech recognition is an important part of that process.

The CAPIO 2017 Conversational Speech Recognition System.pdf