
Case Study: DeepZen

DeepZen: Racing to market with the first AI solution for convincing emotive speech, with Meridian IT

  • Industry: sector-specific quality of speech


  • Challenges

    • Meet demand for compelling spoken content
    • Offer an alternative to costly, time-consuming studio recording
    • Convey emotion effectively, which has so far been limited to human voices

Overview

The market for spoken-word audio is under-served due to the prohibitive cost and time it takes to find talented actors and record them in studios.

Using neural networks developed and run on the IBM® PowerAI Enterprise platform, DeepZen analyzes text and turns it into synthesized speech that conveys emotion. The company launched its solution four months ahead of schedule.


Business Challenge

Humans are unique in their ability to communicate emotion through speech. From television advertising to audiobooks, actors inspire empathy and seize people’s attention by instilling their words with feeling.

However, recording voices is a laborious and expensive process. Studio space and talented actors are in short supply. Producing a typical audiobook costs thousands of dollars and takes weeks. This has created an enormous unmet demand for spoken content among the large audience of people who are visually impaired, dyslexic or simply enjoy listening to recorded speech. In addition, more companies are using spoken content to strike a chord with customers on digital channels, further raising demand. DeepZen was established to fill this gap in the market.

The company set out to combine text-to-speech and natural language processing techniques to provide a cost- and time-effective alternative to studio recording. Taylan Kamis, CEO and Co-founder of DeepZen, explains: “Our aim isn’t to put voice actors out of jobs, but rather to solve the capacity issues in the current market. We identify emotion in text automatically and use voice samples – for which we pay royalties to voice actors – combined with speech synthesis technology to produce convincing voice audio.

“To do this, we needed to create large and complex neural networks. These require extensive amounts of processing power to produce accurate results fast, so we needed the right technology platform to bring our vision to life.”

