MediaVOCS™

Say it, then play it.

Gracenote MediaVOCS™ lets users enjoy their music collections using voice commands. With MediaVOCS, Gracenote applies its expertise in music recognition, navigation, and automatic playlist creation to automatic speech recognition (ASR) and text-to-speech (TTS) technology. Device manufacturers can quickly deploy advanced speech-based media control features in a variety of digital devices, and MediaVOCS-enabled products give music fans hands-free access to their entire music collections in the car, at home, or on the go.

Pronunciation and Phonetics

ASR and TTS are limited when it comes to music. Although both are long-established, well-developed technologies, neither ASR nor TTS is designed to address the complexities of music. Standard speech technologies cannot consistently recognize or pronounce artist names, album names, and track titles, and they struggle especially with nicknames and alternate pronunciations. These names often contain non-standard terms, including multiethnic names, abbreviations, nicknames, and invented words that defy the default pronunciation and language rules built into these systems. MediaVOCS addresses these limitations by using a proprietary database of phonetic transcriptions for official and alternate artist, album, and track names.

Finding the Music

When integrated with the Gracenote Media Management System, MediaVOCS enables voice commands as an alternative way to use music devices. For example, on a device with a small display and limited manual controls, a traditional graphical user interface becomes impractical when you have to scroll for an artist or an album in a 5,000-song collection while driving.

MediaVOCS enables music consumers to use voice commands to:

Phonetic Transcription Data

MediaVOCS provides the essential phonetic transcription data required not only to recognize and pronounce music names and terms correctly, but also to recognize common user mispronunciations:

Phonetic Variants

Music fans around the world have their own ways of pronouncing a band's or artist's name. Phonetic variants enable recognition of a wide range of pronunciations.

Additional Artist, Album, and Track Alternate Names

Elvis Presley is known by more than one name: he is also "The King". MediaVOCS accounts for the many nicknames and aliases coined for iconic artists and bands.
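
The idea of alternate names can be illustrated with a minimal sketch in C. It assumes a small in-memory table that maps each alias to one canonical artist name and matches recognized text case-insensitively. The table layout and the names alias_entry, ALIASES, and resolve_artist are hypothetical and chosen only for illustration; they are not the MediaVOCS database format or API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias record: one recognized name mapped to the canonical
   artist name. This is an illustration, not the MediaVOCS schema. */
struct alias_entry {
    const char *alias;      /* official name, nickname, or alternate name */
    const char *canonical;  /* official artist name */
};

/* Illustrative data taken from the example in the text. */
static const struct alias_entry ALIASES[] = {
    { "elvis presley", "Elvis Presley" },
    { "the king",      "Elvis Presley" },
};

/* Case-insensitive equality test for two NUL-terminated strings. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Return the canonical artist name for a recognized phrase,
   or NULL if the phrase is not in the table. */
const char *resolve_artist(const char *spoken)
{
    size_t i;
    for (i = 0; i < sizeof ALIASES / sizeof ALIASES[0]; i++) {
        if (equals_ignore_case(spoken, ALIASES[i].alias)) {
            return ALIASES[i].canonical;
        }
    }
    return NULL;
}
```

In this sketch, resolve_artist("The King") and resolve_artist("Elvis Presley") both return the same canonical name, while an unknown phrase returns NULL. A production system would match on phonetic transcriptions rather than on spelled text.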

Enhanced Voice Commands to Manage, Enjoy, and Discover Music

In just a few words, you can play your favorite artist, play your favorite album, or even create an entire automatic playlist to fit your mood. MediaVOCS works with Gracenote MusicID®, Playlist™, Playlist Plus™, and Link™ to enable voice control of the most popular Gracenote functionality.

MusicID

Playlist and Playlist Plus

Link

Navigation

Developer Tools Provided

  • Sample application and source code
  • Source code for operating system abstraction layer
  • Reference ports for Windows, Linux, and Darwin platforms
  • Embedded database and decryption key
  • Object code for full database lookup layer, cross-compiled to device
  • Documentation

System Requirements

  • Prototyping and Development
    Device code development environment:

    - Operating Systems Supported: Windows 2000, Windows XP, SUSE Linux 9, or Mac OS X.

    - System Requirements: 256 MB RAM to build sample code; 2 GB hard disk space for tool kit files, database, and documentation.

    - CD-ROM/DVD-ROM drive (or another mechanism for playback of digital music that can pass raw TOC data, if CD recognition is a desired feature): Accuracy to 1/75 of a second.

    Developer Environment: The Windows sample application and supplied source code use Visual Studio 6.0. The Linux and Mac OS X sample applications and supplied source code use GCC with GNU Make-compatible makefiles. The source files are written in standard ANSI C and are designed for easy porting and compiling on various target operating systems and environments.

    - Code: C source code / Visual C++ 6.0 or GNU development tools (GCC, Make, etc).

    - Internet connection

  • Target Device

    The device code should have a small footprint to integrate into the application and to be portable. Gracenote MediaVOCS for devices is compiled into object code for the target microprocessor and operating system. The minimum requirements to run on most of these platforms are as follows:

    - Processor: 64-bit or 32-bit microprocessor and operating system (other designs by Gracenote approval).

    - Operating Systems Supported: Most commercial 32-bit Operating Systems, including Linux, VxWorks, QNX, and others upon request.

    - Disk Space for Gracenote Embedded Media Databases varies based on specific product configurations and target regions (contact Gracenote for more details).

    - ROM: 256 bytes for the decryption key

    - CD-ROM/DVD-ROM drive (or other mechanisms for playback of digital music that can pass raw TOC data, if CD recognition is a desired feature): Accuracy to 1/75 of a second.

    - Code: C source code / Visual C++ 6.0 or GNU development tools (GCC, Make, and others).

    - Internet connection

    - Memory: MediaVOCS does not require significant incremental memory above that required for the other Gracenote SDKs used in a particular implementation (MusicID, Playlist, Playlist Plus, or Link).

    Note: System requirements for the underlying third-party ASR/TTS solution will vary depending on vendor and product configuration.
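
The 1/75-second accuracy listed for raw TOC data corresponds to the frame unit of CD audio, in which each second is divided into 75 frames. The following is a minimal sketch in C, assuming track offsets are available as minutes, seconds, and frames. The function names are hypothetical and are not part of any Gracenote SDK.

```c
/* CD audio divides each second into 75 frames. */
#define CD_FRAMES_PER_SECOND 75L

/* Convert a minutes:seconds:frames offset to an absolute frame count. */
long msf_to_frames(int minutes, int seconds, int frames)
{
    return ((long)minutes * 60L + (long)seconds) * CD_FRAMES_PER_SECOND
           + (long)frames;
}

/* Convert a frame count to whole milliseconds, truncating any remainder. */
long frames_to_milliseconds(long frames)
{
    return frames * 1000L / CD_FRAMES_PER_SECOND;
}
```

For example, an offset of 0 minutes, 2 seconds, 0 frames is 150 frames, and a single frame lasts about 13 milliseconds, which is the resolution the requirement refers to.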