OpenMRCP

From FreeSWITCH Wiki

[Diagram: demo]

IMPORTANT: mod_openmrcp has been replaced by mod_unimrcp. The principles of MRCP server operation described on this page remain the same, and the diagram is essentially the same too, except that it uses UniMRCP instead of OpenMRCP.


Overview

Introduction

MRCP (Media Resource Control Protocol) is a protocol that PBXs use to communicate with ASR and TTS engines. In version 1 of the MRCP spec, RTSP is used for session setup and RTP for media streaming. In version 2, SIP replaces RTSP for session setup, while RTP is still used for media streaming.

OpenMRCP is a full stack and can be used as the basis of either an MRCP server or an MRCP client. It uses Sofia-SIP for the SIP stack and a custom-developed RTP library. OpenMRCP was sponsored by Cepstral, a leading TTS voice provider.

Current Status

OpenMRCP was officially released on Monday, August 20th (see the Press Release).

Supports both v1 and v2 of the MRCP protocol.

See Fisheye to track the Subversion repository.

Softswitch Integration

An MRCP client has been integrated into FreeSWITCH as the mod_openmrcp module.

Using OpenMRCP

The following instructions are geared towards developers who want to integrate OpenMRCP into a product. Most people will probably want to use OpenMRCP as a module in an existing softswitch, for example through mod_openmrcp, which is included with FreeSWITCH.

Downloading/Building the code

svn co http://svn.openmrcp.org/svn/openmrcp/trunk openmrcp_trunk

To build the code, see the INSTALL file for instructions. Here is an example of building on Linux:

Type This
./bootstrap
./configure --with-apr=/usr/src/freeswitch/libs/apr --with-apr-util=/usr/src/freeswitch/libs/apr-util


Informational Tip

Because --with-sofia-sip is not specified in this example, you will need to have a /usr/local/include/sofia-sip-X.XX directory. Download sofia-sip from its main web page and install it.


 


Running the server

cd platform/server
./openmrcpserver

Running the client: TTS

There is a default "demo" TTS server that just streams from a file instead of actually converting any text to speech.

Create a demo.pcm file

By default, the server will use a "demo synthesizer" that just streams the data that is in the demo.pcm file. For testing purposes, you will need to create your own demo.pcm file and put it in the correct directory.

  • Go to the platform/server directory
  • Create a test audio file called demo.pcm with the following characteristics
    • 8000 Hz sample rate
    • 16-bit
    • 1-channel
    • Signed linear (e.g., PCM)

SoX command to do this (recent SoX releases spell the -w -s options as -b 16 -e signed-integer):

sox foo.wav -r 8000 -w -s -c 1 -t raw demo.pcm

Simple bash script to convert a wav file to pcm with a bit of error checking.

#!/bin/bash
# Convert NAME.wav to NAME.pcm (8000 Hz, 16-bit, mono, signed linear).
if [ -z "$1" ] || [ ! -e "$1.wav" ]; then
    echo "You must supply a valid filename, omit the .wav extension" >&2; exit 1
fi
echo -n "Converting $1.wav to $1.pcm: "
# Quote the filenames so that names containing spaces survive.
sox "$1.wav" -r 8000 -w -s -c 1 -t raw "$1.pcm" >/dev/null 2>&1 && echo "OK" && exit 0
echo "ERROR" && exit 1
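
Because raw PCM has no header, the characteristics listed above fix the data rate at 8000 samples/s x 2 bytes x 1 channel = 16000 bytes per second, so the file size alone tells you how long the audio plays. The sketch below uses that to report a file's duration and to write a silent placeholder file when no WAV source is at hand. The helper names pcm_duration_ms and make_silence_pcm are hypothetical (they are not part of OpenMRCP), and a silent file only proves the plumbing works; you will hear nothing.

```shell
#!/bin/bash
# Sketch only: these helpers are hypothetical, not part of OpenMRCP.

# sample rate (Hz) * bytes per 16-bit sample * channels
BYTES_PER_SECOND=$((8000 * 2 * 1))

# Print the playing time of a headerless PCM file in milliseconds.
#   $1 = path to the .pcm file
pcm_duration_ms() {
    local size
    size=$(wc -c < "$1") || return 1
    echo $(( size * 1000 / BYTES_PER_SECOND ))
}

# Write N seconds of silence (all-zero samples) as a placeholder file.
#   $1 = output path, $2 = length in whole seconds
make_silence_pcm() {
    head -c "$(( $2 * BYTES_PER_SECOND ))" /dev/zero > "$1"
}
```

For example, make_silence_pcm demo.pcm 2 followed by pcm_duration_ms demo.pcm prints 2000.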

Starting server

The client needs something to connect to, so start the server first:

cd platform/server
./openmrcpserver

Starting client

cd platform/client 
./openmrcpclient 

To show the help options at the client prompt:

> help

Connect to server

> create 0 

The 0 stands for the 0th session. The client can juggle multiple sessions, and each one is identified by a slot number.


Create a TTS channel

> add 0 0  

The first 0 is the session slot; the second 0 means to create a channel of type TTS.

Send MRCP msg

The following will instruct the MRCP server to convert the given text to audio and stream it back to the client via RTP. The speak.msg file is a test message that is included in the OpenMRCP source checkout.

> msg 0 /path/to/test/parsertest/v2/speak.msg

0 is the session slot; the second parameter is the path to the file containing the actual MRCP control message.

Informational Tip

You don't need to modify Channel-identifier manually in the .msg file, as all session (context) specific values will be overridden automatically before the message is sent.
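
The three client commands used so far (create, add, msg) always share the same session slot, so the whole sequence can be generated from three values. The sketch below prints that sequence; mrcp_session_commands is a hypothetical helper, not part of OpenMRCP, and whether openmrcpclient accepts these commands piped on standard input rather than typed at its prompt is an assumption this page does not confirm.

```shell
#!/bin/bash
# Sketch only: mrcp_session_commands is a hypothetical helper, not part of OpenMRCP.

# Print the client commands for one session, one per line.
#   $1 = session slot
#   $2 = channel type (0 = TTS, 1 = ASR, as used on this page)
#   $3 = path to the .msg file holding the MRCP control message
mrcp_session_commands() {
    local slot=$1 channel_type=$2 msg_file=$3
    printf 'create %s\n' "$slot"
    printf 'add %s %s\n' "$slot" "$channel_type"
    printf 'msg %s %s\n' "$slot" "$msg_file"
}
```

Running mrcp_session_commands 0 0 /path/to/test/parsertest/v2/speak.msg prints the TTS sequence shown above, ready to paste at the client prompt.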


 


Verifying it worked

The client will create a .pcm file such as synth_result_0.pcm in the same directory as the binary. Convert this to a WAV file using sox:

sox -t raw -r 8000 -c 1 -w -s synth_result_0.pcm synth_result_0.wav
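
Before converting, it can help to confirm that the client actually wrote audio. The sketch below assumes the result file uses the same 16-bit format as demo.pcm, in which case it should be non-empty and contain a whole number of 2-byte samples. The helper name looks_like_pcm16 is hypothetical; passing the check does not prove the content is valid speech, only that the size is plausible.

```shell
#!/bin/bash
# Sketch only: looks_like_pcm16 is a hypothetical helper, not part of OpenMRCP.

# Return 0 if the file exists, is non-empty, and its size is a multiple
# of 2 bytes (one 16-bit sample); return 1 otherwise.
#   $1 = path to the .pcm file
looks_like_pcm16() {
    local size
    [ -f "$1" ] || return 1
    # The arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -gt 0 ] && [ $(( size % 2 )) -eq 0 ]
}
```

For example: looks_like_pcm16 synth_result_0.pcm && echo "ready to convert".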

Another way to verify that it worked is to use a packet sniffer like Wireshark. Capture while performing the steps above, then go to the Statistics menu, choose "VoIP Calls", highlight the call and click "Player", tick the checkbox and click "Play".

Running the client: ASR

Connect to server

> create 0 

The 0 stands for the 0th session. The client can juggle multiple sessions, and each one is identified by a slot number.

Create an ASR channel

This basically tells the server: "Hey, I'm going to be sending you some data for your ASR engine to interpret. Set up a media port and an MRCP control port, and send me the port numbers."

> add 0 1  

0 is the session slot; 1 means to create a channel of type ASR.

Send MRCP msg

The following will instruct the MRCP server to start performing speech recognition on the data in the RTP media stream that the client is presumably sending. The recognize.msg file is a test message that is included in the OpenMRCP source checkout.

> msg 0 /path/to/test/parsertest/v2/recognize.msg

0 is the session slot; the second parameter is the path to the file containing the actual MRCP control message.

Informational Tip

You don't need to modify Channel-identifier manually in recognize.msg, as all session (context) specific values will be overridden automatically before the message is sent.


 


Sending media

If the file speech_to_recog.pcm is present in the same directory as the openmrcpclient executable, the client should also stream the audio in this file when an MRCP message is sent.

Troubleshooting/FAQ

How do I see more logging statements?

Issue "loglevel 7" on the CLI to get more logs.

Should I call init on the client?

Sofia-sip has some problems with re-invites. However, there is no scenario in which "init" is required; it is just an optional command to ping the server. It is recommended not to use it at the moment.

Can I run server and client on different boxes?

You can set the network interface (IP address) of both the client and the server. The default is localhost, which is fine when both run on the same machine. To run them on different machines, specify the interfaces explicitly (e.g. openmrcpserver -i 192.168.0.1 and openmrcpclient -c 192.168.0.2 -s 192.168.0.1).

The server is not binding to port 544

You must run the process as root to bind to a reserved port (< 1024).

Informational Tip

The code has since been updated to bind to port 1544 by default.
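
The rule behind this answer can be written as a one-line check: on a typical Unix system, a bind needs root only when the port is below 1024 and the process is not already running as uid 0. The sketch below is illustrative; port_needs_privilege is a hypothetical helper, and some systems relax this rule through their own configuration, which the check does not account for.

```shell
#!/bin/bash
# Sketch only: port_needs_privilege is a hypothetical helper, not part of OpenMRCP.

# Return 0 (true) if binding the port would normally require root.
#   $1 = port number
#   $2 = user id to test against (defaults to the current user)
port_needs_privilege() {
    local port=$1 uid=${2:-$(id -u)}
    [ "$port" -lt 1024 ] && [ "$uid" -ne 0 ]
}
```

For example, port_needs_privilege 544 1000 succeeds, while port_needs_privilege 1544 1000 fails, which is why moving the default above 1024 removes the need to run as root.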


 


How do I use the Cepstral TTS engine?

First, please follow the instructions in "How do I build shared object .so libraries?" below.

The --with-swift configure option tells the build where the Cepstral SDK is installed. By default it looks in /opt/swift, so you only need to pass --with-swift if you installed the Cepstral SDK in a non-default location.

Type This
./configure --with-swift=/path/to/cepstral


Run-time linking is used for plugins, so you can load the Cepstral plugin into openmrcpserver by specifying the location of libswiftplugin.so:

Type This
./openmrcpserver --synth-plugin=/usr/local/openmrcp/lib/libswiftplugin.so


Multiple Servers and MRCP

Question

The MRCP v2 specification's architecture diagram (Figure 1) shows one Media Resource Server containing multiple processing resources (speech recognition, speaker verification, speaker identification and speech synthesis) connected to a client. It appears that the MRCP v2 layer in the server performs mixing and routing of sound streams between the speech processors. The RTP stream goes directly between the source/sink and the server, without passing through the MRCP client.

The architecture in the diagram at the top of this page looks significantly different, with RTP streams going through the server, and multiple speech processing servers.

Please explain why these architectures look so different! Does OpenMRCP mix RTP streams at the client side?

Answer

There are many possible ways to configure a setup, and the MRCP v2 spec diagram basically folds all these possibilities into a single diagram. The architecture diagram above instead shows just one valid configuration out of the many possibilities. In the particular configuration that is illustrated, there are two separate MRCP servers: one performs ASR and the other performs TTS. If a caller dialed into an IVR that used both ASR and TTS resources, the outgoing audio from the caller's phone would go into FreeSWITCH, then out of mod_openmrcp and into the recognizer server. The incoming audio at the caller's phone would come from FreeSWITCH, which in turn would have received it from the (separate) TTS server.

Another valid configuration (not shown in the diagram) is to have a single MRCP server that performs both ASR and TTS. This is more like the configuration implied by the MRCP v2 spec diagram. In this case there would be only one RTP stream path: the client connected to FreeSWITCH has a two-way RTP stream between itself and FreeSWITCH, and the embedded MRCP client in mod_openmrcp has a two-way RTP stream with the single MRCP server. The outgoing audio from mod_openmrcp carries the audio to be used by the recognizer resource, while the incoming audio contains any TTS audio generated by the TTS resource.

libdemoplugin as a shared library?

Question

How can I build libdemoplugin as a shared library instead of a static library, so that I can load it as the default synth plugin in OpenMRCP?

Answer

Please see "How do I build shared object .so libraries?" below.

How do I build shared object .so libraries?

By default, depending on your setup, you might not be able to build .so libraries without some tweaking. If openmrcp is built using the libtool that ships with the APR that ships with FreeSWITCH, .so libraries will NOT be built. There are two ways around that:

Configure OpenMRCP against standalone APR

By changing openmrcp to configure against the standalone APR rather than the APR that ships with FreeSWITCH, it will use a version of libtool that is able to successfully build .so libs.

In order to make this work, you will have to first download APR (Apache Portable Runtime), extract it and build it.

FreeSWITCH 1.0 ships with APR 1.2, so that is the version you should use for the standalone APR build.


Then, when configuring openmrcp, instead of

./configure --with-apr=/usr/src/freeswitch_trunk/libs/apr 

run

./configure --with-apr=/usr/src/apr

Hack configure.in script

This change prevents configure from using the FreeSWITCH version of APR and its corresponding libtool version, so it will fall back to the system default libtool.

Here is an example diff of the changes to make to the openmrcp configure.in file:

Index: configure.in
===================================================================
--- configure.in        (revision 466)
+++ configure.in        (working copy)
@@ -25,8 +25,8 @@
 MRCP_CHECK_SWIFT
 
 
-LIBTOOL="`$apr_config --apr-libtool` --silent"
-AC_SUBST(LIBTOOL)
+#LIBTOOL="`$apr_config --apr-libtool` --silent"
+#AC_SUBST(LIBTOOL)
 
 
 AC_SUBST(ac_aux_dir)
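
After applying either workaround and rebuilding, one way to confirm that shared libraries were actually produced is to search the build tree for them. The sketch below is a generic check rather than anything OpenMRCP provides: list_shared_objects is a hypothetical helper, and it relies on the general libtool habit of placing built libraries in .libs subdirectories, which find will reach when given the top of the source tree.

```shell
#!/bin/bash
# Sketch only: list_shared_objects is a hypothetical helper, not part of OpenMRCP.

# Print every shared object found under a directory, sorted by path.
# Matches both plain (libfoo.so) and versioned (libfoo.so.0.0.0) names.
#   $1 = directory to search (defaults to the current directory)
list_shared_objects() {
    find "${1:-.}" -type f \( -name '*.so' -o -name '*.so.*' \) | sort
}
```

If list_shared_objects openmrcp_trunk prints nothing, the build still produced only static libraries.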

Related Specifications

  • NLSML - Natural Language Semantics Markup Language for the Speech Interface Framework