mod_unimrcp

About

mod_unimrcp is the FreeSWITCH module that allows communication with Media Resource Control Protocol (MRCP) servers. MRCP allows client machines to control media resources on a network. MRCP version 1 uses the Real Time Streaming Protocol (RTSP) while version 2 uses the Session Initiation Protocol (SIP) to negotiate the MRCP connection. mod_unimrcp allows FreeSWITCH to act as such a client. Servers are supplied by numerous vendors such as Cepstral, Voxeo, Nuance, and many others.

As of 2022.08.29 mod_unimrcp was removed from FreeSWITCH tree and placed in https://github.com/freeswitch/mod%5Funimrcp

Some of the media resources that can be controlled via MRCP:

Automatic speech recognition (ASR)
Text to speech (TTS)
Speaker verification and identification (not currently supported by mod_unimrcp)

Click here to expand Table of Contents

1 Compatibility
2 Building
3 Configuration
4 Example Configuration
- 4.1 unimrcp.conf.xml
- 4.2 mrcp_profiles
5 Using From FreeSWITCH
- 5.1 TTS
- 5.2 ASR Parameters
- 5.3 ASR Grammars
6 See Also

Compatibility

mod_unimrcp has been tested with the following products

UniMRCP Server
Speech Technology Center, VoiceNavigator
Nuance Speech Server 5.0/5.1
Voxeo Prophecy 8.0
Loquendo Suite 7.0
Acapela TTS for Windows Server and Linux Server 6.400 with MRCPv2 add-on (As Acapela MRCPv2 add-on is based on UniMRCP Server: Reference)
IVONA Telecom 1.6

Building

Add asr_tts/mod_unimrcp to modules.conf
Build FreeSWITCH with 'make install'

Configuration

Edit conf/autoload_config/unimrcp.conf.xml
Add MRCP profiles to conf/mrcp_profiles/
Add <load module="mod_unimrcp"/> to autoload_configs/modules.conf.xml

Example Configuration

unimrcp.conf.xml

<configuration name="unimrcp.conf" description="UniMRCP Client">
 <settings>
   <param name="default-tts-profile" value="voxeo-prophecy8.0-mrcp1"/>
   <param name="default-asr-profile" value="voxeo-prophecy8.0-mrcp1"/>
   <param name="log-level" value="DEBUG"/>
   <param name="max-connection-count" value="100"/>
   <param name="offer-new-connection" value="1"/>
 </settings>
 <profiles>
   <X-PRE-PROCESS cmd="include" data="../mrcp_profiles/*.xml"/>
 </profiles>
</configuration>

default-tts-profile - what profile to use for client settings for contacting a TTS server when one isn't specified.
default-asr-profile - what profile to use for client settings for contacting an ASR server when one isn't specified.
log-level - log level for UniMRCP client library, valid values: EMERGENCY|ALERT|CRITICAL|ERROR|WARNING|INFO|DEBUG.
max-connection-count - max number of MRCPv2 connections to manage.
offer-new-connection - no idea... need to check UniMRCP documentation

mrcp_profiles

An MRCP profile allows you to define a configuration for a specific MRCP server. This allows you to integrate different types of MRCP servers with FreeSWITCH. Each profile defines the MRCP version to use, the client and server addresses and ports, codec preferences, and any default parameters to send to the server.

Default MRCP parameters for SPEAK and RECOGNIZE are set in synthparams and recogparams, respectively. See the appropriate RFC and your MRCP server documentation for the available MRCP parameters that may be used.

MRCPv1 example

MRCP version 1 is usually supported by MRCP servers. It uses RTSP over a TCP connection to send requests to the server. RTP is used to send audio between the client and server.

<include>
 <profile name="mrcpserver01" version="1">
   <param name="server-ip" value="10.10.5.1"/>
   <param name="server-port" value="554"/>
   <param name="resource-location" value=""/>
   <param name="speechsynth" value="synthesizer"/>
   <param name="speechrecog" value="recognizer"/>
   <param name="rtp-ip" value="10.10.5.2"/>
   <param name="rtp-port-min" value="4000"/>
   <param name="rtp-port-max" value="5000"/>
   <param name="codecs" value="PCMU PCMA L16/96/8000"/>
   <synthparams>
   </synthparams>
   <recogparams>
       <param name="start-input-timers" value="false"/>
   </recogparams>
 </profile>
</include>

server-ip - IP address of MRCP server
server-port - RTSP port. Possibly 554 or 1554. Check your MRCP server documentation.
resource-location - "media" or blank
speechsynth - "speechsynthesizer" or "synthesizer"
speechrecog - "speechrecognizer" or "recognizer"
rtp-ext-ip - External IP address for client RTP
rtp-ip - IP address for client RTP
rtp-port-min - Start of RTP port range
rtp-port-max - End of RTP port range
playout-delay -
max-playout-delay -
ptime - ptime to negotiate with MRCP server
codecs - codec negotiation preference. Server will probably support PCMU, PCMA, or L16

MRCPv2 example

MRCP version 2 is currently a draft standard, but is supported by some MRCP servers like Nuance Speech Server. MRCPv2 uses SIP to manage sessions and its own protocol to send speech requests. RTP is used to send audio between the client and server.

<include>
 <profile name="mrcpserver02" version="2">
   <param name="client-ip" value="10.10.5.2"/>
   <param name="client-port" value="5090"/>
   <param name="server-ip" value="10.5.5.152"/>
   <param name="server-port" value="5060"/>
   <param name="sip-transport" value="udp"/>
   <param name="rtp-ip" value="10.10.5.2"/>
   <param name="rtp-port-min" value="4000"/>
   <param name="rtp-port-max" value="5000"/>
   <param name="codecs" value="PCMU PCMA L16/96/8000"/>
   <synthparams>
   </synthparams>
   <recogparams>
       <param name="start-input-timers" value="false"/>
   </recogparams>
 </profile>
</include>

client-ext-ip - External SIP IP address of MRCP client
client-ip - SIP IP address of MRCP client
client-port - SIP port of MRCP client (make sure it doesn't conflict with conf/sip_profiles)
server-ip - SIP IP address of MRCP server
server-port - SIP port of MRCP server
force-destination -
sip-transport - "udp" or "tcp"
ua-name - User agent name
sdp-origin -
rtp-ext-ip - External IP address for client RTP
rtp-ip - IP address for client RTP
rtp-port-min - Start of RTP port range
rtp-port-max - End of RTP port range
playout-delay -
max-playout-delay -
ptime - ptime to negotiate with MRCP server
codecs - codec negotiation preference. Server will probably support PCMU, PCMA, or L16

Sample profiles

The FreeSWITCH trunk contains conf/mrcp_profiles for various MRCP servers. A few of those profiles are described below:

Speech Technology Center, VoiceNavigator

<include>
 <profile name="stc-vn-mrcp1" version="1">
   <param name="server-ip" value="10.5.5.152"/>
   <param name="server-port" value="8000"/>
   <param name="resource-location" value=""/>
   <param name="speechsynth" value="tts"/>
   <param name="speechrecog" value="asr"/>
   <param name="rtp-ip" value="10.10.5.2"/>
   <param name="rtp-port-min" value="32768"/>
   <param name="rtp-port-max" value="33268"/>
   <param name="codecs" value="PCMU PCMA L16/96/8000"/>
 </profile>
</include>

server-ip - IP address of MRCP server
server-port - RTSP port. 8000
resource-location - "media" or blank
speechsynth - "tts"
speechrecog - "asr"
rtp-ext-ip -
rtp-ip - IP address for client RTP
rtp-port-min - "32768"
rtp-port-max - "33268"
playout-delay -
max-playout-delay -
ptime - ptime to negotiate with MRCP server
codecs - codec preference

Voxeo Prophecy 8.0

<include>
 <profile name="voxeo-prophecy8.0-mrcp1" version="1">
   <param name="server-ip" value="99.185.85.31"/>
   <param name="server-port" value="554"/>
   <param name="resource-location" value=""/>
   <param name="speechsynth" value="synthesizer"/>
   <param name="speechrecog" value="recognizer"/>
   <param name="rtp-ip" value="10.10.5.2"/>
   <param name="rtp-port-min" value="4000"/>
   <param name="rtp-port-max" value="5000"/>
   <param name="codecs" value="PCMU PCMA L16/96/8000"/>
 </profile>
</include>

server-ip - IP address of MRCP server
server-port - RTSP port. 4900 or 554
resource-location - "media" or blank
speechsynth - "speechsynthesizer" or "synthesizer"
speechrecog - "speechrecognizer" or "recognizer"
rtp-ext-ip -
rtp-ip - IP address for client RTP
rtp-port-min -
rtp-port-max -
playout-delay -
max-playout-delay -
ptime - ptime to negotiate with MRCP server
codecs - codec preference

Nuance Speech Server 5.0

<include>
 <profile name="nuance5-mrcp2" version="2">
   <param name="client-ip" value="auto"/>
   <param name="client-port" value="5090"/>
   <param name="server-ip" value="10.5.5.152"/>
   <param name="server-port" value="5060"/>
   <param name="sip-transport" value="udp"/>
   <param name="rtp-ip" value="auto"/>
   <param name="rtp-port-min" value="4000"/>
   <param name="rtp-port-max" value="5000"/>
   <param name="codecs" value="PCMU PCMA L16/96/8000"/>
 </profile>
</include>

client-ext-ip -
client-ip - IP address of MRCP client
client-port - SIP port of MRCP client (make sure it doesn't conflict with conf/sip_profiles)
server-ip - IP address of MRCP server
server-port - SIP port of MRCP server
force-destination -
sip-transport - "udp" or "tcp"
ua-name - User agent name
sdp-origin -
rtp-ext-ip -
rtp-ip - IP address for client RTP
rtp-port-min -
rtp-port-max -
playout-delay -
max-playout-delay -
ptime - ptime to negotiate with MRCP server
codecs - codec preference

Loquendo Suite 7.0

<include>
 <profile name="loquendo-7-mrcp2" version="2">
   <param name="client-ip" value="auto"/>
   <param name="client-port" value="5090"/>
   <param name="server-ip" value="127.0.0.1"/>
   <param name="server-port" value="5060"/>
   <param name="sip-transport" value="udp"/>
   <param name="rtp-ip" value="10.10.5.2"/>
   <param name="rtp-port-min" value="4000"/>
   <param name="rtp-port-max" value="5000"/>
   <param name="codecs" value="PCMU PCMA L16/96/8000"/>
   <param name="jsgf-mime-type" value="application/jsgf"/>
 </profile>
 </include>

client-ip - IP address of MRCP client
client-port - SIP port of MRCP client (make sure it doesn't conflict with conf/sip_profiles)
server-ip - IP address of MRCP server
server-port - SIP port of MRCP server (this defaults to 5060 in the Loquendo config, which may conflict with FS)
sip-transport - "udp" or "tcp"
rtp-ip - IP address for client RTP
rtp-port-min -
rtp-port-max -
codecs - codec preference

IVONA Telecom 1.6 MRCPv1

<include>
   <profile name="ivona16-mrcp1" version="1">
     <param name="server-ip" value="10.2.7.214"/>
     <param name="server-port" value="4900"/>
     <param name="resource-location" value="media"/>
     <param name="speechsynth" value="speechsynthesizer"/>
     <param name="speechrecog" value="speechrecognizer"/>
     <param name="rtp-ip" value="10.10.5.2"/>
     <param name="rtp-port-min" value="4000"/>
     <param name="rtp-port-max" value="5000"/>
     <param name="rtcp" value="1"/>
     <param name="rtcp-bye" value="2"/>
     <param name="rtcp-tx-interval" value="5000"/>
     <param name="rtcp-rx-resolution" value="1000"/>
     <param name="playout-delay" value="100"/>
     <param name="max-playout-delay" value="200"/>
     <param name="ptime" value="20"/>
     <param name="codecs" value="PCMA"/>
   </profile>
 </include>

Lumenvox

<include>
 <profile name="lumenvox" version="2">
  <param name="client-ip" value="FREESWITCH_IP"/>
  <param name="client-port" value="25097"/>
  <param name="server-ip" value="LUMENVOX_IP"/>
  <param name="server-port" value="5060"/>
  <param name="sip-transport" value="udp"/>
  <param name="rtp-ip" value="FREESWITCH_IP"/>
  <param name="rtp-port-min" value="28000"/>
  <param name="rtp-port-max" value="29000"/>
  <param name="codecs" value="PCMU PCMA L16/96/8000 telephone-event/101/8000"/>
  <synthparams>
  </synthparams>
  <recogparams>
      <param name="start-input-timers" value="false"/>
  </recogparams>
 </profile>
</include>

Never use any "auto" values for IP addresses in the configuration.

Note: For Lumenvox, if you want to run CPA, you need to provide the builtin something like this:

detect_speech unimrcp builtin:special/cpa lumenvox

You can use the following extension for testing:

<extension name="unimrcp">
<condition field="destination_number" expression="^5$">
  <action application="answer"/>
  <action application="speak" data="unimrcp:lumenvox||FreeSWITCH is awesome"/>
  <action application="sleep" data="500"/>
</condition>
</extension>

UniMRCP Server

This is for the flite plugin of unimrcpserver (it assumes that unimrcpserver runs on same server).
The media engine/rtpfactory profile of the unimrcpserver should use the same L16/96/8000 codec.
You will need r1027 of unimcrp. The unimrcp flite plugin will use only voice "awb" and doesn't support SSML or prosody markers

 <include>
 <profile name="unimrcpserver-mrcp2" version="2">
   <param name="server-ip" value="auto"/>
   <param name="server-port" value="8070"/>
   <param name="resource-location" value=""/>
   <param name="client-ip" value="10.10.5.2"/>
   <param name="client-port" value="5090"/>
   <param name="sip-transport" value="tcp"/>
   <param name="ua-name" value="Freeswitch"/>
   <param name="sdp-origin" value="Freeswitch"/>
   <param name="rtp-ip" value="10.10.5.2"/>
   <param name="rtp-port-min" value="4000"/>
   <param name="rtp-port-max" value="5000"/>
   <param name="codecs" value="L16/96/8000"/>
   <param name="speechsynth" value="flite"/>
 </profile>
 </include>

Hint: It's crucial that you provide the correct value for "speechsynth" to make it work! It MUST be the same as specified in unimrcpserver.xml! Otherwise you'll run into issues like not beeing able to return to dialplan: http://code.google.com/p/unimrcp/issues/detail?id=94

Using From FreeSWITCH

When specifying the TTS/ASR engine to use in FreeSWITCH, use

unimrcp:profile_name

Alternatively, you may use:

unimrcp

and the default profile specified in unimrcp.conf.xml will be selected.

TTS

mod_unimrcp supports plain text and Speech Synthesis Markup Language (SSML). TTS can be sent using either speak or playback (if prefixed with say:unimrcp:[optional voice]:<TTS text>).

Lua

session:set_tts_parms("unimrcp:nuance5-mrcp2", "Donna");
session:speak("{speech-language=en-US,prosody-rate=slow}Hello, from FreeSWITCH.");

Perl

$session->set_tts_parms("unimrcp:nuance5-mrcp2", "Donna");
$session->speak("{speech-language=en-US,prosody-rate=slow}Hello, from FreeSWITCH.");

With SSML added

$session->set_tts_parms("unimrcp:nuance5-mrcp2", "Donna");
$session->speak("<?xml >Hello, <emphasis level='strong'>John</emphasis> how are you?</>");

JavaScript

session.speak("unimrcp:nuance5-mrcp2", "Donna", 
"{speech-language=en-US,prosody-rate=slow}Hello, from FreeSWITCH.");

The string after the : is the desired profile from your config.

C/C++

switch_ivr_play_file(session, fh, 
"say:unimrcp:Donna:{speech-language=en-US,prosody-rate=slow}Hello, from FreeSWITCH.", args);

switch_ivr_play_file(session, fh, 
"say:unimrcp:Donna:{speech-language=en-US,prosody-rate=slow}Hello, from FreeSWITCH.", args);

Dialplan XML

<extension name="unimrcp">
  <condition field="destination_number" expression="^5$">
    <action application="answer"/>
    <action application="set" data="tts_engine=unimrcp:unimrcpserver-mrcp2"/>
    <action application="set" data="tts_voice=awb"/>
    <action application="speak" data="This is our English text-to-speech system"/>
    <action application="sleep" data="500"/>
  </condition>
</extension>

ASR Parameters

MRCP parameters are passed directly to the MRCP server in the RECOGNIZE request.

!!! The following parameters are mod_unimrcp-specific and are not passed to the MRCP server:

start-recognize: (true|false) If false, loading grammars will not automatically start recognition. Default is true.
define-grammar: (true|false) If true, DEFINE-GRAMMAR request is sent prior to RECOGNIZE. Default is false.
name: Can be used to set grammar name in mod_dptools: play_and_detect_speech.

ASR Grammars

ASR grammars may be inline, in a local file, referenced by a URL, built-in, or cached.

mod_unimrcp supports the following grammar formats for inline and local file grammars:

Speech Recognition Grammar Specification (SRGS) - XML or ABNF notation
Java Speech Grammar Format (JSGF)
Nuance Grammar Specification Language (GSL)

Your MRCP server will most likely only support some of these formats - check your MRCP server documentation.

Inline Grammars

Inline grammars are prefixed with "inline:". Inline grammars are cached on the server with DEFINE-GRAMMAR. Subsequent pause/resume calls will reference the grammar by its name.

Grammars in local files

Grammars stored in local files will be loaded and cached on the server with DEFINE-GRAMMAR. All local files must end in the ".gram" extension. If the local files are in the freeswitch/grammars directory, they can be referenced in by file name without the extension. If the file is located in a different directory, the entire path must be specified. For example, /usr/local/freeswitch/grammars/yesno.gram can be referenced as "yesno" or as "/usr/local/freeswitch/grammars/yesno", but not as "yesno.gram" or "/usr/local/freeswitch/grammars/yesno.gram".

Built-in grammars

Built-in grammars are prefixed with "builtin:"

Cached grammars

Cached grammars are referenced by name prefixed with "session:"

Grammar URL

Only http:// and file:// grammar URLs are currently supported.

JavaScript

var asr = new SpeechDetect(session, "unimrcp");

C/C++

Load grammar

To detect speech the first time during a call:

switch_ivr_detect_speech(session, "unimrcp", "yesno", "yesno-name", "", NULL);

The channel will hold onto the ASR resource for the duration of the call, or until switch_ivr_stop_detect_speech() is called.

For subsequent detects with the same ASR resource:

switch_ivr_detect_speech_load_grammar(session, "yesno", "yesno-name");

ASR parameters

To set specific ASR parameters, add these params inside {} before the grammar, like in the example below.

switch_ivr_detect_speech(session, "unimrcp", "{variable1=val1,variable2=val2}yesno", "yesno-name", "", NULL);

Start timers

By default, the MRCP server will start timers in the RECOGNIZE request. If you specify start-input-timers=false, then the timers will not start until START-INPUT-TIMERS is sent to the server. This allows the client to notify the server when the prompt has finished and the input timers should start.

FreeSWITCH does not call this function in any higher-level interface, so you must create (but don't open) the switch_asr_handle_t, ah, before calling switch_ivr_detect_speech() and pass that handle to the function. Then, you can use that handle to call switch_core_asr_start_timers().

switch_asr_handle_t *ah;
ah = switch_core_session_alloc(session, sizeof(switch_asr_handle_t));
switch_ivr_detect_speech(session, "unimrcp", "{start-input-timers=false}yesno", "yesno-name", "", ah);
... play some prompts ...
... start of input or prompt has finished ...
switch_core_asr_start_timers(ah);

Pause/Resume

Once speech recognition has started, it can be paused and resumed. Pause is useful, because it allows you to stop speech recognition and still keep the resource.

To pause speech recognition:

switch_ivr_pause_detect_speech(session);

To resume speech recognition with the same grammar:

switch_ivr_resume_detect_speech(session);

To resume speech recognition with a different grammar:

switch_ivr_detect_speech_load_grammar(session, "places", "places-name");

mod_unimrcp will stop any executing speech recognition if switch_ivr_detect_speech_load_grammar() is called. The new grammar is then loaded and speech recognition on the new grammar is started.

Attachments:

Mod_openmrcp.gif (image/gif)

Comments:

Where can I find all of built-in grammars ?grammar/booleangrammar/phone Posted by livem at Jan 16, 2019 06:45
with the high level chart below "ToC", there are connections in different and shape, so the red line, blue dash line and dots have different meaning? I have configured this module, it is great, but it is a bit complicated, so, I guess the connections are figured with some points, right? Posted by hain at Mar 09, 2019 14:46
I do not know. I guess that red means media and blue dashes mean control signals? I have never used MRCP. Posted by boteman at Mar 19, 2019 14:32

mod_unimrcp

About​

Compatibility​

Building​

Configuration​

Example Configuration​

unimrcp.conf.xml​

mrcp_profiles​

MRCPv1 example​

MRCPv2 example​

Sample profiles​

Speech Technology Center, VoiceNavigator​

Voxeo Prophecy 8.0​

Nuance Speech Server 5.0​

Loquendo Suite 7.0​

IVONA Telecom 1.6 MRCPv1​

Lumenvox​

UniMRCP Server​

Using From FreeSWITCH​

TTS​

Lua​

Perl​

JavaScript​

C/C++​

Dialplan XML​

ASR Parameters​

ASR Grammars​

Inline Grammars​

Grammars in local files​

Built-in grammars​

Cached grammars​

Grammar URL​

JavaScript​

C/C++​

Load grammar​

ASR parameters​

Start timers​

Pause/Resume​

See Also​

Attachments:​

Comments:​

About

Compatibility

Building

Configuration

Example Configuration

unimrcp.conf.xml

mrcp_profiles

MRCPv1 example

MRCPv2 example

Sample profiles

Speech Technology Center, VoiceNavigator

Voxeo Prophecy 8.0

Nuance Speech Server 5.0

Loquendo Suite 7.0

IVONA Telecom 1.6 MRCPv1

Lumenvox

UniMRCP Server

Using From FreeSWITCH

TTS

Lua

Perl

JavaScript

C/C++

Dialplan XML

ASR Parameters

ASR Grammars

Inline Grammars

Grammars in local files

Built-in grammars

Cached grammars

Grammar URL

JavaScript

C/C++

Load grammar

ASR parameters

Start timers

Pause/Resume

See Also

Attachments:

Comments: