Network Based Media Servers: The Next Generation

IP has redefined the mechanics of application media service - moving voice, speech, and other forms of signal processing out into the cloud, into specialized, independently scaling servers, behind open interfaces. Easier and faster application building is the short-term goal. A new set of services may be the ultimate payoff.

By Bill Michael

04/05/2001, 1:30 PM ET

DSP- (and CPU-) based media processing is a necessary feature of many convergent communications apps. Where does the power for this processing - software and hardware - come from? Where and how is it best performed?

This is a non-trivial issue: technically challenging, subject to intense debate - and it bears on questions of reliability and scalability germane to every aspect of new network buildout. To understand the engineering issues involved, it helps to divide media processing into two general classes.

Low-level processing - DTMF recognition/generation, audio call progress analysis, etc., required to make convergent voice apps interface cleanly with the PSTN - is typically performed during every call. So there's little dispute that these functions should be placed in the media gateway (or PBX or other edge-network terminal device) which also runs primary codecs, echo cancellation, noise reduction, and other inline functions.

Application-level media processing - speech recognition, text-to-speech, conference mixing, etc. - is another issue. Where this stuff should live is presently under dispute. Both classic CT and first-generation converged-service architectures posit unbundling applications in tandem with application-level media service, and housing them on a separate "application server," connected across separate call-paths to the gateway. In next-gen architectures, this server lives on the IP side, and IP packet streams are directed to it by a softswitch. Isolating the application from heterogeneous transport mechanics is overwhelmingly agreed upon.

The emerging question is whether the application should be further isolated from other underlying processes, most especially media service. There are numerous arguments in favor of unbundling media service from core application processing. Scalability and throughput are one: Not all calls through an application make identical demands on media processing, so isolating media processing from core application state-machine work would allow more efficient use of media processing horsepower and application CPU time. Perhaps even more relevant to scaling and throughput: Media processing functions such as speech recognition and higher-order "dialogue module processing" (such as when a voice prompt is played and a response obtained) happen in human real-time - in computer terms, a very long time - and are intrinsically unpredictable. It makes little sense to block a lightning-fast, lightweight, deterministic core application process while a caller tries to decide whether to press "1" or "0" and tells the kids to shut up so she can hear. Far more sensible to make both the application process and (to some extent) the media process stateless, and let them communicate in client/server fashion.
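To make the blocking argument concrete, here is a minimal Python sketch, not any vendor's implementation. The function `play_and_collect` is a hypothetical stand-in for a request to a separate media server, and the sleep simply simulates a caller taking "human real-time" to respond. The point is only that the application process keeps serving other calls while one caller dithers.

```python
import asyncio


async def play_and_collect(call_id: str, prompt: str, think_time: float) -> str:
    """Hypothetical media-server request: play a prompt, return the digit pressed.

    The sleep stands in for the caller deciding what to press; a real system
    would send a request over the network and await the response.
    """
    await asyncio.sleep(think_time)
    return "1"


async def handle_call(call_id: str, think_time: float, log: list) -> None:
    # The application logic is tiny and fast; all of the waiting happens
    # inside the (simulated) media-server request.
    digit = await play_and_collect(call_id, "Press 1 or 0", think_time)
    log.append((call_id, digit))


async def run_calls() -> list:
    log = []
    # Both calls are in flight at once; the slow caller does not hold up
    # the fast one.
    await asyncio.gather(
        handle_call("slow-caller", 0.2, log),
        handle_call("fast-caller", 0.01, log),
    )
    return log


completed = asyncio.run(run_calls())
print(completed)
```

In this sketch the fast caller's result is logged first even though the slow caller's call was started first, which is the behavior the client/server split is meant to buy.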

Marketability is another interesting issue. Higher-order media processing is hard to do. Speech rec, text-to-speech, dialogue management, error trapping - all the various details of telephony user-interface engineering - are valuable, distinguishing marks of quality. It makes sense to break the bonds that tie these functions to particular hardware/software platforms and bind them closely to applications, so that they can evolve and find markets independently. This can go beyond simple software-versus-software competition. Imagine, for example, a service bureau that back-ends media/dialogue processing engines with human operators, and offers "guaranteed accurate speech recognition" - performed by the machines, most of the time, and by humans as the occasional fallback.

Extending the vision, it's possible to build media processing engines that go beyond voice and into multimedia space - and some manufacturers are doing so. These products handle a variety of media processing tasks, such as transcoding and video compression.

The Joys of Stupidity

What will next-gen media servers look like? A surprisingly diverse group of vendors is coming into agreement about basic characteristics. In moving from the old world to the new, one guiding principle has been the Internet's embrace of simplicity - or, to be more blunt, stupidity - in the core of the network.

The idea of locating devices that are big, powerful, and basically dumb in the network core extends to various parts of the Internet infrastructure, including backbone routers and switches, optical networking gear, and, now, high-density media servers. To fulfill this vision, in its purest form, the media server needs two key, related characteristics: It must be stateless, and it must be generic. IP makes both of these possible, while still enabling virtually any type of media processing application.

The first condition is somewhat relative. Whether or not a media server is or should be fully stateless depends on the protocols it employs and the applications it supports. There is growing consensus, though, that information about live calls, and state-machine-driven programming functions, belong outside the media server itself, presumably to be shared between an application server and a softswitch. While circuit-switched predecessors may have had a similar idea, a more accurate point of reference is media servers on the web, which typically communicate with applications via HTTP, a stateless protocol that uses a request-response model.

There are some good reasons why the web model can't be applied directly to voice. Yet, more and more, that's the direction in which voice services are moving. The newest protocols being used for VoIP applications incorporate elements like request-response, for example, and will in turn make for a different type of relationship between the media server and the specific applications it supports. Whereas traditional media processing platforms could, for the most part, double as standalone phone switches, current and future generation products are much more focused on performing a limited set of tasks, and maintain less intelligence about the state of calls on which they act. The results are better efficiency and a more cost-effective design.

Related to minimizing the media server's awareness of call states is the idea that media processing is a generic, not application-specific, resource in the network.

Dave Penny, co-founder of Snowshore (Chelmsford, MA - 978-367-8400), a stealth-mode company building next-gen media servers, comments: "What we'd really like to have are true network resources; so that you don't have to have dedicated conference bridges, for example, and instead you could just have a media server with a conferencing resource built into it." It's the job of the media server, Penny says, to do the "heavy lifting" of processing as many IP media streams as possible while maintaining a low level of latency. The application server, sitting at a layer above, can request specific resources, and decide based on its needs what level of control over a call or a user session to give to the media server.
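As a rough illustration of the "generic network resource" idea, the Python sketch below models a media server that exposes conferencing and text-to-speech as named resources behind a single request-response entry point. Everything here is hypothetical: the class names, the resource names, and the HTTP-like status codes are assumptions for illustration, not Snowshore's design or any real protocol. Each request carries all the context it needs, so the server keeps no per-call state between requests.

```python
from dataclasses import dataclass, field


@dataclass
class MediaRequest:
    """Hypothetical self-contained request from an application server."""
    resource: str                      # e.g. "conference" or "tts"
    streams: tuple                     # identifiers of the media streams involved
    params: dict = field(default_factory=dict)


class MediaServer:
    """Generic media server: a table of resources, no per-call state."""

    def __init__(self):
        self._handlers = {
            "conference": self._mix,
            "tts": self._speak,
        }

    def handle(self, req: MediaRequest) -> dict:
        handler = self._handlers.get(req.resource)
        if handler is None:
            return {"status": 404, "reason": f"no such resource: {req.resource}"}
        return {"status": 200, "result": handler(req)}

    @staticmethod
    def _mix(req: MediaRequest) -> str:
        # Placeholder for real conference mixing.
        return f"mixed {len(req.streams)} streams"

    @staticmethod
    def _speak(req: MediaRequest) -> str:
        # Placeholder for real text-to-speech rendering.
        return f"spoke {req.params.get('text', '')!r} to {req.streams[0]}"


server = MediaServer()
conference = server.handle(MediaRequest("conference", ("a", "b", "c")))
greeting = server.handle(MediaRequest("tts", ("a",), {"text": "Welcome"}))
unknown = server.handle(MediaRequest("fax", ("a",)))
print(conference, greeting, unknown)
```

The application server decides which resource to ask for and when; the media server just does the heavy lifting for whatever streams it is handed.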
