XCON WG                                                      C. Jennings
Internet-Draft                                       Cisco Systems, Inc.
Expires: September 20, 2004                               March 22, 2004

Conference State Markup Language

draft-jennings-mixer-control-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on September 20, 2004.

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

This draft is in a very early stage and has many known mistakes.

Media mixers are capable of a limited number of transformations and combinations of various media streams. This draft describes an XML document that can be used to control the state of a mixer that performs some common media manipulations.

This work is related to the work of the XCON working group but is not part of the scope of that working group. It is being discussed on the xcon@ietf.org mailing list.



Table of Contents

1.  Introduction
2.  Conventions and Definitions
3.  Requirements
4.  Describing Mixer State
4.1  General Conference Parameters
4.2  General Stream Parameters
4.3  Cascading
4.4  Audio
4.5  Video
4.6  Text
5.  Manipulating State
6.  Examples
6.1  Create an Audio Conference with One Participant
6.2  Adding a Stream
6.3  Deleting a Stream
6.4  Creating a Side Bar
6.5  Adding 5+1 Video Layout
6.6  Changing a video layout
6.7  Controlling a multicast video switcher
6.8  Setting up a Customer Agent Supervisor call
7.  Syntax
8.  IANA
9.  Security
10.  To Do
11.  Acknowledgments
Normative References
Informative References
Author's Address
Intellectual Property and Copyright Statements





1. Introduction

Conference mixers have many controls that change how the media is combined for each media stream in the conference. Applications need to be able to control this state on a mixer. The description of this control needs to be rich enough to allow common operations in an interoperable way; extensible, since there will always be new operations; yet explicit and clear to the user what media flow is required, so that a highly optimized implementation is possible.

The general approach is to model the various parameters that can be controlled to set up the media flow in the mixer. It is assumed that some other protocol can be used to play announcements, record, or be notified about DTMF from any stream or conference using the identifier for that stream or conference. This work only describes how to control the media mixers.

Floor control is done at a higher layer than this media control but a floor control command may cause the application controlling the mixer to send new state to the mixer indicating the change in which streams are allowed to contribute media to the mix.




2. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119[1].

Definitions:

Stream:  A flow of media, such as RTP, to or from an end client.
Conference:  A set of related streams and media transformations that result in one virtual result of a single media type. This may be slightly different from the overall meeting, which may contain several media types and some sidebar mixes. This overall meeting would result in several conferences on various mixers that each took care of producing one stream.
RosterId:  An identifier used to correlate media streams that are grouped under a common "user" or "participant" in a conference.



3. Requirements

This work needs to be capable of describing the state for the media transformation required by the use cases in [3]. Modification to the state needs to be idempotent. It needs to be able to fully describe the state of a mixer other than the state associated with the current media or the signaling to set up the streams.




4. Describing Mixer State

The conference consists of some parameters that affect the whole conference and a set of input and output streams of various media types.

For example, the XML fragment below describes a simple audio conference with streams to two audio endpoints.

 
<conference id="conf1" 
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
               xsi:noNamespaceSchemaLocation="confControl2.xsd">
	<stream id="in1" type="audio" direction="in">
		<rosterId value="part1"/>
		<rtp port="20123" ip="1.2.3.4"/>
	</stream>
	<stream id="in2" type="audio" direction="in">
		<rosterId value="part2"/>
		<rtp port="20124" ip="1.2.3.4"/>
	</stream>
	<stream id="out1" type="audio" direction="out">
		<rosterId value="part1"/>
		<rtp port="20123" ip="1.2.3.4"/>
	</stream>
	<stream id="out2" type="audio" direction="out">
		<rosterId value="part2"/>
		<rtp port="20124" ip="1.2.3.4"/>
	</stream>
</conference>
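
As a rough illustration of how a controlling application might read such a document, the following Python sketch parses a conference fragment with the standard library and lists its streams. The helper name summarize_conference and the shape of the returned tuples are assumptions made for this example; they are not defined by this draft.

```python
import xml.etree.ElementTree as ET

# A fragment in the style of the example above (schema attributes omitted).
EXAMPLE = """
<conference id="conf1">
  <stream id="in1" type="audio" direction="in">
    <rosterId value="part1"/>
    <rtp port="20123" ip="1.2.3.4"/>
  </stream>
  <stream id="out1" type="audio" direction="out">
    <rosterId value="part1"/>
    <rtp port="20123" ip="1.2.3.4"/>
  </stream>
</conference>
"""

def summarize_conference(xml_text):
    """Return (conference id, [(stream id, type, direction, rosterId), ...])."""
    root = ET.fromstring(xml_text)
    streams = []
    for stream in root.findall("stream"):
        roster = stream.find("rosterId")
        streams.append((
            stream.get("id"),
            stream.get("type"),
            stream.get("direction"),
            # rosterId is optional, so tolerate its absence.
            roster.get("value") if roster is not None else None,
        ))
    return root.get("id"), streams

conf_id, streams = summarize_conference(EXAMPLE)
```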

4.1 General Conference Parameters

expectedNumStreams - This sets the number of input streams eventually expected for each media type. In some cases, such as when this is one or two, this allows the mixer to pick a substantially different optimization approach.

TODO - Is there any need for this, or is it just an optimization hint?

4.2 General Stream Parameters

Each stream has a unique identifier that is chosen by the application controlling the mixer. It also has a direction and a media type.

rosterId - This identifier is used to identify the "user" or "participant" that is creating the media in an input stream and is used to correlate different streams usually of different media types. When the video is supposed to follow the active speaker, this provides a way to correlate which video stream corresponds to which speaker.

source - Identifies another conference as the source of this media stream.

rtp - Provides the port and IP of an RTP session that forms the stream.

TODO - need to generalize source in port, ip, other ways to identify a stream such as fid or even a PSTN circuit.

url - Provides the URL for non-RTP media sessions that form a stream.

priority - Provides a relative priority of this stream for being included in the mix. The mixer SHOULD NOT add any streams to the mix that are of a lower priority than some stream it has chosen to omit from the mix. A priority of 1.0 means the mixer should make all efforts to include this media while a priority of 0.0 indicates that the media should only be included as a last resort.
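
The priority rule can be sketched in a few lines of Python. This is only one possible reading, assuming the mixer has a fixed limit on how many streams it can include; the function names select_streams and respects_priority, the max_streams limit, and the tie-break on stream id are all hypothetical and not part of this draft.

```python
def select_streams(priorities, max_streams):
    """Pick at most max_streams stream ids, highest priority first.

    priorities maps stream id -> priority in the range 0.0 to 1.0.
    Ties are broken by stream id, which is an arbitrary choice.
    """
    ranked = sorted(priorities.items(), key=lambda item: (-item[1], item[0]))
    return [stream_id for stream_id, _ in ranked[:max_streams]]

def respects_priority(priorities, chosen):
    """Check that no chosen stream has a lower priority than an omitted one."""
    omitted = [p for s, p in priorities.items() if s not in chosen]
    if not omitted or not chosen:
        return True
    return min(priorities[s] for s in chosen) >= max(omitted)

PRIORITIES = {"in1": 1.0, "in2": 0.2, "in3": 0.8}
chosen = select_streams(PRIORITIES, 2)
```

Taking the highest-ranked streams always satisfies the rule, while a selection such as in1 and in2 would violate it because in3 has a higher priority than in2.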

4.3 Cascading

A conference can specify that one of its input streams actually comes from some named output stream of another conference.

TODO - deal with issues of "gain" for audio, video, text, application.

4.4 Audio

gain - this applies a gain to the stream as it comes in or out of the mix. The gains are specified in dB; a value of -1000 is used to mean the same as negative infinity.

muteOnDtmf - Indicates that the stream should be muted on any input DTMF to the participant of this stream.

automaticGainControl - Indicates whether automatic gain control is applied to the stream.

echoCancelation - Indicates whether echo cancellation is applied to the stream.

announcementGain - specifies a gain for announcements played to the particular stream or played to the whole conference that this stream participates in.

announcementSuppressionGain - specifies a gain in dB for how much the main conference mix is suppressed during an announcement. A suppression of -1000 dB would effectively mute the conference mix.
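
To make the gain convention concrete, the Python sketch below converts a gain in dB into a linear multiplier and treats -1000 dB as silence, as described for the gain parameter above. It assumes the gain is an amplitude ratio (20 log10), which this draft does not state; the names MUTE_DB, db_to_linear, and apply_gain are hypothetical.

```python
MUTE_DB = -1000.0  # this value is used to mean negative infinity

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier.

    Assumes an amplitude ratio, so +20 dB is a factor of 10.
    """
    if gain_db <= MUTE_DB:
        return 0.0
    return 10.0 ** (gain_db / 20.0)

def apply_gain(samples, gain_db):
    """Scale a list of audio samples by the given gain."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]

muted = apply_gain([0.5, -0.25], MUTE_DB)
unchanged = apply_gain([0.5, -0.25], 0.0)
```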

4.5 Video

freeze - indicates the image should be frozen at the current image

blank - indicates that a black image or logo should replace the image

overlay - specifies some text to render over the video image.

layout - specifies a grid for layout of a video composition output stream. Contains several cell elements. Each cell covers some rectangular region in the grid and contains one of the video input streams. This allows only fairly simple layout control. If more complex control is required, an approach such as SMIL[4] could be used.

Inside of each layout are cell tags. Each cell identifies a rectangular space in the grid and the video stream inside it. The streams to be placed in a cell can be statically identified by the stream label or indirectly identified by referring to a location from which to receive stream labels. Two common forms of this are "active speaker" and "previous speaker" which use the audio to pick the corresponding video stream.
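
The way "active speaker" might be resolved to a video stream through the rosterId can be sketched as follows. This is a minimal illustration, assuming the mixer already knows which audio input is the active speaker; the function name video_for_speaker and the dictionary representation of streams are hypothetical.

```python
def video_for_speaker(streams, active_audio_id):
    """Return the id of the video input that shares the speaker's rosterId.

    streams is a list of dicts with keys id, type, direction, rosterId.
    Returns None when the speaker is unknown or has no matching video.
    """
    by_id = {s["id"]: s for s in streams}
    speaker = by_id.get(active_audio_id)
    if speaker is None or speaker.get("rosterId") is None:
        return None
    for s in streams:
        if (s["type"] == "video" and s["direction"] == "in"
                and s.get("rosterId") == speaker["rosterId"]):
            return s["id"]
    return None

STREAMS = [
    {"id": "in123", "type": "audio", "direction": "in", "rosterId": "part12"},
    {"id": "in127", "type": "video", "direction": "in", "rosterId": "part12"},
    {"id": "in2", "type": "audio", "direction": "in", "rosterId": "part2"},
]
main_cell = video_for_speaker(STREAMS, "in123")
```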

4.6 Text




5. Manipulating State

State is manipulated over some transport protocol that can send XML documents and receive a response of whether that worked or not. The response may also include an XML document. The protocol may also support an asynchronous send of an XML document from the mixer to an application that had previously sent it a document.

This can be mapped to SIP by using PUBLISH to send and change information and SUBSCRIBE/NOTIFY to query state and learn about asynchronous changes.

New conferences and streams are added by sending a document that contains a conference or stream ID that is new. Conferences and streams are deleted by sending an XML element that identifies the object and contains no child elements. An item can be changed by sending just the portion of the XML document that contains the changed elements.
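
The add, change, and delete rules above could be applied to stored state roughly as in the Python sketch below. It is a simplification under stated assumptions: state is kept as nested dictionaries, only child elements of a stream are stored, and repeated child elements such as multiple rtp entries overwrite one another. The name apply_update and the state layout are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

def apply_update(state, xml_text):
    """Apply one conference document to state and return state.

    state maps conference id -> {stream id -> {child tag -> attributes}}.
    """
    root = ET.fromstring(xml_text)
    conf_id = root.get("id")
    if len(root) == 0:
        # A conference element with no children deletes the conference.
        state.pop(conf_id, None)
        return state
    streams = state.setdefault(conf_id, {})
    for stream in root.findall("stream"):
        stream_id = stream.get("id")
        if len(stream) == 0:
            # A stream element with no children deletes the stream.
            streams.pop(stream_id, None)
            continue
        params = streams.setdefault(stream_id, {})
        for child in stream:
            # Simplification: one entry per tag, so repeats overwrite.
            params[child.tag] = dict(child.attrib)
    return state

ADD = ('<conference id="conf1234"><stream id="in2" type="audio" '
       'direction="in"><gain value="0.0"/></stream></conference>')
CHANGE = ('<conference id="conf1234"><stream id="in2" type="audio" '
          'direction="in"><gain value="-3.0"/></stream></conference>')
DELETE = '<conference id="conf1234"><stream id="in2"/></conference>'

state = {}
apply_update(state, ADD)
after_add = copy.deepcopy(state)
apply_update(state, CHANGE)
apply_update(state, CHANGE)  # sending the same change twice has no further effect
after_change = copy.deepcopy(state)
apply_update(state, DELETE)
after_delete = copy.deepcopy(state)
```

Because each document describes the desired state rather than a delta, sending the same document twice leaves the state unchanged, which matches the idempotency requirement in Section 3.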

There are two ways in which updating state can fail to work as requested. One is that the request may be malformed or unauthorized; this is referred to as failure, and the protocol used to transport the request needs to indicate a failure in the response to the request. The other is that the mixer may not have exactly matched the requested operation and may have approximated it in some way. For example, an audio gain of 3.2 dB may have been requested but the mixer may only be able to do multiples of 3 dB and may therefore have rounded this to 3.0 dB. In this case the actual state of the conference is returned in an XML body in the response. The returned XML contains the actual state achieved (3.0 in this case) instead of the state requested (3.2 in this case).
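
The approximation case can be illustrated with a short Python sketch, assuming a mixer that supports gains only in fixed steps. The function names and the default 3 dB step are hypothetical; a real mixer may approximate in other ways.

```python
def nearest_supported_gain(requested_db, step_db):
    """Round a requested gain to the nearest multiple of step_db."""
    return round(requested_db / step_db) * step_db

def handle_gain_request(requested_db, step_db=3.0):
    """Return (actual gain in dB, True if the request was approximated)."""
    actual = nearest_supported_gain(requested_db, step_db)
    return actual, actual != requested_db

actual, approximated = handle_gain_request(3.2)
```

When the second value is true, the mixer would return a document carrying the actual gain rather than the requested one.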




6. Examples

6.1 Create an Audio Conference with One Participant

 
<?xml version="1.0" encoding="UTF-8"?>
<conference id="conf1234" 
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            xsi:noNamespaceSchemaLocation="confControl2.xsd">

	<!-- Set up generic parameters for the whole conference
		num audio stream of 2 is the only interesting value  -->
	<expectedNumStreams value="10" type="audio"/>

	<!-- Streams are input or output and have a media type -->
	<!-- All ids are allocated by applications that control the state -->
	<stream id="in123" type="audio" direction="in">

		<!-- rosterId is used to correlate audio and video switching -->
		<rosterId value="part12"/>

		<!-- The rtp identifies the source of this stream as RTP -->
		<rtp port="20123" ip="1.2.3.4"/>

		<!-- Gain is in dB  and -1000 means mute so the
                     stream is not being contributed to the mix -->
		<gain value="-1000.0"/>

		<!-- Hint that may help in speaker selection. 
                        All higher priority active speakers are used 
                        before lower priority ones. Ranges from 0 to 1 -->
		<priority value="1.0"/>

		<!-- Indicates any DTMF input on this stream should 
                        cause it to be muted -->
		<muteOnDtmf value="false"/>

		<!-- Not sure the DTMF patterns are needed -->
		<!-- <muteOnDtmf pattern="#*5"/> -->
		<automaticGainControl value="false"/>

		<echoCancelation value="false"/>
	</stream>	
	<stream id="out123" type="audio" direction="out">
		<rosterId value="part12"/>
		<rtp port="20123" ip="1.2.3.4"/>

		<!-- gain is -1000 db so not hearing conference 
                        but would hear announcements since stream 
                        is in the conference -->
		<gain value="-1000.0"/>

		<!-- gain on announcements to whole conference -->
		<announcementGain value="0.0"/>

		<!-- suppression multiplier of main conference when 
                        announcement is playing -->
		<announcementSuppressionGain value="-1000.0"/>
	</stream>
</conference>

6.2 Adding a Stream

This example adds one more input and one more output stream.

 
<?xml version="1.0" encoding="UTF-8"?>
<conference id="conf1234" 
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
               xsi:noNamespaceSchemaLocation="confControl2.xsd">
        <stream id="in2" type="audio" direction="in">
                <rosterId value="part2"/>
                <rtp port="20126" ip="1.2.3.4"/>
        </stream> 
        <stream id="out2" type="audio" direction="out">
                <rosterId value="part2"/>
                <rtp port="20126" ip="1.2.3.4"/>
        </stream>
</conference>

6.3 Deleting a Stream

 
<?xml version="1.0" encoding="UTF-8"?>
<conference id="conf1234" 
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            xsi:noNamespaceSchemaLocation="confControl2.xsd">
	<stream id="in124" />
</conference>

6.4 Creating a Side Bar

 
<?xml version="1.0" encoding="UTF-8"?>

<conference id="conf1" 
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
               xsi:noNamespaceSchemaLocation="confControl2.xsd">

	<stream id="in123">
	</stream>
	<stream id="out123">
	</stream>
</conference>

<conference id="conf2" 
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
               xsi:noNamespaceSchemaLocation="confControl2.xsd">

	<stream id="in123" type="audio" direction="in">
		<rosterId value="part12"/>
		<rtp port="20123" ip="1.2.3.4"/>
	</stream>
	<stream id="out123" type="audio" direction="out">
		<rosterId value="part34"/>
		<rtp port="20123" ip="1.2.3.4"/>
	</stream>

	<!--The following stream is an input to a sidebar from 
            a main conference -->
	<stream id="in125" type="audio" direction="in">
		<!-- This indicates the source is another conference 
                        there is no loop detection -->
		<source conference="conf1"/>
		<gain value="-3.0"/>
	</stream>
</conference>

6.5 Adding 5+1 Video Layout

 
<?xml version="1.0" encoding="UTF-8"?>
<conference id="conf1234" 
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
               xsi:noNamespaceSchemaLocation="confControl2.xsd">
	<expectedNumStreams value="5" type="video"/>
	<stream id="in127" type="video" direction="in">
		<rosterId value="part12"/>
		<rtp port="30123" ip="1.2.3.4"/>
		<freeze value="false"/>
		<blank value="false"/>
		<priority value="0.8"/>
		<!-- The overlay value may be rendered over 
                        the image for this user -->
		<overlay>Bob in Atlanta </overlay>
	</stream>
	<stream id="out004" type="video" direction="out">
		<!-- This video mix is sent to several RTP streams-->
		<rtp port="30123" ip="1.2.3.4"/>
		<rtp port="30129" ip="1.2.3.4"/>
		<rtp port="30166" ip="1.2.3.4"/>
		<!-- The video layout will be on a 3 by 3 grid.
                        This example does the classic 5+1 layout -->
		<layout row="3" cols="3">
			<!-- The main window has the active speaker -->
			<cell rowPosition="1" colPostion="1" 
                              rowSize="2" colSize="2" 
                              source="activeSpeaker" 
                              aspect="clip"/>
			<cell rowPosition="1" colPostion="3" 
                              source="previousSpeaker" 
                              aspect="clip"/>
			<cell rowPosition="3" colPostion="1" streamId="in127"/>
			<!-- a named stream is put in this cell -->
			<cell rowPosition="3" colPostion="2" streamId="in129"/>
		</layout>
	</stream>
</conference>
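
For the 3 by 3 grid in the example above, the pixel region covered by each cell could be computed as in the Python sketch below. It assumes positions are 1-based with row 1 at the top and column 1 at the left, and that the output frame divides evenly into the grid; neither assumption is stated in this draft, and the name cell_rect and the 900 by 600 output size are hypothetical.

```python
def cell_rect(row_pos, col_pos, row_size, col_size, rows, cols, width, height):
    """Return (x, y, w, h) in pixels for a 1-based grid cell."""
    if row_pos + row_size - 1 > rows or col_pos + col_size - 1 > cols:
        raise ValueError("cell extends past the grid")
    cell_w = width // cols
    cell_h = height // rows
    return ((col_pos - 1) * cell_w, (row_pos - 1) * cell_h,
            col_size * cell_w, row_size * cell_h)

# The large active speaker cell and the previous speaker cell from the example.
active = cell_rect(1, 1, 2, 2, rows=3, cols=3, width=900, height=600)
previous = cell_rect(1, 3, 1, 1, rows=3, cols=3, width=900, height=600)
```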

6.6 Changing a video layout

6.7 Controlling a multicast video switcher

6.8 Setting up a Customer Agent Supervisor call




7. Syntax

TODO - several types in here could be improved.

 
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
            elementFormDefault="qualified">
	<xs:element name="announcementGain">
		<xs:complexType>
			<xs:attribute name="value" type="xs:float" use="required"/>
		</xs:complexType>
	</xs:element>
	<xs:element name="announcementSuppressionGain">
		<xs:complexType>
			<xs:attribute name="value" type="xs:float" use="required"/>
		</xs:complexType>
	</xs:element>
	<xs:element name="automaticGainControl">
		<xs:complexType>
			<xs:attribute name="value" type="xs:boolean" use="required"/>
		</xs:complexType>
	</xs:element>
	<xs:element name="blank">
		<xs:complexType>
			<xs:attribute name="value" type="xs:boolean" 
                            use="optional" default="true"/>
		</xs:complexType>
	</xs:element>
	<xs:element name="cell">
           <xs:complexType>
		<xs:attribute name="rowPosition" type="xs:positiveInteger" 
                       use="required"/>
		<xs:attribute name="colPostion" type="xs:positiveInteger" 
                          use="required"/>
		<xs:attribute name="rowSize" type="xs:positiveInteger" 
                        use="optional" default="1"/>
		<xs:attribute name="colSize" type="xs:positiveInteger" 
                        use="optional" default="1"/>
                <xs:attribute name="source" use="optional">
                <xs:simpleType>
                    <xs:restriction base="xs:NMTOKEN">
                        <xs:enumeration value="activeSpeaker"/>
                        <xs:enumeration value="previousSpeaker"/>
                    </xs:restriction>
                </xs:simpleType>
                </xs:attribute>
                <xs:attribute name="aspect" 
                               use="optional" default="auto">
                <xs:simpleType>
                    <xs:restriction base="xs:NMTOKEN">
                        <xs:enumeration value="auto"/>
                        <xs:enumeration value="warp"/>
                        <xs:enumeration value="clip"/>
                        <xs:enumeration value="fill"/>
                        <xs:enumeration value="wideClipTallFill"/>
                        <xs:enumeration value="wideFillTallClip"/>
                     </xs:restriction>
                </xs:simpleType>
                </xs:attribute>
                  <xs:attribute name="streamId" use="optional">
                     <xs:simpleType>
                          <xs:restriction base="xs:NMTOKEN"/>
                     </xs:simpleType>
                </xs:attribute>
            </xs:complexType>
        </xs:element>
        <xs:element name="conference" type="conferenceType"/>
        <xs:element name="echoCancelation">
                <xs:complexType>
                        <xs:attribute name="value" type="xs:boolean" 
                         use="optional" default="true"/>
                </xs:complexType>
        </xs:element>
        <xs:element name="freeze">
                <xs:complexType>
                        <xs:attribute name="value" type="xs:boolean" 
                        use="optional" default="true"/>
                </xs:complexType>
        </xs:element>
        <xs:element name="gain">
                <xs:complexType>
                        <xs:attribute name="value" type="xs:float" 
                             use="required"/>
                </xs:complexType>
        </xs:element>
        <xs:element name="layout">
           <xs:complexType>
             <xs:sequence>
               <xs:element ref="cell" minOccurs="0" 
                      maxOccurs="unbounded"/>
             </xs:sequence>
           <xs:attribute name="row" type="xs:positiveInteger" 
                            use="required"/>
           <xs:attribute name="cols" type="xs:positiveInteger" 
                            use="required"/>
           </xs:complexType>
	</xs:element>
	<xs:element name="msrp">
		<xs:complexType>
			<xs:attribute name="url" type="xs:anyURI" use="required"/>
		</xs:complexType>
	</xs:element>
	<xs:element name="muteOnDtmf">
		<xs:complexType>
			<xs:attribute name="value" type="xs:boolean" 
                               use="optional" default="true"/>
			<xs:attribute name="pattern" 
                                  type="xs:string" use="optional" 
                                  default="123456789*#abcdef"/>
		</xs:complexType>
	</xs:element>
	<xs:element name="expectedNumStreams">
		<xs:complexType>
			<xs:attribute name="value" type="xs:nonNegativeInteger" 
                        use="required"/>
			<xs:attribute name="type" type="xs:string" use="required"/>
		</xs:complexType>
	</xs:element>
	<xs:element name="overlay" type="xs:string"/>
	<xs:element name="rosterId">
		<xs:complexType>
			<xs:attribute name="value" use="required">
				<xs:simpleType>
					<xs:restriction base="xs:NMTOKEN"/>
				</xs:simpleType>
			</xs:attribute>
		</xs:complexType>
	</xs:element>
	<xs:element name="priority">
		<xs:complexType>
			<xs:attribute name="value" use="required">
				<xs:simpleType>
					<xs:restriction base="xs:float">
						<xs:minInclusive value="0.0"/>
						<xs:maxInclusive value="1.0"/>
					</xs:restriction>
				</xs:simpleType>
			</xs:attribute>
		</xs:complexType>
	</xs:element>
	<xs:element name="rtp">
		<xs:complexType>
			<xs:attribute name="port" type="xs:integer" use="required"/>
			<xs:attribute name="ip" use="optional">
				<xs:simpleType>
					<xs:restriction base="xs:string">
						<xs:whiteSpace value="collapse"/>
					</xs:restriction>
				</xs:simpleType>
			</xs:attribute>
		</xs:complexType>
	</xs:element>
	<xs:element name="source">
		<xs:complexType>
		   <xs:attribute name="conference" type="xs:NMTOKEN" 
                                    use="required"/>
		   <xs:attribute name="stream" type="xs:NMTOKEN" 
                                    use="optional"/>
		</xs:complexType>
	</xs:element>
	<xs:element name="stream">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="rosterId" minOccurs="0"/>
				<xs:element ref="source" minOccurs="0"/>
				<xs:element ref="rtp" minOccurs="0" maxOccurs="unbounded"/>
				<xs:element ref="msrp" minOccurs="0"/>
				<xs:element ref="gain" minOccurs="0"/>
				<xs:element ref="freeze" minOccurs="0"/>
				<xs:element ref="blank" minOccurs="0"/>
				<xs:element ref="priority" minOccurs="0"/>
				<xs:element ref="muteOnDtmf" minOccurs="0" maxOccurs="1"/>
				<xs:element ref="automaticGainControl" minOccurs="0"/>
				<xs:element ref="echoCancelation" minOccurs="0"/>
				<xs:element ref="announcementGain" minOccurs="0"/>
				<xs:element ref="announcementSuppressionGain" minOccurs="0"/>
				<xs:element ref="overlay" minOccurs="0"/>
				<xs:element ref="layout" minOccurs="0"/>
			</xs:sequence>
			<xs:attribute name="id" use="required">
				<xs:simpleType>
					<xs:restriction base="xs:NMTOKEN"/>
				</xs:simpleType>
			</xs:attribute>
			<xs:attribute name="type" use="required">
				<xs:simpleType>
					<xs:restriction base="xs:NMTOKEN">
						<xs:enumeration value="audio"/>
						<xs:enumeration value="text"/>
						<xs:enumeration value="video"/>
						<xs:enumeration value="application"/>
						<xs:enumeration value="model"/>
						<xs:enumeration value="image"/>
					</xs:restriction>
				</xs:simpleType>
			</xs:attribute>
			<xs:attribute name="direction" use="required">
				<xs:simpleType>
					<xs:restriction base="xs:NMTOKEN">
						<xs:enumeration value="in"/>
						<xs:enumeration value="out"/>
					</xs:restriction>
				</xs:simpleType>
			</xs:attribute>
		</xs:complexType>
	</xs:element>
	<xs:complexType name="conferenceType">
		<xs:sequence>
			<xs:element ref="expectedNumStreams" 
                               minOccurs="0" maxOccurs="unbounded"/>
			<xs:element ref="stream" minOccurs="0" maxOccurs="unbounded"/>
		</xs:sequence>
		<xs:attribute name="id" type="xs:string" use="required"/>
	</xs:complexType>
</xs:schema>



8. IANA




9. Security




10. To Do

Fix the namespaces and add XML extensibility.




11. Acknowledgments




Normative References

[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[2] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.



Informative References

[3] Even, R., "Conferencing Scenarios", draft-ietf-xcon-conference-scenarios-00 (work in progress), December 2003.
[4] Ossenbruggen, J., Rutledge, L., Saccocio, B., Schmitz, P., Kate, W., Ayars, J., Bulterman, D., Cohen, A., Day, K., Hodge, E., Hoschka, P., Hyche, E., Jourdan, M., Kubota, K., Lanphier, R., Layaïda, N., Michel, T. and D. Newman, "Synchronized Multimedia Integration Language (SMIL 2.0) Specification", W3C REC REC-smil20-20010807, August 2001.
[5] Dyke, J., Burger, E. and A. Spitzer, "Media Server Control Markup Language (MSCML) and Protocol", draft-vandyke-mscml-03 (work in progress), July 2003.
[6] Melanchuk, T. and G. Sharratt, "Media Sessions Markup Language (MSML)", draft-melanchuk-sipping-msml-01 (work in progress), October 2003.
[7] Melanchuk, T. and G. Sharratt, "Media Objects Markup Language (MOML)", draft-melanchuk-sipping-moml-01 (work in progress), October 2003.



Author's Address

  Cullen Jennings
  Cisco Systems, Inc.
  170 West Tasman Dr.
  MS: SJC-21/3
  San Jose, CA 95134
  USA
Phone:  +1 408 527-9132
EMail:  fluffy@cisco.com



Intellectual Property Statement

Full Copyright Statement

Acknowledgment