XCON WG                                                      C. Jennings
Internet-Draft                                             Cisco Systems
Expires: December 27, 2004                                      B. Rosen
                                                                 Marconi
                                                           June 28, 2004


                     Media Mixer Control for XCON
                  draft-jennings-xcon-media-control-01

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 27, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

Abstract

   Conference mixers have many controls that change how the media is
   combined for each participant in the conference.  There is a need to
   describe these controls to the clients connected to a centralized
   conference so that the clients can render a user interface and allow
   the user to manipulate them.

   This work is being discussed on the xcon@ietf.org mailing list.

Jennings & Rosen       Expires December 27, 2004                [Page 1]

Internet-Draft            Media Mixer Control                  June 2004

Table of Contents

   1.   Conventions
   2.   Introduction to the Problem
   2.1    Non Problems
   3.   Overview
   3.1    Semantic information in a Conference
   3.2    The Protocol
   3.3    Templates
   3.4    Parameters
   3.5    Controls
   3.6    Roles
   3.7    Streams
   3.8    Streams Lists
   4.   Introductory Example
   4.1    Simple Audio
   4.2    Simple Video
   5.   Names and terminology
   5.1    Templates
   5.2    Participants
   5.3    Streams
   5.3.1    Stream Types
   5.3.2    Stream URLs
   5.3.3    Stream Priority
   5.4    Roles
   5.5    Controls
   5.6    Parameters
   6.   Solution
   6.1    Templates
   6.1.1    Parameters
   6.1.2    Roles
   6.1.3    Streams
   6.1.4    Streams Lists
   6.1.5    Controls
   6.1.6    Conference State
   6.1.7    Transport Protocol
   6.2    Controls
   6.2.1    Requirements
   6.2.2    Strings
   6.2.3    Integer
   6.2.4    Boolean
   6.2.5    Selection
   6.2.6    Multiple Selection
   6.2.7    Frame
   7.   Examples
   7.1    Audio Video Presentation
   8.   Template Registry
   9.   Comparison to other solutions
   10.  CPCP vs. MPCP vs. CCP vs. MCP
   11.  IANA
   12.  Security
   13.  Acknowledgments
   14.  References
   14.1   Normative References
   14.2   Informative References
        Authors' Addresses
        Intellectual Property and Copyright Statements

1.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [1].

2.  Introduction to the Problem

   This work tries to solve the problem of allowing a conference
   participant to manipulate the media flow in a mixer.  It defines a
   protocol between the end user's software manipulating the conference
   and the centralized conference mixer.
   This protocol needs to be rich enough for a client to express what it
   wants from a mixer, yet simple enough to allow the client to render a
   useful user interface to the user.  This work takes into account that
   real mixers have constraints on what media flows are possible and
   that UIs have buttons, knobs, etc. that users manipulate.  The goal
   is for a conferencing end point made by one vendor to work with
   mixers or conference systems made by another vendor.

2.1  Non Problems

   There are several topics that are completely internal to the
   conference systems and are out of scope for this work.  These
   include:

   o  How the focus manipulates the mixer.

   o  How one describes what a mixer is capable of doing.

3.  Overview

   When a conference is created, it is instantiated from a template.
   The template describes what controls are available for the client to
   manipulate the media.  The template can have parameters that are set
   when it is instantiated to allow one template to describe variations
   of similar flow models.  This document describes the templates and
   ways for the client to understand and manipulate the media in the
   conference.  It allows for the following:

   o  A conference consists of several participants and multiple streams
      of media flowing between the participants and the mixer.

   o  Sidebars are mini conferences that are just like conferences
      except that a sidebar cannot itself contain sidebars.

   o  Clients can discover the template chosen for use in a conference,
      and the values of the parameters set for the conference.

   o  Clients can discover the available streams in a conference.

   o  Clients can send media on a participant stream and receive media
      on a mixer stream.

   o  Clients can discover the Participants in a conference and their
      role (this is more conference policy than media policy).

   o  Clients can join a conference as a participant and assume a
      particular role.
   o  Conferences, Streams, and Participants can have controls that
      manipulate the media sent and received.

   o  The role of the participant will control what view of the
      conference they have and which media streams they can manipulate.

3.1  Semantic information in a Conference

   The conference has a list of Participants.  Each Participant has a
   set of streams that are being contributed to the conference and a set
   of streams being sent to the client.  Each Stream has attributes such
   as name, type, priority and list of contributing participants.  Each
   of these Streams has Controls that the user of the client program can
   manipulate.

   Each conference has a list of sidebars.

   Each conference has a list of Streams.

3.2  The Protocol

   The protocol between the client and the conference server allows the
   client to get the semantic information in the conference, find out
   when it changes, and make changes to it.  It's probably something
   like XCAP.  [TODO add ref]

3.3  Templates

   Templates define a model for the reception, manipulation and
   transmission of streams.  A template provides enough information that
   the client can intelligently render a useful GUI to the end user to
   manipulate the model.  There is a registry of well known templates,
   but a conference server can define new ones.  A convener can find all
   the templates a conference server supports and select one to use when
   creating the conference.

   A template for a very basic audio conference, for example, may
   indicate that there is one audio stream for each participant, and one
   output mixer stream named "primary".  Each participant in the stream
   has a single binary control called "Mute".  There is only one Role
   that can be used, called "participant".

3.4  Parameters

   Parameters are variables in the template that are set when the
   conference is created.  For example, in the audio conference, the
   maximum number of participants might be a parameter.
   If the value is set to 10 when the conference is instantiated, then
   up to 10 participant streams can be accepted into the mixer.  The
   template can indicate the valid range for the maximum number of
   participants, perhaps from 2 to 128.

3.5  Controls

   Controls are variables participants may manipulate to control the
   media streams of the conference.  Conferences can have controls,
   participants in a conference can have controls, and streams in a
   conference can have controls.  Controls can also be implicitly
   created by stream action, for example a selector control based on the
   loudest speaker.  Controls have a name and a value.  Controls are
   defined in the template.

3.6  Roles

   Participants in a conference can take on multiple different Roles
   that change what controls they may manipulate and which media streams
   they have access to.  The template defines what Roles are available
   for the client.  Manipulation of Roles is done in CPCP.  Some common
   roles include:

   o  Participant

   o  Presenter

   o  Moderator

   o  Observer

3.7  Streams

   Streams correspond to a given flow of media.  They are named and can
   be selected by controls.  The conference package is used to
   understand the relationships between users, dialog or session, and
   streams.

3.8  Streams Lists

   Lists of streams exist and form virtual streams that can also be
   displayed.  For example, there is a virtual stream list called
   "default".  This contains the default media mix for the conference.
   It is a list and its elements can be indexed.  For example,
   default[0] in a video conference would likely contain the current
   speaker and default[1] would contain the previous speaker.  The lists
   become an important concept when the end system wishes to render
   media at some location only if the media is not being rendered
   elsewhere.
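   As a rough illustration of the indexing described above, a stream
   list can be modeled as an ordered list of stream identifiers.  The
   sketch below is not part of this specification: the class name
   StreamList, the method first_not_rendered, and the stream identifiers
   are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Set


class StreamList:
    """Hypothetical model of a named, ordered virtual stream list."""

    def __init__(self, name: str, stream_ids: List[int]) -> None:
        self.name = name
        self.stream_ids = list(stream_ids)

    def __getitem__(self, index: int) -> int:
        # default[0] is the first entry, default[1] the second, and so on.
        return self.stream_ids[index]

    def first_not_rendered(self, rendered: Set[int]) -> Optional[int]:
        """Return the first stream not already rendered elsewhere, if any."""
        for stream_id in self.stream_ids:
            if stream_id not in rendered:
                return stream_id
        return None


# Assumed ordering: current speaker first, previous speaker second.
default = StreamList("default", [7, 3, 9])
current_speaker = default[0]
previous_speaker = default[1]

# Stream 7 is already shown in another window, so the next entry is chosen.
fallback = default.first_not_rendered({7})
```

   Keeping the list ordered lets a client ask for "the next stream not
   already on screen" without knowing how the mixer ranks the streams.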
   There are virtual lists for media from the default mix for the
   conference, for each type of Role, and for each type of Floor;
   conference templates can also define new named virtual stream lists.
   Streams lists can be indexed by an integer that describes

4.  Introductory Example

4.1  Simple Audio

   TODO - have User/Dialog/Stream idea

   The client selects the basic audio template that looks like:

   TODO - cpcp is used to create conference.

   The client retrieves this template.  This template defines that this
   conference has one Role called participant and that this role has a
   stream list called "default-audio-in" and another called
   "default-audio-out".

   Alice and Bob join this conference and the conference server tells
   Bob about the state of the conference media.

   There is only one role, "participant".  Each participant contributes
   one input stream.  There is also an output stream per participant.
   There is a single control, called mute, for each participant.

   After Alice and Bob have joined, the conference server informs Bob
   that the current state of the conference is as shown in the xml
   below.

   TODO - move mute to gain

   TODO - add id to things

      10 0 0

   There are two participants, Alice and Bob, who both contribute input
   streams and receive output streams, and neither is muted.

   Bob's client decides to change the Mute state for its audio stream
   and sends the following to the conference server to change the state
   of the conference.

      1

   A key part of this is that Bob's client may have known about this
   basic audio template and what the semantics of the "mute" control
   implied.  The client may have connected this up with a button of the
   client's that was labeled mute.
   On the other hand, Bob's client may not have known anything about
   this template and simply rendered a button on the screen and labeled
   it "mute" with no idea what this would do.  A third client may not
   have been able to deal with the control at all and may have just
   ignored it.  Clearly the user interface can be better if the client
   understands the semantics of what the template means, but the user
   interface is still functional when the client does not.

4.2  Simple Video

   A more complex video example is given below.

   Note: need a value for whether a stream gets bumped from the mix and
   another value that indicates relative positioning.

      conference type="video" name=""

        Participant name="Alice"
          stream type=audio dir=in name="default-audio-in"
          stream type=audio dir=out name=default-audio-out
          stream type=video dir=out name=presenter[0]
          stream type=video dir=out name=presentation[0]

        Participant name="Bob"
          stream type=audio dir=in name=default-audio-in
          stream type=video dir=in name=default-video-in
          stream type=application dir=in name=default-presentation-in
          stream type=audio dir=out name=default-audio-out
          stream type=video dir=out name=bob-video-out
          control type=selector value="3+5"
            group
              control type=streamSelector value=default-presentation[0]
                      q=0.9
              control type=streamSelector value=default-presenter[0]
                      q=0.8
              control type=streamSelector value=default-speaker[0]
                      q=0.7 duplicate=next
            group
              control type=streamSelector value=moderator
                      q=0.4 duplicate=next
              control type=streamSelector value=participants
                      q=0.3 duplicate=next

        Participant name="Brian"
          stream type=audio dir=in name=default-audio
          stream type=video dir=in name=default-video
          stream type=video dir=in name=default-video-small
          stream type=video dir=out name=default-presentation[0] sid=s1
          stream type=video dir=out name=default-presenter[0] sid=s2
          stream type=video dir=out name=default-participant[0] sid=s3
          stream type=video dir=out name=default-participant[1] sid=s4
          stream type=video dir=out name=default-participant[sid=27]
                 sid=s4
          stream type=audio dir=out name=default-presentation[0] sid=sa1
          stream type=audio dir=out name=default-presenter[0] sid=sa2
          stream type=audio dir=out name=default-participant[0] sid=sa3
          stream type=audio dir=out name=default-participant[1] sid=sa4

        Participant name=Snoopy
          stream type=video dir=out name=sid27
          stream type=video dir=out name=sid2
          stream type=video dir=out name=sid3

   In the example above, ..... TODO

5.  Names and terminology

   A stream-id is an integer assigned by the focus to each physical
   input and output stream.  For RTP media, this corresponds to a single
   RTP session.  This integer is unique across all streams in a specific
   conference (and all its sub-conferences).

   Each output stream can specify the physical or logical set of input
   streams which contribute to that output stream.  In some cases, a
   stream can contain multiple components; for example, video, text,
   whiteboard, or application tiles or panels in a composite video
   output stream, or audio inputs placed logically for stereo or spatial
   mixing.

   Logical sets of streams indicate ordered lists of input streams which
   change dynamically and potentially very quickly during the lifetime
   of a conference.  For example, one logical set is the set of input
   video streams corresponding to the current speaker or speakers.
   Another logical set is the set of input audio streams that correspond
   to the current holders of a particular floor.

   An exclusivity-group is a group of output streams or components which
   are grouped to allow control over whether the same physical input
   stream contributes to multiple related output streams or components.
   For example, Alice might choose to display the input video stream of
   Bob (the presenter) in one tile (component) and the input video
   stream which corresponds to the current speaker in another tile;
   however, if Bob is the current speaker, Alice would like to see a
   different video stream in this tile instead.  Note that different
   components of a single output stream could appear in different
   exclusivity-groups.

5.1  Templates

   Templates contain a list of streams, roles for participants,
   parameters that need to be set, and controls for the conference.

5.2  Participants

   Participants are the logical user entities participating in a
   conference.

5.3  Streams

   A stream is a named stream of media.  An example is a simple audio
   conference with 6 participants and a mixer that mixes the loudest
   three.  Each participant contributes an input stream.  There is a
   single logical output stream, but every participant gets a "custom"
   version of this stream because, in normal mixers, each participant
   can hear all inputs except his own.  This is commonly referred to as
   "mix-minus".  If the output stream also has a control (mute), the
   output streams for each participant may also vary depending on the
   state of the control.

   Streams all have a type, a name, a direction (in or out), one or more
   URLs, and a priority.  The URL is the source or sink of the stream.
   The priority indicates how important this particular stream is to the
   conference and the type indicates the type of media carried in this
   stream.

   Streams have types.  These correspond to the major MIME types of the
   media the stream carries.

5.3.1  Stream Types

5.3.1.1  Audio

   Streams originate as participant contributions (dir="in") that are
   mixed using some kind of algorithm.  Intermediate streams may be
   created, which are subsequently mixed with other streams, yielding
   streams which are sent to participants (dir="out").
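   As one illustration of such a mixing algorithm, the mix-minus
   behavior described in Section 5.3 might be sketched as follows.  This
   is a minimal sketch under stated assumptions, not a description of
   how any particular mixer works: the function name mix_minus, the
   loudness values, and the tie-breaking rule are all hypothetical, and
   a real mixer operates on audio samples rather than on names.

```python
from typing import Dict, List


def mix_minus(loudness: Dict[str, float], max_mixed: int = 3) -> Dict[str, List[str]]:
    """For each participant, list the inputs contributing to their output.

    The loudest `max_mixed` inputs are selected, and each participant's
    own input is removed from the mix that is sent back to them.
    """
    # Sort by descending loudness; ties are broken by name (an assumption).
    ranked = sorted(loudness, key=lambda name: (-loudness[name], name))
    selected = ranked[:max_mixed]
    return {
        listener: [name for name in selected if name != listener]
        for listener in loudness
    }


# Hypothetical loudness levels for a six-participant conference.
levels = {"alice": 0.9, "bob": 0.2, "carol": 0.7,
          "dave": 0.1, "erin": 0.5, "frank": 0.0}
outputs = mix_minus(levels)
```

   In this sketch a participant who is among the loudest three hears
   only the other two; a mixer could equally choose to add the next
   loudest input in that case.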
   Controls commonly available on audio streams include input or output
   faders (volume controls), stereo balance, and mute.

5.3.1.2  Video

   Streams originate as participant contributions (dir="in") that are
   combined with some kind of algorithm.  Intermediate streams may be
   created, which are subsequently combined with other streams, yielding
   streams which are sent to participants (dir="out").  Controls
   commonly available on video streams might include selectors for
   choosing a tiling format, selectors for which input streams appear on
   output tiles, and video mutes.

5.3.1.3  Text

   Streams originate as participant contributions (dir="in") (Instant
   Messages).  Messages from all participants are combined using some
   algorithm.  Intermediate streams may be created, which are
   subsequently combined with other text streams, yielding streams which
   are sent to participants (dir="out").

5.3.1.4  Application

   At a minimal level, this consists of a URL that defines the
   application.  Many systems will simply update an http URL that
   fetches an HTML page that shows the current presentation.

5.3.2  Stream URLs

   Streams have URLs that specify the source or sink of the stream.
   These would typically be a SIP, H323 or IM URL.

5.3.3  Stream Priority

   Streams have a priority from 0 to 1.  Zero indicates that a client,
   by default, should not play/display this stream unless the user
   specifically requests it.  A priority of 1 indicates that, by
   default, the client should render this stream and should warn the
   user if it cannot.  Other values only define an ordering, and clients
   should attempt to use their resources to display the higher priority
   streams before the lower.

5.4  Roles

   TODO - switch Role so a participant can have several roles
   simultaneously

   Roles are defined as part of Conference Policy but are used here so
   that the Media Policy can define separate streams and controls
   depending on role.
   Roles are defined in the template.  Some templates may allow a
   participant to take on more than one role at a time.  Each template
   must define a role named "participant", which is the default role.
   "Moderator" is a typical role, as is "Floor-Holder", but templates do
   not intrinsically define or require such roles.

5.5  Controls

   Controls manipulate the state of the conference while it is
   instantiated.  All controls have a name, a type, a current value and
   permissions that indicate whether or not the current client can
   modify them.  They may also have, optionally, a min and max value.

   A control can be defined as being part of a role.  In that case, all
   participants who assume that role have an instance of the control.  A
   control may also be defined as part of a stream, in which case all
   contributors of that stream (dir="in") have an instance of the
   control, or all sinks of the stream (dir="out") have an instance of
   the control.  There can be global controls, which are available to
   all participants.

   Implicit controls extract values from streams (or other controls),
   such as choosing video inputs based on loudest speakers.

5.6  Parameters

   Parameters are variables that modify the function of the template.
   They are fixed when the conference is instantiated and cannot be
   changed after that.  Parameters allow a single template definition to
   describe a range of possible mixer capabilities.  Parameters have a
   name, a type, a value and, optionally, a min and max value.

6.  Solution

   A conference client can request the template from the focus.  This
   allows the client to discover what the current media policy is and
   what controls it can manipulate.  The client can then send a template
   update to the focus to change the controls and thereby manipulate
   media policy for various participants.

   The state of the media policy for a conference is represented in an
   instantiated template.
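   The way a control update might be checked against the control's
   permissions and optional min and max values can be sketched as below.
   This is only an illustration: the class name Control, its field
   names, and the set_value method are hypothetical, and the actual
   template representation and transport are defined elsewhere in this
   document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Control:
    """Hypothetical model of a control in an instantiated template."""
    name: str
    value: int
    writable: bool = True            # permission for the current client
    min_value: Optional[int] = None  # optional lower bound
    max_value: Optional[int] = None  # optional upper bound

    def set_value(self, new_value: int) -> None:
        """Apply a client update, enforcing permissions and range."""
        if not self.writable:
            raise PermissionError(
                f"control {self.name!r} is read-only for this client")
        if self.min_value is not None and new_value < self.min_value:
            raise ValueError(
                f"{new_value} is below the minimum {self.min_value}")
        if self.max_value is not None and new_value > self.max_value:
            raise ValueError(
                f"{new_value} is above the maximum {self.max_value}")
        self.value = new_value


# A binary mute control, modeled here as an integer in the range 0..1.
mute = Control(name="mute", value=0, min_value=0, max_value=1)
mute.set_value(1)  # the client mutes its own audio stream
```

   Parameters could be modeled the same way with writable always false
   once the conference is instantiated, since they cannot change after
   that point.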
   Inside the template are one or more participants where the media
   policy is manipulated.  Each participant section identifies media
   sessions (identified with stream ids) that are being contributed to
   stream lists.  It also identifies media that is sent to the client.
   Each one of these inputs or outputs may contain various controls that
   manipulate the media.

   A template may define various virtual stream lists.  For example, one
   video stream may contain video of the active presenter and another
   video stream may have the presentation that the presenter is showing.

   Media from a participant is contributed to one of the virtual stream
   lists.  Various controls, such as gain, may be attached to each input
   contribution.

   Each participant also has output streams which represent media being
   sent to the client.  Output streams to a client are named and may
   have complex controls that affect which streams are selected to
   contribute to the result.

   Output streams may be formed using multiple input component streams.
   This is typically done for video when the output is some composited
   form of the input component streams, but it can also be done for
   audio in such cases as selecting multiple mono audio streams and
   defining how they are composited into a stereo stream.

   Streams can be selected out of a stream list using an array notation.
   The selection must not select the same physical stream if that stream
   has already been selected inside the same exclusivity group.

6.1  Templates

   A template is an xml document that describes both the state of a
   conference and what can be changed about the conference.  There is a
   set of well known templates that are IANA registered, but clients can
   deal with unknown templates.

   The template definition includes a name, which is a string, for
   example: