AR Proposal Public Review


By Augmented Reality Continuum Working Group, Web3D Consortium

Feb 26, 2013

The Augmented Reality Continuum Working Group (ARC WG) has been developing a proposal for extending X3D to support augmented and mixed reality visualization. As the proposal nears completion, the working group has decided to open it to the public and collect feedback from others, including our Web3D members, other working groups, and anyone interested in AR and Web3D technology. The ARC WG welcomes any feedback that would help consolidate the proposal and advance to the next stage of extending the X3D specification to support AR and MR visualization.

  • Reviewing period: March, 2013
  • How to give feedback:
    • Use the "discussion" tab at the top of this page to give feedback and start discussions.
    • If you prefer e-mails, please mail your feedback to Gun Lee (ARC WG co-chair, endovert[at]

Extending X3D for MR Visualization - Unified Proposal

1. Introduction

This document gives an overview of the unified proposal for extending the X3D standard to support Mixed Reality (MR) visualization. Mixed Reality includes both Augmented Reality (AR) and Augmented Virtuality (AV). The extension of the X3D standard proposed in this document is based on a comparison of three proposals: two from the Web3D Korea Chapter (KC1 and KC2) and one from InstantReality (IR), Fraunhofer IGD. The details of the comparison can be found in the following public wiki page:

In this document we focus on the three main components necessary to achieve basic MR visualization: sensors, video stream rendering, and camera calibration. We try to minimize changes to the current specification while keeping the solution generic enough to be applied to various future applications besides MR visualization.

In order to focus on consolidating the fundamental features, we leave the following items from the original proposals for future work:

  • High-level events for tracking from proposal KC2
  • Supporting color keying in texture from proposal KC1
  • Supporting correct occlusion between virtual and physical objects (Ghost object from proposal KC1 and Color Mask + sortKey from IR)
  • Supporting generic types of sensors, including those that are not directly related to AR/MR visualization (Direct sensor nodes in IR)

2. Sensors

To achieve MR visualization, sensing the real environment is crucial. Two types of sensors are necessary to support MR visualization: those that acquire video stream images from a real camera, and those that acquire motion tracking information of physical objects. While sensors could be generalized to acquire any type of information from the real world, this proposal focuses on these two sensors, which are crucial for MR visualization.

In this proposal, two new nodes, CalibratedCameraSensor and TrackingSensor, are proposed to represent interfaces to the sensors that are essential for MR visualization.

Since hardware and software setups (including the X3D browser) vary between end users, it is not appropriate to specify within the scene which devices or tracking technology to use. In fact, the author of the X3D scene may have no knowledge of the hardware or software setup available on the user's side. Therefore, the X3D scene should only include a high-level description of the purpose of the sensor, as a hint that helps the browser and the user choose hardware or software on the user's setup that meets the intended use. The "description" field is used to describe this intended use. At run time, the browser shows the value of the "description" field to the user through a user interface (e.g., a dialog box) and asks the user to choose an appropriate sensor from those available in the local hardware/software setup. The user chooses the hardware to use for the sensor node, and in this way users can view the X3D scene with the best hardware/software sensors available in their environment.

In addition, the browser can be configured to use specific sensor hardware for sensor nodes whose "description" field holds certain predefined values. Each type of sensor node can have a different set of predefined values for the description field, and the browser uses these values to automatically map the sensor nodes to default sensors preconfigured by the user. Another way to determine the default sensors is to keep a history of the sensors chosen by the user: the browser can record the mapping between sensors and sensor nodes chosen by the user and, when the same X3D scene is loaded later, reuse the mapping saved from the previous session.

Asking the user interactively at run time not only maps appropriate sensors but also lets the user approve the use of sensors on their device, which avoids privacy issues. Therefore, the browser must always ask the user for confirmation, even if it is able to map the sensors automatically.
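The mapping rules above (predefined defaults, remembered choices, mandatory confirmation) can be sketched as browser-side logic. This is a minimal sketch under stated assumptions: `SensorMapper`, `propose`, `bind`, and the `ask_user` callback (standing in for the browser's dialog box) are hypothetical names, and trying the predefined default before the history is an arbitrary choice, since the proposal does not specify a priority.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Predefined description values from Tables 1 and 2.
PREDEFINED = {
    "CalibratedCameraSensor": {"USER_FACING", "WORLD_FACING"},
    "TrackingSensor": {"VIEWPOINT_FROM_WORLD", "OBJECT_FROM_WORLD", "OBJECT_FROM_VIEWPOINT"},
}

# Hypothetical UI callback: (description, available devices, proposed device) -> chosen device or None.
AskUser = Callable[[str, list, Optional[str]], Optional[str]]


@dataclass
class SensorMapper:
    # (node type, predefined description) -> device preconfigured by the user
    defaults: dict = field(default_factory=dict)
    # (scene URL, DEF name) -> device the user chose when the scene was last loaded
    history: dict = field(default_factory=dict)

    def propose(self, scene_url: str, def_name: str, node_type: str,
                description: str, available: list) -> Optional[str]:
        """Suggest a device; the priority order here is an assumption."""
        candidates = []
        if description in PREDEFINED.get(node_type, set()):
            candidates.append(self.defaults.get((node_type, description)))
        candidates.append(self.history.get((scene_url, def_name)))
        for device in candidates:
            if device is not None and device in available:
                return device
        return None

    def bind(self, scene_url: str, def_name: str, node_type: str,
             description: str, available: list, ask_user: AskUser) -> Optional[str]:
        """Always ask the user to confirm, even when a device could be mapped automatically."""
        proposal = self.propose(scene_url, def_name, node_type, description, available)
        choice = ask_user(description, available, proposal)
        if choice is not None:
            self.history[(scene_url, def_name)] = choice  # remember for the next load
        return choice
```

Keeping the user confirmation inside `bind`, rather than skipping it when `propose` succeeds, mirrors the requirement that automatic mapping never bypasses the privacy check.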

2.1 CalibratedCameraSensor node

The CalibratedCameraSensor node provides an interface to a camera device. The main information it provides to the X3D scene is the image stream captured with the camera, delivered through the 'image' field of the node. In addition to the image stream, the node should also provide the camera's internal parameters, which are used to calibrate the Viewpoint parameters so that the MR scene is composed correctly. Four fields (focalPoint, fieldOfView, fovMode, and aspectRatio) provide these parameters and correspond to fields used in the Viewpoint node. Detailed descriptions of these fields are given in section 4, where the Viewpoint node is described.

CalibratedCameraSensor : X3DSensorNode {
	SFBool	 [in,out]	enabled	TRUE
	SFNode	 [in,out]	metadata	NULL [X3DMetadataObject]
	SFBool	 [out]		isActive

	SFString [in,out]	description	""
	SFImage	 [out]		image
	SFVec2f	 [out]		focalPoint
	SFFloat	 [out]		fieldOfView
	SFString [out]		fovMode
	SFFloat	 [out]		aspectRatio
}
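How a browser fills the fieldOfView and aspectRatio outputs is left open by the proposal. A minimal sketch, assuming a standard pinhole calibration that yields focal lengths in pixels (fx, fy), a principal point at the image centre, no lens distortion, and aspectRatio read literally as the ratio of the vertical FOV angle to the horizontal FOV angle (as defined in section 4). The function name `camera_fov_fields` is hypothetical, fovMode is fixed to HORIZONTAL only for illustration, and focalPoint is not covered.

```python
import math


def camera_fov_fields(width_px: int, height_px: int, fx_px: float, fy_px: float) -> dict:
    """Derive fieldOfView, fovMode and aspectRatio (radians) from pinhole intrinsics.

    fx_px, fy_px: focal lengths in pixels from a camera calibration.
    Assumes the principal point is at the image centre and ignores lens distortion.
    """
    fov_h = 2.0 * math.atan(width_px / (2.0 * fx_px))   # horizontal FOV
    fov_v = 2.0 * math.atan(height_px / (2.0 * fy_px))  # vertical FOV
    return {
        "fieldOfView": fov_h,
        "fovMode": "HORIZONTAL",        # illustrative choice of mode
        "aspectRatio": fov_v / fov_h,   # vertical/horizontal, as a ratio of angles
    }
```

For a 640x480 image with fx = fy = 320 pixels, this gives a horizontal FOV of 90 degrees and an aspectRatio below 1.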

The browser should ask the user to choose which camera to use for each CalibratedCameraSensor node through the user interface (e.g., a dialog box). The browser shows the value of the "description" field to the user as a hint about what type of camera is expected. Predefined values can be used in the description field to let the browser automatically map the node to default sensors preconfigured by the user. Table 1 shows the predefined values for the description field of CalibratedCameraSensor nodes.

Table 1. Predefined values for the description field of CalibratedCameraSensor nodes
Predefined value | Description
USER_FACING | Camera facing towards the user.
WORLD_FACING | Camera facing in the user's view direction.

2.2 TrackingSensor node

The TrackingSensor node provides an interface for motion tracking information. The main information provided by this node is the position and orientation of the tracked physical object, delivered through the 'position' and 'rotation' fields, respectively. The 'isPositionAvailable' and 'isRotationAvailable' fields are TRUE when the target is successfully tracked and the value of the 'position' or 'rotation' field, respectively, is valid.

TrackingSensor : X3DSensorNode {
	SFBool	 [in,out]	enabled	TRUE
	SFNode	 [in,out]	metadata	NULL [X3DMetadataObject]
	SFBool	 [out]		isActive

	SFString [in,out]	description	""
	SFVec3f	 [out]		position
	SFRotation [out]	rotation
	SFBool	 [out]		isPositionAvailable	FALSE
	SFBool	 [out]		isRotationAvailable	FALSE
}

The "description" string field defines the intended use of the tracking sensor and is shown to the user to help choose the tracking hardware. The value should state what kind of object the sensor is intended to track and which reference frame it uses for its coordinate system. Table 2 shows predefined values for the description field that help the browser automatically map default sensors preconfigured by the user.

Table 2. Predefined values for the description field of TrackingSensor nodes
Predefined value | Description
VIEWPOINT_FROM_WORLD | Tracks the viewpoint relative to the world coordinate system (e.g., useful for immersive displays that track the user's viewpoint).
OBJECT_FROM_WORLD | Tracks an arbitrary physical object relative to the world coordinate system.
OBJECT_FROM_VIEWPOINT | Tracks an arbitrary physical object relative to the viewpoint coordinate system (e.g., useful for computer-vision-based AR tracking systems).
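The reference frames in Table 2 are related by ordinary rigid-body composition: an OBJECT_FROM_VIEWPOINT pose combined with a VIEWPOINT_FROM_WORLD pose gives the same kind of pose that OBJECT_FROM_WORLD would report. The sketch below illustrates this under assumptions of its own: poses are (position, unit quaternion) pairs, rotations come from X3D axis-angle SFRotation values, "X_FROM_Y" is read as "X relative to Y" as in the table, and all function names are hypothetical.

```python
import math


def quat_from_sfrotation(axis_x, axis_y, axis_z, angle):
    """Convert an X3D SFRotation (axis + angle in radians) to a unit quaternion (w, x, y, z)."""
    n = math.sqrt(axis_x ** 2 + axis_y ** 2 + axis_z ** 2)
    s = math.sin(angle / 2.0) / n
    return (math.cos(angle / 2.0), axis_x * s, axis_y * s, axis_z * s)


def quat_mul(a, b):
    """Hamilton product a*b: rotating by the result applies b first, then a."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def rotate(q, v):
    """Rotate vector v by unit quaternion q: v' = v + 2w(u x v) + 2u x (u x v)."""
    w, u = q[0], q[1:]
    t = cross(u, v)
    t2 = cross(u, t)
    return tuple(v[i] + 2 * w * t[i] + 2 * t2[i] for i in range(3))


def compose(b_from_a, c_from_b):
    """Given B relative to A and C relative to B, return C relative to A.

    Example: compose(VIEWPOINT_FROM_WORLD pose, OBJECT_FROM_VIEWPOINT pose)
    yields the object's pose relative to the world.
    """
    p_ab, q_ab = b_from_a
    p_bc, q_bc = c_from_b
    r = rotate(q_ab, p_bc)
    return (tuple(p_ab[i] + r[i] for i in range(3)), quat_mul(q_ab, q_bc))
```

For instance, a viewpoint at (0, 0, 5) turned 90 degrees about the Y axis, looking at an object 2 units ahead of it (0, 0, -2 in viewpoint coordinates), places that object at (-2, 0, 5) in world coordinates.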

3. Rendering video stream from camera

To visualize an MR scene, the video stream image acquired from a sensor node should be rendered in the X3D scene. For AR visualization, the video stream should be rendered as the background of the virtual environment, while for AV visualization, the video stream is used as a texture of a virtual object.

3.1 Using video stream as a Texture

No extension of the standard is needed to use the video stream image as a texture: the PixelTexture node, already available in the current version of the X3D specification, can be used. The video stream image from the CalibratedCameraSensor node's "image" field can be routed to the "image" field of the PixelTexture node, as the following example shows.

<CalibratedCameraSensor DEF="camera" />
<PixelTexture DEF="tex" />
<ROUTE fromNode="camera" fromField="image" toNode="tex" toField="image"/>
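To make concrete what is routed into PixelTexture's 'image' field, the sketch below encodes a camera frame in X3D's SFImage layout: width, height, and number of components, followed by one packed hexadecimal value per pixel, starting at the lower-left pixel and proceeding left to right, bottom to top. It assumes an 8-bit RGB frame stored with the top row first (a common camera and memory order); `frame_to_sfimage` is a hypothetical helper. A browser would typically pass pixel buffers internally rather than as text, so the string form is only illustrative.

```python
def frame_to_sfimage(rows):
    """Encode an RGB frame as an X3D SFImage string.

    rows: list of rows, top row first; each row is a list of (r, g, b)
    tuples with components in 0-255.
    """
    height = len(rows)
    width = len(rows[0]) if height else 0
    pixels = [
        f"0x{r:02X}{g:02X}{b:02X}"      # red in the most significant byte
        for row in reversed(rows)        # SFImage starts with the bottom row
        for (r, g, b) in row
    ]
    return " ".join([str(width), str(height), "3"] + pixels)
```

The row reversal is the detail most easily missed: without it, the camera image would appear upside down on the textured object.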

3.2 Using video stream as a Background

The Background node in the current X3D specification covers only environmental backgrounds in 3D space. Both the Background and TextureBackground nodes describe the environment around the user's viewpoint, represented as a colored sphere or a textured cube around the user. In both cases, the background of the virtual scene changes with the user's viewing direction. For AR visualization, however, the background of the virtual scene should always show the video stream from the camera sensor. While the Background and TextureBackground nodes represent a three-dimensional environmental background around the user, the AR background should act as a two-dimensional backdrop of the viewport on which the 3D scene is rendered. This requires a new node type that represents backgrounds acting as a 2D backdrop of the scene. We propose two new nodes for this purpose: BackdropBackground and ImageBackdropBackground. Their structure is as follows:

BackdropBackground: X3DBackgroundNode {
	SFColor	[in,out]	color
	MFString [in,out]	url
}

ImageBackdropBackground: X3DBackgroundNode {
	SFColor	[in,out]	color
	SFImage	[in,out]	image
}

While only ImageBackdropBackground is necessary for AR applications, we also define the BackdropBackground node, whose structure corresponds to that of the Background node. The video stream image from the camera sensor can be fed to the ImageBackdropBackground node by routing the 'image' field of the CalibratedCameraSensor node to the 'image' field of the ImageBackdropBackground node.

<CalibratedCameraSensor DEF="camera" />
<ImageBackdropBackground DEF="bg" />
<ROUTE fromNode="camera" fromField="image" toNode="bg" toField="image"/>

The ImageBackdropBackground node automatically scales the image, retaining its aspect ratio, so that its width or height fits that of the viewport and the image covers the whole viewport. As a result, the background image fills the entire viewport and no blank region is left uncovered; when the aspect ratios of the image and the viewport differ, the part of the image that extends beyond the viewport is cropped.

4. Camera calibration

To ensure that the virtual world appears correctly registered to the real world in the MR scene, the parameters of the virtual camera should be calibrated to match those of the real camera. There are two types of camera parameters: internal and external. The external parameters are the position and orientation of the camera in the world reference frame, while the internal parameters describe the projection of the 3D scene onto a 2D plane that produces the rendered image. The external parameters of a real camera are measured with tracking sensors, while the internal parameters are determined by the optical characteristics of the camera. Both can be fed into the X3D scene through the CalibratedCameraSensor and TrackingSensor nodes defined in section 2. The Viewpoint node in the X3D specification represents a virtual camera in the virtual scene. While the Viewpoint node's fields cover the full set of external parameters (position and orientation), they cover only a limited subset of the internal parameters. To meet the minimum requirements for MR visualization, we propose adding two new fields (fovMode and aspectRatio, shown at the bottom of the node definition) to the Viewpoint node.

Viewpoint: X3DViewpointNode {
	SFVec3f	[in,out]	centerOfRotation
	SFFloat	[in,out]	fieldOfView
	SFRotation [in,out]	orientation
	SFVec3f	[in,out]	position

	SFString [in,out]	fovMode
	SFFloat	[in,out]	aspectRatio
}

In the current X3D specification, the "fieldOfView" field represents the minimum field of view (either vertical or horizontal) of the virtual camera. This is insufficient for MR visualization, which needs precise calibration of the field of view (FOV). The most straightforward solution, separate explicit fields for the horizontal and vertical FOV, would not be compatible with the current specification. To keep backward compatibility, we propose a "fovMode" field that designates what the value of the "fieldOfView" field represents. The "fovMode" field can take one of the following values: MINIMUM, VERTICAL, HORIZONTAL, or DIAGONAL. MINIMUM is the default value and means that "fieldOfView" is interpreted as the minimum FOV (either vertical or horizontal), as in the current specification. When "fovMode" is VERTICAL, HORIZONTAL, or DIAGONAL, "fieldOfView" is interpreted as the FOV in the vertical, horizontal, or diagonal direction, respectively. In addition, the aspect ratio of a real camera's FOV does not necessarily match the aspect ratio of the images it produces. To accommodate this, the "aspectRatio" field is introduced, representing the ratio of the vertical FOV to the horizontal FOV (vertical/horizontal).
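The interplay of fieldOfView, fovMode, and aspectRatio can be illustrated by recovering the horizontal and vertical FOV a browser would need for its projection. This is a sketch under assumptions not fixed by the proposal: aspectRatio is read literally as a ratio of angles, DIAGONAL assumes an ideal pinhole camera where tan²(d/2) = tan²(h/2) + tan²(v/2), and MINIMUM is left to the existing X3D behaviour and not modelled. The function name `per_axis_fov` is hypothetical.

```python
import math


def per_axis_fov(field_of_view, fov_mode, aspect_ratio):
    """Return (horizontal, vertical) FOV in radians for VERTICAL, HORIZONTAL or DIAGONAL.

    aspect_ratio is read as vertical FOV / horizontal FOV (angles).
    DIAGONAL assumes an ideal pinhole camera: tan^2(d/2) = tan^2(h/2) + tan^2(v/2).
    """
    if aspect_ratio <= 0:
        raise ValueError("aspectRatio must be positive")
    if fov_mode == "HORIZONTAL":
        return field_of_view, aspect_ratio * field_of_view
    if fov_mode == "VERTICAL":
        return field_of_view / aspect_ratio, field_of_view
    if fov_mode == "DIAGONAL":
        target = math.tan(field_of_view / 2) ** 2

        def excess(h):
            # Increases monotonically with h on (0, hi), negative near 0.
            return math.tan(h / 2) ** 2 + math.tan(aspect_ratio * h / 2) ** 2 - target

        # Bisect for the root; hi keeps both half-angles below pi/2.
        lo, hi = 0.0, min(field_of_view, math.pi / aspect_ratio)
        for _ in range(100):
            mid = (lo + hi) / 2
            if excess(mid) < 0:
                lo = mid
            else:
                hi = mid
        h = (lo + hi) / 2
        return h, aspect_ratio * h
    raise ValueError(f"unsupported fovMode: {fov_mode}")
```

With an angle-ratio reading, DIAGONAL has no closed form, hence the bisection. If aspectRatio were instead defined as a ratio of half-angle tangents (image-plane extents), the diagonal case would reduce to a direct formula, which is one point the specification text may want to pin down.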

5. Use cases

The following example X3D scene shows how a simple AR scene can be described using the proposed nodes.

<CalibratedCameraSensor DEF="camera" />

<!-- Show the camera video as the 2D backdrop of the viewport -->
<ImageBackdropBackground DEF="bg" />
<ROUTE fromNode="camera" fromField="image" toNode="bg" toField="image"/>

<!-- Calibrate the virtual camera with the real camera's internal parameters -->
<Viewpoint DEF="arview" position="0 0 0" />
<ROUTE fromNode="camera" fromField="fieldOfView" toNode="arview" toField="fieldOfView"/>
<ROUTE fromNode="camera" fromField="fovMode" toNode="arview" toField="fovMode"/>
<ROUTE fromNode="camera" fromField="aspectRatio" toNode="arview" toField="aspectRatio"/>

<TrackingSensor DEF="tracker1" description="OBJECT_FROM_VIEWPOINT" />

<!-- Red box that follows the tracked physical object -->
<Transform DEF="tracked_object">
	<Shape>
		<Appearance><Material diffuseColor="1 0 0" /></Appearance>
		<Box />
	</Shape>
</Transform>

<!-- Transform exposes its position as "translation" -->
<ROUTE fromNode="tracker1" fromField="position" toNode="tracked_object" toField="translation"/>
<ROUTE fromNode="tracker1" fromField="rotation" toNode="tracked_object" toField="rotation"/>