Composing Scene Graph Alternatives

Problem and solution summary

"Out of intense complexities, intense simplicities arise." - Winston Churchill

Most discussions regarding X3D and VRML assume a single scene graph topology. However, people often have different mental models for the abstract scene graph.

Two scene graph models are both possible and composable. Each matches the abstract scene graph definition in the VRML 97 specification, and each is isomorphic with the other. As a result, many different issues regarding object models and wrapper tags are also shown to be resolvable.

Composing scene graphs and object models resolves a huge number of technical issues. This also means that a simpler choice can be made regarding parser/tagset combinations. The two simple, clarified alternatives are:

ultralight context-free parsing, with wrapper tags for field names surrounding nodes in all scenes, or
slightly more capable parsing (such as Blendo with declaration memory, or XML parsers), with no wrapper tags for field names in any scenes

Either choice acceptably enables both scene graph models. Thus all current and foreseen capabilities for X3D appear to be compatibly preserved. This development means that rapid progress on all major X3D issues is possible now, regardless of the eventual choice.

Interestingly, this simple choice appears to be a classic computer-science time-space tradeoff: pay first with additional parser capability, or pay later with additional information in scene files. Determining this choice remains a task for the X3D consensus and assessment efforts.

Abstract scene graph definition

Defining the abstract (i.e. implementation-independent) VRML 97 scene graph is fundamentally important. Paragraph 4.4.2 "Scene graph hierarchy" of the VRML 97 specification states:

A VRML file contains a directed acyclic graph. Node statements can contain SFNode or MFNode field statements that, in turn, contain node (or USE) statements. This hierarchy of nodes is called the scene graph. Each arc in the graph from A to B means that node A has an SFNode or MFNode field whose value directly contains node B. See E.[FOLE] for details on hierarchical scene graphs.

The sentence beginning "Each arc in the graph from A to B" is correct. However, the phrasing is somewhat awkward, and two different scene graphs can be inferred from this definition.

Editorial note: E.[FOLE] does not contain any details on hierarchical scene graphs.

Scene graph equivalences

The following diagram shows two possible interpretations of the abstract scene graph definition.

Color key:

Green: generic class type
Blue: VRML 97 node name
Red: VRML 97 field name, sometimes referred to as VRML 97 wrapper

Most (but not all) implementations appear to follow the scene graph diagram on the left, including the Java Script Authoring Interface (JSAI) of Appendix B.

The abstract scene graph definitions for arcs and nodes (as defined in the specification excerpt above) correspond most closely to the scene graph diagram on the right, as do a few implementations.

A major concern has been whether choice of one scene graph precludes another. Visual inspection of each graph reveals a one-to-one correspondence between VRML nodes and VRML fields. This symmetry means that the two scene graphs are isomorphic when considered from a rendering perspective. There is an unambiguous transformation from one scene graph to the other, and vice versa. Thus both scene graphs are functionally equivalent.

No exclusive choice is needed between scene graphs - either implies the other. Different compatible applications might choose one or the other scene graph.
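
The transformation between the two scene graphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout for the left-hand graph (explicit field objects) and the tuple layout for the right-hand graph (arcs labelled with field names) are hypothetical data structures chosen for brevity, not taken from any implementation.

```python
# Left-hand model (assumed layout): {"type": T, "fields": {field_name: [child, ...]}}
# Right-hand model (assumed layout): (T, [(field_name, child), ...])

def left_to_right(node):
    """Replace each field object by arcs labelled with the field name."""
    arcs = []
    for field_name, children in node["fields"].items():
        for child in children:
            arcs.append((field_name, left_to_right(child)))
    return (node["type"], arcs)


def right_to_left(node):
    """Group labelled arcs back into one field object per field name."""
    node_type, arcs = node
    fields = {}
    for field_name, child in arcs:
        fields.setdefault(field_name, []).append(right_to_left(child))
    return {"type": node_type, "fields": fields}


left_scene = {
    "type": "Transform",
    "fields": {
        "children": [{
            "type": "Shape",
            "fields": {
                "appearance": [{"type": "Appearance", "fields": {}}],
                "geometry": [{"type": "Box", "fields": {}}],
            },
        }],
    },
}

right_scene = left_to_right(left_scene)
round_trip = right_to_left(right_scene)
```

In this sketch the round trip returns the original left-hand structure. One limitation of the simplification: a field that holds no nodes at all produces no arcs, so it is not recreated on the way back.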

It is likely that lack of clarity regarding these two scene graphs has been a root problem underlying many unresolved discussions. Since the nature of the scene graph is the most fundamental issue in the entire VRML 97 specification, it is quite understandable that resolving the many issues related to scene graphs is considered so important.

Object models

There appear to be two different object models corresponding to the two scene graphs, since the (FieldObject) classes only appear in the left-hand diagram. Thus it appears that the object model for the left-hand scene graph is not the same as the object model for the right-hand scene graph. If that is really the case, incompatible object models present a problematic situation.

However, when considering the semantics of VRML, not every arbitrary operation within an object model is necessary. Only those operations which make sense from a scene graph rendering perspective are necessary. For example, when replacing a VRML Appearance node via the topmost example Transform, the necessary operation for the left-hand object model might be
ExampleTransform.getField("children").getNode("Shape").getField("appearance").setNode("Appearance");
while the necessary operation on the right-hand object model might be
ExampleTransform.getNode("Shape").setNode("Appearance");
A variety of solutions to problems like child access and disambiguation are possible in each case.
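
The two operations can be compared side by side in a small Python sketch. All class and method names here (FieldObject, LeftNode, RightNode, get_field, get_node, set_node, describe) are hypothetical, as is the label argument used to tell the old Appearance from the new one. The right-hand set_node takes the field name explicitly, which is just one of the possible disambiguation solutions.

```python
class FieldObject:
    """Left-hand model: an explicit object for each SFNode/MFNode field."""
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def get_node(self, node_type):
        return next(n for n in self.nodes if n.node_type == node_type)

    def set_node(self, node):
        self.nodes = [node]


class LeftNode:
    def __init__(self, node_type, label="", **fields):
        self.node_type = node_type
        self.label = label
        self.fields = {name: FieldObject(nodes) for name, nodes in fields.items()}

    def get_field(self, name):
        return self.fields[name]

    def describe(self):
        # Canonical view of the subtree, used only to compare the two models.
        return (self.node_type, self.label,
                sorted((name, child.describe())
                       for name, field in self.fields.items()
                       for child in field.nodes))


class RightNode:
    """Right-hand model: children hang directly off arcs labelled by field name."""
    def __init__(self, node_type, label="", **arcs):
        self.node_type = node_type
        self.label = label
        self.arcs = [(name, child)
                     for name, children in arcs.items() for child in children]

    def get_node(self, node_type):
        return next(c for _, c in self.arcs if c.node_type == node_type)

    def set_node(self, field_name, node):
        # Drop whatever the named arc pointed to, then attach the new node.
        self.arcs = [(n, c) for n, c in self.arcs if n != field_name]
        self.arcs.append((field_name, node))

    def describe(self):
        return (self.node_type, self.label,
                sorted((name, child.describe()) for name, child in self.arcs))


left = LeftNode("Transform", children=[LeftNode(
    "Shape", appearance=[LeftNode("Appearance", "old")], geometry=[LeftNode("Box")])])
right = RightNode("Transform", children=[RightNode(
    "Shape", appearance=[RightNode("Appearance", "old")], geometry=[RightNode("Box")])])

# The same scene graph operation, expressed in each object model.
left.get_field("children").get_node("Shape").get_field("appearance").set_node(
    LeftNode("Appearance", "new"))
right.get_node("Shape").set_node("appearance", RightNode("Appearance", "new"))
```

After both calls, describe() reports the same scene for each model, even though the call chains differ in length and syntax.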

As a counterpoint, now consider an operation which does not occur. For example: the geometry field of the Shape node does not change while it still contains a valid Box node. More specifically: an illogical operation for the left-hand object model would be changing the value of (FieldObject) geometry while it contained a Box node. Similarly, an illogical operation for the right-hand object model would be changing the value of ----arc---- geometry while it contained a Box node.

An important constraint emerges from examples like this. Each object model is functionally equivalent when considered from the perspective of scene graph semantics. Functionally complete Application Programming Interfaces (APIs) for each object model will have a direct functional correspondence between each scene graph operation. Syntax for the two APIs will certainly be different, and function calls for the two APIs can't be mixed indiscriminately. Nevertheless the two APIs will be internally consistent, and the same kinds of scene graph operations will be provided by each.

No exclusive choice is needed between object models - either is functionally equivalent to the other. Different compatible applications might choose one or the other object model.

This is an important result. Use of either scene graph permits development of functionally equivalent object models. A promising editorial task while writing the VRML 200x specification will be elaborating both object models in complete detail, as equivalent examples of a single abstract object model.

Composition of scene graphs and their corresponding object models allows compatible mutual coexistence.

Nevertheless, some choices remain (of course). A single tagset is necessary for X3D, in order to produce a single unambiguous Document Object Model (DOM). The X3D DOM will match the DTD tagset chosen, and will correspond either to the left-hand scene graph object model or to the right-hand scene graph object model. Selecting one tagset (and therefore one DOM) will enable unambiguous integration of X3D events with other DOM-compatible event generators (such as XHTML, SVG, MathML, SMIL, etc.). Selecting one DOM will also facilitate elaboration of the forthcoming X3D Script Authoring Interface (SAI).

Choice of XML parser/tagset can now be posed in much simpler terms. Either choice is a technically acceptable alternative. Either choice enables both pairs of scene graphs and object models.

Wrapper tags

The term "wrapper tags" has commonly referred to tags which wrap the field name for a node around the node itself. An alternate term is "field containers." For example,
<appearance><Appearance> ... </Appearance></appearance>
contains two (lower-case "a") <appearance> wrapper tags. This construct indicates that the wrapped <Appearance> node belongs in the <appearance> field of the parent <Shape> node.
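
A minimal Python sketch of why wrapper tags permit context-free reading: the field name for every child node is taken directly from the enclosing wrapper element, so no declaration table is consulted. The scene fragment and the read_wrapped helper are hypothetical illustrations, not content from any of the proposed DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical scene fragment with a wrapper tag around each contained node.
WRAPPED = """
<Shape>
  <appearance><Appearance/></appearance>
  <geometry><Box/></geometry>
</Shape>
"""


def read_wrapped(element):
    """Return (node type, [(field name, child node), ...]).

    Children of a node element are wrappers; children of a wrapper are nodes.
    The field name is simply the wrapper's tag.
    """
    arcs = []
    for wrapper in element:
        for child in wrapper:
            arcs.append((wrapper.tag, read_wrapped(child)))
    return (element.tag, arcs)


shape = read_wrapped(ET.fromstring(WRAPPED))
```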

An example X3D scene with wrapper tags embedded is WrapperTagsExample.xml (autotranslated to VRML as WrapperTagsExampleTranslated.wrl).

Wrapper tags are used for initializing ProtoInstance/extension fields in the various SONY DTDs. Using wrapper tags for field names means that parsers do not need to remember field-name definitions, either for PROTOs or built-in VRML 97 nodes.

The "Sony VRML 97 DTD" and the "Sony Compromise DTD" show the precise details of two proposed tagset definitions and relationships.

Initializing ProtoInstance fields occurs in x3d-draft.dtd using the <defaultValue> tag instead of wrapper tags. This enables disambiguation of contained fields from contained content, just as wrapper tags do. An in-depth explanation of this mechanism is provided in QuadTreeExamplesExplanation.

Thus the syntax of prototype field-name definitions can be provided via the following mechanisms:

untyped/unverifiable wrapper tags which duplicate the field name for each instance, or
strongly typed <defaultValue> tags which use the already-declared field name

A possible third choice is to avoid wrappers for specification-defined nodes, and use wrappers for extension nodes. The primary objection to this possibility is lack of consistency in treatment of field names, although that objection is somewhat offset by broad recognition of the VRML 97 specification nodes.

Interestingly, the two choices to include or avoid wrapper tags correspond exactly to literal encodings of the two equivalent scene graphs.

Parser/tagset choices

Composition of scene graphs plus functional equivalence of object models provides excellent flexibility. Most of the issues in the wrapper tag controversy documents articulated by NPS and SONY now appear composed and compatible through this analysis. All of the decision points have been squeezed down into tradeoffs between parser capability and scene size.

Thus a simple choice is now possible. The X3D tagset either

enables context-free parsers by including wrapper tags in all content, or
requires parsers to remember defined field names, and avoids wrapper tags in all content
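
The second alternative can be sketched in Python as a reader that consults remembered field-name declarations. The DECLARED table below is a hypothetical, deliberately tiny stand-in for that memory: it maps a parent node type and a child node type to a field name, and covers only the nodes in this example. A real parser would build the equivalent information from the built-in node definitions and from PROTO/EXTERNPROTO declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration memory: (parent node type, child node type) -> field name.
DECLARED = {
    ("Shape", "Appearance"): "appearance",
    ("Shape", "Box"): "geometry",
    ("Appearance", "Material"): "material",
}

# Hypothetical scene fragment with no wrapper tags.
UNWRAPPED = "<Shape><Appearance><Material/></Appearance><Box/></Shape>"


def read_unwrapped(element):
    """Return (node type, [(field name, child node), ...]).

    Every child element is a node; its field name is looked up in DECLARED
    rather than read from the scene file.
    """
    arcs = []
    for child in element:
        field_name = DECLARED[(element.tag, child.tag)]
        arcs.append((field_name, read_unwrapped(child)))
    return (element.tag, arcs)


shape = read_unwrapped(ET.fromstring(UNWRAPPED))
```

The scene text is shorter, and the cost moves into the lookup table that the parser must carry.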

Keeping track of defined field names is a lightweight requirement. A total of 14 field names are used by 15 VRML 97 nodes, giving 25 field-name/node combinations. The complete list of field-name [Node names] combinations (corresponding to VRML 97 wrapper tags) follows:

appearance [Shape]
children [Anchor, Billboard, Collision, Group, Transform]
choice [Switch]
color [ElevationGrid, IndexedFaceSet, IndexedLineSet, PointSet]
coord [IndexedFaceSet, IndexedLineSet, PointSet]
fontStyle [Text]
geometry [Shape]
level [LOD]
material [Appearance]
normal [ElevationGrid, IndexedFaceSet]
source [Sound]
texCoord [ElevationGrid, IndexedFaceSet]
texture [Appearance]
textureTransform [Appearance]
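
The list above is small enough to hold as a literal lookup table. The Python sketch below transcribes it and tallies the entries: 14 field names, 25 field-name/node pairings, and 15 distinct node types. The names FIELD_NAMES and node_fields are hypothetical.

```python
# Field name -> VRML 97 nodes that use it, transcribed from the list above.
FIELD_NAMES = {
    "appearance": ["Shape"],
    "children": ["Anchor", "Billboard", "Collision", "Group", "Transform"],
    "choice": ["Switch"],
    "color": ["ElevationGrid", "IndexedFaceSet", "IndexedLineSet", "PointSet"],
    "coord": ["IndexedFaceSet", "IndexedLineSet", "PointSet"],
    "fontStyle": ["Text"],
    "geometry": ["Shape"],
    "level": ["LOD"],
    "material": ["Appearance"],
    "normal": ["ElevationGrid", "IndexedFaceSet"],
    "source": ["Sound"],
    "texCoord": ["ElevationGrid", "IndexedFaceSet"],
    "texture": ["Appearance"],
    "textureTransform": ["Appearance"],
}

# Tallies derived from the table.
combination_count = sum(len(nodes) for nodes in FIELD_NAMES.values())
distinct_nodes = sorted({n for nodes in FIELD_NAMES.values() for n in nodes})


def node_fields(node_type):
    """Field names that a parser must remember for the given node type."""
    return sorted(f for f, nodes in FIELD_NAMES.items() if node_type in nodes)
```

For example, node_fields("Appearance") lists material, texture and textureTransform.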

The following table summarizes the requirements for X3D parsing, together with the corresponding requirements for X3D scenes. The left and right columns each correspond to literal encoding of the left and right equivalent scene graphs, respectively. Either the left-side choice or the right-side choice must be made to decide on the X3D tagset Document Type Definition (DTD).

X3D Parsing Choices
Left column: can ignore 14 native VRML field name definitions; can ignore PROTO/EXTERNPROTO field name definitions
Right column: must utilize 14 native VRML field name definitions; must utilize PROTO/EXTERNPROTO field name definitions
Corresponding Choices for Scene Authoring
Left column: additional field-name wrapper tags required in all scenes
Right column: no field-name wrapper tags required in all scenes

It is worth reiterating that choosing either column still enables both scene graphs and both object models.

All participants are likely to have a qualified opinion on this time-space tradeoff: choosing more computation in parsers, versus choosing more tag information in scenes.

Next Steps

Several next steps should be considered by the X3D contributors. A "divide and conquer" strategy appears possible since either choice appears to enable all major goals. Here is a preliminary list of next-step tasks:

Examine and evaluate this analysis (See 29 Feb 2000 followup message regarding which DTDs pertain)
Examine computational cost of modifying Blendo and Shout3D parsers to maintain VRML and PROTO field names (experience by blaxxun and Draw may help)
Assess consensus on choices by following through on the consensus process
Proceed in parallel with specification writing, object model elaboration, Script Authoring Interface (SAI), Component Interface Model (CIM), extensions and other X3D strategic priorities
Prepare timeline for X3D implementation completion prior to SIGGRAPH in July 2000
Continue W3C and MPEG4 liaison activities, defining specific technical objectives and milestones
Renew the Binary Encoding Request for Proposals (RFP) once DTD/tagset is chosen
Continue open and community source implementations
Evaluate implementations using the NIST conformance suite and post results

Revised: 5 March 2000

Uniform Resource Locator (URL): www.web3D.org/TaskGroups/x3d/content/ComposingAlternateSceneGraphs.html

Author: Don Brutzman

Acknowledgements: new ideas in this document emerged during discussions with Paul Diefenbach, Stefan Diehl, Paul Fishwick, Mike Fletcher, Rob Glidden, Rick Goldberg, Holger Grahn, Bryan Housel, Alan Hudson, Chris Marrin, Rob Myers, Dick Puk, Rick Rafey, Sandy Ressler, Bernie Roehl, Auvo Severinkangas, Henry Sowizral, Jim Stewartson, Erick von Schweber, Neil Trevett and other X3D contributors.