X3D V4 HTML Integration Requirements

Objectives and Strategy

The objective of this page is to describe the issues and details involved in closely integrating X3D capabilities into HTML5/DOM pages. We are working towards identifying consensus points and open issues.

This work is part of a broad strategy for X3D version 4.

Web3D Standards Strategy emphasizes a "fundamental objective is to enable the open publishing of interactive 3D graphics models on the Web, enabling real-time 3D communication. We carefully improve and evolve our Recommended Standards while maintaining long-term archival stability."
X3D V4 describes the design strategy of X3D v4.0 for HTML5/DOM integration, and X3D v4.1 Mixed Augmented Reality (MAR). "Next-generation evolution + revolution is combined with archival compatibility of existing legacy content."
X3D V4 Development describes Genesis and Strategic Overview, Legacy Issues, Candidate Capabilities, Backwards and Forwards Compatibility, Architectural Considerations, Open Questions, and Related Work.

Boundary conditions

This is just for X3D running in a web browser and integrated with HTML. One way to express this: <x3d> ... </x3d> can be treated as a document fragment in the body of an HTML document. If X3D is used outside of this limitation, these requirements may or may not apply.
Changes to X3D Abstract Specification? I had not gone as far as indicating which changes needed to go where. Obviously new nodes would need to be in the Abstract Specification. But even before going there it is necessary to determine what environments X3D will operate in. My limitations are only for a web browser environment. Environments besides that one are not a concern of the requirements I stated. If there are to be other environments, then those differences and similarities would need to be worked out.
Defining an HTML5 Encoding, and perhaps a corresponding HTML5 Profile

Notes

Some of these requirements may overlap by varying amounts. It was easier to specify them this way than to try to make a complete and non-overlapping set. I'm not even going to claim that this is a complete set, but it is a beginning.
For those that want XHTML, convert all references to HTML to XHTML and make the appropriate syntax changes. The difference between HTML and XHTML in the appearance of elements and attributes is that XHTML is strict. This amounts to closed elements, lower-case element and attribute names, mandatory elements and attributes, a different MIME type (text/html vs. application/xml or application/xhtml+xml), and quoted attribute values -- see https://www.w3schools.com/html/html_xhtml.asp for details. The WhatWG has a pretty good documentation page on the differences at https://wiki.whatwg.org/wiki/HTML_vs._XHTML.

Decisions taken

At the X3D Working Group meeting held 19th July 2017 the following decisions were made:

The HTML standard to be normatively referenced should be the current W3C HTML recommendation, found at https://www.w3.org/TR/html/. As of the meeting date this is HTML 5.1, dated 1 November 2016.
As a consequence the DOM specification normatively referenced by the HTML specification can be found at https://www.w3.org/TR/DOM/. This is the W3C recommendation for DOM4, dated 19 November 2015.
All X3D nodes and statements are defined in the X3D abstract specification. Each encoding shall specify the mapping of each node and statement in the X3D abstract specification to the respective encoding form.
Each X3D element needs a corresponding DOM interface definition. The form of this is still to be decided.
X3D element DOM interface definitions will use the same IDL as HTML, namely WebIDL Level 1 defined at https://www.w3.org/TR/WebIDL-1/.

Candidate Requirements

Looks "like" HTML

Elements (nodes in X3D-speak) shall be case independent

Attributes (fields in X3D-speak) shall be case independent

X3D nodes shall not have name conflicts with any HTML-defined elements

I believe the HTML5 Custom Elements spec requires a hyphenated (prefix-name) format for element names. While allowing lowercase, should we also allow alternative names in the 'x3d-transform' format?

In that case, each node would be aliased with a HTML5 Custom Element compatible name.

All X3D nodes shall support all HTML Global Attributes https://www.w3schools.com/tags/ref_standardattributes.asp

Is this equivalent to requiring that X3D nodes need to derive from HTMLElement as defined in the DOM?

All X3D fields with the same name as HTML attributes shall behave as the HTML element

TBD: Not all style attributes apply to X3D nodes & fields
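
The naming requirements above can be sketched as a small lookup. This is only an illustration: the function names, the three-entry node table, and the choice to form the alias by lower-casing the node name after an 'x3d-' prefix are assumptions, not decisions of the Working Group. The only external rule relied on is that custom element names must contain a hyphen and no upper-case ASCII letters.

```javascript
// Hypothetical table of canonical X3D node names (a real one would list every node).
const X3D_NODE_NAMES = ["Transform", "Shape", "IndexedFaceSet"];

// Index the canonical names by their lower-case form for case-independent lookup.
const CANONICAL_BY_LOWER = new Map(
  X3D_NODE_NAMES.map((name) => [name.toLowerCase(), name])
);

// Resolve an element name written in any case, with or without the assumed
// "x3d-" alias prefix. Returns the canonical node name, or null if unknown.
function resolveNodeName(elementName) {
  const lower = elementName.toLowerCase();
  const bare = lower.startsWith("x3d-") ? lower.slice(4) : lower;
  return CANONICAL_BY_LOWER.get(bare) || null;
}

// Build the assumed custom-element-compatible alias for a canonical node name.
function toCustomElementAlias(nodeName) {
  return "x3d-" + nodeName.toLowerCase();
}
```

With this sketch, 'TRANSFORM', 'transform' and 'x3d-transform' all resolve to the same canonical node, while an HTML name such as 'div' resolves to nothing.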

Candidate interface definitions for X3DElement

Start with the assumption that all X3D elements inherit a common interface, named X3DElement. Two candidate interface definitions are as follows:

Candidate A

interface X3DElement : HTMLElement { } ;

Using the definition of HTMLElement from the W3C HTML specification, this can be expanded to:

interface X3DElement : Element {

 // metadata attributes
 attribute DOMString title;
 attribute DOMString lang;
 attribute boolean translate;
 attribute DOMString dir;
 [SameObject] readonly attribute DOMStringMap dataset;

 // user interaction
 attribute boolean hidden;
 void click();
 attribute long tabIndex;
 void focus();
 void blur();
 attribute DOMString accessKey;
 attribute boolean draggable;
 [PutForwards=value] readonly attribute DOMTokenList dropzone;
 attribute HTMLMenuElement? contextMenu;
 attribute boolean spellcheck;
 void forceSpellCheck();

};

X3DElement implements GlobalEventHandlers;

X3DElement implements DocumentAndElementEventHandlers;

X3DElement implements ElementContentEditable;

Candidate B

Take the SVGElement interface, substituting X3D for all occurrences of SVG. Note: The original names have been retained in order to include the hyperlinks.

interface SVGElement : Element {

 [SameObject] readonly attribute SVGAnimatedString className;
 [SameObject] readonly attribute DOMStringMap dataset;
 readonly attribute SVGSVGElement? ownerSVGElement;
 readonly attribute SVGElement? viewportElement;
 attribute long tabIndex;
 void focus();
 void blur();

};

SVGElement implements GlobalEventHandlers;

SVGElement implements SVGElementInstance;

HTML encodings

HTML5 has two encodings, not one. Each maps consistently to the DOM. If the WG develops functionality and syntax that support both encodings, it will add tremendous value for everyone.

HTML5 Recommendation section 8. The HTML syntax https://www.w3.org/TR/html5/syntax.html#syntax

HTML5 Recommendation section 9. The XHTML syntax https://www.w3.org/TR/html5/the-xhtml-syntax.html#the-xhtml-syntax

The difference between HTML and XHTML in the appearance of elements and attributes is that XHTML is strict. This amounts to closed elements, lower-case element and attribute names, mandatory elements and attributes, a different MIME type (text/html vs. application/xml or application/xhtml+xml), and quoted attribute values -- see https://www.w3schools.com/html/html_xhtml.asp for details. The WhatWG has a pretty good documentation page on the differences at https://wiki.whatwg.org/wiki/HTML_vs._XHTML. There are also differences in scripting. Most of the differences follow directly from XHTML being an XML document. The W3C HTML5 specification (https://www.w3.org/TR/html5/introduction.html#html-vs-xhtml) is consistent with the above comments. All of the formalities aside, from my observations there is very little use of the XHTML document type for regular web pages.

Functions "like" HTML

All nodes shall be fully integrated with the DOM [This may need to change if certain nodes need to remain hidden from the DOM. For example, the manner in which X3DOM implements Inline.]

The scene graph does not need to be DOM (or a portion of it).

X3DOM keeps a separate scene graph from the DOM. The rendering occurs from the scene graph. Changes to the DOM cause changes to the scene graph and vice versa.

Changes to the DOM shall be reflected in the scene graph

Changes to the scene graph shall be reflected in the DOM

Using the DOM to maintain and reflect state is often considered slow. Could this be optional? Changes to the scene graph cause events which can be listened to if required. (This is why cobweb-dom does not reflect back interpolator output to the DOM at this point)

Perhaps it is better to set all current fields that are initialized to be (generally) non-updated. If you want to see a particular value, you need to listen to the appropriate event. Certain events would cause the initialized fields to be updated. I know that this is not written well.

Events are handled as HTML events

DOM is the external (i.e., from the web page) API to the scene graph

X3DOM does it that way. One could add that this external API could be implemented using the SAI, thereby guaranteeing X3D event flow determinism.

Would it be possible to implement the SAI over DOM? I am concerned about having more than one fundamental API. It will cause confusion and errors.

Note: The notes on Cobweb DOM by Andreas Plesch should be noted. See https://github.com/andreasplesch/cobweb_dom
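
The idea raised above, that reflecting scene-graph changes into the DOM could be optional and driven by listeners, can be sketched as follows. This is a minimal illustration under stated assumptions: SceneNode, AttributeMirror and reflectField are hypothetical names invented here, not classes or functions from X3DOM, Cobweb, cobweb_dom or the SAI, and the attribute store is a plain Map standing in for a DOM element.

```javascript
// Hypothetical stand-in for a scene-graph node; not an X3DOM, Cobweb, or SAI class.
class SceneNode {
  constructor(fields) {
    this.fields = { ...fields };
    this.listeners = new Map(); // field name -> array of callbacks
  }
  addFieldListener(name, callback) {
    if (!this.listeners.has(name)) this.listeners.set(name, []);
    this.listeners.get(name).push(callback);
  }
  setField(name, value) {
    this.fields[name] = value;
    for (const callback of this.listeners.get(name) || []) callback(value);
  }
}

// Hypothetical stand-in for a DOM element's attribute store; counts writes.
class AttributeMirror {
  constructor() {
    this.attrs = new Map();
    this.writes = 0;
  }
  setAttribute(name, value) {
    this.attrs.set(name, String(value));
    this.writes += 1;
  }
  getAttribute(name) {
    return this.attrs.has(name) ? this.attrs.get(name) : null;
  }
}

// Reflect one named field into the mirror, and only that field.
function reflectField(node, mirror, name) {
  node.addFieldListener(name, (value) => mirror.setAttribute(name, value));
}

const transform = new SceneNode({ translation: "0 0 0", rotation: "0 0 1 0" });
const element = new AttributeMirror();
reflectField(transform, element, "translation");

transform.setField("translation", "1 2 3");   // reflected into the mirror
transform.setField("rotation", "0 1 0 1.57"); // not reflected: nobody asked for it
```

The design choice shown is that the scene graph stays authoritative and the DOM-side store is only written for fields that something has asked to observe, so fast-changing values such as interpolator output cost nothing unless requested.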

Other HTML related

X3D scenes can pass events to/from the DOM and ROUTE events within the scene graph.

HTML elements, SVG elements and css blocks can be interspersed with X3D elements.

HTML Script, SVG Script and X3D Script elements are unique and distinguishable.

I still have not heard a good reason to have X3D scripts. They cannot have the same element name so legacy X3D files cannot run. At that point using HTML JavaScript for all processing appears to be the right thing to do. I see it as very important to content developers not to have multiple flavors of JavaScript running.

String-based DOM events correlate with strongly typed X3D events by defined correspondences

Multiple candidate approaches can be listed and compared

The events I am referring to are those that cross between X3D and HTML or any other non-X3D environment (e.g., display, orientation, etc.).

X3D shall support the evolving web standards for flat-3D (WebGL), VR (WebVR) and AR
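
One candidate approach to the correspondence between string-based DOM values and strongly typed X3D events is a per-type codec table. The sketch below is an assumption-laden illustration, not a proposal from the Working Group: the FIELD_CODECS table and the function names are hypothetical, only two field types are shown, and accepting commas as separators is a choice made here.

```javascript
// Split a field string on whitespace and/or commas, dropping empty pieces.
function splitComponents(text) {
  return text.split(/[\s,]+/).filter((piece) => piece.length > 0);
}

// Hypothetical codec table: X3D field type -> string parser and formatter.
const FIELD_CODECS = {
  SFVec3f: {
    parse(text) {
      const values = splitComponents(text).map(Number);
      if (values.length !== 3 || values.some((v) => Number.isNaN(v))) {
        throw new TypeError(`not a valid SFVec3f: "${text}"`);
      }
      return values;
    },
    format(values) {
      return values.join(" ");
    },
  },
  SFBool: {
    parse(text) {
      const trimmed = text.trim();
      if (trimmed === "true") return true;
      if (trimmed === "false") return false;
      throw new TypeError(`not a valid SFBool: "${text}"`);
    },
    format(value) {
      return value ? "true" : "false";
    },
  },
};

// Convert a DOM-side string into a typed value for the given X3D field type.
function domValueToX3D(type, text) {
  const codec = FIELD_CODECS[type];
  if (!codec) throw new TypeError(`no codec for field type ${type}`);
  return codec.parse(text);
}

// Convert a typed X3D value back into its DOM-side string form.
function x3dValueToDom(type, value) {
  const codec = FIELD_CODECS[type];
  if (!codec) throw new TypeError(`no codec for field type ${type}`);
  return codec.format(value);
}
```

A malformed string is rejected at the boundary, so only well-typed values would enter the X3D event cascade.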

Existing features & functionality

Perform the following tasks, with the requirement that the results not conflict with the above

Conduct a review of all existing features (nodes) to determine if any should be deprecated in V4

Conduct a review of all existing functionality (run-time) to determine if any should be deprecated or changed in V4

Include all features and functionality that passes the above reviews

Additional features to be added

Deformable-skin joint based animation

Support for multiple geometry formats, including OBJ, glTF

Leonard has a pretty good understanding about binary glTF support in x3dom. There are issues posted under the X3DOM project at GitHub, but resolution of those issues will take quite a bit of work.

Increased Material support with a standard library of pre-defined shaders

Three.js has good shaders including for PBR. But again quite a bit of work. Not sure if Fraunhofer is that engaged since there may be commercialization interest.

Including programmable shaders in declarative language and not providing predefined operations seems like a poor choice. This is an attempt to start to rectify it.

Mechanism to navigate a scene without a pointing device

Mechanism to touch or select objects without a pointing device

Or with wand or controller.

Review other 3D/VR display technologies (XML3D, A-Frame, GLAM, etc.) to determine if there are features that should be included

The three.js examples are also inspiring. Postprocessing of the rendered frame in another shader stage comes to mind. There are probably many more modern graphics methods which need some support. Ambient occlusion ssao? Motion blur? Distance defocus?

Issues

What encodings are needed? For example, do we need an HTML encoding? Or an XHTML encoding? Or can we simply reference the XML encoding, with modifications as appropriate, if required?

Existing implementation designs

There are currently two implementations that enable X3D to be integrated into HTML web pages. These are:

X3DOM - Developed by Fraunhofer - see https://www.x3dom.org/. Version is 1.7.2 at the time of writing.

Cobweb - Developed by Holger Seelig - see http://titania.create3000.de/cobweb/. Version is 3.2 at the time of writing.

See also the Cobweb DOM developed by Andreas Plesch - see https://github.com/andreasplesch/cobweb_dom

The remainder of this section will consider some of the design patterns of these two implementations. For a more wide ranging and detailed review see the page X3D/HTML Implementations.

Commonalities

Both Cobweb/Cobweb DOM and X3DOM support the embedding of X3D scenes directly into the HTML. Both implementations also support referencing an external X3D file.

Both X3DOM and Cobweb/Cobweb DOM have a parallel dual-node setup. That is, they have both a DOM document fragment that integrates directly with the remainder of the HTML in the web page, and an internal X3D scene graph. The two sets of nodes are synchronized, so that changes in either one are reflected into the other. The DOM document fragment permits all the usual HTML/DOM interactions. The X3D scene graph permits the regular X3D interactions. In Cobweb, the synchronization functionality for the X3D scene graph is faithful to the scene access interface (SAI).

Both Cobweb/Cobweb DOM and X3DOM use the HTML canvas tag for rendering the scene. Each implementation will automatically incorporate the canvas tag into the DOM document fragment.

Both X3DOM and Cobweb/Cobweb DOM support both the X3D "DEF" attribute and the HTML "id" attribute. Cobweb maintains these in the appropriate document fragment, with no mapping from one to the other. While X3DOM supports both "DEF" and "id", it also provides some mapping between them. All "id" attributes are automatically mapped to "DEF" attributes if no "DEF" attribute is defined. The reverse mapping is only optionally allowed for Inline scenes, where the Inline node has an additional attribute "mapDEFToID" defined.
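
The "id"/"DEF" mapping rules described for X3DOM can be restated as two small functions. This is a paraphrase of the behaviour described above, not X3DOM's actual code: the function names are hypothetical, and a plain object stands in for an element's attributes.

```javascript
// Forward rule: an "id" is copied to "DEF" when no "DEF" is defined.
// attrs is a plain object standing in for an element's attributes.
function applyIdToDefMapping(attrs) {
  const result = { ...attrs };
  if (result.DEF === undefined && result.id !== undefined) {
    result.DEF = result.id;
  }
  return result;
}

// Reverse rule: "DEF" is copied to "id" only when the caller opts in,
// modelled on the "mapDEFToID" attribute described for Inline scenes.
function applyDefToIdMapping(attrs, mapDEFToID) {
  const result = { ...attrs };
  if (mapDEFToID && result.id === undefined && result.DEF !== undefined) {
    result.id = result.DEF;
  }
  return result;
}
```

Neither function overwrites an attribute that is already present, which matches the description that the mapping applies only when the target attribute is not defined.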

With embedded X3D scenes both X3DOM and Cobweb/Cobweb DOM incorporate all nodes into the DOM. This can be verified by inspection of the HTML web page when loaded into a browser.

Differences

In order for the X3D tag to faithfully follow the standard and not include extra attributes Cobweb has an additional X3DCanvas tag that is used as a parent to the X3D tag. This X3DCanvas tag has all the additional properties that might be required for integrating an X3D scene into a web page. In contrast, X3DOM simply adds additional properties to the X3D tag.

Cobweb supports both prototypes and scripts. X3DOM supports neither. In order to recognize the difference between an HTML script element and an X3D script element the X3D script element in the Cobweb implementation has been given an additional attribute named "type" that is analogous to the HTML script tag attribute "type". For X3D scripts, the value for this attribute should be "application/x-vrmlscript".

As a consequence of the current X3DOM implementation not supporting prototypes and scripts it cannot support the "Immersive" or "Full" profiles. TODO: Find out if variables and functions from X3D Script nodes can be accessed in other HTML scripts.

Cobweb/Cobweb DOM runs the X3D scenegraph portion of the scene using the standard X3D event models. Output events are translated into custom DOM events. TODO: Find out what X3DOM does.

Cobweb/Cobweb DOM is designed to be a faithful implementation of the X3D standard. X3DOM, however, has significant differences from the standard. TODO: Enumerate these differences.

X3DOM only permits the external referencing of XML encoded X3D files. Cobweb can externally reference both XML and Classic VRML encoded X3D files.
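
A loader that accepts more than one encoding has to decide which parser to use for an externally referenced file. The sketch below guesses the encoding from the start of the file content. It is an assumption about how such a check could be written, not a description of what Cobweb or X3DOM does; the function name and return labels are hypothetical. It relies only on the header conventions of the formats: Classic VRML encoded X3D files begin with "#X3D", VRML97 files begin with "#VRML", and XML files begin with "<".

```javascript
// Guess the encoding of an X3D file from the start of its text content.
// Returns "classic-vrml", "vrml97", "xml", or "unknown".
function detectX3DEncoding(text) {
  // Drop a byte-order mark and leading whitespace before inspecting the header.
  const head = text.replace(/^\uFEFF/, "").trimStart();
  if (head.startsWith("#X3D")) return "classic-vrml";
  if (head.startsWith("#VRML")) return "vrml97";
  if (head.startsWith("<")) return "xml";
  return "unknown";
}
```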

Comparison of implementations
Description	X3DOM	Cobweb	Cobweb/DOM
Embedding X3D	Supported	Supported	Supported
Referencing external X3D	Supported	Supported	Supported

General comments

Generally, the VRML specification was written before the advent of powerful GPUs (massively parallel supercomputers), where it can now be faster to repeat the same fragment shader operation for each pixel (perhaps a million times) than to update data between CPU and GPU once. Not sure exactly what the consequences are, other than being shader friendly in some way.

Deformable skin calculations from joint animation are typically carried out on the GPU. They are too much for serial processors. I'm sure there are other items.

HTML empty tag closing

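In the HTML syntax, web browsers do not recognize self-closing tags on elements other than HTML void elements and foreign (SVG, MathML) elements; the trailing slash is ignored and the element stays open. For example: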

        <Material diffuseColor='1 1 1' />
        <ImageTexture url='"MyImage.jpg"' />

will be incorrectly interpreted, with ImageTexture being loaded as a child, and not a sibling, of Material. Instead, these must be written as:

        <Material diffuseColor='1 1 1'></Material>
        <ImageTexture url='"MyImage.jpg"'></ImageTexture>

NOTE: It must be remembered that all X3D nodes can contain children, if only a metadata node. Therefore, they are not empty elements by design.
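
Content authored with XML-style self-closing tags could be rewritten into the explicit open/close form before it is handed to the HTML parser. The function below is a rough sketch of that rewrite using a regular expression; the function name is hypothetical, and the approach is deliberately limited: it is not a parser, and attribute values that contain "<", ">" or the sequence "/>" are not handled.

```javascript
// Rewrite <Name attrs /> as <Name attrs></Name>.
// Limitation: a regular expression is not a parser; attribute values containing
// "<", ">" or the sequence "/>" are not handled.
function expandSelfClosingTags(markup) {
  return markup.replace(
    /<([A-Za-z][\w:.-]*)((?:\s+[^<>]*?)?)\s*\/>/g,
    (match, name, attrs) => `<${name}${attrs}></${name}>`
  );
}

const before =
  "<Material diffuseColor='1 1 1' />\n" +
  "<ImageTexture url='\"MyImage.jpg\"' />";
const after = expandSelfClosingTags(before);
```

Applied to the two lines from the example above, this yields the explicit form shown there; tags that already have a separate closing tag are left untouched.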

MFString quoting

In accordance with the standards MFString fields should have double quotes, as illustrated in the examples above. While X3DOM will be tolerant of single quote usage, Cobweb will not.
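
The strict reading of MFString quoting can be illustrated with a small parser that accepts only double-quoted strings. This is a sketch written for this page, not code from X3DOM or Cobweb; the function name is hypothetical, and treating a backslash as escaping the following character and commas as separators are assumptions of the sketch.

```javascript
// Parse an MFString attribute value such as '"a.jpg" "b.jpg"' into an array.
// Strict: every string must be enclosed in double quotes.
function parseMFString(text) {
  const values = [];
  const n = text.length;
  let i = 0;
  while (i < n) {
    // Skip whitespace and commas between strings.
    while (i < n && /[\s,]/.test(text[i])) i++;
    if (i >= n) break;
    if (text[i] !== '"') {
      throw new SyntaxError(`expected '"' at offset ${i}`);
    }
    i++; // opening quote
    let value = "";
    while (i < n && text[i] !== '"') {
      // A backslash makes the next character literal (e.g. \" or \\).
      if (text[i] === "\\" && i + 1 < n) i++;
      value += text[i];
      i++;
    }
    if (i >= n) throw new SyntaxError("unterminated string");
    i++; // closing quote
    values.push(value);
  }
  return values;
}
```

A tolerant implementation would instead fall back to treating an unquoted or single-quoted value as one string, which is the kind of leniency attributed to X3DOM above.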

DEF/USE in HTML

(Comment by Leonard) The original purpose (and still used in this manner) of the 'USE' attribute is to indicate that another node should also appear in place of the node declaring 'USE'. In fact the specification states (4.4.3 - http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/concepts.html#DEFL_USESemantics) that "the same node is inserted into the scene graph a second time, resulting in the node having multiple parents".

This requirement is not allowed in DOM (see https://www.w3.org/TR/dom/#concept-node-tree for the standard, https://www.w3.org/wiki/Traversing_the_DOM#Nodes for the explanation). A DOM element is allowed to have at most one parent. It is possible to create a (deep) copy of the node and insert it into the tree. That gives a structure like:

  B - C  - D
 /       
A        
 \       
  E - CC - DD

Where A is the parent of this (sub-)tree and B is the node that starts one branch (e.g., Transform). C is the 'DEF'ed node with a child of D. E is a separate child of A (e.g., a different Transform), and CC is the 'USE' version of C. Since HTML does not allow multiple parents ('B' and 'E'), a copy of 'C' needs to be made. This needs to be a deep copy (all children, so DD is the copy of D), as no node can have more than one parent.
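
The deep-copy approach in the diagram can be sketched on a plain tree of objects. Everything here is illustrative: the node shape ({ type, DEF, USE, children }), the function names and the 'clonedFrom' marker are assumptions made for this sketch, and it presumes that a DEF appears before any USE of it in document order.

```javascript
// Copy a node and all of its descendants; the copy carries no DEF name.
function deepCopy(node) {
  return { type: node.type, children: node.children.map(deepCopy) };
}

// Return a new tree in which every USE node is replaced by a deep copy of
// the node that carries the matching DEF, so each node has a single parent.
function expandUse(node, defs = new Map()) {
  if (node.USE !== undefined) {
    const original = defs.get(node.USE);
    if (!original) throw new Error(`USE refers to unknown DEF '${node.USE}'`);
    const copy = deepCopy(original);
    copy.clonedFrom = node.USE; // hypothetical marker, for illustration only
    return copy;
  }
  const result = { type: node.type, children: [] };
  if (node.DEF !== undefined) result.DEF = node.DEF;
  for (const child of node.children || []) {
    result.children.push(expandUse(child, defs));
  }
  // Register the DEF after its children are expanded.
  if (node.DEF !== undefined) defs.set(node.DEF, result);
  return result;
}

// The tree from the diagram: A -> B -> C(DEF) -> D, and A -> E -> USE C.
const sceneA = {
  type: "Group",
  children: [
    { type: "Transform", children: [
      { type: "Shape", DEF: "C", children: [{ type: "Box", children: [] }] },
    ] },
    { type: "Transform", children: [{ type: "Shape", USE: "C" }] },
  ],
};
const expanded = expandUse(sceneA);
```

After expansion the second Transform holds CC and DD as separate objects. The cost is that a later change to C is no longer shared with CC unless something outside the tree keeps the two in step.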

(Comment by Andreas) It may be also instructive to determine how x3dom currently deals with DEF/USE. My recollection is that DOM elements with a USE attribute have just one parent (they have to) but are mapped to x3dom nodes (in javascript memory) which have the DEF parent (through a map).

cobweb_dom initially loads nodes, including USE nodes, with the SAI importDocument(DomNode) function (http://www.web3d.org/documents/specifications/19777-1/V3.3/Part1/functions.html#t-FunctionsBrowserObject). USE nodes added later to the DOM are parsed into the X3D graph using Cobweb's parser, which I think takes care of parenting to the X3D DEF node.

A-Frame has an asset system which allows reuse of components in multiple entities. I suspect A-Frame avoids copying and would reference the same asset multiple times. This is possible because A-Frame also has a parallel scene graph (the THREE graph) which exists in memory apart from the DOM.

Another angle is the shadow DOM but I am not sure if that applies since I suspect that it also does not allow multiple parents.

I think SVG has DEF/USE as well. SVG then must define some exceptions (?).

Overall, it looks like it may be necessary to rely on more than the DOM alone for DEF/USE functionality.

(Comment by Leonard) I have looked through the code and have not yet been able to find the specific lines that deal with 'USE'. X3DOM does deal with 'USE' as can be seen in the example at https://examples.x3dom.org/example/x3dom_defUse.xhtml. View-source shows that it is used in the second 'Shape'. I suspect that the second 'Shape' is removed from the DOM but kept in the scene graph.

This particular method breaks (2 - not all X3D nodes are in the DOM). I believe Andreas describes Cobweb as doing essentially the same thing.

cobweb_dom initially loads nodes, including USE nodes, with the SAI importDocument(DomNode) function (http://www.web3d.org/documents/specifications/19777-1/V3.3/Part1/functions.html#t-FunctionsBrowserObject). USE nodes added later to the DOM are parsed into the X3D graph using Cobweb's parser, which I think takes care of parenting to the X3D DEF node.

A-Frame has an asset system which allows reuse of components in multiple entities. I suspect A-Frame avoids copying and would reference the same asset multiple times. This is possible because A-Frame also has a parallel scene graph (the THREE graph) which exists in memory apart from the DOM.

I have looked at the asset system in A-Frame, but not in sufficient depth to determine whether (1) it uses the data as a single copy inserted into the DOM in the asset statement, with the uses not appearing in the DOM; (2) it copies the information from the asset definition to the node that uses it; or (3) it uses the asset-defined information by reference.

If the asset system does not expand nodes in the DOM when it is used, then it could very easily work like CSS by supplying a common definition that other nodes just refer to (#3 above).

Another angle is the shadow DOM but I am not sure if that applies since I suspect that it also does not allow multiple parents.

Shadow DOM is (sort-of) like regular DOM. You are right about not allowing multiple parents. It allows you to expose elements to the (page) renderer without them being directly visible in the DOM. Since the scene graph is roughly parallel to the DOM, adding another DOM seems to me to be increasing the complexity without necessarily solving the problem.

Even though it is 6 years old, this article provides a good basic description of Shadow DOM - https://glazkov.com/2011/01/14/what-the-heck-is-shadow-dom/

I think SVG has DEF/USE as well. SVG then must define some exceptions (?).

Not being an SVG expert, I had to look this up. I found this Stack Overflow question: https://stackoverflow.com/questions/19246232/svg-reuse-a-line-node-with-def-and-use

It says that the content referenced by 'use' is deep cloned and inserted into the generated tree. Unfortunately, the answer did not provide a reference, but it does look like it is quoting something important.

Overall, it looks like it may be necessary to rely on more than the DOM alone for DEF/USE functionality.

Offering another possibility. Unreal (and I think Unity) create a class for each object. If something is "copied", then it is sub-classed from the original class. Changes to the parent propagate to all subclasses; however, a sub-class can override any property (which would then propagate to sub-sub-classes). I think that means in "X3D"-speak, the "DEF" node defines the master class. Any node that "USE"s it would create a subclass. A USE node could also redefine fields for that subclass, and even make itself available for subclassing by using a DEF statement. This would look something like:
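
As a rough illustration only (this is not proposed X3D syntax, and the object and field names are hypothetical), the propagation rules described here resemble JavaScript prototype chains: a "USE" object inherits from its "DEF" object, can override individual fields, and can itself serve as the base for a further level.

```javascript
// The "DEF" master object.
const masterWheel = { radius: 0.35, steers: false, mirrored: false };

// A "USE" that overrides one field; everything else is inherited.
const frontWheel = Object.create(masterWheel);
frontWheel.steers = true;

// A further level based on the overriding "USE".
const frontRightWheel = Object.create(frontWheel);
frontRightWheel.mirrored = true;

// A change to the master propagates to every level that has not overridden it.
masterWheel.radius = 0.4;
```

Here frontRightWheel sees the new radius from the master and the steering override from frontWheel, while the master itself is unaffected by either override.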

(Comment by Andreas) The second shape is still in the DOM. But I am not sure what happens if its USE attribute is modified.

https://github.com/x3dom/x3dom/search?utf8=✓&q=defmap&type=

shows how DEF is implemented.

The svg spec. needs a lot of language to deal with def/use:

https://www.w3.org/TR/SVG/struct.html#Head

https://www.w3.org/TR/SVG/struct.html#UseElement

An excerpt: "The effect of a ‘use’ element is as if the contents of the referenced element were deeply cloned into a separate non-exposed DOM tree which had the ‘use’ element as its parent and all of the ‘use’ element's ancestors as its higher-level ancestors. Because the cloned DOM tree is non-exposed, the SVG Document Object Model (DOM) only contains the ‘use’ element and its attributes. The SVG DOM does not show the referenced element's contents as children of ‘use’ element."

Mostly use seems understood as referencing def but the actual representation then needs to be a deep and unexposed copy.

SVG also allows reusing a modified use for subclassing, judging from the spec. I have never tried this.

Could be cool. There may be a chance to make this a backwards compatible addition to x3d ?

Trying to come up with a practical use case. Cars. Blue Cars. Blue cars with lights on.

(Comment by Leonard) Examples where you might wish to sub-class an object. You have one master car wheel. The ones on the other side need to have their geometry inverted, but not their rotation characteristics. The ones in front turn, the ones in back don't. Going around corners cause the tires to rotate at different speeds.

Another one is trees. They may all be the same, but they will bend differently depending on how the wind blows through the forest.

The people who build Unreal environments do this all the time. I can ask for a couple of illustrated examples if that would help.

Scripting

See https://www.w3.org/TR/SVG/script.html
See the example in 18.2 The 'script' element at https://www.w3.org/TR/SVG/script.html#ScriptElement
See the DOM interface definition for SVGScriptElement at https://www.w3.org/TR/SVG/script.html#InterfaceSVGScriptElement

Scripting Examples

This section added because of a structural reason in the parent that prevents additional items from being added there

ECMAScript is very powerful and allows dynamic changes to code at run-time. This section illustrates some features that are regularly used in advanced HTML/JavaScript programming

X3D Script function called 'foo'; HTML script function called 'bar'.

In the body of bar, I should be able to redefine 'foo'. It is (in a fully integrated system) available as window.foo. Similarly for inside of 'foo' to change 'bar'.

function bar(event, time) {
     // Redefine the X3D Script function from inside the HTML script function.
     window.foo = bar;            // I would (perhaps) settle for window.x3d.foo = bar;
}

and

<X3D-Script ...>    <!-- X3D Script node -->
     function foo (event, time) {
         window.bar = foo;
     }
</X3D-Script>

I should be able to construct an object with 'foo' as a method. E.g.,

var globalVar = {};
globalVar.x3d = foo;

// should call the X3D script passing it a reference to the current value of 'event' and 'time'.
globalVar.x3d(event, time);

Inside 'foo' I should be able to access any DOM element to get its current state or even establish an event listener.

OBJ file format

I applaud the idea of multiple geometry formats, as we need interoperability, and glTF is a nice standard.

But I would remove OBJ from the list, or be careful what parts of the OBJ are necessary to be supported.

The OBJ is an old format. It doesn't have many features nice for modern rendering (like a specification of shading suitable for modern GPUs). And it does have some features that are unhandy, and were probably really useful only by the initial Wavefront OBJ implementation (trace_obj).
The OBJ specification is not officially maintained by anyone, I think. It's not updated. The creators of it, and the original software implementing it, are closed now. (https://en.wikipedia.org/wiki/Wavefront_Technologies , https://en.wikipedia.org/wiki/The_Advanced_Visualizer ). No-one since then has taken the effort to update the OBJ specification in any way, as far as I know.
Although it has a specification http://www.martinreddy.net/gfx/3d/OBJ.spec, most implementations treat it more like a "de-facto" standard, implementing different subsets of the format, whatever seems useful. There are features in OBJ format that are widely supported (vertexes with normals, tex coords), and there are features not supported by any existing implementation (I have not seen yet an implementation that supports Bezier patches defined in OBJ files; or one that supports d_interp or trace_obj).

Bottom line: While OBJ is a popular format, I would either discourage its use (we made X3D for a reason: it's really better on all accounts), or at least specify only a precise subset of OBJ features required to be supported.

A very informative link to Wikipedia description of OBJ: https://en.wikipedia.org/wiki/Wavefront_.obj_file

I included OBJ because it is a very popular format for the exchange of geometry and normal information. I have only seen it with triangles. I am in complete agreement with Michalis about limiting its scope. For example, SVG also has a script node that seems to operate just fine in combination with HTML.