X3D Regular Expressions (regexes)

    

Overview | XML Patterns | X3D Patterns | References | Tools | Contact

X3D Regular Expressions (regexes) are used to validate the correctness of string and numeric array values in an X3D scene.

XML | DOCTYPE | Bool | Color | ColorRGBA | Double | Float | Image | Int32 | Rotation | String | Time | Vec2 | Vec3 | Vec4 | bboxSize


Overview

Regular expressions (regexes) are string grammars that efficiently and rigorously define the allowable character patterns making up a data value.

Regexes themselves are carefully defined sequences of characters that form a search pattern, mainly used for string pattern matching. For example, this technique allows detection of well-formed (or incorrect) MFVec3f arrays of three-tuple floats in an X3D scene.

X3D regexes are utilized judiciously when the base types of XML Schema are insufficient to capture the necessary richness of X3D content validation. Like all aspects of X3D Schema validation, regex validation is reasonably high performance and optional.

Note that not all regex languages are completely consistent, thus small (but fundamentally important) variations can occur. This work strictly follows regex syntax for XML Schema.
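One such variation concerns anchoring. An XML Schema pattern must match the entire value, whereas many regex libraries report success when any substring matches. The following sketch uses the SFInt32 pattern listed later on this page to show the difference. Python is assumed here purely for illustration, the helper names are hypothetical, and re.fullmatch stands in for XML Schema's whole-value matching.

```python
import re

# SFInt32 pattern as listed later on this page.
SFINT32 = r"(\+|\-)?(0|[1-9][0-9]*)?"


def matches_whole_value(pattern: str, value: str) -> bool:
    """Approximate XML Schema behavior: the pattern must cover the entire value."""
    return re.fullmatch(pattern, value) is not None


def matches_any_substring(pattern: str, value: str) -> bool:
    """Typical library default: succeed if any substring matches."""
    return re.search(pattern, value) is not None


print(matches_whole_value(SFINT32, "42"))       # True
print(matches_whole_value(SFINT32, "42abc"))    # False
print(matches_any_substring(SFINT32, "42abc"))  # True
```

When porting a pattern to another environment, the whole-value form is the one that corresponds to Schema validation.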

Existing data validation tools differ in expressive power, and so validate values to different degrees of fidelity.

  1. DOCTYPE (DTD). DOCTYPE validation can only check that attribute values are strings. In some cases, a strict set of allowed enumeration values is defined (such as legal names for profiles and components).
  2. XML Schema. Schema validation can check a large set of built-in data types. However, XML Schema validation is typically not able to fully check the correctness of array values. For example, an SFVec3f triplet (3-tuple) or an MFVec3f array can be checked to only contain floating-point values, but cannot be checked to have a multiple of three floats.
  3. Regular expressions (regexes). Regular expressions can define any regular grammar, which gives them considerably more expressive power than DTD attribute types or built-in XML Schema datatypes. Although such patterns can be tricky to write, character patterns of great complexity are achievable.
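The third point can be illustrated with a small sketch: a regex can require that floats arrive in groups of three, which a built-in Schema list type cannot. The token pattern below is a simplified illustration written for this example, not one of the X3D Schema patterns listed later, and Python is assumed only for demonstration.

```python
import re

# Illustrative float token (simplified; NOT one of the X3D Schema patterns).
FLOAT = r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?"
# Exactly three floats separated by whitespace.
TRIPLE = FLOAT + r"\s+" + FLOAT + r"\s+" + FLOAT
# Zero or more triples, separated by whitespace or by a comma with optional whitespace.
MFVEC3F_SKETCH = "(" + TRIPLE + r"((\s*,\s*|\s+)" + TRIPLE + ")*)?"


def is_array_of_triples(value: str) -> bool:
    """Return True when the value holds a multiple of three floats."""
    return re.fullmatch(MFVEC3F_SKETCH, value) is not None


print(is_array_of_triples("0 0 0, 1 0 0"))  # True  (two triples)
print(is_array_of_triples("1 2 3 4 5 6"))   # True  (two triples, no comma)
print(is_array_of_triples("1 2 3 4"))       # False (four floats)
```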

X3D Regular Expressions are an important part of X3D Quality Assurance (QA) to maximize the correctness of X3D scene content.


XML Patterns

X3D Scene Authoring Hints: Validation describes all initial XML headers for X3D.

Regex testing for XML-related constructs by X3dDoctypeChecker.java includes the following.

XML and DOCTYPE regex patterns
XML declaration as part of the document prolog:
<\\?xml version=(\"|')1.(0|1)(\"|') encoding=(\"|')UTF-(8|16)(\"|')\\?>
XML declaration, case insensitive:
<\\?xml version=(\"|')1.(0|1)(\"|') encoding=(\"|')(U|u)(T|t)(F|f)-(8|16)(\"|')\\?>
Experimental X3D 4.0 DOCTYPE:
<!DOCTYPE X3D PUBLIC(\\s)+\"ISO//Web3D//DTD X3D 4.0//EN\"(\\s)+\"http://www.web3d.org/specifications/x3d-4.0.dtd\"(\\s)*(>|\\[)
Approved X3D 3.3 DOCTYPE:
<!DOCTYPE X3D PUBLIC(\\s)+\"ISO//Web3D//DTD X3D 3.3//EN\"(\\s)+\"http://www.web3d.org/specifications/x3d-3.3.dtd\"(\\s)*(>|\\[)
X3D 3.2 DOCTYPE:
<!DOCTYPE X3D PUBLIC(\\s)+\"ISO//Web3D//DTD X3D 3.2//EN\"(\\s)+\"http://www.web3d.org/specifications/x3d-3.2.dtd\"(\\s)*(>|\\[)
X3D 3.1 DOCTYPE:
<!DOCTYPE X3D PUBLIC(\\s)+\"ISO//Web3D//DTD X3D 3.1//EN\"(\\s)+\"http://www.web3d.org/specifications/x3d-3.1.dtd\"(\\s)*(>|\\[)
X3D 3.0 DOCTYPE:
<!DOCTYPE X3D PUBLIC(\\s)+\"ISO//Web3D//DTD X3D 3.0//EN\"(\\s)+\"http://www.web3d.org/specifications/x3d-3.0.dtd\"(\\s)*(>|\\[)

No regex checking of X3D XML Schema values is needed, since literal validation of those attribute values is performed by the X3D DTD and XML Schema themselves.
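The XML declaration and DOCTYPE patterns listed above appear to be written as Java string literals, with backslashes doubled and double quotes escaped. The sketch below removes that escaping and applies three of the patterns to a sample header. Python is assumed for illustration only; the header text and the helper name are hypothetical, and a substring search is used because these patterns describe one construct within a larger prolog.

```python
import re

# Patterns from the list above with Java string escaping removed
# (\\ becomes \ and \" becomes ").
XML_DECLARATION = r"""<\?xml version=("|')1.(0|1)("|') encoding=("|')UTF-(8|16)("|')\?>"""
XML_DECLARATION_ANYCASE = (
    r"""<\?xml version=("|')1.(0|1)("|') encoding=("|')(U|u)(T|t)(F|f)-(8|16)("|')\?>"""
)
X3D_33_DOCTYPE = (
    r"""<!DOCTYPE X3D PUBLIC(\s)+"ISO//Web3D//DTD X3D 3.3//EN"(\s)+"""
    r""""http://www.web3d.org/specifications/x3d-3.3.dtd"(\s)*(>|\[)"""
)

# Hypothetical document header used only for this demonstration.
HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE X3D PUBLIC "ISO//Web3D//DTD X3D 3.3//EN" '
    '"http://www.web3d.org/specifications/x3d-3.3.dtd">\n'
    "<X3D profile='Immersive' version='3.3'>"
)


def header_contains(pattern: str, header: str) -> bool:
    """Return True when the construct described by pattern occurs in the header."""
    return re.search(pattern, header) is not None


print(header_contains(XML_DECLARATION, HEADER))  # True
print(header_contains(X3D_33_DOCTYPE, HEADER))   # True
```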


X3D Patterns

Because X3D allows commas as whitespace characters between values, native XML Schema datatypes typically cannot be used directly. Strict validation therefore requires regex patterns. Note also that regex patterns only apply to base type xs:string.

General regex design considerations for X3D XML Schema include the following.

  1. For numeric types, leading sign characters (+ or -) are optionally present.
  2. For numeric types, leading zeroes are not allowed, except for an optional leading zero preceding the decimal point when the significand is only fractional.
  3. Intermediate commas are treated as whitespace, but only allowed between each singleton value. For example, SFVec3f 3-tuple values within an MFVec3f array do not contain comma characters.
  4. Careful design allows use of regexes that can also be adapted to JavaScript/JSON, Java and other language environments.
  5. These regexes all assume that leading/trailing whitespace has been removed. It is possible to prepend/append regex constructs such as (\s)* to consume outer whitespace.
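The last consideration, and the leading-zero rule, can be tried out with a short sketch. It wraps the SFInt32 pattern from this page in (\s)* on both sides. Python is assumed for illustration, and the helper names are hypothetical.

```python
import re

# SFInt32 pattern as listed on this page.
SFINT32 = r"(\+|\-)?(0|[1-9][0-9]*)?"


def with_outer_whitespace(pattern: str) -> str:
    """Wrap a pattern so that leading and trailing whitespace is consumed."""
    return r"(\s)*(" + pattern + r")(\s)*"


def matches(pattern: str, value: str) -> bool:
    """Whole-value match, approximating XML Schema pattern semantics."""
    return re.fullmatch(pattern, value) is not None


padded = with_outer_whitespace(SFINT32)
print(matches(SFINT32, "-17"))     # True
print(matches(SFINT32, "  -17 "))  # False: outer whitespace not yet removed
print(matches(padded, "  -17 "))   # True
print(matches(padded, "007"))      # False: leading zeroes are not allowed
```

Wrapping the original pattern in its own group keeps any top-level alternation inside it from interacting with the added whitespace constructs.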

The following regex patterns are used in X3D Schema and other validation tools.

X3D datatype regex patterns
SFBool and MFBool: each SFBool value can be either true or false.

SFBool pattern:

<xs:restriction base="xs:boolean"/>
or (true|false)

MFBool pattern:

<xs:list itemType="xs:boolean"/>
or ((true|false)((\s|,\s)+(true|false))*)?
SFImage and MFImage: each SFImage value describes pixels in a 2D image.

SFImage pattern: (\d|[1-9]\d+)(\s+(\d|[1-9]\d+)){2}(\s+((0x([a-f]|[A-F]|\d){1,8})|[1-9]\d+|\d))*

MFImage pattern: ((\d|[1-9]\d+)(\s+(\d|[1-9]\d+)){2}(\s+((0x([a-f]|[A-F]|\d){1,8})|[1-9]\d+|\d))*(\s)*(,)?(\s)*)*

SFInt32 and MFInt32: each SFInt32 value is a 32-bit signed integer.

SFInt32 pattern:

<xs:restriction base="xs:integer"/>
or (\+|\-)?(0|[1-9][0-9]*)?

MFInt32 pattern:

((\+|\-)?(0|[1-9][0-9]*)?( )?(,)?( )?)*
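These patterns check the lexical form of an integer only. A minimal sketch of combining the SFInt32 pattern with a separate 32-bit range check follows. The range check is an addition made for this example and is not something the patterns above perform; Python and the function names are assumptions for illustration.

```python
import re

# Patterns as listed above.
SFINT32 = r"(\+|\-)?(0|[1-9][0-9]*)?"
MFINT32 = r"((\+|\-)?(0|[1-9][0-9]*)?( )?(,)?( )?)*"

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def is_lexical_sfint32(value: str) -> bool:
    """Lexical check only, using the SFInt32 pattern."""
    return re.fullmatch(SFINT32, value) is not None


def is_valid_sfint32(value: str) -> bool:
    """Lexical check followed by a 32-bit range check (added by this sketch)."""
    if not is_lexical_sfint32(value):
        return False
    try:
        number = int(value)
    except ValueError:  # for example an empty string or a lone sign
        return False
    return INT32_MIN <= number <= INT32_MAX


print(is_lexical_sfint32("2147483648"))  # True: lexically an integer
print(is_valid_sfint32("2147483648"))    # False: exceeds the 32-bit range
print(re.fullmatch(MFINT32, "1, 2, 3") is not None)  # True
print(re.fullmatch(MFINT32, "1 2.5") is not None)    # False
```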
SFDouble and MFDouble: each SFDouble value is a double-precision floating-point number.
SFFloat and MFFloat: each SFFloat value is a single-precision floating-point number.
SFTime and MFTime: each SFTime value is a double-precision floating-point number.

SFFloat and SFDouble/SFTime pattern:

<xs:restriction base="xs:float"/> and <xs:restriction base="xs:double"/>
or ((\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?)

MFFloat, MFDouble and MFTime pattern:

(((\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?)?( )?(,)?( )?)*
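The single-value pattern above can be exercised against a few sample strings. This is only a sketch, assuming Python and a whole-value match; the function name is hypothetical.

```python
import re

# SFFloat / SFDouble / SFTime pattern as listed above.
SFFLOAT = r"((\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?)"


def is_sffloat(value: str) -> bool:
    """Whole-value match against the SFFloat pattern."""
    return re.fullmatch(SFFLOAT, value) is not None


accepted = ["3.14", "-.5e-3", "1E10", "0."]
rejected = ["007.5", "1.2.3", "1e", "abc"]

for sample in accepted + rejected:
    print(repr(sample), is_sffloat(sample))
```

The first four samples match and the last four do not. Because every part of the pattern is optional, the empty string also matches.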
SFVec2f and MFVec2f: each SFVec2f value is a 2-tuple set of single-precision floating-point numbers.
SFVec2d and MFVec2d: each SFVec2d value is a 2-tuple set of double-precision floating-point numbers.

SFVec2f and SFVec2d pattern:

((\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?)?

MFVec2f and MFVec2d pattern:

((\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?( )?(,)?( )?)*
SFVec3f and MFVec3f: each SFVec3f value is a 3-tuple set of single-precision floating-point numbers.
SFVec3d and MFVec3d: each SFVec3d value is a 3-tuple set of double-precision floating-point numbers.

SFVec3f and SFVec3d pattern:

((\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?)?

MFVec3f and MFVec3d pattern:

((\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?( )?(,)?( )?)*
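The tuple patterns on this page follow a regular construction: the float pattern is repeated with single spaces between components, and the array form appends an optional comma separator. The sketch below rebuilds the Vec3 patterns that way and tries a few values. Python and the helper names sf_tuple and mf_tuple are assumptions for illustration.

```python
import re

# Single float component, as used inside the tuple patterns on this page.
FLOAT = r"(\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?"


def sf_tuple(count: int) -> str:
    """Single-value tuple pattern: count floats joined by single spaces."""
    return "(" + " ".join([FLOAT] * count) + ")?"


def mf_tuple(count: int) -> str:
    """Array pattern: tuples optionally separated by a space and/or comma."""
    return "(" + " ".join([FLOAT] * count) + "( )?(,)?( )?)*"


SFVEC3F = sf_tuple(3)
MFVEC3F = mf_tuple(3)

print(re.fullmatch(SFVEC3F, "1 2 3") is not None)         # True
print(re.fullmatch(SFVEC3F, "1 2") is not None)           # False: only two components
print(re.fullmatch(SFVEC3F, "1, 2, 3") is not None)       # False: no commas inside a tuple
print(re.fullmatch(MFVEC3F, "0 0 0, 1 0 0") is not None)  # True
```

The same helpers with count 2 or 4 produce the Vec2 and Vec4 forms.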
SFVec4f and MFVec4f: each SFVec4f value is a 4-tuple set of single-precision floating-point numbers.
SFVec4d and MFVec4d: each SFVec4d value is a 4-tuple set of double-precision floating-point numbers.

SFVec4f and SFVec4d pattern:

((\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?)?

MFVec4f and MFVec4d pattern:

((\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)? (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?( )?(,)?( )?)*
SFColor and MFColor: each SFColor value is a 3-tuple set of single-precision floating-point numbers, each in the range [0,1] inclusive.

SFColor pattern:

((((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))))?
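The SFColor pattern repeats one component pattern three times. The sketch below splits that component into its three alternatives and checks a few values. Python is assumed for illustration; the names COMPONENT and is_sfcolor, and the comments describing each alternative, are this example's own reading of the pattern.

```python
import re

# One color component, split across lines by alternative.
COMPONENT = (
    r"(((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)"  # .5, 0, 0.25 (optional exponent)
    r"|(1(\.[0]*)?((E|e)\-[0-9]+)?)"                      # 1, 1.0, 1e-2
    r"|([1-9](\.[0-9]*)((E|e)\-[0-9]+)))"                 # 2.5e-1 (negative exponent required)
)

# Three components joined by single spaces; the whole value is optional.
SFCOLOR = "(" + " ".join([COMPONENT] * 3) + ")?"


def is_sfcolor(value: str) -> bool:
    """Whole-value match against the SFColor pattern."""
    return re.fullmatch(SFCOLOR, value) is not None


print(is_sfcolor("1 0 0"))       # True
print(is_sfcolor("0.8 0.2 .5"))  # True
print(is_sfcolor("1.5 0 0"))     # False: component greater than 1
print(is_sfcolor("-0.5 0 0"))    # False: no sign is allowed
print(is_sfcolor("0 0"))         # False: only two components
```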

MFColor pattern:

SFColorRGBA and MFColorRGBA: each SFColorRGBA value is a 4-tuple set of single-precision floating-point numbers, each in the range [0,1] inclusive.

SFColorRGBA pattern:

((((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))))?

MFColorRGBA pattern:

((((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+)))( )?(,)?( )?)*
SFRotation and MFRotation: each SFRotation value is a 4-tuple set of single-precision floating-point numbers, representing a normalized 3-tuple axis of rotation followed by an angle of rotation in radians.
TODO: detect illegal axis vector 0 0 0

SFRotation pattern:

((\+|\-)?(((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (\+|\-)?(((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (\+|\-)?(((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?)?

MFRotation pattern:

((\+|\-)?(((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (\+|\-)?(((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (\+|\-)?(((\.[0-9]+|0(\.[0-9]*)?)((E|e)(\+|\-)?[0-9]+)?)|(1(\.[0]*)?((E|e)\-[0-9]+)?)|([1-9](\.[0-9]*)((E|e)\-[0-9]+))) (\+|\-)?(0|[1-9][0-9]*)?(\.[0-9]*)?((E|e)(\+|\-)?[0-9]+)?( )?(,)?( )?)*
SFString  and MFString: each SFString value is an unquoted set of characters, while each MFString consists of quoted SFString values separated by whitespace (including commas).

SFString pattern:

<xs:restriction base="xs:string"/>

MFString pattern: TODO is a more restrictive expression possible?

<xs:list itemType="xs:string"/>
Bounding box size (bboxSize): a 3-tuple SFVec3f value with each x,y,z component in range [0,+infinity) or -1. The default bboxSize='-1 -1 -1' indicates that no bounding box has been provided.

bboxSize pattern:

(((\+)?([1-9][0-9]*(\.[0-9]*)?|(0?\.[0-9]*[1-9][0-9]*))((E|e)(\+|\-)?[0-9]+)? (\+)?([1-9][0-9]*(\.[0-9]*)?|(0?\.[0-9]*[1-9][0-9]*))((E|e)(\+|\-)?[0-9]+)? (\+)?([1-9][0-9]*(\.[0-9]*)?|(0?\.[0-9]*[1-9][0-9]*))((E|e)(\+|\-)?[0-9]+)?)|(\-1(\.(0)*)? \-1(\.(0)*)? \-1(\.(0)*)?))?
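The bboxSize pattern has two branches: three positive components, or the -1 -1 -1 sentinel. The sketch below rebuilds it from those two parts and tries a few values. Python and the names POSITIVE, MINUS_ONE and is_bbox_size are assumptions for illustration.

```python
import re

# Positive component and sentinel component, as they appear in the pattern above.
POSITIVE = r"(\+)?([1-9][0-9]*(\.[0-9]*)?|(0?\.[0-9]*[1-9][0-9]*))((E|e)(\+|\-)?[0-9]+)?"
MINUS_ONE = r"\-1(\.(0)*)?"

BBOX_SIZE = (
    "((" + " ".join([POSITIVE] * 3) + ")"
    "|(" + " ".join([MINUS_ONE] * 3) + "))?"
)


def is_bbox_size(value: str) -> bool:
    """Whole-value match against the bboxSize pattern."""
    return re.fullmatch(BBOX_SIZE, value) is not None


print(is_bbox_size("2 3.5 1e2"))  # True
print(is_bbox_size("-1 -1 -1"))   # True: default sentinel
print(is_bbox_size("-1 2 3"))     # False: sentinel mixed with sizes
print(is_bbox_size("0 0 0"))      # False
```

As written, each component of the first branch must contain a nonzero digit, so a value such as 0 0 0 does not match even though the stated range includes zero.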

TODO. List other existing X3D schema Matrix types in this table.

A scratch pad of various experimental regexes is found in X3dRegularExpressionTests.txt.


References

The following references provide useful additional information about X3D, regular expressions and data validation.

  1. Davis, Mark. Unicode Regular Expression Guidelines, 2016.
  2. Friedl, Jeffrey E.F., Mastering Regular Expressions: Understand Your Data and Be More Productive, 3rd Edition, O'Reilly Media, Sebastopol California, 2006.
  3. RegExLib.com regular expression library
  4. Regular-expressions.Info website, including Runaway Regular Expressions: Catastrophic Backtracking.
  5. E-mail post: Regular expression checking for malformed floating-point numbers and excess leading zeros
  6. Wikipedia: Regular expression article.
  7. Wikipedia: Ken Thompson's construction algorithm article.
  8. X3D field data types are defined in Clause 5, Field type reference, of the X3D Abstract Specification.
  9. World Wide Web Consortium (W3C) Recommendation: XML Schema Definition Language (XSD) 1.1 Part 2 Datatypes with sections on Regular expressions, float checking, etc.
  10. X3D Specifications: XML Schema and DOCTYPE Validation include the latest versions of recommended XML Schemas and DOCTYPEs (DTDs) for X3D.
  11. X3D Resources provides links to numerous resources supporting X3D and VRML.
  12. X3D Scene Authoring Hints provide a collection of style guidelines, authoring tips and best practices to improve the quality, consistency and maintainability of X3D Graphics scenes.
  13. X3D Unified Object Model (X3DUOM)
  14. X3D Java Scene Access Interface Library (X3DJSAIL)
  15. X3D Validator is a Web application that checks X3D scene validity.
  16. XML Schema 1.1 Part 2: Datatypes Second Edition, World Wide Web Consortium (W3C) Recommendation. Includes sections on Datatype System, Built-in datatypes and Regular Expressions.

Tools

Links to tools of interest follow. Additional recommendations are welcome.


Contact

Questions, suggestions and comments about these resources are welcome. Please send them to Don Brutzman and Roy Walmsley (brutzman at nps.edu, roy.walmsley at ntlworld.com).

Available online at http://www.web3d.org/specifications/X3dRegularExpressions.html

Version control of this document is maintained at
https://sourceforge.net/p/x3d/code/HEAD/tree/www.web3d.org/specifications/X3dRegularExpressions.html

Updated: 14 June 2018