
X3D Regular Expressions (regexes)

    

Overview | Design Considerations and Whitespace | X3D Patterns | XML Patterns | References | Tools | X3D Resources | Contact

X3D Regular Expressions (regexes) are used to validate the correctness of string and numeric array values in an X3D scene.

XML | DOCTYPE | Bool | Color | ColorRGBA | Double | Float | Image | Int32 | Rotation | String | Time | Vec2 | Vec3 | Vec4 | Matrix3 | Matrix4 | bboxSize


🔖 Overview

Regular expressions (regexes) are string grammars that efficiently and rigorously define the allowable character patterns making up a data value.

Regexes themselves are carefully defined sequences of characters that form a search pattern, mainly used for string pattern matching. For example, this technique allows detection of well-formed (or incorrect) MFVec3f arrays of three-tuple floats in an X3D scene.

X3D regexes are utilized judiciously when the base types of XML Schema are insufficient to capture the necessary richness of X3D content validation. Like all aspects of X3D Schema validation, regex validation is reasonably high performance and optional.

Note that not all regex languages are completely consistent, thus small (but fundamentally important) variations can occur. This work strictly follows regex syntax for XML Schema, which in turn permits consistent application using other variations of regex languages.

Different data validation tools offer different expressive power, and so can validate values to different degrees of fidelity.

  1. DOCTYPE (DTD). DOCTYPE validation can only check that attribute values are strings. In some cases, a strict set of allowed enumeration values is defined (such as legal names for profiles and components).
  2. XML Schema. Schema validation can check a large set of built-in data types. However, XML Schema validation is typically not able to fully check the correctness of array values. For example, an SFVec3f triplet (3-tuple) or an MFVec3f array can be checked to only contain floating-point values, but cannot be checked to have a multiple of three floats.
  3. Regular expressions (regexes). Regular expressions can define any regular grammar, which goes well beyond the checks above and is sufficient to enforce tuple sizes and separators. Although such patterns may be tricky to write, character patterns of considerable complexity are achievable.
  4. Regexes found on this page are included in the data-type definitions of each X3D XML Schema and X3D Unified Object Model (X3DUOM).
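
The gap between a list-of-floats check and a regex check can be sketched in Python. This is only an illustration under stated assumptions: Python's re module stands in for an XML Schema regex engine, the helper is_float_list is a hypothetical approximation of a schema list-of-floats check, and the MFVec3f pattern is the one listed in the X3D Patterns section of this page.

```python
import re

# Float component and MFVec3f pattern, as listed in the X3D Patterns section.
FLOAT = r"[+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?"
MFVEC3F = re.compile(r"^\s*(((" + FLOAT + r")\s+){2}(" + FLOAT + r")\s*,?\s*)*$")


def is_float_list(value: str) -> bool:
    """Hypothetical stand-in for a list-of-floats check: every token must parse as a float."""
    try:
        for token in value.split():
            float(token)
    except ValueError:
        return False
    return True


nine_floats = "0 0 0 1 0 0 1 1 0"   # three complete 3-tuples
eight_floats = "0 0 0 1 0 0 1 1"    # last tuple is missing a value

# The list-of-floats check cannot see the missing value ...
print(is_float_list(eight_floats))
# ... while the regex accepts complete tuples and rejects the incomplete one.
print(bool(MFVEC3F.match(nine_floats)))
print(bool(MFVEC3F.match(eight_floats)))
```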

X3D Regular Expressions are an important part of X3D Quality Assurance (QA) to maximize the correctness of X3D scene content.


🔖 Design Considerations

General regex design considerations for X3D XML Schema include the following.

  1. Regex constructs that are inconsistently implemented across major programming languages are avoided.
  2. Careful design allows use of regexes that can be adapted for usage in XML, ClassicVRML, JavaScript/JSON, Java and other language environments.
  3. For numeric types, leading sign characters (+ or -) are optionally present.
  4. For numeric types, leading zeroes are not allowed, except for an optional leading zero preceding the decimal point when the significand is only fractional.
  5. A required mantissa (integer or floating point) is represented as 0|[1-9][0-9]* (meaning either a solitary 0 or else a single non-zero digit followed by zero or more additional digits).
  6. The fractional part (to the right of the decimal point) can be represented as [0-9]* (meaning zero or more digits).
  7. The decimal separator is the symbol used to separate the integer part from the fractional part of a decimal number. The period character . is used for all such X3D values, and is escaped as \. when written in a regex.
  8. Scientific notation starts with upper or lower-case E, is optionally positive or negative ([Ee][+-]?[0-9]+)? and can be appended to float expressions. For integer expressions, nonnegative scientific notation ([Ee][+]?[0-9]+)? can be appended.
  9. Regex anchors are implicit and not included in XML Schema and X3DUOM regexes, since these regexes must strictly consume all characters of a value. The anchor characters ^ and $ shown in the patterns on this page are necessary for regex101 engine unit testing so that otherwise-illegal values are not accepted through partial matches (in MF tests).
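
As a minimal sketch of how these considerations combine, the following Python fragment assembles the SFFloat pattern from its components and shows why anchoring matters. The variable names are hypothetical, and Python's re module is only a stand-in for an XML Schema regex engine, which anchors implicitly.

```python
import re

SIGN = r"[+-]?"                    # item 3: optional leading sign
INTEGER = r"(0|[1-9][0-9]*)"       # items 4-5: solitary 0, or no leading zeroes
FRACTION = r"(\.[0-9]*)?"          # items 6-7: escaped decimal point, optional digits
EXPONENT = r"([Ee][+-]?[0-9]+)?"   # item 8: optional scientific notation

# Integer part with optional fraction, or a fraction-only significand such as .5
FLOAT = SIGN + "(" + INTEGER + FRACTION + r"|\.[0-9]+)" + EXPONENT

# Explicit anchors, as used for unit testing of the patterns on this page.
ANCHORED = re.compile(r"^\s*(" + FLOAT + r")\s*$")

for value in ["3.14", "-.5e+3", "5.", "007.5", "1.2.3"]:
    print(value, bool(ANCHORED.match(value)))

# Without anchors, a search succeeds on the legal prefix "1.2" of an illegal value.
print(re.search(FLOAT, "1.2.3").group(0))

# re.fullmatch mimics the implicit anchoring of XML Schema patterns.
print(re.fullmatch(FLOAT, "1.2.3"))
```

The three legal values are accepted, while 007.5 (leading zeroes) and 1.2.3 (two decimal points) are rejected once the whole value must be consumed.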

Pattern components are used repeatedly in this design:

🔖 Negative lookahead and disallowed values

Negative lookahead filters can disqualify attributes that contain illegal values.

This can be a useful construct since some specific values are disallowed in X3D. One such instance is that a zero vector is illegal as the initial axis triplet of an SFRotation.
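
A minimal Python sketch of such a filter follows. It prepends the zero-axis lookahead given in the SFRotation notes on this page to the SFRotation pattern. The name SFROTATION_STRICT is hypothetical, and the combination is an assumption about how a stricter pattern might be built. Note that lookahead is not part of XML Schema regex syntax, so this form only applies in engines that support it, such as Python's re module.

```python
import re

FLOAT = r"[+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?"

# Negative lookahead: fail if the first three values are all unsigned zeroes.
ZERO_AXIS = r"(?!((0|0\.0*|\.0+)\s+){3})"

# Hypothetical stricter SFRotation pattern: lookahead first, then four floats.
SFROTATION_STRICT = re.compile(
    r"^\s*" + ZERO_AXIS + "((" + FLOAT + r")\s+){3}(" + FLOAT + r")\s*$"
)

for value in ["0 0 1 3.14", "0 1 0 0", "0 0 0 3.14", "0.0 0.0 0.0 1"]:
    print(value, bool(SFROTATION_STRICT.match(value)))
```

The first two values have a non-zero axis and are accepted. The last two have a zero axis and are rejected, even though they are well-formed four-tuples of floats.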

TODO: determine whether alternate (stricter and more complex) regexes should also get included in X3DUOM.

🔖 Whitespace Considerations

Intermediate commas are treated as whitespace, but are only allowed between each singleton value. For example, SFVec3f 3-tuple values within an MFVec3f array do not contain comma characters (but may be separated by commas and whitespace). Experience has shown that misplaced commas are a crucial indicator of malformed tuple values in large float arrays.

For XML Schema to enable any inclusion of intermediate commas as whitespace characters in MultiField (MF) array types, native XML Schema datatypes typically cannot be used directly. In order to ensure strict validation, regex patterns must be used instead.
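
The comma rule can be illustrated with a short Python sketch that applies the MFVec3f pattern listed on this page. Python's re module is used here only as a convenient test engine, and the sample values are invented for illustration.

```python
import re

FLOAT = r"[+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?"
MFVEC3F = re.compile(r"^\s*(((" + FLOAT + r")\s+){2}(" + FLOAT + r")\s*,?\s*)*$")

samples = [
    "0 0 0, 1 1 1",    # comma between 3-tuples: allowed
    "0 0 0 , 1 1 1",   # whitespace around the comma: allowed
    "0 0 0 1 1 1",     # commas are optional
    "0, 0, 0",         # commas inside a 3-tuple: rejected
    "0 0 0,, 1 1 1",   # doubled comma: rejected
]

for value in samples:
    print(repr(value), bool(MFVEC3F.match(value)))
```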

Potential whitespace variations in scene content are not an issue if X3D Canonicalization (C14N) has been applied to an X3D scene.


🔖 X3D Patterns

The following regex patterns are used in X3D XML Schema, X3D Unified Object Model (X3DUOM) and other validation tools.

X3D datatype regex patterns
🔖 SFBool and ordered list 🔖 MFBool: each SFBool value can be either true or false.

Note that VRML97 values TRUE and FALSE are capitalized and illegal in X3D.

SFBool matching: single true or false value.

XML Schema:   <xs:restriction base="xs:boolean"/>
Regex pattern:  ^\s*(true|false)\s*$

MFBool matching: ordered list of zero or more SFBool values.

XML Schema:  <xs:list itemType="xs:boolean"/>
Regex pattern:  ^\s*((true|false)\s*,?\s*)*$
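
A brief Python check of the two boolean patterns follows, as a sketch only, with Python's re module standing in for the validating engine. It shows that capitalized VRML97 values fail, and that the numeric forms 1 and 0 (which xs:boolean itself permits) are not accepted by the regex.

```python
import re

SFBOOL = re.compile(r"^\s*(true|false)\s*$")
MFBOOL = re.compile(r"^\s*((true|false)\s*,?\s*)*$")

# Single values: only lower-case true and false pass.
for value in ["true", " false ", "TRUE", "1"]:
    print(repr(value), bool(SFBOOL.match(value)))

# Lists: zero or more values, optionally separated by commas.
for value in ["true false, true", "", "true, maybe"]:
    print(repr(value), bool(MFBOOL.match(value)))
```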
🔖 SFImage and ordered list 🔖 MFImage: each SFImage value describes pixels in a 2D image.

SFImage fields contain three nonnegative integers representing width, height and number of components [0..4] in the image, followed by width×height hexadecimal or integer values representing pixel colors in the image.

SFImage matching: two nonnegative integers, followed by an integer in range [0-4], followed by an array of hexadecimal or integer pixel values.

Regex pattern:  ^\s*([+]?(0|[1-9][0-9]*)([Ee][+]?[0-9]+)?\s+){2}[+]?[0-4](\s+(0x[0-9a-fA-F]{1,16}|[+]?(0|[1-9][0-9]*)([Ee][+]?[0-9]+)?))*\s*$

MFImage matching: ordered list of zero or more SFImage values.

Regex pattern:  ^\s*(([+]?(0|[1-9][0-9]*)([Ee][+]?[0-9]+)?\s+){2}[+]?[0-4](\s+(0x[0-9a-fA-F]{1,16}|[+]?(0|[1-9][0-9]*)([Ee][+]?[0-9]+)?))*\s*,?\s*)*$
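
The SFImage pattern checks the form of each value but not whether the number of pixel values equals width×height. The Python sketch below pairs the regex with a separate count check. The helper has_expected_pixel_count is hypothetical and assumes width and height are written as plain integers without exponents.

```python
import re

# SFImage pattern as listed above, split over two lines for readability.
SFIMAGE = re.compile(
    r"^\s*([+]?(0|[1-9][0-9]*)([Ee][+]?[0-9]+)?\s+){2}[+]?[0-4]"
    r"(\s+(0x[0-9a-fA-F]{1,16}|[+]?(0|[1-9][0-9]*)([Ee][+]?[0-9]+)?))*\s*$"
)


def has_expected_pixel_count(value: str) -> bool:
    """Hypothetical follow-up check: pixel count must equal width times height."""
    tokens = value.split()
    width, height = int(tokens[0]), int(tokens[1])
    return len(tokens) - 3 == width * height


empty_image = "0 0 0"
two_pixels = "2 1 3 0xFF0000 0x00FF00"
bad_components = "2 1 5 0xFF0000 0x00FF00"   # component count 5 is out of range
missing_pixel = "2 1 3 0xFF0000"             # well-formed, but one pixel short

print(bool(SFIMAGE.match(empty_image)), has_expected_pixel_count(empty_image))
print(bool(SFIMAGE.match(two_pixels)), has_expected_pixel_count(two_pixels))
print(bool(SFIMAGE.match(bad_components)))
print(bool(SFIMAGE.match(missing_pixel)), has_expected_pixel_count(missing_pixel))
```

The last value passes the regex yet fails the count check, which is why a second validation step of this kind may be worthwhile.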
🔖 SFInt32 and ordered list 🔖 MFInt32: each SFInt32 value is an unrestricted integer, with optional scientific-notation exponent.

SFInt32 matching: integer value with optional scientific notation.

XML Schema:  <xs:restriction base="xs:integer"/>
Regex pattern:  ^\s*[+-]?(0|[1-9][0-9]*)([Ee][+]?[0-9]+)?\s*$

MFInt32 matching: ordered list of zero or more SFInt32 values.

Regex pattern:  ^\s*([+-]?(0|[1-9][0-9]*)([Ee][+]?[0-9]+)?\s*,?\s*)*$
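
The integer patterns can be exercised with a short Python sketch, again using Python's re module as a stand-in engine. It shows in particular that only nonnegative exponents are accepted, since a negative exponent would no longer denote an integer. The sample values are invented for illustration.

```python
import re

SFINT32 = re.compile(r"^\s*[+-]?(0|[1-9][0-9]*)([Ee][+]?[0-9]+)?\s*$")
MFINT32 = re.compile(r"^\s*([+-]?(0|[1-9][0-9]*)([Ee][+]?[0-9]+)?\s*,?\s*)*$")

# Accepted: signed integers and nonnegative exponents.
# Rejected: negative exponent, decimal point, leading zeroes.
for value in ["-42", "1E+6", "1e-3", "3.0", "007"]:
    print(value, bool(SFINT32.match(value)))

# An index list in the style of a coordIndex array, and a list with a float in it.
for value in ["0 1 2 -1, 2 3 0 -1", "0 1 2.5"]:
    print(value, bool(MFINT32.match(value)))
```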
🔖 SFFloat    and ordered list 🔖 MFFloat:     each SFFloat     value is a  single-precision floating-point number.
🔖 SFDouble and ordered list 🔖 MFDouble: each SFDouble value is a double-precision floating-point number.
🔖 SFTime    and ordered list 🔖 MFTime:     each SFTime     value is a double-precision floating-point number.

SFFloat, SFDouble and SFTime matching: floating-point value with optional scientific notation.

XML Schema:   <xs:restriction base="xs:float"/> and <xs:restriction base="xs:double"/>
Regex pattern:  ^\s*([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*$

MFFloat/MFDouble/MFTime matching: ordered list of zero or more SFFloat/SFDouble/SFTime values.

Regex pattern:  ^\s*(([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*,?\s*)*$
🔖 SFVec2f  and ordered list 🔖 MFVec2f: each SFVec2f  value is a 2-tuple set of  single-precision floating point numbers.
🔖 SFVec2d and ordered list 🔖 MFVec2d: each SFVec2d value is a 2-tuple set of double-precision floating point numbers.

SFVec2f/SFVec2d matching: two-tuple floating-point values with optional scientific notation.

Regex pattern:  ^\s*(([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){1}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*$

MFVec2f/MFVec2d matching: ordered list of zero or more SFVec2f/SFVec2d values.

Regex pattern:  ^\s*((([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){1}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*,?\s*)*$
🔖 SFVec3f  and ordered list 🔖 MFVec3f: each SFVec3f  value is a 3-tuple triplet of  single-precision floating point numbers.
🔖 SFVec3d and ordered list 🔖 MFVec3d: each SFVec3d value is a 3-tuple triplet of double-precision floating point numbers.

SFVec3f/SFVec3d matching: three-tuple floating-point values with optional scientific notation.

Regex pattern:  ^\s*(([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){2}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*$

MFVec3f/MFVec3d matching: ordered list of zero or more SFVec3f/SFVec3d values.

Regex pattern:  ^\s*((([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){2}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*,?\s*)*$
🔖 SFVec4f  and ordered list 🔖 MFVec4f: each SFVec4f  value is a 4-tuple set of  single-precision floating point numbers.
🔖 SFVec4d and ordered list 🔖 MFVec4d: each SFVec4d value is a 4-tuple set of double-precision floating point numbers.

SFVec4f/SFVec4d matching: four-tuple floating-point values with optional scientific notation.

Regex pattern:  ^\s*(([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){3}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*$

MFVec4f/MFVec4d matching: ordered list of zero or more SFVec4f/SFVec4d values.

Regex pattern:  ^\s*((([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){3}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*,?\s*)*$
🔖 SFColor  and ordered list 🔖 MFColor: each SFColor value is a 3-tuple set of  single-precision floating point numbers, each ranging from [0,1] inclusive.

SFColor matching: three-tuple floating-point values with optional scientific notation, bounded.

Regex pattern:  ^\s*(([+]?((0(\.[0-9]*)?|\.[0-9]+)|1(\.0*)?)([Ee][+-]?[0-9]+)?)\s+){2}([+]?((0(\.[0-9]*)?|\.[0-9]+)|1(\.0*)?)([Ee][+-]?[0-9]+)?)\s*$

MFColor matching: ordered list of zero or more SFColor values.

Regex pattern:  ^\s*((([+]?((0(\.[0-9]*)?|\.[0-9]+)|1(\.0*)?)([Ee][+-]?[0-9]+)?)\s+){2}([+]?((0(\.[0-9]*)?|\.[0-9]+)|1(\.0*)?)([Ee][+-]?[0-9]+)?)\s*,?\s*)*$
🔖 SFColorRGBA  and ordered list 🔖 MFColorRGBA: each SFColorRGBA value is a 4-tuple set of  single-precision floating point numbers, each ranging from [0,1] inclusive.

SFColorRGBA matching: four-tuple floating-point values with optional scientific notation, bounded.

Regex pattern:  ^\s*(([+]?((0(\.[0-9]*)?|\.[0-9]+)|1(\.0*)?)([Ee][+-]?[0-9]+)?)\s+){3}([+]?((0(\.[0-9]*)?|\.[0-9]+)|1(\.0*)?)([Ee][+-]?[0-9]+)?)\s*$

MFColorRGBA matching: ordered list of zero or more SFColorRGBA values.

Regex pattern:  ^\s*((([+]?((0(\.[0-9]*)?|\.[0-9]+)|1(\.0*)?)([Ee][+-]?[0-9]+)?)\s+){3}([+]?((0(\.[0-9]*)?|\.[0-9]+)|1(\.0*)?)([Ee][+-]?[0-9]+)?)\s*,?\s*)*$
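
The bounded color component can be tried out with the following Python sketch, which rebuilds the SFColor and SFColorRGBA patterns from a single component. The names COMPONENT, SFCOLOR and SFCOLORRGBA are hypothetical. One observation, shown by the last sample: a pure pattern check does not evaluate the exponent, so a value such as 1e5 still passes.

```python
import re

# One color component: 0, a fraction, or 1 (optionally 1.000...), with optional exponent.
COMPONENT = r"[+]?((0(\.[0-9]*)?|\.[0-9]+)|1(\.0*)?)([Ee][+-]?[0-9]+)?"

SFCOLOR = re.compile(r"^\s*((" + COMPONENT + r")\s+){2}(" + COMPONENT + r")\s*$")
SFCOLORRGBA = re.compile(r"^\s*((" + COMPONENT + r")\s+){3}(" + COMPONENT + r")\s*$")

for value in ["1 0 0.5", "1.0 .25 0", "1.5 0 0", "-0.1 0 0", "2 0 0"]:
    print(value, bool(SFCOLOR.match(value)))

print(bool(SFCOLORRGBA.match("1 1 1 0.5")))

# The exponent is not range-checked by the pattern itself.
print(bool(SFCOLOR.match("1e5 0 0")))
```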
🔖 SFMatrix3f  and ordered list 🔖 MFMatrix3f: each SFMatrix3f  value is a 3x3 nine-tuple set of  single-precision floating point numbers.
🔖 SFMatrix3d and ordered list 🔖 MFMatrix3d: each SFMatrix3d value is a 3x3 nine-tuple set of double-precision floating point numbers.

SFMatrix3f/SFMatrix3d matching: 3x3 nine-tuple floating-point values with optional scientific notation.

Regex pattern:  ^\s*(([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){8}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*$

MFMatrix3f/MFMatrix3d matching: ordered list of zero or more SFMatrix3f/SFMatrix3d values.

Regex pattern:  ^\s*((([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){8}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*,?\s*)*$
🔖 SFMatrix4f  and ordered list 🔖 MFMatrix4f: each SFMatrix4f  value is a 4x4 sixteen-tuple set of  single-precision floating point numbers.
🔖 SFMatrix4d and ordered list 🔖 MFMatrix4d: each SFMatrix4d value is a 4x4 sixteen-tuple set of double-precision floating point numbers.

SFMatrix4f/SFMatrix4d matching: 4x4 sixteen-tuple floating-point values with optional scientific notation.

Regex pattern:  ^\s*(([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){15}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*$

MFMatrix4f/MFMatrix4d matching: ordered list of zero or more SFMatrix4f/SFMatrix4d values.

Regex pattern:  ^\s*((([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){15}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*,?\s*)*$
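
The vector and matrix patterns all share one shape: n-1 floats each followed by whitespace, then a final float. As a sketch, the hypothetical Python helpers sf_pattern and mf_pattern below generate that shape for any tuple size n. They are an illustration of the structure, not the method used to produce the schema.

```python
import re

FLOAT = r"[+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?"


def sf_pattern(n: int) -> str:
    """Single n-tuple: (float whitespace) repeated n-1 times, then one last float."""
    return r"^\s*((" + FLOAT + r")\s+){" + str(n - 1) + "}(" + FLOAT + r")\s*$"


def mf_pattern(n: int) -> str:
    """Zero or more n-tuples, each optionally followed by a comma."""
    return r"^\s*(((" + FLOAT + r")\s+){" + str(n - 1) + "}(" + FLOAT + r")\s*,?\s*)*$"


identity3 = "1 0 0 0 1 0 0 0 1"
identity4 = "1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1"

print(bool(re.match(sf_pattern(9), identity3)))    # nine values fit SFMatrix3f
print(bool(re.match(sf_pattern(16), identity4)))   # sixteen values fit SFMatrix4f
print(bool(re.match(sf_pattern(16), identity3)))   # nine values do not fit SFMatrix4f
print(bool(re.match(mf_pattern(9), identity3 + ", " + identity3)))
```

With n set to 2, 3, 4, 9 and 16 the generated strings correspond to the Vec2, Vec3, Vec4, Matrix3 and Matrix4 patterns listed on this page.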
🔖 SFRotation  and ordered list 🔖 MFRotation: each SFRotation value is a 4-tuple set of  single-precision floating point numbers, representing normalized 3-tuple axis of rotation, followed by angle of rotation in angle base units (default is radians).
  • Note: it is possible to reject an illegal zero axis vector using negative lookahead (?!((0|0\.0*|\.0+)\s+){3})
  • Candidate subtypes: stricter enforcement that magnitudes of axis values are not greater than 1.

SFRotation matching: four-tuple floating-point values with optional scientific notation.

Regex pattern:  ^\s*(([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){3}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*$

MFRotation matching: ordered list of zero or more SFRotation values.

Regex pattern:  ^\s*((([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){3}([+-]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*,?\s*)*$
🔖 SFString  and ordered list 🔖 MFString: each SFString value is an unquoted set of characters, while each MFString consists of quoted SFString values separated by whitespace (including commas).

SFString matching: unquoted string value.

XML Schema:  <xs:restriction base="xs:string"/>
Regex pattern: none.

MFString matching: ordered list of zero or more quoted SFString values.

XML Schema:  <xs:list itemType="xs:string"/>
TODO. Is there a practical regex that can match pairs of unescaped quotation marks?
🔖 Bounding box size (bboxSize): a 3-tuple SFVec3f value with each x,y,z component in nonnegative range [0,+infinity) or else -1 -1 -1.
Default bboxSize='-1 -1 -1' is a sentinel value indicating that no bounding box value has been provided.

bboxSize matching: three-tuple floating-point values with optional scientific notation, bounded non-negative or else -1 -1 -1 sentinel value.

Regex pattern:  ^\s*((([+]?(((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){2}([+]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*)|((\-1(\.(0)*)?([Ee][+-]?[0]+)?\s+){2}\-1(\.(0)*)?([Ee][+-]?[0]+)?)\s*)?$
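
The two branches of the bboxSize pattern (nonnegative triple, or the -1 -1 -1 sentinel) can be checked with this Python sketch. The pattern is copied from above and split across three string literals only for readability. Python's re module is a stand-in engine, and the sample values are invented.

```python
import re

BBOXSIZE = re.compile(
    r"^\s*((([+]?(((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){2}"
    r"([+]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*)"
    r"|((\-1(\.(0)*)?([Ee][+-]?[0]+)?\s+){2}\-1(\.(0)*)?([Ee][+-]?[0]+)?)\s*)?$"
)

samples = [
    "-1 -1 -1",        # sentinel: no bounding box provided
    "-1.0 -1.0 -1.0",  # sentinel written with decimal points
    "2 4.5 0",         # nonnegative sizes
    "-1 2 3",          # mixing the sentinel with real sizes: rejected
    "-2 -2 -2",        # negative sizes other than the sentinel: rejected
]

for value in samples:
    print(repr(value), bool(BBOXSIZE.match(value)))
```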

TODO. Several pattern possibilities for URLs/URNs are conceivable; an SFStringURL type definition and regex pattern is likely useful.

X3D Java Scene Access Interface Library (X3DJSAIL) provides a set of unit tests in org.web3d.x3d.tests.FieldObjectTests to check these regexes against default and alternative values.

A scratch pad of various experimental regexes is found in X3dRegularExpressionTests.txt.


🔖 XML Patterns

X3D Scene Authoring Hints: Validation describes all initial XML headers for X3D.

Note that regex checking of expressions within the X3D XML Schemas is needed, since literal validation of those attribute values gets performed by various authoring tools (such as Altova XMLSpy).

Regex testing for XML-related constructs by X3dDoctypeChecker.java includes the following.

XML and DOCTYPE regex patterns
XML declaration as part of the document prolog:
<\\?xml version=(\"|')1.(0|1)(\"|') encoding=(\"|')UTF-(8|16)(\"|')\\?>
XML declaration, case insensitive:
<\\?xml version=(\"|')1.(0|1)(\"|') encoding=(\"|')(U|u)(T|t)(F|f)-(8|16)(\"|')\\?>
Experimental X3D 4.0 DOCTYPE:
<!DOCTYPE X3D PUBLIC(\\s)+\"ISO//Web3D//DTD X3D 4.0//EN\"(\\s)+\"http://www.web3d.org/specifications/x3d-4.0.dtd\"(\\s)*(>|\\[)
Approved X3D 3.3 DOCTYPE:
<!DOCTYPE X3D PUBLIC(\\s)+\"ISO//Web3D//DTD X3D 3.3//EN\"(\\s)+\"http://www.web3d.org/specifications/x3d-3.3.dtd\"(\\s)*(>|\\[)
X3D 3.2 DOCTYPE:
<!DOCTYPE X3D PUBLIC(\\s)+\"ISO//Web3D//DTD X3D 3.2//EN\"(\\s)+\"http://www.web3d.org/specifications/x3d-3.2.dtd\"(\\s)*(>|\\[)
X3D 3.1 DOCTYPE:
<!DOCTYPE X3D PUBLIC(\\s)+\"ISO//Web3D//DTD X3D 3.1//EN\"(\\s)+\"http://www.web3d.org/specifications/x3d-3.1.dtd\"(\\s)*(>|\\[)
X3D 3.0 DOCTYPE:
<!DOCTYPE X3D PUBLIC(\\s)+\"ISO//Web3D//DTD X3D 3.0//EN\"(\\s)+\"http://www.web3d.org/specifications/x3d-3.0.dtd\"(\\s)*(>|\\[)
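
The doubled backslashes and escaped quotation marks in the patterns above appear to be Java string-literal escapes (an assumption based on the X3dDoctypeChecker.java source file name). The Python sketch below applies the same patterns with those escapes removed. The variable names and the sample header are hypothetical.

```python
import re

# Same patterns as listed above, with the string-literal escaping removed.
XML_DECLARATION = re.compile(
    r"""<\?xml version=("|')1.(0|1)("|') encoding=("|')UTF-(8|16)("|')\?>"""
)
XML_DECLARATION_ANY_CASE = re.compile(
    r"""<\?xml version=("|')1.(0|1)("|') encoding=("|')(U|u)(T|t)(F|f)-(8|16)("|')\?>"""
)
DOCTYPE_33 = re.compile(
    r"""<!DOCTYPE X3D PUBLIC(\s)+"ISO//Web3D//DTD X3D 3.3//EN"(\s)+"""
    r""""http://www.web3d.org/specifications/x3d-3.3.dtd"(\s)*(>|\[)"""
)

# Hypothetical scene header used only for this illustration.
header = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE X3D PUBLIC "ISO//Web3D//DTD X3D 3.3//EN" '
    '"http://www.web3d.org/specifications/x3d-3.3.dtd">\n'
)
lowercase_header = header.replace("UTF-8", "utf-8")

print(bool(XML_DECLARATION.search(header)))
print(bool(DOCTYPE_33.search(header)))
print(bool(XML_DECLARATION.search(lowercase_header)))           # strict form fails
print(bool(XML_DECLARATION_ANY_CASE.search(lowercase_header)))  # relaxed form passes
```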

Error-detection regexes are applied as part of X3D Examples Archives build.xml checking, see target processScenes.regularExpressionChecks for invocation details.

Error detection patterns
Malformed floating-point numbers (multiple decimal points, etc.), property name="regexGarbledFloats"
(\s|,|"|')(((\+|-)?((\.\d+)|(\d+\.\d*))((E|e)(\+|-)?\d+)?(\.|\+|-)+(\d*))|(\d+((\+|-)\d+)+))(\s|,|"|')
Excess leading zeroes, property name="regexLeadingZeroes"
(\s|,|"|')(\+|-)?0\d+(\.\d*)?((E|e)(\+|-)?\d+)?(\s|,|"|')
... which is preceded by negative look-behind checks to avoid flagging software version numbers, property name="regexNegativeLookBehinds"
(?<!address=)(?<!pecification)(?<!ection)(?<!aragraph)(?<!CosmoPlayer)(?<!CAD Exchanger)
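
A rough Python sketch of how these error-detection patterns behave on scene text follows. Two points are assumptions for illustration: the look-behind checks are simply concatenated in front of the leading-zeroes pattern, and the sample strings are invented. The actual invocation is defined in the build.xml target named above.

```python
import re

GARBLED_FLOATS = re.compile(
    r"""(\s|,|"|')(((\+|-)?((\.\d+)|(\d+\.\d*))((E|e)(\+|-)?\d+)?(\.|\+|-)+(\d*))"""
    r"""|(\d+((\+|-)\d+)+))(\s|,|"|')"""
)
LEADING_ZEROES = r"""(\s|,|"|')(\+|-)?0\d+(\.\d*)?((E|e)(\+|-)?\d+)?(\s|,|"|')"""
LOOK_BEHINDS = (
    r"(?<!address=)(?<!pecification)(?<!ection)(?<!aragraph)"
    r"(?<!CosmoPlayer)(?<!CAD Exchanger)"
)
LEADING_ZEROES_FILTERED = re.compile(LOOK_BEHINDS + LEADING_ZEROES)

bad_scene = "<Transform translation='0 007 1.2.3'/>"
good_scene = "<Transform translation='0 0.7 1.25'/>"
prose = "see section 04.2 of the document"

print(GARBLED_FLOATS.search(bad_scene).group(0))           # finds the 1.2.3 value
print(LEADING_ZEROES_FILTERED.search(bad_scene).group(0))  # finds the 007 value
print(GARBLED_FLOATS.search(good_scene))
print(LEADING_ZEROES_FILTERED.search(good_scene))

# The look-behind checks suppress a match that the bare pattern would report.
print(bool(re.search(LEADING_ZEROES, prose)))
print(LEADING_ZEROES_FILTERED.search(prose))
```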

🔖 References

The following references provide useful additional information about X3D, regular expressions and data validation.

  1. Davis, Mark. Unicode Regular Expression Guidelines, 2016.
  2. Friedl, Jeffrey E.F., Mastering Regular Expressions: Understand Your Data and Be More Productive Third Edition, O'Reilly and Associates, Sebastopol California, 2009.
  3. Goyvaerts, Jan and Levithan, Steven, Regular Expressions Cookbook: Detailed Solutions in Eight Programming Languages, second edition, O'Reilly Media, Sebastopol California, 2012.
  4. RegExLib.com regular expression library
  5. Regular-expressions.Info website, including XML Schema Regular Expressions and Runaway Regular Expressions: Catastrophic Backtracking.
  6. E-mail post: Regular expression checking for malformed floating-point numbers and excess leading zeros
  7. Wikipedia: Regular expression article.
  8. Wikipedia: Ken Thompson's construction algorithm article.
  9. X3D field data types are defined in clause 5, Field type reference, of the X3D Abstract Specification.
  10. World Wide Web Consortium (W3C) Recommendation: XML Schema Definition Language (XSD) 1.1 Part 2 Datatypes with sections on Regular expressions, float checking, etc.
  11. X3D Specifications: XML Schema and DOCTYPE Validation include the latest versions of recommended XML Schemas and DOCTYPEs (DTDs) for X3D.
  12. X3D Resources provides links to numerous resources supporting X3D and VRML.
  13. X3D Scene Authoring Hints provide a collection of style guidelines, authoring tips and best practices to improve the quality, consistency and maintainability of X3D Graphics scenes.
  14. X3D Unified Object Model (X3DUOM)
  15. X3D Java Scene Access Interface Library (X3DJSAIL)
  16. X3D Validator is a Web application that checks X3D scene validity.
  17. XML Schema 1.1 Part 2: Datatypes Second Edition, World Wide Web Consortium (W3C) Recommendation. Includes sections on Datatype System, Built-in datatypes and Regular Expressions.

🔖 Tools

Links to tools of interest follow. Additional recommendations are welcome.


🔖 Contact

Questions, suggestions and comments about these resources are welcome. Please send them to Don Brutzman and Roy Walmsley (brutzman at nps.edu, roy.walmsley at ntlworld.com).

Available online at http://www.web3d.org/specifications/X3dRegularExpressions.html

Version control of this document is maintained at
https://sourceforge.net/p/x3d/code/HEAD/tree/www.web3d.org/specifications/X3dRegularExpressions.html

Updated: 24 September 2018