XSD (XML Schema) Best Practices

XML Schema is very important for Xopus since it determines much of how Xopus works with your XML. The better you define what is and isn't possible, the better Xopus can assist you in editing.

Remember: the more you require of your XML, the more Xopus can do for it and for you. The less you define, the harder it is for Xopus to represent your XML well.

Therefore take a look at the following:

  • Re-use your complex-types when defining a schema
  • The default attribute isn't what it seems
  • Don't mix block and inline elements
  • Define your schema as strict as possible
  • Too many choices
  • List-items and Table Data elements
  • Section handling
  • A good idea for a document's XML

Re-use your complex-types when defining a schema

When you define your XML in a schema, it is best to start at the root and define named types for elements, then re-use these types for other elements in your schema. Take a look at the following example:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="0.4">

  <xs:element name="document" type="sType"/>

  <xs:complexType name="sType">
    <xs:sequence>
      <xs:element name="header" type="hType"/>
      <xs:choice>
        <xs:choice maxOccurs="unbounded">
          <xs:element name="paragraph" type="pType"/>
          <xs:element name="orderedlist" type="lType"/>
          <xs:element name="unorderedlist" type="lType"/>
        </xs:choice>
        <xs:element name="section" type="sType"/>
      </xs:choice>
      <xs:element name="section" minOccurs="0"
                  maxOccurs="unbounded" type="sType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:simpleType name="hType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

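  <xs:complexType name="pType" mixed="true">
    <!-- maxOccurs="unbounded" so text and inline elements can alternate freely -->
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <!-- inline elements are typed so that not everything is possible inside them -->
      <xs:element name="strong" type="xs:string"/>
      <xs:element name="italic" type="xs:string"/>
      <xs:element name="br">
        <xs:complexType/>
      </xs:element>
    </xs:choice>
  </xs:complexType>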

  <xs:complexType name="lType">
    <xs:sequence>
      <!-- a list holds one or more list items -->
      <xs:element name="listitem" type="iType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="iType">
    <xs:choice maxOccurs="unbounded">
      <xs:element name="paragraph" type="pType"/>
      <xs:element name="orderedlist" type="lType"/>
      <xs:element name="unorderedlist" type="lType"/>
    </xs:choice>
  </xs:complexType>

</xs:schema>

We recommend the Venetian Blind pattern because it naturally leads to re-using types. For background, see the Guidelines for Schema and the Introduction to Design Patterns in XML Schemas.

If an element has no type, it defaults to xs:anyType, which means that anything is possible inside the element. This doesn't work well at all with Xopus.
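To illustrate the difference, here is a minimal sketch. The element names looseNote and strictNote are hypothetical and are not part of the example schema above.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical: no type given, so the element defaults to xs:anyType
       and may contain any text, child elements and attributes -->
  <xs:element name="looseNote"/>

  <!-- Hypothetical: typed, so only plain text is allowed -->
  <xs:element name="strictNote" type="xs:string"/>

</xs:schema>
```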

The default attribute isn't what it seems

Setting the default attribute on an attribute declaration in the schema means that the default value is used when the attribute is absent from the XML. It is better to make attributes required and to define the allowed values with restrictions and/or enumerations.

The default value is only used within Xopus, in the XSL where the value is mentioned, and in other systems that read the XML through the schema. Schema-unaware applications do not see these default values.
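A required attribute with enumerated values might be declared roughly as follows. This is only a sketch: the names image, imageType, alignType and align are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical enumeration of the allowed values -->
  <xs:simpleType name="alignType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="left"/>
      <xs:enumeration value="center"/>
      <xs:enumeration value="right"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="imageType">
    <!-- use="required" instead of default="left": the value is always
         written into the XML, so schema-unaware tools see it too -->
    <xs:attribute name="align" type="alignType" use="required"/>
  </xs:complexType>

  <xs:element name="image" type="imageType"/>

</xs:schema>
```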

Don't mix block and inline elements

The main rule is not to mix block and inline elements: you cannot put block elements inside inline elements. This is very important for Xopus since it outputs XHTML to draw the WYSIWYG view. This means the following:

Having a paragraph which is surrounded by text is invalid:

text text <p>text</p> text

This is true for all block elements.
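One way to enforce this in a schema is to give block containers element-only content and to allow mixed content only in elements that hold inline children. The sketch below assumes hypothetical names (body, bodyType, pType, strong, italic).

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="body" type="bodyType"/>

  <!-- Block container: element-only content, so no loose text
       can appear next to a paragraph -->
  <xs:complexType name="bodyType">
    <xs:sequence>
      <xs:element name="p" type="pType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Block element: mixed content, but only inline children -->
  <xs:complexType name="pType" mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="strong" type="xs:string"/>
      <xs:element name="italic" type="xs:string"/>
    </xs:choice>
  </xs:complexType>

</xs:schema>
```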

Define your schema as strict as possible

Xopus really benefits from a very strict schema. It allows Xopus to assist the user in the best way possible.

Strictness has to do with what is possible around, in and with an element. If the element can be added everywhere, Xopus will always offer it as a possibility everywhere.

If everything is possible within the element, then the list of options that you get for the element will be extensive.

And if you can do everything with an element, then Xopus will need a lot of time to calculate these options. Xopus does pre-validation: it checks what is possible with an element before showing you the options.

If your XML isn't strict, and can't be, consider defining a stricter subset of it, editing the XML against that subset, and converting it to the less strict schema later.

In order to be stricter about your XML, consider the following things:

  • Use sequences in your schema and not 'all'. An 'all' group with more than eight elements will severely slow down Xopus.
  • When you define the type of an element or attribute, consider carefully what you want from it. Is it an integer, or a non-negative integer? Take a look at the options at: http://www.w3.org/TR/xmlschema-0/#CreatDt
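The two points above could look like this in practice. This is a sketch with hypothetical names (product, productType, name, price, stock), not a schema taken from Xopus.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="product" type="productType"/>

  <xs:complexType name="productType">
    <!-- xs:sequence fixes the order of the children;
         xs:all would allow them in every possible order -->
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="price" type="xs:decimal"/>
      <!-- a stock count can never be negative, so say so in the type -->
      <xs:element name="stock" type="xs:nonNegativeInteger"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```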

Too many choices

Xopus can have problems with schema constructions that allow too many possibilities. Using the attributes minOccurs and maxOccurs in a way other than they were intended can give Xopus so many possibilities that it takes a lot of time to calculate them all. Here is how minOccurs and maxOccurs are best used.

  1. minOccurs and maxOccurs have "1" as default value. Keep your XSD readable by removing minOccurs="1" and maxOccurs="1" attributes.
  2. Using large values usually makes no sense. They can confuse the user and can hurt startup performance. Usual values for minOccurs are "0", "1" and "2", and for maxOccurs only "1" and "unbounded". Other values should be used with care.
  3. Sequences should have neither minOccurs nor maxOccurs. Consider wrapping the sequence in an element and putting the minOccurs and maxOccurs on the element declaration.
  4. Choices should have minOccurs and maxOccurs either on the choice itself or on its children, but not on both. Putting the attributes on both the choice and its children creates an exponential number of possibilities.
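As a sketch of rules 3 and 4, the hypothetical schema below puts the occurrence attributes on the choice only, and repeats a wrapper element instead of a sequence. All element and type names here are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="body" type="bodyType"/>

  <xs:complexType name="bodyType">
    <xs:sequence>
      <!-- Rule 4: occurrence on the choice only;
           its children keep the default of exactly one -->
      <xs:choice maxOccurs="unbounded">
        <xs:element name="paragraph" type="xs:string"/>
        <xs:element name="note" type="xs:string"/>
      </xs:choice>
      <!-- Rule 3: instead of a repeating sequence of term and definition,
           repeat a wrapper element that contains the sequence -->
      <xs:element name="entry" type="entryType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="entryType">
    <xs:sequence>
      <xs:element name="term" type="xs:string"/>
      <xs:element name="definition" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```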

List-items and Table Data elements

A list-item can be defined as something containing another list. This can cause problems. Similarly, a table cell containing another table with more cells can be trouble. In both cases it becomes unclear in Xopus where you can add a new list-item or a new cell.

Define the contents of a list-item or cell in a paragraph. Your XML then looks something like this:

<li><p>contents</p></li>

Since Xopus 3.1.4, pressing Enter once creates a new bullet item in the list, and pressing it twice creates a paragraph after the list. If you are using the model above, pressing Enter once creates a new paragraph in the list-item, pressing it twice creates a new list-item (with a paragraph in it), and pressing it three times takes you out of the list and creates a new paragraph after it.
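The list-item model with paragraphs could be declared along these lines. This is a sketch: the element names ul, li and p follow the snippet above, while the type names listType and itemType are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="ul" type="listType"/>

  <!-- A list holds one or more list items -->
  <xs:complexType name="listType">
    <xs:sequence>
      <xs:element name="li" type="itemType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- A list item holds only paragraphs: no nested list, no loose text -->
  <xs:complexType name="itemType">
    <xs:sequence>
      <xs:element name="p" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```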

Section handling

There are three ways to define a section:

  • A document can have sections and paragraphs. Sections have a title and one or more paragraphs. (Paragraphs are allowed in between sections, which isn't very strict.)
  • A document only has sections, but a section doesn't require a header. (Then why have sections at all? Again, this isn't strict enough: it will be unclear to which section a paragraph belongs.)
  • A document only has sections, and every section always begins with a title. (Simple, clean and strict.)

In the first case the following XML could be created:

<section>
  <title/>
  <paragraph/>
</section>
<paragraph/>

This will confuse the user and Xopus about the second paragraph. Depending on how it is drawn, it will appear as if the second paragraph is in the section, but it isn't. Thus when sections are drawn differently, this paragraph will not be fitted correctly.

In the second case you would have almost the same XML, except that the second paragraph would be in its own section. However, you still wouldn't know to which section the paragraph belongs.

In the third case all paragraphs are always in a section, and every section always has a title. Thus it can always be drawn correctly, and it is the only right choice. It has a strict structure and is therefore easy to edit. You can always leave the title of a section empty, but the element still exists.
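A minimal sketch of the third case, assuming hypothetical type names documentType and sectionType and plain-text paragraphs:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="document" type="documentType"/>

  <!-- A document consists of sections only -->
  <xs:complexType name="documentType">
    <xs:sequence>
      <xs:element name="section" type="sectionType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="sectionType">
    <xs:sequence>
      <!-- The title is required; it may be empty, but the element must exist -->
      <xs:element name="title" type="xs:string"/>
      <!-- Every paragraph lives inside a section -->
      <xs:element name="paragraph" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```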

A good idea for a document's XML

The following is a good setup of a document.

  • A document has a title
  • The title is followed by either one or more blocks (p, div, etc.) or one section
  • This is then followed by zero or more sections

This way there can never be a mix-up about where a paragraph is.

For example:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="0.4">

  <xs:element name="document" type="sType"/>

  <xs:complexType name="sType">
    <xs:sequence>
      <xs:element name="header" type="hType"/>
      <xs:choice>
        <xs:choice maxOccurs="unbounded">
          <xs:element name="paragraph" type="pType"/>
          <xs:element name="orderedlist" type="lType"/>
          <xs:element name="unorderedlist" type="lType"/>
        </xs:choice>
        <xs:element name="section" type="sType"/>
      </xs:choice>
      <xs:element name="section" minOccurs="0" 
                 maxOccurs="unbounded" type="sType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:simpleType name="hType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
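  <xs:complexType name="pType" mixed="true">
    <!-- maxOccurs="unbounded" so text and inline elements can alternate freely -->
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <!-- inline elements are typed so that not everything is possible inside them -->
      <xs:element name="strong" type="xs:string"/>
      <xs:element name="italic" type="xs:string"/>
      <xs:element name="br">
        <xs:complexType/>
      </xs:element>
    </xs:choice>
  </xs:complexType>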

...

</xs:schema>