Schema Standard

Introduction

This document defines a set of common approaches that we expect to be applied to all schema designs. Beyond those things that we just consider to be best-practice in schema design generally, the overriding requirements for OpenActuary schema design are ease of comprehension and is ease of use. It is seen as critical to the success of OpenActuary that anyone who is exposed to the schema can quickly grasp the concepts being represented and can extract meaningful result sets from it, and insert new data,  without the need for overly complex queries.

We are not proposing that these conventions represent best practice for schema design in any context other than OpenActuary, but we think it does meet our specific requirements well.

Principals

  1. Three letter prefixes are applied to all table names.

    For example: Policy would become polPolicy, claim would be claClaim.

    On its own, this principal is not of much value. However, in combination with 2. and 3. it clarifies the structure by clearly defining the prefix used for all data. For example, without further investigation, a user will know that polNumber is the number from the policy table.

  2. All column names carry the table name's prefix.

    For example: Number in the polPolicy table becomes polNumber, Number in the claClaim table becomes claNumber.

    This makes it easier to write queries, as aliases are less frequently needed, and also makes queries shorter and easier to understand as each column in a result set automatically identifies the table it came from.

  3. Table and column names camel format (starting with lower case).

    For example, a table representing the claim handler concept might be clhClaimHandler (not ClhClaimHandler, or clh_claim_handler). Column names follow the same convention.

  4. All tables have and ID column.

    With the exception of ENUMS (more later on them), natural IDs are avoided in favour or system generated IDs.Following the rules above, the ID for the polPolicy table would be polID.

  5. Foreign key names identify the table they point to.

    Foreign keys adopt the convention of always ending with ID<table prefix>. A foreign key in the claClaim referring to polPolicy would therefore be named claIDpol.

  6. Full words are preferred in column names.

    It is quite common to find TLAs and vowel-dropping or short-name techniques used in database design. It is our observation that this leads to inconsistency, which makes queries harder to write/understand, and misunderstandings which impact the comprehensibility of the schema. Therefore, for example, the concept of expiry date in a policy would be represented as polExpiryDate, not polExpDt or polExDate etc.

    When a column name does become long to the point of being unwieldy, the first suggestion is to take a step back and look at the schema as a whole. An unwieldy column name can be an indication that the table structure itself is at fault. Remove repetition in the name, polPolicyNumberOutwardsInsurance could be shortened to polNumberOutwards for example.

Enumerations

The topic of enumerations (or codes) is worthy of a section in its own right. There are a number of common approaches used in database design to handle the concept of enumerations. Each has advantages, but we think that the approach outlined here best fits our requirements for ease of use and comprehensibility.

Approaches considered

  • Using the ENUM column types. Many databases (MySQL included) support the column type of ENUM. This allows the list of enumeration values to be defined as part of the table definition. The advantage of this approach is that the values themselves are used directly in queries (without joins), and appear directly in result sets. However, they work less well in our context because whilst some enumeration lists are very unlikely to ever change (take Gender (Male,Female) as an example!) others, like currency and country, change quite frequently. As OpenActuary expects to hold historical information it is important that we can easily identify which values for a given enumeration were valid for which dates. The ENUM column type has no support for additional constraints like validTo and validFrom dates.

  • Classification Scheme/Classification. This is another common approach where two tables are used to represent the enumeration types (classification scheme) and the enumeration values (classification). Classification has a foreign key to classification scheme to tie each value to the scheme to which it belongs. The advantage of this approach is that it uses just two tables to represent all enumeration types and values and that those values are extensible without schema modification. It has disadvantages too though: It isn't "type safe" - i.e. there is nothing to prevent the polPolicy table's polCover column being populated with "Male" from the Gender enumeration when it should only be possible to populate it with a value from the CoverType enumeration. It also requires the use of joins and aliases to extract meaningful results (as opposed to IDs).

Approach adopted

  • Each enumeration type is defined by its own table
    Each row of an enum table defines a valid value for the enum. The standard columns are: name, validFrom and validTo.

  • The name column will be the primary key.

  • Each enumeration table will carry a prefix (exactly as other tables do), the prefix is followed by the letters EN to identify the table as an enumeration, this is followed by the name. e.g. makENMake, curENCurrency.

  • The aim of this is to clearly partition enumeration tables from data tables. In the absence of this convention the schema would appear far more complex than it is.

The advantages of this approach are that it is type safe (like the ENUM column type), and is also capable of supporting validTo and validFrom dates (like Classification Scheme/Classification). As the primary key is the char value of the enumeration queries are simpler to write and result sets are easier to understand without the need for joins or aliases (like ENUM).

Sample Scheme

The following example schema and data extract outlines all the main features of these standards.

Sample standing data for enumCover table

Name Valid from Valid to
THIRD_PARTY NULL NULL
THIRD_PARTY_FIRE_AND_THEFT NULL NULL
COMPREHENSIVE NULL NULL

Sample standing data for enumBodyType table

Name Valid from Valid to
CONVERTIBLE NULL NULL
HATCHBACK NULL NULL
ESTATE NULL NULL
SALOON NULL NULL
COUPE NULL NULL

Sample standing data for enumMake table

Name makeID
FORD 1
FIAT 2

Sample standing data for enumModel table

Name makeID
FIESTA 1
MONDEO 2
FOCUS 3
KUGA 4
MODEL-T 5
DOBLO 6
PANDA 7
BRAVO 8
500C 9

Sample standing data for cascadeMakeModel table

makeID modelID Valid from Valid to
1 1 1976 NULL
1 2 1993 NULL
1 3 1998 NULL
1 4 2008 NULL
1 5 1908 1927
2 6 2001 NULL
2 7 2003 NULL
2 8 2007 NULL
2 9 2009 NULL
1 Attachment
1303 Views
Average (0 Votes)
Comments
No comments yet. Be the first.