Architecture
The architecture for CST is described in two parts. The first part describes the general themes and features which are supported by the architecture. The design issues it raises have greatly influenced the design of the code base, which is described in the second part.
- General Design Themes and Features
  - Main Business Concepts
  - Separate administrator and data curation tools share the same data repository
  - Common three-tier architecture is used to separate concerns about presentation, business concepts and persistent storage
  - The tools can interact with either an in-memory demonstration database or a MySQL production database
  - An audit trail of changes made to activity data is maintained
  - Tools support plugins
  - The entire tool suite is generated from properties described in a configuration file
  - Tool suite supports complete data migration
General Design Themes and Features
Main Business Concepts
At the heart of the architecture are two main business concepts: a TrialSubjectModel and a TrialActivityModel. A TrialSubjectModel represents information about a trial subject and comprises two types of data fields:
- a primary key field which can be used to uniquely identify a trial subject (example: name="serial_number", value="456M03")
- zero or more filter fields which can be used to select subsets of trial subjects (example: name="location", value="Manchester")
Each TrialSubjectModel is associated with at least one TrialActivityModel, which represents some kind of activity applied to a trial subject. A TrialActivityModel comprises a primary key field and a linear sequence of steps, each of which is represented by a date field. The primary key field uniquely identifies a trial subject and has the same value as the primary key field of the associated TrialSubjectModel.
The TrialActivityModel instances can be configured so that the sequence of steps can include blank field values between dated ones. The activity objects can optionally enforce a non-descending chronological ordering of date field values. Each instance is also associated with a comments field, which holds free-text messages that may explain a trial subject’s state of progress.
The system is designed to support data curation of TrialActivityModel records. The major purpose of holding TrialSubjectModel records is to let data curators create filtered views of activity data for groups of trial subjects.
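The two concepts can be sketched roughly as follows. This is a minimal illustration rather than the CST code: the attribute names, the two configuration flags and the step names in the usage example are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class TrialSubjectModel:
    """A trial subject: one primary key plus zero or more filter fields."""
    primary_key: str
    filter_fields: Dict[str, str] = field(default_factory=dict)


@dataclass
class TrialActivityModel:
    """An activity applied to one trial subject, as an ordered list of dated steps."""
    primary_key: str                    # same value as the subject's primary key
    step_names: List[str]
    step_dates: List[Optional[date]]    # None marks a blank step
    comments: str = ""
    allow_blanks_between_dates: bool = True     # hypothetical configuration flag
    enforce_chronological_order: bool = True    # hypothetical configuration flag

    def validate(self) -> List[str]:
        """Return error messages; an empty list means the record is valid."""
        errors = []
        gap_open = False    # True once a blank step follows a dated one
        previous = None     # most recent non-blank date seen so far
        for name, value in zip(self.step_names, self.step_dates):
            if value is None:
                if previous is not None:
                    gap_open = True
                continue
            if gap_open and not self.allow_blanks_between_dates:
                errors.append(f"step '{name}' follows a blank step")
                gap_open = False
            if (self.enforce_chronological_order
                    and previous is not None and value < previous):
                errors.append(f"step '{name}' is dated earlier than a preceding step")
            previous = value
        return errors


subject = TrialSubjectModel("456M03", {"location": "Manchester"})
activity = TrialActivityModel(
    primary_key=subject.primary_key,
    step_names=["consent", "baseline", "follow_up"],
    step_dates=[date(2010, 3, 1), None, date(2010, 2, 1)],
)
problems = activity.validate()
```

Here `problems` reports the out-of-order `follow_up` date; the blank `baseline` step is accepted because blanks between dates are allowed by default in this sketch.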
Separate administrator and data curation tools share the same data repository
CST assigns separate user roles to manage trial subjects and trial activities. Administrators are responsible for creating the data repository and populating it with trial subject records. Typically trial subject data are loaded once at the start of a project. Administrators can use the data import feature of the Administrator Tool to overwrite or add new trial subject records to the data repository. However, the system does not include facilities to let administrators curate fields in each record. Administrators will also be provided with features which help synchronise the definitions of trial subjects and trial activities with corresponding table definitions in the database (more on this later).
Data curators use the Logging Tool to fill in dates when a given trial subject completes steps in one or more trial activities. They can use the filter menu of the tool to view specific sets of trial subject records. The menu is partly populated using the unique values found in the filter fields of TrialSubjectModel records. Curators may also use the Logging Tool to produce activity graphs which indicate the number of trial subjects which have completed given activity steps.
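As a rough illustration of how filter menu entries and activity graph counts could be derived from the records, consider the sketch below. The record layout (plain dictionaries), the function names and the sample values are assumptions made for the example and are not taken from the CST code base.

```python
from datetime import date

# Hypothetical sample records; field and step names are illustrative only.
subjects = [
    {"serial_number": "456M03", "location": "Manchester"},
    {"serial_number": "457M01", "location": "Leeds"},
    {"serial_number": "458F02", "location": "Manchester"},
]
activities = {
    "456M03": {"consent": date(2010, 3, 1), "baseline": date(2010, 3, 8)},
    "457M01": {"consent": date(2010, 3, 2), "baseline": None},
    "458F02": {"consent": None, "baseline": None},
}


def filter_options(subject_records, field_name):
    """Unique, sorted values of one filter field, as menu entries."""
    return sorted({r[field_name] for r in subject_records if r.get(field_name)})


def select_subjects(subject_records, field_name, value):
    """Primary keys of the subjects whose filter field matches the chosen value."""
    return [r["serial_number"] for r in subject_records if r.get(field_name) == value]


def completed_step_counts(activity_records, primary_keys, step_names):
    """Number of selected subjects with a date recorded for each step."""
    counts = {name: 0 for name in step_names}
    for key in primary_keys:
        for name in step_names:
            if activity_records[key].get(name) is not None:
                counts[name] += 1
    return counts


menu = filter_options(subjects, "location")
manchester = select_subjects(subjects, "location", "Manchester")
graph_data = completed_step_counts(activities, manchester, ["consent", "baseline"])
```

The counts in `graph_data` are the kind of figures an activity graph would plot for the filtered group.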
Common three-tier architecture is used to separate concerns about presentation, business concepts and persistent storage
CST uses a common three-tiered architecture which helps the data entry forms retain ignorance about how trial activity data are managed. Code for the applications is organised into three well-separated layers: a presentation layer, a business concept layer and a data persistence layer. The presentation layer contains code which creates the user interface. The UI code transfers data back and forth between electronic forms and instances of data container classes defined in the business concept layer. The presentation layer is not responsible for any form of validation and is not aware of how data are serialised. The business concept layer contains classes which correspond to domain concepts that project members can understand. Each class is responsible for four activities:
- Setting and getting field values
- Validating individual field values and combinations of field values within the same instance
- Detecting field changes between old and revised instances of the same record
- Cloning an instance
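The responsibilities listed above might look like this in a business concept class. This is a minimal sketch under assumptions: `ActivityRecord`, its method names and the change description format are hypothetical, and validation is omitted for brevity.

```python
import copy
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


def _show(value: Optional[date]) -> str:
    """Render a step value for a change description."""
    return "blank" if value is None else value.isoformat()


@dataclass
class ActivityRecord:
    """Hypothetical stand-in for a business concept class."""
    primary_key: str
    steps: Dict[str, Optional[date]] = field(default_factory=dict)
    comments: str = ""

    def set_step(self, name: str, value: Optional[date]) -> None:
        if name not in self.steps:
            raise KeyError(f"unknown step: {name}")
        self.steps[name] = value

    def get_step(self, name: str) -> Optional[date]:
        return self.steps[name]

    def clone(self) -> "ActivityRecord":
        """Return an independent copy that can be edited safely."""
        return copy.deepcopy(self)

    def detect_changes(self, revised: "ActivityRecord") -> List[str]:
        """Describe every field that differs between self (old) and revised."""
        changes = []
        for name, old in self.steps.items():
            new = revised.steps.get(name)
            if old != new:
                changes.append(f"{name}: {_show(old)} -> {_show(new)}")
        if self.comments != revised.comments:
            changes.append("comments changed")
        return changes


original = ActivityRecord("456M03", {"consent": date(2010, 3, 1), "baseline": None})
revised = original.clone()
revised.set_step("baseline", date(2010, 3, 8))
changes = original.detect_changes(revised)
```

Because `clone` makes a deep copy, editing `revised` leaves `original` untouched, which is what makes the comparison meaningful.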
The data persistence layer contains code responsible for holding trial subject records and for saving and retrieving trial activity records in the data repository. The layer will contain all the SQL code used to manage data in the database. It will also contain validation code that checks for duplicates and non-existent records.
The business and persistence layers support operations that do not require human intervention. They can be exercised automatically either by other client applications or through automated test suites. The two layers are grouped together and advertised to the presentation layer via an abstract interface. When users initiate an action, code in the forms calls methods in the API using parameters expressed in terms of instances of classes defined in the business concept layer. The presentation layer only has knowledge of the signatures of API calls; it does not know anything of how the method calls are implemented.
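The separation described here can be sketched with an abstract base class. The names `ActivityRepository`, `LoggingForm` and `DictBackedRepository` are hypothetical, and plain dictionaries stand in for instances of the business concept classes to keep the example short.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional

Record = Dict[str, Optional[str]]   # stand-in for a business concept instance


class ActivityRepository(ABC):
    """Hypothetical API advertised to the presentation layer."""

    @abstractmethod
    def get_activity(self, activity_name: str, primary_key: str) -> Record:
        ...

    @abstractmethod
    def update_activity(self, user: str, activity_name: str,
                        primary_key: str, revised: Record) -> None:
        ...


class LoggingForm:
    """Presentation-layer code: it sees only the abstract method signatures."""

    def __init__(self, repository: ActivityRepository):
        self._repository = repository

    def on_save_clicked(self, user, activity_name, primary_key, step, value):
        record = self._repository.get_activity(activity_name, primary_key)
        record[step] = value
        self._repository.update_activity(user, activity_name, primary_key, record)


class DictBackedRepository(ActivityRepository):
    """Trivial implementation used only to make the sketch runnable."""

    def __init__(self):
        self._records = {}

    def get_activity(self, activity_name, primary_key):
        return dict(self._records.get((activity_name, primary_key), {}))

    def update_activity(self, user, activity_name, primary_key, revised):
        self._records[(activity_name, primary_key)] = dict(revised)


repo = DictBackedRepository()
form = LoggingForm(repo)
form.on_save_clicked("curator1", "blood_test", "456M03", "sample_taken", "2010-03-01")
saved = repo.get_activity("blood_test", "456M03")
```

Swapping `DictBackedRepository` for another implementation requires no change to `LoggingForm`, which is the point of routing calls through the abstract interface.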
The tools can interact with either an in-memory demonstration database or a MySQL production database
The tools’ reliance on an abstract interface rather than a concrete class for the repository allows the design to support at least two implementations of a data repository. The first is an in-memory database that uses instances of data container classes defined in the business concept layer. The ability for TrialSubjectModel and TrialActivityModel instances to clone themselves allows the in-memory database to simulate notions of original and copied records that are supported in a repository that uses persistent storage. The ability for the business classes to detect changes between original and copied instances allows the database to record an audit trail of changes made to the records.
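A minimal sketch of the idea follows, assuming a hypothetical `InMemoryActivityStore` and using dictionaries in place of the model classes. The store hands out copies, keeps the originals, and compares the two when an update arrives.

```python
import copy


class InMemoryActivityStore:
    """Hypothetical in-memory repository for one activity."""

    def __init__(self):
        self._records = {}      # primary key -> {step name: date string or None}
        self.audit_trail = []   # (user, primary key, step, old value, new value)

    def add(self, primary_key, steps):
        if primary_key in self._records:
            raise ValueError(f"duplicate record: {primary_key}")
        self._records[primary_key] = dict(steps)

    def get(self, primary_key):
        # Hand out a copy so that edits never touch the stored original.
        return copy.deepcopy(self._records[primary_key])

    def update(self, user, primary_key, revised):
        original = self._records[primary_key]
        for step, old in original.items():
            new = revised.get(step)
            if new != old:
                self.audit_trail.append((user, primary_key, step, old, new))
        self._records[primary_key] = copy.deepcopy(revised)


store = InMemoryActivityStore()
store.add("456M03", {"consent": "2010-03-01", "baseline": None})

working_copy = store.get("456M03")
working_copy["baseline"] = "2010-03-08"
untouched = store.get("456M03")["baseline"]   # still None before the update

store.update("curator1", "456M03", working_copy)
```

After the update, `store.audit_trail` holds one entry describing the change to `baseline`.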
The in-memory database is used for demonstration and testing purposes. When it is used for demonstrations, it allows CST to run without requiring MySQL to be installed on a client machine. The code base includes classes which can generate dummy record values that help administrators and data curators appreciate what a production version would do. The use of dummy data fosters rapid prototyping of the configuration file which contains the definitions of trial subjects and trial activities (see data persistence layer). Dummy data also allow projects to demonstrate the tools without worrying about compromising security interests.
Second, the in-memory database is used in testing. The implementation of the in-memory database is assumed to be much simpler than the implementation of a live relational database. Most test cases exercise the API, so when test suites are run against the in-memory and relational versions of the data repository, differences in outcomes can help identify the source of errors quickly.
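The testing approach can be illustrated by running one small suite against two implementations and comparing the outcomes. This is only a sketch: the class and method names are hypothetical, and SQLite (from the Python standard library) stands in for the MySQL production database so that the example is self-contained.

```python
import sqlite3


class InMemoryRepo:
    """Simple reference implementation."""

    def __init__(self):
        self._rows = {}

    def add_subject(self, primary_key, location):
        if primary_key in self._rows:
            raise ValueError("duplicate subject")
        self._rows[primary_key] = location

    def find_by_location(self, location):
        return sorted(pk for pk, loc in self._rows.items() if loc == location)


class SqlRepo:
    """Relational implementation; SQLite is an assumption standing in for MySQL."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE trial_subject (serial_number TEXT PRIMARY KEY, location TEXT)")

    def add_subject(self, primary_key, location):
        try:
            self._conn.execute(
                "INSERT INTO trial_subject VALUES (?, ?)", (primary_key, location))
        except sqlite3.IntegrityError as exc:
            raise ValueError("duplicate subject") from exc

    def find_by_location(self, location):
        rows = self._conn.execute(
            "SELECT serial_number FROM trial_subject "
            "WHERE location = ? ORDER BY serial_number", (location,))
        return [row[0] for row in rows]


def run_suite(repo):
    """Exercise the shared API and record the outcome of each check."""
    repo.add_subject("456M03", "Manchester")
    repo.add_subject("457M01", "Leeds")
    repo.add_subject("458F02", "Manchester")
    results = {"filter": repo.find_by_location("Manchester")}
    try:
        repo.add_subject("456M03", "Leeds")
        results["duplicate_rejected"] = False
    except ValueError:
        results["duplicate_rejected"] = True
    return results


outcomes = {type(r).__name__: run_suite(r) for r in (InMemoryRepo(), SqlRepo())}
mismatch = outcomes["InMemoryRepo"] != outcomes["SqlRepo"]
```

If `mismatch` were true, the simpler in-memory implementation would be the natural reference for locating the fault in the relational one.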
The production database is used to hold real data. The implementation contains all the SQL queries in the code base and addresses other issues such as concurrent usage and optimisations to help retrieve and filter records.
An audit trail of changes made to activity data is maintained
The TrialActivityModel class defined in the business concept layer has the ability to detect field value changes between old and revised instances of the same record. Changes are logged in the data repository and each entry includes the following fields:
- User who made the change
- Date the change was made
- A description of the change made
- The primary key identifier of the trial subject
- The name of the activity related to the change
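An audit trail entry with these five fields could be modelled as below. The class name `ChangeEntry`, the helper `build_change_entries` and the description format are assumptions made for this sketch; dates are held as ISO strings for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEntry:
    """One audit trail entry; field names are hypothetical."""
    user: str
    changed_on: datetime
    description: str
    subject_primary_key: str
    activity_name: str


def build_change_entries(user: str, activity_name: str, primary_key: str,
                         old_steps: Dict[str, Optional[str]],
                         new_steps: Dict[str, Optional[str]],
                         now: datetime) -> List[ChangeEntry]:
    """Create one entry per step whose value differs between old and new."""
    entries = []
    for step, old in old_steps.items():
        new = new_steps.get(step)
        if new != old:
            description = f"{step}: {old or 'blank'} -> {new or 'blank'}"
            entries.append(ChangeEntry(user, now, description, primary_key, activity_name))
    return entries


entries = build_change_entries(
    user="curator1",
    activity_name="blood_test",
    primary_key="456M03",
    old_steps={"sample_taken": "2010-03-01", "sample_analysed": None},
    new_steps={"sample_taken": "2010-03-01", "sample_analysed": "2010-03-05"},
    now=datetime(2010, 3, 5, 14, 30),
)
```

Passing the timestamp in as `now` rather than reading the clock inside the function keeps the helper easy to test.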
Tools support plugins
One of the major goals of the design was to support a minimal feature set in a code base which could be maintained by a single developer. In an effort to support future features and customisations, the architecture was made to accommodate plugins. So far, the following types of plugins are envisioned for the Logging Tool:
- Plugins which produce different types of progress graphs
- Plugins which can retrieve additional information about a trial subject using the primary key identifier
- Plugins which can import and export activity data.
As yet, no plugins are envisioned for the Administration Tool.
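One possible shape for such a plugin mechanism is a small registry keyed by plugin name. Everything in the sketch below is hypothetical, including the base class, the `run(context)` method and the text-based graph plugin; the source does not describe the actual plugin API.

```python
from abc import ABC, abstractmethod


class LoggingToolPlugin(ABC):
    """Hypothetical base class that every Logging Tool plugin would extend."""
    name = "unnamed"

    @abstractmethod
    def run(self, context):
        ...


class StepCountGraphPlugin(LoggingToolPlugin):
    """Example progress graph plugin that renders a text bar chart."""
    name = "step_count_graph"

    def run(self, context):
        return [f"{step} {'#' * count}"
                for step, count in context["step_counts"].items()]


class PluginRegistry:
    """Keeps track of the plugins made available to the tool."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        if plugin.name in self._plugins:
            raise ValueError(f"plugin already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def names(self):
        return sorted(self._plugins)

    def run(self, name, context):
        return self._plugins[name].run(context)


registry = PluginRegistry()
registry.register(StepCountGraphPlugin())
chart = registry.run("step_count_graph",
                     {"step_counts": {"consent": 3, "baseline": 1}})
```

The core tool only needs to know the registry and the base class, so new plugin types can be added without touching existing code.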
The entire tool suite is generated from properties described in a configuration file
The kinds of trial subjects and trial activities supported by CST will vary from project to project. One project may be tracking mice while another is tracking people. One activity may be responding to a drug regime while another may involve examining someone’s cardiac responses. The architecture attaches no special semantic significance to a trial subject filter field or to a trial activity step. Definitions of a subject and of activities are specified in an XML-based configuration file. CST reads the configuration file and uses it to generate the Administration Tool, the Logging Tool and the data repository they share. This design feature is the most important means of making the software re-usable across multiple projects.
Many model-driven applications rely on models expressed in UML or XML Schemas. These formalisms for expressing properties in generated applications were avoided for two main reasons:
- They are too expressive relative to the simple feature set that will be supported in the generated tools.
- They often require specialised skill sets on projects in order to build the models.
Instead, CST borrows a model-driven approach similar to Microsoft’s Software Factory approach. The XML configuration file contains a limited set of tags which describe the business concepts needed to generate a specific product family. The tags may be regarded as a kind of domain-specific language used to describe trial activities and the trial subjects to which they apply.
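To give a feel for the approach, the sketch below reads a small configuration and derives a table definition from it. The tag and attribute names (`trial`, `subject`, `filter`, `activity`, `step`), the column types and the generated SQL are all invented for illustration; the actual CST configuration format is not shown in this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; these tags are NOT the real CST vocabulary.
CONFIG = """
<trial>
  <subject primary_key="serial_number">
    <filter name="location"/>
    <filter name="sex"/>
  </subject>
  <activity name="blood_test">
    <step name="sample_taken"/>
    <step name="sample_analysed"/>
  </activity>
</trial>
"""


def load_definitions(xml_text):
    """Parse subject and activity definitions from the configuration text."""
    root = ET.fromstring(xml_text.strip())
    subject = root.find("subject")
    return {
        "primary_key": subject.get("primary_key"),
        "filters": [f.get("name") for f in subject.findall("filter")],
        "activities": {
            a.get("name"): [s.get("name") for s in a.findall("step")]
            for a in root.findall("activity")
        },
    }


def activity_table_ddl(primary_key, activity_name, steps):
    """Build a CREATE TABLE statement for one activity (illustrative types)."""
    columns = [f"{primary_key} VARCHAR(64) PRIMARY KEY"]
    columns += [f"{step} DATE" for step in steps]
    columns.append("comments TEXT")
    return f"CREATE TABLE {activity_name} (" + ", ".join(columns) + ")"


definitions = load_definitions(CONFIG)
ddl = activity_table_ddl(
    definitions["primary_key"], "blood_test",
    definitions["activities"]["blood_test"])
```

The same parsed definitions could equally drive form generation, which is how one configuration file can yield both tools and the repository they share.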
Tool suite supports complete data migration
The limited scope of data curation and the simplicity of the schema for the MySQL production database allow all data in the repository to be exported to tab-delimited files. The Administrator Tool is able to export or import all data on trial subjects using a single tab-delimited table with columns for primary key identifier and filter attributes. The system stores no other data about trial subjects and the system design assumes that users will maintain data about trial subjects through some means independent of the tool suite.
There are two other kinds of data maintained by CST: activity data and an audit trail of changes made for each activity for each trial subject.
The Logging Tool supports facilities for importing and exporting data for a specific activity using a single tab-delimited table with a column for primary key identifier and columns for each activity step. Because the system assumes that activities are independent of one another, the exported text files will contain no foreign key references other than identifiers for the trial subjects.
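A round trip through such a tab-delimited table can be sketched with the standard `csv` module. The function names, the in-memory record layout and the convention that a blank step is written as an empty cell are assumptions for this example.

```python
import csv
import io


def export_activity(records, primary_key_name, step_names):
    """Write one activity as tab-delimited text: key column, then one column per step."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([primary_key_name] + list(step_names))
    for key in sorted(records):
        # Blank steps (None) are exported as empty cells.
        writer.writerow([key] + [records[key].get(step) or "" for step in step_names])
    return buffer.getvalue()


def import_activity(text, primary_key_name):
    """Read the tab-delimited text back into {primary key: {step: value or None}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    records = {}
    for row in reader:
        key = row.pop(primary_key_name)
        records[key] = {step: (value or None) for step, value in row.items()}
    return records


activity = {
    "456M03": {"consent": "2010-03-01", "baseline": None},
    "457M01": {"consent": "2010-03-02", "baseline": "2010-03-09"},
}
exported = export_activity(activity, "serial_number", ["consent", "baseline"])
restored = import_activity(exported, "serial_number")
```

The only reference to other data in the file is the trial subject identifier in the first column, so each activity can be exported and re-imported on its own.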
The audit trail is maintained as a collection of change request objects which are serialised in a single table in the MySQL database. In future it will be easy to develop a feature which exports the change log to tab-delimited or XML-based text files.
The only kind of data that the system cannot import is an audit trail of changes. Importing this kind of data would require that another application maintain records with the same provenance fields as CST. In theory a feature could be developed to import an audit trail, but in practice the feature would rarely be used.
Author: Kevin Garwood
(c)2010 Medical Research Council. Licensed under Apache 2.0.