Important documentation for SOA: the Interface Catalogue and Data Dictionary

Sunday, June 16th, 2013

Interface Catalogue

A few months ago I blogged about our initiation of an Interface Catalogue here at the University of Bristol. Each row in our interface catalogue represents a single integration, a join, between two different IT systems. Despite our having established an operational datastore (called the Datahub) some years ago, we have a proliferation of point-to-point system integrations at Bristol. This is mainly because the Datahub is currently only updated nightly, whereas many of our integrations need to exchange data in real time or near real time. We’d like to significantly reduce the proportion of point-to-point integrations and mature our integration architecture, hence the need to analyse the current integration architecture: the Interface Catalogue is giving us good “As Is” documentation to this end.
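As a rough illustration of the kind of “As Is” analysis the catalogue enables, here is a minimal Python sketch that estimates what share of catalogued interfaces are point-to-point. It assumes, purely for illustration, that an interface counts as point-to-point unless the Datahub is its source or destination. The `Interface` record, the `DATAHUB` codename and the other system names are hypothetical placeholders, not our actual data.

```python
from dataclasses import dataclass

HUB = "DATAHUB"  # hypothetical codename for the Datahub operational datastore


@dataclass(frozen=True)
class Interface:
    """One catalogue row, reduced to the two systems it joins."""
    source: str
    destination: str


def point_to_point_share(interfaces: list[Interface], hub: str = HUB) -> float:
    """Return the fraction of interfaces that bypass the hub entirely."""
    if not interfaces:
        return 0.0
    direct = [i for i in interfaces if hub not in (i.source, i.destination)]
    return len(direct) / len(interfaces)


# Illustrative rows only; the system names are placeholders.
catalogue = [
    Interface("FINANCE", "RIS"),
    Interface("HR", "DATAHUB"),
    Interface("DATAHUB", "SITS"),
    Interface("SITS", "FINANCE"),
]
print(f"{point_to_point_share(catalogue):.0%} point-to-point")  # 50% point-to-point
```

Running the same calculation periodically would show whether the proportion is actually falling as integrations are moved onto the hub.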

We have implemented it in an Oracle database so far and are building a user-friendly Web frontend for maintaining entries in the catalogue over time. The schema is shown below:

Interface Catalogue Schema Diagram

Tables in the Interface Catalogue


The interfaces table is populated so that each row represents an interface between two systems. We use unique codenames for all our IT systems, so the services table provides a lookup for those system names. Most interfaces transform source system data into the format required by the destination system. For example, we might use PL/SQL to extract data about research grants from our Finance system’s Oracle database and present it as a database view (an SQL result set) for our Research Information System to access. In this example, then, we would describe our data exchange mechanism as having a transformation type of PL/SQL and a data exchange format of SQL result set. Some more complex integrations transform the data repeatedly; in that case we link an ordered set of data exchange mechanisms (each having a transformation type and a data exchange format) to the interface. Note that the Source Object is the data object being extracted from the source system; in this example, research grant data objects.

In an attempt to converge on a controlled vocabulary for developers, we have lookup tables of values for transformation types, data exchange formats and data objects. In the case of data objects, we aim eventually to sync that table with the data dictionary, which is intended to become a complete set of documentation about the data in our systems; this is something we are only just embarking on, and I describe it further below. The users and groups tables support an access control policy governing who is allowed to edit what in the interface catalogue. We have currently made the destination system owners responsible for maintaining entries, mainly because they are typically the people most likely to know specific information about the data objects they obtain from the source system, and they can add an explanation of the business purpose of the interface.
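To make the table descriptions above more concrete, here is a minimal sketch of a comparable schema, using Python’s built-in sqlite3 module as a stand-in for Oracle. It is an approximation, not our actual DDL: the table and column names are my own assumptions, the users and groups tables are collapsed into a single hypothetical `group_members` table, and the `FINANCE`/`RIS` codenames and group names are invented. It loads the research grant example and checks the destination-owner editing policy.

```python
import sqlite3

# Hypothetical approximation of the Interface Catalogue schema (SQLite, not Oracle).
SCHEMA = """
CREATE TABLE services (
    code        TEXT PRIMARY KEY,   -- unique system codename
    owner_group TEXT NOT NULL       -- group that owns the system
);
CREATE TABLE transformation_types (name TEXT PRIMARY KEY);
CREATE TABLE exchange_formats     (name TEXT PRIMARY KEY);
CREATE TABLE data_objects         (name TEXT PRIMARY KEY);
CREATE TABLE interfaces (
    id               INTEGER PRIMARY KEY,
    source           TEXT NOT NULL REFERENCES services(code),
    destination      TEXT NOT NULL REFERENCES services(code),
    source_object    TEXT NOT NULL REFERENCES data_objects(name),
    business_purpose TEXT
);
-- An interface may chain several transformations; step gives their order.
CREATE TABLE exchange_mechanisms (
    interface_id        INTEGER NOT NULL REFERENCES interfaces(id),
    step                INTEGER NOT NULL,
    transformation_type TEXT NOT NULL REFERENCES transformation_types(name),
    exchange_format     TEXT NOT NULL REFERENCES exchange_formats(name),
    PRIMARY KEY (interface_id, step)
);
-- Simplified stand-in for separate users and groups tables.
CREATE TABLE group_members (
    username   TEXT NOT NULL,
    group_name TEXT NOT NULL,
    PRIMARY KEY (username, group_name)
);
"""


def mechanism_chain(conn, interface_id):
    """Return the (transformation_type, exchange_format) steps in order."""
    return conn.execute(
        "SELECT transformation_type, exchange_format FROM exchange_mechanisms "
        "WHERE interface_id = ? ORDER BY step",
        (interface_id,),
    ).fetchall()


def can_edit(conn, username, interface_id):
    """Assumed policy: members of the destination system's owner group may edit."""
    row = conn.execute(
        """
        SELECT 1
        FROM interfaces i
        JOIN services s ON s.code = i.destination
        JOIN group_members g ON g.group_name = s.owner_group
        WHERE i.id = ? AND g.username = ?
        """,
        (interface_id, username),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the controlled vocabularies
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO services VALUES (?, ?)",
    [("FINANCE", "finance-owners"), ("RIS", "research-owners")],
)
conn.execute("INSERT INTO transformation_types VALUES ('PL/SQL')")
conn.execute("INSERT INTO exchange_formats VALUES ('SQL result set')")
conn.execute("INSERT INTO data_objects VALUES ('Research grant')")
conn.execute(
    "INSERT INTO interfaces VALUES (1, 'FINANCE', 'RIS', 'Research grant', "
    "'Grant data for the Research Information System')"
)
conn.execute("INSERT INTO exchange_mechanisms VALUES (1, 1, 'PL/SQL', 'SQL result set')")
conn.executemany(
    "INSERT INTO group_members VALUES (?, ?)",
    [("alice", "research-owners"), ("bob", "finance-owners")],
)

print(mechanism_chain(conn, 1))    # [('PL/SQL', 'SQL result set')]
print(can_edit(conn, "alice", 1))  # True: alice is in the destination owner group
print(can_edit(conn, "bob", 1))    # False: bob belongs only to the source owner group
```

Foreign keys onto the lookup tables are one way to make the controlled vocabulary binding: with enforcement switched on, an entry using an unrecognised transformation type or format is rejected rather than silently accepted.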
Here’s an example entry in the interface catalogue (I haven’t included all fields for the sake of simplicity):



Enterprise Data Dictionary

As with the Interface Catalogue, we started building consensus around the schema for our data dictionary by using a shared wiki. In my view this is quite a nice way to browse a data dictionary, as it is possible to describe each data object on a separate wiki page and link from one to another (and also to controlled vocabularies that might be related to certain entity attributes). However, I wanted to get the data dictionary populated in a devolved way as soon as possible, so I have been taking an opportunistic approach: getting entries filled out by a range of people around the organisation who already have to capture this sort of information for their own purposes. These people are most comfortable using Excel spreadsheets, so I am willing to work in this way for now. We will migrate to a more sophisticated solution later this year (I come back to this below).

I currently have business users in five different areas of the organisation collaborating with me to complete information about their data objects in the same way, in this central repository: Finance, Payroll, HR, Identity Management and Student Information. The reason each group already wants to document its data is that we are migrating to a new ERP system in one area of the organisation (to replace several finance, payroll and HR systems); elsewhere, IT Services is implementing a new Identity Management system; and we are also developing improved data structures in SITS (our student information system). Each group therefore needs, in some sense, to migrate current data, and the only way to do that safely is to document the “As Is” data structures in our systems and then evaluate how to migrate existing data to target (“To Be”) data structures. I managed to convince each group to work with me in building up a central data store of this information, which we will need to maintain over time.
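At its simplest, the “As Is” to “To Be” evaluation described above amounts to checking a mapping between two sets of documented attributes. The following Python sketch assumes each attribute is identified by an (entity, attribute) pair; the attribute names and the mapping are hypothetical examples rather than real entries from our dictionary.

```python
Attr = tuple[str, str]  # (entity, attribute)


def migration_gaps(
    as_is: set[Attr], to_be: set[Attr], mapping: dict[Attr, Attr]
) -> dict[str, list[Attr]]:
    """Compare As-Is and To-Be attribute sets through an As-Is -> To-Be mapping."""
    targets = set(mapping.values())
    return {
        # As-Is attributes nobody has decided what to do with yet
        "unmapped_as_is": sorted(a for a in as_is if a not in mapping),
        # To-Be attributes that no existing data will populate
        "unfilled_to_be": sorted(t for t in to_be if t not in targets),
        # Mapping targets missing from the To-Be structures (likely typos)
        "unknown_targets": sorted(t for t in targets if t not in to_be),
    }


# Hypothetical Person attributes, for illustration only.
as_is = {("Person", "surname"), ("Person", "forename"), ("Person", "staff_no")}
to_be = {("Person", "family_name"), ("Person", "given_name"), ("Person", "employee_id")}
mapping = {
    ("Person", "surname"): ("Person", "family_name"),
    ("Person", "forename"): ("Person", "given_name"),
}
print(migration_gaps(as_is, to_be, mapping))
# {'unmapped_as_is': [('Person', 'staff_no')], 'unfilled_to_be': [('Person', 'employee_id')], 'unknown_targets': []}
```

Here the report flags `staff_no` and `employee_id` as a gap for the business owners to resolve; the code cannot decide whether they are the same thing, but it does make the question visible.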
Having this sort of centralised data should not only help with the practical tasks being undertaken right now, but also greatly reduce the cost of migrating to new systems over time, as we’ll have good documentation at our fingertips, held in a consistent format.

Our current schema is at version 0.1 and subject to further development, but I post it here for information:

Data Dictionary Schema v 1.0
We are using a separate tab for each data object (entity) being documented (Person, Appointment, Address, Programme, Unit and so on), and each data object is documented as per the schema above.
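If each tab is exported to CSV, the tabs can be consolidated and sanity-checked with a few lines of Python. This is only a sketch: the column names in `REQUIRED_COLUMNS` are assumptions standing in for the real columns of the schema above, and the sample rows are invented.

```python
import csv
import io

# Assumed column set for every tab; substitute the real schema's columns.
REQUIRED_COLUMNS = ("attribute", "description", "data_type", "source_system")


def load_tab(entity: str, csv_text: str) -> list[dict[str, str]]:
    """Parse one tab's CSV export, tagging each row with its entity name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []  # reading fieldnames consumes the header line
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"{entity}: missing columns {missing}")
    return [{"entity": entity, **row} for row in reader]


def consolidate(tabs: dict[str, str]) -> dict[tuple[str, str], dict[str, str]]:
    """Merge all tabs into one dictionary keyed by (entity, attribute)."""
    merged: dict[tuple[str, str], dict[str, str]] = {}
    for entity, csv_text in tabs.items():
        for row in load_tab(entity, csv_text):
            key = (entity, row["attribute"])
            if key in merged:
                raise ValueError(f"duplicate entry for {key}")
            merged[key] = row
    return merged


HEADER_ROW = "attribute,description,data_type,source_system\n"
tabs = {
    "Person": HEADER_ROW + "surname,Family name,VARCHAR2(100),HR\n",
    "Address": HEADER_ROW + "postcode,UK postcode,VARCHAR2(8),HR\n",
}
dictionary = consolidate(tabs)
print(sorted(dictionary))  # [('Address', 'postcode'), ('Person', 'surname')]
```

Rejecting a tab with missing columns, rather than patching it up, keeps every contributing area documenting its data objects in the same way.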

Clearly this spreadsheet is growing rapidly and we are pushing this documentation approach to its limits! So, how should we host the data dictionary in such a way that it is both access controlled and intuitive to use and maintain, by non-technical as well as technical people? We currently propose to take a similar approach to the one we’ve taken with the Interface Catalogue so far (i.e. a fairly simple Oracle database backend layered with a Java Web App that presents an access-controlled administrative Web frontend, integrated with our University’s Single Sign On system). However, I am very interested in whether there are good, equally low-cost tools out there, perhaps in SOA suites, that we could evaluate for future adoption. I did a very quick trawl of suppliers of enterprise data dictionary tools and came up with this:

My next job is to talk more with our Business Intelligence team at Bristol. This team is already documenting data structures because, in effect, they are what I think of as consumers of master data: they need to access data to load into our data warehouse, enabling management information reporting at many levels within the organisation. They need to understand the semantics of the data clearly, as well as to integrate data structures, because the end users who read the generated reports need to be able to interpret the meaning of the information shown in them without ambiguity. We use SAP Business Objects solutions for business intelligence at Bristol, and I plan to find out whether anything in this solution will cover our data dictionary needs and enable cross-linking with the interface catalogue. Meanwhile, if anyone reading this has useful suggestions, please let me know!
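Whatever tool we end up with, the cross-linking I have in mind starts with a simple reconciliation between the interface catalogue’s data objects lookup table and the entities in the data dictionary: which objects are exchanged but undocumented, and vice versa. Here is a minimal Python sketch, assuming names are matched case-insensitively after collapsing whitespace (a naive rule that real data would probably outgrow); the object names are illustrative only.

```python
def normalise(name: str) -> str:
    """Naive matching key: collapse whitespace and ignore case."""
    return " ".join(name.split()).casefold()


def cross_link_report(
    catalogue_objects: list[str], dictionary_entities: list[str]
) -> dict[str, list[str]]:
    """Reconcile catalogue data objects against data dictionary entities."""
    cat = {normalise(o) for o in catalogue_objects}
    dic = {normalise(e) for e in dictionary_entities}
    return {
        "undocumented_objects": sorted(cat - dic),  # exchanged but not yet described
        "unexchanged_entities": sorted(dic - cat),  # described but never a source object
        "linked": sorted(cat & dic),                # candidates for cross-links
    }


catalogue_objects = ["Research grant", "Person", "Appointment"]
dictionary_entities = ["Person", "Appointment", "Address", "Programme", "Unit"]
print(cross_link_report(catalogue_objects, dictionary_entities))
# {'undocumented_objects': ['research grant'], 'unexchanged_entities': ['address', 'programme', 'unit'], 'linked': ['appointment', 'person']}
```

An object in the first list is a prompt to add a dictionary entry; an entity in the second is not necessarily a problem, since plenty of documented data never leaves its home system.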