
Archive for the 'Interface Catalog' Category

Important documentation for SOA: the Interface Catalogue and Data Dictionary

Sunday, June 16th, 2013

Interface Catalogue

A few months ago I blogged about our initiation of an Interface Catalogue here at the University of Bristol. Each row in our interface catalogue represents a single integration – a join – between two different IT systems. Despite having established an operational data store (called the Datahub) some years ago, we have a proliferation of point-to-point system integrations at Bristol, mainly because the Datahub is currently only updated nightly whereas many of our integrations need to exchange data in real time or near real time. We’d like to significantly reduce the percentage of our integrations that are point-to-point and mature our integration architecture, hence the need for some analysis of the current integration architecture: the Interface Catalogue is giving us good “As Is” documentation to this end.

We have implemented it in an Oracle database so far, and we are building a user-friendly Web frontend for maintaining entries in the catalogue over time. The schema is shown below.

Interface Catalogue Schema Diagram

Tables in the Interface Catalogue


The interfaces table is completed such that each row represents an interface between two systems. We use unique codenames for all our IT systems, so the services table provides a look-up for those system names.

Most interfaces transform source-system data into the format required by the destination system. For example, we might use PL/SQL to extract data about research grants from our Finance system’s Oracle database and present it as a database view (an SQL result set) for our Research Information System to access. In this example we would describe the data exchange mechanism as having a transformation type of PL/SQL and a data exchange format of SQL result set. In more complex integrations we may transform the data repeatedly, in which case we link an ordered set of data exchange mechanisms (each with a transformation type and data exchange format) to the interface. Note that the Source Object is the data object being extracted from the source system – in this example, research grant data objects.

To converge on a controlled vocabulary for developers, we have look-up tables of values for transformation types, data exchange formats and objects. In the case of data objects we aim eventually to sync that table with the data dictionary, which is intended to become a complete set of documentation about the data in our systems – something we are only just embarking on and which I describe further below.

The users and groups tables support the access control policy governing who is allowed to edit what in the interface catalogue. We have currently placed responsibility for maintaining data on the destination system owners, mainly because they are typically most likely to know specific information about the data objects they obtain from the source system and can explain the business purpose of the interface.
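To make the shape of the catalogue concrete, here is a minimal sketch of the tables described above, using SQLite in place of Oracle. All table and column names are my illustrative assumptions, not the actual schema; it simply records the research-grants example (Finance to Research Information System, one PL/SQL transformation step producing an SQL result set).

```python
import sqlite3

# Illustrative sketch of the interface catalogue; names are assumptions,
# not the real Oracle schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE services (
    service_id INTEGER PRIMARY KEY,
    codename   TEXT UNIQUE NOT NULL          -- unique system codename
);
CREATE TABLE objects (
    object_id  INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL          -- controlled vocabulary of data objects
);
CREATE TABLE interfaces (
    interface_id     INTEGER PRIMARY KEY,
    source_id        INTEGER NOT NULL REFERENCES services,
    dest_id          INTEGER NOT NULL REFERENCES services,
    source_object_id INTEGER REFERENCES objects,
    business_purpose TEXT
);
-- An ordered set of transformations linked to one interface.
CREATE TABLE data_exchange_mechanisms (
    interface_id         INTEGER NOT NULL REFERENCES interfaces,
    step_no              INTEGER NOT NULL,
    transformation_type  TEXT,               -- e.g. 'PL/SQL'
    data_exchange_format TEXT,               -- e.g. 'SQL result set'
    PRIMARY KEY (interface_id, step_no)
);
""")

# The research-grants example from the text.
conn.executemany("INSERT INTO services (codename) VALUES (?)",
                 [("FINANCE",), ("RIS",)])
conn.execute("INSERT INTO objects (name) VALUES ('Research Grant')")
conn.execute("""INSERT INTO interfaces
                (source_id, dest_id, source_object_id, business_purpose)
                VALUES (1, 2, 1, 'Expose grant data to the RIS')""")
conn.execute("""INSERT INTO data_exchange_mechanisms
                VALUES (1, 1, 'PL/SQL', 'SQL result set')""")

row = conn.execute("""
    SELECT s.codename, d.codename, o.name, m.transformation_type
    FROM interfaces i
    JOIN services s ON s.service_id = i.source_id
    JOIN services d ON d.service_id = i.dest_id
    JOIN objects  o ON o.object_id  = i.source_object_id
    JOIN data_exchange_mechanisms m ON m.interface_id = i.interface_id
""").fetchone()
print(row)  # ('FINANCE', 'RIS', 'Research Grant', 'PL/SQL')
```

The real implementation also carries the users and groups tables for access control, omitted here for brevity.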
Here’s an example entry in the interface catalogue (I haven’t included all fields for the sake of simplicity):



Enterprise Data Dictionary

As with the Interface Catalogue, we started off building consensus around the schema for our data dictionary by using a shared wiki. In my view this is quite a nice way to browse a data dictionary, as each data object can be described on a separate wiki page with links from one to another (and to controlled vocabularies related to certain entity attributes). However, I wanted to get the data dictionary populated in a devolved way as soon as possible, so I have been taking an opportunistic approach: entries are filled out by a range of people around the organisation who already have to capture this sort of information for their own purposes. These people are most comfortable using Excel spreadsheets, so I am willing to work that way for now; we will migrate to a more sophisticated solution later this year (I come back to this below).

I currently have business users in five different areas of the organisation – Finance, Payroll, HR, Identity Management and Student Information – all collaborating with me to document their data objects in the same way, in the same central repository. Each group is already motivated to document its data: we are migrating to a new ERP system in one area of the organisation (to replace several finance, payroll and HR systems); elsewhere, IT Services is implementing a new Identity Management system; and we are developing improved data structures in SITS (our student information system). Each group therefore needs, in some sense, to migrate current data, and the only way to do that safely is to document the “As Is” data structures in our systems and then evaluate how to migrate existing data to the target (“To Be”) structures. I managed to convince each group to work with me in building up a central store of this information that we will maintain over time.
Having this sort of centralised documentation should not only help with practical tasks being undertaken right now, but also greatly reduce the cost of migrating to new systems over time: going forward, we’ll have good documentation at our fingertips, held in a consistent format.

Our current schema is at version 0.1 and subject to further development; however, I post it here for information:

Data Dictionary Schema v0.1
We are using a separate tab for each data object (entity) being documented (Person, Appointment, Address, Programme, Unit and so on), and each data object is documented as per the schema above.
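As a rough sketch of what one such tab holds, the structure can be modelled as an entity with a set of documented attributes. The field names below are hypothetical stand-ins for our v0.1 schema (which I have not reproduced in full here); the Person example and its attribute are likewise invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AttributeEntry:
    """One documented attribute of a data object (one spreadsheet row).

    Field names are assumptions, not the actual v0.1 schema."""
    name: str
    data_type: str
    description: str
    controlled_vocabulary: Optional[List[str]] = None  # permitted values, if any

@dataclass
class DataObject:
    """One data object / entity, i.e. one spreadsheet tab such as 'Person'."""
    entity: str
    owner: str                                  # business area responsible, e.g. 'HR'
    attributes: List[AttributeEntry] = field(default_factory=list)

# Hypothetical example entry.
person = DataObject(entity="Person", owner="HR")
person.attributes.append(AttributeEntry(
    name="title",
    data_type="VARCHAR2(20)",
    description="Salutation used in correspondence",
    controlled_vocabulary=["Mr", "Ms", "Mx", "Dr", "Prof"]))

print(person.entity, len(person.attributes))  # Person 1
```

The point of the controlled_vocabulary field is the cross-linking mentioned earlier: values documented here could eventually be synced with the look-up tables in the Interface Catalogue.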

Clearly this spreadsheet is growing rapidly and we are pushing the current documentation approach to its limits! So, how do we host the data dictionary in such a way that it is both access-controlled and intuitive to use and maintain by non-technical as well as technical people? We currently propose to take a similar approach to the one we’ve taken with the Interface Catalogue so far (i.e. a fairly simple Oracle database backend layered with a Java Web app that presents an access-controlled administrative Web frontend, integrated with our University’s Single Sign-On system). However, I am very interested in whether there are good, equally low-cost tools out there, perhaps in SOA suites, that we could evaluate for future adoption. I did a very quick trawl of suppliers of enterprise data dictionary tools and came up with this:

My next job is to talk more with our Business Intelligence team at Bristol. This team is already documenting data structures because, in effect, they are what I think of as consumers of master data – they need to access it to populate our data warehouse, which enables management information reporting at many levels within the organisation. They need to understand the semantics of the data clearly, as well as to integrate data structures, because the end-users who read the generated reports must be able to interpret the information shown in them without ambiguity. We use SAP BusinessObjects solutions for our business intelligence purposes at Bristol, and I plan to find out whether there’s anything in this solution that will cover our data dictionary needs and enable cross-linking with the interface catalogue. Meanwhile, if anyone reading this has useful suggestions, please let me know!


Explaining the importance of a good data integration architecture to senior management

Monday, October 22nd, 2012

In the summer I ran a workshop with the Portfolio Executive, our senior decision-making body at the University regarding the distribution of funds to internal projects. In one half of the workshop I led a discussion about the need to regard master data as an asset, and about the problems with our current data integration architecture. I argued that too high a percentage of our current system-to-system integrations are point-to-point, a position arrived at naturally after many years of allowing technical developers to create bespoke integrations for new systems.

I discussed how this architecture is risky for the institution: the combination of a lack of documentation, the lack of a standards-based approach to data integration and too many system-to-system joins (a “spaghetti” landscape of systems) means that data model changes often propagate through our network of systems in an unmanaged way. This can result in the sudden, unplanned loss of functionality in ‘consumer’ IT systems, because the implications of changes made in a source IT system are not easy to appraise. We also suffer in some cases from a lack of agility in responding quickly and efficiently to new imperatives (such as Government requirements for Key Information Sets, or changes in REF requirements, and so on). In terms of data model change, for example, we’ve had a case where the organisational hierarchy was changed and unexpectedly broke a system that happened to depend on it. As far as agility is concerned, whenever we start to replace a system (or several systems) with a new one, we usually have to go back to the drawing board to deduce how many interfaces there are to the current system and which will have to be put in place for the new one. This can be overly time-consuming. All too often we create interfaces that essentially do the same task as other interfaces, e.g. moving student data objects between systems, or transferring data about organisational units, research publications and so on. Although we began to tackle this duplication problem some years ago, when we attempted to replicate key master data in a core system called the Datahub, this approach has not fully met requirements: for example, the Datahub is not real-time (some data in it can be up to 24 hours old), causing many new system implementations to bypass the Datahub and connect directly to source data systems. The consequence is that we simply perpetuate the problem of having many point-to-point integrations.

Now, this is all rather technical. IT Services would like to invest in developing a new, more sophisticated architecture in which we abstract out key functionality (such as system requests of the type ‘give me all the students from department X’ and so on), develop a service-oriented architecture where appropriate, and deploy ESB technology wherever real-time data needs are most pressing. We see the benefits this can bring: more reliable management of the knock-on effects of system changes (reducing system downtime), quicker project start-up times thanks to a more agile integration architecture, and a more standardised and therefore more sustainable system integration architecture in the longer term. However, convincing senior management to invest in the rather intangible-sounding concept of a more mature data integration architecture is difficult when constrained to non-technical language! Here is a brief summary of how I attempted to describe the concepts to the Portfolio Executive this summer:

Using a jigsaw analogy suggested to me by the Head of IT Services, I explained that we constantly try to fit new systems seamlessly into our existing systems architecture, as though they were pieces from the same jigsaw puzzle – quite a challenge:

Jigsaw Architecture problem


The more we buy third-party products, the more this becomes a real puzzle. The yellow boxes, by the way, relate to student/researcher lifecycles – a concept the Portfolio Executive was already familiar with and which I have blogged about elsewhere (see my enterprise architecture blog).

Next I discussed how we can think of our IT systems as roughly supporting three areas: research, education and support services (such as finance and administration). Sticking with the jigsaw analogy, I described how we connect systems wherever we need to reuse data from one system in another – for example, copying lists of courses stored in our Student Information System over to our timetabling system, or presenting publications stored in our research information system on the Web via our content management system. The ‘joins’, therefore, are created wherever we need a connection through which we can pass data automatically. This keeps data in sync across systems and avoids any manual re-entry of data: the data reuse advantage. I used the diagram below to help discuss this concept:

Illustration of how we currently join IT systems together


I described how, as our requirements around information become more sophisticated, the pressure on our data integration architecture increases. For example, we need integrated views of both research and teaching data to help inform discussions about individual academic performance, and we need cross-linked information about research and teaching on our website. If our data integration architecture is not fit for purpose (i.e. if the overall approach to system ‘joins’ is not planned, standardised, organised, documented and well governed) then we will struggle to deliver these important benefits.

I used the following diagram to discuss what the future vision for the data architecture would look like:

The To Be vision of Joining Systems


This diagram is deliberately simplistic and purely illustrative. The blue ring is entirely conceptual, but it allowed me to talk about the need to decouple systems that consume data from connecting directly to master data systems (i.e. to get away from such a proliferation of point-to-point system integrations). Naturally I’ve shown only a small subset of our systems in these diagrams, but this was enough to explain the concepts I needed to convey. I described how the blue ring could be made up of a combination of technologies, but that we would need to standardise carefully on these, organisation-wide, to increase our chances of sustaining the integration architecture over time. I didn’t mention service-oriented architecture, but some of the blue ring could be composed of services that abstract out key, commonly used functionality using SOA technology. We didn’t discuss ESBs, but some of the blue ring could use ESB technology. We also have a data warehouse solution, which could be used to replicate some or even all of our master data if we wished to go that route for data reuse.
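In code terms, the decoupling the blue ring provides amounts to consumers depending on an abstraction rather than on a specific master data system. The sketch below illustrates this with the ‘give me all the students from department X’ request mentioned earlier; all class names and data are invented, and a real implementation would of course be a web service or ESB endpoint rather than an in-process call.

```python
from typing import List, Protocol

class StudentSource(Protocol):
    """The abstract request consumers depend on (the 'blue ring')."""
    def students_in_department(self, dept: str) -> List[str]: ...

class SitsBackend:
    """Hypothetical stand-in for a master data system such as SITS."""
    _data = {"PHYS": ["Ada", "Grace"], "HIST": ["Alan"]}

    def students_in_department(self, dept: str) -> List[str]:
        return list(self._data.get(dept, []))

class StudentService:
    """Consumers call this service, never the backend directly, so the
    master data system can change without breaking every consumer."""
    def __init__(self, backend: StudentSource) -> None:
        self._backend = backend

    def get_students(self, dept: str) -> List[str]:
        return self._backend.students_in_department(dept)

service = StudentService(SitsBackend())
print(service.get_students("PHYS"))  # ['Ada', 'Grace']
```

The design point is that a change to the backend (say, replacing the nightly Datahub feed with a real-time source) is absorbed behind StudentService, instead of propagating through every point-to-point join.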

Determining the exact composition of the blue ring (i.e. the exact future data integration architecture for our institution) is not yet possible, because we are still gathering information about the “As Is” (see my blog on the Interface Catalogue). When we have fuller data about our current architecture, we will be able to review it and decide by what percentage we wish to reduce groups of point-to-point integrations (replacing them with web service APIs, say), and whether we might want to replace the Datahub with data warehouse technology, and so on.

In order to throw more resources at developing the information needed for this analysis, we will be delivering a business case to the Portfolio Executive requesting funds to help us. I hope to describe how we’ve framed the business model in that document in a future blog post. Meanwhile, we can continue to mature the integration architecture on a project-by-project basis.

JISC is hosting an Enterprise Architecture workshop next month at which SOA is on the agenda; I hope to have useful conversations there about how far other universities have come in maturing their data integration architectures. One thing I can report is that the Portfolio Executive felt that they understood the concept of the ‘blue ring’ and its importance, and are prepared to accept our forthcoming business case. This feels like an important step forward for our organisation.



Developing a new Integration Architecture – the importance of the “As Is”

Thursday, August 23rd, 2012

The core data integration project is about identifying core master data as a valuable, reusable asset within the organisation. Part of this project’s activity is designing the “To Be” architecture that will support our goals for master data. However, we also need to document the “As Is” architecture so that we can build a roadmap from the “As Is” to the “To Be”. This methodology is part of TOGAF.

To this end I have been working on establishing an Interface Catalogue, now being contributed to by technical teams across the University according to a standard defined to suit our purposes. You can read more about this at