There is a lot of potential benefit in importing and interrogating your enterprise architecture components in a graph database; you can easily do extensive analysis on dependencies, redundancies, what-if and root cause analysis. ArchiMate is a commonly-used framework for representing different architectures; it is well-defined and used by many organisations and templates are available in many diagramming packages.

There are a number of great blogs showing the power of doing just this in Neo4j, such as:

The challenge in getting started can be from moving from existing modelling tools and artefact repositories to be able to start using the power of graph databases. Fortunately, this is relatively straightforward with Neo4j. In this blog we are going to cover, step by step, how to:

  • Import an existing ArchiMate diagram into Neo4j using either the Archi database plugin, or Neo4j Cypher queries

  • Export data from Neo4j back out again

  • Query the diagram, both for basic items such as lone elements and connectivity, to using some graph algorithms to look at element relationship strengths

The diagramming package we use explore the ArchiMate diagram is Archi, an open source modelling tool: https://www.archimatetool.com.

Getting started - ArchIsurance Case Study

For this post we’re going to use ArchIsurance case study available from The Open Group (https://publications.opengroup.org/y121). The model has already been coded up into Archi and is available from: https://github.com/archimatetool/ArchiModels/blob/master/Archisurance/Archisurance.archimate

This is a great example which covers the complexity your architecture can start to expand to. It’s difficult to view the entire model, as there is a lot of information and complexity conveyed within the model, and it is necessary to explore the data using the different views that have been created.

We are going to import the entire case study into Neo4j, and start to interrogate the different elements of the model, starting with dependencies to components, to using the graph algorithm procedures now available to look for connectedness within the diagram.

34154444 75e94292 e4ad 11e7 9baa 53709272dedb

Importing data into Neo4j

The two ways we will explore to import data from Archi into Neo4j are:

  • Via the Archi database plugin

  • Via the Neo4j CSV importer

Via the Archi database plugin

The Archi database plugin is available from https://github.com/archi-contribs/database-plugin/tree/master/v2 and version 2.07 support data exports to Neo4j. Once the plugin has been downloaded and copied into the Archi plugin folder, and the details for your running Neo4j instance have been entered, it is simply a matter of exporting the model into Neo4j. Calling up the Neo4j browser and running MATCH (n) RETURN * will show the loaded model:

34154446 760c1894 e4ad 11e7 8d89 85da5627b4a4

The export tool stores the element names and relationship types within the properties of the respective nodes and relationships. For example:

  • Relationship type is stored as a property on the relationship, e.g. [:relationships {class:”TriggeringRelationship”}]

  • Element type is stored as a property on the node, e.g. (:elements {class:”BusinessRole”})

  • Element name is also stored as a property on the node, e.g. (:elements {name:”Customer files service”})

  • Any properties attached to elements are represented as additional nodes with relationship “hasProperty”, and within the node name:value persisted by the name and the value properties on the node itself

Via the Neo4j CSV importer

We can also use the LOAD CSV tool available in Cypher to load our Archimate model. Firstly, “Export→Model to ~CSV”. This will create three CSV files:

  • elements.csv

  • properties.csv

  • relationships.csv

Copy and drop the files into the Neo4j import folder. Once this has been done, we are going to execute two LOAD CSV queries to get the elements and relationships data from the model into Neo4j (in this order, we are skipping the properties.csv file as there’s no data to load from there). This will produce the same output into Neo4j as the Archi export plugin.

Load the elements data
LOAD CSV WITH HEADERS FROM 'file:///elements.csv' AS line
CREATE (:elements {class:line.Type, name:line.Name, documentation:line.Documentation,
    id:line.ID })
Load relations info
LOAD CSV WITH HEADERS FROM 'file:///relations.csv' AS line
MATCH (n {id:line.Source})
WITH n, line
MATCH (m {id:line.Target})
WITH n, m, line
CREATE (n)-[:relationships {id:line.ID, class:line.Type, documentation:line.Documentation,
    name:line.Name}]->(m)

Exporting the model back to Archi

We can also make tweaks to the model (via manual Cypher). Of course we don’t have Archi to validate what we’re doing is legal with the ArchiMate framework, so tread with caution. Also don’t forget to create UUID’s for each of the new elements created (helpfully supported by APOC: https://neo4j-contrib.github.io/neo4j-apoc-procedures/).

We can export the information back into Archi, again via the CSV function. Firstly we need to create outputs that resemble the CSV format Archi expects. To do this we will create the two output files we originally imported, and via Cypher queries, create the expected output, clicking the export function once the query has executed.

Output the element in an Archi csv friendly format (elements.csv)
MATCH (n)
WHERE n.id <>"null"
RETURN n.id AS ID, n.class AS Type, n.name AS Name, n.documentation AS Documentation
Output the relations in an Archi csv friendly format (relations.csv)
MATCH (n)-[r]->(m)
RETURN r.id AS ID, r.class AS Type, r.name AS Name, r.documentation AS Documentation,
    n.id AS Source, m.id AS Target

From here, we use the import CSV function in Archi, including properties.csv we didn’t use, and we’re good to go.

Exploring ArchIsurance in Neo4j

We’re now going to explore our ArchiMate model through a series of Cypher queries in the Neo4j browser. A note before we commence; we are going to be exploring ‘long distance’ relationships between the elements, so we will be doing pattern matches on long paths. This is a really powerful tool in Neo4j as we can look deep into dependencies that might not be so obvious to us immediately. As there are circular relationships within the model (e.g. flow relationships), we will be limiting the Cypher query as to how far to dig down the relationships when interrogating the diagram. Also, we could further tune the queries outlined and do a search on the relationship parameter to ignore/target certain relationship/node types if appropriate.

As we’re going to use some targeted queries in the examples looking at particular layers, the following query will take the ArchiMate element types (from the class property on the nodes) and apply them as labels to the nodes. Again, this is courtesy of the APOC library and a feature available in Neo4j 3.0 onwards:

MATCH (n:elements)
CALL apoc.create.addLabels(n, [ n.class ]) YIELD node
RETURN node

We can also change the relationship type to the ArchiMate relation ( [:relationships {class:”reltype”} ] ) via Cypher query if we wish to do so.

As we’ve now added new labels to the nodes, we can colour them up in the browser, which helps us to visualise the different layers of the elements:

34154447 761fcda8 e4ad 11e7 8b2c ee4462343508

Relationship directions between nodes are a powerful concept in Neo4j, especially when it comes to interrogating our ArchIsurance model: we can ignore the direction when we’re understanding the strength of connections between elements, and we can use it to understand underpinning dependencies for components within the layers.

The following queries outlined below are answering questions we have about our model, ranging from are there any inconsistency issues such as forgetting to add relationships, to practical considerations such as identifying points of failure and understanding component dependencies.

Q1: Have we got any elements that haven’t been linked in?

This question seeks to answer whether we have any elements in our ArchiMate diagram that we may have forgotten to link to other elements. We achieve this by searching for nodes without relationships:

MATCH (n)
WHERE NOT (n)--()
RETURN n.name, n.class
34154448 76401d10 e4ad 11e7 8314 3629f1981056

Q2: Are there any application elements that don’t link from other applications or lower layers?

Building up from the previous question, we are now looking to see whether there are any application-level components that are missing upward relationships from other elements, most likely in the technology layer:

MATCH (n)
WHERE (n:ApplicationComponent OR n:ApplicationService OR n:ApplicationFunction)
AND NOT (n)<-[]-()
RETURN n.name, n.class
34154449 76604a2c e4ad 11e7 9b70 6b7da6f2e43b

Q3: What business services are dependent on the “Policy Data Management” application component?

Now we are specifically interested to understand what business services have dependencies on an application component, even though they are not directly joined:

MATCH (bizServ {class:"BusinessService"})<-[*1..10]-({name:"Policy Data Management"})
RETURN DISTINCT bizServ.name
34154450 7674f31e e4ad 11e7 9eee 98a183719daa

Note that because we’ve added labels to the nodes, the following query also returns the same result:

MATCH (bizServ:BusinessService )<-[*1..10]-({name:"Policy Data Management"})
RETURN DISTINCT bizServ.name

Q4: What elements are impacted if technology service “Claim Files Service” stops working?

This question is seeing to understand our dependencies and knock-on effects of the technology service not being available:

MATCH (n)<-[*1..10]-(:TechnologyService {name:"Customer File Service"})
RETURN DISTINCT n.class, n.name
ORDER BY n.class, n.name
34154451 76937744 e4ad 11e7 9970 ea9c2a0f7e00

As we’ve represented the ArchiIsurance ArchiMate model as a graph, we can use graph algorithms to find heavily connected elements that may not necessarily be immediately obvious. To do this, we will use the Neo4j graph algorithm library, available from: https://neo4j-contrib.github.io/neo4j-graph-algorithms/.

Q5: What are the most connected elements in my estate?

By understanding which are the most connected components, we can get a rough idea of the importance of that element and whether we wish to split the element out. We use the Pagerank algorithm (of Google fame) to do this:

CALL algo.pageRank.stream()
YIELD node, score
RETURN node.name, node.class, score
ORDER BY score DESC
34154452 76b6820c e4ad 11e7 8b3e d7246ff1b1de

Q6: What are the most connected application/technology elements in my estate?

We can extend the query above and make it specific we want to look at application and technology level elements. The interest here is to see how interconnected these elements are. This starts to give us a feel of dependencies and risk assessments in case of outages. We’re going to use the labels we’ve attached to the nodes to specifically call up all the elements within the application and technology layer:

CALL algo.pageRank.stream('MATCH (n) WHERE n:DataObject OR n:Node OR n:Device OR n:SystemSoftware OR n:Path OR n:CommunicationNetwork OR n:Artifact OR n:ApplicationComponent OR n:ApplicationCollaboration OR n:ApplicationInterface OR n:ApplicationFunction OR n:ApplicationInteraction OR n:ApplicationProcess OR n:ApplicationEvent OR n:ApplicationService OR n:TechnologyCollaboration OR n:TechnologyInterface OR n:TechnologyFunction OR n:TechnologyProcess OR n:TechnologyInteraction OR n:TechnologyEvent OR n:TechnologyService RETURN id(n) as id', 'MATCH (n)-->(m)  RETURN id(n) as source, id(m) as target',{graph:'cypher'}) YIELD node, score
RETURN node.id, node.name, node.class, score
ORDER BY score DESC
34154453 76d7ab6c e4ad 11e7 9bbb 56ad9f77abcd

Now that was quite a large, unwieldy query! Luckily, we can add multiple labels to nodes, and in this instance it makes perfect sense to label the elements based in which of the ArchiMate layers they sit in. The following two queries will apply "ApplicationLayer" and "TechnologyLayer" to the appropriate elements within our model:

Add Application Layer labels
MATCH (n:elements)
WHERE n:DataObject OR n:ApplicationComponent OR n:n:ApplicationCollaboration
    OR n:ApplicationInterface OR n:ApplicationFunction OR n:ApplicationInteraction
	OR n:ApplicationProcess OR n:ApplicationEvent OR n:ApplicationService
CALL apoc.create.addLabels(n, [ "ApplicationLayer" ]) YIELD node
RETURN node
Add Technology Layer labels
MATCH (n:elements)
WHERE n:Node OR n:Device OR n:SystemSoftware OR n:Path OR n:CommunicationNetwork
    OR n:Artifact OR n:TechnologyCollaboration OR n:TechnologyInterface
	OR n:TechnologyFunction OR n:TechnologyProcess OR n:TechnologyInteraction
	OR n:TechnologyEvent OR n:TechnologyService
CALL apoc.create.addLabels(n, [ "TechnologyLayer" ])
YIELD node
RETURN node

Now that we’ve added the new labels, the above query now becomes the following:

CALL algo.pageRank.stream('MATCH (n) WHERE n:ApplicationLayer OR n:TechnologyLayer RETURN id(n) as id',
    'MATCH (n)-->(m)  RETURN id(n) as source, id(m) as target',{graph:'cypher'}) YIELD node, score
RETURN node.id, node.name, node.class, score
ORDER BY score DESC

Extending the model

A powerful continuing step would be to enrich the model and start creating relationships to real-time monitoring systems, incident occurrences, feedback sentiment, and so forth to the architectural elements. We could also add business requirements as they come in to help prioritise project and platform advancement.

Using a tool like Archi and maintaining the model in Neo4j makes this easy: we can import and export the ArchiMate model back and forth between the tools without needing to delete/sync other data sources, provided that sensitivity to existing elements is considered and the unique identifiers do not change. In the next post we’ll start to explore what this might look like.

Final Words

We have shown how easy it is to import an architectural diagram from Archi and leverage the power of the graph to do both model integrity checks, as well as looking out for dependencies or pinch points within the proposed architecture. Although we have focussed exclusively on entity types, do be aware we can also specifically target relationship types too by adapting the Cypher queries. Next time, we’ll examine approaches to further enrich the model, thereby really getting value out of using Neo4j for exploring your IT elements.