Deploying the Fragment Graph

Note

Allow 2 hours to install all of the applications.

The playbooks that we’d normally run from AWX can also be executed from the command line. That’s what we’ll be doing here.

The steps we’ll follow here are:

  • Clone the Graph playbook repository
  • Create a parameter file to satisfy your cluster
  • Run the playbook

Clone the graph repo

The repository contains playbooks and roles for the deployment of a Neo4j graph database and associated fragmentation data. From your virtual environment, clone it and (ideally) switch to the most recent tag:

$ git clone https://github.com/InformaticsMatters/docker-neo4j-ansible
$ cd docker-neo4j-ansible
$ git checkout tags/2.4.4
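
If you would rather discover the most recent tag than hard-code it, a small helper can pick the highest version from the tag list. This is only a sketch: it assumes the repository's tags are plain version numbers (as 2.4.4 is) and that GNU sort with -V is available. The name latest_version is invented here and is not something the repository provides.

```shell
# Pick the highest version from a list of tags read on stdin.
# Assumes plain version-number tags and GNU sort (-V = version sort).
latest_version() {
    sort -V | tail -n 1
}

# Usage, from inside the cloned repository:
#   git checkout "tags/$(git tag | latest_version)"
```

Version sort is used rather than a plain sort so that, for example, 2.10.0 ranks above 2.4.4.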

Create parameters

The Graph deployment is flexible and is controlled by a number of Ansible variables. To suit your cluster you will probably need to define your own values for some of them in a parameter file. Not every variable set in the repo will be relevant to you.

The parameter-template.yaml file contains a small set of the significant ones.

Copy the example in the repository, inspect it and set/change any values you need to:

$ cp parameter-template.yaml parameter.yaml
[edit parameter.yaml]

parameter.yaml is protected from being committed by the project’s .gitignore file.

The parameter-template.yaml contains only an example set of role variables - you should familiarise yourself with the role and inspect the other parameters (in the role’s defaults/main.yaml and vars/main.yaml), just in case you also need to adjust them.
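
One way to see which role variables your parameter file leaves at their defaults is to compare the top-level variable names in each file. The helper below is a rough sketch, not part of the repository: yaml_keys is an invented name, it only recognises simple unindented "name:" keys, and ROLE_DIR in the usage comment is a placeholder for wherever the role lives in your clone.

```shell
# List the top-level variable names defined in a YAML file, sorted.
# Only matches simple, unindented "name:" keys; nested keys,
# comments and document markers are ignored.
yaml_keys() {
    grep -E '^[A-Za-z_][A-Za-z0-9_]*:' "$1" | cut -d: -f1 | sort
}

# Usage: names set in the role's defaults but not in parameter.yaml.
# ROLE_DIR is a placeholder for the role's directory in your clone.
#   yaml_keys "$ROLE_DIR/defaults/main.yaml" > /tmp/defaults.txt
#   yaml_keys parameter.yaml > /tmp/params.txt
#   comm -23 /tmp/defaults.txt /tmp/params.txt
```

Anything the comparison prints is a variable you have not overridden, which makes it a useful list to review before running the playbook.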

Run the playbook

With a set of parameters created, deploy the Graph using the site.yaml playbook:

$ ansible-playbook -e @parameter.yaml site.yaml
[...]

As the graph initialisation can take some time, the job does not (at the time of writing) wait for the graph to initialise. We therefore use the kubectl command line (in a separate terminal/shell) to check on the status of the graph before moving on.

First, check that the graph namespace exists:

$ kubectl get namespace/graph
NAME    STATUS   AGE
graph   Active   7s

Then watch the Graph Pod status until it’s Running. The Pod has an initialisation container that downloads the graph data to the cluster, so expect it to sit in an Init state for a while:

$ kubectl get pod/graph-0 -n graph -w
NAME      READY   STATUS     RESTARTS   AGE
graph-0   0/1     Init:0/1   0          14s
graph-0   0/1     Init:0/1   0          95s
graph-0   0/1     Init:0/1   0          100s
graph-0   0/1     PodInitializing   0          108s
graph-0   1/1     Running           0          114s

Once you see Running, the Pod has started and you can press ctrl-c to stop the watch.
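
If you are scripting the deployment rather than watching the terminal, the same check can be done without user interaction using kubectl wait. The function below is a sketch under a few assumptions: wait_for_graph_pod is an invented name, and the 600s default timeout and 5 second poll interval are illustrative values rather than project recommendations.

```shell
# Block until pod/graph-0 in the graph namespace reports Ready.
# $1 is an optional timeout for kubectl wait (default 600s, illustrative).
wait_for_graph_pod() {
    # kubectl wait fails straight away if the Pod does not exist yet,
    # so first poll until the Pod object has been created.
    until kubectl get pod/graph-0 -n graph >/dev/null 2>&1; do
        sleep 5
    done
    kubectl wait --for=condition=Ready pod/graph-0 -n graph \
        --timeout="${1:-600s}"
}

# Usage:
#   wait_for_graph_pod 900s && echo "Graph Pod is running"
```

As with the interactive watch, a Ready Pod only means the container has started. The data import that follows still has to complete.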

The graph needs to import the downloaded files into a graph database, which can take a significant length of time, depending on the data that’s been downloaded.

You can follow the Graph Pod’s logs and wait for the import process to complete. The graph import typically involves four stages that are easily followed from the logs.

The output here has been truncated because there is a lot of it.

Importantly, to be confident the deployment has worked, you must see:

  • A section starting (1/4) Node import
  • A section starting (2/4) Relationship import
  • A section starting (3/4) Relationship linking
  • A section starting (4/4) Post processing
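
The four section markers above can also be checked mechanically against a saved copy of the log. This is a minimal sketch: check_import_stages is an invented name, and it assumes the log was captured to a file first, for example with kubectl logs pod/graph-0 -n graph > graph.log.

```shell
# Report the first import stage marker missing from a saved log file.
# Returns 0 if all four markers are present, 1 otherwise.
check_import_stages() {
    for stage in \
        '(1/4) Node import' \
        '(2/4) Relationship import' \
        '(3/4) Relationship linking' \
        '(4/4) Post processing'
    do
        # -F treats the marker as a fixed string, not a regular expression.
        if ! grep -qF "$stage" "$1"; then
            echo "missing: $stage"
            return 1
        fi
    done
}

# Usage:
#   kubectl logs pod/graph-0 -n graph > graph.log
#   check_import_stages graph.log && echo "all four stages seen"
```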

And, finally, you’re waiting to see the word Finished. issued by the cypher-runner.sh script:

$ kubectl logs pod/graph-0 -n graph -f
[...]
(1/4) Node import 2020-09-16 03:18:22.955+0000
Estimated number of nodes: 40.16 M
Estimated disk space usage: 8.64 GB
Estimated required memory usage: 1.49 GB
.......... .......... .......... .......... ..........   5% ∆4s 813ms
.......... .......... .......... .......... ..........  10% ∆3s 609ms
.......... .......... .......... .......... ..........  15% ∆3s 405ms
.......... .......... .......... .......... ..........  20% ∆3s 406ms
[...]
(4/4) Post processing 2020-09-16 04:13:13.062+0000
Estimated required memory usage: 1020.01 MB
.--.-..... .......... .......... .......... ..........   5% ∆7s 601ms
.......... .......... .......... .......... ..........  10% ∆11s 413ms
.......... .......... .......... .......... ..........  15% ∆12s 209ms
.......... .......-.. .......... .......... ..........  20% ∆3s 906ms
[...]
2020-03-19 14:25:08.527+0000 INFO  ======== Neo4j 3.5.5 ========
2020-03-19 14:25:08.532+0000 INFO  Starting...
2020-03-19 14:25:14.865+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2020-03-19 14:25:16.444+0000 INFO  Started.
2020-03-19 14:25:17.531+0000 INFO  Remote interface available at http://localhost:7474/
(cypher-runner.sh) Thu Mar 19 14:26:05 UTC 2020 Setting neo4j password...
(cypher-runner.sh) Thu Mar 19 14:26:07 UTC 2020 No legacy script.
(cypher-runner.sh) Thu Mar 19 14:26:07 UTC 2020 Trying /data/cypher-script/cypher-script.once...
(cypher-runner.sh) Thu Mar 19 14:26:08 UTC 2020 .once script executed.
(cypher-runner.sh) Thu Mar 19 14:26:08 UTC 2020 No .always script.
(cypher-runner.sh) Thu Mar 19 14:26:08 UTC 2020 Touching /data/data-loader/cypher-runner.executed...
(cypher-runner.sh) Thu Mar 19 14:26:08 UTC 2020 Finished.

Once you see that final line, you can press ctrl-c to stop following the logs and continue with the remaining applications.
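
For unattended use, the wait for Finished. can be turned into a polling loop instead of following the log by hand. The sketch below makes some assumptions: wait_for_graph_import is an invented name, the 30 second default interval and the 20-line tail are arbitrary choices, and the pattern relies on the cypher-runner.sh line format shown in the log above.

```shell
# Poll the tail of the Graph Pod's log until cypher-runner.sh
# reports "Finished.". $1 is the poll interval in seconds (default 30).
wait_for_graph_import() {
    until kubectl logs pod/graph-0 -n graph --tail=20 2>/dev/null \
            | grep -q '(cypher-runner.sh) .* Finished\.'; do
        sleep "${1:-30}"
    done
}

# Usage:
#   wait_for_graph_import 60 && echo "Graph import complete"
```

Polling the log tail is used here rather than piping kubectl logs -f into grep, because a follow would keep running after the final line has been printed. The loop has no timeout of its own, so wrap it in one if the import could fail silently.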