Getting started

Vocabulary

HW Requirements (Hello World specs, single machine)

| Server name | CPU | RAM | Disk space | OS |
|-------------|-----|-----|------------|----|
| All in one | >= 3 cores | >= 10 GB | 200 GB | Debian (recommended) |

Tip: The whole solution is multi-platform and runs on Windows as well, though this is not recommended and not thoroughly tested.

Prerequisites

  1. Access to a Docker registry with the built Docker images.
  2. SSH access to all VMs (if running on VMs).
  3. Docker and Docker Swarm already installed (see the install sketch below).
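
If Docker is not installed yet, one common way to install it on Debian is the Docker convenience script (review the script and check the official Docker documentation before relying on it); Swarm mode ships with Docker itself:

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh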

Creation of Docker Swarm cluster

Virtual machines (even in the case of a single virtual machine) have to be connected into a cluster, and you should add Docker labels to each Docker node.

Labels restrict which nodes may be used for each Docker service. In the case of a single-node cluster, add all labels to that node.

First you need to initialize the Docker Swarm cluster:

docker swarm init

Then find your hostname by

docker node ls

and replace {my-vm} in the following set of commands with the ID or HOSTNAME from the output. Then run the variant that matches your setup:

Single machine example:

docker node update --label-add type-monitor=true {my-vm}
docker node update --label-add type-db=true {my-vm}
docker node update --label-add type-back=true {my-vm}
docker node update --label-add type-front=true {my-vm}

4-node cluster alternative:

docker node update --label-add type-monitor=true {my-vm1}
docker node update --label-add type-db=true {my-vm2}
docker node update --label-add type-back=true {my-vm3}
docker node update --label-add type-front=true {my-vm4}
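
You can verify that the labels were applied with:

docker node inspect --format '{{ .Spec.Labels }}' {my-vm}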

Create all Docker services

Clone the infrastructure repository.

The easiest way is to run the prepared bash script, which will create all Docker stacks with their services in the folder {BASE_PATH}. You can export your own {BASE_PATH} by running export BASE_PATH=/my/path/to/repo on Unix or $env:BASE_PATH="/my/path/to/repo" on Windows (PowerShell).
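
For example:

export BASE_PATH=/my/path/to/repo          # Unix
$env:BASE_PATH="/my/path/to/repo"          # Windows (PowerShell)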

After that only some of the services will work, because the rest are not configured yet.

The state of the current services is shown in Swarmpit at http://{IP_ADDRESS_OF_CLUSTER}:888. Create your default admin user to log in. {IP_ADDRESS_OF_CLUSTER} is the IP address where you run the cluster; on localhost it is 127.0.0.1.

If the login doesn’t work, the services may still be starting and you have to wait a bit. To check the status, run docker service logs -f swarmpit_app.
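
You can also check the services from the command line:

docker service ls
docker service logs -f swarmpit_app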

Tip: If you want to delete everything and start over, run docker stack rm $(docker stack ls --format "{{.Name}}"). Careful! This will DELETE EVERYTHING.

Setup Secrets & Config files

Copy the secrets-example folder into secrets (e.g. by running cp -r secrets-example secrets). This creates example Docker secrets for the services. Docker Swarm takes Docker Secrets from this folder. You can see them in Swarmpit (http://{IP_ADDRESS_OF_CLUSTER}:888).

Copy the configs-example folder into configs. This creates example Docker configs for the services. Docker Swarm takes Docker Configs from this folder. You can see them in Swarmpit (http://{IP_ADDRESS_OF_CLUSTER}:888).
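
For example, both copy steps from a shell:

cp -r secrets-example secrets
cp -r configs-example configs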

Now all the services should be running, but they still don’t work fully, as you have to set up databases, run migrations, etc.

You can see the status of all services and where they are available (IP addresses) in Swarmpit, in the “Services” tab or in the service detail.

Setup relational databases

Use Adminer to log in to PostgreSQL (at http://{IP_ADDRESS_OF_CLUSTER}:8080). The credentials are set in the file database.yml by the constants POSTGRES_USER and POSTGRES_PASSWORD.

Adminer login screen

When you are logged in, create new databases for the dataplatform (recommended name: dataplatform) and for the permission proxy (recommended name: permission_proxy):

CREATE DATABASE dataplatform;
CREATE DATABASE permission_proxy;

Copy & paste the SQL script from init-database/init.sql into Adminer and change the default values (usernames, passwords, …). It creates users for the modules with the correct privileges. These credentials are configured in the file gateways.yml.

Database migrations

In Swarmpit (http://{IP_ADDRESS_OF_CLUSTER}:888) go to the detail of the service migration_schema-definitions-migration (http://{IP_ADDRESS_OF_CLUSTER}:888/services/migration_schema-definitions-migration) and click Redeploy service. This runs the database migrations, which prepare the whole PostgreSQL database structure for you.

Redeploy migrations
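
If you prefer the command line, forcing a service update should trigger the same redeploy (the service name is taken from Swarmpit; adjust it if it differs in your deployment):

docker service update --force migration_schema-definitions-migration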

The result can be seen in Adminer (http://{IP_ADDRESS_OF_CLUSTER}:8080).


Voilà! That’s it.

How to test it works

Prerequisites

  1. Running project
  2. Access to RabbitMQ Management
  3. Access to MongoDB
  4. curl or Postman for sending requests to the API

Data Processing (Integration Engine)

By default the cron-tasks project sends messages to refresh data for the City Districts dataset (every 5 minutes) and the Bicycle Parkings dataset (every hour). You can therefore check that the message queues were created in RabbitMQ and that the messages in the queues dataplatform.citydistricts.refreshDataInDB and dataplatform.bicycleparkings.refreshDataInDB are processed. If the message processing is OK, you can also check that the collections citydistricts and bicycleparkings in MongoDB are not empty.
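
A quick way to check from the command line (the container name filters and the database name are assumptions; adjust them to your deployment):

# container names and database name below are assumptions
docker exec $(docker ps -q -f name=rabbitmq) rabbitmqctl list_queues name messages
docker exec $(docker ps -q -f name=mongo) mongo dataplatform --quiet --eval "db.citydistricts.count()"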

Input (Input Gateway)

To check that the Input Gateway is running, you can send a health check request to the endpoint https://{IP_ADDRESS_OF_CLUSTER}/dev/input-gateway/health-check. To send the request you can use curl:

curl -X GET https://{IP_ADDRESS_OF_CLUSTER}/dev/input-gateway/health-check

To check that the Input Gateway works, you can use the General endpoint to process some test data. For example with curl:

curl -X POST https://{IP_ADDRESS_OF_CLUSTER}/dev/input-gateway/general/test -H 'Content-Type: application/json' -d '{ "message": "Hello World!" }'

Then you can check in RabbitMQ that the message in the queue dataplatform.general.import was processed and that the test data was saved in the MongoDB collection general_test.
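
For example, again assuming the MongoDB container name and database name (adjust as needed):

# container name and database name below are assumptions
docker exec $(docker ps -q -f name=mongo) mongo dataplatform --quiet --eval "db.general_test.findOne()"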

Output (Output Gateway)

To check that the Output Gateway is running, you can send a health check request to the endpoint https://{IP_ADDRESS_OF_CLUSTER}/dev/output-gateway/health-check. To send the request you can use curl:

curl -X GET https://{IP_ADDRESS_OF_CLUSTER}/dev/output-gateway/health-check

To check that the Output Gateway works, you can use e.g. the City Districts endpoint to get some data. You can use curl:

curl -X GET http://{IP_ADDRESS_OF_CLUSTER}/dev/output-gateway/citydistricts

Warning: the `/dev/` routes for the Input and Output Gateways provide direct access to the services’ APIs and should be used only for testing. In a production environment, access to the gateways should go through a security layer (e.g. the Permission Proxy) with explicitly granted access to individual endpoints.

Monitoring

TBD

How to run the project on bare metal

Prerequisites

  1. Virtual machine with SSH access and root privileges.

3rd party software

For some of the following software you have to create users with proper access rights and then set them up in each module. How to set up the credentials is described in each module.

Setup relational databases

Module installation

Installation and setup of each module is described in the repository of that module.