System architecture

The data platform is built as a modular system of independently functional, independently deployable and replaceable modules. Each module and layer is designed so that it can be swapped for a different solution or for an existing service (e.g. one provided as SaaS), with maximum flexibility and scalability in mind.

Scalability and modularity

Because future developments (in data volume, number of data suppliers, and individual use cases) are hard to anticipate, the design puts strong emphasis on scalability. For example, the layer that receives data is a minimal (lightweight) stateless application. Combined with a load balancer, it can be deployed in any number of instances so that data is accepted quickly and requests are placed in the queue for gradual processing. The queue is an integral part of the system: it handles the reception and persistence of data as well as distribution and synchronisation between the individual computational nodes. The system can therefore react to changes in the volume of the data flow.
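A minimal sketch of such a stateless receiving step, under stated assumptions: the function name `receive`, the status codes, and the in-memory `queue.Queue` standing in for the message broker are illustrative and not taken from the platform itself.

```python
import json
import queue

# Stand-in for the message broker; a real deployment would publish to an
# external queue service instead of an in-process queue.
ingest_queue = queue.Queue()


def receive(raw_body, broker):
    """Hypothetical stateless handler: check the payload, enqueue it,
    and return an HTTP-like status code."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400  # malformed input is rejected immediately
    if not isinstance(payload, dict):
        return 400  # this sketch only accepts JSON objects
    # The handler keeps no state of its own; the queue holds the request
    # until a worker picks it up.
    broker.put(json.dumps(payload))
    return 202  # accepted for later processing


status = receive('{"sensor": "a1", "value": 3}', ingest_queue)
```

Because the handler holds nothing between calls, any number of identical instances could sit behind a load balancer and write to the same queue.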

Stateless applications

As many modules as possible are implemented as stateless applications, with state kept in a shared database cluster. Computational nodes (individual instances that process, transform, and calculate data) can also be scaled arbitrarily; increasing the number of instances speeds up both processing and the pickup of data from the queue.
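The scaling idea can be illustrated with a small sketch in which identical workers drain one shared queue and write to one shared store. Everything here is an assumption made for illustration: threads stand in for separately deployed instances, a dictionary stands in for the database cluster, and squaring a number stands in for the real processing.

```python
import queue
import threading


def run_workers(tasks, worker_count):
    """Drain `tasks` with `worker_count` identical workers and return the
    results keyed by task."""
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = {}  # stand-in for the shared database cluster
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # nothing left to pick up
            value = item * item  # placeholder for the real computation
            with lock:
                results[item] = value

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `run_workers(range(5), 1)` and `run_workers(range(5), 4)` returns the same mapping; only the number of workers sharing the load changes, which is what makes adding instances safe.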

Microservice-oriented

The system architecture combines the advantages of a microservice-oriented architecture (flexibility, individual scaling, substitutability, fast rollout of updates) with a traditional design. This brings disadvantages as well as advantages, because some modules are not fully independent microservices. These had to be taken into account during design and implementation, mainly for operations and for promoting versions that contain backward-incompatible changes.

Orchestration

The whole system runs on virtual infrastructure, and the individual applications run in a containerized environment (Docker). Above the containers sits an orchestration layer, which is responsible for load balancing, clustering, deployment of new versions, and so on. New versions are deployed as a rolling update of the individual services: a second instance is first launched in the new version, traffic is redirected to it, and then the old version is switched off. This ordering is what makes zero-downtime deployment possible.
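The ordering of a rolling update can be shown with a toy simulation. It is only a sketch of the sequence described above, not of how any particular orchestrator implements it; the function name `rolling_update` and the version labels are hypothetical.

```python
def rolling_update(instances, new_version):
    """Replace the instances one at a time. Returns the final list of
    serving instances and a snapshot of the list after every step."""
    serving = list(instances)
    history = []
    for old in instances:
        # Step 1: launch an instance of the new version next to the old one.
        serving.append(new_version)
        history.append(list(serving))
        # Step 2: traffic now reaches the new instance, so one old instance
        # can be switched off.
        serving.remove(old)
        history.append(list(serving))
    return serving, history


final, steps = rolling_update(["v1", "v1"], "v2")
```

In this sketch the number of serving instances never drops below the starting count, because every old instance is removed only after its replacement has been added.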

CI/CD

The architecture is designed for continuous integration and continuous deployment (CI/CD), so new versions of individual services can be deployed automatically, gradually, and in short iterations. The platform is built on a cloud environment and relies on microservices, containerization, environment scaling, and services consumed ‘as a service’.

The application is divided into individual and independent modules:

  • Input interface (Input Gateway),
  • Access layer (ACL, Access proxy),
  • Queue (Message Broker, Queue),
  • Integration layer (Integration Engine),
  • Database layer,
  • Output interface (Output Gateway),
  • Admin panel (Admin Panel),
  • Dispatching and data analysis (Client Panel),
  • Management of time-controlled tasks (CRON).

Technological scheme

Figure: technological scheme of the architecture

Basic characteristics of system architecture:

  • Individual components have clearly defined interfaces and can be scaled or replaced with a different solution in the future
  • All incoming requests must pass through the Access proxy, the layer responsible for authentication and authorization
  • The Input gateway defines the incoming endpoints and validates the structure of incoming data
  • Data is processed through the RabbitMQ message queue by workers in the integration engine
  • Worker flow: picks up a message -> performs the task -> saves the result by sending a message to another queue, or simply exits
  • Data can be downloaded from external sources on a schedule (cron) or in response to another event that places a message in the queue
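The worker flow and the scheduled download from the list above can be sketched as follows. This is a simplified illustration: the standard-library `queue.Queue` stands in for RabbitMQ, and the names `worker_step`, `cron_tick`, and the message fields are hypothetical, not part of the platform.

```python
import queue


def worker_step(inbox, outbox, task):
    """One pass of the worker flow: pick up a message, perform the task,
    and forward the result. Returns False when there was nothing to do."""
    try:
        message = inbox.get_nowait()
    except queue.Empty:
        return False  # no message waiting, the worker simply finishes
    result = task(message)
    if result is not None:
        outbox.put(result)  # the result travels on as a new message
    return True


def cron_tick(inbox, source_name):
    """Hypothetical scheduled trigger: enqueue a download request."""
    inbox.put({"action": "download", "source": source_name})


inbox, outbox = queue.Queue(), queue.Queue()
cron_tick(inbox, "example-source")
worker_step(inbox, outbox, lambda msg: {"downloaded": msg["source"]})
```

Keeping the trigger and the worker connected only through messages means a schedule, an incoming request, or any other event can start the same processing path.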