Technical specification

This chapter describes the functional requirements (‘what can it actually do?’) of the Golemio data platform in relation to the software tooling (the application set).

Functional requirements

Integration of data from external systems of cities and municipal companies

  1. The system will process real-time data
    1. The exposed push API will be able to process
      1. JSON data
      2. XML data
      3. Plaintext data
      4. Binary data
    2. Data received on the input API will be validated
    3. Access to the input API will be authorized
      1. On the level of an API key (in headers or in the body)
      2. On the level of an IP whitelist
      3. The individual types and levels of authorization will be configurable separately for each API endpoint (data from one source)
  2. The system will be able to communicate in both directions with connected devices
    1. Calling an external control API
    2. Connecting to a WebSocket
  3. The system will process static data and data from an API / DB
    1. Pull API
      1. JSON REST API
      2. XML/SOAP API
      3. A proprietary HTTP/S API
    2. An external FTP storage
      1. Downloading data at intervals ranging from 1 minute to 1 year
    3. Adjustable by configuration (a CRON definition)
  4. Received data will be validated and checked for correctness, with the option of setting up operational alerts in case of an error
  5. The system will process manual inputs (map documentation, code lists) – by direct upload to the database
    1. GeoJSON, JSON, CSV formats
  6. The system will allow automatic upload to the OpenData catalogue
    1. At daily or weekly frequency
  7. Calculations will be run over received data
    1. Geoprocessing, delay calculations, predictions, data enrichment
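The push-API validation and authorization requirements above can be illustrated with a minimal sketch. This is not the platform's actual implementation; the endpoint name, keys and IP addresses are hypothetical, and a real deployment would load them from configuration:

```python
import hmac
import json
import xml.etree.ElementTree as ET

# Hypothetical per-endpoint credentials; in the real platform these would
# come from configuration, not from source code.
API_KEYS = {"/input/parking": {"key-abc"}}
IP_WHITELIST = {"/input/parking": {"10.0.0.5"}}

def authorize(endpoint, api_key, client_ip):
    """Check both the API key and the IP whitelist for a single endpoint."""
    keys = API_KEYS.get(endpoint, set())
    key_ok = any(hmac.compare_digest(api_key, k) for k in keys)
    return key_ok and client_ip in IP_WHITELIST.get(endpoint, set())

def validate_payload(raw, content_type):
    """Parse an incoming payload by content type: JSON and XML are parsed
    and validated, while plaintext and binary pass through unchanged."""
    try:
        if content_type == "application/json":
            return json.loads(raw), None
        if content_type == "application/xml":
            return ET.fromstring(raw), None
    except (ValueError, ET.ParseError) as exc:
        return None, f"malformed {content_type}: {exc}"
    return raw, None
```

Keeping both authorization mechanisms per endpoint, as the specification requires, means each data source can combine an API key, an IP whitelist, or both.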

Data storage

  1. SQL and NoSQL databases will be used in the system with the option of direct access for data analysts
  2. The system will store current data (current location, current occupancy)
  3. The system will store historical data (occupancy history, location history, state history)
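The split between current and historical data above can be sketched as a store that keeps the latest record per device alongside an append-only history. This is an illustration of the requirement, not the platform's actual storage layer:

```python
import datetime

class DeviceStore:
    """Keeps the current state per device plus an append-only history,
    mirroring the requirement to store both current and historical data."""

    def __init__(self):
        self.current = {}   # device_id -> latest record
        self.history = []   # append-only list of (timestamp, device_id, record)

    def upsert(self, device_id, record, ts=None):
        """Overwrite the current state and append to the history."""
        ts = ts or datetime.datetime.now(datetime.timezone.utc)
        self.current[device_id] = record
        self.history.append((ts, device_id, record))

    def history_for(self, device_id):
        """Return the full (timestamp, record) history of one device."""
        return [(ts, rec) for ts, dev, rec in self.history if dev == device_id]
```

In production these two views would typically live in separate tables or collections, so that queries for current state stay fast as the history grows.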

Data analysis

  1. The option of connecting tools for data analysis (Grafana, Power BI, RStudio, ArcGIS Desktop or other standard tools)
  2. The system will store received raw data for varying retention periods, according to the needs of each project

Web applications

  1. A web application for internal dashboards, dispatching (operational monitoring) and the data platform (DP) admin panel, or use of the Golemio website for public dashboards
    1. The display of public dashboards
    2. The display of internal dashboards
    3. The option of managing connected devices via a dispatching application
  2. Access via user credentials to the web application intended for the display of specific dashboards
    1. Logging in and managing access to individual dashboards based on user roles
    2. Dividing access to individual data on dashboards (e.g. the lamp dashboard, some users only see the lamps with the parameter ‘municipal district: Prague 7’, other users see them all)
      1. Access on the basis of the municipal district
      2. Access on the basis of permitted IDs of individual records
  3. Data updates (the shortest possible update interval)
    1. Public dashboards: from 30 minutes
    2. Internal dashboards: from 1 minute
    3. Dispatching panel: from 10 seconds
  4. A web application for the admin panel for managing users and rights
    1. Management of users, user groups and API access (more can be found in the Open API chapter)
  5. A web application for the public to manage API keys
    1. The option of registering
    2. The option of generating one's own API key
    3. The option of deleting an API key
    4. The application will be available in Czech and English localizations
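The district- and record-ID-based dashboard access rules above can be sketched as a simple filter. The record and user shapes here are hypothetical, not the platform's actual data model:

```python
def visible_records(records, user):
    """Filter dashboard records by the user's municipal district and/or an
    explicit set of permitted record IDs (both rules are optional)."""
    district = user.get("district")
    allowed_ids = user.get("allowed_ids")
    visible = []
    for rec in records:
        if allowed_ids is not None and rec["id"] not in allowed_ids:
            continue
        if district is not None and rec["district"] != district:
            continue
        visible.append(rec)
    return visible
```

With this shape, a user restricted to ‘municipal district: Prague 7’ sees only matching lamps, while a user with no restrictions sees everything, as the example in the requirement describes.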

Open API

  1. The implementation of an output gateway making data from the Data Platform accessible to third parties (mobile developers, external systems)
    1. Uniform REST API
    2. API based on relational (SQL) and document data (NoSQL)
    3. Documentation of the output API, including the description of endpoints, the description of HTTP methods and parameters, the structure of return data and sample data
    4. The output API will support pagination and limiting the number of returned records
    5. The output API will support geolocation queries and sorting of results by distance from a given point
    6. The output API will support filtering of results by municipal district or by other specified filters
    7. The output API will provide historical data with From–To (date and time) parameters for filtering results
    8. The output API will use caching (not every request will result in a database query)
  2. Setting up request limits, roles and access permissions, and logging of access
    1. An admin panel with an overview of accesses/utilizations
    2. Logging of all accesses and generation of statistics
    3. Blocking of a previously provided access
    4. It will be possible to set a general rate limit per user
    5. Setting access permissions will be possible on the basis of a data set (endpoint) and on the basis of attributes in data
    6. Configurable access levels
    7. Access to individual API endpoints
    8. Access based on the municipal district
    9. Access based on permitted IDs of individual records
    10. Access with a maximum query size limit
  3. Automatic generating of API keys
    1. Users will have the opportunity to generate their own key, which will have a default rate limit
    2. Email verification – a verification email will be sent to activate the key
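The geolocation sorting and pagination requirements of the output API can be illustrated with a minimal sketch. The `lat`/`lon` field names and the `query` signature are assumptions for illustration, not the platform's actual API:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def query(records, lat=None, lon=None, limit=10, offset=0):
    """Sort records by distance from a point (when given), then paginate
    with limit/offset, as the output API requirements describe."""
    if lat is not None and lon is not None:
        records = sorted(
            records, key=lambda rec: haversine_km(lat, lon, rec["lat"], rec["lon"])
        )
    return records[offset:offset + limit]
```

A production API would push both the geo-sort and the pagination down into the database (e.g. a geospatial index) rather than sorting in application code, but the observable behaviour is the same.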

Alerting and monitoring

  1. Data check
    1. A minimum amount of data must arrive within a given time frame; otherwise a warning will be sent
    2. Periodic calling of external sources and schema validation
    3. Verification of values against set rules
  2. Monitoring
    1. All the services will be monitored via heartbeats
    2. The volume of received data will be logged and monitored (push data received on the input API and data pulled from external APIs)
    3. A regular check of data sources and reporting will take place
    4. Logs will be collected and alerting to any serious errors in the applications will be set up
    5. The above-mentioned monitoring requirements will have a GUI
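The heartbeat and data-volume checks above can be sketched as two small predicates. The threshold values and service names are hypothetical:

```python
def check_heartbeats(last_seen, now, max_silence_s=60):
    """Return the services whose last heartbeat is older than the threshold,
    i.e. the candidates for an alert."""
    return [svc for svc, ts in last_seen.items() if now - ts > max_silence_s]

def check_data_volume(count, minimum):
    """Return an alert message when fewer than `minimum` messages arrived
    in the monitoring window, or None when the volume is acceptable."""
    if count >= minimum:
        return None
    return f"only {count} messages received (minimum {minimum})"
```

In practice a scheduler would run these checks periodically and route any non-empty result to the alerting channel.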

Non-functional requirements

  1. Dispatching and dashboards will be available as a web application with access management
  2. The admin panel will be available as a web application with access management
  3. The individual modules of the solution will be horizontally scalable – the data integration layer, the input interface layer, the database layer, and sensor connections
  4. The modules will allow increases in performance, in the number of processed requests and in storage capacity via horizontal scaling, without modifying the solution's source code
  5. Individual modules/layers will be individually replaceable – data integration, output interface, database layer, sensor connections (input interface)
  6. The solution will use a message queue system to ensure persistence and synchronization of received data/messages
  7. The whole solution will be deployable on VMware virtual infrastructure
  8. Individual modules will use the technology of Linux containers (Docker) for deployment
  9. The source code of the individual parts of the solution will be kept in a git repository; it will not contain sensitive data and will be ready for publication as open source
  10. Source code will be covered by unit tests, which will run automatically before each deployment of a new version
  11. It will be possible to run the output interface module and the integration interface, together with the database layer, fully independently of the other modules
  12. It will be possible to monitor the whole solution with standard monitoring tools
  13. The assumed data volume is 1 TB per 3 months; the solution will be dimensioned for at least this volume
  14. The system will be dimensioned for an input of 200 data messages per second
  15. The solution will be robust to the outage of any single module (layer) of the application for a limited time
  16. The whole solution will be deployed in a high-availability regime and will be resilient to the outage of any of the virtual machines
  17. The system will perform automatic data backups from the data storage and from other modules
  18. The availability of services will be:
    • Output API: 99.5 %
    • Database layer: 99.5 %
    • Dispatching: 99.5 %
    • Monitoring: 99.5 %
    • Alerting: 99.5 %
  19. The solution will also include a proposal for Disaster Recovery and for handling critical scenarios
  20. The solution will also include a proposal for automatic backups and for data export to support data migration
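As a rough sanity check of the capacity and availability figures above (a back-of-the-envelope sketch, not part of the specification itself):

```python
def allowed_downtime_hours(availability_pct, period_hours):
    """Downtime budget implied by an availability target over a period."""
    return period_hours * (1 - availability_pct / 100)

# A 99.5 % target over a 30-day month (720 h) allows roughly 3.6 h of downtime.
monthly_budget_h = allowed_downtime_hours(99.5, 720)

# The stated input rate of 200 messages per second amounts to
# 200 * 86 400 = 17 280 000 messages per day.
messages_per_day = 200 * 86_400
```

These figures show why the high-availability and horizontal-scaling requirements matter: a 99.5 % target tolerates only a few hours of outage per month, while the message rate implies tens of millions of records per day against the 1 TB / 3 months storage budget.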