This chapter describes the functional requirements ('what can it actually do?') of the Golemio data platform with respect to the software tooling (the set of applications).
Functional requirements
Integration of data from external systems of cities and municipal companies
The system will process real-time data
The exposed push API will be able to process
JSON data
XML data
Plaintext data
Binary data
The data received on the input API will be validated
The input API will require authorization
At the level of an API key (in headers or in the body)
At the level of an IP whitelist
The individual types and levels of authorization will be configurable separately for every API endpoint (data from one source)
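As an illustration of per-endpoint authorization, the check below combines the API-key level and the IP-whitelist level; the configuration format, endpoint paths, keys and addresses are hypothetical, not part of the requirements:

```python
# Hypothetical per-endpoint authorization configuration: each endpoint may
# require an API key, an IP whitelist, both, or neither (None = check disabled).
ENDPOINT_AUTH = {
    "/input/parking": {"api_keys": {"secret-key-1"}, "ip_whitelist": None},
    "/input/lamps":   {"api_keys": None, "ip_whitelist": {"10.0.0.5"}},
}

def authorize(endpoint, api_key, client_ip):
    """Check an incoming request against the endpoint's auth configuration."""
    cfg = ENDPOINT_AUTH.get(endpoint)
    if cfg is None:
        return False  # unknown endpoint -> reject
    if cfg["api_keys"] is not None and api_key not in cfg["api_keys"]:
        return False  # API-key level check
    if cfg["ip_whitelist"] is not None and client_ip not in cfg["ip_whitelist"]:
        return False  # IP-whitelist level check
    return True
```

Because each endpoint carries its own configuration entry, the two authorization levels can be enabled or disabled independently per data source, as the requirement above demands.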
The system will be able to communicate in both directions with connected devices
Calling an external control API
Connecting to a WebSocket
The system will process static data and data from an API / DB
Pull API
JSON REST API
XML/SOAP API
A proprietary HTTP/S API
An external FTP storage
Downloading data at intervals ranging from 1 minute to 1 year
Adjustable by configuration (a CRON expression)
The received data will be validated and checked for correctness, with the option of configuring operational alerts in case of an error
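A minimal sketch of how such CRON-configured download intervals could be evaluated, assuming standard 5-field expressions; the matcher is deliberately simplified (it ignores the weekday field and ranges) and the per-source schedule is hypothetical:

```python
from datetime import datetime

def field_matches(field, value):
    """Match one CRON field: '*', '*/n', or a comma-separated list of numbers."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value in {int(part) for part in field.split(",")}

def cron_due(expr, dt):
    """Return True when a 5-field CRON expression fires at datetime dt.
    The weekday field is ignored in this simplified sketch."""
    minute, hour, day, month, _weekday = expr.split()
    return (field_matches(minute, dt.minute) and field_matches(hour, dt.hour)
            and field_matches(day, dt.day) and field_matches(month, dt.month))

# Hypothetical per-source configuration, from a 5-minute pull to a yearly one:
PULL_SCHEDULE = {"parking-feed": "*/5 * * * *", "yearly-report": "0 0 1 1 *"}
```

A scheduler would evaluate `cron_due` once per minute for each configured source and trigger the corresponding download when it returns true.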
The system will process manual inputs (map documentation, code lists) – by direct upload to the database
GeoJSON, JSON, CSV formats
The system will allow automatic upload to the OpenData catalogue
At daily or weekly frequency
Calculations will be run over the received data
Geoprocessing, delay calculations, predictions, data enrichment
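As one concrete example of the geoprocessing step, the standard haversine formula gives the great-circle distance between two WGS84 points, which also underpins the "sort by distance" queries of the output API described later in this chapter; the sample coordinates are arbitrary:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    R = 6371000  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * R * asin(sqrt(a))
```

Sorting a record set by `haversine_m` from a query point is one simple way to implement distance-ordered geolocation queries.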
Data storage
SQL and NoSQL databases will be used in the system with the option of direct access for data analysts
The system will store current data (current location, current occupancy)
The system will store historical data (occupancy history, location history, state history)
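The split between current and historical data can be sketched as an upsert plus an append; the in-memory dictionaries below stand in for the actual SQL/NoSQL stores, and the identifiers and field names are illustrative assumptions:

```python
# In-memory stand-ins for the current-state and history stores.
current_state = {}   # device_id -> latest record (current location/occupancy/state)
state_history = {}   # device_id -> list of all received records

def store(device_id, record):
    """Upsert the current record and append it to the device's history."""
    current_state[device_id] = record
    state_history.setdefault(device_id, []).append(record)
```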
Data analysis
The option of connecting tools for data analysis (Grafana, PowerBI, R studio, ArcGIS Desktop or other standard tools)
The system will store the received raw data for varying periods of time, according to the needs of each project
Web applications
A web application for internal dashboards, dispatching (operational monitoring) and the Data Platform admin panel, or the use of the Golemio website for public dashboards
The display of public dashboards
The display of internal dashboards
The option of managing connected devices via a dispatching application
Access via user credentials to the web application intended for the display of specific dashboards
Logging in and managing access to individual dashboards based on user roles
Dividing access to individual data on dashboards (e.g. on the lamp dashboard, some users see only the lamps with the parameter 'municipal district: Prague 7', while other users see all of them)
Access on the basis of the municipal district
Access on the basis of permitted IDs of individual records
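A sketch of how such attribute-level filtering could work, assuming a user object that optionally carries a list of allowed municipal districts and/or allowed record IDs; both field names are hypothetical:

```python
def visible_records(records, user):
    """Return only the records a user may see on a dashboard.
    A missing/None permission field means 'no restriction' for that level."""
    allowed = []
    for rec in records:
        if user.get("districts") is not None and rec["district"] not in user["districts"]:
            continue  # restricted by municipal district
        if user.get("record_ids") is not None and rec["id"] not in user["record_ids"]:
            continue  # restricted by permitted record IDs
        allowed.append(rec)
    return allowed
```

Applying the filter server-side, before data reaches the dashboard, keeps restricted records out of the browser entirely.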
Data refresh (the shortest supported refresh interval)
Public dashboards: from 30 minutes
Internal dashboards: from 1 minute
Dispatching panel: from 10 seconds
A web application serving as the admin panel for managing users and rights
Management of users, user groups and API accesses (more can be found in the Open API chapter)
A web application for managing API keys for the public
The option of registering
The option of generating one's own API key
The option of deleting an API key
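The key lifecycle can be sketched with Python's `secrets` module; the storage dictionary and the default rate-limit value are assumptions, not prescribed by the requirements:

```python
import secrets

DEFAULT_RATE_LIMIT = 100  # requests per minute (assumed default value)
api_keys = {}             # key -> {"email": ..., "rate_limit": ...}

def generate_key(email):
    """Create a new cryptographically random API key with the default rate limit."""
    key = secrets.token_urlsafe(32)
    api_keys[key] = {"email": email, "rate_limit": DEFAULT_RATE_LIMIT}
    return key

def delete_key(key):
    """Remove an API key, e.g. at the user's request; returns True if it existed."""
    return api_keys.pop(key, None) is not None
```

In line with the email-verification requirement in the Open API section, a real implementation would mark the key inactive until the verification link is clicked.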
The application will be localized in Czech and English
Open API
The implementation of an output gateway making Data Platform data accessible to third parties (mobile developers, external systems)
Uniform REST API
An API based on relational (SQL) and document (NoSQL) data
Documentation of the output API, including the description of endpoints, the description of HTTP methods and parameters, the structure of return data and sample data
The output API will support pagination and limiting the number of returned records
The output API will support geolocation queries and sorting of results by distance from a point
The output API will support filtering of results by municipal district or by other specified filters
The output API will provide historical data with From-To (date and time) parameters for filtering results
The output API will use caching (not every query will result in a database query)
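A minimal sketch combining offset/limit pagination with a TTL cache, so that repeated identical queries are served without hitting the database; the 60-second TTL and the list standing in for a database table are assumptions:

```python
import time

CACHE_TTL = 60.0  # seconds a cached page stays valid (assumed value)
_cache = {}       # (offset, limit) -> (timestamp, result)

def fetch_page(all_rows, offset=0, limit=100):
    """Return one page of results, serving repeated queries from the cache."""
    key = (offset, limit)
    entry = _cache.get(key)
    if entry is not None and time.monotonic() - entry[0] < CACHE_TTL:
        return entry[1]                     # served from cache, no DB hit
    page = all_rows[offset:offset + limit]  # would be a DB query in practice
    _cache[key] = (time.monotonic(), page)
    return page
```

In a real API the cache key would also include the filter parameters (district, From-To range, geolocation), so that each distinct query is cached separately.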
Setting up request limits, roles and access permissions, logging access
An admin panel with an overview of accesses/utilizations
Logging of all accesses and generation of statistics
Blocking of previously granted access
It will be possible to set a general rate limit per user
Setting access permissions will be possible on the basis of a data set (endpoint) and on the basis of attributes in data
Adjustable accesses
Access to individual API endpoints
Access based on the municipal district
Access based on permitted IDs of individual records
Access with a maximum query limit (query size)
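The per-user rate limit could be enforced with a token bucket, a common technique in API gateways; the bucket parameters below are illustrative, not values prescribed by the requirements:

```python
import time

class TokenBucket:
    """One bucket per user: requests consume tokens, tokens refill over time."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec      # refill rate in tokens per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        """Consume one token if available; refill according to elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Each user's bucket parameters would come from the admin panel, which makes the rate limit adjustable per user as required above.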
Automatic generation of API keys
Users will be able to generate their own key, which will have a default rate-limit value
Email verification – a verification email will be sent to activate the key
Alerting and monitoring
Data check
A minimum amount of incoming data must arrive within a given time frame; otherwise, a warning will be sent
Periodic calling of external sources and schema validation
Verification of values against set rules
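The checks above can be sketched as a single function run per monitoring window; the minimum-count threshold and the allowed value range are placeholder assumptions:

```python
def check_window(message_count, values, min_count=1, value_range=(0, 100)):
    """Return a list of warnings for one monitoring window: too little
    incoming data, or values violating the configured rules."""
    warnings = []
    if message_count < min_count:
        warnings.append("too little incoming data in window")
    lo, hi = value_range
    for v in values:
        if not (lo <= v <= hi):
            warnings.append(f"value {v} outside allowed range {value_range}")
    return warnings
```

Any non-empty result would be forwarded to the alerting channel described in the monitoring requirements below.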
Monitoring
All the services will be monitored via heartbeats
The volume of received data will be logged and monitored (data pushed to the input API and data pulled from external APIs)
A regular check of data sources and reporting will take place
Logs will be collected and alerting to any serious errors in the applications will be set up
The above-mentioned monitoring requirements will have a GUI
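Heartbeat monitoring can be sketched as services periodically reporting a timestamp and a checker flagging those that have gone silent; the 30-second timeout is an assumed value:

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds without a beat before alerting (assumed)
last_beat = {}            # service name -> monotonic timestamp of last beat

def beat(service):
    """Called by each monitored service on its regular heartbeat."""
    last_beat[service] = time.monotonic()

def silent_services(now=None):
    """Return the services that have missed their heartbeat window."""
    now = time.monotonic() if now is None else now
    return [s for s, t in last_beat.items() if now - t > HEARTBEAT_TIMEOUT]
```

A periodic job would call `silent_services` and raise an alert for each returned name, feeding the monitoring GUI mentioned above.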
Non-functional requirements
Dispatching and dashboards will be available as a web application with access management
The admin panel will be available as a web application for access management
The individual modules of solutions will be horizontally scalable – a layer for data integration, an input interface layer, a database layer, connecting to sensors
The modules will allow increasing performance and the number of processed requests/storage capacity through horizontal scaling, without changes to the solution's source code
Individual modules/layers will be individually replaceable – data integration, output interface, database layer, connecting to sensors (input interface)
The solution will use the queue system for securing persistence and synchronization of received data/messages
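The decoupling a queue provides can be illustrated in-process with Python's `queue` module; a production deployment would use a persistent message broker (the requirements do not name one), but the producer/consumer shape is the same:

```python
import queue
import threading

messages = queue.Queue()  # stand-in for a persistent message broker
processed = []

def consumer():
    """Drain the queue; in the platform this would be a data-processing module."""
    while True:
        msg = messages.get()
        if msg is None:        # sentinel value -> shut down
            break
        processed.append(msg)  # the actual processing step would go here
        messages.task_done()

worker = threading.Thread(target=consumer)
worker.start()
for i in range(3):
    messages.put({"seq": i})   # producer side: the input API handlers
messages.put(None)
worker.join()
```

Because producers only enqueue, a slow or temporarily failed consumer does not lose messages: they wait in the queue, which is the persistence and synchronization role the requirement describes.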
The whole solution will be deployable on VMware virtualization infrastructure
Individual modules will use the technology of Linux containers (Docker) for deployment
The source code of the individual parts of the solution will be kept in a git repository; it will not contain sensitive data and will be ready for publication as open source
Source code will be covered by unit tests, which will run automatically before the deployment of a new version
It will be possible to run the output-interface module and the integration interface with the database layer fully independently of the other modules
It will be possible to monitor the whole solution with standard monitoring tools
The assumed data volume is 1 TB per 3 months; the solution will be dimensioned for at least this volume
The system will be dimensioned for an input of 200 data messages per second
The solution will be robust against the outage of any one module (layer) of the application for a limited time
The whole solution will be deployed in a high-availability regime and will be resistant to the outage of any of the virtual machines
The system will perform automatic data backups from the data storage and from other modules
The availability of services will be:
Output API: 99.5 %
Database layer: 99.5 %
Dispatching: 99.5 %
Monitoring: 99.5 %
Alerting: 99.5 %
The solution will also include a proposal for disaster recovery and for handling critical scenarios
The solution will include a proposal for automatic backups and for data export to support data migration