Integration of a dataset

These instructions describe the process of integrating a new data set: from analysis, through implementing the reception of PUSH data or the periodic download of PULL data, transformation, and storage in the database, to publishing the data through an output API.

1. Analysis of a data set

  • How the data is acquired (PULL or PUSH)
  • The type and volume of the data
  • The storage format and target database (MongoDB, PostgreSQL)

2. Creating schemas for a data set (Schema Definitions)
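
A schema definition for the new data set might look like the sketch below. The record fields, the `NewDataset` name, and the hand-rolled check are all illustrative assumptions; the real project presumably validates against a full schema library.

```typescript
// Hypothetical record shape for a "NewDataset" data set.
// Field names and types are illustrative, not taken from the source.
interface NewDatasetRecord {
    id: string;
    measured_at: string; // ISO 8601 timestamp
    value: number;
}

// A minimal JSON-Schema-style definition that could back both input
// validation in the Input Gateway and the database model.
const newDatasetSchema = {
    type: "object",
    required: ["id", "measured_at", "value"],
    properties: {
        id: { type: "string" },
        measured_at: { type: "string" },
        value: { type: "number" },
    },
} as const;

// Tiny validity check: verifies only that the required keys are present
// (a real validator would also check the property types).
function hasRequiredFields(record: unknown): record is NewDatasetRecord {
    if (typeof record !== "object" || record === null) return false;
    return newDatasetSchema.required.every((key) => key in record);
}

console.log(hasRequiredFields({ id: "1", measured_at: "2024-01-01T00:00:00Z", value: 42 })); // true
```

Keeping one schema object shared between validation and the database model avoids the two drifting apart.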


3. Input Gateway

  • git repo:
  • This step is needed only if the source actively sends the data (PUSH)
  • Creating an endpoint for receiving the data
  • Validating the incoming data
  • Sending the data to the queue
  • Documentation (Apiary)
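
The gateway steps above (receive, validate, enqueue) can be sketched framework-agnostically as follows. The queue is an in-memory stand-in for the real message broker, and the validation is deliberately minimal; the queue name and handler are assumptions.

```typescript
// In-memory stand-in for the message broker the Input Gateway sends to.
type QueueMessage = { queue: string; payload: unknown };
const queue: QueueMessage[] = [];

function sendToQueue(name: string, payload: unknown): void {
    queue.push({ queue: name, payload });
}

// Core of a hypothetical PUSH endpoint handler: validate the body,
// enqueue it, and return an HTTP-style status code
// (204 = accepted, 422 = invalid input).
function handlePushRequest(body: unknown): number {
    const valid =
        typeof body === "object" &&
        body !== null &&
        "id" in body; // minimal check; a real gateway validates the full schema
    if (!valid) return 422;
    sendToQueue("new-dataset-input", body);
    return 204;
}

console.log(handlePushRequest({ id: "42" })); // 204
console.log(handlePushRequest("not an object")); // 422
```

In the real gateway this handler would sit behind an HTTP route; rejecting invalid payloads before enqueueing keeps bad data out of the Integration Engine entirely.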


4. Integration Engine

  • git repo:
  • Creating the data transformation, e.g.: modules/NewDataset/NewDatasetTransformation.ts
  • Creating a worker, e.g.: modules/NewDataset/NewDatasetWorker.ts
  • Adding a record to queueDefinitions.ts
  • Defining a data source in the worker, only if the data needs to be actively downloaded (PULL)
  • Defining a model in the worker
  • Implementing the methods that process messages from the queues, i.e. the whole processing logic
  • Writing tests
  • Documentation (docs/)
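
The transformation/worker pair could be shaped roughly as below, loosely following the module layout named above. The base-class-free structure, field names, and `processData` method are assumptions standing in for the project's actual classes; the `saved` array mocks the database model.

```typescript
// Hypothetical source and target record shapes.
interface SourceRecord { ID: string; VAL: string }
interface TransformedRecord { id: string; value: number }

// modules/NewDataset/NewDatasetTransformation.ts (sketch):
// maps raw source records onto the target model.
class NewDatasetTransformation {
    public transform(data: SourceRecord[]): TransformedRecord[] {
        return data.map((r) => ({ id: r.ID, value: Number(r.VAL) }));
    }
}

// modules/NewDataset/NewDatasetWorker.ts (sketch):
// consumes a queue message, transforms it, and saves the result.
class NewDatasetWorker {
    private transformation = new NewDatasetTransformation();
    public saved: TransformedRecord[] = []; // stand-in for the DB model

    // Message-processing method of the kind queueDefinitions.ts would
    // route queue messages to.
    public processData(msg: { content: SourceRecord[] }): void {
        const transformed = this.transformation.transform(msg.content);
        this.saved.push(...transformed); // model.save(...) in a real worker
    }
}

const worker = new NewDatasetWorker();
worker.processData({ content: [{ ID: "a", VAL: "3" }] });
console.log(JSON.stringify(worker.saved)); // [{"id":"a","value":3}]
```

Keeping the transformation in its own class makes it unit-testable without a queue or a database, which is where most of the test writing from the checklist would land.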


5. Definition of a cron task
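
For PULL data sets, this step presumably registers a schedule that periodically triggers the download worker. The entry below is a hypothetical shape; the task name, queue name, and registration mechanism are assumptions to be adapted to the project's scheduler.

```typescript
// Hypothetical cron task definition: trigger the NewDataset refresh
// every 10 minutes by sending a message to its input queue.
const cronTasks = [
    {
        name: "NewDatasetRefresh",
        schedule: "*/10 * * * *", // standard cron expression: every 10 minutes
        queue: "new-dataset-input",
    },
];
```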

6. Output Gateway
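
The Output Gateway exposes the stored data through a read API. A minimal sketch of such a read endpoint is below; the route shape, `limit` parameter, and in-memory `db` array are assumptions standing in for the real data access layer.

```typescript
// Stand-in for records already stored by the Integration Engine.
interface StoredRecord { id: string; value: number }
const db: StoredRecord[] = [
    { id: "a", value: 1 },
    { id: "b", value: 2 },
];

// Hypothetical handler for GET /new-dataset?limit=N:
// query the stored records and return them as a JSON string.
function getNewDataset(limit?: number): string {
    const rows = typeof limit === "number" ? db.slice(0, limit) : db;
    return JSON.stringify(rows);
}

console.log(getNewDataset(1)); // [{"id":"a","value":1}]
```

Like the Input Gateway, this layer should stay thin: querying and serializing, with all transformation logic already done in the Integration Engine.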