The Monitoring Subsystem gathers and processes runtime statistics and metrics from the entire AI-SPRINT system (including running AI-SPRINT-based applications) and sends alerts when a quality metric exceeds a specified threshold.
All components are open source (Apache 2.0, MIT, Elastic License 2.0, GNU AGPL).
A typical deployment of the Monitoring Subsystem comprises:
- A time series database (InfluxDB) stores metrics and executes data queries. It also provides a UI for administrators and a REST API to simplify administrative tasks.
- An indexing engine (Elasticsearch) stores application text logs and provides an efficient way to browse the gathered data.
- A data visualisation system (Grafana) provides a UI for system users and administrators, with dashboards showing various system statistics. Out of the box, the Grafana instance deployed in the Monitoring Subsystem contains a predefined dashboard displaying a basic set of collected statistics. Users can modify this dashboard or create new ones to present the specific data they need. Grafana also has built-in support for various data sources, including InfluxDB, so users can extend their dashboards as their needs evolve.
- A metric cache (Telegraf) gathers data and pushes it into InfluxDB.
- The Monitoring Subsystem Library is an additional Python library that allows users to send custom metrics from their applications to the database.
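To give a feel for what a custom metric looks like by the time it reaches InfluxDB, the sketch below formats a single data point in InfluxDB line protocol, the text format that InfluxDB and Telegraf accept for writes. It is not the Monitoring Subsystem Library's API, which is not described in this section; the function name `to_line_protocol`, the metric name and the tag values are hypothetical and chosen only for illustration.

```python
import time
from typing import Mapping, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, special: str) -> str:
    """Backslash-escape every character of `special` that occurs in `text`."""
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    """Render one field value using line protocol type conventions."""
    if isinstance(value, bool):  # bool must be checked before int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line_protocol(
    measurement: str,
    fields: Mapping[str, FieldValue],
    tags: Optional[Mapping[str, str]] = None,
    timestamp_ns: Optional[int] = None,
) -> str:
    """Hypothetical helper: build one line protocol record for a custom metric."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    # Tags are emitted sorted by key, which InfluxDB recommends.
    for key in sorted(tags or {}):
        head += f",{_escape(key, ',= ')}={_escape(tags[key], ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}"
        for key, value in fields.items()
    )
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()  # nanosecond precision
    return f"{head} {body} {timestamp_ns}"


# Example with a made-up metric from an application component.
line = to_line_protocol(
    "inference_time",
    {"seconds": 0.25, "batch": 8},
    tags={"component": "blur faces"},
    timestamp_ns=1700000000000000000,
)
print(line)
```

How such a record is delivered (written directly to InfluxDB or routed through Telegraf) depends on the deployment and is left out of the sketch.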
Watch the demo video