Maintenance & Support

Frends services are continuously monitored and maintained.

Frends Cloud services are monitored and managed by the Platform Operations team. This page outlines what is done and how often.

Maintenance

Cloud-hosted Frends resources are updated in line with our release cycle.

For customers with hybrid configurations (Cloud and On-Prem Agents), we arrange a suitable date and time to update the Frends Core service and Agents so that On-Prem resources are updated in sync.

OS and security updates for Cloud Agents are kept in line with Microsoft’s monthly patching cycle.

To avoid Agent downtime during patching, we recommend using load-balanced Agents in a high-availability configuration for production environments.

Database Maintenance

Log and configuration databases for Frends Tenants are periodically maintained by purging old Process Instance logs, including Instance header data and promoted values, and by reorganizing the indexes. This is automated for both cloud and self-hosted instances of Frends. Note that Process Instance step data is uploaded to Azure Blob Storage, which reduces the amount of data stored in each Agent's database.

By default, the Process Instance purge and index reorganization run when the Log Service starts and are rescheduled to run every 24 hours after finishing successfully. The purge deletes Process Instances older than the retention period configured in the log settings of each Agent Group, which can be up to 60 days.
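The retention rule above can be illustrated with a small date calculation. This is only a sketch of the idea, not the logic of the actual purge procedure: the function names are hypothetical, and which timestamp of a Process Instance the purge compares against the cutoff is an assumption.

```python
from datetime import datetime, timedelta, timezone


def purge_cutoff(now: datetime, retention_days: int) -> datetime:
    """Return the point in time before which instances would be purged.

    Hypothetical helper: retention_days stands for the retention period
    configured in the log settings of an Agent Group.
    """
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return now - timedelta(days=retention_days)


def is_purgeable(instance_time: datetime, now: datetime, retention_days: int) -> bool:
    """Assumption: an instance is purged when its timestamp is older than the cutoff."""
    return instance_time < purge_cutoff(now, retention_days)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("Cutoff for a 60-day retention period:", purge_cutoff(now, 60).isoformat())
```

A calculation like this can help estimate which Process Instances are still expected to be available when investigating older executions.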

The purge is performed by executing the stored procedure PurgeProcessHistory. The procedure has a 30-minute timeout. If it cannot finish in time or an error occurs, the execution is retried after 30 minutes, up to five times by default.

You can configure the maintenance actions with the following optional settings in deploymentSettings.json when installing an Agent. These settings should be put directly under the root settings node:

  • maintenanceTimeWindowStart - string with a format of "[hour]:[minute]:[second]", e.g. 00:30:00 for half past midnight

  • maintenanceRetryCount - number

  • disableDatabaseMaintenance - boolean, set to true if you have set up your own scheduled cleanup and maintenance procedures.
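As a rough illustration, the three settings listed above could be added to an already loaded root settings node as follows. This is a sketch under assumptions: the helper name is hypothetical, the rest of deploymentSettings.json is not shown here, and the strict two-digit time check is an interpretation of the "[hour]:[minute]:[second]" format rather than a documented rule.

```python
import json
import re

# Assumption: two-digit, 24-hour "HH:MM:SS" values such as "00:30:00".
_TIME_RE = re.compile(r"^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$")


def with_maintenance_settings(root_settings, window_start=None,
                              retry_count=None, disable_maintenance=None):
    """Return a copy of the root settings node with maintenance options set.

    Hypothetical helper: only the three setting names come from the
    documentation; everything else in the file is left untouched.
    """
    updated = dict(root_settings)
    if window_start is not None:
        if not _TIME_RE.match(window_start):
            raise ValueError("window_start must look like 00:30:00")
        updated["maintenanceTimeWindowStart"] = window_start
    if retry_count is not None:
        if retry_count < 0:
            raise ValueError("retry_count must not be negative")
        updated["maintenanceRetryCount"] = retry_count
    if disable_maintenance is not None:
        updated["disableDatabaseMaintenance"] = bool(disable_maintenance)
    return updated


if __name__ == "__main__":
    # An empty dict stands in for the real root settings node.
    example = with_maintenance_settings({}, "00:30:00", 3, False)
    print(json.dumps(example, indent=2))
```

The function returns a new dictionary instead of modifying its input, so the original settings can be compared with the result before anything is written back to the file.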

Backups

Frends databases (LogStore, ConfigurationStore) have point-in-time restore and geo-redundant backups with 30-day retention. ConfigurationStore has an additional off-site backup with 7-day retention.

Monitoring

All Frends Cloud resources are monitored around the clock with alert thresholds set for key metrics.

HTTPS Endpoints for Monitoring

Each Agent can expose status and monitoring HTTPS endpoints, from which the Agent's status and metrics can be read. The endpoints are enabled by configuring HTTPS for the Agent or by setting the HttpStatusinfoPort option in the Agent application settings. API key authentication for the endpoints can be configured with the HealthCheckApiKey option. If enabled, the API key must be provided in an HTTP request header named health-check-api-key.

Relative to your Agent's base URL, the /frendsstatusinfo endpoint can be used to check whether the Agent is running and not paused. The response also reports Agent health, executing processes, and memory usage. The information is returned as JSON, so it can be used for automated monitoring.
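A status check against /frendsstatusinfo could be scripted roughly as below, using only the Python standard library. This is a minimal sketch: the function names and the example host are hypothetical, and because the fields of the JSON response are not listed here, the code returns the parsed document as-is instead of reading specific keys.

```python
import json
import urllib.request


def build_status_request(base_url, api_key=None):
    """Build a GET request for the Agent's /frendsstatusinfo endpoint.

    The header name health-check-api-key comes from the documentation;
    it is only sent when an API key is supplied.
    """
    url = base_url.rstrip("/") + "/frendsstatusinfo"
    request = urllib.request.Request(url, method="GET")
    if api_key:
        request.add_header("health-check-api-key", api_key)
    return request


def fetch_status(base_url, api_key=None, timeout=10.0):
    """Call the endpoint and return the parsed JSON response.

    Raises urllib.error.URLError (or HTTPError) if the Agent cannot be
    reached or rejects the request.
    """
    request = build_status_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical host and key; replace with your Agent's values.
    req = build_status_request("https://agent.example.com", "my-api-key")
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from the network call makes it possible to verify the URL and header without a running Agent.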

For additional Agent metric details the /metrics endpoint can be used, which exposes metrics in Prometheus format for scraping by third-party monitoring tools. This endpoint provides infrastructure-level metrics including HTTP client operations, SQL client operations, and Agent health indicators.
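For ad hoc inspection without a full monitoring stack, the text returned by /metrics can be read with a few lines of Python. The parser below is a deliberately simplified sketch of the Prometheus text exposition format (comments are skipped and label sets are kept as raw strings), and the metric names in the sample are made up for illustration, not actual Frends metric names.

```python
import re

# Simplified sample line: name, optional {labels}, value, optional timestamp.
_SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?P<labels>\{.*\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+-?\d+)?$"
)


def parse_prometheus_text(text):
    """Return a list of (metric name, raw label string, value) tuples.

    Lines that are blank, start with '#', or do not look like a sample
    are skipped.
    """
    samples = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_LINE.match(line)
        if match is None:
            continue
        try:
            value = float(match.group("value"))
        except ValueError:
            continue
        samples.append((match.group("name"), match.group("labels") or "", value))
    return samples


if __name__ == "__main__":
    # Hypothetical metric names, used only to show the format.
    sample = (
        "# TYPE example_requests_total counter\n"
        'example_requests_total{method="GET"} 12\n'
        "example_memory_bytes 1.5e+06\n"
    )
    for name, labels, value in parse_prometheus_text(sample):
        print(name, labels, value)
```

For production monitoring, a Prometheus-compatible scraper is the more robust choice, since it handles the full format, including escaping inside label values.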

For guidance on creating monitoring dashboards, see How to create Grafana dashboard for Frends.

OpenTelemetry Integration

Frends Agents support the OpenTelemetry Protocol (OTLP) for exporting metrics and distributed traces to modern observability platforms such as Grafana, Datadog, Splunk, and Honeycomb. OpenTelemetry is enabled through Agent application settings and requires an OpenTelemetry Collector to receive and forward telemetry data.

OpenTelemetry provides two types of telemetry data: metrics (Agent health, HTTP client operations, SQL client operations, and infrastructure-level performance data) and traces (health check endpoints, SQL database operations, and outbound HTTP requests). This infrastructure-level monitoring does not include Process-level execution data, Task performance, business metrics (promoted variables), or custom KPIs. For Process-level metrics, use the Frends Platform API.

The /metrics endpoint allows monitoring tools to pull metrics via HTTP scraping, while OpenTelemetry pushes both metrics and traces to a collector using OTLP. The HTTPS endpoints are simpler to set up and are suitable for basic monitoring. OpenTelemetry adds distributed tracing and standardized integration with observability platforms. Both can be used together for more comprehensive monitoring coverage.

For detailed implementation instructions, see How to visualize Frends telemetry with OpenTelemetry.

Self-hosted Agents

For self-hosted Agents, monitoring is provided by the endpoints described above, as well as by any monitoring tools available in the operating system of the Agent machine. For example, Windows Performance Counters can be used to retrieve Agent status information.

PaaS Agents

Frends Cloud Agent monitoring services include:

Frends Heartbeat Monitor Service

  • Agent Service Availability

Azure Log Analytics Agent (30 Day retention)

  • Event logs (System/Application)

    • Error Events (OOM)

    • Service Failures

  • Performance Counters (Ingested by Azure Log Analytics)

    • Memory

    • CPU

    • Page File

    • Threads

    • Disk

    • Frends Process Executions

    • Frends Process Logging

    • Network IO

Application Insights

  • Availability Checks

  • Monitor Agent API Endpoints (Ports 80 and/or 443)

  • 5 Second ping from 5 geographically different locations

Frends Core & UI

Monitoring for the Frends Core services and the UI includes:

Web UI

  • Performance Counters (Ingested by Azure Log Analytics)

    • Memory

    • CPU

    • Response Time

  • Application Insights (Ingested by Azure Log Analytics)

    • Application Exceptions

    • Availability Checks

      • Monitor Frends UI

      • 5 Second ping from 5 geographically different locations

Azure Storage

  • Performance Counters (Ingested by Azure Log Analytics)

    • Usage

    • Latency

    • Ingress/Egress

    • Transactions

    • Availability

Database

  • Performance Counters (Ingested by Azure Log Analytics)

    • Storage Usage & Growth

    • DTU Load

Service Bus

  • Performance Counters (Ingested by Azure Monitor)

    • Inbound/Outbound Messages

    • Requests

    • Connections

    • Active Messages

    • Inbound/Outbound Queue Size

    • Errors (User/Server)

    • Dead Letter Count

    • Message Count

    • Throttled Requests
