Maintenance & Support
Frends services are constantly being monitored and taken care of.
The Frends Cloud services are being monitored and managed by the Platform Operations team. Here's an overview of what is being done and at what frequency.
Maintenance
Cloud-hosted Frends resources are updated in line with our release cycle.
For customers with hybrid configurations (Cloud and On-Prem Agents) we arrange a suitable time/date to update the Frends Core service and Agents in order to synchronize the update of On-Prem resources.
OS and security updates for Cloud Agents are kept in line with Microsoft’s monthly patching cycle.
To avoid Agent downtime during patching, we recommend using load-balanced Agents in a high-availability configuration for production environments.
Database Maintenance
Log and configuration databases for Frends Tenants are periodically maintained by purging old Process Instance logs, including Instance header data and promoted values, and by reorganizing the indices. This is automated for both cloud and self-hosted instances of Frends. Note that Process Instance step data is uploaded to Azure Blob Storage, so less data needs to be stored in each Agent's database.
By default, the Process Instance purge and index reorganization run on Log Service startup and are rescheduled to run every 24 hours after finishing successfully. The purge deletes Process Instances older than the retention period set in the log settings of each Agent Group, which can be up to 60 days.
The purge is done by executing the stored procedure PurgeProcessHistory. The purge procedure has a 30-minute timeout, and if it cannot finish or an error occurs, the execution is retried after 30 minutes, up to five times by default.
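The timing rules above can be illustrated with a small sketch. It is not the Log Service implementation: the function names are hypothetical, the retry schedule ignores how long each attempt runs, and "up to five times" is read here as five retries after the first failed attempt.

```python
from datetime import datetime, timedelta

MAX_RETENTION_DAYS = 60              # upper bound mentioned in the text
RETRY_DELAY = timedelta(minutes=30)  # wait between attempts, per the text
DEFAULT_RETRY_COUNT = 5              # default number of retries, per the text


def purge_cutoff(now: datetime, retention_days: int) -> datetime:
    """Return the timestamp before which Process Instances count as expired."""
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 60")
    return now - timedelta(days=retention_days)


def retry_schedule(failed_at: datetime,
                   retry_count: int = DEFAULT_RETRY_COUNT) -> list[datetime]:
    """Return retry start times, assuming every attempt fails immediately."""
    return [failed_at + RETRY_DELAY * (i + 1) for i in range(retry_count)]


if __name__ == "__main__":
    start = datetime(2024, 3, 31, 0, 30)
    print("Purge everything older than:", purge_cutoff(start, 30))
    for attempt, when in enumerate(retry_schedule(start), start=1):
        print(f"Retry {attempt} would start at {when}")
```

In practice each attempt may run for up to 30 minutes before timing out, so the real gaps between retries can be longer than this simplified schedule suggests.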
You can configure the maintenance actions with the following optional settings in deploymentSettings.json when installing an Agent. These settings should be put directly under the root settings node:
maintenanceTimeWindowStart - string with a format of "[hour]:[minute]:[second]", e.g. 00:30:00 for half past midnight
maintenanceRetryCount - number
disableDatabaseMaintenance - boolean, set to true if you have set up your own scheduled cleanup and maintenance procedures.
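As a minimal sketch of how these three options might be added to a deployment settings document, the helper below writes them under a top-level "settings" key. The key name "settings" is an assumption based on the phrase "root settings node", the zero-padded time format is inferred from the 00:30:00 example, and the helper itself is hypothetical rather than part of Frends.

```python
import json
import re

# Zero-padded "HH:MM:SS"; assumed from the "00:30:00" example in the text.
TIME_FORMAT = re.compile(r"^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$")


def with_maintenance_settings(deployment, window_start=None,
                              retry_count=None, disable=None):
    """Return a copy of the settings with maintenance options under 'settings'."""
    result = json.loads(json.dumps(deployment))   # deep copy via JSON round trip
    settings = result.setdefault("settings", {})  # assumed name of the root node

    if window_start is not None:
        if not TIME_FORMAT.match(window_start):
            raise ValueError("expected a time such as 00:30:00")
        settings["maintenanceTimeWindowStart"] = window_start

    if retry_count is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(retry_count, bool) or not isinstance(retry_count, int) or retry_count < 0:
            raise ValueError("retry_count must be a non-negative integer")
        settings["maintenanceRetryCount"] = retry_count

    if disable is not None:
        settings["disableDatabaseMaintenance"] = bool(disable)

    return result


if __name__ == "__main__":
    updated = with_maintenance_settings({"settings": {}}, "00:30:00", 3, False)
    print(json.dumps(updated, indent=2))
```

Any other content already present in the file is left untouched, since the helper only adds or overwrites the three maintenance keys.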
Backups
Frends databases (LogStore, ConfigurationStore) have point-in-time restore and geo-redundant backups with 30-day retention. ConfigurationStore has an additional off-site backup with 7-day retention.
Recovery
Recovery in case of catastrophic failure depends partially on your deployment setup.
Recovery of Frends PaaS Agents
If a Frends PaaS Agent machine is destroyed, it can be recreated and configured to use the existing Azure SQL database that the PaaS Agents use by default. This usually takes less than half an hour. If there are customizations to the Agent machine in Frends Cloud, such as VPNs or certificates, those need to be handled separately. The PaaS Agents are meant to be kept as transient as possible so that they can be recreated easily.
Recovery for self-hosted Agents
Recovery of self-hosted Agents depends largely on the administrators, who are responsible for configuring backups and performing the recovery. However, reinstalling an Agent on a new machine and synchronizing it will re-establish the same Agent from the point of view of Frends. Any customizations, such as firewall settings or certificates, will have to be reconfigured unless the machine is recovered from a backup.
Recovery for databases
The Configuration store is the most important database for the survival of the Frends installation. It is backed up both in Azure and off-site. The Log store is backed up in Azure.
If the Frends Configuration store or Log store databases, or the core services, crash, the Agents continue their operation. If the Service Bus is still operational, it may fill up and stop operating because messages are not processed while the databases are down.
The Agent databases can usually be recreated at will, but the databases for PaaS Agents are also backed up in Azure. If the Agent database is recreated, File Triggers may be re-executed for any files that have not been moved from their directories. Leaving processed files in place should be avoided in any case.
If the Agent database crashes, the Agent cannot execute Schedule or File Triggers. If the connection to the database is lost, the Agent continues to execute integrations based on its cache and tries to reconnect to the database. This state cannot be sustained for long. For example, restarting the Agent will clear its cache.
Recovery for Service Bus
Even if the Service Bus connection is down, the Agents will continue to execute integrations as usual, but any remote Subprocesses cannot be executed as they rely on the Service Bus connection.
The Agent will start storing log data locally until it can reconnect to the Service Bus. If the connectivity is down for a longer period of time, the Agent may run out of space for the log data and cannot continue functioning.
Agents also cannot communicate with the Frends Core while the Service Bus is down. This means that activity may not be updated in the Frends UI, and Process deployments to Agents will not work until the Service Bus is restored.
Major catastrophic failures
In a total destruction scenario, the Frends Environment can be recreated from the Configuration store database backup that is kept off-site.
Monitoring
All Frends Cloud resources are monitored around the clock with alert thresholds set for key metrics.
HTTPS Endpoints for Monitoring
Each Agent can have status and monitoring HTTPS endpoints enabled, from which the status and metrics of the Agent can be read. The endpoints are enabled by configuring HTTPS for the Agent, or by setting the HttpStatusinfoPort option in the Agent application settings. API key authentication for the endpoints can be configured with the HealthCheckApiKey option. If enabled, the API key must be provided in the HTTP request headers under the name health-check-api-key.
With the base URL set to your Agent's base URL, the /frendsstatusinfo endpoint can be used to check whether the Agent is running and not paused. The response also contains information on Agent health, executing processes, and memory usage. The information is provided as JSON, so it can be used to create automated monitoring.
For additional Agent metric details, the /metrics endpoint can be used, which similarly allows scraping by third-party tools.
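A monitoring script could call these endpoints roughly as follows. This is a sketch using only the Python standard library: the function names and the example host are hypothetical, the JSON field names of the status response are not assumed, and the /metrics body is returned as plain text because its exact format is not described here.

```python
import json
import urllib.request

API_KEY_HEADER = "health-check-api-key"  # header name given in the text


def build_request(base_url, path, api_key=None):
    """Build a GET request for an Agent endpoint, adding the API key if given."""
    url = base_url.rstrip("/") + path
    request = urllib.request.Request(url, method="GET")
    if api_key:
        request.add_header(API_KEY_HEADER, api_key)
    return request


def fetch_status(base_url, api_key=None, timeout=10.0):
    """Return the parsed JSON document from /frendsstatusinfo."""
    request = build_request(base_url, "/frendsstatusinfo", api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def fetch_metrics(base_url, api_key=None, timeout=10.0):
    """Return the raw /metrics response body as text."""
    request = build_request(base_url, "/metrics", api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical Agent address and key; replace with your own values.
    status = fetch_status("https://agent.example.com", api_key="my-secret-key")
    print(json.dumps(status, indent=2))
```

A failed connection or a non-success HTTP status raises an exception from urllib, which a monitoring job can treat as "Agent unavailable".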
Self-hosted Agents
For self-hosted Agents, monitoring is provided by the above-mentioned endpoints, as well as by any monitoring tools available in the Agent machine's operating system. For example, Windows Performance Counters can be used to retrieve Agent status information.
PaaS Agents
Frends Cloud Agent monitoring services include:
Frends Heartbeat Monitor Service
Agent Service Availability
Azure Log Analytics Agent (30 Day retention)
Event logs (System/Application)
Error Events (OOM)
Service Failures
Performance Counters (Ingested by Azure Log Analytics)
Memory
CPU
Page File
Threads
Disk
Frends Process Executions
Frends Process Logging
Network IO
Application Insights
Availability Checks
Monitor Agent API Endpoints (Ports 80 and/or 443)
5 Second ping from 5 geographically different locations
Frends Core & UI
Monitoring for the Frends Core services and the UI includes:
Web UI
Performance Counters (Ingested by Azure Log Analytics)
Memory
CPU
Response Time
Application Insights (Ingested by Azure Log Analytics)
Application Exceptions
Availability Checks
Monitor Frends UI
5 Second ping from 5 geographically different locations
Azure Storage
Performance Counters (Ingested by Azure Log Analytics)
Usage
Latency
Ingress/Egress
Transactions
Availability
Database
Performance Counters (Ingested by Azure Log Analytics)
Storage Usage & Growth
DTU Load
Service Bus
Performance Counters (Ingested by Azure Monitor)
Inbound/Outbound Messages
Requests
Connections
Active Messages
Inbound/Outbound Queue Size
Errors (User/Server)
Dead Letter Count
Message Count
Throttled Requests