# Frends Process Optimization

Frends Processes can handle a wide variety of integration workloads, from lightweight API calls to heavy data migrations involving millions of records. As the volume and complexity of data grows, so does the importance of designing Processes with performance in mind.

This guide covers practical tips and techniques for optimizing Process performance, with a particular focus on handling large amounts of data, managing memory, tuning log settings, and structuring your Process design for efficiency.

The techniques and principles in this guide are rooted in general software engineering and database development practices — they are not unique to Frends, but are presented here through the lens of Process design, where each concept maps directly to a concrete shape, setting, or execution pattern.

## Prerequisites

To follow this guide, you should have a working knowledge of the Frends platform, including how to build and configure Processes, use Tasks, and navigate the Instance view. Familiarity with concepts like Subprocesses, Scopes, and log settings is helpful.

General knowledge of software engineering, especially .NET and C#, will also help with Process development and optimization.

## Understanding Performance Bottlenecks

Before applying optimizations, it helps to understand the most common sources of performance problems in Frends Processes. These typically fall into a few categories: loading too much data into memory at once, excessive logging of large payloads, inefficient Process structure such as unnecessary Subprocess calls inside loops, and suboptimal database queries.

Processes that handle large datasets can run for hours or crash entirely with an out-of-memory error, particularly in cloud environments where Agent hardware — and memory especially — is limited. Identifying which of these areas is causing problems is the first step toward fixing them.

### Avoid Premature Optimization

Performance optimization should be driven by actual, observed problems rather than applied speculatively during initial development. Optimizing a Process before you have evidence of a bottleneck often adds unnecessary complexity, makes the Process harder to read and maintain, and may solve a problem that would never have materialized in practice.

Write your Process to be correct and clear first, measure its behavior under realistic load, and then optimize the parts that are genuinely causing issues.

## Process Structure and Design

How a Process is structured has a lasting impact on its runtime performance. Keep the following principles in mind when designing Processes that need to handle scale.

Avoid unnecessary intermediate transformations. If a previous Task already returns a `JToken`, there is no need to pass it through a `JSONStringToJToken` Task or `JToken.Parse(#result.ToString())` C# expression. Extra conversion steps add processing time and memory overhead with no benefit.
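
As a small illustration, the sketch below contrasts the redundant round-trip with using the result directly. It is plain C# with a stand-in variable; in a Frends expression you would reference the Task result as `#result`:

```csharp
using Newtonsoft.Json.Linq;

// 'previousResult' stands in for a Task result that is already a JToken.
JToken previousResult = JToken.Parse("{ \"id\": 1 }");

// Redundant: serializing back to a string and re-parsing allocates a
// second copy of the entire object graph for no benefit.
JToken reparsed = JToken.Parse(previousResult.ToString());

// Preferred: use the existing JToken directly.
JToken data = previousResult;
```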

Every intermediate result stored in a separate variable multiplies the memory required to hold the same data. If a Process reads a dataset, transforms it, and stores each stage in a new variable, all of those representations exist in memory simultaneously until the garbage collector reclaims them. Where possible, prefer in-place processing or reuse the same variable across transformation steps, so that only one copy of the data is live at any given time. This is especially impactful when working with large payloads.
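
The following sketch contrasts the two patterns; the load and transform helpers are hypothetical stand-ins for whatever your Process actually does:

```csharp
using Newtonsoft.Json.Linq;

// Wasteful: each stage keeps its own full copy of the data reachable
// until the garbage collector eventually runs.
//   JArray raw      = LoadDataset();
//   JArray filtered = Filter(raw);
//   JArray enriched = Enrich(filtered);

// Better: reuse one variable so only the latest representation is
// reachable, and earlier copies become collectable as soon as they
// are replaced.
JArray data = LoadDataset();
data = Filter(data);
data = Enrich(data);

// Hypothetical stand-ins for the actual load and transform steps.
static JArray LoadDataset() => new();
static JArray Filter(JArray input) => input;
static JArray Enrich(JArray input) => input;
```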

For very large or complex workloads, consider splitting the full processing across multiple separate Processes, each responsible for a distinct stage — one to query and extract, one to transform, and one to deliver. Each Process completes its work, releases its memory, and hands off to the next via a file, a database record, a Shared State, or an HTTP call. This way no single Process holds the entire dataset in memory from start to finish, and each stage can be monitored, retried, and optimized independently.

Another effective strategy for large-volume Processes is offloading intermediate data to storage rather than keeping it in Agent memory between stages of the same Process. Writing results to a file on disk or to a blob storage service means the data does not need to live in memory while subsequent stages execute. Each stage can complete, release its memory, and the next stage reads only what it needs from the file.

For high-throughput Processes, consider offloading complex transformation logic to the data layer itself. Running transformations in a database like Snowflake or SQL Server and using Frends primarily for transport is often far more efficient than performing heavy mapping in-memory within a Process.

## Handling Large Amounts of Data

Large data volumes are one of the most common performance challenges in integration development. The core principle is to avoid loading the entire dataset into memory at once, and instead work with it in smaller pieces.

### Micro-batching

When processing large datasets from a database or file, micro-batching is one of the most effective strategies available. Instead of querying the entire dataset and processing it as one large collection, divide it into smaller chunks and process each chunk sequentially.

A typical micro-batch size falls somewhere between 20 and 20,000 rows, depending on your data and available Agent memory. A good rule of thumb is to keep Agent memory usage below 70% — if it climbs above that, reduce your batch size; if it stays well below, you can increase it for better throughput.

When implementing micro-batching, process each batch fully before querying the next one. This allows the .NET garbage collector to reclaim memory used by the previous batch before you load new data. Combining all batches into a single large collection before processing defeats the purpose, so avoid that pattern unless the downstream system requires it — and even then, perform the combination after processing, when the data is likely smaller.
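
The following is a minimal sketch of the pattern in plain C#. In a Frends Process the loop would typically be a While or Foreach shape around a database query Task; the helper functions here are hypothetical stand-ins:

```csharp
using System;
using System.Collections.Generic;

const int batchSize = 5000;
long lastId = 0;

while (true)
{
    var batch = FetchBatch(lastId, batchSize);
    if (batch.Count == 0)
        break;

    ProcessBatch(batch);        // handle this chunk fully before fetching more
    lastId = batch[^1].Id;      // advance the cursor for the next query

    // 'batch' is no longer referenced past this point, so the garbage
    // collector can reclaim it before the next chunk is loaded.
}

// Hypothetical stand-ins for a database query Task and downstream processing;
// the paging query itself is covered under Database Query Optimization below.
static List<(long Id, string Payload)> FetchBatch(long afterId, int size)
    => new(); // e.g. SELECT TOP (size) ... WHERE Id > afterId ORDER BY Id
static void ProcessBatch(List<(long Id, string Payload)> rows) { /* transform and deliver */ }
```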

### Data Format Considerations

The format of your data has a noticeable effect on memory usage, and the reason comes down to how Frends handles each format internally. When a Task returns XML or CSV, the result is stored as a plain string in memory. JSON, on the other hand, is deserialized into a `JToken` object graph. The deserialization operation itself is memory-intensive, and the resulting object graph takes up considerably more memory than the equivalent raw string — which is why a Process might handle a 100 MB CSV file without any issues but run into problems with a 20 MB JSON file. When you have a choice over the data format for a large-volume integration, XML or CSV will generally be less memory-intensive.

### Parallel Processing

For workloads where independent operations can run concurrently, use parallel branches in your Process. Drawing two or more connections out from a shape causes those branches to execute in parallel. This is particularly effective for operations like processing data for multiple carriers or calling independent APIs simultaneously.

The biggest performance gains from parallelism typically come from the first couple of parallel threads. Running two or three parallel For Each loops that each handle a portion of the data can yield significant throughput improvements. However, running a very large number of parallel Subprocess calls — hundreds or more — can starve the thread pool and delay or freeze other concurrently running Processes. Limit the degree of parallelism based on your Agent's CPU and memory resources.

Parallel branches carry a fixed overhead, so the operations being parallelized need to be substantial enough to justify it. Parallelizing trivial work — such as variable initialization or simple value assignments — will likely make a Process slower rather than faster, as the coordination cost outweighs any benefit. Parallel branches are most effective when each branch performs meaningful, time-consuming work such as external API calls, database queries, or processing a sizable chunk of data.

Note that Inclusive Decision does not perform true parallel processing. Despite visually resembling a parallel split, an Inclusive Decision evaluates and executes its branches serially, in the order they were created. If parallel execution is required, use a plain parallel split by drawing multiple connections out from a non-decision shape instead.

### Subprocesses and Serialization Overhead

Subprocesses are useful for organizing reusable logic, but calling them repeatedly inside a loop comes with a cost that is easy to overlook. Every time a Subprocess is called, the input data is serialized at the Process boundary and the result is deserialized back on return — regardless of how much actual work the Subprocess performs.

The concern here is specifically a Subprocess that contains only one or two Tasks in its body: when such a Subprocess is called on every iteration of a For Each loop, the serialization round-trip on each call can easily exceed the time spent doing the actual work inside the Subprocess. Over thousands of iterations this overhead compounds, adding significant CPU cost and potentially causing memory pressure if the garbage collector cannot release the result of each call before the next one begins.

If a loop needs to process a large collection, consider keeping the logic inline within the main Process, or restructure the Subprocess to accept a full batch of records at a time rather than a single record per call.
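
For instance, a Code Task could pre-split the collection so the loop iterates over batches rather than individual records. The sketch below assumes .NET 6 or later for `Enumerable.Chunk`, and the collection is a stand-in:

```csharp
using System.Linq;

// Hypothetical: 'records' stands in for the full collection to process.
var records = Enumerable.Range(1, 100_000).ToList();

// Split into batches of 500. A Foreach shape can then iterate over
// 'batches' and call the Subprocess once per batch: 200 serialization
// round-trips instead of 100,000.
var batches = records.Chunk(500).ToList();
```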

## Managing Memory with Scopes

The Scope element in Frends is a powerful tool for controlling memory usage. Variables declared inside a Scope are eligible for garbage collection once execution leaves that Scope. This means that if you wrap the processing of each micro-batch inside a Scope, the memory used by that batch can be released before the next batch begins.

Individual Tasks, Code Tasks, Assign Variable shapes, and Foreach shapes also have a **Dispose at the end of the scope** toggle under their Advanced Settings. When enabled, the result and any other memory allocations produced by that shape are released once the containing Scope exits. The containing Scope can be a Scope shape, a single iteration of a Foreach or While Scope, or the Process execution itself.

The option is most useful inside a Scope or Foreach iteration, where releasing memory early — before the next iteration or the next stage — can meaningfully reduce peak memory usage.

Even at the Process level the option has practical value: although objects are eventually finalized when the Process ends, a finalizer only invokes dispose logic if the class was explicitly written to do so. Enabling this toggle guarantees that `Dispose` is called explicitly in every case.

Note that this option has no effect for non-disposable objects, such as plain strings or simple value types. Its primary benefit is for shapes that return objects implementing `IDisposable`, as well as for Task results that hold large amounts of data.

If a variable is declared outside a Scope but used inside one, it will not be automatically released when the Scope ends. In that case, you can manually reset the variable by assigning it `null` or an empty value to allow the earlier memory to be deallocated.

## How Logging Affects Performance

Log settings have a substantial effect on Process performance, particularly when processing large amounts of data. On the default log level, the Frends Agent serializes the result of every shape and stores it in the log database. If a shape returns a large payload — say, 1 GB of JSON — that data is serialized, potentially multiple times across multiple shapes, before being written to the log. In a Process with several such shapes, the combined memory impact can easily reach several gigabytes.

Performance can be up to 50% slower with full logging enabled compared to running with logging disabled or reduced, and memory consumption increases correspondingly.

### Log Level Settings

Frends offers three process-level log settings, and choosing the right one for production workloads matters.

The **Default** log level records the result of each executed shape, with arrays truncated to the first 100 elements and text values capped at 10,000 characters. Input parameters for Tasks and Subprocesses are not logged at this level. This is a reasonable level for development and troubleshooting but carries a noticeable memory cost when processing large payloads.

**Only Errors** logs nothing for successful executions. Step results and Subprocess outputs are not recorded unless an error occurs, at which point the relevant parameters and exception details are captured. This level is strongly recommended for Production Environments, and for Processes that handle high data volumes or execute frequently, as it significantly reduces memory pressure, I/O load, and storage consumption. Note that if you use this level, you should avoid promoting large values, as promoted values are always logged regardless of the log level setting.

**Everything** logs all parameters and results in full, without truncation. This is useful during development or active debugging but should not be used in production, as it generates large amounts of log data and has a direct impact on performance.

### Per-Shape Logging Controls

Beyond the process-level log setting, each individual Task, Code Block, and other shape has its own logging options under Advanced Settings.

The **Skip logging result and parameters** toggle prevents the values for that shape from appearing in the Instance logs at all. When enabled, the log shows `<truncated>` in place of the actual values. This is useful both for hiding sensitive data such as passwords or tokens, and for reducing the memory and I/O cost of logging when a shape handles large payloads. Applying this selectively to shapes that process or return large datasets can provide a meaningful performance improvement without sacrificing visibility into the rest of the Process.

**Promoted values** work in the opposite direction — they guarantee that a shape's result is always logged, even under the Only Errors setting. Use promoted values deliberately during development, and remove or limit them in production to avoid unintended memory and storage costs.

### Logging and For Each Loops

Processes with a very high number of loop iterations accumulate a significant amount of log data. A For Each loop iterating over tens of thousands of records generates a corresponding number of log entries, which can make browsing the Instance view slow or impossible, and causes meaningful overhead in the Frends background log processing. Using micro-batching to reduce the total number of iterations — rather than iterating over every individual record — helps keep this under control. Setting the log level to Only Errors is also strongly recommended for high-iteration Processes.

## Database Query Optimization

When a Process reads from or writes to a database, the performance of those queries directly affects overall Process execution time. Most database performance issues in integration scenarios stem from read patterns that degrade as the dataset grows — and the most common culprit is `OFFSET`-based paging. When using `OFFSET`, the database engine must internally scan and discard all rows preceding the requested page, meaning execution time grows linearly the further into the dataset you go.

### Keyset Pagination

The most efficient way to page through a large dataset while maintaining near-constant execution time is keyset pagination, sometimes called the seek method. Rather than telling the database which page to jump to, you tell it where the last batch ended — using a unique, indexed column such as a primary key or a timestamp as a cursor.

The first batch is fetched with a simple ordered query. Each subsequent batch adds a `WHERE` clause filtering to rows beyond the last value seen in the previous batch:

```sql
-- First batch
SELECT TOP 100 TransactionID, Amount, Date
FROM Transactions
ORDER BY TransactionID ASC;

-- Next batch, using the last TransactionID from the previous result (e.g. 5042)
SELECT TOP 100 TransactionID, Amount, Date
FROM Transactions
WHERE TransactionID > 5042
ORDER BY TransactionID ASC;
```

Because the database can use an index seek to jump directly to the cursor value, the time to retrieve the millionth row is virtually the same as retrieving the hundredth. The trade-off is that you cannot jump to an arbitrary page number without knowing the ending cursor value of the preceding page — but in the context of Frends micro-batching, where batches are processed sequentially, this is rarely a concern.
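
A sketch of the driving loop in plain C# might look like the following. In a Frends Process this would usually be a SQL Query Task with parameter mapping; the connection string is a placeholder, the cursor value mirrors the SQL example above, and `GetInt64` assumes a bigint key:

```csharp
using Microsoft.Data.SqlClient;

var connectionString = "Server=...;Database=...;Integrated Security=true;";
long lastSeenId = 5042; // cursor carried over from the previous batch

using var conn = new SqlConnection(connectionString);
conn.Open();

using var cmd = new SqlCommand(
    @"SELECT TOP 100 TransactionID, Amount, Date
      FROM Transactions
      WHERE TransactionID > @lastId
      ORDER BY TransactionID ASC;", conn);
cmd.Parameters.AddWithValue("@lastId", lastSeenId);

using var reader = cmd.ExecuteReader();
while (reader.Read())
{
    lastSeenId = reader.GetInt64(0); // advance the cursor as rows are read
    // ... process the row ...
}
```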

### Clustered Indexes and Sort Order

For large batch queries, ensure your `ORDER BY` clause matches the table's clustered index. When data is physically stored in the order you are querying it, the database performs an efficient sequential read. Querying or sorting by a non-indexed column forces SQL Server to sort the entire result set in `tempdb` before returning anything, which causes a significant and growing performance penalty as the dataset size increases.

Where stale reads are acceptable, consider using the `NOLOCK` hint or `READ UNCOMMITTED` isolation level on read queries to avoid lock contention on busy tables. This allows queries to read rows that may be mid-write, which is a reasonable trade-off in many integration scenarios where absolute read consistency is not critical.

### Incremental Loading

When implementing incremental loading patterns, use a timestamp-based approach to query only records created or modified since the last run. This avoids re-processing the entire dataset on every execution and is one of the most effective ways to reduce both query time and data volume. For cases where the source data has no reliable modification timestamp, keyset pagination with a tracked cursor value is a practical alternative — page through the full dataset in fixed-size batches until no more rows are returned.
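
A minimal sketch of the watermark query, with illustrative table and column names, could look like this. The watermark would normally be persisted between runs, for example in a control table or a Shared State:

```csharp
using System;
using Microsoft.Data.SqlClient;

var lastRunUtc = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);

using var conn = new SqlConnection("Server=...;Database=...;Integrated Security=true;");
conn.Open();

using var cmd = new SqlCommand(
    @"SELECT OrderID, CustomerID, ModifiedAt
      FROM Orders
      WHERE ModifiedAt > @watermark
      ORDER BY ModifiedAt ASC;", conn);
cmd.Parameters.AddWithValue("@watermark", lastRunUtc);

// After a successful run, persist the highest ModifiedAt value seen as the
// watermark for the next execution.
```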

### Parallel Batch Processing

For high-volume extraction workloads such as ETL Processes, micro-batching can be combined with parallel processing by dividing the full dataset into ID range buckets and processing each bucket in a separate parallel branch. Records 1–100,000 in one branch, 100,001–200,000 in another, and so on — with each branch paging through its own range using keyset pagination independently. This approach pairs naturally with the parallel branch and micro-batching strategies described earlier in this guide.
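
A sketch of how the ID range buckets might be computed is shown below; the boundary values are illustrative and would come from a `SELECT MIN/MAX` query in a real Process:

```csharp
using System;
using System.Collections.Generic;

const long bucketSize = 100_000;
long minId = 1, maxId = 1_000_000;

var buckets = new List<(long From, long To)>();
for (long from = minId; from <= maxId; from += bucketSize)
    buckets.Add((from, Math.Min(from + bucketSize - 1, maxId)));

// Each branch then runs keyset pagination within its own range, e.g.
//   WHERE TransactionID >= @From AND TransactionID <= @To
//     AND TransactionID > @cursor
//   ORDER BY TransactionID ASC;
```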

### Write Optimization

For writes, prefer bulk insert operations over row-by-row inserts. Most database Tasks in Frends offer a `BulkInsert` option that is significantly faster when inserting large numbers of records.

## AI Connector Usage Optimization

When a Process uses the Frends AI Connector, performance considerations go beyond execution speed. Every call to a language model consumes tokens, and tokens translate directly into Frends AI credits. Understanding how the AI Connector works — and where tokens are spent — lets you design AI-powered Processes that are both effective and cost-efficient.

### Understanding Token Usage

The AI Connector charges credits based on the number of tokens consumed during a request, calculated separately for input and output according to the formula `(Input_Tokens / Model.Input_Tokens_Per_Credit) + (Output_Tokens / Model.Output_Tokens_Per_Credit)`, with a minimum of one credit per call. You can inspect the actual token counts after each AI Task execution through the `#result.Metadata` object, which exposes `InputTokenCount` and `OutputTokenCount` alongside the credits used and credits remaining for the Tenant.

Monitoring these values during development — for example by logging `#result.Metadata` in a test run — gives you a reliable baseline before deploying to production. If token counts are higher than expected, the metadata breakdown tells you whether the cost is coming from the prompt side (input) or the generated response (output), which directly informs where to focus optimization efforts.
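
As a rough illustration of the formula, the following sketch computes the credit cost for a single call. The per-credit divisors are model-specific values, and the numbers used here are made up:

```csharp
using System;

double CreditsForCall(int inputTokens, int outputTokens,
                      double inputTokensPerCredit, double outputTokensPerCredit)
{
    var credits = inputTokens / inputTokensPerCredit
                + outputTokens / outputTokensPerCredit;
    return Math.Max(credits, 1.0); // minimum of one credit per call
}

// Token counts as they might appear in #result.Metadata after an AI Task run.
var cost = CreditsForCall(inputTokens: 1200, outputTokens: 300,
                          inputTokensPerCredit: 1000, outputTokensPerCredit: 250);
Console.WriteLine(cost); // 1.2 + 1.2 = 2.4 credits
```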

### Writing Efficient Prompts

The system prompt and user prompt together make up the input tokens for every AI Connector call, so their size has an outsized effect on credit consumption. Compressing the system prompt alone can reduce total token usage by more than 20%, while trimming the user prompt typically reduces prompt tokens by around 10% and overall usage by roughly 6%. Even a small amount of discipline around prompt verbosity compounds across many Process executions.

#### Keeping the System Prompt Focused

The system prompt defines the AI's role and constraints. It is sent on every call, so it pays to keep it precise. Avoid repeating instructions that can be inferred from context, and resist the temptation to paste in general background information that the model does not need to complete its specific task. A system prompt that is two concise paragraphs will almost always outperform one that runs to several screens of text, both in token cost and in output reliability.

#### Passing Only Relevant Data in the User Prompt

The AI Connector does not automatically receive any Process variables or execution context — only what you explicitly include in the user prompt. This means you have full control over what the model sees, and you should use that control deliberately. Extract and pass only the fields the AI actually needs to complete its task rather than serializing an entire object or JSON structure. If the model only needs a customer name and order total to generate a summary, there is no reason to include the full order payload.

When the data passed to the AI comes from an external source or end-user input, use a prompt template and validate or sanitize the input before inserting it. Passing raw, unvalidated user input directly into a prompt creates a prompt injection risk where a malicious user can attempt to override the system prompt or extract information they should not have access to.
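
A minimal sketch of this pattern, with illustrative field names and intentionally simple validation, might look like this:

```csharp
using System.Linq;

// Build the user prompt from a fixed template and insert only validated
// fields instead of raw user input. Real validation should match your
// data's expected shape.
string BuildSummaryPrompt(string customerName, decimal orderTotal)
{
    // Basic sanitization: strip control characters and cap the length so a
    // malicious value cannot smuggle extra instructions into the prompt.
    var safeName = new string(customerName
        .Where(c => !char.IsControl(c))
        .Take(100)
        .ToArray());

    return $"Summarize the order for customer \"{safeName}\" " +
           $"with a total of {orderTotal:C}. Respond in one sentence.";
}

var prompt = BuildSummaryPrompt("Acme Oy", 1250.00m);
```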

### Tuning Model Parameters

Two parameters in the AI Connector have a direct effect on output quality and token efficiency: `Temperature` and `TopP`. Both are set in the Task's configuration and can also be overridden at a more granular level through the `Custom Options` JSON field.

`Temperature` controls the randomness of the model's output on a scale from `0.0` to `2.0`. Lower values produce more deterministic, focused responses, which tend to be shorter and more consistently on-topic. For Tasks where the output format is predictable — such as generating structured JSON, classifying input, or extracting specific fields — setting a low temperature like `0.1` or `0.2` reduces both wasted output tokens and the likelihood of needing a retry. Higher temperature values make sense for creative or generative Tasks where diversity in output is desirable, but they come with higher and less predictable token costs.

`TopP` (nucleus sampling) works similarly, restricting the model to tokens within a cumulative probability mass. A `TopP` of `1.0` means the full vocabulary is available; lower values restrict the model to its highest-confidence choices. For deterministic Tasks, combining a low `Temperature` with a `TopP` around `0.9`–`0.95` is a common and effective starting point.

### Structuring AI Logic Across Multiple Tasks

It can be tempting to consolidate all AI logic into a single, large prompt — asking the model to analyze input, make a decision, and format output all at once. In practice, chaining multiple focused AI Tasks in sequence often produces better results with fewer total tokens. Each Task receives a narrower prompt, generates a shorter and more predictable response, and feeds only the relevant output forward to the next step.

This pattern also makes it easier to identify where failures or unexpected outputs originate, since each Task has a clear, single responsibility. When a reasoning step needs to be repeated or retried, only that one Task is re-executed rather than the entire monolithic prompt.

Where the same AI logic is needed across multiple Processes, encapsulate it in a Subprocess. Reusing a configured AI Agent as a Subprocess avoids duplicating system prompt definitions and model parameter settings, and keeps updates centralized.

### Avoiding Redundant AI Calls

Not every step in a Process needs to call a language model. If the output of an AI Task is deterministic given the same input (for instance, a classification that always maps inputs to a fixed set of values), consider whether the logic could be handled by a simpler expression or a lookup table instead. Reserve AI Tasks for the parts of a Process where natural language understanding or generation genuinely adds value.
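
For instance, a fixed classification could be expressed as a simple lookup, as in this illustrative sketch:

```csharp
using System.Collections.Generic;

// A fixed classification as a lookup table. Only fall back to an AI Task
// when the input is genuinely outside the mapping.
var categoryByCode = new Dictionary<string, string>
{
    ["INV"] = "Invoice",
    ["PO"]  = "PurchaseOrder",
    ["CRN"] = "CreditNote",
};

string Classify(string code) =>
    categoryByCode.TryGetValue(code, out var category)
        ? category      // deterministic mapping, no model call needed
        : "Unknown";    // or route only these cases to an AI Task

var result = Classify("INV"); // "Invoice"
```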

When a Process runs in a loop and the AI Task's input does not change between iterations, move the AI Task outside the loop so the model is called once and its output is reused. Calling an AI Task inside a high-frequency loop is one of the fastest ways to accumulate unnecessary credit consumption.

### Logging Considerations for AI Tasks

The general recommendation to use the `Only Errors` log level in production applies with particular force to AI Tasks, since the `Result` object returned by the AI Connector can contain large JSON payloads including the full generated text and metadata. Logging these in a Production Environment will noticeably inflate stored log volume. Reserve verbose logging for development and testing environments where you need visibility into prompt content and token metadata. For better control and performance, consider routing AI decision logs to an external log database.

