General SQL tips
There are a few ways in which SQL queries can be made much faster in integration scenarios. Some of these techniques involve thinking about how data is stored in a database.
Clustered indexes
A clustered index is an index that defines the physical order in which table records are stored in a database = the physical order of the data in the file system. Since there can be only one way in which records are physically stored in a database table, there can be only one clustered index per table. By default, a clustered index is created on a primary key column and it usually is the best candidate for it, however, there are cases where it might be helpful to use another column as a clustered index because the best candidate for the clustered index is a column that stores a unique short static increasing value which gets accessed frequently and is used in ORDER BY clauses and the current primary key might not meet these requirements e.g. using GUIDs as the primary key. The clustered index can be beneficial for queries that read large result sets of ordered sequential data.
Non-Clustered indexes
A non-clustered index is mainly used to speed up search operations. Unlike a clustered index, a non-clustered index doesn’t physically define the order in which records are inserted into a table. In fact, a non-clustered index is stored in a separate location from the data table. A non-clustered index is like a book index, which is located separately from the main contents of the book. Since non-clustered indexes are located in a separate location, there can be multiple non-clustered indexes per table. Keep in mind that if the search operations select columns that are not included in the non-clustered index the search operation might be slower because the columns which are included in the non-clustered index can be searched from the index and then the reference to the actual table record will be used to search the columns which are not included in the non-clustered index.
Removing constraints
Removing a constraint in a database involves eliminating a restriction on the values that can be stored in a table column or set of columns. The process for removing a constraint may vary depending on the type of constraint and the specific database management system being used. Generally, the steps involved include identifying the constraint, determining its type, checking for dependencies, removing the constraint using a command such as "ALTER TABLE" or "DROP CONSTRAINT", and validating the table to ensure data integrity and consistency. It's crucial to exercise caution when removing constraints, as it can have implications for data integrity, and it's often a good idea to create a backup of the database before making any changes.
Foreign keys
Foreign keys are a useful feature for ensuring data consistency and referential integrity in a database, but they can also present challenges in certain scenarios. Problems with foreign keys can include difficulty making changes to data, the potential impact on database performance, complexity of the database schema, maintenance challenges, and error handling issues. Despite these challenges, foreign keys remain an important tool for database management, and careful planning and consideration can help mitigate potential issues.
Pagination
Pagination is a technique used in SQL queries to limit the amount of data that is retrieved from a database at a time. It is particularly useful when working with large datasets, as it can improve the performance of the query and make the data more manageable.
In addition, Frends Agents have only a limited amount of memory and other resources, so pagination is the only way to make really big queries. Depending on the data page size can vary between a few hundred rows to something like 100 000 rows.
The basic idea behind pagination is to break up the results of a query into smaller, more manageable chunks. For example, instead of retrieving 100,000 rows of data all at once, a pagination technique might retrieve just the first 1,000 rows at a time, and then retrieve subsequent chunks of 1,000 rows as needed.
Turning off Frends Logging
As with all other Frends processes, turning off logging might improve performance, and if the amount of data that is being processed is huge this impact might be significant. If you have turned off logging and want to gather information on e.g. queries that have failed, you can collect them to a manually created array. It is also beneficial to Promote amount of failed rows, so they are easy to see on Process List.
Garbage Collection (GC) and Dispose
Frends uses standard .Net garbage collection. Even though it is sometimes slow to release memory between loop iterations it is better not to try doing tricks to get it to release memory any faster. Those tricks tend to have unintended consequences. Starting from Frends you can, however, instruct it to dispose of any element in Process.
Outsource work to a database
It should be noted that sometimes data transformations and queries are a lot easier to do in the database than in Frends.
The next article is Introduction to SFTP Tasks