Monday, 19 December 2016

Optimizing Large Import

Hi All

Want to share with you some good approaches to Import a large amount of data.
If you ever tried to import some large set of data from Excel you actually know that it is not as fast as importing data into the SQL table. There are plenty of reasons why it is like this:
  • Data validation
  • Defaulting empty field
  • Running of business logic
  • Data integrity 
  • Updating of referenced data
  • Security tracking
  • Audit 
Of course all these rules have different effects in different modules. It is obvious that GL is much faster than SO Orders. At the same time, AR is faster than SO, but slower than GL.
Also import depends on hardware and database performance.

If you import AR Invoices from Excel you can expect to save about 2 AR Invoices per second.
In this case ideally you can import about 150 000 invoices per day. Actual performance may be different because performance depends on number of invoices and other dictionaries in database. Also this is just saving and does not include release operation that also may take similar amount of time. In addition to the performance we may have problem with errors handling.

What you need to do if you have 1 000 000 invoices to import and little time before go live of a site? It's impractical to use Excel files and import scenarios to do that. It will take too long, and will be very difficult to manage errors and add more advanced data mapping logic.

In this article I want to touch several optimizations tips that can be grouped into few groups:
  • Database Optimizations
  • Import Process Optimization
  • Business Logic Optimization
Lets go through all of them:

Database Optimizations
  • First of all keep database in good condition and do regular maintenance. Fragmented indexes can significantly slow down database performance. 
  • Have enough memory for database server. SQL server can cache most frequent queries in the memory, that helps to process selects faster.
  • SSD disk will significantly speed up disk IO operations
  • Microsoft SQL server will be faster than My SQL.
  • If possible split application and database on 2 different servers. This will help you free some additional processing resources
  • Microsoft Azure cloud Database may also bring some troubles. Azure infrastructure, replication and resources sharing between multiple database is slower than a dedicated server.

Import Process Optimization

Import from excel file using Integration Services will use just one core of the server. This is how system designed just because import process should not block any other users.
Acumatica is designed to leverage multiple cores. Using 2 cores for import can double the amount of imported documents in a minute. Assuming you can import 100 orders per core per hour, using 2 cores would give you 200 orders per hour, 3 cores 300 orders per hour. Performance will max out once the number of concurrent processes equals number of physical cores, or number of licensed cores if less than number of physical cores.
That is really significant!

Note that if SQL Server running on the same server as application server it will also consume some cores.
Lastly, please note that there could be screen-specific issues affecting scalability, like heavy locking of shared resources. If you face any, I would highly recommend to contact Acumatica Support Team.

To resolve that issues we can:
  • Split one Excel file into several and run several processes simultaneously
  • Develop an integration app, that will load documents though web-services.
My personal approach for high-volume imports has always been to use web services in multi-threaded mode.

For importing of large transactions I would recommend Screen-Base API. Contract based API is better and easier for real-time integration, but unfortunately when you do something simpler you usually loose on flexibility. In case of large import Screen Based API allows you to minimize callbacks to server and triggers of business logic validation.

So I would recommend to:
  • Use Screen-Based API that gives you more control on the process
  • Use bulk Import or Submit method. 
    • Import(Commands[], Filters[], String[][]) method performs similarly to the Import by Scenario form, accepting tabular data. You can pass there multiple records at once.
    • Submit(Commands[]) method allows simultaneous import and export of data in the form of commands coupled with values. If you use Submit, pass there multiple records simultaneously. It will help you to have less calls.
  • Remove some unnecessary business logic triggers by setting "Commit" flag to False. We recommend do this for grid fields - you may just disable all commits except the last one.
  • Note that by default .NET has a limit of 2 active connections per address, so unless you increase System.Net.ServicePointManager.DefaultConnectionLimit there will be a serialization of calls. 
  • Create and use multiple threads in your code. The optimal number of threads you can get from number of cores available on your Acumatica server server. 
    • Please remember that Acumatica license will limit processing power to licensed amount of cores.
    • Make sure to use a different session for each of your thread.
    • In case you have SQL Server on different computer, you can try 1-2 threads per 1 core.
Tip: Note that for large transactions import you also can use multiple instances of Acumatica connected to the same database. If you have multiple free servers, just install Acumatica there and use different instances from your integration application.

See below for a sample command-line app doing multi-threaded imports to Acumatica.


Business Logic Optimizations

Another thing I want to highlight in this article is how you can optimize import by disabling some of the business logic and features:
  • Auto-numbering of transactions has an impact on scalability due to lock contention. Disable it when you do large import importing.
  • Turn off discount & pricing feature, that will minimize business logic calls.
  • Turn off automatic packaging feature on places were it is applicable.
  • Consolidated posting to GL feature can minimize number of batches that will be generated during invoices release operation. Less documents - better performance. 
  • Disable Field-level Audit, for documents that should be imported.
  • Disable Request Profiller and other logging tools.
  • Disable Notifications if any configured

Example Multi-Thread Application 
This program is available on GitHub.
Source Code:

Have a nice Import!

No comments: