I want to share some good approaches to importing a large amount of data.
If you have ever tried to import a large data set from Excel, you know that it is not as fast as inserting data directly into a SQL table. There are plenty of reasons for this:
- Data validation
- Defaulting of empty fields
- Running of business logic
- Data integrity
- Updating of referenced data
- Security tracking
Of course, all these rules have different costs in different modules. GL, for example, is much faster than SO; AR is faster than SO but slower than GL.
Import speed also depends on hardware and database performance.
If you import AR Invoices from Excel, you can expect to save about 2 AR Invoices per second.
At that rate you can ideally import about 150,000 invoices per day. Actual performance may differ, because it depends on the number of invoices and on the size of other dictionaries in the database. Note also that this is just the save operation; it does not include release, which may take a similar amount of time. On top of performance, we may also have problems with error handling.
What do you do if you have 1,000,000 invoices to import and little time before the site goes live? It is impractical to use Excel files and import scenarios for that: it will take too long, and it will be very difficult to manage errors or add more advanced data-mapping logic.
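The daily figure is easy to sanity-check with back-of-envelope arithmetic based on the 2-saves-per-second rate:

```python
SAVES_PER_SECOND = 2
SECONDS_PER_DAY = 24 * 60 * 60

# Theoretical ceiling if saving ran non-stop all day.
ceiling_per_day = SAVES_PER_SECOND * SECONDS_PER_DAY
print(ceiling_per_day)        # 172800 – hence ~150,000 after real-world overhead

# Release takes roughly as long as save, so a save+release pipeline
# halves the effective daily rate.
print(ceiling_per_day // 2)   # 86400
```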
In this article I want to cover several optimization tips, which fall into three groups:
- Database Optimizations
- Import Process Optimization
- Business Logic Optimization
Let's go through all of them.
Database Optimizations
- First of all, keep the database in good condition and do regular maintenance. Fragmented indexes can significantly slow down database performance.
- Have enough memory on the database server. SQL Server can cache the most frequent queries in memory, which helps process selects faster.
- An SSD will significantly speed up disk I/O operations.
- Microsoft SQL Server will be faster than MySQL.
- If possible, split the application and the database onto 2 different servers. This will free up some additional processing resources.
- Microsoft Azure SQL Database may also bring some trouble. Azure's infrastructure, replication, and resource sharing between multiple databases make it slower than a dedicated server.
Import Process Optimization
Import from an Excel file using Integration Services will use just one core of the server. The system is designed this way so that the import process does not block other users.
Acumatica is designed to leverage multiple cores, and using 2 cores for import can double the number of documents imported per minute. Assuming you can import 100 orders per core per hour, 2 cores would give you 200 orders per hour, and 3 cores 300 orders per hour. Performance maxes out once the number of concurrent processes equals the number of physical cores, or the number of licensed cores if that is smaller.
That is really significant!
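The scaling rule can be written down as a simple model (the 100 orders per core per hour is just the example rate from above, not a guaranteed figure):

```python
def imports_per_hour(threads, physical_cores, licensed_cores, per_core_rate=100):
    # Throughput scales linearly with concurrent processes until it hits
    # the core limit – physical or licensed, whichever is smaller.
    effective = min(threads, physical_cores, licensed_cores)
    return effective * per_core_rate

print(imports_per_hour(2, 8, 8))    # 200 orders per hour
print(imports_per_hour(3, 8, 8))    # 300
print(imports_per_hour(16, 8, 4))   # 400 – capped by the 4 licensed cores
```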
Note that if SQL Server is running on the same machine as the application server, it will also consume some cores.
Lastly, please note that there can be screen-specific issues affecting scalability, such as heavy locking of shared resources. If you face any, I highly recommend contacting the Acumatica Support Team.
To resolve these issues we can:
- Split one Excel file into several and run several processes simultaneously
- Develop an integration app that will load documents through web services.
My personal approach for high-volume imports has always been to use web services in multi-threaded mode.
For importing large volumes of transactions I would recommend the Screen-Based API. The Contract-Based API is better and easier for real-time integration, but when you make something simpler you usually lose flexibility. For a large import, the Screen-Based API allows you to minimize callbacks to the server and the triggering of business logic validation.
So I would recommend the following:
- Use the Screen-Based API, which gives you more control over the process.
- Use the bulk Import or Submit methods.
- The Import(Commands, Filters, String) method performs similarly to the Import by Scenario form and accepts tabular data, so you can pass multiple records at once.
- The Submit(Commands) method allows simultaneous import and export of data in the form of commands coupled with values. If you use Submit, pass multiple records per call; that way you make fewer calls.
- Remove unnecessary business logic triggers by setting the Commit flag to False. We recommend doing this for grid fields – you can disable all commits except the last one.
- Note that by default .NET has a limit of 2 active connections per address, so unless you increase System.Net.ServicePointManager.DefaultConnectionLimit your calls will be serialized.
- Create and use multiple threads in your code. You can derive the optimal number of threads from the number of cores available on your Acumatica server.
- Please remember that your Acumatica license limits processing power to the licensed number of cores.
- Make sure to use a separate session for each of your threads.
- If SQL Server is on a different machine, you can try 1–2 threads per core.
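To see why passing multiple records per Import or Submit call matters, here is a toy cost model; the millisecond figures are assumptions for illustration, not measured Acumatica numbers:

```python
from math import ceil

# Assumed costs – illustrative only.
CALL_OVERHEAD_MS = 120   # fixed cost of one web-service round trip
PER_RECORD_MS = 15       # server-side cost of processing one record

def total_time_ms(n_records, batch_size):
    # Every call pays the fixed round-trip overhead; batching divides it.
    calls = ceil(n_records / batch_size)
    return calls * CALL_OVERHEAD_MS + n_records * PER_RECORD_MS

print(total_time_ms(10_000, 1))    # 1350000 – one record per call
print(total_time_ms(10_000, 500))  # 152400 – 500 records per call
```

Under these assumptions, batching 500 records per call is almost 9 times faster than sending records one at a time, purely from saved round trips.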
Tip: for importing large volumes of transactions you can also use multiple instances of Acumatica connected to the same database. If you have spare servers, just install Acumatica on them and point your integration application at the different instances.
See below for a sample command-line app doing multi-threaded imports to Acumatica.
Business Logic Optimizations
Another thing I want to highlight in this article is how you can optimize the import by disabling some business logic and features:
- Auto-numbering of transactions has an impact on scalability due to lock contention. Disable it when you do a large import.
- Turn off the discount and pricing features; that will minimize business logic calls.
- Turn off the automatic packaging feature where applicable.
- The Consolidated Posting to GL feature can minimize the number of batches generated during the invoice release operation. Fewer documents – better performance.
- Disable field-level auditing for the documents being imported.
- Disable the Request Profiler and other logging tools.
- Disable notifications, if any are configured.
Example Multi-Thread Application
This program is available on GitHub.
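In outline, the application follows the pattern below. This is a simplified Python sketch of the idea, not the actual program: the real integration is a .NET client, ImportSession is a stand-in for a logged-in Screen-Based API proxy, and in .NET you would also raise System.Net.ServicePointManager.DefaultConnectionLimit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class ImportSession:
    # Placeholder for an API session; one logged-in session per thread.
    def __init__(self):
        self.saved = []          # records saved through this session
    def save(self, record):
        self.saved.append(record)

_local = threading.local()

def _session() -> ImportSession:
    # Sessions are stateful and must not be shared between threads,
    # so each worker thread lazily creates its own.
    if not hasattr(_local, "session"):
        _local.session = ImportSession()
    return _local.session

def import_record(record):
    _session().save(record)
    return record

records = [f"INV{n:06d}" for n in range(1000)]
THREADS = 4  # start around the number of licensed cores

with ThreadPoolExecutor(max_workers=THREADS) as pool:
    done = list(pool.map(import_record, records))

print(len(done))  # 1000
```

The key points carried over from the article are the one-session-per-thread rule and sizing the pool to the licensed core count.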
Have a nice Import!
4 Replies to “Optimizing Large Import”
I hope you’re doing well. I’m using the contract-based SOAP API to try to import about 25,000 Journal Entry lines from a banking system into a single Acumatica GL batch.
If I try to add all the records at once to the same GL batch, my request times out after a few hours. Since it uses the same GL batch, this solution does not leverage multi-threading.
I’ve also tried adding the 25,000 lines one at a time to a single GL batch, and the requests do not time out, but performance starts decreasing significantly after approximately 3,000 records or so are added to the batch. This process takes several hours to run, and since it uses the same GL batch, this solution does not leverage multi-threading.
I looked into multi-threading as well to import the data into several smaller GL-batches of 5000 lines each and that works without any timeout issues. However, this process still takes about an hour and a half to run. Also, the customer does not accept this multi-batch approach; they want all their daily data in a single GL batch.
25,000 records does not seem like a lot to me, so I wonder if Acumatica’s APIs were not built for this volume of lines in a single transaction. All I’m doing in my code is building the entity info by reading a text file and then calling the PUT method to create the GL batch using that entity with 25,000 line records.
I’ve read a couple of articles about optimizing the APIs (This one being one of them), but they primarily deal with different instances of an entity, as in several different GL batches or several different Stock Items for example. In those cases, multi-threading is a great asset because you can have multiple threads creating multiple “different” GL batches, but multi-threading is not helpful when updating the same GL batch.
Here’s what I’ve read so far:
I’m at a loss here, so any pointers would be greatly appreciated.
I look forward to your response.
Is it possible for you to go a slightly different way – generate the GL lines through code? As per my experiments, it can generate really a lot of lines per hour.
Please check example here:
I think the best way is to create a GI and export the data from there.
But if you have 700,000 records, it still might be slow and quite a lot of data, so it would be good to split it into batches.
Hope it helps!
How about optimizing a large export? For example, I am exporting 700,000 sales orders and I get a request timeout every time; the export to Excel never finishes. What do you suggest I do?