This site uses cookies. Click here to find out more

Scalable and Performant ASP.NET Core Web APIs: Importing Data

April 03, 2018
dotnet

This is our last post in our series of posts on creating performant and scalable web APIs using ASP.NET Core. In this post, we’ll look at high volume data imports …

Data import

Often when a customer adopts our product, they will need to get some existing data into it. Ideally a frontend import facility would just leverage APIs that the rest of the product uses in order to import the data. The problem is that those APIs will likely have been built to POST single records and trying to use those APIs to insert millions of records simply won’t cut it.

Benchmark

Let’s pretend we need to get just 1000 contacts into our system through our API. Our API contoller is below - we are again using Dapper as our data access library. We’ll be posting and inserting the contacts one at a time to start off with.

[Route("api/contacts")]
public class ContactController : Controller
{
    private readonly string _connectionString;
    public ContactController(IConfiguration configuration)
    {
        _connectionString = configuration.GetConnectionString("DefaultConnection");
    }

    [HttpPost]
    public async Task<IActionResult> PostSingle([FromBody] ContactInsert contact)
    {
        using (SqlConnection connection = new SqlConnection(_connectionString))
        {
            connection.Open();

            string sql = @"INSERT INTO Contact(ContactId, Title, FirstName, Surname)
                           VALUES(@ContactId, @Title, @FirstName, @Surname)";

            await connection.ExecuteAsync(sql, contact);
        }
        return Ok(new { ContactId = contact.ContactId });
    }
}

public class ContactInsert
{
    public Guid ContactId { get; set; } = Guid.NewGuid();
    public string Title { get; set; }
    public string FirstName { get; set; }
    public string Surname { get; set; }
}

We’re going to import the data through our API via a console app. We are generating 1000 contacts and pushing them through our API one at a time.

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Enter number of records to import:");
        int recordCount = Convert.ToInt32(Console.ReadLine());

        Console.WriteLine("Importing - please wait ...");
        Import import = new Import();
        long timeTaken = import.SingleRequests(recordCount).GetAwaiter().GetResult();

        Console.WriteLine($"Took {timeTaken}ms");

        Console.WriteLine("");
        Console.WriteLine("Import finished!");
        Console.ReadLine();
    }
}

class Import
{
    public async Task<long> SingleRequests(int recordCount)
    {
        List<Contact> contacts = GetContacts(recordCount);

        Stopwatch stopwatch = Stopwatch.StartNew();

        HttpClient client = new HttpClient();
        client.BaseAddress = new Uri("http://localhost:7454/");
        client.DefaultRequestHeaders.Accept.Clear();
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

        foreach(Contact contact in contacts)
        {
            HttpResponseMessage response = await client.PostAsJsonAsync("api/contacts", contact);
            response.EnsureSuccessStatusCode();
        }

        stopwatch.Stop();

        return stopwatch.ElapsedMilliseconds;
    }

    private List<Contact> GetContacts(int recordCount)
    {
        List<Contact> contacts = new List<Contact>();
        for (int i = 1; i <= recordCount; i++)
        {
            contacts.Add(new Contact()
            {
                Title = "Mrs",
                FirstName = $"FirstName{i}",
                Surname = $"Surname{i}"
            });
        }
        return contacts;
    }
}

class Contact
{
    public string Title { get; set; }
    public string FirstName { get; set; }
    public string Surname { get; set; }
}

If we run the console app we get:

One at a time test

So it took 131 seconds to import just 1000 contacts. Let’s see if we can improve that …

Posting a collection

An improvement would be to be able to POST all 1000 contacts to our API in a single request. The following API action method allows us to do this:

[HttpPost("collection")]
public async Task<IActionResult> PostCollection([FromBody] IEnumerable<ContactInsert> contacts)
{
    List<Guid> contactIds = new List<Guid>();
    using (SqlConnection connection = new SqlConnection(_connectionString))
    {
        connection.Open();

        string sql = @"INSERT INTO Contact(ContactId, Title, FirstName, Surname)
                       VALUES(@ContactId, @Title, @FirstName, @Surname)";
        foreach(ContactInsert contact in contacts)
        {
            await connection.ExecuteAsync(sql, contact);
            contactIds.Add(contact.ContactId);
        }
    }

    return Ok(contactIds);
}

Here are the changes that allow our console app to use this new method of importing:

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Enter number of records to import:");
        int recordCount = Convert.ToInt32(Console.ReadLine());

        Console.WriteLine("Importing - please wait ...");
        Import import = new Import();
        //long timeTaken = import.SingleRequests(recordCount).GetAwaiter().GetResult();
        long timeTaken = import.Collection(recordCount).GetAwaiter().GetResult();
        Console.WriteLine($"Took {timeTaken}ms");

        Console.WriteLine("");
        Console.WriteLine("Import finished!");
        Console.ReadLine();
    }
}

class Import
{
    ...

    public async Task<long> Collection(int recordCount)    {        List<Contact> contacts = GetContacts(recordCount);        Stopwatch stopwatch = Stopwatch.StartNew();        HttpClient client = new HttpClient();        client.BaseAddress = new Uri("http://localhost:7454/");        client.DefaultRequestHeaders.Accept.Clear();        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));        HttpResponseMessage response = await client.PostAsJsonAsync("api/contacts/collection", contacts);        response.EnsureSuccessStatusCode();        stopwatch.Stop();        return stopwatch.ElapsedMilliseconds;    }
    ...
}

This time we are able to load 1000 contacts in 13 seconds.

Batch test

A pretty good improvement but can we do any better?

Inserting in bulk

Although our API takes a collection of contacts in the request, it still inserts them one by one into the database. If we insert them into the database using SqlBulkCopy we should be able to make the process even faster.

Here’s a new action method that does this:

[HttpPost("bulk")]
public IActionResult PostBulk([FromBody] IEnumerable<ContactInsert> contacts)
{
    List<Guid> contactIds = new List<Guid>();
    using (SqlConnection connection = new SqlConnection(_connectionString))
    {
        connection.Open();

        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.Contact";
            bulkCopy.ColumnMappings.Add("ContactId", "ContactId");
            bulkCopy.ColumnMappings.Add("Title", "Title");
            bulkCopy.ColumnMappings.Add("FirstName", "FirstName");
            bulkCopy.ColumnMappings.Add("Surname", "Surname");

            DataTable dt = new DataTable();
            dt.Columns.Add(new DataColumn("ContactId", typeof(Guid)));
            dt.Columns.Add(new DataColumn("Title"));
            dt.Columns.Add(new DataColumn("FirstName"));
            dt.Columns.Add(new DataColumn("Surname"));
            foreach (ContactInsert contact in contacts)
            {
                DataRow row = dt.NewRow();
                row["ContactId"] = contact.ContactId;
                row["Title"] = contact.Title;
                row["FirstName"] = contact.FirstName;
                row["Surname"] = contact.Surname;
                dt.Rows.Add(row);
            }
            bulkCopy.WriteToServer(dt);
        }
    }
    return Ok(contactIds);
}

Here are the changes that allow our console app to use this new method of importing:

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Enter number of records to import:");
        int recordCount = Convert.ToInt32(Console.ReadLine());

        Console.WriteLine("Importing - please wait ...");
        Import import = new Import();
        //long timeTaken = import.SingleRequests(recordCount).GetAwaiter().GetResult();
        //long timeTaken = import.Collection(recordCount).GetAwaiter().GetResult();
        long timeTaken = import.Bulk(recordCount).GetAwaiter().GetResult();
        Console.WriteLine($"Took {timeTaken}ms");

        Console.WriteLine("");
        Console.WriteLine("Import finished!");
        Console.ReadLine();
    }
}

class Import
{
    ...
    public async Task<long> Bulk(int recordCount)    {        List<Contact> contacts = GetContacts(recordCount);        Stopwatch stopwatch = Stopwatch.StartNew();        HttpClient client = new HttpClient();        client.BaseAddress = new Uri("http://localhost:7454/");        client.DefaultRequestHeaders.Accept.Clear();        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));        HttpResponseMessage response = await client.PostAsJsonAsync("api/contacts/bulk", contacts);        response.EnsureSuccessStatusCode();        stopwatch.Stop();        return stopwatch.ElapsedMilliseconds;    }

    ...
}

If we try this method, it takes just over 1 second to do the import:

Batch test

Conclusion

If our system needs to cope with large volume imports - leveraging single record APIs won’t cut it for large data volumes. It’s well worth creating specific APIs for importing large volumes of data that take collections of data in the request and then bulk insert into the database.


Comments

David McClelland April 6, 2018

I really enjoyed this series! Thinking about importing data as fast as possible brought to mind another question: what would be the best way to export data as fast as possible? For example, if I need to provide a large resultset, should I lean toward paging or is there a way to stream the result back to the client – kind of like downloading a large file? My particular use case is to return results from my API in response to a query (start date, end date), and that query runs against a SQL server database which hosts many millions of rows.

Carl April 6, 2018

Thanks David. I usually implement both filtering and paging on GET requests on resources with lots of data. It’s a bit of work to implement but if your resource contains millions of records then it’s worth it to minimise the memory your API will consume.


Toby April 6, 2018

Nice article. I would find it really handy to see an example of a bulk upsert with dapper to go with the bulk insert.

Carl April 6, 2018

Thanks Toby. Bulk upserting is a little more involved … I usually bulk insert the data into an intermediate table and then execute an additional SQL statement to the inserts / updates on the live data. If there’s enough interest, I’ll do a future post on this.


Danniel May 27, 2018

Best article I read so far about API import. Well done!


Sankar March 29, 2019

Very good article. I would like to see a Bulk Upsert example as well.


Chandrakant Patil April 23, 2019

Very nice explained.


Devesh Sharma August 8, 2019

Brilliant Article. Many things I was already using but lot I get to know by this Post. Very Informative for any Developer.

Want more content like this?

Subscribe to receive notifications on new blog posts and courses

Required
© Carl Rippon