What I'm going to demonstrate here applies not just to SQLite, but to all ADO.NET providers. I'm going to demonstrate bulk inserting data using a DbCommandBuilder, DbDataAdapter and a DataTable, as well as direct bulk inserting data using a hand-crafted parameterized insert. There are a few tricks along the way that will speed up your inserts tremendously, so don't skip!
The following examples assume a table has been created using the following command:
CREATE TABLE TestCase (ID INTEGER PRIMARY KEY, MyValue INTEGER)
The DataTable/DbDataAdapter/DbCommandBuilder Method
This method is not the fastest method of bulk inserting data, but it is commonly used and has its place, so I will demonstrate it here and give some tips on how to cut the insert time in half!
First, it is a common misconception that the DbCommandBuilder is unsuitable for production use. I disagree, and think for simple inserts it is a great way to avoid generating your own SQL. Since it is a jack of all trades, it therefore is a master of none, and may not be suitable for every circumstance.
A common use of the DbCommandBuilder might be as follows:
internal static void CommandBuilderTest1(DbProviderFactory fact, DbConnection cnn){ using (DbTransaction dbTrans = cnn.BeginTransaction()) { using (DbDataAdapter adp = fact.CreateDataAdapter()) { using (DbCommand cmd = cnn.CreateCommand()) { cmd.Transaction = dbTrans; cmd.CommandText = "SELECT * FROM TestCase WHERE 1 = 2"; adp.SelectCommand = cmd; using (DbCommandBuilder bld = fact.CreateCommandBuilder()) { bld.DataAdapter = adp; using (DataTable tbl = new DataTable()) { adp.Fill(tbl); for (int n = 0; n < 10000; n++) { DataRow row = tbl.NewRow(); row[1] = n; tbl.Rows.Add(row); } adp.Update(tbl); dbTrans.Commit(); } } } } } } |
That code will execute on my machine in roughly 1 second. Notice the only SQL I specified was the Select Command. The DbCommandBuilder automatically generated an insert statement behind the scenes. At first you might be impressed. 10,000 rows in 1 second doesn't seem too shabby, does it? What if, by a mere 4 lines of code, I was able to cut that time down to about 300ms? I'll demonstrate:
internal static void CommandBuilderTest2(DbProviderFactory fact, DbConnection cnn){ using (DbTransaction dbTrans = cnn.BeginTransaction()) { using (DbDataAdapter adp = fact.CreateDataAdapter()) { using (DbCommand cmd = cnn.CreateCommand()) { cmd.Transaction = dbTrans; cmd.CommandText = "SELECT * FROM TestCase WHERE 1 = 2"; adp.SelectCommand = cmd; using (DbCommandBuilder bld = fact.CreateCommandBuilder()) { bld.DataAdapter = adp; // 3 lines of new code (+1 closing brace). We create a clone of the insert command // and followup by setting the DbCommandBuilder's DataAdapter property to null using (adp.InsertCommand = (DbCommand)((ICloneable)bld.GetInsertCommand()).Clone()) { bld.DataAdapter = null; using (DataTable tbl = new DataTable()) { adp.Fill(tbl); for (int n = 0; n < 10000; n++) { DataRow row = tbl.NewRow(); row[1] = n; tbl.Rows.Add(row); } adp.Update(tbl); dbTrans.Commit(); } } } } } } } |
The above code now executes on my machine in around 300ms. Why? What's the significance of what I've done here? It's deviously subtle, but very important and goes to the core of understanding the underpinnings of ADO.NET. You see, the DbCommandBuilder tries to be as convenient as possible for you, and often to your own detriment. Once a DbCommandBuilder has a DataAdapter, the DbCommandBuilder will connect to the DataAdapter's OnUpdating event, and if you happen to generate an UPDATE or an INSERT or a DELETE, and the DataAdapter doesn't have those commands, DbCommandBuilder will automatically generate them for the DataAdapter.
Why is that important? Because for every insert we do in the test, we're raising an event that the DbCommandBuilder processes. That's 10,000 list lookups and 10,000 delegate callbacks.
Bottom line: If you already know you're going to do an insert, update or delete, create your DbCommandBuilder, clone the commands and assign them to the DataAdapter, then destroy the DbCommandBuilder. Your inserts updates and deletes will love you for it.
Lets modify example 2 slightly so we can retrieve the primary key on the insert. This is a great little trick, and gets around a design issue with DbCommandBuilder (I'll explain after the sample):
internal static void CommandBuilderTest3(DbProviderFactory fact, DbConnection cnn){ using (DbTransaction dbTrans = cnn.BeginTransaction()) { using (DbDataAdapter adp = fact.CreateDataAdapter()) { using (DbCommand cmd = cnn.CreateCommand()) { cmd.Transaction = dbTrans; cmd.CommandText = "SELECT * FROM TestCase WHERE 1 = 2"; adp.SelectCommand = cmd; using (DbCommandBuilder bld = fact.CreateCommandBuilder()) { bld.DataAdapter = adp; using (adp.InsertCommand = (DbCommand)((ICloneable)bld.GetInsertCommand()).Clone()) { // These two lines modify the insert command to return the newly inserted rowid adp.InsertCommand.CommandText += ";SELECT last_insert_rowid() AS [ID]"; adp.InsertCommand.UpdatedRowSource = UpdateRowSource.FirstReturnedRecord; bld.DataAdapter = null; using (DataTable tbl = new DataTable()) { adp.Fill(tbl); for (int n = 0; n < 10000; n++) { DataRow row = tbl.NewRow(); row[1] = n; tbl.Rows.Add(row); } adp.Update(tbl); dbTrans.Commit(); } } } } } } } |
Because we cloned the DbCommandBuilder's InsertCommand, we're free to modify it as we desire. If you tried this using the default InsertCommand that DbCommandBuilder generates, you'd find that the DbCommandBuilder would've reverted it back to its original state behind the scenes. See this MSDN article describing the issue.
Fastest universal way to insert data using standard ADO.NET constructs
Now that the slow stuff is out of the way, lets talk about some hardcore bulk loading. Aside from SqlBulkCopy and specialized constructs involving ISAM or custom bulk insert classes from other providers, there is simply no beating the raw power of ExecuteNonQuery() on a parameterized INSERT statement. I will demonstrate:
internal static void FastInsertMany(DbConnection cnn){ using (DbTransaction dbTrans = cnn.BeginTransaction()) { using (DbCommand cmd = cnn.CreateCommand()) { cmd.CommandText = "INSERT INTO TestCase(MyValue) VALUES(?)"; DbParameter Field1 = cmd.CreateParameter(); cmd.Parameters.Add(Field1); for (int n = 0; n < 100000; n++) { Field1.Value = n + 100000; cmd.ExecuteNonQuery(); } } dbTrans.Commit(); } } |
Simple, elegant, clean. 100,000 inserts on my machine in 1.4 seconds. Yes, that's 10 times more inserts than the DataAdapter/DbCommandBuilder/DataTable method, and at very nearly the same speed as example 1!
Do not, under any circumstances, bulk insert data by building your own command text over and over and over again. That is the surest way to drag performance into the dirt. Build your CommandText one time, build your parameters one time, and set their values over and over and over instead. SQLite runs much faster, SQL Server runs much faster, and every database known to mankind will run much faster with a prepared, parameterized insert statement.
Enjoy your inserting!
Robert