Sandboxing is one of those widely used I.T. buzz words. In some circles of software developers this word is on par with other common business buzz terms like “Synergy” or “Innovation”.
If you are a developer then you have no doubt heard this term used in multiple facets of software engineering. It is often used in describing a development vs test vs staging vs production environment. It is also used to describe generic isolation. Isolation takes many forms such as networking (subsets of users can only access resources on Network A and not Network B), application execution (application can only access a subset of assigned computer resources), or server side development (which is the topic of this blog). Some of us resisted this idea, but in the end it is hard to argue against it. It’s one thing to sandbox application code, but what about server side?
A very exciting niche in software engineering is server side development. You get so many opportunities to work with many different technologies, and you also learn SO MUCH about integrating systems. One of the many exhilarating aspects of server side development is controlling the data! In today’s development world applications are often simply a shell to display data that comes from a server. This means that server side software development is more important now-a-days than it has ever been previously. People are so spoiled with peek performance from companies like Google that it leaves very little tolerance for applications to lag! The modern view of development is very server focused. The client side is important too, don’t get me wrong, but client software is heavily depend on the data resulted from the server. Any inefficiencies on the server will show on the client, and users will grow tired of significant lag very quickly!
Highly integrated applications, like Facebook, Twitter, etc have more complex methods of dealing with massive amounts of data. But a more simple application can use simple server side sandboxing techniques to maximize performance. If your application is company based, and the companies do not need to have highly integrated interactions with each other then one simple method of increasing server performance is to sandbox each company in its own database! This can be done automatically and it can be part of the account creation process.
Many applications store user credentials. Let’s say you have a table called [authorize], and it contains basic information about your users such as Company foreign key, User primary key, first name, last name and a SHA1 hash of their password. This table is queried when users request a login to gain access to the application. The table looks like this:
PK_User |
FK_Company |
FirstName |
LastName |
1 |
1 |
Bill |
Simpson |
2 |
1 |
Allan |
Smith |
3 |
1 |
Kim |
Itzi |
4 |
1 |
Bob |
Bobson |
5 |
1 |
John |
McClain |
6 |
1 |
Liz |
Sampson |
7 |
1 |
Iris |
Bliss |
8 |
1 |
Chris |
Rossfe |
9 |
1 |
Randy |
Moss |
10 |
1 |
Robert |
Griffin |
Looks like a simple table right? This could easily be one of dozens (or even hundreds) of tables in a database that contains relationships to different database objects all over the place! Let’s say you have an application that has many companies sharing the same data source. After another company joins your [authorize] table will start to look like this:
PK_User |
FK_Company |
FirstName |
LastName |
1 |
1 |
Bill |
Simpson |
2 |
1 |
Allan |
Smith |
3 |
1 |
Kim |
Itzi |
4 |
1 |
Bob |
Bobson |
5 |
1 |
John |
McClain |
6 |
1 |
Liz |
Sampson |
7 |
1 |
Iris |
Bliss |
8 |
1 |
Chris |
Rossfe |
9 |
1 |
Randy |
Moss |
10 |
1 |
Robert |
Griffin |
11 |
2 |
Al |
Rotsman |
12 |
2 |
Bethany |
Bell |
13 |
2 |
Kwiki |
Island |
14 |
2 |
John |
Riggins |
15 |
2 |
Shazam |
Shwing |
16 |
2 |
Zack |
Groff |
17 |
2 |
Emily |
Deckerson |
18 |
2 |
Linda |
Hamlet |
19 |
2 |
Yolanda |
Quitzb |
20 |
2 |
Ravindra |
Rumar |
You can see that there are two different companies using this table and each company has 10 users. This is not big deal, right? Well., what happens if your application gets really popular and over night 1,000 different companies sign up. What if each company has 10,000 employees? Now this table contains 10,000,000 records. If this is only one table that contains data (which inevitably it will be), then that means each table starts to get overloaded with tons of data! Each time a new company joins and adds their employees the side effect is that every other company will experience more and more lag as it takes longer and longer to extract data from the database.
Yikes! This is a real problem. After you analyze your software you realize that the cost of rewriting it will severely affect your profits (because time is money), so you make the logical choice: increase the hardware! GIVE IT MORE RAM! GIVE IT MORE MEMORY! GIVE IT MORE SPEED!!!
Then, you get more and more clients… Your hardware cost cuts deep into your profits and you come to the realization that you need to increase your prices to make a profit. Your customers won’t like that, will they? Internet customers are used to either free or very low cost services so a price increase could cost you your entire business!!!!!! At this point, you may be forced to consider a rewrite.
Don’t kill yourself over this! If your application is focused on different companies, and the companies are not highly integrated, then consider this idea…
(RootDatabase)
[Company]
PK_Company | Name | DatabaseName
Inside of a database called [RootDatabase] (in this example) there is only one table. It contains basic information about the companies plus and a database name. In your server side code you can create a database when a company creates an account and all the tables in that database belong only to that company. This keeps the root database very light weight because there is only one simple record for each company.
After 2 companies sign up your database will look like this:
(RootDatabase)
[Company]
PK_Company |
Name |
DatabaseName |
1 |
Company A |
Database1 |
2 |
Company B |
Database2 |
When each company creates an account, the two databases (CompanyDatabase1 & CompanyDatabase2) will be created automatically.
Your database server will look like this:
(RootDatabase)
[Company]
(CompanyDatabase1)
[Authorize]
[UserBlog]
… other tables
(CompanyDatabase2)
[Authorize]
[UserBlog]
… other tables
Neat and organized!
How exactly do you do this? Well, let’s say you are using PHP/MySQL on your server… part of the account creation process will run code like this:
$conn = mysql_connect(“localhost”, “mysqlUsername”, “mysqlPassword”);
$result = mysql_query(“SELECT COUNT(PK_Company) FROM Company”);
$companyCount = mysql_fetch_row($result)[0];
$newCompanyDatabaseName = “CompanyDatabase” . $companyCount;
mysql_query(“CREATE DATABASE ” . $newCompanyDatabaseName);
mysql_query(“INSERT INTO `Company` (Name,DatabaseName) VALUES (’$companyName’,’$newCompanyDatabaseName’)”);
The above code is incomplete, does not protect against SQL Injection, and I didn’t actually test the above code, but you get the idea! It’s very easy to do this.
As you can see… sandboxing your server side is a fantastic idea when you are dealing with massive amounts of data and you want to make sure your clients get the best performance possible! Now, the size and activity of one company has significantly less impact on others. Even simple things like removing the need to check a company foreign key will save valuable milliseconds or seconds or minutes depending on the table capacity! If you are selecting all of the blogs a company has submitted, then this query…
SELECT * FROM UserBlog WHERE FK_Company=1
Will technically run slower than this one…
SELECT * FROM UserBlog
… assuming the table is filled with hundreds or thousands of different company data!
Is there a limitation to this approach? Well, yes… Many DBMS application have a maximum number of databases that can be created. For example, SQL Server and MySQL typically max out at 32,000ish databases. But my view is… If you have 32,000 paying customers then that’s a good problem to have, and with the money you’re making you can expand to an array of SQL Server instances!
My guess is that if you are a database purist you hate this idea. And that’s cool if you do! I have seen this concept work before and I admit I do like the idea. I’d love to hear your thoughts, comments, complaints, rebuttals, etc!
Happy Coding!