In my previous post “How I Introduced Multitenant Architecture In my Web Application” I discussed database isolation as one of the main requirements of a multitenant SaaS application. So what are the reasons for data isolation? They are fairly obvious, but let me lay them down.
- Security – Users do not want their data mingled with their competitors’ data.
- Scalability – There are many users, for example a billion users in my webtop application :). To deal with a gazillion bytes of data in a table and keep the database sane, you need to chop the table rows into smaller chunks.
- A combination of the other two – Most users don’t mind their data coexisting with other users’ data, but there are still some fussy customers.
If you feel I left out any points, please leave a comment or two and I will update the post.
Whatever your reason, once you decide to separate user data you have to think about how to achieve it. Once again I will lay down the approaches (OK, I have again stolen the points from Google, big deal.)
- Separate database instance – Each tenant has its own database instance.
- Separate schema but same database instance – Each tenant has its own schema, or set of tables, in a single database instance.
- Separate rows – The database instance and schema are the same, but each tenant’s rows are partitioned using some kind of discriminator (tenant_id for example).
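To make the three approaches concrete, here is a minimal sketch of where a tenant's data ends up under each one. Everything in it is my own illustration: the `TenantLocation` class, the PostgreSQL-style JDBC URLs, the `example.internal` host names and the `tenant_` schema prefix are all assumptions, not something taken from a real setup.

```java
/** Hypothetical helper: where a tenant's data lives under each isolation approach. */
final class TenantLocation {
    enum Strategy { SEPARATE_DATABASE, SEPARATE_SCHEMA, SHARED_ROWS }

    // Assumed URL of the single shared database instance (illustrative only).
    static final String SHARED_URL = "jdbc:postgresql://db.example.internal/app";

    final String jdbcUrl;   // which database instance to connect to
    final String schema;    // which schema to use inside that instance
    final String rowFilter; // extra WHERE condition, or null when none is needed

    TenantLocation(String jdbcUrl, String schema, String rowFilter) {
        this.jdbcUrl = jdbcUrl;
        this.schema = schema;
        this.rowFilter = rowFilter;
    }

    static TenantLocation resolve(Strategy strategy, String tenantId) {
        // Only accept simple ids, since they end up inside host and schema names.
        if (tenantId == null || !tenantId.matches("[a-z0-9]+")) {
            throw new IllegalArgumentException("unexpected tenant id: " + tenantId);
        }
        switch (strategy) {
            case SEPARATE_DATABASE:
                // One database instance per tenant: the tenant picks the server.
                return new TenantLocation(
                        "jdbc:postgresql://db-" + tenantId + ".example.internal/app", "public", null);
            case SEPARATE_SCHEMA:
                // One shared instance: the tenant picks the schema.
                return new TenantLocation(SHARED_URL, "tenant_" + tenantId, null);
            case SHARED_ROWS:
                // Shared instance and schema: every query must filter on the discriminator.
                return new TenantLocation(SHARED_URL, "public", "tenant_id = ?");
            default:
                throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            TenantLocation loc = resolve(s, "acme");
            System.out.println(s + ": url=" + loc.jdbcUrl
                    + ", schema=" + loc.schema + ", filter=" + loc.rowFilter);
        }
    }
}
```

Notice that only the shared-rows approach needs a filter on every query. With the other two, isolation comes from where you connect rather than from what you remember to put in the WHERE clause.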
Each approach has its own pros and cons. Kindly check the table compiled in the post “SaaS – Multi-Tenant Database Design Options”. It compares the approaches under different user requirements.
If you had to decide right now, you could easily choose a separate database instance for each tenant, but you know that it could prove costly in the long run and add to the maintenance and administrative headache. So what to do now? Don’t worry, there is a middle path that helps you counter all of this. Can you tell me what it is? No? Not to worry, I am here to the rescue. Here is my fourth multitenant database approach.
Sharding – http://en.wikipedia.org/wiki/Sharding
Sharding is a mechanism by which the data of a table is horizontally partitioned across multiple database instances. Each independent database partition is called a shard. Sharding is not the same as plain horizontal partitioning. In horizontal partitioning, the rows of a single table are split into partitions that all live in a single database instance. In sharding, the rows are split across multiple database servers. The shards are not aware of each other and work independently of each other. They can sit on any database server in any data center anywhere in the world. So now let us look at the advantages of this approach.
- The total number of rows in each table is smaller now, and so is the size of each index, so query performance improves.
- With sharding, not only is the problematic table spread across multiple databases, the other tables are also replicated across those database instances, so the overall load on each database is reduced and performance improves. In other words, we have a database cluster in which each instance serves its own bunch of clients and can be scaled in its own way.
- We can also give each tenant its own shard using this approach.
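Here is a rough sketch of the routing idea: given a tenant, decide which shard holds its data. This is not how Hibernate Shards does it. The `ShardRouter` class, the hash-modulo rule, the `pin` method and the JDBC URLs are all made up for illustration. Tenants are spread over the shards by hashing their id, and a fussy tenant can be pinned to a shard of its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical router that maps a tenant id to one of several identical shards. */
final class ShardRouter {
    private final List<String> shardUrls;                        // one JDBC URL per shard
    private final Map<String, Integer> pinned = new HashMap<>(); // explicit tenant -> shard

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardUrls = new ArrayList<>(shardUrls);
    }

    /** Pins a tenant to a specific shard, e.g. to give one tenant a shard of its own. */
    void pin(String tenantId, int shardIndex) {
        if (shardIndex < 0 || shardIndex >= shardUrls.size()) {
            throw new IllegalArgumentException("no such shard: " + shardIndex);
        }
        pinned.put(tenantId, shardIndex);
    }

    /** Pinned tenants go to their shard; everyone else is spread by hash. */
    int shardIndexFor(String tenantId) {
        Integer explicit = pinned.get(tenantId);
        if (explicit != null) {
            return explicit;
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(tenantId.hashCode(), shardUrls.size());
    }

    String shardUrlFor(String tenantId) {
        return shardUrls.get(shardIndexFor(tenantId));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(Arrays.asList(
                "jdbc:postgresql://shard0.example.internal/app",
                "jdbc:postgresql://shard1.example.internal/app",
                "jdbc:postgresql://shard2.example.internal/app"));
        router.pin("bigcorp", 2);
        for (String tenant : new String[] {"acme", "globex", "bigcorp"}) {
            System.out.println(tenant + " -> " + router.shardUrlFor(tenant));
        }
    }
}
```

One thing to keep in mind with plain hash-modulo routing: adding a shard changes the result for most tenants, so existing data would have to be moved. A lookup table like the `pinned` map avoids that, at the cost of having to store and maintain the mapping.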
OK, since we have the best approach (I know the NoSQL, Cassandra etc. lovers will be gunning for my head, but let me warn you, I am very strong, I have super powers), let us put our heads down and figure out how to implement sharding in your own cool application. But first let me pen down the approach once more:
A multitenant database approach that horizontally partitions data across multiple database instances, each having an identical schema.
I had some portion of my webtop already written, and I had used Hibernate as my ORM tool. It took a long time to get my database layer designed the way I liked, so I decided to keep it. Hence I decided to give Hibernate Shards a try. Hibernate Shards is not in active development, but it is still a very robust piece of code. I thought it was better to use it than to write a utility from scratch. I did a proof of concept by splitting data across three shards, each installed on a separate database server in my office. I really loved what I saw, so I will share it with my fans ;).
Just FYI – Hibernate 4 will be released in the next few months with full multitenant support built into its core API. https://hibernate.onjira.com/browse/HHH-5697 . I am happy, as I am sure I can easily refactor my shards code to use the latest API.