What is a Database:
A database is a system to efficiently stores, retrieves, and searches data. A database can be imagined as an organised way of storing the data of any application.
- easy to retrieve, search, read, and write data
- Low Latency
- Data recovery
- Flexible Schema*
*not all databases provide it.
- to organise the data
- to store the data
- to retrieve the data
- to have a backup of data
- low latency search
- RDBMS: MySQL, Oracle, etc.
- NoSQL: MongoDB, DynamoDB
Type of database
- Table based database consists of rows and columns.
- Table based database has a primary key to link the data subsets or partition the database
- Best fit for the vertical scaling
- The document-based database consists of the collection, of documents, and fields
- The data is stored as key-value pair in fields.
- Best fit for the horizontal scaling
- The graph-based database store the data as nodes and relations
- It is schema-less
- Easy to scale
RDBMS vs NoSQL
|RDBMS||NoSQL (Not Only SQL)|
|Example||MySQL, Oracle, Maria, etc.||mongo, couch DB, Dynamo DB|
|Design||tables||document, key-value, graphs|
|Ease of scaling||hard||easy|
|Flexible||not so flexible||very flexible|
|Schema||need schema||no need for a schema|
|DBA||required||only if, DB is big|
|Dynamic (doing any change at any time)||no||yes|
|Language||Query language||MQL (mongo query language)|
|Integrated Caching||Needs to implement separately||inbuilt|
|Security||Yes||not as MySQL|
|Usecases||Where the ACID properties are required eg: financial banks||Where ACID is not the priority|
|Performance||Faster at selecting the number of data||Faster in inserting and updating the data. Writes are faster|
|When to use||Best fit for data is suitable for table/rows||Best fit for unstructured data|
|Best suitable for the small data set||Best suitable for the large data set|
|–||high availability during unstable environments|
|Strong dependency on multi-row transactions||–|
For any database, we communicate with terms of ‘transactions’. How many transactions are happening?
eg: There are A and B accounts with Rs 500 and Rs. 1000. A will transfer the 200 to the B account and increase the balance of B 1200.
Here, we have 3 steps:
- Account A transfer 200 Rs
- Account B will receive the amount from A
- Account B’s balance will increase
The above 3 steps are one transaction.
Atomicity means that all the transactions should be either 100% successful or fail. There should not be any partial state. Basically All or none.
eg: In the example of transactions, if 1 step will fail then the whole transaction should be failed. For atomicity, the whole transaction should be 100% done
The database must be consistent before and after the transaction always. Concurrency should not impact the database integrity.
eg: If a user has deposited 100 in an account and it failed. The database should reflect the last value to keep the database consistent.
Multiple transactions occur independently without interference. In the end, the database should have the data after all the transactions.
eg: Bank balance is 600. User A withdraws the 100 and user B is adding 50. For user A the balance would be 600 and after withdrawing it would be 500. For user B the balance is 500 and on that 50 will get added and the new balance would be 550. It would look like the transactions are running sequentially.
The changes made by a transaction should persist.
eg: A user has done a transaction of Rs 200. If there is an outage still the transaction success/failure should be persisted in the database. This can be done by the WALs (Write-Ahead-logs).
Data Sharding is a way of distributing a single database across multiple databases which can be stored across multiple machines. It is a way of reducing the load to a single database.
Data sharding is a kind of ‘Horizontal scaling’. While doing sharding, the data is distributed into multiple machines known as ‘physical shards’. Each shard would be ‘autonomous’ – they don’t share any data or compute resources. Sharding can be implemented at the application level or a few databases supports it natively.
Advantages of sharding:
- Fast search
- Avoiding SPoF
- Easy to upgrade horizontal
The disadvantages of sharding:
- Unbalanced shard
- Not natively supported by many databases
- Shard architecture is hard
- Unsharding is difficult
Techniques for Sharding:
The data partition is a way of distributing the data into multiple subsets of data under a single instance. MySQL supports the partition. There are different types of partitions.
The way to link the different tables together is by the primary key.
Eg: There is data of users. Instead of storing whole data in one table. We can use data partition to avoid making one table too big, avoid SPoF, and easily maintain the database. We can break it into multiple databases.
- Query performance
- Data availability
- Data manageability
- Improve security
Data Partition methods:
- Horizontal partition (Database Sharding)
- Vertical partition
- Functional partition
Difference between data-partition and data-sharding.
Data sharding and partition goal is the same to distribute the data. The data partition is the backbone of the distributed system architecture. Where Data-partition focuses on distributing the database into multiple data subsets under one single instance. Data sharding is about distributing the data into multiple machines.
BA: Basically Available
S: Soft State
E: Eventually consistent
- MySQL will be a great fit for financial transactions where ACID compliance is required
- Data is structured
- Horizontal scaling is not the requirement
- Search is faster
- Where ACID compliance is not the priority. Eg: blog application, e-commerce (product)
- Data is unstructured. Eg: products
- Read and Writes are higher. Eg: Blog where we will be creating blogs and reading them
- Looking for horizontal scaling of the Database.
Like the blog? Follow me on [Twitter],(https://twitter.com/hellonehha)