5. Introduction to Database

What is a Database:

A database is a system to efficiently stores, retrieves, and searches data. A database can be imagined as an organised way of storing the data of any application.


  • Organized
  • Availability
  • easy to retrieve, search, read, and write data
  • Low Latency
  • Data recovery
  • ACID*
  • Flexible Schema*

*not all databases provide it.

Why Database

  • to organise the data
  • to store the data
  • to retrieve the data
  • to have a backup of data
  • low latency search

Popular database

  • RDBMS: MySQL, Oracle, etc.
  • NoSQL: MongoDB, DynamoDB

Type of database

  • Table
    • Table based database consists of rows and columns.
    • Table based database has a primary key to link the data subsets or partition the database
    • Best fit for the vertical scaling
    • Schema-based
  • Document
    • The document-based database consists of the collection, of documents, and fields
    • The data is stored as key-value pair in fields.
    • Best fit for the horizontal scaling
    • Schema-less
  • Graph
    • The graph-based database store the data as nodes and relations
    • It is schema-less
    • Easy to scale


ExampleMySQL, Oracle, Maria, etc.mongo, couch DB, Dynamo DB
Designtablesdocument, key-value, graphs
Ease of scalinghardeasy
Flexiblenot so flexiblevery flexible
Schemaneed schemano need for a schema
Primary keyYesno
DBArequiredonly if, DB is big
Dynamic (doing any change at any time)noyes
LanguageQuery languageMQL (mongo query language)
Integrated CachingNeeds to implement separatelyinbuilt
SecurityYesnot as MySQL
Store proceduresYesNo
UsecasesWhere the ACID properties are required eg: financial banksWhere ACID is not the priority
PerformanceFaster at selecting the number of dataFaster in inserting and updating the data. Writes are faster
When to useBest fit for data is suitable for table/rowsBest fit for unstructured data
Best suitable for the small data setBest suitable for the large data set
high availability during unstable environments
Strong dependency on multi-row transactions

Important concepts:

1) Transactions

For any database, we communicate with terms of ‘transactions’. How many transactions are happening?

eg: There are A and B accounts with Rs 500 and Rs. 1000. A will transfer the 200 to the B account and increase the balance of B 1200.

Here, we have 3 steps:

  1. Account A transfer 200 Rs
  2. Account B will receive the amount from A
  3. Account B’s balance will increase

The above 3 steps are one transaction.


a. Atomicity

Atomicity means that all the transactions should be either 100% successful or fail. There should not be any partial state. Basically All or none.

eg: In the example of transactions, if 1 step will fail then the whole transaction should be failed. For atomicity, the whole transaction should be 100% done

b. Consistency

The database must be consistent before and after the transaction always. Concurrency should not impact the database integrity.

eg: If a user has deposited 100 in an account and it failed. The database should reflect the last value to keep the database consistent.

c. Isolation

Multiple transactions occur independently without interference. In the end, the database should have the data after all the transactions.

eg: Bank balance is 600. User A withdraws the 100 and user B is adding 50. For user A the balance would be 600 and after withdrawing it would be 500. For user B the balance is 500 and on that 50 will get added and the new balance would be 550. It would look like the transactions are running sequentially.

d. Durability

The changes made by a transaction should persist.

eg: A user has done a transaction of Rs 200. If there is an outage still the transaction success/failure should be persisted in the database. This can be done by the WALs (Write-Ahead-logs).

3) Sharding

Data Sharding is a way of distributing a single database across multiple databases which can be stored across multiple machines. It is a way of reducing the load to a single database.

Data sharding is a kind of ‘Horizontal scaling’. While doing sharding, the data is distributed into multiple machines known as ‘physical shards’. Each shard would be ‘autonomous’ – they don’t share any data or compute resources. Sharding can be implemented at the application level or a few databases supports it natively.

Advantages of sharding:

  • Fast search
  • Avoiding SPoF
  • Easy to upgrade horizontal

The disadvantages of sharding:

  • Unbalanced shard
  • Not natively supported by many databases
  • Shard architecture is hard
  • Unsharding is difficult

Techniques for Sharding:

  • Key-based
  • Range-based
  • Directory-based

4) Partition

The data partition is a way of distributing the data into multiple subsets of data under a single instance. MySQL supports the partition. There are different types of partitions.

The way to link the different tables together is by the primary key.

Eg: There is data of users. Instead of storing whole data in one table. We can use data partition to avoid making one table too big, avoid SPoF, and easily maintain the database. We can break it into multiple databases.


  • Query performance
  • Data availability
  • Performance
  • Data manageability
  • Improve security

Data Partition methods:

  • Horizontal partition (Database Sharding)
  • Vertical partition
  • Functional partition


  • Range-based
  • Hash-based
  • List-based

Difference between data-partition and data-sharding.

Data sharding and partition goal is the same to distribute the data. The data partition is the backbone of the distributed system architecture. Where Data-partition focuses on distributing the database into multiple data subsets under one single instance. Data sharding is about distributing the data into multiple machines.


BA: Basically Available

S: Soft State

E: Eventually consistent

Use Cases:


  • MySQL will be a great fit for financial transactions where ACID compliance is required
  • Data is structured
  • Horizontal scaling is not the requirement
  • Search is faster


  • Where ACID compliance is not the priority. Eg: blog application, e-commerce (product)
  • Data is unstructured. Eg: products
  • Read and Writes are higher. Eg: Blog where we will be creating blogs and reading them
  • Looking for horizontal scaling of the Database.

A big thanks to folks who reviewed this and gave valuable feedback: Ahsan, Tauseef, Naman,Anand,Uddeshya

Like the blog? Follow me on [Twitter],(https://twitter.com/hellonehha)

Leave a Reply