Data Analysis using SQL – Basics of SQL – Part 1

No Comments
Data anaylysis using SQL part 1

An introduction to RDBMS and SQL

There are various ways to arrange and manage data in a database. The most common is to arrange the data in tables, which is similar to an Excel file. The table contains multiple columns and rows.

A Database is a collection of related data. But a question still remains unanswered — how do you access this data? The answer is a specific language designed for this purpose, called the Structured Query Language, or SQL.

Concepts

Tables − In relational data model, relations are saved in the format of Tables. This format stores the relation among entities. A table has rows and columns, where rows represents records and columns represent the attributes.

Tuple − A single row of a table, which contains a single record for that relation is called a tuple.

Relation instance − A finite set of tuples in the relational database system represents relation instance. Relation instances do not have duplicate tuples.

Relation schema − A relation schema describes the relation name (table name), attributes, and their names.

Relation key − Each row has one or more attributes, known as relation key, which can identify the row in the relation (table) uniquely.

Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.

Entity

An entity can be a real-world object, either animate or inanimate, that can be easily identifiable. For example, in a school database, students, teachers, classes, and courses offered can be considered as entities. All these entities have some attributes or properties that give them their identity.

An entity set is a collection of similar types of entities. An entity set may contain entities with attribute sharing similar values. For example, a Students set may contain all the students of a school; likewise a Teachers set may contain all the teachers of a school from all faculties. Entity sets need not be disjoint.

Attributes

Entities are represented by means of their properties, called attributes. All attributes have values. For example, a student entity may have name, class, and age as attributes.

There exists a domain or range of values that can be assigned to attributes. For example, a student’s name cannot be a numeric value. It has to be alphabetic. A student’s age cannot be negative, etc.

Types of Attributes

  • Simple attribute − Simple attributes are atomic values, which cannot be divided further. For example, a student’s phone number is an atomic value of 10 digits.
  • Composite attribute − Composite attributes are made of more than one simple attribute. For example, a student’s complete name may have first_name and last_name.
  • Derived attribute − Derived attributes are the attributes that do not exist in the physical database, but their values are derived from other attributes present in the database. For example, average_salary in a department should not be saved directly in the database, instead it can be derived. For another example, age can be derived from data_of_birth.
  • Single-value attribute − Single-value attributes contain single value. For example − Social_Security_Number.
  • Multi-value attribute − Multi-value attributes may contain more than one values. For example, a person can have more than one phone number, email_address, etc.

These attribute types can come together in a way like −

  • simple single-valued attributes
  • simple multi-valued attributes
  • composite single-valued attributes
  • composite multi-valued attributes

Entity-Set and Keys

Key is an attribute or collection of attributes that uniquely identifies an entity among entity set.

For example, the roll_number of a student makes him/her identifiable among students.

  • Super Key − A set of attributes (one or more) that collectively identifies an entity in an entity set.
  • Candidate Key − A minimal super key is called a candidate key. An entity set may have more than one candidate key.
  • Primary Key − A primary key is one of the candidate keys chosen by the database designer to uniquely identify the entity set.

Basics of SQL

SQL stands for Structured Query Language. A query language is a kind of programming language that’s designed to facilitate retrieving specific information from databases, and that’s exactly what SQL does. To put it simply, SQL is the language of databases.

SQL Commands

The standard SQL commands to interact with relational databases are CREATE, SELECT, INSERT, UPDATE, DELETE and DROP. These commands can be classified into the following groups based on their nature −

DDL – Data Definition Language

Sr.No.Command & Description
1CREATE Creates a new table, a view of a table, or other object in the database.
2ALTER Modifies an existing database object, such as a table.
3DROP Deletes an entire table, a view of a table or other objects in the database.

DML – Data Manipulation Language

Sr.No.Command & Description
1SELECT Retrieves certain records from one or more tables.
2INSERT Creates a record.
3UPDATE Modifies records.
4DELETE Deletes records.

DCL – Data Control Language

Sr.No.Command & Description
1GRANT Gives a privilege to user.
2REVOKE Takes back privileges granted from user.

Introduction to DDL

  • DDL stands for Data Definition Language.
  • It is a language used for defining and modifying the data and its structure.
  • It is used to build and modify the structure of your tables and other objects in the database.
  • These commands can be used to add, remove or modify tables within a database.
  • DDL has pre-defined syntax for describing the data.

1. CREATE COMMAND

  • CREATE command is used for creating objects in the database.
  • It creates a new table.

Syntax:
CREATE TABLE <table_name>
(    column_name1 datatype,
     column_name2 datatype,
     .
     .
     .
     column_name_n datatype
);

Example : CREATE command

CREATE TABLE employee
(     
     empid INT,
     ename CHAR,
     age INT,
     city CHAR(25),
     phone_no VARCHAR(20)
);

2. DROP COMMAND

  • DROP command allows to remove entire database objects from the database.
  • It removes entire data structure from the database.
  • It deletes a table, index or view.
  • If you want to remove individual records, then use DELETE command of the DML statement.

3. ALTER COMMAND

  • An ALTER command allows to alter or modify the structure of the database.
  • It modifies an existing database object.
  • Using this command, you can add additional column, drop existing column and even change the data type of columns.

Syntax:
ALTER TABLE <table_name>
ADD <column_name datatype>;

OR

ALTER TABLE <table_name>
CHANGE <old_column_name> <new_column_name>;

OR

ALTER TABLE <table_name>
DROP COLUMN <column_name>;

Example : ALTER Command

ALTER TABLE employee
ADD (address varchar2(50));

OR

ALTER TABLE employee
CHANGE (phone_no) (contact_no);

OR

ALTER TABLE employee
DROP COLUMN age;
To view the changed structure of table, use ‘DESCRIBE’ command.
For example:
DESCRIBE TABLE employee;

4. RENAME COMMAND

  • RENAME command is used to rename an object.
  • It renames a database table.

Syntax:
RENAME TABLE <old_name> TO <new_name>;

Example:
RENAME TABLE emp TO employee;

5. TRUNCATE COMMAND

  • TRUNCATE command is used to delete all the rows from the table permanently.
  • It removes all the records from a table, including all spaces allocated for the records.
  • This command is same as DELETE command, but TRUNCATE command does not generate any rollback data.

Syntax:
TRUNCATE TABLE <table_name>;

Example:
TRUNCATE TABLE employee;

Introduction to DML

  • DML stands for Data Manipulation Language.
  • It is a language used for selecting, inserting, deleting and updating data in a database.
  • It is used to retrieve and manipulate data in a relational database.
  • DML performs read-only queries of data.

1. SELECT COMMAND

  • SELECT command is used to retrieve data from the database.
  • This command allows database users to retrieve the specific information they desire from an operational database.
  • It returns a result set of records from one or more tables.

SELECT Command has many optional clauses are as stated below:

ClauseDescription
WHEREIt specifies which rows to retrieve.
GROUP BYIt is used to arrange the data into groups.
HAVINGIt selects among the groups defined by the GROUP BY clause.
ORDER BYIt specifies an order in which to return the rows.
ASIt provides an alias which can be used to temporarily rename tables or columns.

Syntax:
SELECT * FROM <table_name>;

Example : SELECT Command

SELECT * FROM employee;

OR

SELECT * FROM employee
where salary >=10,000;

2. INSERT COMMAND

  • INSERT command is used for inserting a data into a table.
  • Using this command, you can add one or more records to any single table in a database.
  • It is also used to add records to an existing code.

Syntax:
INSERT INTO <table_name> (`column_name1` <datatype>, `column_name2` <datatype>, . . . , `column_name_n` <database>) VALUES (`value1`, `value2`, . . . , `value n`);

Example:
INSERT INTO employee (`eid` int, `ename` varchar(20), `city` varchar(20))
VALUES (`1`, `ABC`, `PUNE`);

3. UPDATE COMMAND

  • UPDATE command is used to modify the records present in existing table.
  • This command updates existing data within a table.
  • It changes the data of one or more records in a table.

Syntax:
UPDATE <table_name>
SET <column_name = value>
WHERE condition;

Example : UPDATE Command

UPDATE employee
SET salary=20000
WHERE ename=’ABC’;

4. DELETE COMMAND

  • DELETE command is used to delete some or all records from the existing table.
  • It deletes all the records from a table.

Compound Functions and Relational Operators

There are situations when the filtering conditions are quite complex. The SQL WHERE clause is used to specify a condition while fetching the data from a single table or by joining with multiple tables.

The basic syntax of the SELECT statement with the WHERE clause is as shown below.

SELECT column1, column2, columnN 
FROM table_name
WHERE [condition]

You can specify a condition using the comparison or logical operators like >, <, =, LIKE, NOT, etc

Example :
SQL> SELECT ID, NAME, SALARY 
FROM CUSTOMERS
WHERE SALARY > 2000;

Compound operators such as AND, OR and NOT are used to combine these conditions together. 

To delete a record from an existing table, the DELETE command is used. The syntax is

DELETE from [name of the table] where [condition].

Pattern Matching with Wildcards

Pattern matching is an important concept in the string or text-based processing. In SQL, certain characters are reserved as wildcards that can match any number of preceding or trailing characters

Wildcard Characters in SQL Server

SymbolDescriptionExample
%Represents zero or more charactersbl% finds bl, black, blue, and blob
_Represents a single characterh_t finds hot, hat, and hit
[]Represents any single character within the bracketsh[oa]t finds hot and hat, but not hit
^Represents any character not in the bracketsh[^oa]t finds hit, but not hot and hat
Represents a range of charactersc[a-b]t finds cat and cbt

Basics of Sorting

The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on one or more columns. Some databases sort the query results in an ascending order by default.

Syntax

The basic syntax of the ORDER BY clause which would be used to sort the result in an ascending or descending order is as follows −

SELECT column-list 
FROM table_name 
[WHERE condition] 
[ORDER BY column1, column2, .. columnN] [ASC | DESC];

Following is an example, which would sort the result in an ascending order by NAME and SALARY.

SQL> SELECT * FROM CUSTOMERS
   ORDER BY NAME, SALARY;

Session Summary

Well, this session started on the note of managing data through multiple files and the drawbacks. To provide a more consistent view of data, database management systems or DBMS were invented. You learnt that DBMS and RDBMS are commonly used to refer to database management systems:

  • DBMS and 
  • RDBMS (Relational DBMS)

You then learnt that in an RDBMS, the data is organised in tables inside a database and SQL is the language to access and manipulate data in an RDBMS. There are two major categories of SQL commands:

  • Data Definition Language i.e. DDL
  • Data Manipulation Language i.e. DML

The DDL commands are used to create the schema of the database, which is typically done by database administrators. The DML commands allow us to manipulate the data available in a database. As a data analyst, you will be using DML commands very heavily. 

This introductory session covered following topics:

  • Organising data in Database
  • Schema, Constraints, Keys, Relations
  • Select command
  • filtering using where clause
  • basics of sorting, to name a few. 

In the next session, you will learn some advanced SQL skills commonly used by data analysts.

Categories: Machine Learning Tags: Tags: ,

Leave a Reply

Your email address will not be published. Required fields are marked *

*

code