Saturday, February 3, 2018

How to Find Missing Indexes in SQL Server


A proper index can improve performance, and a bad index can hamper it. Below is a query that suggests missing indexes based on table usage, using SQL Server's missing-index DMVs. Treat the output as suggestions and review each recommendation before creating the index.

SELECT
    dm_mid.database_id AS DatabaseID,
    dm_migs.avg_user_impact * (dm_migs.user_seeks + dm_migs.user_scans) AS Avg_Estimated_Impact,
    dm_migs.last_user_seek AS Last_User_Seek,
    OBJECT_NAME(dm_mid.OBJECT_ID, dm_mid.database_id) AS [TableName],
    -- Build a ready-to-run CREATE INDEX statement from the suggested
    -- equality, inequality and included columns.
    'CREATE INDEX [IX_' + OBJECT_NAME(dm_mid.OBJECT_ID, dm_mid.database_id) + '_'
        + REPLACE(REPLACE(REPLACE(ISNULL(dm_mid.equality_columns, ''), ', ', '_'), '[', ''), ']', '')
        + CASE
              WHEN dm_mid.equality_columns IS NOT NULL
                   AND dm_mid.inequality_columns IS NOT NULL THEN '_'
              ELSE ''
          END
        + REPLACE(REPLACE(REPLACE(ISNULL(dm_mid.inequality_columns, ''), ', ', '_'), '[', ''), ']', '')
        + ']'
        + ' ON ' + dm_mid.statement
        + ' (' + ISNULL(dm_mid.equality_columns, '')
        + CASE
              WHEN dm_mid.equality_columns IS NOT NULL
                   AND dm_mid.inequality_columns IS NOT NULL THEN ','
              ELSE ''
          END
        + ISNULL(dm_mid.inequality_columns, '')
        + ')'
        + ISNULL(' INCLUDE (' + dm_mid.included_columns + ')', '') AS Create_Statement
FROM sys.dm_db_missing_index_groups dm_mig
INNER JOIN sys.dm_db_missing_index_group_stats dm_migs
    ON dm_migs.group_handle = dm_mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details dm_mid
    ON dm_mig.index_handle = dm_mid.index_handle
WHERE dm_mid.database_id = DB_ID()          -- current database only
ORDER BY Avg_Estimated_Impact DESC
GO

Ultimate List of 20 Important SQL Queries


SQL is incredibly powerful, and like every well-made development tool, it has a few commands that every good developer should know.

1.    Query for Retrieving Tables

 This query retrieves the list of tables in the schema named “My_Schema”.
SELECT * FROM My_Schema.Tables;

2.    Query for Selecting Columns from a Table

 This is perhaps the most widely used SQL query. In the example below, we are extracting the “Student_ID” column or attribute from the table “STUDENT”.
SELECT Student_ID FROM STUDENT;
If you want to display all attributes from a particular table, this is the right query to use:
SELECT * FROM STUDENT;

3.    Query for Outputting Data Using a Constraint

 This query retrieves the specified attributes from the table, subject to the constraint Employee ID = '0000'.
SELECT EMP_ID, NAME FROM EMPLOYEE_TBL WHERE EMP_ID = '0000';

4.    Query for Outputting Sorted Data Using ‘Order By’

 This query orders the results by the attribute referenced in “Order By”. For example, if that attribute is an integer data type, the result is sorted in ascending or descending numeric order; if it is a string, the result is ordered alphabetically.
SELECT EMP_ID, LAST_NAME FROM EMPLOYEE WHERE CITY = 'Seattle' ORDER BY EMP_ID;
The ordering of the result can also be set explicitly, using “asc” for ascending and “desc” for descending.
SELECT EMP_ID, LAST_NAME FROM EMPLOYEE_TBL WHERE CITY = 'INDIANAPOLIS' ORDER BY EMP_ID asc;

5.    Query for Outputting Sorted Data Using ‘Group By’

 The ‘Group By’ clause groups the resulting data by the specified attribute. Note that every non-aggregated column in the SELECT list must also appear in the GROUP BY clause, so the example groups by both Age and Name.
SELECT Name, Age FROM Patients WHERE Age > 40 GROUP BY Age, Name ORDER BY Name;

Queries for Data Manipulation Using Math Functions

 There are a lot of built-in math functions like COUNT and AVG which provide basic functionalities of counting the number of results and averaging them respectively.

6.    Data Manipulation Using COUNT

 This query displays the total number of customers by counting each customer ID. In addition, it groups the results according to the country of each customer.
SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country;

7.    Data Manipulation Using SUM

 SUM calculates the total of the attribute that is given to it as an argument.
SELECT SUM(Salary) FROM Employee WHERE Emp_Age < 30;

8.    Data Manipulation Using AVG

 Simple – an average of a given attribute.
SELECT AVG(Price) FROM Products;

9.    Query for Listing all Views

 This query lists all the views available in the schema.
SELECT * FROM My_Schema.views;

10.       Query for Creating a View

 A view is a tailored table that is formed as a result of a query. It has columns and rows just like any other table. It's usually a good idea to save frequently used queries as views, because the results can then be retrieved by name rather than re-running the same command every time for a particular set of results.
CREATE VIEW Failing_Students AS
SELECT S_NAME, Student_ID
FROM STUDENT
WHERE GPA > 40;

11.       Query for Retrieving a View

 The standard syntax of selecting attributes from a table is applicable to views as well.
                    SELECT * FROM Failing_Students;

12.       Query for Updating a View

This query updates the view named ‘Product List’ – and if this view doesn’t exist, then the Product List view gets created as specified in this query.
CREATE OR REPLACE VIEW [Product List] AS
SELECT ProductID, ProductName, Category
FROM Products
WHERE Discontinued = 'No';

13.       Query for Dropping a View

 This query will drop or delete a view named ‘V1’.
DROP VIEW V1;

14.       Query to Display User Tables

 A user-defined table is a representation of defined information in a table, and such tables can be used as arguments for procedures or user-defined functions. Because they're so useful, it's worth keeping track of them using the following query.
select * from Sys.objects where Type='u'

15.       Query to Display Primary Keys

 A primary key uniquely identifies each row in a table. The following query lists the primary key objects defined in the database.
select * from Sys.Objects where Type='PK'

16.       Query for Displaying Unique Keys

 A Unique Key allows a column to ensure that all of its values are different.
select * from Sys.Objects where Type='uq'

17.       Displaying Foreign Keys

 Foreign keys link one table to another – they are attributes in one table which refer to the primary key of another table.
select * from Sys.Objects where Type='f'

18.       Displaying Triggers

 A trigger is a sort of ‘event listener’ – i.e., a pre-specified set of instructions that executes when a certain event occurs. The list of defined triggers can be viewed using the following query.
select * from Sys.Objects where Type='tr'

19.       Displaying Internal Tables

 Internal tables are formed as a by-product of a user-action and are usually not accessible. The data in internal tables cannot be manipulated; however, the metadata of the internal tables can be viewed using the following query.
select * from Sys.Objects where Type='it'

20.       Displaying a List of Procedures

 A stored procedure is a group of SQL queries that logically form a single unit and perform a particular task. Thus, using the following query you can keep track of them:
select * from Sys.Objects where Type='p'

Saturday, July 22, 2017

Top 50 SQL Interview Questions & Answers


1. What is DBMS?
A Database Management System (DBMS) is a program that controls the creation, maintenance and use of a database. A DBMS can be thought of as a file manager that manages data in a database rather than saving it in file systems.

2. What is RDBMS?
RDBMS stands for Relational Database Management System. An RDBMS stores data in a collection of tables, which are related to one another by common fields. It also provides relational operators to manipulate the data stored in the tables.
Example: SQL Server.
3. What is SQL?
SQL stands for Structured Query Language, and it is used to communicate with the database. It is a standard language used to perform tasks such as retrieval, update, insertion and deletion of data in a database.
SELECT is the most commonly used SQL command.
4. What is a Database?
A database is an organized collection of data that allows easy access, storage, retrieval and management of that data. It is also known as a structured form of data, which can be accessed in many ways.
Example: School Management Database, Bank Management Database.
5. What are tables and fields?
A table is a set of data organized into columns and rows. Columns are vertical and rows are horizontal. A table has a fixed number of columns, called fields, but can have any number of rows, each of which is called a record.
Example:
Table: Employee.
Field: Emp ID, Emp Name, Date of Birth.
6. What is a primary key?
A primary key is a combination of fields that uniquely identifies a row. It is a special kind of unique key and has an implicit NOT NULL constraint, which means primary key values cannot be NULL.
7. What is a unique key?
A unique key constraint uniquely identifies each record in the database and provides uniqueness for a column or set of columns.
A primary key constraint automatically has a unique constraint defined on it, but the reverse is not true.
There can be many unique constraints defined per table, but only one primary key constraint per table.
 8. What is a foreign key?
A foreign key is a field in one table that refers to the primary key of another table. The relationship between the two tables is created by referencing the foreign key to the primary key of the other table.
9. What is a join?
A join is a keyword used to query data from two or more tables based on the relationship between fields of those tables. Keys play a major role when JOINs are used.
 10. What are the types of join and explain each?
There are various types of join which can be used to retrieve data, and the choice depends on the relationship between the tables.
Inner Join.
An inner join returns rows only when there is at least one match between the tables.
Right Join.
A right join returns the rows that are common between the tables plus all rows of the right-hand table; that is, it returns all the rows from the right-hand table even when there is no match in the left-hand table.
Left Join.
A left join returns the rows that are common between the tables plus all rows of the left-hand table; that is, it returns all the rows from the left-hand table even when there is no match in the right-hand table.
Full Join.
A full join returns rows when there is a match in either of the tables; that is, it returns all the rows from the left-hand table and all the rows from the right-hand table.
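To make the four join types concrete, here is a minimal sketch. The Student (StudentID, Name) and Exam (StudentID, Score) tables and their columns are illustrative, not part of the original answer.
SELECT s.Name, e.Score FROM Student s INNER JOIN Exam e ON s.StudentID = e.StudentID;
SELECT s.Name, e.Score FROM Student s LEFT JOIN Exam e ON s.StudentID = e.StudentID;
SELECT s.Name, e.Score FROM Student s RIGHT JOIN Exam e ON s.StudentID = e.StudentID;
SELECT s.Name, e.Score FROM Student s FULL JOIN Exam e ON s.StudentID = e.StudentID;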
11. What is normalization?
Normalization is the process of minimizing redundancy and dependency by organizing the fields and tables of a database. Its main aim is to ensure that an addition, deletion or modification of a field can be made in just one table.
12. What is Denormalization?
Denormalization is a technique of moving from higher to lower normal forms to speed up data access. Redundancy is deliberately introduced into a table by incorporating data from related tables.
13. What are all the different normalizations?
The commonly used normal forms are explained below:
First Normal Form (1NF):
Remove all duplicate columns from the table, create separate tables for each group of related data, and identify each row with a unique column.
Second Normal Form (2NF):
Meet all requirements of the first normal form, place subsets of data in separate tables, and create relationships between the tables using primary keys.
Third Normal Form (3NF):
Meet all requirements of 2NF and remove columns that are not dependent on the primary key.
Fourth Normal Form (4NF):
Meet all requirements of the third normal form and have no multi-valued dependencies.
14. What is a View?
A view is a virtual table consisting of a subset of the data contained in one or more tables. A view is not physically stored, so it takes less space. A view can combine data from one or more tables, depending on the relationship between them.
15. What is an Index?
An index is a performance-tuning structure that allows faster retrieval of records from a table. An index creates an entry for each value, which makes it faster to retrieve data.
 16. What are all the different types of indexes?
There are three types of indexes:
Unique Index.
A unique index does not allow the indexed column(s) to contain duplicate values. A unique index is created automatically when a primary key is defined.
Clustered Index.
This type of index reorders the physical storage of the table and allows searching based on the key values. Each table can have only one clustered index.
Non-Clustered Index.
A non-clustered index does not alter the physical order of the table; it maintains a logical order of the data. Each table can have up to 999 non-clustered indexes.
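For reference, a minimal sketch of creating each kind of index in T-SQL; the table and column names are illustrative assumptions.
CREATE UNIQUE INDEX UX_Student_Email ON Student (Email);
CREATE CLUSTERED INDEX CX_Student_StudentID ON Student (StudentID);
CREATE NONCLUSTERED INDEX IX_Student_LastName ON Student (LastName);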
17. What is a Cursor?
A database cursor is a control structure that enables traversal over the rows or records of a table. It can be viewed as a pointer to one row in a set of rows and is useful for row-by-row processing such as retrieval, addition and removal of records.
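A minimal T-SQL sketch of the cursor life cycle (DECLARE, OPEN, FETCH, CLOSE, DEALLOCATE); the Student table and its columns are illustrative assumptions.
DECLARE @StudentID INT, @Name VARCHAR(100);
DECLARE student_cursor CURSOR FOR
    SELECT StudentID, Name FROM Student;
OPEN student_cursor;
FETCH NEXT FROM student_cursor INTO @StudentID, @Name;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @Name;   -- process the current row
    FETCH NEXT FROM student_cursor INTO @StudentID, @Name;
END
CLOSE student_cursor;
DEALLOCATE student_cursor;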
 18. What is a relationship and what are they?
A database relationship is the connection between tables in a database. The various database relationships are as follows:
  • One to One Relationship.
  • One to Many Relationship.
  • Many to One Relationship.
  • Self-Referencing Relationship.
19. What is a query?
A DB query is code written to retrieve information from the database. A query can be designed so that it matches our expectation of the result set; simply put, it is a question asked of the database.
 20. What is subquery?
A subquery is a query within another query. The outer query is called the main query, and the inner query is called the subquery. The subquery is executed first, and its result is passed on to the main query.
21. What are the types of subquery?
There are two types of subquery – Correlated and Non-Correlated.
A correlated subquery cannot be considered an independent query, because it refers to a column of a table listed in the FROM list of the main query.
A non-correlated subquery can be considered an independent query, and its output is substituted into the main query.
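A short sketch of both kinds, using the same illustrative Student and Exam tables as above.
-- Non-correlated: the inner query runs on its own, then feeds the outer query.
SELECT Name FROM Student
WHERE StudentID IN (SELECT StudentID FROM Exam WHERE Score > 90);
-- Correlated: the inner query references a column (s.StudentID) from the outer query.
SELECT s.Name FROM Student s
WHERE EXISTS (SELECT 1 FROM Exam e WHERE e.StudentID = s.StudentID);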
 22. What is a stored procedure?
A stored procedure is a named collection of SQL statements used to access the database. Several SQL statements are consolidated into a stored procedure, which can then be executed whenever and wherever required.
23. What is a trigger?
A DB trigger is code that executes automatically in response to some event on a table or view in a database. Triggers mainly help to maintain the integrity of the database.
Example: When a new student is added to the student database, new records should be created in the related tables like Exam, Score and Attendance tables.
24. What is the difference between DELETE and TRUNCATE commands?
The DELETE command removes rows from a table, and a WHERE clause can be used to restrict which rows are removed. A delete is fully logged and can be committed or rolled back.
TRUNCATE removes all rows from the table. It is minimally logged and, in SQL Server, can be rolled back only when it is issued inside an explicit transaction.
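A minimal illustration of the two commands; the Student table and the GraduationYear column are assumptions made for the example.
DELETE FROM Student WHERE GraduationYear < 2010;  -- removes only the matching rows
TRUNCATE TABLE Student;                           -- removes every row and resets any identity seed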
25. What are local and global variables and their differences?
Local variables exist only inside the function (or batch) in which they are declared; they are not visible to other functions and are created each time the function is called.
Global variables exist throughout the program and keep their values for its duration; they cannot be created from inside a function.
 26. What is a constraint?
A constraint is used to specify rules for the data in a table. A constraint can be specified when the table is created or altered. Examples of constraints (a small sketch follows the list) are:
  • NOT NULL.
  • CHECK.
  • DEFAULT.
  • UNIQUE.
  • PRIMARY KEY.
  • FOREIGN KEY.
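A sketch showing all six constraint types in one table definition; the table, its columns and the referenced Department table are illustrative assumptions.
CREATE TABLE Student (
    StudentID INT NOT NULL PRIMARY KEY,                      -- NOT NULL + PRIMARY KEY
    Email     VARCHAR(100) UNIQUE,                           -- UNIQUE
    Age       INT CHECK (Age >= 5),                          -- CHECK
    Country   VARCHAR(50) DEFAULT 'Unknown',                 -- DEFAULT
    DeptID    INT FOREIGN KEY REFERENCES Department (DeptID) -- FOREIGN KEY (Department must already exist)
);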
27. What is data Integrity?
Data Integrity defines the accuracy and consistency of data stored in a database. It can also define integrity constraints to enforce business rules on the data when it is entered into the application or database.
 28. What is Auto Increment?
An auto-increment column generates a unique number automatically whenever a new record is inserted into the table. MySQL uses the AUTO_INCREMENT keyword, while SQL Server uses the IDENTITY keyword.
It is most often used on the PRIMARY KEY column.
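A minimal SQL Server sketch; the Orders table is an illustrative assumption.
CREATE TABLE Orders (
    OrderID   INT IDENTITY(1,1) PRIMARY KEY,  -- starts at 1, increments by 1 on every insert
    OrderDate DATETIME DEFAULT GETDATE()
);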
 29. What is the difference between Cluster and Non-Cluster Index?
A clustered index speeds up retrieval by altering the way the records are physically stored: the rows are sorted by the column(s) defined as the clustered index key.
A non-clustered index does not alter the physical storage order; it creates a separate structure within the table and points back to the original table rows after searching.
30. What is Datawarehouse?
A data warehouse is a central repository of data from multiple sources of information. The data is consolidated, transformed and made available for mining and online analytical processing. Warehouse data can be organized into subsets called data marts.
31. What is Self-Join?
A self-join is a query in which a table is joined to itself. It is used to compare values in a column with other values in the same column of the same table; table aliases are used to distinguish the two references to the table.
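A small sketch of a self-join; the Employee table with EmployeeID, Name and ManagerID columns is an illustrative assumption.
SELECT e.Name AS Employee, m.Name AS Manager
FROM Employee e
JOIN Employee m ON e.ManagerID = m.EmployeeID;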
 32. What is Cross-Join?
A cross join produces the Cartesian product: the number of rows in the first table multiplied by the number of rows in the second table. If a WHERE clause is added to a cross join, the query behaves like an INNER JOIN.
33. What are user-defined functions?
User-defined functions are functions written so that a piece of logic can be reused whenever required. Instead of writing the same logic several times, the function is simply called or executed wherever it is needed.
 34. What are the types of user-defined functions?
The three types of user-defined functions are:
  • Scalar functions.
  • Inline table-valued functions.
  • Multi-statement table-valued functions.
A scalar function returns a single value of the type defined in its RETURNS clause; the other two types return a table.
35. What is collation?
Collation is a set of rules that determine how character data is sorted and compared. These rules govern, for example, how 'A' compares with 'a', how characters from other languages are compared, and whether the width of the characters matters; character data can also be compared by ASCII value.

36. What are all different types of collation sensitivity?
The different types of collation sensitivity are:
  • Case Sensitivity – A versus a, B versus b.
  • Accent Sensitivity.
  • Kana Sensitivity – Japanese Kana characters.
  • Width Sensitivity – Single byte character and double byte character.
37. Advantages and Disadvantages of Stored Procedure?
A stored procedure supports modular programming – create it once, store it, and call it as many times as required. This supports faster execution than submitting the individual queries each time, reduces network traffic, and provides better security for the data.
Disadvantage is that it can be executed only in the Database and utilizes more memory in the database server.
38. What is Online Transaction Processing (OLTP)?
Online Transaction Processing (OLTP) manages transaction-based applications used for data entry and straightforward retrieval of data. OLTP systems are designed for simplicity and efficiency, delivering fast and accurate transaction processing.
Example – Bank Transactions on a daily basis.
 39. What is CLAUSE?
A SQL clause limits the result set by adding a condition to the query; it usually filters some rows from the whole set of records.
Example – a query with a WHERE condition or a HAVING condition.
40. What is recursive stored procedure?
A recursive stored procedure is one that calls itself until some boundary condition is reached. It allows programmers to reuse the same set of code any number of times.
41. What are the UNION, MINUS and INTERSECT commands?
The UNION operator combines the results of two queries and eliminates duplicate rows.
The MINUS operator returns the rows that appear in the first query but not in the second.
The INTERSECT operator returns only the rows returned by both queries.
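A short sketch of the three set operators, again using the illustrative Student and Exam tables; note that SQL Server spells the MINUS operator as EXCEPT.
SELECT StudentID FROM Student
UNION
SELECT StudentID FROM Exam;

SELECT StudentID FROM Student
EXCEPT      -- MINUS in Oracle
SELECT StudentID FROM Exam;

SELECT StudentID FROM Student
INTERSECT
SELECT StudentID FROM Exam;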
42. What is an ALIAS command?
An alias is an alternative name that can be given to a table or a column. The alias can then be referenced in the WHERE clause and elsewhere to identify the table or column.
Example:
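The original example snippet did not survive in this copy; the following is a representative reconstruction based on the alias names described just below (st for the Student table, Ex for the Exam table).
SELECT st.StudentName, Ex.Score
FROM Student st, Exam Ex
WHERE st.StudentID = Ex.StudentID;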

Here, st is the alias for the Student table and Ex is the alias for the Exam table.
43. What is the difference between TRUNCATE and DROP statements?
TRUNCATE removes all the rows from a table, but the table structure, columns and constraints remain. DROP removes the table definition itself, along with its data, indexes and constraints, from the database.
44. What are aggregate and scalar functions?
Aggregate functions are used to evaluate mathematical calculation and return single values. This can be calculated from the columns in a table. Scalar functions return a single value based on the input value.
Example:
Aggregate – MAX(), COUNT() – calculated over a set of (typically numeric) values.
Scalar – UCASE(), NOW() – calculated on a single input value, such as a string.
45. How can you create an empty table from an existing table?
Example will be -.
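The original snippet did not survive here either; a minimal sketch in SQL Server syntax, assuming a Student table (the new table name is illustrative):
SELECT * INTO Student_Copy FROM Student WHERE 1 = 0;  -- the false predicate copies the structure but no rows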

Here, we are copying the structure of the Student table into another table without copying any rows.
 46. How to fetch common records from two tables?
A result set of the common records can be obtained as follows:
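The original snippet is missing here; one sketch uses an inner join on the shared key (or INTERSECT, as shown under question 41), with the illustrative Student and Exam tables:
SELECT DISTINCT s.StudentID
FROM Student s
INNER JOIN Exam e ON s.StudentID = e.StudentID;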

47. How to fetch alternate records from a table?
Records can be fetched for both odd and even row numbers.
To display even-numbered rows:
Select studentId from (Select rowno, studentId from student) where mod(rowno, 2) = 0
To display odd-numbered rows:
Select studentId from (Select rowno, studentId from student) where mod(rowno, 2) = 1
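The rowno/mod shorthand above is Oracle-flavoured. On SQL Server the same idea can be sketched with ROW_NUMBER(), assuming a student table with a studentId column:
SELECT studentId
FROM (SELECT studentId, ROW_NUMBER() OVER (ORDER BY studentId) AS RowNo FROM student) AS t
WHERE RowNo % 2 = 0;   -- even rows; use = 1 for odd rows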
  48. How to select unique records from a table?
Unique records can be selected from a table by using the DISTINCT keyword.
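A one-line sketch, with an illustrative student table:
SELECT DISTINCT studentId, studentName FROM student;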

49. What is the command used to fetch first 5 characters of the string?
There are several ways to fetch the first 5 characters of a string, for example:
select SUBSTRING(studentname, 1, 5) as studentname from student
select LEFT(studentname, 5) as studentname from student
50. Which operator is used in query for pattern matching?
LIKE operator is used for pattern matching, and it can be used as -.
  1. % – Matches zero or more characters.
  2. _ (Underscore) – Matches exactly one character.
    Example 
    select * from student where studentname like 'a%'
    select * from student where studentname like 'ami_'

Tuesday, July 18, 2017

MS SQL Server - Architecture



We have classified the architecture of SQL Server into the following parts for easy understanding −
  • General architecture
  • Memory architecture
  • Data file architecture
  • Log file architecture

General Architecture

Client − Where the request is initiated.
Query − The SQL query, which is a high-level language.
Logical Units − Keywords, expressions, operators, etc.
N/W Packets − Network-related code.
Protocols − SQL Server has 4 protocols.
  • Shared memory (for local connections and troubleshooting purpose).
  • Named pipes (for connections which are in LAN connectivity).
  • TCP/IP (for connections which are in WAN connectivity).
  • VIA – Virtual Interface Adapter (requires special hardware set up by the vendor; deprecated as of SQL Server 2012).
Server − Where the SQL Server services are installed and the databases reside.
Relational Engine − This is where the real execution is done. It contains the query parser, the query optimizer and the query executor.
Query Parser (Command Parser) and Compiler (Translator) − Checks the syntax of the query and converts it into an internal query tree.
Query Optimizer − It will prepare the execution plan as output by taking query, statistics and Algebrizer tree as input.
Execution Plan − It is like a roadmap, which contains the order of all the steps to be performed as part of the query execution.
Query Executor − This is where the query will be executed step by step with the help of execution plan and also the storage engine will be contacted.
Storage Engine − It is responsible for storage and retrieval of data on the storage system (disk, SAN, etc.,), data manipulation, locking and managing transactions.
SQL OS − This lies between the host machine (Windows OS) and SQL Server. All the activities performed on the database engine are taken care of by SQL OS. SQL OS provides various operating system services, such as memory management (dealing with the buffer pool and log buffer) and deadlock detection using the blocking and locking structures.
Checkpoint Process − Checkpoint is an internal process that writes all dirty pages (modified pages) from Buffer Cache to Physical disk. Apart from this, it also writes the log records from log buffer to physical file. Writing of Dirty pages from buffer cache to data file is also known as Hardening of dirty pages.
It is a dedicated process and is run automatically by SQL Server at specific intervals, for each database individually. Checkpoints help reduce the recovery time for SQL Server in the event of an unexpected shutdown or system crash/failure.

Checkpoints in SQL Server

In SQL Server 2012 there are four types of checkpoints −
  • Automatic − This is the most common checkpoint which runs as a process in the background to make sure SQL Server Database can be recovered in the time limit defined by the Recovery Interval − Server Configuration Option.
  • Indirect − This is new in SQL Server 2012. This also runs in the background but to meet a user-specified target recovery time for the specific database where the option has been configured. Once the Target_Recovery_Time for a given database has been selected, this will override the Recovery Interval specified for the server and avoid automatic checkpoint on such DB.
  • Manual − This one runs just like any other T-SQL statement; once you issue the CHECKPOINT command, it runs to completion. A manual checkpoint runs for your current database only. You can also specify an optional checkpoint_duration, the time (in seconds) within which you want the checkpoint to complete (see the sketch after this list).
  • Internal − As a user you can’t control internal checkpoint. Issued on specific operations such as
    • Shutdown initiates a checkpoint operation on all databases except when shutdown is not clean (shutdown with nowait).
    • If the recovery model gets changed from Full\Bulk-logged to Simple.
    • While taking backup of the database.
    • If your DB is in simple recovery model, checkpoint process executes automatically either when the log becomes 70% full, or based on Server option-Recovery Interval.
    • Alter database command to add or remove a data\log file also initiates a checkpoint.
    • Checkpoint also takes place when the recovery model of the DB is bulk-logged and a minimally logged operation is performed.
    • DB Snapshot creation.
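A minimal sketch of the manual checkpoint mentioned above; the 10-second duration is optional and purely illustrative.
CHECKPOINT;      -- run a manual checkpoint in the current database
CHECKPOINT 10;   -- ask SQL Server to try to complete the checkpoint within about 10 seconds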
Lazy Writer Process − The lazy writer pushes dirty pages to disk for an entirely different reason: it needs to free up memory in the buffer pool. This happens when SQL Server comes under memory pressure. As far as I am aware, this is controlled by an internal process and there is no setting for it.
SQL Server constantly monitors memory usage to assess resource contention (or availability); its job is to make sure that a certain amount of free space is available at all times. When it notices resource contention, it triggers the lazy writer to free up some pages in memory by writing dirty pages out to disk. It uses the Least Recently Used (LRU) algorithm to decide which pages are to be flushed to disk.
If the lazy writer is always active, it could indicate a memory bottleneck.

Memory Architecture

Following are some of the salient features of memory architecture.
  • One of the primary design goals of all database software is to minimize disk I/O because disk reads and writes are among the most resource-intensive operations.
  • Memory in Windows can be described in terms of the Virtual Address Space, which is shared by kernel mode (the OS) and user mode (applications such as SQL Server).
  • SQL Server "User address space" is broken into two regions: MemToLeave and Buffer Pool.
  • Size of MemToLeave (MTL) and Buffer Pool (BPool) is determined by SQL Server during startup.
  • Buffer management is a key component in achieving highly efficient I/O. The buffer management component consists of two mechanisms: the buffer manager, to access and update database pages, and the buffer pool, to reduce database file I/O.
  • The buffer pool is further divided into multiple sections. The most important ones being the buffer cache (also referred to as data cache) and procedure cache. Buffer cache holds the data pages in memory so that frequently accessed data can be retrieved from cache. The alternative would be reading data pages from the disk. Reading data pages from cache optimizes performance by minimizing the number of required I/O operations which are inherently slower than retrieving data from the memory.
  • Procedure cache keeps the stored procedure and query execution plans to minimize the number of times that query plans have to be generated. You can find out information about the size and activity within the procedure cache using DBCC PROCCACHE statement.
Other portions of buffer pool include −
  • System level data structures − Holds SQL Server instance level data about databases and locks.
  • Log cache − Reserved for reading and writing transaction log pages.
  • Connection context − Each connection to the instance has a small area of memory to record the current state of the connection. This information includes stored procedure and user-defined function parameters, cursor positions and more.
  • Stack space − Windows allocates stack space for each thread started by SQL Server.

Data File Architecture

Data File architecture has the following components −

File Groups

Database files can be grouped together in file groups for allocation and administration purposes. No file can be a member of more than one file group. Log files are never part of a file group. Log space is managed separately from data space.
There are two types of file groups in SQL Server, Primary and User-defined. Primary file group contains the primary data file and any other files not specifically assigned to another file group. All pages for the system tables are allocated in the primary file group. User-defined file groups are any file groups specified using the file group keyword in create database or alter database statement.
One file group in each database operates as the default file group. When SQL Server allocates a page to a table or index for which no file group was specified at creation time, the pages are allocated from the default file group. To switch the default file group from one file group to another, a user needs the db_owner fixed database role.
By default, the primary file group is the default file group. A user must also have the db_owner fixed database role in order to back up files and file groups individually.
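A hedged sketch of the ALTER DATABASE statements involved; the database, filegroup and file names are illustrative assumptions.
ALTER DATABASE SalesDb ADD FILEGROUP FG_Archive;
ALTER DATABASE SalesDb ADD FILE
    (NAME = SalesDb_Archive1, FILENAME = 'D:\SQLData\SalesDb_Archive1.ndf')
    TO FILEGROUP FG_Archive;
ALTER DATABASE SalesDb MODIFY FILEGROUP FG_Archive DEFAULT;  -- requires db_owner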

Files

Databases have three types of files - Primary data file, Secondary data file, and Log file. Primary data file is the starting point of the database and points to the other files in the database.
Every database has exactly one primary data file. Any extension can be used for the primary data file, but the recommended extension is .mdf. A secondary data file is any data file other than the primary data file; a database may have multiple secondary data files or none at all. The recommended extension for secondary data files is .ndf.
Log files hold all of the log information used to recover the database. Database must have at least one log file. We can have multiple log files for one database. The recommended extension for log file is .ldf.
The location of all the files in a database is recorded both in the master database and in the primary file of the database. Most of the time, the database engine uses the file location from the master database.
Files have two names − logical and physical. The logical name is used to refer to the file in all T-SQL statements. The physical name is the operating-system file name and must follow the rules of the OS. Data and log files can be placed on either FAT or NTFS file systems, but cannot be placed on compressed file systems. There can be up to 32,767 files in one database.
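A sketch of how the logical and physical names, file types and growth settings come together; every name, path and size below is an illustrative assumption.
CREATE DATABASE SalesDb
ON PRIMARY
    (NAME = SalesDb_Data, FILENAME = 'D:\SQLData\SalesDb.mdf', SIZE = 100MB, FILEGROWTH = 50MB)
LOG ON
    (NAME = SalesDb_Log,  FILENAME = 'E:\SQLLogs\SalesDb.ldf', SIZE = 50MB,  FILEGROWTH = 25MB);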

Extents

Extents are the basic unit in which space is allocated to tables and indexes. An extent is 8 contiguous pages, or 64 KB. SQL Server has two types of extents − uniform and mixed. Uniform extents are owned by a single object, while mixed extents are shared by up to eight objects.

Pages

The page is the fundamental unit of data storage in MS SQL Server. The size of a page is 8 KB. Each page starts with a 96-byte header that stores system information such as the page type, the amount of free space on the page and the object ID of the object that owns the page. There are 9 types of data pages in SQL Server.
  • Data − Data rows with all data except text, ntext and image data.
  • Index − Index entries.
  • Text/Image − Text, image and ntext data.
  • GAM − Information about allocated extents.
  • SGAM − Information about allocated extents at system level.
  • Page Free Space (PFS) − Information about free space available on pages.
  • Index Allocation Map (IAM) − Information about extents used by a table or index.
  • Bulk Changed Map (BCM) − Information about extents modified by bulk operations since the last backup log statement.
  • Differential Changed Map (DCM) − Information about extents that have changed since the last backup database statement.

Log File Architecture

The SQL Server transaction log operates logically as if the transaction log is a string of log records. Each log record is identified by Log Sequence Number (LSN). Each log record contains the ID of the transaction that it belongs to.
Log records for data modifications record either the logical operation performed or they record the before and after images of the modified data. The before image is a copy of the data before the operation is performed; the after image is a copy of the data after the operation has been performed.
The steps to recover an operation depend on the type of log record −
  • Logical operation logged.
    • To roll the logical operation forward, the operation is performed again.
    • To roll the logical operation back, the reverse logical operation is performed.
  • Before and after image logged.
    • To roll the operation forward, the after image is applied.
    • To roll the operation back, the before image is applied.
Different types of operations are recorded in the transaction log. These operations include −
  • The start and end of each transaction.
  • Every data modification (insert, update, or delete). This includes changes by system stored procedures or data definition language (DDL) statements to any table, including system tables.
  • Every extent and page allocation or deallocation.
  • Creating or dropping a table or index.
Rollback operations are also logged. Each transaction reserves space on the transaction log to make sure that enough log space exists to support a rollback that is caused by either an explicit rollback statement or if an error is encountered. This reserved space is freed when the transaction is completed.
The section of the log file from the first log record that must be present for a successful database-wide rollback to the last-written log record is called the active part of the log, or the active log. This is the section of the log required for a full recovery of the database. No part of the active log can ever be truncated. The LSN of this first log record is known as the minimum recovery LSN (Min LSN).
The SQL Server Database Engine divides each physical log file internally into a number of virtual log files. Virtual log files have no fixed size, and there is no fixed number of virtual log files for a physical log file.
The Database Engine chooses the size of the virtual log files dynamically while it is creating or extending log files. The Database Engine tries to maintain a small number of virtual files. The size or number of virtual log files cannot be configured or set by administrators. The only time virtual log files affect system performance is if the physical log files are defined by small size and growth_increment values.
The size value is the initial size for the log file and the growth_increment value is the amount of space added to the file every time new space is required. If the log files grow to a large size because of many small increments, they will have many virtual log files. This can slow down database startup and also log backup and restore operations.
We recommend that you assign log files a size value close to the final size required, and also have a relatively large growth_increment value. SQL Server uses a write-ahead log (WAL), which guarantees that no data modifications are written to disk before the associated log record is written to disk. This maintains the ACID properties for a transaction.

Monday, July 17, 2017

Deadlock



A common issue with SQL Server is deadlocks. A deadlock occurs when two or more processes are waiting on the same resource and each process is waiting on the other process to complete before moving forward. When this situation occurs and there is no way for the processes to resolve the conflict, SQL Server chooses one of the processes as the deadlock victim and rolls back that process, so the other process or processes can move forward.
By default when this occurs, your application may see or handle the error, but there is nothing that is captured in the SQL Server Error Log or the Windows Event Log to let you know this occurred.  The error message that SQL Server sends back to the client is similar to the following:
Msg 1205, Level 13, State 51, Line 3
Transaction (Process ID xx) was deadlocked on {xxx} resources with another process 
and has been chosen as the deadlock victim. Rerun the transaction. 
In this tutorial we cover what steps you can take to capture deadlock information and some steps you can take to resolve the problem.
Explanation
Deadlock information can be captured in the SQL Server Error Log or by using Profiler / Server Side Trace.

Trace Flags

If you want to capture this information in the SQL Server Error Log you need to enable one or both of these trace flags. 
  • 1204 - this provides information about the nodes involved in the deadlock
  • 1222 - returns deadlock information in an XML format
You can turn on each of these separately or turn them on together.  
To turn these on you can issue the following commands in a query window or you can add these as startup parameters.  If these are turned on from a query window, the next time SQL Server starts these trace flags will not be active, so if you always want to capture this data the startup parameters is the best option.
DBCC TRACEON (1204, -1)
DBCC TRACEON (1222, -1)
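If you prefer the startup-parameter route mentioned above, the usual form is to add one -T entry per trace flag to the SQL Server service's startup parameters (for example via SQL Server Configuration Manager):
-T1204
-T1222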

Thursday, July 6, 2017

Indexing Dos and Don’ts


Indexing is such a large subject that getting a handle on what to do and what not to do when you're developing your indexing strategies can be difficult. You create indexes to improve query response time. But indexing is a balancing act.

Each index is a system-managed table, so every addition or modification to the data in a user table potentially involves updating the indexes, which can cause slower performance of data updates. Everyone—from product vendors to database-performance specialists to DBAs who work on the production front lines—has ideas about how and when to index. Here's my personal list of dos and don'ts.

Do Index the Primary and Unique Key Columns

Make certain that the primary key for each table and any unique key columns that exist are indexed. You don't have to index a table's primary key yourself because when you create a table and declare the primary key, SQL Server automatically creates a unique index on that column or set of columns. However, if another column also contains unique data and you want to maintain the uniqueness of each column entry, you should create a unique index or a unique constraint on that column as well. A secondary benefit from indexing the unique key column is the query performance boost this operation gives to the joins or searches executed on the unique key column.
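As a hedged illustration of the second case, a unique index or unique constraint on an additional unique column might look like this (the Customers table and Email column are assumptions):
CREATE UNIQUE INDEX UX_Customers_Email ON Customers (Email);
-- or, equivalently, as a constraint:
ALTER TABLE Customers ADD CONSTRAINT UQ_Customers_Email UNIQUE (Email);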

Do Index the Foreign Key Column

If a table contains one or more foreign key columns that reference other tables, you need to index those columns. Foreign key columns are usually nonunique because they're on the "many" side of the one-to-many (1:M) relationship. Therefore, you expect to see duplicate values in the foreign key column every time one row in the master table (on the "one" side) is related to two or more rows in the detail table (on the "many" side). You need to create a unique index on a foreign key column only if you're enforcing a mandatory one-to-one (1:1) relationship between two tables. In a 1:1 relationship, the values in the foreign key column are a unique set that corresponds to the primary key values in the master table. When the foreign key column in a table is indexed, SQL Server can scan the indexes for the rows it needs instead of scanning both tables.
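A minimal sketch of indexing a foreign key column on the "many" side; the Orders table and its CustomerID column are illustrative assumptions.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders (CustomerID);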

Do Consider Using a Covering Index for Queries That Return Few Columns

A covering index contains all columns in all parts of a query, including the FROM and WHERE clauses. If SQL Server can scan the index to find the data it needs, it doesn't have to read the associated data pages. Consequently, query performance improves. For example, in the Authors table in the Pubs database, the aunmind covering index consists of author last name (au_lname) and author first name (au_fname). Usually, unless you specify differently in your query, SQL Server will choose whether to use an index. When you're working with tiny tables—tables that fit on one or two pages (8K blocks of space on the hard disk)—SQL Server will choose to perform a table scan instead of using an index. For little tables, a table scan (which requires only one hard-disk access) is less costly than reading an index, finding the location of the data rows, then reading the data rows (which requires two hard-disk accesses). But for a query such as
                              SELECT au_lname, au_fname FROM AUTHORS                              
SQL Server can use the covering index to build the result set, even for the tiny 24-row Authors table, as Figure 1 shows. For more information about designing indexes to improve query performance, see Kalen Delaney, Inside SQL Server, "9 Burning Questions," October 2002.
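For reference, a composite index of the kind described might be declared as follows; the exact definition of aunmind in Pubs may differ.
CREATE INDEX aunmind ON authors (au_lname, au_fname);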

Do Consider Using a Clustered Index for Large Tables and Range-of-Values Queries

A clustered index determines the storage order of the records on disk. The Authors table is clustered on au_id, so au_id is the clustered key. When you query the Authors table without specifying ordering criteria, as follows:
                              SELECT * FROM Authors
The results might display in clustered key order (e.g., the lowest value of au_id might appear at the top of the list, as Figure 2 shows). In SQL Server 7.0 and later releases, ordering of the result set can vary, depending on the retrieval strategy that the query optimizer chooses. Therefore, don't count on record ordering. Instead, use the ORDER BY option in your queries. However, if you have a large table, particularly a table that you often query by using a range-of-values condition on the clustered key (e.g., WHERE au_id between '400-00-0000' and '499-99-9999'), the query performance will improve when you use a clustered index. If you join this table to other tables and the clustered key is also a joined column, the JOIN query performance might be better than if the query executed the join without a clustered index.

Do Index Sorting, Grouping, and Aggregating Columns

You also need to consider indexing columns that you use to order by and those that you use in a grouping expression. You might benefit from indexing the columns that the MIN(), MAX(), COUNT(), SUM(), and AVG() functions use to aggregate the data. When you use the MIN() and MAX() functions, SQL Server does a simple lookup for the minimum and maximum values in the column, respectively. If an index's data values are arranged in ascending order, SQL Server can read the index to quickly determine the correct values of MIN() or MAX(). The range-of-values query incorporates a filter or constraint (expressed in the SELECT query's WHERE clause or HAVING clause) to limit the rows that the query returns. Similarly, when you have an index, you can optimize data sorting (by using the ORDER BY clause) and data grouping (by using the GROUP BY clause), especially if the table or tables you're querying contain many rows.

Do Consider a DSS

Indexing in a transactional environment is a balancing act. You try to enhance query response by creating indexes—but doing so decreases performance of inserts, updates, and deletes. If your querying compromises the database's transactional requirements, and if you decide that you can't resolve the two conflicting requirements, consider creating a separate decision support system (DSS). A DSS is essentially a database and a set of queries that produce reports. The DSS can be a simple copy of the transactional database enhanced by tables of summary data and views that make running reports simpler and quicker than using the transactional database. The DSS doesn't need to be a full-fledged data mart or data warehouse running on its own server. The DSS can share the same hardware platform that the transactional database is running on as long as the hardware is robust enough to support all the user connections.
And the DSS doesn't necessarily have to be read-only: You can create views on the DSS that are designed to facilitate report-writing and which, because of their complexity, make data seen through a view act as though it's read-only. When you implement a DSS, you can take the query load off your transactional database by analyzing the most commonly executed queries, creating covering indexes for the tables, and creating effective indexed views for the data.

Don't Over-Index Transactional Tables with Heavy I/O Activity

A transactional table is a table in an online transaction processing (OLTP) database. You can use transactional tables for reports or as input to programs that export data (e.g., in a SELECT ... FOR XML query). However, the OLTP database's primary purpose is capturing data, and if you over-index a transactional table, you might inhibit performance of INSERT, UPDATE, and DELETE operations. Each index is a database structure—a table that contains a column of values arranged in ascending order, with bookmarks that point to associated rows in the data pages. Therefore, when you add a record to a transactional table, an INSERT operation occurs not only on the data page, but also in all associated indexes. And when you delete a row from the data page, SQL Server removes the corresponding row from all associated indexes. Certainly you need to create some indexes on heavily used OLTP tables, but don't overdo it—an UPDATE operation involves not only the data page but also the number of index pages that matches the number of indexed columns that you need to modify.
Don't Index Wide ColumnsIf a table's columns are wide (i.e., the columns have a data type of char(25) or larger), indexes created on those columns might result in a large number of pages, thereby negating the benefit of using an index in the first place. Nonclustered indexes in SQL Server 2000 and 7.0 are structured differently from those in earlier releases. If you're not careful, you can create indexes that take more space on disk than the original data pages.

A Double-Edged Sword

Proper indexing can make a sluggish database perform better. Conversely, improper indexing can make a high-performing database run poorly. The difference depends on how you structure the indexes—and on your follow-up. You need to test and observe the effect of a newly created index on your production environment, and you need to remove indexes that compromise your database's performance. Your indexing strategy is an important part of your responsibilities as a physical data modeler and DBA.