code prettify

Monday 30 January 2017

MySQL Procedure Analyse Use

Sometimes we are not sure if the database design we have created and our site has been using is correct and optimised. We all do have this feeling at times don't we :)


Well, MySQL provides with PROCEDURE ANALYSE() to help us detect inconsistencies in our database design by suggesting for an optimal datatype and data length for columns.

The syntax for using PROCEDURE ANALYSE() is as below:

SELECT ... FROM ... WHERE ... PROCEDURE
ANALYSE([max_elements,[max_memory]])


Ref: http://dev.mysql.com/doc/refman/5.7/en/procedure-analyse.html

We will create a sample database for our case here and try out some things.

Lets create a database called analyse_db.

mysql> CREATE DATABASE `analyse_db`
DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;


Then lets create a table called employee where we can keep our data.

mysql> CREATE TABLE `analyse_db`.`employee` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) NOT NULL,
`dept_id` INT NULL DEFAULT NULL,
`country` VARCHAR(100) NOT NULL,
PRIMARY KEY (`id`))
ENGINE = InnoDB;


Lets insert some data into it, so that finally we have some records in it as displayed below:

mysql> SELECT * FROM `employee`;

Let us now run PROCEDURE ANALYSE() on the table.

mysql> SELECT * FROM `employee` PROCEDURE ANALYSE();




So from the analysis result we can take away below points:


First row output

1. From the first row output above, employee.id column seems to be fine, but the output suggests that we have enum as datatype for it. The suggestion makes sense given the less number of records we have in our table right now but in the long run our table would be much larger and each row value would be unique and enum would not suit our purpose. So we can neglect this suggestion.

Second row output

2. From the second row output above, employee.name has a min and max length of 8 and 13, average length of 10 and we have assigned a max varchar length of 100 while creating the table. This seems like a wrong decision on our part to assign it 100, when the range of length for values for this column is in between 8 and 13, but taking into consideration the very less data and our future estimate that there will be longer names with some even having middle names, our column design is fine. Also taking into consideration point 1 above, enum type can similarly be neglected for this column too.

Third row output

3. From the third row output above, employee.dept_id has repeated values of same departments which if are not mapped to another department table and do not have probability to change then they can be added as a enum type, but since there may be a probability that a new department can be added in future, our consideration for the data type is fine.

In addition to this, the result also shows that we have allowed for NULL in our schema while there are no NULL values in any of the values for the column, so we can mark it as NOT NULL, which makes SQL operations faster, provided we also have in our requirement that every employee should have a department value assigned.

Excerpt from the MySQL site manual page related to table column optimisation:

"- Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space.

- Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column."

Ref: https://dev.mysql.com/doc/refman/5.7/en/data-size.html

Fourth row output

4. From the fourth row output above, our country list consists of only four set of values for this example but in a real world example this may vary with an increased constant list which is very rarely modified. So in this case we can change this column to enum with the list of countries, since this will help us save some storage space and will provide us an additional constraint on the values stored for country.

Note: Even phpMyAdmin has an interface to run this check and access this info. In the table structure page, you will find a "Propose table structure" link which does this action for you as shown below:

Propose table structure link.

Propose table structure action result.

So to conclude PROCEDURE ANALYSE() provides some very important insight into schema design for an optimal data type like column length, type and NULL value usage, which we can consider, but not all of it needs to be taken into account considering application requirement, future data possibilities and usage :)