Open Source Thinking: 2017

Monday, 18 September 2017

MySQL 8 - Roles

With the next version of MySQL that is MySQL 8, there is a very nice feature of creating "roles" which can be assigned certain privileges and then these roles can be assigned to users, thus helping us in maintaining the principle of least privilege.

It makes our life as developers and DBAs easier as we do not have to remember what are the specific privileges assigned to different users. These "roles" can simply be assigned or de-assigned from users as necessary.

The setup process is pretty easy and intuitive.

1. Create Role:

CREATE ROLE 'read', 'write', 'read_write';

Which essentially creates a set of named roles which do not have any privileges assigned to them, as yet.

2. Grant/Assign Privileges to a Role:

GRANT SELECT ON sample_db.* TO 'read';

3. Assign Role to user:

GRANT 'read' TO 'john'@'localhost', 'mary'@'localhost';

Reference: https://dev.mysql.com/doc/refman/8.0/en/roles.html

Monday, 5 June 2017

Random human recognizable dataset

We all do need sometimes to generate raw valid dummy data for our use cases and applications as we start them. Obviously, one can write their own scripts to generate random data, but it is much better to have data, to which human beings can associate with like names, addresses instead of having them filled with random "lorem ipsum" string data :)

While searching for such a tool, I found a site which does exactly this: http://www.generatedata.com/

Documentation: http://benkeen.github.io/generatedata/

This can also be downloaded and installed locally. It supports three types of installations:
- A single, anonymous user account
- A single user account, requires login
- Multiple accounts

Below is the set of wide varied data types it supports for generating random data in different categories:

Well, for the sake of demo, I am planning to create a dataset for my Employee table.

Starting on it, I want to have a unique column for which I will use GUID type. Then, I want to have a first_name and last_name column for which I select the type as "Names" which then provides many options in the right examples dropdown as to what kind of naming pattern I want like full name or first name or surname etc of which I select first name and surname for my use case.

Then, I go on to my next column date_joined as "date" type which since I am intending use in MySQL database, have chosen MySQL datetime in the examples dropdown list, which then provides me the range of date from which the values should be randomly selected. There are other popular date formats provided too to choose from.

Then I select email type which does not have any options to it and will be populates with some valid human friendly emails. Next is phone, for which it provides me which country region specific format I want. It has a limited list as of now but it would suffice for now. I choose UK from the list and it shows me the various formats in the options column.

Next is zip which shows me that it will select zip values specific to the "Indian States & UT" since I had selected in the very start to localize my dataset to India.

Then I select a few more useful columns like phone, city, region and credit card. For credit card, it also provides many options to choose from.

I chose to download the data in SQL format and below is how it looks in web view.

Now, the ice on the cake offering is that you get to download the dataset in many well supported popular formats like csv, excel, html, json, ldif, programming language (hurrays!), sql and xml.

Below is how my final schema looks like:

Just to give an idea of the dataset, I downloaded it in JS format for a clear view as to how the data looks.

Looks good to me and I am more than satisfied with it :)

So to round up, the features you get from this tool are:
- Generate random human recognizable dataset.
- Random dataset spans across many categories like name, phone, email, date, company, street, city, country, pan, pin etc.
- The dataset can be localized to a specific country.
- It supports downloading the generating data in many well-known formats like csv, excel, html, json, ldif, programming language specific, sql and xml.

One of the limitations is that you can generate only 100 rows as of now from the site, but I believe you can generate more if you run it from your local setup.

Hope this tool helps others as it has helped me. Give it a try for the sake of fun :)

Monday, 30 January 2017

MySQL Procedure Analyse Use

Sometimes we are not sure if the database design we have created and our site has been using is correct and optimised. We all do have this feeling at times don't we :)

Well, MySQL provides with PROCEDURE ANALYSE() to help us detect inconsistencies in our database design by suggesting for an optimal datatype and data length for columns.

The syntax for using PROCEDURE ANALYSE() is as below:

SELECT ... FROM ... WHERE ... PROCEDURE

ANALYSE([max_elements,[max_memory]])

Ref: http://dev.mysql.com/doc/refman/5.7/en/procedure-analyse.html

We will create a sample database for our case here and try out some things.

Lets create a database called analyse_db.

mysql> CREATE DATABASE `analyse_db`

DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Then lets create a table called employee where we can keep our data.

mysql> CREATE TABLE `analyse_db`.`employee` (

`id` INT NOT NULL AUTO_INCREMENT,

`name` VARCHAR(100) NOT NULL,

`dept_id` INT NULL DEFAULT NULL,

`country` VARCHAR(100) NOT NULL,

PRIMARY KEY (`id`))

ENGINE = InnoDB;

Lets insert some data into it, so that finally we have some records in it as displayed below:

mysql> SELECT * FROM `employee`;

Let us now run PROCEDURE ANALYSE() on the table.

mysql> SELECT * FROM `employee` PROCEDURE ANALYSE();

So from the analysis result we can take away below points:

First row output

1. From the first row output above, employee.id column seems to be fine, but the output suggests that we have enum as datatype for it. The suggestion makes sense given the less number of records we have in our table right now but in the long run our table would be much larger and each row value would be unique and enum would not suit our purpose. So we can neglect this suggestion.

Second row output

2. From the second row output above, employee.name has a min and max length of 8 and 13, average length of 10 and we have assigned a max varchar length of 100 while creating the table. This seems like a wrong decision on our part to assign it 100, when the range of length for values for this column is in between 8 and 13, but taking into consideration the very less data and our future estimate that there will be longer names with some even having middle names, our column design is fine. Also taking into consideration point 1 above, enum type can similarly be neglected for this column too.

Third row output

3. From the third row output above, employee.dept_id has repeated values of same departments which if are not mapped to another department table and do not have probability to change then they can be added as a enum type, but since there may be a probability that a new department can be added in future, our consideration for the data type is fine.

In addition to this, the result also shows that we have allowed for NULL in our schema while there are no NULL values in any of the values for the column, so we can mark it as NOT NULL, which makes SQL operations faster, provided we also have in our requirement that every employee should have a department value assigned.

Excerpt from the MySQL site manual page related to table column optimisation:

"- Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space.

- Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column."

Ref: https://dev.mysql.com/doc/refman/5.7/en/data-size.html

Fourth row output

4. From the fourth row output above, our country list consists of only four set of values for this example but in a real world example this may vary with an increased constant list which is very rarely modified. So in this case we can change this column to enum with the list of countries, since this will help us save some storage space and will provide us an additional constraint on the values stored for country.

Note: Even phpMyAdmin has an interface to run this check and access this info. In the table structure page, you will find a "Propose table structure" link which does this action for you as shown below:

Propose table structure link.

Propose table structure action result.

So to conclude PROCEDURE ANALYSE() provides some very important insight into schema design for an optimal data type like column length, type and NULL value usage, which we can consider, but not all of it needs to be taken into account considering application requirement, future data possibilities and usage :)

Open Source Thinking

code prettify