
Monday, 5 June 2017

Random human recognizable dataset

We all sometimes need to generate valid dummy data for our use cases and applications when we start building them. One can of course write scripts to generate random data, but it is much better to have data that human beings can relate to, such as names and addresses, instead of fields filled with random "lorem ipsum" strings :)

While searching for such a tool, I found a site which does exactly this: http://www.generatedata.com/

Documentation: http://benkeen.github.io/generatedata/

This can also be downloaded and installed locally. It supports three types of installations:
- A single, anonymous user account
- A single user account, requires login
- Multiple accounts

Below is the wide variety of data types it supports for generating random data, grouped into categories:


Well, for the sake of a demo, I am going to create a dataset for my Employee table.

To start, I want a unique column, for which I use the GUID type. Then I want first_name and last_name columns, for which I select the "Names" type. The examples dropdown on the right then offers many naming patterns, such as full name, first name or surname. For my use case I select first name and surname.
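For anyone curious what these first three columns boil down to, here is a minimal Python sketch. It is not how the tool works internally; the name pools and the `make_identity` helper are made up for illustration, and the real site ships its own, much larger name lists.

```python
import random
import uuid

# Hypothetical sample pools, only for illustration.
FIRST_NAMES = ["Asha", "Ravi", "Meera", "Karan", "Priya"]
SURNAMES = ["Sharma", "Patel", "Iyer", "Singh", "Das"]


def make_identity(rng):
    """Build the GUID, first_name and last_name columns for one row."""
    return {
        # A version 4 (random) UUID built from 128 random bits, so that a
        # seeded rng gives repeatable output.
        "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "first_name": rng.choice(FIRST_NAMES),
        "last_name": rng.choice(SURNAMES),
    }


rng = random.Random(7)  # fixed seed so reruns produce the same rows
for _ in range(3):
    print(make_identity(rng))
```

Passing a seeded `random.Random` around keeps the dummy data reproducible, which is handy when the same fixture needs to be rebuilt later.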


Then I move on to my next column, date_joined, with the "Date" type. Since I intend to use the data in a MySQL database, I chose MySQL datetime in the examples dropdown. The options column then lets me set the date range from which the values are randomly selected. Other popular date formats are available to choose from too.
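The same idea, picking a random point in a date range and printing it in MySQL's `YYYY-MM-DD HH:MM:SS` datetime layout, can be sketched in a few lines of Python. The function name and the date range below are my own assumptions, not something taken from the tool.

```python
import random
from datetime import datetime, timedelta


def random_mysql_datetime(start, end, rng):
    """Return a random moment between start and end as a MySQL DATETIME string."""
    span_seconds = int((end - start).total_seconds())
    offset = rng.randint(0, span_seconds)  # both bounds are inclusive
    return (start + timedelta(seconds=offset)).strftime("%Y-%m-%d %H:%M:%S")


rng = random.Random(42)  # fixed seed so the output is repeatable
# Assumed range for the date_joined column.
print(random_mysql_datetime(datetime(2015, 1, 1), datetime(2017, 6, 5), rng))
```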


Then I select the email type, which has no options and will be populated with valid, human-friendly email addresses. Next is phone, for which I can choose a country-specific format. The list of countries is limited as of now, but it suffices for my purpose. I choose UK from the list and it shows me the various formats in the options column.


Next is zip, which shows that it will pick zip values specific to "Indian States & UT", since at the very start I had chosen to localize my dataset to India.


Then I select a few more useful columns like phone, city, region and credit card. For credit card, it also provides many options to choose from.


I chose to download the data in SQL format, and below is how it looks in the web view.
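To give a rough idea of what an SQL export amounts to, here is a small Python sketch that turns one row into an INSERT statement. This is only an approximation under my own assumptions: the `to_insert` and `sql_literal` helpers and the sample values are invented, and the statements the tool produces may be laid out differently.

```python
def sql_literal(value):
    """Quote a value as an SQL string literal, doubling any single quotes."""
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"


def to_insert(table, row):
    """Render one dict (column name -> value) as an INSERT statement."""
    columns = ", ".join(f"`{name}`" for name in row)
    values = ", ".join(sql_literal(value) for value in row.values())
    return f"INSERT INTO `{table}` ({columns}) VALUES ({values});"


# Invented sample row, only for illustration.
row = {
    "first_name": "Asha",
    "last_name": "O'Brien",
    "date_joined": "2016-03-14 09:30:00",
}
print(to_insert("Employee", row))
# INSERT INTO `Employee` (`first_name`, `last_name`, `date_joined`) VALUES ('Asha', 'O''Brien', '2016-03-14 09:30:00');
```

This kind of string building is fine for producing a dummy-data file, but for anything touching a live database, parameterized queries through the database driver are the safer route.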


Now, the icing on the cake is that you can download the dataset in many popular, well-supported formats: CSV, Excel, HTML, JSON, LDIF, programming language (hurray!), SQL and XML.


Below is how my final schema looks:



Just to give an idea of the dataset, I downloaded it in JS format for a clear view of how the data looks.


Looks good to me and I am more than satisfied with it :)

So to sum up, the features you get from this tool are:
- It generates random, human-recognizable datasets.
- The random data spans many categories like name, phone, email, date, company, street, city, country, pan, pin etc.
- The dataset can be localized to a specific country.
- It supports downloading the generated data in many well-known formats: CSV, Excel, HTML, JSON, LDIF, programming language specific, SQL and XML.

One limitation is that, as of now, you can generate only 100 rows from the site, but I believe you can generate more if you run it from a local setup.

Hope this tool helps others as it has helped me. Give it a try, just for fun :)