Create test data with TPC-H for e.g. SQL Server or MongoDB

When you are testing databases (e.g. Microsoft SQL Server or MongoDB) or ETL tools (like Pentaho or SSIS), you often need large amounts of data. The TPC-H tool from the „Transaction Processing Performance Council“ meets this need and generates as much data as you need. This short tutorial shows how to compile the tool and generate some data.

1. First of all, download TPC-H from the homepage. The download link is located on the right side. The package should be around 25 megabytes. After downloading, extract it.

[Screenshot: TPC-H homepage]

2. You need a C/C++ compiler (because the TPC-H download only contains the source code). I used the Visual Studio 2012 Express Edition, which can be downloaded at http://www.microsoft.com/visualstudio/eng/downloads#d-2012-express. Although it is free, you have to register, so keep that in mind. Select „Visual Studio 2012 for Windows Desktop“ or „Visual Studio 2012 for Windows 8“, then download and install it. Afterwards you should see something like the following.

[Screenshot: Visual Studio 2012 GUI]

3. Go to the location of TPC-H, navigate to the subfolder „\tpch_version\dbgen“ and open „tpch.sln“. You will be asked to upgrade the project; just click OK and continue.

4. Right-click on „dbgen“ in the upper right corner and select „Build“. Do the same for „qgen“. In the Output window you should read something like „Build succeeded“.

[Screenshot: TPC-H compile output]

5. You should now have another subfolder „\tpch_version\dbgen\Debug“ with „dbgen.exe“ inside it. Copy this file to „\tpch_version\dbgen\“ so that it works correctly (dbgen looks for its distribution file „dists.dss“ in the folder it is run from). Now start the command line, switch to that folder and enter „dbgen -h“ to get a short overview of the available options.

[Screenshot: dbgen command line help]

To give an overview of the data model of the TPC-H database, here is a small entity-relationship diagram. If you want more detailed information, check the documentation available at http://www.tpc.org/tpch/default.asp.

[Figure: TPC-H entity-relationship diagram]

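6. To generate data for the full data model, just type „dbgen -s 1“. The option „-s“ sets the scale factor; a scale factor of 1 corresponds to roughly 1 GB of data. This may take a few minutes. Afterwards you should see 8 .tbl files in the folder.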

[Screenshot: TPC-H test data generation]
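
To check the result of this step, a small Python sketch like the following may help. It lists the eight tables of the TPC-H data model with their approximate row counts for a given scale factor and reports which .tbl files are missing from a folder. The function names `expected_rows` and `missing_tables` are made up for this illustration, and the lineitem count is only approximate.

```python
from pathlib import Path

# Base row counts at scale factor 1, following the TPC-H specification.
# The lineitem count is only approximate.
BASE_ROWS = {
    "customer": 150_000,
    "lineitem": 6_000_000,  # approximate
    "orders": 1_500_000,
    "part": 200_000,
    "partsupp": 800_000,
    "supplier": 10_000,
}
# nation and region have a fixed size, independent of the scale factor.
FIXED_ROWS = {"nation": 25, "region": 5}


def expected_rows(scale):
    """Return the rough number of rows per table for a scale factor."""
    rows = {name: int(count * scale) for name, count in BASE_ROWS.items()}
    rows.update(FIXED_ROWS)
    return rows


def missing_tables(folder):
    """Return the names of expected .tbl files that are not in `folder`."""
    folder = Path(folder)
    return sorted(
        name for name in expected_rows(1)
        if not (folder / f"{name}.tbl").is_file()
    )


if __name__ == "__main__":
    print(expected_rows(1))
    # Assumes the script is started in the folder that contains the .tbl files.
    print("missing:", missing_tables("."))
```

If `missing_tables` returns an empty list, all eight files from step 6 are present.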

7. If you look at the file „customer.tbl“, you will see the data in a flat file format, one record per line, with the fields separated by a pipe („|“).

[Screenshot: customer.tbl output]
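
As a rough illustration of how such a pipe-separated file can be read, here is a minimal Python sketch. The column names follow the CUSTOMER table of the TPC-H specification; the helper `read_tbl` and the sample row are made up for this example and are not real dbgen output. The sketch assumes that each line ends with a trailing „|“, which shows up as one extra empty field.

```python
import csv
import io

# Column names of the CUSTOMER table as named in the TPC-H specification.
CUSTOMER_COLUMNS = [
    "c_custkey", "c_name", "c_address", "c_nationkey",
    "c_phone", "c_acctbal", "c_mktsegment", "c_comment",
]


def read_tbl(lines, columns):
    """Yield one dict per row of a pipe-separated .tbl file."""
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip empty lines
        # Assumption: every line ends with "|", which produces one
        # extra empty field at the end; drop it if present.
        if len(fields) == len(columns) + 1 and fields[-1] == "":
            fields = fields[:-1]
        yield dict(zip(columns, fields))


# Made-up sample row for illustration only.
sample = io.StringIO(
    "1|Customer#000000001|Some Street 1|15|25-989-741-2988|"
    "711.56|BUILDING|a comment|\n"
)
rows = list(read_tbl(sample, CUSTOMER_COLUMNS))
print(rows[0]["c_name"])
```

For a real file, the `io.StringIO` object can be replaced by `open("customer.tbl", newline="")`. All values are returned as strings; converting keys or balances to numbers is left out here.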

You have now successfully generated some test data that can be loaded into any database.
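
For a document database such as MongoDB, the .tbl files usually have to be converted first. The following Python sketch turns pipe-separated rows into one JSON document per line, a format that import tools such as mongoimport can typically read. The function `tbl_to_jsonl` is a hypothetical helper, the column names follow the REGION table of the TPC-H specification, and the sample rows (including the comments) are made up.

```python
import io
import json

# Column names of the REGION table as named in the TPC-H specification.
REGION_COLUMNS = ["r_regionkey", "r_name", "r_comment"]


def tbl_to_jsonl(tbl_lines, columns, out):
    """Write each pipe-separated row as one JSON document per line.

    Returns the number of documents written.
    """
    count = 0
    for line in tbl_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip empty lines
        # Assumption: each line ends with a trailing "|"; remove only that one.
        if line.endswith("|"):
            line = line[:-1]
        doc = dict(zip(columns, line.split("|")))
        out.write(json.dumps(doc) + "\n")
        count += 1
    return count


# Made-up sample rows for illustration only.
source = io.StringIO("0|AFRICA|sample comment|\n1|AMERICA|another comment|\n")
target = io.StringIO()
written = tbl_to_jsonl(source, REGION_COLUMNS, target)
print(written)
print(target.getvalue(), end="")
```

With real data, `source` and `target` would be opened files, for example „region.tbl“ for reading and a new .json file for writing. All values are written as strings in this sketch; numeric columns would need an explicit conversion.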
