When you are testing databases (e.g. Microsoft SQL Server or MongoDB) or ETL tools (like Pentaho or SSIS), you often need large amounts of data. The tool tpc-h from the „Transaction Processing Performance Council“ fits this need and generates as much data as you want. This is a short tutorial on how to compile the tool and generate some data.
1. First of all, download tpc-h from the homepage. The download link is located on the right side. The package should be around 25 megabytes. After downloading, you can extract it.
2. You need a C/C++ compiler (because the tpc-h download only contains the source code). I used Visual Studio 2012 Express Edition, which can be downloaded at http://www.microsoft.com/visualstudio/eng/downloads#d-2012-express. Even though it is free, you have to register, so keep that in mind. Select „Visual Studio 2012 for Windows Desktop“ or „Visual Studio 2012 for Windows 8“, then download and install it. Afterwards you should see something like the following.
4. Right-click on „dbgen“ in the upper right corner and select „Build“. Repeat the same procedure for „qgen“. In the Output window you should read something like „Build succeeded“.
5. You should now have another subfolder „\tpch_version\dbgen\Debug“ with „dbgen.exe“ inside it. Copy this file to „\tpch_version\dbgen\“ so that it works correctly. Now open the command line, switch to that folder and enter „dbgen -h“ to get a short overview of the available options of tpc-h.
For an overview of the data model of the tpc-h database, here is a small entity-relationship diagram. If you want more detailed information, check the documentation available at http://www.tpc.org/tpch/default.asp.
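As a rough companion to the data model, the eight tables and their approximate sizes can be written down in a few lines of Python. This is only a sketch: the row counts are the per-scale-factor base figures from the TPC-H specification (LINEITEM is only approximately 6 million rows per scale factor, while NATION and REGION have a fixed size), and the names `BASE_ROWS`, `FIXED_SIZE` and `estimated_rows` are made up for this illustration.

```python
# Approximate number of rows per table at scale factor 1 (TPC-H specification).
# "lineitem" is an approximation; the other figures are the base cardinalities.
BASE_ROWS = {
    "region": 5,
    "nation": 25,
    "supplier": 10_000,
    "customer": 150_000,
    "part": 200_000,
    "partsupp": 800_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,
}

# These two tables do not grow with the scale factor.
FIXED_SIZE = {"region", "nation"}


def estimated_rows(table, scale_factor):
    """Return the approximate row count of a table for a given scale factor."""
    base = BASE_ROWS[table]
    if table in FIXED_SIZE:
        return base
    return round(base * scale_factor)


if __name__ == "__main__":
    for table in BASE_ROWS:
        print(table, estimated_rows(table, 1))
```

Such an estimate can help to decide which scale factor is realistic for the test system before any data is generated.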
6. To generate data for the full data model, just type in „dbgen -s 1“ (the option „-s“ sets the scale factor). This may take a few minutes; afterwards you should see 8 tbl files in the folder.
7. If you look at the file „customer.tbl“, you see the data in a flat-file format with the fields separated by a pipe („|“).
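To check the result of the steps above, a small Python sketch can verify that the eight tbl files exist and read the pipe-separated rows. It is a minimal example under a few assumptions: the column names are taken from the CUSTOMER table in the TPC-H specification, each line is expected to end with a trailing pipe (which dbgen normally writes), and the helper names `missing_tbl_files` and `read_tbl` as well as the sample line are invented for this illustration.

```python
import csv
import io
from pathlib import Path

# The eight files that "dbgen -s 1" is expected to create.
TBL_FILES = [
    "customer.tbl", "lineitem.tbl", "nation.tbl", "orders.tbl",
    "part.tbl", "partsupp.tbl", "region.tbl", "supplier.tbl",
]

# Column names of the CUSTOMER table (lower-cased, from the TPC-H specification).
CUSTOMER_COLUMNS = [
    "c_custkey", "c_name", "c_address", "c_nationkey",
    "c_phone", "c_acctbal", "c_mktsegment", "c_comment",
]


def missing_tbl_files(folder):
    """Return the names of the expected tbl files that are not in the folder."""
    folder = Path(folder)
    return [name for name in TBL_FILES if not (folder / name).is_file()]


def read_tbl(lines, columns):
    """Yield one dict per line of a pipe-separated tbl file."""
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip empty lines
        # A trailing pipe produces one extra empty field; drop it.
        if len(fields) == len(columns) + 1 and fields[-1] == "":
            fields = fields[:-1]
        yield dict(zip(columns, fields))


# Made-up sample line in the customer.tbl layout (not real dbgen output).
SAMPLE = "1|Customer#000000001|Example Street 1|15|25-989-741-2988|711.56|BUILDING|made-up comment|\n"

if __name__ == "__main__":
    for row in read_tbl(io.StringIO(SAMPLE), CUSTOMER_COLUMNS):
        print(row["c_name"], row["c_acctbal"])
```

For a real file, open it with `open("customer.tbl", newline="")` and pass the file object to `read_tbl` instead of the sample. All values are returned as strings, so numeric columns such as `c_acctbal` still need to be converted before loading them into a database.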
You have now successfully generated some test data for any database.