Running the Ruby on Rails Database Performance Benchmark

The Ruby on Rails Performance Benchmark is described in a white paper available on the Clustrix website. I’m going to go through the high-level steps to get the appropriate pieces of software deployed and configured within your user environment so that you can take a look at benchmarking a 3-node Clustrix cluster.

1. Ensure You Have the Required Hardware and Software

First, it is necessary to have several servers upon which to deploy the software. You should have the following:

  • Master client:
    • local MySQL instance or Clustrix devkit (for hosting the test database)
    • Python 2.6.6
  • Slave client(s):
    • Python 2.6.6
    • Ruby 1.9.3p125
    • Rails 3.2.3
  • Clustrix Cluster or MySQL server
    • Either a Clustrix Cluster, the Clustrix devkit, or MySQL 5.1.61 (generally available over yum)

It is possible to combine several of the software components from the various servers described here but for the purposes of this discussion I’ll assume that you have the following three separate servers: one master server (named “MASTER”), one slave server (named “SLAVE”), and one 3-node Clustrix cluster (named “CLUSTRIX_3NODE”).

Note that steps taken for the single slave client described herein could also be extended to a group of slave clients.

2. User Permissions and Passwordless-SSH

Throughout this post I’ll assume that you are operating with a username of “root” on all systems.

Establish passwordless-SSH, LDAP access, or some other strategy for authenticating from one machine to another with little effort. Or, if you really like typing your password, that should work as well. ;)

3. Download the Benchmark Code

Navigate to the link on the Clustrix website listed in the whitepaper’s “ADDITIONAL INFORMATION” section for the link to the code required to implement the benchmark. Download this and extract it. The result should be three files: “rails-benchmark.tar.gz”, “rails-benchmark-automation.tar.gz”, and “deploy_all_local.sh”.

4. Configure the Master Client

We’ll need to make some directories where the master’s test automation code will live:

MASTER>mkdir /temp
MASTER>mkdir /temp/working
MASTER>mkdir /rails-benchmark-results
MASTER>mkdir /rails-benchmark-automation-master

Now, extract the “rails-benchmark-automation.tar.gz” file into the “/rails-benchmark-automation-master” directory.

5. Configure the Slave Client

SLAVE>mkdir /temp
SLAVE>mkdir /temp/incoming

Copy the three files from the benchmark zip file (“rails-benchmark.tar.gz”, “rails-benchmark-automation.tar.gz”, and “deploy_all_local.sh”) to /tmp. First you’ll need to edit the last line of the “deploy_all_local.sh” file and then execute it. The last bit of the file looks like this:

<SNIP>

now=$(date +”%Y-%m-%d_%H.%M.%S”)

selected_operation LOCAL_USERNAME sudo

<SNIP>
SLAVE>./deploy_all_local.sh
++ date +%Y-%m-%d_%H.%M.%S
+ now=2012-09-29_16.06.20
+ selected_operation root sudo
+ dep_to_clients root sudo
+ sudo mkdir /temp
<SNIP>
+ sudo cp /tmp/rails-benchmark /rails-benchmark-097 -r
+ sudo chown root /rails-benchmark-097 -R
+ for o in 0 1 2 3 4 5 6 7 8 9
+ sudo cp /tmp/rails-benchmark /rails-benchmark-098 -r
+ sudo chown root /rails-benchmark-098 -R
+ for o in 0 1 2 3 4 5 6 7 8 9
+ sudo cp /tmp/rails-benchmark /rails-benchmark-099 -r
+ sudo chown root /rails-benchmark-099 -R

The result should be a directory named /rails-benchmark-automation-slave as well as 100 directories named /rails-benchmark-000 to /rails-benchmark-099.

6. Build the Seed Data Set on the Target Database

First, create a database to hold the seed information.

CLUSTRIX_3NODE>create database dbSeed;
Query OK, 1 row affected (0.06 sec)

Then, an empty schema must be loaded into this seed database:

SLAVE>mysql -hCLUSTRIX_3NODE -uroot dbSeed < /rails-benchmark-automation-slave/sql_dump_files/empty_benchmark_schema.DUMP.sql

Now, edit the database.yml file in the /rails-benchmark-099 directory to point to the targe Clustrix database. Note that this change and the following edits to the ruby files will be overwritten by the test automation code as it executes.

SLAVE>vi /rails-benchmark-099/config/database.yml

<SNIP>
production:
adapter: mysql2
reconnect: false
database: dbSeed
pool: 25
username: root
host: CLUSTRIX_3NODE
# socket: /data/clustrix/p1/mysql.sock
password:
<SNIP>

And edit the same benchmark instance’s benchmark.rb file as shown to indicate that a database should be built with 10000 users:

SLAVE>vi /rails-benchmark-099/lib/tasks/benchmark.rake

<SNIP>
task :populate => :environment do
BenchmarkDatabase.populate_with_dummy_data(10000)
end
<SNIP>

A Rake command is called to actually populate the seed database with information. Note that one must change into the project directory to run the Rake command:

SLAVE>cd /rails-benchmark-099/
SLAVE>rake benchmark:db:clean benchmark:db:populate RAILS_ENV=production
Note that the rake command returns successfully with no messages but the seed database will have rows in it:
CLUSTRIX_3NODE> select count(*) from users;
+----------+
| count(*) |
+----------+
| 10000 |
+----------+
1 row in set (0.01 sec)

The numbers of rows in each of the seed database’s tables should be observed and recorded for later calculations:

CLUSTRIX_3NODE>select count(*) from auctions ;
CLUSTRIX_3NODE>select count(*) from auction_photos ;
CLUSTRIX_3NODE>select count(*) from auction_tags ;
CLUSTRIX_3NODE>select count(*) from bids ;
CLUSTRIX_3NODE>select count(*) from tags ;
CLUSTRIX_3NODE>select count(*) from users ;
CLUSTRIX_3NODE>select count(*) from user_addresses ;
CLUSTRIX_3NODE>select count(*) from user_comments ;
CLUSTRIX_3NODE>select count(*) from user_details ;
CLUSTRIX_3NODE>select count(*) from user_phones ;

Finally, dump the seed to ensure that subsequent efforts produce identical initial data sets to this one:

MASTER>mysqldump -hCLUSTRIX_3NODE -uroot dbSeed > dbSeed.DUMP.sql

7. Next the seed data set built in the last step is “stacked” to the desired number of increments. For the benchmark white paper efforts, a 581MB seed database was “stacked” 850 times to produce an initial data set of ~475GB.

First, an empty database is created:

CLUSTRIX_3NODE> create database dbRorPerfBench;
Query OK, 0 rows affected (0.05 sec)

Then, an empty schema is loaded into this database:

MASTER>mysql -hCLUSTRIX_3NODE -uroot dbRorPerfBench < /rails-benchmark-automation-slave/sql_dump_files/empty_benchmark_schema.DUMP.sql

Next, the “main” Pyton function in the “ConcurrentDBScaler.py” file is edited to point at the target database and to indicate the desired number of seed “increments”:

MASTER>vi /rails-benchmark-automation-master/scale_seed_db/scale_fullauto/ConcurrentDBScaler.py

<SNIP>
def main():
nNumberOfIncrements = 850
sTargetHost = "CLUSTRIX_3NODE"
sTargetDBName = "dbRorPerfBench"

printFormattedTestInfo("SCALE SEED DATABASES")
<SNIP>

The “checkIncrement” function in the same file also needs to be altered to match the number of rows observed in the seed database tables (in alphabetical order: auctions, auction_photos…user_phones):

<SNIP>
def checkIncrement(nIncrement, db):
#constants
clsCompleteRowCounts = [124555, 373665, 100113, 125215, 100000, 10000, 45149, 250597, 10000, 45094]
lsLow = [str(((nIncrement + 0) * i) + 1) for i in clsCompleteRowCounts]
<SNIP>

Then, execute this Python file:

MASTER>./ConcurrentDBScaler.py
 ********************************************************************************************************************************************
******************************************* 1.SCALE SEED DATABASES (2012-09-29 17:23:58.388519) ******************************************** ********************************************************************************************************************************************
 <SNIP>

One can then check to see that the proper number of rows are present in the scaled database. For each table, this is done by multiplying the number of rows in the seed databases’s version of that table by the number of times the seed database was “stacked”. For example 10,000 rows in the “users” table stacked 850 times should mean that dbRorPerfBench has 8,500,000 rows in its “users” table.

It is possible that multiple stacking processes all running at the same time will conflict with one-another such that incomplete increments can result in the initial data set. All tables should be checked to verify that they are the correct size. Any deviations can be corrected by running “ConcurrentDBScaler.py” again with no farther edits; it will go back through the database, delete out any malformed increments and replace them with the correct information.

The initial database can also be checked on an increment-by-increment basis by using the following command, after editing that file’s “clsCompleteRowCounts” variable to reflect the correct number of rows in the seed database:

MASTER>/rails-benchmark-automation-master/scale_seed_db/check_db/check_db_results.py CLUSTRIX_3NODE dbRorPerfBench
0000:OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
100:OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
200:OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
<SNIP>
4800:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
4900:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The individual increments are represented from a possible 0000 position to a 4999 position (5000 total). The values ‘O’, ‘X’, and ‘.’ indicate increment completely filled, increment empty, and increment partially filled.

8. Loading the Test Database

The initial test database needs to be loaded onto the Clustrix devkit or MySQL instance running on the master client. First an empty database is created to hold the imported db:

MASTER>create database dbRoRPerfTesting;
Query OK, 1 row affected (0.00 sec)

Then, the initial test database is restored to this empty database that we just created:

MASTER>mysql -hlocalhost -uroot dbRoRPerfTesting < /rails-benchmark-automation-master/sql_dump_files/initial_test_db.DUMP.sql

9. Configuring a Campaing of Tests

The test administrator then needs to create a campaign of tests reflecting the desired workload and a series of increasing concurrency levels. This provides the information to the test automation code to allow it to run the benchmark tests. Lucky you, this has already been done in the database you just imported. It is preconfigured to have 17 tests set to run on both Clustrix and on MySQL, a total of 34 tests. This can be seen by entering the following:

     MASTER>select * from vTestsRemaining;
     +---------+-----------+----------+--------------------------------------+--------------+------------------------+--------------+---------------------+
     | nTestID | nCampaign | sDBType  | sTargetHostString_X                  | nClientCount | nTestDurationInSeconds | nThreadCount | nRubyCountPerClient |
     +---------+-----------+----------+--------------------------------------+--------------+------------------------+--------------+---------------------+
     |       1 |         1 | CLUSTRIX | -1                                   |            1 |                    600 |            1 |                   1 |
     |       2 |         1 | CLUSTRIX | -1                                   |            1 |                    600 |            4 |                   1 |
     |       3 |         1 | CLUSTRIX | -1                                   |            1 |                    600 |            4 |                   4 |
     |       4 |         1 | CLUSTRIX | -1                                   |            1 |                    600 |            4 |                  16 |
     <SNIP>
     |      15 |         1 | MYSQL    | IP_ADDRESS_FOR_MYSQL_SERVER_AND_PORT |            1 |                    600 |            4 |                  47 |
     |      16 |         1 | MYSQL    | IP_ADDRESS_FOR_MYSQL_SERVER_AND_PORT |            1 |                    600 |            4 |                  53 |
     |      17 |         1 | MYSQL    | IP_ADDRESS_FOR_MYSQL_SERVER_AND_PORT |            1 |                    600 |            4 |                  59 |
     +---------+-----------+----------+--------------------------------------+--------------+------------------------+--------------+---------------------+
     34 rows in set (0.00 sec)

Things are generally set up but some configuration must be entered to set the IP addresses of the destination databases and the client machines. The following changes will tell the test automation code to skip the MySQL tests, that the target database is named ‘CLUSTRIX_3NODE’, and that the slave client list includes only ‘slave’. These changes are applied only to campaign 1 (as multiple campaigns may very well exist in your test database):

MASTER>update tTestParameters set bSkipTest_M = 1, sInitialTargetHostName_C = 'CLUSTRIX_3NODE', sClientList = 'slave' where nCampaign = 1;

10. Running the Benchmark

Now all the setup has been completed and the benchmark is ready to run. Checking “select * from vTestsRemaining;” against the test database, we see that 17 tests are ready to run. These 17 tests will be run in series:

MASTER>/rails-benchmark-automation-master/perf_test/perfTestMaster_MultiRuby.py
     ********************************************************************************************************************************************
     *************************** 1.FETCH PARAMETERS FOR TESTS TO BE RUN FROM TESTING DB (2012-09-29 18:56:35.551921) ****************************
     ********************************************************************************************************************************************
     ooooooooooooooooooooooooooo Master version: 1.16 (edited 2012-09-11)
     ooooooooooooooooooooooooooo TESTS TO RUN...
     {'nCampaign': 1L, 'nTestDurationInSeconds': 15L, 'sDBType': 'CLUSTRIX', 'sTargetHostString_X': '-1', 'nThreadCount': 1L, 'nTestID': 1L, 'nClientCount': 1L, 'nRubyCountPerClient': 1L}
     {'nCampaign': 1L, 'nTestDurationInSeconds': 15L, 'sDBType': 'CLUSTRIX', 'sTargetHostString_X': '-1', 'nThreadCount': 4L, 'nTestID': 2L, 'nClientCount': 1L, 'nRubyCountPerClient': 1L}
     {'nCampaign': 1L, 'nTestDurationInSeconds': 15L, 'sDBType': 'CLUSTRIX', 'sTargetHostString_X': '-1', 'nThreadCount': 4L, 'nTestID': 3L, 'nClientCount': 1L, 'nRubyCountPerClient': 4L}
     {'nCampaign': 1L, 'nTestDurationInSeconds': 15L, 'sDBType': 'CLUSTRIX', 'sTargetHostString_X': '-1', 'nThreadCount': 4L, 'nTestID': 4L, 'nClientCount': 1L, 'nRubyCountPerClient': 16L}

 

     <SNIP>
     ********************************************************************************************************************************************
     ********************************* 127.AGGREGATING THE CLIENT TEST JSON FILES (2012-09-29 18:58:36.229014) **********************************
     ********************************************************************************************************************************************