Overview
We recently completed extensive load testing on our current project, a portlet-based application running on Liferay Portal Server (which in turn runs on top of Tomcat) and making heavy use of Spring Portlet MVC for front-end development. The following considerations came into play:
Understand The Top Command: Average Load versus CPU Percentage
Our Tomcat/Liferay portlet web app runs on a Linux server, so for a high-level view of how the server was handling our load we used the “top” command. We initially used CPU percentage as an indicator of server load but soon discovered that it is a highly misleading measurement. See http://www.linuxjournal.com/article/9001 to understand why CPU percentage is an inaccurate proxy for load. It only captures snapshots of how frequently some task is running on the processor; it does not capture how many tasks are queuing up to run on the available processors, how much context switching results from that contention, and so on. The load average, by contrast, reflects how many tasks are running or waiting, so it can and likely will exceed the number of available processors. As a rule of thumb, look for the point at which the load average (over a period comparable to the length of the test) reaches four or five times the number of available processors for largely I/O-bound processes, or roughly the number of processors for largely CPU-bound processes. That point helps indicate the sweet spot at which the server is still processing requests smoothly without major contention.
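To make the comparison concrete, here is a minimal Java sketch that reads the 1-minute load average through the standard OperatingSystemMXBean and relates it to the processor count. The class name LoadCheck, its method names, and the idea of passing the target multiple as a parameter are hypothetical choices for illustration. The code reports the load of the machine the JVM runs on, so it would have to run on the server being tested.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Relates a load average to the number of processors (illustrative helper). */
final class LoadCheck {

    /** Load average divided by processor count. */
    static double loadPerProcessor(double loadAverage, int processors) {
        if (processors <= 0) {
            throw new IllegalArgumentException("processors must be positive");
        }
        return loadAverage / processors;
    }

    /**
     * True when the load is at or below the chosen multiple of the processor count.
     * The multiple is supplied by the caller, e.g. about 1 for CPU-bound work or
     * 4 to 5 for I/O-bound work, following the rule of thumb in the text.
     */
    static boolean withinTarget(double loadAverage, int processors, double maxLoadPerProcessor) {
        return loadPerProcessor(loadAverage, processors) <= maxLoadPerProcessor;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative if the platform cannot report it
        int cpus = os.getAvailableProcessors();
        if (load < 0) {
            System.out.println("Load average is not available on this platform");
            return;
        }
        System.out.printf("load=%.2f cpus=%d load/cpu=%.2f%n",
                load, cpus, loadPerProcessor(load, cpus));
    }
}
```

For example, a load average of 24 on a 4-processor machine gives 6 per processor, which is above even the I/O-bound target of 5.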
Use the Right Load Average
top provides 1-, 5-, and 15-minute running averages. For quick burst tests none of these will be accurate, because the test is over before the average has caught up with the load. So it is important to run tests that last at least a couple of minutes (in order to get an accurate 1-minute running average) and that ramp up load, peak in the middle of the testing period, and then bring load back down.
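One way to picture such a test is as a ramp-up, a hold at peak, and a ramp-down. The sketch below computes how many simulated users should be active at a given second of that profile. RampProfile and its parameters are hypothetical names, and a linear ramp is only one reasonable assumption.

```java
/** Ramp-up, hold, ramp-down load profile (illustrative sketch). */
final class RampProfile {

    /**
     * Number of simulated users at the given second of a test that ramps up
     * linearly for rampSeconds, holds at peakUsers for holdSeconds, then ramps
     * down linearly for rampSeconds. Returns 0 outside the test window.
     */
    static int usersAt(int second, int rampSeconds, int holdSeconds, int peakUsers) {
        if (rampSeconds <= 0 || holdSeconds < 0 || peakUsers < 0) {
            throw new IllegalArgumentException("invalid profile");
        }
        int total = 2 * rampSeconds + holdSeconds;
        if (second < 0 || second > total) {
            return 0;
        }
        if (second < rampSeconds) {
            return (int) Math.round((double) peakUsers * second / rampSeconds);
        }
        if (second <= rampSeconds + holdSeconds) {
            return peakUsers;
        }
        int remaining = total - second;
        return (int) Math.round((double) peakUsers * remaining / rampSeconds);
    }

    public static void main(String[] args) {
        // A 4-minute test: 60 s up, 120 s at a peak of 50 users, 60 s down.
        for (int second = 0; second <= 240; second += 30) {
            System.out.println(second + "s -> " + usersAt(second, 60, 120, 50) + " users");
        }
    }
}
```

Holding the peak for two minutes in this example gives the 1-minute average time to reflect the peak load rather than the ramp.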
Perform Baseline Observations
Make sure to perform baseline observations of the server to see what load and memory usage look like before the test. It pays to take several such measurements to make sure that your app server is not running other tasks with unpredictable load or performance. Then perform observations under load, followed by additional observations after the test is completed, to make sure that both load and memory usage drop back to baseline (i.e. there are no memory leaks).
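The before/after comparison can be written down as a simple check. The following sketch assumes you have recorded the load average and memory usage before and after the test (from top or any other tool). BaselineCheck, Snapshot, and the relative tolerance are hypothetical, and the figures in main are made-up illustrations, not measurements from our tests.

```java
/** Compares post-test observations with the pre-test baseline (illustrative sketch). */
final class BaselineCheck {

    /** One observation of the server, e.g. as read from top. */
    static final class Snapshot {
        final double loadAverage;
        final long usedMemoryBytes;

        Snapshot(double loadAverage, long usedMemoryBytes) {
            this.loadAverage = loadAverage;
            this.usedMemoryBytes = usedMemoryBytes;
        }
    }

    /** True when observed is no more than (1 + tolerance) times the baseline. */
    static boolean withinTolerance(double baseline, double observed, double tolerance) {
        // Relative margin only; with a baseline near zero an absolute margin may suit better.
        return observed <= baseline * (1.0 + tolerance);
    }

    /** True when both load and memory have dropped back to roughly their baseline. */
    static boolean returnedToBaseline(Snapshot before, Snapshot after, double tolerance) {
        return withinTolerance(before.loadAverage, after.loadAverage, tolerance)
                && withinTolerance(before.usedMemoryBytes, after.usedMemoryBytes, tolerance);
    }

    public static void main(String[] args) {
        Snapshot before = new Snapshot(0.50, 500_000_000L); // illustrative values only
        Snapshot after = new Snapshot(0.52, 800_000_000L);  // memory did not come back down
        System.out.println("Back to baseline: " + returnedToBaseline(before, after, 0.10));
    }
}
```

A result of false after the load has gone is the cue to look for a leak or for work that is still running.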
Make Sure Connection Pool Settings Are Correct
Liferay, the Java portal server we are using, uses connection pooling internally by default for its metadata, but its defaults are set very high, and we ended up using an excessive number of connections to the database, especially when we started to ramp up our load tests. Guard against this by setting minimum and maximum pool sizes to reasonable values, both for the web apps you are testing and for the application server's metadata database connections.
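A quick way to sanity-check pool settings is to add up the worst case: every pool on every server instance at its maximum size, compared with what the database will accept. The sketch below does that arithmetic. ConnectionBudget, its parameters, and the numbers in main are hypothetical and are not Liferay's or c3p0's actual defaults.

```java
/** Worst-case database connection arithmetic (illustrative sketch). */
final class ConnectionBudget {

    /** Connections opened if every pool on every server instance reaches its maximum size. */
    static int worstCaseConnections(int[] maxPoolSizes, int serverInstances) {
        int perInstance = 0;
        for (int size : maxPoolSizes) {
            perInstance += size;
        }
        return perInstance * serverInstances;
    }

    /** True when the worst case still fits under the database's connection limit. */
    static boolean fitsDatabaseLimit(int[] maxPoolSizes, int serverInstances,
                                     int databaseMaxConnections) {
        return worstCaseConnections(maxPoolSizes, serverInstances) <= databaseMaxConnections;
    }

    public static void main(String[] args) {
        // Hypothetical setup: a portal metadata pool capped at 100 and an
        // application pool capped at 20, on 2 server instances, against a
        // database that accepts 150 connections.
        int[] pools = {100, 20};
        System.out.println("Worst case: " + worstCaseConnections(pools, 2));
        System.out.println("Fits limit: " + fitsDatabaseLimit(pools, 2, 150));
    }
}
```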
Set Idle Connection Settings Appropriately
As an extension to the previous point, it is important that the settings include reclamation of idle connections. We are using c3p0 for connection pooling, and it has a setting for an idle connection timeout. Make use of this setting, but also make sure your Spring configuration specifies a destroy-method on the connection pool (the “close” method in the case of the c3p0 pool class com.mchange.v2.c3p0.ComboPooledDataSource), so that the connections are reclaimed (at least down to the minimum pool size) when the server shuts down. Note that in our experience this works even if the server is shut down ungracefully, but obviously it will not work under an abrupt machine shutdown.
Identify Proper Use Cases
This can be a challenge in itself. Ideally, use cases can be identified from existing performance data, perhaps by web server or HTTP server log analysis, by looking at query execution times, etc. However, this is not always possible, because such data may not exist before you begin load testing. A profiling tool might help here, because it is better to measure than to guess. Even so, it often helps to first pick out the use cases that seem like good candidates for profiling: there may be a large number of cases, it is not possible to profile every one, and some differences in performance do not become apparent until you hit the application with load.
We identified such cases by combining knowledge of
(i) the use cases most commonly exercised by users, or some proxy for this (perhaps the links at the front of your site),
(ii) queries that have particularly complex joins or contain conditions that are not well indexed, and
(iii) calls to external systems for which network communication latency would be the bottleneck.
Existing load data was not available, or we would have used it instead. We first tried to identify the use cases most likely to fit these criteria, and then categorized them further so as to narrow them down to a small set of use cases that would realistically exercise the most likely problem points of the application.
Clearly Identify Your Load Testing Goals
Are you trying to find the breaking load for the application, i.e. find out how far the app can scale? Are you looking at response times under realistic load? Are you trying to identify use cases with performance problems? The answer can affect which use cases you choose and how you perform the tests.
Determine Appropriate Load
Exactly what load should you use in your load test? If your users have some idea of what the load will be, perhaps because you are replacing an existing web site, use that as a starting point. You will likely have to double or triple it, or apply an even greater scale factor, to arrive at the load for testing, given that load can increase unexpectedly, especially if you succeed in improving the site significantly. To be safe, you should probably try to find the breaking point of the application, and then determine whether that maximum load could ever realistically be exceeded.
Build Up Loads Gradually
If you perform a test that submits requests at a set time interval, note that load can build up quickly, because previous requests might not have completed before new requests are submitted. This is why it pays to start out with minimal loads (say 1 user) and then build up to higher, more realistic loads to get an idea of how the app behaves under load.
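The pile-up is easy to see in a toy model. The sketch below assumes requests arrive at a fixed interval and a single worker handles them one at a time with a fixed service time. That is a deliberate simplification of a real server with a thread pool, and FixedIntervalLoad and its parameters are hypothetical names.

```java
/** Toy model of requests submitted at a fixed interval to one worker (illustrative sketch). */
final class FixedIntervalLoad {

    /**
     * Largest number of requests in the system at any arrival time, when request i
     * arrives at i * intervalMillis and a single worker needs serviceMillis per
     * request, processing them in arrival order.
     */
    static int maxInFlight(int requests, long intervalMillis, long serviceMillis) {
        if (requests < 0 || intervalMillis <= 0 || serviceMillis < 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long[] completion = new long[requests];
        int max = 0;
        for (int i = 0; i < requests; i++) {
            long arrival = i * intervalMillis;
            // The worker starts this request once it is free and the request has arrived.
            long start = (i == 0) ? arrival : Math.max(arrival, completion[i - 1]);
            completion[i] = start + serviceMillis;

            int inFlight = 1; // the request that just arrived
            for (int j = 0; j < i; j++) {
                if (completion[j] > arrival) {
                    inFlight++; // an earlier request is still waiting or being served
                }
            }
            max = Math.max(max, inFlight);
        }
        return max;
    }

    public static void main(String[] args) {
        // Service time below the interval: nothing queues.
        System.out.println(maxInFlight(10, 100, 50));
        // Service time above the interval: the backlog keeps growing.
        System.out.println(maxInFlight(10, 100, 250));
    }
}
```

In this model the backlog grows without bound as soon as the service time exceeds the submission interval, which is why a low starting load makes the first results much easier to interpret.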
Account for Database Caching
One tricky aspect of load testing: if you perform multiple requests but each request retrieves the same read-only data, caching may come into play, which can skew your results. See if there are ways you can introduce variability into your load testing scripts.
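One simple way to add variability is to have each simulated request look up a different, randomly chosen record. The sketch below generates such a list of request paths. VariedRequests, the /records/ URL pattern, and the id range are all hypothetical and would need to match your own application and data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Generates request paths that vary the record being fetched (illustrative sketch). */
final class VariedRequests {

    /** Builds count request paths, each for a random record id in 1..maxRecordId. */
    static List<String> requestPaths(int count, int maxRecordId, long seed) {
        Random random = new Random(seed); // a fixed seed makes a test run repeatable
        List<String> paths = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int id = 1 + random.nextInt(maxRecordId); // ids 1..maxRecordId inclusive
            paths.add("/records/" + id);              // hypothetical URL pattern
        }
        return paths;
    }

    public static void main(String[] args) {
        for (String path : requestPaths(5, 1000, 42L)) {
            System.out.println(path);
        }
    }
}
```

Using a seed keeps runs comparable with each other, while changing the seed between runs avoids replaying exactly the same data into a warm cache.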
Perform Several Load Testing Phases
1) We ran simple tests of our automated scripts, generally against a highly controllable environment (maybe a desktop server) and with a load of perhaps only 1 user.
2) We ran more extensive tests against our test server machine, first with individual test cases and then with multiple test cases combined, gradually stepping up load to see whether the server performed adequately. Specifically, we were trying to see whether the server could handle expected loads (with a scaling factor for the expected increase in load once the server is in production) with reasonable response times, no timeouts, reasonable server load, no memory leaks, etc.
Any issues we identified we investigated using a Java profiling tool.
3) Rinse and repeat for future iterations if changes need to be made to meet performance targets (in our case we met performance targets so this was not necessary).
4) Following this we will need to perform iterations to determine the maximum possible server load (try to break the application by increasing load until requests start to fail or response times become unacceptable) in order to understand the limits of the app.
5) In addition, we will need to perform another few iterations to tweak settings to achieve maximum performance and scalability, i.e. adjust Java heap size, connection pool settings, app server settings, etc.
6) It may be a requirement on your project, as it is on ours, to also run iterations in which real clients join the load so they can get a personal feel for how the app performs under load.
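The search for the maximum load in phase 4 can be sketched as a loop that raises the load in fixed steps until a run no longer meets the pass criteria. In the sketch below, BreakingPoint and its parameters are hypothetical, and the passes predicate stands in for running a real load test at that user count and checking for failed requests or unacceptable response times.

```java
import java.util.function.IntPredicate;

/** Steps load upward until a run fails (illustrative sketch). */
final class BreakingPoint {

    /**
     * Raises the load from startUsers in steps of stepUsers, up to maxUsers, and
     * returns the highest load that still passed, or 0 if the first step failed.
     */
    static int highestPassingLoad(int startUsers, int stepUsers, int maxUsers,
                                  IntPredicate passes) {
        if (startUsers <= 0 || stepUsers <= 0) {
            throw new IllegalArgumentException("startUsers and stepUsers must be positive");
        }
        int lastPassing = 0;
        for (int users = startUsers; users <= maxUsers; users += stepUsers) {
            if (!passes.test(users)) {
                break; // the first failing load ends the search
            }
            lastPassing = users;
        }
        return lastPassing;
    }

    public static void main(String[] args) {
        // Stand-in for a real test run: pretend the app copes with up to 75 users.
        IntPredicate pretendTest = users -> users <= 75;
        System.out.println(highestPassingLoad(10, 10, 200, pretendTest));
    }
}
```

The step size sets the resolution of the answer, so a second pass with smaller steps around the failing load can narrow the limit further.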