Using VisualVM to improve Liferay performance

User Rating: 0 / 5

Star InactiveStar InactiveStar InactiveStar InactiveStar Inactive
 

Using VisualVM to find and resolve performance issue in Liferay

Introduction

This is a follow-up article to my previously published article on testing Liferay performance. In that article I idenfitied a performance problem when testing 7500 different URLs, where Liferay was only able to serve 26 requests per second. In this article I show how I used VisualVM to look at the garbage collection and the configuration and statistics of Ehcache to increase the performance by 10-15 times, to about 200-300 requests per second.

My testing shows that using hardware costing about $1.000 to run Liferay 5, I am able to serve between 300 to 2200 requests per second. This test scenario mimics hosting a web site consisting of several thousand pages, with at least 1.000 users simultaneously browsing the site. Note that this does not take the bandwidth to the Internet into consideration, since I have tested on a Gigabit LAN.

VisualVM

VisualVM is part of the Sun JDK 1.6 Update 7 and onwards. I downloaded the latest version, 1.0.1, for my testing. I also installed the VisualGC and VisualVM-MBeans plugins from within VisualVM.

To give you a quick overview of some of the very useful information you get from VisualVM, I present a screen grab from the VisualGC plugin taken during my Liferay testing, it shows the garbage collection status for Tomcat (click image to see the full image):

Screengrab from VisualVM - VisualGC

Ehcache

Ehcache is the caching mechanism used by Liferay, and it provides some JMX MBeans that can be used to look at the configuration for Ehcache and the statistics of the caching. I needed both of these aspects to be able to solve my performance problems. In order to be able to view the JMX information available in Ehcache in VisualVM, or any other JMX viewer for that matter, I had to patch the Liferay code slightly, see the further down on this page for more information.

Here is a screengrab that shows the Ehcache cache statistics JMX Bean, displayed in VisualVM (click image to see the full image) :

Screen grab showing cache statistics

 

Clues to finding cause for Liferay performance problem

When I only tested one URL, I was achieving a throughput of 2200 request per second. When I tested 7500 different URLs, the throughput was only 26 requests per second, and the CPU utilization on the server was very close to 100%.

This lead me to conclude that Tomcat was having to either compile or render each JSP "from scratch", and that very little caching was being performed on Liferay. I got some useful input in this post on the Liferay forum, which suggested even more that it was a caching problem.

I was also thinking that it might be that the default settings in the Ehcache configuration of Liferay was causing items to be dropped from the cache.

Actual performance test

I executed the SQL update query against the Liferay database as suggested in the forum post, making the journal templates cacheable. I then reran the JMeter test script, but I was not seeing any performance problem. Then I unpacked the "portal-impl.jar", and had a look at the "ehcache/*.xml" files. I edited the "hibernate.xml" and "liferay-single-vm.xml" Liferay Ehcache configuration files, increasing the "maxElementsInMemory" by a factor of 100. I reran the tests, and saw now difference. Then I increased the "maxElementsInMemory" by a factor of 100 again, so the values used where between 50.000.000 and 100.000.00. I reran the tests, and saw no difference.

At this point I knew something odd was going on. So I downloaded VisualVM 1.0.1, which I had a quick look at some weeks ago. Connecting the VisualVM to my Tomcat instance, and rerunning the JMeter scripts, I could see in the VisualGC plugin that the heap size was only around 2 Gigabytes, even though max heap size was at 6.5Gb.

So clearly there wasn't much caching going on, or at least the amount of heap size was not limiting the number of items being cached. I then started to wonder if my Ehcache config files were being picked up by Liferay, so I wanted to look at some JMX beans from Ehcache.

Patching Liferay to expose Ehcache JMX beans

I looked further into Ehache, and noticed that method calls had to be made to Ehcache so that Ehcache would register the JMX beans it contains. So I had to patch the com.liferay.portal.cache.EhcachePortalCacheManager class. I added the following import statements :

import javax.management.MBeanServer;
import java.lang.management.ManagementFactory;
import net.sf.ehcache.management.ManagementService;

and I changed the "afterPropertiesSet" method to look like this, by adding the last three lines :

    public void afterPropertiesSet() {
        URL url = getClass().getResource(PropsUtil.get(_configPropertyKey));

        _cacheManager = new CacheManager(url);
        // Register Ehcache JMX beans
        MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();
        ManagementService.registerMBeans(_cacheManager, mBeanServer, true, true, true, true);
    }
 

Being able to see the Ehcache JMX beans in VisualVM when testing Liferay is so useful that I suggest that Liferay upgrades to Ehcache version 1.5, and introduces config setting to control wether or not the Ehcache JMX beans should be enabled. I have registered this enhancement suggestion as LEP-7489.

 

Improving performance of Liferay

After rerunning the test with the Ehcache JMX beans exposed in VisualVM, I noticed that some of the "maxElementsInMemory"  were still quite low, at either 1.000 or 10.000. I then realized that I should have edited the "liferay-multi-vm.xml" file as well. Failing to edit this file was causing a lot of cache misses for "JournalContentUtil", which I think is caching the articles displayes on the Liferay pages. The files was edited by increasing the max elements cached, Tomcat was restarted and JMeter scripts rerun.

 

Using VisualVM I could then see that the cache statistics went from  :

Cache hits : 45.021, Cache misses : 43.764, Cached objects : 1.000

to:

Cache hits : 17.571, Cache misses : 11.045, Cached objects : 8.930

Focus on the cache misses / cache hits ratio and the number of cached objects. The reason for more cache hits and cache misses in the first cache, is that more JMeter script has been run with more samples in the first cache.

 

And the throughput of Liferay went from 26 requests per second to about 300 requests per second.

 

Large number of cache misses for PermissionCacheUtil

When rerunning the JMeter script several times, without restarting Tomcat, I noticed that the journal content was being cached all the time, and very few cache misses occured for subsequent test runs. But there are still plenty of cache misses for "PermissionCacheUtil", in fact, there are several million cache misses. I do not know how costly the retrieval of the permission information is, but I am suspecting that this can be the cause of the performance only being 300 requests per second versus 2200 requests per second when only testing one URL.

This is what I am seeing :

Cache hits : 2.308.430, Cache misses : 2.311.793, Cached objects : 2.474

Liferay error messages in Tomcat log

I also noted that I kept getting thousands of these error messages in the Tomcat log :

13:15:01,955 ERROR [velocity:154] Left side ($border.getData()) of '==' operation has null value. If a reference, it may not be in the context. Operation not possible. com.liferay.portlet.journal.util.JournalVmUtil [line 10, column 34] 

I am suspecting that this is a Liferay bug somewhere, and that this is causing the page where this occurs to not be put into the cache. There is also good reason to suspect that this error is caused by some exception, which should further hurt the performance.

What is the difference between the "com.liferay.portal.kernel.dao.orm.FinderCache" and "com.liferay.portal.spring.hibernate.FinderCache" caches ?

The "com.liferay.portal.spring.hibernate.FinderCache" is the one that is available for configuration in the "liferay-multi-vm.xml" file, but my testing shows that this cache is not used at all, there are no cache hits nor cache misses. However, the "com.liferay.portal.kernel.dao.orm.FinderCache" has the following statistics :

Cache hits : 3.191.497, Cache misses : 7.801, Cached objects : 6.651

So I guess it was rather the "com.liferay.portal.kernel.dao.orm.FinderCache" that should have been exposed for configuration in the "liferay-multi-vm.xml" file.

I have reported this as a Liferay bug : LEP-7490

 

Areas needing further investigation

  • Try to find out why there are so many cache misses for the "PermissionCacheUtil". It could because I'm trying to access URLs that are protected by login.
  • Test on Liferay 5.1.1, to see if the error message for the "border.getData" is gone, and if so, if the performance increases 

If you have comments to this article, you can reach me at This email address is being protected from spambots. You need JavaScript enabled to view it.

Additional information