We are running the latest CF 9 server on JVM 1.6.0_26, on a Win2003 server with an i7 processor and 8GB of RAM.
Here is the JRun config:
java.args=-server -Xms4096m -Xmx4096m -Dsun.io.useCanonCaches=false -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseParallelGC -Dsun.rmi.dgc.client.gcInterval=600000 -Dsun.rmi.dgc.server.gcInterval=600000 -Dcoldfusion.sessioncookie.httponly=true -XX:NewRatio=3 -Xbatch -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Dcoldfusion.classPath={application.home}/../lib/updates,{application.home}/../lib,{application.home}/../gateway/lib/,{application.home}/../wwwroot/WEB-INF/flex/jars,{application.home}/../wwwroot/WEB-INF/cfform/jars
For the past few weeks, every couple of days the CF server grinds to a halt.
Using SeeFusion we can monitor the requests and see them just starting to stack up.
We are typically alerted to the brewing problem when our application starts sending notices
that SESSION variables are undefined. The interesting part is that the line
where the error occurs typically comes after the variable has already been checked with IsDefined.
For instance:
<cfif NOT IsDefined("SESSION.User")>
<cflocation url="somewhere">
</cfif>
Hi <cfoutput>#SESSION.User.GetUsername()#</cfoutput>
This reports the error USER IS UNDEFINED IN SESSION on the output line, AFTER the variable has been checked for existence, which suggests
to me that somewhere in the middle of processing the request, memory is getting corrupted.
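One way to test whether this is a race rather than corrupted memory would be to copy the object out of SESSION under a read lock and use only the local copy afterwards. This is only a sketch, and it assumes that some other request (a logout, a session timeout, a StructClear) could remove SESSION.User between the check and the output, which I have not confirmed. The names hasUser and currentUser are made up for illustration. The lock only helps against code that writes to SESSION under an exclusive lock, but the local reference stays valid either way.

```cfml
<!--- Sketch only: assumes another request may clear SESSION.User mid-request --->
<cflock scope="SESSION" type="readonly" timeout="5">
    <!--- Check and copy in one locked step so the two cannot drift apart --->
    <cfset hasUser = StructKeyExists(SESSION, "User")>
    <cfif hasUser>
        <!--- currentUser is a hypothetical request-local reference --->
        <cfset currentUser = SESSION.User>
    </cfif>
</cflock>
<cfif NOT hasUser>
    <cflocation url="somewhere">
</cfif>
Hi <cfoutput>#currentUser.GetUsername()#</cfoutput>
```

If the error disappears with this pattern, that would point to the session being modified concurrently rather than to the JVM itself.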
Anyway, after starting to see random errors like this we log into SeeFusion and see that
memory usage is running at about 85% and simple page requests are stacking up.
I can force a full GC, which completes in milliseconds, but it doesn't reduce memory usage at all.
The page response times begin to climb.
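One possible way to see what the collector is actually doing in the run-up to a stall would be to append GC logging flags to java.args. The flags below are standard HotSpot options on a 1.6 JVM. The log path is a made-up example, not something from our setup.

```properties
# Sketch only: append these to the existing java.args line in jvm.config.
# C:/temp/cf_gc.log is a hypothetical path; use any writable location.
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:C:/temp/cf_gc.log
```

The resulting log should show whether the old generation stays near full after each collection, which would line up with the 85% figure SeeFusion reports.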
At first we thought it might be some long-running page or report on our site, but looking at the actively
running requests we see nothing intensive that could be causing the issue. Looking at the
task manager, the processes on the server are all running at 0% except for JRun, which is hovering around
15% to 18%.
The problem isn't in the database either. Our MySQL database shows no long-running queries, hung processes, or crashed tables
that the application could be stalling on.
All of this leads up to the site slowing to a crawl and then becoming completely unresponsive while JRun chugs along
at 15% and memory never maxes out. This never causes any errors in the logs, such as heap memory errors or connection timeouts.
It just crawls along. I've never let it sit in this state for more than 5 or 10 minutes, so I don't know if it would eventually come back.
The only way so far to bring it back is to restart the CF server, at which point everything returns to normal.
In other situations like this I've seen JRun peg the CPU at 100%, or memory pegged at 100% with an eventual heap error,
or the database locked up and causing the app problems. But none of that happens here.
I'm truly stuck as to how to continue to diagnose and fix this problem.
Any help would be awesome. Thanks.