Tracking users from post processing the access.log

Most log analysers do not produce all statistics that a site might want to track. One of the important usage statistics


Tracking users from post processing the access.log

Posted on: July 10, 2010

Most log analysers do not produce all statistics that a site might want to track. One of the important usage statistics that needs to be calculated is to track the number of active users (i.e. users that are actively using the system - not idle users) to try and correlate performance events with level of usage on the system.
It is possible to post process the access.log files using a set of UNIX script to summarize the information as the basic information such as userid and date time are contained in the file. The key to processing this data is to:


  • Filter out the superfluous data and isolate the userid, date and time from the access.log records. Remember a record is written to the access.log when the user is actively seeking a resource on the web server. This is a definition of an "active" transaction record.

  • Summarize that subset by the desired time frame (say each minute of the day). This is to cut down the information to the basics that are needed to for analysis. After the summarization you should end up with a unique record for each user in that minute of the day for the dates you require.

Note: The code below is not optimized and is provided as is. I am not a great script programmer and I am sure some of you out there would improve on it.

The script below is an example of such a set of commands to summarize that information:

#!/usr/bin/ksh
# Find all the access.log files in the log directory
# 1) We read each record from the access.log files (cat)
# 2) We filter out the console (grep)
# 3) We get the date time and userid from the access log record (cut)
# 4) We substring the date/time to chop off the seconds, the first "[" char and move the fields around (awk)
# 5) We filter out any records containing "-" (grep)
# 6) We isolate duplicates within that minute (sort)
# 7) We comma seperate the date/time and userid fields (sed)
# 8) Change the ":" within the date/time field to seperate date and time (sed) - Only the first one which should be at the end of the date
# At the end of pass 1 you should have a CSV that has a unique user entry for a date and time to the minute level. This implies that the user was active for that minute on that date.
ls -1 $SPLEBASE/logs/system/*access.log | while read _file
do
echo `date` : Started Parsing ${_file}
echo `date` : Pass 1 Started - Creating user list by minute
export _outfile=${_file}_userlist.csv
cat ${_file} | grep -v console |cut -f3,4 -d " " | awk '{print substr($2,2,17),$1}' | grep -v "-" | sort -u | sed -e "s% %,%g" | sed -e "s%:%,%" >${_outfile}
echo `date` : Pass 1 Ended - Check ${_outfile} for results
echo `date` : Lines created: `wc -l <${_outfile}`
done
echo `date` : Parsing finished

The above set of commands creates a file containing the date, time (to minute only) and userid and is comma delimited.
Note: Additionally the above information with coupled with information from SC_USER table will yield the Name of the user. It can also be used to determine who did not logon on (as you have the users who were active, the missing users are the ones that DID NOT login in the collection period).
This information may be loaded into data or post processed by another script to count the number of users in each minute of the day. The script below is such a script, which takes the files produced by the last process and creates active user counts:


#!/usr/bin/ksh
# Find all the *_userlist.csv files in the current directory
# 1) Sort the file to remove duplicates (just in case the files were combined)
# Otherwise you might count a user twice.
# 2) For each date/time combination work out how many users are active.
# 3) Output a date, time, number CSV file called _summary.csv
ls -1 *userlist.csv | while read _file
do
echo `date` : Started Parsing ${_file}
export _tmpfile=$$.tmp
cat ${_file} | cut -f1,2 -d "," | sort -u >${_tmpfile}
echo `date` : Pass 2 Started - Creating user summary
# Set the file name for the output
export _outfile=`basename ${_file} _userlist.csv`
export _outfile=${_outfile}_summary.csv
# Delete the file if it exists as you are going to append to it
rm -rf ${_outfile} 2>/dev/null
# Set the initial key values
export _key=""
export _count=0
# Loop through file and count number of instances
cat ${_tmpfile} | sort -u | while read _key
do
export _numrows=`cat ${_file} | grep "${_key}" | wc -l`
echo ${_key},${_numrows} >>${_outfile}
done
rm -rf ${_tmpfile} 2>/dev/null
echo `date` : Pass 2 Ended - Check ${_outfile} for results
echo `date` : Lines created: `wc -l <${_outfile}`
done
echo `date` : Parsing finished

This will produce a file that contains date, time (to the minute) and count of active users in a comma delimited format that can be loaded into Excel to produce relevant graphs.

http://blogs.oracle.com/theshortenspot/2010/07/tracking_users_from_post_proce.html
Home

TagCloud:                            

Recent Posts

What Should CEOs Do Differently?
Generating Complex Events from a Partitioned Stream
How Will Google Instant Affect Your Company's SEO?
Apple Relaxes Restrictions on Mobile App Development
Amazon Upgrades Checkout, Plays Catch-up with Google, PayPal
For Advertisers, Location-Based Services "Blew Up Overnight"
Google launches 'Instant' search
Police in 14 Countries Raid File-Sharing Hosts And Hit Close to Wikileaks
Call 911? Maybe not from a cell phone
Bahraini Blogger Arrested, Probably Tortured

Categories


Archives

  • September 2010
  • August 2010
  • July 2010
  • June 2010
  • May 2010
  • Sponsors

    Partners

    Programming Tool Links Webmaster tricks and tips Graphic Design Links
    Add Web Tool links Web Design Tool links IT plugin Links


    Sitemap | Google Sitemap | RSS Feeds