Report Processing

Purpose

Document the setup of report processing.

Failure Points

Below is a description of how reporting data is collected and processed. Each step represents a potential point of failure for the reporting system. Italicized items represent unrecoverable failure points.

  1. User requests template.
  2. Template contains appropriate reporting crumbs.
  3. User's browser requests crumb images from reporting collection server.
  4. Crumb request is rewritten to a clear GIF by a RewriteRule in the Apache configuration (an example rule follows this list).
  5. Request for crumb is logged in the reporting collection server's access_log in CrumbLog format.
  6. Access log is archived after analog processing.
  7. Archived data file is moved to processing area.
  8. Archived data file is parsed and converted to a loadable format.
  9. Loadable format data file is sorted to help efficiency.
  10. Loadable format data file is loaded into the database.
  11. For mdTransit, the Transit Report table is updated to improve efficiency of reports about search behavior.
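
For reference, the rewrite rule in step 4 typically looks something like the sketch below. The crumb URL pattern and the GIF path here are assumptions, not the shipped configuration.

    RewriteEngine On
    # Serve every crumb request as the same transparent GIF; the request is
    # still recorded in the access_log, which is all reporting needs.
    RewriteRule ^/crumbs/.+\.gif$ /images/clear.gif [PT,L]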

Note: Take steps to ensure a log file is not processed twice. Duplicate processing must be corrected by removing the affected data from the database (a manual step) and then rerunning the data for the affected dates (which typically requires manually merging two or more access logs).
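
One common guard against double processing is a marker file per completed log, written only after processing succeeds. The sketch below assumes a done/ directory under ${PROCESSING_DIRECTORY} and a hypothetical process_one_log controller script; neither is part of the shipped tooling.

    #!/bin/sh
    # Skip a rotated log if it has already been processed.
    LOG="$1"                                  # e.g. access_log.2009-06-01
    DONE_DIR="${PROCESSING_DIRECTORY}/done"
    MARKER="${DONE_DIR}/$(basename "$LOG")"
    mkdir -p "$DONE_DIR"
    if [ -e "$MARKER" ]; then
        echo "already processed: $LOG" >&2
        exit 0
    fi
    process_one_log "$LOG" || exit 1          # hypothetical controller script
    touch "$MARKER"                           # record success only after completion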

Database Setup

Please refer to the database preparation guide for database items that should be set up for reporting.

Processor Installation

Place the contents of the Processing folder on the system that will perform report processing. The resulting directory will be referred to as ${PROCESSING_DIRECTORY}.

Configure the file ${PROCESSING_DIRECTORY}/conf/CrumbLog.cfg.
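
At minimum, this file defines the two directories referenced later in this document; the file-archiving toggle mentioned under filesystem maintenance also lives here. The entries below are illustrative only; check the shipped CrumbLog.cfg for the authoritative key names and syntax.

    # Illustrative values only.
    ACCESS_LOG_DIRECTORY = /data/reporting/incoming
    ARCHIVE_DIRECTORY    = /data/reporting/archive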

Set the environment variable CRUMBLOG_HOME to ${PROCESSING_DIRECTORY}.

Make sure the environment variable ORACLE_HOME is set appropriately.

Make sure both of these variables are also set when the processor is called via cron; cron provides a minimal environment and does not inherit settings from your interactive shell. An example wrapper follows.
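
One way to guarantee this is a small wrapper script that exports both variables before invoking the processor. All paths below are examples only, and run_nightly.pl is the sample program described in the next section.

    #!/bin/sh
    # Hypothetical cron wrapper; every path here is an example.
    CRUMBLOG_HOME=/opt/reporting/processing
    ORACLE_HOME=/u01/app/oracle/product/11.2.0
    export CRUMBLOG_HOME ORACLE_HOME
    exec "${CRUMBLOG_HOME}/run_nightly.pl"

A matching crontab entry might run the wrapper nightly and capture its output for troubleshooting:

    30 2 * * * /opt/reporting/processing/cron_wrapper.sh >> /var/log/report_processing.log 2>&1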

Access Log Processing Details

The reporting server's access_log is rotated by a customer-created process; a sample program called run_nightly.pl is included for reference. The rotated log should be stored in the format $LOG_FILE_NAME.YYYY-MM-DD. Once logs have been copied or transferred for processing, they may be handled according to usual customer practice.
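
As an illustration of that rotation step (run_nightly.pl is the shipped reference; this shell sketch shows the same idea with example paths):

    #!/bin/sh
    # Move the live log aside under the dated name, then have Apache reopen
    # its log files. Paths are illustrative.
    LOG_DIR=/var/log/httpd
    LOG_FILE_NAME=access_log
    DATE=$(date +%Y-%m-%d)
    mv "${LOG_DIR}/${LOG_FILE_NAME}" "${LOG_DIR}/${LOG_FILE_NAME}.${DATE}"
    apachectl graceful
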
A customer-created process is then started; it should perform the following steps for each file that needs to be processed (a sketch of such a controller follows this list):

  1. Rotated log is transferred for processing: the file is moved to ${ACCESS_LOG_DIRECTORY} (this value is set in the CrumbLog.cfg file).
  2. Access log is parsed: the CrumbLog program is run with the -P option to parse the data.
  3. Parsed file is sorted: the sort.sh script is run, which calls the sort command on the parsed data file to improve the efficiency of the load.
  4. Parsed file has quotes escaped (mdTransit only): escaper.pl is run on the sorted file to escape quotes that would cause problems during loading.
  5. Line counts are logged (optional): wc -l is run on the parsed file so that the efficiency gained by the sort step can be calculated by comparing the line count in the customer-created processing controller's log file to the "Total number of record(s) processed" line in the CrumbLog log file.
  6. Parsed file is loaded: the CrumbLog program is run with the -L option.
  7. Transit Report table is updated (mdTransit only): the TransitReportUpdate program is run.
  8. Filesystem maintenance: files are removed from ${ARCHIVE_DIRECTORY} (this value is set in the CrumbLog.cfg file). If file archiving is set to false, files must instead be removed from ${ACCESS_LOG_DIRECTORY} after each processing run.
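
The sketch below strings these steps together in shell. Only the program names, the -P and -L options, and the step order come from this document; file locations, the .parsed suffix, and the argument syntax are assumptions to verify against the actual tools.

    #!/bin/sh
    # Illustrative per-file processing controller; invocation details are
    # assumptions, not the shipped interface.
    set -e
    FILE="$1"                          # rotated log, e.g. access_log.2009-06-01
    MDTRANSIT="${2:-no}"               # pass "yes" on mdTransit installations
    ACCESS_LOG_DIRECTORY=/data/reporting/incoming   # mirrors the CrumbLog.cfg value

    CrumbLog -P "${ACCESS_LOG_DIRECTORY}/${FILE}"   # parse to loadable format
    sort.sh "${ACCESS_LOG_DIRECTORY}/${FILE}.parsed"          # sort for load efficiency
    if [ "$MDTRANSIT" = yes ]; then
        escaper.pl "${ACCESS_LOG_DIRECTORY}/${FILE}.parsed"   # escape problem quotes
    fi
    wc -l "${ACCESS_LOG_DIRECTORY}/${FILE}.parsed"  # optional: log the line count
    CrumbLog -L "${ACCESS_LOG_DIRECTORY}/${FILE}.parsed"      # load into the database
    if [ "$MDTRANSIT" = yes ]; then
        TransitReportUpdate                         # refresh the Transit Report table
    fi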