Evolution of Log Management

Log management (sometimes referred to as SIEM) has evolved over the years that I have been working on it.  I have seen several significant stages of how organizations create, collect, search, and report on their log data.  It is interesting to look at the past decade of log management and try to think about what will influence the next major changes.  I’ve tried to summarize some of the stages from my perspective:

1990s – Roll your own logging solution

A decade ago most organizations were doing spotty log analysis.  Usually a log server was set up for a specific application or device on the network.  An engineer would set up a syslog server and run some Perl and grep scripts against existing log data.  Log files were rotated on a scheduled interval and only the most knowledgeable engineer could write the scripts and know what to look for in the data.  Occasionally, a Perl script would run as a CGI application to post the log data to a web server so it could be viewed through a web browser.  Application developers were typically writing the log analysis tools for their own application.

Best practices were not well documented and there were few organizations that specialized in analyzing logs.  Security focused organizations like SANS and CERT had written a few articles about log management, but there were no guidelines like ITIL and no regulations existed that would require companies to review any of their log data.  Logs were not perceived as very important and in many cases were not reviewed or stored for very long.

There weren’t many commercial tools that made centralized log collection, retention, reporting, and alerting easy.  Very few organizations wanted to develop their own solutions and there wasn’t a big enough financial incentive for executives to invest in those technologies, but that was all about to change.

2001 – Security Information Management (SIM)

Security was the first comprehensive driver for IT organizations to look at their log data.  Security Information Management tools came along soon after the proliferation of Intrusion Detection Systems (IDS) and were primarily used to reduce the number of false positives those systems were notorious for generating.  The goal of SIM products was to uncover the real security issues from the noise that was in the log data.

SIM applications identified and alerted users to security threats by normalizing and correlating log data from multiple sources.  For example, if an IDS sent a log message that an attack signature was recorded on the network and vulnerability assessment software indicated that the targeted system was vulnerable to that kind of attack, then the SIM tool alerted someone to the attack.  This correlation became especially useful between security, network, and system devices.
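As a rough illustration of that kind of correlation, here is a minimal Python sketch.  The record types, field names, and sample data are hypothetical and are not taken from any particular SIM product; real correlation engines are far more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdsAlert:
    """A normalized IDS event (hypothetical schema)."""
    target_host: str
    signature: str


@dataclass(frozen=True)
class VulnFinding:
    """A normalized vulnerability-scan result (hypothetical schema)."""
    host: str
    signature: str


def correlate(alerts, findings):
    """Keep only the IDS alerts whose target is known to be vulnerable
    to the attack named in the alert."""
    vulnerable = {(f.host, f.signature) for f in findings}
    return [a for a in alerts if (a.target_host, a.signature) in vulnerable]


alerts = [
    IdsAlert("10.0.0.5", "sql-injection"),
    IdsAlert("10.0.0.9", "sql-injection"),
]
findings = [VulnFinding("10.0.0.5", "sql-injection")]

# Only the alert against the vulnerable host is escalated.
escalated = correlate(alerts, findings)
for alert in escalated:
    print(f"ALERT: {alert.signature} against vulnerable host {alert.target_host}")
```

The alert against 10.0.0.9 is dropped because no scan result says that host is vulnerable, which is the noise reduction described above.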

SIM vendors needed to develop log parsing and normalization software for each source device they collected from so the events could be run through their correlation engines.  This made SIM tools very expensive and difficult to maintain because an organization had to constantly write correlation rules describing what was normal traffic.  It usually required a staff of very talented (and expensive) security experts.  It also required a large investment in hardware and software because correlation rules consumed a lot of CPU cycles and memory.

SIM tools were very useful because they identified and alerted their users to potential security issues without anyone having to read through all the log data, but because of their complexity and high price tag, a lot of organizations couldn’t use them.  One valuable feature that the SIM products demonstrated to all organizations was the centralized collection and retention of log data that was then available for reporting and forensic analysis.

2004 – PCI and log management appliances

In 2004 the Payment Card Industry Data Security Standard (PCI DSS, referred to here simply as PCI) was created.  It set a requirement that any business that wanted to accept credit cards for payments had to implement a long list of security controls that included log management.  Those best practices included how long to keep the log data (1 year), in what format (tamper-proof), how often someone had to review it (daily), and many other requirements.  If organizations wanted to continue to collect credit card payments they would have to meet the guidelines and be audited regularly or they would face financial penalties.

When PCI was created, companies across several industries started looking for log management solutions.  Security was no longer the only reason; they now needed to meet PCI guidelines.  The IT department that was required to meet the guidelines was typically short-staffed and so the most likely choice was to acquire tools that wouldn’t take much time to set up and maintain, but could meet all the requirements.  IT managers often turned to log management appliances that would collect the log data, parse it into reports, and provide secure storage of the sensitive data.

These log management appliances often were built for creating reports that could be used for security and to pass compliance audits.  The appliance would collect the log data and then parse it into a relational database for normalization and reporting.  These appliances excelled at finding the log data from a specific time period, but as the volume of log data increased or the period that people wanted to analyze it grew, the database became very slow.

Most of the SIEM and log management products support specific source log types because the products need to parse the log data into common formats or database schemas for analysis.  These common formats allow the software to apply meaning to the log data so it can be used for correlation rules or a common reporting engine (e.g., compliance reports).  These common formats or database schemas also prevent the products from being flexible enough to easily support other applications.
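To make that trade-off concrete, here is a minimal sketch of parsing two source types into one common schema and loading the result into a relational database.  The source names, log formats, and table layout are all invented for illustration; they do not describe any specific product.

```python
import re
import sqlite3

# Hypothetical per-source parsers.  Each maps a raw line to the common
# schema (event_time, host, username, event_type).
PARSERS = {
    "vpn": (
        re.compile(r"(?P<ts>\S+) (?P<host>\S+) AUTH_FAIL user=(?P<user>\S+)"),
        "login_failed",
    ),
    "webapp": (
        re.compile(r"\[(?P<ts>[^\]]+)\] host=(?P<host>\S+) bad password for (?P<user>\S+)"),
        "login_failed",
    ),
}


def normalize(source, line):
    """Return a common-schema row, or None if the source is unsupported
    or the line does not match the expected format."""
    if source not in PARSERS:
        return None
    pattern, event_type = PARSERS[source]
    match = pattern.match(line)
    if match is None:
        return None
    return (match["ts"], match["host"], match["user"], event_type)


def load(db, records):
    """Normalize (source, line) pairs and insert the rows that parsed."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(event_time TEXT, host TEXT, username TEXT, event_type TEXT)"
    )
    rows = [row for row in (normalize(s, l) for s, l in records) if row is not None]
    db.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return len(rows)


raw = [
    ("vpn", "2010-03-01T10:00:00 vpn01 AUTH_FAIL user=alice"),
    ("webapp", "[2010-03-01T10:00:05] host=web02 bad password for alice"),
    ("custom_app", "something the parsers do not understand"),
]

db = sqlite3.connect(":memory:")
loaded = load(db, raw)

# One report query now works across both supported sources.
failures = db.execute(
    "SELECT username, COUNT(*) FROM events "
    "WHERE event_type = 'login_failed' GROUP BY username"
).fetchall()
print(loaded, failures)
```

The common schema is what lets a single query report on both sources, and it is also why the `custom_app` line is silently left out: without a parser for it, the data never reaches the database.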

2008 – Full text indexing and fast search

By 2008 most log management products had added full text indexing to provide search and troubleshooting capabilities for applications, especially the ones they weren’t going to build parsing code for or support.  By indexing all messages, or full text indexing, the data could be searched quickly for key words and organized by words instead of parsed fields.  Usually a ‘word’ in a full text index was anything that was between two delimiters like a space or a comma.  Full text indexing allowed all the words, or the full messages, in a log file to be put in an index and retrieved when the software searched the data.  Simple reporting could be generated by doing a full text index search and then parsing the results, thereby parsing only the records that met the search criteria rather than the whole data set.
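The idea can be sketched in a few lines of Python.  This is only a toy inverted index under simple assumptions (words are split on spaces and commas, everything is held in memory, and the sample log lines are made up); production indexes are considerably more sophisticated.

```python
import re
from collections import Counter, defaultdict

# A "word" is anything between delimiters; here, spaces and commas.
DELIMITERS = re.compile(r"[ ,]+")


def build_index(lines):
    """Map each lowercase word to the set of line numbers containing it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for word in DELIMITERS.split(line.strip()):
            if word:
                index[word.lower()].add(line_no)
    return index


def search(index, lines, *words):
    """Return the lines that contain every one of the given words."""
    if not words:
        return []
    hits = set.intersection(*(index.get(w.lower(), set()) for w in words))
    return [lines[i] for i in sorted(hits)]


lines = [
    "2010-03-01 10:00:00 web02 ERROR login failed user=alice",
    "2010-03-01 10:00:02 web02 INFO page served, status=200",
    "2010-03-01 10:00:07 web03 ERROR login failed user=bob",
    "2010-03-01 10:00:09 web02 ERROR disk full",
]

index = build_index(lines)
matches = search(index, lines, "error", "login")

# Simple report: parse only the matching records, not the whole data set.
users = Counter(re.search(r"user=(\S+)", m).group(1) for m in matches)
print(matches)
print(users)
```

The search touches only the index, and the field extraction (`user=`) runs on the two matching lines rather than on every record.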

Searching log data and using the simple reports changed the way organizations used log data and identified the information they needed.  It is the best way to find the needle in the haystack, but it still requires talented engineers to know what to search for in their applications.  Full text indexing and search also reduce the work needed to handle application logs that were not traditionally supported by vendor common formats or database schemas.

Searching could be used for forensics or troubleshooting an application, but since the data was not parsed and normalized into fields it couldn’t be used effectively for security correlation, anomaly detection, or other analysis that requires semantic information or relationships stored with the data.

2010 and beyond – What is next?

Log management will continue to evolve over the coming years.  With the volume of log data growing an average of 20% (in some cases over 100%) annually at organizations, the log management tools that were designed five or ten years ago are not scalable for the log volume expected tomorrow.  Scalability, flexibility, and functionality need to continue to evolve in log management products.
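To put those growth rates in perspective, here is a small compound-growth calculation.  The starting volume of 100 GB per day is an arbitrary, hypothetical figure; only the 20% and 100% rates come from the paragraph above.

```python
import math


def projected_volume(current_gb_per_day, annual_growth, years):
    """Daily volume after compounding annual_growth for the given years."""
    return current_gb_per_day * (1 + annual_growth) ** years


def years_to_double(annual_growth):
    """Years until volume doubles at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


for rate in (0.20, 1.00):
    five_year = projected_volume(100, rate, 5)
    print(
        f"{rate:.0%} growth: {five_year:,.0f} GB/day after 5 years, "
        f"doubling every {years_to_double(rate):.1f} years"
    )
```

At 20% a year the volume roughly two and a half times over five years; at 100% a year it grows 32-fold over the same period, which is why tools sized for yesterday's volume fall behind so quickly.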

Several developments are changing the way logs are utilized:

  • The Cloud.  Many people point to the cloud as the future of log management.  Cloud applications will definitely impact who owns the log data and where it originates.  Also, it should change the volume and complexity of centralized log management.  Data center logs won’t go away; they will probably continue to grow.  Most organizations will need to come up with a strategy on how to monitor the log data that they have in the data center and the log data from cloud applications.
  • HIPAA.  In the US, HIPAA regulations have the potential to drive log management and security in the health care and insurance industries.  Also, there are regulations coming in European countries that will influence many international organizations.
  • New Applications.  Fraud detection and cyber warfare are driving many of the financial and government organizations to make large investments in SIM and log management tools.
  • Big Data.  Hadoop/MapReduce and NoSQL applications are often discussed as having potential for large scale log management applications.  I see more and more organizations turning to these tools for their largest log data archives and problems.  I heard from one Hadoop vendor this year that over half of their customers were using Hadoop for log analysis.

As log data continues to grow in organizations and the requirements for security, application troubleshooting, and business intelligence increase, there will be a need to utilize new techniques or technologies.  This is what keeps me interested and excited about the future of log management.  I’m just trying to stay ahead of the curve.
