Wednesday, October 26, 2011

Debuggers lie

Not always, of course, but when debuggers to lie, they can drive you nuts.

Case in point: I'm noticing a 412 Precondition Failed error while loading some relatively static json and xml files off the data server. The result is that I don't get the file. (Actually, I'll get the file once or twice, but then no more.)

Feels like a Browser Cache problem. Only happens on Safari, I've never seen it on Firefox, Chrome or IE.

But, 412 is generated by the (apache) server, not the browser. It's the server saying:
Hey browser, you set some conditions to get this file and those conditions were not met, so I'm returning 412-Precondition Failed.
Request/Response Headers as seen in Safari WebKit debugger
Cool. So I look at the Request Headers being sent with the POST. I see that the headers are exactly the same between the working request and the non-working requests. (This via Safari's built-in WebKit debugger.) Huh? Exactly same request to the same server results in the server replying with a different result?

Investigating why Apache httpd sends 412, I see that often it's a security violation as generated through mod_security module. Problem is, I'm not using mod_security module on that part of the site, so that's not it.

Since clearly the server is sending the error, I download copy of source for the httpd -- Thank god for open source -- Fire up emacs and igrep-find 412.

The only references in the source code surround web-dav (which is not relevant here) and the parsing of If-Modified-Since / If-None-Match headers. Looking at the debugger, neither of these headers are sent, but this feels more relevant.

Using javascript, it's pretty easy to grab and print out response headers, as the result of a query. It's not so easy to print out the request headers: In fact, I don't know how to get them all, except through something like a debugger.

So, bring up a different debugger. I've used Ethereal since the Bush père Administration. The latest incarnation is called WireShark, and remains an terrific tool.

WireShark easily captures the full dialog between my local computer (hosting the browser) and the remote server. I can see the succession of:

>> HTTP GET /test/foo.html
<<  HTTP/1.1 200 OK (test/html)
>> HTTP POST /test/foo.json
<<  HTTP/1.1 200 OK (application/json)
>> HTTP POST /test/foo.json
<<  HTTP/1.1 412 Precodition Failed

The #1 get is a test HTML file which includes javascript which loops on the POST request. You can see the first POST works, but the second fails.

WireShark shows the headers being sent by the POST request, but here I note that the headers differ from what Safari was telling me. Specifically, there's both an If-Modified-Since and If-None-Match.


Combine that observation with the Apache source code and we're on to something:
Safari is sending headers to which Apache replies with 412, and in response, Safari stops with that error.
Not what Safari told me it was doing.


So what's going on? Well, as typical with standards, bits which are implemented by different organizations behave "as according to spec", but don't form a coherent whole (or perhaps simply form a complicated whole...):


Consider first the GET request. There's an optimization that allows the webserver to send a short message 304 Unmodified if your current request is similar to a previous one: essentially the browser says, "GET /foo/index.html, I already have version XXX", the server either sends the resource, or sends "304 Unmodified". (This is called a Conditional GET because the request includes If-None-Match and/or If-Modified-Since headers... if no conditional headers, then the server must send the resource rather than the code.) When the resource requested is very large,  the difference in message sizes between the full resource and the simple code can be significant. Fortunately, when your browser gets 304 Unmodified, rather than 200 Success, the browser goes to it's cache and pulls the resource from there: the rest of your browser (and javascript) doesn't notice the difference.


Compare that behavior with the POST request. According to RFC2616, 304 can be sent only in response to GET, not POST. Instead Apache looks at the incoming request with the If-None-Match and If-Modified-Since and thinks, "No, I don't have a reply that is modified since the date provided, and the only response(s) I do have would return the same Etag matching in the If-None-Match, so I must return 412." This is a difference between POST and GET, if it appears to be a duplicate response, POST will return 412 Precondition Failed, but GET with return 304 Not Modified. See section 14.26 in RFC2616.


It makes sense, kind of. Since POSTs are generally considered more like commands, if you're posting twice, with the same potential results, the server instead rejects the request.


Now, since I wasn't seeing the problem with either Firefox or Chrome, what's going on there? Simple: on POST requests, neither browser sends the conditional commands and therefore the server doesn't reject them.

What's the proper behavior? Well, the server looks like it's doing the right thing, so Apache is in the clear. RFC 2616 section 9.5 says:
Responses to this method are not cacheable, unless the response includes appropriate Cache-Control or Expires header fields.
Which it does, if you've set ExpiresActive directive for Apache httpd.

Since the browsers are doing different things, one can look there for blame. But here, it would appear that Safari is doing the correct thing: the server said the resource is cacheable, Safari cached it. On subsequent requests, Safari say it has a cached version. Server says "precondition failed" rather the "unmodified" because it's a POST request, so Safari says "failed" and does not retrieve from cache -- how can it? It wasn't told the response was unmodified, it's merely been told one or more preconditions failed.

But in the end, it never seems to matter who's to blame: you need to figure out the best way to work around the differences. Such is the life of a web master.


So, what to do? Some options:
  1. remove Expires directive from apache. That way, the browser won't be tempted to send conditional requests. Technically, removing ETags is sufficient (FileEtag None directive)
  2. hack my javascript such that when I get a 412 on a POST, immediately redo the request as a GET (since I know it's a duplicate!) and get the 304 error, and have the browser pull from cache.
  3. just don't use POST to get static files
Option #1 is quick (since I have access to apache files). The real answer is probably #3, but that troubles me because that puts a burden on me to know when the response is a static file (which may have an ETag) versus when the response is dynamically generated (where it won't have an ETag)... The client shouldn't need to know that.

Seems to me the best solution would be to allow server to return 304 Unmodified... If I send a conditional request and the response hasn't changed from an earlier response, the 304 fits the bill. Code 412 is still useful, of there are other preconditions which may fail, but I don't see value in separating GET and POST behavior in this instance.

Finally, to wrap up the real point: Good tools allow you to see what's going on. Bad tools lie.

2 comments: