
Browser cachability issues with CloudFront

Published on Oct 11 2011

In this post, I intend to illustrate some potential issues with browser cachability when using Amazon CloudFront and some possible work-arounds. CloudFront users should be aware of these issues, because sending stale responses to users may have a big impact on the user experience.

One of the Holy Grails of web performance is browser caching. Google lists it as the first item in their Web Performance Best Practices.

First, a short primer on web caching. If you are well versed in the topic, skip this section.

How does browser caching work?

The Expires and Cache-Control response headers are responsible for indicating the cachability of an object to browsers (and proxies). Expires is part of HTTP/1.0, whereas Cache-Control was introduced in HTTP/1.1. On HTTP/1.1-compliant agents, Cache-Control supersedes the Expires header. All browsers released in the last ~10 years implement HTTP/1.1.

Let's examine the response headers for the URL: http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js

sajal@sajal-desktop:~$ HEAD http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js
200 OK
Cache-Control: private, x-gzip-ok="", max-age=31536000
Connection: close
Date: Sat, 08 Oct 2011 15:11:26 GMT
Server: sffe
Vary: Accept-Encoding
Content-Type: text/javascript; charset=UTF-8
Expires: Sat, 08 Oct 2011 15:11:26 GMT
Last-Modified: Thu, 22 Sep 2011 14:12:07 GMT
Client-Date: Sat, 08 Oct 2011 15:11:26 GMT
Client-Peer: 209.85.175.95:80
Client-Response-Num: 1
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
sajal@sajal-desktop:~$ 

HTTP/1.0 agents see the Expires: Sat, 08 Oct 2011 15:11:26 GMT header and take it as permission to keep the object in cache until Sat, 08 Oct 2011 15:11:26 GMT. Since that is the same instant as the Date header, this is equivalent to saying the asset shouldn't be cached. HTTP/1.1 agents look at the Cache-Control: private, x-gzip-ok="", max-age=31536000 header, specifically the max-age=31536000 portion, which allows the agent to keep the object in cache for 31536000 seconds (365 days), counted from the value of the Date header, Sat, 08 Oct 2011 15:11:26 GMT.

There are other directives in Cache-Control, but for the sake of this discussion we mostly ignore them. Briefly, the private directive says that the object should be cached only by the browser and not by upstream proxies. If the directive were public, any upstream proxy could cache it as well.
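To make the directive handling concrete, here is a minimal Python sketch that splits the Cache-Control value from the response above into its directives. The function name parse_cache_control is my own, and the naive split on commas is an assumption that would break on quoted values containing commas, so treat it as an illustration rather than a full parser.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}.

    Simplification: splits on every comma, so quoted arguments that
    themselves contain commas are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without "=" (e.g. private) get None as their argument.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


parsed = parse_cache_control('private, x-gzip-ok="", max-age=31536000')
max_age = int(parsed["max-age"])
shared_cache_allowed = "private" not in parsed
print(max_age, shared_cache_allowed)  # 31536000 False
```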

Note: Many modern HTTP/1.0 agents implement some or all of HTTP/1.1 features

Expiration Calculations

HTTP/1.0

On HTTP/1.0 agents, the Expires header is solely responsible for determining cachability. The behavior of Expires is defined in section 10.7 of RFC 1945.

  • If Expires is equal to or earlier than the Date header, the object should not be cached.
  • If Expires is later than Date, then the object can be cached until the date as specified in the Expires header.
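The two rules above can be sketched in a few lines of Python. The function name http10_is_cacheable is hypothetical, and the sketch only compares the two headers; it does not model how a real HTTP/1.0 agent tracks its own clock.

```python
from email.utils import parsedate_to_datetime


def http10_is_cacheable(date_header: str, expires_header: str) -> bool:
    """Apply the simplified HTTP/1.0 rule: cacheable only if Expires is later than Date."""
    date = parsedate_to_datetime(date_header)
    expires = parsedate_to_datetime(expires_header)
    return expires > date


# Headers from the jQuery response shown earlier: Expires equals Date.
print(http10_is_cacheable("Sat, 08 Oct 2011 15:11:26 GMT",
                          "Sat, 08 Oct 2011 15:11:26 GMT"))  # False
```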

HTTP/1.1

HTTP/1.1 has a considerably more complex algorithm, with more options, for calculating the cachability of an object. In a simplified version, calculate the following values in seconds.

  • Age: Age of the object is the current time on the client minus the Date value.
  • freshness_lifetime: This is the max-age value if present; otherwise Expires minus Date is used.
  • cache_ttl: freshness_lifetime minus Age

Now, if cache_ttl is zero or negative, the object is stale and can't be cached. If cache_ttl is positive, the object can be kept in cache for that duration.
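As a rough sketch of the simplified calculation above, the following Python function takes the values in seconds and returns cache_ttl. The parameter names are mine, and treating a response with neither max-age nor Expires as having a freshness lifetime of zero is an assumption made for brevity; real agents may apply heuristics in that case.

```python
from typing import Optional


def cache_ttl(age: int,
              max_age: Optional[int] = None,
              expires_minus_date: Optional[int] = None) -> int:
    """Return freshness_lifetime minus Age, all in seconds (simplified)."""
    if max_age is not None:
        # max-age takes precedence over Expires on HTTP/1.1 agents.
        freshness_lifetime = max_age
    elif expires_minus_date is not None:
        freshness_lifetime = expires_minus_date
    else:
        # Assumption: no heuristic freshness, so nothing is cacheable.
        freshness_lifetime = 0
    return freshness_lifetime - age


print(cache_ttl(age=0, max_age=31536000))        # 31536000
print(cache_ttl(age=100, expires_minus_date=60)) # -40
```

A zero or negative result means the object is stale; a positive result is the number of seconds it may stay in cache.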

Take this example:

Cache-Control: max-age=31536000
Connection: close
Date: Sat, 08 Oct 2011 15:11:26 GMT
Server: sffe
Vary: Accept-Encoding
Content-Type: text/javascript; charset=UTF-8
Expires: Sat, 08 Oct 2011 15:11:26 GMT
Last-Modified: Thu, 22 Sep 2011 14:12:07 GMT
Client-Date: Sat, 08 Oct 2011 15:11:26 GMT
Client-Peer: 209.85.175.95:80
Client-Response-Num: 1
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
  • Age = Sat, 08 Oct 2011 15:11:26 GMT - Sat, 08 Oct 2011 15:11:26 GMT = 0 seconds
  • freshness_lifetime = 31536000 seconds
  • cache_ttl = 31536000 seconds - 0 seconds = 31536000 seconds = 365 days

Now, if the Date was Thu, 08 Sep 2011 15:11:26 GMT

  • Age = Sat, 08 Oct 2011 15:11:26 GMT - Thu, 08 Sep 2011 15:11:26 GMT = 2592000 seconds
  • freshness_lifetime = 31536000 seconds
  • cache_ttl = 31536000 seconds - 2592000 seconds = 28944000 seconds = 335 days

But, if the Date was Wed, 08 Sep 2010 15:11:26 GMT

  • Age = Sat, 08 Oct 2011 15:11:26 GMT - Wed, 08 Sep 2010 15:11:26 GMT = 395 days = 34128000 seconds
  • freshness_lifetime = 31536000 seconds
  • cache_ttl = 31536000 seconds - 34128000 seconds = -2592000 seconds

In the last example, the cache_ttl value is negative. The object is already stale when it reaches the browser, hence it should not be cached.

In the examples above, if the Age response header were present, we should use that value as the Age in the cache_ttl calculations.
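The three worked examples can be checked with a short Python sketch. NOW is fixed to the client time used in the examples, and ttl_for is a hypothetical helper; letting a present Age header simply replace the computed age follows the simplification in this post, not the full RFC 2616 age calculation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_AGE = 31536000  # max-age from the example response, in seconds
# Client clock used in the worked examples.
NOW = datetime(2011, 10, 8, 15, 11, 26, tzinfo=timezone.utc)


def ttl_for(date_header: str, age_header: Optional[int] = None) -> int:
    """cache_ttl in seconds; an Age header, if given, replaces now - Date."""
    if age_header is not None:
        age = age_header
    else:
        age = int((NOW - parsedate_to_datetime(date_header)).total_seconds())
    return MAX_AGE - age


for date_header in ("Sat, 08 Oct 2011 15:11:26 GMT",
                    "Thu, 08 Sep 2011 15:11:26 GMT",
                    "Wed, 08 Sep 2010 15:11:26 GMT"):
    print(date_header, ttl_for(date_header))
# Prints 31536000, 28944000 and -2592000 for the three dates.
```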

Proxy behavior of CloudFront vs other CDNs

As per RFC 2616, proxies are supposed to cache the Date and Expires headers from the origin. These are useful indications to the end client about the file properties. CloudFront follows the RFCs and sends the cached Date and Expires values; however, all other CDNs I tested act as if they were the origin and generate a fresh Date header.

Let's assume you have just implemented CloudFront for your newly launched website and decided to use Cache-Control: max-age=2592000, allowing CloudFront (and browsers) to cache your object for 30 days. The Date header is cached at CloudFront and the max-age value stays the same; what changes is the current time at your visitors' browsers. Five days after launch, your object would be browser cachable for only 25 days. After 29 days, it would be cachable for 1 day. If a user visited your site on day 29 and again on day 31, both visits would result in a request to CloudFront, since the first visit had a cache_ttl of only one day.
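The shrinking browser cache window described above is simple arithmetic, sketched here in Python. The function name is hypothetical, and the sketch assumes the Date header was frozen at launch and that the browser derives Age from its own clock minus Date.

```python
SECONDS_PER_DAY = 86400
MAX_AGE = 2592000  # max-age=2592000, i.e. 30 days


def browser_ttl_days(days_since_edge_cached: int) -> float:
    """Days the browser may cache an object whose Date was frozen at the edge."""
    age = days_since_edge_cached * SECONDS_PER_DAY  # now - Date
    return (MAX_AGE - age) / SECONDS_PER_DAY


for day in (0, 5, 29):
    print(day, browser_ttl_days(day))
# 0 30.0
# 5 25.0
# 29 1.0
```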

Browsers may be receiving expired content from CloudFront

Now, after day 30, when the object becomes stale in CloudFront, CloudFront makes a conditional GET request to the origin to check whether the file has changed. If it gets a 304 Not Modified response, it refreshes the cachability of the object at the edge but continues to send the original Date header. Every response CloudFront serves for that object is then already stale the moment it reaches the browser. All the effort you put into having the correct caching headers is now rendered useless.

Time since launch        Age (now - Date)   Age header   cache_ttl in Firefox and Chrome
0 days (just launched)   0                  0            2592000 (30 days)
1 day                    86400              86400        2505600 (29 days)
29 days                  2505600            2505600      86400 (1 day)
30 days                  2592000            0            0 (not cachable)
31 days                  2678400            86400        -86400 (stale, not cachable)

One thing to note here is that CloudFront sends an appropriate Age header, which is the time since CloudFront last revalidated the object with the origin. Unfortunately, not all browsers honor it. Firefox and Chrome (link goes to a page with headers from i.ticdn.com) surprisingly seem to ignore the Age header, which is too bad. IE8, which does seem to take it into consideration, is not susceptible to the stale object problem, but it would still face the reduced cache_ttl problem.
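To compare the two browser behaviors, here is a small Python model of the scenario. It assumes CloudFront revalidates exactly every 30 days (so the Age header wraps back to 0), that Date stays frozen at launch, and that a browser honoring Age simply uses the Age header in place of now - Date. These are my simplifications for illustration, not measured browser internals.

```python
SECONDS_PER_DAY = 86400
MAX_AGE = 2592000  # 30 days


def row(days_since_launch: int) -> tuple:
    """Return (now - Date, Age header, ttl ignoring Age, ttl honoring Age) in seconds."""
    apparent_age = days_since_launch * SECONDS_PER_DAY  # Date never changes
    # Assumption: the edge revalidates exactly every MAX_AGE seconds.
    age_header = apparent_age % MAX_AGE
    ttl_ignoring_age = MAX_AGE - apparent_age  # Firefox/Chrome as observed here
    ttl_honoring_age = MAX_AGE - age_header    # a browser that uses the Age header
    return apparent_age, age_header, ttl_ignoring_age, ttl_honoring_age


for day in (0, 1, 29, 30, 31):
    print(day, row(day))
# 0 (0, 0, 2592000, 2592000)
# 1 (86400, 86400, 2505600, 2505600)
# 29 (2505600, 2505600, 86400, 86400)
# 30 (2592000, 0, 0, 2592000)
# 31 (2678400, 86400, -86400, 2505600)
```

In this model the browser that honors Age never receives an already-stale object, but on day 29 it still gets only one day of cachability.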

Possible work-arounds

  • Set Expires/Cache-Control so far into the future that you are sure to have modified the objects in the meantime. CDN Planet uses a max-age of 10 years. To do this effectively, you also need versioned URLs, so that if you change something, the URL changes too. (Tip: CloudFront ignores the querystring, so the cache buster should be in the path)
  • Purge/invalidate objects at CloudFront at least once during the duration of max-age. Be aware: CloudFront charges for invalidation requests so this can add up if you have many files.
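One way to get versioned URLs with the cache buster in the path is to derive a path segment from the file contents. The sketch below is only an illustration: versioned_path, the /static/js/app.js path, and the choice of a truncated SHA-256 digest are all my assumptions, and your build process or web server would need to map the versioned path back to the real file.

```python
import hashlib
import posixpath


def versioned_path(path: str, content: bytes) -> str:
    """Insert a short content hash as a directory before the file name."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    directory, filename = posixpath.split(path)
    # The version lives in the path, not in a querystring.
    return posixpath.join(directory, digest, filename)


old = versioned_path("/static/js/app.js", b"console.log('v1');")
new = versioned_path("/static/js/app.js", b"console.log('v2');")
print(old)
print(new)
print(old != new)  # True: changed content yields a new URL
```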
