In this post, I intend to illustrate some potential issues with browser cachability when using Amazon CloudFront and some possible work-arounds. CloudFront users should be aware of these issues, because sending stale responses to users may have a big impact on the user experience.
One of the Holy Grails of web performance is browser caching. Google lists it as the first item in their Web Performance Best Practices.
First, a primer on web caching. If you are well versed in the topic, skip this section.
How does browser caching work?
The Expires and Cache-Control response headers are responsible for indicating the cachability of an object to browsers (and proxies). Expires is a part of HTTP/1.0 whereas Cache-Control was introduced in HTTP/1.1. On HTTP/1.1 compliant agents, Cache-Control supersedes the Expires header. All browsers released in the last ~10 years implement HTTP/1.1.
Let's examine the response headers for the URL: https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js
    sajal@sajal-desktop:~$ HEAD https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js
    200 OK
    Cache-Control: private, x-gzip-ok="", max-age=31536000
    Connection: close
    Date: Sat, 08 Oct 2011 15:11:26 GMT
    Server: sffe
    Vary: Accept-Encoding
    Content-Type: text/javascript; charset=UTF-8
    Expires: Sat, 08 Oct 2011 15:11:26 GMT
    Last-Modified: Thu, 22 Sep 2011 14:12:07 GMT
    Client-Date: Sat, 08 Oct 2011 15:11:26 GMT
    Client-Peer: 209.85.175.95:80
    Client-Response-Num: 1
    X-Content-Type-Options: nosniff
    X-Xss-Protection: 1; mode=block
    sajal@sajal-desktop:~$
HTTP/1.0 agents see the Expires: Sat, 08 Oct 2011 15:11:26 GMT header and take it as permission to keep the object in cache until Sat, 08 Oct 2011 15:11:26 GMT. Because that is the same instant as the Date header, it is equivalent to saying this asset shouldn't be cached. HTTP/1.1 agents look at the Cache-Control: private, x-gzip-ok="", max-age=31536000 header, and specifically the max-age=31536000 portion, which allows the agent to keep the object in cache for 31536000 seconds (365 days) counted from the value of the Date header, which is Sat, 08 Oct 2011 15:11:26 GMT.
There are other directives in Cache-Control, but for the sake of this discussion we ignore them. Basically, the directive private says that the object should only be cached on the browser and not at upstream proxies. If the directive were public, then any upstream proxy could cache it.
Note: Many modern HTTP/1.0 agents implement some or all of the HTTP/1.1 features.
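To make the header discussion concrete, here is a minimal sketch of pulling the max-age value out of the Cache-Control header shown above. The helper name parse_cache_control is hypothetical, and the comma split is deliberately naive: it assumes no directive carries a quoted argument that itself contains a comma.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directive names are case-insensitive; quoted arguments lose their quotes.
        # Directives without an argument (or with an empty one) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


header = 'private, x-gzip-ok="", max-age=31536000'
directives = parse_cache_control(header)
max_age = int(directives["max-age"])
print(directives)
print(max_age // 86400, "days")
```

For the jQuery response above this yields a max-age of 31536000 seconds, i.e. 365 days, alongside the private directive.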
Expiration Calculations
HTTP/1.0
On HTTP/1.0 agents, the Expires header is solely responsible for determining cachability. The behavior of Expires is defined in section 10.7 of RFC 1945.
- If Expires is equal to or earlier than the Date header, the object should not be cached.
- If Expires is later than Date, then the object can be cached until the date specified in the Expires header.
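The two rules above can be sketched in a few lines of Python. The function name http10_cacheable_until is hypothetical, and the second Expires value (one year later) is made up purely to exercise the "later than Date" branch; the first call uses the header values from the jQuery response.

```python
from email.utils import parsedate_to_datetime


def http10_cacheable_until(date_header, expires_header):
    """Return the expiry datetime if the object may be cached, else None."""
    date = parsedate_to_datetime(date_header)
    expires = parsedate_to_datetime(expires_header)
    # Expires equal to or earlier than Date means "do not cache".
    if expires <= date:
        return None
    return expires


# Values from the response above: Expires equals Date.
same = http10_cacheable_until(
    "Sat, 08 Oct 2011 15:11:26 GMT", "Sat, 08 Oct 2011 15:11:26 GMT"
)
# Hypothetical Expires value one year after Date.
later = http10_cacheable_until(
    "Sat, 08 Oct 2011 15:11:26 GMT", "Sun, 07 Oct 2012 15:11:26 GMT"
)
print(same)
print(later)
```

The first call returns None (not cachable), while the second returns the expiry time the agent may cache until.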
HTTP/1.1
HTTP/1.1 has a much more complex algorithm and more options for calculating the cachability of an object. Calculate the following in seconds (simplified version).
- Age: Age of the object is the current time on the client minus the Date value.
- freshness_lifetime: This is the max_age_value if present, else Expires minus Date is used.
- cache_ttl: freshness_lifetime minus Age
Now, if cache_ttl is zero or negative, the object is stale and can't be cached. If cache_ttl is positive, the object can be kept in cache for that duration.
Take this example:
    Cache-Control: max-age=31536000
    Connection: close
    Date: Sat, 08 Oct 2011 15:11:26 GMT
    Server: sffe
    Vary: Accept-Encoding
    Content-Type: text/javascript; charset=UTF-8
    Expires: Sat, 08 Oct 2011 15:11:26 GMT
    Last-Modified: Thu, 22 Sep 2011 14:12:07 GMT
    Client-Date: Sat, 08 Oct 2011 15:11:26 GMT
    Client-Peer: 209.85.175.95:80
    Client-Response-Num: 1
    X-Content-Type-Options: nosniff
    X-Xss-Protection: 1; mode=block
- Age = Sat, 08 Oct 2011 15:11:26 GMT - Sat, 08 Oct 2011 15:11:26 GMT = 0 seconds
- freshness_lifetime = 31536000 seconds
- cache_ttl = 31536000 seconds - 0 seconds = 31536000 seconds = 365 days
Now, if the Date was Thu, 08 Sep 2011 15:11:26 GMT:
- Age = Sat, 08 Oct 2011 15:11:26 GMT - Thu, 08 Sep 2011 15:11:26 GMT = 2592000 seconds
- freshness_lifetime = 31536000 seconds
- cache_ttl = 31536000 seconds - 2592000 seconds = 28944000 seconds = 335 days
But, if the Date was Wed, 08 Sep 2010 15:11:26 GMT:
- Age = Sat, 08 Oct 2011 15:11:26 GMT - Wed, 08 Sep 2010 15:11:26 GMT = 395 days = 34128000 seconds
- freshness_lifetime = 31536000 seconds
- cache_ttl = 31536000 seconds - 34128000 seconds = -2592000 seconds
In the last example, the cache_ttl value is negative. The object was already stale when it reached the browser, hence it should not be cached.
In the examples above, if the Age response header were present, its value should be used as the Age in the cache_ttl calculation.
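The simplified calculation and the three worked examples can be checked with a short Python sketch. The function cache_ttl and its parameter names are illustrative, not taken from any library, and this is the simplified version described here rather than the full age algorithm in RFC 2616.

```python
from email.utils import parsedate_to_datetime


def cache_ttl(now, date, max_age=None, expires=None, age_header=None):
    """Simplified HTTP/1.1 freshness check; returns remaining seconds (may be <= 0)."""
    now_dt = parsedate_to_datetime(now)
    date_dt = parsedate_to_datetime(date)
    # Age: prefer the Age response header, else current time minus Date.
    if age_header is not None:
        age = age_header
    else:
        age = int((now_dt - date_dt).total_seconds())
    # freshness_lifetime: max-age wins; otherwise fall back to Expires minus Date.
    if max_age is not None:
        lifetime = max_age
    else:
        lifetime = int((parsedate_to_datetime(expires) - date_dt).total_seconds())
    return lifetime - age


NOW = "Sat, 08 Oct 2011 15:11:26 GMT"
for date in (
    "Sat, 08 Oct 2011 15:11:26 GMT",
    "Thu, 08 Sep 2011 15:11:26 GMT",
    "Wed, 08 Sep 2010 15:11:26 GMT",
):
    print(date, "->", cache_ttl(NOW, date, max_age=31536000))
```

The three calls give 31536000, 28944000 and -2592000 seconds respectively, matching the hand calculations.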
Proxy behavior of CloudFront vs other CDNs
As per RFC 2616, proxies are supposed to cache the Date and Expires headers from the origin. These are useful indications to the end client about the file properties. CloudFront follows the RFCs and sends the cached Date and Expires values; however, all other CDNs I tested act as if they were the origin and generate a fresh Date header.
Let's assume you have just implemented CloudFront for your newly launched website and decided to use Cache-Control: max-age=2592000, thus allowing CloudFront (and browsers) to cache your object for 30 days. The Date header is cached at CloudFront and the max-age value also remains the same; what changes is the current time at your visitors' browsers. Five days after launch, your object would only be browser cachable for 25 days. After 29 days, it would be cachable for 1 day. If a user visited your site on day 29 and then again on day 31, both visits would result in a request being made to CloudFront, since the first visit had a cache_ttl of only one day.
Browsers may be receiving expired content from CloudFront
Now, after day 30, when the object becomes stale in CloudFront, CloudFront makes a request to the origin to check for changes in the file using a conditional GET. If it gets a 304 Not Modified response, it refreshes the cachability of the object but continues to send the original Date header. Every response CloudFront sends for that object is then already stale the moment it reaches the browser. All the effort you put into having the correct caching headers is now rendered useless.
Time since launch | Age in seconds (now - Date) | Age header (seconds) | cache_ttl in Firefox and Chrome (seconds) |
---|---|---|---|
0 days - Just Launched | 0 | 0 | 2592000 (30 days) |
1 day | 86400 | 86400 | 2505600 (29 days) |
29 days | 2505600 | 2505600 | 86400 (1 day) |
30 days | 2592000 | 0 | 0 (not cachable) |
31 days | 2678400 | 86400 | -86400 (stale, not cachable) |
One thing to note here is that CloudFront sends an appropriate Age header, which is the time since CloudFront last re-validated the object with the origin. Unfortunately, not all browsers honor it. Firefox and Chrome (link goes to a page with headers from i.ticdn.com) surprisingly seem to ignore the Age header, which is too bad. IE8, which does seem to take it into consideration, is not susceptible to the stale object problem, but it would still face the reduced cache_ttl problem.
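The table rows, and the difference between a browser that ignores the Age header and one that honors it, can be reproduced with a small simulation. This is only a sketch: browser_ttls is a hypothetical helper, and it assumes the edge re-validates exactly when max-age elapses and always receives a 304, so the Age header resets while Date stays fixed.

```python
DAY = 86400
MAX_AGE = 30 * DAY  # Cache-Control: max-age=2592000


def browser_ttls(days_since_launch, max_age=MAX_AGE):
    """Return (date_age, age_header, ttl_ignoring_age, ttl_honoring_age) in seconds."""
    elapsed = days_since_launch * DAY
    # The cached Date header never changes, so this age keeps growing.
    date_age = elapsed
    # Assumption: the edge revalidates exactly when max-age elapses and gets a 304,
    # which resets the Age header but not Date.
    age_header = elapsed % max_age
    return date_age, age_header, max_age - date_age, max_age - age_header


for day in (0, 1, 29, 30, 31):
    print(day, browser_ttls(day))
```

Under these assumptions, on day 31 a browser that ignores Age computes -86400 seconds (stale), while one that honors Age still computes 2505600 seconds (29 days), which is the reduced but non-zero cache_ttl described for IE8.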
Possible work-arounds
- Have Expires/Cache-Control so far into the future that you are sure to have modified objects in the meantime. CDN Planet uses max-age of 10 years. To do this effectively, you also need versioned URLs, so that if you change something, the URL changes too. (Tip: CloudFront ignores the querystring, so the cache buster should be in the path.)
- Purge/invalidate objects at CloudFront at least once during the duration of max-age. Be aware: CloudFront charges for invalidation requests, so this can add up if you have many files.
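One possible way to put the cache buster in the path, as the first work-around suggests, is to embed a short content hash in the file name. The helper versioned_path, the /js/app.js path, and the sample file contents below are all hypothetical; this is a sketch of the idea, not a description of how CDN Planet or CloudFront does it.

```python
import hashlib
import posixpath


def versioned_path(path, content):
    """Insert a short content hash into the file name, e.g. /js/app.js -> /js/app.<hash>.js."""
    digest = hashlib.sha1(content).hexdigest()[:8]
    directory, filename = posixpath.split(path)
    stem, ext = posixpath.splitext(filename)
    return posixpath.join(directory, f"{stem}.{digest}{ext}")


# Two made-up versions of the same file produce two different URLs.
v1 = versioned_path("/js/app.js", b"console.log('v1');")
v2 = versioned_path("/js/app.js", b"console.log('v2');")
print(v1)
print(v2)
```

Because the path changes whenever the content changes, a very long max-age is safe: browsers and the CDN simply see a new object.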