A better CDN Finder

Published on May 29 2012

The CDN Finder tool has been very popular since the day we launched it 6 months ago. It lets you find out which CDN(s) a site uses or which CDN is behind a hostname. CDN Finder is easy to use, pretty fast and often gives correct results, but definitely not always. We have always been aware of the tool's imperfections, and Sajal recently decided to spend a Saturday building a new, better CDN Finder from the ground up. The primary goal was to return better results, meaning fewer false negatives (our tool says 'no CDN' while the hostname does point to a CDN).

Mission accomplished: the new CDN Finder is done, deployed and really is a lot better.

What we did to improve CDN Finder

In the first version of the tool, CDN Finder fetched the HTML and parsed it with very simple logic to get a list of hostnames. For each hostname found, CDN Finder would then do a DNS lookup, look only at the last CNAME in the chain, and regex-match that hostname against our list of CDN hostnames.
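As an illustration only, static hostname extraction of this kind might look like the Python sketch below. The original parsing code is not shown in this post, so the class and function names are hypothetical and the logic is an assumption. It reads only `src` and `href` attributes in the HTML itself, so anything referenced from CSS or added by JavaScript never shows up.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HostnameCollector(HTMLParser):
    """Collect hostnames from src/href attributes in static HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hostnames = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Relative URLs have no hostname and are skipped.
                host = urlparse(value).hostname
                if host:
                    self.hostnames.add(host)


def hostnames_in_html(html):
    """Return the sorted hostnames referenced directly in the HTML."""
    collector = HostnameCollector()
    collector.feed(html)
    return sorted(collector.hostnames)
```

Calling `hostnames_in_html(page_source)` on a page that loads a font from a stylesheet would not list the font's hostname, because the parser never looks inside the CSS.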

There are three problems with this method:

  1. Hostnames that serve content referenced from within CSS or injected via JavaScript are missed
  2. Using just the last CNAME in the chain is error-prone, as some CDNs have many different hostnames
  3. Our list of CDN hostnames was not complete

We tackled all three problems and the results are now much better.

1. Full page rendering

CDN Finder (full site lookup) now loads and renders the full page in WebKit using PhantomJS. The 'browser' will parse the HTML, CSS and JavaScript, execute the JavaScript, etc. As a result, CDN Finder has a much better list of hostnames that served at least one object on the page. And there is another reason to render the page with PhantomJS: we get the response headers for each resource. This lets us detect a CDN even when the hostname has no CNAMEs. For example, if the Server response header contains cloudflare-nginx, the resource is served by CloudFlare. We use this only as a fallback, when the CNAME chain processing returns 'no CDN'.
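The fallback order can be sketched in Python as follows. This is a minimal illustration, not the tool's actual code: the function names are hypothetical, `None` stands in for a 'no CDN' result, and the only header signature used is the cloudflare-nginx one mentioned above.

```python
def cdn_from_headers(headers):
    """Guess a CDN from response headers; return None if nothing matches."""
    server = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "server":
            server = value.lower()
    # The only signature taken from this post; a real list would be longer.
    if "cloudflare-nginx" in server:
        return "CloudFlare"
    return None


def detect_with_fallback(cname_result, headers):
    """Prefer the CNAME chain result; consult headers only on 'no CDN' (None)."""
    if cname_result is not None:
        return cname_result
    return cdn_from_headers(headers)
```

For example, `detect_with_fallback(None, {"Server": "cloudflare-nginx"})` returns `"CloudFlare"`, while a CNAME-based result is never overridden by the headers.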

2. Full CNAME chain processing

Taking that better list of hostnames as input, CDN Finder analyzes the full CNAME chain for each hostname, which lets it better decide which CDN is behind it. Example:

cdnfinder@cdnplanet:~$ host ec.cdnplanet.com
ec.cdnplanet.com is an alias for wac.6e8d.edgecastcdn.net.
wac.6e8d.edgecastcdn.net is an alias for gp1.wac.edgecastcdn.net.
gp1.wac.edgecastcdn.net has address 117.18.237.250

Previously the CDN Finder would only use gp1.wac.edgecastcdn.net to detect the CDN. Now it uses ec.cdnplanet.com, wac.6e8d.edgecastcdn.net and gp1.wac.edgecastcdn.net. In this example, the wac.6e8d.edgecastcdn.net part will be stable, as that is what the customer CNAMEs to. The CDN, however, can change gp1.wac.edgecastcdn.net at any time, for example to gp1.wac.eccdn.net, and matching on the last CNAME alone would then fail because eccdn.net is not on our whitelist.
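A rough Python sketch of matching against the whole chain is shown below. It is an assumption about how such matching could be done, not the tool's implementation: the function names and the one-entry pattern list are hypothetical, and the chain is parsed from `host` output like the example above instead of doing a live DNS lookup.

```python
import re

# Hypothetical whitelist; only the EdgeCast domain comes from the example above.
CDN_PATTERNS = [
    (re.compile(r"(^|\.)edgecastcdn\.net$"), "EdgeCast"),
]


def cname_chain(hostname, host_output):
    """Build [hostname, cname1, cname2, ...] from the text output of `host`."""
    chain = [hostname]
    for line in host_output.splitlines():
        match = re.match(r"\S+ is an alias for (\S+)$", line.strip())
        if match:
            # Drop the trailing dot of the fully qualified name.
            chain.append(match.group(1).rstrip("."))
    return chain


def cdn_for_chain(chain):
    """Return the first CDN whose pattern matches any hostname in the chain."""
    for name in chain:
        for pattern, cdn in CDN_PATTERNS:
            if pattern.search(name):
                return cdn
    return None
```

With a chain that ends in gp1.wac.eccdn.net, `cdn_for_chain` still reports EdgeCast because wac.6e8d.edgecastcdn.net is checked too, whereas checking only the last element would give no match.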

3. More CDNs on our whitelist

CDN Finder now also catches the following CDNs: Bitgravity, ChinaCache, CDN77, OnApp and Turbobytes.

Benefits to the user

Simply put: the results are more accurate and reliable. If the result for a hostname is 'no CDN', then this really should be the case. We feel confident about the methodology and the tool runs smoothly. We hope you enjoy using the CDN Finder. Please share your thoughts in the comments!

Open-source

CDN Finder is open source. You can find it on GitHub.

