The CDN Finder tool has been very popular since the day we launched it 6 months ago. It enables you to easily find out what CDN(s) a site is using or what CDN is behind a hostname. CDN Finder is easy to use, pretty fast and often gives correct results, but definitely not always. We have always been aware of the tool's imperfections, and Sajal recently decided to spend a Saturday building a new, better CDN Finder from the ground up, with the primary goal of returning better results (fewer false negatives, where our tool says 'no CDN' while the hostname does point to a CDN).
Mission accomplished: the new CDN Finder is done, deployed and really is a lot better.
What we did to improve CDN Finder
In the first version of the tool, CDN Finder fetched the HTML and parsed it with very simple logic to get a list of hostnames. For each hostname found, CDN Finder would then do a DNS lookup, look at the last CNAME only and regex-match that hostname against our list of CDN hostnames.
There are three problems with this method:
- Hostnames that serve content referenced from within CSS or injected via JavaScript are missed
- Using just the last CNAME in the chain is error-prone, as some CDNs have many different hostnames
- Our list of CDN hostnames was not complete
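To make the first problem concrete, here is a minimal sketch of what "very simple" HTML parsing might look like. It is not the original CDN Finder code: the HostnameCollector class, the sample page and the example.* hostnames are all made up for illustration. It collects hostnames only from src and href attributes, so the hostname that the inline script builds at runtime never shows up.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HostnameCollector(HTMLParser):
    """Collect hostnames from src/href attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hostnames = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlparse(value).hostname
                if host:
                    self.hostnames.add(host)


# Made-up page: one stylesheet, one image, and one image injected by script.
html = """
<html><head>
<link rel="stylesheet" href="http://static.example.com/site.css">
<script>
  var img = new Image();
  img.src = "http://" + "images.example.net" + "/logo.png";
</script>
</head><body><img src="http://media.example.org/photo.jpg"></body></html>
"""

collector = HostnameCollector()
collector.feed(html)

# images.example.net is only assembled when the JavaScript runs,
# so a static parse like this one cannot see it.
print(sorted(collector.hostnames))
```

Resources referenced from inside site.css would be missed in the same way, since this parser never fetches or reads the stylesheet.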
We tackled all three problems and the results are now much better.
1. Full page rendering
CDN Finder (full site lookup) now loads and renders the full page in WebKit using PhantomJS.
The 'browser' will parse the HTML, CSS and JavaScript, execute the JavaScript, etc.
As a result, CDN Finder has a much better list of hostnames that served at least one object on the page.
And there is another reason to render the page with PhantomJS: we get the response headers for each resource.
This allows catching CDNs even if no CNAMEs are given. For example, if the Server response header contains cloudflare-nginx, the resource is served by CloudFlare. We use this only as a fallback, if the CNAME chain processing has 'no CDN' as a result.
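The fallback order can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's actual code: the function names detect_from_headers and detect are hypothetical, and the only header rule included is the cloudflare-nginx one mentioned above.

```python
def detect_from_headers(headers):
    """Return a CDN name based on response headers, or None."""
    server = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "server":
            server = value.lower()
    if "cloudflare-nginx" in server:
        return "CloudFlare"
    return None


def detect(cname_result, headers):
    """Prefer the CNAME chain result; use headers only as a fallback."""
    if cname_result is not None:
        return cname_result
    return detect_from_headers(headers)


# The CNAME chain gave 'no CDN' (None here), so the header decides.
print(detect(None, {"Server": "cloudflare-nginx"}))
# The CNAME chain already found a CDN, so the header is ignored.
print(detect("EdgeCast", {"Server": "cloudflare-nginx"}))
```

Keeping the header check behind the CNAME result means a header rule can never override a match from the CNAME chain.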
2. Full CNAME chain processing
Taking that better list of hostnames as input, CDN Finder analyzes the full CNAME chain for each hostname and can thereby better decide which CDN is behind it. Example:
cdnfinder@cdnplanet:~$ host ec.cdnplanet.com
ec.cdnplanet.com is an alias for wac.6e8d.edgecastcdn.net.
wac.6e8d.edgecastcdn.net is an alias for gp1.wac.edgecastcdn.net.
gp1.wac.edgecastcdn.net has address 117.18.237.250
Previously the CDN Finder would only use gp1.wac.edgecastcdn.net to detect the CDN. Now it uses ec.cdnplanet.com, wac.6e8d.edgecastcdn.net and gp1.wac.edgecastcdn.net. In this example, the wac.6e8d.edgecastcdn.net part will be stable, as that is what the customer CNAMEs to. gp1.wac.edgecastcdn.net can be changed at any time by the CDN, for example into gp1.wac.eccdn.net, and matching on the last CNAME alone would then fail, as eccdn.net is not in our whitelist.
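A minimal sketch of the difference, assuming a whitelist of (regex, CDN name) pairs. The CDN_PATTERNS list and the function names are hypothetical, and only the edgecastcdn.net entry comes from the example above. The chain uses the imagined future hostname gp1.wac.eccdn.net as its last CNAME.

```python
import re

# Hypothetical whitelist: (pattern on hostname, CDN name).
CDN_PATTERNS = [
    (re.compile(r"\.edgecastcdn\.net\.?$"), "EdgeCast"),
]


def match_hostname(hostname):
    """Return the CDN name for a single hostname, or None."""
    for pattern, cdn in CDN_PATTERNS:
        if pattern.search(hostname.lower()):
            return cdn
    return None


def detect_cdn(chain):
    """Check every hostname in the CNAME chain, not just the last one."""
    for hostname in chain:
        cdn = match_hostname(hostname)
        if cdn is not None:
            return cdn
    return None


# The CDN has renamed its last CNAME to a domain that is not whitelisted.
chain = ["ec.cdnplanet.com", "wac.6e8d.edgecastcdn.net", "gp1.wac.eccdn.net"]

print(detect_cdn(chain))          # full chain: still matches on the stable CNAME
print(match_hostname(chain[-1]))  # last CNAME only: no match
```

Returning the first match is a simplification. A real implementation might need a rule for chains where more than one hostname matches a different CDN.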
3. More CDNs on our whitelist
CDN Finder now also catches the following CDNs: Bitgravity, ChinaCache, CDN77, OnApp and Turbobytes.
Benefits to the user
Simply put: the results are more correct/reliable. If the result for a hostname is 'no CDN', then this really should be the case. We feel confident about the methodology and the tool runs smoothly. We hope you enjoy using the CDN Finder and please share your thoughts in the comments!
Open-source
CDN Finder is open-source. You can find it on GitHub.