Recently I had an idea about creating a local in-browser CDN using the extension API. There is a Chrome extension that incorporates some of those ideas.
As a web developer I use various CDNs for static file delivery.
A CDN is a great idea: it lets browsers cache frequently used files. But the question is: why don't browsers include those static assets in their distributions?
There are several js/css/font files which could be bundled into the browser. This would save some traffic and reduce page loading time (not by much in the short run, but really significantly in aggregate).
I have come to two assumptions:
- All CDN static assets can be cached locally, as they are supposed to never change. Nobody can modify those files; they are permanent. Why should we spend traffic and time loading those files from the network at all?
- The majority of static files that are hosted not on a CDN but on private servers, and that follow a certain name pattern (e.g. jquery-1.10.2.min.js), can be assumed permanent.
I have created the abovementioned extension, which incorporates those two assumptions. The extension basically listens to the web page's resource requests and hijacks them, serving local resources instead.
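To make the two assumptions concrete, here is a minimal sketch in Python of how a request URL could be classified as "assumed permanent". It is not the extension's actual code: the `CDN_HOSTS` list, the `VERSIONED_NAME` pattern and both function names are illustrative assumptions, and a real extension would do the equivalent check in JavaScript.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumption: an illustrative list of CDN hosts whose files are treated as permanent.
CDN_HOSTS = {"ajax.googleapis.com", "code.jquery.com", "cdnjs.cloudflare.com"}

# Assumption: a "versioned" file name looks like <name>-<version>[.min].js|css,
# e.g. jquery-1.10.2.min.js or jquery-ui-1.10.4.min.js.
VERSIONED_NAME = re.compile(
    r"^(?P<name>[A-Za-z][\w.-]*?)-(?P<version>\d+\.\d+(?:\.\d+)?)(?:\.min)?\.(?:js|css)$"
)


def parse_versioned_name(filename: str) -> Optional[Tuple[str, str]]:
    """Return (library, version) if the file name follows the versioned pattern."""
    match = VERSIONED_NAME.match(filename)
    if match is None:
        return None
    return match.group("name").lower(), match.group("version")


def is_assumed_permanent(url: str) -> bool:
    """Apply both assumptions to a single request URL."""
    parts = urlsplit(url)
    # Assumption 1: anything served from a known CDN host never changes.
    if parts.hostname in CDN_HOSTS:
        return True
    # Assumption 2: a versioned file name on a private server never changes either.
    filename = parts.path.rsplit("/", 1)[-1]
    return parse_versioned_name(filename) is not None
```

With these definitions, `is_assumed_permanent("http://example.com/js/jquery-1.10.2.min.js")` is true, while a plain `http://example.com/js/app.js` is not matched and would still be loaded from the network.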
I wanted better insight than experience-based assumptions, so I decided to conduct a little research on the topic.
First of all I thought about using the Common Crawl dataset. But I have no access to a computing cloud that could download and crunch that data (81 TB is no joke).
Bingo! The dataset is what I really need to check my hypothesis: it contains the request data for 300 000 top-ranked Alexa sites.
I used the March 1 crawl results.
I've downloaded the dataset and played around with it.
I will share the results following the pattern in Steve's article, so that you can compare the trends.
Sites Loading jQuery from Google CDN - Mar 1 2014
Most popular jQuery versions from Google CDN - Mar 1 2014
I have made some changes to the initial SQL and postprocessed the data. There are three problems which can bias the data:
- Some sites use URLs in the "short format":
Today this format corresponds to jquery-1.9.1.
- Wordpress adds a ?ver=<wpversion> query to all static resource URLs, so the same file would be grouped as a different entry in the SQL results.
- The http vs https distinction is irrelevant for version frequency statistics. If you are interested in this kind of distribution, you should run another query.
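The postprocessing for the last two problems can be sketched roughly as follows. This is a Python illustration under my own assumptions, not the exact script that was used: the function names `normalize` and `count_versions` are hypothetical, and the "short format" URLs are deliberately left untouched, since the version they resolve to depends on the date.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit


def normalize(url: str) -> str:
    """Drop the scheme and the ver=... query parameter from a resource URL."""
    parts = urlsplit(url)
    # Keep every query parameter except "ver".
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "ver"
    ]
    normalized = parts.netloc.lower() + parts.path
    if query:
        normalized += "?" + urlencode(query)
    return normalized


def count_versions(urls):
    """Count requests per normalized URL, so http/https and ?ver= variants merge."""
    return Counter(normalize(url) for url in urls)
```

After normalization, `http://.../jquery/1.7.2/jquery.min.js` and `https://.../jquery/1.7.2/jquery.min.js?ver=3.8.1` fall into the same bucket instead of being counted as two different scripts.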
Top CDNs Serving JS Libs - Mar 1 2014
Google CDN profile - Mar 1 2014
|Total CDN requests||67198||23.1052|
Rough number of Wordpress-powered sites - Mar 1 2014
There are heuristics which can help us get a rough estimate of the number of sites powered by Wordpress:
- Wordpress uses the jquery-migrate plugin. The plugin is a rare thing, as it is used to bring deprecated features of old jQuery to jQuery 1.9+.
- Wordpress adds a ?ver=<wpversion> query to all static assets it serves.
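The two heuristics above can be combined into a simple check over the request URLs of one site. This is a rough Python sketch, not the query that produced the numbers: the function name `looks_like_wordpress` is hypothetical, and requiring both signals at once is my assumption.

```python
from urllib.parse import parse_qs, urlsplit


def looks_like_wordpress(request_urls) -> bool:
    """Guess whether a site runs Wordpress from the URLs it requests."""
    has_migrate = False    # signal 1: the page loads jquery-migrate
    has_ver_query = False  # signal 2: some asset carries a ?ver=... query
    for url in request_urls:
        parts = urlsplit(url)
        if "jquery-migrate" in parts.path.lower():
            has_migrate = True
        if "ver" in parse_qs(parts.query):
            has_ver_query = True
    # Assumption: both signals must be present for a site to count.
    return has_migrate and has_ver_query
```

A site requesting `/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.2.1` passes both checks, while a site that only loads jQuery from a CDN does not. The estimate stays rough, since any site may use jquery-migrate or a ver parameter on its own.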
- The total percentage of CDN-friendly sites keeps growing.
- jquery-1.7.x is still the most popular jQuery version, with a 25% share of all jQuery scripts.
- Google, jQuery and Cloudflare are the most popular CDN providers.
You can find some other CDN providers here.
- jQuery and jQueryUI are the clear leaders among files served by Google CDN, followed by swfobject and webfontloader.
- 10% of top internet sites are powered by Wordpress!
Now you have the secret knowledge! Use it responsibly in public ;)