Lightning-fast sites with Varnish
During the development phase, the site is really fast, but once it goes into production and opens to the public, it stops working correctly. This is the sad reality that we, as a hosting company, face every day.
Unfortunately, many developers today still do not pay enough attention to performance and scalability. And yet, in the Internet industry and in various specialised developer communities, everyone keeps saying that scalability starts with the design of the architecture of an application or a site.
The various frameworks that are used to build sites increasingly include the appropriate tools to make the site run faster.
However, negligence and ignorance are not always the reasons why some sites are slow. In some cases, this is explained by a set of circumstances:
- More visitors than expected initially
- A higher concentration of visitors at peak moments
- Not enough server power
- Bad software tuning
- Bad database design and poor indexing
- The use of slow or unstable third-party components
In an ideal world, every launch is preceded by a stress test. This test simulates visitors on your website or application and points out the weaknesses before you have even gone into production. In reality, there is often not enough time or resources to do this. And during the development phase, it is of course always difficult to predict where the problems will be. As a result, bottlenecks only show up once the site is in production and many people visit it.
In such situations, it is often too late and we need to intervene as a hosting company. One of the possibilities is to use more servers. This would be a very profitable strategy for us, but we usually choose a more pragmatic approach: not all applications can be run efficiently on multiple servers and, most of the time, performance issues can be solved by using a good caching layer.
Caching is a technique that stores dynamic data (data that had to be calculated) in a static form, which makes better performance possible.
Caching comes with compromises, however: dynamic data becomes static, so there is a chance that outdated data is displayed. A smart expiration or invalidation strategy makes sure that new data appears on the site in time.
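To make the trade-off between expiration and invalidation concrete, here is a minimal Python sketch of a cache with a time to live per entry. It is only an illustration of the idea, not how Varnish is implemented; the names `TTLCache`, `get_or_compute` and `invalidate` are made up for this example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache with per-entry expiration and explicit invalidation.

    Hypothetical illustration only; this is not Varnish's implementation.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so that expiry can be simulated
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, ttl: float, compute: Callable[[], Any]) -> Any:
        """Return the cached value while it is fresh, otherwise recompute it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: the slow computation is skipped
        value = compute()  # expired or missing: do the expensive work once
        self._entries[key] = (now + ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry immediately, even if its time to live has not expired."""
        self._entries.pop(key, None)
```

A long time to live saves the most work but shows old data for longer; calling `invalidate` whenever the underlying data changes is one way to keep a long time to live without serving outdated pages.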
Unfortunately, most forms of caching need to be built into the site’s code and are thus architectural decisions. A site that was designed without proper caching runs into problems.
Fortunately, there are reverse proxies. We all remember the proxy servers that were used back when Internet connections were still slow: the proxy server stored the content of visited sites so that it did not constantly have to be downloaded again over that slow connection.
Internet connections are much faster than before and a reverse proxy does, as the name suggests, exactly the opposite: the server is not on the side of the user, but in the data centre. A reverse proxy protects servers against unexpectedly large numbers of visitors.
How can a reverse proxy achieve this? Like an ordinary proxy, it caches the servers’ results and presents them to visitors in a static form. This way, requests do not always need to go through to the back-end servers and sites load faster. Varnish is such a reverse proxy.
Varnish is open source reverse proxy software, developed by the Norwegian company Linpro. What started as a project aimed at making a Norwegian news site faster has now become a project that is considered an “industry standard”.
Varnish can be easily installed on a Linux server and is placed in front of the web server(s). By pointing the site’s DNS records (e.g. www.domainname.be) at Varnish, you let the software protect your back-end servers against overload.
Varnish speaks HTTP
Since Varnish is a proxy that establishes a bridge between the browser and the server, it is only logical that Varnish speaks HTTP, which is the protocol that is conventionally used for web traffic on the Internet.
In its specification, HTTP includes a caching mechanism, often referred to as the “browser cache”. Just like proxy servers, the browser cache was formerly used to compensate for slow Internet connections. Now, HTTP caching headers are also used to control the behaviour of reverse proxies. This behaviour is described in section 14.9 (Cache-Control) of the HTTP/1.1 specification, RFC 2616.
It is always best to cache as many things as possible, but of course, not everything can be cached. Varnish does not cache:
- When there are cookies (indicating the presence of user-specific content).
- When the time to live of the cache control headers is shorter than or equal to zero.
- When the visitor’s request is not a GET request (for example POST, PUT or DELETE).
- When the back-end requests user authentication.
Varnish will cache everything that does not meet the criteria above. The validity/duration of the cache for a page is determined by the time to live of the cache control header. If it is set to 3600 seconds, the page will be cached for one hour.
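The four rules above can be modelled as a small decision function. The Python sketch below is a simplified illustration of the rules as listed here, not Varnish’s actual code. The function names are made up, "the back-end requests user authentication" is assumed to mean a 401 response or an Authorization request header, and the 120-second fallback used when no time to live is sent is an assumed default.

```python
import re
from typing import Mapping, Optional

# Matches "max-age=N" or "s-maxage=N" inside a Cache-Control header value.
_AGE_DIRECTIVE = re.compile(
    r"(?:^|,)\s*(s-maxage|max-age)\s*=\s*\"?(-?\d+)", re.IGNORECASE
)


def ttl_from_cache_control(value: Optional[str], default_ttl: int = 120) -> int:
    """Return the time to live in seconds announced by a Cache-Control header.

    s-maxage (meant for shared caches) wins over max-age. Other directives
    (no-store, private, ...) are deliberately ignored in this sketch.
    default_ttl is an assumed fallback for responses without a time to live.
    """
    if not value:
        return default_ttl
    found = {name.lower(): int(secs) for name, secs in _AGE_DIRECTIVE.findall(value)}
    if "s-maxage" in found:
        return found["s-maxage"]
    return found.get("max-age", default_ttl)


def should_cache(
    method: str,
    request_headers: Mapping[str, str],
    status: int,
    response_headers: Mapping[str, str],
    default_ttl: int = 120,
) -> bool:
    """Apply the four 'do not cache' rules from the list above, in order."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}
    if "cookie" in req or "set-cookie" in resp:
        return False  # cookies hint at user-specific content
    if ttl_from_cache_control(resp.get("cache-control"), default_ttl) <= 0:
        return False  # nothing to keep: the response expires immediately
    if method.upper() != "GET":
        return False  # POST, PUT, DELETE, ... are not cached
    if status == 401 or "authorization" in req:
        return False  # assumption: this is what "requests authentication" means
    return True
```

With `Cache-Control: max-age=3600` on a cookie-free GET response, `should_cache` returns `True` and the page would be kept for one hour, matching the example in the text.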
Varnish Configuration Language
The previous section described Varnish’s standard behaviour and what can be cached or not.
This default behaviour often does not suit existing software, which does not always meet these requirements. Fortunately, Varnish includes a programming language that makes it possible to customise Varnish’s behaviour.
By means of the Varnish Configuration Language (VCL), it is possible to manage several components of Varnish’s cache and determine, where necessary, what needs to be cached or not under specific circumstances.
These are some of the most common actions that can be performed via VCL:
- Removing certain cookies where they are not necessary so that certain pages or other documents can be cached.
- Explicit caching (or not) of certain URLs.
- Determining how much time specific pages need to be cached.
- Setting up an invalidation mechanism that removes certain pages from the cache, even if the time to live has not expired.
- Rewriting HTTP headers.
- Integrating certain cookie data in the cache key.
- Load balancing requests to specific back-end servers.
- Determining how Varnish must respond if the back-end servers are down or do not respond in time.
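In a real setup these actions are written in VCL. To show the reasoning behind two of them (removing unnecessary cookies and adding cookie data to the cache key), here is a small Python model. It is a sketch, not VCL and not Varnish internals; the function names and the cookie names `_ga`, `lang` and `sid` are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Optional


def parse_cookies(header: Optional[str]) -> Dict[str, str]:
    """Split a Cookie request header into a name -> value dictionary."""
    cookies: Dict[str, str] = {}
    for part in (header or "").split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def filter_cookies(header: Optional[str], keep: Iterable[str]) -> str:
    """Keep only the cookies the back-end really needs.

    If the result is empty, the request no longer carries cookies and the
    page can be cached like any anonymous page.
    """
    wanted = set(keep)
    kept = {n: v for n, v in parse_cookies(header).items() if n in wanted}
    return "; ".join(f"{n}={v}" for n, v in kept.items())


def cache_key(
    host: str,
    url: str,
    cookie_header: Optional[str] = None,
    key_cookies: Iterable[str] = (),
) -> str:
    """Build a cache key from host, URL and the values of selected cookies.

    Only the cookies named in key_cookies influence the key, so a tracking
    cookie does not split the cache while a language cookie does.
    """
    cookies = parse_cookies(cookie_header)
    parts = [host.lower(), url]
    for name in sorted(key_cookies):
        parts.append(f"{name}={cookies.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

For example, `filter_cookies("_ga=GA1.2; lang=nl; sid=abc", keep=["lang"])` leaves only `lang=nl`, and two visitors who differ only in their `_ga` cookie get the same `cache_key` when `key_cookies=["lang"]`, while a visitor with `lang=fr` gets a different one.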
Each site is different and requires a separate VCL configuration. It is important that the developer of the site can make an inventory of the different URLs and determine what cookies need to be used and where. This will make it possible to determine the ideal Varnish configuration.
Varnish is fast. Extremely fast! It is a tool that does very little, and that needs to do very little. And that is why it is so efficient. There is next to no overhead and, by default, all cached items are stored in RAM.
Some targeted tests we carried out showed that slow setups can work up to 500 times faster thanks to Varnish. But figures do not tell the whole story. The net performance gain often depends on how slow the initial setup is, how many different URLs need to be cached and how visitors surf the site.
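Why the net gain depends so much on the setup can be shown with a little arithmetic. The Python sketch below uses a simple model in which every request is either answered from the cache or passed to the back-end; the function names and the timings in the example (1 ms from cache, 500 ms from the back-end) are assumptions for illustration, not measurements.

```python
def average_response_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Average response time when a share of the requests is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backend_ms


def speedup(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """How many times faster the average request is compared to no cache at all."""
    return backend_ms / average_response_ms(hit_ratio, cache_ms, backend_ms)
```

With these assumed timings, `speedup(1.0, 1.0, 500.0)` gives the full factor of 500, but `speedup(0.9, 1.0, 500.0)` is a little under 10: as soon as one request in ten still reaches the slow back-end, that back-end dominates the average. This is why the number of different URLs and the way visitors surf the site matter so much.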
One thing is certain: most major players on the web use this technology. In 2009, a case study was published about how, thanks to Varnish, NU.nl (the highest-traffic site in the Netherlands) managed to handle up to 21 million visitors in one day.
Varnish at Combell
We at Combell have also been using Varnish for the past few years. It is the cornerstone of many setups. We usually bring in Varnish as a rescue measure when an existing site fails to function correctly, but we also often include it in the initial design of a setup.
Some of our customers tuned their existing software so that, thanks to Varnish, they can achieve larger growth with existing hardware.
Other customers use Varnish as a cheaper and equivalent alternative to their expensive Content Delivery Network (CDN) solutions. These customers mainly use caching and, to a lesser extent, geographical distribution of the caches.
It is clear that we greatly value Varnish. That is why we invest in sufficient knowledge about this technology. We also love to give advice tailored to your website or application to help you get the most out of this technology. Contact us for a personal talk about your needs or questions.