HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.
HTTP data is compressed before it is sent from the server: compliant browsers announce which methods they support to the server before downloading the correct format; browsers that do not support any compatible compression method download the data uncompressed. The most common compression schemes are gzip and deflate; however, the complete list of available schemes is maintained by IANA. In addition, third parties develop new methods and incorporate them into their products, such as the Google Shared Dictionary Compression for HTTP (SDCH) scheme implemented in the Google Chrome browser and used on Google servers.
There are two different ways compression can be performed in HTTP. At a lower level, a Transfer-Encoding header field may indicate that the payload of an HTTP message is compressed. At a higher level, a Content-Encoding header field may indicate that a resource being transferred, cached, or otherwise referenced is compressed. Compression using Content-Encoding is more widely supported than Transfer-Encoding, and some browsers do not advertise support for Transfer-Encoding compression in order to avoid triggering bugs in servers.
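As a minimal sketch of handling Content-Encoding on the client side (a hypothetical helper, not any particular library's API), a response body can be decoded according to this header:

```python
import gzip
import zlib

def decode_body(headers: dict, body: bytes) -> bytes:
    """Decode an HTTP response body according to its Content-Encoding."""
    encoding = headers.get("Content-Encoding", "identity").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP "deflate" is defined as a zlib-wrapped stream (RFC 1950).
        return zlib.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError("unsupported Content-Encoding: " + encoding)

print(decode_body({"Content-Encoding": "gzip"}, gzip.compress(b"hello")))  # b'hello'
```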
Compression scheme negotiation
In most cases, excluding SDCH, the negotiation is done in two steps, described in RFC 2616:
1. The web client advertises which compression schemes it supports by including a token list in the HTTP request. For Content-Encoding, the list is in a field called Accept-Encoding; for Transfer-Encoding, the field is called TE.
2. If the server supports one or more compression schemes, the outgoing data may be compressed by one or more methods supported by both parties. If this is the case, the server will add a Content-Encoding or Transfer-Encoding field in the HTTP response with the schemes used, separated by commas.
The web server is by no means obligated to use any compression method: this depends on the internal settings of the web server and may also depend on the internal architecture of the website in question.
In the case of SDCH, a dictionary negotiation is also required, which may involve additional steps, such as downloading the appropriate dictionary from an external server.
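The two-step negotiation above can be sketched as a simple server-side selection function (a hypothetical helper that ignores q-values for brevity): parse the client's Accept-Encoding token list and pick the first scheme the server also supports.

```python
def choose_encoding(accept_encoding: str, server_supported=("gzip", "deflate")) -> str:
    """Pick a Content-Encoding supported by both parties, or fall back to identity."""
    # Step 1: the client advertised its schemes, e.g. "gzip, deflate, br".
    # (q-value parameters such as ";q=0.8" are stripped and ignored here.)
    client_schemes = [token.split(";")[0].strip().lower()
                      for token in accept_encoding.split(",") if token.strip()]
    # Step 2: the server picks a mutually supported scheme; it is also free
    # to pick none and send the data uncompressed.
    for scheme in client_schemes:
        if scheme in server_supported:
            return scheme
    return "identity"

print(choose_encoding("br, gzip, deflate"))  # -> gzip
```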
Content-Encoding token
The official token list available to servers and clients is maintained by IANA, and it includes:
- compress - UNIX "compress" program method (historic; deprecated in most applications and replaced by gzip or deflate)
- deflate - compression based on the deflate algorithm (described in RFC 1951), a combination of the LZ77 algorithm and Huffman coding, wrapped inside the zlib data format (RFC 1950);
- exi - W3C Efficient XML Interchange
- gzip - GNU zip format (described in RFC 1952). It uses the deflate algorithm for compression, but the data format and the checksum algorithm differ from the "deflate" content-encoding. This method was the most broadly supported as of March 2011.
- identity - No transformation is used. This is the default value for content encoding.
- pack200-gzip - Network Transfer Format for Java Archives
- br - Brotli, a new open-source compression algorithm specifically designed for HTTP content encoding, implemented in Mozilla Firefox release 44 and Chromium release 50.
In addition to this, some unofficial or non-standard tokens are used in the wild by servers or clients:
- bzip2 - compression based on the free bzip2 format, supported by lighttpd
- lzma - LZMA-based compression (raw) is available in Opera 20, and in elinks via a compile-time option
- peerdist - Microsoft Peer Content Caching and Retrieval
- sdch - Google Shared Dictionary Compression for HTTP, based on VCDIFF (RFC 3284)
- xpress - Microsoft compression protocol used by Windows 8 and later for Windows Store application updates. LZ77-based compression optionally using a Huffman encoding.
- xz - LZMA2-based content compression, supported by a non-official Firefox patch and fully implemented in mget since 2013-12-31.
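The practical difference between the two most common tokens, gzip and deflate, is easy to see with Python's standard library: both use the same deflate algorithm, but gzip (RFC 1952) wraps it in its own container with a CRC-32 checksum, while HTTP's "deflate" is a zlib stream (RFC 1950) with an Adler-32 checksum.

```python
import gzip
import zlib

data = b"Hello, HTTP compression!" * 10

gz = gzip.compress(data)   # RFC 1952 container with a CRC-32 trailer
zl = zlib.compress(data)   # RFC 1950 container with an Adler-32 trailer

print(gz[:2].hex())  # 1f8b  (gzip magic number)
print(zl[:1].hex())  # 78    (zlib header byte)
print(gzip.decompress(gz) == zlib.decompress(zl) == data)  # True
```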
Servers that support HTTP compression
- SAP NetWeaver
- Microsoft IIS: built-in or using third party modules
- Apache HTTP Server, via mod_deflate (despite its name, it only supports gzip)
- Hiawatha HTTP server: serves pre-compressed files
- Cherokee HTTP Server, with on-the-fly gzip and deflate compression
- Oracle iPlanet Web Server
- Zeus Web Server
- lighttpd, via mod_compress and later mod_deflate (1.4.42)
- nginx - built-in
- Applications based on Tornado, if "compress_response" is set to True in the application settings (for versions before 4.0, set "gzip" to True)
- Jetty Server - built into the serving of default static content and available via a servlet filter configuration
- GeoServer
- Apache Tomcat
- IBM Websphere
- AOLserver
- Ruby Rack, via the Rack::Deflater middleware
- HAProxy
- Varnish - built-in. Also works with ESI
Compression in HTTP can also be achieved by using the functionality of a server-side scripting language such as PHP, or a programming language such as Java.
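As a sketch of this scripting approach, a minimal WSGI application (hypothetical, not tied to any particular framework) can gzip its own response body and set the corresponding header when the client allows it:

```python
import gzip

def app(environ, start_response):
    """A minimal WSGI application that gzips its own response body."""
    body = b"<html><body>Hello, world!</body></html>"
    headers = [("Content-Type", "text/html")]
    # Compress only when the client advertised gzip support.
    if "gzip" in environ.get("HTTP_ACCEPT_ENCODING", ""):
        body = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
    headers.append(("Content-Length", str(len(body))))
    start_response("200 OK", headers)
    return [body]
```

Such an application can be served locally with `wsgiref.simple_server.make_server("", 8000, app)` from the standard library.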
Problems preventing the use of HTTP compression
A 2009 article by Google engineers Arvind Jain and Jason Glasgow stated that more than 99 person-years are wasted daily due to increased page load times when users do not receive compressed content. This occurs where anti-virus software interferes with connections to force them to be uncompressed, where proxies are used (with overly cautious web browsers), where servers are misconfigured, and where browser bugs prevent compression from being used. Internet Explorer 6, which drops to HTTP 1.0 (without features such as compression or pipelining) when behind a proxy, a common configuration in corporate environments, was the mainstream browser most prone to falling back to uncompressed HTTP.
Another problem found while deploying HTTP compression on a large scale is due to the definition of the deflate encoding: while HTTP 1.1 defines the deflate encoding as data compressed with deflate (RFC 1951) inside a zlib-formatted stream (RFC 1950), Microsoft server and client products historically implemented it as a raw deflate stream, making its deployment unreliable. For this reason, some software, including the Apache HTTP Server, only implements gzip encoding.
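This ambiguity can be illustrated with Python's zlib module, where the wbits parameter selects between the two interpretations; a tolerant client has to try both (a defensive sketch, mirroring what many browsers historically did):

```python
import zlib

data = b"payload " * 20

# RFC-compliant "deflate": deflate data wrapped in the zlib format (RFC 1950).
zlib_stream = zlib.compress(data)

# Historical Microsoft behavior: a bare raw deflate stream (RFC 1951 only).
co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_stream = co.compress(data) + co.flush()

def inflate(stream: bytes) -> bytes:
    """Try the zlib-wrapped interpretation first, then fall back to raw."""
    try:
        return zlib.decompress(stream)
    except zlib.error:
        return zlib.decompress(stream, wbits=-zlib.MAX_WBITS)

print(inflate(zlib_stream) == data and inflate(raw_stream) == data)  # True
```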
Security implications
In 2012, a general attack against the use of data compression, called CRIME, was announced. While the CRIME attack could work effectively against a large number of protocols, including but not limited to TLS, and application-layer protocols such as SPDY or HTTP, only exploits against TLS and SPDY were demonstrated, and these were largely mitigated in browsers and servers. The CRIME exploit against HTTP compression has not been mitigated at all, even though the authors of CRIME have warned that this vulnerability might be even more widespread than SPDY and TLS compression combined.
In 2013, a new instance of the CRIME attack against HTTP compression, dubbed BREACH, was published. A BREACH attack can extract login tokens, email addresses, or other sensitive information from TLS-encrypted web traffic in as little as 30 seconds (depending on the number of bytes to be extracted), provided the attacker tricks the victim into visiting a malicious web link. All versions of TLS and SSL are at risk from BREACH regardless of the encryption algorithm or cipher used. Unlike previous instances of CRIME, which can be successfully defended against by turning off TLS compression or SPDY header compression, BREACH exploits HTTP compression, which cannot realistically be turned off, as virtually all web servers rely upon it to improve data transmission speeds for users.
In 2016, the TIME attack and the HEIST attack became public knowledge.
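The core of the CRIME/BREACH length oracle can be illustrated in a few lines (a simplified sketch with invented page content, not a working exploit): when attacker-controlled input is compressed together with a secret, a guess that matches the secret yields a slightly shorter output, because deflate replaces the repetition with a short back-reference.

```python
import zlib

# A response that mixes a secret token with attacker-reflected input.
secret_page = b"<html>Cookie: session_token=abcdef123</html>"

def observed_length(reflected_guess: bytes) -> int:
    """Compressed size of the response; visible on the wire even under TLS."""
    return len(zlib.compress(secret_page + reflected_guess, 9))

right = observed_length(b"session_token=abcdef123")  # guess matches the secret
wrong = observed_length(b"session_token=xqzwvmrpk")  # guess does not match
print(right < wrong)  # True: the matching guess compresses better
```

By repeating such guesses character by character and watching the compressed length, an attacker can recover the secret without ever decrypting the traffic.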
References
External links
- RFC 2616: Hypertext Transfer Protocol - HTTP/1.1
- HTTP Content-Coding Values by the Internet Assigned Numbers Authority
- Compression with lighttpd
- Coding Horror: HTTP Compression on IIS 6.0
- 15 Seconds: Web Site Compression in the Wayback Machine (archived July 16, 2011)
- HTTP compression: resource page by founder of VIGOS AG, Constantin Rack
- Using HTTP Compression by Martin Brown from Server Watch
- Using HTTP Compression in PHP
- Dynamic and static HTTP compression with Apache httpd
Source of the article: Wikipedia