What Is Host Header Forgery and Why Does Squid Show This Error?

If you have a transparent intercepting proxy deployed in your network, you have probably seen Squid log this Host Header Forgery error more than once. It is a really annoying problem that has no easy solution. But what is it actually, and why does it appear in the first place?

2020/02/07 03:51:57 kid1| SECURITY ALERT: Host header forgery 
    detected on local=140.82.118.3:443 remote=10.0.0.50:1355 FD 12 
    flags=33 (local IP does not match any domain IP)
2020/02/07 03:51:57 kid1| SECURITY ALERT: on URL: github.com:443
2020/02/07 03:52:01 kid1| SECURITY ALERT: Host header forgery 
    detected on local=52.216.109.235:443 remote=10.0.0.50:1356 FD 12
    flags=33 (local IP does not match any domain IP)
2020/02/07 03:52:01 kid1| SECURITY ALERT: on URL: 
    github-production-release-asset-2e65be.s3.amazonaws.com:443

First, you must understand that with a transparently intercepting proxy in your network, clients/browsers are unaware of the proxy. They always make direct connections to the Internet (at least, that is what they think).

But let's see what actually happens when a user wants to visit a well-known site, for example github.com (or Facebook, LinkedIn, Google, etc.).

1. The user types www.github.com in the browser address bar and presses Enter.
2. The browser asks the DNS server configured on its machine (for example 8.8.8.8) to resolve www.github.com. The name resolves to a set of IP addresses, and the browser chooses the first one to connect to (let it be 140.82.118.4).
3. The browser opens a TCP connection to that IP address and sends the following request:
```
==> TCP connect to 140.82.118.4
    GET / HTTP/1.1
    Host: www.github.com
```
4. The proxy intercepts this connection, parses the data being sent and sees that the client wants to reach www.github.com. (For HTTPS the proxy takes the host name from the TLS SNI field rather than from a Host header, but the logic is the same.)
5. The proxy resolves www.github.com using its own configured DNS server (let's say 8.8.8.8 again), but the answer contains 140.82.118.3 instead of 140.82.118.4, simply because GitHub uses DNS round robin with short TTLs. Note that the DNS response the proxy receives contains only .3, so the proxy has no way to know that .4 is also part of the same round-robin set!
6. The proxy sees that the browser is connecting to an IP address that is not among the addresses it resolved for www.github.com. It writes an "alert!!! we found a hacker!!!" log entry, responds with a Host Header Forgery error and terminates the connection.
7. Your browser shows the SSL_ERROR_RX_RECORD_TOO_LONG error (Firefox reports this when it receives something other than a valid TLS record, such as a plain-text error page), as in the following screenshot.

[Screenshot: host header forgery error in the browser]
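The check in the steps above can be modelled in a few lines. This is a simplified sketch under our own assumptions, not Squid's actual implementation: `host_matches_destination` and `RoundRobinDNS` are hypothetical names, and the round-robin resolver is a toy that hands out one address from its pool per query.

```python
from typing import Callable, Iterable

# Hypothetical resolver type: host name in, list of IP addresses out.
Resolver = Callable[[str], Iterable[str]]


def host_matches_destination(host: str, dst_ip: str, resolve: Resolver) -> bool:
    """Simplified model of the check: the IP the client actually connected to
    must be one of the addresses the proxy itself gets for that host name."""
    return dst_ip in set(resolve(host))


class RoundRobinDNS:
    """Toy round-robin DNS: every query returns a single address, rotating
    through the pool, so two lookups can see different answers."""

    def __init__(self, pool: list[str]) -> None:
        self.pool = pool
        self.next_index = 0

    def __call__(self, host: str) -> list[str]:
        ip = self.pool[self.next_index % len(self.pool)]
        self.next_index += 1
        return [ip]


dns = RoundRobinDNS(["140.82.118.4", "140.82.118.3"])
client_ip = dns("www.github.com")[0]  # the browser's lookup: gets .4, connects there
ok = host_matches_destination("www.github.com", client_ip, dns)  # the proxy's lookup gets .3
print(client_ip, ok)  # 140.82.118.4 False
```

The browser did nothing wrong, yet the check fails: this is exactly the false positive seen in the log at the top of the article.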

Is the Host Header Forgery Algorithm Correct?

Is it the correct thing to do? If we think a little more, then yes: Squid is correct. Imagine you have a rule in the Squid configuration that allows connections to github.com only and prohibits connections to anything else (a super-secure environment, you would say). Without header forgery detection, the only thing an attacker would need to get through your super-secure proxy is a custom web server sitting on any IP address. The attacker's requests connect to that server's IP while carrying "Host: github.com", which satisfies your rule, and put the real destination in another header (not Host) that the server uses to relay the connections to the whole Internet. Such a server can be written in days, if not hours.

So yes, Squid applies host header forgery detection correctly.
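The bypass described above can be sketched the same way. Everything here is hypothetical: the relay IP is a documentation address (203.0.113.0/24), `X-Relay-Target` is an invented header name, and the two ACL functions only illustrate the idea, not Squid's ACL engine.

```python
from typing import Callable

ALLOWED_HOSTS = {"github.com"}


def fake_dns(host: str) -> set[str]:
    # Hypothetical resolver answer: only github.com resolves, to two addresses.
    return {"140.82.118.3", "140.82.118.4"} if host == "github.com" else set()


# The attacker connects to their own relay but claims to be talking to github.com.
request = {
    "dst_ip": "203.0.113.10",             # where the TCP connection really goes
    "headers": {
        "Host": "github.com",             # what the proxy rule looks at
        "X-Relay-Target": "example.org",  # what the relay actually fetches (invented header)
    },
}


def naive_acl(req: dict) -> bool:
    # Trusts the Host header alone.
    return req["headers"]["Host"] in ALLOWED_HOSTS


def verified_acl(req: dict, resolve: Callable[[str], set[str]]) -> bool:
    # Also requires the connection to go to an address the host really resolves to.
    host = req["headers"]["Host"]
    return host in ALLOWED_HOSTS and req["dst_ip"] in resolve(host)


print(naive_acl(request))               # True: the relay gets through
print(verified_acl(request, fake_dns))  # False: forgery detected
```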

How Could We Mitigate Connection Errors?

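Way 1 - Use an explicit proxy. In this case the browser does not connect to the site's IP itself, but asks the proxy to connect to a given domain name. The proxy does its own DNS lookup, so there is no client-chosen IP to cross-check and no forgery is possible. This is the recommended solution.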
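To illustrate why an explicit proxy avoids the problem, here is roughly what an explicitly configured client sends. The TCP connection goes to the proxy itself (Squid listens on port 3128 by default), and the destination appears only as a name: in a `CONNECT` request for HTTPS, or as an absolute URL in the request line for plain HTTP. The helper names are hypothetical, and a real browser adds more headers.

```python
def connect_request(host: str, port: int = 443) -> bytes:
    """HTTPS via an explicit proxy: the client names the destination and never
    picks an IP itself, so the proxy's own DNS answer is the only one in play."""
    authority = f"{host}:{port}"
    return (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        "\r\n"
    ).encode("ascii")


def absolute_form_get(host: str, path: str = "/") -> bytes:
    """Plain HTTP via an explicit proxy: the full URL goes in the request line."""
    return (
        f"GET http://{host}{path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")


print(connect_request("www.github.com"))
print(absolute_form_get("www.github.com"))
```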

Way 2 - Deploy a caching DNS server in your network that forcefully overrides the responses and caches them, so that both browsers and Squid get the same IP when resolving names such as www.github.com. Because it is not clear whether such a DNS server actually exists, we always recommend deploying one and only one caching DNS server within your network. Also, always make both the clients and Squid use that DNS server for name resolution.
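The idea behind Way 2 can be sketched as a shared TTL cache in front of a round-robin upstream. This is a toy model, not a DNS server: `CachingResolver` and the rotating `upstream` function are hypothetical. On the Squid side, the `dns_nameservers` directive in squid.conf is how you point Squid at your internal resolver.

```python
import time
from itertools import cycle
from typing import Callable


class CachingResolver:
    """Minimal TTL cache shared by every caller. While an entry is fresh, the
    browser and Squid both get the same answer, so the IP the client connects
    to is also in the proxy's answer."""

    def __init__(self, upstream: Callable[[str], list[str]], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.upstream = upstream
        self.ttl = ttl
        self.clock = clock
        self.cache: dict[str, tuple[float, list[str]]] = {}  # host -> (expiry, answer)

    def resolve(self, host: str) -> list[str]:
        now = self.clock()
        entry = self.cache.get(host)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: everyone sees the same answer
        answer = self.upstream(host)
        self.cache[host] = (now + self.ttl, answer)
        return answer


# Hypothetical round-robin upstream: a different single address on every query.
rotation = cycle(["140.82.118.4", "140.82.118.3"])


def upstream(host: str) -> list[str]:
    return [next(rotation)]


shared = CachingResolver(upstream, ttl=60)
client_ip = shared.resolve("www.github.com")[0]  # the browser's lookup
proxy_ips = shared.resolve("www.github.com")     # Squid's lookup, served from the same cache
print(client_ip in proxy_ips)  # True
```

The sketch also shows the limits of this approach: once an entry expires, the next lookup may return a different address, and clients that cache DNS answers on their own can keep using an older IP, so occasional mismatches remain possible.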