DNS over HTTPS (DoH) is the new protocol being pushed by browsers and some high-profile companies. It runs DNS requests over HTTPS and makes some bold claims about protecting your privacy from ISPs. You may think differently about these claims after reading this.
Well, from the outset that's a complete fallacy in the protocol. Right from the start it involves a central provider. The privacy problem is simply moved from one third party to another, in this case from the ISP to your chosen DoH provider. Several of these providers have already admitted to sharing data, and have privacy policies which state that they process your data and share it with other third parties.
For the sake of this discussion, let's assume for a moment that the third party is actually moral and ethical, and does not store, process or do anything else with the DNS requests being sent over DoH. This raises a basic question from an ISP's point of view. How does an ISP continue to snoop on people, and still get mostly accurate results, without access to the DNS requests sent over DoH?
It is generally assumed that an ISP can snoop on DNS requests which are not destined for its own DNS servers, which is exactly the problem DoH attempts to solve. If that is true, the ISP also has access to all the other traffic, and that traffic includes every connection request to a web server. The ISP can therefore filter this traffic, identifying the user by the source address and the web server by the destination address.
The obvious target is TCP SYN packets sent from the user to a web server on ports 80 and 443. From those packets alone we can tell that a particular user is connecting to a particular web server.
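As a rough illustration of that filter, here is a minimal sketch in Python. It assumes raw IPv4 packets with no link-layer header in front of them, and it ignores IPv6 entirely. The name `web_syn` is made up for this example, and how the packets get captured is left out.

```python
import socket
import struct

WEB_PORTS = {80, 443}


def web_syn(packet: bytes):
    """Return (src_ip, dst_ip, dst_port) for an initial TCP SYN to a web port.

    Returns None for anything else. `packet` is assumed to start at the
    IPv4 header.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None  # too short, or not IPv4
    header_len = (packet[0] & 0x0F) * 4
    if header_len < 20 or packet[9] != 6:
        return None  # malformed header, or not TCP
    if struct.unpack("!H", packet[6:8])[0] & 0x1FFF:
        return None  # later fragment, carries no TCP header
    if len(packet) < header_len + 14:
        return None  # TCP header is cut off before the flags byte
    src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
    flags = packet[header_len + 13]
    is_syn = bool(flags & 0x02)
    is_ack = bool(flags & 0x10)
    # SYN without ACK is the client's opening packet of a new connection.
    if is_syn and not is_ack and dst_port in WEB_PORTS:
        src_ip = socket.inet_ntoa(packet[12:16])
        dst_ip = socket.inet_ntoa(packet[16:20])
        return src_ip, dst_ip, dst_port
    return None
```

Each hit is one (user, web server) pair, which is all the rest of this discussion needs.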
So now we know that a user intends to use a specific web server. If we really want to know which website they are using, for monitoring purposes and to later profit from selling the data, all we need to do is turn the IP address back into a domain name.
The immediate problem here is that the reverse DNS database is basically useless for this purpose, as it is often inaccurate or the records don't exist at all. It's probably easier to make our own lookup database. There are a few different approaches to doing this.
First, get a list of all registered domain names and do lookups on each name, both bare and with the www prefix added. We are still looking at this from the point of view of an ISP carrying connections for tens of thousands of users, so sending hundreds of thousands of DNS requests to other DNS servers would not be unusual. At 350+ million registered domain names, though, this is probably not practical or efficient to implement. So let's drop this method.
Second, populate a database with information we can already get nearly for free from existing end users. How? We snoop on the host-to-IP lookups handled by the DNS servers and turn them into a database for IP-to-host lookups. We can get away with this method because not all users are going to switch to DoH.
Third, use the encryption itself. Every HTTPS server has a certificate. To find out which hosts a web server serves, we can just connect to it and perform a TLS negotiation, and the certificate it presents will tell us which host names it is valid for. So we are able to build an IP-to-host database very reliably using this method.
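The certificate approach can be sketched with Python's standard `ssl` module. This is only an illustration under a few assumptions: the function names `names_from_cert` and `probe` are invented here, only certificates that validate against the system trust store are handled (a self-signed certificate makes the handshake fail), and no SNI is sent, so a server hosting many sites will only present its default certificate.

```python
import socket
import ssl


def names_from_cert(cert: dict) -> set[str]:
    """Collect host names from a dict shaped like SSLSocket.getpeercert()."""
    names = {
        value.lower()
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    }
    # Older certificates may only carry the name in the subject's commonName.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.add(value.lower())
    return names


def probe(ip: str, port: int = 443, timeout: float = 5.0) -> set[str]:
    """Connect to `ip`, complete a TLS handshake and return the names found."""
    context = ssl.create_default_context()
    # We connect by address, so there is no host name to check against.
    context.check_hostname = False
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        with context.wrap_socket(raw) as tls:
            return names_from_cert(tls.getpeercert())
```

Calling `probe("192.0.2.1")` (a documentation address, used here as a placeholder) would return the set of names on whatever certificate that server presents, and a wildcard entry such as `*.example.com` comes back as-is.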
The conclusion here is to use methods two and three. The third is the most accurate, and to stop the information going stale we only need to update the database on demand, when users are actually accessing a server. It also helps that the information for established sites does not tend to change very often. It's probably a good idea to store time information with the data as it changes.
Virtual hosting creates an accuracy problem, because many sites can share one IP address. However, if we absolutely must know which website is being accessed, and assuming the site is public, it is possible to make an educated guess as to which site the user is accessing. We can do this by monitoring the number of connections from user -> server and the amount of data transferred, even though the data is encrypted. If we, as the ISP, load the same pages ourselves, we should be able to generate a signature for each site, within some tolerance, from the amount of data transferred when its main page loads, and use it to make a reasonably educated guess.
Technically this is possible to do, but it is not very practical when trying to scale it to an ISP level, because it requires connection tracking and won't work with multipath routing and various other transport methods possibly in use.
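Purely to show the idea, and leaving the scaling problem aside, the size-based guess could be sketched like this. The `LoadProfile` type, the `candidates` function and the 10% and one-connection tolerances are assumptions picked for the example, not measured values.

```python
from typing import NamedTuple


class LoadProfile(NamedTuple):
    connections: int   # connections opened while the main page loads
    total_bytes: int   # encrypted bytes transferred during that load


def candidates(observed: LoadProfile,
               references: dict[str, LoadProfile],
               byte_tolerance: float = 0.10,
               conn_slack: int = 1) -> list[str]:
    """Return the sites whose reference profile is close to the observed one.

    `references` holds profiles the ISP measured by loading each site itself.
    Results are ordered from the closest byte count to the furthest.
    """
    hits = []
    for site, ref in references.items():
        if ref.total_bytes <= 0:
            continue  # no usable reference measurement
        byte_error = abs(observed.total_bytes - ref.total_bytes) / ref.total_bytes
        conn_error = abs(observed.connections - ref.connections)
        if byte_error <= byte_tolerance and conn_error <= conn_slack:
            hits.append((byte_error, site))
    return [site for _, site in sorted(hits)]
```

More than one site can fall inside the tolerance, which is why this only ever gives a guess and not an answer.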
Another, more practical method would be to load the main web page of every host and find out which other hosts are also contacted to load adverts, JavaScript and various other parts of the page. For most modern web pages this will almost always form a unique signature for each virtual host, giving us a hint as to which site the user is actually accessing and letting us make a reasonably educated guess with some probability of accuracy.
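One simple way to compare such signatures is set overlap. The sketch below uses Jaccard similarity, which is a choice made for this example and not the only possible measure. The function names and the idea that the observed servers have already been mapped back to host names are assumptions too.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the overlap divided by the size of the union (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_sites(observed: set[str],
               signatures: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank candidate virtual hosts by how well their signature matches.

    `observed` is the set of other hosts a user contacted around the same
    time. `signatures` maps each candidate site to the hosts its main page
    is known to pull in. Sites with no overlap at all are left out.
    """
    scored = [(site, jaccard(observed, sig)) for site, sig in signatures.items()]
    matches = [(site, score) for site, score in scored if score > 0]
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))
```

The top entry is the best guess, and its score gives a rough feel for how much to trust it.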
Other side effects.
DoH breaks all sorts of other things.
- No support in DHCP. So it breaks when you go to a new LAN where you need to use their DNS server.
- Breaks when proxies are used in a corporate environment.
- Can be fingerprinted and automatically identified as the DoH protocol, then blocked by ISPs.
- Centralized. What happens if the DoH server, or a route to it, fails? You lose 100% access.
- Doesn't prevent the DoH providers from messing with DNS in exactly the same way ISPs do when DNSSEC is not enabled.
While DoH has probably added some extra steps that make tracking users more complex for the ISP, it doesn't add any significant value in ensuring users' privacy from their ISP. If anything it removes some privacy, as an additional third party that wasn't previously involved (the DoH provider) now also has access to your data.
DoH will prevent ISPs messing with DNS records, but DoT (DNS over TLS) and DNSSEC also resolve that issue. So from my point of view DoH is probably best avoided as a technology and should be abandoned, as you still have no control over your privacy.