Chrome's proxy prefetch proposal: Privacy, Security and broader web implications
I have high regard for the Chrome security team because of their attention to detail in securing Chrome users against attacks from malicious publishers (case in point: Chrome’s Site Isolation work).
However, privacy in Chrome is a completely different story. In the past, Chrome has used dark patterns to leak users’ information to Google servers (e.g., automatically logging users into the browser, sending a user-identifying attribute to Google servers, and exempting Google sites from users’ site-data settings).
More recently, Google has started silently intercepting user traffic and proxying it via its servers without consent from either the user or the web developer (look for “search result link prefetches” in Chrome’s privacy whitepaper, the related GitHub proxy proposal, and the alternate-loading-modes proposal). I’m sad to say it, but this is the figurative straw that broke the camel’s back: I have completely lost my trust in Chrome.
First some history
In the past, Google rolled out the signed exchanges proposal, which enabled aggregators (such as Google Search) to serve content on behalf of other sites. Safari and Firefox do not support signed exchanges to this date due to security concerns. Google “handled” those concerns by pointing out that signed exchanges are opt-in for sites, which therefore need to be aware of the security risks. See this comment from Mozilla capturing the sentiment: “Sites opt-in to using this mechanism, and in doing so need to be aware that this comes with some risks, but in doing so they enable a new feature”.
From what I can tell, most web developers also decided not to opt in to signed exchanges. This significantly reduced the uptake of signed exchanges, but on the positive side, it also meant that developers had a say in what they were getting into.
This new proposal from Google looks like another take on the same idea. This time, however, Google has decided to silently proxy user traffic through its servers without any opt-in from either the user or the site.
That’s a problem for multiple reasons. I dug through the proposals, Chrome’s privacy whitepaper, and the source code, and distilled the issues below.
User’s privacy at risk
To understand these risks better, think of the proposed HTTPS CONNECT proxy as a VPN. To a first approximation, exposing user information to a completely new entity (a VPN or a proxy) is privacy-negative. Of course, if the protocol and the proxy are well designed and trustworthy, it could actually improve the user’s privacy. Unfortunately, I do not think that’s the case here. There are at least a few specific ways in which Chrome’s private proxy proposal leaks the user’s information:
Browsing history leaked to Google: The proposal mentions that the browser would only prefetch URLs for which the user does not have cookies. Imagine that a Google search page (or Gmail, or any other page controlled by Google) requests prefetching of https://foo.com and https://bar.com. Let’s say the user had cookies for foo.com, so Chrome would only prefetch bar.com. By joining logs internally among its different backends, Google can easily conclude that the user has previously visited foo.com. This leaks the user’s browsing history to Google without the user’s consent.
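The log-join inference above can be sketched as a toy model. Everything here is hypothetical (the user ID, log shapes, and URLs are illustrative, not taken from the proposal); it only shows how set difference between two Google-side logs recovers prior visits.

```python
# Hypothetical sketch: two logs Google could hold internally.
# Log at the referrer (e.g. Google Search): which prefetches the page requested.
requested = {"user123": {"https://foo.com", "https://bar.com"}}

# Log at the proxy: which prefetches actually went out. Chrome skips URLs the
# user already has cookies for, so foo.com never reaches the proxy.
proxied = {"user123": {"https://bar.com"}}

def inferred_past_visits(user):
    # Requested but never proxied => the browser had cookies => prior visit.
    return requested[user] - proxied[user]

print(inferred_past_visits("user123"))  # {'https://foo.com'}
```

The leak needs no new instrumentation: both logs plausibly already exist for debugging or abuse prevention, and a simple join reveals the history bit.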
Past browsing history leaked to the employer: Let’s say your employer wants to determine whether you have visited https://foo.com in the past, on the employer’s network or even on networks they do not control. To do this, the employer requests prefetching of https://foo.com on one of the pages of the company’s intranet. Since Chrome will prefetch https://foo.com only if the user has not visited it in the past, the employer can observe whether the prefetch traffic occurs and thereby determine the user’s browsing history.
Browsing history leaked to all network observers: By the same logic, the user’s network provider (think ISP, hotel, airport, or coffee-chain WiFi) can also determine the user’s past browsing history without their consent. This works even if all sites and connections use HTTPS.
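The employer and network-observer scenarios reduce to the same one-bit oracle, sketched below under a simplified model of the proposal (my assumption: a page the observer controls requests exactly one cross-origin prefetch, and the observer can tell whether any prefetch traffic left the machine).

```python
def chrome_prefetches(url, cookie_jar):
    # Per the proposal, the cross-origin prefetch is skipped if the browser
    # already holds cookies for the destination site.
    return url not in cookie_jar

def observer_concludes_visited(url, cookie_jar):
    # The observer cannot read the encrypted traffic, but its mere presence
    # or absence is visible on the wire.
    traffic_seen = chrome_prefetches(url, cookie_jar)
    return not traffic_seen  # no prefetch => cookies exist => prior visit

# User who has visited foo.com before (has cookies): no prefetch fires.
print(observer_concludes_visited("https://foo.com", {"https://foo.com"}))  # True
# Fresh user: the prefetch fires, and the observer learns "not visited".
print(observer_concludes_visited("https://foo.com", set()))                # False
```

One page load tests one URL, but nothing stops the observer from embedding many such pages to probe a whole list of sites.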
Private proxy, but private from whom?
It’s unclear what privacy guarantees the proxy provides, and how a user or a web developer can verify them. Would Google use the intercepted traffic to build the user’s ad profile? The proposal should spell out the proxy’s guarantees and give users and developers a way to verify them on an ongoing basis.
Google does not disclose what user information it logs at the proxy. I suspect that, to prevent abuse, the proxy would end up logging more information than just the user’s IP address. See Tom Scott’s video on VPNs for a similar discussion.
Security concerns with proxying
Users generally do not know this, but a proxy or a VPN can leverage recently leaked TLS certificates to inject malicious code into the browser. If the browser has not yet learned that a certificate is revoked, the proxy/VPN has a window of opportunity to craft malicious HTML/JavaScript, serve it under the leaked certificate, and deliver it to the browser. The browser would happily cache the malicious code for whatever duration the malicious proxy specifies in the cache headers, and run it every time the user visits that website.
Now, I’m sure Google would not push malicious code to its own users, but the proxy definitely becomes an appealing target for attackers and authorities. Users and publishers have no visibility into the workings of the proxy, and no way to verify on a continuous basis that it has not been compromised. As a user or a developer, I do not see why I should use a browser that unnecessarily increases my attack surface.
Sites need to relinquish their controls
The proposal requires sites to bear the serving cost of the prefetch traffic, but does not provide them with any tools to manage that traffic. The IP address of the other endpoint is hidden from the site, which makes it difficult to detect abusive traffic. Sites have to rely entirely on the proxy to filter out abuse.
If you have ever used Google Search over a VPN, you have probably been served captchas. That tells us that Google itself is wary of trusting services that hide the user’s IP address. If so, why would Google expect other web developers to blindly trust Google and relinquish their controls to it?
Site opt-out
The proposal mentions that it will allow sites to opt out of proxying. However, would Google start directly or indirectly punishing sites that opt out, say by de-ranking them in search results? That would leave sites with the choice of either relinquishing their security and controls, or losing search traffic and business. Not really the best set of options.
More centralization of the web
Sites have to rely on the proxy to filter abusive traffic and to blindly trust it. Since an individual site can blindly trust only so many proxies, the web ecosystem can realistically support very few of them. It would not be possible for upcoming browsers (or browsers with lower market share) to earn developers’ trust, which further centralizes the web around Google.
At the same time, it’s unclear whether the proxy would give preferential treatment to prefetches from Google websites over others, e.g., marking prefetch traffic from other sites as abusive while letting www.google.com prefetch indiscriminately. Google should provide more details on its abuse-prevention mechanisms.
User opt-out
The proposal mentions that it would allow users to opt out of this traffic interception, but I would argue that enabling the feature by default is another example of Chrome’s dark patterns. Users should not have to remain constantly vigilant against hostile defaults in their software.
Where do we go from here
I’m sure that Google will try its best to safeguard its proxy against attackers. I’m less sure whether it will fix the privacy gaps, or provide stronger guarantees about how the proxy will shield sites from abusive traffic while logging minimal user information.
I’m simply left speechless that it’s somehow considered okay for a browser to automatically proxy parts of the user’s traffic. If Chrome wants to earn users’ trust, it should move this feature behind a site or user opt-in.
Suggestions
I would suggest that sites opt out of this feature as soon as possible, both to avoid the security risks and to safeguard their users’ privacy.
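For sites that want to opt out, the proposal points at a “traffic advice” file served from the site’s /.well-known/ directory. The sketch below is my best understanding of that mechanism; the exact filename, schema, and content type are assumptions that should be checked against the current spec before deploying.

```python
import json
import pathlib

# Assumed opt-out payload: tell the prefetch proxy this site disallows
# proxied prefetch traffic. Field names here may differ from the final spec.
advice = [{"user_agent": "prefetch-proxy", "disallow": True}]

# Write it where the proxy is expected to look: /.well-known/traffic-advice
well_known = pathlib.Path(".well-known")
well_known.mkdir(exist_ok=True)
(well_known / "traffic-advice").write_text(json.dumps(advice))
# When serving it, use Content-Type: application/trafficadvice+json
```

The proxy is expected to fetch this file before forwarding prefetches, so the opt-out relies on the proxy honoring it rather than on anything the site can enforce, which is itself part of the trust problem discussed above.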
In the end, the goal of these proposals is to have users hand their private information to Google instead of to individual sites. I would challenge Chrome to pursue a third option: go back to the whiteboard and design a solution that does not disclose the user’s private information to either the sites or Google, a solution that both users and sites like. That way, Chrome would not have to strong-arm users and sites into a feature that’s harmful to users, sites, and the broader web.
To users, I would suggest trying a different browser that values their privacy.