Sunstein Insights

Exploitation of Publicly Available Website Data May Be Unstoppable

November 1, 2019

Sunstein | Winning IP

Data scraping is a technique by which automated tools are used to extract data from a website and format the data for analysis. Many companies mine website users’ publicly accessible data in order to tailor products and services to their perceived interests. This is particularly common with social network sites, where users may share their work histories, interests, backgrounds, and connections with other users.

There are some defenses to data scraping. First, users can often use a website’s settings to specify which personal data is publicly accessible and which is visible only to their connections.

Second, websites often provide terms and conditions that prohibit or limit data scraping, and may use tools that detect and block suspicious scraping activities.

Websites seeking to stop data scraping often invoke the Computer Fraud and Abuse Act, which, according to some courts, makes it a crime to obtain, without proper authority, information from a computer that is connected to the internet.

In hiQ Labs, Inc. v. LinkedIn Corporation, the Ninth Circuit recently considered whether California law and the CFAA allow a website to block access to its users’ public data. Data analytics company hiQ sought injunctive relief to stop LinkedIn from blocking its scraping of LinkedIn users’ public data. Neither California law nor the CFAA, it argued, supported the measures that LinkedIn took to block hiQ’s access to the public portion of the LinkedIn website.

A federal judge in San Francisco had granted hiQ a preliminary injunction to halt these practices. The Ninth Circuit affirmed. The appellate court found that hiQ met all the prerequisites for a preliminary injunction.

First, the company would likely suffer irreparable harm because it had no viable way to remain in business without using LinkedIn’s public data. Second, the court decided that hiQ’s interest in continuing its business far outweighed the privacy interest that LinkedIn users retained in their publicly accessible data.

As for the third and most important factor, the court found that hiQ was likely to succeed on the merits of its claim of intentional tortious interference under California law: hiQ had contracts with third parties for its product, LinkedIn knew of the contracts, LinkedIn’s intentional acts to block hiQ’s scraping access were designed to induce disruption of the contracts, and actual disruption occurred, causing harm to hiQ.

The appeals court rejected LinkedIn’s defense that it had a legitimate business purpose in protecting its users’ data. The privacy expectations of LinkedIn’s customers in such data, said the court, were “uncertain at best” because the data was on their public profiles, available for viewing by anyone with a web browser.

The Ninth Circuit also held that no authorization is needed under the CFAA to view information that is publicly accessible on a website.

The court contrasted the hiQ facts with those in two Ninth Circuit precedents. United States v. Nosal (2016), known as Nosal II, held that a former employee who used a current employee’s login credentials to access the company’s computers acted “without authorization” under the CFAA. And Facebook, Inc. v. Power Ventures, Inc. (2016) held that a website that circumvented barriers to gain access to password-protected Facebook members’ data had acted “without authorization” under the CFAA.

The panel in hiQ inferred from these cases that no authorization is needed where data is not password-protected.

Other circuits, the court acknowledged, have interpreted the CFAA more broadly with respect to authorization. The First Circuit, in EF Cultural Travel BV v. Explorica, Inc. (2001), held that violations of confidentiality agreements or other contractual restraints could give rise to unauthorized access under the CFAA. The Eleventh Circuit, in United States v. Rodriguez (2010), held that violations of a policy governing the use of databases exceeded authorized access under the CFAA.

The lesson that the Ninth Circuit drew from these cases is that the CFAA is an “anti-intrusion” statute – one meant to criminalize the digital equivalent of breaking and entering – rather than a contract-based statute, which could criminalize a violation of a website’s terms of service.

The court also determined that the public interest, the fourth preliminary injunction factor, favored hiQ. LinkedIn is not entitled to free rein to decide who can collect and use data that it does not own, that is otherwise publicly available, and that LinkedIn itself collects and uses. Giving LinkedIn that control, the court reasoned, would risk creating information monopolies that would disserve the public interest.

In view of the hiQ decision, website owners should reconsider their policies governing automated access to their users’ data by third parties. User agreements, which provide the terms and conditions for such access, may have no legal effect to limit access to publicly shared data. Barriers to such publicly shared data may violate the rights of third-party companies.

Website owners may want to evaluate the types of publicly shared user data their sites elicit and consider whether authorization requirements, such as username and password, should be mandated for some or all of this data.

Further, companies using automated tools to access website data should be aware that not all jurisdictions share the Ninth Circuit’s narrow interpretation of the CFAA. Violating the terms and conditions in websites’ user agreements may have legal ramifications under that statute.

Finally, website users should be mindful that analytics companies are collecting and using personal data from their public profiles. Federal privacy laws may not protect against such access. If they see such use of personal data as undesirable, website users should adjust their privacy settings to limit access to their accepted connections.