Data scraping is a technique in which automated tools (robots, or “bots”) are used to extract large quantities of data from a website. In some cases, the website owner has developed the extracted data with considerable effort, using proprietary methods.
Companies deploying these bots use the extracted data for market research, business intelligence or other purposes related to providing a product or service, often in competition with the website owner.
A federal appeals court recently decided that scraping data from a publicly accessible website may constitute trade secret misappropriation. This ruling will likely prompt companies to limit the quantity of data they scrape from other companies’ websites.
Publicly accessible websites often provide an interface for human users to query a website’s database. For example, a website may contain fields for entering search criteria, such as a user’s vehicle model and zip code, which are translated into a command that retrieves a matching item from the database, such as an insurance quote. Bots issue those same commands directly, without going through the interface, to query the database automatically.
In this way, bots can query the database much more rapidly than a human user, at several queries per second, and collect an enormous quantity of the website’s data. By strategically formatting the query commands, a bot may reproduce a portion of the database in a short period of time.
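To make the mechanism concrete, here is a minimal Python sketch of how values typed into a quote form become the query a page sends to its server. The endpoint `quotes.example.com` and the parameter names `vehicle_model` and `zip` are hypothetical; no real site's interface is being described. The point is only that a bot able to build this string can issue the query without ever touching the form.

```python
from urllib.parse import urlencode

# Hypothetical endpoint (example.com is reserved for documentation).
QUOTE_ENDPOINT = "https://quotes.example.com/api/quote"


def build_quote_query(vehicle_model: str, zip_code: str) -> str:
    """Translate the form's field values into the query URL the page requests."""
    # Hypothetical parameter names; a real site defines its own.
    params = {"vehicle_model": vehicle_model, "zip": zip_code}
    return f"{QUOTE_ENDPOINT}?{urlencode(params)}"


url = build_quote_query("Civic", "33101")
print(url)  # https://quotes.example.com/api/quote?vehicle_model=Civic&zip=33101
```

A human produces one such URL per form submission; a script can produce thousands by calling the same function in a loop.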
In Compulife Software, Inc. v. Newman (11th Cir. 2020), the Eleventh Circuit Court of Appeals considered whether using data scraping to reproduce a portion of a website’s database, deemed to be a trade secret, constitutes unlawful misappropriation.
Compulife provides a publicly accessible website for consumers to request free insurance quotes. The website queries the quotes from Compulife’s database, which is generated by compiling public rate tables using a specialized method known only within Compulife.
The defendant, Rutstein, also provides a website for requesting insurance quotes. By means of data scraping performed by a hired hacker, Rutstein reproduced a portion of Compulife’s database, which was used to provide quotes on Rutstein’s own website.
In particular, the hacker used a bot to scrape all data pertaining to two zip codes from Compulife’s database, totaling more than 43 million quotes. To do so, the bot queried the database using every combination of demographic data for the two zip codes, which took the bot only four days to complete. Compulife had no access barrier in place to prevent such scraping activities.
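The phrase “every combination of demographic data” describes a Cartesian product of query parameters. The Python sketch below shows the idea. The source does not say which fields Compulife’s interface used, so every dimension here (ages, sex, smoker status, coverage amounts, term lengths) and both zip codes are hypothetical, and the resulting count is not meant to reproduce the 43 million figure, which counts quotes returned rather than requests sent.

```python
from itertools import product

# Hypothetical query dimensions; not Compulife's actual form fields.
ZIP_CODES = ["33101", "33102"]               # two target zip codes
AGES = range(18, 86)                         # ages 18 through 85 (68 values)
SEXES = ["M", "F"]
SMOKER = [False, True]
FACE_AMOUNTS = [100_000, 250_000, 500_000, 1_000_000]
TERM_YEARS = [10, 20, 30]


def all_query_params():
    """Yield one parameter set per combination of the dimensions above."""
    for zip_code, age, sex, smoker, amount, term in product(
        ZIP_CODES, AGES, SEXES, SMOKER, FACE_AMOUNTS, TERM_YEARS
    ):
        yield {"zip": zip_code, "age": age, "sex": sex,
               "smoker": smoker, "face_amount": amount, "term": term}


total = sum(1 for _ in all_query_params())
print(total)  # 6528 = 2 * 68 * 2 * 2 * 4 * 3
```

Each added dimension multiplies the total, which is why a systematic sweep can cover a zip code exhaustively even though every individual request looks like an ordinary quote request.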
Compulife alleged that Rutstein’s scraping of its quotes violated Florida’s Uniform Trade Secrets Act (FUTSA) and the federal Defend Trade Secrets Act (DTSA). The district court held that, although the database was a trade secret, the individual quotes, due to their public availability, were not trade secrets that could be misappropriated.
The Eleventh Circuit saw things differently: Whether the individual quotes were trade secrets is the wrong question. Taking enough of the quotes, it said, could be a misappropriation of a protected portion of the database. Otherwise, trade secret protection for data compilations would have no value.
The court sent the case back to the trial court to determine: (1) whether the quantity of data scraped was substantial enough to constitute misappropriation of the database “as a whole,” and (2) whether the means employed were “improper.”
As for the second factor, what qualifies as “improper means”? In alleging that Rutstein misappropriated its database, Compulife relied on the FUTSA provisions governing the acquisition and use of a trade secret.
Acquisition applies when a person acquires a trade secret through improper means. Compulife alleged that Rutstein engaged in acquisition by hiring the hacker who scraped the data from Compulife’s website.
Use, in turn, applies when a person uses a trade secret knowing that it was improperly acquired. Compulife alleged that Rutstein engaged in use by providing quotes based on the data the hacker scraped from Compulife’s website.
Each of these provisions requires the element of “improper means.” If the hacker’s scraping were found to constitute “improper means,” the court said, it would be difficult for Rutstein to escape liability for misappropriation under these laws. Although the court remanded the case for a determination of “improper means,” some of its comments suggest that Rutstein did use “improper means.”
First, the fact that the quotes were taken from a publicly accessible website does not mean that scraping them was proper. Compulife authorized the public to access only as many quotes “as humanly possible.” A bot, by contrast, can collect an “infeasible amount” of quotes, which may constitute an improper means.
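The gap between “humanly possible” and “infeasible” can be put in numbers using the figures reported above (more than 43 million quotes in four days). The human pace in this Python sketch, one quote per minute around the clock, is purely an assumption for comparison and is not a figure from the case.

```python
QUOTES_SCRAPED = 43_000_000          # "more than 43 million" (used as a lower bound)
BOT_DURATION_S = 4 * 24 * 60 * 60    # four days, in seconds

# Assumption for comparison only: one quote per minute, nonstop, no breaks.
HUMAN_SECONDS_PER_QUOTE = 60
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

bot_rate = QUOTES_SCRAPED / BOT_DURATION_S  # average quotes per second
human_years = QUOTES_SCRAPED * HUMAN_SECONDS_PER_QUOTE / SECONDS_PER_YEAR

print(f"bot average: {bot_rate:.0f} quotes/second")          # bot average: 124 quotes/second
print(f"human at 1 quote/minute: {human_years:.0f} years")  # human at 1 quote/minute: 82 years
```

This counts quotes, not requests; if one request returned several quotes, the bot’s request rate would have been lower than 124 per second.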
If precautions to maintain the database’s secrecy were reasonable under the circumstances, scraping an infeasible quantity of quotes may be improper, even though the database was accessible to the public.
On this point, the court relied on a 50-year-old opinion, E. I. du Pont de Nemours & Co. v. Christopher, in which the Fifth Circuit held that aerial photography from public airspace that exposed a trade secret constituted improper means, even though the trade secret owner had left its facility exposed to such photography.
Based on Christopher, the Eleventh Circuit opined that an owner’s use of imperfect measures to protect a trade secret does not render a means of acquisition proper. So long as the precautions taken to maintain secrecy were reasonable under the circumstances, it does not matter that Rutstein found a way to circumvent them.
An even more analogous case, said the court, is Physicians Interactive v. Lathian Systems (E.D. Va. 2003), in which a federal judge in Virginia held that using a bot to hack into a public-facing website to acquire confidential customer lists and computer code was an improper means of obtaining a trade secret, and thus misappropriation. The Virginia court’s holding, that a trade secret owner’s “failure to place a usage restriction on its website” did not automatically render the hacking proper, applies equally to Compulife’s case, wrote the Eleventh Circuit.
Courts in other jurisdictions have analyzed data scraping under the federal Computer Fraud and Abuse Act (CFAA) and state statutes. Florida has such a statute, the Computer Abuse and Data Recovery Act, but Compulife could not show that the scraped data was protected by a “technological access barrier” as that law requires.
Protection against scraping of publicly accessible data is inconsistent across jurisdictions. For example, the Ninth Circuit recently held that the CFAA and California state law did not allow a website to block access to its publicly accessible data.
The Compulife decision puts companies on notice that the data they scrape from publicly accessible websites may be legally protected. Even if a user interface is provided for human access to individual items of data, the underlying database may still be a confidential compilation protected as a trade secret.
Using a bot to collect vast quantities of a website’s data to reproduce a substantial portion of the database may leave these companies liable for misappropriation. Such liability may apply even if the website’s owner did not place a barrier, user agreement, or other usage restriction on its website. With this in mind, these companies should consider limiting the quantity of data they scrape.
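One practical way to limit quantity is to put a hard cap on each scraping job. The Python sketch below assumes two things not discussed in the case: the widely used robots.txt convention (parsed here from a hypothetical file, using the standard library’s `urllib.robotparser`) and a hypothetical internal cap of 100 requests. Consistent with the point above, a permissive or missing robots.txt would not by itself make a large scrape proper; the cap is the part that addresses quantity.

```python
from itertools import islice
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, parsed from a string so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
Crawl-delay: 10
"""

# Hypothetical internal policy: the most requests one job may make.
MAX_REQUESTS = 100


def plan_requests(urls, user_agent="example-research-bot", max_requests=MAX_REQUESTS):
    """Keep URLs robots.txt does not disallow, stop at max_requests, and
    return the requested crawl delay in seconds (None if none is set)."""
    rp = RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    allowed = (u for u in urls if rp.can_fetch(user_agent, u))
    return list(islice(allowed, max_requests)), rp.crawl_delay(user_agent)


candidate_urls = ["https://quotes.example.com/api/quote?zip=33101"]
candidate_urls += [f"https://quotes.example.com/rates/page{i}" for i in range(500)]
planned, delay = plan_requests(candidate_urls)
print(len(planned), delay)  # 100 10
```

The disallowed `/api/` URL is filtered out, and of the 500 remaining candidates only the first 100 are planned.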
For their part, website owners should be mindful of the inconsistent opinions on data scraping across jurisdictions. They should review the access restrictions allowable in their jurisdiction and consider whether such restrictions should be placed on automated access to some or all of their data.
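On the owner’s side, one common form of automated-access restriction is a per-client rate limit that allows human-paced use but throttles bulk querying. The Python sketch below implements a token bucket; the capacity, refill rate, and client identifier are hypothetical. Whether such a limit would count as a “technological access barrier” under a particular statute is a legal question the sketch does not answer.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: each client may burst up to `capacity`
    requests, then is held to about `refill_per_s` requests per second."""

    def __init__(self, capacity: float, refill_per_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.clock = clock  # injectable so the behavior can be tested deterministically
        # client id -> (tokens remaining, time of last update)
        self._state: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        tokens, last = self._state.get(client_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_s)
        if tokens >= 1:
            self._state[client_id] = (tokens - 1, now)
            return True
        self._state[client_id] = (tokens, now)
        return False


# Hypothetical policy: bursts of 5, then one request every 2 seconds.
fake_now = [0.0]
limiter = TokenBucketLimiter(capacity=5, refill_per_s=0.5, clock=lambda: fake_now[0])
results = [limiter.allow("203.0.113.7") for _ in range(7)]
print(results)  # [True, True, True, True, True, False, False]
fake_now[0] = 2.0
after_wait = limiter.allow("203.0.113.7")
print(after_wait)  # True
```

A person requesting a few quotes never notices the limit, while a bot issuing many queries per second is held to the refill rate.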