The Website Planet research team discovered a critical data exposure affecting an organization that uses open-source data analytics software, which lets organizations gather and analyze information about their websites’ visitors.
Two ElasticSearch servers owned by an unknown organization using this software were left unsecured, exposing data related to website visitors.
Web analytics tools collect a vast amount of data to build a detailed profile for each website visitor. Even though websites often collect this information without users knowing, the servers demonstrate that these detailed user profiles can be exposed. As such, users must take adequate steps to protect their privacy when using the internet.
Customer Data Exposed
The ElasticSearch servers were misconfigured: they were left without any user authentication controls or encryption in place. As a result, the unsecured ElasticSearch servers exposed 359,019,902 total records, equating to 579.4 GB of data.
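For operators who want to check a cluster they run themselves, the sketch below classifies how a server answers a request that carries no credentials. It assumes the server exposes Elasticsearch's HTTP API and that a cluster with security enabled answers unauthenticated requests with HTTP 401; the host name and function names are hypothetical and are not taken from the research team's tooling.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Interpret the HTTP status returned to a request sent without credentials."""
    if status == 200:
        return "open"  # the server answered without asking for authentication
    if status in (401, 403):
        return "protected"  # credentials required, or access denied
    return "unknown"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET request and classify the answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx answers; the status is in err.code
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


# Hypothetical usage, only against a server you operate:
# print(probe("http://my-es-host.example:9200/"))
```

An "open" result on a production cluster would suggest the kind of misconfiguration described above and is worth investigating.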
Both of the unknown organization’s ElasticSearch servers contained detailed logs of website user traffic — information that belongs to users of various websites collecting data with the open-source technology.
Website user traffic found on the server included:
Geolocation data
Web page visited
Referrer page
Timestamp
IP address
User agent data of website visitors
The servers contained user information collected over two months in 2021. The first server contained September 2021 data and the second server featured December 2021 data.
The September 2021 server consisted of 242,728,328 records, totaling 389.7 GB of data, and this data was collected between September 2nd, 2021, and October 1st, 2021.
The December 2021 server featured 116,291,574 records, totaling 189.7 GB of data, and was collected between December 1st, 2021, and December 27th, 2021.
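As a quick consistency check, the per-server figures can be summed and compared with the totals reported earlier. This is a minimal sketch using only the numbers stated in this report; the variable names are illustrative.

```python
# Figures as stated in this report for each server.
september = {"records": 242_728_328, "gigabytes": 389.7}
december = {"records": 116_291_574, "gigabytes": 189.7}

total_records = september["records"] + december["records"]
# Round to one decimal place to avoid floating-point noise in the sum.
total_gigabytes = round(september["gigabytes"] + december["gigabytes"], 1)

print(f"{total_records:,} records, {total_gigabytes} GB")
# prints "359,019,902 records, 579.4 GB"
```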
People could be located through each server’s logs of user profiles. Users could be filtered based on their IP addresses and, from here, the server disclosed extensive details about each user’s passive digital footprint — information that’s collected from internet users without them knowing, such as web-browsing activity.
Each exposed user typically appeared in between 4 and 100 records across the two servers. Because each user has multiple log entries, we estimate that around 15 million people were affected by the misconfigured ElasticSearch servers.
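The estimate can be sanity-checked by dividing the total record count by the estimated number of people. The sketch below is only a plausibility check on the published figures, not the method the research team used to derive the estimate.

```python
TOTAL_RECORDS = 359_019_902    # total records across both servers
ESTIMATED_PEOPLE = 15_000_000  # estimate stated in this report

average_records_per_person = TOTAL_RECORDS / ESTIMATED_PEOPLE
print(f"{average_records_per_person:.1f} records per person")
# prints "23.9 records per person"
```

Roughly 24 records per person falls inside the 4 to 100 range observed for individual users.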
Two factors could impact our estimated figure. Firstly, a number of each server’s user profiles belonged to bots that were crawling each website, including Googlebot and Pinterest Bot. The presence of bots could lower our estimate, though only by a small amount. Secondly, exposed users were distinguished through their IP addresses, so any website visitors using a VPN or Tor may be included in our “people affected” estimate but wouldn’t be exposed. In practice, these two factors likely balance out to some degree.
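To illustrate how log records can be grouped by IP address while leaving out crawler traffic, here is a minimal sketch. The field names (`ip`, `user_agent`), the sample records, and the bot markers are assumptions made for illustration; they do not reflect the actual schema of the exposed servers, and the IP addresses come from ranges reserved for documentation.

```python
from collections import defaultdict

# Illustrative, non-exhaustive substrings that suggest crawler traffic.
BOT_MARKERS = ("googlebot", "pinterestbot")


def is_bot(user_agent: str) -> bool:
    """Return True if the user agent string contains a known bot marker."""
    lowered = user_agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def records_per_visitor(records):
    """Count records per IP address, skipping records that look like bots."""
    counts = defaultdict(int)
    for record in records:
        if is_bot(record["user_agent"]):
            continue
        counts[record["ip"]] += 1
    return dict(counts)


# Hypothetical sample data.
sample = [
    {"ip": "203.0.113.5", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"ip": "203.0.113.5", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"ip": "198.51.100.7", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"ip": "192.0.2.44", "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"},
]

visitors = records_per_visitor(sample)
print(len(visitors))  # prints 2
```

Counting by IP address has the limitation noted above: visitors behind a VPN or Tor are still counted under the address they appear to come from.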
The servers were live and were being updated at the time of discovery. ElasticSearch is not at fault for this data exposure, and neither is the company providing the open-source web-analytics technology that was used to harvest the data.
You can see evidence of server logs that exposed website user traffic in the screenshots below.
Who was Affected?
An estimated 15 million users were exposed on the two open ElasticSearch servers. These people visited various websites that were analyzed by the company at fault, which was using a web analytics technology provided by the software company Snowplow Analytics.
What’s more, this data exposure has a global impact. Users from around the world had data stored on the unsecured servers.
How Was the Exposed Data Collected?
The unknown organization’s dataset was collected using software from Snowplow Analytics.
Founded in 2012, Snowplow Analytics Ltd. provides a suite of web analytics products that websites and companies can tailor to their needs.
The Snowplow Open-Source software gathers information about visitors’ traffic on websites and apps and gives users the functionality to control and customize their data collection. Organizations can use Snowplow to help analyze visitors’ passive digital footprints and gain insights into these visitors.
Snowplow Analytics is based in London, England, and turns over annual revenues of around US$10 million. Snowplow’s software is popular with huge corporations, including Strava, The Wall Street Journal, AutoTrader, Capital One, and ABC.
It’s important to note that we aren’t accusing Snowplow Analytics of this data exposure. While the unknown organization used Snowplow’s technology, the data was seemingly exposed due to the unknown organization’s misconfigured servers.
We know the ElasticSearch servers belong to a Snowplow user because Snowplow’s website URL appears as a source in server records. Upon further correspondence, Snowplow told us that the servers’ owner was using an open-source installation of Snowplow’s software.
Impact on End Users
We do not and cannot know whether malicious parties have accessed the ElasticSearch servers. If bad actors have read or downloaded the servers’ records, exposed users could face the threat of cybercrime.
Privacy Violation
The organization’s ElasticSearch servers violate users’ privacy, though this fact may not be obvious given the various details exposed. IP address information, used in conjunction with geolocation data, ISP, device types, operating system details, browser details, timestamps, and website visit history, could be used to locate and identify specific individuals. It’s not easy to identify users with this information. Nonetheless, each user could be subject to various cybercrimes if a hacker were able to identify them on one of the ElasticSearch servers.
Impersonation
Malicious actors could conduct illegal activities online while posing as another device with the exposed user agent data. Hackers impersonate user agents to appear as legitimate sources to web applications. This means hackers can avoid being blocked during attacks on websites or user accounts.
Impact on the Server’s Owner
The organization that owns the ElasticSearch servers could also be impacted by this data leak. Because users from around the world were exposed on its servers, the organization could be investigated by several data protection authorities.
Data Privacy Violations
Prominent data protection regulations in the United States, the EU, and the United Kingdom may have been breached when the ElasticSearch servers exposed website users’ traffic. This is because users from around the world are identifiable through their leaked data.
Status of the Data Exposure
The Website Planet research team discovered the misconfigured ElasticSearch servers on January 18th, 2022. It was easy to discern the servers’ origin because of references to Snowplow’s website URL in logs and further correspondence with Snowplow. However, we do not have a name for the ElasticSearch servers’ owner.
The Website Planet research team sent an email to Snowplow after discovering the open ElasticSearch servers. On January 20th, 2022, we sent another message to Snowplow Analytics. The company’s head of engineering replied and told us that the servers belong to an organization using an open-source installation of Snowplow’s software. Snowplow said it would reach out to the organization to close the breach. On January 26th, 2022, we followed up with Snowplow regarding the open servers, and between January 27th and January 31st, 2022, the misconfigured ElasticSearch servers were secured.
Protecting Your Data
The open ElasticSearch servers demonstrate that online users can have personal data exposed that was collected through on-site tracking, something many of us do not even consider while browsing the internet.
There are steps users can take to limit on-site tracking and prevent this kind of data exposure from happening in the future.
A Virtual Private Network (VPN) hides the user’s IP address from the websites they visit, which makes IP-based on-site tracking much harder, although a VPN does not block cookies on its own. A VPN should be the first option for internet users who want to protect their online activity. People can also use the Tor browser to access the internet anonymously and maintain the privacy of their data.
Internet users should consider carefully whether they want to accept “cookies” from a website before entering. Cookies are used to track our online activity. Cookies can improve our experience of a website, brand, or service, though we ultimately hand over more data in exchange for this improved customer experience.
Finally, most web browsers allow users to disable cookies and ad tracking in their browser settings. Users can exercise this option if they want to avoid tracking altogether.
How and Why We Report on Data Breaches
We want to help our readers stay safe when using any website or online product.
Unfortunately, most data breaches are never discovered or reported by the companies responsible. So, we decided to do the work and find the vulnerabilities putting people at risk.
We follow the principles of ethical hacking and stay within the law. We only investigate open, unprotected databases that we find randomly, and we never target specific companies.
By reporting these leaks, we hope to make the internet safer for everyone.
What is Website Planet?
Website Planet prioritizes honesty and serves as the premier resource for web designers, digital marketers, developers, and businesses with an online presence. We offer tools and resources tailored to individuals at all skill levels, ranging from beginners to experts.
We have an experienced team of ethical security research experts who uncover and disclose serious data leaks as part of a free service for the online community at large. This has included a breach in a medical AI platform, as well as a breach in a French real estate agency leaking sensitive data.
You can read about how we tested five popular web hosts to see how easily hackable they are here.