The General Data Protection Regulation, better known by its acronym GDPR, has been enforced for half a year now. Collection, storage and publication of personal data are greatly limited and, though it’s not perfect yet, privacy really is on everyone’s mind nowadays.
Within the scope of domain registration, the biggest impact is on the whois of generic top level domains (gTLDs). Let me briefly explain those two terms, as they are important for understanding the scope of this blog post.
- The whois is a data directory, maintained by registries and/or registrars, that contains the registration data of each domain name. The data stored include the domain name, domain status, expiration date, registrar and domain contacts.
- A gTLD is any domain extension that is not a country code: .com is a gTLD, .uk is not; .info is a gTLD, .cn is not; .berlin is a gTLD, .de is not; .eu is… well, even registry EURid does not know, but at least .eu is not controlled by coordinator ICANN, so we consider it a country code. Let’s make it simple: each extension that has 3 or more characters is a gTLD.
I do not cover the ccTLDs (country code top level domains) here, as in most cases their whois service was already quite restricted before the enforcement date of the GDPR.
The era of unrestricted, open whois
ICANN has always enforced an open whois in its contracts with registries and registrars. All contact data of the domain owner, administrative contact and technical contact were published without restrictions. Perfect if you want to contact the domain holder because you want to buy his domain; perfect if you want to contact the domain’s technical contact because the nameservers are broken; perfect if you’re doing statistical research; perfect if you are a trademark lawyer and want to contact the domain holder of a domain that infringes a trademark.
On the other hand, personal data comes with potential abuse: the prime minister does not want the whole world to know his telephone number; the domain holder of a strongly polarising website does not want his opponents to visit his home address; the registrant of a website does not want to be flooded by unsolicited offers of web design or search engine optimisation services.
This last example is what this blog post is about.
A short note on how easy it was to get all data from newly registered domains: gTLD registries are required to open their zone files to almost everyone. I could (and still can), at any moment, get a full list of all registered domain names in a gTLD zone and start crawling the whois to retrieve the linked e-mail addresses. With almost 200 million registered gTLD domains, this is a very valuable data source. If I do it daily, I can easily find all domains registered within the last 24 hours.
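To make the "do it daily" step concrete, here is a minimal sketch of how two daily zone file snapshots could be compared. It assumes a simplified format (one record per line, fully qualified owner names, comments starting with ";") and a single-label extension such as "com"; the function names are made up for this illustration, and real zone files may need more careful parsing.

```python
def domains_in_zone(zone_lines, tld):
    """Collect the registered domain names that appear in a zone snapshot."""
    tld = tld.lower().strip(".")
    domains = set()
    for line in zone_lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        # The owner name is the first field, e.g. "domain-x.com."
        owner = line.split()[0].lower().rstrip(".")
        labels = owner.split(".")
        if len(labels) >= 2 and labels[-1] == tld:
            # Reduce host names such as ns1.domain-x.com to domain-x.com
            domains.add(".".join(labels[-2:]))
    return domains


def newly_registered(yesterday_lines, today_lines, tld):
    """Return the domains present today that were absent yesterday."""
    new = domains_in_zone(today_lines, tld) - domains_in_zone(yesterday_lines, tld)
    return sorted(new)


yesterday = [
    "; snapshot of the previous day",
    "example.com. 172800 in ns ns1.example.net.",
]
today = yesterday + [
    "domain-x.com. 172800 in ns ns1.example.net.",
    "ns1.domain-x.com. 172800 in a 192.0.2.1",
]
print(newly_registered(yesterday, today, "com"))
```

Note that a zone file only lists domains that have nameservers assigned, so a comparison like this sees a domain on the first day it is delegated, which may be later than the day it was registered.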
The era of restricted whois
Since the GDPR has been enforced, the contact data that are publicly available through the whois are limited to company name, state/province and country code, together with an anonymised e-mail address or a link to a contact form. An example is the whois output of openprovider.com on the right.
In other words: there is no longer any link between a domain name and a contact e-mail address. Nobody can find the domain holder’s e-mail address from the whois data of the respective domain name.
So, we finally got rid of domain spam!?
Theoretically, with the whois limited to the most basic information, old practices are no longer possible. However, theory does not always match real life… Recently, we investigated two unrelated questions from two of our most valued customers. Both customers complained about domain-related spam sent to the domain holder shortly after registration of a new domain.
Their justified question was: is Openprovider leaking data? The answer is “no”. We were able to trace every single case back to a logical explanation.
Fact: the data is not published anywhere
What are the potential leaks? Of course our own whois server, which ICANN requires us to run. I can assure you: we do not leak data there. I can be so sure about that because the “REDACTED FOR PRIVACY” fields are hard-coded in our implementation: we have not yet implemented a way to disclose whois data to eligible requesters.
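As an illustration of what "hard-coded" means here, the sketch below replaces every contact field that is not on a fixed public list with the redaction text, and it has no code path that could disclose the original values. This is not our actual implementation: the field names and the sample contact are hypothetical, and the public fields simply follow the list given earlier (company name, state/province, country code).

```python
REDACTED = "REDACTED FOR PRIVACY"

# Hypothetical field names; only these are ever shown in the public whois.
PUBLIC_FIELDS = ("organization", "state", "country")


def public_contact(contact):
    """Return the publishable view of a contact record.

    There is deliberately no flag or parameter to switch redaction off.
    """
    return {
        field: (value if field in PUBLIC_FIELDS else REDACTED)
        for field, value in contact.items()
    }


holder = {
    "name": "Jane Doe",
    "organization": "Example B.V.",
    "street": "Example Street 1",
    "state": "Zuid-Holland",
    "country": "NL",
    "email": "jane@example.net",
}
print(public_contact(holder))
```

Using an allow-list rather than a list of fields to hide means that a field added later is redacted by default.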
That leads us to the registry whois server. Most cases we saw concerned .com and .net domains. Again, I can assure you that no data is leaking there: Verisign, the registry for .com and .net, does not even support contact objects in their registry system! In other words, we never send contact data to Verisign.
Other registries do run their own whois server, including contact data stored in their systems, but just like us, those registries are bound to ICANN rules. From the 25th of May 2018 onwards, those registries hid the personal data fields. I cannot guarantee that for all gTLD registries, but the complaints are about the biggest ones: .org, .biz, .info and just one other domain under .art. So while I cannot speak for all of them, I’m pretty sure that’s not a source of leaks.
A short note on the difference between collection and publication: the GDPR so far only affects the published data. Under the hood, full contact data is still collected by the registrar and most registries, and full contact data is shared with a designated escrow agent as part of ICANN’s requirements. However, publication is strictly limited.
Other potential sources require illegal actions: of course, our systems can be hacked (we don’t think that happened), registry systems can be hacked (we don’t think that happened either), the data escrow provider can be hacked (small chance, plus all data is encrypted), and your systems can be hacked (I assume that’s unlikely as well, just for the goal of sending spam to domain holders).
Looking at all the above, well, it’s quite unlikely that spamming happens on such a big scale because of source data being available.
Assumption: what happened?
All cases that we investigated can be divided into three scenarios, of which the second and third are most likely to happen on a big scale. For full understanding, it’s important to know that the registries’ zone files are still available: if you want, you can download a fresh list of domains every day, and you know which domains have been registered during the last 24 hours.
- Some websites clearly show contact details: if domain-x.com is registered and you place a big banner “Contact me at firstname.lastname@example.org!” on it, you deliberately choose to disclose your personal data.
- Spammers may use standard addresses: if domain-y.com is registered, just try sending an e-mail to email@example.com.
- But most likely, spammers use professional service providers like domaintools.com or whoxy.com. Those providers have decades of history on millions of domains, including personal data from the pre-GDPR era. In my opinion they are not allowed to store and publish it, but it’s a fact they still do so and these data are used. I’ll provide some examples.
Example 1: openprovider.page
The domain openprovider.page was registered on the 10th of October this year in the name of Openprovider, at our current address (Kipstraat in Rotterdam), which is invisible in the whois, of course.
Looking this domain up on Whoxy.com shows that they claim to know the holder’s personal details.
However, the details shown are for an address that we moved away from around 15 years ago, the contact person is not the one actually linked to the domain, and all other personal data is incorrect as well.
I expect these data to come from an older domain (probably a test domain) that was registered in the name “Openprovider” as well.
Example 2: incorrect historical match
The second example (I won’t disclose the domain name as it’s owned by an unaware end user) strengthens my assumptions. The public whois shows that the domain is registered to the Dutch company Masoko. Whoxy.com (I don’t have shares in that company!) correctly shows “Masoko” as owner, but the contact data are those of a completely unrelated Swedish company with the same name.
Example 3: GDPR-limited data
A last real-life example is a domain registered to the company Proto-Service. Apparently Whoxy.com could not link this company name to any previously collected data, and thus it shows nothing more than “REDACTED FOR PRIVACY”:
What does this prove?
Let’s be honest: those are all examples that just illustrate my assumptions. You can call it a bias. They do not prove anything, but I hope that I’ve explained in this blog post why I would rather explain “domain-related spam after the 25th of May” as the work of smart minds than as the result of a data leak.
Every example that we’ve investigated so far can be explained by what I wrote down in this blog post.
I am still looking for an example that cannot be explained in this way: a domain, registered with a unique e-mail address that is used nowhere else, that receives domain-related spam. For example, domain-z.com is registered with e-mail address firstname.lastname@example.org, which is published or used nowhere else on the internet. Could it be possible that domain-related spam is received on this e-mail address? I doubt it! But if you encounter this situation, we’ll investigate and buy you a beer!
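If you want to run this test yourself, the sketch below shows one way to create such a unique address per domain and to trace an incoming message back to the registration it was issued for. It is only an illustration: the "reg-" prefix, the function names and the mailbox domain "mail.example" are assumptions, and you would need a mailbox that accepts mail for the generated addresses.

```python
import secrets


def canary_address(domain, mailbox_domain):
    """Build a one-off registrant e-mail address for a single domain."""
    # A random token makes the address impossible to guess from the domain name.
    token = secrets.token_hex(8)
    label = domain.lower().rstrip(".").replace(".", "-")
    return f"reg-{label}-{token}@{mailbox_domain.lower()}"


def domain_for(address, issued):
    """Return the domain a recipient address was issued for, or None."""
    return issued.get(address.lower())


# Keep a record of every address handed out, keyed by the address itself.
issued = {}
address = canary_address("domain-z.com", "mail.example")
issued[address] = "domain-z.com"
print(address)
```

If domain-related spam ever arrives on an address like this, which was used for one registration and nowhere else, guessing and historical data can both be ruled out as the source.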
Oh, and what about the question?
You’re right, the question in this blog post’s title is whether or not the GDPR resulted in a reduction of domain-related spam. Honestly, I do not know. Despite the title, my intention was to explain the above rather than to make a numerical analysis. I gladly leave that to the experts! For example, one of the leading anti-spam agencies, Spamhaus, does not see clear proof and states it is far too early to tell.