Microsoft Replaces User Data Search with Standard eDiscovery

A New Method to Handle GDPR DSRs

This one is for the compliance purists, or at least, those concerned with dealing with GDPR data subject requests (DSRs). Message center notification MC664475 (3 Aug 2023) announces that Microsoft is retiring the User data search tool (previously called the Data subject requests tool) with effect from August 30, 2023. Active cases will move to eDiscovery (standard) and can be processed to completion there.

DSRs came about when GDPR gave individuals (the data subjects) the right to obtain a copy of any information an organization holds about them. A DSR is a formal request for that information, which the receiving organization must respond to within a month. Microsoft’s User data search solution is a wizard that creates a special form of a standard eDiscovery case with a search designed to find the relevant information.

Microsoft says that there’s been an increase in DSRs and notes that the User data search tool is not as functional as eDiscovery standard. The tool doesn’t take advantage of changes and improvements added to eDiscovery standard recently, so it makes sense to replace the tool and concentrate on a single set of features.

Search Query to Find All User Data

To help with the transition, Microsoft published a sample Keyword Query Language (KQL) query to find emails and documents authored by the subject of a user data search. The query is:

participants:"<user name>" OR author:"<user name>" OR createdby:"<user name>"(c:c)(ItemClass=IPM.Document)(ItemClass=IPM.Note)(ItemClass=IPM.Note.Microsoft.Conversation)(ItemClass=IPM.Note.Microsoft.Missed)(ItemClass=IPM.Note.Microsoft.Conversation.Voice)(ItemClass=IPM.Note.Microsoft.Missed.Voice)(ItemClass=IPM.SkypeTeams.Message)

The query can be used with a content search or eDiscovery case search. The important thing is to make sure that the search covers all Exchange Online and SharePoint Online locations.

I tested the search query with a content search. I made three changes. First, I entered the user principal name of the user to search for. Second, I removed the “(c:c)” entry from the search as this term is usually only inserted by the query editor when it checks the syntax and completeness of queries. Finally, I removed the trailing double quotation mark as it wasn’t needed. Figure 1 shows the query as input into the KQL editor. The syntax check advises that the query is quite dense and difficult to read, but that doesn’t affect the effectiveness of the query.

Figure 1: Entering the KQL query for a user data search
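
If you process DSRs regularly, it can help to generate the query text rather than edit it by hand each time. The following Python is a minimal sketch of that idea, not anything Microsoft publishes: the function name build_user_data_query and the sample address are my own assumptions. It only assembles the sample query shown earlier (without the "(c:c)" term) for a given user principal name. You still paste the result into the KQL editor.

```python
# Item classes copied from the sample query quoted earlier in this article.
ITEM_CLASSES = [
    "IPM.Document",
    "IPM.Note",
    "IPM.Note.Microsoft.Conversation",
    "IPM.Note.Microsoft.Missed",
    "IPM.Note.Microsoft.Conversation.Voice",
    "IPM.Note.Microsoft.Missed.Voice",
    "IPM.SkypeTeams.Message",
]


def build_user_data_query(upn: str) -> str:
    """Return the sample query text for one user, without the "(c:c)" term.

    build_user_data_query is a hypothetical helper name used for illustration.
    """
    if '"' in upn:
        # A stray quote would break the quoted property values in the query.
        raise ValueError("user principal name must not contain a double quote")
    people = " OR ".join(
        f'{prop}:"{upn}"' for prop in ("participants", "author", "createdby")
    )
    classes = "".join(f"(ItemClass={item_class})" for item_class in ITEM_CLASSES)
    return people + classes


# Example address is made up; substitute the data subject's real UPN.
print(build_user_data_query("kim.akers@contoso.com"))
```

The output is a single line of text in the same shape as the query I entered in Figure 1.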

Figure 2 shows the search statistics. Remember that content searches always perform an initial estimate based on search indexes, which is what we see here. The final output for a search is generated when exporting search results. However, the estimate creates a good picture of where content related to the user is present. In this instance, it’s mostly in Exchange Online mailboxes, which implies that the user didn’t create many documents stored in SharePoint Online or OneDrive for Business.

Figure 2: Reviewing statistics for a user data search

Searching is Only the Start

Running a search to find information is only the start of satisfying a DSR. Among points that should be considered are:

  • Content searches and eDiscovery standard can only find information in cloud locations. In hybrid environments, you might need to run searches against on-premises servers.
  • Because of the way that Exchange Online delivers separate messages to recipient mailboxes, there are likely to be many duplicates in the search results.
  • When you export search results, Exchange Online decrypts protected messages. Only eDiscovery premium decrypts protected documents when exporting those files, so some other arrangements might be needed to remove sensitivity labels from protected documents before their content is checked and the files can be passed to the user.
  • Searches do not address the need to remove information about a data subject (the right to be forgotten defined in Article 17 of the GDPR). However, the reports generated for a search tell you where data matches are found and act as a guide for checking individual locations and items to decide whether items are relevant and what content should be removed. Remember, not all data found for a data subject needs to be removed from locations as it is legally permissible to keep data under certain circumstances, such as the requirement to comply with a legal obligation.
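
To illustrate the point about duplicates, here is a rough Python sketch of how a reviewer might thin out a list of exported items before the manual review. It assumes that each item has already been loaded into a dictionary with a "message_id" key holding the internet message ID. Both the key name and the function name drop_duplicate_messages are hypothetical; this article doesn't describe the layout of the export reports, so adjust them to whatever your export contains.

```python
from typing import Iterable


def drop_duplicate_messages(items: Iterable[dict]) -> list[dict]:
    """Keep the first copy of each message, keyed on its message ID.

    "message_id" is an assumed field name, not one taken from a real export.
    Items without an ID (documents, for example) are always kept so that
    nothing is dropped without a person looking at it.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for item in items:
        message_id = item.get("message_id")
        if not message_id:
            unique.append(item)
            continue
        key = message_id.strip()
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Treat the output as a review aid only. The copy of a message in each mailbox is still data held by the organization, which matters when you come to consider removal requests.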

The work to prepare to hand over information to the person who requested the DSR starts when the search export finishes. Unlike the search and export operations, reviewing the exported material is a manual process that can become very time-consuming, especially for people who aren’t accustomed to responding to DSRs.

Sensible Change

Compliance nerds (like me – as evident in this article about using targeted collections in content searches) will understand why Microsoft removed a specialized tool in favor of a more generic approach. Let’s hope that the engineering resources released by the move help to improve content searches and eDiscovery standard. Better performance for content searches would be a start. They haven’t improved much in that respect since the introduction of the new UI in 2021.
