Microsoft 365 Cloud Data Scans

Last updated: March 11, 2026

Microsoft Cloud Data Scans are used to identify sensitive data stored within Microsoft 365 services, including Email, OneDrive, and SharePoint. These scans help organizations understand data exposure risks within cloud-based collaboration and communication platforms.

This article explains how Microsoft Cloud Data Scans work, how they are executed, what data they collect, and important performance considerations.


What Microsoft Cloud Data Scans Do

Microsoft Cloud Data Scans evaluate Microsoft 365 content for sensitive data patterns based on selected categories. They are commonly used to:

  • Identify sensitive data stored in email and cloud file repositories

  • Detect regulated data such as credit card numbers or credentials

  • Assess cloud data exposure risk

  • Support remediation and compliance initiatives

These scans evaluate cloud-hosted data only and do not scan on-premises file systems.


Sensitive Data Categories Scanned

When configuring a Data Sensitivity Scan, you must select at least one scan category. Categories define the types of sensitive data patterns the scan searches for. Selecting a large number of categories may significantly increase scan duration.

Common Categories

  • Bank Accounts

  • Credit Cards

  • Drivers Licenses

  • Passport Numbers

  • Passwords

  • Social Security Numbers

Finance Categories

  • Tax ID Numbers

  • Financial Keywords

Healthcare Categories

  • Genetic Disorder Keywords

  • ICD10 Diagnoses

  • General Healthcare Keywords

  • Medication Drug Names

  • Mental Health Disorders

  • Medicare Numbers

  • National Provider ID's

  • Provider DEA Numbers

  • Medicaid CIN Numbers

PII Categories

  • Addresses

  • Alien USCIS Numbers

  • Dates of Births

  • Email Addresses

  • GPS Coordinates

  • Phone Numbers

  • Race - Ethnicity

  • Religious Beliefs Keywords

  • Sex - Gender

  • Social Media

Organization Categories

  • HR Keywords

  • IP Addresses

  • MAC Addresses

  • UNC Paths

  • URL Addresses

  • VIN Numbers

Custom Category (Custom Regex / Keywords)

The Custom category is available for organization-specific patterns. It is empty until you add custom RegEx patterns or keyword lists.

To add Custom categories, navigate to:

Admin → Custom Regex / Keywords

Adding a New RegEx

Data Sensitivity Scans use built-in scan categories. To scan for patterns that those categories do not cover, create a custom RegEx.

Add New RegEx:

  • Click Add New RegEx in the Custom Regex section.

  • Fill out the following fields:

    • Name: Provide a meaningful name for the RegEx.

    • Description: Add a brief description of the purpose of this RegEx.

    • Score: Assign a risk score between 1 and 13 for each match. Scores are cumulative across occurrences in a file.

    • RegEx: Enter the actual regular expression used to identify the pattern.

  • Submit the completed form.

Using Custom RegEx in Scans:

  • When configuring a Data Sensitivity Scan, custom RegEx patterns appear under the Custom category for selection.
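
Before submitting a custom RegEx, it can help to try the pattern against sample text locally. The sketch below is illustrative only: the employee-ID pattern, the sample text, and the `cumulative_score` helper are hypothetical, and it runs on Python's `re` engine, which may behave differently from the engine the scanner uses. It also assumes that "cumulative" means the per-match score is added once for every occurrence in a file.

```python
import re

# Hypothetical custom pattern: employee IDs such as "EMP-004821".
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

def cumulative_score(text: str, pattern: re.Pattern, score: int) -> int:
    """Return score x number of matches (assumed meaning of 'cumulative')."""
    if not 1 <= score <= 13:
        raise ValueError("score must be between 1 and 13")
    return score * len(pattern.findall(text))

sample = "Offer letters for EMP-004821 and EMP-118230 are attached."
print(cumulative_score(sample, EMPLOYEE_ID, score=5))  # 2 matches x 5 = 10
```

Under these assumptions, a file with many low-scoring matches can end up with a higher total than a file with a single high-scoring match, which is worth keeping in mind when choosing the Score value.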

Adding a New Keyword List

Keyword lists allow scans to identify specific terms in files.

Add New Keywords:

  • Click Add New Keywords in the Custom Keywords section.

  • Fill out the following fields:

    • Name: Provide a name for the keyword list.

    • Description: Add a description of the list's purpose.

    • Score: Assign a risk score (1–13) for each keyword match. Similar to RegEx, scores are cumulative.

    • Keywords: Enter a list of keywords to search for during scans.

  • Submit the completed form.

Using Custom Keywords in Scans:

  • Keyword lists appear under the Custom category when setting up a Data Sensitivity Scan.
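
A keyword list can be sanity-checked the same way before it is submitted. This is a rough sketch, not the scanner's actual matching logic: the `keyword_score` helper and the sample terms are hypothetical, and whole-word, case-insensitive matching is an assumption made for illustration.

```python
import re

def keyword_score(text: str, keywords: list[str], score: int) -> int:
    """Count whole-word, case-insensitive keyword hits and multiply by score."""
    # Trim whitespace, drop blanks and duplicates; try longer terms first.
    cleaned = sorted({k.strip().lower() for k in keywords if k.strip()},
                     key=len, reverse=True)
    if not cleaned:
        return 0
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, cleaned)) + r")\b",
        re.IGNORECASE,
    )
    return score * len(pattern.findall(text))

project_terms = ["Project Falcon", "falcon-roadmap", " Project Falcon "]
text = ("The Project Falcon budget is in falcon-roadmap.xlsx; "
        "PROJECT FALCON ships in Q3.")
print(keyword_score(text, project_terms, score=3))  # 3 hits x 3 = 9
```

Cleaning the list first avoids entering the same term twice with different spacing or capitalization.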


Execution Model

Microsoft Cloud Data Scans can be executed using two supported models:

1) Cyrisma Cloud Agent (Default)

  • One cloud scanning agent is provided per Cyrisma instance

  • No local infrastructure is required

  • Scans are executed from Cyrisma-managed cloud infrastructure

2) Local Cloud Agent (Optional)

  • A user-installed agent designated as a Local Cloud Agent

  • Any Windows Cyrisma agent may be assigned this role

  • Only one Local Cloud Agent can be assigned per instance at a time

Both models perform the same types of cloud data scans. The difference is performance, control, and scale, not scan capability.


When to Use a Local Cloud Agent

A Local Cloud Agent is recommended when:

  • Scanning large Microsoft 365 environments

  • Scanning many mailboxes, drives, or SharePoint sites

  • Faster scan execution is required

  • Reduced dependency on shared cloud resources is desired

Because the Local Cloud Agent runs within your environment, it avoids shared-agent queueing and provides more predictable scan performance.


Microsoft 365 Integration Requirement

Microsoft Cloud Data Scans require a completed Microsoft 365 integration.

  • Authentication and access are handled through the integration

  • No local credentials or NT / NetBIOS credentials are used

  • The integration defines which services and scopes are available for scanning

Without a completed integration, Microsoft Cloud Data Scans cannot be performed.

📄 Microsoft Integration Guide


Supported Data Sources

Microsoft Cloud Data Scans can evaluate data stored in:

  • Email

  • OneDrive

  • SharePoint

The scan scope is limited to content accessible through the configured integration permissions.


Data Collected

Depending on scan configuration and selected categories, Microsoft Cloud Data Scans may collect:

  • File and message identifiers

  • Locations of sensitive data findings

  • Matches for sensitive data patterns, including:

    • Passwords (displayed so reviewers can rule out false positives)

    • Credit card numbers (masked for validation)

  • Contextual evidence required for remediation review

Only cloud content accessible through the integration is evaluated.
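
To illustrate what a masked finding might look like, the sketch below hides all but the last four digits of a card number. The exact masking format the product uses is not described in this article, so the `mask_card_number` helper and its output format are assumptions for illustration only.

```python
def mask_card_number(raw: str, visible: int = 4) -> str:
    """Mask all but the last `visible` digits, keeping separators in place."""
    total_digits = sum(ch.isdigit() for ch in raw)
    to_mask = max(total_digits - visible, 0)
    out = []
    for ch in raw:
        if ch.isdigit() and to_mask > 0:
            out.append("*")  # hide a leading digit
            to_mask -= 1
        else:
            out.append(ch)   # keep separators and the trailing digits
    return "".join(out)

# 4111 1111 1111 1111 is a widely used test card number, not a real account.
print(mask_card_number("4111 1111 1111 1111"))  # **** **** **** 1111
```

Leaving the trailing digits visible lets a reviewer compare a finding against a known value without exposing the full number.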


5-Item Selection Limit (Cyrisma Cloud Agent Only)

When using the Cyrisma Cloud Agent, data scans are limited to five selected items per scan (for example, five mailboxes or five drives).

This limit exists to:

  • Prevent overloading shared scanning infrastructure

  • Reduce scan failures and incomplete results

  • Avoid Microsoft API throttling

  • Ensure fair resource usage across all customers

This limit does not apply when using a Local Cloud Agent.


Alternatives for Large Environments

If scanning more than five items at a time is required, you have two supported options:

  1. Install and assign a Local Cloud Agent

    • Removes the 5-item limit

    • Improves scan speed and reliability

  2. Batch scans in groups of five

    • Supported when using the shared cloud agent

    • Requires more scheduling but avoids throttling issues
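
When planning batched scans, the list of targets can be split into groups of five ahead of time. This is a minimal planning sketch: the `batches` helper and the example mailbox addresses are hypothetical, and it only prepares the groups. It does not call any Cyrisma API, and each group would still be selected and scheduled in the scan configuration.

```python
from typing import Iterator, Sequence

# Limit stated in this article for the Cyrisma Cloud Agent.
CLOUD_AGENT_ITEM_LIMIT = 5

def batches(items: Sequence[str],
            size: int = CLOUD_AGENT_ITEM_LIMIT) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])

# Hypothetical mailboxes to scan.
mailboxes = [f"user{n:02d}@example.com" for n in range(1, 13)]

for number, batch in enumerate(batches(mailboxes), start=1):
    print(f"Scan {number}: {len(batch)} items -> {', '.join(batch)}")
```

With twelve mailboxes this produces three scans of five, five, and two items.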


Performance Considerations

Scan duration is influenced by:

  • Volume of Microsoft 365 content

  • Number of selected data categories

  • Microsoft API rate limits

For large environments:

  • Expect longer scan times

  • Consider using a Local Cloud Agent for consistent performance


Common Limitations

  • Scans are limited to Microsoft 365 services included in the integration

  • Content outside Exchange Online, OneDrive, and SharePoint is not evaluated

  • API throttling may affect scan duration when using the shared cloud agent

  • Only one Local Cloud Agent may be assigned per instance


Best Practices

  • Complete Microsoft 365 integration before scheduling scans

  • Use a Local Cloud Agent for large or frequent scans

  • Limit initial scan scope to validate performance and results

  • Review findings carefully to distinguish true positives from false positives

  • Schedule scans strategically to minimize throttling impact