Microsoft 365 Cloud Data Scans
Last updated: March 11, 2026
Microsoft Cloud Data Scans are used to identify sensitive data stored within Microsoft 365 services, including Email, OneDrive, and SharePoint. These scans help organizations understand data exposure risks within cloud-based collaboration and communication platforms.
This article explains how Microsoft Cloud Data Scans work, how they are executed, what data they collect, and important performance considerations.
What Microsoft Cloud Data Scans Do
Microsoft Cloud Data Scans evaluate Microsoft 365 content for sensitive data patterns based on selected categories. They are commonly used to:
Identify sensitive data stored in email and cloud file repositories
Detect regulated data such as credit card numbers or credentials
Assess cloud data exposure risk
Support remediation and compliance initiatives
These scans evaluate cloud-hosted data only and do not scan on-premises file systems.
Sensitive Data Categories Scanned
When configuring a Data Sensitivity Scan, you must select at least one scan category. Categories define the types of sensitive data patterns the scan searches for. Selecting a large number of categories may significantly increase scan duration.
Common Categories
Bank Accounts
Credit Cards
Driver's Licenses
Passport Numbers
Passwords
Social Security Numbers
Finance Categories
Tax ID Numbers
Financial Keywords
Healthcare Categories
Genetic Disorder Keywords
ICD10 Diagnoses
General Healthcare Keywords
Medication Drug Names
Mental Health Disorders
Medicare Numbers
National Provider IDs
Provider DEA Numbers
Medicaid CIN Numbers
PII Categories
Addresses
Alien USCIS Numbers
Dates of Birth
Email Addresses
GPS Coordinates
Phone Numbers
Race - Ethnicity
Religious Beliefs Keywords
Sex - Gender
Social Media
Organization Categories
HR Keywords
IP Addresses
MAC Addresses
UNC Paths
URL Addresses
VIN Numbers
Custom Category (Custom Regex / Keywords)
The Custom category is available for organization-specific patterns. By default, the Custom category is empty until you add your own RegEx patterns or keyword lists.
To add Custom categories, navigate to:
Admin → Custom Regex / Keywords
Adding a New RegEx
Data Sensitivity Scans use built-in scan categories. To scan for patterns unique to your organization, create a custom RegEx.
Add New RegEx:
Click Add New RegEx in the Custom Regex section.
Fill out the following fields:
Name: Provide a meaningful name for the RegEx.
Description: Add a brief description of the purpose of this RegEx.
Score: Assign a risk score between 1 and 13 for each match. Scores are cumulative across occurrences in a file.
RegEx: Enter the actual regular expression used to identify the pattern.
Submit the completed form.
Using Custom RegEx in Scans:
When configuring a Data Sensitivity Scan, custom RegEx patterns appear under the Custom category for selection.
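Before submitting a custom RegEx, it can help to test the pattern against sample text and estimate how the score accumulates. The following is a minimal sketch, not the product's implementation: it assumes Python's re syntax (the regex dialect used by the scanner is not stated here), a hypothetical employee ID format "EMP-" followed by six digits, and simple additive scoring (score per match multiplied by the number of occurrences).

```python
import re

# Hypothetical pattern for illustration: an internal employee ID such as "EMP-123456".
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")


def cumulative_score(text: str, pattern: re.Pattern, score_per_match: int) -> int:
    """Return score_per_match multiplied by the number of matches in text.

    Assumes additive scoring; the scanner's exact scoring rules may differ.
    """
    if not 1 <= score_per_match <= 13:
        raise ValueError("score must be between 1 and 13")
    return score_per_match * len(pattern.findall(text))


sample = "Badge EMP-004213 was reissued; EMP-118902 is pending. Ref EMP-12."

# "EMP-12" is ignored because it does not have six digits.
print(EMPLOYEE_ID.findall(sample))               # ['EMP-004213', 'EMP-118902']
print(cumulative_score(sample, EMPLOYEE_ID, 5))  # 10
```

Testing against both matching and non-matching samples is a cheap way to catch overly broad patterns before they inflate scores across a large scan.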
Adding a New Keyword List
Keyword lists allow scans to identify specific terms in files.
Add New Keywords:
Click Add New Keywords in the Custom Keywords section.
Fill out the following fields:
Name: Provide a name for the keyword list.
Description: Add a description of the list's purpose.
Score: Assign a risk score (1–13) for each keyword match. Similar to RegEx, scores are cumulative.
Keywords: Enter a list of keywords to search for during scans.
Submit the completed form.
Using Custom Keywords in Scans:
Keyword lists appear under the Custom category when setting up a Data Sensitivity Scan.
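A keyword list can also be sanity-checked locally before it is entered. The sketch below is illustrative only: it assumes case-insensitive, whole-word matching, which may not be how the scanner matches keywords, and the keywords shown are hypothetical. It removes blanks and duplicates, then reports which keywords appear in a sample text.

```python
import re


def build_keyword_pattern(keywords: list[str]) -> re.Pattern:
    """Compile one case-insensitive, whole-word pattern from a keyword list."""
    # Drop blanks and case-insensitive duplicates.
    cleaned = {k.strip().lower() for k in keywords if k.strip()}
    if not cleaned:
        raise ValueError("keyword list is empty")
    # Longest first, so "project falcon" wins over "falcon" at the same position.
    ordered = sorted(cleaned, key=lambda k: (-len(k), k))
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def keyword_hits(text: str, keywords: list[str]) -> list[str]:
    """Return every keyword occurrence found in text, lowercased."""
    pattern = build_keyword_pattern(keywords)
    return [m.group(0).lower() for m in pattern.finditer(text)]


# Hypothetical keyword list and sample text.
keywords = ["Project Falcon", "merger", "falcon", "  ", "Merger"]
text = "The merger memo for Project Falcon mentions Merger terms."

hits = keyword_hits(text, keywords)
print(hits)  # ['merger', 'project falcon', 'merger']
```

With a score of 3 per match and additive scoring, this sample would total 9 for the file under these assumptions.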
Execution Model
Microsoft Cloud Data Scans can be executed using two supported models:
1) Cyrisma Cloud Agent (Default)
One cloud scanning agent is provided per Cyrisma instance
No local infrastructure is required
Scans are executed from Cyrisma-managed cloud infrastructure
2) Local Cloud Agent (Optional)
A user-installed agent designated as a Local Cloud Agent
Any Windows Cyrisma agent may be assigned this role
Only one Local Cloud Agent can be assigned per instance at a time
Both models perform the same types of cloud data scans. The difference is performance, control, and scale, not scan capability.
When to Use a Local Cloud Agent
A Local Cloud Agent is recommended when:
Scanning large Microsoft 365 environments
Scanning many mailboxes, drives, or SharePoint sites
Faster scan execution is required
Reduced dependency on shared cloud resources is desired
Because the Local Cloud Agent runs within your environment, it avoids shared-agent queueing and provides more predictable scan performance.
Microsoft 365 Integration Requirement
Microsoft Cloud Data Scans require a completed Microsoft 365 integration.
Authentication and access are handled through the integration
No local credentials or NT / NetBIOS credentials are used
The integration defines which services and scopes are available for scanning
Without a completed integration, Microsoft Cloud Data Scans cannot be performed.
See: Microsoft Integration Guide
Supported Data Sources
Microsoft Cloud Data Scans can evaluate data stored in:
Email
OneDrive
SharePoint
The scan scope is limited to content accessible through the configured integration permissions.
Data Collected
Depending on scan configuration and selected categories, Microsoft Cloud Data Scans may collect:
File and message identifiers
Locations of sensitive data findings
Matches for sensitive data patterns, including:
Passwords (displayed so reviewers can identify false positives)
Credit card numbers (masked for validation)
Contextual evidence required for remediation review
Only cloud content accessible through the integration is evaluated.
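To illustrate why a masked credit card match can still be useful for validation, the sketch below masks all but the last four digits and applies the standard Luhn checksum to separate plausible card numbers from random digit runs. This is an assumption-laden example: the masking format and validation logic used by the scanner are not described here, and the function names are hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:  # shorter digit runs are not treated as card numbers here
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_card(number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with asterisks."""
    digits = [c for c in number if c.isdigit()]
    return "*" * (len(digits) - visible) + "".join(digits[-visible:])


candidate = "4111 1111 1111 1111"  # widely used test number, not a real card
print(luhn_valid(candidate))           # True
print(mask_card(candidate))            # ************1111
print(luhn_valid("4111111111111112"))  # False
```

A reviewer who sees only the masked value can still compare the last four digits against known records without the full number being exposed.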
5-Item Selection Limit (Cyrisma Cloud Agent Only)
When using the Cyrisma Cloud Agent, data scans are limited to five selected items per scan (for example, five mailboxes or five drives).
This limit exists to:
Prevent overloading shared scanning infrastructure
Reduce scan failures and incomplete results
Avoid Microsoft API throttling
Ensure fair resource usage across all customers
This limit does not apply when using a Local Cloud Agent.
Alternatives for Large Environments
If scanning more than five items at a time is required, you have two supported options:
Install and assign a Local Cloud Agent
Removes the 5-item limit
Improves scan speed and reliability
Batch scans in groups of five
Supported when using the shared Cyrisma Cloud Agent
Requires more scheduling but avoids throttling issues
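Planning batches of five can be done with a few lines of code. The sketch below only splits a list of targets into groups that respect the limit; it does not call any Cyrisma API, and the mailbox addresses are placeholders.

```python
from itertools import islice
from typing import Iterable, Iterator

MAX_ITEMS_PER_SCAN = 5  # limit that applies to the Cyrisma Cloud Agent


def batches(items: Iterable[str], size: int = MAX_ITEMS_PER_SCAN) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Placeholder targets: 12 mailboxes to be scanned.
mailboxes = [f"user{n:02d}@example.com" for n in range(1, 13)]

for number, batch in enumerate(batches(mailboxes), start=1):
    print(f"Scan {number}: {len(batch)} items -> {batch}")
```

Twelve mailboxes produce three scans of 5, 5, and 2 items, each of which can then be scheduled separately.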
Performance Considerations
Scan duration is influenced by:
Volume of Microsoft 365 content
Number of selected data categories
Microsoft API rate limits
For large environments:
Expect longer scan times
Consider using a Local Cloud Agent for consistent performance
Common Limitations
Scans are limited to Microsoft 365 services included in the integration
Content outside Exchange Online, OneDrive, and SharePoint is not evaluated
API throttling may affect scan duration when using the shared cloud agent
Only one Local Cloud Agent may be assigned per instance
Best Practices
Complete Microsoft 365 integration before scheduling scans
Use a Local Cloud Agent for large or frequent scans
Limit initial scan scope to validate performance and results
Review findings carefully to distinguish true positives from false positives
Schedule scans strategically to minimize throttling impact