Google Cloud Data Scans

Last updated: February 3, 2026

Google Cloud Data Scans are used to identify sensitive data stored within Google Workspace services, including Gmail and Google Drive. These scans help organizations understand data exposure risks within cloud-based collaboration and communication platforms.

This article explains how Google Cloud Data Scans work, how they are executed, what data they collect, and important performance considerations.


What Google Cloud Data Scans Do

Google Cloud Data Scans evaluate Google Workspace content for sensitive data patterns based on selected categories. They are commonly used to:

  • Identify sensitive data stored in email and cloud file repositories

  • Detect regulated data such as credit card numbers or credentials

  • Assess cloud data exposure risk

  • Support remediation and compliance initiatives

These scans evaluate cloud-hosted data only and do not scan on-premises file systems.


Sensitive Data Categories Scanned

When configuring a Data Sensitivity Scan, you must select at least one scan category. Categories define the types of sensitive data patterns the scan searches for. Selecting a large number of categories may significantly increase scan duration.

Common Categories

  • Bank Accounts

  • Credit Cards

  • Drivers Licenses

  • Passport Numbers

  • Passwords

  • Social Security Numbers

Finance Categories

  • Tax ID Numbers

  • Financial Keywords

Healthcare Categories

  • Genetic Disorder Keywords

  • ICD10 Diagnoses

  • General Healthcare Keywords

  • Medication Drug Names

  • Mental Health Disorders

  • Medicare Numbers

  • National Provider IDs

  • Provider DEA Numbers

  • Medicaid CIN Numbers

PII Categories

  • Addresses

  • Alien USCIS Numbers

  • Dates of Birth

  • Email Addresses

  • GPS Coordinates

  • Phone Numbers

  • Race - Ethnicity

  • Religious Beliefs Keywords

  • Sex - Gender

  • Social Media

Organization Categories

  • HR Keywords

  • IP Addresses

  • MAC Addresses

  • UNC Paths

  • URL Addresses

  • VIN Numbers

Custom Category (Custom Regex / Keywords)

The Custom category is available for organization-specific patterns. By default, it is empty and offers nothing to select until you add custom RegEx patterns or keyword lists.

To add Custom categories, navigate to:

Admin → Custom Regex / Keywords

Adding a New RegEx

Data Sensitivity Scans use built-in scan categories. To scan for patterns unique to your organization, create a custom RegEx.

Add New RegEx:

  • Click Add New RegEx in the Custom Regex section.

  • Fill out the following fields:

    • Name: Provide a meaningful name for the RegEx.

    • Description: Add a brief description of the purpose of this RegEx.

    • Score: Assign a risk score between 1 and 13 for each match. Scores are cumulative across occurrences in a file.

    • RegEx: Enter the actual regular expression used to identify the pattern.

  • Submit the completed form.

Using Custom RegEx in Scans:

  • When configuring a Data Sensitivity Scan, custom RegEx patterns appear under the Custom category for selection.
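Before submitting a custom RegEx, it can help to try the pattern against sample text and see how a cumulative score would add up. The following is a minimal sketch, not Cyrisma's implementation: it assumes the score is simply the per-match score multiplied by the number of matches in a file, and it uses Python's re module, whose syntax may differ from the regex engine used by the scanner. The employee ID pattern and the function name are hypothetical.

```python
import re


def cumulative_score(pattern: str, score_per_match: int, text: str) -> int:
    """Return score_per_match times the number of non-overlapping matches.

    Assumption: "cumulative" means each occurrence adds the same score.
    """
    if not 1 <= score_per_match <= 13:
        raise ValueError("score must be between 1 and 13")
    compiled = re.compile(pattern)  # raises re.error if the pattern is invalid
    return score_per_match * len(compiled.findall(text))


# Hypothetical pattern: internal employee IDs such as EMP-123456.
EMPLOYEE_ID = r"\bEMP-\d{6}\b"
sample = "Contacts: EMP-123456, EMP-654321. Unrelated: EMP-12 and TEMP-123456."

print(cumulative_score(EMPLOYEE_ID, 5, sample))  # prints 10 (two matches x 5)
```

The word boundaries (\b) keep the pattern from matching inside longer tokens such as TEMP-123456, which is one way to reduce false positives before the pattern is used in a scan.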

Adding a New Keyword List

Keyword lists allow scans to identify specific terms in files.

Add New Keywords:

  • Click Add New Keywords in the Custom Keywords section.

  • Fill out the following fields:

    • Name: Provide a name for the keyword list.

    • Description: Add a description of the list's purpose.

    • Score: Assign a risk score (1–13) for each keyword match. As with RegEx, scores are cumulative.

    • Keywords: Enter a list of keywords to search for during scans.

  • Submit the completed form.

Using Custom Keywords in Scans:

  • Keyword lists appear under the Custom category when setting up a Data Sensitivity Scan.
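A keyword list can be previewed the same way. This sketch assumes whole-word, case-insensitive matching and a score that accumulates once per occurrence; the scanner's actual matching rules are not documented here, so treat both as assumptions. The keywords and the function name are hypothetical.

```python
import re


def keyword_score(keywords: list[str], score_per_match: int, text: str) -> int:
    """Count whole-word, case-insensitive keyword hits and accumulate the score."""
    if not keywords:
        return 0
    # Longest keywords first so multi-word phrases win over shorter overlaps.
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(keyword) for keyword in ordered)
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return score_per_match * len(pattern.findall(text))


# Hypothetical keyword list for a confidential project.
keywords = ["Project Falcon", "merger"]
text = "Merger terms for project falcon are final. The merger closes in May."

print(keyword_score(keywords, 4, text))  # prints 12 (three hits x 4)
```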


Execution Model

Google Cloud Data Scans can be executed using two supported models:

1) Cyrisma Cloud Agent (Default)

  • One cloud scanning agent is provided per Cyrisma instance

  • No local infrastructure is required

  • Scans are executed from Cyrisma-managed cloud infrastructure

2) Local Cloud Agent (Optional)

  • A user-installed agent designated as a Local Cloud Agent

  • Any Windows Cyrisma agent may be assigned this role

  • Only one Local Cloud Agent may be assigned per instance at a time

  • If a Local Cloud Agent is assigned, it replaces the shared cloud agent for scan execution

Both models provide the same scanning capabilities. They differ in performance, control, and scale, not in scan coverage.
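The selection rule described above can be modeled in a few lines. This is an illustrative sketch only: the Agent type, the agent names, and both functions are hypothetical and do not correspond to any Cyrisma API. A single optional slot represents the "one Local Cloud Agent per instance" rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    name: str
    os: str


# Placeholder for the shared, Cyrisma-managed cloud agent.
CYRISMA_CLOUD_AGENT = Agent(name="cyrisma-cloud-agent", os="managed")


def assign_local_cloud_agent(candidate: Agent) -> Agent:
    """Validate a candidate for the single Local Cloud Agent slot."""
    if candidate.os.lower() != "windows":
        raise ValueError("only a Windows agent may be a Local Cloud Agent")
    return candidate


def select_scan_agent(local_cloud_agent: Optional[Agent]) -> Agent:
    """An assigned Local Cloud Agent replaces the shared cloud agent."""
    if local_cloud_agent is not None:
        return local_cloud_agent
    return CYRISMA_CLOUD_AGENT


local = assign_local_cloud_agent(Agent(name="scan-host-01", os="Windows"))
print(select_scan_agent(None).name)   # prints cyrisma-cloud-agent
print(select_scan_agent(local).name)  # prints scan-host-01
```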


When to Use a Local Cloud Agent

A Local Cloud Agent is recommended when:

  • Scanning large Google Workspace environments

  • Scanning many mailboxes or drives

  • Faster scan execution is required

  • Reduced dependency on shared cloud resources is preferred

Using a Local Cloud Agent improves scan predictability and throughput.


Google Workspace Integration Requirement

Google Cloud Data Scans require a completed Google Workspace integration.

  • Authentication and access are handled through the integration

  • No local or NT / NetBIOS credentials are used

  • The integration defines which services and scopes are available for scanning

Without a completed integration, Google Cloud Data Scans cannot be performed.

📄 Google Integration Guide


Supported Data Sources

Google Cloud Data Scans can evaluate data stored in:

  • Gmail

  • Google Drive

The scan scope depends on permissions granted during integration.


Data Collected

Depending on scan configuration and selected categories, Google Cloud Data Scans may collect:

  • File and message identifiers

  • Locations of sensitive data findings

  • Matches for sensitive data patterns, including:

    • Passwords (displayed so that reviewers can rule out false positives)

    • Credit card numbers (masked for validation)

  • Contextual evidence required for remediation review

Only content accessible through the integration is evaluated.
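To illustrate what masking a credit card match can look like, the sketch below hides all but the last four digits while preserving separators. The exact masking format used in scan results is not specified in this article, so the "last four visible" rule, the candidate pattern, and the function names are assumptions.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by single
# spaces or hyphens. Illustrative only, not the scanner's built-in pattern.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def mask_card(match_text: str, visible: int = 4) -> str:
    """Replace every digit except the last `visible` ones with '*'."""
    digits_to_mask = sum(ch.isdigit() for ch in match_text) - visible
    masked = []
    for ch in match_text:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append("*")
            digits_to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


def mask_cards_in_text(text: str) -> str:
    """Mask every card-like number found in the text."""
    return CARD_CANDIDATE.sub(lambda m: mask_card(m.group()), text)


print(mask_cards_in_text("Card: 4111 1111 1111 1111 on file"))
# prints: Card: **** **** **** 1111 on file
```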


Performance Considerations

Scan duration is influenced by:

  • Volume of Google Workspace content

  • Number of selected sensitive data categories

  • Google API rate limits

For large environments:

  • Expect longer scan durations

  • Use a Local Cloud Agent for consistent performance
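API rate limits lengthen scans because a throttled client has to wait before retrying. The sketch below shows the general technique of exponential backoff with jitter. It is a generic illustration, not the scanner's retry logic: RateLimitError and call_with_backoff are hypothetical names, and the delay values are arbitrary defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when an API reports that a quota was exceeded."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` on RateLimitError, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, 1))
    raise AssertionError("unreachable")
```

Each throttled request adds its wait time to the total scan duration, which is why scheduling scans away from other heavy API activity can shorten them.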


Common Limitations

  • Scans are limited to Google Workspace services included in the integration

  • Content outside Gmail and Google Drive is not evaluated

  • API rate limits may affect scan duration

  • Only one Local Cloud Agent may be assigned per instance


Best Practices

  • Complete Google Workspace integration before scheduling scans

  • Use a Local Cloud Agent for large or frequent scans

  • Limit initial scan scope to validate performance and results

  • Review findings carefully to confirm true positives

  • Schedule scans strategically to minimize API throttling impact
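When reviewing credit card findings, one common way to screen out obvious false positives is the Luhn checksum, which valid payment card numbers satisfy. The sketch below is a reviewer-side aid written for this article; it does not describe how the scanner itself validates matches, and the function name is hypothetical.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 19:  # typical payment card lengths
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # prints True (well-known test number)
print(luhn_valid("4111 1111 1111 1112"))  # prints False
```

A passing checksum does not prove that a number is a real card, only that it is well-formed, so findings still need human review.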