MithiDocs

Archiving Google Workspace data into Vaultastic

Overview

Applications in Google Workspace—such as Gmail, Google Chat, and Google Drive—generate large volumes of organizational data.

This data often needs to be preserved for:

  • Business continuity and disaster recovery

  • Regulatory compliance

  • Supervision and audit requirements

  • Legal hold and investigation workflows

Vaultastic enables organizations to ingest, store, and manage Google Workspace data using configurable ingestion pipelines. Data is archived into storage tiers optimized for access frequency, performance, and long-term retention.

Vaultastic Storage Tiers

Vaultastic organizes archived data into multiple storage tiers.

Store | Purpose
Active Store | High-performance storage for frequently accessed data and supervision workflows
Open Store | Medium-term archival with searchable retention
Deep Store | Long-term archival optimized for low-cost storage


Key Notes:

  • Data placement is configurable during ingestion.
  • Data can be moved between tiers based on access needs.
  • Activation workflows allow moving data from Open/Deep → Active for investigation.

Data Ingestion Overview

The following table summarizes how Google Workspace data can be archived into Vaultastic.

Data Source | Destination Store | Method | Description
Live Email Transactions | Active Store | Gmail Routing Rules | Automatically archives inbound and outbound email
Mailbox Email (Existing Data) | Active / Open / Deep | Data Upload Application | Copies historical mailbox email to Vaultastic
PST / EML Files | Active / Open / Deep | Manual Upload | Upload existing email archives
Google Chat | Active / Open / Deep | Data Upload Application | Converts chat messages into email format before ingestion
Google Drive | Open / Deep | Data Upload Application | Uploads files for long-term archival


Clarification:

  • Chat data is normalized into email format to enable consistent indexing and search.

Email Archival

Vaultastic supports:

  • Real-time email capture (journal-based)
  • Historical mailbox ingestion

Live Mail Flow

To automatically capture all email transactions:

  1. Configure Gmail routing rules in Google Workspace Admin Console.

  2. Route copies of inbound and outbound email to the Vaultastic Active Store.

  3. Apply the rule to:

    • Selected users,

    • Groups, or

    • The entire domain.

Important:

  • This is a continuous journaling mechanism
  • It does not impact user mail delivery

Validation:

  • Send test emails internally and externally
  • Confirm ingestion in Vaultastic Active Store
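
One way to make this validation traceable is to stamp each test message with a unique marker that can later be searched for in the Active Store. The sketch below only builds the messages with Python's standard `email` library. The addresses and the `X-Archive-Test-Id` header name are hypothetical, and sending the mail (for example with `smtplib`) is left to the local environment.

```python
import uuid
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str, direction: str) -> EmailMessage:
    """Build a test email carrying a unique marker to search for in the archive."""
    marker = uuid.uuid4().hex  # unique token for this test message
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Archive validation ({direction}) {marker}"
    # Hypothetical custom header; not something Vaultastic is known to require.
    msg["X-Archive-Test-Id"] = marker
    msg.set_content(f"Test message for archive validation. Marker: {marker}")
    return msg


# Placeholder addresses: one internal and one external recipient.
internal = build_test_message("admin@example.com", "user@example.com", "internal")
external = build_test_message("admin@example.com", "partner@example.org", "external")
```

After the messages are sent, searching the Active Store for each marker shows whether both the internal and the external path were captured.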

Existing Mailbox Data

Historical email already stored in user mailboxes can be ingested using the Data Upload Application.

Supported ingestion targets:

  • Active Store → for supervision and search
  • Open Store → for medium-term retention
  • Deep Store → for long-term archival

Filter options:

  • Date range
  • Users
  • Mailbox size thresholds

PST or EML Upload

If email data has already been exported, it can be uploaded to Vaultastic directly.

Supported formats:

  • PST
  • EML

Upload targets:

  • Open Store
  • Deep Store

Operational Notes:

  • Bulk uploads should be staged to avoid performance impact
  • Data can later be promoted to Active Store for investigation
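
Staging a bulk upload can be as simple as grouping the exported files into size-limited batches and uploading one batch at a time. The following is a minimal sketch of that grouping only. The file names, sizes, and the batch limit are made-up values, and the upload step itself is not shown because it depends on the tooling in use.

```python
from typing import Iterable, Iterator, List, Tuple

FileEntry = Tuple[str, int]  # (file name, size in bytes)


def stage_batches(files: Iterable[FileEntry], max_batch_bytes: int) -> Iterator[List[FileEntry]]:
    """Group files into batches whose total size does not exceed max_batch_bytes.

    A single file larger than the limit is emitted as its own batch.
    """
    batch: List[FileEntry] = []
    batch_bytes = 0
    for name, size in files:
        # Close the current batch if adding this file would exceed the limit.
        if batch and batch_bytes + size > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append((name, size))
        batch_bytes += size
    if batch:
        yield batch


# Illustrative input: sizes are arbitrary numbers, not real export sizes.
exports = [("a.pst", 400), ("b.pst", 500), ("c.pst", 1500), ("d.eml", 100), ("e.eml", 200)]
batches = list(stage_batches(exports, max_batch_bytes=1000))
```

With these sample values the files fall into three batches: `a.pst` and `b.pst` together, `c.pst` alone because it exceeds the limit, and the two EML files together.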

Google Chat Archival

Google Chat data is archived for:

  • Compliance supervision
  • Investigations & audit workflows
  • Long-term retention

Using the Data Upload Application, administrators can archive:

  • Direct messages
  • Spaces

Filters:

  • Date range
  • Selected users
  • Entire domain

Processing Behavior:

  • Chat messages are converted into an email-compatible format
  • This conversion ensures uniform indexing and consistent search
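
To illustrate what converting a chat message into an email-compatible format can look like, the sketch below maps a single chat record onto a standard email message using Python's `email` library. It is not Vaultastic's actual conversion. The input dictionary layout (`sender`, `participants`, `space`, `created`, `text`) and the subject format are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime


def chat_to_email(chat: dict) -> EmailMessage:
    """Map one chat message (hypothetical dict layout) onto an email message."""
    msg = EmailMessage()
    msg["From"] = chat["sender"]
    msg["To"] = ", ".join(chat["participants"])
    msg["Subject"] = f"Chat in {chat['space']}"   # assumed subject convention
    msg["Date"] = format_datetime(chat["created"])
    msg.set_content(chat["text"])
    return msg


# Sample record with placeholder users and content.
sample = {
    "sender": "asha@example.com",
    "participants": ["ravi@example.com", "meera@example.com"],
    "space": "Project Falcon",
    "created": datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc),
    "text": "Contract draft is ready for review.",
}
email_msg = chat_to_email(sample)
```

Once chat content carries the same headers as email (sender, recipients, date, subject), the same index and search filters can be applied to both.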

Google Drive Archival

Google Drive files can be archived using the Data Upload Application.

Supported destinations:

  • Open Store
  • Deep Store

Configuration Options:

  • User-based selection
  • Group-based selection
  • Date-based filtering
  • Scheduled recurring archival

Clarification:

  • File metadata (owner, timestamps, permissions) is preserved
  • Version history handling depends on API limitations (see Important Considerations)

Initial Configuration

Follow the steps below to configure Google Workspace archival in Vaultastic.

1. Define User Scope

  • Create Google Workspace Groups
  • Add users to be archived

Why this matters:

  • Enables centralized management
  • Simplifies onboarding/offboarding

2. Configure Email Routing

  • Configure Gmail routing rules
  • Route copies of inbound, internal (local), and outbound email to Vaultastic

Validation Checklist:

Verify that mail flow is functioning correctly so that email transactions are captured continuously:

  • The rule is applied to the correct scope
  • No delivery disruption
  • Emails are visible in Vaultastic

3. Configure API Access

  • Generate API credentials in Google Workspace
  • Register credentials in Vaultastic using the Setup Connections application

Required API Access:

  • Gmail
  • Google Chat
  • Google Drive

Best Practice:

  • Use least privilege access
  • Prefer service accounts with domain-wide delegation
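
A least-privilege review can be partly automated by comparing the OAuth scopes granted to the service account against a read-only allow-list. The sketch below uses read-only scope URLs published by Google for Gmail, Chat, and Drive. Whether Vaultastic requires exactly these scopes is an assumption, so the allow-list should be adjusted to whatever the Setup Connections application actually asks for.

```python
from typing import List

# Read-only OAuth scopes published by Google for the three APIs.
# Assumption: these are sufficient for archival; confirm against the
# scopes requested during connection setup.
READ_ONLY_SCOPES = {
    "gmail": "https://www.googleapis.com/auth/gmail.readonly",
    "chat": "https://www.googleapis.com/auth/chat.messages.readonly",
    "drive": "https://www.googleapis.com/auth/drive.readonly",
}


def excess_scopes(granted: List[str]) -> List[str]:
    """Return granted scopes that fall outside the read-only allow-list."""
    allowed = set(READ_ONLY_SCOPES.values())
    return sorted(scope for scope in granted if scope not in allowed)


# Example grant list containing one overly broad scope.
granted = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/drive",  # full Drive access
]
too_broad = excess_scopes(granted)
```

Here `too_broad` contains only the full Drive scope, which is the entry to question during a permissions review.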

4. Configure Automated Archival

Using the Data Upload Application, configure automated archival schedules for:

  • Google Chat

  • Google Drive

Recommendations:

  • Align with operational and compliance requirements
  • Define frequency (daily/weekly)
  • Avoid peak business hours
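
The scheduling recommendations above can be sketched as a small helper that computes the next run time and pushes it past business hours if it would land inside them. The 09:00 to 18:00 window is an assumed definition of peak hours, weekends are not treated specially, and the function is illustrative rather than part of the Data Upload Application.

```python
from datetime import datetime, time, timedelta

BUSINESS_START = time(9, 0)   # assumed start of peak hours
BUSINESS_END = time(18, 0)    # assumed end of peak hours


def next_off_peak_run(now: datetime, interval: timedelta) -> datetime:
    """Return now + interval, moved to the end of business hours if it falls inside them."""
    candidate = now + interval
    if BUSINESS_START <= candidate.time() < BUSINESS_END:
        # Keep the date (and any timezone info) and shift to the end of the window.
        candidate = candidate.replace(
            hour=BUSINESS_END.hour, minute=BUSINESS_END.minute, second=0, microsecond=0
        )
    return candidate


daily = timedelta(days=1)
run_a = next_off_peak_run(datetime(2024, 3, 4, 10, 0), daily)  # lands in peak hours
run_b = next_off_peak_run(datetime(2024, 3, 4, 22, 0), daily)  # already off-peak
```

A daily job that would start at 10:00 is moved to 18:00 on the same day, while one that starts at 22:00 is left unchanged.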

5. Upload Historical Data

To eliminate gaps:

  • Ingest historical mailbox data
  • Archive historical chat data
  • Upload legacy Drive data

Outcome:

  • Ensures a complete baseline before automation begins

This approach ensures:

  • Continuous capture of new data

  • Complete historical data coverage

  • Efficient storage management across Vaultastic tiers

  • Audit-ready compliance and retention capabilities

Security and Access Control

  • All data ingestion occurs via authenticated APIs or journaling pipelines
  • Access to archived data is controlled via role-based access control (RBAC)
  • Audit logs should be enabled for:
    • Data access
    • Search activity
    • Export operations

Recommended Controls:

  • Enable MFA for admin accounts
  • Restrict API credentials
  • Periodically review access permissions

Monitoring and Validation

After configuration, validate:

  • Gmail routing rules are active and delivering emails
  • API ingestion jobs are running successfully
  • Data is searchable in Vaultastic
  • No ingestion gaps exist

Suggested Checks:

  • Sample user mailbox validation
  • Google Chat sampling
  • File count comparison (source vs archive)
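
The file count comparison can be expressed as a per-user difference between source and archive counts. The sketch below assumes both sets of counts have already been collected into dictionaries keyed by user. How the counts are obtained (for example from a Drive report and a Vaultastic search) is outside its scope, and the sample figures are invented.

```python
from typing import Dict


def find_count_gaps(source: Dict[str, int], archive: Dict[str, int]) -> Dict[str, int]:
    """Return, per user, how many items are missing from the archive."""
    gaps: Dict[str, int] = {}
    for user, source_count in source.items():
        # A user absent from the archive counts as zero archived items.
        missing = source_count - archive.get(user, 0)
        if missing > 0:
            gaps[user] = missing
    return gaps


# Invented sample counts for three users.
source_counts = {"asha@example.com": 120, "ravi@example.com": 75, "meera@example.com": 40}
archive_counts = {"asha@example.com": 120, "ravi@example.com": 70}
gaps = find_count_gaps(source_counts, archive_counts)
```

Users that appear in `gaps` are the ones to re-run or investigate; an empty result means the counts match for every user in the source.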

Important Considerations

  • Google Chat data is stored in email-compatible format (not native structure)
  • API throttling may impact large-scale ingestion jobs
  • Historical ingestion duration depends on tenant size and API limits
  • Gmail routing rules capture only email (not Chat or Drive data)
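
Because API throttling can interrupt large ingestion jobs, custom scripts that call Google APIs alongside the archival setup commonly retry with exponential backoff. The sketch below shows that general pattern only. `ThrottledError` is a hypothetical stand-in for a rate-limit response (such as HTTP 429), and the retry limits are arbitrary defaults rather than values recommended by Google or Vaultastic.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429) response."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff plus jitter when it is throttled."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Delay doubles on each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without real waiting.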