Overview
The Vaultastic Open or Deep Stores can be used to securely archive on-premises data. The AWS DataSync service is one of the methods Vaultastic supports for uploading data to these stores. It lets you move file and object data to the Vaultastic Open or Deep Stores quickly, and your data is protected with in-flight encryption and end-to-end data validation.
Purpose-built Network protocol
Network optimizations performed by DataSync include incremental transfers, in-line compression, and sparse file detection, as well as in-line data validation and encryption.
Connections between the local DataSync agent and the in-cloud service components are multi-threaded, maximizing performance over your wide area network (WAN). A single DataSync task is capable of fully utilizing a 10 Gbps network link between your on-premises environment and AWS.
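To put the 10 Gbps figure in perspective, the arithmetic below estimates how long a given data volume takes to cross a link. It is a back-of-the-envelope sketch only: the `efficiency` factor is an assumption standing in for protocol overhead, source storage speed, and competing traffic, and the estimate ignores DataSync's compression and incremental transfers.

```python
def transfer_seconds(data_bytes: int, link_gbps: float, efficiency: float = 1.0) -> float:
    """Seconds needed to move data_bytes over a link of link_gbps.

    efficiency is a hypothetical fraction of the link actually used
    (1.0 = the whole link, which is a best case).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = data_bytes * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second


if __name__ == "__main__":
    one_tb = 10**12  # decimal terabyte
    for eff in (1.0, 0.7):
        hours = transfer_seconds(one_tb, 10, eff) / 3600
        print(f"1 TB at 10 Gbps, efficiency {eff}: {hours:.2f} h")
```

At full utilization, 1 TB needs 8 × 10^12 bits / 10^10 bits per second = 800 seconds; real transfers are usually slower.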
Data Encryption & Validation
All your data is encrypted in transit between the DataSync agent and the DataSync service using Transport Layer Security (TLS). DataSync ensures that your data arrives intact. For each transfer, the service performs integrity checks both in transit and at rest. These checks ensure that the data written to your destination matches the data read from your source, validating consistency.
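DataSync performs these integrity checks itself. The Python sketch below only illustrates the idea of comparing what was read with what was written, for example as a manual spot check of a file you later download back from the store. SHA-256 and the helper names are illustrative choices, not DataSync's internal algorithm.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source: Path, copy: Path) -> bool:
    """Cheap size comparison first, then compare SHA-256 digests."""
    if source.stat().st_size != copy.stat().st_size:
        return False
    return sha256_of(source) == sha256_of(copy)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp, "report.pdf")
        downloaded = Path(tmp, "report-copy.pdf")
        original.write_bytes(b"example content")
        downloaded.write_bytes(b"example content")
        print(copies_match(original, downloaded))  # True
```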
One-time or scheduled data transfers
You can use the service for a one-time data transfer. In addition, DataSync has a built-in scheduling mechanism that lets you run data transfer tasks periodically to detect and copy changes from your source storage system to the destination. You can schedule tasks from the AWS DataSync console or the AWS Command Line Interface (CLI) without writing scripts to manage repeated transfers. Scheduled tasks run automatically on the configured schedule; hourly, daily, and weekly options are available directly in the AWS console.
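DataSync task schedules can be expressed as AWS cron expressions with six fields (minutes, hours, day-of-month, month, day-of-week, year), in which one of the two day fields must be `?`. The Python helper below is a hypothetical sketch that turns the hourly, daily, and weekly options into such expressions, which may help when describing the schedule you want. Times are assumed to be UTC; confirm the final schedule in the console.

```python
VALID_DAYS = {"SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"}


def schedule_expression(frequency: str, hour: int = 0, minute: int = 0, day: str = "SUN") -> str:
    """Build a six-field AWS cron expression for a simple schedule (hypothetical helper).

    Fields: minutes hours day-of-month month day-of-week year.
    One of day-of-month / day-of-week must be '?'.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    if frequency == "hourly":
        # Runs at the given minute past every hour; `hour` is ignored.
        return f"cron({minute} * * * ? *)"
    if frequency == "daily":
        return f"cron({minute} {hour} * * ? *)"
    if frequency == "weekly":
        day = day.upper()
        if day not in VALID_DAYS:
            raise ValueError(f"day must be one of {sorted(VALID_DAYS)}")
        return f"cron({minute} {hour} ? * {day} *)"
    raise ValueError("frequency must be 'hourly', 'daily' or 'weekly'")


if __name__ == "__main__":
    print(schedule_expression("daily", hour=2))
    print(schedule_expression("weekly", hour=1, minute=30, day="sat"))
```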
This document describes the requirements and steps to configure the service to upload data from your on-premises Server Message Block (SMB) shares to Vaultastic Open or Deep Stores.
Pre-requisites
- Your on-premises data that is to be uploaded to the Vaultastic Stores must be accessible on an SMB server. The SMB server must not be shut down while a sync is in progress. Recommended version: SMB 3.0.2. Supported versions: SMB 1.0 and later.
Reference: https://docs.aws.amazon.com/datasync/latest/userguide/datasync-network.html#on-premises-network-requirements
- Physical resources to run the DataSync agent. An agent is a virtual machine (VM) appliance that you deploy in your storage environment for data transfers. The agent VM requires the following resources:
- Virtual processors: Four virtual processors assigned to the VM.
- Disk space: 80 GB of disk space for installing the VM image and system data.
- RAM: Depending on your transfer scenario, you need the following amount of memory:
  - 32 GB of RAM assigned to the VM for task executions working with up to 20 million files, objects, or directories.
  - 64 GB of RAM assigned to the VM for task executions working with more than 20 million files, objects, or directories.
Reference: https://docs.aws.amazon.com/datasync/latest/userguide/agent-requirements.html
- Static IP addresses. Both the SMB server and the agent VM should have static IP addresses in the same subnet. For example, if the on-premises SMB server has the IP address 192.168.0.220 in the subnet 192.168.0.0/24 (/24 corresponds to the netmask 255.255.255.0), the agent VM can be assigned the static IP address 192.168.0.89 in the same subnet.
- Stable Internet connection with sufficient capacity.
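The sizing and addressing rules above can be checked mechanically before deployment. The Python sketch below is illustrative only: the thresholds are copied from the requirements listed here, the function names are hypothetical, and it does not inspect a real VM; you supply the values yourself.

```python
from __future__ import annotations

import ipaddress

# Minimums copied from the agent requirements listed above.
MIN_VCPUS = 4
MIN_DISK_GB = 80
ITEM_THRESHOLD = 20_000_000  # files, objects, or directories per task execution


def required_ram_gb(item_count: int) -> int:
    """32 GB for up to 20 million items, 64 GB for more than that."""
    return 32 if item_count <= ITEM_THRESHOLD else 64


def check_agent_vm(vcpus: int, disk_gb: int, ram_gb: int, item_count: int) -> list[str]:
    """Return a list of unmet requirements (empty when the VM looks adequate)."""
    problems = []
    if vcpus < MIN_VCPUS:
        problems.append(f"need {MIN_VCPUS} vCPUs, have {vcpus}")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"need {MIN_DISK_GB} GB disk, have {disk_gb}")
    needed_ram = required_ram_gb(item_count)
    if ram_gb < needed_ram:
        problems.append(f"need {needed_ram} GB RAM for {item_count} items, have {ram_gb}")
    return problems


def same_subnet(ip_a: str, ip_b: str, subnet: str) -> bool:
    """True when both addresses fall inside the given subnet."""
    network = ipaddress.ip_network(subnet)
    return ipaddress.ip_address(ip_a) in network and ipaddress.ip_address(ip_b) in network


if __name__ == "__main__":
    print(check_agent_vm(vcpus=4, disk_gb=80, ram_gb=32, item_count=25_000_000))
    # The subnet can be written as a prefix (/24) or as a netmask.
    print(same_subnet("192.168.0.220", "192.168.0.89", "192.168.0.0/255.255.255.0"))
```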
Setup of the AWS DataSync Agent
Installation of the DataSync Agent
- Ask the Mithi Onboarding team to share a link to download the DataSync agent.
- Download the agent VM OVA file shared by the Mithi Onboarding team. The file is about 700 MB.
- To deploy the agent VM on a server in your on-premises network, open the downloaded OVA file with your virtualization software.
- NOTE: The rest of this document uses Oracle VirtualBox. If virtualization software is already installed, you can double-click the agent OVA file to import it. You can set the name of the agent in the “Name” field.
Configuring the Network Connectivity of the DataSync Agent
1. Start your agent from VirtualBox: right-click the agent VM name and select Start.
2. On the AWS DataSync screen, enter the following login and password:
login: admin
password: password
3. You will then be prompted to change the default password to a new Unix password. Enter the password carefully and record it in a safe place, because you might not be able to change it later.
4. On the next screen, you will see the AWS DataSync Activation: Network Configuration options:
- Select “Configure Static IP” to assign a static IP address to the agent VM, so that the agent is on the same subnet as the on-premises data source. NOTE: Always use a static IP address for the agent. Do not use DHCP: it assigns a dynamic IP address that may change when the device reconnects to the network or when the lease expires. The SMB server must also have a static IP address.
- Select “Edit DNS Configuration” and set the DNS manually, as the automatic setting may prevent the agent from connecting to the internet.
- Select the network adapter from the options shown
- Select the IP version (IPv4 is recommended).
- Set the network mask and the default gateway.
5. Test the network connections
Use the “View Routes” option to check the routes. Then test the agent VM's network connectivity to confirm that it can reach both your local storage server (the data source) and the AWS service endpoints.
Select “Test Connectivity to Self-Managed Storage” to test whether the agent can connect to your on-premises data source (the SMB server).
Select “Test Network Connectivity” to check the agent's connection to the AWS service endpoints in the region where your DataSync services will run. Get the region details from the Mithi Onboarding team.
After both of these options show “[PASSED]”, inform the Mithi Onboarding team that the agent is ready and has passed both network tests: connectivity to the AWS public service endpoints and connectivity to self-managed storage. The Mithi Onboarding team will then contact you to continue the agent registration process.
NOTE: Do not select the “Get Activation Key” option in the Main Menu at this step. It is only needed when the agent is registered with the DataSync service with the help of the Mithi Onboarding team.
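Before entering the static settings from step 4 in the agent console, it can help to confirm that they are consistent with each other. The Python sketch below is an illustration only and does not talk to the agent: it assumes IPv4, reuses the example addresses from the prerequisites, and uses 192.168.0.1 as a hypothetical default gateway.

```python
from __future__ import annotations

import ipaddress


def validate_static_config(agent_ip: str, netmask: str, gateway: str, smb_server_ip: str) -> list[str]:
    """Return problems found in an IPv4 static configuration (empty list = looks consistent).

    Simplification: ignores the /31 and /32 special cases, where the
    network and broadcast addresses are usable host addresses.
    """
    iface = ipaddress.IPv4Interface(f"{agent_ip}/{netmask}")
    network = iface.network
    problems = []
    if iface.ip in (network.network_address, network.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {network}")
    gw = ipaddress.IPv4Address(gateway)
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if gw == iface.ip:
        problems.append("gateway and agent IP must differ")
    if ipaddress.IPv4Address(smb_server_ip) not in network:
        problems.append(f"SMB server {smb_server_ip} is outside {network}")
    return problems


if __name__ == "__main__":
    # Example values; 192.168.0.1 is a hypothetical gateway.
    print(validate_static_config("192.168.0.89", "255.255.255.0", "192.168.0.1", "192.168.0.220"))  # []
```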
Activate the DataSync Agent
NOTE: This step has to be done with the help of the Mithi Onboarding team
1. Log in to the DataSync agent.
2. Select Get Activation Key
3. Select Public Endpoints
4. Share the activation details with the Mithi Onboarding team.
NOTE: The activation key expires after 30 minutes if it is not used to register the agent.
5. The Mithi Onboarding team will complete the agent configuration process.
Define the upload tasks with the help of the Mithi Onboarding team
For the data upload task
1. Share the task name
2. Share the mount points or subdirectories on the mount point hosting the data to be uploaded
3. Share the schedule (for example, hourly, daily, or weekly).
NOTE: If you have more than one task, then they will run sequentially.
Monitoring the data movement tasks
The AWS DataSync service will be configured to generate a summary report at the end of each task. These reports will be available in the AWS DataSync Reports folder within your domain folder on the Open Store.
Troubleshooting Connectivity issues of the AWS DataSync Agent
Method: Test Network Connectivity Option
- Your on-premises network must allow outbound TCP traffic on port 443 from the agent VM so that the agent can communicate with the AWS public service endpoints. (TCP port 445 is used between the agent and your SMB server.)
- As a temporary check, select the “Reset All to DHCP” option in the “Network Configure Setting” of the agent VM. This assigns a dynamic IP address to the agent. Then test network connectivity with the AWS public endpoints using the “Test Network Connectivity” option.
- If the test shows [PASSED], go back to “Network Configure Setting”, assign your preferred static IP address again, and test connectivity with the AWS public endpoints again. If the test passes with DHCP but fails with the static address, review the static settings (IP address, network mask, default gateway, and DNS).
Note: Always use a static IP address for the agent. Do not keep the address assigned by DHCP; it is dynamic and may change when the device reconnects to the network or when the lease expires. Revert to a static IP address before the agent is registered with the AWS DataSync service.
Method: Test Network Connectivity Using Command Prompt
1. Select the “Command Prompt” option.
2. Enter h for help.
3. Type nping and press Enter.
4. Type the complete command for the AWS endpoint you want to test, for example: nping -d amazon.com -p 443 -c 2 -t tcp. The agent sends and receives test packets and reports the result.
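To cross-check reachability from another machine on the same subnet as the agent, the Python sketch below opens plain TCP connections, similar in spirit to the nping test. It is an approximation under stated assumptions: it runs on a different host than the agent VM, so firewall rules that apply only to the agent are not exercised. The SMB address is the example from the prerequisites, and `datasync.us-east-1.amazonaws.com` is a hypothetical regional endpoint; substitute the region provided by the Mithi Onboarding team.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    checks = [
        ("192.168.0.220", 445),  # example SMB server address from the prerequisites
        ("datasync.us-east-1.amazonaws.com", 443),  # hypothetical region choice
    ]
    for host, port in checks:
        status = "PASSED" if tcp_reachable(host, port) else "FAILED"
        print(f"{host}:{port} -> {status}")
```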