Accessing Google Drive Data on Katahdin with rclone
Introduction and prerequisites:
Rclone is a program that allows you to manage files on cloud storage, including Google Drive. This page will walk you through the process of setting up your Google Drive "My Drive" folder on Katahdin so you can copy files between your cloud storage drive and your ACG HPC account.
Setting up Google Drive with Rclone requires an ACG account with a VNC connection to Katahdin, so if you haven't already, run through
https://acg.maine.edu/hpc/connecting-to-katahdin
and
https://acg.maine.edu/hpc/vnc-setup-for-katahdin
before continuing.
Accessing your My Drive:
Connect to Katahdin and open a VNC session. On your graphical desktop, open a terminal window and type the command:
rclone config create mydrive drive
Rclone will open a browser window where you sign in and grant access to your Google Drive. Sign in with the same credentials you use for your @maine.edu Gmail account. When you reach the "Success" page, close the browser window.
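One quick way to confirm that the setup worked is to check that "mydrive" appears among rclone's configured remotes. The sketch below wraps that check in a small shell function. Note that `remote_exists` is a hypothetical helper name, not an rclone command, and the function assumes that `rclone listremotes` prints one remote per line with a trailing colon.

```shell
# Succeed if an rclone remote with the given name is configured.
# remote_exists is a hypothetical helper, not part of rclone.
remote_exists() {
    # listremotes is assumed to print lines such as "mydrive:",
    # so match the whole line exactly.
    rclone listremotes | grep -Fxq "$1:"
}
```

For example, `remote_exists mydrive && echo "mydrive is configured"` should print the message once the configuration step above has completed.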
Rclone commands
At this point rclone is configured to access your Google Drive My Drive as a remote named "mydrive", and you can copy files from the shell using the rclone command line interface (i.e., you no longer need a VNC session). Common commands:
#list all files on your My Drive
#(recursively including subfolders).
rclone ls mydrive:
#list files in the project1 folder.
rclone ls mydrive:project1
#list directories in the root folder of your My Drive.
rclone lsd mydrive:
#copy inputs.dat to the current directory on Katahdin.
rclone copy mydrive:project1/inputs.dat .
#copy outputs.dat from the bigrun directory on Katahdin
#to the project1 folder on My Drive.
rclone copy bigrun/outputs.dat mydrive:project1
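Before a large transfer it may be worth previewing what rclone would do. The sketch below uses rclone's `--dry-run` flag to show the planned transfer without copying anything, then asks for confirmation and repeats the copy with `--progress`. The function name `preview_then_copy` is hypothetical and only illustrates one possible workflow.

```shell
# Preview a copy with --dry-run, then run it for real if confirmed.
# preview_then_copy is a hypothetical helper, not part of rclone.
preview_then_copy() {
    # Show what would be transferred; nothing is copied at this step.
    rclone copy --dry-run "$1" "$2" || return 1
    printf 'Proceed with copy? [y/N] '
    read -r answer
    if [ "$answer" = y ]; then
        # Same copy again, this time for real, with a progress display.
        rclone copy --progress "$1" "$2"
    fi
}
```

For example, `preview_then_copy bigrun/outputs.dat mydrive:project1` previews the upload of outputs.dat and performs it only if you answer "y".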
Note that while Katahdin is connected to your My Drive, the compute nodes are not. You can use data from Google Drive to set up your runs, and you can copy results to Google Drive afterwards, but you can't access Google Drive from within a scheduled Slurm job. You wouldn't want to do so anyway, as it would be very slow compared to HPC storage resources.
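In practice this means the rclone steps belong before and after the job, not inside it. The following is a minimal sketch of that pattern, meant to be run from your Katahdin shell rather than from a batch script. It assumes a hypothetical batch script `job.sh` that reads bigrun/inputs.dat and writes bigrun/outputs.dat, and it uses `sbatch --wait` so that the shell blocks until the job finishes. The function name `run_with_drive` is also hypothetical.

```shell
# Stage input from Google Drive, run a Slurm job, then upload the result.
# Run this outside of any scheduled job; compute nodes cannot reach Drive.
# run_with_drive and job.sh are hypothetical names used for illustration.
run_with_drive() {
    # 1. Copy the input file from My Drive into the run directory.
    rclone copy mydrive:project1/inputs.dat bigrun/ || return 1
    # 2. Submit the job; --wait blocks until it ends and returns its exit code.
    sbatch --wait job.sh || return 1
    # 3. Upload the result only if the job succeeded.
    rclone copy bigrun/outputs.dat mydrive:project1
}
```

Because `sbatch --wait` keeps your shell occupied for the whole run, this approach suits short jobs. For long runs you might instead submit the job normally and run the final rclone copy by hand once it completes.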
Accessing a Shared Drive
If you want to access a Google Drive shared drive (formerly called a team drive) from Katahdin with rclone, you need the ID of the drive's root folder. The easiest way to get this ID is to navigate to the shared drive in the Google Drive web interface. The ID is the last part of the URL in the location bar.
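If you copy the whole URL from the location bar, the ID can be extracted with shell parameter expansion instead of by hand. This sketch assumes the URL has the form https://drive.google.com/drive/folders/ followed by the ID, possibly with a trailing query string; that form is an assumption, and `drive_id_from_url` is a hypothetical helper name.

```shell
# Print the last path component of a Google Drive URL.
# drive_id_from_url is a hypothetical helper; the URL layout is assumed.
drive_id_from_url() {
    url=${1%%[?#]*}   # drop any ?query or #fragment suffix
    url=${url%/}      # drop one trailing slash, if present
    printf '%s\n' "${url##*/}"   # keep only the text after the last slash
}
```

For example, `drive_id_from_url "https://drive.google.com/drive/folders/THiSiSAlOngIDtotyPe"` prints THiSiSAlOngIDtotyPe.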
The following command lists the files/folders in the root directory of the drive with ID THiSiSAlOngIDtotyPe.
rclone lsd mydrive: --drive-root-folder-id THiSiSAlOngIDtotyPe
If you access that drive frequently, you might want to store the ID in an environment variable that is set each time you log in.
echo "export XYZTeamDrive=THiSiSAlOngIDtotyPe" >> ~/.bashrc
source ~/.bashrc
rclone lsd mydrive: --drive-root-folder-id $XYZTeamDrive