Setting up

Google Colab

Important: What You’ll Need
  • A Google account (free)
  • A modern web browser

Getting Started with Google Colaboratory

Google Colaboratory (or “Colab” for short) is a free cloud-based service that provides hosted Jupyter Notebooks [1] directly in your browser. No installation or setup is required, so you can start coding immediately.

Why are we using Colab?

We use Colab in this course to ensure everyone has a consistent development environment. This approach helps us avoid common problems like:

  • Environment setup issues on different operating systems (Windows, Mac, Linux)
  • Package dependency conflicts between different Python versions
  • Hardware limitations on personal computers
  • Time lost troubleshooting installation problems

While Colab is free to use, paid subscriptions (Colab Pro and Pro+) provide access to more powerful GPUs, higher RAM, and longer runtime limits. For this workshop, the free tier is sufficient.

Starting a session

To begin working in Colab, you’ll need to start a runtime—this is the virtual machine that will execute your code.

Creating and Connecting to a Runtime

  1. Navigate to Google Colaboratory.
  2. Click File > New Notebook or open an existing notebook
  3. Colab will automatically allocate a runtime when you open a notebook
  4. Look for the connection status in the upper right corner:
    • Disconnected: No runtime is active
    • Connecting…: Runtime is being allocated
    • Connected: You’re ready to run code (shows RAM and disk usage)

Choosing Your Runtime Type (Python or R?)

Since we’ll be working with both R and Python in this workshop, you may need to change the runtime type:

  1. Click Runtime > Change runtime type in the menu
  2. Select your preferred language:
    • Python 3 (default)
    • R (for R sessions)
  3. Choose a hardware accelerator if needed (None, GPU, or TPU)
  4. Click Save

Once connected, you can start writing and executing code!
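As a first cell in a Python runtime, a sketch like the one below reports what the runtime gave you. The helper name `describe_runtime` is invented for this illustration, and checking `sys.modules` for `google.colab` is a commonly used heuristic rather than an official API.

```python
import os
import platform
import sys


def describe_runtime():
    """Return a few basic facts about the machine running this notebook."""
    return {
        "python_version": platform.python_version(),
        "operating_system": platform.system(),
        "cpu_count": os.cpu_count(),
        # Heuristic: Colab loads the google.colab module at startup.
        "in_colab": "google.colab" in sys.modules,
    }


for key, value in describe_runtime().items():
    print(f"{key}: {value}")
```

If `in_colab` prints `False`, you are probably running the notebook somewhere other than Colab, and the Drive recipes in this section will not apply.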

File organization

Recipe 1: Mounting Your Google Drive

To access files from your Google Drive within Colab, you need to “mount” it first. This step is necessary only if you are using Colab.

Google Drive mounting is only available in Colab’s Python runtime.

R:

To use R with Drive access:

  1. Start with a Python runtime and mount your drive:
from google.colab import drive
drive.mount('/content/drive')
  2. Switch to the R runtime (Runtime → Change runtime type → R). The drive mount persists.
  3. Now you can access your files using the mounted path:
setwd('/content/drive/MyDrive/WorkshopData/')
df <- read.csv('imd_msoa.csv')

Python:

from google.colab import drive
drive.mount('/content/drive')

You’ll be prompted to authorize access. Click the link, sign in, and copy the authorization code back to Colab.

Tip

After mounting, your Drive files will be accessible at /content/drive/MyDrive/ in the session. Mounting Google Drive is optional. You can also upload files directly to the Colab runtime using the file browser in the left sidebar. These files are temporary and will be deleted when the runtime disconnects.
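Before reading files from Drive, it can help to confirm that the mount succeeded. The sketch below assumes the default mount point used in this recipe; `drive_is_mounted` is a hypothetical helper name, not part of the `google.colab` package.

```python
from pathlib import Path

# Default location where drive.mount('/content/drive') exposes your files.
DRIVE_ROOT = Path("/content/drive/MyDrive")


def drive_is_mounted(root=DRIVE_ROOT):
    """Return True if the Drive folder exists in this session."""
    return Path(root).is_dir()


if drive_is_mounted():
    print(f"Drive is available at {DRIVE_ROOT}")
else:
    print("Drive is not mounted; run drive.mount('/content/drive') first.")
```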

Recipe 2: Accessing the Workshop Data Folder

We’ve prepared a shared Google Drive folder with some starter Colab notebooks for this workshop.

Step 1: Add the shared folder to your Drive

  1. Open the shared folder link: Workshop Data Folder
  2. Copy the folder and add it to your own Drive

Step 2: Set your working directory

After mounting your Drive, change to the workshop data directory so you can use relative paths.

R:

setwd('/content/drive/MyDrive/WorkshopData/')  # Adjust path if needed

# Now you can use relative paths
library(readr)
df <- read_csv('data/dataset.csv')

Python:

import os
os.chdir('/content/drive/MyDrive/WorkshopData/')  # Adjust path if needed

# Now you can use relative paths
import pandas as pd
df = pd.read_csv('data/dataset.csv')
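Changing the working directory affects every later cell in the notebook. One alternative, sketched below, is to leave the working directory alone and build full paths from a single base folder. `WORKSHOP_DIR` and `workshop_path` are names made up for this example, and the base path assumes you saved the folder at the location used above.

```python
from pathlib import Path

# Assumed location of the workshop folder; adjust if you saved it elsewhere.
WORKSHOP_DIR = Path("/content/drive/MyDrive/WorkshopData")


def workshop_path(*parts, base=WORKSHOP_DIR):
    """Join path parts onto the workshop folder without changing directory."""
    return Path(base).joinpath(*parts)


csv_path = workshop_path("data", "dataset.csv")
print(csv_path)
```

The resulting `csv_path` can be passed straight to `pd.read_csv`, so the same cell works regardless of which directory the notebook is currently in.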

Downloading Data from Imago

Imago hosts the datasets we’ll use. Here’s how to download them directly to your Drive.

R:

# Set up the output directory (relative to the current working directory)
output_dir <- "workshop_data"
dir.create(output_dir, showWarnings = FALSE)

# Download file
url <- "https://imago.example.com/dataset.csv"  # Replace with actual URL
filename <- basename(url)
output_path <- file.path(output_dir, filename)

# mode = "wb" writes the file in binary mode so it is not altered on Windows
download.file(url, destfile = output_path, mode = "wb")
print(paste("Downloaded to:", output_path))

Python:

import os

import requests

# Mount drive first (see Recipe 1)
# Make a directory to save your outputs
output_dir = '/content/drive/MyDrive/WorkshopData/'
os.makedirs(output_dir, exist_ok=True)

# Download file
url = 'https://imago.example.com/dataset.csv'  # Replace with actual URL
filename = url.split('/')[-1]
output_path = os.path.join(output_dir, filename)

response = requests.get(url, timeout=60)
response.raise_for_status()  # Stop here if the server returned an error
with open(output_path, 'wb') as f:
    f.write(response.content)

print(f"Downloaded to: {output_path}")
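Taking the last piece of the URL as the filename breaks when the link carries a query string such as `?version=2`, and loading a whole response into memory can be wasteful for large files. The sketch below is one possible variant using only the standard library. The helper names `filename_from_url` and `download_file`, the fallback name `download.dat`, and the example query string are all assumptions made for illustration.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="download.dat"):
    """Take the last path segment of a URL, ignoring any ?query or #fragment."""
    name = Path(unquote(urlparse(url).path)).name
    return name or default


def download_file(url, output_dir, timeout=60):
    """Stream url into output_dir and return the path of the saved file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    output_path = output_dir / filename_from_url(url)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # so an error page is never written to disk as if it were data.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(output_path, "wb") as f:
            shutil.copyfileobj(response, f)
    return output_path


print(filename_from_url("https://imago.example.com/dataset.csv?version=2"))
```

Once you have a real dataset link, a call such as `download_file(url, '/content/drive/MyDrive/WorkshopData/')` would save the file into the workshop folder.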

Quick Data Check

After downloading, verify your data loaded correctly:

R:

library(readr)
df <- read_csv('/content/drive/MyDrive/WorkshopData/dataset.csv')
head(df)
str(df)

Python:

import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/WorkshopData/dataset.csv')
print(df.head())
df.info()  # info() prints its summary itself and returns None
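If the load fails, a cheaper check is to confirm that the file exists, is not empty, and starts with the header you expect. The sketch below does this with the standard library and demonstrates it on a throwaway file with made-up values. `peek_csv` is a hypothetical helper, and it assumes a comma-separated, UTF-8 encoded file.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def peek_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) without reading the whole file."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No file at {path}; check the download step.")
    if path.stat().st_size == 0:
        raise ValueError(f"{path} is empty; the download may have failed.")
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(islice(reader, n_rows))
    return header, rows


# Demonstration on a throwaway file with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.csv"
    demo.write_text("area,score\nA,1.5\nB,2.0\n", encoding="utf-8")
    header, rows = peek_csv(demo)

print(header)
print(rows)
```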

Installing and loading Packages

Before you can use specialized packages for geospatial analysis and statistics, you need to install them (download and set them up on your system or runtime) and then load them (make their functions available in your current session). On your own computer you only need to install a package once, but you must load it every time you start a new session. In Colab, each new runtime starts from a fresh environment, so any package that is not pre-installed has to be installed again.

Installing packages

R:

# Install from CRAN
install.packages("sf")
install.packages(c("terra", "exactextractr", "ggplot2"))

# Install from GitHub (development versions)
# Requires the remotes package: install.packages("remotes")
remotes::install_github("r-spatial/sf")

# Install specific version
remotes::install_version("sf", version = "1.0-14")

Python:

# Using conda (preferred for local installations)
# Run in terminal/command line:
# conda install geopandas rasterio xarray -c conda-forge

# Using pip (in Colab or Jupyter notebooks)
%pip install geopandas rasterio xarray pyfixest

# Install specific version
%pip install geopandas==0.14.0

Note: Installation in Colab

In Google Colab:

  • Python packages use %pip (magic command) to ensure installation in the correct environment
  • R packages use the standard install.packages() function
  • Many common packages (pandas, numpy, matplotlib) are pre-installed in Colab
  • For geospatial packages, you’ll typically need to install them at the start of your notebook
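Because some packages are already present in Colab, it can save time to check what is missing before installing anything. The sketch below uses the standard library's `importlib.util.find_spec`. `missing_packages` is a helper name invented for this example, and it assumes each package's import name matches its pip name, which is true for the four listed here but not for every package.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be imported here."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


wanted = ["geopandas", "rasterio", "xarray", "pyfixest"]
to_install = missing_packages(wanted)

if to_install:
    print("Install in a notebook cell with: %pip install " + " ".join(to_install))
else:
    print("All packages are already available.")
```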

Loading Packages

R:

# Load packages
library(sf)
library(terra)
library(exactextractr)
library(ggplot2)

# Alternative way to load packages
# require() returns FALSE with a warning instead of stopping if the package is missing
require(dplyr)

Python:

# Import packages
import geopandas as gpd
import rasterio
import pandas as pd

# You can also import specific functions
from pyfixest.estimation import feols

  1. A Jupyter Notebook is an interactive document that combines live code, visualizations, and explanatory text. You write and execute code in cells, which makes it well suited to data analysis, machine learning, and teaching.