Working with Big Data in R

Wed Nov 12, 2025 3:00 p.m.–4:00 p.m.

This intermediate-level workshop gives social science researchers essential skills for analyzing datasets that are too large to fit in a typical computer's memory. Participants will learn to distinguish between datasets and databases, store data efficiently with Apache Arrow and Parquet files, and build robust Extract-Transform-Load (ETL) pipelines for large-scale data processing. The workshop covers partitioning strategies that improve query performance, writing custom functions with both the dplyr API for Arrow's Acero query engine and SQL syntax, and creating local analytical databases with DuckDB. Through hands-on exercises with real voter file data, researchers will develop practical skills in out-of-core processing, database management, and scalable data-analysis workflows.

Location:
RKZ 01