Another day, another blog

Your life has a limit, but knowledge has none. If you use what is limited to pursue what has no limit, you will be in danger.
-Chuang Tzu

Reproducible data science with R, RStudio Server, and Docker

Although Docker has been around for years, it doesn’t seem to be as widely used in the data science community as git, which I attribute to the fact that many data scientists learn to program and work independently. However, as data scientists increasingly collaborate with others or aim to make their work more reproducible, Docker deserves an equal place alongside git in the practitioner’s toolkit.

By Rich in R, Docker

January 28, 2023

Parquet, SQL, DuckDB, arrow, dbplyr and R

Unlike traditional row-based storage (e.g., the tables of a typical SQL database), Parquet files (.parquet) are column-oriented. They feature efficient compression (fast reads and writes, small disk usage) and performance optimized for big data.

By Rich

November 6, 2021