Rich Pauloo, PhD

Data Scientist

Hi 👋, my name is Rich.

I'm a data scientist at LWA and spend most of my time in R automating ETL pipelines for sensor networks 📡, building Shiny Apps and dashboards 🖥, designing approaches with spatial statistics and hydrologic models, and generally wrangling lots and lots of data.

I have a PhD in Hydrology and my dissertation is titled ‘Emerging consequences of regional-scale aquifer depletion: data-driven and numerical models of well failure, basin salinization, and contaminant transport’ (my exit seminar can be viewed here). Early in my PhD, I found that I really enjoyed data science and programming, and I used these years to sharpen those skills. My published research includes NLP and network analysis, spatial statistics, and physical modeling of 3D, subsurface contaminant transport.

I'm an #rstats nerd and automation/reproducibility fanatic. My favorite tools include tidyverse (dplyr, ggplot2, purrr), shiny, flexdashboard, DT, RMarkdown (for dashboards/reporting), sf, sp, raster, leaflet (for spatial data), and DBI for databases. A few projects I'm proud of include an R package to query water quality data 📦, an R data science curriculum 📚, a dashboard that makes millions of water quality observations understandable 📈, and a model that predicts the risk of wells going dry 💧, funded by Microsoft's AI for Earth Grant.

Before my PhD, I taught environmental science to middle and high school students in Yosemite and the Marin Headlands for the educational nonprofit NatureBridge. I spent summers leading expeditions in the wilderness, and in Thailand for National Geographic Student Expeditions.

In my free time, I enjoy anything that puts me in a flow state: alpine climbing 🧗🏼, running 🏃‍♂️, surfing 🏄‍♂️, tinkering on bikes 🚴‍♂️, reading 📚, playing guitar 🎸, and cooking 🧑‍🍳.


  • 👨‍💻 data science
  • ⛰️ expedition behavior
  • 🧮 mathematical modeling
  • 📡 sensor networks


  • PhD in Hydrogeology, 2020

    University of California Davis

  • BA in Integrative Biology, minor in Conflict Resolution, 2011

    University of California Berkeley




R for Water Resources Data Science.

gspdrywells.com

Domestic well failure prediction and cost estimates in critically overdrafted basins.

low cost sensor networks

Real-time sensor networks and dashboards for monitoring environmental data.

calwaterquality.com

Automated water quality reports for > 3,000 California public water systems. 🏆 Winner 2019 California Water Data Challenge.

interpretable random forests

Cumulative variable importance.


Text yourself from R when long-running jobs complete.

Tulare basin TDS

Groundwater quality data visualization.

CA well report filter

Upload a shapefile of a study area to return clean OSWCR data from that area.

CA well reports

Exploratory data analysis of California's Online State Well Completion Report Database.

Fatal landslide prediction

Using random forests, boosting, LDA, and QDA with variable probability thresholds for global landslide classification.


An adaptation of PapR to 30,000 American Geophysical Union abstracts.