A water quality portal for California
⛅ ✅ Checking the weather for a given location is simple.
🚰 ❌ Checking the water quality is not as easy.
To make water quality data in California easy to access and understand, I built a “weather app for water quality” that lets users check public drinking water quality in their hometown, place of work, school, and anywhere they may travel.
Challenge & Opportunity
Present sources of water quality information (such as consumer confidence reports) are fragmented across thousands of water utilities in the state, and lack consistency and clarity. These reports do not always communicate water quality data effectively, are only updated once per year, and vary in quality (here's an example of one, another, and another). Near-real-time, standardized, and easy-to-understand water quality reports would improve the public’s understanding of the water they buy and consume.
Data isn’t available for all California water systems, notably “state smalls,” which serve between 5 and 14 connections. However, data availability is much better for community water systems (15 or more connections).
This project aimed to build a “weather app” for California community water system water quality. The benefits of this approach include standardization, consistency across agencies, ease of use, and near-real-time water quality information.
To this end, I aggregate publicly available water quality data for all California community water systems to report:
- compliance status
- chemicals detected within the past 2 years
- average detection levels for all contaminants tested for
- water quality indicators
- local water and health system contact information
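As a rough illustration of how two of these items (chemicals detected within the past 2 years, and average levels for all contaminants tested for) could be derived from raw sample records, here is a minimal Python sketch. It is not the portal's actual code: the record layout, the system ID, the values, and the `summarize` function are all hypothetical, and averaging non-detects as zero is an assumption made only for this example.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Hypothetical sample records, invented for illustration:
# (system_id, analyte, sample_date, result, detected)
SAMPLES = [
    ("CA0000001", "Nitrate", date(2020, 3, 1), 2.0, True),
    ("CA0000001", "Nitrate", date(2019, 6, 15), 4.0, True),
    ("CA0000001", "Arsenic", date(2019, 9, 10), 0.0, False),  # non-detect
    ("CA0000001", "Lead", date(2016, 1, 5), 3.0, True),       # outside the window
]


def summarize(samples, system_id, as_of, window_days=730):
    """Summarize one water system's samples over a trailing window (about 2 years)."""
    cutoff = as_of - timedelta(days=window_days)
    results = defaultdict(list)  # analyte -> all results inside the window
    detected = set()             # analytes with at least one detection
    for sid, analyte, sampled, result, was_detected in samples:
        # Skip other systems and samples outside the reporting window.
        if sid != system_id or not (cutoff <= sampled <= as_of):
            continue
        results[analyte].append(result)
        if was_detected:
            detected.add(analyte)
    return {
        "detected": sorted(detected),
        # Assumption: non-detects are stored as 0.0 and included in the average.
        "average_levels": {a: mean(v) for a, v in sorted(results.items())},
    }


report = summarize(SAMPLES, "CA0000001", as_of=date(2020, 6, 1))
print(report)
```

The reporting window is a parameter rather than a hard-coded value, so the same function could produce a summary for any period once new samples arrive.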
I imagine that Californians will use this tool to learn more about water quality in their places of residence, work, school, and recreation. If they wish to learn more, they should find it easy to contact their water system.
Open water data
I turn on the tap every day to drink, shower, cook, and water the plants, generally taking for granted that clean water flows out. If I were to trace the water back through the pipe, I'd find a complex network of water utilities, treatment plants, and regulatory bodies all working together to acquire, treat, and deliver that water. This is one of the privileges of living in a wealthy country that’s easy to forget. If you asked me to tell you about the quality of the water coming out of my tap, about the different chemicals tested for and the levels detected, I wouldn't be able to. I might search the internet for this information, but I would find it difficult to locate and make sense of.
Water in California is provided by a federated system of independent utilities. Some water agencies are so small that they slip through the cracks of state-mandated water quality testing, and others, like Metropolitan Water District, are so large that they account for a majority of the water users in the state (see bubble chart below, created by Avery Kreuger). Note that some smaller water districts in the chart buy water from other districts in the chart, and that the population listed is the total number of people served by each system. For example, Metropolitan Water District is a wholesaler that serves 18M people, but it does so through other retailers that are also listed (e.g., City of Long Beach, City of Fullerton, Calleguas Municipal Water District).
Open data as democracy
It's the year 2020. Worldwide, water resources are threatened by over-extraction from underground aquifers, increasing climate variability and population growth, and aging infrastructure.
About a decade ago, in 2010, the UN recognized water as a human right, and in 2012, the state of California followed suit. If the leaders of California and the constituents they represent agree that water is vital, we must answer the question, “How do we go from where we are today to the future we envision? How do we actually secure water as a human right?”
If the government is to play a key role in regulating water availability and quality, then policy is the answer. Policy is blind without supporting information, so open and transparent data is the most reasonable approach towards the goal of securing clean and accessible water. We need to measure the systems we manage, and to do that we need open, accurate, and up-to-date information.
Efforts to make previously private data public domain are relatively new. In 2016, California legislators passed the Open and Transparent Water Data Act, the technical implementation of which can be found in California's open data portal. It’s hard to overstate just how significant a shift this is.
Just 4 years ago, I started a PhD in hydrogeology at the University of California, Davis, and found it nearly impossible to find publicly available water data. Water data was valuable, and hence, guarded. Private consulting firms collected and curated those data and sold services built on them. Why would a self-interested firm release work that took years to create, only to have it taken by a competing firm? Furthermore, to some, open data was synonymous with more regulation, so it was feared and opposed. For example, although well construction data has been collected for over 100 years by the state, it was only made public in 2017, enabling some of the first estimates of the population served by private domestic wells.
Although much more data is available in 2020 than was 4 years ago, let alone a decade ago, not all water data is measured. Groundwater pumping data, for example, is still private, and pumping is instead inferred from models developed by the Department of Water Resources and the United States Geological Survey, and from NASA satellite observations. I’m not arguing that we should measure everything; that’s wasteful. However, we should think critically about our policy objectives and the decisions we need to make as a society (whether that’s securing a human right to water, or avoiding an undesirable result), and then reverse-engineer the problem to determine what data we need to inform those critical decisions.
Does the data that informs these critical policy objectives need to be open? Will it actually be in the best interest of the public? In some cases, the answer is no. For instance, consider information that may compromise the privacy of certain individuals, threaten national security, or be used in other perverse ways to disenfranchise marginalized communities. However, transitioning from closed to open and transparent data systems has the potential to create new possibilities. First, open data allows researchers to answer problems outside the scope of government and industry that might otherwise remain neglected. Second, open data is very similar to the free press, in that it provides information that allows people to develop informed opinions and put political pressure on their representatives. An informed society is critical to a functioning and healthy democracy, and this fundamentally depends on the openness of public domain data.
While working on this project, the following thought returned again and again to me:
The responsibility of reporting water quality information to customers currently falls on water systems. Is this reporting paradigm more aligned with public interest, or bureaucratic expedience?
There are clear benefits to requiring water systems to report water quality information to their customers, such as expedience and granting local control. However, as mentioned above, the drawbacks of this approach include infrequent and inconsistent reports, and information that’s hard to find. It’s also not inconceivable that utilities with lackluster water quality may be hesitant to make this information visible, something that an impartial third party wouldn’t suffer from.
With the rise of data technologies, automated reporting is a breeze. Water quality data is a perfect example of a public dataset that can benefit from automated reporting. To ensure quality control, such a reporting pipeline would require government or private supervision. This comes at some cost, but ultimately, I believe this cost is outweighed by the benefit such a system offers to the public. To this end, this project aimed not only to demonstrate how automated reporting can transform open data into a useful public resource, but also to develop a reusable data pipeline that can accommodate new data and serve as a model for similar efforts in other domains.