Analysis Air Data

Study of atmospheric pollution in different Catalan cities

Here you can find the Air quality standards in the European Union and World Health Organisation

Pollutant (µg/m3) Period EU Limit WHO Limit
PM2.5 day  15
PM2.5 year  20  5
PM10 day  50 (1)  45
PM10 year  40  15
O3 8 hours  120 (2)  60 (3)-100
NO2  hour  200 (4)  200
NO2  day  25
NO2  year  40  10
SO2 hour 350 (5)
SO2 day 125 (6) 40
CO (mg/m3) hour 35
 CO (mg/m3)  8 hours  10  10
Air pollutants limits by WHO and EU

(1) <35 days/year (2)<25 days/year (3 year average) (3) peak season (worst consecutive 6 month) (4) <18 h/year (5) <24 h/year (6) <3 days/year

Check all your city hourly data using openair library e.g. here you have the code and results from Martorell data from 1991 to  May 2022:

> episodeO3<-selectRunning(mydata, pollutant=”o3″, threshold=120, run.len=8)

> table(episodeO3$criterion)

no
65533

> episodePM10<-selectRunning(daily, pollutant=”pm10″, threshold=50, run.len=1)
> table(episodePM10$criterion)

no                         yes
2493                    238

Calculate always de percentages 238/(2493+238)*100 means 8.71% of days particulate matter PM10 is over 50 ug/m3 corresponding to around one month per year of pollution over the limits.

episodeNO2<-selectRunning(mydata, pollutant=”no2″, threshold=200, run.len=1)
table(episode$criterion)

table(episodeNO2$criterion)

no yes
65532 1

For example WHO limits for SO2 is correct concerning daily data, but in hour data there are some data above limits

> episodeSO2<-selectRunning(daily,pollutant=”so2″, threshold=40, run.len=1)
> table(episodeSO2$criterion)

no
2731
> episodeSO2<-selectRunning(mydata,pollutant=”so2″, threshold=40, run.len=1)
> table(episodeSO2$criterion)

no yes
65520 13

Check rollingMean function?

o3<-mydata[-c(columnsdeleted,x:y)]
View(o3)
meano3 <- rollingMean(cityall, pollutant = “o3”, width = 8)
View(meano3)

YEARLY DATA

>library(ggplot2)

>ggplot(data=yearly, aes(x=date, y=value)) +geom_line()+geom_point()+geom_hline(yintercept=40)+geom_hline(yintercept=10, linetype=”dashed”, color = “red”)

Figure 1. NO2 air pollution in Martorell from 1991 to May 2022, red line WHO limit, black line: EU limit

Example of poster content: Half of the years (16 over 30 year time line) NO2 is over the recommended limits by European Union in Martorell and all the years over the limit of 15ug/m3 recommended by WHO in 2021

IMPORTANT! You need to create 3 Figures all with your city air pollution data: Figure 1 with annual data limits, Figure 2 with daily data limits and Figure 3 with hourly data limits.

In Figure 1 you must include 3 pollutants (NO2, PM10, PM2.5), using the previous code, but only adding data in the pollutant column and adding inside AES function ,col=pollutant)

ggplot(data=fig1, aes(x=date, y=value, color=pollutant)) +geom_line()+geom_point()+geom_hline(yintercept=40)+geom_hline(yintercept=10, linetype=”dashed”, color = “red”)

Figure 2 using daily data of PM2.5, PM10, NO2 and SO2 and Figure 3 using  hourly data of O3, CO, SO2, NO2 not forgetting in both cases to add horizontal lines for WHO/EU limits.

If data stops in year 2005 it is possible to add missing dates of 2005 in order to see the full series until 2022

https://blog.exploratory.io/populating-missing-dates-with-complete-and-fill-functions-in-r-and-exploratory-79f2a321e6b5

timePlot(mydata, pollutant=c(“pm25″,”pm10″,”no2″,”so2”))

Figure 4 can be a timeVariation, Figure 5 a trendLevel and Figure 6 a pollutionRose.

Include 5-6 Figures and very few words (maximum 150) in your scientific poster entitled “Air pollution in cityname” mainly conclusions and refer to your website for more available data or supplemental data, figures and discussion.

LATEST VERSION AT:

http://xtec.cat/~fperez5/medi_ambient.html

You can find here an original way to present real-time data created in our school.

  1. Download automatic and manual data from 1991 at Generalitat de Catalunya (different pollutants, cities and years)
  2. Download data of XEMA database wind speed and wind direction
  3. Download R software for data analysis.
  4. Add your badge of finishing the R introduction online in your wordpress blog
  5. Download R Studio
  6. Download data, books, articles sent by your teacher in a zip wetransfer file before a week
  7. Install R libraries. There are thousands available on CRAN. Some of the most interesting ones for your project are:
    • openair (to analyse air pollution)
    • ggplot2 (plotting the easy way, e.g. error bars)
    • ggvis and googleVis (interactive visualization),
    • ggmap and googleVis (google maps visualization of data if you don’t like fusion tables)
    • aplpack (Chernoff faces)
    • forecast,
    • timeSeries,
    • corrplot (correlation plots)
    • lubridate (working with dates and hours),
    • Rcmdr (R Commander GUI for statistics)
    • e1071 (skewness, kurtosis, etc),
    • nortest (normality tests),
    • car (homocedascity Levene’s test),
    • reshape2 and dplyr (manipulate data sets) etc: >install.packages(c(“openair”, “ggplot2”, “forescast”,…))
  8. Convert pollution data in comma separated values file (csv format) using  first Microsoft Excel or LibreOffice Calc (Save As … csv) and the replace commas to points (decimal values) and then semicolons to commas using Sublime Text . Replace “Sense dades” values to NA values. Check the file  contents carefully with Sublime Text
  9. Introduce pollution data in R Studio (read.csv(“mydata.csv”) instruction)
  10. Analyse data with R (statistical analysis, time series forescast, interactive visualization)
  11. Present data with interactive plots and graphs.
  12. Draw conclusions from data: Is the data normally distributed? Is Martorell more polluted than the Catalan city chosen for you? Is it more polluted at night, at weekends, world car free day ? What is the pollution forecast for the next days?  Are all week days equally polluted (Mon to Sun)? Are pollutants correlated?
  13. Write a scientific poster with your conclusions: introduction, materials and methods, results and discussion, conclusions and bibliography from science scholar, WHO, EU, EPA and NCBI.
  14. Discuss health effects (BreatheLife 2030 WHO)  and legal limits (WHO, EU, EPA, etc.) of air pollutants.
  15. All information must be included also in your scientific poster.
  16. Try to convert graphs in art objects as in enviromental graphiti website.

Air pollution index

MPI can be defined as:

mpi

where n is the total number of pollutants taken in consideration, ACi is the atmospheric concentration of a pollutant in a certain location and CGi is the guideline concentration of a pollutant recommended by a national or international agency such as the WHO.

A clean environment (in the absence of air pollutants) has an MPI equal to ?1, while for a location with ACi equal to the CGi (i.e. a location with borderline air quality), the MPI would be around 0. No upper limit is given for the MPI, with relatively higher values corresponding to poorer air quality. We refer to Gurjar et al. (2008) for a detailed discussion on this index.

For example, MPI (WHO) Measured pollutants mean values (ppm)  are: SO2: 63, CO: 12, NO2: 222 and guideline values are respectively 125, 10 and 200. The calculation is MPI=1/3 (65-125)/125)+(12-10)/10)+ (222-200/200) =-0.19 Negative values are good air-quality range values. So, you can calculate MPI from data in an easy way using R software.

Another example could be to calculate an index as “Index Català de Qualitat de l’Aire“,  air quality index of Environmental Protection Agency (USA) in a 0-500 scale for individual pollutants

Reference:
Gurjar, B., Butler, T., Lawrence, M., and Lelieveld, J.: Evaluation of emissions and air quality in megacities, Atmos. Environ., 42: 1593–1606, 2008.

Create your scientific poster using Powerpoint or Impress.

Template here

References:

Air pollution data is related with meteorological conditions.


Openair is a package created for the analysis of air pollution by David Carlslaw (University College London). Please find how it works from the official website and from the tutorial or manual package.

You will find many instructions available to create powerful graphs as for example timePlot(x), calendarPlot(x),summaryPlot(x)

You can also create an interactive graph by making use of C3.js library. In this example you can interact with air pollutant data found in Martorell from 2005 to 2016.

Data preparation

  • Name of Dates column must be dates
  • Convert dates into dates

>martorell$date<-as.Date(martorell$date, format=”%d/%m/%Y”, tz=”Europe/Madrid”)

class (martorell$date) “Date”

  • Convert factor to numeric values:

martorell$H2S<-as.numeric(levels(martorell$H2S))[martorell$H2S]

cutData()

selectByDate instruction:

We can conclude that there is less pollution during weekends according to available hour data from H2S levels in the period 01/12/2016 to 30/11/2017.

You can find some data from january 2018 to December18th, but you need to analyse the full year 2018

WORKING LOCALLY in a pendrive with R and RStudio:

  • install.packages (“openair”, lib=”F:/SOFTWARE/RStudio/libraries”)
  • library(“openair”, lib.loc=”F:/SOFTWARE/RStudio/libraries”)
  • setwd(“F://SOFTWARE/RStudio/Projects”)
  • getwd()
  • Review setwd and getwd to know the working directory

SET LANGUAGE TO ENGLISH

Sys.setlocale(“LC_ALL”, “English”)

CHECK CLASSES and use as.Date or as.POSIXct to work with dates (lubridate package can also help)

AIR POLLUTION IN MARTORELL 2017 EXAMPLE

  1. SummaryPlot example

summaryPlot(martorell)
date NO NO2 H2S
“Date” “integer” “integer” “numeric”

2. Calendar plot example

Sys.setenv(LANGUAGE=”en”)

>martorell<read.csv(file=”/Users/franciscoperezgarcia/Desktop/martorelld17.csv”)
> View(martorelld17)
> class(martorell$date)
[1] “factor”
> martorell$date<-as.Date(martorell$date, format=”%d/%m/%Y”)
> class(martorell$date)
[1] “Date”
> library(openair)
> calendarPlot(martorell,pollutant=”H2S”, year=”2017″)

2. timePlot example

timePlot(martorell, pollutant=c(“NO”, “NO2″,”H2S”))

> timePlot(martorell,pollutant=”NO”, year=”2017″)

timePlot(martorell, pollutant = c(“NO”, “NO2”, “H2S”),
+ avg.time = “month”, normalise = “01/08/2017”, lwd = 4, lty = 1,
+ group = TRUE, ylim = c(0, 700))

3. monthPlot example

monthplot(martorell$NO2, main=expression(paste(“2017 Month plot of Martorell NO”[2])), ylab=”Conc (ug/m3)”)

4. Episodes of air pollution in Martorell

Remember to convert ppb to microgram/m3

Pollution limits in other countries

Concerning H2S: 8h over 1.5 microgram/m3 (1ppm), 15 min over 7.5 microgram/m3 (5 ppm) according to ACGIH, 2010

ALERT: Values of H2S are in microgram/m3 and limits in Catalonia are 40 (24-hour mean) and 100 (30-min mean). Review the previous code accordingly.

Concerning NO2, criteria from EU Air Quality Standards and Air Quality Guidelines WHO 2011 limits

Remember to calculate 8-hour mean of ozone and other means (1h,8h,24h, annual,etc) of other air pollutants to find air pollution episodes in your city.

5. trendLevel

trendLevel(martorellh, pollutant=”H2S”)

6. smoothTrend

First, we choose to deseasonalise the data, then we choose to plot percentiles (the 5th, 50th, 75th and 95th)

7. Times series ts

8. timeVariation

Data from 2018 below

timeVariation(palau, pollutant=c(“NO”, “NO2″,”SO2″,”CO”, “PM10”, “O3”), main = expression(paste(“Air pollutants, “, mu, “g/m3″ at Barcelona-Palau Reial (2018)”)

timeVariation(palau, pollutant=c(“NO”, “NO2″,”SO2″,”CO”, “PM10”, “O3”), normalise = TRUE)

Next graphs are Air pollutant from Martorell in 2018:

Air pollution in Martorell: levels in NO (ug/m3)

Air pollution in Martorell: levels in H2S (ug/m3)

Air pollution in Martorell: levels in NO2 (ug/m3)


Analyse the meteo half-hour measurements given by the teacher from 2010 to 2022 and merge it with hourly air pollutants data.

Some abbrevations used in meteo data are:

  1. T, Tx and Tn means mean temperature, minimum temperature and maximum temperature (Celsius degrees) and HR, HRn, HRx is the mean, minimum and maximum relative humidity (%)
  2. P is mean atmospherical pressure (hPa) and PPT rainfall accumulation (mm)
  3. VV10 is mean wind speed (m/s) and DV10 is mean wind direction (º), both at 10 m high
  4. VVx10 is maximum wind speed (m/s) and DVx10 is maximum wind direction (º), both at 10 m high
  5. RS is solar radiation (W/m2)

 

Here you have some useful R codes:

  1. You must rename columns from e.g NO2 to NO2city1 and NO2city2 by using dplyr R package instruction rename or by changing manually the csv files
  2. Change also the weather column names from DATA to date, from VV10 to ws (wind speed), from DV10 to wd (wind direction) because it is compulsory in openair R package.
  3. You need to read all pollution and weather flat files (csv). For example, to read a 30 min-weather file in R:
    weather<-read.csv(file.choose())
  4. View the data with View(weather)
  5. Convert date column in Date format by using as.Date instruction explained in openair tutorial and done previously with air pollution data because software does not recognise automatically numbers as dates
  6. After reading the csv pollution file and meteo file you need to give them the same structure, that is, to calculate averages in R in different date frames, days, hours or 30-min time frames. For example you can also duplicate some data by using fill=TRUE parameter.
    weatherd<-timeAverage(weather, avg.time = “day”)
    pollution30min<-timeAverage(pollution, avg.time = “30 min”, fill=TRUE)
  7. When you have check that your dataframes are well formatted, that is, both datasets contained dates recognised as dates (not as factors) you are ready to merge these datasets by date, one with air pollution data and another with weather data
    The instruction class(pollutiond$date) or class(weatherd$date) inform you if these data are factors or dates
    city1All
  8. You need to compare pollution between cities:
    allData
  9. You can keep a copy of the new file with all data, meteo and pollution from both cities with instruction:
    write.csv(allData, file = “allData.csv”)
  10. When you have all meteorological and air pollution data from two cities together in file allData.csv and in allData R dataframe you can make statistical analysis of data
    wilcox.test (alldata$NO2city1, alldata$NO2city2)
  11. Create a boxplot figure: boxplot(alldata$NO2city1, alldata$NO2city2, main=”NO2 pollution in city 1 and city 2″ ylab=”ug/m3″, xlab=”1: City 1, 2: City2 (year 2018, N=8760)”)
  12. You need to connect openair library to your project by writing require(openair) or library (openair)
  13. To present a pollution rose combining air pollution and wind speed and direction:
    pollutionRose(allData, pollutant = ‘O3’, type = ‘season’)
  14. calendarPlot(allData, pollutant = “ozone”, annotate = “ws”)

The following symbols can be used with the format( )function to print dates.

%d is day as a number (01-31), %a abbreviated weekday (Mon), %A  unabbreviated weekday (Monday), %m month (01-12); %b abbreviated month (Jan); %B unabbreviated month (January); %y 2-digit year (19); %Y 4-digit year  4-digit year (2019). Hour:Minutes:Seconds R code format is %H:%M:%S

Here is an example:

# change from a factor variable to a date variable:

weatherd$date<- as.Date(weatherd$date format=”%d/%m/%Y”)

You can use as.POSIXct if as.Date does not work properly.


In order to forecast air pollution you can use interesting R packages:
prophet (created by facebook), forecast, caret, tseries (to forecast with ARIMA, Box-Jenkins and other models).

In the following example you will use prophet R package to forecast air pollution:

m <- prophet(allData)
future <- make_future_dataframe(m, periods = 31)
forecast <- predict(m, future)
plot(m, forecast) prophet_plot_components(m, forecast) shows the overall trend, weekly and yearly seasonality.

https://towardsdatascience.com/using-open-source-prophet-package-to-make-future-predictions-in-r-ece585b73687 (full prophet R tutorial)

 

Write subscript, superscript and symbols with keyword expression, example:

ylab=expression(paste( “NO[2] (“, mu, “g/”, m^3, “)”, sep=””))