Air pollution data is related with meteorological conditions.
Analyse the meteo half-hour measurements given by the teacher from 2010 to 2018, and merge it with hourly air pollutants data.
Some abbrevations used in meteo data are:
- T, Tx and Tn means mean temperature, minimum temperature and maximum temperature (Celsius degrees) and HR, HRn, HRx is the mean, minimum and maximum relative humidity (%)
- P is mean atmospherical pressure (hPa) and PPT rainfall accumulation (mm)
- VV10 is mean wind speed (m/s) and DV10 is mean wind direction (º), both at 10 m high
- VVx10 is maximum wind speed (m/s) and DVx10 is maximum wind direction (º), both at 10 m high
- RS is solar radiation (W/m2)
Here you have some useful R codes:
- You must rename columns from e.g NO2 to NO2city1 and NO2city2 by using dplyr R package instruction rename or by changing manually the csv files
- Change also the weather column names from DATA to date, from VV10 to ws (wind speed), from DV10 to wd (wind direction) because it is compulsory in openair R package.
- You need to read all pollution and weather flat files (csv). For example, to read a 30 min-weather file in R:
- View the data with View(weather)
- Convert date column in Date format by using as.Date instruction explained in openair tutorial and done previously with air pollution data because software does not recognise automatically numbers as dates
- After reading the csv pollution file and meteo file you need to give them the same structure, that is, to calculate averages in R in different date frames, days, hours or 30-min time frames. For example you can also duplicate some data by using fill=TRUE parameter.
weatherd<-timeAverage(weather, avg.time = “day”)
pollution30min<-timeAverage(pollution, avg.time = "30 min", fill=TRUE)
- When you have check that your dataframes are well formatted, that is, both datasets contained dates recognised as dates (not as factors) you are ready to merge these datasets by date, one with air pollution data and another with weather data
The instruction class(pollutiond$date) or class(weatherd$date) inform you if these data are factors or dates
city1All <- merge(pollutiond, weatherd, by="date")
- You need to compare pollution between cities:
allData <- merge(city1All, city2All, by="date")
- You can keep a copy of the new file with all data, meteo and pollution from both cities with instruction:
write.csv(allData, file = “allData.csv”)
- When you have all meteorological and air pollution data from two cities together in file allData.csv and in allData R dataframe you can make statistical analysis of data
wilcox.test (alldata$NO2city1, alldata$NO2city2)
- Create a boxplot figure: boxplot(alldata$NO2city1, alldata$NO2city2, main=”NO2 pollution in city 1 and city 2″ ylab=”ug/m3″, xlab=”1: City 1, 2: City2 (year 2018, N=8760)”)
- You need to connect openair library to your project by writing require(openair) or library (openair)
To present a pollution rose combining air pollution and wind speed and direction:
pollutionRose(allData, pollutant = ‘O3’, type = ‘season’)
- calendarPlot(allData, pollutant = “ozone”, annotate = “ws”)
The following symbols can be used with the format( )function to print dates.
%d is day as a number (01-31), %a abbreviated weekday (Mon), %A unabbreviated weekday (Monday), %m month (01-12); %b abbreviated month (Jan); %B unabbreviated month (January); %y 2-digit year (19); %Y 4-digit year 4-digit year (2019). Hour:Minutes:Seconds R code format is %H:%M:%S
Here is an example:
# change from a factor variable to a date variable:
weatherd$date<- as.Date(weatherd$date format=”%d/%m/%Y”)
You can use as.POSIXct if as.Date does not work properly.
In order to forecast air pollution you can use interesting R packages:
prophet (created by facebook), forecast, caret, tseries (to forecast with ARIMA, Box-Jenkins and other models).
In the following example you will use prophet R package to forecast air pollution:
m <- prophet(allData)
future <- make_future_dataframe(m, periods = 31)
forecast <- predict(m, future)
plot(m, forecast) prophet_plot_components(m, forecast) shows the overall trend, weekly and yearly seasonality.
Here you have an online system to view some information created by our students:
Write subscript, superscript and symbols with keyword expression, example:
ylab=expression(paste( “NO (“, mu, “g/”, m^3, “)”, sep=””))