Welcome!

I am a physical oceanographer interested in how ocean water is mixed and transformed. I am currently working as a Research Scientist at the Bedford Institute of Oceanography in Halifax, Nova Scotia.

Introduction

A lot has happened since my last post (and really, I mean A LOT). However, in the spirit of that last post, one good thing that has happened is that in the upcoming 1.3-0 release of the oce package on CRAN we have changed the default colormap (or “palette”, as it’s often referred to in the R world) to something not nearly as sucky as the classic “jet” colormap (originally made popular by Matlab).

The purpose of this post is just to highlight the difference between the old and new defaults, and also to show many other great colormaps for plotting oceanographic data.

Jet vs viridis

When the imagep() function was first added to oce, there was a limited selection of palettes that could be used. For a long time, the default of the imagep() function itself has been a palette called oceColorsPalette() (which is a blue-through-white-to-red palette most useful for diverging colormaps). However, many of the internal functions that use colors defaulted to the oceColorsJet() palette, including the plot,section-method and, more importantly, the colormap() function itself (which I wrote about here).

The palette on the left, oceColorsPalette(), has been retained as the default for the plot,adp-method (as velocities are typically diverging). The palette in the middle, Jet, is still available but is no longer the default for any plots. The viridis palette on the right is now the default for all palette-related functions.

I won’t go into the details of why viridis is so much better – a quick google will turn up lots of articles (though this new paper from Kristen Thyng has lots of great info).

Other, even better palettes

Kristen Thyng is the author of the cmocean series of oceanographic colormaps, originally created for Python/matplotlib. There is now a cmocean package on CRAN, which makes all the Python colormaps available in R.

There is also the viridis package itself, which makes the series of palettes developed for matplotlib available in R.

Introduction

Making plots in oceanography (or anything, really) often requires creating some kind of “color map” – that is, having a color represent a field in a plot that is otherwise two-dimensional. Frequently this is done when making “image”-style plots (known in Matlab™ parlance as “pcolor” or pseudocolor plots), but could also be in coloring points on a 2D scatter plot based on a third variable (e.g. a TS plot with points colored for depth).

There are a whole bunch of different ways to make colormaps in R, including various approaches that are derived from the “tidyverse” and ggplot2 package for analyzing and plotting data. I don’t really use that approach for most of my work, so won’t touch on them here.

Instead, the purpose of this post (inspired by a question from a colleague who I know is a Matlab and Python user) is to show some of the ways various functions contained in the oce package can be used to make colormaps and colorbars (or “palettes”), in particular for the case where one wants a “discrete” (i.e. not continuous) colormap.

The imagep() function

For making image-style plots, the function imagep() provided by the oce package is a handy function for quickly making nice-looking pseudocolor plots of matrices. The “p” in imagep() stands for “palette” or “pseudocolor”. Mostly, it is a wrapper around the base R function image(), to allow for increased control of the axes, colors, and palette specification.
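A minimal sketch of that kind of comparison, assuming the oce and cmocean packages are installed (the calls are guarded so they are simply skipped otherwise). The volcano matrix that ships with R and the choice of the "thermal" cmocean palette are stand-ins picked for illustration, not necessarily what was used originally:

```r
# Sketch: one matrix, three palettes. `volcano` is a base R dataset used
# here only as a convenient stand-in matrix.
if (requireNamespace("oce", quietly = TRUE)) {
  library(oce)
  imagep(volcano)                        # whatever the installed oce uses by default
  imagep(volcano, col = oceColorsJet)    # the classic "jet" scheme
  if (requireNamespace("cmocean", quietly = TRUE)) {
    # cmocean() returns a palette function; "thermal" is an arbitrary choice
    imagep(volcano, col = cmocean::cmocean("thermal"))
  }
}
```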

The above example shows how to use imagep() with 3 different colormaps (the default, the classic “jet” scheme, and the cmocean package) to generate an image plot with a nice palette automatically placed on the side.

The drawPalette() and colormap() functions

Under the hood, the imagep() function calls another function to actually draw the palette on the side of the plot – the drawPalette() function. That function can be called on its own, enabling plot building to be much more flexible. Additionally there is the colormap() function, which allows for detailed specification of the colormap properties to use, which can then be passed as an object to drawPalette().

For example, say we wanted to make a TS (temperature-salinity) plot of the Levitus surface data, but with each point colored by latitude in 10 degree increments. We do that first by making a colormap object:

(note that we use expand.grid() to make the number of lon/lat points match the matrices).

Looking inside the object, we can see some of the details that further plotting/palette functions can make use of:

Some of those fields are obvious (some probably aren’t to inexperienced users) but one field that is handy to know about is the zcol field. This encodes a color for every value in the original object based on the colormap specification. So, we can make a plot with the points colored based on the colormap using the argument col=cm\$zcol. We can also add the palette to the plot using the drawPalette() function, which has to be called before the main plot:
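A rough sketch of that workflow, assuming oce is installed (guarded so it is skipped otherwise). To keep it self-contained it uses made-up latitude, temperature and salinity vectors rather than the Levitus data, so only the pattern of calls should be taken from it:

```r
# Sketch with synthetic data: the numbers are invented for illustration only.
set.seed(1)
lat  <- runif(200, -90, 90)
temp <- 28 * cos(lat * pi / 180)^2 + rnorm(200)   # warm tropics, cold poles
sal  <- 35 + rnorm(200, sd = 0.5)

if (requireNamespace("oce", quietly = TRUE)) {
  library(oce)
  # Discrete colormap: one color per 10-degree latitude band
  cm <- colormap(lat, breaks = seq(-90, 90, 10), col = oceColorsViridis)
  drawPalette(colormap = cm)                 # palette first ...
  plot(sal, temp, col = cm$zcol, pch = 19,   # ... then the main plot
       xlab = "Salinity", ylab = "Temperature")
}
```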

One problem that can happen when there are a lot of points is that overplotting obscures patterns in the colors. An easy way to fix this is to randomize the order of the plotted points with the sample() function:
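The idea can be sketched in base R alone. Everything here is synthetic, and hcl.colors() stands in for a colormap() object; the point is only that the same random permutation is applied to the coordinates and to the colors:

```r
# Sketch: shuffle the plotting order so no color band is systematically on top.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rnorm(n)
z <- sort(runif(n))                       # sorted, so late points hide early ones
cols <- hcl.colors(10, "viridis")[as.integer(cut(z, 10))]

i <- sample(n)                            # random permutation of 1..n
plot(x[i], y[i], col = cols[i], pch = 19) # same index for points and colors
```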

Custom color palettes with colormap()

In addition to the “known” color palettes that are included in R and oce (see also the cmocean package below, as well as RColorBrewer), the colormap() function has arguments that allow for custom-built palettes: specifically the x0, x1, col0 and col1 arguments, which are detailed in the help file as:

x0, x1, col0, col1: Vectors that specify a color map.  They must all be
the same length, with ‘x0’ and ‘x1’ being numerical values,
and ‘col0’ and ‘col1’ being colors.  The colors may be
strings (e.g. ‘"red"’) or colors as defined by rgb or
hsv.


The idea is that the x0 values define the numeric level of the bottom of each color range, the x1 values define the top of each color range, and col0 and col1 give the colors associated with those levels. An example:

(Note that to make the above plot, I had to fix a bug in oce that was making the zcol come out as “black” for all cases. Either build oce from source, or wait for the update to get pushed to CRAN in a month or two).

Introduction

There is a recent trend in places like Twitter to include in your bio the atmospheric CO2 concentration when you were born. I like it, since it is both a neat measure of the range of ages of people that you can interact with (without being really about age per se), and also since it is a sobering reminder of just how much damage we as a species have done in a very short amount of time.

Anyway, there’s nothing overly complicated about figuring this out – probably a simple Google search would be enough to tell me what the atmospheric concentration of CO2 was when I was born. But where’s the fun in that? :)

The R co2 dataset

Handily, R comes bundled with an example dataset called co2 (as an example of a “time series”, or ts object), which contains monthly measurements of CO2 from the Mauna Loa observatory from 1959 up to 1997 (I wonder if we can get the R core team to update this dataset for the last 20 years ….).

My birthday is September, 1979, so let’s see where that lands on the curve:
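A minimal base R sketch of that plot. The birth variable is my own name, and it assumes the usual ts convention that January of a year is the year itself, so September is the year plus 8/12:

```r
# co2 ships with R (datasets package): monthly Mauna Loa CO2, 1959-1997
birth <- 1979 + 8 / 12                   # September 1979 in decimal years
plot(co2, ylab = "CO2 [ppm]")
abline(v = birth, col = "red", lty = 2)  # mark the month of interest
```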

It occurs to me looking at that graph that perhaps the raw monthly value isn’t the right number to choose, since I was clearly born at a seasonal minimum of CO2 concentration (i.e. at the end of Northern Hemisphere summer, when lots of atmospheric CO2 was locked up in plants). So, first I’ll figure out the “raw” value, and then next we’ll smooth the series to get something that is more representative of the background CO2 concentration.
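One way to pull out the raw monthly value is window(), which subsets a ts object by (year, month). This is a sketch of one possible approach, with raw as an illustrative variable name:

```r
# Extract the single monthly value for September 1979 from the ts object
raw <- as.numeric(window(co2, start = c(1979, 9), end = c(1979, 9)))
raw
```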

Concentration based on a smoothed time series

I love smoothing splines, so I’ll use one to smooth the co2 data before interpolating:
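A sketch of the smoothing step with base R's smooth.spline(). The df = 10 setting is an assumption on my part, chosen to be stiff enough that the spline follows the long-term trend rather than the seasonal cycle; a different smoothing choice will shift the answer slightly:

```r
# Fit a stiff smoothing spline to the monthly series, then evaluate it
# at September 1979 (decimal year, January = year + 0/12).
yr    <- as.numeric(time(co2))
fit   <- smooth.spline(yr, as.numeric(co2), df = 10)  # df = 10 is an assumption
birth <- 1979 + 8 / 12
smoothed <- predict(fit, x = birth)$y

plot(co2, ylab = "CO2 [ppm]")
lines(fit, col = "red", lwd = 2)         # smoothed background trend
abline(v = birth, lty = 2)
```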

So, I was born at 337 ppm.

A perfect CTD profile

I love the $\tanh$ function. A lot. It’s such a perfect model for a density interface in the ocean that it is commonly used in theoretical and numerical models, and I regularly use it for both research and demonstration/example purposes. Behold, a $\tanh$ interface:

$T(z) = T_0 + \delta T \tanh \left( \frac{z-z_0}{dz} \right)$
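For anyone who wants to play with it, here is a small sketch of that profile in R. The parameter values are invented for illustration, and z is taken as height (negative below the surface) so that the warm layer sits on top:

```r
# Idealized tanh interface; all parameter values are made up.
z      <- seq(-100, 0, by = 0.5)   # height [m], surface at 0
T0     <- 10                       # temperature at the interface centre
deltaT <- 5                        # half the temperature jump across it
z0     <- -50                      # interface depth
dz     <- 5                        # interface thickness scale

temp <- T0 + deltaT * tanh((z - z0) / dz)
plot(temp, z, type = "l", xlab = "Temperature", ylab = "z [m]")
```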

But whenever I use it, especially for teaching, I’m always saying how it’s idealized and really doesn’t represent what an ocean interface actually looks like. UNTIL NOW.

Yes, this is real data.

Just how close to a $\tanh$ is it?¹
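One way to put a number on that question is a nonlinear least-squares fit with nls(). The sketch below does not use the real profile; it fits a synthetic, noisy $\tanh$ profile with invented parameters, just to show the mechanics. The starting values are assumptions and would need to be eyeballed from the actual data:

```r
# Synthetic "observations": a tanh interface plus a little noise
set.seed(42)
z   <- seq(-100, 0, by = 1)
obs <- 10 + 5 * tanh((z - (-50)) / 5) + rnorm(length(z), sd = 0.05)

# Fit T(z) = T0 + dT * tanh((z - z0) / dz) by nonlinear least squares
fit <- nls(obs ~ T0 + dT * tanh((z - z0) / dz),
           start = list(T0 = 9.5, dT = 4.5, z0 = -48, dz = 6))
est <- coef(fit)                        # fitted T0, dT, z0, dz
rms <- sqrt(mean(residuals(fit)^2))     # misfit: how far from a pure tanh

plot(obs, z, xlab = "Temperature", ylab = "z [m]")
lines(predict(fit), z, col = "red", lwd = 2)
```

A small rms relative to the temperature jump is one reasonable measure of "how close".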

1. My PhD advisor, who also taught me introductory physical oceanography, once said to a class of students while using tanh to describe an idealized interface: “tanh – it’s like ‘lunch’, only better!”

Negating functions and function definitions: an 'opposite' function to the wonderful %in% operator

Introduction

R has some neat functions, and even some weird quirks, that you aren’t likely to discover on your own but can either be immensely helpful or horribly confounding.

For example, the “+” operator (i.e. addition) is actually a function, and can even be called using the typical “bracket” notation:

We can use backticks to evaluate the function as a “regular” function:

And can therefore call it as a “regular” function, using brackets to pass the arguments:
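The steps above, sketched in one place (the sapply() line is an extra illustration of why treating “+” as an ordinary function can be handy):

```r
1 + 2                   # the usual infix form
`+`                     # backticks give the function object itself
`+`(1, 2)               # the same addition, written as an ordinary call
sapply(1:3, `+`, 10)    # and it can be passed around like any other function
```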

One consequence of this is that it is possible to redefine how “+” works:
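A deliberately silly sketch of such a redefinition. The replacement (pasting instead of adding) is just an illustration, and it is wrapped in local() so the real “+” is untouched everywhere else:

```r
# Inside local(), "+" is temporarily replaced by a function that pastes
weird <- local({
  `+` <- function(e1, e2) paste(e1, e2)
  1 + 2
})
weird     # "1 2"
1 + 2     # 3, the original definition is intact outside local()
```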

Ok … admittedly that’s confusing. Why would you want to redefine “+”? Well, one example is given by the syntax of the ggplot2 package, which defines its own version of “+” that lets you string plotting functions together to build up a plot (e.g. see my post about plotting here).

The %in% function

The %in% function is one of those functions that just clicked when I started using R. It’s an elegant way to write conditional statements – it checks whether the object to the left of the operator occurs anywhere in the object on the right of the operator. An example:

Does the number 5 occur in the vector of 1 through 10?
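In code, that question reads:

```r
5 %in% 1:10    # TRUE
```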

Yes, it does (obviously).

One thing that I often find myself doing, however, is wanting to know if something doesn’t occur in another object. To make that work, I usually wrap the whole statement in brackets and then precede it with an ! operator (logical negation). Like this:
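A sketch of that bracketed form:

```r
!(11 %in% 1:10)    # TRUE
```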

This evaluates to TRUE because 11 is not in the vector 1:10. While this works, it’s always bugged me because it just looks inelegant.

Well, while browsing Twitter recently, I came across this post from @groundwalkergmb:

First, note the (confusing) syntax that you don’t actually have to negate the entire expression. It is equivalent to write:
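This works because %in% binds more tightly than the unary ! operator, so R evaluates the membership test first and negates the result. A quick check:

```r
!11 %in% 1:10                                  # TRUE, no brackets needed
identical(!11 %in% 1:10, !(11 %in% 1:10))      # TRUE, both forms agree
```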

Basically, you can use the Negate() function (never knew about this before) to create a new function which returns the logical negation of the output of the original.
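A minimal sketch of the idea. The operator name %nin% is my own choice for illustration (any name wrapped in percent signs works as a user-defined infix operator):

```r
# Build the "opposite" of %in% by negating it
`%nin%` <- Negate(`%in%`)

11 %nin% 1:10    # TRUE
5 %nin% 1:10     # FALSE
```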