Functional Programming: reduce

September 13, 2020

Continuing the theme of functional programming and higher-order functions, this one is about another example: reduce. In my previous post, I wrote about map and its use generally in functional programming and specifically in R. This tutorial will follow a similar structure.

map

(Image source: https://iamit.in/blog/Higher-order-functions-in-Python-sorted-map-reduce-filter/)

What is `reduce`?

reduce is quite an interesting function. It is sometimes known in other languages as fold. Like map it takes as input a function and a collection (e.g. list). Because this is functional programming, the magic lies in the input function. So, let me describe its behavior and how reduce uses it:

The function takes two argument. The first is an accumulator (we’ll call it acc), and the second would correspond to an element in the collection (we’ll call it x).
reduce applies that function on each element of the collection in a way that the result of the first call (on the first element) is used as the accumulator for the second call, which returns another value to be used as the accumulator for the third call, and so on… In other words, with each element in the collection, the accumulator gets updated until a final value is obtained.

Unlike map, I believe it is hard to grasp from an abstract point of view without examples, so I will get to that right away.

How to implement it in R?

Let’s start with a simple example. But before starting, I need to point out that the function’s name is Reduce() with a capital R.

If I want to sum a numeric vector c(1,2,3,4,5) with an algorithm that uses Reduce, the logic is that the accumulator starts at 0, gets added to the first element and 1 becomes the new accumulator. Then, it gets added to the second element and 3 becomes the new accumulator, and so on… So, the function that should be given to Reduce() takes an accumulator and a (new) element and adds them. You can define that as an anonymous function.

function(acc, x) {
    acc + x
}

After nailing down the function, we can now call Reduce() with the anonymous function and the vector as arguments.

Reduce(function(acc, x) {acc + x}, c(1,2,3,4))
# [1] 10

You can think that reduce performs the task like this: (((1 + 2) + 3) + 4)

Side tip: you could actually use the plus sign wrapped in backticks `+` which means you are using this infix operator as a function to be passed (e.g. as an argument to Reduce()).

Practical example

The previous example was a little bit silly, although you have to get silly sometimes to introduce new concepts based on what you already know and are comfortable with. The next example is a bit more practical. Our goal here is merge two data frames by inserting one data frame at a specific index (i.e. in the middle) of another data frame.

Consider these example data frames:

d <- data.frame(A = 1:4, B = c("AA", "BB", "DD", "EE"))
to_insert <- data.frame(A = 2.5, B = "CC")

d$ID <- seq(nrow(d))
to_insert$ID <- 2.5

d
#   A  B ID
# 1 1 AA  1
# 2 2 BB  2
# 3 3 DD  3
# 4 4 EE  4

to_insert
#     A  B  ID
# 1 2.5 CC 2.5

What I want to do is basically insert the data frame to_insert between row 2 and row 3 of the data frame d. There are two main things that need to be done to get a Reduce() algorithm ready.

Make a list out of the data frame

Reduce() need a collection to iterate (and accumulate) over. In this case, we need to build a “collection of rows”. This is easily done with the function split().

d_split <- split(d, d$ID)
d_split
# $`1`
#   A  B ID
# 1 1 AA  1
#
# $`2`
#   A  B ID
# 2 2 BB  2
#
# $`3`
#   A  B ID
# 3 3 DD  3
#
# $`4`
#   A  B ID
# 4 4 EE  4
#

That takes care of one argument.

Build the accumulating function

The accumulation in the previous example was done by addition. But in this example, accumulation is actually row-binding. So, the accumulator will be a data frame, and I will keep adding the rows of the list to that accumulator until I’ve reached the index of interest, in which case I will also bind the data frame I want to insert before continuing the accumulation process.

fn <- function(acc, row) {

  if (row$ID == 2) {
    rbind(acc, row, to_insert)
  } else {
    rbind(acc, row)
  }

}

Putting it together

The final step is merely calling Reduce() with the pieces we built.

Reduce(fn, d_split)
#      A  B  ID
# 1  1.0 AA 1.0
# 2  2.0 BB 2.0
# 11 2.5 CC 2.5
# 3  3.0 DD 3.0
# 4  4.0 EE 4.0

From there, you could easily work your way forward by abstracting this algorithm into a function, or maybe adjust the row names.

That’s it for reduce! Another interesting functional programming concept, and R already has it built in.

P.S. The last example was inspired from a LinkedIn post by Adrian Olszewski in which he used a similar implementation but with do.call() instead of Reduce().

Tags: