Functional Programming: reduce
Continuing the theme of functional programming and
higher-order functions,
this one is about another example: reduce
. In my
previous post, I wrote about map
and its use
generally in functional programming and specifically in R. This tutorial will
follow a similar structure.
(Image source: https://iamit.in/blog/Higher-order-functions-in-Python-sorted-map-reduce-filter/)
What is reduce
?
reduce
is quite an interesting function. It is sometimes known in other
languages as fold
. Like map
it takes as input a function and a collection
(e.g. list). Because this is functional programming, the magic lies in the
input function. So, let me describe its behavior and how reduce
uses it:
- The function takes two argument. The first is an accumulator (we’ll call
it
acc
), and the second would correspond to an element in the collection (we’ll call itx
). reduce
applies that function on each element of the collection in a way that the result of the first call (on the first element) is used as the accumulator for the second call, which returns another value to be used as the accumulator for the third call, and so on… In other words, with each element in the collection, the accumulator gets updated until a final value is obtained.
Unlike map
, I believe it is hard to grasp from an abstract point of view
without examples, so I will get to that right away.
How to implement it in R?
Let’s start with a simple example. But before starting, I need to point out
that the function’s name is Reduce()
with a capital R.
If I want to sum a numeric vector c(1,2,3,4,5)
with an algorithm that uses
Reduce
, the logic is that the accumulator starts at 0, gets added to the
first element and 1 becomes the new accumulator. Then, it gets added to the
second element and 3 becomes the new accumulator, and so on… So, the
function that should be given to Reduce()
takes an accumulator and a (new)
element and adds them. You can define that as an anonymous function.
function(acc, x) {
acc + x
}
After nailing down the function, we can now call Reduce()
with the
anonymous function and the vector as arguments.
Reduce(function(acc, x) {acc + x}, c(1,2,3,4))
# [1] 10
You can think that reduce
performs the task like this:
(((1 + 2) + 3) + 4)
Side tip: you could actually use the plus sign wrapped in backticks `+`
which means you are using this
infix operator as a function
to be passed (e.g. as an argument to Reduce()
).
Practical example
The previous example was a little bit silly, although you have to get silly sometimes to introduce new concepts based on what you already know and are comfortable with. The next example is a bit more practical. Our goal here is merge two data frames by inserting one data frame at a specific index (i.e. in the middle) of another data frame.
Consider these example data frames:
d <- data.frame(A = 1:4, B = c("AA", "BB", "DD", "EE"))
to_insert <- data.frame(A = 2.5, B = "CC")
d$ID <- seq(nrow(d))
to_insert$ID <- 2.5
d
# A B ID
# 1 1 AA 1
# 2 2 BB 2
# 3 3 DD 3
# 4 4 EE 4
to_insert
# A B ID
# 1 2.5 CC 2.5
What I want to do is basically insert the data frame to_insert
between row
2 and row 3 of the data frame d
. There are two main things that need to be
done to get a Reduce()
algorithm ready.
Make a list out of the data frame
Reduce()
need a collection to iterate (and accumulate) over. In this case, we need to build a “collection of rows”. This is easily done with the function split()
.
d_split <- split(d, d$ID)
d_split
# $`1`
# A B ID
# 1 1 AA 1
#
# $`2`
# A B ID
# 2 2 BB 2
#
# $`3`
# A B ID
# 3 3 DD 3
#
# $`4`
# A B ID
# 4 4 EE 4
#
That takes care of one argument.
Build the accumulating function
The accumulation in the previous example was done by addition. But in this example, accumulation is actually row-binding. So, the accumulator will be a data frame, and I will keep adding the rows of the list to that accumulator until I’ve reached the index of interest, in which case I will also bind the data frame I want to insert before continuing the accumulation process.
fn <- function(acc, row) {
if (row$ID == 2) {
rbind(acc, row, to_insert)
} else {
rbind(acc, row)
}
}
Putting it together
The final step is merely calling Reduce()
with the pieces we built.
Reduce(fn, d_split)
# A B ID
# 1 1.0 AA 1.0
# 2 2.0 BB 2.0
# 11 2.5 CC 2.5
# 3 3.0 DD 3.0
# 4 4.0 EE 4.0
From there, you could easily work your way forward by abstracting this algorithm into a function, or maybe adjust the row names.
That’s it for reduce
! Another interesting functional programming concept,
and R already has it built in.
P.S. The last example was inspired from
a LinkedIn post by Adrian Olszewski
in which he used a similar implementation but with do.call()
instead of
Reduce()
.