MapReduce in R - Prime Numbers

This R sample gets you started with MapReduce in R. We follow naming conventions that let you run MapReduce algorithms developed in this environment on Hadoop using, for example, RHadoop. You may need to run install.packages("plyr") before using this, as the plyr package is used for grouping.

Implementation

First of all, let's load the plyr package.

library(plyr);

Now let's initialize the input values.

n_primes = 10000;  # upper bound of the range to test, not the number of primes
test_values = 2:n_primes;

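We define a map function which tests whether a number is prime and emits a key/value pair. The key is 1 for primes and 0 for non-primes; the value is always 1, so the reducer can count the members of each group by summing.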

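mapfunc_isprime <- function(test_number)
{
  # key = 1 marks a prime, key = 0 a non-prime; value = 1 lets the reducer count by summing
  if (test_number == 2)
    return(list(key = 1, value = 1));

  # Trial division up to sqrt(test_number); seq_len(...)[-1] is empty when that limit is below 2
  for (i in seq_len(floor(sqrt(test_number)))[-1])
    if (test_number %% i == 0)
      return(list(key = 0, value = 1));

  return(list(key = 1, value = 1));
}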
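
A quick sanity check of the map function on small inputs might look like the sketch below. The name mapfunc_isprime_demo is a hypothetical stand-in, defined here only so the snippet runs on its own; it is assumed to emit the same keys as the map function above.

```r
# Hypothetical stand-in for the map function, so this snippet is self-contained.
mapfunc_isprime_demo <- function(test_number) {
  # Candidate divisors 2..floor(sqrt(n)); the vector is empty when that limit is below 2.
  divisors <- seq_len(floor(sqrt(test_number)))[-1]
  is_prime <- all(test_number %% divisors != 0)
  list(key = as.numeric(is_prime), value = 1)
}

# Collect only the emitted keys for the numbers 2..10.
demo_keys <- sapply(2:10, function(n) mapfunc_isprime_demo(n)$key)
print(demo_keys)  # keys for 2..10: 1 1 0 1 0 1 0 0 0
```

The primes 2, 3, 5 and 7 receive key 1 and everything else receives key 0, which is the grouping the later steps rely on.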

Step 1: Apply the map function to every number.

mapresult = lapply(test_values, mapfunc_isprime) 

Step 2: Sort using key.

map_result_sorted = mapresult[order(sapply(mapresult,function(x){x$key}))]
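
To see what the sort step does, here is a minimal sketch on a hypothetical three-element map output (toy_pairs and the other toy_ names are illustrative only). order() returns the permutation that sorts the keys, and indexing the list with it brings pairs with equal keys next to each other.

```r
# Hypothetical miniature map output: three key/value pairs in arrival order.
toy_pairs <- list(list(key = 1, value = 1),
                  list(key = 0, value = 1),
                  list(key = 1, value = 1))

# Extract the keys, compute the sorting permutation, and reorder the list.
toy_keys <- sapply(toy_pairs, function(x) x$key)
toy_sorted <- toy_pairs[order(toy_keys)]

sorted_keys <- sapply(toy_sorted, function(x) x$key)
print(sorted_keys)  # 0 1 1
```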

Step 2b: Make a table. The columns are extracted directly from the sorted list; using ldply for this also works, but is very slow.

mapresult_df = data.frame(key   = unlist(lapply(map_result_sorted, "[", "key")),
                          value = unlist(lapply(map_result_sorted, "[", "value")))
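
For comparison, the ldply variant mentioned in this step could look roughly like the sketch below, shown on a hypothetical three-element list (toy_pairs and toy_df are illustrative names). It builds one single-row data frame per pair and binds them together, which is convenient but is assumed to be the reason it becomes slow on thousands of pairs.

```r
library(plyr)

# Hypothetical miniature sorted map output.
toy_pairs <- list(list(key = 0, value = 1),
                  list(key = 1, value = 1),
                  list(key = 1, value = 1))

# ldply applies the function to each list element and row-binds the results.
toy_df <- ldply(toy_pairs, function(p) data.frame(key = p$key, value = p$value))
print(toy_df)
```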

Step 3: Apply the reducer over groups. Summing the values per key counts the non-primes (key 0) and the primes (key 1).

reduce_result = ddply(mapresult_df, .(key), function(x) sum(x$value))
print(reduce_result);
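
The reduce step can be cross-checked without plyr. The sketch below is a base R variant on the smaller range 2:100, where the counts are easy to verify by hand: there are 25 primes up to 100, so the remaining 74 numbers are non-primes. The helper is_prime_check and the check_ variables are hypothetical names that are not part of the sample above. For the full range 2:10000, the key 1 group should sum to 1229, the number of primes below 10000.

```r
# Hypothetical helper: trial division, written independently of the map function.
is_prime_check <- function(n) n == 2 || all(n %% 2:max(2, floor(sqrt(n))) != 0)

# Build the key/value table for a small range.
check_values <- 2:100
check_df <- data.frame(key = as.numeric(sapply(check_values, is_prime_check)),
                       value = 1)

# Base R reduce: sum the values within each key group.
check_counts <- tapply(check_df$value, check_df$key, sum)
print(check_counts)  # key 0: 74 non-primes, key 1: 25 primes
```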

Online Resources

From functional programming to MapReduce in R Rhadoop

Author: Artem Leichter
Last modified: 2019-05-02