Embarrassingly Parallel Computations with Ray

Ray is an excellent framework for large-scale distributed computations in Python. In this blog post, I demonstrate a simple example of Ray’s ability to run an embarrassingly parallel computation with minimal, straightforward source code. It is well known that if we throw dots uniformly at random inside a square with an inscribed circle, the fraction of dots that fall inside the circle approaches π/4 (where π ≈ 3.1415926…) as the total number of dots approaches infinity. Figure 1 illustrates this concept.

Figure 1: Estimating Pi using a Monte Carlo computation.
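
The π/4 ratio follows from comparing areas. As a short derivation, assuming a circle of radius r inscribed in a square of side 2r (the symbols r, N_in, and N are introduced here only for illustration, with N_in the number of dots inside the circle and N the total number of dots):

```latex
\[
\frac{A_{\text{circle}}}{A_{\text{square}}}
  = \frac{\pi r^{2}}{(2r)^{2}}
  = \frac{\pi}{4},
\qquad\text{so}\qquad
\pi \approx 4\,\frac{N_{\text{in}}}{N}.
\]
```

The same ratio holds for a quarter circle of radius 1 inside the unit square, which is the region the code in this post samples.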

Numerical methods that are based on random numbers are called Monte Carlo computations. The accuracy of the estimate improves with the sample size, but a larger sample also increases computation time. Fortunately, this algorithm is easy to parallelize. Because the random samples are independent of one another, parallel tasks can run the same algorithm simultaneously without interfering with each other, provided each task draws its own random numbers. The only remaining step is to combine the partial results of these independent computations into a single final result using a reduction operation.
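
To make the "independent chunks plus a reduction" idea concrete before bringing in Ray, here is a minimal serial sketch using only the standard library. The function names (sample_chunk, estimate_pi, standard_error) and the choice of consecutive integer seeds are assumptions made for illustration; distinct seeds give each chunk its own reproducible stream, but they are not a rigorous guarantee of statistical independence. The standard_error helper uses the binomial variance to show how the expected error shrinks with the square root of the sample size.

```python
import math
import random


def sample_chunk(num_samples, seed):
    """Count points that land inside the quarter circle, using a private RNG."""
    rng = random.Random(seed)  # one generator per chunk; the seed choice is an assumption
    count = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            count += 1
    return count


def estimate_pi(num_chunks, samples_per_chunk):
    """Run the chunks one after another and reduce the partial counts with sum()."""
    counts = [sample_chunk(samples_per_chunk, seed) for seed in range(num_chunks)]
    return 4 * sum(counts) / (num_chunks * samples_per_chunk)


def standard_error(total_samples):
    """Approximate standard error of the pi estimate for a given sample size."""
    p = math.pi / 4  # probability that a single point lands inside the circle
    return 4 * math.sqrt(p * (1 - p) / total_samples)


if __name__ == "__main__":
    n_chunks, n_per_chunk = 8, 100_000
    print(f"estimate: {estimate_pi(n_chunks, n_per_chunk)}")
    print(f"expected error scale: {standard_error(n_chunks * n_per_chunk):.5f}")
```

Each call to sample_chunk depends only on its own arguments, which is what makes the problem embarrassingly parallel: the chunks could run on different cores or machines in any order, and sum() would still give the same total. Note also that 100 times more samples buys only about one extra decimal digit of accuracy.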

Figure 2 shows the short Python code that implements this with Ray, together with the output of a run in VS Code.

Figure 2: Python code using Ray in VS Code.

The full code is enclosed below:

import ray
import random

# Initialize Ray
ray.init(ignore_reinit_error=True)

@ray.remote
def sample_pi(num_samples):
    count = 0
    for _ in range(num_samples):
        x, y = random.random(), random.random()
        if x*x + y*y <= 1.0:
            count += 1
    return count

# Number of samples for each task
num_samples = 10000000
# Number of tasks
num_tasks = 100

# Submit tasks to Ray
counts = ray.get([sample_pi.remote(num_samples) for _ in range(num_tasks)])

# Calculate the estimated value of pi
total_samples = num_samples * num_tasks
pi_estimate = 4 * sum(counts) / total_samples

print(f"Estimated value of pi: {pi_estimate}")

# Shutdown Ray
ray.shutdown()

I recommend verifying that the code really runs in parallel by monitoring your computer’s resource usage while it executes, for example with your system’s task manager. On Linux systems, tools like top or htop are particularly useful for observing CPU utilization in real time. Figure 3 provides an example of this process.

Figure 3: Verifying the parallel execution of the code by watching htop while it runs.
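
Besides watching CPU utilization, a rough timing comparison can also indicate whether the work is being spread across cores. The sketch below uses only the standard library: it times one task run serially and computes an idealized wall time for a given number of tasks and cores. The helper names (sample_pi_serial, time_one_task, ideal_parallel_seconds) are hypothetical, and the estimate assumes equal task durations and ignores scheduling overhead, so treat it as a ballpark figure to compare against the measured Ray run, not a prediction.

```python
import os
import random
import time


def sample_pi_serial(num_samples):
    """Same sampling loop as the Ray task, run in the current process."""
    count = 0
    for _ in range(num_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            count += 1
    return count


def time_one_task(num_samples):
    """Return the wall-clock seconds that one serial task takes."""
    start = time.perf_counter()
    sample_pi_serial(num_samples)
    return time.perf_counter() - start


def ideal_parallel_seconds(seconds_per_task, num_tasks, num_cpus):
    """Idealized wall time if equal tasks are spread evenly over num_cpus cores."""
    waves = -(-num_tasks // num_cpus)  # ceiling division: rounds of tasks per core
    return seconds_per_task * waves


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    per_task = time_one_task(200_000)  # reduced sample size so the check is quick
    print(f"cores: {cpus}, one small task: {per_task:.3f} s")
    print(f"idealized time for 100 such tasks: "
          f"{ideal_parallel_seconds(per_task, 100, cpus):.3f} s")
```

If the measured parallel run takes about as long as all tasks run back to back, the tasks are probably not executing concurrently, and the htop view should confirm it.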

You are welcome to try this short example yourself and leave me a comment below.

Guy
