
Python Multiprocessing: A Finesse Approach

git commit -m "always keep learning"

A declaration

This content is not AI-generated whatsoever, so help me God.

Introduction

It’s 2025, and 18-year-old me would never have guessed I’d be writing a blog post about Python. As a platform engineer, I spend a lot of my time running database migrations, and Python scripts have become something I can depend on to get the job done: simulating migrations without hard writes, and so on.

Today, though, we aren’t here to talk about migrations, but about how I leverage Python multiprocessing to run them quickly. Multiprocessing is not concurrency; it is “true” parallelism, i.e. doing multiple things at the same time.

A back story

Say you are a painter with 100 rooms to paint in an apartment building. Your options:

  • Single Processing (traditional approach)

    You are one painter: you paint room 1, finish, move on to room 2, and so on until you reach room 100.

    The only advantage is memory: one painter takes up the space of just one room at a time. The downside is time: at 30 minutes per room, that is 30 × 100 = 3000 minutes, roughly 50 hours.

  • Multi Processing

    Another approach: say you hire 10 painters. Each painter represents a process, and each takes up its own memory. During execution, painter 1 takes rooms 1 to 10, painter 2 takes rooms 11 to 20, and so forth. For a bit of extra memory, you save time: the 50 hours are cut to
    10 rooms × 30 mins = 300 mins, roughly 5 hours. Way faster.

Python Interpretation

The above can be represented by the code snippet below

from multiprocessing import Pool

# This is like having one painter
for room in rooms:
    paint_room(room)  # Takes 30 minutes each

# This is like having multiple painters
with Pool(processes=10) as pool:
    pool.map(paint_room, rooms)  # All painters work simultaneously

Limitations of Multiprocessing

Based on my experience, multiprocessing can be a bad fit for production depending on the use case.

  1. Number of Cores

    There is no faking it till you make it here, and I learnt that the hard way after seeing very weird behaviour in database updates. More workers never automatically means faster execution. Here is a breakdown:

     import time
     from multiprocessing import Pool

     def cpu_intensive_task(n):
         total = 0
         for i in range(10000000):
             total += i * n
         return total

     def benchmark_workers(max_workers=32, tasks=32):
         results = []

         for num_workers in range(1, max_workers + 1):
             start = time.time()

             with Pool(processes=num_workers) as pool:
                 pool.map(cpu_intensive_task, range(tasks))

             elapsed = time.time() - start
             results.append({
                 'workers': num_workers,
                 'time': elapsed,
                 'tasks_per_second': tasks / elapsed
             })
             print(f"Workers: {num_workers:3d} | Time: {elapsed:.2f}s | Tasks/sec: {tasks/elapsed:.2f}")

         return results

     # Run benchmark (the __main__ guard matters: child processes may re-import this module)
     if __name__ == '__main__':
         results = benchmark_workers(max_workers=32, tasks=32)
    
     Typical results pattern (8-core machine):
     Workers:   1 | Time: 32.00s | Tasks/sec: 1.00   ← Baseline
     Workers:   2 | Time: 16.10s | Tasks/sec: 1.99   ← ~2x faster ✓
     Workers:   4 | Time:  8.20s | Tasks/sec: 3.90   ← ~4x faster ✓
     Workers:   6 | Time:  5.60s | Tasks/sec: 5.71   ← ~6x faster ✓
     Workers:   8 | Time:  4.10s | Tasks/sec: 7.80   ← ~8x faster ✓
     Workers:  10 | Time:  4.05s | Tasks/sec: 7.90   ← Marginal improvement
     Workers:  12 | Time:  4.02s | Tasks/sec: 7.96   ← Barely faster
     Workers:  16 | Time:  4.00s | Tasks/sec: 8.00   ← Plateauing
     Workers:  20 | Time:  4.01s | Tasks/sec: 7.98   ← No improvement
     Workers:  32 | Time:  4.15s | Tasks/sec: 7.71   ← SLOWER! ✗

     Why this happens:

     With 8 cores, 8 workers (optimal):
     Time →
     Core 1: [Worker 1 ████████████████]
     Core 2: [Worker 2 ████████████████]
     Core 3: [Worker 3 ████████████████]
     Core 4: [Worker 4 ████████████████]
     Core 5: [Worker 5 ████████████████]
     Core 6: [Worker 6 ████████████████]
     Core 7: [Worker 7 ████████████████]
     Core 8: [Worker 8 ████████████████]
    
     Each worker gets dedicated CPU time = FAST

     With 8 cores, 16 workers (oversubscribed):
     Time →
     Core 1: [W1 ██][W9 ██][W1 ██][W9 ██]  ← Context switching
     Core 2: [W2 ██][W10 ██][W2 ██][W10 ██]
     Core 3: [W3 ██][W11 ██][W3 ██][W11 ██]
     Core 4: [W4 ██][W12 ██][W4 ██][W12 ██]
     Core 5: [W5 ██][W13 ██][W5 ██][W13 ██]
     Core 6: [W6 ██][W14 ██][W6 ██][W14 ██]
     Core 7: [W7 ██][W15 ██][W7 ██][W15 ██]
     Core 8: [W8 ██][W16 ██][W8 ██][W16 ██]
    
     Workers fight for CPU time = OVERHEAD
    
  2. Shared Resources

    Think of this like all 10 painters trying to use the same brush at once. That would be chaotic. The same goes for database connections, file handles (reading, writing, etc.), HTTP requests, and in-memory variables. For example:

     # WRONG - a file handle shared across processes leads to
     # interleaved or lost writes (and pickling errors under spawn)
     file = open('data.txt', 'w')

     def worker(data):
         file.write(data)  # unsafe: processes race on the same handle

     with Pool(5) as pool:
         pool.map(worker, ['data1', 'data2'])

     # CORRECT - each process opens its own file
     import os

     def worker(data):
         with open(f'data_{os.getpid()}.txt', 'w') as file:
             file.write(data)
    
  3. The multiprocessing library ships a Manager that can coordinate shared variables

    From the above, we have already established that shared resources are tricky to handle in parallel. So how do you handle them?

    1. For non-mutating (read-only) variables, it’s fine: each worker gets its own copy of the global variable

       LARGE_LOOKUP_TABLE = {'painter_name': 'Patrick', ...}
      
       def worker(item):
           value = LARGE_LOOKUP_TABLE.get(item)
      
    2. Shared Variables

       from multiprocessing import Value, Array, Manager

       # Simple: a typed shared value with a built-in lock.
       # Pass it via Process args or a Pool initializer;
       # it cannot be pickled through pool.map directly.
       counter = Value('i', 0)
       def worker(n, counter):
           with counter.get_lock():
               counter.value += 1

       # Complex: proxy objects served by a separate manager process
       manager = Manager()
       results_list = manager.list()
       results_dict = manager.dict()
      

When to use Multiprocessing

As I sign out: before you prompt that agent to rewrite your backend APIs to use parallelism, there are a few things to consider.

Good for:

  • I/O-bound tasks (database queries, API calls), though threads or asyncio are often lighter-weight here

  • CPU-intensive tasks with independent work units

  • When you have many similar tasks

Not good for:

  • Tasks that need to share state frequently

  • Simple, fast operations (overhead might make it slower)

  • When you're near memory limits

This is just an introduction. Let me know if you are interested in the advanced stuff, like inter-process communication, which is quite similar to Golang channels but a bit more complicated.
Adios ✌️