
Python Multiprocessing: A Finesse Approach

git commit -m "always keep learning"

A declaration

This content is not AI-generated whatsoever, so help me God.

Introduction

It’s 2025, and 18-year-old me would never have guessed I’d be writing a blog post about Python. As a platform engineer, I spend a lot of my time running database migrations, and Python scripts have become something I can depend on to get the job done: simulating migrations without hard writes, and so on.

Today, though, we aren’t here to talk about migrations, but about how I leverage Python multiprocessing to run them quickly. Multiprocessing is not concurrency; it is “true” parallelism, i.e. doing multiple things at the same time.

A back story

Say you are a painter with 100 rooms to paint in an apartment building. Your options:

  • Single Processing (traditional approach)

    You are one painter: you paint room 1, finish, move on to room 2, and so on until you reach room 100.

    The only advantage is memory: one painter takes up the space of just one room at a time. The downside is time: at 30 minutes per room, that is 30 × 100 = 3000 minutes, roughly 50 hours.

  • Multi Processing

    Another approach: say you hire 10 painters. Each painter represents a process, and each takes up its own memory. During execution, painter 1 takes rooms 1 to 10, painter 2 takes rooms 11 to 20, and so forth. For a bit of extra memory, you save time: the 50 hours are cut to
    10 rooms × 30 mins = 300 mins, roughly 5 hours. Way faster.

Python Interpretation

The above can be represented by the code snippet below

from multiprocessing import Pool

# This is like having one painter
for room in rooms:
    paint_room(room)  # Takes 30 minutes each

# This is like having multiple painters
with Pool(processes=10) as pool:
    pool.map(paint_room, rooms)  # All painters work simultaneously

Limitations of Multiprocessing

Based on my experience, multiprocessing can be a bad fit for production depending on the use case.

  1. Number of Cores

    There is no faking it till you make it here, and I learnt that the hard way after seeing very weird behaviour in database updates. More workers never automatically means faster execution. Here is a breakdown:

     import time
     from multiprocessing import Pool

     def cpu_intensive_task(n):
         total = 0
         for i in range(10000000):
             total += i * n
         return total

     def benchmark_workers(max_workers=32, tasks=32):
         results = []

         for num_workers in range(1, max_workers + 1):
             start = time.time()

             with Pool(processes=num_workers) as pool:
                 pool.map(cpu_intensive_task, range(tasks))

             elapsed = time.time() - start
             results.append({
                 'workers': num_workers,
                 'time': elapsed,
                 'tasks_per_second': tasks / elapsed
             })
             print(f"Workers: {num_workers:3d} | Time: {elapsed:.2f}s | Tasks/sec: {tasks/elapsed:.2f}")

         return results

     # Run benchmark (the __main__ guard matters: child processes may re-import this module)
     if __name__ == '__main__':
         results = benchmark_workers(max_workers=32, tasks=32)
    
     Typical results pattern (8-core machine):
     Workers:   1 | Time: 32.00s | Tasks/sec: 1.00   ← Baseline
     Workers:   2 | Time: 16.10s | Tasks/sec: 1.99   ← ~2x faster ✓
     Workers:   4 | Time:  8.20s | Tasks/sec: 3.90   ← ~4x faster ✓
     Workers:   6 | Time:  5.60s | Tasks/sec: 5.71   ← ~6x faster ✓
     Workers:   8 | Time:  4.10s | Tasks/sec: 7.80   ← ~8x faster ✓
     Workers:  10 | Time:  4.05s | Tasks/sec: 7.90   ← Marginal improvement
     Workers:  12 | Time:  4.02s | Tasks/sec: 7.96   ← Barely faster
     Workers:  16 | Time:  4.00s | Tasks/sec: 8.00   ← Plateauing
     Workers:  20 | Time:  4.01s | Tasks/sec: 7.98   ← No improvement
     Workers:  32 | Time:  4.15s | Tasks/sec: 7.71   ← SLOWER! ✗

     Why this happens:

     With 8 cores, 8 workers (optimal):
     Time →
     Core 1: [Worker 1 ████████████████]
     Core 2: [Worker 2 ████████████████]
     Core 3: [Worker 3 ████████████████]
     Core 4: [Worker 4 ████████████████]
     Core 5: [Worker 5 ████████████████]
     Core 6: [Worker 6 ████████████████]
     Core 7: [Worker 7 ████████████████]
     Core 8: [Worker 8 ████████████████]
    
     Each worker gets dedicated CPU time = FAST

     With 8 cores, 16 workers (oversubscribed):
     Time →
     Core 1: [W1 ██][W9 ██][W1 ██][W9 ██]  ← Context switching
     Core 2: [W2 ██][W10 ██][W2 ██][W10 ██]
     Core 3: [W3 ██][W11 ██][W3 ██][W11 ██]
     Core 4: [W4 ██][W12 ██][W4 ██][W12 ██]
     Core 5: [W5 ██][W13 ██][W5 ██][W13 ██]
     Core 6: [W6 ██][W14 ██][W6 ██][W14 ██]
     Core 7: [W7 ██][W15 ██][W7 ██][W15 ██]
     Core 8: [W8 ██][W16 ██][W8 ██][W16 ██]
    
     Workers fight for CPU time = OVERHEAD
    
  2. Shared Resources

    Think of this like all 10 painters trying to use the same brush at once. That would be chaotic. The same goes for database connections, file handles (reading, writing, etc.), HTTP requests, and in-memory variables. For example:

     # WRONG - a file handle shared across processes leads to
     # interleaved or lost writes (and pickling errors under spawn)
     file = open('data.txt', 'w')

     def worker(data):
         file.write(data)  # unsafe: processes race on the same handle

     with Pool(5) as pool:
         pool.map(worker, ['data1', 'data2'])

     # CORRECT - each process opens its own file
     import os

     def worker(data):
         with open(f'data_{os.getpid()}.txt', 'w') as file:
             file.write(data)
    
  3. The multiprocessing library ships a Manager that can coordinate shared variables

    From the above, we have already established that shared resources are tricky to handle in parallel. So how do you handle them?

    1. For non-mutating (read-only) variables, it’s fine: each worker gets its own copy of the global variable

       LARGE_LOOKUP_TABLE = {'painter_name': 'Patrick', ...}
      
       def worker(item):
           value = LARGE_LOOKUP_TABLE.get(item)
      
    2. Shared Variables

       from multiprocessing import Value, Array, Manager

       # Simple: a typed shared value with a built-in lock.
       # Pass it via Process args or a Pool initializer;
       # it cannot be pickled through pool.map directly.
       counter = Value('i', 0)
       def worker(n, counter):
           with counter.get_lock():
               counter.value += 1

       # Complex: proxy objects served by a separate manager process
       manager = Manager()
       results_list = manager.list()
       results_dict = manager.dict()
      

When to use Multiprocessing

As I sign out: before you prompt that agent to rewrite your backend APIs to use parallelism, there are a few things to consider.

Good for:

  • I/O-bound tasks (database queries, API calls), though threads or asyncio are often lighter-weight here

  • CPU-intensive tasks with independent work units

  • When you have many similar tasks

Not good for:

  • Tasks that need to share state frequently

  • Simple, fast operations (overhead might make it slower)

  • When you're near memory limits

This is just an introduction. Let me know if you are interested in the advanced stuff, like inter-process communication, which is quite similar to Golang channels but a bit more complicated.
Adios ✌️