Optimizing File Processing in Python for High Performance
Discover how to accelerate file processing in Python using benchmark-driven techniques inspired by high-performance parsers in Rust, including lazy iteration, zero-copy strategies, and multiprocessing.
Introduction
Efficient file processing is a cornerstone of building high-performance backend systems, especially when dealing with large-scale data like logs, financial transactions, or sensor streams. Inspired by a Rust-based parser that achieved dramatic speedups through strategic optimization, this post explores how to achieve similar performance gains in Python. We’ll process math expressions from a file and walk through various optimization techniques — from baseline implementations to parallel processing.
Table of Contents
- Baseline Implementation in Python
- Optimization 1: Avoid Unnecessary Memory Allocations
- Optimization 2: Use Byte Processing with Minimal Parsing
- Optimization 3: Avoid Lookahead with Peekable
- Optimization 4: Parallel Processing with Multiprocessing
- Optimization 5: Memory-Mapped File Access
- Conclusion
- What’s Next?
- References
Baseline Implementation in Python
The baseline implementation reads the full file into memory, tokenizes it, and evaluates mathematical expressions line by line with a recursive-descent parser. This naive approach is simple, but it does not scale to large inputs.
def read_input_file(path):
    with open(path, 'r') as f:
        return f.read()

def tokenize(expression):
    # Whitespace-separated tokens: integers become ('NUMBER', value),
    # operators and parentheses pass through unchanged.
    for token in expression.split():
        if token.isdigit():
            yield ('NUMBER', int(token))
        else:
            yield (token, None)
def parse_expression(tokens):
    def parse_primary(tokens):
        token, value = next(tokens)
        if token == '(':
            # parse_expression consumes the matching ')' before returning.
            return parse_expression(tokens)
        elif token == 'NUMBER':
            return value
        raise ValueError("Unexpected token: %r" % token)

    result = parse_primary(tokens)
    for token, _ in tokens:
        if token == ')':
            break
        right = parse_primary(tokens)
        if token == '+':
            result += right
        elif token == '-':
            result -= right
    return result
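To tie the pieces together, here is a minimal driver that reads the whole file and evaluates it line by line. The evaluate_file name and the expressions.txt path are illustrative, not part of the original benchmark:

def evaluate_file(path):
    # Baseline driver: load everything, then evaluate one expression per line.
    results = []
    for line in read_input_file(path).splitlines():
        line = line.strip()
        if line:
            results.append(parse_expression(tokenize(line)))
    return results

print(evaluate_file('expressions.txt'))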
Optimization 1: Avoid Unnecessary Memory Allocations
Instead of collecting tokens into a list, we use generator-based lazy evaluation to reduce memory pressure. This avoids storing large intermediate structures and processes data on-the-fly.
def tokenize_lazy(expression):
    for part in expression.split():
        if part.isdigit():
            yield ('NUMBER', int(part))
        else:
            yield (part, None)
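The same lazy principle can be pushed up to the file itself: iterating over the file object yields one line at a time, so the full input never has to sit in memory at once. A minimal sketch (the process_file_lazily name is introduced here for illustration):

def process_file_lazily(path):
    # Stream the file line by line; only the current line and its
    # token generator are alive at any moment.
    with open(path, 'r') as f:
        for line in f:
            line = line.strip()
            if line:
                yield parse_expression(tokenize_lazy(line))

Because this is itself a generator, results can be consumed incrementally, for example with sum(process_file_lazily(path)).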
Optimization 2: Use Byte Processing with Minimal Parsing
Working with raw bytes speeds up processing by reducing string-manipulation overhead. In this version, the input is never decoded into intermediate string objects; numbers are accumulated directly from the byte values.
def tokenize_bytes(data):
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in b' \t\r\n':
            # Skip whitespace.
            i += 1
            continue
        elif byte in b'+-()':
            yield (chr(byte), None)
            i += 1
        elif byte in b'0123456789':
            # Accumulate a multi-digit number without slicing or decoding.
            num = 0
            while i < len(data) and data[i] in b'0123456789':
                num = num * 10 + (data[i] - 48)
                i += 1
            yield ('NUMBER', num)
        else:
            raise ValueError("Unexpected byte: %r" % bytes([byte]))
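Feeding the tokenizer requires opening the file in binary mode. A short usage sketch (the eval_bytes_file name is illustrative):

def eval_bytes_file(path):
    # Open in binary mode so tokenize_bytes() receives raw bytes, not str.
    with open(path, 'rb') as f:
        return [parse_expression(tokenize_bytes(line))
                for line in f if line.strip()]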
Optimization 3: Avoid Lookahead with Peekable
The original parser uses peek() to inspect tokens before consuming them. In Python, we can remove this overhead by rewriting the grammar to avoid lookahead entirely, improving iterator performance.
def parse_expression_linear(tokens):
    # Evaluate left to right in a single pass with no lookahead.
    # Assumes a flat expression: numbers joined by '+' and '-'.
    stack = []
    try:
        while True:
            token, value = next(tokens)
            if token == 'NUMBER':
                stack.append(value)
            elif token in ('+', '-'):
                op = token
                token, value = next(tokens)
                if token != 'NUMBER':
                    raise ValueError("Expected number after operator")
                if op == '+':
                    stack[-1] += value
                else:
                    stack[-1] -= value
            elif token == ')':
                break
    except StopIteration:
        pass
    return stack[0]
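A quick sanity check on a flat expression (the literal is just an example input):

tokens = tokenize_lazy("10 + 20 - 5")
print(parse_expression_linear(tokens))  # prints 25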
Optimization 4: Parallel Processing with Multiprocessing
For large files, splitting the work across multiple CPU cores can yield dramatic speedups. We split the input at top-level ’+’ operators and process each chunk in a separate process.
from multiprocessing import Pool

def parallel_eval(input_data, num_workers=4):
    # Split the input at '+' boundaries and evaluate each chunk in its
    # own process; summing the partial results re-applies the dropped '+'.
    chunks = split_input_safely(input_data, num_workers)
    with Pool(num_workers) as pool:
        results = pool.map(eval_chunk, chunks)
    return sum(results)

def split_input_safely(data, num_chunks):
    # Simplified logic for demonstration: cut at the first '+' after each
    # chunk boundary. A production version would only cut at '+' signs
    # that sit outside of parentheses.
    step = len(data) // num_chunks
    cuts = [0]
    for i in range(1, num_chunks):
        idx = data.find(b'+', i * step)
        cuts.append(idx if idx != -1 else len(data))
    cuts.append(len(data))
    chunks = []
    for i in range(num_chunks):
        start = cuts[i] if i == 0 else cuts[i] + 1  # skip the '+' delimiter
        chunks.append(data[start:cuts[i + 1]])
    return [c for c in chunks if c.strip()]

def eval_chunk(chunk):
    return parse_expression(tokenize_bytes(chunk))
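One practical note: with the spawn start method (the default on Windows, and on macOS since Python 3.8), the entry point must sit behind an if __name__ == '__main__' guard so worker processes can re-import the module safely. A usage sketch, with expressions.txt as a placeholder path:

if __name__ == '__main__':
    with open('expressions.txt', 'rb') as f:
        data = f.read()
    print(parallel_eval(data, num_workers=4))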
Optimization 5: Memory-Mapped File Access
Using memory-mapped files (mmap) avoids unnecessary data copying and leverages the OS’s virtual memory features, resulting in faster I/O for large files.
import mmap

def mmap_input_file(path):
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mm.read() still materializes the whole mapping as one bytes
            # object; the variant after this snippet avoids that copy.
            return mm.read()
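Because an mmap object supports len() and integer indexing just like a bytes object, it can also be handed directly to tokenize_bytes() while the mapping is open, avoiding the copy that mm.read() makes. A sketch of that variant (eval_mapped_file is a name introduced here for illustration):

def eval_mapped_file(path):
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Pages are faulted in on demand; no full in-memory copy is made.
            return parse_expression(tokenize_bytes(mm))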
Conclusion
By adopting a step-by-step approach to optimizing file processing in Python, we can achieve significant performance improvements. From eliminating redundant allocations and using byte-level operations to introducing parallel processing and memory-mapping, each layer contributes to a faster and more efficient system.
What’s Next?
Here are several ideas for future posts:
- Optimizing JSON parsing with simdjson and Python bindings
- Building an asynchronous file parser with asyncio and aiofiles
- Profiling Python applications using py-spy and line_profiler
- Performance benchmarking: timeit, pytest-benchmark, and perf
- Exploring Cython or Rust extensions for CPU-bound workloads