How cucumber actually works

cucumber is built entirely on the standard library.

It is a serialization engine that converts Python objects into an intermediate representation (IR) that can be serialized by pickle into bytes.

Then, after the data is deserialized from bytes back to the IR, cucumber can take the IR and reconstruct the original Python objects.

Object -> Serializer._serialize_recursive() -> IR -> pickle.dumps() -> bytes
bytes -> pickle.loads() -> IR -> Deserializer._reconstruct_recursive() -> Object
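
The two pipelines above can be illustrated with a minimal sketch. It does not use cucumber's real classes: `to_ir` and `from_ir` are hypothetical stand-ins for Serializer._serialize_recursive() and Deserializer._reconstruct_recursive(), and they only know about threading.Lock.

```python
import pickle
import threading

LOCK_TYPE = type(threading.Lock())

def to_ir(obj):
    """Hypothetical stand-in for the serializer: turn a lock into a plain dict."""
    if isinstance(obj, LOCK_TYPE):
        return {
            "__cucumber_type__": "lock",
            "__handler__": "LockHandler",
            "__object_id__": id(obj),
            "state": {"locked": obj.locked()},
        }
    return obj  # pickle native values pass through unchanged

def from_ir(ir):
    """Hypothetical stand-in for the deserializer: rebuild a lock from its IR."""
    if isinstance(ir, dict) and ir.get("__cucumber_type__") == "lock":
        lock = threading.Lock()
        if ir["state"]["locked"]:
            lock.acquire()
        return lock
    return ir

lock = threading.Lock()

# pickle on its own refuses to serialize a lock.
try:
    pickle.dumps(lock)
    pickle_failed = False
except (TypeError, pickle.PicklingError):
    pickle_failed = True

# Object -> IR -> bytes -> IR -> Object
data = pickle.dumps(to_ir(lock))
restored = from_ir(pickle.loads(data))
```

The restored lock is a new object with the same locked state, not the original lock.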

Two central classes use specialized handlers to convert objects that pickle cannot handle into the IR format so that they can be serialized.

These same handlers are used to reconstruct the original Python objects from the IR.

There are handlers for a wide variety of objects, which is what allows cucumber to work with more objects than base pickle, cloudpickle, and dill.

The aforementioned classes, Serializer and Deserializer, handle recursion (walking through nested objects and collections), circular references, and metadata, using the handlers when they come across complex objects (or their IR state).

This lets cucumber handle nearly any object, including instances of user-defined classes.

Intermediate Representation (IR)

The IR is a nested structure of pickle native values. It attaches metadata to the objects so that cucumber knows how to reconstruct them on the other end.

There are several kinds of nodes you might see in an IR.

IR generated by a handler

This is the most common thing you will see in an object's IR.

# lock object's IR
{
    "__cucumber_type__": "lock",
    "__handler__": "LockHandler",
    "__object_id__": 140234567890,
    "state": {
        "locked": False
    }
}

pickle native wrapper IR (when __object_id__ is required)

{
    "__cucumber_type__": "pickle_native",
    "__object_id__": 123456,
    "value": obj
}

pickle native function wrapper IR

{
    "__cucumber_type__": "pickle_native_func",
    "__object_id__": 123456,
    "value": obj
}

Circular reference marker

These are used to mark circular references in the IR.

{"__cucumber_ref__": 140234567890}

Wrapped collections (for circular-capable containers)

These are used to wrap collections that are capable of handling circular references.

{
    "__cucumber_type__": "dict",
    "items": [(k1, v1), (k2, v2)],
    "__object_id__": 123456
}
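
As a rough illustration of how a wrapped dict and a circular reference marker fit together, here is a toy serializer that handles dicts only. The function name `to_ir` and the `seen` argument are assumptions made for this sketch; cucumber's real Serializer tracks far more state.

```python
import pickle

def to_ir(obj, seen=None):
    """Toy serializer for dicts only (an illustration, not cucumber's code)."""
    seen = {} if seen is None else seen
    if isinstance(obj, dict):
        obj_id = id(obj)
        if obj_id in seen:
            # Already visited: emit a marker instead of recursing forever.
            return {"__cucumber_ref__": obj_id}
        seen[obj_id] = obj
        return {
            "__cucumber_type__": "dict",
            "items": [(to_ir(k, seen), to_ir(v, seen)) for k, v in obj.items()],
            "__object_id__": obj_id,
        }
    return obj  # everything else is treated as pickle native here

config = {"name": "root"}
config["self"] = config  # the dict contains itself

ir = to_ir(config)
payload = pickle.dumps(ir)  # the IR itself contains no cycles
```

The inner occurrence of `config` becomes `{"__cucumber_ref__": id(config)}`, which points back at the `__object_id__` of the outer wrapper.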

Simple instance fast-path IR

For simple instances that don't need to pass through the standard flow in order to be serialized, cucumber will use a fast path to serialize them.

{
    "__cucumber_type__": "simple_class_instance",
    "__object_id__": 123,
    "module": "mymodule",
    "qualname": "MyClass",
    "attrs": {"x": 1, "y": 2}
}

This is a compact IR format that skips the overhead of the standard flow. It does this by storing only the module and qualname of the class (enough to import it again) and the attributes of the given instance. It still attaches a __cucumber_type__ and __object_id__ to identify that the object took the fast path and to handle possible circular references.
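
A minimal sketch of this fast path, under two assumptions: the class is importable by module and qualname, and the instance can be rebuilt by creating it without calling __init__ and then restoring its attributes. The helper names are hypothetical, and types.SimpleNamespace stands in for a user-defined class so the example resolves by name anywhere.

```python
import importlib
from types import SimpleNamespace

def simple_instance_to_ir(obj):
    """Build the compact IR from the class location and instance attributes."""
    cls = type(obj)
    return {
        "__cucumber_type__": "simple_class_instance",
        "__object_id__": id(obj),
        "module": cls.__module__,
        "qualname": cls.__qualname__,
        "attrs": dict(vars(obj)),
    }

def simple_instance_from_ir(ir):
    """Import the class by name, then restore attributes without __init__."""
    target = importlib.import_module(ir["module"])
    for part in ir["qualname"].split("."):
        target = getattr(target, part)
    obj = target.__new__(target)  # assumption: __init__ is bypassed
    obj.__dict__.update(ir["attrs"])
    return obj

original = SimpleNamespace(x=1, y=2)
ir = simple_instance_to_ir(original)
clone = simple_instance_from_ir(ir)
```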

Serialization

Serialization is done by a central, internal Serializer class that uses the handlers to deconstruct complex objects into a nested dictionary of pickle native types, which pickle.dumps() then serializes to bytes.

Tracking state

seen_objects: Dict[int, Any]
Tracks object IDs that were already serialized to detect circular references.

_serialization_depth: int
Recursive depth counter; used to prevent runaway recursion. If recursion depth exceeds 1000, a SerializationError is raised.

_object_path: List[str]
Breadcrumb path to the current object, for error reporting.

_handler_cache: Dict[type, Handler]
This is a cache for types that have been processed using a certain handler, so that future objects of that same type can find a valid handler without having to search through ALL_HANDLERS again.
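
To show how the depth counter and the breadcrumb path might cooperate, here is a toy serializer that walks nested lists only. `ToySerializer` and this `SerializationError` class are stand-ins written for the sketch. The limit is a parameter so the demo can trip it at 50 levels; the text above gives the real limit as 1000.

```python
from typing import Any, List

class SerializationError(Exception):
    """Stand-in for cucumber's error type."""

class ToySerializer:
    def __init__(self, max_depth: int = 1000):
        self.max_depth = max_depth
        self._serialization_depth = 0
        self._object_path: List[str] = []

    def serialize(self, obj: Any, label: str = "root") -> Any:
        self._serialization_depth += 1
        self._object_path.append(label)
        try:
            if self._serialization_depth > self.max_depth:
                where = "".join(self._object_path[:4])
                raise SerializationError(
                    f"max depth {self.max_depth} exceeded near {where}..."
                )
            if isinstance(obj, list):
                return [self.serialize(item, f"[{i}]") for i, item in enumerate(obj)]
            return obj
        finally:
            # Always unwind, so the counters are correct even after an error.
            self._object_path.pop()
            self._serialization_depth -= 1

# Build a list nested 61 levels deep.
deep: list = []
cursor = deep
for _ in range(60):
    inner: list = []
    cursor.append(inner)
    cursor = inner

serializer = ToySerializer(max_depth=50)
try:
    serializer.serialize(deep)
    error_message = None
except SerializationError as exc:
    error_message = str(exc)
```

Unwinding in a `finally` block is one way to keep the depth and path consistent when a nested call raises.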

Methods

Simple instance fast path

The serializer skips the entire handler system for simple instances to reduce overhead.

What counts as a simple instance?

When using the fast path, the serializer makes a compact IR that contains module, qualname, and a direct attrs dict with primitives only.

Function fast path

For module-level functions without closures:

  1. The serializer records module and qualname.
  2. The deserializer can then import the module and resolve the function by name.

If the function is a lambda, local function, or has closures, cucumber falls back to the FunctionHandler and serializes code objects, globals, and closure state.

(A closure is a function that uses variables from outside itself, like a nested function using a variable from the outer function. The outside values need to be saved too.)
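
The fast-path check and the by-name lookup can be sketched as follows. The helper names are hypothetical, and the eligibility test (no closure, no `<lambda>` or `<locals>` in the qualname) is an assumption about how such a check could be written. This is the same by-reference scheme that pickle itself uses for module-level functions.

```python
import importlib
import json

def can_use_fast_path(func) -> bool:
    """True for module-level functions that can be found again by name."""
    qualname = getattr(func, "__qualname__", "")
    return (
        getattr(func, "__closure__", None) is None
        and "<lambda>" not in qualname
        and "<locals>" not in qualname
    )

def function_to_ref(func):
    """Step 1: record module and qualname."""
    return {"module": func.__module__, "qualname": func.__qualname__}

def function_from_ref(ref):
    """Step 2: import the module and resolve the function by name."""
    target = importlib.import_module(ref["module"])
    for part in ref["qualname"].split("."):
        target = getattr(target, part)
    return target

def make_adder(n):
    def add(x):
        return x + n  # uses n from the outer function, so add has a closure
    return add

ref = function_to_ref(json.dumps)
resolved = function_from_ref(ref)
```

Here `json.dumps` qualifies for the fast path, while `make_adder(1)` and any lambda would need the full FunctionHandler treatment.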

Handlers

All handlers are defined in suitkaise/cucumber/_int/handlers/.

They are placed into an ALL_HANDLERS list.

Deserialization

Deserialization is done by a central, internal Deserializer class that uses the handlers to reconstruct the objects from the IR.

State tracking

_object_registry: Dict[int, Any]
Maps __object_id__ to placeholders or reconstructed objects.

_reconstruction_path: List[str]
Breadcrumbs for error reporting.

_reconstruction_depth: int
Prevents infinite recursion.

_reconstructing: Set[int] and _reconstructed_cache: Dict[int, Any]
Protect against pickle level deduplication of shared IR objects.

Reconstruction flow

The deserializer uses a two-pass approach to reconstruct the objects from the IR, in order to correctly handle all possible circular references.

  1. Placeholder registration: _register_all_placeholders() scans the IR and creates empty containers or placeholder objects for each __object_id__.
  2. Actual reconstruction: _reconstruct_recursive() walks the IR and resolves references, using handlers when needed.
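
The two passes can be sketched for wrapped dicts alone. The function names and the hand-written IR below are illustrative assumptions; the real deserializer also handles other containers, handler IRs, and placeholder objects.

```python
def register_placeholders(ir, registry):
    """Pass 1: create an empty dict for every wrapped dict found in the IR."""
    if isinstance(ir, dict):
        if ir.get("__cucumber_type__") == "dict":
            registry[ir["__object_id__"]] = {}
            for key_ir, value_ir in ir["items"]:
                register_placeholders(key_ir, registry)
                register_placeholders(value_ir, registry)
    elif isinstance(ir, (list, tuple)):
        for item in ir:
            register_placeholders(item, registry)

def reconstruct(ir, registry):
    """Pass 2: fill the placeholders and resolve reference markers."""
    if isinstance(ir, dict):
        if "__cucumber_ref__" in ir:
            return registry[ir["__cucumber_ref__"]]
        if ir.get("__cucumber_type__") == "dict":
            target = registry[ir["__object_id__"]]  # reuse the empty container
            for key_ir, value_ir in ir["items"]:
                target[reconstruct(key_ir, registry)] = reconstruct(value_ir, registry)
            return target
    return ir

# A parent dict whose child points back at the parent.
ir = {
    "__cucumber_type__": "dict",
    "__object_id__": 1,
    "items": [
        ("name", "parent"),
        ("child", {
            "__cucumber_type__": "dict",
            "__object_id__": 2,
            "items": [("name", "child"), ("parent", {"__cucumber_ref__": 1})],
        }),
    ],
}

registry = {}
register_placeholders(ir, registry)
parent = reconstruct(ir, registry)
```

Because the container for object 1 exists before its contents are filled in, the marker inside the child resolves to the same dict that is eventually returned.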

_reconstruct_recursive(ir)

  1. Deduplicate cache: pickle attempts to preserve object identity when loading objects from bytes.

If the IR has one collection that is used in 2 or more places, pickle won't make 2 copies. Instead, it creates shared references to that single copy.

Without a cache of already reconstructed IR nodes, each appearance of a shared node would be rebuilt separately, so shared references would be lost and multiple different objects would be created.

  2. Circular reference marker (__cucumber_ref__): If the IR node is just a reference, the deserializer looks up the real object in the registry and returns that instead of rebuilding it.
  3. Primitive values: Things like None, bool, int, float, str, bytes are already complete. They are returned directly with no reconstruction work.
  4. pickle native and wrapped collections: Collections might be wrapped ({"__cucumber_type__": "list" ...}) to preserve identity, or they might be stored as plain lists, tuples, or sets.

The deserializer rebuilds the container and recursively reconstructs each item in the collection.

  5. Handler IRs (__cucumber_type__): If the node has a __cucumber_type__, it belongs to a handler. In this case, we use the matching handler to reconstruct the object using _reconstruct_from_handler().
  6. Fallback: If nothing matches, the deserializer returns the IR node as-is. This keeps unknown structures intact instead of just erroring out.
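
The first step in the list above depends on a pickle behavior that is easy to check: an object referenced twice in one pickle.dumps() call comes back as a single shared object. The cache below is keyed by id() of the loaded IR node, which is an assumption about how such a cache could work; `rebuild` is a hypothetical helper.

```python
import pickle

shared = {"__cucumber_type__": "dict", "__object_id__": 7, "items": [("k", "v")]}
ir = {"first": shared, "second": shared}

loaded = pickle.loads(pickle.dumps(ir))
# pickle's memo keeps both keys pointing at one loaded node.
same_node = loaded["first"] is loaded["second"]

rebuild_calls = []
_reconstructed_cache = {}

def rebuild(node):
    """Rebuild a wrapped dict once per IR node, however often it appears."""
    key = id(node)
    if key in _reconstructed_cache:
        return _reconstructed_cache[key]
    rebuild_calls.append(key)
    result = dict(node["items"])
    _reconstructed_cache[key] = result
    return result

a = rebuild(loaded["first"])
b = rebuild(loaded["second"])
```

With the cache, both appearances yield the same dict object; without it, `a` and `b` would be equal but distinct.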

_reconstruct_from_handler(data)

  1. Extract data: type_name, handler_name, obj_id, state are extracted from the IR node.

These fields tell the deserializer:

  2. Find handler: The handler list is searched for a handler whose class name matches handler_name.
  3. obj_id placeholders: Reuse placeholders for obj_id if they exist.

Placeholders were created in the first pass to handle the circular references. Reusing them preserves shared references and cycles.

  4. Reconstruct state (recursively): The handler's state may contain nested objects, so the deserializer fully reconstructs that state first before moving on.
  5. Call handler.reconstruct(state): The handler uses the reconstructed state to create the real live object.
  6. Replace placeholder with real object: If a placeholder was registered, it is swapped out for the real object so all references point to the final live object.
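
The steps of _reconstruct_from_handler() can be sketched as a standalone function. Everything here except the IR field names and `handler.reconstruct(state)` is an assumption: the `Placeholder` class, the `registry` and `reconstruct_state` parameters, and the choice of LookupError are invented for the example. The real placeholder swap is likely more involved than overwriting a registry entry.

```python
import threading

class LockHandler:
    def reconstruct(self, state):
        lock = threading.Lock()
        if state["locked"]:
            lock.acquire()
        return lock

class Placeholder:
    """Hypothetical marker object registered during the first pass."""

ALL_HANDLERS = [LockHandler()]

def reconstruct_from_handler(data, registry, reconstruct_state):
    # Extract data from the IR node.
    type_name = data["__cucumber_type__"]
    handler_name = data["__handler__"]
    obj_id = data.get("__object_id__")
    state = data["state"]

    # Find the handler whose class name matches handler_name.
    handler = next(
        (h for h in ALL_HANDLERS if type(h).__name__ == handler_name), None
    )
    if handler is None:
        raise LookupError(f"no handler {handler_name!r} for type {type_name!r}")

    # If this obj_id was already rebuilt, reuse the live object.
    existing = registry.get(obj_id)
    if existing is not None and not isinstance(existing, Placeholder):
        return existing

    # Reconstruct nested state first, then build the live object.
    obj = handler.reconstruct(reconstruct_state(state))

    # Replace the placeholder so later lookups see the real object.
    if obj_id is not None:
        registry[obj_id] = obj
    return obj

lock_ir = {
    "__cucumber_type__": "lock",
    "__handler__": "LockHandler",
    "__object_id__": 140234567890,
    "state": {"locked": False},
}
registry = {140234567890: Placeholder()}
lock = reconstruct_from_handler(lock_ir, registry, lambda state: state)
```

Calling the function a second time with the same IR returns the lock already stored in the registry instead of building a new one.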