cucumber is built entirely on the standard library.
It is a serialization engine that converts Python objects into an intermediate representation (IR) that can be serialized by pickle into bytes.
Then, after the data is deserialized from bytes back to the IR, cucumber takes the IR and reconstructs the original Python objects.
Object -> Serializer._serialize_recursive() -> IR -> pickle.dumps() -> bytes
bytes -> pickle.loads() -> IR -> Deserializer._reconstruct_recursive() -> Object
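The pipeline above can be sketched end to end with a toy IR node. This is illustrative only: the IR dict mirrors the lock example shown later in this document, and the real IR carries more metadata, but it shows why the two stages compose — the IR is made of pickle-native values, so pickle handles the byte conversion.

```python
import pickle

# Toy IR for a threading.Lock-like object: pickle can't serialize a real
# lock, so cucumber describes it with pickle-native values instead.
ir = {
    "__cucumber_type__": "lock",
    "__handler__": "LockHandler",
    "__object_id__": 140234567890,
    "state": {"locked": False},
}

# Object -> IR -> bytes: the IR is plain dicts/strings/ints/bools,
# so pickle.dumps() accepts it directly.
data = pickle.dumps(ir)

# bytes -> IR: on the other end, a handler would take restored_ir["state"]
# and rebuild the live object from it.
restored_ir = pickle.loads(data)
```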
Two central classes use specialized handlers to convert objects that pickle cannot handle into the IR format so that they can be serialized.
These same handlers are used to reconstruct the original Python objects from the IR.
There are handlers for a wide variety of objects, which is what allows cucumber to work with more objects than base pickle, cloudpickle, and dill.
The aforementioned classes, Serializer and Deserializer, handle recursion (walking through nested objects and collections), circular references, and metadata, delegating to the handlers when they come across complex objects (or their IR state).
This allows cucumber to handle essentially any object, including user-defined classes.
The IR is a nested structure of pickle-native values. It attaches metadata to the objects so that cucumber knows how to reconstruct them on the other end.
There are a few different kinds of nodes you might see in an IR.
A handler IR node is the most common thing you will see in an object's IR:
# lock object's IR
{
    "__cucumber_type__": "lock",
    "__handler__": "LockHandler",
    "__object_id__": 140234567890,
    "state": {
        "locked": False
    }
}
A pickle native wrapper IR is used when __object_id__ is required:
{
    "__cucumber_type__": "pickle_native",
    "__object_id__": 123456,
    "value": obj
}
A pickle native function wrapper IR:
{
    "__cucumber_type__": "pickle_native_func",
    "__object_id__": 123456,
    "value": obj
}
Reference markers are used to mark circular references in the IR:
{"__cucumber_ref__": 140234567890}
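To see why these markers are needed, consider a list that contains itself. A minimal sketch (the `to_ir` helper here is hypothetical, not cucumber's actual code) shows how tracking seen object IDs turns an infinite structure into a finite IR:

```python
# A self-referencing list would recurse forever without reference markers.
cycle = [1, 2]
cycle.append(cycle)

def to_ir(obj, seen=None):
    """Emit a {"__cucumber_ref__": id} marker the second time an object
    is seen, mirroring the marker shown above. Illustrative only."""
    if seen is None:
        seen = set()
    if isinstance(obj, list):
        if id(obj) in seen:
            return {"__cucumber_ref__": id(obj)}
        seen.add(id(obj))
        return {
            "__cucumber_type__": "list",
            "__object_id__": id(obj),
            "items": [to_ir(x, seen) for x in obj],
        }
    return obj  # primitives pass through untouched

ir = to_ir(cycle)
```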
Wrapped collection IR nodes are used for collections that need to be capable of participating in circular references:
{
    "__cucumber_type__": "dict",
    "items": [(k1, v1), (k2, v2)],
    "__object_id__": 123456
}
For simple instances that don't need to pass through the standard flow in order to be serialized, cucumber uses a fast path:
{
    "__cucumber_type__": "simple_class_instance",
    "__object_id__": 123,
    "module": "mymodule",
    "qualname": "MyClass",
    "attrs": {"x": 1, "y": 2}
}
This is a compact IR format that skips the overhead of the standard flow. It does this by storing a direct reference to the class, and the attributes of a given instance. It still attaches a __cucumber_type__ and __object_id__ to identify that the object took the fast path and to handle possible circular references.
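Rebuilding from this compact node only needs an import and an attribute copy. A minimal sketch, assuming a node shaped like the example above (`rebuild_simple_instance` is a hypothetical helper name, and `argparse.Namespace` stands in for a user class):

```python
import importlib

def rebuild_simple_instance(node):
    """Import the class by module + qualname, skip __init__, and
    restore the attributes dict. Illustrative only."""
    cls = importlib.import_module(node["module"])
    for part in node["qualname"].split("."):
        cls = getattr(cls, part)
    obj = cls.__new__(cls)            # no __init__ call
    obj.__dict__.update(node["attrs"])
    return obj

node = {"module": "argparse", "qualname": "Namespace",
        "attrs": {"x": 1, "y": 2}}
ns = rebuild_simple_instance(node)
```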
Serialization is done by a central, internal Serializer class that uses the handlers to deconstruct complex objects into a nested dictionary of pickle-native types, which is then serialized to bytes by pickle.dumps().
The Serializer tracks the following state:
seen_objects: Dict[int, Any] Tracks object IDs that were already serialized, to detect circular references.
_serialization_depth: int Recursion depth counter, used to prevent runaway recursion. If the depth exceeds 1000, a SerializationError is raised.
_object_path: List[str] Breadcrumb path to the current object, for error reporting.
_handler_cache: Dict[type, Handler] This is a cache for types that have been processed using a certain handler, so that future objects of that same type can find a valid handler without having to search through ALL_HANDLERS again.
serialize(obj) -> bytes
Calls _serialize_recursive(obj) to build the IR, then pickle.dumps() to convert it to bytes.
serialize_ir(obj) -> Any
Does the same thing as serialize(), but returns the IR directly instead of converting it to bytes.
_serialize_recursive(obj) -> Any
This is the core method that builds the IR for a given object. The steps are ordered in a way that maximizes speed and avoids unnecessary handler calls.
1. Depth check: if _serialization_depth exceeds 1000, a SerializationError is raised.
2. Primitives: None, bool, int, float, str, bytes are returned as-is.
3. Circular reference check: objects already in seen_objects emit {"__cucumber_ref__": id}.
4. Pickle native objects: for common pickle native objects, skip handlers and wrap as a pickle_native or pickle_native_func IR node when needed.
5. Simple instance fast path: simple class instances use the simple_class_instance IR instead of a full handler.
6. Pickle native function fast path: module-level functions without closures are serialized by reference (module + qualname).
7. Handler search: walk ALL_HANDLERS (with caching) and find the first handler whose can_handle(obj) returns true.
8. State extraction: handler.extract_state(obj) returns a dict. That dict is then recursively serialized by _serialize_recursive, and so on until the state is fully serialized.
9. IR node: emit a dict with __cucumber_type__, __handler__, __object_id__, and the serialized state.
The serializer skips the entire handler system for simple instances to reduce overhead.
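The dispatch order above can be sketched as follows. This is a simplified model — the fast paths and the handler cache are stubbed out, and only list/tuple containers are shown — but it demonstrates why cheap checks come first:

```python
PRIMITIVES = (type(None), bool, int, float, str, bytes)

def serialize_recursive(obj, seen=None, depth=0):
    """Simplified dispatch: depth check, primitives, cycle markers,
    then containers. Handler search is stubbed out. Illustrative only."""
    if seen is None:
        seen = set()
    if depth > 1000:                                  # step 1: depth guard
        raise RuntimeError("maximum serialization depth exceeded")
    if isinstance(obj, PRIMITIVES):                   # step 2: cheapest check
        return obj
    if id(obj) in seen:                               # step 3: cycle marker
        return {"__cucumber_ref__": id(obj)}
    seen.add(id(obj))
    if isinstance(obj, (list, tuple)):                # wrapped collection
        return {"__cucumber_type__": type(obj).__name__,
                "__object_id__": id(obj),
                "items": [serialize_recursive(x, seen, depth + 1)
                          for x in obj]}
    raise TypeError("a real implementation would now search ALL_HANDLERS")

ir = serialize_recursive([1, "two", [3]])
```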
What counts as a simple instance?
- The class is importable (no <locals> in __qualname__ and not nested)
- No __slots__
- No custom __serialize__ method
- All values in __dict__ are primitives
When using the fast path, the serializer makes a compact IR that contains module, qualname, and a direct attrs dict with primitives only.
For module-level functions without closures:
The serializer stores only module and qualname. If the function is a lambda, a local function, or has closures, it falls back to the FunctionHandler and serializes code objects, globals, and closure state.
(A closure is a function that uses variables from outside itself, like a nested function using a variable from the outer function. The outside values need to be saved too.)
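The closure state a full function serializer must capture lives on the function object itself, in cells. A small demonstration using only standard Python attributes:

```python
def make_adder(n):
    def add(x):
        return x + n      # n is captured from the enclosing scope
    return add

add5 = make_adder(5)

# The captured values live in cells on the function object; these cells
# (plus the bytecode and referenced globals) are the "outside values"
# that need to be saved too.
cell_values = [c.cell_contents for c in add5.__closure__]
captured_names = add5.__code__.co_freevars
```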
All handlers are defined in .
They are placed into an ALL_HANDLERS list.
ALL_HANDLERS is an ordered list. ClassInstanceHandler is last, acting as a catch-all for generic or user-defined classes.
Deserialization is done by a central, internal Deserializer class that uses the handlers to reconstruct the objects from the IR.
The Deserializer handles pickle native and fast-path types directly, and tracks the following state:
_object_registry: Dict[int, Any] Maps __object_id__ to placeholders or reconstructed objects.
_reconstruction_path: List[str] Breadcrumbs for error reporting.
_reconstruction_depth: int Prevents infinite recursion.
_reconstructing: Set[int] and _reconstructed_cache: Dict[int, Any] Protect against pickle level deduplication of shared IR objects.
The deserializer uses a two-pass approach to reconstruct the objects from the IR, in order to correctly handle all possible circular references.
Pass 1: _register_all_placeholders() scans the IR and creates empty containers or placeholder objects for each __object_id__.
Pass 2: _reconstruct_recursive() walks the IR and resolves references, using handlers when needed.
_reconstruct_recursive(ir)
pickle attempts to keep objects the same after loading them from bytes. If the IR has one collection that is used in 2 or more places, pickle won't make 2 copies, instead creating a shared reference to that single copy.
If we don't deduplicate the cache, shared references become lost and multiple different objects are created.
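This pickle behavior is easy to observe directly — pickle memoizes by object identity, so one shared list stays one list after a round trip:

```python
import pickle

shared = ["payload"]
container = [shared, shared]      # the same list referenced twice

restored = pickle.loads(pickle.dumps(container))

# Both slots still point at ONE list, not two equal copies -- which is
# why the deserializer's cache must deduplicate too, or shared
# references would silently fork into independent objects.
same_object = restored[0] is restored[1]
```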
References (__cucumber_ref__): If the IR node is just a reference, the deserializer looks up the real object in the registry and returns that instead of rebuilding it.
Primitives: None, bool, int, float, str, bytes are already complete. They are returned directly with no reconstruction work.
Pickle native and wrapped collections: Collections might be wrapped ({"__cucumber_type__": "list", ...}) to preserve identity, or they might be stored as plain lists, tuples, or sets. The deserializer rebuilds the container and recursively reconstructs each item in the collection.
Handler nodes (__cucumber_type__): If the node has a __cucumber_type__, it belongs to a handler. In this case, we use the matching handler to reconstruct the object using _reconstruct_from_handler().
_reconstruct_from_handler(data)
type_name, handler_name, obj_id, and state are extracted from the IR node. These fields tell the deserializer:
- which handler to use, via handler_name
- which placeholder to reuse: placeholders for obj_id are reused if they exist. Placeholders were created in the first pass to handle circular references; reusing them preserves shared references and cycles.
- what state to restore: state may contain nested objects, so the deserializer fully reconstructs that state first before moving on.
Then handler.reconstruct(state): the handler uses the reconstructed state to create the real live object.
Handlers are helpers that know how to serialize and reconstruct specific object types.
They follow a simple pattern.
class Handler(ABC):
type_name: str
def can_handle(self, obj) -> bool: ...
def extract_state(self, obj) -> Dict[str, Any]: ...
def reconstruct(self, state: Dict[str, Any]) -> Any: ...
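A concrete toy handler following this pattern, simplified from the LockHandler described later in this document (the class name `ToyLockHandler` is illustrative, not cucumber's actual class):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict
import threading

class Handler(ABC):
    type_name: str

    @abstractmethod
    def can_handle(self, obj) -> bool: ...
    @abstractmethod
    def extract_state(self, obj) -> Dict[str, Any]: ...
    @abstractmethod
    def reconstruct(self, state: Dict[str, Any]) -> Any: ...

class ToyLockHandler(Handler):
    """Simplified lock handler: capture whether the lock is held,
    rebuild a fresh lock in the same state."""
    type_name = "lock"

    def can_handle(self, obj) -> bool:
        return isinstance(obj, type(threading.Lock()))

    def extract_state(self, obj) -> Dict[str, Any]:
        return {"locked": obj.locked()}

    def reconstruct(self, state: Dict[str, Any]) -> Any:
        lock = threading.Lock()
        if state["locked"]:
            lock.acquire()
        return lock

handler = ToyLockHandler()
state = handler.extract_state(threading.Lock())
rebuilt = handler.reconstruct(state)
```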
All handlers are defined in and are registered in handlers/__init__.py.
FunctionHandler
Serializes function objects.
For module-level functions without closures, stores module + qualname and imports on reconstruction. This is ~100x faster and smaller. It also stores a lightweight hash of the function signature to validate the reference.
Requirements:
- The function has __module__ and __qualname__
- The module is not __main__ (can't import __main__)
- The qualname doesn't contain <locals> (not a nested function)
For closures, nested functions, and dynamically created functions, stores bytecode, globals, and closure state.
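Serialization by reference can be sketched with a pair of hypothetical helpers (the names are illustrative, not cucumber's actual functions): check the requirements, store the (module, qualname) pair, and resolve it with an import on the other side.

```python
import importlib
import json

def function_by_reference(func):
    """Return a (module, qualname) reference if the requirements above
    are met, else None (caller falls back to full bytecode path)."""
    module = getattr(func, "__module__", None)
    qualname = getattr(func, "__qualname__", None)
    if (not module or not qualname
            or module == "__main__" or "<locals>" in qualname):
        return None
    return {"module": module, "qualname": qualname}

def resolve_reference(ref):
    """Import the module and walk the qualname back to the function."""
    obj = importlib.import_module(ref["module"])
    for part in ref["qualname"].split("."):
        obj = getattr(obj, part)
    return obj

ref = function_by_reference(json.dumps)
restored = resolve_reference(ref)
```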
State captured:
- code: Code object (recursively serialized)
- globals: Only referenced global names (excludes __builtins__)
- name, defaults, kwdefaults
- closure: List of captured variable values
- annotations, doc, module
Reconstruction: types.FunctionType(), inject local __builtins__, recreate closure cells.
LambdaHandler
Serializes lambda functions (anonymous functions with name <lambda>).
Uses the same serialization approach as FunctionHandler. Lambdas are just anonymous functions, so the extraction and reconstruction logic is reused.
Since lambdas can't be referenced by name (they're anonymous), they always use the full serialization path with bytecode.
PartialFunctionHandler
Serializes functools.partial objects.
State captured:
- func: The wrapped function (recursively serialized)
- args: Positional arguments already bound
- keywords: Keyword arguments already bound
Reconstruction: functools.partial(func, *args, **keywords)
BoundMethodHandler
Serializes bound method objects (methods bound to an instance).
State captured:
- instance: The object the method is bound to (recursively serialized)
- function_name: Name of the method
- class_name, module: For debugging
Reconstruction: Get the deserialized instance, then getattr(instance, function_name) to get the bound method. The method exists on the class; we just need to bind it to the instance.
StaticMethodHandler
Serializes @staticmethod wrappers.
State captured:
- func: The underlying function
Reconstruction: staticmethod(func)
ClassMethodHandler
Serializes @classmethod wrappers.
State captured:
- func: The underlying function
Reconstruction: classmethod(func)
LoggerHandler
Serializes logging.Logger instances by capturing their configuration.
State captured:
- name: Logger name (loggers are singletons per name)
- level: Logging level (DEBUG=10, INFO=20, etc.)
- handlers: List of handler objects (recursively serialized)
- filters: List of filter objects
- propagate, disabled: Logger settings
Reconstruction: logging.getLogger(name), clear existing handlers, restore configuration. Leverages the fact that loggers are singletons: the same name returns the same instance.
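The singleton property this relies on, and the restore step, can be shown in a few lines (the state values here are made-up examples):

```python
import logging

# Loggers are per-name singletons, so restoring configuration onto
# logging.getLogger(name) updates "the" logger everywhere in the process.
a = logging.getLogger("cucumber.demo")
b = logging.getLogger("cucumber.demo")
same = a is b

# Sketch of the restore step described above:
state = {"name": "cucumber.demo", "level": logging.DEBUG, "propagate": False}
logger = logging.getLogger(state["name"])
logger.handlers.clear()           # clear existing handlers
logger.setLevel(state["level"])
logger.propagate = state["propagate"]
```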
StreamHandlerHandler
Serializes logging.StreamHandler objects.
State captured:
- level: Handler's logging level
- formatter: Formatter object (recursively serialized)
Reconstruction: Create a new StreamHandler() (uses sys.stderr by default), set level and formatter. The stream itself is not serialized; the default is used in the target process.
FileHandlerHandler
Serializes logging.FileHandler objects.
State captured:
- filename: Path to log file (baseFilename)
- mode: File open mode ('a', 'w')
- encoding: File encoding
- level, formatter
Reconstruction: FileHandler(filename, mode, encoding), restore level and formatter.
FormatterHandler
Serializes logging.Formatter objects.
State captured:
- fmt: Format string (e.g., '%(asctime)s - %(name)s')
- datefmt: Date format string
- style: Format style ('%', '{', or '$')
Reconstruction: Formatter(fmt, datefmt, style)
FileHandleHandler
Serializes open file handle objects (TextIOWrapper, BufferedReader, etc.).
State captured:
- path: Absolute file path
- relative_path: Relative path using Skpath (if available)
- mode: File open mode
- position: Current position from tell()
- encoding, errors, newline: Text mode settings
- closed, is_pipe: State flags
Reconstruction:
Limitation: Assumes the file exists in the target process's filesystem.
TemporaryFileHandler
Serializes tempfile.NamedTemporaryFile objects.
State captured:
- mode, position
- content: Full file content (reads entire file)
- suffix, prefix, delete
- encoding, original_name
Reconstruction: Create a NEW temp file with the same properties, write the content, seek to position.
NOTE: Creates a new temp file with a DIFFERENT path than the original. Content and properties are preserved, but not the exact path.
StringIOHandler
Serializes io.StringIO in-memory text streams.
State captured:
- content: Full string buffer
- position: Current position
Reconstruction: StringIO(content), seek to position.
BytesIOHandler
Serializes io.BytesIO in-memory binary streams.
State captured:
- content: Full bytes buffer
- position: Current position
- closed: Whether closed
Reconstruction: Handle closed state, BytesIO(content), seek to position.
LockHandler
Serializes threading.Lock and threading.RLock objects.
State captured:
- lock_type: "Lock" or "RLock"
- locked: Whether the lock is currently acquired
For RLock (which has no locked() method): try a non-blocking acquire() to check the state.
Reconstruction: Create a new lock, acquire it if it was locked.
Limitation: Lock thread ownership does NOT transfer across processes. The lock is acquired by the reconstructing thread, not the original owner.
SemaphoreHandler
Serializes threading.Semaphore and BoundedSemaphore objects.
State captured:
- semaphore_type: "Semaphore" or "BoundedSemaphore"
- initial_value, current_value
Reconstruction: Create a semaphore with the initial value, then acquire repeatedly until the counter matches the current value.
BarrierHandler
Serializes threading.Barrier objects.
State captured:
- parties: Number of threads that must arrive
- action: Optional function called when all arrive
- timeout: Optional timeout
Reconstruction: Barrier(parties, action, timeout). A fresh barrier; doesn't capture how many threads are waiting.
ConditionHandler
Serializes threading.Condition objects.
State captured:
- lock: The underlying lock (recursively serialized)
Reconstruction: Condition(lock=deserialized_lock)
QueueHandler
Serializes queue.Queue, LifoQueue, PriorityQueue, and SimpleQueue objects.
State captured:
- queue_type: Type name
- maxsize: Maximum queue size
- items: Snapshot of items (non-destructive, using the queue's mutex + internal deque)
Reconstruction: Create a queue of the appropriate type, put all items back.
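The non-destructive snapshot trick can be sketched directly — instead of `get()`-ing items out (which empties the queue), hold the queue's mutex and copy its internal deque. Note this leans on CPython's `q.mutex` and `q.queue` attributes, as the description above implies:

```python
import queue

q = queue.Queue(maxsize=8)
for item in ("a", "b", "c"):
    q.put(item)

# Non-destructive snapshot: copy the internal deque under the mutex,
# leaving the original queue's contents untouched.
with q.mutex:
    items = list(q.queue)

# Rebuild: a new queue of the same type/maxsize, items put back in order.
rebuilt = queue.Queue(maxsize=q.maxsize)
for item in items:
    rebuilt.put(item)
```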
MultiprocessingQueueHandler
Serializes multiprocessing.Queue objects.
State captured:
- maxsize
- items: Best-effort snapshot (drain and restore, limited to 10000 items)
Reconstruction: Create a new queue (different underlying pipes), put items back.
Limitation: For reliable cross-process sharing, use over raw multiprocessing.Queue, or / for direct queue-based communication.
EventHandler
Serializes threading.Event objects.
State captured:
- is_set: Whether the event is set or clear
Reconstruction: Create a new threading.Event, set it if it was set.
MultiprocessingEventHandler
Serializes multiprocessing.Event objects.
State captured:
- is_set: Whether the event is set or clear
Reconstruction: Create a new multiprocessing.Event, set it if it was set. Different underlying shared memory.
GeneratorHandler
Serializes generator objects.
State captured:
- generator_name, generator_qualname: For debugging
- remaining_values: All values not yet yielded
Important: EXHAUSTS the generator. The original becomes empty. This is the only way to preserve the remaining values.
Reconstruction: iter(remaining_values). Returns an iterator, not a true generator. Values are preserved but not pause/resume behavior.
IteratorHandler
Serializes iterator objects (enumerate, zip, filter, map, reversed, etc.).
State captured:
- type_name: Iterator type
- remaining_values: All remaining items (limited to 100000 for safety)
Important: EXHAUSTS the iterator. The original becomes empty.
Reconstruction: iter(remaining_values). The original iterator type is not preserved.
RangeHandler
Serializes range objects.
State captured:
- start, stop, step
Reconstruction: range(start, stop, step). Range objects are immutable and easy to serialize.
EnumerateHandler
Serializes enumerate objects.
State captured:
- remaining: List of (index, value) tuples
Reconstruction: iter(remaining). The enumerate is consumed during serialization, and reconstruction returns a plain iterator over the remaining (index, value) pairs.
ZipHandler
Serializes zip objects.
State captured:
- remaining: List of tuples
Reconstruction: iter(remaining)
RegexPatternHandler
Serializes compiled re.Pattern objects.
State captured:
- pattern: The regex pattern string
- flags: Compilation flags (integer bitmask)
Reconstruction: re.compile(pattern, flags)
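A compiled pattern is fully described by its source string plus its flags bitmask, so the round trip is just two attribute reads and one `re.compile` call:

```python
import re

pattern = re.compile(r"(?P<word>\w+)", re.IGNORECASE | re.MULTILINE)

# Everything needed to rebuild the pattern is on the pattern object.
state = {"pattern": pattern.pattern, "flags": pattern.flags}
rebuilt = re.compile(state["pattern"], state["flags"])
```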
MatchObjectHandler
Serializes re.Match objects.
State captured:
- pattern, flags, string, pos, endpos
- match_string, span, groups, groupdict
Reconstruction: Returns a MatchReconnector. Call reconnect() to re-run the pattern on the original string and get a live re.Match object. Returns None if the match can't be reproduced (e.g., the pattern or string changed).
SQLiteConnectionHandler
Serializes sqlite3.Connection objects.
State captured:
- database: Path or ':memory:'
- isolation_level
- is_memory: Whether it's an in-memory database
For in-memory databases:
- schema: All CREATE TABLE statements
- data: All table data (expensive for large databases)
Reconstruction: Returns a SQLiteConnectionReconnector. Call reconnect() to create a new connection. For file databases, connects to the file. For in-memory databases, creates a new connection and restores schema and data.
SQLiteCursorHandler
Serializes sqlite3.Cursor objects.
State captured:
- connection: Database connection (recursively serialized)
- lastrowid, arraysize
Reconstruction: Returns a SQLiteCursorReconnector. Call reconnect() to create a new cursor. If the connection is also a reconnector, it will be reconnected first. The result set is NOT restored; the user must re-execute the query.
HTTPSessionHandler
Serializes requests.Session objects.
State captured:
- cookies: Session cookies (as dict)
- headers: Default headers
- auth: Authentication tuple
- proxies, verify, cert, max_redirects
Reconstruction: Create a new Session, apply the configuration. Active connections are not preserved; connection pools are recreated fresh.
SocketHandler
Serializes socket.socket objects.
State captured:
- family, type, proto: Socket parameters
- timeout, blocking
- local_addr, remote_addr: From getsockname/getpeername
Reconstruction: Returns a SocketReconnector. Call reconnect() to create a new socket with the same parameters, apply timeout/blocking settings, and best-effort bind/connect using the saved addresses.
Limitation: The actual connection is NOT preserved. Buffer contents are lost.
DatabaseConnectionHandler
Generic handler for database connections (PostgreSQL, MySQL, MongoDB, Redis, SQLAlchemy, etc.).
State captured:
- module, class_name
- host, port, user, database
Passwords/tokens are intentionally NOT stored, for security.
Reconstruction: Returns a typed DbReconnector (PostgresReconnector, MySQLReconnector, etc.). The user calls reconnect(auth=...) to create a new live connection, providing credentials for protected databases.
ThreadHandler
Serializes threading.Thread objects.
State captured:
- name, daemon
- target: Target function (recursively serialized)
- args, kwargs
- is_alive: Whether running
Reconstruction: Returns a ThreadReconnector. Call reconnect(start=False) to create a new Thread. The thread is NOT started by default.
Limitation: Thread execution state (call stack, locals) cannot be serialized.
ThreadPoolExecutorHandler
Serializes ThreadPoolExecutor objects.
State captured:
- max_workers, thread_name_prefix
Reconstruction: A fresh executor with the same configuration. Running tasks are NOT serialized.
ProcessPoolExecutorHandler
Serializes ProcessPoolExecutor objects.
State captured:
- max_workers
Reconstruction: A fresh executor with the same configuration.
ThreadLocalHandler
Serializes threading.local objects.
State captured:
- data: The current thread's local values (from __dict__)
Only the current thread's values are serialized. Other threads' values are lost.
Reconstruction: New threading.local(), set attributes from data.
WeakrefHandler
Serializes weakref.ref objects.
State captured:
- referenced_object: The object (if still alive)
- is_dead: Whether the reference is dead
Reconstruction:
Note: Weak references become strong references during serialization, then weak again in the new weakref.
WeakValueDictionaryHandler
Serializes weakref.WeakValueDictionary objects.
State captured:
- items: Current key-value pairs (values that exist)
Reconstruction: New WeakValueDictionary, add the items. Values become strong during transfer, weak again when inserted. If a value is not weakrefable, a weakrefable placeholder is stored (with a warning) so data is preserved.
WeakKeyDictionaryHandler
Serializes weakref.WeakKeyDictionary objects.
State captured:
- items: Current key-value pairs (keys that exist)
Reconstruction: New WeakKeyDictionary, add the items. Non-weakrefable keys are replaced with placeholder keys (with a warning) so data is preserved.
EnumHandler
Serializes enum.Enum instances.
State captured:
- module, enum_name, qualname
- member_name, value
- member_names for Flag/IntFlag combinations
- definition for dynamic enums (name, members, base_type)
Reconstruction: Import the enum class, get the member by name (or by value as a fallback). If the import fails and a definition is present, reconstruct from the definition.
Works for Enum, IntEnum, Flag, IntFlag, and custom subclasses.
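The three lookup paths — by name, by value as a fallback, and rebuilding a dynamic enum from a stored definition via the functional API — all exist in the standard library:

```python
import enum

class Color(enum.Enum):
    RED = 1
    GREEN = 2

member = Color["RED"]    # primary path: lookup by member name
fallback = Color(1)      # fallback path: lookup by value

# Dynamic enums can be rebuilt from a stored definition using the
# functional API (this mirrors the `definition` field described above).
Rebuilt = enum.Enum("Color", {"RED": 1, "GREEN": 2})
```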
EnumClassHandler
Serializes enum classes themselves (not instances).
Module-level enums: Store a reference (module + name). Dynamic enums: Serialize the full definition (name, members, base_type).
Reconstruction:
- Module-level enums: import by reference
- Dynamic enums: base_type(name, members) using the functional API
CoroutineHandler
Serializes coroutine objects (from async def functions).
State captured:
- cr_code, cr_name, cr_qualname
- frame_locals: If available
Reconstruction: Returns a DeserializedCoroutine placeholder. Coroutine execution state cannot be transferred; awaiting it raises an error.
AsyncGeneratorHandler
Serializes async generator objects.
State captured:
- ag_name, ag_qualname
Reconstruction: Returns a DeserializedAsyncGenerator placeholder. Async iteration raises an error.
TaskHandler
Serializes asyncio.Task objects.
State captured:
- task_name, is_done, is_cancelled
- result, exception (if done)
Reconstruction: Returns a DeserializedTask placeholder with done(), cancelled(), result(), exception() methods.
FutureHandler
Serializes asyncio.Future objects.
State captured:
- is_done, is_cancelled
- result, exception
Reconstruction: Returns a DeserializedFuture placeholder with a Future-like interface.
ClassObjectHandler
Serializes class objects themselves (not instances).
Module-level classes: Store a reference (module + name). Dynamic classes: Serialize the full definition using ClassInstanceHandler._serialize_class_definition().
Reconstruction:
- Module-level classes: import by reference
- Dynamic classes: rebuild with type()
ClassInstanceHandler
Serializes instances of user-defined classes. This is the catch-all handler (last in the chain).
Extraction strategy hierarchy:
1. __serialize__ / __deserialize__ methods (custom serialization)
2. to_dict() / from_dict() methods (library pattern)
3. __dict__ access (generic)
4. __slots__ extraction (for slots-only classes)
5. __dict__ and __slots__ (for hybrid classes)
6. types.GenericAlias (stores origin + args)
State captured:
- module, qualname: Class identity
- strategy: Which strategy was used
- class_definition: For locally-defined or __main__ classes
- state: one of custom_state, dict_state, instance_dict, slots_dict
Handles:
- classes defined in __main__ (which pickle can't handle)
- __slots__ classes
Reconstruction:
- dict strategy: cls.__new__(cls), then __dict__.update()
- slots strategy: cls.__new__(cls), then setattr() for each slot
Network sockets, database connections, threads, subprocesses, and more cannot be serialized safely, for different reasons.
Additionally, automatically reconnecting live resources like these can lead to unexpected behavior.
For these cases, we return Reconnector instances that store as much metadata as possible to recreate the live resource, acting as a placeholder until we actually reconnect the live resource. Reconnectors that do not require auth will lazily reconnect on first attribute access; auth-based reconnectors still require an explicit reconnect(...) call (or with credentials).
How it works:
1. A Reconnector is constructed from IR state and returned as a placeholder.
2. Reconnector.reconnect() is called to create a new live resource (may require authentication).
In other words, cucumber creates a placeholder Reconnector object, and you call reconnect() to create the new live resource, providing any authentication needed.
cucumber also includes a function called reconnect_all(obj, **auth).
Args:
- start_threads: if True, any threading.Thread objects returned by reconnectors are automatically started after reconnect. This is a keyword-only argument.
- **auth: a mapping of type key to secrets (authentication).
reconnect_all allows you to quickly reconstruct all live resources in a given object.
Type keys are strings like "psycopg2.Connection" or "redis.Redis" that name the actual connection types.
Each type key maps to a dict where:
- "*" is the default auth for all instances of that type
- an attribute name (like "analytics_db") maps to auth used for that specific instance
auth = {
"psycopg2.Connection": {
"*": "default_psycopg2_password",
"analytics_db": "analytics_password" # used for obj.analytics_db specifically
},
"redis.Redis": {
"*": "your_redis_password"
}
}
If no instance-specific key matches, the "*" default for that type is used; if neither exists, reconnect() is called with no credentials.
DbReconnector
Database connections can't be pickled directly, and many require additional credentials (like a password or API key) to connect.
Authentication credentials should not be serialized into an IR for security reasons.
Instead, we store things like the host, port, user, database name, and everything else so that reconnect() can quickly recreate the same connection to your database. All you need to do is pass auth so that reconnect() can use the correct credentials.
DbReconnector actually has many specialized subclasses for different database types.
auth for each type is their respective password, token, or other authentication needed in order to create a connection to the actual *.Connection object.
For example, password is the arg for psycopg2.connect(); PostgresReconnector will pass your auth value as the password arg.
DbReconnector subclasses:

PostgresReconnector
- types: psycopg2.Connection, psycopg.Connection
- auth: required
- data collected: host, port, user, database, dsn / url (if available)
- result: psycopg2.Connection or psycopg.Connection
- limitations: not the same session; open transactions, cursors, and server state are not preserved

MySQLReconnector
- types: pymysql.Connection, mysql.connector.connect(), mariadb.Connection
- auth: required
- data collected: host, port, user, database
- result: pymysql.Connection / mysql.connector.connection_cext.CMySQLConnection / mariadb.Connection
- limitations: not the same session; open transactions and server state are not preserved

SQLiteReconnector
- types: sqlite3.Connection
- auth: NOT required (path only)
- data collected: path / database
- result: sqlite3.Connection
- limitations: in-memory databases do not persist across processes; file locks may force fallback to :memory:
You still need to call reconnect() to create the actual sqlite3.Connection object again, but we capture all data needed to recreate the object.
MongoReconnector
- types: pymongo.MongoClient
- auth: required
- data collected: uri / url (if available), host, port, username / user, authSource / auth_source
- result: pymongo.MongoClient
- limitations: not the same server session; any in-progress operations are lost

SQLAlchemyReconnector
- types: sqlalchemy.Engine, sqlalchemy.Connection
- auth: required
- data collected: url / uri / dsn (if available), driver / drivername, host, port, database, user / username, query (if present)
- result: sqlalchemy.engine.Connection (via Engine.connect())
- limitations: does not preserve engine pool state or active transactions

CassandraReconnector
- types: cassandra.cluster.Cluster
- auth: required
- data collected: contact_points / hosts / nodes, port, username / user, keyspace (if present)
- result: cassandra.cluster.Session
- limitations: cluster/session state is new; in-flight queries are not preserved

ElasticsearchReconnector
- types: elasticsearch.Elasticsearch
- auth: required (api_key supported)
- data collected: hosts or url / uri, user / username
- result: elasticsearch.Elasticsearch
- limitations: no preserved connections or request state

Neo4jReconnector
- types: neo4j.Driver
- auth: required (api key also supported)
- data collected: uri / url, host, port, scheme, user / username, encrypted (if present)
- result: neo4j.Driver
- limitations: existing sessions and transactions are not preserved
InfluxDBReconnector
- types: influxdb.InfluxDBClient, influxdb_client.InfluxDBClient
- auth: required (token for v2)
- data collected: url / uri, host, port, user, database, org (v2), timeout (if present), verify_ssl (if present)
- result: influxdb.InfluxDBClient or influxdb_client.InfluxDBClient
- limitations: server-side session state is not preserved

ODBCReconnector
- types: pyodbc.Connection
- auth: required
- data collected: dsn, driver, server / host, port, database / db, user / username / uid
- result: pyodbc.Connection
- limitations: any active transactions/cursors are lost

ClickHouseReconnector
- types: clickhouse_driver.Client
- auth: required
- data collected: host, port, user, database
- result: clickhouse_driver.Client
- limitations: no preserved session state

MSSQLReconnector
- types: pymssql.Connection
- auth: required
- data collected: host, port, user, database
- result: pymssql.Connection
- limitations: open transactions and session state are not preserved

OracleReconnector
- types: oracledb.Connection, cx_Oracle.Connection
- auth: required
- data collected: dsn, host, port, service_name / database, user
- result: oracledb.Connection or cx_Oracle.Connection
- limitations: active sessions/transactions are not preserved

SnowflakeReconnector
- types: snowflake.connector.Connection
- auth: required
- data collected: user, account, warehouse, database, schema, role
- result: snowflake.connector.Connection
- limitations: session state (role, warehouse changes) is re-established from stored params only

DuckDBReconnector
- types: duckdb.Connection
- auth: NOT required (path only)
- data collected: path / database
- result: duckdb.Connection
- limitations: in-memory databases do not persist across processes
You still need to call reconnect() to create the actual duckdb.Connection object again, but we capture all data needed to recreate the object.
SocketReconnector --> socket.socket
Live sockets are OS resources that can't be serialized safely.
Data collected:
- family, type, proto
- timeout, blocking
- local_addr, remote_addr
When you call reconnect(), the socket is created with the same parameters and reconnects/binds as appropriate.
Result: socket.socket (fresh socket)
Limitations:
ThreadReconnector --> threading.Thread
Threads run in their own process's memory space. You cannot directly serialize and reconnect to the exact same thread in the same memory space if the thread object has moved to a different process.
However, it is still useful to quickly recreate the exact same thread in a different process. This is why cucumber includes a ThreadReconnector.
If an object is bouncing around different places in memory (running in different processes with different GILs), it is useful and convenient to be able to quickly start something like a background runner thread or a monitoring thread without having to manually create the thread object and start it.
It is also useful when doing something like adding a ThreadReconnector to shared memory using or to quickly start a thread in multiple different places.
Data collected:
name, daemon, target, args, kwargs, is_alive (informational only)
When you call reconnect(), a new threading.Thread object is constructed with exactly the same configuration.
Result: threading.Thread (not started)
reconnect() does not start the thread by default, because starting it implicitly could cause silent issues or unexpected behavior.
If you want to start the thread automatically when you reconnect, use:
thread = reconnector.reconnect(start=True)
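The reconstruction itself is straightforward to picture. Below is a hedged sketch of the idea, not cucumber's actual implementation; the `state` dict and `reconnect_thread` helper are hypothetical stand-ins for what ThreadReconnector stores and does:

```python
import threading

def work(msg, repeat=1):
    # Stand-in target function for the example.
    print(msg * repeat)

# Hypothetical snapshot of what a ThreadReconnector might store.
state = {
    "name": "worker-1",
    "daemon": True,
    "target": work,
    "args": ("ping",),
    "kwargs": {"repeat": 2},
}

def reconnect_thread(state, start=False):
    """Rebuild an equivalent threading.Thread; it is NOT started unless asked."""
    t = threading.Thread(
        target=state["target"],
        name=state["name"],
        args=state["args"],
        kwargs=state["kwargs"],
        daemon=state["daemon"],
    )
    if start:
        t.start()  # opt-in auto-start, mirroring reconnect(start=True)
    return t

t = reconnect_thread(state)  # fresh thread, same configuration, not running
```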
# reconnect_all()
reconnect_all(obj, start_threads=True, **auth)
# autoreconnect decorator on Skprocess class
from suitkaise.processing import Skprocess, autoreconnect

@autoreconnect(start_threads=True, **auth)
class MyProcess(Skprocess):
# ...
Limitations:
PipeReconnector --> multiprocessing.Pipe / OS pipes
Created from OS pipe file objects and multiprocessing connection endpoints.
Pipes are OS-level handles tied to a specific process. The file descriptors or connection handles are not valid outside the original process boundary unless one side of the pipe is explicitly passed to a new process as that process is being created.
Because pipes are really OS handles, they cannot be converted down to bytes and passed along. When a pipe is serialized the normal way, the OS handles are lost, so the pipe will not actually work correctly on the other side.
If you need pipes that survive serialization, use the dedicated pipe class instead of relying on the PipeReconnector.
Data collected:
readable, writable, closed, duplex, preferred_end
Reconnector-specific attributes:
has_endpoint (True if the original pipe was strictly read-only or write-only)
endpoint ("read" or "write" when has_endpoint is True)
PipeReconnector has 3 methods instead of just reconnect():
reconnect() -> returns one end of a new pipe
peer() -> returns the other end of the pipe
pair() -> returns both ends of a new pipe, ready for use
These are new multiprocessing.connection.Connection objects that do not point to the original parent, but are still valid and ready to use.
NOTE: PipeReconnector does not apply to cucumber's own pipe objects; those are handled directly using __serialize__ and __deserialize__ and preserve their pipe handles.
Limitations:
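The pair() behavior described above maps directly onto the standard library. Here is a minimal sketch of the idea, assuming a hypothetical `sketch_pair` helper; the real PipeReconnector also restores duplex/endpoint settings from its saved state:

```python
from multiprocessing import Pipe

def sketch_pair(duplex=True):
    """Hypothetical pair(): return both ends of a brand-new pipe."""
    # Pipe() yields two multiprocessing.connection.Connection objects.
    # They do not point back at the original pipe; they form a fresh channel.
    return Pipe(duplex=duplex)

end_a, end_b = sketch_pair()
end_b.send({"hello": "world"})  # both ends are live and ready to use
msg = end_a.recv()
```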
SubprocessReconnector --> subprocess.Popen
Subprocesses are OS processes and cannot be paused and moved between processes. We store the command args and output metadata so you can restart the process if needed, but the actual process and its runtime state are not preserved.
Data collected:
args, returncode, pid (original), poll_result, stdout_data, stderr_data
When you call reconnect(), a new subprocess is started with the saved launch parameters (args and other state). Because SubprocessReconnector does not require auth, it will also lazily reconnect on first attribute access, which starts a new process; use .snapshot() if you want metadata without starting a process.
Result: subprocess.Popen (new process)
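Restarting from the saved launch parameters can be sketched as follows. This is an illustrative sketch, not cucumber's implementation; the `state` dict and `reconnect_subprocess` helper are hypothetical names for what SubprocessReconnector stores and does:

```python
import subprocess
import sys

# Hypothetical snapshot of what a SubprocessReconnector might store.
state = {
    "args": [sys.executable, "-c", "print('restarted')"],
    "pid": 12345,        # original pid -- informational only, never reused
    "returncode": 0,     # final state of the original process
}

def reconnect_subprocess(state):
    """Start a brand-new OS process using the saved launch args."""
    return subprocess.Popen(state["args"], stdout=subprocess.PIPE, text=True)

proc = reconnect_subprocess(state)  # new process, new pid
out, _ = proc.communicate()
```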
Limitations:
MatchReconnector --> re.Match
re.Match objects are not constructible directly because Python doesn't expose a public constructor for them. They only exist as the result of running a compiled regex pattern against a specific string. To recreate one, we store all of the data needed to reproduce the match.
When you call reconnect(), the pattern is recompiled and re-run on the same string, and the resulting groups and span info are checked against the stored data. If the match is identical, a new re.Match is returned; otherwise None is returned.
Data collected:
pattern, flags, string, pos, endpos, match_string, span, groups, groupdict
Result: re.Match (if the match can be reproduced)
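The re-run-and-verify approach is easy to demonstrate with the standard library. Below is a minimal sketch under assumed names (`state`, `reconnect_match`); the real MatchReconnector also stores match_string and groupdict, omitted here for brevity:

```python
import re

# Hypothetical snapshot of what a MatchReconnector might store.
state = {
    "pattern": r"(\w+)@(\w+)\.com",
    "flags": 0,
    "string": "contact: alice@example.com",
    "pos": 0,
    "endpos": 26,
    "span": (9, 26),
    "groups": ("alice", "example"),
}

def reconnect_match(state):
    """Recompile the pattern, re-run it on the saved string, verify the result."""
    compiled = re.compile(state["pattern"], state["flags"])
    m = compiled.search(state["string"], state["pos"], state["endpos"])
    if m is None or m.span() != tuple(state["span"]) or m.groups() != tuple(state["groups"]):
        return None  # the match could not be reproduced identically
    return m  # a genuine re.Match object, equivalent to the original

m = reconnect_match(state)
```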
Limitations:
As seen above, some limitations are present.
This is due to how Python handles these resources when they cross process boundaries, and it is something cucumber has to find workarounds for.
Using Reconnectors is the best way we can work with Python's architecture to let you use these objects in a cross-process/distributed environment.
It also ensures that even if you have these object types in your object, you won't receive errors when serializing/deserializing it.
Why not automatically reconnect these resources?
2 reasons: security and user expectations.
In order to automatically reconnect objects like database connections that require authentication, we would need to pass that authentication somewhere in the IR. This data is easily readable and can easily be compromised.
In general, it is not good practice to include sensitive data when moving objects between processes. cucumber does everything else for you, so all you have to do is re-access the object and add the authentication once it reaches the target process.
Automatically reconnecting things like this is awkward for users.
cucumber offers 3 levels of user control regarding reconnection.
reconnect() on each Reconnector object, adding each auth individually
reconnect_all(obj, **auth) on an entire object
@autoreconnect(start_threads=True, **auth) to decorate a Skprocess class
We can automatically reconnect for you, but it is controlled: it requires you to provide authentication through a decorator and to use a specific class.