Why you would use cucumber

TLDR


cucumber is a serialization engine.

It allows you to serialize and deserialize objects across Python processes.

It is built for the Python environment, and isn't directly meant for use in external or cross-language serialization.

However, it can do something that no other Python serializer can do: get rid of all of your PicklingErrors.

If you only need raw speed for simple types, use base pickle. It ships with Python and is the fastest option for those types.

But, if you need to serialize anything else, use cucumber.

pickle vs cucumber — same object, different outcomes

import threading

class Worker:
    def __init__(self):
        self.lock = threading.Lock()
        self.thread = threading.Thread(target=self.run)
        self.results = []
    
    def run(self):
        self.results.append("done")

worker = Worker()

With pickle:

import pickle
pickle.dumps(worker)
# TypeError: cannot pickle '_thread.lock' object

With cloudpickle:

import cloudpickle
cloudpickle.dumps(worker)
# TypeError: cannot pickle '_thread.lock' object

With cucumber:

from suitkaise import cucumber

data = cucumber.serialize(worker)
restored = cucumber.deserialize(data)
# works. lock and thread become Reconnectors, ready to be recreated.
cucumber.reconnect_all(restored)
# lock and thread are live again.

No errors. No workarounds. No tiptoeing around types that cause PicklingErrors.
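
For comparison, this is roughly the kind of workaround that plain pickle needs for a class like Worker: drop the unpicklable attributes in `__getstate__` and rebuild them by hand in `__setstate__`. It is a sketch using only the standard library. `ManualWorker` is a hypothetical name, and nothing here reflects how cucumber works internally.

```python
import pickle
import threading


class ManualWorker:
    """Worker variant that hand-writes its pickling hooks (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.thread = threading.Thread(target=self.run)
        self.results = []

    def run(self):
        self.results.append("done")

    def __getstate__(self):
        # Copy the instance dict and drop everything pickle cannot handle.
        state = self.__dict__.copy()
        del state["lock"]
        del state["thread"]
        return state

    def __setstate__(self, state):
        # Restore the plain data, then recreate the live resources by hand.
        self.__dict__.update(state)
        self.lock = threading.Lock()
        self.thread = threading.Thread(target=self.run)


worker = ManualWorker()
restored = pickle.loads(pickle.dumps(worker))

# The restored thread is a fresh, unstarted thread bound to the new instance.
restored.thread.start()
restored.thread.join()
print(restored.results)
```

Every class that holds a lock, thread, or similar resource needs its own pair of hooks like these, and they have to be kept in sync with the attributes by hand.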

Serialize anything using cucumber

cucumber handles every type that dill and cloudpickle can handle.

It also handles many more types that are frequently used in higher level programming and parallel processing.

It can also handle user-created classes that contain any of these objects!

Types only cucumber can handle

User created classes

cucumber has a way to dissect your class instances, allowing you to serialize essentially anything.

Classes defined in __main__

cucumber can handle classes defined in __main__.

Circular references

cucumber handles all circular references in your objects.

Superior speed

cucumber is faster than cloudpickle and dill for most simple types.

Additionally, it is multiple times faster than both of them for many types.

For a full performance breakdown, head to the performance page.

Actually reconstructs objects

cucumber intelligently reconstructs complex objects using custom handlers.

All you have to do after deserializing is call reconnect_all() and provide any authentication needed, and all of your live resources will be recreated automatically.

You can even start threads automatically if you use cucumber.

The Reconnector pattern — nothing else does this

When cucumber encounters a live resource (a database connection, an open socket, a running thread), it doesn't try to freeze and resume it -- that would be unsafe and often impossible. Instead, it creates a Reconnector object that stores the information needed to recreate the resource.

import psycopg2
from suitkaise import cucumber

# serialize a live database connection
conn = psycopg2.connect(host='localhost', database='mydb', password='secret')
data = cucumber.serialize(conn)

# deserialize it in another process
restored = cucumber.deserialize(data)
# restored.connection is a Reconnector, not a live connection yet

# reconnect with credentials (password is never stored in serialized data)
cucumber.reconnect_all(restored, password='secret')
# now restored.connection is a live psycopg2 connection again

This is a security-conscious design: authentication credentials are never stored in the serialized bytes. You provide them at reconnection time, so serialized data can be stored or transferred without leaking secrets.

No other Python serializer has this concept. Most either crash on live resources or silently produce broken objects.
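
To make the idea concrete, here is a minimal sketch of the pattern itself, not of cucumber's implementation. `SketchReconnector` and `FakeConnection` are hypothetical names invented for this example, and plain pickle stands in for the serializer. The point it illustrates is that the secret is dropped before serialization and supplied again at reconnection time.

```python
import pickle


class FakeConnection:
    """Stand-in for a real driver connection (hypothetical)."""

    def __init__(self, host, database, password):
        if password != "secret":
            raise PermissionError("bad credentials")
        self.host = host
        self.database = database
        self.is_open = True


class SketchReconnector:
    """Stores how to rebuild a resource, minus any secrets (hypothetical)."""

    SECRET_KEYS = {"password"}

    def __init__(self, factory, **params):
        self.factory = factory
        # Keep only the non-secret parameters; secrets are supplied later.
        self.params = {
            key: value
            for key, value in params.items()
            if key not in self.SECRET_KEYS
        }

    def reconnect(self, **auth):
        # Merge the stored parameters with credentials given at reconnect time.
        return self.factory(**self.params, **auth)


reconnector = SketchReconnector(
    FakeConnection, host="localhost", database="mydb", password="secret"
)
data = pickle.dumps(reconnector)
print(b"secret" in data)  # the password never reached the serialized bytes

restored = pickle.loads(data)
conn = restored.reconnect(password="secret")
print(conn.is_open)
```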

Additionally, objects that don't need auth will be lazily reconstructed on first attribute access.
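
Lazy reconstruction on first attribute access can be pictured with a small proxy like the one below. This is only a sketch of the general technique under the assumption that a factory callable is enough to rebuild the resource. `LazyResource` and `make_lock` are hypothetical names, not part of cucumber.

```python
import threading


class LazyResource:
    """Builds the real object on first attribute access (hypothetical)."""

    def __init__(self, factory):
        self._factory = factory
        self._target = None

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so _factory and
        # _target never reach this point once __init__ has set them.
        if name in ("_factory", "_target"):
            raise AttributeError(name)
        if self._target is None:
            self._target = self._factory()
        return getattr(self._target, name)


created = []


def make_lock():
    created.append("lock")
    return threading.Lock()


lazy_lock = LazyResource(make_lock)
print(created)       # nothing has been built yet

lazy_lock.acquire()  # first attribute access builds the real lock
lazy_lock.release()
print(created)
```

Special methods such as `__enter__` are looked up on the type rather than the instance, so a proxy this simple would not work in a `with` statement without forwarding them explicitly.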

Easy inspection and error analysis

cucumber first builds an intermediate representation (IR) of the object out of pickle-native types, then uses base pickle to serialize that IR to bytes.

{
    "__cucumber_type__": "<type_name>",
    "__handler__": "<handler_name>",
    "__object_id__": <id>,
    "state": {
        # object's state in IR form
    }
}

This allows everything to be cleanly organized and inspected.
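
As a rough illustration of what building such an IR could involve, the sketch below converts a small object graph into nested dicts and lists using the keys shown above. The `to_ir` function, the `"instance_dict"` handler name, and the `Point` and `Segment` classes are all hypothetical. The sketch ignores circular references and flattens tuples to lists for brevity, so it should not be read as cucumber's actual algorithm.

```python
import pickle

PRIMITIVES = (int, float, str, bytes, bool, type(None))


def to_ir(obj):
    """Convert obj into nested dicts/lists of pickle-native values (sketch)."""
    if isinstance(obj, PRIMITIVES):
        return obj
    if isinstance(obj, (list, tuple)):
        return [to_ir(item) for item in obj]
    if isinstance(obj, dict):
        return {key: to_ir(value) for key, value in obj.items()}
    # Anything else is treated as a plain instance described by its __dict__.
    return {
        "__cucumber_type__": type(obj).__qualname__,
        "__handler__": "instance_dict",  # hypothetical handler name
        "__object_id__": id(obj),
        "state": to_ir(vars(obj)),
    }


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Segment:
    def __init__(self):
        self.start = Point(0, 0)
        self.end = Point(3, 4)
        self.tags = ["demo"]


ir = to_ir(Segment())
data = pickle.dumps(ir)  # the IR holds only builtins, so pickle accepts it
print(ir["state"]["end"]["state"])
```

Because the IR is just dicts, lists, and primitives, it can be printed or walked with ordinary tools before any bytes are produced.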

Additionally, cucumber functions provide traceable, simple explanations of what went wrong if something fails.

# all you have to do is add debug=True
cucumber.serialize(obj, debug=True)

It also has an option to see how the object is getting serialized or reconstructed in real time with color-coded output.

# all you have to do is add verbose=True
cucumber.serialize(obj, verbose=True)

How do I know that cucumber can handle any user class?

cucumber can serialize any object as long as it contains supported types.

In practice, the vast majority of Python objects contain only supported types.

To prove to you that cucumber can handle any user class, I created a monster.

The WorstPossibleObject

WorstPossibleObject is an object I created that would never exist in real life.

Its only goal: try to break cucumber.

It contains every type that cucumber can handle, in a super nested, circular-referenced, randomly-generated structure.

Each WorstPossibleObject is different from the last, and they all have ways to verify that they remain intact after being converted to and from bytes.

Not only does cucumber handle this object, but it can handle more than 100 different WorstPossibleObjects per second.

By handle, I mean:

  1. Serialize it to bytes
  2. Pass it to a different process
  3. Deserialize it
  4. Reconnect everything

Each object can then verify that it is the same as it was when it was created, and that all of the complex objects inside it still work as expected.

This test includes a full round trip.

`serialize()` → another process → `deserialize()` → `reconnect_all()` → verify → `serialize()` → back to original process → `deserialize()` → `reconnect_all()` → verify
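
The shape of that round trip can be sketched with the standard library alone. The example below uses plain pickle and builtin types as a stand-in, not cucumber or WorstPossibleObject, and it has no reconnect step. The child script and the `fingerprint` helper are hypothetical.

```python
import hashlib
import pickle
import subprocess
import sys

# Child process: deserialize from stdin, serialize again to stdout.
CHILD = (
    "import pickle, sys\n"
    "obj = pickle.loads(sys.stdin.buffer.read())\n"
    "sys.stdout.buffer.write(pickle.dumps(obj))\n"
)


def fingerprint(obj):
    """Hash a stable text view of the payload so both ends can compare."""
    return hashlib.sha256(repr(sorted(obj.items())).encode()).hexdigest()


original = {"a": 1, "b": [1, 2, 3], "c": {"nested": True}}
before = fingerprint(original)

# Parent -> child -> parent, passing only bytes between the processes.
result = subprocess.run(
    [sys.executable, "-c", CHILD],
    input=pickle.dumps(original),
    capture_output=True,
    check=True,
)
returned = pickle.loads(result.stdout)

print(fingerprint(returned) == before)
```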

To see the full WorstPossibleObject code, head to the worst possible object page. Have fun!

Where cucumber sits in the landscape

cucumber's real competitor is dill, not cloudpickle. Both cucumber and dill prioritize type coverage over raw speed. The difference: cucumber far outclasses dill on speed while exceeding its type coverage.

The fact that cucumber also competes with cloudpickle on speed -- despite covering vastly more types -- is the surprising part. cloudpickle is designed for speed with limited types. cucumber is designed for coverage and still keeps up.

For a full performance breakdown, head to the performance page.

Works with the rest of suitkaise

cucumber is the serialization backbone of the suitkaise ecosystem.