ADR-0010: Workflow Cloning Strategy
Status
Accepted
Date
2025-02-27
Context
Our codebase has several places where we need to clone or share workflows:
- The `Workflow::from_arc` method, which attempts to clone a workflow out of an `Arc<Workflow>`
- The `BatchNode` implementation, which stores its workflow in an `Arc` and clones that reference for each worker task
However, we're encountering issues because:
- `Box<dyn Node<...>>` does not implement `Clone`, which prevents directly cloning the `nodes` HashMap
- We need to preserve the ability to share workflows between tasks efficiently
- Spawned async tasks generally must own their data (`'static`), so we cannot simply hand them references with lifetimes
Decision
We will take a multi-pronged approach to solve workflow cloning issues:
1. Use Arc for Node Storage
Instead of storing nodes in `Box<dyn Node<...>>`, we'll store them in `Arc<dyn Node<...>>`:
```rust
pub(crate) nodes: HashMap<NodeId, Arc<dyn Node<Context, A, Output = Output>>>,
```
Cloning the node collection then copies only the `Arc` pointers (incrementing each reference count). The node implementations themselves are not duplicated.
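The following is a minimal sketch of the difference. It assumes a heavily simplified, hypothetical `Node` trait with a single `name` method; the real trait is generic over `Context`, `A`, and `Output`. `Arc<T>` implements `Clone` for any `T`, including unsized trait objects, so a `HashMap` of `Arc<dyn Node>` values is itself cloneable. The clone points at the same node allocations:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, heavily simplified stand-in for the crate's `Node` trait.
trait Node: Send + Sync {
    fn name(&self) -> &str;
}

struct Echo {
    name: String,
}

impl Node for Echo {
    fn name(&self) -> &str {
        &self.name
    }
}

type NodeId = String;

// With `Box<dyn Node>` values, `nodes.clone()` would not compile, because
// `Box<dyn Node>` is not `Clone`. With `Arc<dyn Node>` values it does.
fn build_nodes() -> HashMap<NodeId, Arc<dyn Node>> {
    let mut nodes: HashMap<NodeId, Arc<dyn Node>> = HashMap::new();
    nodes.insert("a".to_string(), Arc::new(Echo { name: "a".to_string() }));
    nodes
}

fn main() {
    let nodes = build_nodes();
    let copy = nodes.clone(); // clones each Arc (a refcount bump), not the node
    // Both maps point at the same node allocation.
    assert!(Arc::ptr_eq(&nodes["a"], &copy["a"]));
    assert_eq!(Arc::strong_count(&nodes["a"]), 2);
    println!("{}", copy["a"].name());
}
```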
2. Implement Clone for Workflow
We'll write a manual `Clone` implementation for `Workflow` that copies the graph structure but shares the node implementations:
```rust
impl<Context, A, Output> Clone for Workflow<Context, A, Output>
where
    Context: Send + Sync + 'static,
    A: ActionType + Clone + Send + Sync + 'static,
    Output: Send + Sync + 'static,
{
    fn clone(&self) -> Self {
        Self {
            start_node: self.start_node.clone(),
            nodes: self.nodes.clone(), // Possible because values are Arc: copies pointers, not nodes
            edges: self.edges.clone(),
            default_routes: self.default_routes.clone(),
        }
    }
}
```
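The impl is written by hand rather than derived, and that choice matters. `#[derive(Clone)]` adds a `Clone` bound on every type parameter, so it would demand `Context: Clone` even though no `Context` value is ever cloned. The sketch below uses a hypothetical one-parameter `MiniWorkflow` and a context type that deliberately does not implement `Clone`:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical simplified node trait, generic over a context type.
trait Node<C>: Send + Sync {
    fn run(&self, ctx: &mut C);
}

// A context type that deliberately does NOT implement `Clone`.
struct Counter {
    hits: u32,
}

struct Increment;

impl Node<Counter> for Increment {
    fn run(&self, ctx: &mut Counter) {
        ctx.hits += 1;
    }
}

// Hypothetical miniature workflow. `#[derive(Clone)]` here would generate
// `impl<C: Clone> Clone`, so `MiniWorkflow<Counter>` would not be cloneable.
struct MiniWorkflow<C> {
    start: String,
    nodes: HashMap<String, Arc<dyn Node<C>>>,
}

// Manual impl: clones the fields only, with no bound on `C`.
impl<C> Clone for MiniWorkflow<C> {
    fn clone(&self) -> Self {
        Self {
            start: self.start.clone(),
            nodes: self.nodes.clone(),
        }
    }
}

fn build() -> MiniWorkflow<Counter> {
    let mut nodes: HashMap<String, Arc<dyn Node<Counter>>> = HashMap::new();
    nodes.insert("inc".to_string(), Arc::new(Increment));
    MiniWorkflow { start: "inc".to_string(), nodes }
}

fn main() {
    let workflow = build();
    let copy = workflow.clone(); // compiles even though `Counter` is not `Clone`
    let mut ctx = Counter { hits: 0 };
    workflow.nodes[&workflow.start].run(&mut ctx);
    copy.nodes[&copy.start].run(&mut ctx);
    assert_eq!(ctx.hits, 2);
}
```

The `A: Clone` bound in the real impl is kept explicitly, presumably because the routing tables hold `A` values.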
3. Remove from_arc Method
The `Workflow::from_arc` method will be removed since it's no longer necessary - `Arc`
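Callers that held an `Arc<Workflow>` and used `from_arc` can clone through the `Arc` instead. The sketch below uses a hypothetical non-generic stand-in for `Workflow`. Calling `.clone()` directly on the `Arc` would only copy the pointer, so the call must target `Workflow::clone`:

```rust
use std::sync::Arc;

// Hypothetical stand-in for `Workflow`; any `Clone` type behaves the same way.
#[derive(Clone, Debug, PartialEq)]
struct Workflow {
    start_node: String,
}

// What a former `Workflow::from_arc(&shared)` call site might become.
fn owned_copy(shared: &Arc<Workflow>) -> Workflow {
    // Fully qualified call: `&Arc<Workflow>` deref-coerces to `&Workflow`,
    // so this clones the workflow rather than the `Arc`.
    Workflow::clone(shared)
}

fn main() {
    let shared = Arc::new(Workflow { start_node: "start".to_string() });

    let copy = owned_copy(&shared);
    assert_eq!(copy, *shared);
    assert_eq!(Arc::strong_count(&shared), 1); // the Arc itself was not cloned

    // If the caller owns the `Arc`, `Arc::unwrap_or_clone` (Rust 1.76+)
    // skips the clone when this is the last reference.
    let owned = Arc::unwrap_or_clone(shared);
    assert_eq!(owned, copy);
}
```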
4. Refactor BatchNode to Leverage Cloning
The `BatchNode` will be updated to use this clone capability rather than wrapping the workflow in an `Arc`:
```rust
pub struct BatchNode<Context, ItemType, A = crate::action::DefaultAction>
where
    Context: BatchContext<ItemType> + Send + Sync + 'static,
    ItemType: Clone + Send + Sync + 'static,
    A: ActionType + Clone + Send + Sync + 'static,
{
    id: NodeId,
    item_workflow: Workflow<Context, A>, // No longer an Arc
    parallelism: usize,
    _phantom: PhantomData<(Context, ItemType, A)>,
}

// In the process method, once per worker task:
let workflow = self.item_workflow.clone(); // Now directly cloneable
```
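The sketch below illustrates the per-worker clone under stated assumptions. It uses `std::thread::spawn` in place of whatever async runtime `BatchNode` actually spawns tasks on, and the `Workflow`, `Node`, and `run_batch` names are simplified hypotheticals. Each worker owns its clone, so the spawned closure satisfies the `'static` bound without borrowing from the batch node:

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

// Hypothetical simplified node trait; `Send + Sync` lets nodes cross threads.
trait Node: Send + Sync {
    fn process(&self, item: u32) -> u32;
}

struct Double;

impl Node for Double {
    fn process(&self, item: u32) -> u32 {
        item * 2
    }
}

// Hypothetical non-generic stand-in for `Workflow`; with no type parameters,
// `#[derive(Clone)]` adds no extra bounds.
#[derive(Clone)]
struct Workflow {
    start_node: String,
    nodes: HashMap<String, Arc<dyn Node>>,
}

impl Workflow {
    fn run(&self, item: u32) -> u32 {
        self.nodes[&self.start_node].process(item)
    }
}

// Hypothetical batch runner: splits `items` into at most `parallelism`
// chunks and gives each worker thread its own `Workflow` clone.
fn run_batch(workflow: &Workflow, items: &[u32], parallelism: usize) -> Vec<u32> {
    let workers = parallelism.max(1);
    let chunk_size = items.len().div_ceil(workers).max(1);
    let handles: Vec<_> = items
        .chunks(chunk_size)
        .map(|chunk| {
            // Owned clone: the closure below borrows nothing, so it is `'static`.
            let workflow = workflow.clone();
            let chunk = chunk.to_vec();
            thread::spawn(move || chunk.into_iter().map(|i| workflow.run(i)).collect::<Vec<u32>>())
        })
        .collect();
    // Join in spawn order so results keep the input order.
    handles
        .into_iter()
        .flat_map(|handle| handle.join().expect("worker panicked"))
        .collect()
}

fn main() {
    let mut nodes: HashMap<String, Arc<dyn Node>> = HashMap::new();
    nodes.insert("double".to_string(), Arc::new(Double));
    let workflow = Workflow { start_node: "double".to_string(), nodes };
    println!("{:?}", run_batch(&workflow, &[1, 2, 3, 4, 5], 2)); // prints [2, 4, 6, 8, 10]
}
```

Each clone still allocates fresh copies of the maps themselves. Only the node implementations behind the `Arc`s are shared.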
Consequences
Advantages
- Cleaner API: No need for Arc-specific methods
- Memory Efficiency: Node implementations are shared, not duplicated
- Thread Safety: Arc provides thread-safe reference counting
- Type Safety: `Workflow` implements `Clone` directly, so cloning is expressed in the type system rather than through an Arc-specific method
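Disadvantages
- Reference-Counting Cost: Every clone or drop of an `Arc` performs an atomic increment or decrement (accessing a node costs the same pointer dereference as with `Box`)
- Memory Overhead: Each `Arc` allocation carries strong and weak reference counts in addition to the node itself
- API Changes: Will require changes to code that constructs or expects `Box<dyn Node>`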
Migration Plan
- First, update the `Workflow` struct to use `Arc<dyn Node>` instead of `Box<dyn Node>`
- Implement `Clone` for `Workflow`
- Update the `BatchNode` implementation to leverage this new capability
- Remove the now-redundant `from_arc` method
- Update tests to verify correct cloning behavior
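For the last step, a cloning test might check two properties. First, node implementations are shared between the original and the clone. Second, the routing structure is independent. The sketch below runs against a hypothetical miniature `Workflow` whose `edges` field is a simple adjacency map; the real field types may differ:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical marker trait standing in for `Node`.
trait Node: Send + Sync {}
struct Noop;
impl Node for Noop {}

// Hypothetical miniature of `Workflow`: shared nodes, owned routing table.
#[derive(Clone)]
struct Workflow {
    start_node: String,
    nodes: HashMap<String, Arc<dyn Node>>,
    edges: HashMap<String, Vec<String>>,
}

fn sample_workflow() -> Workflow {
    let mut nodes: HashMap<String, Arc<dyn Node>> = HashMap::new();
    nodes.insert("a".into(), Arc::new(Noop));
    nodes.insert("b".into(), Arc::new(Noop));
    let mut edges = HashMap::new();
    edges.insert("a".to_string(), vec!["b".to_string()]);
    Workflow { start_node: "a".into(), nodes, edges }
}

// Checks the properties a cloning test could assert.
fn check_clone_semantics() {
    let original = sample_workflow();
    let mut copy = original.clone();

    // 1. Node implementations are shared, not duplicated.
    for (id, node) in &original.nodes {
        assert!(Arc::ptr_eq(node, &copy.nodes[id]));
    }

    // 2. The structure is an independent copy: editing the clone's edges
    //    leaves the original untouched.
    copy.edges.get_mut("a").unwrap().push("c".to_string());
    assert_eq!(original.edges["a"], vec!["b".to_string()]);
    assert_eq!(copy.start_node, original.start_node);
}

fn main() {
    check_clone_semantics();
    println!("clone semantics hold");
}
```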
Alternatives Considered
Use Clone Trait Objects
We considered making `Node` require `Clone`, but this would be problematic because:
- `Clone` requires `Sized` (its `clone` method returns `Self`), so a trait with `Clone` as a supertrait cannot be used as a trait object, and `dyn Node` would no longer be possible
- It would require all node implementations to implement `Clone`
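The usual workaround for cloning trait objects shows why the second point bites. It combines an object-safe `clone_box` method with a manual `impl Clone for Box<dyn Node>`. The sketch below uses a hypothetical `Greeter` node and contrasts it with `Arc`, which needs none of this boilerplate:

```rust
use std::sync::Arc;

// `trait Node: Clone` would not be usable as `dyn Node`, because `Clone`
// requires `Sized`. Instead, each implementation supplies `clone_box`.
trait Node: Send + Sync {
    fn label(&self) -> String;
    fn clone_box(&self) -> Box<dyn Node>;
}

// Every node type must itself be `Clone` for `clone_box` to work.
#[derive(Clone)]
struct Greeter {
    greeting: String,
}

impl Node for Greeter {
    fn label(&self) -> String {
        self.greeting.clone()
    }
    // Boilerplate each node type would have to repeat (or get from a macro).
    fn clone_box(&self) -> Box<dyn Node> {
        Box::new(self.clone())
    }
}

// Allowed by the orphan rules because `dyn Node` is a local type.
impl Clone for Box<dyn Node> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}

fn main() {
    let boxed: Box<dyn Node> = Box::new(Greeter { greeting: "hi".into() });
    let deep = boxed.clone(); // new allocation via `clone_box`
    assert_eq!(deep.label(), boxed.label());

    // With `Arc`, cloning shares the node and needs no per-type support.
    let shared: Arc<dyn Node> = Arc::new(Greeter { greeting: "hi".into() });
    let alias = Arc::clone(&shared);
    assert!(Arc::ptr_eq(&shared, &alias));
}
```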
Keep Arc as the Primary Interface
We considered embracing Arc
Use Cow (Clone-on-Write)
We explored using Cow
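Implementation Notes
- The change to `Arc` will be backward compatible for most code that consumes nodes, because `Arc<T>` dereferences to `T` just as `Box<T>` does
- We'll need to update node creation code to wrap nodes in `Arc` instead of `Box`
- This change reinforces the immutability of nodes once created: `Arc` only hands out shared references, so a shared node cannot be mutated without interior mutability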
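The sketch below covers all three notes, assuming a hypothetical factory that still returns `Box<dyn Node>`. The standard library's `From<Box<T>> for Arc<T>` impl converts existing boxes by moving the value into a new allocation that includes the reference counts. Consumers call methods through `Arc` exactly as through `Box`. `Arc::get_mut` grants mutable access only while the pointer is unique:

```rust
use std::sync::Arc;

// Hypothetical simplified node trait.
trait Node: Send + Sync {
    fn name(&self) -> &str;
}

struct Named(String);

impl Node for Named {
    fn name(&self) -> &str {
        &self.0
    }
}

// Hypothetical existing factory that still returns `Box<dyn Node>`.
fn make_node(name: &str) -> Box<dyn Node> {
    Box::new(Named(name.to_string()))
}

// `Arc<T>: From<Box<T>>` also holds for unsized `T` such as `dyn Node`.
fn into_shared(node: Box<dyn Node>) -> Arc<dyn Node> {
    Arc::from(node)
}

fn main() {
    let mut shared = into_shared(make_node("fetch"));
    // Consumers are unaffected: `Arc` derefs to the node just like `Box`.
    assert_eq!(shared.name(), "fetch");

    // Mutable access is only possible while this `Arc` is the sole owner.
    assert!(Arc::get_mut(&mut shared).is_some());
    let other = Arc::clone(&shared);
    assert!(Arc::get_mut(&mut shared).is_none());
    drop(other);
}
```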