ADR-0009: Cloneable Types for Batch Processing¶
Status¶
Accepted
Date¶
2025-02-27
Context¶
While implementing batch processing for the floxide framework, we encountered an ownership challenge. Each item processed in a parallel task is needed in several places:
- When creating an item-specific context
- When returning the original item as part of the result
- When updating the batch context with results
The Rust borrow checker enforces strict ownership rules, and we need a solution that allows:
- Processing items in parallel
- Passing ownership of items into tasks
- Returning processed items from tasks
- Avoiding unnecessary copies of potentially large data
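The conflict can be reproduced outside the framework. The sketch below uses hypothetical names (`ItemContext`, `make_context`, `process`) that are not part of floxide. It only illustrates why a single owned value cannot be moved into an item context and also returned afterwards.

```rust
/// Hypothetical stand-in for an item-specific context; not a floxide type.
struct ItemContext<T> {
    item: T,
}

/// Takes ownership of the item, as creating an item-specific context does.
fn make_context<T>(item: T) -> ItemContext<T> {
    ItemContext { item }
}

/// Builds a context from the item and also hands the original item back.
fn process<T: Clone>(item: T) -> (ItemContext<T>, T) {
    // Without the clone, `item` would be moved into `make_context`, and
    // returning it below would be rejected with error E0382 (use of moved value).
    let ctx = make_context(item.clone());
    (ctx, item)
}
```

Dropping the `T: Clone` bound and passing `item` directly to `make_context` makes the function fail to compile, which is the situation this ADR addresses.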
Decision¶
We will require that item types used in BatchContext<T> must implement the Clone trait. This requirement will be documented and enforced through trait bounds.
Updated BatchContext Trait¶
/// Trait for contexts that support batch processing
pub trait BatchContext<T>
where
    T: Clone + Send + Sync + 'static,
{
    /// Get the items to process in batch
    fn get_batch_items(&self) -> Result<Vec<T>, FloxideError>;

    /// Create a context for a single item
    fn create_item_context(&self, item: T) -> Result<Self, FloxideError>
    where
        Self: Sized;

    /// Update the main context with results from item processing
    fn update_with_results(
        &mut self,
        results: &Vec<Result<T, FloxideError>>,
    ) -> Result<(), FloxideError>;
}
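As an illustration of how a user type might satisfy this trait, here is a minimal sketch. `FileBatchContext` and its fields are hypothetical, and `FloxideError` is replaced by a placeholder struct so the example compiles on its own; the real error type lives in the floxide crate and may look different.

```rust
/// Placeholder error type; the real FloxideError is defined by floxide.
#[derive(Debug)]
pub struct FloxideError(pub String);

/// The trait as given in this ADR, repeated so the sketch is self-contained.
pub trait BatchContext<T>
where
    T: Clone + Send + Sync + 'static,
{
    fn get_batch_items(&self) -> Result<Vec<T>, FloxideError>;
    fn create_item_context(&self, item: T) -> Result<Self, FloxideError>
    where
        Self: Sized;
    fn update_with_results(
        &mut self,
        results: &Vec<Result<T, FloxideError>>,
    ) -> Result<(), FloxideError>;
}

/// Hypothetical context that processes a list of file names.
#[derive(Clone, Debug, Default)]
pub struct FileBatchContext {
    pub pending: Vec<String>,
    pub current: Option<String>,
    pub succeeded: Vec<String>,
    pub failed: usize,
}

impl BatchContext<String> for FileBatchContext {
    fn get_batch_items(&self) -> Result<Vec<String>, FloxideError> {
        // The context keeps its own list, so callers receive clones.
        Ok(self.pending.clone())
    }

    fn create_item_context(&self, item: String) -> Result<Self, FloxideError> {
        // The item is moved into the new context; no clone happens here.
        Ok(FileBatchContext {
            current: Some(item),
            ..Default::default()
        })
    }

    fn update_with_results(
        &mut self,
        results: &Vec<Result<String, FloxideError>>,
    ) -> Result<(), FloxideError> {
        for result in results {
            match result {
                // Results are borrowed, so keeping an item means cloning it.
                Ok(item) => self.succeeded.push(item.clone()),
                Err(_) => self.failed += 1,
            }
        }
        Ok(())
    }
}
```

In this sketch the `Clone` bound is exercised in two places: when the item list is handed out and when successful items are recorded from the borrowed results.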
BatchNode Implementation¶
The BatchNode implementation will be updated to properly handle cloning:
// Create tasks for each item
for item in items {
    let semaphore = semaphore.clone();
    let workflow = self.item_workflow.clone();
    let ctx_clone = ctx.clone();

    // Clone the item for use in the task; the original is returned on success
    let item_clone = item.clone();

    // Spawn a task for each item
    let handle = tokio::spawn(async move {
        // Acquire a permit from the semaphore to limit concurrency
        let _permit = semaphore.acquire().await.unwrap();

        match ctx_clone.create_item_context(item_clone) {
            Ok(mut item_ctx) => match workflow.execute(&mut item_ctx).await {
                Ok(_) => Ok(item),
                Err(e) => Err(FloxideError::batch_processing(
                    "Failed to process item",
                    Box::new(e),
                )),
            },
            Err(e) => Err(e),
        }
    });

    handles.push(handle);
}
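The clone-then-move pattern in this loop can be tried without the framework. The following is a simplified sketch that swaps `tokio::spawn` for `std::thread::spawn` so it runs with the standard library alone. It omits the semaphore and the workflow, and `process_batch` and its `work` parameter are hypothetical names introduced for illustration.

```rust
use std::thread;

/// Runs `work` on a clone of each item in its own thread and returns the
/// original item on success, mirroring the `Ok(_) => Ok(item)` arm above.
fn process_batch<T, F>(items: Vec<T>, work: F) -> Vec<Result<T, String>>
where
    T: Clone + Send + 'static,
    F: Fn(T) -> Result<(), String> + Send + Clone + 'static,
{
    let mut handles = Vec::new();
    for item in items {
        let work = work.clone();
        // The clone is consumed by the work; the original is moved into the
        // closure as well and handed back as the result.
        let item_clone = item.clone();
        handles.push(thread::spawn(move || work(item_clone).map(|_| item)));
    }
    handles
        .into_iter()
        // A panicking thread is reported as an error instead of propagating.
        .map(|h| h.join().unwrap_or_else(|_| Err("task panicked".to_string())))
        .collect()
}
```

Each item is cloned exactly once per task here, which is the overhead the Disadvantages section refers to.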
Consequences¶
Advantages¶
- Clear Requirements: Users know exactly what constraints apply to item types
- Type Safety: The compiler enforces the Clone constraint
- Efficient Processing: Items can be processed in parallel without unsafe code
- Safe Implementation: Ownership of each item is explicit, so the implementation compiles without use-after-move errors
Disadvantages¶
- Constraint on Types: Requires all batch item types to implement Clone
- Potential Memory Overhead: May result in more copies than strictly necessary
- Potential Performance Impact: Cloning large items could impact performance
Alternatives Considered¶
Require Copy Instead of Clone¶
We considered requiring Copy instead of Clone, which would eliminate the need for explicit cloning. However, this would be too restrictive, as many useful types (such as String and Vec) don't implement Copy.
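The difference between the two bounds can be seen with a pair of small generic functions. This is a sketch with hypothetical function names, not code from floxide.

```rust
/// Accepts any type that satisfies the bound chosen in this ADR.
fn duplicate<T: Clone>(value: &T) -> T {
    value.clone()
}

/// Accepts only `Copy` types, the stricter alternative that was rejected.
fn duplicate_copy<T: Copy>(value: &T) -> T {
    *value
}
```

`duplicate` accepts a `String` or a `Vec<i32>`, while `duplicate_copy(&String::from("x"))` is rejected at compile time because `String` does not implement `Copy`.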
Use References with Lifetime Parameters¶
Another approach would be to use references with explicit lifetimes throughout the batch processing system. While this would avoid cloning, it would significantly complicate the API and make it harder to use, especially with async code and closures.
Use Arc for Shared Ownership¶
We could require item types to be wrapped in Arc for shared ownership. This would avoid cloning the actual data but would require users to wrap and unwrap their data, complicating the API.
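Although Arc is not required, the Clone bound does not rule it out: `Arc<T>` implements `Clone` for any `T`, so callers with large items can opt in themselves. The sketch below assumes a hypothetical `LargeDocument` type and a helper `assert_batch_item` that mirrors the ADR's bound; neither is part of floxide.

```rust
use std::sync::Arc;

/// Hypothetical large item; deliberately not `Clone`.
struct LargeDocument {
    bytes: Vec<u8>,
}

/// Compiles only for types that satisfy the bound on batch items in this ADR.
fn assert_batch_item<T: Clone + Send + Sync + 'static>(_item: &T) {}

/// Clones the handle the way the batch loop clones an item.
fn clone_for_task(item: &Arc<LargeDocument>) -> Arc<LargeDocument> {
    // Increments the reference count; the byte buffer is not copied.
    Arc::clone(item)
}
```

With this approach the per-task clone costs an atomic counter increment instead of a copy of the underlying data, at the price of the wrapping noted above.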
Implementation Notes¶
- The Clone constraint will be added to all relevant trait bounds
- Documentation will clearly state that batch item types must be cloneable
- Examples will demonstrate best practices for minimizing cloning overhead
- Unit tests will verify correct behavior with various item types