branch_text¶
basemode.continue_.branch_text
Streams n parallel continuations of the same prefix, interleaving tokens as they arrive.
Signature¶
async def branch_text(
    prefix: str,
    model: str = "gpt-4o-mini",
    *,
    n: int = 4,
    max_tokens: int = 200,
    temperature: float = 0.9,
    strategy: str | None = None,
    rewind: bool = False,
    **extra,
) -> AsyncGenerator[tuple[int, str], None]
What it does¶
Launches n concurrent calls to continue_text() and multiplexes their output through an async queue. Tokens from all branches interleave in the order they arrive from the API.
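The pattern is roughly the producer/consumer sketch below. This is a hedged illustration, not the library's actual internals: the helper name _fan_out_sketch, the None sentinel convention, and the assumption that continue_text() yields plain token strings are all assumptions. Each branch pushes (index, token) pairs onto a shared asyncio.Queue, and the generator drains the queue until every branch has signalled completion.

import asyncio
from collections.abc import AsyncGenerator

from basemode import continue_text  # single-branch streamer documented separately


async def _fan_out_sketch(prefix: str, n: int, **kwargs) -> AsyncGenerator[tuple[int, str], None]:
    # Hypothetical internals: each worker streams one branch and pushes
    # (branch_index, token) onto a shared queue; None marks a finished branch.
    queue: asyncio.Queue = asyncio.Queue()

    async def worker(idx: int) -> None:
        # Assumes continue_text() yields plain token strings.
        async for token in continue_text(prefix, **kwargs):
            await queue.put((idx, token))
        await queue.put(None)  # sentinel: this branch is done

    tasks = [asyncio.create_task(worker(i)) for i in range(n)]
    finished = 0
    while finished < n:
        item = await queue.get()
        if item is None:
            finished += 1
        else:
            yield item
    for t in tasks:
        await t  # surface any branch exceptions (error handling simplified)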
Parameters¶
Same as continue_text, plus:
| Name | Type | Default | Description |
|---|---|---|---|
| n | int | 4 | Number of parallel continuations to generate |
Note: context is not a parameter of branch_text — pass it via **extra if needed, or use continue_text with a system prompt.
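For illustration, a keyword can be forwarded through **extra like this; the context keyword name here is an assumption, so use whatever continue_text and your backend actually accept:

from basemode import branch_text


async def with_context():
    # Hypothetical: "context" is forwarded via **extra to the underlying calls.
    async for idx, token in branch_text(
        "The message in the bottle read:",
        n=2,
        context="You write terse, noir prose.",
    ):
        print(f"[{idx}]{token}", end="", flush=True)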
Yields¶
tuple[int, str] — (branch_index, token). branch_index is 0-based, ranging from 0 to n-1.
Examples¶
import asyncio

from basemode import branch_text


async def main():
    branches = [""] * 4
    async for idx, token in branch_text(
        "The message in the bottle read:",
        model="gpt-4o-mini",
        n=4,
        max_tokens=200,
    ):
        branches[idx] += token

    for i, text in enumerate(branches):
        print(f"\n[{i}] {text}")


asyncio.run(main())
# Stream tokens as they arrive
async def stream_live():
    async for idx, token in branch_text("She reached for the door", n=3):
        print(f"[{idx}]{token}", end="", flush=True)
Notes¶
- All n generations are started concurrently. On most APIs they run in parallel server-side as well, so wall-clock time is similar to a single generation.
- Tokens from slow branches will arrive less frequently; fast branches may complete before slow ones start yielding.
- The generator completes when all n branches have finished.
- This is the function that basemode-loom calls internally when you press Space in the TUI or run basemode-loom run -n 4.