continue_text¶
basemode.continue_.continue_text
Streams a single text continuation from a language model.
Signature¶
```python
async def continue_text(
    prefix: str,
    model: str = "gpt-4o-mini",
    *,
    max_tokens: int = 200,
    temperature: float = 0.9,
    context: str = "",
    strategy: str | None = None,
    rewind: bool = False,
    **extra,
) -> AsyncGenerator[str, None]
```
What it does¶
- Normalizes the model name (see basemode Model Normalization)
- Detects the best continuation strategy for the model (see basemode Strategies)
- Optionally rewinds the prefix to the nearest word boundary (`rewind=True`); see the sketch after this list
- Calls the strategy's `stream()` method
- Applies token boundary healing to the output stream
- Yields healed tokens one at a time
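The rewind step can be pictured with a small standalone sketch. This is only an illustration of the word-boundary idea described above, not basemode's implementation; the helper name and the exact whitespace handling are assumptions.

```python
# Illustrative sketch only: trim a prefix that ends mid-word back to the
# previous word boundary, which is what rewind=True does before sending.
def rewind_to_word_boundary(prefix: str) -> str:
    if prefix and not prefix[-1].isspace():
        head, sep, _partial = prefix.rpartition(" ")
        if sep:  # there is an earlier word boundary to fall back to
            return head + " "
    return prefix

print(rewind_to_word_boundary("The old lighth"))  # -> "The old "
```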
Parameters¶
| Name | Type | Default | Description |
|---|---|---|---|
| `prefix` | `str` | required | Text to continue from |
| `model` | `str` | `"gpt-4o-mini"` | Model name; normalized automatically |
| `max_tokens` | `int` | `200` | Maximum tokens to generate |
| `temperature` | `float` | `0.9` | Sampling temperature |
| `context` | `str` | `""` | System context / framing (inserted as a system message) |
| `strategy` | `str \| None` | `None` | Strategy override; `None` = auto-detect |
| `rewind` | `bool` | `False` | If `True`, trim the prefix back to the last word boundary before sending |
| `**extra` | | | Additional kwargs forwarded to the model API |
Yields¶
str — tokens as they arrive, with boundaries healed. Token size varies by model and position; typically 1–5 characters.
Examples¶
```python
import asyncio

from basemode import continue_text

# Basic usage: print tokens as they stream in
async def main():
    async for token in continue_text("The old lighthouse keeper"):
        print(token, end="", flush=True)

asyncio.run(main())

# Collect the full continuation into a single string
async def collect():
    tokens = []
    async for token in continue_text(
        "She examined the artifact carefully",
        model="anthropic/claude-opus-4-7",
        max_tokens=400,
        context="This is a science fiction story.",
    ):
        tokens.append(token)
    return "".join(tokens)

text = asyncio.run(collect())
```
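Provider-specific parameters can also be passed through `**extra`, and `rewind=True` handles a prefix that stops mid-word. A hedged example combining both; the prefix and the specific `top_p`/`seed` values are illustrative:

```python
import asyncio
from basemode import continue_text

# Forward extra kwargs to the model API and rewind a mid-word prefix
async def demo():
    async for token in continue_text(
        "The expedition reached the ridge at daw",
        rewind=True,   # trims the partial word "daw" before sending
        top_p=0.95,    # forwarded via **extra to the underlying call
        seed=42,       # forwarded via **extra
    ):
        print(token, end="", flush=True)

asyncio.run(demo())
```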
Notes¶
- The `rewind` option is useful when you want generation to feel continuous even when the prefix ends mid-word. With `rewind=True`, the last partial word is trimmed from the prefix before sending, so the model starts fresh from the previous word boundary.
- `**extra` is forwarded directly to the underlying LiteLLM call. Useful for provider-specific parameters like `top_p`, `seed`, etc.
- Temperature is silently removed for models that don't support it (GPT-5, o-series). Pass any temperature value; it won't cause an error (see the example below).
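For instance, the same call site can target an o-series model without changing the temperature argument. A minimal illustration; the alias "o3-mini" is an assumption, substitute whatever o-series name your deployment uses:

```python
import asyncio
from basemode import continue_text

# temperature is stripped for models that don't support it, so this call
# does not raise even though a temperature is passed.
async def o_series_demo():
    async for token in continue_text(
        "Dear team,",
        model="o3-mini",   # assumed o-series alias; normalized automatically
        temperature=0.9,   # silently removed for this model family
    ):
        print(token, end="", flush=True)

asyncio.run(o_series_demo())
```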