You can use the onPartialResponse method to register a callback that is called for each partial response:
```php
<?php
use Cognesy\Polyglot\Inference\Inference;

$inference = new Inference();
$response = $inference->with(
    messages: 'Write a short story about a space explorer.',
    options: ['stream' => true]
);

// Set up a callback for processing partial responses
$stream = $response->stream()->onPartialResponse(function($partialResponse) {
    echo $partialResponse->contentDelta;
    flush();
});

// Process all responses
foreach ($stream->responses() as $_) {
    // The callback is called for each partial response
    // We don't need to do anything here
}
```
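The loop body in this example is empty, but the loop itself still matters. A likely reason is that the stream is lazy, the same way PHP generators are, so the callback can only fire while something pulls responses from it. The self-contained sketch below illustrates that callback-per-chunk pattern. `CallbackStream` and `fakeDeltas()` are hypothetical names that stand in for the provider stream. This is not Polyglot's implementation.

```php
<?php
// Hypothetical stand-in for a provider stream: yields text deltas one at a time.
function fakeDeltas(): Generator
{
    foreach (['Once ', 'upon ', 'a ', 'time.'] as $delta) {
        yield $delta;
    }
}

// Minimal sketch of the callback-per-chunk pattern (not Polyglot's actual code):
// the registered callback runs for each chunk only while chunks() is iterated.
final class CallbackStream
{
    private ?Closure $onChunk = null;

    public function __construct(private iterable $source)
    {
    }

    public function onChunk(callable $callback): self
    {
        $this->onChunk = Closure::fromCallable($callback);
        return $this;
    }

    public function chunks(): Generator
    {
        foreach ($this->source as $chunk) {
            if ($this->onChunk !== null) {
                ($this->onChunk)($chunk);
            }
            yield $chunk;
        }
    }
}

// Usage: register a callback, then drive the stream with an empty loop.
$stream = (new CallbackStream(fakeDeltas()))->onChunk(function (string $delta): void {
    echo $delta;
});
foreach ($stream->chunks() as $_) {
    // Nothing to do here; iterating is what triggers the callback.
}
```

If the loop is removed, no chunk is ever produced, so the callback never runs.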
In some cases, you may want to stop the generation early:
```php
<?php
use Cognesy\Polyglot\Inference\Inference;

$inference = new Inference();
$response = $inference->with(
    messages: 'Write a long story about space exploration.',
    options: ['stream' => true]
);

$stream = $response->stream()->responses();

$wordCount = 0;
$maxWords = 100; // Limit to 100 words

foreach ($stream as $partialResponse) {
    echo $partialResponse->contentDelta;
    flush();

    // Count words in the accumulated content
    $wordCount = str_word_count($partialResponse->content());

    // Stop after reaching the word limit
    if ($wordCount >= $maxWords) {
        echo "\n\n[Generation stopped at $wordCount words]\n";
        break; // Exit the loop early
    }
}
```
Note that when you break out of the loop, the request to the provider continues in the background, but your application stops processing the response.
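You can move the word-limit logic into a small reusable function. The sketch below uses `takeUntilWordLimit()`, a hypothetical helper name, and consumes any iterable of string deltas. Those deltas are an assumed stand-in for the `contentDelta` values, which lets the sketch run without a provider. For a PHP generator, `break` simply stops pulling further items, so the remaining chunks are never requested on the application side.

```php
<?php
// Hypothetical helper: consume text deltas until the accumulated text
// reaches $maxWords words, then stop pulling from the source.
function takeUntilWordLimit(iterable $deltas, int $maxWords): string
{
    $content = '';
    foreach ($deltas as $delta) {
        $content .= $delta;
        if (str_word_count($content) >= $maxWords) {
            break; // the generator is not resumed after this point
        }
    }
    return $content;
}
```

`str_word_count()` only counts runs of letters (plus `'` and `-`), so it ignores numbers. Treat the limit as approximate.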
When working with streaming responses, keep these performance considerations in mind:
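- **Memory Usage**: Avoid collecting an entire very long response in one string when you only need to display or store it. Write each delta out as it arrives instead.
- **Buffer Flushing**: In web applications, PHP's output buffering can hold chunks back. Call `ob_flush()` (when a buffer is active) as well as `flush()` to push each chunk to the client.
- **Connection Stability**: A streaming connection stays open for the whole generation, so it is more exposed to network interruptions than a single short request. Be prepared for a stream that ends early.
- **Timeouts**: Long-running streams can exceed HTTP client or PHP execution time limits (such as `max_execution_time`). Raise them where needed.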
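The buffer-flushing point can be packaged as a small helper. In the minimal sketch below, `sendChunk()` is a hypothetical name. It relies on documented PHP behavior: `flush()` does not empty PHP's userland output buffers, so `ob_flush()` runs first whenever a buffer is active. `ob_flush()` only flushes the current (innermost) buffer, which covers the common case of a single buffer enabled through `output_buffering`.

```php
<?php
// Minimal sketch: send one chunk toward the client immediately.
function sendChunk(string $chunk): void
{
    echo $chunk;
    if (ob_get_level() > 0) {
        ob_flush(); // move the current userland buffer's contents down one level
    }
    flush(); // ask PHP/the SAPI to push its own write buffer out
}

// For long-running streams in a web request you may also need to lift
// PHP's execution time limit (0 = no limit); adjust to your environment.
// set_time_limit(0);
```

Your web server or a proxy may still buffer the response. This helper only controls PHP's side.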
Here’s an example of memory-efficient processing for very long responses:
```php
<?php
use Cognesy\Polyglot\Inference\Inference;

$inference = new Inference();
$response = $inference->with(
    messages: 'Generate a very long story.',
    options: [
        'stream' => true,
        'max_tokens' => 10000, // Request a long response
    ]
);

$stream = $response->stream()->responses();

$outputFile = fopen('generated_story.txt', 'w');
if ($outputFile === false) {
    throw new RuntimeException('Cannot open generated_story.txt for writing');
}

foreach ($stream as $partialResponse) {
    // Write chunks directly to file instead of keeping them in memory
    fwrite($outputFile, $partialResponse->contentDelta);

    // Optional: show a progress indicator
    echo ".";
    flush();
}

fclose($outputFile);
echo "\nGeneration complete. Story saved to generated_story.txt\n";
```
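The same write-as-you-go idea can be factored into a reusable function that also guarantees the file is closed if a write fails. In the sketch below, `writeDeltasToFile()` is a hypothetical name. The `$deltas` argument is assumed to be any iterable of strings, such as a generator yielding `contentDelta` values. Only one chunk is held in memory at a time.

```php
<?php
// Sketch: write each text delta straight to disk instead of building
// one large string in memory. Returns the number of bytes written.
function writeDeltasToFile(iterable $deltas, string $path): int
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $bytes = 0;
    try {
        foreach ($deltas as $delta) {
            $written = fwrite($handle, $delta);
            if ($written === false) {
                throw new RuntimeException("Write to $path failed");
            }
            $bytes += $written;
        }
    } finally {
        fclose($handle); // runs on success and on any exception above
    }
    return $bytes;
}
```

The `try`/`finally` block keeps the file handle from leaking when the stream or a write throws partway through.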