Getting remote content using HTTP
In addition to the classic retrieval of content using HTTP, VAST Platform can retrieve content using synchronous and asynchronous streaming. These strategies reduce the time and memory needed to get remote content. Streaming is achieved by interacting with an `SstStreamedByteMessage` rather than the classic `SstByteMessage`.
Classic content retrieval
When fetching content from an HTTP (or HTTPS) origin, the content is retrieved as a whole from the underlying connection and stored in memory until it is garbage collected. A GET message returns an `SstByteMessage` containing the HTTP headers and the contents.
E.g.
| client message |
client := SstHttpClient new.
message := client get: 'http://httpbingo.org/bytes/100000' sstAsUrl.
message contents. " -> contains the downloaded contents"
This works fine for content of small to medium size, but has some limitations if the downloaded content is large. For instance, if you're fetching a 2 GB file using HTTP, the operation will block the calling thread until all the bytes are transferred from the underlying connection to Smalltalk memory. In addition, the transfer requires 2 GB of Smalltalk memory to hold bytes that will very likely only be written to disk or uploaded somewhere else.
Streaming HTTP responses
To overcome these issues, and to improve the ways of interacting with remote content, there is a new way of getting content over HTTP: streaming the bytes in chunks instead of reading the content as a whole.
E.g.
| client message |
client := SstHttpClient new.
message := client
    get: 'http://httpbingo.org/bytes/10000' sstAsUrl
    streaming: true. "new parameter"
In the above snippet, when the `streaming: true` argument is passed to the HTTP client, the returned message is an instance of `SstStreamedByteMessage`, which doesn't have all the content in memory (yet). The content is processed by interacting with the instance of `SstStreamedByteMessage` as described below.
Interacting with an `SstStreamedByteMessage`
An instance of `SstStreamedByteMessage` has an input stream pointing to the underlying connection but, at the moment it is returned, nothing has been read from the connection yet.
Streaming from the message
Coming back to the example of the large file download, you can request a 2 GB download and stream it onto a file stream without allocating 2 GB of memory; only the memory for a small chunk of the data is needed at any time.
E.g.
| client message outputStream |
client := SstHttpClient new.
message := client
    get: 'http://httpbingo.org/bytes/10000' sstAsUrl
    streaming: true "new parameter".
outputStream := CfsWriteFileStream openEmpty: 'output.bin'.
message streamTo: outputStream ensure: [outputStream close].
This expects the response to include the `Content-Length` header. The content is then read in chunks from the streamed message, and each chunk is added to the output stream. For a regular HTTP response, the chunk size is defined in `SstHttpAssemblerPolicy>>#streamingBufferSize`.
The streaming operation blocks until it completes, and the `ensure:` argument is evaluated when the streamed message reaches the end or if something fails at the underlying connection.
Transfer-encoding: chunked
Some HTTP messages use a `chunked` transfer encoding, where the `Content-Length` is not specified, either because it is an unbounded response or because it is more efficient to do so.
In such cases, the content is read from the underlying connection as in a regular non-chunked response, but the data is streamed using the chunks in the HTTP message instead of fixed-size chunks.
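As a rough sketch of the chunked case, the snippet below reuses the `streamTo:ensure:` call from the earlier example against an endpoint that is assumed to answer with `Transfer-Encoding: chunked`. The `/stream-bytes/10000` path and the `chunked.bin` file name are illustrative assumptions, not part of the original examples.

```smalltalk
| client message outputStream |
client := SstHttpClient new.
"Assumption: this endpoint replies with a chunked body and no Content-Length."
message := client
    get: 'http://httpbingo.org/stream-bytes/10000' sstAsUrl
    streaming: true.
"Hypothetical output file name, chosen for this sketch."
outputStream := CfsWriteFileStream openEmpty: 'chunked.bin'.
"Each HTTP chunk is written to the file as it arrives; the file is closed at the end or on failure."
message streamTo: outputStream ensure: [outputStream close].
```

From the caller's point of view nothing changes between the two cases; only the size of each piece handed to the output stream differs.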
Getting the contents (compatibility)
An instance of `SstStreamedByteMessage` can answer its contents if you send it the `contents` message; it also stores them in its instance variable, working as if it were a regular `SstByteMessage`.
NOTE: If you don't ask for its contents, the `contents` instance variable won't be initialized, and once the message has been streamed you cannot ask for its contents, because they have not been kept in memory (which is the purpose of streaming).
If you plan to directly access the `contents` as a whole, we recommend using the classic, non-streamed, variant.
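As a minimal sketch of the compatibility behavior described above, the snippet below requests a streamed message and asks for its `contents` before any streaming has taken place. It only uses selectors already shown on this page, plus the standard `size` message.

```smalltalk
| client message body |
client := SstHttpClient new.
message := client
    get: 'http://httpbingo.org/bytes/10000' sstAsUrl
    streaming: true.
"Asking for the contents reads the whole body into memory,
 so the memory benefit of streaming is lost for this message."
body := message contents.
body size. "number of bytes now held in memory"
```

If the message had already been streamed to another stream, the bytes would not have been kept, so `contents` should be requested instead of streaming, not after it.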
Using asynchronous streams
The streaming of contents is enabled by VAST's powerful asynchronous programming library: the input stream of the `SstStreamedByteMessage` is an asynchronous stream.
Both `streamOnto:` and `streamOnto:ensure:` rely on asynchronous stream transformations, so as a developer you can access the `inputStream` of the streamed message and perform different asynchronous operations with it.
E.g.
| client message subscription |
client := SstHttpClient new.
message := client
    get: 'http://httpbingo.org/bytes/100000' sstAsUrl
    streaming: true.
subscription := message inputStream
    listen: [:chunk | "chunk handling" ]
    onError: [ "Async error handling" ]
    onDone: [ "Done streaming" ].
In this case, the returned subscription is a kind of `EsStreamSubscription` that emits one event per chunk. Because it is a subscription, you can pause, resume, or cancel it at any moment without blocking the caller, since it runs asynchronously in its own thread.
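To make the per-chunk events more concrete, here is a small sketch that counts the bytes received and reports the total on the Transcript when the stream is done. It assumes each chunk responds to `size` (for example, a `ByteArray`). The `pause`, `resume` and `cancel` selectors in the trailing comment are assumptions based on the description above and should be checked against the `EsStreamSubscription` protocol.

```smalltalk
| client message subscription totalBytes |
client := SstHttpClient new.
totalBytes := 0.
message := client
    get: 'http://httpbingo.org/bytes/100000' sstAsUrl
    streaming: true.
subscription := message inputStream
    listen: [:chunk |
        "Assumption: each chunk is a byte collection that answers its size."
        totalBytes := totalBytes + chunk size]
    onError: [Transcript show: 'Streaming failed'; cr]
    onDone: [Transcript show: 'Received ', totalBytes printString, ' bytes'; cr].
"The caller continues immediately; the handlers above run as chunks arrive.
 Assumed selectors for controlling the subscription (verify before use):
     subscription pause.
     subscription resume.
     subscription cancel."
```

Because the handlers run asynchronously, `totalBytes` is only complete once the `onDone:` block has been evaluated.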