SQL Server StreamInsight and Azure Stream Analytics


A couple of weeks ago, I put together a presentation on Azure Stream Analytics for the Global Azure Bootcamp. As I was discussing it with colleagues at nVisionIT, some of the SQL Server gurus asked me how it differs from StreamInsight.

Both products target the same problem: complex event processing (CEP), that is, processing event streams to deduce meaningful patterns. The typical use case is a set of devices and sensors sending data for further processing.

Same-Same?

StreamInsight employs a .NET-based runtime with LINQ queries. This is a very familiar interface for .NET developers. Azure Stream Analytics, on the other hand, uses a T-SQL-like syntax to express its intent, a language that most data power users will be familiar with.

Figure 1 - LINQ example of StreamInsight

var payloadByRegion =
  from i in inputStream
  group i by i.Region into byRegion
  from c in byRegion.HoppingWindow(
    TimeSpan.FromMinutes(1),
    TimeSpan.FromSeconds(2),
    HoppingWindowOutputPolicy.ClipToWindowEnd)
  select new {
    Region = byRegion.Key,
    Sum = c.Sum(p => p.Value) };

Figure 2 - T-SQL-like syntax of Azure Stream Analytics

SELECT
  System.Timestamp as Time,
  Topic,
  COUNT(*),
  AVG(SentimentScore),
  MIN(SentimentScore),
  MAX(SentimentScore),
  STDEV(SentimentScore)
FROM TwitterStream
  TIMESTAMP BY CreatedAt
GROUP BY
  TUMBLINGWINDOW(s, 5),
  Topic
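
To make the windowing in the two figures concrete, here is a minimal Python sketch of the difference between a tumbling window (Figure 2) and a hopping window (Figure 1). It is not how either engine is implemented. The function names, the integer-second timestamps, and the half-open [start, end) window boundaries are all assumptions made for illustration; the real products have their own boundary rules.

```python
from collections import defaultdict


def tumbling_windows(events, size):
    """Group (timestamp, key, value) events into back-to-back windows.

    Windows are `size` seconds long and never overlap, so each event
    lands in exactly one window. Returns {(window_end, key): [values]}.
    """
    windows = defaultdict(list)
    for ts, key, value in events:
        window_end = (ts // size + 1) * size
        windows[(window_end, key)].append(value)
    return dict(windows)


def hopping_windows(events, size, hop):
    """Group events into windows of `size` seconds that start every `hop` seconds.

    When hop < size the windows overlap, so one event can land in
    several windows. Returns {(window_end, key): [values]}.
    """
    windows = defaultdict(list)
    for ts, key, value in events:
        # First window end after ts, aligned to a multiple of hop.
        end = (ts // hop + 1) * hop
        # Keep going while the window [end - size, end) still covers ts.
        while end - size <= ts:
            windows[(end, key)].append(value)
            end += hop
    return dict(windows)


# Hypothetical events: (timestamp in seconds, region, value).
events = [(0, "a", 1), (3, "a", 2), (5, "a", 3)]
tumbling = tumbling_windows(events, size=5)
hopping = hopping_windows(events, size=4, hop=2)
```

With a one-minute window hopping every two seconds, as in Figure 1, each event would contribute to thirty overlapping windows, whereas the five-second tumbling window in Figure 2 counts each tweet exactly once.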

The similarities end there. The biggest difference is that StreamInsight can be deployed on-premises, whilst Stream Analytics can only be hosted in Azure. Each technology also uses different building blocks to process complex events.

Differences

Event processing on Stream Analytics and StreamInsight

Input/Output

StreamInsight allows developers to write custom consumers and publishers as adapters. This allows any “raw” format to be ingested by the event processing logic, and the results to be output in any “raw” format.

Stream Analytics, on the other hand, places a few requirements on the input: events need to be in either JSON or CSV format. This is an acceptable assumption, but it means that raw output from sensors in the field cannot be plugged in directly.
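
In practice this means something has to sit between the sensor and the service and translate raw readings into JSON (or CSV) first. A minimal sketch of such a translation step follows. The 16-byte frame layout and the field names are purely hypothetical; a real device would define its own format.

```python
import json
import struct

# Hypothetical raw frame layout (an assumption for illustration only):
# 4-byte unsigned sensor id, 8-byte unsigned Unix timestamp in
# milliseconds, 4-byte float reading, all big-endian.
FRAME = struct.Struct(">IQf")


def frame_to_json(raw: bytes) -> str:
    """Convert one raw sensor frame into a JSON event string."""
    sensor_id, ts_ms, reading = FRAME.unpack(raw)
    return json.dumps({
        "sensorId": sensor_id,
        "timestampMs": ts_ms,
        # Round to hide float32 noise in the decoded reading.
        "reading": round(reading, 3),
    })


# Build a sample frame and convert it.
sample = FRAME.pack(7, 1430000000000, 21.5)
event_json = frame_to_json(sample)
```

The resulting string is what would be published to the input source; with StreamInsight the same decoding could instead live inside a custom input adapter.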

Stream Analytics jobs are directly responsible for writing the data to one of a list of pre-defined sinks, such as Event Hubs, SQL Database, blob storage, table storage, or even the brand new Power BI platform.

StreamInsight, though, allows developers to output the data to anything.

Scalability

Both Stream Analytics and StreamInsight are able to scale horizontally.

Given a Stream Analytics query that allows a degree of parallelism, the administrator can elastically commit more job agents (or computing power) to keep up with the incoming workload. Stream Analytics reads from a reservoir of data stored in blob storage and/or the event hub, so if there is a huge influx of data, Stream Analytics will just continue working, albeit through a longer backlog.
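
The backlog behaviour can be illustrated with a toy model: events arrive in a buffer each tick, and the job drains a fixed number per tick. This is only a back-of-the-envelope sketch, not a model of how Stream Analytics schedules work; the function name and the numbers are made up.

```python
def simulate_backlog(arrivals, capacity_per_tick):
    """Return the backlog size after each tick.

    arrivals: number of events arriving in each tick.
    capacity_per_tick: number of events the job can process per tick.
    """
    backlog = 0
    history = []
    for arrived in arrivals:
        # Unprocessed events wait in the buffer; nothing is dropped.
        backlog = max(0, backlog + arrived - capacity_per_tick)
        history.append(backlog)
    return history


# A burst of 20 events per tick against a capacity of 10 per tick.
burst = [5, 20, 20, 5, 5, 5]
with_small_job = simulate_backlog(burst, capacity_per_tick=10)
with_scaled_job = simulate_backlog(burst, capacity_per_tick=20)
```

In this toy run the smaller job falls behind during the burst and then works the backlog down, while the scaled-up job never accumulates one.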

With StreamInsight, the number of agents is pre-defined whilst designing the solution; however, those agents could all be co-located on one virtual machine or spread out to one instance per virtual machine.

Abstracted Details

The biggest advantage that Stream Analytics has over StreamInsight is that event publishers do not have to be aware that there may be multiple processes in the background reading their data. When events are published to Stream Analytics, the only thing the event publishers need to know is which Event Hub or blob storage to place the data in. Stream Analytics will then fetch the data.

On the other hand, when events are published to StreamInsight, the event publishers need to know the specific adapter instance to deliver their messages to.

Resiliency

When it comes to resilient event processing, both StreamInsight and Stream Analytics offer this capability. Stream Analytics does this implicitly, without the developer ever being aware of it.

StreamInsight makes this feature more explicit for the developer: checkpoints need to be marked as data is processed, per node (input adapters, queries, and output adapters).
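
For readers who have not worked with explicit checkpoints, the general idea can be sketched in a few lines of Python. This is not the StreamInsight checkpointing API; the function, its parameters, and the JSON checkpoint file are hypothetical and only show the pattern of saving progress plus state, then resuming from the last saved point after a failure.

```python
import json
import os


def process_with_checkpoints(events, checkpoint_path, every=2, fail_at=None):
    """Sum `events`, saving progress to `checkpoint_path` every `every` events.

    If a checkpoint file exists, processing resumes from it.
    `fail_at` simulates a crash just before the event at that index.
    """
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    for i in range(state["next_index"], len(events)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        state["total"] += events[i]
        state["next_index"] = i + 1
        # Persist position and running state together, so a restart
        # neither skips nor double-counts events.
        if state["next_index"] % every == 0:
            with open(checkpoint_path, "w") as f:
                json.dump(state, f)
    return state["total"]
```

If a run crashes part-way through, calling the function again with the same checkpoint path picks up from the last saved index and still produces the full total. Events processed after the last checkpoint are simply replayed against the restored state.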

StreamInsight does offer something that Stream Analytics could find useful: user-defined aggregate functions. Stream Analytics will probably never see this feature in its stack, but it is useful for StreamInsight developers.

Conclusion

As stated at the outset, these two products target very different platforms and configurations, although they solve the same problem: complex event processing.

If your application needs to process events on-premises, consider using SQL Server StreamInsight in your solution. Just bear in mind that your application will need a little bit of tender care when it comes to processing events.

Should you have the choice between the two, the Azure Stream Analytics service makes sense if you are just looking for a PaaS solution. Data in, data out.
