Power Platform in a Modern Data Platform Architecture

This is the first in a series of essays featuring new and deeper best practices added to the Power Platform Adoption Framework. A version of the content below has been incorporated into the Adoption Framework at this link. Special thanks to Lee Baker with whom I have tirelessly collaborated on this concept.

I’ve been thinking quite a bit lately about Power Platform as one of the three principal components of the one Microsoft Cloud, alongside Azure and Microsoft 365. This is particularly important in a more complex data ecosystem, one of the enterprise management dimensions you’ll find in the Power Platform Adoption Framework. So I want to expand on that data ecosystem concept with a modern data platform architecture modeled as a loop or a cycle (rather than a linear flow), particularly when Power Platform solutions are leveraged to develop end-user solutions as much as 74% faster than traditional application development, with a 188% return on investment (Forrester’s The Total Economic Impact of Power Apps, 2020). These solutions integrate seamlessly with the Microsoft Cloud and our modern data platform architecture.

The diagram below conceptualizes the data platform as involving several major areas including data collection, sources, ingestion, storage, analysis, and visualization. I’ll walk through each of them below. The goal here is to provide a platform-based model for taking data from the point of its creation to becoming actually valuable to an organization, across any industry. In other words, what’s our enterprise approach to deriving business value from the vast amount of data to which we have access?

It’s important to caveat that the model above is in no way all-inclusive of everything we can do in this space, all of the services and capabilities available to us, or all of the connections that can be made between discrete components shown (and not shown) on the diagram. Also note that the lines of demarcation between many of these components are not cut and dried. For example, Dataverse’s role as both source and storage necessarily places it in the middle of the diagram. The same could be said of Cosmos DB or Azure SQL (which doesn’t even appear as its own icon here), though the goal of this discussion is to model Power Platform’s role in that modern data platform. Similarly, I’ve chosen to depict the model as a loop because Power Platform components are critical at both the start and the end, as both the means of end-user data collection and the ultimate usability of that data (see “Visualization”) by humans.

Data Collection

Let’s begin atop the loop with the “Data Collection” area. These are the transactional solutions through which users collect and otherwise generate data. The use cases here are nearly infinite, but as hard examples I’ll offer a call center operator working a case with a customer, an insurance adjuster taking photos and geo-location data in the field, a nurse updating their patient’s medical records on administration of a vaccine, a maintenance technician completing an inspection or repair checklist on a bus or train, an employee interacting with a chatbot to update their HR particulars, or even a soldier being accounted for just prior to leaping from an airplane. The sky’s the limit (literally).

As discussed earlier, Power Platform lets us develop these point solutions as much as 74% faster and with a 188% ROI over traditional application development or “do nothing” alternatives… so it makes sense that we’d turn to Power Platform to create such solutions whenever feasible. The core components we’re interested in here are:

  • Power Apps, the mobile and browser-based “apps” with which users interact most frequently. This is the point of service through which the agent populates case details, the adjuster takes photos, the nurse punches the vaccine card, etc. It’s important to note that Power Apps includes the wide range of Microsoft Dynamics apps addressing specific business use cases around sales, marketing, customer service, field service, and more.

  • Power Virtual Agents are rapidly developed, AI-infused chatbots through which internal employees or external customers may interact conversationally to process routine transactions, such as an employee chatting via Microsoft Teams with a bot that, in the background, updates that employee’s emergency contacts, skill records, etc.

  • Power Automate provides in-app automation, management of business processes, and robotic process automation (RPA) that frees humans to focus on more complex tasks. Think, from our examples above, of the maintenance technician being provided checklists and repair tasks specific to the make and model of bus on which they are working.

Solutions developed using this technology can store their data in hundreds of different places via data connectors to services such as SharePoint, Teams, and Azure SQL, but in our data platform model I’ll point out specifically that the native data service for these solutions is Microsoft Dataverse, sitting visually in the center of our loop because it interacts seamlessly with so many other components of our ecosystem (a sketch of writing to Dataverse follows below). Dataverse further exchanges data with a number of storage technologies, including Data Lake, Cosmos DB, and Azure Blob Storage, which we will discuss as we progress around the loop.
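
To make that Dataverse relationship concrete, here is a minimal sketch of creating a case record through the Dataverse Web API, the same REST endpoint that sits beneath Power Apps. The organization URL, the access token, and the field values are placeholders you would replace with your own environment’s details.

```python
import requests

# Placeholders: your environment URL and an Azure AD (Entra ID) access
# token scoped to Dataverse.
ORG_URL = "https://yourorg.crm.dynamics.com"
ACCESS_TOKEN = "<access-token>"

headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
    "OData-MaxVersion": "4.0",
    "OData-Version": "4.0",
}

# Create a case (the "incident" table backs customer service cases).
case = {
    "title": "Customer reports intermittent device fault",
    "description": "Logged by a call center agent via Power Apps.",
}

resp = requests.post(f"{ORG_URL}/api/data/v9.2/incidents",
                     headers=headers, json=case)
resp.raise_for_status()

# Dataverse returns the new record's URI in the OData-EntityId header.
print("Created:", resp.headers.get("OData-EntityId"))
```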

Data Sources

Dataverse is itself a structured database, so it pops up again in “Data Sources”, the next stop along our loop. Data is often not so conveniently gathered through the point solutions described in the data collection section above (though there is tremendous overlap between data collection and data sources, as is the case throughout this model). There are many (thousands or even more) sources from which data may be pulled into the platform. Note that the below are broad categories of data sources and technologies rather than specific services or components. They include:

  • Structured data, as is the case with Dataverse, is properly modeled, relational data. In simplistic terms: a case tied to a specific customer, an insurance claim tied to a specific vehicle in a specific mishap, a vaccination tied to a specific dose and patient, a technician inspecting a specific train car, a chatbot updating HR data for a specific employee, or a specific soldier leaping from a specific plane as part of a specific mission. Structured data is in practice much more complex, but these examples cover the basic concept.

  • Unstructured data are the files, images, videos, and vast stores of other, well… unstructured content. It is true that AI (see “Analysis”) or human intervention can create metadata associated with unstructured data, say a photo that is known to show the face of a specific individual or to depict a specific product, but the photo itself remains unstructured and otherwise not particularly useful absent its associated metadata or the analysis that we’ll apply later on in our trip around the loop.

  • Streaming includes data gathered through devices, largely absent (much) human intervention. The “Smart Home” is likely the most common and relatable example here: inputs from voice and apps (see “Data Collection”) control lighting, but so too can data from a smart thermostat adjust lighting based on temperature, time, or other ambient conditions in the home. Extend this at scale across an enterprise and now we’re thinking of data gathered from drones, or data concerning the health of equipment gathered from gauges connected to that equipment, for example, assessing risk to a production line based on data collected from the equipment on the line itself. This is what we call the “Internet of Things”, or IoT (a sketch of a device sending telemetry follows this list).
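
As one concrete illustration of the streaming category, the sketch below uses the Azure IoT device SDK for Python (azure-iot-device) to send a single equipment telemetry reading to Azure IoT Hub. The connection string, device identifiers, and sensor values are stand-ins for illustration.

```python
import json

from azure.iot.device import IoTHubDeviceClient, Message

# Placeholder: a device connection string issued by your IoT Hub.
CONN_STR = "<device-connection-string>"

client = IoTHubDeviceClient.create_from_connection_string(CONN_STR)
client.connect()

# One telemetry reading from a piece of line equipment (illustrative values).
reading = {"equipmentId": "press-07", "temperatureC": 81.4, "vibrationMm": 0.9}
client.send_message(Message(json.dumps(reading)))

client.shutdown()
```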

Ingestion

At this point in the loop we are ingesting data from our structured, unstructured, and streaming sources which—remember—may include thousands of devices. It’s also important to note here that on-premises data sources are still prominent in many enterprises. These sources may be ingested as well, often requiring an intermediary data gateway to move data from on-premises to cloud (and, use case dependent, back again). This is particularly true in regulated industries or public sector situations in which data classification or data sovereignty considerations must be accounted for. Whatever the sources, data ingestion is about taking in vast amounts of data, establishing relationships with other data, making decisions on where to store it, and then actually getting it to where it needs to go. Oh, and we’re doing this billions (or more) of times. Here our model relies on:

  • Data Factory provides ETL and data integration, ingesting data from our myriad data sources and orchestrating its transformation and movement into our data storage further downstream. Think of Data Factory as the top of the neck in a kitchen funnel, the point at which the data we’ve poured into the wide mouth of the funnel comes together. Importantly in a Power Platform context, Data Factory is able to ingest data from and orchestrate the movement of data into Dataverse, making Data Factory an indispensable partner to Dataverse in scaled data platform scenarios (a sketch of triggering a pipeline follows this list).

  • Event Grid receives events occurring upstream in our loop and routes them into actions taken downstream. Understanding this requires a bit of historical background: previously, a change in data (e.g. a record of an equipment inspection discussed previously) could only cause a downstream activity to fire if the downstream activity were polling (listening) for the upstream event. This always-on polling approach consumed significant computing resources, particularly at scale. Event Grid removes the need for polling, essentially delivering the data equivalent of a push notification and taking action based on that notification. This is useful in orchestrating downstream effects from events that occur in our data sources (a sketch of publishing an event follows this list).
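
To ground the Data Factory bullet, here is a minimal sketch that kicks off an existing ingestion pipeline using the azure-mgmt-datafactory SDK. The subscription, resource group, factory, and pipeline names are placeholders, and the pipeline itself is assumed to have been authored beforehand.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder identifiers for your Azure environment.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "data-platform-rg"
FACTORY_NAME = "contoso-adf"
PIPELINE_NAME = "IngestInspectionData"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Trigger a run of the existing pipeline, passing runtime parameters.
run = client.pipelines.create_run(
    RESOURCE_GROUP,
    FACTORY_NAME,
    PIPELINE_NAME,
    parameters={"sourceSystem": "dataverse"},
)
print("Pipeline run started:", run.run_id)
```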
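
And to illustrate the push model that Event Grid replaces polling with, this sketch publishes a custom event to an Event Grid topic using the azure-eventgrid SDK; a subscriber (an Azure Function, a Logic App, a Power Automate flow) then reacts only when the event arrives. The topic endpoint, key, and event shape are assumptions for illustration.

```python
from azure.core.credentials import AzureKeyCredential
from azure.eventgrid import EventGridEvent, EventGridPublisherClient

# Placeholders: your Event Grid topic endpoint and access key.
TOPIC_ENDPOINT = "https://<topic-name>.<region>-1.eventgrid.azure.net/api/events"
TOPIC_KEY = "<topic-key>"

client = EventGridPublisherClient(TOPIC_ENDPOINT, AzureKeyCredential(TOPIC_KEY))

# Publish an "inspection completed" event; subscribers fire when it arrives,
# rather than polling the source system for changes.
event = EventGridEvent(
    subject="fleet/bus-4412/inspection",
    event_type="Contoso.Fleet.InspectionCompleted",
    data={"vehicleId": "bus-4412", "result": "pass"},
    data_version="1.0",
)
client.send(event)
```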

Storage

Data must be stored in an appropriate medium as it passes through our ingestion and integration points. Note that in the Power Platform context, as previously discussed, Dataverse is capable of integrating directly with these storage services. Think of this not as circumventing the ingestion stage, but rather as direct efficiencies that Microsoft has constructed between its service for structured application data (Dataverse) and several of its other cloud data storage capabilities. Though there are many possibilities here, we’ve identified several that work well in common scenarios:

  • Data Lake is effectively unlimited cloud-based storage for data ingested from amongst all of our many data sources. This is important because it provides a storage platform from which our analyses may occur. It may not seem significant at a small scale, where we are storing application data transacted by a narrow number of use cases, but we’re thinking really big here, where storing and analyzing everything inside of Dataverse (or any other of our potential thousands of sources) is both inefficient and expensive. The commonality of this scenario is reflected in our ability to export data from Dataverse directly to a Data Lake from within Power Apps, though Data Lake is a storage destination for an essentially unlimited number of sources.

  • Cosmos DB is a NoSQL database that fits into our ecosystem here thanks to the scale of applications under which it can sit, and its capacity for real-time integration for (example) analytical purposes or the absorption of data from IoT devices. In the context of Power Platform, it is useful to think of Cosmos DB as absorbing data generated in real time by (example) telemetry from many vehicles or heavy equipment, which would quickly exhaust the storage capacity of Dataverse whilst not really taking advantage of Dataverse’s services for application data (a sketch follows this list).

  • Blob Storage is where we store unstructured data at scale. Here we are providing a destination for those images, videos, audio files, documents, blocks of text, and really anything discussed previously as being “unstructured”. An example in a Power Platform context: when a file (again, it could be an image, video, audio, document, etc.) is attached to a record in Dataverse, the file is stored behind the scenes in Blob Storage and then referenced back when that record is accessed later. Blob Storage provides a much more efficient means of storage than keeping that file in the database itself, and this happens smartly, as a matter of course, without any intervention from the developer of the app sitting atop Dataverse (a sketch of the underlying pattern follows this list).
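
To make the Cosmos DB scenario concrete, here is a minimal sketch that upserts one telemetry document using the azure-cosmos SDK. The account URL, key, database, container, and partition key are placeholders, and the database and container are assumed to already exist.

```python
from azure.cosmos import CosmosClient

# Placeholder account details.
client = CosmosClient("https://contoso-telemetry.documents.azure.com:443/",
                      credential="<account-key>")
container = client.get_database_client("fleet").get_container_client("telemetry")

# Upsert one telemetry document; at scale, Cosmos DB absorbs millions of
# these per day without straining an application database like Dataverse.
container.upsert_item({
    "id": "bus-4412-2021-06-01T12:00:00Z",
    "vehicleId": "bus-4412",  # assumed to be the container's partition key
    "speedKph": 42.0,
    "engineTempC": 96.5,
})
```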
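
And here is the Blob Storage pattern just described, sketched by hand with the azure-storage-blob SDK: upload the unstructured file, then keep only a reference to it alongside the structured record (Dataverse does this wiring automatically; the sketch simply shows the equivalent moves). The connection string, container, and file names are placeholders.

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection string for your storage account.
service = BlobServiceClient.from_connection_string("<connection-string>")

# Upload the unstructured file (a claim photo, say) to a container.
blob = service.get_blob_client(container="claim-photos",
                               blob="claim-1138/front.jpg")
with open("front.jpg", "rb") as f:
    blob.upload_blob(f, overwrite=True)

# The structured record stores only a reference, never the bytes themselves.
print("Store this reference on the record:", blob.url)
```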

Analysis

The entire goal of the modern data platform is to access the insights and decision making made possible through “Analysis”. It is in this stage that we really achieve value: through the application of cognition to what is seen, spoken, and read in our data; through machine learning; and through analysis of streaming and customer data. It is also in this stage that we really begin to close the loop around Power Platform in our data ecosystem, as we feed data directly to Dynamics Customer Insights, which sits within the native Microsoft business applications / Power Platform sphere as part of the Dynamics 365 family of applications.

  • Cognitive Services apply artificial intelligence to derive meaning and insight from a range of visual, auditory, spoken, and other inputs. Cognitive Services absorb data from a variety of storage methods, and crucially may be integrated by software developers into end-user applications via API. Though not shown in our diagram (for simplicity’s sake), apps built in Power Apps may be extended to tap into the analysis and decision capabilities found in Cognitive Services. So, referring back to our examples from earlier, we might use Cognitive Services to extract data from a patient’s medical record or to detect anomalies in the condition of a bus or railcar. Think of Cognitive Services as the artificial intelligence sitting behind your application data (a sketch follows this list).

  • Databricks provides an Apache Spark-based analytics capability for real-time analytics, big data analytics, and machine learning across an entire range of structured and unstructured data. The idea is to provide a (very) large scale platform for data science, machine learning, and data warehousing wherein data from a vast array of sources is (example) integrated via Data Factory, stored in Data Lake, and manipulated at scale using Databricks (a sketch follows this list).

  • Stream Analytics is best thought of as a pipeline through which real-time analysis occurs around streaming data. To understand this, imagine sitting on the beach and pouring many grains of sand through your hands. You’ll feel that some are coarse, some are fine; perhaps there are shards of other things—bits of sea shells?—mixed in; on a hot day perhaps the first grains are warm to the touch but those dug from beneath the beach’s top layer grow stone cold or even moist. You draw conclusions about the depth of the sand or the presence of sea shells in your mind because you have context as to what texture and temperature mean here. Good. So now imagine vast amounts of data flowing through your hands—just like sand—as you analyze and draw insight from that data in real time. Obviously we can’t pour those many grains of sand (data) through physical hands to feel and analyze; that’s what Stream Analytics does, though, and it is particularly important in scenarios where time is of the essence and we cannot afford to wait even a few minutes to pull data out of storage and analyze it later.

  • Dynamics Customer Insights begins to complete the loop, tying our data journey back to Power Platform on which it began. That’s because Customer Insights is the data analytics engine built into the Dynamics family of applications, and therefore the Power Platform upon which Dynamics sits. It is, true to its name, very customer-centric around areas of sales and service, though the capability may be deployed for myriad business use cases. Data is integrated from a variety of sources (including directly from Dataverse, as shown in the diagram) in order to create a real-time, all-round view of the “customer” (however we define who that is). For example, we might feed HR data into Customer Insights as we seek to identify employees or clusters of employees at risk of leaving the organization, or we might integrate customer data with financial services product and rate change data in order to identify churn risk or opportunities to cross-sell, or to evaluate the efficacy of product changes on the customer experience.
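
As one hedged illustration of Cognitive Services at work, the sketch below runs named-entity recognition over a free-text maintenance note using the azure-ai-textanalytics SDK; the endpoint, key, and the note itself are stand-ins.

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Placeholders: endpoint and key for a Cognitive Services language resource.
client = TextAnalyticsClient(
    endpoint="https://contoso-language.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<api-key>"),
)

# A free-text note, as a technician might enter it in a Power App.
notes = ["Brake pads on bus 4412 worn below 3mm; replaced at the Harlem depot."]

for doc in client.recognize_entities(notes):
    if not doc.is_error:
        for entity in doc.entities:
            # Each entity carries a category (Quantity, Location, ...) and a score.
            print(f"{entity.text}: {entity.category} ({entity.confidence_score:.2f})")
```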
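
And for the Databricks bullet, here is a minimal PySpark sketch of the pattern described: read telemetry that Data Factory has landed in the Data Lake and aggregate it at scale. The storage path and column names are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In a Databricks notebook a SparkSession already exists as `spark`;
# this line simply makes the sketch self-contained elsewhere.
spark = SparkSession.builder.getOrCreate()

# Placeholder path to telemetry landed in the Data Lake by Data Factory.
telemetry = spark.read.parquet(
    "abfss://telemetry@contosolake.dfs.core.windows.net/fleet/")

# Flag vehicles whose average engine temperature suggests elevated risk.
risk = (
    telemetry.groupBy("vehicleId")
    .agg(F.avg("engineTempC").alias("avgTempC"))
    .where(F.col("avgTempC") > 95)
)
risk.show()
```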

Visualization

The final stage in our data platform loop, “Visualization”, is where we surface insights and make our analysis usable to humans in an interactive way. Our primary tool here is Power BI, one of Power Platform’s principal four native components (alongside Power Apps, Power Automate, and Power Virtual Agents discussed previously). In Power BI we create richly interactive charts, dashboards, and reports drawing on our entire ecosystem of data (including Dataverse, as shown in the model). We also integrate, transform, and manipulate data here, though our ability to do this at scale is drastically enhanced by the cloud-based ingestion, storage, and analysis capabilities discussed previously.
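
Most Power BI reports query stored data directly, but for the real-time scenarios described throughout this piece the service also exposes a REST API for push datasets. The sketch below pushes one row into such a dataset; the access token, dataset ID, and table name are placeholders, and the push dataset is assumed to have been created beforehand.

```python
import requests

# Placeholders: an Azure AD token with Power BI scope, plus an existing
# push dataset and table in the Power BI service.
ACCESS_TOKEN = "<access-token>"
DATASET_ID = "<dataset-id>"
TABLE_NAME = "FleetStatus"

url = (f"https://api.powerbi.com/v1.0/myorg/datasets/"
       f"{DATASET_ID}/tables/{TABLE_NAME}/rows")
rows = {"rows": [{"vehicleId": "bus-4412", "status": "out of service"}]}

resp = requests.post(url,
                     headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
                     json=rows)
resp.raise_for_status()

# Tiles and visuals bound to this dataset update in near real time.
print("Row pushed:", resp.status_code)
```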

Closing the Loop

Finally, because Power BI is integrated with the rest of Power Platform (and increasingly so, as veterans of the technology who have been through this evolution can attest), its components are embeddable inside of Power Apps—and vice versa—so that decision data and transactions within end-user applications may be linked. In other words, the customer service agent takes action on a customer inside of an app based on data displayed to that agent in real time, right alongside in the same end-user experience. Power Automate plays a role here, too, through which data in Power BI triggers automation to fire back in the “Data Collection” area of our model. Thus the insights surfaced to the user through Power BI—be they customer insights in the call center, vehicle telemetry for the adjuster in the field, patient medical insights in the clinic, fleet “out of service” data on the buses or rail, predictive analytics around employee churn, or qualification data for soldiers leaping from airplanes—drive end-user actions within apps, place contextual information front and center, and continually improve users’ ability to take action at the point of data collection.

This model has now been added to the Power Platform Adoption Framework as part of the framework’s Data Ecosystem dimension of enterprise management and governance.
