Category Archives: Azure
Implementing Change Data Capture (CDC) in a Unity Catalog-Based Lakehouse Architecture
As organizations scale, full data reload pipelines quickly become inefficient and risky. Reporting refresh windows grow longer, source systems experience increased load, and data duplication issues begin to surface. In our recent Unity Catalog-based Lakehouse implementation, we modernized incremental data processing using a structured Change Data Capture (CDC) strategy. Instead of reloading entire datasets daily, we captured only incremental changes across CRM, ERP, HR, and finance systems and governed them through Unity Catalog. This blog explains how we designed and implemented CDC in a production-ready Lakehouse architecture, the decisions behind our approach, and the technical patterns that made it scalable. One of the first challenges in CDC implementations is avoiding hardcoded logic for every entity. Centralized Incremental Control Using Metadata Configuration Instead of embedding incremental rules inside notebooks, we designed a centralized configuration table that drives CDC dynamically. Each record in this control table defines: This allowed us to manage incremental extraction logic centrally without modifying pipeline code for every new table. Fig – Azure Storage Table showing IncrementalField and Timestamp columns Why This Matters This configuration-driven design enabled: Most CDC blogs discuss theory. Few show how incremental control is actually governed in production. Bronze Layer: Append-Only Incremental Capture Once incremental records are identified, they land in the bronze layer in Delta format. Key design decisions: The Bronze layer acts as the immutable change log of the system. This ensures: Bronze is not for reporting. It is for reliability. Structuring CDC Layers with Unity Catalog To ensure proper governance and separation of concerns, we structured our Lakehouse using Unity Catalog with domain-based schemas. Each environment (dev, test, prod) had its own catalog. Within each catalog: (Unity Catalog Bronze schema view) Why Unity Catalog Was Critical Unity Catalog ensured: CDC without governance can become fragile. Unity Catalog added structure and security to the incremental architecture. Silver Layer: Applying CDC with Delta MERGE The Silver layer is where CDC logic is applied. We implemented Type 1 Change Data Capture using Delta Lake MERGE operations. The logic follows: If a job runs twice, the data remains consistent. We intentionally chose Type 1 because reporting required the latest operational state rather than historical tracking. Handling Late-Arriving Data One common CDC failure point is late-arriving records. If extraction logic strictly uses: modified_timestamp > last_run_timeSome records may be missed due to clock drift or processing delays. To mitigate this, we: This ensured no silent data loss. Governance and Power BI Integration A key architectural decision was limiting Power BI access strictly to Gold tables. Through Unity Catalog: This ensured reporting teams could not accidentally query raw incremental data. The result was a clean, governed reporting layer powered by curated Delta tables. Performance Optimization Considerations To maintain optimal performance: Compared to full data reloads, incremental CDC significantly reduced cluster runtime and improved refresh stability. Common CDC Mistakes We Avoided During implementation, we intentionally avoided: These mistakes often appear only after production failures. Designing CDC carefully from the start prevented costly refactoring later. Business Impact By implementing CDC within a Unity Catalog-governed Lakehouse: The architecture is now scalable and future ready. To encapsulates, change data capture is not just an incremental filter. It is a disciplined architectural pattern. When combined with: It becomes a powerful foundation for enterprise analytics. Organizations modernizing their reporting platforms must move beyond full reload pipelines and adopt structured CDC approaches that prioritize scalability, reliability, and governance. If you found this blog useful and would like to discuss, ,Get in touch with CloudFronts at transform@cloudfronts.com.
Share Story :
How to Build an Incremental Data Pipeline with Azure Logic Apps
Why Incremental Loads Matter When integrating data from external systems, whether it’s a CRM, an ERP like Business Central, or an HR platform like Zoho People, pulling all data every time is expensive, slow, and unnecessary. The smarter approach is to track what has changed since the last successful run and fetch only that delta. This is the core idea behind an incremental data pipeline: identify a timestamp or sequence field in your source system, persist the last-known watermark, and use it as a filter on your next API call. Azure Logic Apps, paired with Azure Table Storage as a lightweight checkpoint store, gives you everything you need to implement this pattern without managing any infrastructure. Architecture Overview Instead of one large workflow doing everything, we separate responsibilities. One Logic App handles scheduling and orchestration. Another handles actual data extraction. Core components: 3. Metadata Design (Azure Table) Instead of hardcoding entity names and fields inside Logic Apps, we define them in Azure Table Storage. Example structure: PartitionKey RowKey IncrementalField displayName entity businesscentral 1 systemCreatedAt Vendor Ledger Entry vendorLedgerEntries zohopeople 1 modifiedtime Leave leave Briefly, this table answers three questions: – What entity should be extracted?– Which column defines incremental logic?– What was the last successful checkpoint? When you want to onboard a new entity, you add a row. No redesign needed. 4. Logic App 1 – Scheduler Trigger: Recurrence (for example, every 15 minutes) Steps: This Logic App should not call APIs directly. Its only job is orchestration. Keep it light. 5. Logic App 2 – Incremental Processor Trigger: HTTP (called from Logic App 1) Functional steps: Example: This is where the real work happens. 6. Checkpoint Strategy Each entity must maintain: – LastSuccessfulRunTime– Status– LastRecordTimestamp After successful extraction: Checkpoint = max(modifiedOn) from extracted data. This ensures: Checkpoint management is the backbone of incremental loading. If this fails, everything fails. This pattern gives you a production-grade incremental data pipeline entirely within Azure’s managed services. By centralizing entity configuration and watermarks in Azure Table Storage, you create a data-driven pipeline where adding a new integration is as simple as inserting a row — no code deployment required. The two-Logic-App architecture cleanly separates orchestration from execution, enables parallel processing, and ensures your pipeline is resilient to failures through checkpoint-based watermark management. Whether you’re pulling from Business Central, Zoho People, or any REST API that exposes a timestamp field, this architecture scales gracefully with your data needs. Explore the case study below to learn how Logic Apps were implemented to solve key business challenges: Ready to deploy AIS to seamlessly connect systems and improve operational cost and efficiency? Get in touch with CloudFronts at transform@cloudfronts.com.
Share Story :
Simplifying Data Pipelines with Delta Live Tables in Azure Databricks
From a customer perspective, the hardest part of data engineering isn’t building pipelines-it’s ensuring that the data customers rely on is accurate, consistent, and trustworthy. When reports show incorrect revenue or missing customer information, confidence drops quickly. This is where Delta Live Tables in Databricks makes a real difference for customers. Instead of customers dealing with broken dashboards, manual fixes in BI tools, or delayed insights, Delta Live Tables enforces data quality at the pipeline level. Using a Bronze–Silver–Gold approach: Data validation rules are built directly into the pipeline, and customers gain visibility into data quality through built-in monitoring-without extra tools or manual checks. Quick Preview Building data pipelines is not the difficult part. The real challenge is building pipelines that are reliable, monitored, and enforce data quality automatically. That’s where Delta Live Tables in Databricks makes a difference. Instead of stitching together notebooks, writing custom validation scripts, and setting up separate monitoring jobs, Delta Live Tables lets you define your transformations once and handles the rest. Let’s look at a simple example. Imagine an e-commerce company storing raw order data in a Unity Catalog table called: cf.staging.orders_raw The problem? The data isn’t perfect. Some records have negative quantities. Some orders have zero amounts. Customer IDs may be missing. There might even be duplicate order IDs. If this raw data goes straight into reporting dashboards, revenue numbers will be wrong. And once business users lose trust in reports, it’s hard to win it back. Instead of fixing issues later in Power BI or during analysis, we fix them at the pipeline level. In Databricks, we create an ETL pipeline and define a simple three-layer structure: Bronze for raw data, Silver for cleaned data, and Gold for business-ready aggregation. The Bronze layer simply reads from Unity Catalog: Nothing complex here. We’re just loading data from Unity Catalog. No manual dependency setup required. The real value appears in the Silver layer, where we enforce data quality rules directly inside the pipeline: Here’s what’s happening behind the scenes. Invalid rows are automatically removed. Duplicate orders are eliminated. Data quality metrics are tracked and visible in the pipeline UI. There’s no need for separate validation jobs or manual checks. This is what simplifies pipeline development. You define expectations declaratively, and Delta Live Tables enforces them consistently. Finally, in the Gold layer, we create a clean reporting table: At this point, only validated and trusted data reaches reporting systems. Dashboards become reliable. Delta Live Tables doesn’t replace databases, and it doesn’t magically fix bad source systems. What it does is simplify how we build and manage reliable data pipelines. It combines transformation logic, validation rules, orchestration, monitoring, and lineage into one managed framework. Instead of reacting to data issues after reports break, we prevent them from progressing in the first place. For customers, trust in data is everything. Delta Live Tables helps organizations ensure that only validated, reliable data reaches customer-facing dashboards and analytics. Rather than reacting after customers notice incorrect numbers, Delta Live Tables prevents poor-quality data from moving forward. By unifying transformation logic, data quality enforcement, orchestration, monitoring, and lineage in one framework, it enables teams to deliver consistent, dependable insights. The result for customers is simple: accurate reports, faster decisions, and confidence that the data they see reflects reality. I Hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudFronts.com.
Share Story :
Databricks Notebooks Explained – Your First Steps in Data Engineering
If you’re new to Databricks, chances are someone told you “Everything starts with a Notebook.” They weren’t wrong. In Databricks, a Notebook is where your entire data engineering workflow begins from reading raw data, transforming it, visualizing trends, and even deploying jobs. It’s your coding lab, dashboard, and documentation space all in one. What Is a Databricks Notebook? A Databricks Notebook is an interactive environment that supports multiple programming languages such as Python, SQL, R, and Scala. Each Notebook is divided into cells you can write code, add text (Markdown), and visualize data directly within it. Unlike local scripts, Notebooks in Databricks run on distributed Spark clusters. That means even your 100 GB dataset is processed within seconds using parallel computation. So, Notebooks are more than just code editors they are collaborative data workspaces for building, testing, and documenting pipelines. How Databricks Notebooks Work Under the hood, every Notebook connects to a cluster a group of virtual machines managed by Databricks. When you run code in a cell, it’s sent to Spark running on the cluster, processed there, and results are sent back to your Notebook. This gives you the scalability of big data without worrying about servers or configurations. Setting Up Your First Cluster Before running a Notebook, you must create a cluster it’s like starting the engine of your car. Here’s how: Step-by-Step: Creating a Cluster in a Standard Databricks Workspace Once the cluster is active, you’ll see a green light next to its name now it’s ready to process your code. Creating Your First Notebook Now, let’s build your first Databricks Notebook: Your Notebook is now live ready to connect to data and start executing. Loading and Exploring Data Let’s say you have a sales dataset in Azure Blob Storage or Data Lake. You can easily read it into Databricks using Spark: df = spark.read.csv(“/mnt/data/sales_data.csv”, header=True, inferSchema=True)display(df.limit(5)) Databricks automatically recognizes your file’s schema and displays a tabular preview.Now, you can transform the data: from pyspark.sql.functions import col, sumsummary = df.groupBy(“Region”).agg(sum(“Revenue”).alias(“Total_Revenue”))display(summary) Or, switch to SQL instantly: %sqlSELECT Region, SUM(Revenue) AS Total_RevenueFROM sales_dataGROUP BY RegionORDER BY Total_Revenue DESC Visualizing DataDatabricks Notebooks include built-in charting tools.After running your SQL query:Click + → Visualization → choose Bar Chart.Assign Region to the X-axis and Total_Revenue to the Y-axis.Congratulations — you’ve just built your first mini-dashboard! Real-World Example: ETL Pipeline in a Notebook In many projects, Databricks Notebooks are used to build ETL pipelines: Each stage is often written in a separate cell, making debugging and testing easier.Once tested, you can schedule the Notebook as a Job running daily, weekly, or on demand. Best Practices To conclude, Databricks Notebooks are not just a beginner’s playground they’re the backbone of real data engineering in the cloud.They combine flexibility, scalability, and collaboration into a single workspace where ideas turn into production pipelines. If you’re starting your data journey, learning Notebooks is the best first step.They help you understand data movement, Spark transformations, and the Databricks workflow everything a data engineer need. We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudFronts.com
Share Story :
Advanced Time Travel & Data Recovery Strategies in Delta Lake
In production Databricks environments, data issues such as accidental overwrites, faulty MERGE conditions, or incorrect backfills are common. Delta Lake’s Time Travel is not just a feature – it is a critical recovery and governance mechanism. This blog focuses only on practical recovery strategies that are actually used in real-world production systems. Why Time Travel Is Critical in Production Common failure scenarios include: •a. INSERT OVERWRITE wiping historical data • b. Incorrect MERGE conditions deleting valid records • c. Wrong filters during backfill corrupting data Reprocessing data is expensive and risky. Time Travel enables instant rollback with minimal impact. Version vs Timestamp (What You Should Use) Always prefer version-based time travel for recovery operations. Why version-based recovery is preferred: • a. Precise and deterministic • b. No time zone dependency • c. Safest option for production recovery Use timestamp-based queries only for auditing, not recovery. Identify the Last Safe State Before performing any recovery, always inspect the table history. DESCRIBE HISTORY crm_opportunities; Key fields to review: • a. version • b. timestamp • c. operation • d. userName This history acts as the single source of truth during incidents. Recovery Patterns That Actually Work 1. Partial Data Recovery (Recommended) Recover only the affected records instead of rolling back the entire table. Advantages: • a. No downtime • b. Safe for downstream reports • c. Most production-friendly approach 2. Full Table Restore (Use Carefully) Advantages: •a. Fast and atomic Risks: •a. Impacts all downstream consumers Use this approach only when the entire table is corrupted. Safe Validation Using CLONE Before restoring data in production, validate changes using a clone. Typical use cases: • a. Validate recovered data • b. Compare versions •c. Run business checks Retention & VACUUM (Most Common Mistake) The following command causes permanent data loss: Once vacuumed aggressively, time travel breaks and rollback becomes impossible. Production-Safe Retention Recommended retention: • a. Critical tables: 30 days • b. Reporting tables: 7–14 days Auditing & Root Cause Analysis (RCA) Track who changed data and when: Compare changes between versions: Key Best Practices • a. Capture table version before running risky jobs • b. Always use version-based time travel for recovery • c. Prefer partial recovery over full restores • d. Avoid aggressive VACUUM operations • e. Extend retention for critical tables • f. Validate using CLONE before restoring To conclude, Delta Lake Time Travel is not a backup mechanism, but it is the fastest and safest recovery tool in Databricks. When used correctly, it prevents downtime, reduces reprocessing cost, and improves production reliability. For enterprise Databricks pipelines, mastering this capability is mandatory, not optional. We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudfronts.com
Share Story :
What Are Databricks Clusters? A Simple Guide for Beginners
A Databricks Cluster is a group of virtual machines (VMs) in the cloud that work together to process data using Apache Spark.It provides the memory, CPU, and compute power required to run your code efficiently. Clusters are used for: Each cluster has two main parts: Types of Clusters Databricks supports multiple cluster types, depending on how you want to work. Cluster Type Use Case Interactive (All-Purpose) Clusters Used for notebooks, ad-hoc queries, and development. Multiple users can attach their notebooks. Job Clusters Created automatically for scheduled jobs or production pipelines. Deleted after job completion. Single Node Clusters Used for small data exploration or lightweight development. No executors, only one driver node. How Databricks Clusters WorkWhen you execute a notebook cell, Databricks sends your code to the cluster.The cluster’s driver node divides your task into smaller jobs and distributes them to the executors.The executors process the data in parallel and send the results back to the driver.This distributed processing is what makes Databricks fast and scalable for handling massive datasets. Step-by-Step: Creating Your First Cluster Let’s create a cluster in your Databricks workspace. Step 1: Navigate to Compute In the Databricks sidebar, click Compute. You’ll see a list of existing clusters or an option to create a new one. Step 2: Create a New Cluster Click Create Compute in the top-right corner. Step 3: Configure Basic Settings Step 4: Select Node Type Choose the VM type based on your workload. For development, Standard_DS3_v2 or Standard_D4ds_v5 are cost-effective. Step 5: Auto-Termination Set the cluster to terminate after 10 or 20 minutes of inactivity. This prevents unnecessary cost when the cluster is idle. Step 6: Review and Create Click Create Compute. After a few minutes, your cluster will turn green, indicating it is ready to run code. Clusters in Unity Catalog-Enabled Workspaces If Unity Catalog is enabled in your workspace, there are a few additional configurations to note. Feature Standard Workspace Unity Catalog Workspace Access Mode Default is Single User. Must choose Shared, Single User, or No Isolation Shared. Data Access Managed by workspace permissions. Controlled through Catalog, Schema, and Table permissions. Data Hierarchy Database → Table Catalog → Schema → Table Example Query SELECT * FROM sales.customers; SELECT * FROM main.sales.customers; When you create a cluster with Unity Catalog, you will see a new Access Mode field in the configuration page. Choose “Shared” if multiple users need to access governed data under Unity Catalog. Managing Cluster Performance and CostClusters can become expensive if not managed properly. Follow these tips to optimize performance and cost: a. Use Auto-Termination to shut down idle clusters automatically.b. Choose the right VM size for your workload. Avoid oversizing.c. Use Job Clusters for production pipelines since they start and stop automatically.d. Leverage Autoscaling so Databricks can adjust the number of workers dynamically.e. Monitor with Ganglia metrics to identify performance bottlenecks. Common Cluster Issues and Fixes Issue Cause Fix Cluster stuck starting VM quota exceeded or region issue Change VM size or region. Slow performance Too few workers or data skew Increase worker count or repartition data. Access denied to data Missing storage credentials Use Databricks Secrets or Unity Catalog permissions. High cost Idle clusters running Enable auto-termination. Best Practices for Using Databricks Clusters1. Always attach your notebook to the correct cluster before running it.2. Use development, staging, and production clusters separately.3. Keep the cluster runtime version consistent across environments.4. Terminate unused clusters to reduce cost.5. If you use Unity Catalog, prefer Shared clusters for collaboration. To conclude, clusters are the heart of Databricks.They provide the compute power needed to process large-scale data efficiently. Without them, Databricks Notebooks and Jobs cannot run. Once you understand how clusters work, you will find it easier to manage costs, optimize performance, and build reliable data pipelines. We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudfronts.com
Share Story :
Time Travel in Databricks: A Complete, Simple & Practical Guide
Databricks Time Travel is a powerful feature of Delta Lake that allows you to access older versions of your data. Whether you want to debug issues, recover deleted records, compare historical performance, or audit how data changed over time—Time Travel makes it effortless. It’s like having a complete rewind button for your tables, eliminating the fear of accidental updates or deletes. What is Time Travel? Time Travel enables you to query previous snapshots of a Delta table using either VERSION AS OF or TIMESTAMP AS OF. Delta automatically versions every transaction-UPDATE, MERGE, DELETE, INSERT. So, you can always go back to an earlier state without restoring backups manually. This versioning is stored in the Delta Log, making rewind operations efficient and reliable. Why Time Travel Matters (Use Cases) Debugging Pipelines: Quickly check what the data looked like before a bad job ran. Accidental Deletes: Recover records or entire tables. Audit & Compliance: Easily demonstrate how data has evolved. Root Cause Analysis: Compare two versions side by side. Model Re-training: Use historical datasets to retrain ML models. Data Quality Tracking: Validate when incorrect data first appeared. How Delta Stores Versions (Architecture Overview) Delta Lake stores metadata and version history inside the _delta_log folder. Each commit creates a new JSON or checkpoint Parquet file representing table state. When you run a query using Time Travel, Databricks does not rebuild the entire table. Instead, it directly reads the snapshot based on the transaction log. This architecture makes Time Travel extremely fast and scalable—even on very large datasets. Time Travel Commands Query older data: SELECT * FROM table VERSION AS OF 5; SELECT * FROM table TIMESTAMP AS OF ‘2024-11-20T10:00:00’; A. Example: DESCRIBE HISTORY Below is an example of using DESCRIBE HISTORY on a Delta table. B. Querying a Specific Version Here is how you can fetch an older snapshot using VERSION AS OF. C. Restoring a Table You can restore a Delta table to any older version using RESTORE TABLE. Retention Rules Delta keeps older versions based on two configs: `delta.logRetentionDuration` → How long commit logs are stored. `delta.deletedFileRetentionDuration`→ How long old data files are retained. By default, Databricks keeps 30 days of history. You can increase this if your compliance policy requires longer retention. Best Practices – Use Time Travel for debugging pipeline issues. – Increase retention for sensitive or audited datasets. – Use `DESCRIBE HISTORY` frequently during development. – Avoid unnecessarily large retention windows—they increase storage costs. – Use `RESTORE` carefully in production environments. To conclude, time Travel in Databricks brings reliability, auditability, and simplicity to modern data engineering. It protects teams from accidental data loss and gives full visibility into how datasets evolve. With just a few commands, you can analyze, compare, or restore historical data instantly making it one of the most useful features of Delta Lake. We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudfronts.com
Share Story :
Real-Time Integration with Dynamics 365 Finance & Operations Using Azure Event Hub & Logic Apps (F&O as Source System)
Most organizations think of Dynamics 365 Finance & Operations (D365 F&O) only as a system that receives data from other applications. In reality, the most powerful and scalable architecture is when F&O itself becomes the source of truth and an event producer. Every financial transaction, inventory update, order confirmation, or invoice posting is a critical business event – and when these events are not shared with other systems in real time, businesses face: So, the real question is: What if every critical event in D365 F&O could instantly trigger actions in other systems? The answer lies in an event-driven architecture using Azure Event Hub and Azure Logic Apps, where F&O becomes the producer of events and the rest of the enterprise becomes real-time listeners. Core Content Event-Driven Model with F&O as Source In this model, whenever a business event occurs inside Dynamics 365 F&O, an event is immediately published to Azure Event Hub. That event is then picked up by Azure Logic Apps and forwarded to downstream systems such as: In simple terms: Event occurs in F&O → Event is pushed to Event Hub → Logic App processes → External system is updated This enables true real-time integration across your entire IT ecosystem. Why Use Azure Event Hub Between F&O and Other Systems? Azure Event Hub is designed for high-throughput, real-time event ingestion. This makes it the perfect choice for capturing business transactions from F&O. Azure Event Hub provides: This ensures that every change in F&O is captured and made available in real time to any subscribed system. Technical Architecture Here is the architecture with F&O as the source: Role of each layer: Component Responsibility D365 F&O Generates business events Event Hub Ingests & streams events Logic App Consumes + transforms events External Systems Act on the event This architecture is:✔ Decoupled✔ Scalable✔ Secure✔ Real-time✔ Fault tolerant How Does D365 F&O Send Events to Event Hub? Using Business Events F&O has built-in Business Events Framework which can be configured to trigger events such as: These business events can be configured to push data to an Azure Event Hub endpoint. This is the cleanest, lowest-code, and recommended approach. Logic App as Event Consumer (Real-Time Processing) Azure Logic App is connected to Event Hub via Event Hub Trigger: Once triggered, the Logic App performs: Example downstream actions: F&O Event Logic App Action Invoice Posted Push to Power BI + Send email Sales Order Create record in CRM Inventory Change Update eCommerce stock Vendor Created Sync with procurement system This allows one F&O event to trigger multiple automated actions across platforms in real time. Real-Time Example: Invoice Posted in F&O Step-by-step flow: All of this happens automatically, within seconds. This is true enterprise-wide automation. Key Technical Benefits Why this Architecture is important for Technical Leaders If you are a CTO, architect, or technical lead, this approach helps you: Instead of systems “asking” for data, they react to real-time business events. To conclude, by making Dynamics 365 Finance & Operations the event source and combining it with Azure Event Hub and Azure Logic Apps, organizations can create a fully automated, real-time, intelligence-driven ecosystem. Your first step: ➡ Identify a critical business event in F&O➡ Publish it to Azure Event Hub➡ Use Logic App to trigger automatic actions This single change can transform your integration strategy from reactive to proactive. We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudfronts.com
Share Story :
From Legacy Middleware Debt to AI Innovation: Rebuilding the Digital Backbone of a 150-Year-Old Manufacturer
Summary A global manufacturing client was facing rising middleware costs, poor visibility, and growing pressure to support analytics and AI initiatives. A forced three-year middleware commitment became the trigger to rethink their integration strategy. This article shares how the client moved away from legacy middleware, reduced integration costs by nearly 95%, improved operational visibility, and built a strong data foundation for the future. Table of Contents 1. The Middleware Cost Problem 2. Building a New Integration Setup 3. Making Integrations Visible 4. Preparing Data for AI 5. How We Did It Savings Metrics The Middleware Cost Problem The client was running critical integrations on a legacy middleware platform that had gradually become a financial and operational burden. Licensing costs increased sharply, with annual fees rising from $20,000 to $50,000 and a mandatory three-year commitment pushing the total to $160,000. Despite the cost, visibility remained limited. Integrations behaved like black boxes, failures were difficult to trace, and teams relied on manual intervention to diagnose and fix issues. At the same time, the business was pushing toward better reporting, analytics, and AI-driven insights. These initiatives required clean and reliable data flows that the existing middleware could not provide efficiently. Building a New Integration Setup Legacy middleware and Scribe-based integrations were replaced with Azure Logic Apps and Azure Functions. The new setup was designed to support global operations across multiple legal entities. Separate DataAreaIDs were maintained for regions including TOUS, TOUK, TOIN, and TOCN. Branching logic handled country-specific account number mappings such as cf_accountnumberus and cf_accountnumberuk. An agentless architecture was adopted using Azure Blob Storage with Logic Apps. This removed firewall and SQL connectivity challenges and eliminated reliance on unsupported personal-mode gateways. Making Integrations Visible The previous setup offered no centralized monitoring, making it difficult to detect failures early. A Power BI dashboard built on Azure Log Analytics provided a clear view of integration health and execution status. Automated alerts were configured to notify teams within one hour of failures, allowing issues to be addressed before impacting critical business processes. Preparing Data for AI With stable integrations in place, the focus shifted from cost savings to long-term readiness. Clean data flows became the foundation for platforms such as Databricks and governance layers like Unity Catalog. The architecture supports conversational AI use cases, enabling questions like “Is raw material available for this production order?” to be answered from a unified data foundation. As a first step, 32 reports were consolidated into a single catalog to validate data quality and integration reliability. How We Did It Retrieve config.json and checkpoint.txt from Azure Blob Storage for configuration and state control. Run incremental HTTP GET queries using ModifiedDateTime1 gt [CheckpointTimestamp]. Check for existing records using OData queries in target systems with keys such as ScribeCRMKey. Transform data using Azure Functions with region-specific Liquid templates. Write data securely using PATCH or POST operations with OAuth 2.0 authentication. Update checkpoint timestamps in Azure Blob Storage after successful execution. Log step-level success or failure using a centralized Logging Logic App (TO-UAT-Logs). Savings Metrics 95% reduction in annual integration costs, from $50,000 to approximately $2,555. Approximately $140,000 in annual savings. Integrations across D365 Field Service, D365 Sales, D365 Finance & Operations, Shopify, and SQL Server. Designed to support modernization of more than 600 fragmented reports. FAQs Q: How does this impact Shopify integrations? A: Azure Integration Services acts as the middle layer, enabling Shopify orders to synchronize into Finance & Operations and CRM systems in real time. Q: Is the system secure for global entities? A: Yes. The solution uses Azure AD OAuth 2.0 and centralized key management for all API calls. Q: Can it handle attachments? A: Dedicated Logic Apps were designed to synchronize CRM annotations and attachments to SQL servers located behind firewalls using an agentless architecture. We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudfronts.com.
Share Story :
Bridging Project Execution and Finance: How PO F&O Connector Unlocks Full Value in Dynamics 365
In a world where timing, accuracy, and coordination make or break profitability, modern project-based enterprises demand more than isolated systems. You may be leveraging Dynamics 365 Project Operations (ProjOps) to manage projects, timesheets, and resource planning and Dynamics 365 Finance & Operations (F&O) for financials, billing, and accounting. But without seamless integration, you’re stuck with manual transfers, data silos, and delayed insights. That’s where PO F&O Connector app comes in built to synchronize Project Operations and F&O end-to-end, bringing together delivery and finance in perfect alignment. In this article, we’ll explore how it works, why it matters to CEOs, CFOs, and CTOs, and how adopting it gives you a competitive edge. The Pain Point: Disconnected Project & Finance Workflows When your project execution and financial systems aren’t talking: The result? Missed revenue, resource inefficiencies, and poor visibility into project financial health. The Solution: Cloudfronts Project-to-Finance Integration App Cloudfronts new app is purpose-built to connect Project Operations → Finance & Operations seamlessly, automating the flow of project data into financial systems and enabling real-time, consistent delivery-to-finance synchronization. Key capabilities include: Role Core Benefits Outcomes CEO Visibility into project margins and outcomes; faster time to value Better strategic decisions, competitive agility CFO Automates billing, enforces accounting rules, ensures audit compliance Revenue gets recognized faster, finance becomes a strategic enabler CTO Reduces custom integration burdens, ensures system integrity Lower maintenance costs, scalable architecture Beyond roles, your entire organization benefits through: We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudfronts.com
