Category Archives: Databricks
How Unity Catalog Improves Data Governance for Power BI and Databricks Projects
As organizations scale their analytics platforms, governance often becomes the hardest problem to solve. Data may be accurate, pipelines may run on time, and reports may look correct, but without proper governance the platform becomes fragile. We see this pattern frequently in environments where Power BI reporting has grown around a mix of SQL Server databases, direct Dataverse connections, shared storage accounts, and manually managed permissions. Over time, access control becomes inconsistent, ownership is unclear, and even small changes introduce risk. Unity Catalog addresses this problem by introducing a centralized, consistent governance layer across Databricks and downstream analytics tools like Power BI.

The Governance Problem Most Teams Face

In many data platforms, governance evolves as an afterthought. Access is granted at different layers depending on urgency rather than design. Common symptoms include inconsistent access grants, unclear ownership, and permissions that are difficult to audit. As reporting expands across departments like Finance, HR, PMO, and Operations, this fragmented governance model becomes difficult to control and audit.

Why Unity Catalog Changes the Governance Model

Unity Catalog introduces a unified governance layer that sits above storage and compute. Instead of managing permissions at the file or database level, governance is applied directly to data assets in a structured way. At its core, Unity Catalog provides a single place to define data objects, ownership, and access rules across the platform. This shifts governance from an operational task to an architectural capability.

A Structured Data Hierarchy That Scales

Unity Catalog organizes data into a simple, predictable hierarchy: Catalog → Schema → Table. This structure brings clarity to large analytics environments. Business domains can be separated cleanly, such as CRM, Finance, HR, or Projects, while still being governed centrally. For Power BI teams, this means datasets are easier to discover, understand, and trust. There is no ambiguity about where data lives or who owns it.

Centralized Access Control Without Storage Exposure

One of the biggest advantages of Unity Catalog is that access is granted at the data object level, not the storage level. Instead of giving Power BI users or service principals direct access to storage accounts, permissions are granted on catalogs, schemas, or tables. This significantly reduces security risk and simplifies access management. From a governance perspective, this enables centralized permission management and auditing without exposing storage paths: Power BI connects only to governed datasets, not raw storage paths. A minimal example of such grants is sketched later in this post.

Cleaner Integration with Power BI

When Power BI connects to Delta tables governed by Unity Catalog, the reporting layer becomes simpler and more secure. This model works especially well when combined with curated Gold-layer tables designed specifically for reporting.

Governance at Scale, Not Just Control

Unity Catalog is not only about restricting access. It is about enabling teams to scale responsibly. By defining ownership, standardizing naming, and centralizing permissions, teams can onboard new data sources and reports without reworking governance rules each time. This is particularly valuable in environments where multiple teams build and consume analytics simultaneously.

Why This Matters for Decision Makers

For leaders responsible for data, analytics, or security, Unity Catalog offers a way to balance speed and control. It allows teams to move quickly without sacrificing governance. Reporting platforms become easier to manage, easier to audit, and easier to extend as the organization grows.
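For illustration, object-level permissions in Unity Catalog can be expressed as a handful of SQL grants run from a notebook. This is a minimal sketch, not a prescribed setup; the catalog, schema, table, and group names below are hypothetical.

spark.sql("GRANT USE CATALOG ON CATALOG finance TO `bi_readers`")                 # let the group see the catalog
spark.sql("GRANT USE SCHEMA ON SCHEMA finance.gold TO `bi_readers`")              # open the curated reporting schema
spark.sql("GRANT SELECT ON TABLE finance.gold.revenue_summary TO `bi_readers`")   # read-only access to one Gold table

Because the grant sits on the table rather than on a storage account, Power BI never needs a storage key or path, only the governed dataset.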
More importantly, it reduces long-term operational risk by replacing ad-hoc permission models with a consistent governance framework.

To conclude, strong governance is not about slowing teams down. It is about creating a structure that allows analytics platforms to grow safely and sustainably. Unity Catalog provides that structure for Databricks and Power BI environments. By centralizing access control, standardizing data organization, and removing the need for direct storage exposure, it enables a cleaner, more secure analytics foundation. For organizations modernizing their reporting platforms or planning large-scale analytics initiatives, Unity Catalog is not optional. It is foundational.

If your Power BI and Databricks environment is becoming difficult to govern as it scales, it may be time to rethink how access, ownership, and data structure are managed. We have implemented Unity Catalog–based governance in real enterprise environments and have seen the impact it can make. If you are exploring similar initiatives or evaluating how to strengthen governance across your analytics platform, we are always open to sharing insights from real-world implementations.

We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudFronts.com
Databricks Notebooks Explained – Your First Steps in Data Engineering
If you’re new to Databricks, chances are someone told you, “Everything starts with a Notebook.” They weren’t wrong. In Databricks, a Notebook is where your entire data engineering workflow begins: reading raw data, transforming it, visualizing trends, and even deploying jobs. It’s your coding lab, dashboard, and documentation space all in one.

What Is a Databricks Notebook?

A Databricks Notebook is an interactive environment that supports multiple programming languages such as Python, SQL, R, and Scala. Each Notebook is divided into cells where you can write code, add text (Markdown), and visualize data directly within it. Unlike local scripts, Notebooks in Databricks run on distributed Spark clusters, which means even a 100 GB dataset can be processed quickly using parallel computation. So Notebooks are more than just code editors; they are collaborative data workspaces for building, testing, and documenting pipelines.

How Databricks Notebooks Work

Under the hood, every Notebook connects to a cluster, a group of virtual machines managed by Databricks. When you run code in a cell, it is sent to Spark running on the cluster, processed there, and the results are sent back to your Notebook. This gives you the scalability of big data without worrying about servers or configurations.

Setting Up Your First Cluster

Before running a Notebook, you must create a cluster; it’s like starting the engine of your car. Here’s how:

Step-by-Step: Creating a Cluster in a Standard Databricks Workspace

Once the cluster is active, you’ll see a green light next to its name; now it’s ready to process your code.

Creating Your First Notebook

Now, let’s build your first Databricks Notebook. Once created, your Notebook is live and ready to connect to data and start executing.

Loading and Exploring Data

Let’s say you have a sales dataset in Azure Blob Storage or Data Lake. You can easily read it into Databricks using Spark:

df = spark.read.csv("/mnt/data/sales_data.csv", header=True, inferSchema=True)
display(df.limit(5))

Databricks automatically recognizes your file’s schema and displays a tabular preview. Now you can transform the data:

from pyspark.sql.functions import col, sum
summary = df.groupBy("Region").agg(sum("Revenue").alias("Total_Revenue"))
display(summary)

Or switch to SQL instantly:

%sql
SELECT Region, SUM(Revenue) AS Total_Revenue
FROM sales_data
GROUP BY Region
ORDER BY Total_Revenue DESC

Visualizing Data

Databricks Notebooks include built-in charting tools. After running your SQL query:
Click + → Visualization → choose Bar Chart.
Assign Region to the X-axis and Total_Revenue to the Y-axis.
Congratulations, you’ve just built your first mini-dashboard!

Real-World Example: ETL Pipeline in a Notebook

In many projects, Databricks Notebooks are used to build ETL pipelines. Each stage is often written in a separate cell, making debugging and testing easier. Once tested, you can schedule the Notebook as a Job running daily, weekly, or on demand. A minimal sketch of such a pipeline appears at the end of this post.

Best Practices

To conclude, Databricks Notebooks are not just a beginner’s playground; they’re the backbone of real data engineering in the cloud. They combine flexibility, scalability, and collaboration into a single workspace where ideas turn into production pipelines. If you’re starting your data journey, learning Notebooks is the best first step. They help you understand data movement, Spark transformations, and the Databricks workflow, everything a data engineer needs.

We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudFronts.com
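For reference, here is a minimal sketch of the notebook-based ETL flow described above. The file path, column names, and target table name are hypothetical, and in a real notebook each stage would typically live in its own cell.

from pyspark.sql import functions as F

# Extract: read the raw file landed in mounted storage (path and schema are illustrative).
raw_df = spark.read.csv("/mnt/data/sales_data.csv", header=True, inferSchema=True)

# Transform: drop incomplete rows and aggregate revenue by region.
clean_df = raw_df.dropna(subset=["Region", "Revenue"])
summary_df = clean_df.groupBy("Region").agg(F.sum("Revenue").alias("Total_Revenue"))

# Load: persist the curated result as a Delta table that reports and jobs can query.
summary_df.write.format("delta").mode("overwrite").saveAsTable("sales_summary")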
How Delta Lake Strengthens Data Reliability in Databricks
The Hidden Problem with Data Lakes

Before Delta Lake, data engineers faced a common challenge. Jobs failed midway, data was partially written, and there was no way to roll back. Over time, these issues led to inconsistent reports and untrustworthy dashboards. Delta Lake was created to fix exactly this kind of chaos.

What Is Delta Lake

Delta Lake is an open-source storage layer developed by Databricks that brings reliability, consistency, and scalability to data lakes. It works on top of existing cloud storage like Azure Data Lake, AWS S3, or Google Cloud Storage, and adds important capabilities to traditional data lakes. It forms the foundation of the Databricks Lakehouse, which combines the flexibility of data lakes with the reliability of data warehouses.

How Delta Lake Works – The Transaction Log

Every Delta table has a hidden folder called _delta_log. This folder contains JSON files that track every change made to the table. Instead of overwriting files, Delta Lake appends new Parquet files and updates the transaction log. This mechanism allows you to view historical versions of data, perform rollbacks, and ensure data consistency across multiple jobs.

ACID Transactions – The Reliability Layer

ACID stands for Atomicity, Consistency, Isolation, and Durability. These properties ensure that data is never partially written or corrupted, even when multiple pipelines write to the same table simultaneously. If a job fails in the middle of execution, Delta Lake automatically rolls back the incomplete changes. Readers always see a consistent snapshot of the table, which makes your data trustworthy at all times.

Time Travel – Querying Past Versions

Time Travel allows you to query older versions of your Delta table. It is extremely helpful for debugging or recovering accidentally deleted data. Example queries:

SELECT * FROM sales_data VERSION AS OF 15;
SELECT * FROM sales_data TIMESTAMP AS OF '2025-10-28T08:00:00.000Z';

These commands retrieve data as it existed at that specific point in time.

Schema Enforcement and Schema Evolution

In a traditional data lake, incoming files with different schemas often cause downstream failures. Delta Lake prevents this by enforcing schema validation during writes. If you intentionally want to add a new column, you can use schema evolution:

df.write.option("mergeSchema", "true").format("delta").mode("append").save("/mnt/delta/customers")

This ensures that the new schema is safely merged without breaking existing queries.

Practical Example – Daily Customer Data Updates

Suppose you receive a new file of customer data every day. You can easily merge new records with existing data using Delta Lake:

MERGE INTO customers AS target
USING updates AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *

This command updates existing records and inserts new ones without duplication.

Delta Lake in the Medallion Architecture

Delta Lake fits perfectly into the Medallion Architecture followed in Databricks.

Layer | Purpose
Bronze | Raw data from various sources
Silver | Cleaned and validated data
Gold | Aggregated data ready for reporting

Maintenance: Optimize and Vacuum

Delta Lake includes commands that keep your tables optimized and storage efficient.

OPTIMIZE sales_data;
VACUUM sales_data RETAIN 168 HOURS;

OPTIMIZE merges small files for faster queries. VACUUM removes older versions of data files to save storage.
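To see the transaction log in action, you can inspect a table’s version history and then read one of the older snapshots. This is a minimal sketch run from a notebook, reusing the sales_data table from the examples above; the version number is illustrative.

display(spark.sql("DESCRIBE HISTORY sales_data"))                   # lists the versions recorded in _delta_log

old_df = spark.sql("SELECT * FROM sales_data VERSION AS OF 15")     # read the snapshot at version 15
display(old_df)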
Unity Catalog Integration

When Unity Catalog is enabled, your Delta tables become part of a centralized governance layer. Access to data is controlled at the Catalog, Schema, and Table levels. Example:

SELECT * FROM main.sales.customers;

This approach improves security, auditing, and collaboration across multiple Databricks workspaces.

Best Practices for Working with Delta Lake

a. Use Delta format for both intermediate and final datasets.
b. Avoid small file issues by batching writes and running OPTIMIZE.
c. Always validate schema compatibility before writing new data.
d. Use Time Travel to verify or restore past data.
e. Schedule VACUUM jobs to manage storage efficiently.
f. Integrate with Unity Catalog for secure data governance.

Why Delta Lake Matters

Delta Lake bridges the gap between raw data storage and reliable analytics. It combines the best features of data lakes and warehouses, enabling scalable and trustworthy data pipelines. With Delta Lake, you can build production-grade ETL workflows, maintain versioned data, and ensure that every downstream system receives clean and accurate information.

Convert an existing Parquet table into Delta format using:

CONVERT TO DELTA parquet.`/mnt/raw/sales_data/`;

Then try using Time Travel, Schema Evolution, and Optimize commands. You will quickly realize how Delta Lake simplifies complex data engineering challenges and builds reliability into every pipeline you create. A short sketch of this exercise appears at the end of this post.

To conclude, Delta Lake provides reliability, performance, and governance for modern data platforms. It transforms your cloud data lake into a true Lakehouse that supports both data engineering and analytics efficiently.

We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudFronts.com
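For reference, a minimal sketch of that exercise from a notebook, assuming the hypothetical path used above and a path-based table:

spark.sql("CONVERT TO DELTA parquet.`/mnt/raw/sales_data/`")              # one-time, in-place conversion
spark.sql("DESCRIBE HISTORY delta.`/mnt/raw/sales_data/`").show()         # the table now has a transaction log
spark.sql("OPTIMIZE delta.`/mnt/raw/sales_data/`")                        # compact small files
spark.sql("VACUUM delta.`/mnt/raw/sales_data/` RETAIN 168 HOURS")         # remove old, unreferenced file versions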
Deploying AI Agents with Agent Bricks: A Modular Approach
In today’s rapidly evolving AI landscape, organizations are seeking scalable, secure, and efficient ways to deploy intelligent agents. Agent Bricks offers a modular, low-code approach to building AI agents that are reusable, compliant, and production-ready. This blog post explores the evolution of AI leading to Agentic AI, the prerequisites for deploying Agent Bricks, a real-world HR use case, and a glimpse into the future with the ‘Ask Me Anything’ enterprise AI assistant.

Prerequisites to Deploy Agent Bricks

Use Case: HR Knowledge Assistant

HR departments often manage numerous SOPs scattered across documents and portals. Employees struggle to find accurate answers, leading to inefficiencies and inconsistent responses. Agent Bricks enables the deployment of a Knowledge Assistant that reads HR SOPs and answers employee queries like ‘How many casual leaves do I get?’ or ‘Can I carry forward sick leave?’.

Business Impact:

Agent Bricks in Action: Deployment Steps

Figure 1: Add data to the volumes
Figure 2: Select the Agent Bricks module
Figure 3: Click on the Create Agent option to deploy your agent
Figure 4: Click on the Update Agent option to update your deployed agent

Agent Bricks in Action: Demo

Figure 1: Response to a question based on data present in the dataset
Figure 2: Response to a question asked outside the data present in the dataset

To conclude, Agent Bricks empowers organizations to build intelligent, modular AI agents that are secure, scalable, and impactful. Whether you’re starting with a small HR assistant or scaling to enterprise-wide AI agents, the time to act is now. AI is no longer just a tool; it’s your next teammate. Start building your AI workforce today with Agent Bricks.

We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudFronts.com

Start Your AI Journey Today!
Databricks vs Azure Data Factory: When to Use Which in ETL Pipelines
Introduction: Two Powerful Tools, One Common Question

If you work in data engineering, you’ve probably faced this question: should I use Azure Data Factory or Databricks for my ETL pipeline? Both tools can move and transform data, but they serve very different purposes. Understanding where each tool fits can help you design cleaner, faster, and more cost-effective data pipelines. Let’s explore how these two Azure services complement each other rather than compete.

What Is Azure Data Factory (ADF)

Azure Data Factory is a data orchestration service. It’s designed to move, schedule, and automate data workflows between systems. Think of ADF as the “conductor of your data orchestra”: it doesn’t play the instruments itself, but it ensures everything runs in sync.

What Is Azure Databricks

Azure Databricks is a data processing and analytics platform built on Apache Spark. It’s designed for complex transformations, data modeling, and machine learning on large-scale data. Think of Databricks as the “engine” that processes and transforms the data your ADF pipelines deliver.

ADF vs Databricks: A Detailed Comparison

Feature | Azure Data Factory (ADF) | Azure Databricks
Primary Purpose | Orchestration and data movement | Data processing and advanced transformations
Core Engine | Integration Runtime | Apache Spark
Interface Type | Low-code (GUI-based) | Code-based (Python, SQL, Scala)
Performance | Limited by Data Flow engine | Distributed and scalable Spark clusters
Transformations | Basic mapping and joins | Complex joins, ML models, and aggregations
Data Handling | Batch-based | Batch and streaming
Cost Model | Pay per pipeline run and Data Flow activity | Pay per cluster usage (compute time)
Versioning and Debugging | Visual monitoring and alerts | Notebook history and logging
Integration | Best for orchestrating multiple systems | Best for building scalable ETL within pipelines

In simple terms, ADF moves the data, while Databricks transforms it deeply.

When to Use ADF

Use Azure Data Factory when your main need is to move, schedule, and orchestrate data across systems. Example: copying data daily from Salesforce and SQL Server into Azure Data Lake.

When to Use Databricks

Use Databricks when you need complex transformations, large-scale processing, or machine learning on your data. Example: transforming millions of sales records into curated Delta tables with customer segmentation logic.

When to Use Both Together

In most enterprise data platforms, ADF and Databricks work together: ADF lands the raw data and triggers a Databricks notebook or job, and Databricks performs the heavy transformations. This hybrid approach combines the automation of ADF with the computing power of Databricks. A minimal sketch of the Databricks side of this flow appears at the end of this post.

Example Architecture: ADF → Databricks → Delta Lake → Synapse → Power BI. This is a standard enterprise pattern for modern data engineering.

Cost Considerations

Using ADF for orchestration and Databricks for processing ensures you only pay for what you need.

Best Practices

Azure Data Factory and Azure Databricks are not competitors. They are complementary tools that together form a complete ETL solution. Understanding their strengths helps you design data pipelines that are reliable, scalable, and cost-efficient.

We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudFronts.com
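For reference, here is a minimal sketch of the Databricks step in the hybrid flow described above, assuming an ADF copy activity has already landed raw files in the lake. The paths, column names, and target table name are hypothetical.

from pyspark.sql import functions as F

# Read the raw files that the ADF copy activity has landed in the lake (path is illustrative).
raw_orders = spark.read.parquet("/mnt/raw/salesforce/orders/")

# Apply the heavier transformation logic that goes beyond basic ADF mapping data flows.
curated_orders = (
    raw_orders
    .filter(F.col("status") == "Closed Won")
    .withColumn("revenue_usd", F.col("amount") * F.col("fx_rate"))
)

# Write a curated Delta table for Synapse or Power BI to consume downstream.
curated_orders.write.format("delta").mode("overwrite").saveAsTable("gold.curated_orders")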
Advantages and Future Scope of the Unified Databricks Architecture – Part 2
Following our unified data architecture implementation using Databricks Unity Catalog, the next step focuses on understanding the advantages and future potential of this Lakehouse-driven ecosystem. The architecture consolidates data from multiple business systems and transforms it into an AI-powered data foundation that will support advanced analytics, automation, and conversational insights.

Key Advantages

Centralized Governance: Unity Catalog provides complete visibility into data lineage, security, and schema control, eliminating silos.

Dynamic and Scalable Data Loading: A single Databricks notebook can dynamically load and transform data from multiple systems, simplifying maintenance.

Enhanced Collaboration: Teams across domains can access shared data securely while maintaining compliance and data accuracy.

Improved BI and Reporting: More than 30 Power BI reports are being migrated to the Gold layer for unified reporting.

AI & Automation Ready: The architecture supports seamless integration with GenAI tools like Genie for natural language Q&A and predictive insights.

Future Aspects

In the next phase, we aim to:
– Integrate Genie for conversational analytics.
– Enable real-time insights through streaming pipelines.
– Extend the Lakehouse to additional business sources.
– Automate AI-based report generation and anomaly detection.

For example, business users will soon be able to ask questions like: “How many hours did a specific resource submit in CRM time entries last week?” Databricks will process this query dynamically, returning instant, AI-driven insights.

To conclude, the unified Databricks architecture is more than a data pipeline; it is the foundation for AI-powered decision-making. By merging governance, automation, and intelligence, CloudFronts is building the next generation of data-first, AI-ready enterprise solutions.
Unified Data Architecture with Databricks Unity Catalog – Part 1
At CloudFronts Technologies, we are implementing a Unified Data Architecture powered by Databricks Unity Catalog to bring together data from multiple business systems into one governed, AI-ready platform. This solution integrates five major systems (Zoho People, Zoho Books, Business Central, Dynamics 365 CRM, and QuickBooks) using Azure Logic Apps, Blob Storage, and Databricks to build a centralized Lakehouse foundation.

Objective

To design a multi-source data architecture that supports:
– Centralized data storage via Unity Catalog.
– Automated ingestion through Azure Logic Apps.
– Dynamic data loading and transformation in Databricks.
– Future-ready integration for AI and BI analytics.

Architecture Overview

Data Flow Summary:
1. Azure Logic Apps extract data from each of the five sources via APIs.
2. Data is stored in Azure Blob Storage containers.
3. Blob containers are mounted to Databricks for unified access.
4. A dynamic Databricks notebook reads and processes data from all sources.

Each data source operates independently while following a governed and modular design, making the solution scalable and easily maintainable. A minimal sketch of the dynamic loading pattern appears at the end of this post.

Role of Unity Catalog

Unity Catalog enables lineage tracking and secure access across teams. Each layer, Bronze (raw), Silver (refined), and Gold (business-ready), is managed under Unity Catalog, ensuring clear visibility into data flow and ownership. This ensures that as data grows, governance and performance remain consistent across all environments.

Implementation Preview

In the upcoming blog, I will demonstrate the end-to-end implementation of one Power BI report using this unified Databricks architecture. This will include connecting the Gold-layer dataset from Databricks to Power BI, building dynamic visuals, and showcasing how the unified data foundation simplifies report creation and maintenance across multiple systems.

To conclude, this architecture lays the foundation for a unified, governed, and scalable data ecosystem. By combining Azure Logic Apps, Blob Storage, and Databricks Unity Catalog, we are enabling a single source of truth that supports analytics, automation, and future AI innovations.
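For illustration, here is a minimal sketch of the dynamic loading pattern described in the data flow summary above. Mount paths, file formats, and catalog and schema names are hypothetical; the real notebook handles source-specific formats and incremental logic.

# One notebook loops over every mounted source container and lands the raw data in the Bronze layer.
sources = ["zoho_people", "zoho_books", "business_central", "dynamics_crm", "quickbooks"]

for source in sources:
    # Each Logic App drops its API extracts into a dedicated Blob container, mounted under /mnt/<source>.
    raw_df = spark.read.json(f"/mnt/{source}/")

    # Write the raw data to a governed Bronze table in Unity Catalog (catalog and schema names are illustrative).
    raw_df.write.format("delta").mode("overwrite").saveAsTable(f"lakehouse.bronze.{source}_raw")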
How Delta Lake Keeps Your Data Clean, Consistent, and Future-Ready
Delta Lake is a storage layer that brings reliability, consistency, and flexibility to big data lakes. It enables advanced features such as Time Travel, Schema Evolution, and ACID Transactions, which are crucial for modern data pipelines.

Feature | Benefit
Time Travel | Access historical data for auditing, recovery, or analysis.
Schema Evolution | Adapt automatically to changes in the data schema.
ACID Transactions | Guarantee reliable and consistent data with atomic upserts.

1. Time Travel

Time Travel allows you to access historical versions of your data, making it possible to “go back in time” and query past snapshots of your dataset.

Use Cases:
– Recover accidentally deleted or updated data.
– Audit and track changes over time.
– Compare dataset versions for analytics.

How it works: Delta Lake maintains a transaction log that records every change made to the table. You can query a previous version using either a timestamp or a version number (see the consolidated sketch at the end of this post).

2. Schema Evolution

Schema Evolution allows your Delta table to adapt automatically to changes in the data schema without breaking your pipelines.

Use Cases:
– Adding new columns to your dataset.
– Adjusting to evolving business requirements.
– Simplifying ETL pipelines when source data changes.

How it works: When enabled, Delta automatically updates the table schema if the incoming data contains new columns (see the consolidated sketch at the end of this post).

3. ACID Transactions (with Atomic Upsert)

ACID Transactions (Atomicity, Consistency, Isolation, Durability) ensure that all data operations are reliable and consistent, even in the presence of concurrent reads and writes. Atomic Upsert guarantees that an update or insert operation happens fully or not at all.

Key Benefits:
– No partial updates: either all changes succeed or none.
– Safe concurrent updates from multiple users or jobs.
– Consistent data for reporting and analytics.
– Atomic Upsert ensures data integrity during merges.

Atomic Upsert (MERGE): in the merge shown in the sketch at the end of this post:
– whenMatchedUpdateAll() updates existing rows.
– whenNotMatchedInsertAll() inserts new rows.
– The operation is atomic: either all updates and inserts succeed together or none.

To conclude, Delta Lake makes data pipelines modern, maintainable, and error-proof. By leveraging Time Travel, Schema Evolution, and ACID Transactions, you can build robust analytics and ETL workflows with confidence, ensuring reliability, consistency, and adaptability in your data lake operations.

We hope you found this blog useful, and if you would like to discuss anything, you can reach out to us at transform@cloudfronts.com
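The code examples referenced above were not included inline, so here is a minimal consolidated sketch of the three features. Table names, column names, and the staging tables are hypothetical.

from delta.tables import DeltaTable

# Time Travel: read an earlier snapshot of a table by version number.
historical_df = spark.sql("SELECT * FROM customers VERSION AS OF 3")

# Schema Evolution: allow new columns in the incoming data to be merged into the table schema.
incoming_df = spark.table("staging.customers_with_new_columns")   # hypothetical staging table
incoming_df.write.option("mergeSchema", "true").format("delta").mode("append").saveAsTable("customers")

# ACID atomic upsert with MERGE, using the whenMatchedUpdateAll / whenNotMatchedInsertAll calls described above.
updates_df = spark.table("staging.customer_updates")               # hypothetical table of daily changes
target = DeltaTable.forName(spark, "customers")
(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)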
CloudFronts Strengthens Partnership with Databricks to Drive AI-Powered Growth
CloudFronts is proud to announce its expanded collaboration with Databricks, the leading Data and AI company, as a Select Partner. This partnership marks a significant milestone in CloudFronts’ mission to help organizations unlock the full value of their data and accelerate business growth through AI-driven innovation. Together, CloudFronts and Databricks empower enterprises to establish a strong, scalable data foundation, automate repetitive tasks, enhance decision-making accuracy, and deliver hyper-personalized, data-driven customer experiences.

Building Deep Expertise in Data and AI

CloudFronts’ rapidly growing Data & AI practice now includes 15 Databricks-certified engineers and 3 sales experts with official Databricks credentials. This specialized expertise enables CloudFronts to deliver enterprise-grade data solutions across the entire analytics lifecycle, covering ingestion, transformation, storage, modeling, and insight generation. Clients benefit from faster implementation timelines, optimized performance, and measurable ROI.

Beyond project execution, CloudFronts is deeply engaged in nurturing the global Databricks community. The team has hosted a Databricks Hackathon in Zurich, fostering collaboration and learning among data professionals, and proudly sponsored the Databricks Data + AI World Tour in Dallas, reflecting its commitment to knowledge sharing, innovation, and thought leadership within the Data & AI ecosystem.

CloudFronts continues to invest in continuous learning and community enablement through a series of webinars on Data and AI best practices. The upcoming session, “Clean Data with Medallion Architecture,” will guide participants on building high-quality, scalable data pipelines and architectures. Professionals interested in deepening their expertise can register to learn directly from CloudFronts’ certified experts.

To date, CloudFronts has successfully delivered three Databricks-powered projects that have transformed client operations and delivered measurable impact. These include implementations for a leading global laboratory solutions provider, a major retail and hardware enterprise in the Maldives, and an internal data modernization initiative. Each project highlights CloudFronts’ commitment to delivering tangible business value through robust data engineering, advanced analytics, and AI integration.

“Discover How We’ve Enabled Businesses Like Yours – Explore Our Client Testimonials!”

About CloudFronts

CloudFronts is a global AI-First Microsoft Solutions & Databricks Partner for Business Applications, Data & AI, helping teams and organizations worldwide solve their complex business challenges with Microsoft Cloud, AI, and Azure Integration Services. We have a global presence with offices in the U.S., Singapore, and India. Since its inception in 2012, CloudFronts has successfully served more than 200 small and medium-sized clients across North America, Europe, Australia, MENA, the Maldives, and India, with diverse experience in sectors ranging from Professional Services, Financial Services, Manufacturing, Retail, and Logistics/SCM to Non-profits.

Please feel free to connect with us at transform@cloudfronts.com
Inside SmartPitch: How CloudFronts Built an Enterprise-Grade AI Sales Agent Using Microsoft and Databricks Technologies
Why SmartPitch? – The Idea and Pain Point

The idea for SmartPitch came directly from observing the day-to-day struggles of sales and pre-sales teams. Every Marketing Qualified Lead (MQL) to Sales Qualified Lead (SQL) conversion required hours of manual work: searching through documents stored in SharePoint, combing through case studies, aligning them with solution areas, and finally packaging them into a client-ready pitch deck. The reality was that documents across systems (SharePoint, Dynamics 365, PDFs, PPTs) remained underutilized because there was no intelligent way to bring them together. Sales teams often relied on tribal knowledge or reused existing decks with limited personalization.

We asked: what if a sales assistant could automatically pull the right case studies, map them to solution areas, and draft an elevator pitch on demand, in minutes? That became the SmartPitch vision: an AI-powered agent that pulls the right case studies, maps them to solution areas, and drafts the pitch on demand. As a result, SmartPitch has helped us reduce pitch creation time by 70%.

2. The First Prototype – Custom Copilot Studio

Our first step was to build SmartPitch using Custom Copilot Studio. It gave us a low-code way to experiment with conversational flows, integrate with Azure AI Search, and provide sales teams with a chat interface. The prototype covered:

1. Knowledge Sources Integration
2. Data Flow
3. Conversational Flow Design
4. Integration and Security
5. Technical Stack
6. Business Process Enablement
7. Early Prototypes

With Custom Copilot, we were able to stand these pieces up quickly, and we successfully demoed the early prototypes in Zurich and New York. They showed that the idea worked, but they also revealed serious limitations.

3. Challenges in Custom Copilot

Despite proving the concept, Custom Copilot Studio had critical shortcomings. It lacked support for model fine-tuning or advanced RAG customization, and incorporating complex external APIs or custom workflows was difficult. This limitation meant SmartPitch, in its Copilot form, couldn’t scale to meet enterprise standards.

4. Rebuilding in Azure AI Foundry – Smarter, Extensible, Connected

The next phase was Azure AI Foundry, Microsoft’s enterprise AI development platform, which gave us far more control than Copilot Studio.

Extending SmartPitch with Logic Apps

One of the biggest upgrades was the ability to integrate Azure Logic Apps as external tools for the agent. This modular approach meant we could add new functionality simply by publishing a new Logic App. No redeployment of SmartPitch was required.

Automating Document Vectorization

We also solved one of the biggest bottlenecks, document ingestion and retrieval, by building a pipeline for automatic document vectorization from SharePoint. This allowed SmartPitch to search across text, images, tables, and PDFs, providing relevant answers instead of keyword matches.

But There Were Limitations

Even with these improvements, we hit roadblocks. At this point, we realized the true bottleneck wasn’t the agent itself; it was the quality of the data powering it.

5. Bad Data, Governance, and the Medallion Architecture

SmartPitch’s performance was only as good as the data it retrieved from, and much of the enterprise data was dirty: duplicate case studies, outdated documents, inconsistent file formats. This led to irrelevant or misleading responses in pitches.
To address this, we turned to Databricks’ Unity Catalog and Medallion Architecture. You can read our post on building a clean data foundation with Medallion Architecture [Link]. Now, every result SmartPitch surfaced could be trusted, audited, and tied to a governed source.

6. SmartPitch in Mosaic AI – The Final Evolution

The last stage was migrating SmartPitch into Databricks Mosaic AI, part of the Lakehouse AI platform. This was where SmartPitch matured into an enterprise-grade solution.

What We Gained in Mosaic AI

In Mosaic AI, SmartPitch wasn’t just a chatbot; it became a data-native enterprise sales assistant. From this work, we came to know the following differences between agent development in Azure AI Foundry and Databricks Mosaic AI:

Attribute / Aspect | Azure AI Foundry | Mosaic AI
Focus | Developer and Data Scientist | Data Engineers, Analysts, and Data Scientists
Core Use Case | Create and manage your own AI agent | Build, experiment, and deploy data-driven AI models with analytics + AI workflows
Interface | Code-first (SDKs, REST APIs, Notebooks) | No-code/low-code UI + Notebooks + APIs
Data Access | Azure Blob, Data Lake, vector DBs | Native integration with Databricks Lakehouse, Delta Lake, Unity Catalog, vector DBs
MCP Server | Only custom MCP servers supported; built-in option complex | Native MCP support within the Databricks ecosystem; simpler setup
Models | 90 models available | Access to open-source and foundation models (MPT, Llama, Mixtral, etc.) plus partner models
Model Customization | Full model fine-tuning, prompt engineering, RAG | Fine-tuning, instruction tuning, RAG, model orchestration
Publish to Channels | Complex (Azure Bot SDK + Bot Framework + App Service) | Direct integration with Databricks workflows, APIs, dashboards, and third-party apps
Agent Update | Real-time updates in Microsoft Teams | Updates deployed via Databricks workflows; versioning and rollback supported
Key Capabilities | Prompt flow orchestration, RAG, model choice, vector search, CI/CD pipelines, Azure ML & responsible AI integration | Data + AI unification (native to the Lakehouse), RAG with Lakehouse data, multi-model orchestration, fine-tuning, end-to-end ML pipelines, secure governance via Unity Catalog, real-time deployment
Key Components | Workspace and agent orchestration, 90+ models, OpenAI pay-as-you-go or self-hosted, security via Azure identity | Mosaic AI Agent Framework, Model Serving, Fine-Tuning, Vector Search, RAG Studio, Evaluation & Monitoring, Unity Catalog integration
Cost / License | Vector DB: external; Model Serving: token-based pricing (GPT-3.5, GPT-4); fine-tuning: case-by-case; total agent cost variable (~$5k–$7k+/month) | Vector Search: $605–$760/month for 5M vectors; Model Serving: $90–$120 per million tokens; fine-tuning Llama 3.3: $146–$7,150; managed compute built into DBU usage; end-to-end AI agent ~$5k–$7k+/month
Use Cases / Capabilities | Agents are intelligent and can interact with/modify responses; single AI search per agent; infrastructure setup required; custom MCP server registration | Agents are intelligent and can interact with/modify responses; AI search via APIs (Google/Bing); in-built MCP server; complex infrastructure; slower responses as results are sent in batches
Development Approach | Low-code, faster agent creation, SDK-based, easier experimentation | Manual coding using the MLflow library, more customization, API integration, higher chance of errors, slower build
Models Comparison | 90 models, Azure OpenAI (GPT-3.5, GPT-4), multi-modal | ~10 base models, OSS and partner models (Llama, Claude, Gemma); many models don’t support tool usage
Knowledge Source | One knowledge source of each type (adding a new one replaces the previous) | No limitation; supports data cleaning via Medallion Architecture; SQL-only access inside the agent; Spark/PySQL not supported in the agent
Memory / Context Window | 8K–128K tokens (up to 1M for GPT-4.1) | Moderate, not specified
Modalities | Text, code, vision, audio (some models) | Likely text-only
Special Enhancements | Turbo efficiency, reasoning, tool calling, multimodal | Varies per model (Llama, Claude, Gemma architectures)
Availability | Deployed via Azure AI Foundry | Through the Databricks platform
Limitations | Only one knowledge source of each type; infrastructure complexity for MCP server | No multi-modal support; no Spark/PySQL access inside the agent; slower batch responses; limited model count; high manual development

7. Lessons Learned: …
