4 Reasons Why Azure Databricks May Be in Your Future

Azure Databricks delivers Apache Spark-based analytics to the Microsoft Azure cloud.

 

Azure Databricks is a unique collaboration between Microsoft and Databricks, forged to deliver Databricks’ Apache Spark-based analytics offering to the Microsoft Azure cloud. Databricks was founded by the creators of Apache Spark with the goal of helping clients with cloud-based big data processing. Apache Spark is an open source cluster-computing framework, written primarily in Scala, that provides an interface for programming entire clusters with built-in fault tolerance and parallelism. Databricks set a record in late 2014 for large-scale sorting performance. It’s blazing fast. This collaboration natively integrates Apache Spark’s performance and redundancy with Azure’s security and its wide variety of product offerings for data storage, processing, analytics, and best-in-class reporting with Power BI. Is Azure Databricks right for your company? Here are four reasons why it may be.

 

1. Get Started Quickly

 

With Azure Databricks, you can be developing your first solution within minutes. Once in the Azure Portal, you simply select Databricks under the Analytics heading and you’re ready to set up your first workspace, create a cluster and import notebooks. Azure Databricks removes the difficulties and headaches of managing and configuring clusters. Choosing Databricks Serverless with Autoscaling shifts that burden to the platform, which adjusts automatically as your workloads and data sources need to scale. It’s easy to create and clone clusters to allow for branching of work efforts when the need arises. Everything from creation of your first cluster to security and billing is unified and as seamless as you’d expect from the Microsoft Azure cloud platform.
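To make the autoscaling idea concrete, here is a minimal sketch of what an autoscaling cluster definition might look like if you scripted it instead of clicking through the portal. The field names follow the shape of the Databricks Clusters API request body; the helper name `build_cluster_spec`, the cluster name, the worker counts, the auto-termination timeout and the placeholder runtime version and VM size are all assumptions for illustration, not values from this article. The sketch only builds and prints the definition; it does not call any service.

```python
import json


def build_cluster_spec(name, min_workers, max_workers,
                       spark_version="<runtime-version>",
                       node_type_id="<vm-size>"):
    """Build an autoscaling cluster definition as a plain dict.

    spark_version and node_type_id are placeholders (assumptions);
    substitute values that are valid in your own workspace.
    """
    if min_workers < 0 or max_workers < min_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # The platform adds or removes workers within this range as load changes.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Illustrative choice: shut the cluster down after 30 idle minutes.
        "autotermination_minutes": 30,
    }


if __name__ == "__main__":
    spec = build_cluster_spec("first-cluster", min_workers=2, max_workers=8)
    print(json.dumps(spec, indent=2))
```

Cloning a cluster for a branch of work then amounts to reusing the same definition under a different name.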

 

2. Collaboration and Integration

 

Azure Databricks provides a single collaborative workspace for everyone involved in delivering insights to customers and companies that want to gain a competitive edge, solve wide-ranging societal issues, or simply run more efficiently through data analysis, data science, artificial intelligence (AI) and machine learning (ML). This workspace is where data engineers can transform static and streaming data from seamlessly integrated data sources such as Azure SQL DB, Kafka on HDInsight, Azure SQL Data Warehouse, Cosmos DB, Azure Data Lake, Azure Blob Storage and Azure Event Hub. This is the same workspace where data scientists can develop models for AI and ML against those transformations, and where analysts can turn those findings and models into mission-critical reports, charts, graphs and more in Power BI. They can also write their own SQL queries in Databricks notebooks to visualize data in Power BI or Tableau for self-service business intelligence. Data sources are not limited to just Azure offerings, either. Azure Databricks can work with data sourced from Couchbase, Elasticsearch, CSV files, JSON files, Redis and more. A current list of supported data sources is available in the Azure Databricks documentation.

 

3. Universal Connectivity to Azure Storage Services

 

Azure Databricks seamlessly connects to all the different Azure storage options. This includes the ability to read from and write to file-based storage, like Blob storage and Azure Data Lake Store, as well as relational data stores, like Azure SQL Database/Data Warehouse, and NoSQL data stores, like Azure Cosmos DB. It also connects to streaming or event data sources in Azure, such as Event Hubs or Apache Kafka on HDInsight.
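As a rough illustration of the file-based options, Spark addresses Blob storage and Azure Data Lake Store through URI schemes (`wasbs://` and `adl://` respectively). The sketch below only builds those URIs; the helper names, account names, container name and paths are hypothetical, and the credential configuration each store needs is deliberately left out.

```python
def blob_uri(account, container, path=""):
    """URI for a path in an Azure Blob Storage container (wasbs scheme)."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def data_lake_uri(account, path=""):
    """URI for a path in Azure Data Lake Store (adl scheme)."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical account, container and path names, for illustration only.
    source = blob_uri("mystorageacct", "raw", "/events/2019/")
    target = data_lake_uri("mydatalake", "/curated/events/")
    print(source)
    print(target)
    # In a Databricks notebook, where a `spark` session is predefined, a
    # read/write between the two stores might then look like this,
    # assuming access to both has already been configured:
    #   df = spark.read.json(source)
    #   df.write.parquet(target)
```

The same DataFrame code works regardless of which store sits behind the URI, which is what makes the connectivity feel uniform.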

 

4. Security

 

All of these features and this ease of use are for naught if the environment is not safe and trusted. Azure Databricks utilizes Microsoft’s Azure Active Directory (AAD) security framework. AAD is a multi-tenant, cloud-based directory and identity management service that combines directory services, application access and identity protection into a single, core solution. It’s also extensible to any Windows Server Active Directory implementation for hybrid environments that run both on premises and in the Azure cloud. Four clicks are all it takes to integrate on-premises and Azure-based AD. This allows for a secure workspace where all your data professionals can collaborate. Security is of utmost concern to any enterprise processing or collecting data to mine valuable insights; a solid framework such as AAD backing your high-performing analytics platform is more of a requirement than it has ever been. Those with administrative access can quickly and easily grant and revoke access at a fine-grained level so your employees’ performance isn’t negatively impacted by a security bottleneck.

February 8th, 2019 | Machine Learning & AI

About the Author:

Agile Analytics is a boutique consulting firm and a Microsoft Gold Partner in Data Platform and Data Analytics. At Agile Analytics, we consult, design and deliver innovative data analytics solutions that help you gain and sustain competitive advantage through data-driven culture.