December 03, 2018

Getting the Most Out of Your Data in the Cloud with Cloudbreak

Author:
Jon Dybik

I want to focus on three abilities common to the cloud providers and look at how they work together and build on each other to help you maximize agility and data insights in the cloud. They are: cloud storage, running workloads on demand, and elastic resource management. In addition, we’ll talk about how you can pull this all together with Hortonworks Cloudbreak on a path towards big data insights in the cloud.

Let’s start with cloud storage. Cloud storage is key: it lays the foundation for taking full advantage of the other abilities we’ll talk about. Simply put, cloud storage is elastic and HDFS is not. This matters because capacity planning for a shared data environment is tough to get right. You commonly end up either with costly ad-hoc provisioning of unplanned resources or with low resource utilization due to costly up-front provisioning. Cloud storage’s pay-as-you-go model lets you manage cost effectively as your storage needs grow, while the cloud storage provider provisions resources under the hood, transparently to you.
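To make the cost argument concrete, here is a minimal sketch comparing up-front provisioning for peak capacity with paying only for what is stored each month. The usage figures and the price per TB-month are made-up assumptions for illustration, and the same unit price is used for both models to keep the arithmetic simple; real pricing will differ.

```python
def upfront_cost(capacity_tb, months, price_per_tb_month):
    """Cost when a fixed capacity is provisioned for the whole period."""
    return capacity_tb * months * price_per_tb_month


def pay_as_you_go_cost(usage_tb_by_month, price_per_tb_month):
    """Cost when you pay only for the storage actually used each month."""
    return sum(usage_tb_by_month) * price_per_tb_month


# Hypothetical numbers: storage grows from 10 TB to 40 TB over four months.
usage = [10, 20, 30, 40]
price = 20.0  # assumed price per TB-month, identical for both models

# Up-front provisioning must be sized for the peak from day one.
fixed = upfront_cost(max(usage), len(usage), price)
elastic = pay_as_you_go_cost(usage, price)

# Share of the provisioned capacity that was actually used.
utilization = sum(usage) / (max(usage) * len(usage))

print(f"up-front: {fixed}, pay-as-you-go: {elastic}, utilization: {utilization:.1%}")
```

With these assumed numbers, the fixed deployment pays for 160 TB-months while only 100 are used, which is the low-utilization problem described above.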

Beyond cost, a big benefit of cloud storage is that we can now separate storage from compute and, as a result, launch use-case-specific workloads on demand in a shared data environment. For example, this separation allows different Apache Spark applications, such as a data engineering ETL job and an ad-hoc data science model training job, each to run on its own cluster, preventing the concurrency issues that affect multi-user, fixed-size Apache Hadoop clusters. This separation, and the flexibility to run disparate workloads on demand, not only lowers cost but also improves the user experience.

Now that you have separated storage from compute and can run disparate workloads on demand, you can truly take advantage of the elasticity that the cloud provides at a level of granularity that makes business and technical sense. For example, if a cluster is experiencing YARN memory saturation or needs higher data read throughput, you can simply scale up the existing cluster or launch a new, larger cluster for a shorter period of time to handle the increased workload and meet business demand.
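As a rough illustration of the scale-up decision, the sketch below estimates how many worker nodes a cluster would need so that allocated plus pending YARN memory fits under a target utilization. The function name, the 80% target, and the sample figures are assumptions made for this example; they are not taken from YARN or Cloudbreak.

```python
def workers_needed(allocated_mb, pending_mb, node_memory_mb, target_pct=80):
    """Worker nodes required so total memory demand stays under target_pct.

    All inputs are hypothetical readings you would collect from your own
    cluster metrics; integer arithmetic avoids floating-point rounding.
    """
    demand_mb = allocated_mb + pending_mb
    usable_per_node_mb = node_memory_mb * target_pct // 100
    # Ceiling division: round up to a whole number of nodes.
    return -(-demand_mb // usable_per_node_mb)


current_workers = 10
required = workers_needed(
    allocated_mb=900_000,    # memory already allocated to containers
    pending_mb=300_000,      # memory requested but not yet granted
    node_memory_mb=100_000,  # YARN memory per worker node
)
to_add = max(0, required - current_workers)
print(f"required workers: {required}, add: {to_add}")
```

Whether you add those nodes to the existing cluster or launch a separate, larger cluster for the duration of the job is the business decision described above.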

How do we tie all this together and operationalize our big data environment in the cloud? That’s where Cloudbreak comes in. Cloudbreak simplifies the deployment of big data workloads with cloud storage on cloud providers such as Amazon Web Services, Microsoft Azure, Google Cloud Platform, and OpenStack. It is easy to get started with Cloudbreak: you can use the wizard interface to deploy your first Hadoop cluster in six easy steps using one of our prebuilt Apache Ambari blueprints for data science, EDW, or ETL style workloads. When you are ready to take things to the next level, Cloudbreak is full of enterprise features, including:

  • A CLI and API to automate cluster provisioning or integrate with other orchestration and cloud management tools.
  • Security-first architecture for deploying kerberized clusters, integrating your cluster with LDAP/Active Directory for authentication, and protecting your cluster with a secured gateway powered by Apache Knox.
  • Support for your unique configuration needs through custom Ambari blueprints, custom OS images, and injecting your own scripts into the cluster build process.
  • Auto-scaling of workloads based on time of day and Ambari metrics to optimize for peak workload demand and cost.
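To illustrate the idea behind the last bullet, here is a minimal sketch of a policy that combines a time-of-day baseline with a metric threshold. It is not Cloudbreak's API or configuration format; every class, function, and parameter name here is hypothetical, and a production policy would also need cooldown periods to avoid flapping.

```python
from dataclasses import dataclass


@dataclass
class TimeRule:
    start_hour: int  # inclusive, 0-23
    end_hour: int    # exclusive
    min_nodes: int   # baseline node count during this window


@dataclass
class MetricRule:
    threshold_pct: int  # scale out when memory use reaches this level
    step: int           # nodes to add per scaling decision


def desired_nodes(current, hour, memory_used_pct, time_rules, metric_rule,
                  off_hours_nodes=3, max_nodes=20):
    """Return the node count this hypothetical policy would ask for."""
    # Baseline comes from the schedule, or the off-hours size if none applies.
    baseline = max(
        (r.min_nodes for r in time_rules if r.start_hour <= hour < r.end_hour),
        default=off_hours_nodes,
    )
    if memory_used_pct >= metric_rule.threshold_pct:
        # Under pressure: grow from the current size, never below baseline.
        target = max(baseline, current + metric_rule.step)
    else:
        # No pressure: fall back to the scheduled baseline to save cost.
        target = baseline
    return min(target, max_nodes)


rules = [TimeRule(start_hour=8, end_hour=18, min_nodes=10)]
metric = MetricRule(threshold_pct=90, step=2)

peak = desired_nodes(10, hour=9, memory_used_pct=95, time_rules=rules, metric_rule=metric)
night = desired_nodes(12, hour=22, memory_used_pct=40, time_rules=rules, metric_rule=metric)
capped = desired_nodes(19, hour=9, memory_used_pct=95, time_rules=rules, metric_rule=metric)
print(peak, night, capped)
```

The schedule sets a floor for predictable peak hours, the metric rule reacts to unexpected load, and the cap keeps cost bounded.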

To learn more about Cloudbreak and see a live demonstration, please join us for an upcoming webinar on December 5th. Details can be found here.

Comments

Han says:

Thank you for the article! Super useful, will have a look into it.
Have you ever used MyAirBridge? It’s an amazing service and it’s a pity it is not mentioned often, I would love to read what you think.
