January 10, 2018 | Rob Bearden | Executive Desk

2017 Year In Review

January 5, 2018 | Guest Author | Webinar, Webinar/Event

4 essential steps for managing sensitive data in your data lake

January 4, 2018 | Michael Lin | Thought Leadership, Hortonworks Case Study

Applying Big Data Streaming Analytics in the Real World

Viewing posts: From the Dev Team


Understanding the basic functions of the YARN Capacity Scheduler is a topic I typically deal with across all kinds of deployments. Capacity management has many facets, such as sharing, chargeback, and forecasting. This blog focuses on the primary features available to platform operators. In addition to the basic features […]

Background: Previous blog posts discussed The Matrix, a set of over 27 software components that need to work together as part of any big data infrastructure. The automation suite used to perform functional validation of these components consists of over 30,000 tests, which are divided into 250+ logical groups (called splits). The splits are […]

At Hortonworks, we constantly strive to achieve high-quality releases. HDP/HDF releases are deployed by thousands of enterprises and are used in business-critical environments to crunch several petabytes of data every single day. So maintaining the highest standards of quality, and investing in an infrastructure that supports repeatable standards of quality, is […]

This is the second post in the Engineering @ Hortonworks blog series, which explores how we in Hortonworks Engineering build, test, and release new versions of our platforms. In this post, we take a deep dive into something we are extremely excited about: running a container cloud on YARN! We have been using this next-generation […]

One of the most exciting new features of HDP 2.6 from Hortonworks was the general availability of Apache Hive with LLAP. If you missed DataWorks Summit, you’ll want to look at some of the great LLAP experiences our users shared, including Geisinger, who found that Hive LLAP outperforms their traditional EDW for most of their […]

Our customers increasingly leverage Data Science and Machine Learning to solve complex predictive analytics problems. Examples include churn prediction, predictive maintenance, image classification, and entity matching. While everyone wants to predict the future, truly leveraging Data Science for Predictive Analytics remains the domain of a select few. To expand the […]

This is the introductory post in a blog series that explores how we in Hortonworks Engineering build, test and release new versions of our platforms. In this post, we introduce the basic themes and set context for deeper discussions in subsequent blogs. We at Hortonworks are very proud of the work we do. Along with […]

This blog has contributions from Mingliang Liu and Rajesh Balamohan. Late last year, we provided a brief history of Apache Hadoop support for Amazon S3. Our first focus was speeding up reads of S3-hosted data used as query input. That was followed by the write pipeline, as well as scaling and […]

This blog has contributions from Vinod Vavilapalli, Wangda Tan, Gour Saha, Priyanka Nagwekar, and Sunil Govindan. Have you ever wondered what makes a self-driving car intelligent enough to process live camera feeds, navigate busy streets, and distinguish objects on the street, such as cars, trucks, traffic lights, or pedestrians? A self-driving car is a perfect […]

Last week, in Part 3 of this blog series, we announced the GA of HDF 3.0 and let the cat out of the bag by introducing a new open source component called Streaming Analytics Manager (SAM). SAM is an exciting new technology that helps developers, operators, and business analysts build, deploy, manage, and monitor streaming analytics apps. SAM […]

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is. In part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In part 3 of the series, […]

For the product management leadership team at Hortonworks, nothing is more valuable than talking directly with customers and learning about their successes, challenges, and struggles implementing their big data and analytics use cases with HDP and HDF. These conversations provide more insight than any analyst report, white paper, or market study. In […]

R is one of the primary programming languages for data science, with more than 10,000 packages. R is open source software that is widely taught in colleges and universities as part of statistics and computer science curricula. R uses the data frame as its API, which makes data manipulation convenient. R has a powerful visualization infrastructure, […]

Large-scale Machine Learning: Machine Learning, the ability to learn without being explicitly programmed, has been around for a long time and is well understood. What is different is the relatively recent emergence of general-purpose tools, such as Apache Spark, that enable processing of very large datasets. Additionally, data scientists can now collaborate and rapidly […]

The 2014 Yahoo email hack is a good illustration of how a big data security analytics platform such as Apache Metron can make it easier to detect, investigate, assess, and remediate threats in your environment. In this article, I will describe how to set up and configure Apache Metron to detect a recent cyber attack on Yahoo, […]