January 31, 2019

Data Mining in Action: Should You Outsource Data Science?

Author: Mark Samuels

Access to data science has been democratized. It’s now possible to use high-powered, cloud-based tools to create algorithms that can be used for specific tasks like data mining, language processing, and predictive modeling.

Application development relating to machine learning (ML) is becoming more accessible, too. The tool your team creates using an algorithm may become a prototype for a fully fledged application that becomes a product. Emerging technology, in short, is no longer a barrier to your business making the most of data science.

The growing use of open source products like Apache Spark means it’s easier than ever for individuals and businesses to build and use algorithms in applications. But the algorithm is simply the starting point. Any ML, artificial intelligence (AI), or deep learning product is only as good as the data it draws upon.

Using Data Safely and Responsibly

Data must be accessible, secure, and reliable to be useful to your business. While data mining technology is becoming more widely available, gaining access to the information you require is far from straightforward. In fact, the biggest barrier to entry when it comes to data science is often information acquisition.

Think of how complicated it can be to draw on information in sensitive areas like medical and government databases: the data may be private or require waiting for permissions, and thus arrive too late to be of use to your business. Then think of the challenge of data mining in a specific business domain. If you’re going to run algorithms against the information your business holds, who will provide that data? How will you use that data, and how do you know it’s reliable?

Your business must ensure the information it uses is compliant with data laws. It’s in your best interest to understand the implications of legislation like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Storing data without following these regulations could leave your business at risk of legal exposure.

Rather than taking this risk, your organization must keep track of the various ways its information is collected, stored, and exploited. Information collected via data mining should be safeguarded under the same rigorous rules that apply to any other digital asset.

Using External Partners to Build Your Capability

External expertise is critical when it comes to managing data collection and use. Look for a partner who understands data lineage: its origin, its destination, and how and where it moves over time. Your partner should help your business track how information is used throughout the data mining process—from ingestion to transformation and modification.
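To make the idea of lineage tracking concrete, here is a minimal sketch in Python of an append-only log that records a dataset's origin and each step it passes through. The class names, dataset name, and job names are all hypothetical and chosen for illustration; a production platform would typically rely on a dedicated governance tool rather than hand-rolled code like this.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class LineageEvent:
    """One step in a dataset's history (illustrative structure)."""
    stage: str   # e.g. "ingestion", "transformation", "modification"
    actor: str   # who or what performed the step (hypothetical names)
    detail: str  # free-text description of what changed
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class TrackedDataset:
    """A dataset name, its origin, and an append-only list of events."""
    name: str
    origin: str
    events: List[LineageEvent] = field(default_factory=list)

    def record(self, stage: str, actor: str, detail: str) -> None:
        # Events are only ever appended, so the history stays auditable.
        self.events.append(LineageEvent(stage, actor, detail))

    def history(self) -> List[str]:
        return [f"{e.stage}: {e.detail} (by {e.actor})" for e in self.events]


# Hypothetical example: follow one dataset from ingestion onward.
dataset = TrackedDataset(name="customer_orders", origin="crm_export")
dataset.record("ingestion", "loader_job", "loaded raw export")
dataset.record("transformation", "cleaning_job", "dropped rows with missing IDs")
dataset.record("modification", "analyst", "added region column")

for line in dataset.history():
    print(line)
```

Even a simple log like this answers the questions raised above: where the data came from, who touched it, and what changed at each stage.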

In an age where algorithms are easily found and repurposed, your business must ensure that it knows who created a data model and how that model morphs over time. Having access to external expertise and end-to-end tracking will help ensure your business knows its information comes from a trusted source and is safe to use.

Visibility across the whole data platform is key. Being able to see data at all stages helps your organization take note of ethical considerations like bias and consent that can undermine data reliability. Make sure your data science team receives feedback on the information it uses and that every person understands the importance of reducing risk.

Most businesses will not have a dedicated data science team from the outset, however. Experts in this young field are tough to find. Estimates suggest that in the U.S. alone the number of roles for all data professionals will reach 2.7 million by 2020, with the average role remaining open for 45 days, one workweek longer than the market average.

Making the Most of Professional Services Capability

The search for data professionals is even more challenging when it comes to domain expertise. A talented data scientist who also knows how to use algorithms in your business setting represents a key competitive advantage.

Your business will likely have to search externally for domain experts. Outsourcing will give you the opportunity to start your data science projects safely and effectively, and using external professional services expertise will help your company develop its own data science capabilities. Professional services experts work across a range of sectors and can use their deep domain knowledge to guide your firm in the right direction.

Once you’ve found your footing in data science, your business can start to develop an internal data science team. That team will help ensure that your organization uses data of the highest possible quality, and that it uses it to build trustworthy algorithms and products.

Taking Advantage of Experience

Professional services will allow your business to gain crucial access to the rarest ingredient for data science success: deep sector knowledge. With the awareness of what your competitors are studying and producing (and when), your business can create, test, and hone algorithm-based products that will give you a competitive advantage. If your business doesn’t start making moves in this direction, it risks being left behind—so start looking into data science capabilities now.

To learn more about the keys to successful data science initiatives, download this white paper.
