Why Bigger is Better and Scale-bound Approaches Limit Business Value

Whitepaper

Why Bigger is Better: 10 Facts from Theory to Practice

The very first of the "Vs of Big Data" was Volume. Is it true that bigger data is better? It turns out the answer is yes, based on 10 mathematical and empirical facts, from which we can derive the following conclusions about how to approach big data:

  • Training on a subsample of the data gives up measurable predictive power, and thus
    business value.
  • Performing cross-validation on a subsample is incorrect.
  • When a dataset contains rare objects or values, which is not uncommon, subsampling can
    be disastrous.
  • Performing approximate training in order to handle large datasets may gain no
    benefit from the data size.
  • Training simple models on large datasets may gain no benefit from the data
    size.
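
As a rough illustration of the point about rare objects, the sketch below computes the probability that a uniform random subsample (drawn without replacement) contains none of a dataset's rare records. The function name and the dataset sizes are hypothetical, chosen only for illustration; they are not taken from the whitepaper.

```python
def prob_all_rare_missed(n_total: int, n_rare: int, n_sample: int) -> float:
    """Probability that a uniform subsample of size n_sample, drawn without
    replacement from n_total records, contains none of the n_rare rare records.

    Equals C(n_total - n_rare, n_sample) / C(n_total, n_sample), evaluated as
    a product to avoid building very large integers.
    """
    if not (0 <= n_rare <= n_total and 0 <= n_sample <= n_total):
        raise ValueError("need 0 <= n_rare <= n_total and 0 <= n_sample <= n_total")
    prob = 1.0
    for i in range(n_rare):
        remaining_outside = n_total - n_sample - i
        if remaining_outside <= 0:
            # More rare records than unsampled slots: at least one is sampled.
            return 0.0
        prob *= remaining_outside / (n_total - i)
    return prob


# Illustrative numbers only (assumed, not from the whitepaper):
# 1,000,000 records, 20 of them rare, and a 1% subsample.
p_missed = prob_all_rare_missed(1_000_000, 20, 10_000)
print(f"Chance the 1% subsample contains no rare record: {p_missed:.3f}")
```

A model trained on a subsample that contains no rare records has no examples from which to learn them, which is one way to see why subsampling can be so costly in this case.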

Instead of relying on these workarounds, one should train appropriately complex models with the maximum amount of data possible, as accurately as needed, leveraging recent computational advances.

About Skytree

Skytree – The Machine Learning Company® is disrupting the Advanced Analytics market with our enterprise-grade machine-learning platform that gives organizations the power to discover deep analytic insights, predict future trends, make recommendations and reveal untapped markets and customers. Skytree’s mission is to bring the power of state-of-the-art machine learning to everyone: data scientists, developers and non-experts alike.

The Skytree machine-learning platform is built for speed and scalability, allowing users to build the most accurate machine-learning models, faster. Skytree machine-learning software delivers an end-to-end model building experience, from data preparation to model creation and deployment. Skytree automates the machine learning model-building process, saving you time. Machine-learning models using Skytree can be built using all of your data, both structured and unstructured, with no downsampling required.

© 2016 Skytree, Inc. All Rights Reserved. | 1731 Technology Drive, Suite 700, San Jose, CA 95110 | +1.408.392.9300