Machine learning doesn't have to be hard.

We've automated key parts of machine learning so that building models is simple and scalable.


How does the Nexosis Machine Learning API work?

While there are nuances to each machine learning type, there are three essential steps to every machine learning task in our platform.

1. Upload your data
2. Run a session
3. Get results

It looks like this…


1. Upload your data

By using the DataSet and Views API endpoints, you can upload and combine data from different sources to feed into our automated pipeline. The data you send needs to be tidy and useful. As with most things in engineering, garbage in, garbage out holds true. It's important that you understand your data and choose the appropriate features (the factors that influence what you want to predict) to get the best results.
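
For concreteness, here is a minimal sketch of what an upload might look like from Python. The base URL, dataset name, header name, and payload shape are assumptions for illustration, not the documented contract of the API:

    import requests  # third-party HTTP client, assumed to be installed

    API_KEY = "your-api-key"               # hypothetical placeholder
    BASE = "https://ml.nexosis.com/v1"     # assumed base URL

    # Tidy data: one observation per row, one feature per column.
    rows = [
        {"timestamp": "2017-01-01", "sales": 1250.0, "isPromo": 0},
        {"timestamp": "2017-01-02", "sales": 1375.5, "isPromo": 1},
    ]

    resp = requests.put(
        f"{BASE}/data/widget-sales",       # dataset name in the URL (assumed)
        json={"data": rows},
        headers={"api-key": API_KEY},
    )
    resp.raise_for_status()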


2. Run a session

Once your data is uploaded, the next step is to choose the machine learning type best suited to your problem.

Time-series forecast and impact sessions are appropriate for data that has a naturally ordered timestamp attached to each observation you are trying to forecast. Regression algorithms work well on problems where you need to predict a continuous variable from several other known variables.

No matter which you choose, the Nexosis API goes through a multi-stage elimination process, narrowing the appropriate algorithms down to a handful of candidates that are then evaluated to find the best fit for your data.
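
As a rough illustration, starting a forecast session is a single HTTP call that points at an uploaded dataset and a target column. The endpoint path, parameter names, and response field below are assumptions for the sketch:

    import requests

    API_KEY = "your-api-key"               # hypothetical placeholder
    BASE = "https://ml.nexosis.com/v1"     # assumed base URL

    # Ask for a daily forecast of 'sales' over a future window.
    resp = requests.post(
        f"{BASE}/sessions/forecast",       # assumed session endpoint
        params={
            "dataSetName": "widget-sales",
            "targetColumn": "sales",
            "startDate": "2017-02-01",
            "endDate": "2017-02-28",
        },
        headers={"api-key": API_KEY},
    )
    resp.raise_for_status()
    session_id = resp.json().get("sessionId")  # assumed response field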

Now that you've selected your ML type, our platform gets to work. Several tasks can kick off at this point, including data categorization, aggregation, and imputation.

Metadata

By annotating your data with metadata, you help our platform return better predictions. In cases where that is neglected or unknown, the platform will determine which type of data is present in each column.

  • Numeric – A number, either an integer or a float
  • Logical – Binary data (True/False or 0/1)
  • Numeric Measurement – A continuous variable with a non-zero baseline and a meaningful range, such as temperature, wind speed, or volume

Whether our platform correctly knows what type of data is in each column you've provided can drastically help or hurt your results.
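
For illustration, column annotations might look something like the following when attached to a dataset upload; the field names and type labels here are assumptions, not the documented schema:

    # Hypothetical column annotations sent alongside the data rows.
    # If these are omitted, the platform infers the type of each column.
    columns = {
        "timestamp": {"dataType": "date",           "role": "timestamp"},
        "sales":     {"dataType": "numeric",        "role": "target"},
        "isPromo":   {"dataType": "logical",        "role": "feature"},
        "tempF":     {"dataType": "numericMeasure", "role": "feature"},
    }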

Smoothing

Datasets often contain anomalies, and these must be identified and removed as outliers (unless, of course, you are solving an anomaly detection problem).
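
The platform handles this automatically, but the intuition is similar to a simple outlier filter you might write yourself. Here is a minimal sketch using the common 1.5×IQR rule (an assumption for illustration, not the platform's actual method):

    import pandas as pd

    s = pd.Series([10, 12, 11, 13, 12, 250, 11, 12])   # 250 is an obvious anomaly

    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    mask = s.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)    # keep values inside the fence

    smoothed = s[mask]   # the outlier is dropped; it could also be clipped or re-imputed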

Aggregation Strategies

Especially in time-series data, a developer or data scientist might want to forecast at a granularity different from that of the raw data. While forecasting at a finer granularity than the data itself is impossible, a coarser granularity can easily be achieved by aggregating. Depending on your data, one of four aggregation strategies can be utilized.

  • Sum – A classic use of SUM is a roll-up. Roll-ups are performed when your data rows are more granular than the forecast you want, for example when you have hourly data but only want daily forecasts. In that case, the aggregation strategy sums each column's values for each day.
  • Mean – This takes the average of the data in a particular column.
  • Median – Finds the middle value of the sorted data, regardless of how often each value occurs. (5, 5, 10, 55 would return 7.5, the average of the two middle values.)
  • Mode – The most common number would be used. (5, 5, 10, 55 would return 5.)

In each case, unless you specify a strategy (doing so helps), our platform analyzes your data and determines the aggregation strategy that makes the most sense.
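
To make the four strategies concrete, here is how rolling hourly data up to a daily grain might look in pandas, using the same numbers as the examples above (illustrative only; the platform selects and applies its own strategy):

    import pandas as pd

    hourly = pd.DataFrame(
        {"sales": [5, 5, 10, 55]},
        index=pd.date_range("2017-01-01", periods=4, freq="H"),
    )

    daily_sum    = hourly.resample("D").sum()       # 75
    daily_mean   = hourly.resample("D").mean()      # 18.75
    daily_median = hourly.resample("D").median()    # 7.5
    daily_mode   = hourly.resample("D").agg(lambda s: s.mode().iloc[0])  # 5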

Imputation Strategies

Sometimes submitted data has missing values, which can produce worthless predictions. To combat this and deliver the most accurate predictions, our platform fills in missing values with one of four imputation strategies.

  • Zeros – A zero is inserted wherever data is missing
  • Mean – The platform inserts the average value for that column of data
  • Median – Inserts the middle value of that column
  • Mode – Imputes the most common value in that column
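
The same four strategies are easy to picture in pandas (an illustrative sketch, not the platform's implementation):

    import pandas as pd

    s = pd.Series([5.0, None, 10.0, 55.0, 5.0])   # one missing value

    zeros  = s.fillna(0)                  # Zeros
    mean   = s.fillna(s.mean())           # Mean   -> fills with 18.75
    median = s.fillna(s.median())         # Median -> fills with 7.5
    mode   = s.fillna(s.mode().iloc[0])   # Mode   -> fills with 5.0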

One-hot Encoding

This strategy makes data that is neither numeric nor logical useful to the algorithm by transforming each such column into a series of logical (0/1) columns, one per distinct value.
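
For example, a categorical text column becomes a set of 0/1 columns. A minimal pandas sketch:

    import pandas as pd

    df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

    # Each distinct value becomes its own logical column.
    encoded = pd.get_dummies(df, columns=["color"])
    # -> color_blue, color_green, color_red, each holding 0/1 (False/True) flags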

Scaling

Scaling makes sure that no single predictor is overly favored by the algorithm simply because its values span a larger numeric range.
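
For intuition, a common approach is to standardize each column to zero mean and unit variance so that a feature measured in the thousands cannot dominate one measured in fractions. The sketch below shows that idea; it is not necessarily the scaling method the platform uses:

    import pandas as pd

    df = pd.DataFrame({
        "sales":     [1250.0, 1375.5, 990.0],   # large numeric range
        "promoRate": [0.10, 0.30, 0.20],        # small numeric range
    })

    # z-score scaling: (value - column mean) / column standard deviation
    scaled = (df - df.mean()) / df.std()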

Tuning

Tuning is the process of choosing the correct parameters for each algorithm. We use multiple techniques to do per-algorithm tuning, selecting the best parameters for each according to the characteristics of your data and the performance of the algorithm.
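
Conceptually, tuning is a search over each algorithm's hyperparameters, scored on held-out data. A minimal grid-search sketch with scikit-learn (the platform's actual search techniques are not shown here):

    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV

    X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
        cv=3,
    )
    search.fit(X, y)
    print(search.best_params_)   # the parameter combination that scored best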

Model Building

Using the data processed from all the steps above, we build many models to find the one that best fits your data. This is the "learning" part of machine learning.

Training, Testing, and Validation

We split your data into Train and Test sets and cross-validate the results to make sure we get a good fit with the data provided.
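
Taken together, model building and validation amount to fitting several candidate algorithms and keeping the one that cross-validates best. A simplified sketch of that idea (the candidate list, splits, and scoring here are assumptions for illustration):

    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

    candidates = {
        "linear": LinearRegression(),
        "forest": RandomForestRegressor(random_state=0),
    }

    # Score each candidate with k-fold cross-validation and keep the best fit.
    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    print(best, scores[best])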


3. Get results

Finally, our platform packages up the model, along with all of the preprocessing steps above, and deploys it to an API endpoint just for you, so that when the model is called it quickly generates results tuned from, and for, your dataset.
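
Retrieving results is one more HTTP call against the completed session. As before, the endpoint path and response fields below are assumptions for the sketch:

    import requests

    API_KEY = "your-api-key"                 # hypothetical placeholder
    BASE = "https://ml.nexosis.com/v1"       # assumed base URL
    session_id = "your-session-id"           # returned when the session was started

    resp = requests.get(
        f"{BASE}/sessions/{session_id}/results",   # assumed results path
        headers={"api-key": API_KEY},
    )
    resp.raise_for_status()
    for row in resp.json().get("data", []):        # assumed response shape
        print(row)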


Let's build something smart, together.

Sign up now, or schedule a demo.