
Creating Factor Sets and Dimension Sets

These are the things to consider when creating Factor Sets and Dimension Sets in Models.

Factor Sets
A Factor Set is a set of user-specified factors that will be included in a model. Factor Sets were designed on the assumption that many factors, such as jockey and trainer ratings, apply to all races. However, in some types of races one factor may matter more than others. For example, in maiden races or races restricted to two-year-olds, the number of prior races may be more important. Also, several factors are differentiated by fast or off track, so Factor Sets can be created specifically for use on off tracks. Keep in mind that some horses may never have run on an off track, so results are not always complete due to lack of data.

Correlations – Factor usage in the models is weighted by each factor's correlation to the actual results of the races used to compute the model. The higher the correlation, the higher the weighting of the factor. In the detailed horse comparison report, each factor is listed along with its correlation, which gives an idea of how important each factor is in the model. This report can be run to include all possible factors or just the factors used in the model. This allows the model to be refined, if desired, to use the factors with the highest correlations.
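As a rough illustration of correlation-based weighting, the sketch below computes a Pearson correlation between each factor and a race outcome, then scales the absolute correlations so the weights sum to 1. The factor names, the sample numbers, the outcome score, and the normalization rule are all assumptions made for this example; the exact formula the models use is not described here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def correlation_weights(factors, outcomes):
    """Weight each factor by its share of the total absolute correlation.

    Assumption for illustration only: weights are proportional to |r|.
    """
    corrs = {name: pearson(values, outcomes) for name, values in factors.items()}
    total = sum(abs(c) for c in corrs.values())
    if total == 0:
        return {name: 0.0 for name in corrs}
    return {name: abs(c) / total for name, c in corrs.items()}


# Hypothetical data: one value per horse, outcome score where higher is better.
sample_factors = {
    "jockey_rating": [80, 70, 60, 50, 40],
    "trainer_rating": [55, 60, 50, 65, 45],
}
sample_outcomes = [5, 4, 3, 2, 1]

weights = correlation_weights(sample_factors, sample_outcomes)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

In this made-up sample the jockey rating tracks the outcome more closely than the trainer rating, so it receives the larger weight. Dropping low-weight factors from a Factor Set is the kind of refinement described above.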

Although Factor Sets are not specifically designed to be track- or region-specific, there are significant differences across regions for some factors, such as Class. Class definitions are consistent across all North American tracks, so an Allowance race has the same Class rating at every track. But are they all really the same? For example, do Allowance races at Lone Star Park have the same caliber of horses as Allowance races at Belmont or Aqueduct? For horses racing within the same circuit, the answer is generally yes. However, when crossing circuits or geographic regions, this may not be the case.

Specifically, running a Race Type Comparison Report in the Track Analytics section of the website between Belmont and Lone Star Park, for all races run in the past 12 months, shows that the average purse for Allowance races is $48,924.93 at Belmont but about $24,830.30 at Lone Star. This is a significant difference, and it is likely that the caliber of horses is not the same. Purse may be a better factor to use than Class.
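To put the size of that gap in perspective, the short calculation below compares the two average purses quoted above. The variable names are only for illustration; the two figures are the ones from the report.

```python
# Average Allowance purses from the Race Type Comparison Report (past 12 months).
belmont_avg_purse = 48924.93
lone_star_avg_purse = 24830.30

# How many times larger the Belmont purse is, and the gap in dollars.
purse_ratio = belmont_avg_purse / lone_star_avg_purse
purse_gap = belmont_avg_purse - lone_star_avg_purse

print(f"Belmont / Lone Star purse ratio: {purse_ratio:.2f}")
print(f"Difference in average purse: ${purse_gap:,.2f}")
```

The ratio comes out at roughly 1.97, so the average Allowance purse at Belmont is nearly double the one at Lone Star even though both races carry the same Class rating.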

Dimension Sets
Dimensions specify which types of races will be used to calculate the correlations of the factors in the model, so a model can only be applied to races that match on all dimensions. The dimensions are:

  • Track/Geographic Region
  • Surface
  • Track condition
  • Class
  • Distance
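The "match on all dimensions" rule can be sketched as a simple filter. Everything in this example is hypothetical: the `Race` and `DimensionSet` names, the field names, the track codes, and the use of claiming price as a stand-in for Class are assumptions chosen for illustration, not the site's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Race:
    track: str
    surface: str
    condition: str
    claiming_price: int       # stand-in for Class in this sketch
    distance_furlongs: float


@dataclass(frozen=True)
class DimensionSet:
    tracks: frozenset
    surfaces: frozenset
    conditions: frozenset
    class_range: tuple        # (low, high), inclusive
    distance_range: tuple     # (low, high), inclusive

    def matches(self, race):
        """A race qualifies only if it matches on every dimension."""
        class_low, class_high = self.class_range
        dist_low, dist_high = self.distance_range
        return (
            race.track in self.tracks
            and race.surface in self.surfaces
            and race.condition in self.conditions
            and class_low <= race.claiming_price <= class_high
            and dist_low <= race.distance_furlongs <= dist_high
        )


# Hypothetical Dimension Set: dirt sprints on a fast track at two tracks.
dirt_sprints = DimensionSet(
    tracks=frozenset({"BEL", "AQU"}),
    surfaces=frozenset({"dirt"}),
    conditions=frozenset({"fast"}),
    class_range=(10000, 25000),
    distance_range=(5.0, 7.0),
)

races = [
    Race("BEL", "dirt", "fast", 16000, 6.0),    # matches every dimension
    Race("BEL", "dirt", "sloppy", 16000, 6.0),  # fails on track condition
    Race("LS", "dirt", "fast", 16000, 6.0),     # fails on track
]
usable = [race for race in races if dirt_sprints.matches(race)]
print(len(usable))
```

A race that misses on even one dimension is excluded, which is why narrower ranges leave fewer races available for computing correlations.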

How broad or narrow each dimension's range should be depends on how many models you want to create and which tracks or regions you want to play. Generally, the narrower and more specific the dimension range, the better the model. If you are interested in betting on only one or two tracks, it would be better to build models based only on races from those tracks and to go back further in history (since there are fewer races), say 12 months rather than 3. If you want to use a specific model on many tracks, broader ranges with only three months of history are probably better.
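The trade-off between the number of tracks and the length of history can be pictured as a filter on track and date. This is only a sketch under stated assumptions: the `history_window` function, the dictionary layout, and the track codes are hypothetical, and a month is approximated as 30 days.

```python
from datetime import date, timedelta


def history_window(races, tracks, months, today):
    """Keep races at the given tracks that fall within the lookback window.

    Assumption: one month is treated as 30 days for simplicity.
    """
    cutoff = today - timedelta(days=30 * months)
    return [
        race for race in races
        if race["track"] in tracks and cutoff <= race["date"] <= today
    ]


# Hypothetical race records.
all_races = [
    {"track": "BEL", "date": date(2024, 5, 20)},
    {"track": "BEL", "date": date(2023, 9, 10)},
    {"track": "LS", "date": date(2024, 5, 25)},
    {"track": "LS", "date": date(2023, 1, 15)},
]
today = date(2024, 6, 1)

# Narrow model: one track, 12 months of history.
narrow = history_window(all_races, {"BEL"}, 12, today)

# Broad model: several tracks, only 3 months of history.
broad = history_window(all_races, {"BEL", "LS"}, 3, today)

print(len(narrow), len(broad))
```

With few tracks, the longer window recovers enough races to work with; with many tracks, a shorter window already supplies plenty.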

For the Class and Distance dimensions, the narrower the range the better. Keep in mind that some dimensions vary quite a bit across tracks. Average claiming prices, for example, vary considerably across states and racing circuits. So it may make sense to have very narrow ranges at the low end of the claiming price spectrum and broader ranges at the higher end.
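One way to picture ranges that are narrow at the low end and broad at the high end is a list of bin edges that get further apart as prices rise. The edges below are invented for illustration and are not recommended values; the `claiming_range` helper is likewise hypothetical.

```python
from bisect import bisect_right

# Hypothetical claiming price edges: tight spacing at the low end,
# wider spacing at the high end.
CLAIMING_BIN_EDGES = [0, 5000, 7500, 10000, 15000, 25000, 50000, 100000]


def claiming_range(price):
    """Return the (low, high) range containing a claiming price.

    The low edge is inclusive and the high edge is exclusive.
    Prices at or above the last edge fall in an open-ended range (high is None).
    """
    if price < CLAIMING_BIN_EDGES[0]:
        raise ValueError("claiming price cannot be negative")
    index = bisect_right(CLAIMING_BIN_EDGES, price) - 1
    low = CLAIMING_BIN_EDGES[index]
    if index + 1 < len(CLAIMING_BIN_EDGES):
        high = CLAIMING_BIN_EDGES[index + 1]
    else:
        high = None
    return (low, high)


for price in (6250, 40000, 150000):
    print(price, claiming_range(price))
```

Here a $6,250 claimer lands in a range only $2,500 wide, while a $40,000 claimer lands in a range $25,000 wide, reflecting the idea that small price differences matter more at the bottom of the scale.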

These are the things to consider when creating Factor Sets, Dimension Sets, and models.
