Systems - Howard: Measuring system quality

Tags: howard system backtest

2012.03.26. 15:59

Measuring System Quality

This article is in response to several emails and a comment to my article Distributions. This comment summarizes the question nicely:

Great post on distributions. Can you comment on using Van Tharp’s System Quality Number as an indicator of tradability? It uses only mean and standard deviation.

As you read my book, Modeling Trading System Performance, and my Distributions article, you will understand why I recommend using the distribution of profit and distribution of drawdown to measure system quality. The drawback of performing those analyses is that they require time and effort beyond a quick look at the performance results. Are there some quick and dirty metrics that can be easily computed from the system performance reports?

The ever-present caution to use only out-of-sample data to estimate future performance

If you are estimating future performance, it is important to use out-of-sample data. Using in-sample data for the analysis will over-estimate profit potential and under-estimate risk. How much the use of in-sample results distorts the true results depends on whether the system does identify profitable trading opportunities or has simply fit the noise.

Why the Distribution is Important

To do a complete analysis, you cannot know the profit potential without knowing the position sizing. You cannot know the position sizing without knowing the risk. You cannot know the risk without analyzing the distribution of drawdown where each trade was made using a basic unit.

t-statistic

The t-statistic measures (among other things) the difference between a sample and its population, or between two samples. As we use it here, the t-statistic gauges how confident we can be that the mean of a sample of data is different from zero. Values are computed, then compared with well-known critical values published in tables. The t-statistic rests on some assumptions, such as that the samples follow the Normal distribution, but these do not restrict most of the practical situations we will analyze.

You can compute the t-statistic as:

$t = \frac{\mu}{\sigma}\sqrt{n}$

$\mu$, mu, is the mean; $\sigma$, sigma, is the standard deviation; and n is the number of data points.
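A minimal Python sketch of the formula above. The function name `t_statistic` and the sample trade values are illustrative only, and the use of the sample standard deviation (n - 1 divisor) is an assumption, since the article does not say which divisor is intended.

```python
import math
import statistics

def t_statistic(values):
    """Return mean / standard deviation * sqrt(n) for a list of per-trade results."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 data points")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 divisor (assumption)
    if sd == 0:
        raise ValueError("standard deviation is zero; t is undefined")
    return mean / sd * math.sqrt(n)

# Hypothetical per-trade gains as fractions (0.025 means a 2.5% gain)
trades = [0.025, -0.018, 0.040, 0.012, -0.030, 0.055, 0.008, -0.010]
print(round(t_statistic(trades), 2))
```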

Mean

There are two means related to a trading system. The calculation of the mean for a sample of n trades is straightforward.

The arithmetic mean (or average) is the traditional average. It is computed as the sum of the elements divided by n, the number of elements.

The geometric mean has particular importance when analyzing trading systems. In general, the geometric mean is computed as the nth root of the product of the elements. Specifically for trading systems, each data point is the percentage gained or lost by that trade. To compute the geometric mean, all data values must be positive. So express each percentage gain or loss, g, as 1 + g — a 2.5% gain would be 1.025, a 1.8% loss would be 0.982.
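The conversion to 1 + g and the nth root can be sketched in Python as follows. The function name `geometric_mean_gain` is hypothetical; the two sample values are the 2.5% gain and 1.8% loss mentioned above. Averaging logarithms is just one way to take the nth root of a product and is a choice made here, not something the article prescribes.

```python
import math

def geometric_mean_gain(gains):
    """Geometric mean of per-trade fractional gains, returned as a fraction."""
    if not gains:
        raise ValueError("need at least one trade")
    multipliers = [1.0 + g for g in gains]  # 2.5% gain -> 1.025, 1.8% loss -> 0.982
    if any(m <= 0 for m in multipliers):
        raise ValueError("a loss of 100% or more makes the geometric mean undefined")
    # Mean of logs equals the log of the nth root of the product
    mean_log = sum(math.log(m) for m in multipliers) / len(multipliers)
    return math.exp(mean_log) - 1.0

gains = [0.025, -0.018]
print(geometric_mean_gain(gains))
```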

Standard Deviation

Standard deviation is a measure of the variability of the data. Standard deviation is the square root of the variance.

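It is computed as the square root of the average squared deviation of each data element in a sample from the mean of that sample. For a sample of n elements, the sum of the squared deviations is conventionally divided by n - 1 before the square root is taken:

$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n - 1}}$

where $x_i$ is the i-th data element.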

Relating average, variability, and number of trades

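Read Ralph Vince, Portfolio Management Formulas, for an excellent discussion of the relationship between the arithmetic average, a, the geometric mean, g, and the standard deviation, sd. To summarize him, with a and g expressed as fractional gains per trade, the geometric mean is approximately:

$1 + g = \sqrt{(1 + a)^2 - sd^2}$

Variability lowers the geometric mean: for a given average, the larger the standard deviation, the smaller g.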

Knowing g is important because the growth of the trading account depends on it. Terminal wealth relative, TWR, is the balance of the trading account at the end of the computation period, as a multiple of the initial wealth. It is computed as:
$TWR = (1 + g)^n$
where g is the geometric mean of percentage gain per trade, and n is the number of trades in the period.

Two trading systems that have equal values of a, sd, and n (or, equivalently, g and n) will have the same TWR. But knowing a, g, sd, and n is not sufficient to describe risk, since the path from initial wealth to terminal wealth depends on the variability of the trades.
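The sketch below compares the exact geometric mean of a small, made-up trade list with Vince's estimate, written here as 1 + g ≈ sqrt((1 + a)^2 - sd^2) with a and g as fractional gains, and then computes TWR. The function names and the trade list are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
import statistics

def twr(gains):
    """Terminal wealth relative: product of (1 + gain) over all trades."""
    result = 1.0
    for g in gains:
        result *= 1.0 + g
    return result

def exact_geometric_gain(gains):
    """Exact geometric mean gain per trade: TWR ** (1/n) - 1."""
    return twr(gains) ** (1.0 / len(gains)) - 1.0

def estimated_geometric_gain(gains):
    """Estimate of g from the arithmetic mean and standard deviation."""
    a = 1.0 + statistics.mean(gains)   # arithmetic mean as a multiplier
    sd = statistics.pstdev(gains)      # population standard deviation (assumption)
    return math.sqrt(a * a - sd * sd) - 1.0

# Hypothetical per-trade fractional gains
gains = [0.10, -0.05, 0.03, 0.08, -0.04, 0.02]
print("exact g    :", exact_geometric_gain(gains))
print("estimated g:", estimated_geometric_gain(gains))
print("TWR        :", twr(gains))
```

The estimate is an approximation, so small differences from the exact value are expected; they grow as the trades become more variable.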

System Quality Number

In Dr. Tharp’s book, Definitive Guide to Position Sizing (DGPS), he defines a variation of the t-statistic, which he calls SQN, as:
$SQN = \frac{expectancy}{StDev(R)}\sqrt{n}$

His definition of expectancy differs significantly from the standard definition, and R is his notation for the ratio of the gain of a trade to the risk of that trade. The number of trades is n, and he places an arbitrary upper limit on its acceptable values.

Because of the modifications, SQN values cannot be treated as t-statistics, so they cannot be looked up in the tables. They can, however, be used in comparing alternatives. DGPS ignores these differences: it treats SQN as if it were a t-statistic, and so leads the reader to unrealistic expectations of trading system performance.

DGPS gives examples, all hypothetical, and all as trade lists, of system results that have SQN values ranging from negative up to 7, and talks about systems with SQN of 10 or higher. If you treat R as if it is the geometric mean, then the t-statistic correlates well with SQN. For SQN values from 1 to 3, the t-statistic computed from the same data is about 3% higher than SQN. For higher SQN, the t-statistic is 10% to 40% higher than SQN.

I recommend Dr. Tharp’s books, and I will leave it to the reader to learn more about his definitions of expectancy, R, and SQN from them. And to determine whether systems with high SQN values are attainable.

Expectancy

The traditional, and mathematically correct, definition of expectancy as applied to a trading system is the average gain for a sample of trades. Measurement can be either a dollar amount or a percentage; and either the arithmetic average or the geometric average. To be most easily used when calculating TWR, expectancy will be the geometric mean of percentage gain per trade.

It is well known that a trading system can be profitable in the long run only if it has a positive expectancy. No money management or position sizing method can turn a system that has a negative expectancy into a profitable system. See Ralph Vince’s books or my Modeling Trading System Performance for a detailed discussion.

t-statistic of Expectancy

The t-statistic can be computed for almost any metric. For example, gain per trade, length of time per trade, maximum adverse excursion, etc. For this article, the t-statistic is computed for geometric mean of percentage gain per trade — expectancy.

As the t-statistic gauges our confidence that the mean of a sample of data is different from zero, the t-statistic of expectancy gauges our confidence that expectancy is different from zero. Our particular interest is in gaining confidence that it is greater than zero. The higher the t-statistic, the higher our confidence. A look at a table of critical values for the t-statistic for a one-tailed test shows that, for a sample of 30 trades, we can conclude with 95% confidence (that is, at the 0.05 level) that the mean is greater than zero if the t-statistic is greater than 1.70. The table is organized to show critical values based on levels of confidence and on sample size. It is up to the trader to decide what level of confidence is necessary before placing trades. And recall the earlier warning to base the decision on the best estimate of future results, not on in-sample results.

The minimum number of trades for which the computation of a t-statistic is defined is 2. The smaller the sample size, the higher the value of the t-statistic must be to achieve a given level of confidence. Small samples are more likely to deviate from the Normal distribution. The further the distribution of the sample is from Normal, the less reliable the result. The larger the sample size, the easier it is to estimate the true mean of the population and the less deviation from Normal matters. There is no magic in the number 30, but a sample with at least 30 trades usually gives reasonable results.

In general, any sample with a t-statistic greater than about 2.0 is probably going to be profitable in the long run. The next few sections run some tests for sets of trade results of varying t-statistics. Keep in mind the cautions I made in my Distributions article. The t-statistic uses only two metrics – mean and standard deviation. Samples with the same t-statistic can have very different trading performance – particularly drawdown.

Keep firmly in mind that knowing the mean and standard deviation tells us very little about the other moments or the more general distribution of trades. In particular, the number and size of losing trades is critically important, particularly as position size is increased.

Performance Related to t-statistic

It is well known that the distribution of price changes does not follow the Normal distribution — the tails of the stock price change data are much heavier. Heavier tails mean more large changes, and more and deeper losing trades than if the data were Normal. Study of the historical changes in stock prices is a topic for another article. Nevertheless, it is convenient to use the Normal distribution to examine the relative performance of several trading systems that differ only by their t-statistic.

t-statistic of 1.5 – Normally distributed

Using a list of 64 trades, the ratio of mean to standard deviation must be 0.1875 for the t-statistic to be 1.5. We get that from this relationship:
$ratio = \frac{\mu}{\sigma}$
$1.50 = \frac{\mu}{\sigma}\sqrt{64}$

ratio = 1.5 / 8 = 0.1875
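The same rearrangement, ratio = t / sqrt(n), can be applied to every t-statistic examined in the sections that follow. This is a small sketch; the function names are hypothetical, and the 2% mean is the value assumed in the text.

```python
import math

def mean_to_sd_ratio(t, n):
    """Ratio of mean to standard deviation needed for a given t and n trades."""
    return t / math.sqrt(n)

def required_sd(mean, t, n):
    """Standard deviation that produces t-statistic t for the given mean and n."""
    return mean / mean_to_sd_ratio(t, n)

for t in (1.5, 2.0, 3.0, 8.0):
    ratio = mean_to_sd_ratio(t, 64)
    sd = required_sd(0.02, t, 64)
    print(f"t = {t}: ratio = {ratio:.4f}, sd for a 2% mean = {sd:.2%}")
```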

Assuming that the geometric mean of the 64 trades is 2%, the standard deviation must be 10.67% for the ratio to be 0.1875. Several 64-trade sequences were drawn from a Normal distribution with a mean of 2% and standard deviation of 10.67%. Assuming all funds were used to take each position, the equity curve is calculated. A straw broom chart of 10 randomly selected equity curves is shown in the following chart. The theoretical average terminal wealth, TWR, is 1.02 ^ 64, which is 3.55 (the red dotted line). The thick black line is the average of the 10 runs. Note that 2 of the 10 had TWR below 1.0, meaning the account was lower at the end of the 64 trades than at the beginning, 2 broke even, and 1 showed a slight gain.
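A rough sketch of how such a straw broom experiment might be reproduced. It is not the code behind the original chart: the function name, the random seed, and the floor applied to extreme draws are all assumptions, and the simulated values will differ from the runs described above.

```python
import random

def simulate_equity_curve(mean, sd, n_trades, rng):
    """Equity curve starting at 1.0, with all funds committed to every trade."""
    equity = [1.0]
    for _ in range(n_trades):
        gain = rng.gauss(mean, sd)
        # A Normal draw can in principle fall below -100%; floor it so that
        # equity stays positive (an assumption of this sketch)
        gain = max(gain, -0.99)
        equity.append(equity[-1] * (1.0 + gain))
    return equity

mean = 0.02
sd = mean / 0.1875          # about 10.67%, giving a t-statistic of 1.5 at n = 64
rng = random.Random(12345)  # arbitrary seed, for repeatability only
curves = [simulate_equity_curve(mean, sd, 64, rng) for _ in range(10)]
final = sorted(curve[-1] for curve in curves)

print("theoretical TWR:", round(1.02 ** 64, 2))
print("simulated TWRs :", [round(v, 2) for v in final])
```

Changing `sd` to `mean / 0.25` or `mean / 0.375` gives the corresponding experiments for t-statistics of 2.0 and 3.0.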

t-statistic of 2.0 – Normally distributed

Using a list of 64 trades, the ratio of mean to standard deviation must be 0.25 for the t-statistic to be 2.0. A straw broom chart for 10 randomly selected equity curves is shown on the following chart. Note that the theoretical equity curve is identical to the previous chart. With a smaller standard deviation, all 10 runs result in a gain after 64 trades – some small, others substantial. All are equally likely.

Standard deviation is typically 3 to 4 times the mean, implying a ratio of 0.33 to 0.25, so a t-statistic of 2.0 is achievable.

t-statistic of 3.0 – Normally distributed

As the variability decreases, the standard deviation decreases, the ratio of mean to standard deviation increases, and profit becomes less random. The ratio is 0.375. The following chart shows 10 equally likely equity curves for a sample of trades that produce a t-statistic of 3.0. The theoretical growth is the same as the two earlier charts. The distribution of account value after 64 trades is narrowing.

t-statistic of 8.0 – Normally distributed

Right, 8. This chart is included to illustrate how the variability narrows, and to humor anyone who thinks a system with a t-statistic of 8 is even remotely possible.

The t-statistic has the same characteristics as the z-score. The probability of having a sample with a t-statistic of 8 is the same as the probability of finding a data value 8 times the standard deviation above the mean – both so unlikely as to appear impossible for data following the Normal distribution. But notice how narrow the distribution of results has become.

Drawdowns are the Critical Factor

Traders stop trading systems when losses exceed their personal tolerance. We can get an indication of the effect of having fewer and shallower losing trades by making a few more simulation runs.

Similar Metrics Do Not Tell the Whole Story

Two hypothetical, but realistic, trade lists were created. Each represents the results of a trading system. Each has 380 trades.

The means are equal: 1.39% per trade.
The standard deviations are equal: 7.77%
The maximum losing trades are equal: -18.0%

Consequently, the t-statistic is the same for the two systems.
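As a check on the arithmetic, substituting the shared values into the t-statistic formula (mean and standard deviation in percent, n = 380) gives approximately:

$t = \frac{1.39}{7.77}\sqrt{380} \approx 0.1789 \times 19.49 \approx 3.49$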

One system produced fewer losing trades.

To establish the basic risk, 500 trades are selected at random and used to compute a running account balance or equity curve. To avoid the distortion that trade sequence, compounding, and leverage introduce, each trade is made with $10,000. This is the basic unit I regularly use when analyzing stock and ETF systems. Making 1000 of these runs allows creation of the distribution of terminal wealth, TWR, and maximum drawdown as a percentage of highest equity to date. The initial account balance is $100,000, which guarantees there is always enough cash to take all positions. The next chart shows the distribution of maximum drawdown.
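The fixed-unit simulation described above might be sketched as follows. The trade list is made up for illustration, the function names are hypothetical, and sampling with replacement is an assumption (the article says only that trades are selected at random).

```python
import random

def max_drawdown(equity):
    """Largest decline from the highest equity to date, as a fraction of that peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def fixed_unit_run(trade_gains, n_trades, rng, unit=10_000.0, start=100_000.0):
    """One equity curve in which every trade commits the same dollar amount."""
    equity = [start]
    for _ in range(n_trades):
        gain = rng.choice(trade_gains)  # sampling with replacement (assumption)
        equity.append(equity[-1] + unit * gain)
    return equity

def drawdown_distribution(trade_gains, n_runs=1000, n_trades=500, seed=1):
    """Sorted maximum drawdowns from n_runs simulated equity curves."""
    rng = random.Random(seed)
    return sorted(max_drawdown(fixed_unit_run(trade_gains, n_trades, rng))
                  for _ in range(n_runs))

if __name__ == "__main__":
    # Hypothetical per-trade fractional gains standing in for a real trade list
    trade_gains = [0.09, 0.06, 0.04, 0.03, 0.02, 0.01, -0.01, -0.02, -0.03, -0.05]
    dd = drawdown_distribution(trade_gains)
    dd95 = dd[int(0.95 * len(dd)) - 1]
    print(f"95th percentile maximum drawdown: {dd95:.1%}")
```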

Note that maximum drawdown is deeper across the entire distribution for the system that has more losing trades.

Assume a trader has confidence in this system, is willing to trade it, and that the system continues to be healthy and perform throughout the trading as it did in validation. (Consideration of system health is extremely important, but is the topic for a different article. See MTSP for details.) She wants to determine the position size that will maximize TWR while being 95% certain that the maximum drawdown, MaxDD, will not exceed 30%. In keeping with the terminology in MTSP, this is referred to as DD95 less than 30%.

The initial account balance is $100,000. If DD95 is greater than 30% when using $10,000 per trade, risk is already above the trader’s tolerance and no advanced position sizing is possible.

For each system:

Change the simulation from fixed $10,000 per trade to a fraction of the account balance per trade.
Make a series of runs, varying the fraction.
For each fraction, examine the distribution of MaxDD.
Determine the largest fraction where MaxDD at the 0.95 level of cumulative probability is less than 30%.
Create the distribution of TWR for that fraction.
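The steps above can be sketched in Python as a simple search over fractions. This is an illustration under stated assumptions rather than the procedure used for the charts: the names, the 0.01 step, and the trade list are hypothetical, trades are sampled with replacement, and the search stops at the first fraction that breaches the limit, which assumes DD95 rises as the fraction rises.

```python
import random

def max_drawdown(equity):
    """Largest decline from the highest equity to date, as a fraction of that peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def fractional_run(trade_gains, fraction, n_trades, rng, start=100_000.0):
    """One equity curve committing a fixed fraction of the balance to each trade."""
    equity = [start]
    for _ in range(n_trades):
        gain = rng.choice(trade_gains)  # sampling with replacement (assumption)
        equity.append(equity[-1] * (1.0 + fraction * gain))
    return equity

def dd95_and_twr(trade_gains, fraction, n_runs=1000, n_trades=500, seed=1):
    """Return (95th percentile MaxDD, sorted list of TWR) for one fraction."""
    rng = random.Random(seed)  # same seed for every fraction: same trade sequences
    drawdowns, twrs = [], []
    for _ in range(n_runs):
        equity = fractional_run(trade_gains, fraction, n_trades, rng)
        drawdowns.append(max_drawdown(equity))
        twrs.append(equity[-1] / equity[0])
    drawdowns.sort()
    return drawdowns[int(0.95 * n_runs) - 1], sorted(twrs)

def largest_safe_fraction(trade_gains, limit=0.30, step=0.01, **kwargs):
    """Largest fraction (in multiples of step) whose DD95 stays below limit."""
    best = None
    fraction = step
    while fraction <= 1.0 + 1e-9:
        dd95, twrs = dd95_and_twr(trade_gains, fraction, **kwargs)
        if dd95 >= limit:
            break
        best = (fraction, dd95, twrs)
        fraction = round(fraction + step, 10)
    return best

if __name__ == "__main__":
    # Hypothetical per-trade fractional gains standing in for a real trade list
    trade_gains = [0.09, 0.06, 0.04, 0.03, 0.02, 0.01, -0.01, -0.02, -0.03, -0.05]
    result = largest_safe_fraction(trade_gains, step=0.05, n_runs=200)
    if result is None:
        print("DD95 exceeds the limit even at the smallest fraction")
    else:
        fraction, dd95, twrs = result
        print(f"fraction {fraction:.2f}, DD95 {dd95:.1%}, "
              f"median TWR {twrs[len(twrs) // 2]:.2f}")
```

Reusing the same seed for every fraction means each fraction is tested on identical trade sequences, so differences in DD95 come from the fraction alone.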

The next chart shows the resulting distribution of TWR for the two systems.

Note that TWR is higher across the distribution for the system with fewer losing trades.

The fraction at which the 95th percentile MaxDD was 30% was:

0.29 for the system with fewer losing trades.
0.26 for the system with more losing trades.

Being able to commit a higher proportion of the account balance to each trade enabled the system with fewer losing trades to generate higher account growth with no more risk.

TWR can be converted into Compound Annual Rate of Return, CAR, if the time period covered is known. Assume these systems trade about once per week, so the 500 trades represent about 10 years. The CAR equivalent to each TWR is shown on the chart. CAR for the system with fewer losing trades is consistently higher across the distribution.
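The conversion is the standard compounding relationship, CAR = TWR^(1/years) - 1. A minimal sketch, with hypothetical function names and a 10-year period as assumed in the text:

```python
def car_from_twr(twr, years):
    """Compound annual rate of return implied by a terminal wealth relative."""
    if twr <= 0 or years <= 0:
        raise ValueError("twr and years must be positive")
    return twr ** (1.0 / years) - 1.0

def twr_from_car(car, years):
    """Terminal wealth relative produced by compounding car for the given years."""
    return (1.0 + car) ** years

# Round trip: a CAR of 21.8% held for 10 years, converted to TWR and back
ten_year_twr = twr_from_car(0.218, 10)
print(round(ten_year_twr, 2), round(car_from_twr(ten_year_twr, 10), 3))
```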

Interpretation of the TWR distribution, using the system with fewer losing trades, is:

Average CAR is 21.8%.
CAR will be greater than 11.4% in 19 of 20 500-trade sequences.
CAR will be less than 32.4% in 19 of 20 500-trade sequences.
With 90% confidence, CAR will be between 11.4% and 32.4%.
All of the sequences that are represented in the distribution are equally likely.

Semi-deviation

There is an alternative measure of variability that can be used in place of standard deviation — it is semi-deviation, sometimes known as downside deviation.

Since standard deviation is in the denominator of the calculation of the t-statistic, increases in it decrease the value of t. Any increase in variability of trade gains increases standard deviation. Consequently, winning trades, trades greater than the mean, are penalized as much as losing trades the same distance below. Using semi-deviation in place of standard deviation ignores gains and does not penalize them.

There are two ways to calculate semi-deviation:

Omit positive data elements.
Change positive values to zero.

If you omit positive elements, skip over them if you are processing element by element, or omit them if you have access to the entire sample. If the original number of data points was n, the number of data points used in calculation of semi-deviation will be less than n. Have your code check that at least 2 data points remain, to avoid divide-by-zero errors.

If you change positive values to zero, n will remain unchanged no matter what the mix of positive and negative values is. Make a copy of the data, substituting zero for all positive values, then compute the standard deviation of the transformed data. Or, if you are processing element by element, whenever a positive value is passed to your routine, substitute zero for it.

Use semi-deviation in place of standard deviation to compute a modified t-statistic. (Whether you are using standard deviation or semi-deviation to compute t, use the mean of all data points, both positive and negative.) You will not be able to use standard tables to estimate probabilities. But you will be able to use the relative values to rank alternative systems.
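Both calculations, and the modified t-statistic, can be sketched as below. The function names and the sample values are hypothetical. Following the description above, each version is the ordinary sample standard deviation of the reduced or transformed data, which may differ from the downside deviation formulas used by some implementations of the Sortino Ratio.

```python
import math
import statistics

def semi_deviation_omit(values):
    """Method 1: standard deviation of the data with positive values omitted."""
    kept = [v for v in values if v <= 0]
    if len(kept) < 2:
        raise ValueError("fewer than 2 non-positive values; semi-deviation undefined")
    return statistics.stdev(kept)

def semi_deviation_zeroed(values):
    """Method 2: standard deviation of the data with positive values set to zero."""
    if len(values) < 2:
        raise ValueError("need at least 2 data points")
    zeroed = [min(v, 0.0) for v in values]  # n is unchanged
    return statistics.stdev(zeroed)

def modified_t(values, semi_dev):
    """t-like score: mean of ALL values over a semi-deviation, times sqrt(n)."""
    if semi_dev == 0:
        raise ValueError("semi-deviation is zero; score is undefined")
    return statistics.mean(values) / semi_dev * math.sqrt(len(values))

# Hypothetical per-trade fractional gains
values = [0.05, -0.02, 0.03, -0.04, 0.01, -0.06]
print(modified_t(values, semi_deviation_omit(values)))
print(modified_t(values, semi_deviation_zeroed(values)))
```

The two methods generally give different numbers, so use the same one for every system being ranked.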

Use of semi-deviation is well accepted. For example, it is used in calculation of the Sortino Ratio, a metric similar to the Sharpe Ratio. Where the Sharpe Ratio uses standard deviation, the Sortino Ratio uses semi-deviation.

Conclusion

The important points are:

Knowing the mean, standard deviation, and maximum losing trade is helpful, but knowing the distribution is better.
Systems with the same t-statistic can, and will, have different trading characteristics.
It is important to limit losing trades.
Using the procedure described in this article, position size can be determined in accordance with the trader’s personal risk tolerance.


ejszakai bagoly
