Portfolio Management Theory
And Technical Analysis
Lecture Notes

Statistics Primer

By: Dr. Sam Vaknin


The Bill of Rights of the Investor

1. To earn a positive return (=yield) on his capital.

2. To insure his investments against risks (=to hedge).

3. To receive information identical to that of ALL other investors - complete, accurate and timely - and to form independent judgement based on this information.

4. To alternate between investments - or be compensated for diminished liquidity.

5. To study how to manage his portfolio of investments carefully and rationally.

6. To compete on equal terms for the allocation of resources.

7. To assume that the market is efficient and fair.


1. The difference between asset-owners, investors and speculators.

2. Income: general, free, current, projected (expectations), certain, uncertain.

3. CASE A (=pages 3 and 4)

4. The solutions to our FIRST DISCOVERY are called "The Opportunity Set"



6. The OPTIMAL SOLUTION (=maximum consumption in both years).

7. The limitations of the CURVES:

  1. More than one investment alternative;
  2. Future streams of income are not certain;
  3. No investment is riskless;
  4. Risk=uncertainty;



INVESTOR A has secured income of $20,000 p.a. for the next 2 years.

One investment alternative: a savings account yielding 3% p.a.

(in real terms = above inflation or inflation adjusted).

One borrowing alternative: unlimited money at 3% interest rate

(in real terms = above inflation or inflation adjusted).


Will spend $20,000 in year 1

and $20,000 in year 2

and save $ 0


Will save $20,000 in year 1 (=give up his liquidity)

and spend this money

plus 3% interest $600

plus $20,000 in year 2 (=$40,600)


Will spend $20,000 in year 1

plus borrow money against his income in year 2

He will be able to borrow from the banks a maximum of:

$19,417 (+3% = $20,000)


  1. That he will live long enough to pay back his debts.
  2. That his income of $20,000 in the second year is secure.
  3. That this is a stable, certain economy and, therefore, interest rates will remain at the same level.


Rests on the above three assumptions (Keynes' theorem about the long run).

$19,417 is the NPV of $20,000 due in one year, discounted at 3%.




{Money Saved in the First Year × (1 + the interest rate)}
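The two-year arithmetic of CASE A can be sketched in a few lines, using the figures quoted in the notes ($20,000 income per year, 3% real rate):

```python
income = 20_000.0
r = 0.03

# The saver: deposits year-1 income, spends everything in year 2.
year_2_total = income * (1 + r) + income      # $20,600 + $20,000
assert round(year_2_total, 2) == 40_600.00

# The borrower: spends in year 1 the present value of year-2 income.
max_loan = income / (1 + r)                   # the NPV of $20,000 due in a year
assert round(max_loan, 2) == 19_417.48
```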



1. The concept of scenarios (Delphi) and probabilities


{SHOW TABLE - p14}

3. The properties of the Mean Value:

4. Multiplying all the yields by a constant multiplies the Mean Value of the yields by the same constant.

5. The Mean of the combined yields on two types of assets = the Sum of the Means of each asset calculated separately

{SHOW TABLE - p16}

6. Bi-faceted securities: the example of a convertible bond.

{SHOW TABLE - p16}

7. VARIANCE and STANDARD DEVIATION as measures of the difference between mathematics and reality.

They are the measures of the frustration of our expectations.

{Calculation - p17}


8. We will prefer a security with the highest Mean Value and the lowest Standard Deviation.

9. The PRINCIPLE OF DIVERSIFICATION of the investment portfolio: The Variance of combined assets may be less than the variance of each asset separately.

{Calculation - p18}


  1. The yield provided by an investment in a portfolio of assets will be closer to the Mean Yield than that of an investment in a single asset.
  2. When the yields are independent - most yields will be concentrated around the Mean.
  3. When all yields react similarly - the portfolio's variance will equal the variance of its underlying assets.
  4. If the yields are dependent - the portfolio's variance will be equal to or less than the lowest variance of one of the underlying assets.

11. Calculating the Average Yield of an Investment Portfolio.

{Calculation - pp. 18 - 19}

12. Short-cutting the way to the Variance:

PORTFOLIO COVARIANCE - the influence of events on the yields of underlying assets.

{Calculation - p19}

13. Simplifying the Covariance - the Correlation Coefficient.

{Calculation - p19}

14. Calculating the Variance of multi-asset investment portfolios.

{Calculations - p19 - 20}







Diminishing avoidance of absolute risk
Invests more in risky assets as his capital grows
Derivative of avoidance of absolute risk < 0
Example utility function: natural logarithm (Ln) of capital


Constant avoidance of absolute risk
Doesn't change his investment in risky assets as capital grows
Derivative = 0
Example utility function: (-1) × e raised to the power of a constant multiplied by the capital


Increasing avoidance of absolute risk
Invests less in risky assets as his capital grows
Derivative > 0
Example utility function: (Capital) less (Constant) × (Capital squared)


Diminishing avoidance of relative risk
Percentage invested in risky assets grows with capital growth
Derivative < 0
Example utility function: (-1) × e raised to the power of 2 × the square root of the capital


Constant avoidance of relative risk
Percentage invested in risky assets unchanged as capital grows
Derivative = 0
Example utility function: natural logarithm (Ln) of capital


Increasing avoidance of relative risk
Percentage invested in risky assets decreases with capital growth
Derivative > 0
Example utility function: (Capital) less (Number) × (Capital squared)




1. The tests: lenient, quasi-rigorous, rigorous

2. The relationship between information and yield

3. Insiders and insider-trading

4. The Fair Play theorem

5. The Random Walk Theory

6. The Monte Carlo Fallacy

7. Structures - Infra and hyper

8. Market (price) predictions

  1. The Linear Model
  2. The Logarithmic Model
  3. The Filter Model
  4. The Relative Strength Model
  5. Technical Analysis

9. Case study: split and reverse split

10. Do's and Don'ts: a guide to rational behaviour


1. Efficient Market: The price of the share reflects all available information.

2. The Lenient Test: Are the previous prices of a share reflected in its present price?

3. The Quasi-Rigorous Test: Is all the publicly available information fully reflected in the current price of a share?

4. The Rigorous Test: Is all the (publicly and privately) available information fully reflected in the current price of a share?

5. A positive answer would prevent situations of excess yields.

6. The main question: how can an investor increase his yield (beyond the average market yield) in a market where all the information is reflected in the price?

7. The Lenient version: It takes time for information to be reflected in prices.

Excess yield could have been produced in this time - had it not been so short.

The time needed to extract new information from prices = The time needed for the information to be reflected.

The Lenient Test: Will acting after the price has changed provide excess yield?

8. The Quasi-Rigorous version: A new price (slightly deviating from equilibrium) is established by buyers and sellers when they learn the new information.

The QR Test: will acting immediately on news provide excess yield?

Answer: No. On average, the investor will buy at a price that has already converged to equilibrium.

9. The Rigorous version: Investors cannot establish the "paper" value of a firm following new information. Different investors will form different evaluations and will act in unpredictable ways. This is "The Market Mechanism". If a right evaluation were possible - everyone would try to sell or buy at the same time.

The Rigorous Test: Is it at all possible to derive excess yield from information? Is there anyone who received excess yields?

10. New technology for the dissemination of information, professional analysis and portfolio management and strict reporting requirements and law enforcement - support the Rigorous version.

11. The Lenient Version: Analysing past performance (=prices) is worthless.

The QR Version: Publicly available information is worthless.

The Rigorous version: No analysis or portfolio management is worth anything.

12. The Fair Play Theorem: Since an investor cannot predict the equilibrium, he cannot use information to evaluate the divergence of (estimated) future yields from the equilibrium. His future yields will always be consistent with the risk of the share.

13. Insider-Trading and Arbitrageurs.

14. Price predictive models assume:

(a) The yield is positive and (b) High yield is associated with high risk.

15. Assumption (a) is not consistent with the Lenient Version.

16. Random Walk Theory (RWT):

  1. Current share prices are not dependent on yesterday's or tomorrow's prices.
  2. Share prices are equally distributed over time.

17. The Monte Carlo Fallacy and the Stock Exchange (no connection between colour and number).

18. The Fair Play Theorem does not require an equal distribution of share prices over time and allows for the possibility of predicting future prices (e.g., a company deposits money in a bank intended to cover an increase in its annual dividends).

19. If RWT is right (prices cannot be predicted) - the Lenient Version is right (excess yields are impossible). But if the Lenient Version is right - it does not mean that RWT is necessarily so.

20. The Rorschach tendency to impose patterns (cycles, channels) on totally random graphic images.

The Elton - Gruber experiments with random numbers and newly - added random numbers.

No difference between graphs of random numbers - and graphs of share prices.

21. Internal contradiction between assumption of "efficient market" and the ability to predict share prices, or price trends.

22. The Linear Model

P = Price of share; C = Counter (lag); E(ΔP) = Expected difference (change) in price

ΔP = Previous change in price; R = Random number

P(a) - P(a-1) = ( E(ΔP) + ΔP / E(ΔP) ) × ( P(a-1-c) - P(a-2-c) + R )

Using a correlation coefficient.

23. The Logarithmic Model

log(CPn) / log(CPn-1) = Cumulative yield    CP = Closing Price

Sometimes, instead of CP, we use: ΔP / (div/P)    ΔP = Price change    div = dividend

24. These two models provide identical results - and they explain less than 2% of the change in share prices.

25. To eliminate the influence of very big or small numbers -

some analyse only the + and - signs of the price changes

Fama and Macbeth showed that such sign clusters are randomly distributed, with no statistical significance.

26. Others say that proximate share prices are not connected - but share prices are sinusoidally connected over time.

Research shows faint traces of seasonality.

27. Research shows that past and future prices of shares are connected with transaction costs. The higher the costs - the higher the (artificial) correlation (intended to, at least, cover the transaction costs).

28. The Filter (Technical Analysis) Model

Sophisticated investors will always push prices to the point of equilibrium.

Shares will oscillate within boundaries. If they break them, they are on the way to a new equilibrium. It is a question of timing.

29. Is it better to use the Filter Model or to hold onto a share or onto cash?

Research shows: in market slumps, continuous holders were worse off than Filter users and were identical with random players.

This was proved by using a mirror filter.

30. The Filter Model provides an excess yield identical to transaction costs.

Fama - Blum: the best filter was 0.5%; for the purchase side, 1%-1.5%.

Higher filters were better than constant holding (the "Buy and Hold" strategy) only in countries with higher costs and taxes.

31. Relative Strength Model

( CP ) / ( AP ) = RS    CP = Current price    AP = Average price in the X previous weeks

  1. Divide the investment equally among the highest-RS shares.
  2. Sell a share whose RS fell below the RS of X% of all shares.

Best performance is obtained when "highest RS" means the top 5% and X% = 70%.
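A minimal sketch of the RS calculation and ranking; the share names, prices and 5-week window below are hypothetical:

```python
# RS = current price / average price over the previous X weeks.
def relative_strength(prices, weeks):
    window = prices[-weeks:]
    return prices[-1] / (sum(window) / len(window))

price_histories = {                 # hypothetical weekly closing prices
    "AAA": [10, 11, 12, 13, 14],    # rising
    "BBB": [20, 19, 18, 17, 16],    # falling
    "CCC": [30, 30, 31, 31, 32],    # nearly flat
}
rs = {name: relative_strength(p, weeks=5) for name, p in price_histories.items()}
ranked = sorted(rs, key=rs.get, reverse=True)
assert ranked[0] == "AAA"           # the rising share has the highest RS
```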

32. RS models instruct us to invest in upwardly volatile stocks - high risk.

33. Research: RS-selected shares (=the sample) exhibit yields identical to those of the group of stocks from which they were selected.

When risk adjusted - the sample's performance was inferior (higher risk).

34. Short term movements are more predictable.

Example: the chances for a reverse move are 2-3 times bigger than the chances for an identical one.

35. Branch: in countries with a capital gains tax - people will sell losing shares to realize losses, and those shares will become underpriced.

They will correct at the beginning of the year, but the excess yield will only cover transaction costs (the January Effect).

36. The market reacts identically (=efficiently) to all forms of information.

37. Why does a technical operation (split / reverse split) influence the price of the share (supposed to reflect underlying value of company)?

Split - a symptom of changes in the company. Shares go up before a split is even conceived - so splits are reserved for good shares (the dividend is increased). There is excess yield until the split - but it is averaged out after it.

38. There is a considerable gap (up to 2 months) between the announcement and the split. Research shows that no excess yield can be obtained in this period.

39. The same holds for M&A.

40. The QR Version: excess yields could be made on private information.

Research: the influence of the Wall Street Journal versus the influence of market analyses distributed to a select public.

The WSJ influenced the price of the stocks - but only on that day.

41. The Rigorous Version: excess yields cannot be made on insider information.

How to test this - if we do not know the information? Study the behaviour of those who have (management, big players).

Research shows that they do achieve excess yields.

42. Do's and Don'ts

  1. Select your investments on economic grounds.
    Public knowledge is no advantage.
  2. Buy stock with a discrepancy between the situation of the firm - and the expectations and appraisal of the public (Contrarian approach vs. Consensus approach).
  3. Buy stocks in companies with potential for surprises.
  4. Take advantage of volatility before a new equilibrium is reached.
  5. Listen to rumours and tips - but check for yourself.

Profitability and Share Prices

  1. The concept of the business firm - ownership, capital and labour.
  2. Profit - the change in an asset's value (different forms of change).
  3. Financial statements: Balance Sheet, PNL, Cash Flow, Consolidated - a review.
  4. The external influences on the financial statements - the cases of inflation, exchange rates, amortization / depreciation and financing expenses.
  5. The correlation between share price performance and profitability of the firms.
  6. Market indicators: P/E, P/BV (Book Value).
  7. Predicting future profitability and growth.


  1. The various types of bonds: bearer and named;
  2. The various types of bonds: straight and convertible;
  3. The various types of bonds (according to the identity of the issuer);
  4. The structure of a bond: principal (face), coupon;
  5. Stripping and discounting bonds;
  6. (Net) Present Value;
  7. Interest coupons, yields and the pricing of bonds;
  8. The Point Interest Rate and methods for its calculation (discrete and continuous);
  9. Calculating yields: current and to maturity;
  10. Summing up: interest, yield and time;
  11. Corporate bonds;
  12. Taxation and bond pricing;
  13. Options included in the bonds.

The Financial Statements

1. The Income Statement

revenues, expenses, net earnings (profits)

2. Expenses

Operating expenses (including depreciation):

Cost of goods sold

General and administrative (G & A) expenses

Interest expenses


3. Operating revenues - Operating costs = Operating income

4. Operating income + Extraordinary, nonrecurring items =

= Earnings Before Interest and Taxes (EBIT)

5. EBIT - Net interest costs = Taxable income

6. Taxable income - Taxes = Net income (Bottom line)

7. The Balance Sheet

Assets = Liabilities + Net worth (Stockholders' equity)

8. Current assets = Cash + Deposits + Accounts receivable + Inventory

Current assets + Long term assets = Total Assets


Current (short term) liabilities = Accounts payable + Accrued taxes + Debts

Current liabilities + Long term debt and other liabilities = Total liabilities

9. Total assets - Total liabilities = Book value

10. Stockholders' equity = Par value of stock + Capital surplus + Retained surplus

11. Statement of cash flows (operations, investing, financing)

12. Accounting vs. Economic earnings (influenced by inventories, depreciation, seasonality and business cycles, inflation, extraordinary items)

13. Abnormal stock returns are obtained where actual earnings deviate from projected earnings (SUE - Standardized unexpected earnings).

14. The job of the security analyst: to study past data, eliminate "noise" and form expectations about future dividends and earnings, which determine the intrinsic value (and the future price) of a stock.

15. Return on equity (ROE) = Net Profits / Equity

16. Return on assets (ROA) = EBIT / Assets

17. ROE = (1-Tax rate) [ROA + (ROA - Interest rate) × Debt / Equity]

18. Increased debt will positively contribute to a firm's ROE if its ROA exceeds the interest rate on the debt (Example)
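The example in item 18 can be worked numerically; the figures below are hypothetical:

```python
# ROE = (1 - tax rate) x [ROA + (ROA - interest rate) x Debt/Equity]
def roe(roa, interest_rate, debt_to_equity, tax_rate=0.35):
    return (1 - tax_rate) * (roa + (roa - interest_rate) * debt_to_equity)

# ROA of 10% against 6% debt: leverage lifts ROE...
assert roe(0.10, 0.06, 1.0) > roe(0.10, 0.06, 0.0)

# ...but an ROA of 4%, below the 6% rate, with the same leverage depresses ROE.
assert roe(0.04, 0.06, 1.0) < roe(0.04, 0.06, 0.0)
```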

19. Debt makes a company more sensitive to business cycles and the company carries a higher financial risk.

20. The Du Pont system

ROE = Net Profit/Pretax Profit (1) × Pretax Profit/EBIT (2) × EBIT/Sales (3) × Sales/Assets (4) × Assets/Equity (5)

21. Factor 3 (Operating profit margin or return on sales) is ROS

22. Factor 4 (Asset turnover) is ATO

23. Factor 3 × Factor 4 = ROA

24. Factor 1 is the Tax burden ratio

25. Factor 2 is the Interest burden ratio

26. Factor 5 is the Leverage ratio

27. Factor 6 = Factor 2 × Factor 5 is the Compound leverage factor

28. ROE = Tax burden × ROA × Compound leverage factor
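The Du Pont decomposition of items 20-28 can be verified with hypothetical statement figures:

```python
# Hypothetical income-statement and balance-sheet figures.
net_profit, pretax_profit, ebit = 65.0, 100.0, 120.0
sales, assets, equity = 400.0, 800.0, 300.0

tax_burden      = net_profit / pretax_profit   # factor 1
interest_burden = pretax_profit / ebit         # factor 2
ros             = ebit / sales                 # factor 3 (return on sales)
ato             = sales / assets               # factor 4 (asset turnover)
leverage        = assets / equity              # factor 5

roe = tax_burden * interest_burden * ros * ato * leverage
assert abs(roe - net_profit / equity) < 1e-12  # the five factors multiply back to ROE
assert abs(ros * ato - ebit / assets) < 1e-12  # factor 3 x factor 4 = ROA
```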

29. Compare ROS and ATO Only within the same industry!

30. Fixed asset turnover = Sales / Fixed assets

31. Inventory turnover ratio = Cost of goods sold / Inventory

32. Average collection period (Days receivables) = Accounts receivables / Sales × 365

33. Current ratio = Current assets / Current liabilities

34. Quick ratio (the Acid test ratio) = (Cash + Receivables) / Current liabilities

35. Interest coverage ratio (Times interest earned) = EBIT / Interest expense

36. P / B ratio = Market price / Book value

37. Book value is not necessarily Liquidation value

38. P / E ratio = Market price / Net earnings per share (EPS)

39. The P/E ratio is not the P/E Multiple (which emerges from DDM - Discounted Dividend Models)

40. Current earnings may differ from Future earnings

41. ROE = E / B = (P/B) / (P/E)

42. Earnings yield = E / P = ROE / (P/B)

43. GAAP - Generally Accepted Accounting Principles - allow different representations of leases, inflation, pension costs, inventories and depreciation.

44. Inventory valuation:

Last In First Out (LIFO)

First In First Out (FIFO)

45. Economic depreciation - The amount of a firm's operating cash flow that must be re-invested in the firm to sustain its real cash flow at the current level.

Accounting depreciation (accelerated, straight line) - Amount of the original acquisition cost of an asset allocated to each accounting period over an arbitrarily specified life of the asset.

46. Measured depreciation in periods of inflation is understated relative to replacement cost.

47. Inflation affects real interest expenses (deflates the statement of real income), inventories and depreciation (inflates).

[Graham's Technique]


1. BOND - IOU issued by Borrower (=Issuer) to Lender

2. PAR VALUE (=Face Value)

COUPON (=Interest payment)

3. The PRESENT VALUE (=The Opportunity Cost)

1 / (1+r)^n    r = interest rate    n = years



4. The PRICE of the bond:

Pb = Σ (t=1..n) [ C / (1+r)^t ] + PAR / (1+r)^n

Pb = Price of the Bond

C = Coupon

PAR = Principal payment

n = number of payments
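The pricing formula, written as a short function (the coupon, par value and rates below are illustrative):

```python
# Pb = sum over t = 1..n of C/(1+r)^t, plus PAR/(1+r)^n.
def bond_price(coupon, par, r, n):
    pv_coupons = sum(coupon / (1 + r) ** t for t in range(1, n + 1))
    return pv_coupons + par / (1 + r) ** n

# A 3-year, $50-coupon, $1,000-par bond prices at par when r equals the coupon rate,
assert abs(bond_price(50, 1000, 0.05, 3) - 1000) < 1e-6
# and below par when the required rate rises above it.
assert bond_price(50, 1000, 0.06, 3) < 1000
```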

5. BOND CONVEXITY - an increase in interest rates results in a price decline that is smaller than the price gain resulting from a decrease of equal magnitude in interest rates.




2. ANNUALIZED PERCENTAGE RATE (APR) = YTM × Number of periods in 1 year


n = number of periods in 1 year




n = number of days to maturity


8. BEY = (365 × BDY) / (360 - BDY × n)

9. BDY < BEY < EAY
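Items 8-9 in numbers, for a hypothetical discount instrument ($10,000 face bought at $9,800 with 90 days to maturity); the EAY here compounds the holding-period yield to a full year:

```python
face, price, days = 10_000.0, 9_800.0, 90

bdy = (face - price) / face * 360 / days   # bank discount yield (360-day year)
bey = 365 * bdy / (360 - bdy * days)       # bond equivalent yield, per item 8
hpy = (face - price) / price               # holding-period yield
eay = (1 + hpy) ** (365 / days) - 1        # effective annual (compounded) yield

assert bdy < bey < eay                     # the ordering stated in item 9
```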

10. FOR PREMIUM BOND: C > CY > YTM (Loss on Pb relative to par)



1. Zero coupons, stripping

2. Appreciation of Original issue discount (OID)

3. Coupon bonds, callable

4. Invoice price = Asked price + Accrued interest

5. Appreciation / Depreciation and: Market interest rates, Taxes, Risk (Adjustment)



1. Coverage ratios

2. Leverage ratios

3. Liquidity ratios

4. Profitability ratios

5. Cash flow to debt ratio

6. Altman's formula (Z-score) for predicting bankruptcies:

Z = 3.3 × EBIT / TOTAL ASSETS + 0.999 × SALES / TOTAL ASSETS + 0.6 × MARKET VALUE OF EQUITY / TOTAL LIABILITIES + 1.4 × RETAINED EARNINGS / TOTAL ASSETS + 1.2 × WORKING CAPITAL / TOTAL ASSETS
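Altman's Z-score as a function, using the classic 1968 coefficients; the firm's figures below are hypothetical. Historically, a Z above roughly 3.0 indicated low bankruptcy risk and below roughly 1.8, high risk:

```python
def altman_z(wc, re, ebit, mve, sales, assets, liabilities):
    # wc = working capital, re = retained earnings, mve = market value of equity
    return (3.3 * ebit / assets
            + 0.999 * sales / assets
            + 0.6 * mve / liabilities
            + 1.4 * re / assets
            + 1.2 * wc / assets)

z = altman_z(wc=50, re=100, ebit=60, mve=400, sales=600,
             assets=500, liabilities=250)
assert 1.8 < z < 3.0   # this hypothetical firm lands in the "grey zone"
```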






1. Macroeconomy - the economic environment in which all the firms operate

2. Macroeconomic Variables:

GDP (Gross Domestic Product) or Industrial Production - vs. GNP

Employment (unemployment, underemployment) rate(s)

Factory Capacity Utilization Rate

Inflation (vs. employment, growth)

Interest rates (=the discount factor in present-value calculations)

Budget deficit (and its influence on interest rates & private borrowing)

Current account & Trade deficit (and exchange rates)

"Safe Haven" attributes (and exchange rates)

Exchange rates (and foreign trade and inflation)

Tax rates (and investments / allocation, and consumption)

Sentiment (and consumption, and investment)

3. Demand and Supply shocks

4. Fiscal and Monetary policies

5. Leading, coincident and lagging indicators

6. Business cycles:

Sensitivity (elasticity) of sales

Operating leverage (fixed to variable costs ratio)

Financial leverage



1. Return On Investment (ROI) = Interest + Capital Gains

2. Zero coupon bond:

Pb = PAR / (1+i)^n

3. Bond prices change according to interest rates, time, taxation and to expectations about default risk, callability and inflation

4. Coupon bonds = a series of zero coupon bonds

5. Duration = the average maturity of a bond's cash flows, weighting each payment by its share of the total value of the bond.

wt = [ CFt / (1+y)^t ] / Pb        Σ wt = 1        Pb = bond price

Macaulay's formula: D = Σ t × wt    (where the yield curve is flat!)

6. Duration:

  1. Summary statistic of effective average maturity.
  2. Tool in immunizing portfolios from interest rate risk.
  3. Measure of sensitivity of portfolio to changes in interest rates.

7. ΔP/P = -D × [ Δ(1+y) / (1+y) ] = [ -D / (1+y) ] × Δ(1+y) = -D* × Δy    (D* = modified duration)
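Macaulay and modified duration, computed for a hypothetical 3-year, 8%-coupon, $1,000-par bond at y = 10% (a flat yield curve assumed), with a first-order check of the sensitivity rule:

```python
# Returns (Macaulay duration, price) for an annual-coupon bond.
def macaulay_duration(coupon, par, y, n):
    flows = [coupon] * (n - 1) + [coupon + par]            # CF_t for t = 1..n
    pv = [cf / (1 + y) ** t for t, cf in enumerate(flows, start=1)]
    price = sum(pv)
    weights = [x / price for x in pv]                      # the w_t, summing to 1
    return sum(t * w for t, w in enumerate(weights, start=1)), price

d, price = macaulay_duration(80, 1000, 0.10, 3)
d_mod = d / 1.10                                           # D* = D / (1+y)

# dP/P is approximately -D* x dy for a small yield change.
dy = 0.0001
_, bumped = macaulay_duration(80, 1000, 0.10 + dy, 3)
assert abs((bumped - price) / price + d_mod * dy) < 1e-6
```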

8. The EIGHT duration rules

  1. The duration of a zero coupon bond = its time to maturity.
  2. When maturity is constant, a bond's duration is higher when the coupon rate is lower.
  3. When the coupon rate is constant, a bond's duration increases with its time to maturity.

Duration always increases with maturity for bonds selling at par or at a premium.

With deeply discounted bonds, duration can decrease with maturity.

  4. Other factors being constant, the duration of a coupon bond is higher when the bond's YTM is lower.
  5. The duration of a level perpetuity = (1+y) / y
  6. The duration of a level annuity = (1+y)/y - T / [(1+y)^T - 1]
  7. The duration of a coupon bond = (1+y)/y - [(1+y) + T(c-y)] / {c[(1+y)^T - 1] + y}
  8. The duration of a coupon bond selling at par = (1+y)/y × [1 - 1/(1+y)^T]

9. Passive bond management - control of the risk, not of prices.

- indexing (market risk)

- immunization (zero risk)

10. Some are interested in protecting the current net worth - others in the future payments (=the future worth).

11. BANKS: mismatch between maturities of liabilities and assets.

Gap Management: certificates of deposits (liability side) and adjustable rate mortgages (assets side)

12. Pension funds: the value of income generated by assets fluctuates with interest rates

13. Fixed income investors face two types of risks:

Price risk

Reinvestment (of the coupons) rate risks

14. If duration selected properly the two effects cancel out.

For a horizon equal to the portfolio's duration - price and re-investment risks cancel out.

15. BUT: Duration changes with yield - requiring rebalancing

16. BUT: Duration will change because of the passage of time (it decreases less rapidly than maturity)

17. Cash flow matching - buying zeros or bonds yielding coupons equal to the future payments (a dedication strategy)

18. A pension fund resembles a level perpetuity; its duration follows the perpetuity rule: D = (1+y) / y.

19. There is no immunization against inflation (except indexation).

20. Active bond management

- Increase / decrease duration if interest rate declines / increases are forecast

- Identifying relative mispricing

21. The Homer - Leibowitz taxonomy:

  1. Substitution swap - replacing one bond with an identical one.
  2. Intermarket spread swap - when the yield spread between two sectors of the bond market is too wide.
  3. Rate anticipation swap - changing duration according to the forecasted interest rates.
  4. Pure yield pickup swap - holding the higher-yield bond.
  5. Tax swap - intended to exploit tax advantages.

22. Contingent immunization (Leibowitz - Weinberger):

Active management until portfolio drops to

minimum future value / (1+I)^T = Trigger value

if portfolio drops to trigger value - immunization.

23. Horizon Analysis

Select a Holding Period

Predict the yield curve at the end of that period

[We know the bond's time to maturity at the end of the holding period]

We can read its yield from the projected yield curve - and determine its price

24. Riding the yield curve

If the yield curve is upward sloping and it is projected not to shift during the investment horizon as maturities fall (=as time passes) - the bonds will become shorter - the yields will fall - capital gains

Danger: Expectations that interest rates will rise.



1. Between two parties exposed to opposite types of interest rate risk.


                      SNL                          Corporation

Liabilities:          Short term,                  Long term,
                      variable rate                fixed rate

Assets:               Long term,                   Short term,
                      fixed rate                   variable rate

Risk:                 Rising interest rates        Falling interest rates

2. The Swap

SNL would make fixed rate payments to the corporation based on a notional amount

Corporation will pay SNL an adjustable interest rate on the same notional amount

3. After the swap


SNL assets: long term loans plus (claim to) variable-rate cash flows from the swap

SNL liabilities: short term deposits plus (obligation to) make fixed cash payments

SNL net worth


Corporation assets: short term assets plus (claim to) fixed cash flows from the swap

Corporation liabilities: long term bonds plus (obligation to) make variable-rate payments

Corporation net worth

William Sharpe, John Lintner, Jan Mossin

1. Capital Asset Pricing Model (CAPM) predicts the relationship between an asset's risk and its expected return = benchmark rate of return (investment evaluation) = expected returns of assets not yet traded

2. Assumptions

[Investors differ in wealth and risk aversion] but:

  1. Investor's wealth is negligible compared to the total endowment;
  2. Investors are price - takers (prices are unaffected by their own trade);
  3. All investors plan for one, identical, holding period (myopic, suboptimal behaviour);
  4. Investments are limited to publicly traded financial assets and to risk free borrowing / lending arrangements;
  5. No taxes on returns, no transaction costs on trades;
  6. Investors are rational optimizers (mean variance - Markowitz portfolio selection model);
  7. All investors analyse securities the same way and share the same economic view of the world → homogeneous expectations = identical estimates of the probability distribution of the future cash flows from investments.

3. Results

  1. All the investors will hold the market portfolio.
  2. The market portfolio is the best, optimal and efficient one.

A passive (holding) strategy is the best.

Investors vary only in how they allocate their wealth between risky and risk-free assets.

  3. The risk premium on the market portfolio will be proportional to:

its risk

and the investor's risk aversion

  4. The risk premium on an individual asset will be proportional to the risk premium on the market portfolio

and the beta coefficient of the asset (relative to the market portfolio).

Beta measures the extent to which returns on the stock and the market move together.

4. Calculating the Beta

  1. The graphic method

The line from which the sum of squared deviations of returns is lowest.

The slope of this line is the Beta.

  2. The mathematical method


βi = Cov(ri, rm) / σm² = [ Σ (t=1..n) (yti - ȳi)(ytm - ȳm) ] / [ Σ (t=1..n) (ytm - ȳm)² ]

5. Restating the assumptions

  1. Investors are rational
  2. Investors can eliminate risk by diversification

- sectoral

- international

  3. Some risks cannot be eliminated - all investments are risky
  4. Investors must earn excess returns for their risks (=reward)
  5. The reward on a specific investment depends only on the extent to which it affects the market portfolio risk (Beta)

6. Diversified investors should care only about risks related to the market portfolio.




An investment with Beta 1/2 should earn half the market's risk premium -

with Beta 2 - twice the market's risk premium.

7. Recent research discovered that Beta does not work.

A better measure:

B / M

(Book Value) / (Market Value)

8. If Beta is irrelevant - how should risks be measured?

9. NEER (New Estimator of Expected Returns):

The B to M ratio captures some extra risk factor and should be used with Beta.

10. Other economists: There is no risk associated with high B to M ratios.

Investors mistakenly underprice such stocks and so they yield excess returns.

11. FAR (Fundamental Asset Risk) - Jeremy Stein

There is a distinction between:

  1. Boosting a firm's long term value and
  2. Trying to raise the share's price

If investors are rational:

Beta cannot be the only measure of risk → we should stop using it

Any decision boosting (A) will affect (B) → (A) and (B) are the same

If investors are irrational:

Beta is still right (it captures an asset's fundamental risk = its contribution to the market portfolio risk) → we should keep using it, even though investors are irrational

But if investors are making predictable mistakes, a manager must choose:

If he wants (B) → NEER (accommodating investors' expectations)

If he wants (A) → BETA


1. Efficient market hypothesis - share prices reflect all available information

2. Weak form

Are past prices reflected in present prices?

No price adjustment period - no chance for abnormal returns

(prices reflect information in the time that it takes to decipher it from them)

If we buy after the price has changed - will we have abnormal returns?

Technical analysis is worthless

3. Semistrong form

Is publicly available information fully reflected in present prices?

Buying immediately after news means paying, on average, a price that has already converged to equilibrium

Public information is worthless

4. Strong form

Is all information - public and private - reflected in present prices?

No investor can properly evaluate a firm

All information is worthless

5. Fair play - no way to use information to make abnormal returns

An investor that has information will estimate the yield and compare it to the equilibrium yield. The deviation of his estimates from equilibrium cannot predict his actual yields in the future.

His estimate could be > equilibrium > actual yield or vice versa. On average, his yield will be commensurate with the risk of the share.

6. Two basic assumptions

  1. Yields are positive
  2. High / low yields indicates high / low risk

7. If (A) is right, past prices contain no information about the future

8. Random walk

  1. Prices are independent (Monte Carlo fallacy)
  2. Prices are equally distributed in time

9. The example of the quarterly increase in dividends

10. The Rorschach Blots fallacy (patterns on random graphical designs)

→ cycles (Kondratieff)

11. Elton - Gruber experiments with series of random numbers

12. Price series and random numbers yield similar graphs

13. The Linear model

P(a) - P(a-1) = ( E(ΔP) + ΔP / E(ΔP) ) × ( P(a-1-c) - P(a-2-c) + R )

P = Price of share

C = Counter (lag)

E(ΔP) = Expected change in Price

ΔP = Previous change in price

R = Random number

14. The Logarithmic model

log(CPn) / log(CPn-1) = cumulative yield

Sometimes, instead of CP, we use: ΔP / (div/P)

15. Cluster analysis (Fama - MacBeth)

+ and - distributed randomly. No statistical significance.

16. Filter models - share prices will fluctuate around equilibrium because of profit taking and bargain hunting

17. New equilibrium is established by breaking through trading band

18. Timing - percentage of break through determines buy / sell signals

19. Filters effective in BEAR markets but equivalent to random portfolio management

20. Fama - Blume: the best filter is the one that just covers transaction costs

21. Relative strength models - RS = P / average P (price relative to its average over the period)

Divide the investment equally between the top 5% of shares with the highest RS and an RS of no less than 0.7

Sell shares falling below this benchmark and divide the proceeds among others

22. Reservations:

  1. High RS shares are the riskiest
  2. The group selected yields the same as the market - but with higher risk


1. Versus fundamental: dynamic (trend) vs. static (value)

2. Search for recurrent and predictable patterns

3. Patterns are adjustment of prices to new information

4. In an efficient market there is no such adjustment, all public information is already in the prices

5. The basic patterns:

  1. momentum
  2. breakaway
  3. head and shoulders → chartists

6. Buy/sell signals

Example: Piercing the neckline of Head and Shoulders

7. The Dow theory uses the Dow Jones industrial average (DJIA) as key indicator of underlying trends + DJTransportation as validator

8. Primary trend - several months to several years

Secondary (intermediate) trend - deviations from primary trend: 1/3, 1/2, 2/3 of preceding primary trend

Correction - return from secondary trend to primary trend

Tertiary (minor) trend - daily fluctuations

9. Channel - tops and bottoms moving in the direction of primary trend

10. Technical analysis is a self fulfilling prophecy - but if everyone were to believe in it and to exploit it, it would self destruct.

People buy close to resistance because they do not believe in it.

11. The Elliott Wave theory - five basic steps, a fractal principle

12. Moving averages - version I - true value of a stock is its average price

prices converge to the true value

version II - crossing the price line with the moving

average line predicts future prices

13. Relative strength - compares performance of a stock to its sector or to the performance of the whole market

14. Resistance / support levels - psychological boundaries to price movements assumes market price memory

15. Volume analysis - comparing the volume of trading to price movements. High volume in upturns combined with low volume in down moves signals a trend reversal.

16. Trin (trading index) = (advancing issues / declining issues) ÷ (advancing volume / declining volume)

Trin > 1 Bearish sign

17. BEAR / Bull markets - down/up markets disturbed by up/down movements

18. Trendline - price moves up to 5% of average

19. Square - horizontal transition period separating price trends (reversal patterns)

20. Accumulation pattern - reversal pattern between BEAR and BULL markets

21. Distribution pattern - reversal pattern between BULL and BEAR markets

22. Consolidation pattern - if underlying trends continues

23. Arithmetic versus logarithmic graphs

24. Seesaw - non-breakthrough penetration of resistance / support levels

25. Head and shoulder formation (and reverse formation):

Small rise (decline), followed by big rise (decline), followed by small rise (decline).

First shoulder and head-peak (trough) of BULL (BEAR) market.

Volume very high in 1st shoulder and head and very low in 2nd shoulder.

26. Neckline - connects the bottoms of two shoulders.

Signals change in market direction.

27. Double (Multiple) tops and bottoms

Two peaks separated by trough = double tops

Volume lower in second peak, high in penetration

The reverse = double bottoms

28. Expanding configurations

Price fluctuations so that price peaks and troughs

can be connected using two divergent lines.

Shoulders and head (last).

Sometimes, one of the lines is straight:

UPPER line straight (lower line sloping down) - accumulation; volume rises on penetration

LOWER line straight (upper line sloping up) - a 5% penetration signals reversal

29. Conservative upper expanding configuration

Three tops, each higher than the previous

Separated by two troughs, each lower than the previous

Signals peaking of the market

A 5% move below the sloping trendline connecting the two troughs,

or below the second trough, signals reversal

30. Triangles - consolidation / reversal patterns

31. Equilateral and isosceles triangle (COIL - the opposite of expansion configuration)

Two (or more) up moves + reactions

Each top lower than previous - each bottom higher than previous

connecting lines converge

Prices and volume strongly react on breakthrough

32. Triangles are accurate when penetration occurs

Between 1/2 - 3/4 of the distance between the most congested peak and the highest peak.

33. Right angled triangle

A special case of the isosceles triangle.

Often turn to squares.

34. Trendlines

Connect rising bottoms or declining tops (in Bull market)

Horizontal trendlines

35. Necklines of H&S configurations

And the upper or lower boundaries of a square are trendlines.

36. Upward trendline is support

Declining trendline is resistance

37. A trendline's significance depends on the ratio of penetrations to the number of times the trendline was touched without being penetrated

Also: the time length of the trendline

and its steepness (gradient, slope)

38. The penetration of a steep trendline is less meaningful and the trend will prevail.

39. Corrective fan

At the beginning of Bull market - first up move steep, price advance unsustainable.

This is a reaction to previous downmoves and trendline violated.

New trendline constructed from bottom of violation (decline) rises less quickly, violated.

A decline leads to third trendline.

This is the end of the Bull market

(The reverse is true for Bear market.)

40. Line of return - parallel to upmarket trendline, connects rising tops (in uptrends) or declining bottoms (in downtrends).

41. Trend channel - the area between trendlines and lines of return.

42. Breach of line of return signals (temporary) reversal in basic trend.

43. Simple moving average

Average of the last N days, where each new datum replaces the oldest. The MA changes direction after a peak / trough.

44. Price < MA → Decline

Price > MA → Upturn

45. MA at times support in Bear market

resistance in Bull market

46. Any break through MA signals change of trend.

This is especially true if MA was straight or changed direction before.

If broken through while the trend continues - a warning.

We can be sure only when MA straightens or changes.
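The MA rules in points 43-46 can be sketched in Python. A minimal illustration; the function names are hypothetical, not from any charting library:

```python
def moving_average(prices, n):
    # Simple n-period moving average (point 43): each value averages the
    # latest n prices, the newest datum replacing the oldest.
    return [sum(prices[i - n + 1:i + 1]) / n for i in range(n - 1, len(prices))]

def ma_signal(price, ma):
    # Point 44: price above the MA suggests an upturn, below it a decline.
    if price > ma:
        return "up"
    if price < ma:
        return "down"
    return "flat"
```

As point 46 warns, a crossover alone is only a warning sign; confirmation comes when the MA itself flattens or turns.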

47. MA of 10-13 weeks secondary trends

MA of 40 weeks primary trends

Best combination: 10+30 weeks

48. Interpretation

30w down, 10w < 30w downtrend

30w up, 10w > 30w uptrend

49. 10w up, 30w down (in Bear market)

10w down, 30w up (in Bull market)

No significance

50. MAs are very misleading when the market stabilizes, and their signals come very late.

51. Weighted MA (1st version)

Emphasis placed on the middle (7th) week of a 13w MA (wrong - delays warnings)

Emphasis placed on the last weeks of the 13w MA

52. Weighted MA (2nd version)

Multiplication of each datum by its serial number.

53. Weighted MA (3rd version)

Adding a few data more than once.

54. Weighted MAs are autonomous indicators - without being crossed with other MAs.

55. Exponential MA - algorithm

  1. Simple 20w MA
  2. Difference between 21st datum and MA multiplied by exponent (2/N) = result 1
  3. Result 1 added to MA
  4. If difference between datum and MA negative - subtract, not add
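The four steps above collapse to a single update rule. A sketch in Python, keeping the text's exponent 2/N (2/(N+1) is the other common convention):

```python
def exponential_ma(data, n=20):
    # Seed with a simple n-period MA (step 1), then move the running average
    # toward each new datum by the exponent 2/n. Steps 2-4 collapse to
    # ema += (2/n) * (datum - ema): a negative difference subtracts automatically.
    ema = sum(data[:n]) / n
    k = 2 / n
    out = [ema]
    for x in data[n:]:
        ema += k * (x - ema)
        out.append(ema)
    return out
```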

56. Envelopes

Symmetrical lines parallel to MA lines (which are the centre of trend) give a sense of the trend and allow for fatigue of market movement.

57. Momentum

Division of current prices by prices a given time ago

Momentum is straight when prices are stable

When momentum > reference and rising - market up (Bull)

When momentum > reference and falling - Bull market stabilizing

When momentum < reference and falling - market down (Bear)

When momentum < reference and rising - Bear market stabilizing
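Point 57's ratio is a one-liner; a minimal sketch, with 1.0 as the reference level for stable prices:

```python
def momentum(prices, k):
    # Current price divided by the price k periods ago.
    # Equal to 1.0 (the reference) when prices are stable.
    return prices[-1] / prices[-1 - k]
```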

58. Oscillators measure the market internal strengths:

59. Market width momentum

Measured with advance / decline line of market

(=the difference between rising / falling shares)

When it separates from the index - imminent reversal

momentum = no. of rising shares / no. of declining shares

60. Index to trend momentum

Index divided by MA of index

61. Fast lines of resistance (Edson Gould)

The supports / resistances will be found in 1/3 - 2/3 of previous price movement.

Breakthrough means new tops / bottoms.

62. Relative strength

Does not indicate direction - only strength of movement.

More Technical Analysis:

1. Williams %R = 100 × (Hr - C) / (Hr - Lr)

r = time frame; Hr / Lr = highest high / lowest low over the last r periods; C = latest close
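A sketch of the %R computation in Python (note that many charting packages negate the result so it runs from 0 to -100):

```python
def williams_r(highs, lows, closes, r=14):
    # Williams %R over the last r bars:
    # 100 * (highest high - latest close) / (highest high - lowest low).
    hh = max(highs[-r:])
    ll = min(lows[-r:])
    return 100 * (hh - closes[-1]) / (hh - ll)
```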

2. The Williams trading signals:

  1. Divergence:

    1. Bearish - WM%R rises above the upper reference line, then cannot rise above the line during the next rally

    2. Bullish - WM%R falls below the lower reference line, then cannot decline below the line during the next slide

  2. Failure swing

When WM%R fails to rise above the upper reference line during a rally, or to fall below the lower reference line during a decline

3. Stochastic

A fast line (%K) + slow line (%D)


  1. Calculate raw stochastic (%K) = 100 × (C - Ln) / (Hn - Ln)

n = number of time units (normally 5); C = latest close; Hn / Ln = highest high / lowest low over the last n units

  2. %D = 100 × Σ(C - Ln) / Σ(Hn - Ln), summed over 3 units (smoothing)

4. Fast stochastic

%K + %D on same chart (%K similar to WM%R)

5. Slow stochastic

%D smoothed using same method
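A sketch of the raw %K and a smoothed %D. Here %D is a plain average of recent %K values, a common simplification of Lane's original smoothing (which sums numerator and denominator separately):

```python
def raw_stochastic(highs, lows, closes, n=5):
    # %K: where the latest close sits in the n-period high-low range, in percent.
    hh, ll = max(highs[-n:]), min(lows[-n:])
    return 100 * (closes[-1] - ll) / (hh - ll)

def percent_d(k_values, m=3):
    # %D as an m-period average of recent %K values (simplified smoothing).
    return sum(k_values[-m:]) / m
```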

6. Stochastic trading signals

  1. Divergence
  1. Bullish

Prices fall to new low

Stochastic traces a higher bottom than during previous decline

  1. Bearish

Prices rally to new high

Stochastic traces a lower top than during previous rally

  1. Overbought / Oversold
  1. When stochastic rallies above upper reference line - market O/B
  2. When stochastic falls below lower reference line - market O/S
  1. Line direction

When both lines are in same direction - confirmation of trend

7. Four ways to measure volume

  1. No. of units of securities traded
  2. No. of trades
  3. Tick volume
  4. Money volume

8. OBV Indicator (on-balance volume)

Running total of volume with +/- signs according to price changes
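The OBV running total can be sketched directly from that definition:

```python
def on_balance_volume(closes, volumes):
    # Running total of volume: added on up-closes, subtracted on down-closes,
    # unchanged when the close is flat.
    obv = [0]
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            obv.append(obv[-1] + volumes[i])
        elif closes[i] < closes[i - 1]:
            obv.append(obv[-1] - volumes[i])
        else:
            obv.append(obv[-1])
    return obv
```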

9. Combined with:

  1. The Net Field Trend Indicator

(OBV calculated for each stock in the index and then rated +1, -1, 0)

  2. Climax Indicator

The sum of the Net Field Trend Indicators

10. Accumulation / Distribution Indicator

A/D = [(C - O) / (H - L)] × V (C = close, O = open, H = high, L = low, V = volume)

11. Volume accumulator

Uses P instead of O.

12. Open Interest

Number of contracts held by buyers or owed by short sellers in a given market on a given day.

13. Herrick Payoff Index (HPI)

HPI = Ky + (K' - Ky)

K = [(P - Py) × C × V] × [1 ± (2 × |I - Iy| / G)]

G= today's or yesterday's I (=open interest, whichever is less)

+/- determined: if P > Py (+), if P < Py (-)

Annex: The Foundations of Common Investment schemes Challenged

The credit and banking crisis of 2007-9 has cast doubt on the three pillars of modern common investment schemes. Mutual funds (known in the UK as "unit trusts"), hedge funds, and closed-end funds all rely on three assumptions:

Assumption number one

That risk inherent in assets such as stocks can be "diversified away". If one divides one's capital and invests it in a variety of financial instruments, sectors, and markets, the overall risk of one's portfolio of investments is lower than the risk of any single asset in said portfolio.

Yet, in the last decade, markets all over the world have moved in tandem. These highly-correlated ups and downs gave the lie to the belief that they were in the process of "decoupling" and could, therefore, be expected to fluctuate independently of each other. What the crisis has revealed is that contagion transmission vectors and mechanisms have actually become more potent as barriers to flows of money and information have been lowered.

Assumption number two

That investment "experts" can and do have an advantage in picking "winner" stocks over laymen, let alone over random choices. Market timing, coupled with access to information and analysis, was supposed to guarantee the superior performance of professionals. Yet, it didn't.

Few investment funds beat the relevant stock indices on a regular, consistent basis. The yields on "random walk" and stochastic (random) investment portfolios often surpass managed funds. Index or tracking funds (funds that automatically invest in the stocks that compose a stock market index) are at the top of the table, leaving "stars", "seers", "sages", and "gurus" in the dust.

This manifest market efficiency is often attributed to the ubiquity of capital pricing models. But, the fact that everybody uses the same software does not necessarily mean that everyone would make the same stock picks. Moreover, the CAPM and similar models are now being challenged by the discovery and incorporation of information asymmetries into the math. Nowadays, not all fund managers are using the same mathematical models.

A better explanation for the inability of investment experts to beat the overall performance of the market would perhaps be information overload. Recent studies have shown that performance tends to deteriorate in the presence of too much information.

Additionally, the failure of gatekeepers - from rating agencies to regulators - to force firms to provide reliable data on their activities and assets led to the ascendance of insider information as the only credible substitute. But, insider or privileged information proved to be as misleading as publicly disclosed data. Finally, the market acted more on noise than on signal. As we all know, noise is perfectly randomized. Expertise and professionalism mean nothing in a totally random market.

Assumption number three

That risk can be either diversified away or parceled out and sold. This proved to be untenable, mainly because the very nature of risk is still ill-understood: the samples used in various mathematical models were biased as they relied on data pertaining only to the recent bull market, the longest in history.

Thus, in the process of securitization, "risk" was dissected, bundled and sold to third parties who were equally at a loss as to how best to evaluate it. Bewildered, participants and markets lost their much-vaunted ability to "discover" the correct prices of assets. Investors and banks got spooked by this apparent and unprecedented failure and stopped investing and lending. Illiquidity and panic ensued.

If investment funds cannot beat the market and cannot effectively get rid of portfolio risk, what do we need them for?

The short answer is: because it is far more convenient to get involved in the market through a fund than directly. Another reason: index and tracking funds are excellent ways to invest in a bull market.

Statistics Primer

Categorical measures (categorical, qualitative, classification variables): binary, nominal (order meaningless), ordinal (order meaningful)


Continuous measures: interval vs. ratio


Discrete measures (whole numbers)


Data can be listed, arranged as frequency scores, in histograms (bar charts).


Normal (Gaussian) distribution


Skewed (asymmetrical) distribution: positive (rises rapidly, drops slowly), negative (rises slowly, drops rapidly)


Floor effect (depression), ceiling effect (math problems)


Kurtosis (not bell shaped): positive (too many scores in the tails, leptokurtic), negative (too few in the tails, platykurtic)




Measures of central tendency:


Mean (x̄ = Σx / N) requires a symmetrical distribution (no outliers) and interval or ratio data


Median (when the distribution is asymmetrical or the data ordinal): with an odd number of scores, the middle score; with an even number of scores, the mean of the two middle scores


Mode: most frequent score in distribution, most frequent observation among scores


Dispersion (spread):


Range: distance between highest and lowest score


Inter-quartile range (IQR): used with ordinal data or with non-normal distributions; the distance between the upper quartile (a quarter of values above it) and the lower quartile (three quarters of values above it). (The 3 quartiles of a variable divide it into 4 groups; the median is the 2nd quartile.)


Semi IQR=IQR/2


(Sample) Standard deviation = square root of [the sum of (each score minus the mean) squared, divided by N-1]. (For the population SD, divide by N.)


Boxplot (box and whisker plot): the median is represented as a thick line, the upper and lower quartiles as a box around the median; the whiskers extend to the maximum and minimum points that are not outliers (an outlier lies more than 1.5 box-lengths from the box).


Population: descriptive statistics and inferential statistics


Representative sample is random: each member of the population has an equal chance of being selected, and the selection of one member does not affect the chances of any other member being selected. But a truly random sample is often impossible, so a volunteer sample or snowball (referral) sample is used.


Uniform distribution (number on dice) vs. normal (sums of 2 numbers on multiple dice)


In histogram represented as line chart, with continuous variable on x axis and frequency density on y axis, the number of people with any score or range of scores equals the area of the chart. Probability is this area divided by the total area.


Normal distribution is defined by its mean and standard deviation. Half the scores are above the mean and half under it (probability=0.5). 95.45% of cases within 2 SDs of mean. 47.72% between mean and -2SDs. 49.9% between mean and +3SDs. 2.27% lie more than 2 SDs below the mean.


Z-score presented in terms of SDs above the mean = score minus mean divided by standard deviation. One SD above mean=z-score +1. Look at a z table for probability of getting that z score.
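The z-score formula as a sketch:

```python
def z_score(score, mean, sd):
    # Number of standard deviations a score lies above (positive)
    # or below (negative) the mean.
    return (score - mean) / sd
```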


Sampling distribution of the mean: probability of different values plotted as normal distribution.


Central limit theorem: the sampling distribution of sample means is normal when the sample is big (t-shaped when the sample is small). Put differently: if the distribution in the population is normal, the sampling distribution will be bell-curved; if it is not normal but the sample is large, the sampling distribution will still be approximately normal (t-shaped for small samples).


If we carry out the same experiment over and over again, we will get a different sample mean each time. The standard deviation of this collection of sample means is the standard error (se) of the mean: the standard deviation of the sampling distribution of the mean. The bigger the sample size, the closer the sample mean is to the population mean and the smaller the standard error. Bigger variation means greater uncertainty as to the population mean. The standard error of the mean of variable X equals its standard deviation divided by the square root of the number of cases in the sample. To halve the standard error, the sample size needs to increase fourfold.
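The last two sentences can be checked directly:

```python
from math import sqrt

def standard_error(sd, n):
    # Standard error of the mean: SD / sqrt(N).
    # Quadrupling the sample size halves it.
    return sd / sqrt(n)
```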




If we carry out a study a number of times, we get a range of mean scores that make up a sampling distribution which is normal if the sample is large and t-shaped if the sample is small.


We know the standard deviation of the sampling distribution of the mean (=standard error).


If we know the mean and we know the standard deviation, we know the probability of any value.




Null hypothesis (Ho)


Alternative hypothesis (H1)


Probability value: probability of a result if the null hypothesis is true NOT the probability that the null hypothesis is true.


One-tailed, directional hypothesis: probability p=1 divided by 2 to the power of k (number of tosses)


We use two-tailed (non-directional) hypothesis: p=1 divided by 2 to the power of k-1 (2 to the power of zero equals 1 which is why p=1)


Alpha: cut-off rate for rejecting the null hypothesis as false = 0.05 (5%, 1 in 20). Below alpha is statistically significant. Rejecting the null hypothesis does not mean adopting the alternative hypothesis (not proof, just evidence). A p value above 0.05 doesn't prove that the null hypothesis is true, only that it cannot be rejected.


Type I error: rejecting the null hypothesis when it is actually true. If p=0.05: there is a 5% chance of the result occurring if the null hypothesis is true; equivalently, a 5% probability that the decision to reject the null hypothesis is wrong. Type II error: failing to reject the null hypothesis when it is false.


If the population mean has a particular value, what is the probability that I would find a value as far from that value as in my sample, or further? For a small sample t=score (mean in sample) minus (suspected population) mean divided by se. Look at a t table for probability of getting that t score.


Degrees of freedom (df)=N-1: what proportion of cases lie more than a certain number of SDs above the mean in a t distribution with N-1 df.


Confidence interval: the likely range of the population value, between the confidence limits (usually 95%, alpha 0.05, or significance level 0.05). We know the sample mean and the standard error (standard deviation of the sampling distribution). We need to know the values that cover 95% of the population in a t distribution with a certain number of df (what t value gives these values).


CI = mean plus/minus the t value for alpha 0.05 multiplied by the standard error (two calculations, for the UCL and LCL). Interpretation: in 95% of studies, the population mean will be within the confidence limits.


In 95% of studies, the true value is contained within the confidence intervals. In 5% of studies, the true value is therefore not contained within the confidence intervals. In 5% of studies, the result will be statistically significant, when the null hypothesis is true. If the confidence intervals contain the null hypothesis, the result is not statistically significant. If the confidence intervals do not contain the null hypothesis, the result is statistically significant. Confidence intervals and statistical significance are therefore two sides of the same coin.


Experiments with repeated measures (within subjects) design: scores of individuals in one condition against their scores in another condition (people are their own control group).


Problems: practice effects (solutions: counterbalancing and practice items); sensitization; carry over effects.


Related design: people in two groups are closely related.


Cross-over design studies: when people cross from one group to the other.


Correlational design: did people who scored high in one test score high on the second test? Not interested in whether the scores overall went up or down. Repeated measures: did people score higher on one occasion rather than the other? Not interested if people who scored high the first time scored high the second time as well.


What statistical test should I use? (statsols.com)


Which Statistics Test Should I Use? (socscistatistics.com)


Parametric tests (like t test) make inferences about population parameters.


Non-parametric tests (like Wilcoxon) use ranking, use data measured on an ordinal scale (we don’t know the size of the gaps, just the order of the scores).


T-Test (Repeated Measures)


Repeated measures t-test: when data are measured on continuous (interval) level and the differences between the two scores are normally distributed (even if the variables are not normally distributed). Makes no assumption about the variances of the variables (like the independent samples t-test).


1. Calculate the difference between the scores for each person.

2. Calculate the mean and standard deviation of the difference.

3. Calculate the standard error of the difference (SD divided by square root of N), using the result from step 2.

4. Calculate the confidence interval for the difference, using the result from step 3.

5. Calculate the statistical significance of the difference, using the result from step 3.


Step 4:


We need to know what confidence intervals we are interested in. The answer is almost always the 95% confidence intervals, so α (alpha) is equal to 0.05. We need the cutoff value for t, at the 0.05 level. We can use a table or a computer program to do this, but first we need to know the degrees of freedom (df). In this case, df = N-1, so we have 15 df. With 15 df, the 0.05 cut-off for t is 2.131. The equation has a ± symbol in it. This means that we calculate the answer twice, once for the lower confidence interval and once for the upper confidence interval.

CI=mean plus/minus t alpha multiplied by standard error

Lower CI = -0.031 - 2.131 × 0.221 = -0.502

Upper CI = -0.031 + 2.131 × 0.221 = 0.440

Given that the confidence interval crosses zero, we cannot reject the null hypothesis that there is no difference between the two conditions.


Step 5


Find the t value=mean of differences divided by standard error of the differences.

A table or computer tells us the probability p of getting a value of t at least as large as the calculated value with N-1 degrees of freedom. If it is higher than 0.05, we cannot reject the null hypothesis.
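Steps 1-3 and 5 can be sketched in Python (looking up the p value in a t table is left out):

```python
from math import sqrt

def paired_t(scores_a, scores_b):
    # Repeated-measures t: mean of the per-person differences
    # divided by the standard error of the differences.
    d = [a - b for a, b in zip(scores_a, scores_b)]  # step 1
    n = len(d)
    mean = sum(d) / n                                 # step 2
    sd = sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))
    se = sd / sqrt(n)                                 # step 3
    return mean / se                                  # step 5 (look up p, N-1 df)
```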


Wilcoxon Test


When differences not normally distributed, or measures are ordinal.


Step 1 rank the change scores ignoring the sign (quantity, not whether more or less). If there are tied scores (several identical scores), they are given the mean rank (example: the score 12 is ranked in position 2,3,4 so its mean rank is 3 and it will be assigned the rank 3 wherever it appears).


Step 2 separate the ranks of positive changes from the ranks of negative changes. Add up the 2 columns of ranks. T is the lower of these 2 values.


Step 3 the probability of T according to the sample size (in a table). Or use the normal approximation (convert T to z and look up the p value in a normal distribution table).


z = [T - N(N+1)/4 - 0.5] / SQRT{ N(N+1)(2N+1)/24 - (Σt³ - Σt)/48 }


(0.5 is a continuity correction; t is the number of scores tied at each rank)


The p value associated with z is one tail of the distribution, so must be multiplied by 2.
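Steps 1-2 (ranking with tied scores sharing their mean rank, then taking the smaller rank sum as T) can be sketched as:

```python
def wilcoxon_t(differences):
    # Rank the nonzero change scores by absolute size (ties share their mean
    # rank), sum positive and negative ranks separately, return the smaller sum.
    d = [x for x in differences if x != 0]
    by_size = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(d):
        j = i
        while j + 1 < len(d) and abs(d[by_size[j + 1]]) == abs(d[by_size[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the tied 1-based positions
        for k in range(i, j + 1):
            ranks[by_size[k]] = mean_rank
        i = j + 1
    pos = sum(r for r, x in zip(ranks, d) if x > 0)
    neg = sum(r for r, x in zip(ranks, d) if x < 0)
    return min(pos, neg)
```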


Sign Test


Used when we have nominal data with 2 categories and have repeated measures data.


S is the smallest of the obtained values (example: 14 yes and 19 no, S=14).

N is the number of scores that were not tied.

P calculated using S and N in a table.


Independent groups design: comparing 2 or more independent groups (different participants in each group). Quasi experimental design: when people already belong to different categories (men and women, for example). Three kinds of dependent variables: continuous, ordinal, and categorical.


T Test (Independent Groups, between subjects, two samples)


Data measured on continuous (interval) scale, data within each group are normally distributed, and the standard deviations of the two groups are equal (the easier test) - or not equal (a more difficult test).


Step 1 calculate the SD of each group

Step 2 calculate the pooled variance of the difference = [SD1 squared × (n1-1) + SD2 squared × (n2-1)] / (n1+n2-2)

When the sample sizes are the same, this reduces to: SDdiff squared = (SD1 squared + SD2 squared)/2

Step 3 calculate the standard error of the difference = SQRT(SDdiff squared/n1 + SDdiff squared/n2); when the two sample sizes are equal (n each), this is SQRT[(SD1 squared + SD2 squared)/n]

Step4 calculate the CI=diff plus/minus t alpha multiplied by standard error (df is n1+n2-2)

Step 5 calculate t (probability associated with null hypothesis: probability of getting t at least this large if null hypothesis is true).
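Steps 1-3 and 5 of the pooled-variance test as a sketch (the table lookup for p is left out):

```python
from math import sqrt

def pooled_t(group1, group2):
    # Independent-groups t with pooled variance (assumes equal SDs).
    def var(g):  # step 1: sample variance (SD squared) of one group
        m = sum(g) / len(g)
        return sum((x - m) ** 2 for x in g) / (len(g) - 1)
    n1, n2 = len(group1), len(group2)
    pooled = (var(group1) * (n1 - 1) + var(group2) * (n2 - 1)) / (n1 + n2 - 2)  # step 2
    se = sqrt(pooled / n1 + pooled / n2)  # step 3
    diff = sum(group1) / n1 - sum(group2) / n2
    return diff / se  # step 5 (look up p with n1+n2-2 df)
```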


Homogeneity of variance used only in pooled (not unpooled) variance t-test. Pooled for equal sample sizes, unpooled for unequal sample sizes. If Levene’s test gives statistically significant result, the variances (SDs) are not the same and one should use unpooled.


General t-test


t=difference between two means (d)/standard error

CI=d plus minus t of alpha multiplied by standard error


Unpooled (Welch) variance t-test


Unequal sample sizes: comparing two naturally occurring groups, expensive intervention, ethical or recruitment issues.


Step 1 calculate the SDs

Step 2 calculate the SE of difference=SQRT of (SD1 squared/N1+SD2 squared/N2)

Step 3 calculate the degrees of freedom =

(SD1 squared/N1 + SD2 squared/N2) squared

divided by

(SD1 squared/N1) squared/(N1-1) + (SD2 squared/N2) squared/(N2-1)


(See also: Satterthwaite’s method)


Step 4 calculate the confidence intervals (two-tailed alpha)

Step 5 calculate the value of t and find its p value


Cohen’s d: effect size for independent groups t-test: how far apart the means of the two samples are in standard deviation units. d = 2t/(SQRT of df)  df=N1+N2-2

Mann-Whitney U Test


Compares two unrelated groups, non-parametric (no assumptions regarding normal distribution and interval data). N1 number in group with larger ranked total, N2 smaller ranked total.


Step 1 ranking

Step 2 Calculate U1 = N1×N2 + N1(N1+1)/2 - Σ(ranks of group 1)

Step 3 calculate U2 = N1*N2-U1

Step 4 find U (the smaller of U1 or U2)

Step 5 find the p value in a table (if U is larger than the tabled critical value, the result is not statistically significant at the 0.05 level)

When sample large, convert U score to z score and get p value, then multiply by 2 for two-tailed significance:

z = [U - N1×N2/2] / SQRT[N1×N2×(N1+N2+1)/12]
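U can also be computed by counting pairs, which is equivalent to the rank formula in steps 2-4 (a minimal sketch; the large-sample z conversion is left out):

```python
def mann_whitney_u(group_a, group_b):
    # Count the pairs (a, b) in which a beats b, scoring ties as 0.5;
    # U is the smaller of the two counts.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0
             for a in group_a for b in group_b)
    u2 = len(group_a) * len(group_b) - u1
    return min(u1, u2)
```

Dividing the larger count by N1×N2 gives the theta statistic described below.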


t-test tells us whether difference in means between 2 groups is statistically significant. Mann Whitney compares ranks. If the two groups have same distribution shapes, it compares the medians.


Theta=probability (B>A)+0.5*probability (B=A)=U/N1N2. Measures the probability that the score of a randomly selected person from group B will be higher or equal to the score of a randomly selected person from group A.           


Categorical or nominal data: discrete values (yes/no).


Mean is the average of continuous data. Median is used with ordinal data. With nominal data, we provide proportions.


Absolute difference (A-B) and relative difference ((A-B)/A, relative decrease, or (A-B)/B, relative increase).


Odds ratio (OR): ratio between 2 odds.


Odds = number of events : number of nonevents (usually expressed against 1, so only the number of events is mentioned). Odds = p/(1-p).


If data is placed in table cells A-D, OR=AD/BC


Nu is the standard error of the log OR = SQRT(1/A + 1/B + 1/C + 1/D). It relates to a normal distribution, so we need the value associated with the 95% CI cut-off (z of alpha/2, i.e. two-tailed alpha).


Lower confidence limit CL = OR × exp(-z(alpha, two-tailed) × nu). Upper CL = OR × exp(+z(alpha, two-tailed) × nu).
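The OR and its confidence limits as a sketch, for cells A-D of a 2x2 table:

```python
from math import sqrt, exp

def odds_ratio_ci(a, b, c, d, z=1.96):
    # OR = AD/BC; nu = SQRT(1/A + 1/B + 1/C + 1/D);
    # 95% limits are OR * exp(-z * nu) and OR * exp(+z * nu).
    odds_ratio = (a * d) / (b * c)
    nu = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * exp(-z * nu), odds_ratio * exp(z * nu)
```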


Chi-square test


Put data in a table with 4 cells and add the totals of both rows and columns.

Calculate E(expected value) for each cell (if null hypothesis were true)=(R*C)/T (R total for given row, C for column, T total).

Calculate differences between data (O or observed values) and E.

Chi squared=sigma of (O-E)squared/E for each of the cells

df=(number of rows-1)*(number of columns-1)

Check in table p value of chi squared as high or higher with given dfs.
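The chi-square steps above, sketched for the 2x2 case (the table lookup for p is left out; df = 1 here):

```python
def chi_square_2x2(a, b, c, d):
    # Expected value per cell = row total * column total / grand total;
    # chi-squared = sum of (O - E)^2 / E over the four cells.
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / total
            chi2 += (observed[i][j] - e) ** 2 / e
    return chi2
```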


Use Fisher’s exact test when expected values in 2X2 table are smaller than 5:

P = (A+B)!(C+D)!(A+C)!(B+D)! / [(A+B+C+D)! A! B! C! D!] (! is factorial: multiply every whole number from that number down to 1, so 4! = 4×3×2×1). Calculate for the result of the study and for all results more extreme than the result.


Use (Yates’) continuity correction:

Chi squared = Σ of (|O-E| - 0.5) squared/E (absolute values, ignoring plus or minus signs)


Scatterplot: one variable on x axis, the other on the y axis, point per each person showing scores on both variables.


Summarize the relationship between the 2 variables with a number, calculate Cis for this number, find out if the relationship is statistically significant (probability of finding a relationship that is at least this strong if null hypothesis that there is no relationship in the population is true).


Line of best fit: use the slope to predict the value of one variable based on the score of the other (regression line). A slope of 1.32 means that for every 1-unit move on the x axis, the line rises 1.32 units along the y axis. The height is where the line hits the y axis (the constant, y-intercept or just intercept): the expected score on y when the score on x is zero. Expected y score = intercept (beta0) + slope (beta1) × x score - the regression equation.


Standardized scores, slopes: measured in terms of SDs, not in absolute units.

Pearson Correlation coefficient (r) is parametric (data continuous and normally distributed): standardized slope=(beta*SDx)/SDy (expected increase in one variable when the other variable increases by 1 SD).


SD=spread of points of one variable around mean. Square of SD is variance.


Variance=sigma of (x-mean or d, difference) squared/(N-1)


In regression analysis, instead of d being the difference between score and mean, it is the difference between the expected y value given the x score and the actual y score (the residual). Expected value (y^)=beta0 (intercept)+beta1 (slope)*x. The squared residual plays the role of the squared difference as input in calculating the variance (SD squared).


How large is the VAR of the residuals in terms of the VAR of the y scores? VAR of y scores = sigma of (y score minus mean) squared/(N-1). Residual VAR/y-score VAR=the proportion of VAR that is residual VAR (not explained by x). 1 minus this proportion=the proportion of VAR explained by x. The SQRT of the explained proportion is the correlation coefficient (the standardized slope).


Squaring the correlation gives the proportion of variance in one variable that is explained by the other variable.


Correlation is both a descriptive and an inferential statistic: use a table for the probability value associated with the null hypothesis (r=0), or describe the strength of the relationship between 2 variables (Cohen: r=0.1 small, 0.3 medium, 0.5 large).


Covariance of x,y = sigma of (x-mean of x)(y-mean of y)/(N-1)


Pearson Correlation (r) of x,y = covariance(x,y)/(SDx*SDy) = sigma of [(x minus mean x)(y minus mean y)]/[SQRT of sigma (x minus mean x) squared * SQRT of sigma (y minus mean y) squared]
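Putting the covariance, SD, and correlation formulas together, with small made-up data, and adding the regression slope and intercept formulas given further below (slope=r*SDy/SDx, intercept=mean y-slope*mean x):

```python
import math

# Hypothetical paired scores for 5 people
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance(x, y) = sum of (x - mean x)(y - mean y) / (N - 1)
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations (square roots of the variances)
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Pearson r = covariance / (SDx * SDy)
r = cov / (sd_x * sd_y)

# Regression line: slope beta1 = r * SDy / SDx, intercept = mean y - beta1 * mean x
beta1 = r * sd_y / sd_x
beta0 = mean_y - beta1 * mean_x
```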


Confidence intervals: range of a value in the population, not just in a sample.


Step 1 Fisher’s z transformation: transforms the distribution of the correlation into a z distribution (normal, mean 0, SD 1). Z’=0.5*ln[(1+r)/(1-r)]


Step 2 Standard error=1/SQRT (N-3)


Step 3 CI=Z’ plus/minus z of alpha two-tailed * se (z of 2-tailed alpha is value of normal distribution that includes the percentage of values that we want to cover)


Step 4 convert the CIs back to correlations: r=[exp(2Z’)-1]/[exp(2Z’)+1] (2 calculations, one for each CI limit, upper and lower)
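The four steps in Python, for a hypothetical correlation of 0.5 from a sample of 30, at the 95% level (z of two-tailed alpha = 1.96):

```python
import math

r, n = 0.5, 30
alpha_z = 1.96  # two-tailed z value for a 95% CI

# Step 1: Fisher's z transformation
z_prime = 0.5 * math.log((1 + r) / (1 - r))

# Step 2: standard error = 1 / sqrt(N - 3)
se = 1 / math.sqrt(n - 3)

# Step 3: CI on the z scale = Z' plus/minus z(alpha) * se
lo_z = z_prime - alpha_z * se
hi_z = z_prime + alpha_z * se

# Step 4: convert the limits back to correlations
def z_to_r(z):
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

lo_r, hi_r = z_to_r(lo_z), z_to_r(hi_z)
```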


Regression line slope (beta1)=r*SDy/SDx


Intercept=mean y-beta1*mean x


For dichotomous values (yes/no): phi=r=SQRT (chi squared/N)

p value of the correlation=p value of chi squared


One variable dichotomous, one continuous: point-biserial formula (gives the same result as Pearson, but easier)

r = [(mean x1-mean x2)*SQRT(p*(1-p))]/SDx (x the score on the continuous variable in each group, p the proportion of people in group 1, SD of both groups together).


Non-parametric correlations


Spearman rank correlation coefficient (how closely ranked data are related)


Step 1 draw scatterplot: what sort of relationship (positive if upward slope), outliers

Step 2 rank data in each group separately (1 lowest score, ties given average score)

Step 3 find d (the difference between the ranks on the two variables for each person – if every d is 0, r is 1)

Step 4 convert the d scores to a correlation: r=1-[6*sigma of d squared/(N to the third power – N)] (valid only when there are no ties in the data).

Step 5 to find significance, convert to t statistic. Or use Pearson.
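Steps 2–4 in Python, with made-up scores (the example data have no ties across ranks, as the formula requires; the ranking helper still averages tied ranks as step 2 describes):

```python
def ranks(values):
    # Rank the data: 1 for the lowest score, ties get the average rank
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1   # rank of the first occurrence
        count = sorted_vals.count(v)
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

# Hypothetical paired scores for 5 people
x = [10, 20, 30, 40, 50]
y = [12, 25, 22, 50, 60]

rx, ry = ranks(x), ranks(y)
n = len(x)

# d = difference between the two ranks for each person
d2_sum = sum((a - b) ** 2 for a, b in zip(rx, ry))

# Spearman r = 1 - 6 * sum(d^2) / (N^3 - N), valid only with no ties
rho = 1 - 6 * d2_sum / (n ** 3 - n)
```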


Kendall’s tau-a: p values same as Spearman’s but r always lower


Correlation between A and B: A causes B, B causes A, or C causes both A and B. Correlation is not causation, but causation always implies correlation.


ANOVA (Analysis of Variance) measures an outcome (dependent) variable on a continuous scale which depends on one or more, often categorical, predictor variables that we either manipulate or measure. Categorical predictor variables are called factors or independent variables. The outcome is also affected by error (everything else).


The differences between the scores on the outcome are represented by the variance of the outcome scores (the difference between each person’s score and the mean score).


Two questions: (1) How much of the variance (difference) between the two groups is due to predictor variable? And (2) Is this proportion of variance statistically significant (larger than we would expect by chance if null hypothesis were true)?


Partition variance into: 1. Total variance 2. Variance owing to predictors (differences between groups) 3. Error (differences within groups)


Variance sums of squares (of squared deviations from the mean): 1. Total sum of squares (SS) 2. Between groups sum of squares 3. Error (within group) sum of squares (SS within or SS error).


Between groups sum of squares: variance that represents the difference between the groups (SS between). Sometimes refers to the between groups sum of squares for one predictor (SS predictor).


SStotal=sigma (x-mean x, mean of all scores) squared


SSwithin=sigma (x-mean of the group) squared


SSbetween=SStotal-SSwithin  SStotal=SSwithin+SSbetween


Effect size=SSbetween/SStotal=R squared=eta squared


Statistical significance

Calculate Mean Squares (MS): between and within (but not total).


Step 1 calculate degrees of freedom (dftotal, between, within/error): dftotal=dfwithin+dfbetween

dftotal=N-1    dfbetween=g-1 (g the number of groups)    dfwithin=dftotal-dfbetween


Step 2 calculate MS    MSbetween=SSbetween/dfbetween   MSwithin=SSwithin/dfwithin


Step 3 Calculate F ratio (test statistic for ANOVA) = MSbetween/MSwithin=(SSbetween/dfbetween)/(SSwithin/dfwithin)


Step 4 calculate p value for F with degrees of freedom between and within (use table): report F with degrees of freedom
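The whole one-way ANOVA calculation (sums of squares, df, mean squares, F, and eta squared) in Python, on made-up scores for three groups:

```python
# Hypothetical scores for three groups
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0],
]

all_scores = [s for g in groups for s in g]
n = len(all_scores)
grand_mean = sum(all_scores) / n

# SStotal = sum of (x - grand mean)^2 over all scores
ss_total = sum((s - grand_mean) ** 2 for s in all_scores)

# SSwithin = sum of (x - group mean)^2 within each group
ss_within = sum(sum((s - sum(g) / len(g)) ** 2 for s in g) for g in groups)

# SSbetween = SStotal - SSwithin
ss_between = ss_total - ss_within

# Degrees of freedom
df_between = len(groups) - 1          # g - 1
df_within = (n - 1) - df_between      # dftotal - dfbetween

# Mean squares and the F ratio
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

# Effect size: eta squared = SSbetween / SStotal
eta_squared = ss_between / ss_total
```

The p value is then read from an F table with (df_between, df_within) degrees of freedom.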


ANOVA and the t test are the same when we have 2 groups: F=t squared.


But ANOVA can be used to analyze more than 2 groups and to calculate the p value associated with a regression line.


No assumption that the outcome variable is normally distributed, but the data within each group must be normally distributed and the SD within each group equal (homogeneity of variance, with variance being the square of the SD).


ANOVA tests the null hypothesis that the mean of x of group 1=mean of x of group 2=…=mean of x of group k. Post hoc tests determine where the difference comes from if we reject the null hypothesis. Post hoc tests compare each group to each other group. The number of needed tests=k*(k-1)/2, where k is the number of groups.


Using only t-tests to compare 3 groups creates alpha inflation (type I error: rejecting a true null hypothesis) owing to an increase in the type I error rate above the nominal rate of 0.05. So we use Bonferroni-corrected confidence intervals, dividing alpha by the number of post hoc tests, and Bonferroni-corrected statistical significance (multiplying the p value of each t test by the number of tests, here 3).


Measures should be reliable (measure consistently) and valid (measure what they are supposed to measure).


Test is reliable if: (1) it is an accurate measure (2) results are dependable (3) using the measure again obtains same results


Reliability means temporal stability (test-retest reliability) and internal consistency (all parts measure the same thing).


Measuring stability over time with the Bland-Altman limits of agreement


Line of equality: the line that all the points would lie on if each person had scored the same on both measures. The range of values within which a person’s score is likely to lie is the limits of agreement.


Step 1 calculate the difference between the scores at time 1 and time 2

Step 2 draw a scatterplot with the mean score for each person on the x axis and the difference on the y axis

Step 3 find the mean of the difference scores

Step 4 find the SD of the difference scores

Step 5 find the 95% limits of agreement (if the data are normally distributed, 95% of the sample lie within 1.96 SDs of the mean difference): Lower limit=mean-1.96*SD  Upper limit=mean+1.96*SD

Step 6 add horizontal lines to the scatterplot, showing the limits of agreement and the mean difference
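The numerical steps (1 and 3–5; steps 2 and 6 are plotting) in Python, with made-up test–retest scores:

```python
import math

# Hypothetical scores for the same 5 people measured at time 1 and time 2
time1 = [10.0, 12.0, 11.0, 14.0, 13.0]
time2 = [11.0, 11.0, 13.0, 14.0, 12.0]

# Step 1: difference between the two measurements per person
diffs = [a - b for a, b in zip(time1, time2)]
# (Step 2 would plot each person's mean score against the difference)
means = [(a + b) / 2 for a, b in zip(time1, time2)]

n = len(diffs)
# Step 3: mean of the difference scores
mean_diff = sum(diffs) / n
# Step 4: SD of the difference scores
sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
# Step 5: 95% limits of agreement
lower = mean_diff - 1.96 * sd_diff
upper = mean_diff + 1.96 * sd_diff
```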

We tell how far apart measures are with SD or variance (SD squared): variance between items within one person and variance between people.


Cronbach’s (coefficient) alpha: estimate of correlation between true score and measured score


Standardized alpha, when the variance of each item is equal=(k, the number of items in the scale, * the average inter-item correlation, excluding the 1s)/[1+(k-1)*average correlation]. 0.7 is a high alpha. (The square of a correlation gives the proportion of variance shared by 2 variables. Alpha is a correlation; squaring 0.7 gives 0.49, just under 0.5, so an alpha higher than 0.7 guarantees that more than half the variance in the measure is true-score variance.)
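The standardized-alpha formula in Python, with a hypothetical scale length and average inter-item correlation; it also shows the point made below, that a longer test raises alpha:

```python
def standardized_alpha(k, avg_r):
    # (k * average inter-item correlation) / (1 + (k - 1) * average correlation)
    return (k * avg_r) / (1 + (k - 1) * avg_r)

# Hypothetical: a 10-item scale with an average inter-item correlation of 0.3
alpha = standardized_alpha(10, 0.3)

# Same items, shorter test: alpha drops
alpha_short = standardized_alpha(5, 0.3)
```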


A high alpha reflects highly correlated items or a longer test.


Correlation not useful with categorical data. Cohen’s Kappa: agreement beyond chance agreement.


Step 1 enter data into cells ABCD (AD in agreement, BC in disagreement)

Step 2 calculate expected frequencies only for A and D (E=R*C/T: Row total, Column total, Grand Total)

Step 3 Kappa=[(A+D)-(E(A)+E(D))]/[N-(E(A)+E(D))]
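The three steps in Python, with a made-up agreement table for two raters:

```python
# Hypothetical 2x2 agreement table for two raters:
# A and D are agreements, B and C are disagreements
a, b, c, d = 40, 10, 5, 45
n = a + b + c + d

# Step 2: expected frequencies by chance, for the agreement cells only
# (E = Row total * Column total / Grand total)
e_a = (a + b) * (a + c) / n
e_d = (c + d) * (b + d) / n

# Step 3: Kappa = [(A + D) - (E(A) + E(D))] / [N - (E(A) + E(D))]
kappa = ((a + d) - (e_a + e_d)) / (n - (e_a + e_d))
```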


<0.2 poor agreement 0.2-0.4 fair 0.4-0.6 moderate 0.6-0.8 good 0.8-1 very good


In case of ordinal data, we use weighted kappa, weighting the level of disagreement by the distance between the measures (rating the measures and squaring the ratings).


Copyright Notice

This material is copyrighted. Free, unrestricted use is allowed on a non commercial basis.
The author's name and a link to this Website must be incorporated in any reproduction of the material for any use and by any means.
