Automatically Enhancing Trading Backtests with Parameter Tuning

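If you have been following my previous articles, you have probably seen that I have been working on a machine-learning-driven trading strategy. The goal is to beat the market without much input from me beyond the initial code development, which is easier said than done. Many of my previous attempts focused on backtesting a strategy built on a time series model called Facebook Prophet. That strategy was usually backtested against the cryptocurrency market or combined with sentiment analysis.

This time, I will backtest it against the good old stock market. The goal of the strategy is simply to outperform a basic buy-and-hold strategy on the same stock while still ending up profitable. This time, however, I will enhance it by tuning the many custom parameters I built for this AI-driven trading strategy. I have often found that these parameters need to be adjusted differently for different stocks, but I would rather not change them by hand every time I backtest a new stock.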

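To automate this task, I need to accomplish the following:

Make sure my custom functions integrate seamlessly with one another.
Loop through the various combinations of the parameters I created.
Develop an analysis function that automatically evaluates each backtest to decide whether the backtest needs to be run again with different parameters.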

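In the sections that follow, you may notice some familiar functions from my previous articles, with a few small changes here and there. I will point out any significant changes, but for the most part they are the same as before. Now let's get started!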

Python Libraries to Import

# Libraries
import pandas as pd
from datetime import datetime, timedelta
from tqdm.notebook import tqdm
import numpy as np
import plotly.express as px
from prophet import Prophet
import yfinance as yf
import itertools
import time
import random
from scipy import stats
from statsmodels.stats.weightstats import ztest

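Data and Facebook Prophet

The following functions are used to, first, retrieve price data from Yahoo Finance, then instantiate Facebook Prophet with some default parameters, and finally run it over a specific time frame: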

def getStockPrices(stock, n_days, training_days, mov_avg):
    """
    Gets stock prices from now to N days ago and training amount will be in addition 
    to the number of days to train.
    """
    
    # Designating the Ticker
    ticker = yf.Ticker(stock)

    # Getting all price history
    price_history = ticker.history(period="max")
    
    # Check on length
    if len(price_history)<n_days+training_days+mov_avg:
        return pd.DataFrame(), price_history
    
    # Getting relevant length
    prices = price_history.tail(n_days+training_days+mov_avg)
        
    # Filling NaNs with the most recent values for any missing data
    # (ffill() replaces the deprecated fillna(method='ffill'))
    prices = prices.ffill()
    
    # Getting the N Day Moving Average and rounding the values for some light data preprocessing
    prices['MA'] = prices[['Close']].rolling(
        window=mov_avg
    ).mean().apply(lambda x: round(x, 2))

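    # Resetting format for FBP
    prices = prices.reset_index().rename(
        columns={"Date": "ds", "MA": "y"}
    )

    # Prophet does not accept timezone-aware dates; newer yfinance versions
    # return them, so the timezone is dropped (a no-op if already naive)
    prices['ds'] = prices['ds'].dt.tz_localize(None)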
    
    # Dropping the Nans
    prices.dropna(inplace=True, subset=['y'])
    
    return prices, price_history
  
  
def fbpTrainPredict(df, forecast_period, interval_width=0.80):
    """
    Uses FB Prophet and fits to a appropriately formatted DF. Makes a prediction N days into 
    the future based on given forecast period. Returns predicted values as a DF.
    """
    # Setting up prophet
    m = Prophet(
        daily_seasonality=True, 
        yearly_seasonality=True, 
        weekly_seasonality=True,
        interval_width=interval_width
    )
    
    # Fitting to the prices
    m.fit(df[['ds', 'y']])
    
    # Future DF
    future = m.make_future_dataframe(
        periods=forecast_period,
        freq='B',
        include_history=False
    )
            
    # Predicting values
    forecast = m.predict(future)

    # Returning a set of predicted values
    return forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
  
  
def runningFBP(prices, forecast_period, training_days, interval_width):
    """
    Runs Facebook Prophet to get predictions over a set period 
    of time. Uses FBP to train and predict every N days and gets the 
    price forecasts.
    """
    # DF for the predicted values
    pred_df = pd.DataFrame()

    # Running the model each day
    for i in tqdm(range(training_days, len(prices)+1), leave=False):
        
        # Training then Predicting the last day of the forecast
        forecast = fbpTrainPredict(
            prices[i-training_days:i], 
            forecast_period,
            interval_width=interval_width
        ).tail(1)
                
        # Adding the forecast predicted (last day)
        # (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
        pred_df = pd.concat([pred_df, forecast], ignore_index=True)
        
    # Prepping for merge by converting date values to be the same type
    pred_df['ds'] = pred_df['ds'].apply(lambda x: str(x)[:10])

    prices['ds'] = prices['ds'].apply(lambda x: str(x)[:10])
    
    # Shifting the forecasts back in order to compare it to the 'current' open values
    pred_df[['yhat', 'yhat_lower', 'yhat_upper']] = pred_df[['yhat', 'yhat_lower', 'yhat_upper']].shift(-forecast_period)
    
    # Merging with the prices DF in order to compare values for positions later
    merge_df = prices[['ds', 'Open']].merge(
        pred_df,
        on='ds',
        how='outer'
    ).dropna().set_index('ds')
    
    return merge_df

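Function Descriptions

These three functions are very similar, if not identical, to the ones I used in my previous articles. Here is a brief description of each:

getStockPrices(): retrieves the price history for a given stock and formats the returned Pandas DataFrame so that it fits seamlessly into Facebook Prophet.
fbpTrainPredict(): instantiates Prophet with a set of default parameters and makes a forecast based on the provided DF.
runningFBP(): repeatedly fits a fresh Facebook Prophet model over a specific length of time, collects its forecasts, and returns a DF of those forecasts.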
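To make the walk-forward loop in runningFBP() more concrete, here is a minimal sketch of which rows each refit sees. The helper name rolling_windows is hypothetical and not part of the code in this article; it only mirrors the loop bounds and the slice prices[i-training_days:i] used above.

```python
def rolling_windows(n_rows, training_days):
    """Return the (start, stop) slice bounds of every training window.

    Hypothetical helper: mirrors range(training_days, n_rows + 1) and the
    slice prices[i - training_days:i] from runningFBP().
    """
    return [(i - training_days, i) for i in range(training_days, n_rows + 1)]


# Six rows of prices and a three-row training window give four refits
windows = rolling_windows(n_rows=6, training_days=3)
print(windows)
```

Every window has the same length and the model is refit once per window, which is why longer backtests and larger parameter grids take noticeably longer to run.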

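Establishing Prophet Trading Positions

This function establishes the logic of the trading strategy. It decides whether a position should be bought, shorted, or held/exited. If you have read my previous articles, this position function may look familiar.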

def fbpPositions(pred_df, short=True):
    """
    Gets positions based on the predictions and the actual values. This
    is the logic of the trading strategy.
    """
    # The price column from yfinance is named 'Open' (capitalized)
    if pred_df['Open'] < pred_df['yhat_lower']:
        return 1
    elif pred_df['Open'] > pred_df['yhat_upper'] and short:
        return -1
    else:
        return 0

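Essentially, this function looks at FBP's price forecasts and compares them with the "Open" price. Because FBP predicts an upper, middle, and lower price, those values are compared against the "Open" price to determine the strategy's recommended trading position.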
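As a quick illustration of that comparison, the sketch below applies the same rule to three made-up rows. The function name position and all of the prices are hypothetical; the rule itself follows fbpPositions() above.

```python
def position(row, short=True):
    """Hypothetical mirror of fbpPositions() for a plain dict row."""
    if row["Open"] < row["yhat_lower"]:
        return 1       # open is below the forecast band: buy
    elif row["Open"] > row["yhat_upper"] and short:
        return -1      # open is above the forecast band: short
    return 0           # open is inside the band (or shorting is off): hold/exit


# Made-up rows: below the band, above the band, inside the band
rows = [
    {"Open": 95.0, "yhat_lower": 98.0, "yhat_upper": 104.0},
    {"Open": 106.0, "yhat_lower": 98.0, "yhat_upper": 104.0},
    {"Open": 100.0, "yhat_lower": 98.0, "yhat_upper": 104.0},
]

signals = [position(r) for r in rows]
long_only = [position(r, short=False) for r in rows]
print(signals, long_only)
```

With short=False the second row falls through to 0, so the strategy simply stays out of the market instead of shorting.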

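Analyzing Risk and Performance

Next, I need a function that can evaluate the performance of a backtest automatically, without any input from me. In each of my related articles, I had to inspect a chart of the backtest results to judge the performance. That approach is fine when evaluating a single backtest with specific parameters. However, in order to iterate through the various parameter combinations I want to test, I need a function that checks the performance of a backtest without a visualization and my personal assessment.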

def riskAnalysis(performance, prices, price_history, interval_width):
    """
    Analyzes the performance DataFrame to calculate various
    evaluation metrics on the backtest to determine if
    the backtest performance was favorable.
    """
    ### Hypothesis testing average returns
    
    # Weekly returns for fb prophet
    rets = performance['fbp_positions'].pct_change(5).dropna()

    # Buy and hold for about the last two years for the stock
    hold = price_history['Close'].tail(500).apply(np.log).diff().cumsum().apply(np.exp).dropna()

    # Weekly returns in those years
    hold_ret = hold.pct_change(5).mean()

    # Average returns
    if len(performance)<=30:
        # T-testing
        stat_test = stats.ttest_1samp(
            rets, 
            popmean=hold_ret
        )
    else:
        # Z-testing
        stat_test = ztest(
            rets, 
            value=hold_ret
        )

    # Ending portfolio balance
    bal = performance.tail(1)
    
    # Moving Average returns
    ma_ret = performance.rolling(window=5).mean().dropna()
    
    # How often fbp beats holding
    ma_ret['diff'] = ma_ret['fbp_positions'] > ma_ret['buy_hold']
    
    diff = ma_ret['diff'].mean()

    # How often the fbp portfolio had a balance greater than its initial balance
    ma_ret['beat_bal'] = ma_ret['fbp_positions'] > 1
    
    beat_bal = ma_ret['beat_bal'].mean()
    
    # How often fbp MA returns were positive
    ma_ret['uptrend'] = ma_ret['fbp_positions'].diff().dropna()>=0
    
    uptrend = ma_ret['uptrend'].mean()
    
    # Performance score
    score = 0
    
    # P-value check
    if stat_test[1]<0.05:
        score += 1
    
    # Checking ending portfolio balance
    # (.iloc[0] selects by position; plain [0] on a labeled Series is deprecated)
    if bal['fbp_positions'].iloc[0]>bal['buy_hold'].iloc[0] and bal['fbp_positions'].iloc[0]>1: 
        score += 1
    
    # How often fbp outperformed buy and hold
    if diff>.8: 
        score += 1
        
    # How often fbp had returns greater than the initial portfolio balance
    if beat_bal>.6: 
        score += 1
        
    # How often fbp had positive upward trend
    if uptrend>.55:
        score += 1
        
    # Dictionary containing values
    score_res = {
        "result": True,
        "score": score,
        "endingBalance": {
            "prophet": bal['fbp_positions'].iloc[0],
            "buyHold": bal['buy_hold'].iloc[0]
        },
        "betterThanBuyHold": diff,
        "greaterPortfolioBalance": beat_bal,
        "upwardTrend": uptrend,
        "pValue": stat_test[1],
        "interval_width": interval_width
    }
    
    if score>=5:
                
        return score_res
    
    else:
        # Backtest result is bad
        score_res['result'] = False
        
        return score_res

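In this function, I chose to create custom evaluation metrics to assess the backtest results. Here are the five things I look for:

The ending portfolio balance of the backtest.
How often it outperformed the buy-and-hold strategy over the same time frame.
How often the portfolio balance was greater than its initial balance.
How often the returns were positive (measured on the 5-day moving average of the balance).
Whether the average weekly return of this trading strategy is statistically significant when compared with the average weekly return of buy-and-hold over roughly the past two years.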
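For readers who want to see what the last check computes, here is a hand-rolled sketch of a textbook two-sided, one-sample z-test using only the standard library. It is not the statsmodels implementation called in riskAnalysis(), the function name one_sample_ztest is hypothetical, and the weekly returns are made-up numbers.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_ztest(sample, value):
    """Two-sided one-sample z-test of mean(sample) against a fixed value."""
    n = len(sample)
    std_err = stdev(sample) / sqrt(n)           # standard error of the mean
    z = (mean(sample) - value) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up weekly strategy returns
weekly_returns = [0.012, -0.004, 0.009, 0.015, -0.002, 0.007, 0.011, 0.003]

# Compared with its own mean there is no difference at all
z_same, p_same = one_sample_ztest(weekly_returns, mean(weekly_returns))

# Compared with a much lower benchmark return the difference is clear
z_high, p_high = one_sample_ztest(weekly_returns, -0.05)
print(z_same, p_same, z_high, p_high)
```

Because the test is two-sided, a small p-value only says the two averages differ. Whether the strategy actually came out ahead is left to the other four checks.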

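To be considered a successful backtest performance, the trading strategy must pass every one of these evaluation metrics. You may notice that many of the evaluation metrics use hard-coded values. I set these values arbitrarily myself, but I believe they are still meaningful enough to be worth passing. (If you think they should be different, feel free to experiment with the code yourself, which I provide at the end of this article.)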
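The pass/fail rule can be summarized in a few lines. The sketch below assumes the five metrics have already been computed and are passed in as a plain dict; the function name score_backtest and the dict keys are hypothetical, while the thresholds are the hard-coded values from riskAnalysis().

```python
def score_backtest(m):
    """Count how many of the five checks a backtest passes.

    `m` is a hypothetical dict of already-computed metrics; the thresholds
    are the hard-coded values from riskAnalysis().
    """
    checks = [
        m["p_value"] < 0.05,
        m["end_balance"] > m["hold_balance"] and m["end_balance"] > 1,
        m["beats_hold"] > 0.8,
        m["above_start"] > 0.6,
        m["uptrend"] > 0.55,
    ]
    score = sum(checks)
    return score, score >= 5


# Made-up metrics that pass every check
good = {
    "p_value": 0.01,
    "end_balance": 1.25,
    "hold_balance": 1.10,
    "beats_hold": 0.85,
    "above_start": 0.90,
    "uptrend": 0.60,
}

# Same backtest, but it beat buy-and-hold only 75% of the time
near = dict(good, beats_hold=0.75)

print(score_backtest(good), score_backtest(near))
```

A single missed threshold is enough to mark the backtest as unsuccessful, which is what pushes the tuning loop on to the next parameter combination.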

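Creating the Backtest Function

With the risk analysis function complete, I can now create a backtest function that chains together all of the functions I have built so far.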

def backtestStock(stock, pred_df, prices, price_history, interval_width):
    
    # Adding positions to the forecast DF
    positions = pred_df

    # Getting forecast prophet positions 
    positions['fbp_positions'] = positions.apply(
        lambda x: fbpPositions(x, short=True), 
        axis=1
    )

    # Buy and hold position
    positions['buy_hold'] = 1
    
    # Getting daily returns
    log_returns = prices[['ds', 'Close']].set_index(
        'ds'
    ).loc[positions.index].apply(np.log).diff()
    
    # The positions to backtest (shifted ahead by 1 to prevent lookahead bias)
    bt_positions = positions[[
        'buy_hold', 
        'fbp_positions'
    ]].shift(1)

    # The returns during the backtest
    returns = bt_positions.multiply(
        log_returns['Close'], 
        axis=0
    )

    # Inverting the log returns to get the daily portfolio balance
    # (ffill() replaces the deprecated fillna(method='ffill'))
    performance = returns.cumsum().apply(
        np.exp
    ).dropna().ffill()
    
    # Performing risk analysis
    risk = riskAnalysis(performance, prices, price_history, interval_width)
                
    return risk, performance

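You will notice that I stuck with the vectorized backtesting approach I have used many times before. In any case, after running the risk analysis function on the backtest results, this function returns a simple report of the backtest along with its performance analysis.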
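To show the arithmetic behind that vectorized approach, here is a minimal standard-library sketch: yesterday's position is multiplied by today's log return, the products are summed, and the running sum is exponentiated to get the portfolio balance. The function name backtest_balance and the closing prices are made up for illustration.

```python
from math import exp, log


def backtest_balance(closes, positions):
    """Portfolio balance path for a list of closes and daily positions.

    The position decided on day t-1 earns the log return realized on
    day t, which is the one-day shift used to avoid lookahead bias.
    """
    balance, total = [], 0.0
    for t in range(1, len(closes)):
        log_ret = log(closes[t] / closes[t - 1])
        total += positions[t - 1] * log_ret
        balance.append(exp(total))
    return balance


# Made-up closing prices
closes = [100.0, 102.0, 101.0, 105.0]

hold = backtest_balance(closes, [1, 1, 1, 1])       # always long
flat = backtest_balance(closes, [0, 0, 0, 0])       # never in the market
short = backtest_balance(closes, [-1, -1, -1, -1])  # always short
print(hold[-1], flat[-1], short[-1])
```

Being always long ends at the last close divided by the first (1.05 here), staying flat leaves the balance at 1, and being always short ends at the inverse ratio.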

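Tuning the Parameters

As you may have noticed in each of the functions I created, I included a number of adjustable variables:

Training days for FBP: the number of days used to train the model.
Moving average amount: how many days to average over in order to smooth the price data (usually set to 3 or 5).
Forecast period: how many days into the future the model will forecast (usually set to 3 or 5).
Interval width: how wide the range between the upper and lower price forecasts is (defaults to 0.8).
Backtest length: the number of days to backtest the model over (the longer it is, the longer it takes to backtest multiple parameters).

Each of these variables can have a significant impact on the performance results. Every stock I have tried needed an adjustment to at least one of them.

Previously, I would have settled for hard-coding the backtest parameters and leaving them as they were. Now, I believe the next step is to find the best combination of parameters in order to achieve the best possible backtest.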

def parameterTuning(stock, n_days_lst, training_days_lst, mov_avg_lst, 
                    forecast_period_lst, fbp_intervals, stop_early=True):
    """
    Given a list of parameters for a specific stock. Iterates through
    different combination of parameters until a successful backtest
    performance is found.
    
    Optional stop_early variable for stopping tuning immediately 
    when a positive backtest result is found
    """
    
    # Tuning the stock with FB Prophet
    print(f"Tuning FBP parameters for {stock}. . .")
    
    # All combinations of the parameters
    params = [n_days_lst, training_days_lst, mov_avg_lst, forecast_period_lst, fbp_intervals]
    
    lst_params = list(itertools.product(*params))
    
    # Randomizing order of params
    random.shuffle(lst_params)
    
    # List of tested params
    param_lst = []
    
    # Iterating through combos
    for param in tqdm(lst_params):

        # Retrieving prices with the given parameters
        prices, price_history = getStockPrices(
            stock, 
            n_days=param[0],
            training_days=param[1], 
            mov_avg=param[2]
        )

        # Checking if the prices retrieved are empty
        if prices.empty:
            print(f"Not enough price history for {stock}; skipping backtest...")
            continue

        # Running Facebook Prophet with the set parameters
        pred_df = runningFBP(
            prices, 
            forecast_period=param[3], 
            training_days=param[1],
            interval_width=param[4]
        )

        # Running backtest
        backtest, performance = backtestStock(
            stock, 
            pred_df, 
            prices, 
            price_history, 
            interval_width=param[4]
        )
        
        # Creating param dictionary to record results
        res = {
            "n_days": param[0],
            "training_days": param[1],
            "mov_avg": param[2],
            "forecast_period": param[3],
            "interval_width": param[4],
            "backtestAnalysis": backtest,
            "bt_performance": performance
        }
        
        # Appending the results
        param_lst.append(res)
                
        # Checking backtest result
        if backtest['result']==True and stop_early==True:
                        
            return {
                "optimumParamLst": param_lst,
                "optimumResultFound": True
            }
        
    # Nothing could be backtested (e.g. not enough price history)
    if not param_lst:
        return {
            "optimumParamLst": [],
            "optimumResultFound": False
        }

    # Sorted parameter results list; best result is last
    param_lst = sorted(
        param_lst,
        key=lambda x: (
            x['backtestAnalysis']['score'], 
            x['backtestAnalysis']['pValue']*-1,
            x['backtestAnalysis']['endingBalance']['prophet']
        )
    )

    # The search succeeded only if the best (last) result passed every check,
    # rather than whichever combination happened to be tested last
    return {
        "optimumParamLst": param_lst,
        "optimumResultFound": param_lst[-1]['backtestAnalysis']['result']
    }

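This function takes a list of different values for each of the variables listed above. It then iterates through the many parameter combinations that can be made until a successful backtest is found for that particular stock.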
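As a rough sketch of that search, the snippet below builds the grid with itertools.product from the value lists used in this article, shuffles it, and stops at the first combination that passes. The helper first_passing and the stand-in pass condition are hypothetical; in the real function each combination triggers a full backtest.

```python
import itertools
import random

# Value lists used in this article
n_days_lst = [100]
training_days_lst = [50, 100, 200, 400]
mov_avg_lst = [3, 5]
forecast_period_lst = [3, 5]
fbp_intervals = [0.80, 0.90, 0.99]

# Every combination of the five lists: 1 * 4 * 2 * 2 * 3 = 48
combos = list(itertools.product(
    n_days_lst, training_days_lst, mov_avg_lst,
    forecast_period_lst, fbp_intervals
))

random.seed(0)  # assumption: seeded only so this sketch is reproducible
random.shuffle(combos)


def first_passing(combos, passes):
    """Return (combo, number tried) for the first passing combo."""
    for tried, combo in enumerate(combos, start=1):
        if passes(combo):
            return combo, tried
    return None, len(combos)


# Hypothetical stand-in for a full backtest that passes all five checks
found, tried = first_passing(
    combos, lambda c: c[1] == 200 and c[4] == 0.90
)
print(len(combos), found, tried)
```

Since every combination means refitting Prophet many times, the size of the grid grows quickly with each extra value, and shuffling keeps the early-stopping search from always starting at the same corner of the grid.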

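The stop_early parameter stops the search as soon as a passing set of parameters is found. There may be many sets of optimum parameters, but the option is there if you only want the first set that passes all of the analysis checks as quickly as possible. If no optimum combination is found, the "best" set of parameters is returned instead (the most analysis checks passed, then the lowest p-value, then the highest ending portfolio balance).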
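The ranking of that "best" fallback can be seen in a small sketch. The result dicts below are made up and flattened for readability (the real ones nest these values under backtestAnalysis), but the sort key follows the one in parameterTuning().

```python
# Made-up, flattened backtest results
results = [
    {"name": "A", "score": 4, "pValue": 0.03, "endingBalance": 1.10},
    {"name": "B", "score": 5, "pValue": 0.04, "endingBalance": 1.05},
    {"name": "C", "score": 5, "pValue": 0.01, "endingBalance": 1.02},
    {"name": "D", "score": 4, "pValue": 0.03, "endingBalance": 1.20},
]

# Ascending sort: higher score, then lower p-value (hence the minus sign),
# then higher ending balance, so the best result ends up last
ranked = sorted(
    results,
    key=lambda r: (r["score"], -r["pValue"], r["endingBalance"])
)

best = ranked[-1]
print([r["name"] for r in ranked], best["name"])
```

Note that the ending balance only matters as a tie-breaker: C ranks above B despite the lower balance because both have the same score and C has the lower p-value.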

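Backtesting Stocks with Current Data

Assuming the parameter tuning function is able to deliver a successful backtest, what comes next? Using these functions on a list of stocks and producing trading position recommendations. Finally, I can use this strategy on today's stock prices!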

def backtestStockAndPredict(stock, n_days_lst, training_days_lst, mov_avg_lst, 
                            forecast_period_lst, fbp_intervals,stop_early=True, visualize=True):
    
    # Printing the stock
    print(f"\nBacktesting {stock}. . .")

    # Tuning parameters for stock
    results = parameterTuning(
        stock, 
        n_days_lst, 
        training_days_lst, 
        mov_avg_lst, 
        forecast_period_lst, 
        fbp_intervals,
        stop_early=stop_early
    )
    
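    if results['optimumResultFound']==False:
        print(f"\t***No optimum params found for {stock}***")
    
    # Nothing to predict with if no parameter set could be backtested
    if not results['optimumParamLst']:
        return
    
    # Optimum Parameters
    opt_params = results['optimumParamLst'][-1]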
    
    # Visualizing last (best) result
    if visualize:
        
        # Getting the performance DF
        performance = opt_params['bt_performance']
        
        # Visual of the backtest
        fig = px.line(
            performance,
            x=performance.index,
            y=performance.columns,
            title=f'FBProphet vs Buy&Hold for {stock}',
            labels={"value": "Portfolio Balance",
                    "index": "Date"}
        )

        fig.show()
    
    # Retrieving prices with the given parameters
    prices, price_history = getStockPrices(
        stock, 
        n_days=opt_params['n_days'],
        training_days=opt_params['training_days'], 
        mov_avg=opt_params['mov_avg']
    )
            
    # Run Prophet for current prediction
    preds = fbpTrainPredict(
        prices.tail(opt_params['training_days']), 
        opt_params['forecast_period'],
        opt_params['interval_width']
    ).tail(1)

    preds['Open'] = prices.tail(1)['Open'].values

    # Getting forecast prophet positions
    trade_decision = fbpPositions(preds.to_dict('records')[0], short=True)

    trade_dict = {
        1 : f"Buy {stock}",
        0 : f"Exit {stock}/Do nothing",
        -1: f"Short {stock}"
    }

    # Printing trade decision
    print(trade_dict[trade_decision])
    
    # Printing the optimum params
    print("Best Optimum Parameters Found:\n", opt_params)

    return

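In this function, a given stock goes through the usual backtesting process, risk analysis, and parameter tuning. Afterwards, the function uses the best parameters found to make another forecast, this time based on today's price data. Finally, I should get the day's position recommendation printed out!

Backtesting a List of Stocks

In the code below, I set up lists of parameter values to test in order to find out which set of parameters performs best. There is nothing special about the list of stocks I chose. I wanted to see how this trading strategy performs across different kinds of industries and their respective tickers.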

# Different parameter lists to test
n_days_lst = [100]

training_days_lst = [50, 100, 200, 400]

mov_avg_lst = [3, 5]

forecast_period_lst = [3, 5]

fbp_intervals = [0.80, 0.90, 0.99]

# List of stock tickers to backtest
tickers = ["MSFT", "JNJ", "DIS", "LMT", "GOOG"]

# Shuffling the tickers
random.shuffle(tickers)

# Iterating through the tickers
for i in tickers:
    
    # Backtesting each one
    backtestStockAndPredict(
        stock=i,
        n_days_lst=n_days_lst,
        training_days_lst=training_days_lst,
        mov_avg_lst=mov_avg_lst,
        forecast_period_lst=forecast_period_lst,
        fbp_intervals=fbp_intervals,
        stop_early=True,
        visualize=True
    )

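With this code block, I can iterate through the list and use my custom functions and parameter values to find a successful backtest result...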