Choosing the best expression from all possible combinations of variables, unary and binary operators, and hyperparameters
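I have some financial variables in the equity domain, such as OHLC prices, volume, and other fundamentals at different time frequencies. Using this set, I am creating an expression that gives the weight to invest in a particular stock, across the whole universe, on a particular day.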
For example, the expression could be

$$ (close-open)/vwap $$

or

$$ (close-average(close,10))/std(close,10) $$
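The expression is evaluated on every stock in the universe, and the resulting vector is normalized to arrive at the weight assigned to each stock. A positive weight means we go long the stock, a negative weight means we go short, and a weight of 0 means no position is taken in that stock.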
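To make the evaluate-then-normalize step concrete, here is a minimal sketch in plain Python. Several things in it are assumptions rather than part of the setup described above: the three-stock universe and its prices are made up, `std` is taken to be the population standard deviation, and "normalized" is read as scaling so that the absolute weights sum to 1.

```python
from statistics import mean, pstdev

def zscore_signal(close, window=10):
    """(close - average(close, window)) / std(close, window) on the latest bar."""
    recent = close[-window:]
    sd = pstdev(recent)  # assumption: population standard deviation
    return 0.0 if sd == 0 else (close[-1] - mean(recent)) / sd

def normalize(raw):
    """Scale raw scores so absolute weights sum to 1 (assumed convention)."""
    gross = sum(abs(x) for x in raw.values())
    if gross == 0:
        return {k: 0.0 for k in raw}
    return {k: v / gross for k, v in raw.items()}

# Hypothetical universe: one stock jumps, one drops, one is flat.
prices = {
    "AAA": [10.0] * 9 + [12.0],
    "BBB": [20.0] * 9 + [17.0],
    "CCC": [5.0] * 10,
}

raw_scores = {ticker: zscore_signal(close) for ticker, close in prices.items()}
weights = normalize(raw_scores)
print(weights)  # positive = long, negative = short, 0 = no position
```

Other normalizations (for example demeaning the scores first so the book is dollar neutral) would fit the same description; the choice here is only for illustration.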
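This setup is repeated over past data and backtested to evaluate the expression's performance. The backtest reports metrics such as the Sharpe ratio or absolute returns.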
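As a rough illustration of what such a backtest computes, the sketch below turns a history of weight vectors into daily portfolio returns and an annualized Sharpe ratio. The assumptions are mine: weights chosen on one day earn the next day's stock returns (the two lists are already aligned that way), the risk-free rate is zero, there are no transaction costs, and a year has 252 trading days. The function names and the sample numbers are hypothetical.

```python
import math
from statistics import mean, stdev

def portfolio_returns(weights_by_day, next_day_returns):
    """Daily P&L: sum over stocks of weight times the return earned on it."""
    return [
        sum(w[ticker] * r[ticker] for ticker in w)
        for w, r in zip(weights_by_day, next_day_returns)
    ]

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = stdev(daily_returns)
    if sd == 0:
        return 0.0
    return mean(daily_returns) / sd * math.sqrt(periods_per_year)

# Made-up two-day history for a two-stock universe.
weights_by_day = [{"A": 0.5, "B": -0.5}, {"A": 1.0, "B": 0.0}]
next_day_returns = [{"A": 0.02, "B": -0.02}, {"A": 0.01, "B": 0.03}]

pnl = portfolio_returns(weights_by_day, next_day_returns)
sharpe = sharpe_ratio(pnl)
print(pnl, sharpe)
```

A function like `sharpe_ratio` is what a search procedure would call as its objective, once per candidate expression.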
Now, given a fixed set of variables, operators, and functions that have some meta parameters, what is the best algorithm for finding expressions that perform well over the historical data (say, all possible combinations with a Sharpe ratio above 4)? Also, it would be great to know whether the literature offers algorithms that, instead of searching the whole space of expressions, artificially generate new expressions from a human-created training set containing the sequences people followed to arrive at good expressions, along with expressions that didn't work.
The problem you describe can be treated as an optimization problem: evolve a program so that it maximizes some performance measure. The technique you may want to look into is called “Genetic Programming”. For a financial application, see for example “Single versus Multiple Tree Genetic Programming for Dynamic Decision Making”.
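To give a feel for what genetic programming looks like here, below is a minimal sketch in plain Python. It is not the method from the cited paper. Expressions are nested tuples over a small, assumed set of variables and operators, and the population is evolved with elitism, subtree crossover, and subtree mutation. The fitness function is a toy stand-in (closeness to `(close-open)/vwap` on made-up rows); in the setting from the question it would be the backtested Sharpe ratio. All names, probabilities, and population sizes are illustrative choices.

```python
import operator
import random

def safe_div(a, b):
    """Division that returns 0 instead of failing on a (near-)zero denominator."""
    return a / b if abs(b) > 1e-9 else 0.0

BINARY = {"add": operator.add, "sub": operator.sub, "mul": operator.mul, "div": safe_div}
UNARY = {"neg": operator.neg, "abs": abs}
VARIABLES = ["open", "close", "vwap"]

def random_tree(rng, depth=3):
    """Grow a random expression tree: a variable name or (op, child[, child])."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(VARIABLES)
    if rng.random() < 0.25:
        return (rng.choice(list(UNARY)), random_tree(rng, depth - 1))
    return (rng.choice(list(BINARY)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, row):
    """Evaluate a tree on one row of data (a dict from variable name to value)."""
    if isinstance(tree, str):
        return row[tree]
    op, *children = tree
    values = [evaluate(child, row) for child in children]
    return UNARY[op](*values) if len(values) == 1 else BINARY[op](*values)

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(rng, a, b):
    """Replace a random node of a with a random subtree of b."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)

def mutate(rng, tree):
    """Replace a random node with a freshly grown small subtree."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(rng, depth=2))

def evolve(fitness, rng, pop_size=50, generations=20, max_nodes=30):
    population = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 5]  # the top 20% survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(rng, a, b)
            if rng.random() < 0.3:
                child = mutate(rng, child)
            if sum(1 for _ in subtrees(child)) > max_nodes:
                child = a  # crude size cap to limit bloat
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

# Toy data and toy fitness; a real run would score each tree by its backtest.
rows = [
    {"open": 10.0, "close": 11.0, "vwap": 10.5},
    {"open": 20.0, "close": 19.0, "vwap": 19.5},
    {"open": 5.0, "close": 5.5, "vwap": 5.2},
]

def toy_fitness(tree):
    """Negative squared error against (close - open) / vwap over the toy rows."""
    return -sum(
        (evaluate(tree, r) - (r["close"] - r["open"]) / r["vwap"]) ** 2 for r in rows
    )

best = evolve(toy_fitness, random.Random(0))
print(best, toy_fitness(best))
```

Meta parameters such as the window length in `average(close,10)` are not covered by this sketch; one common way to include them is to add constant leaves or parameterized function nodes that mutation can also perturb. Any expression found this way has been selected on historical data, so it should be judged on data that the search never saw.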