Backtesting and live trading data processing and abstraction
I’m an individual trying to build a trading system that I hope will eventually scale to intraday strategies at 1-15 second resolution. I’m having some trouble understanding the difference between data feeds applied to a backtest and data feeds applied to live trading, and I have a few specific questions:
Are backtests and live trading typically built upon the same abstraction, or run on the same trading/processing engine?
My first thought is that they are, since they should ideally operate identically for backtesting accuracy and realism.
Do live data feeds and historic datasets typically provide an identical interface in their respective handlers for querying/retrieving their datasets?
The current state of my system only runs on historic data for backtests, with the trading engine itself updating the time step of the data handler. This doesn’t seem like a viable solution for higher-frequency real-time data, since it relies on the trading engine to update the data feed rather than letting the data handler manage it. After some research, it seems like a query-based mechanism would be more suitable for real-time data, since it gives data-management control to the handler. I’m having trouble understanding, though, how static historical data could be loaded into a data stream and processed in an identical way.
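To illustrate what I mean, here’s a rough sketch of the direction I’m considering (hypothetical names like `DataHandler` and `next_bar`, not code from my actual system): both the historical replay and the live feed sit behind the same pull interface, so the engine never knows which one it’s talking to.

```python
# Rough sketch with hypothetical names: the trading engine codes
# against DataHandler and never knows whether bars come from a file
# replay or a live feed.
import abc
import collections
import csv
import queue


class DataHandler(abc.ABC):
    @abc.abstractmethod
    def next_bar(self):
        """Return the next bar dict, or None when nothing is available."""


class HistoricalDataHandler(DataHandler):
    """Replays a CSV of bars as if it were a stream."""

    def __init__(self, path):
        with open(path, newline="") as f:
            self._bars = collections.deque(csv.DictReader(f))

    def next_bar(self):
        return self._bars.popleft() if self._bars else None


class LiveDataHandler(DataHandler):
    """Drains bars pushed onto an internal queue by the feed's thread."""

    def __init__(self):
        self._queue = queue.Queue()

    def on_feed_message(self, bar):
        # Called from the vendor API's callback thread.
        self._queue.put(bar)

    def next_bar(self):
        try:
            return self._queue.get(timeout=1.0)
        except queue.Empty:
            return None
```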
How are data streams most effectively processed?
Assume that both historical data and live data are fed into event streams. I’m having trouble understanding how those streams are then made queryable, or otherwise predictably retrievable, by the trading engine. A time series database seems to make the most sense, purely for its ability to handle the volume of data and store enough lookback, but most time series databases are fairly expensive for an individual, and I’m not sure that’s the most cost-effective way of processing the volume of data. What other options are there for providing an effective query engine over both historical and live data feeds?
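For scale: at 1-15 second resolution the working set may well fit in memory, so one cheaper option I’m weighing instead of a full time series database is a plain rolling in-memory window per symbol, fed by whichever stream is active and queried by the engine for lookback. A rough sketch with hypothetical names:

```python
# Rough sketch, hypothetical names: a fixed-size rolling window per
# symbol as a cheap stand-in for a time series database.
import collections


class RollingLookback:
    """Keeps the most recent max_bars bars per symbol for lookback queries."""

    def __init__(self, max_bars=10_000):
        self._max_bars = max_bars
        self._windows = {}

    def append(self, symbol, bar):
        window = self._windows.setdefault(
            symbol, collections.deque(maxlen=self._max_bars)
        )
        window.append(bar)

    def last(self, symbol, n):
        """Return up to the last n bars for symbol, oldest first."""
        return list(self._windows.get(symbol, ()))[-n:]


store = RollingLookback(max_bars=5_000)
store.append("AAPL", {"ts": 1, "close": 190.10})
store.append("AAPL", {"ts": 2, "close": 190.30})
print(store.last("AAPL", 2))  # both bars, oldest first
```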
I’m probably a bit out of my depth here since I’m still fairly new to this, so please tell me if I’m taking the wrong approach. I’d also love some more info on how these systems are designed, or any other resources/books to read regarding data handling methods.
Yes, I recommend making historical backtests and live trading as similar as possible. This leaves you one less source of variability when you inevitably see different backtest and live results.
Do live data feeds and historic datasets typically provide an identical interface in their respective handlers for querying/retrieving their datasets?
Are backtests and live trading typically built upon the same abstraction, or run on the same trading/processing engine?
These are two different things. Both are important.
Having the same interface lets you reuse the same code for both backtest and production (see the sketch after this list). Arguably, this is slightly more important because:
- Most strategies are very complex state machines, and it is very difficult to implement the same strategy twice over with two different sets of interfaces.
- At some point upstream, it is nearly impossible to use the exact same “processing engine” for backtest and production anyway, since the former reads from a file while the latter reads off a multicast/unicast subscription, and the latter spends most of its time waiting while the former can keep reading. I’ve seen some firms go to extreme lengths to unify the two, even making their backtesting platform replay whole packet captures just to backtest 1 symbol, with insubstantial benefits.
- The purpose of a “backtest” is not necessarily to get accurate metrics like PnL, and there could be many other goals that cause you to design your backtest loop (presumably a major part of your “processing engine”) to prioritize speed (throughput) over accuracy, parallelism over serial processing, async over synchronous execution, etc.
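To make the first bullet concrete, here is a minimal sketch (hypothetical names, not a prescription): the strategy’s state machine is written once against a single `on_bar` callback, and only the thin driver loop differs between the two environments.

```python
# Minimal sketch, hypothetical names: one strategy implementation,
# two thin drivers.
class Strategy:
    """All strategy state lives here, behind a single callback."""

    def on_bar(self, bar):
        # ... signal logic, order placement, state transitions ...
        print("saw bar", bar)


def run_backtest(strategy, bars):
    # Backtest driver: drain a file/iterable as fast as possible.
    for bar in bars:
        strategy.on_bar(bar)


def run_live(strategy, feed):
    # Live driver: block on the feed; most of its time is spent waiting.
    while True:
        bar = feed.get()      # e.g. a queue.Queue fed by the vendor API
        if bar is None:       # sentinel to shut down
            break
        strategy.on_bar(bar)
```

The backtest driver drains its input as fast as it can while the live driver blocks on the network, which is exactly why unifying the interface is far more tractable than unifying the engine.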
How are data streams most effectively processed?
See my other posts on databases.