Innovations in Trading and Technology

22 March 2011

No More Limitations: Data Analytics

Innovation happens where business and technology meet, Watkins says. In the world of data, innovation has answered this question: Why move the data to the analytics when you can move the analytics to the data?

In our ultra-low latency world, innovation rarely occurs until business and technology come together.

On the business side, for example, exchanges, traders and managers typically know their problems and opportunities but are often unaware of new technologies that could solve those problems or address those opportunities until competitors – or other business units within the same organization – have adopted them.

On the technology side, engineers creating advanced technology understand that new techniques can enable new systems. Cutting-edge firms are trying to find scenarios in which those advancements can be leveraged as a springboard to new positions for their clients.

How do these two sides come together? Often the most innovative careers migrate back and forth between the buy side of technology, such as investment banks, hedge funds and asset managers, and the sell side of tech, such as vendors or consulting shops.

On the buy side of technology, advanced engineering groups incubate intelligent applications under strenuous and prudent evaluation. When a concrete strategy is tested atop a new innovation, the market can realize the position change of that event almost instantaneously.

Overall, development is usually led by front office teams faced with client-driven, high value/lower cost business drivers. For example, it is now an axiom in trading that only one in 10 equity trades is profitable, only two in 10 options trades, and only three in 10 futures trades. So the simple requirement to be one of the winners in trading is a strong market execution strategy, coupled with frequent evaluation and enhancement of technology.

One of the most interesting problem areas has always been around market data. What are the prices? What were they? Who bought what? Who's selling and why? How does the back testing look?

As trading technology advances, the number of messages grows. As the messages grow, volumes follow. Volumes lead to more quotes. Market segmentation encourages more quoting venues. The world of trading has become the world of the best information in the fastest times … to receive it and act upon it.

Today’s markets throw off an ever-increasing amount of data that always contains a relative advantage for those who can fully absorb, analyze, understand and mine it. As volatility spikes become the norm, trading segments seek to analyze more events and to compare venues, regions and assets to trade. As such, market data, along with its normalization, storage and analysis, puts more and more stress on the enterprise, not to mention on the integrity of the analytics.

The current state of the art is that data is brought to the analytics. Data is kept somewhere deep in an archive, a database or storage unit. It is then removed and brought to the analytics piecemeal to determine risk, build strategy, back test the strategy, view compliance metrics and on down to proper settlement.

Ideally, the faster and larger-capacity systems now available can help with growing data sets, but moving data across the enterprise continues to put so much strain and latency on the process that most data is actually discarded and new data used, thus sacrificing information for performance.

Further, storage capacities have to be built out to allow mining of the growing “big data,” with average costs well above $100,000 per terabyte. Then there’s the movement of data across the LAN and the WAN that adds pressure to the bottom line in cost, time delay and data integrity.

Those who are able to archive and replay today’s market prices and run analysis on that info can easily be looking at tens of millions annually to buy, store, recall, replay, move, distribute and test with no guarantee of a profit. The tradeoff option is to do it slower, take longer and analyze smaller sets at the risk of being last in the marketplace.

Analytics in and of themselves are by definition increasingly complex sets of patterns and models used in all aspects of finance. It’s arguably the art of designing interfaces and winning computations that makes the difference between what is used successfully and what is not. Still, the sheer size and depth of the data can make that value moot.

Sometimes the best analytics can be held hostage to bottlenecks of technology that prohibit the data’s integrity, accuracy and timely arrival. However, if the analytics reside where the data is stored, a plethora of problematic implications dissipate. The accuracy and integrity of the solution are thus sharpened. The need to query data becomes almost obsolete as the data provides live results with constant computations running.

The bottom line is that putting powerful predictive analytics, which require no programming or quantitative teams, on the desktops of decision makers creates results heretofore only available to the largest of companies. The advent of in-database analytics offers this at a fraction of the cost of traditional solutions. In fact, we are currently seeing a 10:1 return on investment for our clients.

Why move the data to the analytics when you can move the analytics to the data? Analyzing large volumes of data presents numerous challenges – it is time consuming, very expensive and requires management of complex technology infrastructure. In traditional approaches for analyzing data, end-users must move data into memory for processing. This activity accounts for up to 75 percent of the cycle time and imposes severe constraints on delivery of results. In addition, the client or server where the processing is done must have enough memory to store the data and intermediate results.
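To make the contrast concrete, here is a minimal sketch in Python using the standard library's sqlite3 module as a stand-in for a real data warehouse. The trades table, its columns and the two helper functions are assumptions made up for illustration; they are not taken from any vendor's product. Both functions compute a volume-weighted average price (VWAP), but the first pulls every matching row into client memory, while the second ships the computation to the database and receives a single value back.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory trades table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (symbol TEXT, price REAL, size INTEGER)")
    rows = [
        ("ABC", 10.0, 100),
        ("ABC", 10.5, 300),
        ("ABC", 11.0, 100),
        ("XYZ", 20.0, 200),
        ("XYZ", 22.0, 200),
    ]
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", rows)
    return conn


def vwap_client_side(conn, symbol):
    """Move the data to the analytics: fetch all rows, then compute locally."""
    rows = conn.execute(
        "SELECT price, size FROM trades WHERE symbol = ?", (symbol,)
    ).fetchall()
    notional = sum(price * size for price, size in rows)
    volume = sum(size for _, size in rows)
    # Also report how many rows had to cross the wire.
    return notional / volume, len(rows)


def vwap_in_database(conn, symbol):
    """Move the analytics to the data: the engine aggregates, one row returns."""
    row = conn.execute(
        "SELECT SUM(price * size) / SUM(size) FROM trades WHERE symbol = ?",
        (symbol,),
    ).fetchone()
    return row[0], 1


conn = build_demo_db()
client_vwap, rows_moved = vwap_client_side(conn, "ABC")
db_vwap, rows_returned = vwap_in_database(conn, "ABC")
print(client_vwap, rows_moved, db_vwap, rows_returned)
```

Both paths give the same answer. The difference is that the client-side version transfers a number of rows that grows with the data set, whereas the in-database version always returns one row. On a toy table this is invisible; on billions of ticks the transfer is the cost the article describes.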

Fuzzy Logix was founded by investment bankers who had constructed highly functional and performance-driven analytics for a variety of asset classes and risk measurements. They started the company for others with similar data-management and analytical bottlenecks.

In these efforts, a series of algorithms, functions and computations were compiled as a library called DB Lytix.
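As a rough illustration of what a library of functions living inside the database can look like from the user's side, the sketch below registers a user-defined aggregate with SQLite and calls it from plain SQL. This is not the DB Lytix API: the function name sample_stddev, the returns table and its columns are all hypothetical, and SQLite stands in for a production warehouse only because it ships with Python.

```python
import math
import sqlite3


class SampleStdDev:
    """Hypothetical in-database aggregate: sample standard deviation.

    Uses Welford's online update, so the engine can feed it one row
    at a time without materializing the whole column.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def step(self, value):
        if value is None:
            return  # ignore NULLs, as SQL aggregates usually do
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        if self.n < 2:
            return None  # undefined for fewer than two observations
        return math.sqrt(self.m2 / (self.n - 1))


conn = sqlite3.connect(":memory:")
# Register the aggregate so SQL statements can call it by name.
conn.create_aggregate("sample_stddev", 1, SampleStdDev)

conn.execute("CREATE TABLE returns (symbol TEXT, daily_return REAL)")
conn.executemany(
    "INSERT INTO returns VALUES (?, ?)",
    [("ABC", 0.01), ("ABC", 0.03), ("ABC", -0.01), ("XYZ", 0.02), ("XYZ", 0.02)],
)

# The analyst only writes SQL; the computation runs next to the data.
result = dict(
    conn.execute(
        "SELECT symbol, sample_stddev(daily_return) FROM returns GROUP BY symbol"
    )
)
print(result)
```

The point of the design is that the caller needs nothing beyond SQL: once the function is registered in the engine, it can be combined with GROUP BY, WHERE and joins like any built-in aggregate.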

The joint efforts by database, data warehousing and business intelligence technology are an example of the two worlds – business and technology – coming closer together. Some estimates indicate predictive analytics will account for 30 percent of all analytics by 2014. Of course these are, well, predictions, but as long as the data continues to compile together with the analytics, there are no limitations.


4 Comments to "No More Limitations: Data Analytics":
  • louislovas

    23 March 2011

    Nice article to define a problem space. The marriage of analytics and data is at the heart of the technology from OneMarketData. Putting the analytics as 'close to the metal' as possible reduces latency and network bandwidth and increases the performance of trading strategies and quant research. What you describe is the core reason why relational databases are a poor choice for the quantitative world - no inbuilt analytics (unless you consider SUM and AVG analytics). That makes all true analysis external to the data, which is where and why the latency/performance penalty (not to mention custom coding efforts) is incurred. Real 'tick' engines such as OneMarketData's OneTick product provide true analytical processing inside the data engine - for both historical and live markets. So quant strategies and backtesting can take advantage of nanosecond performance.

  • gvalente

    23 March 2011

    While I agree that some databases are a poor choice for "Big Data", some a poor choice for "analytics" and even more of them a poor choice for "Big Data Analytics", I think you might have thrown the baby out with the bath water here. Many MPP relational databases include inbuilt analytics these days - Netezza, Aster, and my company XtremeData all do it in a large scale MPP fashion from 1TB to 10PB in size. Analytics at the data with all cores (1000's of them working in parallel on their part of the data without extracting it from the DB itself). So what do you mean exactly? For example, we all support the ability to use SQL to call a built-in analytical function library in the database (like KXEN, SAS, R, and/or Fuzzy). You, the user, do not have to know anything new, but simply ask the question you want to ask. These are standard languages and tools - R is now the most popular statistical language and SQL has been used for 20+ years as the de facto standard. SAS/KXEN/Fuzzy bring analytical functions to the table as a library so you just call it with a user defined function - it is very simple and not a "custom coding effort" at all. Now, we each have our own features and benefits, but saying RDBMS is a poor choice is simply not true anymore and hasn't been for 3-5 years. If anything, I would say that OneTick is a custom coding effort, doesn't linearly scale to "big data", and has a fixed, OneTick-defined "schema". I'm not a OneTick expert, but what I've seen and competed against is that OneTick is the KING at anything that fits in memory / CEP and falls apart at anything larger than one server, aka your OneQuantData product at large scale (full depth of book types of equities / options data). Maybe that has changed and I'll let others jump in, but I'm pretty sure you've overlooked the past 5 years of innovation in the "Big Data Analytics" space with your comments above. gv: twitter/xtremedata

  • louislovas

    24 March 2011

    I’ve always loved that phrase, “throw the baby out with the bath water”; it conjures up all sorts of mental images. OneQuantData is a reference data product, something completely different from OneTick. OneTick scales, and quite massively; we have numerous customers running options market making across multiple servers. The product easily leverages multiple cores to handle the OPRA firehose. Data can be partitioned or duplicated across servers, so full shared nothing or shared everything is possible, and user queries can be parallel processed across cores - either on a single machine or on multiple machines. There is complete flexibility with server deployments. Schema creation is, and always has been, completely flexible and user definable. We do ship a number of pre-defined schemas for trade, quote and order book types because they are the most often used in our customers’ business – why not. We make the most efficient use of memory, and like any system the more the better. User queries have transparent access to data; it can reside in memory, in archival databases, or in live streams. A query’s time frame can span from years in the past to the present and into the future. The OneTick architecture will manage the blending of data from on-disk, in-memory and live connections. In finance, if you store time-series tick data, users will demand that you also handle corporate actions, futures contract rolls, corrections and cancellations without undue data management/manipulation headaches. This is all a natural part of OneTick since its sole purpose is financial data modeling. In addition to our own analytical library of over 100 functions highly purposed for the financial markets, R and MATLAB are directly integrated within our server. All these functions can be applied to any schema type in user queries – even a complex structure like an order book.
    Creative innovation by Vertica, Netezza, XtremeData and others has brought relational engines out of the stone age; it’s about time. Yet such technology is more general-purpose by design. Those new vendors make great Oracle-busters since they can drop-replace them and likely be orders of magnitude faster… for some shipping company’s order-entry application. But effectively managing financial data is a horse of a different color; general-purpose database technology doesn’t play.

  • dwatkins

    28 March 2011

    Thank you for your comments and feedback Luis!! I have learned a few things, so thank you! However, I do have to say that even though the large data-warehouse companies offer general purpose computing, where they differ even more from the traditional db's is in sheer speed and capacity. Many have allowed Fuzzy Logix to embed its in-database analytics to co-exist in code with some of the faster dw's out there, and have brought processing times from days to as low as seconds. The real-time world can still only swallow a bite size at a time, a terabyte or two, before discarding valuable bits of information which could be used later. The large 30 - 50 terabyte dw's, which no longer have to discard old data but instead create a historical db and add to it regularly, are a next gen technology most will adopt soon enough. Plus, the db's and the dw's out here have so many more compute cores that calculations can run much faster and, with more data to sniff, more accurately.
