Abstract: |
Data-intensive science is characterized by problems where data is the primary challenge, whether in its volume, rate of acquisition, complexity, or uncertainty. The increasingly data-intensive nature of scientific discovery is transforming the methodology of scientific investigation. Traditionally, the scientific method has started with a hypothesis formulated from a scientific theory that is confirmed or refuted through experiments, which in turn advance the theory. Over a decade ago, it was recognized that computation forms a “third leg” of the scientific method: computation, primarily in the form of simulation, accelerates the development of theory and informs the design of experiments. As a result, enormous volumes of heterogeneous and complex data are increasingly being generated from sophisticated, large-scale computations and experiments. Additionally, this massive infusion of data is now readily accessible to scientists worldwide through the Internet. This has sparked a revolution in the scientific method: scientists can now remotely explore large sources of globally distributed data, in near real time, to discover new hypotheses, which in turn spawn many more computations and experiments that generate even more data, creating a chain reaction that propels scientific discovery at a continually increasing rate.
At the extreme scale, the research challenges of analyzing data from computations and experiments can be daunting. This talk will provide an overview of techniques for analysis at extreme scale and present specific examples of using a randomized, graph-based analytical approach to address these challenges.