Jermaine, Arumugam, Pol and Dobra win SIGMOD 2007 Best Paper Award
CISE professors Chris Jermaine (pictured at center) and Alin Dobra (right), CISE grad student Subi Arumugam (left), and CISE PhD alum
Abhijit Pol (not pictured; now at Yahoo!) were co-recipients of the
SIGMOD 2007 best paper award for their research article entitled
"Scalable Approximate Query Processing with the DBO Engine". SIGMOD is
one of the most selective and widely read venues for publication in
the database research area. SIGMOD receives hundreds of research
submissions each year, and typically fewer than 15% of the articles are
selected for presentation at the conference.
Their paper describes the query processing engine of a prototype
database system being developed at UF, called DBO or Database-Online.
The goal of the National Science Foundation-sponsored DBO project is
to build a database engine that can answer analytic or statistical
queries over terabyte-sized data archives just as fast as any popular
commercial or public license database engine such as Oracle or
Postgres. The key benefit of DBO is that it not only computes exact
answers quickly but also uses statistical methods to give the user a
running estimate of the final answer at all times, even very early
during query execution. For
example, imagine that a user wishes to compute the total sales of a
certain product by a company, broken down by the company's various
divisions. After a very short time, DBO may report a current estimate
of $8.450 million for one of the divisions, with a 95% chance that the
true value is between $8.410 million and $8.490 million. Because DBO
may be able to provide this estimate after only a few minutes of query
processing, whereas running the query to completion may take hours, it
can save the user a great deal of time when two digits of accuracy are
enough. As Jermaine says, "The data stored
in a data warehouse are typically riddled with errors due to the data
collection, integration, and cleaning process. So it probably does not
make any sense to spend hours trying to compute a few extra decimal
points of an answer that cannot be trusted past the first few digits
anyway!"
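
The sales example above, a running estimate with a 95% range that tightens as more data is read, can be illustrated with a short Python sketch. This is not DBO's algorithm, which the article does not describe. It is a textbook estimator for a total, assuming the rows read so far form a simple random sample of the table. The function name `running_total_estimate` and the synthetic sales figures are hypothetical.

```python
import math
import random


def running_total_estimate(sample, population_size, z=1.96):
    """Estimate a population total from a simple random sample.

    Returns (estimate, low, high), where [low, high] is an approximate
    95% confidence interval based on the central limit theorem.
    Illustrative assumption only; this is not DBO's actual estimator.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled rows")
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    # Finite population correction: the interval shrinks to zero width
    # once every row has been read.
    fpc = 1.0 - n / population_size
    std_error = population_size * math.sqrt(variance / n * fpc)
    estimate = population_size * mean
    return estimate, estimate - z * std_error, estimate + z * std_error


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sales table for one division: 100,000 rows.
    sales = [rng.uniform(10.0, 200.0) for _ in range(100_000)]
    rng.shuffle(sales)  # a prefix of a shuffled table is a random sample
    for rows_read in (1_000, 10_000, 100_000):
        est, low, high = running_total_estimate(sales[:rows_read], len(sales))
        print(f"{rows_read:>7} rows: {est:,.0f} (95% CI {low:,.0f} to {high:,.0f})")
```

Under these assumptions the interval narrows as more rows are read and collapses to the exact total once the whole table has been scanned, which mirrors the trade-off between waiting time and accuracy described above.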