Overview:
This paper studies selectivity estimation techniques for set similarity queries. A wide variety of similarity measures for sets have been proposed in the past. The authors concentrate on the class of weighted similarity measures (e.g., TF/IDF and BM25 cosine similarity and variants) and design selectivity estimators based on a priori constructed samples. First, they study the pitfalls of applying random sampling directly and argue that care needs to be taken in how the samples are constructed: uniform random sampling yields very low accuracy, while query-sensitive real-time sampling is more expensive than exact solutions (in both CPU and I/O cost). They then show how to build robust samples a priori, based on existing synopses for distinct value estimation.
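To make the problem concrete, the following is a minimal Python sketch of the naive baseline the overview warns about: estimating how many sets match a query by scoring a uniform random sample and scaling up. It is not the paper's estimator. The function names, the particular IDF formula, and the threshold parameter `tau` are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def idf_weights(sets):
    """Assign each token an IDF-style weight (illustrative formula, an assumption)."""
    n = len(sets)
    df = Counter(tok for s in sets for tok in s)
    return {tok: math.log(1 + n / df[tok]) for tok in df}


def weighted_cosine(a, b, w):
    """Cosine similarity of two token sets under per-token weights w."""
    num = sum(w.get(t, 0.0) ** 2 for t in a & b)
    na = math.sqrt(sum(w.get(t, 0.0) ** 2 for t in a))
    nb = math.sqrt(sum(w.get(t, 0.0) ** 2 for t in b))
    return num / (na * nb) if na and nb else 0.0


def exact_selectivity(query, sets, w, tau):
    """Count the sets whose similarity to the query is at least tau (full scan)."""
    return sum(1 for s in sets if weighted_cosine(query, s, w) >= tau)


def sampled_selectivity(query, sets, w, tau, k, rng):
    """Estimate the same count from k uniformly sampled sets, scaled to the full size."""
    sample = rng.sample(sets, k)
    hits = sum(1 for s in sample if weighted_cosine(query, s, w) >= tau)
    return hits * len(sets) / k


if __name__ == "__main__":
    data = [frozenset(x) for x in (["a", "b"], ["a", "c"], ["d"], ["a", "b"])]
    weights = idf_weights(data)
    q = frozenset(["a", "b"])
    print(exact_selectivity(q, data, weights, 0.99))
    print(sampled_selectivity(q, data, weights, 0.99, 2, random.Random(0)))
```

When few sets match a query, a small uniform sample will often contain no matching set at all, so the estimate collapses to zero. That is one plausible reading of the low accuracy the overview reports, and it illustrates why how the sample is constructed matters.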
| Format: | |
| Size: | 246 KB |
| Date: | Aug 2008 |
| Pages: | 12 |