Large data sets require statistical analysis, but some data may come in varied forms (text, audio, video, sensor data, etc.). To cope with data structure variations, optimization explores integrating statistical processing techniques with data processing systems to make such systems easer to build, maintain and deploy.
Mass digitization of printed media into plain test is changing the types of data that companies and academic institutions manage. Scanned images are converted to plain text by conversion software, but often the software is error-prone. Any query of the digitized media may miss information that may lead to a poor results or applications. Staccato software improves the digitization process.
Laura Albert is an affiliate of the optimization group at WID and an Associate Professor of Industrial and Systems Engineering. Her background is in operations research–the discipline of applying advanced analytical methods to make better decisions. She studies how to allocate scarce resources to patients, passengers, and casualties in a risk-based manner. In addition to her academic […]
There is an arms race to perform increasingly sophisticated data analysis on ever more varied types of data (text, audio, video, OCR, sensor data, etc.). Current data processing systems typically assume that the data have rigid, precise semantics, which these new data sources do not possess. On the other hand, many of the state-of-the-art approaches […]
Jellyfish is an algorithm for solving data-processing problems with matrix-valued decision variables regularized to have low rank. Particular examples of problems solvable by Jellyfish include matrix completion problems and least-squares problems regularized by the nuclear norm or γ2-norm. Jellyfish implements a projected incremental gradient method with a biased, random ordering of the increments. This biased […]
Active learning methods automatically adapt data collection by selecting the most informative samples in order to accelerate machine learning. Because of this, real-world testing and comparing active learning algorithms requires collecting new datasets (adaptively), rather than simply applying algorithms to benchmark datasets, as is the norm in (passive) machine learning research. To facilitate the development, […]