-
This dissertation presents the results of our studies on making data-to-decision pipelines for embedded and social sensing efficient. Due to the pervasive presence of wired sensors, wireless sensors, and mobile devices, data about the physical world (environmental measurements, traffic, etc.) and human societies (news, trends, conversations, intelligence reports, etc.) is being generated at an unprecedented rate and volume. This motivates us to optimize the way information is collected from sensors and social entities.
-
The development of Web 2.0 techniques has led to the prosperity of online communities, which have spread to various domains of daily life. In the medicine and healthcare domain, online services such as Yahoo! Groups, WebMD, and MedHelp offer patients and physicians a platform to discuss health problems (e.g., diseases and drugs, diagnoses and treatments) and also provide a large volume of data for researchers to analyze and explore.
-
In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth finding problem. We observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth finding problem.
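The intuition that source quality drives truth finding can be illustrated with a minimal sketch. This is not the model developed in the dissertation; it is a simple iterative weighted-voting heuristic, and the function name `find_truths`, the `(source, entity, value)` claim format, and the starting weight of 0.5 are all assumptions made for illustration.

```python
from collections import defaultdict


def find_truths(claims, iterations=10):
    """Hypothetical helper: alternate between weighted voting and
    re-estimating each source's reliability.

    claims: list of (source, entity, value) tuples.
    Returns (truths, reliability).
    """
    sources = {s for s, _, _ in claims}
    reliability = {s: 0.5 for s in sources}  # assumed uninformed starting weight
    truths = {}
    for _ in range(iterations):
        # Step 1: each entity takes the value with the highest total
        # reliability among the sources that claim it.
        votes = defaultdict(lambda: defaultdict(float))
        for s, e, v in claims:
            votes[e][v] += reliability[s]
        truths = {e: max(vals, key=vals.get) for e, vals in votes.items()}

        # Step 2: a source's reliability becomes the fraction of its
        # claims that agree with the current truth estimates.
        correct = defaultdict(int)
        total = defaultdict(int)
        for s, e, v in claims:
            total[s] += 1
            correct[s] += int(truths[e] == v)
        reliability = {s: correct[s] / total[s] for s in sources}
    return truths, reliability


# Sources A and C disagree on "city2", so a plain majority vote ties.
# A's agreement with B elsewhere raises its weight and breaks the tie.
claims = [
    ("A", "city1", "X"), ("B", "city1", "X"), ("C", "city1", "Y"),
    ("C", "city2", "Q"), ("A", "city2", "P"),
    ("A", "city3", "M"), ("B", "city3", "M"), ("C", "city3", "N"),
]
truths, reliability = find_truths(claims)
print(truths)
print(reliability)
```

In this toy input, an unweighted vote cannot decide between the two values claimed for `city2`; once reliability is estimated from agreement on the other entities, the value from the more reliable source wins.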
-
Users want web pages to load quickly. Because modern web pages make connections to many hosts, this requires that small flows complete quickly at high percentiles. We explore how to achieve this goal with protocols that do not require modifications to all the routers and agents transmitting the flow. Our simulations indicate that both Random Early Detection and Fair Queueing can significantly reduce flow completion times at both the median and the 99th percentile. Fair Queueing provides more consistent reductions across varying bandwidth-delay products and background traffic.
-
As software grows in size and complexity, and as vendors face increased time-to-market pressure, it has become increasingly difficult to deliver bulletproof software. As a result, software systems still fail in production environments. Once a failure occurs in a production system, it is important for the vendor to troubleshoot it as quickly as possible, since such failures directly affect customers. Consequently, vendors typically invest significant resources in production failure diagnosis.
-
In many hard real-time avionics systems, more and more features are being added to faster but cheaper hardware. Thus, hardware resources such as computation and network bandwidth are increasingly being shared by multiple applications, leading to rapid increases in the size and complexity of the overall system.
-
The explosion of Big Data has led to a rapid increase in the popularity of Big Data analytics.
-
In today's data-intensive cloud systems, there is a tension between resource limitations and strict requirements. In an effort to scale up in the cloud, many systems today have unfortunately forced users to relax their requirements. However, users still have to deal with constraints, such as strict time deadlines or limited dollar budgets. Several applications critically rely on strongly consistent access to data hosted in clouds.
-
With the rapid development of positioning technologies, sensor networks, and online social media, spatiotemporal data is now widely collected from smartphones carried by people, sensor tags attached to animals, GPS tracking systems on cars and airplanes, RFID tags on merchandise, and location-based services offered by social media.
-
Real-world physical objects and abstract data entities are interconnected, forming gigantic networks. By structuring these objects and their interactions into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, scientific, engineering, or medical information systems, online e-commerce systems, and most database systems, can be structured into heterogeneous information networks.