476.
Distributional Features for Text Categorization.
Abstract :-
Text categorization is the task of assigning predefined categories
to natural language text. With the widely used ‘bag
of words’ representation, previous research usually
assigns a word a value indicating whether the word appears
in the document concerned or how frequently it appears.
Although these values are useful for text categorization,
they have not fully expressed the abundant information contained
in the document. This paper explores the effect of other types
of values, which express the distribution of a word in the
document. These novel values assigned to a word are called
distributional features, which include the compactness of
the appearances of the word and the position of the first
appearance of the word. The proposed distributional features
are exploited by a tfidf style equation and different features
are combined using ensemble learning techniques. Experiments
show that the distributional features are useful for text
categorization. In contrast to using the traditional term
frequency values solely, including the distributional features
requires only a little additional cost, while the categorization
performance can be significantly improved. Further analysis
shows that the distributional features are especially useful
when documents are long and the writing style is casual.
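
The two distributional features named above can be illustrated with a small sketch. The abstract does not give formulas, so the definitions here are assumptions made for illustration: the document is split into num_parts equal parts, first appearance is the scaled index of the part holding the first occurrence, and compactness is the mean absolute deviation of the part indices (0 means every occurrence falls in one part). The function name and parameters are hypothetical.

```python
from collections import defaultdict


def distributional_features(tokens, num_parts=10):
    """Return {word: (first_appearance, spread)} for one tokenized document.

    Assumed definitions (not taken from the paper):
      first_appearance = part index of the first occurrence / num_parts
      spread           = mean absolute deviation of the part indices
    """
    n = len(tokens)
    if n == 0:
        return {}
    parts = defaultdict(list)
    for i, tok in enumerate(tokens):
        # Map token position i to a part index in 0..num_parts-1.
        parts[tok].append(i * num_parts // n)
    features = {}
    for word, idxs in parts.items():
        first = idxs[0] / num_parts
        centroid = sum(idxs) / len(idxs)
        spread = sum(abs(p - centroid) for p in idxs) / len(idxs)
        features[word] = (first, spread)
    return features


feats = distributional_features("a b a c d e f g h b".split())
```

In this toy document the word "a" appears early and close together, while "b" appears at both ends, so "b" gets the larger spread value. Such values could then be weighted alongside term frequency, as the abstract suggests.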
477.
A Survey of Uncertain Data Algorithms and Applications.
Abstract :-
In recent years, a number of indirect data
collection methodologies have led to the proliferation of
uncertain data. Such databases are much more complex because
of the additional challenges of representing the probabilistic
information. In this paper, we provide a survey of uncertain
data mining and management applications. We will explore the
various models utilized for uncertain data representation.
In the field of uncertain data management, we will examine
traditional database management methods such as join processing,
query processing, selectivity estimation, OLAP queries, and
indexing. In the field of uncertain data mining, we will examine
traditional mining problems such as frequent pattern mining,
outlier detection, classification, and clustering. We discuss
different methodologies to process and mine uncertain data
in a variety of forms.
478.
Provably Secure Steganography.
Abstract :-
Steganography is the problem of hiding secret messages in
“innocent-looking” public communication so that
the presence of the secret messages cannot be detected. This
paper introduces a cryptographic formalization of steganographic
security in terms of computational indistinguishability from
a channel, an indexed family of probability distributions
on cover messages. We use cryptographic and complexity-theoretic
proof techniques to show that the existence of one-way functions
and the ability to sample from the channel are necessary conditions
for secure steganography. We then construct a steganographic
protocol, based on rejection sampling from the channel, that
is provably secure and has nearly optimal bandwidth under
these conditions. This is the first known example of a general
provably secure steganographic protocol. We also give the
first formalization of “robust” steganography,
where an adversary attempts to remove any hidden messages
without unduly disrupting the cover channel. We give a necessary
condition on the amount of disruption the adversary is allowed
in terms of a worst case measure of mutual information. We
give a construction that is provably secure and computationally
efficient and has nearly optimal bandwidth, assuming repeatable
access to the channel distribution.
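
The rejection-sampling idea can be sketched as follows: to send one hidden bit, the sender keeps drawing cover messages from the channel until a keyed function of the cover equals that bit. This is a minimal illustration, not the paper's protocol. The use of HMAC-SHA256 as the keyed function, the retry limit, and all names (keyed_bit, embed_bit, extract_bit, toy_channel) are assumptions.

```python
import hashlib
import hmac
import random


def keyed_bit(key: bytes, cover: str) -> int:
    """Map a cover message to one pseudorandom bit under the shared key."""
    return hmac.new(key, cover.encode("utf-8"), hashlib.sha256).digest()[0] & 1


def embed_bit(bit, key, sample_channel, max_tries=64):
    """Draw covers from the channel until one maps to `bit`.

    If max_tries is exhausted, the last draw is sent anyway, so the
    receiver may decode a wrong bit with small probability.
    """
    cover = sample_channel()
    for _ in range(max_tries - 1):
        if keyed_bit(key, cover) == bit:
            break
        cover = sample_channel()
    return cover


def extract_bit(key, cover):
    """The receiver recomputes the same keyed bit from the cover it sees."""
    return keyed_bit(key, cover)


# Hypothetical stand-in for "the ability to sample from the channel".
_rng = random.Random(0)


def toy_channel():
    return f"status update {_rng.randrange(10**9)}"


shared_key = b"shared secret"
covers = [embed_bit(b, shared_key, toy_channel) for b in (1, 0, 1, 1)]
decoded = [extract_bit(shared_key, c) for c in covers]
```

Every transmitted cover is an unmodified sample from the channel, which is why this style of construction needs only sampling access rather than knowledge of the channel distribution.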
479.
Efficient Software-Based Encoding and Decoding of BCH Codes.
Abstract :-
Error correction software for Bose-Chaudhuri-Hocquenghem
(BCH) codes is optimized for general purpose processors that
are not equipped with hardware for Galois field arithmetic. The developed
software applies parallelization with a table lookup method
to reduce the number of iterations, and maximum parallelization
under a cache size limitation is sought for a high throughput
implementation. Since this method minimizes the number of
lookup tables for encoding and decoding processes, a large
parallel factor can be chosen for a given cache size. The native
word length of a general purpose CPU is used as a whole by
employing the developed mask elimination method. The tradeoff
of the algorithm complexity and the regularity is examined
for several syndrome generation methods, which leads to a
simple error detection scheme that reuses the encoder and
a simplified syndrome generation method requiring only a small
number of Galois field multiplications. The parallel factor
for Chien search is greatly increased by transforming the error
locator polynomial so that it contains symmetric exponents
of positive and negative signs. The experimental results demonstrate
that the developed software can not only provide sufficient
throughput for real-time error correction of NAND flash memory
in embedded systems but also enhance the reliability of file
systems in general purpose computers.
479.
Collusive Piracy Prevention in P2P Content Delivery Networks.
Abstract :-
Collusive piracy is the main source of intellectual property
violations within the boundary of a P2P network. Paid clients
(colluders) may illegally share copyrighted content files
with unpaid clients (pirates). Such online piracy has hindered
the use of open P2P networks for commercial content delivery.
We propose a proactive content poisoning scheme to stop colluders
and pirates from alleged copyright infringements in P2P file
sharing. The basic idea is to detect pirates promptly with identity-based
signatures and time stamped tokens. The scheme stops collusive
piracy without hurting legitimate P2P clients by targeting
poisoning on detected violators, exclusively. We developed
a new peer authorization protocol (PAP) to distinguish pirates
from legitimate clients. Detected pirates will receive poisoned
chunks in their repeated attempts. Pirates are thus severely
penalized with no chance to download successfully in tolerable
time. Based on simulation results, we find 99.9 percent prevention
rate in Gnutella, KaZaA, and Freenet. We achieved 85-98 percent
prevention rate on eMule, eDonkey, Morpheus, etc. The scheme
is shown to be less effective in protecting some poison-resilient
networks like BitTorrent and Azureus. Our work opens up the
low-cost P2P technology for copyrighted content delivery.
The advantage lies mainly in minimum delivery cost, higher
content availability, and copyright compliance in exploring
P2P network resources.
480.
A High-Speed Compressor for Double-Precision Floating-Point
Data.
Abstract :-
Many scientific programs exchange large quantities
of double-precision data between processing nodes and with
mass storage devices. Data compression can reduce the number
of bytes that need to be transferred and stored. However,
data compression is only likely to be employed in high-end
computing environments if it does not impede the throughput.
This paper describes and evaluates FPC, a fast lossless compression
algorithm for linear streams of 64-bit floating-point data.
FPC works well on hard-to-compress scientific data sets and
meets the throughput demands of high-performance systems.
A comparison with five lossless compression schemes, BZIP2,
DFCM, FSD, GZIP, and PLMI, on 4 architectures and 13 data
sets shows that FPC compresses and decompresses one to two
orders of magnitude faster than the other algorithms at the
same geometric-mean compression ratio. Moreover, FPC provides
a guaranteed throughput as long as the prediction tables fit
into the L1 data cache. For example, on a 1.6-GHz Itanium
2 server, the throughput is 670 Mbytes/s regardless of what
data are being compressed.
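
The abstract does not describe FPC's internals beyond its use of prediction tables, so the sketch below only illustrates the general predict-and-XOR idea behind lossless floating-point compression. It is not FPC: the predictor here is simply the previous value (an assumption standing in for table-based predictors), and the byte layout is invented for illustration. When a prediction is close, the XOR residual has many leading zero bytes, which are dropped.

```python
import struct


def compress(values):
    """Encode doubles as (length byte, residual bytes) pairs."""
    out = bytearray()
    prev = 0  # predicted bit pattern: the previous value (assumed predictor)
    for v in values:
        bits = struct.unpack("<Q", struct.pack("<d", v))[0]
        residual = bits ^ prev
        # Number of bytes left after dropping leading zero bytes (0..8).
        nbytes = (residual.bit_length() + 7) // 8
        out.append(nbytes)
        out += residual.to_bytes(nbytes, "little")
        prev = bits
    return bytes(out)


def decompress(data):
    """Invert compress() by replaying the same predictor."""
    values, prev, i = [], 0, 0
    while i < len(data):
        nbytes = data[i]
        residual = int.from_bytes(data[i + 1:i + 1 + nbytes], "little")
        bits = residual ^ prev
        values.append(struct.unpack("<d", struct.pack("<Q", bits))[0])
        prev = bits
        i += 1 + nbytes
    return values


samples = [1.0, 1.0, 1.5, 2.0]
packed = compress(samples)
restored = decompress(packed)
```

A repeated value costs a single byte in this layout, while a poorly predicted value costs nine, which shows why predictor quality dominates the compression ratio.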
481.
Progressive Parametric Query Optimization.
Abstract :-
Commercial applications usually rely on precompiled parameterized
procedures to interact with a database. Unfortunately, executing
a procedure with a set of parameters different from those
used at compilation time may be arbitrarily suboptimal. Parametric
query optimization (PQO) attempts to solve this problem by
exhaustively determining the optimal plans at each point of
the parameter space at compile time. However, PQO is likely
not cost-effective if the query is executed infrequently or
if it is executed with values only within a subset of the
parameter space. In this paper, we propose instead to progressively
explore the parameter space and build a parametric plan during
several executions of the same query. We introduce algorithms
that, as parametric plans are populated, are able to frequently
bypass the optimizer but still execute optimal or near-optimal
plans.
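
The progressive idea can be sketched for a query with a single parameter, summarized here by its selectivity. The cache below records which plan the optimizer chose at each selectivity seen so far and skips the optimizer when a new value is bracketed by two recorded points that share a plan. That reuse rule assumes plan optimality regions are contiguous along the parameter axis, a common simplification that the abstract does not state. The class name, the optimize callback, and the toy optimizer are hypothetical.

```python
import bisect


class ProgressivePlanCache:
    """Populate a parametric plan incrementally across executions."""

    def __init__(self, optimize):
        self.optimize = optimize  # hypothetical callback: selectivity -> plan
        self.points = []          # sorted selectivities optimized so far
        self.plans = []           # plan chosen at each recorded point
        self.optimizer_calls = 0

    def plan_for(self, sel):
        i = bisect.bisect_left(self.points, sel)
        # Exact hit: this selectivity was optimized before.
        if i < len(self.points) and self.points[i] == sel:
            return self.plans[i]
        # Bracketed by two points with the same plan: reuse it.
        if 0 < i < len(self.points) and self.plans[i - 1] == self.plans[i]:
            return self.plans[i]
        # Otherwise fall back to the optimizer and record the result.
        plan = self.optimize(sel)
        self.optimizer_calls += 1
        self.points.insert(i, sel)
        self.plans.insert(i, plan)
        return plan


def toy_optimizer(sel):
    # Stand-in for a real optimizer: index scan for selective predicates.
    return "index_scan" if sel < 0.2 else "table_scan"


cache = ProgressivePlanCache(toy_optimizer)
chosen = [cache.plan_for(s) for s in (0.01, 0.15, 0.08, 0.5, 0.3)]
```

In this run the third execution (0.08) bypasses the optimizer because it lies between two points that both chose the index scan. The other four executions pay for optimization once each.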
482.
Identification of Move Method Refactoring Opportunities.
Abstract :-
Placement of attributes/methods within classes
in an object-oriented system is usually guided by conceptual
criteria and aided by appropriate metrics. Moving state and
behavior between classes can help reduce coupling and increase
cohesion, but it is nontrivial to identify where such refactorings
should be applied. In this paper, we propose a methodology
for the identification of Move
Method refactoring opportunities that constitute a way for
solving many common Feature Envy bad smells. An algorithm
that employs the notion of distance between system entities
(attributes/methods) and classes extracts a list of behavior-preserving
refactorings based on the examination of a set of preconditions.
In practice, a software system may exhibit such problems in
many different places. Therefore, our approach measures the
effect of all refactoring suggestions based on a novel Entity
Placement metric that quantifies how well entities have been
placed in system classes. The proposed methodology can be
regarded as a semi-automatic approach since the designer will
eventually decide whether a suggested refactoring should be
applied or not based on conceptual or other design quality
criteria. The evaluation of the proposed approach has been
performed considering qualitative, metric, conceptual, and
efficiency aspects of the suggested refactorings in a number
of open-source projects.
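
A distance-based suggestion step of this kind can be sketched as follows. The abstract does not say which distance is used, so Jaccard distance between the set of entities a method accesses and the set of entities a class declares is an assumption here. The precondition checks that guarantee behavior preservation are omitted. All names and the example classes are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def suggest_moves(method_accesses, class_entities, owner):
    """Suggest (method, from_class, to_class) moves.

    method_accesses: {method: set of entities the method uses}
    class_entities:  {class: set of entities the class declares}
    owner:           {method: class that currently holds the method}
    """
    suggestions = []
    for method, used in method_accesses.items():
        def dist(cls):
            # A method is not compared against itself.
            members = class_entities[cls] - {method}
            return jaccard_distance(used, members)

        best = min(class_entities, key=dist)
        if best != owner[method] and dist(best) < dist(owner[method]):
            suggestions.append((method, owner[method], best))
    return suggestions


class_entities = {
    "Order": {"Order.total", "Order.items", "Order.printCustomer"},
    "Customer": {"Customer.name", "Customer.address"},
}
method_accesses = {
    "Order.total": {"Order.items"},
    "Order.printCustomer": {"Customer.name", "Customer.address"},
}
owner = {"Order.total": "Order", "Order.printCustomer": "Order"}
moves = suggest_moves(method_accesses, class_entities, owner)
```

Here Order.printCustomer touches only Customer entities, the typical Feature Envy pattern, so it is the only method suggested for a move. As the abstract notes, the designer still decides whether to apply each suggestion.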
483.
Replacing Associative Load Queues: A Timing-Centric Approach.
Abstract :-
One of the main challenges of modern processor design is the
implementation of a scalable and efficient mechanism to detect
memory access order violations as a result of out-of-order
execution. Traditional age-ordered associative load queues
are complex, inefficient, and power hungry. In this paper,
we introduce two new dependence checking schemes with different
design tradeoffs, but both explicitly rely on timing information
as a primary instrument to rule out dependence violation.
Our timing-centric designs operate at a fraction of the energy
cost of an associative LQ and achieve the same functionality
with an insignificant performance impact on average. Studies
with parallel benchmarks also show that they are equally effective
and efficient in a chip-multiprocessor environment.
484.
Delay-Constrained Multicast Routing Using the Noisy Chaotic
Neural Networks.
Abstract :-
We present a method to compute the delay-constrained multicast
routing tree by employing chaotic neural networks. The experimental
result shows that the noisy chaotic neural network (NCNN)
provides an optimal solution more often compared to the transiently
chaotic neural network (TCNN) and the Hopfield neural network
(HNN). Furthermore, compared with the bounded shortest multicast
algorithm (BSMA), the NCNN is able to find multicast trees
with lower cost.
485.
A Flexible Software-Based Framework for Online Detection of
Hardware Defects.
Abstract :-
This work proposes a new, software-based, defect detection
and diagnosis technique. We introduce a novel set of instructions,
called Access-Control Extensions (ACE), that can access and
control the microprocessor’s internal state. Special
firmware periodically suspends microprocessor execution and
uses the ACE instructions to run directed tests on the hardware.
When a hardware defect is present, these tests can diagnose
and locate it, and then activate system repair through resource
reconfiguration. The software nature of our framework makes
it flexible: testing techniques can be modified/upgraded in
the field to trade off performance against reliability without
requiring any change to the hardware. We describe and evaluate
different execution models for using the ACE framework. We
also describe how the proposed ACE framework can be extended
and utilized to improve the quality of post-silicon debugging
and manufacturing testing of modern processors. We evaluated
our technique on a commercial chip multiprocessor based on
Sun’s Niagara and found that it can provide very high
coverage, with 99.22 percent of all silicon defects detected.
Moreover, our results show that the average performance overhead
of software-based testing is only 5.5 percent. Based on
a detailed register transfer level (RTL) implementation of
our technique, we find its area and power consumption overheads
to be modest, with a 5.8 percent increase in total chip area
and a 4 percent increase in the chip’s overall power
consumption.
486.
Atomicity Analysis of Service Composition across Organizations.
Abstract :-
Atomicity is a highly desirable property
for achieving application consistency in service compositions.
To achieve atomicity, a service composition should satisfy
the atomicity sphere, a structural criterion for the backend
processes of involved services. Existing analysis techniques
for the atomicity sphere generally assume complete knowledge
of all involved backend processes. Such an assumption is invalid
when some service providers do not release all details of
their backend processes to service consumers outside
the organizations. To address this problem, we propose a process
algebraic framework to publish atomicity-equivalent public
views from the backend processes. These public views extract
relevant task properties and reveal only partial process details
that service providers need to expose. Our framework enables
the analysis of the atomicity sphere for service compositions
using these public views instead of their backend processes.
This allows service consumers to choose suitable services
such that their composition satisfies the atomicity sphere
without disclosing the details of their backend processes.
Based on the theoretical result, we present
algorithms to construct atomicity-equivalent public views
and to analyze the atomicity sphere for a service composition.
Two case studies from the supply chain and insurance domains
are given to evaluate our proposal and demonstrate the applicability
of our approach.
489.
Complexities of Graph-Based Representations for Elementary
Functions.
Abstract :-
This paper analyzes complexities of decision diagrams for
elementary functions such as polynomial, trigonometric, logarithmic,
square root, and reciprocal functions. These real functions
are converted into integer-valued functions by using fixed-point
representation. This paper presents the numbers of nodes in
decision diagrams representing the integer-valued functions.
First, complexities of decision diagrams for polynomial functions
are analyzed, since elementary functions can be approximated
by polynomial functions. A theoretical analysis shows that
binary moment diagrams (BMDs) have low complexity for polynomial
functions. Second, this paper analyzes complexity of edge-valued
binary decision diagrams (EVBDDs) for monotone functions,
since many common elementary functions are monotone. It introduces
a new class of integer functions, Mp-monotone increasing functions,
and derives an upper bound on the number of nodes in an EVBDD
for the Mp-monotone increasing function. A theoretical analysis
shows that EVBDDs have low complexity for Mp-monotone increasing
functions. This paper also presents the exact number of nodes
in the
smallest EVBDD for the n-bit multiplier function, and a variable
order for the smallest EVBDD.
490.
Constructing Minimum Connected Dominating Sets with Bounded
Diameters in Wireless Networks.
Abstract :-
Connected Dominating Sets (CDSs) can serve as virtual backbones
for wireless networks. A smaller virtual backbone incurs less
maintenance overhead. Unfortunately, computing a minimum size
CDS is NP-hard, and thus most researchers in this area concentrate
on how to construct smaller CDSs. However, other important network
metrics have been neglected, such as diameter and the average
hop distance between two communicating parties. In this paper,
we investigate the problem of constructing quality CDS in
terms of size, diameter, and Average Backbone Path Length
(ABPL). We present two centralized algorithms with constant
performance ratios for the size and diameter of the constructed
CDS. In particular, the size of the CDS computed by the second algorithm
is no more than 6.906 times that of the optimal solution. Furthermore,
we give its distributed version, which not only can be implemented
easily in real situations but also considers energy to extend
network lifetime. In our simulation, we show that on average
the distributed algorithm not only generates a CDS with smaller
diameter and ABPL than related work but also keeps its
size small. We also show that it is more energy efficient than
others in prolonging network lifetime.
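
The quality metrics discussed above can be made concrete with a short checker. The sketch below does not construct a CDS and is not one of the paper's algorithms. It only verifies that a node set is a connected dominating set and measures the backbone diameter in hops, using breadth-first search restricted to backbone nodes. The graph representation (a dict of neighbor sets) and the function names are assumptions.

```python
from collections import deque


def _bfs_hops(adj, cds, start):
    """Hop distances from start, moving only through backbone nodes."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] & cds:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_connected_dominating_set(adj, cds):
    cds = set(cds)
    if not cds:
        return False
    # Domination: every node outside the set has a neighbor inside it.
    for v in adj:
        if v not in cds and not (adj[v] & cds):
            return False
    # Connectivity: the subgraph induced by the set is connected.
    start = next(iter(cds))
    return set(_bfs_hops(adj, cds, start)) == cds


def backbone_diameter(adj, cds):
    """Longest shortest path between two backbone nodes (connected CDS)."""
    cds = set(cds)
    return max(max(_bfs_hops(adj, cds, s).values()) for s in cds)


# Path 1-2-3-4-5 with an extra leaf 6 attached to node 3.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4, 6}, 4: {3, 5}, 5: {4}, 6: {3}}
backbone = {2, 3, 4}
valid = is_connected_dominating_set(adj, backbone)
diameter = backbone_diameter(adj, backbone)
```

On this small graph the set {2, 3, 4} is a valid backbone, while {2, 4} is not, because node 6 has no neighbor in it and the two nodes are not adjacent.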
491.
Content Outsourcing via Generalized Communities.
Abstract :-
Content distribution networks (CDNs) balance costs and quality
in services related to content delivery. Devising an efficient
content outsourcing policy is crucial since, based on such
policies, CDN providers can provide client-tailored content,
improve performance, and achieve significant economic
gains. Earlier content outsourcing approaches may often prove
ineffective since they drive prefetching decisions by assuming
knowledge of content popularity statistics, which are not
always available and are extremely
volatile. This work addresses this issue by proposing a novel
self-adaptive technique under a CDN framework on which outsourced
content is identified with no a priori knowledge of (earlier)
request statistics. This is achieved by using a structure-based
approach identifying coherent clusters of “correlated”
Web server content objects, the so-called Web page communities.
These communities are the core outsourcing unit, and in this
paper, detailed simulation experiments show that
the proposed technique is robust and
effective in reducing user-perceived latency as compared with
competing approaches, i.e., two communities-based approaches,
Web caching, and non-CDN.
492.
Difficulty-Aware Hybrid Search in P2P Networks.
Abstract :-
By combining an unstructured protocol with a DHT-based global
index, hybrid peer-to-peer (P2P) improves search efficiency
in terms of query recall and response time. The major challenge
in hybrid search is how to estimate the number of peers that
can answer a given query. Existing approaches assume that
such a number can be directly obtained by computing item popularity.
In this work, we show that such an assumption is not always
valid, and previous designs cannot distinguish whether items
related to a query are distributed in many peers or are in
a few peers. To address this issue, we propose QRank, a difficulty-aware
hybrid search, which ranks queries by weighting keywords based
on term frequency. Using rank values, QRank selects proper
search strategies for queries. We conduct comprehensive trace-driven
simulations to evaluate this design. Results show that QRank
significantly improves the search quality and reduces
system traffic cost compared with existing approaches.
493.
Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing.
Abstract :-
Large applications executing on Grid or cluster architectures
consisting of hundreds or thousands of computational nodes
create problems with respect to reliability. The sources of
the problems are node failures and the need for dynamic configuration
over extensive runtime. This paper presents two fault-tolerance
mechanisms called Theft-Induced Checkpointing and Systematic
Event Logging. These are transparent protocols capable of
overcoming problems associated with both benign faults, i.e.,
crash faults, and node or subnet volatility. Specifically,
the protocols base the state of the execution on a dataflow
graph, allowing for efficient recovery in dynamic heterogeneous
systems as well as multithreaded applications. By allowing
recovery even under different numbers of processors, the approaches
are especially suitable for applications with a need for adaptive
or reactionary configuration control. The low-cost protocols
offer the capability of controlling or bounding the overhead.
A formal cost model is presented, followed by an
experimental evaluation. It is shown that the overhead of
the protocol is very small, and the maximum work lost by a
crashed process is small and bounded.