Thesis/Outline


Difference (from prior author revision)

Changed: 1,3c1,15
/Todo


introduction
INTRODUCTION
What problems do file-sharing systems solve?
Weaknesses of File-sharing Systems
Goal: Creating and Discovering Communities
Approach
Decompose File-sharing Applications Into Orthogonal Aspects
Development of a Schema for File-sharing Communities
Communities are objects
All Files are XML Files
Contributions
User-designed Communities
Framework for Creating File-sharing Applications
Framework for Sharing Communities
Standard Metadata Layer
Thesis Overview

Changed: 5,6c17,31
(see also /OneSentence)
The use of structured documents to represent shared resources forms a framework for the flexible creation and discovery of peer-to-peer resource-sharing communities.
STATE OF THE ART
Classifying Peer-to-peer systems
Network architecture
Hybrid Systems
Pure Systems
Super peers
Structured Systems
Anonymity and Censorship
Search
Query Routing
Communities
Location-centric systems
Search for Location-centric systems
Metadata
Summary

Changed: 8,10c33
(why worthwhile)
* what is peer-to-peer
* allows better distribution of information: robustness (distributed system - nodes can come and go without affecting availability), scalability (popular documents replicated - means load is distributed)
EFFECTIVE SUPPORT FOR COMMUNITIES

Changed: 12,19c35,50
(findings)
* documents include metadata - effective search entails making all of this metadata searchable
* effective search also means ability to find communities
* metadata search is available only for a narrow class of documents
* problem: applications require a priori knowledge of the metadata structure, which means they are not extensible
* structured documents are machine readable - can use them to separate metadata knowledge from the application
* u-p2p is a prototype framework which allows communities to be created from structured documents
* u-p2p treats communities as sharable resources: allows communities to be discovered
THE CONCEPT OF COMMUNITY
What is a Community: Community Schema Design
Attributes of file-sharing communities
Protocol
Security
Anonymity
Deniability
Authentication
Format
Name
Community as Class / Community as Object
Community as Bootstrap
Communities challenges
Proliferation of communities
Evolution of communities
Composition of communities

Changed: 21c52
background information
U-P2P IS A FRAMEWORK

Changed: 23,31c54,67
(optional?)
* expand on advantages of peer-to-peer networks, file-sharing
* expand on semantic web
* discuss standards: xml, xsl, xslt, xml schema
* what problems are these designed to solve?
* xml - meaningful metadata (vs. html) - machine readable
* xsl - separate content from presentation - this is a similar problem: communities are content, application is presentation
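* xslt - transforms xml documents into other documents (e.g. html): the same object can be rendered in several ways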
* xml schema - evolution from dtd - specified in xml
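Because an XML Schema is itself an XML document ("specified in xml"), an application can read it with an ordinary XML parser and derive part of its interface from it instead of hard-coding the object structure; this is the "application is presentation" side of the split. The notes describe doing this with stylesheets; the following is only a minimal Python sketch of the idea, assuming a hypothetical design-pattern schema and handling only flat sequences of simply-typed elements. The schema, the `/create` action and the function names are illustrative, not part of U-P2P.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical community schema for design patterns (illustrative only).
PATTERN_SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="pattern">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="intent" type="xs:string"/>
        <xs:element name="category" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def form_fields(schema_text):
    """Return the names of the simply-typed (leaf) elements in a schema."""
    root = ET.fromstring(schema_text)
    fields = []
    for el in root.iter(XS + "element"):
        # Leaf elements carry a type attribute; the top-level element
        # wraps a complexType instead and is skipped.
        if el.get("type") is not None:
            fields.append(el.get("name"))
    return fields

def schema_to_form(schema_text, action="/create"):
    """Render a 'create' form with one text input per leaf field."""
    rows = [f'<form method="post" action="{action}">']
    for name in form_fields(schema_text):
        rows.append(f'  <label>{name} <input type="text" name="{name}"/></label>')
    rows.append('  <input type="submit"/>')
    rows.append("</form>")
    return "\n".join(rows)

print(schema_to_form(PATTERN_SCHEMA))
```

Adding a new object type then means supplying a new schema document, not changing the program.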
DESIGN
Overview
Schemas and the Choice of XML Schema
Advantages of XML
Compatible with Other Technologies
How XML Allows Flexible Community Creation
Using XML Wrappers to Express Objects
Interface Design and Challenges
Metadata Issues
Quality of Metadata
Subsetting Metadata
Subsetting Details
Repository Design
Adapter Design

Changed: 33c69,73
/XmlReferences
DETAILED DESIGN
Interface - html -> webserver -> tomcat
Query Format and Challenges
Advanced Approaches to Creating Objects
Adapter Detailed Design

Changed: 35c75,79
review of the state of the art
APPLICATIONS
Hybrid Communities
Optimizing search
Implementing Community Aware Gnutella
Modelling, Simulating, Testing Performance

Changed: 37c81
(keep in mind: present, don't analyse)
CONCLUSIONS

Changed: 39,43c83
progression:
* no knowledge about metadata except filename (metadata embedded in filename) - napster, early gnutella
* knowledge of binary format which embeds metadata (e.g. mp3 id3, word document) - fasttrack, opencola
* knowledge of metadata fields (schema) for limited set of objects - limewire, edutella
* knowledge of one format that includes a pointer to the schema of the embedded data - Open Archives Initiative (OAI) http://www.openarchives.org/OAI_protocol/openarchivesprotocol.html - used to index the "deep web" but centralized
FUTURE WORK

Removed: 45,126d84
todo:
* is edutella classification correct?
* read more about OAI.
* classify jxtasearch - impression: isn't this just infrastructure?
* classify other resource discovery research - is it working in p2p - how can we trim down this list? - for example: meta-data search engines
* classify distributed file system research (e.g. chord) - is this relevant?
* is jabber important - xml structured messaging
* classify bitzi - metadata associated with signatures - trust

research question

problem statement
* reiterate /OneSentence
* there is no flexible way to extend the benefits of peer-to-peer file-sharing to a wide variety of documents
* requirements: easy to add new document types, can perform search on some (preferably all) document metadata, easy to discover the community

question is unanswered - have to relate the following to specific examples in "state of the art"

* no consistent way to search metadata fields of different types: applications can search embedded or pre-programmed metadata, but only because they have a priori knowledge of the object structure; they cannot take advantage of metadata whose meaning they do not already know, so search is primitive and falls back to the least common denominator: filenames

* no simple way to extend existing applications to accommodate different types: because the knowledge is embedded in the application, tailoring it to new object types requires extending an existing application or creating a new one; the principle of least power suggests we should find a simpler way to specify this (e.g. not in a programming language). why? "The reason for this is that the less powerful the language, the more you can do with the data stored in that language. If you write it in a simple declarative form, anyone can write a program to analyze it in many ways." - tim berners-lee - http://www.w3.org/DesignIssues/Principles.html

* communities are fragmented: each application has knowledge of a different subset of document types, so the current approach means a different application for each type; alternatively, communities of objects are not searchable using peer-to-peer at all, but live in a wide variety of centralized databases with custom interfaces

worthwhile
* existing structured documents: CML, Genetic, Biodiversity, Design Patterns, others
* come up with case studies: How have these been leveraged in the past - e.g. how are objects of this type currently shared and indexed
* need to show that they've never been used in a p2p context, or if they have, show how this approach is different
* prove that it is useful to share these types of document - increases availability of the document, accessibility - same as justification for publishing on the web but with better scalability
* future-proofs the infrastructure: will we be sharing the same types of files five years from now? it gives future binary formats a flexible machine-readable wrapper

describing how you solved the problem

implementation: ideas

object as structured document
* xml schema - a way of incorporating this knowledge on the fly: it is not embedded in the application, but because it is standards-based the application can still have a priori knowledge (of the schema language rather than of each object type)
* application rendering as transformation (search, create, view)
* metadata indexing as transformation
* knowledge layer independence from network layer
* how all of the above help resource discovery
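"Metadata indexing as transformation" can be sketched as a function that turns any XML object into searchable (field, value) pairs without a priori knowledge of its structure, which is what the filename-only search problem above calls for. This is a minimal Python sketch, assuming that fields are named by their element path and that indexing every text-bearing element is acceptable; the notes suggest a stylesheet would instead mark which attributes to index. `extract_metadata` and `MetadataIndex` are hypothetical names, not the U-P2P repository design.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def extract_metadata(xml_text):
    """Flatten any XML object into (path, value) pairs.

    No schema knowledge is needed: every element with text becomes a
    searchable field named by its path from the root element.
    """
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(el, path):
        text = (el.text or "").strip()
        if text:
            pairs.append((path, text))
        for child in el:
            walk(child, f"{path}/{child.tag}")

    walk(root, root.tag)
    return pairs

class MetadataIndex:
    """Toy in-memory index: (path, lowercased value) -> set of object ids."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, object_id, xml_text):
        for path, value in extract_metadata(xml_text):
            self._index[(path, value.lower())].add(object_id)

    def search(self, path, value):
        # .get() avoids inserting empty entries into the defaultdict.
        return sorted(self._index.get((path, value.lower()), set()))

index = MetadataIndex()
index.add("obs.xml", "<pattern><name>Observer</name><category>Behavioral</category></pattern>")
index.add("fac.xml", "<pattern><name>Factory Method</name><category>Creational</category></pattern>")
# A completely different object type is indexed by the same code.
index.add("water.xml", "<molecule><formula>H2O</formula></molecule>")

print(index.search("pattern/category", "behavioral"))
print(index.search("molecule/formula", "h2o"))
```

The point of the sketch is the last `add` call: a new document type becomes searchable on its own fields with no change to the indexing code.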

community as structured document
* of course the system must have a priori knowledge of at least one schema in order to perform the initial search, but this is intuitive: that schema should define the search engine for OTHER object schemas
* this is the concept of "community" in up2p
* metaclass analogy
* re-state how resource discovery can be applied to community objects
* how community object can bootstrap entry into community
* components of a community object
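The "community as structured document" notes (metaclass analogy, bootstrapping) can be illustrated with a minimal Python sketch: the only schema the application is assumed to know a priori is the community schema, so searching community objects works like searching any other shared object, and a hit yields what a peer needs to join. The element names (loosely following the Name, Format and Protocol attributes in the outline), the example.org URLs and the protocol values are hypothetical placeholders, not the U-P2P community format.

```python
import xml.etree.ElementTree as ET

# Hypothetical community objects; all names and values are illustrative.
COMMUNITIES = [
    """<community>
         <name>Design Patterns</name>
         <description>software design patterns</description>
         <schema>http://example.org/schemas/pattern.xsd</schema>
         <protocol>gnutella</protocol>
       </community>""",
    """<community>
         <name>Molecules</name>
         <description>chemical structures in CML</description>
         <schema>http://example.org/schemas/cml.xsd</schema>
         <protocol>fasttrack</protocol>
       </community>""",
]

def parse_community(xml_text):
    """Read a community object into a plain dict of its fields."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

def find_communities(keyword, documents):
    """Search the 'root' community: communities are ordinary shared objects."""
    keyword = keyword.lower()
    hits = []
    for doc in documents:
        community = parse_community(doc)
        if any(keyword in value.lower() for value in community.values()):
            hits.append(community)
    return hits

def bootstrap(community):
    """What a peer needs to join: the object schema and the network protocol."""
    return community["schema"], community["protocol"]

for c in find_communities("patterns", COMMUNITIES):
    print(c["name"], "->", bootstrap(c))
```

In the metaclass analogy, the community schema plays the role of the class of classes: each community object describes a class of shared objects through its schema.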

implementation: details

* schema, stylesheets - machine readable data separate from program
* benefit of rendering interface in html (html, in its xhtml form, is an xml specification of a ui!)
* how does schema generate application
* xml transformations + rendering: why a servlet container/web server is ideal
* independent network layer? have to show details of how it can sit on different network architectures (requires further work)
* why and how of indexing metadata: stylesheet (requires further work)
* standard query/publish interface: publish - xml specification of object, query - partial xml specification of an object?
* software architecture stuff - use case, object diagrams (further work - reading? use tools?)
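The "standard query/publish interface" bullet can be read as query-by-example: publishing sends a complete XML object, and a query is the same kind of document with only some fields filled in. This is a minimal Python sketch under that assumption; the case-insensitive substring rule and the treatment of empty elements as wildcards are illustrative choices, not a specification (a fuller query language such as xml query is listed under future research).

```python
import xml.etree.ElementTree as ET

def matches(query, obj):
    """True if the partial XML specification `query` is satisfied by `obj`.

    Tags must be equal; non-empty query text must occur (case-insensitive)
    in the object's text; empty query elements act as wildcards; and every
    child of the query must be matched by some child of the object.
    """
    if query.tag != obj.tag:
        return False
    q_text = (query.text or "").strip().lower()
    if q_text and q_text not in (obj.text or "").lower():
        return False
    return all(any(matches(q_child, o_child) for o_child in obj)
               for q_child in query)

# Objects as published: complete XML specifications.
published = [
    ET.fromstring("<pattern><name>Observer</name><category>Behavioral</category></pattern>"),
    ET.fromstring("<pattern><name>Visitor</name><category>Behavioral</category></pattern>"),
    ET.fromstring("<pattern><name>Singleton</name><category>Creational</category></pattern>"),
]

# A query: a partial specification of the same kind of object.
query = ET.fromstring("<pattern><category>behav</category></pattern>")
print([p.findtext("name") for p in published if matches(query, p)])
```

Because publish and query share one format, the network layer only has to carry XML documents and does not need to understand any particular object type.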

conclusion

conclusions
* a framework that allows file-sharing to be effectively applied to a wide variety of objects has been developed
* the same framework can also be used to allow discovery of resource-sharing communities
* the foundation of this framework is the separation of the specification of the shared object (content) from the application used to share the object (presentation)
* structured documents provide the ideal method of implementing this separation

contributions
* created a prototype of the framework
* demonstrated use of framework to create a community for sharing design patterns
* demonstrated use of framework to discover multiple communities
* illustrated protocol independence of the framework

future research
* object creation interface less restrictive than html forms
* improved query language - e.g. xml query
* more robust methods of marking indexed attributes (is it even required?)
* protocol-independence - show it running on a wider range of protocols - jxtasearch, gnutella, fasttrack, etc.
* closer integration of xml schema tool - hide underlying xml - auto-generate stylesheets
* rendering of complex schema

 INTRODUCTION
  What problems do file-sharing systems solve?
  Weaknesses of File-sharing Systems
  Goal: Creating and Discovering Communities
  Approach
   Decompose File-sharing Applications Into Orthogonal Aspects
   Development of a Schema for File-sharing Communities
   Communities are objects
   All Files are XML Files
  Contributions
   User-designed Communities
   Framework for Creating File-sharing Applications
   Framework for Sharing Communities
   Standard Metadata Layer
  Thesis Overview

 STATE OF THE ART
  Classifying Peer-to-peer systems
   Network architecture
    Hybrid Systems
    Pure Systems
    Super peers
   Structured Systems
   Anonymity and Censorship
   Search
    Query Routing
    Communities
    Location-centric systems
    Search for Location-centric systems
    Metadata
  Summary

 EFFECTIVE SUPPORT FOR COMMUNITIES

 THE CONCEPT OF COMMUNITY
  What is a Community: Community Schema Design
  Attributes of file-sharing communities
   Protocol
   Security
   Anonymity
   Deniability
   Authentication
   Format
   Name
  Community as Class / Community as Object
  Community as Bootstrap
  Communities challenges
   Proliferation of communities
   Evolution of communities
   Composition of communities

 U-P2P IS A FRAMEWORK

 DESIGN
  Overview
  Schemas and the Choice of XML Schema
  Advantages of XML
   Compatible with Other Technologies
   How XML Allows Flexible Community Creation
   Using XML Wrappers to Express Objects
  Interface Design and Challenges
  Metadata Issues
   Quality of Metadata
   Subsetting Metadata
   Subsetting Details
  Repository Design
  Adapter Design

 DETAILED DESIGN
  Interface - html -> webserver -> tomcat
  Query Format and Challenges
  Advanced Approaches to Creating Objects
  Adapter Detailed Design

 APPLICATIONS
  Hybrid Communities
  Optimizing search
  Implementing Community Aware Gnutella
  Modelling, Simulating, Testing Performance

 CONCLUSIONS

 FUTURE WORK


Last edited November 30, 2002 11:52 am