Thesis/Outline
/Todo introduction

INTRODUCTION
What problems do file-sharing systems solve?
Weaknesses of File-sharing Systems
Goal: Creating and Discovering Communities
Approach
Decompose File-sharing Applications Into Orthogonal Aspects
Development of a Schema for File-sharing Communities
Communities are objects
All Files are XML Files
Contributions
User-designed Communities
Framework for Creating File-sharing Applications
Framework for Sharing Communities
Standard Metadata Layer
Thesis Overview

(see also /OneSentence) The use of structured documents to represent shared resources forms a framework for the flexible creation and discovery of peer-to-peer resource-sharing communities.

STATE OF THE ART
Classifying Peer-to-peer systems
Network architecture
Hybrid Systems
Pure Systems
Super peers
Structured Systems
Anonymity and Censorship
Search
Query Routing
Communities
Location-centric systems
Search for Location-centric systems
Metadata
Summary

(why worthwhile)
* what is peer-to-peer?
* peer-to-peer allows better distribution of information:
  - robustness (in a distributed system, nodes can come and go without affecting availability)
  - scalability (popular documents are replicated, so load is distributed)

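The robustness argument can be made concrete with a simple availability model. This is an idealization, not a measured property of any particular network: assume each of r peers holding a replica of a document is online independently with probability p. The document is unavailable only when every replica holder is offline, so its availability A(r) is:

```latex
A(r) = 1 - (1 - p)^{r}, \qquad
A(5)\big|_{p = 0.5} = 1 - 0.5^{5} = \tfrac{31}{32} \approx 0.97
```

With a single copy (r = 1) the same peer population gives only A = 0.5. Real peer departures are often correlated, so the independence assumption makes this an illustration of the trend rather than a prediction.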
EFFECTIVE SUPPORT FOR COMMUNITIES

(findings)
* documents include metadata: effective search entails making all of this metadata searchable
* effective search also means the ability to find communities
* metadata searches are available only for a narrow class of documents
* the problem is that applications require a priori knowledge of the metadata structure, which means they are not extensible
* structured documents are machine-readable, so they can be used to separate metadata knowledge from the application
* U-P2P is a prototype framework that allows communities to be created from structured documents
* U-P2P treats communities as shareable resources, which allows communities to be discovered

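The point that structured documents let metadata knowledge live outside the application can be illustrated with an indexer that has no built-in knowledge of any object type. This is a minimal sketch, not U-P2P's actual indexing code: it assumes objects are plain XML documents without namespaces and turns every element with its own text into a searchable (path, value) pair. The functions `index_metadata` and `search` and the sample design-pattern documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def index_metadata(xml_text):
    """Return (path, value) pairs for every element that has its own text.

    The indexer knows nothing about the document type: the structure of the
    XML itself tells it which metadata fields exist. Tail text is ignored.
    """
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(elem, path):
        text = (elem.text or "").strip()
        if text:
            pairs.append((path, text))
        for child in elem:
            walk(child, path + "/" + child.tag)

    walk(root, root.tag)
    return pairs


def search(index, field, term):
    """Naive case-insensitive substring search on one metadata field."""
    term = term.lower()
    return [doc_id for doc_id, pairs in index.items()
            if any(path == field and term in value.lower()
                   for path, value in pairs)]


# Two objects of a type the indexer was never programmed for.
docs = {
    "p1": "<pattern><name>Observer</name><category>Behavioral</category></pattern>",
    "p2": "<pattern><name>Adapter</name><category>Structural</category></pattern>",
}
index = {doc_id: index_metadata(text) for doc_id, text in docs.items()}
print(index["p1"])  # [('pattern/name', 'Observer'), ('pattern/category', 'Behavioral')]
print(search(index, "pattern/category", "struct"))  # ['p2']
```

Adding a new object type here requires no code change, only new documents; that is the extensibility the findings above say current applications lack.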
THE CONCEPT OF COMMUNITY
What is a Community: Community Schema Design
Attributes of file-sharing communities
Protocol
Security
Anonymity
Deniability
Authentication
Format
Name
Community as Class/Community as Object
Community as Bootstrap
Communities – challenges
Proliferation of communities
Evolution of communities
Composition of communities

background information

U-P2P IS A FRAMEWORK

(optional?)
* expand on the advantages of peer-to-peer networks and file-sharing
* expand on the semantic web
* discuss standards: XML, XSL, XSLT, XML Schema
* what problems are these designed to solve?
  - XML: meaningful, machine-readable metadata (vs. HTML)
  - XSL: separates content from presentation; this is a similar problem, since communities are content and the application is presentation
  - XSLT: ?
  - XML Schema: evolution from DTD; specified in XML

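Because XML Schema is itself written in XML, an application can read a community's schema with an ordinary XML parser and derive its interface from it instead of having the object structure compiled in. The sketch below is a deliberately simplified illustration, not the stylesheet-based design described in this thesis: it only handles a single top-level `xs:element` whose children are plain `xs:element` declarations. The `pattern` schema and the functions `schema_fields` and `schema_to_form` are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema for a design-pattern community.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="pattern">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="category" type="xs:string"/>
        <xs:element name="intent" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def schema_fields(schema_text):
    """Return (root element name, [nested element names]) from a simple XSD."""
    schema = ET.fromstring(schema_text)
    root_decl = schema.find(XS + "element")
    # iter() also yields root_decl itself, so skip it.
    fields = [el.get("name")
              for el in root_decl.iter(XS + "element")
              if el is not root_decl]
    return root_decl.get("name"), fields


def schema_to_form(schema_text):
    """Render an HTML form with one text input per declared field."""
    root_name, fields = schema_fields(schema_text)
    inputs = "\n".join(
        f'  <label>{f} <input type="text" name="{root_name}/{f}"/></label>'
        for f in fields)
    return f'<form method="post">\n{inputs}\n</form>'


print(schema_fields(SCHEMA))  # ('pattern', ['name', 'category', 'intent'])
print(schema_to_form(SCHEMA))
```

Changing the schema changes the generated form with no change to the program, which is the content/presentation split the notes above attribute to XSL.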
DESIGN
Overview
Schemas and the Choice of XML Schema
Advantages of XML
Compatible with Other Technologies
How XML Allows Flexible Community Creation
Using XML Wrappers to Express Objects
Interface Design and Challenges
Metadata Issues
Quality of Metadata
Subsetting Metadata
Subsetting Details
Repository Design
Adapter Design

/XmlReferences

DETAILED DESIGN
Interface: HTML -> web server -> Tomcat
Query Format and Challenges
Advanced Approaches to Creating Objects
Adapter Detailed Design

review of the state of the art

APPLICATIONS
Hybrid Communities
Optimizing search
Implementing Community-Aware Gnutella
Modelling, Simulating, Testing Performance

(keep in mind: present, don't analyse)

CONCLUSIONS

progression:
* no knowledge about metadata except the filename (metadata embedded in the filename): Napster, early Gnutella
* knowledge of a binary format that embeds metadata (e.g. MP3 ID3 tags, Word documents): FastTrack, OpenCola
* knowledge of metadata fields (schema) for a limited set of objects: LimeWire, Edutella
* knowledge of one format that includes a pointer to the schema of the embedded data: Open Archives Initiative (http://www.openarchives.org/OAI_protocol/openarchivesprotocol.html), used to index the "deep web" but centralized

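The last step of this progression, one known format that carries a pointer to the schema of the embedded data, can be sketched with the standard XML Schema instance attributes `xsi:schemaLocation` and `xsi:noNamespaceSchemaLocation`. The wrapper document, its namespace URI, and the schema URL are placeholders made up for illustration, and `schema_pointers` is a hypothetical helper; the point is only that the application discovers the object's schema from the data rather than knowing it in advance.

```python
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

# Hypothetical object: the namespace and schema URL are placeholders.
OBJECT = """<pattern xmlns="urn:example:patterns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:example:patterns http://example.org/pattern.xsd">
  <name>Observer</name>
</pattern>"""


def schema_pointers(xml_text):
    """Map namespace URI -> schema URL declared on the root element.

    xsi:schemaLocation holds whitespace-separated (namespace, location)
    pairs; xsi:noNamespaceSchemaLocation holds one location for elements
    in no namespace, keyed here by the empty string.
    """
    root = ET.fromstring(xml_text)
    pointers = {}
    tokens = root.get(XSI + "schemaLocation", "").split()
    for ns, location in zip(tokens[0::2], tokens[1::2]):
        pointers[ns] = location
    no_ns = root.get(XSI + "noNamespaceSchemaLocation")
    if no_ns:
        pointers[""] = no_ns
    return pointers


print(schema_pointers(OBJECT))
# {'urn:example:patterns': 'http://example.org/pattern.xsd'}
```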
FUTURE WORK

todo:
* is the Edutella classification correct?
* read more about OAI.
* classify JXTA Search (impression: isn't this just infrastructure?)
* classify other resource discovery research: is it working in p2p? How can we trim down this list? (for example: metadata search engines)
* classify distributed file system research (e.g. Chord): is this relevant?
* is Jabber important? (XML structured messaging)
* classify Bitzi: metadata associated with signatures; trust

research question

problem statement
* reiterate /OneSentence
* there is no flexible way to extend the benefits of peer-to-peer file-sharing to a wide variety of documents
* requirements: easy to add new document types, can perform search on some (preferably all) document metadata, easy to discover the community

question is unanswered: have to relate the following to specific examples in "state of the art"
* no consistent way to search metadata fields of different types: applications can search embedded or pre-programmed metadata, but they have a priori knowledge of object structure, so they can't take advantage of metadata they don't already know about. What it means: search is primitive, reduced to the least common denominator (filenames).
* no simple way to extend existing applications to accommodate different types: embedded knowledge means that tailoring to new object types requires extending an existing application or creating a new one. The principle of least power suggests we should find a simpler way to specify this (e.g. not in a programming language). Why? "The reason for this is that the less powerful the language, the more you can do with the data stored in that language. If you write it in a simple declarative from, anyone can write a program to analyze it in many ways." (Tim Berners-Lee, http://www.w3.org/DesignIssues/Principles.html)
* communities are fragmented. What does this mean? Each application has knowledge of a different subset of documents, and the current approach means having a different application for each type. Alternatively, some communities of objects aren't searchable using peer-to-peer at all: there is a wide variety of centralized databases with custom interfaces.

worthwhile
* existing structured documents: CML, genetic, biodiversity, design patterns, others
* come up with case studies: how have these been leveraged in the past (e.g. how are objects of this type currently shared and indexed)?
* need to show that they've never been used in a p2p context, or, if they have, show how this is different
* prove that it is useful to share these types of document: it increases the availability and accessibility of the document (same as the justification for publishing on the web, but with better scalability)
* future-proofs the infrastructure: will we be sharing the same types of files five years from now? It gives future binary formats a flexible, machine-readable wrapper

describing how you solved the problem

implementation: ideas

object as structured document
* XML Schema: a way of incorporating this knowledge on the fly; it is not embedded in the application, but since it is standards-based the application can now have a priori knowledge
* application rendering as transformation (search, create, view)
* metadata indexing as transformation
* independence of the knowledge layer from the network layer
* how all of the above help resource discovery

community as structured document
* of course the system has to have a priori knowledge of at least a single schema in order to perform the initial search, but this is intuitive: this should be the search engine for OTHER object schemas
* this is the concept of "community" in U-P2P
* metaclass analogy
* re-state how resource discovery can be applied to community objects
* how a community object can bootstrap entry into a community
* components of a community object

implementation: details
* schema, stylesheets: machine-readable data separate from the program
* benefit of rendering the interface in HTML (HTML, in its XHTML form, is an XML specification of a UI!)
* how does the schema generate the application?
* XML transformations + rendering: why a servlet container/web server is ideal
* independent network layer? Have to show details of how it can sit on different network architectures (requires further work)
* why and how of indexing metadata: stylesheet (requires further work)
* standard query/publish interface: publish = XML specification of an object; query = partial XML specification of an object?
* software architecture: use cases, object diagrams (further work: reading? use tools?)
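The "community as structured document" and "standard query/publish interface" ideas can be sketched together. In this minimal in-memory illustration (not the U-P2P implementation), a repository stores published objects as XML, and a query is itself a partial XML document that matches any object containing the same elements with the same text. The only structure the client must know in advance is that of community objects; the root community plays the metaclass role, since its objects describe other communities. The `community` element layout, the `Repository` class, `matches`, and the example.org URLs are all hypothetical, and a real system would sit on a network adapter rather than a Python list.

```python
import xml.etree.ElementTree as ET


def matches(query, obj):
    """True if every element in the partial query appears in obj.

    Tags must be equal; a query element with text requires equal
    (whitespace-stripped) text; each query child must match some child.
    """
    if query.tag != obj.tag:
        return False
    q_text = (query.text or "").strip()
    if q_text and q_text != (obj.text or "").strip():
        return False
    return all(any(matches(qc, oc) for oc in obj) for qc in query)


class Repository:
    """In-memory stand-in for one community's shared objects."""

    def __init__(self):
        self.objects = []

    def publish(self, xml_text):
        # publish = the full XML specification of an object
        self.objects.append(ET.fromstring(xml_text))

    def query(self, partial_xml):
        # query = a partial XML specification of an object
        q = ET.fromstring(partial_xml)
        return [o for o in self.objects if matches(q, o)]


# The root community: its objects describe other communities.
root = Repository()
root.publish("""<community>
  <name>Design Patterns</name>
  <keywords>software</keywords>
  <schema>http://example.org/pattern.xsd</schema>
</community>""")
root.publish("""<community>
  <name>Molecules</name>
  <keywords>chemistry</keywords>
  <schema>http://example.org/cml.xsd</schema>
</community>""")

# Discover a community, then use its schema pointer to enter it.
found = root.query("<community><keywords>software</keywords></community>")
print([c.findtext("schema") for c in found])  # ['http://example.org/pattern.xsd']

# Inside the discovered community, the same interface shares its objects.
patterns = Repository()
patterns.publish("<pattern><name>Observer</name><category>Behavioral</category></pattern>")
print(len(patterns.query("<pattern><category>Behavioral</category></pattern>")))  # 1
```

Using one publish/query pair for both community objects and ordinary objects is the design choice being illustrated: discovering a community is just a search in a community whose schema every peer already knows.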
conclusion

conclusions
* a framework that allows file-sharing to be effectively applied to a wide variety of objects has been developed
* the same framework can also be used to allow discovery of resource-sharing communities
* the foundation of this framework is the separation of the specification of the shared object (content) from the application used to share the object (presentation)
* structured documents provide the ideal method of implementing this separation

contributions
* created a prototype of the framework
* demonstrated use of the framework to create a community for sharing design patterns
* demonstrated use of the framework to discover multiple communities
* illustrated protocol independence of the framework

future research
* an object creation interface less restrictive than HTML forms
* an improved query language, e.g. XML Query (XQuery)
* more robust methods of marking indexed attributes (is this even required?)
* protocol independence: show it running on a wider range of protocols (JXTA Search, Gnutella, FastTrack, etc.)
* closer integration of an XML Schema tool: hide the underlying XML, auto-generate stylesheets
* rendering of complex schemas