Paper

A Research on the Method of Fine Granularity Webpage Data Extraction of Open Access Journals

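Author

Liu Jinquan (刘金全, b. 1964), male, professor and doctoral supervisor at the Center for Quantitative Economics, Jilin University. His main research area is macroeconomic econometric analysis.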

Table of Contents

  • 1 Foreword
  • 2 Relevant Work
    • 2.1 Deep Web Information Extraction Method
    • 2.2 Analysis of Features of OAJ Websites
  • 3 Field-based Fine Granularity Webpage Data Extraction Method
    • 3.1 Template Training Process of Data Extraction Rules
      • 3.1.1 Semi-automatic Customization Tool for Data Extraction Templates
      • 3.1.2 Training Data Preprocessing
      • 3.1.3 Information Path Location
      • 3.1.4 Rule Template Customization
      • 3.1.5 Intelligent Correlation of Rule Template and Seed Page
    • 3.2 Extraction Stage
      • 3.2.1 Data Update Check Module
      • 3.2.2 Data Extraction Module
  • 4 Case Validation
  • 5 Conclusion
