Heritrix - Internet Archive Crawler

Rating of this project among Web crawler tools: 8

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Heritrix (sometimes spelled heretrix, and sometimes misspelled or mispronounced as heratrix, heritix, heretix, or heratix) is an archaic word for heiress (a woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.


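The scores (PR, Project Rank) that this site assigns to the listed open-source projects, software, source code, and libraries are subjective assessments based on a combination of factors such as project size, complexity, number of users, number of developers, activity, documentation, and demo sites. They are provided for your reference only.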
Copyright © 2007 - 2012 Why and How