Hero scripts of machete.

Hero crawler script library for machete

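This project is built on Node.js and the open-source tool Hero (see the Ulixee website).

It supports scraping data from video share pages on the following platforms:

For the structure of the data collected by the crawler, see the TaJian skin documentation of the Machete project. The following properties are currently implemented:

Directory and file descriptions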

git clone "https://git.filesite.io/filesite/machete_hero.git"
cd machete_hero/

npm install

If you are not familiar with npm and Node.js, please learn the basics first.

./start_cloud.sh

npm start

Start with an argument to specify a custom configuration file that overrides the default config.mjs:

npm start -- config_custom.json
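How the custom file is combined with config.mjs is not spelled out here. One plausible reading is that keys present in the custom JSON file take precedence over the defaults. The sketch below illustrates that idea with a shallow merge; the key names and the `applyCustomConfig` helper are hypothetical and are not taken from the project's actual config.mjs.

```javascript
// Hypothetical defaults; the real config.mjs may use different keys.
const defaultConfig = {
  todoDir: 'todo/',
  dataDir: 'data/',
  maxRunTimeSeconds: 300,
};

// Overlay the contents of a custom JSON config on top of the defaults.
// Shallow merge: any key found in the custom file replaces the default value.
function applyCustomConfig(defaults, customJsonText) {
  const custom = JSON.parse(customJsonText);
  return { ...defaults, ...custom };
}

// Example: a custom file that only changes one setting.
const merged = applyCustomConfig(defaultConfig, '{"maxRunTimeSeconds": 60}');
console.log(merged);
```

With a shallow merge, settings that are left out of the custom file keep their default values, so the custom file only needs to list what differs.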

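Create a task file in the todo/ directory. When the crawler detects a new task, it automatically scrapes the data and saves it to the data/ directory.

Example command for adding a task manually: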

echo "https://tajian.tv" > todo/test_01.task
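The same task can be queued from a Node.js script. The sketch below assumes, based on the echo example above, that a task is a file ending in `.task` whose content is a single URL. The `addTask` and `listTasks` helpers are hypothetical names used for illustration and are not part of this project.

```javascript
import fs from 'node:fs';
import path from 'node:path';

// Write one task file containing the URL to scrape.
// Mirrors: echo "<url>" > todo/<taskName>.task
function addTask(todoDir, taskName, url) {
  fs.mkdirSync(todoDir, { recursive: true });
  const taskFile = path.join(todoDir, `${taskName}.task`);
  fs.writeFileSync(taskFile, `${url}\n`, 'utf8');
  return taskFile;
}

// List the pending task files in a directory together with the URL in each.
function listTasks(todoDir) {
  return fs.readdirSync(todoDir)
    .filter((name) => name.endsWith('.task'))
    .sort()
    .map((name) => ({
      name,
      url: fs.readFileSync(path.join(todoDir, name), 'utf8').trim(),
    }));
}

// Usage (run from the project root, saved as a .mjs file):
// addTask('todo', 'test_01', 'https://tajian.tv');
// console.log(listTasks('todo'));
```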

Write a .mjs script that calls the class library under bot/ to visit the target page and parse out the data you need.

You can also use the classes under bot/ as a reference to implement scraping for any website.

For how to call the classes under bot/, see the test script test/scrap_test.mjs. For how to run the test script, see test/README.md.
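The interfaces of the bot/ classes are not reproduced in this README, so the following is only a rough illustration of the kind of page visit such a script performs. It calls Hero's own API directly rather than the project's classes. The `scrapePage` function and the shape of the returned object are assumptions made for this sketch; the actual data structure is defined in the TaJian skin documentation.

```javascript
// Visit a page with an already-created Hero instance and return basic data.
// `hero` is passed in so that connection setup stays outside this function.
async function scrapePage(hero, url) {
  try {
    await hero.goto(url);                    // open the target page
    await hero.waitForPaintingStable();      // wait until the page has rendered
    const title = await hero.document.title; // read a value from the remote DOM
    return { url, title };                   // hypothetical result shape
  } finally {
    await hero.close();                      // always release the browser session
  }
}

// Usage (assumes @ulixee/hero is installed and a Hero cloud is running;
// the host value is a placeholder, not taken from this project):
// import Hero from '@ulixee/hero';
// const hero = new Hero({ connectionToCore: { host: 'localhost:1818' } });
// console.log(await scrapePage(hero, 'https://tajian.tv'));
```

Closing the session in a `finally` block keeps a failed page load from leaving a browser session open on the cloud side.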