Hero scripts of machete.

Hero crawler script library for machete

Hero scripts of machete.

This project is built on Node.js and the open-source tool Hero (website: https://ulixee.org).

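It supports scraping data from video-sharing web pages on the following platforms:

  • Douyin (web version)
  • Kuaishou (web version)
  • Xigua Video (web version)
  • Bilibili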

For the structure of the data the crawler collects, see the TaJian skin documentation of the Machete project.

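Directory and file overview

  • bot - HTML parsing classes for the web pages of each platform
  • bypass - collections of commonly used domains for each platform
  • test - test code for the libraries
  • tmp - directory for temporary files
  • install_cloud.sh - installs the Hero server (optional)
  • install_hero.sh - installs the Hero client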
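
Because parsing is split per platform and domains are collected per platform, a script usually has to decide which platform a given URL belongs to before choosing a parser. The following is a minimal sketch of that step, not code from this repository: the `PLATFORM_DOMAINS` table and the `detectPlatform` function are hypothetical names, and the domains listed are assumptions that may differ from what the bypass directory actually contains.

```javascript
// Hypothetical domain table; the lists collected under bypass/ may differ.
const PLATFORM_DOMAINS = {
  douyin: ['douyin.com', 'iesdouyin.com'],
  kuaishou: ['kuaishou.com'],
  xigua: ['ixigua.com'],
  bilibili: ['bilibili.com', 'b23.tv'],
};

// Return the platform key for a URL, or null if the host matches no list.
function detectPlatform(rawUrl) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return null; // not a parseable absolute URL
  }
  for (const [platform, domains] of Object.entries(PLATFORM_DOMAINS)) {
    // Match the domain itself or any of its subdomains (www., v., m., ...).
    if (domains.some((d) => hostname === d || hostname.endsWith('.' + d))) {
      return platform;
    }
  }
  return null;
}

console.log(detectPlatform('https://v.douyin.com/abc123/')); // prints "douyin"
```

Matching on the parsed hostname rather than searching the whole URL string avoids false positives such as a platform name appearing in a query parameter or in an unrelated domain like `notdouyin.com`.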

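Usage

  1. Download the source code, then enter the project root directory:
     git clone "https://git.filesite.io/filesite/machete_hero.git"
     cd machete_hero/
  2. Run the following command to install the dependencies:
     npm install

If you are not familiar with npm and Node.js, please read up on them first.

  3. Write an .mjs script that calls the libraries under bot/ to visit the target web page and parse out the data you need.

For how to call the libraries under bot/, see the test script test/scrap_test.mjs. How to run the test script is described in test/README.md.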
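
As a rough illustration of how such a script might be organized, the sketch below shows only the command-line entry point and the dispatch to a per-platform handler. It does not use the real bot/ classes: the `handlers` map, `parseArgs`, `main`, and the script name `my_script.mjs` are all hypothetical, and the stub handlers merely echo their input where a real one would open the page with Hero and parse its HTML. Refer to test/scrap_test.mjs for the actual interface.

```javascript
// Hypothetical stand-ins for the classes under bot/.
// A real handler would load the page with Hero and parse its HTML.
const handlers = new Map([
  ['douyin', async (url) => ({ platform: 'douyin', url })],
  ['bilibili', async (url) => ({ platform: 'bilibili', url })],
]);

// Validate "<platform> <url>" taken from the arguments after the script name.
function parseArgs(argv) {
  const [platform, url] = argv;
  if (!platform || !url) {
    throw new Error('usage: node my_script.mjs <platform> <url>');
  }
  if (!handlers.has(platform)) {
    throw new Error(`unsupported platform: ${platform}`);
  }
  return { platform, url };
}

// Pick the handler for the platform, run it, and print the result as JSON.
async function main(argv) {
  const { platform, url } = parseArgs(argv);
  const data = await handlers.get(platform)(url);
  console.log(JSON.stringify(data, null, 2));
  return data;
}

// process.argv.slice(2) drops the node binary and the script path.
if (process.argv.length > 2) {
  main(process.argv.slice(2)).catch((err) => {
    console.error(err.message);
    process.exitCode = 1;
  });
}
```

Keeping argument validation in its own function means the supported-platform check fails fast, before any browser session would be started.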