第58节 lychee 实现 docs 检查
❤️💕💕记录sealos开源项目的学习过程。k8s,docker和云原生的学习。Myblog:http://nsddd.top
[TOC]
介绍
什么是 lychee?
⚡ 用Rust编写的快速、异步、基于流的链接检查器。
在Markdown、HTML、reStructuredText或任何其他文本文件或网站中查找损坏的超链接和邮件地址!
可作为命令行实用程序、库和 GitHub Action使用。
安装
在基于APT/dpkg的Linux发行版(例如Debian、Ubuntu、Linux Mint和Kali Linux),以下命令将安装所有必需的构建依赖项,包括Rust工具链和 cargo
:
curl -sSf 'https://sh.rustup.rs' | sh
apt install gcc pkg-config libc6-dev libssl-dev
使用
递归检查当前目录中支持的文件中的所有链接
lychee .
您还可以指定各种类型的输入:
# check links in specific local file(s):
lychee README.md
lychee test.html info.txt
# check links on a website:
lychee https://endler.dev
# check links in directory but block network requests
lychee --offline path/to/directory
# check links in a remote file:
lychee https://raw.githubusercontent.com/lycheeverse/lychee/master/README.md
# check links in local files via shell glob:
lychee ~/projects/*/README.md
# check links in local files (lychee supports advanced globbing and ~ expansion):
lychee "~/projects/big_project/**/README.*"
# ignore case when globbing and check result for each link:
lychee --glob-ignore-case --verbose "~/projects/**/[r]eadme.*"
# check links from epub file (requires atool: https://www.nongnu.org/atool)
acat -F zip {file.epub} "*.xhtml" "*.html" | lychee -
Lychee将其他文件格式解析为明文,并使用linkify提取链接。如果没有格式或编码细节,这通常工作得很好,但如果您需要对新文件格式的专门支持,请考虑创建一个问题。
Docker Usage
下面是如何将本地目录挂载到容器中,并使用lychee检查一些输入。传递 --init
参数,以便可以从终端停止荔枝。我们还传递 -it
来启动一个交互式终端,这是显示进度条所必需的。
docker run --init -it -v `pwd`:/input lycheeverse/lychee /input/README.md
GitHub Token
为了避免在检查GitHub链接时受到速率限制,您可以选择使用Github令牌设置环境变量,如 GITHUB_TOKEN=xxxx
,或使用 --github-token
CLI选项。它也可以在配置文件中设置。 下面是一个示例配置文件。
令牌可以在您的GitHub帐户设置页面中生成。没有额外权限的个人令牌足以检查公共存储库链接。
Commandline Parameter参数
有大量命令行参数可用于自定义行为。下面是一个完整的列表。
A fast, async link checker
Finds broken URLs and mail addresses inside Markdown, HTML, `reStructuredText`, websites and more!
Usage: lychee [OPTIONS] <inputs>...
Arguments:
<inputs>...
The inputs (where to get links to check from). These can be: files (e.g. `README.md`), file globs (e.g. `"~/git/*/README.md"`), remote URLs (e.g. `https://example.com/README.md`) or standard input (`-`). NOTE: Use `--` to separate inputs from options that allow multiple arguments
Options:
-c, --config <CONFIG_FILE>
Configuration file to use
[default: lychee.toml]
-v, --verbose...
Set verbosity level; more output per occurrence (e.g. `-v` or `-vv`)
-q, --quiet...
Less output per occurrence (e.g. `-q` or `-qq`)
-n, --no-progress
Do not show progress bar.
This is recommended for non-interactive shells (e.g. for continuous integration)
--cache
Use request cache stored on disk at `.lycheecache`
--max-cache-age <MAX_CACHE_AGE>
Discard all cached requests older than this duration
[default: 1d]
--dump
Don't perform any link checking. Instead, dump all the links extracted from inputs that would be checked
--archive <ARCHIVE>
Specify the use of a specific web archive. Can be used in combination with `--suggest`
[possible values: wayback]
--suggest
Suggest link replacements for broken links, using a web archive. The web archive can be specified with `--archive`
-m, --max-redirects <MAX_REDIRECTS>
Maximum number of allowed redirects
[default: 5]
--max-retries <MAX_RETRIES>
Maximum number of retries per request
[default: 3]
--max-concurrency <MAX_CONCURRENCY>
Maximum number of concurrent network requests
[default: 128]
-T, --threads <THREADS>
Number of threads to utilize. Defaults to number of cores available to the system
-u, --user-agent <USER_AGENT>
User agent
[default: lychee/0.12.0]
-i, --insecure
Proceed for server connections considered insecure (invalid TLS)
-s, --scheme <SCHEME>
Only test links with the given schemes (e.g. http and https)
--offline
Only check local files and block network requests
--include <INCLUDE>
URLs to check (supports regex). Has preference over all excludes
--exclude <EXCLUDE>
Exclude URLs and mail addresses from checking (supports regex)
--exclude-file <EXCLUDE_FILE>
Deprecated; use `--exclude-path` instead
--exclude-path <EXCLUDE_PATH>
Exclude file path from getting checked
-E, --exclude-all-private
Exclude all private IPs from checking.
Equivalent to `--exclude-private --exclude-link-local --exclude-loopback`
--exclude-private
Exclude private IP address ranges from checking
--exclude-link-local
Exclude link-local IP address range from checking
--exclude-loopback
Exclude loopback IP address range and localhost from checking
--exclude-mail
Exclude all mail addresses from checking
--remap <REMAP>
Remap URI matching pattern to different URI
--header <HEADER>
Custom request header
-a, --accept <ACCEPT>
Comma-separated list of accepted status codes for valid links
-t, --timeout <TIMEOUT>
Website timeout in seconds from connect to response finished
[default: 20]
-r, --retry-wait-time <RETRY_WAIT_TIME>
Minimum wait time in seconds between retries of failed requests
[default: 1]
-X, --method <METHOD>
Request method
[default: get]
-b, --base <BASE>
Base URL or website root directory to check relative URLs e.g. https://example.com or `/path/to/public`
--basic-auth <BASIC_AUTH>
Basic authentication support. E.g. `username:password`
--github-token <GITHUB_TOKEN>
GitHub API token to use when checking github.com links, to avoid rate limiting
[env: GITHUB_TOKEN]
--skip-missing
Skip missing input files (default is to error if they don't exist)
--include-verbatim
Find links in verbatim sections like `pre`- and `code` blocks
--glob-ignore-case
Ignore case when expanding filesystem path glob inputs
-o, --output <OUTPUT>
Output file of status report
-f, --format <FORMAT>
Output format of final status report (compact, detailed, json, markdown)
[default: compact]
--require-https
When HTTPS is available, treat HTTP links as errors
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
错误码 (Exit codes)
0
表示成功(已成功检查所有链接或已按照配置排除/跳过所有链接)1
用于丢失的输入和任何意外的运行时失败或配置错误2
表示链路检查失败(如果任何未排除的链路未通过检查)
Ignoring links
您可以通过使用 --exclude
指定正则表达式模式(例如, --exclude example\.(com|org)
)。如果当前工作目录中存在名为 .lycheeignore
的文件,则其内容也将被排除。该文件允许您列出多个要排除的正则表达式(每行一个模式)。
要从扫描中排除文件/目录,请使用 lychee.toml
和 exclude_path
。
exclude_path = ["some/path", "*/dev/*"]
Caching
如果设置了 --cache
标志,荔枝将在当前目录中名为 .lycheecache
的文件中缓存响应。如果文件存在并且设置了标志,则在启动时将加载该高速缓存。这可以大大加快未来的运行速度。请注意,默认情况下,荔枝不会在磁盘上存储任何数据。
用作自己的库
使用 rust 调用~
actions
使用荔枝快速检查Markdown、HTML和文本文件中的链接。
可以融合 issue 的使用,比如说:
与“从文件创建问题”一起使用时,当操作发现链接问题时,将打开 issue
Note 当然,这需要你的权限
Usage
以下是GitHub工作流文件的完整示例:
它将每天检查一次所有存储库链接,并在出现错误时创建一个问题。将此保存在 .github/workflows/links.yml
下:
name: Links
on:
repository_dispatch:
workflow_dispatch:
schedule:
- cron: "00 18 * * *"
jobs:
linkChecker:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Link Checker
id: lychee
uses: lycheeverse/lychee-action@v1.7.0
env:
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
- name: Create Issue From File
if: env.lychee_exit_code != 0
uses: peter-evans/create-issue-from-file@v4
with:
title: Link Checker Report
content-filepath: ./lychee/out.md
labels: report, automated issue
(You不需要自己配置 GITHUB_TOKEN
;由Github自动设置)。
如果您总是希望使用最新的功能,但又避免破坏性的更改,则可以将版本替换为 lycheeverse/lychee-action@v1
注意
可以对照这个加一些参数,比如 -E 屏蔽所有 localhost 和内网ip的检查,以及忽略401和403的错误,429那个是GitHub的限速报错(可加可不加)
Alternative approach (替代方法)
这将在任何git push事件和所有pull请求期间检查所有存储库链接。如果出现错误,操作将失败。
这样做的好处是确保在合并请求期间,不会添加任何断开的链接,并且如果任何现有链接断开,它们将被捕获。将其保存在 .github/workflows/links-fail-fast.yml
下:
name: Links (Fail Fast)
on:
push:
pull_request:
jobs:
linkChecker:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Link Checker
uses: lycheeverse/lychee-action@v1.7.0
with:
fail: true
env:
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
标志
END 链接
✴️版权声明 © :本书所有内容遵循CC-BY-SA 3.0协议(署名-相同方式共享)©