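Pain points and solutions when generating large Chinese documents with Pandoc
Pandoc is a powerful tool for converting between text formats. When you use it to convert Chinese documents or to generate large documents, such as a thesis written in Chinese, you run into some troublesome problems. In an earlier post I mentioned that I am writing my PhD thesis in Markdown. In this post I draw on my own experience to describe the pain points of generating large Chinese documents with Pandoc and how to solve them.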
2019-11-25
6 min read
Customizing the PDF output of Pandoc
I am writing my PhD thesis, and instead of writing it in LaTeX I want to write it in Markdown and convert it with Pandoc. This has several merits. I can easily convert the Markdown file to docx for my supervisor to revise, and I can just as easily convert it to PDF through LaTeX. However, the default PDF output doesn’t conform to the format my school requires, so I am going to customize the PDF format to meet my school's standard.
2019-11-02
3 min read
Scraping all the texts of Luxun (鲁迅) from the Internet using Python
I want to practice text mining on the texts of Luxun (鲁迅), a great Chinese writer. The first step is to get all of his texts, and I have no time to type them out word by word, so I decided to scrape them from an online source. Source of the texts: the texts of Luxun are scraped from 子夜星网. The site claims to contain all the texts in the Complete Works of Luxun (鲁迅全集). I checked, and it does.
2019-10-12
2 min read