Customizing the PDF output of Pandoc

2019-11-02

I am writing my PhD thesis, and instead of writing LaTeX directly, I want to write it in Markdown and convert it with Pandoc. This has several advantages. I can easily convert the Markdown file to docx for my supervisor to revise, and I can just as easily convert it to PDF through LaTeX. However, the default PDF output doesn’t conform to the format my school requires, so I am going to customize the PDF output to meet my school's standard.

Customizing the PDF output raises a further problem. Since I write the thesis in Chinese, it is not obvious how to change the fonts and other character-related features. Customizing the format of the bibliography is an even greater problem.

There are two ways to customize the PDF output of Pandoc. The first is to use a custom LaTeX template. The second is to include a preamble file with the -H (--include-in-header) option of the pandoc command. I will concentrate on the latter approach; a later post may deal with the first.

The format

My graduate school does not strictly specify every detail of the format. The main requirements are:

  • The font size of the main matter is 10.5 pt, i.e. size No. 5 (五号字);
  • The page margins are 2.54 cm at the top and bottom, and 3.17 cm on the left and right;
  • The line spacing of the main matter is 20 pt;
  • Chapter and section headings are set in Heiti (黑体) and numbered with Chinese characters;
  • Subsection headings and below are set in Kaishu (楷书) and numbered with Arabic numerals;
  • Footnotes are marked ①, ②, and so on, and are numbered per page. The footnote font size is 9 pt;
  • The font size of figure and table captions is 9 pt.

The requirements are not too detailed, so I think a preamble file will suffice.

General format

The setting of the general format is straightforward. I will use geometry to set the page margin and setspace for line spacing.

Page margin

To set the page margin, simply use the package geometry and add the corresponding command in the preamble part.

\usepackage{geometry}
\geometry{top=2.54cm, bottom=2.54cm, left=3.17cm, right=3.17cm}

This sets the margin for top and bottom to 2.54cm, and 3.17cm for left and right.

Line spacing

Setting the line spacing is also straightforward using the setspace package.

\usepackage{setspace}
\linespread{1.5}
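
As a rough check on the factor, here is a minimal sketch of the arithmetic. It assumes an 11pt document class, whose default baseline distance is 13.6pt, and reads the 20-point requirement as a fixed distance between baselines. \setstretch is the setspace counterpart of \linespread; use one or the other, not both.

```latex
% Assumption: 11pt class, so \normalsize has \baselineskip = 13.6pt.
% \linespread{1.5}  ->  1.5 x 13.6pt = 20.4pt between baselines
% Target of 20pt    ->  20 / 13.6 ≈ 1.47
\usepackage{setspace}
\setstretch{1.47} % about 20pt between baselines at 11pt
```

With these assumptions the factor of 1.5 already lands within half a point of the requirement, so the finer value only matters if the school checks the spacing exactly.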

The format of chapters and sections

The general format of chapters and sections can be customized using the titlesec package. The Chinese numerals come from \zhnum, which is provided by the zhnumber package.

\usepackage{titlesec}
\usepackage{zhnumber}
\titleformat{\chapter}{\centering\Large\heiti}{第\zhnum{chapter}章}{1em}{}
\titleformat*{\section}{\centering\Large\heiti}
\titleformat*{\subsubsection}{\kaiti}

The commands above set chapter and section headings in \heiti (黑体) and subsubsection headings in \kaiti. The section numbers are then switched to Chinese numerals by redefining how each counter is printed:

\renewcommand\thesection{第\zhnum{section}节}
\renewcommand\thesubsection{\zhnum{subsection}、}
\renewcommand\thesubsubsection{(\zhnum{subsubsection})}

Font

The xeCJK package provides font support for Chinese characters. The fonts I need are Heiti and Kaiti, with SimSun as the main font. To achieve this, I first set the main font for CJK characters to SimSun.

\usepackage{xeCJK}
\setmainfont{Times New Roman}
\setCJKmainfont{SimSun}

I also set the font for Latin text to Times New Roman using the \setmainfont command provided by the fontspec package. Note that I didn’t load fontspec explicitly, since xeCJK loads it.

A trickier problem is setting a different font for part of the text without affecting the main font. For this, xeCJK provides \setCJKfamilyfont, which declares a named CJK font family, and \CJKfamily, which switches to it. I use them to define two font-switching commands.

\setCJKfamilyfont{kaiti}{KaiTi}
\setCJKfamilyfont{heiti}{SimHei}
\newcommand{\kaiti}{\CJKfamily{kaiti}}
\newcommand{\heiti}{\CJKfamily{heiti}}

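I defined \kaiti and \heiti on top of xeCJK's \CJKfamily mechanism; the fonts themselves (KaiTi and SimHei) must be installed on the system. Both commands are font switches, so they stay in effect until the end of the current group. To set a piece of text in one of these fonts, put the switch and the text together inside braces: {\kaiti 楷体} or {\heiti 黑体}.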
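
Because \kaiti and \heiti are font switches rather than commands that take an argument, a small wrapper can keep the change local to the text it is given. The following is only a sketch: the names \textkaiti and \textheiti are hypothetical and not part of xeCJK, and it assumes \kaiti and \heiti are already defined as above.

```latex
% Assumption: \kaiti and \heiti are defined earlier in the preamble as font switches.
% \textkaiti and \textheiti are hypothetical names chosen for this sketch.
% The inner braces form a group, so the font change ends with the argument.
\newcommand{\textkaiti}[1]{{\kaiti #1}}
\newcommand{\textheiti}[1]{{\heiti #1}}

% Usage in the document body:
%   \textkaiti{楷体}  sets only 楷体 in the Kaiti font
%   \textheiti{黑体}  sets only 黑体 in the Heiti font
```

The bare switches remain the better fit for titlesec, which expects a declaration such as \heiti in the format argument rather than a command with an argument.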

Footnote

To number footnotes by page, I use the perpage option of the footmisc package.

\usepackage[perpage]{footmisc}

To mark footnotes with circled numbers, I searched online and found that the pifont package provides such symbols. What I have to do is redefine \thefootnote, the command that prints the footnote number, so that it uses the circled numbers.

\usepackage{pifont}
\renewcommand{\thefootnote}{\ding{\numexpr171+\value{footnote}\relax}}

The pifont package provides many symbols, which are listed in its quick reference. \ding{172} is ①, so footnote n maps to symbol number 171 + n. To use other symbols, simply change the offset to match the symbol you want.
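
The circled digits in this range stop at ⑩ (\ding{181}), so a page with more than ten footnotes would continue past it into other symbols. One possible guard, sketched here and not tested against every footnote setup, falls back to plain Arabic numerals beyond ten:

```latex
\usepackage{pifont}
% Circled numbers for footnotes 1 to 10; plain Arabic numerals beyond that.
% Assumption: more than ten footnotes on a single page is rare enough
% that a plain-number fallback is acceptable.
\renewcommand{\thefootnote}{%
  \ifnum\value{footnote}<11
    \ding{\numexpr171+\value{footnote}\relax}%
  \else
    \arabic{footnote}%
  \fi}
```

Since the footnotes are numbered per page, the fallback should rarely be reached.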

The font size

Because the requirements mix two units (Chinese size numbers for the body and headings, points for footnotes and captions), I use an 11pt base font size to approximate size No. 5 (10.5 pt); with pandoc this can be set through the fontsize variable. With an 11pt base size, \footnotesize gives 9pt, and No.2 size (21pt) can be approximated with \huge. As for the size of captions, I use the caption package.

\usepackage{caption}
\captionsetup{font=footnotesize}

Output to pdf using pandoc

Putting it all together, I store the settings in a file named preamble.tex. The Markdown file can then be converted to PDF in the desired format using pandoc.

pandoc --pdf-engine=xelatex -H preamble.tex thesis.md -o thesis.pdf
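
For reference, the preamble.tex passed to -H might be assembled roughly as follows. This is a sketch collected from the snippets in this post under a few assumptions: XeLaTeX as the engine, a document class that defines chapters, and Times New Roman, SimSun, SimHei and KaiTi installed on the system. The \usepackage lines for titlesec and zhnumber are included on the assumption that nothing else loads them.

```latex
% preamble.tex: a sketch assembled from the snippets in this post.
% Assumptions: XeLaTeX, a class with \chapter (e.g. book or report),
% and the four fonts named below installed on the system.

% Fonts (xeCJK loads fontspec)
\usepackage{xeCJK}
\setmainfont{Times New Roman}
\setCJKmainfont{SimSun}
\setCJKfamilyfont{kaiti}{KaiTi}
\setCJKfamilyfont{heiti}{SimHei}
\newcommand{\kaiti}{\CJKfamily{kaiti}}
\newcommand{\heiti}{\CJKfamily{heiti}}

% Page margins and line spacing
\usepackage{geometry}
\geometry{top=2.54cm, bottom=2.54cm, left=3.17cm, right=3.17cm}
\usepackage{setspace}
\linespread{1.5}

% Headings with Chinese numerals
\usepackage{titlesec}
\usepackage{zhnumber}
\titleformat{\chapter}{\centering\Large\heiti}{第\zhnum{chapter}章}{1em}{}
\titleformat*{\section}{\centering\Large\heiti}
\titleformat*{\subsubsection}{\kaiti}
\renewcommand\thesection{第\zhnum{section}节}
\renewcommand\thesubsection{\zhnum{subsection}、}
\renewcommand\thesubsubsection{(\zhnum{subsubsection})}

% Footnotes: numbered per page, marked with circled numbers
\usepackage[perpage]{footmisc}
\usepackage{pifont}
\renewcommand{\thefootnote}{\ding{\numexpr171+\value{footnote}\relax}}

% Captions at \footnotesize (9pt with an 11pt base size)
\usepackage{caption}
\captionsetup{font=footnotesize}
```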

When running the command, an error occurred:

Error producing PDF.
! Argument of \paragraph has an extra }.
<inserted text>
                \par
l.1628 \ttl@extract\paragraph

I searched and found a Stack Overflow post that solves the problem. Simply add subparagraph: yes to the YAML header of the Markdown file, and the PDF file can be generated. Another tip is to add numbersections: true to the header to number the sections automatically.

To do

Despite all this effort, another serious problem remains untouched: citations and the bibliography. Citing sources and generating a bibliography in LaTeX is effortless with BibTeX, but generating a bibliography in the required Chinese format is a hard problem. I will explore this question later on.

In fact, I do have an idea of how to solve this problem. Pandoc supports CSL (Citation Style Language) style files. To generate a bibliography in Chinese format, there are three files to choose from:

  • chinese-gb7714-1987-numeric.csl;
  • chinese-gb7714-2005-numeric.csl;
  • chinese-gb7714-2005-author-date.csl.

I could follow this line, but the output is not very satisfying, so I decided to do some research on how people solve this problem in native LaTeX.