I'm using TeX Live 2020 on Debian Bullseye. I generated a PDF document containing Devanagari characters with XeLaTeX. By setting \XeTeXgenerateactualtext=1, I'm able to copy the Devanagari text from the XeLaTeX-generated PDF into a Unicode-aware text editor.
But when I convert it to a plain text file with ebook-convert using

```
ebook-convert test.pdf test.txt
```

I don't get the original Devanagari characters back.
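A quick way to see what the converted file contains instead of Devanagari is to count the kinds of characters in it. The sketch below is only an illustration in Python. The script name check_devanagari.py and the helper names are hypothetical. Treating Private Use Area code points as a hint of unmapped glyphs is an assumption, not documented ebook-convert behaviour.

```python
import sys
import unicodedata

# Devanagari block (U+0900-U+097F) and Devanagari Extended (U+A8E0-U+A8FF).
DEVANAGARI_RANGES = [(0x0900, 0x097F), (0xA8E0, 0xA8FF)]


def is_devanagari(ch: str) -> bool:
    """Return True if ch falls in one of the Devanagari Unicode blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DEVANAGARI_RANGES)


def summarize(text: str) -> dict:
    """Count Devanagari, Private Use Area, and other non-ASCII characters."""
    counts = {"devanagari": 0, "private_use": 0, "other_non_ascii": 0}
    for ch in text:
        if is_devanagari(ch):
            counts["devanagari"] += 1
        elif unicodedata.category(ch) == "Co":
            # Assumption: PUA code points may stand for glyphs with no Unicode mapping.
            counts["private_use"] += 1
        elif ord(ch) > 127:
            counts["other_non_ascii"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_devanagari.py test.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(summarize(f.read()))
```

A result with a devanagari count of zero but a non-zero private_use or other_non_ascii count would suggest the text layer was extracted, but not mapped back to Devanagari code points.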
Edit: The modified MWE is as follows:
```latex
\documentclass[12pt]{article}
\usepackage{polyglossia} % supports Unicode; compulsory
\setdefaultlanguage{english}
\setmainfont{Gentium Basic} % Unicode English font; any other font can be used as well.
\setotherlanguage{sanskrit}
\newfontfamily{\dev}[Script=Devanagari, Mapping=RomDev]{Shobhika}
\begin{document}
\XeTeXgenerateactualtext=1
\textit{Plain Unicode Diacritical Text:} dhṛtarāṣṭra uvāca \\
\textit{Plain Unicode Devanagari text:} {\dev धृतराष्ट्रउवाच} \\
\textit{Devanagari text generated from RomDev.tec:} {\dev dhṛtarāṣṭra uvāca}
\end{document}
```
I have many XeLaTeX-generated PDFs containing Devanagari text, and I want to convert them to plain text from the command line (not by copy-and-paste) for further processing, but I haven't managed to do so. Please help me.
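A batch version of the command-line conversion might look like the following minimal Python sketch. It assumes ebook-convert is on the PATH. It simply repeats the ebook-convert test.pdf test.txt call for every PDF in one directory, so it inherits the same Devanagari problem and does not fix it. The script name batch_convert.py and the helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_commands(pdf_dir: Path) -> List[List[str]]:
    """Return one ebook-convert command per PDF in pdf_dir, sorted by name."""
    commands = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        txt = pdf.with_suffix(".txt")  # test.pdf -> test.txt, as in the manual call
        commands.append(["ebook-convert", str(pdf), str(txt)])
    return commands


def convert_all(pdf_dir: Path) -> None:
    """Run each command in turn; raise on the first non-zero exit status."""
    for cmd in build_commands(pdf_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python batch_convert.py /path/to/pdfs
    convert_all(Path(sys.argv[1]))
```

Building the command lists separately from running them keeps the file-naming logic easy to check before any conversion is started.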
Regards.