
How to copy&paste unformatted text from TeX?

Despite the unequivocal superiority of TeX, word processors such as Word or OpenOffice/LibreOffice have one useful feature: both formatted and unformatted text can be exported easily. For example, a plain copy and paste from Word yields the formatted text by default, while copying and then pasting with Ctrl+Shift+V yields the unformatted text. With TeX, things seem more complicated. Text copied from a PDF usually carries formatting artifacts, such as hyphens at line ends or ligatures. Some PDF viewers, e.g. Adobe Reader, offer "Copy" vs. "Copy With Formatting", but the result largely depends on the behavior of the PDF software. There are alternatives such as the TeX package dvi-text, which converts DVI to plain text, but this adds yet more complexity.

Let's think about this use case. I write something down and produce a PDF for my archive. Because I want the result to be formatted, I typeset it in TeX rather than in plain text. Later I need to fill out some forms online, for which I would like to copy and paste the unformatted text from the PDF. What does the TeX community recommend for such a task? Thanks in advance.


Update 1. This is a comparison between TeX Live 2017 (pdfTeX v1.40.18) and TeX Live 2025 (pdfTeX v1.40.27), following @Mico's suggestion. The test TeX file is

\documentclass[11pt,a4paper]{article}
\usepackage{lipsum}
\begin{document}
\lipsum[1]
\end{document}

The PDFs produced by the 2017 and 2025 installations look like this:

With the same PDF reader (Acrobat v2025.001.20474), selecting the text, right-clicking "Copy", and then pasting with Ctrl+V into Word (365, up-to-date version) produces the following snapshots:

Note that the page layout is set to landscape to avoid word wrap, so that each line in C and D is truly a separate line.

The result bears out @Mico's observation: with the PDF from TeX Live 2025, the extra hyphens are recognized and stripped automatically on copy. But this worked only in Adobe, not in TeXworks or SumatraPDF, as far as I tested on Windows 10 (v2009). And even Adobe cannot handle the extra line endings, another annoyance on which I agree with @cfr.
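
One possible workaround for the stray hyphens and ligatures, sketched here under the assumption that the document is compiled with pdfLaTeX, is to keep a second, copy-friendly build of the same text that switches hyphenation off and asks pdfTeX to record glyph-to-Unicode mappings. The hyphenat package (with its none option) and the file glyphtounicode.tex ship with TeX Live, but combining them like this is only an illustration and was not part of the comparison above. It does nothing about the line endings that viewers insert on copy.

```latex
% Copy-friendly variant of the test file (assumes pdfLaTeX).
\documentclass[11pt,a4paper]{article}
\usepackage{lipsum}

% Ask pdfTeX to write glyph-to-Unicode mappings into the PDF,
% so that ligatures such as "fi" can be copied as plain letters.
\input{glyphtounicode}
\pdfgentounicode=1

% Switch hyphenation off for the whole document,
% so that no hyphens appear at line ends.
\usepackage[none]{hyphenat}

\begin{document}
% Ragged-right text avoids overfull lines once hyphenation is off.
\raggedright
\lipsum[1]
\end{document}
```

The archived PDF can still be built from the unmodified preamble; only the copy used for pasting into forms would need these extra lines.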


Update 2. There have been several similar questions, in addition to the one shared by @Marijn (thanks). Some were asked 10 years ago...

The last one is particularly interesting. It suggests that the problem is not specific to TeX but also affects other PDF producers. It looks like a tough one.

