I noticed recently that changing the size of parentheses in LaTeX results in a different encoding of the text in the pdf (as determined using the copy/paste functionality in my pdf viewer). For instance, in the command
$\sin(x) + \sin\bigl(x\bigr)$
the first term is encoded as s i n ( x ), whereas the second term is encoded as s i n <CR> <LF> <U+FFFD> <CR> <LF> x <CR> <LF> <U+FFFD>. (Here <CR> is a carriage return, <LF> is a line feed, and <U+FFFD> is the Unicode replacement character, used for an unknown, unrecognised, or unrepresentable character.)
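For anyone who wants to reproduce this, a minimal document along the following lines should be enough. The article class and the default Computer Modern fonts are assumptions here, and the exact text obtained by copy/paste will depend on the engine, the fonts, and the pdf viewer.

```latex
% Minimal test document (assumed setup: article class, default fonts).
\documentclass{article}
\begin{document}
% First term: ordinary parentheses taken from the text font.
% Second term: enlarged parentheses, which presumably come from the
% math extension font and may lack a usable Unicode mapping.
$\sin(x) + \sin\bigl(x\bigr)$
\end{document}
```

Compile it, then copy the formula from the viewer into a text editor to compare the two terms.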
This behavior is undesirable because it makes it impossible to find all instances of "sin(x)" by searching the file. As a mathematician who exclusively reads papers and books on the computer, I find it very important that pdf documents be easily searchable. This is also critical from an accessibility perspective.
Question: Is there any easy way to improve the encoding of the pdf file so that (for instance) the two terms above are encoded in the same way?
This site has a related problem that was solved using the accsupp package, but in my case that method results in the text string \sin (x) + \sin \big (x\big ), which is not so desirable either and is a hassle to make work in the LaTeX file. [Random question: do some visually impaired users prefer that the size of delimiters be recorded, as this output suggests?]
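For comparison, accsupp can also be applied to each enlarged delimiter on its own rather than to the whole formula. The following is only a sketch of that variant: the macro names \bigLP and \bigRP are hypothetical helpers (not part of accsupp), and whether a given pdf viewer honours the ActualText entry when copying or searching is an assumption that would need checking.

```latex
\documentclass{article}
\usepackage{accsupp}

% Hypothetical helpers: typeset an enlarged parenthesis but tag it with
% a plain "(" or ")" as its ActualText replacement in the pdf.
% method=escape asks accsupp to escape special characters such as
% parentheses when writing the pdf string.
\newcommand*{\bigLP}{%
  \BeginAccSupp{method=escape,ActualText={(}}\bigl(\EndAccSupp{}}
\newcommand*{\bigRP}{%
  \BeginAccSupp{method=escape,ActualText={)}}\bigr)\EndAccSupp{}}

\begin{document}
% Intended effect: both terms copy as "sin(x)" in viewers that
% respect ActualText.
$\sin(x) + \sin\bigLP x\bigRP$
\end{document}
```

This still requires replacing every \bigl( and \bigr) in the source by hand, so it does not remove the hassle mentioned above.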